Sept. 12, 2023

#214 AI Video Technology: A Deep Dive with Elai.io's CTO Alex Uspenskyi

Ever wondered about the future of video editing? Join us as we unravel the world of AI video technology with Alex Uspenskyi, co-founder and CTO of Elai.io. Tracing his journey from web development to AI, he shares his vision of transforming video editing through AI avatars. Prepare for a thrilling discussion on the innovative video creation process, the potential impact of OpenAI's new fine-tuning technology, and the ethical considerations involved in AI video creation. You won't want to miss the in-depth exploration into the future of this groundbreaking technology!

 

As we continue our conversation, expect to be captivated by the mesmerizing world of AI-driven avatars and virtual reality. Discover the technology needed, its potential effects on podcasting and the importance of partnering with the right people and infrastructure. Alex also sheds light on how Elai.io has managed to grow without depending on venture capitalists. Furthermore, we dive into the growth and potential of Elai.io's AI-powered video technology, its use across various niches, and Alex's invaluable advice for entrepreneurs. So, buckle up for a thrilling journey into the dynamic world of AI video technology!

 

More about Alex:

https://www.linkedin.com/in/uspenskyi/

 

https://elai.io/

 

Transcript


0:00:01 - Mehmet
Hello and welcome back to a new episode of the CTO Show with Mehmet. Today, I'm very pleased to have with me Alex. Alex, thank you very much for joining me. If you can, please introduce yourself and tell us about what you are doing.

0:00:16 - Alex
Thank you, Mehmet, for inviting me. Happy to chat. So my name is Alex and I'm the co-founder and CTO at Elai.io. We are building a next-generation video editing tool to create videos powered by AI avatars. The core of our technology is talking avatars, where you can actually write some text and choose the avatar, and the avatar will say what you just wrote. You can choose the language, you can choose the voice, and so on. So basically it's all about creating videos with avatars.

0:00:54 - Mehmet
That's great. Thank you for being on the show. So, Alex, my first question is really about your journey. You were in web development and now you are a co-founder and CTO at Elai.io. So how did you decide at first to be in this domain, and then why did you decide to be in the AI space?

0:01:22 - Alex
I started to work in IT maybe 10 years ago, and that was basically my passion through all my career. Initially I started to work as a web programmer, a web developer, so basically I never worked on anything except creating software. I really like building products, and apart from just working as a developer, I also tried to launch different businesses, startups. It was not that popular back then, but still I tried to launch some online businesses, and maybe around six years ago I launched a software agency because I saw a niche, I saw a demand for that. I grew the web development agency to 40-50 people and it still exists. It's working, but still my passion was to create my own products, and I never gave up on this idea of creating something new.

And AI definitely is a game changer, and I believed in AI even five years ago when I started some other solutions, other companies, basically because I see this is something that really disrupts different industries, real-world industries.

So I never worked, for example, with crypto because it's not my type of product. I didn't see the case for myself in it. But in AI, I've always seen potential, and I've seen different ideas and how it can really impact businesses. As a technical guy, I could imagine that. And about Elai.io exactly: I was looking for different ideas, working with different people, maybe two or three years ago, and ended up researching different technologies. Specifically, I found this technology to change the lips of a person according to audio and found that really cool, and I thought it could be really beneficial for creating educational videos, where you don't need to actually film anybody but can just type text. This is how the idea started. So I talked to my co-founder, Vitaly, about it and he also liked it, and, yeah, we both agreed that it was a good idea and started to work on it, and this is how it turned into what we have right now.

0:03:59 - Mehmet
That's great to hear. So, just for the audience, because now AI is everywhere: it's like a revolutionary platform, converting or generating videos just from text. So how does this technology work, if you can walk us through the process, and what makes your platform stand out from other platforms in the same niche?

0:04:32 - Alex
Yeah, so there are different steps to create the video. The first one is that we need to create audio, so turn text into audio. For that, maybe your audience already knows about platforms like ElevenLabs or Azure Text-to-Speech, and we've got integrations with more than five, I think six already, six platforms that create audio for us. We don't specifically focus on audio because it's a totally different AI technology that we don't want to spend time on, because then we'd lose focus on video creation. As soon as we have the audio, which we use our partners for, we need to create a video, and for that we film real people. All our avatars are made from real people, real actors, who actually behave and talk on camera when we film them. After that we basically just put the audio on the recorded video of the actor, and this is a new video where the actor is talking with an absolutely different voice and in a different language. And after that, of course, a huge part of what we do is the video editing tool, where we can add all the other stuff to the video. Apart from the actor, you also need text and images and other videos, and then we combine all of this into the final video.

And what is specifically interesting about Elai.io, and it's a pretty unique technology that our competitors don't have, is the story. We call it the story editor or AI story builder. Basically, it's a tool which looks pretty much the same as Medium's text editor, so you just work with text and then we automatically turn this text into slides. We help editors create visuals without even working with visuals. As an editor, you can just upload your text, you can copy-paste it or upload it from any other document, and we help you automatically put it onto different screens and create the video. It's also powered by GPT, OpenAI's technology, so you can enhance your text or create it from scratch and then turn it into a video. So basically it's a fully automated process of creating the video.
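
The steps Alex walks through (split the text into slides, turn each slide's text into audio, apply the audio to pre-filmed actor footage) could be sketched roughly as follows. This is only an illustration of the described flow, not Elai.io's actual code: every function name, the paragraph-per-slide rule, and the placeholder return values are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slide:
    text: str
    audio: bytes = b""
    avatar_clip: str = ""


def split_into_slides(script: str) -> List[Slide]:
    """Hypothetical stand-in for the story editor: one slide per paragraph."""
    return [Slide(text=p.strip()) for p in script.split("\n\n") if p.strip()]


def synthesize_audio(text: str, voice: str) -> bytes:
    """Placeholder for a call to an external text-to-speech partner."""
    return f"{voice}:{text}".encode("utf-8")


def lip_sync(actor_footage: str, audio: bytes) -> str:
    """Placeholder for re-animating the filmed actor to match the new audio."""
    return f"{actor_footage}+audio[{len(audio)} bytes]"


def render_video(script: str, actor_footage: str, voice: str) -> List[Slide]:
    """Run text -> slides -> audio -> avatar clip for every slide."""
    slides = split_into_slides(script)
    for slide in slides:
        slide.audio = synthesize_audio(slide.text, voice)
        slide.avatar_clip = lip_sync(actor_footage, slide.audio)
    return slides
```

In a real system the final step would also composite text, images, and other videos onto each slide, which is left out here.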

0:07:07 - Mehmet
That's great, alex, alex. Now the question is just maybe by the time I would be publishing this episode, it will be like maybe two weeks since now GPT or OpenAI they announced that now you can tune your own models. So how important this to help you actually in customizing your product in a better way, like how this would help not only from video creation perspective, but in general. Also, if you can shed some light, because people are very excited about this news. 

0:07:43 - Alex
Yeah, I also took a deep dive into this specific niche, working with large language models, and this fine-tuning is, I would say, not as game-changing as it might seem initially, given how it's working right now. I just checked the email from OpenAI yesterday and saw how it works. Basically, you could do the same before just with prompt engineering. You create prompts with different steps and you help the system learn about some business, or help the OpenAI model understand what exact output you need. That was all possible before. The big thing here is that you no longer need to send all this information with each request to OpenAI. Now you can just do it once up front and fine-tune your own model, and then for any other request you make, it will already know all the information that you fine-tuned your model with. So it helps you reduce the tokens you use for each request, and basically for a large-scale model it's different.

It could be a real game changer when you really need a large language model to know about a specific niche or business or a specific use case. In our case, I haven't thought yet about how it could impact our business because, still, OpenAI is like 5% of what we do, right? It's not our core technology. But I think in the future we might use it and fine-tune the model to work better with video, or with specific niches and specific topics of videos, for example. It's developing pretty rapidly, so lots of things are going to change for sure, and we might use it as well in the future.
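
To make the token-saving point concrete, here is a small back-of-the-envelope sketch. The numbers are invented for illustration, and it only counts input tokens; it does not model the one-off training cost of fine-tuning or any difference in per-token pricing between base and fine-tuned models.

```python
def tokens_per_request(context_tokens: int, question_tokens: int, fine_tuned: bool) -> int:
    """Input tokens sent with one request.

    With prompt engineering the business context travels with every request;
    with a fine-tuned model only the question itself is sent.
    """
    return question_tokens if fine_tuned else context_tokens + question_tokens


def total_input_tokens(requests: int, context_tokens: int, question_tokens: int, fine_tuned: bool) -> int:
    """Total input tokens across a number of identical-sized requests."""
    return requests * tokens_per_request(context_tokens, question_tokens, fine_tuned)


# Assumed workload: 10,000 requests, 1,500 tokens of context, 100-token questions.
with_prompting = total_input_tokens(10_000, 1_500, 100, fine_tuned=False)
with_fine_tuning = total_input_tokens(10_000, 1_500, 100, fine_tuned=True)
saved = with_prompting - with_fine_tuning
```

Under these assumed figures the prompt-engineering approach sends 16,000,000 input tokens against 1,000,000 for the fine-tuned model, which is why the difference matters mostly at large scale.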

0:09:41 - Mehmet
Yeah, that's cool. Now, Alex, of course these things are very cool, and maybe up to one year back, if I told someone, hey, I will give a text to a piece of software and it will create a full video scene for me, maybe they would have laughed at me. But now here we go and it's a reality. Now, one of the things that people are a little bit worried about, maybe from an ethical perspective, is misuse or spreading misinformation. You are a builder, you are part of this revolution, I would say, so how can we balance creativity while at the same time making sure this technology doesn't hurt someone? What are your thoughts about this?

0:10:39 - Alex
Yeah, so from my perspective there is, of course, a part of the responsibility on us as a service provider as well, and that's why, from the first days of our launch, we always thought about: okay, how could people misuse this? And that's why we provided moderation of the content. Specifically, this applies to all the videos that people create with our actors, because the actors are real people and they don't want to be involved in misuse of videos created with them. So if we're talking about the actors that we have on our platform, we have a strict policy which doesn't allow our users to create just any type of content. We actually invested in that: we have auto-moderation, and then we have manual moderation, where we basically control what kind of content people create so they don't misuse it.

Also, if we're talking about custom avatars, which are also a part of our technology: for example, you yourself can create an avatar of yourself, right, and then you can make more videos with yourself, as you own your own face. Then you can create any kind of video because it's your face and, basically, you don't involve any actors, you don't hire anyone else.

And if we're talking about the technology overall, of course I think that providers like us will control it and will make it impossible to misuse the technology. But of course, the open source part is also going to develop, and this you cannot really control. As soon as you use some open source tools, nobody can control you and you can do anything you want. So it could maybe be up to YouTube or another platform that actually streams the video to control the content; they will see how this content is spread and what exactly is in it. So I think all players on this journey of video creation and video distribution will take some part in this, and YouTube might add some AI sign on videos when avatars are used, something like that. I assume in the future half of the videos are going to be with not-real people, and YouTube, for example, will just mark these videos as artificial, and it's going to work.

0:13:34 - Mehmet
So they will tag; they will be able to put tags there, maybe. Actually, I've seen watermarks from some providers. They put a watermark and keep it on purpose, so it doesn't get misused. Now, what are the industries where you are seeing more adoption of AI-generated video content? Of course I've seen maybe education, but have you seen adoption in other industries as well?

0:14:10 - Alex
So, as you mentioned, you're correct: education is basically the biggest market for now that has adopted this technology already. It's both B2C education and corporate education, and it's already widely used in this sector. Aside from that, we also experiment a lot with news, for example, creating videos where there is no real human talking about some news that's happening in the world.

And there are some other uses which are not that widespread, for example, different applications or use cases where you need to make a personal video for somebody, and in this case you can use an avatar instead of a real human because it will reduce your costs, and so on. It could be, for example, healthcare with recommendations. It could be basically anything where you need a video just to tell about something, and in this case avatar videos work pretty well for now. For now, the only niche where you cannot really use avatars is where you need to convey emotions through the video, because avatars still cannot give you emotions, and that's why you cannot use them there right now. But in the future, I think AI video is going to have emotions as well.

0:15:42 - Mehmet
That's great, because I wanted to ask you: what more should we expect in the future, Alex, regarding this technology? Where are we heading?

0:15:52 - Alex
I think avatars will be more and more realistic. Custom avatars will be much easier to create, so you just add a photo and you have your avatar. Voices will also be much more realistic. You can have your own voice in any language. And the most challenging part here, but I still think it will come in the future, is to make the avatar talk with emotion, so you can actually manipulate emotions when the avatar is speaking: where he is louder, when he moves his head and body, according to some scenario. So definitely it's going to happen, and at some point I think we won't be able to tell whether it's a real video or an AI-based video. I think this is what's going to happen.

0:16:52 - Mehmet
Do you think this avatar or, let's say, this AI character will jump out of the screen one day and we're going to see a kind of realistic one in front of us, maybe in the form of VR or extended reality? Do you think that is possible?

0:17:06 - Alex
Well, yeah, if we're talking about VR, we already have some integrations with a company that makes VR screens, and they use our avatars, so it's already happening. I think it's just not something that's going to be really widely used. It's not a really wide use for this technology, right, because VR screens, where exactly are they used right now? There aren't that many cases. They use them at IT conferences, maybe somewhere else, so it's not much, but we'll see. I don't know. It also involves hardware development, right, and that's much, much harder. So if you're talking about the real world, not when you just see a screen, I think this part will develop much, much slower, but we'll see. Who knows. If you've seen these robots, they already look like humans, and I don't know what to expect, but definitely it's going to be changing as well.

0:18:07 - Mehmet
Yeah, the other day someone was asking me. He said, when do you think you're going to have an avatar on your podcast? And I said I'm not sure how fast it could be, because, as you mentioned, hardware. But do you think it can be done completely? Can it become real time? Like I'm chatting to you now, so maybe I would be talking to Alex, but actually it's not the real Alex. It would be an avatar of Alex.

0:18:36 - Alex
Yeah, exactly, I think it's pretty close. It's pretty close. OpenAI gives you, for example, live streaming text, and then you use the stream, so you can just turn the text into audio, the audio into an avatar, and it's definitely going to be real time. I don't know when exactly, but I think in one or two years it's already going to be real time. So you can do a podcast with an avatar in a year or two, I think.

0:19:10 - Mehmet
I did it from the audio perspective, honestly, a couple of months back. I used ElevenLabs, mainly to generate the voice from text, and I acted as if I was asking questions, and I did it this way. So, yeah, maybe next time I would create the video using your...

0:19:27 - Alex
Yeah, yeah, yeah, so real time it's pretty close, I'm sure. 

0:19:32 - Mehmet
Yeah. Now you mentioned that, of course, you have to rely on other technologies and so on. Of course you have your own technology, but you are relying on, let's say, OpenAI, right? So, as a CTO, and I think this is one of the things that you also need to consider, how do you make sure that you don't have what we call vendor or partner lock-in, or technology lock-in? Because I've seen this, not in AI but in other fields, where the technology vendor decided all of a sudden, and recently it happened with Reddit and with Twitter, to change the way they let people use the APIs, or they try to impose additional costs for that. So how do you, as a CTO, have a contingency plan in place and make sure that you can, you know, continue business as usual?

0:20:37 - Alex
Of course. The first thing would be, I think, the reliability of the partner, because good API partners always give you time to adapt. They say, okay, we're going to make a new version in six months and this is going to be different, so make sure you can adapt. So choosing the proper partner is pretty important. Another thing would be to have a proper infrastructure that can support this, so you can actually adjust to use another provider fast. For example, we have a pretty solid service that handles all the text-to-speech partners. As I said, we already have six of them.

And in this case we can easily add a new provider or remove one. It's much easier if you have the proper architecture to manage this. And, yeah, I think these two things are the most important, because if you have proper partners, they will give you an idea in advance of what they're going to change, and if you have proper architecture, you can easily change your code, add or remove partners, and switch to other providers. And, of course, if somebody is making a unique technology that nobody else can do, then it's a tricky thing. But on the other hand, if they're the only ones, usually other companies arise and do the same, right, because it's not really common for somebody to be doing something that nobody else can do. So, yeah, good thing for us as the leaders of the city.
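
The kind of architecture Alex describes, one internal service sitting in front of several interchangeable text-to-speech partners, could look something like the sketch below. It is a minimal illustration under assumptions, not Elai.io's implementation: `TextToSpeechProvider`, `FakeProvider`, and `SpeechService` are hypothetical names, and a real adapter would wrap a vendor SDK or HTTP API instead of returning a formatted string.

```python
from typing import Dict, List, Protocol


class TextToSpeechProvider(Protocol):
    """Interface every provider adapter is assumed to implement."""

    def synthesize(self, text: str, voice: str) -> bytes: ...


class FakeProvider:
    """Stand-in adapter used here in place of a real vendor integration."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy

    def synthesize(self, text: str, voice: str) -> bytes:
        if not self.healthy:
            raise RuntimeError(f"{self.name} is unavailable")
        return f"{self.name}|{voice}|{text}".encode("utf-8")


class SpeechService:
    """Tries providers in registration order and falls back on failure."""

    def __init__(self) -> None:
        # dicts keep insertion order, which doubles as the priority order.
        self._providers: Dict[str, TextToSpeechProvider] = {}

    def add(self, name: str, provider: TextToSpeechProvider) -> None:
        self._providers[name] = provider

    def remove(self, name: str) -> None:
        self._providers.pop(name, None)

    def names(self) -> List[str]:
        return list(self._providers)

    def synthesize(self, text: str, voice: str) -> bytes:
        errors = []
        for name, provider in self._providers.items():
            try:
                return provider.synthesize(text, voice)
            except Exception as exc:  # any provider failure triggers fallback
                errors.append(f"{name}: {exc}")
        raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Because the rest of the product would only talk to `SpeechService`, swapping a partner becomes a matter of calling `add` or `remove` rather than touching every call site, which is the lock-in protection being described.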

0:22:26 - Mehmet
Yeah, but I think in this space the race is very hard, and I think everyone needs to rely on companies that build products like what you do, Alex, so I think we are on the same side, at least for the coming five or six years, I would say. Now, one thing: when I was preparing for today... congratulations, you have managed to grow this by bootstrapping the business, and I always like it when I see a startup that bootstraps without relying on venture capitalists and large investments. So how did you manage to do that, Alex?

0:23:07 - Alex
First of all, I can't say that we didn't raise at all. We had some pretty small rounds initially, I think in the last year, but they were small. Yeah, so now we basically can say that we bootstrap, because now we earn more than we spend, and we basically spend the revenues that we grow.

I think the key for this would be to focus on your customers, to focus on growth, and to manage your costs properly. You just see: okay, I have this number of customers, how can I grow it? So you grow your revenues, and it's definitely a good thing to do for any business. If it's a venture business, it's of course also important, because you need to prove that you grow when you need to grow.

And for us, it was the key, because we saw specific niches and specific use cases where we help people, and we started to earn from that. We started to give them service, they started to pay us, and we grew it, grew it, grew it, and basically, yeah, it helped us to do it. So focus on your customers, focus on product, and focus on business. We didn't spend much time on fundraising. Instead of talking to investors and always trying to raise a new round, we focused on growing the business, and it gave us the path we're on right now. We feel pretty safe because we don't rely on venture capital, and it makes our business much more stable.

0:24:57 - Mehmet
Yeah, that makes sense. Now, what are the strategies, from both a technology and a business perspective, that you are trying to implement to expand the business? You can tell us more: have you started to get into new geographies, into new areas? What can you tell us about your future plans?

0:25:24 - Alex
So, yeah, we are growing in terms of geography, but it's more for our marketing team to decide where to go and how to grow. From my perspective, I'm more about growing as a product, and we definitely grow all the time, release new features, and try new things. For example, right now we're just releasing photo avatars, which is basically a pretty cool thing, and we already see a lot of interest in this. Instead of creating an avatar from videos, you can create an avatar from a photo, and it's much easier. Of course, it's not as realistic as from video, but for many use cases it works, because people don't need to record a video. They just need to upload their own photo and then create the videos.

So, in terms of product, we grow this. We invest in the story feature as well, because we see potential, we see how it's adopted by users. We create photo avatars. We also focus a lot on our enterprise customers, because they need other types of features. In terms of geography, we are, as far as I know, looking into the South American market, because it's growing and seems to be adopting avatar technology pretty well. In terms of product, it's photo avatars and much, much more in the future. We have a pretty big pipeline, and we're actually hiring developers, and, yeah, I don't have an exact list, but there are definitely going to be a lot of new things.

0:27:23 - Mehmet
That's cool. If you allow me to ask, and maybe I should have asked this before: I know you folks are B2B, right, but I can also go to your website as a single creator and subscribe. Where are you seeing more interest from customers? Is it people who are solopreneurs or, let's say, single creators, or are you also seeing mass adoption from businesses?

0:27:54 - Alex
I think both. Initially, of course, it was more B2C, creators, people who bought some not really big plans and created videos for themselves, for different media, for marketing purposes and education as well. So more small businesses, basically. But as we grow, we have also seen lots of potential in huge customers. We have pretty big customers on our list already, world-famous companies. I cannot say all the names because of NDAs, but it's definitely pretty well adopted by enterprises as well, especially in the sector of education and learning and development.

I would say that it's pretty huge, and for us as a business, of course, these enterprise clients are very important because the checks are much, much bigger. And, of course, we see a lot of potential in our API technology: when we are a partner of somebody else who uses us as an API to create videos, and they distribute them in their apps or even allow their customers to create content using our avatars. That's also something that gives us huge checks, because in this case they are basically distributing the technology and we, as an API partner, are helping them. So we are still trying different niches and we still see potential on both sides, B2B companies and B2C, like small businesses that create marketing content, and I think that in the future we're going to develop both these sectors. For us, initially it was more B2C, but now B2B is growing faster, I would say.

0:29:59 - Mehmet
Yeah, that makes sense as well. You need sustainability, because you need long-term contracts and people who need to use this technology in the long term, rather than someone like me who wants to play a little bit and create a video for myself. So that 100% makes sense. Now, as we're coming almost to the end, what advice would you give to fellow entrepreneurs with technical or non-technical backgrounds who want to start their journey? And since you now have this experience with AI, what can you tell them from your experience?

0:30:38 - Alex
I think right now is a really good time to start an AI business, especially in the sector of large language models and these text enhancement things, because it's really something that changed how the market works. There are many more technologies right now that people can use and adapt to businesses. Also generative AI, video-to-video, text-to-video, so, for example, generating videos from scratch, when you just give the prompt and get the video. So it's definitely a good time to start such a business, because the technology is booming right now. And I would say that, for now, it's pretty important to have a balance, because AI technology is much harder to develop: a balance between creating the technology, having some insight into the technology, and also developing the business itself.

So I would suggest starting with, of course, a specific business niche and seeing the need, seeing the business that you can sell this technology to, but, of course, not forgetting about developing the technology itself. So try open source things and then grow it into some extended business. Definitely, if you create a startup, the first thing is customers and a product that fits some specific business. That's, I think, my biggest advice, and everything else comes after that.

0:32:20 - Mehmet
Yeah, great advice, actually. And you know, I can say you are a success story, Alex, with what you've done, the company, and your achievements, of course. Now, I usually keep a space for my guests at the end, where I ask: is there anything that you wish I had asked you? Feel free to answer it. Are there any questions you wish I had asked you that I didn't?

0:33:03 - Alex
Oh, I wasn't prepared for that, but I think you did ask, because I was thinking about large language models and this whole space of creating videos that interact with large language models, and we pretty much covered that, and also about bootstrapping and my advice. So I think I'm pretty clear on everything I wanted to say.

0:33:29 - Mehmet
Okay, thank you very much, Alex. I try to cover all the angles, of course, but I'm not perfect either, so this is why I like to ask the guests, in case there's something they wanted to say. Of course, I'm going to put all the links, but tell us where people can find more about you and the company.

0:33:55 - Alex
Yeah, there's my LinkedIn profile, and for the company, of course, it's our website, elai.io. That's pretty much it, because on the website we have everything; it's the entry point to our business. And LinkedIn, of course, everybody's using it, so I'm there. And, yeah, sorry, it's also actually my first podcast ever, so maybe I'm also not well prepared.

0:34:23 - Mehmet
No, no, no, no, no. You did perfectly, don't worry.

0:34:27 - Alex
Okay, okay, great. So yeah, basically it's LinkedIn and our website and happy to chat if you drop me a message. 

0:34:35 - Mehmet
Sure, thank you very much, Alex. I will make sure to put both the company and your LinkedIn profile in the episode description, and thank you very much for your time today. The way I like to end my episodes is to go to the audience and tell them: please, guys, let me know your feedback, keep it coming. I'm receiving a lot of messages recently, so I'm very happy about all the feedback you are giving me, and thank you very much for being a loyal audience. And, as usual, if you are interested, like Alex, who was my guest today, in being on the show to tell us about a cool startup that you are in now, or maybe you want to share your experience about anything related to tech and entrepreneurship, feel free to reach out to me and we will make an arrangement and record the episode together. So thank you very much for tuning in. We will meet again very soon. Thank you, bye-bye. Thank you, Mehmet.

Transcribed by https://podium.page