Oct. 25, 2024

#405 Inside the World of Industrial AI: Manas Talukdar on Data Pipelines and Predictions


In this episode of The CTO Show with Mehmet, we dive deep into the evolving world of industrial AI with Manas Talukdar, Director of Engineering at Labelbox. Based in the Bay Area, Manas has nearly two decades of experience in enterprise AI, data infrastructure, and distributed systems. Throughout his career, he has played pivotal roles in companies like C3 AI and OSIsoft, where he contributed to developing platforms for mission-critical industrial AI applications. In our conversation, Manas unpacks his journey from electrical engineering to working with leading AI platforms and shares valuable insights into the challenges and opportunities of data infrastructure in AI.

 

Manas explains the anatomy of data pipelines, emphasizing the importance of data storage, ingestion, normalization, and analytics as the building blocks of AI systems. He highlights why high-quality, real-time data is crucial for training AI models, especially in mission-critical applications like predictive maintenance and industrial automation. We discuss the emerging concept of synthetic data and how enterprises can leverage it to fill the gap left by the lack of high-quality training data. Manas also touches on AI’s reliance on data infrastructure and the complexities of using synthetic data to enhance large language models.

 

We explore real-world use cases, such as using drones and AI algorithms to predict rust rates in pipelines, showcasing the transformative impact of AI in the industrial sector. Manas also shares his thoughts on integrating AI into traditional industrial systems, offering a glimpse into the future of AI-enabled operations. He sheds light on the importance of building scalable, resilient AI systems and the role of loose coupling in creating adaptable AI solutions that can withstand rapid technological changes.

 

Towards the end of the episode, Manas shares career advice for aspiring AI professionals, emphasizing the importance of foundational knowledge, continuous learning, and finding mentors in the AI field. He also highlights resources like Andrew Ng’s Coursera courses that provide comprehensive training in machine learning and deep learning.

 

More about Manas:

 

 

Manas Talukdar is a senior software engineering leader in data infrastructure for enterprise AI. He has significant experience designing and developing products in artificial intelligence and large-scale data infrastructure, used in mission-critical sectors across the world. He is a senior member of IEEE, and an AI 2030 Senior Fellow and Advisory Board member.

His experience spans over a decade and a half in leadership and critical roles at leading enterprise software companies in the San Francisco Bay Area. He has made key contributions to the preeminent industrial data historian in the world, which is especially ubiquitous throughout the process industry. As Director of Platform Engineering at C3 AI, the leading enterprise AI company, he founded and led an organization of multiple teams developing cutting-edge capabilities at the intersection of artificial intelligence and large-scale systems. He is currently Director of Engineering at Labelbox, a startup building a data-centric AI platform, where he runs platform and product engineering organizations.

https://manastalukdar.github.io/

https://www.linkedin.com/in/manastalukdar/

 

00:00 Introduction and Guest Welcome

01:14 Manas Talukdar's Background

02:57 Choosing a Career in Data Infrastructure and AI

04:56 Understanding Data Infrastructure

06:42 The Importance of High-Quality Data for AI

08:52 Challenges with Current LLMs and Data Quality

13:11 Synthetic Data and Ensuring Quality

15:11 Enterprise AI and Custom LLMs

16:38 Industrial Use Cases of AI

18:08 Real-Life Use Cases of Predictive Maintenance

19:22 Building Scalable Data and AI Infrastructure

25:47 Guidance for AI Startups

30:02 Opportunities in Data Processing and Machine Learning

33:29 Advice for Aspiring AI Professionals

36:18 Connecting with Manas and Final Thoughts

Transcript

[00:00:00]

 

Mehmet: Hello and welcome back to a new episode of the CTO Show with Mehmet. Today, I'm very pleased to have joining me, from the Bay Area, Manas Talukdar. Manas, thank you very much for being with me on the show today. As I was telling you, the way I love to do it is I keep it to my guests to introduce themselves. So [00:01:00] I will leave the floor to you.

 

Mehmet: Just, you know, tell us a little bit more about your background and, you know, what you're currently up to. And then we can start the discussion from there.

 

Manas: Great. Thanks a lot for having me, Mehmet. Longtime listener of your podcast. Great to be here with you. Uh, so a little bit about myself. So my name is Manas Talukdar.

 

Manas: I'm director of engineering for an AI startup called Labelbox, based out of the San Francisco Bay Area. I've been here at Labelbox for a little less than a year. I've been in the industry for nearly two decades now. Um, before Labelbox, I worked for an enterprise AI company, C3 AI, where I was part of platform engineering, uh, built out teams and organization working at the cutting edge of, uh, distributed systems and artificial intelligence, uh, helping build out the leading enterprise AI platform in the world.

 

Manas: Uh, before that I worked for an industrial software company called OSIsoft for over a decade, also in the Bay Area. Well, OSIsoft has been acquired by a British company called AVEVA since I left. Which, uh, since then, if I am not mistaken, got [00:02:00] majority stake taken over by a European behemoth called Schneider Electric.

 

Manas: So it's all part of Schneider now. So back at OSIsoft, what is now AVEVA and Schneider, I worked on things like time series historians, middleware SDKs, IoT platforms, and cloud-based data platforms, all geared towards industrial AI use cases. So basically, I've been in this data infrastructure, enterprise AI, industrial AI domain for almost two decades now, pretty much all my career.

 

Manas: Uh, so great to be here with you, Mehmet. Happy to discuss anything related to data and AI.

 

Mehmet: Absolutely, absolutely. Again, thank you, Manas, for, uh, giving me the time, and you know, I always appreciate my guests who wake up early for making it on the show. So people, they know I'm based in Dubai; you yourself are in the Bay Area, on the West Coast in the United States. So it's early for you.

 

Mehmet: I know, and thank you for being with me on the show today. So, um, you know, like, for us [00:03:00] as people who come also from a technical background, so every one of us, you know, when we got exposed when we were younger to the, you know, realm of technology, there's plenty of options for us. So some people, they go to software development.

 

Mehmet: Some people, they go mainly, even within software development, to databases. Some people, they go to infrastructure. Some people, you know... For you, what was, you know, the thing that drove you to choose, uh, you know, data infrastructure and AI as, as your specialization?

 

Manas: Right. That's a really good question.

 

Manas: I think it, to be very honest, it happened a little bit by intent, a little bit by accident. I'm happy to go into the background. So my academic background is actually in electrical engineering. Uh, I did my undergrad in electrical engineering. Then I went to grad school, uh, also in electrical engineering, but I specialized in control systems, uh, mechatronics, robotics, that kind of stuff.

 

Manas: So when I graduated, I got this opportunity to work at [00:04:00] OSIsoft. It seemed like it was a good fit because it was an industrial software company. It allowed me to move to the Bay Area here, where really things happen. And it also allowed me to kind of keep my roots in, in control systems, in, in industrial systems and so forth.

 

Manas: So that seemed like a good opportunity and it turned out pretty well for me. So I joined OSIsoft and then I got the opportunity to work on the leading time series data historian in the world. So it happened a little bit by accident that I, that I happened to get an offer from the company. It also happened a little by intent that I wanted to continue working at a place where I wasn't just building, say, social media software or something like that, but I was working on, on software

 

Manas: Uh, that would be used in real world use cases, in industries, in data infrastructure, in mission critical sectors all over the world.

 

Mehmet: Cool. Now let's start to dissect things one by one, [00:05:00] Manas, with you today. So when we say data infrastructure, right, so what do we really mean by this? And why is this data infrastructure crucial for, you know, making AI truly indispensable, you know, especially when, you know, you talk about very mission-critical systems, right?

 

Mehmet: So, so, so walk me through this, you know, anatomy, let's say, of the data infrastructure, um, so we can understand more about this topic.

 

Manas: Yeah. So I think let's distill that a little bit, because I feel like there are multiple layers in there. So data, when we say data infrastructure, before we think about, say, AI, which is all the rage right now,

 

Manas: means really historians at its simplest, in my opinion, you know, data storage. So whether it's time series data or whether it's other kinds of relational data, uh, historizing the data. Then the other part to that [00:06:00] is actually creating data pipelines for ingestion. Like, for example, say you have a, a power plant, and then you have a lot of equipment in there, transformers, boilers and whatnot.

 

Manas: And then you have sensors, which are sending off ridiculous amounts of time series data. And then you want some means of being able to store that data. So you can see what's going on. You can analyze that data. You can do analytics on that data. And data analytics, which is now being transformed into what everyone's calling AI.

 

Manas: And we can talk a little bit about that as well. So data infrastructure encompasses data storage, data ingestion, data normalization, data analytics even, and so forth. Now, why is it important for AI? Well, data is at the heart of AI. Without good quality, without high quality data, there really is no AI. And by that, I mean, one, if you think about what's going on in the industry right now with all these large language models and whatnot, uh, the performance of these [00:07:00] large language models is contingent on the quality of the data they've been trained on.
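The building blocks Manas describes here (ingestion, normalization, storage, analytics) can be sketched in a few lines. This is a minimal illustration, not any specific product; the sensor names and the unit-suffix convention are hypothetical:

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Reading:
    sensor_id: str   # hypothetical convention: a "_F" suffix marks Fahrenheit sensors
    timestamp: float
    value: float

def normalize(readings):
    """Normalization step: convert mixed-unit temperatures to Celsius."""
    out = []
    for r in readings:
        v = (r.value - 32) * 5 / 9 if r.sensor_id.endswith("_F") else r.value
        out.append(Reading(r.sensor_id, r.timestamp, v))
    return out

class Historian:
    """Toy time-series store (historian): append-only lists keyed by sensor."""
    def __init__(self):
        self._store = {}

    def ingest(self, readings):
        for r in readings:
            self._store.setdefault(r.sensor_id, []).append((r.timestamp, r.value))

    def mean_value(self, sensor_id):
        """A trivial stand-in for the analytics layer."""
        return mean(v for _, v in self._store[sensor_id])

hist = Historian()
hist.ingest(normalize([Reading("boiler1_F", 0.0, 212.0),
                       Reading("boiler2_C", 0.0, 95.0)]))
print(hist.mean_value("boiler1_F"))  # 100.0 (212 °F normalized to Celsius)
```

A production historian adds compression, buffering, and out-of-order handling; the point here is only the separation of ingestion, normalization, storage, and analytics that Manas lists.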

 

Manas: Most of these large language models have been trained on the internet, basically, whether you think about OpenAI's ChatGPT or, or Google's Gemini, or, or, or Meta's Llama, and, and so forth. So, one thing, one common theme right now in the industry that I hear very frequently, and I do consider myself to have a front row seat, especially at my current, current, uh, company.

 

Manas: So one thing I hear right now is the need for very high quality data. And the, the outright truth here is that these companies, these AI labs developing large language models, they have run out of high quality data to train their LLMs. For, for example, if you ask a generic question to any of these LLMs, whether it's Gemini or ChatGPT or, or, uh, Anthropic's Claude and so forth.

 

Manas: Say, for [00:08:00] example, you know, tell me about Dubai or, or tell me about San Francisco. The answer will be pretty much the same. I mean, you'd be splitting hairs to find the differences. And the reason is because they have been trained on the same data. So there is a very dire need right now for high quality specialized data so that each of these LLMs can differentiate themselves from the other.

 

Manas: I was listening to a conference a couple of months back, uh, run by Meta, and the head of, uh, AI infra at Meta, in fact, plainly said it in that conference, in his keynote, that we have run out of data to train our LLMs and we need high quality data. And that's where a lot of companies come in, providing high quality curated training data.

 

Manas: Um, so going back to your question. So data is really the fuel that runs AI systems at the end of the day.

 

Mehmet: Manas, this is a very interesting point, and you know, funny enough, you know, I'm transparent with the audience, so we're recording on the 7th of [00:09:00] October, 2024. Um, I was talking to a friend around noon time, Dubai time, and he was, he was complaining to me about, you know, ChatGPT mainly, right?

 

Mehmet: And Gemini, actually. And he said, like, you know, I started to feel that these models start to get lazy, right? So lazy in a sense: whenever you ask a question, it invents something. Okay, I know about, you know, all the hallucination thing, but actually it's not hallucination. It is something else. I don't know how to describe it.

 

Mehmet: So is that data infrastructure and the lack of, of data, uh, you just mentioned, is it what's letting us see this behavior from some of the LLMs? Because I noticed something, correct me if I'm wrong, Manas, maybe, maybe I'm hallucinating. So they release a model, right? At the beginning, it's amazing. So we [00:10:00] see this, you know, really cool things. After a couple of months, we start to figure out that this model started to become lazy, inconsistent. So is this like something related also to what you were just mentioning?

 

Manas: It is related in the sense that, uh, there is an upper bound on the time up to which these models have been trained.

 

Manas: And what I mean by that is, say, say a large language model came out today. There is a very high probability it was trained on data, uh, with a time limit up to, say, two months back. And to be more specific, what I mean by that is, uh, uh, say you ask a question about an event that happened one day ago or maybe one week back. Uh, there is a 90, 95 percent chance the LLM will not be able to answer that question, because it is constrained by the time range of the data on which it was trained.

 

Manas: That's one. Secondly, uh, which is about the hallucination [00:11:00] aspect: if an LLM doesn't know what it's talking about, there is a chance that it might actually invent data, which we call hallucination, to answer a question. Now, there are ways to get around it. There's, there's, uh, two-shot prompting, multiple-shot prompting, and there's kind of prompt engineering techniques that can be used to get around hallucination.
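The multiple-shot prompting Manas mentions amounts to packing worked examples into the request before the real question. A minimal sketch, assuming the common chat-message convention of system/user/assistant roles; the example contents are made up:

```python
def build_few_shot_prompt(system, examples, question):
    """Assemble a chat-style message list: a system instruction, then
    worked (user, assistant) example pairs, then the real question."""
    messages = [{"role": "system", "content": system}]
    for q, a in examples:
        messages.append({"role": "user", "content": q})
        messages.append({"role": "assistant", "content": a})
    messages.append({"role": "user", "content": question})
    return messages

msgs = build_few_shot_prompt(
    "Answer only from known facts; say 'unknown' otherwise.",
    [("Capital of France?", "Paris"),
     ("CEO of ExampleCorp in 2030?", "unknown")],  # demonstrates refusing to invent
    "Population of Atlantis?",
)
print(len(msgs))  # 6 messages: 1 system + 2 example pairs + 1 question
```

The second example pair is the important one: showing the model an "unknown" answer demonstrates the refusal behavior, which is how shot-based prompting discourages hallucination.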

 

Manas: But really the hard constraint is that the, uh, the answer the LLM is going to provide is limited by the upper bound of the time range of the training data that it was trained on. For, say, enterprises, uh, like for example, uh, maybe there's a company out there, let's say, let's just come up with a name, say Gale, for example, that is a very large oil and gas energy company.

 

Manas: And Gale wants to use an LLM to enable its, uh, executives or enable its employees to be able to, able to ask a question and [00:12:00] get accurate and actionable insights. Now, obviously, if they use an off-the-shelf LLM, they are running into this constraint that I mentioned. So there are ways to get around it, by things like, say, for example, uh, RAG pipelines, retrieval-augmented generation, where you would constantly have a live stream of new data coming in.

 

Manas: So the LLM has context about this new data and it's not limited by the constraint I mentioned. So yes, the freshness of the data is definitely a constraint.
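The RAG pipeline described here can be sketched as retrieve-then-prompt. This toy version ranks documents by keyword overlap; a real system would use embeddings and a vector store, and the documents below are invented:

```python
def retrieve(query, documents, k=1):
    """Rank documents by naive keyword overlap with the query.
    (A real pipeline would embed both and search a vector index.)"""
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_rag_prompt(query, documents):
    """Prepend the freshest relevant context so the LLM is not limited
    to its training cutoff."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Unit 7 transformer was serviced on 2024-09-30.",   # fresh operational record
    "Cafeteria menu changes weekly.",                   # irrelevant
]
prompt = build_rag_prompt("When was the unit 7 transformer serviced?", docs)
print(prompt)
```

Because the retrieval step runs at question time, newly ingested records reach the model without retraining, which is exactly the freshness workaround Manas describes.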

 

Mehmet: Now, to that point, Manas, I saw also, like, um, the other day, one thought leader, he showed, he shared, uh, something that, um, even with the live data, if the majority of the content is generated by these LLMs, right?

 

Mehmet: So it's, so it's kind of like we are going in an endless loop, while AI is being trained by AI-generated content. So what's your point of view on [00:13:00] this? Right.

 

Manas: I think you're asking excellent questions. So basically, to your point, if, if an LLM is being trained by garbage data that it itself generated, then it's garbage in, garbage out.

 

Manas: So, uh, so that generated data is called synthetic data. That's a technical term. So yes, basically we are moving into a stage where synthetic data is a given, given that we are running out of high quality training data and it may be difficult to generate or provide the same internet-scale data that was freely available in the past.

 

Manas: Uh, synthetic data is a given. Now, it is important to ensure that the quality of the synthetic data is acceptable. For example, let me give you an example. Some months back, NVIDIA released their, uh, model report on, on the Nemotron LLMs. And there was a very interesting section in there where they talk about using synthetic data to train their LLMs.

 

Manas: However, they were very clever about it. And I think that is a pointer to how probably [00:14:00] the industry is going to move, where they use things like, uh, LLM as judge to actually validate the quality of the synthetic data before feeding it to the LLM itself. Now some might raise a question, say, wait a minute.

 

Manas: So we are generating data using AI, we are using AI to also evaluate that synthetic data, and then we are feeding that same synthetic data back to AI. So are we just trusting AI all throughout here? Which is a very valid point, in my opinion. So there are ways to get around it, and the way to get around it is that you use human-in-the-loop to validate the quality of the synthetic data, and also, which is even more important, to validate the quality of the AI-as-judge technique.

 

Manas: So AI as judge, or LLM as judge, evaluates the quality of your data, but then you have a human in the loop who actually grades the quality of these evaluations. And in an iterative process, the quality of these evaluations [00:15:00] progressively gets better. So synthetic data is a given, but it is important to use the right kind of techniques to ensure the quality of the synthetic data.
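The LLM-as-judge plus human-in-the-loop flow can be sketched as a filtering loop. The judge below is a stub standing in for a model call, and its scoring rule is purely illustrative:

```python
def llm_judge(sample: str) -> float:
    """Stub for an LLM-as-judge call; a real judge would prompt a model to
    score the synthetic sample. The length-based rule here is illustrative."""
    return 0.9 if len(sample.split()) > 3 else 0.2

def filter_synthetic(samples, human_grades, threshold=0.5):
    """Keep samples the judge accepts; flag judge/human disagreements so the
    judge itself can be improved next round (the human-in-the-loop step)."""
    kept, disagreements = [], []
    for s in samples:
        score = llm_judge(s)
        # A human spot-check that contradicts the judge is a training signal
        # for improving the judge, not just the data.
        if s in human_grades and (human_grades[s] >= threshold) != (score >= threshold):
            disagreements.append(s)
        if score >= threshold:
            kept.append(s)
    return kept, disagreements

samples = ["The pump pressure rose steadily overnight.", "asdf qwer"]
human_spot_checks = {"The pump pressure rose steadily overnight.": 0.3}  # human disagrees
kept, flagged = filter_synthetic(samples, human_spot_checks)
```

Each iteration, the flagged disagreements go back to recalibrate the judge, which is the progressive-improvement loop Manas describes.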

 

Manas: Otherwise, as you said, it's just going to be garbage in garbage out.

 

Mehmet: But also you mentioned something Manas, which I think again, this was my, uh, I'm not expert. I mean, in this field, but you know, when, when my friend was questioning, you know, this, I said, like moving forward, I think what will happen is exactly what you mentioned.

 

Mehmet: So we will have, you know, our own data, right? So each company, maybe on enterprise level, they have their own data. So they're gonna have their, their LLMs available, you know, which can be, you know, even put on, on their premises, right, if they don't want to get it out, and then they're gonna train it and then they're gonna use it there. Because, um, you know, if they keep leveraging, I would say, or utilizing the ones which are available publicly, to your point, that's what's gonna happen.

 

Mehmet: And, you know, we're gonna like have this garbage in, garbage out. So I think the way [00:16:00] moving forward is like this enterprise AI where each company, they have their own set of data. And by the way, I've, I've seen like use cases, people showed me what they have done with custom-made, I would say, uh, kind of ChatGPT, if I can say, or Gemini, and really the results are fantastic, because they brought their own knowledge base and they use some LLMs.

 

Mehmet: And really, I saw like something different from what you see in a typical ChatGPT setup. Now, you know, keeping in the, in the AI, you know, enterprise, um, so, uh, you know, when we see, for example, something, uh, we talk about like industrial, uh, right. So, so the industrial use cases, I'm curious to know how this integration can happen in, in that [00:17:00] domain specifically, because, you know, when we talk about,

 

Mehmet: Uh, and you mentioned, you know, even your background, and I'm familiar with these because I was close to that, you know, like IoT and the mechatronics and all these things. So how this fusion, I would say, between AI and, you know, the technology of the industrial area, the industrial, uh, vertical, are merging when, when it comes to AI.

 

Mehmet: I'm curious about to know like use case, I would say that you're seeing successful.

 

Manas: So I'll, I'll give an example of an actual true use case. So let's say there's an oil refinery, massive oil refinery with pipelines all over the place. And now, and then what they have done is they have drones flying about.

 

Manas: taking pictures of their pipelines. And based on the pictures of these pipelines, they have AI algorithms running, which can detect the rust rates. So it's really quite interesting. So now what they want to happen is to be able to detect [00:18:00] and predict actually the rust rate. So they know when they have to do maintenance on a particular pipeline in the refinery well ahead of time.
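At its simplest, the rust-rate prediction described here reduces to fitting a corrosion trend from inspection data and extrapolating to a maintenance threshold. The measurements, units, and threshold below are hypothetical, and real systems would use far richer models than a straight line:

```python
def fit_rate(times, depths):
    """Least-squares slope and intercept of rust depth (mm) versus time (days)."""
    n = len(times)
    tm, dm = sum(times) / n, sum(depths) / n
    slope = (sum((t - tm) * (d - dm) for t, d in zip(times, depths))
             / sum((t - tm) ** 2 for t in times))
    return slope, dm - slope * tm

def days_until(times, depths, threshold):
    """Extrapolate the fitted trend to the day it crosses the maintenance threshold."""
    slope, intercept = fit_rate(times, depths)
    return (threshold - intercept) / slope

# Hypothetical drone-derived rust depths measured on inspection days 0, 30, 60.
eta = days_until([0, 30, 60], [0.2, 0.5, 0.8], threshold=1.2)
print(round(eta))  # maintenance due around day 100
```

The value of the prediction is the lead time: knowing the crossing day well in advance lets the refinery schedule the pipeline maintenance before a failure, which is the whole point of the use case.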

 

Manas: So that is one use case. And that has been extremely successful. In fact, my previous employer, C3 AI, they have a reliability application which is really used all over the world to solve these sorts of problems, to solve problems like predictive maintenance and so forth. Other examples are, say, you have a government with its military arm who wants to improve the availability of their, uh, say, fighter jets. And very similarly,

 

Manas: They want to be able to predict when to maintain something so that they can have better availability ratios of their fighter jets. And these are all real life use cases, by the way, other use cases are say you have, uh, a, an industry, say a plant where you have sensors going off of, [00:19:00] uh, an equipment, uh, that runs at a very high frequency.

 

Manas: And they want to be able to predict, say, uh, what's going to be the failure rate of the tires that they build, because in, in, in that particular industry, even a slight millimeter of difference means that they may have to discard particular tires that they output, that they build. So all of this is based on, first, which I alluded to earlier, making sure that all of the massive amounts of data are rightfully brought in, that it is modeled correctly in certain cases, that the ingestion pipelines are built at scale, then, where necessary, that the data is normalized, then that the data is stored in a, in a scalable way.

 

Manas: In certain cases, we also want to build out feature stores for feature engineering. So having a feature store also is a part of the, is kind of at the intersection of data and AI. So yes, the storage of the features itself falls in data infrastructure. And then, of course, you have to build the [00:20:00] ML pipelines, which can be used to actually, uh, build the AI algorithms, which will, uh, operate on top of this data and then do predictions and so forth.

 

Manas: So these are some of those real life use cases, and, and kind of thinking more technically about how these things are built out to do things like predictions and classifications, generations and so forth.

 

Mehmet: I got it. And I believe, you know, like when I was, um, building, uh, you know, these systems in a resilient way, we, you know, like having a high level of resiliency is, is key here, right?

 

Manas: Right. Absolutely. Absolutely. Because particularly, you know, these are real world systems where, if something goes down or something is, is incorrectly predicted, there can be losses in the tens of millions in the best case and hundreds of millions in the worst case. So the resiliency and guardrails and fallback mechanisms are absolutely critical.

 

Mehmet: Absolutely. Absolutely. Now I want to understand one thing, you know, um, you [00:21:00] know, when I was reviewing, you know, your website and the bio, so I've seen the term data-centric approach to AI, right? So, um, and it's something related to the development life cycle. So the word data-centric, is it like something different from what we were mentioning?

 

Mehmet: Or is it like in the in the same concept of what we were like discussing a few moments ago?

 

Manas: It's the same concept as what we were discussing a few moments ago. It's basically about a mindset where we know, uh, that data is the fuel that runs AI, and the importance of generating high quality data, the importance of having resilient and scalable and performant data pipelines for ingestion, uh, and then having, you know, high-throughput storage engines, having feature stores, thinking about it from a really end-to-end perspective, where building AI is not just about using an LLM or using a machine learning model, [00:22:00] but it's about a holistic view where we cannot have AI without data, without actually ingesting data, without storing data, without doing feature engineering on top of the data, and then having ML pipelines and then inference engines all the way through.

 

Mehmet: I got it. Now I want to a little bit shift gears and, you know, discuss something related to leadership a little bit here. And you know, in, in such a, um, you know, in such, I would say, working environment, the bar is very high, you know, the expectations are very high, and at the same time, things move very, very quickly.

 

Mehmet: So, you know, how do you manage, you know, this, you know, keeping things, you know, going, at the same time you have a high level of, uh, disruption, I would say, that can happen by the day, and at the same time being able to keep, you know, giving this, uh, you know, [00:23:00] kind of mentorship and leadership to, to your team members.

 

Mehmet: So how do you manage this balance of having these all things together at the same time, especially we, you know, in the AI field, it's, it's very challenging. I know this.

 

Manas: Oh, yeah, absolutely. You know, I mean, I've been in the, as I said, I've been in the industry for almost two decades, and I saw some of the, the pace of this happen in the last decade in the distributed systems, cloud domain.

 

Manas: And it is happening many times over now in the AI domain. I mean, the pace of change is absolutely unprecedented, whether it's what's happening in the industry or whether it's what's coming out of academia. So going back to your question, uh, keeping teams and companies focused is extremely critical.

 

Manas: Otherwise it's going to be very chaotic. So the way I think about it is, one, it is very important to think about what's coming out of academia so that we know which way the industry is going to trend. We do not want to be caught unawares, particularly in an [00:24:00] extremely competitive domain, as in AI, that we are at right now, that my current employer, my company, is at right now.

 

Manas: So we want to keep an eye on what's coming out of academia so we are able to think about proactively and incorporate the latest research that's coming out of universities and so forth. That's one. Second is that we, we do not want to be too hard and fast about, hey, what are we going to build six months down the road or one year down the road?

 

Manas: I mean, to be very frank with you, one year down the road is simply out of the realm of possibility right now, given the pace at which things are happening in AI. So it's important to think at a holistic view, hey, what we might do six months down the road, but it's also important to be extremely dynamic about it, that, hey, by the time we get there, things may have evolved in a way where we now have to work on some other projects.

 

Manas: So that's one. Uh, third is that it is very important that, uh, we tell engineers that, hey, uh, you cannot just operate on your day-to-day [00:25:00] tasks, but we provide them the psychological safety and really the means to continue to learn and grow, to learn about the new technologies out there, to provide them the resources to perhaps go take a course that's being offered that's talking about some of the cutting-edge research out there.

 

Manas: So holistically, these are some of the multiple things that I think about to make sure that the people, the engineers, and teams are kept focused, even in this very dynamic situation.

 

Mehmet: Absolutely. And, you know, to your point, Manas, I remember when I started the show. So I started at the same time when ChatGPT was still like new.

 

Mehmet: And, you know, at the beginning, you know, I used to ask, for example, my guests, okay, tell me what do you expect in this domain after I would say five years, six years. So now I cannot dare to ask this question anymore because AI changed things very fast. Now, um, I know also Manas, you work a lot with, uh, I mean, startups, you know, in, in advisory and mentorship, uh, roles.

 

Mehmet: Um, [00:26:00] so I discussed this a lot with multiple guests here on the podcast, but every time I like to take more opinions, you know, because I believe the more opinions we have on a topic, like, we can have kind of, uh, you know, we can get our own, you know, opinion on the matter. So the thing is, especially when we talk about startups building AI products, right?

 

Mehmet: So I've heard different opinions on, like, what is the best way to build a future proof technology? Especially because, you know, again, we just mentioned a couple of minutes ago. So things change very fast. So how do you advise usually these founders building AI products on making sure that they are building something that can at least survive for a while?

 

Mehmet: I don't know how many years, but I mean, [00:27:00] it can, it can scale in the future.

 

Manas: Yeah. So, you know, a lot of startup, uh, founders or employees do reach out to me from time to time on different platforms, asking for guidance on their products and so forth. So specifically about this point that you mentioned, being quote-unquote future-proof: so what I tell them to do is to think, first of all, to not just go out and build something, but to really understand what is the cutting edge out there.

 

Manas: So I point them to some papers, I point them to some books which actually holistically talk about the cutting edge and so forth. That's one. Second thing, what I do is I ask them to not build hard couplings in any of their infrastructure. Uh, so, so for example, in an architectural review, for example, I might ask them to build things in a manner where they are loosely coupled, so that if need be, they can easily swap out one component and rebuild that component using perhaps a newer technology and so forth.

 

Manas: Other things I ask them to do is, is, uh, [00:28:00] do constant reviews of what they have built. So not just talk to me one time and then, okay, now we go ahead and build it, but I do keep engaging with some organizations on a continual basis. For example, I give them some ideas on how to go about an AI project, I provide them a report based on my recommendations, and then I keep touching base with them to see how they are doing.

 

Mehmet: Right. And what, what's your take, Manas, very quickly, on, on the people who build,

 

Mehmet: you know, some companies on top of other companies' APIs, especially in the domain of AI?

 

Manas: So, so the, the hard truth is that if we think about LLMs, there are only a very few companies that are actually developing large language models, and the reason is, simply because of the economics of it, given high GPU training costs and so forth, only very few companies can actually afford to develop these LLMs.

 

Manas: That's the hard reality of it. So I think it is a given that other, smaller [00:29:00] companies will have to use APIs, specifically for talking to LLMs. There's no way around it, for better or for worse. But my guidance to companies using the APIs is, again, to think about loose coupling: to build out a data access layer and not hard-couple their business logic with the API itself, so that, say, one year down the road, if there is a better solution out there, they can very easily refactor the data access layer by swapping one API for another.

 

Manas: Instead of having to refactor the business logic, the point is creating the right coding abstractions, creating loose coupling. That's the thing I tell people to constantly think about.
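A sketch of the data access layer Manas recommends might look like the following; the provider names and methods are illustrative stubs I'm assuming for the example, not real vendor SDK calls. Business logic talks to one small interface, and each vendor API lives behind its own adapter, so swapping providers a year later means replacing one adapter rather than refactoring the business logic.

```python
# Hypothetical data access layer over LLM provider APIs (all names invented
# for illustration; a real adapter would wrap an actual vendor SDK call).
from abc import ABC, abstractmethod

class LLMClient(ABC):
    """The abstraction the business logic is written against."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class VendorAClient(LLMClient):
    def complete(self, prompt):
        # In reality this would call vendor A's API over the network.
        return f"[vendor-a] {prompt}"

class VendorBClient(LLMClient):
    def complete(self, prompt):
        # A better option a year later: only this adapter is new.
        return f"[vendor-b] {prompt}"

def summarize(doc: str, llm: LLMClient) -> str:
    # Business logic written once, against the interface, not a vendor API.
    return llm.complete(f"Summarize: {doc}")

print(summarize("quarterly report", VendorAClient()))
print(summarize("quarterly report", VendorBClient()))  # swapped, logic unchanged
```

Swapping providers then touches only the adapter passed in at the call site (or injected via configuration), which is exactly the refactor-one-layer property Manas describes.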

 

Mehmet: Great. Absolutely. I think you've got a very balanced approach, Manas, and this is really helpful, I would say, for a lot of founders [00:30:00] who are building on top of these LLMs.

 

Mehmet: Now, I know, Manas, you have filed multiple patents in data processing, and you've really done great work on these innovations; the impact is very obvious. So in the context of your research and these innovations, what can you tell us about the biggest opportunities in the field of data processing and machine learning that excite you?

 

Mehmet: And I will not give you a time frame, because we don't know when things are going to happen.

 

Manas: Right. Well, what really excites me right now is the massive opportunity out there to provide high-quality training data to these AI labs, and by AI labs I mean the companies that are developing these large language models.[00:31:00]

 

Manas: So it is very clear to anyone who's watching the industry in this particular domain that there is a dire need for AI labs to get access to high-quality training data, one, so they can build increasingly powerful multimodal large language models, and two, so they can differentiate themselves from the other companies building these LLMs.

 

Manas: And there is a need for specialized AI models as well. So think about, say, an AI model that is able to do a really good job reading X-rays in cardiology, for example, or an AI model that is able to do a really good job in atmospheric science. Obviously, the baseline for these AI models would be some LLM, and there would probably be some fine-tuning and post-training on top of that.

 

Manas: So that is a domain that excites me a lot, because the opportunity there is just absolutely massive in my [00:32:00] opinion. That's one. Second, the potential for AI to enable a lot of fields of human endeavor, whether it's astrophysics, whether it's medical science, whether it's creativity, is absolutely immense.

 

Manas: In fact, just eight or nine months back, some AI model was able to parse some astrophysics images, images from some satellite that's out there looking into deep space, and it was able to actually detect a planet that humans had previously not been able to detect. And this model, as it turns out, and I looked it up, wasn't even trained or fine-tuned for reading these space images.

 

Manas: So it was absolutely incredible to me. And it's also a pointer to what can happen once we actually start fine-tuning these LLMs for specialized tasks. So that is [00:33:00] another domain that really excites me.

 

Mehmet: Yeah, really fascinating. Although I'm not a practitioner in this domain, what excites me a lot is anything that has impact, whether it's healthcare or education, right?

 

Mehmet: These fields really need innovation all the time, and AI, in my humble opinion, is accelerating this. So yeah, absolutely, I agree with you, Manas, on this. Now, as we come close to the end of this episode, I know that some of our listeners may be about to change careers.

 

Mehmet: Maybe some of them are in college and want to build a career in AI and data. So what would be the advice that you would give them for getting into this domain?

 

Manas: So, if [00:34:00] somebody already has a STEM degree, obviously it's easier for them to move into AI. But my recommendation is: don't just think about jumping into AI without building the foundation.

 

Manas: So if you are currently in the industry and you're a software engineer, say building web applications, and you want to move into AI, then you should think about how you can build an AI-enabled web app. And there is literature out there, there are books out there, that actually show you how to do that.

 

Manas: Regardless of that approach, I strongly recommend anyone who wants to move into AI to do some courses to understand what machine learning is, to understand things like: what does building a data pipeline mean? What is a machine learning pipeline? What does inference mean?
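The two concepts Manas mentions can be illustrated with a toy, dependency-free sketch; the data and the "model" here are invented purely for illustration. A pipeline takes raw data through training to produce a model, and inference is applying that trained model to unseen inputs.

```python
# Toy illustration of training vs. inference (hypothetical data, not a real
# model): training fits one parameter, inference applies it to new inputs.

def train(examples):
    """Training step: learn a threshold separating label 0 from label 1."""
    # examples: list of (feature, label) pairs
    pos = [x for x, y in examples if y == 1]
    neg = [x for x, y in examples if y == 0]
    return (min(pos) + max(neg)) / 2  # the learned parameter

def infer(model, x):
    """Inference step: apply the learned parameter to an unseen input."""
    return 1 if x >= model else 0

# Pipeline: raw data -> train -> model; then inference on new data.
data = [(0.2, 0), (0.4, 0), (0.7, 1), (0.9, 1)]
model = train(data)
print(infer(model, 0.8))  # -> 1
print(infer(model, 0.1))  # -> 0
```

Real pipelines add stages such as ingestion, normalization, and validation before training, but the shape is the same: data flows in one end, a model comes out, and inference runs against that model.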

 

Manas: So if somebody's in the industry and has already graduated, one resource I point people to is the Coursera courses offered by Andrew Ng, [00:35:00] who is a really well-known person in the AI community. I have myself taken those courses; I took the machine learning course back in the day, many years back, and it's now broken down into a three-course specialization.

 

Manas: I strongly recommend people take that. I also recommend people take the deep learning specialization, again offered on Coursera by Professor Ng. That can be a little bit dense, so just be aware that you may have to dedicate some time to work on it. But building that strong foundation is extremely important.

 

Manas: Second thing I would say: explore the opportunity of doing some kind of AI project at your current workplace. If that opportunity arises, that's fantastic. If you cannot explore that opportunity, then build a project on your own. Build your GitHub profile with some AI projects, and again, use some of the learnings from these courses and from the books, and so forth.

 

Manas: Third thing I would say, which would really accelerate this on top of the first two options I gave you: try to find a mentor in [00:36:00] the industry who's already working in the AI domain. That can really exponentially accelerate your move from, say, some other domain to working in the AI domain.

 

Mehmet: These are really very valuable recommendations and advice from you, and I thank you for sharing them. Finally, where can people get in touch with you?

 

Manas: Well, I'm on LinkedIn, so they can definitely reach out to me on LinkedIn. I also have a website I maintain on GitHub. Obviously I consider myself to be an engineer, so it's all code.

 

Manas: I just push a commit and then the changes go live, so I'm pretty happy about that. You can always contact me via the contact page on my website, or, best of all, reach out to me on LinkedIn. Just be aware that I get a lot of messages on LinkedIn, so I may not respond right away.

 

Manas: But you can always reach me via my website, and my email address is also listed there. Obviously I don't list out the [00:37:00] exact address, so that people don't scrape it, but it's listed: it's my first name dot last name at gmail.com. You can always email me at my Gmail address as well.

 

Manas: Great.

 

Mehmet: Great. Manas, I thank you very much for that. And for the audience, don't worry, you will find Manas's website in the show notes if you are listening to us on your favorite podcasting app, and in the description on YouTube. Manas, I really enjoyed this conversation with you today.

 

Mehmet: Plenty of valuable information, especially the part about advice for a career in AI and data, which is important. I think this is the next important, or let's say big, thing in careers for people in technology.

 

Mehmet: And I think everyone should know about it, at least, to your point. And I liked how you mentioned there is one course which everyone should take, and there is also the [00:38:00] deeper side of it. So thank you very much for sharing that. And this is how I end my episode; this is for the audience. If you just discovered our podcast by luck,

 

Mehmet: Thank you for passing by. If you enjoyed this episode, please give us a thumbs up, subscribe, and share it with your friends and colleagues. And if you are one of the people who keep coming back, thank you for doing so. Keep sending me your suggestions, questions, and feedback. I really appreciate everything you share with me,

 

Mehmet: and I take every single recommendation into consideration. So thank you for doing this. And as I always say, thank you very much for tuning in today. We'll meet again very soon. Thank you. Bye bye.