
Andrey Cheptsov from dstack

Andrey discusses his journey of starting his company and the challenges faced by ML engineers in managing infrastructure.

Host

Rod Rivera

🇬🇧

Guest

Andrey Cheptsov

CEO, dstack


In this episode, we sit down with Andrey, the founder of dstack, to discuss the challenges and innovations in AI infrastructure. Andrey shares his journey from PyCharm product manager at JetBrains to creator of an open standard for managing AI workloads.

This episode offers a fascinating look into the world of AI infrastructure and the power of open-source solutions. Whether you're an AI professional or simply interested in the field, Andrey's insights provide a compelling perspective on the current state and future direction of AI development tools.

Takeaways

  • Open source tools will play a crucial role in the evolution of AI tools and platforms.
  • ML engineers often struggle with infrastructure management, and vendor-agnostic tools like dstack can simplify the process.
  • dstack allows users to easily switch between cloud vendors and utilize GPU resources without heavy investments.
  • The future of AI tools will involve more commoditization and competition in generative models, as well as an increase in open source models.

Episode Transcript

An Unusual Travel Story

Rod: Tell us a fun fact or story about yourself that most people don't know.

Andrey: Well, many people know I'm a foodie and that I travel a lot. However, what they might not know is that I sometimes travel just for food. I've flown to Hong Kong twice solely to eat dumplings.

Rod: Wow, that sounds quite adventurous!

Andrey: It does sound crazy when I say it out loud, doesn't it?

The Genesis of dstack

Rod: What pivotal moment led you to start your company?

Andrey: It's challenging to pinpoint just one moment, as there were quite a few. Before dstack, I was working at JetBrains, building development tools, which I really enjoy. As the Product Manager for PyCharm, I conducted many interviews with Machine Learning teams.

One thing that stood out was how ML engineers often struggle with infrastructure management. They love working with models and code, but infrastructure is more the domain of operations people. This struggle often left ML teams managing infrastructure by hand, and sometimes it caused more serious problems.

I remember a particularly eye-opening conversation at JetBrains about training more models. The team said, "We have some hardware and we use it as much as we can. However, we can't buy a lot of hardware because of the fixed costs. At the same time, it's really difficult to use the cloud because of the implied costs."

This situation essentially paralyzed the team. Instead of working on many projects, they were limited to very few. That was the moment I realized we needed to do something.

Describing dstack: An Open Standard for AI Workloads

Rod: How would you describe your startup to someone who has never heard of it?

Andrey: We've built a stack for ML folks. If an ML person asked what we do, I'd explain it like this:

As an ML professional, you need infrastructure for development, model training, fine-tuning, and deployment. While there are vendors offering tools for these tasks, most are proprietary. This means you have to use their specific user interface and follow their way of doing things, investing a lot of time in learning how that particular vendor works.

Now, compare this to using open-source tools like Docker or Terraform. These are open standards. It doesn't matter if you're using AWS, Azure, or GCP: once you understand how Docker works, you grasp the fundamentals of containers across all platforms.

That's what we've built at dstack: an open standard for managing AI workloads, covering development, training, and deployment. You could think of it as a Kubernetes replacement, but specifically designed for ML engineers.
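For context, here is roughly what that open, declarative standard looks like in practice. This is a minimal sketch based on dstack's documented YAML configuration format; exact fields vary between versions, and the values here are illustrative:

```yaml
# .dstack.yml (illustrative): the same file works regardless of backend
type: dev-environment
python: "3.11"
ide: vscode        # opens a remote VS Code session on the provisioned machine
resources:
  gpu: 24GB        # a capability request, not a vendor-specific instance type
```

The point of the format is exactly the Docker comparison above: the configuration describes what you need, and the backend becomes an interchangeable detail.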

The Importance of dstack in AI Development

Rod: Why is this stack pivotal for those building in AI?

Andrey: There are several reasons. Firstly, because dstack is an open standard that works across vendors, it makes it much easier to switch between providers or avoid dependency on a particular vendor.

Today, there's a significant GPU shortage, and getting on-demand, spot, or even reserved instances for developing, training, or deploying models isn't easy. Only companies with substantial resources can afford that. With dstack, we allow ML teams to use multiple vendors simultaneously without worrying about the specifics of each provider.

I've spoken to many users who use dstack specifically to access on-demand and spot instances, as it helps them use GPUs without heavy investments.
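To illustrate the multi-vendor setup Andrey describes, here is a hedged sketch of a dstack profile that spreads runs across several backends and prefers spot capacity. Field names follow dstack's documented profiles format, but option names differ between versions, and the values are illustrative:

```yaml
# .dstack/profiles.yml (illustrative multi-cloud profile)
profiles:
  - name: multi-cloud
    backends: [aws, gcp, runpod]   # provision from whichever backend has a matching GPU
    spot_policy: auto              # use spot instances when available, else on-demand
    default: true
```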

A Success Story

Rod: Can you share a story where dstack significantly helped one of your users?

Andrey: Certainly. Many of our users previously had to either use multiple vendors or write custom scripts to request GPUs on demand. These scripts would ping providers every few minutes to check for available GPUs and, once one was found, book it for use the next day.

I've seen teams using multiple providers with custom code for each: some code runs on RunPod, some on TensorDock, some on AWS. When I told one user that with dstack they wouldn't need to write that code anymore, because dstack lets them book GPUs or run tasks on GPUs with any vendor of their choice, or even several vendors at once, it was a game-changer for them.

Typical Use Case Scenarios

Rod: Can you walk us through a typical use case scenario of dstack?

Andrey: Sure. There are four major scenarios that users typically switch between when using dstack (see the configuration sketch after this list):

  1. Getting GPU access: dstack allows you to get GPUs from a vendor of your choice or multiple vendors. We have a feature called the dstack pool, which lets you book capacity from particular vendors.

  2. Development: Users often run long-running jobs like fine-tuning or deploying models. dstack lets you run a development environment where you can use Jupyter notebooks, Python scripts, or IDEs like PyCharm or VS Code.

  3. Tasks: Once you know what you want to do, like fine-tuning or deploying a model, you can define a task in dstack. You specify the infrastructure requirements, and dstack uses the infrastructure from the pool or requests it on demand.

  4. Services: When you want to deploy a model, you define a service. dstack handles the deployment, ensuring it's available as a public endpoint, regardless of which provider you use.
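To make scenarios 1 and 3 concrete, here is a hedged sketch of a task definition together with the CLI calls around it. The commands and fields follow dstack's documented conventions, but flags differ between versions, and the file and script names are hypothetical:

```yaml
# fine-tune.dstack.yml (illustrative task definition)
#
# Scenario 1, booking capacity into the pool, e.g.:
#   dstack pool add --gpu 24GB       # exact flags depend on the dstack version
# Scenario 3, running the task, e.g.:
#   dstack run . -f fine-tune.dstack.yml
type: task
python: "3.11"
commands:
  - pip install -r requirements.txt
  - python fine_tune.py              # hypothetical training script
resources:
  gpu: 24GB                          # matched against pool or on-demand capacity
```

If the pool already holds a matching instance, dstack reuses it; otherwise it provisions one on demand, which is the behavior Andrey describes in scenario 3.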

Advice for AI Beginners

Rod: For those who are starting to build AI products, what do you recommend?

Andrey: Based on my experience, I'd say: Don't fall in love with a particular vendor. Instead, fall in love with best practices or specific tools that you can reuse across vendors. I'm a big fan of open-source tools in general.

While companies like OpenAI are leading the way in AI, their products evolve quickly, and it can be hard to keep up. However, if you focus on best practices and mature tools, you'll find that these remain consistent even as vendors come and go.

I'd recommend learning more about best practices like retrieval augmented generation, structured data extraction, using open-source models, and using Docker to deploy generative models.
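As one concrete example of that last practice, here is a hedged sketch of serving an open model as a dstack service using an off-the-shelf Docker image (Hugging Face's Text Generation Inference). The image and launcher are real, but the model choice, port, and service fields are illustrative and version-dependent:

```yaml
# serve.dstack.yml (illustrative service definition)
# Deploy with e.g.: dstack run . -f serve.dstack.yml
type: service
image: ghcr.io/huggingface/text-generation-inference:latest
env:
  - MODEL_ID=mistralai/Mistral-7B-Instruct-v0.2   # any open model from Hugging Face
commands:
  - text-generation-launcher --port 8000
port: 8000
resources:
  gpu: 24GB
```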

dstack's Benefits for AI Beginners

Rod: How does dstack help beginners in AI?

Andrey: When we talk about AI today, there's often some confusion. Some people still think of AI as machine learning, while others think of it as something magical available through certain endpoints, like OpenAI's.

When I refer to AI, I mainly mean something you develop, most likely using open or custom models. That's where dstack comes in. Because it's vendor-agnostic, it doesn't matter which model you use. You can use any open model out there, grab any model from Hugging Face, and use any serving or fine-tuning framework.

What's most powerful about dstack is that it isn't tied to any particular vendor or method of training or deploying models. We strive to make it work in any scenario, regardless of the model or tool you use, while handling the infrastructure for you.

The Future of AI Tools and Platforms

Rod: How do you foresee the evolution of AI tools and platforms in the next 5 to 10 years?

Andrey: It's challenging to make predictions, especially in a field evolving as rapidly as AI. However, I foresee significant commoditization in the area of generative models. We'll likely see more competitors to OpenAI, while OpenAI continues to push boundaries.

We'll see more companies offering generic models for generating text, video, audio, and images. At the same time, the trend of open-source models will continue to grow. Even though some companies release open-source models as a marketing strategy, it's still beneficial for the market as a whole.

I strongly believe we'll see more and more open-source models, and their quality will continue to improve, potentially approaching that of proprietary models.

Top Three Tips for AI Engineers

Rod: Which top three tips can you give to fellow AI engineers?

Andrey:

  1. Learn the fundamentals: While it's natural to want to learn the latest technologies, understanding the fundamentals is crucial. They remain constant even as specific tools change, making it easier to adapt to new developments.

  2. Balance theory with practice: Don't just focus on theoretical knowledge. Push yourself to build real projects. It's through practical application that you truly understand how things work and where theory might fall short.

  3. Embrace open source: Don't underestimate the value of open-source tools and technologies, especially in the era of AI. Open source can significantly enhance your capabilities and understanding.

Closing Thoughts

Rod: As a wrap-up, what's one message that you'd like our listeners to take away from this conversation?

Andrey: I would encourage everyone to have more faith in open source. It's easy to underestimate its value, but open source will change your life for the better, especially now in the era of AI.

Rod: Fantastic, thank you so much, Andrey.

Andrey: Thank you, Rod.