Marc Klingen from Langfuse

From Code to AI: Unveiling Langfuse's Open-Source Revolution in LLM Monitoring

Host

Rod Rivera

🇬🇧 Chapter

Guest

Marc Klingen

CEO, Langfuse

In this episode, Rod Rivera interviews Marc from Langfuse. They discuss Marc's journey from business to computer science and the aha moment that led to the creation of Langfuse. Langfuse is an open-source monitoring tool for AI engineers that provides observability and analytics for complex language model applications. They explore the typical use cases of Langfuse, the challenges and pitfalls in building AI applications, and the future of AI engineering and tools. Marc also shares tips for newcomers in the AI engineering field and discusses the importance of designing usable interfaces for AI applications. The episode concludes with a discussion on staying up to date in the AI field and the potential of AI assistants.

Takeaways

  • Langfuse is an open-source monitoring tool for AI engineers that provides observability and analytics for complex language model applications.
  • Experimentation and user feedback are crucial in building successful AI applications.
  • The future of AI engineering lies in automating experimentation and integrating with other tools and platforms.
  • Building reliable and robust AI agents is a challenging task that requires unique setups and experimentation.
  • AI has the potential to enhance productivity and automate repetitive tasks, but it will not fully replace human expertise.
  • To progress in the AI engineering field, newcomers should focus on understanding the building blocks of AI, iterate quickly, and spot problems that can be solved with AI technology.
  • Designing usable interfaces for AI applications is essential for user adoption and success.
  • Staying up to date in the AI field can be done through following the right people on Twitter and engaging with the AI community.
  • The potential of AI assistants lies in their ability to automate tasks and provide personalized services, such as meal planning.
  • The AI engineering field is rapidly evolving, and there are many opportunities to build innovative applications and tools.

Episode Transcript

AI Engineering and Observability with Marc from Langfuse

Rod Rivera: Welcome to the AI Product Engineer podcast. I'm thrilled to have Marc from Langfuse with us today.

Marc: Thanks for having me, Rod. I'm excited to be here.

Marc's Background and Journey to AI Engineering

Rod Rivera: Before we dive into AI engineering, can you share a fun fact or story about yourself that most people don't know?

Marc: While I'm fully immersed in computer science now, my academic journey actually began with a business undergrad. My passion for programming was sparked in high school, thanks to an inspiring teacher and my mom. During my undergrad, I interned at a startup in Latin America, which ignited my love for building things. I eventually found my way back to computer science during my master's at TUM in Munich. Now, I'm thrilled to be building in this space and seeing what others are creating.

Rod Rivera: That's fascinating and encouraging for those starting their journey. It shows that you don't need to be a "pure-blood" computer scientist to get into this field. What was the aha moment that led you to start Langfuse?

Marc: It all began when we joined Y Combinator in January this year. Throughout the program, we worked on various LLM-based prototypes and ideas. We started with a short-form course generator, similar to Duolingo, but for any topic. When that didn't gain traction, we moved on to an autonomous web scraper and finally a pull request bot for GitHub issues.

Throughout these experiments, we realized how much iteration is needed on the engineering side. It's not always clear what will work with LLMs and the various frameworks available. You need to start building, test it yourself, run some light evaluations, and then iterate.

We often found that users were using our applications in unexpected ways. For example, our learning bot worked well for physics topics, but people wanted to use it for liberal arts. Or with our pull request bot, we designed it for full-stack TypeScript projects, but users wanted to use it for Go SDKs.

We quickly realized that these applications could break if the context or user inputs changed, and we needed a way to monitor how they performed in production. That's what led us to start working on Langfuse.

What is Langfuse?

Rod Rivera: For those unfamiliar with Langfuse, how would you describe it? And why is it crucial for budding AI engineers?

Marc: Langfuse is an open-source observability and analytics tool for AI engineers building complex LLM applications. Whether you're using JavaScript, TypeScript, or Python, popular frameworks and SDKs like LangChain or the OpenAI SDK, or even no-code tools like Flowise or Langflow, Langfuse helps you analyze cost, latency, and quality while iterating on your application in production.

With Langfuse, you can:

  • Track your experiments to see how changes affect your metrics
  • Break down data by end users
  • Close the feedback loop with user input
  • Run evaluations

It provides a structured process for building your application and understanding how it works in production. Being open-source at its core, we love collaborating with others in this rapidly developing space.
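To make "analyze cost and latency, broken down by end user" concrete, here is a minimal sketch in plain Python. It is not the Langfuse SDK: the `CallRecord` class, the `summarize_by_user` function, and the per-token prices are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical prices in USD per 1,000 tokens; real prices depend on model and provider.
PROMPT_PRICE_PER_1K = 0.01
COMPLETION_PRICE_PER_1K = 0.03


@dataclass
class CallRecord:
    """One logged LLM call (hypothetical structure)."""
    user_id: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float


def cost_usd(record: CallRecord) -> float:
    # Cost of a single call from its token counts.
    return (record.prompt_tokens * PROMPT_PRICE_PER_1K
            + record.completion_tokens * COMPLETION_PRICE_PER_1K) / 1000


def summarize_by_user(records: List[CallRecord]) -> Dict[str, dict]:
    # Group calls by user, then compute call count, total cost, and mean latency.
    grouped = defaultdict(list)
    for record in records:
        grouped[record.user_id].append(record)
    return {
        user: {
            "calls": len(calls),
            "cost_usd": sum(cost_usd(c) for c in calls),
            "avg_latency_ms": sum(c.latency_ms for c in calls) / len(calls),
        }
        for user, calls in grouped.items()
    }


records = [
    CallRecord("alice", 1000, 500, 800.0),
    CallRecord("alice", 2000, 1000, 1200.0),
    CallRecord("bob", 500, 100, 400.0),
]
for user, stats in summarize_by_user(records).items():
    print(user, stats)
```

A per-user breakdown like this can show whether a few heavy users drive most of the cost, which a single aggregate number would hide.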

Rod Rivera: So, in essence, Langfuse is an open-source monitoring tool for LLM applications that allows you to track what your models are doing and fix errors when they occur.

Marc: Exactly. It starts with monitoring in production to identify errors and quality problems. But it goes beyond that. Langfuse helps you understand where problems originate. Is it an LLM call? The context retrieval? Or perhaps an API call to an internal interface? With the traces you get in Langfuse, you have the full context of what's happening in production.
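The idea of a trace that shows where a problem originates can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the Langfuse API: `Trace`, `Span`, `first_failure`, and the `answer` pipeline are hypothetical names, and the "LLM call" is a stand-in that makes no network request.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """One timed step inside a request (hypothetical structure)."""
    name: str
    duration_ms: float = 0.0
    error: Optional[str] = None


@dataclass
class Trace:
    """All steps recorded for a single request."""
    name: str
    spans: List[Span] = field(default_factory=list)

    @contextmanager
    def span(self, name: str):
        # Time the wrapped step; record an exception instead of propagating it.
        record = Span(name)
        start = time.perf_counter()
        try:
            yield record
        except Exception as exc:
            record.error = f"{type(exc).__name__}: {exc}"
        finally:
            record.duration_ms = (time.perf_counter() - start) * 1000
            self.spans.append(record)

    def first_failure(self) -> Optional[Span]:
        # The earliest failing step is usually where the problem originates.
        return next((s for s in self.spans if s.error), None)


def answer(question: str, knowledge_base: dict) -> Trace:
    trace = Trace("docs-chatbot")
    context = ""
    with trace.span("context_retrieval"):
        context = knowledge_base[question]  # KeyError if the topic is missing
    with trace.span("llm_call"):
        if not context:  # stand-in for a real model call
            raise ValueError("no context to answer from")
    return trace


trace = answer("react native", {"python sdk": "Install with pip."})
failed = trace.first_failure()
print(failed.name, "->", failed.error)
```

Both steps fail here, but the trace makes it clear that the retrieval step failed first, so the model call is a symptom and not the cause.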

User Feedback and Langfuse's Evolution

Rod Rivera: Your users are highly technical - AI engineers, software developers, data scientists. Can you share how feedback from these early users has shaped or improved Langfuse?

Marc: Most of our users with significant production workloads have very complex applications. We've built our SDKs to help them log these complex traces and attach various scores and feedback metrics, ranging from explicit user feedback to implicit user signals and model-based evaluations.

However, we continuously received feedback that there was a steep learning curve to get started. This led us to launch our OpenAI integration, where users just need to change the import statement to start using Langfuse. It's been a key learning for us that making the initial onboarding process easier and allowing users to see value quickly is crucial.

Rod Rivera: I noticed this myself when using Langfuse. I was pleasantly surprised to see predefined dashboards and everything prepared upon logging in. It's quite different from applications where you start with an empty canvas and wonder where to begin.

Marc: Absolutely. We've worked closely with early users on the dashboard. We initially used a no-code solution for dashboarding, using Looker under the hood, to iterate quickly. This allowed us to understand what's valuable for these teams, as many engineers aren't always sure what they need to monitor. Now, when new users sign up and integrate their app, they see dashboards that other teams have found helpful, giving them a head start on understanding what to look for when monitoring their application.

Typical Use Case for Langfuse

Rod Rivera: Can you walk us through a typical use case scenario for a Langfuse user?

Marc: While use cases vary, let's consider a common scenario: a documentation retrieval chatbot. This is similar to the public demo on our website. It's a complex web stack that retrieves context and performs operations before responding, and it's conversational, involving multiple interactions with the user.

Langfuse is useful in developing these applications by:

  1. Tracing user interactions throughout a conversation
  2. Monitoring costs
  3. Identifying when users report negative feedback or drop sessions
  4. Analyzing retrieved context that led to poor responses

Often, the issue isn't with the prompt or summarization step, but with the knowledge base itself. For example, users might ask about React Native integrations, but if your documentation doesn't cover React Native, how should the chatbot respond? Langfuse helps you gain insights on what to add to your knowledge base, which is often the real quality bottleneck.

We also partner with RAGAS, a popular open-source evaluation framework, to evaluate model performance without a known baseline. This allows you to monitor metrics without relying solely on user feedback.

Finally, Langfuse helps you collect representative input sets, so when you develop the next iteration of your application, you can test it against known questions to ensure it works as expected.
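Testing a new iteration against known questions could look roughly like the sketch below. It assumes a very simple quality check (the answer must mention a keyword); `Example`, `run_regression`, and `candidate_app` are hypothetical names, and a real setup would likely use richer evaluations.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    """A question collected from production plus a hypothetical keyword check."""
    question: str
    must_mention: str


def run_regression(app: Callable[[str], str], examples: List[Example]) -> List[str]:
    # Return the questions whose answers miss the expected keyword.
    failures = []
    for example in examples:
        reply = app(example.question)
        if example.must_mention.lower() not in reply.lower():
            failures.append(example.question)
    return failures


def candidate_app(question: str) -> str:
    # Stand-in for the next iteration of a chatbot; no model is called here.
    canned = {"How do I install the Python SDK?": "Run pip install for the SDK."}
    return canned.get(question, "I don't know.")


examples = [
    Example("How do I install the Python SDK?", "pip install"),
    Example("Is there a React Native integration?", "React Native"),
]
failures = run_regression(candidate_app, examples)
print(f"{len(failures)} of {len(examples)} known questions failed:", failures)
```

Running the same example set before and after a change gives a quick signal on whether the new version regresses on questions real users have already asked.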

Internal vs External Use Cases

Rod Rivera: Are these use cases typically for internal company operations, or are companies also exposing these applications to their end users?

Marc: It's a mix of both. Our startup users tend to focus more on external-facing applications, while enterprise teams using Langfuse primarily have internal use cases for now. We're seeing them gradually move towards customer-facing use cases. However, Langfuse isn't geared towards either specifically - the challenges and solutions are often similar whether it's for internal or external users.

Getting Started in AI Engineering

Rod Rivera: Given the complexity of the field, do you have any recommendations for newcomers to AI engineering? How can Langfuse assist them in their journey?

Marc: For those new to building LLM-based applications, I recommend starting with popular abstractions to understand how agents work, how prompt chaining functions, and the trade-offs between different models. YouTube tutorials can be a great starting point.

Interestingly, no-code tools like Flowise or Langflow can be incredibly helpful, even for engineers. They allow you to experiment with different agent frameworks without getting bogged down in code. Both of these tools have drop-in integrations with Langfuse, so you can easily visualize and understand how your application behaves once it is running.

This combination of no-code tools for rapid prototyping and Langfuse for in-depth analysis can be a great start for people new to the space.

Pitfalls and Lessons Learned

Rod Rivera: Before Langfuse, you explored multiple ideas in the LLM space. What pitfalls did you encounter? Do you have any cautionary tales for those building AI apps now?

Marc: As engineers, we initially focused on CI, testing, and ensuring everything worked perfectly before moving to production or selling to users. However, we've observed that most successful teams, especially in Y Combinator, tend to launch quickly, label their product as early-stage, and improve continuously based on user feedback and behavior metrics.

This approach allows you to learn from users and identify which use cases work well, rather than trying to perfect everything before shipping. While larger enterprise teams often aim for higher certainty before launch, it's challenging to anticipate all possible user inputs or contexts. A faster, more iterative approach often yields better results.

Rod Rivera: So you'd suggest launching even if the product isn't 100% complete?

Marc: I'd recommend launching when it's working in staging and giving it to internal users for diverse testing. Often, engineering or data science teams have an imperfect understanding of use cases or user expectations. Getting internal user feedback on a staging environment is incredibly helpful before over-investing in evaluation frameworks.

The Future of AI Engineering Tools

Rod Rivera: This field is evolving rapidly. Do you have any vision or ideas about how this space will evolve in the next five to ten years, especially regarding tools and platforms for AI engineers?

Marc: It's fascinating to see how current tools are adapting to LLM-based application development. I'm excited about the potential for automating much of the experimentation that data scientists and engineers are currently doing manually. Once you have a good understanding of your use cases and evaluation criteria, you could automatically generate and test different ideas, prompts, and agent setups in a structured way.

I also see the number of options exploding, which is why I'm particularly excited about open platforms that focus on doing one thing well and integrate easily with other tools. I believe the future tooling landscape will consist of specific, focused tools that are open to integration with others.

On the application level, I'm most excited about enterprise use cases. For example, agents for ERPs and CRMs that can handle data entry, email management, and communication between large companies. These agents could act as a bridge between big enterprise systems, using text as the common denominator for communication. The vision of distributed agents communicating with each other and negotiating interfaces on the fly is particularly intriguing.

The Reliability of AI Agents

Rod Rivera: Agents often make for impressive demos but can be challenging to use reliably in practice. What do you think is missing for agents to become more robust and reliable?

Marc: Unlike RAG (Retrieval-Augmented Generation) use cases where you can follow a cookbook approach with minor adjustments, successful agent implementations tend to have very unique setups. Teams that are making progress with agents are thinking deeply about how to make them useful for specific use cases, rather than applying a generic framework.

Building valuable agent-based solutions is much harder and requires more experimentation and custom modeling of the problem. Simply having an autonomous system that plans multiple steps and constantly asks itself what to do next isn't always the most efficient approach.

I'm excited to see more cool demos, but moving to production with these systems is significantly more challenging.

The Future of Work in AI

Rod Rivera: When discussing agents, the future of work often comes to mind. How much work do you think these tools will take away from technical individuals like developers, AI engineers, and data scientists? Do you foresee an "AI data scientist" agent handling all our data analysis tasks, or will it primarily enhance productivity without fully replacing human expertise?

Marc: We're already seeing significant impacts. Personally, I run dozens of queries through GPT-4 or other tools every day. It helps us move fast with product development and ship more frequently, as simple changes can be handled beautifully by tools like Copilot or GPT-4.

In the data science realm, these tools are making great progress. The exciting aspect is that individuals can now have much more output than before. For instance, data scientists can spend less time on tedious tasks like data wrangling or cleaning with pandas, allowing them to focus on more interesting and challenging topics.

I believe the future lies in knowing how to effectively use these tools, enabling individuals to be significantly more productive and focus on higher-value work.

Tips for Newcomers in AI Engineering

Rod Rivera: For newcomers to AI engineering, what are your top 3-5 tips, either technical or mindset-related, to help them progress in their journey?

Marc: Here are my top tips:

  1. Start immediately: The field is moving incredibly fast, so it's crucial to dive in and start experimenting as soon as possible.
  2. Identify relevant problems: If you're working at a larger company, look for problems in your environment that could be good candidates for AI solutions. Build quick prototypes and show MVPs to colleagues to gain buy-in for full-time work on these projects.
  3. Understand the building blocks: Focus on understanding the available building blocks in AI engineering. This will help you quickly build MVPs and prototypes to demonstrate the potential of your ideas.
  4. Start with larger models: For new projects, most people begin with larger models. Understand how chat completions and function calling work with models from providers like OpenAI or Anthropic.
  5. Focus on structured output: When integrating with existing business workflows, pay attention to structured output. Often, you'll want to use the AI's output to trigger other actions rather than just engage in a chat interface.
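
The last tip, using structured output to trigger other actions, can be illustrated with a small sketch. It assumes the model has been asked to reply with a JSON object of the form `{"action": ..., "arguments": {...}}`; that schema, the `ALLOWED_ACTIONS` set, and the `parse_action` function are hypothetical and not tied to any provider's API.

```python
import json
from dataclasses import dataclass

# Hypothetical set of actions the surrounding workflow knows how to execute.
ALLOWED_ACTIONS = {"create_ticket", "send_email"}


@dataclass
class Action:
    name: str
    arguments: dict


def parse_action(raw_output: str) -> Action:
    """Validate a model reply that is expected to be a JSON object."""
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    name = payload.get("action")
    if name not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {name!r}")
    arguments = payload.get("arguments", {})
    if not isinstance(arguments, dict):
        raise ValueError("arguments must be a JSON object")
    return Action(name, arguments)


raw = '{"action": "create_ticket", "arguments": {"title": "Login fails"}}'
action = parse_action(raw)
print(action.name, action.arguments)
```

Validating before acting matters because free-form model output can be malformed or name an action the workflow does not support; rejecting it early keeps downstream systems from receiving bad input.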

Innovative Interfaces for AI Applications

Rod Rivera: Many demos seem to default to chat interfaces. How do you approach designing interfaces that go beyond this? Do you have any mental frameworks or resources that inspire you?

Marc: It's interesting that demos often use chat interfaces, but many significant use cases are actually asynchronous or background batch workloads. These might involve using LLMs for document extraction, summarization, or data enrichment without a direct user interface.

I'm particularly interested in tools that generate front-end components for the web. For example, friends are building in the space of generating customizable dashboard components. This involves creating structured output that can be rendered in real-time, allowing for a more native and visually appealing presentation of AI-generated insights.

The exciting part is the potential for simple prompts to generate dedicated, visually appealing components that are native to the application, rather than just text summaries. This combination of AI-generated content with custom UI components is a fascinating area of development.

Exciting Developments in AI Interfaces

Rod Rivera: Are there any companies, startups, or demos in this space that have caught your eye recently?

Marc: I was very excited when Vercel launched their v0 prototype, even though it initially had a waitlist. The design patterns from IOTM are also very interesting. Their prototype allows you to select your current frontend format and then get customized frontend components based on your input.

What's particularly exciting about these tools is the iterative nature of the design process. You can start with a simple prompt to get an initial component, then refine it with subsequent prompts. For example, you might ask for a date range picker, then specify that it should include time as well. This iterative approach, combined with live previews and the ability to directly use the generated code, makes these tools incredibly useful for rapidly developing user interfaces.

Staying Updated in the AI Field

Rod Rivera: With the constant flood of new developments in AI, how do you stay up-to-date? Are there any specific sources or websites you recommend?

Marc: For me, Twitter is the primary source. I follow key people in the space, which provides a constant stream of information about new frameworks, interesting demos, and updates from others in the industry. I particularly enjoy following founders and builders working on complementary products and projects.

Beyond social media, I find great value in talking directly with other founders building on the application layer. These conversations help me understand what new abstractions and ideas they're using to make their products useful.

Lastly, discussions with larger companies help me understand which of these innovations are gaining traction with bigger teams.

The Future of AI Assistants

Rod Rivera: Looking to the future of AI assistants, is there a particular application or use case you're excited about?

Marc: From a non-work perspective, I'm really excited about the potential for AI in meal planning and nutrition. Imagine having a personal AI chef that designs your menu for the coming weeks, considering your preferences for quick, healthy meals, and even ordering groceries for you.

I think there's a lot of potential in B2C applications related to nutrition and sports. While many of the current interesting developments are in B2B, I believe there's room for fun and useful B2C products in these areas.

Rod Rivera: That's a great idea. The combination of LLMs' vast knowledge of recipes, ability to consider dietary restrictions and preferences, and potential for personalization could indeed make for a very impressive product.

Marc: Exactly. It's about democratizing services that were previously only available to the wealthy. AI allows us to build products that provide professional-level services to everyone, which is incredibly exciting.

Closing Thoughts

Rod Rivera: As we wrap up, is there any final message you'd like to share with our audience?

Marc: There's still so much to be built in this space. There are numerous problems waiting to be solved with LLMs and the frameworks around them. It's up to AI engineers and data scientists to iterate on these applications, embrace experimentation, and figure out what works.

I'm incredibly excited to see the products that will launch in the coming months and years. The technology is there; now it's up to engineers to figure out how to achieve production-grade outcomes for these use cases.

Rod Rivera: Thank you so much for your time and insights, Marc. Where can people find you and learn more about Langfuse?

Marc: Thanks for having me, Rod. The easiest way to learn more is to visit langfuse.com. You can also find us on GitHub at langfuse/langfuse. If you'd like to follow me personally, you can find me on Twitter as Marc Klingen. Thanks again for the invite, and I'm excited to see what's coming in the weeks ahead.

Rod Rivera: Thank you so much, Marc. Have a great day.
