E14: NVIDIA's AI chip delays, Meta's open-source gambit & Big Tech's $100B AI spending spree
In this episode, Max and Rod cover NVIDIA's dominance of the GPU market and the design flaw delaying its next AI chip, Apple's use of Google chips, and Meta's AI strategy. They also discuss the potential impact of antitrust scrutiny, the challenge of optimizing AI models for different hardware, and the massive AI investments big tech companies are making, along with differing views on whether those investments will pay off.
Takeaways
- NVIDIA's dominance in the GPU market and the design flaw in their next AI chip could impact the supply of GPUs and slow down innovation in the AI space.
- Apple's use of Google chips and Meta's AI strategy indicate the competition and diversification in the AI market.
- Investments in AI by big tech companies, despite not generating immediate revenue, are strategic moves to secure a dominant position in the future.
- The differing perspectives on the potential returns on AI investments highlight the debate between short-term profitability and long-term transformation.
- New players can disrupt the AI landscape, and incumbents must invest in AI to stay competitive and avoid being disrupted.
Episode Transcript
Introduction
Max: Welcome to the next episode of the Chris Rod Max show. Today we have Rod with us. Unfortunately, Chris couldn't join us, but we'll be discussing AI stories as we do every week. We have a lot of news to cover today, especially given the recent market fluctuations. Our main topics will be:
- NVIDIA's market dominance and recent acquisition
- Apple's GPU strategy
- Meta's AI approach
- The impact of market changes on AI companies
Let's dive in.
NVIDIA's Market Dominance and Chip Design Flaw
Max: We know NVIDIA has a huge dominance in the GPU market, controlling about 75% to 90% according to the latest reports. There's recent news about a design flaw in their next AI chip, causing a delay. This is interesting for two reasons:
- Many players are waiting for this type of GPU.
- NVIDIA is facing more competition, despite some competitors like Intel not performing as well.
Rod, what's your take on this AI chip design flaw and its impact on the overall GPU supply?
Rod: We're talking about the next-generation B200, or so-called Blackwell AI chip. The design flaw was discovered very late in the process, pushing the chip's release back to the first quarter of next year. This affects many large buyers of NVIDIA GPUs, such as Microsoft, Google, and Meta, who have already placed significant orders.
This chip is the successor to the very popular H100 chips. Anyone working with GPUs can attest that these are among the most sought-after chips. This delay can affect innovation for many companies.
Bold statement: Currently, if you want to order an NVIDIA A100, you're put on a waiting list that can be up to two years long.
This delay could potentially impact research and advancements in the AI space. Many breakthroughs result from optimizations in chip architecture, squeezing the last drop of performance from an AI chip. If we can't access the latest models, researchers developing new technologies might be slowed down or unable to pursue new avenues of investigation.
Max: That's interesting. Blackwell is the new chip that's supposed to succeed the H100, which is already in high demand. With this delay, we'll likely see a temporary shortage in the market.
There are other players trying to enter this market, such as AMD, Intel, and even Google. Do you think this delay might push some players to switch to other chips? What are your thoughts on the competition?
Rod: Some companies might start considering a multi-vendor strategy and looking into other options. However, the reality is that all other manufacturers also face their own challenges with supply chain and manufacturing. It's not a situation where you can simply switch from NVIDIA to AMD and get immediate delivery. They also have their constraints and don't have the same level of manufacturing capacity as NVIDIA.
More importantly, many AI developments are the result of over-optimizing for specific hardware architectures. When transitioning to architectures that aren't NVIDIA GPUs, you lose some of that edge. While other AI chip manufacturers like AMD, Intel, and Google try to compensate and offer ways to bridge this performance gap, it's not a simple plug-and-play solution.
Bold statement: The models we see are so finely tuned and tailored to the NVIDIA architecture that switching to another vendor requires significant redesign and optimization.
DOJ Probe into NVIDIA's Acquisition of Run.ai
Max: On a related note, the Department of Justice (DOJ) is probing NVIDIA for potential antitrust issues, especially regarding their acquisition of Run.ai. Given your technical background, could you explain what Run.ai does for NVIDIA and why the DOJ is looking into this?
Rod: Earlier this year, NVIDIA acquired Run.ai, a company specializing in GPU management software, for about $700 million. Run.ai's software is crucial for large organizations that don't just have one or two GPUs, but hundreds of GPU cards. It helps orchestrate workflows and optimize the use of these resources.
The DOJ's interest stems from two main concerns:
- NVIDIA's size and dominance in the AI market. They're often the first company people associate with AI.
- Potential market pressure. There are claims that NVIDIA might have pressured cloud providers into buying their products and overcharging for networking equipment when customers choose competitors' chips.
Bold statement: It's estimated that NVIDIA controls up to 95% of the AI chip market. We're not talking about equal players; there's NVIDIA, and then there's a long tail of tiny players.
Max: That's fascinating. It seems Run.ai could potentially prioritize resources towards NVIDIA's chips, further solidifying their market position. The monopolistic implications here could be quite profound.
Apple's Choice of Google Chips
Max: Moving on to another player in the chip market, there's news that Apple has opted for Google chips in its infrastructure. This seems to be a move to counter NVIDIA's dominance and address some of the delays we discussed. It's fascinating to see two tech giants coming together like this. Rod, what are your thoughts on Apple using Google chips, at least for some of their projects?
Rod: Yes, a recent research publication revealed that Apple has been using Google TPUs (Tensor Processing Units) at least partially for their research efforts. TPUs are an alternative architecture to NVIDIA GPUs. Traditionally, cutting-edge researchers would default to working with NVIDIA GPUs and hardware, so it was quite surprising to see Apple exploring this alternative architecture.
The assumption is that these Google TPUs might power Apple's cloud services, such as their upcoming AI writing assistants and image editors. The move also fits Apple's long-standing wariness of depending on a single dominant supplier, from Microsoft to Intel.
However, we should remember that Apple has previously bet on suppliers with superior technical solutions that ultimately couldn't deliver the necessary supply. For example, Apple relied on the PowerPC architecture from the mid-'90s until 2006, when it switched to Intel chips due to manufacturing and innovation constraints.
Bold statement: Apple's move could be seen as a positive diversification, introducing new architectures and fostering innovation. However, we've seen this playbook before, and it's possible that Apple might eventually return to using GPUs if Google's TPUs prove too niche for mass-scale production.
Meta's AI Strategy
Max: Let's shift gears and discuss Meta's AI strategy. Mark Zuckerberg recently outlined Meta's vision going forward, including a significant drive to build compute clusters and data centers. They've also released their own AI model, Llama. Rod, could you break down Meta's AI strategy and how they're positioned compared to other players in the AI space?
Rod: Meta's strategy has several key components:
- Open-source contribution: Meta's Llama models are free of charge and available for download. These models are among the best in the market, often rivaling OpenAI's models.
- Ecosystem development: By offering open-source models, Meta is enabling other players to compete with OpenAI, potentially preventing OpenAI from becoming a monopoly in the AI space.
- Platform integration: Meta is incorporating AI functionality across all its products, aiming to keep users within their ecosystem rather than migrating to platforms like ChatGPT.
- Independence and platform creation: Historically, Meta has struggled to be a "rule maker" rather than a "rule taker," always living on someone else's platforms. Their AI initiatives, along with previous efforts like the metaverse and attempts at creating a Facebook phone or operating system, are attempts to become a platform themselves.
Bold statement: Meta is reportedly spending around $40 billion this year on AI initiatives, with expectations that the next Llama model will require nearly 10 times more computing power than its predecessor.
Max: That's a significant investment. From an investor's perspective, it really depends on the type of investor you are. Venture investors, aligned with the Silicon Valley mindset, are more likely to support high upfront costs with the expectation of low marginal costs and high returns in the future.
The key questions are: When will we see these AI applications reach prime time? What are the underlying signals indicating that some of this investment can start generating revenue? From an investor's standpoint, it's crucial to look at the use cases and the value they're driving from an end-user perspective.
Big Tech's AI Spending Spree
Max: There's a recent article in the Financial Times stating that big tech groups are ramping up their AI spending spree, with over $100 billion invested in the first half of the year alone. At the same time, activist investors like Elliott are saying that AI is overhyped and NVIDIA is in a bubble. Rod, could you break down how much the large tech companies have been spending in the first six months of 2024, and what gains or losses they've seen so far?
Rod: It's estimated that large tech companies like Microsoft, Alphabet, Amazon, and Meta have spent over $100 billion on AI-related investments just in the first half of this year. This is 50% more than what they were spending last year. Yet, none of them really has the revenue numbers to justify this level of expenditure.
We're in a "trust us, we're doing the right thing" situation. These companies are betting on overinvesting now to secure a dominant position in the future. However, this could lead to overexpansion, similar to what happened with internet infrastructure in the early 2000s.
Bold statement: We could see a situation where these companies are making massive investments, but we may not see any relevant returns on this expenditure for many years.
Max: I think that resonates with some of my thoughts. A lot of this expenditure is going into new data centers for cloud computing businesses. We may not see immediate value today, but that doesn't mean it won't come. Building new applications and new ways of doing things always takes longer than we'd like, especially if it's transformational.
In the long run, I believe some of this expenditure will eventually see returns because it's potentially transformational. Even if we don't see immediate results, I'd rather see these companies invest in something with the potential to change the way we work than do nothing at all.
Conclusion
Max: Thank you for listening in today. We've covered a lot of ground:
- NVIDIA's chip design flaw and the DOJ's probe into their acquisition of Run.ai
- Apple opting for Google chips in their infrastructure
- Meta's AI strategy and its focus on future profits
- Big tech groups spending billions on AI, and the mixed reactions to this investment
Remember to like and subscribe for the latest news on AI. Thank you for listening, and we'll speak to you next week.