UPDATE: Startups are raising alarms over the performance of Amazon's AI chips, claiming they fall short of Nvidia's highly regarded GPUs. An internal document reveals that AWS's Trainium chips are struggling to keep pace, posing significant challenges for Amazon's ambitions in the AI market.
According to an internal document dated July 2025 and obtained by Business Insider, AI startup Cohere reported that Amazon's Trainium 1 and 2 chips are "underperforming" when measured against Nvidia's H100 GPUs. The document also notes that access to Trainium 2 is "extremely limited" and plagued by frequent service disruptions, issues Amazon is reportedly still investigating with its chip-design unit, Annapurna Labs.
Why This Matters: With Nvidia commanding over 78% of the AI chip market, Amazon’s struggles could hinder its plans for growth in the lucrative AI sector. As startups express dissatisfaction, AWS risks losing valuable business to competitors who can deliver better performance and reliability.
Other startups, including Stability AI, raised similar concerns, saying Amazon's chips lag Nvidia's offerings in both speed and cost efficiency. The document warns that these "performance challenges" could deter customers from adopting Amazon's cloud services.
Amazon's in-house chips are seen as a crucial element of its strategy to provide AI services without the high costs associated with Nvidia's GPUs. Part of AWS's historical profitability has come from designing its own data-center chips, but customer feedback now points to significant obstacles in making the switch to Trainium.
An Amazon spokesperson said the company is "grateful" for feedback that helps it improve its chips, and emphasized that Trainium and its companion inference chip, Inferentia, have produced positive results for clients such as Ricoh and Datadog. The spokesperson added, however, that the Cohere case is "not current," suggesting the issues have since been worked on.
Despite this, the document indicates that other customers reported similar problems: Typhoon found Nvidia's older A100 GPUs to be up to three times more cost-efficient than AWS's Inferentia 2 chips for certain workloads, and the research group AI Singapore determined that AWS's G6 servers, which run on Nvidia GPUs, offered better cost performance than Inferentia.
In a significant partnership announcement, AWS recently signed a $38 billion deal to provide OpenAI with AI cloud services, an arrangement that relies exclusively on Nvidia GPUs rather than Amazon's Trainium chips. Analysts noted that Trainium's absence from the deal could be read as a setback for Amazon, particularly given Nvidia's established platform and performance.
Analysts at Bank of America expressed skepticism about Trainium's prospects, questioning whether demand will grow beyond its one high-profile customer, Anthropic, which uses AWS's chips to train its AI models.
What's Next: Amazon is set to preview Trainium 3 later this year, aiming to address the shortcomings highlighted by customers. Amazon CEO Andy Jassy has said the Trainium 2 chips have become a "multibillion-dollar" business and emphasized the importance of giving clients a diverse set of chip options.
As Amazon navigates these challenges, the pressure is on to enhance its AI chip offerings to remain competitive. Investors and customers alike are eagerly watching for improvements in Trainium’s performance and broader adoption across the cloud landscape.
With Amazon's shares having surged after AWS reported 20% revenue growth to $33 billion last quarter, the stakes are high as the unit seeks to retain its position in an increasingly competitive market. The outcome of these developments could reshape the cloud computing landscape, making it essential for Amazon to address the pressing concerns raised by its clients.