Breaking the Wall:
How CharacterAI Scaled AI Inference to Millions with AMD and DigitalOcean

What does it actually take to scale AI inference to millions of users in production? In this conversation, Guy Currier, Chief Analyst for Visible Impact & Research Director at the Futurum Group, sits down with leaders from AMD, CharacterAI, and DigitalOcean to pull back the curtain on one of the most demanding real-world AI deployments in the market today.

CharacterAI’s Chief Architect walks through the memory bottlenecks that were limiting their ability to serve larger, more immersive conversational AI models. AMD’s MI325X accelerators, with their substantial 256GB HBM capacity, enabled CharacterAI to fit the same models on half the hardware. But getting there required DigitalOcean and AMD to work in lockstep — resolving early memory access issues, writing and upstreaming vLLM kernels, and ensuring the full hardware-software stack was production-ready before CharacterAI ever flipped the switch.

The hardware, though, is only part of the story. This discussion also covers the software stack, the collaborative process, and the organizational factors that enabled a complex platform migration in roughly three months. Whether you're an AI engineer wrestling with inference costs, an architect evaluating GPU platforms, or a technical leader navigating a major infrastructure transition, this discussion offers rare, ground-level insight into the decisions and tradeoffs that separate a smooth deployment from a painful one.

Markus Hartikainen

Senior Manager, Software Development at AMD

Markus leads engineering teams focused on AI inference performance and production readiness on AMD Instinct™ GPUs. He drives customer-driven roadmap work in vLLM, upstreaming production learnings into features such as speculative decoding and attention optimizations. Markus holds a PhD in computer science and brings deep experience across the AI stack, from systems-level C++ and performance engineering to model deployment at scale.

James Groeneveld

Chief Architect at Character.AI

James has spent the last three years architecting the technical stack and driving cross-functional alignment for one of the world's highest-traffic generative AI platforms, with about 20 million monthly users. He bridges the gap between frontier research and product engineering, partnering with leadership to define top-line product strategy and identify monetization opportunities. Beyond his technical oversight, James plays a pivotal role in the company’s long-term roadmap, managing the complex intersection of large-scale consumer infrastructure and high-velocity AI innovation to deliver nuanced intelligence to millions of users.

Debarshi Raha

VP/Fellow Engineer at DigitalOcean

Debarshi is a Fellow Engineer at DigitalOcean, where he focuses on advancing data and AI services for DigitalOcean's inference cloud. With a deep background in distributed systems and cloud architecture, he previously spent years at AWS building and launching core services, including OpenSearch and Personalize.

Guy Currier

Chief Analyst, Visible Impact & Research Director at The Futurum Group

Guy is responsible for positioning, go-to-market, and sales guidance across technologies and markets. He has decades of field experience describing technologies, their business and community value, and how they are evaluated and acquired. Guy’s specialty areas include AI, DevOps/cloud-native, enterprise applications, application integration, Big Data, governance-risk-compliance, containerization, virtualization, HPC, CPUs-GPUs-xPUs, and systems lifecycle management.

Guy started his technology career as a research director for technology media company Ziff Davis, with stints at PC Magazine, eWeek, and CIO Insight. Prior to joining Visible Impact, he worked at Dell, including postings in marketing, product, and technical marketing groups for a wide range of products, including engineered systems, cloud infrastructure, enterprise software, and mission-critical cloud services. He lives and works in Austin, TX.