How AMD’s New AI PC Makes Running a 235B-Parameter Model Locally Possible

Marina Lyubimova

Until recently, running large language models meant renting cloud GPUs or building expensive multi-GPU workstations.

Contents

Why Memory Has Become the Main Bottleneck
What Models Fit Into 128GB
Why AMD Chose Qwen3-235B
The Hardware Behind the System
What This Means for Local AI
The Next Test

AMD is betting that memory capacity matters more than raw compute. Its new Ryzen AI Max+ 395 platform combines a 16-core Zen 5 processor with up to 128GB of unified memory. According to AMD's published specifications, the system can run models as large as Qwen3-235B on a desktop-sized machine measuring just 193 × 185.8 × 77 mm.

The announcement raises a practical question: how much hardware is actually needed to run frontier-scale AI models?

Why Memory Has Become the Main Bottleneck

Most large language models hit memory limits before they hit compute limits. A model cannot run if it does not fit into available memory, regardless of how powerful the processor is. This is why developers often rely on cloud infrastructure or multiple GPUs when working with larger models.

Hardware	Memory
RTX 5070 Ti	16 GB
RTX 5080	16 GB
RTX 5090	32 GB
AMD AI Max+ System	128 GB

The gap is significant. A 128GB memory pool provides four times the capacity of an RTX 5090 and eight times that of the 16GB RTX 5070 Ti and RTX 5080.
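
To put those ratios in terms of model size, the sketch below converts each memory figure from the table into a rough ceiling on parameter count. It assumes 4-bit weights and 1 GB = 10^9 bytes, and it counts weights only, so real limits are lower once the KV cache and runtime overhead are included. The function names are illustrative, not part of any AMD or NVIDIA tool.

```python
# Memory capacities in GB, copied from the table above.
MEMORY_GB = {
    "RTX 5070 Ti": 16,
    "RTX 5080": 16,
    "RTX 5090": 32,
    "AMD AI Max+ System": 128,
}


def max_params_billions(memory_gb: float, bits_per_weight: float) -> float:
    """Largest parameter count (in billions) whose weights alone fit in memory.

    Assumes 1 GB = 1e9 bytes; ignores KV cache, activations, and OS overhead.
    """
    return memory_gb * 8 / bits_per_weight


def capacity_ratio(name: str, baseline: str = "AMD AI Max+ System") -> float:
    """How many times larger the baseline's memory is than the named device's."""
    return MEMORY_GB[baseline] / MEMORY_GB[name]


if __name__ == "__main__":
    for name, gb in MEMORY_GB.items():
        ceiling = max_params_billions(gb, bits_per_weight=4)  # 4-bit is an assumption
        print(f"{name:20s} {gb:4d} GB  ~{ceiling:.0f}B params at 4-bit  "
              f"(128 GB is {capacity_ratio(name):.0f}x)")
```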

What Models Fit Into 128GB

According to AMD's platform specifications, the 128GB configuration supports several of the largest open-source reasoning models currently available.

These include:

DeepSeek-R1 70B
DeepSeek-R1 120B
GPT-OSS 120B
Llama 4 109B
Qwen3-235B

The last model on that list is particularly notable. Qwen3-235B is not a lightweight local model. It belongs to the class of models that are benchmarked directly against flagship systems from OpenAI, Google, and DeepSeek.
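
Whether a given model from the list fits depends heavily on how its weights are stored. The following is a minimal sketch of that arithmetic, not AMD's methodology: parameter counts are the nominal sizes in the model names, the bit widths (16, 8, 4) are common storage choices picked for illustration, and `HEADROOM_GB` is a hypothetical allowance for the KV cache and runtime overhead.

```python
# Nominal parameter counts in billions, read off the model names in the list above.
MODELS_B = {
    "DeepSeek-R1 70B": 70,
    "DeepSeek-R1 120B": 120,
    "GPT-OSS 120B": 120,
    "Llama 4 109B": 109,
    "Qwen3-235B": 235,
}

MEMORY_GB = 128.0   # unified memory of the top configuration
HEADROOM_GB = 8.0   # hypothetical reserve for KV cache and runtime; an assumption


def weight_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_b * bits_per_weight / 8


def fits(params_b: float, bits_per_weight: float,
         memory_gb: float = MEMORY_GB, headroom_gb: float = HEADROOM_GB) -> bool:
    """True if the weights plus the assumed headroom fit in memory."""
    return weight_gb(params_b, bits_per_weight) + headroom_gb <= memory_gb


if __name__ == "__main__":
    for name, params_b in MODELS_B.items():
        for bits in (16, 8, 4):
            size = weight_gb(params_b, bits)
            verdict = "fits" if fits(params_b, bits) else "does not fit"
            print(f"{name:18s} {bits:2d}-bit: {size:6.1f} GB -> {verdict}")
```

Under these assumptions, Qwen3-235B fits only at 4 bits per weight, where its weights take about 117.5 GB, which leaves little room for long contexts.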

Why AMD Chose Qwen3-235B

The model contains 235 billion parameters and uses a Mixture-of-Experts architecture that activates only a fraction of them during inference. That design allows the model to deliver stronger reasoning performance without requiring the full computational cost of a dense 235B model.
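
A rough way to see the effect of that design is to compare per-token compute for the active parameters against a dense model of the same total size. The sketch below assumes 22 billion active parameters, a figure that does not appear in this article and is read off the model's full name, Qwen3-235B-A22B. It also uses the common rule of thumb of about two floating-point operations per active weight per token.

```python
TOTAL_PARAMS = 235e9    # total parameters, as stated above
ACTIVE_PARAMS = 22e9    # assumption: from the "A22B" suffix of Qwen3-235B-A22B


def active_fraction(total: float, active: float) -> float:
    """Share of the parameters that take part in computing a single token."""
    return active / total


def flops_per_token(params: float) -> float:
    """Rule-of-thumb forward-pass cost: one multiply and one add per weight."""
    return 2.0 * params


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    moe = flops_per_token(ACTIVE_PARAMS)
    dense = flops_per_token(TOTAL_PARAMS)
    print(f"Active share per token: {frac:.1%}")
    print(f"MoE compute per token:   {moe:.2e} FLOPs")
    print(f"Dense compute per token: {dense:.2e} FLOPs ({dense / moe:.1f}x more)")
```

The saving applies to compute, not capacity: all 235 billion parameters still have to sit in memory, because any expert may be selected for the next token.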

Benchmark results place it among the strongest publicly available models.

Benchmark	Qwen3-235B	OpenAI o1	DeepSeek-R1	Gemini 2.5 Pro
ArenaHard	95.6	92.1	93.2	96.4
AIME 2024	85.7	74.3	79.8	92.0
AIME 2025	81.5	79.2	70.0	86.7
LiveCodeBench	70.7	63.9	64.3	70.4
CodeForces (Elo)	2056	1891	2029	2001

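In this comparison, Qwen3-235B posts the highest scores on LiveCodeBench and CodeForces and trails only Gemini 2.5 Pro on ArenaHard and both AIME sets. AMD's demo was therefore built around a model capable of competing with leading proprietary systems in reasoning, mathematics, and coding tasks.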

The Hardware Behind the System

The platform is powered by the Ryzen AI Max+ 395:

16 Zen 5 CPU cores
32 threads
Up to 5.1 GHz boost clock
Radeon 8060S graphics
40 RDNA 3.5 compute units
50 TOPS NPU performance
Up to 126 TOPS total AI performance
55W–120W TDP

Unlike traditional AI workstations, the design relies on unified memory shared across CPU, GPU, and NPU resources. That approach reduces the need for large amounts of dedicated GPU memory, which is often the limiting factor when deploying large models locally.

What This Means for Local AI

The more interesting part of AMD's announcement is not the processor itself. It is the idea that large reasoning models are becoming constrained by memory architecture rather than raw GPU performance.

For developers, researchers, and companies running private workloads, a 128GB local system changes what can be deployed without cloud infrastructure. The remaining question is performance. AMD has demonstrated that Qwen3-235B fits and runs on the platform, but real-world adoption will depend on inference speed, responsiveness, and cost relative to GPU-based alternatives.

The Next Test

The next benchmark to watch is not model quality but tokens per second. If systems based on Ryzen AI Max+ 395 can deliver usable inference speeds with models such as Qwen3-235B, DeepSeek-R1 120B, and Llama 4 109B, local AI deployment may become a practical alternative to cloud inference for a wider range of workloads.

At that point, the discussion shifts from whether large models can run on a desktop PC to whether they still need a data center at all.
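
A back-of-the-envelope ceiling on tokens per second can be sketched from memory bandwidth alone, since generating each token requires reading the active weights from memory at least once. Every input below is an assumption rather than a figure from AMD's announcement: 256 GB/s is a placeholder for the platform's memory bandwidth, 22 billion is the active-parameter count implied by the name Qwen3-235B-A22B, and 4 bits per weight is an illustrative quantization level. Measured speeds will be lower than this bound.

```python
def decode_tokens_per_second_upper_bound(bandwidth_gb_s: float,
                                         active_params_b: float,
                                         bits_per_weight: float) -> float:
    """Bandwidth-bound ceiling on single-stream decoding speed.

    Each generated token needs the active weights read once, so the ceiling is
    bandwidth divided by bytes read per token. KV-cache reads, compute time,
    and software overhead are ignored, so real throughput is lower.
    """
    gb_read_per_token = active_params_b * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


if __name__ == "__main__":
    BANDWIDTH_GB_S = 256.0  # placeholder; substitute a measured value for the system
    BITS = 4                # illustrative quantization level

    moe = decode_tokens_per_second_upper_bound(BANDWIDTH_GB_S, 22, BITS)
    dense = decode_tokens_per_second_upper_bound(BANDWIDTH_GB_S, 235, BITS)
    print(f"MoE, 22B active: <= {moe:.1f} tokens/s")
    print(f"Dense 235B:      <= {dense:.1f} tokens/s")
```

With these placeholder inputs the ceiling is roughly 23 tokens per second for the 22 billion active parameters, against about 2 for a hypothetical dense model of the same total size. That difference is why a Mixture-of-Experts model is a natural choice for a demo on a system where capacity is plentiful but bandwidth is limited.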

Marina Lyubimova is the acting editor at TheTradable.com with over eight years of experience in journalism, overseeing market news during UK trading hours.