What makes these launches notable isn't just their timing, but how they balance smarter reasoning with dramatically lower costs. For anyone building AI applications, these models represent a practical step forward in making powerful reasoning systems actually affordable to deploy.
What's New in These Models
Benchmarking firm Artificial Analysis recently called attention to DeepSeek's aggressive release schedule, with two new models—V3.1 Terminus and V3.2 Exp—dropping in September just weeks apart. V3.1 Terminus arrived as an upgraded reasoning model that improves on V3.1's performance without burning through more tokens. Benchmarks show it scored 4 points higher on the Artificial Analysis Intelligence Index while maintaining the same token efficiency. Essentially, you get better outputs without paying more per query.
V3.2 Exp takes a different approach with an experimental hybrid reasoning architecture. While it scores slightly below Terminus, it makes up for this by cutting token costs substantially. For large-scale applications where you're processing millions of queries, those savings add up fast. It's a deliberate trade-off: slightly less powerful, but much cheaper to run at scale.
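To see how that trade-off plays out, here's a back-of-the-envelope sketch. The query volume, tokens per query, and per-million-token prices below are all hypothetical placeholders (not DeepSeek's actual rates); the point is how a lower per-token price compounds at scale.

```python
# Illustrative cost arithmetic. All numbers below are hypothetical,
# chosen only to show how per-token pricing compounds at volume.

def monthly_cost(queries: int, tokens_per_query: int,
                 price_per_million_tokens: float) -> float:
    """Total monthly spend at a flat per-token rate."""
    total_tokens = queries * tokens_per_query
    return total_tokens / 1_000_000 * price_per_million_tokens

QUERIES = 5_000_000        # hypothetical monthly query volume
TOKENS_PER_QUERY = 1_200   # hypothetical average tokens (input + output)

# Hypothetical prices: a pricier "stronger" model vs. a cheaper one.
cost_stronger = monthly_cost(QUERIES, TOKENS_PER_QUERY, price_per_million_tokens=1.00)
cost_cheaper = monthly_cost(QUERIES, TOKENS_PER_QUERY, price_per_million_tokens=0.30)

print(f"stronger model: ${cost_stronger:,.0f}/mo")
print(f"cheaper model:  ${cost_cheaper:,.0f}/mo")
print(f"savings:        ${cost_stronger - cost_cheaper:,.0f}/mo")
```

At these illustrative numbers, the cheaper model saves thousands of dollars a month on identical traffic, which is why a small accuracy dip can be worth it for high-volume workloads.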
- V3.1 Terminus: Available through SambaNova, DeepInfra, Fireworks, GMI, and Novita. SambaNova is particularly impressive here, pushing around 250 tokens per second—roughly 10x faster than DeepSeek's own servers.
- V3.2 Exp: Running on DeepSeek's API, DeepInfra, GMI, and Novita, with DeepInfra leading at 79 tokens per second.
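Those throughput numbers translate directly into user-facing latency. Here's a rough estimate of streaming time for a long response; only the 250 and 79 tokens-per-second figures come from the benchmarks above, and the 2,000-token response length is an assumed example (real latency also includes network and prefill overhead, ignored here).

```python
# Back-of-the-envelope latency from quoted throughput figures.
# Ignores network round-trips and prompt-processing (prefill) time.

def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Seconds to stream a response at a steady decode rate."""
    return output_tokens / tokens_per_second

OUTPUT_TOKENS = 2_000  # hypothetical long reasoning response

for provider, tps in [("SambaNova (V3.1 Terminus)", 250),
                      ("DeepInfra (V3.2 Exp)", 79)]:
    secs = generation_seconds(OUTPUT_TOKENS, tps)
    print(f"{provider}: ~{secs:.1f}s for {OUTPUT_TOKENS} tokens")
```

A ~3x throughput gap means the difference between an 8-second and a 25-second wait for the same response length, which matters for anything interactive.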
Having multiple providers means you're not locked into one platform and can shop around for the best pricing and performance for your specific needs.
DeepSeek is showing how competitive the reasoning model space has become. While OpenAI, Anthropic, and Google dominate headlines with their multimodal systems, DeepSeek is focusing on something more pragmatic: efficiency, cost reduction, and deployment speed. Lower per-query costs make it viable to run AI reasoning at scale without blowing your budget. Higher accuracy scores mean fewer errors in applications where decisions actually matter. And wide platform support gives developers flexibility in how and where they deploy.