DeepSeek-V3: The Commercial Path of Chinese Large Models, From a Compute Arms Race to Maximizing Model Efficiency
China is pursuing a “scale + efficiency” growth model in AI
Recently, DeepSeek, the AI lab backed by High-Flyer Quant (幻方量化), released the DeepSeek-V3 model, attracting considerable attention. Early discussion focused on its performance, which approaches that of closed-source models such as GPT-4o and Claude-3.5-Sonnet, while allegedly requiring only about one-tenth the training cost of similarly capable models. According to reports, DeepSeek-V3 was trained on just 2,048 H800 GPUs in under two months, at a total cost below $6 million; by comparison, Llama 3.1 405B consumed 16,000 H100 GPUs over 80 days. The gap has led many to ask: "Has the computational demand for large models been overstated?"
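As a sanity check on those numbers, the back-of-envelope comparison below converts both runs into GPU-hours. The GPU counts and the 80-day Llama figure come from the reports cited above; the 57-day duration (roughly "under two months") and the $2-per-GPU-hour rental rate are assumptions chosen for illustration, not official numbers.

```python
# Back-of-envelope check on the reported training figures.
# Assumed: 57-day DeepSeek-V3 run, $2/GPU-hour rental rate.
deepseek_gpu_hours = 2048 * 57 * 24    # H800s, "under two months" -> ~2.8M GPU-hours
llama_gpu_hours = 16_000 * 80 * 24     # H100s over 80 days        -> ~30.7M GPU-hours

rate_usd = 2.0  # assumed $ per GPU-hour
print(f"DeepSeek-V3: {deepseek_gpu_hours/1e6:.1f}M GPU-hours "
      f"~= ${deepseek_gpu_hours*rate_usd/1e6:.1f}M")
print(f"Llama 3.1 405B: {llama_gpu_hours/1e6:.1f}M GPU-hours "
      f"({llama_gpu_hours/deepseek_gpu_hours:.0f}x the GPU-hours)")
```

Under these assumptions the arithmetic lands at roughly $5.6 million and an eleven-fold gap in GPU-hours, which is consistent with both the "under $6 million" claim and the "one-tenth the cost" framing.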
I. Entering Later Doesn't Necessarily Mean "Less Compute"
From a broader perspective, it’s indeed true that the compute needed for each new generation of models tends to drop over time due to algorithmic improvements, hardware cost reductions, and data distillation. These factors allow newer models to avoid the pitfalls of their predecessors—like traversing a maze where the first explorers map out a path, and subsequent followers simply enjoy a shortcut.
However, some expenses are not captured in the headline figure. According to High-Flyer Quant, the reported training budget for DeepSeek-V3 covers only the formal training run; it excludes the resources devoted to data preparation, architecture experiments, and the use of the R1 model (analogous to OpenAI's o1) to generate high-quality data. The post-training knowledge distillation from R1 into V3 also consumed substantial compute; it simply is not reflected in the headline cost.
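For readers unfamiliar with the term, below is a minimal sketch of what logit-level knowledge distillation looks like, assuming a PyTorch environment. It is the classic soft-label formulation (Hinton et al., 2015) and is purely illustrative: DeepSeek's own R1-to-V3 transfer reportedly worked through teacher-generated training data rather than this exact loss.

```python
# Generic soft-label distillation loss; illustrative only,
# NOT DeepSeek's actual R1-to-V3 recipe.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL(teacher || student) on temperature-softened distributions."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # The t^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * t * t

# Usage sketch: logits over a hypothetical 32k-token vocabulary.
student = torch.randn(8, 32_000)
teacher = torch.randn(8, 32_000)
loss = distillation_loss(student, teacher)
```

Whichever form it takes, every teacher forward pass in such a pipeline is GPU time spent outside the "formal training" budget, which is the accounting point made above.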
II. Compute Hasn’t Really Decreased; It’s Just Shifted Elsewhere
Historically, model improvements hinged on large-scale pre-training: feeding in ever more data and parameters, effectively "scaling up." As returns from that approach have diminished, however, many teams now channel compute toward more targeted uses, such as higher-quality synthetic data, more extensive reinforcement-learning pipelines, and inference-time optimization.
Estimates suggest that boosting the reasoning abilities of models on par with GPT-4 or Claude-3.5 may require generating 1–10 TB of high-quality synthetic reasoning data, potentially costing billions of dollars. In other words, the GPUs once devoted purely to scaling up are now spread across multiple stages (synthetic data generation, post-training, and test-time inference), all aimed at overcoming limits in data quality and efficiency.
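To make the "billions of dollars" figure concrete, here is a deliberately rough cost sketch. Every parameter in it (bytes per token, the rejection ratio, the per-token generation price) is an assumption chosen for illustration, not a figure from DeepSeek or from the estimates cited above.

```python
# Rough, heavily-assumed cost model for synthetic reasoning data.
# All parameters below are illustrative assumptions.
bytes_per_token = 4        # ~4 bytes of text per token, a common rule of thumb
kept_tb = 10               # upper end of the 1-10 TB range cited above
kept_tokens = kept_tb * 1e12 / bytes_per_token   # ~2.5e12 tokens retained

rejection_ratio = 100      # assume ~100 candidate tokens generated per token kept
generated_tokens = kept_tokens * rejection_ratio

cost_per_million = 10.0    # assumed $ per 1M generated tokens (inference + scoring)
total_usd = generated_tokens / 1e6 * cost_per_million
print(f"~${total_usd/1e9:.1f}B to produce {kept_tb} TB of filtered reasoning data")
```

With these assumptions the total comes to about $2.5 billion; the point is not the exact number but that aggressive filtering multiplies the raw generation cost by orders of magnitude.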
III. Major Companies Haven’t Stopped Ramping Up GPU Purchases
Figures compiled by various media outlets and community analysts (for example, on LessWrong) indicate that leading tech firms are not cutting back on GPU investment; quite the opposite. Microsoft, Google, Meta, Amazon, and even xAI are aggressively acquiring or provisioning H100-class GPUs. "Burning compute" remains essential for iterating on large models, and as the major players build revenue models around AI, lavish spending becomes ever easier to justify.
IV. Smaller Teams Can Still Deliver Surprising Results
DeepSeek-V3 demonstrates a crucial takeaway: You don’t need a mountain of GPUs to produce a model that rivals top-tier giants. If you refine technical details and engineering methodologies, an organization with relatively moderate resources can still achieve impressive outcomes. As Kai-Fu Lee has often pointed out, China’s AI advantage typically lies in “good, fast, affordable” engineering, rather than boundless spending.
Admittedly, practical hurdles remain: China's supply of high-end GPUs may be limited, and overseas closed-source models still hold a considerable lead at the training frontier. Nonetheless, constraints on training do not weigh equally on inference, and real-world deployment turns on complex business, cost, and operational considerations, which hardware restrictions alone cannot completely halt.
V. Diverging Paths: Potentially Different Tracks for U.S. and China
Looking ahead, the U.S. and China could diverge onto two distinct AI tracks:
United States: Capitalizes on near-unlimited resources to push the frontiers of large-model breakthroughs, anchoring a robust enterprise SaaS ecosystem for stable returns.
China: Focuses on finding the ideal balance between cost and efficiency, leveraging its vast user base to fuel model applications—effectively pursuing a “scale + efficiency” growth model.
Over the past decade, the U.S. pivoted to enterprise software after the mobile internet wave, producing a host of thriving SaaS companies, while China’s consumer-facing internet soared with platforms like Meituan and Pinduoduo. Now, “AI industrialization” could follow a similarly bifurcated path—American efforts may concentrate on ever more powerful large models, while China explores large-scale commercial implementations.
A closing note from hands-on testing: I tried the model briefly today. It is very powerful, but its terms-and-conditions policy places heavy restrictions on use for news research and related writing.