AI Chip Wars: Biggest winners in the move from training to inference
This TechTV panel discussion on the “AI chip wars” explores who will power the AI future amid intense competition and a shifting technology landscape.
TechTV host Bill Mew starts by framing the market: Nvidia dominates with GPUs but faces pressure from hyperscalers (Google, AWS) building their own chips, a resurgent AMD and Intel, and startups such as Cerebras, SambaNova, Groq, Tenstorrent, d-Matrix, and Untether AI.
The difference between training and inference
Training is the process of building a model — feeding it large amounts of data and adjusting millions (or billions) of internal parameters (weights) so it learns patterns. It’s computationally intensive, runs once (or periodically), and requires massive hardware like GPU clusters. Think of it as education.
Inference is using the trained model to generate outputs from new inputs — a question gets an answer, an image gets a caption. It’s much lighter computationally than training, happens in real time, and is what end users actually experience. Think of it as applying knowledge.
Key differences at a glance:
| | Training | Inference |
|---|---|---|
| Purpose | Build the model | Use the model |
| Compute cost | Very high | Lower |
| Frequency | Occasional | Continuous / real-time |
| Data flow | Input → adjust weights | Input → output |
| Hardware | GPU clusters | GPUs, CPUs, or edge devices |
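The education-versus-application analogy can be made concrete with a toy one-parameter model. This is an illustrative sketch, not code from the panel: training loops over the data adjusting a weight, while inference is a single cheap forward pass with that weight frozen.

```python
# A minimal sketch of training vs. inference using a one-parameter
# linear model, y = w * x. Data and hyperparameters are illustrative.

def train(xs, ys, lr=0.01, epochs=200):
    """Training: repeatedly adjust the weight to reduce error."""
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            pred = w * x
            grad = 2 * (pred - y) * x   # gradient of squared error
            w -= lr * grad              # nudge the weight downhill
    return w

def infer(w, x):
    """Inference: one forward pass with frozen weights."""
    return w * x

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # underlying rule: y = 2x
w = train(xs, ys)             # computationally heavy, done once
answer = infer(w, 10.0)      # cheap, done per user request
```

Real models have billions of weights instead of one, but the asymmetry is the same: the loop runs once, the forward pass runs on every query.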
Our expert guests then weigh in: Andrew Chien (professor of computer science at the University of Chicago, ex-Intel), Michael Cation (founder and chairman of Fractal), Sarah Osentoski (CTO and founder of Vinci), Wes Hook (director of customer success at Fractal), and later Aidong Xu (head of semiconductor at Cambridge Consultants).
Andrew explains that Nvidia’s dominance came from mastering dense linear algebra and pioneering low-precision math for training. Despite expectations that rates would fall as the hardware depreciated, GPU rental prices for H100/H200 chips have surged back above $2/hour, reflecting demand that far outstrips supply.
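Why low-precision math matters is easy to demonstrate: halving the width of each number halves the memory traffic, at the cost of rounding error that training can usually tolerate. A small NumPy sketch (sizes and tolerances illustrative):

```python
import numpy as np

np.random.seed(0)

# Pretend these are model weights stored in standard 32-bit floats.
weights = np.random.randn(1000).astype(np.float32)
half = weights.astype(np.float16)     # low-precision copy

# Halving the format halves the bytes that must move through memory.
print(weights.nbytes, half.nbytes)    # 4000 vs 2000 bytes

# The cost: float16 keeps only ~3 decimal digits, so tiny increments
# vanish entirely...
print(np.float16(1.0) + np.float16(1e-4) == np.float16(1.0))  # True

# ...but the per-weight rounding error stays small relative to
# typical weight magnitudes.
print(float(np.max(np.abs(weights - half))) < 1e-2)           # True
```

Modern training goes further still (bfloat16, FP8), but the bandwidth-for-precision trade is the same one sketched here.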
The conversation pivots to the major industry shift from training to inference. Michael argues real enterprise value comes from smaller, focused models running on specific data at the edge—meaning the massive centralized data center buildout may be overbuilt for what’s actually needed. Andrew counters that betting against computing growth has always failed; inference will likely add to, not replace, training demand, especially as agentic AI drives inference-heavy workloads (potentially 90% inference, 10% training).
Wes reports from the field that many enterprise AI implementations struggle at the inference/daily-use stage. Customers increasingly demand constraints on AI, prioritize cost efficiency, and—most importantly—worry about data sovereignty, wanting to keep sensitive records out of cloud LLMs that might scrape them.
Sarah highlights that hardware is becoming harder to design as traditional manufacturing rules of thumb break down—chips warp, overheat, and struggle with power. Her company uses AI foundation models for physics to help hardware designers build better chips. She emphasizes on-prem deployment driven by sovereignty concerns.
Michael stresses that I/O, not compute, is the real bottleneck. CPUs calculate lightning-fast but move data slowly. Organizing data to minimize movement has yielded customers performance gains of 10x to a million-fold, suggesting much current infrastructure may be unnecessary if software becomes more efficient.
Andrew explains the power crisis: Moore’s Law scaling has ended, so each new GPU generation burns dramatically more power—300W to 700W to 1200W to an announced 2300W. Hyperscale training centers now consume 1–5 gigawatts, forcing operators to build their own natural gas, solar, or nuclear power plants. This sets up a fundamental tension between centralized mega-facilities and distributed edge inference.
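The wattage figures Andrew cites translate into stark facility math. A quick back-of-envelope calculation, where the per-GPU numbers come from the discussion but the 20% cooling/networking overhead (PUE of roughly 1.2) is an assumption for illustration:

```python
# Per-GPU power draw across successive generations, as quoted above.
gpu_watts = [300, 700, 1200, 2300]

# Generation-over-generation growth in power per chip.
growth = [b / a for a, b in zip(gpu_watts, gpu_watts[1:])]
print([round(g, 2) for g in growth])   # roughly 1.7x-2.3x per step

# How many 1200 W GPUs can a 1 GW campus feed, assuming ~20%
# overhead for cooling and networking (illustrative PUE of 1.2)?
facility_watts = 1e9
overhead = 1.2
gpus_supported = facility_watts / (1200 * overhead)
print(int(gpus_supported))             # on the order of 700,000 GPUs
```

At 2300 W per chip the same campus feeds barely 360,000 GPUs, which is why each generation pushes operators toward dedicated power plants.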
Aidong Xu adds optimism: current inefficiency is partly because GPUs were repurposed for AI rather than designed for it. New architectures (memory-compute integration), new materials (photonics, optical processors, meta-materials), and supply-chain sovereignty concerns will drive more efficient solutions.
Closing thoughts converge on efficiency. Michael notes enterprise software stacks are spectacularly wasteful after decades of free hardware speedups; rewriting them thoughtfully lets Mac minis outperform server clusters. Sarah sees AI-designed hardware and hardware-aware AI software converging. The panel agrees the chip wars are far from settled—power, data movement, sovereignty, and new architectures will determine the winners.
-
Host: Bill Mew
TechTV Presenter
-
Guest: Aidong Xu
Head of Semiconductor at Cambridge Consultants
-
Guest: Andrew Chien
William Eckhardt Professor of Computer Science, University of Chicago
-
Guest: Michael Cation
Founder and Chairman of Fractal
-
Guest: Sarah Osentoski
CTO and Founder of Vinci
-
Guest: Wes Hook
Director of Customer Success at Fractal