AI Chip Wars: Biggest winners in the move from training to inference

This TechTV panel discussion on the “AI chip wars” explores who will power the AI future amid intense competition and a shifting technology landscape.

TechTV host Bill Mew starts by framing the market: Nvidia dominates with GPUs but faces pressure from hyperscalers (Google, AWS) building their own chips, a resurgent AMD and Intel, and a wave of startups including Cerebras, SambaNova, Groq, Tenstorrent, d-Matrix, and Untether AI.

The difference between training and inference

Training is the process of building a model — feeding it large amounts of data and adjusting millions (or billions) of internal parameters (weights) so it learns patterns. It’s computationally intensive, runs once (or periodically), and requires massive hardware like GPU clusters. Think of it as education.
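
To make the adjust-the-weights loop concrete, here is a deliberately tiny sketch in Python (all data and hyperparameters are invented for illustration): a single-weight linear model trained by gradient descent. Real training runs the same loop over billions of weights on GPU clusters.

```python
import numpy as np

# Toy "training": learn the single weight w of the model y = w * x
# by gradient descent on mean squared error.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=1000)
y = 3.0 * x + rng.normal(0, 0.1, size=1000)  # data generated with a "true" w = 3

w = 0.0        # the model's one weight, initialized arbitrarily
lr = 0.1       # learning rate
for epoch in range(100):                      # many passes over the data
    grad = np.mean(2 * (w * x - y) * x)       # gradient of MSE w.r.t. w
    w -= lr * grad                            # adjust the weight

print(f"learned w ≈ {w:.3f}")  # converges near 3.0
```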

Inference is using the trained model to generate outputs from new inputs — a question gets an answer, an image gets a caption. Each request is far cheaper computationally than a training run, happens in real time, and is what end users actually experience. Think of it as applying knowledge.
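
Inference with that trained weight, by contrast, is a single forward pass. A minimal sketch (the weight value is assumed from the toy example above):

```python
# Inference: one forward pass, no gradients, no weight updates.
w = 3.0                      # weight assumed learned during training

def predict(x_new: float) -> float:
    return w * x_new         # a single multiply: cheap and real-time

print(predict(0.5))          # -> 1.5
```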

Key differences at a glance:

|              | Training               | Inference                   |
|--------------|------------------------|-----------------------------|
| Purpose      | Build the model        | Use the model               |
| Compute cost | Very high              | Lower                       |
| Frequency    | Occasional             | Continuous / real-time      |
| Data flow    | Input → adjust weights | Input → output              |
| Hardware     | GPU clusters           | GPUs, CPUs, or edge devices |

Our expert guests then weigh in: Andrew Chien (University of Chicago, ex-Intel), Michael Chien (founder and chairman of Fractal), Sarah Osentoski (CTO and founder of Vinci), Wes Hook (director of customer success at Fractal), and later Aidong Xu, a semiconductor and photonics consultant.

Andrew explains that Nvidia’s dominance came from mastering dense linear algebra and pioneering low-precision math for training. Despite expectations that rental rates would fall as the hardware aged, H100/H200 GPU rental prices have surged back above $2/hour, reflecting demand that far outstrips supply.
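
As a rough illustration of why low precision matters (a generic NumPy sketch, not Nvidia's actual kernels): halving the bits per value cuts memory traffic and lets more multiply-accumulate units fit per chip, at the cost of some accuracy.

```python
import numpy as np

# The same matrix multiply in float64 and float16. Half precision stores
# 2 bytes per value instead of 8, cutting memory traffic 4x -- the kind
# of trade that low-precision training hardware exploits.
rng = np.random.default_rng(1)
a = rng.standard_normal((512, 512))
b = rng.standard_normal((512, 512))

ref = a @ b                                          # float64 reference
low = a.astype(np.float16) @ b.astype(np.float16)    # low-precision version

rel_err = np.abs(low - ref).max() / np.abs(ref).max()
print(f"bytes/value: 8 -> 2, max relative error: {rel_err:.1e}")
```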

The conversation pivots to the major industry shift from training to inference. Michael argues real enterprise value comes from smaller, focused models running on specific data at the edge—meaning the massive centralized data center buildout may be overbuilt for what’s actually needed. Andrew counters that betting against computing growth has always failed; inference will likely add to, not replace, training demand, especially as agentic AI drives inference-heavy workloads (potentially 90% inference, 10% training).

Wes reports from the field that many enterprise AI implementations struggle at the inference/daily-use stage. Customers increasingly demand constraints on AI, prioritize cost efficiency, and—most importantly—worry about data sovereignty, wanting to keep sensitive records out of cloud LLMs that might scrape them.

Sarah highlights that hardware is becoming harder to design as traditional manufacturing rules of thumb break down—chips warp, overheat, and struggle with power. Her company uses AI foundation models for physics to help hardware designers build better chips. She emphasizes on-prem deployment driven by sovereignty concerns.

Michael stresses that I/O, not compute, is the real bottleneck. CPUs calculate lightning-fast but move data slowly. Organizing data to minimize movement has yielded customers performance gains of 10x to a million-fold, suggesting much current infrastructure may be unnecessary if software becomes more efficient.
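
A micro-benchmark in the same spirit (a hypothetical NumPy sketch, not Fractal's method): the arithmetic below is identical in both cases; only the memory layout differs, and the strided version pays for data movement.

```python
import time
import numpy as np

# Identical arithmetic, different layout: summing 5,000 float64 values.
# The row is contiguous and streams through cache; the column is strided
# 40 KB apart, so the CPU stalls on memory rather than on compute.
a = np.ones((5_000, 5_000))            # C-ordered (row-major) by default
row, col = a[0, :], a[:, 0]

def bench(v, reps=2_000):
    t0 = time.perf_counter()
    for _ in range(reps):
        v.sum()
    return time.perf_counter() - t0

print(f"contiguous row: {bench(row):.4f}s   strided column: {bench(col):.4f}s")
```

The gap grows as the strides outrun the caches, which is the panel's point: reorganizing data often buys more than faster compute.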

Andrew explains the power crisis: Moore’s Law scaling has ended, so each new GPU generation burns dramatically more power—300W to 700W to 1200W to an announced 2300W. Hyperscale training centers now consume 1–5 gigawatts, forcing operators to build their own natural gas, solar, or nuclear power plants. This sets up a fundamental tension between centralized mega-facilities and distributed edge inference.
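
A quick back-of-envelope on those figures (straightforward arithmetic on the numbers quoted above):

```python
# Power growth per GPU generation and annual energy of a hyperscale campus,
# using only the figures quoted above.
gpu_watts = [300, 700, 1200, 2300]                 # successive generations (W)
growth = [f"{b / a:.1f}x" for a, b in zip(gpu_watts, gpu_watts[1:])]
print("generation-over-generation power growth:", growth)  # 2.3x, 1.7x, 1.9x

campus_gw = 1.0                                    # low end of the 1-5 GW range
print(f"{campus_gw:.0f} GW running year-round ≈ {campus_gw * 8760 / 1000:.2f} TWh")
# ≈ 8.76 TWh per year for the low end of the quoted range
```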

Aidong Xu adds optimism: current inefficiency is partly because GPUs were repurposed for AI rather than designed for it. New architectures (memory-compute integration), new materials (photonics, optical processors, meta-materials), and supply-chain sovereignty concerns will drive more efficient solutions.

Closing thoughts converge on efficiency. Michael notes enterprise software stacks are spectacularly wasteful after decades of free hardware speedups; rewriting them thoughtfully lets Mac minis outperform server clusters. Sarah sees AI-designed hardware and hardware-aware AI software converging. The panel agrees the chip wars are far from settled—power, data movement, sovereignty, and new architectures will determine the winners.

Panel guests

  • Aidong Xu, Head of Semiconductor at Cambridge Consultants

  • Andrew Chien, William Eckhardt Professor of Computer Science, University of Chicago

  • Michael Chien, Founder and Chairman of Fractal

  • Sarah Osentoski, CTO and Founder of Vinci

  • Wes Hook, Director of Customer Success at Fractal
