Consulting Call Transcript (August 18, 2025)
SemiconSam Original Report
I’ll start this call with Nomura’s analysis.
Nomura stated the following in a recent report:
Nvidia will occupy 60% of TSMC’s CoWoS capacity in 2026.
Originally, Nvidia was expected to account for less than 50% of CoWoS capacity in 2026.
However, Nvidia moved to make more aggressive reservations—why?
This is interpreted as an attempt to suppress ASIC production.
In other words, because TSMC’s CoWoS capacity is limited, Nvidia is effectively using cash to overbook it and block competitors from getting their chips produced. (It has been confirmed that Nvidia paid TSMC a considerable advance to secure a large share of CoWoS output.)
On top of that, remember how talk of Nvidia cutting CoWoS output in the first half of this year shook its share price?
This means Nvidia also overbooked CoWoS this year. Then, when it actually came time to place orders, Nvidia canceled part of its CoWoS reservations, and a flood of negative rumors erupted across the supply chain.
Naive investors fell for this, sold Nvidia, and incurred a fairly large opportunity cost.
Next year, Nvidia will surely cut CoWoS as well. Otherwise, chip supply could exceed demand, which could place too heavy an inventory burden on server ODMs.
That doesn’t mean Nvidia will lower prices when inventory increases.
When inventory increases, Nvidia chooses to spend money to eliminate inventory rather than cut prices.
Next, I’ll share my thoughts on Morgan Stanley’s TCO report.
Frankly speaking… judged purely as a report, one cannot deny that a great deal of effort went into it.
However, I see several flaws.
First, an equipment utilization rate (UTR) of 70% doesn’t make sense.
In the real market, unit economics vary with enterprise contracts, bundled services, and ad subsidies, and utilization varies widely—from below 50% to above 80% depending on workloads and SLAs.
Because the report locks itself into a single utilization scenario, its sensitivity analysis is lacking.
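To make that concrete, here is a minimal sketch of how per-token cost moves with utilization. Every figure in it (capex, opex, throughput) is an illustrative assumption of mine, not a number from the Morgan Stanley report.

```python
# Minimal sketch: how per-token cost moves with utilization (UTR).
# All figures are illustrative assumptions, not the report's numbers.

def cost_per_million_tokens(capex_usd, amort_years, opex_usd_per_hour,
                            peak_tokens_per_sec, utilization):
    """Amortized all-in cost per 1M generated tokens for one system."""
    hours_per_year = 24 * 365
    capex_per_hour = capex_usd / (amort_years * hours_per_year)
    cost_per_hour = capex_per_hour + opex_usd_per_hour
    tokens_per_hour = peak_tokens_per_sec * utilization * 3600
    return cost_per_hour / tokens_per_hour * 1_000_000

# Hypothetical rack: $3.5M capex, 4-year amortization, $90/hr opex,
# 300k tokens/s of peak serving throughput.
for utr in (0.50, 0.70, 0.80):
    c = cost_per_million_tokens(3_500_000, 4, 90, 300_000, utr)
    print(f"UTR {utr:.0%}: ${c:.3f} per 1M tokens")
```

Even in this toy model, unit cost at 50% utilization comes out roughly 60% higher than at 80%, which is exactly the kind of spread a single 70% scenario hides.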
Second, while the report applies MLPerf inference benchmarks with FP16 standardized for performance estimation, real-world inference differs substantially due to FP8/FP4, sparse compute, KV cache, and serving software optimizations.
Also, the assumption of fixing input/output tokens at 1024 fails to reflect realistic workload distributions such as short queries, tool calls, and RAG.
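As a rough illustration of why the fixed 1024/1024 assumption matters, the sketch below compares it with a hypothetical traffic mix; the request shares and token counts are assumptions I am making up for the example, not measured data.

```python
# Minimal sketch: fixed 1024-in/1024-out tokens vs. a mixed workload.
# The request mix below is an illustrative assumption, not measured traffic.

workloads = [
    # (share of requests, input tokens, output tokens)
    (0.50,   80,  200),   # short chat queries
    (0.30,  300,   60),   # tool/function calls
    (0.20, 4000,  400),   # RAG with long retrieved context
]

avg_in = sum(share * t_in for share, t_in, _ in workloads)
avg_out = sum(share * t_out for share, _, t_out in workloads)

print(f"Mixed workload average: {avg_in:.0f} input / {avg_out:.0f} output tokens")
print("Report assumption:       1024 input / 1024 output tokens")
```

With a mix like this, average request lengths land far from 1024/1024, and because prefill and decode stress the hardware differently, the fixed assumption can materially skew throughput and cost estimates.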
The report’s premise of linear scaling (8× rack expansion ⇒ 8× TPS) likewise appears to be an optimistic estimate that underestimates network, synchronization, and scheduling overheads.
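On the linear-scaling point, here is a minimal Amdahl-style sketch of what even a small non-parallelizable overhead does to aggregate throughput; the 5% overhead fraction and the single-rack TPS figure are purely illustrative assumptions.

```python
# Minimal sketch: why 8x racks rarely means 8x TPS.
# Amdahl-style model treating network/sync/scheduling overhead as a
# non-parallelizable fraction; the 5% figure is an illustrative assumption.

def scaled_tps(tps_single_rack, n_racks, overhead_fraction=0.05):
    """Aggregate TPS when a fraction of the work does not parallelize
    across racks (synchronization, scheduling, network)."""
    speedup = 1.0 / (overhead_fraction + (1.0 - overhead_fraction) / n_racks)
    return tps_single_rack * speedup

tps_1 = 300_000  # hypothetical single-rack TPS
for n in (1, 2, 4, 8):
    print(f"{n} rack(s): {scaled_tps(tps_1, n):,.0f} TPS "
          f"(linear would be {tps_1 * n:,.0f})")
```

With just 5% overhead, 8 racks deliver roughly 5.9x rather than 8x the single-rack throughput in this toy model, so an 8x TPS premise sits at the optimistic end.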
Third, it applies 4-year depreciation for hardware and 10-year depreciation for data center infrastructure. In reality, AI accelerator generation cycles are 12–18 months, so residual value declines much faster.
In addition, grid build-out, cooling infrastructure, and land/regulatory costs vary greatly by region, so the assumption that 100MW = $1 billion is a highly simplified one.
Operating costs such as personnel, software licenses, security/compliance, and network egress are largely excluded, so there’s a high likelihood that TCO has been systematically understated.
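To illustrate the depreciation point, the sketch below contrasts 4-year straight-line book value with a faster, generation-driven decline in economic value; the 15-month generation cycle and 50% per-generation value drop are illustrative assumptions, not data.

```python
# Minimal sketch: 4-year straight-line book value vs. a faster decline
# tied to accelerator generation cycles. The 15-month cycle and 50%
# per-generation value drop are illustrative assumptions.

def straight_line_value(cost, life_years, t_years):
    """Book value under straight-line depreciation."""
    return max(cost * (1.0 - t_years / life_years), 0.0)

def generation_decay_value(cost, t_years, cycle_years=1.25, drop_per_gen=0.50):
    """Value if hardware loses `drop_per_gen` of its value each accelerator
    generation, with one generation every `cycle_years`."""
    generations = t_years / cycle_years
    return cost * (1.0 - drop_per_gen) ** generations

cost = 1_000_000  # hypothetical accelerator fleet cost
for t in (1, 2, 3, 4):
    book = straight_line_value(cost, 4, t)
    market = generation_decay_value(cost, t)
    print(f"Year {t}: straight-line book ${book:,.0f} vs. "
          f"generation-adjusted ${market:,.0f}")
```

Under assumptions like these, economic value in years one through three sits well below straight-line book value, which is the gap the report's 4-year schedule glosses over.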
Fourth, it views the value of ASIC chips through an overly distorted lens.
By treating structures with very different R&D, IP reuse, and software-bundle value as homogeneous, such as merchant GPU vendors like Nvidia versus in-house ASICs at AWS and Google, the report ends up distorting the actual price signals.
In conclusion, it’s an analytical attempt worth referencing, but applying it to reality requires far more granular, multi-dimensional modeling.
This report can be seen as a meaningful first step toward quantifying the cost structure of an AI inference factory.
However, by simplifying or omitting multiple factors—price, utilization, performance, depreciation, operating costs, cooling, and supply chain constraints—it’s risky to use as-is for real investment decision-making.

