Training vs Inference — Your Two Biggest Cost Lines
Training is upfront CapEx; inference is per-user OpEx. Confusing them destroys unit economics and kills Series A companies that scale before the spreadsheet says they can.
Key takeaway
Training buys capability once. Inference rents capability on every click. Founders who model inference cost per user before launch survive viral growth; founders who do not get a bridge-round conversation they did not plan for.
Training cost
The one-time capital expenditure — and why most startups should never pay it
Key takeaway
Training is the massive offline investment that produces a frozen model: GPU clusters, months of calendar time, and irrecoverable spend if the run fails. It is R&D CapEx, not a feature launch. Almost every startup should rent pre-trained models instead of funding pre-training.
Why this matters for you
Boards and investors will ask why you are not 'building your own AI.' The answer is arithmetic: frontier training runs cost eight to nine figures. Founders who cannot articulate this misallocate capital and approve projects that consume a Series A without producing a deployable product.
Training is what happens before any user sees your product. Clusters of GPUs run for weeks or months, processing trillions of tokens or millions of images, adjusting weights through billions of forward-backward cycles. You are paying to manufacture the asset; you have not yet paid to operate it.
Inference cost
The recurring operational expenditure on every user click
Key takeaway
Inference is the per-query cost of running the frozen model: input tokens processed, output tokens generated, GPUs rented by the millisecond. It is COGS. Every user interaction spends money; unlike traditional SaaS, the marginal cost per user is not near zero.
Why this matters for you
Your gross margin is set at inference time, not at fundraising time. Founders who launch AI features without a per-query cost model discover at 10× usage that their best customers are their least profitable, often during the same quarter they are pitching Series A.
Inference is a forward pass: data in, prediction out, weights unchanged. The model does not learn during inference. It applies the weights produced by training to calculate the next token, classification, or embedding. Training is a bonfire you light once. Inference is a meter that runs forever.
The unit economics trap
Technically impressive and financially catastrophic — at the same time
Key takeaway
The unit economics trap is shipping an AI feature customers love whose inference cost per user exceeds the revenue that user generates. Viral adoption makes this worse, not better — there are no economies of scale in the forward pass.
Why this matters for you
Series A investors will model your gross margin. If your AI feature drags blended margin below 50% with no path to improvement, you are a services business wearing a SaaS multiple. Founders who catch this pre-launch keep fundraising optionality; founders who catch it post-viral spike negotiate from weakness.
Traditional SaaS unit economics assume near-zero marginal cost per user. AI breaks that assumption permanently. Each generation requires fresh compute proportional to tokens processed. Caching helps only when queries repeat identically, which is rare in conversational or personalised products. Your LTV:CAC ratio means nothing if gross margin on the AI SKU is negative.
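The trap described above can be made concrete with a small sketch. All figures here (a $20/month plan, $0.03 all-in cost per query, 100 versus 1,000 queries per month) are hypothetical assumptions chosen for illustration, not numbers from any vendor.

```python
def monthly_user_economics(revenue_per_user, queries_per_user, cost_per_query):
    """Return (inference COGS, gross margin as a fraction of revenue) for one user-month."""
    cogs = queries_per_user * cost_per_query
    margin = (revenue_per_user - cogs) / revenue_per_user
    return cogs, margin


# Hypothetical inputs: a $20/month plan and $0.03 all-in cost per query.
PLAN_PRICE = 20.0
COST_PER_QUERY = 0.03

light_cogs, light_margin = monthly_user_economics(PLAN_PRICE, 100, COST_PER_QUERY)
heavy_cogs, heavy_margin = monthly_user_economics(PLAN_PRICE, 1000, COST_PER_QUERY)

print(f"light user: COGS ${light_cogs:.2f}, gross margin {light_margin:.0%}")
print(f"heavy user: COGS ${heavy_cogs:.2f}, gross margin {heavy_margin:.0%}")
```

On these assumed numbers the heavy user costs more to serve than the plan earns, so the most engaged customer is the one with negative gross margin.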
API dependency vs model ownership
The strategic tradeoff you must choose — not stumble into
Key takeaway
API dependency means paying retail per-token prices with zero infrastructure overhead. Model ownership means renting GPUs 24/7 and hiring ML ops — but capturing wholesale inference margins at scale. Neither is universally correct; the answer is a function of volume, capital, and control requirements.
Why this matters for you
Founders who default to APIs without a migration thesis pay forever. Founders who self-host too early pay idle GPU bills while product-market fit is still uncertain. The decision should be modelled, dated, and revisited at revenue milestones, not inherited from engineering preference.
APIs trade margin for speed and optionality. You ship in days, scale instantly, and swap models when vendors release improvements. You pay a markup that funds the vendor's training CapEx and profit. APIs are the correct seed-stage default for most products.
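One way to date the migration decision is a break-even estimate: the monthly token volume at which the per-token saving from self-hosting covers its fixed costs. The sketch below is a simplification under stated assumptions; the API price, the fixed monthly cost (GPUs plus ML ops salaries), and the self-hosted marginal cost are all hypothetical placeholders.

```python
def breakeven_mtok_per_month(api_price_per_mtok, selfhost_fixed_monthly, selfhost_price_per_mtok):
    """Monthly volume, in millions of tokens, above which self-hosting is cheaper.

    Returns None when self-hosting saves nothing per token, so it never breaks even.
    """
    saving_per_mtok = api_price_per_mtok - selfhost_price_per_mtok
    if saving_per_mtok <= 0:
        return None
    return selfhost_fixed_monthly / saving_per_mtok


# Hypothetical inputs: $10 per million tokens via API, $40,000/month fixed
# self-hosting cost, $2 per million tokens marginal cost when self-hosted.
volume = breakeven_mtok_per_month(10.0, 40_000.0, 2.0)
print(f"break-even at {volume:,.0f} million tokens per month")
```

Below that volume the API markup is cheaper than idle GPUs; above it, ownership starts to capture the margin. The estimate ignores utilisation, latency, and quality differences, which a real model would need to include.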
Inference cost at scale
The numbers that kill Series A companies between 1,000 and 100,000 users
Key takeaway
Going from 1,000 to 100,000 users can multiply inference spend 100× while revenue per user stays flat. Auto-regressive generation makes long outputs disproportionately expensive. There is no volume discount on the forward pass.
Why this matters for you
Series A decks show hockey-stick revenue. They often omit hockey-stick COGS. Founders who model inference at 10× and 100× current usage avoid the emergency bridge round caused by a viral feature that loses money on every click.
Auto-regressive generation makes output length a cost multiplier. Each generated token requires another forward pass through the model. A 2,000-token report therefore needs roughly twice the sequential compute of a 1,000-token report, and output tokens are typically priced at a premium over input tokens. Default verbosity in your product is a COGS policy, not a UX accident.
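A rough sketch of how output length and user count combine, assuming hypothetical prices of $3 per million input tokens and $15 per million output tokens, a 500-token prompt, and 50 queries per user per month. None of these figures come from a real price list.

```python
IN_PRICE_PER_MTOK = 3.0    # hypothetical input price, USD per million tokens
OUT_PRICE_PER_MTOK = 15.0  # hypothetical output price, USD per million tokens


def cost_per_query(input_tokens, output_tokens):
    """Model cost in USD for one query at the assumed prices."""
    return (input_tokens * IN_PRICE_PER_MTOK + output_tokens * OUT_PRICE_PER_MTOK) / 1_000_000


def monthly_cost(users, queries_per_user, per_query_cost):
    """Total monthly inference spend; linear in users, with no volume discount assumed."""
    return users * queries_per_user * per_query_cost


short_report = cost_per_query(500, 1_000)  # 1,000-token output
long_report = cost_per_query(500, 2_000)   # 2,000-token output

for users in (1_000, 10_000, 100_000):
    short_total = monthly_cost(users, 50, short_report)
    long_total = monthly_cost(users, 50, long_report)
    print(f"{users:>7} users: short ${short_total:,.0f}/month, long ${long_total:,.0f}/month")
```

Under these assumptions, spend grows in lockstep with users, and doubling the default output length nearly doubles the bill at every scale.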
Cost optimisation levers
Caching, routing, compression, and smaller models — before you raise a bridge round
Key takeaway
Inference cost is not fixed. Caching, model routing, prompt compression, batching, and distillation can cut COGS 40–80% without killing the feature — if you invest before crisis, not during it.
Why this matters for you
Investors prefer founders who show margin discipline proactively. A bridge round to 'fix unit economics' signals you shipped without understanding the business model. These levers are cheaper than emergency fundraising.
Model routing sends easy queries to cheap models and hard queries to expensive ones. A classifier or small model can handle FAQ routing, extraction, or simple summarisation. Reserve frontier models for multi-step reasoning. Ask engineering for a routing architecture in the MVP spec, not the Series A retrofit.
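A minimal routing sketch, to show the shape of the idea. It uses a crude keyword and length heuristic in place of the classifier or small model mentioned above; the tier names, hint list, word threshold, and per-query costs are all hypothetical.

```python
CHEAP, FRONTIER = "small-model", "frontier-model"  # hypothetical tier names

# Crude substring hints that a query may need multi-step reasoning (assumption).
REASONING_HINTS = ("why", "compare", "plan", "step by step", "trade-off")


def route(query: str, max_cheap_words: int = 40) -> str:
    """Send long or reasoning-heavy queries to the frontier tier, the rest to the cheap tier."""
    text = query.lower()
    needs_reasoning = any(hint in text for hint in REASONING_HINTS)
    is_long = len(text.split()) > max_cheap_words
    return FRONTIER if needs_reasoning or is_long else CHEAP


def blended_cost(share_cheap, cheap_cost, frontier_cost):
    """Average cost per query when a given share of traffic goes to the cheap tier."""
    return share_cheap * cheap_cost + (1 - share_cheap) * frontier_cost


print(route("What are your opening hours?"))
print(route("Compare these two vendor contracts and recommend one"))
# Hypothetical costs: $0.001 per cheap query, $0.03 per frontier query, 80% routed cheap.
print(f"blended cost per query: ${blended_cost(0.8, 0.001, 0.03):.4f}")
```

A production router would replace the keyword check with a trained classifier and measure answer quality per tier, but the cost arithmetic stays the same: the blended cost depends on the share of traffic the cheap tier can absorb.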
The inference cost conversation with your CTO
Five questions every founder should ask — and understand the answers to
Key takeaway
You do not need to write CUDA kernels. You need to ask five questions about every AI feature and understand whether the answers fit your margin model: cost per query, p95 latency, model tier, trigger logic, and 10× scale projection.
Why this matters for you
CTOs optimise for capability and reliability by default. Founders optimise for survival and margin. The inference cost conversation aligns both, or surfaces that you need a different technical lead for this phase of the company.
Question 1: What is all-in cost per query at median and p95 usage? Include input tokens, output tokens, retrieval, tool calls, and orchestration overhead. Median tells you pricing; p95 tells you tail risk. Do not accept 'it depends' without a worked example using real pilot data.
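A worked example for Question 1 might look like the sketch below. The pilot log is invented for illustration, the token prices are hypothetical, and `overhead_usd` is an assumed field that lumps together retrieval, tool calls, and orchestration. The p95 uses a simple nearest-rank definition; other percentile conventions give slightly different values.

```python
import math
from statistics import median

IN_PRICE_PER_MTOK = 3.0    # hypothetical, USD per million input tokens
OUT_PRICE_PER_MTOK = 15.0  # hypothetical, USD per million output tokens


def all_in_cost(q):
    """Model tokens plus non-model overhead (retrieval, tools, orchestration) in USD."""
    model = (q["input_tokens"] * IN_PRICE_PER_MTOK
             + q["output_tokens"] * OUT_PRICE_PER_MTOK) / 1_000_000
    return model + q["overhead_usd"]


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Invented pilot log: 18 typical queries and 2 heavy ones.
pilot = (
    [{"input_tokens": 1_000, "output_tokens": 200, "overhead_usd": 0.001}] * 18
    + [{"input_tokens": 4_000, "output_tokens": 1_500, "overhead_usd": 0.005}]
    + [{"input_tokens": 20_000, "output_tokens": 4_000, "overhead_usd": 0.02}]
)

costs = [all_in_cost(q) for q in pilot]
median_cost = median(costs)
p95_cost = percentile(costs, 95)
print(f"median ${median_cost:.4f} per query, p95 ${p95_cost:.4f} per query")
```

In this invented log the p95 query costs several times the median, which is the gap between what pricing assumes and what the tail can cost.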
Founder decision lens
Building your AI cost model before investors ask for it
Key takeaway
Before you sign any AI infrastructure contract, build a spreadsheet: cost per query × queries per user × users × model price trajectory. Include training/fine-tuning as one-time rows and inference as monthly rows. Update it when vendors change pricing.
Why this matters for you
Investors will ask for AI unit economics in Series A diligence. Customers will ask for predictability in enterprise contracts. Your future self will ask why you launched without caps. The spreadsheet is the founder tool that prevents all three conversations from becoming surprises.
Start with a single feature and model it honestly. Inputs: average prompt tokens, average output tokens, model price per million tokens, queries per user per day, expected MAU. If COGS exceeds 30% of ARPU at expected scale, the feature economics need redesign before marketing amplifies them.
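The same single-feature model can be sketched in code before it becomes a spreadsheet. The class and field names below are hypothetical, the separate input and output prices are an assumption layered on the single "price per million tokens" input, and the sample values are placeholders rather than real pricing. The 30% threshold is the one stated above.

```python
from dataclasses import dataclass, replace

COGS_SHARE_LIMIT = 0.30  # redesign threshold: inference COGS as a share of ARPU


@dataclass
class FeatureCostModel:
    avg_prompt_tokens: float
    avg_output_tokens: float
    in_price_per_mtok: float    # USD per million input tokens (assumed split)
    out_price_per_mtok: float   # USD per million output tokens (assumed split)
    queries_per_user_per_day: float
    mau: int
    arpu_monthly: float
    days_per_month: int = 30

    def cost_per_query(self):
        return (self.avg_prompt_tokens * self.in_price_per_mtok
                + self.avg_output_tokens * self.out_price_per_mtok) / 1_000_000

    def cogs_per_user(self):
        return self.cost_per_query() * self.queries_per_user_per_day * self.days_per_month

    def total_monthly_cogs(self):
        return self.cogs_per_user() * self.mau

    def cogs_share_of_arpu(self):
        return self.cogs_per_user() / self.arpu_monthly

    def needs_redesign(self):
        return self.cogs_share_of_arpu() > COGS_SHARE_LIMIT


# Placeholder inputs for one feature.
base = FeatureCostModel(800, 400, 3.0, 15.0, 10, 5_000, 15.0)
heavy = replace(base, queries_per_user_per_day=25)  # same feature, heavier usage

for name, m in (("base", base), ("heavy", heavy)):
    print(f"{name}: COGS/user ${m.cogs_per_user():.2f}, "
          f"share of ARPU {m.cogs_share_of_arpu():.0%}, redesign: {m.needs_redesign()}")
```

Rerunning the model with a changed usage or price assumption, as with `heavy` here, is the code equivalent of updating the spreadsheet when vendors change pricing.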
Real product examples
OpenAI GPT-4 — nine-figure training bet
GPT-4's training run reportedly exceeded $100M in compute alone, ran for months, and answered zero customer queries during that period. OpenAI amortises that CapEx across API revenue and enterprise contracts. Founders buying tokens are renting the outcome of that bet — not replicating it.

Vetted by Krishna Kumar, Curator, FactorBeam

