Question 1

What is a good gross margin for an AI-native SaaS product?

Accepted Answer

Most investors and CFOs target 70% or above for AI-native SaaS at scale. Early-stage companies often run at 40–60% while they optimize model selection and prompt engineering, then push toward 70%+ as volume grows and caching strategies mature. Below 40% is a signal that either the pricing model needs to change or the underlying model selection is wrong for the use case.
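
As a rough illustration of the thresholds above, here is a minimal sketch that computes gross margin and maps it to the bands this answer describes. The function names and the example dollar figures are hypothetical, and treating everything between 40% and 70% as a single "optimizing" band is a simplifying assumption, not a rule from the answer.

```python
def gross_margin(revenue: float, cogs: float) -> float:
    """Return gross margin as a fraction of revenue: (revenue - COGS) / revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - cogs) / revenue


def margin_band(margin: float) -> str:
    """Map a margin fraction to the rough bands described in the answer.

    Assumption: the 60-70% gap is lumped in with the early-stage range.
    """
    if margin >= 0.70:
        return "at or above target"
    if margin >= 0.40:
        return "early-stage range, still optimizing"
    return "below 40%: revisit pricing or model selection"


# Hypothetical account: $100/month revenue, $38 LLM spend, $7 hosting.
margin = gross_margin(revenue=100.0, cogs=38.0 + 7.0)
print(f"{margin:.0%} -> {margin_band(margin)}")
```

What counts as COGS (inference, hosting, support) differs between companies, so the inputs matter more than the arithmetic.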

Question 2

How does prompt caching reduce LLM costs?

Accepted Answer

Prompt caching lets providers reuse the KV cache from a previous request when the prefix of a new request matches. This primarily reduces input token costs because the provider does not need to reprocess the cached portion. The size of the discount depends on the provider and the model: Anthropic bills cache reads at 10% of the base input rate (a 90% discount on the cached portion) but charges a premium on the request that writes the cache, while OpenAI's cached-input discount varies by model, so check the current pricing pages before modeling savings. The savings compound quickly when your prompts have long, stable system instructions that appear at the start of every request.
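
To make the arithmetic concrete, the following is a minimal sketch of input cost per request with and without a cache hit. The function name, the $3 per million tokens price, and the token counts are hypothetical; the 10% cached rate is used as a default multiplier and should be replaced with your provider's actual rate. The sketch ignores any surcharge for writing the cache and ignores output tokens.

```python
def input_cost(
    prompt_tokens: int,
    cached_tokens: int,
    price_per_mtok: float,
    cached_multiplier: float = 0.10,
) -> float:
    """Estimate input cost in dollars for one request.

    cached_tokens is the part of the prompt served from the cache and is
    billed at cached_multiplier times the standard input rate.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached_tokens = prompt_tokens - cached_tokens
    billable = uncached_tokens + cached_tokens * cached_multiplier
    return billable * price_per_mtok / 1_000_000


# Hypothetical request: 10,000-token prompt, 8,000 of which is a stable prefix.
cold = input_cost(10_000, 0, price_per_mtok=3.0)
warm = input_cost(10_000, 8_000, price_per_mtok=3.0)
print(f"cold: ${cold:.4f}  warm: ${warm:.4f}  saved: {1 - warm / cold:.0%}")
```

The overall saving is always smaller than the headline discount because the uncached tail of each prompt is still billed at the full rate.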

Question 3

How do I choose the right LLM model to protect gross margin?

Accepted Answer

The right question is not which model is best, but which delivers the best cost-adjusted performance for your workload. Run objective evals on your actual tasks: define a measurable quality metric, score each model tier against it, and record the cost per call alongside the score. A model that scores 15% higher but costs 3x as much (a 200% cost increase) has failed the ROI test, because the performance gain does not justify the margin hit. The threshold is a ratio: a more expensive model only clears the bar if its percentage performance gain exceeds its percentage cost increase. Rerun the comparison regularly as model prices fall.
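
The ratio test described above can be sketched as follows. The `EvalResult` class, the `clears_bar` function, and the scores and per-call costs are all hypothetical; the rule that a cheaper candidate passes as long as it scores no worse is an added assumption, since the answer only covers the more expensive case.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    name: str
    score: float          # quality metric from your own eval, higher is better
    cost_per_call: float  # average dollars per call on the same workload


def clears_bar(baseline: EvalResult, candidate: EvalResult) -> bool:
    """Return True if the candidate's relative gain exceeds its relative cost increase."""
    gain = (candidate.score - baseline.score) / baseline.score
    cost_increase = (
        candidate.cost_per_call - baseline.cost_per_call
    ) / baseline.cost_per_call
    if cost_increase <= 0:
        # Assumption: a cheaper (or equally priced) model passes if it is not worse.
        return gain >= 0
    return gain > cost_increase


# Hypothetical eval results: the larger model scores 15% higher at 3x the cost.
small = EvalResult("small-tier", score=0.80, cost_per_call=0.002)
large = EvalResult("large-tier", score=0.92, cost_per_call=0.006)
print(clears_bar(small, large))
```

Keeping scores and costs in one record per model makes it straightforward to rerun the comparison whenever prices change.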
