The cost-curve trap: when AWS spend grows 6× faster than users

We have looked at sixteen production AWS environments in the last two years. In twelve of them, cost was growing at 4× or more the rate of user growth. The standard FinOps playbook does not catch this: by the time you notice the curve, you are already six months deep into the wrong architecture.

Standard diagnoses miss it

The usual checks are missing reservations, oversized instances, and S3 lifecycle rules. These get you 10-20% in savings. The 60-80% wins come from fixing architectural patterns whose cost scales superlinearly with users.

The three patterns we see most

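Cross-AZ data transfer. Your database is in one AZ but your app servers are spread across three. Every read from an app server outside the database’s AZ crosses an AZ boundary and is billed as inter-AZ transfer. At 1,000 users this is a rounding error. At 100,000 it’s $8k/month.
S3 GET amplification. One user action triggers seventeen S3 reads. Multiply that by the number of users, then again by every new feature that adds reads per action.
Lambda cold-path warm-keeping. Provisioned concurrency is configured on functions that get traffic once an hour, so you pay to keep them warm around the clock for about 24 invocations a day.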
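
A back-of-envelope model is often enough to size the first two patterns before opening the bill. The sketch below is illustrative only: the function names, the workload numbers, and both prices are our assumptions, and the cross-AZ fraction assumes app servers spread evenly across AZs, one of which hosts the database. Check the prices against the current AWS pricing pages for your region.

```python
# Back-of-envelope model for cross-AZ transfer and S3 GET amplification.
# Every price and workload number here is an illustrative assumption;
# check the current AWS pricing pages before relying on any of them.

BYTES_PER_GB = 1024 ** 3
CROSS_AZ_USD_PER_GB = 0.02    # assumed: 0.01 USD/GB charged in each direction
S3_GET_USD_PER_1000 = 0.0004  # assumed S3 Standard GET request price


def cross_az_fraction(app_az_count):
    """Share of database reads that cross an AZ boundary.

    Assumes one database AZ and app servers spread evenly across
    app_az_count AZs, one of which is the database's AZ.
    """
    if app_az_count < 1:
        raise ValueError("app_az_count must be at least 1")
    return (app_az_count - 1) / app_az_count


def monthly_cross_az_cost(users, reads_per_user, bytes_per_read, app_az_count=3):
    """Estimated monthly inter-AZ transfer cost in USD for database reads."""
    crossing_bytes = (
        users * reads_per_user * bytes_per_read * cross_az_fraction(app_az_count)
    )
    return crossing_bytes / BYTES_PER_GB * CROSS_AZ_USD_PER_GB


def monthly_s3_get_cost(users, actions_per_user, reads_per_action):
    """Estimated monthly S3 GET request cost in USD."""
    gets = users * actions_per_user * reads_per_action
    return gets / 1000 * S3_GET_USD_PER_1000


# Hypothetical growth: users double while new features double reads per action.
before = monthly_s3_get_cost(users=1_000, actions_per_user=100, reads_per_action=17)
after = monthly_s3_get_cost(users=2_000, actions_per_user=100, reads_per_action=34)
print(f"S3 GET cost grew {after / before:.1f}x while users grew 2.0x")
```

The point of the model is the shape, not the dollar figure: each function is linear in users on its own, and the superlinear growth appears only when a per-user factor such as reads_per_action grows alongside the user count.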

How to spot it early

Plot two curves on the same graph: monthly active users and monthly cost. Index both to their value at t=0 so they start from the same point. If the cost curve is consistently steeper than the user curve, you have a superlinear cost pattern somewhere. Once you see it, an architecture review tells you which one.
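
The comparison can also be done without a chart. The following is a minimal sketch, assuming you already have monthly active users and monthly cost as two equal-length lists. The function names, the sample numbers, and the thresholds that stand in for "consistently steeper" (a 5% tolerance, rising in at least 75% of months) are our assumptions, not a standard.

```python
def index_to_start(series):
    """Rescale a series so that its first value becomes 1.0."""
    base = series[0]
    if base <= 0:
        raise ValueError("first value must be positive to index against it")
    return [value / base for value in series]


def cost_to_user_ratio(monthly_users, monthly_cost):
    """Indexed cost divided by indexed users, month by month.

    A flat ratio of 1.0 means cost tracks users. A rising ratio means
    cost is growing faster than users.
    """
    if len(monthly_users) != len(monthly_cost):
        raise ValueError("both series must cover the same months")
    users_idx = index_to_start(monthly_users)
    cost_idx = index_to_start(monthly_cost)
    return [c / u for c, u in zip(cost_idx, users_idx)]


def is_superlinear(ratio, tolerance=0.05, min_rising_share=0.75):
    """Heuristic: the ratio ends above 1 + tolerance and rose in most months."""
    steps = len(ratio) - 1
    if steps < 1:
        return False
    rising_steps = sum(1 for prev, cur in zip(ratio, ratio[1:]) if cur > prev)
    return ratio[-1] > 1 + tolerance and rising_steps >= min_rising_share * steps


# Hypothetical six months of data, for illustration only.
users = [1000, 1500, 2200, 3000, 4100, 5500]
cost = [2000, 3400, 5900, 9300, 14800, 22500]

ratio = cost_to_user_ratio(users, cost)
print([round(r, 2) for r in ratio])
print("superlinear:", is_superlinear(ratio))
```

Tracking the ratio as a single series makes it easy to put on a dashboard and alert on, which is harder to do with two overlaid curves.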

The operating test

We treat a cost finding as real only when it changes a dashboard, a runbook, and one named engineer’s weekly work. If the idea cannot survive those three places, it is probably just a slide.

The useful version is specific, measurable, and owned by someone who can say what changed after it shipped.

What we would do differently

Instrument before changing architecture. The baseline decides whether the fix worked.
Name the trade-off. Every improvement costs latency, money, complexity, or time somewhere else.
Revisit it after 30 days. Production has a way of teaching what the workshop missed.