Right-sizing LLM inference: a three-tier approach

June 1, 2026

You do not need GPT-4 for every request. Most production AI bills are paying for the difference.