Faster inference
https://simonwillison.net/2025/Aug/1/faster-inference/#atom-everything

LLM service providers are increasingly marketing inference speed as a premium feature. Cerebras announced new subscription plans for a hosted version of Qwen's latest coding model, claiming an impressive 2,000 tokens per second. Similarly, Moonshot released a turbo version of its Kimi K2 model that is four times faster for a higher price. This trend suggests a growing market for high-performance models that can enhance interactive applications like live code generation.
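The speed claims translate directly into wall-clock latency for a user waiting on a response. A quick back-of-envelope sketch (the 100 tokens/second baseline and the 1,000-token response length are illustrative assumptions, not figures from the post):

```python
# Rough feel for what 2,000 tok/s buys over a slower endpoint.
# Both the baseline rate and the response length are assumptions
# chosen for illustration.
response_tokens = 1_000  # assumed length of a generated code snippet

for rate in (100, 2_000):  # tokens per second
    seconds = response_tokens / rate
    print(f"{rate:>5} tok/s -> {seconds:.1f} s for {response_tokens} tokens")
```

At these assumed numbers, the same response drops from ten seconds to half a second, which is the difference between a noticeable pause and a near-instant reply in an interactive coding tool.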
0 points•by ogg•2 months ago