Faster inference
https://simonwillison.net/2025/Aug/1/faster-inference/#atom-everything

LLM service providers are increasingly marketing inference speed as a premium feature. Cerebras announced new subscription plans for a hosted version of Qwen's latest coding model, claiming an impressive 2,000 tokens per second. Similarly, Moonshot released a turbo version of its Kimi K2 model that is four times faster for a higher price. This trend suggests a growing market for high-performance models that can enhance interactive applications like live code generation.
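The speed claims translate directly into wall-clock latency for a user waiting on a response. A quick back-of-envelope sketch (the 100 tokens/second baseline and the 1,000-token response length are illustrative assumptions, not figures from the post):

```python
# Rough feel for what 2,000 tok/s buys over a slower endpoint.
# Both the baseline rate and the response length are assumptions
# chosen for illustration.
response_tokens = 1_000  # assumed length of a generated code snippet

for rate in (100, 2_000):  # tokens per second
    seconds = response_tokens / rate
    print(f"{rate:>5} tok/s -> {seconds:.1f} s for {response_tokens} tokens")
```

At these assumed numbers, the same response drops from ten seconds to half a second, which is the difference between a noticeable pause and a near-instant reply in an interactive coding tool.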
0 points•by ogg•2 months ago