Qwen-Image: Crafting with Native Text Rendering

https://simonwillison.net/2025/Aug/4/qwen-image/#atom-everything (simonwillison.net)

Qwen has released its first image generation model, Qwen-Image, a 20-billion-parameter Multimodal Diffusion Transformer licensed under Apache 2.0. Training placed heavy emphasis on native text rendering, drawing on synthesized data and programmatic editing of templates. To build the training data, the team used their Qwen2.5-VL vision LLM to generate comprehensive image descriptions and structured metadata. A text-to-image version is available now, and an image editing model is planned for a future release.

0 points • by ogg • 11 months ago
