Krea 2: Open-weights 12B image generation model
Original: Krea 2: SOTA open-weights 12B image model
Why This Matters
An open-weights image model broadens access to image generation and enables creative-focused alternatives to commercial systems with narrow aesthetics.
Krea released Krea 2, an open-weights 12B foundation model series for image generation focused on creative exploration. The model supports diverse aesthetics and user control through a multi-stage training pipeline that includes pretraining, finetuning, and reinforcement learning.
Krea introduced Krea 2, a series of foundation models for image generation with an emphasis on creative exploration and diverse aesthetic control. Unlike existing systems that converge toward narrow default aesthetics, Krea 2 prioritizes expressive generation across multiple styles, moods, and visual directions. The model is built on a custom large-scale data infrastructure and a distributed training framework, both developed from scratch.
Key technical components include a diffusion transformer (DiT) architecture refined through ablations, improved VAEs, a Qwen3-VL text encoder, grouped-query attention, sigmoid-gated attention, and lightweight timestep modulation. The training pipeline spans pretraining, midtraining, supervised finetuning, preference optimization, and reinforcement learning, with each stage progressively refining the output distribution.
The model weights and inference code are released under a permissive license on Hugging Face and GitHub. The technical report, authored by Sangwu Lee and team, emphasizes bridging the gap between the model's conditioning space and the user's creative intent at inference time, through carefully constructed captions and multiple input modes including text, mood, style, and reference images.
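Grouped-query attention and sigmoid-gated attention are named among the architectural components, but this summary gives no implementation details. The following is a minimal pure-Python sketch of how the two ideas can be combined. It rests on assumptions: the function name `gqa_sigmoid_gated`, the nested-list tensor layout, and the placement of the gate (an elementwise sigmoid applied to each head's attention output) are illustrative choices, not Krea 2's confirmed design.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gqa_sigmoid_gated(q, k, v, gate):
    """Grouped-query attention with a sigmoid output gate (illustrative sketch).

    q:    [n_q_heads][seq][d]   query vectors
    k, v: [n_kv_heads][seq][d]  shared key/value vectors
    gate: [n_q_heads][seq][d]   pre-sigmoid gate logits (assumed to come from
                                a learned projection of the layer input)
    """
    n_q_heads, n_kv_heads = len(q), len(k)
    assert n_q_heads % n_kv_heads == 0, "query heads must divide evenly into KV groups"
    group = n_q_heads // n_kv_heads  # query heads that share one KV head
    d = len(q[0][0])
    scale = 1.0 / math.sqrt(d)

    out = []
    for h in range(n_q_heads):
        kv = h // group  # grouped-query mapping: several query heads -> one KV head
        head_out = []
        for t, q_vec in enumerate(q[h]):
            weights = softmax([dot(q_vec, k_vec) * scale for k_vec in k[kv]])
            attended = [
                sum(w * v_vec[i] for w, v_vec in zip(weights, v[kv]))
                for i in range(d)
            ]
            # Assumed gating form: scale each output channel by sigmoid(gate logit).
            head_out.append([sigmoid(g) * a for g, a in zip(gate[h][t], attended)])
        out.append(head_out)
    return out
```

With 4 query heads and 2 key/value heads, heads 0 and 1 read from the first key/value head and heads 2 and 3 from the second, so the key/value tensors are half the size of their standard multi-head equivalents. The gate lets the model suppress individual output channels per token, since each sigmoid value lies between 0 and 1.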