Black Forest Labs releases FLUX.2 [klein], a compact image model family targeting interactive visual intelligence on consumer hardware. FLUX.2 [klein] extends the FLUX.2 line with sub-second generation and editing, a unified architecture for text-to-image and image-to-image tasks, and deployment options ranging from local GPUs to cloud APIs, all while maintaining state-of-the-art image quality.
From FLUX.2 [dev] to interactive visual intelligence
FLUX.2 [dev] is designed to run on datacenter-class accelerators. It is a high-quality, flexible model, but it comes with long sampling times and high VRAM requirements.
FLUX.2 [klein] takes the same design direction and scales it down to smaller rectified flow transformers with 4 billion and 9 billion parameters. These models support very short sampling times and the same multi-reference and text-to-image editing tasks, with response times below one second.
The Model Family and its Capabilities
The FLUX.2 [klein] family consists of four open-weight variants sharing a common architecture.
- FLUX.2 [klein] 4B
- FLUX.2 [klein] 9B
- FLUX.2 [klein] 4B Base
- FLUX.2 [klein] 9B Base
The FLUX.2 [klein] 4B and 9B models are step-distilled and guidance-distilled. They use four inference steps and are the fastest variants for interactive and production workloads. FLUX.2 [klein] 9B combines the 9B flow transformer with an 8B Qwen3 embedder. It is described as a small flagship model that sits on the Pareto frontier of quality versus latency for text-to-image generation, single-reference editing, and multi-reference generation.
The Base variants are not distilled and have longer sampling schedules. Documentation describes them as models that preserve the full training signal and offer higher output diversity. They can be used for LoRA training, custom post-training workflows, and research pipelines.
All FLUX.2 [klein] models support the main tasks within the same architecture: they can generate images from text and edit an input image.
Latency, VRAM, and quantized variants
The FLUX.2 [klein] model page gives approximate inference times for GB200 and RTX 5090 hardware. FLUX.2 [klein] 4B is the fastest variant, with a listed per-image time of 0.3 to 1 second depending on hardware. FLUX.2 [klein] 9B targets 0.5 to 2 seconds with higher quality. The Base models require a few seconds per image because of their 50-step sampling schedule, but offer greater flexibility in custom pipelines.
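A rough way to see why step distillation matters: if the per-step cost is similar, per-image latency scales approximately linearly with the number of sampling steps. The sketch below illustrates that scaling; the per-step time is an illustrative assumption, not a published figure, and it ignores text-encoding and VAE-decode overhead.

```python
def est_latency(steps: int, sec_per_step: float) -> float:
    """Crude latency estimate: sampling steps times per-step cost."""
    return steps * sec_per_step

SEC_PER_STEP = 0.1  # illustrative assumption, not a figure from the model page

print(est_latency(4, SEC_PER_STEP))   # distilled klein, 4 steps: sub-second range
print(est_latency(50, SEC_PER_STEP))  # Base 50-step schedule: "a few seconds"
```

Under this simplification, the 4-step distilled models land well under a second while the 50-step Base schedule does not, which matches the ordering the model page reports.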
The FLUX.2 [klein] 4B card reports about 13 GB of VRAM, so it can be used with GPUs such as the RTX 3090 or RTX 4070. The FLUX.2 [klein] 9B card reports a need for 29 GB of VRAM and targets hardware such as the RTX 4090. All the distilled variants can run at full resolution on a single consumer-grade card.
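As a back-of-the-envelope check, the reported VRAM figures can be compared against common GPU memory sizes. The model figures below are the ones the cards report; the GPU memory sizes are common retail specs I am assuming for illustration, not numbers from the article.

```python
# VRAM requirements reported by the FLUX.2 [klein] model cards (GB).
KLEIN_VRAM_GB = {"klein-4B": 13, "klein-9B": 29}

# Common consumer GPU memory sizes in GB (assumed retail specs, for illustration).
GPU_VRAM_GB = {"RTX 3090": 24, "RTX 4090": 24, "RTX 5090": 32}

def fits(model: str, gpu: str) -> bool:
    """True if the model's reported VRAM need fits entirely in the GPU's memory."""
    return KLEIN_VRAM_GB[model] <= GPU_VRAM_GB[gpu]

print(fits("klein-4B", "RTX 3090"))  # 13 GB <= 24 GB
print(fits("klein-9B", "RTX 4090"))  # 29 GB > 24 GB: needs offloading or quantization
```

A comparison like this is why the quantized variants discussed next matter: they are what brings the 9B model's footprint down onto 16-24 GB cards.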
Black Forest Labs, together with NVIDIA, releases FP8 and NVFP4 variants of FLUX.2 [klein]. FP8 quantization is described as up to 1.6 times faster with up to 40% less VRAM. NVFP4 quantization is said to be up to 2.7 times faster with up to 55% less VRAM.
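To make those "up to" figures concrete, the stated reductions can be applied to the reported full-precision VRAM numbers. This is rough arithmetic on the article's best-case claims; real memory use depends on resolution, batch size, and runtime overhead.

```python
def quantized_vram(full_gb: float, reduction: float) -> float:
    """VRAM after a fractional reduction (e.g. 0.40 for 'up to 40% less')."""
    return full_gb * (1.0 - reduction)

# FLUX.2 [klein] 9B reports 29 GB at full precision.
print(quantized_vram(29, 0.40))  # FP8 best case: roughly 17 GB
print(quantized_vram(29, 0.55))  # NVFP4 best case: roughly 13 GB
```

At the NVFP4 best case, the 9B model's footprint drops to roughly what the 4B model needs at full precision, which is what makes it plausible on 16 GB-class RTX cards.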
Benchmarks against other image models
Black Forest Labs evaluates FLUX.2 [klein] with Elo-style comparisons for text-to-image generation, single-reference editing, multi-reference editing, and other tasks. Performance charts place FLUX.2 [klein] on the Pareto frontier of Elo score versus latency and Elo score versus VRAM. The commentary says it matches or surpasses the quality of Qwen-based image models at a fraction of the latency and VRAM, while outperforming Z Image and supporting multi-reference generation and unified text editing within one architecture.
The models are designed to serve as a foundation for research, domain-specific pipelines, and new applications.
What you need to know
- FLUX.2 [klein] is a family of compact rectified flow transformers with 4B and 9B variants. The architecture supports text-to-image generation, single-image editing, and multi-reference generation.
- The FLUX.2 [klein] 4B and 9B models use four sampling steps to achieve sub-second inference on a single modern GPU. The Base models skip this optimization and use longer schedules for evaluation and fine-tuning.
- Quantized FP8 and NVFP4 variants built with NVIDIA provide up to 1.6 times faster inference with a 40 percent reduction in VRAM for FP8, and up to 2.7 times faster inference with a 55 percent reduction in VRAM for NVFP4 on RTX GPUs.
Check out the technical details, repo, and model weights for more information.

