Black Forest Labs releases FLUX.2 [klein], a compact image model family targeting interactive visual intelligence on consumer hardware. FLUX.2 [klein] extends the FLUX.2 line with sub-second generation and editing, a unified architecture for text-to-image and image-to-image tasks, and deployment options ranging from local GPUs to cloud APIs, all while maintaining state-of-the-art image quality.
From FLUX.2 [dev] to interactive visual intelligence
FLUX.2 [dev] is designed to run on datacenter-class accelerators. It is a high-quality, flexible model, but it comes with long sampling times and high VRAM requirements.
FLUX.2 [klein] takes the same design direction and scales it down to smaller rectified flow transformers with 4 billion and 9 billion parameters. These models support very short sampling times and the same multi-reference and text-to-image editing tasks, with response times below one second.
The Model Family and its Capabilities
The FLUX.2 [klein] family consists of four open-weight variants sharing a common architecture.
- FLUX.2 [klein] 4B
- FLUX.2 [klein] 9B
- FLUX.2 [klein] 4B Base
- FLUX.2 [klein] 9B Base
The FLUX.2 [klein] 4B and 9B models are step-distilled and guidance-distilled. They use four inference steps and are the fastest variants for interactive and production workloads. FLUX.2 [klein] 9B combines the 9B flow transformer with an 8B Qwen3 embedder. It is described as a small flagship model that sits on the Pareto frontier of quality versus latency for text-to-image generation, single-reference editing, and multi-reference generation.
The Base variants are not distilled and have longer sampling schedules. Documentation describes them as models that preserve the full training signal and offer higher output diversity. They can be used for LoRA training, custom post-training workflows, and research pipelines.
All FLUX.2 [klein] models support the main tasks within the same architecture: they can generate images from text and edit an input image.
Latency, VRAM, and quantized variants
The FLUX.2 [klein] model page gives approximate inference times for GB200 and RTX 5090 hardware. FLUX.2 [klein] 4B is the fastest variant, with a listed per-image time of 0.3 to 1 second depending on hardware. FLUX.2 [klein] 9B targets 0.5 to 2 seconds with higher quality. The Base models require a few seconds per image because of their 50-step sampling schedule, but offer greater flexibility in custom pipelines.
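A rough way to see why step distillation matters: if the per-step cost is similar, per-image latency scales approximately linearly with the number of sampling steps. The sketch below illustrates that scaling; the per-step time is an illustrative assumption, not a published figure, and it ignores text-encoding and VAE-decode overhead.

```python
def est_latency(steps: int, sec_per_step: float) -> float:
    """Crude latency estimate: sampling steps times per-step cost."""
    return steps * sec_per_step

SEC_PER_STEP = 0.1  # illustrative assumption, not a figure from the model page

print(est_latency(4, SEC_PER_STEP))   # distilled klein, 4 steps: sub-second range
print(est_latency(50, SEC_PER_STEP))  # Base 50-step schedule: "a few seconds"
```

Under this simplification, the 4-step distilled models land well under a second while the 50-step Base schedule does not, which matches the ordering the model page reports.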
The FLUX.2 [klein] 4B card reports about 13 GB of VRAM, so it can be used with GPUs such as the RTX 3090 or RTX 4070. The FLUX.2 [klein] 9B card reports a need for 29 GB of VRAM and targets hardware such as the RTX 4090. All the distilled variants can run at full resolution on a single consumer-grade card.
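As a back-of-the-envelope check, the reported VRAM figures can be compared against common GPU memory sizes. The model figures below are the ones the cards report; the GPU memory sizes are common retail specs I am assuming for illustration, not numbers from the article.

```python
# VRAM requirements reported by the FLUX.2 [klein] model cards (GB).
KLEIN_VRAM_GB = {"klein-4B": 13, "klein-9B": 29}

# Common consumer GPU memory sizes in GB (assumed retail specs, for illustration).
GPU_VRAM_GB = {"RTX 3090": 24, "RTX 4090": 24, "RTX 5090": 32}

def fits(model: str, gpu: str) -> bool:
    """True if the model's reported VRAM need fits entirely in the GPU's memory."""
    return KLEIN_VRAM_GB[model] <= GPU_VRAM_GB[gpu]

print(fits("klein-4B", "RTX 3090"))  # 13 GB <= 24 GB
print(fits("klein-9B", "RTX 4090"))  # 29 GB > 24 GB: needs offloading or quantization
```

A comparison like this is why the quantized variants discussed next matter: they are what brings the 9B model's footprint down onto 16-24 GB cards.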
Black Forest Labs, together with NVIDIA, releases FP8 and NVFP4 variants of FLUX.2 [klein]. FP8 quantization is described as up to 1.6 times faster with up to 40% less VRAM. NVFP4 quantization is said to be up to 2.7 times faster with up to 55% less VRAM.
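To make those "up to" figures concrete, the stated reductions can be applied to the reported full-precision VRAM numbers. This is rough arithmetic on the article's best-case claims; real memory use depends on resolution, batch size, and runtime overhead.

```python
def quantized_vram(full_gb: float, reduction: float) -> float:
    """VRAM after a fractional reduction (e.g. 0.40 for 'up to 40% less')."""
    return full_gb * (1.0 - reduction)

# FLUX.2 [klein] 9B reports 29 GB at full precision.
print(quantized_vram(29, 0.40))  # FP8 best case: roughly 17 GB
print(quantized_vram(29, 0.55))  # NVFP4 best case: roughly 13 GB
```

At the NVFP4 best case, the 9B model's footprint drops to roughly what the 4B model needs at full precision, which is what makes it plausible on 16 GB-class RTX cards.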
Benchmarks against other image models
Black Forest Labs evaluates FLUX.2 [klein] with Elo-style comparisons for text-to-image generation, single-reference editing, multi-reference editing, and other tasks. Performance charts place FLUX.2 [klein] on the Pareto frontier of Elo score versus latency and Elo score versus VRAM. The commentary says it matches or surpasses the quality of Qwen-based image models at a fraction of the latency and VRAM, while outperforming Z Image and supporting multi-reference generation and unified text editing within one architecture.
The models are designed to serve as a foundation for research, domain-specific pipelines, and new applications.
What you need to know
- FLUX.2 [klein] is a family of compact rectified flow transformers with 4B and 9B variants. The architecture supports text-to-image generation, single-image editing, and multi-reference generation.
- The FLUX.2 [klein] 4B and 9B models use four sampling steps to achieve sub-second inference on a single modern GPU. The Base models skip this optimization and use longer schedules for evaluation and fine-tuning.
- Quantized FP8 and NVFP4 variants built with NVIDIA provide up to 1.6 times faster inference with a 40 percent reduction in VRAM for FP8, and up to 2.7 times faster inference with a 55 percent reduction in VRAM for NVFP4 on RTX GPUs.
Check out the technical details, repo, and model weights for more information.

