
Qwen AI Releases Qwen-Scope: An Open-Source Sparse Autoencoder Suite That Turns LLM Internal Features into Practical Development Tools

Tech | By Gavin Wallace | 01/05/2026 | 7 Mins Read

Large language models can be incredibly powerful but frustratingly opaque. When a model misbehaves, generating responses in the wrong language, repeating itself endlessly, or refusing safe requests, developers have few tools to diagnose which internal computations caused the failure. Qwen-Scope was built to address this problem.

The Qwen team recently released Qwen-Scope, an open-source suite of sparse autoencoders (SAEs) trained on Qwen3 and Qwen3.5 models. The release comprises 14 SAE weight groups across seven model variants: five dense models (Qwen3-1.7B, Qwen3-8B, Qwen3.5-2B, Qwen3.5-9B, and Qwen3.5-27B) and two mixture-of-experts (MoE) models (Qwen3-30B-A3B and Qwen3.5-35B-A3B).

What is a Sparse Autoencoder, and why should you care?

Think of a sparse autoencoder as a translator between raw neural network activations and concepts humans can understand. When an LLM processes text, it produces high-dimensional hidden states, vectors with thousands of numbers, that are difficult to interpret directly. An SAE decomposes these activations into a dictionary of sparsely activating latent features: for any given input, only a handful of features are active. These features often correspond to distinct concepts such as language, style, or safety behavior.

Concretely, Qwen-Scope trains one SAE per transformer layer of each backbone to reconstruct residual-stream activations from a sparse combination of latent features. The SAE encoder maps each activation to an overcomplete latent representation, and a top-k activation rule keeps only the k largest latent activations for reconstruction (k is set to either 50 or 100). For dense backbones, the SAE width is 16× the model's hidden size; for MoE backbones, standard SAEs use a 32K width (16× expansion), and wider SAEs up to 128K width (64× expansion) are also released to capture finer-grained representational structure.
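The top-k scheme above can be sketched in a few lines of NumPy. This is a minimal illustration of the general technique, not Qwen-Scope's actual implementation; all names and shapes here are assumptions for the toy example.

```python
import numpy as np

def topk_sae_forward(h, W_enc, b_enc, W_dec, b_dec, k=50):
    """Sketch of a top-k sparse autoencoder forward pass.

    h     : (d_model,) residual-stream activation
    W_enc : (d_model, n_latents) encoder weights (n_latents ~ 16x d_model)
    b_enc : (n_latents,) encoder bias
    W_dec : (n_latents, d_model) decoder weights (the feature dictionary)
    b_dec : (d_model,) decoder bias
    """
    z = h @ W_enc + b_enc                 # overcomplete latent pre-activations
    # Top-k rule: keep only the k largest pre-activations, zero out the rest.
    idx = np.argsort(z)[-k:]
    z_sparse = np.zeros_like(z)
    z_sparse[idx] = np.maximum(z[idx], 0.0)
    h_hat = z_sparse @ W_dec + b_dec      # reconstruction from sparse code
    return z_sparse, h_hat

# Toy sizes: d_model=64, 16x expansion -> 1024 latents, k=50 active features.
rng = np.random.default_rng(0)
d, n = 64, 1024
h = rng.normal(size=d)
z, h_hat = topk_sae_forward(h,
                            rng.normal(size=(d, n)) / np.sqrt(d), np.zeros(n),
                            rng.normal(size=(n, d)) / np.sqrt(n), np.zeros(d),
                            k=50)
print(int((z != 0).sum()))  # at most 50 latents are active
```

The key property is that the sparse code `z` has at most k nonzero entries, so each activation is explained by a small, inspectable set of dictionary features.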

The result is a layer-by-layer feature dictionary covering every transformer layer of all seven backbones. One technical note: the Qwen3.5-27B SAEs are the exception here; the SAEs for the six other backbones are all trained on base model checkpoints.

Four ways Qwen-Scope changes the development workflow

1. Inference-Time Steering

The most immediate application is steering: influencing model output without modifying any model weights. The idea rests on the well-supported assumption that high-level behaviors are encoded as directions in a model's representation space. At inference, a feature direction can be added or subtracted using the formula h' ← h + αd, where h is the hidden state, d is the SAE feature direction, and α controls the strength. With this, engineers can push the model toward or away from specific behaviors.
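The steering update is a one-line vector operation. Here is a minimal sketch of h' = h + αd, including the common trick of setting α = −(h·d) to remove a feature's component entirely; the vectors are synthetic stand-ins, not real model activations.

```python
import numpy as np

def steer(h, d, alpha):
    """Inference-time steering: h' = h + alpha * d.

    h     : hidden state at some layer
    d     : unit-norm SAE feature direction (a decoder row)
    alpha : steering strength; positive promotes, negative suppresses
    """
    return h + alpha * d

rng = np.random.default_rng(1)
h = rng.normal(size=8)
d = rng.normal(size=8)
d /= np.linalg.norm(d)          # normalize the feature direction

# Suppress the feature: steer against exactly its component in h.
h_steered = steer(h, d, alpha=-float(h @ d))
print(abs(float(h_steered @ d)) < 1e-9)  # -> True: component along d removed
```

Because no weights change, the same hook can be toggled per request, which is what makes steering attractive as a debugging and mitigation tool.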

The research team presents two Qwen3 case studies. In the first, a model prompted in English mixes Chinese text into its output. Ranking SAE features by activation intensity reveals a highly active Chinese-language feature (id 6159); suppressing that feature during generation eliminates the language mixing entirely. In the second, activating a classical-Chinese feature (id 36398) successfully steers generation toward a classical literary style. Both examples required zero weight updates.

https://qianwen-res.oss-accelerate.aliyuncs.com/qwen-scope/Qwen_Scope.pdf

2. Benchmark Analysis without Running Models

Evaluating LLMs typically means running many forward passes across large benchmark datasets, which is expensive in compute and time. Qwen-Scope suggests a cheaper alternative: using SAE feature activations as a representation-level proxy for benchmark analysis.

The core insight is that when a model processes a benchmark sample, the SAE decomposes its activation into a sparse set of active features, each interpretable as a "micro-capability." A benchmark whose samples all activate the same features is redundant, and two benchmarks that activate overlapping feature sets are similar. The research team defines a feature-redundancy metric that achieves a Spearman rank correlation of ρ ≈ 0.85 with performance-based redundancy across 17 widely used benchmarks, including MMLU, GSM8K, MATH, EvalPlus, and GPQA-Diamond, without running a single model evaluation. The analysis shows, for example, that MATH already covers 63% of GSM8K's features, so evaluation suites that include MATH could drop GSM8K without losing discriminative information.
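An overlap metric of this kind can be illustrated with plain sets of feature ids. This is a simple coverage-style measure for illustration only; the paper's exact redundancy score may be defined differently, and the feature ids below are made up.

```python
def feature_overlap(features_a, features_b):
    """Fraction of benchmark A's active SAE features that benchmark B covers.

    features_a, features_b: sets of SAE feature ids activated by each
    benchmark's samples. (Illustrative metric, not the paper's exact one.)
    """
    if not features_a:
        return 0.0
    return len(features_a & features_b) / len(features_a)

# Toy example: if MATH's feature set covers most of GSM8K's features,
# GSM8K adds little discriminative information to the suite.
gsm8k = {1, 2, 3, 4, 5, 6, 7, 8}
math_ = {1, 2, 3, 4, 5, 9, 10, 11, 12, 13}
print(feature_overlap(gsm8k, math_))  # -> 0.625
```

Computing this requires only one forward pass per benchmark sample to collect activations, rather than full accuracy evaluations of every model under consideration.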

After controlling for general model ability by partialing out MMLU scores, the partial Pearson correlation between feature overlap and performance-based similarity across 28 benchmark pairs rises to 75.5%, showing that feature overlap captures benchmark-specific capability similarity rather than just general model quality. The practical implication: benchmarks with low feature overlap should be kept, while benchmarks with high overlap are candidates for consolidation.

3. Data-Centric Workflows: Toxicity Classification and Safety Data Synthesis

SAE features also make effective classifiers. The research team builds a multilingual toxicity classifier across 13 languages using a simple two-stage pipeline: identify SAE features that fire more frequently on toxic examples than on clean ones (using a small discovery set), then apply an OR-rule over those features on held-out test data, with no additional classifier head and no gradient-based fitting. This achieves an F1 score above 0.90 on English for both Qwen3-1.7B and Qwen3-8B. The team further shows that features discovered in English transfer meaningfully to other languages without rediscovery: performance declines with linguistic distance (strongest for European languages like Russian and French, weaker for Arabic, Chinese, and Amharic), and scaling to Qwen3-8B improves both the level and the stability of cross-lingual transfer. Notably, even with only 10% of the original data, the classifier retains 99% of its accuracy, showing strong data efficiency.
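The two-stage pipeline can be sketched with binary feature-activation matrices. The selection rule below (a fixed firing-rate margin) is an assumption for illustration; the paper's exact discovery criterion is not specified here.

```python
import numpy as np

def discover_toxic_features(acts_toxic, acts_clean, margin=0.2):
    """Stage 1: find SAE features that fire more often on toxic text.

    acts_*: (n_samples, n_features) binary matrices, 1 = feature active.
    Returns feature indices whose toxic firing rate exceeds the clean
    firing rate by more than `margin`. (Illustrative rule only.)
    """
    rate_toxic = acts_toxic.mean(axis=0)
    rate_clean = acts_clean.mean(axis=0)
    return np.where(rate_toxic - rate_clean > margin)[0]

def or_rule_classify(acts, toxic_features):
    """Stage 2: flag a sample as toxic if ANY selected feature is active."""
    return acts[:, toxic_features].any(axis=1)

# Toy discovery set with 2 features:
# feature 0 fires mostly on toxic text, feature 1 fires on both.
toxic = np.array([[1, 1], [1, 0], [1, 1], [0, 1]])
clean = np.array([[0, 1], [0, 0], [0, 1], [0, 1]])
feats = discover_toxic_features(toxic, clean)
print(feats.tolist())                           # -> [0]
print(or_rule_classify(toxic, feats).tolist())  # -> [True, True, True, False]
```

Because there is no learned classifier head, the whole pipeline is just thresholded counting, which is what makes it so data-efficient.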

The research team also develops a feature-driven safety data synthesis pipeline with three steps: identify safety-relevant SAE features missing from existing supervision, generate prompt-completion pairs that activate those features, and verify that the features are retained. Under a matched budget, the feature-driven approach achieves 99.74% coverage of the target safety feature set, compared with lower coverage from natural sampling and random safety-related synthesis. Adding 4k feature-driven synthetic examples to 4k real safety examples produces a safety accuracy of 77.75, approaching the performance of training on 120k safety-only examples.

4. Post-Training: Supervised Fine-Tuning and Reinforcement Learning

Perhaps the most innovative contribution is using SAE signals during training, not just at inference.

For supervised fine-tuning, the research team targets unexpected code-switching, where multilingual LLMs spontaneously produce tokens in an unintended language. Their method, Sparse-Autoencoder-guided Supervised Fine-Tuning (SASFT), first identifies language-specific features via a monolinguality score, then adds a regularization loss that suppresses those features' activations when training on non-target-language data. Across five models spanning three model families (Gemma-2, Llama-3.1, and Qwen3) and three target languages (Chinese, Russian, and Korean), SASFT reduces the code-switching ratio by over 50% in the majority of experimental settings, with complete elimination in some configurations (e.g., Qwen3-1.7B on Korean), while maintaining performance on six multilingual benchmarks.
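The regularization idea can be sketched as a penalty on selected feature activations. This is an illustrative stand-in, not the paper's exact loss; the squared-ReLU form and the feature indexing are assumptions.

```python
import numpy as np

def sasft_penalty(latents, lang_feature_ids, lam=1.0):
    """Sketch of an SASFT-style regularizer: penalize activations of
    language-specific SAE features on non-target-language batches.

    latents          : (batch, seq, n_features) SAE latent activations
    lang_feature_ids : indices of features tied to the unwanted language
    lam              : regularization weight added to the SFT loss
    (Illustrative form only; the paper's exact loss may differ.)
    """
    selected = latents[..., lang_feature_ids]
    return lam * float(np.square(np.maximum(selected, 0.0)).mean())

# Toy batch: pretend feature 3 is a "Chinese" feature firing on an
# English-only training batch; the penalty pushes its activation down.
rng = np.random.default_rng(2)
latents = np.abs(rng.normal(size=(2, 4, 8)))
penalty = sasft_penalty(latents, lang_feature_ids=[3])
print(penalty > 0)  # -> True: nonzero penalty on the unwanted feature
```

During training this term would be added to the standard cross-entropy loss, so gradients flow back through the SAE encoder into the model and discourage the unwanted language features from activating.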

For reinforcement learning, the research team tackles never-ending repetition, a low-frequency but disruptive failure mode in which a model loops on repeated content. Standard online RL rarely samples repetitive rollouts, so it gets little corrective signal. Qwen-Scope addresses this by using SAE feature steering to generate a repetition-biased rollout for each training group, which is then injected into the DAPO RL pipeline as a negative sample of this otherwise rare failure. As a result, the repetition ratio drops consistently and sharply across Qwen3-1.7B, Qwen3-8B, and Qwen3-30B-A3B, while general benchmark performance stays competitive with vanilla RL.
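Structurally, the trick is just to append one steered, low-reward rollout to each sampled group before the policy update. The sketch below shows only that data-flow shape; the function name, reward value, and group layout are illustrative assumptions, not the DAPO implementation.

```python
def build_training_group(rollouts, rewards, steered_repetitive_rollout):
    """Sketch: inject a steered, repetition-biased rollout into a
    DAPO-style training group as a negative sample.

    rollouts / rewards come from ordinary sampling; the extra rollout is
    generated with an SAE "repetition" feature amplified and given a low
    reward so the policy learns to avoid the failure mode.
    (Illustrative structure only.)
    """
    group = list(zip(rollouts, rewards))
    group.append((steered_repetitive_rollout, 0.0))  # penalized negative
    return group

group = build_training_group(
    ["answer A", "answer B"], [1.0, 0.5],
    "the cat sat the cat sat the cat sat",
)
print(len(group))    # -> 3 rollouts, including one synthetic negative
print(group[-1][1])  # -> 0.0 reward on the repetition-biased rollout
```

The point is that steering manufactures on-policy examples of a failure the sampler almost never produces naturally, giving the RL objective something concrete to push against.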


Check out the Paper, Weights, and technical details.

