Meta FAIR has released Code World Model (CWM), a 32-billion-parameter dense, decoder-only LLM that injects world modeling into code generation by training on execution traces and long-horizon agent–environment interactions, not just static source text.
Learning code by predicting execution
CWM mid-trains on two large families of observation–action trajectories: (1) Python interpreter traces that record the local variables after each executed line, and (2) agentic interactions inside Dockerized repositories. This grounding is intended to teach semantics (how program state evolves) rather than only syntax.
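The per-line trace idea can be illustrated with Python's standard tracing hook: run a function and snapshot its local variables at every executed line. This is a minimal sketch of the general concept; the trace format CWM actually trains on is defined in the paper and will differ in detail.

```python
import sys

def trace_locals(func, *args):
    """Run func(*args) and record (line_number, locals-snapshot) for each
    executed line, loosely mimicking the per-line execution traces above."""
    frames = []

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is func.__code__:
            # Copy the locals as this line is about to execute.
            frames.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, frames

def demo(n):
    total = 0
    for i in range(n):
        total += i
    return total

result, frames = trace_locals(demo, 3)
print(result)         # 3
print(frames[-1][1])  # final locals snapshot, e.g. {'n': 3, 'total': 3, 'i': 2}
```

Pairing source code with such (line, state) sequences is what lets a model learn how state evolves, not just what code looks like.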
To collect this data at scale, the research team built executable Docker images from thousands of GitHub projects and used a software-engineering agent ("ForagerAgent") to gather multi-step trajectories. The release reports roughly 3M trajectories across 10k images and 3.15k repositories.
Model and context window
CWM is a dense, decoder-only Transformer (no MoE) with 64 layers, GQA (48 query heads / 8 KV heads), SwiGLU, RMSNorm, and scaled RoPE. Attention alternates between local 8k and global 131k sliding-window blocks, yielding an effective context of 131k tokens; training uses document-causal masking.
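The difference between the local and global blocks comes down to the attention mask. Below is an illustrative sketch of a causal sliding-window mask (window sizes and masking details here are toy values, not the released model's exact implementation):

```python
def sliding_window_mask(seq_len, window):
    """Boolean causal attention mask: query i may attend key j when
    j <= i and i - j < window. window=None means full (global) causal
    attention. Illustrative sketch only."""
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i
            local = window is None or (i - j) < window
            row.append(causal and local)
        mask.append(row)
    return mask

# Local layer, window of 4 over 6 tokens: token 5 can no longer see token 0.
local = sliding_window_mask(6, 4)
print(local[5])  # [False, False, True, True, True, True]

# Global layer: plain causal attention over the full context.
glob = sliding_window_mask(6, None)
print(glob[5])   # [True, True, True, True, True, True]
```

In CWM the local layers would use an 8k window and the global layers the full 131k context, with the two interleaved across depth.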
Training recipe (pre → mid → post)
- Pre-training: 8T tokens (code-heavy) at 8k context.
- Mid-training: +5T tokens at long context (131k) with Python execution traces, ForagerAgent data, PR-derived diffs, IR/compiler data, and Triton kernels.
- Post-training: 100B-token SFT for instruction following and reasoning, then multi-task RL (~172B tokens) across verifiable coding, math, and multi-turn SWE environments using a GRPO-style algorithm and a minimal toolset (bash/edit/create/submit).
- Quantized inference fits on a single 80 GB H100.
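The core of a GRPO-style update is a group-relative advantage: sample several rollouts per task, then normalize each reward against the group's mean and standard deviation. A minimal sketch (the paper's full objective, with clipping and any KL terms, is not reproduced here):

```python
def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages for one task's sampled rollouts:
    (reward - group mean) / (group std + eps). Sketch of the GRPO-style
    normalization only, not the complete RL objective."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four rollouts for one coding task, rewarded by (assumed) 0/1 hidden tests:
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 2) for a in advs])  # [1.0, -1.0, -1.0, 1.0]
```

Because the baseline is the group itself, no learned value model is needed, which is what makes this family of methods attractive for verifiable-reward settings like hidden test suites.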
Benchmarks
The research team reports the following pass@1 scores (with test-time scaling where applicable):
- SWE-bench Verified: 65.8% (with test-time scaling).
- LiveCodeBench-v5: 68.6%; LCB-v6: 63.5%.
- Math-500: 96.6%; AIME-24: 76.0%; AIME-25: 68.2%.
- CruxEval-Output: 94.3%.
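For context on how such numbers are computed, the standard unbiased pass@k estimator (Chen et al., 2021) gives the probability that at least one of k samples, drawn from n generations of which c are correct, passes. This is the conventional metric definition, not necessarily the exact harness used in this release:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k: 1 - C(n-c, k) / C(n, k), the chance that a
    random size-k subset of n generations contains a correct one."""
    if n - c < k:
        return 1.0  # too few failures to fill a size-k subset
    return 1.0 - comb(n - c, k) / comb(n, k)

# pass@1 reduces to the fraction of correct samples:
print(round(pass_at_k(200, 130, 1), 3))  # 0.65
```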
CWM is competitive with similar-size open-weights models, and with larger ones, on SWE-bench Verified.
See the official benchmark resources for more on SWE-bench Verified task design and metrics.

Why does world modeling matter for code?
This release highlights two operational abilities:
- Execution-trace prediction: given a function and a trace start, CWM predicts the stack frame (locals) and the executed line at each step via a structured format, usable as a "neural debugger" for grounded reasoning without live execution.
- Agentic coding: multi-turn reasoning with tool use against real repositories, verified by hidden tests and rewarded for patch similarity; this setup teaches the model to localize faults and generate end-to-end patches as git diffs rather than snippets.
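The end-to-end patch format described above is an ordinary unified diff. As an illustration of what such a patch looks like (the agent's actual edit/submit tooling is not shown here), Python's standard library can render one:

```python
import difflib

def make_patch(path, before, after):
    """Render a git-diff-style unified patch for one file. Sketch of the
    output format only; 'path', 'before', and 'after' are made up."""
    return "".join(difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    ))

before = "def add(a, b):\n    return a - b\n"
after  = "def add(a, b):\n    return a + b\n"
print(make_patch("calc.py", before, after))
```

Rewarding whole diffs of this shape, rather than isolated snippets, pushes the model toward edits that apply cleanly to a real repository.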
Notable details
- Tokenizer: Llama-3 family with reserved control tokens; the reserved IDs demarcate trace and reasoning segments during SFT.
- Attention layout: a 3:1 local:global interleave repeated across the full depth; long-context training uses large token batch sizes to stabilize gradients.
- Compute scaling: learning rates and batch sizes follow internal scaling laws tailored to long-context overheads.
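The 3:1 interleave can be written out as a per-layer schedule. A sketch, assuming the 64 layers from the article and a repeating local-local-local-global pattern (the true placement of global layers in the released model may differ):

```python
def attention_schedule(num_layers=64,
                       pattern=("local", "local", "local", "global")):
    """Assign each layer a window type under a 3:1 local:global
    interleave. Assumed pattern for illustration only."""
    return [pattern[i % len(pattern)] for i in range(num_layers)]

layers = attention_schedule()
print(layers[:8])              # ['local', 'local', 'local', 'global', ...]
print(layers.count("global"))  # 16 global layers out of 64
```

With this layout, three quarters of the layers pay only the 8k-window attention cost, while the periodic global layers keep the full 131k context reachable.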
Summary
CWM is a pragmatic step toward grounded code generation: Meta ties a 32B dense transformer to execution-trace learning and agentic, test-verified patching, releases intermediate/post-trained checkpoints, and gates usage under the FAIR Non-Commercial Research License—making it a useful platform for reproducible ablations on long-context, execution-aware coding without conflating research with production deployment.
Check out the Paper, the GitHub Page, and the Model on Hugging Face.
Asif Razzaq is the CEO of Marktechpost Media Inc. A visionary engineer and entrepreneur, he is dedicated to harnessing Artificial Intelligence for social good. His latest venture, Marktechpost, is a media platform known for in-depth coverage of machine learning and deep learning news that is technically accurate yet accessible to a broad audience. The platform's popularity is reflected in over 2 million monthly views.


