xAI Launches Grok-4Fast : Trained end-toend using tool-use reinforcement learning (RL), with a 2M-Token context.

xAI is a new technology introduced Grok-4-FastGrok-4 is a successor designed to be cost-effective that merges “reasoning” You can also find out more about the following: “non-reasoning” The system can control weights by prompting users. The model targets high-throughput search, coding, and Q&A with a 2M-token context window Use native tools-use RL to decide when you want to surf the internet, run code or use tools.

Architecture note

Grok Previous releases split in long-chain “reasoning” Shorten it to “non-reasoning” Different models have different responses. Grok-4-Fast’s Unified weight space Reduces latency from end to end and the tokens used by guiding behavior through system prompts. This is important for real-time (search) applications, such as interactive code, or assistive agents.

Use of search agents

Grok-4 Fast was taught end-to-end. Reminding learners to use tools Search-centric benchmarks show gains in the search agent: BrowseComp 44.9, SimpleQA 95.0%, Reka Research 65.0%Scores are higher on Chinese versions (e.g. BrowseComp-zh 51.2%). xAI also cites the private battle-testing done on LMArena. grok-4-fast-search (codename “menlo”Search Arena: Ranks #1 with 1163 EloThe text version (codename) is also available. “tahoe”The’seat’ of the. Text Arena: #8The average price of a car in the United States is roughly equal to grok-4-0709.

Performance and Efficiency Deltas

Grok-4 posts can be used to benchmark internal or public benchmarks Scores for the frontier class While cutting token use. xAI provides pass@1 results. 92.0% (AIME 2025, no tools), 93.3% (HMMT 2025, no tools), 85.7% (GPQA Diamond)” 80.0% (LiveCodeBench Jan–May)Grok-4 is similar to or comparable but not using There are 40% less people who use the internet. “thinking” tokens on average. The company’s frame this as “intelligence density,” Claim a Price reduction of 98% to achieve the benchmark performance level as Grok-4 When the new token pricing and lower token count are combined.

Cost and deployment

Model is All Users Can Access It Grok The Fastest Way to Get Your Own Ticket You can also find out more about the following: Autobahnen modes across web and mobile; Auto will select Grok-4-Fast for difficult queries to improve latency without losing quality, and—for the first time—Free users access xAI’s latest model tier. For developers, xAI exposes two SKUs—grok-4-fast-reasoning You can also find out more about the following: grok-4-fast-non-reasoning—both with The 2M Context. Pricing for xAI APIs is $0.20 / 1M input tokens (, $0.40 / 1M input tokens (≥128k), $0.50 / 1M output tokens (, $1.00 / 1M output tokens (≥128k)” $0.05 per 1M input tokens.

https://x.ai/news/grok-4-fast

Five Takeaways for Technical Professionals

Unified model + 2M context. A single space is used for weights in Grok-4. “reasoning” You can also find out more about the following: “non-reasoning,” The prompt-steered window is 2,000,000 tokens across the two SKUs.
Prices for scaling. Pricing starts at The input is $0.20 per M., Output: $0.50 per MThe cached input is at $0.05/M The rates are higher only if you go beyond the 128K context.
Efficiency claims. xAI reportsThe difference is 40% “thinking” tokens Grok-4 can be used with similar accuracy. Grok-4 at 98% lower price Frontier benchmarks
Benchmark profile. Reported pass@1: AIME-2025 92.0%, HMMT-2025 93.3%, GPQA-Diamond 85.7%, LiveCodeBench (Jan–May) 80.0%.
Agentic/search use. The tool is used in the real world. Documented search-agent metrics are included, as well as live-search billing.

You can read more about it here:

The Grok-4 fast model combines the Grok-4 level of capability with an easy-to-use prompt steering interface, a tool-use Real-time Learning (RL) and pricing that is optimized for search workloads and agents. Early signals (LMArena’s #1 position in search, and competitive text placement) are consistent with xAI’s claim that similar accuracy can be achieved using 40% fewer tokens. “thinking” The tokens reduce production costs and latency.

Take a look at the Technical details. Please feel free to browse our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter Join our Facebook group! 100k+ ML SubReddit Subscribe now our Newsletter.

Asif Razzaq serves as the CEO at Marktechpost Media Inc. As an entrepreneur, Asif has a passion for harnessing Artificial Intelligence to benefit society. Marktechpost was his most recent venture. This platform, a Media Platform for Artificial Intelligence, is outstanding in its technical accuracy and ease of understanding by the general public. Over 2 million views per month are a testament to the platform’s popularity.

xAI Launches Grok-4Fast : Trained end-toend using tool-use reinforcement learning (RL), with a 2M-Token context.

Google Cloud AI Research introduces ReasoningBank: a memory framework that distills reasoning strategies from agent successes and failures.

Equinox Detailed implementation with JAX Native Moduls, Filtered Transformations, Stateful Ladders and Workflows from End to end.

Xiaomi MiMo V2.5 Pro and MiMo V2.5 Released: Frontier Model Benchmarks with Significantly Lower Token Cost

How to Create a Multi-Agent System of Production Grade CAMEL with Tool Usage, Consistency, and Criticism-Driven Improvement

Google Pixel 10 series, Pixel Watch 4 Pixel Buds: Features, Specs, and Release Date

It’s Hard to Be Excited about a New Amazon Smartphone

ChatGPT will soon have ads. Advertisements are Coming to ChatGPT.

Biggest Artificial Intelligence Companies Meet to Discover a Better Way for Chatbots

Nvidia CEO Dismisses Considerations of an AI Bubble. Buyers Stay Skeptical

Top Insights

EraRAG is a multi-layer, scalable graph-based retrieval system that can be used for dynamic and growing corpora.

R-Zero is a fully autonomous AI framework that generates its own training data from scratch.

Latest News

Google Cloud AI Research introduces ReasoningBank: a memory framework that distills reasoning strategies from agent successes and failures.

Equinox Detailed implementation with JAX Native Moduls, Filtered Transformations, Stateful Ladders and Workflows from End to end.

xAI Launches Grok-4Fast : Trained end-toend using tool-use reinforcement learning (RL), with a 2M-Token context.

Architecture note

Use of search agents

Performance and Efficiency Deltas

Cost and deployment

Five Takeaways for Technical Professionals

You can read more about it here:

Related Posts