AI-trends.today

Stanford Researchers Release AgentFlow: In-the-Flow Reinforcement Learning (RL) for Modular, Tool-Using AI Agents

Tech | By Gavin Wallace | 09/10/2025 | 4 Mins Read

TL;DR: AgentFlow is a trainable agent framework with four modules (Planner, Executor, Verifier, Generator) coordinated by an explicit memory and toolset. The planner is optimized in the loop with a new on-policy method, Flow-GRPO, which broadcasts a trajectory-level outcome reward to every turn and applies token-level PPO-style updates with KL regularization and group-normalized advantages. On ten benchmarks, a 7B backbone tuned with Flow-GRPO reports average gains of +14.9% (search), +14.0% (agentic), +14.5% (math), and +4.1% (science) over strong baselines.

What is AgentFlow?

AgentFlow formalizes multi-turn, tool-integrated reasoning as a Markov Decision Process (MDP). At each turn, the Planner proposes a sub-goal and selects a tool plus its context; the Executor calls the tool; the Verifier signals whether to continue; and the Generator emits the final answer at termination. A structured, evolving memory records states, tool calls, and verification signals, which constrains context growth and makes trajectories auditable. Only the Planner is trained; the other modules can be fixed engines.

The public implementation ships a modular toolkit (e.g., base_generator, python_coder, google_search, wikipedia_search, web_search). The repository is MIT-licensed.
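
The turn loop described above can be sketched in a few lines of Python. This is a minimal toy under stated assumptions, not the AgentFlow codebase: the rule-based `planner`, the two stub tools (which only borrow the names `python_coder` and `wikipedia_search`), the boolean "stop" convention for `verifier`, the `max_turns` budget, and the `Memory` record layout are all hypothetical. In AgentFlow the Planner is a trained LLM and the other modules are model- or API-backed.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stub tools. The names echo the repo's toolkit, but these are toys:
# python_coder only sums "a+b+..." strings, and wikipedia_search always misses.
TOOLS: dict[str, Callable[[str], str]] = {
    "python_coder": lambda expr: str(sum(int(x) for x in expr.split("+"))),
    "wikipedia_search": lambda query: f"[no results for {query!r}]",
}


@dataclass
class Memory:
    """Evolving, structured memory: one compact record per turn."""
    records: list[dict] = field(default_factory=list)

    def add(self, **record) -> None:
        self.records.append(record)


def planner(question: str, memory: Memory) -> tuple[str, str, str]:
    """The only trainable module in AgentFlow; here a fixed toy rule.
    A learned planner would also condition on memory.records."""
    if any(c.isdigit() for c in question):
        return ("evaluate the expression", "python_coder", question)
    return ("look up background facts", "wikipedia_search", question)


def executor(tool: str, context: str) -> str:
    """Call the selected tool with the planner-supplied context."""
    return TOOLS[tool](context)


def verifier(result: str) -> bool:
    """Toy stop signal: True if the tool returned something usable."""
    return not result.startswith("[no results")


def generator(question: str, memory: Memory) -> str:
    """Emit the final answer; the toy just returns the last tool result."""
    return memory.records[-1]["result"] if memory.records else ""


def run_agent(question: str, max_turns: int = 3) -> tuple[str, Memory]:
    memory = Memory()
    for turn in range(max_turns):
        sub_goal, tool, context = planner(question, memory)
        result = executor(tool, context)
        done = verifier(result)
        memory.add(turn=turn, sub_goal=sub_goal, tool=tool, result=result, done=done)
        if done:
            break
    return generator(question, memory), memory


answer, memory = run_agent("2+3+4")
print(answer, len(memory.records))  # 9 1
```

Only `planner` would be swapped for a trainable policy; the executor, verifier, and generator can stay fixed, mirroring the planner-only training described above.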

https://arxiv.org/pdf/2510.05592

Training Method: Flow-GRPO

Flow-GRPO is a group-based, on-policy optimization method that converts long-horizon, sparse-reward optimization into tractable single-turn updates:

  • Final-outcome reward broadcast: A single, verifiable trajectory-level signal (LLM-as-judge correctness) is assigned to every turn, aligning local planning steps with global success.
  • Token-level clipped objective: Importance-weighted ratios are computed per token, with PPO-style clipping and a KL penalty toward a reference policy to prevent drift.
  • Group-normalized advantages: Normalizing rewards within a group of on-policy rollouts reduces variance and stabilizes updates.
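
The three ingredients above can be illustrated with a small pure-Python sketch. It is an illustration under assumptions, not the paper's implementation: the binary judge reward, the clip range `clip_eps=0.2`, the KL weight `beta=0.01`, the k3 KL estimator (common in GRPO-style trainers), and uniform averaging over all tokens of a trajectory are illustrative choices; the paper's exact normalization across turns and tokens may differ.

```python
import math
from statistics import fmean, pstdev

# Hypothetical per-token record: (logp under current policy,
#                                  logp under the policy that sampled the rollout,
#                                  logp under the frozen reference policy)
Token = tuple[float, float, float]


def group_normalized_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """One advantage per rollout in the group: (r - mean) / (std + eps)."""
    mu, sigma = fmean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def token_objective(tok: Token, adv: float, clip_eps: float = 0.2, beta: float = 0.01) -> float:
    """PPO-style clipped surrogate for one token, minus a KL penalty to the reference."""
    logp_new, logp_old, logp_ref = tok
    ratio = math.exp(logp_new - logp_old)            # per-token importance weight
    clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
    surrogate = min(ratio * adv, clipped * adv)      # pessimistic (clipped) bound
    log_r = logp_ref - logp_new                      # k3 estimator of KL(new || ref)
    kl = math.exp(log_r) - log_r - 1
    return surrogate - beta * kl


def trajectory_loss(turns: list[list[Token]], adv: float) -> float:
    """Broadcast ONE trajectory-level advantage to every token of every turn,
    then average the per-token objectives (negated, so lower is better)."""
    per_token = [token_objective(tok, adv) for turn in turns for tok in turn]
    return -fmean(per_token)


# A group of 4 rollouts for one query; reward 1.0 = final answer judged correct.
rewards = [1.0, 0.0, 0.0, 1.0]
advs = group_normalized_advantages(rewards)
print([round(a, 3) for a in advs])  # [1.0, -1.0, -1.0, 1.0]

# Rollout 0 had two planner turns (3 and 2 tokens); policy unchanged since sampling.
same = (-1.2, -1.2, -1.2)
turns = [[same] * 3, [same] * 2]
print(round(trajectory_loss(turns, advs[0]), 3))  # -1.0
```

Because every turn reuses the same trajectory-level advantage, each planner turn can be optimized like a single-turn update, which is the purpose of the broadcast.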

Understanding the benchmarks and results

Benchmarks. The research team evaluates four task types: knowledge-intensive search (Bamboogle, 2Wiki, HotpotQA, Musique), agentic reasoning (GAIA textual split), math (AIME-24, AMC-23, Game of 24), and science (GPQA, MedQA). GAIA is a benchmark for tool-using general assistants; its textual split excludes multimodal requirements.

Main numbers (7B backbone after Flow-GRPO). Average gains over strong baselines: +14.9% (search), +14.0% (agentic), +14.5% (math), and +4.1% (science). The research team states that its 7B system surpasses GPT-4o on the reported suite. The project page also reports training effects such as improved planning quality, fewer tool-calling errors (up to 28.4% on GAIA), and positive trends with larger turn budgets and model scale.

Ablations. Online Flow-GRPO improves performance by +17.2% over a frozen-planner baseline, whereas training the planner with offline supervised fine-tuning (SFT) degrades performance by −19.0% on a composite measure.

What you need to know

  • Modular agent, planner-only training. AgentFlow structures an agent into Planner–Executor–Verifier–Generator with an explicit memory; only the Planner is trained in-loop.
  • Flow-GRPO converts long-horizon RL into single-turn updates. It broadcasts a trajectory-level outcome reward to every turn and updates with token-level PPO-style clipping and KL regularization.
  • Reported gains on 10 benchmarks. With a 7B backbone, AgentFlow reports average improvements of +14.9% (search), +14.0% (agentic/GAIA textual), +14.5% (math), and +4.1% (science) over strong baselines, and the team states that it surpasses GPT-4o.
  • Tool-use reliability improves. The research team reports fewer tool-calling errors (e.g., on GAIA) and better planning quality with larger turn budgets and model scale.

AgentFlow formalizes tool-using agents as four modules (planner, executor, verifier, generator) and trains only the planner in-loop via Flow-GRPO, which broadcasts a single trajectory-level reward to every turn and applies token-level PPO-style updates with KL control. Reported results on ten benchmarks show average gains of +14.9% (search), +14.0% (agentic/GAIA textual split), +14.5% (math), and +4.1% (science); the team additionally states that the 7B system outperforms GPT-4o. The GitHub repository provides the implementation, tools, and quick-start scripts under the MIT license.

