Existing long-CoT models achieve state-of-the-art performance on mathematical reasoning by generating long reasoning paths with self-verification. Open-source long-CoT models, however, depend on natural-language reasoning traces, which makes them computationally expensive and error-prone. Tool-aided reasoning is more efficient and reliable for large numerical computations, thanks to frameworks such as OpenHands that integrate code interpreters. Agentic approaches, in turn, struggle with complex or abstract reasoning problems.
DualDistill Framework and Agentic-R1 Model
Researchers at Carnegie Mellon University propose DualDistill, a framework that combines the trajectories of two teachers to produce a unified student model. One teacher focuses on reasoning and the other on tool use. The resulting student, Agentic-R1, learns to select the best strategy for each type of problem dynamically: it executes code for arithmetic and algorithmic tasks and relies on natural-language reasoning for abstract problems. DualDistill uses trajectory composition to distill knowledge from the two complementary teachers. The researchers chose OpenHands as the agentic reasoning teacher and DeepSeek-R1 as the text-based reasoning teacher.
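The idea of composing trajectories from two teachers can be illustrated with a minimal sketch. The article does not spell out the composition rule, so everything below is an assumption made for illustration: the `Trajectory` type, the `compose` function, the "switching" marker, and the rule itself (keep the shorter trace when both teachers succeed, chain the failed attempt into the successful one when only one succeeds, and drop the problem when both fail).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trajectory:
    teacher: str   # hypothetical label: "reasoning" or "agentic"
    text: str      # full solution trace produced by the teacher
    correct: bool  # whether the final answer matched the reference


def compose(reasoning: Trajectory, agentic: Trajectory) -> Optional[str]:
    """Build one training target from two teacher trajectories.

    Assumed rule (not taken from the article):
    - both correct: keep the shorter trace
    - exactly one correct: failed attempt, then a switch to the correct one
    - both wrong: return None so the problem is skipped
    """
    if reasoning.correct and agentic.correct:
        return min((reasoning, agentic), key=lambda t: len(t.text)).text
    if reasoning.correct or agentic.correct:
        good, bad = (
            (reasoning, agentic) if reasoning.correct else (agentic, reasoning)
        )
        return f"{bad.text}\n[switching to {good.teacher} strategy]\n{good.text}"
    return None


# Example: the reasoning teacher fails, the agentic teacher succeeds.
example = compose(
    Trajectory("reasoning", "Let me count the cases by hand...", False),
    Trajectory("agentic", "I will enumerate the cases with code...", True),
)
```

A composed target of this kind would show the student both an unsuccessful strategy and the switch to a working one, which is one plausible way a single model could learn when to change approach.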
Assessment and Benchmarks
The method is evaluated on multiple benchmarks, such as DeepMath-L and Combinatorics300, which test various aspects of mathematical reasoning. It is compared against the baselines DeepSeek-R1-Distill and Qwen-2.5-Instruct. Agentic-R1 shows a substantial improvement in performance that draws on both agentic and reasoning strategies. It outperforms two similarly sized models, each specializing in either tool-assisted (Qwen2.5-7B-Instruct) or pure reasoning (DeepSeek-R1-Distill-7B) strategies. Agentic-R1 also outperforms other tool-based models because it intelligently applies reasoning strategies when solving standard math problems.
Qualitative Analysis and Tool Usage Patterns
Agentic-R1 demonstrates intelligent tool-usage patterns: it activates code execution tools on 79.2% of the computationally demanding Combinatorics300 problems, but on only 52.0% of the simpler AMC problems. Without explicit instruction, Agentic-R1 learns when to invoke tools through supervised fine-tuning alone, balancing computational efficiency with reasoning accuracy.
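A tool-usage rate like the ones quoted above is the share of solution traces that contain at least one code-execution call. The sketch below shows one way to compute it, assuming traces are plain strings and that tool calls are marked by a `<code>` tag; the tag, the function name, and the sample traces are hypothetical and not taken from the paper.

```python
from typing import Iterable

TOOL_MARKER = "<code>"  # hypothetical tag marking a code-execution call


def tool_usage_rate(trajectories: Iterable[str], marker: str = TOOL_MARKER) -> float:
    """Return the percentage of trajectories containing at least one tool call."""
    traces = list(trajectories)
    if not traces:
        return 0.0
    used = sum(1 for trace in traces if marker in trace)
    return 100.0 * used / len(traces)


# Made-up traces: two of the four invoke the (hypothetical) code tool.
sample = [
    "Enumerate all cases: <code>print(sum(range(10)))</code>",
    "By symmetry the answer is 12.",
    "Check numerically: <code>print(2 ** 10)</code>",
    "A direct argument gives 7.",
]
rate = tool_usage_rate(sample)
```

Computing the rate separately for each benchmark would give the per-dataset comparison described in the text.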
Robustness to Imperfect Teachers
The framework remains effective even when guided by imperfect teachers. The agentic teacher, for example, achieves only 48.4% accuracy on Combinatorics300, yet the student model improves from 44.7% to 50.9%, ultimately outperforming the teacher.
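The size of these gaps can be checked with a few lines of arithmetic. The snippet reads 44.7% and 50.9% as the student's accuracy before and after distillation, which is an interpretation of the figures above; the variable names are illustrative.

```python
# Combinatorics300 accuracies quoted in the text, in percent.
teacher_acc = 48.4     # agentic teacher
student_before = 44.7  # student before distillation (assumed reading)
student_after = 50.9   # student after distillation (assumed reading)

# Round to one decimal to avoid floating-point noise in the differences.
gain = round(student_after - student_before, 1)
margin_over_teacher = round(student_after - teacher_acc, 1)

print(gain, margin_over_teacher)  # prints "6.2 2.5"
```

Under that reading, the student gains 6.2 percentage points and finishes 2.5 points above its own agentic teacher.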
Conclusion
In summary, the DualDistill framework combines natural-language reasoning with tool-assisted problem solving by distilling complementary teacher knowledge into one versatile student model, Agentic-R1. Agentic-R1 balances precision with computational efficiency by learning to select the best strategy for each problem dynamically. It outperforms both pure-reasoning and tool-based models across a range of mathematical reasoning benchmarks, even when learning from imperfect teachers. The work highlights a promising approach for building adaptive AI agents that integrate heterogeneous strategies to solve problems more robustly.
Check out the Paper and GitHub Page. Credit for this research goes to the researchers on this project.


