Microsoft Research has released OptiMind, an AI system that converts complex decision problems described in natural language into mathematical formulations that optimization solvers can execute. It addresses a long-standing bottleneck in operations research: translating business intent into mixed integer programs requires expert modelers and can take days.
What OptiMind Is and What It Produces
OptiMind-SFT is a 20B-parameter Mixture of Experts model in the gpt-oss Transformer family. Because routing activates only a subset of experts per token, its inference cost resembles that of a mid-sized model while retaining high capacity. A 128,000-token context window lets complex specifications fit in a single request.
The model takes as input a natural-language description of an optimization problem. Its output is a mathematical formulation along with an executable Python program that uses GurobiPy. The script defines decision variables, an objective, and constraints, then calls the Gurobi solver and reports the optimal objective value and the decisions.
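To make the input/output contract concrete, here is a minimal sketch of the structure such a generated script has: decision variables, a constraint, an objective, a solve, and a report. Since Gurobi may not be installed everywhere, this stand-in enumerates a tiny 0-1 knapsack by brute force; a real generated script would instead build a `gurobipy` model and call `model.optimize()`. The problem data below is illustrative, not from the paper.

```python
from itertools import product

# Toy problem the model might receive as text:
# "Choose projects to maximize profit subject to a budget of 10."
profits = {"A": 6, "B": 5, "C": 4}
costs = {"A": 5, "B": 4, "C": 3}
budget = 10

names = list(profits)
best_obj, best_plan = None, None

# Brute-force stand-in for the MILP solve: try every binary assignment.
for bits in product([0, 1], repeat=len(names)):
    plan = dict(zip(names, bits))
    cost = sum(costs[n] * plan[n] for n in names)
    if cost <= budget:  # budget constraint
        obj = sum(profits[n] * plan[n] for n in names)  # objective
        if best_obj is None or obj > best_obj:
            best_obj, best_plan = obj, plan

print("optimal objective:", best_obj)   # -> 11 (select A and B)
print("decisions:", best_plan)
```

In the real pipeline the same three ingredients (binary variables, the budget constraint, the profit objective) would appear as `model.addVars(..., vtype=GRB.BINARY)`, `model.addConstr(...)`, and `model.setObjective(...)`.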
OptiMind is a formulation layer between domain experts and standard MILP tooling. It does not replace the solver; the generated MILP is still optimized by Gurobi.
Dataset, Architecture, Training, and Setup
The base model, openai/gpt-oss-20b, is fine-tuned into microsoft/OptiMind-SFT using cleaned optimization datasets. The architecture is a Mixture of Experts Transformer in which routing activates a subset of experts for each token. The model is released under the MIT License.
The reference setup uses eight NVIDIA H100 GPUs for training, inference, and evaluation; fine-tuning is reported to take about eight hours. The team recommends GPUs such as the A100, H100, or B200 with at least 32 GB of GPU memory.
The research team builds cleaned versions of OR-Instruct Train and OptMATH Train for supervised fine-tuning, and uses validated, cleaned versions of OptMATH, Mamo Complex, and IndustryOR for testing. These benchmarks cover difficult formulation tasks on which models reach only 20-50 percent accuracy in the noisy original versions.
Class-Based Error Analysis and Data Cleaning
OptiMind combines optimization expertise with LLM training. The team classified problems in OR-Instruct and OptMATH, such as set cover, flow shop scheduling, and the traveling salesman problem, into 53 seed classes.
For each class, they run the gpt-oss-20b base model on a selection of problems and flag those where the output does not match the ground truth. Optimization experts inspect these items, identify the formulation mistakes, and write short error descriptions and prevention hints. Hints include correct constraints, variable bounds, or modeling tricks such as Miller-Tucker-Zemlin subtour elimination for TSP.
The team then runs a semi-automatic pipeline. A large model, prompted with the class-specific hints, regenerates solutions, and majority voting over samples improves quality and removes inconsistent items. The pipeline also detects missing parameters and ambiguous sentences and regenerates problem descriptions as needed, producing a clean training corpus better aligned with the mathematical formulations.
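The hint-injection step can be sketched as a simple lookup plus prompt assembly. The hint table and wording below are hypothetical, standing in for the expert-written error summaries described above:

```python
# Hypothetical class -> hint table distilled from expert error analysis.
CLASS_HINTS = {
    "tsp": "Add Miller-Tucker-Zemlin constraints to eliminate subtours.",
    "set_cover": "Use binary variables; require every element to be covered.",
    "flow_shop": "Use disjunctive constraints so each machine runs one job at a time.",
}

def build_cleaning_prompt(problem_text: str, problem_class: str) -> str:
    """Attach the class-specific hint before asking a large model
    to regenerate the formulation and solution."""
    hint = CLASS_HINTS.get(problem_class, "")
    return (
        f"Problem class: {problem_class}\n"
        f"Expert hint: {hint}\n\n"
        f"Problem:\n{problem_text}\n\n"
        "Write the mathematical formulation and a GurobiPy script."
    )

prompt = build_cleaning_prompt(
    "Visit all 5 cities exactly once, minimizing travel cost.", "tsp"
)
print(prompt)
```

The same assembled prompt format can be reused at inference time, which is why the cleaning hints transfer directly to the deployed pipeline.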
Inference Pipeline, Hints, and Test-Time Scaling
At inference time, OptiMind operates as a multi-stage system rather than a single prompt. The default pipeline first classifies each test problem into one of the 53 optimization categories used in error analysis, then appends that class's error summary to the prompt.
The model generates a reasoning trace, a mathematical formulation, and the GurobiPy program. When more compute is available, it applies self-consistency with majority voting: the system generates many candidate scripts, executes them, and selects the most common solution within set tolerances.
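The voting step can be sketched as clustering candidate objective values that agree within a tolerance and keeping the largest cluster. This is a minimal stand-in, assuming each sampled script has already been executed and reduced to an objective value (or `None` on failure):

```python
def majority_objective(candidates, tol=1e-4):
    """Cluster candidate objective values that agree within `tol`
    and return the representative of the largest cluster."""
    clusters = []  # list of (representative_value, count)
    for v in candidates:
        if v is None:  # failed execution: discard this sample
            continue
        for i, (rep, n) in enumerate(clusters):
            if abs(v - rep) <= tol:
                clusters[i] = (rep, n + 1)
                break
        else:
            clusters.append((v, 1))
    if not clusters:
        return None
    return max(clusters, key=lambda c: c[1])[0]

# Five sampled scripts: three agree on 42.0 within tolerance, one fails.
print(majority_objective([42.0, 41.99999, 37.5, None, 42.00001]))  # -> 42.0
```

A production pipeline would also compare the returned decision variables, not just the objective, but objective-level voting already filters out most formulation outliers.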
A multi-turn correction mode can also be enabled. The system generates code, captures solver logs or execution errors, and feeds that feedback back to the model, letting it revise the formulation and code over several rounds. This corrects some coding and modeling errors at the cost of higher latency.
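The correction loop amounts to a bounded retry with captured feedback. In this sketch, `run_script` and `revise` are stubs standing in for sandboxed script execution and for re-prompting the model with the error log; only the loop structure reflects the pipeline described above:

```python
def run_script(code: str):
    """Stand-in executor: returns (objective, error_log). A real pipeline
    would run the GurobiPy script in a sandbox and capture solver output."""
    if "bug" in code:
        return None, "NameError: name 'x' is not defined"
    return 42.0, ""

def revise(code: str, error_log: str) -> str:
    """Stand-in for re-prompting the model with the captured feedback."""
    return code.replace("bug", "fixed")

def solve_with_feedback(code: str, max_rounds: int = 3):
    for _ in range(max_rounds):
        objective, log = run_script(code)
        if objective is not None:
            return objective
        code = revise(code, log)  # feed runtime/solver errors back to the model
    return None  # still failing after the round budget

print(solve_with_feedback("model with bug"))  # repaired on the second round -> 42.0
```

Capping `max_rounds` is what keeps the latency overhead bounded; each extra round trades response time for a chance to repair the formulation.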
Quantitative Gains On Optimization Benchmarks
The OptiMind framework significantly increases solution accuracy on cleaned versions of IndustryOR, Mamo-Complex, and OptMATH. Fine-tuning alone improves formulation accuracy by up to 20.7 percent across benchmarks, with further gains from test-time scaling techniques such as self-consistency and multi-turn feedback.
OptiMind is more accurate than other open models of similar or larger size, and its performance is comparable to proprietary frontier models such as o4-mini and GPT-5 under the reported evaluation settings.
These results depend on careful cleaning of both training data and test sets. The researchers report that many apparent errors in the benchmarks were actually due to missing data, unclear descriptions, or incorrect reference solutions; re-cleaning the sets raises apparent accuracy from 40-60% to 70-90%.
What you need to know
- OptiMind uses a 20B-parameter mixture of experts transformer (gpt-oss) that turns natural-language problem descriptions into a mathematical formulation and a GurobiPy program. About 3.6B parameters are active per token, and the context window is 128,000 tokens.
- Models are fine-tuned from openai/gpt-oss-20b. The benchmarks, validated by experts, include IndustryOR and Mamo Complex and focus on mixed integer linear programming formulations.
- OptiMind applies class-based error analysis and expert-written hints across 53 optimization categories, using the hints both at data-cleaning time and at inference, which reduces common MILP formulation mistakes.
- Test-time scaling methods such as self-consistency and multi-turn feedback let OptiMind reach performance comparable to large proprietary systems.
- OptiMind-SFT is released as microsoft/OptiMind-SFT on Hugging Face and as microsoft-optimind-sft on Azure AI Foundry, where it can be served via SGLang as an OpenAI-compatible endpoint. This enables practical integration into decision-support pipelines for supply chains, scheduling, manufacturing, and logistics.
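Because the SGLang deployment exposes an OpenAI-compatible interface, integration reduces to a standard chat-completions request. The sketch below only builds the request body; the endpoint URL in the comment, the system prompt wording, and the temperature are illustrative assumptions, not documented defaults:

```python
import json

def make_request_body(problem_text: str,
                      model: str = "microsoft/OptiMind-SFT") -> dict:
    """Assemble an OpenAI-compatible chat-completions payload for an
    SGLang-served OptiMind endpoint (prompt wording is illustrative)."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Formulate the problem as a MILP and emit a GurobiPy script."},
            {"role": "user", "content": problem_text},
        ],
        "temperature": 0.6,
    }

body = make_request_body("Schedule 3 jobs on 2 machines to minimize makespan.")
payload = json.dumps(body)
# POST `payload` to e.g. http://localhost:30000/v1/chat/completions
print(payload[:80])
```

Any OpenAI-compatible client library can send this payload, which is what makes drop-in use inside existing decision-support tooling practical.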
Asif Razzaq serves as the CEO at Marktechpost Media Inc. As an entrepreneur, Asif has a passion for harnessing Artificial Intelligence to benefit society. Marktechpost was his most recent venture. This platform, devoted to Artificial Intelligence, is distinguished by its technical soundness and accessibility. Over 2 million views per month are a testament to the platform’s popularity.

