Chinese AI Startup DeepSeek Releases DeepSeek-V3.1

DeepSeek-V3.1 is the latest language model from DeepSeek, building on the architectural foundation of DeepSeek-V3. DeepSeek models have quickly gained a reputation for strong reasoning and tool use, delivering performance on par with OpenAI's and Anthropic's models at a fraction of the cost.
Model Architecture and Capabilities
- Hybrid Thinking Mode: DeepSeek-V3.1 supports both thinking (deliberate chain-of-thought reasoning) and non-thinking (direct) generation, switchable via the chat template. This makes a single model flexible across a variety of use cases.
- Tool and Agent Support: The model is optimized for tool calling and agent tasks (e.g., using APIs, code execution, search). Tool calls use a structured format, and the release includes templates for custom code agents and search agents in the repository.
- Massive Scale, Efficient Activation: With 671B total parameters but only about 37B activated per token, the Mixture-of-Experts (MoE) design lowers inference cost while preserving capacity. The 128K-token context window is far larger than most competitors offer.
- Extended Long Context: DeepSeek-V3.1 uses a two-phase long-context extension approach. The 32K phase was trained on 630B tokens (10x more than V3), and the 128K phase on 209B tokens (3.3x more than V3). The model is trained with an FP8 microscaling format for efficient arithmetic on next-generation hardware.
- Chat Template: The template supports multi-turn conversations, with explicit tokens marking system prompts, user queries, and assistant responses. Thinking and non-thinking modes are activated by special tokens in the prompt sequence.
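To make the mode switch concrete, here is a minimal sketch of how a hybrid chat template might toggle thinking and non-thinking generation. The token names (`<|system|>`, `<|user|>`, `<|assistant|>`, `<think>`) and the layout are illustrative assumptions, not the exact special tokens shipped with the model.

```python
# Toy sketch of a hybrid chat template. The token names below are
# illustrative placeholders, not DeepSeek's actual special tokens.

def render_prompt(messages, thinking=True):
    """Flatten a multi-turn conversation into a single prompt string."""
    parts = [f"<|{msg['role']}|>{msg['content']}" for msg in messages]
    # Generation prompt: in thinking mode the assistant turn is opened
    # with a <think> token so the model emits a chain of thought
    # before its final answer; in non-thinking mode it answers directly.
    parts.append("<|assistant|>" + ("<think>" if thinking else ""))
    return "".join(parts)

conv = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 17 * 24?"},
]
print(render_prompt(conv, thinking=True))
print(render_prompt(conv, thinking=False))
```

In practice this switching is handled by the tokenizer's chat template, so downstream code only flips a flag rather than concatenating tokens by hand.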
Performance Benchmarks
DeepSeek-V3.1 is evaluated across many benchmarks, including general knowledge, coding, mathematics, tool usage, and agent tasks. Highlights (see the table below):
| Metric | V3.1-NonThinking | V3.1-Thinking | R1-0528 |
|---|---|---|---|
| MMLU-Redux (EM) | 91.8 | 93.7 | 93.4 |
| MMLU-Pro (EM) | 83.7 | 84.8 | 85.0 |
| GPQA-Diamond (Pass@1) | 74.9 | 80.1 | 81.0 |
| LiveCodeBench (Pass@1) | 56.4 | 74.8 | 73.3 |
| AIME 2025 (Pass@1) | 49.8 | 88.4 | 87.5 |
| SWE-bench (Agent mode) | 54.5 | — | 30.5 |
Notably, thinking mode consistently outperforms earlier versions in math and coding, while non-thinking mode is well suited to latency-sensitive applications.

Tool and Code Agent Integration
- Tool Calling: Scriptable workflows using external APIs and services are possible in non-thinking mode, with tool calls expressed in a structured format.
- Code and Search Agents: DeepSeek-V3.1 can drive code agents and use external search tools to retrieve up-to-date information, a feature that is critical for business, finance, and technical research applications.
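As an illustration of the tool-calling flow, here is a minimal sketch of the dispatch loop an agent harness might run around the model. The JSON call shape and the `get_weather` tool are hypothetical, chosen for illustration; they are not DeepSeek's exact tool-call schema.

```python
import json

# Hypothetical tool registry. In a real harness, the model is shown
# these tool schemas and emits a structured call naming one of them.
def get_weather(city: str) -> str:
    # Stub: a real tool would query a weather API here.
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    """Parse a structured tool call emitted by the model and run it."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# Pretend the model emitted this structured call in non-thinking mode.
raw_call = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = dispatch(raw_call)
print(result)  # the tool result is fed back to the model as a new turn
```

The key point is that the model's output is machine-parseable, so the harness can route calls to real APIs or a code sandbox and return results for further reasoning.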
Deployment
- Open Source, MIT License: All model weights and code are freely available on Hugging Face and ModelScope under the MIT License, encouraging both research and commercial use.
- Local Inference: The model structure is compatible with DeepSeek-V3, and instructions for local deployment are included. Due to its size, running the model requires substantial GPU resources, but the open ecosystem and community tools lower the barriers to adoption.
Summary
DeepSeek-V3.1 marks a major milestone in the democratization of AI, showing that a language model can be highly capable, open source, and cost-efficient at once. Its blend of scalable reasoning, tool integration, and strong performance on coding and mathematics tasks makes it a good choice for research as well as applied AI.
Check out the model on Hugging Face. Feel free to browse our GitHub page for tutorials, code, and notebooks. Also, follow us on Twitter, join our 100k+ ML SubReddit, and subscribe to our newsletter.


