The DeepSeek AI team has listened to extensive user feedback and officially launched an upgraded version of DeepSeek-V3.1: DeepSeek-V3.1-Terminus. The new version not only fixes language consistency issues but also significantly strengthens the Code Agent and Search Agent, delivering a more stable and capable AI experience. This article walks through the highlights of the Terminus release and examines its performance through detailed evaluation data.
We Heard Your Feedback: The Birth of DeepSeek-V3.1-Terminus
In today’s fast-iterating world of AI technology, the quality of a model is not just determined by cold evaluation scores, but by its ability to truly solve user pain points. The DeepSeek AI team clearly understands this principle. Recently, they officially launched DeepSeek-V3.1-Terminus, which is not just a version update, but more like a deep dialogue with the community.
Frankly, no matter how powerful a model is, if its output mixes Chinese and English, or occasionally produces some incomprehensible abnormal characters, the experience can be quite jarring. A core goal of the Terminus version is to solve this problem and comprehensively improve language consistency.
In addition, another major focus is the further evolution of Agent capabilities. The Agent here can be thought of as the “hands” and “feet” of the AI, allowing it to not only chat but also help you perform complex tasks. The Terminus version has been deeply optimized specifically for the Code Agent (a helper for writing code) and the Search Agent (a helper for searching the internet), making them more proficient in practical applications.
Not Just Talk: Seeing the Hard Power of Terminus with Data
Empty words are not enough; performance improvements must ultimately be backed by data. Let’s take a look at how DeepSeek-V3.1-Terminus performs on major authoritative benchmarks.
| Benchmark | DeepSeek-V3.1 | DeepSeek-V3.1-Terminus |
|---|---|---|
| **Non-Agent (thinking mode)** | | |
| MMLU-Pro | 84.8 | 85.0 |
| GPQA-Diamond | 80.1 | 80.7 |
| Humanity’s Last Exam | 15.9 | 21.7 |
| LiveCodeBench | 74.8 | 74.9 |
| Codeforces | 2091 | 2046 |
| Aider-Polyglot | 76.3 | 76.1 |
| **Agent** | | |
| BrowseComp | 30.0 | 38.5 |
| BrowseComp-zh | 49.2 | 45.0 |
| SimpleQA | 93.4 | 96.8 |
| SWE Verified | 66.0 | 68.4 |
| SWE-bench Multilingual | 54.5 | 57.8 |
| Terminal-bench | 31.3 | 36.7 |
The table above clearly shows that this update is comprehensive.
Non-Agent Evaluation (Model’s Basic Capabilities)
In the “Non-Agent Evaluation,” which tests the model’s basic knowledge and reasoning abilities, the Terminus version maintained its high standards and achieved breakthroughs in some areas.
- MMLU-Pro & GPQA-Diamond: These two benchmarks examine the model’s multi-task language understanding and professional Q&A capabilities. Terminus edges up from 84.8 to 85.0 and from 80.1 to 80.7 respectively, indicating a slightly more solid foundational knowledge base.
- Humanity’s Last Exam: This is a highly challenging test, and the score jumped significantly from 15.9 to 21.7. This means the model’s ability to handle extremely complex and tricky problems has been markedly enhanced.
- LiveCodeBench & Codeforces: In code-related tests, the scores held essentially steady (Codeforces dips slightly from 2091 to 2046), which shows the new version did not sacrifice its strong code-generation capabilities in the course of the other optimizations.
Agent Evaluation (Model’s Tool-Using Capabilities)
This part is the biggest highlight of this update! The Agent evaluation tests the model’s intelligence in using external tools (like browsers, terminals) to complete tasks.
- BrowseComp & SimpleQA: In tests simulating real-world web browsing and simple Q&A, the scores rose from 30.0 to 38.5 and from 93.4 to 96.8 respectively. This means the Terminus Search Agent has become smarter and can more accurately understand instructions and find answers. (The Chinese-language variant, BrowseComp-zh, dipped from 49.2 to 45.0.)
- SWE Verified & SWE-bench Multilingual: The software engineering benchmarks also saw steady growth (66.0 to 68.4 and 54.5 to 57.8), showing that the Code Agent has indeed reached a new level.
- Terminal-bench: In the test simulating the use of terminal command lines, the score increased from 31.3 to 36.7, which is undoubtedly good news for developers who need to perform complex system operations.
It is worth noting that the official announcement mentions that the Search Agent’s toolset has been adjusted in the new version. For more detailed technical information, refer to the official model card published on HuggingFace.
Experience It Now! How to Get the Latest DeepSeek-V3.1-Terminus?
After all this, are you eager to get your hands on it and try it out? It’s simple! DeepSeek has officially updated the models on all platforms to DeepSeek-V3.1-Terminus.
Whether you are used to using the official App, web version, or mini-program, what you are experiencing now is the latest and most powerful version.
For developers and researchers, the DeepSeek API has also been updated simultaneously, allowing you to seamlessly enjoy the stability and power brought by Terminus in your applications.
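DeepSeek’s API follows an OpenAI-compatible chat-completions format. As a minimal sketch (the endpoint and model names below follow DeepSeek’s public API documentation, and `YOUR_API_KEY` is a placeholder, not a real credential), a request body for the updated model could be assembled like this:

```python
import json

# Sketch only: the URL and model names follow DeepSeek's OpenAI-compatible
# API documentation; "YOUR_API_KEY" is a placeholder you must replace.
API_URL = "https://api.deepseek.com/chat/completions"

def build_chat_request(prompt: str, model: str = "deepseek-chat") -> str:
    """Build the JSON body for a single-turn chat completion request."""
    payload = {
        "model": model,  # "deepseek-chat" (non-thinking) or "deepseek-reasoner" (thinking)
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return json.dumps(payload)

body = build_chat_request("Summarize the Terminus update in one sentence.")
# This body would then be POSTed to API_URL with an
# "Authorization: Bearer YOUR_API_KEY" header.
```

Because the endpoint is OpenAI-compatible, existing client libraries can typically be pointed at it by overriding only the base URL and API key.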
Of course, as a committed supporter of the open-source community, DeepSeek also made the model weights available for download right away:
- HuggingFace: https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus
- ModelScope: https://modelscope.cn/models/deepseek-ai/DeepSeek-V3.1-Terminus
Frequently Asked Questions (FAQ)
Q1: What is the difference between DeepSeek-V3.1-Terminus and the previous version?
Terminus is an upgrade to V3.1 that targets two user-reported pain points: first, language consistency, significantly reducing mixed Chinese-English output and abnormal characters; second, Agent capability, making the model perform better and more stably when executing code and search tasks.
Q2: What is the biggest highlight of this update?
The biggest highlight is undoubtedly the significant improvement in Agent performance. Judging from the evaluation data, whether in simulated web browsing (BrowseComp) or software engineering tasks (SWE Verified), Terminus takes a clear step up, making it more practical in real-world application scenarios.
Q3: Do I need to pay to use this new model?
No! You can experience it directly through the free App, web version, and mini-program provided by DeepSeek. For developers with higher demands, you can choose to use the DeepSeek API (billed by usage) or download the open-source model directly from HuggingFace or ModelScope for deployment.