The DeepSeek AI team has listened to extensive user feedback and officially launched an upgraded version of DeepSeek-V3.1: DeepSeek-V3.1-Terminus. The new version not only fixes language consistency issues but also significantly strengthens the Code Agent and Search Agent, delivering a more stable and capable AI experience. This article walks through the highlights of the Terminus release and examines its performance through detailed evaluation data.
We Heard Your Feedback: The Birth of DeepSeek-V3.1-Terminus
In today’s fast-iterating world of AI technology, the quality of a model is not just determined by cold evaluation scores, but by its ability to truly solve user pain points. The DeepSeek AI team clearly understands this principle. Recently, they officially launched DeepSeek-V3.1-Terminus, which is not just a version update, but more like a deep dialogue with the community.
Frankly, no matter how powerful a model is, if its output mixes Chinese and English, or occasionally produces some incomprehensible abnormal characters, the experience can be quite jarring. A core goal of the Terminus version is to solve this problem and comprehensively improve language consistency.
In addition, another major focus is the further evolution of Agent capabilities. The Agent here can be thought of as the “hands” and “feet” of the AI, allowing it to not only chat but also help you perform complex tasks. The Terminus version has been deeply optimized specifically for the Code Agent (a helper for writing code) and the Search Agent (a helper for searching the internet), making them more proficient in practical applications.
Not Just Talk: Seeing the Hard Power of Terminus with Data
Empty words are not enough; performance improvements must ultimately be backed by data. Let’s take a look at how DeepSeek-V3.1-Terminus performs on major authoritative benchmarks.
| Benchmark | DeepSeek-V3.1 | DeepSeek-V3.1-Terminus |
|---|---|---|
| **Non-Agent (thinking mode)** | | |
| MMLU-Pro | 84.8 | 85.0 |
| GPQA-Diamond | 80.1 | 80.7 |
| Humanity’s Last Exam | 15.9 | 21.7 |
| LiveCodeBench | 74.8 | 74.9 |
| Codeforces | 2091 | 2046 |
| Aider-Polyglot | 76.3 | 76.1 |
| **Agent** | | |
| BrowseComp | 30.0 | 38.5 |
| BrowseComp-zh | 49.2 | 45.0 |
| SimpleQA | 93.4 | 96.8 |
| SWE Verified | 66.0 | 68.4 |
| SWE-bench Multilingual | 54.5 | 57.8 |
| Terminal-bench | 31.3 | 36.7 |
The table above clearly shows that this update is comprehensive.
Non-Agent Evaluation (Model’s Basic Capabilities)
In the “Non-Agent Evaluation,” which tests the model’s basic knowledge and reasoning abilities, the Terminus version maintained its high standards and achieved breakthroughs in some areas.
- MMLU-Pro & GPQA-Diamond: These two benchmarks examine the model’s multi-task language understanding and professional Q&A capabilities. Terminus edges up from 84.8 to 85.0 and from 80.1 to 80.7 respectively, indicating a slightly more solid foundational knowledge base.
- Humanity’s Last Exam: This is a highly challenging test, and the score jumped significantly from 15.9 to 21.7. This means the model’s ability to handle extremely complex and tricky problems has been markedly enhanced.
- LiveCodeBench & Codeforces: In code-related tests, the scores held essentially steady (Codeforces dips slightly from 2091 to 2046), which shows the new version did not sacrifice its strong code-generation capabilities in the course of the other optimizations.
Agent Evaluation (Model’s Tool-Using Capabilities)
This part is the biggest highlight of this update! The Agent evaluation tests the model’s intelligence in using external tools (like browsers, terminals) to complete tasks.
- BrowseComp & SimpleQA: In tests simulating real-world web browsing and simple Q&A, the scores rose from 30.0 to 38.5 and from 93.4 to 96.8 respectively. This means the Terminus Search Agent has become smarter and can more accurately understand instructions and find answers. (The Chinese-language variant, BrowseComp-zh, dipped from 49.2 to 45.0.)
- SWE Verified & SWE-bench Multilingual: The software engineering benchmarks also saw steady growth (66.0 to 68.4 and 54.5 to 57.8), showing that the Code Agent has indeed reached a new level.
- Terminal-bench: In the test simulating the use of terminal command lines, the score increased from 31.3 to 36.7, which is undoubtedly good news for developers who need to perform complex system operations.
It is worth noting that the official announcement mentions that the Search Agent’s toolset has been adjusted in the new version. For more detailed technical information, refer to the official model card published on HuggingFace.
Experience It Now! How to Get the Latest DeepSeek-V3.1-Terminus?
After all this, are you eager to get your hands on it and try it out? It’s simple! DeepSeek has officially updated the models on all platforms to DeepSeek-V3.1-Terminus.
Whether you are used to using the official App, web version, or mini-program, what you are experiencing now is the latest and most powerful version.
For developers and researchers, the DeepSeek API has also been updated simultaneously, allowing you to seamlessly enjoy the stability and power brought by Terminus in your applications.
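DeepSeek’s API follows an OpenAI-compatible chat-completions format. As a minimal sketch (the endpoint and model names below follow DeepSeek’s public API documentation, and `YOUR_API_KEY` is a placeholder, not a real credential), a request body for the updated model could be assembled like this:

```python
import json

# Sketch only: the URL and model names follow DeepSeek's OpenAI-compatible
# API documentation; "YOUR_API_KEY" is a placeholder you must replace.
API_URL = "https://api.deepseek.com/chat/completions"

def build_chat_request(prompt: str, model: str = "deepseek-chat") -> str:
    """Build the JSON body for a single-turn chat completion request."""
    payload = {
        "model": model,  # "deepseek-chat" (non-thinking) or "deepseek-reasoner" (thinking)
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return json.dumps(payload)

body = build_chat_request("Summarize the Terminus update in one sentence.")
# This body would then be POSTed to API_URL with an
# "Authorization: Bearer YOUR_API_KEY" header.
```

Because the endpoint is OpenAI-compatible, existing client libraries can typically be pointed at it by overriding only the base URL and API key.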
Of course, as a committed supporter of the open-source community, DeepSeek also made the model weights available for download right away:
- HuggingFace: https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus
- ModelScope: https://modelscope.cn/models/deepseek-ai/DeepSeek-V3.1-Terminus
Frequently Asked Questions (FAQ)
Q1: What is the difference between DeepSeek-V3.1-Terminus and the previous version?
Terminus is an upgrade to V3.1 that targets two user-reported pain points: first, language consistency, significantly reducing mixed Chinese-English output and abnormal characters; second, Agent capability, making the model perform better and more stably when executing code and search tasks.
Q2: What is the biggest highlight of this update?
The biggest highlight is undoubtedly the significant improvement in Agent performance. Judging from the evaluation data, whether in simulated web browsing (BrowseComp) or software engineering tasks (SWE Verified), Terminus takes a clear step up, making it more practical in real-world application scenarios.
Q3: Do I need to pay to use this new model?
No! You can experience it directly through the free App, web version, and mini-program provided by DeepSeek. For developers with higher demands, you can choose to use the DeepSeek API (billed by usage) or download the open-source model directly from HuggingFace or ModelScope for deployment.