The dark horse of the AI field, Z.ai, is making waves again! The newly released GLM-4.5 and GLM-4.5-Air models are not only impressive in their parameter scale, but also declare their strong ambition in the field of complex Agentic AI applications with their innovative “hybrid inference mode” and amazing performance in several authoritative benchmark tests.
The race in artificial intelligence never stops, and just as everyone was discussing the models of the major giants, the R&D team from Z.ai dropped a bombshell. They officially launched two new members of the GLM series: GLM-4.5 and GLM-4.5-Air. This is not just a regular update, but more like a declaration of a technological leap.
From the very beginning of their design, the goal of these two models was very clear: to integrate top-tier reasoning, code generation, and AI agent capabilities into a single model to cope with increasingly complex application scenarios. In this era of rapid rise of AI agent applications, this move is particularly important.
The Debut of the Two Heroes: More Than Just a Stack of Parameters
When we talk about a new model, the parameter scale is always a topic that cannot be avoided. But the GLM-4.5 family tells us that a smart architecture is more important than just numbers.
- GLM-4.5: As the top flagship in the family, it has a total of 355 billion parameters and 32 billion active parameters. This means it has an extremely deep knowledge base and the ability to handle complex problems.
- GLM-4.5-Air: This is a lighter and more efficient version with 106 billion total parameters and 12 billion active parameters. Its existence is to find the perfect balance between performance and efficiency, suitable for a more diverse range of application scenarios.
You might ask, what are “active parameters”? It’s like a person’s brain. Although it stores a huge amount of information, when thinking about a specific problem, it only mobilizes the most relevant parts. This Mixture-of-Experts (MoE) architecture allows the model to maintain powerful capabilities while having higher computational efficiency and faster responses.
Between Thinking and Not Thinking: The Innovation of Hybrid Inference Mode
This is perhaps the most exciting innovation of the GLM-4.5 series. In the past, we often had to make a trade-off between the model’s “depth of thought” and “reaction speed.” But GLM-4.5 introduces a hybrid inference mode, allowing the model to automatically switch its working mode based on the difficulty of the problem, just like a person.
- Thinking mode: When encountering complex tasks that require multi-step reasoning, planning, or using external tools (such as searching for data, executing code), the model will enter this mode. It will “stop and think,” formulate a strategy, and ensure that it provides high-quality, in-depth answers.
- Non-thinking mode: For simple, direct questions and answers, the model will switch to this mode, providing real-time, fast responses without any delay.
The benefits of this design are obvious: it takes into account both depth and speed, ensuring that users can get the best experience in any scenario.
Seeing is Believing: Sweeping Major Benchmark Tests
After all this talk, how is the actual performance? Data is always the most powerful proof. Judging from the benchmark test charts released by the official, the performance of the GLM-4.5 duo can only be described as “amazing.”
Let’s analyze them one by one:
- TAU-Bench (Retail Scenario): In this test that simulates real retail conversations, the performance of GLM-4.5 (79.7 points) and GLM-4.5-Air (77.9 points) is very impressive, on par with the industry’s top models and far ahead of other well-known models.
- TAU-Bench (Aviation Scenario): This scenario also tests the model’s professional domain conversation ability. Interestingly, the lighter GLM-4.5-Air (60.8 points) even slightly surpassed its “big brother” GLM-4.5 (60.4 points) in this project, both occupying the top spot on the list, demonstrating its excellent efficiency and performance.
- BFCL-v3 (Multi-turn Dialogue): This test is the real highlight. It specifically evaluates the model’s ability to maintain contextual understanding and logical consistency in long, multi-turn conversations—this is the core of AI agents. In this project, GLM-4.5 (64.3 points) and GLM-4.5-Air (61.9 points) achieved a landslide victory, leaving all other competitors far behind. This strongly proves their huge potential in performing complex agent tasks.
Immediate Experience and Open Source: Embracing the Power of the Community
The Z.ai team knows that a great model needs an active community. Therefore, they provide a variety of ways for everyone to experience and use the GLM-4.5 series:
- Online Experience: You can directly visit the Z.ai official website or the BigModel.cn platform to experience the power of the new models for yourself.
- Open Source Weights: For developers and researchers, the best news is this. The model weights of GLM-4.5 and GLM-4.5-Air have been opened on Hugging Face and ModelScope, and anyone can download and deploy them in their own projects.
This open attitude will undoubtedly greatly accelerate the development of the GLM-4.5 ecosystem and spawn more creative applications.
Frequently Asked Questions (FAQ)
Q1: How should I choose between GLM-4.5 and GLM-4.5-Air?
A: It depends on your needs. If you are pursuing the most powerful performance and need to handle extremely complex reasoning tasks, then GLM-4.5 with more parameters is your first choice. If you value efficiency and response speed more, or need to deploy in a resource-limited environment, then GLM-4.5-Air will provide an unparalleled price-performance ratio.
Q2: What does “hybrid inference mode” mean for ordinary users?
A: Simply put, you no longer have to worry about the AI “thinking too long” or “thinking too shallowly” when you ask it a question. The model will automatically judge the difficulty of your question. If you ask a simple question, it will answer in seconds; if you ask a complex question, it will think deeply like an expert before giving you a reliable answer, and the experience is very smooth.
Q3: I am a developer, where can I get these models?
A: You can get the models through multiple channels. The most direct way is to go to the Z.ai page on Hugging Face, where the complete model weights of GLM-4.5 and GLM-4.5-Air are available for download. At the same time, you can also call them through the API on the Z.ai and BigModel.cn platforms.


