
AI Daily: Anthropic's Zero-Day Defense, GLM-5.1 Long-Horizon Engineering, and Microsoft Harrier

April 8, 2026

Exploring the AI Frontier: Anthropic’s Security Shield and GLM-5.1’s Long-Horizon Breakthroughs

Some days in tech genuinely take your breath away, and today is one of them. Leading companies are pushing boundaries across cybersecurity, automated programming, and foundational text retrieval. Let’s take a closer look at today’s noteworthy developments.

Anthropic’s Bombshell: Claude Mythos Preview and Project Glasswing

Anthropic recently made a bold decision. The company has developed Claude Mythos Preview, a model powerful enough to disrupt the entire cybersecurity field: it can autonomously discover and exploit zero-day vulnerabilities in major operating systems and web browsers. Sounds a bit scary? It is. The model found a vulnerability in OpenBSD that had been lurking for 27 years and accurately pinpointed a 16-year-old security flaw in the FFmpeg library. Both had escaped countless manual reviews and automated tests in the past, yet fell easily to AI.

To prevent these powerful capabilities from being exploited by malicious actors, Anthropic decided not to release this model to the general public. Instead, they launched Project Glasswing. This is an ambitious alliance bringing together tech giants like AWS, Apple, Google, Microsoft, and NVIDIA with the sole purpose of using Mythos Preview’s power exclusively for defensive cybersecurity. Anthropic has further committed up to $100 million in model credits and donated $4 million to open-source security organizations.

How powerful is this model exactly? You can see the detailed security evaluation in the official System Card. This report documents the leap in capabilities and risk testing results, demonstrating a rigorous security mechanism under the new RSP v3.0 policy. While the model occasionally shows a strong drive to complete tasks, the report indicates its behavior remains within controllable limits, highlighting why restricting it to defensive use is a wise decision.

AI’s “double-edged sword” nature has reached unprecedented heights. When AI possesses the ability to easily breach decades-old systems, restricting it to defensive use and forming corporate alliances shows the caution tech giants have regarding AI weaponization. Future cybersecurity defense will no longer be just a human-to-human confrontation, but an arms race between “AI defense” and “AI attack.” Businesses and developers should realize that adopting AI-assisted security scanning tools early is no longer an optional extra, but a necessity for survival.

Z.ai Launches GLM-5.1: An Open-Source Powerhouse Focused on Long-Horizon Engineering Tasks

Developing an AI that can write a few lines of code is one thing, but having it work continuously for eight hours without error? That’s the problem GLM-5.1 aims to solve. As a next-generation flagship engineering model, its performance on long-horizon tasks is impressive. While past models often stalled after dozens of conversation turns, GLM-5.1 can sustain hundreds or even thousands of iterations.

Here’s a specific example. When asked to build a Linux-style desktop web application from scratch, it was able to continuously evaluate its own output, gradually adding features like a file browser, terminal, and system monitor. This process lasted a full eight hours. The final delivery was a visually consistent and fully functional system, requiring no design drafts or mid-course guidance from humans. In tests optimizing vector databases, it executed over 600 iterations and more than 6,000 tool calls, demonstrating extreme stability.
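The long-horizon pattern described above, generate, self-evaluate, refine, repeat, can be sketched as a simple agent loop. This is an illustrative sketch only, not GLM-5.1’s actual architecture; `call_model` and `evaluate` are hypothetical stand-ins for a real model call and a real self-check.

```python
# Illustrative long-horizon agent loop: propose a change, self-evaluate,
# and keep iterating until the goal is met or the budget runs out.
# call_model and evaluate are hypothetical stand-ins, not a real GLM-5.1 API.

def call_model(task: str, state: dict) -> dict:
    """Stand-in for a model call that proposes the next incremental change."""
    return {"features": state["features"] + 1}

def evaluate(state: dict) -> float:
    """Stand-in self-evaluation: score progress toward the goal (0.0 to 1.0)."""
    return min(state["features"] / 3, 1.0)

def run_agent(task: str, max_iters: int = 1000) -> dict:
    state = {"features": 0}
    for _ in range(max_iters):
        state = call_model(task, state)  # propose the next increment
        if evaluate(state) >= 1.0:       # self-check: are we done?
            break
    return state

result = run_agent("build a web desktop")
```

The key property the article highlights is that the loop stays coherent across hundreds of iterations; in this toy version that is trivial, but for a real model it is the hard part.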

It also achieved top scores in high-difficulty benchmarks like SWE-Bench Pro and Terminal-Bench 2.0. Even better, this model is fully open-sourced under the MIT license. Developers can now go to HuggingFace to download and explore its potential, integrating it into various automated programming workflows.

We are witnessing the transformation of AI from a “single Q&A tool” to a “virtual employee capable of long-term autonomous work.” GLM-5.1 proves that with enough computation and iteration space, AI can self-correct and complete extremely complex engineering systems. The core skill for human developers in the future will shift from “how to write a good single prompt” to “how to deploy, manage, and evaluate the long-term work trajectories of autonomous AI agents.”

Cognition Releases SWE-1.6: High Generation Speed and Ultimate Model UX

If you’ve used AI development tools, you might have encountered models that overthink, get stuck in infinite loops, or persist in using inefficient commands. Cognition’s latest SWE-1.6 is designed to solve these pain points. The development team focused on “Model User Experience,” significantly reducing unnecessary and lengthy reasoning.

The model now tends to call tools in parallel and relies less on the terminal interface. This means it can gather the information it needs faster, cutting user wait time and manual intervention. It no longer gets stuck in repetitive reasoning loops as easily, making its overall operation trajectory much cleaner and sharper.
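Parallel tool calling, as described above, simply means dispatching independent tool invocations concurrently and waiting once for all results, rather than issuing one terminal command at a time. A minimal sketch using Python’s standard library; `read_file` and `list_dir` are hypothetical tools for illustration, not Cognition’s API:

```python
# Sketch of parallel tool calls: independent lookups run concurrently,
# and the agent blocks once for all results instead of once per call.
# read_file and list_dir are hypothetical tools for illustration.
from concurrent.futures import ThreadPoolExecutor

def read_file(path: str) -> str:
    return f"contents of {path}"

def list_dir(path: str) -> list:
    return [f"{path}/a.py", f"{path}/b.py"]

def gather_context():
    with ThreadPoolExecutor() as pool:
        f1 = pool.submit(read_file, "README.md")  # dispatched immediately
        f2 = pool.submit(list_dir, "src")          # runs concurrently with f1
        return f1.result(), f2.result()            # single wait for both

readme, files = gather_context()
```

The design point is latency: two tool calls that each take a second finish in roughly one second total instead of two.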

In addition to the experience upgrade, generation speed has reached industry-leading levels. On the Windsurf platform, through a partnership with Cerebras, paid users can experience a staggering speed of up to 950 tokens per second. Furthermore, SWE-1.6 is now fully live on the Windsurf platform, and for the next three months, the platform is providing free access at 200 tokens per second through Fireworks.

While a model’s capability and intelligence are important, “Model UX” is the key factor in whether developers are willing to continue using it in their daily work. Reducing infinite loops and overthinking, and enhancing parallel processing capabilities, makes the behavior of AI agents less like a clunky machine and more like an efficient human engineer. For tool developers, reducing AI interaction friction and improving fluency has become the next battlefield.

Microsoft Open-Sources Harrier Embedding Model: Building a Strong Foundation for Agents

When discussing powerful AI agents, precise information retrieval is an indispensable cornerstone. Microsoft has just open-sourced the Harrier series of embedding models, tailored specifically for the needs of modern agent systems; it topped the multilingual MTEB-v2 evaluation, beating numerous competitors.

The development of Harrier combined large-scale contrastive pre-training with synthetic data generation. The team used GPT-5 to generate billions of multilingual text pairs and transferred the capabilities of large teacher models to smaller, more efficient models through knowledge distillation. It supports over 100 languages and features a 32k context window. This not only improves the accuracy of the first retrieval but also reduces system latency and costs.
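What an embedding model buys you in practice is vector retrieval: texts are mapped to vectors, and the document whose vector lies closest to the query vector wins. A minimal sketch with hand-made toy vectors; in a real system the vectors would come from a model such as Harrier, not be written by hand:

```python
# Toy embedding retrieval: rank documents by cosine similarity to a query.
# The vectors here are hand-made for illustration; a real pipeline would
# obtain them from an embedding model such as Harrier.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

corpus = {
    "cats":    [0.9, 0.1, 0.0],
    "dogs":    [0.8, 0.2, 0.1],
    "finance": [0.0, 0.1, 0.9],
}

def retrieve(query_vec, corpus):
    # Return the document whose embedding is most similar to the query.
    return max(corpus, key=lambda doc: cosine(query_vec, corpus[doc]))

best = retrieve([0.9, 0.1, 0.0], corpus)  # → "cats"
```

Better embeddings shift those similarity scores so that the semantically right document ranks first on the very first retrieval, which is exactly the accuracy, latency, and cost benefit the article describes.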

For application scenarios that require crossing different data sources, maintaining memory, and handling multi-step context, this is a very practical advancement. Interested developers can find model weights and related resources directly on the HuggingFace page.

While the public focuses on chatty generative large language models, Microsoft reminds us that precise memory, retrieval, and association are the underlying cornerstones that let AI agents operate stably and without error in real environments. When building enterprise-grade AI applications, instead of blindly chasing generative models with ever more parameters, it is better to invest in a powerful embedding model with strong multilingual support. This is the fundamental way to reduce AI hallucinations and improve application stability.

In summary, from proactive cybersecurity alliances to programming agents capable of continuous operation and foundational models powering retrieval, these technologies show the field’s diverse facets. Each innovation solves an existing problem while sketching a clearer outline of where development is heading.

Q&A

About Anthropic and Claude Mythos Preview

Q1: Why did Anthropic develop the powerful Claude Mythos Preview but decide not to release it to the general public?

A1: Because the model’s cybersecurity capabilities represent a staggering leap, to the point of being potentially weaponizable. It can autonomously discover and exploit zero-day vulnerabilities in major operating systems and browsers (such as the 27-year-old vulnerability in OpenBSD and the 16-year-old flaw in FFmpeg). Since these capabilities could pose a serious threat to global networks and national security in the hands of malicious actors, Anthropic decided to strictly limit the model to defensive use. To that end, it launched Project Glasswing, partnering with tech giants like Microsoft, Google, and Apple to focus on patching security vulnerabilities in global critical infrastructure.

About Z.ai and GLM-5.1

Q2: What is the biggest difference between Z.ai’s GLM-5.1 and other AI programming models on the market?

A2: GLM-5.1’s biggest breakthrough lies in solving the bottleneck of long-horizon engineering tasks. While past models often stalled or lost direction after dozens of conversation turns or modifications, GLM-5.1 can maintain efficient optimization across hundreds or even thousands of iterations. For example, it built a web-based Linux desktop environment, including a file browser and terminal, from scratch during 8 hours of autonomous operation, and autonomously executed over 600 iterations and more than 6,000 tool calls when optimizing a vector database.

About Cognition and SWE-1.6

Q3: What common pain points of AI development tools does Cognition’s SWE-1.6 solve? Can average developers experience it for free?

A3: SWE-1.6 doesn’t just strive to be smart; it focuses on optimizing “Model UX.” It significantly reduces common undesirable agent behaviors such as overthinking simple problems, getting stuck in infinite loops, and over-relying on terminal interfaces. The model now knows how to call multiple tools in parallel, making its operation trajectory more concise and faster. As for cost, SWE-1.6 is now fully live on the Windsurf platform, and for the next three months the platform is providing free access at 200 tokens per second through Fireworks. Paid users can experience speeds of up to 950 tokens per second through Cerebras.

About Microsoft and the Harrier Embedding Model

Q4: Everyone is focusing on generative AI that can chat; why is Microsoft’s open-source Harrier embedding model equally important? How does it help AI agents?

A4: Embedding models are the underlying foundation AI systems use to search, retrieve, organize, and connect information. In modern agent applications, agents must search across data sources in multiple steps, maintain long-term memory, and update context. Harrier is built specifically for this: it supports over 100 languages, features a 32k context window, and tops the multilingual MTEB-v2 evaluation. This means higher accuracy on the first retrieval, lower system latency, and lower costs, preventing agents from “losing memory” or hallucinating when executing complex tasks.

Comprehensive Reflection

Q5: Looking at these four technological advances, what is the common major trend in current AI development?

A5: The common trend is that AI is transforming from a single-turn Q&A tool into agentic systems capable of long-term autonomous operation. Whether it’s Claude Mythos Preview autonomously scanning and exploiting vulnerabilities, GLM-5.1 iterating for 8 hours to build systems, SWE-1.6 striving for smoother tool-call trajectories, or Harrier strengthening the foundation of agent memory and retrieval, all show the industry going all out to build “virtual employees” that can independently, stably, and for long periods execute complex tasks in real environments. This also means the collaboration mode between humans and AI will shift from giving instructions to assigning tasks and supervising.


© 2026 Communeify. All rights reserved.