The Arrival of Claude 4: What Surprises Does Anthropic’s New AI Model Bring? A New Peak in Coding and Reasoning!
Anthropic has officially unveiled the next generation of its Claude models: Claude Opus 4 and Claude Sonnet 4! Discover their powerful upgrades in coding, advanced reasoning, and AI agent applications, plus how Claude Code and new API features are empowering developers.
We’ve all felt it—the pace of AI development is dizzyingly fast! And today, Anthropic has delivered another major announcement: the launch of its new Claude models—Claude Opus 4 and Claude Sonnet 4! These are far from minor updates. They’re designed to set new industry standards in coding, advanced logic, and AI agent workflows. Ready to dive in? Let’s explore what makes Claude 4 truly outstanding.
Meet the Claude 4 Duo: Opus 4 and Sonnet 4, Each with Their Own Strengths
Anthropic has released two flagship models at once. Think of them as powerful siblings, each with unique talents, but both equally impressive.
Claude Opus 4: The World’s Leading Coding Expert
First up is Claude Opus 4. Anthropic claims it’s currently the most powerful coding model in the world—and that’s not just hype. It excels at handling long, complex, and detail-intensive tasks, as well as powering AI agent workflows. This model has already received glowing reviews from early adopters:
- Cursor calls it the most advanced coding model yet, with huge gains in understanding complex codebases.
- Replit reports significant improvements in accuracy and capability when managing complex, multi-file changes.
- Block says it’s the first model that improves code quality during its agent (code-named goose)’s edit-debug loop, while maintaining performance and reliability.
- Even Rakuten validated its strength with a demanding open-source refactoring project—Opus 4 ran independently for 7 hours straight without issues.
- Cognition noted that Opus 4 excels at solving complex challenges where previous models failed, handling critical operations they had missed.
Sounds like a dream partner for any developer!
Claude Sonnet 4: A Well-Rounded Upgrade, More Accurate and Practical
Next, we have Claude Sonnet 4—a major upgrade from Sonnet 3.7. It also delivers strong coding and reasoning performance, but with an emphasis on accuracy and practical usability. Anthropic says it strikes the ideal balance between powerful capabilities and day-to-day reliability.
While it may not match Opus 4 in the most demanding tasks, Sonnet 4 shines in real-world scenarios. Many companies are already praising it:
- GitHub sees Sonnet 4 as a top performer in agent use cases and is using it in its new Copilot coding agents.
- Manus highlighted improvements in following complex instructions, clear reasoning, and generating aesthetically pleasing output.
- iGent reported excellent performance in building autonomous, multifunctional apps.
- Sourcegraph praised its focus, deep understanding of problems, and elegant code output for software development.
- Augment Code noted higher task success rates, more precise code edits, and improved attention to detail—making Sonnet 4 their go-to model.
Whether you’re chasing peak performance with Opus 4 or seeking a balanced, efficient model with Sonnet 4, the Claude 4 lineup has you covered.
The good news? Despite the major upgrades, Claude 4’s pricing remains the same as its predecessors. Opus 4 costs $15 per million input tokens and $75 per million output tokens. Sonnet 4 costs $3 and $15, respectively.
Both models are available via the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. Even better, Sonnet 4 is available to free-tier users, making world-class AI more accessible than ever.
Not Just Model Upgrades—A Whole New Level of Capability
In addition to stronger models, Claude 4 comes with a host of exciting new features that meaningfully enhance its power—not just bells and whistles, but real improvements.
Imagine an AI that can reason deeply and also browse the web or use a calculator like we do. Both Claude 4 models now support “Tool Use for Expanded Reasoning” (Beta).
This means Claude can use external tools like web search during its thought process. It can fluidly switch between reasoning and tool usage to deliver more comprehensive, accurate answers. Think of it as giving AI a backup brain and a universal toolbox.
Better Memory and Instruction Following
The models also saw major upgrades in task execution:
- Parallel tool usage: Claude can use multiple tools simultaneously, boosting efficiency.
- Improved instruction following: It does exactly what you say—unless you tell it not to.
- Enhanced memory: Especially when given access to local files, Claude can extract and retain key info to maintain context over time and build implicit knowledge. (Stay tuned for a fun example!)
Claude Code Now Generally Available: A Developer’s Best Coding Partner
Claude Code is now fully available! After a well-received research preview, Anthropic has expanded how developers can collaborate with Claude.
Claude Code now supports GitHub Actions for background tasks and integrates natively with both VS Code and JetBrains IDEs. This means Claude’s suggestions appear directly in your files, making pair programming smoother than ever.
New API Features for Building Powerful AI Agents
To empower developers even further, the Anthropic API now includes four new tools:
- Code Execution Tool
- MCP Connector
- Files API
- Prompt Caching (up to one hour)
These unlock even more possibilities for building advanced AI agents.
Deep Dive: How Does Claude 4 Push the Limits?
So how does Claude 4 perform in the real world? Let’s take a look at some hard data.
In the industry-standard SWE-bench Verified benchmark for software engineering tasks, Claude 4 models lead the pack. According to Anthropic (based on parallel testing conditions):
- Claude Opus 4 achieved 79.4% (72.5% without parallel conditions)
- Claude Sonnet 4 achieved 80.2% (72.7% without parallel conditions)
In Terminal-bench, Opus 4 scored 43.2% / 50.0%, proving its power in coding tasks.
Smarter, More Reliable Behavior
Beyond metrics, Claude 4 models behave more maturely:
- Less shortcutting: Opus 4 reduces attempts to “cheat” or bypass tasks by 65% compared to Sonnet 3.7, making results more grounded and reliable.
- Amazing memory example: With access to local files while playing Pokémon, Opus 4 created a “Navigation Guide” memory file, recording notes like a real gamer—e.g., “Try the same method no more than 5 times,” or “If stuck, try the opposite.” All self-recorded during gameplay!
- Thought Summaries: To prevent overly long internal reasoning, Claude 4 can summarize thoughts using a smaller model. This happens only ~5% of the time, and full logs are still available if needed via a new Developer Mode (contact sales).
Now generally available, Claude Code is embedding Claude more deeply into everyday developer workflows—whether in terminals, IDEs, or background tasks.
Anthropic released new beta extensions for VS Code and JetBrains, enabling seamless Claude integration. Code suggestions appear inline, simplifying reviews and edits—all within your familiar editor.
Even better, there’s now an expandable Claude Code SDK, meaning you can use the same core agents to build custom AI agents and apps.
To showcase its potential, Anthropic also released Claude Code on GitHub (beta). Tag Claude Code in pull requests, and it can respond to feedback, fix CI errors, or revise your code.
Get Started Now: Safe, Reliable, Full of Potential
Anthropic sees Claude 4 as a major step toward building true virtual collaborators—models that can maintain deep context across long-term projects and have lasting impact.
Of course, with great power comes great responsibility. These models have undergone extensive testing and risk mitigation, including steps to meet higher AI safety standards like ASL-3.
Anthropic is excited to see what you’ll build—and your feedback remains critical in helping them improve.