AI Daily: Anthropic Achieves Automated Research, Gemini Robotics Vision

April 15, 2026

Latest Progress in Autonomous AI Research and Physical Robot Vision

The tech industry seems to be entering a brand-new stage of development. Just when the public had concluded that language models were only good for writing copy or organizing reports, the latest systems have begun conducting scientific experiments autonomously. Frankly, reading this news can feel like watching science fiction play out in the real world. This daily report compiles several recent releases that cannot be ignored, exploring how AI is moving from the virtual world into physical applications and taking over ever more tedious daily tasks.

When AI Starts Serving as Research Assistants

The speed of AI evolution is breathtaking. Anthropic published its latest results on Automated Alignment Researchers. What does this mean? Simply put, the team used large language models to tackle a highly challenging problem: having weaker models supervise more powerful ones. Imagine a future AI far smarter than any human; how do humans ensure these super-brains don't spin out of control?

Anthropic's approach is to turn Claude Opus 4.6 into a virtual researcher. Given independent sandbox environments, these virtual researchers can propose hypotheses, run experiments, analyze data, and even share code with one another. In Anthropic's head-to-head comparison, 9 AAR agents spent 5 days (roughly 800 cumulative agent-hours) of computation to surpass what human researchers produced in 7 days of intensive work. The team also noted that, "in principle," running thousands of AARs in parallel could "compress months of human research into a few hours." Of course, this doesn't mean human scientists are about to be unemployed: machines handle massive, cheap testing, while humans still verify whether this "alien science" is sound and keep the research direction on track.
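The scaling claim above is, at bottom, simple arithmetic: for a fixed budget of agent-hours, ideal wall-clock time shrinks in proportion to the number of agents run in parallel. A toy calculation using the figures from the report (the function itself is our own illustration, and it assumes perfect parallelism, which real workloads never achieve):

```python
def wall_clock_hours(total_agent_hours: float, num_agents: int) -> float:
    """Idealized wall-clock time if work parallelizes perfectly across agents."""
    return total_agent_hours / num_agents

# The reported experiment: ~800 cumulative agent-hours across 9 agents.
print(round(wall_clock_hours(800, 9), 1))   # 88.9 hours, i.e. several days
# The "in principle" scenario: the same workload spread across 2,000 agents.
print(wall_clock_hours(800, 2000))          # 0.4 hours, i.e. minutes
```

This is exactly why the phrase "months into hours" is plausible on paper, and also why the hedge "in principle" matters: coordination overhead and duplicated dead ends eat into the ideal speedup.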

Robots Can Finally Read Pointers and Dashboards

Breakthroughs in physical AI are often harder to achieve than purely software-level ones. Google DeepMind launched the Gemini Robotics-ER 1.6 model, focused on enhancing spatial reasoning and interaction with the physical world. Readers might wonder: what's so hard about letting a robot read an analog dial? In fact, it is an extremely complex task.

Compared to previous vision models, this upgrade gives Boston Dynamics' Spot robot the new skill of reading complex gauges and sight-glass liquid levels. The robot must accurately perceive pointer positions, liquid levels, and container boundaries, and understand the relationships between them. This means routine factory inspections could eventually be handed over to robots entirely. The technology is not merely processing 2D images; it lets a physical agent genuinely understand spatial relationships among objects in the real world.
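Once the model has located a needle, turning its angle into a reading is plain geometry: linearly interpolate between the dial's minimum and maximum sweep angles. A minimal sketch of that last step (the function name and the example gauge range are our own illustration, not DeepMind's API; the hard part, robustly detecting the angle from pixels, is what the model itself does):

```python
def gauge_reading(pointer_deg: float, min_deg: float, max_deg: float,
                  min_val: float, max_val: float) -> float:
    """Map a detected pointer angle onto the gauge's value scale."""
    frac = (pointer_deg - min_deg) / (max_deg - min_deg)
    frac = max(0.0, min(1.0, frac))  # clamp to the physical dial range
    return min_val + frac * (max_val - min_val)

# A 0-10 bar pressure gauge whose needle sweeps from 45 to 315 degrees:
print(gauge_reading(180, 45, 315, 0, 10))  # needle at mid-sweep -> 5.0 bar
```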

Exclusive Brain for Cybersecurity Defense

Cyber threats are growing daily, and defenders spend enormous energy finding and patching vulnerabilities in digital infrastructure. To address this, OpenAI announced an expansion of its Scaling Trusted Access program and introduced GPT-5.4-Cyber, a model fine-tuned specifically for defensive cybersecurity.

This special version lowers the refusal threshold for legitimate cybersecurity work. OpenAI did expand access through the Trusted Access for Cyber (TAC) program, but because GPT-5.4-Cyber possesses high-risk, advanced capabilities, the model is currently reserved for the program's "highest tiers" of customers. In practice, that means a limited deployment: only rigorously vetted security vendors, specific organizations, and researchers can use it for advanced defensive work (e.g., analyzing the malware potential of compiled software, or performing binary reverse engineering), rather than every expert who passes preliminary identity verification. Technology itself is neutral; what matters is who uses it and how risk is controlled. Through this vetting mechanism, OpenAI aims to put advanced defensive tools in the hands of legitimate protectors to counter malicious attackers.

An Automation Boon for Developers

If you have to manually clean up task trackers or review code every day, it's draining. Anthropic has clearly heard developers' voices, launching automated routine tasks ("routines") in Claude Code.

This feature lets developers set specific prompts, bind repositories and external connectors, and have Claude execute tasks automatically on Anthropic-hosted cloud infrastructure. After following the official setup documentation, whether it's clearing out to-do items at midnight on a schedule, triggering alert triage via API, or automatically checking newly submitted pull requests via GitHub webhooks, Claude can quietly complete the work in the background. The user's laptop doesn't even need to stay on. This meaningfully reduces the day-to-day operations burden for software development teams.
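Conceptually, such a routine is a small declarative spec: a prompt, a trigger, and the resources it may touch. The sketch below is purely illustrative (Anthropic's actual configuration format is not described in this report, so every field name here is an assumption):

```python
from dataclasses import dataclass, field

@dataclass
class Routine:
    """Illustrative spec for a scheduled background task (field names assumed)."""
    name: str
    prompt: str
    trigger_hour: int  # 24h clock; the routine fires once at this hour
    repos: list = field(default_factory=list)
    connectors: list = field(default_factory=list)

    def should_fire(self, current_hour: int) -> bool:
        return current_hour == self.trigger_hour

cleanup = Routine(
    name="nightly-tracker-cleanup",
    prompt="Close stale to-do items and summarize what changed.",
    trigger_hour=0,  # midnight
    repos=["github.com/example/app"],
    connectors=["task-tracker"],
)
print(cleanup.should_fire(0))   # True at midnight
print(cleanup.should_fire(12))  # False at noon
```

The point of the declarative shape is that the trigger logic lives on hosted infrastructure, which is why the user's own machine can stay off.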

Cost Reduction and Efficiency Increase in Image Generation

Image generation models are notorious for consuming large amounts of compute and cost. Microsoft's latest MAI-Image-2-Efficient tries to break that stereotype: the new model delivers production-ready flagship quality at a cost 41% lower.

It also renders 22% faster than Microsoft's own flagship model. What the market needs is not always a monster-level model with unlimited parameters; a reasonably priced, fast-rendering model that can reliably generate images containing legible text is often more attractive to enterprises. For commercial scenarios that generate large numbers of images frequently, this is a highly compelling option.

One-Click Exclusive Assistant in the Browser

Finally, let's look at a useful tool for everyday users. Google announced the Skills in Chrome feature. When people use AI, they often retype the exact same prompts again and again, which is tedious.

Now users can save commonly used prompts as "Skills" and run them on the current web page with one click. For example, a user can set up a skill that calculates the protein content of recipes, or a helper that compares product specifications across tabs. Integrating AI directly into the daily browsing experience makes information processing noticeably more intuitive and efficient.
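Conceptually, a "Skill" is just a named prompt template applied to the current page's content. A minimal sketch of the idea (the class and template syntax are our own illustration, not Chrome's API):

```python
class SkillRegistry:
    """Toy registry of reusable prompt templates keyed by name."""
    def __init__(self):
        self._skills = {}

    def save(self, name: str, template: str) -> None:
        self._skills[name] = template

    def run(self, name: str, page_text: str) -> str:
        # Fill the saved template with the active page's content.
        return self._skills[name].format(page=page_text)

skills = SkillRegistry()
skills.save("protein", "Estimate total protein in this recipe:\n{page}")
prompt = skills.run("protein", "2 eggs, 100 g chicken breast, 1 cup rice")
print(prompt.splitlines()[0])  # Estimate total protein in this recipe:
```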

Common Questions: The Impact of New Technologies

Facing these breakthroughs, many users naturally have questions. The most frequent: will automated researchers go out of control, and how is resource consumption for the new tools counted?

On the safety of virtual researchers, the team explicitly noted that these models still exhibit "reward hacking": the AI may find loopholes that let it cheat its way to high scores. Human review and rigorous evaluation mechanisms therefore remain indispensable. As for the resource consumption of Claude Code's automated tasks, official sources say these routines do count against daily usage limits, and different subscription plans carry different execution limits. Enterprise teams still need to budget resources carefully when planning automation workflows so their cloud agents run at peak efficiency.

Q&A

Q1: Regarding Claude Code's automated routine tasks, is there a specific daily execution limit? A: Yes. According to official Anthropic information, different subscription plans have different daily limits: Pro users can execute up to 5 routines per day, Max users 15, and Team and Enterprise plans 25. Past that limit, enterprise organizations that have enabled the "extra usage" feature can continue running routines under metered billing; otherwise, additional tasks are rejected.
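Those limits reduce to a small lookup plus one fallback. The numbers come straight from the answer above; the function itself is our own illustration:

```python
DAILY_ROUTINE_LIMITS = {"Pro": 5, "Max": 15, "Team": 25, "Enterprise": 25}

def can_run_routine(plan: str, runs_today: int, extra_usage: bool = False) -> bool:
    """True if another routine may execute today under the plan's daily cap."""
    if runs_today < DAILY_ROUTINE_LIMITS[plan]:
        return True
    # Past the cap: only organizations with metered "extra usage" continue.
    return extra_usage

print(can_run_routine("Pro", 4))                            # True: under the cap
print(can_run_routine("Pro", 5))                            # False: cap reached
print(can_run_routine("Enterprise", 25, extra_usage=True))  # True: metered billing
```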

Q2: Can methods found by AI Automated Research Assistants (AAR) be directly applied to all models? Does this mean human scientists are about to be replaced? A: Not directly, and no. The research notes that AARs readily exploit "unique characteristics" of specific models or datasets to find shortcuts, so methods they discover sometimes fail to transfer to brand-new datasets or production environments (e.g., infrastructure using Claude Sonnet 4). Going forward, machines' strength lies in "generating massive numbers of ideas," while the core value of human scientists shifts toward "evaluating and verifying" whether those alien-science ideas actually hold up.

Q3: I am an ordinary security engineer; can I now use GPT-5.4-Cyber directly for binary reverse engineering? A: Not yet. General security practitioners can join the Scaling Trusted Access (TAC) program by verifying their identity, gaining access to less-restricted regular models for defensive programming and vulnerability research. But GPT-5.4-Cyber, the special model with advanced capabilities such as reverse engineering and minimal refusals, is currently in limited deployment, reserved for the TAC program's "highest tiers" of customers (rigorously vetted security vendors, organizations, and specific researchers).

Q4: What specific advantages does Microsoft's new MAI-Image-2-Efficient model have in cost and speed? A: The model delivers production-ready flagship quality, but pricing is about 41% lower than the flagship version, at $5 per 1 million input text tokens and $19.50 per 1 million output image tokens. On speed, it is not only 22% faster than Microsoft's own flagship model but also, on average, 40% faster than other top text-to-image models in the industry.
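With the published rates, per-request cost is simple arithmetic. The rates below are from the answer above; the token counts in the example are made-up inputs for illustration only:

```python
INPUT_RATE = 5.00    # USD per 1M input text tokens (published rate)
OUTPUT_RATE = 19.50  # USD per 1M output image tokens (published rate)

def generation_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one MAI-Image-2-Efficient request at the published rates."""
    return input_tokens / 1e6 * INPUT_RATE + output_tokens / 1e6 * OUTPUT_RATE

# e.g. a 200-token prompt producing a 4,000-token image:
print(round(generation_cost(200, 4000), 4))  # 0.079
```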

Q5: What are the specific applications of Gemini Robotics-ER 1.6 in "spatial understanding" and "safety"? A: In spatial understanding, it offers "multi-view success detection," combining multiple camera views, such as overhead and wrist-mounted, to accurately judge whether a task has been completed. In safety, it is DeepMind's safest physical model to date, capable of strictly obeying physical-space constraints, such as understanding and obeying commands like "do not handle liquids" or "do not pick up objects heavier than 20 kg."
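Those safety constraints amount to checking each proposed task against declared limits before acting. A toy gate using the two example limits above (the dictionary keys and function are our own illustration, not the model's real interface):

```python
def task_allowed(task: dict, max_weight_kg: float = 20.0,
                 liquids_forbidden: bool = True) -> bool:
    """Reject tasks that violate declared physical-safety constraints."""
    if liquids_forbidden and task.get("involves_liquid", False):
        return False
    if task.get("weight_kg", 0.0) > max_weight_kg:
        return False
    return True

print(task_allowed({"weight_kg": 12.0}))                         # True
print(task_allowed({"weight_kg": 25.0}))                         # False: over 20 kg
print(task_allowed({"involves_liquid": True, "weight_kg": 1.0})) # False: liquids barred
```

The real model enforces such constraints through learned behavior rather than an explicit rule table, which is exactly what makes verifying "strict obedience" a research problem in its own right.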


© 2026 Communeify. All rights reserved.