
AI Daily: Llama 4 Benchmark Faking Confirmed? Yann LeCun Drops Bombshell Before Departure, OpenAI Secretly Building Voice Hardware

January 3, 2026

It has been a whirlwind week in tech: bombshells inside Meta, practical tips for developer tools, and breakthroughs in model architecture. This isn’t just about whose model is stronger; it’s about integrity, the philosophy of tool use, and the future of how we interact with machines.

Meta’s Trust Crisis: Llama 4 Benchmarks Confirmed to be “Fudged”

This might be the biggest scandal in the AI circle recently. For a long time, the community has had doubts about Meta Llama 4’s benchmark results, feeling the data was almost too good to be true. Now, those suspicions have finally been confirmed internally—and by none other than departing Chief AI Scientist Yann LeCun.

According to a report by Slashdot, LeCun was blunt in an interview with the Financial Times, admitting that Llama 4’s results were “fudged a little bit.” To achieve high scores across various tests, the team used different versions of the model for specific benchmarks, completely violating the principle of fair evaluation.

The fallout from this scandal is severe. Rumor has it that Mark Zuckerberg is furious, not only losing confidence in the teams involved but even “marginalizing” the entire generative AI department. This would explain why the highly anticipated full version of Llama 4 has been delayed and subsequent updates have all but stalled. As he prepares to leave Meta to start his own lab, LeCun offered a parting remark: members of Meta’s newly recruited Superintelligence team are “completely LLM-pilled,” whereas he has always believed that LLMs are a dead end on the road to superintelligence.

This incident undoubtedly casts a shadow over the credibility of open-source models and serves as a cautionary tale for developers when choosing models.

How Do the Masters Use Tools? Claude Code Creator’s “Vanilla” Setup

In contrast to Meta’s chaos, the Claude development community appears much more pragmatic. Many are curious: how does Boris Cherny, the creator of the powerful Claude Code tool, code in his daily life? Is his setup too complex to replicate?

The answer is surprisingly simple. Boris Cherny shared on X that his setup is actually very “Vanilla.” He emphasizes that Claude Code works out of the box and doesn’t require excessive customization.

His workflow primarily relies on a mix of terminal and web-based operations:

  1. Parallel Multitasking: He runs 5 Claude instances simultaneously in the terminal, tabbed 1 through 5, and uses system notifications to know which instance needs input.
  2. Cloud Collaboration: In addition to local instances, he runs 5-10 instances in parallel on claude.ai/code.
  3. Flexible Switching: When coding, he frequently uses the & command to hand over local conversations to the web version, or uses --teleport to jump back and forth between them.

The most interesting point is that their team shares a CLAUDE.md file. This file acts like an “employee handbook” for the AI, documenting project best practices. Whenever Claude makes a mistake, the team updates this file to ensure the AI doesn’t repeat it. This “collective training” approach is definitely something software development teams can learn from.
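The actual contents of the team’s CLAUDE.md have not been published, but a minimal sketch of what such a file might contain looks like this (every entry below is hypothetical and for illustration only):

```markdown
# CLAUDE.md: project guide for Claude (hypothetical example)

## Build & test
- Run `npm test` before proposing any change.

## Conventions
- Use the project logger; never call console.log directly.

## Known pitfalls (added after past mistakes)
- The /api/v2 routes are deprecated; new endpoints go under /api/v3.
```

The key idea is the last section: each time the AI makes a mistake, a short corrective note is appended, so the guidance compounds over time.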

OpenAI’s Next Move: More Human-like Voice Interaction and Dedicated Hardware

While developers are optimizing code, OpenAI seems to be preparing to change our physical interaction with AI. According to exclusive news from The Information, OpenAI is actively integrating its internal audio and speech teams, aiming to launch a brand-new voice model architecture in the first quarter of 2026.

This isn’t just a simple model update; it’s paving the way for an “AI-first” personal hardware device. The device, expected to debut in a year, is said to possess a high level of emotional expression, with a voice that sounds more natural and emotionally resonant.

The key technical breakthrough lies in “real-time interruption handling” and faster response speeds. Imagine talking to it just like a real person—you can interrupt at any time, and the AI can naturally pause and respond, moving away from the rigid Q&A mode. This proactive, companion-style AI might be the next gateway OpenAI wants to capture.

DeepSeek Technical Deep Dive: Solving the “Identity Crisis” of Hyper-Connection Architectures

In the academic sphere, the DeepSeek team has just published a notable paper, “mHC: Manifold-Constrained Hyper-Connections,” proposing significant improvements to the foundations of large-model architecture.

What is mHC?

This research aims to solve the bottleneck encountered by “Hyper-Connections (HC)” architectures during scaling. While HC improves performance by expanding the width of the Residual Stream, it also breaks the crucial “Identity Mapping” property in residual connections. Simply put, as the model gets deeper, signals tend to distort during transmission, leading to unstable training.

How did they solve it?

DeepSeek proposed a method called “Manifold-Constrained Hyper-Connections (mHC).” It sounds complex, but the core concepts are:

  • Manifold Projection: They constrain the matrix of residual connections within a specific geometric space (the Birkhoff polytope).
  • Doubly Stochastic Matrix: Every row and every column of the matrix is forced to sum to 1. This turns signal transmission into a “Convex Combination”: a weighted mix of features rather than unbounded amplification or shrinkage.
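The paper’s exact formulation isn’t reproduced here, but the projection onto doubly stochastic matrices can be sketched with the classic Sinkhorn-Knopp iteration (illustrative only; the matrix size and iteration count below are arbitrary choices, not the paper’s):

```python
import numpy as np

def sinkhorn_knopp(M, iters=100):
    """Approximately project a positive matrix onto the Birkhoff polytope
    (doubly stochastic matrices) by alternately normalizing rows and columns."""
    M = np.array(M, dtype=float)
    for _ in range(iters):
        M /= M.sum(axis=1, keepdims=True)  # make each row sum to 1
        M /= M.sum(axis=0, keepdims=True)  # make each column sum to 1
    return M

rng = np.random.default_rng(0)
H = rng.uniform(0.1, 2.0, size=(4, 4))  # unconstrained positive mixing matrix
D = sinkhorn_knopp(H)

print(D.sum(axis=0))  # each entry ~ 1
print(D.sum(axis=1))  # each entry ~ 1
```

Because every row and column sums to 1, applying D to a signal can only redistribute it across lanes, never inflate the total.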

Practical Effects

This design restores signal conservation, making the training of deep networks exceptionally stable. Experiments show that when training a 27B-parameter model, mHC solved the gradient-explosion problem while adding only about 6.7% computational overhead, in exchange for stronger scalability and stability. This is a crucial technical cornerstone for building even larger foundation models in the future.

We can break down this technical breakthrough into three simple stages:

1. The Problem: The original enhanced design (HC) was like an “Exaggerating Messenger”

Imagine playing a game of “Telephone” that is 100 stories high (this represents a deep neural network):

  • Traditional Architecture (ResNet): Like dutifully passing a message to the next floor at each level. While stable, the communication channel is narrow (a single lane).
  • Hyper-Connection Architecture (HC): An earlier improvement that widened the channel (e.g., to 4 lanes), allowing information to be exchanged between lanes.
    • The Issue: When exchanging information, there were no rules. The sound coming down from the floor above might be unlimitedly amplified at the current level.
    • Result: Like someone who loves to exaggerate when passing a message. After several floors, a simple “Hello” might turn into a deafening scream (signal/gradient explosion). This makes the model extremely unstable during training, or even causes it to fail.

2. The Solution (mHC): Strict “Volume Control”

DeepSeek’s mHC (Manifold-Constrained Hyper-Connections) adds strict mathematical rules to this messaging process, called “Doubly Stochastic Matrices,” which we can think of as a “100% Quota System.”

  • What is “Manifold Constraint”? It sounds difficult, but it simply mandates: No matter how you mix information, the total amount must remain unchanged.
  • How is it done? (Rows and columns sum to 1) Imagine you are mixing a glass of juice (mixing features).
    • Original HC: Add as much water or sugar as you want. The cup overflows (numerical explosion).
    • Current mHC: Your cup capacity is fixed at 100%. If you want to add 20% more apple juice, you must reduce 20% of the orange juice. You can only redistribute the proportions, not increase the total amount out of thin air.

This turns signal transmission into a “Convex Combination,” which is a weighted average. This way, no matter how high the building is, the sound passed down always remains clear and at a moderate volume, never turning into a scream.
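The juice analogy can be checked numerically. The toy sketch below (not the paper’s actual architecture; the lane count, layer count, and matrices are made up) contrasts an unconstrained mixing matrix with the simplest doubly stochastic one, uniform averaging:

```python
import numpy as np

rng = np.random.default_rng(1)
signal = np.array([1.0, 2.0, 3.0, 4.0])  # 4 "lanes" of the residual stream

# Unconstrained HC-style mixing: entries can amplify freely
H = rng.uniform(0.0, 1.5, size=(4, 4))

# Doubly stochastic mixing: uniform averaging (rows and columns sum to 1)
D = np.full((4, 4), 0.25)

x, y = signal.copy(), signal.copy()
for _ in range(100):  # 100 "floors" of the building
    x = H @ x         # total magnitude can blow up exponentially
    y = D @ y         # total is exactly preserved (a convex combination)

print(y.sum())  # still 10.0 after 100 layers
```

After 100 layers the unconstrained signal has exploded by many orders of magnitude, while the convex-combination signal still sums to exactly 10: the “100% quota” in action.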

3. The Effect: Super Stability at a Minimal Cost

The brilliance of this technology lies in its high cost-performance ratio:

  • Rock Solid: Large models that used to crash midway through training can now train smoothly, with stable signal propagation from end to end.
  • Low Overhead: Maintaining this “100% quota system” requires a bit more math (the Sinkhorn-Knopp algorithm), but because DeepSeek optimized the underlying implementation, overall training time increased by only about 6.7%.

Tencent Hunyuan Animates Text: 1 Billion Parameter 3D Animation Generation

Finally, for content creators, Tencent’s HY-Motion 1.0 is an exciting gift. It’s a Text-to-Motion model with over 1 billion parameters, now open-sourced.

This model uses a Diffusion Transformer (DiT) architecture and can generate high-quality, fluid, and diverse 3D character animations from natural-language instructions. Whether the prompt is “waving hello” or a complex “combat move,” it interprets the instruction accurately. Tencent claims it is the industry’s most comprehensive motion-generation model, covering 6 major categories and over 200 motion types. For game developers and animators, the generated assets can be integrated directly into 3D workflows, significantly lowering the barrier to production.


FAQ

Q1: Why is the Meta Llama 4 benchmark faking scandal so important? It’s about transparency and trust in AI development. The Llama series has been seen as a benchmark for open-source models. If even data from top tech companies is manipulated (by using different optimized models for different tests), developers cannot accurately assess a model’s true capabilities, misleading the entire community’s technical choices and resource allocation.

Q2: What is the CLAUDE.md mentioned by Boris Cherny, and what are its benefits? CLAUDE.md is a file stored in the project’s root directory, specifically designed to guide Claude in understanding the project’s architecture, coding standards, and common errors. It’s like a “handover document” for the AI. The benefit is that it allows the AI to “remember” team preferences as the project develops, avoiding repeated mistakes and achieving a form of “continuous learning.”

Q3: What main problem does DeepSeek’s mHC technology solve? It primarily solves the training stability issue for large models using the “Hyper-Connections (HC)” architecture. The original architecture often led to signals spiraling out of control in deep networks (gradient explosion or vanishing). mHC ensures stable signal transmission through mathematical constraints (manifold projection), allowing models to be deeper and larger while remaining efficient.

Q4: What is special about the voice hardware OpenAI plans to launch? Unlike current voice assistants, the core of this device is a more advanced AI audio model. It will have more natural emotional expression and support “real-time interruption,” meaning users can interrupt the AI at any time, and the AI can react like a human, aiming to create a truly companion-like interactive experience.

Q5: Where can Tencent’s HY-Motion 1.0 be used? It is mainly used in game development, film animation production, and virtual character interaction. Developers only need to enter a text description (e.g., “an injured person walking with a limp”), and the model can generate corresponding 3D skeleton motion data, which can be directly imported into software like Blender or Unity, saving a lot of time on manual motion adjustments.


© 2026 Communeify. All rights reserved.