
Make Photos Move! Wan 2.2 Animate 14B New Model Arrives, Perfectly Replicating Expressions and Actions

September 23, 2025

Imagine bringing a static photo to life with nothing but a reference video, perfectly replicating its expressions and movements. This isn’t magic but a technological breakthrough from Wan-AI’s latest model, Wan 2.2 Animate 14B. Let’s look at what this technology can do and the principles behind how it works.


Have you ever thought that the photos lying quietly in your album could one day, like the portraits in the movie “Harry Potter,” smile, talk, and move for you? This dream, which sounds like future technology, is becoming a reality at an unprecedented speed.

Recently, the AI field has dropped another bombshell: the Wan-AI team has released their latest powerful model, Wan 2.2 Animate 14B. Simply put, this model can make a static image move, with the movements and expressions coming from another reference video. Whether it’s complex dance moves or subtle facial expressions, it can accurately capture and reproduce them with quite amazing results.

This is not just “moving,” but “coming to life”

There are already some tools on the market that can make photos move, but Wan-Animate offers much more than that. It pursues a “soul transfer” level of animation generation.

The core capability of this technology is that it can perfectly combine a reference photo (who you want to animate), a motion video (what action you want them to do), and an environment background (where the story takes place). In the end, you will get a brand new video where the protagonist is the person you specified, but they can fluently perform all the actions and expressions from the reference video.

Sounds magical, right? Let’s see how the magic behind it works.

Deconstructing the technology behind it: How does AI think?

To make all this happen, the AI needs to act like a director, meticulously processing all kinds of information. The whole process can be roughly divided into several key steps, just like preparing for a wonderful performance.

Step 1: Collecting materials (Vision Inputs)

First, the AI needs to “understand” the materials we give it. This includes:

  • Ref Latent: This is our protagonist, the photo you want to animate.
  • Tempo Latent: This is the reference video, which provides the blueprint for the action.
  • Env Latent: This is the background, which determines the scene where the protagonist is.

These images and videos will first pass through an encoder called VAE Encoder, which converts them into “latents” that the AI can understand. You can think of this process as the AI digesting visual information into its own set of notes for subsequent processing.
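The real VAE encoder is a learned neural network, but its core idea can be sketched in a few lines. The toy function below is purely illustrative (not Wan’s actual encoder): it "digests" a grayscale pixel grid into a smaller latent grid via 2x2 average pooling, mimicking how an encoder compresses visual input into a compact representation.

```python
# Toy sketch of "encoding pixels into a latent" via 2x2 average pooling.
# The real VAE encoder is a trained network; this only illustrates the
# compression idea, not the actual Wan 2.2 architecture.

def encode_to_latent(image):
    """Downsample a 2D pixel grid (list of lists) with 2x2 average pooling."""
    h, w = len(image), len(image[0])
    latent = []
    for y in range(0, h, 2):
        row = []
        for x in range(0, w, 2):
            block = [image[y][x], image[y][x + 1],
                     image[y + 1][x], image[y + 1][x + 1]]
            row.append(sum(block) / 4)
        latent.append(row)
    return latent

image = [[0, 0, 4, 4],
         [0, 0, 4, 4],
         [8, 8, 2, 2],
         [8, 8, 2, 2]]
print(encode_to_latent(image))  # [[0.0, 4.0], [8.0, 2.0]]
```

Notice that a 4x4 image becomes a 2x2 latent: the "notes" are smaller than the original, which is exactly why the later stages can process them efficiently.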

Step 2: Precise control (Control Signals)

If you simply apply the motion to the image, the result will often be very stiff. To make the animation look natural, Wan-Animate has designed two sets of sophisticated “control systems”:

  • Body Adapter: By analyzing the skeleton signals in the reference video, this module acts like a digital puppeteer, precisely controlling the protagonist’s limbs and body posture to ensure the fluency and accuracy of the movements.
  • Face Adapter: This is the key to making the character “come to life.” It doesn’t just simply open and close the mouth, but extracts deep facial features from the reference video, capturing those subtle changes in the eyes and the curve of the corners of the mouth, injecting emotion into the static face.
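The two adapters above can be caricatured as two separate control paths. In this hypothetical sketch (the real adapters are learned networks, and these function names are my own), the body signal moves skeleton joints by per-joint offsets from the reference video, while the face signal blends a neutral expression toward the reference expression:

```python
# Toy sketch of the two independent control paths: body (skeleton offsets)
# and face (expression blending). Function names and data layout are
# illustrative assumptions, not the real Wan-Animate API.

def apply_body_control(pose, offsets):
    """Shift each joint of a stick-figure pose by the reference offsets."""
    return {j: (pose[j][0] + offsets[j][0], pose[j][1] + offsets[j][1])
            for j in pose}

def apply_face_control(neutral_face, expression, strength=1.0):
    """Blend neutral expression parameters toward the reference expression."""
    return {k: neutral_face[k] + strength * (expression[k] - neutral_face[k])
            for k in neutral_face}

pose = {"elbow": (1, 2), "wrist": (2, 3)}
offsets = {"elbow": (0, 1), "wrist": (1, 0)}
print(apply_body_control(pose, offsets))   # {'elbow': (1, 3), 'wrist': (3, 3)}

face = {"smile": 0.0, "eyes_open": 1.0}
target = {"smile": 0.8, "eyes_open": 0.6}
print(apply_face_control(face, target, strength=0.5))
# {'smile': 0.4, 'eyes_open': 0.8}
```

The key design point survives the simplification: body and face are driven by separate signals, so a subtle smile can be controlled independently of a sweeping dance move.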

Step 3: The AI’s brain — Transformer

When all the materials and control signals are ready, they are sent to the core of the entire system — the Transformer. This is a powerful processing center responsible for integrating all the fragmented information.

At this stage, the AI will fuse the character, action, expression, and background information, and generate each frame of the animation step by step through a series of complex calculations (such as the DiT Block and Face Block in the figure).
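One way to picture this fusion, before any attention math happens, is that the character, motion, and environment latents are flattened into a single token sequence the Transformer can attend over. The sketch below shows only that sequence-building step (a simplification I am assuming, with a tag marking each token’s source; the real model’s token layout is more involved):

```python
# Toy sketch: flatten the three latent streams into one tagged token
# sequence, the kind of unified input a Transformer attends over.
# The tagging scheme here is an illustrative assumption.

def build_token_sequence(ref, tempo, env):
    """Concatenate latents into one sequence, tagging each token's source."""
    tokens = []
    for name, latent in (("ref", ref), ("tempo", tempo), ("env", env)):
        for value in latent:
            tokens.append((name, value))
    return tokens

seq = build_token_sequence([1, 2], [3], [4, 5])
print(seq)  # [('ref', 1), ('ref', 2), ('tempo', 3), ('env', 4), ('env', 5)]
```

Because everything lives in one sequence, every generated frame can "look at" the character, the motion blueprint, and the background at the same time.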

It is worth mentioning that there is also an optional secret weapon here: Relighting LoRA. What is this? When you need to place a character in a brand-new environment (for example, placing a person from a daytime photo into a nighttime street video), the biggest risk is inconsistent lighting that makes the result look like a clumsy Photoshop composite. Relighting LoRA acts like a professional lighting technician: it automatically adjusts the character’s light and shadow so they blend perfectly into the new environment, as if they were really there.
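The "LoRA" part of Relighting LoRA is a general technique worth a quick sketch: instead of retraining the whole model, a small low-rank update A @ B is added on top of a frozen weight matrix W. The toy matrices below are my own illustration of that mechanism, not Wan’s actual weights:

```python
# Toy sketch of the LoRA mechanism: W' = W + scale * (A @ B), where A and B
# are small low-rank factors. Pure-Python matmul on tiny matrices; the real
# Relighting LoRA applies this inside a large trained network.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def apply_lora(W, A, B, scale=1.0):
    """Return W + scale * (A @ B), leaving the frozen base weights untouched."""
    delta = matmul(A, B)
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2x2 base weight
A = [[1.0], [0.0]]             # rank-1 factors: 2x1 and 1x2
B = [[0.5, 0.5]]
print(apply_lora(W, A, B))     # [[1.5, 0.5], [0.0, 1.0]]
```

Because A and B are tiny compared to W, a LoRA like this can be trained cheaply and switched on only when relighting is actually needed, which is why the article can call it "optional."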

Final Step: Output

After careful orchestration by the Transformer, the AI already has a complete animation blueprint in its mind. Finally, these blueprints are sent to the VAE Decoder, which restores the AI’s “notes” into a video that we can see with our own eyes. Thus, a vivid animation generated from a static photo is born.
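To close the loop on the earlier encoding sketch, decoding can be caricatured as the inverse operation: each latent value is expanded back into a block of pixels. Again, this is a toy stand-in for a learned VAE decoder, using nearest-neighbor upsampling:

```python
# Toy sketch of "decoding a latent back to pixels" via nearest-neighbor
# 2x2 upsampling — the conceptual inverse of the earlier pooling sketch.
# The real VAE decoder is a trained network.

def decode_from_latent(latent):
    """Expand each latent value into a 2x2 pixel block."""
    pixels = []
    for row in latent:
        expanded = [v for v in row for _ in range(2)]  # double each column
        pixels.append(expanded)
        pixels.append(list(expanded))                  # double each row
    return pixels

latent = [[0.0, 4.0], [8.0, 2.0]]
print(decode_from_latent(latent))
# [[0.0, 0.0, 4.0, 4.0], [0.0, 0.0, 4.0, 4.0],
#  [8.0, 8.0, 2.0, 2.0], [8.0, 8.0, 2.0, 2.0]]
```

In a real VAE the decoder also hallucinates fine detail the latent never stored; this sketch only shows why the decoder’s output is back at full pixel resolution.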

What makes Wan-Animate unique?

After reading the process above, you may feel that the technology is complex, but its core advantages are actually very clear:

  1. Unified input architecture: It cleverly integrates the three different sources of information—character, action, and background—into a unified framework for more efficient processing.
  2. Dual precision control: It performs independent and fine-grained control over both body movements and facial expressions, greatly improving the realism of the animation.
  3. Intelligent light and shadow fusion: Through Relighting LoRA, it solves the common problem of mismatched lighting when replacing characters, making the composite effect seamless.

Want to try it yourself or learn more?

For developers, artists, and anyone interested in AI creation, this is undoubtedly an exciting tool, and Wan 2.2 Animate 14B is well worth exploring firsthand.

From digital humans and virtual anchors to movie special effects, the emergence of technologies like Wan-Animate is opening up infinite possibilities for digital content creation. Perhaps in the near future, bringing our cherished photos “to life” will no longer be a dream, but a daily reality that everyone can easily achieve.


© 2026 Communeify. All rights reserved.