
AI Daily: AI Creator Arrives? Project Genie Lets You Create Infinite Worlds, Grok Video API Storms In

January 30, 2026
Updated Jan 30

The big AI stories this week: Google DeepMind launches Project Genie, which can create endless interactive worlds and lets users play creator; xAI opens up its Grok Imagine video generation API to stake a claim in visual generation. Meanwhile, OpenAI announces it will retire older models such as GPT-4o from ChatGPT in February to focus on a more personalized next-generation system, and Google Maps navigation now lets you chat with Gemini like a friend while walking or cycling.


Google DeepMind Project Genie: Everyone Can Create a World

Imagine if you could not just play games but “draw” an interactive world at will. What would that feel like? Google DeepMind recently released Project Genie, an experimental project that does exactly this. It’s not just a game generator; it’s a general-purpose “world model.”

At its core is the Genie 3 engine. Unlike traditional static 3D scenes, Genie generates content in real time: as you move or interact within the virtual world, the system predicts and generates the path ahead and the physical reactions on the fly. Does that sound a bit sci-fi? Through the Project Genie experimental prototype, Google AI Ultra subscribers in the US can now try creating, exploring, and even “remixing” different worlds themselves.

It features three core capabilities:

  1. World Sketching: This gives wings to your imagination. You can create an ever-expanding environment through text prompts or by uploading images. Want a fantasy land filled with dragons or a cyberpunk future city? Just describe it, and the system will generate it for you. Even better, with Nano Banana Pro you can fine-tune angles and details before entering the world.
  2. World Exploration: The world here isn’t a dead backdrop. As you move your character, Genie calculates what happens ahead in real time based on your actions, as if the road were growing beneath your feet.
  3. World Remixing: This is the most fun part. If a world created by someone else looks interesting, you can “remix” it directly, using new prompts to change its style or rules. You can also browse the gallery for inspiration.
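To make the “road growing beneath your feet” idea concrete, here is a toy Python sketch of action-conditioned, step-by-step generation. It is only an illustration under loud assumptions: `ToyWorldModel`, its tile names, and the 70% rule are invented for this example and have nothing to do with how Genie 3 actually works internally. The point is simply that each new piece of the world is produced on demand from what came before plus the player’s action.

```python
import random


class ToyWorldModel:
    """Hypothetical toy model: generates one tile per step, on demand."""

    TILES = ["grass", "road", "water", "forest"]

    def __init__(self, seed: int):
        # A seeded generator keeps the toy world reproducible.
        self.rng = random.Random(seed)
        # Everything generated so far; the model conditions on the last entry.
        self.history = ["grass"]

    def step(self, action: str) -> str:
        """Generate the next tile from the previous tile and the action."""
        previous = self.history[-1]
        # Invented rule: moving forward usually continues the same terrain,
        # any other action picks a fresh tile at random.
        if action == "forward" and self.rng.random() < 0.7:
            next_tile = previous
        else:
            next_tile = self.rng.choice(self.TILES)
        self.history.append(next_tile)
        return next_tile


def explore(model: ToyWorldModel, actions: list[str]) -> list[str]:
    """Feed a sequence of actions to the model and collect the tiles."""
    return [model.step(action) for action in actions]


if __name__ == "__main__":
    world = ToyWorldModel(seed=42)
    print(explore(world, ["forward", "forward", "turn", "forward"]))
```

Nothing exists in this toy world until the player acts, which is the same basic contrast with a pre-built static scene that the article describes.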

If you are interested in this technology, you can check out more examples at Google Labs or DeepMind’s showcase page. Generation length is currently limited to 60 seconds and physical reactions can occasionally look unnatural, but the prototype still shows a notable step forward in how AI models simulate real-world physics.

xAI Grok Imagine API: A Strong Challenger in Video Generation

Competition in the video generation race is heating up. xAI has officially launched the Grok Imagine API, a tool designed for creative workflows. It isn’t just for fun; it is meant to let developers and businesses generate high-quality videos at lower cost and higher speed.

Judging from the official announcement, xAI has big ambitions for this model. It is said to perform very well in instruction following and visual consistency. That matters to professional creators, because you don’t want the protagonist in your video to change faces while walking, or the background to suddenly collapse.

There are several highlights of this API worth noting:

  • Cinematic Dynamic Understanding: It can transform static photos into videos with realistic camera movements and object interactions.
  • Fine-grained Editing Control: It’s not just generation; you can also “retouch” the video. For example, removing unwanted objects from the frame or replacing props in a scene while maintaining lighting consistency.
  • Flexible Format Support: Supports landscape, portrait, and various other aspect ratios to meet the needs of different social platforms.

According to third-party reviews, Grok Imagine strikes a good balance between generation quality and latency. For app developers looking to integrate video generation features, it is an attractive new option.

OpenAI Says Goodbye to Old Loves: GPT-4o and Old Models to Become History

Technological progress always comes with the phasing out of older technology. OpenAI announced that on February 13, 2026, it will officially retire GPT-4o, GPT-4.1, and their mini versions from ChatGPT. The news is bittersweet, since GPT-4o has kept many users company through countless late-night brainstorming sessions, but the move is meant to focus resources on developing better models.

Official data shows that only 0.1% of users still use GPT-4o, with the vast majority having already switched to the more powerful GPT-5.2. OpenAI found that users care more about an AI’s “personality” and “creativity” than about cold logic alone. The new generation of models will therefore converse in a more grown-up style, with less awkward preaching and more options for customizing tone.

This doesn’t mean the old models will disappear completely; API users won’t be affected for now. But for daily ChatGPT users, it’s time to embrace a more responsive partner with a distinct personality.

Google Maps and Gemini: A Tour Guide With You While Walking or Biking

Have you ever fumbled to type a search while walking and looking at a map? Google Maps is changing that experience: Gemini in navigation now officially supports walking and cycling modes.

What does this mean? It means your map becomes a local guide who can talk.

  • For Walkers: You can casually ask, “Hey Google, what neighborhood am I in right now?” or “Which restaurant nearby has the highest rating?” Gemini will answer you directly based on the latest information on the map, without you having to stop and stare at your phone.
  • For Cyclists: This is even more of a safety feature. When your hands are gripping the handlebars, you can directly ask, “How much longer until I arrive?” or even say, “Text Sarah that I’ll be 10 minutes late.”

This feature is rolling out globally on iOS and Android devices, wherever Gemini is supported. It turns navigation from a series of cold voice commands into a more natural, conversational experience.

OpenAI’s Internal Secret Weapon: An In-House Data Analysis Agent

People often wonder how a company with as much data as OpenAI handles its own data. The company recently revealed its in-house data analysis agent, a tool built specifically for its own engineers and scientists.

Imagine facing 600 PB of data and 70,000 datasets: just “finding the right table” might take half a day. This internal agent lets employees ask questions in natural language, such as “Which taxi route in New York has the highest time variance?”, and then automatically writes the SQL, generates charts, and even corrects its own errors.

It is not just a query tool; it also has a “memory” function. If it makes a mistake and is corrected, it remembers the lesson next time. This shows how AI can significantly lower the barrier to data analysis inside enterprises, allowing non-experts to mine insights easily. It may well be a preview of future enterprise data management.
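To give a feel for what “write SQL and self-correct” can look like, here is a minimal Python sketch built on the standard `sqlite3` module. Everything specific in it is an assumption made for illustration: the `trips` table, its columns, the sample rows, and the hard-coded list of candidate queries that stands in for a language model. It is not OpenAI’s implementation, only the general retry-on-error pattern applied to the taxi question above.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical taxi trips table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trips (pickup_zone TEXT, dropoff_zone TEXT, duration_min REAL)"
    )
    rows = [
        ("Midtown", "JFK", 45.0),
        ("Midtown", "JFK", 95.0),
        ("Midtown", "JFK", 60.0),
        ("SoHo", "Chelsea", 12.0),
        ("SoHo", "Chelsea", 14.0),
        ("SoHo", "Chelsea", 13.0),
    ]
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", rows)
    return conn


# Stand-in for the agent: queries in the order it might try them.
CANDIDATE_QUERIES = [
    # First attempt assumes a column that does not exist, so it fails.
    "SELECT pickup_zone, dropoff_zone FROM trips "
    "ORDER BY trip_time_variance DESC LIMIT 1",
    # Corrected attempt: population variance computed as E[x^2] - E[x]^2,
    # since SQLite has no built-in variance aggregate.
    "SELECT pickup_zone, dropoff_zone, "
    "AVG(duration_min * duration_min) - AVG(duration_min) * AVG(duration_min) "
    "AS variance "
    "FROM trips GROUP BY pickup_zone, dropoff_zone "
    "ORDER BY variance DESC LIMIT 1",
]


def run_with_retries(conn: sqlite3.Connection, queries: list[str]):
    """Try each query in turn; return the first result plus the errors seen."""
    errors = []
    for sql in queries:
        try:
            return conn.execute(sql).fetchone(), errors
        except sqlite3.OperationalError as exc:
            # A real agent would feed this message back to the model.
            errors.append(str(exc))
    raise RuntimeError(f"all candidate queries failed: {errors}")


if __name__ == "__main__":
    connection = build_demo_db()
    row, seen_errors = run_with_retries(connection, CANDIDATE_QUERIES)
    print("highest-variance route:", row)
    print("errors recovered from:", seen_errors)
```

In a real system the error message would go back to the model so it can propose the next query, and a “memory” layer might store the correction so the wrong column name is not tried again.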

Qwen3-ASR: A New Benchmark for Open Source Speech Recognition

Finally, good news from the open-source community. The Qwen team released the Qwen3-ASR model series, which is a big gift for developers needing to handle multilingual speech recognition.

This series includes 1.7B and 0.6B versions and supports recognition of up to 52 languages and dialects. It goes beyond ordinary recognition, handling accented English and certain Chinese dialects quite well.

  • All-Round Player: Besides basic speech-to-text, it introduces Qwen3-ForcedAligner, a forced alignment model that provides extremely high-precision timestamp prediction.
  • Performance Monster: The 0.6B version maintains accuracy while delivering very high throughput, making it well suited to scenarios that require real-time processing of large volumes of audio.

For developers who don’t want to rely on expensive commercial APIs, Qwen3-ASR’s open-source release on Hugging Face offers one of the most powerful free alternatives available today.


FAQ

Q: Is Project Genie a game? Where can I play it?
A: Project Genie is currently an experimental research prototype, not a traditional game. It’s more like a creation tool. For now, it is open only to Google AI Ultra subscribers in the US for testing via Google Labs.

Q: Why is OpenAI retiring GPT-4o?
A: Mainly because the newer GPT-5.2 outperforms the old models, and the vast majority of users (99.9%) have already migrated. Retiring old models lets OpenAI concentrate computing resources on improving the personalization and creativity of its new models.

Q: How is Grok Imagine API different from other video generation models?
A: Grok Imagine emphasizes “instruction following” and “video editing” capabilities. It not only generates videos but can also precisely remove or replace objects in them, which is a real advantage for professional workflows that require fine control over the footage.

Q: Is Qwen3-ASR free?
A: Yes, Qwen3-ASR is an open-source model. Developers can download the weights and deploy them on their own servers, making it well suited to projects that need to protect privacy or save on API costs.


© 2026 Communeify. All rights reserved.