
AI Daily: AI Tool Evolution - From Medical Imaging to Precision Marketing Data Integration

January 14, 2026

Google Veo 3.1 significantly improves video generation consistency and adds vertical format support, while Manus partners with Similarweb to integrate real market data. Add MedGemma 1.5’s breakthroughs in medical imaging and speech recognition, plus the open-source GLM-Image’s text rendering capabilities, and the picture is clear: AI is moving from simple content generation to precise professional applications.


Google Veo 3.1: Consistent Characters and Vertical Video Support

For creators, the biggest headache with AI video generation is often not image quality but “inconsistency”: the protagonist wearing red in one second might turn blue in the next, or the background might suddenly shift. This “flickering” has long been a major flaw in AI video. Google DeepMind addresses it in the latest Veo 3.1 update.

The core of this update is the enhanced “Ingredients to Video” feature. It allows creators to provide reference images, and the AI adheres more strictly to these visual prompts. This means that whether it’s a character’s appearance, clothing, or objects and textures in a scene, high consistency is maintained throughout the video clip. This is great news for those wanting to create continuous narrative content with AI.

Notably, Veo 3.1 finally offers native support for the 9:16 vertical video format. This is clearly aimed at TikTok and YouTube Shorts, letting creators generate full-screen content suited to mobile viewing without awkward cropping. The feature is already integrated into YouTube Shorts and the YouTube Create App, and general users can try the model’s livelier dialogue and motion effects in the Gemini App. For professionals who need maximum quality, Veo also offers upscaling to 1080p or even 4K, ensuring clarity on large screens.
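For developers, earlier Veo releases have been reachable through the Gemini API, and 3.1 presumably follows the same pattern. The sketch below uses the google-genai Python SDK; the model id “veo-3.1-generate-001” and the 9:16 value for aspect_ratio are assumptions based on that documented pattern, not confirmed identifiers from this announcement.

```python
# Minimal sketch: generating a 9:16 vertical clip with Veo via the
# google-genai SDK. The model id and 9:16 aspect-ratio support are
# assumptions; check the official Gemini API docs for exact values.
import time

from google import genai
from google.genai import types

client = genai.Client()  # reads GOOGLE_API_KEY from the environment

# Video generation is asynchronous: the call returns a long-running operation.
operation = client.models.generate_videos(
    model="veo-3.1-generate-001",  # assumed model id
    prompt="A skateboarder glides through a neon-lit city at night",
    config=types.GenerateVideosConfig(
        aspect_ratio="9:16",       # vertical format for Shorts-style clips
        number_of_videos=1,
    ),
)

# Poll until the operation completes, then download and save the result.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("vertical_clip.mp4")
```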

To improve transparency, all videos generated by Veo 3.1 carry embedded SynthID digital watermarks. Google has also launched a verification tool in the Gemini App: users can upload a video and ask whether it was generated by Google AI, giving professionals a basis for verifying authenticity.

Manus & Similarweb: Goodbye to AI Marketing Data “Hallucinations”

When marketers use AI for market analysis, they often face an awkward situation: the AI sounds logical, but its data sources are unclear or even fabricated. This risk of “hallucination” makes many professionals hesitant to rely fully on AI for decision-making. Now the AI agent Manus has announced an official partnership with digital intelligence leader Similarweb to address this trust crisis.

The key value of this integration is authenticity. Manus can now directly access Similarweb’s massive database, including website traffic, bounce rates, and market rankings for specific countries over the past 12 months. It is like giving the AI a pair of eyes onto the real market: marketers can ask, “Analyze competitors’ traffic channels over the past six months” or “Compare the performance of two websites in the US,” and instead of vague guesses the AI returns charts and reports grounded in authoritative Similarweb data.

Furthermore, the feature can automatically turn complex market intelligence into interactive dashboards, slides, or detailed presentation reports, saving marketers significant time on manual data compilation.

Does this feature require an additional paid Similarweb subscription? According to the official statement, no: all Manus users can access this data using credits, with no extra subscription threshold. This greatly lowers the cost of obtaining high-quality market intelligence, letting entrepreneurs, SEO experts, and investors validate ideas quickly without worrying about data accuracy.

GLM-Image: A New Benchmark for Open Source Image Text Rendering

In the field of open-source image generation, there has been a persistent problem: models can paint beautiful landscapes, yet the moment text is involved they produce alien-like gibberish. GLM-Image, released by the Z.ai team, attempts to break this curse. It is a hybrid architecture combining the strengths of auto-regressive and diffusion models: a 9B parameter auto-regressive module based on GLM-4-9B paired with a 7B parameter diffusion decoder based on CogView4.

Simply put, GLM-Image uses the auto-regressive model to understand complex semantics and layout, and then uses the diffusion decoder to refine details. This design makes it excellent at understanding long instructions and rendering text. Tests show its ability to accurately generate text in images rivals or surpasses many mainstream closed-source models. For designers needing to create posters or materials with slogans, this is a very practical feature.

Currently, GLM-Image is available to developers on Hugging Face. It not only excels at text rendering (a lightweight Glyph-ByT5 model provides character-level encoding, significantly improving precision) but also performs well at image editing, style transfer, and maintaining multi-subject consistency.
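Under the assumption that the release ships a diffusers-compatible pipeline, trying it locally could look roughly like the sketch below. The repo id “zai-org/GLM-Image” and the pipeline call signature are illustrative assumptions; the model card on Hugging Face is the authoritative reference.

```python
# Hedged sketch of running GLM-Image locally, assuming a diffusers-style
# pipeline. Repo id and call signature are illustrative assumptions only.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "zai-org/GLM-Image",     # assumed repo id; see the actual model card
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # hybrid AR + diffusion stacks often need custom code
)
pipe.to("cuda")

# Long, layout-heavy prompts with embedded text are exactly the case the
# AR-understanding + diffusion-refinement design targets.
prompt = (
    "A minimalist concert poster, headline text 'MIDNIGHT ECHOES' in bold serif, "
    "subtitle 'Live at the Blue Hall, March 7', muted navy background"
)
image = pipe(prompt=prompt).images[0]
image.save("poster.png")
```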

Google MedGemma 1.5: Medical AI Goes 3D and Auditory

AI applications in healthcare are evolving from “reading text” to “viewing scans” and “listening.” Google Research’s MedGemma 1.5 represents this trend. Compared with the previous generation, its biggest breakthrough is support for high-dimensional medical imaging: the model can now interpret 3D volumetric data such as CT and MRI scans, and even analyze X-ray image sequences over time to track disease progression, which is crucial for early detection. At the same time, MedGemma 1.5 keeps a compact 4B parameter size, making it efficient enough to run in offline environments and safeguarding medical data privacy.
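As a rough illustration, the earlier MedGemma 4B release loads through the standard transformers pipeline, and 1.5 will presumably follow suit. The repo id below refers to that earlier release and is an assumption for 1.5; 3D CT/MRI volumes would also need the model card’s own preprocessing rather than this single-image call.

```python
# Minimal sketch of running MedGemma with the transformers library, following
# the pattern of the earlier MedGemma 4B release. The repo id is an assumption
# for the 1.5 generation; check Hugging Face for the actual identifier.
import torch
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="google/medgemma-4b-it",  # earlier release; assumed stand-in for 1.5
    torch_dtype=torch.bfloat16,
    device_map="auto",              # a 4B model fits on a single consumer GPU
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "chest_xray.png"},  # hypothetical local file
            {"type": "text", "text": "Describe any abnormal findings in this chest X-ray."},
        ],
    }
]
result = pipe(text=messages, max_new_tokens=256, return_full_text=False)
print(result[0]["generated_text"])
```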

Meanwhile, Google also released MedASR, a speech-to-text model optimized for medical scenarios. Notes dictated by doctors during consultations or surgery are full of difficult professional terminology that general speech models easily misrecognize; MedASR is specifically trained to sharply reduce error rates on medical terms.
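If MedASR is exposed through the standard transformers speech-recognition pipeline, as most open-weight ASR releases are, using it could look like this hedged sketch. The repo id “google/medasr” is an illustrative assumption; consult the Hugging Face model card for the actual identifier and usage.

```python
# Hedged sketch: transcribing medical dictation with MedASR, assuming it uses
# the standard transformers ASR pipeline. The repo id is an assumption.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="google/medasr",  # assumed repo id; see the actual model card
)

# A dictated note full of terminology that general ASR models often garble.
result = asr("dictation.wav")  # hypothetical local recording
print(result["text"])
# e.g. "...bilateral pulmonary infiltrates, recommend follow-up CT angiography..."
```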

Notably, these models are released with open weights on Hugging Face, aiming to provide a reliable foundation for researchers and developers worldwide to build medical applications suited to local needs.

Antigravity Agent Skills: An “Operation Manual” for AI Agents

For developers, making AI agents smarter and better aligned with project needs is an ongoing exploration. Google’s Antigravity framework has introduced “Agent Skills”, a standardized extension mechanism. Think of it as handing the agent task-specific “operation manuals.”

Through a simple folder structure (a directory containing a SKILL.md file), developers can define the steps, best practices, and even script tools the AI should follow for a given task. For example, you can write a “Code Review” skill that teaches the AI which errors to look for and what tone to use in feedback.
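Concretely, a skill is just a folder whose SKILL.md declares a name and description followed by the instructions themselves. The minimal sketch below scaffolds such a “Code Review” skill in Python; everything beyond the name and description fields is an assumption based on the description above, illustrative rather than the exact Antigravity schema.

```python
# Scaffold a minimal Agent Skill folder. The name/description frontmatter
# matches what the article describes; the instruction body is an illustrative
# guess, not the exact Antigravity schema.
from pathlib import Path

skill_dir = Path("skills/code-review")
skill_dir.mkdir(parents=True, exist_ok=True)

(skill_dir / "SKILL.md").write_text("""\
---
name: code-review
description: Review code changes for common errors and give constructive feedback.
---

## Steps
1. Read the diff and flag off-by-one errors, unchecked inputs, and missing tests.
2. Phrase every comment as a suggestion ("consider..."), never as a command.
3. End with a short summary of the change's overall risk.
""")
```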

This design uses a “progressive disclosure” pattern: the agent initially sees only the list of skill names and reads a skill’s full content only when it judges it relevant to the current task. This saves computational resources and makes the agent’s behavior more controllable and focused. Detailed updates can be found in the Antigravity Changelog.

Q&A

Google Veo 3.1 Video Creation

Q1: How does Google Veo 3.1 solve the common “inconsistency” problem in AI videos? A1: Veo 3.1 significantly improves identity consistency, keeping characters’ appearance unchanged even as scenes change, which is crucial for narrative content. It also maintains consistency of backgrounds, objects, and textures.

Q2: How does Veo 3.1 help short video creators? A2: It natively supports 9:16 vertical video format, allowing users to generate full-screen mobile content without cropping. This feature is integrated into YouTube Shorts and YouTube Create App.


Manus & Similarweb Data Analysis

Q3: How does the Manus and Similarweb partnership solve AI “hallucinations”? A3: The integration grounds the AI agent Manus in Similarweb’s authoritative real-world data, providing website traffic and engagement metrics for the past 12 months. Marketers get trustworthy market numbers instead of vague guesses.

Q4: Is an extra subscription to Similarweb required? A4: No, all Manus users can access data on-demand using Manus credits.


GLM-Image Open Source Image Generation

Q5: What are the advantages of GLM-Image’s “hybrid architecture”? A5: It combines an auto-regressive (AR) module and a diffusion decoder: the AR module handles complex semantics and layout, while the diffusion decoder refines high-frequency details.

Q6: What is special about GLM-Image’s text rendering? A6: It has a significant advantage in rendering text within images, thanks to a lightweight Glyph-ByT5 model that provides character-level encoding.


Google MedGemma 1.5 Medical AI

Q7: What are the breakthroughs of MedGemma 1.5 in medical imaging? A7: It now supports high-dimensional 3D medical imaging, including CT and MRI, and can perform continuous image analysis (like X-ray time series) to track disease progression.

Q8: What is the value of the MedASR model for clinical work? A8: MedASR is a speech-to-text model optimized for medical dictation, reducing recognition error rates on medical terminology by 82% compared to general models like Whisper large-v3.


Antigravity Agent Skills Development Tools

Q9: What are Agent Skills? A9: Agent Skills are an open standard for extending AI agent capabilities. Developers just need to create a folder with a SKILL.md file defining the skill’s name and description.

Q10: How do Agent Skills optimize AI performance? A10: They use a progressive disclosure pattern: the AI reads a skill’s full detailed instructions only when it judges the skill relevant to the current task.
