AI Daily: OpenAI Launches Powerful Image Editing Model, Meta Revolutionizes Audio Editing - Top 5 Major Updates from AI Giants This Week

December 17, 2025
This week has been a busy one for artificial intelligence. From visual creation to audio processing, scientific research, and everyday productivity, the major labs have shipped impressive new tools. OpenAI has finally addressed the pain point of fine-tuning AI images, Meta lets you edit sound the way you edit photos, and Google wants to smooth out your daily workflow. These updates are more than feature checklists; they directly change how creators and professionals work.

Here is a deep dive into five major updates that might change the future of work.

1. OpenAI Announces GPT Image 1.5: Precision Editing is No Longer a Dream

For many people who have used AI image tools, the biggest headache is usually not creating from scratch but modifying. Ask to change a single piece of clothing and the character's facial features, the lighting, or even the entire background can shift along with it. OpenAI's newly released GPT Image 1.5 model is designed to solve exactly this problem.

The new model's standout strength is instruction following: it can execute editing commands precisely while preserving the core details of the original image, such as lighting, composition, and the subject's appearance. Users can fine-tune AI-generated images much as they would in professional photo-editing software, whether changing clothes, adjusting background elements, or applying a style transfer, while maintaining high consistency.

Beyond editing, OpenAI also launched a brand-new "Images" creation interface. Rather than a plain chat box, it works more like a small creative studio, offering preset style filters and inspiration prompts that make the process more intuitive. Notably, the new model generates images four times faster than the previous generation, and the API price has dropped by 20%, welcome news for enterprise users who generate images at scale.

2. Meta Launches SAM Audio: "Select" Sounds the Way You Select Objects in a Photo

If OpenAI solved visual editing, Meta has dropped a bombshell on the audio side. Meta officially released SAM Audio, an audio separation model that extends its well-known "Segment Anything" family into sound.

Imagine you recorded a video but background traffic noise drowned out the speech, or you want to lift a guitar solo out of a song. In the past this meant hours of work by a professional audio engineer. SAM Audio makes it remarkably simple, supporting three intuitive ways to tell it what you want:

  • Text Prompting: Directly input “dog barking” or “vocals,” and the model will automatically grab the corresponding audio track.
  • Visual Prompting: Click on the object making sound in the video (such as a guitar being played), and the AI will separate the sound of that object.
  • Span Prompting: an industry-first feature that lets users mark the time segments where a sound occurs so the model can lock onto that source.

This technology lowers the barrier of traditional audio editing, making sound separation as simple as using the Magic Wand in Photoshop. Whether for podcast production, video editing, or music creation, SAM Audio offers unprecedented flexibility. The model is currently available to try on the Segment Anything Playground.
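To make "span prompting" concrete, here is a toy, standard-library-only Python sketch of the underlying idea: the user marks a time window and everything outside it is treated as irrelevant. SAM Audio itself performs learned source separation, not simple masking; the sample rate and test signal below are invented purely for illustration.

```python
import math

SAMPLE_RATE = 16_000  # assumed sample rate in Hz, purely illustrative


def span_mask(audio, start_s, end_s, rate=SAMPLE_RATE):
    """Keep only the samples inside the user-marked time span.

    A toy stand-in for span prompting: the marked window tells the
    model where the target sound occurs. Here we simply zero out
    everything outside the window.
    """
    a, b = int(start_s * rate), int(end_s * rate)
    return [x if a <= i < b else 0.0 for i, x in enumerate(audio)]


# Two seconds of audio: a 440 Hz tone plays only between 0.5 s and 1.5 s.
audio = [
    math.sin(2 * math.pi * 440 * i / SAMPLE_RATE)
    if 0.5 * SAMPLE_RATE <= i < 1.5 * SAMPLE_RATE
    else 0.0
    for i in range(2 * SAMPLE_RATE)
]
selected = span_mask(audio, 0.5, 1.5)  # isolate the marked second
```

The real model combines such a span hint with text or visual prompts to decide *which* source inside the window to extract, rather than keeping the whole window verbatim.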

3. Google Launches Experimental AI Agent “CC”: Your All-Round Digital Secretary

In terms of productivity tools, Google Labs launched a new experimental AI agent codenamed CC. Built on the Gemini model, this tool aims to be a super assistant in users’ work and life.

Modern work data is scattered everywhere: meeting notifications in Gmail, schedules in Google Calendar, project documents in Drive. CC's core value lies in connecting that scattered information. Every morning it sends a "Your Day Ahead" brief to your inbox; more than a simple schedule, it is a consolidated summary of to-dos, important email updates, and upcoming itineraries.

Even more thoughtful, CC can act proactively. If it sees an upcoming meeting, it will automatically prepare relevant email drafts or gather the necessary file links. Users can even "teach" CC specific preferences or long-term tasks simply by replying to its emails. The feature is currently in testing for Google AI Ultra users in the US and Canada.

4. Build Your Exclusive “Mini App” with Opal in Gemini

Beyond handling chores for you, Google also wants you to build tools yourself. It has integrated its development tool Opal directly into the Gemini web app, a platform that lets everyday users create "AI mini apps."

Opal's defining feature is its visual editor. Users don't need to write code; by entering prompts, they can turn an idea into a reusable tool. The new interface can even convert a prompt into a clear list of steps, making the app's logic easier to understand and adjust.

This update means Gemini is no longer limited to one-off conversations. You can build yourself a mini app for "generating weekly reports in a specific format" or "analyzing financial-report data" and use it over and over. For anyone who wants a more customized AI experience, it is a genuinely practical feature.

5. OpenAI Announces FrontierScience: The Ultimate Test for AI Scientific Reasoning

While we are still debating whether AI can draw or write, OpenAI is already asking whether AI can become a scientist. It released a new benchmark suite named FrontierScience, specifically designed to evaluate AI's expert-level reasoning in physics, chemistry, and biology.

Most existing evaluations rely on multiple-choice questions, which barely reflect real scientific work. FrontierScience includes two tracks: "Olympiad Problems" and "Research Tasks." The former, written by International Olympiad medalists, tests intensive theoretical reasoning; the latter, designed by PhD-level scientists, simulates real research scenarios to gauge whether AI has the potential to conduct original research.

In preliminary tests, OpenAI reported that its internal model GPT-5.2 scored 77% on the Olympiad problems, far ahead of previous models. The project's significance is that it sets a clear bar for AI entering serious scientific research, and hints at a future in which AI helps humans unravel cancer or develop new materials.


Frequently Asked Questions (FAQ)

Q1: Can I use OpenAI's new GPT Image 1.5 model now? Yes, the new Images model has been rolled out to all ChatGPT users starting today, and it is also available to developers via the API as GPT Image 1.5. Access for Business and Enterprise editions will follow later.

Q2: Is Meta's SAM Audio paid? Currently, Meta has made SAM Audio publicly available to try on the Segment Anything Playground and also offers model downloads. As open research, developers and researchers can explore it for free, though commercial use may be subject to the specific license terms.

Q3: Is Google's CC assistant available globally? Currently, CC is still in Early Access, initially open only to Google AI Ultra subscribers and paid users in the US and Canada. Users elsewhere may need to wait a bit longer; keep an eye on follow-up announcements from Google Labs.

Q4: Why is a new evaluation standard like FrontierScience needed? Because past tests were mostly multiple-choice questions, whose answers models can effectively memorize, they failed to measure true reasoning ability. FrontierScience uses open-ended questions and complex research tasks to more faithfully reflect whether AI can assist scientists with breakthrough research.

Q5: How much cheaper is the “API price” mentioned for GPT Image 1.5? According to OpenAI, the input and output prices of GPT Image 1.5 have been reduced by 20% compared to the previous generation GPT Image 1, allowing developers to generate or edit more images with the same budget.
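A quick back-of-the-envelope check (the $0.05 figure below is a placeholder, not OpenAI's actual rate): a 20% price cut means each dollar buys 25% more images.

```python
old_price = 0.05                         # hypothetical cost per image, USD
new_price = old_price * (1 - 0.20)       # 20% cheaper -> 0.04
more_images = old_price / new_price - 1  # same budget goes 25% further
print(f"{more_images:.0%}")  # → 25%
```

The general rule: a price cut of fraction p buys 1 / (1 - p) - 1 more output per dollar, which is always a bit more than p itself.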

© 2026 Communeify. All rights reserved.