
AI Daily | Qwen3.7-Plus Controls Interfaces? Bernini's New Video Architecture, Mellum2 Open Sourced, and Cursor Pricing Changes

June 2, 2026
Updated Jun 2

AI Focus Daily: Qwen3.7-Plus Controls Global Interfaces, ByteDance’s Bernini Refines Video Editing Logic

The AI field moves fast, and keeping up with every technical release is a real challenge. Today's roundup covers some of the most influential recent updates: a multimodal agent that operates interfaces, an open-source video generation and editing model, an open-source coding model, a billing change in a popular developer tool, and a community dispute over usage quotas.

Let’s break down the core highlights of each release and how they may affect software engineering and content creation workflows.

Alibaba Releases Qwen3.7-Plus: An All-Around Agent That Understands and Operates Interfaces

The long-awaited multimodal upgrade has arrived. According to the official Qwen blog post, the newly released Qwen3.7-Plus combines visual understanding with language reasoning and offers strong “Hybrid Agent” capabilities.

Previous models were mostly limited to describing images. Qwen3.7-Plus can read screens directly, operate Graphical User Interfaces (GUIs), and complete complex end-to-end tasks in Command Line Interface (CLI) environments. For example, given a reference design or a video, the model can output executable SVG or web frontend code.

An Automation Milestone for Software Development

The reported results in practical use are striking. An agent system built on Qwen3.7-Plus ran stably for more than 11 hours without interruption. In that time it autonomously completed the full development loop for an English vocabulary learning app, from generating the requirements document and writing the code to creating test cases and running automated interface tests, producing more than 10,000 lines of code in total.

The model can also reproduce professional desktop applications autonomously. In one case it built a high-fidelity replica of the native macOS Stocks app, including connections to real APIs for real-time market data. Developers can integrate it into mainstream development frameworks, with stable support for Claude Code, OpenClaw, and Qwen Code.

The model is currently available through the Alibaba Cloud Model Studio API. The service also supports retaining reasoning content from previous rounds, which makes it well suited to building long-running agents.
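To make the idea of retaining reasoning content across rounds concrete, here is a minimal sketch of a conversation buffer that carries each assistant turn's reasoning forward into the next request. It does not call any real API. The class names and the "reasoning_content" key are assumptions chosen for illustration, not the documented Model Studio request format.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    role: str            # "user" or "assistant"
    content: str         # visible message text
    reasoning: str = ""  # hypothetical: the model's reasoning for this turn


@dataclass
class Conversation:
    turns: list[Turn] = field(default_factory=list)

    def add(self, role: str, content: str, reasoning: str = "") -> None:
        """Record one turn of the dialogue."""
        self.turns.append(Turn(role, content, reasoning))

    def to_messages(self, keep_reasoning: bool = True) -> list[dict]:
        """Build the message list that would be sent with the next request."""
        messages = []
        for turn in self.turns:
            message = {"role": turn.role, "content": turn.content}
            # Only assistant turns carry reasoning; attach it when requested.
            # The key name "reasoning_content" is an assumption.
            if keep_reasoning and turn.role == "assistant" and turn.reasoning:
                message["reasoning_content"] = turn.reasoning
            messages.append(message)
        return messages


conv = Conversation()
conv.add("user", "Run the test suite")
conv.add("assistant", "All tests pass", reasoning="Ran the tests; no failures")
conv.add("user", "Now add a settings screen")
next_request_messages = conv.to_messages(keep_reasoning=True)
```

A long-running agent would keep appending to the same buffer, so later rounds can see why earlier decisions were made instead of only what was said.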

ByteDance Open-Sources Bernini: Reshaping Video Generation and Editing with Semantic Planning

The technical logic of video generation is undergoing an interesting shift. The ByteDance research team has introduced the Bernini project, a unified architecture that combines Multimodal Large Language Models (MLLMs) with Diffusion Transformers (DiTs).

Traditional video models usually handle understanding and generation in one mixed process, which often wastes compute or loses detail. Bernini divides the labor instead. The MLLM is responsible for high-level “Semantic Planning,” predicting the ViT embedding features of the target. The DiT renderer then takes over and turns these semantic features into highly realistic pixel frames.

Technical Ingenuity in Solving Multi-Visual Feature Confusion

Video editing models face a difficult problem: how to tell apart the features of the original video, the reference images, and the target output. To address this, the research team introduced “Segment-Aware 3D Rotary Positional Embedding (SA-3D RoPE).” The technique assigns independent index labels to the different visual materials, so that the renderer does not mistakenly paste the background of a reference image into the final generated video when synthesizing frames.

In the reported evaluations, Bernini performs strongly. In both video-to-video editing (V2V) and reference image-guided editing (RV2V), its frame consistency and instruction following are reported to surpass mainstream products currently on the market, including Kling O3 and Wan2.7.

Better yet, the development team has released the work openly. Interested researchers can read the research paper, “Bernini: Latent Semantic Planning for Video Diffusion,” and obtain the complete inference code and model weights from the ByteDance/Bernini model download page.
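The segment-labeling idea can be illustrated with a toy index builder. This is not the SA-3D RoPE formulation from the paper, which the article does not spell out, and it does not compute any rotary embedding. It only shows, under the assumption that each token is addressed by a (segment, t, h, w) tuple, how a per-segment label keeps tokens at the same frame and patch position distinguishable. The function and segment names are hypothetical.

```python
from typing import NamedTuple


class PositionId(NamedTuple):
    segment: int  # which visual source the token belongs to
    t: int        # frame index within that source
    h: int        # patch row
    w: int        # patch column


def build_position_ids(
    segments: dict[str, tuple[int, int, int]],
) -> dict[str, list[PositionId]]:
    """Give every token a (segment, t, h, w) index.

    `segments` maps a name to its (frames, rows, cols) shape.
    Segment ids follow the insertion order of the dict.
    """
    result = {}
    for segment_id, (name, (frames, rows, cols)) in enumerate(segments.items()):
        result[name] = [
            PositionId(segment_id, t, h, w)
            for t in range(frames)
            for h in range(rows)
            for w in range(cols)
        ]
    return result


ids = build_position_ids({
    "source_video": (2, 2, 2),     # 2 frames of 2x2 patches
    "reference_image": (1, 2, 2),  # a single 2x2 image
    "target_video": (2, 2, 2),
})
```

Without the segment component, the first patch of the source video, the reference image, and the target would all share the index (0, 0, 0). With it, the three remain separate addresses that a positional embedding could encode differently.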

JetBrains Launches Mellum2: A Lightweight Expert Built for Code Workflows

Sometimes, completing a task doesn’t require deploying the largest, most resource-intensive model. The well-known developer tool company JetBrains has officially open-sourced its Mellum2 model, with technical details published on the official JetBrains blog.

Mellum2 is a 12B-parameter model built on a Mixture-of-Experts (MoE) architecture. Because an MoE model routes each token through only a subset of its experts, just 2.5B parameters are activated per token. This keeps latency very low and throughput high while maintaining strong performance.
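The arithmetic behind those figures, plus a generic top-k routing step, can be sketched as follows. The 12B and 2.5B numbers come from the article. The routing function is a textbook-style illustration with made-up logits and an assumed k of 2. Mellum2's actual expert count, k, and router design are not stated here.

```python
import math

TOTAL_PARAMS_B = 12.0   # total parameters in billions, from the article
ACTIVE_PARAMS_B = 2.5   # parameters activated per token, from the article


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active_b / total_b


def route_top_k(router_logits: list[float], k: int) -> dict[int, float]:
    """Pick the k experts with the highest logits.

    Returns {expert_index: weight}, with weights softmax-normalized
    over the selected experts only.
    """
    ranked = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )
    chosen = ranked[:k]
    max_logit = max(router_logits[i] for i in chosen)  # for numerical stability
    exps = {i: math.exp(router_logits[i] - max_logit) for i in chosen}
    total = sum(exps.values())
    return {i: value / total for i, value in exps.items()}


fraction = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)  # about 0.208
weights = route_top_k([0.1, 2.0, -1.0, 1.5], k=2)            # experts 1 and 3
```

Roughly one fifth of the parameters do work on any given token, which is where the latency and throughput advantage over a dense 12B model comes from.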

Focusing on Pure Text and Code Tasks

Unlike the multimodal models mentioned earlier, Mellum2 intentionally avoids image and video processing. It focuses entirely on training with natural language and code data. This “specialization” actually makes it feel right at home in software engineering environments.

Whether it’s analyzing incoming prompts to decide which tool to call, building low-latency Retrieval-Augmented Generation (RAG) pipelines, or breaking down complex development work for sub-agents to execute, Mellum2 demonstrates extreme efficiency.

The model is released under the Apache 2.0 license, which makes it well suited to enterprises that deploy locally to protect code privacy. Developers can learn more from the release notes on Hugging Face and find related resources in the dedicated Hugging Face collection.

Cursor Teams Plan Upgrade: A Boon for Heavy Developers

How a development tool bills always affects a team’s operating costs. According to the latest official Cursor announcement, the Teams plan has been restructured as of June 2026.

Team managers can now control spending more precisely. The usage quota for standard seats ($40 per month) has been significantly increased. More importantly, included usage is now split into two independent pools: one for Cursor’s own Composer and Auto features, and another for third-party API models.

New Solutions for Extreme Usage

Look closely at any development team and you’ll usually find that a few “heavy users” consume the vast majority of the AI quota. To contain the sudden on-demand charges this causes, Cursor has launched a new Premium seat.

At roughly 3 times the price of a standard seat ($120 per month, or $96 per month when billed annually), a Premium seat includes 5 times the usage. Teams can freely mix and match seat types so that spending follows actual usage. The management backend now also displays real-time progress toward usage limits and supports smart alerts, which helps avoid surprise bills at the end of the month.
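The trade-off between seat types comes down to simple arithmetic, sketched here with the $40 and $120 monthly prices and the 5x usage multiple quoted above. Usage is measured in "standard-seat units", a hypothetical unit introduced only for this comparison. On-demand overage rates are not given in the article, so no break-even point is computed.

```python
STANDARD_PRICE = 40.0         # USD per seat per month, from the article
PREMIUM_PRICE = 120.0         # USD per seat per month, from the article
STANDARD_USAGE_UNITS = 1.0    # included usage of one standard seat (the unit)
PREMIUM_USAGE_UNITS = 5.0     # a Premium seat includes 5x that usage


def price_per_usage_unit(price: float, usage_units: float) -> float:
    """Monthly price divided by included usage."""
    return price / usage_units


def team_cost(standard_seats: int, premium_seats: int) -> float:
    """Total monthly seat cost for a mixed team."""
    return standard_seats * STANDARD_PRICE + premium_seats * PREMIUM_PRICE


def included_usage_units(standard_seats: int, premium_seats: int) -> float:
    """Total included usage across the team, in standard-seat units."""
    return (
        standard_seats * STANDARD_USAGE_UNITS
        + premium_seats * PREMIUM_USAGE_UNITS
    )


standard_rate = price_per_usage_unit(STANDARD_PRICE, STANDARD_USAGE_UNITS)  # 40.0
premium_rate = price_per_usage_unit(PREMIUM_PRICE, PREMIUM_USAGE_UNITS)     # 24.0
mixed_cost = team_cost(standard_seats=8, premium_seats=2)                   # 560.0
mixed_usage = included_usage_units(standard_seats=8, premium_seats=2)       # 18.0
```

Per unit of included usage, a Premium seat costs $24 against $40 for a standard seat, but that advantage only materializes for developers who actually use most of the larger allowance.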

Codex Quota Reset Sparks Community Discussion: Weekly to Monthly?

Finally, let’s look at a news item that has sparked a strong backlash in the developer community. A recent Reddit thread titled “Weekly reset became monthly reset?” has struck a chord with many users.

Many users who rely on free accounts or the Go plan for daily development suddenly found that their quota, which used to reset every 7 days, now resets every 30 days, with no warning. For students or hobbyists used to working on personal projects on weekends, this is a heavy blow.

The thread is full of speculation and complaints. Some wonder whether it is a system glitch, while more believe it is a deliberate policy change by the provider. Faced with such a sudden restriction, many developers say they are looking for alternatives, and some are preparing to migrate their workflows entirely to the more affordable DeepSeek API. The incident once again highlights the risk of over-relying on a single cloud service provider.

Q&A

Q1: How does the new Qwen3.7-Plus model from Alibaba differ from previous visual models?
A: Qwen3.7-Plus is a multimodal hybrid agent. Beyond understanding images, it can read screens directly, operate Graphical User Interfaces (GUIs), and execute tasks in Command Line Interface (CLI) environments. It also has strong visual code generation capabilities, such as converting images, videos, or UI screenshots directly into executable SVG or web frontend code.

Q2: How does ByteDance’s Bernini model solve the common feature confusion problem in video editing?
A: Bernini employs Segment-Aware 3D Rotary Positional Embedding (SA-3D RoPE). The technique labels different visual materials separately, so that when rendering frames the model can tell which features come from which visual segment (such as reference images and the original video) and avoids mixing them during synthesis.

Q3: Why is the Mellum2 model open-sourced by JetBrains particularly suitable for software development workflows?
A: Mellum2 is a 12B-parameter Mixture-of-Experts (MoE) model. It is deliberately focused, avoiding image and video processing and concentrating on text and code tasks. This gives it very low latency and high efficiency, making it well suited to delegating work to sub-agents, local private deployment, and building fast AI workflows.

Q4: What new billing solution has Cursor proposed for “heavy users” in the Teams plan?
A: Cursor has introduced the new Premium seat. Companies can pay approximately 3 times the cost ($120 per month on monthly billing, or $96 per month on annual billing) for these high-usage developers and get 5 times the included usage of a standard seat. The backend also provides a real-time usage dashboard and smart alerts that notify admins via Slack or email before spending exceeds limits.

Q5: What are the main complaints from the developer community regarding the recent Codex quota reset? What alternatives have developers proposed?
A: Users with free accounts and Go plans found that the Codex quota reset cycle unexpectedly changed from weekly (7 days) to monthly (30 days). Faced with this sudden restriction, some developers say they are preparing to migrate their workflows entirely to the more affordable DeepSeek API.


© 2026 Communeify. All rights reserved.