When Large Language Models start competing at "visual code", who really comes out ahead? This article delves into an SVG generation benchmark covering 9 top AI models, including Claude Sonnet 4.5, GPT-5.1, and Gemini 3.0, exploring their performance across 30 creative prompts and what the results mean for developers and designers.
The Intersection of Code and Art
Have you ever wondered what happens if you ask an AI that is good at writing Python or JavaScript to "draw"? We are not talking about generating pixel images the way Midjourney does, but about writing SVG (Scalable Vector Graphics) code. It is like asking a mathematician to draw a cat by writing formulas. It sounds crazy, but this is exactly one of the most interesting battlegrounds in AI today.
Recently, a benchmark named "LLM SVG Generation Benchmark" has attracted widespread attention. The test gathered 9 of the most powerful AI models currently on the market and challenged them with 30 highly creative SVG generation prompts. It measures not just whether the code is correct, but whether these models possess "spatial reasoning" and "visual imagination".
The list of contestants in this showdown is a dream team, including the latest masterpieces from tech giants such as Anthropic, OpenAI, Google, xAI, and Alibaba.
Introduction to Contestants: Top Combat Power in 2025
This benchmark list carries one clear message: AI models are iterating astonishingly fast. Let's take a closer look at the 9 contestants, which represent the current state of the art in Large Language Models (LLMs):
- Claude Sonnet 4.5 (Anthropic): Known for rigorous coding logic, can this upgraded version continue its advantage in graphic logic?
- Claude Opus 4.5 (Anthropic): As Anthropic’s flagship, theoretically, it should have more delicate performance when dealing with complex instructions.
- Grok Code Fast 1 (xAI): Featuring a 314B parameter MoE (Mixture of Experts) architecture, focusing on speed and code generation, it is a significant force under Elon Musk’s xAI.
- Gemini 2.5 Pro (Google): Google’s main model, which has always performed well in multimodal understanding.
- Gemini 3.0 Pro Preview (Google): This is the preview version of Google’s next generation, making people look forward to breakthrough architectural improvements.
- DeepSeek V3.2-Exp (685B/37B MoE): A powerful challenger from the open-source community; its 685B total parameters (with 37B active per token) suggest a broad store of world knowledge.
- GLM-4.6 (Zhipu AI, 355B/32B MoE): The latest iteration from Zhipu AI, demonstrating the competitiveness of Chinese language models in the coding field.
- Qwen3-VL-235B-A22B-Thinking (Alibaba): Alibaba Cloud’s Qwen series, specifically labeled “Thinking”, implying it strengthens the Chain of Thought (CoT) process, which is crucial for graphic generation.
- GPT-5.1 (OpenAI): As the market benchmark, every update of the GPT series is the focus of everyone’s attention, and version 5.1 is bound to have improved in creativity.
Why is SVG Generation So Hard?
You might ask, what’s so hard about generating a picture? Didn’t DALL-E already do it?
There is a key difference here. Models like DALL-E or Stable Diffusion generate "pixels"; they just need to put the right colors in the right places. But an LLM generating SVG is writing "code". The model must build an X/Y coordinate system in its head, precisely calculate the Bézier parameters for every curve, and understand how layers stack on top of each other.
This is like drawing blindfolded. The model cannot see what it has drawn; it can only rely on its understanding of XML syntax and its deduction of spatial logic to "draw blind". If the model has no concept of space, the cat it draws might have ears growing out of its belly, or a circle might come out as a strange polygon.
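To make the "drawing blindfolded" idea concrete, here is a hand-written minimal sketch (not output from any of the benchmarked models) of the kind of code an LLM must emit. Every number is a coordinate the model has to compute in its head, with no visual feedback:

```xml
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <!-- The y-axis points down: (0,0) is the top-left corner -->
  <circle cx="50" cy="55" r="30" fill="#f4c27a"/>            <!-- head -->
  <!-- Cubic Bézier curves: start point, two control points, end point -->
  <path d="M 30 35 C 35 10, 45 10, 48 32 Z" fill="#f4c27a"/> <!-- left ear -->
  <path d="M 52 32 C 55 10, 65 10, 70 35 Z" fill="#f4c27a"/> <!-- right ear -->
</svg>
```

Even in this toy example, the ear paths must start and end on the circle's edge; shift one control point a few units and the ear detaches from the head entirely.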
This benchmark used 30 creative prompts, which means the questions were not simply “draw a red circle”, but might involve complex scene descriptions, abstract concepts, or graphics requiring fine geometric structures. This tests not only syntactic correctness but also the model’s cognition of physical world shapes.
Technical Wrestling of Major Camps
In this benchmark, we can observe several interesting technical trends.
The Rise of MoE Architecture
Models like Grok, DeepSeek, and GLM on the list explicitly use an MoE (Mixture of Experts) architecture, meaning different "experts" inside the model activate for different inputs. Loosely speaking, when drawing SVG, one expert might be better at geometric calculation and another at color choices. In theory, this division of labor improves output precision while keeping inference efficient.
Introduction of “Thinking” Ability
The "Thinking" in Qwen3's name is intriguing. It likely means the model performs an internal Chain of Thought derivation before emitting the final code. For a task like SVG that demands precise calculation, letting the model "think before drawing" often noticeably reduces awkward coordinate misalignment.
Tug of War between Closed Source and Open Source
GPT-5.1 and Claude 4.5 represent the peak of closed-source models; they have usually undergone extensive Reinforcement Learning from Human Feedback (RLHF) and are better tuned to human aesthetic preferences. Models like DeepSeek and Qwen represent the open-weight and open-source communities, which are often bolder in parameter scale and architectural innovation.
How Should Developers and Designers Choose?
Facing this benchmark list, how should we apply it in actual workflows?
If you are a frontend engineer who needs to quickly generate simple icons or UI placeholders, Claude Sonnet 4.5 or Grok Code Fast 1 might be the first choice, because they usually generate clean, well-structured, easy-to-maintain code.
If you are a Creative Worker looking for inspiration or generating complex vector illustrations, GPT-5.1 or Gemini 3.0 Pro Preview might give you more surprises. These models are usually more creative in understanding abstract instructions and color usage.
If you need extreme precision, or your instructions involve complex geometric transformations, then Qwen3 with its "Thinking" ability or the massively parameterized DeepSeek might prove more robust.
Future Application Scenarios of SVG Generation
This benchmark is not just for fun; it heralds the transformation of future content creation.
- Dynamic Web Design: Future website images will no longer be rigid JPGs, but AI-generated SVGs that can change color, resize, or even respond to interaction on the fly.
- Data Visualization: Just input Excel data, and AI can directly write beautiful SVG chart code without relying on chart libraries.
- Real-time Game Assets: Simple web games can have vector maps or characters generated directly by AI, significantly lowering the development threshold.
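As a sketch of the data-visualization scenario above, the fragment below encodes three hypothetical values (40, 65, 90) directly as rectangle heights; this is the kind of markup an LLM would write straight from tabular input, with no chart library involved:

```xml
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 160 110">
  <!-- Each bar's height equals its data value; y = 100 - value keeps bars bottom-aligned -->
  <rect x="10"  y="60" width="30" height="40" fill="#4e79a7"/> <!-- value 40 -->
  <rect x="60"  y="35" width="30" height="65" fill="#f28e2b"/> <!-- value 65 -->
  <rect x="110" y="10" width="30" height="90" fill="#59a14f"/> <!-- value 90 -->
  <line x1="5" y1="100" x2="155" y2="100" stroke="#333"/>      <!-- baseline -->
</svg>
```

Because the output is plain markup, changing a value, a color, or a bar width afterward is a one-line edit rather than a re-render.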
When AI can precisely control vector graphics, the boundary between design and code will become even more blurred.
Frequently Asked Questions (FAQ)
Here are common questions about AI-generated SVG to help you further understand this technology.
1. Why do AI-generated SVGs sometimes look “broken” or have lines running wild?
This is usually because the model's "spatial reasoning" ability is insufficient. SVG relies on precise mathematical coordinates; if the model cannot correctly construct the figure's geometry in its internal logic, the result is unclosed paths or incorrect coordinate values, making the graphic look like randomly scribbled lines.
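A common failure mode looks like the first path in this hand-made illustration: the model intends a triangle but forgets the closing `Z` command, so the outline never returns to its starting point and one edge is simply missing:

```xml
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <!-- Broken: the subpath never closes, so the bottom edge is missing -->
  <path d="M 10 90 L 50 10 L 90 90" fill="none" stroke="red"/>
  <!-- Fixed: the Z command closes the path back to (10, 90) -->
  <path d="M 10 90 L 50 10 L 90 90 Z" fill="none" stroke="green"/>
</svg>
```

A coordinate typo (say, `L 90 9` instead of `L 90 90`) produces the other classic symptom: a line shooting off toward the wrong corner of the canvas.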
2. Can these model-generated SVGs be used commercially directly?
Technically speaking, an SVG is just code, and you can modify it freely, but the copyright status of AI-generated content remains a legal gray area. That said, because an SVG is a graphic described by geometry rather than sampled pixels, copyright disputes tend to be rarer than with pixel art. It is recommended to treat generated SVGs as drafts and manually refine them.
3. Which model generates the best quality SVG code?
According to the experience of the developer community, Gemini 3.0 Pro Preview can usually produce the cleanest and most readable XML code, which is very suitable for scenarios requiring subsequent manual editing. The Claude series tends to perform better in understanding complex and abstract drawing instructions.
4. How should I optimize my prompts to get better SVGs?
Describe geometric shapes and layout concretely. Instead of saying "draw a cat", say "draw a minimalist cat-face SVG icon from simple combinations of circles and triangles, using soft tones". Clear geometric guidance helps the model compute coordinates more accurately.
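A hand-drawn sketch of what such a "circles and triangles" prompt might yield (the coordinates and palette here are illustrative, not taken from any model's actual output):

```xml
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <polygon points="25,40 18,12 45,30" fill="#e8b796"/> <!-- left ear (triangle) -->
  <polygon points="75,40 82,12 55,30" fill="#e8b796"/> <!-- right ear (triangle) -->
  <circle cx="50" cy="58" r="32" fill="#f2cdb0"/>      <!-- face (circle) -->
  <circle cx="38" cy="52" r="4" fill="#5b4636"/>       <!-- left eye -->
  <circle cx="62" cy="52" r="4" fill="#5b4636"/>       <!-- right eye -->
  <polygon points="46,64 54,64 50,70" fill="#d98c7a"/> <!-- nose (triangle) -->
</svg>
```

Naming the primitives in the prompt constrains the model to shapes it can place with a handful of numbers, which is far easier to get right than freeform Bézier curves.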
5. What is the difference between SVG generation and image generation like Midjourney?
The essence is completely different. Midjourney generates "bitmaps" (pixels), which blur when zoomed in and whose internal elements cannot be edited. The models discussed in this article generate "vector code", which scales infinitely without distortion, and you can edit the code at any time to change a shape's color or geometry.


