I understand prose and code and raster images, but how does SVG work? The SVG structure and visual result seem so disconnected, I would have thought current LLMs would have a harder time on arbitrary descriptions of imagery.
XCSme 17 hours ago [-]
SVGs are just shapes; they (models) mostly use only squares, circles, etc., and rarely paths.
Plus most models are multi-modal, so they have a visual understanding too.
happytoexplain 17 hours ago [-]
Right, but rendering the visual understanding is based on neighboring pixels normally, which is much more directly related to the visual than plaintext like SVG tags.
XCSme 17 hours ago [-]
I kind of disagree, because describing an image is more like writing an SVG than rendering pixels.
If I ask you to describe how you would draw a cat head, you could do it in text like: "a big circle as the head, 2 small circles as the eyes, 2 triangles for ears, 3 lines on each side of the mouth as whiskers, etc..."
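To make that concrete, here is a minimal sketch of how the cat-head description maps onto SVG primitives. It is only an illustration: the function name `cat_head_svg`, the proportions, and the colors are all made up for this example, and it is not output from any model.

```python
def cat_head_svg(size: int = 200) -> str:
    """Build an SVG string: one head circle, two eyes, two ears, six whisker lines."""
    cx = cy = size / 2          # center of the canvas
    r = size * 0.35             # head radius (arbitrary proportion)
    eye_r = size * 0.04         # eye radius (arbitrary proportion)

    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{size}" height="{size}" viewBox="0 0 {size} {size}">'
    ]

    # Ears: two triangles, drawn first so the head circle overlaps their bases.
    for side in (-1, 1):
        base_x = cx + side * r * 0.55
        points = [
            (base_x - r * 0.35, cy - r * 0.6),
            (base_x + r * 0.35, cy - r * 0.6),
            (base_x + side * r * 0.15, cy - r * 1.35),
        ]
        pts = " ".join(f"{x:.1f},{y:.1f}" for x, y in points)
        parts.append(f'<polygon points="{pts}" fill="gray"/>')

    # Head: one big circle.
    parts.append(f'<circle cx="{cx:.1f}" cy="{cy:.1f}" r="{r:.1f}" fill="gray"/>')

    # Eyes: two small circles, mirrored around the vertical center line.
    for side in (-1, 1):
        ex = cx + side * r * 0.4
        ey = cy - r * 0.2
        parts.append(f'<circle cx="{ex:.1f}" cy="{ey:.1f}" r="{eye_r:.1f}" fill="black"/>')

    # Whiskers: three lines on each side of the mouth, fanning outward.
    for side in (-1, 1):
        for dy in (-1, 0, 1):
            x1, y1 = cx + side * r * 0.2, cy + r * 0.35
            x2, y2 = cx + side * r * 1.1, y1 + dy * r * 0.2
            parts.append(
                f'<line x1="{x1:.1f}" y1="{y1:.1f}" x2="{x2:.1f}" y2="{y2:.1f}" '
                f'stroke="black" stroke-width="2"/>'
            )

    parts.append("</svg>")
    return "\n".join(parts)


svg = cat_head_svg()
```

Saving `svg` to a `.svg` file and opening it in a browser should show the result. Every element is basically one phrase from the description plus a handful of coordinates.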
happytoexplain 15 hours ago [-]
Hmm, sure. I'm still surprised - it also has to say where they are in coordinate space. It feels like the way genAI works ("what comes next") is not amenable to this use case (demonstrably I'm wrong, of course).
XCSme 15 hours ago [-]
I remember when I was learning WebGL/OpenGL and had to draw some shapes to test my shader, I would manually work out what the vertex positions should be to draw a triangle, a pyramid, etc. I think for AIs it's quite easy, because most are quite decent at math and have been trained on many geometry problems, and probably on a lot of OpenGL code and 3D assets too.
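As a rough illustration of that kind of hand-computed geometry, this sketch places the vertices of a regular polygon (a triangle for n = 3) on a circle. The helper name `regular_polygon_vertices` and the default radius are assumptions for the example and are not tied to any particular WebGL/OpenGL setup.

```python
import math


def regular_polygon_vertices(n: int, radius: float = 0.5,
                             center: tuple[float, float] = (0.0, 0.0)) -> list[tuple[float, float]]:
    """Return n (x, y) vertices evenly spaced on a circle, first vertex pointing up."""
    cx, cy = center
    start = math.pi / 2  # begin at the top of the circle
    return [
        (
            cx + radius * math.cos(start + 2 * math.pi * k / n),
            cy + radius * math.sin(start + 2 * math.pi * k / n),
        )
        for k in range(n)
    ]


triangle = regular_polygon_vertices(3)
```

The same positions could be fed to a vertex buffer or written into an SVG `points` attribute (flipping the y sign for SVG, where y grows downward).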
tmaly 20 hours ago [-]
DeepSeek V4 Flash is impressive for the cost.
XCSme 17 hours ago [-]
I use V4 Flash as my main model; it really is exceptionally capable for the price.
It follows instructions quite well: if you have a plan, it usually executes it to completion and you get code that works.
My only concern with it is that it's mostly served by Chinese companies on OpenRouter.
If mods could update the link please.