The woes of sanitizing SVGs

Posted by varun_ch 22 hours ago

233 points | 95 comments | page 2

codedokode 9 hours ago|

I don't like that SVG uses things like CSS and JS and requires pulling in the whole browser to display. Instead of being a simple vector image format, it became just an extension of HTML. Maybe we need a new format, and if someone decides to do it, please add ability to embed fonts, wrap text and decent animations.

DarkUranium 2 hours ago|

Wrapping text is a bit tricky because of differences in text wrapping algorithms. Though I suppose an "easy" fix would be to be able to specify a very specific algorithm (to ensure equal representation across systems), or allowing custom (possibly better-quality) wrapping.

But for the most part, I 100% agree, and I've been considering making a format for my own use-cases. I think the biggest issue is in agreeing as to what subset is necessary; plus, of course, getting any level of adoption (though the latter isn't a factor for my own use ... except in the sense that there are no tools to help).

For example, do we need animations? Gradients? If so on the latter, what kind?

bawolff 18 hours ago||

These aren't really SVG specific issues. They are all pretty standard XSS that apply to html and are very well known vectors.

For example, this post didn't even mention presentation attributes, such as how the cursor attribute can contain a URL that gets loaded, or any of the other tricky parts of SVG sanitization, like using a DTD to bypass things.
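
To make the cursor point concrete, here is a minimal Python sketch that lists attribute values pointing outside the file. The function name `external_references` and the sample SVG are made up for illustration. It only looks at `href`-style attributes and `url(...)` values, so it is not a complete audit: it does not inspect `<style>` element contents or DTD tricks, and the standard-library parser is not hardened against hostile XML.

```python
import re
import xml.etree.ElementTree as ET

# Matches CSS-style url(...) references inside an attribute value.
URL_FUNC = re.compile(r"url\(\s*['\"]?([^'\")\s]+)", re.IGNORECASE)

def external_references(svg_text):
    """Return (tag, attribute, target) triples that point outside the document."""
    found = []
    for element in ET.fromstring(svg_text).iter():
        tag = element.tag.rsplit("}", 1)[-1]          # drop the "{namespace}" prefix
        for name, value in element.attrib.items():
            attr = name.rsplit("}", 1)[-1]
            targets = URL_FUNC.findall(value)
            if attr == "href":                        # covers href and xlink:href
                targets.append(value.strip())
            for target in targets:
                if not target.startswith("#"):        # "#id" stays inside the file
                    found.append((tag, attr, target))
    return found

# Hypothetical sample: one internal gradient reference, one external cursor image.
sample = """<svg xmlns="http://www.w3.org/2000/svg">
  <rect width="10" height="10" fill="url(#grad)"
        cursor="url(https://tracker.example/c.png), auto"/>
</svg>"""
print(external_references(sample))
```

The internal `url(#grad)` reference is skipped, while the `cursor` URL is reported.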

Liftyee 16 hours ago||

I'm not familiar with the details of real software development, so I don't know why it's not possible to just "not give the SVG part of the code internet access" or "perform sanitization on post-decoding (url, hex, etc) data".

Is it because the SVG parser/renderer being used is an entire library, and it would be prohibitive to write your own SVG parser/renderer or insert your own code into the existing one?
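
On the "sanitize the post-decoding data" idea, a small Python sketch of the difference it makes. The payload and both function names are hypothetical. The point is only that a check on the raw text misses a scheme spelled with a character reference, while a check on parsed attribute values sees the decoded string. This is not a full sanitizer.

```python
import xml.etree.ElementTree as ET

# Hypothetical upload: the "j" of the scheme is written as a character reference.
payload = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<a href="&#106;avascript:alert(1)"><text y="20">click</text></a>'
    '</svg>'
)

def raw_check(svg_text):
    """Naive check on the undecoded text."""
    return "javascript:" in svg_text.lower()

def decoded_check(svg_text):
    """Check attribute values after the XML parser has decoded them."""
    for element in ET.fromstring(svg_text).iter():
        for value in element.attrib.values():
            # URL parsing drops tabs, newlines and leading control characters,
            # so remove them before comparing. This strips slightly more than
            # a browser would, which errs on the strict side.
            compact = "".join(ch for ch in value if ch > " ").lower()
            if compact.startswith("javascript:"):
                return True
    return False

print(raw_check(payload))      # False
print(decoded_check(payload))  # True
```

The catch is that the check is only as good as its agreement with the parser the browser actually uses, which is part of why this is harder than it sounds.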

drfloyd51 15 hours ago|

Some of the suggestions are kind-of exactly that. But they specify not a change to the default behavior, but a new behavior based on the presence of a new attribute.

You could change the default behavior to the “safer” behavior. And then add some sort of “danger mode” attribute. But… devs are usually hesitant to do something that would break legitimate code, such as changing the default behavior would do.

wingi 4 hours ago||

thank you for this post.

kevinmgranger 20 hours ago||

> This was fixed by using a regular expression to remove script tags.

The infamous "you can't parse (X)HTML with regex"¹ meme is from 2009, yet this fix was made in 2019. I guess the SO answer never mentioned SVG.

1: https://stackoverflow.com/revisions/1732454/1
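
A hedged Python sketch of why the regex approach falls over. The regex here is an assumed stand-in, not the pattern Scratch actually used, and `parser_strip` is an illustration of "parse first, then edit the tree", not a complete sanitizer (it ignores `href`, `<foreignObject>`, DTDs and more).

```python
import re
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

def regex_strip(svg_text):
    """Naive approach: delete anything that looks like a script element."""
    return re.sub(r"<script.*?</script>", "", svg_text, flags=re.I | re.S)

def parser_strip(svg_text):
    """Parse first, then drop script elements and on* handler attributes."""
    ET.register_namespace("", SVG_NS)   # serialize without an "ns0:" prefix
    root = ET.fromstring(svg_text)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag.rsplit("}", 1)[-1].lower() == "script":
                parent.remove(child)
        for name in list(parent.attrib):
            if name.lower().startswith("on"):
                del parent.attrib[name]
    return ET.tostring(root, encoding="unicode")

# Two hypothetical inputs: a split-up script tag and an event handler attribute.
nested = f'<svg xmlns="{SVG_NS}"><scr<script></script>ipt>alert(1)</scr<script></script>ipt></svg>'
handler = f'<svg xmlns="{SVG_NS}" onload="alert(1)"><circle r="5"/></svg>'

print(regex_strip(nested))    # the leftover fragments rejoin into a script element
print(regex_strip(handler))   # unchanged: there was no script tag to match
print(parser_strip(handler))  # the onload attribute is gone
```

The first input is not even well-formed XML until the regex "cleans" it, at which point it becomes a valid SVG with a working script element.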

jancsika 20 hours ago||

For the "<script>" stuff: regardless of how the thing is spelled or otherwise obscured, the HTML5 parser eventually knows when it's gotten hold of a script tag. Oops, we got one in a NOSCRIPTTAG context. Let's poop out.

Tag names, attributes, attribute values, event callback default-cancelers... so many ways to declare that this node and its children shouldn't parse/evaluate scripts.

As Jay-Z said: "I've got 99 solutions, fixing a problem ain't one"

etchalon 21 hours ago||

I don't understand why it wasn't immediately understood that SVG is as dangerous as HTML.

It is not, and never was, an image format. It's a markup language.

nulltrace 14 hours ago||

Browsers already treat the same SVG differently depending on how you embed it. <img> strips scripts and external resource loads. <object> and inline don't. People test with img tags, looks fine, then someone switches the embed method and everything opens up.

OneDeuxTriSeiGo 12 hours ago||

It'd be nice if there were a way to declare in the URL that a given SVG may only be treated as an image, so that you could safely open SVG URLs etc. without exposing yourself to the dangers of embed/inline.

xigoi 8 hours ago||

Couldn’t you do that using Content-Security-Policy?

OneDeuxTriSeiGo 7 hours ago||

If you control the domain, then yes, you could. But if I want to put a link on my website to some SVG hosted elsewhere, and I want it to be safe for you to open that link in a new tab, then there's not really a way for CSP to protect you, the user, from the host deploying a malicious SVG.

Like opening a PNG in a new tab is harmless but opening an SVG in a new tab is opening a pretty substantial can of worms.
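
For the "if you control the domain" case, a minimal sketch of response headers a host could attach to user-uploaded SVGs. The function name and the exact policy are assumptions for illustration, not a recommendation from the post. A policy this strict also blocks embedded images and fonts, which may or may not be acceptable. As noted above, it does nothing for someone merely linking to an SVG on a host they don't control.

```python
def svg_response_headers():
    """Headers a host might attach to user-uploaded SVG files (an assumed policy)."""
    return {
        "Content-Type": "image/svg+xml",
        # No scripts and no network fetches; inline styles still render.
        # "sandbox" also applies when the file is opened as a top-level document.
        "Content-Security-Policy": "default-src 'none'; style-src 'unsafe-inline'; sandbox",
        # Stop browsers from guessing a different content type.
        "X-Content-Type-Options": "nosniff",
    }

for name, value in svg_response_headers().items():
    print(f"{name}: {value}")
```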

xigoi 7 hours ago||

If your threat model is “I don’t want the image I’m hotlinking to be replaced with something else when opened in a new tab”, then no image format is safe.

recursive 17 hours ago||

A markup language can be an image format. The "G" is for "Graphics" after all.

NooneAtAll3 19 hours ago||

wait... scratch is just a browser?

inkmuffin 13 hours ago|

Since 2019, Scratch is written to run in a standard web browser, replacing the older Flash runtime/editor. The desktop app uses Electron.

Theodores 19 hours ago||

Maybe we need a dumbed down version 3 of SVG where the browser knows it is not to do anything that requires fetching a URL, to make the image as harmless as a JPG.

This version 3 could have the version number changed to 2 in order to do cool SVG things, so version 2 would be full-fat SVG as it is now. But you could just flip the 2 to a 3 on upload, so any embedded URLs are harmless.

This could be useful for the creator too, as it is helpful to have layers of source images in bitmap format to work with, and you can easily export such things accidentally.

Devasta 19 hours ago|

> In 2019, a few months after the initial release of Scratch 3, Scratch discovered that SVGs can contain <script> tags that Scratch would cause to be executed when the SVG loads. This is known as an XSS.

> Example from Scratch's test suite:

  <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
    "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
  <svg version="1.1" xmlns="http://www.w3.org/2000/svg">
    <circle cx="250" cy="250" r="50" fill="red" />
    <script type="text/javascript"><![CDATA[
        alert('from the svg!')
    ]]></script>
  </svg>

Is this really an issue? This is the method that the Chrome team's polyfill to replace XSLT suggests you use. https://github.com/mfreed7/xslt_polyfill/tree/main#usage

inkmuffin 14 hours ago|

This was the example from their test suite. I didn't want to clone and build a 2019 copy of Scratch to test it end-to-end since the specifics weren't super important anyway.
