Glassworm Is Back: A New Wave of Invisible Unicode Attacks Hits Repositories

Posted by robinhouston 8 hours ago

Glassworm Is Back: A New Wave of Invisible Unicode Attacks Hits Repositories (www.aikido.dev)

146 points | 82 comments

DropDead 6 hours ago

Why hasn't someone made an AV rule to find stuff like this? They're just plain text files.

nine_k 5 hours ago

The rule must be very simple: any occurrence of `eval()` should be a BIG RED FLAG. It should be handled like a live bomb, which it is.

Then, any appearance of unprintable characters should also be flagged. There are rather few legitimate uses of some zero-width characters, like ZWJ in emoji composition. Ideally all such characters should be inserted as \uNNNN escape sequences, and not as literal characters.

Simple lint rules would suffice for that, with zero AI involvement.
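A lint rule along the lines nine_k describes really can be that simple. The sketch below is only an illustration, not any existing linter's rule: the name `lint_source` is hypothetical, the eval check is a plain regex (so it can also fire inside strings and comments), and "unprintable" is assumed to mean whatever Python's `str.isprintable()` rejects, with tabs exempted.

```python
import re

# Heuristic: a bare eval( call. A regex is not a parser, so this can fire
# inside strings or comments; treat every hit as something to review.
EVAL_CALL = re.compile(r"\beval\s*\(")


def lint_source(text: str) -> list[str]:
    """Flag eval() calls and literal unprintable characters, line by line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if EVAL_CALL.search(line):
            findings.append(f"{lineno}: eval() call")
        for col, ch in enumerate(line, start=1):
            # str.isprintable() is False for format chars (ZWSP, ZWJ, BOM,
            # bidi controls), control chars and non-ASCII spaces.
            # Tabs are allowed because source files routinely use them.
            if ch != "\t" and not ch.isprintable():
                findings.append(f"{lineno}:{col}: literal U+{ord(ch):04X}, use an escape")
    return findings
```

One caveat: `str.isprintable()` counts variation selectors (U+FE00 to U+FE0F and U+E0100 to U+E01EF) as printable because they are combining marks, so a real rule would probably want to list those ranges explicitly too.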

WalterBright 4 hours ago

> There are rather few legitimate uses of some zero-width characters, like ZWJ in emoji composition.

Emojis are another abomination that should be removed from Unicode. If you want pictures, use a gif.

_flux 3 hours ago

Arguably, their being in Unicode is an accessibility issue, unless we were to standardize GIF names, and that already sounds a lot like Unicode.

WalterBright 3 hours ago

How is it an accessibility issue? HTML allows things like little gif files. I've done this myself when I wrote text that contained Egyptian hieroglyphs. It works just fine!

_flux 3 hours ago

I mean if you don't have sight.

WalterBright 2 hours ago

Then use words. Or tooltips (HTML supports that). I use tooltips on my web pages to support accessibility for screen readers. Unicode should not be attempting to badly reinvent HTML.

sghitbyabazooka 3 hours ago

( ꏿ ﹏ ꏿ ; )

hamburglar 4 hours ago

I think there’s debate (which I don’t want to participate in) over whether or not invisible characters have their uses in Unicode. But I hope we can all agree that invisible characters have no business in code, and banishing them is reasonable.

trollbridge 5 hours ago

In our repos, we have some basic stuff like ruff that runs, and that includes a hard error on any non-ASCII characters. We mostly did this after some un-fun times when byte order marks somehow ended up in a file and made something fail.

I have considered allowing a short list that excludes emojis, joining characters, and so on: basically just currency symbols, accented letters, and everything else you'd find in CP-1252. But I never got around to it.
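The allowlist idea above could be sketched like this: accept printable ASCII, and accept a non-ASCII character only if the Windows-1252 (CP-1252) codec can encode it and it is not a control, format, or space character. CP-1252 is assumed here to be the code page the comment means; the function names and the exact set of rejected categories are illustrative choices, not ruff settings.

```python
import unicodedata

# ASCII whitespace that source files legitimately contain.
ASCII_WHITESPACE = {"\t", "\n", "\r"}
# Non-ASCII categories rejected even when CP-1252 can encode them:
# Cc (controls), Cf (format chars such as U+00AD soft hyphen),
# Zs (spaces such as U+00A0 no-break space).
REJECTED_CATEGORIES = {"Cc", "Cf", "Zs"}


def is_allowed(ch: str) -> bool:
    """Allow printable ASCII plus visible characters from the CP-1252 repertoire."""
    if ch.isascii():
        return ch.isprintable() or ch in ASCII_WHITESPACE
    if unicodedata.category(ch) in REJECTED_CATEGORIES:
        return False
    try:
        ch.encode("cp1252")  # e.g. "€", "£", "é" succeed; emoji and ZWJ fail
    except UnicodeEncodeError:
        return False
    return True


def disallowed_chars(text: str) -> list[tuple[int, str]]:
    """Return (offset, 'U+XXXX') for every character outside the allowlist."""
    return [(i, f"U+{ord(c):04X}") for i, c in enumerate(text) if not is_allowed(c)]
```

The category check matters because CP-1252 itself contains a couple of near-invisible characters, U+00A0 (no-break space) and U+00AD (soft hyphen), which an encode-only test would let through.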

abound 6 hours ago

Yeah, it would have been nice to end with "and here's a five-line shell script to check if your project is likely affected". But to their credit, they do have an open-source tool [1]; I'm just not willing to install a big blob of JavaScript to look for vulns in my other big blobs of JavaScript.

[1] https://github.com/AikidoSec/safe-chain

nine_k 5 hours ago

Something like this should work, assuming your files use a Unicode encoding (normally UTF-8) that grep interprets correctly, i.e. grep runs in a UTF-8 locale:

  grep -P '[\x{200B}\x{200C}\x{200D}\x{FEFF}]' code.ts

See https://stackoverflow.com/q/78129129/223424
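For the "five-line script" wish, here is a rough Python equivalent of the grep above that walks a whole directory and reports `path:line:col`. It checks exactly the same four code points as the grep pattern; the `scan_tree` name and the `.ts`/`.js` suffix filter are assumptions for illustration. Decoding explicitly as UTF-8 sidesteps grep's dependence on the current locale.

```python
import pathlib

# The same four code points as the grep pattern:
# ZWSP, ZWNJ, ZWJ and BOM / zero-width no-break space.
SUSPECT = {"\u200b", "\u200c", "\u200d", "\ufeff"}


def scan_tree(root: str, suffixes: tuple[str, ...] = (".ts", ".js")) -> list[str]:
    """Report path:line:col for each suspect character in files under root."""
    hits = []
    for path in sorted(pathlib.Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # Decode as UTF-8 regardless of locale; undecodable bytes become
        # U+FFFD instead of aborting the scan. A leading BOM is kept as U+FEFF.
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for col, ch in enumerate(line, start=1):
                if ch in SUSPECT:
                    hits.append(f"{path}:{lineno}:{col}: U+{ord(ch):04X}")
    return hits
```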

charcircuit 1 hour ago

Isn't that what this article is about? Advertising an AV rule in their product that catches this.

codechicago277 3 hours ago

I wonder if this could be used for prompt injection: if you copy and paste the seemingly empty string into an LLM, does it understand it? Maybe the affected Unicode characters aren't tokenized.

NoMoreNicksLeft 2 hours ago

Why can't code editors have a default-on feature where they show any invisible character (other than newlines)? I seem to remember Sublime doing this at least in some cases... the characters were rendered as a lozenge shape with the hex value of the character.

Is there ever a circumstance where the invisible characters are both legitimate and you as a software developer wouldn't want to see them in the source code?
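The Sublime-style rendering described above (a placeholder showing the hex value) is easy to approximate in a review tool or diff viewer. In this sketch "invisible" is assumed to mean Unicode format characters (category Cf, which covers zero-width spaces and joiners, the BOM, and bidi controls) plus the two variation-selector blocks, which are combining marks rather than format characters and would otherwise slip through. The `reveal` name and the `<U+XXXX>` marker are arbitrary choices.

```python
import unicodedata


def is_invisible(ch: str) -> bool:
    """Heuristic: format characters (Cf) plus variation selectors (which are Mn)."""
    cp = ord(ch)
    return (
        unicodedata.category(ch) == "Cf"
        or 0xFE00 <= cp <= 0xFE0F      # Variation Selectors
        or 0xE0100 <= cp <= 0xE01EF    # Variation Selectors Supplement
    )


def reveal(text: str) -> str:
    """Replace each invisible character with a visible <U+XXXX> marker."""
    return "".join(f"<U+{ord(c):04X}>" if is_invisible(c) else c for c in text)
```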

chairmansteve 2 hours ago

eval() used to be evil...

Are people using eval() in production code?

faangguyindia 5 hours ago

Back in the day I was on hacking forums where a lot of script kiddies used to write malicious code.

I'm wondering: now that they have LLMs, are people using them to make new kinds of malicious code that are more sophisticated than before?

Yokohiii 4 hours ago

In this case LLMs were obviously used to dress the code up as more legitimate, adding more human or project-relevant noise. It's social engineering, but you leave the tedious bits to an LLM. The sophisticated part is the obscurity of the whole process, not the code.

max_ 3 hours ago

I don't have to worry about any of this.

My clawbot & other AI agents already have this figured out.