Posted by creationix 9 hours ago

RX – a new random-access JSON alternative (github.com)
81 points | 31 comments
btown 5 hours ago|
This is really interesting. At first glance, I was tempted to say "why not just use sqlite with JSON fields as the transfer format?" But everything about that would be heavier-weight in every possible way - and if I'm reading things right, this handles nested data that might itself be massive. This is really elegant.

My one eyebrow raise is - is there no binary format specification? https://github.com/creationix/rx/blob/main/rx.ts#L1109 is pretty well commented, but you can't call it a JSON alternative without having some kind of equivalent to https://www.json.org/ in all its flowchart glory!

dtech 3 hours ago||
It's not quite clear to me why you'd use this over something more established such as Protobuf, Thrift, FlatBuffers, Cap'n Proto, etc.
maxmcd 2 hours ago|
Those care about quickly sending compact messages over the network, but most of them do not create a sparse in-memory representation that you can read on the fly. Especially in javascript.

This lib keeps the compact representation at runtime and lets you read it without putting all the entities on the heap.

Cool!

Levitating 7 hours ago||
JSON is human-readable, why even compare it with this. Is any serialization format now just a "JSON alternative"?
creationix 5 hours ago||
- This encodes to ASCII text (unless your strings themselves contain unicode). That means you can copy-paste it (good luck doing that with compressed JSON or CBOR or SQLite).

- There is a scale where JSON isn't human readable anymore. I've seen files that are 100+MB of minified JSON all on a single very long line. No human is reading that without using some tooling.
bawolff 5 hours ago|||
That feels a bit like the worst of both worlds: none of the space savings/efficiency of binary, but also no real human readability.

Being able to copy/paste a serialization format is not really a feature I think I would care about.

rendaw 1 hour ago|||
Are there any examples? If it's ASCII I'd expect to see some of the actual data in the readme, not just API.

Or, if I'm reading that correctly, does it only have a text encoding as long as you can guarantee you don't have any unicode?

dietr1ch 6 hours ago||
cat file.whatever | whatever2json | jq ?

(Or to avoid using cat to read, whatever2json file.whatever | jq)

creationix 5 hours ago||
Or in this case, just do `rx file.rx`. It has jq-like queries built in and accepts either rx or json input. And if you prefer jq, you can do `rx file.rx | jq`.
jbverschoor 31 minutes ago||
So this is two things? A BSON-like encoding, plus something similar to random access / tree walking over streamed JSON?

Docs are super unclear.

barishnamazov 7 hours ago||
You shouldn't be using JSON for things that'd have performance implications.
creationix 5 hours ago||
As with most things in engineering, it depends. There are real logistical costs to using binary formats. This format is almost as compact as a binary format while still retaining all the nice qualities of an ASCII-friendly encoding (you can embed it anywhere strings are allowed, including copy-paste workflows).

Think of it as a hybrid between JSON, SQLite, and generic compression. This format really excels for use cases where large read-only build artifacts are queried by worker nodes like an embedded database.

Asmod4n 46 minutes ago||
The cost of using a textual format is that floats become so slow to parse that they're over 14 times slower than parsing a normal integer, even with the fastest SIMD algorithms we have right now.
meehai 41 minutes ago||
And with little data (i.e. <10MB), this matters much less than accessibility and being able to easily understand the data with a simple text editor, or jq in the terminal plus some filters.
xxs 22 minutes ago||
What do you mean by little data? Most communication protocols are not one-off.
squirrellous 4 hours ago|||
I agree in principle. However, JSON tooling has gotten so good that other formats, when not optimized and held correctly, can be worse than JSON. For example, IME stock protocol buffers can be worse than a well-optimized JSON library (as much as it pains me to say this).
tabwidth 3 hours ago||
Yeah the raw parse speed comparison is almost a red herring at this point. The real cost with JSON is when you have a 200MB manifest or build artifact and you need exactly two fields out of it. You're still loading the whole thing into memory, building the full object graph, and GC gets to clean all of it up after. That's the part where something like RX with selective access actually matters. Parse speed benchmarks don't capture that at all.
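To make that concrete, here's a toy sketch (my own illustration, unrelated to RX's actual format or API): pulling one scalar field out of a large JSON string with a regex scan instead of materializing the whole object graph via JSON.parse. It only handles scalar values under a unique key and ignores escaping edge cases, but it shows the "two fields out of a 200MB manifest" access pattern.

```typescript
// Naive selective read: find `"key":` and parse only the value that follows.
// Hypothetical helper for illustration; real selective readers walk the
// structure instead of regex-matching.
function extractScalar(json: string, key: string): string | number | undefined {
  const m = json.match(
    // Match the key, then either a JSON string or a plain number.
    new RegExp(`"${key}"\\s*:\\s*("(?:[^"\\\\]|\\\\.)*"|-?\\d+(?:\\.\\d+)?)`)
  );
  return m ? JSON.parse(m[1]) : undefined; // parse just the matched value
}

// A "large" document where we only care about one small field.
const big = JSON.stringify({ name: "artifact", version: 42, blob: "x".repeat(100_000) });
console.log(extractScalar(big, "version")); // → 42, without building the full graph
```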
xxs 21 minutes ago||
As a parser: keep only indexes into the original file (input); don't copy strings or parse numbers at all (unless the strings fit in the index width, e.g. 32-bit).

That would make parsing faster, and there would be very little in the way of a tree (JSON can't really contain full-blown graphs), but it's rather complicated, and it will require hashing to allow navigation.
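A minimal sketch of that index-only idea (my own toy, not a real parser): tokenize the document into (kind, start, end) offsets into the source string, and only copy/parse a value when it's actually requested. It skips validation, nesting, and the hashing-for-navigation part entirely.

```typescript
// Tokens are just offsets into the original string; nothing is copied here.
type Tok = { kind: "string" | "number" | "punct"; start: number; end: number };

function tokenize(src: string): Tok[] {
  const toks: Tok[] = [];
  let i = 0;
  while (i < src.length) {
    const c = src[i];
    if (c === '"') {
      // Scan to the closing quote, stepping over escape sequences.
      let j = i + 1;
      while (j < src.length && src[j] !== '"') j += src[j] === "\\" ? 2 : 1;
      toks.push({ kind: "string", start: i, end: j + 1 });
      i = j + 1;
    } else if (c === "-" || (c >= "0" && c <= "9")) {
      let j = i + 1;
      while (j < src.length && /[\d.eE+-]/.test(src[j])) j++;
      toks.push({ kind: "number", start: i, end: j });
      i = j;
    } else {
      if ("{}[]:,".includes(c)) toks.push({ kind: "punct", start: i, end: i + 1 });
      i++; // whitespace is skipped entirely
    }
  }
  return toks;
}

// Materialize lazily: only this call allocates a substring and parses it.
const materialize = (src: string, t: Tok) => JSON.parse(src.slice(t.start, t.end));

const doc = '{"n": 3.14, "s": "hi"}';
const nums = tokenize(doc).filter((t) => t.kind === "number");
console.log(materialize(doc, nums[0])); // → 3.14; nothing else was parsed
```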

Spivak 7 hours ago||
Can you imagine if a service as chatty and performance sensitive as Discord used JSON for their entire API surface?
50lo 1 hour ago||
The biggest challenge for formats like this is usually tooling. JSON won largely because every language supports it and every tool understands it.

Even a technically superior format struggles without that ecosystem.

garrettjoecox 7 hours ago||
Very cool stuff!

This did catch my eye, however: https://github.com/creationix/rx?tab=readme-ov-file#proxy-be...

While this is a neat feature, it means this is not in fact a drop-in replacement for JSON.parse, as you will break any code that relies on the result being a mutable object.

creationix 5 hours ago|
True, the particular use case where this really shines is large datasets where typical usage reads only a tiny part. Also, there's no reason you couldn't write an rx parser that creates normal mutable objects. It could even be a hybrid: lazily parsed until you want to mutate it, at which point it does a normal parse to plain objects.
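That hybrid could be sketched roughly like this (my own toy using a Proxy, not RX's actual implementation): reads are served from the encoded text, and the first write triggers a one-time full parse into a plain mutable object. For brevity the lazy read path just calls JSON.parse too; a real version would consult an index into the encoded bytes.

```typescript
// Hypothetical hybrid: lazy/read-only until the first mutation, then materialized.
function hybrid(src: string): Record<string, unknown> {
  let materialized: Record<string, unknown> | null = null;
  return new Proxy({} as Record<string, unknown>, {
    get(_t, prop) {
      if (materialized) return materialized[prop as string];
      // Lazy read path: a real implementation would use offsets into `src`
      // rather than reparsing the whole document on every access.
      return (JSON.parse(src) as Record<string, unknown>)[prop as string];
    },
    set(_t, prop, value) {
      if (!materialized) materialized = JSON.parse(src); // one-time full parse
      materialized![prop as string] = value;
      return true;
    },
  });
}

const obj = hybrid('{"a": 1}');
obj.b = 2; // flips from lazy reads to a normal mutable object
console.log(obj.a, obj.b); // → 1 2
```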
_flux 1 hour ago||
It doesn't seem like the actual serialization format is specified? Other than in the code, that is.

Is it versioned? Or does it need to be?

WatchDog 3 hours ago||
Cool project.

The viewer is cool, though it took me a while to find the link to it. Maybe add a link in the readme next to the screenshot.

transfire 1 hour ago|
I am a little confused. Is this still JSON? Is it "binary" JSON?