
Posted by pistolario 14 hours ago

A Faster Alternative to Jq (micahkepe.com)
338 points | 211 comments
vindin 8 hours ago|
The data viz of the benchmarks is really rough. I think you’d get a lot of leverage out of rebuilding it and using colors and/or shapes to encode additional dimensions. Nobody wants to scan through raw file paths as labels to try to figure out what the hell the results are.
maxloh 12 hours ago||
From their README [0]:

> Jq is a powerful tool, but its imperative filter syntax can be verbose for common path-matching tasks. jsongrep is declarative: you describe the shape of the paths you want, and the engine finds them.

IMO, this isn't a common use case. The comparison here is essentially like Java vs Python. Jq is perfectly fine for quick peeking. If you actually need better performance, there are always faster ways to parse JSON than using a CLI.

[0]: https://github.com/micahkepe/jsongrep

1vuio0pswjnm7 2 hours ago||
One problem I have not seen addressed by jq or alternatives, perhaps this one addresses it, is "JSON-like" data. That is, JSON that is not contained in a JSON file

For example, web pages sometimes contain inline "JSON". But as this is not a proper JSON file, jq-style utilities cannot process it

The solution I have used for years is a simple utility written in C using flex^1 (a "filter") that reformats "JSON" on stdin, regardless of whether the input is a proper JSON file or not, into line-delimited, human-readable output on stdout that is easy to process with common UNIX utilities

The size of the JSON input does not affect the filter's memory usage. Generally, a large JSON file is processed at the same speed with the same resource usage as a small one
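The line-oriented output such a filter produces can be sketched in Python (illustrative only: the commenter's tool is a streaming C/flex program with constant memory use, while this sketch parses the whole document at once; the `flatten` name and the gron-style `path = value` format are my own choices, not the commenter's actual format):

```python
import json

def flatten(value, path="json"):
    """Emit gron-style 'path = value' lines for a parsed JSON value."""
    if isinstance(value, dict):
        for k, v in value.items():
            yield from flatten(v, f"{path}.{k}")
    elif isinstance(value, list):
        for i, v in enumerate(value):
            yield from flatten(v, f"{path}[{i}]")
    else:
        yield f"{path} = {json.dumps(value)}"

doc = json.loads('{"user": {"name": "ada", "tags": ["a", "b"]}}')
for line in flatten(doc):
    print(line)
# json.user.name = "ada"
# json.user.tags[0] = "a"
# json.user.tags[1] = "b"
```

Once the data is one fact per line like this, ordinary grep/awk/cut pipelines work on it, which is the point being made above.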

The author here has provided musl static-pie binaries instead of glibc. HN commenters seeking to discredit musl often claim glibc is faster

Personally I choose musl for control not speed

1. jq also uses flex

onedognight 11 hours ago||
Having the equivalent jq expression in these examples might help to compare expressiveness, and it might help me see whether jq could “just” use a DFA when a (sub)query admits one. grep, ripgrep, etc. change algorithms based on the query, and that makes the speed improvements automatic.
Asmod4n 10 hours ago||
You could just take simdjson, use its ondemand API and then navigate it with .at_path(_with_wildcard) (https://github.com/simdjson/simdjson/blob/master/doc/basics....)

The whole tool would be like a few dozen lines of C++ and would most likely be faster than this.

jrhey 2 hours ago||
Since when was jq considered slow?
hilti 6 hours ago||
I'm glad you adjusted the CSS while I was typing my comment. I needed to switch to dark mode to be able to read highlighted words.

Nice write up. I will try out your tool.

mlmonkey 5 hours ago|
LOL ... came here to gripe about that!

Also "jg" reads very similar to "jq", and initially I thought he was talking about "jq" all along, and I was like: where can I see the "jsongrep" examples? Threw me off for a minute.

ontouchstart 8 hours ago||
Everything that can be written in JavaScript will be written in JavaScript.

Everything that can be rewritten in Rust will be rewritten in Rust.

Voranto 9 hours ago||
Quick question: isn't NFA-to-DFA conversion an O(2^n) algorithm? If a JSON file has a couple hundred values, its equivalent NFA will have a similar number of states, and the DFA will have 2^100 states, so I must be missing something.
functional_dev 8 hours ago|
Theory is one thing, but the CPU cache is the real bottleneck here. Here is a small visual breakdown of how these arrays look in memory and why pointer chasing is so expensive compared to the actual logic: https://vectree.io/c/json-array-memory-indexing

Basically, the double jump to find values on the heap is what slows down these tools most

Voranto 7 hours ago||
I can see that in practice the bottleneck isn't the automata construction; I'm just curious how the construction is approached given such a super-exponential conversion algorithm.
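One common resolution (describing the general technique used by regex engines such as ripgrep's, not necessarily what jsongrep does) is that the automaton is built from the query pattern, not from the document, and DFA states are constructed lazily by subset construction: a DFA state is a set of NFA states, but only the sets actually reached on real input ever materialize, so the 2^n bound is a worst case rather than the norm. A minimal Python sketch, using a made-up path syntax with a `**` descendant wildcard to introduce genuine nondeterminism:

```python
# Lazy subset construction for a glob-like path pattern.
# The NFA has one state per pattern segment; '*' matches any one
# segment, '**' matches zero or more segments. DFA states
# (frozensets of NFA states) are created only when first reached.
# (Sketch of the general technique; not jsongrep's actual code.)

def eps_closure(states, segs):
    """A '**' may match zero segments: include s+1 whenever segs[s] == '**'."""
    out = set(states)
    changed = True
    while changed:
        changed = False
        for s in list(out):
            if s < len(segs) and segs[s] == "**" and s + 1 not in out:
                out.add(s + 1)
                changed = True
    return frozenset(out)

def matches(pattern, path):
    segs = pattern.split(".")
    n = len(segs)
    current = eps_closure({0}, segs)   # DFA state = set of NFA states
    cache = {}                         # (state, segment) -> next state
    for part in path.split("."):
        key = (current, part)
        if key not in cache:           # build the transition lazily
            nxt = set()
            for s in current:
                if s >= n:
                    continue
                if segs[s] == "**":
                    nxt.add(s)         # '**' consumes and keeps matching
                elif segs[s] == "*" or segs[s] == part:
                    nxt.add(s + 1)
            cache[key] = eps_closure(nxt, segs)
        current = cache[key]
        if not current:
            return False               # dead state: no NFA state alive
    return n in current                # accepted the whole pattern

print(matches("users.*.name", "users.0.name"))   # True
print(matches("users.*.name", "users.0.email"))  # False
print(matches("**.name", "a.b.name"))            # True
```

The pattern here is a handful of states, so the automaton stays tiny no matter how large the JSON input is; the document only drives transitions, never state construction.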
mlmonkey 5 hours ago|
Minor suggestion: often I just want to extract one field, whose name I know exactly. I see that `jg` has an option `-F` like this:

$ cat sample.json | jg -F name

I would humbly suggest that a better syntax would be:

$ cat sample.json | jg .name

for a leaf node named "name"; or

$ cat sample.json | jg -F .name.

for any node named "name".
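The "any node named X" case can be sketched in plain Python (a hypothetical helper of my own, not jg's actual `-F` semantics):

```python
import json

def find_key(value, key):
    """Yield every value stored under `key` at any depth."""
    if isinstance(value, dict):
        for k, v in value.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(value, list):
        for v in value:
            yield from find_key(v, key)

doc = json.loads('{"name": "root", "child": {"name": "leaf"}}')
print(list(find_key(doc, "name")))  # ['root', 'leaf']
```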
