A Faster Alternative to Jq

Posted by pistolario 12 hours ago

A Faster Alternative to Jq (micahkepe.com)

338 points | 211 comments

allknowingfrog 4 minutes ago|

I deal with a fair amount of newline-delimited JSON in my day job, where each line in the file is a complete JSON object. I've seen this referred to as "jsonl", and it's not entirely uncommon for logs and other kinds of time-series data dumps. Do any of the popular JSON CLI tools work with this format? I didn't see any mention of it here.

regus 6 hours ago||

Jq's syntax is so arcane I can never remember it and always need to look up how to get a value from simple JSON.

LgWoodenBadger 2 minutes ago||

I completely agree. I much prefer leveraging actual javascript to get what I need instead of spending time trying to fumble my way through jq syntax.

raydev 14 minutes ago|||

Like I did with regex some years earlier, I worked on a project for a few weeks that required constant interactions with jq, and through that I managed to lock in the general shape of queries so that my google hints became much faster.

Of course, this doesn't matter now, I just ask an LLM to make the query for me if it's so complex that I can't do it by hand within seconds.

d35007 4 hours ago|||

That’s interesting! Can you say a little more? I find jq’s syntax and semantics to be simple and intuitive. It’s mostly dots, pipes, and brackets. It’s a lot like writing shell pipelines imo. And I tend to use it in the same way. Lots of one-time use invocations, so I spend more time writing jq filters than I spend reading them.

I suspect my use cases are less complex than yours. Or maybe jq just fits the way I think for some reason.

I dream of a world in which all CLI tools produce and consume JSON and we use jq to glue them together. Sounds like that would be a nightmare for you.

randusername 4 hours ago|||

I'm not GP, I use jq all the time, but each time I use it I feel like I'm still a beginner because I don't get where I want to go on the first several attempts. Great tool, but IMO it is more intuitive to JSON people that want a CLI tool than CLI people that want a JSON tool. In other words, I have my own preconceptions about how piping should work on the whole thing, not iterating, and it always trips me up.

Here's an example of my white whale, converting JSON arrays to TSV.

    cat input.json | jq -S '(first|keys | map({key: ., value: .}) | from_entries), (.[])' | jq -r '[.[]] | @tsv' > out.tsv

nh23423fefe 58 minutes ago|||

    <input.json  jq -S  -r '(first | keys) , (.[]| [.[]]) | @tsv'
    <input.json  # redir
    jq
    -S           # sort
    -r           # raw string out
    '
    (first | keys) # header
    ,              # comma is generator
    (.[] |           # loop input array and bind to .
    [                # construct array
     .[]             # with items being the array of values of the bound object
     ])           
     | @tsv'        # generator binds the above array to . and renders to tsv

randusername 5 minutes ago||

oh my god how could I have been doing this for so long and not realize that you can redirect before your binary.

I knew cat was an anti-pattern, but I always thought it was so unreadable to redirect at the end

figmert 2 hours ago||||

Here's an easier to understand query for what you're trying to do (at least it's easier to understand for me):

    cat input.json | jq -r '(first | keys) as $cols | $cols, (.[] | [.[$cols[]]]) | @tsv'

That whole map and from entries throws it off. It's not a good use for what you're doing. tsv expects a bunch of arrays, whereas you're getting a bunch of objects (with the header also being one) and then converting them to arrays. That is an unnecessary step and makes it a little harder to understand.
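For comparison, the header-then-rows shape described above can be sketched outside jq. The Python below is a minimal illustration, not a drop-in replacement: `objects_to_tsv` and `tsv_escape` are hypothetical helper names, the input is assumed to be a non-empty array of flat objects, and the escaping only approximates what jq's `@tsv` does.

```python
import json


def tsv_escape(value):
    """Render one cell roughly the way jq's @tsv does (assumption: scalar values only)."""
    if value is None:
        return ""  # assumption: null becomes an empty cell
    if isinstance(value, bool):
        return "true" if value else "false"
    text = str(value)
    # Escape the backslash first so later escapes are not doubled.
    return (
        text.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def objects_to_tsv(records):
    """Header row from the first object's sorted keys, then one row per object."""
    cols = sorted(records[0].keys())  # mirrors `first | keys`, which sorts the keys
    lines = ["\t".join(cols)]
    for rec in records:
        # Same column order for every row, like `[.[$cols[]]]`.
        lines.append("\t".join(tsv_escape(rec.get(col)) for col in cols))
    return "\n".join(lines)


sample = json.loads('[{"name": "a", "size": 1}, {"name": "b", "size": 2}]')
print(objects_to_tsv(sample))
```

Building each row from the header's column list is the design point: it keeps the columns aligned even when objects list their keys in different orders.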

chuckadams 4 minutes ago|||

Honestly both of those make me do the confused-dog-head-tilt thing. I'd go for something sexp based, perhaps with infix composition, map, and flatmap operators as sugar.

randusername 1 hour ago|||

Thanks for sharing, this is much better, though I actually think it is the perfect example to explain something that is brain-slippery about jq

look at $cols | $cols

my brain says hmm that's a typo, clearly they meant ; instead of | because nothing is getting piped, we just have two separate statements. Surely the assignment "exhausts the pipeline" and we're only passing null downstream

the pipelining has some implicit contextual stuff going on that I have to arrive at by trial and error each time since it doesn't fit in my worldview while I'm doing other shell stuff

lokar 3 hours ago|||

I find it much harder to remember / use each time than awk

stingraycharles 3 hours ago|||

Sounds similar to how PowerShell works, and it’s not great. Plain text is better.

marginalia_nu 2 hours ago|||

I think the big problem is it's a tool you usually reach for so rarely you never quite get the opportunity to really learn it well, so it always remains in that valley of despair where you know you should use it, but it's never intuitive or easy to use.

It's not unique in that regard. 'sed' is Turing complete[1][2], but few people get farther than learning how to do a basic regex substitution.

[1] https://catonmat.net/proof-that-sed-is-turing-complete

[2] And arguably a Turing tarpit.

ivaniscoding 5 hours ago|||

Shameless plug, but you might like this: https://github.com/IvanIsCoding/celq

jq is the CLI I like the most, but sometimes even I struggled to understand the queries I wrote in the past. celq uses a more familiar language (CEL)

TomNomNom 5 hours ago|||

Cool tool! Really appreciate the shoutout to gron in the readme, thanks! :)

bigfishrunning 4 hours ago||||

I had never heard of CEL, looks useful though, thanks for posting this!

xpe 5 hours ago|||

CEL looks interesting and useful, though it isn't common nor familiar imo (not for me at least). Quoting from https://github.com/google/cel-spec

    # Common Expression Language

    The Common Expression Language (CEL) implements common
    semantics for expression evaluation, enabling different
    applications to more easily interoperate.

    ## Key Applications

    - Security policy: organizations have complex infrastructure
      and need common tooling to reason about the system as a whole
    - Protocols: expressions are a useful data type and require
      interoperability across programming languages and platforms.

ivaniscoding 5 hours ago||

That’s some fair criticism, but the same page tells that the language wanted to have a similar syntax to C and JavaScript.

I think my personal preference for syntax would be Python’s. One day I want to try writing a query tool with https://github.com/pydantic/monty

dhuan_ 35 minutes ago|||

I agree, even trivial tasks require us to go back to jq's manual to learn how to write their language.

this and other reasons is why I built: https://github.com/dhuan/dop

epr 1 hour ago|||

To fix this I recently made myself a tiny tool I called jtree that recursively walks json, spitting out one line per leaf. Each line is the jq selector and leaf value separated by "=".

No more fiddling around trying to figure out the damn selector by trying to track the indentation level across a huge file. Also easy to pipe into fzf, then split on "=", trim, then pass to jq
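A tool of the kind described above can be sketched in a few lines. This is not the commenter's jtree, only a minimal Python illustration under stated assumptions: `leaves` and `selector` are hypothetical names, keys that are not plain identifiers are written in bracket form, and empty objects or arrays are treated as leaves.

```python
import json
import re

# Keys matching this pattern can be written as .key; anything else needs ["key"].
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def selector(steps):
    """Join path steps into a jq-style selector, making sure it starts with a dot."""
    text = "".join(steps)
    return text if text.startswith(".") else "." + text


def leaves(node, steps=()):
    """Yield one 'selector=value' line per leaf, walking the JSON recursively."""
    if isinstance(node, dict) and node:
        for key, value in node.items():
            step = "." + key if IDENT.fullmatch(key) else "[" + json.dumps(key) + "]"
            yield from leaves(value, steps + (step,))
    elif isinstance(node, list) and node:
        for index, value in enumerate(node):
            yield from leaves(value, steps + (f"[{index}]",))
    else:
        # Scalars and empty containers end the walk; print them as JSON.
        yield f"{selector(steps)}={json.dumps(node)}"


doc = json.loads('{"user": {"name": "ann", "tags": ["x", "y"]}, "a b": 1}')
for line in leaves(doc):
    print(line)
```

Each output line can then be split on the first "=" to recover a selector to hand to jq.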

iamjackg 1 hour ago||

You might like https://github.com/tomnomnom/gron

charlesdaniels 2 hours ago|||

If we're plugging jq alternatives, I'll plug my own: https://git.sr.ht/~charles/rq

I was working a lot with Rego (the DSL for Open Policy Agent) and realized it was actually a pretty nice syntax for jq type use cases.

xendo 3 hours ago|||

Highly recommend gron. https://github.com/tomnomnom/gron

eevmanu 2 hours ago||

or https://github.com/adamritter/fastgron

janderland 3 hours ago|||

JMESPath is what I wish jq was. Consistent grammar. Its only issue is that it lacks the ability to convert JSON to other formats like CSV.

voidfunc 4 hours ago|||

I just ask Opus to generate the queries for me these days.

hilti 4 hours ago|||

LOL ... I can absolutely feel your pain. That's exactly why I created for myself a graphical approach. I shared the first version with friends and it turned into "ColumnLens" (ImGUI on Mac) app. Here is a use case from the healthcare industry: https://columnlens.com/industries/medical

NSPG911 5 hours ago|||

I also genuinely hate using jq. It is one of the only things for which I rely heavily on AI.

vips7L 4 hours ago|||

You should try nushell or PowerShell which have built ins to convert json to objects. It makes it so easy.

bigstrat2003 4 hours ago||

Second this. Working with nushell is a joy.

amelius 5 hours ago|||

At that point why don't we ask the AI directly to filter through our data? The AI query language is much more powerful.

latexr 5 hours ago|||

Because the output you get can have hallucinations, which don’t happen with a deterministic tool. Furthermore, by getting the `jq` command you get something which is reusable, fast, offline, local, doesn’t send your data to a third-party, doesn’t waste a bunch of tokens, … Using an LLM to filter the data is worse in every metric.

alwillis 2 hours ago|||

I get that AI isn’t deterministic by definition, but IMHO it’s become the go-to response for a reason to not use AI, regardless of the use case.

I’ve never seen AI “hallucinate” on basic data transformation tasks. If you tell it to convert JSON to YAML, that’s what you’re going to get. Most LLMs are probably using something like jq to do the conversion in the background anyway.

AI experts say AI models don’t hallucinate, they confabulate.

tkclough 1 hour ago||

Just because you haven't seen it hallucinate on these tasks doesn't mean it can't.

When I'm deciding what tool to use, my question is "does this need AI?", not "could AI solve this?" There are plenty of cases where it's hard to write a deterministic script to do something, but if there is a deterministic option, why would you choose something that might give you the wrong answer? It's also more expensive.

The jq script or other script that an LLM generates is way easier to spot check than the output if you ask it to transform the data directly, and you can reuse it.

amelius 5 hours ago|||

You can use a local LLM and you can ask it to use tools so it is faster.

sigseg1v 4 hours ago|||

"so it is faster" than what? A cloud hosted LLM? That's a pretty low bar. It's certainly not faster than jq.

kelvinjps10 4 hours ago||||

There is hardware that is able to run jq but not a local AI model that's powerful enough to make the filtering reliable, e.g. a Raspberry Pi.

imcritic 4 hours ago||||

Because the input might be sensitive.

Because the input might be huge.

Because there is a risk of getting hallucinations in the output.

Isn't this obvious?

aduitsis 3 hours ago||

...and because it's going to burn a million times the energy of what jq would require.

Shorel 4 hours ago|||

You really need to go and learn about the concept of determinism and why for some tasks we need and want deterministic solutions.

It's an important idea in computer science. Go and learn.

amelius 4 hours ago||

You need to learn to adapt to the real world where most things are not deterministic. Go and learn.

Shorel 4 hours ago|||

I already know that. That's why we have deterministic algorithms, to simplify that complexity. You have much to learn, witty answers mean nothing here, particularly empty witty answers, which are no better than jokes. Maybe stand-up comedy is your call in life.

johnisgood 4 hours ago||||

That may be true, but do you not want determinism where possible, especially within this context, i.e. filtering data?

skipants 4 hours ago|||

Is your argument that the world isn't deterministic and so we should also apply nondeterminism to filtering json data?

GaryNumanVevo 4 hours ago|||

yeah I literally just use gemini / claude to one-shot JQ queries now

1a527dd5 10 hours ago||

I appreciate performance as much as the next person; but I see this endless battle to measure things in ns/us/ms as performative.

Sure there are 0.000001% edge cases where that MIGHT be the next big bottleneck.

I see the same thing repeated in various front end tooling too. They all claim to be _much_ faster than their counterpart.

9/10 times, whatever tooling you are using now will be perfectly fine. Example: I use grep a lot in an ad hoc manner; on really large files I switch to rg. But that is only in a handful of cases.

j1elo 9 hours ago||

Whenever you have this kind of impression of some development, here are my 2 cents: just think "I'm not the target audience". And that's fine.

The difference between 2ms and 0.2ms might sound unneeded, or even silly to you. But somebody, somewhere, is doing stream processing of TB-sized JSON objects, and they will care. This news is for them.

alsetmusic 5 hours ago|||

I remember when I was coming up on the command line and I'd browse the forums at unix.com. Someone would ask how to do a thing and CFAJohnson would come in with a far less readable solution that was more performant (probably replacing calls to external tools with Bash internals, but I didn't know enough then to speak intelligently about it now).

People would say, "Why use this when it's harder to read and only saves N ms?" He'd reply that you'd care about those ms when you had to read a database from 500 remote servers (I'm paraphrasing. He probably had a much better example.)

Turns out, he wrote a book that I later purchased. It appears to have been taken over by a different author, but the first release was all him and I bought it immediately when I recognized the name / unix.com handle. Though it was over my head when I first bought it, I later learned enough to love it. I hope he's on HN and knows that someone loved his posts / book.

https://www.amazon.com/Pro-Bash-Programming-Scripting-Expert...

noisy_boy 1 hour ago||

Wow that takes me back. I used to lurk on unix.com when I was starting with bash and perl and would see CFAJohnson's terse one-liners all the time. I enjoyed trying my own approaches to compare performance, conciseness and readability - mainly for learning. Some of the awk stuff was quite illuminating in my understanding of how powerful awk could be. I remember trying different approaches to process large files at first with awk and then with Perl. Then we discovered Oracle's external tables, which turned out to be the clear winner. We have a lot more options now with fantastic performance.

mememememememo 7 hours ago||||

Also, as someone who looks at latency charts too much: what happens is that a request does a lot in series, and any little ms you can knock off adds up. You save 10ms by saving 10 x 1ms. And if you are a proxyish service then you are 10ms in a chain that might be taking 200 or 300ms. It is like saving money: you have to cut lots of small expenses to make an impact. (Unless you move etc., but once you've done that it is the small, numerous things that add up.)

Also, performance improvements on heavily used systems unlock:

Cost savings

Stability

Higher reliability

Higher throughput

Fewer incidents

Lower scaling out requirements.

lock1 7 hours ago||

Wait what? I don't get why performance improvement implies reliability and incident improvement.

For example, doing dangerous things might be faster (no bounds checks, weaker consistency guarantees, etc.), but it clearly tends to be a reliability regression.

spiffyk 6 hours ago|||

First, if a performance optimization is a reliability regression, it was done wrong. A bounds check is removed because something somewhere else is supposed to already guarantee it won't be violated, not just in a vacuum. If the guarantee stands, removing the extra check makes your program faster and there is no reliability regression whatsoever.

And how does performance improve reliability? Well, a more performant service is harder to overwhelm with a flood of requests.

johnisgood 4 hours ago||

"Removing an extra check", so there is a check, so the check is not removed?

spiffyk 3 hours ago||

It does not need to be an explicit check (i.e. a condition checking that your index is not out of bounds). You may structure your code in such a way that it becomes a mathematical impossibility to exceed the bounds. For a dumb trivial example, you have an array of 500 bytes and are accessing it with an 8-bit unsigned index - there's no explicit bounds check, but you can never exceed its bounds, because the index may only be 0-255.

Of course this is a very artificial and almost nonsensical example, but that is how you optimize bounds checks away - you just make it impossible for the bounds to be exceeded through means other than explicitly checking.
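The invariant in that example can be checked mechanically. The sketch below is in Python, which has no fixed-width integers and still performs its own bounds check on every access, so it only illustrates the reasoning a compiler could rely on; the 8-bit index is emulated with a mask, and `TABLE` and `lookup` are hypothetical names.

```python
TABLE = bytes(500)  # a 500-byte array, as in the example above


def lookup(index: int) -> int:
    """Read TABLE through an index that is forced into the range 0..255."""
    narrowed = index & 0xFF  # keep only the low 8 bits, emulating an unsigned 8-bit index
    return TABLE[narrowed]  # 255 < 500, so this position always exists


# Exhaustively confirm the invariant: every possible 8-bit value is a valid position.
assert all(0 <= value < len(TABLE) for value in range(256))

print(lookup(300))  # 300 & 0xFF == 44, and TABLE[44] is 0
```

The point is that the range of the index type, not a runtime comparison, is what rules out an out-of-bounds access.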

cwaffles 6 hours ago|||

Less OOMs, less timeouts, less noisy neighbors problems affecting other apps

tclancy 7 hours ago||||

Which is fine, but the vast majority of the things that get presented aren’t bothering to benchmark against my use (for a whole lotta mes). They come from someone scratching an itch and solving it for a target audience of one and then extrapolating and bolting on some benchmarks. And at the sizes you’re talking about, how many tooling authors have the computing power on hand to test that?

NoSalt 5 hours ago||||

> "somebody, somewhere, is doing stream processing of TB-sized JSON objects"

That's crazy to think about. My JSON files can be measured in bytes. :-D

j1elo 5 hours ago|||

Well obviously that would happen mostly only on the biggest business scales or maybe academic research; one example from Nvidia, which showcases Apache Spark with GPU acceleration to process "tens of terabytes of JSON data":

https://developer.nvidia.com/blog/accelerating-json-processi...

DoctorOW 2 hours ago|||

All files can be measured in bytes. :)

NoSalt 16 minutes ago||

You, sir or ma'am, are a first class smarty pants.

Chris2048 7 hours ago||||

But even in this example, the 2ms vs 0.2 is irrelevant - it's whatever the timings are for TB-size objects.

So why not compare that case directly? We'd also want to see the performance of the assumed overheads i.e. how it scales.

7bit 6 hours ago|||

Who is the target audience? I truly wonder who will process TB-sized data using jq? Either it's in a database already, in which case you're using the database to process the data, or you're putting it in a database.

Either way, I have really big doubts that there will ever be a significant number of people who'd choose jq for that.

simonw 5 hours ago||

There was a thread yesterday where a company rewrote a similar JSON processing library in Go because they were spending $100,000s on serving costs using it to filter vast amounts of data: https://news.ycombinator.com/item?id=47536712

Hendrikto 7 hours ago|||

I get the sentiment, but everybody thinks that, and in aggregate, you get death by a thousand paper cuts.

It’s the same sentiment as “Individuals don’t matter, look at how tiny my contribution is.”. Society is made up of individuals, so everybody has to do their part.

> 9/10 whatever tooling you are using now will be perfectly fine.

It is not though. Software is getting slower faster than hardware is getting quicker. We have computers that are easily 3–4+ orders of magnitude faster than what we had 40 years ago, yet everything has somehow gotten slower.

lemagedurage 9 hours ago|||

True. I feel like the main way a tool could differentiate from jq is having more intuitive syntax and many real world examples to show off the syntax.

roland35 8 hours ago|||

For better or worse, Claude is my intuitive interface to jq. I don't use it frequently, and before I would have to look up the commands every time, and slowly iterate it down to what I needed.

mpalmer 7 hours ago|||

The syntax makes perfect sense when you understand the semantics of the language.

Out of curiosity, have you read the jq manpage? The first 500 words explain more or less the entire language and how it works. Not the syntax or the functions, but what the language itself is/does. The rest follows fairly easily from that.

mattbis 4 hours ago|||

Was about to post exactly this... It is impressive engineering wise, but for data and syntax, ease of use or all the great features, I care about that more. Speed isn't that important to me for a lot of these tools.

If I/you were working with JSON of that size where this was important, I'd say you probably need to stop using JSON and switch to some other binary or structured format... so long as it has some kinda tooling support.

And further if you are doing important stuff in the CLI needing a big chain of commands, you probably should be programming something to do it anyways...

that's even before we get to the whole JSON isn't really a good data format whatsoever... and there are many better ways. The old ways or the new ways. One day I will get to use my XSLT skills again :D

phillipcarter 1 hour ago|||

I agree for some things, but not for tools or "micro-software" like jq that can get called a LOT in an automated process. Every order of magnitude saved for the latter category can be meaningful.

Koschi13 8 hours ago|||

Maybe look at it from another perspective. Better performance == less CPU cycles wasted. Consider how many people use jq daily and think about how much energy could be saved by faster implementations. In times like this where energy is becoming more scarce we should think about things like this.

gpvos 8 hours ago|||

I agree, but in this age of widespread LLM use, that's only marginal.

mpalmer 7 hours ago|||

> Consider how many people use jq daily and think about how much energy could be saved by faster implementations.

Say a number; make a real argument. Don't just wave your hand and say "just imagine how right I could be about this vague notion if we only knew the facts"

montroser 10 hours ago|||

Then this is for the handful of cases for you. When it matters it matters.

raverbashing 7 hours ago|||

Yes

I don't think I remember one case where jq wasn't fast enough

Now what I'd really want is a jq that's more intuitive and easier to understand

latexr 4 hours ago||

> Now what I'd really want is a jq that's more intuitive and easier to understand

Unfortunately I don’t recall the name, but there was something submitted to HN not too long ago (I think it was still 2026) which was like jq but used JavaScript syntax.

mikojan 9 hours ago|||

> I see the same thing repeated in various front end tooling too. They all claim to be _much_ faster than their counterpart.

> 9/10 whatever tooling you are using now will be perfectly fine

Are you working in frontend? On non-trivial webapps? Because this is entirely wrong in my experience. Performance issues are the #1 complaint of everyone on the frontend team. Be that in compiling, testing or (to a lesser extent) the actual app.

g947o 7 hours ago|||

Worked on front end for years. Rarely ever heard people talking about performance issues. I was among the very few people who knew how to use the dev tools to investigate memory leaks or had heard of memlab.

Either the team I worked at was horrible, or you are from Google/Meta/Walmart where either everyone is smart or frontend performance is directly related to $$.

chrisweekly 6 hours ago||

"performance is directly related to $$"

It is. Company size is moot. See https://wpostats.com for starters.

lelandfe 8 hours ago||||

There are some really fast tools out there for compiling FE these days, and that's probably to what they refer. Testing is still a slog.

ffsm8 8 hours ago|||

Uh, I've worked for a few years as a frontend dev, as in literal frontend dev - at that job my responsibility started at consuming and ended at feeding backend APIs, essentially.

From that I completely agree with your statement - however, you're not addressing the point he makes which kinda makes your statement completely unrelated to his point

99.99% of all performance issues in the frontend are caused by devs doing dumb shit at this point

The frameworks' performance benefits are not going to meaningfully impact this issue anymore, hence no matter how performant yours is, that's still going to be their primary complaint across almost all complex rwcs

And the other issue is that we've decided that complex transpiling is the way to go in the frontend (typescript) - without that, all build time issues would magically go away too. But I guess that's another story.

It was a different story back when eg meteorjs was the default, but nowadays they're all fast enough to not be the source of the performance issues

dalvrosa 10 hours ago||

Fair, but agentic tooling can benefit quite a lot from this

Opencode, ClaudeCode, etc, feel slow. Whatever makes them faster is a win :)

httpsterio 9 hours ago|||

The 2ms it takes to run jq versus the 0.2ms to run an alternative is not why your coding agent feels slow.

jmalicki 8 hours ago||

Still, jq is run a whole lot more than it used to be due to coding agents, so every bit helps.

The vast majority of Linux kernel performance improvement patches probably have way less of a real world impact than this.

PunchyHamster 7 hours ago||

> The vast majority of Linux kernel performance improvement patches probably have way less of a real world impact than this.

unlikely given that the number they are multiplying by every improvement is far higher than "times jq is run in some pipeline". Even 0.1% improvement in kernel is probably far far higher impact than this

jmalicki 6 hours ago||

Jq is run a ton by AIs, and that is only increasing.

foobarian 6 hours ago||

I can't take seriously any talk about performance if the tools are going to shell out. It's just not a bottleneck.

jamespo 10 hours ago|||

It's not running jq locally that's causing that

Kovah 11 hours ago||

I wonder so often about many new CLI tools whose primary selling point is their speed over other tools. Yet I personally have not encountered any case where a tool like jq feels incredibly slow, and I would feel the urge to find something else. What do people do all day that existing tools are no longer enough? Or is it that kind of "my new terminal opens 107ms faster now, and I don't notice it, but I simply feel better because I know"?

n_e 11 hours ago||

I process TB-size ndjson files. I want to use jq to do some simple transformations between stages of the processing pipeline (e.g. rename a field), but it is so slow that I write a single-use node or rust script instead.
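A single-use script of the kind mentioned above tends to look like the following. It is a minimal Python sketch rather than the commenter's code: `rename_field` is a hypothetical helper, each input line is assumed to hold one complete JSON object, and a small in-memory sample stands in for the real file.

```python
import io
import json


def rename_field(lines, old, new):
    """Stream newline-delimited JSON, renaming one top-level field per record."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if old in record:
            # Note: the renamed key moves to the end of the object.
            record[new] = record.pop(old)
        yield json.dumps(record, separators=(",", ":"))


# Stand-in for a real file; with real data, pass an open file handle instead.
source = io.StringIO('{"ts": 1, "msg": "start"}\n{"ts": 2, "msg": "stop"}\n')
for out in rename_field(source, "msg", "message"):
    print(out)
```

Because it handles one line at a time, memory use stays flat regardless of file size; whether it is actually faster than jq on a given workload would need to be measured.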

loxias 2 hours ago|||

I would love, _love_ to know more about your data formats, your tools, what the JSON looks like, basically as much as you're willing to share. :)

For about a month now I've been working on a suite of tools for dealing with JSON specifically written for the imagined audience of "for people who like CLIs or TUIs and have to deal with PILES AND PILES of JSON and care deeply about performance".

For me, I've been writing them just because it's an "itch". I like writing high performance/efficient software, and there are a few gaps whose existence bugged me and that I knew I could fill.

I'm having fun and will be happy when I finish, regardless, but it would be so cool if it happened to solve a problem for someone else.

eru 11 hours ago||||

This reminds me of someone who wrote a regex tool that matches by compiling regexes (at runtime of the tool) via LLVM to native code.

You could probably do something similar for a faster jq.

nchmy 11 hours ago||||

This isn't for you then

> The query language is deliberately less expressive than jq's. jsongrep is a search tool, not a transformation tool-- it finds values but doesn't compute new ones. There are no filters, no arithmetic, no string interpolation.

Mind me asking what sorts of TB json files you work with? Seems excessively immense.

rennokki 9 hours ago|||

> Uses jq for TB json files

> Hadoop: bro

> Spark: bro

> hive: bro

> data team: bro

eevmanu 4 hours ago|||

made me remember this article

<https://adamdrake.com/command-line-tools-can-be-235x-faster-...>

  Command-line Tools can be 235x Faster than your Hadoop Cluster (2014)

  Conclusion: Hopefully this has illustrated some points about using and abusing tools like Hadoop for data processing tasks that can better be accomplished on a single machine with simple shell commands and tools.

f311a 7 hours ago||||

JQ is very convenient, even if your files are more than 100GB. I often need to extract one field from huge JSON line files, I just pipe jq to it to get results. It's slower, but implementing proper data processing will take more time.

anonymoushn 7 hours ago||||

are those tools known for their fast json parsers?

messe 11 hours ago||||

Now I'm really curious. What field are you in that ndjson files of that size are common?

I'm sure there are reasons against switching to something more efficient–we've all been there–I'm just surprised.

overfeed 11 hours ago||

> Now I'm really curious. What field are you in that ndjson files of that size are common?

I'm not OP, but structured JSON logs can easily result in humongous ndjson files, even with a modest fleet of servers over a not-very-long period of time.

messe 11 hours ago||

So what's the use case for keeping them in that format rather than something more easily indexed and queryable?

I'd probably just shove it all into Postgres, but even a multi terabyte SQLite database seems more reasonable.

carlmr 11 hours ago|||

Replying here because the other comment is too deeply nested to reply.

Even if it's once off, some people handle a lot of once-offs, that's exactly where you need good CLI tooling to support it.

Sure jq isn't exactly super slow, but I also have avoided it in pipelines where I just need faster throughput.

rg was insanely useful in a project I once got where they had about 5GB of source files, a lot of them auto-generated. And you needed to find stuff in there. People were using Notepad++ and waiting minutes for a query to find something in the haystack. rg returned results in seconds.

messe 10 hours ago||

You make some good points. I've worked in support before, so I shouldn't have discounted how frequent "once-offs" can be.

paavope 11 hours ago|||

The use case could be e.g. exactly processing an old trove of logs into something more easily indexed and queryable, and you might want to use jq as part of that processing pipeline

messe 11 hours ago||

Fair, but for a once-off thing performance isn't usually a major factor.

The comment I was replying to implied this was something more regular.

EDIT: why is this being downvoted? I didn't think I was rude. The person I responded to made a good point, I was just clarifying that it wasn't quite the situation I was asking about.

adastra22 10 hours ago|||

At scale, low performance can very easily mean "longer than the lifetime of the universe to execute." The question isn't how quickly something will get done, but whether it can be done at all.

messe 9 hours ago||

Good point. I said it above, but I'll repeat it here that I shouldn't have discounted how frequent once offs can be. I've worked in support before so I really should've known better

bigDinosaur 10 hours ago|||

Certain people/businesses deal with one-off things every day. Even for something truly one-off, if one tool is too slow it might still be the difference between being able to do it once or not at all.

swiftcoder 10 hours ago|||

Deal with really big log files, mostly.

If you work at a hyperscaler, service log volume borders on the insane, and while there is a whole pile of tooling around logs, often there's no real substitute for pulling a couple of terabytes locally and going to town on them.

sgarland 6 hours ago||

> often there's no real substitute for pulling a couple of terabytes locally and going to town on them.

Fully agree. I already know the locations of the logs on-disk, and ripgrep - or at worst, grep with LC_ALL=C - is much, much faster than any aggregation tool.

If I need to compare different machines, or do complex projections, then sure, external tooling is probably easier. But for the case of “I know roughly when a problem occurred / a text pattern to match,” reading the local file is faster.

bluedino 6 hours ago|||

We parse JSON responses for dashboards, alerting, etc. With thousands of nodes, and depending on the resolution of your monitoring, you could see improvements here.

xlii 9 hours ago|||

It's a simple loop:

- Someone likes tool X

- Figures that they can vibe code an alternative

- They take Rust for performance or FAVORITE_LANG for credentials

- Claude implements a small subset of features

- Benchmark subset

- Claim win, profit on showcase

Note: this particular project doesn't have many visible tells, but there's a pattern of overdocumentation (17% comment-to-code ratio, >1000 words in README, Claude-like comment patterns), so it might be a guided process.

I still think that the project follows the "subset is faster than set" trend.

InfinityByTen 11 hours ago|||

You don't know something is slow until you encounter a use case where the speed becomes noticeable. Then you see the slowness across the board. If you can notice that a command hasn't completed and you are able to fully process a thought about it, it's slow(er than your mind, ergo slow!).

Usually, a perceptive user/technical mind is able to tweak their usage of the tools around their limitations, but if you can find a tool that doesn't have those limitations, it feels far superior.

The only place where ripgrep hasn't seeped into my workflow, for example, is after the pipe, and that's just out of (bad?) habit. So much so that sometimes I'll foolishly do rg "<term>" | grep <second filter>, then proceed to do a metaphorical facepalm in my mind. Let's see if jg can make me go jg <term> | jq <transformation> :)

oefrha 8 hours ago||

Well grep is just better sometimes. Like you want to copy some lines and grep at the end of a pipeline is just easier than rg -N to suppress line numbers. Whatever works, no need to facepalm.

postepowanieadm 6 hours ago|||

Race between ripgrep and ugrep is entertaining.

password4321 10 hours ago|||

Optimization = good

Prioritizing SEO-ing speed over supporting the same features/syntax (especially without an immediately prominent disclosure of these deficiencies) = marketing bullshit

A faster jq except it can't do what jq does... maybe I can use this as a pre-filter when necessary.

skywhopper 6 hours ago|||

Not every use case of jq is a person using it interactively in their terminal, believe it or not.

mikkupikku 6 hours ago|||

If somebody needs performance, they probably shouldn't be calling out to a separate process for json of all things, no?

(Honestly, who even still writes shell scripts? Have a coding agent write the thing in a real scripting language at least, they aren't fazed by the boilerplate of constructing pipelines with Python or whatever. I haven't written a shell script in over a year now.)

sgarland 5 hours ago||

If you’re writing the script to be used by multiple people, or on multiple systems, or for CI runners, or in containers, etc. then there’s no guarantee of having Python (mostly for the container situation, but still), much less of its version. It’s far too easy to accidentally use a feature or syntax that you took for granted, because who would still be using 3.7 today, anyway? I say this from painful recent experience.

Plus, for any script that’s going to be fetching or posting anything over a network, the LLM will almost certainly want to include requests, so now you either have to deal with dependencies, or make it use urllib.

In contrast, there’s an extremely high likelihood of the environment having a POSIX-compatible interpreter, so as long as you don’t use bash-isms (or zsh-isms, etc.), the script will probably work. For network access, the odds of it having curl are also quite high, moreso (especially in containers) than Python.
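For the "make it use urllib" route mentioned above, a minimal sketch of a dependency-free JSON POST might look like the following. The function names and the example URL are made up for illustration, and the code sticks to syntax that should also run on older 3.x interpreters.

```python
import json
import urllib.request


def build_json_post(url, payload):
    """Build a POST request with a JSON body using only the standard library."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(url, payload, timeout=10.0):
    """Send the request and decode a JSON response (hypothetical helper)."""
    request = build_json_post(url, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network; only post_json() does.
request = build_json_post("https://example.invalid/api", {"status": "ok"})
print(request.get_method(), request.full_url)
```

This avoids the requests dependency, but it does not address the "which Python version is installed, if any" problem.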

mikkupikku 3 hours ago||

If you're distributing the script to other people then the benefit of using Python and getting stuff like high quality argument parsing for free is even greater.
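As a rough sketch of what "argument parsing for free" means here, the standard library's argparse gives you validation, defaults, and --help output in a few lines. The tool name and options below are invented for the example.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical log-filtering script."""
    parser = argparse.ArgumentParser(
        prog="logfilter",  # hypothetical tool name
        description="Filter log lines by level.",
    )
    parser.add_argument("paths", nargs="+", help="log files to read")
    parser.add_argument(
        "--level",
        choices=["debug", "info", "warn", "error"],
        default="info",
        help="minimum level to keep (default: info)",
    )
    parser.add_argument(
        "-n", "--limit",
        type=int,
        default=0,
        help="stop after N matches (0 means no limit)",
    )
    return parser


# Parse an explicit argument list here; a real script would call parse_args()
# with no arguments so that it reads sys.argv.
args = build_parser().parse_args(["app.log", "--level", "error"])
print(args.paths, args.level, args.limit)
```

Invalid values for --level or a non-integer --limit are rejected with a usage message, without any hand-written checks.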

7bit 5 hours ago|||

If ms performance is a main concern, you shouldn't use jq. Believe it or not.

Jakob 11 hours ago|||

Speed is a quality in itself. We are so bogged down by slow stuff that we often ignore that and don’t actively search for an alternative.

But every now and then a well-optimised tool/page comes along with instant feedback and is a real pleasure to use.

I think some people are more affected by that than others.

Obligatory https://m.xkcd.com/1205

Imustaskforhelp 10 hours ago||

I am not sure if it was simon or pg who said this, but I remember a quote along the lines that a two-order-of-magnitude change in speed (quantity) is a huge qualitative change in and of itself.

hrmtst93837 7 hours ago|||

For people chewing through 50GB logs or piping JSON through cron jobs all day, a 2x speedup is measurable in wall time and cloud bill, not just terminal-brain nonsense. Most people won't care.

If jq is something you run a few times by hand, a "faster jq" is about as compelling as a faster toaster. A lot of these tools still get traction because speed is an easy pitch, and because some team hit one ugly bottleneck in CI or a data pipeline and decided the old tool was now unacceptable.

hackrmn 11 hours ago||

Having used `jq` and `yq` (which followed from the former, in spirit), I have never had to complain about performance of the _latter_, which is an order of magnitude (or several) _slower_ than the former. So if there's something faster than `jq`, it's laudable that the author of the faster tool accomplished such a goal, but in the broader context I'd say the performance benefit would be required by a niche slice of the userbase. People who analyse JSON-formatted logs, perhaps? Then again, newline-delimited JSON reigns supreme in that particular kind of scenario, making the point of a faster `jq` moot again.

However, as someone who always loved faster software and being an optimisation nerd, hats off!
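To illustrate the newline-delimited JSON point: each line is a complete JSON document, so a script can parse records one at a time instead of loading the whole file. A minimal sketch, assuming log records are JSON objects with a "level" field (the field name and sample data are made up):

```python
import io
import json
from collections import Counter


def count_by_field(lines, field):
    """Count the values of `field` across newline-delimited JSON records."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # Only objects that carry the field are counted; values must be hashable.
        if isinstance(record, dict) and field in record:
            counts[record[field]] += 1
    return counts


# An in-memory sample stands in for a real log file opened with open(path).
sample = io.StringIO(
    '{"level": "info", "msg": "start"}\n'
    '{"level": "error", "msg": "boom"}\n'
    '\n'
    '{"level": "info", "msg": "done"}\n'
)
print(count_by_field(sample, "level"))
```

Because the loop only ever holds one record at a time, memory use stays roughly flat regardless of file size.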

mroche 10 hours ago||

> Having used `jq` and `yq`

If you don't mind me asking, which yq? There's a Go variant and a Python pass-through variant, the latter also including xq and tomlq.

bungle 10 hours ago|||

When integrating with server software, the performance is nice to have, as you can have, say, 100k requests per second coming in that need some jq-like logic. For a CLI tool, like you said, the performance of any of them is OK in most cases.

robmccoll 8 hours ago||

jq is probably faster than storage, the network, compression, or something else in your stack and not your bottleneck.

jeffbee 4 hours ago|||

I use jq to grind through gigabytes of GeoJSON files exported from ArcGIS, as an ETL stage. It takes a long time.

skywhopper 6 hours ago|||

Yeah, turns out not everyone uses these tools the way you do. Weird!

Jenk 7 hours ago||

I switched to Jaq[0] a while back for the 'correctness' sake rather than performance. But Jaq also claims to be more performant than jq.

[0]: https://github.com/01mf02/jaq

password4321 3 hours ago||

Thank you for the recommendation.

It looks like jaq has already progressed much further in the right direction than jsongrep has just started in the not-quite-as-right direction.

jeffbee 1 hour ago||

I keep an eye on jaq, but there are some holes in the story. jaq 3.0 is faster than Linux distro builds of jq, but jq built correctly is faster than jaq. As far as I can tell the performance reputation of jq is caused by bad distro packaging.

Bigpet 12 hours ago||

When initially opening the page it had broken colors in light mode. For anyone else encountering it: switch to dark mode and then back to light mode to fix it.

CodeCompost 10 hours ago||

I suspect the website is vibe-coded, like the tool itself.

jmalicki 8 hours ago|||

I can forgive vibe code... It needs to execute; if it works, it's fine.

Unedited vibe documentation is unforgivable.

merlindru 8 hours ago|||

this is a bad faith take. i think the website is really cool and doesn't reek of slop at all. what makes you think differently?

g947o 7 hours ago|||

I would not be surprised at all if it's vibe coded. I have seen exactly the same thing myself.

I gave instruction to Claude to add a toggle button to a website where the value needs to be stored in local storage.

It is a very straightforward change. Just follow exactly how it is done for a different boolean setting and you are set. An intern can do that on the first day of their job.

Everything is done properly except that on page load, the stored setting is not read.

Which can be easily discovered if the author, with or without AI tools, has a test or manually goes through the entire workflow just once. I discovered the problem myself and fixed it.

Setting all of that aside -- even if this is not AI coded, at the least it shows the site owner doesn't have the basic care for its visitors to go through this important workflow to check if everything works properly.

xeyownt 7 hours ago|||

Same.

And who cares if it's vibe-coded or not. Since when do we care more about the how than about the what? Are people looking at how a tool was coded before using it, as if it would accelerate confidence?

1123581321 7 hours ago||

It’s a heuristic to approach a program a bit warily as the length of the documentation likely outpaces how thoroughly it was designed and tested.

micahkepe 4 hours ago|||

Hey, OP here! Sorry about this, it's just laziness on my part: I never use light mode, so I forget to test it haha. Will push a fix!

shellac 10 hours ago|||

I think this has just been fixed. A bit of dark mode was leaking into light in the css.

majewsky 9 hours ago|||

I still saw the same bug just now (Firefox on macOS).

drob518 6 hours ago|||

It’s still broken for me at this point. White link text on nearly white background. Impossible to read. Safari on my iPad.

micahkepe 3 hours ago|||

Should be fixed now! Let me know :)

jvdvegt 11 hours ago|||

Fine in Firefox on Android. Note that the scales of the charts are all different, which makes them hard to compare.

Also, there are lots of charts without comparison so the numbers mean nothing...

qwe----3 11 hours ago|||

White text with light background, yeah.

keysersoze33 11 hours ago|||

I had the same problem (brave browser)

vladvasiliu 12 hours ago|||

Looks fine to me on Edge/Windows.

xyst 6 hours ago|||

Modern programmers these days just don't give a shit about user experience. Better to just load up in reader mode.

youngtaff 11 hours ago||

Broken on iOS Safari too

ifh-hn 11 hours ago||

I learned a number of data processing CLI tools: jq, mlr, htmlq, xsv, yq, etc., to name a few. Not to the level of completing Advent of Code or anything, but good enough for my day-to-day usage. It was never-ending, with the number of formats I needed to extract data from and the different syntaxes. All that changed when I found nushell though, it's replaced all of these tools for me. One syntax for everything, breath of fresh air!

rlonstein 3 hours ago||

+1. I switched to using Nushell as my daily driver around mid-2023 (0.84.0?) and use it in preference to other interactive tools. I do keep at hand jq, yq, and mlr because I need to exchange stuff with colleagues who don't use Nu.

igorramazanov 9 hours ago|||

Same! Nushell replaced almost all of them

Had to spend some effort to set up completions, and there are some small rough edges around command discoverability, but anyway, much better than the previous oh-my-zsh setup

Ideally, I wish it also had a flag to force users to write type annotations + compiling scripts as static binaries + a TUI library, and then I'd seriously consider it for writing small apps, but I like and appreciate it in the current state already

joknoll 11 hours ago||

Same here, nushell is awesome! It helped me to automate so many more things than I did with any other shell. The syntax is so much more intuitive and coherent, which really helps a lot for someone who always forgot how to write ifs or loops in bash ^^

jiehong 10 hours ago||

First of all, congratulations! Nice tool!

Second, some comments on the presentation: the horizontal violin graphs are nice, but all tools have the same colours, and so it's just hard to even spot where jsongrep is. I'd recommend grouping by tool and colour coding it. Besides, jq itself isn't in the graphs at all (but the title of the post made me think it would be!).

Last, xLarge is a 190MiB file. I was surprised by that. It seems too low for xLarge. I daily check 400MiB json documents, and sometimes GiB ones.

micahkepe 3 hours ago|

Hey, thank you! OP here. Yes, I was struggling to find large enough documents to run the benchmarks on. The benchmark data currently ranges from ~106 B to ~190 MB, which I think covers the majority of quick-task workloads, but I would love to have larger documents. If there are any public ones you can think of, I'd like to know!

jiehong 2 hours ago||

The US government tends to offer big public JSON documents [0], such as crime rates [1], or others.

[0]: https://catalog.data.gov/dataset/?res_format=JSON

[1]: https://catalog.data.gov/dataset/crimes-2001-to-present

throwawaypath 5 hours ago|

After reading the title, I was worried that this wasn't written in Rust!

VHRanger 5 hours ago||

If rust is not in the HN title and fire emojis in the readme, it doesn't come from the Rust region of France.

It's just sparkling memory safe high performance software
