Posted by wofo 2 days ago
I like the power of `jq` and the fact that LLMs are proficient at it, but I find it downright impossible to come up with the right `jq` incantations myself. Has anyone here been in a similar situation? Which tool / language did you end up exposing to your users?
jq is a pipeline operating on a stream of independent JSON values. The filter is reapplied to every element of the stream. Streams != lists; the latter are just a data type. `.` always points at the current element of the stream. Functions like `select` operate on separate items of the stream, while `map` operates on the individual elements of a list. If you want a `map` over all elements of the stream: that's just what jq is, naturally :)
stream of a single element which is a list:
echo '[1,2,3,4]' | jq .
# [1,2,3,4]
unpack the list into a stream of separate elements:
echo '[1,2,3,4]' | jq '.[]'
# 1
# 2
# 3
# 4
echo '[1,2,3,4]' | jq '.[] | .' # same output: piping into `.` is a no-op
only keep elements 2 and 4 from the stream, not from the array (there is no array left after `.[]`):
echo '[1,2,3,4]' | jq '.[] | select(. % 2 == 0)'
# 2
# 4
keep the array:
echo '[1,2,3,4]' | jq 'map(. * 2)'
# [2,4,6,8]
map over individual elements of a stream instead:
echo '[1,2,3,4]' | jq '.[] | . * 2'
# 2
# 4
# 6
# 8
printf '1\n2\n3\n4\n' | jq '. * 2' # same
This is how you can do things like:
printf '{"a":{"b":1}}\n{"a":{"b":2}}\n{"a":{"b":3}}\n' | jq 'select(.a.b % 2 == 0) | .a'
# {"b": 2}
`select` creates a nested "scope" for the current element in its parens, but restores the outer scope when it exits. Hope this helps someone else!
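If Python generators are more familiar than jq, the stream-versus-list distinction described above can be mimicked roughly like this. It's only a sketch of the mental model, not how jq is implemented: the helper names (`parse_stream`, `iterate`, `select`, `apply`, `map_list`) are made up for illustration, and `map_list` is simplified to one output per element, whereas a real jq filter may emit zero or several.

```python
import json
from typing import Any, Callable, Iterable, Iterator


def parse_stream(text: str) -> Iterator[Any]:
    """Yield each whitespace-separated JSON value in `text`, one at a time."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        yield value


def iterate(stream: Iterable[Any]) -> Iterator[Any]:
    """Rough analogue of `.[]`: unpack each list into separate stream items."""
    for item in stream:
        yield from item


def select(stream: Iterable[Any], pred: Callable[[Any], bool]) -> Iterator[Any]:
    """Rough analogue of `select(...)`: keep or drop whole stream items."""
    for item in stream:
        if pred(item):
            yield item


def apply(stream: Iterable[Any], fn: Callable[[Any], Any]) -> Iterator[Any]:
    """Rough analogue of a per-item filter such as `. * 2`."""
    for item in stream:
        yield fn(item)


def map_list(stream: Iterable[Any], fn: Callable[[Any], Any]) -> Iterator[Any]:
    """Rough analogue of `map(...)`: transform inside each list, keep the list."""
    for item in stream:
        yield [fn(x) for x in item]


one_list = "[1,2,3,4]"
print(list(parse_stream(one_list)))                      # [[1, 2, 3, 4]]
print(list(iterate(parse_stream(one_list))))             # [1, 2, 3, 4]
print(list(select(iterate(parse_stream(one_list)),
                  lambda x: x % 2 == 0)))                # [2, 4]
print(list(map_list(parse_stream(one_list),
                    lambda x: x * 2)))                   # [[2, 4, 6, 8]]
print(list(apply(parse_stream("1\n2\n3\n4\n"),
                 lambda x: x * 2)))                      # [2, 4, 6, 8]

objs = '{"a":{"b":1}}\n{"a":{"b":2}}\n{"a":{"b":3}}\n'
kept = select(parse_stream(objs), lambda o: o["a"]["b"] % 2 == 0)
print(list(apply(kept, lambda o: o["a"])))               # [{'b': 2}]
```

The outer `list(...)` calls are only there to print the results; the stream itself is never a list, which is the point.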
(LLMs are already very adept at using `jq` so I would think it was preferable to be able to prompt a system that implements querying inside of source code as "this command uses the same format as `jq`")
Yes there are issues with ideologically motivated moderators, poorly cited articles, etc. But even with its flaws, it's an amazing resource provided to the public for free (as in coffee and maybe as in speech also).
[1] - https://duckdb.org/docs/stable/data/json/overview
[2] - https://www.malloydata.dev/
I've observed that too many users of jq aren't willing to take a few minutes to understand how stream programming works. That investment pays off in spades.
```
$sum($myArrayExtractor($.context))
```
where `$myArrayExtractor` is your custom code.
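For anyone who doesn't read JSONata, this is roughly what that expression computes, written as plain Python. It's just an analogue, not JSONata itself: `my_array_extractor`, the `items`/`amount` field names and the sample document are all hypothetical stand-ins for whatever the custom code actually does.

```python
from typing import Any, Dict, List


def my_array_extractor(context: Dict[str, Any]) -> List[float]:
    # Hypothetical custom code: pull the numeric "amount" out of each item.
    return [item["amount"] for item in context.get("items", [])]


def evaluate(document: Dict[str, Any]) -> float:
    # Python analogue of: $sum($myArrayExtractor($.context))
    # `$.context` reads the `context` field of the input document.
    return sum(my_array_extractor(document["context"]))


# Made-up input, only to show the shape of the call.
doc = {"context": {"items": [{"amount": 2}, {"amount": 3.5}]}}
print(evaluate(doc))  # 5.5
```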
---
Re: "how did it go"
We had a situation where we needed to generate EDI from JSON objects, which routinely required us to make small tweaks to data, combine data, loop over data, etc. JSONata provided a backend framework for data transformations that reduced the scope and complexity of the project drastically.
I think JSONata is an excellent fit for situations where companies need to do data transforms, for example for integrations with 3rd-party sources; all the data is there, it just needs to be mapped. Instead of having potentially buggy code as the integration, you can have a pseudo-declarative JSONata spec that describes the transform for each integration source, and then just keep a single unified "JSONata runner" as the integration handler.
It's nice because we can just put the JSONata expression into a db field, and so you can have arbitrary data transforms for different customers for different data structures coming or going, and they can be set up just by editing the expression via the site, without having to worry about sandboxing it (other than resource exhaustion for recursive loops). It really sped up the iteration process for configuring transforms.
It made my life a lot easier.
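To make the "expression in a db field plus one runner" setup a bit more concrete, here is a minimal sketch of the shape it might take. Everything in it is an assumption for illustration: the `transforms` table, the function names, and especially `toy_evaluate`, which only follows a dotted path and is a stand-in for a real JSONata engine, not an implementation of one. In practice you would pass your JSONata library's evaluate call as the `evaluate` argument.

```python
import sqlite3
from typing import Any, Callable

# Any function taking (expression, payload) and returning the result.
Evaluator = Callable[[str, Any], Any]


def toy_evaluate(expression: str, data: Any) -> Any:
    """Stand-in evaluator: follows a dotted path like `order.total`. NOT JSONata."""
    current = data
    for key in expression.split("."):
        current = current[key]
    return current


def make_store() -> sqlite3.Connection:
    """Create an in-memory table with one expression per customer (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transforms (customer TEXT PRIMARY KEY, expression TEXT NOT NULL)"
    )
    return conn


def set_transform(conn: sqlite3.Connection, customer: str, expression: str) -> None:
    """What "editing the expression via the site" would boil down to."""
    conn.execute(
        "INSERT OR REPLACE INTO transforms (customer, expression) VALUES (?, ?)",
        (customer, expression),
    )


def run_transform(
    conn: sqlite3.Connection,
    customer: str,
    payload: Any,
    evaluate: Evaluator = toy_evaluate,
) -> Any:
    """The single unified runner: look up the customer's expression and apply it."""
    row = conn.execute(
        "SELECT expression FROM transforms WHERE customer = ?", (customer,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no transform configured for {customer!r}")
    return evaluate(row[0], payload)


conn = make_store()
set_transform(conn, "acme", "order.total")
set_transform(conn, "globex", "invoice.amount.gross")
print(run_transform(conn, "acme", {"order": {"total": 42}}))                  # 42
print(run_transform(conn, "globex", {"invoice": {"amount": {"gross": 10}}}))  # 10
```

The runner never changes when a customer's data shape does; only the stored expression is edited. A real version would also want a time or step limit around `evaluate`, given the resource-exhaustion caveat above.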
Just use jq. None of the other ones are as flexible or widespread and you just end up with frustrated users.
Which isn't to say jq is the best or even good, but it's battle-tested and just about every conceivable query problem has been thrown at it by now.
I then switched to JavaScript / TypeScript, which I found much better overall: it's understandable to basically every developer, and LLMs are very good at it. So now in my app, wherever a TypeScript snippet is required, I have a button that asks the LLM for its implementation, and even "weak" models one-shot it correctly 99% of the time.
It's definitely more difficult to set up, though, as it requires a sandbox where you can run the code without fear. In my app I use QuickJS, which works very well for my use case, but might not be performant enough in other contexts.