A Faster Alternative to Jq

(micahkepe.com)

191 points | by pistolario 5 hours ago

32 comments

1a527dd5 3 hours ago
I appreciate performance as much as the next person; but I see this endless battle to measure things in ns/us/ms as performative.
Sure there are 0.000001% edge cases where that MIGHT be the next big bottleneck.
I see the same thing repeated in various front end tooling too. They all claim to be _much_ faster than their counterpart.
9/10 whatever tooling you are using now will be perfectly fine. Example: I use grep a lot in an ad hoc manner; on really large files I switch to rg. But that is only in a handful of cases.
[-]
- j1elo 2 hours ago
  Whenever you have this kind of impression of some development, here are my 2 cents: just think "I'm not the target audience". And that's fine.
  The difference between 2ms and 0.2ms might sound unneeded, or even silly to you. But somebody, somewhere, is doing stream processing of TB-sized JSON objects, and they will care. This news is for them.
  [-]
  - mememememememo 53 minutes ago
    Also as someone who looks at latency charts too much, what happens is a request does a lot in series and any little ms you can knock off adds up. You save 10ms by saving 10 x 1ms. And if you are a proxyish service then you are 10ms in a chain that might be taking 200 or 300ms. It is like saving money, you have to like cut lots of small expenses to make an impact. (unless you move etc. but once you've done that it is small numerous things that add up)
    Also performance improvements on heavily used systems unlock:
    Cost savings
    Stability
    Higher reliability
    Higher throughput
    Fewer incidents
    Lower scaling out requirements.
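The "10 x 1ms" arithmetic above can be sketched as follows. This is only an illustration: the ten stages and the 25 ms per stage are hypothetical numbers chosen so that the chain lands in the 200 to 300 ms range mentioned in the comment, and the model assumes the stages run strictly in series.

```python
def chain_latency_ms(stages_ms):
    """Total latency of stages that run strictly one after another."""
    return sum(stages_ms)


# Hypothetical numbers: ten serial stages of 25 ms each (250 ms end to end).
before = [25.0] * 10

# Shave 1 ms off every stage.
after = [ms - 1.0 for ms in before]

saved_ms = chain_latency_ms(before) - chain_latency_ms(after)
saved_fraction = saved_ms / chain_latency_ms(before)

print(f"saved {saved_ms:.1f} ms, {saved_fraction:.1%} of the chain")
```

No single 1 ms saving is visible on its own here; only the sum across the chain is.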
    [-]
    - lock1 33 minutes ago
      Wait what? I don't get why performance improvement implies reliability and incident improvement.
      For example, doing dangerous things might be faster (no bounds checks, weaker consistency guarantees, etc.), but it clearly tends to be a reliability regression.
  - tclancy 27 minutes ago
    Which is fine, but the vast majority of the things that get presented aren’t bothering to benchmark against my use (for a whole lotta mes). They come from someone scratching an itch and solving it for a target audience of one and then extrapolating and bolting on some benchmarks. And at the sizes you’re talking about, how many tooling authors have the computing power on hand to test that?
  - Chris2048 39 minutes ago
    But even in this example, the 2ms vs 0.2ms is irrelevant - it's whatever the timings are for TB-size objects.
    So why not compare that case directly? We'd also want to see the performance of the assumed overheads, i.e. how it scales.
- raverbashing 0 minutes ago
  Yes
  I don't think I remember one case where jq wasn't fast enough
  Now what I'd really want is a jq that's more intuitive and easier to understand
- lemagedurage 2 hours ago
  True. I feel like the main way a tool could differentiate from jq is having more intuitive syntax and many real world examples to show off the syntax.
  [-]
  - roland35 1 hour ago
    For better or worse, Claude is my intuitive interface to jq. I don't use it frequently, and before I would have to look up the commands every time, and slowly iterate it down to what I needed.
  - mpalmer 25 minutes ago
    The syntax makes perfect sense when you understand the semantics of the language.
    Out of curiosity, have you read the jq manpage? The first 500 words explain more or less the entire language and how it works. Not the syntax or the functions, but what the language itself is/does. The rest follows fairly easily from that.
- Hendrikto 13 minutes ago
  I get the sentiment, but everybody thinks that, and in aggregate, you get death by a thousand paper cuts.
  It’s the same sentiment as “Individuals don’t matter, look at how tiny my contribution is.”. Society is made up of individuals, so everybody has to do their part.
  > 9/10 whatever tooling you are using now will be perfectly fine.
  It is not though. Software is getting slower faster than hardware is getting quicker. We have computers that are easily 3–4+ orders of magnitude faster than what we had 40 years ago, yet everything has somehow gotten slower.
- Koschi13 1 hour ago
  Maybe look at it from another perspective. Better performance == fewer CPU cycles wasted. Consider how many people use jq daily and think about how much energy could be saved by faster implementations. In times like this where energy is becoming more scarce we should think about things like this.
  [-]
  - mpalmer 20 minutes ago
    > Consider how many people use jq daily and think about how much energy could be saved by faster implementations.
    Say a number; make a real argument. Don't just wave your hand and say "just imagine how right I could be about this vague notion if we only knew the facts"
  - gpvos 1 hour ago
    I agree, but in this age of widespread LLM use, that's only marginal.
- montroser 3 hours ago
  Then this is for the handful of cases for you. When it matters it matters.
- mikojan 2 hours ago
  > I see the same thing repeated in various front end tooling too. They all claim to be _much_ faster than their counterpart.
  >
  > 9/10 whatever tooling you are using now will be perfectly fine
  Are you working in frontend? On non-trivial webapps? Because this is entirely wrong in my experience. Performance issues are the #1 complaint of everyone on the frontend team. Be that in compiling, testing or (to a lesser extent) the actual app.
  [-]
  - g947o 22 minutes ago
    Worked on front end for years. Rarely ever hear people talking about performance issues. I was among the very few people who knew how to use the dev tools to investigate memory leak or heard of memlab.
    Either the team I worked at was horrible, or you are from Google/Meta/Walmart where either everyone is smart or frontend performance is directly related to $$.
  - lelandfe 1 hour ago
    There are some really fast tools out there for compiling FE these days, and that's probably to what they refer. Testing is still a slog.
  - ffsm8 1 hour ago
    Uh, I've worked for a few years as a frontend dev, as in literal frontend dev - at that job my responsibility started at consuming and ended at feeding backend APIs, essentially.
    From that I completely agree with your statement - however, you're not addressing the point he makes which kinda makes your statement completely unrelated to his point
    99.99% of all performance issues in the frontend are caused by devs doing dumb shit at this point
    The frameworks' performance benefits are not going to meaningfully impact this issue anymore, hence no matter how performant yours is, that's still going to be their primary complaint across almost all complex rwcs
    And the other issue is that we've decided that complex transpiling is the way to go in the frontend (typescript) - without that, all build time issues would magically go away too. But I guess that's another story.
    It was a different story back when eg meteorjs was the default, but nowadays they're all fast enough to not be the source of the performance issues
- dalvrosa 3 hours ago
  Fair, but agentic tooling can benefit quite a lot from this
  Opencode, ClaudeCode, etc, feel slow. Whatever makes them faster is a win :)
  [-]
  - httpsterio 2 hours ago
    The 2ms it takes to run jq versus the 0.2ms to run an alternative is not why your coding agent feels slow.
    [-]
    - jmalicki 1 hour ago
      Still, jq is run a whole lot more than it used to be due to coding agents, so every bit helps.
      The vast majority of Linux kernel performance improvement patches probably have way less of a real world impact than this.
      [-]
      - PunchyHamster 3 minutes ago
        > The vast majority of Linux kernel performance improvement patches probably have way less of a real world impact than this.
        unlikely given that the number they are multiplying by every improvement is far higher than "times jq is run in some pipeline". Even 0.1% improvement in kernel is probably far far higher impact than this
  - jamespo 3 hours ago
    It's not running jq locally that's causing that
Kovah 4 hours ago
I wonder so often about many new CLI tools whose primary selling point is their speed over other tools. Yet I personally have not encountered any case where a tool like jq feels incredibly slow, and I would feel the urge to find something else. What do people do all day that existing tools are no longer enough? Or is it that kind of "my new terminal opens 107ms faster now, and I don't notice it, but I simply feel better because I know"?
[-]
- n_e 4 hours ago
  I process TB-size ndjson files. I want to use jq to do some simple transformations between stages of the processing pipeline (e.g. rename a field), but it is so slow that I write a single-use node or rust script instead.
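For readers unfamiliar with that kind of single-use script, here is a minimal sketch of the "rename a field" case in Python. It only illustrates the shape of the job (one record per line, constant memory) and makes no claim about speed relative to jq, node, or Rust. The function name `rename_field` and the field names `ts` and `timestamp` are hypothetical.

```python
import io
import json


def rename_field(lines, old, new):
    """Yield each NDJSON record with top-level key `old` renamed to `new`.

    Records are handled one line at a time, so memory use does not grow
    with file size. Blank lines are skipped; non-object records and
    records without `old` pass through unchanged.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if isinstance(record, dict) and old in record:
            # Note: the renamed key moves to the end of the object.
            record[new] = record.pop(old)
        yield json.dumps(record, separators=(",", ":"))


# Small in-memory demo; in a real pipeline `lines` would be sys.stdin
# or an open file object.
demo = io.StringIO('{"ts": 1, "msg": "a"}\n\n{"msg": "b"}\n')
result = list(rename_field(demo, "ts", "timestamp"))
for out in result:
    print(out)
```

In a pipeline this would read from standard input and write to standard output, so it can sit between two stages the same way jq would.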
  [-]
  - nchmy 4 hours ago
    This isn't for you then
    > The query language is deliberately less expressive than jq's. jsongrep is a search tool, not a transformation tool-- it finds values but doesn't compute new ones. There are no filters, no arithmetic, no string interpolation.
    Mind me asking what sorts of TB json files you work with? Seems excessively immense.
    [-]
    - rennokki 2 hours ago
      > Uses jq for TB json files
      > Hadoop: bro
      > Spark: bro
      > hive: bro
      > data team: bro
      [-]
      - f311a 4 minutes ago
        JQ is very convenient, even if your files are more than 100GB. I often need to extract one field from huge JSON Lines files; I just pipe them through jq to get results. It's slower, but implementing proper data processing will take more time.
      - anonymoushn 8 minutes ago
        are those tools known for their fast json parsers?
  - eru 4 hours ago
    This reminds me of someone who wrote a regex tool that matches by compiling regexes (at runtime of the tool) via LLVM to native code.
    You could probably do something similar for a faster jq.
  - messe 4 hours ago
    Now I'm really curious. What field are you in that ndjson files of that size are common?
    I'm sure there are reasons against switching to something more efficient–we've all been there–I'm just surprised.
    [-]
    - overfeed 4 hours ago
      > Now I'm really curious. What field are you in that ndjson files of that size are common?
      I'm not OP, but structured JSON logs can easily result in humongous ndjson files, even with a modest fleet of servers over a not-very-long period of time.
      [-]
      - messe 4 hours ago
        So what's the use case for keeping them in that format rather than something more easily indexed and queryable?
        I'd probably just shove it all into Postgres, but even a multi terabyte SQLite database seems more reasonable.
        [-]
        carlmr 4 hours ago
        Replying here because the other comment is too deeply nested to reply.
        Even if it's once off, some people handle a lot of once-offs, that's exactly where you need good CLI tooling to support it.
        Sure jq isn't exactly super slow, but I also have avoided it in pipelines where I just need faster throughput.
        rg was insanely useful in a project I once got where they had about 5GB of source files, a lot of them auto-generated. And you needed to find stuff in there. People were using Notepad++ and waiting minutes for a query to find something in the haystack. rg returned results in seconds.
        [-]
        messe 3 hours ago
        You make some good points. I've worked in support before, so I shouldn't have discounted how frequent "once-offs" can be.
        paavope 4 hours ago
        The use case could be e.g. exactly processing an old trove of logs into something more easily indexed and queryable, and you might want to use jq as part of that processing pipeline
        [-]
        messe 4 hours ago
        Fair, but for a once-off thing performance isn't usually a major factor.
        The comment I was replying to implied this was something more regular.
        EDIT: why is this being downvoted? I didn't think I was rude. The person I responded to made a good point, I was just clarifying that it wasn't quite the situation I was asking about.
        [-]
        adastra22 3 hours ago
        At scale, low performance can very easily mean "longer than the lifetime of the universe to execute." The question isn't how quickly something will get done, but whether it can be done at all.
        [-]
        messe 2 hours ago
        Good point. I said it above, but I'll repeat it here that I shouldn't have discounted how frequent once offs can be. I've worked in support before so I really should've known better
        bigDinosaur 3 hours ago
        Certain people/businesses deal with one-off things every day. Even for something truly one-off, if one tool is too slow it might still be the difference between being able to do it once or not at all.
- swiftcoder 3 hours ago
  Deal with really big log files, mostly.
  If you work at a hyperscaler, service log volume borders on the insane, and while there is a whole pile of tooling around logs, often there's no real substitute for pulling a couple of terabytes locally and going to town on them.
- xlii 2 hours ago
  It's a simple loop:
  - Someone likes tool X
  - Figures that they can vibe code an alternative
  - They take Rust for performance or FAVORITE_LANG for credentials
  - Claude implements a small subset of features
  - Benchmark subset
  - Claim win, profit on showcase
  Note: this particular project doesn't have many visible tells, but there's a pattern of overdocumentation (17% comment-to-code ratio, >1000 words in README, Claude-like comment patterns), so it might be a guided process.
  I still think that the project follows the "subset is faster than set" trend.
- InfinityByTen 4 hours ago
  You don't know something is slow until you encounter a use case where the speed becomes noticeable. Then you see the slowness across the board. If you can notice that a command hasn't completed and you are able to fully process a thought about it, it's slow(er than your mind, ergo slow!).
  Usually, a perceptive user/technical mind is able to tweak their usage of the tools around their limitations, but if you can find a tool that doesn't have those limitations, it feels far more superior.
  The only place where ripgrep hasn't seeped into my workflow for example, is after the pipe and that's just out of (bad?) habit. So much so, sometimes I'll do this foolishly rg "<term>" | grep <second filter>; then proceed to do a metaphoric facepalm on my mind. Let's see if jg can make me go jg <term> | jq <transformation> :)
  [-]
  - oefrha 1 hour ago
    Well grep is just better sometimes. Like you want to copy some lines and grep at the end of a pipeline is just easier than rg -N to suppress line numbers. Whatever works, no need to facepalm.
- hrmtst93837 31 minutes ago
  For people chewing through 50GB logs or piping JSON through cron jobs all day, a 2x speedup is measurable in wall time and cloud bill, not just terminal-brain nonsense. Most people won't care.
  If jq is something you run a few times by hand, a "faster jq" is about as compelling as a faster toaster. A lot of these tools still get traction because speed is an easy pitch, and because some team hit one ugly bottleneck in CI or a data pipeline and decided the old tool was now unacceptable.
- password4321 3 hours ago
  Optimization = good
  Prioritizing SEO-ing speed over supporting the same features/syntax (especially without an immediately prominent disclosure of these deficiencies) = marketing bullshit
  A faster jq except it can't do what jq does... maybe I can use this as a pre-filter when necessary.
- Jakob 4 hours ago
  Speed is a quality in itself. We are so bogged down by slow stuff that we often ignore that and don’t actively search for another.
  But every now and then a well-optimised tool/page comes along with instant feedback and is a real pleasure to use.
  I think some people are more affected by that than others.
  Obligatory https://m.xkcd.com/1205
  [-]
  - Imustaskforhelp 3 hours ago
    I am not sure if it was simon or pg who might've quoted this, but I remember a quote saying that a two-orders-of-magnitude change in speed (quantity) is a huge qualitative change in and of itself.
hackrmn 4 hours ago
Having used `jq` and `yq` (which followed from the former, in spirit), I have never had to complain about performance of the _latter_, which is an order of magnitude (or several) _slower_ than the former. So if there's something faster than `jq`, it's laudable that the author of the faster tool accomplished such a goal, but in the broader context I'd say the performance benefit would be required by a niche slice of the userbase. People who analyse JSON-formatted logs, perhaps? Then again, newline-delimited JSON reigns supreme in that particular kind of scenario, making the point of a faster `jq` moot again.
However, as someone who always loved faster software and being an optimisation nerd, hats off!
[-]
- bungle 3 hours ago
  Integrating with server software, the performance is nice to have, as you can have say 100 kRPS requests coming in that need some jq-like logic. For a CLI tool, like you said, the performance of any of them is ok, for most of the cases.
  [-]
  - robmccoll 1 hour ago
    jq is probably faster than storage, the network, compression, or something else in your stack and not your bottleneck.
- mroche 3 hours ago
  > Having used `jq` and `yq`
  If you don't mind me asking, which yq? There's a Go variant and a Python pass-through variant, the latter also including xq and tomlq.
Bigpet 5 hours ago
When initially opening the page it had broken colors in light mode. For anyone else encountering it: switch to dark mode and then back to light mode to fix it.
[-]
- CodeCompost 3 hours ago
  I suspect the website is vibe-coded, like the tool itself.
  [-]
  - jmalicki 1 hour ago
    I can forgive vibe code... It needs to execute; if it works, it's fine.
    Unedited vibe documentation is unforgivable.
  - merlindru 1 hour ago
    this is a bad faith take. i think the website is really cool and doesn't reek of slop at all. what makes you think differently?
    [-]
    - g947o 17 minutes ago
      I would not be surprised at all if it's vibe coded. I have seen exactly the same thing myself.
      I gave instruction to Claude to add a toggle button to a website where the value needs to be stored in local storage.
      It is a very straightforward change. Just follow exactly how it is done for a different boolean setting and you are set. An intern can do that on the first day of their job.
      Everything is done properly except that on page load, the stored setting is not read.
      Which can be easily discovered if the author, with or without AI tools, has a test or manually goes through the entire workflow just once. I discovered the problem myself and fixed it.
      Setting all of that aside -- even if this is not AI coded, at the least it shows the site owner doesn't have the basic care for its visitors to go through this important workflow to check if everything works properly.
    - xeyownt 31 minutes ago
      Same.
      And who cares if it's vibe-coded or not. Since when do we care more about the how than the what? Are people looking at how a tool was coded before using it, as if it would accelerate confidence?
- shellac 3 hours ago
  I think this has just been fixed. A bit of dark mode was leaking into light in the css.
  [-]
  - majewsky 2 hours ago
    I still saw the same bug just now (Firefox on macOS).
- jvdvegt 4 hours ago
  Fine in Firefox on Android. Note that the scales of the charts are all different, which makes them hard to compare.
  Also, there are lots of charts without comparison so the numbers mean nothing...
- qwe----3 4 hours ago
  White text with light background, yeah.
- keysersoze33 4 hours ago
  I had the same problem (brave browser)
- vladvasiliu 5 hours ago
  Looks fine to me on Edge/Windows.
- youngtaff 4 hours ago
  Broken on iOS Safari too
Jenk 21 minutes ago
I switched to Jaq[0] a while back for the 'correctness' sake rather than performance. But Jaq also claims to be more performant than jq.
[0]: https://github.com/01mf02/jaq
ontouchstart 16 minutes ago
Everything that can be written in JavaScript will be written in JavaScript.
Everything that can be rewritten in Rust will be rewritten in Rust.
Voranto 27 minutes ago
Quick question: Isn't the NFA-to-DFA construction an O(2^n) algorithm? If a JSON file has a couple hundred values, its equivalent NFA will have a similar number of states, and the DFA will have 2^100 states, so I must be missing something.
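A minimal sketch of the subset construction the question refers to, under the assumption (consistent with the comment elsewhere in this thread that describes query compilation as the exponential step) that the automaton is built from the query and not from the JSON document. The toy NFA below, for "any path ending in the keys `user` then `id`", is a hypothetical illustration and not jsongrep's actual representation. The 2^n figure is a worst-case bound in the number of NFA states of the query; the construction only materializes subsets that are actually reachable.

```python
from collections import deque

OTHER = "*other*"  # hypothetical stand-in for every key the query does not name


def subset_construction(nfa, start, accepting, alphabet):
    """Turn an NFA without epsilon moves into a DFA over frozensets of states.

    `nfa` maps (state, symbol) to an iterable of next states. Only subsets
    reachable from the start state are ever created.
    """
    start_set = frozenset([start])
    dfa = {}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue
        dfa[current] = {}
        for symbol in alphabet:
            target = frozenset(
                nxt for state in current for nxt in nfa.get((state, symbol), ())
            )
            dfa[current][symbol] = target
            if target not in dfa:
                queue.append(target)
    dfa_accepting = {s for s in dfa if s & accepting}
    return dfa, start_set, dfa_accepting


def path_matches(dfa, start_set, dfa_accepting, alphabet, path):
    """Run the DFA over a path of keys; one table lookup per key."""
    state = start_set
    for key in path:
        symbol = key if key in alphabet else OTHER
        state = dfa[state][symbol]
    return state in dfa_accepting


# Toy query NFA: state 0 loops on every key, 0 --user--> 1, 1 --id--> 2.
alphabet = ["user", "id", OTHER]
nfa = {
    (0, "user"): [0, 1],
    (0, "id"): [0],
    (0, OTHER): [0],
    (1, "id"): [2],
}
dfa, start_set, dfa_accepting = subset_construction(nfa, 0, {2}, alphabet)

print(len(dfa))  # reachable DFA states, out of 2**3 possible subsets
print(path_matches(dfa, start_set, dfa_accepting, alphabet, ["data", "user", "id"]))
```

In this sketch the size of the DFA depends on the query alone. The JSON document only determines how many keys are fed through the finished table.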
jiehong 3 hours ago
First of all, congratulations! Nice tool!
Second, some comments on the presentation: the horizontal violin graphs are nice, but all tools have the same colours, and so it's just hard to even spot where jsongrep is. I'd recommend grouping by tool and colour coding it. Besides, jq itself isn't in the graphs at all (but the title of the post made me think it would be!).
Last, xLarge is a 190MiB file. I was surprised by that. It seems too low for xLarge. I daily check 400MiB json documents, and sometimes GiB ones.
onedognight 2 hours ago
Having the equivalent jq expression in these examples might help to compare expressiveness, and it might help me see if jq could “just” use a DFA when a (sub)query admits one. grep, ripgrep, etc change algorithms based on the query and that makes the speed improvements automatic.
ifh-hn 4 hours ago
I learned a number of data processing cli tools: jq, mlr, htmlq, xsv, yq, etc; to name a few. Not to the level of completing advent of code or anything, but good enough for my day to day usage. It was never ending with the amount of formats I needed to extract data from, and the different syntaxes. All that changed when I found nushell though, it's replaced all of these tools for me. One syntax for everything, breath of fresh air!
[-]
- igorramazanov 2 hours ago
  Same! Nushell replaced almost all of them
  Had to spend some effort to set up completions, also there are some small rough edges around command discoverability, but anyway, much better than the previous oh-my-zsh setup
  Ideally, wish it also had a flag to enforce users to write type annotations + compiling scripts as static binaries + a TUI library, and then I'd seriously consider it for writing small apps, but I like and appreciate it in the current state already
- joknoll 4 hours ago
  Same here, nushell is awesome! It helped me to automate so many more things than I did with any other shell. The syntax is so much more intuitive and coherent, which really helps a lot for someone who always forgot how to write ifs or loops in bash ^^
sirfz 41 minutes ago
Nowadays I'd just use clickhouse-local / chdb / duckdb to query json files (and pretty much any standard format files)
Asmod4n 1 hour ago
You could just take simdjson, use its ondemand api and then navigate it with .at_path(_with_wildcard) (https://github.com/simdjson/simdjson/blob/master/doc/basics....)
The whole tool would be like a few dozen lines of c++ and most likely be faster than this.
maxloh 4 hours ago
From their README [0]:
> Jq is a powerful tool, but its imperative filter syntax can be verbose for common path-matching tasks. jsongrep is declarative: you describe the shape of the paths you want, and the engine finds them.
IMO, this isn't a common use case. The comparison here is essentially like Java vs Python. Jq is perfectly fine for quick peeking. If you actually need better performance, there are always faster ways to parse JSON than using a CLI.
[0]: https://github.com/micahkepe/jsongrep
luc4 1 hour ago
Since the query compilation needs exponential time, I wonder how large the queries can be before jsongrep becomes slower than all the other tools. In that regard, I think the library could benefit from some functionality for query compilation at compile-time.
rswail 52 minutes ago
Just about to read, but I had to change to dark mode to be able to see the examples, which are bold white on a white background.
bouk 2 hours ago
I highly recommend anyone to look at jq's VM implementation some time, it's kind of mind-blowing how it works under the hood: https://github.com/jqlang/jq/blob/master/src/execute.c
It does some kind of stack forking which is what allows its funky syntax
[-]
- functional_dev 2 hours ago
  The backtracking implementation in jq is really the secret sauce for how it handles those complex filters without getting bogged down
- vbezhenar 2 hours ago
  Looks like naive implementation of homemade bytecode interpreter. What's so mind blowing about that? Maybe I missed something.
enricozb 2 hours ago
I am excited for some alternative syntax to jq's. I haven't given much thought to how I'd write a new JSON query syntax if I were writing things from scratch, but I personally never found the jq syntax intuitive. Perhaps I haven't given it enough effort to learn properly.
[-]
- s_dev 2 hours ago
  You don't learn it properly. It's not supposed to be intuitive, it's supposed to be concise at the cost of it being intuitive. Would be like somebody saying typing words into Google is more intuitive than writing regex.
  jq is supposed to fit into other bash scripts as a one liner. That's its superpower. I know very few people who write regex on the fly either (unless you were using it everyday) they check the documentation and flesh it out when they need it.
  Just use Claude to generate the jq expression you need and test it.
peterohler 35 minutes ago
Another alternative is oj, https://github.com/ohler55/ojg. I don't know how the performance compares to jq or any others but it does use JSONPath as the query language. It has a few other options for making nicely formatted JSON and colorizing JSON.
stuaxo 2 hours ago
Nice.
Some bits of the site are hard to read: in "takes a query and a JSON input", "query" is in white and the background of the site is very light.
wolfi1 2 hours ago
forgive me my rant, but when I see "just install it with cargo" I immediately lose interest. How many GB do I have to install just to test a little tool? sorry, not gonna do that
keysersoze33 4 hours ago
I was a bit skeptical at first, but after reading more into jsongrep, it's actually very good. Only did a very quick test just now, and after stumbling over slightly different syntax to jq, am actually quite impressed. Give it a try
[-]
- carlmr 4 hours ago
  What were your syntax stumbling blocks? I must be honest I've used jq enough but can never remember the syntax. It's one of the worst things about jq IMO (not the speed, even though I'm a fan of speedups). There's something ungrokkable about that syntax for me.
steelbrain 4 hours ago
Surprised to see that there are no official binaries for arm64 darwin. Meaning macOS users will have to run it through the Rosetta 2 translation layer.
[-]
- QuantumNomad_ 4 hours ago
  I’d install it via cargo anyway and that would build it for arm64.
  If the arm64 version was on homebrew (didn’t check if it is but assume not because it’s not mentioned on the page), I’d install it from there rather than from cargo.
  I don’t really manually install binaries from GitHub, but it’s nice that the author provides binaries for several platforms for people that do like to install it that way.
  [-]
  - maleldil 3 hours ago
    You can use cargo-binstall to retrieve Github binary releases if there are any.
- baszalmstra 4 hours ago
  Really? That is your response? This is a high-quality article from someone who spent a lot of time implementing a cool tool and also sharing the intricate inner workings of it. And your response is, "eh there are no official binaries for my platform". Give them some credit! Be a little more constructive!
  [-]
  - coldtea 3 hours ago
    His response at least fits the discussion and is relevant to the tool, not generic holier-than-thou scolding.
    To address the concern, anyway, I'm sure it would soon be available in brew as an arm binary.
silverwind 2 hours ago
Effort would be better invested in making `jq` itself faster.
furryrain 4 hours ago
If it's easier to use than jq, they should sell the tool on that.
coldtea 3 hours ago
Speed is good! Not a big fan of the syntax though.
quotemstr 4 hours ago
Reminder you can also get DuckDB to slurp the JSON natively and give you a much more expressive query model than anything jq-like.
PUSH_AX 2 hours ago
Is Jq slow?
[-]
- PunchyHamster 3 minutes ago
  no
adastra22 3 hours ago
The fastest alternative to jq is to not use JSON.
[-]
marxisttemp 2 hours ago
Many Useless Uses of cat in this documentation. You never need to do `cat file | foo`, you can just do `<file foo`. cat is for concatenating inputs, you never need it for a single input.