| ▲ | Kovah 7 hours ago |
| I often wonder about new CLI tools whose primary selling point is their speed over existing tools. Yet I personally have not encountered a case where a tool like jq felt so slow that I had the urge to find something else.
What do people do all day that existing tools are no longer enough? Or is it that kind of "my new terminal opens 107ms faster now, and I don't notice it, but I simply feel better because I know"? |
|
| ▲ | n_e 7 hours ago | parent | next [-] |
| I process TB-size ndjson files. I want to use jq to do some simple transformations between stages of the processing pipeline (e.g. rename a field), but it is so slow that I write a single-use node or rust script instead. |
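For the kind of simple transformation mentioned above (renaming a field), such a single-use streaming script can be sketched roughly like this. This is a Python sketch rather than the node or rust scripts the comment mentions, and the field names old_name/new_name are illustrative placeholders:

```python
import json
import sys

def rename_field(line: str, old: str, new: str) -> str:
    """Rename one top-level field in a single ndjson record."""
    record = json.loads(line)
    if old in record:
        record[new] = record.pop(old)
    # separators=(",", ":") keeps output compact, like jq -c
    return json.dumps(record, separators=(",", ":"))

def main() -> None:
    # Stream stdin to stdout one record at a time, so memory use
    # stays flat no matter how large the file is.
    for line in sys.stdin:
        line = line.strip()
        if line:
            print(rename_field(line, "old_name", "new_name"))

if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

Invoked as `python rename.py < in.ndjson > out.ndjson`, it drops straight into a pipeline; whether it actually beats jq will depend on the data and the runtime, which is presumably why the commenter reaches for node or rust.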
| |
| ▲ | eru 7 hours ago | parent | next [-] | | This reminds me of someone who wrote a regex tool that compiles regexes to native code via LLVM at the tool's runtime. You could probably do something similar for a faster jq. | |
| ▲ | nchmy 7 hours ago | parent | prev | next [-] | | This isn't for you then > The query language is deliberately less expressive than jq's. jsongrep is a search tool, not a transformation tool -- it finds values but doesn't compute new ones. There are no filters, no arithmetic, no string interpolation. Mind if I ask what sorts of TB-size JSON files you work with? That seems immense. | | |
| ▲ | rennokki 5 hours ago | parent | next [-] | | > Uses jq for TB json files
> Hadoop: bro
> Spark: bro
> hive: bro
> data team: bro | | |
| ▲ | f311a 2 hours ago | parent | next [-] | | jq is very convenient, even when your files are more than 100GB.
I often need to extract one field from huge JSON-lines files; I just pipe them through jq to get results. It's slower, but implementing proper data processing would take more time. | |
| ▲ | anonymoushn 2 hours ago | parent | prev | next [-] | | are those tools known for their fast json parsers? | |
| ▲ | 3 hours ago | parent | prev [-] | | [deleted] |
| |
| ▲ | szundi 6 hours ago | parent | prev [-] | | [dead] |
| |
| ▲ | messe 7 hours ago | parent | prev | next [-] | | Now I'm really curious. What field are you in that ndjson files of that size are common? I'm sure there are reasons against switching to something more efficient (we've all been there); I'm just surprised. | | |
| ▲ | overfeed 7 hours ago | parent [-] | | > Now I'm really curious. What field are you in that ndjson files of that size are common? I'm not OP, but structured JSON logs can easily result in humongous ndjson files, even with a modest fleet of servers over a not-very-long period of time. | | |
| ▲ | messe 7 hours ago | parent [-] | | So what's the use case for keeping them in that format rather than something more easily indexed and queryable? I'd probably just shove it all into Postgres, but even a multi terabyte SQLite database seems more reasonable. | | |
| ▲ | carlmr 6 hours ago | parent | next [-] | | Replying here because the other comment is too deeply nested to reply to. Even if it's a once-off, some people handle a lot of once-offs; that's exactly where you need good CLI tooling to support it. Sure, jq isn't exactly super slow, but I have also avoided it in pipelines where I just needed faster throughput. rg was insanely useful in a project I once joined where they had about 5GB of source files, many of them auto-generated, and you needed to find stuff in there. People were using Notepad++ and waiting minutes for a query to find something in the haystack. rg returned results in seconds. | |
| ▲ | messe 6 hours ago | parent [-] | | You make some good points. I've worked in support before, so I shouldn't have discounted how frequent "once-offs" can be. |
| |
| ▲ | paavope 7 hours ago | parent | prev [-] | | The use case could be e.g. exactly processing an old trove of logs into something more easily indexed and queryable, and you might want to use jq as part of that processing pipeline | | |
| ▲ | messe 7 hours ago | parent [-] | | Fair, but for a once-off thing performance isn't usually a major factor. The comment I was replying to implied this was something more regular. EDIT: why is this being downvoted? I didn't think I was rude. The person I responded to made a good point, I was just clarifying that it wasn't quite the situation I was asking about. | | |
| ▲ | adastra22 6 hours ago | parent | next [-] | | At scale, low performance can very easily mean "longer than the lifetime of the universe to execute." The question isn't how quickly something will get done, but whether it can be done at all. | | |
| ▲ | messe 5 hours ago | parent [-] | | Good point. I said it above, but I'll repeat it here: I shouldn't have discounted how frequent once-offs can be. I've worked in support before, so I really should've known better. |
| |
| ▲ | bigDinosaur 6 hours ago | parent | prev [-] | | Certain people/businesses deal with one-off things every day. Even for something truly one-off, if one tool is too slow it might still be the difference between being able to do it once or not at all. |
|
|
|
|
| |
| ▲ | 6 hours ago | parent | prev [-] | | [deleted] |
|
|
| ▲ | bluedino 2 hours ago | parent | prev | next [-] |
| We parse JSON responses for dashboards, alerting, etc. across thousands of nodes; depending on the resolution of your monitoring, you could see real improvements here. |
|
| ▲ | swiftcoder 6 hours ago | parent | prev | next [-] |
| Deal with really big log files, mostly. If you work at a hyperscaler, service log volume borders on the insane, and while there is a whole pile of tooling around logs, often there's no real substitute for pulling a couple of terabytes locally and going to town on them. |
| |
| ▲ | sgarland 2 hours ago | parent [-] | | > often there's no real substitute for pulling a couple of terabytes locally and going to town on them. Fully agree. I already know the locations of the logs on disk, and ripgrep (or at worst, grep with LC_ALL=C) is much, much faster than any aggregation tool. If I need to compare different machines, or do complex projections, then sure, external tooling is probably easier. But for the case of “I know roughly when a problem occurred / a text pattern to match,” reading the local file is faster. |
|
|
| ▲ | xlii 5 hours ago | parent | prev | next [-] |
It's a simple loop:
- Someone likes tool X
- Figures they can vibe-code an alternative
- They take Rust for performance, or FAVORITE_LANG for cred
- Claude implements a small subset of the features
- They benchmark the subset
- Claim the win, profit on the showcase
Note: this particular project doesn't have many visible tells, but there's a pattern of overdocumentation (17% comment-to-code ratio, >1000 words in the README, Claude-like comment patterns), so it might be a guided process. I still think the project follows the "subset is faster than set" trend. |
|
| ▲ | InfinityByTen 7 hours ago | parent | prev | next [-] |
You don't know something is slow until you encounter a use case where the speed becomes noticeable. Then you see the slowness across the board. If you can notice that a command hasn't completed and you are able to fully process a thought about it, it's slow(er than your mind, ergo slow!). Usually, a perceptive user/technical mind is able to tweak their usage of the tools around their limitations, but if you can find a tool that doesn't have those limitations, it feels far superior. The only place where ripgrep hasn't seeped into my workflow, for example, is after the pipe, and that's just out of (bad?) habit. So much so that sometimes I'll foolishly run rg "<term>" | grep <second filter>, then proceed to metaphorically facepalm. Let's see if jg can make me go jg <term> | jq <transformation> :) |
| |
| ▲ | oefrha 4 hours ago | parent [-] | | Well, grep is just better sometimes. Say you want to copy some lines: grep at the end of a pipeline is just easier than rg -N to suppress line numbers. Whatever works, no need to facepalm. |
|
|
| ▲ | postepowanieadm 2 hours ago | parent | prev | next [-] |
| The race between ripgrep and ugrep is entertaining. |
|
| ▲ | hrmtst93837 3 hours ago | parent | prev | next [-] |
| For people chewing through 50GB logs or piping JSON through cron jobs all day, a 2x speedup is measurable in wall time and cloud bill, not just terminal-brain nonsense. Most people won't care. If jq is something you run a few times by hand, a "faster jq" is about as compelling as a faster toaster. A lot of these tools still get traction because speed is an easy pitch, and because some team hit one ugly bottleneck in CI or a data pipeline and decided the old tool was now unacceptable. |
|
| ▲ | skywhopper 2 hours ago | parent | prev | next [-] |
| Not every use case of jq is a person using it interactively in their terminal, believe it or not. |
| |
| ▲ | mikkupikku 2 hours ago | parent | next [-] | | If somebody needs performance, they probably shouldn't be calling out to a separate process for JSON of all things, no? (Honestly, who even still writes shell scripts? Have a coding agent write the thing in a real scripting language at least; they aren't fazed by the boilerplate of constructing pipelines with python or whatever. I haven't written a shell script in over a year now.) | | |
| ▲ | sgarland an hour ago | parent [-] | | If you’re writing the script to be used by multiple people, or on multiple systems, or for CI runners, or in containers, etc. then there’s no guarantee of having Python (mostly for the container situation, but still), much less of its version. It’s far too easy to accidentally use a feature or syntax that you took for granted, because who would still be using 3.7 today, anyway? I say this from painful recent experience. Plus, for any script that’s going to be fetching or posting anything over a network, the LLM will almost certainly want to include requests, so now you either have to deal with dependencies or make it use urllib. In contrast, there’s an extremely high likelihood of the environment having a POSIX-compatible interpreter, so as long as you don’t use bash-isms (or zsh-isms, etc.), the script will probably work. For network access, the odds of it having curl are also quite high, more so (especially in containers) than Python. |
| |
| ▲ | 7bit an hour ago | parent | prev [-] | | If millisecond performance is a main concern, you shouldn't use jq. Believe it or not. |
|
|
| ▲ | password4321 6 hours ago | parent | prev | next [-] |
| Optimization = good.
Prioritizing speed-as-SEO over supporting the same features/syntax (especially without an immediately prominent disclosure of the deficiencies) = marketing bullshit.
A faster jq, except it can't do what jq does... maybe I can use this as a pre-filter when necessary. |
|
| ▲ | Jakob 7 hours ago | parent | prev | next [-] |
| Speed is a quality in itself. We are so bogged down by slow stuff that we often ignore that and don’t actively search for alternatives. But every now and then a well-optimised tool/page comes along with instant feedback, and it is a real pleasure to use. I think some people are more affected by that than others. Obligatory https://m.xkcd.com/1205 |
| |
| ▲ | Imustaskforhelp 5 hours ago | parent [-] | | I'm not sure if it was simon or pg who said this, but I remember a quote to the effect that a two-order-of-magnitude change in speed (a quantitative change) is a huge qualitative change in and of itself. |
|
|