boyter 13 hours ago

Such a good read. I actually went back through it the other day to steal the idea of searching for the least common byte to speed up my search tool https://github.com/boyter/cs, which, when coupled with the SIMD upper/lower search technique from fzf, cut the wall clock runtime by a third.
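For anyone who hasn't seen the least common byte trick, here is a rough sketch in Python. The `COMMON` ranking is a made-up assumption for illustration only (real tools ship a tuned 256-entry frequency table and scan with SIMD), and `rank`, `rarest_offset` and `find` are hypothetical names, not the code from cs or ripgrep.

```python
# Assumed frequency order, most common first. This is a crude guess for
# English-like text, not a table taken from any real tool.
COMMON = b" etaoinshrdlucmfwypvbgkqjxz"


def rank(byte: int) -> int:
    """Higher rank = assumed more common. Bytes missing from COMMON rank 0 (rarest)."""
    idx = COMMON.find(bytes([byte]))
    return 0 if idx < 0 else len(COMMON) - idx


def rarest_offset(needle: bytes) -> int:
    """Offset of the needle byte that is assumed to be least common."""
    return min(range(len(needle)), key=lambda i: rank(needle[i]))


def find(haystack: bytes, needle: bytes) -> int:
    """Return the index of the first match of needle in haystack, or -1."""
    if not needle:
        return 0
    off = rarest_offset(needle)
    rare = needle[off:off + 1]
    # Scan only for the rare byte (bytes.find is a fast single-byte scan),
    # then verify the full needle around each candidate position.
    pos = haystack.find(rare, off)
    while pos != -1:
        start = pos - off
        if haystack[start:start + len(needle)] == needle:
            return start
        pos = haystack.find(rare, pos + 1)
    return -1


print(find(b"the quick brown fox", b"quick"))  # prints 4
```

The point is that scanning for a rare byte produces far fewer candidate positions to verify than scanning for the first byte of the needle, which is often a common one.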

There was this post from Cursor https://cursor.com/blog/fast-regex-search today about building an index for agents because they were hitting a limit with ripgrep, but I’m not sure what codebase they are hitting that warrants it. Especially since they would have to be at 100-200 GB to be getting to 15s of runtime. Unless it’s all matches, that is.
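The back-of-envelope behind those numbers, as a tiny sketch. It only restates the figures in this comment (100-200 GB, 15 s); the function name is hypothetical and nothing here is a measured ripgrep benchmark.

```python
def implied_throughput_gb_per_s(corpus_gb: float, seconds: float) -> float:
    """Scan rate a tool would need to sustain to read corpus_gb in the given time."""
    return corpus_gb / seconds


for size_gb in (100, 200):
    rate = implied_throughput_gb_per_s(size_gb, 15)
    print(f"{size_gb} GB in 15 s implies about {rate:.1f} GB/s")
```

So a 15 s search over that much data implies a scan rate of roughly 6.7 to 13.3 GB/s, which is why a much smaller codebase should not take anywhere near that long.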

tmarice 11 hours ago | parent [-]

Yeah, that Cursor blog post is a bit iffy since they just brush over the "ripgrep is slow on large monorepos" claim, move on to the techniques they used, and then completely ignore the fact that you have to build and maintain the index.

On a mid-size codebase, I fzf- and rg-ed through the code almost instantly, while watching my coworker's computer slow down to a crawl when PyCharm started reindexing the project.

cess11 4 hours ago | parent [-]

I'm not into the low-level minutiae, but on large code bases I sometimes see a lag on the first few rg runs and then it's fast, which I attribute to some OS-level caching.

Perhaps they run their software on operating systems or file systems that can't do it, or on hardware with different constraints than the workstation-flavoured laptops I use.

boyter 3 hours ago | parent [-]

The disk cache has a huge impact. However, they claim it’s for multiple searches, so the files should already be in the cache.
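If you want to see the effect yourself, a minimal sketch is to read the same tree twice and compare timings. `read_tree` and `time_passes` are hypothetical helpers; the first pass is only "cold" if the files are not already cached, which this script does not control, so treat the numbers as indicative at best.

```python
import os
import time


def read_tree(root: str) -> int:
    """Read every file under root once and return the total bytes read."""
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                with open(os.path.join(dirpath, name), "rb") as f:
                    while True:
                        chunk = f.read(1 << 20)  # 1 MiB at a time
                        if not chunk:
                            break
                        total += len(chunk)
            except OSError:
                continue  # skip unreadable files and broken links
    return total


def time_passes(root: str, passes: int = 2) -> list:
    """Time repeated full reads of root; later passes may be served from the OS cache."""
    timings = []
    for _ in range(passes):
        start = time.perf_counter()
        read_tree(root)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    for i, seconds in enumerate(time_passes("."), start=1):
        print(f"pass {i}: {seconds:.3f}s")
```

On a tree that fits in RAM, the second pass is usually the faster one, which matches the "first rg is slow, then it's fast" behaviour described above.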