| ▲ | oever 12 hours ago |
| This week I wrote a small bash function that runs ripgrep only on the files that are tracked by git: rgg() {
readarray -d '' -t FILES < <(git ls-files -z)
rg "${@}" "${FILES[@]}"
}
| It speeds up searches a lot in directories with many binary files and committed dot files. To search the dot files, -uu is needed, but that also tells ripgrep to search the binary files. On repositories with hundreds of files, the git ls-files overhead is a bit large. |
|
| ▲ | burntsushi 12 hours ago | parent | next [-] |
| Can you provide a concrete example where that's faster? ripgrep should generally already be approximating `git ls-files` by respecting gitignore. Also, `-uu` tells ripgrep to not respect gitignore and to search hidden files. But ripgrep will still skip binary files. You need `-uuu` to also search binary files. I tried playing with your `rgg` function. The first problem occurred when I tried it on a checkout of the Linux kernel: $ rgg APM_RESUME
bash: /home/andrew/rust/ripgrep/target/release/rg: Argument list too long
OK, so let's just use `xargs`: $ git ls-files -z | time xargs -0 rg APM_RESUME
arch/x86/kernel/apm_32.c
473: { APM_RESUME_DISABLED, "Resume timer disabled" },
include/uapi/linux/apm_bios.h
89:#define APM_RESUME_DISABLED 0x0d
real 0.638
user 0.741
sys 1.441
maxmem 29 MB
faults 0
And compared to just `rg APM_RESUME`: $ time rg APM_RESUME
arch/x86/kernel/apm_32.c
473: { APM_RESUME_DISABLED, "Resume timer disabled" },
include/uapi/linux/apm_bios.h
89:#define APM_RESUME_DISABLED 0x0d
real 0.097
user 0.399
sys 0.588
maxmem 29 MB
faults 0
So do you have an example where `git ls-files -z | xargs -0 rg ...` is faster than just `rg ...`? |
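For reference, the `xargs` pipeline above can be wrapped back into a function. This is only a sketch, not the `rgg` from the original comment: the name `rgg_xargs` is made up here, and `--with-filename` is added on the assumption that `xargs` may hand a single file to one of its `rg` invocations, in which case ripgrep would otherwise leave out the path.

```shell
# Hypothetical xargs-based variant of rgg (the name rgg_xargs is illustrative).
# Assumes git, an xargs that supports -0, and rg are all on PATH.
rgg_xargs() {
    # git ls-files -z prints the tracked paths separated by NUL bytes.
    # xargs -0 splits them across as many rg invocations as needed, so the
    # shell never builds one oversized argument list.
    git ls-files -z | xargs -0 rg --with-filename "$@"
}
```

Usage would be `rgg_xargs APM_RESUME`. As the timings above show, on the Linux kernel checkout this route was slower than plain `rg`, so it only pays off where the tracked set is much smaller than what ripgrep would otherwise walk.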
| |
| ▲ | oever 12 hours ago | parent | next [-] | | A checkout of my repository [0] with many pdf and audio files (20GB) is slow with -u. These data files are normally ignored because 1) they are in .gitignore and 2) they are binary. The repository contains CI files in .woodpecker. These are scripts that I'd normally expect to be searching in. Until a week ago I used -uu to do so, but that made rg take over 4 seconds for a search. Using -. brings the search time down to 24ms. git ls-files -z | time xargs -0 rg -w e23
40ms
rg -w. e23
24ms
rgg -w e23
16ms
rg -wuu e23
2754ms
To reproduce this with the given repository, fill it with 20GB of binary files. The -. flag makes this point moot though. [0] https://codeberg.org/vandenoever/rehorse | | |
| ▲ | burntsushi 12 hours ago | parent [-] | | Oh I see now. I now understand that you thought you couldn't convince ripgrep to search hidden files without also searching files typically ignored by gitignore. Thus, `git ls-files`. Yes, now it makes sense. And yes, `-./--hidden` makes it moot. Thanks for following up! |
| |
| ▲ | EnPissant 4 hours ago | parent | prev [-] | | I don't think this is the same thing as using gitignore. It will only search tracked files. For that it can just use the index. I would expect the index to be faster than looking at the fs for listings. | | |
| ▲ | burntsushi 3 hours ago | parent [-] | | I was extremely careful with my wording. Re-quoting, with added emphasis: > ripgrep should generally already be *approximating* `git ls-files` by respecting gitignore. See also: https://news.ycombinator.com/item?id=45629515 | | |
| ▲ | EnPissant 2 hours ago | parent [-] | | I'm just trying to be helpful, not call you out. I've implemented gitignore-aware file scanning before, and it was slower than git native operations when you only care about tracked files. It's the speed that I was speaking to, not the semantics. |
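The gap between "tracked" and "not ignored" in this exchange can be made concrete with git itself. A minimal sketch, assuming the current directory is a git checkout; the function name is made up for illustration. `git ls-files --others --exclude-standard` lists files that are untracked but not ignored, which a gitignore-respecting walk would typically still visit (hidden-file rules aside) and which `git ls-files` alone would not list.

```shell
# Hypothetical helper: print files that are NOT tracked by git but are also
# NOT excluded by the standard ignore rules (.gitignore, .git/info/exclude,
# and the user's global excludes file).
untracked_not_ignored() {
    git ls-files --others --exclude-standard
}
```

If this prints nothing, the tracked set and the not-ignored set coincide for that checkout, hidden files aside.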
|
| ▲ | oever 12 hours ago | parent | prev | next [-] |
| After writing this comment, I read the man page again and found the -. flag which can be used instead of -uu. Searching in hidden files tracked by git would be great but the overhead of querying git to list all tracked files is probably significant even in Rust. |
|
| ▲ | woodruffw 12 hours ago | parent | prev | next [-] |
| Maybe I’m missing something, but doesn’t ripgrep ignore untracked files in git by default already? |
| |
| ▲ | oever 12 hours ago | parent [-] | | The point is to search hidden files that are tracked by git. Examples are CI scripts, which are stored in places like .woodpecker, .forgejo, and .gitlab-ci.yml. | | |
| ▲ | burntsushi 11 hours ago | parent [-] | | One thing you might consider to make this more streamlined for you is this: $ printf '!.woodpecker\n!.forgejo\n!.gitlab-ci.yml\n' > .rgignore
Or whatever you need to whitelist specific hidden directories/files. For example, ripgrep has `!/.github/` in its `.ignore` file at the root of the repository[1]. By adding the `!`, these files get whitelisted even though they are hidden. Then `rg` with no extra arguments will search them automatically while still ignoring other hidden files/directories. [1]: https://github.com/BurntSushi/ripgrep/blob/38d630261aded3a8e... | | |
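The same whitelisting can be scripted. A small sketch, not part of ripgrep: `rg_allow_hidden` is a hypothetical helper that appends one `!`-prefixed line per argument to `.rgignore` in the current directory and skips lines that are already there.

```shell
# Hypothetical helper (not part of ripgrep): whitelist hidden paths by
# appending negated ("!") patterns to .rgignore in the current directory.
rg_allow_hidden() {
    for p in "$@"; do
        # Skip patterns that are already listed, so reruns do not add duplicates.
        # A missing .rgignore makes grep fail, which also leads to the append.
        if ! grep -qxF -e "!$p" .rgignore 2>/dev/null; then
            printf '!%s\n' "$p" >> .rgignore
        fi
    done
}
```

After something like `rg_allow_hidden .woodpecker .forgejo`, a plain `rg PATTERN` should pick those directories up, as described above.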
|
| ▲ | kibwen 11 hours ago | parent | prev [-] |
| Is this faster than `git grep`? |
| |
| ▲ | oever 10 hours ago | parent [-] | | No, amazingly (to me) on the repo in question, `git grep` is twice as fast as `rg -w.` or the custom `rgg` function. All take less than 100ms, so they are fast enough. |
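For comparison, the `git grep` form of the search in this thread needs no wrapper around `git ls-files`, because `git grep` searches tracked files in the working tree by default, hidden ones included. A minimal sketch; the function name `ggw` is made up for illustration.

```shell
# Hypothetical shorthand: whole-word search over tracked files only.
# -n prints line numbers and -w restricts matches to whole words.
ggw() {
    git grep -n -w "$@"
}
```

Usage would be `ggw e23`, which covers tracked files under hidden directories such as .woodpecker without any extra flags.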
|