Show HN: Gitignore aware disk usage – both CLI and browser visualization (peoplesgrocers.com)
1 points by marxism 14 hours ago

Code [AGPL-3.0]: https://github.com/peoplesgrocers/disk-usage-cli

Project page with more details: https://peoplesgrocers.com/en/projects/gitignore-aware-disk-...

Screenshots:

- TreeMap: https://peoplesgrocers.com/en/projects/gitignore-aware-disk-...

- Starburst: https://peoplesgrocers.com/en/projects/gitignore-aware-disk-...

- Flamegraph: https://peoplesgrocers.com/en/projects/gitignore-aware-disk-...

- Terminal UI: https://peoplesgrocers.com/en/projects/gitignore-aware-disk-...

I built a simple CLI tool that combines du with .gitignore awareness to help understand what's actually taking up space in your projects. In addition to normal terminal output, it can also show you three views of your disk usage in the browser: a treemap, a starburst diagram, and a flamegraph (all borrowed from esbuild's bundle visualizer).

What makes this different? It lets you toggle between showing only tracked files, only ignored files, or everything - which turned out to be surprisingly useful!
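
A minimal sketch of one way such a toggle could work, assuming each directory keeps two running totals. This is not taken from the repository: the names `Mode`, `Sizes`, `add_file`, and `visible` are hypothetical, and "tracked" here just means "not matched by an ignore rule".

```rust
/// Which slice of the data a view should display (hypothetical names).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Mode {
    Tracked,
    Ignored,
    Everything,
}

/// Per-directory byte totals, split by ignore status.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Sizes {
    tracked: u64,
    ignored: u64,
}

impl Sizes {
    /// Add one file's size to the bucket that matches its ignore status.
    fn add_file(&mut self, bytes: u64, is_ignored: bool) {
        if is_ignored {
            self.ignored += bytes;
        } else {
            self.tracked += bytes;
        }
    }

    /// Size to display for this node under the chosen view.
    fn visible(&self, mode: Mode) -> u64 {
        match mode {
            Mode::Tracked => self.tracked,
            Mode::Ignored => self.ignored,
            Mode::Everything => self.tracked + self.ignored,
        }
    }
}
```

With this shape, switching views only changes which total is read from each node; nothing on disk has to be rescanned.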

The whole `~/src` directory on my laptop was taking up 300GB. Too much! About 500 repos, most git but some svn and mercurial.

    - ~300 projects from GitHub where I was patching dependencies or contributing upstream
    - ~200 of my own

I got it down to 75GB by running `du | sort`. I kept thinking "there must be a better way to see this data." I wanted to visualize the ignored vs non-ignored files, but existing disk visualization tools [1] (GrandPerspective, spaceman, Treeize, Disk Map) were standalone GUI apps that I couldn't easily modify. Then I remembered how nice the esbuild bundle visualizer looked - so I borrowed and hacked that visualization code to handle my ignored/non-ignored data structure, and embedded that web app right in my CLI tool.

Of that remaining 75GB:

    - Only about 15% (10GB) was actual source code - everything else was build artifacts and caches.
    - Found 2.8GB of `.angular` cache directories from old GitHub experiments
    - Discovered ~20GB of nested Rust target directories left over from migrating personal projects to a single workspace.
    - Spotted some accidental 50MB binary files in .git/objects that I could clean from history

The tool is written in Rust with TypeScript for the browser visualizations. While researching for this post, I discovered there are already great disk usage tools like dirstat-rs, duf, dua, dust, pdu, ncdu (which are faster and handle larger directories better). What I built is a proof-of-concept to explore whether gitignore-aware visualization is useful. I liked it, but that's just me.

Some technical details people might want to know:

    - It respects .git/info/exclude and global .gitignore files
    - Takes about 15 seconds for the web app to render a treemap of 1.7M files (600MB of JSON)
      - Parse JSON: 3195.00ms
      - Generate treemap viz: 12017.00ms
      - Toggle ignored/non-ignored color mode: 1817ms
    - You'll need to build from source (I'm not working on making it installable)
    - Memory usage scales with directory size - probably won't handle multi-TB codebases at all. It has a --depth flag to control memory usage, but it's still going to visit all of the files - no matter how deep - to give you total size information.
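
The depth behavior described above can be sketched roughly as follows, using only the Rust standard library. This is an illustration under assumptions, not the tool's implementation: `walk`, `Entry`, and `max_depth` are hypothetical names, ignore rules are left out, and sizes are apparent file lengths rather than allocated disk blocks.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// One retained row of the report (hypothetical shape).
#[derive(Debug)]
struct Entry {
    path: PathBuf,
    bytes: u64,
}

/// Visit everything under `dir` and return its total size in bytes.
/// Only entries within `max_depth` levels of the starting directory are
/// pushed to `out`; deeper entries are summed into their ancestors and dropped.
fn walk(dir: &Path, depth: usize, max_depth: usize, out: &mut Vec<Entry>) -> io::Result<u64> {
    let mut total = 0u64;
    for item in fs::read_dir(dir)? {
        let item = item?;
        // DirEntry::metadata does not follow symlinks, so link cycles are not entered.
        let meta = item.metadata()?;
        let path = item.path();
        let bytes = if meta.is_dir() {
            // Always recurse, even past max_depth, so totals stay complete.
            walk(&path, depth + 1, max_depth, out)?
        } else {
            meta.len()
        };
        if depth < max_depth {
            out.push(Entry { path, bytes });
        }
        total += bytes;
    }
    Ok(total)
}
```

Calling `walk(root, 0, 1, &mut out)` keeps one entry per direct child of `root` while still returning the full total. Every file is visited regardless of the depth limit, so a smaller depth reduces what is held in memory, not how long the scan takes.
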

I'd love to hear what surprising things you find in your projects! Has anyone else built tools for visualizing ignored vs tracked files? Would it be useful to add more categories? Datasets come to mind. I'm also curious if this would be useful as a feature in existing tools like dust or ncdu.

I'm not planning on working on this any further.

[1] https://peoplesgrocers.com/en/projects/gitignore-aware-disk-...