| ▲ | andylizf 10 days ago |
| Yeah, that's a fair point at first glance. 50GB might not sound like a huge burden for a modern SSD. However, the 50GB figure was just a starting point for emails. A true "local Jarvis" would need to index everything: all your code repositories, documents, notes, and chat histories. That raw data can easily be hundreds of gigabytes. For a 200GB text corpus, a traditional vector index can swell to >500GB. At that point it's no longer a "meager" requirement; it becomes a heavy "tax" on your primary drive, which is often non-upgradable on modern laptops. The goal for practical local AI shouldn't just be that it's possible, but that it's also lightweight and sustainable. That's the problem we focused on: making a comprehensive local knowledge base feasible without forcing users to dedicate half their SSD to a single index. |
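For a rough sense of where a figure like ">500GB" comes from, here is some napkin math; the chunk size, embedding dimension, and HNSW fan-out below are illustrative assumptions rather than LEANN's actual parameters:

    # Napkin math for the ">500GB index on a 200GB corpus" ballpark.
    corpus_bytes = 200e9
    chunk_bytes  = 1_000        # ~1KB of text per chunk (assumed)
    dim          = 768          # embedding dimension (assumed)
    float_bytes  = 4            # float32
    hnsw_links   = 32           # graph neighbors per vector (assumed)

    n_chunks = corpus_bytes / chunk_bytes           # 200M chunks
    vectors  = n_chunks * dim * float_bytes         # ~614 GB of raw embeddings
    graph    = n_chunks * hnsw_links * 4            # ~26 GB of neighbor lists
    print(f"{(vectors + graph) / 1e9:.0f} GB")      # ~640 GB, before any compression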
|
| ▲ | notsylver 9 days ago | parent | next [-] |
| You already need very high-end hardware to run useful local LLMs, so I don't know if a 200GB vector database would be the dealbreaker in that scenario. But I wonder how small you could get it with compression and quantization on top.
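Some hedged napkin math on the compression question (the sizes and per-vector byte counts are illustrative, and this is not necessarily how LEANN reaches the ~9GB figure mentioned later in the thread):

    # Shrinking a hypothetical 200M x 768-dim vector store by quantization alone.
    n, dim = 200_000_000, 768
    fp32 = n * dim * 4        # ~614 GB, full-precision float32
    int8 = n * dim * 1        # ~154 GB, scalar quantization (~4x smaller)
    pq64 = n * 64             # ~13 GB, product quantization at 64 bytes/vector
    for name, size in [("float32", fp32), ("int8", int8), ("PQ-64", pq64)]:
        print(f"{name}: {size / 1e9:.0f} GB")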
| ▲ | wafflemaker 9 days ago | parent | next [-] | | I'm no dev either, and I still set up remote SSH login so I could use LaTeX on my home PC from my laptop. Even with many games and a dual boot, my gaming PC still has some space left on its 2TB NVMe SSD, and my non-enthusiast motherboard could fit two more. Installing LaTeX and its packages took so much time, and so much space, that my 128GB drive couldn't handle it. | |
| ▲ | mwcz 9 days ago | parent | prev | next [-] | | I've worked in other domains my whole career, so I was astonished this week when we put a million 768-dim embeddings into a vector DB and it was only a few GB. Napkin math said ~25 GB, and intuition said a long list of widely distributed floats would be fairly incompressible. HNSW is pretty cool. | |
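For reference, a back-of-the-envelope for that workload (assuming float32 vectors and an HNSW graph with M=16 bidirectional links stored as 4-byte IDs) suggests the index ends up close to the size of the raw vectors themselves:

    # 1M x 768-dim float32 embeddings, plus HNSW neighbor lists.
    n, dim, m = 1_000_000, 768, 16
    raw   = n * dim * 4          # ~3.1 GB of vectors
    links = n * m * 2 * 4        # ~0.13 GB of graph edges
    print(f"{(raw + links) / 1e9:.1f} GB")   # ~3.2 GB -- "a few GB" is mostly the floats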
| ▲ | OneDeuxTriSeiGo 9 days ago | parent | prev | next [-] | | You can already do A LOT with an SLM running on commodity consumer hardware. Also it's important to consider that the bigger an embedding is, the more bandwidth you need to use it at any reasonable speed. And while storage may be "cheap", memory bandwidth absolutely is not. | |
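A quick illustration of the bandwidth point (the sizes and the ~50 GB/s figure are ballpark assumptions, and ANN indexes exist precisely to avoid this kind of full scan):

    # Brute-force scanning a large embedding table is bandwidth-bound.
    n, dim, bandwidth = 10_000_000, 1536, 50e9     # 10M vectors, 1536 dims, ~50 GB/s
    bytes_per_query = n * dim * 4                  # ~61 GB streamed per exhaustive query
    print(f"{bytes_per_query / bandwidth:.1f} s")  # ~1.2 s just to read the vectors once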
| ▲ | varenc 9 days ago | parent | prev | next [-] | | > You already need very high end hardware to run useful local LLMs
A basic MacBook can run gpt-oss-20b, and it's quite useful for many tasks. And fast. Of course, Macs have a huge advantage for local LLM inference due to their shared memory architecture. | |
| ▲ | derefr 9 days ago | parent | prev | next [-] | | The mid-spec 2025 iPhone can run “useful local LLMs” yet has 256GB of total storage. (Sure, this is a spec distortion due to Apple’s market-segmentation tactics, but due to the sheer install-base, it’s still a configuration you might want to take into consideration when talking about the potential deployment-targets for this sort of local-first tech.) | |
| ▲ | felarof 2 days ago | parent | prev [-] | | You should definitely check out BrowserOS! -- https://github.com/browseros-ai/BrowserOS |
|
|
| ▲ | derefr 9 days ago | parent | prev | next [-] |
Question: would it be possible to invert the problem? I.e., rather than decreasing the size of the RAG, use the RAG to compress everything other than the RAG index itself. E.g., design a filesystem so that the RAG index is part of, and managed internally within, the metadata of the filesystem itself; and then give each FS inode data-extent two polymorphic on-disk representations:

1. Extents hold raw data; rag-vectors are derivatives, updated after the extent is updated (as today).

2. Rag-vectors are canonical; extents hold residuals from a predictive-coding model that took the rag-vectors as input and tried to regenerate the raw data of the extent. When an extent is read [or partially overwritten], use the predictive-coding model to generate data from the vectors and then repair it with the residual (as in modern video-codec P-frame generation).

Of course, even if this did work (in the sense of providing a meaningful decrease in storage use), this storage model would only really be practical for document files that are read entirely on open and atomically overwritten/updated (think Word and Excel docs, PDFs, PSDs, etc.), not for files meant to be streamed. But, luckily, the types of files this technique is amenable to are exactly the same types of files that a "user's documents" RAG would have any hope of indexing in the first place!
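A minimal sketch of that read path; the predictive model interface and the XOR-style residual repair are hypothetical stand-ins, not any real filesystem or codec API:

    from dataclasses import dataclass
    from enum import Enum, auto

    class ExtentMode(Enum):
        RAW = auto()       # extent bytes are canonical; vectors are derived
        RESIDUAL = auto()  # vectors are canonical; extent stores prediction residuals

    @dataclass
    class Extent:
        mode: ExtentMode
        payload: bytes               # raw data or residuals, depending on mode
        vectors: list[list[float]]   # RAG embeddings kept in FS metadata

    def read_extent(extent: Extent, model) -> bytes:
        if extent.mode is ExtentMode.RAW:
            return extent.payload
        # Regenerate an approximation from the embeddings, then repair it with
        # the stored residuals (analogous to P-frame + residual in video codecs).
        predicted = model.decode(extent.vectors)   # hypothetical lossy reconstruction
        return bytes(p ^ r for p, r in zip(predicted, extent.payload))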
|
| ▲ | PeterStuer 9 days ago | parent | prev | next [-] |
While your aims are undoubtedly sincere, in practice the 'local AI' target audience building their own rigs usually has 4TB or more of fast SSD storage. The bottom tier (not meant disparagingly) are people running diffusion models, as these do not have the high VRAM requirements. They generate tons of images or video, going from a one-click install like EasyDiffusion to very sophisticated workflows in ComfyUI. Those going the LLM route, which would be your target audience, quickly run into the problem that, to go beyond toying around with small, highly quantized models and small context windows, the hardware and software requirements and expertise grow exponentially. In light of the typical enthusiast investments in this space, the few TB of fast storage will pale in comparison to the rest of the expenses. Again, your work is absolutely valuable; it is just that the storage space requirement for the vector store in this particular scenario is not your strongest card to play.
| ▲ | imoverclocked 9 days ago | parent | next [-] | | Everyone benefits from focusing on efficiency and finding better ways of doing things. Those people with 4TB+ of fast storage can now do more than they could before as can the "bottom tier." It's a breath of fresh air anytime someone finds a way to do more with less rather than just wait for things to get faster and cheaper. | | |
| ▲ | PeterStuer 9 days ago | parent [-] | | Of course. And I am not arguing against that at all. Just like if someone makes an inference runtime that is 4% faster, I'll take that win. But would it be the decisive factor in my choice? Only if that were my bottleneck, my true constraint. All I tried to convey was that for most of the people in the presented scenario (personal emails etc.), a 50GB or even 500GB storage requirement is not going to be the primary constraint. So the suggestion was that the marketing for this use case might be better off spotlighting something else as well. | | |
| ▲ | ricardobeat 9 days ago | parent [-] | | You are glossing over the fact that for RAG you need to search over those 500GB+, which will be painfully slow and CPU-intensive. The goal is fast retrieval to add data to the LLM context. Storage space is not the sole reason to minimize the DB size. | |
| ▲ | brookst 9 days ago | parent | next [-] | | You're not searching over 500GB, you're searching an index of the vectors. That's the magic of embeddings and vector databases. It's the same way you might have a 50TB relational database, but "select id, name from people where country='uk' and name like 'benj%'" might only touch a few MB of storage at most. | |
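A minimal sketch of that point, assuming hnswlib and numpy with toy sizes: the ANN query walks a small in-memory graph and returns chunk IDs, and only those few chunks are then read from the large archive.

    import numpy as np
    import hnswlib

    dim, n = 768, 100_000                          # toy sizes for illustration
    vectors = np.random.rand(n, dim).astype(np.float32)

    index = hnswlib.Index(space='cosine', dim=dim)
    index.init_index(max_elements=n, ef_construction=200, M=16)
    index.add_items(vectors, np.arange(n))
    index.set_ef(50)                               # search-time speed/recall knob

    query = np.random.rand(dim).astype(np.float32)
    labels, distances = index.knn_query(query, k=5)  # touches only the index
    # `labels` are chunk IDs; only those chunks' text gets pulled into the LLM context.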
| ▲ | ricardobeat 8 days ago | parent [-] | | That’s precisely the point I tried to clear up in the previous comment. The LEANN author proposes to create a 9GB index for a 500GB archive, and the other poster argued that it is not helpful because “storage is cheap”. |
| ▲ | 9 days ago | parent | prev [-] | | [deleted] |
| ▲ | brabel 9 days ago | parent | prev [-] | | Speak for yourself! If it took me 500GB to store my vectors, on top of all my existing data, it would be a huge barrier for me. | |
| ▲ | hdgvhicv 9 days ago | parent | next [-] | | A 4TB external drive is £100. A 1TB SD card or USB stick is a similar cost. Maybe I'm too old to appreciate what "fast" means, but storage doesn't seem an enormous cost once you stripe it. | |
| ▲ | mockingloris 9 days ago | parent [-] | | This "...doesn't seem an enormous cost once you stripe it" gave me an idea. I KNOW that I will come back to link a blog post about it in the future. |
| ▲ | xandrius 9 days ago | parent | prev [-] | | Maybe time to upgrade your storage? |
| ▲ | mattlutze 7 days ago | parent | prev | next [-] |
The DGX Spark, at just $3,000-$4,000 with 4TB of storage, 128GB of unified memory, etc. (or the Mac Studio, tbh), is a great indicator that local AI can soon be cheap and, along with the emerging routing and expert-mixing strategies, incredibly performant for daily needs.
|
| ▲ | 42lux 6 days ago | parent | prev [-] |
That's the size of just two or three AAA games nowadays.