throwa356262 4 hours ago

"RAM footprint: ~8MB on an empty session, ~12MB when working"

I like this. Claude Code uses multiple gigabytes, which is really annoying on low-end laptops.

rel 23 minutes ago | parent | next [-]

I've been trying to migrate over to Zed and think their Agent Client Protocol[1] is pretty neat. I wonder how much memory pressure Claude Code exerts when it goes through that mechanism instead.

1: https://zed.dev/acp

all2 2 hours ago | parent | prev | next [-]

I'm building an agent framework in golang and it is extremely lightweight. Startup time is under half a second, and RAM usage is really low. I have a 12-year-old laptop and it runs happily without slowing down.

There's no reason what is essentially a string concat engine should be slow on any hardware, including old hardware.
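
To illustrate the point, here's a toy sketch of that hot path (in Rust rather than Go, just to match the rest of the thread; every name here is hypothetical, not my framework's actual code):

```rust
// Toy agent loop: the local work is mostly assembling a prompt string
// and appending the model's replies. The expensive part is the network
// round trip, not the string handling.
fn main() {
    let system = "You are a coding agent.";
    let mut transcript = String::with_capacity(64 * 1024); // one buffer, reused

    for user_msg in ["list files", "read main.rs"] {
        transcript.push_str("\nUser: ");
        transcript.push_str(user_msg);
        let prompt = format!("{system}\n{transcript}\nAssistant:");
        let reply = call_model(&prompt); // stub for the HTTP call to an LLM API
        transcript.push_str("\nAssistant: ");
        transcript.push_str(&reply);
    }
    println!("{transcript}");
}

// Hypothetical stub standing in for the model round trip.
fn call_model(prompt: &str) -> String {
    format!("(model reply to {} chars of prompt)", prompt.len())
}
```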

tecoholic 4 hours ago | parent | prev | next [-]

Yes. This fact alone is going to make a lot of people try it out.

messh 3 hours ago | parent | prev | next [-]

The memory footprint is great; it finally makes it possible to run these coding agents on extra-small instances -- say an x1 on shellbox.dev.

chrisweekly 5 minutes ago | parent [-]

Hmm, if they're this small, something like smolmachines (like shellbox, but free and local) might be a great fit.

marknutter 4 hours ago | parent | prev [-]

Isn't that because of the context window size?

gidellav 4 hours ago | parent | next [-]

Hi, I'm the developer of zerostack! No, the memory footprint is not because of the context window size: in my benchmarks it jumped from 8MB (without any chat/context loaded) to 11MB with a 128k context loaded.

The reasons for zerostack's small memory footprint are:

- Rust, and not JS/Python, so no interpreters/VMs on top

- Load-as-needed, so we only allocate things like LLM connectors when needed

- `smallvec` used for most of the array usage in the tool (up to N items are stored on the stack)

- `compactstring` used for most of the string usage in the tool (up to N chars are stored on the stack)

- `opt-level=z` to force LLVM to optimize for binary size rather than performance (even though we still beat opencode in both TTFT and tool-use time)

- heavy usage of [LTO](https://en.wikipedia.org/wiki/Interprocedural_optimization#W...)
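
For the curious, here's a minimal sketch of what the small-buffer techniques above look like in use, assuming the `smallvec` and `compact_str` crates; the inline capacities, crate versions, and profile settings are illustrative, not zerostack's actual values:

```rust
// Illustrative only -- not zerostack's code. Cargo.toml would carry
// something like:
//
//   [profile.release]
//   opt-level = "z"  # optimize for size, not speed
//   lto = true       # link-time optimization across the whole program
//
//   [dependencies]
//   smallvec = "1"
//   compact_str = "0.8"

use compact_str::CompactString;
use smallvec::{smallvec, SmallVec};

fn main() {
    // Up to 4 items live inline on the stack; pushing a 5th spills to the heap.
    let mut tools: SmallVec<[CompactString; 4]> = smallvec![
        CompactString::new("read_file"),
        CompactString::new("write_file"),
    ];
    tools.push(CompactString::new("bash"));
    assert!(!tools.spilled()); // the vec itself has made no heap allocation

    // Strings up to 24 bytes (on 64-bit) are stored inline as well.
    let model = CompactString::new("some-model-id"); // placeholder name
    assert!(!model.is_heap_allocated());

    println!("{} tools, model {}", tools.len(), model);
}
```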

SatvikBeri 4 hours ago | parent | prev | next [-]

The context window has nothing to do with RAM usage, and even if it did, a million tokens of context is maybe 5MB.

vlovich123 13 minutes ago | parent [-]

It has nothing to do with local RAM usage. But a million tokens of LLM context is decidedly not 5MB.

The rough estimate is 2 * L * H_kv * D * bytes per element

Where:

* L = number of layers

* H_kv = # of KV heads

* D = head dimension

* factor of 2 = keys + values

The dominant factor here is typically 2 * H_kv * D, since it's usually at least 2048 bytes. Per token.

For Llama3 7B you're looking at 128GiB if your context is really 1M tokens (not that that particular model supports a context that big). DeepSeek4 uses something called sparse attention, so the above calculus improves: 1M of context would use 5-10GiB.
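
A quick sanity check of that arithmetic, with assumed Llama-3-style GQA parameters (L=32, H_kv=8, D=128, fp16; illustrative numbers, not official specs):

```rust
// Back-of-the-envelope KV-cache sizing using the formula above.
fn kv_cache_bytes(tokens: u64) -> u64 {
    // L = layers, H_kv = KV heads, D = head dim, 2 bytes per fp16 element
    let (layers, kv_heads, head_dim, bytes_per_elem) = (32u64, 8, 128, 2);
    // factor of 2 = keys + values
    2 * layers * kv_heads * head_dim * bytes_per_elem * tokens
}

fn main() {
    // Per token: 2 * 32 * 8 * 128 * 2 = 131072 bytes = 128 KiB.
    let total = kv_cache_bytes(1 << 20); // "1M" context, taken as 2^20 tokens
    println!("{} GiB", total >> 30); // prints 128 -- nowhere near 5 MB
}
```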

But regardless of the details, you’re off by several orders of magnitude.

SwellJoe 4 hours ago | parent | prev [-]

The context window is not on your system; it's on the server with the model. There may be some local prompt caching of some sort, but you're not locally hosting the context unless you're also locally hosting the model.