silvestrov 4 days ago

One more optimization idea: instead of having the trie map to the suffix string directly, make an array of unique suffixes and let the trie map to an index into that array, e.g.

    const suffixes = [",,,", "a,u,u,u", ",,i,s", ",,,s", "i,a,a,a", ...];
and then use the index into this list in the serialized trie:

    var serializedInput = "{e:{n:{ein:0_r: ...
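
A minimal sketch of that interning step, assuming the trie is a nested object whose leaves are suffix strings. The helper name `internSuffixes` and the toy input are made up for illustration and are not taken from the original code:

```javascript
// Hypothetical helper: walks a nested-object trie whose leaves are suffix
// strings and replaces each leaf with an index into a shared array.
function internSuffixes(trie) {
  const suffixes = [];        // unique suffix strings, in first-seen order
  const indexOf = new Map();  // suffix string -> index in `suffixes`

  function visit(node) {
    if (typeof node === "string") {
      if (!indexOf.has(node)) {
        indexOf.set(node, suffixes.length);
        suffixes.push(node);
      }
      return indexOf.get(node);
    }
    const out = {};
    for (const [key, child] of Object.entries(node)) {
      out[key] = visit(child);
    }
    return out;
  }

  return { suffixes, trie: visit(trie) };
}

// Toy input; the keys and layout are assumptions for illustration only.
const sample = { e: { n: ",,,", r: "a,u,u,u" }, s: { t: ",,," } };
const result = internSuffixes(sample);
console.log(JSON.stringify(result));
```

Each distinct suffix is stored once, and every leaf becomes a small integer, so the uncompressed output shrinks whenever suffixes repeat.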
KTibow 4 days ago

I (Claude Code) tried this and it actually increased the gzipped size by 100 bytes (3456 -> 3556) while reducing the uncompressed size by only 20%, likely because gzip is already really good at interning repeated patterns.
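
For anyone who wants to repeat this kind of comparison, here is a rough sketch using Node's built-in `zlib`. The data is synthetic (200 made-up keys cycling through five suffixes), so the numbers it prints say nothing about the real file; it only shows how to measure both variants side by side:

```javascript
const zlib = require("zlib");

// Returns the uncompressed and gzipped byte sizes of a string.
function sizes(text) {
  return { raw: Buffer.byteLength(text), gzipped: zlib.gzipSync(text).length };
}

// Synthetic data: keys and repetition pattern are assumptions for illustration.
const suffixPool = [",,,", "a,u,u,u", ",,i,s", ",,,s", "i,a,a,a"];
const inlineTrie = {};   // leaves hold the suffix string directly
const indexedTrie = {};  // leaves hold an index into suffixPool
for (let i = 0; i < 200; i++) {
  const key = "k" + i;
  inlineTrie[key] = suffixPool[i % suffixPool.length];
  indexedTrie[key] = i % suffixPool.length;
}

const inlineSizes = sizes(JSON.stringify(inlineTrie));
const indexedSizes = sizes(
  JSON.stringify({ suffixes: suffixPool, trie: indexedTrie })
);
console.log({ inlineSizes, indexedSizes });
```

The raw size is guaranteed to drop for the indexed variant here, but whether the gzipped size goes up or down depends on the data, which is the point of measuring.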

contravariant 4 days ago

You could go a step further by putting the suffixes themselves into the trie and then identifying identical subtrees.

If you can use gzip, there's bound to be a clever way of using a suffix array as well, which might end up being better unless you can use an optimised binary format for the tree.
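
A rough sketch of the identical-subtree idea, assuming a nested-object trie with suffix strings as leaves. It is a simplification: whole suffix strings are treated as leaf nodes rather than being expanded character by character into the trie, and `shareSubtrees` is a hypothetical name:

```javascript
// Hypothetical sketch: give every distinct subtree one entry in a node table
// and let parents refer to children by table id, so equal subtrees are shared.
function shareSubtrees(root) {
  const nodes = [];                 // table of unique nodes
  const idBySignature = new Map();  // canonical JSON -> id in `nodes`

  function visit(node) {
    let canonical;
    if (typeof node === "string") {
      canonical = node;             // leaf: the suffix string itself
    } else {
      canonical = {};
      // Sort keys so that equal subtrees always get the same signature.
      for (const key of Object.keys(node).sort()) {
        canonical[key] = visit(node[key]);  // child replaced by its id
      }
    }
    const signature = JSON.stringify(canonical);
    if (!idBySignature.has(signature)) {
      idBySignature.set(signature, nodes.length);
      nodes.push(canonical);
    }
    return idBySignature.get(signature);
  }

  return { root: visit(root), nodes };
}

// Toy input; "a" and "b" have identical subtrees and end up sharing one node.
const shared = shareSubtrees({
  a: { x: ",,,", y: ",,,s" },
  b: { x: ",,,", y: ",,,s" },
  c: { x: ",,," },
});
console.log(JSON.stringify(shared));
```

This turns the tree into a DAG stored as a flat table. Whether it beats plain gzip on the original would still need to be measured.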