Remix.run Logo
conradev 5 days ago

This is super cool! I've always liked Rkyv (https://rkyv.org) but it requires Rust which can be a big lift for a small project. I see this supports binary data (`lite3_val_bytes`) which is great!

eliasdejong 5 days ago | parent [-]

Thank you. Having a native bytes type is non-negotiable for any performance intensive application that cannot afford the overhead of base64 encoding. And yes, Rkyv also implements this idea of indexing serialized data. The main differences are:

1) Rkyv uses a binary tree vs Lite³ B-tree (B-trees are more cache and space efficient).

2) Rkyv is immutable once serialized. Lite³ allows for arbitrary mutations on serialized data.

3) Rkyv is Rust only. Lite³ is a 9.3 kB C library free of dependencies.

4) Rkyv as a custom binary format is not directly compatible with other formats. Lite³ can be directly converted to/from JSON.

I have not benchmarked Lite³ against Rust libraries, though it would be an interesting experiment.

conradev 5 days ago | parent | next [-]

That second point is huge – Rkyv does have limited support for in-place mutation, but it is quite limited!

If you added support for running jq natively, that would be very cool. Lite³ brings the B-trees, jq brings the query parser and bytecode, combined, you get SQLite :P

eliasdejong 5 days ago | parent [-]

Yes, in fact the name Lite³ was chosen because it is lighter than SQLite.

I thought about implementing something like jq or JSON query, and this is very possible. It is like sending a mini-database that can be queried at speeds thousands of times faster than any JSON library is able to parse.

One interesting effect of being a zero-copy format is that the 'parsing speed' can exceed the memory bandwidth of the CPU, since to fulfill a query you do not actually need to parse the entire dataset. You only walk the branches of the tree that are actually required.

I've talked to some other people that have also shown interest in this idea. There doesn't really seem to exist a good schemaless single-file format that supports advanced queries. There is only SQLite and maybe HDF5.

conradev 4 days ago | parent | next [-]

I would very much love that. I like `jq` itself as a standard, even though I don't know how well it maps. Areas where I'd want to use Lite³:

- Microcontrollers. I find myself reaching for https://github.com/siara-cc/sqlite_micro_logger_c/tree/maste... because SQLite is just too big

- Shared memory regions/mapped files. Use it to share state between processes. Could you make mutations across processes/threads lock-free?

- Caching GPU-friendly data (i.e. image cache). I'm not sure if the current API surface/structure is page alignment friendly

eliasdejong 4 days ago | parent [-]

In general jq maps very well to any hierarchical datastructure. One of the maintainers has made 'fq' which supports BSON, MsgPack, Protobuf, CBOR and even media files png, jpg, mp4 etc.

SQLite when compiled for size is 590 kB. But I think a full jq database implementation based on Lite³ would be possible under 100 kB.

Lock-free shared state relies on algorithms that can make clever use of atomic instructions. But you should not have threads write to the same memory regions, because the hardware only allows for 1 core to have a cacheline in a writeable state. If another core attempts a write to the same line, this will immediately invalidate all the other copies. Under high contention the coherency penalty becomes so large that throughput falls through the floor. So basically the algorithms need to do most of the work in separate memory regions, then occasionally coordinate by 'committing' their work via a spinlock or similar.

Lite³ implements some settings for node alignment, but not for user data. It would be possible to create a bytes type with extra alignment guarantees.

cryptonector 5 days ago | parent | prev [-]

Building a jq around something like Lite^3 or JSONB is a very appealing thought.

namibj 5 days ago | parent | prev [-]

1) when did they downgrade? I've stared for hours at that particular code...

2) no you just don't get to move data freely.

3) I don't believe JSON has any place in a system that needs C because it can't handle Rust.

4) JSON can't handle non-tree structures, it's further very limited in expressivity. Rkyv is more of a code gen akin to ASN.1

Happy benchmarking, feel free to use the rkyv benchmark tooling and ensure you have enough link time optimization going on.

taintegral a day ago | parent [-]

(I made rkyv) I took a look at lite3/tron because I was interested in seeing how it approached some of the issues I ran into with rkyv. I ended up with some additional information related to points 1 and 2 I figured I'd write down:

1) You're correct, rkyv uses a B-tree with a default branching factor of 6 [1]. I don't think I've ever implemented a binary tree in any version of rkyv - maybe one of the earliest versions used a sorted vec instead of a B-tree? I hate the B-tree map implementation because it is so complicated, even though it actually used to be even worse in 0.7 when it was implemented as a B+ tree. Sorry for making you read any of it!

2) Arbitrary mutations would definitely be an improvement on rkyv, and I was most interested to see what kind of approach lite3/tron took! Unfortunately growing the buffer is not supported [2] and freed memory isn't reclaimed [3]. These are the two actually difficult problems standing in the way of rkyv getting more comprehensive mutation support, so unfortunately I think there's nothing valuable to take away here.

The issue with growing the buffer is that you invalidate any pointers into the buffer and have to manually fix them up (e.g. by re-crawling the buffer). There's not a decent approach I'm aware of which manages these pointer fixups in a way that limits complexity and avoids really nasty API surfaces.

The issue with memory reclamation is that you effectively have to add an allocator to your buffer, and that allocator has to also be persistable. I actually got this part working in rel while that was still a project worth hacking on [4]. I just ran out of steam and willingness to play the type-tac-toe required to get a safe API out of it. The main problem I ran into there was related to value branding, since you shouldn't be allowed to make references to objects which cross between contiguous buffers.

This project is neat, but it does basically just look like an rkyv-converted `BTreeMap<String, JsonValue>`.

[1]: https://github.com/rkyv/rkyv/blob/main/rkyv/src/collections/... [2]: https://github.com/fastserial/lite3/blob/main/src/lite3.c#L5... [3]: https://lite3.io/design_and_limitations.html#autotoc_md29 (see note at end of section) [4]: https://github.com/rkyv/rel/blob/main/rel_example/src/main.r...