Remix.run Logo
eliasdejong 5 days ago

Yes, in fact the name Lite³ was chosen because it is lighter than SQLite.

I thought about implementing something like jq or JSON query, and this is very possible. It is like sending a mini-database that can be queried at speeds thousands of times faster than any JSON library is able to parse.

One interesting effect of being a zero-copy format is that the 'parsing speed' can exceed the memory bandwidth of the CPU, since to fulfill a query you do not actually need to parse the entire dataset. You only walk the branches of the tree that are actually required.

I've talked to some other people that have also shown interest in this idea. There doesn't really seem to exist a good schemaless single-file format that supports advanced queries. There is only SQLite and maybe HDF5.

conradev 4 days ago | parent | next [-]

I would very much love that. I like `jq` itself as a standard, even though I don't know how well it maps. Areas where I'd want to use Lite³:

- Microcontrollers. I find myself reaching for https://github.com/siara-cc/sqlite_micro_logger_c/tree/maste... because SQLite is just too big

- Shared memory regions/mapped files. Use it to share state between processes. Could you make mutations across processes/threads lock-free?

- Caching GPU-friendly data (i.e. image cache). I'm not sure if the current API surface/structure is page alignment friendly

eliasdejong 4 days ago | parent [-]

In general jq maps very well to any hierarchical datastructure. One of the maintainers has made 'fq' which supports BSON, MsgPack, Protobuf, CBOR and even media files png, jpg, mp4 etc.

SQLite when compiled for size is 590 kB. But I think a full jq database implementation based on Lite³ would be possible under 100 kB.

Lock-free shared state relies on algorithms that can make clever use of atomic instructions. But you should not have threads write to the same memory regions, because the hardware only allows for 1 core to have a cacheline in a writeable state. If another core attempts a write to the same line, this will immediately invalidate all the other copies. Under high contention the coherency penalty becomes so large that throughput falls through the floor. So basically the algorithms need to do most of the work in separate memory regions, then occasionally coordinate by 'committing' their work via a spinlock or similar.

Lite³ implements some settings for node alignment, but not for user data. It would be possible to create a bytes type with extra alignment guarantees.

cryptonector 5 days ago | parent | prev [-]

Building a jq around something like Lite^3 or JSONB is a very appealing thought.