Remix.run Logo
Litestream VFS(fly.io)
110 points by emschwartz 2 hours ago | 37 comments
psanford an hour ago | parent | next [-]

Oh hey this is using my go sqlite vfs module[0]. I love it when I find out some code I wrote is useful to others!

[0]: https://github.com/psanford/sqlite3vfs

benbjohnson 32 minutes ago | parent | next [-]

It worked great! Thanks for your work on it.

fragmede 17 minutes ago | parent | prev [-]

that's all we really want in life.

bencornia an hour ago | parent | prev | next [-]

> What we’re doing here is instantaneous point-in-time recovery (PITR), expressed simply in SQL and SQLite pragmas.

> Ever wanted to do a quick query against a prod dataset, but didn’t want to shell into a prod server and fumble with the sqlite3 terminal command like a hacker in an 80s movie? Or needed to do a quick sanity check against yesterday’s data, but without doing a full database restore? Litestream VFS makes that easy. I’m so psyched about how it turned out.

Man this is cool. I love the unix ethos of Litestream's design. SQLite works as normal and Litestream operates transparently on that process.

indigodaddy an hour ago | parent | prev | next [-]

This is awesome. Especially for sqlite db’s that are read only from a website user perspective. My use case would be an sqlite DB that would live on S3 and get updated by cron or some other task runner/automation means (eg some other facility independent of the website that is using the db), and the website would use litestream vfs and just make use of that “read only” (the website will never change or modify the db) db straightup. Can it be used in this described fashion? Also/if so, how will litestream vfs react to the remote db updating itself within this scenario? Will it be cool with that? Also I’m assuming there is or will be Python modules/integration for doing the needful around Litestream VFS?

Currently on this app, I have the Python/flask app just refreshing the sqlite db from a Google spreadsheet as the auth source (via dataframe then convert to sqlite) for the sqlite db on a daily scheduled basis done within the app.

For reference this is the current app: (yes the app is kinda shite but I’m just a sysadmin trying to learn Python!) https://github.com/jgbrwn/my-upc/blob/main/app.py

benbjohnson 30 minutes ago | parent [-]

Author here. Litestream VFS will automatically poll for new back up data every second so it keeps itself up to date with any changes made by the original database.

You don't need any additional code (Python or otherwise) to use the VFS. It will work on the SQLite CLI as is.

darintay 6 minutes ago | parent | prev | next [-]

Does this work with sqlite extensions? If I were using e.g. sqlite-vec or -vss or some other vector search extension would I be able to use litestream to back it up to S3 live, and then litestream-vfs to query it remotely without downloading the whole thing?

simonw 22 minutes ago | parent | prev | next [-]

This is such a clean interface design:

  export LITESTREAM_REPLICA_URL="s3://my-bucket/my.db"
  export AWS_ACCESS_KEY_ID="your-access-key"
  export AWS_SECRET_ACCESS_KEY="your-secret-key"

  sqlite3

  .load litestream.so
  .open file:///my.db?vfs=litestream
  PRAGMA litestream_time = '5 minutes ago'; 
  select * from sandwich_ratings limit 3;
itissid an hour ago | parent | prev | next [-]

Really nice. We should have this as an add-on to https://app.codecrafters.io/courses/sqlite/overview It can probably teach one a lot about the value of good replication and data formats.

If you are not familiar with data systems, havea read DDIA(Designing Data Intensive Applications) Chapter 3. Especially the part on building a database from the ground up — It almost starts with sthing like "Whats the simplest key value store?": `echo`(O(1) write to end of file, super fast) and `grep`(O(n) read, slow) — and then build up all the way to LSMTrees and BTrees. It will all make a lot more sense why this preserves so many of those ideas.

chickensong 16 minutes ago | parent | prev | next [-]

As a sandwich enthusiast, I would like to know more about these sandwich ratings.

orliesaurus an hour ago | parent | prev | next [-]

Been tinkering with litestream... the read-only VFS is neat but I'm curious about eventual write capabilities... using VFS for distributed DBs could unlock some interesting patterns.

ALSO I'm thinking about mixing this with object store caching... maybe combining memfs with remote metadata; would love to see more details on performance.

BUT I might be overthinking it... just excited to see SQLite exploring beyond local files...

reactordev an hour ago | parent [-]

Opens up a whole new can of worms. Transactions come to mind. Who would be responsible for coordinating? If two nodes wrote to the table conflicting information at the same time, who wins?

benbjohnson 28 minutes ago | parent [-]

Author here. We've done some proof-of-concept work on creating distributed leases using S3. We have some use cases internally where we've considered adding write capabilities to the VFS but we haven't started any work on it yet.

skybrian 2 hours ago | parent | prev | next [-]

This sounds pretty cool, but I’m confused about what software being announced. Is there a new release of Litestream?

benbjohnson 2 hours ago | parent | next [-]

Author here. Yes, Litestream v0.5.3 has been released with a new read-only VFS option: https://github.com/benbjohnson/litestream/releases/tag/v0.5....

zackify an hour ago | parent [-]

so how would i connect from a separate machine, i can't figure out from the post or release notes or current litestream website docs, how would i use the extension to do that?

wim an hour ago | parent | prev [-]

I noticed the new release also includes "directory replication support for multi-tenant databases", great addition as well!

hintoftime 2 hours ago | parent | prev | next [-]

Does this mean that I can run an application in K8s via one or many horizontally scaled pods all running off DB in s3? No StatefulSet required?

benbjohnson 2 hours ago | parent | next [-]

Author here. The VFS support right now is currently read only so it's useful for something more like a shared cache of data.

dpedu 2 hours ago | parent | prev [-]

I was doing something similar just the other day and came across sqlite-s3vfs[0]. It is likewise a SQLite VFS that translates IO to S3 api calls. However, that project is only for python and seemingly abandoned. Additionally, if you want more than one writer, you'd have to coordinate that yourself, afaik.

[0]: https://pypi.org/project/sqlite-s3vfs/

born-jre 2 hours ago | parent | prev | next [-]

more goodies nice!

I am going to integrate Litestream into the thing I am going to building[1]. I experimented with a lot of ways, but it turns out there is WebDAV support recently merged, not in the docs.

[1]: https://github.com/blue-monads/potatoverse

dzonga an hour ago | parent | prev | next [-]

dumb question: can this be used for versioned tables then ? what to see the state of a table 1 hour ago ?

benbjohnson 24 minutes ago | parent [-]

Author here. You can query the state of a table from an hour ago with Litestream VFS. It won't give you versioned tables in the sense that every time you update a row that it writes a new version in a table somewhere though.

rekwah an hour ago | parent | prev | next [-]

Now do this with DuckDB.

jauntywundrkind 2 hours ago | parent | prev | next [-]

So much fun streaming/sync/cdc stuff happening, all so cool. Having an underlying FUSE driver doing the Change Data Capture is really neat. This looks like such an incredibly lightweight way to remote-connect to sqlite. And to add a sort of exterior transaction management.

Different use case, but makes me think of sqlite Rewrite-it-it-Rust Turso announcing AgentFS. Here the roles are flipped, sqlite is acting as a file store to back FUSE, to allow watching/transaction-managing the filesystem/what agents are doing. Turso also has a sick CDC system built in, that just writes all changes to a cdc table. Which is related to this whole meta question, of what is happening to my sqlite DB. https://turso.tech/blog/agentfs

tptacek 2 hours ago | parent [-]

Just to be clear, the underlying FUSE thing is LiteFS, not Litestream; nothing described in this post needs a FUSE filesystem, just a small SQLite VFS plugin library.

jauntywundrkind 2 hours ago | parent [-]

Thanks Thomas.

To just drop the relevant paragraph that addresses my un-clarity/in-correctness (and which is super fun to read):

> Litestream v0.5 integrates LTX, our SQLite data-shipping file format. Where earlier Litestream blindly shipped whole raw SQLite pages to and from object storage, LTX ships ordered sets of pages. We built LTX for LiteFS, which uses a FUSE filesystem to do transaction-aware replication for unmodified applications, but we’ve spent this year figuring out ways to use LTX in Litestream, without all that FUSE drama.

tptacek 2 hours ago | parent [-]

You got it. Yeah, Ben built LiteFS a year or two ago, which is why he designed LTX. But using LiteFS required people to set up FUSE filesystems, which was too complicated for a lot of people, so Litestream got way more uptake. This past year he's been harvesting all the good stuff from LiteFS that doesn't require FUSE and building it into Litestream.

The easiest way so far to understand the split between Litestream and LiteFS: Litestream is an operational tool, for backup and restore. LiteFS is a method for doing online leader/follower replica clusters.

petcat 2 hours ago | parent | prev [-]

Are people still trying to shoehorn sqlite to run in a server-side context? I thought that was a fad that everyone gave up on.

ashish01 2 hours ago | parent | next [-]

I use Litestream for near real-time backups. Does not change how SQLite is used on the server, just a replacement for .backup

mhitza 2 hours ago | parent | prev | next [-]

No, it's still pretty cool, easy to use with low operational complexity in low volume read-mostly projects: CMSs, blogs, ecommerce platforms.

9rx 2 hours ago | parent | prev | next [-]

People are building DBMSes and, instead of writing the engine from scratch, are choosing an off-the-shelf solution that integrates into a DBMS with ease.

A better question to ask is why the world needs yet another DBMS, but the reasons are no doubt valid.

0xbadcafebee an hour ago | parent | prev | next [-]

I am a heavy skeptic of this thing, but I can see a good use case for it: S3 I/O, ephemeral compute (1 instance), versioned blobs. The first two allow you to abstract the data away from the compute (flexibility), and the third lets you recover from mistakes or bugs quicker (or do immutable migrations easier).

I think the devil's in the details though. I expect a high number of unusual bugs due to the novel code, networking, and multiple abstractions. I'd need to trial it for a year before I called it reliable.

jtbayly 2 hours ago | parent | prev | next [-]

I am. Super simple. Super cheap. Great dev experience. Want to know whether the migration is going to work? Just download the prod db locally and test it. I'm happy.

christophilus 2 hours ago | parent [-]

Works for very small prod databases, I guess.

tptacek 2 hours ago | parent [-]

We use it internally for some rather large databases. It's not database size that matters, it's usage pattern.

jauntywundrkind 2 hours ago | parent | prev [-]

For things like config management I feel like it makes all the sense in the world. Whomever the primary is can soak some infrequent-ish write-load. Then the whole DB can quickly copy to where it's needed, or, in lite stream VFS 's case, even less needs to be shipped.