TekMol 3 days ago

From my perspective on databases, two trends continued in 2025:

1: Moving everything to SQLite

2: Using mostly JSON fields

Both started already a few years back and accelerated in 2025.

SQLite is just so nice and easy to deal with, with its no-daemon, one-file-per-db, one-type-per-value approach.

And the JSON arrow operators make it a pleasure to work with flexible JSON data.
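
For illustration, a minimal sketch of those operators from Python's sqlite3 (assuming the bundled SQLite is 3.38 or newer; the table and fields are made up):

    import sqlite3

    # The -> / ->> operators pull values out of a JSON TEXT column; ->> returns
    # a plain SQL value, so it can be filtered, sorted, and indexed.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    con.execute(
        "INSERT INTO events (payload) VALUES (?)",
        ('{"user": "alice", "meta": {"plan": "pro", "seats": 3}}',),
    )
    rows = con.execute(
        "SELECT payload ->> '$.user', payload ->> '$.meta.seats' "
        "FROM events WHERE payload ->> '$.meta.plan' = 'pro'"
    ).fetchall()
    print(rows)  # [('alice', 3)]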

delaminator 3 days ago | parent | next [-]

From my perspective, everything's DuckDB.

Single file per database, multiple ingestion formats, full-text search, S3 support, Parquet support, columnar storage, fully typed.

WASM version for full SQL in JavaScript.
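
A tiny sketch of what that looks like from Python (the paths are placeholders; S3 access needs the httpfs extension and credentials):

    import duckdb

    # One file per database; Parquet is queried in place with plain SQL.
    con = duckdb.connect("analytics.duckdb")
    con.sql("SELECT count(*) FROM 'events.parquet'").show()

    # With httpfs loaded and credentials configured, the same query can
    # point at S3 instead:
    # con.sql("SELECT count(*) FROM 's3://my-bucket/events/*.parquet'").show()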

sanderjd 2 days ago | parent | next [-]

This is a funny thread to me because my frustration is at the intersection of your comments: I keep wanting sqlite for writes (and lookups) and duckdb for reads. Are you aware of anything that works like this?

nlittlepoole 2 days ago | parent | next [-]

DuckDB can read/write SQLite files via an extension, so you can do that now with DuckDB as-is.

https://duckdb.org/docs/stable/core_extensions/sqlite
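
For example (a hedged sketch; 'app.db' and the table name are placeholders):

    import duckdb

    # The sqlite extension attaches an existing SQLite file so DuckDB's
    # engine can query (and write) its tables directly.
    con = duckdb.connect()
    con.execute("INSTALL sqlite; LOAD sqlite;")
    con.execute("ATTACH 'app.db' AS app (TYPE sqlite);")
    con.sql("SELECT count(*) FROM app.orders").show()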

sanderjd 2 days ago | parent [-]

My understanding is that this is still too slow for quick inserts, because duckdb (like all columnar stores) is designed for batches.

theanonymousone 2 days ago | parent [-]

The way I understood it, you can do your inserts with SQLite "proper", and simultaneously use DuckDB for analytics (i.e., read-only queries).

sanderjd 2 days ago | parent [-]

Aha! That makes so much sense. Thank you for this.

Edit: Ah, right, the downside is that this is not going to have good olap query performance when interacting directly with the sqlite tables. So still necessary to copy out to duckdb tables (probably in batches) if this matters. Still seems very useful to me though.

dietr1ch 2 days ago | parent [-]

Analytics is done in "batches" (daily, weekly) anyways, right?

We know you can't get both row and column order at the same time, and that continuously maintaining both means duplication and ensuring you get the worst case from both worlds.

Local, row-wise writing is the way to go for write performance. Column-oriented reads are the way to do analytics at scale. It seems alright to have a sync process that does the order re-arrangement (maybe with extra precomputed statistics, and sharding to allow many workers if necessary) to let queries of now-historical data run fast.
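
A hedged sketch of such a sync step, assuming SQLite for writes, DuckDB for reads, an `events` table with a monotonically increasing `id`, and a pre-created DuckDB table of the same shape:

    import duckdb

    def sync_batch(sqlite_path, duck_path, last_synced_id):
        """Copy rows written since the last sync into the columnar store."""
        duck = duckdb.connect(duck_path)
        duck.execute("INSTALL sqlite; LOAD sqlite;")
        duck.execute(f"ATTACH '{sqlite_path}' AS oltp (TYPE sqlite);")

        # One batched copy of the new rows; the returned id is the new watermark.
        duck.execute(
            "INSERT INTO events SELECT * FROM oltp.events WHERE id > ?",
            [last_synced_id],
        )
        new_max = duck.execute(
            "SELECT coalesce(max(id), ?) FROM events", [last_synced_id]
        ).fetchone()[0]
        duck.close()
        return new_max

Run on a timer, this gives the row order -> delay -> column order pipeline; precomputed aggregates or sharding by id range can be layered on top, as suggested above.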

delaminator 2 days ago | parent | next [-]

It's not just about row versus column. OLAP schemas are potentially denormalised as well, and sometimes pre-aggregated, e.g. rolled up by day or by customer.

If you really need to get performance you'll be building a star schema.

sanderjd 2 days ago | parent | prev [-]

Not all olap-like queries are for daily reporting.

I agree that the basic architecture should be row order -> delay -> column order, but the question (in my mind) is balancing the length of that delay with the usefulness of column order queries for a given workload. I seem to keep running into workloads that do inserts very quickly and then batch reads on a slower cadence (either in lockstep with the writes, or concurrently) but not on the extremely slow cadence seen in the typical olap reporting type flow. Essentially, building up state and then querying the results.

I'm not so sure about "continuously maintaining both means duplication and ensuring you get the worst case from both worlds". Maybe you're right, I'm just not so sure. I agree that it's duplicating storage requirements, but is that such a big deal? And I think if fast writes and lookups and fast batch reads are both possible at the cost of storage duplication, that would actually be the best case from both worlds?

I mean, this isn't that different conceptually from the architecture of log-structured merge trees, which have this same kind of "duplication" but for good purpose. (Indeed, rocksdb has been the closest thing to what I want for this workload that I've found; I just think it would be neat if I could use sqlite+duckdb instead, accepting some tradeoffs.)

dietr1ch 2 days ago | parent [-]

> the question (in my mind) is balancing the length of that delay with the usefulness of column order queries for a given workload. I seem to keep running into workloads that do inserts very quickly and then batch reads on a slower cadence (either in lockstep with the writes, or concurrently) but not on the extremely slow cadence seen in the typical olap reporting type flow. Essentially, building up state and then querying the results.

I see. Can you come up with row/table watermarks? Say your column store is up to date to a certain watermark, so any query that requires freshness beyond that will need to snoop into the rows that haven't made it into the columnar store to check for data up to the required query timestamp.

In the past I've dealt with a system that had read-optimised columnar data overlaid with fresh write-optimised data, using timestamps to agree on what should be visible to queries. It continuously consolidated data into the read-optimised store instead of relying on the silly daily job you might have in the extremely slow-cadence reporting flow you mention.

You can write such a system; in practice I've found it hard to justify building one for continuous updates when a 15-minute delay isn't the end of the world, but it's doable if you want it.
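
Roughly, and sticking with the hypothetical SQLite + DuckDB split sketched above (an `events` table, an `id` watermark):

    import sqlite3
    import duckdb

    # The columnar store is complete up to `watermark_id`; anything newer still
    # lives only in SQLite. Queries that need full freshness read both sides.
    def count_events(kind, need_fresh, watermark_id):
        duck = duckdb.connect("analytics.duckdb")
        total = duck.execute(
            "SELECT count(*) FROM events WHERE kind = ? AND id <= ?",
            [kind, watermark_id],
        ).fetchone()[0]
        duck.close()

        if need_fresh:
            lite = sqlite3.connect("app.db")
            total += lite.execute(
                "SELECT count(*) FROM events WHERE kind = ? AND id > ?",
                (kind, watermark_id),
            ).fetchone()[0]
            lite.close()

        return total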

> I'm not so sure about "continuously maintaining both means duplication and ensuring you get the worst case from both worlds". Maybe you're right, I'm just not so sure. I agree that it's duplicating storage requirements, but is that such a big deal? And I think if fast writes and lookups and fast batch reads are both possible at the cost of storage duplication, that would actually be the best case from both worlds?

I mean that if you want both views in a consistent world, then writes will bring things to a crawl, as both the row- and column-ordered data need to be updated before the write lock is released.

sanderjd 2 days ago | parent [-]

Yes! We're definitely talking about the same thing here! Definitely not thinking of consistent writes to both views.

Now that you said this about watermarks, I realize that this is definitely the same idea as streaming systems like flink (which is where I'm familiar with watermarks from), but my use cases are smaller data and I'm looking for lower latency than distributed systems like that. I'm interested in delays that are on the order of double to triple digit milliseconds, rather than 15 minutes. (But also not microseconds.)

I definitely agree that it's difficult to justify building this, which is why I keep looking for a system that already exists :)

SchwKatze 2 days ago | parent | prev [-]

I think you could build an ETL-ish workflow where you use SQLite for OLTP and DuckDB for OLAP, but I suppose it's very workload-dependent; there are several tradeoffs here.

sanderjd 2 days ago | parent [-]

Right. This is what I want, but transparently to the client. It seems fairly straightforward, but I keep looking for an existing implementation of it and haven't found one yet.

swyx 2 days ago | parent | prev [-]

very interesting. what's the vector indexing story like in duckdb these days?

also are there sqlite-duckdb sync engines or is that an oxymoron

cfors 2 days ago | parent [-]

https://duckdb.org/docs/stable/core_extensions/vss

It's not bad if you need something quick. I haven't had much need for ANN in duckdb since I'm using it for more analytical/exploratory work, but it's definitely there if you need it.
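
A minimal, in-memory sketch of the extension linked above (toy table, three-dimensional vectors):

    import duckdb

    con = duckdb.connect()  # in-memory; persistent HNSW indexes are still experimental
    con.execute("INSTALL vss; LOAD vss;")
    con.execute("CREATE TABLE docs (id INTEGER, embedding FLOAT[3])")
    con.execute(
        "INSERT INTO docs VALUES "
        "(1, [0.1, 0.2, 0.3]::FLOAT[3]), (2, [0.9, 0.8, 0.7]::FLOAT[3])"
    )
    con.execute("CREATE INDEX docs_hnsw ON docs USING HNSW (embedding)")

    # Approximate nearest neighbour: order by distance to the query vector.
    con.sql(
        "SELECT id FROM docs "
        "ORDER BY array_distance(embedding, [0.2, 0.2, 0.2]::FLOAT[3]) LIMIT 1"
    ).show()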

DrBazza 2 days ago | parent | prev | next [-]

From my perspective - do you even need a database?

SQLite is kind of the middle ground between a full-fat database and 'writing your own object storage'. To put it another way, it provides a 'regularised' object access API, rather than, say, a vector of variant types that you filter or map over.

TekMol 2 days ago | parent [-]

If I wrote my own data storage, I would just end up re-implementing SQLite. Why would I want to do that?

trevor-e 2 days ago | parent [-]

Not sure if this is quite what you are getting at, but the SQLite folks even mention this as a great use-case: https://www.sqlite.org/appfileformat.html

kopirgan 3 days ago | parent | prev | next [-]

As a backend database that's not multi-user, how many web connections doing writes can it realistically handle? Assuming the writes are small, say 100+ rows each?

Any mitigation strategy for larger use cases?

Thanks in advance!

loxs 3 days ago | parent | next [-]

After 2 years in production with a small (but write-heavy) web service... it's a mixed bag. It definitely does the job, but not having a DB server has drawbacks as well as benefits. The biggest is the lack of caching of the file/DB in RAM. As a result I have to do my own read caching, which is fine in Rust using the moka caching library, but it's still something you have to do yourself that would otherwise come for free with Postgres. This of course also makes it impossible to share the cache between instances; doing so would require employing Redis/memcached, at which point it would be better to use Postgres.

It has been OK so far, but I will definitely have to migrate to Postgres at some point, sooner rather than later.

TekMol 2 days ago | parent | next [-]

How would caching on the db layer help with your web service?

In my experience, caching makes the most sense at the CDN layer, which caches not only the DB requests but also the rendered output and everything else. So most requests do not even hit your server, and those that do need fresh data anyhow.

loxs 2 days ago | parent [-]

As I said, my app is write heavy. So there are several separate processes that constantly write to the database, but of course, often, before writing, they need to read in order to decide what/where to write. Currently they need to have their own read cache in order to not clog the database.

The "web service" is only the user facing part which bears the least load. Read caching is useful there too as users look at statistics, so calculating them once every 5-10 minutes and caching them is needed, as that requires scanning the whole database.

A CDN is something I don't even have. It's not needed for the amount of users I have.

If I were using Postgres, these writer processes + the web service would share the same read cache for free (coming from Postgres itself). The difference wouldn't be huge if I migrated right now, but I already have the custom caching in place.

kopirgan 2 days ago | parent | prev [-]

I am no expert, but SQLite does have an in-memory store, at least for tables that need it? Of course, syncing the writes to this store may need more work.

WJW 3 days ago | parent | prev | next [-]

A couple thousand simultaneous writes should be fine, depending on total system load, whether you're running on spinning disks or SSDs, and your p50/p99 latency demands; and of course you'd need to enable the WAL pragma in the first place so that readers and the writer don't block each other. Run an experiment to be sure about your specific situation.
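
For reference, a minimal sketch of the usual pragmas (the timeout value is just an illustrative choice):

    import sqlite3

    con = sqlite3.connect("app.db")
    con.execute("PRAGMA journal_mode=WAL")    # readers no longer block the writer
    con.execute("PRAGMA synchronous=NORMAL")  # common pairing with WAL
    con.execute("PRAGMA busy_timeout=5000")   # a second writer waits up to 5s for the lock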

laurencerowe 2 days ago | parent [-]

You also need BEGIN CONCURRENT to allow simultaneous write transactions.

https://www.sqlite.org/src/doc/begin-concurrent/doc/begin_co...

TekMol 3 days ago | parent | prev [-]

Why have multiple connections in the first place?

If your writes are fast, doing them serially does not cause anyone to wait.

How often does the typical user write to the DB? Often it is something like once per day (for example on Hacker News). Say a write takes 1/1000s. Then you can serve

    1000 * 60 * 60 * 24 = 86 million users
And nobody has to wait longer than a second when they hit the "reply" button, as I do now ...
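
A rough sketch of what "doing the writes serially" can look like, assuming a single process with one writer thread (the queue, table, and handler names are made up):

    import queue
    import sqlite3
    import threading

    # All writes funnel through one queue and one writer thread, so SQLite only
    # ever sees a single writer; request handlers just enqueue and return.
    write_queue = queue.Queue()

    def writer_loop(db_path):
        con = sqlite3.connect(db_path)
        con.execute("PRAGMA journal_mode=WAL")
        while True:
            sql, params = write_queue.get()
            con.execute(sql, params)
            con.commit()

    threading.Thread(target=writer_loop, args=("app.db",), daemon=True).start()

    # Called from a request handler: sub-millisecond, never waits on the DB lock.
    def record_reply(user_id, text):
        write_queue.put(
            ("INSERT INTO replies (user_id, body) VALUES (?, ?)", (user_id, text))
        )
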
frje1400 3 days ago | parent | next [-]

> If your writes are fast, doing them serially does not cause anyone to wait.

Why impose such a limitation on your system when you don't have to, given that other databases (Postgres, MySQL, etc.) are actually designed for multi-user systems?

TekMol 3 days ago | parent [-]

Because development and maintenance are faster and easier to reason about, increasing the chances you actually get to 86 million daily active users.

frje1400 2 days ago | parent [-]

So in this solution, you run the backend on a single node that reads/writes from an SQLite file, and that is the entire system?

withinboredom 2 days ago | parent [-]

That's basically how the web started. You can serve a ridiculous number of users from a single physical machine. It isn't until you get into the hundreds-of-millions-of-users ballpark that you need to actually create architecture. The "cloud" lets you rent a small part of a physical machine, so it feels like you need more machines than you do. But a modern server? Easily 16-32+ cores, 128+ GB of RAM, and hundreds of TB of space, all for less than $2k per month (amortized). Yeah, you need an actual (small) team of people to manage that, but it will get you so far that it is utterly ridiculous.

Assuming you can accept 99% uptime (that's about 3.65 days a year of downtime), and if you were on a single cloud in 2025, that's basically what last year looked like anyway.

kopirgan 2 days ago | parent [-]

I agree...there is scale and then there is scale. And then there is scale like Facebook.

We need not assume internet-scale, FB-level load for typical business apps where one instance may support a few hundred users max, or even a few thousand. Over-engineering under such assumptions is likely cost-ineffective and may even increase the surface area of risk. $0.02

downsplat a day ago | parent [-]

It goes much further than that... a single moderately sized VPS web server can handle millions of hard-to-cache requests per day, all hitting the DB.

Most will want to use a managed db, but for a real basic setup you can just run postgres or mysql on the same box. And running your own db on a separate VPS is not hard either.

kopirgan 2 days ago | parent | prev | next [-]

That depends on the use case. HN is not a good example; I am referring to business applications where users submit data. Of course, in these cases we are looking at hundreds, not millions, of users. The answer is: good enough.

nijave 2 days ago | parent | prev [-]

>How often does the typical user write to the DB

Turns out a lot when you have things like "last accessed" timestamps on your models.

Really depends on the app

I also don't think that calculation is valid. Your users aren't going to access the app uniformly over the course of a day. Invariably you'll hit queuing delays at a significantly smaller user count (though maybe the delays are acceptable).

andrewinardeer 3 days ago | parent | prev | next [-]

Pardon my ignorance, yet wasn't the prevailing thought a few years ago that you would never use SQLite in production? Has that school of thought changed?

WJW 2 days ago | parent | next [-]

SQlite as a database for web services had a little bit of a boom due to:

1. People gaining newfound appreciation of having the database on the same machine as the web server itself. The latency gains can be substantial and obviously there are some small cost savings too as you don't need a separate database server anymore. This does obviously limit you to a single web server, but single machines can have tons of cores and serve tens of thousands of requests per second, so that is not as limiting as you'd think.

2. Tools like Litestream will continuously back up all writes to object storage, so one web server having a hardware failure is not a problem as long as your SLA allows downtimes of a few minutes every few years (and let's be real, most small companies for which this would be a good architecture don't have any SLA at all).

3. SQLite has concurrent writes now, so it's gotten much more performant in situations with multiple users at the same time.

So for specific use cases it can be a nice setup because you don't feel the downsides (yet) but you do get better latency and simpler architecture. That said, there's a reason the standard became the standard, so unless you have a very specific reason to choose this I'd recommend the "normal" multitier architectures in like 99% of cases.

pixelesque 2 days ago | parent | next [-]

> SQLite has concurrent writes now

Just to clarify: unless I've missed something, this is only WAL mode allowing concurrent reads at the same time as writes; I don't think it can handle multiple concurrent writes at the same time?

WJW 2 days ago | parent | next [-]

As I understand it, there can be concurrent writes as long as they don't touch the same data (the same file system pages, to be exact). Also, the actual COMMIT part is still serialized and you need to begin your transactions with BEGIN CONCURRENT. If two transactions do conflict, the later one will be forced to ROLLBACK although you can still try again. It is up to the application to do this.

See also https://www.sqlite.org/src/doc/begin-concurrent/doc/begin_co...
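
The application-side retry might look roughly like this (assuming a SQLite build from the begin-concurrent branch, a connection in autocommit mode so transactions are managed manually, and a made-up accounts table):

    import sqlite3
    import time

    # isolation_level=None puts the connection in autocommit mode, so the
    # BEGIN CONCURRENT / COMMIT statements below are issued explicitly.
    con = sqlite3.connect("app.db", isolation_level=None)

    def transfer(src, dst, amount):
        for attempt in range(5):
            try:
                # BEGIN CONCURRENT defers taking the write lock until COMMIT.
                con.execute("BEGIN CONCURRENT")
                con.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
                con.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
                con.execute("COMMIT")
                return
            except sqlite3.OperationalError:
                # Page-level conflict with another transaction: roll back and retry.
                con.execute("ROLLBACK")
                time.sleep(0.01 * (attempt + 1))
        raise RuntimeError("write transaction kept conflicting")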

This type of limitation is exactly why I would recommend "normal" server-based databases like Postgres or MySQL for the vast majority of web backends.

asa400 2 hours ago | parent [-]

Also just a note: BEGIN CONCURRENT is not in mainline SQLite releases. You need to build your own from a branch. Not a huge deal but just something to note.

giovannibonetti 2 days ago | parent | prev [-]

I think only Turso — SQLite rewritten in Rust — supports that.

chasd00 2 days ago | parent | prev [-]

I’m a fan of SQLite but just want to point out there’s no reason you can’t have Postgres or some other rdbms on the same machine as the webserver too. It’s just another program running in the background bound to a port similar to the web server itself.

lpil 3 days ago | parent | prev | next [-]

SQLite is likely the most widely used production database due to its widespread usage in desktop and mobile software, and SQLite databases being a Library of Congress "sustainable format".

zerr 2 days ago | parent [-]

Most of the usage was/is as a local ACID-compliant replacement for txt/ini/custom local/bundled files though.

em500 2 days ago | parent | prev | next [-]

"Production" can mean many different things to different people. It's very widely used as a backend strutured file format in Android and iOS/macOS (e.g. for appls like Notes, Photos). Is that "production"? It's not widely used and largely inappropriate for applications with many concurrent writes.

Sqlite docs has a good overview of appropriate and inappropriate uses: https://sqlite.org/whentouse.html It's best to start with Section 2 "Situations Where A Client/Server RDBMS May Work Better"

almost 2 days ago | parent | prev | next [-]

The reason you heard that was probably because they were talking about a more specific circumstance. For example, SQLite is often used as a database during development in Django projects but not usually in production (there are exceptions of course!). So you may have read, when setting up Django or something similar, that the SQLite option wasn't meant for production, because usually you'd use a database like Postgres for that. That absolutely doesn't mean SQLite isn't used in production; it's just used for different things.

andrewinardeer 2 days ago | parent [-]

You are right. Thanks!

scott_w 3 days ago | parent | prev [-]

Only for large-scale, multi-user applications. It's more than reasonable as a data store in local applications, or at smaller scales where having the application and data layer on the same machine is acceptable.

If you're at a point where the application needs to talk over a network to your database, that's a reasonable heuristic that you should use a different DB. I personally wouldn't trust my data to NFS.

kunley 2 days ago | parent [-]

What is a "local application"?

loxs 2 days ago | parent [-]

Funny how people used to ask "what is a cloud application", and now they ask "what is a local application" :-)

Local as in "desktop application on the local machine" where you are the sole user.

scott_w 2 days ago | parent | next [-]

This, though I think other posters have pointed to a web app/site that’s backed by SQLite. It can be a perfectly reasonable approach, I think, as the application is the web server and it likely accesses SQLite on the same machine.

kunley 2 days ago | parent | prev [-]

That commenter's idea clearly wasn't about a desktop application on a local machine. That is why I asked.

scott_w a day ago | parent [-]

You mean Andrew's comment? I took it in the broadest sense to try and give a more complete answer.

quotemstr 2 days ago | parent | prev | next [-]

FWIW (and this is IMHO of course) DuckDB makes working with random JSON much nicer than SQLite, not least because I can extract JSON fields to dense columnar representations and do it in a deterministic, repeatable way.

The only thing I want out of DuckDB core at this point is support for overriding the columnar storage representation for certain structs. Right now, DuckDB decomposes structs into fields and stores each field in a column. I'd like to be able to say "no, please, pre-materialize this tuple subset and store this struct in an internal BLOB or something".
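
On the first point, a small sketch of what that extraction can look like (the file name and fields are assumptions):

    import duckdb

    # read_json_auto infers a schema from the raw JSON; the casts pin the types
    # down explicitly so the extraction into dense columns is repeatable.
    con = duckdb.connect()
    con.execute("""
        CREATE TABLE events AS
        SELECT
            user_id::BIGINT AS user_id,
            kind::VARCHAR   AS kind,
            amount::DOUBLE  AS amount
        FROM read_json_auto('events.json')
    """)
    con.sql("SELECT kind, sum(amount) FROM events GROUP BY kind").show()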

randomtoast 2 days ago | parent | prev | next [-]

I would say SQLite when possible, PostgreSQL (incl. extensions) when necessary, DuckDB for local/hobbyist data analysis and BigQuery (often TB or PB range) for enterprise business intelligence.

odie5533 3 days ago | parent | prev | next [-]

For as much talk as I see about SQLite, are people actually using it or does it just have good marketers?

asa400 an hour ago | parent | next [-]

Do you have a specific use case you're curious about? It's the most widely deployed database software of all time. https://sqlite.org/mostdeployed.html

SJMG 2 days ago | parent | prev | next [-]

It's the standard for mobile. That said, in server-side enterprise computing, I know no one who uses it. I'm sure there are applications, but in this domain you'd need a good justification for not following standard patterns.

I have used DuckDB on an application server because it computes aggregations lightning fast, which saved the app from needing caching, background services, and all the invalidation and failure modes that come with those two.

TekMol 3 days ago | parent | prev | next [-]

Among people who can actually code (as opposed to just stitching together services), I see it used all around.

For someone who openly describes his stack and revenue, look up Pieter Levels and how he serves hundreds of thousands of users and makes millions of dollars per year using SQLite as the storage layer.

sgbeal 2 days ago | parent | prev | next [-]

> are people actually using it or does it just have good marketers?

_You_ are using it right this second. It's storing your browser's bookmarks (at a minimum, and possibly other browser-internal data).

greenavocado 2 days ago | parent | prev | next [-]

If you use desktops, laptops, or mobile phones, there is a very good chance you have at least ten SQLite databases in your possession right now.

CyberDildonics 2 days ago | parent | prev [-]

It is fantastic software, have you ever used it?

odie5533 2 days ago | parent [-]

I don't have a use case for it. I've used it a tiny bit for mocking databases in memory, but because it's not fully Postgres, I've switched entirely to TestContainers.

CuriouslyC 2 days ago | parent | prev | next [-]

I think the right pattern here is edge sharding of user data. Cloudflare makes this pretty easy with D1/Hyperdrive.

phendrenad2 2 days ago | parent | prev [-]

Man, I hope so. Bailing people out of horribly slow NoSQL databases is good business.