Not really. Replication lag is generally an accepted trade-off. Sync replication is rarely worth it, since you take a 30% performance hit on commits and add more single points of failure.

We will add some replication-lag-based routing soon. It will prioritize replicas with the lowest lag to maximize the chance of the query succeeding, and remove replicas from the load balancer entirely if they have fallen far behind. Incidentally, removing query load helps them catch up, so this could be used as a "self-healing" mechanism.
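
To make the idea concrete, here is a rough sketch of lag-based replica selection in plain Ruby. It is not the actual implementation: the `Replica` struct, the `MAX_LAG_SECONDS` cutoff, and both helper methods are made up for illustration, and the lag numbers are assumed to come from whatever monitoring you already have.

```ruby
# Hypothetical replica record: a name plus its measured replication lag in seconds.
Replica = Struct.new(:name, :lag_seconds)

# Assumed cutoff: replicas lagging more than this are taken out of rotation.
MAX_LAG_SECONDS = 10.0

# Returns the replicas still eligible for reads, lowest lag first.
def eligible_replicas(replicas, max_lag: MAX_LAG_SECONDS)
  replicas
    .select { |r| r.lag_seconds <= max_lag }
    .sort_by(&:lag_seconds)
end

# Picks the replica to route to, or nil when every replica has fallen too far
# behind (the caller could then fall back to the primary).
def pick_replica(replicas, max_lag: MAX_LAG_SECONDS)
  eligible_replicas(replicas, max_lag: max_lag).first
end

replicas = [
  Replica.new("replica-a", 0.4),
  Replica.new("replica-b", 0.1),
  Replica.new("replica-c", 42.0) # far behind, so it is removed from rotation
]

puts pick_replica(replicas).name
```

A replica that is filtered out stops receiving reads, which is what gives it room to catch up; once its lag drops back under the cutoff it becomes eligible again.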

▲ jackfischer 4 hours ago | parent [-]

It sounds like this is one of the few places that might be a leaky abstraction in that queries _might_ fail and the failure might effectively be silent?

▲ levkk 4 hours ago | parent [-]

It can be silent, but usually it's loud and confusing because people do something like this (Rails example):

    user = User.create(email: "test@test.com")
    SendWelcomeEmail.perform_later(user.id)

And the job code fetches the row like so:

    user = User.find(id)

This blows up because `find` raises `ActiveRecord::RecordNotFound` if the record isn't there. Job queues typically use replicas for reads, and the row may not have replicated yet by the time the job runs. This is a common gotcha: code that runs async expects the data to be there right after creation.
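
One common workaround is to fall back to the primary when the replica doesn't have the row yet. Below is a minimal sketch in plain Ruby, with no Rails required: `FakeNode`, `RecordNotFound`, and `find_with_primary_fallback` are hypothetical stand-ins for the real database connections and for `ActiveRecord::RecordNotFound`.

```ruby
# Raised when a row is missing; stands in for ActiveRecord::RecordNotFound.
class RecordNotFound < StandardError; end

# Hypothetical in-memory store standing in for a database node.
class FakeNode
  def initialize(rows = {})
    @rows = rows
  end

  def insert(id, row)
    @rows[id] = row
  end

  # Mirrors the behavior of `find`: raise when the row is absent.
  def find(id)
    @rows.fetch(id) { raise RecordNotFound, "no row with id=#{id}" }
  end
end

# Read from the replica first; if the row has not replicated yet,
# retry once against the primary, which has the committed write.
def find_with_primary_fallback(id, replica:, primary:)
  replica.find(id)
rescue RecordNotFound
  primary.find(id)
end

primary = FakeNode.new
replica = FakeNode.new # lagging: has not received the insert yet

primary.insert(1, { email: "test@test.com" })

user = find_with_primary_fallback(1, replica: replica, primary: primary)
puts user[:email]
```

The fallback only costs a primary read when the replica misses. In an actual Rails job, a similar effect is often achieved by retrying the job after a short delay, e.g. with Active Job's `retry_on ActiveRecord::RecordNotFound`.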

There can be others, of course, especially in fintech where you have an atomic ledger, but people are usually pretty conscious about this and send those types of queries to the primary.

In general though, I completely agree, this is leaky and an unsolved problem. You can have performance or accuracy, but not both, and most solutions skew towards performance and make applications handle the lack of accuracy.

▲ jackfischer 4 hours ago | parent [-]

Makes sense, appreciate it