umanwizard 8 hours ago

Perhaps the situation has gotten better since I looked a few years ago, but my experience is the Debezium project doesn’t really guarantee exactly-once delivery. Meaning that if row A is replaced by row B, you might see (A, -1), (A, -1), (B, +1), if for example Debezium was restarted at precisely the wrong time. Then if you’re using this stream to try to keep track of what’s in the database, you will think you have negatively many copies of A.
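To make the failure mode concrete, here is a minimal sketch of a consumer that keeps per-row counts by summing the diffs it sees. The names `apply_changes` and `state` are made up for illustration, and this is not how Debezium or Materialize actually represent changes.

```python
from collections import Counter

def apply_changes(state, changes):
    """Fold a stream of (row, diff) pairs into a per-row multiplicity count."""
    for row, diff in changes:
        state[row] += diff
    return state

# The database initially holds exactly one copy of row A.
state = Counter({"A": 1})

# Row A is replaced by row B, but the retraction of A is delivered twice.
apply_changes(state, [("A", -1), ("A", -1), ("B", +1)])

# Rows whose count dropped below zero, which can never be true of a real table.
impossible = {row: count for row, count in state.items() if count < 0}
```

After the duplicated retraction, `state["A"]` is -1, so the consumer's view no longer corresponds to any possible database state.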

It sounds silly, but it caused enormous headaches and problems for the project I was working on (Materialize), one of whose main use cases is creating incrementally maintained live materialized views on top of replicated Postgres (or MySQL) data.

nchmy 5 hours ago | parent | next [-]

Debezium published this doc on exactly-once delivery with their most recent 3.3.0 version. They don't support it natively, but say it can be achieved via Kafka Connect.

https://debezium.io/documentation/reference/stable/configura...

You could probably achieve something similar with the NATS JetStream sink as well, which has similar capabilities, though I think it doesn't have quite the same guarantees.

I switched to using Debezium a few months ago, after a Golang alternative to Debezium + Kafka Connect - ConduitIO - went defunct. I should have been using Debezium all along, as it is clearly the most mature, most stable option in the space, with the best prospects for long-term survival. Highly recommended, even if it is JVM (though they're currently doing some interesting stuff with Quarkus and GraalVM that might lead to a JVM-free binary at some point).

gunnarmorling 8 hours ago | parent | prev [-]

Debezium generally produces each change event exactly once if there are no unclean connector shut-downs. If that's not the case, I'd consider this a bug which ought to be fixed.

(Disclaimer: I used to lead the Debezium project)

umanwizard 4 hours ago | parent [-]

The problem is that unclean connector shutdowns are a thing that can happen in real life.

gunnarmorling 3 hours ago | parent [-]

They can happen, yes, although this should be a rather rare event (the most common reason would be misconfiguration, such as a K8s pod with too low memory limits). That said, work towards exactly-once has been done [1], utilizing the support for EOS in Kafka Connect (KIP-618).

In particular for Postgres, consumers can detect and reject duplicates really easily though, by tracking a watermark for the {Commit LSN / Event LSN} tuple, which is monotonically increasing. So a consumer just needs to compare the value for that tuple from the incoming event to the highest value it has received before. If the incoming value is lower, the event must be a duplicate. We added support for exposing this via the `source.sequence` field a while back upon request by the Materialize team btw.
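A rough sketch of that watermark check, assuming the consumer has already extracted the (commit LSN, event LSN) pair from each event as a tuple of integers. The `Event` type and `drop_duplicates` function are hypothetical names, and parsing `source.sequence` itself is left out. The sketch also treats an equal position as a duplicate, which assumes every event has a distinct position.

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple

# (commit LSN, event LSN); Python tuples compare lexicographically,
# which matches ordering by commit LSN first and event LSN second.
Position = Tuple[int, int]

class Event(NamedTuple):
    position: Position
    payload: str

def drop_duplicates(events: Iterable[Event]) -> Iterator[Event]:
    """Yield only events whose position is above the highest one seen so far."""
    watermark: Optional[Position] = None
    for event in events:
        if watermark is not None and event.position <= watermark:
            # At or below the watermark: already processed, so skip it.
            continue
        watermark = event.position
        yield event

# Simulated stream: after an unclean restart, the last two events are replayed.
stream = [
    Event((100, 101), "delete A"),
    Event((100, 102), "insert B"),
    Event((100, 101), "delete A"),  # replayed
    Event((100, 102), "insert B"),  # replayed
    Event((200, 201), "insert C"),
]

accepted = [event.payload for event in drop_duplicates(stream)]
```

The consumer only has to remember a single position, not the contents of the tables, though the watermark would need to be persisted alongside the consumer's own state to survive its restarts.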

[1] https://debezium.io/documentation/reference/stable/configura....

umanwizard 2 hours ago | parent [-]

> They can happen, yes, although this should be a rather rare event

For our use case, it didn't matter if it was rare or not: the fact that it could happen at all meant we needed to be robust to it, which basically meant storing the entire database in memory.

> We added support for exposing this via the `source.sequence` field a while back upon request by the Materialize team btw.

Yes, I helped work on this! I'm not sure whether Materialize is still using it (it's been years since I've thought about MZ/Debezium integration) but it was helpful, thanks.