gunnarmorling 3 hours ago

They can happen, yes, although this should be a rather rare event (the most common reason would be misconfiguration, such as a K8s pod with memory limits set too low). That said, work towards exactly-once delivery has been done [1], utilizing the support for exactly-once semantics (EOS) in Kafka Connect (KIP-618).

For Postgres in particular, though, consumers can detect and reject duplicates really easily by tracking a watermark for the {Commit LSN / Event LSN} tuple, which is monotonically increasing. A consumer just needs to compare the tuple from the incoming event to the highest one it has received before. If the incoming value is lower, the event must be a duplicate. We added support for exposing this via the `source.sequence` field a while back upon request by the Materialize team btw.
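
A minimal sketch of that watermark check, in Python. It assumes each event is an already-decoded dict and that `source.sequence` is a JSON-encoded string holding two LSNs as decimal strings, e.g. `"[\"100\",\"110\"]"`, with the first entry possibly `null`; verify the exact encoding against your connector version. `parse_sequence` and `Deduplicator` are hypothetical names, not part of Debezium. The sketch also treats an equal tuple as a redelivery of the same event, which is an assumption on top of the "lower" rule above.

```python
import json
from typing import Optional, Tuple

# (commit LSN, event LSN), compared lexicographically as a tuple
Watermark = Tuple[int, int]


def parse_sequence(event: dict) -> Watermark:
    """Extract the LSN tuple from an event's source.sequence field.

    Assumption: the field is a JSON array encoded as a string, with each
    LSN given as a decimal string and the first entry possibly null.
    """
    commit_lsn, event_lsn = json.loads(event["source"]["sequence"])
    return (int(commit_lsn or 0), int(event_lsn))


class Deduplicator:
    """Tracks the highest tuple seen so far and rejects anything not above it."""

    def __init__(self) -> None:
        self.high: Optional[Watermark] = None

    def accept(self, event: dict) -> bool:
        seq = parse_sequence(event)
        # Assumption: an equal tuple is the same event delivered again.
        if self.high is not None and seq <= self.high:
            return False
        self.high = seq
        return True
```

Usage would be one `Deduplicator` per ordered stream (e.g. per topic partition), calling `accept(event)` before applying each event. The watermark lives only in memory here, so a consumer that restarts would need to persist it alongside its own state.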

[1] https://debezium.io/documentation/reference/stable/configura....

umanwizard 2 hours ago

> They can happen, yes, although this should be a rather rare event

For our use case, it didn't matter if it was rare or not: the fact that it could happen at all meant we needed to be robust to it, which basically meant storing the entire database in memory.

> We added support for exposing this via the `source.sequence` field a while back upon request by the Materialize team btw.

Yes, I helped work on this! I'm not sure whether Materialize is still using it (it's been years since I've thought about MZ/Debezium integration) but it was helpful, thanks.