majkinetor 12 hours ago

Anybody with experience using DuckDB to quickly select a page of filtered transactions from a single table with a couple of billion records and, say, 30 columns, where each column can be filtered with a simple WHERE clause? Say, 10 years of payment order data. I am wondering since this is not an analytical scenario.

Doing that in Postgres takes some time, and even a simple count(*) takes a long time (with all columns indexed).
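
Roughly this shape of query, with made-up table and column names:

    -- a page of filtered transactions
    SELECT *
    FROM payment_orders
    WHERE status = 'SETTLED'
      AND amount >= 100
      AND created_at >= DATE '2020-01-01'
    ORDER BY created_at DESC
    LIMIT 50 OFFSET 10000;

    -- and the slow part
    SELECT count(*)
    FROM payment_orders
    WHERE status = 'SETTLED'
      AND amount >= 100
      AND created_at >= DATE '2020-01-01';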

morganherlocker 10 hours ago

I've used DuckDB a lot at this scale, and I would not expect something like this to take more than a few seconds, if that. The only slow DuckDB queries I have encountered either involve complex joins or glob across many files.
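
To make that concrete, the fast and slow cases look roughly like this (file and column names are made up):

    -- fast: scanning a native table or a single Parquet file
    SELECT count(*) FROM 'orders.parquet'
    WHERE status = 'FAILED';

    -- slower: globbing across thousands of small files
    SELECT count(*) FROM 'orders/year=*/month=*/*.parquet'
    WHERE status = 'FAILED';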

cess11 7 hours ago

I'm not so sure the common index algorithms would speed up a count. How often is the table updated? If it's updated often and the count is also queried often, run the count on a schedule and store the result separately; if the count isn't queried often, refresh it less frequently.
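
A sketch of that in Postgres, with made-up names:

    -- cached count, refreshed on a schedule (cron, pg_cron, etc.)
    CREATE TABLE payment_orders_count (total bigint, refreshed_at timestamptz);
    INSERT INTO payment_orders_count VALUES (0, now());

    -- the scheduled job:
    UPDATE payment_orders_count
    SET total = (SELECT count(*) FROM payment_orders),
        refreshed_at = now();

    -- or, if an estimate is good enough, planner statistics are instant:
    SELECT reltuples::bigint AS approx_total
    FROM pg_class
    WHERE relname = 'payment_orders';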

From what you describe, I'd expect a list of column-value pairs under a WHERE to resolve pretty fast if it uses indices and doesn't fish out large amounts of data at once.
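
One related thing worth checking: with 30 single-column indexes, the planner has to combine bitmap scans; a composite index matching the most common filter combination is usually faster (names made up again):

    CREATE INDEX idx_orders_status_created
        ON payment_orders (status, created_at);

    -- EXPLAIN shows whether the index is actually used
    EXPLAIN ANALYZE
    SELECT * FROM payment_orders
    WHERE status = 'PENDING' AND created_at >= DATE '2024-01-01'
    LIMIT 50;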