Remix.run Logo
HDThoreaun 3 days ago

No, it is not a judgement call. The query planner should be storing the distributions of the values in every index. This makes it obvious which pushdown to do here. Again, basic stuff. Youre right though not quite simple as "push down as much as possible", it is one step past that.

Jupe 3 days ago | parent | next [-]

Agreed. Isn't this precisely why key statistics (table statistics) are maintained in many DB systems? Essentially, always "push down" the predicate with the worst statistics and always execute (early) the predicates with high selectivity.

I'd be very surprised if virtually every RDBMS doesn't do this already.

jmalicki 3 days ago | parent | prev [-]

Without storing the joint distribution of the values corresponding to the conditions that span multiple tables, it can be hard to know what's a win.

Seattle3503 3 days ago | parent [-]

Postgres by default computes univariate stats for each column and uses those. If this is producing bad query plans, you can extend the statistics to be multivariate for select groups of columns manually. But to avoid combinatorially growth of stats related storage and work, you have to pick the columns by hand.

jmalicki 2 days ago | parent [-]

That still isn't multivariate stats across a join.