| ▲ | ncruces 3 days ago |
| Yes. But here they are deciding between pushing down "o.status == shipped" and "u.email == address@" in parallel on both sides and then joining (which is what they already did), versus first evaluating "u.email == address@" and then pushing "u.id == o.user_id" down into orders. That is a judgment call. Their planner is pretty dumb not to know which one is better, but "push down as much as possible" doesn't cut it: you still have to decide what to push down and why. |
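To make the two alternatives concrete, they can be written out as SQL. The orders/users schema and the example email below are assumptions inferred from the quoted predicates, not the article's actual schema; this is only a sketch of the two plan shapes.

```sql
-- Shape (a): push both filters down independently, then join the filtered sets.
SELECT o.*
FROM (SELECT * FROM orders WHERE status = 'shipped') AS o              -- assumed table/column names
JOIN (SELECT * FROM users  WHERE email = 'address@example.com') AS u   -- placeholder email
  ON o.user_id = u.id;

-- Shape (b): resolve the (near-unique) email first, then push the join key
-- down into orders. MATERIALIZED keeps Postgres from flattening the CTE.
WITH u AS MATERIALIZED (
  SELECT id FROM users WHERE email = 'address@example.com'
)
SELECT o.*
FROM orders AS o
WHERE o.user_id IN (SELECT id FROM u)
  AND o.status = 'shipped';
```

Shape (b) only wins when the email predicate is far more selective than the status predicate, which is exactly the judgment the planner has to make.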
|
| ▲ | HDThoreaun 3 days ago | parent [-] |
| No, it is not a judgement call. The query planner should be storing the distribution of the values in every index, which makes it obvious which pushdown to do here. Again, basic stuff. You're right, though, that it's not quite as simple as "push down as much as possible"; it is one step past that. |
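For reference, this is roughly what those per-column value distributions look like in Postgres (populated by ANALYZE); the orders/users schema is the same assumed one as above.

```sql
-- Per-column statistics Postgres keeps for the two filter columns.
SELECT tablename, attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE (tablename = 'orders' AND attname = 'status')   -- assumed names
   OR (tablename = 'users'  AND attname = 'email');
-- 'shipped' will typically appear in most_common_vals with a large frequency
-- (a weak filter), while email is near-unique (n_distinct = -1 means roughly
-- one distinct value per row), so the email predicate is the one to run first.
```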
| |
| ▲ | Jupe 3 days ago | parent | next [-] |
| Agreed. Isn't this precisely why key statistics (table statistics) are maintained in many DB systems? Essentially, push down and execute early the predicates with the highest selectivity, the ones the statistics say will eliminate the most rows. I'd be very surprised if most RDBMSs didn't do this already. |
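A quick way to check whether a given planner is making that choice is to look at the estimated row counts in its plan, which come from those statistics; a sketch against the same assumed schema:

```sql
-- EXPLAIN shows the plan shape and the row estimates derived from statistics.
EXPLAIN
SELECT o.*
FROM users AS u
JOIN orders AS o ON o.user_id = u.id
WHERE u.email  = 'address@example.com'   -- placeholder email
  AND o.status = 'shipped';
-- A reasonable plan looks up the user by email first (estimated ~1 row) and
-- then fetches that user's orders, rather than scanning all shipped orders.
```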
| ▲ | jmalicki 3 days ago | parent | prev [-] |
| Without storing the joint distribution of the values corresponding to the conditions that span multiple tables, it can be hard to know what's a win. |
| ▲ | Seattle3503 3 days ago | parent [-] |
| Postgres by default computes univariate stats for each column and uses those. If this produces bad query plans, you can manually extend the statistics to be multivariate for select groups of columns. But to avoid combinatorial growth in stats-related storage and work, you have to pick the columns by hand. |
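For reference, a minimal sketch of the extended (multivariate) statistics being described; the table and column names are hypothetical:

```sql
-- Hypothetical example: tell Postgres to track cross-column statistics for a
-- hand-picked pair of columns, then repopulate the stats with ANALYZE.
CREATE STATISTICS orders_user_status_stats (ndistinct, dependencies)
  ON user_id, status FROM orders;

ANALYZE orders;

-- The collected multivariate stats are visible here:
SELECT statistics_name, attnames, n_distinct, dependencies
FROM pg_stats_ext
WHERE statistics_name = 'orders_user_status_stats';
```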