geysersam 13 hours ago:
> Using != or NOT IN (...) is almost always going to be inefficient.

Why do you say that? My understanding is that as long as the RHS of NOT IN is constant (in the sense that it doesn't depend on the row), the condition is basically a hash table lookup, which is typically efficient if the lookup table is not massive. What's the more efficient alternative?
Sesse__ 13 hours ago:
I'm going to assume we're talking about a subquery here (SELECT * FROM t1 WHERE x NOT IN (SELECT x FROM t2)). If you're just talking about a static list, then the basic problem is the amount of data you get back. :-)

The biggest problem with NOT IN is that it has very surprising NULL behavior: due to the way it's defined, if the subquery produces any NULL in the compared column, the predicate can never be true, so _no_ rows come back. If the column is non-nullable, then sure, you can convert it into an antijoin and optimize it together with the rest of the join tree. If not, it usually ends up being something more complicated.

For this reason, NOT EXISTS should usually be preferred. The syntax sucks, but it's much easier to rewrite to an antijoin.
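A minimal sketch of the pitfall, using made-up data in the t1/t2 tables from the example above (exact planner behavior varies by database):

    -- Hypothetical data to show the NULL pitfall.
    CREATE TABLE t1 (x INT);
    CREATE TABLE t2 (x INT);
    INSERT INTO t1 VALUES (1), (2);
    INSERT INTO t2 VALUES (1), (NULL);

    -- NOT IN: the NULL in t2 makes the predicate UNKNOWN for x = 2,
    -- so this query returns no rows at all.
    SELECT * FROM t1 WHERE x NOT IN (SELECT x FROM t2);

    -- NOT EXISTS: evaluated per outer row, treats the NULL as simply
    -- "no match", so this returns the row with x = 2 and can be
    -- rewritten to a plain antijoin.
    SELECT * FROM t1 WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.x = t1.x);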
SoftTalker 13 hours ago:
Because they generally can't use an index to narrow things down. If I have a table of several million rows and I want to find rows WHERE foo NOT IN ('A', 'B', 'C'), that's a full table scan, or possibly a full index scan if foo is indexed, unless there are other conditions that narrow it down.
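As a rough sketch (hypothetical table and index names; the actual plan depends on the database and its statistics, which you can check with EXPLAIN):

    -- Hypothetical table with an indexed status column.
    CREATE TABLE orders (id BIGINT PRIMARY KEY, status TEXT);
    CREATE INDEX orders_status_idx ON orders (status);

    -- Exclusion predicate: matches nearly every row, so the index on
    -- status doesn't help much; expect a full table or index scan.
    EXPLAIN SELECT * FROM orders WHERE status NOT IN ('A', 'B', 'C');

    -- Inclusion predicate: selective, so the planner can probe the
    -- index for just the listed values.
    EXPLAIN SELECT * FROM orders WHERE status IN ('A', 'B', 'C');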