▲ | jitl 15 hours ago | ||||||||||||||||||||||||||||||||||||||||||||||
No matter how much time I spend contemplating recursive CTE examples, I just cannot “get it” enough to write my own without a lot of trial and error and head-scratching. I would love to take a 2 hour class on “thinking in SQL for hackers” or something, but I haven’t really found anything to improve my mental model from “broken” to “working” so far. | |||||||||||||||||||||||||||||||||||||||||||||||
▲ | CGamesPlay 14 hours ago | parent | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||
A prerequisite to understanding is knowing the CTE syntax ("WITH"). That's just a group of SQL queries that can refer to one another. It's extremely useful for making modular SQL queries and has nothing to do with recursion. Then when you use "WITH RECURSIVE", the queries can now refer back to themselves. This is just a for loop over a queue of SQL results, conceptually. The part before the UNION ALL fills the queue to start, and the part after the UNION ALL runs once for each result in the queue, adding all results back into the queue. If you understand this, then you can understand the Sudoku or Mandelbrot examples (definitely don't start with trying to understand these two though). For example, the Sudoku example contains one recursive query, "x(s, ind)". As explained on the page, "s" is the puzzle (unsolved spaces shown as ".") and "ind" is the index of the first unsolved space (or 0 for a solved puzzle). It creates an unsolved puzzle in the initial setup. The for loop body finds all valid values for the first unsolved space, and the next index to solve; then puts all these results into the queue. The final (non-recursive) SELECT in the CTE looks over all results in the queue, and returns the one where the index is 0 (the solved puzzles). | |||||||||||||||||||||||||||||||||||||||||||||||
▲ | cryptonector 10 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||
SQL recursive queries are just loops. The CTE is a UNION or UNION ALL query (well, never use UNION ALL unless you really want to loop infinitely!) where one side is the query that seeds the CTE and the other is executed and its outputs added to the CTE, and the last part is repeated until the CTE stops growing. There's just one more detail: the "recursive" side of the UNION query needs to do a JOIN to the CTE itself (thus the "recursion"), but this doesn't change the loop nature of the thing. That's it. It's just a simple loop. (Yes, recursion is looping. But in this case it's very clear that a stack is not needed, that the recursion is "tail recursion" if know what that is.) The hard part lies in getting the JOIN in the "recursive" side of the UNION query right. Here's a transitive closure query:
Here `SELECT parent, child FROM relationships` is the seed, and the rest (`SELECT c.parent, r.child ...`) is the recursive part of the query that gets repeated until the whole set of results from the seed and all the repetitions of the recursive part stops growing. The recursive part simply says to add to the `closure` all the children of children of tuples in the closure, but with the parents from the closure as the parents. So if this were human parent/child relationships then initially you might have your parents' relationships to you, and also yours to your children, and this will find that your parents are ancestors to your children. | |||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||
▲ | remram 14 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||
An crucial point for me was realizing that "recursive CTEs" are not really recursive but better understood as iterative, in other words a loop. The results of each iteration are fed into the next iteration, until no new result is produced. | |||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||
▲ | danielheath 14 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||
“Thinking in sql” is hard because it’s an awkward syntax for the (much simpler) relational algebra. Learn to think in terms of the relational algebra, and how to translate that to/from SQL, and it starts making sense. | |||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||
▲ | cryptonector 8 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||
I highly recommend the O'Reilly SQL Pocket Guide (https://www.oreilly.com/library/view/sql-pocket-guide/978149...). Its explanation of CTEs is fantastic, and it does a great job on many other parts of SQL, and it's a short and sweet book you could read in a few hours. And it will server as a reference long after you first absorb it. | |||||||||||||||||||||||||||||||||||||||||||||||
▲ | tlarkworthy 9 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||
Some things that helped me Every query response in SQL is a rectangular matrix [1]. JOINs add columns sideways. UNIONs add rows vertically[2]. [1] Which is why tree shaped data is awkward to query in SQL in one round-trip. [2] From this you realize why the column names have to match to apply a UNION, and why recursion is something to do with a UNION. | |||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||
▲ | pawelduda 14 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||
It's unusual and not a commonly needed tool in SQL. I always need a quick refresher on how to write these after a longer break | |||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||
▲ | hansvm 13 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||
For "ordinary" working programmers (some denominator of reasonably common knowledge across the industry without any specific skills that help with CTEs in particular), there are a couple mental models I find helpful: 1. Recursion and iteration are duals of each other. Anywhere a "recursive" CTE is a good tool for the job, there exists a natural for-loop or while-loop you would write in an ordinary programming language to solve the same job. Figure out what that looks like, and then you can translate it to SQL. Optimizing further often isn't necessary. The general form isn't terrible (ellipses hide the extra columns you'd have to manually include in most SQL dialects):
1 (continued). You can explicitly encode for-loops and while-loops, doing all the normal sorts of things an ordinary programming language allows. SQL is just a different language representing all the things you already know how to do; don't make it complicated until performance becomes a problem.2. There exists a concept of "equality saturation" that's exceptionally powerful in a number of domains. Everything in (1) is familiar to an ordinary working programmer, but the database's mental model of a recursive CTE is
2 (continued 0). One model of solving a sudoku puzzle is an iterative/recursive branch-and-bound algorithm (try a reasonable thing, expand all the easily learnable knowledge, exit if impossible or if done (recursing at each level), go to the next reasonable thing). One "equality saturation" version of that solution is (plus a stopping condition somewhere):2a. For each partial potential solution 2b. For all the ways in which the potential solution is viable 2c. Expand into all those new potentialities 2 (continued 1). That ability to describe a monotonically increasing set of things which finitely approaches a boundary is powerful in math, powerful in search, powerful in optimization (the Rust EGG crate (e-graphs good) has reasonable documentation for one particular concrete way in which that idea could manifest, if such concreteness helps you learn -- whether you know/like Rust or not), and so on. Gradient descent is just expanding your information till you don't have much more to learn. Optimizing a program is just expanding the set of optimization-based re-writes till you don't have any more re-writes (and pick the best one). Parsing a document is just adding to the things you know about it till no parsing rules can glean any more information. Since that thinking modality is how the database treats your query, you'll usually have better optimized queries if you can naturally formulate your problem in that language (as opposed to the naive iterative solution I proposed in (1)). Not all problems can be thus phrased, but if you're doing a lot of work in your database with recursive CTEs, it's a good idea to spend a week or three hammering home. 3. Combining (1) and (2) a bit, your database will usually have a cost somewhere around O(all_the_rows_you_produce) when evaluating recursive CTEs. These tasks only get hard when performance is a problem and you have to figure out how to take the naive ideas from (1) and transform them into the expected model of (2) in a way that actually reduces unnecessary work. For the sudoku example, you can do that by adding a bit more internal state to transform the breadth-first-search I described into a depth-first-search (making the solution _more_ iterative, interestingly; the "natural" solution is very slow comparatively), but in general you might have to get very creative or might not actually have a reasonable solution available. | |||||||||||||||||||||||||||||||||||||||||||||||
▲ | daelon 15 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||
I would also love to take that class! | |||||||||||||||||||||||||||||||||||||||||||||||
▲ | nbbaier 15 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||
Sign me up for this class too! | |||||||||||||||||||||||||||||||||||||||||||||||
▲ | hobs 14 hours ago | parent | prev [-] | ||||||||||||||||||||||||||||||||||||||||||||||
It's just going to be practice, recursion in general is annoying to think about. Start simple, have a loop you want to write instead of a recursive thing, write it step by step instead of trying to do it all at once. I was implementing newton raphson a few years ago in SQL (so as not to implement it as a straight loop) and iterating over the problem several times really helped. Get a query. Now think of the next step in the iteration, and you're basically writing a query that connects to that. And now, it runs again and again based on the previous criteria. If you can isolate each part of the query for each logical step you're going to have a much simpler problem to mentally solve. |