| ▲ | leprechaun1066 6 hours ago | |
The aj function at its heart is a bin (https://code.kx.com/q/ref/bin/) search between the two tables, on the requested columns, to find the indices of the right table to zip onto the left table.
becomes
The rest of the aj function internals are there to handle edge cases, handling missing columns and options for filling nulls.A lot of the joins can be distilled to the core operators/functions in a similar manner. For example the plus-join is | ||
| ▲ | chrisaycock 5 hours ago | parent [-] | |
Indeed, my very first attempt used numpy.searchsorted: https://numpy.org/doc/2.2/reference/generated/numpy.searchso... I couldn't figure-out how Arthur's bin matched on symbol though, so I switched to a linear scan on the right table to record the last-seen index for each "by" element. While it worked, my hash table was messy because I relied on Python to handle a whole tuple as a key, which had some issues during initial testing. The asof join I wrote for Empirical properly categorizes the keys before they are matched. That approach worked far better. | ||