| ▲ | jandrewrogers 2 hours ago | |
Achieving adequate performance at 1T edges in one aspect requires severe tradeoffs in other aspects, making every implementation impractical at that scale. You touched on a couple of the key issues when I was working in this domain. There is no single machine constraint, just the observation that we routinely run non-graph databases at similar scale on single machines without issue. It doesn't scale on in-memory supercomputers either, so the hardware details are unrelated to the problem: - Graph database with good query performance typically has terrible write performance. It doesn't matter how fast queries are if it takes too long to get data into the system. At this scale there can be no secondary indexing structures into the graph; you need a graph cutting algorithm efficient for both scalable writes and join recursion. This was solved. - Graph workloads break cache replacement algorithms for well-understood theory reasons. Avoiding disk just removes one layer of broken caching among many but doesn't address the abstract purpose for which a cache exists. This is why in-memory systems still scale poorly. We've known how to solve this in theory since at least the 1980s. The caveat is it is surprisingly difficult to fully reduce to practice in software, especially at scale, so no one really has. This is a work in progress. - Most implementations use global synchronization barriers when parallelizing algorithms such as BFS, which greatly increases resource consumption while throttling hardware scalability and performance. My contribution to research was actually in this area: I discovered a way to efficiently use error correction algorithms to elide the barriers. I think there is room to make this even better but I don't think anyone has worked on it since. The pathological cache replacement behavior is the real killer here. It is what is left even if you don't care about write performance or parallelization. I haven't worked in this area for many years but I do keep tabs on new graph databases to see if someone is exploiting that prior R&D, even if developed independently. | ||