hatthew 4 days ago
My work involves petabyte-scale data, and the algorithms are very straightforward:

- What you want to do is probably trivially O(kn).
- There isn't a <O(kn) algorithm, so try to reduce overhead in k.
- Cache results when they are O(1) in size; don't when they are O(n) (a small sketch follows below).
- If you want to do something >O(kn), don't.
- If you really need to do something >O(kn), do it in SQL and then go do something else while it's running.

None of that requires any DSA knowledge beyond what you learn in the first weeks of CS101. What's useful instead is knowing how to profile to optimize k, knowing how SQL works, and being able to write high-quality, maintainable code. Any smart algorithm with a large time-complexity improvement will probably be practically difficult to create and test, even if you are very comfortable with the underlying theory. And the storage required for a <O(n) algorithm is probably at least as expensive as the compute required for the naive O(n) algorithm.

My general impression is that for small-scale problems, a trustworthy and easy algorithm is fine even if it's inefficient ($100 of compute < $1000 of labor). For large-scale problems, domain knowledge and data engineering trump clever DSA skills. The space between small- and large-scale problems is generally either nonexistent or already covered by premade solutions. The people who build those premade solutions obviously need to feel it in their bones the way you describe, but they're a very small portion of the total software population, and they are not the target audience of this article.
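A minimal sketch of the caching rule above, with a hypothetical load_rows scan and user_total aggregate standing in for real pipeline code (neither name comes from the comment): the O(1)-sized per-key total is cheap to cache, while the O(n)-sized row stream is recomputed rather than held in memory.

    from functools import lru_cache
    from typing import Dict, Iterator

    # Tiny fake backing data standing in for a much larger store (illustrative only).
    _ROWS = [
        {"user": "a", "amount": 3},
        {"user": "a", "amount": 5},
        {"user": "b", "amount": 7},
    ]

    def load_rows(user_id: str) -> Iterator[Dict]:
        """O(n)-sized intermediate: stream it, don't cache it."""
        return (row for row in _ROWS if row["user"] == user_id)

    @lru_cache(maxsize=100_000)
    def user_total(user_id: str) -> int:
        """O(1)-sized result per key: cheap and safe to cache."""
        return sum(row["amount"] for row in load_rows(user_id))

    print(user_total("a"))  # 8, computed once; repeat calls hit the cache

At real scale the per-key aggregate would come from SQL rather than a Python loop, but the cache/don't-cache split is the same.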
fuzztester 4 days ago | parent
> My work involves

As the GP said:

>> What someone says on this topic says more about what things they have worked on in their life than anything else.