jerf | 2 hours ago
I've got a couple of systems that 10-15 years ago needed something like Redis and multiple nodes to distribute the load, but are today just a single node with an in-memory cache that is really just a hash keyed by a string. They're running on a hot/cold spare setup. If one of them dies, it takes maybe 30 seconds to fully reconstruct the cache, which these systems happen to be capable of doing in advance; they don't need to wait for the requests to come in. (There's a rough sketch of the shape of this at the bottom of this comment.)

One thing that I think has gotten lost in the "I need redundant redundancy for my redundantly redundant replicas of my redundantly-distributed resources" world is that you really only need all that for super-real-time systems. A lot of things are, to be sure: all user-facing websites need to be up the moment the user hits them, not 30 seconds later. But when you don't have that constraint, when things can take an extra few minutes or drop some requests and it's not a big deal, you can get away with something a lot cheaper. It's made cheaper still by the fact that running things on a single node gets you access to a lot of performance you simply cannot have in a distributed system, because nothing is as fast as the RAM bus being accessed by a single OS process. And sometimes you have enough flexibility to design your system to be that way in the first place, instead of accidentally wiring it up to be dependent on complicated redundancy schemes.

(Next up after that, if that isn't enough, is the system where you have redundant nodes but you make sure they don't need to cross-talk at all through something like Redis. Observation: if you have two nodes for redundancy, they are doing something with caching, and the cached values are generally stable for long periods of time, it is often not that big a deal to just let each node have its own in-memory cache, and if they happen to recreate a value twice, let them.

If you work the math out carefully, depending on your cache utilization profile you are often losing less than you think here. If the modal result is that you never hit a given cached value again, the duplication is cheap, especially if the values you do hit end up getting hit a lot. If on average you hit cached values all the time, the amortized cost of the second computation is nearly nothing. It's only the "almost always hit each value 2 or 3 times" case that incurs real extra expense, and that's a very, very specific place in the caching landscape; the second sketch at the bottom makes that math concrete. On top of that, in-process caching is faster on its own terms, which mitigates the problem; you can set it up so you have no serialization costs at all; and the architectural simplicity can be very beneficial.

No, by no means does this work with every system, and it is helpful to scan out into the future to be sure you probably won't ever need to upgrade to a more complicated setup. But there are a lot of redundantly redundant systems that really don't need to be written with such complication, because this would have been fine for them.)
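To make the single-node part concrete, here is a minimal sketch in Go of "an in-memory cache that is really just a hash keyed by a string," with an up-front warm step so the cold spare can rebuild before it takes traffic. All the names here (Cache, Warm, loadAll) are hypothetical, made up for illustration:

    package main

    import (
        "fmt"
        "sync"
    )

    type Cache struct {
        mu   sync.RWMutex
        data map[string]string
    }

    func NewCache() *Cache {
        return &Cache{data: make(map[string]string)}
    }

    // Warm rebuilds the whole cache from the source of record. On a
    // hot/cold spare pair, the cold node runs this at startup so it is
    // ready before the first request arrives.
    func (c *Cache) Warm(load func() map[string]string) {
        fresh := load()
        c.mu.Lock()
        c.data = fresh
        c.mu.Unlock()
    }

    func (c *Cache) Get(key string) (string, bool) {
        c.mu.RLock()
        defer c.mu.RUnlock()
        v, ok := c.data[key]
        return v, ok
    }

    func main() {
        c := NewCache()
        // loadAll stands in for whatever reconstructs the cache (a DB
        // scan, a file replay, etc.); in the systems described above it
        // takes ~30 seconds and runs in advance, not per-request.
        loadAll := func() map[string]string {
            return map[string]string{"user:42": "some precomputed value"}
        }
        c.Warm(loadAll)
        if v, ok := c.Get("user:42"); ok {
            fmt.Println(v)
        }
    }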
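And here is the back-of-the-envelope version of the hit-count math for two nodes with independent caches. The model is an assumption of mine for illustration: in the worst case a value used k times gets computed once per node instead of once total, so the extra work per hit is (computations - 1) / k. Note how the overhead is zero at one hit, peaks at two hits, and withers away as hit counts grow:

    package main

    import "fmt"

    func main() {
        const nodes = 2.0
        for _, k := range []float64{1, 2, 3, 10, 100} {
            // Worst case: every node computes the value once, though a
            // value can't be computed more times than it is used.
            computations := nodes
            if k < nodes {
                computations = k
            }
            // Extra computations per hit, relative to a shared cache
            // that would have computed the value exactly once.
            overhead := (computations - 1) / k
            fmt.Printf("hits=%3.0f  computations=%.0f  extra work per hit=%2.0f%%\n",
                k, computations, overhead*100)
        }
    }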