jamesblonde 10 hours ago

Why do we promote articles like this that have nice graphs and are well written, when they should get a grade 'F' as an actual benchmark study? The way it is presented, a casual reader would think Postgres is 2/3rds the performance of Redis. Good god. He even admits Postgres maxed out its 2 cores, but Redis was bottlenecked by the HTTP server. We need more of an academic, not a hacker, culture for benchmarks.

dizzyVik 10 hours ago | parent | next [-]

There's a reason this is on my blog and not a paper in a journal. This isn't supposed to show the absolute speed of either tool; the benchmark is not set up for that. I do state in the blog post that redis has more performance on the table.

lemagedurage 6 hours ago | parent | next [-]

The main issue is that a reader might mistake Redis for a 2X faster postgres. Memory is 1000X faster than disk (SSD) and with network overhead Redis can still be 100X as fast as postgres for caching workloads.

Otherwise, the article does well to show that we can get a lot of baseline performance either way. Sometimes a cache is premature optimisation.

pigbearpig an hour ago | parent | next [-]

That's the reader's fault then. I see the blog post as the counter to the insane resume-building over-engineered architecture you see at a lot of non-tech companies. Oh, you need a cache for our 25-user internal web application? Let's spin up a redis cluster with elasticsearch using an LLM to publish cache invalidation with Kafka.

themgt 3 minutes ago | parent [-]

There's also a sort of anti-everything attitude that gets boring and lazy. Redis is about the simplest thing possible to deploy. This wasn't about "a redis cluster with elasticsearch using an LLM"; it was just Redis.

I sometimes read this stuff like people explaining how they replaced their spoon and fork with a spork and measured only a 50% decrease in food-eating performance. And have you heard of the people with a $20,000 Parisian cutlery set to eat McDonald's? I just can't understand insane fork enjoyers who over-engineer their dining experience.

phiresky 5 hours ago | parent | prev | next [-]

If your cache fits in Redis, then it fits in RAM; and if your cache fits in RAM, then Postgres will serve it from RAM just as well.

Writes will go to RAM as well if you set synchronous_commit = off.
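
To illustrate, a minimal sketch (Python with psycopg2; the cache table is hypothetical - any table with a unique key column works):

  import psycopg2

  conn = psycopg2.connect("dbname=app")
  with conn, conn.cursor() as cur:
      # Per-session: commit without waiting for the WAL flush to disk.
      # A crash can lose the last few commits, which is fine for cache data.
      cur.execute("SET synchronous_commit TO off")
      cur.execute(
          "INSERT INTO cache (key, value) VALUES (%s, %s) "
          "ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value",
          ("user:42", "cached payload"),
      )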

senorrib 4 hours ago | parent [-]

Not necessarily true. If you're sharing the database with your transactional workload, your cache will eventually be paged out.

jgalt212 3 hours ago | parent [-]

This was my take as well, but I'm a MySQL / Redis shop. I really have no idea what tables MySQL has in RAM at any given moment, but with Redis I know what's in RAM.

motorest 5 hours ago | parent | prev [-]

> The main issue is that a reader might mistake Redis for a 2X faster postgres. Memory is 1000X faster than disk (SSD) and with network overhead Redis can still be 100X as fast as postgres for caching workloads.

Your comments suggest that you are definitely missing some key insights into the topic.

If you, like the whole world, consume Redis through a network connection, it should be obvious to you that the network is in fact the bottleneck.

Furthermore, using an RDBMS like Postgres may indeed imply storing data on slower storage. However, you are ignoring the obvious fact that a service such as Postgres also has its own memory cache, and some query results can be, and indeed are, fetched from RAM. Thus it's not like every single query forces a disk read.

And at the end of the day, what exactly is the performance tradeoff? And does it pay off to spend more on an in-memory cache like Redis to buy you the performance delta?

That's why real world benchmarks like this one are important. They help people think through the problem and reassess their irrational beliefs. You may nitpick about setup and configuration and test patterns and choice of libraries. What you cannot refute are the real world numbers. You may argue they could be better if this and that, but the real world numbers are still there.

Implicated 2 hours ago | parent | next [-]

> If you, like the whole world, consume Redis through a network connection, it should be obvious to you that the network is in fact the bottleneck.

Not to be annoying - but... what?

I specifically _do not_ use Redis over a network. It's wildly fast. High-volume data-ingest use case - lots and lots of parallel queue workers. The database is over the network; Redis is local (socket). Yes, this means that each server running these workers has its own cache - that's fine, I'm using the cache for absolutely insane speed and I'm not caching huge objects of data. I don't persist it to disk, and I don't care (well, it's not a big deal) if I lose the data - it'll rehydrate in such a case.

Try it some time, it's fun.
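
A minimal sketch of that local-socket setup (redis-py; the socket path is whatever the unixsocket directive in redis.conf points at):

  import redis

  # Local Redis over a Unix domain socket - no TCP, no network hop.
  r = redis.Redis(unix_socket_path="/var/run/redis/redis.sock")

  # Small values, short TTLs, no persistence needed: if the data is
  # lost, the workers just rehydrate it.
  r.set("job:123:state", "processing", ex=300)
  print(r.get("job:123:state"))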

> And at the end of the day, what exactly is the performance tradeoff? And does it pay off to spend more on an in-memory cache like Redis to buy you the performance delta?

Yes, yes it is.

> That's why real world benchmarks like this one are important.

That's not what this is though. Just about nobody who has a clue is using default configurations for things like PG or Redis.

> They help people think through the problem and reassess their irrational beliefs.

Ok but... um... you just stated that "the whole world" consumes Redis through a network connection. (Which, IMO, is the wrong tool for the job - sure it will work, but that's not where/how Redis shines.)

> What you cannot refute are the real world numbers.

Where? This article is not that.

lossolo 2 hours ago | parent | prev [-]

> If you, like the whole world, consume Redis through a network connection

I think "you are definitely missing some key insights onto the topic". The whole world is a lot bigger than your anecdotes.

a_c 7 hours ago | parent | prev | next [-]

I find your article valuable. It shows me what amount of configuration is needed for a reasonable expectation of performance. In the real world, I'm not going to spend effort maxing out the configuration of a single tool. Not having the best-performing config for either tool is the least of my concerns. Picking either of them (or, as you suggested, Postgres) and then worrying about getting one billion requests to the service is far more important.

rollcat 4 hours ago | parent | prev | next [-]

Thank you for the article.

My own conclusions from your data:

- Under light workloads, you can get away with Postgres. 7k RPS is fine for a lot of stuff.

- Introducing Redis into the mix has to be carefully weighed against the increased architectural complexity, and having a common interface allows us to change that decision down the road (see the sketch below).

Yeah maybe that's not up to someone else's idea of a good synthetic benchmark. Do your load-testing against actual usage scenarios - spinning up an HTTP server to serve traffic is a step in the right direction. Kudos.
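
To make that second point concrete, a minimal sketch of such a common interface (Python; all names hypothetical) - call sites depend on the protocol, so swapping Redis in or out later touches one class, not the callers:

  from typing import Optional, Protocol

  class Cache(Protocol):
      def get(self, key: str) -> Optional[str]: ...
      def set(self, key: str, value: str, ttl_seconds: int) -> None: ...

  class InMemoryCache:
      """Dict-backed stand-in; swap for a Redis- or Postgres-backed class."""
      def __init__(self) -> None:
          self._data: dict[str, str] = {}

      def get(self, key: str) -> Optional[str]:
          return self._data.get(key)

      def set(self, key: str, value: str, ttl_seconds: int) -> None:
          self._data[key] = value  # TTL handling omitted in this stand-in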

vasco 9 hours ago | parent | prev | next [-]

It's not a paper or a journal, but you could at least try to run a decent benchmark. As it is, this serves no purpose other than reinforcing whatever point you started with. Didn't even tweak the postgres buffers; literally, what's the point?

dizzyVik 8 hours ago | parent [-]

I still end up recommending using postgres though, don't I?

pcthrowaway 8 hours ago | parent | next [-]

"I'll use postgres" was going to be your conclusion no matter what I guess?

I mean what if an actual benchmark showed Redis is 100X as fast as postgres for a certain use case? What are the constraints you might be operating with? What are the characteristics of your workload? What are your budgetary constraints?

Why not just write a blog post saying "Unoptimized postgres vs redis for the lazy, running virtualized with a bottleneck at the networking level"

I even think that blog post would be interesting, and might be useful to someone choosing a stack for a proof of concept. For someone who needs to scale to large production workloads (~10,000 requests/second or more), this isn't a very useful article, so the criticism is fair, and I'm not sure why you're dismissing it offhand.

motorest 5 hours ago | parent | next [-]

> "I'll use postgres" was going to be your conclusion no matter what I guess?

Would it bother you as well if the conclusion was rephrased as "based on my observations, I see no point in rearchitecting the system to improve the performance by this much"?

I think you are so tied to a template solution that you don't stop to think about why you're using it, or whether it is justified at all. Then, when you are faced with observations that challenge your unfounded beliefs, you somehow opt to get defensive? That's not right.

dizzyVik 8 hours ago | parent | prev [-]

I completely agree that this is not relevant for anyone running such workloads; the article is not aimed at them at all.

Within the constraints of my setup, postgres came out slower but still fast enough. I don't think I can quantify what fast enough is though. Is it 1000 req/s? Is it 200? It all depends on what you're doing with it. For many of my hobby projects which see tens of requests per second it definitely is fast enough.

You could argue that caching is indeed redundant in such cases, but some of those have quite a lot of data that takes a while to query.

vasco 8 hours ago | parent | prev [-]

That's the point: you put in no effort and did what you had already decided to do before.

dizzyVik 8 hours ago | parent [-]

I don't think this is a fair assessment. Had my benchmarks shown, say, that postgres crumbled under heavy write load, then the conclusion would be different. That's exactly why I decided to do this - to see what the difference was.

m000 7 hours ago | parent [-]

Of course you didn't see postgres crumble. This is still a toy example of a benchmark. Nobody starts (much less pays for) a postgres instance to use exclusively as a cache. It is guaranteed that even in the simplest of deployments, some other app (if not many of them) will be the main postgres tenant.

Add an app that actually uses postgres as a database and you will probably see its performance crumble, as the app will contend with the cache for resources.

Nobody asked for benchmarking as rigorous as you would have in a published paper. But toy examples are toy examples, be it in a publication or not.

jamesblonde 10 hours ago | parent | prev [-]

[flagged]

adamhartenz 10 hours ago | parent [-]

That can't have felt great having your tantrum spotlighted by the author.

KronisLV 4 hours ago | parent | prev | next [-]

I feel like the outrage is unwarranted.

> The way it is presented, a casual reader would think Postgres is 2/3rds the performance of Redis.

If a reader cares about the technical choice, they'll probably at least read enough to learn of the benchmarks in this popular use case, or even just the conclusion:

> Redis is faster than postgres when it comes to caching, there’s no doubt about it. It conveniently comes with a bunch of other useful functionality that one would expect from a cache, such as TTLs. It was also bottlenecked by the hardware, my service or a combination of both and could definitely show better numbers. Surely, we should all use Redis for our caching needs then, right? Well, I think I’ll still use postgres. Almost always, my projects need a database. Not having to add another dependency comes with its own benefits. If I need my keys to expire, I’ll add a column for it, and a cron job to remove those keys from the table. As far as speed goes - 7425 requests per second is still a lot. That’s more than half a billion requests per day. All on hardware that’s 10 years old and using laptop CPUs. Not many projects will reach this scale and if they do I can just upgrade the postgres instance or if need be spin up a redis then. Having an interface for your cache so you can easily switch out the underlying store is definitely something I’ll keep doing exactly for this purpose.

I might take issue with the first sentence (might add "...at least when it comes to my hardware and configuration."), but the rest seems largely okay.
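
For what it's worth, the "TTL column plus cron job" scheme from the quoted conclusion is only a few lines. A minimal sketch (hypothetical schema, using psycopg2):

  import psycopg2

  conn = psycopg2.connect("dbname=app")
  with conn, conn.cursor() as cur:
      cur.execute("""
          CREATE TABLE IF NOT EXISTS cache (
              key        text PRIMARY KEY,
              value      jsonb NOT NULL,
              expires_at timestamptz NOT NULL
          )
      """)
      # Upsert with a 5-minute TTL.
      cur.execute("""
          INSERT INTO cache (key, value, expires_at)
          VALUES (%s, %s, now() + interval '5 minutes')
          ON CONFLICT (key) DO UPDATE
              SET value = EXCLUDED.value, expires_at = EXCLUDED.expires_at
      """, ("user:42", '{"plan": "pro"}'))
      # Reads ignore expired rows; the cron job just runs:
      # DELETE FROM cache WHERE expires_at < now();
      cur.execute(
          "SELECT value FROM cache WHERE key = %s AND expires_at > now()",
          ("user:42",))
      row = cur.fetchone()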

As a casual reader, you more or less just get:

  * Oh hey, someone's experience and data points. I won't base my entire opinion upon it, but it's cool that people are sharing their experiences.
  * If I wanted to use either, I'd probably also need to look into bottlenecks, even the HTTP server, something you might not look into at first!
  * Even without putting in a lot of work into tuning, both of the solutions process a lot of data and are within an order of magnitude when it comes to performance.
  * So as a casual reader, for casual use cases, it seems like the answer is - just pick whatever feels the easiest.
If I wanted to read super serious benchmarks, I'd go looking for those (which would also have so many details that they would no longer be a casual read, short of just the abstract, but then I'm missing out on a lot anyways), or do them myself. This is more like your average pop-sci article, nothing wrong with that, unless you're looking for something else.

Eliminating the bottlenecks would be a cool followup post though!

lelanthran 9 hours ago | parent | prev | next [-]

I'm not seeing your point. This wouldn't get an F, purely because all the parameters are documented.

The conclusions aren't incorrect either, so what's the problem?

m000 7 hours ago | parent [-]

The use case is not representative of a real-life scenario, so the value of the presented results is minimal.

A takeaway could be that you can dedicate a postgres instance to caching and have acceptable results. But who does that? Even for a relatively simple intranet app, your #1 cost when deploying in Google Cloud would probably be running Postgres. Redis, OTOH, is dirt cheap.

lelanthran 5 hours ago | parent [-]

> The use case is not representative of a real-life scenario, so the value of the presented results is minimal.

Maybe I'm reading the article wrong, but it is representative of any application that uses a PostgreSQL server for data, correct?

In what way is that not a real-life scenario? I've deployed Single monolith + PostgreSQL to about 8 different clients in the last 2.5 years. It's my largest source of income.

Implicated 2 hours ago | parent | next [-]

> I've deployed Single monolith + PostgreSQL to about 8 different clients in the last 2.5 years. It's my largest source of income.

And... do you do that with the default configuration?

lelanthran 2 hours ago | parent [-]

> And... do you do that with the default configuration?

Yes. Internal/LoB apps for a large company might have, at most, 5k users. PostgreSQL seems to manage it fine; none of my metrics show high latencies, even when all employees log on in the morning during the same 30-minute period.

Implicated 2 hours ago | parent [-]

I'm definitely getting the wrong kind of clients.

Kudos to you sir. Sincerely, I'm not hating, I'm actually jealous of the environment being that mellow.

m000 3 hours ago | parent | prev [-]

When you run a relational database, you typically do it for the joins, aggregations, subqueries, etc. So a real-life scenario would include some application actually putting some stress on postgres.

If you don't mind overprovisioning your postgres, then yes, I guess the presented benchmarks are kind of representative. But they also don't add anything you didn't already know before reading the article.

lelanthran 2 hours ago | parent [-]

> If you don't mind overprovisioning your postgres

Why would I mind it? I'm not using overpriced hosted PostgreSQL, after all.

ENGNR 9 hours ago | parent | prev | next [-]

There’s too many hackers on hacker news!

motorest 10 hours ago | parent | prev | next [-]

> He even admits Postgres maxed out its 2 cores, but Redis was bottlenecked by the HTTP server.

What exactly is your point? That you can further optimize either option? Well yes, that comes as no surprise. I mean, the latencies alone are in the range of some transcontinental requests. Were you surprised that Redis outperformed Postgres? I hardly think so.

So what's the problem?

The main point that's proven is that there are indeed diminishing returns in terms of performance. For applications where you can afford an extra 20ms when hitting a cache, caching using a persistent database is an option. For some people, it seems this fact was very surprising. That's food for thought, isn't it?

hvb2 9 hours ago | parent [-]

I've done this many times in AWS leveraging dynamodb.

Comes with TTL support (which isn't precise, so you still need to check expiration on read), and can support long TTLs as there's essentially no limit to the storage.

All of this at a fraction of the cost of HA redis. Only if you need that last millisecond of performance, and have done all other optimizations, should one consider redis, IMHO.
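
A minimal sketch of that pattern (boto3; the table name and attribute names are hypothetical, and it assumes DynamoDB TTL is enabled on the expires_at attribute):

  import time
  import boto3

  table = boto3.resource("dynamodb").Table("cache")

  # Write with an absolute expiry; DynamoDB TTL wants epoch seconds.
  table.put_item(Item={
      "key": "user:42",
      "value": '{"plan": "pro"}',
      "expires_at": int(time.time()) + 300,
  })

  # TTL deletion is lazy, so the read must check expiry itself.
  item = table.get_item(Key={"key": "user:42"}).get("Item")
  if item and item["expires_at"] > int(time.time()):
      print(item["value"])   # still fresh
  else:
      print("cache miss")    # absent, or expired but not yet deleted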

motorest 9 hours ago | parent | next [-]

> I've done this many times in AWS leveraging dynamodb.

Exactly. I think the nosql offerings from any cloud provider already support both TTL and conditional requests out of the box, and the performance of basic key-value CRUD operations is often <10ms.

I've seen some benchmarks advertise memory cache services as having latencies around 1ms. Yeah, this would mean the latency of a database is 10 times higher. But relative numbers mean nothing. What matters is absolute numbers, as they are the ones that drive tradeoff analysis. Can a feature afford an extra 10ms of latency, and is that performance improvement worth paying a premium for?

re-thc 9 hours ago | parent | prev [-]

> All of this at a fraction of the cost of HA redis

This depends on your scale. DynamoDB is pay-per-request and the scaling isn't as smooth. At certain scales, Redis is cheaper.

Then if you don’t have high demand maybe it’s ok without HA for Redis and it can still be cheaper.

hvb2 8 hours ago | parent | next [-]

You would need to get to insane read counts pretty much 24/7 for this to work out.

For HA redis you need at least 6 instances: 2 regions * 3 AZs. And you're paying for all of that 24/7.

And if you truly have 24/7 use, then just 2 regions won't make sense, as the latency to get to those regions from the other side of the globe easily removes any caching benefit.

odie5533 6 hours ago | parent | next [-]

It's $9/mo for 100 MB of ElastiCache Serverless, which is HA.

It's $15/mo for 2x cache.t4g.micro nodes for ElastiCache Valkey with multi-AZ HA and a 1-year commitment. This gives you about 400 MB.

It very much depends on your use case, though; if you need multiple regions, then I think DynamoDB might be better.

I prefer Redis over DynamoDB usually because it's a widely supported standard.

motorest 4 hours ago | parent [-]

> It's $9/mo for 100 MB of ElastiCache Serverless, which is HA.

You need to be more specific with your scenario. Having to cache 100MB of anything is hardly a scenario that calls for introducing a memory cache service such as Redis. This is well within the territory of just storing data in a dictionary. Whatever is driving the requirement for Redis in your scenario, performance and memory clearly aren't it.

ahoka 7 hours ago | parent | prev [-]

A 6-node cache and caching in DynamoDB, what the hell happened to the industry? Or do people just call every kind of non-business-object persistence a cache now?

hvb2 5 hours ago | parent [-]

I don't understand your comment.

If you're given the requirement of high availability, how do you not end up with at least 3 nodes? I wouldn't consider a single region to be HA, but I could see that stance being called paranoid.

A cache is just a store for things that expire after a while and take load off your persistent store. It's inherently eventually consistent and supposed to help you scale reads. Whatever you use for storage is irrelevant to the concept of offloading reads.

motorest 9 hours ago | parent | prev [-]

> At certain scales Redis is cheaper.

Can you specify in which scenario you think Redis is cheaper than caching things in, say, DynamoDB?

odie5533 6 hours ago | parent [-]

High read/write and low-ish size. Also it's faster.

motorest 6 hours ago | parent [-]

> High read/write and low-ish size. Also it's faster

You posted a vague and meaningless assertion. If you do not have latency numbers and cost differences, you have absolutely nothing to show for it, and you have failed to provide any rationale for whether any cache is required at all.

odie5533 5 hours ago | parent [-]

At 10k RPS you'll see a significant cost savings with Redis over DynamoDB.

ElastiCache Serverless (Redis/Memcached): Typical latency is 300–500 microseconds (sub-millisecond response)

DynamoDB On-Demand: Typical latency is single-digit milliseconds (usually between 1–10 milliseconds for standard requests)

hvb2 an hour ago | parent | next [-]

> At 10k RPS

You would've used local memory first, at which point I cannot see getting to those request levels anymore.

> ElastiCache Serverless (Redis/Memcached): Typical latency is 300–500 microseconds (sub-millisecond response)

Sure

> DynamoDB On-Demand: Typical latency is single-digit milliseconds (usually between 1–10 milliseconds for standard requests)

I know of very few use cases where that difference is meaningful. Unless you have to do this many times sequentially, in which case optimizing that would be much more interesting than a single read being 0.5 ms versus the typical 3 to 4 ms for dynamo (that last number is based on experience).

motorest 4 hours ago | parent | prev [-]

> At 10k RPS you'll see a significant cost savings with Redis over DynamoDB.

You need to be more specific than that. Depending on your read/write patterns and how much memory you need to allocate to Redis, back-of-the-napkin calculations still point to the fact that Redis can cost >$1k/month more than DynamoDB.

Did you actually do the math on what it costs to run Redis?

lomase 2 hours ago | parent | prev | next [-]

This site is called Hacker News, btw.

positron26 8 hours ago | parent | prev | next [-]

A lot of great benchmarking probably dies inside internal tuning. When we're lucky, we get a blog post, but if the creator isn't incentivized or is even discouraged by an employer from sharing the results, it will never see the light of day.

oulipo2 6 hours ago | parent | prev | next [-]

The main point was not to fully benchmark and compare both, but just to get a rough sense of whether a Postgres cache was fast enough to be useful in practice. The comparison with Redis was more a crutch to get a sense of that than something that pretends to be "rock-solid benchmarking".

zer00eyz 9 hours ago | parent | prev | next [-]

You might not have been here 25 years ago when the dot com bubble burst.

A lot of us ate shit to stay in the Bay Area, to stay in computing. I have stories of great engineers doing really crappy jobs and "contracting" on the side.

I couldn't really have a 'startup' out of my house and a slice of rented hosting. Hardware was expensive and nothing was easy. Today I can set up a business and thrive on 1000 users at 10 bucks a month. That's a viable and easy-to-build business. It's an achievable metric.

But I'm not going to let Amazon, with its infinite "bill you for everything at 2012 prices so it can be profitable" hosting, be my first choice. I'm not going to do that when I can get fixed-cost hosting.

For me, all the interesting things going on in tech aren't coming out of FB, Google and the hyperscalers. They aren't AI or ML. We don't need another Kubernetes or Kafka or React (no more Conway's law projects). There is more interesting work going on down at the bottom, in small 2- and 3-man shops solving their problems on limited time and budget with creative "next step" solutions. Their work is likely more applicable to most people reading HN than another well-written engineering blog post from Cloudflare about their latest massive Rust project.

whateveracct 10 hours ago | parent | prev [-]

most people with blogs don't know what they're doing. or don't care to know? sadly they get hired at companies and everyone does what they say cuz they have a blog. i've seen some shit in that department; it's wild how many people really are imposters.

motorest 10 hours ago | parent [-]

> most people with blogs don't know what they're doing. or don't care to know?

I don't see any point to this blend of cynical contrarianism. If you feel you can do better, put your money where your mouth is. Lashing out at others because they went to the trouble of sharing something they did is absurd and creates no value.

Also, maintaining a blog doesn't make anyone an expert, but not maintaining a blog doesn't mean you are suddenly more competent than those who do.

whateveracct 8 hours ago | parent [-]

just an observation :)