| ▲ | samlambert 4 days ago |
| If anyone has questions about our Postgres product please feel free to ask. I will be around to answer. |
|
| ▲ | bri3d 4 days ago | parent | next [-] |
| * I saw your benchmark page at https://planetscale.com/benchmarks/aurora ; do you have something similar for Aurora Serverless? * Do you support something like Aurora Fast Cloning (whether a true CoW fast clone or detaching a replica _without_ promoting it into its own cluster / branch with its own replicas, which would incur cost)? * Can PlanetScale Postgres set `max_standby_streaming_delay` to an indefinite amount? * The equivalent of Aurora blue/green would be to make a branch and then switch branches, right? |
| |
| ▲ | samlambert 4 days ago | parent [-] | | It should be pretty much the same for Aurora Serverless and likely even cheaper; we see some astronomically expensive Aurora Serverless bills. We have not made max_standby_streaming_delay configurable yet. What's your use case? I don't fully parse your question about blue/green. Can you expand on it? Is this for online upgrades? | | |
| ▲ | bri3d 4 days ago | parent [-] | | > I don't fully parse your question about blue/green. Can you expand on it? Is this for online upgrades? Online upgrade or migration DDL - both use cases. I think Amazon's blue/green is effectively the same thing as your "branch-and-commit" strategy for schema migration. I was just looking for whether there's a significant difference. > We have not made max_standby_streaming_delay configurable yet. What's your use case? This goes with >> Do you support something like Aurora Fast Cloning (whether a true CoW fast clone or detaching a replica _without_ promoting it into its own cluster / branch with its own replicas, which would incur cost)? The use case is mixing transaction processing and long-running queries/analytics in the same database using read replicas. The easiest way to do this in a native Postgres cluster is a "soft-real-time" read replica with max_standby_streaming_delay set to `-1`, which is allowed to fall behind worst-case by the duration of a dashboard query and then catch up again. This doesn't work in environments with more esoteric SAN-based replication strategies like Aurora, where max_standby_streaming_delay can't go beyond 30 seconds. In that case we have to use some combination of strategies: making CoW clones for each analytics task; architecting data to avoid leader/replication conflicts, for example by using partitioning; retrying replica queries until they don't access hot rows; or falling back to traditional analytics/data warehouse solutions at the application or logical replication layer. Not having to do these things would be a nice benefit over Aurora. |
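Concretely, the standby setup I mean looks roughly like this on a self-managed replica (a minimal sketch using the stock GUC names; PlanetScale's own knobs may differ):

    -- on the "soft-real-time" analytics standby, e.g. via ALTER SYSTEM:
    ALTER SYSTEM SET max_standby_streaming_delay = -1;  -- let WAL replay wait indefinitely for long queries
    ALTER SYSTEM SET hot_standby_feedback = off;        -- don't hold back VACUUM on the primary
    SELECT pg_reload_conf();                            -- both settings are reloadable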
|
|
|
| ▲ | Ozzie_osman 3 days ago | parent | prev | next [-] |
| Any recommendations on how to best test our current workload of reads and writes? Also, if we are pretty certain we will need horizontal partitioning / sharding, would it be better to wait for Neki before considering a move? For context we are on Aurora Postgres right now, with several read replicas. |
|
| ▲ | endorphine 4 days ago | parent | prev | next [-] |
| Since NVMe nodes are ephemeral on GCP, would you suggest that a SaaS with critical customer data use Metal or persistent disks? |
| |
| ▲ | rcrowley 4 days ago | parent | next [-] | | PlanetScale always maintains three (or more, if you want) copies of the data and only acknowledges writes after they've been written to at least two cloud provider availability zones. Durability is provided via replication at the database layer rather than hidden in the slow network-attached block device. Most of our largest customers use Metal for their most critical data and all of them saw their 99th percentile latency plummet when they migrated. I did an interview all about PlanetScale Metal a couple of months ago: <https://www.youtube.com/watch?v=3r9PsVwGkg4> | |
|
|
| ▲ | the_mitsuhiko 4 days ago | parent | prev | next [-] |
| If neki becomes available later, do you expect that customers will be able to easily migrate over to it? |
| |
| ▲ | samlambert 4 days ago | parent [-] | | We will aim to make it as easy as possible and doable as an online process. With sharding there is always a chance that some application changes are needed, so there might be some work required there. |
|
|
| ▲ | attentionstinks 4 days ago | parent | prev | next [-] |
| How should one decide whether to go with MySQL or Postgres for a greenfield project? |
| |
| ▲ | add-sub-mul-div 4 days ago | parent | next [-] | | Pre-existing expertise with MySQL and a lack of time or inclination to learn something new are the only reasons I can think of not to go with Postgres. | |
| ▲ | jedberg 4 days ago | parent | prev | next [-] | | At this point I'm not sure why anyone would choose MySQL. Any advantage it had pretty much evaporated with these hosted solutions. For example, MySQL was easier to get running and connect to. These cloud offerings (Planetscale, Supabase, Neon, even RDS) have solved that. MySQL was faster for read heavy loads. Also solved by the cloud vendors. | | |
| ▲ | bri3d 4 days ago | parent | next [-] | | At large scale I'd say MySQL is still a competitor for a few reasons: * Scale-out inertia: yes, cloud vendors provide similar sharding and clustering features for Postgres, but they're all a lot newer. * Thus, hiring. It's easier to find extreme-scale MySQL experts (although this erodes year by year). * Write amplification, index bloat, and tuple/page bloat for extremely UPDATE-heavy workloads. It is what it is. Postgres continues to improve, but its MVCC design fundamentally keeps old row versions around. If your workload is mostly UPDATEs and simple SELECTs, Postgres will eventually fall behind MySQL. * Replication. Postgres replication has matured a ridiculous amount in the last 5-10 years, and to your point, cloud hosting has somewhat reduced the need to care about it, but it's still different from MySQL in ways that can be annoying at scale. One of the biggest issues is performing hybrid OLAP+OLTP (think, a big database of Stuff with user-facing Dashboards of Stuff). In MySQL this is basically a non-event, but in Postgres this pattern requires careful planning to avoid falling afoul of max_standby_streaming_delay, for example. * Neutral but different: documentation. Postgres has better-written user-facing documentation, IMO. However, _if_ you don't like reading source code, MySQL has better internals documentation and less magic. That said, Postgres is _very_ well written and commented, so if you're comfortable reading source, it's a joy. A _lot_ of Postgres work, in my experience, is reading somewhat vague documentation followed by digging into the source code to find a whole bunch of arbitrary magic numbers. If you don't believe me, as an exercise, try to figure out what `default_statistics_target` _actually_ does. Anyway, I still would choose a managed Postgres solution almost universally for a new product. Unless I know _exactly_ what I'm going to be doing with a database up-front, Postgres will offer better flexibility, a nicer feature-set, and a completely acceptable scale story. | | |
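To make the `default_statistics_target` exercise concrete, this is the kind of digging I mean (a sketch; `orders` and `status` are made-up names):

    SHOW default_statistics_target;  -- 100 by default; bounds the MCV list and
                                     -- histogram size ANALYZE keeps per column,
                                     -- and scales the row sample it takes
    ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;  -- per-column override
    ANALYZE orders;
    SELECT n_distinct, most_common_vals   -- what the planner actually sees
    FROM pg_stats
    WHERE tablename = 'orders' AND attname = 'status';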
| ▲ | jashmatthews 3 days ago | parent | next [-] | | > hybrid OLAP+OLTP .... in Postgres this pattern requires careful planning to avoid falling afoul of max_standby_streaming_delay for example This is a really gnarly problem at scale that I've rarely seen anyone else bring up. Either you use max_standby_streaming_delay and queries that conflict with replication cause replication to lag, or you use hot_standby_feedback and long-running queries on the OLAP replica cause problems on the primary. Logical decoding on a replica also needs hot_standby_feedback, which is a giant PITA for your ETL replica. | |
| ▲ | jedberg 3 days ago | parent | prev [-] | | I appreciate your detailed reply, and I agree with all your points. I am however highly amused that everyone in this thread defending MySQL ends with some form of "I'd still choose Postgres though!". :) |
| |
| ▲ | cortesoft 4 days ago | parent | prev [-] | | > At this point I'm not sure why anyone would choose MySQL Because I have used MySQL for over 20 years and it is what I know! | | |
| ▲ | jedberg 4 days ago | parent [-] | | Fair enough, but I assume most of that is in the administration of MySQL, which is all now abstracted away by the cloud vendors? If you're running it yourself I could see why you'd do that, but if you're mostly just using it now, Postgres can do all the same things in the database pretty much the same way, plus a whole lot more. | |
| ▲ | cortesoft 4 days ago | parent [-] | | It's both operating MySQL and creating applications that use it. Additionally, almost all my workloads run in our own datacenters, so I haven't yet been able to offload the administration bits to the cloud. |
|
|
| |
| ▲ | n_u 4 days ago | parent | prev | next [-] | | From my position, MySQL pros: The MySQL docs on how the default storage engine InnoDB locks rows to support transaction isolation levels are fantastic. [1] This can help you better architect your system to avoid lock contention or understand why existing queries may be contending for locks. As far as I know Postgres does not have docs like that. MySQL uses direct I/O, so it disables the OS page cache and uses its own buffer pool instead. [2] Postgres, by contrast, doesn't use direct I/O, so the OS page cache will duplicate pages held in the buffer pool (the "double buffering" problem), which makes it harder to estimate how large a dataset you can keep in memory. They are working on it though. [3] If you delete a row in MySQL and then insert another row, MySQL will look through the page for empty slots and insert there. This keeps your pages more compact. Postgres will always insert at the bottom of the page. If you have a workload that deletes often, Postgres will not use memory as efficiently because the pages are fragmented. You will have to run the VACUUM command to compact pages. [4] Vitess supports MySQL [5] and not Postgres. Vitess is a system for sharding MySQL that, as I understand it, is much more mature than the sharding options for Postgres. Obviously this GA announcement may change that. Uber switched from MySQL to Postgres only to switch back. It's a bit old but it's worth a read. [6] Postgres pros: Postgres supports 3rd-party extensions which add features like columnar storage, geospatial data types, vector search, proxies, etc. [7] You are more likely to find developers who have worked with Postgres. [8] Many modern distributed database offerings target Postgres compatibility rather than MySQL compatibility (YugabyteDB [9], AWS Aurora DSQL [10], pgfdb [11]). My take: I would highly recommend you read the docs on InnoDB locking, then pick Postgres. [1] https://dev.mysql.com/doc/refman/8.4/en/innodb-locking.html [2] https://dev.mysql.com/doc/refman/8.4/en/memory-use.html [3] https://pganalyze.com/blog/postgres-18-async-io [4] https://www.percona.com/blog/postgresql-vacuuming-to-optimiz... [5] https://vitess.io/ [6] https://www.uber.com/blog/postgres-to-mysql-migration/ [7] https://www.tigerdata.com/blog/top-8-postgresql-extensions [8] https://survey.stackoverflow.co/2024/technology#1-databases [9] https://www.yugabyte.com/ [10] https://aws.amazon.com/rds/aurora/dsql/ [11] https://github.com/fabianlindfors/pgfdb | | |
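If you want to see the dead-tuple behavior above on your own tables, a quick sketch (`mytable` is a placeholder):

    -- deleted rows linger as dead tuples until VACUUM marks the space reusable
    SELECT relname, n_live_tup, n_dead_tup
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 5;
    VACUUM (VERBOSE) mytable;  -- makes dead space reusable; does not shrink the file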
| ▲ | bri3d 4 days ago | parent | next [-] | | > I would highly recommend you read the docs on InnoDB locking, then pick Postgres. This made me laugh pretty hard, but it's basically my take too. I'd pretty much go with the same thing. It's interesting to me, though, that people see Postgres as the "big database" and MySQL as the "hobby database." I basically see things as the exact opposite - Postgres is incredibly flexible, very nice to use, and these days has fewer foot guns at small scale (IMO) than MySQL. It's more academically correct and it generally tends to "work better" at almost any achievable "normal" scale. On the other hand, Postgres is full of pitfalls and becomes very difficult at exceptionally large scale (no, not "your startup got traction" scale). Postgres also doesn't offer nearly the same quality of documentation or recipes for large-scale optimization. Almost everything in the 2016 Uber article you link, which is a _great_ read, is still true to some extent with vanilla Postgres, although there are more proprietary scale-out options available now. Postgres simply has not been "hyper-scaled" to the extent that MySQL has, and most massive globally sharded/replicated systems started as MySQL at some point. For this same reason, you are likely to be able to hire a MySQL-family DBA with more experience at hyper-scale than a Postgres one. With all that said, I still agree - I'd almost universally start with Postgres, with MySQL as a back-pocket scale-up-and-out option for specific very large use-cases that don't demand complex query execution or transactional workload properties. Unless you have a very specific workload - a combination of heavy UPDATE and `SELECT * FROM x WHERE id=y` - Postgres will do better at any achievable scale you will find today. | |
| ▲ | n_u 4 days ago | parent [-] | | > This made me laugh pretty hard, but it's basically my take too. Haha glad you enjoyed it. > It's interesting to me, though, that people see Postgres as the "big database" and MySQL as the "hobby database." I basically see things as the exact opposite I agree. As I understand it, Postgres started as a challenger to SQL [1][2] with support for more complicated data types, but then in the mid '90s they added SQL support and it was renamed PostgreSQL. Anecdotally I have heard from people working in industry in the 2000s-2010s that Postgres was viewed as less mature, so many of the large web applications were on MySQL. This is a bit confusing to me because MySQL was released around the same time Postgres added SQL support, but maybe it was because MySQL had a company behind it. Many large-scale applications of those days were using MySQL. Facebook developed RocksDB and then MyRocks [3] based on MySQL. YouTube built Vitess [4], which was sharded MySQL later used by Slack [5], Square, Pinterest and others. > It's more academically correct I'm curious about this. I know that Postgres implements MVCC in a wasteful way and uses the OS page cache in addition to its buffer pool, resulting in double buffering rather than direct I/O. I feel like the more I learn about database internals, the more I learn about how MySQL did things the "right" way and Postgres's approach is a bit odd. But perhaps I'm missing something. [1] https://en.wikipedia.org/wiki/PostgreSQL#History [2] https://db.cs.cmu.edu/papers/2024/whatgoesaround-sigmodrec20... [3] https://engineering.fb.com/2016/08/31/core-infra/myrocks-a-s... [4] https://vitess.io/docs/22.0/overview/history/ [5] https://slack.engineering/scaling-datastores-at-slack-with-v... | |
| ▲ | bri3d 4 days ago | parent | next [-] | | > I feel like the more I learn about database internals the more I learn about how MySQL did things the "right" way and Postgres's approach is a bit odd. This is a good distinction too; I was thinking from the end-user’s standpoint, where Postgres has historically been seen as more faithful to both SQL standards and consistency guarantees. | |
| ▲ | jedberg 3 days ago | parent | prev | next [-] | | > This is a bit confusing to me because MySQL was released around the same time Postgres added SQL support, but maybe it was because MySQL had a company behind it. I think the main reason MySQL took off faster than Postgres originally is that it had better defaults. MySQL worked out of the box on modern hardware. Postgres assumed you only had 4MB of memory until well into the 2010s, in part to keep it running on everything it had ever run on in the past. So when you first installed Postgres, it would perform terribly until you optimized it. It's really a fantastic case study in setting good defaults. | |
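For anyone who hasn't lived it, first-pass tuning on a fresh install still looks something like this (illustrative values for a box with ~16GB of RAM, not recommendations):

    ALTER SYSTEM SET shared_buffers = '4GB';         -- default is 128MB; needs a restart
    ALTER SYSTEM SET effective_cache_size = '12GB';  -- planner hint; default is 4GB
    ALTER SYSTEM SET work_mem = '64MB';              -- per sort/hash operation; default is 4MB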
| ▲ | pests 3 days ago | parent | prev | next [-] | | My reason for choosing MySQL in the early days was that it was the default choice for PHP apps back then: every tutorial, the mysql_* functions, every app like WordPress, and phpMyAdmin. Postgres was seen as more corporate / official / big boys club in my view… similar to how ClickHouse was viewed here until recently. | |
| ▲ | apavlo 4 days ago | parent | prev [-] | | You are conflating MySQL and InnoDB. The latter does a lot of good things, much more than the former. | | |
| ▲ | n_u 3 days ago | parent [-] | | Haha, that's probably correct; databases are a huge topic and I still know very little. I learned most of what I know about databases from you and my work in industry. Could you perhaps share some of the good/bad things InnoDB/MySQL do? |
|
|
| |
| ▲ | tracker1 4 days ago | parent | prev [-] | | To me, every time I've touched MySQL, I've found an issue that just rubbed me the wrong way... starting with the fact that UTF8 isn't, and even across new major versions hasn't changed to an alias for the real UTF8. (VAR)BINARY does case-sensitive collation based on the db default; it shouldn't do any type of textual comparison even if the data is "text". You can't do foreign keys with ANSI quotes for the table/field names... Note: some of this may have changed in the past 6+ years I've avoided looking at it again. | |
| ▲ | sgarland 3 days ago | parent [-] | | > (VAR)BINARY Close [0]: (VAR)CHAR BINARY, which is its own type, uses the `_bin` collation for the column or table character set. (VAR)BINARY stores binary strings, and uses the `binary` character set and collation. In fairness, the amount of options, gotchas, and WTF involved with collations in any DB is mind-boggling. [0]: https://dev.mysql.com/doc/refman/8.4/en/binary-varbinary.htm... | | |
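A tiny example of the distinction (hypothetical table):

    CREATE TABLE demo (
      raw    VARBINARY(16),              -- byte string: binary charset and collation
      txt    VARCHAR(16)
             CHARACTER SET utf8mb4
             COLLATE utf8mb4_bin,        -- real text, compared case-sensitively
      legacy VARCHAR(16) BINARY          -- shorthand: text with the charset's _bin collation
    );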
| ▲ | tracker1 3 days ago | parent [-] | | I just know that I was storing converted UUIDs and was getting collisions on different binary values. This was admittedly a couple of decades ago though. By converted, I mean legacy records where the original int value was injected into a UUID format, which was then stored in a binary field as a primary key. |
|
|
| |
| ▲ | sgammon 4 days ago | parent | prev [-] | | PlanetScale now supports both :) |
|
|
| ▲ | sethreno 3 days ago | parent | prev | next [-] |
| Will Neki be a branch of [Citus](https://github.com/citusdata/citus) or is it more based on Vitess? |
| |
| ▲ | samlambert 3 days ago | parent [-] | | Inspired by Vitess, seeing as we are the company behind Vitess, but it's a new codebase. |
|
|
| ▲ | cocoflunchy 3 days ago | parent | prev | next [-] |
| Hi, any idea of the timing to launch in europe-west1 on GCP? Also does branching work on Postgres? |
|
| ▲ | robraven 3 days ago | parent | prev | next [-] |
| Any benchmarks for PlanetScale Postgres vs MySQL? |
|
| ▲ | tacone 4 days ago | parent | prev | next [-] |
| Extensions! Which pg extensions are you going to make available? |
| |
|
| ▲ | dangoodmanUT 4 days ago | parent | prev [-] |
| Postgres (esoterically?) has some issues with index bloat on high-insert workloads. Does PlanetScale do anything special to tune for this by default, since it caters to higher-perf workloads (over something like Supabase)? |
| |
| ▲ | petergeoghegan 4 days ago | parent | next [-] | | Can you provide more detail/a reference? I've done extensive work on improving the Postgres B-Tree code over quite a number of releases. I'm not aware of any problems with high-insert workloads in particular. I have personally fixed a number of subtle issues that could lead to lower space utilization with such workloads [1][2] in the past, though. If there's a remaining problem in this area, then I'd very much like to know about it. [1] https://www.youtube.com/watch?v=p5RaATILoiE
[2] https://speakerdeck.com/peterg/nbtree-arch-pgcon | | |
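If anyone wants to measure this on their own indexes, the pgstattuple contrib extension makes it easy (sketch; 'some_index' is a placeholder):

    CREATE EXTENSION IF NOT EXISTS pgstattuple;
    SELECT avg_leaf_density,    -- % of leaf page space in use
           leaf_fragmentation   -- % of leaf pages out of physical order
    FROM pgstatindex('some_index');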
| ▲ | dangoodmanUT 3 days ago | parent [-] | | In a previous use case, when using Postgres as a WAL-like append-only store, I noticed that indexes would get massive. Then, after a while, they'd magically shrink. I eventually switched to an API on top of Badger (a Go KV store), which afforded me an order of magnitude lower latency at ~30% of the resources, IIRC. I'm sure there might have been some tuning I could have done to improve it. I've also heard of similar behavior from other folks who had similar high-write workloads on Postgres. Sorry, I don't have anything super tangible to provide off the top of my head, or metrics/code I can share to reproduce it! It was also a project that required a lot of data to recreate the setup. | |
| ▲ | petergeoghegan 3 days ago | parent [-] | | > In a previous use case, when using postgres as a WAL-like append only store, I noticed that indexes would get massive. Then, after a while, they'd magically shrink. It's possible to recycle pages within indexes that have some churn (e.g., with workloads that use bulk range deletions). But it's not possible for indexes to shrink on their own, in a way that can be observed by monitoring the output of psql's "\di+" command. For that you'd need to REINDEX or run VACUUM FULL. | | |
| ▲ | dangoodmanUT 3 days ago | parent | next [-] | | I may not be remembering fully; maybe the indexes never shrank but the tables did. Is there no way to automatically clean up indexes then? | |
| ▲ | petergeoghegan 3 days ago | parent [-] | | > I may not be remembering fully, maybe the indexes never shrunk but the tables did in size. That only happens when it is possible to give back space to the OS filesystem using relation truncation in the first place -- which isn't all that common (it's generally only seen when there are bulk range deletions that leave lots of contiguous empty space at the end of a table/heap structure). But you said that this is an append-only workload. This behavior can be disabled by setting the vacuum_truncate table storage parameter to "off". This is useful with workloads where relation truncation is disruptive (truncation needs to acquire a very heavyweight table lock). > Is there no way to automatically clean up indexes then? What I meant was that indexes do not support relation truncation. It follows that the amount of space used for an index (from the point of view of the OS) cannot ever go down, barring a REINDEX or a VACUUM FULL. This does not mean that we cannot reuse space for previously freed/deleted pages (as long as we're reusing that space for the same index). Nor does it mean that "clean up" isn't possible in any general sense. |
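Concretely (placeholder names):

    -- an index never returns space to the OS on its own; to shrink one:
    SELECT pg_size_pretty(pg_relation_size('some_index'));
    REINDEX INDEX CONCURRENTLY some_index;  -- rebuilds without blocking writes (PG 12+)
    -- and to avoid disruptive heap truncation on a hot table:
    ALTER TABLE some_table SET (vacuum_truncate = off);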
| |
| ▲ | jashmatthews 3 days ago | parent | prev [-] | | Does vacuum not release free pages at the end of an index file in the same way it does for the heap? | | |
|
|
| |
| ▲ | samlambert 4 days ago | parent | prev | next [-] | | We don't do anything special (yet), but we do have bloat detection and will warn you about it. We've noticed that autovacuum works very well on our Metal product because of the extra resources. | | | |
| ▲ | dangoodmanUT 4 days ago | parent | prev [-] | | In a similar vein, I'd be curious to know if XID wraparound and autovacuum tuning were something you had to advise customers on up front, considering how often that issue rears its head for the same kinds of workloads. |
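For context, the kind of up-front check I have in mind (plain catalog queries; `events` is a placeholder):

    -- how close each database is to the ~2B transaction-ID wraparound ceiling
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC;
    -- and the usual per-table autovacuum tightening for hot tables
    ALTER TABLE events SET (autovacuum_vacuum_scale_factor = 0.01);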
|