| ▲ | morshu9001 a day ago |
| The article compares UUIDv7 vs v4, but doesn't say why you'd do either instead of just serial/bigserial, which has always been my goto. Did I miss something? |
|
| ▲ | molf a day ago | parent | next [-] |
| Good question. There's a few reasons to pick UUID over serial keys: - Serial keys leak information about the total number of records and the rate at which records are added. Users/attackers may be able to guess how many records you have in your system (counting the number of users/customers/invoices/etc). This is a subtle issue that needs consideration on a case by case basis. It can be harmless or disastrous depending on your application. - Serial keys are required to be created by the database. UUIDs can be created anywhere (including your backend or frontend application), which can sometimes simplify logic. - Because UUIDs can be generated anywhere, sharding is easier. The obvious downside to UUIDs is that they are slightly slower than serial keys. UUIDv7 improves insert performance at the cost of leaking creation time. I've found that the data leaked by serial keys is problematic often enough; whereas UUIDs (v4) are almost always fast enough. And migrating a table to UUIDv7 is relatively straightforward if needed. |
| |
| ▲ | MBCook a day ago | parent | next [-] | | Not only can you make a good guess at how many customers/etc exist, you can guess individual ones. World’s easiest hack. You’re looking at /customers/3836/bills? What happens if you change that to 4000? They’re a big company. I bet that exists. Did they put proper security checks EVERYWHERE? Easy to test. But if you’re at /customers/{big-long-hex-string}/bill the chances of you guessing another valid ID are basically zero. Yeah it’s security through obscurity. But it’s really good obscurity. | | |
| ▲ | neya 17 hours ago | parent | next [-] | | This advice assumes /customers/:id/bills is public. Protected routes shouldn't expose sensitive information such as bills anyway, so this is more of an authorization issue (who can access which resource) more than privacy concerns. So this means, if you can access customes/4000/bills, then that's an application logic issue more than the type of ID itself. In a well designed application, you shouldn't be able to guess whether a record exists or not simply by accessing a protected URL.
As a counter argument - normal BIGINT or serial PKs are performant and are more than enough for most applications. | | |
| ▲ | andrewjf 9 hours ago | parent [-] | | You describe a world where human skill is required to prevent these class of bugs, time and time again we've proven that people are people and bugs happen. Systems must be _structurally architected_ with security in mind. Security is layered, using a random key with 128-bit space makes guessing UUIDs infeasible. But _also_ you should be doing AuthZ on the records, and also you should be doing rate limiting on API so they can't be brute forced, either. |
| |
| ▲ | morshu9001 a day ago | parent | prev [-] | | You normally aren't supposed to expose the PK anyway. | | |
| ▲ | bruce511 a day ago | parent [-] | | That advice was born primarily _because_ of the bigint/serial problem. If the PK is UUIDv4 then exposing the PK is less significant. In some use cases it can be possible to exclude, or anonymize the PK, but in other cases a PK is necessary. Once you start building APIs to allow others to access your system, a UUIDv4 is the best ID. There are some performance issues with very large tables though. If you have very large tables (think billions of rows) then UUIDv7 offers some performance benefits at a small security cost. Personally I use v4 for almost all my tables because only a very small number of them will get large enough to matter. But YMMV. | | |
| ▲ | morshu9001 13 hours ago | parent [-] | | It's not about table size so much as number of joins. You don't need to trade off between security and performance if you simply expose a uuid4 secondary col on a serial PK'd table. |
|
|
| |
| ▲ | a day ago | parent | prev [-] | | [deleted] |
|
|
| ▲ | edoceo a day ago | parent | prev | next [-] |
| So the client side can create the ID before insert - that's the case that (mostly) drives it for me. The other is where you have distributed systems and then later want to merge the data and not have any ID conflicts. |
| |
| ▲ | saagarjha a day ago | parent | next [-] | | Allowing the client to generate IDs for you seems like a bad idea? | | |
| ▲ | morshu9001 a day ago | parent | next [-] | | Client = backend here, right? So you could make a bunch of rows that relate to each other then insert, without having to ping the DB each time to assign a serial ID. Normally the latter is what I do, but I can imagine a scenario where it'd be slow. | | |
| ▲ | wongarsu a day ago | parent [-] | | The usual flow would be INSERT ... RETURNING id, which gives you the db-generated id for the record you just inserted with no performance penalty. That doesn't work for circular dependencies and it limits the amount of batching you can do. But typically those are smaller penalties than the penalty from having a 128 bit primary key vs a 64 bit key | | |
| |
| ▲ | clintonb 6 hours ago | parent | prev | next [-] | | Client-generated IDs are necessary for distributed or offline-first systems. My company, Vori, builds a POS for grocery stores. The POS generates UUIDv7 IDs for all data it creates and that data is eventually synced to our backend. The sync time can range from less than 1 second for a store with fast Internet to hours if a store is offline. Is a collision possible? Yes, but the likelihood of a collision is so low that it's not worth agonizing over (although I did when I was designing the system). | |
| ▲ | bramhaag a day ago | parent | prev | next [-] | | It can be quite elegant. You can avoid the whole temporary or external ID mess when the client generates the ID, this is particularly useful for offline-first clients. Of course you need to be sure the server will accept the ID, but that is practically guaranteed by the uniqueness property of UUIDs. | |
| ▲ | coolspot a day ago | parent | prev | next [-] | | “client” here may refer to a backend app server.
So you can have 10-100s of backend servers inserting into a same table without having a single authority coordinating IDs. | | |
| ▲ | morshu9001 a day ago | parent [-] | | That table is still a single authority, isn't it? But I guess fewer steps is still faster. | | |
| ▲ | tracker1 a day ago | parent [-] | | Except if you're using a sharding or clustering database system, where the record itself may be stored to separate servers as well as the key generation itself. | | |
| ▲ | morshu9001 a day ago | parent [-] | | In those cases yes. There's still a case for sequential there depending on the use pattern, but write-heavy benefits from not waiting on one server for IDs. |
|
|
| |
| ▲ | markstos a day ago | parent | prev [-] | | Why? |
| |
| ▲ | jrochkind1 a day ago | parent | prev [-] | | yup, I'd say those are the two biggies. |
|
|
| ▲ | Deadron a day ago | parent | prev | next [-] |
| For when you inevitably need to expose the ids to the public the uuids prevent a number of attacks that sequential numbers are vulnerable to. In theory they can also be faster/convenient in a certain view as you can generate a UUID without needing something like a central index to coordinate how they are created. They can also be treated as globally unique which can be useful in certain contexts. I don't think anyone would argue that their performance overall is better than serial/bigserial though as they take up more space in indexes. |
| |
| ▲ | morshu9001 a day ago | parent | next [-] | | But these are internal IDs only, and public ones should be a separate col. Being able to generate uuid7 without a central index is useful in distributed systems, but this is a Postgres DB already. Now, the index on the public IDs would be faster with a uuid7 than a uuid4, but you have a similar info leak risk that the article mentions. | | |
| ▲ | rcfox a day ago | parent [-] | | "Distributed systems" doesn't have to mean some fancy, purpose-built thing. Just correlating between two Postgres databases might be a thing you need to do. Or a database and a flat text file. | | |
| ▲ | morshu9001 a day ago | parent [-] | | I usually just have a uuid4 secondary for those correlations, with a serial primary. I've done straight uuid4 PK before, things got slow on not very large data because it affected every single join. |
|
| |
| ▲ | xienze a day ago | parent | prev [-] | | People really overthink this. You can safely expose internal IDs by doing a symmetric cipher, like a Feistel cipher. Even sequential IDs will appear random. | | |
|
|
| ▲ | ibejoeb a day ago | parent | prev | next [-] |
| If you need an opaque ID like a uuid because, for example, you need the capability to generate non-colliding IDs generated by disparate systems, the best way I've found is to separate these two concerns. Use a UUIDv4 for public purposes and a bigint internally. You don't need to worry about exposing creation time, and you can still manage your data in the home system with all the properties that a total ordering affords. |
| |
| ▲ | tracker1 a day ago | parent [-] | | Now coordinate those sequential ids on a sharded or otherwise clustered database system. | | |
| ▲ | ibejoeb a day ago | parent [-] | | That's the point. Those are only system-unique, not universally. It's a lower-level attribute that is an implementation detail, like for referential integrity in an rdbms. At that point, if you need it, you have atomic increment. |
|
|
|
| ▲ | nextaccountic a day ago | parent | prev | next [-] |
| uuids can be generated by multiple services across your stack bigserial must by generated by the db |
| |
| ▲ | coolspot a day ago | parent [-] | | But what if we just use milliseconds as our bigserial? And maybe add some hw-random number at the end to avoid conflicts? Wait | | |
| ▲ | tracker1 a day ago | parent | next [-] | | Somehow +1 on this comment just doesn't feel like enough. | |
| ▲ | crazygringo a day ago | parent | prev [-] | | Oh yeah, it would be an identifier but it would be unique. Across the universe of all devices, effectively. Should come up with a name for that |
|
|
|
| ▲ | simongr3dal a day ago | parent | prev | next [-] |
| I believe the concern is if your primary key in the database is a serial number it might be exposed to users unless you do extra work to hide that ID from any external APIs and if there are any flaws in your authorization checks it can allow enumeration attacks exposing private or semi-private info. With UUIDs being virtually unguessable that makes it less of a concern. |
| |
| ▲ | morshu9001 a day ago | parent [-] | | uuid7 is still guessable though, as the article says. The assumption is that these are internal only PKs. | | |
| ▲ | molf a day ago | parent | next [-] | | There is a big difference though. Serial keys allow attackers to guess the rate at which data is being added. UUID7 allows anyone to know the time of creation, but not how many records have been created (approximately) in a particular time frame. It leaks data about the record itself, but not about other records. | |
| ▲ | tracker1 a day ago | parent | prev | next [-] | | Far, far less than sequential Ids, and the random part is some pretty big values numerically... I mean there's billions of possible values for every MS on the generating server... you aren't going to practically "guess" at them. | |
| ▲ | e12e a day ago | parent | prev [-] | | Guessable with 80 bits of entropy? |
|
|
|
| ▲ | mhuffman a day ago | parent | prev | next [-] |
| >why you'd do either instead of just serial/bigserial, which has always been my goto. Did I miss something? So the common response is sequential ID crawling by bad actors. UUIDs are generally un-guessable and you can throw them into slop DBs like Mongo or storage like S3 as primary identifiers without worrying about permissions or having a clever interested party pwn your whole database. A common case of security through obscurity. |
|
| ▲ | martinky24 a day ago | parent | prev [-] |
| You don’t scale horizontally, do you? |
| |
| ▲ | rcfox a day ago | parent | next [-] | | Do most people? Not everyone is Google. | | |
| ▲ | martinky24 a day ago | parent [-] | | Many people have more than 1 server that need to generate coherent identifiers amongst one another. That's not a "Google scale" thing. | | |
| ▲ | rcfox a day ago | parent [-] | | Your comment heavily implied (to me) scaling databases horizontally. Yes, it's not necessarily "Google scale" either, but it's a ton of extra complexity that I'm happy to avoid. But a Google employee is probably going to approach every public-facing project with the assumption of scaling everything horizontally. With multiple servers talking to a single database, I'd still prefer to let the database handle generating IDs. | | |
| ▲ | morshu9001 a day ago | parent [-] | | Yeah, there's too much advice jumping straight to uuid4 or 7 PKs for no particular reason. If you're doing a sharded DB, maybe, and even then it depends. Speaking of Google, Spanner recommends uuid4, and specifically not any uuid that includes a timestamp at the start like uuid7. |
|
|
| |
| ▲ | morshu9001 a day ago | parent | prev [-] | | This is Postgres. There is Citus, but that still supports (maybe recommends?) serial PKs. |
|