| ▲ | jph 12 hours ago |
| We evaluated UUIDv7 and determined that it's unwise to use it as a primary key. We have applications where we control the creation of the primary key, and where the primary key will be exposed to end users, such as in a typical web app built with Rails, Phoenix, Loco, Laravel, etc. For these applications, the creation time embedded in UUIDv7 is too problematic for security, so we prefer binary-stored UUIDv4 even though it's less efficient. We also have applications where we control the creation of the primary key, and where we can ensure the primary key is never shown to users. For these applications, UUIDv7 is slower at inserts and joins, so we prefer BIGSERIAL for the primary key, and binary-stored UUIDv4 for showing to users, such as in URLs. |
|
| ▲ | wongarsu 11 hours ago | parent | next [-] |
| Deploying UUIDv7 certainly requires more thought about the implications. In many cases leaking the creation time of a key is completely fine; in some cases it isn't. An interesting compromise is transforming the UUIDv7 to a UUIDv4 at the API boundary, e.g. with UUIDv47 [1]. On the other hand, if you are doing that you can also go with u64 primary keys and transform those. [1] https://github.com/stateless-me/uuidv47 |
|
| ▲ | laughing_snyder 11 hours ago | parent | prev | next [-] |
| Why would exposing any primary key be bad for security? If your system's security *in any way* depends on the randomness of a database primary key, you have other problems. It's not the job of a primary key to add to security. Not to mention that UUIDv7 has 6 random bytes, which, for the vast majority of web applications, even finance, is more than enough randomness. Just imagine how many requests an attacker would need to make to guess even one UUID (281 trillion possible combinations for 6 random bytes, and they would also need to guess the Unix timestamp in ms correctly). The only scenario I can think of is that you use the primary key as a sort of API key. |
| |
| ▲ | 8organicbits 10 hours ago | parent | next [-] | | > system's security in any way depends on the randomness of a database primary key Unlisted URLs, like YouTube videos, are a popular example used by a reputable tech company. > UUIDv7 has 6 random bytes Careful. The spec allows 74 bits to be filled randomly. However, you are allowed to exchange up to 12 bits for a more accurate timestamp and a counter of up to 42 bits. If an attacker can get a fix on the timestamp and counter, the random portion only provides 20 bits (1M possibilities). Python 3.14rc introduces a UUIDv7 implementation that has only 32 random bits, for example. Basically, you need to see what your implementation does. | | |
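To make the timestamp leak concrete, here is a minimal Python sketch of how anyone holding a UUIDv7 can read the creation time back out. It assumes the plain RFC 9562 layout (48-bit Unix-millisecond timestamp in the top bits, no counter); `make_uuid7`, `unix_ms_of`, and `creation_time` are hypothetical helper names for illustration, not any library's API.

```python
import os
import uuid
from datetime import datetime, timezone


def make_uuid7(unix_ms: int) -> uuid.UUID:
    """Hand-build a UUIDv7: 48-bit ms timestamp, version, 12 random bits,
    variant, 62 random bits (assumed layout, no counter)."""
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits, 74 are used
    rand_a = rand >> 68                            # top 12 bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 bits
    value = (unix_ms & ((1 << 48) - 1)) << 80      # timestamp in bits 127..80
    value |= 0x7 << 76                             # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                            # RFC variant
    value |= rand_b
    return uuid.UUID(int=value)


def unix_ms_of(u: uuid.UUID) -> int:
    """Return the embedded Unix timestamp in milliseconds."""
    if u.version != 7:
        raise ValueError("not a UUIDv7")
    return u.int >> 80


def creation_time(u: uuid.UUID) -> datetime:
    """Return the embedded creation time as a UTC datetime."""
    return datetime.fromtimestamp(unix_ms_of(u) / 1000, tz=timezone.utc)


if __name__ == "__main__":
    u = make_uuid7(1_700_000_000_123)
    print(u, creation_time(u).isoformat())
```

Implementations that trade random bits for a counter or sub-millisecond precision will differ below the first 48 bits, so check what yours actually emits.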
| ▲ | bearjaws 9 hours ago | parent [-] | | Only 32 bits, so 4 billion guesses per microsecond... Even if YouTube had 1 million videos per microsecond you would never guess them before hitting rate limits. | | |
| ▲ | Incipient 3 hours ago | parent | next [-] | | Not sure if this is helpful here, but you're still looking at 32 bits of randomness, regardless of the time window. Use it for anything where you feel that's enough randomness to secure it - a private home video of a cat breaking a cup? Sure. File sharing endpoints for a business? No. Use another uuid4-based 'sharing uuid' that you map internally to the PK uuid7. | |
| ▲ | 8organicbits 7 hours ago | parent | prev [-] | | You're mixing a couple of things. The 32 random bits occur in the Python implementation, which uses a millisecond counter. The numbers you provided look suspicious, but even so they seem quite feasible to attack. 1M IDs in 4B means each guess has a ~1-in-4000 chance. You can make 4000 requests in about an hour at a one-per-second rate. A successful attack only needs to guess one ID; it doesn't need to enumerate all of them. | | |
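The arithmetic in this subthread is easy to check. A rough Python sketch, assuming the attacker already knows the timestamp (and counter) and guesses the random part uniformly and independently; `hit_probability` is a hypothetical helper, and the 1M-live-IDs figure is the hyperbolic number from the thread, not real data.

```python
def hit_probability(random_bits: int, live_ids: int, guesses: int) -> float:
    """Chance that at least one of `guesses` uniform guesses lands on one of
    `live_ids` valid IDs in a space of 2**random_bits values."""
    p_single = live_ids / 2 ** random_bits
    return 1 - (1 - p_single) ** guesses


if __name__ == "__main__":
    print(2 ** 48)             # 6 random bytes: about 281 trillion values
    print(2 ** 20)             # 20 random bits: about 1M values
    print(2 ** 32 / 1_000_000) # roughly 1-in-4300 per guess for 1M IDs in 32 bits
    print(hit_probability(32, 1_000_000, 4000))
```

With a single live ID in the same window the per-guess odds fall back to 1 in 2**32, which is why the answer depends so heavily on how many IDs share a timestamp.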
| ▲ | bearjaws 4 hours ago | parent [-] | | Ah, I was looking at the pg_uuidv7 Python package. The backwards compatibility is a wild trade-off. Either way my comment was hyperbole, but the concept is the same: 10000 records per millisecond and you get the point. For 99.999% of SQL use cases UUIDv7 is good. I only advocate for UUID so much because 3 separate times in my career I have been the one who had to add UUIDs so we don't leak the number of patients or let users scrape the site by just incrementing IDs (amongst other protections). So much easier to just UUID everything. |
|
|
| |
| ▲ | btown 10 hours ago | parent | prev | next [-] | | One of the big things here is de-anonymization and account correlation. Say you have an application where users'/products' affiliation with certain B2B accounts is considered sensitive; perhaps they need to interact with each other anonymously for bid fairness, perhaps people might be scraping for "how many users does account X have onboarded" as metadata for a financial edge. If users/products are onboarded in bulk/during B2B account signup, then, leaking the creation times of each of them with any search that returns their UUIDs, becomes metadata that can be used to correlate users with each other, if imperfectly. Often, the benefits of a UUID with natural ordering outweigh this. But it's something to weigh before deciding to switch to UUIDv7. | |
| ▲ | rhplus 9 hours ago | parent | prev | next [-] | | The German Tank Problem springs to mind. While not precisely the same problem, it’s still a case where more information than necessary is leaked in seemingly benign IDs. For the Germans, they leaked production volumes. For UUID v7, you’re leaking timing and timestamps. https://en.wikipedia.org/wiki/German_tank_problem | | |
| ▲ | andy_ppp 7 hours ago | parent [-] | | The rest of the ID will be random enough that guessing it will take an extremely long time, unless all the tanks were inserted in the same microsecond of course. I’m not sure this is a security issue with UUID though! |
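For readers unfamiliar with the German tank problem referenced in this subthread: it applies to sequential IDs rather than to the random part of a UUID. A minimal Python sketch of the standard frequentist estimator, assuming IDs start at 1 and the observed sample is drawn uniformly without replacement; `estimate_total` is a hypothetical name.

```python
def estimate_total(observed_ids: list[int]) -> float:
    """Estimate the total count N from a sample of sequential IDs using
    N ~ m + m/k - 1, where m is the largest ID seen and k the sample size."""
    k = len(observed_ids)
    m = max(observed_ids)
    return m + m / k - 1


if __name__ == "__main__":
    # Four leaked invoice numbers are enough for a rough volume estimate.
    print(estimate_total([19, 40, 42, 60]))
```

With timestamped IDs the analogous leak is when records were created, not how many exist.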
| |
| ▲ | Hizonner 10 hours ago | parent | prev | next [-] | | Because anything that knows the primary key now knows the timestamp. The UUID itself leaks information. It's not that it's not adding security. It's that it's actually subtracting security. | | |
| ▲ | andy_ppp 7 hours ago | parent | next [-] | | There’s every chance the API has timestamps on when it was inserted. Honestly I’d rather my data was ordered correctly than imagining the extremely rare situations that leaking the insert time is going to bring the world falling down. You usually want that information. And I’m honestly not a fan of public services using primary keys publicly for anything important. I’d much rather nice or shorter URLs. What might be an improvement is if you can look up records efficiently from the random bits of the UUID automatically, replacing the timestamp with an index. | |
| ▲ | lucideer 10 hours ago | parent | prev [-] | | > leaks information It would have to leak sensitive information to be "subtracting security", which implies you're relying on timestamp secrecy to ensure your security. This would be one of the "other problems" the gp mentioned. | | |
| ▲ | atomicnumber3 10 hours ago | parent [-] | | Pretty much any information can be used for something. You're ignoring everything they say about how something not critical to application security may still not be desirable to be leaked for other reasons. Example: Target and Walmart may not depend on satellites being unable to image their parking lots from the perspective of loss prevention or corporate security. But it still leaks information they may not want financial analysts to know about their performance. | | |
| ▲ | lucideer 9 hours ago | parent | next [-] | | You've used an analogy instead of an example to demonstrate your point: analogies can be helpful for explaining concepts but are rarely accurate enough to prove logical parity. It would be much easier to discuss the merits of your argument if you had an example of the dangers of leaking creation timestamps for database entries. Otherwise, carparks & database creation timestamps have nothing in common that is meaningfully relevant to your argument. You cannot just generalise all worldly concepts & call it a day. | | |
| ▲ | atomicnumber3 4 hours ago | parent [-] | | The other post literally mentions using creation timestamps to judge growth rates of companies on a platform. My analogy was meant for a reader with a modicum of ability to connect dots to better interpret the parent and aunt/uncle replies. |
| |
| ▲ | limagnolia 10 hours ago | parent | prev [-] | | Sam Walton used to fly investors in his plane over Walmart stores and ask them to count the cars in the parking lot, then he would fly them over competitors' stores and ask the same. Just a fun fact about how this is a very real scenario! |
|
|
| |
| ▲ | sz4kerto 10 hours ago | parent | prev | next [-] | | Example: if user IDs are not random but eg Bigserial (autoincremented) and they're exposed through some API, then API clients can infer the creation time of said users in the system. Now if my system is storing eg health data for a large population, then it'll be easy to guess the age of the user. Etc. This is not a security problem, this is an information governance problem. But it's a problem. Now if you say that I should not expose these IDs - fine, but then whatever I expose is essentially an ID anyway. | | |
| ▲ | andy_ppp 7 hours ago | parent [-] | | I really don’t think using primary keys publicly is ever good. Just because UUID4 has allowed people to smash junk into the URL doesn’t mean it’s better for the web or the users than a slug or a cleaner ID. |
| |
| ▲ | echelon 8 hours ago | parent | prev [-] | | Depends how much entropy is in your primary keys. If your primary keys are monotonic or time based, bad actors can simply walk your API. |
|
|
| ▲ | wpollock 11 hours ago | parent | prev | next [-] |
| I wonder if the issue is with exposing internal IDs to end users. I'm sure the experts here have already thought of this, but could someone explain why using encryption or even an HMAC for external views of a primary key doesn't make sense? Maybe because the extra processing is more expensive than just using UUIDv4? Using a KDF such as argon2id on the random bits of a UUIDv7 seems like it might work well for external IDs. (And why the heck are different types or variants of UUIDs called "versions"?) |
| |
| ▲ | Hizonner 10 hours ago | parent | next [-] | | Because now, for the rest of eternity, every single person who writes any code that moves data from this table to somewhere else, for any purpose, has to remember that the primary key gives away the creation time of something, which can potentially be linked to something else. A lot of people won't notice that, and a lot of people who do notice it will get the remediation wrong. And you can now forget using a simple view on the database to give any information to any person or program that shouldn't get the creation times. You've embrittled your system. | | |
| ▲ | gfody 9 hours ago | parent [-] | | the question was why not use encryption (sqids/hashids/etc) to secure publicly exposed surrogate keys, I don't think this reply is on point .. surrogate keys ideally are never exposed (for a slew of reasons beyond just leaking information) so securing them is a perfectly reasonable thing to do (as seen everywhere on the internet). otoh using any form of uuid as surrogate key is an awful thing to do to your db engine (making its job significantly harder for no benefit) > You've embrittled your system. this is the main argument for keeping surrogate keys internal - they really should be thought of like pointers, dangling pointers outside of your control are brittle. ideally anything exposed to the wild that points back to a surrogate key decodes with extra information you can use to invalidate it (like a safe-pointer!) |
| |
| ▲ | bri3d 9 hours ago | parent | prev | next [-] | | It does make all kinds of sense and is a majorly underutilized tool. | |
| ▲ | gfody 10 hours ago | parent | prev [-] | | > but could someone explain why using encryption or even an HMAC for external views of a primary key doesn't make sense? it does make sense and it's what you should do instead of using a UUID as PK for this purpose. |
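A sketch of what encrypting an internal ID at the API boundary could look like, using only the Python standard library. It builds a small Feistel network over 64-bit integers with HMAC-SHA256 as the round function, so the mapping is reversible with the key and needs no lookup table. Everything here is an assumption for illustration: `SECRET_KEY`, `ROUNDS`, `encode_id`, and `decode_id` are made-up names, and this is not a vetted cipher; a production system would more likely use a reviewed block cipher or format-preserving encryption from a maintained library.

```python
import hashlib
import hmac

SECRET_KEY = b"example-key-change-me"  # hypothetical; load from secret storage
ROUNDS = 4                             # illustrative choice, not a recommendation
MASK32 = 0xFFFFFFFF


def _round(half: int, round_no: int) -> int:
    """Keyed round function: HMAC over (round number, 32-bit half)."""
    msg = round_no.to_bytes(1, "big") + half.to_bytes(4, "big")
    digest = hmac.new(SECRET_KEY, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def encode_id(internal_id: int) -> int:
    """Map a 64-bit internal ID (e.g. a BIGSERIAL) to an opaque 64-bit value."""
    if not 0 <= internal_id < 1 << 64:
        raise ValueError("internal_id must fit in 64 bits")
    left, right = internal_id >> 32, internal_id & MASK32
    for i in range(ROUNDS):
        left, right = right, left ^ _round(right, i)
    return (left << 32) | right


def decode_id(external_id: int) -> int:
    """Invert encode_id by running the rounds backwards."""
    if not 0 <= external_id < 1 << 64:
        raise ValueError("external_id must fit in 64 bits")
    left, right = external_id >> 32, external_id & MASK32
    for i in reversed(range(ROUNDS)):
        left, right = right ^ _round(left, i), left
    return (left << 32) | right


if __name__ == "__main__":
    public = encode_id(12345)
    print(public, decode_id(public))
```

The trade-off is that the key becomes part of your data model: rotating it changes every public ID, so plan for that before exposing the values in URLs.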
|
|
| ▲ | alex_duf 7 hours ago | parent | prev | next [-] |
| UUIDv7 makes sense when a distributed system needs to insert vast amounts of data that will be consumed chronologically. Typically an event log table. Anything else, as you're rightly pointing out, is a bit of a stretch. |
| |
| ▲ | dietr1ch 7 hours ago | parent [-] | | The distributed part is what forces creating Ids outside of the server, where UUIDs become useful, and also where systems become reliable. Last year I went to renew my Id and they told me, sorry, the (centralised) system is down, but before computers things were done in a more resilient local, offline authoring + sync when convenient way that didn't result in "Sorry, computer says no. Schedule a new appointment." |
|
|
| ▲ | halayli 7 hours ago | parent | prev | next [-] |
| I think privacy fits better than security. If your primary key is being used as a secret then you probably got your schema wrong. How will you encrypt them when required? |
|
| ▲ | nighthawk454 9 hours ago | parent | prev | next [-] |
| Recently someone shared a method for encrypting the timestamp portion as well: https://news.ycombinator.com/item?id=45275973 |
|
| ▲ | kasperset 11 hours ago | parent | prev | next [-] |
| Currently evaluating UUIDv7 as the primary key for some inventory origin data. I think it should be OK to use it for such a use case, since it will indicate the time of creation? Any thoughts? |
| |
| ▲ | gtowey 11 hours ago | parent | next [-] | | You have to ask what problems exactly you are solving. Unless there is a compelling reason to use them, sticking with auto increment IDs is much simpler. And I say this as someone who recently had to convert some tables from auto increment IDs to UUIDs. In that instance, they were sharded tables that were relying on the IDs to be globally unique, and made heavy use of the IDs to scan records in time order. So UUIDs were something which could solve the first problem while preserving the functionality of the second requirement. | | |
| ▲ | elcritch 5 hours ago | parent [-] | | Yeah depends a lot on scale. If the inventory system only holds thousands of items, UUIDs just add a lot of headache for little gain. Your distributed table case sounds like a great use case for UUIDv7. |
| |
| ▲ | andy_ppp 7 hours ago | parent | prev [-] | | It’s a perfectly good choice, most of the complaints here are exaggerated. If your inventory has SKUs use those externally/for links and for API lookups if possible. |
|
|
| ▲ | moron4hire 11 hours ago | parent | prev | next [-] |
| This is why the UUID versions should have been labeled by letter rather than number. Each UUID version doesn't replace the last. They do different things. The numbered versioning gives the impression that "higher numbers = better" and that's neither the case nor the intention. |
|
| ▲ | bricss 11 hours ago | parent | prev [-] |
| If knowing IDs has a negative impact on security, then the application's system design is probably trash. |
| |
| ▲ | dietr1ch 10 hours ago | parent | next [-] | | The actual concern is privacy.
Privacy-wise,
- Knowing sequential IDs leaks the rate of creation and the number of said entities, which can translate into number of customers or rate of sales.
- Knowing timed IDs leaks activity patterns. This gets worse as you cross-reference data.
- Random IDs reveal nothing.
---
Security-wise,
- Sequential IDs can be guessed.
Performance-wise,
- Sequential IDs may result in self-inflicted hotspots.
- Spanner doesn't like writing rows keyed first with timestamps, https://cloud.google.com/spanner/docs/schema-design#primary-key-prevent-hotspots.
- Random IDs lend themselves to sharding, but make indexing, column compression, and maintaining order after inserts hard. | | |
| ▲ | bearjaws 9 hours ago | parent | next [-] | | - Knowing sequential IDs leaks the rate of creation and amount of said entity which can translate in number of customers or rate of sales. This implies the existence of an endpoint that returns a list of items, which could by itself be used to determine customers or rate of sales. This also means you have a broken security model that leaks a list of customers or list of sales, that you should probably not have access to begin with. - Knowing timed IDs leaks activity patterns. This gets worse as you cross reference data. Again if you can list items freely you can do this anyway, capture what exists now and do diffs to determine update times and creation times. | | |
| ▲ | dietr1ch 7 hours ago | parent [-] | | With sequential Ids you use one of two sequences, - table-global sequence :: Which leaks activity signals to all users that can create and see new Ids. This is the naive sequence you get when creating an incremental Id in a table. - user-local sequence :: How many invoices a single user has, which is safe if kept within the reach of a single user. The sequence, though, is slower and more awkward to generate. Say you have a store that allows a user to check out just their own invoices. - store.com/profile/invoices/{sequence_id}/ This does not imply that using a random Id will return you the data from another user, so it isn't necessarily as unsafe as you guessed. You'll probably get a 404 that does not even acknowledge the existence of said Id (but may be susceptible to timing attacks to guess if it exists). --- With timed Ids you do need a data leak out of the bubble of a single user. Database design should always try to guard against that anyway. That's why we salt our passwords and store only their digest (right?). |
| |
| ▲ | jrockway 9 hours ago | parent | prev [-] | | Why leak your primary keys? They are for the DBMS, not your end users. | | |
| ▲ | dietr1ch 7 hours ago | parent [-] | | Primary keys to what? Users wanting to get a specific piece of data will need to know some user-visible Id for that. You can masquerade internal Ids with opaque Ids if you want to maintain a translation layer. There's also more distributed use cases that require coming up with new Ids in isolation, so they will be "exposed" anyways as you sync up with other nodes. |
|
| |
| ▲ | bearjaws 11 hours ago | parent | prev [-] | | Yeah I am trying to imagine a universe where having the creation time of an item breaks your security model and every path I go down is that the system has terrible security. | | |
| ▲ | Hizonner 10 hours ago | parent [-] | | I know that the person I'm stalking created a pseudonymous account on service X around time Y. Based on other information, I have a limited number of suspect accounts. The creation time leaks to me, either via a bug which would otherwise have been harmless, or because somebody writing code "can't imagine a universe where having the creation time of an item breaks your security". I use the creation time to figure out which of my candidates is actually the target. It took me under 15 seconds to come up with that. | | |
| ▲ | bearjaws 9 hours ago | parent [-] | | It took you 15 seconds because it's a terrible example; _around time Y_ is doing insane lifting of this concept. Then "based on other information": okay, so some other information is enabling this. | | |
| ▲ | Hizonner 9 hours ago | parent [-] | | It turns out that in reality, I usually know both "around time Y" and "other information". You're going to narrow me down from 10 accounts to 1, or from 100 to 10. | | |
| ▲ | bricss 9 hours ago | parent [-] | | In a huge number of cases you will have timestamps in the payload anyway, since most db records will have unredacted createdOn, updatedOn fields for display in the UI. |
|
|
|
|
|