Remix.run Logo
bricss 12 hours ago

If knowing IDs has a negative impact on security, then application system design is probably a trash.

dietr1ch 12 hours ago | parent | next [-]

The actual concern is privacy.

Privacy wise,

- Knowing sequential IDs leaks the rate of creation and amount of said entity which can translate in number of customers or rate of sales.

- Knowing timed IDs leaks activity patterns. This gets worse as you cross reference data.

- Random IDs reveal nothing.

---

Security wise,

- Sequential IDs can be guessed.

Performance wise,

- Sequential IDs may result in self-inflicted hotspots.

   - Spanner doesn't like  writing rows first keyed with timestamps, https://cloud.google.com/spanner/docs/schema-design#primary-key-prevent-hotspots.
- Random IDs lends themselves to sharding, but make indexing, column-compression, and maintaining order after inserts hard.
bearjaws 11 hours ago | parent | next [-]

- Knowing sequential IDs leaks the rate of creation and amount of said entity which can translate in number of customers or rate of sales.

This implies the existence of an endpoint that returns a list of items, which could by itself be used to determine customers or rate of sales. This also means you have a broken security model that leaks a list of customers or list of sales, that you should probably not have access to begin with.

- Knowing timed IDs leaks activity patterns. This gets worse as you cross reference data.

Again if you can list items freely you can do this anyway, capture what exists now and do diffs to determine update times and creation times.

dietr1ch 8 hours ago | parent [-]

With sequential Ids you use one of two sequences,

- table-global sequence :: Which leaks activity signals to all users that can create and see new Ids. This is the naive sequence you get when creating an incremental Id in a table.

- user-local sequence :: How many invoices a single user has, which is safe if kept within the reach of a single user. The sequence though, is slower and more awkward to generate.

Say you have a store that allows a user to check out just their own invoices.

- store.com/profile/invoices/{sequence_id}/

This does not imply that using a random id will return you back the data from another user, so it isn't necessarily as unsafe as you guessed. You'll probably get a 404 that does not even acknowledges the existence of said Id (but may be suspect to timing attacks to guess if it exists).

---

With timed Ids you do need a data leak out of bubble of a single user. Database design should always try to guard against that anyway. That's why we salt our passwords and store only their digest (right?).

jrockway 11 hours ago | parent | prev [-]

Why leak your primary keys? They are for the DBMS, not your end users.

dietr1ch 9 hours ago | parent [-]

Primary keys to what? Users wanting to get a specific piece of data will need to know some user-visible Id for that.

You can masquerade internal Ids with opaque Ids if you want to maintain a translation layer. There's also more distributed use cases that require coming up with new Ids in isolation, so they will be "exposed" anyways as you sync up with other nodes.

bearjaws 12 hours ago | parent | prev [-]

Yeah I am trying to imagine a universe where having the creation time of an item breaks your security model and every path I go down is that the system has terrible security.

Hizonner 12 hours ago | parent [-]

I know that the person I'm stalking created a pseudonymous account on service X around time Y. Based on other information, I have a limited number of suspect accounts. The creation time leaks to me, either via a bug which would otherwise have been harmless, or because somebody writing code "can't imagine a universe where having the creation time of an item breaks your security". I use the creation time to figure out which of my candidates is actually the target.

It took me under 15 seconds to come up with that.

bearjaws 11 hours ago | parent [-]

It took you 15 seconds because its a terrible example, _around time Y_ is doing insane lifting of this concept. Then "based on other information" okay so some other information is enabling this.

Hizonner 11 hours ago | parent [-]

It turns out that in reality, I usually know both "around time Y" and "other information". You're going to narrow me down from 10 accounts to 1, or from 100 to 10.

bricss 11 hours ago | parent [-]

In huge number of cases you will have timestamps in the payload anyway, since most db records will have unredacted createdOn, updatedOn fields for display in the UI.