| ▲ | ZYbCRq22HbJ2y7 6 days ago |
| > You’re supposed to store timestamps instead, and treat the presence of a timestamp as true. I do this sometimes but not always - in my view there’s some value in keeping a database schema immediately-readable. Seems overly negative of broad advice on a good pattern? is_on => true
on_at => 1023030
Sure, that makes sense. is_a_bear => true
a_bear_at => 12312231231
Not so much, as most bears do not become bears at some point after not being a bear. |
|
| ▲ | grey-area 6 days ago | parent | next [-] |
| I’d see the booleans as a bad thing in almost all cases, instead of a boolean you can have a timestamp or an integer field (which can expand later). In the is_a case almost always a type or kind is better as you’ll rarely just have bears even if you only start with bears, just as you rarely have just two states for a status field (say on or off), often these expand in use to include things like suspended, deleted and asleep. So generally I’d avoid booleans as they tend to multiply and increase complexity partially when they cover mutually exclusive states like live, deleted and suspended. I have seen is_visible, is_deleted and is_suspended all on the same table (without a status) and the resulting code and queries are not pretty. I’d use an integer rather than a timestamp to replace them though. |
| |
| ▲ | ZYbCRq22HbJ2y7 5 days ago | parent | next [-] | | Yeah, I mean, an integer can definitely hold more data than a boolean. If your data was simple enough, you could have an integer hold the entire meaning of a table's row, if every client understood how it was interpreted. You could do bitwise manipulations, encodings, and so on. Sometimes it is nice to understand what the data means in the schema alone. You can do that with enums, etc. ate_an_apple_in_may_2024
saw_an_eclipse_before_30
These are more of the sort of things I don't see needing enums, timestamps, integers... | | |
| ▲ | grey-area 5 days ago | parent [-] | | I’ve never seen anything remotely like that in a schema and it seems inappropriate anyway - those should IMO be timestamps like saw_eclipse_at not booleans. You should not encode business rules in the schema (like certain magic dates) because those business rules always change over time. | | |
| ▲ | ZYbCRq22HbJ2y7 5 days ago | parent [-] | | Yeah, need a bit of imagination to walk there with me, but you are saying just use `saw_eclipse_at`. That would require having knowledge of the entity's birthday, the date that the event occurred on, and so on, which in this imaginary scenario, we do not. | | |
| ▲ | grey-area 5 days ago | parent [-] | | I would not store that in a schema, storing bday and date seen is much more useful so that when the business inevitably asks for saw eclipse at 50 too you can answer the question without adding a saw at 50 boolean. Bday is also super useful info for a business that cares how old someone is (as your hypothetical one clearly does). Often the stated requirements of a problem are far too specific. Part of the job is system design is saying no to suggestions and finding the deeper constraints. This example is a great illustration of why booleans are usually a mistake. | | |
| ▲ | 4 days ago | parent | next [-] | | [deleted] | |
| ▲ | ZYbCRq22HbJ2y7 4 days ago | parent | prev | next [-] | | > This example is a great illustration of why booleans are usually a mistake. It is not. You are ignoring hypothetical constraints to bolster your own hypothetical argument, which is apparently some permutation of, "tell people they are wrong, and do some imputing of imaginary data. repeat until you have ideal 'system design'" | |
| ▲ | 4 days ago | parent | prev [-] | | [deleted] |
|
|
|
| |
| ▲ | 5 days ago | parent | prev [-] | | [deleted] |
|
|
| ▲ | setr 6 days ago | parent | prev | next [-] |
| If you take the statement at face value — essentially storing booleans in the db ever is a bad smell - then he’s correct. Although I’m not even sure it’s broadly a good principle, even in the on_at case; if you actually care about this kind of thing, you should be storing it properly in some kind of audit table. Switching bool to timestamp is more of a weird lazy hack that probably won’t be all that useful in practice because only a random subset of data is being tracked like that (Boolean data type definitely isn’t the deciding factor on whether it’s important enough to track update time on). The main reason it’s even suggested is probably just that it’s “free” — you can smuggle the timestamp into your bool without an extra column — and it probably saved some effort accidentally; but not because it’s a broadly complete solution to the set of problems it tries to solve for I’ve got the same suspicion with soft-deletes — I’m fairly positive it’s useless in practice, and is just a mentally lazy solution to avoid proper auditing. Like you definitely can’t just undelete it, and it doesn’t solve for update history, so all you’re really protecting against is accidental bulk delete caught immediately? Which is half the point of your backup |
| |
| ▲ | moebrowne 6 days ago | parent | next [-] | | It's well documented that soft delete is more of a headache than it's worth https://brandur.org/soft-deletion | | |
| ▲ | ZYbCRq22HbJ2y7 4 days ago | parent [-] | | Maybe well opined on, not well-documented. There are clients (ORM, etc) that handle soft deletion just fine in many of the scenarios laid out in that article. |
| |
| ▲ | maxbond 6 days ago | parent | prev [-] | | Audit tables are a big ask both in terms of programming effort to design and support them, and in terms of performance hit due to write amplification (all inserts and updates cause an additional write to an audit table). Whereas making a bool into a timestamp is free. Including timestamps on rows (including created_at and updated_at) are real bacon savers when you've deployed a bug and corrupted some rows and need to eg refund orders created in a certain window. | | |
| ▲ | mrkeen 5 days ago | parent | next [-] | | Audit tables are a dumb concept because they imply bolting on an actual source of truth in addition to the regular not so source of truth tables, and only if the programmer gets around to it (like documentation or logging or whatever else falls along the wayside). | | |
| ▲ | tremon 5 days ago | parent | next [-] | | This doesn't make sense to me-- if the regular tables don't capture the true state, then the audit tables based on them will not magically become a source of truth either. | | |
| ▲ | mrkeen 5 days ago | parent [-] | | Exactly. If they agree, then there was no point to introducing a second source of truth. If they disagree, then which one do you trust? |
| |
| ▲ | RaftPeople 5 days ago | parent | prev [-] | | > Audit tables are a dumb concept because they imply bolting on an actual source of truth in addition to the regular not so source of truth tables, The regular table is the source of truth, the audit table is just a historical record of what changed and when. | | |
| ▲ | mrkeen 5 days ago | parent [-] | | So what do you do if a balance has $30 and the audit table shows two deposits of $20? | | |
| ▲ | setr 4 days ago | parent | next [-] | | If you’re recording events as intent, then the event table is the source of truth and the state of the database is effectively snapshot of the state. In principle, you should be able to replay the events to produce the current state. If it calculates wrong, go fix your calculator, assume your events table isn’t wrong, and replay. If you’re storing a log of database changes (what’s usually meant by an audit table), as in (Table_Name, Column_Name, Old_Value, New_Value, Timestamp) Then the database state is the source of truth, and the audit table is simply a record of changes ever occurred that reached that state. In either case, there’s a single source of truth. | |
| ▲ | maxbond 4 days ago | parent | prev [-] | | The same as if you didn't have an audit table and the balance was wrong. You declare some kind of incident and investigate until you find this critical bug. Hopefully you can use the audit trail to determine what the right answer is and fix everyone's balance. How do you know which one to trust? By reading the code and figuring out what the bug is. |
|
|
| |
| ▲ | valenterry 5 days ago | parent | prev | next [-] | | This. The mere fact that it's much easier to find deleted/impacted entities is worth it. | |
| ▲ | setr 5 days ago | parent | prev [-] | | > Including timestamps on rows (including created_at and updated_at) are real bacon savers when you've deployed a bug and corrupted some rows and need to eg refund orders created in a certain window. But that’s my point. You’re making an active decision to record timestamps on important events (and no bool was being converted here); bool —> timestamp everywhere is not the same thing — the bool data type is not a useful signal for whether this change needs to be timestamp-tracked. Either think and choose to track these particular changes or not, and drop in the appropriate tracking. The mindless Bool—>timestamp change is only ever suggested because it’s a “why not”, not because it’s a good practice that often leads to good things. The audit table is of course just deciding “every change ever is important” | | |
| ▲ | maxbond 5 days ago | parent [-] | | Setting a boolean is, quite often, an important event you want to keep track of. (Specifically, when it is a flag indicating an event took place, eg, once it's been set it is rarely if ever unset.) Of course it isn't universally applicable to every boolean column; nothing is. Of course you need to understand your schema and the problem you're solving. That doesn't make it mindless or make it any less valuable. It is a good pattern which does lead to good things. If it is mindless, your suggestion of tracking every change in an audit table, and the created_at and updated_at fields you approved of, are equally mindless. As it stands, I feel I've articulated straightforward and realistic advantages of using timestamps as booleans. But I haven't heard from you why I shouldn't. | | |
| ▲ | setr 5 days ago | parent [-] | | The original premise, from TFA, is Another is the Twitter-optimized “you’re a terrible engineer if you ever store booleans in a database” clever trick
That is, universal application. My point is: doing that, for the most part, will not the track the things you actually want, because bools are not the driving decision maker on value of tracking (and many things that need to be tracked will not be bools). It may accidentally capture useful tracking, but it’s more by accident than anything else. Thus, universal application is clearly wrong, and TFA is right to apply it conditionally.You appear to agree on all of these points. I’m not arguing that, should you need to timetrack a bool, bool—>timestamp is a bad way to do it. As I’ve said, the universal application of it is only stated because it’s free to do. But being free doesn’t make it useful, and since you agree on that as well, I don’t know if there’s any argument actually occurring here. | | |
| ▲ | maxbond 4 days ago | parent [-] | | I think we agree more than we disagree on how to go about designing a schema, but that my narrow disagreement is that I think a.) timestamps and enums are better choices than booleans most of the time (without judging anyone who uses booleans to be a "terrible engineer", it's not malpractice) and b.) that's a legitimate trick of the trade rather than something "mindless". I would agree that throwing a table together based on Twitter vibes without actually designing it will only work like a stuck clock, but I think that goes without saying. I kinda disregard opinions attributed to nebulous Twitter posters and not specific people. I think when you conflate people's opinions together like that you reduce their position to a caricature. If you poll that group of people almost all of them will have a more nuanced opinion. |
|
|
|
|
|
|
| ▲ | spiddy 6 days ago | parent | prev | next [-] |
| though why treat booleans as special case and keep timestamps for them when you don’t for integers with this pattern: isDarkTheme: {timestamped}
paginationItems: 50 I can see when dark theme was activated but not when pagination was set to 50. also, i can’t see when dark theme is being deactivated either. seems like a poor-man changelog. there maybe use cases for it but i can’t think of anything tbh. |
|
| ▲ | oftenwrong 5 days ago | parent | prev | next [-] |
| A boolean is smaller, which is a relevant consideration for some workloads. For example, you may be pre-aggregating a large amount of data to serve a set of analytical queries which do not care about the associated timestamp. The smaller data type is more efficient both in storage and in query execution. Additionally, there are situations where it is logical to store a boolean. For example, if the boolean denotes an outcome: process_executed_at timestamp not null
process_succeeded boolean not null
|
| |
| ▲ | maxbond 5 days ago | parent [-] | | It's unlikely the boolean will result in better utilization, the savings will probably be consumed by padding. Most people don't know how to use structure packing to create a row which is actually smaller after it's been padded (though it's not very hard, anyone could learn). Columns are generally ordered by which features were shipped first and not by alignment (as is necessary to minimize padding). I do try my best to pack my columns, but it's a fragile and likely premature optimization. Better to opt for something defensive at a cost of like, 7 bytes per row (in Postgres). |
|
|
| ▲ | Lionga 6 days ago | parent | prev | next [-] |
| All this general advice is quite useless and needs millions of asterix. Good system design is designing a system that works best for the problem at hand. |
| |
|
| ▲ | seafoamteal 6 days ago | parent | prev [-] |
| I think in that situation, you could have an enum value that contains Bear and whatever other categories you are looking at. |
| |
| ▲ | ZYbCRq22HbJ2y7 6 days ago | parent [-] | | Sure, but this was for demonstration purposes showing that some data has other meaning that doesn't have an instantiation state dependent on time. |
|