weitendorf 12 hours ago

We are building similar functionality for our own postgres setup right now.

Of course, postgres is very powerful and you can implement anything like this in many different ways. But at the same time, maintaining DDL history and tracking major changes to the database is a very common requirement, and unfortunately many people don't realize that until they've learned the lesson the hard way.

Related, but not DDL changes per se, are big/important db operations that you also want to keep a record of so that you can look back and understand why something changed. I am not sure if this is the right term, but basically when we update our pricing model or skus, or set custom pricing for someone, we want those updates to be "auditable".

Actually, I think this is a relatively common use case too: a fully relational model often leaves you with a large number of "static" tables that only change when you're making updates to your application. They support the smaller number of big, important, dynamic tables that your application is regularly modifying. If you had the foresight to recognize that you'd likely need to change those static tables in the future, you probably organized them so you could do so by adding new rows. It is not quite a DDL change but it is a big, risky change to your application logic that only happens rarely, and you basically just want to keep a list of all those big changes in case things get messed up or you find yourself unable to make sense of older data.

cryptonector 11 hours ago | parent | next [-]

There are lots of extensions that automatically create audit tables and record history in them. The trick is to a) create audit tables for all _existing_ tables, and b) create an event trigger so audit tables get created automatically for any future new tables. Here's an example of how to do it: https://github.com/twosigma/postgresql-contrib/blob/master/a...
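
Roughly, the event-trigger half can look like this. This is just a sketch with made-up names (it assumes an "audit" schema already exists, and the naming/quoting is naive); the linked script is considerably more thorough:

    CREATE OR REPLACE FUNCTION make_audit_table_for_new_tables() RETURNS event_trigger AS $$
    DECLARE
        obj record;
    BEGIN
        FOR obj IN
            SELECT * FROM pg_event_trigger_ddl_commands()
            WHERE command_tag = 'CREATE TABLE'
              AND schema_name <> 'audit'   -- skip the audit schema itself so we don't recurse
        LOOP
            -- shadow table: same columns as the new table, plus bookkeeping fields
            -- (a real version would also attach the row-level audit trigger here)
            EXECUTE format(
                'CREATE TABLE IF NOT EXISTS audit.%I (LIKE %s, audit_op text, audit_ts timestamptz DEFAULT now())',
                obj.objid::regclass::text, obj.object_identity);
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;

    CREATE EVENT TRIGGER audit_new_tables
        ON ddl_command_end
        WHEN TAG IN ('CREATE TABLE')
        EXECUTE FUNCTION make_audit_table_for_new_tables();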

Another thing you might do is to go with a schema that follows the event shipping pattern. In this pattern you have the "truth" held in insert-only tables (deletes and updates not allowed), then turn those "event" tables into ones you can query naturally using VIEWs, MATERIALIZED VIEWs, or live tables that you update with triggers on the event tables. Then your event tables _are your history/audit_ tables.
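
A toy version of the pattern, with invented names:

    -- the append-only table is the source of truth...
    CREATE TABLE price_events (
        event_id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        sku         text        NOT NULL,
        price_cents integer     NOT NULL,
        recorded_at timestamptz NOT NULL DEFAULT now()
    );

    -- ...and the queryable "current state" is derived from it
    CREATE VIEW current_prices AS
    SELECT DISTINCT ON (sku) sku, price_cents, recorded_at
    FROM price_events
    ORDER BY sku, event_id DESC;

    -- grant application roles only SELECT/INSERT so the event table stays append-only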

weitendorf 11 hours ago | parent [-]

Yeah, I guess what I mean is, it's good for very common use cases to have fully supported features in the product itself, even if there are third-party tools to handle it for you, or even if I already know how to implement it myself.

I have been working, for close to 3 weeks now, on database features and functionality that I felt I understood moderately to fully and just needed to sit down and implement in the next week.

> In this pattern you have the "truth" held in insert-only tables (deletes and updates not allowed), then turn those "event" tables into ones you can query naturally using VIEWs, MATERIALIZED VIEWs, or live tables that you update with triggers on the event tables.

This is almost exactly what I'm doing, with an additional versioning column (primary key on (id, version_num)). After I did that I realized it'd be better to correlate changes with a push_id too, because if an operation didn't modify every id I couldn't easily tell which versions were updated at the same time. But then I realized most of my "auditable pushes" would be operations on 3-4 related tables rather than a single table, so a push_id would have to span all of them. And since not every push modifies every value, it makes sense to model pushes as additions + diffs to the existing table. But after several pushes, constructing the MATERIALIZED VIEW of active values becomes rather complex, because I have to recursively flatten a sparse tree of diffs across multiple tables into a flat table...
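
Heavily simplified, the shape is something like this (names made up, one table and one column standing in for several):

    CREATE TABLE pushes (
        push_id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        pushed_at timestamptz NOT NULL DEFAULT now(),
        note      text
    );

    -- each row is a sparse diff: NULL means "unchanged since the previous version"
    CREATE TABLE sku_versions (
        id          text    NOT NULL,
        version_num integer NOT NULL,
        push_id     bigint  NOT NULL REFERENCES pushes (push_id),
        price_cents integer,
        PRIMARY KEY (id, version_num)
    );

    -- "active values" = the most recent explicit value per column, per id
    CREATE MATERIALIZED VIEW active_skus AS
    SELECT DISTINCT ON (id)
           id,
           (SELECT v2.price_cents
              FROM sku_versions v2
             WHERE v2.id = v.id AND v2.price_cents IS NOT NULL
             ORDER BY v2.version_num DESC
             LIMIT 1) AS price_cents
    FROM sku_versions v
    ORDER BY id, version_num DESC;

And that's flattening one column of one table; repeat it for every column across 3-4 tables and it gets unwieldy fast.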

So yeah it would be pretty nice for postgres to have something that mostly just works to audit changes at either the user, function, or table level.

cryptonector 9 hours ago | parent [-]

> it's good for very common use cases to have fully supported features in the product itself

There are a _lot_ of incredibly useful extensions to PG. What you find useful and necessary someone else might find to be unnecessary bloat. Over time the industry's demands will become clear. Another issue is that different users want auditing done differently, and so it might be difficult for PG to have one solution that fits all use-cases.

aftbit 11 hours ago | parent | prev [-]

    -- assumes the uuid-ossp extension (for uuid_generate_v4) and that every
    -- audited table has a uuid "id" column
    CREATE TABLE public.audit (
        id uuid NOT NULL,
        created_time timestamp without time zone DEFAULT now() NOT NULL,
        schema_name text NOT NULL,
        table_name text NOT NULL,
        record_id uuid NOT NULL,
        user_name text,
        action text NOT NULL,
        old_data jsonb,
        new_data jsonb
    );

    CREATE OR REPLACE FUNCTION audit_if_modified_func() RETURNS TRIGGER AS $body$
    DECLARE
        v_old_data JSONB;
        v_new_data JSONB;
    BEGIN
        IF (TG_OP = 'UPDATE') THEN
            v_old_data := to_jsonb(OLD.*);
            v_new_data := to_jsonb(NEW.*);
            IF (TG_TABLE_NAME::TEXT = 'users') THEN
                v_old_data = v_old_data - 'last_login_time';
                v_new_data = v_new_data - 'last_login_time';
            END IF;
            IF (v_old_data <> v_new_data) THEN
                INSERT INTO audit (id,record_id,schema_name,table_name,user_name,action,old_data,new_data)
                    VALUES (uuid_generate_v4(), NEW.id, TG_TABLE_SCHEMA::TEXT, TG_TABLE_NAME::TEXT,
                            session_user::TEXT, substring(TG_OP,1,1), v_old_data, v_new_data);
            END IF;
            RETURN NEW;
        ELSIF (TG_OP = 'DELETE') THEN
            v_old_data := to_jsonb(OLD.*);
            INSERT INTO audit (id, record_id,schema_name,table_name,user_name,action,old_data)
            VALUES (uuid_generate_v4(), OLD.id, TG_TABLE_SCHEMA::TEXT,TG_TABLE_NAME::TEXT,session_user::TEXT,substring(TG_OP,1,1),v_old_data);
            RETURN OLD;
        ELSIF (TG_OP = 'INSERT') THEN
            v_new_data := to_jsonb(NEW.*);
            INSERT INTO audit (id, record_id,schema_name,table_name,user_name,action,new_data)
            VALUES (uuid_generate_v4(), NEW.id, TG_TABLE_SCHEMA::TEXT,TG_TABLE_NAME::TEXT,session_user::TEXT,substring(TG_OP,1,1),v_new_data);
            RETURN NEW;
        ELSE
            RAISE WARNING '[AUDIT_IF_MODIFIED_FUNC] - Other action occurred: % at %',TG_OP,now();
            RETURN NULL;
        END IF;
    
    EXCEPTION
        WHEN data_exception THEN
            RAISE WARNING '[AUDIT_IF_MODIFIED_FUNC] - UDF ERROR [DATA EXCEPTION] - SQLSTATE: % SQLERRM: %',SQLSTATE,SQLERRM;
            RETURN NULL;
        WHEN unique_violation THEN
            RAISE WARNING '[AUDIT_IF_MODIFIED_FUNC] - UDF ERROR [UNIQUE] - SQLSTATE: % SQLERRM: %',SQLSTATE,SQLERRM;
            RETURN NULL;
        WHEN OTHERS THEN
            RAISE WARNING '[AUDIT_IF_MODIFIED_FUNC] - UDF ERROR [OTHER] - SQLSTATE: % SQLERRM: %',SQLSTATE,SQLERRM;
            RETURN NULL;
    END;
    $body$
    LANGUAGE plpgsql
    SECURITY DEFINER;
    
    CREATE TRIGGER audit_logger_accounts
        AFTER INSERT OR UPDATE OR DELETE ON accounts
        FOR EACH ROW EXECUTE PROCEDURE audit_if_modified_func();

    CREATE TRIGGER audit_logger_users
        AFTER INSERT OR UPDATE OR DELETE ON users
        FOR EACH ROW EXECUTE PROCEDURE audit_if_modified_func();
weitendorf 11 hours ago | parent [-]

It's not that maintaining some kind of audit history is an insurmountable problem; it's that it's both a very common need and the "right" way to do it can take significantly longer to set up than a quick naive approach. Once you take into account batching (many if not all pushes to these typically-static auditable tables happen in batches, both within a table and across several related tables, because you're updating your application logic or implementing some new feature), the need to keep storage in check by recording diffs instead of full copies, the need to be able to "replay" or "rollback", etc., it's quite a lot to handle.
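
For example, even just the "record diffs instead of full row copies" part means writing your own helper, something like this rough sketch (jsonb_changed_keys is made up, not a built-in):

    -- keep only the keys whose values changed between the old and new row
    -- (keys removed from the new row aren't captured here)
    CREATE OR REPLACE FUNCTION jsonb_changed_keys(old_row jsonb, new_row jsonb)
    RETURNS jsonb AS $$
        SELECT COALESCE(jsonb_object_agg(key, new_row -> key), '{}'::jsonb)
        FROM jsonb_object_keys(new_row) AS t(key)
        WHERE new_row -> key IS DISTINCT FROM old_row -> key
    $$ LANGUAGE sql IMMUTABLE;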

And again, it's not impossible to do any of that. I just think it seems repeatable enough that it could be a first-party feature, and I would prefer to just enable or configure that feature and move on to other problems.

Going from something like your naive approach, which I assume an LLM generated for you, to something productionized enough to bet your business on is not always trivial.