> Feature Requests

> Concerning schema changes: they desire PostgreSQL to record a history of schema change events, such as adding or removing columns and other DDL operations.

You can do this right now today by using `EVENT TRIGGER`s. You can check out things like Aquameta[0] (if I remember correctly) to see how it's done.

[0] https://github.com/aquametalabs/aquameta

▲ weitendorf a month ago | parent | next [-]

We are building similar functionality for our own postgres setup right now.

Of course, postgres is very powerful and you can implement anything like this in many different ways. But at the same time, maintaining DDL history and tracking major changes to the database is a very common requirement, and unfortunately many people don't realize that until they learned that lesson the hard way.

Relatedly are not DDL changes per-se, but big/important db operations that you want to also keep a record of so that you can look back and understand why something changed. I am not sure if this is the right term, but basically when we update our pricing model or skus, or set custom pricing for someone, we want those updates to be "auditable".

Actually, I think this is a relatively common use case too: a fully relational model often leaves you with a large number of "static" tables that only change when you're making updates to your application. They support the smaller number of big, important, dynamic tables that your application is regularly modifying. If you had the foresight to recognize that you'd likely need to change those static tables in the future, you probably organized them so you could do so by adding new rows. It is not quite a DDL change but it is a big, risky change to your application logic that only happens rarely, and you basically just want to keep a list of all those big changes in case things get messed up or you find yourself unable to make sense of older data.

▲ cryptonector a month ago | parent | next [-]

There's lots of extensions that automatically create audit tables and record history in them. The trick is to a) create audit tables for all _existing_ tables, b) create an event trigger so you can have audit tables created automatically for any future new tables. Here's an example of how to do it: https://github.com/twosigma/postgresql-contrib/blob/master/a...

Another thing you might do is to go with a schema that follows the event shipping pattern. In this pattern you have the "truth" held in insert-only tables (deletes and updates not allowed), then turn those "event" tables into ones you can query naturally using VIEWs, MATERIALIZED VIEWs, or live tables that you update with triggers on the event tables. Then your event tables _are your history/audit_ tables.

▲

weitendorf a month ago | parent [-]

Yeah, I guess what I mean is, it's good for very common use cases to have fully supported features in the product itself even if there are third party tools to handle it for you or I already know how to implement it myself.

I have been working on database features and functionality that I felt I moderately to fully understood and just needed to sit down and implement in the next week, for close to 3 weeks now.

> In this pattern you have the "truth" held in insert-only tables (deletes and updates not allowed), then turn those "event" tables into ones you can query naturally using VIEWs, MATERIALIZED VIEWs, or live tables that you update with triggers on the event tables.

This is almost exactly what I'm doing, with an additional versioning column (primary key on (id, version_num)). After I did that I realized that it'd be better to correlate changes with a push_id too because if some operations didn't modify all ids then I wouldn't be able to easily tell which versions were updated at the same time. But then I realized most of my "auditable pushes" would be operations on 3-4 related tables and not just an individual table, so push_ids would be performed on all tables. And also, since not every push modifies every value, it makes sense to model pushes as additions + diffs to the existing table. But then after several pushes constructing the MATERIALIZED VIEW of active values becomes rather complex because I have to convert a sparse tree of diffs across multiple tables into a flat table recursively...

So yeah it would be pretty nice for postgres to have something that mostly just works to audit changes at either the user, function, or table level.

	▲	cryptonector a month ago \| parent [-]
		> it's good for very common use cases to have fully supported features in the product itself There are a _lot_ of incredibly useful extensions to PG. What you find useful and necessary someone else might find to be unnecessary bloat. Over time the industry's demands will become clear. Another issue is that different users want auditing done differently, and so it might be difficult for PG to have one solution that fits all use-cases.

▲ aftbit a month ago | parent | prev [-]

    CREATE TABLE public.audit (
        id uuid NOT NULL,
        created_time timestamp without time zone DEFAULT now() NOT NULL,
        schema_name text NOT NULL,
        table_name text NOT NULL,
        record_id uuid NOT NULL,
        user_name text,
        action text NOT NULL,
        old_data jsonb,
        new_data jsonb
    );

    CREATE OR REPLACE FUNCTION audit_if_modified_func() RETURNS TRIGGER AS $body$
    DECLARE
        v_old_data JSONB;
        v_new_data JSONB;
    BEGIN
        IF (TG_OP = 'UPDATE') THEN
            v_old_data := to_jsonb(OLD.*);
            v_new_data := to_jsonb(NEW.*);
            IF (TG_TABLE_NAME::TEXT = 'users') THEN
                v_old_data = v_old_data - 'last_login_time';
                v_new_data = v_new_data - 'last_login_time';
            END IF;
            IF (v_old_data <> v_new_data) THEN
                INSERT INTO audit (id,record_id,schema_name,table_name,user_name,action,old_data,new_data)
                    VALUES (uuid_generate_v4(), NEW.id, TG_TABLE_SCHEMA::TEXT, TG_TABLE_NAME::TEXT,
                            session_user::TEXT, substring(TG_OP,1,1), v_old_data, v_new_data);
            END IF;
            RETURN NEW;
        ELSIF (TG_OP = 'DELETE') THEN
            v_old_data := to_jsonb(OLD.*);
            INSERT INTO audit (id, record_id,schema_name,table_name,user_name,action,old_data)
            VALUES (uuid_generate_v4(), OLD.id, TG_TABLE_SCHEMA::TEXT,TG_TABLE_NAME::TEXT,session_user::TEXT,substring(TG_OP,1,1),v_old_data);
            RETURN OLD;
        ELSIF (TG_OP = 'INSERT') THEN
            v_new_data := to_jsonb(NEW.*);
            INSERT INTO audit (id, record_id,schema_name,table_name,user_name,action,new_data)
            VALUES (uuid_generate_v4(), NEW.id, TG_TABLE_SCHEMA::TEXT,TG_TABLE_NAME::TEXT,session_user::TEXT,substring(TG_OP,1,1),v_new_data);
            RETURN NEW;
        ELSE
            RAISE WARNING '[AUDIT_IF_MODIFIED_FUNC] - Other action occurred: % at %',TG_OP,now();
            RETURN NULL;
        END IF;
    
    EXCEPTION
        WHEN data_exception THEN
            RAISE WARNING '[AUDIT_IF_MODIFIED_FUNC] - UDF ERROR [DATA EXCEPTION] - SQLSTATE: % SQLERRM: %',SQLSTATE,SQLERRM;
            RETURN NULL;
        WHEN unique_violation THEN
            RAISE WARNING '[AUDIT_IF_MODIFIED_FUNC] - UDF ERROR [UNIQUE] - SQLSTATE: % SQLERRM: %',SQLSTATE,SQLERRM;
            RETURN NULL;
        WHEN OTHERS THEN
            RAISE WARNING '[AUDIT_IF_MODIFIED_FUNC] - UDF ERROR [OTHER] - SQLSTATE: % SQLERRM: %',SQLSTATE,SQLERRM;
            RETURN NULL;
    END;
    $body$
    LANGUAGE plpgsql
    SECURITY DEFINER;
    
    CREATE TRIGGER audit_logger_accounts
        AFTER INSERT OR UPDATE OR DELETE ON accounts
        FOR EACH ROW EXECUTE PROCEDURE audit_if_modified_func();

    CREATE TRIGGER audit_logger_users
        AFTER INSERT OR UPDATE OR DELETE ON users
        FOR EACH ROW EXECUTE PROCEDURE audit_if_modified_func();

▲

weitendorf a month ago | parent [-]

It's not that maintaining some kind of audit history is an insurmountable problem, it's that it's both a very common need and the "right" way to do it can take significantly longer to set up than a quick naive approach. Once you take into account batching (many if not all of your pushes to auditable tables that are typically static will happen in batches within that table and across multiple other tables, because you are updating your application logic or implementing some new feature) and the need to not blow up your storage with audit logs by recording diffs, and the need to be able to "replay" or "rollback", etc. it's quite a lot to handle

And again, it's not impossible to do any of that. I just think it seems repeatable enough that it could be a first-party feature, and I would prefer to just enable or configure that feature and move on to other problems.

Going from something like your naive approach which I assume an LLM generated for you, to something productionized enough to bet your business on it, is not always trivial.

	▲	aftbit a month ago \| parent [-]
		>Going from something like your naive approach which I assume an LLM generated for you, to something productionized enough to bet your business on it, is not always trivial. I'll admit, this hurts to hear a bit. This is real production code I've been running in a webapp that initially launched targeting PostgreSQL 9.5. This is not some crazy web-scale situation - these specific tables only experience at most 1 qps of write load - but this has solved the problem for this business as it stands here.

▲ tudorg a month ago | parent | prev | next [-]

We use event triggers both in pgroll [0] and pgstream [1] to capture DDL changes. It's a feature of pgroll that it captures the schema migration history, regardless if you do the changes via pgroll or with another tool. And pgstream uses it to put the schema changes in the logical replication stream.

One thing to be aware is that on many Postgres DBaaS EVENT TRIGGERS are not allowed, because they generally require superuser. But RDS and Aurora do support them, we (Xata) support them of course, and I think Supabase is working on adding support for them.

[0]: https://github.com/xataio/pgroll

[1]: https://github.com/xataio/pgstream

▲ erichanson a month ago | parent | prev [-]

Thanks for remembering Aquameta and stay tuned! Some cool new stuff coming out shortly.

▲

cryptonector a month ago | parent [-]

Hey, I see you've added a web server. How does it compare to PostgREST?

▲

erichanson a month ago | parent [-]

Big difference is it's implemented in pl/pgsql so as Aquameta evolves, there's no external deps. There's a function called endpoint.request(http_verb, URL, post vars) and then it does the rest in plpgsql. A thin Go daemon just takes the request and throws it at the function.

It does a lot of the same stuff PostgREST does. Automatic REST interface to any database, but it also hosts static resources, and dynamic mapping of URL templates to functions.

Needs a rewrite though. That's probably the next big dev push. Right now we're rewriting the system catalog (meta) and the data VCS.

▲

cryptonector a month ago | parent | next [-]

Sounds really good. Your vision is excellent, and I admire it and your work.

BTW, it's very important to get the details of authentication and authorization right. PostgREST does: it validates JWTs and exposes the claims to the SQL application via set_config()s, just like every detail of every request, and it does a `SET` to set the role to whatever the authenticated user is, though this could be done by a SECURITY DEFINER function instead that looks at the claims.

	▲	erichanson a month ago \| parent [-]
		Real talk. https://www.postgresql.org/message-id/flat/894C0144-5BCC-41C... set_user extension is AFICT essential for scaling user-per-role.

▲

cryptonector a month ago | parent | prev [-]

I'm tempted to build something similar (because I can't use GPL libraries at $WORK and I also can't use PostgREST for reasons), but somewhere between what you do in Aquameta and what PostgREST does, with as much logic as possible in SQL or PlPgSQL.

Basically I'd have a thin HTTP/REST layer that does a set_config() for every request header and q-param, and which compiles the resource path to a SELECT (for GETs) or a DML (for POST/PUT/DELETE/PATCH), with similar restrictions to PostgREST, though perhaps a bit more liberal.

For DMLs the local-part would have to be just a table/view, or a function call (what to do with the request body in the case of a function call? reject it).

For a GET... I've ideas. The simplest would be to have a very simple "language" to compile to SQL, not too unlike PostgREST. The craziest one would be to have a transliteration of literal-value-free (certainly no strings, no quotes, no arrays, no row values, maybe only just integer and boolean literals) SQL statements to local-part, and the code to convert it back to a SELECT would parse the query and make sure it uses no tables/views/functions outside the public schema (and if functions, only ones marked pure), with q-params as... query parameters -- no SQL injection attacks here.

As in PostgREST the SQL code in the database (but not the caller's SQL) could set_config() things like response headers and status-code. Authentication would be done via SECURITY DEFINER functions that consume whatever Negotiate or Bearer token in the Authorization: header and SET the session role (which would always get RESET before the next request).

Except for the need for a parser for validating any queries (for the "crazy" design, if I go there) this would make the Java/Go/Rust/Whatever code quite simple, and w/o requiring any dynamic SQL in the DB.