Remix.run Logo
PaulHoule 5 days ago

The best part I think is "pg_query is special. It doesn’t actually implement parsing SQL. It works by extracting C source code directly from Postgres and wraps it with a nice Rust interface. This allows PgDog to understand all queries that Postgres can."

One of the things that puts metaprogramming on wheels is being able to pull in grammars, schemas and such for external systems. Think how many cool tools we could have if we could do this in a more general way.

thenonameguy 5 days ago | parent | next [-]

> One of the things that puts metaprogramming on wheels is being able to pull in grammars, schemas and such for external systems.

That's what I've been trying to do with: https://github.com/schemamap/schemamap

For a well-constrained Postgres schema, a deterministic SQL compiler can be built (with plenty of logic programming/constraint-solving/hypergraph-analysis) that can integrate arbitrary external systems data.

While this is NP-hard, with some clever use of embeddings and ontologies, and use of every single constraint AST within the DB to reduce the search space, this becomes a feasible problem.

For any Clojurists interested, I've packaged `pg_query`, so you can use it in your applications: https://github.com/schemamap/pg-query-clj

At the moment I'm saving up for the next development cycle, so not only PG->PG schema mappings can be solved for (JSON schema is next!). Hope this sounds interesting :)

Inviz 5 days ago | parent [-]

Json schema layer support sounds interesting. Truth be told I didn’t immediately figure out how your project works

mdaniel 5 days ago | parent | prev | next [-]

I had such high hopes for tree-sitter but once it went all "and then, $CC -c -o" all was lost :-(

jitl 5 days ago | parent [-]

What do you mean?

mdaniel 4 days ago | parent [-]

Unless the system that wishes to consume tree-sitter grammars has access to a linker, it being written in C, and then compiled to machine code, gravely limits the places it can be consumed. That's in contrast to any one of the hundreds of grammar compiler compiler systems that allow targeting a platform of choice without mandating a platform of choice. I do see that tree-sitter alleges to emit wasm, but I am not deep enough in that ecosystem to know if it means "wasm for the browser," or "wasm for extism[1]," or "wasm, but if you already have WASI" or what

1: e.g. https://github.com/1Password/onepassword-sdk-go/blob/v0.2.1/...

boomskats 5 days ago | parent | prev | next [-]

Supabase's postgres LSP works in a similar way iirc.

bri3d 5 days ago | parent [-]

Yes, the same way. It's all based on the extremely useful `https://github.com/pganalyze/libpg_query` project, which is where the "extracted the parser from Postgres" part comes in.

Supabase's LSP also uses tree-sitter for corrections and autocomplete, because one drawback of using the server's source is that pg_query only works on well-formed/executable SQL - when it detects a malformed query, it formulates an error and exits, since that's what you want in an SQL server. So for partially-correct syntax and fill-forward, tree-sitter covers the gaps.

pphysch 5 days ago | parent | prev [-]

I agree. Does anyone know much heavy lifting is done by pg_query in wrapping the Postgres code vs. Postgres in expressing that code in a manner that makes pg_query possible?

ZiiS 5 days ago | parent [-]

Tends to be a matter of opinion. Postgres does not expose the relevant functions, so https://github.com/pganalyze/libpg_query has to do some heavy lifting to convert their source code into a nice library. Conversely, Postgres is very well written code, in an extremely common language, with a stable release cadence, and such a long track record it is seen as the reference implementation for correctly parsing SQL.

lfittl 5 days ago | parent [-]

Yeah, as one of the main authors of libpg_query, I think the primary things that make this easier is that Postgres has good abstractions internally, and the parser works independently from other parts (e.g. the community discourages adding settings that affect parser behavior).

Over the years we've only had to maintain a small set of patches on top of the Postgres source [0], together with some mocks and our libclang-based extraction logic [1]. Of course it would be nice if Postgres just packaged this directly like it packages the client libraries, but there is non-trivial effort involved to do that. From what I recall, the main issue is that error handling and memory allocations work differently in the client-side libraries (and so that would have to either also be moved out of the backend source, or use some other abstraction).

[0]: https://github.com/pganalyze/libpg_query/tree/17-latest/patc...

[1]: https://github.com/pganalyze/libpg_query/blob/17-latest/scri...

dwmcc 4 days ago | parent [-]

Unrelated to your comment but a big fan of pganalyze. Makes Postgres infinitely more grokable for developers who aren’t experts at running and tuning a database. Keep up the good work!

lfittl 4 days ago | parent [-]

Thanks, glad to hear! I like to think that one of the reasons pganalyze is a good product (though there are always parts I'd like to improve, and feedback is always welcome) is because we like to use it ourselves to optimize our own database, where we can :)