| ▲ | SigmundA 7 hours ago |
| PostgreSQL uses a process-per-connection model and has no way to serialize a query plan into a form that can be shared between processes, so the time it takes to build the plan, including JIT, is very important. Most other DBs cache query plans, including jitted code, so they are basically precompiled from one request to the next with the same statement. |
|
| ▲ | zaphirplane 7 hours ago | parent | next [-] |
| What do you mean? Because the obvious thing is a shared cache, and if there is one thing the writers of a DB know, it is locking. |
| |
| ▲ | SigmundA 6 hours ago | parent [-] | | Sharing executable code between processes is not as easy as sharing data. AFAIK, unless something's changed recently, PG shares nothing about plans between processes and can't even share a cached plan between sessions/connections. | | |
| ▲ | _flux 5 hours ago | parent | next [-] | | Write the binary to a file, call it `libquery-id1234.so`, and link that into whichever processes need it? | | |
| ▲ | vladich an hour ago | parent | next [-] | | Won't work well if it executes 20k+ queries per second. Filesystem will be a bottleneck among other things. | |
| ▲ | SigmundA 2 hours ago | parent | prev [-] | | Might want to take a look at some research like this [1] that goes over the issues: "This obvious drawback of the current software architecture motivates our work: sharing JIT code caches across applications. During the exploration of this idea, we have encountered several challenges. First of all, most JIT compilers leverage both runtime context and profile information to generate optimized code. The compiled code may be embedded with runtime-specific pointers, simplified through unique class-hierarchy analysis, or inlined recursively. Each of these "improvements" can decrease the shareability of JIT compiled code." Anything's doable here with enough dev time. It would be nice if PG could just serialize the query plan itself, maybe as an SO along with non-process-specific executable code that then has to be dynamically linked again in other processes. 1. https://dl.acm.org/doi/10.1145/3276494 |
| |
| ▲ | llm_nerd 6 hours ago | parent | prev [-] | | Executable code is literally just data that you mark as executable. It has already produced the JIT code, and the idea that it can't then share it between processes is incomprehensible. I was actually confused by this submission, as it puts so much emphasis on initial compilation time when every DB (apparently except for pgsql) caches that result and shares/reuses it until invalidation. Invalidation can occur for a wide variety of reasons (data composition changing, age, etc.), but still, the idea of redoing it on every query, when most DBs see the same queries endlessly, is insane. | | |
| ▲ | vladich 2 hours ago | parent | next [-] | | The emphasis on compilation time there is because the JIT provider that comes with Postgres (LLVM-based) is broken in that particular area. But you're right, JITed code can be cached, if some conditions are met (it's position independent, for one). Not all JIT providers do that, but many do. Caching is on the table, but if your JIT-compilation takes microseconds, caching could be rather a burden in many cases. Still for some cases useful. | |
| ▲ | SigmundA 2 hours ago | parent | prev [-] | | No, a lot of jitted code has pointers to addresses specific to that process, which make no sense in another process. Making code shareable between processes takes effort and will have a tradeoff in performance, since it is not specialized to the process. If the query plan, which is more like an AST, were at least serializable, then at least that part could be reused, and then maybe each process could keep its jitted code cached in memory where the plan can reference it by some key. DBs like MSSQL avoid the problem because they run a single OS process with multiple threads instead. This is also why it can handle more connections easily, since each connection is not a whole process. |
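
A minimal Python sketch of the split suggested above: the plan is serialized to a process-neutral form and shared, while each process keeps its own compiled code, looked up by key. Everything here is hypothetical and for illustration only: `shared_plan_store` is a plain dict standing in for real shared memory, `Backend` stands in for one connection process, and the "JIT" is just a Python closure. None of it reflects how PostgreSQL is actually implemented.

```python
import hashlib
import json

# Hypothetical stand-in for a store visible to every backend process
# (in a real system this would be shared memory or similar).
shared_plan_store: dict[str, str] = {}


def plan_key(sql: str) -> str:
    """Derive a stable cache key from the query text."""
    return hashlib.sha256(sql.encode("utf-8")).hexdigest()


def publish_plan(sql: str, plan: dict) -> str:
    """Serialize a plan to JSON (no pointers, no process state) and share it."""
    key = plan_key(sql)
    shared_plan_store[key] = json.dumps(plan, sort_keys=True)
    return key


class Backend:
    """Toy model of one connection process with a private compiled-code cache."""

    def __init__(self) -> None:
        self._compiled = {}      # key -> callable, private to this "process"
        self.compile_count = 0   # how many times this backend had to compile

    def _compile(self, plan: dict):
        # Stand-in for JIT compilation: the closure captures process-local
        # objects, which is why it is never placed in the shared store.
        self.compile_count += 1
        column = plan["filter"]["column"]
        threshold = plan["filter"]["gt"]

        def run(rows):
            return [row for row in rows if row[column] > threshold]

        return run

    def execute(self, key: str, rows):
        fn = self._compiled.get(key)
        if fn is None:
            # Reuse the shared serialized plan; only compilation is redone here.
            plan = json.loads(shared_plan_store[key])
            fn = self._compile(plan)
            self._compiled[key] = fn
        return fn(rows)
```

With this shape, planning happens once for all backends, and each backend pays the compile cost once per plan rather than once per query.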
|
|
|
|
| ▲ | hans_castorp 6 hours ago | parent | prev [-] |
| > and it has no way to serialize a query plan to some form that can be shared between processes https://www.postgresql.org/docs/current/parallel-query.html "PostgreSQL can devise query plans that can leverage multiple CPUs in order to answer queries faster." |
| |
| ▲ | SigmundA 6 hours ago | parent [-] | | Nothing to do with plan caching; that's just talking about executing parallel operations. Is that thread-based or process-based in PG? If it's process-based, then they can send small parts of the plan across processes. | | |
| ▲ | hans_castorp 6 hours ago | parent [-] | | Ah, didn't see the caching part. Plans for prepared statements are cached though. | | |
| ▲ | SigmundA 2 hours ago | parent | next [-] | | Yes, if the client manually prepares the statement it will be cached for just that connection, because in PG a connection is a process, but it won't survive from one connection to the next, even in the same process. Other databases like MSSQL have prepared statements, but they are rarely used nowadays since plan caching based on query text was introduced decades ago. | |
| ▲ | AlisdairO 3 hours ago | parent | prev [-] | | Only on a per-connection basis |
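
To make the difference between the two caching styles in this sub-thread concrete, here is a rough Python sketch. It is a toy model, not any database's real API: `CountingPlanner`, `Session`, and `SharedPlanCache` are hypothetical names, and keying the shared cache on the exact query text is an assumption made for simplicity.

```python
class CountingPlanner:
    """Toy planner that records how often it is asked to plan."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, sql: str) -> tuple:
        self.calls += 1
        return ("plan", sql)


class Session:
    """Per-connection model: prepared plans live and die with the session."""

    def __init__(self, planner) -> None:
        self._planner = planner
        self._prepared = {}  # statement name -> plan, private to this session

    def prepare(self, name: str, sql: str) -> None:
        # Every session plans the statement again, even for identical SQL.
        self._prepared[name] = self._planner(sql)

    def plan_for(self, name: str) -> tuple:
        return self._prepared[name]


class SharedPlanCache:
    """Server-wide model: plans are keyed by query text and shared."""

    def __init__(self, planner) -> None:
        self._planner = planner
        self._plans = {}  # query text -> plan (a real server would need locking)

    def plan_for(self, sql: str) -> tuple:
        if sql not in self._plans:
            self._plans[sql] = self._planner(sql)
        return self._plans[sql]


sql = "SELECT * FROM t WHERE id = $1"

per_connection_planner = CountingPlanner()
for _ in range(2):  # two separate connections
    session = Session(per_connection_planner)
    session.prepare("q1", sql)

shared_planner = CountingPlanner()
cache = SharedPlanCache(shared_planner)
for _ in range(2):  # two requests, possibly from different connections
    cache.plan_for(sql)
```

In the per-connection model the planner runs once for every connection that prepares the statement; with the text-keyed shared cache it runs once in total, and clients do not need to prepare anything explicitly.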
|
|
|