lairv 2 hours ago
I would agree, if not for the fact that Polars is not compatible with Python multiprocessing when using the default "fork" start method. The following script hangs forever (the pandas equivalent runs fine):
Using a thread pool or the "spawn" start method works, but it makes Polars a pain to use inside e.g. a PyTorch DataLoader.
skylurk an hour ago
You're not wrong, but for this example you can do something like the following to run in threads (comm_subplan_elim is important):
ritchie46 an hour ago
Python 3.14 no longer uses "fork" by default. However, this is not a Polars issue. Using "fork" can leave ANY mutex in the child process in an invalid state (and a multi-threaded query engine has plenty of mutexes). It is highly unsafe and assumes that none of the libraries in your process hold a lock at that moment. That is not an assumption PyTorch dataloaders can make.
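The mutex point can be demonstrated without Polars at all: if any thread holds a lock at fork time, the child inherits a locked mutex that no surviving thread will ever release. A small Unix-only illustration:

```python
import os
import threading
import time

lock = threading.Lock()


def hold():
    with lock:
        time.sleep(2)  # hold the lock across the fork


threading.Thread(target=hold).start()
time.sleep(0.2)  # make sure the holder thread owns the lock before forking

pid = os.fork()
if pid == 0:
    # Child: only the forking thread was copied, so the inherited lock
    # stays held forever -- exactly the state a forked query engine can hit.
    acquired = lock.acquire(timeout=1)
    os._exit(0 if acquired else 42)

child_code = os.waitstatus_to_exitcode(os.waitpid(pid, 0)[1])
print(child_code)  # 42: the child timed out waiting on the orphaned lock
```

Without the `timeout`, the child's `acquire()` would block forever, which is the hang described upthread.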
schmidtleonard an hour ago
I can't believe parallel processing is still this big of a dumpster fire in Python, 20 years after multi-core became the rule rather than the exception. Do they really still not have a good mechanism to toss a flag on a for loop and capture embarrassingly parallel work easily?