solidasparagus 3 days ago
Nice work! There is a gap when it comes to writing single-machine, concurrent, CPU-bound Python code: Ray is too big, pykka is threads-only, and the builtins are poorly abstracted. The syntax is also very nice!

But I'm not sure I can use this, even though I have a specific use case that feels like it would work well (high-performance pure-Python downloading from cloud object storage). The examples are a bit too simple and I don't understand how to do more complicated things. I chunk up my work, run it in parallel, and then I need a fan-in step to reduce my chunks - how do you do that in Pyper? Can the processes have state? Pure functions are nice, but if I'm reaching for multiprocessing I need performance, and if I need performance I'll often want a cache of some sort (I don't want to pickle and re-instantiate a cloud client every time I download some bytes, for instance). How do exceptions work? Observability? Logs/prints?

Then there's stuff that is probably asking too much from this project, but which I end up needing if I write my own Python pipeline, so it matters to me: rate limiting, cancellation, progress bars. But if some of these problems are/were solved and it offers an easy way to use multiprocessing in Python, I would probably use it!
pyper-dev 2 days ago
Great feedback, thank you. We'll certainly be working on adding more examples to illustrate more complex use cases.

One thing I'd mention is that we don't really imagine Pyper as a whole observability and orchestration platform. It's really a package for writing Python functions and executing them concurrently, in a flexible pattern that can be integrated with other tools. For example, I'm personally a fan of Prefect as an observability platform: you could define pipelines in Pyper, then wrap them in a Prefect flow for orchestration logic. Exception handling and logging can also be handled by orchestration tools (or in the business logic where appropriate, literally using try... except).

For a simple progress bar, tqdm is probably the first thing to try. As it wraps anything iterable, applying it to a pipeline might look like:
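Something along these lines (a minimal sketch; the function names and the branch/workers options are just illustrative, so check them against the docs):

    import time
    from pyper import task
    from tqdm import tqdm

    def generate(limit):
        # First stage fans out one value per item of work
        for i in range(limit):
            yield i

    def slow_step(x):
        # Stand-in for real work (e.g. downloading a chunk)
        time.sleep(0.1)
        return x

    # Compose stages with |; calling the pipeline yields results lazily,
    # so tqdm can wrap the iteration directly
    pipeline = task(generate, branch=True) | task(slow_step, workers=4)

    if __name__ == "__main__":
        for _ in tqdm(pipeline(100), total=100):
            pass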
halfcat 3 days ago
> I don't want to pickle and re-instantiate a cloud client every time I download some bytes for instance

Have you tried multiprocessing.shared_memory to address this?
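Roughly like this (a minimal sketch of the stdlib API - it shares raw bytes between processes by name rather than live client objects, so it may or may not fit the cloud-client case):

    from multiprocessing import Process, shared_memory

    def worker(name, size):
        # Attach to the existing block by name: no pickling, no copy on attach
        shm = shared_memory.SharedMemory(name=name)
        data = bytes(shm.buf[:size])
        print(f"worker read {len(data)} bytes")
        shm.close()

    if __name__ == "__main__":
        payload = b"bytes fetched from object storage"
        # Parent creates the shared block and writes the payload into it
        shm = shared_memory.SharedMemory(create=True, size=len(payload))
        shm.buf[:len(payload)] = payload

        p = Process(target=worker, args=(shm.name, len(payload)))
        p.start()
        p.join()

        shm.close()
        shm.unlink()  # release the block once every process is done with it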
globular-toast 3 days ago
Do you really need to reinvent the wheel every time for parallel workloads? Just learn GNU parallel and write single-threaded code. Concurrency in general isn't about parallelism. It's just about doing multiple things at the same time.