apbytes 14 days ago
When you call a CUDA method, it is launched asynchronously: the function queues the op for execution on the GPU and returns immediately. So if you need to wait for an op to finish, you need to `synchronize`, as shown above. It's called `get_current_stream` because the queue mentioned above is actually called a stream in CUDA. If you want to run many independent ops concurrently, you can use several streams. Benchmarking is one use case for synchronize. Another would be if you, say, run two independent ops in different streams and need to combine their results. Btw, if you work with PyTorch, ops run on the GPU are likewise launched in the background, so if you want to benchmark torch models on GPU, it also provides a sync API.
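A minimal PyTorch sketch of both cases (timing with events plus a sync, and joining two streams); assumes a CUDA-capable PyTorch build, and the sizes are arbitrary:

```python
import torch

x = torch.randn(4096, 4096, device="cuda")

# Timing: the matmul call only queues work and returns, so without a
# sync you would measure launch overhead, not the kernel itself.
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
y = x @ x                       # queued on the current stream, returns at once
end.record()
torch.cuda.synchronize()        # block until all queued work has finished
print(f"matmul took {start.elapsed_time(end):.2f} ms")

# Two independent ops on separate streams, then a sync to combine results.
s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()
s1.wait_stream(torch.cuda.current_stream())  # make sure x is materialized
s2.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s1):
    a = x @ x
with torch.cuda.stream(s2):
    b = x + x
torch.cuda.synchronize()        # wait for both streams before combining
c = a + b
```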
claytonjy 14 days ago | parent
I've always thought it was weird that GPU stuff in Python doesn't use asyncio, and mostly assumed it was because Python-on-GPU predates asyncio. I was hoping a new lib like this might right that wrong, but it doesn't. Maybe for interop reasons? Do other languages surface the asynchronous nature of GPUs in language-level async, avoiding silly stuff like synchronize?
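To make the question concrete: nothing like this exists in torch itself, but a hypothetical bridge could poll a CUDA event from the event loop instead of blocking in synchronize. The helper `await_stream` below is invented for this sketch, not part of any library:

```python
import asyncio
import torch

async def await_stream(stream: torch.cuda.Stream, poll_s: float = 0.001):
    # Record an event at the current tail of the stream, then poll it
    # without blocking so other asyncio tasks can run in the meantime.
    event = torch.cuda.Event()
    event.record(stream)
    while not event.query():          # non-blocking completion check
        await asyncio.sleep(poll_s)

async def main():
    x = torch.randn(4096, 4096, device="cuda")
    y = x @ x                         # launched asynchronously as usual
    await await_stream(torch.cuda.current_stream())
    print(y.norm())                   # safe: the stream has drained

asyncio.run(main())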
hnuser123456 14 days ago | parent
Thank you kindly!