dragonwriter | 14 days ago

The mistake you seem to be making is confusing the existing product (which has been available for many years) with the upcoming new features for that product just announced at GTC, which are not addressed at all on the page for the existing product, but are addressed in the article about the GTC announcement.
almostgotcaught | 14 days ago

> The mistake you seem to be making is confusing the existing product

i'm not making any such mistake - i'm just able to actually read and comprehend what i'm reading rather than perform hype:

> Over the last year, NVIDIA made CUDA Core, which Jones said is a "Pythonic reimagining of the CUDA runtime to be naturally and natively Python."

so the article is about cuda-core, not whatever you think it's about - so i'm responding directly to what the article is about.

> CUDA Core has the execution flow of Python, which is fully in process and leans heavily into JIT compilation.

this is bullshit/hype about Python's new JIT which womp womp womp isn't all that great (yet). this has absolutely nothing to do with any other JIT, e.g. the cutile kernel driver JIT (which also has absolutely nothing to do with what you think it does).
dragonwriter | 14 days ago

> i'm just able to actually read and comprehend what i'm reading rather than perform hype:

The evidence of that is lacking.

> so the article is about cuda-core, not whatever you think it's about

cuda.core (a relatively new, rapidly developing library whose entire API is experimental) is one of several things (NVMath is another) mentioned in the article, but the newer, as-yet-unreleased piece mentioned in the article and the GTC announcement, and a key part of the "Native Python" in the headline, is the CuTile model [0]: "The new programming model, called CuTile interface, is being developed first for Pythonic CUDA with an extension for C++ CUDA coming later."

> this is bullshit/hype about Python's new JIT

No, as is fairly explicit in the next line after the one you quote, it is about the Nvidia CUDA Python toolchain using in-process compilation rather than shelling out to out-of-process command-line compilers for CUDA code.

[0] The article has only a fairly vague qualitative description of what CuTile is, but (without having to watch the whole talk from GTC) one can look at this tweet for a preview of what Python code using the model is expected to look like when it is released: https://x.com/blelbach/status/1902113767066103949?t=uihk0M8V...
almostgotcaught | 14 days ago

> No, as is fairly explicit in the next line after the one you quote, it is about the Nvidia CUDA Python toolchain using in-process compilation rather than shelling out to out-of-process command-line compilers for CUDA code.

my guy, what i am able to read, which you are not, is the source and release notes. i do not need to read tweets and press releases because i know what these things actually are. here are the release notes:

> Support Python 3.13

> Add bindings for nvJitLink (requires nvJitLink from CUDA 12.3 or above)

> Add optional dependencies on CUDA NVRTC and nvJitLink wheels

https://nvidia.github.io/cuda-python/latest/release/12.8.0-n...

do you understand what "bindings" and "optional dependencies on..." mean? it means there's nothing happening in this library - these are just bindings to existing libraries. specifically, that means you cannot jit python using this thing (except via the python 3.13 jit interpreter) and can only do what you've always been able to do with e.g. cupy (compile and run C/C++ CUDA code).

EDIT: y'all realize that 1. calling a compiler on your entire source file and 2. loading and running that compiled code is not at all a JIT? y'all understand that, right?
squeaky-clean | 14 days ago

> my guy, what i am able to read, which you are not, is the source and release notes. i do not need to read tweets and press releases because i know what these things actually are. here are the release notes

Those aren't the release notes for the native python thing being announced. CuTile has not been publicly released yet. Based on what the devs are saying on Twitter, it probably won't be released before the SciPy 2025 conference in July.
musicale | 13 days ago

JIT as an adjective means just-in-time, as opposed to AOT, ahead-of-time. What Nvidia discussed at GTC was a software stack that will let you generate new CUDA kernels dynamically at runtime using Python API calls. It is a just-in-time (runtime, dynamic) compiler system rather than an ahead-of-time (pre-runtime, static) one.
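The runtime-vs-static distinction musicale describes can be sketched GPU-free in plain Python: generate source specialized to values only known at runtime, compile it, then call it. This is only an illustration of the idea, not Nvidia's API; `jit_scale_kernel` and everything in it is made up for the sketch.

```python
def jit_scale_kernel(factor: float):
    """Generate and compile a 'kernel' specialized to `factor` at runtime.

    The point of the sketch: the source text does not exist until the
    program is already running (JIT), unlike AOT where all code is
    compiled before the program starts.
    """
    src = f"""
def kernel(xs):
    # `factor` is baked into the generated code, like a template parameter
    return [x * {factor} for x in xs]
"""
    namespace = {}
    exec(compile(src, "<jit>", "exec"), namespace)  # the runtime compile step
    return namespace["kernel"]

scale3 = jit_scale_kernel(3.0)  # compiled now, not before the program ran
print(scale3([1, 2, 3]))        # [3.0, 6.0, 9.0]
```

A real kernel JIT does the same dance with CUDA C++ or an IR instead of Python source, and specializes on things like dtypes and tile shapes rather than a literal constant.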
saagarjha | 14 days ago

cuTile is basically Nvidia's Triton competitor (no, not that Triton - OpenAI's Triton). It takes your Python code and generates kernels at runtime. CUTLASS has a new Python interface that does the same thing.
wahnfrieden | 14 days ago

[flagged]
|
squeaky-clean | 14 days ago

Isn't the main announcement of the article CuTile? Which has not been released yet. Also, the cuda-core JIT stuff has nothing to do with Python's new JIT; it's referring to integrating nvJitLink with python, which you can see an example of in cuda_core/examples/jit_lto_fractal.py
|
ashvardanian | 14 days ago

In case someone is looking for performance examples & testimonials: even on an RTX 3090 vs. a 64-core AMD Epyc/Threadripper, even a couple of years ago, CuPy was a blast. I have a couple of recorded sessions with roughly identical slides/numbers:

- San Francisco Python meetup in 2023: https://youtu.be/L9ELuU3GeNc?si=TOp8lARr7rP4cYaw
- Yerevan PyData meetup in 2022: https://youtu.be/OxAKSVuW2Yk?si=5s_G0hm7FvFHXx0u

Among the more remarkable results:

- 1000x sorting speedup switching from NumPy to CuPy.
- 50x performance improvement switching from Pandas to cuDF on the New York taxi-rides queries.
- 20x GEMM speedup switching from NumPy to CuPy.

cuGraph is also definitely worth checking out. At that time, Intel wasn't in as bad a position as they are now and was trying to push Modin, but the difference in performance and quality of implementation was mind-boggling.
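Part of why switches like the ones above are cheap is that CuPy deliberately mirrors the NumPy API, so the same array code can target either backend. A minimal sketch of that pattern (not one of the benchmarks above); it falls back to NumPy when CuPy and a GPU aren't available, so only the speed changes, not the code:

```python
import numpy

# Pick a backend: CuPy's array API is intentionally NumPy-compatible,
# so downstream code can be written once against `xp`.
try:
    import cupy as xp   # GPU-backed arrays, if installed
except ImportError:
    xp = numpy          # CPU fallback with the same API

a = xp.asarray([5.0, 1.0, 4.0, 2.0, 3.0])
print(xp.sort(a).tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

On the CuPy path, `xp.sort` dispatches to a GPU sort (this is where the large speedups on big arrays come from); `.tolist()` copies the result back to the host on either backend.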
ladberg | 14 days ago

The main release highlighted by the article is cuTile, which is certainly about jitting kernels from Python code.
almostgotcaught | 14 days ago

> main release

there is no release of cutile (yet). so the only substantive thing the article can be describing is cuda-core - which it does describe, and which is a recent/new addition to the ecosystem. man, i can't fathom glazing a random blog this hard just because it's tangentially related to some other thing (NV GPUs) that people clearly only vaguely understand.
throwaway314155 | 12 days ago

christ man, lighten the fuck up. there's zero need to be _so_ god damn patronizing and disrespectful.
|
yieldcrv | 14 days ago

I just want to see benchmarks. is this new one faster than CuPy or not?