Fixed a llama.cpp bug silently disabling Vulkan GPU on all 32-bit ARM devices
2 points by perinban 9 hours ago
While running llama.cpp on a Samsung Galaxy Watch 4 Classic (armeabi-v7a, Mali G68), I noticed the Vulkan backend was rejecting every quantized MUL_MAT operation despite reporting "33/33 layers offloaded to GPU".

Root cause: a missing block-size division in the tensor stride calculation inside create_tensor() in llama-model-loader.cpp. The wrong stride cascades into an inflated ggml_nbytes() result that exceeds max_buffer_size on 32-bit platforms, where size_t is 32 bits. On 64-bit devices the error is silently masked: the value is wrong but still within GPU memory limits, so nobody noticed. The bug has likely been there for years.

Fix and context: https://github.com/Perinban/llama.cpp/tree/axon-dev