Remix clone Hacker News


	▲	thehk 5 hours ago
		> ESP32-S31 is particularly well suited for edge AI and machine learning workloads, including neural network inference

		Any way to know what kind of performance one could expect running e.g. a Depth Anything model on there?
	▲	kcb 7 minutes ago
		A real example: https://github.com/OHF-Voice/micro-wake-word
	▲	mattalex 2 hours ago
		Regarding Depth Anything specifically: you're not running this on a microcontroller. In general, CNNs still reign supreme on microcontrollers because their peak memory demand is far lower, and peak memory is what usually kills you. In this case you have a couple of _kilobytes_ of SRAM, potentially extendable to a couple of megabytes of PSRAM. Even for small CNNs you often need some fairly complex interleaving of layers (i.e. running parts of layer 1 and layer 2 interleaved, to take advantage of the downsampling in CNNs) to keep performance and memory impact reasonable (see e.g. https://openreview.net/pdf?id=2O8qbyxH6X). Think more "image classifier", less "run an image-to-image transformer". For Depth Anything, a single layer's activation is probably significantly larger than the available SRAM. I think it is (224/16)^2 patches, each with activations [48, 96, 192, 384] for Depth Anything small, so you aren't running this.
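		A rough back-of-the-envelope sketch of that last estimate, taking the numbers quoted in the comment at face value (224 px input, patch size 16, channel widths 48 to 384). The function names and the float32/int8 element sizes are assumptions for illustration, not anything from the model's actual code:

```python
# Rough activation-size estimate for a patch-based (ViT-style) model.
# All figures come from the comment above; helper names are hypothetical.

def num_patches(image_size: int, patch_size: int) -> int:
    """Number of patches for a square image split into square patches."""
    per_side = image_size // patch_size
    return per_side * per_side


def activation_bytes(tokens: int, channels: int, bytes_per_element: int) -> int:
    """Size of one (tokens x channels) activation tensor in bytes."""
    return tokens * channels * bytes_per_element


if __name__ == "__main__":
    tokens = num_patches(224, 16)
    for channels in (48, 96, 192, 384):
        fp32 = activation_bytes(tokens, channels, 4)  # assumed float32
        int8 = activation_bytes(tokens, channels, 1)  # assumed int8 quantization
        print(f"{channels:>3} channels: {fp32 / 1024:.1f} KiB fp32, "
              f"{int8 / 1024:.1f} KiB int8")
```

		This only counts a single activation tensor. Weights, attention matrices, and any higher-resolution decoder features would come on top, so treat it as a lower bound.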
	▲	otterdude 5 hours ago
		I was wondering this as well. What exactly makes this a good AI chip vs. others? Unless they're not listing a major feature in their spec, a dual-core 320 MHz microcontroller is not bad, but you're not going to be running any kind of vision model on it, at least not very fast.
	▲	porridgeraisin 3 hours ago
		Memory is the main constraint. You have what, 8 MB of PSRAM? Compute-wise you can manage. You can quantise and run a small 10-15 layer CNN, perhaps. Image classification is possible. Keep in mind that the channel count and input resolution cannot be high, since memory will be a problem. You can maybe do face _detection_, or "is my cat on my keyboard" classification. With audio you can do a lot more. Wake word detection happens on _much_ smaller accelerators inside iPhones. On this one you can do slightly heavier classification, maybe speaker identification ("which member of the family") or "which dog is barking".
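		To make the channel count / resolution point concrete, here is a minimal sketch that estimates peak activation memory for a small int8 CNN run layer by layer. The 96x96x3 input and the 10-layer channel/stride list are made-up assumptions, and it assumes "same" padding and that only each layer's input and output buffers are live at once:

```python
import math

# Hypothetical 10-layer CNN: (output_channels, stride) per conv layer.
LAYERS = [(8, 2), (16, 1), (16, 2), (32, 1), (32, 2),
          (64, 1), (64, 2), (128, 1), (128, 2), (128, 1)]


def peak_activation_bytes(height, width, channels, layers, bytes_per_element=1):
    """Peak of (input + output) buffer size over all layers, in bytes.

    Assumes 'same' padding, so spatial size shrinks only by the stride,
    and int8 activations by default (1 byte per element).
    """
    peak = 0
    in_bytes = height * width * channels * bytes_per_element
    for out_channels, stride in layers:
        height = math.ceil(height / stride)
        width = math.ceil(width / stride)
        out_bytes = height * width * out_channels * bytes_per_element
        peak = max(peak, in_bytes + out_bytes)
        in_bytes = out_bytes  # this layer's output feeds the next layer
    return peak


if __name__ == "__main__":
    peak = peak_activation_bytes(96, 96, 3, LAYERS)
    print(f"peak activations: {peak} bytes ({peak / 1024:.1f} KiB)")
```

		With these made-up numbers the peak works out to about 54 KiB. Doubling the input resolution roughly quadruples it, which is why resolution and early-layer channel counts dominate the budget.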
	▲	asadm 2 hours ago
		nope. not happening. at most YOLO or mayyybe FastDepth