zozbot234 | 16 hours ago
> Neural accelerators to get prompt prefill time down.

Apple Neural Engine is a thing already, with support for multiply-accumulate on INT8 and FP16. AI inference frameworks need to add support for it.

> this setup can support up to 4 Mac devices because each Mac must be connected to every other Mac!!

Do you really need a fully connected mesh? Doesn't Thunderbolt just show up as a network connection that RDMA runs on top of?
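A minimal sketch (plain NumPy, not any actual ANE API) of the pattern INT8 multiply-accumulate hardware implements: narrow INT8 multiplies feeding a wider accumulator so the dot product doesn't overflow.

```python
import numpy as np

# Hypothetical illustration only -- this is NumPy, not the Apple Neural
# Engine's programming interface. INT8 MAC units multiply 8-bit operands
# and accumulate into a wider register (here int32).
a = np.array([100, 120, -90], dtype=np.int8)
b = np.array([50, 60, 70], dtype=np.int8)

# Accumulating directly in int8 would overflow (products exceed 127),
# so widen before the multiply-accumulate.
acc = np.sum(a.astype(np.int32) * b.astype(np.int32))
print(acc)  # 100*50 + 120*60 + (-90)*70 = 5900
```

The same widening trick is why quantized inference kernels typically quote "INT8 with INT32 accumulation".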
pdpi | 15 hours ago
> Do you really need a fully connected mesh? Doesn't Thunderbolt just show up as a network connection that RDMA runs on top of?

If you daisy-chain four nodes, then traffic between nodes #1 and #4 eats into nodes #2 and #3's bandwidth, and you pay a big latency penalty on the extra hops. So, absent a switch, a fully connected mesh is the only way to have fast access to all the memory.
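The port arithmetic behind the 4-device cap can be sketched quickly (a toy calculation, not anything from the linked setup): in a full mesh every node needs a dedicated link to every other node, so ports per node grow linearly and cables grow quadratically.

```python
# Toy sketch: full-mesh requirements for an n-node cluster.
# Each node needs n-1 dedicated links; the cluster needs n*(n-1)/2 cables.
def mesh_requirements(n: int) -> tuple[int, int]:
    ports_per_node = n - 1
    total_links = n * (n - 1) // 2
    return ports_per_node, total_links

for n in range(2, 6):
    ports, links = mesh_requirements(n)
    print(f"{n} nodes: {ports} ports per node, {links} cables")
# 4 nodes -> 3 Thunderbolt ports per Mac and 6 cables; a 5th node would
# already demand 4 ports per Mac and 10 cables.
```

This is why the mesh stops scaling once you run out of ports on each machine, while a switch (or a daisy chain, with the bandwidth penalty above) only ever needs one or two links per node.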
fooblaster | 16 hours ago
Might be helpful if they actually provided a programming model for the ANE that isn't ONNX. The ANE not having a native development model just means software support will not be great.
liuliu | 16 hours ago
They were talking about neural accelerators (a silicon block on the GPU): https://releases.drawthings.ai/p/metal-flashattention-v25-w-...
csdreamer7 | 15 hours ago
> Apple Neural Engine is a thing already, with support for multiply-accumulate on INT8 and FP16. AI inference frameworks need to add support for it.

Or, Apple could pay for the engineers to add it.
solarkraft | 12 hours ago
How much of an improvement can be expected here? It seems to me that, in general, most of the available potential gets realized pretty quickly on Apple platforms.