zozbot234 | 16 hours ago
> Neural accelerators to get prompt prefill time down.

Apple Neural Engine is a thing already, with support for multiply-accumulate on INT8 and FP16. AI inference frameworks need to add support for it.

> this setup can support up to 4 Mac devices because each Mac must be connected to every other Mac!!

Do you really need a fully connected mesh? Doesn't Thunderbolt just show up as a network connection that RDMA runs on top of?
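A minimal sketch (plain NumPy, not any actual ANE API) of the pattern INT8 multiply-accumulate hardware implements: narrow INT8 multiplies feeding a wider accumulator so the dot product doesn't overflow.

```python
import numpy as np

# Hypothetical illustration only -- this is NumPy, not the Apple Neural
# Engine's programming interface. INT8 MAC units multiply 8-bit operands
# and accumulate into a wider register (here int32).
a = np.array([100, 120, -90], dtype=np.int8)
b = np.array([50, 60, 70], dtype=np.int8)

# Accumulating directly in int8 would overflow (products exceed 127),
# so widen before the multiply-accumulate.
acc = np.sum(a.astype(np.int32) * b.astype(np.int32))
print(acc)  # 100*50 + 120*60 + (-90)*70 = 5900
```

The same widening trick is why quantized inference kernels typically quote "INT8 with INT32 accumulation".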
pdpi | 15 hours ago
> Do you really need a fully connected mesh? Doesn't Thunderbolt just show up as a network connection that RDMA runs on top of?

If you daisy-chain four nodes, then traffic between nodes #1 and #4 eats into nodes #2 and #3's bandwidth, and you pay a big latency penalty on the extra hops. So, absent a switch, a fully connected mesh is the only way to have fast access to all the memory.
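The port arithmetic behind the 4-device cap can be sketched quickly (a toy calculation, not anything from the linked setup): in a full mesh every node needs a dedicated link to every other node, so ports per node grow linearly and cables grow quadratically.

```python
# Toy sketch: full-mesh requirements for an n-node cluster.
# Each node needs n-1 dedicated links; the cluster needs n*(n-1)/2 cables.
def mesh_requirements(n: int) -> tuple[int, int]:
    ports_per_node = n - 1
    total_links = n * (n - 1) // 2
    return ports_per_node, total_links

for n in range(2, 6):
    ports, links = mesh_requirements(n)
    print(f"{n} nodes: {ports} ports per node, {links} cables")
# 4 nodes -> 3 Thunderbolt ports per Mac and 6 cables; a 5th node would
# already demand 4 ports per Mac and 10 cables.
```

This is why the mesh stops scaling once you run out of ports on each machine, while a switch (or a daisy chain, with the bandwidth penalty above) only ever needs one or two links per node.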
fooblaster | 16 hours ago
Might be helpful if they actually provided a programming model for the ANE that isn't ONNX. The ANE not having a native development model just means software support will not be great.
liuliu | 16 hours ago
They were talking about neural accelerators (a silicon block on the GPU): https://releases.drawthings.ai/p/metal-flashattention-v25-w-...
csdreamer7 | 15 hours ago
> Apple Neural Engine is a thing already, with support for multiply-accumulate on INT8 and FP16. AI inference frameworks need to add support for it.

Or, Apple could pay for the engineers to add it.
solarkraft | 12 hours ago
How much of an improvement can be expected here? It seems to me that, in general, most of the available potential gets realized pretty quickly on Apple platforms.