| ▲ | behnamoh 17 hours ago |
| My expectations for M5 Max/Ultra devices:
- Something like a DGX QSFP link (200Gb/s, 400Gb/s) instead of TB5. Otherwise the economics of this RDMA setup, while impressive, don't make sense.
- Neural accelerators to get prompt prefill time down. I don't expect RTX 6000 Pro speeds, but something like a 3090/4090 would be nice.
- 1TB of unified memory in the maxed-out version of the Mac Studio. I'd rather invest in more RAM than in more devices (centralized will always be faster than distributed).
- 1TB/s+ bandwidth. For the past three generations, the speed has been stuck at 800GB/s...
- The ability to overclock the system? I know it probably will never happen, but my expectations of a Mac Studio are not the same as of a laptop, and I'm TOTALLY okay with it consuming 600W+ of power. Currently it's capped at ~250W.
Also, as the OP noted, this setup can support up to 4 Mac devices because each Mac must be connected to every other Mac!! All the more reason for Apple to invest in something like QSFP. |
|
| ▲ | wartywhoa23 8 hours ago | parent | next [-] |
| Would you please leave some RAM available for purchase at an affordable price for us mere mortals? 1TB for what, like, "Come on AI, make humankind happy now"? /s |
| |
| ▲ | RunSet an hour ago | parent [-] | | The AI bubble will do wonders for used RAM prices when it pops. | | |
| ▲ | ddalex an hour ago | parent [-] | | There is no popping. We cannot have enough compute for the foreseeable future. | | |
|
|
|
| ▲ | Dylan16807 16 hours ago | parent | prev | next [-] |
| > 1TB/s+ bandwidth. For the past three generations, the speed has been stuck at 800GB/s... M4 already hit the necessary per-channel speed, and M5 is well above it. If they actually release an Ultra, that much bandwidth is guaranteed on the full version. Even the smaller version, with 25% fewer memory channels, will be pretty close. We already know the Max won't get anywhere near 1TB/s, since a Max is half of an Ultra. |
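A rough sanity check of that arithmetic, as a sketch with assumed numbers (LPDDR5X-9600 as on the base M5, the usual 1024-bit bus on a full Ultra, and a hypothetical cut-down Ultra with 25% fewer channels; none of these future configurations are confirmed):

```python
# Peak bandwidth = transfer rate (MT/s) x bus width (bits) / 8, in GB/s.
# The transfer rates and bus widths below are assumptions based on shipping chips.
def bandwidth_gbs(transfer_mts: int, bus_width_bits: int) -> float:
    return transfer_mts * bus_width_bits / 8 / 1000

print(bandwidth_gbs(9600, 128))   # ~153.6 GB/s: base M5, matches Apple's published figure
print(bandwidth_gbs(8533, 1024))  # ~1092 GB/s:  M4-class memory speed on an Ultra-width bus
print(bandwidth_gbs(9600, 1024))  # ~1229 GB/s:  hypothetical full M5 Ultra
print(bandwidth_gbs(9600, 768))   # ~922 GB/s:   hypothetical cut-down Ultra (25% fewer channels)
```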
|
| ▲ | Marsymars 13 hours ago | parent | prev | next [-] |
| > - The ability to overclock the system? I know it probably will never happen, but my expectations of a Mac Studio are not the same as of a laptop, and I'm TOTALLY okay with it consuming 600W+ of power. Currently it's capped at ~250W. I don't think the Mac Studio has a thermal design capable of dissipating 650W of heat for anything other than bursty workloads. You'd need to look at the Mac Pro design for that. |
| |
| ▲ | jauntywundrkind 12 hours ago | parent [-] | | The thermal design is irrelevant, and people saying they want insane power density are, in my personal view, deluded, ridiculous individuals who understand very, very little. Overclocking long ago was an amazing, saintly act: milking a lot of extra performance that was just there waiting, without major downsides. But these days, chips are usually already well tuned. You can feed double or triple the power into the chip with adequate cooling, but the gain is unremarkable. +10%, +15%, +20% is almost never going to be a make-or-break difference for your work, and getting it at double or triple the power budget is an egregious waste. So many of the chips out there are already delivered well past their optimum efficiency point, largely for bragging rights. The exponential decay of efficiency you keep pushing for is an anti-quest; it works against the good. The absolute performance wins are ridiculous to seek, in almost all cases. If your problem will not scale and dumping a ton of power into one GPU or one CPU socket is all you've got, fine, your problem is bad and you have to deal with that. But for 90% of people, begging for more power proves you don't actually know jack, and my personal recommendation is that all such points of view deserve massive downvoting by anyone with half a brain. Go back to 2018 and look at Matthew Dillon on DragonflyBSD underpowering the heck out of their 2990WX Threadripper. Efficiency just soars as you tell the chip to take less power. The situation has not improved! Efficiency skyrockets today at least as much as it did then by telling chips not to go all out. Good chips behave & reward. I believe Apple is competent enough to thoroughly disabuse the position that this chip would be far better if we could dump 2x-3x more power into it. Just a fool's position, beyond a joke, imo. https://apollo.backplane.com/DFlyMisc/threadripper.txt | | |
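To put illustrative numbers on that diminishing-returns curve (the points below are invented for the example, with the typical shape of modern chips, not measurements of any specific part):

```python
# (watts, relative performance) points with the usual diminishing-returns shape.
# These values are made up for illustration; they are not benchmarks.
points = [
    (95, 100),   # near the efficiency sweet spot
    (180, 115),  # ~2x the power for ~15% more performance
    (280, 122),  # ~3x the power for ~22% more performance
]

for watts, perf in points:
    print(f"{watts:3d} W -> perf {perf}, efficiency {perf / watts:.2f} perf/W")
# Efficiency falls from ~1.05 to ~0.44 perf/W across these points, which is the
# trade-off the comment above is objecting to.
```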
| ▲ | vablings 13 minutes ago | parent | next [-] | | It's been funny to see people move from overclocking to underclocking, especially on the older AMD GPUs. On the RX 480, a slight underclock would cut the power usage almost in half! | |
| ▲ | nottorp an hour ago | parent | prev | next [-] | | > Overclocking long ago was an amazing, saintly act: milking a lot of extra performance that was just there waiting, without major downsides. Back when you bought a 233 MHz chip with RAM at 66 MHz, you ran the bus at 100 MHz (which also increased your RAM speed, if it could handle it), and everything was faster. > But these days, chips are usually already well tuned. You can feed double or triple the power into the chip with adequate cooling, but the gain is unremarkable. +10%, +15%, +20% is almost never going to be a make-or-break difference for your work 20% in synthetic benchmarks maybe, or in very particular loads. Because you only overclock the CPU these days, anything hitting the RAM won't even get to 20%. | | |
| ▲ | mapt an hour ago | parent [-] | | Initially, thermal throttling was a safety valve for a failure condition: a way to cripple performance briefly so as not to let the magic blue smoke out. Only a terrible PC would be thermal throttling out of the box; only neglectful owners who failed to clean filters had thermal throttling happening routinely. That's not how it works any more. Many of these CPUs, both at the high end and even a few tiers down from the top, are thermal throttling whenever they hit 100% utilization. I'm thinking of Intel's last couple of generations particularly. They're shipped with pretty good heatsinks, but not nearly good enough to run stock clocks on all cores at once. Instead, smarter grades of thermal throttling are designed for routine use, to balance loads. Better heatsinks (and watercooling) help a bit, but not enough; you end up hitting a wall. Only the risky process of delidding seems to push further. We're running into limitations on how well a conventional heatsink can transfer heat from a limited contact patch. GPUs seem to have more effective heatsinks, and are bottlenecked mostly by power requirements. The 600 watt monsters are already melting cables that aren't in perfect condition. |
| |
| ▲ | Marsymars 10 hours ago | parent | prev | next [-] | | Oh, we're largely on the same page there. I was actually looking for benchmarks earlier this week along those lines - ideally covering the whole slate of Arrow Lake processors running at various TDPs. Not much available on the web though. | |
| ▲ | ssl-3 9 hours ago | parent | prev | next [-] | | I learned a lot about underclocking, undervolting, and computational power efficiency during my brief time in the Ethereum mining[1] shenanigans. The best ROI was the most numerous stable computations at the lowest energy expense. I'd tweak individual GPUs' various clocks and volts to optimize this. I'd even go so far as to tweak fan speed ramps on the cards themselves (those fans don't power themselves! There are whole watts to save there!). I worked to optimize efficiency all the way to the power from the wall. But that was a system that ran, balls-out, 24/7/365. Or at least it ran that way until it got warmer outside, and warmer inside, and I started to think about ways to balance mining eth in the basement vs. cooling the living space of the house to optimize returns. (And I never quite got that sorted before they pulled the rug on mining.) And that story is about power efficiency, but: power efficiency isn't always the most sensible goal. Sometimes maximum performance is a better goal. We aren't always mining Ethereum. Jeff's (quite lovely) video and associated article are a story about just one man using a stack of consumer-oriented-ish hardware in amusing -- to him -- ways, with local LLM bots. That stack of gear is a personal computer. (A mighty expensive one on any inflation-adjusted timeline, but what was constructed was definitely used as a personal computer.) Like most of our personal computers (almost certainly including the one you're reading this on), it doesn't need to be optimized for a 24/7 100% workload. It spends a huge portion of its time waiting for the next human input. And unlike mining Eth in the winter in Ohio, its compute cycles are bursty, not constant, and are ultimately limited by the input of one human. So sure: I, like Jeff, would also like to see how it would work when running with the balls[2] further out. For as long as he gets to keep it, the whole rig is going to spend most of its time either idling or off anyway. So it might as well get some work done when a human is in front of it, even if each token costs more in that configuration than it does OOTB. It can theoretically even clock up when being actively used (and suck all the power), and clock back down when idle (and resume being all sleepy and stuff). That's a well-established concept that [e.g.] Intel has variously called SpeedStep and/or Turbo Boost -- and those things work for bursty workloads, and have worked that way for a very long time now. [1]: Y'all can hate me for being a small part of that problem. It's allowed. [2]: https://en.wikipedia.org/wiki/Centrifugal_governor | |
| ▲ | jermaustin1 2 hours ago | parent [-] | | I did crypto mining as an alternative to heating. In my centrally cooled apartment, my office was the den, which had the air return. So my mining rig ran RIGHT in front of that; it sucked the heat out and pushed it all over the house. Then summer came, and in Texas the AC can barely keep up to begin with. So then my GPUs became part of a render farm instead. |
| |
| ▲ | sandworm101 7 hours ago | parent | prev [-] | | >> people saying they want insane power density are, in my personal view, deluded, ridiculous individuals who understand very, very little. Or they are simply not-rich people who cannot afford to purchase extra hardware to run in parallel. Electricity is cheap. GPUs are not. So I want to get every ounce of power out of the precious few GPUs I can afford to own. (And don't point at clouds. Running AI on someone else's cloud is like telling a shade-tree mechanic to rent a car instead of fixing his own.) |
|
|
|
| ▲ | wtallis 14 hours ago | parent | prev | next [-] |
| > Also, as the OP noted, this setup can support up to 4 Mac devices because each Mac must be connected to every other Mac I do wonder where this limitation comes from, since on the M3 Ultra Mac Studios the front USB-C ports are also Thunderbolt 5, for a total of six Thunderbolt ports: https://www.apple.com/mac-studio/specs/ |
| |
| ▲ | kappuchino 7 hours ago | parent | next [-] | | He corrected that in the comment section of the YouTube video: six is actually the maximum. He just didn't want to buy another one. He also published the benchmarks in detail, with two- and four-Mac comparisons: https://github.com/geerlingguy/beowulf-ai-cluster/issues/17 | | |
| ▲ | re-thc 2 hours ago | parent [-] | | > He just didn't want to buy another one. Wasn't it all loaned, i.e., he didn't buy any at all? Apple should have loaned enough to flex. |
| |
| ▲ | QuantumNomad_ 14 hours ago | parent | prev [-] | | Jeff mentioned in the video that only three of the ports can be used for RDMA. But it’s unclear where that limitation is coming from. | | |
| ▲ | geerlingguy 13 hours ago | parent | next [-] | | From my brief discussion with Exo/Apple, it sounds like that is just a limitation of this initial rollout, but it's not a hardware limitation. Though, I am always leery to recommend any decisions be made over something that's not already proven to work, so I would say don't bet on all ports being able to be used. They very well may be able to though. | |
| ▲ | sroussey 13 hours ago | parent | prev [-] | | I bet there is one piece of silicon per two ports. |
|
|
|
| ▲ | zozbot234 16 hours ago | parent | prev | next [-] |
| > Neural accelerators to get prompt prefill time down. Apple Neural Engine is a thing already, with support for multiply-accumulate on INT8 and FP16. AI inference frameworks need to add support for it. > this setup can support up to 4 Mac devices because each Mac must be connected to every other Mac!! Do you really need a fully connected mesh? Doesn't Thunderbolt just show up as a network connection that RDMA is run on top of? |
| |
| ▲ | pdpi 15 hours ago | parent | next [-] | | > Do you really need a fully connected mesh? Doesn't Thunderbolt just show up as a network connection that RDMA is run on top of? If you daisy-chain four nodes, then traffic between nodes #1 and #4 eats up nodes #2 and #3's bandwidth, and you pay a big latency penalty. So, absent a switch, the fully connected mesh is the only way to have fast access to all the memory. | |
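A quick sketch of the arithmetic behind the node cap under a full mesh, assuming (per the video) that three TB5 ports per Mac are currently usable for RDMA; how many of the six ports can ultimately do RDMA is discussed further down the thread:

```python
# Full mesh: every Mac needs a direct link to every other Mac, so a cluster of n
# nodes needs (n - 1) RDMA-capable ports per node and n*(n-1)/2 cables in total.
def max_nodes(rdma_ports_per_mac: int) -> int:
    return rdma_ports_per_mac + 1

def cables_needed(n: int) -> int:
    return n * (n - 1) // 2

print(max_nodes(3), cables_needed(4))  # 3 usable ports -> 4 Macs, 6 cables (the setup in the video)
print(max_nodes(5), cables_needed(6))  # 5 usable ports -> 6 Macs, 15 cables
```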
| ▲ | rbanffy 3 hours ago | parent [-] | | Can’t you make bandwidth reservations and optimise data location to prefer comms between directly connected nodes over one or two-hop paths? | | |
| ▲ | KeplerBoy 3 hours ago | parent [-] | | Sure, one could imagine some kind of pipeline parallelism where you only need a fast transfer to the next stage of the model; that would boost throughput but not increase model size. |
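As a toy illustration of that idea (a sketch only; the "nodes" below are plain Python objects standing in for machines in a chain, not how exo or MLX actually shard a model):

```python
import numpy as np

# Pipeline parallelism over a daisy chain: each node holds a contiguous slice of
# the layers and only ever exchanges activations with its neighbour, so no full
# mesh is required. Toy sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
layers = [rng.standard_normal((64, 64)) * 0.1 for _ in range(8)]

class Node:
    def __init__(self, my_layers):
        self.my_layers = my_layers

    def forward(self, x):
        for w in self.my_layers:
            x = np.tanh(x @ w)
        return x  # activations handed over the single link to the next node

nodes = [Node(layers[i:i + 2]) for i in range(0, 8, 2)]  # 4 "machines", 2 layers each
x = rng.standard_normal((1, 64))
for node in nodes:  # each hop crosses exactly one point-to-point link
    x = node.forward(x)
print(x.shape)  # (1, 64)
```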
|
| |
| ▲ | fooblaster 16 hours ago | parent | prev | next [-] | | Might be helpful if they actually provided a programming model for ANE that isn't onnx. ANE not having a native development model just means software support will not be great. | | | |
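For context, the officially supported route to the ANE today is Core ML rather than a native API: you convert a model with coremltools and request ANE-eligible compute units, and Core ML's scheduler then decides per-op whether anything actually lands on the ANE. A minimal sketch, with a toy model and placeholder shapes:

```python
import torch
import coremltools as ct

# Toy model standing in for whatever a framework would hand to Core ML.
model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU()).eval()
example = torch.randn(1, 512)
traced = torch.jit.trace(model, example)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=example.shape)],
    convert_to="mlprogram",
    compute_units=ct.ComputeUnit.CPU_AND_NE,  # request CPU + Neural Engine only
)
mlmodel.save("tiny_ane_model.mlpackage")
```

Whether the layers above actually execute on the ANE is opaque to the caller; that opacity is essentially the complaint about the lack of a native development model.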
| ▲ | liuliu 16 hours ago | parent | prev | next [-] | | They were talking about neural accelerators (a silicon piece on GPU): https://releases.drawthings.ai/p/metal-flashattention-v25-w-... | |
| ▲ | csdreamer7 15 hours ago | parent | prev | next [-] | | > Apple Neural Engine is a thing already, with support for multiply-accumulate on INT8 and FP16. AI inference frameworks need to add support for it. Or, Apple could pay for the engineers to add it. | | |
| ▲ | ls612 15 hours ago | parent [-] | | Apple already paid software engineers to add Tensorflow support for the ANE hardware. |
| |
| ▲ | solarkraft 12 hours ago | parent | prev [-] | | How much of an improvement can be expected here? It seems to me that in general most potential is pretty quickly realized on Apple platforms. |
|
|
| ▲ | checker659 12 hours ago | parent | prev | next [-] |
| For a company that has repeatedly ignored macOS, your wishlist seems like nothing but a pipe dream. QSFP on a Mac? Yeah, right. If anything, they'll double down on TB or some nonstandard interconnect. What is a computer? (Although I do hope that, with the new work on supporting RDMA, the MLX5 driver shipped with macOS will finally support RDMA for ConnectX NICs.) https://kittenlabs.de/blog/2024/05/17/25gbit/s-on-macos-ios/ |
| |
| ▲ | rbanffy 3 hours ago | parent [-] | | QSFP makes sense on a Mac Pro platform, and might be where Apple chooses to differentiate (one could dream of an M5 Mega, with four chiplets). The Mac Studio is a general-purpose compact workstation that doesn't need ludicrously fast networking beyond what 10GbE and TB5 offer. It's already overkill for the vast majority of users. Top-configuration Studios are already a niche product. | | |
| ▲ | checker659 an hour ago | parent | next [-] | | Apple already ships an MLX5 driver for ConnectX NICs. | |
| ▲ | justincormack 2 hours ago | parent | prev [-] | | Also, given that Apple is using these in its datacenters, I think they will ship much more server-like hardware. |
|
|
|
| ▲ | burnt-resistor 16 hours ago | parent | prev | next [-] |
| Apple has always sucked at embracing properly robust tech for high-end gear in markets outside of individual prosumers or creatives. When Xserves existed, they used commodity IDE drives, without HA or replaceable PSUs, and couldn't compete with contemporary enterprise servers (HP-Compaq/Dell/IBM/Fujitsu). Xserve RAID half-heartedly used Fibre Channel for interconnect but couldn't touch a NetApp or EMC SAN/filer. I'm disappointed Apple has a persistent blind spot that keeps it from succeeding in the data-center-quality gear category, when it could've had virtualized servers, networking, and storage: things that would eventually find their way into my home lab after 5-7 years. |
| |
| ▲ | donavanm 13 hours ago | parent | next [-] | | Enterprise never ever mattered, and there aren't enough digits available to show your "home lab" use case in the revenue numbers. Xserve, the RAID shelves, and the directory services were kinda there as a half-hearted attempt at that late-'90s/'00s AV market. All of that fell on the cutting-room floor once personal devices, especially the iPhone, were realized. By the time I left in '10, the total revenue from Mac hardware was like 15% of revenue. I'm honestly surprised there's anyone who cared enough to package the business services for Mac minis. So if everything else is printing cash for a HUGE addressable consumer market at premium price points, why would they try to compete with their own ODMs on more-or-less commodity enterprise gear? | | |
| ▲ | SoftTalker 13 hours ago | parent [-] | | Seems like I remember the main reason Macs survived as a product at all was because you needed one to develop for iOS. That may be an exaggeration but there certainly was a time when Macs were few and far between outside of creative shops. Certainly they were almost unseen in the corporate world, where now they are fairly common at least in laptops. | | |
| ▲ | pjmlp 4 hours ago | parent [-] | | Macs survived because Apple got a cash injection and lasted long enough to come out with the colorful iMacs (with the hockey-puck mouse, still running Mac OS 8) and the iPod. By the time one was required for doing iOS development, they were already back in the green. | | |
| ▲ | raw_anon_1111 an hour ago | parent [-] | | It's a myth that the "cash injection" from Microsoft saved Apple. Microsoft gave Apple $250 million. The next quarter, Apple turned around and spent $100 million on Power Computing's Mac assets. Apple lost over a billion more before it became profitable. The $150 million net wouldn't have been make or break. Now, Microsoft promising to keep Office on the Mac was a big deal. | | |
| ▲ | pjmlp 3 minutes ago | parent [-] | | One way or the other, it was a cash injection from Microsoft; after all, who paid the salaries of the Office developers? Also, you're forgetting that those announcements gave Apple good marketing for getting additional credit from banks. |
|
|
|
| |
| ▲ | PunchyHamster 11 hours ago | parent | prev | next [-] | | For Apple, datacenter stuff is a low-margin business. | | |
| ▲ | spacedcowboy 6 hours ago | parent [-] | | Considering that Apple is moving away from Linux in the datacenter to its own devices, I'm not sure that's the case. The Apple machines aren't available to the consumer (they're rack-mounted, custom-made machines with dozens of chips per PCB), but they're much less power-hungry, just as fast (or more so), much cheaper for Apple to make than to buy, and they natively support its own ecosystem. Some of the machine designs that consumers are able to buy seem to bear a marked resemblance to the feature set that the datacenter people were clamouring for. Just saying... | | |
| ▲ | mcculley 2 hours ago | parent | next [-] | | > dozens of chips per PCB board Have there been leaks or something about these internal machines? I am curious to know more. | |
| ▲ | an hour ago | parent | prev [-] | | [deleted] |
|
| |
| ▲ | Terretta 14 hours ago | parent | prev [-] | | > I'm disappointed Apple has a persistent blind spot that keeps it from succeeding in ... things that would eventually find their way into my home lab after 5-7 years. I can see the dollar signs in their eyes right now. Aftermarkets are a nice reflection of durable value, and there's a massive one for iPhones and a smaller one for servers from quick-flameout startups, but not much money in 5-7-year-old servers. |
|
|
| ▲ | lostmsu 12 hours ago | parent | prev | next [-] |
| > 3090 would be nice They would need a 3x speedup over the current generation to approach a 3090. An A100, which has roughly the 3090's compute but 80GB of VRAM (so it fits LLaMA 70B), does prefill at 550 tok/s on a single GPU: https://www.reddit.com/r/LocalLLaMA/comments/1ivc6vv/llamacp... |
| |
| ▲ | doctorpangloss 9 hours ago | parent [-] | | The GB10 is only about the same performance as a 3090, and it uses way less power. I'm not sure why anyone would buy a Mac Studio instead of a GB10 machine for this use case. | | |
| ▲ | rbanffy 3 hours ago | parent | next [-] | | > i'm not sure why anyone would buy a mac studio instead of a gb10 For an AI-only use case, the GB10s make sense, but they are only OK as desktop workstations, and I’m not sure for how long DGX OS will be updated, as dedicated AI machines have somewhat short lives. Apple computers, OTOH, have much longer lives, and desktops live the longest. I retired my Mac Mini a year after the machine was no longer getting OS updates, and it was still going strong. | |
| ▲ | villgax 5 hours ago | parent | prev [-] | | It's just people looking to run experiments locally on their main machine rather than get a dedicated Spark, which can be used properly as a headless box, unlike a Mac, where you're at the mercy of system shenanigans (albeit still bearable compared to Windows). |
|
|
|
| ▲ | angoragoats 16 hours ago | parent | prev | next [-] |
| > Also, as the OP noted, this setup can support up to 4 Mac devices because each Mac must be connected to every other Mac!! All the more reason for Apple to invest in something like QSFP. This isn't any different with QSFP, unless you're suggesting that one adds a 200GbE switch to the mix, which:
* Adds thousands of dollars of cost,
* Adds 150W or more of power usage, plus the accompanying loud fan noise,
* And, perhaps most importantly, adds measurable latency to a networking stack that already has higher latency than the RDMA approach used by the TB5 setup in the OP. |
| |
| ▲ | fenced_load 16 hours ago | parent | next [-] | | Mikrotik has a switch that can do 6x200G for ~$1300 and <150W. https://www.bhphotovideo.com/c/product/1926851-REG/mikrotik_... | |
| ▲ | throwaway2037 6 hours ago | parent | next [-] | | Wow, this switch (MikroTik CRS812) is scary good for the price point. A quick Google search fails to find any online vendors with stock. I guess it is very popular! Retail price will be <= 1300 USD. I did some digging to find the switching chip: Marvell 98DX7335. It seems confirmed here: https://cdn.mikrotik.com/web-assets/product_files/CRS812-8DS... And here: https://cdn.mikrotik.com/web-assets/product_files/CRS812-8DS... > Switch chip model 98DX7335
From Marvell's specs: https://www.marvell.com/content/dam/marvell/en/public-collat... > Description: 32x50G / 16x100G-R2 / 8x100G-R4 / 8x200G-R4 / 4x400G-R8
> Bandwidth: 1600Gbps
Again, those are some wild numbers if I have the correct model. Normally, Mikrotik includes switching bandwidth in their own specs, but not in this case. | | |
| ▲ | cess11 5 hours ago | parent [-] | | They are very popular and make quite good products, but, as you noticed, it can be tricky to find them in stock. Besides stuff like this switch, they've also produced pretty cool little micro-switches you can power over PoE and run as WLAN hotspots, e.g. to distance your mobile device from some network you don't really trust, or to more or less maliciously bridge a cable network through a wall when your access to the building is limited. |
| |
| ▲ | wtallis 15 hours ago | parent | prev | next [-] | | That switch appears to have 2x 400G ports, 2x 200G ports, 8x 50G ports, and a pair of 10G ports. So unless it allows bonding together the 50G ports (which the switch silicon probably supports at some level), it's not going to get you more than four machines connected at 200+ Gbps. | | |
| ▲ | angoragoats 15 hours ago | parent [-] | | As with most 40+GbE ports, the 400Gbit ports can be split into 2x200Gbit ports with the use of special cables. So you can connect a total of 6 machines at 200Gbit. | | |
| ▲ | wtallis 15 hours ago | parent | next [-] | | Ah, good point. Though if splitter cables are an option, then it seems more likely that the 50G ports could be combined into a 200G cable. Marvell's product brief for that switch chip does say it's capable of operating as an 8x 200G or 4x 400G switch, but Mikrotik may need to do something on their end to enable that configuration. | | |
| ▲ | throwaway2037 6 hours ago | parent | next [-] | | I'm not trolling here: do you think that Marvell sells the chips wholesale, but the vendor buys the feature set (IP/drivers/whatever)? That would allow Marvell to effectively sell the same silicon but segment the market depending upon what buyers need. Example: one buyer might need a config that is just a bunch of 50Gb/s ports, another a bunch of 100Gb/s ports, and another a mix. (I'm thinking about blowing fuses in the manufacturing phase, similar to what AMD and Intel do.) I write this as a complete noob in switching hardware. | |
| ▲ | wtallis 41 minutes ago | parent [-] | | I think if Marvell were doing that, they would have more part numbers in their catalog. |
| |
| ▲ | angoragoats 4 hours ago | parent | prev [-] | | You’re talking about link aggregation (LACP) here, which requires specific settings on both the switch and client machine to enable, as well as multiple ports on the client machine (in your example, multiple 50Gbps ports). So while it’s likely possible to combine 50Gbps ports like you describe, that’s not what I was referring to. | | |
| ▲ | wtallis 42 minutes ago | parent [-] | | No, I'm not talking about LACP, I'm talking about configuring four 50Gb links on the switch to operate as a single 200Gb link as if those links were wired up to a single QSFP connector instead of four individual SFP connectors. The switch in question has eight 50Gb ports, and the switch silicon apparently supports configurations that use all of its lanes in groups of four to provide only 200Gb ports. So it might be possible with the right (non-standard) configuration on the switch to be able to use a four-way breakout cable to combine four of the 50Gb ports from the switch into a single 200Gb connection to a client device. |
|
| |
| ▲ | sgjohnson 15 hours ago | parent | prev [-] | | Breakout cables typically split to 4, e.g. QSFP28 (100GbE) splits into 4x SFP28s (25GbE each), because QSFP28 is just 4 lanes of SFP28. Same goes for QSFP112 (400GbE); it splits into SFP112s. It's OSFP that can be split in half, i.e. into QSFPs. | |
|
| |
| ▲ | angoragoats 15 hours ago | parent | prev [-] | | Cool! So for marginally less in cost and power usage than the numbers I quoted, you can get 2 more machines than with the RDMA setup. And you’ve still not solved the thing that I called out as the most important drawback. | | |
| ▲ | nicky_nickell 15 hours ago | parent [-] | | How significant is the latency hit? | |
| ▲ | angoragoats 15 hours ago | parent [-] | | The OP makes reference to this with a link to a GitHub repo that has some benchmarks. TCP over Thunderbolt has roughly 7-10x higher latency than RDMA over Thunderbolt: ~300µs vs. 30-50µs. I would expect TCP over 200GbE to have similar latency to TCP over Thunderbolt. Put another way, see the graphs in the OP where he points out that the old way of clustering performs worse the more machines you add? I'd expect that to happen with 200GbE also. And with a switch, it would likely be even worse, since the hop to the switch adds additional latency that isn't a factor in the TB5 setup. | |
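To put a rough scale on those numbers (the hop count below is an illustrative assumption; real schedulers overlap communication with compute, so this is only a sketch):

```python
# How much per-token time the link latency alone could add, using the figures above
# and assuming a 4-machine pipeline, i.e. 3 link crossings per generated token.
hops_per_token = 3

for name, link_latency_s in [("RDMA over TB5", 40e-6), ("TCP over Thunderbolt", 300e-6)]:
    overhead_s = hops_per_token * link_latency_s
    print(f"{name}: {overhead_s * 1e3:.2f} ms/token of link latency "
          f"(latency alone caps generation below {1 / overhead_s:.0f} tok/s)")
```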
| ▲ | wmf 14 hours ago | parent | next [-] | | You're ignoring RoCE which would have the same or lower latency than RoTB. And I think macOS already supports RoCE. | | | |
| ▲ | Hikikomori 5 hours ago | parent | prev [-] | | The switch probably does cut-through forwarding, so it starts forwarding the frame before it's even fully received. |
|
|
|
| |
| ▲ | SoftTalker 12 hours ago | parent | prev [-] | | For RDMA you'd want InfiniBand, not Ethernet. | |
| ▲ | johncolanduoni 12 hours ago | parent [-] | | RDMA for new AI/HPC clusters is moving toward Ethernet (the keyword to look for is RoCE). Ethernet gear is so much cheaper that you can massively over-provision to make up for some of the disadvantages of asynchronous networking, and it lets you run jobs on hyperscalers (only Azure ever supported actual IB). Most HPC is not latency-sensitive enough to need InfiniBand's lower jitter/median, and vendors have mostly caught up on the hardware acceleration front. |
|
|
|
| ▲ | tylerflick 16 hours ago | parent | prev | next [-] |
| > TOTALLY okay with it consuming 600W+ of power The 2019 i9 MacBook Pro has entered the chat. |
|
| ▲ | dev_l1x_be 14 hours ago | parent | prev [-] |
| Mine is to remove the extreme macOS bloat. |