hbogert 6 hours ago

It stands out because it didn't sell. Which is weird, because there were some pretty big pros to using them. The latency for updating 1 byte was crazy good. Some databases, or journals for something like ZFS, really benefited from this.

amluto 5 hours ago | parent | next [-]

Intel did a spectacularly poor job with the ecosystem around the memory cells. They made two plays, and both were flops.

1. “Optane” in DIMM form factor. This targeted (I think) two markets. First, use as slower but cheaper and higher density volatile RAM. There was actual demand — various caching workloads, for example, wanted hundreds of GB or even multiple TB in one server, and Optane was a route to get there. But the machines and DIMMs never really became available. Then there was the idea of using Optane DIMMs as persistent storage. This was always tricky because the DDR interface wasn’t meant for this, and Intel also seems to have a lot of legacy tech in the way (their caching system and memory controller) and, for whatever reason, they seem to be barely capable of improving their own technology. They had multiple serious false starts in the space (a power-supply-early-warning scheme using NMI or MCE to idle the system, a horrible platform-specific register to poke to ask the memory controller to kindly flush itself, and the stillborn PCOMMIT instruction).

2. Very nice NVMe devices. I think this was more of a failure of marketing. If they had sold a line of SSDs that, coupled with an appropriate filesystem, could give a 99th-percentile fsync latency of 5 microseconds, and they had marketed this, I bet people would have paid. But they did nothing of the sort. Instead they just threw around the term “Optane” inconsistently.
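For anyone who wants to check a tail-latency figure like that on their own hardware, here is a minimal sketch of how it could be measured. The function names, the file name, the 200-write sample size, and the nearest-rank percentile method are all assumptions made for illustration; a serious benchmark would control for filesystem, queue depth, and warm-up.

```python
import math
import os
import tempfile
import time


def fsync_latencies_us(path, writes=200, payload=b"x"):
    """Append `payload` and fsync `writes` times; return each latency in microseconds."""
    samples = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for _ in range(writes):
            start = time.perf_counter_ns()
            os.write(fd, payload)
            os.fsync(fd)  # blocks until the OS reports the write as flushed
            samples.append((time.perf_counter_ns() - start) / 1000.0)
    finally:
        os.close(fd)
    return samples


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical scratch file; point this at the device under test instead.
    with tempfile.TemporaryDirectory() as tmp:
        lat = fsync_latencies_us(os.path.join(tmp, "journal.bin"))
    print(f"p50={percentile(lat, 50):.1f}us  p99={percentile(lat, 99):.1f}us")
```

The numbers depend heavily on the filesystem and on whether the device has a volatile write cache, so treat the output as a rough comparison between drives rather than an absolute figure.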

These days one could build a PCM-backed CXL-connected memory mapped drive, and the performance might be awesome. Heck, I bet it wouldn’t be too hard to get a GPU to stream weights directly off such a device at NVLink-like speeds. Maybe Intel should try it.

orion138 5 hours ago | parent [-]

One of the many problems was trying to limit the use of Optane to Intel devices. They should have manufactured and sold Optane memory and let other players build on top of it at a low level.

amluto 4 hours ago | parent [-]

> Optane memory

Which “Optane memory”? The NVMe product always worked on non-Intel. The NVDIMM products that I played with only ever worked on a very small set of rather specialized Intel platforms. I bet AMD could have supported them about as easily as Intel, and Intel barely ever managed to support them.

wtallis 4 hours ago | parent [-]

The consumer "Optane memory" products were a combination of NVMe and Intel's proprietary caching software, the latter of which was locked to Intel's platforms. They also did two generations of hybrid Optane+QLC drives that only worked on certain Intel platforms, because they ran a PCIe x2+x2 pair of links over a slot normally used for a single x2 or x4 link.

Yes, the pure-Optane consumer "Optane memory" products were at a hardware level just small, fast NVMe drives that could be used anywhere, but they were never marketed that way.

myself248 4 hours ago | parent | next [-]

Exactly. I happen to have all AMD sitting around here, and buying my first Optane devices was a gamble, because I had no idea if they'd work. Only reason I ever did, is they got cheap at one point and I could afford the gamble.

That uncertainty couldn't have done the market any favors.

amluto 4 hours ago | parent | prev [-]

I feel like this is proving my point. You can’t read “Optane” and have any real idea of what you’re buying.

Also… were those weird hybrid SSDs even implemented by actual hardware, or were they part of the giant series of massive kludges in the “Rapid Storage” family where some secret sauce in the PCIe host lied to the OS about what was actually connected so an Intel driver could replace the OS’s native storage driver (NVMe, AHCI, or perhaps something worse depending on generation) to implement all the actual logic in software?

It didn’t help Intel that some major storage companies started selling very, very nice flash SSDs in the meantime.

wtallis 4 hours ago | parent [-]

> were those weird hybrid SSDs even implemented by actual hardware, or were they part of the giant series of massive kludges

They were definitely part of the series of massive kludges. But aside from the Intel platforms they were marketed for, I never found a PCIe host that could see both of the NVMe devices on the drive. Some hosts would bring up the x2 link to the Optane half of the drive, some hosts would bring up the x2 link to the QLC half of the drive, but I couldn't find any way to get both links active even when the drive was connected downstream of a PCIe switch that definitely had hardware support for bifurcation down to x2 links. I suspect that with appropriate firmware hacking on the host side, it may have been possible to get those drives fully operational on a non-Intel host.

ksec 5 hours ago | parent | prev | next [-]

>Which is weird....

It isn't weird at all. I would have been surprised if it had ever succeeded in the first place.

Cost was way too high. Intel didn't share the tech with anyone other than Micron. Micron wasn't committed to it either, and since unused capacity at the fab was paid for by Intel regardless, they didn't care. There was no long-term solution or strategy to bring cost down. Neither Intel nor Micron had a vision for this. No one wanted another Intel-only tech lock-in. And despite the high price, it barely made any profit per unit compared to NAND and DRAM, which were making historically high profits at the time. Once the NAND and DRAM cycle went down again, cost/performance on Optane wasn't as attractive. Samsung even made a form of SLC NAND that performed similarly to Optane but was cheaper, and even they ended up stopping development due to lack of interest.

amluto 2 hours ago | parent | next [-]

A ways back, I wrote a sort of database that was memory-mapped-file backed (a mistake, but I didn’t know that at the time), and I would have paid top dollar for even a few GB of NVDIMMs that could be put in an ordinary server and could be somewhat straightforwardly mounted as a DAX filesystem. I even tried to do some of the kernel work. But the hardware and firmware were such a mess that it was basically a lost cause. And none of the tech ever seemed to turn into an actual purchasable product. I’m a bit suspicious that Intel never found product-market fit in part because they never had a credible product on the NVDIMM side.
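To make the memory-mapped access pattern concrete, here is a rough sketch of a one-byte update through a mapping, followed by a flush. The helper name `update_byte` and the file layout are hypothetical. On an ordinary filesystem the store goes through the page cache; the appeal of a DAX mount on persistent memory is that the same kind of mapping would address the media directly.

```python
import mmap
import os
import tempfile


def update_byte(path, offset, value):
    """Overwrite one byte of a memory-mapped file, then ask the OS to persist it."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mm:  # length 0 maps the whole file
            mm[offset] = value                # a plain store into the mapping
            mm.flush()                        # write the dirty mapping back to the file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "table.db")  # hypothetical database file
        with open(path, "wb") as f:
            f.write(bytes(4096))              # one zeroed page
        update_byte(path, 10, 0x7F)
        with open(path, "rb") as f:
            print(f.read()[10])
```

Note that a flush like this says nothing about atomicity across several stores, which is one reason a memory-mapped database is harder to get right than it first looks.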

Somewhere I still have some actual battery-backed DIMMs (DRAM plus FPGA interposer plus awkward little supercapacitor bundle) in a drawer. They were not made by Intel, but Intel was clearly using them as a stepping stone toward the broader NVDIMM ecosystem. They worked on exactly one SuperMicro board, kind of, and not at all if you booted using UEFI. Rebooting without doing the magic handshake over SMBUS [0] first took something like 15 minutes, which was not good for those nines of availability.

[0] You can find my SMBUS host driver for exactly this purpose on the LKML archives. It was never merged, in part, because no one could ever get all the teams involved in the Xeon memory controller to reach any sort of agreement as to who owned the bus or how the OS was supposed to communicate without, say, defeating platform thermal management or causing the refresh interval to get out of sync with the DIMM temperature, thus causing corruption.

I’m suspicious that everything involved in Optane development was like this.

deepsquirrelnet 5 hours ago | parent | prev | next [-]

I worked at Micron in the SSD division when Optane (originally called crosspoint “Xpoint”) was being made. In my mind, there was never a real serious push to productize it. But it’s not clear to me whether that was due to unattractive terms of the joint venture or lack of clear product fit.

There was certainly a time when it seemed they were shopping for engineers' opinions of what to do with it, but I think they quickly determined it would be a much smaller market than SSDs anyway and didn’t end up pushing on it too hard. I could be wrong though, it’s a big company and my corner was manufacturing and not product development.

chrneu 4 hours ago | parent | next [-]

I worked at Intel for a while and might be able to explain this.

There were/are often projects that come down from management that nobody thinks are worth pursuing. When I say nobody, it might not just be engineers but even, say, 1 or 2 people in management who just do a shit rollout. There are a lot of layers at Intel, and if even one layer in the Intel Sandwich drags its feet it can kill an entire project. I saw it happen a few times in my time there. That one specific node that Intel dropped the ball on kind of came back to 2-3 people in one specific department, as an example.

Optane was a minute before I got there, but having been excited about it at the time and somewhat following it, that's the vibe I get from Optane. It had a lot of potential but someone screwed it up and it killed the momentum.

osnium123 4 hours ago | parent | next [-]

Are you referring to the Intel 10nm struggles in your reference to 2-3 people?

empiricus 3 hours ago | parent | prev | next [-]

This is actually insane. Do you mean 2-4 people in one department basically killed Intel? Roll to disbelief.

LASR 2 hours ago | parent | next [-]

Yes, this is pretty common in large enterprise-ey tech companies that are successful. There is usually a small group of vocal members that have a strong conviction and drive to make a vision a reality. This is contrary to the popular belief that large companies design by committee.

Of course it works exceptionally well when the instinct turns out to be right. But it can end companies if it isn’t.

wtallis 3 hours ago | parent | prev [-]

It's somewhat plausible that a small group of people in one department were responsible for the bad bets that made their 10nm process a failure. But it was very much a group effort for Intel to escalate that problem into a prolonged disaster. Management should have stopped believing the undeliverable promises coming out of their fab side after a year or two, and should have started much sooner to design chips targeting fab processes that actually worked.

rjsw 2 hours ago | parent | prev [-]

A friend was working at Micron on a rackmount network server with a lot of flash memory, I didn't ask at the time what kind of flash it used. The project was cancelled when nearly finished.

jauntywundrkind 5 hours ago | parent | prev [-]

Cost was fantastically cheap, if you take into account that Optane is going to live >>10x longer than an SSD.

For a lot of bulk storage, yes, you don't have frequently changing data. But for databases or caches that are under heavy load, Optane was not only far faster but, looking at life-cycle costs, way, way cheaper.

mapt 2 hours ago | parent | next [-]

Write endurance of the drive would be measured in TBW, and TLC flash kept adding enough 3D layers to stay cheap enough, quickly enough, that Optane never really beat their pricing per TBW to make a practical product.

I have to wonder if it isn't usable for some kind of specialized AI workflow that would benefit from extremely low latency reads but which isn't written often, at this point. Perhaps integrated in a GPU board.

zozbot234 an hour ago | parent | next [-]

Optane's practical TBW endurance is way higher than that of even TLC flash, never mind QLC or PLC, which is the current standard for consumer NAND hardware. It even seems to go way beyond what's stated on the spec sheet. However, while Optane excels at write-heavy workloads (not read-heavy ones, where NAND actually performs very well), Optane devices are also power-hungry, which is a limitation for modern AI workflows.

jauntywundrkind an hour ago | parent | prev [-]

The extra capacity of modern SSDs is a good point, especially now that we have 100TB+ SSDs.

But Optane still offered 100 DWPD (drive writes per day), up to 3.2TB. That's still just so many more DWPD than flash SSDs. A Kioxia CM8V for example will do 12TB at 3 DWPD. The net TBW is still roughly 10x apart.

You can get back to high endurance with SLC drives like the Solidigm D7-P5810, but you're back down to 1.6TB and 50 DWPD, so, 1/4 the Intel P5800X endurance, and worse latencies. I highly suspect the drive model here is a homage, and in spite of being much newer and very expensive, the original is still so much better in so many ways. https://www.solidigm.com/content/solidigm/us/en/products/dat...
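The endurance comparison above is just capacity times DWPD times the rated period, and can be sketched as below. The capacities and DWPD ratings are the ones quoted in this thread; the 5-year rating period and the function name are assumptions for illustration, so check the vendor datasheets before relying on the totals.

```python
def total_writes_tb(capacity_tb, dwpd, years=5):
    """Total rated writes in TB: capacity x drive-writes-per-day x days in the period."""
    return capacity_tb * dwpd * 365 * years


# (capacity in TB, DWPD) as quoted in the thread; years=5 is an assumption.
DRIVES = {
    "Intel P5800X (Optane)": (3.2, 100),
    "Kioxia CM8V (TLC)": (12.0, 3),
    "Solidigm P5810 (SLC)": (1.6, 50),
}

if __name__ == "__main__":
    baseline = total_writes_tb(*DRIVES["Intel P5800X (Optane)"])
    for name, (capacity_tb, dwpd) in DRIVES.items():
        tbw = total_writes_tb(capacity_tb, dwpd)
        print(f"{name:24s} {tbw / 1000:8.1f} PBW  ({tbw / baseline:.2f}x of P5800X)")
```

With these inputs the TLC drive comes out at a bit over a tenth of the Optane drive's rated writes despite nearly four times the capacity, and the SLC drive at a quarter.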

You also end up paying for what I assume is a circa six figure drive, if you are substituting DWPD with more capacity than you need. There's something elegant about being able to keep using your cells, versus overbuying on cells with the intent to be able to rip through them relatively quickly.

PunchyHamster 26 minutes ago | parent | prev | next [-]

So instead of replacing every 5 years, you replace every 5 years, because if you need that level of performance you're replacing servers every 5 years anyway.

wtallis 4 hours ago | parent | prev [-]

Optane was in the market during a time when the mainstream trend in the SSD industry was all about sacrificing endurance to get higher capacity. It's been several years, and I'm not seeing a lot of regrets from folks who moved to TLC and QLC NAND, and those products are more popular than ever.

The niche that could actually make use of Optane's endurance was small and shrinking, and Intel had no roadmap to significantly improve Optane's $/GB which was unquestionably the technology's biggest weakness.

bombcar 6 hours ago | parent | prev | next [-]

It feels like everyone figured out what to do with them and how just about when they stopped making them.

timschmidt 5 hours ago | parent [-]

Same for the Larrabee / Knights architecture. Would sure be fun to play around with a 500 core Knights CPU with a couple TB of Optane for LLM inference.

Intel's got an amazing record of axing projects as soon as they've done the hard work of building an ecosystem.

zozbot234 5 hours ago | parent [-]

> 500 core

The newest fully E-core based Xeon CPUs have reached that figure by now, at least in dual-socket configs.

timschmidt 5 hours ago | parent [-]

Yup. And high end GPU compute now has on-package HBM like Knights had a decade ago, and those new Intel CPUs are finally shipping with AVX reliably again. We lost a decade for workloads that would benefit from both.

mort96 2 hours ago | parent | prev | next [-]

I never understood what they're meant to do. Intel seemed to picture some future where RAM is persistent; but they were never close to fast enough to replace RAM, and the option to reboot in order to fix some weird state your system has gotten itself into is a feature of computers, not a problem to work around.

thesz 3 hours ago | parent | prev | next [-]

In "databases and journals" you rarely update just one byte, you do a transaction that updates data, several indexes and metadata. All of that needs to be atomic.

Power failure can happen in between any of the "1 byte updates with crazy latencies." However small the latency is, power failure is still faster. Usually there is a write-ahead log or some other log that alleviates the problem, and this log is usually written in streaming fashion.

What is good, though, is that the "blast radius" [1] of a failure is smaller than usual: a failed one-byte write rarely corrupts more than one byte or cache line. SQLite has to deal with possible corruptions 512 bytes long (or even longer) on most disks; with Optane it is not necessarily so. So, less data to copy, scan, etc.

[1] https://sqlite.org/psow.html
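As a rough illustration of the streaming log idea, here is a minimal append-only log sketch in which each record carries a length and a CRC32, and replay stops at the first torn or corrupt record. The record layout and the names `append_record` and `replay` are invented for this example; this is not how SQLite or any particular database formats its journal.

```python
import os
import struct
import tempfile
import zlib

HEADER = struct.Struct("<II")  # payload length, CRC32 of payload (hypothetical layout)


def append_record(path, payload):
    """Append one checksummed record and fsync before reporting success."""
    record = HEADER.pack(len(payload), zlib.crc32(payload)) + payload
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, record)
        os.fsync(fd)
    finally:
        os.close(fd)


def replay(path):
    """Return every intact record, stopping at the first torn or corrupt one."""
    with open(path, "rb") as f:
        data = f.read()
    records, pos = [], 0
    while pos + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, pos)
        body = data[pos + HEADER.size : pos + HEADER.size + length]
        if len(body) < length or zlib.crc32(body) != crc:
            break  # power failed mid-write: ignore the incomplete tail
        records.append(body)
        pos += HEADER.size + length
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "wal.log")
        append_record(log, b"set a=1")
        append_record(log, b"set b=2")
        print(replay(log))
```

The smaller the unit that can be torn by a power failure, the less of the tail a recovery pass like `replay` has to distrust, which is the "blast radius" point made above.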

PunchyHamster 24 minutes ago | parent [-]

It's not. You won't be writing one byte, ever (even if you had layers that actually supported less-than-block writes), because the overhead of instruction would be massive and you'd be murdering both latency and bandwidth for anything non-trivial

epistasis 6 hours ago | parent | prev | next [-]

When most people are running databases on AWS RDS, or on ridiculous EBS drives with insanely low throughput and high latency, it makes sense to me.

There are very few applications that benefit from such low latency, and if one has to go off the standard path of easy, but slow and expensive and automatically backed up, people will pick the ease.

Having the best technology performance is not enough to have product market fit. The execution required from the side of executives at Intel is far far beyond their capability. They developed a platform and wanted others to do the work of building all the applications. Without that starting killer app, there's not enough adoption to build an ecosystem.

amluto 4 hours ago | parent [-]

> There are very few applications that benefit from such low latency

Basically any RDBMS? MySQL and Postgres both benefit from high performance storage, but too many customers have moved into the cloud where you can’t get NVMe-like performance for durable storage for anything remotely close to a worthwhile price.

epistasis 4 hours ago | parent [-]

I'm saying that there are very few downstream applications that use databases that benefit from reducing latency beyond the slow performance of the cloud. Running your database on VMs or baremetal gives better performance, but almost no applications built on databases bother to do it.

cogman10 5 hours ago | parent | prev | next [-]

IMO, the reason they didn't sell is the ideal usage for them is pairing them with some slow spinning disks. The issue Optane had is that SSD capacity grew dramatically while the price plummeted. The difference between Optane and SSDs was too small. Especially since the M.2 standard proliferated and SSDs took advantage of PCI-E performance.

I believe Optane retained a performance advantage (and I think even today it's still faster than the best SSDs) but SSDs remain good enough and fast enough while being a lot cheaper.

The ideal usage of optane was as a ZIL in ZFS.

zozbot234 5 hours ago | parent | next [-]

That may have been the ideal usage back in the day, but ideal usage now is just for setting up swap. Write-heavy workloads are king with Optane, and thrashing to swap is the prototypical example of something that's so write-heavy it's a terrible fit for NAND. Optane might not have been "as fast as DRAM" but it was plenty close enough to be fit for purpose.

mort96 2 hours ago | parent [-]

That would be fine if I could put it in an M.2 slot. But all my computers already have RAM in their RAM slots, and even if I had a spare RAM slot, I don't know that I'd trust the software stack to treat one RAM slot as a drive...

And their whole deal was making RAM persistent anyway, which isn't exactly what I want.

zozbot234 2 hours ago | parent [-]

Optane M.2-format hardware exists.

saxonww 5 minutes ago | parent | next [-]

Iirc it wasn't great because higher power == more heat though

mort96 2 hours ago | parent | prev [-]

Interesting, all I ever saw advertised was that weird persistent kinda slow RAM stick. Does the M.2 version just show up as a normal block device or is that too trying to be persistent RAM?

exmadscientist 5 hours ago | parent | prev | next [-]

> The ideal usage of optane was as a ZIL in ZFS.

It was also the best boot drive money could buy. Still is, I think, though other comments in the thread ask how it compares against today's best, which I'd also love to see.

gozzoo 4 hours ago | parent [-]

This concept was very popular back in the days when computers used to boot from HDD, but now it doesn't make much sense. I wouldn't notice if my laptop booted in 5 seconds instead of 10.

exmadscientist 4 hours ago | parent [-]

At the time of their introduction Optane drives were noticeably faster to boot your machine than even the fastest available Flash SSD. So in a workstation with multiple hard drives installed anyway, buying one to boot off of made decent sense.

If they had been cheaper, I think they'd have been really, really popular.

bushbaba 5 hours ago | parent | prev [-]

Not just capacity but SSD speeds also improved to the point it was good enough for many high memory workloads.

zozbot234 5 hours ago | parent | prev | next [-]

Optane didn't sell because they focused on their weird persistent DIMM sticks, which are a nightmare for enterprise where for many ordinary purposes you want ephemeral data that disappears as soon as you cut power. They should have focused on making ordinary storage and solving the interconnect bandwidth and latency problems differently, such as with more up-to-date PCIe standards.

hrmtst93837 3 hours ago | parent | next [-]

PCIe was a bottleneck in consumer boxes, but that wasn't the whole problem. Optane's low latency and write endurance looked great on paper, yet once you put it behind SSD controllers and file systems built around NAND assumptions, a lot of the upside got shaved off before users ever saw it.

"Just make it a faster SSD" was never a business. The DIMMs were weird, sure, but the bigger issue was that Optane made the most sense when software treated storage and memory as one tier, and almost nobody was going to rewrite kernels, DBs, and apps for a product that cost more than flash and solved pain most buyers barely felt.

PunchyHamster 23 minutes ago | parent [-]

> and file systems built around NAND assumptions, a lot of the upside got shaved off before users ever saw it.

What file systems? The most common ones you'd find would be ext4 or XFS, and neither of them is built around NAND assumptions.

jauntywundrkind 5 hours ago | parent | prev [-]

I don't think that would be my main complaint. Sticking Optane in a DIMM was just awkward as hell. You now have different bits of memory with very different characteristics, and you lose a ton of bandwidth.

If CXL was around at the time it would have been such a nice fit, allowing for much lower latency access.

It also seems like in spite of the bad fit, there were enough regular Optane drives, and they were indeed pretty incredible. Good endurance, reasonable price (and cheap as dirt if you consider that endurance/lifecycle cost!), some just fantastic performance figures. My conclusion is that alas there just aren't many people in the world who are serious about storage performance.

tayo42 4 hours ago | parent [-]

Can Linux tell that different DIMMs are different? Or does it still see it all as one big memory space?

wmf 2 hours ago | parent [-]

Yes, Linux was aware of the difference via ACPI tables.

p-e-w 6 hours ago | parent | prev [-]

Optane was a victim of its own hype, such as “entirely new physics”, or “as fast as RAM, but persistent”. The reality felt like a failure afterwards even though it was still revolutionary, objectively speaking.