OhMeadhbh 13 hours ago

At Amazon we had a bug that was the result of a compiler bug and the behaviour of Intel cores being mis-documented. It was intermittent and related to one core occasionally being allowed to access stale data in the cache. We debugged it with a logic analyzer, the commented nginx source and a copy of the C++11 spec.

It took longer than 2 days to fix.

amoss 11 hours ago | parent | next [-]

When you work on compilers, all bugs are compiler bugs.

(apart from the ones in the firmware, and the hardware glitches...)

ChrisMarshallNY 12 hours ago | parent | prev | next [-]

I’m old enough to have used ICEs to trace program execution.

They were damn cool. I seriously doubt that something like that exists outside of a TSMC or Intel lab these days.

plq 12 hours ago | parent | next [-]

ICE meaning in-circuit emulator in this instance, I assume?

ChrisMarshallNY 8 hours ago | parent [-]

Yeah. Guess it’s kind of a loaded acronym, these days.

Windchaser 2 hours ago | parent | prev [-]

/imagining using an internal combustion engine here

auguzanellato 12 hours ago | parent | prev [-]

What kind of LA did you use to debug an Intel core?

OhMeadhbh 11 hours ago | parent [-]

The hardware team had some semi-custom thing from Intel that spat out (no surprise) gigabytes of trace data per second. I remember much of the pain was in constructing a lab where we could drive a test system at reasonable loads to get the buggy behavior to emerge. It was intermittent, so it took us a couple weeks to come up with theories, another couple days for testing and a week of analysis before we came up with triggers that allowed us to capture the data that showed the bug. It was a bit of a production.