Remix.run Logo
QuiEgo 3 hours ago

As someone who works with hardware, hard to repo bugs can take months to track down. Your code, the compiler, or the hardware itself (which is often a complex ball of IP from dozens of manufacturers held together with a NoC) could all be a problem. The extra fun bugs are when a bug is due to problems in two or three of them combining together in the perfect storm to make a mega bug that is impossible to reproduce in isolation.

QuiEgo 3 hours ago | parent [-]

Random example: I once worked on a debug where you were not allowed to send zero length packets due to a known HW bug. Okay fine, work around in SW. Turns out there was an HW eviction timer that was disabled. It was connected to a counter that counted sys clk ticks. Turns out it was not disabled entirely properly due to SW bug, so once every 2^32 ticks, it would trigger an evection, and if the queue happened to be empty, it would send a ZLP, which triggered the first bug (hard hang the system in a way that breaks the debugger). There were dozens of ways that could hard hang the system, this was just one. Good luck debugging that in two days.

jeffreygoesto 2 hours ago | parent [-]

We had one where data, interpreted as address (simple C typo before static analysis was common) fell into an unmapped memory region and the PCI controller stalled trying to get a response, thereby also halting the internal debugging logic and JTAG just stopped forever (PPC603 core). Each time you'd hit the bug, the debugger was thrown off.