I had designed a PCI chip. Our driver team reported that the PC would hard hang under certain conditions. They suspected a bug in the chip.
Armed with a PCI analyzer, I traced it to a race condition: the firmware on the chip would raise an interrupt to the PC, and the PC would clear that interrupt right at the moment the firmware wanted to issue a new one.
If the clear and the new interrupt happened in the wrong order, the PCI interrupt line stayed asserted even after the PC thought the interrupt had already been serviced.
The solution was simply to swap two lines of code in the firmware, but it took two weeks to figure that out.
Not that long, but it was an insidiously subtle mistake.