What side-channel attacks taught me about patience
The first correlation attack I ran did not work. The correlation coefficients were nearly uniform across all 256 key hypotheses for every byte position. I spent most of a day re-reading the ChipWhisperer documentation, then another half-day examining the trigger logic, before I measured the actual problem: the shunt resistor I had soldered into the VCC line had too high a resistance, and the resulting voltage drop was saturating the oscilloscope input on high-activity clock cycles. The signal I was trying to recover was buried under clipping artifacts. The mathematics were correct. The experiment was broken at the level of the physics, and I had spent two days debugging the wrong layer.
What followed was a structured debugging process that has stayed with me. One variable at a time. Characterise the noise floor before touching the analysis. Understand your instrumentation before trusting your data. Swap in a lower-resistance shunt. Reduce the oscilloscope input voltage range. Capture a single trace and inspect it visually: does the power waveform have the morphology you would expect from an AES round structure? Does the trigger fire at the right point in the encryption? Does the capture window cover the operation you are targeting? Only when the answers to those questions are clearly yes should you collect traces at scale. The correct sequence is: understand your measurement first, then apply your model. Most of the time wasted in experimental work comes from reversing that order.
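A sanity pass of this kind is easy to automate. The sketch below is a minimal illustration, not ChipWhisperer tooling; the ADC rail values and thresholds are assumptions you would set for your own setup. It flags the two failure modes described above: samples pinned at the input rails, and a noise floor that is large relative to the trace's dynamic range.

```python
import numpy as np

def sanity_check_trace(trace, adc_min=-0.5, adc_max=0.5, clip_frac=0.001):
    """Flag basic measurement problems in one power trace before
    collecting thousands.  adc_min/adc_max are the scope's input
    rails in volts (illustrative values -- set them for your setup)."""
    trace = np.asarray(trace, dtype=np.float64)
    # Clipping: samples sitting at (or within 0.5% of) either rail.
    tol = 0.005 * (adc_max - adc_min)
    clipped = np.mean((trace >= adc_max - tol) | (trace <= adc_min + tol))
    # Crude noise-floor estimate: for an oversampled signal,
    # sample-to-sample differences are dominated by noise, and
    # std(diff) ~= sqrt(2) * noise std for white noise.
    noise = np.std(np.diff(trace)) / np.sqrt(2)
    return {
        "clipped_fraction": float(clipped),
        "is_clipping": bool(clipped > clip_frac),
        "noise_estimate": float(noise),
        "dynamic_range": float(trace.max() - trace.min()),
    }
```

Run it on the very first captured trace; if `is_clipping` is true, no amount of downstream statistics will help.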
The attack that recovered the full 128-bit AES key in 4,800 traces was methodical rather than clever. The insight was not mathematical — Correlation Power Analysis is well-documented and the implementation is straightforward once your measurement is clean. The insight was operational: the failure was not a data quantity problem, it was a data quality problem. Adding more traces against a broken measurement setup does not converge on a result. It produces a larger, equally useless dataset. This distinction matters and is consistently underweighted in how we talk about both offensive techniques and defensive practices. The bottleneck is almost always what you are measuring and how, not how much of it you have collected.
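The analysis itself really is short once the measurement is clean. The sketch below is the textbook CPA formulation, not the exact code used in the attack: a Hamming-weight hypothesis on the first-round S-box output, Pearson-correlated against every sample of every trace for each of the 256 key-byte guesses. It derives the AES S-box (multiplicative inverse in GF(2^8) plus the affine transform) rather than hard-coding the table.

```python
import numpy as np

def _gf_mul(a, b):
    """Multiply in GF(2^8) with the AES reduction polynomial 0x11B."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        hi = a & 0x80
        a = (a << 1) & 0xFF
        if hi:
            a ^= 0x1B
        b >>= 1
    return p

def _make_sbox():
    """AES S-box: affine transform of the GF(2^8) multiplicative inverse."""
    inv = [0] * 256
    for x in range(1, 256):
        inv[x] = next(y for y in range(1, 256) if _gf_mul(x, y) == 1)
    rotl = lambda v, n: ((v << n) | (v >> (8 - n))) & 0xFF
    return np.array([s ^ rotl(s, 1) ^ rotl(s, 2) ^ rotl(s, 3) ^ rotl(s, 4) ^ 0x63
                     for s in inv], dtype=np.uint8)

SBOX = _make_sbox()
HW = np.array([bin(v).count("1") for v in range(256)])  # Hamming weight table

def cpa_recover_byte(traces, pts):
    """traces: (n_traces, n_samples) float array; pts: (n_traces,) plaintext
    bytes for one byte position.  Returns the key-byte guess whose
    Hamming-weight hypothesis correlates best with the measured power."""
    t = traces - traces.mean(axis=0)
    t_norm = np.sqrt((t ** 2).sum(axis=0))
    peaks = np.empty(256)
    for k in range(256):
        h = HW[SBOX[pts ^ k]].astype(np.float64)  # model: HW(SBOX(pt XOR k))
        h -= h.mean()
        r = np.abs(h @ t) / (np.sqrt((h ** 2).sum()) * t_norm)
        peaks[k] = r.max()                        # best sample for this guess
    return int(peaks.argmax())
```

On synthetic traces that leak the Hamming weight at a single sample, a few hundred traces suffice; against real hardware the required count climbs with measurement noise, which is exactly why a figure like 4,800 traces reflects measurement quality as much as the algorithm.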
The transfer to threat hunting is direct and underappreciated. Hunting for persistent activity in an environment is structurally similar to recovering a key from noisy power measurements: you have a hypothesis, you have imperfect data, and the temptation is to run the analysis immediately. The analysts who consistently find what others miss are not the ones with the most sophisticated detection logic. They are the ones who, before starting, understood their data sources — coverage gaps, log forwarding latency, timestamp normalisation, what is indexed in the SIEM versus what is collected but unqueried, which hosts produce no telemetry at all. Patience, in both contexts, is methodology operating at a pace slower than the problem seems to demand. It is also the only approach that reliably works.
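That data-source audit can itself be made mechanical. The sketch below is a hypothetical helper, not anyone's production tooling: the hostnames and 24-hour window are illustrative, and a real pipeline would query the SIEM rather than take in-memory lists. It surfaces the gap that matters most before any hunt begins: inventoried hosts producing no telemetry at all.

```python
from datetime import datetime, timedelta, timezone

def coverage_gaps(inventory, events, window_hours=24):
    """Given an asset inventory (hostnames) and recent log events as
    (hostname, utc_timestamp) pairs, return hosts that produced no
    telemetry inside the window -- the environment's silent blind spots."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=window_hours)
    seen = {host for host, ts in events if ts >= cutoff}
    return sorted(set(inventory) - seen)
```

A host that appears in the inventory but never in the logs is the hunting equivalent of a clipped oscilloscope channel: the analysis will run happily and tell you nothing.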