Timing attack on general purpose processor
Available news archives: comp.lang.tcl - comp.lang.python - comp.security.firewalls - sci.crypt - comp.lang.php - comp.lang.javascript
Google
 
Web news.hping.org


sci.crypt archive

Timing attack on general purpose processor

From: <cedric.lauradoux@inria.fr>
Date: Tue May 31 2005 - 09:45:45 CEST

Okay my original post was on the linux kernel mailing list. (I wanted to
know the opinion of people about my work).
First of all the purpose of my work is to deal with timing attack on GPP.

I will first take a look on Tsunoo attack which is similar to Bernstein
attack. Those 2 attacks relies on internal collisions attacks of christof
paar (ches 2004). The Hit ratio of code is increased when internal
collision occured (we have a similar effect on 8051 microcontroler but
with power consumption). Then we can try to distinguish collision which
indicate a correlation between the key and plaintext. We will need to
gather a lot of data as we measure different things than in Paar paper.
The underlying mathematics are the generalized birthday paradox. As we
measure hit ratio the attack depends on the cache parameter :
-cache size
-cache line
-cache associativity
Exactly a huge number of miss guarantee that we don't have a internal
collision or near collision.

Then how to defeat such attack without big time penalty ?
I have 5 solutions in mind:
- replace lookup table by instructions. This is the obvious solution that
everybody allready think about. This is not possible for hard timing
constraint application (like server).

- add dummy access until there is enough noise that nobody will be able
to distinguish a collision case from another event. This can be a solution
if the penalty is not huge. We can have a table with size twice the cache
size and make random access to it. The hit probability is 1/2. But we need
to handle random number. This can be done with HAVEGE or something similar
to HAVEGE with great efficiency. Unfortunatly I don't completely believe
in this solution as the dummy array may interact with the lookup table and
maybe generate new misses.

- Warmup technique consist in garantee the presence of lookup table in the
cache. If the table are too big this technique won't work.

- The attack is relying on the possibility to distinguish a special event
from any onther. Then we have a threshold that allow us detect collision.
My idea is now to make the threshold also trigger on none collision event.
To achieve this a propose to use prefetch. Prefetch allow us to indicate
the CPU to load data into the cache. Prefecth is an important technique in
code optimization. We are going to hide memory latency. The countermeasure
will work as follow. We first prefecth the table. We will have to wait the
full memory latency to do the first memory access. But for the other, the
load of the data has been hidden by the first. I can show you data I have
plot to illustrate this fact. Again if the table are too big, the
technique won't work. But on X86 I think this is THE SOLUTION.

- The last solution is a bit complicated. We can increase the size
of the table by adding dummy data. We will have to produce a more
complicated access function but the hit/miss ratio could be constant.
I have just think about it, I don't know if it is feasible.

Okay, one last thing that I want to pointed is the fact that those attacks
can be transform into power analysis attack if we consider Digital Signal
Processor or embedded processor with cache. As any out package's processor
will imply a huge power consumption. In this case the attack will be
easier (we can measure things in a more precise way).
Received on Thu Sep 29 21:39:16 2005