sci.crypt archive
Subject: Salsa20 and SSE2?
From: Paul Rubin <//phr.cx@NOSPAM.invalid>
Date: Mon Aug 29 2005 - 03:44:44 CEST
I wonder if anyone (DJB?) has tried coding Salsa20 using SSE2
Say the input is x0,x1,...,x15. Let a,b,c,d be SSE2 registers holding
The algorithm looks like (pseudocode):
# initial permutation
# ten pairs of rounds
# rearrange variables (parallel assignment)
# round 2, 4, ....
# (final permutation and add back to input omitted)
The initial permutation is just a bunch of 32-bit mov instructions
The quarter-round operations look like:
# b ^= (a+d) <<< 7
and similarly for the other quarter-rounds. Note that we needed 3
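
A minimal sketch of the quarter-round with SSE2 intrinsics, assuming a, b, c, d each hold four 32-bit words so that four quarter-rounds run in parallel, one per lane. SSE2 has no packed rotate instruction, so each `<<<` is built from a left shift, a right shift, and an OR. The first step is the one quoted above; the remaining rotation counts (9, 13, 18) come from the Salsa20 specification. The names `ROTL32X4`, `quarter_round_x4`, and `quarter_round_scalar` are made up for this illustration and do not come from the post.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* SSE2 has no packed rotate: build it from shift-left, shift-right, OR.
   n must be a compile-time constant with 0 < n < 32. */
#define ROTL32X4(v, n) \
    _mm_or_si128(_mm_slli_epi32((v), (n)), _mm_srli_epi32((v), 32 - (n)))

/* Four quarter-rounds in parallel, one per 32-bit lane (illustrative helper). */
static void quarter_round_x4(__m128i *a, __m128i *b, __m128i *c, __m128i *d)
{
    __m128i t;
    t = _mm_add_epi32(*a, *d); *b = _mm_xor_si128(*b, ROTL32X4(t, 7));  /* b ^= (a+d) <<< 7  */
    t = _mm_add_epi32(*b, *a); *c = _mm_xor_si128(*c, ROTL32X4(t, 9));  /* c ^= (b+a) <<< 9  */
    t = _mm_add_epi32(*c, *b); *d = _mm_xor_si128(*d, ROTL32X4(t, 13)); /* d ^= (c+b) <<< 13 */
    t = _mm_add_epi32(*d, *c); *a = _mm_xor_si128(*a, ROTL32X4(t, 18)); /* a ^= (d+c) <<< 18 */
}

/* Scalar reference for comparison: the same four steps on single words. */
static uint32_t rotl32(uint32_t x, int n)
{
    return (x << n) | (x >> (32 - n));
}

static void quarter_round_scalar(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *b ^= rotl32(*a + *d, 7);
    *c ^= rotl32(*b + *a, 9);
    *d ^= rotl32(*c + *b, 13);
    *a ^= rotl32(*d + *c, 18);
}
```

This version uses a single temporary `t` for readability; how the compiler allocates registers for it may differ from a hand-written assembly schedule.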
Rearrangement of the variables uses the PSHUFD instruction:
# after the first set of quarter-rounds
There's a similar arrangement after the second set of quarter rounds.
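
For illustration, a lane rotation with PSHUFD can be written with the `_mm_shuffle_epi32` intrinsic as below. This only sketches the mechanism: which of a, b, c, d gets rotated, and by how many lanes, depends on how the sixteen words are laid out across the registers, and that detail is not visible in the post as archived here. The function names are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides _MM_SHUFFLE */
#include <stdint.h>

/* _MM_SHUFFLE(z, y, x, w) selects source lanes so that the result is
   { lane0 = src[w], lane1 = src[x], lane2 = src[y], lane3 = src[z] }.
   Each function below compiles to a single PSHUFD. */

/* Result lane i takes source lane (i + 1) mod 4. */
static __m128i rotate_lanes_1(__m128i v)
{
    return _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 3, 2, 1));
}

/* Result lane i takes source lane (i + 2) mod 4. */
static __m128i rotate_lanes_2(__m128i v)
{
    return _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2));
}

/* Result lane i takes source lane (i + 3) mod 4. */
static __m128i rotate_lanes_3(__m128i v)
{
    return _mm_shuffle_epi32(v, _MM_SHUFFLE(2, 1, 0, 3));
}
```

Because PSHUFD takes separate source and destination operands, the shuffle can write to a different register than it reads from, which may help when juggling a small register file.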
So we've used a total of seven SSE2 registers:
a,b,c,d = the data being worked on
which works out rather nicely (ia32+sse has 8 sse registers, the
Maybe some cycles can be saved by reordering somewhat. I'm not enough
Anyway, has this already been tried and is it a speedup on the usual
If not, I might give it a try, if no one else does.