sci.crypt archive
Re: Salsa20 and SSE2?
From: D. J. Bernstein <djb@cr.yp.to>
Date: Mon Aug 29 2005 - 08:49:46 CEST
Paul Rubin wrote:
The biggest part of the problem is the number of instructions required:
* XMM instructions can't simply add the way that LEA can. They have
* XMM instructions can't simply rotate the way that ROL can. They
* XMM instructions can't even copy in one instruction without serious
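To make the instruction-count point in the list above concrete, here is a minimal sketch in C using the SSE2 intrinsics from `<emmintrin.h>`. The helper names `rotl32x4` and `add_rotate_xor` are hypothetical, chosen for illustration; they are not taken from this post or from any Salsa20 implementation. The sketch assumes an x86 target with SSE2 and a rotation count strictly between 0 and 32.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Rotate each of the four 32-bit lanes left by c bits.
   SSE2 has no packed rotate, so the rotate is built from
   a left shift, a right shift, and an OR. */
static __m128i rotl32x4(__m128i x, int c)
{
    assert(c > 0 && c < 32);
    __m128i hi = _mm_slli_epi32(x, c);       /* bits that move up */
    __m128i lo = _mm_srli_epi32(x, 32 - c);  /* bits that wrap around */
    return _mm_or_si128(hi, lo);
}

/* One add-rotate-xor step on four lanes at once:
   b ^= (a + d) <<< c, the pattern used in a Salsa20 quarter-round. */
static __m128i add_rotate_xor(__m128i b, __m128i a, __m128i d, int c)
{
    __m128i sum = _mm_add_epi32(a, d);       /* lane-wise 32-bit add */
    return _mm_xor_si128(b, rotl32x4(sum, c));
}
```

With intrinsics the compiler inserts any register copies itself. In hand-written SSE2 assembly the instructions are two-operand and overwrite their destination, so keeping `a` live across the add and keeping the sum live across both shifts each typically costs an extra register copy.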
Even if you aren't worried about P4 latency, the simple add-rotate-xor
For comparison, traditional non-XMM code takes 26.75 cycles per round on
> Alternatively, any chance of a "salsa40" function that uses 64-bit
The main problem is that the 64-bit speedup is accompanied by a severe
---D. J. Bernstein, Professor, Mathematics, Statistics,