1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
|
Note that the UNROLL option makes the 'inner' des loop unroll all 16 rounds
instead of the default 4.
RISC1 and RISC2 are 2 alternatives for the inner loop and
PTR means to use pointers arithmetic instead of arrays.
IRIX 6.2 - R10000 195mhz - cc (-O3 -n32) - UNROLL RISC2 PTR 496,000 3968k/s
solaris 2.5.1 usparc 167mhz?? - SC4.0 - UNROLL RISC1 PTR [1] 475,400 3804k/s
solaris 2.5.1 usparc 167mhz?? - gcc 2.7.2 - UNROLL RISC1 PTR 306,000 2448k/s
linux - pentium 100mhz - gcc 2.7.0 - assember 281,000 2250k/s
NT 4.0 - pentium 100mhz - VC 4.2 - assember 281,000 2250k/s
IRIX 5.3 - R4400 200mhz - gcc 2.6.3 - UNROLL RISC2 PTR 235,300 1882k/s
IRIX 5.3 - R4400 200mhz - cc - UNROLL RISC2 PTR 233,700 1869k/s
NT 4.0 - pentium 100mhz - VC 4.2 - UNROLL RISC1 PTR 191,000 1528k/s
DEC Alpha 165mhz?? - cc - RISC2 PTR [2] 181,000 1448k/s
linux - pentium 100mhz - gcc 2.7.0 - UNROLL RISC1 PTR 158,500 1268k/s
HPUX 10 - 9000/887 - cc - UNROLL [3] 148,000 1190k/s
solaris 2.5.1 - sparc 10 50mhz - gcc 2.7.2 - UNROLL 123,600 989k/s
IRIX 5.3 - R4000 100mhz - cc - UNROLL RISC2 PTR 101,000 808k/s
DGUX - 88100 50mhz(?) - gcc 2.6.3 - UNROLL 81,000 648k/s
solaris 2.4 486 50mhz - gcc 2.6.3 - assember 65,000 522k/s
HPUX 10 - 9000/887 - k&r cc (default compiler) - UNROLL PTR 76,000 608k/s
solaris 2.4 486 50mhz - gcc 2.6.3 - UNROLL RISC2 43,500 344k/s
AIX - old slow one :-) - cc - 39,000 312k/s
Notes.
[1] For the ultra sparc, SunC 4.0 cc -fast -Xa -xO5, running 'des_opts'
gives a speed of 475,000 des/s while 'speed' gives 417,000 des/s.
I belive the difference is tied up in optimisation that the compiler
is able to perform when the code is 'inlined'. For 'speed', the DES
routines are being linked from a library. I'll record the higher
speed since if performance is everything, you can always inline
'des_enc.c'.
[2] Similar to the ultra sparc ([1]), 181,000 for 'des_opts' vs 175,000.
[3] I was unable to get access to this machine when it was not heavily loaded.
As such, my timing program was never able to get more that %30 of the CPU.
This would cause the program to give much lower speed numbers because
it would be 'fighting' to stay in the cache with the other CPU burning
processes.
|