The OPENSSL_cpu_caps() change after the last bump missed a crucial bit:
there is more MD mess in the MI code than anticipated, with the result
that AES is now used without AES-NI on amd64 and i386, hurting machines
that previously greatly benefitted from it.
Temporarily add an internal crypto_cpu_caps_ia32() API that returns
OPENSSL_ia32cap_P, or 0, as OPENSSL_cpu_caps() previously did. This can
be improved after the release.
Regression reported and fix tested by Mark Patruck.
No impact on public ABI or API.
with/ok jsing
PS: Next time my pkg_add feels very slow, I should perhaps not mechanically
blame IEEE 802.11...
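
As a rough sketch (not the committed code), such a shim could look like the
following; the 64-bit type of OPENSSL_ia32cap_P and the architecture guards
here are assumptions:

#include <stdint.h>

/* Sketch only: assumes OPENSSL_ia32cap_P is a 64-bit CPUID capability word. */
extern uint64_t OPENSSL_ia32cap_P;

uint64_t
crypto_cpu_caps_ia32(void)
{
#if defined(__i386__) || defined(__x86_64__)
    /* Return the raw CPUID-derived bits so AES-NI detection keeps working. */
    return OPENSSL_ia32cap_P;
#else
    return 0;
#endif
}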
|
Provide a per-architecture crypto_arch.h - this will be used in a similar
manner to bn_arch.h and will allow for architecture specific #defines and
static inline functions. Move the HAVE_AES_* and HAVE_RC4_* defines here.
ok tb@
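
For illustration, a per-architecture header along these lines might look as
follows; the specific HAVE_* names are placeholders rather than the exact
defines that were moved:

/* crypto_arch.h (amd64) - sketch with illustrative HAVE_* names. */
#ifndef HEADER_CRYPTO_ARCH_H
#define HEADER_CRYPTO_ARCH_H

/*
 * This architecture has assembly implementations for these, so the
 * generic C fallbacks are not compiled in.
 */
#define HAVE_AES_ENCRYPT_INTERNAL
#define HAVE_AES_SET_ENCRYPT_KEY
#define HAVE_RC4_INTERNAL
#define HAVE_RC4_SET_KEY_INTERNAL

#endif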
|
These will be used in upcoming changes.
ok tb@
|
Instead of using HOST_{c2l,l2c} macros, provide and use
crypto_load_le32toh() and crypto_store_htole32(). In some cases just
use htole32() directly.
ok tb@
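
A plausible shape for these helpers, as a sketch assuming le32toh() and
htole32() from <endian.h>; the memcpy() keeps the accesses well-defined
regardless of alignment:

#include <endian.h>  /* le32toh(), htole32(); <sys/endian.h> on some systems */
#include <stdint.h>
#include <string.h>

/* Sketch: load a little endian 32 bit value from a possibly unaligned pointer. */
static inline uint32_t
crypto_load_le32toh(const uint8_t *src)
{
    uint32_t v;

    memcpy(&v, src, sizeof(v));
    return le32toh(v);
}

/* Sketch: convert to little endian, then store to a possibly unaligned pointer. */
static inline void
crypto_store_htole32(uint8_t *dst, uint32_t v)
{
    v = htole32(v);
    memcpy(dst, &v, sizeof(v));
}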
|
This recommits r1.37 of sha512.c; however, it uses uint8_t * instead of
void * for the crypto_load_* functions and primarily uses const uint8_t *
to track the input, only casting to const SHA_LONG64 * once we know that
it is suitably aligned. This prevents the compiler from assuming alignment
based on the pointer type.
Tested by tb@ and deraadt@ on platforms with gcc and strict alignment.
ok tb@
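
In sketch form, the pattern looks like this (SHA_LONG64 stands in for the
64 bit word type used by the SHA-512 code; the function name and simplified
logic are illustrative):

#include <endian.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t SHA_LONG64;

/*
 * Sketch: input is tracked as const uint8_t * and only cast to
 * const SHA_LONG64 * once alignment has actually been checked, so the
 * compiler cannot assume 8 byte alignment from the pointer type alone.
 */
static int
load_words_if_aligned(const uint8_t *in, uint64_t *W, size_t count)
{
    const SHA_LONG64 *in64;
    size_t i;

    if ((uintptr_t)in % sizeof(*in64) != 0)
        return 0;    /* caller falls back to byte-wise loads */

    in64 = (const SHA_LONG64 *)in;
    for (i = 0; i < count; i++)
        W[i] = be64toh(in64[i]);

    return 1;
}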
|
All assembly implementations are required to perform their own alignment
handling. In the case of the C implementation, on strict alignment
platforms, unaligned data will be copied into an aligned buffer. However,
most platforms then perform byte-by-byte reads (via the PULL64 macros).
Instead, remove SHA512_BLOCK_CAN_MANAGE_UNALIGNED_DATA and move alignment
handling into sha512_block_data_order() - if the data is aligned then simply
perform 64 bit loads and then do endian conversion via be64toh(). If the
data is unaligned then use memcpy() and be64toh() (in the form of
crypto_load_be64toh()). Overall this reduces complexity and can improve
performance (on aarch64 we get a ~10% performance gain with aligned input
and a ~1-2% gain on armv7), while the same movq/bswapq is generated
for amd64 and movl/bswapl for i386.
ok tb@
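
The unaligned path then reduces to a helper along these lines (a sketch;
compilers typically lower memcpy() plus be64toh() to a single load and
byte swap on targets that permit unaligned access):

#include <endian.h>
#include <stdint.h>
#include <string.h>

/* Sketch: big endian 64 bit load from a possibly unaligned pointer. */
static inline uint64_t
crypto_load_be64toh(const uint8_t *src)
{
    uint64_t v;

    memcpy(&v, src, sizeof(v));
    return be64toh(v);
}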
|
ok tb@
|
Various code in libcrypto needs bitwise rotation - rather than defining
different versions across the code base, provide a common set that can
be reused. Any sensible compiler optimises these to a single instruction
where the architecture supports it, which means we can ditch the inline
assembly.
On the chance that we need to provide platform-specific versions, this
follows the approach used in BN, where an MD crypto_arch.h header could be
added in the future, which would then provide more specific versions of
these functions.
ok tb@
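
A sketch of what such a common set could look like; the function names are
plausible rather than confirmed, and masking the shift keeps the expressions
well-defined even for a shift of zero:

#include <stddef.h>
#include <stdint.h>

/*
 * Sketch: rotate left/right, written so that any sensible compiler
 * recognises the idiom and emits a single rotate instruction where the
 * architecture has one (e.g. rol/ror on x86).
 */
static inline uint32_t
crypto_rol_u32(uint32_t v, size_t shift)
{
    return (v << (shift & 31)) | (v >> ((32 - shift) & 31));
}

static inline uint64_t
crypto_ror_u64(uint64_t v, size_t shift)
{
    return (v >> (shift & 63)) | (v << ((64 - shift) & 63));
}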
|
It is common to need to store data in a specific endianness - rather than
hand-rolling the same code each time it is needed, provide a
crypto_store_htobe64() function that converts from host endian to big
endian before storing the data to a location with unknown alignment.
ok tb@
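
A sketch of the helper, assuming htobe64() from <endian.h>:

#include <endian.h>
#include <stdint.h>
#include <string.h>

/*
 * Sketch: convert to big endian, then store via memcpy() so the
 * destination may have any alignment.
 */
static inline void
crypto_store_htobe64(uint8_t *dst, uint64_t v)
{
    v = htobe64(v);
    memcpy(dst, &v, sizeof(v));
}

A digest finalisation routine, for example, can then store each 64 bit
state word with a single call instead of eight shift-and-mask byte stores.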