Age | Commit message (Collapse) | Author |
|
On BN_ULLONG architectures, the C compiler can usually do a decent job
of optimising primitives, however it struggles to see through primitive
calls due to type narrowing. As such, providing explicit versions of
compound primitives can result in the production of more optimal code.
For example, on arm the bn_mulw_addw_addw() primitive can be replaced
with a single umaal instruction, which provides significant performance
gains.
Rather than intermingling #ifdef/#else throughout the header, the
BN_ULLONG defines are pulled up above the normal functions. This also
allows complex compound primitives to be reused. The conditionals have also
been changed from BN_LLONG to BN_ULLONG, since that is what really matters.
ok tb@
|
|
At least one of our bn_mul_words() assembly implementation fails to handle
n = 0 correctly... *sigh*
|
|
|
|
This removes a data dependent timing path from BN_sqr().
ok tb@
|
|
Rework bn_sqr()/bn_sqr_normal() so that it is less convoluted and more
readable. Instead of recomputing values that the caller has already
computed, pass it as an argument. Avoid branching and remove duplication
of variables. Consistently use a_len and r_len naming for lengths.
ok tb@
|
|
|
|
Historically (and currently in OpenSSL), BN_asc2bn() could be called with
NULL, but only for positive numbers. So BN_asc2bn(NULL, "1") would succeed
but BN_asc2bn(NULL, "-1"), would crash. The other *2bn functions return a
length, so accepting a NULL makes some sense since it allows callers to
skip over part of the string just parsed (atoi-style).
For BN_asc2bn() a NULL bn makes no sense because it returns a boolean. The
recent CBS rewrite makes BN_asc2bn(NULL, *) always crash which in turn made
Coverity throw a fit.
Another change of behavior from that rewrite pertains to accidents (or is
it madness?) like -0x-11 and 0x-11 being parsed as decimal -17 (which Ingo
of course spotted and diligently documented). This will be addressed later.
ok jsing
|
|
It returns a length, not a Boolean, so check for 0 explicitly. This is
purely cosmetic.
ok jsing
|
|
|
|
|
|
On some architectures, we can provide an optimised (often single
instruction) count-leading-zero implementation. In order to do this
effectively, provide bn_clzw() as a static inline that can be replaced
by an architecture specific version. The default implementation defers
to the bn_word_clz() function (which may also be architecture specific).
ok tb@
|
|
Provide bn_bitsize(), which performs a constant time scan of a BN in order
to determine the bit size of the BN value. Use this for BN_num_bits() such
that it is no longer dependent on the bn->top value.
ok tb@
|
|
This provides significant performance gains for bn_sqr_comba4() and
bn_sqr_comba8().
|
|
Factor out and optimise the inner loop for Montgomery multiplication,
making use of bn_qwmulw_addqw_addw() to perform Montgomery multiplication
by one word in larger steps. This provides a significant performance gain,
especially on platforms where bn_qwmulw_addqw_addw() is (or can be)
optimised.
ok tb@
|
|
All the functions changed in this commit would silently misbehave if the
return value aliases the modulus, most of the time they would succeed and
return an incorrect result of 0 in that situation. This adjusts all the
functions in BN_mod.c, others and documentation will follow later.
Prompted by a bug report about BN_mod_inverse() by Guido Vranken.
ok jsing
|
|
One problem with OpenSSL error codes is that they tend to be too specific
(another problem is that they are extremely ugly). So add an EINVAL-style
error code. This will be used in an upcoming commit to disallow aliasing
of the 'return value' with the modulus in BN_mod_* functions and should
be applicable elsewhere, outside of this one narrow use case.
ok jsing
|
|
This provides a performance gain across most BN operations.
|
|
This includes bn_qwaddqw(), bn_qwsubqw(), bn_qwmulw_addw() and
bn_qwmulw_addqw_addw(). These can typically be optimised on architectures
that have a reasonable number of general purpose registers.
ok tb@
|
|
This traded local copies of CTASSERT() to the one in crypto_internal.h.
This change was backed out due to SHA-512 breakage on STRICT_ALIGNMENT
architectures still using Fred Flintstone's gcc without asm sha512.
Original commit message:
Use crypto_internal.h's CTASSERT()
Now that this macro is available in a header, let's use that version
rather than copies in several .c files.
discussed with jsing
|
|
The somewhat strange calculation m = a^{-1} (mod m) can return 0. This
breaks because of BN_nnmod() having delicate semantics of which variable
can be reused. BN_nnmod(a, a, m, ctx) works and the library relies on that.
Here, the code ends up doing BN_nnmod(m, a, m, ctx) and this doesn't work.
If the result of the initial BN_mod() is negative, then BN_nnmod() will
return 0.
Problem reported by Guido Vranken in
https://github.com/openssl/openssl/issues/21110
This code is well covered by regress, but it does not currently have
explicit test coverage. Such will be added soon.
ok beck jsing
|
|
This results in bn_mul_comba4() and bn_mul_comba8() requiring ~30% less
instructions than they did previously.
|
|
|
|
This gives us more readable and safer code. There are two intentional
changes to behaviour - firstly, all three functions zero any BN that was
passed in, prior to doing any further processing. This means that a passed
BN is always in a known state, regardless of what happens later. Secondly,
BN_asc2bn() now fails on NULL input, rather than crashing. This brings its
behaviour inline with BN_dec2bn() and BN_hex2bn().
ok tb@
|
|
|
|
Now that this macro is available in a header, let's use that version
rather than copies in several .c files.
discussed with jsing
|
|
This is more accurate and improves readability a bit. Apart from a comment
tweak this is sed + knfmt (which resulted in four wrapped lines).
Discussed with beck and jsing
|
|
The behavior of the BPSW primality test for numbers > 2^64 is not very
well understood. While there is no known composite that passes the test,
there are heuristics that indicate that there are likely infinitely many.
Therefore it seems appropriate to harden the test. Having a settable
number of MR rounds before doing a version of BPSW is also the approach
taken by Go's primality check in math/big.
This adds a new implementation of the old MR test that runs before running
the strong Lucas test. I like to imagine that it's slightly cleaner code.
We're effectively at about twice the cost of what we had a year ago. In
addition, it adds some non-determinism in case there actually are false
positives for the BPSW test.
The implementation is straightforward. It could easily be tweaked to use
the additional gcds in the "enhanced" MR test of FIPS 186-5, but as long
as we are only going to throw away the additional info, that's not worth
much.
This is a first step towards incorporating some of the considerations in
"A performant misuse-resistant API for Primality Testing" by Massimo and
Paterson. Further work will happen in tree. In particular, there are plans
to crank the number of Miller-Rabin tests considerably so as to have a
guaranteed baseline. The manual will be updated shortly.
positive feedback beck
ok jsing
|
|
Anything taken to the power of 0 is 1, and then reduced mod 1 or mod -1 it
will be 0. If "anything" includes 0 or not is a matter of convention, but
it should not depend on the sign of the modulus...
Reported by Guido Vranken
ok jsing (who had the same diff)
|
|
ok tb@
|
|
ok tb@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
This code is full of problematic C and is also otherwise of questionable
quality. It is far from constant time and jsing informs me it also isn't
faster. Good riddance.
|
|
|
|
|
|
ok jsing
|
|
ok jsing
|
|
ok jsing
|
|
ok jsing, and kind of tb an earlier version
|
|
Pull a number of invariants into variables, which avoids repeated loading
from memory on architectures where sufficient registers are available.
Also keep track of the per-iteration carry in a variable, rather than
unnecessarily reading from and writing to memory.
This gives a reasonable performance gain on some architectures (e.g. armv7)
|
|
ok tb@
|
|
|
|
No functional change.
|
|
|
|
This removes a bunch of incomplete and scary code, which potentially leaks
secrets and is not constant time. A performance gain is achieved on arm64
for sizes that we care about, while a minimal decrease in performance is
noted for larger sizes on some other platforms.
While we will potentially reimplement Karatsuba (or Toom-Cook) at a later
date, it will be easier and safer to do it from a clean slate.
ok tb@
|
|
No functional change.
|