Age | Commit message (Collapse) | Author |
|
with different locking mechanism. 88110 soft ipi are replaced with an
ipi callback which is checked upon return from exception (it can not be kept
as a softintr, as the generic softinterrupt code doesn't have per-cpu
pending softintr queues).
|
|
levels. This will allow for platforms where soft interrupt levels do not
map to real hardware interrupt levels to have soft ipl values overlapping
hard ipl values without breaking spl asserts.
|
|
flushing the whole TLB block every time a pte is modified, store a bitmask
of pending flushes and do them at pmap_update() time. 88100 behaviour is
unchanged.
|
|
exchange with zero; use it in the soft interrupt code to make it simpler
and faster.
|
|
|
|
Rework nmi handling to handle ``complex'' NMI faster, and return as fast as
possible from the exception, without doing the AST and softintr dance.
This should avoid too much stack usage under load.
ok deraadt@
|
|
to disable NMI sources in addition to interrupt sources, and we can not
use a quick sequence with shadowing frozen as done for atomic ops.
This lets GENERIC.MP boot multiuser on MVME197DP boards, and is so far stable
enough to be able to recompile a kernel from scratch (with make -j2).
|
|
since it was intended to service NMI occuring in user mode, and we could
end up invoking preempt() and have another cpu start using this stack,
with interesting results.
|
|
|
|
can be interrupted by NMI; move the SMP version of these routines from
inlines to a separate file (kernel text shrinks 20KB...).
Since the implementation for 88110 becomes really hairy, the pre-main() code
is responsible for copying the appropriate code over for kernels configured
for both 88100 and 88110 cpus, to avoid having to choose the atomicity
strategy at runtime. Hairy, I said.
This gets GENERIC.MP run much further on 197DP. Not enough to reach multiuser
mode, but boots up to starting sshd and then panics.
|
|
xmem didn't return the expected value, spin doing regular loads until it
appears we have a chance to grab the lock again.
|
|
|
|
synchronizes the pipeline on 88110.
|
|
- dma_cachectl() split into a ``local cpu only'' and ``all cpus'', and an ipi
to broadcast ``local dma_cachectl'' is added.
- cpu_info fields are rearranged, to have the 88100-specific information
and the 88110-specific information overlap, and has many more 88110
ugly things.
- more ipi handling in the 197-specific area. Since it is not possible to
have the second processor receive any hardware interrupt (selection
is done on a level basis via ISEL, and we definitely do not want the
main cpu to lose interrupts), the best we can do is to inflict ourselves
a soft interrupt for late ipi processing. It gets used for softclock and
hardclock on the secondary processor, but since the soft interrupt
dispatcher doesn't have an exception frame, we have to remember parts
of it to build a fake clockframe from the soft ipi handler (ugly but
works).
This now lets GENERIC.MP run a few userland binaries before bugs trigger.
|
|
from interrupt() and related function pointers.
|
|
now set up both the exception frame structure and the exception stack as
soon as possible, so that we can safely get interrupted by an NMI as soon
as we reenable shadowing.
|
|
this defeats the purpose of having a separate stack at this point... Oopsie
|
|
different from regular hardware interrupts to be worth handling the
same way.
Disable IPI reception while we are handling pending IPIs. And do not
reenable them by mistake if we need to send an IPI in return.
This lets GENERIC.MP boot single user on a MVME197DP. There are still
many bugs to fix.
|
|
has been invoked on the new process.
|
|
while we are switching pcbs and all sort of bad things could happen.
|
|
pmap makes sure these can't happen.
|
|
the old vs(4) code is gone.
|
|
|
|
(i.e. with the valid bit set in them). Found the hard way by Anders Gavare
trying his latest gxemul, proves the hardware is more permitting than one
would expect it to be...
|
|
even in non-MP kernels, to avoid unnecessary tlb flushes later when
pmap operates on shared pages.
|
|
other MP platforms.
|
|
protected by __ISO_C_VISIBLE > 1999. With a little help from miod@.
ok miod@
|
|
which are uniform for the profclock on each cpu in a SMP system (but using
a different seed for each cpu). on all cpus, avoid seeding with a value out
of the [0, 2^31-1] range (since that is not stable)
ok kettenis drahn
|
|
anything special to prod a cpu to leave the idle loop in signotify.
powerpc, i386, amd64 and sparc64 will follow soon so that everyone has
the same interface to wake an idling cpu.
|
|
For now, sparc64 is arbitrarily set to 256 (only architecture that didn't have
a practical limit in the code on the number of cpus).
|
|
- provide proper dtoa locks
- use the real strtof implementation
- add strtold, __hdtoa, __hldtoa
- add %a/%A support
- don't lose precision in printf, don't round to double anymore
- implement extended-precision versions of libc functions: fpclassify,
isnan, isinf, signbit, isnormal, isfinite, now that the ieee.h is
fixed
- separate vax versions of strtof, and __hdtoa
- add complex math support. added functions: cacos, casin, catan,
ccos, csin, ctan, cacosh, casinh, catanh, ccosh, csinh, ctanh, cexp,
clog, cabs, cpow, csqrt, carg, cimag, conj, cproj, creal, cacosf,
casinf, catanf, ccosf, csinf, ctanf, cacoshf, casinhf, catanhf,
ccoshf, csinhf, ctanhf, cexpf, clogf, cabsf, cpowf, csqrtf, cargf,
cimagf, conjf, cprojf, crealf
- add fdim, fmax, fmin
- add log2. (adapted implementation e_log.c. could be more acruate
& faster, but it's good enough for now)
- remove wrappers & cruft in libm, supposed to work-around mistakes
in SVID, etc.; use ieee versions. fixes issues in python 2.6 for
djm@
- make _digittoint static
- proper definitions for i386, and amd64 in ieee.h
- sh, powerpc don't really have extended-precision
- add missing definitions for mips64 (quad), m{6,8}k (96-bit) float.h
for LDBL_*
- merge lead to frac for m{6,8}k, for gdtoa to work properly
- add FRAC*BITS & EXT_TO_ARRAY32 definitions in ieee.h, for hdtoa&ldtoa
to use
- add EXT_IMPLICIT_NBIT definition, which indicates implicit
normalization bit
- add regression tests for libc: fpclassify and printf
- arith.h & gd_qnan.h definitions
- update ieee.h: hppa doesn't have quad-precision, hppa64 does
- add missing prototypes to gdtoaimp
- on 64-bit platforms make sure gdtoa doesn't use a long when it
really wants an int
- etc., what i may have forgotten...
- bump libm major, due to removed&changed symbols
- no libc bump, since this is riding on djm's libc major crank from
a day ago
discussed with / requested by / testing theo, sthen@, djm@, jsg@,
merdely@, jsing@, tedu@, brad@, jakemsr@, and others.
looks good to millert@
parts of the diff ok kettenis@
this commit does not include:
- man page changes
|
|
|
|
userland is allowed to change in psr.
|
|
so don't use any in the 88110-specific parts of locore.
|
|
- math.h shouldn't define FLT_EVAL_METHOD, but float.h should (per
C99). remove from math.h, and add proper definitions in float.h
ok millert@
|
|
Right now when mi_switch picks up the same proc, we didn't clear the
flag which would mean that every time we service an AST we would attempt
a context switch. For some architectures, amd64 being probably the
most extreme, that meant attempting to context switch for every
trap and interrupt.
Now we clear_resched explicitly after every context switch, even if it
didn't do anything. Which also allows us to remove some more code
in cpu_switchto (not done yet).
miod@ ok
|
|
conversions that should shave a few bytes off the kernel.
ok henning, krw, jsing, oga, miod, and thib (``even though i usually prefer
FOO|BAR''; thanks for looking.
|
|
|
|
|
|
spotted by drahn
|
|
|
|
|
|
|
|
shorter code than a conditional ADDCARRY, so use it; inspired by hppa.
|
|
1 for 88110), for userland to have an easy way to figure out.
|
|
|
|
on m88k.
|
|
have jumped on it instead of basing the FPU completion work on the sparc
FPU code.
This is now repaired with this commit, and m88110_fp.c changes directory
again, for the last time.
|
|
instructions to be serialized (this defeats the purpose of having a superscalar
processor, and accesses to volatile variables are done with explicit memory
barriers anyway).
This brings a HUGE speedup: openssl speed -elapsed shows AES is 90% faster,
blowfish is 75% faster, and sha1 is 50% faster. Not so bad!
However, doing this increases the pressure on the processor bus, so it is
necessary to increase the processor bus timeout on 40MHz boards again (to 256
usec). ``black cat'' 50MHz boards seem to be unaffected, so they remain at
64 usec.
|
|
|