Age | Commit message | Author |
|
broke pthreads on hppa. Reverting. Ok deraadt@
|
|
our i386 compiler does not generate SSE instructions by default,
it is not strictly necessary to save MXCSR content between setjmp(3)
and longjmp(3). We do not want to end support for such old processors
now. Remove the stmxcsr and ldmxcsr instructions from libc.
reported by Johan Huldtgren; OK jsg@ kettenis@
|
|
it. There is enough space in jmp_buf to save MXCSR and CW register.
Idea taken from amd64. This fixes regress/lib/libc/setjmp-fpu .
OK kettenis@
|
|
i386 libc. The assembler code is more readable than with magic
numbers. This brings i386 in line with amd64. No change in object
file.
OK kettenis@
|
|
This changes RETGUARD_SETUP(ffs) to RETGUARD_SETUP(ffs, %r11, %r12)
and RETGUARD_CHECK(ffs) to RETGUARD_CHECK(ffs, %r11, %r12)
to show that r11 and r12 are in use between setup and check, and to
pick registers other than r11 and r12 in some kernel functions.
ok mortimer@ deraadt@
|
|
ok deraadt@
|
|
Add retguard to some, but not all, asm functions in libc. Edit SYS.h
in libc to remove the PREFIX macros and add SYSENTRY (more like
aarch64 and powerpc64), so we can insert RETGUARD_SETUP after
SYSENTRY. Some .S files in this commit don't get retguard, but do
stop using the old prefix macros.
Tested by deraadt@, who put this diff in a macppc snap.
|
|
floating-point control modes are properly restored by longjmp(3).
ok guenther@
|
|
OK deraadt@
|
|
ok deraadt@
|
|
("permanently undefined")
ok deraadt@ kettenis@
|
|
ok mortimer
|
|
are properly restored by longjmp(3).
|
|
ok deraadt@
|
|
Put a hard-trap instruction after the syscall instruction.
ok kettenis mortimer
|
|
calls are guarded. Adapt the first few hand-written functions to this
model (a few remain)
ok kettenis mortimer
|
|
framepointer, so gdb knows to stop. Inspired by glibc
ok kettenis@
|
|
Regarding RDTSC, the Intel ISA reference says (Vol 2B. 4-545):
> The RDTSC instruction is not a serializing instruction.
>
> It does not necessarily wait until all previous instructions
> have been executed before reading the counter.
>
> Similarly, subsequent instructions may begin execution before the
> read operation is performed.
>
> If software requires RDTSC to be executed only after all previous
> instructions have completed locally, it can either use RDTSCP (if
> the processor supports that instruction) or execute the sequence
> LFENCE;RDTSC.
To mitigate this problem, Linux and DragonFly use LFENCE. FreeBSD and
NetBSD take a more complex route: they selectively use MFENCE, LFENCE,
or CPUID depending on whether the CPU is AMD, Intel, VIA or something
else.
Let's start with just LFENCE. We only use the TSC as a timecounter on
SSE2 systems so there is no need to conditionally compile the LFENCE.
We can explore conditionally using MFENCE later.
Microbenchmarking on my machine (Core i7-8650) suggests a penalty of
about 7-10% over a "naked" RDTSC. This is acceptable. It's a bit of
a moot point though: the alternative is a considerably weaker
monotonicity guarantee when comparing timestamps between threads,
which is not acceptable.
It's worth noting that kernel timecounting is not *exactly* like
userspace timecounting. However, they are similar enough that we can
use userspace benchmarks to make conjectures about possible impacts on
kernel performance.
Concerns about kernel performance, in particular the network stack,
were the blocking issue for this patch. Regarding networking
performance, claudio@ says a 10% slower nanotime(9) or nanouptime(9)
is acceptable and that shaving off "tens of cycles" is a
micro-optimization. There are bigger optimizations to chase down
before such a difference would matter.
There is additional work to be done here. We could experiment with
conditionally using MFENCE. Also, the userspace TSC timecounter
doesn't have access to the adjustment skews available to the kernel
timecounter. pirofti@ has suggested a scheme involving RDTSCP and an
array of skews mapped into user memory. deraadt@ has suggested a
scheme where the skew would be kept in the TCB. However it is done,
access to the skews will improve monotonicity, which remains a problem
with the TSC.
First proposed by kettenis@ and pirofti@. With input from pirofti@,
deraadt@, guenther@, naddy@, kettenis@, and claudio@. Based on
similar changes in Linux, FreeBSD, NetBSD, and DragonFlyBSD.
ok deraadt@ pirofti@ kettenis@ naddy@ claudio@
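
The LFENCE;RDTSC sequence quoted above can be sketched in C roughly as follows. This is only an illustration under assumptions, not the code from this commit: the helper name rdtsc_lfence is hypothetical, the inline assembly uses GCC/Clang syntax, and the non-x86 branch is a stand-in so the sketch still builds elsewhere.

```c
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper (not the libc implementation): read the TSC with
 * an LFENCE in front, so that RDTSC is not executed until all previous
 * instructions have completed locally.
 */
static inline uint64_t
rdtsc_lfence(void)
{
#if defined(__x86_64__) || defined(__i386__)
	uint32_t lo, hi;

	/* RDTSC returns the low half in EAX and the high half in EDX. */
	__asm__ volatile("lfence; rdtsc" : "=a" (lo), "=d" (hi) : : "memory");
	return ((uint64_t)hi << 32) | lo;
#else
	/* Assumption: no TSC here, so fall back to a monotonic clock in ns. */
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
#endif
}
```

Dropping the "lfence;" prefix gives the "naked" RDTSC that the microbenchmark above compares against.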
|
|
32-bit values.
ok gkoehler@, drahn@
|
|
Initialize __curbrk = &_end.
It's a 64-bit pointer, so use ld/std instead of lwz/stw.
ok drahn@
|
|
OK naddy@; no objections from kettenis@
|
|
Tested by cwen@ and myself. Thanks to pirofti@ for creating the
userland timecounter feature.
ok kettenis@ pirofti@ deraadt@ cheloha@
|
|
ok naddy@
|
|
be 8 bytes in the 64-bit ABI just like in the 32-bit ABI. But that means
there is no "spare" word in the TCB that we can use to store a pointer
to our struct pthread. So we have to treat powerpc64 special.
Also recognize that the thread pointer points 0x7000 bytes after the TCB.
Since the TCB is 8 bytes this means that TCB_OFFSET should be 0x7008.
Pointed out by guenther@; ok deraadt@
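
The offset arithmetic described above can be written out as a small sketch. The names TCB_SIZE, TP_BIAS, tcb_to_tp and tp_to_tcb are assumptions made for illustration; only TCB_OFFSET and the values 8, 0x7000 and 0x7008 come from the message itself.

```c
#include <stdint.h>

#define TCB_SIZE	8		/* TCB size in the 64-bit ABI */
#define TP_BIAS		0x7000		/* thread pointer points this far past the TCB */
#define TCB_OFFSET	(TP_BIAS + TCB_SIZE)

/* Compile-time check of the value stated in the commit message. */
_Static_assert(TCB_OFFSET == 0x7008, "TCB_OFFSET should be 0x7008");

/* Hypothetical helpers: convert between TCB address and thread pointer. */
static inline uintptr_t
tcb_to_tp(uintptr_t tcb)
{
	return tcb + TCB_OFFSET;
}

static inline uintptr_t
tp_to_tcb(uintptr_t tp)
{
	return tp - TCB_OFFSET;
}
```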
|
|
|
|
|
|
ok deraadt@, pirofti@
|
|
* We don't need TC_LAST
* Make internal functions static to avoid namespace pollution in libc.a
* Use a switch statement to harmonize with architectures providing
multiple timecounters
ok deraadt@, pirofti@
|
|
This diff exposes parts of clock_gettime(2) and gettimeofday(2) to
userland via libc, liberating processes from the need for a context
switch every time they want to count the passage of time.
If a timecounter clock can be exposed to userland, then it needs to set
its tc_user member to a non-zero value. Tested with one or multiple
counters per architecture.
The timing data is shared through a pointer found in the new ELF
auxiliary vector AUX_openbsd_timekeep containing timehands information
that is frequently updated by the kernel.
Timing differences between the last kernel update and the current time
are adjusted in userland by the tc_get_timecount() function inside the
MD usertc.c file.
This permits a much more responsive environment, quite visible in
browsers, office programs and gaming (apparently one is able to fly
in Minecraft now).
Tested by robert@, sthen@, naddy@, kmos@, phessler@, and many others!
OK from at least kettenis@, cheloha@, naddy@, sthen@
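
A rough sketch of the userland adjustment described above: take the counter value captured at the last kernel update, subtract it from a fresh counter reading, and scale the difference to time. The struct layout and every name in it (timekeep_sketch, offset_count, counter_mask, freq_hz, base_ns, sketch_now_ns) are hypothetical and do not reflect the real shared timekeep page; in the real code the fresh reading would come from tc_get_timecount().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of the data the kernel shares. */
struct timekeep_sketch {
	uint32_t offset_count;	/* counter value at the last kernel update */
	uint32_t counter_mask;	/* valid bits of the hardware counter */
	uint64_t freq_hz;	/* counter frequency in Hz */
	uint64_t base_ns;	/* time at the last kernel update, in ns */
};

/*
 * Compute the current time in nanoseconds from a fresh counter reading.
 * The subtraction is done in 32 bits so that counter wraparound between
 * the kernel update and the reading is handled by the mask.
 */
static uint64_t
sketch_now_ns(const struct timekeep_sketch *tk, uint32_t count)
{
	uint32_t delta;

	assert(tk->freq_hz != 0);
	delta = (count - tk->offset_count) & tk->counter_mask;
	return tk->base_ns + (uint64_t)delta * 1000000000ULL / tk->freq_hz;
}
```

No system call is involved on this path, which is the point of the change: the only kernel interaction is the periodic update of the shared data.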
|
|
Use correct register to reference the location where we store CR.
|
|
address to load the correct TOC address.
|
|
of bcopy(9) doesn't work in its current state.
ok deraadt@
|
|
we use ld to load it again in longjmp(3).
|
|
instructions.
ok drahn@
|
|
|
|
aarch64/powerpc/powerpc64, making use of the count leading
zeros instruction. Also add a brief regression test.
ok deraadt@ kettenis@
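
The name of the function this commit implements is cut off in the log. Assuming it is a find-first-set routine in the style of ffs(3), the count-leading-zeros trick can be sketched in C as below. The name ffs_clz is hypothetical, and __builtin_clz is a GCC/Clang builtin standing in for the clz/cntlzw instruction that the assembly versions would use.

```c
#include <limits.h>

/*
 * Hypothetical sketch: return the 1-based index of the least
 * significant set bit, or 0 if no bit is set.
 */
static int
ffs_clz(int mask)
{
	unsigned int v = (unsigned int)mask;

	if (v == 0)
		return 0;
	v &= -v;	/* isolate the lowest set bit */
	/* With a single bit set, its index is the width minus leading zeros. */
	return (int)(sizeof(v) * CHAR_BIT) - __builtin_clz(v);
}
```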
|
|
as the per-thread register.
ok patrick@, drahn@
|
|
|
|
Initial attempt to port powerpc code to powerpc64
Expects TOC loading in ENTRY(),
ok kettenis@ (some cleanup required)
|
|
Initial attempt to port powerpc code to powerpc64
Expects TOC loading in ENTRY(),
memmove.S is the 32-bit powerpc version; it could be optimized for 64-bit
and made to handle a len of more than 32 bits.
|
|
|
|
This is almost a direct copy from powerpc with 64 bit mods,
with two additions present in 64 arch.
NOTE: long double 128 is not supported currently.
|
|
|
|
Initial attempt to port powerpc code to powerpc64
Expects TOC loading in ENTRY(),
ok kettenis@
|
|
Expects ELFv2 TOC loading in ENTRY(),
build with -gdwarf-4
Split SYS.h into SYS.h and DEFS.h
fix tabs after #define
|
|
problems as 64-bit models. To resolve the syscall speculation, as a first
step "nop; nop" was added after all occurrences of the syscall ("swi 0")
instruction. Then the kernel was changed to jump over the 2 extra instructions.
In this final step, that pair of nops is converted into the speculation-blocking
sequence ("dsb nsh; isb").
Don't try to build through these multiple steps, use a snapshot instead.
Packages matching the new ABI will be out in a while...
ok kettenis
|
|
problems as 64-bit models. For the syscall instruction issue, add nop;nop
after swi 0, in preparation for jumping over a speculation barrier here later.
ok kettenis
|
|
a syscall, replace the double nop with a dsb nsh; isb; sequence which
stops the CPU from speculating any further. This fix was suggested
by Anthony Steinhauser.
ok deraadt@
|
|
These will be replaced by a speculation barrier as soon as we teach the
kernel to skip over these two instructions when returning from a
system call.
ok patrick@, deraadt@
|
|
as well as those in arch/arm/gen/divsi3.S. This cleans up the PLTs on the
32bit archs.
luna88k testing by aoyama@
"looks good" kettenis@, testing and ok deraadt@
|