Age | Commit message (Collapse) | Author |
|
ok mpi@ miod@
|
|
Apparently, we used to created several kthreads before the kernel
random number generator was up and running. A toggle, "randompid",
was needed to tell allocpid() whether it made sense to attempt to
allocate random PIDs.
However, these days we get e.g. arc4random(9) into a working state
before any kthreads are spawned, so the toggle is no longer needed.
Thread: https://marc.info/?l=openbsd-tech&m=165541052614453&w=2
Very nice historical context provided by miod@.
probably ok miod@ deraadt@
|
|
net/if_pppx.c pointed out by jsg@
ok gnezdo@ deraadt@ jsg@ mpi@ millert@
|
|
|
|
SYS_syscall as the nosys() function into the MD syscall entry
routines and the SYSCALL_DEBUG support. Adjust alpha's syscall
check to match the other archs. Also, make sysent const to get it
into .rodata.
With that, 'struct emul' is unused: delete it and all its references
ok millert@
|
|
exec_elf_fixup() and coredump_elf() in <sys/exec_elf.h> and call
them and the MD setregs() directly in kern_exec.c and kern_sig.c
Also delete e_name[] (only used by sysctl), e_errno (unused), and
e_syscallnames[] (only used by SYSCALL_DEBUG) and constipate
syscallnames to 'const char *const[]'
ok kettenis@
|
|
sigobject. Just use the existing globals for the former and use a
global for the latter.
ok jsg@ kettenis@
|
|
of the auxinfo is fixed: provide ELF_AUX_WORDS in <sys/exec_elf.h>
as a replacement for emul->e_arglen
ok millert@
|
|
copyargs() return 0/1 and merge elf_copyargs() into it. Rename
ep_emul_arg and ep_emul_argp to have clearer meaning and type and
eliminate ep_emul_argsize as no longer necessary. Make sure
ep_auxinfo (nee ep_emul_argp) is initialized as powerpc64 always
uses it in setregs().
ok semarie@ deraadt@ kettenis@
|
|
in crypto.c and annotate locking protection. Assert kernel lock
where needed. Remove dead code from crypto_get_driverid(). Move
crypto_init() prototype into header file.
OK mpi@
|
|
Use the pool cache to reduce the overhead of memory management in
function kqueue_register().
When EV_ADD is given, kqueue_register() pre-allocates a knote to avoid
potential sleeping in the middle of the critical section that spans
from knote lookup to insertion. However, the pre-allocation is useless
if the lookup finds a matching knote.
The cost of knote allocation will become significant with kqueue-based
poll(2) and select(2) because the frequency of allocation will increase.
Most of the cost appears to come from the locking inside the pool.
The pool cache amortizes it by using CPU-local caches of free knotes
as buffers.
OK dlg@ mpi@
|
|
We did not reach a consensus about using SMR to unlock single_thread_set()
so there's no point in keeping this change.
|
|
Original port from NetBSD by guenther@, required for upcoming amap & anon
locking.
ok kettenis@
|
|
|
|
ok kettenis@, dlg@
|
|
Currently all iterations are done under KERNEL_LOCK() and therefor use
the *_LOCKED() variant.
From and ok claudio@
|
|
OK mpi@
|
|
|
|
conversion steps). it only contains kernel prototypes for 4 interfaces,
all of which legitimately belong in sys/systm.h, which are already included
by all enqueue_randomness() users.
|
|
will frantically compensate.
ok kettenis
|
|
This shows that atomic_* operations should not be necessery to write
to this field unlike with the process one.
The advantage of using a somewhat-unique prefix for struct member is
moot when multiple definitions use the same prefix :o)
From Amit Kulkarni, ok claudio@
|
|
prevents the appearance of a "smr: dispatch took N seconds" message
during boot when there is an early smr_call(). Such a call can happen
with mfii(4). The initial dispatch cannot make progress until
smr_grace_wait() can visit all CPUs.
This fix is essentially a hack. It makes use of the fact that there
is no hard guarantee on how quickly the callback of smr_call() gets
invoked. It is assumed that the SMR call backlog does not grow large
during boot.
An alternative fix is to make smr_grace_wait() skip secondary CPUs
until they have been started. However, this could break if the spinup
logic of secondary CPUs was changed.
Delayed SMR dispatch reported and fix tested by Hrvoje Popovski
Discussed with and OK kettenis@, claudio@
|
|
|
|
ok bluhm@
|
|
enforce a new policy: system calls must be in pre-registered regions.
We have discussed more strict checks than this, but none satisfy the
cost/benefit based upon our understanding of attack methods, anyways
let's see what the next iteration looks like.
This is intended to harden (translation: attackers must put extra
effort into attacking) against a mixture of W^X failures and JIT bugs
which allow syscall misinterpretation, especially in environments with
polymorphic-instruction/variable-sized instructions. It fits in a bit
with libc/libcrypto/ld.so random relink on boot and no-restart-at-crash
behaviour, particularily for remote problems. Less effective once on-host
since someone the libraries can be read.
For static-executables the kernel registers the main program's
PIE-mapped exec section valid, as well as the randomly-placed sigtramp
page. For dynamic executables ELF ld.so's exec segment is also
labelled valid; ld.so then has enough information to register libc's
exec section as valid via call-once msyscall(2)
For dynamic binaries, we continue to to permit the main program exec
segment because "go" (and potentially a few other applications) have
embedded system calls in the main program. Hopefully at least go gets
fixed soon.
We declare the concept of embedded syscalls a bad idea for numerous
reasons, as we notice the ecosystem has many of
static-syscall-in-base-binary which are dynamically linked against
libraries which in turn use libc, which contains another set of
syscall stubs. We've been concerned about adding even one additional
syscall entry point... but go's approach tends to double the entry-point
attack surface.
This was started at a nano-hackathon in Bob Beck's basement 2 weeks
ago during a long discussion with mortimer trying to hide from the SSL
scream-conversations, and finished in more comfortable circumstances
next to a wood-stove at Elk Lakes cabin with UVM scream-conversations.
ok guenther kettenis mortimer, lots of feedback from others
conversations about go with jsing tb sthen
|
|
instead of task(9). Undefined behavior can potentially be present in any
context and calling task_add() isn't always safe.
ok visa@
|
|
Allows us to determine how long a process has been running, even if the
UTC clock jumps.
With help from bluhm@ and millert@, who squashed several bugs.
ok bluhm@ millert@
|
|
of resource limit structs has been done between processes. By applying
copy-on-write also between threads, threads can read rlimits in
a nearly lock-free manner.
Inspired by code in DragonFly BSD and FreeBSD.
OK mpi@, agreement from jmatthew@ and anton@
|
|
kernel. kubsan reports findings using printf() and assuming that calling
printf() is safe in all contexts can be problematic. Instead, defer
reporting of findings to the systq task queue.
Storage for findings is allocated early in the boot process in order to
catch potential UB during boot. The same findings are reported once the
task queue subsystem has been initialized.
Feedback from kettenis@ and ok mpi@
|
|
function is also a proper place for setting up the plimit pool.
While here, raise the IPL of the plimit pool to IPL_MPFLOOR, needed
in upcoming MP work.
OK claudio@
|
|
It currently creates a lock ordering problem because SCHED_LOCK() is taken
by hardclock(). That means the "priorities" of a thread should be moved
out of the SCHED_LOCK() first in order to make progress.
Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com
via anton@ as well as by kettenis@
|
|
Note that hardclock(9) still increments p_{u,s,i}ticks without holding a
lock.
ok visa@, cheloha@
|
|
with the fields of struct proc. Make pl_refcnt unsigned for upcoming
atomic updating.
OK deraadt@ guenther@
|
|
objects that readers can access without locking. This provides a basis
for read-copy-update operations.
Readers access SMR-protected shared objects inside SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.
The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.
The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.
Discussed with many
OK mpi@ sashan@
|
|
To protect the timehands we first need to protect the basis for all UTC
time in the kernel: the boottime.
Because the boottime can be changed at any time it needs to be versioned
along with the other members of the timehands to enable safe lockless reads
when using it for anything. So the global boottime timespec goes away and
the static boottimebin becomes a member of the timehands. Instead of reading
the global boottime you use one of two interfaces: binboottime(9) or
microboottime(9). nanoboottime(9) can trivially be added later, though there
are no consumers for it at the moment.
This introduces one small change in behavior. We used to advance the
reported boottime just before launching kernel threads from main().
This makes it look to userland like we "booted" moments before those
threads were launched. Because there is no longer a boottime global we
can no longer trivially do this from main(), so the boottime we report
to userspace via e.g. kern.boottime will now reflect whatever the time
was when we bootstrapped the timehands via inittodr(9). This is usually
no more than a minute before the kernel threads are launched from main().
The prior behavior can be restored by adding a new interface to the
timecounter layer in a future commit.
Based on FreeBSD r303387.
Discussed with mpi@ and visa@.
ok visa@
|
|
|
|
so we can let go if_cloners_lock.
OK tb@, claudio@, bluhm@, kn@, henning@
|
|
passing the main function directly to kthread_create(9). The start_*
functions are mere stepping stones nowadays and can be pruned.
They used to contain more logic in the pre-kthread era.
While here, set `cleanerproc' and `syncerproc' during the thread
creation rather than expect the threads to set the proc pointer.
Also, rename `sched_sync' to `syncer_thread' to reduce confusion
with the scheduler-related functions.
OK kettenis@, deraadt@, mpi@
|
|
a bad/corrupt binary not returning ENOEXEC but some other error.
ok guenther kettenis bluhm
|
|
instead of passing sendsig() the code+type+val, pass a siginfo_t*
to copy from. Eliminate the indirection through struct emul for
sendsig(); we no longer have a SunOS4-compat version of sendsig()
ok deraadt@
|
|
curproc that does the locking or unlocking, so the proc parameter
is pointless and can be dropped.
OK mpi@, deraadt@
|
|
syscall) confirm the stack register points at MAP_STACK memory, otherwise
SIGSEGV is delivered. sigaltstack() and pthread_attr_setstack() are modified
to create a MAP_STACK sub-region which satisfies alignment requirements.
Observe that MAP_STACK can only be set/cleared by mmap(), which zeroes the
contents of the region -- there is no mprotect() equivalent operation, so
there is no MAP_STACK-adding gadget.
This opportunistic software-emulation of a stack protection bit makes
stack-pivot operations during ROPchain fragile (kind of like removing a
tool from the toolbox).
original discussion with tedu, uvm work by stefan, testing by mortimer
ok kettenis
|
|
Extend the logic already present for panic() to any DDB-related
operation such that if ddb(4) is entered because of a fault or
other trap it is still possible to call 'boot reboot'.
While here stop printing splassert() messages as well, to not fill
the buffer.
ok visa@, deraadt@
|
|
This was needed to be able to use loadfirmware() to load the microcode
before letting the cores go. Now that the microcode is loaded earlier
we can restore the previous behaviour.
ok deraadt@
|
|
useful for loading CPU microcode from the disk before the CPUs are
let go.
Tested by visa@ on sgi, loongson and octeon
"don't see immediate issues" kettenis@
ok deraadt@
|
|
|
|
This is obviously useful in order to investigate a failure to mount an NFS
or other root device.
ok mpi
|
|
|
|
The syscall is marked NOLOCK and only FUTEX_WAIT grabs the KERNEL_LOCK()
because of PCATCH and the signal nightmare.
Serialization of threads is currently done with a global & exclusive
rwlock.
Note that the current implementation still use copyin(9) which is not
guaranteed to be atomic. Committing now such that remaining issues can
be addressed in-tree.
With inputs from guenther@, kettenis@ and visa@.
ok deraadt@, visa@
|
|
Go-ahead from kettenis@, guenther@, deraadt@
|