src - OpenBSD base system

Age	Commit message	Author
2022-08-14	remove unneeded includes in sys/kern	Jonathan Gray
	ok mpi@ miod@
2022-07-23	kernel: remove global "randompid" toggle	Scott Soule Cheloha
	Apparently, we used to create several kthreads before the kernel random number generator was up and running. A toggle, "randompid", was needed to tell allocpid() whether it made sense to attempt to allocate random PIDs. However, these days we get e.g. arc4random(9) into a working state before any kthreads are spawned, so the toggle is no longer needed. Thread: https://marc.info/?l=openbsd-tech&m=165541052614453&w=2 Very nice historical context provided by miod@. probably ok miod@ deraadt@
2022-02-22	Delete unnecessary #includes of <sys/domain.h> and/or <sys/protosw.h>	Philip Guenther
	net/if_pppx.c pointed out by jsg@ ok gnezdo@ deraadt@ jsg@ mpi@ millert@
2022-01-01	copyright++;	Jonathan Gray

2021-12-09	We only have one syscall table: inline sysent/SYS_MAXSYSCALL and	Philip Guenther
	SYS_syscall as the nosys() function into the MD syscall entry routines and the SYSCALL_DEBUG support. Adjust alpha's syscall check to match the other archs. Also, make sysent const to get it into .rodata. With that, 'struct emul' is unused: delete it and all its references ok millert@
2021-12-07	Delete the last emulation callbacks: we're Just ELF, so declare	Philip Guenther
	exec_elf_fixup() and coredump_elf() in <sys/exec_elf.h> and call them and the MD setregs() directly in kern_exec.c and kern_sig.c Also delete e_name[] (only used by sysctl), e_errno (unused), and e_syscallnames[] (only used by SYSCALL_DEBUG) and constipate syscallnames to 'const char *const[]' ok kettenis@
2021-12-07	Continue to delete emulation support: we only have one sigcode and	Philip Guenther
	sigobject. Just use the existing globals for the former and use a global for the latter. ok jsg@ kettenis@
2021-12-07	Continue to delete emulation support: since we're Just ELF, the size	Philip Guenther
	of the auxinfo is fixed: provide ELF_AUX_WORDS in <sys/exec_elf.h> as a replacement for emul->e_arglen ok millert@
2021-12-06	Start to delete emulation support: since we're Just ELF, make	Philip Guenther
	copyargs() return 0/1 and merge elf_copyargs() into it. Rename ep_emul_arg and ep_emul_argp to have clearer meaning and type and eliminate ep_emul_argsize as no longer necessary. Make sure ep_auxinfo (nee ep_emul_argp) is initialized as powerpc64 always uses it in setregs(). ok semarie@ deraadt@ kettenis@
2021-06-30	Remove unused variable cryptodesc_pool. Document global variables	Alexander Bluhm
	in crypto.c and annotate locking protection. Assert kernel lock where needed. Remove dead code from crypto_get_driverid(). Move crypto_init() prototype into header file. OK mpi@
2021-06-02	Enable pool cache on knote pool	Visa Hankala
	Use the pool cache to reduce the overhead of memory management in function kqueue_register(). When EV_ADD is given, kqueue_register() pre-allocates a knote to avoid potential sleeping in the middle of the critical section that spans from knote lookup to insertion. However, the pre-allocation is useless if the lookup finds a matching knote. The cost of knote allocation will become significant with kqueue-based poll(2) and select(2) because the frequency of allocation will increase. Most of the cost appears to come from the locking inside the pool. The pool cache amortizes it by using CPU-local caches of free knotes as buffers. OK dlg@ mpi@
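A minimal userland sketch in C of the amortization described above: a shared, locked free list fronted by CPU-local caches. This is not the pool(9) implementation. NCPU, CACHE_MAX and every pool_model_* name are hypothetical, the "CPU" is simply an index passed by the caller, and a pthread mutex stands in for the pool's lock. The get/put pair mimics kqueue_register() pre-allocating a knote and then releasing it when the lookup finds a match.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical sizes; real pool caches size themselves differently. */
#define NCPU      4
#define CACHE_MAX 8

struct item { struct item *next; };

struct pool_model {
	pthread_mutex_t lock;                /* protects free_list and lock_ops */
	struct item *free_list;              /* shared free list */
	unsigned long lock_ops;              /* how often the shared lock was taken */
	struct item *cache[NCPU][CACHE_MAX]; /* per-CPU stacks of free items */
	int ncached[NCPU];                   /* assumes one thread per CPU index */
};

void pool_model_init(struct pool_model *pp, struct item *items, size_t n)
{
	pthread_mutex_init(&pp->lock, NULL);
	pp->free_list = NULL;
	pp->lock_ops = 0;
	for (size_t i = 0; i < n; i++) {
		items[i].next = pp->free_list;
		pp->free_list = &items[i];
	}
	for (int c = 0; c < NCPU; c++)
		pp->ncached[c] = 0;
}

struct item *pool_model_get(struct pool_model *pp, int cpu)
{
	struct item *it;

	/* Fast path: pop from the CPU-local cache without the shared lock. */
	if (pp->ncached[cpu] > 0)
		return pp->cache[cpu][--pp->ncached[cpu]];

	/* Slow path: take the shared lock. */
	pthread_mutex_lock(&pp->lock);
	pp->lock_ops++;
	it = pp->free_list;
	if (it != NULL)
		pp->free_list = it->next;
	pthread_mutex_unlock(&pp->lock);
	return it;
}

void pool_model_put(struct pool_model *pp, int cpu, struct item *it)
{
	/* Fast path: keep the item in the CPU-local cache if there is room. */
	if (pp->ncached[cpu] < CACHE_MAX) {
		pp->cache[cpu][pp->ncached[cpu]++] = it;
		return;
	}
	pthread_mutex_lock(&pp->lock);
	pp->lock_ops++;
	it->next = pp->free_list;
	pp->free_list = it;
	pthread_mutex_unlock(&pp->lock);
}
```

Once the cache is warm, a get/put pair on the same CPU never touches the shared lock, which is where the commit says most of the cost was.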
2021-02-08	Revert the conversion of the per-process thread list into a SMR_TAILQ.	Martin Pieuchot
	We did not reach a consensus about using SMR to unlock single_thread_set() so there's no point in keeping this change.
2021-01-11	New rw_obj_init() API providing reference-counted rwlock.	Martin Pieuchot
	Original port from NetBSD by guenther@, required for upcoming amap & anon locking. ok kettenis@
2021-01-01	copyright++;	Jonathan Gray

2020-12-28	Use per-CPU counters for fault and stats counters reached in uvm_fault().	Martin Pieuchot
	ok kettenis@, dlg@
2020-12-07	Convert the per-process thread list into a SMR_TAILQ.	Martin Pieuchot
	Currently all iterations are done under KERNEL_LOCK() and therefore use the *_LOCKED() variant. From and ok claudio@
2020-09-13	Initialize sigacts0 before making them visible by setting ps->ps_sigacts.	Claudio Jeker
	OK mpi@
2020-06-16	wire stoeplitz code into the tree.	David Gwynne

2020-05-29	dev/rndvar.h no longer has statistical interfaces (removed during various	Theo de Raadt
	conversion steps). it only contains kernel prototypes for 4 interfaces, all of which legitimately belong in sys/systm.h, which is already included by all enqueue_randomness() users.
2020-05-25	Pass bootblock indicator RB_GOODRANDOM to random_start(). Future work	Theo de Raadt
	will frantically compensate. ok kettenis
2020-03-13	Rename "sigacts" flag field to avoid conflict with the "process" one.	Martin Pieuchot
	This shows that atomic_* operations should not be necessary to write to this field, unlike with the process one. The advantage of using a somewhat-unique prefix for struct members is moot when multiple definitions use the same prefix :o) From Amit Kulkarni, ok claudio@
2020-02-25	Start the SMR thread when all CPUs are ready for scheduling. This	Visa Hankala
	prevents the appearance of a "smr: dispatch took N seconds" message during boot when there is an early smr_call(). Such a call can happen with mfii(4). The initial dispatch cannot make progress until smr_grace_wait() can visit all CPUs. This fix is essentially a hack. It makes use of the fact that there is no hard guarantee on how quickly the callback of smr_call() gets invoked. It is assumed that the SMR call backlog does not grow large during boot. An alternative fix is to make smr_grace_wait() skip secondary CPUs until they have been started. However, this could break if the spinup logic of secondary CPUs was changed. Delayed SMR dispatch reported and fix tested by Hrvoje Popovski Discussed with and OK kettenis@, claudio@
2020-01-01	copyright++;	Jonathan Gray

2019-12-30	Convert infinite sleeps to {m,t}sleep_nsec(9).	Martin Pieuchot
	ok bluhm@
2019-11-29	Repurpose the "syscalls must be on a writeable page" mechanism to	Theo de Raadt
	enforce a new policy: system calls must be in pre-registered regions. We have discussed stricter checks than this, but none satisfy the cost/benefit tradeoff based upon our understanding of attack methods; anyway, let's see what the next iteration looks like. This is intended to harden (translation: attackers must put extra effort into attacking) against a mixture of W^X failures and JIT bugs which allow syscall misinterpretation, especially in environments with polymorphic-instruction/variable-sized instructions. It fits in a bit with libc/libcrypto/ld.so random relink on boot and no-restart-at-crash behaviour, particularly for remote problems. It is less effective once the attacker is on-host, since the libraries can then be read. For static executables the kernel registers the main program's PIE-mapped exec section as valid, as well as the randomly-placed sigtramp page. For dynamic executables ELF ld.so's exec segment is also labelled valid; ld.so then has enough information to register libc's exec section as valid via call-once msyscall(2). For dynamic binaries, we continue to permit the main program exec segment because "go" (and potentially a few other applications) have embedded system calls in the main program. Hopefully at least go gets fixed soon. We declare the concept of embedded syscalls a bad idea for numerous reasons, as we notice the ecosystem has many static-syscall-in-base-binary programs which are dynamically linked against libraries which in turn use libc, which contains another set of syscall stubs. We've been concerned about adding even one additional syscall entry point... but go's approach tends to double the entry-point attack surface. This was started at a nano-hackathon in Bob Beck's basement 2 weeks ago during a long discussion with mortimer trying to hide from the SSL scream-conversations, and finished in more comfortable circumstances next to a wood-stove at Elk Lakes cabin with UVM scream-conversations.
ok guenther kettenis mortimer, lots of feedback from others, conversations about go with jsing tb sthen
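The policy can be sketched as a per-process list of address ranges that the syscall entry path checks against the trapping program counter. This is a hypothetical model, not the UVM code: struct sysregions, region_add(), model_msyscall() and syscall_pc_ok() are illustrative names, and the fixed MAX_REGIONS limit is an assumption. model_msyscall() mirrors the call-once property of msyscall(2) described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit on registered regions per process. */
#define MAX_REGIONS 4

struct region { uintptr_t start, end; };   /* half-open [start, end) */

struct sysregions {
	struct region r[MAX_REGIONS];
	int n;
	bool msyscall_done;   /* the msyscall() model succeeds only once */
};

/* Kernel-side registration: main program text, sigtramp page, ld.so text. */
bool region_add(struct sysregions *sr, uintptr_t start, size_t len)
{
	if (sr->n == MAX_REGIONS)
		return false;
	sr->r[sr->n].start = start;
	sr->r[sr->n].end = start + len;
	sr->n++;
	return true;
}

/* Call-once registration, as ld.so would do for libc's exec section. */
int model_msyscall(struct sysregions *sr, uintptr_t start, size_t len)
{
	if (sr->msyscall_done)
		return -1;            /* second and later calls fail */
	sr->msyscall_done = true;
	return region_add(sr, start, len) ? 0 : -1;
}

/* Syscall entry check: is the PC that trapped inside a registered region? */
bool syscall_pc_ok(const struct sysregions *sr, uintptr_t pc)
{
	for (int i = 0; i < sr->n; i++)
		if (pc >= sr->r[i].start && pc < sr->r[i].end)
			return true;
	return false;
}
```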
2019-11-04	Regularly poll and report kubsan findings using the timeout(9) API	anton
	instead of task(9). Undefined behavior can potentially be present in any context and calling task_add() isn't always safe. ok visa@
2019-10-22	struct proc: change ps_start from utc time to uptime	cheloha
	Allows us to determine how long a process has been running, even if the UTC clock jumps. With help from bluhm@ and millert@, who squashed several bugs. ok bluhm@ millert@
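A userland analogue of the idea, assuming POSIX CLOCK_MONOTONIC as a stand-in for kernel uptime: a start stamp taken on a clock that is never stepped yields a meaningful running time, whereas subtracting two UTC readings can give nonsense if the clock jumps in between. timespec_sub_model() and elapsed_since() are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* *res = *a - *b, assuming a >= b and both normalized. */
void timespec_sub_model(const struct timespec *a, const struct timespec *b,
    struct timespec *res)
{
	res->tv_sec = a->tv_sec - b->tv_sec;
	res->tv_nsec = a->tv_nsec - b->tv_nsec;
	if (res->tv_nsec < 0) {
		res->tv_sec--;
		res->tv_nsec += 1000000000L;
	}
}

/* Running time since `start`, which was sampled from CLOCK_MONOTONIC. */
int elapsed_since(const struct timespec *start, struct timespec *elapsed)
{
	struct timespec now;

	if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
		return -1;
	timespec_sub_model(&now, start, elapsed);
	return 0;
}
```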
2019-06-21	Make resource limit access MP-safe. So far, the copy-on-write sharing	Visa Hankala
	of resource limit structs has been done between processes. By applying copy-on-write also between threads, threads can read rlimits in a nearly lock-free manner. Inspired by code in DragonFly BSD and FreeBSD. OK mpi@, agreement from jmatthew@ and anton@
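A minimal sketch of copy-on-write sharing of resource limits, assuming a plain reference count and that each holder is used by a single thread. struct plimit_model, NLIMITS and the helper functions are hypothetical and much simpler than the kernel's code. Readers just follow their pointer; a writer that finds the structure shared installs a private copy first, so a shared copy is never modified in place.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define NLIMITS 4   /* illustrative; not the real number of rlimits */

struct plimit_model {
	long lim[NLIMITS];
	atomic_uint pl_refcnt;
};

/* A holder (process or thread) and its reference to the shared limits. */
struct proc_model {
	struct plimit_model *p_limit;
};

struct plimit_model *plimit_new(void)
{
	struct plimit_model *pl = calloc(1, sizeof(*pl));

	if (pl != NULL)
		atomic_init(&pl->pl_refcnt, 1);
	return pl;
}

void plimit_ref(struct plimit_model *pl)
{
	atomic_fetch_add_explicit(&pl->pl_refcnt, 1, memory_order_relaxed);
}

void plimit_unref(struct plimit_model *pl)
{
	if (atomic_fetch_sub_explicit(&pl->pl_refcnt, 1,
	    memory_order_acq_rel) == 1)
		free(pl);
}

/* Reader: no lock, the structure it points at never changes while shared. */
long read_limit(const struct proc_model *p, int which)
{
	return p->p_limit->lim[which];
}

/* Writer: copy first if anyone else shares the limits, then modify. */
int set_limit(struct proc_model *p, int which, long value)
{
	struct plimit_model *old = p->p_limit;

	if (atomic_load(&old->pl_refcnt) > 1) {
		struct plimit_model *copy = plimit_new();

		if (copy == NULL)
			return -1;
		memcpy(copy->lim, old->lim, sizeof(copy->lim));
		p->p_limit = copy;
		plimit_unref(old);
	}
	p->p_limit->lim[which] = value;
	return 0;
}
```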
2019-06-20	Undefined behavior (UB) can potentially be present anywhere in the	anton
	kernel. kubsan reports findings using printf(), but assuming that calling printf() is safe in all contexts can be problematic. Instead, defer reporting of findings to the systq task queue. Storage for findings is allocated early in the boot process in order to catch potential UB during boot. The same findings are reported once the task queue subsystem has been initialized. Feedback from kettenis@ and ok mpi@
2019-06-02	Move initialization of limit0 into a dedicated function. This new	Visa Hankala
	function is also a proper place for setting up the plimit pool. While here, raise the IPL of the plimit pool to IPL_MPFLOOR, needed in upcoming MP work. OK claudio@
2019-06-01	Revert to using the SCHED_LOCK() to protect time accounting.	Martin Pieuchot
	It currently creates a lock ordering problem because SCHED_LOCK() is taken by hardclock(). That means the "priorities" of a thread should be moved out of the SCHED_LOCK() first in order to make progress. Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com via anton@ as well as by kettenis@
2019-05-31	Use a per-process mutex to protect time accounting instead of SCHED_LOCK().	Martin Pieuchot
	Note that hardclock(9) still increments p_{u,s,i}ticks without holding a lock. ok visa@, cheloha@
2019-05-31	Rename struct plimit field p_refcnt to pl_refcnt to avoid confusion	Visa Hankala
	with the fields of struct proc. Make pl_refcnt unsigned for upcoming atomic updating. OK deraadt@ guenther@
2019-02-26	Introduce safe memory reclamation, a mechanism for reclaiming shared	Visa Hankala
	objects that readers can access without locking. This provides a basis for read-copy-update operations. Readers access SMR-protected shared objects inside an SMR read-side critical section where sleeping is not allowed. To reclaim an SMR-protected object, the writer has to ensure mutual exclusion of other writers, remove the object's shared reference and wait until read-side references cannot exist any longer. As an alternative to waiting, the writer can schedule a callback that gets invoked when reclamation is safe. The mechanism relies on CPU quiescent states to determine when an SMR-protected object is ready for reclamation. The <sys/smr.h> header additionally provides an implementation of singly- and doubly-linked lists that can be used together with SMR. These lists allow lockless read access with a concurrent writer. Discussed with many. OK mpi@ sashan@
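The quiescent-state idea can be illustrated with a deterministic, single-threaded simulation. Everything here (NCPU, struct smr_sim, sim_call(), sim_quiescent(), mark_freed()) is hypothetical and is not the <sys/smr.h> API. A deferred callback runs only after every CPU has reported a quiescent state since the callback was queued. Restarting the grace period on every new call is a simplification that keeps reclamation safe at the cost of delaying it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCPU  3
#define MAXCB 8

typedef void (*smr_cb)(void *);

struct smr_sim {
	smr_cb cb[MAXCB];   /* deferred reclamation callbacks */
	void *arg[MAXCB];
	int ncb;
	bool seen[NCPU];    /* CPU was quiescent since the last sim_call() */
};

/* Writer, after unlinking an object: defer its reclamation. */
void sim_call(struct smr_sim *s, smr_cb fn, void *arg)
{
	assert(s->ncb < MAXCB);
	s->cb[s->ncb] = fn;
	s->arg[s->ncb] = arg;
	s->ncb++;
	for (int c = 0; c < NCPU; c++)
		s->seen[c] = false;     /* (re)start the grace period */
}

/* A CPU reports a quiescent state: it holds no read-side references. */
void sim_quiescent(struct smr_sim *s, int cpu)
{
	int n;

	s->seen[cpu] = true;
	for (int c = 0; c < NCPU; c++)
		if (!s->seen[c])
			return;
	/* Every CPU has been quiescent: no reader can still see the objects. */
	n = s->ncb;
	s->ncb = 0;
	for (int i = 0; i < n; i++)
		s->cb[i](s->arg[i]);
}

/* Example callback: record that the object was reclaimed. */
void mark_freed(void *arg)
{
	*(int *)arg = 1;
}
```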
2019-01-19	Move boottime into the timehands.	cheloha
	To protect the timehands we first need to protect the basis for all UTC time in the kernel: the boottime. Because the boottime can be changed at any time it needs to be versioned along with the other members of the timehands to enable safe lockless reads when using it for anything. So the global boottime timespec goes away and the static boottimebin becomes a member of the timehands. Instead of reading the global boottime you use one of two interfaces: binboottime(9) or microboottime(9). nanoboottime(9) can trivially be added later, though there are no consumers for it at the moment. This introduces one small change in behavior. We used to advance the reported boottime just before launching kernel threads from main(). This makes it look to userland like we "booted" moments before those threads were launched. Because there is no longer a boottime global we can no longer trivially do this from main(), so the boottime we report to userspace via e.g. kern.boottime will now reflect whatever the time was when we bootstrapped the timehands via inittodr(9). This is usually no more than a minute before the kernel threads are launched from main(). The prior behavior can be restored by adding a new interface to the timecounter layer in a future commit. Based on FreeBSD r303387. Discussed with mpi@ and visa@. ok visa@
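The versioning that makes lockless boottime reads safe can be sketched with C11 atomics as a sequence-lock-style generation number. This is a simplified, hypothetical model: it uses a single record rather than a ring of timehands, assumes one writer, and the struct and function names are illustrative. A generation of 0 marks an update in progress; a reader retries until it sees the same nonzero generation before and after copying the boottime.

```c
#include <assert.h>
#include <stdatomic.h>

/* One timehands-like record whose generation versions its boottime. */
struct th_model {
	atomic_uint th_generation;   /* 0 means "update in progress" */
	atomic_long th_boot_sec;
	atomic_long th_boot_nsec;
};

/* Single writer: invalidate, update, publish a new nonzero generation. */
void th_set_boottime(struct th_model *th, long sec, long nsec)
{
	unsigned int gen = atomic_load_explicit(&th->th_generation,
	    memory_order_relaxed);

	atomic_store_explicit(&th->th_generation, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&th->th_boot_sec, sec, memory_order_relaxed);
	atomic_store_explicit(&th->th_boot_nsec, nsec, memory_order_relaxed);
	if (++gen == 0)
		gen = 1;                     /* never publish 0 */
	atomic_store_explicit(&th->th_generation, gen, memory_order_release);
}

/* Lockless reader: retry until a consistent snapshot was observed. */
void th_get_boottime(struct th_model *th, long *sec, long *nsec)
{
	unsigned int gen;

	do {
		gen = atomic_load_explicit(&th->th_generation,
		    memory_order_acquire);
		*sec = atomic_load_explicit(&th->th_boot_sec,
		    memory_order_relaxed);
		*nsec = atomic_load_explicit(&th->th_boot_nsec,
		    memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 || gen != atomic_load_explicit(&th->th_generation,
	    memory_order_relaxed));
}
```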
2019-01-01	copyright++;	Jonathan Gray

2018-09-10	- if_cloners list is populated at boot time only and then becomes immutable,	Alexandr Nedvedicky
	so we can let go of if_cloners_lock. OK tb@, claudio@, bluhm@, kn@, henning@
2018-08-13	Simplify the startup of the cleaner, reaper and update threads by	Visa Hankala
	passing the main function directly to kthread_create(9). The start_* functions are mere stepping stones nowadays and can be pruned. They used to contain more logic in the pre-kthread era. While here, set `cleanerproc' and `syncerproc' during the thread creation rather than expect the threads to set the proc pointer. Also, rename `sched_sync' to `syncer_thread' to reduce confusion with the scheduler-related functions. OK kettenis@, deraadt@, mpi@
2018-07-20	Remove a few leftovers from the days of emulation, which could result in	Theo de Raadt
	a bad/corrupt binary not returning ENOEXEC but some other error. ok guenther kettenis bluhm
2018-07-10	Move from sendsig() to its callers the initsiginfo() calls and	Philip Guenther
	instead of passing sendsig() the code+type+val, pass a siginfo_t* to copy from. Eliminate the indirection through struct emul for sendsig(); we no longer have a SunOS4-compat version of sendsig() ok deraadt@
2018-04-28	Clean up the parameters of VOP_LOCK() and VOP_UNLOCK(). It is always	Visa Hankala
	curproc that does the locking or unlocking, so the proc parameter is pointless and can be dropped. OK mpi@, deraadt@
2018-04-12	Implement MAP_STACK option for mmap(). Synchronous faults (pagefault and	Theo de Raadt
	syscall) confirm the stack register points at MAP_STACK memory, otherwise SIGSEGV is delivered. sigaltstack() and pthread_attr_setstack() are modified to create a MAP_STACK sub-region which satisfies alignment requirements. Observe that MAP_STACK can only be set/cleared by mmap(), which zeroes the contents of the region -- there is no mprotect() equivalent operation, so there is no MAP_STACK-adding gadget. This opportunistic software-emulation of a stack protection bit makes stack-pivot operations during ROPchain fragile (kind of like removing a tool from the toolbox). original discussion with tedu, uvm work by stefan, testing by mortimer ok kettenis
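The check on synchronous traps can be modeled as a lookup of the saved stack pointer in the process's mappings. struct mapping and check_stack_pointer() are hypothetical; this sketch only shows the decision, not where in the kernel it is made.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a mapping; only the MAP_STACK bit matters here. */
struct mapping {
	uintptr_t start, end;   /* half-open [start, end) */
	bool is_stack;          /* set only when mapped with MAP_STACK */
};

/*
 * On a page fault or syscall, check the saved stack pointer.
 * Returns 0 if it points at MAP_STACK memory, else the signal to deliver.
 */
int check_stack_pointer(const struct mapping *maps, size_t n, uintptr_t sp)
{
	for (size_t i = 0; i < n; i++)
		if (sp >= maps[i].start && sp < maps[i].end)
			return maps[i].is_stack ? 0 : SIGSEGV;
	return SIGSEGV;   /* sp is not inside any mapping */
}
```

A ROP chain that pivots the stack pointer into, say, a heap buffer would fail this check at its next syscall or page fault, since heap memory never carries the MAP_STACK bit.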
2018-03-20	Do not panic from ddb(4) when a lock requirement isn't fulfilled.	Martin Pieuchot
	Extend the logic already present for panic() to any DDB-related operation such that if ddb(4) is entered because of a fault or other trap it is still possible to call 'boot reboot'. While here stop printing splassert() messages as well, to not fill the buffer. ok visa@, deraadt@
2018-02-28	Revert the change that postpones CPUs until after mounthook activities.	Patrick Wildt
	This was needed to be able to use loadfirmware() to load the microcode before letting the cores go. Now that the microcode is loaded earlier we can restore the previous behaviour. ok deraadt@
2018-01-11	Postpone secondary CPUs until after mounthook activities. This is	Patrick Wildt
	useful for loading CPU microcode from the disk before the CPUs are let go. Tested by visa@ on sgi, loongson and octeon "don't see immediate issues" kettenis@ ok deraadt@
2018-01-01	copyright++;	Jonathan Gray

2017-08-14	Load CTF debug symbols before mountroot	Uwe Stuehler
	This is obviously useful in order to investigate a failure to mount an NFS or other root device. ok mpi
2017-08-11	Merge DDBCTF into DDB.	Martin Pieuchot

2017-04-28	Add futex(2) syscall based on a sane subset of its Linux equivalent.	Martin Pieuchot
	The syscall is marked NOLOCK and only FUTEX_WAIT grabs the KERNEL_LOCK() because of PCATCH and the signal nightmare. Serialization of threads is currently done with a global & exclusive rwlock. Note that the current implementation still uses copyin(9), which is not guaranteed to be atomic. Committing now such that remaining issues can be addressed in-tree. With inputs from guenther@, kettenis@ and visa@. ok deraadt@, visa@
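The core FUTEX_WAIT/FUTEX_WAKE contract (check the value and go to sleep atomically with respect to wakers, then wake at most n sleepers on the same address) can be sketched in userland with one global mutex, loosely echoing the global lock mentioned above. The model_futex_* names and the return convention (EAGAIN returned directly rather than -1 with errno) are assumptions; this is not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* One global lock serializes all waiters and wakers in this model. */
static pthread_mutex_t ftx_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ftx_cv = PTHREAD_COND_INITIALIZER;

struct waiter {
	atomic_int *addr;       /* futex word this thread sleeps on */
	int woken;
	struct waiter *next;
};

static struct waiter *ftx_queue;   /* all sleeping threads */

/* Sleep only if *addr still equals val; otherwise fail with EAGAIN. */
int model_futex_wait(atomic_int *addr, int val)
{
	struct waiter w = { addr, 0, NULL };

	pthread_mutex_lock(&ftx_lock);
	/* Checked under the lock, so a wake cannot slip in before we queue. */
	if (atomic_load(addr) != val) {
		pthread_mutex_unlock(&ftx_lock);
		return EAGAIN;
	}
	w.next = ftx_queue;
	ftx_queue = &w;
	while (!w.woken)
		pthread_cond_wait(&ftx_cv, &ftx_lock);
	pthread_mutex_unlock(&ftx_lock);
	return 0;
}

/* Wake at most n threads sleeping on addr; returns how many were woken. */
int model_futex_wake(atomic_int *addr, int n)
{
	struct waiter **wp;
	int woken = 0;

	pthread_mutex_lock(&ftx_lock);
	for (wp = &ftx_queue; *wp != NULL && woken < n; ) {
		struct waiter *w = *wp;

		if (w->addr == addr) {
			*wp = w->next;      /* unlink before marking woken */
			w->woken = 1;
			woken++;
		} else
			wp = &w->next;
	}
	if (woken > 0)
		pthread_cond_broadcast(&ftx_cv);
	pthread_mutex_unlock(&ftx_lock);
	return woken;
}
```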
2017-04-20	Add a port of witness(4) lock validation tool from FreeBSD.	Visa Hankala
	Go-ahead from kettenis@, guenther@, deraadt@