|
checking done in taskq_barrier(9) and timeout_barrier(9).
OK mpi@
|
|
We want this so that we can stop allowing readlink() on traversed
vnodes in unveil().
This includes all the kernel side and the system call.
This is not yet used in libc for realpath, so nothing calls this yet.
The libc wrapper will be committed later.
Testing by many, and ports build by naddy@
ok deraadt@
|
|
does not block the signal. If all threads block the signal, it was
delivered to the main thread. This does not conform to POSIX.
If any thread unblocks the signal, it should be delivered immediately
to this thread.
Mark such signals pending at the process instead of a single thread.
Then any thread can handle it later.
OK kettenis@ guenther@
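A minimal sketch of the delivery rule described above (OpenBSD-style
identifiers; member names such as ps_siglist are assumptions, not the
exact kern_sig.c code):

    struct proc *q;

    TAILQ_FOREACH(q, &pr->ps_threads, p_thr_link) {
            if ((q->p_sigmask & mask) == 0) {
                    /* a thread has the signal unblocked: deliver now */
                    atomic_setbits_int(&q->p_siglist, mask);
                    return;
            }
    }
    /* every thread blocks it: mark it pending at the process */
    atomic_setbits_int(&pr->ps_siglist, mask);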
|
|
and incorrectly return EBADF when n>curlim.
ok millert guenther tedu
|
|
|
|
encountered a wxneeded binary that attempts correct operation when started
on a nowxallowed filesystem (it tries mprotect with RWX, notices ENOTSUP
and acts in a different way). So permit execution (but of course don't
allow W^X violating mappings)
ok sthen kettenis robert
|
|
OK visa@, OK mpi@
|
|
locks.
ok jturner@ visa@
Reported-by: syzbot+f9f13034fd656af6c48f@syzkaller.appspotmail.com
|
|
ok kettenis
|
|
Reduces the worst-case error for time values retrieved via the
microtime(9) functions from 10 ticks to 2 ticks. Being interrupted
for over a tick is unlikely but possible.
While here use C99 initializers.
From FreeBSD r303383.
ok mpi@
|
|
instead of panicking
ok deraadt@, tedu@, mpi@
|
|
allocations will recover some memory from the dma_constraint range.
The allocation still fails; the intent is to ensure that the
pagedaemon will free some memory to possibly allow a subsequent
allocation to succeed.
This also adds a UVM_PLA_NOWAKE flag to allow special cases in the
buffer cache to not wake up the pagedaemon until they want to.
ok kettenis@
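A hedged usage sketch of the new flag (call shape per uvm_pglistalloc(9);
the buffer-cache context around it is illustrative):

    struct pglist pgl;
    int error;

    /* grab dma-reachable pages without waking the pagedaemon */
    TAILQ_INIT(&pgl);
    error = uvm_pglistalloc(size, dma_constraint.ucr_low,
        dma_constraint.ucr_high, 0, 0, &pgl, 1,
        UVM_PLA_NOWAIT | UVM_PLA_NOWAKE);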
|
|
clock_settime(2)/settimeofday(2) still need KERNEL_LOCK for a moment
when resetting the RTC, as that's done periodically from a task under
KERNEL_LOCK. Not quite sure how to approach that one yet.
ok visa@ mpi@, "good stuff" tedu@,
"please wait until after [tree] unlock" deraadt@
|
|
Noticed by me and otto@
ok tedu@
|
|
current status and statistics and can be exported without super-user
rights via sysctl to make it easier for tools like systat to access those.
OK deraadt@, sashan@
|
|
reported by kettenis
|
|
let's see what falls out.
ok beck deraadt kettenis mpi
|
|
|
|
default value of kern.splassert to 3, i.e. enter ddb on splassert()
failure. Will be used during fuzzing.
ok mpi@ visa@
|
|
This also modifies the backoff logic to only back off what is requested
and not a "mimimum" amount. Tested by me, benno@, tedu@ anda ports build
by naddy@.
ok tedu@
|
|
detection broke while changing the owner of a lock from struct proc to struct
filedesc/file. Instead of keeping track of the owning proc for each lock,
introduce a new list for all pending blocked locks. This list is scanned before
waiting on a blocking lock in order to determine if sleeping would cause a
deadlock.
The new implementation is serialized by the recently added locking to the same
subsystem, meaning that acquiring the kernel lock is no longer necessary.
ok visa@
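A hedged sketch of the scan described above (list and member names are
illustrative, not the exact vfs_lockf.c code):

    /*
     * Would sleeping on `block' deadlock?  It would if the owner
     * blocking us is itself waiting on a lock that we own.
     */
    TAILQ_FOREACH(pending, &lf_pending, lf_entry) {
            if (pending->lf_id != block->lf_id)
                    continue;               /* a different owner */
            if (pending->lf_blk->lf_id == lock->lf_id)
                    return (EDEADLK);       /* it waits on our lock */
    }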
|
|
Should be revisited once logwakeup() is fixed.
|
|
Use splassert_fail() instead, please set kern.splassert to 2 and report
the corresponding stack trace if you see a warning.
ok dlg@
|
|
All non-dummy implementations of VOP_ADVLOCK() rely on lf_advlock()
which is now safe to use without the kernel lock. Because VOP_ADVLOCK()
does not make the vnode dirty, it is unnecessary to keep track of
in-flight vnode lock operations and the updating of vnode->v_inflight
can be dropped from VOP_ADVLOCK(). This makes VOP_ADVLOCK() safe to use
without the kernel lock.
OK tedu@ mpi@
|
|
it obviously needs to be called with the kernel lock held, so it
makes sense to check that so we can unlock more code without
introducing bugs that shoot us in the face in the indeterminate
future.
csignal is basically a wrapper around ptsignal, so calls to that
without the kernel lock should be caught by this too.
discussed with mpi@ on bugs@
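In sketch form the check is a single assertion on entry (a minimal
illustration, not the full function):

    void
    ptsignal(struct proc *p, int signum, enum signal_type type)
    {
            KERNEL_ASSERT_LOCKED();       /* also catches csignal callers */
            /* ... signal delivery logic ... */
    }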
|
|
operations that tweak the kq_head and kq_count need to be serialised
against the kevent syscall which also fumbles with the list and
count too.
these asserts would have made it extremely obvious where the tun(4)
bug was. for half the time of the bug report about it we weren't
even sure it was tun(4)
discussed with mpi@ jmatthew@
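A minimal sketch of the kind of assert meant here, assuming the lists
are still serialised by the kernel lock as they were at the time:

    void
    kqueue_enqueue(struct kqueue *kq, struct knote *kn)
    {
            KERNEL_ASSERT_LOCKED();
            TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe);
            kq->kq_count++;
    }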
|
|
We ought to conform to the windup_mtx protocol and call tc_windup() even
if we aren't changing the system uptime.
|
|
if a taskq takes a lock, and something holding that lock calls
taskq_barrier, there's a potential deadlock. detect this as a lock
order problem when witness is enabled. task_del conditionally followed
by taskq_barrier is a common pattern, so add a taskq_del_barrier
wrapper for it that unconditionally checks for the deadlock, like
timeout_del_barrier.
ok visa@
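A hedged sketch of the wrapper (the real taskq_del_barrier(9) performs
the witness lock-order check even when task_del() removes the task):

    void
    taskq_del_barrier(struct taskq *tq, struct task *t)
    {
            /* the unconditional deadlock check would go here */
            if (!task_del(tq, t))
                    taskq_barrier(tq);      /* wait out a running task */
    }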
|
|
Reduce code clutter by removing the file name and line number output
from witness(4). Typically it is easy enough to locate offending locks
using the stack traces that are shown in lock order conflict reports.
Tricky cases can be tracked using sysctl kern.witness.locktrace=1.
This patch additionally removes the witness(4) wrapper for mutexes.
Now each mutex implementation has to invoke the WITNESS_*() macros
in order to utilize the checker.
Discussed with and OK dlg@, OK mpi@
|
|
Now that alpha is fixed, we can use sizeof().
|
|
|
|
only need a lockf_state pointer by now.
ok mpi@ visa@
|
|
and lf_purgelocks() without the kernel lock.
OK anton@ mpi@
|
|
condition in sbcompress(). Currently the actual cluster size might
be 9KB even if the mtu is 1500; in this case a lot of memory is
wasted, since sbcompress() doesn't compress because of the previous
condition.
ok dlg claudio
|
|
The caller of timeout_barrier() must not hold locks that could prevent
timeout handlers from making progress. The system could deadlock
otherwise.
This patch makes witness(4) able to detect barrier locking errors.
This is done by introducing a pseudo-lock that couples the lock chains
of barrier callers to the lock chains of timeout handlers.
In order to find these errors faster, this diff adds a synchronous
version of cancelling timeouts, timeout_del_barrier(9). As the
synchronous intent is explicit, this interface can check lock order
immediately instead of waiting for the potentially rare occurrence of
timeout_barrier(9).
OK dlg@ mpi@
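A hedged usage sketch (sc and sc_tmo are hypothetical): synchronously
cancel a timeout before freeing the structure its handler dereferences:

    timeout_del_barrier(&sc->sc_tmo);       /* handler has finished */
    free(sc, M_DEVBUF, sizeof(*sc));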
|
|
obvious misconfigurations that cannot work.
OK mpi@ tedu@
|
|
the kernel does not need a __stack_smash_handler function.
WARNING: You need a fairly new clang, approximately > March 31.
with mortimer
|
|
if we ever want it back, it's in the attic.
ok mpi@ visa@ kettenis@
|
|
to vfs_lockf.c. This makes the public interface clearer.
The declaration of variable lockf_debug is removed from the header
because it is not needed outside of vfs_lockf.c.
OK anton@ tedu@
|
|
|
|
No other (known) BSD-derived adjtime(2) implementation checks for overflow
when converting delta into its final denomination of fractional seconds.
This is peculiar, as the call originates in 4.3BSD.
However, glibc, uclibc, and (to an extent) musl /do/ check the input and set
EINVAL if it exceeds a certain bound, so we'll just use the errno that they
use to be consistent with extant practice.
Prompted by the comment kettenis@ left when we switched to storing the
adjustment in an int64_t like ~5 years ago (kern_time.c,v 1.87).
Positive feedback from deraadt@, manpage bits ok jmc@,
no code complaints from otto@ or tedu@.
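A self-contained sketch of the bound check described above (the exact
kernel expression may differ):

    #include <sys/time.h>
    #include <errno.h>
    #include <stdint.h>

    int
    adjtime_check(const struct timeval *delta, int64_t *usec)
    {
            /* reject deltas whose microsecond total overflows int64_t */
            if (delta->tv_sec > INT64_MAX / 1000000LL - 1 ||
                delta->tv_sec < INT64_MIN / 1000000LL + 1)
                    return (EINVAL);
            *usec = (int64_t)delta->tv_sec * 1000000LL + delta->tv_usec;
            return (0);
    }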
|
|
added aggressively today. Hopefully post-release a glorious
flensing will remove UNVEIL_INSPECT anyway.
Reported-by: syzbot+3375ce307ac7909b907b@syzkaller.appspotmail.com
|
|
Otherwise, the system can crash in smr_call_impl() if SMT is
enabled later.
Crash reported by jcs@
|
|
tc_lock allows adjfreq(2) and the kern.timecounter.hardware sysctl(2)
to read/write the active timecounter pointer and the .tc_adj_freq
member of the active timecounter safely. This eliminates any possibility
of a torn read/write for the .tc_adj_freq member when we drop the
KERNEL_LOCK from the timecounting layer. It also ensures the active
timecounter does not change in the midst of an adjfreq(2) call.
Because these are not high-traffic paths, we can get away with using
tc_lock in write-mode to ensure combination read/write adjtime(2) calls
are relatively atomic (a) to other writer adjtime(2) calls, and (b) to
settimeofday(2)/clock_settime(2) calls, which cancel ongoing adjtime(2)
adjustment.
When the KERNEL_LOCK is dropped, an unprivileged user will be able to
create some tc_lock contention via adjfreq(2); it is very unlikely to
ever be a problem. If it ever is actually a problem a lockless read
could be added to address it.
While here, reorganize sys_adjfreq()/sys_adjtime() to minimize code
under the lock. Also while here, make tc_adjfreq() void, as it cannot
fail under any circumstance. Also also while here, annotate various
globals/struct members with lock ordering details.
With lots of input from mpi@ and visa@.
ok visa@
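A hedged sketch of the write-mode pattern described above (the
tc_adjfreq() call shape is an assumption):

    rw_enter_write(&tc_lock);
    tc_adjfreq(&oldfreq, &newfreq); /* tc pointer can't change underneath */
    rw_exit_write(&tc_lock);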
|
|
UNVEIL_INSPECT is a hack we added to get chrome/glib working. It silently
adds permission for stat(2), access(2), and readlink(2) to be used on
all path components of any unveil'ed path. robert@ has now successfully
fixed chrome/glib to not require excessive TOC vs TOU stat(2) and access(2)
calls on the paths it uses, so this is no longer needed there.
readlink(2) is the sole call that is now permitted by UNVEIL_INSPECT,
and this is only needed so that realpath(3) can work. Going forward we will
likely make a realpath(2), after which we can completely deprecate
UNVEIL_INSPECT.
ok deraadt@
|
|
on spinning even if `db_active' or `panicstr' has been set. The new
mutex also disables IPIs in the critical section.
OK mpi@ patrick@
|
|
adjtimedelta is 64-bit and thus can't be read/written atomically on all
architectures. Because it can be modified from tc_windup() and
ntp_update_second() we need a way to ensure safe reads/writes for
adjtime(2) callers. One solution is to move it into the timehands and
adopt the lockless read protocol we now use for the system boot time and
uptime.
So make new_adjtimedelta an argument to tc_windup() and add a lockless
read loop to tc_adjtime(). With adjtimedelta stored in the timehands
we can now simply pass a timehands pointer to ntp_update_second(). This
makes ntp_update_second() safer as we're using the timehands' timecounter
pointer instead of the mutable global timecounter pointer.
Lots of input from mpi@ and visa@.
ok visa@
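A sketch of the lockless read loop described above (th_adjtimedelta is
an assumed member name; the generation handshake is the point):

    do {
            th = timehands;
            gen = th->th_generation;
            membar_consumer();
            delta = th->th_adjtimedelta;
            membar_consumer();
    } while (gen == 0 || gen != th->th_generation);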
|
|
This will make upcoming MP-related diffs smaller and should make the code
in kern_tc.c easier to read in general. "windup_mtx" is also a better
mnemonic: always call tc_windup() before leaving windup_mtx.
|
|
|
|
capable of detecting undefined behavior at runtime and all findings are
printed to the system console, including the offending line in the
source code.
kubsan is limited to architectures using Clang as their default compiler
and is not enabled by default.
Derived from the NetBSD implementation.
ok kettenis@ visa@
|