path: root/sys/uvm
Age  Author  Commit message

2022-06-07  Mark Kettenis
Remove uvm_km_valloc_prefer_wait(9) and uvm_km_free_wakeup(9) now that
nothing uses these functions anymore. ok mpi@

2022-06-07  Mark Kettenis
Remove redundant check for free pages. The pmemrange code that is called
by uvm_pglistalloc(9) does a similar check already. ok mpi@

2022-06-02  Mark Kettenis
Take the size of allocation into account when checking the kernel reserve.
ok mpi@

2022-05-14  Mark Kettenis
uvm_km_valloc(9), uvm_km_valloc_try(9), uvm_km_valloc_wait(9) and
uvm_km_valloc_align(9) are no longer used. Remove these functions. ok mpi@

2022-05-12  Martin Pieuchot
Consider BUFPAGES_DEFICIT in swap_shortage.
ok beck@

2022-05-12  Martin Pieuchot
Introduce uvm_pagedequeue() to reduce code duplication.
ok kettenis@
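
As a rough illustration of the deduplicated helper, here is a minimal sketch assuming OpenBSD's page-queue fields and flags (pageq, PQ_ACTIVE, PQ_INACTIVE); the committed function may differ:

    /* Hedged sketch: take a page off whichever paging queue holds it. */
    void
    uvm_pagedequeue(struct vm_page *pg)
    {
        if (pg->pg_flags & PQ_ACTIVE) {
            TAILQ_REMOVE(&uvm.page_active, pg, pageq);
            atomic_clearbits_int(&pg->pg_flags, PQ_ACTIVE);
            uvmexp.active--;
        }
        if (pg->pg_flags & PQ_INACTIVE) {
            TAILQ_REMOVE(&uvm.page_inactive, pg, pageq);
            atomic_clearbits_int(&pg->pg_flags, PQ_INACTIVE);
            uvmexp.inactive--;
        }
    }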

2022-05-04  Martin Pieuchot
Merge swap-backed and object-backed inactive page lists.
ok millert@, kettenis@

2022-05-03  Alexander Bluhm
Rate limit uvn_flush error during pageout messages. They occur
when a memory mapped file cannot be written to disk, e.g. if the file system is full. Too much printf() during kernel relinking slows down the system boot. OK deraadt@
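
A plausible shape for such rate limiting, sketched with ratecheck(9); the interval and message text are illustrative assumptions, not the committed code:

    /* Inside uvn_flush()'s pageout-error path (illustrative). */
    static struct timeval lasterr;
    static const struct timeval errintvl = { 10, 0 };  /* >= 10s apart */

    if (ratecheck(&lasterr, &errintvl))
        printf("uvn_flush: pageout i/o error on %p\n", uvn);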

2022-04-30  Martin Pieuchot
Recheck PG_BUSY after locking the page.
Another thread can set the bit if we sleep during rw_enter(9) in which case the page shouldn't be touched. ok semarie@
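
The fix boils down to re-testing the flag once the lock is held, roughly as in this sketch (the real call site has more context):

    rw_enter(pg->uobject->vmobjlock, RW_WRITE);  /* may sleep */
    if (pg->pg_flags & PG_BUSY) {
        /* another thread claimed the page while we slept */
        rw_exit(pg->uobject->vmobjlock);
        return;
    }
    /* safe to touch the page now */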

2022-04-28  Martin Pieuchot
Always acquire the `vmobjlock' before incrementing an object's reference.

2022-04-28  Martin Pieuchot
Call uvm_pageactivate() from uvm_pageunwire() instead of rerolling it.
Having fewer places manipulating the global list of active/inactive pages will help future LRU improvements. ok kettenis@, kn@

2022-04-28  Martin Pieuchot
Update uvmexp.swpgonly only once in uvm_swap_get().
Prevent a small window where a check could be incorrect in case an error occurs in uvm_swap_io(). ok kettenis@, kn@

2022-04-19  Sebastien Marie
add missing unlock before returning in uvn_detach()
uvn_detach() sets the UVM_VNODE_RELKILL flag and waits for all async I/O to finish, but uvm_vnp_terminate() could clear the flag and take over the vnode. mpi@ noted that this code path is mostly dead code because there is no "async I/O" (uvn_io() is always synchronous). ok visa@ mpi@

2022-04-11  Martin Pieuchot
Remove trailing spaces.

2022-04-04  Mark Kettenis
Replace KASSERT in uvm_fault_unwire_locked() with code that handles the
case where not all pages are wired. The KASSERT can be triggered in multi-threaded applications when a thread calling munmap(2) races another thread that invokes sysctl(2). Properly written code shouldn't do this, but making the kernel crash in this case is a bit harsh. ok gnezdo@, deraadt@
Fixes: Reported-by: syzbot+e8310909e2910c9cca08@syzkaller.appspotmail.com

2022-03-17  Martin Pieuchot
In swap_io() allocate the buffer before doing encryption.
If the allocation fails due to memory pressure, no time is wasted doing encryption. This also simplifies the error path. Tested by sthen@. ok kn@, miod@, kettenis@, tb@

2022-03-12  Martin Pieuchot
Uncompress some one line comments to reduce the difference with NetBSD.
No functional change.

2022-03-12  Martin Pieuchot
Revert holding a read lock on the map while copying out data during sysctl(2).
This introduced a lock ordering issue reported by naddy@, anton@ and syzkaller. Reported-by: syzbot+739bb901045d9b193bde@syzkaller.appspotmail.com

2022-03-11  Mark Kettenis
Hold a read lock on the map while copying out data during a sysctl(2) call
to prevent another thread from unmapping the memory and triggering an assertion or even corrupting random physical memory pages. This fix is similar to the change in uvm_glue.c rev. 1.74. However in this case we need to be careful since some sysctl(2) calls look at the map of the current process. In those cases we must not attempt to lock the map again. ok mpi@
Should fix: Reported-by: syzbot+be89fe83d6c004fcb412@syzkaller.appspotmail.com
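
A hedged sketch of the described pattern, with illustrative variable names; note the check that skips relocking the current process's own map:

    struct vm_map *map = &pr->ps_vmspace->vm_map;       /* target process */
    int dolock = (map != &curproc->p_vmspace->vm_map);  /* not our own map? */

    if (dolock)
        vm_map_lock_read(map);
    /* ... walk the entries and copyout() the data ... */
    if (dolock)
        vm_map_unlock_read(map);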

2022-03-10  Martin Pieuchot
Do not clear the PG_BUSY flag before passing the anon to uvm_anon_release().
Should prevent a KASSERT() from triggering when freeing an anon after swapping out its memory. This code path has been broken since at least January 2021 and is apparently not so easy to trigger. Found the hard way by sthen@ ok kettenis@, kn@

2022-02-22  Philip Guenther
Delete unnecessary #includes of <sys/domain.h> and/or <sys/protosw.h>
net/if_pppx.c pointed out by jsg@ ok gnezdo@ deraadt@ jsg@ mpi@ millert@

2022-02-21  Klemens Nanni
Grab vmobjlocks with RW_DUPOK in uvm_obj_wire() to silence WITNESS
The drm subsystem implements graphics buffers as uvm objects backed by anonymous memory, thus drm locks and aobj locks share the same "vmobjlock" type. uvm_obj_wire() is only called from sys/dev/pci/drm/, so instead of changing drm's lock init/alloc routines to allow duplicate locks in general, enter uvm's vmobjlock with RW_DUPOK in this function to allow duplicate lock types per thread in this specific call path alone.
Fixes the following WITNESS report when booting/starting X (as seen already in other unrelated bugs@ reports):
wsdisplay0: screen 1-5 added (std, vt100 emulation)
witness: acquiring duplicate lock of same type: "&uobj->vmobjlock"
 1st uobjlk
 2nd uobjlk
Starting stack trace...
witness_checkorder(fffffd83b625f9b0,9,0) at witness_checkorder+0x8ac
rw_enter(fffffd83b625f9a0,1) at rw_enter+0x68
uvm_obj_wire(fffffd843c39e948,0,40000,ffff800033b70428) at uvm_obj_wire+0x46
shmem_get_pages(ffff800008008500) at shmem_get_pages+0xb8
__i915_gem_object_get_pages(ffff800008008500) at __i915_gem_object_get_pages+0x6d
i915_gem_fault(ffff800008008500,ffff800033b707c0,10009b000,a43d6b1c000,ffff800033b70740,1,35ba896911df1241,ffff8000000aa078,ffff8000000aa178) at i915_gem_fault+0x203
drm_fault(ffff800033b707c0,a43d6b1c000,ffff800033b70740,1,0,0,7eca45006f70ee0,ffff800033b707c0) at drm_fault+0x156
uvm_fault(fffffd843a7cf480,a43d6b1c000,0,2) at uvm_fault+0x179
upageflttrap(ffff800033b70920,a43d6b1c000) at upageflttrap+0x62
usertrap(ffff800033b70920) at usertrap+0x129
recall_trap() at recall_trap+0x8
end of kernel
end trace frame: 0x7f7ffffdc7c0, count: 246
End of stack trace.
Input kettenis OK mpi
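
The mechanics amount to a one-flag change when taking the lock, along these lines (a sketch of the described call in uvm_obj_wire()):

    /* RW_DUPOK: this call path may legitimately hold another lock of
     * the same WITNESS type (a drm object lock), so don't warn. */
    rw_enter(uobj->vmobjlock, RW_WRITE | RW_DUPOK);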

2022-02-21  Jonathan Gray
interting -> inserting

2022-02-18  Mark Kettenis
Convert KVA allocation to km_alloc(9).
ok mpi@
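
For context, the km_alloc(9) idiom such conversions target, sketched with the constant modes from the manual (the specific modes used by this commit may differ):

    void *va;

    va = km_alloc(PAGE_SIZE, &kv_any, &kp_dirty, &kd_nowait);
    if (va == NULL)
        return ENOMEM;
    /* ... use the page ... */
    km_free(va, PAGE_SIZE, &kv_any, &kp_dirty);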

2022-02-15  Klemens Nanni
Backout previous "Unwire with map lock held" (commitid: SsVz7dLGFgR21kFe)
The (known) lock order reversals which now occur more reliably and much earlier on WITNESS boots with this diff knock out syzkaller reports since syzkaller stops at the first "crash report": https://syzkaller.appspot.com/bug?id=81b39e970cd2eb21b97d1b31746c693e300fd2dd

2022-02-14  Klemens Nanni
Unwire with map lock held
This is an updated version of uvm_map.c r1.283 "Unwire with map lock held". The previous version introduced a use-after-free by not unlocking vm_map locks in uvm_map_teardown(), resulting in dangling references on the reaper's lock list (thanks visa!). Lock and unlock the map around uvm_map_teardown() instead. This code path holds the last reference, hence the lock isn't strictly needed except for satisfying upcoming locking assertions. Tested on amd64, arm64, i386, macppc, octeon, sparc64. This time also with WITNESS enabled (except on sparc64 which builds but does not boot with WITNESS; this is a known issue). OK mpi visa
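
The resulting shape in uvm_map_teardown(), sketched; the lock is for the assertions, not mutual exclusion, since this path holds the last reference:

    vm_map_lock(map);
    /* ... uvm_unmap_remove() runs with the expected lock held ... */
    vm_map_unlock(map);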
2022-02-11Backout previous "Unwire with map lock held" (commitid: eQBvWUwShD91dN9Z)Klemens Nanni
WITNESS builds broke^W^Wkernels panic on boot as reported by anton and bluhm. Booting bsd.mp in single-user mode inside VMM shows:
root on sd0a (5f9e458ed30b39ab.a) swap on sd0b dump on sd0b
Enter pathname of shell or RETURN for sh:
witness: lock order reversal:
 1st 0xfffffd801f8ce468 vmmaplk (&map->lock)
 2nd 0xfffffd801b8162c0 inode (&ip->i_lock)
lock order "&ip->i_lock"(rrwlock) -> "&map->lock"(rwlock) first seen at:
#0  rw_enter_read+0x38
#1  uvmfault_lookup+0x8a
#2  uvm_fault_check+0x32
#3  uvm_fault+0xfb
#4  kpageflttrap+0x12c
#5  kerntrap+0x91
#6  alltraps_kern_meltdown+0x7b
#7  copyout+0x53
#8  ffs_read+0x1f6
#9  VOP_READ+0x41
#10 vn_rdwr+0xa1
#11 vmcmd_map_readvn+0xa0
#12 exec_process_vmcmds+0x88
#13 sys_execve+0x732
#14 start_init+0x26f
#15 proc_trampoline+0x1c
lock order data w1 -> w2 missing
# exit
kernel: protection fault trap, code=0
Stopped at witness_checkorder+0x312: movl 0x10(%r14),%ecx
gkoehler reported faults on poisoned addresses on macppc dual G5.
2022-02-11Backout previous "Assert vm map locks" (commitid: sRNBfzX2dJrxFDmb)Klemens Nanni
WITNESS builds broke as reported by anton and bluhm:
root on sd0a (5ec49b3ad23eb2d4.a) swap on sd0b dump on sd0b
kernel: protection fault trap, code=0
Stopped at witness_checkorder+0x4ec: movl 0x10(%r12),%ecx
https://syzkaller.appspot.com/bug?id=be02b290a93c648986c35370a271aad4135a5044
https://syzkaller.appspot.com/text?tag=CrashLog&x=136e9aa4700000

2022-02-10  Klemens Nanni
Assert vm map locks
Introduce vm_map_assert_{wrlock,rdlock,anylock,unlocked}() in rwlock(9) fashion and back up function comments about locking assumptions with proper assertions. Also add new comments/assertions based on code analysis and sync with NetBSD as much as possible.

vm_map_lock() and vm_map_lock_read() are used for exclusive and shared access respectively; currently no code path is purely protected by vm_map_lock_read() alone, i.e. functions called with a read lock held by the callee are also called with a write lock elsewhere. Thus only vm_map_assert_{wrlock,anylock}() are used as of now.

This should help with unlocking UVM related syscalls.

Tested as part of a larger diff through
- amd64 package bulk build by naddy
- amd64, arm64, powerpc64 base builds and regress by bluhm
- amd64 and sparc64 base builds and regress by me

Input mpi
Feedback OK kettenis
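
One of these assertions plausibly looks like the following rwlock(9)-style sketch; the committed version may treat interrupt-safe maps differently:

    void
    vm_map_assert_wrlock(struct vm_map *map)
    {
        /* Interrupt-safe maps are not rwlock-protected; skip those. */
        if ((map->flags & VM_MAP_INTRSAFE) == 0)
            rw_assert_wrlock(&map->lock);
    }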

2022-02-10  Klemens Nanni
Unwire with map lock held
uvm_unmap_remove() effectively requires its caller to lock the vm map. Even though uvm_map_teardown() is only called after a map's last reference is dropped and is thus safe from other threads accessing the map, grab the map's lock in uvm_map_teardown() to satisfy upcoming lock assertions in uvm_unmap_remove().

Tested as part of a larger diff through
- amd64 package bulk builds by naddy
- amd64, arm64, powerpc64 base builds and regress by bluhm
- amd64 and sparc64 base builds and regress by me

Feedback mpi
OK kettenis

2022-02-03  Philip Guenther
The sparc64 pmap at least requires the fault access_type to be a
subset of the requested permissions, so when forcing an initial RO fault for CoW also clamp the access_type. problem reported by bluhm@ based on a suggestion from miod@ ok kettenis@
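
In uvm_fault terms the clamp is roughly a one-liner, sketched here with assumed variable names:

    /* forcing a read-only CoW fault: keep access_type a subset of
     * the (now read-only) mapping protection */
    enter_prot &= ~PROT_WRITE;
    access_type &= enter_prot;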

2022-02-03  Klemens Nanni
Use UVM_KMF_TRYLOCK for consistency
No object change. OK millert

2022-02-01  Philip Guenther
Attempt to guarantee that on copy-on-write faulting, the new copy
can't be written to while any thread can see the original version of the page via a not-yet-flushed stale TLB entry: pmaps can indicate they do this correctly by defining __HAVE_PMAP_MPSAFE_ENTER_COW; uvm will force the initial CoW fault to be read-only otherwise.

Set that on amd64 and fix the problem case in pmap_enter() by putting a read-only mapping in place, shooting the TLB entry, then fixing it to the final read-write entry so this thread can continue without re-faulting.

reported by jsing@ from https://github.com/golang/go/issues/34988
assisted by discussion in https://reviews.freebsd.org/D14347
tweaks from jsing@ and kettenis@
ok jsing@ mpi@ kettenis@
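
On the uvm side the opt-out plausibly reduces to a guarded clamp like this sketch (the symbol is from the message; the surrounding code is assumed):

    #ifndef __HAVE_PMAP_MPSAFE_ENTER_COW
        /* pmap can't promote RO->RW safely vs. stale TLB entries:
         * make the first fault on the fresh copy read-only */
        access_type &= ~PROT_WRITE;
    #endif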

2022-01-29  Kenji Aoyama
Fix macro name in comment.
ok visa@

2022-01-19  Klemens Nanni
Grab the kernel lock in uvm_wxcheck() when aborting the process
kern.wxabort=1 logs and kills programs after W^X violations. At least sigexit() -> coredump() as well as the non-atomic increment of ps_wxcounter require protection, so grab the big lock for the entire block. This is part of the effort to unlock mmap(2)'s MAP_ANON case. Feedback mvs claudio kettenis deraadt OK kettenis
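
The protected block, sketched from the message (the real function has more context, and sigexit() does not return):

    KERNEL_LOCK();
    pr->ps_wxcounter++;   /* non-atomic increment needs the lock */
    log(LOG_NOTICE, "%s(%d): W^X violation\n", pr->ps_comm, pr->ps_pid);
    sigexit(p, SIGABRT);  /* -> coredump(), also under the big lock */
    KERNEL_UNLOCK();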

2022-01-19  Martin Pieuchot
Comment out an incorrect lock assertion.
The swap code path in uvm_aio_aiodone() is not holding the corresponding page lock and shouldn't as long as anons are locked inside uvm_page_unbusy() to handle the PG_RELEASED case. Reported by Ralf Horstmann on bugs@

2022-01-17  Martin Pieuchot
Call uvm_pglistfree(9) instead of uvm_pmr_freepageq().
There is no functional change as the former is just a wrapper around the latter. However the upper layers of UVM do not need to mess with the internals of the page allocator. This will also help when a page cache is introduced to reduce contention on the global mutex serializing access to pmemrange's data. ok kettenis@, kn@, tb@

2022-01-05  Philip Guenther
Remove kbind(2)'s restriction that a target buffer not cross page
boundaries: hppa has 8-byte PLT entries that sometimes do that. ok kettenis@

2021-12-29  Martin Pieuchot
Consistently name page argument `pg'.
Reduce differences with NetBSD, no functional changes.

2021-12-28  Martin Pieuchot
Unlock bottom part of the fault handler.
Tested by many during the past months, thanks! ok sthen@

2021-12-21  Philip Guenther
Roll the syscalls that have an off_t argument to remove the explicit padding.
Switch libc and ld.so to the generic stubs for these calls.

WARNING: reboot to updated kernel before installing libc or ld.so!

Time for a story...

When gcc (back in 1.x days) first implemented long long, it didn't (always) pass 64bit arguments in 'aligned' registers/stack slots, with the result that argument offsets didn't match structure offsets. This affected the nine system calls that pass off_t arguments:
ftruncate lseek mmap mquery pread preadv pwrite pwritev truncate

To avoid having to do custom ASM wrappers for those, BSD put an explicit pad argument in so that the off_t argument would always start on an even slot and thus be naturally aligned. Thus those odd wrappers in lib/libc/sys/ that use __syscall() and pass an extra '0' argument.

The ABIs for different CPUs eventually settled how things should be passed on each and gcc 2.x followed them. The only arch now where it helps is landisk, which needs to skip the last argument register if it would be the first half of a 64bit argument.

So: add new syscalls without the pad argument and on landisk do that skipping directly in the syscall handler in the kernel. Keep compat support for the existing syscalls long enough for the transition.

ok deraadt@
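
For illustration, the kind of padded stub being retired looked roughly like this (per the description of the lib/libc/sys/ wrappers above; details are a sketch):

    off_t
    lseek(int fd, off_t offset, int whence)
    {
        /* the literal 0 pads the argument list so the 64-bit off_t
         * starts on an even register/stack slot */
        return __syscall((quad_t)SYS_lseek, fd, 0, offset, whence);
    }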

2021-12-21  Martin Pieuchot
Fix a typo in mlock(2) error path triggering a double-free.
Pass the correct entry to uvm_fault_unwire_locked(). Reported-by: syzbot+bb2f63f076618e9ed0d3@syzkaller.appspotmail.com ok kettenis@, deraadt@

2021-12-17  Martin Pieuchot
Do not try to unlock a NULL object.
Fix a NULL dereference introduced in the previous commit, reported by anton@ and Benjamin Baier. Reported-by: syzbot+c172bd335801b67e515b@syzkaller.appspotmail.com

2021-12-15  Martin Pieuchot
Use a per-UVM object lock to serialize the lower part of the fault handler.
Like the per-amap lock, the `vmobjlock' is principally used to serialize access to objects in the fault handler to allow faults occurring on different CPUs and different objects to be processed in parallel.

The fault handler now acquires the `vmobjlock' of a given UVM object as soon as it finds one. For now a write-lock is always acquired even if some operations could use a read-lock.

Every pager, corresponding to a different kind of UVM object, now expects the UVM object to be locked and some operations, like *_get(), return it unlocked. This is enforced by assertions checking for rw_write_held().

The KERNEL_LOCK() is now pushed to the VFS boundary in the vnode pager.

To ensure the correct amap or object lock is held when modifying a page many uvm_page* operations are now asserting for the "owner" lock. However, fields of the "struct vm_page" are still being protected by the global `pageqlock'. To prevent lock ordering issues with the new `vmobjlock' and to reduce differences with NetBSD this lock is now taken and released for each page instead of around the whole loop.

This commit does not remove the KERNEL_LOCK/UNLOCK() dance. Unlocking will follow if there is no fallout.

Ported from NetBSD, tested by many, thanks!

ok kettenis@, kn@
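
The contract described above, as a hedged sketch of a fault-handler call into a pager (argument values illustrative):

    rw_enter(uobj->vmobjlock, RW_WRITE);
    KASSERT(rw_write_held(uobj->vmobjlock));
    /* pgo_get() consumes the lock: the object comes back unlocked */
    error = uobj->pgops->pgo_get(uobj, offset, pps, &npages, 0,
        access_type, advice, PGO_SYNCIO);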

2021-12-12  Visa Hankala
Add vnode parameter to VOP_STRATEGY()
Pass the device vnode as a parameter to VOP_STRATEGY() to allow calling the correct vop_strategy callback. Now the vnode is also available in the callback. OK mpi@
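
The new interface, sketched (previously only the buf was passed and the device vnode had to be derived from it; argument names assumed):

    int VOP_STRATEGY(struct vnode *vp, struct buf *bp);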
2021-12-10Revert "kbind(2): disable system call if not initialized beforePhilip Guenther
first __tfork(2)" The immediate issue is that a process linked with -znow will still perform lazy relocation on objects loaded with dlopen(), but there are possibly other dark corners to plumb to find a better invariant. Problem reported by thfr@

2021-12-07  Theo de Raadt
uvm_map_inentry() is provided a format string that says "inside", but then
prints the end which is in the next page. Subtract 1 to avoid confusion.

2021-12-07  Scott Soule Cheloha
uvn_reference(): correct printf(9) argument order
Thread: https://marc.info/?l=openbsd-tech&m=163884527530326&w=2 ok deraadt@

2021-12-05  Scott Soule Cheloha
kbind(2): disable system call if not initialized before first __tfork(2)
To unlock kbind(2) we need to protect ps_kbind_addr and ps_kbind_cookie. The simplest way to do this is to disallow kbind(2) initialization after the first __tfork(2) call. If the first thread does not initialize the kbind(2) variables before __tfork(2) then we disable kbind(2) during that first __tfork(2) call. This is guenther@'s patch, I'm just committing it. Discussed with guenther@, deraadt@, kettenis@, and mpi@. ok kettenis@, positive response from mpi@, "I am busy" guenther@
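
A hedged sketch of the rule described above; the field names come from the message, while the "disabled" marker and placement are hypothetical illustrations:

    /* in __tfork(2), before the process gains a second thread: */
    if (pr->ps_kbind_addr == 0)
        pr->ps_kbind_addr = (u_long)-1;  /* hypothetical marker:
                                          * kbind(2) permanently off */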

2021-11-11  Theo Buehler
Convert a for loop into LIST_FOREACH to reduce the diff to NetBSD.
ok millert mpi
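
Generic shape of such a conversion with queue(3); the list head and field names here are illustrative, not the ones in the commit:

    /* before: open-coded traversal */
    for (pg = LIST_FIRST(&head); pg != NULL; pg = LIST_NEXT(pg, list))
        process(pg);

    /* after: the equivalent queue(3) idiom */
    LIST_FOREACH(pg, &head, list)
        process(pg);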