Age | Commit message | Author |
|
only freshly executed processes were actually locked. (This happened
because I didn't realize that the uvm_map's contents are copied entry
by entry, while other parts are not.)
ok kettenis
|
|
against classic BROP with a range-checking wrapper in front of copyin() and
copyinstr() which ensures the userland source doesn't overlap the main program
text, ld.so text, signal tramp text (its mapping is hard to distinguish
so it comes along for the ride), or libc.so text. ld.so tells the kernel the
libc.so text range with msyscall(2). The range checking for 2-4 elements is
done without locking (because all 4 ranges are immutable!) and is inexpensive.
write(sock, &open, 400) now fails with EFAULT. No programs have been
discovered which require reading their own text segments with a system call.
On a machine without mmu enforcement, a test program reports the following:
                userland     kernel
ld.so           readable     unreadable
mmap xz         unreadable   unreadable
mmap x          readable     readable
mmap nrx        readable     readable
mmap nwx        readable     readable
mmap xnwx       readable     readable
main            readable     unreadable
libc unmapped?  readable     unreadable
libc mapped     readable     unreadable
ok kettenis, additional help from miod
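A minimal sketch of the kind of range check described above; the function
and structure names are illustrative, not the actual kernel code:

#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* One protected text range (main program, ld.so, sigtramp, libc.so). */
struct text_range {
	uintptr_t tr_start;	/* inclusive */
	uintptr_t tr_end;	/* exclusive */
};

/*
 * Return EFAULT if the userland source [uaddr, uaddr+len) overlaps any
 * protected range.  The ranges never change after execve(), so reading
 * them requires no locking.
 */
int
check_copyin_range(const struct text_range *ranges, int nranges,
    const void *uaddr, size_t len)
{
	uintptr_t start = (uintptr_t)uaddr;
	uintptr_t end = start + len;
	int i;

	if (end < start)			/* wrap-around */
		return EFAULT;
	for (i = 0; i < nranges; i++) {
		if (start < ranges[i].tr_end && end > ranges[i].tr_start)
			return EFAULT;		/* overlaps protected text */
	}
	return 0;
}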
|
|
Correct the logic, still blocking PROT_EXEC
ok anton kettenis
|
|
|
|
read/write operations, so mask out PROT_EXEC to avoid creating a pointless
exec mapping in the kernel.
We probably need this masking upon minprot (for the non-UVM_EXTRACT_FIXPROT
case) also, but I haven't done a test yet.
ok kettenis
|
|
|
|
the entries, so the check-sp-at-system-call check failed. It is quite
strange that it took this long to find this.
ok kettenis
|
|
Also grab the lock in uvm_map_teardown() and uvm_map_deallocate() to
satisfy the assertions. Grabbing the lock there shouldn't be strictly
necessary, because no other reference to the map should exist when the
reaper is holding it, but it doesn't hurt and makes our life easier.
Inputs & tests from Ivo van der Sangen, tb@, gnezdo@, kn@
kettenis@ and tb@ agree with the direction, ok kn@
|
|
If enabled, the debug code currently panics the kernel. To be investigated.
|
|
for IMMUTABLE, before traversing for unmap. I didn't copy enough traversal
code for the scan, and thus MAP_FIXED was subtly broken.
test help from tb, ok kettenis miod
|
|
I really want immutable to not allow such transitions either, because it will
help bring code up to the highest standard.
For now, allow this for all processes, until we find out the underlying
reason.
|
|
|
|
go back to the old approach: using a new anon mapping because it removes
any potential gadgetry pre-placed in the region (by making it zero). But
also bring in a few more validation checks beyond contiguous mapping -- it
must not be a syscall region, and the protection must be precisely RW.
This does allow sigaltstack() to shoot zero'd MAP_STACK non-immutable regions
into the main stack area (which will soon be immutable). I am not sure we
can keep reinforcing immutable on the region after we do stack (maybe
determine this while doing the validation entry walk?)
Sadly, continued support for sigaltstack() does require selecting the guessed
best compromise.
ok kettenis
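For reference, the clean userland path this assumes is an alternate signal
stack backed by an explicit MAP_STACK mapping; regions not created this way
go through the conversion described above. A minimal example (error handling
trimmed to the essentials):

#include <sys/mman.h>

#include <err.h>
#include <signal.h>
#include <string.h>

int
main(void)
{
	stack_t ss;
	void *sp;

	/* An anon RW mapping created with MAP_STACK needs no conversion. */
	sp = mmap(NULL, SIGSTKSZ, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON | MAP_STACK, -1, 0);
	if (sp == MAP_FAILED)
		err(1, "mmap");

	memset(&ss, 0, sizeof(ss));
	ss.ss_sp = sp;
	ss.ss_size = SIGSTKSZ;
	if (sigaltstack(&ss, NULL) == -1)
		err(1, "sigaltstack");

	return 0;
}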
|
|
|
|
problem because haphazard use could shoot holes in the address space
(changing permissions, providing opportunities for pivoting, etc). I
tried to write a diff to convert the address space correctly but did
not understand enough about map entries, so instead we mapped new
memory over top of the existing object. Placing a new mapping becomes
unfeasible with the upcoming mimmutable model, so here is code that
adds MAP_STACK to the region. It will only do so for a contiguously
mapped region that is non-syscall with permission RW, otherwise it
returns an error.
Food for thought: If we know the region isn't serviced by an object,
we should consider zero'ing the region, to block pre-pivot placement?
ok kettenis
|
|
memory mappings so they cannot be changed by a later mmap(), mprotect(),
or munmap(), which will error with EPERM instead.
ok kettenis
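A small userland illustration of the behaviour described above, assuming the
mimmutable(2) interface; the sizes are arbitrary:

#include <sys/mman.h>

#include <err.h>
#include <stddef.h>

int
main(void)
{
	size_t len = 1UL << 16;
	void *p;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON, -1, 0);
	if (p == MAP_FAILED)
		err(1, "mmap");

	/* Freeze the mapping: its range and permissions can no longer change. */
	if (mimmutable(p, len) == -1)
		err(1, "mimmutable");

	/* Both of these now fail with EPERM. */
	if (mprotect(p, len, PROT_READ) == -1)
		warn("mprotect");
	if (munmap(p, len) == -1)
		warn("munmap");

	return 0;
}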
|
|
|
|
ok miod@ millert@
|
|
to be available to other files. NFC
ok kettenis@ mpi@
|
|
ok millert@, kettenis@
|
|
This introduced a lock ordering issue reported by naddy@, anton@ and syzkaller.
Reported-by: syzbot+739bb901045d9b193bde@syzkaller.appspotmail.com
|
|
to prevent another thread from unmapping the memory and triggering
an assertion or even corrupting random physical memory pages.
This fix is similar to the change in uvm_glue.c rev. 1.74. However in this
case we need to be careful since some sysctl(2) calls look at the map of
the current process. In those cases we must not attempt to lock the map
again.
ok mpi@
Should fix:
Reported-by: syzbot+be89fe83d6c004fcb412@syzkaller.appspotmail.com
|
|
The (known) lock order reversals which now occur more reliably and much
earlier on WITNESS boots with this diff knock out syzkaller reports since
syzkaller stops at the first "crash report":
https://syzkaller.appspot.com/bug?id=81b39e970cd2eb21b97d1b31746c693e300fd2dd
|
|
This is an updated version of uvm_map.c r1.283 "Unwire with map lock held".
The previous version introduced a use-after-free by not unlocking vm_map
locks in uvm_map_teardown(), resulting in dangling references on the
reaper's lock list (thanks visa!).
Lock and unlock the map around uvm_map_teardown() instead.
This code path holds the last reference, hence the lock isn't strictly
needed except for satisfying upcoming locking assertions.
Tested on amd64, arm64, i386, macppc, octeon, sparc64.
This time also with WITNESS enabled (except on sparc64 which builds but does
not boot with WITNESS; this is a known issue).
OK mpi visa
|
|
WITNESS builds broke^W^Wkernels panic on boot as reported by anton and bluhm.
Booting bsd.mp in single-user mode inside VMM shows:
root on sd0a (5f9e458ed30b39ab.a) swap on sd0b dump on sd0b
Enter pathname of shell or RETURN for sh:
witness: lock order reversal:
1st 0xfffffd801f8ce468 vmmaplk (&map->lock)
2nd 0xfffffd801b8162c0 inode (&ip->i_lock)
lock order "&ip->i_lock"(rrwlock) -> "&map->lock"(rwlock) first seen at:
#0 rw_enter_read+0x38
#1 uvmfault_lookup+0x8a
#2 uvm_fault_check+0x32
#3 uvm_fault+0xfb
#4 kpageflttrap+0x12c
#5 kerntrap+0x91
#6 alltraps_kern_meltdown+0x7b
#7 copyout+0x53
#8 ffs_read+0x1f6
#9 VOP_READ+0x41
#10 vn_rdwr+0xa1
#11 vmcmd_map_readvn+0xa0
#12 exec_process_vmcmds+0x88
#13 sys_execve+0x732
#14 start_init+0x26f
#15 proc_trampoline+0x1c
lock order data w1 -> w2 missing
# exit
kernel: protection fault trap, code=0
Stopped at witness_checkorder+0x312: movl 0x10(%r14),%ecx
gkoehler reported faults on poisoned addresses on macppc dual G5.
|
|
WITNESS builds broke as reported by anton and bluhm:
root on sd0a (5ec49b3ad23eb2d4.a) swap on sd0b dump on sd0b
kernel: protection fault trap, code=0
Stopped at witness_checkorder+0x4ec: movl 0x10(%r12),%ecx
https://syzkaller.appspot.com/bug?id=be02b290a93c648986c35370a271aad4135a5044
https://syzkaller.appspot.com/text?tag=CrashLog&x=136e9aa4700000
|
|
Introduce vm_map_assert_{wrlock,rdlock,anylock,unlocked}() in rwlock(9)
fashion and back up function comments about locking assumptions with proper
assertions.
Also add new comments/assertions based on code analysis and sync with
NetBSD as much as possible.
vm_map_lock() and vm_map_lock_read() are used for exclusive and shared
access respectively; currently no code path is purely protected by
vm_map_lock_read() alone, i.e. functions called with a read lock held by the
caller are also called with a write lock elsewhere.
Thus only vm_map_assert_{wrlock,anylock}() are used as of now.
This should help with unlocking UVM related syscalls.
Tested as part of a larger diff through
- amd64 package bulk build by naddy
- amd64, arm64, powerpc64 base builds and regress by bluhm
- amd64 and sparc64 base builds and regress by me
Input mpi
Feedback OK kettenis
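As a rough sketch (not the committed code), the assertions boil down to
forwarding to the matching rwlock(9) assertion on the map's lock;
interrupt-safe maps, which use a mutex instead, are ignored here for brevity:

void
vm_map_assert_wrlock(struct vm_map *map)
{
	if ((map->flags & VM_MAP_INTRSAFE) == 0)
		rw_assert_wrlock(&map->lock);	/* exclusive (write) lock held */
}

void
vm_map_assert_anylock(struct vm_map *map)
{
	if ((map->flags & VM_MAP_INTRSAFE) == 0)
		rw_assert_anylock(&map->lock);	/* read or write lock held */
}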
|
|
uvm_unmap_remove() effectively requires its caller to lock the vm map.
Even though uvm_map_teardown() is only called after a map's last reference
is dropped and is thus safe from other threads accessing the map, grab the
map's lock in uvm_map_teardown() to satisfy upcoming lock assertions in
uvm_unmap_remove().
Tested as part of a larger diff through
- amd64 package bulk builds by naddy
- amd64, arm64, powerpc64 base builds and regress by bluhm
- amd64 and sparc64 base builds and regress by me
Feedback mpi
OK kettenis
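The resulting pattern in uvm_map_teardown() is roughly the following
(sketch only, arguments elided):

	/*
	 * No other thread can reference the map here, but
	 * uvm_unmap_remove() now asserts the map is write-locked.
	 */
	vm_map_lock(map);
	/* ... uvm_unmap_remove(map, map->min_offset, map->max_offset, ...) ... */
	vm_map_unlock(map);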
|
|
Pass the correct entry to uvm_fault_unwire_locked().
Reported-by: syzbot+bb2f63f076618e9ed0d3@syzkaller.appspotmail.com
ok kettenis@, deraadt@
|
|
Like the per-amap lock the `vmobjlock' is principally used to serialize
access to objects in the fault handler to allow faults occurring on
different CPUs and different objects to be processed in parallel.
The fault handler now acquires the `vmobjlock' of a given UVM object as
soon as it finds one. For now a write-lock is always acquired even if
some operations could use a read-lock.
Every pager, corresponding to a different kind of UVM object, now expects
the UVM object to be locked, and some operations, like *_get(), return it
unlocked. This is enforced by assertions checking for rw_write_held().
The KERNEL_LOCK() is now pushed to the VFS boundary in the vnode pager.
To ensure the correct amap or object lock is held when modifying a page
many uvm_page* operations are now asserting for the "owner" lock.
However, fields of the "struct vm_page" are still being protected by the
global `pageqlock'. To prevent lock ordering issues with the new
`vmobjlock' and to reduce differences with NetBSD this lock is now taken
and released for each page instead of around the whole loop.
This commit does not remove the KERNEL_LOCK/UNLOCK() dance. Unlocking
will follow if there is no fallout.
Ported from NetBSD, tested by many, thanks!
ok kettenis@, kn@
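The locking rule can be summarised with a sketch like the following
(illustrative only; arguments and error handling elided, and
`needs_pager_io' is a made-up condition):

	rw_enter(uobj->vmobjlock, RW_WRITE);	/* lock object before use */
	if (needs_pager_io) {
		/* pgo_get(uobj, ...) returns with vmobjlock released */
	} else {
		/* inspect or modify resident pages with the lock held */
		rw_exit(uobj->vmobjlock);
	}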
|
|
prints the end which is in the next page. Subtract 1 to avoid confusion.
|
|
Reduce differences with NetBSD, tested by many as part of a larger diff.
ok kettenis@
|
|
This is possible now that amaps & anons are protected by a per-map rwlock.
Tested by many as part of a bigger diff.
ok kettenis@
|
|
This change introduced or exposed a leak of anons which results in system
freezes.
anton@ observed a high number of INUSE for anonpl and semarie@ saw multiple
processes waiting in the fault handler on "flt_noramX", probably the one
related to allocating an anon.
|
|
This is possible now that amaps & anons are protected by a per-map rwlock.
ok kettenis@, jmatthew@
|
|
This is necessary to do this accounting without the KERNEL_LOCK().
ok mvs@, kettenis@
|
|
No functional change.
ok mlarkin@
|
|
ok mpi@
|
|
|
|
ok mpi@
|
|
A rwlock is attached to every amap and is shared with all its anons. The
same lock will be used by multiple amaps if they have anons in common.
This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK() which seems to bring up to 20% improvements in builds.
This is based/copied/adapted from the most recent work done in NetBSD which
is an evolution of the preceding simple_lock scheme.
Tested by many, thanks!
ok kettenis@, mvs@
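Conceptually the sharing looks like this (field names as in the
NetBSD-derived scheme; the structures are heavily trimmed):

struct vm_amap {
	struct rwlock	*am_lock;	/* shared with every anon in the amap */
	/* ... slots, anon pointers, reference count ... */
};

struct vm_anon {
	struct rwlock	*an_lock;	/* points at the owning amap's am_lock */
	/* ... resident page, swap slot, reference count ... */
};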
|
|
The underlying vm_space lock is used as a substitute for the KERNEL_LOCK()
in uvm_grow() to make sure `vm_ssize' is not corrupted.
ok anton@, kettenis@
|
|
Reduce differences with NetBSD.
ok mvs@, kettenis@
|
|
kernel lock are fixed now, push the kernel lock down again.
ok deraadt@
|
|
ok kettenis@
|
|
|
|
CID 1453116
ok kettenis@
|
|
Instead count (and check the limit) when their protection gets flipped
from PROT_NONE to something that permits access. This means that
mprotect(2) may now fail if changing the protection would exceed RLIMIT_DATA.
This helps code (such as Chromium's JavaScript interpreter) that reserves
large chunks of address space but populates it sparsely.
ok deraadt@, otto@, kurt@, millert@, robert@
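In userland terms, the reserve-then-commit pattern this enables looks
roughly like this (sizes are arbitrary examples):

#include <sys/mman.h>

#include <err.h>
#include <unistd.h>

int
main(void)
{
	size_t reserve = 1UL << 30;		/* large, sparse reservation */
	size_t page = (size_t)getpagesize();
	char *base;

	/* PROT_NONE reservations no longer count against RLIMIT_DATA. */
	base = mmap(NULL, reserve, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
	if (base == MAP_FAILED)
		err(1, "mmap");

	/*
	 * The accounting (and a possible limit failure) now happens here,
	 * when a chunk is flipped from PROT_NONE to something accessible.
	 */
	if (mprotect(base, page, PROT_READ | PROT_WRITE) == -1)
		err(1, "mprotect");

	base[0] = 1;
	return 0;
}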
|
|
ok mpi@
|
|
OK guenther@, kettenis@, mpi@
|