path: root/sys/uvm/uvm_map.c
Age | Commit message | Author
2023-02-11 | I forgot to copy the msyscall interlock flag to forked processes, so | Theo de Raadt
only freshly executed processes were actually locked. (This happened because I didn't realize that the uvm_map's contents are copied entry by entry, while other parts are not.) ok kettenis
2023-01-31 | On systems without xonly mmu hardware-enforcement, we can still mitigate | Theo de Raadt
against classic BROP with a range-checking wrapper in front of copyin() and copyinstr() which ensures the userland source doesn't overlap the main program text, ld.so text, signal tramp text (its mapping is hard to distinguish, so it comes along for the ride), or libc.so text. ld.so tells the kernel the libc.so text range with msyscall(2). The range checking for 2-4 elements is done without locking (because all 4 ranges are immutable!) and is inexpensive. write(sock, &open, 400) now fails with EFAULT. No programs have been discovered which require reading their own text segments with a system call.

On a machine without mmu enforcement, a test program reports the following:

                  userland    kernel
  ld.so           readable    unreadable
  mmap xz         unreadable  unreadable
  mmap x          readable    readable
  mmap nrx        readable    readable
  mmap nwx        readable    readable
  mmap xnwx       readable    readable
  main            readable    unreadable
  libc unmapped?  readable    unreadable
  libc mapped     readable    unreadable

ok kettenis, additional help from miod
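A minimal sketch of the overlap check described above, with a hypothetical check_copyin_range() helper and an illustrative range table (the real kernel keeps these ranges per process and fills them from execve(2)/msyscall(2)):

#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative only: one of the (at most four) immutable text ranges
 * the commit mentions -- main program, ld.so, signal tramp, libc.so.
 */
struct text_range {
    uintptr_t start;
    uintptr_t end;      /* exclusive */
};

/*
 * Hypothetical helper showing the shape of the check: reject a
 * copyin()/copyinstr() source that overlaps any of the text ranges.
 * No locking is required because the ranges never change once set.
 */
static int
check_copyin_range(const struct text_range *tr, size_t ntr,
    uintptr_t uaddr, size_t len)
{
    size_t i;

    for (i = 0; i < ntr; i++) {
        if (uaddr < tr[i].end && uaddr + len > tr[i].start)
            return EFAULT;  /* source overlaps program text */
    }
    return 0;
}

With such a check in front of copyin(), a call like write(sock, &open, 400), whose source lies in libc text, is rejected with EFAULT as the commit describes.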
2023-01-25 | In the previous commit, FIXPROT would upgrade a PROT_NONE mapping too far. | Theo de Raadt
Correct the logic, still blocking PROT_EXEC ok anton kettenis
2023-01-24 | oops, a silly typo | Theo de Raadt
2023-01-24 | uvm_map_extract() UVM_EXTRACT_FIXPROT alias mappings are only used for | Theo de Raadt
read/write operations, so mask out PROT_EXEC to avoid creating a pointless exec mapping in the kernel. We probably need this masking upon minprot (for the non-UVM_EXTRACT_FIXPROT case) also, but I haven't done a test yet. ok kettenis
2022-12-18 | spelling | Theo de Raadt
2022-11-17 | stack growth from setrlimit was never updated to set UVM_ET_STACK on | Theo de Raadt
the entries, so the check-sp-at-system-call check failed. Quite strange it took this long to find this. ok kettenis
2022-11-04 | Assert the VM map lock is held in functions used by mmap/mprotect/munmap. | Martin Pieuchot
Also grab the lock in uvm_map_teardown() and uvm_map_deallocate() to satisfy the assertions. Grabbing the lock there shouldn't be strictly necessary, because no other reference to the map should exist when the reaper is holding it, but it doesn't hurt and makes our life easier. Inputs & tests from Ivo van der Sangen, tb@, gnezdo@, kn@ kettenis@ and tb@ agree with the direction, ok kn@
2022-10-31 | Fix VMMAP_DEBUG code to compile with not-so-recent changes. | Martin Pieuchot
If enabled, the debug code currently panics the kernel. To investigate.
2022-10-24 | uvm_unmap_remove() traverses the entries in the start,end range scanning | Theo de Raadt
for IMMUTABLE, before traversing for unmap. I didn't copy enough traversal code for the scan, and thus MAP_FIXED was subtly broken. test help from tb, ok kettenis miod
2022-10-21 | Recent chrome renderers try to change some immutable RW region to R. | Theo de Raadt
I really want immutable to not allow such transitions either, because it will help bring code up to the highest standard. For now, allow this for all processes, until we find out the underlying reason.
2022-10-21 | the debug "name" parameter to uvm_map_immutable() is no longer needed | Theo de Raadt
2022-10-16 | Rather than marking MAP_STACK on entries for sigaltstack() [2 days ago], | Theo de Raadt
go back to the old approach: using a new anon mapping, because it removes any potential gadgetry pre-placed in the region (by making it zero). But also bring in a few more validation checks beyond contiguous mapping -- it must not be a syscall region, and the protection must be precisely RW. This does allow sigaltstack() to shoot zero'd MAP_STACK non-immutable regions into the main stack area (which will soon be immutable). I am not sure whether we can re-apply immutable on the region after we do the stack conversion (maybe determine this while doing the validation entry walk?). Sadly, continued support for sigaltstack() does require selecting the guessed best compromise. ok kettenis
2022-10-15 | remove one of the debug messages | Theo de Raadt
2022-10-15 | During the MAP_STACK introduction in 2018, sigaltstack() became a | Theo de Raadt
problem because haphazard use could shoot holes in the address space (changing permissions, providing opportunities for pivoting, etc). I tried to write a diff to convert the address space correctly but did not understand enough about map entries, so instead we mapped new memory over top of the existing object. Placing a new mapping becomes unfeasible with the upcoming mimmutable model, so here is code that adds MAP_STACK to the region. It will only do so for a contiguously mapped region that is non-syscall with permission RW, otherwise it returns an error. Food for thought: if we know the region isn't serviced by an object, we should consider zero'ing it, to block pre-pivot placement? ok kettenis
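The validation lends itself to a simple entry walk. A hedged sketch of that walk under assumed names (next_entry() is a hypothetical iterator; the flag and field names are as I recall them, and the committed uvm_map.c code differs in detail):

/*
 * Sketch only: check that [start, end) is contiguously mapped, not a
 * syscall region, and exactly RW.  Caller holds the map lock.
 */
static int
check_stack_range(struct vm_map *map, vaddr_t start, vaddr_t end)
{
    struct vm_map_entry *entry;

    if (!uvm_map_lookup_entry(map, start, &entry))
        return EINVAL;              /* range does not begin in a mapping */

    for (; entry != NULL && entry->start < end; entry = next_entry(entry)) {
        if (entry->protection != (PROT_READ | PROT_WRITE))
            return EINVAL;          /* protection must be precisely RW */
        if (entry->etype & UVM_ET_SYSCALL)
            return EINVAL;          /* must not be a syscall region */
        if (entry->end < end && (next_entry(entry) == NULL ||
            next_entry(entry)->start != entry->end))
            return EINVAL;          /* hole: range is not contiguous */
    }
    return 0;
}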
2022-10-07 | Add mimmutable(2) system call which locks the permissions (PROT_*) of | Theo de Raadt
memory mappings so they cannot be changed by a later mmap(), mprotect(), or munmap(), which will error with EPERM instead. ok kettenis
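A small userland example of the behaviour described above (a sketch, assuming an OpenBSD release that ships mimmutable(2)):

#include <sys/mman.h>
#include <err.h>
#include <unistd.h>

int
main(void)
{
    size_t len = (size_t)getpagesize();
    void *p;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
        MAP_ANON | MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED)
        err(1, "mmap");

    if (mimmutable(p, len) == -1)
        err(1, "mimmutable");

    /* Both of these are now expected to fail with EPERM. */
    if (mprotect(p, len, PROT_READ) == -1)
        warn("mprotect");
    if (munmap(p, len) == -1)
        warn("munmap");

    return 0;
}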
2022-08-15 | remove FSPACE macros, unused after uvm_map_sel_limits() removal | Jonathan Gray
2022-08-15 | remove unused uvm_map_sel_limits() | Jonathan Gray
ok miod@ millert@
2022-08-07 | Move fallback PMAP_PREFER definitions from uvm_map.c to uvm_pmap.h for them | Miod Vallat
to be available to other files. NFC ok kettenis@ mpi@
2022-05-04 | Merge swap-backed and object-backed inactive page lists. | Martin Pieuchot
ok millert@, kettenis@
2022-03-12 | Revert holding a read lock on the map while copying out data during sysctl(2). | Martin Pieuchot
This introduced a lock ordering issue reported by naddy@, anton@ and syzkaller. Reported-by: syzbot+739bb901045d9b193bde@syzkaller.appspotmail.com
2022-03-11 | Hold a read lock on the map while copying out data during a sysctl(2) call | Mark Kettenis
to prevent another thread from unmapping the memory and triggering an assertion or even corrupting random physical memory pages. This fix is similar to the change in uvm_glue.c rev. 1.74. However in this case we need to be careful since some sysctl(2) calls look at the map of the current process. In those cases we must not attempt to lock the map again. ok mpi@ Should fix: Reported-by: syzbot+be89fe83d6c004fcb412@syzkaller.appspotmail.com
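A hedged sketch of the locking pattern this describes, not the committed sysctl code (the surrounding entry walk and copyout are elided):

/*
 * Read-lock the target map around the copyout, but skip the lock when
 * the request concerns the current process, whose map may already be
 * locked by this thread.
 */
static void
copyout_with_map_locked(struct vm_map *map)
{
    int dolock = (map != &curproc->p_vmspace->vm_map);

    if (dolock)
        vm_map_lock_read(map);
    /* ... walk the entries and copyout() the requested data ... */
    if (dolock)
        vm_map_unlock_read(map);
}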
2022-02-15 | Backout previous "Unwire with map lock held" (commitid: SsVz7dLGFgR21kFe) | Klemens Nanni
The (known) lock order reversals which now occur more reliably and much earlier on WITNESS boots with this diff knock out syzkaller reports, since syzkaller stops at the first "crash report": https://syzkaller.appspot.com/bug?id=81b39e970cd2eb21b97d1b31746c693e300fd2dd
2022-02-14 | Unwire with map lock held | Klemens Nanni
This is an updated version of uvm_map.c r1.283 "Unwire with map lock held". The previous version introduced a use-after-free by not unlocking vm_map locks in uvm_map_teardown(), resulting in dangling references on the reaper's lock list (thanks visa!). Lock and unlock the map around uvm_map_teardown() instead. This code path holds the last reference, hence the lock isn't strictly needed except for satisfying upcoming locking assertions. Tested on amd64, arm64, i386, macppc, octeon, sparc64. This time also with WITNESS enabled (except on sparc64, which builds but does not boot with WITNESS; this is a known issue). OK mpi visa
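A hedged sketch of the shape of this change, not the literal diff (whether the lock ends up inside or around uvm_map_teardown() evolved across these commits):

static void
drop_last_reference(struct vm_map *map)
{
    /*
     * The caller holds the last reference, so no other thread can
     * reach this map; the lock is taken only to satisfy the locking
     * assertions further down in uvm_unmap_remove() and friends.
     */
    vm_map_lock(map);
    uvm_map_teardown(map);
    vm_map_unlock(map);
}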
2022-02-11 | Backout previous "Unwire with map lock held" (commitid: eQBvWUwShD91dN9Z) | Klemens Nanni
WITNESS builds broke^W^Wkernels panic on boot as reported by anton and bluhm. Booting bsd.mp in single-user mode inside VMM shows:

  root on sd0a (5f9e458ed30b39ab.a)
  swap on sd0b
  dump on sd0b
  Enter pathname of shell or RETURN for sh:
  witness: lock order reversal:
   1st 0xfffffd801f8ce468 vmmaplk (&map->lock)
   2nd 0xfffffd801b8162c0 inode (&ip->i_lock)
  lock order "&ip->i_lock"(rrwlock) -> "&map->lock"(rwlock) first seen at:
  #0  rw_enter_read+0x38
  #1  uvmfault_lookup+0x8a
  #2  uvm_fault_check+0x32
  #3  uvm_fault+0xfb
  #4  kpageflttrap+0x12c
  #5  kerntrap+0x91
  #6  alltraps_kern_meltdown+0x7b
  #7  copyout+0x53
  #8  ffs_read+0x1f6
  #9  VOP_READ+0x41
  #10 vn_rdwr+0xa1
  #11 vmcmd_map_readvn+0xa0
  #12 exec_process_vmcmds+0x88
  #13 sys_execve+0x732
  #14 start_init+0x26f
  #15 proc_trampoline+0x1c
  lock order data w1 -> w2 missing
  # exit
  kernel: protection fault trap, code=0
  Stopped at  witness_checkorder+0x312:  movl  0x10(%r14),%ecx

gkoehler reported faults on poisoned addresses on macppc dual G5.
2022-02-11 | Backout previous "Assert vm map locks" (commitid: sRNBfzX2dJrxFDmb) | Klemens Nanni
WITNESS builds broke as reported by anton and bluhm:

  root on sd0a (5ec49b3ad23eb2d4.a)
  swap on sd0b
  dump on sd0b
  kernel: protection fault trap, code=0
  Stopped at  witness_checkorder+0x4ec:  movl  0x10(%r12),%ecx

https://syzkaller.appspot.com/bug?id=be02b290a93c648986c35370a271aad4135a5044
https://syzkaller.appspot.com/text?tag=CrashLog&x=136e9aa4700000
2022-02-10 | Assert vm map locks | Klemens Nanni
Introduce vm_map_assert_{wrlock,rdlock,anylock,unlocked}() in rwlock(9) fashion and back up function comments about locking assumptions with proper assertions. Also add new comments/assertions based on code analysis and sync with NetBSD as much as possible.

vm_map_lock() and vm_map_lock_read() are used for exclusive and shared access respectively; currently no code path is purely protected by vm_map_lock_read() alone, i.e. functions called with a read lock held by the callee are also called with a write lock elsewhere. Thus only vm_map_assert_{wrlock,anylock}() are used as of now.

This should help with unlocking UVM related syscalls.

Tested as part of a larger diff through
- amd64 package bulk build by naddy
- amd64, arm64, powerpc64 base builds and regress by bluhm
- amd64 and sparc64 base builds and regress by me

Input mpi
Feedback OK kettenis
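A hedged sketch of what such an assertion can look like, built on the rwlock(9) assertion primitives; the committed vm_map_assert_*() functions may differ in detail:

#include <sys/rwlock.h>

void
vm_map_assert_wrlock(struct vm_map *map)
{
    /* Intrsafe maps are not protected by the rwlock. */
    if ((map->flags & VM_MAP_INTRSAFE) == 0)
        rw_assert_wrlock(&map->lock);
}

void
vm_map_assert_anylock(struct vm_map *map)
{
    if ((map->flags & VM_MAP_INTRSAFE) == 0)
        rw_assert_anylock(&map->lock);
}

The rw_assert_*() checks are only meaningful on DIAGNOSTIC/WITNESS kernels.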
2022-02-10 | Unwire with map lock held | Klemens Nanni
uvm_unmap_remove() effectively requires its caller to lock the vm map. Even though uvm_map_teardown() is only called after a map's last reference is dropped and is thus safe from other threads accessing the map, grab the map's lock in uvm_map_teardown() to satisfy upcoming lock assertions in uvm_unmap_remove().

Tested as part of a larger diff through
- amd64 package bulk builds by naddy
- amd64, arm64, powerpc64 base builds and regress by bluhm
- amd64 and sparc64 base builds and regress by me

Feedback mpi
OK kettenis
2021-12-21 | Fix a typo in mlock(2) error path triggering a double-free. | Martin Pieuchot
Pass the correct entry to uvm_fault_unwire_locked(). Reported-by: syzbot+bb2f63f076618e9ed0d3@syzkaller.appspotmail.com ok kettenis@, deraadt@
2021-12-15 | Use a per-UVM object lock to serialize the lower part of the fault handler. | Martin Pieuchot
Like the per-amap lock, the `vmobjlock' is principally used to serialize access to objects in the fault handler, to allow faults occurring on different CPUs and different objects to be processed in parallel.

The fault handler now acquires the `vmobjlock' of a given UVM object as soon as it finds one. For now a write-lock is always acquired even if some operations could use a read-lock. Every pager, corresponding to a different kind of UVM object, now expects the UVM object to be locked, and some operations, like *_get(), return it unlocked. This is enforced by assertions checking for rw_write_held().

The KERNEL_LOCK() is now pushed to the VFS boundary in the vnode pager.

To ensure the correct amap or object lock is held when modifying a page, many uvm_page* operations are now asserting for the "owner" lock. However, fields of the "struct vm_page" are still being protected by the global `pageqlock'. To prevent lock ordering issues with the new `vmobjlock' and to reduce differences with NetBSD, this lock is now taken and released for each page instead of around the whole loop.

This commit does not remove the KERNEL_LOCK/UNLOCK() dance. Unlocking will follow if there is no fallout.

Ported from NetBSD, tested by many, thanks!

ok kettenis@, kn@
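A hedged sketch of the locking rule, not the committed fault-handler code:

static void
lock_object_for_fault(struct uvm_object *uobj)
{
    /* A write lock for now, even where a read lock could suffice. */
    rw_enter(uobj->vmobjlock, RW_WRITE);
    KASSERT(rw_write_held(uobj->vmobjlock));
    /*
     * ... hand off to the pager; per the commit, the pager's *_get()
     * routines return with the object unlocked ...
     */
}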
2021-12-07 | uvm_map_inentry() is provided a format string that says "inside", but then | Theo de Raadt
prints the end which is in the next page. Subtract 1 to avoid confusion.
2021-10-24 | Move pmap_{,k}remove() inside uvm_km_pgremove{,_intrsafe}(). | Martin Pieuchot
Reduce differences with NetBSD, tested by many as part of a larger diff. ok kettenis@
2021-10-05 | Unref/free amaps before grabbing the KERNEL_LOCK(). | Martin Pieuchot
This is possible now that amaps & anons are protected by a per-map rwlock. Tested by many as part of a bigger diff. ok kettenis@
2021-06-17 | Revert previous: unref of amap outside of the KERNEL_LOCK(). | Martin Pieuchot
This change introduced or exposed a leak of anons which results in system freezes. anton@ observed a high number of INUSE for anonpl and semarie@ saw multiple processes waiting in the fault handler on "flt_noramX", probably the one related to allocating an anon.
2021-06-15 | Unref/free amaps before grabbing the KERNEL_LOCK(). | Martin Pieuchot
This is possible now that amaps & anons are protected by a per-map rwlock. ok kettenis@, jmatthew@
2021-05-22 | Use atomic operations for reference counting VM maps. | Martin Pieuchot
This is necessary to do this accounting without the KERNEL_LOCK(). ok mvs@, kettenis@
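A hedged sketch of the idea with simplified names (a stand-in counter rather than the real map field):

#include <sys/atomic.h>

static unsigned int map_refcnt;     /* stand-in for the map's ref count */

static void
map_ref(void)
{
    /* Take a reference without any global lock. */
    atomic_inc_int(&map_refcnt);
}

static void
map_unref(void)
{
    /* atomic_dec_int_nv() returns the new value. */
    if (atomic_dec_int_nv(&map_refcnt) == 0) {
        /* Last reference gone: safe to tear the map down. */
    }
}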
2021-03-26 | Remove parenthesis around return value to reduce the diff with NetBSD. | Martin Pieuchot
No functional change. ok mlarkin@
2021-03-12 | spelling | Jonathan Gray
ok mpi@
2021-03-05 | ansi | Jonathan Gray
2021-02-23 | remove unused uvm_mapent_bias() | Jonathan Gray
ok mpi@
2021-01-19 | (re)Introduce locking for amaps & anons. | Martin Pieuchot
A rwlock is attached to every amap and is shared with all its anons. The same lock will be used by multiple amaps if they have anons in common.

This should be enough to get the upper part of the fault handler out of the KERNEL_LOCK(), which seems to bring up to a 20% improvement in builds.

This is based/copied/adapted from the most recent work done in NetBSD, which is an evolution of the preceding simple_lock scheme.

Tested by many, thanks!

ok kettenis@, mvs@
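A hedged sketch of the relationship, with sketch_ structs standing in for the real vm_amap/vm_anon:

/*
 * Each amap owns an rwlock, and every anon points at the lock of its
 * amap, so taking the amap's lock also serializes access to its anons.
 * Amaps that share anons share the same lock.
 */
struct sketch_amap {
    struct rwlock   *am_lock;   /* protects this amap and its anons */
    /* ... slots pointing at struct sketch_anon ... */
};

struct sketch_anon {
    struct rwlock   *an_lock;   /* == the owning amap's am_lock */
    /* ... resident page or swap slot ... */
};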
2020-10-19 | Serialize accesses to "struct vmspace" and document its refcounting. | Martin Pieuchot
The underlying vm_space lock is used as a substitute to the KERNEL_LOCK() in uvm_grow() to make sure `vm_ssize' is not corrupted. ok anton@, kettenis@
2020-09-22 | Spell inline correctly. | Martin Pieuchot
Reduce differences with NetBSD. ok mvs@, kettenis@
2020-09-14 | Since the issues with calling uvm_map_inentry_fix() without holding the | Mark Kettenis
kernel lock are fixed now, push the kernel lock down again. ok deraadt@
2020-09-12 | Add tracepoints in the page fault handler and when entries are added to maps. | Martin Pieuchot
ok kettenis@
2020-07-06 | fix spelling | Theo de Raadt
2020-03-25 | Do not test against NULL a variable which is dereferenced before that. | Martin Pieuchot
CID 1453116 ok kettenis@
2020-03-04 | Do not count pages mapped as PROT_NONE against the RLIMIT_DATA limit. | Mark Kettenis
Instead count (and check the limit) when their protection gets flipped from PROT_NONE to something that permits access. This means that mprotect(2) may now fail if changing the protection would exceed RLIMIT_DATA. This helps code (such as Chromium's JavaScript interpreter) that reserves large chunks of address space but populates them sparsely. ok deraadt@, otto@, kurt@, millert@, robert@
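A userland illustration of the new accounting (a sketch; the sizes are arbitrary):

#include <sys/mman.h>
#include <err.h>

int
main(void)
{
    size_t reserve = 1UL << 30;     /* large PROT_NONE reservation */
    char *p;

    /* Reserving address space with PROT_NONE is no longer charged. */
    p = mmap(NULL, reserve, PROT_NONE, MAP_ANON | MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED)
        err(1, "mmap");

    /* RLIMIT_DATA is checked when the protection becomes accessible. */
    if (mprotect(p, 1UL << 20, PROT_READ | PROT_WRITE) == -1)
        warn("mprotect");           /* e.g. ENOMEM if over RLIMIT_DATA */

    return 0;
}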
2019-12-30 | convert infinite msleep(9) to msleep_nsec(9) | Jonathan Gray
ok mpi@
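A hedged sketch of the mechanical conversion (placeholder identifier, mutex and wait-message names):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mutex.h>

static void
wait_for_event(void *ident, struct mutex *mtx)
{
    /*
     * An msleep(9) call with a timeout of 0, i.e. "sleep forever",
     * becomes msleep_nsec(9) with INFSLP; other arguments unchanged.
     * old: msleep(ident, mtx, PWAIT, "example", 0);
     */
    msleep_nsec(ident, mtx, PWAIT, "example", INFSLP);
}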
2019-12-18 | Set vm_map's pmap in uvm_map_setup(). | Visa Hankala
OK guenther@, kettenis@, mpi@