path: root/sys/uvm
Age  Author  Commit message

2022-06-07  Mark Kettenis
Remove uvm_km_valloc_prefer_wait(9) and uvm_km_free_wakeup(9) now that
nothing uses these functions anymore. ok mpi@

2022-06-07  Mark Kettenis
Remove redundant check for free pages. The pmemrange code that is called
by uvm_pglistalloc(9) does a similar check already. ok mpi@

2022-06-02  Mark Kettenis
Take the size of allocation into account when checking the kernel reserve.
ok mpi@

2022-05-14  Mark Kettenis
uvm_km_valloc(9), uvm_km_valloc_try(9), uvm_km_valloc_wait(9) and
uvm_km_valloc_align(9) are no longer used. Remove these functions. ok mpi@

2022-05-12  Martin Pieuchot
Consider BUFPAGES_DEFICIT in swap_shortage.
ok beck@

2022-05-12  Martin Pieuchot
Introduce uvm_pagedequeue() to reduce code duplication.
ok kettenis@
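
As a rough illustration of the deduplicated helper, here is a minimal sketch assuming OpenBSD's page-queue fields and flags (pageq, PQ_ACTIVE, PQ_INACTIVE); the committed function may differ:

    /* Hedged sketch: take a page off whichever paging queue holds it. */
    void
    uvm_pagedequeue(struct vm_page *pg)
    {
        if (pg->pg_flags & PQ_ACTIVE) {
            TAILQ_REMOVE(&uvm.page_active, pg, pageq);
            atomic_clearbits_int(&pg->pg_flags, PQ_ACTIVE);
            uvmexp.active--;
        }
        if (pg->pg_flags & PQ_INACTIVE) {
            TAILQ_REMOVE(&uvm.page_inactive, pg, pageq);
            atomic_clearbits_int(&pg->pg_flags, PQ_INACTIVE);
            uvmexp.inactive--;
        }
    }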

2022-05-04  Martin Pieuchot
Merge swap-backed and object-backed inactive page lists.
ok millert@, kettenis@

2022-05-03  Alexander Bluhm
Rate limit uvn_flush error during pageout messages. They occur
when a memory mapped file cannot be written to disk, e.g. if the file system is full. Too much printf() during kernel relinking slows down the system boot. OK deraadt@
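
A plausible shape for such rate limiting, sketched with ratecheck(9); the interval and message text are illustrative assumptions, not the committed code:

    /* Inside uvn_flush()'s pageout-error path (illustrative). */
    static struct timeval lasterr;
    static const struct timeval errintvl = { 10, 0 };  /* >= 10s apart */

    if (ratecheck(&lasterr, &errintvl))
        printf("uvn_flush: pageout i/o error on %p\n", uvn);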

2022-04-30  Martin Pieuchot
Recheck PG_BUSY after locking the page.
Another thread can set the bit if we sleep during rw_enter(9) in which case the page shouldn't be touched. ok semarie@
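
The fix boils down to re-testing the flag once the lock is held, roughly as in this sketch (the real call site has more context):

    rw_enter(pg->uobject->vmobjlock, RW_WRITE);  /* may sleep */
    if (pg->pg_flags & PG_BUSY) {
        /* another thread claimed the page while we slept */
        rw_exit(pg->uobject->vmobjlock);
        return;
    }
    /* safe to touch the page now */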

2022-04-28  Martin Pieuchot
Always acquire the `vmobjlock' before incrementing an object's reference.

2022-04-28  Martin Pieuchot
Call uvm_pageactivate() from uvm_pageunwire() instead of rerolling it.
Having fewer places manipulating the global list of active/inactive pages will help future LRU improvements. ok kettenis@, kn@

2022-04-28  Martin Pieuchot
Update uvmexp.swpgonly only once in uvm_swap_get().
Prevent a small window where a check could be incorrect in case an error occurs in uvm_swap_io(). ok kettenis@, kn@

2022-04-19  Sebastien Marie
add missing unlock before returning in uvn_detach()
uvn_detach() sets the UVM_VNODE_RELKILL flag and waits for all async I/O to finish, but uvm_vnp_terminate() could clear the flag and take over the vnode. mpi@ noted that this code path is mostly dead code because there is no "async I/O" (uvn_io() is always synchronous). ok visa@ mpi@

2022-04-11  Martin Pieuchot
Remove trailing spaces.

2022-04-04  Mark Kettenis
Replace KASSERT in uvm_fault_unwire_locked() with code that handles the
case where not all pages are wired. The KASSERT can be triggered in multi-threaded applications when a thread calling munmap(2) races another thread that invokes sysctl(2). Properly written code shouldn't do this, but making the kernel crash in this case is a bit harsh. ok gnezdo@, deraadt@
Fixes: Reported-by: syzbot+e8310909e2910c9cca08@syzkaller.appspotmail.com

2022-03-17  Martin Pieuchot
In swap_io() allocate the buffer before doing encryption.
If the allocation fails due to memory pressure, no time is wasted doing encryption. This also simplifies the error path. Tested by sthen@. ok kn@, miod@, kettenis@, tb@

2022-03-12  Martin Pieuchot
Uncompress some one line comments to reduce the difference with NetBSD.
No functional change.

2022-03-12  Martin Pieuchot
Revert holding a read lock on the map while copying out data during sysctl(2).
This introduced a lock ordering issue reported by naddy@, anton@ and syzkaller. Reported-by: syzbot+739bb901045d9b193bde@syzkaller.appspotmail.com

2022-03-11  Mark Kettenis
Hold a read lock on the map while copying out data during a sysctl(2) call
to prevent another thread from unmapping the memory and triggering an assertion or even corrupting random physical memory pages. This fix is similar to the change in uvm_glue.c rev. 1.74. However in this case we need to be careful since some sysctl(2) calls look at the map of the current process. In those cases we must not attempt to lock the map again. ok mpi@
Should fix: Reported-by: syzbot+be89fe83d6c004fcb412@syzkaller.appspotmail.com
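
A hedged sketch of the described pattern, with illustrative variable names; note the check that skips relocking the current process's own map:

    struct vm_map *map = &pr->ps_vmspace->vm_map;       /* target process */
    int dolock = (map != &curproc->p_vmspace->vm_map);  /* not our own map? */

    if (dolock)
        vm_map_lock_read(map);
    /* ... walk the entries and copyout() the data ... */
    if (dolock)
        vm_map_unlock_read(map);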

2022-03-10  Martin Pieuchot
Do not clear the PG_BUSY flag before passing the anon to uvm_anon_release().
Should prevent a KASSERT() from triggering when freeing an anon after swapping out its memory. This code path has been broken since at least January 2021 and is apparently not so easy to trigger. Found the hard way by sthen@ ok kettenis@, kn@

2022-02-22  Philip Guenther
Delete unnecessary #includes of <sys/domain.h> and/or <sys/protosw.h>
net/if_pppx.c pointed out by jsg@ ok gnezdo@ deraadt@ jsg@ mpi@ millert@

2022-02-21  Klemens Nanni
Grab vmobjlocks with RW_DUPOK in uvm_obj_wire() to silence WITNESS
The drm subsystem implements graphics buffers as uvm objects backed by anonymous memory, thus drm locks and aobj locks share the same "vmobjlock" type. uvm_obj_wire() is only called from sys/dev/pci/drm/, so instead of changing drm's lock init/alloc routines to allow duplicate locks in general, enter uvm's vmobjlock with RW_DUPOK in this function to allow duplicate lock types per thread in this specific call path alone.
Fixes the following WITNESS report when booting/starting X (as seen already in other unrelated bugs@ reports):
wsdisplay0: screen 1-5 added (std, vt100 emulation)
witness: acquiring duplicate lock of same type: "&uobj->vmobjlock"
 1st uobjlk
 2nd uobjlk
Starting stack trace...
witness_checkorder(fffffd83b625f9b0,9,0) at witness_checkorder+0x8ac
rw_enter(fffffd83b625f9a0,1) at rw_enter+0x68
uvm_obj_wire(fffffd843c39e948,0,40000,ffff800033b70428) at uvm_obj_wire+0x46
shmem_get_pages(ffff800008008500) at shmem_get_pages+0xb8
__i915_gem_object_get_pages(ffff800008008500) at __i915_gem_object_get_pages+0x6d
i915_gem_fault(ffff800008008500,ffff800033b707c0,10009b000,a43d6b1c000,ffff800033b70740,1,35ba896911df1241,ffff8000000aa078,ffff8000000aa178) at i915_gem_fault+0x203
drm_fault(ffff800033b707c0,a43d6b1c000,ffff800033b70740,1,0,0,7eca45006f70ee0,ffff800033b707c0) at drm_fault+0x156
uvm_fault(fffffd843a7cf480,a43d6b1c000,0,2) at uvm_fault+0x179
upageflttrap(ffff800033b70920,a43d6b1c000) at upageflttrap+0x62
usertrap(ffff800033b70920) at usertrap+0x129
recall_trap() at recall_trap+0x8
end of kernel
end trace frame: 0x7f7ffffdc7c0, count: 246
End of stack trace.
Input kettenis OK mpi
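
The mechanics amount to a one-flag change when taking the lock, along these lines (a sketch of the described call in uvm_obj_wire()):

    /* RW_DUPOK: this call path may legitimately hold another lock of
     * the same WITNESS type (a drm object lock), so don't warn. */
    rw_enter(uobj->vmobjlock, RW_WRITE | RW_DUPOK);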

2022-02-21  Jonathan Gray
interting -> inserting

2022-02-18  Mark Kettenis
Convert KVA allocation to km_alloc(9).
ok mpi@
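
For context, the km_alloc(9) idiom such conversions target, sketched with the constant modes from the manual (the specific modes used by this commit may differ):

    void *va;

    va = km_alloc(PAGE_SIZE, &kv_any, &kp_dirty, &kd_nowait);
    if (va == NULL)
        return ENOMEM;
    /* ... use the page ... */
    km_free(va, PAGE_SIZE, &kv_any, &kp_dirty);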

2022-02-15  Klemens Nanni
Backout previous "Unwire with map lock held" (commitid: SsVz7dLGFgR21kFe)
The (known) lock order reversals which now occur more reliably and much earlier on WITNESS boots with this diff knock out syzkaller reports since syzkaller stops at the first "crash report": https://syzkaller.appspot.com/bug?id=81b39e970cd2eb21b97d1b31746c693e300fd2dd

2022-02-14  Klemens Nanni
Unwire with map lock held
This is an updated version of uvm_map.c r1.283 "Unwire with map lock held". The previous version introduced a use-after-free by not unlocking vm_map locks in uvm_map_teardown(), resulting in dangling references on the reaper's lock list (thanks visa!). Lock and unlock the map around uvm_map_teardown() instead. This code path holds the last reference, hence the lock isn't strictly needed except for satisfying upcoming locking assertions. Tested on amd64, arm64, i386, macppc, octeon, sparc64. This time also with WITNESS enabled (except on sparc64 which builds but does not boot with WITNESS; this is a known issue). OK mpi visa
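
The resulting shape in uvm_map_teardown(), sketched; the lock is for the assertions, not mutual exclusion, since this path holds the last reference:

    vm_map_lock(map);
    /* ... uvm_unmap_remove() runs with the expected lock held ... */
    vm_map_unlock(map);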
2022-02-11Backout previous "Unwire with map lock held" (commitid: eQBvWUwShD91dN9Z)Klemens Nanni
WITNESS builds broke^W^Wkernels panic on boot as reported by anton and bluhm. Booting bsd.mp in single-user mode inside VMM shows:
root on sd0a (5f9e458ed30b39ab.a) swap on sd0b dump on sd0b
Enter pathname of shell or RETURN for sh:
witness: lock order reversal:
 1st 0xfffffd801f8ce468 vmmaplk (&map->lock)
 2nd 0xfffffd801b8162c0 inode (&ip->i_lock)
lock order "&ip->i_lock"(rrwlock) -> "&map->lock"(rwlock) first seen at:
#0  rw_enter_read+0x38
#1  uvmfault_lookup+0x8a
#2  uvm_fault_check+0x32
#3  uvm_fault+0xfb
#4  kpageflttrap+0x12c
#5  kerntrap+0x91
#6  alltraps_kern_meltdown+0x7b
#7  copyout+0x53
#8  ffs_read+0x1f6
#9  VOP_READ+0x41
#10 vn_rdwr+0xa1
#11 vmcmd_map_readvn+0xa0
#12 exec_process_vmcmds+0x88
#13 sys_execve+0x732
#14 start_init+0x26f
#15 proc_trampoline+0x1c
lock order data w1 -> w2 missing
# exit
kernel: protection fault trap, code=0
Stopped at witness_checkorder+0x312: movl 0x10(%r14),%ecx
gkoehler reported faults on poisoned addresses on macppc dual G5.
2022-02-11Backout previous "Assert vm map locks" (commitid: sRNBfzX2dJrxFDmb)Klemens Nanni
WITNESS builds broke as reported by anton and bluhm:
root on sd0a (5ec49b3ad23eb2d4.a) swap on sd0b dump on sd0b
kernel: protection fault trap, code=0
Stopped at witness_checkorder+0x4ec: movl 0x10(%r12),%ecx
https://syzkaller.appspot.com/bug?id=be02b290a93c648986c35370a271aad4135a5044
https://syzkaller.appspot.com/text?tag=CrashLog&x=136e9aa4700000

2022-02-10  Klemens Nanni
Assert vm map locks
Introduce vm_map_assert_{wrlock,rdlock,anylock,unlocked}() in rwlock(9) fashion and back up function comments about locking assumptions with proper assertions. Also add new comments/assertions based on code analysis and sync with NetBSD as much as possible.

vm_map_lock() and vm_map_lock_read() are used for exclusive and shared access respectively; currently no code path is purely protected by vm_map_lock_read() alone, i.e. functions called with a read lock held by the callee are also called with a write lock elsewhere. Thus only vm_map_assert_{wrlock,anylock}() are used as of now.

This should help with unlocking UVM related syscalls.

Tested as part of a larger diff through
- amd64 package bulk build by naddy
- amd64, arm64, powerpc64 base builds and regress by bluhm
- amd64 and sparc64 base builds and regress by me

Input mpi
Feedback OK kettenis
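
One of these assertions plausibly looks like the following rwlock(9)-style sketch; the committed version may treat interrupt-safe maps differently:

    void
    vm_map_assert_wrlock(struct vm_map *map)
    {
        /* Interrupt-safe maps are not rwlock-protected; skip those. */
        if ((map->flags & VM_MAP_INTRSAFE) == 0)
            rw_assert_wrlock(&map->lock);
    }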

2022-02-10  Klemens Nanni
Unwire with map lock held
uvm_unmap_remove() effectively requires its caller to lock the vm map. Even though uvm_map_teardown() is only called after a map's last reference is dropped and is thus safe from other threads accessing the map, grab the map's lock in uvm_map_teardown() to satisfy upcoming lock assertions in uvm_unmap_remove().

Tested as part of a larger diff through
- amd64 package bulk builds by naddy
- amd64, arm64, powerpc64 base builds and regress by bluhm
- amd64 and sparc64 base builds and regress by me

Feedback mpi
OK kettenis

2022-02-03  Philip Guenther
The sparc64 pmap at least requires the fault access_type to be a
subset of the requested permissions, so when forcing an initial RO fault for CoW also clamp the access_type. problem reported by bluhm@ based on a suggestion from miod@ ok kettenis@
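
In uvm_fault terms the clamp is roughly a one-liner, sketched here with assumed variable names:

    /* forcing a read-only CoW fault: keep access_type a subset of
     * the (now read-only) mapping protection */
    enter_prot &= ~PROT_WRITE;
    access_type &= enter_prot;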

2022-02-03  Klemens Nanni
Use UVM_KMF_TRYLOCK for consistency
No object change. OK millert

2022-02-01  Philip Guenther
Attempt to guarantee that on copy-on-write faulting, the new copy
can't be written to while any thread can see the original version of the page via a not-yet-flushed stale TLB entry: pmaps can indicate they do this correctly by defining __HAVE_PMAP_MPSAFE_ENTER_COW; uvm will force the initial CoW fault to be read-only otherwise.

Set that on amd64 and fix the problem case in pmap_enter() by putting a read-only mapping in place, shooting the TLB entry, then fixing it to the final read-write entry so this thread can continue without re-faulting.

reported by jsing@ from https://github.com/golang/go/issues/34988
assisted by discussion in https://reviews.freebsd.org/D14347
tweaks from jsing@ and kettenis@
ok jsing@ mpi@ kettenis@
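
On the uvm side the opt-out plausibly reduces to a guarded clamp like this sketch (the symbol is from the message; the surrounding code is assumed):

    #ifndef __HAVE_PMAP_MPSAFE_ENTER_COW
        /* pmap can't promote RO->RW safely vs. stale TLB entries:
         * make the first fault on the fresh copy read-only */
        access_type &= ~PROT_WRITE;
    #endif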

2022-01-29  Kenji Aoyama
Fix macro name in comment.
ok visa@

2022-01-19  Klemens Nanni
Grab the kernel lock in uvm_wxcheck() when aborting the process
kern.wxabort=1 logs and kills programs after W^X violations. At least sigexit() -> coredump() as well as the non-atomic increment of ps_wxcounter require protection, so grab the big lock for the entire block. This is part of the effort to unlock mmap(2)'s MAP_ANON case. Feedback mvs claudio kettenis deraadt OK kettenis
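
The protected block, sketched from the message (the real function has more context, and sigexit() does not return):

    KERNEL_LOCK();
    pr->ps_wxcounter++;   /* non-atomic increment needs the lock */
    log(LOG_NOTICE, "%s(%d): W^X violation\n", pr->ps_comm, pr->ps_pid);
    sigexit(p, SIGABRT);  /* -> coredump(), also under the big lock */
    KERNEL_UNLOCK();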

2022-01-19  Martin Pieuchot
Comment out an incorrect lock assertion.
The swap code path in uvm_aio_aiodone() is not holding the corresponding page lock and shouldn't as long as anons are locked inside uvm_page_unbusy() to handle the PG_RELEASED case. Reported by Ralf Horstmann on bugs@

2022-01-17  Martin Pieuchot
Call uvm_pglistfree(9) instead of uvm_pmr_freepageq().
There is no functional change as the former is just a wrapper around the latter. However the upper layers of UVM do not need to mess with the internals of the page allocator. This will also help when a page cache is introduced to reduce contention on the global mutex serializing access to pmemrange's data. ok kettenis@, kn@, tb@

2022-01-05  Philip Guenther
Remove kbind(2)'s restriction that a target buffer not cross page
boundaries: hppa has 8-byte PLT entries that sometimes do that. ok kettenis@

2021-12-29  Martin Pieuchot
Consistently name page argument `pg'.
Reduce differences with NetBSD, no functional changes.

2021-12-28  Martin Pieuchot
Unlock bottom part of the fault handler.
Tested by many during the past months, thanks! ok sthen@

2021-12-21  Philip Guenther
Roll the syscalls that have an off_t argument to remove the explicit padding.
Switch libc and ld.so to the generic stubs for these calls.

WARNING: reboot to updated kernel before installing libc or ld.so!

Time for a story...

When gcc (back in 1.x days) first implemented long long, it didn't (always) pass 64bit arguments in 'aligned' registers/stack slots, with the result that argument offsets didn't match structure offsets. This affected the nine system calls that pass off_t arguments:
ftruncate lseek mmap mquery pread preadv pwrite pwritev truncate

To avoid having to do custom ASM wrappers for those, BSD put an explicit pad argument in so that the off_t argument would always start on an even slot and thus be naturally aligned. Thus those odd wrappers in lib/libc/sys/ that use __syscall() and pass an extra '0' argument.

The ABIs for different CPUs eventually settled how things should be passed on each and gcc 2.x followed them. The only arch now where it helps is landisk, which needs to skip the last argument register if it would be the first half of a 64bit argument.

So: add new syscalls without the pad argument and on landisk do that skipping directly in the syscall handler in the kernel. Keep compat support for the existing syscalls long enough for the transition.

ok deraadt@
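
For illustration, the kind of padded stub being retired looked roughly like this (per the description of the lib/libc/sys/ wrappers above; details are a sketch):

    off_t
    lseek(int fd, off_t offset, int whence)
    {
        /* the literal 0 pads the argument list so the 64-bit off_t
         * starts on an even register/stack slot */
        return __syscall((quad_t)SYS_lseek, fd, 0, offset, whence);
    }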

2021-12-21  Martin Pieuchot
Fix a typo in mlock(2) error path triggering a double-free.
Pass the correct entry to uvm_fault_unwire_locked(). Reported-by: syzbot+bb2f63f076618e9ed0d3@syzkaller.appspotmail.com ok kettenis@, deraadt@

2021-12-17  Martin Pieuchot
Do not try to unlock a NULL object.
Fix a NULL dereference introduced in the previous commit, reported by anton@ and Benjamin Baier. Reported-by: syzbot+c172bd335801b67e515b@syzkaller.appspotmail.com

2021-12-15  Martin Pieuchot
Use a per-UVM object lock to serialize the lower part of the fault handler.
Like the per-amap lock, the `vmobjlock' is principally used to serialize access to objects in the fault handler to allow faults occurring on different CPUs and different objects to be processed in parallel.

The fault handler now acquires the `vmobjlock' of a given UVM object as soon as it finds one. For now a write-lock is always acquired even if some operations could use a read-lock.

Every pager, corresponding to a different kind of UVM object, now expects the UVM object to be locked and some operations, like *_get(), return it unlocked. This is enforced by assertions checking for rw_write_held().

The KERNEL_LOCK() is now pushed to the VFS boundary in the vnode pager.

To ensure the correct amap or object lock is held when modifying a page many uvm_page* operations are now asserting for the "owner" lock. However, fields of the "struct vm_page" are still being protected by the global `pageqlock'. To prevent lock ordering issues with the new `vmobjlock' and to reduce differences with NetBSD this lock is now taken and released for each page instead of around the whole loop.

This commit does not remove the KERNEL_LOCK/UNLOCK() dance. Unlocking will follow if there is no fallout.

Ported from NetBSD, tested by many, thanks!

ok kettenis@, kn@
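
The contract described above, as a hedged sketch of a fault-handler call into a pager (argument values illustrative):

    rw_enter(uobj->vmobjlock, RW_WRITE);
    KASSERT(rw_write_held(uobj->vmobjlock));
    /* pgo_get() consumes the lock: the object comes back unlocked */
    error = uobj->pgops->pgo_get(uobj, offset, pps, &npages, 0,
        access_type, advice, PGO_SYNCIO);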

2021-12-12  Visa Hankala
Add vnode parameter to VOP_STRATEGY()
Pass the device vnode as a parameter to VOP_STRATEGY() to allow calling the correct vop_strategy callback. Now the vnode is also available in the callback. OK mpi@
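
The new interface, sketched (previously only the buf was passed and the device vnode had to be derived from it; argument names assumed):

    int VOP_STRATEGY(struct vnode *vp, struct buf *bp);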
2021-12-10Revert "kbind(2): disable system call if not initialized beforePhilip Guenther
first __tfork(2)" The immediate issue is that a process linked with -znow will still perform lazy relocation on objects loaded with dlopen(), but there are possibly other dark corners to plumb to find a better invariant. Problem reported by thfr@

2021-12-07  Theo de Raadt
uvm_map_inentry() is provided a format string that says "inside", but then
prints the end which is in the next page. Subtract 1 to avoid confusion.

2021-12-07  Scott Soule Cheloha
uvn_reference(): correct printf(9) argument order
Thread: https://marc.info/?l=openbsd-tech&m=163884527530326&w=2 ok deraadt@

2021-12-05  Scott Soule Cheloha
kbind(2): disable system call if not initialized before first __tfork(2)
To unlock kbind(2) we need to protect ps_kbind_addr and ps_kbind_cookie. The simplest way to do this is to disallow kbind(2) initialization after the first __tfork(2) call. If the first thread does not initialize the kbind(2) variables before __tfork(2) then we disable kbind(2) during that first __tfork(2) call. This is guenther@'s patch, I'm just committing it. Discussed with guenther@, deraadt@, kettenis@, and mpi@. ok kettenis@, positive response from mpi@, "I am busy" guenther@
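
A hedged sketch of the rule described above; the field names come from the message, while the "disabled" marker and placement are hypothetical illustrations:

    /* in __tfork(2), before the process gains a second thread: */
    if (pr->ps_kbind_addr == 0)
        pr->ps_kbind_addr = (u_long)-1;  /* hypothetical marker:
                                          * kbind(2) permanently off */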

2021-11-11  Theo Buehler
Convert a for loop into LIST_FOREACH to reduce the diff to NetBSD.
ok millert mpi
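
Generic shape of such a conversion with queue(3); the list head and field names here are illustrative, not the ones in the commit:

    /* before: open-coded traversal */
    for (pg = LIST_FIRST(&head); pg != NULL; pg = LIST_NEXT(pg, list))
        process(pg);

    /* after: the equivalent queue(3) idiom */
    LIST_FOREACH(pg, &head, list)
        process(pg);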