path: root/sys/uvm

2022-02-22  Delete unnecessary #includes of <sys/domain.h> and/or <sys/protosw.h>  (Philip Guenther)
net/if_pppx.c pointed out by jsg@
ok gnezdo@ deraadt@ jsg@ mpi@ millert@

2022-02-21  Grab vmobjlocks with RW_DUPOK in uvm_obj_wire() to silence WITNESS  (Klemens Nanni)
The drm subsystem implements graphics buffers as uvm objects backed by
anonymous memory, thus drm locks and aobj locks share the same
"vmobjlock" type.

uvm_obj_wire() is only called from sys/dev/pci/drm/, so instead of
changing drm's lock init/alloc routines to allow duplicate locks in
general, enter uvm's vmobjlock with RW_DUPOK in this function to allow
duplicate lock types per thread in this specific call path alone.

Fixes the following WITNESS report when booting/starting X (as seen
already in other unrelated bugs@ reports):

wsdisplay0: screen 1-5 added (std, vt100 emulation)
witness: acquiring duplicate lock of same type: "&uobj->vmobjlock"
 1st uobjlk
 2nd uobjlk
Starting stack trace...
witness_checkorder(fffffd83b625f9b0,9,0) at witness_checkorder+0x8ac
rw_enter(fffffd83b625f9a0,1) at rw_enter+0x68
uvm_obj_wire(fffffd843c39e948,0,40000,ffff800033b70428) at uvm_obj_wire+0x46
shmem_get_pages(ffff800008008500) at shmem_get_pages+0xb8
__i915_gem_object_get_pages(ffff800008008500) at __i915_gem_object_get_pages+0x6d
i915_gem_fault(ffff800008008500,ffff800033b707c0,10009b000,a43d6b1c000,ffff800033b70740,1,35ba896911df1241,ffff8000000aa078,ffff8000000aa178) at i915_gem_fault+0x203
drm_fault(ffff800033b707c0,a43d6b1c000,ffff800033b70740,1,0,0,7eca45006f70ee0,ffff800033b707c0) at drm_fault+0x156
uvm_fault(fffffd843a7cf480,a43d6b1c000,0,2) at uvm_fault+0x179
upageflttrap(ffff800033b70920,a43d6b1c000) at upageflttrap+0x62
usertrap(ffff800033b70920) at usertrap+0x129
recall_trap() at recall_trap+0x8
end of kernel
end trace frame: 0x7f7ffffdc7c0, count: 246
End of stack trace.

Input kettenis
OK mpi

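A minimal sketch of the pattern this commit describes, assuming the
uvm_object keeps its lock as a `vmobjlock' pointer as in the trace above;
error handling and the actual wiring loop are elided:

    /* Take the object lock, telling WITNESS a same-type lock is OK here. */
    rw_enter(uobj->vmobjlock, RW_WRITE | RW_DUPOK);
    /* ... wire the pages backing the object ... */
    rw_exit(uobj->vmobjlock);
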
2022-02-21  interting -> inserting  (Jonathan Gray)

2022-02-18  Convert KVA allocation to km_alloc(9).  (Mark Kettenis)
ok mpi@

2022-02-15  Backout previous "Unwire with map lock held" (commitid: SsVz7dLGFgR21kFe)  (Klemens Nanni)
The (known) lock order reversals which now occur more reliably and much
earlier on WITNESS boots with this diff knock out syzkaller reports,
since syzkaller stops at the first "crash report":
https://syzkaller.appspot.com/bug?id=81b39e970cd2eb21b97d1b31746c693e300fd2dd

2022-02-14  Unwire with map lock held  (Klemens Nanni)
This is an updated version of uvm_map.c r1.283 "Unwire with map lock held".

The previous version introduced a use-after-free by not unlocking vm_map
locks in uvm_map_teardown(), resulting in dangling references on the
reaper's lock list (thanks visa!).

Lock and unlock the map inside uvm_map_teardown() instead.

This code path holds the last reference, hence the lock isn't strictly
needed except for satisfying upcoming locking assertions.

Tested on amd64, arm64, i386, macppc, octeon, sparc64.
This time also with WITNESS enabled (except on sparc64, which builds but
does not boot with WITNESS; this is a known issue).

OK mpi visa

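A sketch of the corrected flow, with the map locked only around the
entry removal; the boolean arguments to uvm_unmap_remove() are assumed
for illustration:

    void
    uvm_map_teardown_sketch(struct vm_map *map)
    {
        struct uvm_map_deadq dead_entries;

        TAILQ_INIT(&dead_entries);
        vm_map_lock(map);
        /* satisfies the locking assertions in uvm_unmap_remove() */
        uvm_unmap_remove(map, map->min_offset, map->max_offset,
            &dead_entries, FALSE, TRUE);
        vm_map_unlock(map);     /* unlock before handing the entries off */
        uvm_unmap_detach(&dead_entries, 0);
    }
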
2022-02-11  Backout previous "Unwire with map lock held" (commitid: eQBvWUwShD91dN9Z)  (Klemens Nanni)
WITNESS builds broke^W^Wkernels panic on boot as reported by anton and
bluhm.

Booting bsd.mp in single-user mode inside VMM shows:

root on sd0a (5f9e458ed30b39ab.a) swap on sd0b dump on sd0b
Enter pathname of shell or RETURN for sh:
witness: lock order reversal:
 1st 0xfffffd801f8ce468 vmmaplk (&map->lock)
 2nd 0xfffffd801b8162c0 inode (&ip->i_lock)
lock order "&ip->i_lock"(rrwlock) -> "&map->lock"(rwlock) first seen at:
#0  rw_enter_read+0x38
#1  uvmfault_lookup+0x8a
#2  uvm_fault_check+0x32
#3  uvm_fault+0xfb
#4  kpageflttrap+0x12c
#5  kerntrap+0x91
#6  alltraps_kern_meltdown+0x7b
#7  copyout+0x53
#8  ffs_read+0x1f6
#9  VOP_READ+0x41
#10 vn_rdwr+0xa1
#11 vmcmd_map_readvn+0xa0
#12 exec_process_vmcmds+0x88
#13 sys_execve+0x732
#14 start_init+0x26f
#15 proc_trampoline+0x1c
lock order data w1 -> w2 missing
# exit
kernel: protection fault trap, code=0
Stopped at      witness_checkorder+0x312:       movl    0x10(%r14),%ecx

gkoehler reported faults on poisoned addresses on macppc dual G5.

2022-02-11  Backout previous "Assert vm map locks" (commitid: sRNBfzX2dJrxFDmb)  (Klemens Nanni)
WITNESS builds broke as reported by anton and bluhm:

root on sd0a (5ec49b3ad23eb2d4.a) swap on sd0b dump on sd0b
kernel: protection fault trap, code=0
Stopped at      witness_checkorder+0x4ec:       movl    0x10(%r12),%ecx

https://syzkaller.appspot.com/bug?id=be02b290a93c648986c35370a271aad4135a5044
https://syzkaller.appspot.com/text?tag=CrashLog&x=136e9aa4700000

2022-02-10  Assert vm map locks  (Klemens Nanni)
Introduce vm_map_assert_{wrlock,rdlock,anylock,unlocked}() in rwlock(9)
fashion and back up function comments about locking assumptions with
proper assertions.

Also add new comments/assertions based on code analysis and sync with
NetBSD as much as possible.

vm_map_lock() and vm_map_lock_read() are used for exclusive and shared
access respectively; currently no code path is purely protected by
vm_map_lock_read() alone, i.e. functions called with a read lock held by
the caller are also called with a write lock elsewhere.  Thus only
vm_map_assert_{wrlock,anylock}() are used as of now.

This should help with unlocking UVM related syscalls.

Tested as part of a larger diff through
- amd64 package bulk build by naddy
- amd64, arm64, powerpc64 base builds and regress by bluhm
- amd64 and sparc64 base builds and regress by me

Input mpi
Feedback OK kettenis

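A sketch of what such assertions can look like in rwlock(9) fashion; the
VM_MAP_INTRSAFE guard mirrors how non-sleeping maps are usually excluded
and is an assumption here:

    void
    vm_map_assert_wrlock(struct vm_map *map)
    {
        if ((map->flags & VM_MAP_INTRSAFE) == 0)
            rw_assert_wrlock(&map->lock);
    }

    void
    vm_map_assert_anylock(struct vm_map *map)
    {
        if ((map->flags & VM_MAP_INTRSAFE) == 0)
            rw_assert_anylock(&map->lock);
    }
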
2022-02-10  Unwire with map lock held  (Klemens Nanni)
uvm_unmap_remove() effectively requires its caller to lock the vm map.

Even though uvm_map_teardown() is only called after a map's last
reference is dropped and is thus safe from other threads accessing the
map, grab the map's lock in uvm_map_teardown() to satisfy upcoming lock
assertions in uvm_unmap_remove().

Tested as part of a larger diff through
- amd64 package bulk builds by naddy
- amd64, arm64, powerpc64 base builds and regress by bluhm
- amd64 and sparc64 base builds and regress by me

Feedback mpi
OK kettenis

2022-02-03  The sparc64 pmap at least requires the fault access_type to be a subset of the requested permissions, so when forcing an initial RO fault for CoW also clamp the access_type.  (Philip Guenther)
problem reported by bluhm@
based on a suggestion from miod@
ok kettenis@

2022-02-03  Use UVM_KMF_TRYLOCK for consistency  (Klemens Nanni)
No object change.
OK millert

2022-02-01  Attempt to guarantee that on copy-on-write faulting, the new copy can't be written to while any thread can see the original version of the page via a not-yet-flushed stale TLB entry  (Philip Guenther)
pmaps can indicate they do this correctly by defining
__HAVE_PMAP_MPSAFE_ENTER_COW; uvm will force the initial CoW fault to
be read-only otherwise.

Set that on amd64 and fix the problem case in pmap_enter() by putting
a read-only mapping in place, shooting the TLB entry, then fixing it
to the final read-write entry so this thread can continue without
re-faulting.

reported by jsing@ from https://github.com/golang/go/issues/34988
assisted by discussion in https://reviews.freebsd.org/D14347
tweaks from jsing@ and kettenis@
ok jsing@ mpi@ kettenis@

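A sketch of the uvm side of this logic, folding in the access_type clamp
from the follow-up commit above; variable names are illustrative, not
the exact uvm_fault() code:

    /*
     * Sketch: unless the pmap guarantees TLB-safe CoW entry, force the
     * initial fault on the copied page to be read-only.
     */
    void
    cow_clamp_sketch(vm_prot_t *enter_prot, vm_prot_t *access_type)
    {
    #ifndef __HAVE_PMAP_MPSAFE_ENTER_COW
        *enter_prot &= ~PROT_WRITE;     /* map the copy read-only first */
        *access_type &= *enter_prot;    /* access_type stays a subset */
    #endif
    }
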
2022-01-29  Fix macro name in comment.  (Kenji Aoyama)
ok visa@

2022-01-19  Grab the kernel lock in uvm_wxcheck() when aborting the process  (Klemens Nanni)
kern.wxabort=1 logs and kills programs after W^X violations.  At least
sigexit() -> coredump() as well as the non-atomic increment of
ps_wxcounter require protection, so grab the big lock for the entire
block.

This is part of the effort to unlock mmap(2)'s MAP_ANON case.

Feedback mvs claudio kettenis deraadt
OK kettenis

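A sketch of the locked abort block described above; the message text and
exact layout are illustrative:

    KERNEL_LOCK();
    pr->ps_wxcounter++;                 /* non-atomic, needs the lock */
    log(LOG_NOTICE, "%s(%d): W^X violation\n", pr->ps_comm, pr->ps_pid);
    sigexit(p, SIGABRT);                /* coredump() path, also locked */
    KERNEL_UNLOCK();                    /* NOTREACHED after sigexit() */
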
2022-01-19  Comment out an incorrect lock assertion.  (Martin Pieuchot)
The swap code path in uvm_aio_aiodone() is not holding the corresponding
page lock and shouldn't as long as anons are locked inside
uvm_page_unbusy() to handle the PG_RELEASED case.

Reported by Ralf Horstmann on bugs@

2022-01-17  Call uvm_pglistfree(9) instead of uvm_pmr_freepageq().  (Martin Pieuchot)
There is no functional change as the former is just a wrapper around the
latter.  However, the upper layers of UVM do not need to mess with the
internals of the page allocator.

This will also help when a page cache is introduced to reduce contention
on the global mutex serializing access to pmemrange's data.

ok kettenis@, kn@, tb@

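Per the commit message the wrapper is trivial; its shape is simply:

    void
    uvm_pglistfree(struct pglist *list)
    {
        uvm_pmr_freepageq(list);
    }
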
2022-01-05  Remove kbind(2)'s restriction that a target buffer not cross page boundaries: hppa has 8-byte PLT entries that sometimes do that.  (Philip Guenther)
ok kettenis@

2021-12-29  Consistently name page argument `pg'.  (Martin Pieuchot)
Reduce differences with NetBSD, no functional changes.

2021-12-28  Unlock bottom part of the fault handler.  (Martin Pieuchot)
Tested by many during the past months, thanks!
ok sthen@

2021-12-23  Roll the syscalls that have an off_t argument to remove the explicit padding.  (Philip Guenther)
Switch libc and ld.so to the generic stubs for these calls.

WARNING: reboot to updated kernel before installing libc or ld.so!

Time for a story...

When gcc (back in 1.x days) first implemented long long, it didn't
(always) pass 64bit arguments in 'aligned' registers/stack slots, with
the result that argument offsets didn't match structure offsets.  This
affected the nine system calls that pass off_t arguments:
    ftruncate lseek mmap mquery pread preadv pwrite pwritev truncate

To avoid having to do custom ASM wrappers for those, BSD put an explicit
pad argument in so that the off_t argument would always start on an even
slot and thus be naturally aligned.  Thus those odd wrappers in
lib/libc/sys/ that use __syscall() and pass an extra '0' argument.

The ABIs for different CPUs eventually settled how things should be
passed on each and gcc 2.x followed them.  The only arch now where it
helps is landisk, which needs to skip the last argument register if it
would be the first half of a 64bit argument.

So: add new syscalls without the pad argument and on landisk do that
skipping directly in the syscall handler in the kernel.  Keep compat
support for the existing syscalls long enough for the transition.

ok deraadt@

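The old padded wrappers being removed looked roughly like this (cf.
lib/libc/sys/; reconstructed for illustration, not verbatim):

    off_t
    lseek(int fd, off_t offset, int whence)
    {
        /* the extra 0 keeps the 64-bit off_t on an even argument slot */
        return __syscall((quad_t)SYS_lseek, fd, 0, offset, whence);
    }
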
2021-12-21  Fix a typo in mlock(2) error path triggering a double-free.  (Martin Pieuchot)
Pass the correct entry to uvm_fault_unwire_locked().

Reported-by: syzbot+bb2f63f076618e9ed0d3@syzkaller.appspotmail.com
ok kettenis@, deraadt@

2021-12-17  Do not try to unlock a NULL object.  (Martin Pieuchot)
Fix a NULL dereference introduced in previous, reported by anton@ and
Benjamin Baier.

Reported-by: syzbot+c172bd335801b67e515b@syzkaller.appspotmail.com

2021-12-15  Use a per-UVM object lock to serialize the lower part of the fault handler.  (Martin Pieuchot)
Like the per-amap lock, the `vmobjlock' is principally used to serialize
access to objects in the fault handler to allow faults occurring on
different CPUs and different objects to be processed in parallel.

The fault handler now acquires the `vmobjlock' of a given UVM object as
soon as it finds one.  For now a write-lock is always acquired even if
some operations could use a read-lock.

Every pager, corresponding to a different kind of UVM object, now
expects the UVM object to be locked and some operations, like *_get(),
return it unlocked.  This is enforced by assertions checking for
rw_write_held().

The KERNEL_LOCK() is now pushed to the VFS boundary in the vnode pager.

To ensure the correct amap or object lock is held when modifying a page,
many uvm_page* operations are now asserting for the "owner" lock.
However, fields of the "struct vm_page" are still being protected by the
global `pageqlock'.  To prevent lock ordering issues with the new
`vmobjlock' and to reduce differences with NetBSD, this lock is now
taken and released for each page instead of around the whole loop.

This commit does not remove the KERNEL_LOCK/UNLOCK() dance.  Unlocking
will follow if there is no fallout.

Ported from NetBSD, tested by many, thanks!

ok kettenis@, kn@

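The contract in miniature, assuming `vmobjlock' is a pointer in struct
uvm_object as in the WITNESS trace earlier in this log:

    rw_enter(uobj->vmobjlock, RW_WRITE);  /* fault handler locks the uobj */
    rw_assert_wrlock(uobj->vmobjlock);    /* what pagers now assert */
    /* ... look up or modify pages owned by uobj ... */
    rw_exit(uobj->vmobjlock);             /* some ops (*_get) return unlocked */
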
2021-12-12  Add vnode parameter to VOP_STRATEGY()  (Visa Hankala)
Pass the device vnode as a parameter to VOP_STRATEGY() to allow calling
the correct vop_strategy callback.  Now the vnode is also available in
the callback.

OK mpi@

2021-12-10  Revert "kbind(2): disable system call if not initialized before first __tfork(2)"  (Philip Guenther)
The immediate issue is that a process linked with -znow will still
perform lazy relocation on objects loaded with dlopen(), but there are
possibly other dark corners to plumb to find a better invariant.

Problem reported by thfr@

2021-12-07  uvm_map_inentry() is provided a format string that says "inside", but then prints the end which is in the next page; subtract 1 to avoid confusion.  (Theo de Raadt)

2021-12-07  uvn_reference(): correct printf(9) argument order  (Scott Soule Cheloha)
Thread: https://marc.info/?l=openbsd-tech&m=163884527530326&w=2

ok deraadt@

2021-12-05  kbind(2): disable system call if not initialized before first __tfork(2)  (Scott Soule Cheloha)
To unlock kbind(2) we need to protect ps_kbind_addr and ps_kbind_cookie.
The simplest way to do this is to disallow kbind(2) initialization after
the first __tfork(2) call.  If the first thread does not initialize the
kbind(2) variables before __tfork(2) then we disable kbind(2) during
that first __tfork(2) call.

This is guenther@'s patch, I'm just committing it.

Discussed with guenther@, deraadt@, kettenis@, and mpi@.

ok kettenis@, positive response from mpi@, "I am busy" guenther@

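A hedged sketch of the check during the first __tfork(2); the sentinel
name is hypothetical, the real patch may use a different marker:

    #define KBIND_DISABLED  ((vaddr_t)-1)   /* hypothetical sentinel */

    /* in the first __tfork(2): if kbind(2) was never set up, turn it off */
    if (pr->ps_kbind_addr == 0)
        pr->ps_kbind_addr = KBIND_DISABLED;
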
2021-11-11  Convert a for loop into LIST_FOREACH to reduce the diff to NetBSD.  (Theo Buehler)
ok millert mpi

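The shape of such a conversion (sys/queue.h); the list head and member
names here are illustrative:

    struct vm_amap *amap;

    /* before: for (amap = LIST_FIRST(&amap_list); amap != NULL;
     *              amap = LIST_NEXT(amap, am_list)) */
    LIST_FOREACH(amap, &amap_list, am_list) {
        /* per-amap work */
    }
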
2021-10-24  Move pmap_{,k}remove() inside uvm_km_pgremove{,_intrsafe}().  (Martin Pieuchot)
Reduce differences with NetBSD, tested by many as part of a larger diff.

ok kettenis@

2021-10-24  Shuffle variables around and use KASSERT() instead of panic().  (Martin Pieuchot)
No functional change.  Reduce differences with NetBSD, tested by many as
part of a larger diff.

2021-10-23  Sprinkle uvm_obj_destroy() over UVM object recycling code.  (Martin Pieuchot)
For now, only assert that the tree of pages is empty in
uvm_obj_destroy().  This will soon be used to free the per-UVM object
lock.

While here call uvm_obj_init() when new vnodes are allocated instead of
in uvn_attach().  Because vnodes and their associated UVM objects are
currently never freed, it isn't easy to know where/when to garbage
collect the associated lock.  So simply check that the reference count
of a given object is 0 in uvn_attach().

Tested by many as part of a bigger diff.

ok kettenis@

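For now the destroy hook only checks emptiness; a sketch, assuming the
RBT page-tree names from uvm_object.h:

    void
    uvm_obj_destroy(struct uvm_object *uobj)
    {
        KASSERT(RBT_EMPTY(uvm_objtree, &uobj->memt));
    }
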
2021-10-20  revert vnode: remove VLOCKSWORK and check locking when vop_islocked != nullop  (Sebastien Marie)
(both kernel and userland bits)

GENERIC + VFSLCKDEBUG is broken with it.

2021-10-19  vnode: remove VLOCKSWORK and check locking when vop_islocked != nullop  (Sebastien Marie)
This flag is currently used to mark or unmark a vnode to actively check
vnode locking semantics (when compiled with VFSLCKDEBUG).

Currently, the VLOCKSWORK flag isn't properly set for several FS
implementations which have full locking support.  This commit enables
proper checking for them too (cd9660, udf, fuse, msdosfs, tmpfs).

Instead of using a particular flag, it directly checks whether
v_op->vop_islocked is nullop to decide if the vnode locking checks
apply.

ok mpi@

2021-10-17  km_alloc(9) needs to be passed a size that is a multiple of PAGE_SIZE.  (Patrick Wildt)
ok mpi@

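A minimal example of honoring that rule with round_page() and the
standard km_alloc(9) modes:

    void *
    kva_alloc_sketch(size_t len)
    {
        /* km_alloc(9) requires a multiple of PAGE_SIZE */
        return km_alloc(round_page(len), &kv_any, &kp_dirty, &kd_waitok);
    }
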
2021-10-12  Introduce a dummy uvm_obj_destroy() interface.  (Mark Kettenis)
This function will be used in the near future (by mpi@) to improve the
locking for uvm objects.  Introducing this function now will allow me to
call it in the appropriate place in the drm code.

ok mpi@, jsg@

2021-10-12  Fix the deadlock between uvn_io() and uvn_flush() by restarting the fault.  (Martin Pieuchot)
Do not allow a faulting thread to sleep on a contended vnode lock to
prevent lock ordering issues with the upcoming per-uobj lock.

Also reduce the sleep value for VM_PAGER_AGAIN from 1sec to 5nsec to not
add visible slowdown when starting a multi-threaded application with
threads that fault on the same vnode (chromium, firefox, etc).

Tested by anton@, tb@, robert@ and gnezdo@

ok anton@, tb@

Reported-by: syzbot+e63407b35dff08dbee02@syzkaller.appspotmail.com

2021-10-12  Revert the fix for the deadlock between uvn_io() and uvn_flush().  (Martin Pieuchot)
This fix (ab)uses the vnode lock to serialize access to some fields of
the pages associated with a UVM vnode object, and this will create new
deadlocks with the introduction of a per-uobj lock.

ok anton@

2021-10-05  Unref/free amaps before grabbing the KERNEL_LOCK().  (Martin Pieuchot)
This is possible now that amaps & anons are protected by a per-map
rwlock.

Tested by many as part of a bigger diff.

ok kettenis@

2021-09-05  Introduce dummy pagers for 'special' subsystems using UVM objects.  (Martin Pieuchot)
Some pmaps (x86, hppa) and the buffer cache rely on UVM objects to
allocate and manipulate pages.  These objects should not be manipulated
by uvm_fault() and do not currently require the same locking enforcement.

Use the dummy pagers to explicitly document which UVM functions are
meant to manipulate UVM objects (uobj) that do not need the upcoming
`vmobjlock' and instead still rely on the KERNEL_LOCK().

Tested by many as part of a larger diff.

ok kettenis@, beck@

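A dummy pager is just a pager-ops table with no methods; a sketch (the
variable name is illustrative):

    /* Pages of objects using this pager are never seen by uvm_fault(). */
    const struct uvm_pagerops pmap_pager = {
        NULL    /* all methods NULL */
    };
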
2021-08-30  Fix a locking assertion in error path.  (Martin Pieuchot)
In amap_copy() make the new amap share the source amap's lock right at
the beginning and only allocate a new one if no anons have been
referenced.

Issue reported by Thomas L. <tom.longshine at web dot de> on bugs@.

ok tb@

2021-06-29  remove arch ifdefs around drm.h include  (Jonathan Gray)
ok deraadt@ kettenis@

2021-06-28  Make anonymous object reference counting independent from the KERNEL_LOCK().  (Martin Pieuchot)
- Use atomic operations for increment/decrement
- Rewrite the loop from uao_swap_off() to only keep a reference to the
  next item in the list.

ok jmatthew@

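A sketch of the refcount side using atomic(9) operations; the detach
half is abridged:

    void
    uao_reference_sketch(struct uvm_object *uobj)
    {
        atomic_inc_int(&uobj->uo_refs);
    }

    void
    uao_detach_sketch(struct uvm_object *uobj)
    {
        if (atomic_dec_int_nv(&uobj->uo_refs) > 0)
            return;     /* references remain */
        /* ... last reference: free the aobj's pages and structure ... */
    }
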
2021-06-25  basic radeondrm / X support for riscv64.  (Matthieu Herrb)
- add wscons devices
- build radeondrm and add MD uvm bits to support it.

Ok kettenis@

2021-06-17  Revert previous: unref of amap outside of the KERNEL_LOCK().  (Martin Pieuchot)
This change introduced or exposed a leak of anons which results in
system freezes.

anton@ observed a high number of INUSE for anonpl and semarie@ saw
multiple processes waiting in the fault handler on "flt_noramX",
probably the one related to allocating an anon.

2021-06-16  Change the prefix of UVM object functions to match NetBSD's.  (Martin Pieuchot)
For example uvm_objinit() becomes uvm_obj_init().

Reduce differences between the trees and help porting new functions
needed for UVM object locking.

No functional change.

2021-06-15  Use a macro to assert that given uobjs correspond to anonymous objects.  (Martin Pieuchot)
Reduce the difference with NetBSD.

ok kettenis@

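A plausible form of such a macro, following the NetBSD idiom of
identifying anonymous objects by their pager ops (names assumed):

    #define UVM_OBJ_IS_AOBJ(uobj)   ((uobj)->pgops == &aobj_pager)
    #define UVM_ASSERT_AOBJ(uobj)   KASSERT(UVM_OBJ_IS_AOBJ(uobj))
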
2021-06-15  Unref/free amaps before grabbing the KERNEL_LOCK().  (Martin Pieuchot)
This is possible now that amaps & anons are protected by a per-map
rwlock.

ok kettenis@, jmatthew@

2021-05-31  call drmbackoff() on powerpc64 as well  (Jonathan Gray)
ok kettenis@