Age | Commit message | Author |
|
nothing uses these functions anymore.
ok mpi@
|
|
by uvm_pglistalloc(9) does a similar check already.
ok mpi@
|
|
ok mpi@
|
|
uvm_km_valloc_align(9) are no longer used. Remove these functions.
ok mpi@
|
|
ok beck@
|
|
ok kettenis@
|
|
ok millert@, kettenis@
|
|
when a memory mapped file cannot be written to disk, e.g. if the
file system is full. Too much printf() during kernel relinking
slows down the system boot.
OK deraadt@
|
|
Another thread can set the bit if we sleep during rw_enter(9) in which case
the page shouldn't be touched.
ok semarie@
|
|
|
|
Having fewer places manipulating the global list of active/inactive pages
will help future LRU improvements.
ok kettenis@, kn@
|
|
Prevent a small window where a check could be incorrect in case an error
occurs in uvm_swap_io().
ok kettenis@, kn@
|
|
uvn_detach() sets the UVM_VNODE_RELKILL flag and waits for all async I/O to
finish, but uvm_vnp_terminate() could clear the flag and take over the vnode.
mpi@ noted that this code path is mostly dead code because there is no "async
I/O" (uvn_io() is always synchronous).
ok visa@ mpi@
|
|
|
|
case where not all pages are wired. The KASSERT can be triggered in
multi-threaded applications when a thread calling munmap(2) races another
thread that invokes sysctl(2). Properly written code shouldn't do this,
but making the kernel crash in this case is a bit harsh.
ok gnezdo@, deraadt@
Fixes:
Reported-by: syzbot+e8310909e2910c9cca08@syzkaller.appspotmail.com
|
|
If the allocation fails due to memory pressure no time is wasted doing
encryption. This also simplifies the error path.
Tested by sthen@.
ok kn@, miod@, kettenis@, tb@
|
|
No functional change.
|
|
This introduced a lock ordering issue reported by naddy@, anton@ and syzkaller.
Reported-by: syzbot+739bb901045d9b193bde@syzkaller.appspotmail.com
|
|
to prevent another thread from unmapping the memory and triggering
an assertion or even corrupting random physical memory pages.
This fix is similar to the change in uvm_glue.c rev. 1.74. However in this
case we need to be careful since some sysctl(2) calls look at the map of
the current process. In those cases we must not attempt to lock the map
again.
ok mpi@
Should fix:
Reported-by: syzbot+be89fe83d6c004fcb412@syzkaller.appspotmail.com
|
|
Should prevent a KASSERT() from triggering when freeing an anon after swapping
out its memory.
This code path has been broken since at least January 2021 and is apparently not
so easy to trigger.
Found the hard way by sthen@
ok kettenis@, kn@
|
|
net/if_pppx.c pointed out by jsg@
ok gnezdo@ deraadt@ jsg@ mpi@ millert@
|
|
The drm subsystem implements graphics buffers as uvm objects backed by
anonymous memory, thus drm locks and aobj locks share the same "vmobjlock"
type.
uvm_obj_wire() is only called from sys/dev/pci/drm/, so instead of changing
drm's lock init/alloc routines to allow duplicate locks in general,
enter uvm's vmobjlock with RW_DUPOK in this function to allow duplicate
lock types per thread in this specific call path alone.
Fixes the following WITNESS report when booting/starting X (as seen already
in other unrelated bugs@ reports):
wsdisplay0: screen 1-5 added (std, vt100 emulation)
witness: acquiring duplicate lock of same type: "&uobj->vmobjlock"
1st uobjlk
2nd uobjlk
Starting stack trace...
witness_checkorder(fffffd83b625f9b0,9,0) at witness_checkorder+0x8ac
rw_enter(fffffd83b625f9a0,1) at rw_enter+0x68
uvm_obj_wire(fffffd843c39e948,0,40000,ffff800033b70428) at uvm_obj_wire+0x46
shmem_get_pages(ffff800008008500) at shmem_get_pages+0xb8
__i915_gem_object_get_pages(ffff800008008500) at __i915_gem_object_get_pages+0x6d
i915_gem_fault(ffff800008008500,ffff800033b707c0,10009b000,a43d6b1c000,ffff800033b70740,1,35ba896911df1241,ffff8000000aa078,ffff8000000aa178) at i915_gem_fault+0x203
drm_fault(ffff800033b707c0,a43d6b1c000,ffff800033b70740,1,0,0,7eca45006f70ee0,ffff800033b707c0) at drm_fault+0x156
uvm_fault(fffffd843a7cf480,a43d6b1c000,0,2) at uvm_fault+0x179
upageflttrap(ffff800033b70920,a43d6b1c000) at upageflttrap+0x62
usertrap(ffff800033b70920) at usertrap+0x129
recall_trap() at recall_trap+0x8
end of kernel
end trace frame: 0x7f7ffffdc7c0, count: 246
End of stack trace.
Input kettenis
OK mpi
|
|
|
|
ok mpi@
|
|
The (known) lock order reversals which now occur more reliably and much
earlier on WITNESS boots with this diff knock out syzkaller reports since
syzkaller stops at the first "crash report":
https://syzkaller.appspot.com/bug?id=81b39e970cd2eb21b97d1b31746c693e300fd2dd
|
|
This is an updated version of uvm_map.c r1.283 "Unwire with map lock held".
The previous version introduced a use-after-free by not unlocking vm_map
locks in uvm_map_teardown(), resulting in dangling references on the
reaper's lock list (thanks visa!).
Lock and unlock the map around uvm_map_teardown() instead.
This code path holds the last reference, hence the lock isn't strictly
needed except for satisfying upcoming locking assertions.
Tested on amd64, arm64, i386, macppc, octeon, sparc64.
This time also with WITNESS enabled (except on sparc64 which builds but does
not boot with WITNESS; this is a known issue).
OK mpi visa
|
|
WITNESS builds broke^W^Wkernels panic on boot as reported by anton and bluhm.
Booting bsd.mp in single-user mode inside VMM shows:
root on sd0a (5f9e458ed30b39ab.a) swap on sd0b dump on sd0b
Enter pathname of shell or RETURN for sh:
witness: lock order reversal:
1st 0xfffffd801f8ce468 vmmaplk (&map->lock)
2nd 0xfffffd801b8162c0 inode (&ip->i_lock)
lock order "&ip->i_lock"(rrwlock) -> "&map->lock"(rwlock) first seen at:
#0 rw_enter_read+0x38
#1 uvmfault_lookup+0x8a
#2 uvm_fault_check+0x32
#3 uvm_fault+0xfb
#4 kpageflttrap+0x12c
#5 kerntrap+0x91
#6 alltraps_kern_meltdown+0x7b
#7 copyout+0x53
#8 ffs_read+0x1f6
#9 VOP_READ+0x41
#10 vn_rdwr+0xa1
#11 vmcmd_map_readvn+0xa0
#12 exec_process_vmcmds+0x88
#13 sys_execve+0x732
#14 start_init+0x26f
#15 proc_trampoline+0x1c
lock order data w1 -> w2 missing
# exit
kernel: protection fault trap, code=0
Stopped at witness_checkorder+0x312: movl 0x10(%r14),%ecx
gkoehler reported faults on poisoned addresses on macppc dual G5.
|
|
WITNESS builds broke as reported by anton and bluhm:
root on sd0a (5ec49b3ad23eb2d4.a) swap on sd0b dump on sd0b
kernel: protection fault trap, code=0
Stopped at witness_checkorder+0x4ec: movl 0x10(%r12),%ecx
https://syzkaller.appspot.com/bug?id=be02b290a93c648986c35370a271aad4135a5044
https://syzkaller.appspot.com/text?tag=CrashLog&x=136e9aa4700000
|
|
Introduce vm_map_assert_{wrlock,rdlock,anylock,unlocked}() in rwlock(9)
fashion and back up function comments about locking assumptions with proper
assertions.
Also add new comments/assertions based on code analysis and sync with
NetBSD as much as possible.
vm_map_lock() and vm_map_lock_read() are used for exclusive and shared
access respectively; currently no code path is purely protected by
vm_map_lock_read() alone, i.e. functions called with a read lock held by the
caller are also called with a write lock elsewhere.
Thus only vm_map_assert_{wrlock,anylock}() are used as of now.
This should help with unlocking UVM related syscalls.
Tested as part of a larger diff through
- amd64 package bulk build by naddy
- amd64, arm64, powerpc64 base builds and regress by bluhm
- amd64 and sparc64 base builds and regress by me
Input mpi
Feedback OK kettenis
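
A minimal userland sketch of the idea above: locking assumptions that used
to live only in function comments become checked preconditions. Everything
here is hypothetical (struct toy_map, the toy_map_assert_*() macros and the
two sample functions); the lock is modelled as a plain enum, not an
rwlock(9), so this only illustrates the shape of the assertions.

```c
#include <assert.h>

/* Hypothetical userland stand-in for a map lock; not the kernel's vm_map. */
enum toy_lock_state { TOY_UNLOCKED, TOY_RDLOCKED, TOY_WRLOCKED };

struct toy_map {
	enum toy_lock_state lock;
};

/* Predicates behind the four assertion flavours. */
static inline int
toy_map_wrlocked(const struct toy_map *m)
{
	return m->lock == TOY_WRLOCKED;
}

static inline int
toy_map_rdlocked(const struct toy_map *m)
{
	return m->lock == TOY_RDLOCKED;
}

static inline int
toy_map_anylocked(const struct toy_map *m)
{
	return m->lock != TOY_UNLOCKED;
}

static inline int
toy_map_unlocked(const struct toy_map *m)
{
	return m->lock == TOY_UNLOCKED;
}

#define toy_map_assert_wrlock(m)	assert(toy_map_wrlocked(m))
#define toy_map_assert_rdlock(m)	assert(toy_map_rdlocked(m))
#define toy_map_assert_anylock(m)	assert(toy_map_anylocked(m))
#define toy_map_assert_unlocked(m)	assert(toy_map_unlocked(m))

/* "Caller must hold the write lock" becomes a checked precondition. */
static inline int
toy_remove_entries(struct toy_map *m, int nentries)
{
	toy_map_assert_wrlock(m);
	return nentries;	/* pretend all entries were removed */
}

/* Lookups only read the map, so either lock mode is acceptable. */
static inline int
toy_lookup(const struct toy_map *m, int key)
{
	toy_map_assert_anylock(m);
	return key;		/* pretend the key was found */
}
```

In this toy, calling toy_remove_entries() with the map only read-locked
aborts at the assertion, which is the kind of early failure the assertions
are meant to produce.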
|
|
uvm_unmap_remove() effectively requires its caller to lock the vm map.
Even though uvm_map_teardown() is only called after a map's last reference
is dropped and is thus safe from other threads accessing the map, grab the
map's lock in uvm_map_teardown() to satisfy upcoming lock assertions in
uvm_unmap_remove().
Tested as part of a larger diff through
- amd64 package bulk builds by naddy
- amd64, arm64, powerpc64 base builds and regress by bluhm
- amd64 and sparc64 base builds and regress by me
Feedback mpi
OK kettenis
|
|
subset of the requested permissions, so when forcing an initial RO
fault for CoW also clamp the access_type.
problem reported by bluhm@
based on a suggestion from miod@
ok kettenis@
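
A rough sketch of the clamping described above, assuming the invariant is
that the access type must stay a subset of the protection used to enter the
mapping. The struct, field and macro names are hypothetical and only mirror
the wording of this message; this is not the uvm fault handler code.

```c
#include <assert.h>

/* Illustrative protection bits; values chosen for the example only. */
#define TOY_PROT_READ	0x1
#define TOY_PROT_WRITE	0x2

/* Hypothetical fault state. */
struct toy_fault {
	int enter_prot;		/* protection the mapping is entered with */
	int access_type;	/* access that caused the fault */
};

/* Invariant: access_type must not ask for more than enter_prot grants. */
static inline int
toy_access_is_subset(const struct toy_fault *f)
{
	return (f->access_type & ~f->enter_prot) == 0;
}

/* Force a read-only first fault and clamp access_type along with it. */
static inline void
toy_force_ro(struct toy_fault *f)
{
	f->enter_prot &= ~TOY_PROT_WRITE;
	f->access_type &= ~TOY_PROT_WRITE;
}
```

Dropping the write bit from enter_prot alone would leave a read/write
access_type that is no longer a subset, which is the mismatch the clamp
avoids in this model.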
|
|
No object change.
OK millert
|
|
can't be written to while any thread can see the original version
of the page via a not-yet-flushed stale TLB entry: pmaps can indicate
they do this correctly by defining __HAVE_PMAP_MPSAFE_ENTER_COW;
uvm will force the initial CoW fault to be read-only otherwise.
Set that on amd64 and fix the problem case in pmap_enter() by putting
a read-only mapping in place, shooting the TLB entry, then fixing
it to the final read-write entry so this thread can continue without
re-faulting.
reported by jsing@ from https://github.com/golang/go/issues/34988
assisted by discussion in https://reviews.freebsd.org/D14347
tweaks from jsing@ and kettenis@
ok jsing@ mpi@ kettenis@
|
|
ok visa@
|
|
kern.wxabort=1 logs and kills programs after W^X violations.
At least sigexit() -> coredump() as well as the non-atomic increment of
ps_wxcounter require protection, so grab the big lock for the entire block.
This is part of the effort to unlock mmap(2)'s MAP_ANON case.
Feedback mvs claudio kettenis deraadt
OK kettenis
|
|
The swap code path in uvm_aio_aiodone() is not holding the corresponding
page lock and shouldn't as long as anons are locked inside uvm_page_unbusy()
to handle the PG_RELEASED case.
Reported by Ralf Horstmann on bugs@
|
|
There is no functional change as the former is just a wrapper around the
latter. However, the upper layers of UVM do not need to mess with the internals
of the page allocator.
This will also help when a page cache is introduced to reduce contention
on the global mutex serializing access to pmemrange's data.
ok kettenis@, kn@, tb@
|
|
boundaries: hppa has 8-byte PLT entries that sometimes do that.
ok kettenis@
|
|
Reduce differences with NetBSD, no functional changes.
|
|
Tested by many during the past months, thanks!
ok sthen@
|
|
Switch libc and ld.so to the generic stubs for these calls.
WARNING: reboot to updated kernel before installing libc or ld.so!
Time for a story...
When gcc (back in 1.x days) first implemented long long, it didn't (always)
pass 64bit arguments in 'aligned' registers/stack slots, with the result that
argument offsets didn't match structure offsets. This affected the nine system
calls that pass off_t arguments:
ftruncate lseek mmap mquery pread preadv pwrite pwritev truncate
To avoid having to do custom ASM wrappers for those, BSD put an explicit pad
argument in so that the off_t argument would always start on an even slot and
thus be naturally aligned. Hence the odd wrappers in lib/libc/sys/ that use
__syscall() and pass an extra '0' argument.
The ABIs for different CPUs eventually settled how things should be passed on
each and gcc 2.x followed them. The only arch now where it helps is landisk,
which needs to skip the last argument register if it would be the first half of
a 64bit argument. So: add new syscalls without the pad argument and on landisk
do that skipping directly in the syscall handler in the kernel. Keep compat
support for the existing syscalls long enough for the transition.
ok deraadt@
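
The pad argument in the story above can be illustrated with a little slot
arithmetic. This sketch assumes arguments are passed in a row of 32-bit
slots and that a 64-bit argument must start on an even slot; the function
names are made up for the example and do not exist in libc or the kernel.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Slot index at which a 64-bit argument lands when it follows `nwords`
 * 32-bit arguments and has to start on an even slot.
 */
static inline size_t
slot_for_64bit_arg(size_t nwords)
{
	return (nwords + 1) & ~(size_t)1;	/* round up to even */
}

/* Number of pad slots needed in front of the 64-bit argument. */
static inline size_t
pad_slots(size_t nwords)
{
	return slot_for_64bit_arg(nwords) - nwords;
}
```

Under these assumptions, lseek(fd, offset, whence) has one word in front of
the off_t, so pad_slots(1) is 1; with the explicit pad argument two words
precede it and pad_slots(2) is 0, meaning argument offsets and structure
offsets agree again.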
|
|
Pass the correct entry to uvm_fault_unwire_locked().
Reported-by: syzbot+bb2f63f076618e9ed0d3@syzkaller.appspotmail.com
ok kettenis@, deraadt@
|
|
Fix a NULL dereference introduced in previous, reported by anton@ and
Benjamin Baier.
Reported-by: syzbot+c172bd335801b67e515b@syzkaller.appspotmail.com
|
|
Like the per-amap lock the `vmobjlock' is principally used to serialize
access to objects in the fault handler to allow faults occurring on
different CPUs and different objects to be processed in parallel.
The fault handler now acquires the `vmobjlock' of a given UVM object as
soon as it finds one. For now a write-lock is always acquired even if
some operations could use a read-lock.
Every pager, corresponding to a different kind of UVM object, now expects
the UVM object to be locked, and some operations, like *_get(), return it
unlocked. This is enforced by assertions checking for rw_write_held().
The KERNEL_LOCK() is now pushed to the VFS boundary in the vnode pager.
To ensure the correct amap or object lock is held when modifying a page
many uvm_page* operations are now asserting for the "owner" lock.
However, fields of the "struct vm_page" are still being protected by the
global `pageqlock'. To prevent lock ordering issues with the new
`vmobjlock' and to reduce differences with NetBSD this lock is now taken
and released for each page instead of around the whole loop.
This commit does not remove the KERNEL_LOCK/UNLOCK() dance. Unlocking
will follow if there is no fallout.
Ported from NetBSD, tested by many, thanks!
ok kettenis@, kn@
|
|
Pass the device vnode as a parameter to VOP_STRATEGY() to allow calling
the correct vop_strategy callback. Now the vnode is also available
in the callback.
OK mpi@
|
|
first __tfork(2)"
The immediate issue is that a process linked with -znow will still
perform lazy relocation on objects loaded with dlopen(), but there
are possibly other dark corners to plumb to find a better invariant.
Problem reported by thfr@
|
|
prints the end which is in the next page. Subtract 1 to avoid confusion.
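
A small illustration of why the end of a range can look like it belongs to
the next page: for a half-open range [start, end) the end address is the
first byte after the range. The 4096-byte page size and the helper names
are assumptions for this example only.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	4096u	/* assumed page size for the example */

/* Page number containing addr. */
static inline uint64_t
toy_page_of(uint64_t addr)
{
	return addr / TOY_PAGE_SIZE;
}

/* Last address inside the half-open range [start, end). */
static inline uint64_t
toy_last_addr(uint64_t start, uint64_t end)
{
	assert(end > start);
	return end - 1;
}
```

For the one-page range [0x1000, 0x2000), toy_page_of(0x2000) is page 2
while toy_page_of(toy_last_addr(0x1000, 0x2000)) is page 1, the page the
range actually covers.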
|
|
Thread: https://marc.info/?l=openbsd-tech&m=163884527530326&w=2
ok deraadt@
|
|
To unlock kbind(2) we need to protect ps_kbind_addr and
ps_kbind_cookie.
The simplest way to do this is to disallow kbind(2) initialization
after the first __tfork(2) call. If the first thread does not
initialize the kbind(2) variables before __tfork(2) then we disable
kbind(2) during that first __tfork(2) call.
This is guenther@'s patch, I'm just committing it.
Discussed with guenther@, deraadt@, kettenis@, and mpi@.
ok kettenis@, positive response from mpi@, "I am busy" guenther@
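
The rule above can be modelled as a tiny state machine. This is a
hypothetical userland sketch: struct toy_process and the toy_*() functions
are invented for illustration, and refusing a second initialization is an
assumption of the model, not something this message states.

```c
#include <assert.h>

/* Hypothetical model; the real state is ps_kbind_addr and ps_kbind_cookie. */
enum toy_kbind_state { TOY_KBIND_UNSET, TOY_KBIND_SET, TOY_KBIND_DISABLED };

struct toy_process {
	enum toy_kbind_state kbind;
	int nthreads;
};

static inline void
toy_process_init(struct toy_process *p)
{
	p->kbind = TOY_KBIND_UNSET;
	p->nthreads = 1;
}

/* Returns 0 if initialization is accepted, -1 if it is refused. */
static inline int
toy_kbind_init(struct toy_process *p)
{
	if (p->kbind != TOY_KBIND_UNSET)
		return -1;	/* already set, or disabled by the first fork */
	p->kbind = TOY_KBIND_SET;
	return 0;
}

/* Thread creation: an uninitialized kbind state gets disabled for good. */
static inline void
toy_tfork(struct toy_process *p)
{
	if (p->kbind == TOY_KBIND_UNSET)
		p->kbind = TOY_KBIND_DISABLED;
	p->nthreads++;
}
```

In this model the state can only change while the process has a single
thread, so later readers would not race with a writer; that appears to be
what makes the approach the simplest way to protect the two fields.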
|
|
ok millert mpi
|