Age | Commit message | Author |
|
A rwlock is attached to every amap and is shared with all its anons. The
same lock will be used by multiple amaps if they have anons in common.
This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK(), which seems to bring up to a 20% improvement in builds.
This is based/copied/adapted from the most recent work done in NetBSD, which
is an evolution of the previous simple_lock scheme.
Tested by many, thanks!
ok kettenis@, mvs@
|
|
Fix a regression, introduced in the previous refactoring, where the value
wasn't correctly overwritten for wired mappings.
ok mvs@
|
|
ok kettenis@
|
|
OK millert@
|
|
We can simulate the current behavior without lbolt by sleeping for 1
second on the &nowake channel.
ok mpi@
|
|
ok kettenis@, dlg@
|
|
At least some initialization code on i386 calls it w/o KERNEL_LOCK().
Found the hard way by jungle Boogie and Hrvoje Popovski.
|
|
This will allow uvm_fault_upper() to enter swap-related functions without
holding the KERNEL_LOCK().
ok jmatthew@
|
|
ok jmatthew@, tb@
|
|
Currently all iterations are done under KERNEL_LOCK() and therefore use
the *_LOCKED() variant.
From and ok claudio@
|
|
ok kettenis@
|
|
Use a new flag, UVM_PLA_USERESERVE, to tell uvm_pmr_getpages() that using
kernel reserved pages is allowed.
Merge the duplicated checks that wake the pagedaemon into uvm_pmr_getpages().
Add two more pages to the amount reserved for the kernel to compensate for the
fact that the pagedaemon may now consume an additional page.
Document locking of some uvmexp fields.
ok kettenis@
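A sketch of the reserve check, assuming the policy is "refuse an allocation that would leave fewer than the kernel-reserved number of free pages, unless the caller passes the reserve flag". `SKETCH_PLA_USERESERVE` and `struct pagestats_sketch` are hypothetical stand-ins for UVM_PLA_USERESERVE and the uvmexp fields; the pagedaemon wakeup that uvm_pmr_getpages() also handles is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PLA_USERESERVE 0x01      /* hypothetical flag value */

struct pagestats_sketch {
	size_t free;                    /* free pages */
	size_t reserve_kernel;          /* pages kept back for the kernel */
};

/* Returns true if count pages may be handed out under these flags. */
static bool
pages_available(const struct pagestats_sketch *ps, size_t count, int flags)
{
	assert(count > 0);              /* zero-page requests are a caller bug here */
	if (count > ps->free)
		return false;           /* not enough pages at all */
	if ((flags & SKETCH_PLA_USERESERVE) == 0 &&
	    ps->free - count < ps->reserve_kernel)
		return false;           /* would eat into the kernel reserve */
	return true;
}
```

With 10 free pages and 8 reserved, an ordinary caller can take at most 2 pages, while a caller passing the flag may take all 10.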
|
|
Reported by Aisha Tammy.
ok kettenis@
|
|
Document which global data structures require this lock and add some
asserts where the lock should be held.
Some code paths are still incorrect and should be revisited.
ok jmatthew@
|
|
No functional change.
ok kettenis@, jmatthew@, tb@
|
|
ok tb@, jmatthew@
|
|
Separate fault handling code for type 1 and 2 and reduce differences
with NetBSD.
ok tb@, jmatthew@, kettenis@
|
|
Some minor documentation improvements and style nits, but this should
not contain any functional change.
ok tb@
|
|
Reduce code duplication, reduce differences with NetBSD and simplify
upcoming locking diff.
ok jmatthew@
|
|
It won't be used once amap and anon locking are introduced.
This "fixes" passing an unrelated/uninitialized pointer in an error path
in case of memory shortage.
ok kettenis@
|
|
page backed by a vnode, uvn_io() will end up being called in order to
populate newly allocated pages using I/O on the backing vnode. Before
performing the I/O, newly allocated pages are flagged as busy by
uvn_get(), that is before uvn_io() tries to lock the vnode. Such pages
could then end up being flushed by uvn_flush() which already has
acquired the vnode lock. Since such pages are flagged as busy,
uvn_flush() will wait for them to be flagged as not busy. This will
never happen, as uvn_io() cannot make progress until the vnode lock is
released.
Instead, grab the vnode lock before allocating and flagging pages as
busy in uvn_get(). This extends the scope in uvn_get() during which the
vnode is locked, but resolves the deadlock.
ok mpi@
Reported-by: syzbot+e63407b35dff08dbee02@syzkaller.appspotmail.com
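The lock ordering the fix establishes can be modeled in userland under heavy simplification: a mutex stands in for the vnode lock, and a flag plus condition variable stands in for a busy page. The names (`struct vnode_sketch`, `get_page()`, `flush_pages()`) are hypothetical; this illustrates the ordering argument only, not uvn_get() or uvn_flush() themselves.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct vnode_sketch {
	pthread_mutex_t vlock;          /* stands in for the vnode lock */
	pthread_mutex_t pglock;         /* protects busy and valid */
	pthread_cond_t  pgwait;         /* signalled when busy is cleared */
	bool busy;
	bool valid;
};

/* Models the fixed uvn_get(): take the vnode lock, then busy the page. */
static void *
get_page(void *arg)
{
	struct vnode_sketch *vp = arg;

	pthread_mutex_lock(&vp->vlock);
	pthread_mutex_lock(&vp->pglock);
	assert(!vp->busy);              /* only one getter in this model */
	vp->busy = true;
	pthread_mutex_unlock(&vp->pglock);

	/* "I/O" on the backing vnode: its lock is already held. */
	pthread_mutex_lock(&vp->pglock);
	vp->valid = true;
	vp->busy = false;
	pthread_cond_broadcast(&vp->pgwait);
	pthread_mutex_unlock(&vp->pglock);

	pthread_mutex_unlock(&vp->vlock);
	return NULL;
}

/* Models uvn_flush(): hold the vnode lock and wait for busy pages. */
static void *
flush_pages(void *arg)
{
	struct vnode_sketch *vp = arg;

	pthread_mutex_lock(&vp->vlock);
	pthread_mutex_lock(&vp->pglock);
	while (vp->busy)
		pthread_cond_wait(&vp->pgwait, &vp->pglock);
	pthread_mutex_unlock(&vp->pglock);
	pthread_mutex_unlock(&vp->vlock);
	return NULL;
}
```

If get_page() instead set busy before taking vlock (the old order), a flusher already holding vlock could wait on busy forever while get_page() waited on vlock, which is the deadlock described above. With the new order, busy is only ever set while vlock is held, so the flusher never sees it.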
|
|
|
|
within the correct #ifdef of course.
ok kettenis
|
|
While here, put some KERNEL_ASSERT_LOCKED() in the functions called from
the page fault handler. The removal of locking of `uobj' will need to be
revisited, and these are good indicators that something is missing and that
many comments are lying.
ok kettenis
|
|
The name, uvm_fault_check(), and the logic come from NetBSD, as reducing the
diff with their tree is useful to learn from their experience and backport fixes.
No functional change intended.
ok kettenis@
|
|
ok kettenis@
|
|
ok patrick@, mpi@
|
|
The underlying vm_space lock is used as a substitute for the KERNEL_LOCK()
in uvm_grow() to make sure `vm_ssize' is not corrupted.
ok anton@, kettenis@
|
|
|
|
Improves readability and reduces the difference with NetBSD without
compromising debuggability on RAMDISK.
While here also use local variables to help with future locking and
reference counting.
ok semarie@
|
|
ok deraadt@
|
|
Previous attempt to unlock amap & anon exposed a race in vnode reference
counting. So be conservative with the code paths that we're not fully moving
out of the KERNEL_LOCK() to allow us to concentrate on one area at a time.
The panic reported was:
    panic: vref used where vget required
    db_enter() at db_enter+0x5
    panic() at panic+0x129
    vref(ffffff03b20d29e8) at vref+0x5d
    uvn_attach(1010000,ffffff03a5879dc0) at uvn_attach+0x11d
    uvm_mmapfile(7,ffffff03a5879dc0,2,1,13,100000012) at uvm_mmapfile+0x12c
    sys_mmap(c50,ffff8000225f82a0,1) at sys_mmap+0x604
    syscall() at syscall+0x279
Note that this change has no effect as long as mmap(2) is still executed with
ze big lock.
ok kettenis@
|
|
failed to note this also guarded against heavy amap allocations in the
MAP_SHARED case. Bring back the checks for MAP_SHARED
from semarie, ok kettenis
https://syzkaller.appspot.com/bug?extid=d80de26a8db6c009d060
|
|
This reduces code duplication, reduces the diff with NetBSD and will help
to introduce locks around global variables.
ok cheloha@
|
|
Reduce the diff with NetBSD.
ok kettenis@, deraadt@
|
|
|
|
Reduce differences with NetBSD.
ok mvs@, kettenis@
|
|
ok kettenis@
|
|
kernel lock are fixed now, push the kernel lock down again.
ok deraadt@
|
|
|
|
ok kettenis@
|
|
|
|
This diff exposes parts of clock_gettime(2) and gettimeofday(2) to
userland via libc, freeing processes from the need for a context
switch every time they want to count the passage of time.
If a timecounter clock can be exposed to userland, then it needs to set
its tc_user member to a non-zero value. Tested with one or multiple
counters per architecture.
The timing data is shared through a pointer found in the new ELF
auxiliary vector AUX_openbsd_timekeep containing timehands information
that is frequently updated by the kernel.
Timing differences between the last kernel update and the current time
are adjusted in userland by the tc_get_timecount() function inside the
MD usertc.c file.
This permits a much more responsive environment, quite visible in
browsers, office programs and gaming (apparently one is able to fly
in Minecraft now).
Tested by robert@, sthen@, naddy@, kmos@, phessler@, and many others!
OK from at least kettenis@, cheloha@, naddy@, sthen@
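A sketch of the arithmetic a userland reader could perform, assuming the shared data carries the kernel's last snapshot as seconds plus a 64-bit binary fraction, the counter value at that snapshot, a counter mask, and a scale of roughly 2^64 / counter frequency, loosely following the kernel's timehands. The struct and function names are hypothetical and do not describe the real AUX_openbsd_timekeep layout; `counter` is whatever an MD tc_get_timecount() would return.

```c
#include <assert.h>
#include <stdint.h>

struct bintime_sketch {
	int64_t  sec;
	uint64_t frac;                  /* units of 2^-64 seconds */
};

struct timekeep_sketch {
	struct bintime_sketch offset;   /* time at the last kernel update */
	uint32_t offset_count;          /* counter value at that update */
	uint64_t scale;                 /* about 2^64 / counter frequency */
	uint32_t counter_mask;          /* width of the hardware counter */
};

/* Add x (2^-64 s units) to bt, carrying into seconds on overflow. */
static void
bintime_addx_sketch(struct bintime_sketch *bt, uint64_t x)
{
	uint64_t old = bt->frac;

	bt->frac += x;
	if (old > bt->frac)
		bt->sec++;
}

/* Current time = last snapshot + elapsed ticks * scale. */
static struct bintime_sketch
usertc_now(const struct timekeep_sketch *tk, uint32_t counter)
{
	struct bintime_sketch bt = tk->offset;
	uint32_t delta = (counter - tk->offset_count) & tk->counter_mask;

	assert(tk->scale != 0);
	bintime_addx_sketch(&bt, tk->scale * delta);
	return bt;
}
```

The mask makes the subtraction correct across counter wraparound, and the multiplication only stays in range because the kernel refreshes the snapshot frequently, well before the delta reaches one second's worth of ticks.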
|
|
time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.
This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).
There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.
There is no performance cost on 64-bit (__LP64__) platforms.
With input from visa@, dlg@, and tedu@.
Several bugs squashed by visa@.
ok kettenis@
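The lockless read loop can be illustrated with a generation counter: the writer sets the generation to 0, updates the value, then publishes a new non-zero generation, and readers retry until they observe the same non-zero generation before and after reading. The sketch below uses hypothetical names and C11 sequentially consistent atomics, stores the 64-bit seconds as two 32-bit halves to mimic a platform without atomic 64-bit loads, and assumes a single writer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct th_sketch {
	_Atomic uint32_t gen;           /* 0 means "update in progress" */
	_Atomic uint32_t sec_lo;
	_Atomic uint32_t sec_hi;
};

static void
th_init(struct th_sketch *th)
{
	atomic_store(&th->sec_lo, 0);
	atomic_store(&th->sec_hi, 0);
	atomic_store(&th->gen, 1);
}

/* Writer: invalidate, update both halves, publish a new generation. */
static void
th_update(struct th_sketch *th, int64_t sec)
{
	uint32_t next;

	assert(atomic_load(&th->gen) != 0);     /* single writer only */
	next = atomic_load(&th->gen) + 1;
	if (next == 0)
		next = 1;                       /* skip the "in progress" value */
	atomic_store(&th->gen, 0);
	atomic_store(&th->sec_lo, (uint32_t)sec);
	atomic_store(&th->sec_hi, (uint32_t)((uint64_t)sec >> 32));
	atomic_store(&th->gen, next);
}

/* Reader: retry until both halves come from the same generation. */
static int64_t
th_read(struct th_sketch *th)
{
	uint32_t gen, lo, hi;

	do {
		gen = atomic_load(&th->gen);
		lo = atomic_load(&th->sec_lo);
		hi = atomic_load(&th->sec_hi);
	} while (gen == 0 || gen != atomic_load(&th->gen));
	return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

A reader that races with an update either sees generation 0 or a changed generation and simply retries, so it never combines a low half from one update with a high half from another.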
|
|
ok bluhm@, visa@
|
|
Prompted by a question from schwarze@
ok deraadt@, schwarze@, visa@
|
|
Although there are open questions about whether we should flag failures with
UVM_PMA_FAIL or not, we really should only wake up a sleeper if we unlink
the pma. For now only do that if pages were actually freed in the requested
region.
Prompted by:
CID 1453061 Logically dead code
which should be fixed by this commit.
ok (and together with) beck@
|
|
CID 1453116
ok kettenis@
|
|
CID 1453262.
|
|
Instead count (and check the limit) when their protection gets flipped
from PROT_NONE to something that permits access. This means that
mprotect(2) may now fail if changing the protection would exceed RLIMIT_DATA.
This helps code (such as Chromium's JavaScript interpreter) that reserves
large chunks of address space but populates it sparsely.
ok deraadt@, otto@, kurt@, millert@, robert@
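A sketch of the accounting described above: PROT_NONE reservations are not charged, and a range is charged against the data limit only when its protection goes from PROT_NONE to something accessible. The names are hypothetical, `limit` stands in for RLIMIT_DATA, and releasing the charge when a range returns to PROT_NONE is an assumption of this sketch rather than something the message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

struct dlimit_sketch {
	size_t used;                    /* bytes currently charged */
	size_t limit;                   /* stand-in for RLIMIT_DATA */
};

/* Returns false (the protection change is refused) if over the limit. */
static bool
protect_charge(struct dlimit_sketch *d, size_t len, int oldprot, int newprot)
{
	assert(d->used <= d->limit);
	if (oldprot == PROT_NONE && newprot != PROT_NONE) {
		if (len > d->limit - d->used)
			return false;
		d->used += len;
	} else if (oldprot != PROT_NONE && newprot == PROT_NONE) {
		/* Assumption: caller passes a previously charged range. */
		d->used -= len;
	}
	return true;
}
```

A large PROT_NONE reservation costs nothing here; only the pages later made accessible count, which is why sparse users of big reservations stop hitting the limit.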
|