Age | Commit message | Author |
|
objects that readers can access without locking. This provides a basis
for read-copy-update operations.
Readers access SMR-protected shared objects inside SMR read-side
critical sections, where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.
The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.
The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.
Discussed with many
OK mpi@ sashan@
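A minimal sketch of the read and update sides as described above, using the <sys/smr.h> interfaces (names per the smr_call(9) family; kernel-only code, shown for illustration and not compilable in isolation):

```c
/* Sketch only: assumes the <sys/smr.h> interfaces described above. */
struct item {
	SMR_LIST_ENTRY(item)	i_list;
	int			i_key;
};

SMR_LIST_HEAD(, item)	items;		/* shared, SMR-protected list */

/* Read side: no lock taken, no sleeping inside the critical section. */
int
lookup(int key)
{
	struct item *it;
	int found = 0;

	smr_read_enter();
	SMR_LIST_FOREACH(it, &items, i_list) {
		if (it->i_key == key) {
			found = 1;
			break;
		}
	}
	smr_read_leave();
	return (found);
}

/*
 * Write side: writers serialize among themselves (not shown), unpublish
 * the shared reference, then either wait with smr_barrier() or defer the
 * free with an smr_call() callback.
 */
void
remove_item(struct item *it)
{
	SMR_LIST_REMOVE(it, i_list);	/* remove the shared reference */
	smr_barrier();			/* wait for read-side references */
	free(it, M_DEVBUF, sizeof(*it));
}
```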
|
|
was never updated.
from Amit Kulkarni
|
|
sp must be on a MAP_STACK page. Relax the check a bit -- the sp may be
on a PROT_NONE page. Can't see how an attacker can leverage that situation.
(New perl build process contains a "how many call frames can my stack
hold" checker, and this triggers via the MAP_STACK fault rather than
the normal access check. The MAP_STACK check still has a kernel printf
as we hunt for applications which map stacks poorly. Interestingly the
perl code has a knob to disable similar printing alerts on Windows, which
apparently has a feature somewhat like MAP_STACK!)
ok tedu guenther kettenis
|
|
instead
From Pamela Mosiejczuk, many thanks!
OK phessler@ deraadt@
|
|
introduced with __MAP_NOFAULT. The regression let uvm_fault() run
without proper locking and rechecking of state after map version change
if page zero-fill was chosen.
OK kettenis@ deraadt@
Reported-by: syzbot+9972088c1026668c6c5c@syzkaller.appspotmail.com
|
|
about shared resources which no program should see. only a few pieces of
software use it, generally poorly thought out. they are being fixed, so
mincore() can be deleted.
ok guenther tedu jca sthen, others
|
|
another process is doing. We don't want that, so instead have it
always return that memory is in core.
ok deraadt kettenis
|
|
physio(9) to prevent another thread from unmapping the memory and triggering
an assertion or even corrupting random physical memory pages.
ok deraadt@
Should fix:
Reported-by: syzbot+b8e7faf688f8c9d341b1@syzkaller.appspotmail.com
Reported-by: syzbot+b6a9255faa0605669432@syzkaller.appspotmail.com
|
|
|
|
inteldrm driver to add support for the I915_MMAP_WC flag.
ok deraadt@, jsg@
|
|
ok jsg@ (who pointed out the kern_pledge.c change was necessary as well)
|
|
fd_getfile(9) is mpsafe. Note that sys_mmap(2) isn't actually unlocked
currently. However this diff has been tested with it unlocked, and I
hope to unlock it for real soon-ish.
ok visa@, mpi@
|
|
the start of the range of pages that we're changing. Prevents a panic from
a somewhat convoluted test case that anton@ came up with.
ok guenther@, anton@
|
|
kernel calls to ensure that the UVM cache for memory mapped files is
up to date.
ok mpi@
|
|
unusedNN.
Missing man page bits pointed out by
jmc@. Ports source scan by sthen@.
ok deraadt@ guenther@
|
|
|
|
"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."
ok beck@ (again)
|
|
unnecessary because curproc always does the locking.
OK mpi@
|
|
curproc that does the locking or unlocking, so the proc parameter
is pointless and can be dropped.
OK mpi@, deraadt@
|
|
ok visa@
|
|
stack buffer. With a page-aligned buffer, creating a MAP_STACK sub-region
would undo the PROT_NONE guard. Ignore that last page.
(We could check if the last page is non-RW before choosing to skip it. But
we've already elected to grow STK sizes to compensate. Always ignoring the
last page makes it a non-MAP_STACK guard page which can be opportunistically
discovered)
ok semarie stefan kettenis
|
|
the brk area anyway.
- Use a larger hint bound to spread the allocations more for the 32-bit case
- Simplified the overly abstracted brk/stack allocator and switched off
guard pages for the brk case. This allows i386 some extra space,
depending on memory usage patterns.
- Reduce brk area on i386 to give the rnd space more room
ok stefan@ sthen@
|
|
Other parts of uvm/pmap check for proper prot flags
already. This fixes the qemu startup problems that
semarie@ reported on tech@.
|
|
syscall) confirm the stack register points at MAP_STACK memory, otherwise
SIGSEGV is delivered. sigaltstack() and pthread_attr_setstack() are modified
to create a MAP_STACK sub-region which satisfies alignment requirements.
Observe that MAP_STACK can only be set/cleared by mmap(), which zeroes the
contents of the region -- there is no mprotect() equivalent operation, so
there is no MAP_STACK-adding gadget.
This opportunistic software-emulation of a stack protection bit makes
stack-pivot operations during ROPchain fragile (kind of like removing a
tool from the toolbox).
original discussion with tedu, uvm work by stefan, testing by mortimer
ok kettenis
|
|
direction, otherwise we might break the loop prematurely; ok stefan@
|
|
issues with upcoming NFSnode's locks.
ok visa@
|
|
protection cannot block the final SIGABRT.
While here apply the same logic to ddb(4)'s kill command.
From semarie@, ok deraadt@
|
|
revoked while syncing disk, so the processes lose their executable
pages. Instead of killing them with a SIGBUS after page fault,
just sleep. This should prevent that init dies without pages
followed by a kernel panic.
initial diff from tedu@; OK deraadt@ tedu@
|
|
The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.
No objection from millert@, ok tedu@, bluhm@
|
|
|
|
no other process which could free it. Better panic in malloc(9)
or pool_get(9) instead of sleeping forever.
tested by visa@ patrick@ Jan Klemkow
suggested by kettenis@; OK deraadt@
|
|
so diffs in snapshots can exercise the change in a less disruptive way
idea with sthen, ok kettenis tom others
|
|
ok millert@ sthen@
|
|
ok deraadt@ krw@
|
|
that is attempted.
Minor cleanups:
- Eliminate some always false and always true tests against MAP_ANON
- We treat anon mappings with neither MAP_{SHARED,PRIVATE} as MAP_PRIVATE
so explicitly indicate that
ok kettenis@ beck@
|
|
Tested by Hrvoje Popovski.
|
|
when WITNESS is enabled
ok visa@ kettenis@
|
|
according to POSIX. Bring regression test and kernel in line for
amd64 and i386. Other architectures have to follow.
OK deraadt@ kettenis@
|
|
with the RS780E chipset.
OK kettenis@, jsg@
|
|
A deadlock can occur when uvm_km_thread(), running without KERNEL_LOCK(),
is interrupted by a non-MPSAFE handler while holding the pool's mutex. At
that moment, if another CPU is holding the KERNEL_LOCK() and wants to grab the
pool mutex, like in sys_kbind(), kaboom!
This is a temporary solution; a more general approach regarding mutexes and
un-KERNEL_LOCK()ed threads is being discussed.
Deadlock reported by sthen@, ok kettenis@
|
|
Recursions are still marked as XXXSMP.
ok deraadt@, bluhm@
|
|
found by jmc@
|
|
the particular use before init was in uvm_init step 6, which calls
kmeminit to set up malloc(9), which calls uvm_km_zalloc, which calls
pmap_enter, which calls pool_get, which tries to allocate a page
using km_alloc, which isn't initialised until step 9 in uvm_init.
uvm_km_page_init calls kthread_create though, which uses malloc
internally, so it can't be reordered before malloc init.
to cope with this, uvm_km_page_init is split up. it sets up the
subsystem, and is called before kmeminit. the thread init is moved
to uvm_km_page_lateinit, which is called after kmeminit in uvm_init.
|
|
PZERO used to be a special value in the first BSD releases but since
the introduction of tsleep(9) there's no way to tell if a thread is
going to sleep for a "short" period of time.
This removes the only (ab)use of ``p_priority'' outside the scheduler
logic, which will help moving away from a priority-based scheduler.
ok visa@
|
|
ok kettenis@
|
|
between mount locks and inode locks, which may have been recorded in either order
ok visa@
|
|
It doesn't compile and hasn't worked in the last decade.
ok kettenis@, deraadt@
|
|
on amd64 and i386.
|
|
ok deraadt@
|
|
For the moment the NET_LOCK() is always taken by threads running under
KERNEL_LOCK(). That means it doesn't buy us anything except a possible
deadlock that we did not spot. So make sure this doesn't happen, we'll
have plenty of time in the next release cycle to stress test it.
ok visa@
|