Age | Commit message (Collapse) | Author |
|
ok kettenis@, semarie@, deraadt@
|
|
Found by jsing@
|
|
ok kettenis@
|
|
The lookup in uvm_map_inentry_fix() is already serialized by the
vm_map_lock, and such lookups are already executed without the KERNEL_LOCK().
ok kettenis@, deraadt@
|
|
violations in system accounting. This will help to find misbehaving
programs and possible attacks. The flags bit field is full, so
recycle the PDP-11 compatibility on VAX. lastcomm(1) prints the
AMAP flag as 'M'. daily(8) prints a list of affected processes.
OK deraadt@
|
|
UVM_WAIT() doesn't provide much of a useful abstraction. All callers
tsleep forever and no callers set PCATCH, so only 2 of 4 parameters are
actually used. Might as well just use tsleep_nsec(9) directly and make
the uvm code a bit less specialized.
Suggested by mpi@.
ok mpi@ visa@ millert@
|
|
Equivalent to their unsuffixed counterparts except that (a) they take
a timeout in terms of nanoseconds, and (b) INFSLP, aka UINT64_MAX (not
zero) indicates that a timeout should not be set.
For now, zero nanoseconds is not a strictly valid invocation: we log a
warning on DIAGNOSTIC kernels if we see such a call. We still sleep
until the next tick in such a case, however. In the future this could
become some sort of poll... TBD.
To facilitate conversions to these interfaces: add inline conversion
functions to sys/time.h for turning your timeout into nanoseconds.
Also do a few easy conversions for warmup and to demonstrate how
further conversions should be done.
Lots of input from mpi@ and ratchov@. Additional input from tedu@,
deraadt@, mortimer@, millert@, and claudio@.
Partly inspired by FreeBSD r247787.
positive feedback from deraadt@, ok mpi@
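
The timeout convention described above can be sketched in userland C.
This is a minimal illustration, not the kernel code: every sketch_*
name is hypothetical, and saturating to UINT64_MAX on overflow is an
assumption about how such conversion helpers might reasonably behave.

```c
#include <stdint.h>

/* Hypothetical stand-in for the INFSLP sentinel (UINT64_MAX, not zero). */
#define SKETCH_INFSLP	UINT64_MAX

enum sketch_timeout {
	SKETCH_FOREVER,		/* no timeout is set */
	SKETCH_INVALID_ZERO,	/* zero nanoseconds: not strictly valid */
	SKETCH_TIMED		/* an ordinary finite timeout */
};

/* Convert seconds to nanoseconds, saturating instead of wrapping. */
static uint64_t
sketch_sec_to_nsec(uint64_t sec)
{
	if (sec > UINT64_MAX / 1000000000ULL)
		return UINT64_MAX;
	return sec * 1000000000ULL;
}

/* Convert milliseconds to nanoseconds, saturating instead of wrapping. */
static uint64_t
sketch_msec_to_nsec(uint64_t msec)
{
	if (msec > UINT64_MAX / 1000000ULL)
		return UINT64_MAX;
	return msec * 1000000ULL;
}

/* Classify a nanosecond timeout the way the message above describes. */
static enum sketch_timeout
sketch_classify(uint64_t nsecs)
{
	if (nsecs == SKETCH_INFSLP)
		return SKETCH_FOREVER;
	if (nsecs == 0)
		return SKETCH_INVALID_ZERO;
	return SKETCH_TIMED;
}
```

A caller holding a timeout in milliseconds would pass
sketch_msec_to_nsec(ms) and reserve SKETCH_INFSLP for "sleep until
woken".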
|
|
ok visa@, semarie@
|
|
of resource limit structs has been done between processes. By applying
copy-on-write also between threads, threads can read rlimits in
a nearly lock-free manner.
Inspired by code in DragonFly BSD and FreeBSD.
OK mpi@, agreement from jmatthew@ and anton@
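
The copy-on-write idea can be sketched as a reference-counted struct
that is duplicated only when a holder writes while others still share
it. This is a single-threaded illustration under stated assumptions:
all sketch_* names are hypothetical, and the atomic reference counting
and locking a kernel would need are left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_NLIMITS	4

struct sketch_limits {
	int	refcnt;			/* holders sharing this copy */
	long	cur[SKETCH_NLIMITS];	/* hypothetical limit values */
};

/* Allocate a zeroed limits struct owned by a single holder. */
static struct sketch_limits *
sketch_lim_new(void)
{
	struct sketch_limits *l = calloc(1, sizeof(*l));

	if (l != NULL)
		l->refcnt = 1;
	return l;
}

/* Share the struct with another holder: no copy, only a reference. */
static struct sketch_limits *
sketch_lim_share(struct sketch_limits *l)
{
	l->refcnt++;
	return l;
}

/* Drop one reference and free the struct when the last one goes. */
static void
sketch_lim_release(struct sketch_limits *l)
{
	if (--l->refcnt == 0)
		free(l);
}

/*
 * Write one limit. If the struct is shared, copy it first so other
 * holders keep reading the old values. Returns the struct the writer
 * holds afterwards, or NULL if the copy could not be allocated.
 */
static struct sketch_limits *
sketch_lim_write(struct sketch_limits *l, int which, long val)
{
	assert(which >= 0 && which < SKETCH_NLIMITS);
	if (l->refcnt > 1) {
		struct sketch_limits *n = malloc(sizeof(*n));

		if (n == NULL)
			return NULL;
		memcpy(n, l, sizeof(*n));
		n->refcnt = 1;
		l->refcnt--;
		l = n;
	}
	l->cur[which] = val;
	return l;
}
```

Readers never modify a shared copy, which is what makes nearly
lock-free reads possible in this scheme.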
|
|
damaged the error messages. Repair that, passing distinct format
strings for the two cases.
ok beck
|
|
Look up the address that a syscall instruction is executed from, and kill
the process if that page is writeable. This brings an aspect of W^X
behaviour to W|X mappings (in JITs not yet adapted to W^X). The goal is
to remove simple attack methods and force use of ret2libc or other more
complicated means.
ok kettenis stefan visa
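
The check can be pictured with a toy address map. This is only a
sketch of the decision, not the kernel implementation: the sk_* names
and the flat array of map entries are hypothetical, and rejecting a
program counter that falls outside every entry is an assumption made
for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_PROT_READ	0x1
#define SK_PROT_WRITE	0x2
#define SK_PROT_EXEC	0x4

/* Hypothetical map entry: the half-open range [start, end). */
struct sk_entry {
	uintptr_t	start;
	uintptr_t	end;
	int		prot;
};

/*
 * Return 1 if a syscall issued from pc would be allowed, 0 if the
 * process would be killed because the page is writable (or, by
 * assumption, because pc is not mapped at all).
 */
static int
sk_syscall_pc_ok(const struct sk_entry *map, size_t n, uintptr_t pc)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (pc >= map[i].start && pc < map[i].end)
			return (map[i].prot & SK_PROT_WRITE) == 0;
	}
	return 0;
}
```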
|
|
taking the kernel lock on when operating on the kernel_map when called from
all kernel memory allocation interfaces.
ok visa@, mlarkin@
|
|
knob, since we found a program which tests RWX mapping then changes execution
behaviour to non-W^X.
(that program is chrome, as v8 is heading towards W^X compliance with
mprotect RW/RX swaps, and also has jitless components in development.)
ok sthen kettenis robert
|
|
under lock
ok guenther@
|
|
ok guenther@
|
|
allocations will recover some memory from the dma_constraint range.
The allocation still fails, the intent is to ensure that the
pagedaemon will free some memory to possibly allow a subsequent
allocation to succeed.
This also adds a UVM_PLA_NOWAKE flag to allow special cases in the
buffer cache to not wake up the pagedaemon until they want to.
ok kettenis@
|
|
Reduce code clutter by removing the file name and line number output
from witness(4). Typically it is easy enough to locate offending locks
using the stack traces that are shown in lock order conflict reports.
Tricky cases can be tracked using sysctl kern.witness.locktrace=1 .
This patch additionally removes the witness(4) wrapper for mutexes.
Now each mutex implementation has to invoke the WITNESS_*() macros
in order to utilize the checker.
Discussed with and OK dlg@, OK mpi@
|
|
obvious misconfigurations that cannot work.
OK mpi@ tedu@
|
|
disabled in the process. Rather than tying it to KERNBASE, make it simply
-1, which makes it even more invalid.
ok tedu
|
|
MAP_CONCEAL'd memory is not written to disk in the event of a core dump.
It may grow other qualities in the future.
Wanted by libressl, probably useful elsewhere, too.
Prompted by deraadt@, concept from deraadt@/kettenis@. With input from
deraadt@, cjeker@, kettenis@, otto@, bcook@, matthew@, guenther@, djm@,
and tedu@.
ok otto@ deraadt@
|
|
objects that readers can access without locking. This provides a basis
for read-copy-update operations.
Readers access SMR-protected shared objects inside an SMR read-side
critical section where sleeping is not allowed. To reclaim
an SMR-protected object, the writer has to ensure mutual exclusion of
other writers, remove the object's shared reference and wait until
read-side references cannot exist any longer. As an alternative to
waiting, the writer can schedule a callback that gets invoked when
reclamation is safe.
The mechanism relies on CPU quiescent states to determine when an
SMR-protected object is ready for reclamation.
The <sys/smr.h> header additionally provides an implementation of
singly- and doubly-linked lists that can be used together with SMR.
These lists allow lockless read access with a concurrent writer.
Discussed with many
OK mpi@ sashan@
|
|
was never updated.
from Amit Kulkarni
|
|
sp must be on a MAP_STACK page. Relax the check a bit -- the sp may be
on a PROT_NONE page. Can't see how an attacker can leverage that situation.
(New perl build process contains a "how many call frames can my stack
hold" checker, and this triggers via the MAP_STACK fault rather than
the normal access check. The MAP_STACK check still has a kernel printf
as we hunt for applications which map stacks poorly. Interestingly the
perl code has a knob to disable similar printing alerts on Windows, which
apparently has a feature somewhat like MAP_STACK!)
ok tedu guenther kettenis
|
|
instead
From Pamela Mosiejczuk, many thanks!
OK phessler@ deraadt@
|
|
introduced with __MAP_NOFAULT. The regression let uvm_fault() run
without proper locking and rechecking of state after map version change
if page zero-fill was chosen.
OK kettenis@ deraadt@
Reported-by: syzbot+9972088c1026668c6c5c@syzkaller.appspotmail.com
|
|
about shared resources which no program should see. Only a few pieces of
software use it, generally poorly thought out. They are being fixed, so
mincore() can be deleted.
ok guenther tedu jca sthen, others
|
|
another process is doing. We don't want that, so instead have it
always return that memory is in core.
ok deraadt kettenis
|
|
physio(9) to prevent another thread from unmapping the memory and triggering
an assertion or even corrupting random physical memory pages.
ok deraadt@
Should fix:
Reported-by: syzbot+b8e7faf688f8c9d341b1@syzkaller.appspotmail.com
Reported-by: syzbot+b6a9255faa0605669432@syzkaller.appspotmail.com
|
|
inteldrm driver to add support for the I915_MMAP_WC flag.
ok deraadt@, jsg@
|
|
ok jsg@ (who pointed out the kern_pledge.c change was necessary as well)
|
|
fd_getfile(9) is mpsafe. Note that sys_mmap(2) isn't actually unlocked
currently. However this diff has been tested with it unlocked, and I
hope to unlock it for real soon-ish.
ok visa@, mpi@
|
|
the start of the range of pages that we're changing. Prevents a panic from
a somewhat convoluted test case that anton@ came up with.
ok guenther@, anton@
|
|
kernel calls to ensure that the UVM cache for memory mapped files is
up to date.
ok mpi@
|
|
unusedNN.
Missing man page bits pointed out by
jmc@. Ports source scan by sthen@.
ok deraadt@ guenther@
|
|
"Buffer cache pages are wired but not counted as such. Therefore we
have to set the wire count on the pages to 0 before we call
uvm_pagefree() on them, just like we do in buf_free_pages().
Otherwise the wired pages counter goes negative. While there, also
sprinkle some KASSERTs in there that buf_free_pages() has as well."
ok beck@ (again)
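
The accounting problem quoted above can be reproduced with a toy
model. Everything here is hypothetical (the sk_* names, the single
global counter); it only illustrates why the wire count has to be
cleared before the page is handed to the generic free routine.

```c
#include <assert.h>

/* Hypothetical page with a wire count. */
struct sk_page {
	int	wire_count;
};

/* Counts pages wired through the normal path only. */
static int sk_wired_total;

/* Generic free: assumes any wired page was counted in sk_wired_total. */
static void
sk_pagefree(struct sk_page *pg)
{
	if (pg->wire_count > 0) {
		pg->wire_count = 0;
		sk_wired_total--;
	}
}

/*
 * Release a buffer cache page. Such pages are wired but were never
 * counted, so the wire count is zeroed before the generic free.
 */
static void
sk_buf_page_release(struct sk_page *pg)
{
	assert(pg->wire_count == 1);
	pg->wire_count = 0;
	sk_pagefree(pg);
}
```

Calling sk_pagefree() directly on an uncounted wired page drives the
counter below zero, which is the symptom the quoted message describes.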
|
|
unnecessary because curproc always does the locking.
OK mpi@
|
|
curproc that does the locking or unlocking, so the proc parameter
is pointless and can be dropped.
OK mpi@, deraadt@
|
|
ok visa@
|
|
stack buffer. With a page-aligned buffer, creating a MAP_STACK sub-region
would undo the PROT_NONE guard. Ignore that last page.
(We could check if the last page is non-RW before choosing to skip it. But
we've already elected to grow STK sizes to compensate. Always ignoring the
last page makes it a non-MAP_STACK guard page which can be opportunistically
discovered)
ok semarie stefan kettenis
|
|
the brk area anyway.
- Use a larger hint bound to spread the allocations more for the 32-bit case
- Simplified the overly abstracted brk/stack allocator and switched off
guard pages for the brk case. This allows i386 some extra space,
depending on memory usage patterns.
- Reduce brk area on i386 to give the rnd space more room
ok stefan@ sthen@
|
|
Other parts of uvm/pmap check for proper prot flags
already. This fixes the qemu startup problems that
semarie@ reported on tech@.
|
|
syscall) confirm the stack register points at MAP_STACK memory, otherwise
SIGSEGV is delivered. sigaltstack() and pthread_attr_setstack() are modified
to create a MAP_STACK sub-region which satisfies alignment requirements.
Observe that MAP_STACK can only be set/cleared by mmap(), which zeroes the
contents of the region -- there is no mprotect() equivalent operation, so
there is no MAP_STACK-adding gadget.
This opportunistic software emulation of a stack protection bit makes
stack-pivot operations during a ROP chain fragile (kind of like removing a
tool from the toolbox).
original discussion with tedu, uvm work by stefan, testing by mortimer
ok kettenis
|
|
direction, otherwise we might break the loop prematurely; ok stefan@
|
|
issues with upcoming NFSnode's locks.
ok visa@
|
|
protection cannot block the final SIGABRT.
While here apply the same logic to ddb(4)'s kill command.
From semarie@, ok deraadt@
|
|
revoked while syncing disk, so the processes lose their executable
pages. Instead of killing them with a SIGBUS after page fault,
just sleep. This should keep init from dying without its pages,
which would be followed by a kernel panic.
initial diff from tedu@; OK deraadt@ tedu@
|