|
prints the end, which is in the next page. Subtract 1 to avoid confusion.
|
|
Reduce differences with NetBSD, tested by many as part of a larger diff.
ok kettenis@
|
|
This is possible now that amaps & anons are protected by a per-map rwlock.
Tested by many as part of a bigger diff.
ok kettenis@
|
|
This change introduced or exposed a leak of anons which results in system
freezes.
anton@ observed a high number of INUSE for anonpl and semarie@ saw multiple
processes waiting in the fault handler on "flt_noramX", probably the one
related to allocating an anon.
|
|
This is possible now that amaps & anons are protected by a per-map rwlock.
ok kettenis@, jmatthew@
|
|
This is necessary to do this accounting without the KERNEL_LOCK().
ok mvs@, kettenis@
|
|
No functional change.
ok mlarkin@
|
|
ok mpi@
|
|
|
|
ok mpi@
|
|
A rwlock is attached to every amap and is shared with all its anons. The
same lock will be used by multiple amaps if they have anons in common.
This should be enough to get the upper part of the fault handler out of the
KERNEL_LOCK(), which seems to bring improvements of up to 20% in builds.
This is based/copied/adapted from the most recent work done in NetBSD,
which is an evolution of the preceding simple_lock scheme.
Tested by many, thanks!
ok kettenis@, mvs@
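A minimal sketch of the scheme (the field and helper names here are
illustrative, not the exact OpenBSD ones):

	struct vm_amap {
		struct rwlock	*am_lock;	/* protects the amap and its anons */
		/* ... */
	};

	struct vm_anon {
		struct rwlock	*an_lock;	/* points at the owning amap's lock */
		/* ... */
	};

	/*
	 * An anon adopts the amap's rwlock when it is inserted, so every
	 * amap holding a reference to that anon ends up sharing one lock.
	 */
	void
	amap_add_anon(struct vm_amap *amap, struct vm_anon *anon)
	{
		rw_assert_wrlock(amap->am_lock);
		anon->an_lock = amap->am_lock;
		/* ... link the anon into the amap's slot array ... */
	}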
|
|
The underlying vm_space lock is used as a substitute for the KERNEL_LOCK()
in uvm_grow() to make sure `vm_ssize' is not corrupted.
ok anton@, kettenis@
|
|
Reduce differences with NetBSD.
ok mvs@, kettenis@
|
|
kernel lock are fixed now, push the kernel lock down again.
ok deraadt@
|
|
ok kettenis@
|
|
|
|
CID 1453116
ok kettenis@
|
|
Instead count (and check the limit) when their protection gets flipped
from PROT_NONE to something that permits access. This means that
mprotect(2) may now fail if changing the protection would exceed RLIMIT_DATA.
This helps code (such as Chromium's JavaScript interpreter) that reserves
large chunks of address space but populates it sparsely.
ok deraadt@, otto@, kurt@, millert@, robert@
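A minimal userland illustration of the new accounting (sizes are
arbitrary): the PROT_NONE reservation is free, and only the mprotect(2)
that makes pages accessible is charged against RLIMIT_DATA.

	#include <sys/mman.h>
	#include <err.h>

	int
	main(void)
	{
		size_t reserve = 1UL << 30;	/* reserve 1GB of address space */
		char *base;

		/* PROT_NONE pages are no longer counted against RLIMIT_DATA */
		base = mmap(NULL, reserve, PROT_NONE, MAP_PRIVATE|MAP_ANON,
		    -1, 0);
		if (base == MAP_FAILED)
			err(1, "mmap");

		/*
		 * Flipping pages to accessible is what counts now, so this
		 * may fail with ENOMEM if it would exceed RLIMIT_DATA.
		 */
		if (mprotect(base, 65536, PROT_READ|PROT_WRITE) == -1)
			err(1, "mprotect");

		base[0] = 1;
		return 0;
	}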
|
|
ok mpi@
|
|
OK guenther@, kettenis@, mpi@
|
|
maps. This lets witness know that these really are different classes,
avoiding false positives when detecting lock order reversals.
ok guenther@, visa@, mpi@
|
|
- reduces gratuitous differences with NetBSD,
- merges multiple '#ifdef _KERNEL' blocks,
- kills unused 'struct vm_map_intrsafe'
- turns 'union vm_map_object' into an anonymous union (following NetBSD)
- moves questionable vm_map_modflags() into uvm/uvm_map.c
- removes guards around MAX_KMAPENT; it is defined & used only once
- documents lock differences
- fixes tab vs space
ok mlarkin@, visa@
|
|
|
|
non-writeable / syscall checker.
|
|
Syzkaller found a bug in uvm_share when using a vmd(8) mmap region with
an offset that ended up overlapping a previous vmm(4) uvm_map range.
This diff reworks the range and offset calculation in uvm_share. Only
vmm(4) uses this, so there should be no visible effects outside vmm(4)
environments.
Syzkaller also went sorta crazy on this one, finding multiple reproducers
for the same bug with just slightly different parameters, thus the
multiple "Reported-by" lines below.
ok stefan@, anton@
Reported-by: syzbot+2c625ab1b8e964da644a@syzkaller.appspotmail.com
Reported-by: syzbot+1300829862412751462d@syzkaller.appspotmail.com
Reported-by: syzbot+27cfad3394f34528cbec@syzkaller.appspotmail.com
Reported-by: syzbot+3e700c5698177f91cce1@syzkaller.appspotmail.com
|
|
ok tedu@, visa@
|
|
haven't crossed over the ABI break as easily as expected.
|
|
enforce a new policy: system calls must be in pre-registered regions.
We have discussed more strict checks than this, but none satisfy the
cost/benefit based upon our understanding of attack methods; anyway,
let's see what the next iteration looks like.
This is intended to harden (translation: attackers must put extra
effort into attacking) against a mixture of W^X failures and JIT bugs
which allow syscall misinterpretation, especially in environments with
polymorphic-instruction/variable-sized instructions. It fits in a bit
with libc/libcrypto/ld.so random relink on boot and no-restart-at-crash
behaviour, particularly for remote problems. Less effective once on-host,
since the libraries can be read.
For static executables the kernel registers the main program's
PIE-mapped exec section as valid, as well as the randomly-placed sigtramp
page. For dynamic executables, ELF ld.so's exec segment is also
labelled valid; ld.so then has enough information to register libc's
exec section as valid via call-once msyscall(2).
For dynamic binaries, we continue to permit the main program exec
segment because "go" (and potentially a few other applications) have
embedded system calls in the main program. Hopefully at least go gets
fixed soon.
We declare the concept of embedded syscalls a bad idea for numerous
reasons, as we notice the ecosystem has many instances of
static-syscall-in-base-binary which are dynamically linked against
libraries which in turn use libc, which contains another set of
syscall stubs. We've been concerned about adding even one additional
syscall entry point... but go's approach tends to double the entry-point
attack surface.
This was started at a nano-hackathon in Bob Beck's basement 2 weeks
ago during a long discussion with mortimer trying to hide from the SSL
scream-conversations, and finished in more comfortable circumstances
next to a wood-stove at Elk Lakes cabin with UVM scream-conversations.
ok guenther kettenis mortimer, lots of feedback from others
conversations about go with jsing tb sthen
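A hedged sketch of the call-once registration ld.so performs; the
function and variable names below are illustrative placeholders, only
the msyscall(2) prototype comes from the new interface.

	#include <unistd.h>

	int	msyscall(void *addr, size_t len);	/* the new system call */

	/*
	 * ld.so registers libc's exec segment exactly once; the
	 * registration cannot be repeated or widened later.
	 */
	void
	register_libc_text(void *text_base, size_t text_len)
	{
		if (msyscall(text_base, text_len) == -1)
			_exit(1);	/* no valid syscall region: give up */
	}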
|
|
wrapped line.
No code change.
|
|
No code change.
|
|
Found by jsing@
|
|
The lookup in uvm_map_inentry_fix() is already serialized by the
vm_map_lock and such a lookup is already executed w/o the KERNEL_LOCK().
ok kettenis@, deraadt@
|
|
|
|
violations in system accounting. This will help to find misbehaving
programs and possible attacks. The flags bit field is full, so
recycle the PDP-11 compatibility on VAX. lastcomm(1) prints the
AMAP flag as 'M'. daily(8) prints a list of affected processes.
OK deraadt@
|
|
damaged the error messages. Repair that, passing distinct format
strings for the two cases.
ok beck
|
|
Lookup the address that a syscall instruction is executed from, and kill
the process if that page is writeable. This brings an aspect of W^X
behaviour to W|X mappings (in JITs not yet adapted to W^X). The goal is
to remove simple attack methods and force use of ret2libc or other more
complicated means.
ok kettenis stefan visa
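A simplified kernel-side sketch of the idea, reusing the existing
uvm_map_lookup_entry() helper; the function name and surrounding error
handling are illustrative, not the committed implementation.

	/* Return non-zero if the page `pc' lies in is mapped writeable. */
	int
	syscall_pc_writeable(struct vm_map *map, vaddr_t pc)
	{
		struct vm_map_entry *entry;
		int writeable = 0;

		vm_map_lock_read(map);
		if (uvm_map_lookup_entry(map, pc, &entry))
			writeable = (entry->protection & PROT_WRITE) != 0;
		vm_map_unlock_read(map);

		return writeable;	/* caller delivers the kill */
	}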
|
|
taking the kernel lock when operating on the kernel_map when called from
all kernel memory allocation interfaces.
ok visa@, mlarkin@
|
|
Reduce code clutter by removing the file name and line number output
from witness(4). Typically it is easy enough to locate offending locks
using the stack traces that are shown in lock order conflict reports.
Tricky cases can be tracked using sysctl kern.witness.locktrace=1.
This patch additionally removes the witness(4) wrapper for mutexes.
Now each mutex implementation has to invoke the WITNESS_*() macros
in order to utilize the checker.
Discussed with and OK dlg@, OK mpi@
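With the wrapper gone, a mutex implementation invokes the witness hooks
directly; a hedged sketch (mtx_try_acquire() stands in for the real
spin logic, the WITNESS_*() macro names follow witness(4)):

	void
	mtx_enter(struct mutex *mtx)
	{
		/* let witness validate the acquisition order up front */
		WITNESS_CHECKORDER(MUTEX_LOCK_OBJECT(mtx),
		    LOP_EXCLUSIVE | LOP_NEWORDER, NULL);

		while (!mtx_try_acquire(mtx))	/* illustrative spin */
			CPU_BUSY_CYCLE();

		/* record the acquisition with the checker */
		WITNESS_LOCK(MUTEX_LOCK_OBJECT(mtx), LOP_EXCLUSIVE);
	}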
|
|
MAP_CONCEAL'd memory is not written to disk in the event of a core dump.
It may grow other qualities in the future.
Wanted by libressl, probably useful elsewhere, too.
Prompted by deraadt@, concept from deraadt@/kettenis@. With input from
deraadt@, cjeker@, kettenis@, otto@, bcook@, matthew@, guenther@, djm@,
and tedu@.
ok otto@ deraadt@
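A minimal illustration (the helper name is hypothetical): allocating
key material that will stay out of any core dump.

	#include <sys/mman.h>
	#include <stddef.h>
	#include <err.h>

	unsigned char *
	alloc_secret(size_t len)
	{
		void *p = mmap(NULL, len, PROT_READ|PROT_WRITE,
		    MAP_PRIVATE|MAP_ANON|MAP_CONCEAL, -1, 0);

		if (p == MAP_FAILED)
			err(1, "mmap");
		return p;	/* omitted from core dumps */
	}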
|
|
sp must be on a MAP_STACK page. Relax the check a bit -- the sp may be
on a PROT_NONE page. Can't see how an attacker can leverage that situation.
(New perl build process contains a "how many call frames can my stack
hold" checker, and this triggers via the MAP_STACK fault rather than
the normal access check. The MAP_STACK check still has a kernel printf
as we hunt for applications which map stacks poorly. Interestingly the
perl code has a knob to disable similar printing alerts on Windows, which
apparently has a feature somewhat like MAP_STACK!)
ok tedu guenther kettenis
|
|
instead
From Pamela Mosiejczuk, many thanks!
OK phessler@ deraadt@
|
|
inteldrm driver to add support for the I915_MMAP_WC flag.
ok deraadt@, jsg@
|
|
the start of the range of pages that we're changing. Prevents a panic from
a somewhat convoluted test case that anton@ came up with.
ok guenther@, anton@
|
|
stack buffer. With a page-aligned buffer, creating a MAP_STACK sub-region
would undo the PROT_NONE guard. Ignore that last page.
(We could check if the last page is non-RW before choosing to skip it. But
we've already elected to grow STK sizes to compensate. Always ignoring the
last page makes it a non-MAP_STACK guard page which can be opportunistically
discovered)
ok semarie stefan kettenis
|
|
the brk area anyway.
- Use a larger hint bound to spread the allocations more for the 32-bit case.
- Simplified the overly abstracted brk/stack allocator and switched off
guard pages for the brk case. This allows i386 some extra space,
depending on memory usage patterns.
- Reduce brk area on i386 to give the rnd space more room
ok stefan@ sthen@
|
|
Other parts of uvm/pmap check for proper prot flags
already. This fixes the qemu startup problems that
semarie@ reported on tech@.
|
|
syscall) confirm the stack register points at MAP_STACK memory, otherwise
SIGSEGV is delivered. sigaltstack() and pthread_attr_setstack() are modified
to create a MAP_STACK sub-region which satisfies alignment requirements.
Observe that MAP_STACK can only be set/cleared by mmap(), which zeroes the
contents of the region -- there is no mprotect() equivalent operation, so
there is no MAP_STACK-adding gadget.
This opportunistic software-emulation of a stack protection bit makes
stack-pivot operations during ROPchain fragile (kind of like removing a
tool from the toolbox).
original discussion with tedu, uvm work by stefan, testing by mortimer
ok kettenis
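A minimal illustration of allocating MAP_STACK memory for an alternate
signal stack; per the commit, sigaltstack(2) now creates the sub-region
itself, so passing the flag explicitly here is purely illustrative.

	#include <sys/mman.h>
	#include <signal.h>
	#include <err.h>

	void
	setup_sigstack(void)
	{
		stack_t ss;

		ss.ss_sp = mmap(NULL, SIGSTKSZ, PROT_READ|PROT_WRITE,
		    MAP_PRIVATE|MAP_ANON|MAP_STACK, -1, 0);
		if (ss.ss_sp == MAP_FAILED)
			err(1, "mmap");
		ss.ss_size = SIGSTKSZ;
		ss.ss_flags = 0;
		if (sigaltstack(&ss, NULL) == -1)
			err(1, "sigaltstack");
	}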
|
|
that is attempted.
Minor cleanups:
- Eliminate some always false and always true tests against MAP_ANON
- We treat anon mappings with neither MAP_{SHARED,PRIVATE} as MAP_PRIVATE,
so explicitly indicate that
ok kettenis@ beck@
|
|
when WITNESS is enabled
ok visa@ kettenis@
|
|
A deadlock can occur when uvm_km_thread(), running without the KERNEL_LOCK(),
is interrupted by a non-MPSAFE handler while holding the pool's mutex. At
that moment, if another CPU is holding the KERNEL_LOCK() and wants to grab
the pool mutex, like in sys_kbind(), kaboom!
This is a temporary solution; a more general approach regarding mutexes and
un-KERNEL_LOCK()ed threads is being discussed.
Deadlock reported by sthen@, ok kettenis@
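An interleaving diagram of the deadlock (illustrative; pr_mtx names the
pool's mutex):

	/*
	 * CPU0: uvm_km_thread (no KERNEL_LOCK)  CPU1: sys_kbind
	 * -------------------------------------  --------------------------
	 * mtx_enter(&pool->pr_mtx);              KERNEL_LOCK();
	 * <interrupted by non-MPSAFE handler>    mtx_enter(&pool->pr_mtx);
	 * KERNEL_LOCK();  -> spins on CPU1          -> spins on CPU0's mutex
	 *                                        => neither side progresses
	 */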
|