Age | Commit message | Author |
|
ok bluhm@, visa@
|
|
Prompted by a question from schwarze@
ok deraadt@, schwarze@, visa@
|
|
Although there are open questions about whether we should flag failures with
UVM_PMA_FAIL or not, we really should only wake up a sleeper if we unlink
the pma. For now only do that if pages were actually freed in the requested
region.
Prompted by:
CID 1453061 Logically dead code
which should be fixed by this commit.
ok (and together with) beck@
|
|
CID 1453116
ok kettenis@
|
|
CID 1453262.
|
|
Instead count (and check the limit) when their protection gets flipped
from PROT_NONE to something that permits access. This means that
mprotect(2) may now fail if changing the protection would exceed RLIMIT_DATA.
This helps code (such as Chromium's JavaScript interpreter) that reserves
large chunks of address space but populates them sparsely.
ok deraadt@, otto@, kurt@, millert@, robert@
|
|
Do not include <sys/kthread.h> where it is not needed and stop including
<sys/proc.h> in it.
ok visa@, anton@
|
|
into read-only data segment.
OK deraadt@ tedu@
|
|
contiguous pages.
ok beck@
|
|
the amap code to free pages as a list instead of one at a time to
allow for more efficient freeing.
Most of the work done at Elk Lakes, with testing by me and mlarkin
and kettenis. Speeds up a test program which zeros a big pile of memory
and then exits considerably.
ok kettenis@
|
|
in reverse order from uvm. Use it in uvm_pmr_freepageq when the
pages appear to be in reverse order.
This greatly improves cases of massive page freeing as noticed by
mlarkin@ in his ongoing efforts to have the most gigantish buffer
cache on the planet.
Most of this work done by me with help and polish from kettenis@
at e2k19. Follow on commits to this will make use of this for
more efficient freeing of amaps and a few other things.
ok kettenis@ deraadt@
|
|
ok mpi@
|
|
drops graphics buffers that are cached and not in active use.
Help from beck@ for pointing out how to hook this up to our pagedaemon.
ok jsg@
|
|
OK guenther@, kettenis@, mpi@
|
|
maps. This lets witness know that these really are different classes
avoiding false positives when detecting lock order reversals.
ok guenther@, visa@, mpi@
|
|
- reduces gratuitous differences with NetBSD,
- merges multiple '#ifdef _KERNEL' blocks,
- kills unused 'struct vm_map_intrsafe'
- turns 'union vm_map_object' into an anonymous union (following NetBSD)
- move questionable vm_map_modflags() into uvm/uvm_map.c
- remove guards around MAX_KMAPENT, it is defined&used only once
- document lock differences
- fix tab vs space
ok mlarkin@, visa@
|
|
|
|
non-writeable / syscall checker.
|
|
ok visa@, jca@
|
|
by sparc pmap.
OK mpi@ guenther@ kettenis@
|
|
ok guenther@
|
|
ok mlarkin@
|
|
On May 29, 2008, Matthew R. Green removed it in NetBSD:
github.com/IIJ-NetBSD/netbsd-src/commit/7ea20401d535da9996394136ef
ok deraadt@
|
|
Syzkaller found a bug in uvm_share when using a vmd(8) mmap region with
an offset that ended up making an overlap with a previous vmm(4) uvm_map
range.
This diff reworks the range and offset calculation in uvm_share. Only
vmm(4) uses this, so there should be no visible effects outside vmm(4)
environments.
Syzkaller also went sorta crazy on this one, finding multiple reproducers
for the same bug with just slightly different parameters, thus the
multiple "Reported-by" lines below.
ok stefan@, anton@
Reported-by: syzbot+2c625ab1b8e964da644a@syzkaller.appspotmail.com
Reported-by: syzbot+1300829862412751462d@syzkaller.appspotmail.com
Reported-by: syzbot+27cfad3394f34528cbec@syzkaller.appspotmail.com
Reported-by: syzbot+3e700c5698177f91cce1@syzkaller.appspotmail.com
|
|
ok tedu@, visa@
|
|
haven't crossed over the ABI break as easily as expected.
|
|
Use this in the buffer cache to free all the pages from a buffer,
resulting in a considerable speedup when throwing away pages from
the buffer cache.
Lots of work done with mlarkin and kettenis
ok kettenis@ deraadt@
|
|
into a separate uvm_pageclean() function and call it from uvm_pagefree().
ok mpi@, guenther@, beck@
|
|
enforce a new policy: system calls must be in pre-registered regions.
We have discussed more strict checks than this, but none satisfy the
cost/benefit based upon our understanding of attack methods, anyways
let's see what the next iteration looks like.
This is intended to harden (translation: attackers must put extra
effort into attacking) against a mixture of W^X failures and JIT bugs
which allow syscall misinterpretation, especially in environments with
polymorphic-instruction/variable-sized instructions. It fits in a bit
with libc/libcrypto/ld.so random relink on boot and no-restart-at-crash
behaviour, particularly for remote problems. Less effective once on-host
since the libraries can then be read.
For static-executables the kernel registers the main program's
PIE-mapped exec section valid, as well as the randomly-placed sigtramp
page. For dynamic executables ELF ld.so's exec segment is also
labelled valid; ld.so then has enough information to register libc's
exec section as valid via call-once msyscall(2).
For dynamic binaries, we continue to permit the main program exec
segment because "go" (and potentially a few other applications) have
embedded system calls in the main program. Hopefully at least go gets
fixed soon.
We declare the concept of embedded syscalls a bad idea for numerous
reasons, as we notice the ecosystem has many
static-syscall-in-base-binary programs which are dynamically linked against
libraries which in turn use libc, which contains another set of
syscall stubs. We've been concerned about adding even one additional
syscall entry point... but go's approach tends to double the entry-point
attack surface.
This was started at a nano-hackathon in Bob Beck's basement 2 weeks
ago during a long discussion with mortimer trying to hide from the SSL
scream-conversations, and finished in more comfortable circumstances
next to a wood-stove at Elk Lakes cabin with UVM scream-conversations.
ok guenther kettenis mortimer, lots of feedback from others
conversations about go with jsing tb sthen
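
The policy described in this commit (system calls accepted only from pre-registered regions, with one call-once registration available to ld.so) can be modelled in plain C as below. This is a userland simulation under stated assumptions, not the kernel implementation: every name prefixed with sketch_ is hypothetical, the fixed-size region table is an invention for brevity, and the real msyscall(2) interface and its error reporting are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_REGIONS 8	/* arbitrary bound for this sketch */

struct sketch_region {
	uintptr_t start;
	uintptr_t end;		/* exclusive */
};

struct sketch_proc {
	struct sketch_region regions[SKETCH_MAX_REGIONS];
	size_t nregions;
	bool msyscall_used;	/* models the call-once rule */
};

/* Models registration done at exec time (main program, sigtramp, ld.so). */
static bool
sketch_register(struct sketch_proc *p, uintptr_t start, size_t len)
{
	assert(p != NULL);
	if (len == 0 || p->nregions == SKETCH_MAX_REGIONS)
		return false;
	if (start > UINTPTR_MAX - len)
		return false;	/* range would wrap around */
	p->regions[p->nregions].start = start;
	p->regions[p->nregions].end = start + len;
	p->nregions++;
	return true;
}

/* Models the call-once registration of libc: only the first call succeeds. */
static bool
sketch_msyscall(struct sketch_proc *p, uintptr_t start, size_t len)
{
	assert(p != NULL);
	if (p->msyscall_used)
		return false;
	if (!sketch_register(p, start, len))
		return false;
	p->msyscall_used = true;
	return true;
}

/* Would a syscall instruction located at pc be accepted? */
static bool
sketch_syscall_allowed(const struct sketch_proc *p, uintptr_t pc)
{
	assert(p != NULL);
	for (size_t i = 0; i < p->nregions; i++) {
		if (pc >= p->regions[i].start && pc < p->regions[i].end)
			return true;
	}
	return false;
}
```

The call-once flag is the interesting design point: after ld.so has registered libc, later code in the process has no way to widen the set of valid regions.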
|
|
ok kettenis@
|
|
No code change.
|
|
be used by kernel and ld.so in the near future. Adding the system call
earlier will reduce the number of people who try to build through and
encounter agony.
ok kettenis guenther
|
|
wrapped line.
No code change.
|
|
No code change.
|
|
ok kettenis@, semarie@, deraadt@
|
|
Found by jsing@
|
|
ok kettenis@
|
|
The lookup in uvm_map_inentry_fix() is already serialized by the
vm_map_lock and such lookup is already executed w/o the KERNEL_LOCK().
ok kettenis@, deraadt@
|
|
|
|
violations in system accounting. This will help to find misbehaving
programs and possible attacks. The flags bit field is full, so
recycle the PDP-11 compatibility on VAX. lastcomm(1) prints the
AMAP flag as 'M'. daily(8) prints a list of affected processes.
OK deraadt@
|
|
UVM_WAIT() doesn't provide much of a useful abstraction. All callers
tsleep forever and no callers set PCATCH, so only 2 of 4 parameters are
actually used. Might as well just use tsleep_nsec(9) directly and make
the uvm code a bit less specialized.
Suggested by mpi@.
ok mpi@ visa@ millert@
|
|
Equivalent to their unsuffixed counterparts except that (a) they take
a timeout in terms of nanoseconds, and (b) INFSLP, aka UINT64_MAX (not
zero) indicates that a timeout should not be set.
For now, zero nanoseconds is not a strictly valid invocation: we log a
warning on DIAGNOSTIC kernels if we see such a call. We still sleep
until the next tick in such a case, however. In the future this could
become some sort of poll... TBD.
To facilitate conversions to these interfaces: add inline conversion
functions to sys/time.h for turning your timeout into nanoseconds.
Also do a few easy conversions for warmup and to demonstrate how
further conversions should be done.
Lots of input from mpi@ and ratchov@. Additional input from tedu@,
deraadt@, mortimer@, millert@, and claudio@.
Partly inspired by FreeBSD r247787.
positive feedback from deraadt@, ok mpi@
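
The timeout conventions in this commit (nanosecond timeouts, INFSLP meaning "no timeout", zero being not strictly valid) can be illustrated with a small standalone sketch. The names below are hypothetical stand-ins, not the helpers added to sys/time.h, and saturating to UINT64_MAX on overflow is an assumption of this sketch; note that under that choice an overflowing conversion reads as "no timeout".

```c
#include <assert.h>
#include <stdint.h>

/* "No timeout" marker; the message above gives it as UINT64_MAX. */
#define SKETCH_INFSLP	UINT64_MAX

static_assert(SKETCH_INFSLP != 0, "zero must not mean 'sleep forever'");

enum sketch_timo_kind {
	SKETCH_TIMO_INVALID,	/* zero: warned about on DIAGNOSTIC kernels */
	SKETCH_TIMO_FINITE,	/* an ordinary nanosecond timeout */
	SKETCH_TIMO_NONE	/* INFSLP: no timeout is set */
};

/* Seconds to nanoseconds, saturating instead of wrapping. */
static inline uint64_t
sketch_sec_to_nsec(uint64_t sec)
{
	if (sec > UINT64_MAX / 1000000000ULL)
		return UINT64_MAX;
	return sec * 1000000000ULL;
}

/* Milliseconds to nanoseconds, saturating instead of wrapping. */
static inline uint64_t
sketch_msec_to_nsec(uint64_t msec)
{
	if (msec > UINT64_MAX / 1000000ULL)
		return UINT64_MAX;
	return msec * 1000000ULL;
}

/* Classify a nanosecond timeout the way the message describes. */
static inline enum sketch_timo_kind
sketch_classify(uint64_t nsecs)
{
	if (nsecs == 0)
		return SKETCH_TIMO_INVALID;
	if (nsecs == SKETCH_INFSLP)
		return SKETCH_TIMO_NONE;
	return SKETCH_TIMO_FINITE;
}
```

A converted caller would then pass something like sketch_msec_to_nsec(10) where it previously passed a tick count, and SKETCH_INFSLP where it previously passed zero to sleep without a timeout.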
|
|
ok visa@, semarie@
|
|
of resource limit structs has been done between processes. By applying
copy-on-write also between threads, threads can read rlimits in
a nearly lock-free manner.
Inspired by code in DragonFly BSD and FreeBSD.
OK mpi@, agreement from jmatthew@ and anton@
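
The copy-on-write idea can be sketched as a reference-counted limits structure that is copied only when a holder modifies it while it is shared. This is a single-threaded illustration with hypothetical names (sketch_plimit and friends); the kernel code would additionally need atomic reference counts or locking, which is deliberately left out here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_NLIMITS 4	/* arbitrary number of limits for the sketch */

struct sketch_plimit {
	unsigned long cur[SKETCH_NLIMITS];
	int refcnt;		/* number of holders sharing this struct */
};

/* Allocate a fresh, unshared limits structure. */
static struct sketch_plimit *
sketch_lim_new(void)
{
	struct sketch_plimit *l = calloc(1, sizeof(*l));

	if (l != NULL)
		l->refcnt = 1;
	return l;
}

/* Sharing is cheap: take a reference, copy nothing. */
static struct sketch_plimit *
sketch_lim_share(struct sketch_plimit *l)
{
	assert(l != NULL && l->refcnt > 0);
	l->refcnt++;
	return l;
}

/* Drop a reference; the last holder frees the structure. */
static void
sketch_lim_free(struct sketch_plimit *l)
{
	assert(l != NULL && l->refcnt > 0);
	if (--l->refcnt == 0)
		free(l);
}

/*
 * Set one limit. If the structure is shared, the writer first gets a
 * private copy, so other holders keep reading the old values.
 * Returns 0 on success, -1 on a bad index or allocation failure.
 */
static int
sketch_lim_set(struct sketch_plimit **lp, int which, unsigned long val)
{
	struct sketch_plimit *l = *lp;

	if (which < 0 || which >= SKETCH_NLIMITS)
		return -1;
	if (l->refcnt > 1) {
		struct sketch_plimit *n = malloc(sizeof(*n));

		if (n == NULL)
			return -1;
		memcpy(n, l, sizeof(*n));
		n->refcnt = 1;
		l->refcnt--;
		*lp = l = n;
	}
	l->cur[which] = val;
	return 0;
}
```

Readers never copy and writers pay for the copy only when the structure is actually shared, which is what makes the read path cheap.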
|
|
damaged the error messages. Repair that, passing distinct format
strings for the two cases.
ok beck
|
|
Lookup the address that a syscall instruction is executed from, and kill
the process if that page is writeable. This brings an aspect of W^X
behaviour to W|X mappings (in JITs not yet adapted to W^X). The goal is
to remove simple attack methods and force use of ret2libc or other more
complicated means.
ok kettenis stefan visa
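
The check described here amounts to: find the mapping that contains the address of the syscall instruction, and refuse if that mapping is writable. A userland model of that decision might look like the following. All names are hypothetical, the linear array stands in for the real map lookup, and treating an unmapped address as a refusal is an assumption of the sketch rather than something the message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Protection bits local to this sketch (not the system PROT_* values). */
#define SKETCH_PROT_READ	0x1
#define SKETCH_PROT_WRITE	0x2
#define SKETCH_PROT_EXEC	0x4

struct sketch_entry {
	uintptr_t start;
	uintptr_t end;		/* exclusive */
	int prot;
};

/* Find the map entry containing pc, or NULL if pc is unmapped. */
static const struct sketch_entry *
sketch_lookup(const struct sketch_entry *map, size_t n, uintptr_t pc)
{
	for (size_t i = 0; i < n; i++) {
		if (pc >= map[i].start && pc < map[i].end)
			return &map[i];
	}
	return NULL;
}

/*
 * Decide whether a syscall issued from pc is acceptable: the page must
 * be mapped and must not be writable. In the commit the process is
 * killed when this check fails; here we only report the decision.
 */
static bool
sketch_syscall_pc_ok(const struct sketch_entry *map, size_t n, uintptr_t pc)
{
	const struct sketch_entry *e;

	assert(map != NULL || n == 0);
	e = sketch_lookup(map, n, pc);
	if (e == NULL)
		return false;
	return (e->prot & SKETCH_PROT_WRITE) == 0;
}
```

With this rule, a W|X JIT region can still execute generated code, but that code cannot issue system calls directly, which is the "force use of ret2libc or other more complicated means" effect the message mentions.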
|
|
taking the kernel lock on when operating on the kernel_map when called from
all kernel memory allocation interfaces.
ok visa@, mlarkin@
|
|
|
|
knob, since we found a program which tests RWX mapping then changes execution
behaviour to non-W^X.
(that program is chrome, as v8 is heading towards W^X compliance with
mprotect RW/RX swaps, and also has jitless components in development.)
ok sthen kettenis robert
|
|
under lock
ok guenther@
|