path: root/sys/uvm/uvm_map.c
2021-12-07  Theo de Raadt
uvm_map_inentry() is provided a format string that says "inside", but then prints the end which is in the next page. Subtract 1 to avoid confusion.
2021-10-24  Martin Pieuchot
Move pmap_{,k}remove() inside uvm_km_pgremove{,_intrsafe}(). Reduce differences with NetBSD, tested by many as part of a larger diff. ok kettenis@
2021-10-05  Martin Pieuchot
Unref/free amaps before grabbing the KERNEL_LOCK(). This is possible now that amaps & anons are protected by a per-map rwlock. Tested by many as part of a bigger diff. ok kettenis@
2021-06-17  Martin Pieuchot
Revert previous: unref of amap outside of the KERNEL_LOCK(). This change introduced or exposed a leak of anons which results in system freezes. anton@ observed a high number of INUSE for anonpl and semarie@ saw multiple processes waiting in the fault handler on "flt_noramX", probably the one related to allocating an anon.
2021-06-15  Martin Pieuchot
Unref/free amaps before grabbing the KERNEL_LOCK(). This is possible now that amaps & anons are protected by a per-map rwlock. ok kettenis@, jmatthew@
2021-05-22  Martin Pieuchot
Use atomic operations for reference counting VM maps. This is necessary to do this accounting without the KERNEL_LOCK(). ok mvs@, kettenis@
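As an illustration of the technique (a minimal sketch using the atomic(9) primitives; the structure, field and teardown names are made up, not the actual uvm_map.c code):

    /* Sketch: lock-free reference counting with atomic(9). */
    #include <sys/atomic.h>

    struct ex_map {
            unsigned int    ref_count;
    };

    void    ex_map_teardown(struct ex_map *);       /* hypothetical */

    void
    ex_map_reference(struct ex_map *map)
    {
            atomic_inc_int(&map->ref_count);        /* no KERNEL_LOCK() needed */
    }

    void
    ex_map_deallocate(struct ex_map *map)
    {
            /* atomic_dec_int_nv() returns the new count, so only the
             * thread that drops the last reference tears the map down. */
            if (atomic_dec_int_nv(&map->ref_count) == 0)
                    ex_map_teardown(map);
    }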
2021-03-26  Martin Pieuchot
Remove parentheses around return value to reduce the diff with NetBSD. No functional change. ok mlarkin@
2021-03-12  Jonathan Gray
spelling
ok mpi@
2021-03-05  Jonathan Gray
ansi
2021-02-23  Jonathan Gray
remove unused uvm_mapent_bias()
ok mpi@
2021-01-19  Martin Pieuchot
(re)Introduce locking for amaps & anons. A rwlock is attached to every amap and is shared with all its anons. The same lock will be used by multiple amaps if they have anons in common. This should be enough to get the upper part of the fault handler out of the KERNEL_LOCK(), which seems to bring up to 20% improvement in builds. This is based/copied/adapted from the most recent work done in NetBSD, which is an evolution of the precedent simple_lock scheme. Tested by many, thanks! ok kettenis@, mvs@
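The shape of the scheme, sketched with the rwlock(9) API (the structure and function names below are illustrative, not the actual amap/anon code): every anon points at the rwlock of the amap that owns it, so amaps with anons in common end up serialized by the same lock.

    /* Sketch: an amap and its anons share one rwlock. */
    #include <sys/rwlock.h>

    struct ex_amap {
            struct rwlock   *am_lock;       /* shared with every member anon */
    };

    struct ex_anon {
            struct rwlock   *an_lock;       /* == owning amap's am_lock */
    };

    void
    ex_fault_upper(struct ex_amap *amap)
    {
            rw_enter(amap->am_lock, RW_WRITE);
            /* ... the upper part of the fault handler can now walk and
             * modify the amap and its anons without the KERNEL_LOCK() ... */
            rw_exit(amap->am_lock);
    }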
2020-10-19  Martin Pieuchot
Serialize accesses to "struct vmspace" and document its refcounting. The underlying vm_space lock is used as a substitute to the KERNEL_LOCK() in uvm_grow() to make sure `vm_ssize' is not corrupted. ok anton@, kettenis@
2020-09-22  Martin Pieuchot
Spell inline correctly. Reduce differences with NetBSD. ok mvs@, kettenis@
2020-09-14  Mark Kettenis
Since the issues with calling uvm_map_inentry_fix() without holding the kernel lock are fixed now, push the kernel lock down again. ok deraadt@
2020-09-12  Martin Pieuchot
Add tracepoints in the page fault handler and when entries are added to maps.
ok kettenis@
2020-07-06  Theo de Raadt
fix spelling
2020-03-25  Martin Pieuchot
Do not test against NULL a variable which is dereferenced before that. CID 1453116. ok kettenis@
2020-03-04  Mark Kettenis
Do not count pages mapped as PROT_NONE against the RLIMIT_DATA limit. Instead count (and check the limit) when their protection gets flipped from PROT_NONE to something that permits access. This means that mprotect(2) may now fail if changing the protection would exceed RLIMIT_DATA. This helps code (such as Chromium's JavaScript interpreter) that reserves large chunks of address space but populates it sparsely. ok deraadt@, otto@, kurt@, millert@, robert@
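For reference, the userland pattern this accommodates looks roughly like the sketch below (not code from the commit): reserve a large PROT_NONE region, then commit pieces with mprotect(2), which is now where the RLIMIT_DATA check (and a possible ENOMEM) happens.

    #include <sys/mman.h>
    #include <err.h>

    int
    main(void)
    {
            /* Reserve 1 GB of address space; PROT_NONE pages are no longer
             * charged against RLIMIT_DATA. */
            void *res = mmap(NULL, 1UL << 30, PROT_NONE,
                MAP_ANON | MAP_PRIVATE, -1, 0);
            if (res == MAP_FAILED)
                    err(1, "mmap");

            /* The limit is checked when protection flips to something that
             * permits access, so this call may now fail with ENOMEM. */
            if (mprotect(res, 4096, PROT_READ | PROT_WRITE) == -1)
                    err(1, "mprotect");
            return 0;
    }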
2019-12-30  Jonathan Gray
convert infinite msleep(9) to msleep_nsec(9)
ok mpi@
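The shape of that conversion, sketched with placeholder identifiers: a timeout of 0 ("never") becomes an explicit INFSLP.

    /* before: a timeout of 0 meant "no timeout" */
    error = msleep(ident, &ex_mtx, PWAIT, "exwait", 0);

    /* after: the infinite sleep is spelled out with msleep_nsec(9) */
    error = msleep_nsec(ident, &ex_mtx, PWAIT, "exwait", INFSLP);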
2019-12-18  Visa Hankala
Set vm_map's pmap in uvm_map_setup().
OK guenther@, kettenis@, mpi@
2019-12-18  Mark Kettenis
Use separate rwlock initializations for userland ("vmspace") and kernel maps. This lets witness know that these really are different classes, avoiding false positives when detecting lock order reversals. ok guenther@, visa@, mpi@
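Conceptually the change boils down to something like the sketch below (the predicate and lock names are made up): witness(4) derives a lock's class from its initialization, so giving the two kinds of maps separate rw_init() calls keeps kernel-map locking and vmspace locking from being treated as the same class.

    /* Sketch: distinct initializations => distinct witness(4) classes. */
    if (is_kernel_map)                              /* hypothetical predicate */
            rw_init(&map->lock, "kmaplk");
    else
            rw_init(&map->lock, "vmspacelk");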
2019-12-12  Martin Pieuchot
Header cleanup.
- reduces gratuitous differences with NetBSD,
- merges multiple '#ifdef _KERNEL' blocks,
- kills unused 'struct vm_map_intrsafe',
- turns 'union vm_map_object' into an anonymous union (following NetBSD),
- moves questionable vm_map_modflags() into uvm/uvm_map.c,
- removes guards around MAX_KMAPENT, it is defined & used only once,
- documents lock differences,
- fixes tab vs space.
ok mlarkin@, visa@
2019-12-09  Theo de Raadt
Many people have crossed the ABI, so re-enable "syscall call-from" checking.
2019-12-09  Theo de Raadt
improve comment for uvm_map_inentry_pc(), the underlying non-writeable / syscall checker.
2019-12-04  Mike Larkin
Fix a bad offset calculation in uvm_share. Syzkaller found a bug in uvm_share when using a vmd(8) mmap region with an offset that ended up making an overlap with a previous vmm(4) uvm_map range. This diff reworks the range and offset calculation in uvm_share. Only vmm(4) uses this, so there should be no visible effects outside vmm(4) environments. Syzkaller also went sorta crazy on this one, finding multiple reproducers for the same bug with just slightly different parameters, thus the multiple "Reported-by" lines below. ok stefan@, anton@
Reported-by: syzbot+2c625ab1b8e964da644a@syzkaller.appspotmail.com
Reported-by: syzbot+1300829862412751462d@syzkaller.appspotmail.com
Reported-by: syzbot+27cfad3394f34528cbec@syzkaller.appspotmail.com
Reported-by: syzbot+3e700c5698177f91cce1@syzkaller.appspotmail.com
2019-12-02  Martin Pieuchot
Stop supporting UVM_FLAG_TRYLOCK in uvm_mapanon(), it is not used.
ok tedu@, visa@
2019-11-30  Theo de Raadt
temporarily neuter the syscall-callfrom check as a few people haven't crossed over the ABI break as easily as expected.
2019-11-29  Theo de Raadt
Repurpose the "syscalls must be on a writeable page" mechanism to enforce a new policy: system calls must be in pre-registered regions. We have discussed more strict checks than this, but none satisfy the cost/benefit based upon our understanding of attack methods; anyways, let's see what the next iteration looks like.

This is intended to harden (translation: attackers must put extra effort into attacking) against a mixture of W^X failures and JIT bugs which allow syscall misinterpretation, especially in environments with polymorphic-instruction/variable-sized instructions. It fits in a bit with libc/libcrypto/ld.so random relink on boot and no-restart-at-crash behaviour, particularly for remote problems. Less effective once on-host, since the libraries can be read.

For static executables the kernel registers the main program's PIE-mapped exec section as valid, as well as the randomly-placed sigtramp page. For dynamic executables ELF ld.so's exec segment is also labelled valid; ld.so then has enough information to register libc's exec section as valid via call-once msyscall(2). For dynamic binaries, we continue to permit the main program exec segment because "go" (and potentially a few other applications) have embedded system calls in the main program. Hopefully at least go gets fixed soon.

We declare the concept of embedded syscalls a bad idea for numerous reasons, as we notice the ecosystem has many static-syscall-in-base binaries which are dynamically linked against libraries which in turn use libc, which contains another set of syscall stubs. We've been concerned about adding even one additional syscall entry point... but go's approach tends to double the entry-point attack surface.

This was started at a nano-hackathon in Bob Beck's basement 2 weeks ago during a long discussion with mortimer trying to hide from the SSL scream-conversations, and finished in more comfortable circumstances next to a wood-stove at Elk Lakes cabin with UVM scream-conversations.

ok guenther kettenis mortimer; lots of feedback from others, conversations about go with jsing tb sthen
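The registration itself is a single call; a rough sketch of what ld.so does once libc is mapped (the variable names are placeholders, and the header shown is an assumption, not confirmed by the commit):

    #include <sys/types.h>
    #include <sys/mman.h>   /* assumed home of the msyscall(2) prototype */
    #include <unistd.h>

    /* Register libc's text segment as a region permitted to contain
     * syscall instructions; msyscall(2) is call-once per the commit. */
    if (msyscall(libc_text_start, libc_text_len) == -1)
            _exit(1);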
2019-11-26  Mike Larkin
Fix a panic string that had the wrong function name and an improperly wrapped line. No code change.
2019-11-26  Mike Larkin
Fix a bunch of lines that had trailing whitespace. No code change.
2019-11-02  Martin Pieuchot
Revert previous, a race is present and can be triggered with golang. Found by jsing@
2019-11-01  Martin Pieuchot
Push the KERNEL_LOCK() down in uvm_map_inentry(). The lookup in uvm_map_inentry_fix() is already serialized by the vm_map_lock and such lookup is already executed w/o the KERNEL_LOCK(). ok kettenis@, deraadt@
2019-11-01  Martin Pieuchot
Keep local function definitions in C files.
2019-09-09  Alexander Bluhm
Inform about system call memory write protection and stack mapping violations in system accounting. This will help to find misbehaving programs and possible attacks. The flags bit field is full, so recycle the PDP-11 compatibility on VAX. lastcomm(1) prints the AMAP flag as 'M'. daily(8) prints a list of affected processes. OK deraadt@
2019-06-14  Theo de Raadt
The addition of writeable-syscall checking near MAP_STACK checking damaged the error messages. Repair that, passing distinct format strings for the two cases. ok beck
2019-06-01  Theo de Raadt
Refactor the MAP_STACK feature, and introduce another similar variation: Lookup the address that a syscall instruction is executed from, and kill the process if that page is writeable. This brings an aspect of W^X behaviour to W|X mappings (in JITs not yet adapted to W^X). The goal is to remove simple attack methods and force use of ret2libc or other more complicated means. ok kettenis stefan visa
2019-05-16  Mark Kettenis
Handle a bit more work without taking the kernel lock. This should avoid taking the kernel lock when operating on the kernel_map when called from all kernel memory allocation interfaces. ok visa@, mlarkin@
2019-04-23  Visa Hankala
Remove file name and line number output from witness(4). Reduce code clutter by removing the file name and line number output from witness(4). Typically it is easy enough to locate offending locks using the stack traces that are shown in lock order conflict reports. Tricky cases can be tracked using sysctl kern.witness.locktrace=1. This patch additionally removes the witness(4) wrapper for mutexes. Now each mutex implementation has to invoke the WITNESS_*() macros in order to utilize the checker. Discussed with and OK dlg@, OK mpi@
2019-03-01  cheloha
New mmap(2) flag: MAP_CONCEAL. MAP_CONCEAL'd memory is not written to disk in the event of a core dump. It may grow other qualities in the future. Wanted by libressl, probably useful elsewhere, too. Prompted by deraadt@, concept from deraadt@/kettenis@. With input from deraadt@, cjeker@, kettenis@, otto@, bcook@, matthew@, guenther@, djm@, and tedu@. ok otto@ deraadt@
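Usage is just another mmap(2) flag; a sketch of the intended use for secret material (not code from the commit):

    #include <sys/mman.h>
    #include <string.h>
    #include <err.h>

    int
    main(void)
    {
            /* Anything stored here is left out of a core dump. */
            unsigned char *key = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                MAP_ANON | MAP_PRIVATE | MAP_CONCEAL, -1, 0);
            if (key == MAP_FAILED)
                    err(1, "mmap");

            /* ... use the buffer for key material ... */

            explicit_bzero(key, 4096);
            munmap(key, 4096);
            return 0;
    }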
2019-02-15  Theo de Raadt
With an opportunistic check performed at every trap, we insist userland sp must be on a MAP_STACK page. Relax the check a bit -- the sp may be on a PROT_NONE page. Can't see how an attacker can leverage that situation. (New perl build process contains a "how many call frames can my stack hold" checker, and this triggers via the MAP_STACK fault rather than the normal access check. The MAP_STACK check still has a kernel printf as we hunt for applications which map stacks poorly. Interestingly the perl code has a knob to disable similar printing alerts on Windows, which apparently has a feature somewhat like MAP_STACK!) ok tedu guenther kettenis
2019-02-10"non-existant" is one of those words that don't exist, so use "non-existent"Peter Hessler
instead From Pamela Mosiejczuk, many thanks! OK phessler@ deraadt@
2018-10-31  Mark Kettenis
Add support to uvm to establish write-combining mappings. Use this in the inteldrm driver to add support for the I915_MMAP_WC flag. ok deraadt@, jsg@
2018-07-22  Mark Kettenis
In uvm_map_protect(), make sure we select a first map entry that ends after the start of the range of pages that we're changing. Prevents a panic from a somewhat convoluted test case that anton@ came up with. ok guenther@, anton@
2018-04-18  Theo de Raadt
Some programs create a PROT_NONE guard page at the far-end of the provided stack buffer. With a page-aligned buffer, creating a MAP_STACK sub-region would undo the PROT_NONE guard. Ignore that last page. (We could check if the last page is non-RW before choosing to skip it. But we've already elected to grow STK sizes to compensate. Always ignoring the last page makes it a non-MAP_STACK guard page which can be opportunistically discovered.) ok semarie stefan kettenis
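The pattern in question looks roughly like this sketch (illustrative, not from the commit): a page-aligned buffer whose last page is mprotect(2)'d PROT_NONE as a guard, then handed to pthread_attr_setstack(3); since the kernel now skips that last page when creating the MAP_STACK sub-region, the guard survives.

    #include <sys/mman.h>
    #include <pthread.h>
    #include <err.h>

    #define EX_STK_SIZE     (64UL * 4096)

    static void *
    worker(void *arg)
    {
            return NULL;
    }

    int
    main(void)
    {
            char *stk = mmap(NULL, EX_STK_SIZE, PROT_READ | PROT_WRITE,
                MAP_ANON | MAP_PRIVATE, -1, 0);
            if (stk == MAP_FAILED)
                    err(1, "mmap");
            /* Guard page at the far end of the buffer. */
            if (mprotect(stk + EX_STK_SIZE - 4096, 4096, PROT_NONE) == -1)
                    err(1, "mprotect");

            pthread_attr_t attr;
            pthread_attr_init(&attr);
            /* Turns the buffer into a MAP_STACK sub-region; the last page
             * is now ignored, so the guard is not undone. */
            pthread_attr_setstack(&attr, stk, EX_STK_SIZE);

            pthread_t t;
            pthread_create(&t, &attr, worker, NULL);
            pthread_join(t, NULL);
            return 0;
    }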
2018-04-17  Otto Moerbeek
- Make rnd hints avoid the brk area. The rnd allocator refuses to allocate in the brk area anyway.
- Use a larger hint bound to spread the allocations more for the 32-bit case.
- Simplify the overly abstracted brk/stack allocator and switch off guard pages for the brk case. This allows i386 some extra space, depending on memory usage patterns.
- Reduce brk area on i386 to give the rnd space more room.
ok stefan@ sthen@
2018-04-17  Stefan Kempf
Remove protection checks from uvm_map_is_stack_remappable. Other parts of uvm/pmap check for proper prot flags already. This fixes the qemu startup problems that semarie@ reported on tech@.
2018-04-12  Theo de Raadt
Implement MAP_STACK option for mmap(). Synchronous faults (pagefault and syscall) confirm the stack register points at MAP_STACK memory, otherwise SIGSEGV is delivered. sigaltstack() and pthread_attr_setstack() are modified to create a MAP_STACK sub-region which satisfies alignment requirements. Observe that MAP_STACK can only be set/cleared by mmap(), which zeroes the contents of the region -- there is no mprotect() equivalent operation, so there is no MAP_STACK-adding gadget. This opportunistic software-emulation of a stack protection bit makes stack-pivot operations during ROPchain fragile (kind of like removing a tool from the toolbox). original discussion with tedu, uvm work by stefan, testing by mortimer ok kettenis
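A minimal sketch of consuming the new flag directly (libc's sigaltstack() and pthread wrappers do the equivalent internally, per the commit):

    #include <sys/mman.h>
    #include <signal.h>
    #include <err.h>

    int
    main(void)
    {
            /* MAP_STACK marks this as legitimate stack memory, so the
             * per-fault/per-syscall stack-pointer check accepts it. */
            void *sp = mmap(NULL, SIGSTKSZ, PROT_READ | PROT_WRITE,
                MAP_ANON | MAP_PRIVATE | MAP_STACK, -1, 0);
            if (sp == MAP_FAILED)
                    err(1, "mmap");

            stack_t ss = { .ss_sp = sp, .ss_size = SIGSTKSZ, .ss_flags = 0 };
            if (sigaltstack(&ss, NULL) == -1)
                    err(1, "sigaltstack");
            return 0;
    }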
2017-11-30  Philip Guenther
__MAP_NOFAULT doesn't make sense with anon mappings, so return EINVAL if that is attempted.
Minor cleanups:
- Eliminate some always false and always true tests against MAP_ANON
- We treat anon mappings with neither MAP_{SHARED,PRIVATE} as MAP_PRIVATE, so explicitly indicate that
ok kettenis@ beck@
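The new rejection amounts to a check of this shape early in the mmap(2) path (a paraphrase, not the literal diff):

    /* __MAP_NOFAULT only makes sense for file-backed mappings. */
    if ((flags & MAP_ANON) && (flags & __MAP_NOFAULT))
            return EINVAL;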
2017-08-12  Philip Guenther
In the locking wrappers for &map->lock and &map->mtx, pass through file+line when WITNESS is enabled. ok visa@ kettenis@
2017-05-17Raise "uvm_map_entry_kmem_pool" IPL level to IPL_VM to prevent a deadlock.Martin Pieuchot
A deadlock can occur when the uvm_km_thread(), running without KERNEL_LOCK() is interrupted by and non-MPSAFE handler while holding the pool's mutex. At that moment if another CPU is holding the KERNEL_LOCK() and wants to grab the pool mutex, like in sys_kbind(), kaboom! This is a temporaty solution, a more generate approach regarding mutexes and un-KERNEL_LOCK()ed threads is beeing discussed. Deadlock reported by sthen@, ok kettenis@
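With the pool_init(9) interface that takes an IPL argument, the fix boils down to something like the sketch below (the wchan string and the other arguments are illustrative, not the literal diff); running the pool's mutex at IPL_VM keeps the interrupt handlers in question from running while the mutex is held:

    /* Sketch: initialize the map-entry pool at IPL_VM instead of IPL_NONE. */
    pool_init(&uvm_map_entry_kmem_pool, sizeof(struct vm_map_entry), 0,
        IPL_VM, 0, "vmmpekpl", NULL);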