path: root/sys/uvm
Age | Commit message | Author
2019-09-09 | Inform about system call memory write protection and stack mapping | Alexander Bluhm
violations in system accounting. This will help to find misbehaving programs and possible attacks. The flags bit field is full, so recycle the PDP-11 compatibility on VAX. lastcomm(1) prints the AMAP flag as 'M'. daily(8) prints a list of affected processes. OK deraadt@
2019-07-18 | R.I.P. UVM_WAIT(). Use tsleep_nsec(9) directly. | cheloha
UVM_WAIT() doesn't provide much of a useful abstraction. All callers tsleep forever and no callers set PCATCH, so only 2 of 4 parameters are actually used. Might as well just use tsleep_nsec(9) directly and make the uvm code a bit less specialized. Suggested by mpi@. ok mpi@ visa@ millert@
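A hedged sketch of what a converted UVM_WAIT() call site looks like; the wait channel and wmesg here are illustrative, not taken from the actual diff.

#include <sys/param.h>
#include <sys/systm.h>

void
example_wait_for_free_pages(void *wchan)
{
	/*
	 * Old form: UVM_WAIT(wchan, FALSE, "uvmwait", 0);
	 * Every caller slept forever without PCATCH, so only the
	 * channel and the message mattered; pass them straight to
	 * tsleep_nsec(9) with an infinite timeout.
	 */
	tsleep_nsec(wchan, PVM, "uvmwait", INFSLP);
}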
2019-07-03 | Add tsleep_nsec(9), msleep_nsec(9), and rwsleep_nsec(9). | cheloha
Equivalent to their unsuffixed counterparts except that (a) they take a timeout in terms of nanoseconds, and (b) INFSLP, aka UINT64_MAX (not zero) indicates that a timeout should not be set. For now, zero nanoseconds is not a strictly valid invocation: we log a warning on DIAGNOSTIC kernels if we see such a call. We still sleep until the next tick in such a case, however. In the future this could become some sort of poll... TBD. To facilitate conversions to these interfaces: add inline conversion functions to sys/time.h for turning your timeout into nanoseconds. Also do a few easy conversions for warmup and to demonstrate how further conversions should be done. Lots of input from mpi@ and ratchov@. Additional input from tedu@, deraadt@, mortimer@, millert@, and claudio@. Partly inspired by FreeBSD r247787. positive feedback from deraadt@, ok mpi@
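A hedged sketch of the conversion pattern: a tick-based sleep becomes a nanosecond one using the sys/time.h helpers mentioned above (SEC_TO_NSEC and friends; treat the exact spellings as per that header), with INFSLP meaning "no timeout".

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/time.h>

int
example_sleep_one_second(void *wchan)
{
	/* Before: tsleep(wchan, PWAIT, "expl", hz); */
	return tsleep_nsec(wchan, PWAIT, "expl", SEC_TO_NSEC(1));
}

int
example_sleep_forever(void *wchan)
{
	/* INFSLP (UINT64_MAX), not 0, requests an untimed sleep. */
	return tsleep_nsec(wchan, PWAIT, "expl", INFSLP);
}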
2019-07-01 | Document which mechanisms protect some fields used w/o KERNEL_LOCK(). | Martin Pieuchot
ok visa@, semarie@
2019-06-21 | Make resource limit access MP-safe. So far, the copy-on-write sharing | Visa Hankala
of resource limit structs has been done between processes. By applying copy-on-write also between threads, threads can read rlimits in a nearly lock-free manner. Inspired by code in DragonFly BSD and FreeBSD. OK mpi@, agreement from jmatthew@ and anton@
2019-06-14 | The addition of writeable-syscall checking near MAP_STACK checking | Theo de Raadt
damaged the error messages. Repair that, passing distinct format strings for the two cases. ok beck
2019-06-01 | Refactor the MAP_STACK feature, and introduce another similar variation: | Theo de Raadt
Lookup the address that a syscall instruction is executed from, and kill the process if that page is writeable. This brings an aspect of W^X behaviour to W|X mappings (in JITs not yet adapted to W^X). The goal is to remove simple attack methods and force use of ret2libc or other more complicated means. ok kettenis stefan visa
2019-05-16 | Handle a bit more work without taking the kernel lock. This should avoid | Mark Kettenis
taking the kernel lock when operating on the kernel_map when called from all kernel memory allocation interfaces. ok visa@, mlarkin@
2019-05-15 | free size for amap; ok visa@ | anton
2019-05-11 | move the noise about W^X mapping failure inside the sysctl kern.wxabort | Theo de Raadt
knob, since we found a program which tests RWX mapping then changes execution behaviour to non-W^X. (that program is chrome, as v8 is heading towards W^X compliance with mprotect RW/RX swaps, and also has jitless components in development.) ok sthen kettenis robert
2019-05-10 | simplify logic after wakeup since this variable is only manipulated | Bob Beck
under lock. ok guenther@
2019-05-10 | Check for nowait failed *after* the wakeup point, not before. | Bob Beck
ok guenther@
2019-05-09 | Ensure that pagedaemon wakeups as a result of failed UVM_PLA_NOWAIT | Bob Beck
allocations will recover some memory from the dma_constraint range. The allocation still fails; the intent is to ensure that the pagedaemon will free some memory to possibly allow a subsequent allocation to succeed. This also adds a UVM_PLA_NOWAKE flag to allow special cases in the buffer cache to not wake up the pagedaemon until they want to. ok kettenis@
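A hedged kernel-side sketch of the behaviour described above: a UVM_PLA_NOWAIT allocation that fails has already nudged the pagedaemon (unless UVM_PLA_NOWAKE was passed), so the caller can simply bail out and retry later. The constraint values and function are illustrative.

#include <sys/param.h>
#include <sys/queue.h>
#include <uvm/uvm_extern.h>

int
example_grab_dma_page(struct pglist *plist)
{
	int error;

	TAILQ_INIT(plist);
	error = uvm_pglistalloc(PAGE_SIZE, dma_constraint.ucr_low,
	    dma_constraint.ucr_high, 0, 0, plist, 1, UVM_PLA_NOWAIT);
	/*
	 * On failure the pagedaemon has been asked to reclaim memory
	 * in the dma_constraint range, so a later retry may succeed.
	 * With UVM_PLA_NOWAKE the caller handles the wakeup itself.
	 */
	return error;
}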
2019-04-23 | Remove file name and line number output from witness(4) | Visa Hankala
Reduce code clutter by removing the file name and line number output from witness(4). Typically it is easy enough to locate offending locks using the stack traces that are shown in lock order conflict reports. Tricky cases can be tracked using sysctl kern.witness.locktrace=1. This patch additionally removes the witness(4) wrapper for mutexes. Now each mutex implementation has to invoke the WITNESS_*() macros in order to utilize the checker. Discussed with and OK dlg@, OK mpi@
2019-04-02 | Restrict which filesystems are available for swap. This rules out | Visa Hankala
obvious misconfigurations that cannot work. OK mpi@ tedu@
2019-04-02 | BOGO_PC is an invalid userland address, which indicates kbind() is now | Theo de Raadt
disabled in the process. Rather than tying it to KERNBASE, make it simply -1, which makes it even more invalid. ok tedu
2019-03-01 | New mmap(2) flag: MAP_CONCEAL. | cheloha
MAP_CONCEAL'd memory is not written to disk in the event of a core dump. It may grow other qualities in the future. Wanted by libressl, probably useful elsewhere, too. Prompted by deraadt@, concept from deraadt@/kettenis@. With input from deraadt@, cjeker@, kettenis@, otto@, bcook@, matthew@, guenther@, djm@, and tedu@. ok otto@ deraadt@
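A hedged userland example of the new flag: key material mapped with MAP_CONCEAL stays out of core dumps.

#include <sys/types.h>
#include <sys/mman.h>
#include <err.h>

int
main(void)
{
	size_t len = 4096;
	char *secret;

	secret = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE | MAP_CONCEAL, -1, 0);
	if (secret == MAP_FAILED)
		err(1, "mmap");

	/* ... keep sensitive data here; it is not written to a core dump ... */

	munmap(secret, len);
	return 0;
}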
2019-02-26 | Introduce safe memory reclamation, a mechanism for reclaiming shared | Visa Hankala
objects that readers can access without locking. This provides a basis for read-copy-update operations. Readers access SMR-protected shared objects inside an SMR read-side critical section where sleeping is not allowed. To reclaim an SMR-protected object, the writer has to ensure mutual exclusion of other writers, remove the object's shared reference and wait until read-side references cannot exist any longer. As an alternative to waiting, the writer can schedule a callback that gets invoked when reclamation is safe. The mechanism relies on CPU quiescent states to determine when an SMR-protected object is ready for reclamation. The <sys/smr.h> header additionally provides an implementation of singly- and doubly-linked lists that can be used together with SMR. These lists allow lockless read access with a concurrent writer. Discussed with many. OK mpi@ sashan@
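A hedged sketch of the pattern, using the interfaces named for <sys/smr.h> (smr_read_enter/smr_read_leave, smr_init, smr_call); the data structure, the writer mutex and the bare pointer publication are simplified for illustration, where real code would use the SMR list macros.

#include <sys/param.h>
#include <sys/smr.h>
#include <sys/mutex.h>
#include <sys/malloc.h>

struct item {
	int			value;
	struct smr_entry	smr;	/* deferred-free bookkeeping */
};

struct item *example_current;		/* shared pointer read without locks */
struct mutex example_mtx = MUTEX_INITIALIZER(IPL_NONE);	/* writers only */

/* Reader: no lock, but no sleeping inside the critical section. */
int
example_read(void)
{
	struct item *it;
	int v = -1;

	smr_read_enter();
	it = example_current;	/* real code would publish/read via SMR macros */
	if (it != NULL)
		v = it->value;
	smr_read_leave();
	return v;
}

static void
example_free(void *arg)
{
	free(arg, M_DEVBUF, sizeof(struct item));
}

/* Writer: swap in the new object, defer freeing the old one. */
void
example_replace(struct item *nit)
{
	struct item *old;

	mtx_enter(&example_mtx);
	old = example_current;
	example_current = nit;
	mtx_leave(&example_mtx);

	if (old != NULL) {
		smr_init(&old->smr);
		smr_call(&old->smr, example_free, old);
	}
}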
2019-02-22 | at some point the uvm_km_thread learned to free memory, but the comment | Ted Unangst
was never updated. from Amit Kulkarni
2019-02-15 | With an opportunistic check performed at every trap, we insist userland | Theo de Raadt
sp must be on a MAP_STACK page. Relax the check a bit -- the sp may be on a PROT_NONE page. Can't see how an attacker can leverage that situation. (New perl build process contains a "how many call frames can my stack hold" checker, and this triggers via the MAP_STACK fault rather than the normal access check. The MAP_STACK check still has a kernel printf as we hunt for applications which map stacks poorly. Interestingly the perl code has a knob to disable similar printing alerts on Windows, which apparently has a feature somewhat like MAP_STACK!) ok tedu guenther kettenis
2019-02-10"non-existant" is one of those words that don't exist, so use "non-existent"Peter Hessler
instead From Pamela Mosiejczuk, many thanks! OK phessler@ deraadt@
2019-02-03 | Always refault if relocking maps fails after IO. This fixes a regression | Visa Hankala
introduced with __MAP_NOFAULT. The regression let uvm_fault() run without proper locking and rechecking of state after map version change if page zero-fill was chosen. OK kettenis@ deraadt@ Reported-by: syzbot+9972088c1026668c6c5c@syzkaller.appspotmail.com
2019-01-11 | mincore() is a relic from the past, exposing physical machine information | Theo de Raadt
about shared resources which no program should see. only a few pieces of software use it, generally poorly thought out. they are being fixed, so mincore() can be deleted. ok guenther tedu jca sthen, others
2019-01-10 | Make mincore lie. The nature of shared memory means it can spy on what | Ted Unangst
another process is doing. We don't want that, so instead have it always return that memory is in core. ok deraadt kettenis
2019-01-10 | Hold a read lock on the map while doing the actual device I/O in | Mark Kettenis
physio(9) to prevent another thread from unmapping the memory and triggering an assertion or even corrupting random physical memory pages. ok deraadt@ Should fix: Reported-by: syzbot+b8e7faf688f8c9d341b1@syzkaller.appspotmail.com Reported-by: syzbot+b6a9255faa0605669432@syzkaller.appspotmail.com
2018-11-06 | new sysctl for userland malloc flags, kernel part. ok millert@ deraadt@ | Otto Moerbeek
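A hedged userland sketch of reading the new knob via sysctl(2); the MIB constant VM_MALLOC_CONF under CTL_VM is assumed from this commit (the shell-visible name is vm.malloc_conf).

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

int
main(void)
{
	int mib[2] = { CTL_VM, VM_MALLOC_CONF };	/* assumed MIB name */
	char buf[16];
	size_t len = sizeof(buf);

	if (sysctl(mib, 2, buf, &len, NULL, 0) == -1) {
		perror("sysctl");
		return 1;
	}
	printf("vm.malloc_conf=%s\n", buf);
	return 0;
}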
2018-10-31 | Add support to uvm to establish write-combining mappings. Use this in the | Mark Kettenis
inteldrm driver to add support for the I915_MMAP_WC flag. ok deraadt@, jsg@
2018-08-20 | Preparations for arm64 radeondrm(4) support. | Mark Kettenis
ok jsg@ (who pointed out the kern_pledge.c change was necessary as well)
2018-08-15 | Push back the kernel lock in sys_mmap(2) a little bit more now that | Mark Kettenis
fd_getfile(9) is mpsafe. Note that sys_mmap(2) isn't actually unlocked currently. However this diff has been tested with it unlocked, and I hope to unlock it for real soon-ish. ok visa@, mpi@
2018-07-22 | In uvm_map_protect(), make sure we select a first map entry that ends after | Mark Kettenis
the start of the range of pages that we're changing. Prevents a panic from a somewhat convoluted test case that anton@ came up with. ok guenther@, anton@
2018-07-16 | Insert the appropriate uvm_vnp_uncache(9) and uvm_vnp_setsize(9) | helg
kernel calls to ensure that the UVM cache for memory mapped files is up to date. ok mpi@
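A hedged sketch, in the spirit of a file system's write/setattr path, of the calls being described: when the backing file changes underneath UVM, invalidate stale cached pages and update UVM's notion of the object size. The function and variables are illustrative.

#include <sys/param.h>
#include <sys/vnode.h>
#include <uvm/uvm_extern.h>

void
example_file_changed(struct vnode *vp, off_t newsize)
{
	/* Invalidate cached pages so mmap() readers see fresh data. */
	uvm_vnp_uncache(vp);

	/* Keep the UVM object size in sync with the file size. */
	uvm_vnp_setsize(vp, newsize);
}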
2018-06-19 | Rename some unused fields in struct uvmexp to | Kenneth R Westerback
unusedNN. Missing man page bits pointed out by jmc@. Ports source scan by sthen@. ok deraadt@ guenther@
2018-05-16 | Avoid overflow in constraint computation; ok kettenis@ tb@ | Otto Moerbeek
2018-05-12 | Re-apply inadvertently misplaced r1.127 from kettenis@: | Kenneth R Westerback
"Buffer cache pages are wired but not counted as such. Therefore we have to set the wire count on the pages to 0 before we call uvm_pagefree() on them, just like we do in buf_free_pages(). Otherwise the wired pages counter goes negative. While there, also sprinkle some KASSERTs in there that buf_free_pages() has as well." ok beck@ (again)
2018-05-02 | Remove proc from the parameters of vn_lock(). The parameter is | Visa Hankala
unnecessary because curproc always does the locking. OK mpi@
2018-04-28 | Clean up the parameters of VOP_LOCK() and VOP_UNLOCK(). It is always | Visa Hankala
curproc that does the locking or unlocking, so the proc parameter is pointless and can be dropped. OK mpi@, deraadt@
2018-04-27 | Move FREF() inside fd_getfile(). | Martin Pieuchot
ok visa@
2018-04-18 | Some programs create a PROT_NONE guard page at the far-end of the provided | Theo de Raadt
stack buffer. With a page-aligned buffer, creating a MAP_STACK sub-region would undo the PROT_NONE guard. Ignore that last page. (We could check if the last page is non-RW before choosing to skip it. But we've already elected to grow STK sizes to compensate. Always ignoring the last page makes it a non-MAP_STACK guard page which can be opportunistically discovered) ok semarie stefan kettenis
2018-04-17 | - Make rnd hints avoid the brk area. The rnd allocator refuses to allocate in | Otto Moerbeek
the brk area anyway. - Use a larger hint bound to spread the allocations more for the 32-bit case - Simplified the overly abstracted brk/stack allocator and switch off guard pages for the brk case. This allows i386 some extra space, depending on memory usage patterns. - Reduce brk area on i386 to give the rnd space more room ok stefan@ sthen@
2018-04-17 | Remove protection checks from uvm_map_is_stack_remappable | Stefan Kempf
Other parts of uvm/pmap check for proper prot flags already. This fixes the qemu startup problems that semarie@ reported on tech@.
2018-04-12 | Implement MAP_STACK option for mmap(). Synchronous faults (pagefault and | Theo de Raadt
syscall) confirm the stack register points at MAP_STACK memory, otherwise SIGSEGV is delivered. sigaltstack() and pthread_attr_setstack() are modified to create a MAP_STACK sub-region which satisfies alignment requirements. Observe that MAP_STACK can only be set/cleared by mmap(), which zeroes the contents of the region -- there is no mprotect() equivalent operation, so there is no MAP_STACK-adding gadget. This opportunistic software-emulation of a stack protection bit makes stack-pivot operations during ROPchain fragile (kind of like removing a tool from the toolbox). original discussion with tedu, uvm work by stefan, testing by mortimer ok kettenis
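A hedged userland sketch: a hand-rolled thread stack must now be MAP_STACK memory so the stack-pointer check passes (pthread_attr_setstack() also arranges a MAP_STACK sub-region itself, per the commit); sizes are illustrative.

#include <sys/mman.h>
#include <err.h>
#include <pthread.h>

#define EX_STACK_SIZE	(512 * 1024)

static void *
thread_main(void *arg)
{
	return arg;
}

int
main(void)
{
	pthread_attr_t attr;
	pthread_t t;
	void *stack;
	int error;

	/* Map the custom stack with MAP_STACK so sp checks accept it. */
	stack = mmap(NULL, EX_STACK_SIZE, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE | MAP_STACK, -1, 0);
	if (stack == MAP_FAILED)
		err(1, "mmap");

	pthread_attr_init(&attr);
	pthread_attr_setstack(&attr, stack, EX_STACK_SIZE);

	if ((error = pthread_create(&t, &attr, thread_main, NULL)) != 0)
		errc(1, error, "pthread_create");
	pthread_join(t, NULL);
	return 0;
}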
2018-04-10 | Fix stop condition for linear search by taking into account the search | Otto Moerbeek
direction, otherwise we might break the loop prematurely; ok stefan@
2018-03-30 | Unlock the NET_LOCK() before calling vn_lock(9) to avoid a lock ordering | Martin Pieuchot
issue with upcoming NFSnode's locks. ok visa@
2018-03-27 | Make sure that programs violating a pledge(2) promise or some memory | Martin Pieuchot
protection cannot block the final SIGABRT. While here apply the same logic to ddb(4)'s kill command. From semarie@, ok deraadt@
2018-03-08 | When we are rebooting, do not fail in uvn_io(). The vnodes are | Alexander Bluhm
revoked while syncing disk, so the processes lose their executable pages. Instead of killing them with a SIGBUS after page fault, just sleep. This should prevent init from dying without pages, followed by a kernel panic. initial diff from tedu@; OK deraadt@ tedu@
2018-02-19 | Remove almost unused `flags' argument of suser(). | Martin Pieuchot
The account flag `ASU' will no longer be set, but that makes suser() mpsafe since it no longer messes with a per-process field. No objection from millert@, ok tedu@, bluhm@
2018-02-11 | Can mask MAP_STACK by name rather than number | Theo de Raadt
2018-01-18 | While booting it does not make sense to wait for memory, there is | Alexander Bluhm
no other process which could free it. Better panic in malloc(9) or pool_get(9) instead of sleeping forever. tested by visa@ patrick@ Jan Klemkow; suggested by kettenis@; OK deraadt@
2018-01-15 | mask out (ie. ignore) the bit which will be MAP_STACK in the future, | Theo de Raadt
so diffs in snapshots can exercise the change in a less disruptive way. idea with sthen, ok kettenis tom others
2018-01-02 | Stop assuming <sys/file.h> will pull in fcntl.h when _KERNEL is defined. | Philip Guenther
ok millert@ sthen@