path: root/sys/uvm
2019-04-23  Remove file name and line number output from witness(4)  (Visa Hankala)
Reduce code clutter by removing the file name and line number output from witness(4). Typically it is easy enough to locate offending locks using the stack traces that are shown in lock order conflict reports. Tricky cases can be tracked using sysctl kern.witness.locktrace=1.

This patch additionally removes the witness(4) wrapper for mutexes. Now each mutex implementation has to invoke the WITNESS_*() macros in order to utilize the checker.

Discussed with and OK dlg@, OK mpi@
2019-04-02  Restrict which filesystems are available for swap.  (Visa Hankala)
This rules out obvious misconfigurations that cannot work. OK mpi@ tedu@
2019-04-02  BOGO_PC is an invalid userland address, which indicates kbind() is now disabled in the process.  (Theo de Raadt)
Rather than tying it to KERNBASE, make it simply -1, which makes it even more invalid. ok tedu
2019-03-01  New mmap(2) flag: MAP_CONCEAL.  (cheloha)
MAP_CONCEAL'd memory is not written to disk in the event of a core dump. It may grow other qualities in the future. Wanted by libressl, probably useful elsewhere, too. Prompted by deraadt@, concept from deraadt@/kettenis@. With input from deraadt@, cjeker@, kettenis@, otto@, bcook@, matthew@, guenther@, djm@, and tedu@. ok otto@ deraadt@
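As an illustration, a minimal userland sketch of the new flag, assuming an OpenBSD where <sys/mman.h> defines MAP_CONCEAL; the key-material handling is purely hypothetical:

    #include <sys/mman.h>
    #include <err.h>
    #include <string.h>

    int
    main(void)
    {
            size_t len = 4096;
            char *key;

            /* MAP_CONCEAL'd pages are skipped when a core dump is written */
            key = mmap(NULL, len, PROT_READ|PROT_WRITE,
                MAP_PRIVATE|MAP_ANON|MAP_CONCEAL, -1, 0);
            if (key == MAP_FAILED)
                    err(1, "mmap");

            memset(key, 0xA5, len);         /* stand-in for real secrets */
            /* ... use the secret ... */

            explicit_bzero(key, len);
            if (munmap(key, len) == -1)
                    err(1, "munmap");
            return 0;
    }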
2019-02-26  Introduce safe memory reclamation, a mechanism for reclaiming shared objects that readers can access without locking.  (Visa Hankala)
This provides a basis for read-copy-update operations.

Readers access SMR-protected shared objects inside an SMR read-side critical section where sleeping is not allowed. To reclaim an SMR-protected object, the writer has to ensure mutual exclusion of other writers, remove the object's shared reference and wait until read-side references cannot exist any longer. As an alternative to waiting, the writer can schedule a callback that gets invoked when reclamation is safe. The mechanism relies on CPU quiescent states to determine when an SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of singly- and doubly-linked lists that can be used together with SMR. These lists allow lockless read access with a concurrent writer.

Discussed with many OK mpi@ sashan@
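A kernel-side sketch of the pattern described above, assuming the <sys/smr.h> interface named in the text (smr_read_enter(), smr_read_leave(), smr_init(), smr_call()); struct foo, foo_reclaim() and the plain pointer assignments are illustrative simplifications, not the authoritative API for publishing pointers:

    #include <sys/param.h>
    #include <sys/malloc.h>
    #include <sys/smr.h>

    struct foo {
            int                     f_value;
            struct smr_entry        f_smr;  /* reclamation callback linkage */
    };

    struct foo *foo_ptr;            /* shared object; writers are serialized */

    /* Reader: lockless access inside an SMR read-side critical section. */
    int
    foo_get_value(void)
    {
            struct foo *f;
            int v = 0;

            smr_read_enter();       /* no sleeping until smr_read_leave() */
            f = foo_ptr;            /* real code would use ordered loads */
            if (f != NULL)
                    v = f->f_value;
            smr_read_leave();
            return v;
    }

    static void
    foo_reclaim(void *arg)
    {
            free(arg, M_TEMP, sizeof(struct foo));
    }

    /* Writer: unhook the object, then defer the free until no read-side
     * references can exist any longer. */
    void
    foo_replace(struct foo *nf)
    {
            struct foo *of;

            of = foo_ptr;           /* caller excludes other writers */
            foo_ptr = nf;           /* remove the shared reference */
            if (of != NULL) {
                    smr_init(&of->f_smr);
                    smr_call(&of->f_smr, foo_reclaim, of);
            }
    }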
2019-02-22  At some point the uvm_km_thread learned to free memory, but the comment was never updated.  (Ted Unangst)
from Amit Kulkarni
2019-02-15  With an opportunistic check performed at every trap, we insist userland sp must be on a MAP_STACK page.  (Theo de Raadt)
Relax the check a bit -- the sp may be on a PROT_NONE page. Can't see how an attacker can leverage that situation.

(New perl build process contains a "how many call frames can my stack hold" checker, and this triggers via the MAP_STACK fault rather than the normal access check. The MAP_STACK check still has a kernel printf as we hunt for applications which map stacks poorly. Interestingly the perl code has a knob to disable similar printing alerts on Windows, which apparently has a feature somewhat like MAP_STACK!)

ok tedu guenther kettenis
2019-02-10"non-existant" is one of those words that don't exist, so use "non-existent"Peter Hessler
instead From Pamela Mosiejczuk, many thanks! OK phessler@ deraadt@
2019-02-03  Always refault if relocking maps fails after IO.  (Visa Hankala)
This fixes a regression introduced with __MAP_NOFAULT. The regression let uvm_fault() run without proper locking and rechecking of state after map version change if page zero-fill was chosen. OK kettenis@ deraadt@

Reported-by: syzbot+9972088c1026668c6c5c@syzkaller.appspotmail.com
2019-01-11  mincore() is a relic from the past, exposing physical machine information about shared resources which no program should see.  (Theo de Raadt)
Only a few pieces of software use it, generally poorly thought out. They are being fixed, so mincore() can be deleted. ok guenther tedu jca sthen, others
2019-01-10  Make mincore() lie.  (Ted Unangst)
The nature of shared memory means it can spy on what another process is doing. We don't want that, so instead have it always return that memory is in core. ok deraadt kettenis
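The observable effect, sketched in userland: after this change mincore() should report every page of a fresh anonymous mapping as resident, whether or not it has ever been touched.

    #include <sys/types.h>
    #include <sys/mman.h>
    #include <err.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
            long pgsz = sysconf(_SC_PAGESIZE);
            char vec[4];
            char *p;
            int i;

            p = mmap(NULL, 4 * pgsz, PROT_READ|PROT_WRITE,
                MAP_PRIVATE|MAP_ANON, -1, 0);
            if (p == MAP_FAILED)
                    err(1, "mmap");

            /* mincore() now claims residency for every page */
            if (mincore(p, 4 * pgsz, vec) == -1)
                    err(1, "mincore");
            for (i = 0; i < 4; i++)
                    printf("page %d: %s\n", i,
                        (vec[i] & 1) ? "in core" : "not in core");
            return 0;
    }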
2019-01-10  Hold a read lock on the map while doing the actual device I/O in physio(9).  (Mark Kettenis)
This prevents another thread from unmapping the memory and triggering an assertion or even corrupting random physical memory pages. ok deraadt@

Should fix:
Reported-by: syzbot+b8e7faf688f8c9d341b1@syzkaller.appspotmail.com
Reported-by: syzbot+b6a9255faa0605669432@syzkaller.appspotmail.com
2018-11-06  New sysctl for userland malloc flags, kernel part.  (Otto Moerbeek)
ok millert@ deraadt@
2018-10-31  Add support to uvm to establish write-combining mappings.  (Mark Kettenis)
Use this in the inteldrm driver to add support for the I915_MMAP_WC flag. ok deraadt@, jsg@
2018-08-20  Preparations for arm64 radeondrm(4) support.  (Mark Kettenis)
ok jsg@ (who pointed out the kern_pledge.c change was necessary as well)
2018-08-15  Push back the kernel lock in sys_mmap(2) a little bit more now that fd_getfile(9) is mpsafe.  (Mark Kettenis)
Note that sys_mmap(2) isn't actually unlocked currently. However this diff has been tested with it unlocked, and I hope to unlock it for real soon-ish. ok visa@, mpi@
2018-07-22  In uvm_map_protect(), make sure we select a first map entry that ends after the start of the range of pages that we're changing.  (Mark Kettenis)
Prevents a panic from a somewhat convoluted test case that anton@ came up with. ok guenther@, anton@
2018-07-16  Insert the appropriate uvm_vnp_uncache(9) and uvm_vnp_setsize(9) kernel calls to ensure that the UVM cache for memory mapped files is up to date.  (helg)
ok mpi@
2018-06-19  Rename some unused fields in struct uvmexp to unusedNN.  (Kenneth R Westerback)
Missing man page bits pointed out by jmc@. Ports source scan by sthen@. ok deraadt@ guenther@
2018-05-16  Avoid overflow in constraint computation.  (Otto Moerbeek)
ok kettenis@ tb@
2018-05-12  Re-apply inadvertently misplaced r1.127 from kettenis@:  (Kenneth R Westerback)
"Buffer cache pages are wired but not counted as such. Therefore we have to set the wire count on the pages to 0 before we call uvm_pagefree() on them, just like we do in buf_free_pages(). Otherwise the wired pages counter goes negative. While there, also sprinkle some KASSERTs in there that buf_free_pages() has as well." ok beck@ (again)
2018-05-02  Remove proc from the parameters of vn_lock().  (Visa Hankala)
The parameter is unnecessary because curproc always does the locking. OK mpi@
2018-04-28  Clean up the parameters of VOP_LOCK() and VOP_UNLOCK().  (Visa Hankala)
It is always curproc that does the locking or unlocking, so the proc parameter is pointless and can be dropped. OK mpi@, deraadt@
2018-04-27  Move FREF() inside fd_getfile().  (Martin Pieuchot)
ok visa@
2018-04-18  Some programs create a PROT_NONE guard page at the far end of the provided stack buffer.  (Theo de Raadt)
With a page-aligned buffer, creating a MAP_STACK sub-region would undo the PROT_NONE guard. Ignore that last page. (We could check if the last page is non-RW before choosing to skip it. But we've already elected to grow STK sizes to compensate. Always ignoring the last page makes it a non-MAP_STACK guard page which can be opportunistically discovered.) ok semarie stefan kettenis
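The accommodated layout, sketched in userland (buffer sizes are illustrative):

    #include <sys/mman.h>
    #include <err.h>
    #include <unistd.h>

    int
    main(void)
    {
            long pgsz = sysconf(_SC_PAGESIZE);
            size_t len = 16 * pgsz;
            char *buf;

            buf = mmap(NULL, len, PROT_READ|PROT_WRITE,
                MAP_PRIVATE|MAP_ANON, -1, 0);
            if (buf == MAP_FAILED)
                    err(1, "mmap");

            /* guard page at the far end of the provided stack buffer */
            if (mprotect(buf + len - pgsz, pgsz, PROT_NONE) == -1)
                    err(1, "mprotect");

            /* Passing buf to sigaltstack() or pthread_attr_setstack()
             * now leaves this last page alone, so the guard survives
             * the MAP_STACK remapping. */
            return 0;
    }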
2018-04-17  Make rnd hints avoid the brk area.  (Otto Moerbeek)
- The rnd allocator refuses to allocate in the brk area anyway.
- Use a larger hint bound to spread the allocations more for the 32-bit case.
- Simplified the overly abstracted brk/stack allocator and switched off guard pages for the brk case. This allows i386 some extra space, depending on memory usage patterns.
- Reduce the brk area on i386 to give the rnd space more room.
ok stefan@ sthen@
2018-04-17  Remove protection checks from uvm_map_is_stack_remappable().  (Stefan Kempf)
Other parts of uvm/pmap check for proper prot flags already. This fixes the qemu startup problems that semarie@ reported on tech@.
2018-04-12  Implement MAP_STACK option for mmap().  (Theo de Raadt)
Synchronous faults (pagefault and syscall) confirm the stack register points at MAP_STACK memory, otherwise SIGSEGV is delivered. sigaltstack() and pthread_attr_setstack() are modified to create a MAP_STACK sub-region which satisfies alignment requirements. Observe that MAP_STACK can only be set/cleared by mmap(), which zeroes the contents of the region -- there is no mprotect() equivalent operation, so there is no MAP_STACK-adding gadget.

This opportunistic software emulation of a stack protection bit makes stack-pivot operations during ROP chains fragile (kind of like removing a tool from the toolbox).

original discussion with tedu, uvm work by stefan, testing by mortimer ok kettenis
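A userland sketch of the new flag; note that, per the text above, pthread_attr_setstack() already remaps an ordinary buffer itself, so requesting MAP_STACK up front is merely explicit:

    #include <sys/mman.h>
    #include <err.h>
    #include <pthread.h>

    #define STACKSZ (512 * 1024)

    static void *
    start(void *arg)
    {
            return NULL;
    }

    int
    main(void)
    {
            pthread_attr_t attr;
            pthread_t t;
            void *stk;

            /* Synchronous faults check that sp points at MAP_STACK memory. */
            stk = mmap(NULL, STACKSZ, PROT_READ|PROT_WRITE,
                MAP_PRIVATE|MAP_ANON|MAP_STACK, -1, 0);
            if (stk == MAP_FAILED)
                    err(1, "mmap");

            if (pthread_attr_init(&attr) != 0 ||
                pthread_attr_setstack(&attr, stk, STACKSZ) != 0)
                    errx(1, "pthread attr");
            if (pthread_create(&t, &attr, start, NULL) != 0)
                    errx(1, "pthread_create");
            pthread_join(t, NULL);
            return 0;
    }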
2018-04-10  Fix stop condition for linear search by taking into account the search direction.  (Otto Moerbeek)
Otherwise we might break the loop prematurely; ok stefan@
2018-03-30  Unlock the NET_LOCK() before calling vn_lock(9) to avoid lock ordering issues with the upcoming NFSnode locks.  (Martin Pieuchot)
ok visa@
2018-03-27  Make sure that programs violating a pledge(2) promise or some memory protection cannot block the final SIGABRT.  (Martin Pieuchot)
While here apply the same logic to ddb(4)'s kill command. From semarie@, ok deraadt@
2018-03-08  When we are rebooting, do not fail in uvn_io().  (Alexander Bluhm)
The vnodes are revoked while syncing disk, so the processes lose their executable pages. Instead of killing them with a SIGBUS after a page fault, just sleep. This should prevent init dying without pages, followed by a kernel panic. initial diff from tedu@; OK deraadt@ tedu@
2018-02-19  Remove the almost unused `flags' argument of suser().  (Martin Pieuchot)
The account flag `ASU' will no longer be set, but that makes suser() mpsafe since it no longer messes with a per-process field. No objection from millert@, ok tedu@, bluhm@
2018-02-11  Can mask MAP_STACK by name rather than number.  (Theo de Raadt)
2018-01-18  While booting it does not make sense to wait for memory; there is no other process which could free it.  (Alexander Bluhm)
Better to panic in malloc(9) or pool_get(9) instead of sleeping forever. tested by visa@ patrick@ Jan Klemkow; suggested by kettenis@; OK deraadt@
2018-01-15  Mask out (ie. ignore) the bit which will be MAP_STACK in the future, so diffs in snapshots can exercise the change in a less disruptive way.  (Theo de Raadt)
idea with sthen, ok kettenis tom others
2018-01-02  Stop assuming <sys/file.h> will pull in fcntl.h when _KERNEL is defined.  (Philip Guenther)
ok millert@ sthen@
2017-12-30  Don't pull in <sys/file.h> just to get fcntl.h.  (Philip Guenther)
ok deraadt@ krw@
2017-11-30  __MAP_NOFAULT doesn't make sense with anon mappings, so return EINVAL if that is attempted.  (Philip Guenther)
Minor cleanups:
- Eliminate some always-false and always-true tests against MAP_ANON.
- We treat anon mappings with neither MAP_{SHARED,PRIVATE} as MAP_PRIVATE, so explicitly indicate that.
ok kettenis@ beck@
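The rejected combination, sketched below under the assumption that __MAP_NOFAULT is visible through <sys/mman.h> (it is normally reserved for ld.so):

    #include <sys/mman.h>
    #include <errno.h>
    #include <stdio.h>

    int
    main(void)
    {
            void *p;

            /* anon + __MAP_NOFAULT makes no sense: there is no backing
             * object whose absence could be papered over */
            p = mmap(NULL, 4096, PROT_READ|PROT_WRITE,
                MAP_PRIVATE|MAP_ANON|__MAP_NOFAULT, -1, 0);
            if (p == MAP_FAILED && errno == EINVAL)
                    printf("rejected with EINVAL, as expected\n");
            return 0;
    }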
2017-08-12  Use the NET_LOCK() macro instead of handrolling it.  (Martin Pieuchot)
Tested by Hrvoje Popovski.
2017-08-12  In the locking wrappers for &map->lock and &map->mtx, pass through file+line when WITNESS is enabled.  (Philip Guenther)
ok visa@ kettenis@
2017-07-20  Accessing a mmap(2)ed file beyond its end should result in a SIGBUS according to POSIX.  (Alexander Bluhm)
Bring the regression test and kernel in line for amd64 and i386. Other architectures have to follow. OK deraadt@ kettenis@
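Roughly what the regression test exercises; the file name is illustrative:

    #include <sys/mman.h>
    #include <err.h>
    #include <fcntl.h>
    #include <signal.h>
    #include <unistd.h>

    static void
    handler(int sig)
    {
            write(2, "SIGBUS\n", 7);
            _exit(0);
    }

    int
    main(void)
    {
            long pgsz = sysconf(_SC_PAGESIZE);
            volatile char c;
            char *p;
            int fd;

            signal(SIGBUS, handler);

            /* one-byte file, but a two-page mapping */
            fd = open("tmpfile", O_RDWR|O_CREAT|O_TRUNC, 0600);
            if (fd == -1)
                    err(1, "open");
            if (write(fd, "x", 1) != 1)
                    err(1, "write");

            p = mmap(NULL, 2 * pgsz, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p == MAP_FAILED)
                    err(1, "mmap");

            c = p[0];       /* within the file's page: zero-filled, fine */
            c = p[pgsz];    /* wholly beyond EOF: should raise SIGBUS */
            (void)c;
            errx(1, "no fault");
    }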
2017-05-21  Enable radeondrm(4) on loongson to get accelerated graphics with the RS780E chipset.  (Visa Hankala)
OK kettenis@, jsg@
2017-05-17  Raise "uvm_map_entry_kmem_pool" IPL level to IPL_VM to prevent a deadlock.  (Martin Pieuchot)
A deadlock can occur when the uvm_km_thread(), running without the KERNEL_LOCK(), is interrupted by a non-MPSAFE handler while holding the pool's mutex. At that moment, if another CPU is holding the KERNEL_LOCK() and wants to grab the pool mutex, like in sys_kbind(), kaboom!

This is a temporary solution; a more general approach regarding mutexes and un-KERNEL_LOCK()ed threads is being discussed.

Deadlock reported by sthen@, ok kettenis@
2017-05-15  Enable the NET_LOCK(), take 3.  (Martin Pieuchot)
Recursions are still marked as XXXSMP. ok deraadt@, bluhm@
2017-05-11  Unbreak PMAP_DIRECT archs.  (David Gwynne)
found by jmc@
2017-05-11  Reorder uvm init to avoid use before initialisation.  (David Gwynne)
the particular use before init was in uvm_init step 6, which calls kmeminit to set up malloc(9), which calls uvm_km_zalloc, which calls pmap_enter, which calls pool_get, which tries to allocate a page using km_alloc, which isn't initialised until step 9 in uvm_init.

uvm_km_page_init calls kthread_create though, which uses malloc internally, so it can't be reordered before malloc init.

to cope with this, uvm_km_page_init is split up. it sets up the subsystem, and is called before kmeminit. the thread init is moved to uvm_km_page_lateinit, which is called after kmeminit in uvm_init.
2017-05-09  Stop considering some sleeping threads as running.  (Martin Pieuchot)
PZERO used to be a special value in the first BSD releases, but since the introduction of tsleep(9) there's no way to tell if a thread is going to sleep for a "short" period of time.

This removes the only (ab)use of ``p_priority'' outside the scheduler logic, which will help moving away from a priority-based scheduler.

ok visa@
2017-05-08  Unifdef PMAP_UAREA, unused since we stopped supporting ARM < v7.  (Martin Pieuchot)
ok kettenis@
2017-05-03  Mark uvm_sync_lock as vnode'ish for witness purposes, as it is taken between mount locks and inode locks, which may have been recorded in either order.  (Philip Guenther)
ok visa@