path: root/sys/uvm
Age | Commit message | Author
2020-05-23 | Prevent km_alloc() from returning garbage if pagelist is empty. | jan
ok bluhm@, visa@
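The commit message does not show the diff. The following is a hypothetical C sketch of the bug class it names: a function that walks a page list and returns an address, where the empty-list path must return a defined value instead of an uninitialized variable. `struct page`, `map_pagelist()` and `PAGE_SIZE_SKETCH` are illustrative names, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096   /* hypothetical page size for the sketch */

/* Hypothetical, simplified page descriptor (not the kernel's struct vm_page). */
struct page {
    uintptr_t kva;          /* filled in when the page is "mapped" */
    struct page *next;
};

/*
 * Assign consecutive virtual addresses starting at base to every page on
 * the list and return the address of the first one. The result starts at
 * 0, so an empty list yields 0 ("no allocation") instead of stack garbage.
 */
static uintptr_t map_pagelist(struct page *pglist, uintptr_t base)
{
    uintptr_t va = 0;           /* defined result for the empty-list case */
    uintptr_t next_va = base;
    struct page *pg;

    for (pg = pglist; pg != NULL; pg = pg->next) {
        if (pg == pglist)
            va = next_va;       /* remember the first mapped address */
        pg->kva = next_va;
        next_va += PAGE_SIZE_SKETCH;
    }
    return va;
}
```

Initializing the result at declaration is the simplest guard; the caller can then treat 0 as failure.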
2020-04-23 | Document uvmexp.nswget without relying on implementation details. | Martin Pieuchot
Prompted by a question from schwarze@ ok deraadt@, schwarze@, visa@
2020-04-04 | Tweak the code that wakes up uvm_pmalloc sleepers in the page daemon. | Mark Kettenis
Although there are open questions about whether we should flag failures with UVM_PMA_FAIL or not, we really should only wake up a sleeper if we unlink the pma. For now, only do that if pages were actually freed in the requested region. Prompted by CID 1453061 (Logically dead code), which should be fixed by this commit. ok (and together with) beck@
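A minimal sketch of the wake-up policy described above, assuming a simplified request record: the sleeper is woken only when its request is unlinked, and the request is unlinked only when at least one freed page falls inside the range it asked for. `struct pma_req`, `freed_in_range()` and `pagedaemon_maybe_wake()` are hypothetical; the `woken` flag stands in for a real wakeup(9) call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request record modelled loosely on a uvm_pmalloc sleeper:
 * it wants pages whose frame numbers lie in [low, high). */
struct pma_req {
    unsigned long low, high;
    bool linked;   /* still on the waiters list */
    bool woken;    /* stands in for wakeup(9) having been called */
};

/* Count freed frames that fall inside the request's range. */
static size_t freed_in_range(const struct pma_req *pma,
    const unsigned long *freed, size_t nfreed)
{
    size_t i, n = 0;

    for (i = 0; i < nfreed; i++)
        if (freed[i] >= pma->low && freed[i] < pma->high)
            n++;
    return n;
}

/* Unlink and wake only if pages were freed in the requested region.
 * Returns true if the sleeper was woken. */
static bool pagedaemon_maybe_wake(struct pma_req *pma,
    const unsigned long *freed, size_t nfreed)
{
    if (!pma->linked || freed_in_range(pma, freed, nfreed) == 0)
        return false;
    pma->linked = false;   /* unlink the request first ... */
    pma->woken = true;     /* ... then wake the sleeper */
    return true;
}
```

Tying the wake-up to the unlink means a sleeper is never woken while its request is still queued.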
2020-03-25 | Do not test a variable against NULL after it has already been dereferenced. | Martin Pieuchot
CID 1453116 ok kettenis@
2020-03-24 | Use FALLTHROUGH in uvm_total() like it is done in uvm_loadav(). | Martin Pieuchot
CID 1453262.
2020-03-04 | Do not count pages mapped as PROT_NONE against the RLIMIT_DATA limit. | Mark Kettenis
Instead, count them (and check the limit) when their protection gets flipped from PROT_NONE to something that permits access. This means that mprotect(2) may now fail if changing the protection would exceed RLIMIT_DATA. This helps code (such as Chromium's JavaScript interpreter) that reserves large chunks of address space but populates them sparsely. ok deraadt@, otto@, kurt@, millert@, robert@
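The accounting rule above can be sketched as follows, assuming a simple per-process counter: PROT_NONE reservations cost nothing, and the charge (with its limit check) happens on the PROT_NONE to accessible transition, which is where an mprotect(2)-style call can now fail. `struct dataacct`, `acct_map()` and `acct_protect()` are hypothetical names; uncharging on the reverse transition is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>   /* PROT_NONE, PROT_READ, PROT_WRITE */

/* Hypothetical per-process accounting state. */
struct dataacct {
    size_t dused;    /* bytes currently charged against RLIMIT_DATA */
    size_t dlimit;   /* soft RLIMIT_DATA value */
};

/* Reserving address space as PROT_NONE is free; only accessible
 * mappings are charged and checked. */
static bool acct_map(struct dataacct *a, size_t len, int prot)
{
    if (prot == PROT_NONE)
        return true;
    if (len > a->dlimit - a->dused)
        return false;            /* would exceed RLIMIT_DATA */
    a->dused += len;
    return true;
}

/* Protection change: charge when leaving PROT_NONE (may fail) and,
 * as an assumption of this sketch, uncharge when returning to it. */
static bool acct_protect(struct dataacct *a, size_t len, int oldprot,
    int newprot)
{
    if (oldprot == PROT_NONE && newprot != PROT_NONE)
        return acct_map(a, len, newprot);
    if (oldprot != PROT_NONE && newprot == PROT_NONE)
        a->dused -= len;
    return true;
}
```

A JIT that reserves a large PROT_NONE region therefore pays only for the pages it later makes accessible.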
2020-02-18 | Cleanup <sys/kthread.h> and <sys/proc.h> includes. | Martin Pieuchot
Do not include <sys/kthread.h> where it is not needed and stop including <sys/proc.h> in it. ok visa@, anton@
2020-01-20 | struct vops is not modified during runtime, so use const, which moves each instance into the read-only data segment. | Claudio Jeker
OK deraadt@ tedu@
2020-01-16 | Use list for freeing pages in uvn_flush() to optimize freeing chunks of contiguous pages. | Mark Kettenis
ok beck@
2020-01-04 | Add uvm_anfree_list() to free anons as a list of pages. | Bob Beck
Use this in the amap code to free pages as a list instead of one at a time, allowing for more efficient freeing. Most of the work done at Elk Lakes, with testing by me, mlarkin and kettenis. Considerably speeds up a test program which zeros a big pile of memory and then exits. ok kettenis@
2020-01-01 | Add uvm_pmr_remove_1strange_reverse to efficiently free pages in reverse order from uvm. | Bob Beck
Use it in uvm_pmr_freepageq when the pages appear to be in reverse order. This greatly improves cases of massive page freeing as noticed by mlarkin@ in his ongoing efforts to have the most gigantish buffer cache on the planet. Most of this work done by me with help and polish from kettenis@ at e2k19. Follow on commits to this will make use of this for more efficient freeing of amaps and a few other things. ok kettenis@ deraadt@
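The idea of detecting a reversed page queue and still freeing it in contiguous chunks can be sketched as a run-coalescing pass. This is not uvm_pmr_remove_1strange_reverse itself; `struct pfn_range` and `coalesce_pages()` are hypothetical, and "appears reversed" is approximated by comparing the first two frame numbers.

```c
#include <assert.h>
#include <stddef.h>

/* A contiguous range of page frame numbers [start, start + npages). */
struct pfn_range {
    unsigned long start;
    size_t npages;
};

/*
 * Coalesce a queue of freed page frame numbers into ranges. If the queue
 * looks descending (second frame below the first), grow ranges downward;
 * otherwise grow them upward. Either way, N contiguous pages become one
 * range instead of N single-page frees. Returns the number of ranges
 * written to out, which must have room for n entries.
 */
static size_t coalesce_pages(const unsigned long *pfn, size_t n,
    struct pfn_range *out)
{
    size_t i, nr = 0;
    int reverse = (n >= 2 && pfn[1] < pfn[0]);

    for (i = 0; i < n; i++) {
        if (nr > 0) {
            struct pfn_range *r = &out[nr - 1];

            if (!reverse && pfn[i] == r->start + r->npages) {
                r->npages++;            /* extend upward */
                continue;
            }
            if (reverse && r->start > 0 && pfn[i] == r->start - 1) {
                r->start--;             /* extend downward */
                r->npages++;
                continue;
            }
        }
        out[nr].start = pfn[i];         /* start a new range */
        out[nr].npages = 1;
        nr++;
    }
    return nr;
}
```

Without the reverse case, a descending queue of N contiguous pages would produce N one-page ranges.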
2019-12-30 | convert infinite msleep(9) to msleep_nsec(9) | Jonathan Gray
ok mpi@
2019-12-25 | Hook up the shrinker for inteldrm(4). | Mark Kettenis
This is a "light" version that only drops graphics buffers that are cached and not in active use. Help from beck@ for pointing out how to hook this up to our pagedaemon. ok jsg@
2019-12-18 | Set vm_map's pmap in uvm_map_setup(). | Visa Hankala
OK guenther@, kettenis@, mpi@
2019-12-18 | Use separate rwlock initializations for userland ("vmspace") and kernel maps. | Mark Kettenis
This lets witness know that these really are different classes, avoiding false positives when detecting lock order reversals. ok guenther@, visa@, mpi@
2019-12-12 | Header cleanup. | Martin Pieuchot
- reduces gratuitous differences with NetBSD,
- merges multiple '#ifdef _KERNEL' blocks,
- kills unused 'struct vm_map_intrsafe',
- turns 'union vm_map_object' into an anonymous union (following NetBSD),
- moves questionable vm_map_modflags() into uvm/uvm_map.c,
- removes guards around MAX_KMAPENT, since it is defined and used only once,
- documents lock differences,
- fixes tab vs space.
ok mlarkin@, visa@
2019-12-09 | Many people have crossed the ABI, so re-enable "syscall call-from" checking. | Theo de Raadt
2019-12-09 | improve comment for uvm_map_inentry_pc(), the underlying non-writeable / syscall checker. | Theo de Raadt
2019-12-08 | Convert infinite sleeps to {m,t}sleep_nsec(9). | Martin Pieuchot
ok visa@, jca@
2019-12-08 | Remove an unnecessary #ifndef PMAP_EXCLUDE_DECLS. | Visa Hankala
It was last utilized by sparc pmap. OK mpi@ guenther@ kettenis@
2019-12-06 | Sync KVE_ET_* and UVM_ET_* flags. | Martin Pieuchot
ok guenther@
2019-12-05 | Move uvmexp_print() to a better place. | Martin Pieuchot
ok mlarkin@
2019-12-05 | Remove clause #3 from mrg@NetBSD license. | Martin Pieuchot
On May 29, 2008, Matthew R. Green removed it in NetBSD: github.com/IIJ-NetBSD/netbsd-src/commit/7ea20401d535da9996394136ef ok deraadt@
2019-12-04 | Fix a bad offset calculation in uvm_share. | Mike Larkin
Syzkaller found a bug in uvm_share when using a vmd(8) mmap region with an offset that ended up making an overlap with a previous vmm(4) uvm_map range. This diff reworks the range and offset calculation in uvm_share. Only vmm(4) uses this, so there should be no visible effects outside vmm(4) environments. Syzkaller also went sorta crazy on this one, finding multiple reproducers for the same bug with just slightly different parameters, thus the multiple "Reported-by" lines below. ok stefan@, anton@ Reported-by: syzbot+2c625ab1b8e964da644a@syzkaller.appspotmail.com Reported-by: syzbot+1300829862412751462d@syzkaller.appspotmail.com Reported-by: syzbot+27cfad3394f34528cbec@syzkaller.appspotmail.com Reported-by: syzbot+3e700c5698177f91cce1@syzkaller.appspotmail.com
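The kind of range and offset arithmetic that uvm_share needs can be sketched with half-open ranges: carve a window out of a source range without integer overflow, and detect overlap with an existing range. This is not the uvm_share code; `struct vrange`, `window_in_range()` and `ranges_overlap()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open virtual address range [start, end). */
struct vrange {
    uintptr_t start, end;
};

/* True if the two ranges share at least one address. */
static bool ranges_overlap(struct vrange a, struct vrange b)
{
    return a.start < b.end && b.start < a.end;
}

/*
 * Carve the window [off, off + len) out of src, rejecting windows that
 * are empty, wrap around, or run past the end of src. The bounds check
 * compares against the remaining size, so off + len is never computed
 * before it is known not to overflow. On success, *out holds the
 * absolute range.
 */
static bool window_in_range(struct vrange src, uintptr_t off, uintptr_t len,
    struct vrange *out)
{
    uintptr_t size = src.end - src.start;

    if (len == 0 || off > size || len > size - off)
        return false;
    out->start = src.start + off;
    out->end = out->start + len;
    return true;
}
```

Checking `len > size - off` instead of `off + len > size` is what keeps a fuzzer-chosen huge `len` from wrapping past the check.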
2019-12-02 | Stop supporting UVM_FLAG_TRYLOCK in uvm_mapanon(), it is not used. | Martin Pieuchot
ok tedu@, visa@
2019-11-30 | temporarily neuter the syscall-callfrom check as a few people haven't crossed over the ABI break as easily as expected. | Theo de Raadt
2019-11-29 | Add uvm_objfree function to free all pages in a uvm_obj in one go. | Bob Beck
Use this in the buffer cache to free all the pages from a buffer, resulting in a considerable speedup when throwing away pages from the buffer cache. Lots of work done with mlarkin and kettenis ok kettenis@ deraadt@
2019-11-29 | Split out the code that removes a page from uvm objects and clears the flags into a separate uvm_pageclean() function and call it from uvm_pagefree(). | Mark Kettenis
ok mpi@, guenther@, beck@
2019-11-29 | Repurpose the "syscalls must be on a writeable page" mechanism to enforce a new policy: system calls must be in pre-registered regions. | Theo de Raadt
We have discussed more strict checks than this, but none satisfy the cost/benefit based upon our understanding of attack methods; anyways, let's see what the next iteration looks like.
This is intended to harden (translation: attackers must put extra effort into attacking) against a mixture of W^X failures and JIT bugs which allow syscall misinterpretation, especially in environments with polymorphic-instruction/variable-sized instructions. It fits in a bit with libc/libcrypto/ld.so random relink on boot and no-restart-at-crash behaviour, particularly for remote problems. It is less effective once an attacker is on-host, since the libraries can then be read.
For static executables, the kernel registers the main program's PIE-mapped exec section as valid, as well as the randomly-placed sigtramp page. For dynamic executables, ELF ld.so's exec segment is also labelled valid; ld.so then has enough information to register libc's exec section as valid via call-once msyscall(2).
For dynamic binaries, we continue to permit the main program exec segment because "go" (and potentially a few other applications) have embedded system calls in the main program. Hopefully at least go gets fixed soon.
We declare the concept of embedded syscalls a bad idea for numerous reasons, as we notice the ecosystem has many static-syscall-in-base-binary programs which are dynamically linked against libraries which in turn use libc, which contains another set of syscall stubs. We've been concerned about adding even one additional syscall entry point... but go's approach tends to double the entry-point attack surface.
This was started at a nano-hackathon in Bob Beck's basement 2 weeks ago during a long discussion with mortimer trying to hide from the SSL scream-conversations, and finished in more comfortable circumstances next to a wood-stove at Elk Lakes cabin with UVM scream-conversations.
ok guenther kettenis mortimer, lots of feedback from others conversations about go with jsing tb sthen
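The registration scheme described above (kernel-registered program text and sigtramp, plus a single call-once registration of libc text by ld.so) can be sketched as a small per-process table and a program-counter check on syscall entry. This is a hypothetical model, not the msyscall(2) implementation; `struct syscall_regions`, `region_add()`, `msyscall_once()`, `syscall_pc_ok()` and `MAX_SYSCALL_REGIONS` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SYSCALL_REGIONS 4   /* hypothetical fixed table size */

/* Hypothetical per-process table of ranges allowed to issue syscalls. */
struct syscall_regions {
    uintptr_t start[MAX_SYSCALL_REGIONS];
    uintptr_t end[MAX_SYSCALL_REGIONS];   /* exclusive */
    size_t n;
    bool libc_registered;   /* models the call-once msyscall(2) */
};

/* Kernel-side registration (program text, sigtramp page). */
static bool region_add(struct syscall_regions *r, uintptr_t start,
    uintptr_t end)
{
    if (r->n == MAX_SYSCALL_REGIONS || start >= end)
        return false;
    r->start[r->n] = start;
    r->end[r->n] = end;
    r->n++;
    return true;
}

/* ld.so may register libc's text once; later attempts are refused. */
static bool msyscall_once(struct syscall_regions *r, uintptr_t start,
    uintptr_t end)
{
    if (r->libc_registered || !region_add(r, start, end))
        return false;
    r->libc_registered = true;
    return true;
}

/* Check on syscall entry: is the trapping pc inside a registered region? */
static bool syscall_pc_ok(const struct syscall_regions *r, uintptr_t pc)
{
    size_t i;

    for (i = 0; i < r->n; i++)
        if (pc >= r->start[i] && pc < r->end[i])
            return true;
    return false;
}
```

Making the libc registration call-once means that code injected later cannot simply register a new syscall region for itself.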
2019-11-28 | uvm_pagealloc_contig() doesn't exist and shouldn't exist | Philip Guenther
ok kettenis@
2019-11-28 | Remove end of line whitespace. | Mike Larkin
No code change.
2019-11-27 | Add dummy msyscall(2) system call which is currently a noop. | Theo de Raadt
This will be used by kernel and ld.so in the near future. Adding the system call earlier will reduce the number of people who try to build through and encounter agony. ok kettenis guenther
2019-11-26 | Fix a panic string that had the wrong function name and an improperly wrapped line. | Mike Larkin
No code change.
2019-11-26 | Fix a bunch of lines that had trailing whitespace. | Mike Larkin
No code change.
2019-11-05 | Kill uvm_deallocate(9) and use uvm_unmap() directly. | Martin Pieuchot
ok kettenis@, semarie@, deraadt@
2019-11-02 | Revert previous, a race is present and can be triggered with golang. | Martin Pieuchot
Found by jsing@
2019-11-02 | Start documenting which locking primitives apply to uvm_map members. | Martin Pieuchot
ok kettenis@
2019-11-01 | Push the KERNEL_LOCK() down in uvm_map_inentry(). | Martin Pieuchot
The lookup in uvm_map_inentry_fix() is already serialized by the vm_map_lock and such lookup is already executed w/o the KERNEL_LOCK(). ok kettenis@, deraadt@
2019-11-01 | Keep local function definitions in C files. | Martin Pieuchot
2019-09-09 | Inform about system call memory write protection and stack mapping violations in system accounting. | Alexander Bluhm
This will help to find misbehaving programs and possible attacks. The flags bit field is full, so recycle the PDP-11 compatibility on VAX. lastcomm(1) prints the AMAP flag as 'M'. daily(8) prints a list of affected processes. OK deraadt@
2019-07-18 | R.I.P. UVM_WAIT(). Use tsleep_nsec(9) directly. | cheloha
UVM_WAIT() doesn't provide much of a useful abstraction. All callers tsleep forever and no callers set PCATCH, so only 2 of 4 parameters are actually used. Might as well just use tsleep_nsec(9) directly and make the uvm code a bit less specialized. Suggested by mpi@. ok mpi@ visa@ millert@
2019-07-03 | Add tsleep_nsec(9), msleep_nsec(9), and rwsleep_nsec(9). | cheloha
Equivalent to their unsuffixed counterparts except that (a) they take a timeout in terms of nanoseconds, and (b) INFSLP, aka UINT64_MAX (not zero) indicates that a timeout should not be set. For now, zero nanoseconds is not a strictly valid invocation: we log a warning on DIAGNOSTIC kernels if we see such a call. We still sleep until the next tick in such a case, however. In the future this could become some sort of poll... TBD. To facilitate conversions to these interfaces: add inline conversion functions to sys/time.h for turning your timeout into nanoseconds. Also do a few easy conversions for warmup and to demonstrate how further conversions should be done. Lots of input from mpi@ and ratchov@. Additional input from tedu@, deraadt@, mortimer@, millert@, and claudio@. Partly inspired by FreeBSD r247787. positive feedback from deraadt@, ok mpi@
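The conversion helpers mentioned above can be sketched as saturating multiplications, plus a mapping from a nanosecond timeout to clock ticks in which INFSLP means "no timeout" and a zero timeout still sleeps until the next tick. The function names below and the tick rounding are assumptions of this sketch, not the helpers actually added to sys/time.h.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define INFSLP UINT64_MAX   /* as in the commit: "no timeout" */

/* Hypothetical conversion helpers; they saturate instead of wrapping. */
static uint64_t sec_to_nsec(uint64_t sec)
{
    if (sec > UINT64_MAX / 1000000000ULL)
        return UINT64_MAX;
    return sec * 1000000000ULL;
}

static uint64_t msec_to_nsec(uint64_t msec)
{
    if (msec > UINT64_MAX / 1000000ULL)
        return UINT64_MAX;
    return msec * 1000000ULL;
}

/*
 * Map a nanosecond timeout onto ticks for a tsleep-style interface where
 * 0 ticks means "sleep without timeout". INFSLP maps to 0; any finite
 * timeout rounds up to at least one tick, so 0 ns still sleeps until
 * the next tick.
 */
static int nsec_to_ticks(uint64_t nsecs, int hz)
{
    uint64_t tick_ns = 1000000000ULL / (uint64_t)hz;
    uint64_t ticks;

    if (nsecs == INFSLP)
        return 0;
    ticks = nsecs / tick_ns + (nsecs % tick_ns != 0);   /* round up */
    if (ticks == 0)
        ticks = 1;
    if (ticks > INT_MAX)
        ticks = INT_MAX;
    return (int)ticks;
}
```

One consequence of saturating to UINT64_MAX is that an absurdly large timeout collapses into INFSLP, which in practice means the same thing.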
2019-07-01 | Document which mechanisms protect some fields used w/o KERNEL_LOCK(). | Martin Pieuchot
ok visa@, semarie@
2019-06-21 | Make resource limit access MP-safe. | Visa Hankala
So far, the copy-on-write sharing of resource limit structs has been done between processes. By applying copy-on-write also between threads, threads can read rlimits in a nearly lock-free manner. Inspired by code in DragonFly BSD and FreeBSD. OK mpi@, agreement from jmatthew@ and anton@
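Copy-on-write sharing of a limits structure can be sketched with a reference count: readers just follow a pointer, and a writer that finds the structure shared makes a private copy before modifying it. The names `struct limits`, `lim_hold()`, `lim_free()` and `lim_set()` are hypothetical. The sketch also assumes a single writer per owner; the kernel must additionally publish the new pointer safely to concurrent readers, which this code does not model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define NLIMITS 4   /* hypothetical; the kernel has one slot per resource */

/* Hypothetical shared, reference-counted limit set. */
struct limits {
    atomic_uint refcnt;
    unsigned long cur[NLIMITS];
};

struct proc_lim {
    struct limits *lim;   /* readers only dereference this pointer */
};

static struct limits *lim_hold(struct limits *l)
{
    atomic_fetch_add(&l->refcnt, 1);
    return l;
}

static void lim_free(struct limits *l)
{
    if (atomic_fetch_sub(&l->refcnt, 1) == 1)
        free(l);              /* last reference dropped */
}

/*
 * Copy-on-write update: if the set is shared, give this owner a private
 * copy first, so other sharers never observe the write.
 * Returns 0 on success, -1 if the copy cannot be allocated.
 */
static int lim_set(struct proc_lim *p, int which, unsigned long val)
{
    struct limits *l = p->lim;

    if (atomic_load(&l->refcnt) > 1) {
        struct limits *copy = malloc(sizeof(*copy));

        if (copy == NULL)
            return -1;
        memcpy(copy->cur, l->cur, sizeof(copy->cur));
        atomic_init(&copy->refcnt, 1);
        p->lim = copy;
        lim_free(l);          /* drop our reference to the shared set */
        l = copy;
    }
    l->cur[which] = val;
    return 0;
}
```

Because writes never touch a shared structure, readers need no lock beyond holding a reference, which is the property the commit relies on.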
2019-06-14 | The addition of writeable-syscall checking near MAP_STACK checking damaged the error messages. | Theo de Raadt
Repair that, passing distinct format strings for the two cases. ok beck
2019-06-01 | Refactor the MAP_STACK feature, and introduce another similar variation: | Theo de Raadt
Look up the address that a syscall instruction is executed from, and kill the process if that page is writeable. This brings an aspect of W^X behaviour to W|X mappings (in JITs not yet adapted to W^X). The goal is to remove simple attack methods and force use of ret2libc or other more complicated means. ok kettenis stefan visa
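The check described above can be sketched as a lookup of the map entry containing the syscall's program counter followed by a test of its write permission. `struct map_entry`, `map_lookup()` and `syscall_from_writeable()` are hypothetical; treating an unmapped pc as fatal is an assumption of this sketch rather than something the commit states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>   /* PROT_READ, PROT_WRITE, PROT_EXEC */

/* Hypothetical, simplified map entry: [start, end) with protection bits. */
struct map_entry {
    uintptr_t start, end;
    int prot;
};

/* Find the entry containing addr, or NULL if unmapped. */
static const struct map_entry *map_lookup(const struct map_entry *map,
    size_t n, uintptr_t addr)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (addr >= map[i].start && addr < map[i].end)
            return &map[i];
    return NULL;
}

/* Returns true if the process should be killed: the syscall instruction
 * sits on a writeable page (or, by assumption here, on no page at all). */
static bool syscall_from_writeable(const struct map_entry *map, size_t n,
    uintptr_t pc)
{
    const struct map_entry *e = map_lookup(map, n, pc);

    return e == NULL || (e->prot & PROT_WRITE) != 0;
}
```

Under this rule a W|X JIT page can still run generated code, but it can no longer issue system calls directly.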
2019-05-16 | Handle a bit more work without taking the kernel lock. | Mark Kettenis
This should avoid taking the kernel lock when operating on the kernel_map when called from all kernel memory allocation interfaces. ok visa@, mlarkin@
2019-05-15 | free size for amap; ok visa@ | anton
2019-05-11 | move the noise about W^X mapping failure inside the sysctl kern.wxabort knob, since we found a program which tests RWX mapping and then changes execution behaviour to non-W^X. | Theo de Raadt
(That program is chrome, as v8 is heading towards W^X compliance with mprotect RW/RX swaps, and also has jitless components in development.) ok sthen kettenis robert
2019-05-10 | simplify logic after wakeup since this variable is only manipulated under lock | Bob Beck
ok guenther@