Imagine lots of random small mappings (think malloc(3)) and sometimes
one large mapping (network buffer). If we've filled up our address space
enough, the random address picked for the large allocation is likely to
be overlapping an existing small allocation, so we'll do a linear scan
to find the next free address. That next free address is likely to
be just after a small allocation. Those two map entries get merged.
If we now allocate an amap for the merged map entry, it will be large.
When we later free the large allocation, the amap is not truncated. All
these are design decisions that made sense for sbrk, but with random
allocations and malloc that actually returns memory, this really hurt us.
This is the reason why certain processes like apache and sendmail could
eat more than 10 times as much amap memory as they needed, eventually
hitting the malloc limit and hanging or running the machine out of
kmem_map and crashing.
otto@ ok
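A toy model of the growth described above: a 1-page entry merged with a 256-page neighbour ends up with a 257-slot amap, and unmapping the large part leaves all those slots behind. `struct entry`, `merge_entries()` and `unmap_tail()` are hypothetical and only mimic the behaviour the message describes; they are not the real vm_map_entry or amap code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: an entry covers pages [start, end) and owns an
 * amap with amap_slots slots. Not the real struct vm_map_entry. */
struct entry {
	size_t start, end;
	size_t amap_slots;
};

/* Merge two adjacent entries; an amap allocated afterwards is sized
 * for the whole merged range. */
static struct entry
merge_entries(struct entry a, struct entry b)
{
	assert(a.end == b.start);
	struct entry m = { a.start, b.end, b.end - a.start };
	return m;
}

/* Unmap the last npages of an entry. As the commit describes, the
 * amap is not truncated, so its slots stay allocated. */
static void
unmap_tail(struct entry *e, size_t npages)
{
	assert(npages <= e->end - e->start);
	e->end -= npages;
}
```

After `unmap_tail()` the entry covers one page but still carries 257 amap slots, which is the waste that accumulates in long-running processes.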
|
|
Found by LLVM/Clang Static Analyzer.
ok miod@ art@
|
|
Found by LLVM/Clang Static Analyzer.
"Right." miod@
|
|
allocate a single malloc chunk instead of three and allocate a single
slot for a single page instead of four slots. ok miod@ tedu@ deraadt@
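A sketch of the "one malloc chunk instead of three" idea, assuming three parallel per-slot arrays carved out of a single allocation. The struct and function names are hypothetical; the message does not say which arrays are involved.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-slot arrays; names are illustrative only. */
struct slot_arrays {
	void **anon;     /* pointer array, placed first for alignment */
	int   *slots;
	int   *bckptr;
	void  *chunk;    /* the one allocation backing all three */
};

static int
slot_arrays_alloc(struct slot_arrays *s, size_t n)
{
	assert(n > 0);
	/* One malloc() sized for all three arrays (overflow not checked). */
	s->chunk = malloc(n * sizeof(void *) + 2 * n * sizeof(int));
	if (s->chunk == NULL)
		return -1;
	s->anon = s->chunk;
	s->slots = (int *)(s->anon + n);
	s->bckptr = s->slots + n;
	return 0;
}

static void
slot_arrays_free(struct slot_arrays *s)
{
	free(s->chunk);    /* one free() instead of three */
	s->chunk = NULL;
}
```

Putting the pointer array first keeps every sub-array suitably aligned, since the chunk itself comes from malloc.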
|
|
"looks sane to me" otto@, ok miod@
|
|
|
|
|
|
parameter and returns an aligned random load address for position
independent executables to use. This also adds three new vmparam.h
defines to specify the maximum address, minimum address and minimum
allowed alignment for uvm_map_pie() to use. The PIE address range
for i386 was carefully selected to work well within the i386 W^X
framework.
With much help and feedback from weingart@.
okay weingart@, miod@, kettenis@, drahn@
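A sketch of picking an aligned random load address between a minimum and maximum, in the spirit of what the message says uvm_map_pie() does. The `PIE_*` constants are assumed example values, not the real vmparam.h defines, and `rnd` stands in for a kernel random number. The real function may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed example values; the real limits are per-platform defines in
 * vmparam.h and are not reproduced here. */
#define PIE_MIN_ADDR   0x00400000UL
#define PIE_MAX_ADDR   0x20000000UL
#define PIE_MIN_ALIGN  0x00001000UL

/* Return a load address in [PIE_MIN_ADDR, PIE_MAX_ADDR) aligned to
 * `align` (a power of two). `rnd` stands in for a kernel random value. */
static uintptr_t
pie_addr(uintptr_t align, uint32_t rnd)
{
	if (align < PIE_MIN_ALIGN)
		align = PIE_MIN_ALIGN;
	assert((align & (align - 1)) == 0);

	uintptr_t min = (PIE_MIN_ADDR + align - 1) & ~(align - 1);
	uintptr_t max = PIE_MAX_ADDR & ~(align - 1);
	assert(min < max);

	/* Number of aligned candidates; modulo bias is ignored here. */
	uintptr_t nslots = (max - min) / align;
	return min + (uintptr_t)(rnd % nslots) * align;
}
```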
|
|
1. When checking if the pagedaemon should be awakened and to see how
much work it should do, consider the buffer cache deficit
(how many pages the buffer cache can eat at most vs. how many it has
now) as pages that are not free. They are actually still usable by
the allocator, but the pressure on the pagedaemon is increased when
we start to chew into the memory that the buffer cache wants to
use.
2. Remove the stupid 512kB limit on how much memory should be our
free target. That may have made sense on 68k, but on modern systems
512k is just a joke. Keep it at 3% of physical memory just like
it was meant to be.
3. When doing allocations for the pagedaemon, always let it use the
reserve. The whole UVM_OBJ_IS_KERN_OBJECT check is silly and doesn't
work in most cases anyway. We still don't have a reserve for
the pagedaemon in the km_page allocator, but this seems to help
enough. (yes, there are still bad cases in that code and the comment
is only half-true, the whole section needs a massage, but that will
happen later, this diff only touches pagedaemon parts)
Testing by many, prodded by theo.
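Points 1 and 2 can be sketched as arithmetic: the free target is 3% of physical memory with no cap, and the buffer cache deficit is subtracted from the free pages before comparing against it. Function and parameter names are hypothetical; this is not the actual uvm_pdaemon code.

```c
#include <assert.h>

/* Free target: 3% of physical memory, with no fixed 512kB cap. */
static unsigned long
free_target(unsigned long physpages)
{
	assert(physpages > 0);
	return physpages * 3 / 100;
}

/* Pages the pagedaemon should try to free. The buffer cache deficit
 * (how many pages the cache may use minus how many it has now) is
 * counted as not free. Zero means the pagedaemon need not be woken. */
static unsigned long
pagedaemon_shortage(unsigned long freepages, unsigned long buf_max,
    unsigned long buf_now, unsigned long target)
{
	unsigned long deficit = buf_max > buf_now ? buf_max - buf_now : 0;
	unsigned long avail = freepages > deficit ? freepages - deficit : 0;
	return avail < target ? target - avail : 0;
}
```

With 1,000,000 pages the target is 30,000 pages; 40,000 free pages look sufficient until a 15,000-page buffer cache deficit is counted, leaving a 5,000-page shortage.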
|
|
with, trying to find free pages matching the caller's requirements.
However, on systems with noncontiguous memory and large gaps between
segments, this is a disaster as soon as one of these gaps is hit.
Rewrite the logic to iterate over the physsegs, and then over the intersection
of each physseg's range with the caller's range. This also frees us from having
to check whether a given page range crosses a physseg.
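A sketch of the rewritten search: walk the segments, clip each one to the caller's range, and scan only that intersection, so gaps between segments are never visited and a run cannot cross a segment boundary. `struct physseg`, `find_run()` and `always_ok()` are hypothetical stand-ins, not the real uvm_pglistalloc code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical physical segment: page frame numbers [start, end). */
struct physseg { unsigned long start, end; };

/* Hypothetical predicate standing in for "this page is usable". */
typedef int (*page_ok_fn)(unsigned long pfn);

static int
always_ok(unsigned long pfn)
{
	(void)pfn;
	return 1;
}

/* Return the first pfn of a run of `npages` usable pages inside
 * [lo, hi), or (unsigned long)-1 if there is none. */
static unsigned long
find_run(const struct physseg *segs, size_t nsegs, unsigned long lo,
    unsigned long hi, unsigned long npages, page_ok_fn ok)
{
	assert(npages > 0);
	for (size_t i = 0; i < nsegs; i++) {
		/* Intersection of this segment with the caller's range. */
		unsigned long s = segs[i].start > lo ? segs[i].start : lo;
		unsigned long e = segs[i].end < hi ? segs[i].end : hi;
		unsigned long run = 0;   /* reset per segment */

		for (unsigned long pfn = s; pfn < e; pfn++) {
			run = ok(pfn) ? run + 1 : 0;
			if (run == npages)
				return pfn - npages + 1;
		}
	}
	return (unsigned long)-1;
}
```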
|
|
Not sure what's more surprising: how long it took for NetBSD to
catch up to the rest of the BSDs (including UCB), or the amount of
code that NetBSD has claimed for itself without attributing to the
actual authors.
OK deraadt@
|
|
example an ioctl that loads bazillions of entries into a pf table) it
would exhaust the pool of free pages and not let uvm_km_thread catch
up until the pool was actually empty. This could be bad for non-sleeping
allocators since they can't wait for the memory while the big hog
can.
Instead of letting the syscall exhaust the pool, detect when we fall below
the low watermark, wake the thread, sleep once and let the thread
catch up. This paces the huge consumer so that the more critical consumers
never find an exhausted pool of pages.
"seems reasonable" kettenis@
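A single-threaded sketch of the pacing rule: a caller that may sleep checks the low watermark first and, if below it, wakes the refill thread and sleeps once before taking a page. Here the refill thread's work is simulated synchronously, and all names are hypothetical rather than the real km_page code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page pool; not the real km_page structures. */
struct page_pool {
	int free;       /* pages currently in the pool */
	int lowat;      /* low watermark */
	int hiwat;      /* level the refill thread restores */
	int refills;    /* times the refill "thread" ran */
};

/* Stand-in for "wake the thread, sleep once": here the refill thread's
 * work is done synchronously so the sketch stays single-threaded. */
static void
wake_refill_and_sleep(struct page_pool *p)
{
	p->free = p->hiwat;
	p->refills++;
}

/* Take one page. A caller that may sleep pauses once when the pool is
 * below lowat, so a large consumer cannot drain it for everyone else. */
static bool
pool_get_page(struct page_pool *p, bool cansleep)
{
	assert(p->lowat <= p->hiwat);
	if (cansleep && p->free < p->lowat)
		wake_refill_and_sleep(p);
	if (p->free == 0)
		return false;    /* a non-sleeping caller simply fails */
	p->free--;
	return true;
}
```

Even after a sleeping caller takes 1000 pages, the pool never drops below `lowat - 1`, so a non-sleeping caller arriving afterwards still finds a page.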
|
|
ok thib beck art
|
|
file copies to nfsv2 causes the system to eventually peg the console.
On the console ^T indicates that the load is increasing rapidly, ddb
indicates many calls to getbuf, and there is some very slow nfs traffic
making no (or extremely slow) progress. Eventually some machines
seize up entirely.
|
|
biowait() reads that do *not* come from the buffer cache - we use the
B_RAW flag to identify these at art's suggestion - since it makes sense
and the flag was not being used. This just flags all these buffers with
B_RAW - biodone already ignores returned buffers marked B_RAW.
ok art@
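A minimal sketch of the flagging idea: buffers that do not come from the buffer cache carry a flag, and the completion path checks it and skips cache-specific work. The flag value, struct, and the counter standing in for "cache bookkeeping" are all hypothetical; the real B_RAW and biodone() live in the kernel and differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag value and buffer; the real B_RAW and struct buf
 * are defined elsewhere and differ. */
#define B_RAW_SKETCH  0x0001

struct buf_sketch {
	int  flags;
	bool done;
};

static int cache_completions;   /* stand-in for buffer cache bookkeeping */

/* Mark a buffer that does not come from the buffer cache. */
static void
mark_raw(struct buf_sketch *bp)
{
	bp->flags |= B_RAW_SKETCH;
}

/* Completion: every buffer is marked done, but raw buffers skip the
 * cache bookkeeping. */
static void
biodone_sketch(struct buf_sketch *bp)
{
	assert(!bp->done);
	bp->done = true;
	if (bp->flags & B_RAW_SKETCH)
		return;
	cache_completions++;
}
```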
|
|
a new etype, UVM_ET_HOLE, meaning it has no backend.
UVM_ET_HOLE entries (which should be created as UVM_PROT_NONE and with
UVM_FLAG_NOMERGE and UVM_FLAG_HOLE) are skipped in uvm_unmap_remove(), so
that pmap_{k,}remove() is not called on the entry.
This is intended to save time, and behave better, on pmaps with MMU holes
at process exit time.
ok art@, kettenis@ provided feedback as well.
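A sketch of the skip in the unmap path: entries whose type marks them as holes have no backend and nothing mapped, so the pmap removal call is never made for them. `ET_HOLE` and the helper functions are hypothetical stand-ins for UVM_ET_HOLE, uvm_unmap_remove() and pmap_remove().

```c
#include <assert.h>
#include <stddef.h>

#define ET_HOLE 0x1   /* hypothetical stand-in for UVM_ET_HOLE */

struct entry { unsigned long start, end; int etype; };

static int pmap_remove_calls;   /* counts the simulated pmap work */

static void
pmap_remove_sketch(unsigned long start, unsigned long end)
{
	assert(start < end);
	pmap_remove_calls++;
}

/* Walk the entries being removed, skipping hole entries. */
static void
unmap_remove_sketch(const struct entry *e, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (e[i].etype & ET_HOLE)
			continue;   /* nothing mapped, so no pmap work */
		pmap_remove_sketch(e[i].start, e[i].end);
	}
}
```

For a large MMU hole this avoids asking the pmap to walk a range it could never have mapped, which is where the exit-time saving comes from.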
|
|
gets correctly encrypted if the swap isn't a multiple of 128 pages.
ok deraadt@
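The usual way to cover a size that is not a multiple of a group size is to round the group count up rather than truncate. This sketch assumes one key covers 128 swap pages, based only on the "128 pages" in the message; `PAGES_PER_KEY` and the functions are hypothetical.

```c
#include <assert.h>

/* Pages covered by one key, taken from the "128 pages" in the commit
 * message; the name is hypothetical. */
#define PAGES_PER_KEY 128UL

/* Keys needed to cover npages; rounding up means a final partial
 * group of pages still gets a key. */
static unsigned long
swap_nkeys(unsigned long npages)
{
	return (npages + PAGES_PER_KEY - 1) / PAGES_PER_KEY;
}

/* Index of the key that encrypts a given swap page. */
static unsigned long
swap_key_index(unsigned long page, unsigned long npages)
{
	assert(page < npages);
	return page / PAGES_PER_KEY;
}
```

With 1000 pages, truncating division would give 7 keys and leave pages 896 to 999 uncovered; rounding up gives 8.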
|
|
ifdef netbsd block in drm code, but oga@ says he'll remove
it soon...
OK art@, oga@;
|
|
|
|
uvm_swap_initcrypt. The number of available pages may not match, if we
are using a miniroot in the swap partition.
|
|
|
|
|
|
through signal handlers with gdb.
ok miod@
|
|
memory map is fragmented. Avoids ridiculously large core dumps.
ok miod@
|
|
Proper casts should be added to all invocations of ptoa() before this cast
can be removed again.
ok toby@, marco@, miod@
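Why a cast inside ptoa() matters: without it, a page count held in a 32-bit type is shifted in that type and silently wraps. This sketch assumes 4 KiB pages and a 64-bit physical address type; the macro and type names are hypothetical, not the real ptoa() definition.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12               /* 4 KiB pages, assumed */
typedef uint64_t paddr_sketch_t;           /* stand-in for a 64-bit paddr_t */

/* Without a cast the shift happens in the argument's own type. */
#define ptoa_nocast(x)  ((x) << PAGE_SHIFT_SKETCH)
/* Casting first widens the value before it is shifted. */
#define ptoa_cast(x)    ((paddr_sketch_t)(x) << PAGE_SHIFT_SKETCH)

static paddr_sketch_t
pages_to_bytes(uint32_t npages)
{
	paddr_sketch_t bytes = ptoa_cast(npages);
	assert((bytes >> PAGE_SHIFT_SKETCH) == npages);   /* nothing lost */
	return bytes;
}
```

For 0x100000 pages (4 GiB), the uncast shift wraps to 0 in 32 bits, while the cast version yields 0x100000000.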
|
|
Has been in snapshots for a short while.
|
|
returning EINVAL, you'll get ENOSYS. No serious code has used this system
call in at least fifteen years.
The libc stub will be removed at the next major crank time.
ok henning@ deraadt@ krw@ toby@
|
|
and supposed to be only used from within ddb.
|
|
macros that just expand into the mutex functions
to keep the abstraction, do assorted cleanup.
ok miod@,art@
|
|
where core dumps on hppa were missing the last stack page.
ok miod@
|
|
|
|
|
|
|
|
|
|
fixed-size array whose size should match any buf; if a bogus buf is passed
to this function, the kernel will KASSERT instead of potentially running out
of stack and invoking undefined behaviour.
ok deraadt@
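A sketch of the pattern: size the on-stack array by the largest transfer any buf can describe, and assert that the buf fits before touching the array. `MAXPHYS_SKETCH`, `PAGE_SIZE_SKETCH` and `process_buf()` are hypothetical; `assert()` plays the role of KASSERT.

```c
#include <assert.h>
#include <stddef.h>

#define MAXPHYS_SKETCH   (64 * 1024)   /* assumed largest transfer size */
#define PAGE_SIZE_SKETCH 4096
#define MAXPAGES_SKETCH  (MAXPHYS_SKETCH / PAGE_SIZE_SKETCH)

struct buf_sketch { size_t bufsize; };

/* Returns the number of pages handled. The array has a fixed size, so
 * its stack usage no longer depends on a value taken from the buf. */
static size_t
process_buf(const struct buf_sketch *bp)
{
	size_t npages = (bp->bufsize + PAGE_SIZE_SKETCH - 1) / PAGE_SIZE_SKETCH;
	const void *pages[MAXPAGES_SKETCH];

	assert(npages <= MAXPAGES_SKETCH);   /* KASSERT in the kernel */
	for (size_t i = 0; i < npages; i++)
		pages[i] = &pages[i];        /* placeholder per-page work */
	return npages;
}
```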
|
|
ok krw@
|
|
- Move the functionality of choosing a process from cpu_switch into
a much simpler function: cpu_switchto. Instead of having the locore
code walk the run queues, let the MI code choose the process we
want to run and only implement the context switching itself in MD
code.
- Let MD context switching run without worrying about spls or locks.
- Instead of having the idle loop implemented with special contexts
in MD code, implement one idle proc for each cpu. Make the idle
loop MI with MD hooks.
- Change the proc lists from the old style vax queues to TAILQs.
- Change the sleep queue from vax queues to TAILQs. This makes
wakeup() go from O(n^2) to O(n).
There will be some MD fallout, but it will be fixed shortly.
There are also a few cleanups to be done after this.
deraadt@, kettenis@ ok
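A sketch of a wakeup over a doubly linked sleep queue: each matching process is unlinked in O(1) through its own prev/next pointers, so a single pass over the queue suffices. The list is hand-rolled in the style of a TAILQ so the example does not depend on `<sys/queue.h>`; the struct and function names are hypothetical, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly linked list in the style of a TAILQ. */
struct sproc {
	const void *wchan;          /* what the process sleeps on */
	int runnable;
	struct sproc *next, *prev;
};
struct sleepq { struct sproc *head, *tail; };

static void
sleepq_insert(struct sleepq *q, struct sproc *p)
{
	p->next = NULL;
	p->prev = q->tail;
	if (q->tail != NULL)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
}

/* O(1): both neighbours are reachable directly, no scan from the head. */
static void
sleepq_remove(struct sleepq *q, struct sproc *p)
{
	assert(q->head != NULL);
	if (p->prev != NULL)
		p->prev->next = p->next;
	else
		q->head = p->next;
	if (p->next != NULL)
		p->next->prev = p->prev;
	else
		q->tail = p->prev;
	p->next = p->prev = NULL;
}

/* One pass over the queue, O(n) in total. Returns the number woken. */
static int
wakeup_sketch(struct sleepq *q, const void *chan)
{
	int n = 0;
	struct sproc *p, *next;

	for (p = q->head; p != NULL; p = next) {
		next = p->next;     /* saved before p is unlinked */
		if (p->wchan == chan) {
			sleepq_remove(q, p);
			p->runnable = 1;
			n++;
		}
	}
	return n;
}
```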
|
|
vnode locking actually works, just check the VLOCKSWORK
flag. Also, change this ifdef DEBUG to VFSDEBUG since
VLOCKSWORK is only ever set if VFSDEBUG is defined.
ok/input miod@, art@ (earlier diff)
|
|
help and ok miod@ thib@
|
|
the holes a MMU may have from a given vm_map. This will be automagically
invoked for newly created vmspaces.
On platforms with MMU holes (e.g. sun4, sun4c and vax), this prevents
mmap(2) hints that would fall in the hole from being accepted as valid,
which would cause unexpected signals when the process tries to access the
hole (since pmap cannot fill the hole anyway).
Unfortunately, the logic mmap() uses to pick a valid address for anonymous
mappings needs work, as it will only try to find an address higher than the
hint, which causes all mmap() with a hint in the hole to fail on vax. This
will be improved later.
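The validity test a hint must pass can be sketched as a range-overlap check: the whole range [hint, hint + len) has to lie entirely below or entirely above the hole. The function and the example hole bounds are hypothetical; real hole locations are machine-dependent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reject a hint whose range overlaps the hole [hole_start, hole_end). */
static bool
hint_ok(uintptr_t hint, size_t len, uintptr_t hole_start, uintptr_t hole_end)
{
	assert(hole_start <= hole_end);
	if (len == 0 || hint + len < hint)
		return false;                   /* empty or wraps around */
	return hint + len <= hole_start || hint >= hole_end;
}
```

A range that merely straddles the hole's start is rejected as well, since touching its upper part would fault just like a hint inside the hole.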
|
|
eyeballed and ok dlg@
|
|
version for i386
more architectures and ctob() replacement is being worked on
prodded by and ok miod
|
|
some comment cleanup and a touch of KNF.
ok art@
|
|
when we hit swap before actually fully populating the buffer cache which
would lead to deadlocks.
From pedro, tested by many, deraadt@ ok
|
|
|
|
ckuethe@ for a while. Okay beck@, "it is good timing" deraadt@.
|
|
"reads ok" dlg@
|
|
type of all variables to daddr64_t. this includes the APIs for XXsize()
and XXdump(), all range checks inside bio drivers, internal variables
for disklabel handling, and even uvm's swap offsets. re-read numerous
times by otto, miod, krw, thib to look for errors
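Some quick arithmetic on why a wider block type is needed, assuming the old type was a signed 32-bit count of 512-byte blocks (an assumption, not stated in the message): such a type tops out just under 1 TiB. The constant and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEV_BSIZE_SKETCH 512ULL   /* bytes per disk block, assumed */

/* Bytes reachable with a signed 32-bit block number. */
static uint64_t
max_bytes_daddr32(void)
{
	return (uint64_t)INT32_MAX * DEV_BSIZE_SKETCH;
}

/* Does the block holding byte offset `off` fit in a 32-bit block number? */
static bool
blkno_fits_32(uint64_t off)
{
	assert(off % DEV_BSIZE_SKETCH == 0);
	return off / DEV_BSIZE_SKETCH <= (uint64_t)INT32_MAX;
}
```

(2^31 - 1) blocks of 512 bytes is 2^40 - 512 bytes, so an offset of 2 TiB needs block number 2^32 and no longer fits.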
|
|
to size. tested on almost all machines, double checked by miod and krw
next comes the type handling surrounding these values
|
|
ok art bob
|