path: root/sys/kern/subr_pool.c
Age Commit message Author
2019-07-19After the kernel has reached the sysctl kern.maxclusters limit,Alexander Bluhm
operations get stuck while holding the net lock. Increasing the limit did not help as there was no wakeup of the waiting pools. So introduce pool_wakeup() and run through the mbuf pool request list when the limit changes. OK dlg@ visa@
2019-04-23Remove file name and line number output from witness(4)Visa Hankala
Reduce code clutter by removing the file name and line number output from witness(4). Typically it is easy enough to locate offending locks using the stack traces that are shown in lock order conflict reports. Tricky cases can be tracked using sysctl kern.witness.locktrace=1 . This patch additionally removes the witness(4) wrapper for mutexes. Now each mutex implementation has to invoke the WITNESS_*() macros in order to utilize the checker. Discussed with and OK dlg@, OK mpi@
2019-02-10revert revert revert. there are many other archs that use custom allocs.Ted Unangst
2019-02-10if waitok flag is set, have the interrupt multipage allocator redirectTed Unangst
to the not interrupt allocator.
2019-02-10make it possible to reduce kmem pressure by letting some pools use a moreTed Unangst
accommodating allocator. an interrupt safe pool may also be used in process context, as indicated by waitok flags. thanks to the garbage collector, we can always free pages in process context. the only complication is where to put the pages. solve this by saving the allocation flags in the pool page header so the free function can examine them. not actually used in this diff. (coming soon.) arm testing and compile fixes from phessler
2018-06-08Constipate all the struct lock_type's so they go into .rodataPhilip Guenther
ok visa@
2018-02-06slightly randomize the order that new pages populate their item lists in.David Gwynne
ok tedu@ deraadt@
2018-01-18While booting it does not make sense to wait for memory, there isAlexander Bluhm
no other process which could free it. Better panic in malloc(9) or pool_get(9) instead of sleeping forever. tested by visa@ patrick@ Jan Klemkow suggested by kettenis@; OK deraadt@
2017-08-13New flag PR_RWLOCK for pool_init(9) makes the pool use rwlocks insteadPhilip Guenther
of mutexes. Use this immediately for the pool_cache futex pools. Mostly worked out with dlg@ during e2k17 ok mpi@ tedu@
2017-07-12Compute the level of contention only once.Visa Hankala
Suggested by and OK dlg@
2017-07-12When there is no contention on a pool cache lock, lower the numberVisa Hankala
of items that a cache list is allowed to hold. This lets the cache release resources back to the common pool after pressure on the cache has decreased. OK dlg@
2017-06-23set the alignment of the per cpu cache structures to CACHELINESIZE.David Gwynne
hardcoding 64 is too optimistic.
2017-06-23change the semantic for calculating when to grow the size of a cache list.David Gwynne
previously it would figure out if there's enough items overall for all the cpus to have full active and inactive free lists. this included currently allocated items, which pools wont actually hold on a free list and cannot predict when they will come back. instead, see if there's enough items in the idle lists in the depot that could instead go on all the free lists on the cpus. if there's enough idle items, then we can grow. tested by hrvoje popovski and amit kulkarni ok visa@
2017-06-19dynamically scale the size of the per cpu cache lists.David Gwynne
if the lock around the global depot of extra cache lists is contended a lot in between the gc task runs, consider growing the number of entries a free list can hold. the size of the list is bounded by the number of pool items the current set of pages can represent to avoid having cpus starve each other. im not sure this semantic is right (or the least worst) but we're putting it in now to see what happens. this also means reality matches the documentation i just committed in pool_cache_init.9. tested by hrvoje popovski and amit kulkarni ok visa@
2017-06-16add garbage collection of unused lists of percpu cached items.David Gwynne
the cpu caches in pools amortise the cost of accessing global structures by moving lists of items around instead of individual items. excess lists of items are stored in the global pool struct, but these idle lists never get returned back to the system for use elsewhere. this adds a timestamp to the global idle list, which is updated when the idle list stops being empty. if the idle list hasn't been empty for a while, it means the per cpu caches arent using the idle entries and they can be recovered. timestamping the pages prevents recovery of a lot of items that may be used again shortly. eg, rx ring processing and replenishing from rate limited interrupts tends to allocate and free items in large chunks, which the timestamping smooths out. gc'ed lists are returned to the pool pages, which in turn get gc'ed back to uvm. ok visa@
2017-06-16split returning an item to the pool pages out of pool_put as pool_do_put.David Gwynne
this lets pool_cache_list_put return items to the pages. currently, if pool_cache_list_put is called while the per cpu caches are enabled, the items on the list will be put straight back onto another list in the cpu cache. this also avoids counting puts for these items twice. a put for the items has already been counted when the items went to a cpu cache, it doesnt need to be counted again when it goes back to the pool pages. another side effect of this is that pool_cache_list_put can take the pool mutex once when returning all the items in the list with pool_do_put, rather than once per item. ok visa@
2017-06-15report contention on caches global data to userland.David Gwynne
2017-06-15white space tweaks. no functional change.David Gwynne
2017-06-15implement the backend of the sysctls that report pool cache info.David Gwynne
KERN_POOL_CACHE reports info about the global cache info, like how long the lists of cache items the cpus build should be and how many of these lists are idle on the pool struct. KERN_POOL_CACHE_CPUS reports counters from each cpu. the counters are for how many item and list operations the cache has handled on a cpu. the sysctl provides an array of ncpusfound * struct kinfo_pool_cache_cpu, not a single struct kinfo_pool_cache_cpu. tested by hrvoje popovski ok mikeb@ millert@
2017-06-13when enabling cpu caches, check the item size against the right thingDavid Gwynne
lists of free items on the per cpu caches are built out of the pool items as struct pool_cache_items, not struct pool_cache. make the KASSERT in pool_cache_init check that properly.
2017-04-20Tweak lock inits to make the system runnable with witness(4)Visa Hankala
on amd64 and i386.
2017-02-20revert 1.206 because it allows deadlocks.David Gwynne
if the gc task is running on a cpu that handles interrupts it is possible to allow a deadlock. the gc task may be cleaning up a pool and holding its mutex when a non-MPSAFE interrupt arrives and tries to take the kernel lock. another cpu may already be holding the kernel lock when it then tries to use the same pool that the pool GC is currently processing. thanks to sthen@ and mpi@ for chasing this down.
2017-02-08the splvm() in pool_gc_pages is unnecessary now.David Gwynne
all pools set their ipls unconditionally now, so there isn't a need to second guess them. pointed out by and ok jmatthew@
2017-01-24Force a context switch for every pool_get(9) with the PR_WAITOK flagMartin Pieuchot
if pool_debug is equal to 2, just like we do for malloc(9). ok dlg@
2016-11-21let pool page allocators advertise what sizes they can provide.David Gwynne
to keep things concise i let the multi page allocators provide multiple sizes of pages, but this feature was implicit inside pool_init and only usable if the caller of pool_init did not specify a page allocator. callers of pool_init can now supply a page allocator that provides multiple page sizes. pool_init will try to fit 8 items onto a page still, but will scale its page size down until it fits into what the allocator provides. supported page sizes are specified as a bit field in the pa_pagesz member of a pool_allocator. setting the low bit in that word indicates that the pages can be aligned to their size.
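The page size selection described above can be sketched roughly as follows. This is a minimal illustration, not the code in subr_pool.c: the names `ALLOC_ALIGNED`, `pagesz_supported` and `pick_pagesz` are hypothetical, and the exact search order is an assumption. It relies on page sizes being powers of two, so each supported size can be its own bit in the word and the low bit is free to carry the alignment hint.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name: low bit set means pages can be aligned to their size. */
#define ALLOC_ALIGNED	1UL

/* Is this power-of-two page size present in the allocator's bit field? */
static int
pagesz_supported(unsigned long mask, size_t pgsz)
{
	assert(pgsz > 1 && (pgsz & (pgsz - 1)) == 0);
	return ((mask & pgsz) != 0);
}

/*
 * Start from the smallest power of two that holds 8 items, then scale
 * down until the allocator advertises the size. Returns 0 if no
 * advertised size can hold even one item.
 */
static size_t
pick_pagesz(unsigned long mask, size_t itemsz, size_t minpg)
{
	size_t pgsz = minpg;

	assert(minpg > 1 && (minpg & (minpg - 1)) == 0);

	while (pgsz < itemsz * 8)
		pgsz <<= 1;

	while (pgsz >= minpg && pgsz >= itemsz) {
		if (pagesz_supported(mask, pgsz))
			return (pgsz);
		pgsz >>= 1;
	}
	return (0);
}
```

With a mask of `4096 | 8192 | 16384 | ALLOC_ALIGNED`, a 3000 byte item would ideally want a 32k page, but the sketch settles on 16384 because that is the largest size the allocator advertises.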
2016-11-07rename some types and functions to make the code easier to read.David Gwynne
pool_item_header is now pool_page_header. the more useful change is pool_list is now pool_cache_item. that's what items going into the per cpu pool caches are cast to, and they get linked together to make a list. the functions operating on what is now pool_cache_items have been renamed to make it more obvious what they manipulate.
2016-11-02poison the TAILQ_ENTRY in items in the per cpu pool cache.David Gwynne
2016-11-02add poisoning of items on the per cpu caches.David Gwynne
it copies the existing pool code, except it works on pool_list structures instead of pool_item structures. after this id like to poison the words used by the TAILQ_ENTRY in the pool_list struct that arent used until a list of items is moved into the global depot.
2016-11-02use a TAILQ to maintain the list of item lists used by the percpu code.David Gwynne
it makes it more readable, and fixes a bug in pool_list_put where it was returning the next item in the current list rather than the next list to be freed.
2016-11-02add per cpu caches for free pool items.David Gwynne
this is modelled on whats described in the "Magazines and Vmem: Extending the Slab Allocator to Many CPUs and Arbitrary Resources" paper by Jeff Bonwick and Jonathan Adams. the main semantic borrowed from the paper is the use of two lists of free pool items on each cpu, and only moving one of the lists in and out of a global depot of free lists to mitigate against a cpu thrashing against that global depot. unlike slabs, pools do not maintain or cache constructed items, which allows us to use the items themselves to build the free list rather than having to allocate arrays to point at constructed pool items. the per cpu caches are built on top of the cpumem api. this has been kicked a bit by hrvoje popovski and simon mages (thank you). im putting it in now so it is easier to work on and test. ok jmatthew@
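A rough sketch of the two-list scheme follows. It is an illustration under stated assumptions, not the kernel implementation: every name here (`struct item`, `struct cpu_cache`, `struct depot`, `cache_put`, `cache_get`, `LIST_MAX`) is hypothetical, the list length is fixed and tiny, and the locking a real depot needs is left out. The point it tries to show is that free items link to each other directly, and that a cpu only touches the depot when both of its lists are full or both are empty.

```c
#include <assert.h>
#include <stddef.h>

#define LIST_MAX	2	/* tiny list length, for illustration only */

/* A free item doubles as a list node; no separate arrays are allocated. */
struct item {
	struct item	*next;		/* next free item in this list */
	struct item	*nextlist;	/* next list, used while in the depot */
	unsigned int	 nitems;	/* items from this node to the tail */
};

struct cpu_cache {
	struct item	*active;	/* list currently being used */
	struct item	*inactive;	/* spare list kept on the cpu */
};

struct depot {
	struct item	*lists;		/* stack of idle lists */
	unsigned int	 nlists;
};

static void
cache_put(struct cpu_cache *cc, struct depot *d, void *v)
{
	struct item *it = v;
	unsigned int n = (cc->active != NULL) ? cc->active->nitems : 0;

	assert(it != NULL);

	if (n >= LIST_MAX) {
		/* Active list is full: retire the spare list to the depot. */
		if (cc->inactive != NULL) {
			cc->inactive->nextlist = d->lists;
			d->lists = cc->inactive;
			d->nlists++;
		}
		cc->inactive = cc->active;
		cc->active = NULL;
		n = 0;
	}

	it->next = cc->active;
	it->nextlist = NULL;
	it->nitems = n + 1;
	cc->active = it;
}

static void *
cache_get(struct cpu_cache *cc, struct depot *d)
{
	struct item *it;

	if (cc->active == NULL) {
		if (cc->inactive != NULL) {
			/* Swap in the spare list without touching the depot. */
			cc->active = cc->inactive;
			cc->inactive = NULL;
		} else if (d->lists != NULL) {
			cc->active = d->lists;
			d->lists = cc->active->nextlist;
			d->nlists--;
		} else
			return (NULL);	/* caller falls back to the pool pages */
	}

	it = cc->active;
	cc->active = it->next;
	return (it);
}
```

Keeping a second list on the cpu means a workload that alternates between a few gets and a few puts around a list boundary swaps lists locally instead of visiting the shared depot every time.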
2016-09-15all pools have their ipl set via pool_setipl, so fold it into pool_init.David Gwynne
the ioff argument to pool_init() is unused and has been for many years, so this replaces it with an ipl argument. because the ipl will be set on init we no longer need pool_setipl. most of these changes have been done with coccinelle using the spatch below. cocci sucks at formatting code though, so i fixed that by hand. the manpage and subr_pool.c bits i did myself. ok tedu@ jmatthew@
@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);
2016-09-15move pools to using the subr_tree version of rb treesDavid Gwynne
this is half way to recovering the space used by the subr_tree code.
2016-09-05revert moving pools from tree.h to subr_tree.c rb trees.David Gwynne
itll go in again when i dont break userland.
2016-09-05move pool red-black trees from tree.h code to subr_tree.c codeDavid Gwynne
ok tedu@
2016-01-15add a "show socket" command to ddbDavid Gwynne
should help inspecting socket issues in the future. enthusiasm from mpi@ bluhm@ deraadt@
2015-09-11Now that interrupt-safe uvm maps are properly locked, the interrupt-safeMark Kettenis
multi page backend allocator implementation no longer needs to grab the kernel lock. ok mlarkin@, dlg@
2015-09-08Give the pool page allocator backends more sensible names. We now have:Mark Kettenis
* pool_allocator_single: single page allocator, always interrupt safe
* pool_allocator_multi: multi-page allocator, interrupt safe
* pool_allocator_multi_ni: multi-page allocator, not interrupt-safe
ok deraadt@, dlg@
2015-09-08Now that msleep(9) no longer requires the kernel lock (as long as PCATCHMark Kettenis
isn't specified) the default backend allocator implementation no longer needs to grab the kernel lock. ok visa@, guenther@
2015-09-06We no longer need to grab the kernel lock for allocating and freeing pagesMark Kettenis
in the (default) single page pool backend allocator. This means it is now safe to call pool_get(9) and pool_put(9) for "small" items while holding a mutex without holding the kernel lock as well, as these functions will no longer acquire the kernel lock under any circumstances. For "large" items (where large is larger than 1/8th of a page) this still isn't safe though. ok dlg@
2015-09-01Push down the KERNEL_LOCK/KERNEL_UNLOCK calls into the back-end allocatorMark Kettenis
functions. Note that these calls are deliberately not added to the special-purpose back-end allocators in the various pmaps. Those allocators either don't need to grab the kernel lock, are always called with the kernel lock already held, or are only used on non-MULTIPROCESSOR platforms. ok tedu@, deraadt@, dlg@
2015-08-21re-enable *8.David Gwynne
if we're allowed to try and use large pages, we try and fit at least 8 of the items. this amortises the per page cost of an item a bit. "be careful" deraadt@
2015-07-23remove the POOL_NEEDS_CATCHUP macro, it isnt used.David Gwynne
from martin natano
2015-07-20Move `ticks' declaration to sys/kernel.h.Masao Uebayashi
2015-04-21disable *8 again for now. incoherent archs arent having much fun with it.David Gwynne
2015-04-07nothing uses pool_sleep, so get rid of itDavid Gwynne
2015-04-07introduce a garbage collector for (very) idle pool pages.David Gwynne
now that idle pool pages are timestamped we can tell how long theyve been idle. this adds a task that runs every second that iterates over all the pools looking for pages that have been idle for 8 seconds so it can free them. this idea probably came from a conversation with tedu@ months ago. ok tedu@ kettenis@
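The idle page sweep described above might look something like the sketch below. It is a simplified user-space illustration: `struct idle_page`, `gc_collect` and `IDLE_SECS` are hypothetical names, time is a plain count of seconds rather than kernel ticks, and treating "idle for 8 seconds" as "at least 8 seconds" is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define IDLE_SECS	8	/* threshold taken from the commit message */

struct idle_page {
	struct idle_page	*next;
	long			 idle_since;	/* second the page went idle */
};

/*
 * Unlink every page that has been idle for at least IDLE_SECS from
 * *head and return those pages as a separate list, so the caller can
 * free them after dropping whatever lock protects the pool.
 */
static struct idle_page *
gc_collect(struct idle_page **head, long now)
{
	struct idle_page *freed = NULL, **pp = head, *pg;

	assert(head != NULL);

	while ((pg = *pp) != NULL) {
		if (now - pg->idle_since >= IDLE_SECS) {
			*pp = pg->next;		/* unlink from the idle list */
			pg->next = freed;	/* push onto the freed list */
			freed = pg;
		} else
			pp = &pg->next;
	}
	return (freed);
}
```

A periodic task would call something like `gc_collect` once per pool on each run and hand the returned pages back to the page allocator.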
2015-03-20reintroduce r1.173:David Gwynne
> if we're able to use large page allocators, try and place at least
> 8 items on a page. this reduces the number of allocator operations
> we have to do per item on large items.
this was backed out because of fallout on landisk which has since been fixed. putting this in again early in the cycle so we can look for more fallout. hopefully it will stick. ok deraadt@
2015-03-14Remove some includes include-what-you-use claims don'tJonathan Gray
have any direct symbols used. Tested for indirect use by compiling amd64/i386/sparc64 kernels. ok tedu@ deraadt@
2015-02-10reintroduce page item cache colouring.David Gwynne
if you're having trouble understanding what this helps, imagine your cpus caches are a hash table. by moving the base address of items around (colouring them), you give it more bits to hash with. in turn that makes it less likely that you will overflow buckets in your hash. i mean cache. it was inadvertently removed in my churn of this subsystem, but as tedu has said on this issue:
> The history of pool is filled with features getting trimmed because they
> seemed unnecessary or in the way, only to later discover how important they
> were. Having slowly learned that lesson, I think our default should be "if
> bonwick says do it, we do it" until proven otherwise.
until proven otherwise we can keep the functionality, especially as the code cost is minimal. ok many including tedu@ guenther@ deraadt@ millert@
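One way to picture colouring is the sketch below. It is an assumption-laden illustration, not the pool code: `struct colour_state`, `colour_init` and `colour_next` are hypothetical names. The idea is that the bytes left over after packing items into a page are used to shift the first item by a different multiple of the alignment on each new page.

```c
#include <assert.h>
#include <stddef.h>

struct colour_state {
	size_t	align;		/* item alignment in bytes */
	size_t	maxcolour;	/* largest usable offset, a multiple of align */
	size_t	curcolour;	/* offset the next new page will use */
};

static void
colour_init(struct colour_state *cs, size_t pagesz, size_t itemsz,
    size_t align)
{
	size_t slack;

	assert(align != 0 && itemsz != 0 && itemsz <= pagesz);

	slack = pagesz % itemsz;	/* bytes left after packing items */
	cs->align = align;
	cs->maxcolour = slack - (slack % align);
	cs->curcolour = 0;
}

/* Return the item base offset for a new page and advance the colour. */
static size_t
colour_next(struct colour_state *cs)
{
	size_t off = cs->curcolour;

	cs->curcolour += cs->align;
	if (cs->curcolour > cs->maxcolour)
		cs->curcolour = 0;
	return (off);
}
```

For a 4096 byte page holding 192 byte items aligned to 64 bytes, 64 bytes are left over, so successive pages start their items at offsets 0, 64, 0, 64 and so on. Items at the same index on different pages then no longer map to the same cache lines.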
2015-01-22pool_chk_page iterates over a pages free item lists and checks thatDavid Gwynne
the items address is within the page. it does that by masking the item address with the page mask and comparing that to the page address. however, if we're using large pages with external page headers, we dont request that the large page be aligned to its size. eg, on an arch with 4k pages, an 8k large page could be aligned to 4k, so masking bits to get the page address wont work. these incorrect checks were distracting while i was debugging large pages on landisk. this changes it to do range checks to see if the item is within the page. it also checks if the item is on the page before checking if its magic values or poison is right. ok miod@
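The difference between the two checks can be shown with a small sketch. The helper names `item_on_page_mask` and `item_on_page_range` are hypothetical and the code is a user-space illustration rather than what pool_chk_page does line for line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Mask test: only correct when the page base is aligned to pagesz,
 * because it recovers the base by clearing the low address bits.
 */
static bool
item_on_page_mask(uintptr_t page, size_t pagesz, uintptr_t item)
{
	assert(pagesz != 0 && (pagesz & (pagesz - 1)) == 0);
	return ((item & ~((uintptr_t)pagesz - 1)) == page);
}

/*
 * Range test: correct for any page base. The item must start at or
 * after the page base and end at or before the end of the page.
 */
static bool
item_on_page_range(uintptr_t page, size_t pagesz, size_t itemsz,
    uintptr_t item)
{
	assert(itemsz <= pagesz);
	return (item >= page && item - page <= pagesz - itemsz);
}
```

Take an 8k page that starts at 0x3000, which is 4k aligned but not 8k aligned. An item at 0x4010 is inside the page, yet the mask test computes a base of 0x4000 and rejects it, while the range test accepts it.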