Age | Commit message | Author |
|
there are no buffers on the dirty queue to clean.
ok beck@
|
|
as fix the case where buffers can be returned on the vinvalbuf path
and we do not get woken up when waiting for kva.
An earlier version was looked at and ok'd by guenther@ in Coimbra; helpful
comments from kettenis@.
|
|
be throwing away when growing the buffer cache - ok mlarkin@
|
|
A long time ago (in Vienna) the reserves for the cleaner and syncer were
removed. softdep and many things have not performed the same ever since.
Follow-on generations of buffer cache hackers assumed the existing code
was the reference and have been in a frustrating state of coprophagia ever
since.
This commit
0) Brings back a (small) reserve allotment of buffer pages, and the kva to
map them, to allow the cleaner and syncer to run even when under intense
memory or kva pressure.
1) Fixes a lot of comments and variables to represent reality.
2) Simplifies and corrects how the buffer cache backs off down to the lowest
level.
3) Corrects how the page daemon asks the buffer cache to back off, ensuring
that uvmpd_scan is done to recover inactive pages in low-memory situations.
4) Adds a high water mark to the pool used to allocate struct buf's.
5) Corrects the cleaner and the sleep/wakeup cases in both low-memory and
low-kva situations (including accounting for the cleaner/syncer reserve).
Tested by many, with very much helpful input from deraadt, miod, tobiasu,
kettenis and others.
ok kettenis@ deraadt@ jj@
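
A minimal sketch of the reserve idea above, assuming invented names and
numbers (RESERVE_PAGES, alloc_page, etc.) rather than the real vfs_bio.c
code: ordinary buffer allocations fail once only the reserve remains, while
the cleaner and syncer may dip into it and keep making progress under
memory pressure.

#include <stdbool.h>
#include <stdio.h>

#define TOTAL_PAGES   1000
#define RESERVE_PAGES 16     /* held back for the cleaner/syncer */

static int pages_free = TOTAL_PAGES;

/* Ordinary consumers may not touch the reserve; the cleaner/syncer may. */
static bool
alloc_page(bool cleaner_or_syncer)
{
    int floor = cleaner_or_syncer ? 0 : RESERVE_PAGES;

    if (pages_free <= floor)
        return false;    /* caller must back off or sleep */
    pages_free--;
    return true;
}

int
main(void)
{
    /* Drain everything a normal buffer is allowed to take. */
    while (alloc_page(false))
        ;
    printf("normal allocation stops with %d pages left\n", pages_free);

    /* The cleaner can still allocate out of the reserve. */
    printf("cleaner allocation %s\n", alloc_page(true) ? "succeeds" : "fails");
    return 0;
}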
|
|
- Whitespace KNF
- Removal/fixing of old useless comments
- Removal of unused counter
- Removal of pointless test that had no effect
ok krw@
|
|
This change ensures that writes in flight from the buffer cache via bufq
are limited to a high water mark - when the limit is reached, writes sleep
until the amount of I/O in flight drops to a low water mark. This avoids the
problem where userland can queue an unlimited amount of asynchronous writes,
consuming all or most of our available buffer mapping kva and building up
a long queue of writes to the disk.
ok kettenis@, krw@
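
A toy sketch of the throttle described above, with invented names and
limits (HI_WATER, LO_WATER, start_write), not the kernel code: writers stop
once the in-flight count hits the high water mark and only resume once
completions bring it back down to the low water mark.

#include <stdio.h>

#define HI_WATER 64    /* sleep when this many writes are in flight */
#define LO_WATER 32    /* wake the writer again at or below this */

static int inflight;
static int sleeping;

static void
start_write(void)
{
    if (!sleeping && inflight >= HI_WATER) {
        sleeping = 1;    /* stands in for tsleep() in the kernel */
        printf("writer sleeps at %d writes in flight\n", inflight);
    }
    if (!sleeping)
        inflight++;
}

static void
write_done(void)
{
    inflight--;
    if (sleeping && inflight <= LO_WATER) {
        sleeping = 0;    /* stands in for wakeup() */
        printf("writer resumes at %d writes in flight\n", inflight);
    }
}

int
main(void)
{
    /* Userland queues writes much faster than the disk completes them. */
    for (int i = 0; i < 300; i++) {
        start_write();
        if (i % 3 == 0)
            write_done();
    }
    return 0;
}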
|
|
- make sure the buffer reclaiming loop in buf_get() actually does something
other than spin, if `backoffpages' is nonzero and all free queues have been drained.
- don't forget to set a poor man's condition variable to nonzero before
tsleeping on it in bufadjust(), otherwise you'll never get woken up.
- don't be too greedy and reassign backoffpages a large amount immediately
after bufadjust() has been called.
This fixes reproducible hangs seen during heavy I/O (such as `make install'
of many large files, e.g. run in /usr/src/lib with NOMAN=) on systems with
a challenged number of pages (fewer than a few thousand, total).
Part of this is a temporary band-aid until better pressure logic is devised,
but it's solving an immediate problem. Been in snapshots for a solid month.
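
A toy illustration of the second bullet above, the "poor man's condition
variable": the sleeper must set the flag to nonzero before sleeping on it,
or the waker never sees it and the sleep lasts forever. The names here
(needbuffer, sleeper, waker) are invented; this is not the bufadjust() code.

#include <stdio.h>

static int needbuffer;    /* the flag the sleeper waits on */
static int asleep;

static void
sleeper(void)
{
    needbuffer = 1;    /* the fix: announce the wait before sleeping */
    asleep = 1;        /* stands in for tsleep(&needbuffer, ...) */
}

static void
waker(void)
{
    if (needbuffer) {  /* the waker only acts if the flag was set in time */
        needbuffer = 0;
        asleep = 0;    /* stands in for wakeup(&needbuffer) */
    }
}

int
main(void)
{
    sleeper();
    waker();
    printf("sleeper woken up: %s\n", asleep ? "no (hang)" : "yes");
    return 0;
}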
|
|
of per-rthread. Handling of per-thread tick and runtime counters
inspired by how FreeBSD does it.
ok kettenis@
|
|
remove some now useless statistics, and add some
relevant ones regarding kva usage in the cache.
make systat io and show bcstats in ddb both show
these counters.
ok deraadt@ krw@
|
|
(part missed from previous commit)
|
|
|
|
|
|
for now; that is unlikely to hit some of the remaining starvation bugs.
Repair the bufpages calculation too; i386 was doing it ahead of time
(incorrectly) and then re-calculating it.
ok thib
|
|
does not do what it purports to do; it shrinks the mapping, not the allocation, as
the pages have already been given away to other buffers. This also renames
the function to make this a little more obvious,
and art should not name functions.
ok thib@, art@
|
|
With this change bufcachepercent will be the percentage of DMA-reachable
memory that the buffer cache will attempt to use.
ok deraadt@ thib@ oga@
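
A back-of-the-envelope illustration of the sizing rule described above,
using a hypothetical machine and page size; nothing here comes from the
kernel source.

#include <stdio.h>

int
main(void)
{
    /* Hypothetical box: 4 GB of RAM, but only 1 GB is DMA reachable. */
    long dma_pages = (1024L * 1024 * 1024) / 4096;
    int bufcachepercent = 20;    /* e.g. raised via sysctl */

    /* The cache sizes itself against DMA-reachable memory only. */
    long bufpages = dma_pages * bufcachepercent / 100;

    printf("buffer cache target: %ld pages (%ld MB)\n",
        bufpages, bufpages * 4096 / (1024 * 1024));
    return 0;
}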
|
|
it is totally wrong to convert bdwrite into bawrite on the fly. this just
causes way bigger issues.
ok beck blambert
|
|
repair that situation. Darn newbies...
|
|
from its vnode's buffer cache in an interrupt context. Therefore we
need interrupt protection when searching the buffer red-black tree.
ok deraadt@, thib@, art@
|
|
a freed buf as that causes problems...
|
|
is causing havoc with vnds and release must be buildable.
|
|
and waits until all I/O currently on the queues has been completed. To get
I/O going again, call bufq_restart().
To be used for suspend/resume.
Joint effort with thib@, tedu@; tested by mlarkin@, marco@
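
A toy model of the semantics described above; the real entry points are
bufq_quiesce() and bufq_restart() as named in the commit, but everything
else here (the counters, the model functions) is invented purely to
illustrate the drain/restart behaviour around suspend and resume.

#include <stdio.h>

static int quiesced;
static int outstanding;    /* I/O already handed to the driver */

static void
bufq_model_quiesce(void)
{
    quiesced = 1;              /* stop handing new work to the driver */
    while (outstanding > 0)    /* wait for in-flight I/O to drain */
        outstanding--;         /* stands in for sleeping until biodone */
    printf("queues quiet, safe to suspend\n");
}

static void
bufq_model_restart(void)
{
    quiesced = 0;              /* dequeueing may proceed again */
    printf("resumed, I/O flows again\n");
}

int
main(void)
{
    outstanding = 3;           /* pretend three writes are in flight */
    bufq_model_quiesce();      /* e.g. just before suspend */
    bufq_model_restart();      /* e.g. on resume */
    return 0;
}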
|
|
ok beck@ krw@
|
|
1) fix buffer cache low water mark to allow for extremely low-memory machines
without dying
2) Add "show bcstats" to ddb to allow looking at the buffer cache statistics
ok art@ oga@
|
|
after c2k9
allows buffer cache to be extended and grow/shrink dynamically
tested by many, ok oga@, "why not just commit it" deraadt@
|
|
where doing bremfree() before calling buf_acquire().
This is causing us headaches pinning down a bug that showed up
when deraadt@ took cvs to current, and will have to be done
anyway as a preparation for backouts.
OK deraadt@
|
|
three commits:
1) The sysctl allowing bufcachepercent to be changed at boot time.
2) The change moving the buffer cache hash chains to a red-black tree.
3) The dynamic buffer cache (which depended on the earlier two).
ok on the backout from marco and todd
|
|
Just put it in the buf_acquire function.
oga@ ok
|
|
This commit won't change the default behaviour of the system unless the
buffer cache size is increased with sysctl kern.bufcachepercent. By default
our buffer cache is 10% of memory, which with this commit is now treated
as a low water mark. If the buffer cache size is increased, the new size
is treated as a high water mark and the buffer cache is permitted to grow
to that percentage of memory.
If the page daemon is invoked, the page daemon will ask the buffer cache
to relinquish pages. If the buffer cache has more than the low water mark it
will relinquish pages, allowing them to be consumed by uvm. After a short
period the buffer cache will attempt to re-grow back to the high water mark.
This permits the use of a large buffer cache without penalizing the available
memory for other purposes.
Above the low water mark the buffer cache remains entirely subservient to
the page daemon, so if uvm requires pages, the buffer cache will abandon
them.
ok art@ thib@ oga@
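
A toy model of the grow/back-off behaviour described above. The numbers and
function names are invented for the example; the real mechanism lives in the
buffer cache and page daemon code.

#include <stdio.h>

#define LOW_WATER  1000    /* the guaranteed 10%-of-memory floor */
#define HIGH_WATER 3000    /* the raised kern.bufcachepercent target */

static long cachepages = LOW_WATER;

static void
cache_grow(long want)
{
    /* Opportunistic growth, never past the high water mark. */
    cachepages += want;
    if (cachepages > HIGH_WATER)
        cachepages = HIGH_WATER;
}

static long
pagedaemon_asks(long shortage)
{
    /* Give pages back to uvm, but never shrink below the low water mark. */
    long give = cachepages - LOW_WATER;

    if (give > shortage)
        give = shortage;
    if (give < 0)
        give = 0;
    cachepages -= give;
    return give;
}

int
main(void)
{
    cache_grow(5000);
    printf("after growth:    %ld pages\n", cachepages);
    printf("released to uvm: %ld pages\n", pagedaemon_asks(2500));
    printf("after back-off:  %ld pages\n", cachepages);
    return 0;
}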
|
|
size on a running system.
ok art@, oga@
|
|
off the vnode.
ok art@, oga@, miod@
|
|
- getnewbuf dies. instead of having getnewbuf, buf_get, buf_stub and
buf_init we now have buf_get that is smaller than some of those
functions were before.
- Instead of allocating anonymous buffers and then freeing them if we
happened to lose the race to the hash, always allocate a buffer knowing
which <vnode, block> it will belong to.
- In cluster read, instead of allocating an anonymous buffer to cover
the whole read and then stubs for every buffer under it, make the
first buffer in the cluster cover the whole range and then shrink it
in the callback.
now, all buffers are always on the correct hash and we always know their
identity.
discussed with many, kettenis@ ok
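
A toy model of the central idea above: a buffer is created already knowing
its <vnode, block> identity and is entered into the lookup structure at
allocation time, so no anonymous buffer ever has to be thrown away after
losing a race. The struct and function names are invented for the example.

#include <stdio.h>
#include <stdlib.h>

struct vbuf {
    long vnode_id;
    long blkno;
    struct vbuf *next;
};

static struct vbuf *bufs;    /* stands in for the hash/red-black tree */

static struct vbuf *
buf_get_model(long vnode_id, long blkno)
{
    struct vbuf *bp;

    /* Return the existing buffer for this <vnode, block>, if any. */
    for (bp = bufs; bp != NULL; bp = bp->next)
        if (bp->vnode_id == vnode_id && bp->blkno == blkno)
            return bp;

    /* Otherwise create it with its identity fixed from the start. */
    if ((bp = calloc(1, sizeof(*bp))) == NULL)
        exit(1);
    bp->vnode_id = vnode_id;
    bp->blkno = blkno;
    bp->next = bufs;
    bufs = bp;
    return bp;
}

int
main(void)
{
    struct vbuf *a = buf_get_model(1, 64);
    struct vbuf *b = buf_get_model(1, 64);

    printf("same buffer both times: %s\n", a == b ? "yes" : "no");
    return 0;
}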
|
|
this ensures we do not count any buffers returning through biodone()
for which B_PHYS has been set - which should be set on all transfers
that manually do raw I/O, bypassing the buffer cache by setting up their
own buffer and calling strategy.
ok thib@, todd@, and now that he is a buffer cache and nfs hacker oga@
|
|
"keep b_proc set to the process, thats doing the io as advertised"
This broke dvd playing on my laptop (page fault trap in vmapbuf in the
physio path).
thib's cookie privileges are hereby suspended until further notice.
|
|
that's doing the io as advertised
closes PR3948
OK tedu@ (and blambert@ I think).
|
|
from getnewbuf() to buf_put(), since getnewbuf() does not directly
recycle buffers anymore. While at it, remove two lines of dead code
from getnewbuf(), which used to disassociate a vnode from a buffer.
"just go for it, because everyone had a chance" deraadt@.
|
|
In brelse, if we end up in the B_INVAL case without mappings, check
for B_WANTED and wake up the sleeper if there's one before freeing the
buffer. This shouldn't happen, but it looks like there might actually
be some dodgy corner cases in nfs where this could just happen if the
phase of the moon is right and the wind is blowing from the right
direction.
thib@ ok
|
|
ok thib beck art
|
|
file copies to nfsv2 causes the system to eventually peg the console.
On the console, ^T indicates that the load is increasing rapidly, ddb
indicates many calls to getbuf, and there is some very slow nfs traffic
making no (or extremely slow) progress. Eventually some machines
seize up entirely.
|
|
1) remove multiple size queues, introduced as a stopgap.
2) decouple pages containing data from their mappings
3) only keep buffers mapped when they actually have to be mapped
(right now, this is when buffers are B_BUSY)
4) New functions to make a buffer busy, and release the busy flag
(buf_acquire and buf_release)
5) Move high/low water marks and statistics counters into a structure
6) Add a sysctl to retrieve buffer cache statistics
Tested in several variants and beaten upon by bob and art for a year. Run
accidentally on henning's nfs server for a few months...
ok deraadt@, krw@, art@ - who promises to be around to deal with any fallout
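
A toy model of points 2) through 4) above: the data pages stay with the
buffer at all times, but the kva mapping is only held while the buffer is
busy (acquired). buf_acquire and buf_release are the names from the commit;
the rest of this sketch is invented for illustration.

#include <stdio.h>
#include <stdlib.h>

struct buf_model {
    int   busy;        /* B_BUSY equivalent */
    void *pages;       /* backing pages, always present */
    void *mapping;     /* kva mapping, only while busy */
};

static void
buf_acquire_model(struct buf_model *bp)
{
    bp->busy = 1;
    if (bp->mapping == NULL)
        bp->mapping = malloc(4096);    /* stands in for mapping kva */
}

static void
buf_release_model(struct buf_model *bp)
{
    bp->busy = 0;
    free(bp->mapping);    /* toy: drop at once; the kernel reclaims lazily */
    bp->mapping = NULL;
}

int
main(void)
{
    struct buf_model b = { 0, malloc(4096), NULL };

    buf_acquire_model(&b);
    printf("busy: mapped=%s\n", b.mapping ? "yes" : "no");
    buf_release_model(&b);
    printf("idle: mapped=%s\n", b.mapping ? "yes" : "no");
    free(b.pages);
    return 0;
}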
|
|
and add some to be able to support statvfs(2). Do the compat dance
to provide backward compatibility. ok thib@ miod@
|
|
normally.
ok deraadt@ tedu@ otto@
|
|
daemon requires splbio when doing dirty buffer queue manipulation. Since
version 1.88 of vfs_bio.c, it was possible to break out of the processing
loop when the cleaner had been running long enough, and this early exit would
mean a future pass through would manipulate the buffer queues not at splbio.
This change corrects this.
ok krw@, deraadt@, tedu@, thib@
|
|
help and ok miod@ thib@
|
|
brings us back roughly to 4.1 level performance, although this is still
far from optimal as we have seen in a number of cases. This change
1) puts a lower bound on buffer cache queues to prevent starvation
2) fixes the code which looks for a buffer to recycle
3) reduces the number of vnodes back to 4.1 levels to avoid complex
performance issues better addressed after 4.2
ok art@ deraadt@, tested by many
|
|
than the hardware page size, as was the case in the old clustering code.
This fixes vnd reads on alpha and sparc64
On behalf of pedro@, ok art@
|
|
ok thib@
|
|
|
|
machines. ok deraadt@
|
|
|
|
moves memset from the 20th most expensive function in the kernel to the
331st when doing heavy io.
ok tedu@ thib@ pedro@ beck@ art@
|