author | Stefan Fritsch <sf@cvs.openbsd.org> | 2014-10-06 20:34:59 +0000 |
---|---|---|
committer | Stefan Fritsch <sf@cvs.openbsd.org> | 2014-10-06 20:34:59 +0000 |
commit | d41036be6ee3c813374da5b0b78ba5df2ff92c97 (patch) | |
tree | be03285a14d85a20c60dd26ddc2a8d2f3e582ea4 /lib/libevent/Makefile | |
parent | 663dd3a733d1e1eb57165f93e9999a75de283023 (diff) |
Make amd64 pmap more efficient on multi-processor

With the current implementation, when an inactive pmap is accessed, its
PTEs are mapped in the APTE range. Because the APTE range is mapped on
all CPUs, every change to the APTE must be followed by a remote TLB
flush on all CPUs. This is very inefficient because the cost increases
quadratically with the number of CPUs.
Therefore, the code is changed to remove the APTE mechanism completely
and instead switch the pmap locally. A remote TLB flush is then only
done if the pmap is in use on the remote CPU. In the common case, this
will replace one TLB flush on all CPUs with two local TLB flushes.
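
The scaling argument can be made concrete with a toy count of TLB flush events. This is a sketch, not kernel code: it assumes each CPU accesses an inactive pmap once, that the target pmap is not in use on any remote CPU (the common case described above), and that the two local flushes come from switching to the target pmap and back. The function names are invented for illustration.

```c
/*
 * Toy model (assumption, not OpenBSD code): each of ncpus CPUs
 * accesses an inactive pmap exactly once.
 */

/* Old scheme: every APTE change forces a TLB flush on all CPUs. */
unsigned long
apte_flushes(unsigned long ncpus)
{
	return ncpus * ncpus;
}

/*
 * New scheme, common case: two local flushes per access and no
 * remote flush, because the pmap is not active elsewhere.
 */
unsigned long
local_switch_flushes(unsigned long ncpus)
{
	return 2 * ncpus;
}
```

Under these assumptions the total grows with the square of the CPU count in the old scheme and linearly in the new one.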
An additional optimization is done in cases where only a single PTE of
an inactive pmap is accessed: the requested PTE is found by walking the
page tables manually via the direct mapping. This avoids some further
TLB flushes.
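
A manual walk of this kind can be sketched in user-space C. The code below is an illustration under stated assumptions, not the code from this commit: it models the usual amd64 layout of four levels with 9 index bits each and 4 KB pages, ignores large pages, and takes the direct map as a plain base pointer at which "physical" address 0 is visible. All names (`walk_pte`, `pa_to_dmap`, the `PTE_*` macros) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PTE_VALID   0x1ULL                  /* present bit */
#define PTE_FRAME   0x000ffffffffff000ULL   /* physical frame bits */
#define PT_LEVELS   4
#define PT_IDX_BITS 9
#define PT_IDX_MASK 0x1ffULL
#define PT_PAGE_SHIFT 12

/* Translate a physical address to a pointer through the direct map. */
uint64_t *
pa_to_dmap(uint8_t *dmap_base, uint64_t pa)
{
	return (uint64_t *)(dmap_base + pa);
}

/*
 * Walk from the top-level table at root_pa down to the PTE for va.
 * Returns a pointer to the PTE (reached through the direct map), or
 * NULL if an intermediate level is not present. Large pages are not
 * handled in this sketch.
 */
uint64_t *
walk_pte(uint8_t *dmap_base, uint64_t root_pa, uint64_t va)
{
	uint64_t table_pa = root_pa;
	int level;

	for (level = PT_LEVELS - 1; level > 0; level--) {
		int shift = PT_PAGE_SHIFT + PT_IDX_BITS * level;
		uint64_t idx = (va >> shift) & PT_IDX_MASK;
		uint64_t entry = pa_to_dmap(dmap_base, table_pa)[idx];

		if ((entry & PTE_VALID) == 0)
			return NULL;
		table_pa = entry & PTE_FRAME;
	}
	return &pa_to_dmap(dmap_base, table_pa)[(va >> PT_PAGE_SHIFT) & PT_IDX_MASK];
}
```

Because every table is read through the direct map, no temporary mapping of the inactive pmap is created, so there is nothing to flush afterwards.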
Furthermore, some code is reordered so that the TLB shootdown IPIs are
sent first, then more local processing takes place, and only afterwards
does the CPU wait for the remote TLB shootdowns to finish.
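
The reordering amounts to splitting "send" and "wait" into separate steps so that local work can run between them. The following is a minimal sketch of that shape using C11 atomics. The `shootdown_*` names and the pending counter are hypothetical and do not reflect the kernel's actual interfaces, and the point where a real kernel would send IPIs is only marked by a comment.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bookkeeping: number of remote CPUs that still owe an ack. */
struct shootdown {
	atomic_int pending;
};

/* Step 1: record how many remote CPUs must respond, then notify them. */
void
shootdown_send(struct shootdown *sd, int nremote)
{
	atomic_store(&sd->pending, nremote);
	/* A real kernel would send the IPIs here. */
}

/* Run by each remote CPU after it has flushed its TLB. */
void
shootdown_ack(struct shootdown *sd)
{
	atomic_fetch_sub(&sd->pending, 1);
}

bool
shootdown_done(struct shootdown *sd)
{
	return atomic_load(&sd->pending) == 0;
}

/* Step 3: spin until every remote CPU has acknowledged. */
void
shootdown_wait(struct shootdown *sd)
{
	while (!shootdown_done(sd))
		;
}
```

A caller following the order described above would call `shootdown_send()`, do its remaining local processing (step 2), and only then call `shootdown_wait()`, so the remote flushes overlap with the local work instead of preceding it.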
This diff is based on a patch for i386 by Artur Grabowski <art blahonga org>
from 2008. Some additional bits were taken from a different patch by
Artur from 2005.
Tested by many. OK mlarkin@
Diffstat (limited to 'lib/libevent/Makefile')
0 files changed, 0 insertions, 0 deletions