blob: a494683c49a9f12daa76231e34f92f621a26b9bb (
plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
|
$OpenBSD: Pmap.notes,v 1.2 1997/01/12 15:12:10 downsj Exp $
$NetBSD: Pmap.notes,v 1.2 1994/10/26 07:22:54 cgd Exp $
Following are some observations about the the BSD hp300 pmap module that
may prove useful for other pmap modules:
1. pmap_remove should be efficient with large, sparsely populated ranges.
Profiling of exec/exit intensive work loads showed that much time was
being spent in pmap_remove. This was primarily due to calls from exec
when deallocating the stack segment. Since the current implementation
of the stack is to "lazy allocate" the maximum possible stack size
(typically 16-32mb) when the process is created, pmap_remove will be
called with a large chunk of largely empty address space. It is
important that this routine be able to quickly skip over large chunks
of allocated but unpopulated VA space. The hp300 pmap module did check
for unpopulated "segments" (which map 4mb chunks) and skipped them fairly
efficiently but once it found a valid segment descriptor (STE), it rather
clumsily moved forward over the PTEs mapping that segment. Particularly
bad was that for every PTE it would recheck that the STE was valid even
though we should already know that.
pmap_protect can benefit from similar optimizations though it is
(currently) not called with large regions.
Another solution would be to change the way stack allocation is done
(i.e. don't preallocate the entire address range) but I think it is
important to be able to efficiently support such large, spare ranges
that might show up in other applications (e.g. a randomly accessed
large mapped file).
2. Bit operations (i.e. ~,&,|) are more efficient than bitfields.
This is a 68k/gcc issue, but if you are trying to squeeze out maximum
performance...
3. Don't flush TLB/caches for inactive mappings.
On the hp300 the TLBs are either designed as, or used in such a way that,
they are flushed on every context switch (i.e. there are no "process
tags") Hence, doing TLB flushes on mappings that aren't associated with
either the kernel or the currently running process are a waste. Seems
pretty obvious but I missed it for many years. An analogous argument
applies to flushing untagged virtually addressed caches (ala the 320/350).
|