$NetBSD: Pmap.notes,v 1.2 1994/10/26 07:22:54 cgd Exp $

Following are some observations about the BSD hp300 pmap module that
may prove useful for other pmap modules:

1. pmap_remove should be efficient with large, sparsely populated ranges.

   Profiling of exec/exit intensive workloads showed that much time was
   being spent in pmap_remove.  This was primarily due to calls from exec
   when deallocating the stack segment.  Since the current implementation
   of the stack is to "lazy allocate" the maximum possible stack size
   (typically 16-32MB) when the process is created, pmap_remove will be
   called with a large chunk of largely empty address space.  It is
   important that this routine be able to quickly skip over large chunks
   of allocated but unpopulated VA space.  The hp300 pmap module did check
   for unpopulated "segments" (which map 4MB chunks) and skipped them fairly
   efficiently, but once it found a valid segment descriptor (STE), it rather
   clumsily moved forward over the PTEs mapping that segment.  Particularly
   bad was that for every PTE it would recheck that the STE was valid even
   though we should already know that.

   pmap_protect can benefit from similar optimizations though it is
   (currently) not called with large regions.

   Another solution would be to change the way stack allocation is done
   (i.e. don't preallocate the entire address range) but I think it is
   important to be able to efficiently support such large, sparse ranges
   that might show up in other applications (e.g. a randomly accessed
   large mapped file).
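
   The skipping strategy described above can be sketched as follows.  This
   is only an illustration, not the hp300 code: the names (sk_pmap,
   sk_remove, SK_PG_V), the 4KB page size and the use of a NULL pointer to
   model an invalid STE are all assumptions, and the real routine also has
   to deal with pv lists, reference/modify bits and TLB flushing.  The
   point is that the STE is examined once per 4MB segment, and PTEs are
   walked only inside segments that have a valid STE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PGSHIFT  12u                       /* 4KB pages (assumed) */
#define SK_SEGSHIFT 22u                       /* 4MB segments, as in the note */
#define SK_PGSIZE   (1u << SK_PGSHIFT)
#define SK_SEGSIZE  (1u << SK_SEGSHIFT)
#define SK_NPTE     (SK_SEGSIZE / SK_PGSIZE)  /* PTEs per segment */
#define SK_NSEG     1024u                     /* covers a 32-bit VA space */
#define SK_PG_V     0x1u                      /* hypothetical "valid" bit */

/* Hypothetical two-level table: a NULL entry models an invalid STE. */
struct sk_pmap {
	uint32_t *seg[SK_NSEG];
};

/*
 * Remove all mappings in [sva, eva), both page aligned.
 * Returns the number of PTEs examined so the cost can be observed.
 */
static unsigned long
sk_remove(struct sk_pmap *pm, uint32_t sva, uint32_t eva)
{
	unsigned long examined = 0;
	uint64_t va = sva, end = eva;	/* 64-bit to avoid wrap at 4GB */

	assert((sva & (SK_PGSIZE - 1)) == 0 && (eva & (SK_PGSIZE - 1)) == 0);

	while (va < end) {
		/* Start of the next segment, clipped to the end of the range. */
		uint64_t next = (va & ~(uint64_t)(SK_SEGSIZE - 1)) + SK_SEGSIZE;
		uint64_t stop = next < end ? next : end;
		/* Check the STE once per segment, not once per PTE. */
		uint32_t *pte = pm->seg[va >> SK_SEGSHIFT];

		if (pte != NULL) {
			for (; va < stop; va += SK_PGSIZE) {
				uint32_t *p =
				    &pte[(va & (SK_SEGSIZE - 1)) >> SK_PGSHIFT];

				examined++;
				if (*p & SK_PG_V)
					*p = 0;		/* invalidate the mapping */
			}
		}
		va = stop;	/* unpopulated segment: skip it in one step */
	}
	return examined;
}
```

   With this structure, removing a 32MB range in which only one segment is
   populated touches the PTEs of that one segment and makes a single STE
   check for each of the others.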

2. Bit operations (i.e. ~, &, |) are more efficient than bitfields.

   This is a 68k/gcc issue, but if you are trying to squeeze out maximum
   performance...
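
   To make the contrast concrete, here is a small sketch of the two styles.
   The PTE layout, the BO_* masks and the bo_* names are invented for
   illustration and are not the hp300 definitions.  Note also that C leaves
   bitfield layout to the implementation, so the struct below is not
   guaranteed to match the mask layout bit for bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PTE layout, for illustration only. */
#define BO_PG_V     0x00000001u	/* valid */
#define BO_PG_RO    0x00000004u	/* read-only */
#define BO_PG_FRAME 0xfffff000u	/* page frame address */

/* Mask style: each operation is a plain ~, & or | on one word. */
static inline uint32_t
bo_pte_make(uint32_t pa, int readonly)
{
	assert((pa & ~BO_PG_FRAME) == 0);	/* pa must be page aligned */
	return pa | (readonly ? BO_PG_RO : 0u) | BO_PG_V;
}

static inline int
bo_pte_valid(uint32_t pte)
{
	return (pte & BO_PG_V) != 0;
}

static inline uint32_t
bo_pte_protect(uint32_t pte)
{
	return pte | BO_PG_RO;
}

static inline uint32_t
bo_pte_invalidate(uint32_t pte)
{
	return pte & ~BO_PG_V;
}

/* Bitfield style: the compiler chooses how to extract and insert fields. */
struct bo_pte_bits {
	unsigned int pfn : 20;
	unsigned int pad : 9;
	unsigned int ro  : 1;
	unsigned int res : 1;
	unsigned int v   : 1;
};

static inline void
bo_bits_protect(struct bo_pte_bits *p)
{
	p->ro = 1;
}
```

   The mask versions also make it easy to combine several changes into a
   single read-modify-write of the PTE.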

3. Don't flush TLB/caches for inactive mappings.

   On the hp300 the TLBs are either designed or used in such a way that
   they are flushed on every context switch (i.e. there are no "process
   tags").  Hence, doing TLB flushes on mappings that aren't associated with
   either the kernel or the currently running process is a waste.  Seems
   pretty obvious but I missed it for many years.  An analogous argument
   applies to flushing untagged virtually addressed caches (a la the 320/350).
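
   A minimal sketch of the check this implies is shown below.  All names
   (fl_pmap, fl_curpmap, fl_kernel_pmap, fl_tlb_flush_page) are
   hypothetical, and the flush routine is a counting stand-in for whatever
   the hardware requires.  It assumes, as described above, that the TLB
   holds no entries for a pmap that is neither the kernel's nor that of the
   running process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fl_pmap {
	int id;				/* placeholder for real pmap state */
};

static struct fl_pmap fl_kernel_pmap;	/* the kernel's pmap */
static struct fl_pmap *fl_curpmap;	/* pmap of the running process */
static unsigned long fl_tlb_flushes;	/* counts flushes, for illustration */

/* Stand-in for the machine-dependent single-page TLB flush. */
static void
fl_tlb_flush_page(unsigned long va)
{
	(void)va;
	fl_tlb_flushes++;
}

/* A pmap is "active" if the TLB could currently hold its entries. */
static bool
fl_pmap_active(const struct fl_pmap *pm)
{
	return pm == &fl_kernel_pmap || pm == fl_curpmap;
}

/* Call after changing or removing the mapping for va in pm. */
static void
fl_mapping_changed(struct fl_pmap *pm, unsigned long va)
{
	assert(pm != NULL);
	if (fl_pmap_active(pm))
		fl_tlb_flush_page(va);
	/* Inactive pmap: its entries were flushed at the context switch. */
}
```

   The same test could guard flushes of an untagged virtually addressed
   cache.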