summaryrefslogtreecommitdiff
path: root/sbin/bim
diff options
context:
space:
mode:
authorMarc Espie <espie@cvs.openbsd.org>2000-06-23 16:27:30 +0000
committerMarc Espie <espie@cvs.openbsd.org>2000-06-23 16:27:30 +0000
commit3be881f38f781744cfe617c28ce0fad7f282d1fe (patch)
tree1980bbdf6f04a278478c7cf15e04636aba4fbd78 /sbin/bim
parent239c19f2c09a9a0d658c57e5bdffe8403f0bc7fb (diff)
This is the speed-up patch, which doubles make speed (almost).
Use the open hashing functions for global contexts instead of List in var.c. All the preliminary work to trim down local contexts means that we don't suffer from the heavy initialization work that a hash table entails. There is some make kludgery to: - build the hashing functions as a library, - recreate hashconsts.h, even if make depend was not invoked. One point of the hashing scheme written was to separate the computation of the hash function, and the hash lookup itself. This is very convenient for make, because of those pesky special variables. hashconsts.h is there to pre-hash the correct values, which replaces a few expensive string comparisons with quick hash value comparisons, followed by one expensive string comparison. The modulus MAGICSLOTS chosen in the Makefile is ad-hoc: it is small enough to write a small switch without collision, and will need changing if the hash function changes... The function quick_lookup is the most important: it either returns an index, for a local variable, or it does compute a hashing value, and returns -1. Another somewhat controversial decision is the use of string intervals. This avoids either copying a string, or twiddling with a byte for cases such as ${VAR}. Finally, the variable name is stored within the variable itself. Since a given variable name never changes, this makes sense. All that was needed was a hash library with support for this. Note that the hashing table holds only a variable pointer AND the corresponding hashing value, WITHOUT a modulo hashtablesize. Two reasons: - hash resizes can be done faster, without having to recompute hashing values. - locality of access. The hash table fits into memory without problem. Once a candidate slot is found, we check the complete hashing value. Probability of a collision is very small (32 bits...). So bringing up the whole variable in memory at once is good: the name will almost always match, in which case we want the variable value as well, so it makes sense to put them together. The ohash functions implement open hashing, as described in Knuth, but with a variable table size. Choosing powers of 2 sizes does not yield more collisions, but it makes the hashing scheme much simpler. The thresholds at which to expand/shrink the tables seem to work well in practice. The default sizes were chosen such that the tables hardly ever shrink or expand anyways (though I've tried with smaller/larger sizes to verify that the shrinking/expanding worked correctly): larger Makefiles hold roughly 500/600 variables, which fits without trouble into a 1024-sized variable. Disregard #ifdef STATS_HASH, this is some internal scaffolding I'm using to measure make performance. The only known issue with open-hashing is that deletions cannot create empty slots, but do leave slots marked as `occupied once' so that lookup works. We use a well-known optimization which records those pseudo-empty slots while looking up values. If the value is not found, the pseudo-empty slot is returned to be filled. If the value is found, it is swapped with the pseudo-empty slot. This is an improvement in both cases, since this shortens the length of lookup chains, eventually pushing the pseudo-empty slots to the end. Reviewed by millert@ and miod@
Diffstat (limited to 'sbin/bim')
0 files changed, 0 insertions, 0 deletions