Change the conditional recognition algorithm:
scan for a sequence of alphabetic characters, hash it, and compare it against
a small table (using ohash functions).
This makes Cond_Eval entry more logical, and allows for some shortcuts in
recognizing .include, .for, .undef.
This also means that conditionals must have an intervening blank between
the keyword and the actual test, e.g.,
.ifA
will no longer work.
(but no-one actually uses this, and it's highly obfuscated)
Okay miod@.
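A rough sketch of the recognition scheme described above: scan the run of alphabetic characters after the dot, hash it, and compare against a small keyword table. All names here (recognize_keyword, hash_interval, the KW_* codes) are hypothetical, the hash is a plain multiplicative one rather than the ohash function, and the table is scanned linearly instead of going through ohash.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical keyword codes, for illustration only. */
enum kw { KW_NONE, KW_IF, KW_IFDEF, KW_IFNDEF, KW_ELSE, KW_ENDIF,
	KW_INCLUDE, KW_FOR, KW_UNDEF };

struct kw_entry {
	const char *name;
	enum kw code;
	uint32_t hv;		/* hash of name, filled in on first use */
};

static struct kw_entry keywords[] = {
	{ "if", KW_IF, 0 }, { "ifdef", KW_IFDEF, 0 },
	{ "ifndef", KW_IFNDEF, 0 }, { "else", KW_ELSE, 0 },
	{ "endif", KW_ENDIF, 0 }, { "include", KW_INCLUDE, 0 },
	{ "for", KW_FOR, 0 }, { "undef", KW_UNDEF, 0 },
};
#define NKEYWORDS (sizeof(keywords) / sizeof(keywords[0]))
static int keywords_ready;

/* Illustrative hash over the interval [s, e); not the ohash function. */
static uint32_t
hash_interval(const char *s, const char *e)
{
	uint32_t h = 0;

	while (s < e)
		h = h * 31 + (unsigned char)*s++;
	return h;
}

/* Recognize the keyword that follows the leading '.' of a line. */
enum kw
recognize_keyword(const char *line)
{
	const char *s, *e;
	uint32_t hv;
	size_t i, len;

	assert(line != NULL);
	if (!keywords_ready) {
		for (i = 0; i < NKEYWORDS; i++)
			keywords[i].hv = hash_interval(keywords[i].name,
			    keywords[i].name + strlen(keywords[i].name));
		keywords_ready = 1;
	}
	if (*line != '.')
		return KW_NONE;
	s = line + 1;
	/* The keyword is the longest run of alphabetic characters. */
	for (e = s; isalpha((unsigned char)*e); e++)
		continue;
	hv = hash_interval(s, e);
	len = (size_t)(e - s);
	for (i = 0; i < NKEYWORDS; i++)
		if (keywords[i].hv == hv &&
		    strlen(keywords[i].name) == len &&
		    memcmp(keywords[i].name, s, len) == 0)
			return keywords[i].code;
	return KW_NONE;
}
```

Because the whole alphabetic run is taken as the keyword, a line such as .ifA is scanned as the unknown keyword ifA, which is why the blank becomes mandatory.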
|
|
- cut up those huge include files into separate interfaces for all modules.
Put the interface documentation there, and not with the implementation.
- light-weight includes for needed concrete types (lst_t.h, timestamp_t.h).
- cut out some more logically separate parts: cmd_exec, varname, parsevar,
timestamp.
- put all error handling functions together, so that we will be able to
clean them up.
- more systematic naming: functioni (with a trailing i) to handle an
interval, function to handle a string.
- put the init/end code apart to minimize coupling.
- kill weird types like ReturnStatus and Boolean. Use standard bool (with a
fallback for non-iso systems)
- better interface documentation for lots of subsystems.
As a result, make compilation goes somewhat faster (5%, even considering
the largish BSD copyrights to read). The corresponding preprocessed
source goes down from 1.5M to 1M.
A few minor code changes as well: Parse_DoVar is no longer destructive.
Parse_IsVar functionality is folded into Parse_DoVar (as it knows what an
assignment is), a few more interval handling functions. Avoid calling
XXX_End when they do nothing, just #define XXX_End to nothing.
Parse_DoVar is slightly more general: it will handle compound assignments
as long as they make sense, e.g., VAR +!= cmd
will work. As a side effect, VAR++=value now triggers an error
(two + in assignment).
- this stuff doesn't occur in portable Makefiles.
- writing VAR++ = value or VAR+ +=value disambiguates it.
- this is a good thing, it uncovered a bug in bsd.port.mk.
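One way to picture the compound assignment rule: walk backwards from the = sign, collect operator characters, and complain if one shows up twice. This is only a sketch under that assumption; parse_assign_op, op_bit and the OP_* flags are made-up names and this is not the actual Parse_DoVar code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values, one per assignment modifier. */
#define OP_APPEND	0x01	/* '+' */
#define OP_OPT		0x02	/* '?' */
#define OP_EXPAND	0x04	/* ':' */
#define OP_SHELL	0x08	/* '!' */
#define OP_ERROR	(-1)

/* Map a modifier character to its flag, or 0 if it is not a modifier. */
static int
op_bit(char c)
{
	switch (c) {
	case '+': return OP_APPEND;
	case '?': return OP_OPT;
	case ':': return OP_EXPAND;
	case '!': return OP_SHELL;
	default:  return 0;
	}
}

/*
 * Find the assignment operator in line.  Returns the set of modifier
 * flags (0 for a plain '='), or OP_ERROR if there is no '=' or if a
 * modifier is repeated.  On success, *namelen is the length of the
 * variable name at the start of line.
 */
int
parse_assign_op(const char *line, size_t *namelen)
{
	const char *eq, *p;
	int flags = 0, bit;

	assert(line != NULL && namelen != NULL);
	eq = strchr(line, '=');
	if (eq == NULL)
		return OP_ERROR;
	/* Collect the modifiers glued to the left of '='. */
	for (p = eq; p > line; p--) {
		bit = op_bit(p[-1]);
		if (bit == 0)
			break;
		if (flags & bit)
			return OP_ERROR;	/* e.g., two + in assignment */
		flags |= bit;
	}
	/* Blanks separate the name from the operator. */
	while (p > line && (p[-1] == ' ' || p[-1] == '\t'))
		p--;
	*namelen = (size_t)(p - line);
	return flags;
}
```

With this rule, a blank is enough to decide where the name stops: VAR++ = value assigns to VAR++, and VAR+ +=value appends to VAR+.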
Tested by naddy@. Okayed millert@. I'll handle the fallback if there is
any. This went through a full make build anyways, including isakmpd
(without mickey's custom binutils, as he didn't see fit to share it with me).
|
|
Numerous changes:
- generate can build several tables
- style cleanup
- statistics code
- use variable names throughout (struct Name)
- recursive variables everywhere
- faster parser (pass buffer along instead of allocating multiple copies)
- correct parser. Handles comments everywhere, and ; correctly
- more string intervals
- simplified dir.c, less recursion.
- extended for loops
- sinclude()
- finished removing extra junk from Lst_*
- handles ${@D} and friends in a simpler way
- cleaned up and modular VarModifiers handling.
- recognizes some gnu Makefile usages and errors out about them.
Additionally, some extra functionality is defined by FEATURES. The set of
functionalities is currently hardcoded to OpenBSD defaults, but this may
include support for some NetBSD extensions, like ODE modifiers.
Backed by miod@ and millert@, who finally got sick of my endless patches...
|
|
|
|
Use the open hashing functions for global contexts instead of List in
var.c.
All the preliminary work to trim down local contexts means that we don't
suffer from the heavy initialization work that a hash table entails.
There is some make kludgery to:
- build the hashing functions as a library,
- recreate hashconsts.h, even if make depend was not invoked.
One point of the hashing scheme as written is to separate the computation
of the hash value from the hash lookup itself. This is very convenient
for make, because of those pesky special variables. hashconsts.h is there
to pre-hash the correct values, which replaces a few expensive string
comparisons with quick hash value comparisons, followed by one expensive
string comparison. The modulus MAGICSLOTS chosen in the Makefile is
ad-hoc: it is small enough to write a small switch without collision,
and will need changing if the hash function changes...
The function quick_lookup is the most important:
it either returns the index of a local variable, or computes a
hashing value and returns -1.
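A sketch of how such a quick lookup could work, under loud assumptions: the hash function is a simple stand-in, the local names are limited to the one-character forms, MAGICSLOTS is set to 32 because that happens to be collision-free for these names with this hash, and a table filled at first use stands in for the constants that hashconsts.h provides at build time. Only the name quick_lookup comes from the text above; its signature here is a guess.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAGICSLOTS 32	/* assumed value: no collision for the names below */

/* One-character local variable names, for illustration. */
static const char *const local_names[] = { "@", "<", "*", "?", ">", "!", "%" };
#define NLOCALS (sizeof(local_names) / sizeof(local_names[0]))

static int magic_index[MAGICSLOTS];	/* index into local_names, or -1 */
static uint32_t magic_hash[MAGICSLOTS];	/* full hash of that name */
static int magic_ready;

/* Illustrative hash over [s, e); not the real hash function. */
static uint32_t
hash_interval(const char *s, const char *e)
{
	uint32_t h = 0;

	while (s < e)
		h = h * 31 + (unsigned char)*s++;
	return h;
}

/* Stand-in for the pre-hashed constants generated at build time. */
static void
magic_init(void)
{
	size_t i;
	uint32_t h;

	for (i = 0; i < MAGICSLOTS; i++)
		magic_index[i] = -1;
	for (i = 0; i < NLOCALS; i++) {
		h = hash_interval(local_names[i],
		    local_names[i] + strlen(local_names[i]));
		/* The modulus must be changed if this ever fires. */
		assert(magic_index[h % MAGICSLOTS] == -1);
		magic_index[h % MAGICSLOTS] = (int)i;
		magic_hash[h % MAGICSLOTS] = h;
	}
	magic_ready = 1;
}

/*
 * Return the index of a local variable, or -1 for any other name.
 * In this sketch *hashp always receives the hash of the name, so a
 * later table lookup does not have to compute it again.
 */
int
quick_lookup(const char *name, const char *ename, uint32_t *hashp)
{
	size_t len = (size_t)(ename - name);
	int idx;

	if (!magic_ready)
		magic_init();
	*hashp = hash_interval(name, ename);
	idx = magic_index[*hashp % MAGICSLOTS];
	/* Cheap hash comparison first, one string comparison last. */
	if (idx != -1 && magic_hash[*hashp % MAGICSLOTS] == *hashp &&
	    strlen(local_names[idx]) == len &&
	    memcmp(local_names[idx], name, len) == 0)
		return idx;
	return -1;
}
```

When the result is -1, the caller already holds the hash value and can go straight to the global table with it.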
Another somewhat controversial decision is the use of string intervals.
This avoids either copying a string, or twiddling with a byte for cases
such as ${VAR}.
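To illustrate what a string interval buys, here is a small sketch: the name inside ${...} is designated by a pair of pointers into the original string, so nothing gets copied and no byte is overwritten with a NUL. The struct and function names are hypothetical, and only the simplest ${NAME} form is handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Half-open interval [s, e) into an existing string. */
struct interval {
	const char *s;
	const char *e;
};

/*
 * Locate the name of the first ${NAME} reference in str.
 * Returns 1 and fills *out on success, 0 if there is none.
 */
int
find_varname(const char *str, struct interval *out)
{
	const char *p, *q;

	assert(str != NULL && out != NULL);
	p = strstr(str, "${");
	if (p == NULL)
		return 0;
	q = strchr(p + 2, '}');
	if (q == NULL)
		return 0;
	out->s = p + 2;
	out->e = q;
	return 1;
}

/* Compare an interval against a NUL-terminated name. */
int
interval_eq(struct interval iv, const char *name)
{
	size_t len = (size_t)(iv.e - iv.s);

	return strlen(name) == len && memcmp(iv.s, name, len) == 0;
}
```

The input string can stay const throughout, which is what makes a non-destructive parser possible.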
Finally, the variable name is stored within the variable itself. Since
a given variable name never changes, this makes sense. All that was needed
was a hash library with support for this. Note that the hashing table
holds only a variable pointer AND the corresponding full hashing value,
NOT reduced modulo the table size. Two reasons:
- hash resizes can be done faster, without having to recompute hashing values.
- locality of access. The hash table fits into memory without problem. Once
a candidate slot is found, we check the complete hashing value. Probability
of a collision is very small (32 bits...). So bringing up the whole
variable in memory at once is good: the name will almost always match, in
which case we want the variable value as well, so it makes sense to put
them together.
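The layout described above can be sketched as follows. This is an illustration and not the ohash code: the names are invented, the hash is a simple stand-in, collisions are resolved by linear probing for brevity, and the grow threshold of 3/4 is an arbitrary choice. What it does show is the name stored inside the variable, the slot holding a pointer plus the full hash value, and a resize that never looks at a name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct var {
	const char *value;
	char name[];		/* name stored within the variable itself */
};

struct slot {
	struct var *v;		/* NULL means empty */
	uint32_t hv;		/* full hash value, not reduced modulo size */
};

struct table {
	struct slot *t;
	uint32_t size;		/* always a power of 2 */
	uint32_t count;
};

uint32_t
hash_name(const char *s)
{
	uint32_t h = 0;

	while (*s != '\0')
		h = h * 31 + (unsigned char)*s++;
	return h;
}

struct var *
var_new(const char *name, const char *value)
{
	size_t len = strlen(name) + 1;
	struct var *v = malloc(sizeof(*v) + len);

	assert(v != NULL);
	memcpy(v->name, name, len);
	v->value = value;
	return v;
}

void
table_init(struct table *h, uint32_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	h->t = calloc(size, sizeof(struct slot));
	assert(h->t != NULL);
	h->size = size;
	h->count = 0;
}

/* Return the slot holding name, or the empty slot where it belongs. */
struct slot *
table_lookup(struct table *h, const char *name, uint32_t hv)
{
	uint32_t i = hv & (h->size - 1);

	while (h->t[i].v != NULL) {
		/* Compare full hash values first; names rarely differ after. */
		if (h->t[i].hv == hv && strcmp(h->t[i].v->name, name) == 0)
			break;
		i = (i + 1) & (h->size - 1);
	}
	return &h->t[i];
}

/* Double the table, reusing the stored hash values. */
void
table_grow(struct table *h)
{
	struct table n;
	uint32_t i, j;

	table_init(&n, h->size * 2);
	for (i = 0; i < h->size; i++) {
		if (h->t[i].v == NULL)
			continue;
		j = h->t[i].hv & (n.size - 1);
		while (n.t[j].v != NULL)
			j = (j + 1) & (n.size - 1);
		n.t[j] = h->t[i];
	}
	n.count = h->count;
	free(h->t);
	*h = n;
}

/* Insert v; an existing entry with the same name is simply overwritten. */
void
table_insert(struct table *h, struct var *v)
{
	uint32_t hv = hash_name(v->name);
	struct slot *s;

	if ((h->count + 1) * 4 > h->size * 3)
		table_grow(h);
	s = table_lookup(h, v->name, hv);
	if (s->v == NULL)
		h->count++;
	s->v = v;
	s->hv = hv;
}
```

Since the slot index is just the stored hash value masked with size - 1, growing the table only needs a new mask.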
The ohash functions implement open hashing, as described in Knuth, but with
a variable table size. Choosing powers of 2 sizes does not yield more
collisions, but it makes the hashing scheme much simpler. The thresholds at
which to expand/shrink the tables seem to work well in practice. The
default sizes were chosen such that the tables hardly ever shrink or expand
anyways (though I've tried with smaller/larger sizes to verify that the
shrinking/expanding worked correctly): larger Makefiles hold roughly
500 to 600 variables, which fit without trouble into a 1024-sized table.
Disregard #ifdef STATS_HASH, this is some internal scaffolding I'm using
to measure make performance.
The only known issue with open-hashing is that deletions cannot create
empty slots: they leave slots marked as `occupied once' so that lookups
still work. We use a well-known optimization which records those pseudo-empty
slots while looking up values. If the value is not found, the pseudo-empty
slot is returned to be filled. If the value is found, it is swapped with
the pseudo-empty slot. This is an improvement in both cases, since this
shortens the length of lookup chains, eventually pushing the pseudo-empty
slots to the end.
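The pseudo-empty slot optimization can be sketched like this. It is a toy version with assumptions stated up front: a fixed table of 8 slots, linear probing, plain strings as entries, a stand-in hash, and invented names (ot_lookup and friends). The lookup remembers the first pseudo-empty slot it walks over, hands it back when the key is absent, and moves the key into it when the key is found further down the chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TSIZE 8		/* power of 2, tiny on purpose */

/* Address of this object marks a slot as `occupied once'. */
static const char deleted_mark = 0;
#define DELETED (&deleted_mark)

struct otable {
	const char *slot[TSIZE];	/* NULL, DELETED, or a key */
};

static uint32_t
hash_str(const char *s)
{
	uint32_t h = 0;

	while (*s != '\0')
		h = h * 31 + (unsigned char)*s++;
	return h;
}

/*
 * Return the index of the slot holding key, or of the slot where key
 * should be stored if it is absent.
 */
unsigned
ot_lookup(struct otable *t, const char *key)
{
	unsigned i = hash_str(key) & (TSIZE - 1);
	unsigned n;
	int tomb = -1;		/* first pseudo-empty slot seen, if any */

	for (n = 0; n < TSIZE && t->slot[i] != NULL; n++) {
		if (t->slot[i] == DELETED) {
			if (tomb == -1)
				tomb = (int)i;
		} else if (strcmp(t->slot[i], key) == 0) {
			if (tomb == -1)
				return i;
			/* Found: swap with the earlier pseudo-empty slot. */
			t->slot[tomb] = t->slot[i];
			t->slot[i] = DELETED;
			return (unsigned)tomb;
		}
		i = (i + 1) & (TSIZE - 1);
	}
	/* Not found: reuse the pseudo-empty slot when there is one. */
	if (tomb != -1)
		return (unsigned)tomb;
	assert(t->slot[i] == NULL);	/* the table must never fill up */
	return i;
}

const char *
ot_find(struct otable *t, const char *key)
{
	const char *p = t->slot[ot_lookup(t, key)];

	return (p == NULL || p == DELETED) ? NULL : p;
}

void
ot_insert(struct otable *t, const char *key)
{
	t->slot[ot_lookup(t, key)] = key;
}

void
ot_remove(struct otable *t, const char *key)
{
	unsigned i = ot_lookup(t, key);

	/* A truly empty slot stays empty; anything else becomes a mark. */
	if (t->slot[i] != NULL)
		t->slot[i] = DELETED;
}
```

With this hash, the keys a, i and q all start probing at slot 1, so removing a and then looking up q moves q from slot 3 back to slot 1, and the mark ends up at the tail of the chain.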
Reviewed by millert@ and miod@
|