.\"	$OpenBSD: ss5,v 1.1 2002/12/03 21:36:35 mickey Exp $
.\"
.\" Copyright (C) Caldera International Inc.  2001-2002.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code and documentation must retain the above
.\"    copyright notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"	This product includes software developed or owned by Caldera
.\"	International, Inc.
.\" 4. Neither the name of Caldera International, Inc. nor the names of other
.\"    contributors may be used to endorse or promote products derived from
.\"    this software without specific prior written permission.
.\"
.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA
.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR
.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
.\" OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE FOR ANY DIRECT,
.\" INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
.\" (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
.\" SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
.\" STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
.\" IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\"	@(#)ss5	8.1 (Berkeley) 6/8/93
.\"
.SH
5: Ambiguity and Conflicts
.PP
A set of grammar rules is
.I ambiguous
if there is some input string that can be structured in two or more different ways.
For example, the grammar rule
.DS
expr	:	expr  \'\-\'  expr
.DE
is a natural way of expressing the fact that one way of forming an arithmetic expression
is to put two other expressions together with a minus sign between them.
Unfortunately, this grammar rule does not
completely specify the way that all complex inputs
should be structured.
For example, if the input is
.DS
expr  \-  expr  \-  expr
.DE
the rule allows this input to be structured as either
.DS
(  expr  \-  expr  )  \-  expr
.DE
or as
.DS
expr  \-  (  expr  \-  expr  )
.DE
(The first is called
.I "left association" ,
the second
.I "right association" ).
.PP
Yacc detects such ambiguities when it is attempting to build the parser.
It is instructive to consider the problem that confronts the parser when it is
given an input such as
.DS
expr  \-  expr  \-  expr
.DE
When the parser has read the second expr, the input that it has seen:
.DS
expr  \-  expr
.DE
matches the right side of the grammar rule above.
The parser could
.I reduce
the input by applying this rule;
after applying the rule,
the input is reduced to
.I expr
(the left side of the rule).
The parser would then read the final part of the input:
.DS
\-  expr
.DE
and again reduce.
The effect of this is to take the left associative interpretation.
.PP
Alternatively, when the parser has seen
.DS
expr  \-  expr
.DE
it could defer the immediate application of the rule, and continue reading
the input until it had seen
.DS
expr  \-  expr  \-  expr
.DE
It could then apply the rule to the rightmost three symbols, reducing them to
.I expr
and leaving
.DS
expr  \-  expr
.DE
Now the rule can be applied once more; the effect is to
take the right associative interpretation.
Thus, having read
.DS
expr  \-  expr
.DE
the parser can do two legal things, a shift or a reduction, and has no way of
deciding between them.
This is called a
.I "shift / reduce conflict" .
It may also happen that the parser has a choice of two legal reductions;
this is called a
.I "reduce / reduce conflict" .
Note that there are never any ``shift/shift'' conflicts.
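.PP
As a minimal sketch, the rule above can be placed in a complete set of grammar rules;
the token
.I NUMBER
and the second alternative are assumptions added here so that the grammar can derive something.
Given these rules, Yacc would be expected to find a single shift/reduce conflict,
arising in the state reached after
.I "expr \- expr"
when the next input symbol is a minus sign.
.DS
/* NUMBER is a hypothetical token, not part of the rule in the text */
%token  NUMBER
%%
expr	:	expr  \'\-\'  expr
	|	NUMBER
	;
.DE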
.PP
When there are shift/reduce or reduce/reduce conflicts, Yacc still produces a parser.
It does this by selecting one of the valid steps wherever it has a choice.
A rule describing which choice to make in a given situation is called
a
.I "disambiguating rule" .
.PP
Yacc invokes two disambiguating rules by default:
.IP 1.
In a shift/reduce conflict, the default is to do the shift.
.IP 2.
In a reduce/reduce conflict, the default is to reduce by the
.I earlier
grammar rule (in the input sequence).
.PP
Rule 1 implies that reductions are deferred whenever there is a choice,
in favor of shifts.
Rule 2 gives the user rather crude control over the behavior of the parser
in this situation, but reduce/reduce conflicts should be avoided whenever possible.
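.PP
A reduce/reduce conflict can be provoked with a sketch such as the following,
in which the nonterminals
.I a
and
.I b
and the token
.I ID
are hypothetical names chosen for illustration.
After reading
.I ID ,
the parser could reduce by either of the last two rules;
Rule 2 selects the rule for
.I a ,
since it appears earlier, so the rule for
.I b
is never used.
.DS
/* a, b and ID are hypothetical names for illustration */
%token  ID
%%
stat	:	a
	|	b
	;
a	:	ID
	;
b	:	ID
	;
.DE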
.PP
Conflicts may arise because of mistakes in input or logic, or because the grammar rules, while consistent,
require a more complex parser than Yacc can construct.
The use of actions within rules can also cause conflicts, if the action must
be done before the parser can be sure which rule is being recognized.
In these cases, the application of disambiguating rules is inappropriate,
and leads to an incorrect parser.
For this reason, Yacc
always reports the number of shift/reduce and reduce/reduce conflicts resolved by Rule 1 and Rule 2.
.PP
In general, whenever it is possible to apply disambiguating rules to produce a correct parser, it is also
possible to rewrite the grammar rules so that the same inputs are read but there are no
conflicts.
For this reason, most previous parser generators
have considered conflicts to be fatal errors.
Our experience has suggested that this rewriting is somewhat unnatural,
and produces slower parsers; thus, Yacc will produce parsers even in the presence of conflicts.
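.PP
For instance, the ambiguous subtraction rule might be rewritten as sketched below,
where
.I term
and
.I NUMBER
are hypothetical names not taken from the text.
Because the right operand of the minus sign can no longer itself contain a minus sign,
only the left associative structure remains, and no conflict should be reported.
.DS
/* term and NUMBER are hypothetical names for illustration */
%token  NUMBER
%%
expr	:	expr  \'\-\'  term
	|	term
	;
term	:	NUMBER
	;
.DE
Writing the first rule as
.I "term \- expr"
instead would select right association,
which is the structure that Rule 1 produces for the original rule.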
.PP
As an example of the power of disambiguating rules, consider a fragment from a programming
language involving an ``if-then-else'' construction:
.DS
stat	:	IF  \'(\'  cond  \')\'  stat
	|	IF  \'(\'  cond  \')\'  stat  ELSE  stat
	;
.DE
In these rules,
.I IF
and
.I ELSE
are tokens,
.I cond
is a nonterminal symbol describing
conditional (logical) expressions, and
.I stat
is a nonterminal symbol describing statements.
The first rule will be called the
.ul
simple-if
rule, and the
second the
.ul
if-else
rule.
.PP
These two rules form an ambiguous construction, since input of the form
.DS
IF  (  C1  )  IF  (  C2  )  S1  ELSE  S2
.DE
can be structured according to these rules in two ways:
.DS
IF  (  C1  )  {
	IF  (  C2  )  S1
	}
ELSE  S2
.DE
or
.DS
IF  (  C1  )  {
	IF  (  C2  )  S1
	ELSE  S2
	}
.DE
The second interpretation is the one given in most programming languages
having this construct.
Each
.I ELSE
is associated with the last preceding
``un-\fIELSE\fR'd''
.I IF .
In this example, consider the situation where the parser has seen
.DS
IF  (  C1  )  IF  (  C2  )  S1
.DE
and is looking at the
.I ELSE .
It can immediately
reduce
by the simple-if rule to get
.DS
IF  (  C1  )  stat
.DE
and then read the remaining input,
.DS
ELSE  S2
.DE
and reduce
.DS
IF  (  C1  )  stat  ELSE  S2
.DE
by the if-else rule.
This leads to the first of the above groupings of the input.
.PP
On the other hand, the
.I ELSE
may be shifted,
.I S2
read, and then the right hand portion of
.DS
IF  (  C1  )  IF  (  C2  )  S1  ELSE  S2
.DE
can be reduced by the if-else rule to get
.DS
IF  (  C1  )  stat
.DE
which can be reduced by the simple-if rule.
This leads to the second of the above groupings of the input, which
is usually desired.
.PP
Once again the parser can do two valid things \- there is a shift/reduce conflict.
The application of disambiguating rule 1 tells the parser to shift in this case,
which leads to the desired grouping.
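.PP
The same grouping can be obtained without relying on Rule 1,
at the cost of a less natural grammar.
The fragment below is a sketch of the usual textbook rewriting;
the nonterminals
.I matched ,
.I unmatched ,
and
.I other
(standing for every statement that is not an ``if'') are hypothetical names,
and
.I cond
is left undefined, as in the rules above.
.DS
/* matched, unmatched and other are hypothetical names */
stat	:	matched
	|	unmatched
	;
matched	:	IF  \'(\'  cond  \')\'  matched  ELSE  matched
	|	other
	;
unmatched	:	IF  \'(\'  cond  \')\'  stat
	|	IF  \'(\'  cond  \')\'  matched  ELSE  unmatched
	;
.DE
Only a
.I matched
statement may appear between the condition and an
.I ELSE ,
so each
.I ELSE
can pair only with the nearest unpaired
.I IF .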
.PP
This shift/reduce conflict arises only when there is a particular current input symbol,
.I ELSE ,
and particular inputs already seen, such as
.DS
IF  (  C1  )  IF  (  C2  )  S1
.DE
In general, there may be many conflicts, and each one
will be associated with an input symbol and
a set of previously read inputs.
The previously read inputs are characterized by the
state
of the parser.
.PP
The conflict messages of Yacc are best understood
by examining the verbose (\fB\-v\fR) option output file.
For example, the output corresponding to the above
conflict state might be:
.DS L
23: shift/reduce conflict (shift 45, reduce 18) on ELSE

state 23

	  stat  :  IF  (  cond  )  stat\_         (18)
	  stat  :  IF  (  cond  )  stat\_ELSE  stat

	 ELSE     shift 45
	 .        reduce 18

.DE
The first line describes the conflict, giving the state and the input symbol.
The ordinary state description follows, giving
the grammar rules active in the state, and the parser actions.
Recall that the underline marks the
portion of the grammar rules which has been seen.
Thus in the example, in state 23 the parser has seen input corresponding
to
.DS
IF  (  cond  )  stat
.DE
and the two grammar rules shown are active at this time.
The parser can do two possible things.
If the input symbol is
.I ELSE ,
it is possible to shift into state
45.
State 45 will have, as part of its description, the line
.DS
stat  :  IF  (  cond  )  stat  ELSE\_stat
.DE
since the
.I ELSE
will have been shifted in this state.
Back in state 23, the alternative action, described by ``\fB.\fR'',
is to be done if the input symbol is not mentioned explicitly in the above actions; thus,
in this case, if the input symbol is not
.I ELSE ,
the parser reduces by grammar rule 18:
.DS
stat  :  IF  \'(\'  cond  \')\'  stat
.DE
Once again, notice that the numbers following ``shift'' commands refer to other states,
while the numbers following ``reduce'' commands refer to grammar
rule numbers.
In the
.I y.output
file, the rule numbers are printed after those rules which can be reduced.
In most states, there will be at most one reduce action possible in the
state, and this will be the default command.
The user who encounters unexpected shift/reduce conflicts will probably want to
look at the verbose output to decide whether the default actions are appropriate.
In really tough cases, the user might need to know more about
the behavior and construction of the parser than can be covered here.
In this case, one of the theoretical references
.\".[
.\"Aho Johnson Surveys Parsing
.\".]
.\".[
.\"Aho Johnson Ullman Deterministic Ambiguous
.\".]
.\".[
.\"Aho Ullman Principles Design
.\".]
might be consulted; the services of a local guru might also be appropriate.