@c Copyright (C) 1999 Free Software Foundation, Inc.
@c This is part of the G77 manual.
@c For copying conditions, see the file g77.texi.
@node Front End
@chapter Front End
@cindex GNU Fortran Front End (FFE)
@cindex FFE
@cindex @code{g77}, front end
@cindex front end, @code{g77}
This chapter describes some aspects of the design and implementation
of the @code{g77} front end.
Much of the information below applies not to current
releases of @code{g77},
but to the 0.6 rewrite being designed and implemented
as of late May, 1999.
To find out about things that are ``To Be Determined'' or ``To Be Done'',
search for the string TBD.
If you want to help by working on one or more of these items,
email me at @email{@value{email-burley}}.
If you're planning to do more than just research issues and offer comments,
see @uref{http://www.gnu.org/software/contribute.html} for steps you might
need to take first.
@menu
* Overview of Sources::
* Overview of Translation Process::
* Philosophy of Code Generation::
* Two-pass Design::
* Challenges Posed::
* Transforming Statements::
* Transforming Expressions::
* Internal Naming Conventions::
@end menu
@node Overview of Sources
@section Overview of Sources
The current directory layout includes the following:
@table @file
@item @value{srcdir}/gcc/
Non-@code{g77} files in @code{gcc}
@item @value{srcdir}/gcc/f/
GNU Fortran front end sources
@item @value{srcdir}/libf2c/
@code{libg2c} configuration and @code{g2c.h} file generation
@item @value{srcdir}/libf2c/libF77/
General support and math portion of @code{libg2c}
@item @value{srcdir}/libf2c/libI77/
I/O portion of @code{libg2c}
@item @value{srcdir}/libf2c/libU77/
Additional interfaces to Unix @code{libc} for @code{libg2c}
@end table
Components of note in @code{g77} are described below.
@file{f/} as a whole contains the source for @code{g77},
while @file{libf2c/} contains a portion of the separate program
@code{f2c}.
Note that the @code{libf2c} code is not part of the program @code{g77},
just distributed with it.
@file{f/} contains text files that document the Fortran compiler, source
files for the GNU Fortran Front End (FFE), and some other stuff.
The @code{g77} compiler code is placed in @file{f/} because that directory,
along with its contents,
is designed to be a subdirectory of a @code{gcc} source directory,
@file{gcc/},
which is structured so that language-specific front ends can be ``dropped
in'' as subdirectories.
The C++ front end (@code{g++}) is an example of this;
it resides in the @file{cp/} subdirectory.
Note that the C front end (also referred to as @code{gcc})
is an exception to this, as its source files reside
in the @file{gcc/} directory itself.
@file{libf2c/} contains the run-time libraries for the @code{f2c} program,
also used by @code{g77}.
These libraries are normally referred to collectively as @code{libf2c}.
When built as part of @code{g77},
@code{libf2c} is installed under the name @code{libg2c} to avoid
conflict with any existing version of @code{libf2c},
and thus is often referred to as @code{libg2c} when the
@code{g77} version is specifically being referred to.
The @code{netlib} version of @code{libf2c/}
contains two distinct libraries,
@code{libF77} and @code{libI77},
each in its own subdirectory.
In @code{g77}, this distinction is not made,
beyond maintaining the subdirectory structure in the source-code tree.
@file{libf2c/} is not part of the program @code{g77},
just distributed with it.
It contains files not present
in the official (@code{netlib}) version of @code{libf2c},
and also contains some minor changes made to @code{libf2c},
to fix some bugs,
and to facilitate automatic configuration, building, and installation of
@code{libf2c} (as @code{libg2c}) for use by @code{g77} users.
See @file{libf2c/README} for more information,
including licensing conditions
governing distribution of programs containing code from @code{libg2c}.
@code{libg2c}, @code{g77}'s version of @code{libf2c},
adds Dave Love's implementation of @code{libU77},
in the @file{libf2c/libU77/} directory.
This library is distributed under the
GNU Library General Public License (LGPL)---see the
file @file{libf2c/libU77/COPYING.LIB}
for more information,
as this license
governs distribution conditions for programs containing code
from this portion of the library.
Files of note in @file{f/} and @file{libf2c/} are described below:
@table @file
@item f/BUGS
Lists some important bugs known to be in @code{g77}.
Or use Info (or GNU Emacs Info mode) to read
the ``Actual Bugs'' node of the @code{g77} documentation:
@smallexample
info -f f/g77.info -n "Actual Bugs"
@end smallexample
@item f/ChangeLog
Lists recent changes to @code{g77} internals.
@item libf2c/ChangeLog
Lists recent changes to @code{libg2c} internals.
@item f/NEWS
Contains the per-release changes.
These include the user-visible
changes described in the node ``Changes''
in the @code{g77} documentation, plus internal
changes of import.
Or use:
@smallexample
info -f f/g77.info -n News
@end smallexample
@item f/g77.info*
The @code{g77} documentation, in Info format,
produced by building @code{g77}.
All users of @code{g77} (not just installers) should read this,
using the @code{more} command if neither the @code{info} command
nor GNU Emacs (with its Info mode) is available, or if users
aren't yet accustomed to using these tools.
All of these files are readable as ``plain text'' files,
though they're easier to navigate using Info readers
such as @code{info} and GNU Emacs Info mode.
@end table
If you want to explore the FFE code, which lives entirely in @file{f/},
here are a few clues.
The file @file{g77spec.c} contains the @code{g77}-specific source code
for the @code{g77} command only---this just forms a variant of the
@code{gcc} command, so,
just as the @code{gcc} command itself does not contain the C front end,
the @code{g77} command does not contain the Fortran front end (FFE).
The FFE code ends up in an executable named @file{f771},
which does the actual compiling,
so it contains the FFE plus the @code{gcc} back end (GBE),
the latter to do most of the optimization, and the code generation.
The file @file{parse.c} is the source file for @code{yyparse()},
which the GBE invokes to start the compilation process
in @file{f771}.
The file @file{top.c} contains the top-level FFE function @code{ffe_file}
and, along with @file{top.h}, defines all @samp{ffe_[a-z].*}, @samp{ffe[A-Z].*},
and @samp{FFE_[A-Za-z].*} symbols.
The file @file{fini.c} is a @code{main()} program that is used when building
the FFE to generate C header and source files for recognizing keywords.
The files @file{malloc.c} and @file{malloc.h} comprise a memory manager
that defines all @samp{malloc_[a-z].*}, @samp{malloc[A-Z].*}, and
@samp{MALLOC_[A-Za-z].*} symbols.
All other modules named @var{xyz}
consist of all files named @samp{@var{xyz}*.@var{ext}}
and define all @samp{ffe@var{xyz}_[a-z].*}, @samp{ffe@var{xyz}[A-Z].*},
and @samp{FFE@var{XYZ}_[A-Za-z].*} symbols.
If you understand all this, congratulations---it's easier for me to remember
how it works than to type in these regular expressions.
But it does make it easy to find where a symbol is defined.
For example, the symbol @samp{ffexyz_set_something} would be defined
in @file{xyz.h} and implemented there (if it's a macro) or in @file{xyz.c}.
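Because the naming convention is purely mechanical, it can be expressed in a few lines of code.
The C sketch below is a hypothetical helper, not part of @code{g77}; the name @code{find_symbol_module} is invented for illustration.
It applies the patterns listed above to report which module (@samp{top}, @samp{malloc}, or @var{xyz}) should define a given symbol, assuming module names consist only of lowercase letters.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of g77: writes into BUF the name of the
   module that, by the naming convention, defines SYM ("top" for top.c,
   "malloc" for malloc.c, otherwise the xyz of ffexyz_...).  Returns BUF,
   or NULL if SYM fits none of the patterns.  Assumes module names are
   lowercase letters only.  */
char *
find_symbol_module (const char *sym, char *buf, size_t bufsz)
{
  const char *p;
  size_t n = 0;

  if (bufsz < sizeof "malloc")
    return NULL;

  /* malloc_[a-z].*, malloc[A-Z].*, MALLOC_[A-Za-z].*  */
  if ((strncmp (sym, "malloc_", 7) == 0 && islower ((unsigned char) sym[7]))
      || (strncmp (sym, "malloc", 6) == 0 && isupper ((unsigned char) sym[6]))
      || (strncmp (sym, "MALLOC_", 7) == 0 && isalpha ((unsigned char) sym[7])))
    {
      strcpy (buf, "malloc");
      return buf;
    }

  if (strncmp (sym, "ffe", 3) == 0)
    {
      /* ffe<xyz>_[a-z].* or ffe<xyz>[A-Z].*: collect the lowercase run.  */
      for (p = sym + 3; islower ((unsigned char) *p); p++)
        {
          if (n + 1 >= bufsz)
            return NULL;
          buf[n++] = *p;
        }
      if (!((*p == '_' && islower ((unsigned char) p[1]))
            || isupper ((unsigned char) *p)))
        return NULL;
    }
  else if (strncmp (sym, "FFE", 3) == 0)
    {
      /* FFE<XYZ>_[A-Za-z].*: collect the uppercase run, lowercased.  */
      for (p = sym + 3; isupper ((unsigned char) *p); p++)
        {
          if (n + 1 >= bufsz)
            return NULL;
          buf[n++] = (char) tolower ((unsigned char) *p);
        }
      if (!(*p == '_' && isalpha ((unsigned char) p[1])))
        return NULL;
    }
  else
    return NULL;

  if (n == 0)   /* ffe_..., ffe[A-Z]..., FFE_...: defined by top.c/top.h */
    strcpy (buf, "top");
  else
    buf[n] = '\0';
  return buf;
}
```

Under these assumptions, passing @samp{ffexyz_set_something} yields @samp{xyz}, pointing at @file{xyz.h} or @file{xyz.c}.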
The ``porting'' files of note currently are:
@table @file
@item proj.c
@itemx proj.h
This defines the ``language'' used by all the other source files,
the language being Standard C plus some useful things
like @code{ARRAY_SIZE} and such.
@item target.c
@itemx target.h
These describe the target machine
in terms of what data types are supported,
how they are denoted
(to what C type does an @code{INTEGER*8} map, for example),
how to convert between them,
and so on.
Over time, versions of @code{g77} rely less on this file
and more on run-time configuration based on GBE info
in @file{com.c}.
@item com.c
@itemx com.h
These are the primary interface to the GBE.
@item ste.c
@itemx ste.h
These contain code for implementing recognized executable statements
in the GBE.
@item src.c
@itemx src.h
These contain information on the format(s) of source files
(such as whether they are never to be processed as case-insensitive
with regard to Fortran keywords).
@end table
If you want to debug the @file{f771} executable,
for example if it crashes,
note that the global variables @code{lineno} and @code{input_filename}
are usually set to reflect the current line being read by the lexer
during the first-pass analysis of a program unit and to reflect
the current line being processed during the second-pass compilation
of a program unit.
If an invocation of the function @code{ffestd_exec_end} is on the stack,
the compiler is in the second pass, otherwise it is in the first.
(This information might help you reduce a test case and/or work around
a bug in @code{g77} until a fix is available.)
@node Overview of Translation Process
@section Overview of Translation Process
The order of phases translating source code to the form accepted
by the GBE is:
@enumerate
@item
Stripping punched-card sources (@file{g77stripcard.c})
@item
Lexing (@file{lex.c})
@item
Stand-alone statement identification (@file{sta.c})
@item
Parsing (@file{stb.c} and @file{expr.c})
@item
Constructing (@file{stc.c})
@item
Collecting (@file{std.c})
@item
Expanding (@file{ste.c})
@end enumerate
To get a rough idea of how a particularly twisted Fortran statement
gets treated by the passes, consider:
@smallexample
FORMAT(I2 4H)=(J/
& I3)
@end smallexample
The job of @file{lex.c} is to know enough about Fortran syntax rules
to break the statement up into distinct lexemes without requiring
any feedback from subsequent phases:
@smallexample
`FORMAT'
`('
`I24H'
`)'
`='
`('
`J'
`/'
`I3'
`)'
@end smallexample
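To make the lexer's behavior on this example concrete, here is a deliberately simplified C sketch of fixed-form lexing; it is not the code in @file{lex.c}, and @code{lex_fixed_form} is an invented name.
It assumes continuation lines have already been joined and ignores character constants, Hollerith constants, and multi-character operators such as @samp{**}.
Under those assumptions, blanks are skipped, each run of letters and digits becomes one lexeme, and every other character is a lexeme by itself, which reproduces the list above.

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_LEXEME 64   /* longest lexeme kept; longer ones are truncated */

/* Simplified fixed-form lexing sketch (not g77's lex.c).  STMT is the
   statement text with continuation lines already joined.  Blanks are
   skipped, each run of letters and digits forms one lexeme, and every
   other character is a one-character lexeme.  Stores at most MAX lexemes
   in OUT and returns how many were stored.  */
size_t
lex_fixed_form (const char *stmt, char out[][MAX_LEXEME], size_t max)
{
  size_t count = 0;   /* lexemes completed so far */
  size_t len = 0;     /* characters in the name/number being built */

  for (; *stmt != '\0'; stmt++)
    {
      unsigned char c = (unsigned char) *stmt;

      if (c == ' ')
        continue;                 /* blanks do not end a lexeme */
      if (isalnum (c))
        {
          if (count == max)
            break;
          if (len + 1 < MAX_LEXEME)
            out[count][len++] = (char) c;
          continue;
        }
      if (len > 0)                /* punctuation ends a name/number */
        {
          out[count++][len] = '\0';
          len = 0;
        }
      if (count == max)
        break;
      out[count][0] = (char) c;   /* the punctuation is its own lexeme */
      out[count][1] = '\0';
      count++;
    }
  if (len > 0)                    /* statement ended inside a name/number */
    out[count++][len] = '\0';
  return count;
}
```

Given the joined text @samp{FORMAT(I2 4H)=(J/I3)}, this sketch stores the ten lexemes shown above, including @samp{I24H} as a single lexeme.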
The job of @file{sta.c} is to figure out the kind of statement,
or, at least, statement form, that sequence of lexemes represents.
The sooner it can do this (in terms of using the smallest number of
lexemes, starting with the first for each statement), the better,
because that leaves diagnostics for problems beyond the recognition
of the statement form to subsequent phases,
which can usually better describe the nature of the problem.
In this case, the @samp{=} at ``level zero''
(not nested within parentheses)
tells @file{sta.c} that this is an @emph{assignment-form},
not @code{FORMAT}, statement.
An assignment-form statement might be a statement-function
definition or an executable assignment statement.
To make that determination,
@file{sta.c} looks at the first two lexemes.
Since the second lexeme is @samp{(},
the first must represent an array for this to be an assignment statement,
else it's a statement function.
Either way, @file{sta.c} hands off the statement to @file{stb.c}
(either its statement-function parser or its assignment-statement parser).
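The level-zero test and the look at the second lexeme can be sketched in C as follows.
This is an illustrative simplification, not @file{sta.c} itself; the enumeration and function names are invented.
It also assumes one rule the text above does not spell out: a comma at level zero (as in @samp{DO10I=1,10}) means the statement is not assignment-form.
Logical @code{IF}, substrings, and the other forms real statement identification must handle are not covered.

```c
#include <stdbool.h>
#include <stddef.h>

/* Statement forms distinguished by this sketch (names are invented).  */
enum stmt_form
{
  FORM_OTHER,               /* not assignment-form, e.g. FORMAT(...) */
  FORM_ASSIGNMENT,          /* NAME = expr */
  FORM_ARRAY_OR_STMTFUNC    /* NAME(...) = expr: array-element assignment
                               if NAME is an array, else statement function */
};

/* True if lexeme S is exactly the single character C.  */
static bool
is_char (const char *s, char c)
{
  return s[0] == c && s[1] == '\0';
}

/* Classifies one statement given its N lexemes, looking only at
   lexemes outside parentheses ("level zero") and at the second lexeme.  */
enum stmt_form
classify_statement (const char *lex[], size_t n)
{
  int depth = 0;
  bool equals = false, comma = false;

  for (size_t i = 0; i < n; i++)
    {
      if (is_char (lex[i], '('))
        depth++;
      else if (is_char (lex[i], ')'))
        depth--;
      else if (depth == 0 && is_char (lex[i], '='))
        equals = true;
      else if (depth == 0 && is_char (lex[i], ','))
        comma = true;            /* assumed rule: DO10I=1,10 is a DO */
    }

  if (!equals || comma)
    return FORM_OTHER;
  if (n >= 2 && is_char (lex[1], '('))
    return FORM_ARRAY_OR_STMTFUNC;  /* needs symbol info to decide */
  return FORM_ASSIGNMENT;
}
```

For the lexemes of the @samp{FORMAT(I2 4H)=(J/I3)} example, this returns @code{FORM_ARRAY_OR_STMTFUNC}; only information about whether @samp{FORMAT} names an array can settle the rest.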
@file{stb.c} forms a
statement-specific record containing the pertinent information.
That information includes a source expression and,
for an assignment statement, a destination expression.
Expressions are parsed by @file{expr.c}.
This record is passed to @file{stc.c},
which copes with the implications of the statement
within the context established by previous statements.
For example, if it's the first statement in the file
or after an @code{END} statement,
@file{stc.c} recognizes that, first of all,
a main program unit is now being lexed
(and tells that to @file{std.c}
before telling it about the current statement).
@file{stc.c} attaches whatever information it can,
usually derived from the context established by the preceding statements,
and passes the information to @file{std.c}.
@file{std.c} saves this information away,
since the GBE cannot cope with information
that might be incomplete at this stage.
For example, @samp{I3} might later be determined
to be an argument to an alternate @code{ENTRY} point.
When @file{std.c} is told about the end of an external (top-level)
program unit,
it passes all the information it has saved away
on statements in that program unit
to @file{ste.c}.
@file{ste.c} ``expands'' each statement, in sequence, by
constructing the appropriate GBE information and calling
the appropriate GBE routines.
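The division of labor between @file{std.c} and @file{ste.c} amounts to a queue that is filled while a program unit is read and drained, in source order, once the end of that unit is seen.
The C sketch below shows only that shape.
The structure, the function names, and the use of statement text as a stand-in for the real statement-specific records are all hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_LOG 16

/* Hypothetical stand-in for a statement-specific record.  */
struct saved_stmt
{
  const char *text;
  struct saved_stmt *next;
};

static struct saved_stmt *head;            /* first saved statement */
static struct saved_stmt **tail = &head;   /* where to link the next one */

/* Order in which statements reached the "back end" (for inspection).  */
const char *expanded[MAX_LOG];
size_t n_expanded;

/* Stand-in for ste.c: a real expander would call GBE routines here.  */
static void
expand_stmt (const char *text)
{
  if (n_expanded < MAX_LOG)
    expanded[n_expanded++] = text;
}

/* Stand-in for std.c: save the statement without expanding it, since
   later statements may still change what it means.  Returns 0, or -1
   if memory runs out.  */
int
collect_stmt (const char *text)
{
  struct saved_stmt *s = malloc (sizeof *s);

  if (s == NULL)
    return -1;
  s->text = text;
  s->next = NULL;
  *tail = s;
  tail = &s->next;
  return 0;
}

/* End of an external program unit: expand every saved statement in
   source order, free the queue, and return how many were expanded.  */
size_t
end_program_unit (void)
{
  size_t count = 0;
  struct saved_stmt *s = head;

  while (s != NULL)
    {
      struct saved_stmt *next = s->next;

      expand_stmt (s->text);
      free (s);
      s = next;
      count++;
    }
  head = NULL;
  tail = &head;
  return count;
}
```

Nothing reaches @code{expand_stmt} until @code{end_program_unit} runs, so every statement in the unit has been seen before any of them is handed to the back end.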
Details on the transformational phases follow.
Keep in mind that Fortran numbering is used,
so the first character on a line is column 1,
decimal numbering is used, and so on.
@menu
* g77stripcard::
* lex.c::
* sta.c::
* stb.c::
* expr.c::
* stc.c::
* std.c::
* ste.c::
* Gotchas (Transforming)::
* TBD (Transforming)::
@end menu
@node g77stripcard
@subsection g77stripcard
The @code{g77stripcard} program handles removing content beyond
column 72 (adjustable via a command-line option),
optionally warning about that content being something other
than trailing whitespace or Fortran commentary.
This program is needed because @file{lex.c} deliberately ignores
maximum line lengths, which makes the lexer easier to maintain
as well as faster (for sources that don't depend on the maximum
column length vis-a-vis trailing non-blank non-commentary content).
Just how this program will be run---whether automatically for
old source (perhaps as the default for @file{.f} files?)---is not
yet determined.
In the meantime, it might as well be implemented as a typical UNIX pipe.
It should accept a @samp{-fline-length-@var{n}} option,
with the default line length set to 72.
When the text it strips off the end of a line is not blank
(not spaces and tabs),
it should insert an additional comment line
(beginning with @samp{!},
so it works for both fixed-form and free-form files)
containing the text,
following the stripped line.
The inserted comment should have a prefix of some kind,
TBD, that distinguishes the comment as representing stripped text.
Users could use that to @code{sed} out such lines, if they wished---it
seems silly to provide a command-line option to delete information
when it can be so easily filtered out by another program.
(This inserted comment should be designed to ``fit in'' well
with whatever the Fortran community is using these days for
preprocessor, translator, and other such products, like OpenMP.
What that's all about, and how @code{g77} can elegantly fit its
special comment conventions into it all, is TBD as well.
We don't want to reinvent the wheel here, but if there turn out
to be too many conflicting conventions, we might have to invent
one that looks nothing like the others, but which offers their
host products a better infrastructure in which to fit and coexist
peacefully.)
@code{g77stripcard} probably shouldn't do any tab expansion or other
fancy stuff.
People can use @code{expand} or other pre-filtering if they like.
The idea here is to keep each stage quite simple, while providing
excellent performance for ``normal'' code.
(Code with junk beyond column 72 is not really ``normal'',
as it comes from a card-punch heritage,
and will be increasingly hard for tomorrow's Fortran programmers to read.)
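The per-line behavior proposed for @code{g77stripcard} could look like the following C sketch.
It illustrates the proposal and is not an existing implementation.
The marker @code{STRIP_PREFIX} is a placeholder for the TBD prefix, option parsing for @samp{-fline-length-@var{n}} is omitted, and, in line with the suggestion that the program not expand tabs, columns are counted as characters.

```c
#include <string.h>

/* Placeholder marker; the actual prefix is TBD in the design above.
   It starts with '!' so the line is commentary in either source form.  */
#define STRIP_PREFIX "!g77stripcard: "

/* Sketch of g77stripcard's per-line job.  LINE excludes the newline.
   Columns 1..MAXCOL are copied to KEPT.  If the text past MAXCOL holds
   anything besides spaces and tabs, COMMENT receives STRIP_PREFIX
   followed by that text; otherwise COMMENT is "".  KEPT needs room for
   MAXCOL + 1 bytes, COMMENT for strlen (LINE) + sizeof STRIP_PREFIX.
   Columns are counted as characters: tabs are not expanded.  */
void
stripcard_line (const char *line, size_t maxcol, char *kept, char *comment)
{
  size_t len = strlen (line);
  size_t keep = len < maxcol ? len : maxcol;
  const char *rest = line + keep;   /* text beyond column MAXCOL */

  memcpy (kept, line, keep);
  kept[keep] = '\0';

  comment[0] = '\0';
  if (rest[strspn (rest, " \t")] != '\0')   /* non-blank text stripped */
    {
      strcpy (comment, STRIP_PREFIX);
      strcat (comment, rest);
    }
}
```

A pipe filter built on this would read each line from standard input, write the kept text, and then write the comment line whenever it is non-empty.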
@node lex.c
@subsection lex.c
To help make the lexer simple, fast, and easy to maintain,
while also having @code{g77} generally encourage Fortran programmers
to write simple, maintainable, portable code by maximizing the
performance of compiling that kind of code:
@itemize @bullet
@item
There'll be just one lexer, for both fixed-form and free-form source.
@item
It'll care about the form only when handling the first 7 columns of
text, stuff like spaces between strings of alphanumerics, and
how lines are continued.
Some other distinctions will be handled by subsequent phases,
so at least one of them will have to know which form is involved.
For example, @samp{I = 2 . 4} is acceptable in fixed form,
and works in free form as well given the implementation @code{g77}
presently uses.
But the standard requires a diagnostic for it in free form,
so the parser has to be able to recognize that
the lexemes aren't contiguous
(information the lexer @emph{does} have to provide)
and that free-form source is being parsed,
so it can provide the diagnostic.
The @code{g77} lexer doesn't try to gather @samp{2 . 4} into a single lexeme.
Otherwise, it'd have to know a whole lot more about how to parse Fortran,
or subsequent phases (mainly parsing) would have two paths through
lots of critical code: one to handle the lexemes @samp{2}, @samp{.},
and @samp{4} in sequence, another to handle the lexeme @samp{2.4}.
@item
It won't worry about line lengths
(beyond the first 7 columns for fixed-form source).
That is, once it starts parsing the ``statement'' part of a line
(column 7 for fixed-form, column 1 for free-form),
it'll keep going until it finds a newline,
rather than ignoring everything past a particular column
(72 or 132).
The implication here is that there shouldn't @emph{be}
anything past that last column, other than whitespace or
commentary, because users using typical editors
(or viewing output as typically printed)
won't necessarily know just where the last column is.
Code that has ``garbage'' beyond the last column
(almost certainly only fixed-form code with a punched-card legacy,
such as code using columns 73-80 for ``sequence numbers'')
will have to be run through @code{g77stripcard} first.
Also, keeping track of the maximum column position while also watching out
for the end of a line @emph{and} while reading from a file
just makes things slower.
Since a file must be read, and watching for the end of the line
is necessary (unless the typical input file was preprocessed to
include the necessary number of trailing spaces),
dropping the tracking of the maximum column position
is the only way to reduce the complexity of the pertinent code
while maintaining high performance.
@item
ASCII encoding is assumed for the input file.
Code written in other character sets will have to be converted first.
@item
Tabs (ASCII code 9)
will be converted to spaces via the straightforward
approach.
Specifically, a tab is converted to between one and eight spaces
as necessary to reach column @var{n},
where dividing @samp{(@var{n} - 1)} by eight
results in a remainder of zero.
@item
Linefeeds (ASCII code 10)
mark the ends of lines.
@item
A carriage return (ASCII code 13)
is accepted if it immediately precedes a linefeed,
in which case it is ignored.
Otherwise, it is rejected (with a diagnostic).
@item
Any characters other than the above
that are not part of the GNU Fortran Character Set
(@pxref{Character Set})
are rejected with a diagnostic.
This includes backspaces, form feeds, and the like.
(It might make sense to allow a form feed in column 1
as long as that's the only character on a line.
It certainly wouldn't seem to cost much in terms of performance.)
@item
The end of the input stream (EOF)
ends the current line.
@item
The distinction between uppercase and lowercase letters
will be preserved.
It will be up to subsequent phases to decide to fold case.
Current plans are to permit any casing for Fortran (reserved) keywords
while preserving casing for user-defined names.
(This might not be made the default for @file{.f} files, though.)
Preserving case seems necessary to provide more direct access
to facilities outside of @code{g77}, such as to C or Pascal code.
Names of intrinsics will probably be matchable in any case.
However, there probably won't be any option to require
a particular mixed-case appearance of intrinsics
(as there was for @code{g77} prior to version 0.6),
because that's painful to maintain,
and probably nobody uses it.
(How @samp{external SiN; r = sin(x)} would be handled is TBD.
I think old @code{g77} might already handle that pretty elegantly,
but whether we can cope with allowing the same fragment to reference
a @emph{different} procedure, even with the same interface,
via @samp{s = SiN(r)}, needs to be determined.
If it can't, we need to make sure that when code introduces
a user-defined name, any intrinsic matching that name
using a case-insensitive comparison
is ``turned off''.)
@item
Backslashes in @code{CHARACTER} and Hollerith constants
are not allowed.
This avoids the confusion introduced by some Fortran compiler vendors
providing C-like interpretation of backslashes,
while others provide straight-through interpretation.
Some kind of lexical construct (TBD) will be provided to allow
flagging of a @code{CHARACTER}
(but probably not a Hollerith)
constant that permits backslashes.
It'll necessarily be a prefix, such as:
@smallexample
PRINT *, C'This line has a backspace \b here.'
PRINT *, F'This line has a straight backslash \ here.'
@end smallexample
Further, command-line options might be provided to specify that
one prefix or the other is to be assumed as the default
for @code{CHARACTER} constants.
However, it seems more helpful for @code{g77} to provide a program
that prefixes all constants
(or just those containing backslashes)
with the desired designation,
so printouts of code can be read
without knowing the compile-time options used when compiling it.
If such a program is provided
(let's name it @code{g77slash} for now),
then a command-line option to @code{g77} should not be provided.
(Though, given that it'll be easy to implement, it might be hard
to resist user requests for it ``to compile faster than if we
have to invoke another filter''.)
This program would take a command-line option to specify the
default interpretation of backslashes,
affecting which prefix it uses for constants.
@code{g77slash} probably should automatically convert Hollerith
constants that contain backslashes
to the appropriate @code{CHARACTER} constants.
Then @code{g77} wouldn't have to define a prefix syntax for Hollerith
constants specifying whether they want C-style or straight-through
backslashes.
@end itemize
The above implements nearly exactly what is specified by
@ref{Character Set},
and
@ref{Lines},
except it also provides automatic conversion of tabs
and ignoring of newline-related carriage returns.
It also effects the ``pure visual'' model,
by which is meant that a user viewing his code
in a typical text editor
(assuming it's not preprocessed via @code{g77stripcard} or similar)
doesn't need any special knowledge
of whether spaces on the screen are really tabs,
whether lines end immediately after the last visible non-space character
or after a number of spaces and tabs that follow it,
or whether the last line in the file is ended by a newline.
Most editors don't make these distinctions,
the ANSI FORTRAN 77 standard doesn't require them to,
and it permits a standard-conforming compiler
to define a method for transforming source code to
``standard form'' however it wants.
So, GNU Fortran defines it such that users have the best chance
of having the code be interpreted the way it looks on the screen
of the typical editor.
(Fancy editors should @emph{never} be required to correctly read code
written in classic two-dimensional-plaintext form.
By correct reading I mean ability to read it, book-like, without
mistaking text ignored by the compiler for program code and vice versa,
and without having to count beyond the first several columns.
The vague meaning of ASCII TAB, among other things, complicates
this somewhat, but as long as ``everyone'', including the editor,
other tools, and printer, agrees about the every-eighth-column convention,
the GNU Fortran ``pure visual'' model meets these requirements.
Any language or user-visible source form
requiring special tagging of tabs,
the ends of lines after spaces/tabs,
and so on, is broken by this definition.
Fortunately, Fortran @emph{itself} is not broken,
even if most vendor-supplied defaults for their Fortran compilers @emph{are}
in this regard.)
Further, this model provides a clean interface
to whatever preprocessors or code-generators are used
to produce input to this phase of @code{g77}.
Mainly, they need not worry about long lines.
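The ``pure visual'' normalization described above can be sketched in C. This is a minimal illustration, not @code{g77} code, and @code{visual_normalize} is a hypothetical name. It assumes tab stops every eight columns. It treats a carriage return as part of the line ending only when it directly precedes the newline (or the end of the string). It also drops trailing blanks, because none of these distinctions is visible in a typical editor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: normalize one physical source line according to
   the "pure visual" model.  Tabs advance to the next tab stop (every
   eighth column), a carriage return that belongs to the line ending is
   dropped, and trailing blanks are removed.  Returns the length of the
   NUL-terminated result written to OUT.  */
size_t
visual_normalize (const char *in, char *out, size_t outsize)
{
  size_t len = 0;

  assert (outsize > 0);
  for (; *in != '\0' && *in != '\n' && len + 1 < outsize; ++in)
    {
      if (*in == '\r' && (in[1] == '\n' || in[1] == '\0'))
        break;                  /* CR belonging to the line ending */
      if (*in == '\t')
        {
          /* Advance to the next tab stop.  */
          do
            out[len++] = ' ';
          while (len % 8 != 0 && len + 1 < outsize);
        }
      else
        out[len++] = *in;
    }
  /* Trailing blanks are invisible on screen, so they carry no meaning.  */
  while (len > 0 && out[len - 1] == ' ')
    --len;
  out[len] = '\0';
  return len;
}
```

In the sketch, a leading tab followed by @samp{X = 1} places @samp{X} in column 9. That is where it appears on screen, whether the editor stored a tab or eight spaces.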
@node sta.c
@subsection sta.c
@node stb.c
@subsection stb.c
@node expr.c
@subsection expr.c
@node stc.c
@subsection stc.c
@node std.c
@subsection std.c
@node ste.c
@subsection ste.c
@node Gotchas (Transforming)
@subsection Gotchas (Transforming)
This section is not about transforming ``gotchas'' into something else.
It is about the weirder aspects of transforming Fortran,
however that's defined,
into a more modern, canonical form.
@subsubsection Multi-character Lexemes
Each lexeme carries with it a pointer to where it appears in the source.
To provide the ability for diagnostics to point to column numbers,
in addition to line numbers and names,
lexemes that represent more than one (significant) character
in the source code need, generally,
to provide pointers to where each @emph{character} appears in the source.
This provides the ability to properly identify the precise location
of the problem in code like
@smallexample
SUBROUTINE X
END
BLOCK DATA X
END
@end smallexample
which, in fixed-form source, would result in single lexemes
consisting of the strings @samp{SUBROUTINEX} and @samp{BLOCKDATAX}.
(The problem is that @samp{X} is defined twice,
so a pointer to the @samp{X} in the second definition,
as well as a follow-up pointer to the corresponding @samp{X} in the first,
would be preferable to pointing to the beginnings of the statements.)
This need also arises when parsing (and diagnosing) @code{FORMAT}
statements.
Further, it arises when diagnosing
@code{FMT=} specifiers that contain constants
(or partial constants, or even propagated constants!)
in I/O statements, as in:
@smallexample
PRINT '(I2, 3HAB)', J
@end smallexample
(A pointer to the beginning of the prematurely terminated Hollerith
constant, and/or to the closing parenthesis, is preferable to a pointer
to the opening parenthesis or the apostrophe that precedes it.)
Multi-character lexemes, which would seem to naturally include
at least digit strings, alphanumeric strings, @code{CHARACTER}
constants, and Hollerith constants, therefore need to provide
location information on each character.
(Maybe Hollerith constants don't, but it's unnecessary to except them.)
The question then arises, what about @emph{other} multi-character lexemes,
such as @samp{**} and @samp{//},
and Fortran 90's @samp{(/}, @samp{/)}, @samp{::}, and so on?
It turns out there's a need to identify the location of the second character
of these two-character lexemes.
For example, in @samp{I(/J) = K}, the slash needs to be diagnosed
as the problem, not the opening parenthesis.
Similarly, it is preferable to diagnose the second slash in
@samp{I = J // K} rather than the first, given the implicit typing
rules, which would result in the compiler disallowing the attempted
concatenation of two integers.
(Though, since that's more of a semantic issue,
it's not @emph{that} much preferable.)
Even sequences that could be parsed as digit strings could use location info,
for example, to diagnose the @samp{9} in the octal constant @samp{O'129'}.
(This probably will be parsed as a character string,
to be consistent with the parsing of @samp{Z'129A'}.)
To avoid the hassle of recording the location of the second character,
while also preserving the general rule that each significant character
is distinctly pointed to by the lexeme that contains it,
it's best to simply not have any fixed-size lexemes
larger than one character.
This new design is expected to make checking for two
@samp{*} lexemes in a row much easier than the old design,
so this is not much of a sacrifice.
It probably makes the lexer much easier to implement
than it makes the parser harder.
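The one-character rule can be illustrated with a small C sketch. The types @code{struct srcloc} and @code{struct lexeme} and both functions are hypothetical. They show only how a parser could recognize @samp{**} as two consecutive @samp{*} lexemes, and how it could still point a diagnostic at the second character of @samp{//}.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lexeme representation, for illustration only.  */
struct srcloc
{
  int line;
  int column;
};

struct lexeme
{
  char ch;               /* every fixed-size lexeme is one character */
  struct srcloc where;   /* where that character appears in the source */
};

/* Does the power operator start at lexeme I?  The parser checks for two
   consecutive '*' lexemes instead of the lexer producing a single
   two-character "**" lexeme.  */
int
is_power_operator (const struct lexeme *lx, size_t n, size_t i)
{
  assert (lx != NULL);
  return i + 1 < n && lx[i].ch == '*' && lx[i + 1].ch == '*';
}

/* For a diagnostic about a two-character operator starting at lexeme I,
   return the location of its second character (e.g. the second '/').  */
struct srcloc
second_char_location (const struct lexeme *lx, size_t n, size_t i)
{
  assert (lx != NULL && i + 1 < n);
  return lx[i + 1].where;
}
```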
@subsubsection Space-padding Lexemes
Certain lexemes need to be padded with virtual spaces when the
end of the line (or file) is encountered.
This is necessary in fixed form, to handle lines that don't
extend to column 72, assuming that's the line length in effect.
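Here is a minimal sketch of virtual space padding. It assumes a 72-column fixed-form line length, and the @code{FIXED_LINE_LENGTH} macro and @code{fixed_form_column} function are hypothetical. Columns past the stored text but within the line length read as blanks. A constant that runs to the end of a short line therefore picks up the spaces it would have had on a full-length line.

```c
#include <assert.h>
#include <string.h>

#define FIXED_LINE_LENGTH 72   /* assumed line length in effect */

/* Return the character in column COL (counting from 1) of a fixed-form
   line.  Columns between the end of the stored text and
   FIXED_LINE_LENGTH read as virtual spaces; columns beyond the line
   length return '\0' because they are ignored.  */
char
fixed_form_column (const char *line, int col)
{
  size_t len = strlen (line);

  assert (col >= 1);
  if (col > FIXED_LINE_LENGTH)
    return '\0';
  if ((size_t) col <= len)
    return line[col - 1];
  return ' ';                  /* virtual padding */
}
```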
@subsubsection Bizarre Free-form Hollerith Constants
Last I checked, the Fortran 90 standard actually required the compiler
to silently accept something like
@smallexample
FORMAT ( 1 2 Htwelve chars )
@end smallexample
as a valid @code{FORMAT} statement specifying a twelve-character
Hollerith constant.
The implication here is that, since the new lexer is a zero-feedback one,
it won't know that the special case of a @code{FORMAT} statement being parsed
requires apparently distinct lexemes @samp{1} and @samp{2} to be treated as
a single lexeme.
(This is a horrible misfeature of the Fortran 90 language.
It's one of many such misfeatures that almost make me want
to not support them, and forge ahead with designing a new
``GNU Fortran'' language that has the features,
but not the misfeatures, of Fortran 90,
and provide utility programs to do the conversion automatically.)
So, the lexer must gather distinct chunks of decimal strings into
a single lexeme in contexts where a single decimal lexeme might
start a Hollerith constant.
(Which probably means it might as well do that all the time
for all multi-character lexemes, even in free-form mode,
leaving it to subsequent phases to pull them apart as they see fit.)
Compare the treatment of this to how
@smallexample
CHARACTER * 4 5 HEY
@end smallexample
and
@smallexample
CHARACTER * 12 HEY
@end smallexample
must be treated: the former must be diagnosed, due to the separation
between lexemes, whereas the latter must be accepted as a proper declaration.
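The lexer could gather such chunks as in the following C sketch. The name @code{gather_decimal} and its interface are hypothetical. Blanks between runs of digits are skipped, and a @code{split} flag records that blanks were seen. A later phase could then accept @samp{1 2 H} in a @code{FORMAT} statement and still diagnose @samp{CHARACTER * 4 5}.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Starting at S (which must point at a digit), collect digits into
   DIGITS, skipping blanks between runs of digits, so "1 2" yields "12".
   *SPLIT is set if blanks separated two runs.  Returns the number of
   source characters consumed; trailing blanks are left unconsumed.  */
size_t
gather_decimal (const char *s, char *digits, size_t size, int *split)
{
  size_t consumed = 0, n = 0, i;
  int blank_seen = 0;

  assert (size > 0 && isdigit ((unsigned char) s[0]));
  *split = 0;
  for (i = 0; s[i] != '\0'; ++i)
    {
      if (isdigit ((unsigned char) s[i]))
        {
          if (blank_seen)
            *split = 1;        /* digits separated by blanks */
          blank_seen = 0;
          if (n + 1 < size)
            digits[n++] = s[i];
          consumed = i + 1;
        }
      else if (s[i] == ' ' || s[i] == '\t')
        blank_seen = 1;
      else
        break;
    }
  digits[n] = '\0';
  return consumed;
}
```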
@subsubsection Hollerith Constants
Recognizing a Hollerith constant---specifically,
that an @samp{H} or @samp{h} after a digit string begins
such a constant---requires some knowledge of context.
Hollerith constants (such as @samp{2HAB}) can appear after:
@itemize @bullet
@item
@samp{(}
@item
@samp{,}
@item
@samp{=}
@item
@samp{+}, @samp{-}, @samp{/}
@item
@samp{*}, except as noted below
@end itemize
Hollerith constants don't appear after:
@itemize @bullet
@item
@samp{CHARACTER*},
which can be treated generally as
any @samp{*} that is the second lexeme of a statement
@end itemize
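The context rules above reduce to a small predicate. This C sketch is hypothetical and is not the @code{g77} lexer. It assumes that the preceding lexeme is a single character, per the one-character rule for fixed-size lexemes, and that the lexeme's position within the statement is known.

```c
#include <assert.h>

/* May a digit string followed by 'H' or 'h' begin a Hollerith constant,
   given the single-character lexeme PREV that precedes the digit string
   and PREV_INDEX, that lexeme's 0-based position in the statement?  */
int
hollerith_may_follow (char prev, int prev_index)
{
  assert (prev_index >= 0);
  switch (prev)
    {
    case '(': case ',': case '=':
    case '+': case '-': case '/':
      return 1;
    case '*':
      /* A '*' that is the second lexeme of a statement, as in
         CHARACTER*12HEY, introduces a length, not a Hollerith.  */
      return prev_index != 1;
    default:
      return 0;
    }
}
```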
@subsubsection Confusing Function Keyword
While
@smallexample
REAL FUNCTION FOO ()
@end smallexample
must be a @code{FUNCTION} statement and
@smallexample
REAL FUNCTION FOO (5)
@end smallexample
must be a type-declaration statement,
@smallexample
REAL FUNCTION FOO (@var{names})
@end smallexample
where @var{names} is a comma-separated list of names,
can be one or the other.
The only way to disambiguate that statement
(short of mandating free-form source or a short maximum
length for the names of external procedures)
is based on the context of the statement.
In particular, if the statement is known to be within an
already-started program unit
(but not at the outer level of the @code{CONTAINS} block),
it is a type-declaration statement.
Otherwise, the statement is a @code{FUNCTION} statement,
in that it begins a function program unit
(external, or, within @code{CONTAINS}, nested).
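The context rule can be expressed as a small decision function. The structure @code{unit_context} and its flags are hypothetical stand-ins for whatever state the parser actually tracks.

```c
#include <assert.h>
#include <stddef.h>

enum stmt_kind { STMT_FUNCTION, STMT_TYPE_DECL };

/* Hypothetical parser context flags.  */
struct unit_context
{
  int in_program_unit;          /* a program unit has already started */
  int at_contains_outer_level;  /* directly at the outer level of CONTAINS */
};

/* Classify an ambiguous "REAL FUNCTION FOO (names)" statement.  */
enum stmt_kind
classify_real_function (const struct unit_context *ctx)
{
  assert (ctx != NULL);
  if (ctx->in_program_unit && !ctx->at_contains_outer_level)
    return STMT_TYPE_DECL;      /* e.g. declares array FUNCTIONFOO */
  return STMT_FUNCTION;         /* begins an external or nested function */
}
```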
@subsubsection Weird READ
The statement
@smallexample
READ (N)
@end smallexample
is equivalent to either
@smallexample
READ (UNIT=(N))
@end smallexample
or
@smallexample
READ (FMT=(N))
@end smallexample
depending on which would be valid in context.
Specifically, if @samp{N} is type @code{INTEGER},
@samp{READ (FMT=(N))} would not be valid,
because parentheses may not be used around @samp{N},
whereas they may around it in @samp{READ (UNIT=(N))}.
Further, if @samp{N} is type @code{CHARACTER},
the opposite is true---@samp{READ (UNIT=(N))} is not valid,
but @samp{READ (FMT=(N))} is.
Strictly speaking, if anything follows
@smallexample
READ (N)
@end smallexample
in the statement, whether the first lexeme after the closing
parenthesis is a comma could be used to disambiguate the two cases,
without looking at the type of @samp{N},
because the comma is required for the @samp{READ (FMT=(N))}
interpretation and disallowed for the @samp{READ (UNIT=(N))}
interpretation.
However, in practice, many Fortran compilers allow
the comma for the @samp{READ (UNIT=(N))}
interpretation anyway
(in that they generally allow a leading comma before
an I/O list in an I/O statement),
and much code takes advantage of this allowance.
(This is quite a reasonable allowance, since the
juxtaposition of a comma-separated list immediately
after an I/O control-specification list, which is also comma-separated,
without an intervening comma,
looks sufficiently ``wrong'' to programmers
that they can't resist the itch to insert the comma.
@samp{READ (I, J), K, L} simply looks cleaner than
@samp{READ (I, J) K, L}.)
So, type-based disambiguation is needed unless strict adherence
to the standard is always assumed, and we're not going to assume that.
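The following C sketch shows the resulting type-based rule. Its enumerations are hypothetical stand-ins for the real type information. For the reason just given, it deliberately ignores any comma after the closing parenthesis.

```c
#include <assert.h>

enum read_form { READ_UNIT, READ_FMT, READ_INVALID };
enum fortran_type { TYPE_INTEGER, TYPE_CHARACTER, TYPE_OTHER };

/* Decide whether "READ (N)" means READ (UNIT=(N)) or READ (FMT=(N)),
   using only the type of N.  */
enum read_form
classify_read (enum fortran_type type_of_n)
{
  assert (type_of_n >= TYPE_INTEGER && type_of_n <= TYPE_OTHER);
  switch (type_of_n)
    {
    case TYPE_INTEGER:
      return READ_UNIT;          /* parentheses allowed around a unit */
    case TYPE_CHARACTER:
      return READ_FMT;           /* parenthesized character format */
    default:
      return READ_INVALID;
    }
}
```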
@node TBD (Transforming)
@subsection TBD (Transforming)
Continue researching gotchas, designing the transformational process,
and implementing it.
Specific issues to resolve:
@itemize @bullet
@item
Just where should @code{INCLUDE} processing take place?
Clearly before (or part of) statement identification (@file{sta.c}),
since determining whether @samp{I(J)=K} is a statement-function
definition or an assignment statement requires knowing the context,
which in turn requires having processed @code{INCLUDE} files.
@item
Just where should (if it was implemented) @code{USE} processing take place?
This gets into the whole issue of how @code{g77} should handle the concept
of modules.
I think GNAT already takes on this issue, but I don't know more than that.
Jim Giles has written extensively on @code{comp.lang.fortran}
about his opinions on module handling, as have others.
Jim's views should be taken into account.
Actually, Richard M. Stallman (RMS) also has written up
some guidelines for implementing such things,
but I'm not sure where I read them.
Perhaps the old @email{gcc2@@cygnus.com} list.
If someone could dig references to these up and get them to me,
that would be much appreciated!
Even though modules are not on the short-term list for implementation,
it'd be helpful to know @emph{now} how to avoid making them harder
to implement @emph{later}.
@item
Should the @code{g77} command become just a script that invokes
all the various preprocessing that might be needed,
thus making it seem slower than necessary for legacy code
that people are unwilling to convert,
or should we provide a separate script for that,
thus encouraging people to convert their code once and for all?
At least, a separate script to behave as old @code{g77} did,
perhaps named @code{g77old}, might ease the transition,
as might a corresponding one, named @code{g77oldnew},
that converts source code.
These scripts would take all the pertinent options @code{g77} used
to take and run the appropriate filters,
passing the results to @code{g77} or just making new sources out of them
(in a subdirectory, leaving the user to do the dirty deed of
moving or copying them over the old sources).
@item
Do other Fortran compilers provide a prefix syntax
to govern the treatment of backslashes in @code{CHARACTER}
(or Hollerith) constants?
Knowing what other compilers provide would help.
@item
Is it okay to drop support for the @samp{-fintrin-case-initcap},
@samp{-fmatch-case-initcap}, @samp{-fsymbol-case-initcap},
and @samp{-fcase-initcap} options?
I've asked @email{info-gnu-fortran@@gnu.org} for input on this.
Not having to support these makes it easier to write the new front end,
and might also avoid complicating its design.
@end itemize
@node Philosophy of Code Generation
@section Philosophy of Code Generation
Don't poke the bear.
The @code{g77} front end generates code
via the @code{gcc} back end.
@cindex GNU Back End (GBE)
@cindex GBE
@cindex @code{gcc}, back end
@cindex back end, gcc
@cindex code generator
The @code{gcc} back end (GBE) is a large, complex
labyrinth of intricate code
written in a combination of the C language
and specialized languages internal to @code{gcc}.
While the @emph{code} that implements the GBE
is written in a combination of languages,
the GBE itself is,
to the front end for a language like Fortran,
best viewed as a @emph{compiler}
that compiles its own, unique, language.
The GBE's ``source'', then, is written in this language,
which consists primarily of
a combination of calls to GBE functions
and @dfn{tree} nodes
(which are, themselves, created
by calling GBE functions).
So, @code{g77} generates code by, in effect,
translating the Fortran code it reads
into a form ``written'' in the ``language''
of the @code{gcc} back end.
@cindex GBEL
@cindex GNU Back End Language (GBEL)
This language will hereafter be referred to as @dfn{GBEL},
for GNU Back End Language.
GBEL is an evolving language,
not fully specified in any published form
as of this writing.
It offers many facilities,
but its ``core'' facilities
are those that correspond most directly
to those needed to support @code{gcc}
(compiling code written in GNU C).
The @code{g77} Fortran Front End (FFE)
is designed and implemented
to navigate the currents and eddies
of ongoing GBEL and @code{gcc} development
while also delivering on the potential
of an integrated FFE
(as compared to using a converter like @code{f2c}
and feeding the output into @code{gcc}).
Goals of the FFE's code-generation strategy include:
@itemize @bullet
@item
High likelihood of generation of correct code,
or, failing that, producing a fatal diagnostic or crashing.
@item
Generation of highly optimized code,
as directed by the user
via GBE-specific (versus @code{g77}-specific) constructs,
such as command-line options.
@item
Fast overall (FFE plus GBE) compilation.
@item
Preservation of source-level debugging information.
@end itemize
The strategies historically, and currently, used by the FFE
to achieve these goals include:
@itemize @bullet
@item
Use of GBEL constructs that most faithfully encapsulate
the semantics of Fortran.
@item
Avoidance of GBEL constructs that are so rarely used,
or limited to use in specialized situations not related to Fortran,
that their reliability and performance have not yet been established
as sufficient for use by the FFE.
@item
Flexible design, to readily accommodate changes to specific
code-generation strategies, perhaps governed by command-line options.
@end itemize
@cindex Bear-poking
@cindex Poking the bear
``Don't poke the bear'' somewhat summarizes the above strategies.
The GBE is the bear.
The FFE is designed and implemented to avoid poking it
in ways that are likely to just annoy it.
The FFE usually either tackles it head-on,
or avoids treating it in ways dissimilar to how
the @code{gcc} front end treats it.
For example, the FFE uses the native array facility in the back end
instead of the lower-level pointer-arithmetic facility
(used by @code{gcc} when compiling @code{f2c} output).
Theoretically, this presents more opportunities for optimization,
faster compile times,
and the production of more faithful debugging information.
These benefits were not, however, immediately realized,
mainly because @code{gcc} itself makes little or no use
of the native array facility.
Complex arithmetic is a case study of the evolution of this strategy.
When originally implemented,
the GBEL had just evolved its own native complex-arithmetic facility,
so the FFE took advantage of that.
When porting @code{g77} to 64-bit systems,
it was discovered that the GBE didn't really
implement its native complex-arithmetic facility properly.
The short-term solution was to rewrite the FFE
to instead use the lower-level facilities
that'd be used by @code{gcc}-compiled code
(assuming that code, itself, didn't use the native complex type
provided, as an extension, by @code{gcc}),
since these were known to work,
and, in any case, if shown to not work,
would likely be rapidly fixed
(since they'd likely not work for vanilla C code in similar circumstances).
However, the rewrite accommodated the original, native approach as well
by offering a command-line option to select it over the emulated approach.
This allowed users, and especially GBE maintainers, to try out
fixes to complex-arithmetic support in the GBE
while @code{g77} continued to default to compiling more code correctly,
albeit producing (typically) slower executables.
As of April 1999, it appeared that the last few bugs
in the GBE's support of its native complex-arithmetic facility
were worked out.
The FFE was changed back to default to using that native facility,
leaving emulation as an option.
Other Fortran constructs---arrays, character strings,
complex division, @code{COMMON} and @code{EQUIVALENCE} aggregates,
and so on---involve issues similar to those pertaining to complex arithmetic.
So, it is possible that the history
of how the FFE handled complex arithmetic
will be repeated, probably in modified form
(and hopefully over shorter timeframes),
for some of these other facilities.
@node Two-pass Design
@section Two-pass Design
The FFE does not tell the GBE anything about a program unit
until after the last statement in that unit has been parsed.
(A program unit is a Fortran concept that corresponds, in the C world,
most closely to function definitions in ISO C.
That is, a program unit in Fortran is like a top-level function in C.
Nested functions, found among the extensions offered by GNU C,
correspond roughly to Fortran's statement functions.)
So, while parsing the code in a program unit,
the FFE saves up all the information
on statements, expressions, names, and so on,
until it has seen the last statement.
At that point, the FFE revisits the saved information
(in what amounts to a second @dfn{pass} over the program unit)
to perform the actual translation of the program unit into GBEL,
culminating in the generation of assembly code for it.
Some lookahead is performed during this second pass,
so the FFE could be viewed as a ``two-plus-pass'' design.
@menu
* Two-pass Code::
* Why Two Passes::
@end menu
@node Two-pass Code
@subsection Two-pass Code
Most of the code that turns the first pass (parsing)
into a second pass for code generation
is in @file{@value{path-g77}/std.c}.
It has external functions,
called mainly by siblings in @file{@value{path-g77}/stc.c},
that record the information on statements and expressions
in the order they are seen in the source code.
These functions save that information.
It also has an external function that revisits that information,
calling the siblings in @file{@value{path-g77}/ste.c},
which handle the actual code generation
(by generating GBEL code,
that is, by calling GBE routines
to represent and specify expressions, statements, and so on).
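The record-then-replay structure can be sketched as follows. The names @code{record_stmt}, @code{replay_stmts}, and @code{struct saved_stmt} are hypothetical, and they are far simpler than what @file{std.c} actually saves. The sketch shows only that nothing reaches the back end until the whole program unit has been seen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pass-1 record of one digested statement.  */
struct saved_stmt
{
  int kind;    /* statement kind, as determined by the parser */
  int label;   /* statement label, or 0 if none */
};

#define MAX_STMTS 256

static struct saved_stmt saved[MAX_STMTS];
static size_t nsaved;

/* Pass 1: called as each statement is parsed; nothing is handed to
   the back end yet.  */
void
record_stmt (int kind, int label)
{
  assert (nsaved < MAX_STMTS);
  saved[nsaved].kind = kind;
  saved[nsaved].label = label;
  ++nsaved;
}

/* Pass 2: called after the last statement of the program unit, when
   what is known about names and labels is final.  EMIT stands in for
   the code generator.  Statements are replayed in source order and
   then discarded; returns how many were replayed.  */
size_t
replay_stmts (void (*emit) (const struct saved_stmt *))
{
  size_t i, n = nsaved;

  assert (emit != NULL);
  for (i = 0; i < n; ++i)
    emit (&saved[i]);
  nsaved = 0;
  return n;
}
```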
@node Why Two Passes
@subsection Why Two Passes
The need for two passes was not immediately evident
during the design and implementation of the code in the FFE
that was to produce GBEL.
Only after a few kludges,
to handle things like incorrectly-guessed @code{ASSIGN} label nature,
had been implemented,
did enough evidence pile up to make it clear
that @file{std.c} had to be introduced to intercept,
save, then revisit as part of a second pass,
the digested contents of a program unit.
Other such missteps have occurred during the evolution of the FFE,
because of the different goals of the FFE and the GBE.
Because the GBE's original, and still primary, goal
was to directly support the GNU C language,
the GBEL, and the GBE itself,
require more complexity
on the part of most front ends
than it requires of @code{gcc}'s.
For example,
the GBEL offers an interface that permits the @code{gcc} front end
to implement most, or all, of the language features it supports,
without the front end having to
make use of non-user-defined variables.
(It's almost certainly the case that all of K&R C,
and probably ANSI C as well,
is handled by the @code{gcc} front end
without declaring such variables.)
The FFE, on the other hand, must resort to a variety of ``tricks''
to achieve its goals.
Consider the following C code:
@smallexample
int
foo (int a, int b)
@{
  int c = 0;
  if ((c = bar (c)) == 0)
    goto done;
  quux (c << 1);
done:
  return c;
@}
@end smallexample
Note what kinds of objects are declared, or defined, before their use,
and before any actual code generation involving them
would normally take place:
@itemize @bullet
@item
Return type of function
@item
Entry point(s) of function
@item
Dummy arguments
@item
Variables
@item
Initial values for variables
@end itemize
Whereas, the following items can, and do,
suddenly appear ``out of the blue'' in C:
@itemize @bullet
@item
Label references
@item
Function references
@end itemize
Not surprisingly, the GBE faithfully permits the latter set of items
to be ``discovered'' partway through GBEL ``programs'',
just as they are permitted to in C.
Yet, the GBE has tended, at least in the past,
to be reluctant to fully support similar ``late'' discovery
of items in the former set.
This makes Fortran a poor fit for the ``safe'' subset of GBEL.
Consider:
@smallexample
FUNCTION X (A, ARRAY, ID1)
CHARACTER*(*) A
DOUBLE PRECISION X, Y, Z, TMP, EE, PI
REAL ARRAY(ID1*ID2)
COMMON ID2
EXTERNAL FRED
ASSIGN 100 TO J
CALL FOO (I)
IF (I .EQ. 0) PRINT *, A(0)
GOTO 200
ENTRY Y (Z)
ASSIGN 101 TO J
200 PRINT *, A(1)
READ *, TMP
GOTO J
100 X = TMP * EE
RETURN
101 Y = TMP * PI
CALL FRED
DATA EE, PI /2.71D0, 3.14D0/
END
@end smallexample
Here are some observations about the above code,
which, while somewhat contrived,
conforms to the FORTRAN 77 and Fortran 90 standards:
@itemize @bullet
@item
The return type of function @samp{X} is not known
until the @samp{DOUBLE PRECISION} line has been parsed.
@item
Whether @samp{A} is a function or a variable
is not known until the @samp{PRINT *, A(0)} statement
has been parsed.
@item
The bounds of the array of argument @samp{ARRAY}
depend on a computation involving
the subsequent argument @samp{ID1}
and the blank-common member @samp{ID2}.
@item
Whether @samp{Y} and @samp{Z} are local variables,
additional function entry points,
or dummy arguments to additional entry points
is not known
until the @code{ENTRY} statement is parsed.
@item
Similarly, whether @samp{TMP} is a local variable is not known
until the @samp{READ *, TMP} statement is parsed.
@item
The initial values for @samp{EE} and @samp{PI}
are not known until after the @code{DATA} statement is parsed.
@item
Whether @samp{FRED} is a function returning type @code{REAL}
or a subroutine
(which can be thought of as returning type @code{void}
@emph{or}, to support alternate returns in a simple way,
type @code{int})
is not known
until the @samp{CALL FRED} statement is parsed.
@item
Whether @samp{100} is a @code{FORMAT} label
or the label of an executable statement
is not known
until the @samp{X =} statement is parsed.
(These two types of labels get @emph{very} different treatment,
especially when @code{ASSIGN}'ed.)
@item
That @samp{J} is a local variable is not known
until the first @code{ASSIGN} statement is parsed.
(This happens @emph{after} executable code has been seen.)
@end itemize
Very few of these ``discoveries''
can be accommodated by the GBE as it has evolved over the years.
The GBEL doesn't support several of them,
and those it might appear to support
don't always work properly,
especially in combination with other GBEL and GBE features,
as implemented in the GBE.
(Had the GBE and its GBEL originally evolved to support @code{g77},
the shoe would be on the other foot, so to speak---most, if not all,
of the above would be directly supported by the GBEL,
and a few C constructs would probably not be supported,
as they are in reality.
Both this mythical GBE and today's real one cater to the GBEL
by, sometimes, scrambling around, cleaning up after itself---after
discovering that assumptions it made earlier during code generation
are incorrect.)
So, the FFE handles these discrepancies---between the order in which
it discovers facts about the code it is compiling,
and the order in which the GBEL and GBE support such discoveries---by
performing what amounts to two
passes over each program unit.
(A few ambiguities can remain at that point,
such as whether, given @samp{EXTERNAL BAZ}
and no other reference to @samp{BAZ} in the program unit,
it is a subroutine, a function, or a block-data---which, in C-speak,
governs its declared return type.
Fortunately, these distinctions are easily finessed
for the procedure, library, and object-file interfaces
supported by @code{g77}.)
@node Challenges Posed
@section Challenges Posed
Consider the following Fortran code, which uses various extensions
(including some to Fortran 90):
@smallexample
SUBROUTINE X(A)
CHARACTER*(*) A
COMPLEX CFUNC
INTEGER*2 CLOCKS(200)
INTEGER IFUNC
CALL SYSTEM_CLOCK (CLOCKS (IFUNC (CFUNC ('('//A//')'))))
END
@end smallexample
The above poses the following challenges to any Fortran compiler
that uses run-time interfaces, and a run-time library, roughly similar
to those used by @code{g77}:
@itemize @bullet
@item
Assuming the library routine that supports @code{SYSTEM_CLOCK}
expects to set an @code{INTEGER*4} variable via its @code{COUNT} argument,
the compiler must make available to it a temporary variable of that type.
@item
Further, after the @code{SYSTEM_CLOCK} library routine returns,
the compiler must ensure that the temporary variable it wrote
is copied into the appropriate element of the @samp{CLOCKS} array.
(This assumes the compiler doesn't just reject the code,
which it should if it is compiling under some kind of a ``strict'' option.)
@item
To determine the correct index into the @samp{CLOCKS} array
(putting aside the fact that the index, in this particular case,
need not be computed until after
the @code{SYSTEM_CLOCK} library routine returns),
the compiler must ensure that the @code{IFUNC} function is called.
That requires evaluating its argument,
which requires, for @code{g77}
(assuming @code{-ff2c} is in force),
reserving a temporary variable of type @code{COMPLEX}
for use as a repository for the return value
being computed by @samp{CFUNC}.
@item
Before invoking @samp{CFUNC},
its argument must be evaluated,
which requires allocating, at run time,
a temporary large enough to hold the result of the concatenation,
as well as actually performing the concatenation.
@item
The large temporary needed during invocation of @code{CFUNC}
should, ideally, be deallocated
(or, at least, left to the GBE to dispose of, as it sees fit)
as soon as @code{CFUNC} returns,
which means before @code{IFUNC} is called
(as it might need a lot of dynamically allocated memory).
@end itemize
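The first two requirements amount to passing a correctly typed temporary and copying it back afterwards. In this C sketch, @code{system_clock_stub} is a hypothetical stand-in for the run-time routine. As assumed above, it stores an @code{INTEGER*4} count. The sketch uses @code{int16_t} to model @code{INTEGER*2}.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the run-time routine: stores its count
   through a pointer to a 32-bit integer (INTEGER*4).  */
static void
system_clock_stub (int32_t *count)
{
  *count = 1234;   /* placeholder value */
}

/* How a compiler might handle CALL SYSTEM_CLOCK (CLOCKS (K)) when
   CLOCKS is INTEGER*2: pass the address of a temporary of the type the
   library expects, then copy the result into the array element after
   the call returns.  */
void
call_system_clock_into_int2 (int16_t *clocks, int k)
{
  int32_t temp;

  assert (k >= 1);
  system_clock_stub (&temp);
  clocks[k - 1] = (int16_t) temp;   /* Fortran indices start at 1 */
}
```

Narrowing a value that does not fit in 16 bits is implementation-defined in C. That is one more reason a ``strict'' option might reject such code.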
@code{g77} currently doesn't support all of the above,
but, so that it might someday, it has evolved to handle
at least some of the above requirements.
Meeting the above requirements is made more challenging
by conforming to the requirements of the GBEL/GBE combination.
@node Transforming Statements
@section Transforming Statements
Most Fortran statements are given their own block,
and, for temporary variables they might need, their own scope.
(A block is what distinguishes @samp{@{ foo (); @}}
from just @samp{foo ();} in C.
A scope is included with every such block,
providing a distinct name space for local variables.)
Label definitions for the statement precede this block,
so @samp{10 PRINT *, I} is handled more like
@samp{fl10: @{ @dots{} @}} than @samp{@{ fl10: @dots{} @}}
(where @samp{fl10} is just a notation meaning ``Fortran Label 10''
for the purposes of this document).
@menu
* Statements Needing Temporaries::
* Transforming DO WHILE::
* Transforming Iterative DO::
* Transforming Block IF::
* Transforming SELECT CASE::
@end menu
@node Statements Needing Temporaries
@subsection Statements Needing Temporaries
Any temporaries needed during, but not beyond,
execution of a Fortran statement,
are made local to the scope of that statement's block.
This allows the GBE to share storage for these temporaries
among the various statements without the FFE
having to manage that itself.
(The GBE could, of course, decide to optimize
management of these temporaries.
For example, it could, theoretically,
schedule some of the computations involving these temporaries
to occur in parallel.
More practically, it might leave the storage for some temporaries
``live'' beyond their scopes, to reduce the number of
manipulations of the stack pointer at run time.)
Temporaries needed across distinct statement boundaries usually
are associated with Fortran blocks (such as @code{DO}/@code{END DO}).
(Also, there might be temporaries not associated with blocks at all---these
would be in the scope of the entire program unit.)
Each Fortran block @emph{should} get its own block/scope in the GBE.
This is best, because it allows temporaries to be more naturally handled.
However, it might pose problems when handling labels
(in particular, when they're the targets of @code{GOTO}s outside the Fortran
block), and generally just hassling with replicating
parts of the @code{gcc} front end
(because the FFE needs to support
an arbitrary number of nested back-end blocks
if each Fortran block gets one).
So, there might still be a need for top-level temporaries, whose
``owning'' scope is that of the containing procedure.
Also, there seem to be problems declaring new variables after
generating code (within a block) in the back end, leading to, e.g.,
@samp{label not defined before binding contour} or similar messages,
when compiling with @samp{-fstack-check} or
when compiling for certain targets.
Because of that, and because sometimes these temporaries are not
discovered until in the middle of generating code for an expression
statement (as in the case of the optimization for @samp{X**I}),
it seems best to always
pre-scan all the expressions that'll be expanded for a block
before generating any of the code for that block.
This pre-scan then handles discovering and declaring, to the back end,
the temporaries needed for that block.
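A pre-scan of this kind can be sketched as a recursive walk over the expression tree. The node layout is an illustrative assumption and does not reflect the actual FFE policy. So is the rule that each power (@samp{X**I}) and each concatenation needs exactly one temporary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression node: 'P' is a power (X**I), 'C' is a
   concatenation, 'v' is a leaf; other characters are plain operators.  */
struct expr
{
  char op;
  const struct expr *left, *right;
};

/* Count the temporaries an expression will need, so that all of them
   can be declared to the back end before any code for the enclosing
   block is generated.  */
size_t
count_temporaries (const struct expr *e)
{
  size_t n;

  if (e == NULL)
    return 0;
  n = count_temporaries (e->left) + count_temporaries (e->right);
  if (e->op == 'P' || e->op == 'C')
    ++n;
  return n;
}
```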
It's also important to treat distinct items in an I/O list as distinct
statements deserving their own blocks.
That's because there's a requirement
that each I/O item be fully processed before the next one,
which matters in cases like @samp{READ (*,*), I, A(I)}---the
element of @samp{A} read in the second item
@emph{must} be determined from the value
of @samp{I} read in the first item.
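In C terms, treating each I/O item as its own block orders the work as shown below. The function and its array-based input are hypothetical. The array @samp{A} is assumed to be indexed from 1, as in Fortran.

```c
#include <assert.h>

/* Hypothetical rendering of READ (*,*) I, A(I): each item gets its own
   block, so the element of A is chosen only after I has been stored.  */
void
read_i_then_a_of_i (const int *input, int *i, int *a, int a_size)
{
  /* Block for the first item: store I.  */
  {
    *i = input[0];
  }
  /* Block for the second item: the address uses the I just read.  */
  {
    assert (*i >= 1 && *i <= a_size);
    a[*i - 1] = input[1];
  }
}
```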
@node Transforming DO WHILE
@subsection Transforming DO WHILE
@samp{DO WHILE(expr)} @emph{must} be implemented
so that temporaries needed to evaluate @samp{expr}
are generated just for the test, each time.
Consider how @samp{DO WHILE (A//B .NE. 'END'); @dots{}; END DO} is transformed:
@smallexample
for (;;)
  @{
    int temp0;
    @{
      char temp1[large];
      libg77_catenate (temp1, a, b);
      temp0 = libg77_ne (temp1, "END");
    @}
    if (! temp0)
      break;
    @dots{}
  @}
@end smallexample
In this case, it seems like a time/space tradeoff
between allocating and deallocating @samp{temp1} for each iteration
and allocating it just once for the entire loop.
However, if @samp{temp1} is allocated just once for the entire loop,
it could be the wrong size for subsequent iterations of that loop
in cases like @samp{DO WHILE (A(I:J)//B .NE. 'END')},
because the body of the loop might modify @samp{I} or @samp{J}.
So, the above implementation is used,
though a more efficient one can be used
in specific circumstances.
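The following is a runnable approximation of the transformation above for @samp{DO WHILE (A(I:J)//B .NE. 'END')}. The function names are hypothetical. @code{malloc} and @code{free} stand in for whatever allocation the GBE would actually use. The comparison pads the shorter operand with blanks, as Fortran character comparison requires.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Fortran character comparison: the shorter operand is treated as if
   padded with blanks on the right.  Returns nonzero if unequal.  */
static int
char_ne (const char *s, size_t slen, const char *t, size_t tlen)
{
  size_t k, n = slen > tlen ? slen : tlen;

  for (k = 0; k < n; ++k)
    {
      char cs = k < slen ? s[k] : ' ';
      char ct = k < tlen ? t[k] : ' ';
      if (cs != ct)
        return 1;
    }
  return 0;
}

/* One evaluation of the loop test.  The concatenation temporary is
   sized from the current I and J, used, and freed within this single
   test, because the loop body may change I or J.  */
int
do_while_test (const char *a, int i, int j, const char *b)
{
  size_t alen = j >= i ? (size_t) (j - i + 1) : 0;
  size_t blen = strlen (b);
  char *temp1;
  int temp0;

  assert (i >= 1 && (alen == 0 || (size_t) j <= strlen (a)));
  temp1 = malloc (alen + blen + 1);
  assert (temp1 != NULL);
  if (alen > 0)
    memcpy (temp1, a + (i - 1), alen);   /* A(I:J) */
  memcpy (temp1 + alen, b, blen);        /* //B */
  temp0 = char_ne (temp1, alen + blen, "END", 3);
  free (temp1);                          /* temporary dies with the test */
  return temp0;
}
```

The generated loop would then look roughly like @code{while (do_while_test (a, i, j, b))} followed by the body.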
@node Transforming Iterative DO
@subsection Transforming Iterative DO
An iterative @code{DO} loop
(one that specifies an iteration variable)
is required by the Fortran standards
to be implemented as though an iteration count
is computed before entering the loop body,
and that iteration count is then used to determine
the number of times the loop body is to be performed
(assuming the loop isn't cut short via @code{GOTO} or @code{EXIT}).
The FFE handles this by allocating a temporary variable
to contain the computed number of iterations.
Since this variable must be in a scope that includes the entire loop,
a GBEL block is created for that loop,
and the variable declared as belonging to the scope of that block.
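A minimal C sketch of this scheme follows. It assumes integer loop parameters and a nonzero increment, and uses the FORTRAN 77 iteration-count formula @samp{MAX(INT((m2-m1+m3)/m3),0)}; the names @code{do_sketch} and @code{count} are hypothetical. The inner C block plays the role of the GBEL block that holds the count, and the function records each value of the iteration variable so the behavior can be checked.

@smallexample
/* Sketch of DO VAR = M1, M2, M3 with integer parameters and M3 /= 0.
   Stores up to MAX_TRACE values of VAR in TRACE and returns the
   number of times the body ran. */
static int
do_sketch (int m1, int m2, int m3, int trace[], int max_trace)
@{
  int n = 0;
  @{
    int var = m1;
    /* The temporary holding the iteration count, scoped to this
       block and computed once, before the body is entered.
       C integer division truncates toward zero, like INT. */
    int count = (m2 - m1 + m3) / m3;

    if (count < 0)
      count = 0;
    for (; count > 0; count--)
      @{
        if (n < max_trace)
          trace[n] = var;
        n++;
        m2 = m1;  /* changing a bound in the body does not change the count */
        var += m3;
      @}
  @}
  return n;
@}
@end smallexample

For example, @samp{DO VAR = 1, 10, 3} runs four times (1, 4, 7, 10), and @samp{DO VAR = 5, 1} runs zero times.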
@node Transforming Block IF
@subsection Transforming Block IF
Consider:
@smallexample
SUBROUTINE X(A,B,C)
CHARACTER*(*) A, B, C
LOGICAL LFUNC
IF (LFUNC (A//B)) THEN
  CALL SUBR1
ELSE IF (LFUNC (A//C)) THEN
  CALL SUBR2
ELSE
  CALL SUBR3
END IF
END
@end smallexample
The arguments to the two calls to @samp{LFUNC}
require dynamic allocation (at run time),
but are not required during execution of the @code{CALL} statements.
So, the scopes of those temporaries must be within blocks inside
the block corresponding to the Fortran @code{IF} block.
This cannot be represented ``naturally''
in vanilla C, nor in GBEL.
The conditional constructs of both languages
(@code{if}, @code{else if}, and @code{else} in C;
@code{if}, @code{elseif}, @code{else}, and @code{endif} in GBEL)
must, for a given @code{if} block,
share the same C/GBE block.
Therefore, any temporaries needed to evaluate @samp{expr}
while executing @samp{ELSE IF(expr)}
must either have been predeclared
at the top of the corresponding @code{IF} block,
or be declared within a new block for that @code{ELSE IF}.
Because such a block cannot contain the @code{else} or @code{else if} itself
(due to the above requirement),
it is placed inside the preceding @code{else}
and implements the rest of the @code{IF} block's
@code{ELSE IF} and @code{ELSE} statements
as an inner, nested @code{if}.
The FFE takes the latter approach.
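The following C sketch shows the shape of that approach for the @samp{SUBROUTINE X} example above. It is hypothetical: @code{lfunc}, @code{catenate}, and the return values that stand in for @samp{CALL SUBR1}, @samp{CALL SUBR2}, and @samp{CALL SUBR3} are assumptions, strings are NUL-terminated C strings rather than Fortran @code{CHARACTER} values, and C99 variable-length arrays stand in for run-time allocation. The temporary for @samp{A//C} is declared in a block inside the @code{else}, and that block also holds the rest of the @code{IF} block as a nested @code{if}.

@smallexample
#include <string.h>

/* Hypothetical stand-in for LOGICAL FUNCTION LFUNC. */
static int
lfunc (const char *s)
@{
  return strcmp (s, "ab") == 0;
@}

/* DST = X // Y, for NUL-terminated strings. */
static void
catenate (char *dst, const char *x, const char *y)
@{
  size_t xlen = strlen (x);

  memcpy (dst, x, xlen);
  strcpy (dst + xlen, y);
@}

/* IF (LFUNC (A//B)) THEN ... ELSE IF (LFUNC (A//C)) THEN ... ELSE ...
   Returns 1, 2, or 3 in place of CALL SUBR1, SUBR2, or SUBR3. */
static int
x (const char *a, const char *b, const char *c)
@{
  int which;
  int temp0;

  @{
    /* Temporary for A//B: live only while the first test runs. */
    char temp1[strlen (a) + strlen (b) + 1];

    catenate (temp1, a, b);
    temp0 = lfunc (temp1);
  @}
  if (temp0)
    which = 1;                          /* CALL SUBR1 */
  else
    @{
      /* New block for the ELSE IF: its temporaries are declared here,
         and the rest of the IF block is nested inside it. */
      int temp2;

      @{
        char temp3[strlen (a) + strlen (c) + 1];

        catenate (temp3, a, c);
        temp2 = lfunc (temp3);
      @}
      if (temp2)
        which = 2;                      /* CALL SUBR2 */
      else
        which = 3;                      /* CALL SUBR3 */
    @}
  return which;
@}
@end smallexample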
@node Transforming SELECT CASE
@subsection Transforming SELECT CASE
@code{SELECT CASE} poses a few interesting problems for code generation,
if efficiency and frugal stack management are important.
Consider @samp{SELECT CASE (I('PREFIX'//A))},
where @samp{A} is @code{CHARACTER*(*)}.
In a case like this---basically,
in any case where largish temporaries are needed
to evaluate the expression---those temporaries should
not be ``live'' during execution of any of the @code{CASE} blocks.
So, evaluation of the expression is best done within its own block,
which in turn is within the @code{SELECT CASE} block itself
(which contains the code for the @code{CASE} blocks as well,
though each within its own block).
Otherwise, we'd have the rough equivalent of this pseudo-code:
@smallexample
@{
  char temp[large];
  libg77_catenate (temp, "prefix", a);
  switch (i (temp))
    @{
    case 0:
      @dots{}
    @}
@}
@end smallexample
And that would leave @samp{temp} in scope during the @code{CASE} blocks
(although a clever back end @emph{could} see that it isn't referenced
in them, and thus free that temporary before executing the blocks).
So this approach is used instead:
@smallexample
@{
  int temp0;
  @{
    char temp1[large];
    libg77_catenate (temp1, "prefix", a);
    temp0 = i (temp1);
  @}
  switch (temp0)
    @{
    case 0:
      @dots{}
    @}
@}
@end smallexample
Note how @samp{temp1} goes out of scope before starting the switch,
thus making it easy for a back end to free it.
The problem @emph{that} solution has, however,
is with @samp{SELECT CASE('prefix'//A)}
(which is currently not supported).
Unless the GBEL is extended to support arbitrarily long character strings
in its @code{case} facility,
the FFE has to implement @code{SELECT CASE} on @code{CHARACTER}
(probably excepting @code{CHARACTER*1})
using a cascade of
@code{if}, @code{elseif}, @code{else}, and @code{endif} constructs
in GBEL.
To prevent the (potentially large) temporary,
needed to hold the selected expression itself (@samp{'prefix'//A}),
from being in scope during execution of the @code{CASE} blocks,
two approaches are available:
@itemize @bullet
@item
Pre-evaluate all the @code{CASE} tests,
producing an integer ordinal that is used,
a la @samp{temp0} in the earlier example,
as if @samp{SELECT CASE(temp0)} had been written.
Each corresponding @code{CASE} is replaced with @samp{CASE(@var{i})},
where @var{i} is the ordinal for that case,
determined while, or before,
generating the cascade of @code{if}-related constructs
to cope with @code{CHARACTER} selection.
@item
Make @samp{temp0} above just
large enough to hold the longest @code{CASE} string
that'll actually be compared against the expression
(in this case, @samp{'prefix'//A}).
Since that length must be constant
(because @code{CASE} expressions are all constant),
it is known at compile time and is typically small.
Further, @samp{temp1} need not be dynamically allocated,
since ordinary @code{CHARACTER} assignment
into the fixed-length @samp{temp0} can be used instead.
@end itemize
Both of these solutions require @code{SELECT CASE} implementation
to be changed so all the corresponding @code{CASE} statements
are seen during the actual code generation for @code{SELECT CASE}.
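A C sketch of the first approach follows, for a hypothetical @samp{SELECT CASE ('prefix'//A)} with @samp{CASE ('prefixA')}, @samp{CASE ('prefixB')}, and @samp{CASE DEFAULT}. The names @code{select_sketch} and @code{fortran_eq}, the values returned in place of each @code{CASE} block, and the use of a C99 variable-length array for the selector are all assumptions. A cascade of @code{if} and @code{else if} tests computes the ordinal inside an inner block, so the temporary holding @samp{'prefix'//A} is out of scope before the @code{switch} runs the selected @code{CASE}.

@smallexample
#include <string.h>

/* Fortran .EQ. on CHARACTER: the shorter operand is padded with blanks. */
static int
fortran_eq (const char *x, size_t xlen, const char *y, size_t ylen)
@{
  size_t n = xlen > ylen ? xlen : ylen;
  size_t k;

  for (k = 0; k < n; k++)
    @{
      char cx = k < xlen ? x[k] : ' ';
      char cy = k < ylen ? y[k] : ' ';
      if (cx != cy)
        return 0;
    @}
  return 1;
@}

/* SELECT CASE ('prefix'//A)
   CASE ('prefixA'): returns 10
   CASE ('prefixB'): returns 20
   CASE DEFAULT:     returns -1 */
static int
select_sketch (const char *a)
@{
  int result;
  int temp0;  /* ordinal of the matching CASE; 0 means CASE DEFAULT */

  @{
    size_t alen = strlen (a);
    size_t len = 6 + alen;
    char temp1[6 + alen + 1];  /* holds 'prefix'//A; gone after this block */

    memcpy (temp1, "prefix", 6);
    memcpy (temp1 + 6, a, alen);
    if (fortran_eq (temp1, len, "prefixA", 7))
      temp0 = 1;
    else if (fortran_eq (temp1, len, "prefixB", 7))
      temp0 = 2;
    else
      temp0 = 0;
  @}
  switch (temp0)
    @{
    case 1:
      result = 10;
      break;
    case 2:
      result = 20;
      break;
    default:
      result = -1;
      break;
    @}
  return result;
@}
@end smallexample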
@node Transforming Expressions
@section Transforming Expressions
The interactions between statements, expressions, and subexpressions
at program run time can be viewed as:
@smallexample
@var{action}(@var{expr})
@end smallexample
Here, @var{action} is the series of steps
performed to effect the statement,
and @var{expr} is the expression
whose value is used by @var{action}.
Expanding the above shows a typical order of events at run time:
@smallexample
Evaluate @var{expr}
Perform @var{action}, using result of evaluation of @var{expr}
Clean up after evaluating @var{expr}
@end smallexample
So, if evaluating @var{expr} requires allocating memory,
that memory can be freed before performing @var{action}
only if it is not needed to hold the result of evaluating @var{expr}.
Otherwise, it must not be freed until after @var{action} has been performed.
The above are recursive definitions,
in the sense that they apply to subexpressions of @var{expr}.
That is, evaluating @var{expr} involves
evaluating all of its subexpressions,
performing the @var{action} that computes the
result value of @var{expr},
then cleaning up after evaluating those subexpressions.
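The rule about when evaluation memory may be freed can be sketched in C as follows. The helpers @code{catenate_alloc} and @code{record} are hypothetical, @code{malloc} and @code{free} stand in for whatever allocation the generated code uses, and @code{record} plays the part of a called subroutine. In @code{assign_len}, which models @samp{N = LEN(A//B)}, the result is an integer, so the temporary is freed before the assignment; in @code{call_with_cat}, which models @samp{CALL SUB(A//B)}, the temporary @emph{is} the value the action uses, so it is freed only afterward.

@smallexample
#include <stdlib.h>
#include <string.h>

/* Evaluates X // Y into newly allocated memory, or returns NULL. */
static char *
catenate_alloc (const char *x, const char *y)
@{
  size_t xlen = strlen (x);
  size_t ylen = strlen (y);
  char *r = malloc (xlen + ylen + 1);

  if (r == NULL)
    return NULL;
  memcpy (r, x, xlen);
  memcpy (r + xlen, y, ylen + 1);  /* copies Y's terminating NUL too */
  return r;
@}

/* N = LEN(A//B): the result does not live in the temporary. */
static int
assign_len (const char *a, const char *b, size_t *n)
@{
  char *temp = catenate_alloc (a, b);   /* evaluate */
  size_t value;

  if (temp == NULL)
    return -1;
  value = strlen (temp);
  free (temp);                          /* clean up ... */
  *n = value;                           /* ... then perform the action */
  return 0;
@}

/* Stand-in for SUB: remembers the last argument it saw. */
static char last_arg[64];

static void
record (const char *s)
@{
  strncpy (last_arg, s, sizeof last_arg - 1);
@}

/* CALL SUB(A//B): the temporary holds the value the action uses. */
static int
call_with_cat (const char *a, const char *b, void (*sub) (const char *))
@{
  char *temp = catenate_alloc (a, b);   /* evaluate */

  if (temp == NULL)
    return -1;
  sub (temp);                           /* perform the action ... */
  free (temp);                          /* ... and only then clean up */
  return 0;
@}
@end smallexample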
The recursive nature of this evaluation is implemented
via recursive-descent transformation of the top-level statements,
their expressions, @emph{their} subexpressions, and so on.
However, that recursive-descent transformation is,
due to the nature of the GBEL,
focused primarily on generating a @emph{single} stream of code
to be executed at run time.
Yet, from the above, it's clear that multiple streams of code
must effectively be simultaneously generated
during the recursive-descent analysis of statements.
The primary stream implements the primary @var{action} items,
while at least two other streams implement
the evaluation and clean-up items.
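To make the idea concrete, here is a hypothetical C sketch, not the FFE's actual code, of a recursive-descent generator that fills three text streams at once for @samp{CALL SUB((A//B)//C)}: evaluation code in execution order, the statement's action, and clean-up code in reverse order of allocation. The structure names and the emitted @code{catenate} and @code{free} calls are assumptions. For simplicity it defers all clean-up until after the action, although, per the rule given earlier, a temporary such as @samp{temp0} that does not hold the final result could be released sooner.

@smallexample
#include <stdio.h>
#include <string.h>

/* A leaf names a variable; any other node catenates LEFT and RIGHT. */
struct expr
@{
  const char *name;                 /* non-NULL only for leaves */
  const struct expr *left, *right;
@};

/* The streams built during one recursive descent. */
struct streams
@{
  char eval[512];     /* evaluation steps, in execution order */
  char action[512];   /* the statement's own action */
  char cleanup[512];  /* clean-up steps, newest temporary first */
  int ntemps;
@};

static void
append (char *buf, size_t size, const char *text)
@{
  size_t used = strlen (buf);

  snprintf (buf + used, size - used, "%s", text);
@}

static void
prepend (char *buf, size_t size, const char *text)
@{
  char old[512];

  snprintf (old, sizeof old, "%s", buf);
  snprintf (buf, size, "%s%s", text, old);
@}

/* Emits code computing E; stores the name holding its value in OUT. */
static void
gen_expr (const struct expr *e, struct streams *s, char *out, size_t outsize)
@{
  char left[32], right[32], line[128];
  int t;

  if (e->name != NULL)
    @{
      snprintf (out, outsize, "%s", e->name);
      return;
    @}
  gen_expr (e->left, s, left, sizeof left);
  gen_expr (e->right, s, right, sizeof right);
  t = s->ntemps++;
  snprintf (line, sizeof line, "temp%d = catenate (%s, %s);\n", t, left, right);
  append (s->eval, sizeof s->eval, line);
  snprintf (line, sizeof line, "free (temp%d);\n", t);
  prepend (s->cleanup, sizeof s->cleanup, line);
  snprintf (out, outsize, "temp%d", t);
@}

/* Generates CALL SUB(E): evaluate E, call SUB, then clean up. */
static void
gen_call (const char *sub, const struct expr *e, struct streams *s)
@{
  char value[32], line[128];

  memset (s, 0, sizeof *s);
  gen_expr (e, s, value, sizeof value);
  snprintf (line, sizeof line, "%s (%s);\n", sub, value);
  append (s->action, sizeof s->action, line);
@}
@end smallexample

Emitting the @code{eval}, @code{action}, and @code{cleanup} streams in that order yields the evaluate, perform, clean-up sequence described earlier.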
Requirements imposed by expressions include:
@itemize @bullet
@item
Whether the caller needs to have a temporary ready
to hold the value of the expression.
@item
Other stuff???
@end itemize
@node Internal Naming Conventions
@section Internal Naming Conventions
Names exported by FFE modules have the following (regular-expression) forms.
Note that all names beginning with @code{ffe@var{mod}} or @code{FFE@var{mod}},
where @var{mod} is lowercase or uppercase alphanumerics, respectively,
are exported by the module @code{ffe@var{mod}},
with the source code doing the exporting in @file{@var{mod}.h}.
(Usually, the source code for the implementation is in @file{@var{mod}.c}.)
Identifiers that don't fit the following forms
are not considered exported,
even if the C language would let other modules reference them.
(For example, they might be made available to other modules
solely for use within expansions of exported macros,
not for use within any source code in those other modules.)
@table @code
@item ffe@var{mod}
The single typedef exported by the module.
@item FFE@var{umod}_[A-Z][A-Z0-9_]*
(Where @var{umod} is the uppercase form of @var{mod}.)
A @code{#define} or @code{enum} constant of the type @code{ffe@var{mod}}.
@item ffe@var{mod}[A-Z][a-z0-9]*
A typedef exported by the module.
The portion of the identifier after @code{ffe@var{mod}} is
referred to as @var{ctype}, a capitalized (mixed-case) form
of @var{type}.
@item FFE@var{umod}_@var{type}[A-Z][A-Z0-9_]*[A-Z0-9]?
(Where @var{umod} is the uppercase form of @var{mod}.)
A @code{#define} or @code{enum} constant of the type
@code{ffe@var{mod}@var{type}},
where @var{type} is the lowercase form of @var{ctype}
in an exported typedef.
@item ffe@var{mod}_@var{value}
A function that does or returns something,
as described by @var{value} (see below).
@item ffe@var{mod}_@var{value}_@var{input}
A function that does or returns something based
primarily on the thing described by @var{input} (see below).
@end table
Below are names used for @var{value} and @var{input},
along with their definitions.
@table @code
@item col
A column number within a line (first column is number 1).
@item file
An encapsulation of a file's name.
@item find
Looks up an instance of some type that matches specified criteria,
and returns that, even if it has to create a new instance or
crash trying to find it (as appropriate).
@item initialize
Initializes, usually a module. No type.
@item int
A generic integer of type @code{int}.
@item is
A generic integer that contains a true (non-zero) or false (zero) value.
@item len
A generic integer that contains the length of something.
@item line
A line number within a source file,
or a global line number.
@item lookup
Looks up an instance of some type that matches specified criteria,
and returns that, or returns nil.
@item name
A @code{text} that points to a name of something.
@item new
Makes a new instance of the indicated type.
Might return an existing one if appropriate---if so,
similar to @code{find} without crashing.
@item pt
A pointer to a particular character, identified by a line and column pair,
in the input file (the source code being compiled).
@item run
Performs some herculean task. No type.
@item terminate
Terminates, usually a module. No type.
@item text
A @code{char *} that points to generic text.
@end table
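As an illustration, here is how the declarations of a hypothetical module @code{ffefoo} (source in @file{foo.h} and @file{foo.c}) might follow these conventions. The module, its types, its constants, and its functions are all invented for this sketch; they do not exist in the FFE.

@smallexample
#include <string.h>

/* ffe<mod>: the typedef named after the module itself. */
typedef enum
@{
  FFEFOO_NONE,          /* FFE<umod>_...: constants of type ffefoo */
  FFEFOO_READY
@} ffefoo;

/* ffe<mod><Ctype>: here ctype is "Kind", so type is "kind". */
typedef enum
@{
  FFEFOO_kindNONE,      /* FFE<umod>_<type>...: constants of type ffefooKind */
  FFEFOO_kindALPHA,
  FFEFOO_kindBETA
@} ffefooKind;

/* Not exported: the name fits none of the forms above. */
static ffefoo foo_state = FFEFOO_NONE;

/* ffe<mod>_<value> with value "initialize": no type. */
void
ffefoo_initialize (void)
@{
  foo_state = FFEFOO_READY;
@}

/* ffe<mod>_<value>_<input>: looks up a kind based on a text, and
   returns FFEFOO_kindNONE (the "nil" case) if nothing matches. */
ffefooKind
ffefoo_lookup_text (const char *text)
@{
  if (foo_state != FFEFOO_READY)
    return FFEFOO_kindNONE;
  if (strcmp (text, "alpha") == 0)
    return FFEFOO_kindALPHA;
  if (strcmp (text, "beta") == 0)
    return FFEFOO_kindBETA;
  return FFEFOO_kindNONE;
@}

/* ffe<mod>_<value> with value "terminate": no type. */
void
ffefoo_terminate (void)
@{
  foo_state = FFEFOO_NONE;
@}
@end smallexample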