@c  This file is part of flex.

@c  Copyright (c) 1990, 1997 The Regents of the University of California.
@c  All rights reserved.

@c  This code is derived from software contributed to Berkeley by
@c  Vern Paxson.

@c  The United States Government has rights in this work pursuant
@c  to contract no. DE-AC03-76SF00098 between the United States
@c  Department of Energy and the University of California.

@c   Redistribution and use in source and binary forms, with or without
@c   modification, are permitted provided that the following conditions
@c   are met:

@c   1. Redistributions of source code must retain the above copyright
@c      notice, this list of conditions and the following disclaimer.
@c   2. Redistributions in binary form must reproduce the above copyright
@c      notice, this list of conditions and the following disclaimer in the
@c      documentation and/or other materials provided with the distribution.

@c   Neither the name of the University nor the names of its contributors
@c   may be used to endorse or promote products derived from this software
@c   without specific prior written permission.

@c   THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
@c   IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
@c   WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
@c   PURPOSE.

@node FAQ
@unnumbered FAQ

@menu
* When was flex born?::         
* How do I expand \ escape sequences in C-style quoted strings?::  
* Why do flex scanners call fileno if it is not ANSI compatible?::  
* Does flex support recursive pattern definitions?::  
* How do I skip huge chunks of input (tens of megabytes) while using flex?::  
* Flex is not matching my patterns in the same order that I defined them.::  
* My actions are executing out of order or sometimes not at all.::  
* How can I have multiple input sources feed into the same scanner at the same time?::  
* Can I build nested parsers that work with the same input file?::  
* How can I match text only at the end of a file?::  
* How can I make REJECT cascade across start condition boundaries?::  
* Why cant I use fast or full tables with interactive mode?::  
* How much faster is -F or -f than -C?::  
* If I have a simple grammar cant I just parse it with flex?::  
* Why doesnt yyrestart() set the start state back to INITIAL?::  
* How can I match C-style comments?::  
* The period isnt working the way I expected.::  
* Can I get the flex manual in another format?::  
* Does there exist a "faster" NDFA->DFA algorithm?::  
* How does flex compile the DFA so quickly?::  
* How can I use more than 8192 rules?::  
* How do I abandon a file in the middle of a scan and switch to a new file?::  
* How do I execute code only during initialization (only before the first scan)?::  
* How do I execute code at termination?::  
* Where else can I find help?::  
* Can I include comments in the "rules" section of the file file?::  
* I get an error about undefined yywrap().::  
* How can I change the matching pattern at run time?::  
* Is there a way to increase the rules (NFA states to a bigger number?)::  
* How can I expand macros in the input?::  
* How can I build a two-pass scanner?::  
* How do I match any string not matched in the preceding rules?::  
* I am trying to port code from AT&T lex that uses yysptr and yysbuf.::  
* Is there a way to make flex treat NULL like a regular character?::  
* Whenever flex can not match the input it says "flex scanner jammed".::  
* Why doesnt flex have non-greedy operators like perl does?::  
* Memory leak - 16386 bytes allocated by malloc.::  
* How do I track the byte offset for lseek()?::  
* unnamed-faq-16::              
* How do I skip as many chars as possible?::  
* unnamed-faq-33::              
* unnamed-faq-42::              
* unnamed-faq-43::              
* unnamed-faq-44::              
* unnamed-faq-45::              
* unnamed-faq-46::              
* unnamed-faq-47::              
* unnamed-faq-48::              
* unnamed-faq-49::              
* unnamed-faq-50::              
* unnamed-faq-51::              
* unnamed-faq-52::              
* unnamed-faq-53::              
* unnamed-faq-54::              
* unnamed-faq-55::              
* unnamed-faq-56::              
* unnamed-faq-57::              
* unnamed-faq-58::              
* unnamed-faq-59::              
* unnamed-faq-60::              
* unnamed-faq-61::              
* unnamed-faq-62::              
* unnamed-faq-63::              
* unnamed-faq-64::              
* unnamed-faq-65::              
* unnamed-faq-66::              
* unnamed-faq-67::              
* unnamed-faq-68::              
* unnamed-faq-69::              
* unnamed-faq-70::              
* unnamed-faq-71::              
* unnamed-faq-72::              
* unnamed-faq-73::              
* unnamed-faq-74::              
* unnamed-faq-75::              
* unnamed-faq-76::              
* unnamed-faq-77::              
* unnamed-faq-78::              
* unnamed-faq-79::              
* unnamed-faq-80::              
* unnamed-faq-81::              
* unnamed-faq-82::              
* unnamed-faq-83::              
* unnamed-faq-84::              
* unnamed-faq-85::              
* unnamed-faq-86::              
* unnamed-faq-87::              
* unnamed-faq-88::              
* unnamed-faq-89::              
* unnamed-faq-90::              
* unnamed-faq-91::              
* unnamed-faq-92::              
* unnamed-faq-93::              
* unnamed-faq-94::              
* unnamed-faq-95::              
* unnamed-faq-96::              
* unnamed-faq-97::              
* unnamed-faq-98::              
* unnamed-faq-99::              
* unnamed-faq-100::             
* unnamed-faq-101::             
@end menu

@node  When was flex born?
@unnumberedsec When was flex born?

Vern Paxson took over
the @cite{Software Tools} lex project from Jef Poskanzer in 1982.  At that point it
was written in Ratfor.  Around 1987 or so, Paxson translated it into C, and
a legend was born :-).

@node How do I expand \ escape sequences in C-style quoted strings?
@unnumberedsec How do I expand \ escape sequences in C-style quoted strings?

A key point when scanning quoted strings is that you cannot (easily) write
a single rule that will precisely match the string if you allow things
like embedded escape sequences and newlines.  If you try to match strings
with a single rule then you'll wind up having to rescan the string anyway
to find any escape sequences.

Instead you can use exclusive start conditions and a set of rules, one for
matching non-escaped text, one for matching a single escape, one for
matching an embedded newline, and one for recognizing the end of the
string.  Each of these rules is then faced with the question of where to
put its intermediary results.  The best solution is for the rules to
append their local value of @code{yytext} to the end of a ``string literal''
buffer.  A rule like the escape-matcher will append to the buffer the
meaning of the escape sequence rather than the literal text in @code{yytext}.
In this way, @code{yytext} does not need to be modified at all.
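
The approach described above can be sketched as follows.  This is only an
illustration, not code from the flex distribution: the start condition
@code{STR}, the fixed-size buffer @code{str_buf}, and the helpers
@code{str_append} and @code{str_append_char} are hypothetical names, and
only the @samp{\n} and @samp{\t} escapes are given a special meaning.

@example
@verbatim
%option noyywrap
%x STR
%{
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size "string literal" buffer. */
static char str_buf[1024];
static size_t str_len;

/* Append n bytes; text that does not fit is silently dropped. */
static void str_append(const char *text, size_t n)
{
    if (str_len + n < sizeof str_buf) {
        memcpy(str_buf + str_len, text, n);
        str_len += n;
    }
}

static void str_append_char(char c) { str_append(&c, 1); }
%}
%%
\"                { str_len = 0; BEGIN(STR); }
<STR>\"           { str_buf[str_len] = '\0';
                    printf("string: %s\n", str_buf);
                    BEGIN(INITIAL); }
<STR>\\n          { str_append_char('\n'); }    /* meaning of the escape */
<STR>\\t          { str_append_char('\t'); }
<STR>\\(.|\n)     { str_append_char(yytext[1]); } /* any other escape */
<STR>\n           { str_append_char('\n'); }    /* embedded newline */
<STR>[^\\\n\"]+   { str_append(yytext, (size_t) yyleng); }
.|\n              { /* ignore text outside strings */ }
%%
int main(void) { return yylex(); }
@end verbatim
@end example

Each rule only appends to the buffer, so @code{yytext} is never modified.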

@node  Why do flex scanners call fileno if it is not ANSI compatible?
@unnumberedsec Why do flex scanners call fileno if it is not ANSI compatible?

Flex scanners call @code{fileno()} in order to get the file descriptor
corresponding to @code{yyin}. The file descriptor may be passed to
@code{isatty()} or @code{read()}, depending upon which @code{%options} you specified.
If your system does not support @code{fileno()}, you can avoid both calls.
To get rid of the @code{read()} call, do not specify @code{%option read}.
To get rid of the @code{isatty()} call, you must specify one of
@code{%option always-interactive} or @code{%option never-interactive}.
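
As a minimal sketch, a scanner meant to build without the @code{isatty()}
call could declare its options as shown here.  The single rule and the
@code{main} function are placeholders, and @code{%option read} is simply
left out.

@example
@verbatim
%option never-interactive
%option noyywrap
%%
.|\n    { ECHO; }
%%
int main(void) { return yylex(); }
@end verbatim
@end example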

@node  Does flex support recursive pattern definitions?
@unnumberedsec Does flex support recursive pattern definitions?

For example, can a definition refer to itself, as in the following?

@example
@verbatim
block   "{"({block}|{statement})*"}"
%%
@end verbatim
@end example

No. You cannot have recursive definitions.  The pattern-matching power of
regular expressions in general (and therefore flex scanners, too) is
limited.  In particular, regular expressions cannot "balance" parentheses
to an arbitrary degree.  For example, it's impossible to write a regular
expression that matches all strings containing the same number of '@{'s
as '@}'s.  For more powerful pattern matching, you need a parser, such
as GNU bison.

@node  How do I skip huge chunks of input (tens of megabytes) while using flex?
@unnumberedsec How do I skip huge chunks of input (tens of megabytes) while using flex?

Use @code{fseek()} (or @code{lseek()}) to position @code{yyin}, then call
@code{yyrestart()} so that the scanner discards the input it has already
buffered.
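
A minimal sketch of this technique follows.  The marker text
@samp{BEGIN-BLOB} and the variable @code{resume_offset} are hypothetical,
and the sketch assumes that the input is a seekable file and that the
position at which scanning should resume is already known.

@example
@verbatim
%option noyywrap
%{
#include <stdio.h>

/* Hypothetical: absolute offset at which scanning should resume. */
static long resume_offset = 64L * 1024L * 1024L;
%}
%%
"BEGIN-BLOB"    {
                  /* Seek to an absolute position: flex reads ahead, so
                   * the current position of yyin is usually past the
                   * text matched so far. */
                  if (fseek(yyin, resume_offset, SEEK_SET) != 0)
                      yyterminate();
                  yyrestart(yyin);   /* discard flex's buffered input */
                }
.|\n            { ECHO; }
%%
int main(int argc, char **argv)
{
    if (argc > 1 && (yyin = fopen(argv[1], "rb")) == NULL)
        return 1;
    return yylex();
}
@end verbatim
@end example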

@node  Flex is not matching my patterns in the same order that I defined them.
@unnumberedsec Flex is not matching my patterns in the same order that I defined them.

Matching rules in the order in which they were defined is indeed the natural
way to expect it to work.  However, flex picks the
rule that matches the most text (i.e., the longest possible input string).
This is because flex uses an entirely different matching technique
(``deterministic finite automata'') that actually does all of the matching
simultaneously, in parallel.  (This seems impossible, but it's actually a fairly
simple technique once you understand the principles.)

A side-effect of this parallel matching is that when the input matches more
than one rule, flex scanners pick the rule that matched the @emph{most} text.
The order of the rules matters only when two rules match the same amount of
text, in which case the rule listed first is chosen.  This
is explained further in the manual, in the section ``How the input
is Matched''.

If you want flex to choose a shorter match, then you can work around this
behavior by expanding your short
rule to match more text, then putting back the extra:

@example
@verbatim
data_.*        yyless( 5 ); BEGIN BLOCKIDSTATE;
@end verbatim
@end example

Another fix would be to make the second rule active only during the
<BLOCKIDSTATE> start condition, and make that start condition exclusive
by declaring it with %x instead of %s.

A final fix is to change the input language so that the ambiguity for
data_ is removed, by adding characters to it that don't match the
identifier rule, or by removing characters (such as '_') from the
identifier rule so it no longer matches "data_".  (Of course, you might
also not have the option of changing the input language ...)
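
The first workaround can be seen in a small complete scanner.  This is a
sketch under assumptions: the definition @code{ID} and the printed messages
are hypothetical stand-ins for the identifier rule mentioned above.
Without the @samp{.*} in the first rule, the input @samp{data_foo} would be
matched by the identifier rule, because that match is longer than
@samp{data_}.

@example
@verbatim
%option noyywrap
%x BLOCKIDSTATE
ID      [A-Za-z_][A-Za-z0-9_]*
%%
data_.*             { yyless(5);   /* keep "data_", put back the rest */
                      BEGIN(BLOCKIDSTATE); }
<BLOCKIDSTATE>{ID}  { printf("block id: %s\n", yytext);
                      BEGIN(INITIAL); }
{ID}                { printf("identifier: %s\n", yytext); }
<*>.|\n             { /* ignore everything else */ }
%%
int main(void) { return yylex(); }
@end verbatim
@end example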

@node  My actions are executing out of order or sometimes not at all.
@unnumberedsec My actions are executing out of order or sometimes not at all.

My actions are executing out of order or sometimes not at all. What's
happening?

Most likely, you have (in error) placed the opening @samp{@{} of the action
block on a different line than the rule, e.g.,

@example
@verbatim
^(foo|bar)
{  <<<--- WRONG!

}
@end verbatim
@end example

flex requires that the opening @samp{@{} of an action associated with a rule
begin on the same line as does the rule.  You need instead to write your rules
as follows:

@example
@verbatim
^(foo|bar)   {  // CORRECT!

}
@end verbatim
@end example

@node  How can I have multiple input sources feed into the same scanner at the same time?
@unnumberedsec How can I have multiple input sources feed into the same scanner at the same time?

If...
@itemize
@item
your scanner is free of backtracking (verified using flex's -b flag),
@item
AND you run it interactively (-I option; default unless using special table
compression options),
@item
AND you feed it one character at a time by redefining YY_INPUT to do so,
@end itemize

then every time it matches a token, it will have exhausted its input
buffer (because the scanner is free of backtracking).  This means you
can safely use @code{select()} at that point and only call @code{yylex()}
for another token if @code{select()} indicates there's data available.

That is, move the select() out from the input function to a point where
it determines whether yylex() gets called for the next token.

With this approach, you will still have problems if your input can arrive
piecemeal; select() could inform you that the beginning of a token is
available, you call yylex() to get it, but it winds up blocking waiting
for the later characters in the token.

Here's another way:  Move your input multiplexing inside of YY_INPUT.  That
is, whenever YY_INPUT is called, it select()'s to see where input is
available.  If input is available for the scanner, it reads and returns the
next byte.  If input is available from another source, it calls whatever
function is responsible for reading from that source.  (If no input is
available, it blocks until some is.)  I've used this technique in an
interpreter I wrote that both reads keyboard input using a flex scanner and
IPC traffic from sockets, and it works fine.
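
The second approach might look like the following sketch.  It assumes a
POSIX system; @code{other_fd}, @code{handle_other_source} and
@code{mux_read} are hypothetical names, and the first two must be supplied
by the application.  The handler is assumed to consume the pending data (or
deal with end of file) on its descriptor, since otherwise @code{select()}
would keep reporting it as readable.

@example
@verbatim
%option noyywrap
%{
#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical: a second input source and the function that reads it.
 * Both are assumed to be defined elsewhere in the application. */
extern int other_fd;
extern void handle_other_source(int fd);

/* Wait until some source is readable.  Returns 1 after storing one
 * byte of scanner input in buf, or 0 at end of input. */
static int mux_read(FILE *in, char *buf)
{
    int scanner_fd = fileno(in);
    int max_fd = scanner_fd > other_fd ? scanner_fd : other_fd;

    for (;;) {
        fd_set readable;

        FD_ZERO(&readable);
        FD_SET(scanner_fd, &readable);
        FD_SET(other_fd, &readable);

        if (select(max_fd + 1, &readable, NULL, NULL, NULL) < 0)
            return 0;               /* treat an error as end of input */
        if (FD_ISSET(other_fd, &readable))
            handle_other_source(other_fd);
        if (FD_ISSET(scanner_fd, &readable))
            return read(scanner_fd, buf, 1) == 1 ? 1 : 0;
    }
}

#define YY_INPUT(buf, result, max_size) \
    { (result) = mux_read(yyin, (buf)); }
%}
%%
.|\n    { ECHO; }
%%
@end verbatim
@end example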

@node  Can I build nested parsers that work with the same input file?
@unnumberedsec Can I build nested parsers that work with the same input file?

This is not going to work without some additional effort.  The reason is
that flex block-buffers the input it reads from yyin.  This means that the
"outermost" yylex(), when called, will automatically slurp up the first 8K
of input available on yyin, and subsequent calls to other yylex()'s won't
see that input.  You might be tempted to work around this problem by
redefining YY_INPUT to only return a small amount of text, but it turns out
that that approach is quite difficult.  Instead, the best solution is to
combine all of your scanners into one large scanner, using a different
exclusive start condition for each.
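
As an illustration, here is a sketch of two small scanners merged into one.
The start conditions @code{HEADER} and @code{BODY}, the keywords that
select them, and the printed messages are all hypothetical.

@example
@verbatim
%option noyywrap
%x HEADER BODY
%%
"header:"        { BEGIN(HEADER); }   /* hand over to the first scanner */
"body:"          { BEGIN(BODY); }     /* hand over to the second scanner */
.|\n             { /* skip anything else */ }

<HEADER>[a-z]+   { printf("header word: %s\n", yytext); }
<HEADER>\n       { BEGIN(INITIAL); }
<HEADER>.        { /* skip */ }

<BODY>[0-9]+     { printf("body number: %s\n", yytext); }
<BODY>\n         { BEGIN(INITIAL); }
<BODY>.          { /* skip */ }
%%
int main(void) { return yylex(); }
@end verbatim
@end example

Because the start conditions are exclusive, the rules of one sub-scanner are
never active while another is running, and all of them share a single input
buffer.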

@node  How can I match text only at the end of a file?
@unnumberedsec How can I match text only at the end of a file?

There is no way to write a rule which is "match this text, but only if
it comes at the end of the file".  You can fake it, though, if you happen
to have a character lying around that you don't allow in your input.
Then you redefine YY_INPUT to call your own routine which, if it sees
an EOF, returns the magic character first (and remembers to return a
real EOF next time it's called).  Then you could write:

@example
@verbatim
<COMMENT>(.|\n)*{EOF_CHAR}    /* saw comment at EOF */
@end verbatim
@end example
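
The @code{YY_INPUT} redefinition could be sketched as follows.  The choice
of the byte @samp{\001} as the magic character, the helper
@code{read_with_eof_mark}, and the rule that reports trailing whitespace
are assumptions made for illustration.  The helper delivers the marker only
once, so this sketch is suitable for scanning a single input file.

@example
@verbatim
%option noyywrap
%{
#include <stdio.h>

/* Assumption: the byte \001 never occurs in legitimate input. */
#define EOF_MARK '\001'

/* Hypothetical reader: passes input through, then delivers EOF_MARK
 * once before reporting the real end of file. */
static int read_with_eof_mark(FILE *in, char *buf, int max_size)
{
    static int mark_sent = 0;
    size_t n = fread(buf, 1, (size_t) max_size, in);

    if (n > 0)
        return (int) n;
    if (!mark_sent) {
        mark_sent = 1;
        buf[0] = EOF_MARK;
        return 1;
    }
    return 0;    /* the real EOF */
}

#define YY_INPUT(buf, result, max_size) \
    { (result) = read_with_eof_mark(yyin, (buf), (int) (max_size)); }
%}
EOF_CHAR    \001
%%
[ \t\n]+{EOF_CHAR}    { printf("[whitespace at end of file]\n"); }
{EOF_CHAR}            { /* discard the marker itself */ }
.|\n                  { ECHO; }
%%
int main(void) { return yylex(); }
@end verbatim
@end example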

@node  How can I make REJECT cascade across start condition boundaries?
@unnumberedsec How can I make REJECT cascade across start condition boundaries?

You can do this as follows.  Suppose you have a start condition A, and
after exhausting all of the possible matches in <A>, you want to try
matches in <INITIAL>.  Then you could use the following:

@example
@verbatim
%x A
%%
<A>rule_that_is_long    ...; REJECT;
<A>rule                 ...; REJECT; /* shorter rule */
<A>etc.
...
<A>.|\n  {
           /* Shortest and last rule in <A>, so
            * cascaded REJECT's will eventually
            * wind up matching this rule.  We want
            * to now switch to the initial state
            * and try matching from there instead.
            */
           yyless(0);    /* put back matched text */
           BEGIN(INITIAL);
         }
@end verbatim
@end example

@node  Why cant I use fast or full tables with interactive mode?
@unnumberedsec Why can't I use fast or full tables with interactive mode?

One of the assumptions
flex makes is that interactive applications are inherently slow (they're
waiting on a human, after all).
The restriction itself has to do with how the scanner detects that it has
finished scanning
a token.  For interactive scanners, after scanning each character the current
state is looked up in a table (essentially) to see whether there's a chance
of another input character possibly extending the length of the match.  If
not, the scanner halts.  For non-interactive scanners, the end-of-token test
is much simpler, basically a compare with 0, so no memory bus cycles.  Since
the test occurs in the innermost scanning loop, one would like to make it go
as fast as possible.

Still, it seems reasonable to allow the user to choose to trade off a bit
of performance in this area to gain the corresponding flexibility.  There
might be another reason, though, why fast scanners don't support the
interactive option.

@node  How much faster is -F or -f than -C?
@unnumberedsec How much faster is -F or -f than -C?

Much faster (factor of 2-3).

@node  If I have a simple grammar cant I just parse it with flex?
@unnumberedsec If I have a simple grammar can't I just parse it with flex?

Is your grammar recursive?  That's almost always a sign that you're
better off using a parser together with a scanner rather than trying to use
a scanner alone.

@node  Why doesnt yyrestart() set the start state back to INITIAL?
@unnumberedsec Why doesn't yyrestart() set the start state back to INITIAL?

There are two reasons.  The first is that there might
be programs that rely on the start state not changing across file changes.
The second is that with flex 2.4, use of yyrestart() is no longer required,
so fixing the problem there doesn't solve the more general problem.
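
If you do want every file to start in the initial state, you can reset the
start state yourself when switching files.  The following is a sketch: the
helper @code{scan_file}, the start condition @code{IN_STRING} and the rules
are hypothetical.  An unterminated string in one file would otherwise leave
the scanner in @code{IN_STRING} when the next file begins.

@example
@verbatim
%option noyywrap
%x IN_STRING
%%
\"                { BEGIN(IN_STRING); }
<IN_STRING>\"     { BEGIN(INITIAL); }
<IN_STRING>.|\n   { /* skip string contents */ }
.|\n              { ECHO; }
%%
/* Hypothetical helper: scan one file, starting from a known state. */
static int scan_file(const char *path)
{
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    yyrestart(fp);      /* switch input; the start state is unchanged */
    BEGIN(INITIAL);     /* so reset it explicitly */
    yylex();
    fclose(fp);
    return 0;
}

int main(int argc, char **argv)
{
    int i, status = 0;

    for (i = 1; i < argc; i++)
        if (scan_file(argv[i]) != 0)
            status = 1;
    return status;
}
@end verbatim
@end example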

@node  How can I match C-style comments?
@unnumberedsec How can I match C-style comments?

You might be tempted to try something like this:

@example
@verbatim
"/*".*"*/"       // WRONG!
@end verbatim
@end example

or, worse, this:

@example
@verbatim
"/*"(.|\n)*"*/"  // WRONG!
@end verbatim
@end example

The above rules will eat too much input, and blow up on things like:

@example
@verbatim
/* a comment */ do_my_thing( "oops */" );
@end verbatim
@end example

Here is one way which allows you to track line information:

@example
@verbatim
%x IN_COMMENT
%%
<INITIAL>{
"/*"      BEGIN(IN_COMMENT);
}
<IN_COMMENT>{
"*/"      BEGIN(INITIAL);
[^*\n]+   // eat comment in chunks
"*"       // eat the lone star
\n        yylineno++;
}
@end verbatim
@end example

@node  The period isnt working the way I expected.
@unnumberedsec The '.' isn't working the way I expected.

Here are some tips for using @samp{.}:

@itemize
@item
A common mistake is to place the closing parenthesis of a group AFTER an
operator, when you really meant to place it BEFORE the operator, e.g., you
probably want this @code{(foo|bar)+} and NOT this @code{(foo|bar+)}.

The first pattern matches the words @code{foo} or @code{bar} one or more
times, e.g., it matches the text @code{barfoofoobarfoo}. The
second pattern matches a single instance of @code{foo} or a single instance of
@code{ba} followed by one or more @samp{r}s, e.g., it matches the text @code{barrrr}.
@item
A @samp{.} inside @samp{[]}s just means a literal @samp{.} (period),
and NOT ``any character except newline''.
@item
Remember that @samp{.} matches any character EXCEPT @samp{\n} (and EOF).
If you really want to match ANY character, including newlines, then use @code{(.|\n)}.
Beware that the regex @code{(.|\n)+} will match your entire input!
@item
Finally, if you want to match a literal @samp{.} (a period), then use
@samp{[.]} or @samp{"."}.
@end itemize

@node  Can I get the flex manual in another format?
@unnumberedsec Can I get the flex manual in another format?

Can I get the flex manual in another format?

As of flex 2.5, the manual is distributed in texinfo format.
You can use the "texi2*" tools to convert the manual to any format
you desire (e.g., @samp{texi2html}).

@node  Does there exist a "faster" NDFA->DFA algorithm?
@unnumberedsec Does there exist a "faster" NDFA->DFA algorithm?

Does there exist a "faster" NDFA->DFA algorithm? Most standard texts (e.g.,
Aho) imply that NDFA->DFA can take exponential time, since the number of
potential DFA states is exponential in the number of NDFA states.

There's no way around the potential exponential running time - it
can take you exponential time just to enumerate all of the DFA states.
In practice, though, the running time is closer to linear, or sometimes
quadratic.

@node  How does flex compile the DFA so quickly?
@unnumberedsec How does flex compile the DFA so quickly?

How does flex compile the DFA so quickly?

There are two big speed wins that flex uses:

@enumerate
@item
It analyzes the input rules to construct equivalence classes for those
characters that always make the same transitions.  It then rewrites the NFA
using equivalence classes for transitions instead of characters.  This cuts
down the NFA->DFA computation time dramatically, to the point where, for
uncompressed DFA tables, the DFA generation is often I/O bound in writing out
the tables.
@item
It maintains hash values for previously computed DFA states, so testing
whether a newly constructed DFA state is equivalent to a previously constructed
state can be done very quickly, by first comparing hash values.
@end enumerate
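
The first speed win can be illustrated with a small, self-contained C sketch.
It is not flex's actual implementation; the names @code{build_ecs} and
@code{add_range}, and the choice to represent each character set as a
membership table, are assumptions made for brevity. Two characters land in
the same class exactly when they belong to the same sets.

@example
@verbatim
```c
#define NCHARS 256

/* Hypothetical sketch: sets[k][c] is nonzero if character c belongs to
 * the k-th character set used by the rules.  On return, ec[c] holds the
 * equivalence class number (starting at 1) of character c.
 * Returns the number of classes. */
int build_ecs(int nsets, unsigned char sets[][NCHARS], int ec[NCHARS])
{
    int nclasses = 0;
    int rep[NCHARS];   /* rep[j] = representative character of class j+1 */

    for (int c = 0; c < NCHARS; c++) {
        int found = 0;
        for (int j = 0; j < nclasses && !found; j++) {
            int same = 1;
            for (int k = 0; k < nsets; k++) {
                if ((sets[k][c] != 0) != (sets[k][rep[j]] != 0)) {
                    same = 0;
                    break;
                }
            }
            if (same) {
                ec[c] = j + 1;       /* joins an existing class */
                found = 1;
            }
        }
        if (!found) {
            rep[nclasses] = c;       /* c starts a new class */
            ec[c] = ++nclasses;
        }
    }
    return nclasses;
}

/* Convenience: mark every character in [lo, hi] as a member of set. */
void add_range(unsigned char set[NCHARS], int lo, int hi)
{
    for (int c = lo; c <= hi; c++)
        set[c] = 1;
}
```
@end verbatim
@end example

With one set for letters and one for digits, the 256 input characters
collapse into three classes (letters, digits, everything else), so each state
needs three transitions to be computed instead of 256.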

@node  How can I use more than 8192 rules?
@unnumberedsec How can I use more than 8192 rules?

How can I use more than 8192 rules?

Flex is compiled with an upper limit of 8192 rules per scanner.
If you need more than 8192 rules in your scanner, you'll have to recompile flex
with the following changes in flexdef.h:

@example
@verbatim
<    #define YY_TRAILING_MASK 0x2000
<    #define YY_TRAILING_HEAD_MASK 0x4000
--
>    #define YY_TRAILING_MASK 0x20000000
>    #define YY_TRAILING_HEAD_MASK 0x40000000
@end verbatim
@end example

This should work okay as long as your C compiler uses 32-bit integers.
But you might want to think about whether using such a huge number of rules
is the best way to solve your problem.
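
To see why these masks cap the rule count, here is a sketch that assumes a
rule number and the two flag bits share a single integer, so that rule numbers
must stay below the lowest flag bit. The helper names are invented for this
illustration and do not appear in flex.

@example
@verbatim
```c
#define YY_TRAILING_MASK      0x2000
#define YY_TRAILING_HEAD_MASK 0x4000

/* Hypothetical helper: tag a rule number with one of the flag bits. */
int tag_rule(int rule, int flag)
{
    return rule | flag;
}

/* Hypothetical helper: strip both flag bits to recover the rule number. */
int rule_of(int tagged)
{
    return tagged & ~(YY_TRAILING_MASK | YY_TRAILING_HEAD_MASK);
}

/* A rule number survives the round trip only if it does not collide
 * with a flag bit, i.e., if it is below 0x2000 (8192). */
int rule_fits(int rule)
{
    return rule >= 0 && rule < YY_TRAILING_MASK;
}
```
@end verbatim
@end example

Moving the masks up to higher bits, as in the change above, leaves more low
bits free for the rule number.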

@node  How do I abandon a file in the middle of a scan and switch to a new file?
@unnumberedsec How do I abandon a file in the middle of a scan and switch to a new file?

How do I abandon a file in the middle of a scan and switch to a new file?

Just call @code{yyrestart(newfile)}. Be sure to reset the start state if you want a
"fresh" start, since @code{yyrestart()} does NOT reset the start state back to INITIAL.

@node  How do I execute code only during initialization (only before the first scan)?
@unnumberedsec How do I execute code only during initialization (only before the first scan)?

How do I execute code only during initialization (only before the first scan)?

You can specify an initial action by defining the macro YY_USER_INIT (though
note that yyout may not be available at the time this macro is executed).  Or you
can add to the beginning of your rules section:

@example
@verbatim
%%
    /* Must be indented! */
    static int did_init = 0;

    if ( ! did_init ){
        do_my_init();
        did_init = 1;
    }
@end verbatim
@end example

@node  How do I execute code at termination?
@unnumberedsec How do I execute code at termination?

How do I execute code at termination (i.e., only after the last scan)?

You can specify an action for the @code{<<EOF>>} rule.

@node  Where else can I find help?
@unnumberedsec Where else can I find help?

Where else can I find help?

The @code{help-flex} email list is served by GNU. See http://www.gnu.org/ for
details on how to subscribe or search the archives.

@node  Can I include comments in the "rules" section of the file file?
@unnumberedsec Can I include comments in the "rules" section of the file file?

Can I include comments in the "rules" section of the file?

Yes, just about anywhere you want to. See the manual for the specific syntax.

@node  I get an error about undefined yywrap().
@unnumberedsec I get an error about undefined yywrap().

I get an error about undefined yywrap().

You must supply a yywrap() function of your own, or link to libfl.a
(which provides one), or use

@example
@verbatim
%option noyywrap
@end verbatim
@end example

in your source to say you don't want a yywrap() function.
See the manual page for more details concerning yywrap().
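
If you choose to supply your own, a minimal definition could look like the
following sketch. Returning a nonzero value tells the scanner that there is no
further input; a more elaborate version would point @code{yyin} at another
file and return 0 instead.

@example
@verbatim
```c
/* Minimal user-supplied yywrap(): report that no more input follows. */
int yywrap(void)
{
    return 1;
}
```
@end verbatim
@end example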

@node  How can I change the matching pattern at run time?
@unnumberedsec How can I change the matching pattern at run time?

How can I change the matching pattern at run time?

You can't. The patterns are compiled into static tables when flex builds the scanner.

@node  Is there a way to increase the rules (NFA states to a bigger number?)
@unnumberedsec Is there a way to increase the rules (NFA states to a bigger number?)

Is there a way to increase the rules (NFA states) to a bigger number?

With luck, you should be able to increase the definitions in flexdef.h for:

@example
@verbatim
#define JAMSTATE -32766 /* marks a reference to the state that always jams */
#define MAXIMUM_MNS 31999
#define BAD_SUBSCRIPT -32767
@end verbatim
@end example

recompile everything, and it'll all work.  Flex only has these 16-bit-like
values built into it because a long time ago it was developed on a machine
with 16-bit ints.  I've given this advice to others in the past but haven't
heard back from them whether it worked okay or not...

@node How can I expand macros in the input?
@unnumberedsec How can I expand macros in the input?

How can I expand macros in the input?

The best way to approach this problem is at a higher level, e.g., in the parser.

However, you can do this using multiple input buffers.

@example
@verbatim
%{
/* expand() is assumed to be a user-supplied function that returns the
 * replacement text for a macro name. */
static YY_BUFFER_STATE main_buffer;
static YY_BUFFER_STATE expansion_buffer = 0;
%}
%%
macro/[a-z]+	{
    /* Saw the macro "macro" followed by extra stuff. */
    main_buffer = YY_CURRENT_BUFFER;
    expansion_buffer = yy_scan_string(expand(yytext));
    yy_switch_to_buffer(expansion_buffer);
}

<<EOF>>	{
    if ( expansion_buffer )
    {
        /* We were doing an expansion, return to where
         * we were. */
        yy_switch_to_buffer(main_buffer);
        yy_delete_buffer(expansion_buffer);
        expansion_buffer = 0;
    }
    else
        yyterminate();
}
@end verbatim
@end example

You will probably want a stack of expansion buffers to allow nested macros.
Hopefully the idea is clear from the above, though.
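
As a rough illustration of such a stack, here is a self-contained C sketch
that uses plain strings as stand-ins for flex buffers. The structure and the
function names (@code{input_stack}, @code{push_input}, @code{next_char}) are
hypothetical; in a real scanner the push would save @code{YY_CURRENT_BUFFER}
and the pop would happen in the EOF rule.

@example
@verbatim
```c
#include <stddef.h>

#define MAX_DEPTH 16

/* Hypothetical stand-in for a stack of flex input buffers: each entry is
 * a string being read plus the current read position. */
struct input_stack {
    const char *text[MAX_DEPTH];
    size_t pos[MAX_DEPTH];
    int depth;
};

/* Start reading from s; returns 0 on success, -1 if nesting is too deep. */
int push_input(struct input_stack *st, const char *s)
{
    if (st->depth >= MAX_DEPTH)
        return -1;
    st->text[st->depth] = s;
    st->pos[st->depth] = 0;
    st->depth++;
    return 0;
}

/* Return the next character, popping exhausted inputs so that reading
 * resumes in the enclosing one.  Returns -1 when every input is
 * exhausted. */
int next_char(struct input_stack *st)
{
    while (st->depth > 0) {
        int top = st->depth - 1;
        char c = st->text[top][st->pos[top]];
        if (c != '\0') {
            st->pos[top]++;
            return (unsigned char)c;
        }
        st->depth--;    /* this expansion is done; resume the outer one */
    }
    return -1;
}
```
@end verbatim
@end example

Pushing a new string while another is only partly read models a macro whose
expansion itself contains a macro; reading continues in the outer text once
the inner one runs out.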

@node How can I build a two-pass scanner?
@unnumberedsec How can I build a two-pass scanner?

How can I build a two-pass scanner?

One way to do it is to filter the first pass to a temporary file,
then process the temporary file on the second pass. You will probably see a
performance hit, due to all the disk I/O.

When you need to look ahead far forward like this, it almost always means
that the right solution is to build a parse tree of the entire input, then
walk it after the parse in order to generate the output.  In a sense, this
is a two-pass approach, once through the text and once through the parse
tree, but the performance hit for the latter is usually an order of magnitude
smaller, since everything is already classified, in binary format, and
residing in memory.

@node How do I match any string not matched in the preceding rules?
@unnumberedsec How do I match any string not matched in the preceding rules?

How do I match any string not matched in the preceding rules?

One way to assign precedence is to place the more specific rules first. If
two rules would match the same input (same sequence of characters) then the
first rule listed in the flex input wins, e.g.,

@example
@verbatim
%%
foo[a-zA-Z_]+    return FOO_ID;
bar[a-zA-Z_]+    return BAR_ID;
[a-zA-Z_]+       return GENERIC_ID;
@end verbatim
@end example

Note that the rule @code{[a-zA-Z_]+} must come @emph{after} the others.  It will match the
same amount of text as the more specific rules, and in that case the
flex scanner will pick the first rule listed in your scanner as the
one to match.
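
The selection logic can be modeled in a few lines of C. This is a sketch of
the decision only (longest match first, then earliest rule), with hand-written
matchers standing in for the three patterns; the function names and the
@code{NO_MATCH} value are assumptions made for the example.

@example
@verbatim
```c
#include <ctype.h>
#include <string.h>

enum token { NO_MATCH, FOO_ID, BAR_ID, GENERIC_ID };

/* Length of the leading run of [a-zA-Z_] characters in s. */
static size_t ident_len(const char *s)
{
    size_t n = 0;
    while (isalpha((unsigned char)s[n]) || s[n] == '_')
        n++;
    return n;
}

/* Match length of prefix[a-zA-Z_]+ at the start of s, or 0 if no match. */
static size_t prefixed_len(const char *s, const char *prefix)
{
    size_t p = strlen(prefix);
    size_t rest;

    if (strncmp(s, prefix, p) != 0)
        return 0;
    rest = ident_len(s + p);
    return rest > 0 ? p + rest : 0;
}

/* Hypothetical model of the scanner's choice: the longest match wins,
 * and on a tie the rule listed first wins. */
enum token classify(const char *s)
{
    size_t len[3];
    enum token tok[3] = { FOO_ID, BAR_ID, GENERIC_ID };
    enum token best = NO_MATCH;
    size_t best_len = 0;

    len[0] = prefixed_len(s, "foo");
    len[1] = prefixed_len(s, "bar");
    len[2] = ident_len(s);

    for (int i = 0; i < 3; i++) {
        if (len[i] > best_len) {  /* strict '>' keeps the earlier rule on ties */
            best_len = len[i];
            best = tok[i];
        }
    }
    return best;
}
```
@end verbatim
@end example

For the input @code{foobar} both the first and the third pattern match six
characters, and the first one is chosen. For the input @code{foo} only the
generic pattern matches, because the more specific one requires at least one
character after the prefix.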

@node I am trying to port code from AT&T lex that uses yysptr and yysbuf.
@unnumberedsec I am trying to port code from AT&T lex that uses yysptr and yysbuf.

I am trying to port code from AT&T lex that uses yysptr and yysbuf.

Those are internal variables pointing into the AT&T scanner's input buffer.  I
imagine they're being manipulated in user versions of the input() and unput()
functions.  If so, what you need to do is analyze those functions to figure out
what they're doing, and then replace input() with an appropriate definition of
YY_INPUT (see the flex man page).  You shouldn't need to (and must not) replace
flex's unput() function.

@node Is there a way to make flex treat NULL like a regular character?
@unnumberedsec Is there a way to make flex treat NULL like a regular character?

Is there a way to make flex treat NULL like a regular character?

Yes, @samp{\0} and @samp{\x00} should both do the trick.  Perhaps you have an ancient
version of flex.  The latest release is version @value{VERSION}.

@node Whenever flex can not match the input it says "flex scanner jammed".
@unnumberedsec Whenever flex can not match the input it says "flex scanner jammed".

Whenever flex can not match the input it says "flex scanner jammed".

You need to add a rule that matches the otherwise-unmatched text, e.g.,

@example
@verbatim
%option yylineno
%%
[[a bunch of rules here]]

.	printf("bad input character '%s' at line %d\n", yytext, yylineno);
@end verbatim
@end example

See @code{%option default} for more information.

@node Why doesnt flex have non-greedy operators like perl does?
@unnumberedsec Why doesn't flex have non-greedy operators like perl does?

A DFA can do a non-greedy match by stopping
the first time it enters an accepting state, instead of consuming input until
it determines that no further matching is possible (a ``jam'' state).  This
is actually easier to implement than longest leftmost match (which flex does).

But it's also much less useful than longest leftmost match.  In general,
when you find yourself wishing for non-greedy matching, that's usually a
sign that you're trying to make the scanner do some parsing.  That's
generally the wrong approach, since a scanner lacks the power to do a decent job.
Better is to either introduce a separate parser, or to split the scanner
into multiple scanners using (exclusive) start conditions.

For example, to scan a chunk of text delimited by BEGIN and END, you might
enter a separate start state once you've seen the BEGIN. In that state, you
might then have a regex that will match END (to kick you out of the
state), and perhaps @code{(.|\n)} to get a single character within the chunk.

This approach also has much better error-reporting properties.

@node Memory leak - 16386 bytes allocated by malloc.
@unnumberedsec Memory leak - 16386 bytes allocated by malloc.
@anchor{faq-memory-leak}
UPDATED 2002-07-10: As of flex version 2.5.9, this leak means that you did not
call yylex_destroy(). If you are using an earlier version of flex, then read
on.

The leak is about 16426 bytes.  That is, (8192 * 2 + 2) for the read-buffer, and
about 40 for struct yy_buffer_state (depending upon alignment). The leak is in
the non-reentrant C scanner only (NOT in the reentrant scanner, NOT in the C++
scanner). Since flex doesn't know when you are done, the buffer is never freed.

However, the leak won't multiply since the buffer is reused no matter how many
times you call yylex().

If you want to reclaim the memory when you are completely done scanning, then
you might try this:

@example
@verbatim
/* For non-reentrant C scanner only. */
yy_delete_buffer(yy_current_buffer);
yy_init = 1;
@end verbatim
@end example

Note: yy_init is an "internal variable", and hasn't been tested in this
situation. It is possible that some other globals may need resetting as well.

@node How do I track the byte offset for lseek()?
@unnumberedsec How do I track the byte offset for lseek()?

@example
@verbatim
>   We thought that it would be possible to have this number through the
>   evaluation of the following expression:
>
>   seek_position = (no_buffers)*YY_READ_BUF_SIZE + yy_c_buf_p - yy_current_buffer->yy_ch_buf
@end verbatim
@end example

While this is the right idea, it has two problems.  The first is that
it's possible that flex will request less than YY_READ_BUF_SIZE during
an invocation of YY_INPUT (or that your input source will return less
even though YY_READ_BUF_SIZE bytes were requested).  The second problem
is that when refilling its internal buffer, flex keeps some characters
from the previous buffer (because usually it's in the middle of a match,
and needs those characters to construct yytext for the match once it's
done).  Because of this, yy_c_buf_p - yy_current_buffer->yy_ch_buf won't
be exactly the number of characters already read from the current buffer.

An alternative solution is to count the number of characters you've matched
since starting to scan.  This can be done by using YY_USER_ACTION.  For
example,

@example
@verbatim
#define YY_USER_ACTION num_chars += yyleng;
@end verbatim
@end example

(You need to be careful to update your bookkeeping if you use yymore(),
yyless(), unput(), or input().)
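
The bookkeeping might be organized along the following lines. This is a sketch
under stated assumptions rather than tested scanner code: none of these helper
names come from flex, and the adjustments assume that the per-match hook runs
with the full match length before the action calls any of the functions
listed above.

@example
@verbatim
```c
/* Hypothetical bookkeeping helpers; none of these names come from flex. */
struct offset_tracker {
    long num_chars;   /* bytes consumed from the input so far */
};

/* Call once per match, as YY_USER_ACTION would, with the match length. */
void on_match(struct offset_tracker *t, int len)
{
    t->num_chars += len;
}

/* After yyless(n) on a match of length len, the trailing len - n
 * characters go back to the input. */
void on_yyless(struct offset_tracker *t, int len, int n)
{
    t->num_chars -= len - n;
}

/* After yymore() on a match of length len, the next match length will
 * include these len characters again, so take them out now. */
void on_yymore(struct offset_tracker *t, int len)
{
    t->num_chars -= len;
}

/* unput() pushes one character back; input() consumes one. */
void on_unput(struct offset_tracker *t) { t->num_chars -= 1; }
void on_input(struct offset_tracker *t) { t->num_chars += 1; }
```
@end verbatim
@end example

Each helper would be called next to the corresponding flex call in an action,
so that the running total always equals the number of input bytes consumed.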

@c TODO: Evaluate this faq.
@node unnamed-faq-16
@unnumberedsec unnamed-faq-16
@example
@verbatim
To: steves@telebase.com
Subject: Re: flex C++ question
In-reply-to: Your message of Thu, 08 Dec 94 13:10:58 EST.
Date: Wed, 14 Dec 94 16:40:47 PST
From: Vern Paxson <vern>

> We'd like to override the provided LexerInput() and LexerOutput()
> functions, but we'd like to *not* use iostreams.  Instead, we'd like
> to use some of our own I/O classes.  Is this possible?

You can do this by passing the various functions nil iostream*'s, and then
dealing with your own I/O classes surreptitiously (i.e., stashing them in
special member variables).  This works because the only assumption about
the lexer regarding what's done with the iostream's is that they're
ultimately passed to LexerInput and LexerOutput, which then do whatever
necessary with them.

When the flex C++ scanning class rewrite finally happens (no date for this
in sight), then this sort of thing should become much easier.

		Vern
@end verbatim
@end example

@node How do I skip as many chars as possible?
@unnumberedsec How do I skip as many chars as possible?

How do I skip as many chars as possible -- without interfering with the other
patterns?

In the example below, we want to skip over characters until we see the phrase
"endskip". The following will @emph{NOT} work correctly (do you see why not?):

@example
@verbatim
/* INCORRECT SCANNER */
%x SKIP
%%
<INITIAL>startskip   BEGIN(SKIP);
...
<SKIP>"endskip"       BEGIN(INITIAL);
<SKIP>.*             ;
@end verbatim
@end example

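The problem is that flex always prefers the longest match, so the pattern
@code{.*} will swallow the rest of the line, including the word "endskip".
The simplest (but slow) fix is to skip one character at a time: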

@example
@verbatim
<SKIP>"endskip"      BEGIN(INITIAL);
<SKIP>.              ;
@end verbatim
@end example

A faster fix involves making the second rule match more, without
making it match "endskip" plus something else.  So for example:

@example
@verbatim
<SKIP>"endskip"     BEGIN(INITIAL);
<SKIP>[^e]+         ;
<SKIP>.             ; /* so you eat up e's, too */
@end verbatim
@end example
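
To make the effect of the three SKIP rules concrete, here is a plain C model
of how they divide up the input. It is a sketch, not generated scanner code;
the function name @code{skip_to_endskip} and the step counter are assumptions
added for the illustration.

@example
@verbatim
```c
#include <string.h>

/* Hypothetical model of the three <SKIP> rules.  Starting at s (just
 * after "startskip"), returns the offset just past "endskip", or -1 if
 * it never appears.  *steps receives the number of rule matches. */
long skip_to_endskip(const char *s, int *steps)
{
    size_t i = 0;

    *steps = 0;
    while (s[i] != '\0') {
        (*steps)++;
        if (strncmp(s + i, "endskip", 7) == 0)
            return (long)(i + 7);          /* <SKIP>"endskip" */
        if (s[i] != 'e') {
            while (s[i] != '\0' && s[i] != 'e')
                i++;                       /* <SKIP>[^e]+ eats a whole run */
        } else {
            i++;                           /* <SKIP>. eats a lone 'e' */
        }
    }
    return -1;
}
```
@end verbatim
@end example

On the text @code{abc e def endskip rest} this model needs six matches to
reach the end of "endskip", where the one-character-at-a-time version would
need eleven.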

@c TODO: Evaluate this faq.
@node unnamed-faq-33
@unnumberedsec unnamed-faq-33
@example
@verbatim
QUESTION:
When was flex born?

Vern Paxson took over
the Software Tools lex project from Jef Poskanzer in 1982.  At that point it
was written in Ratfor.  Around 1987 or so, Paxson translated it into C, and
a legend was born :-).
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-42
@unnumberedsec unnamed-faq-42
@example
@verbatim
To: Adoram Rogel <adoram@orna.hybridge.com>
Subject: Re: Flex 2.5.2 performance questions
In-reply-to: Your message of Wed, 18 Sep 96 11:12:17 EDT.
Date: Wed, 18 Sep 96 10:51:02 PDT
From: Vern Paxson <vern>

[Note, the most recent flex release is 2.5.4, which you can get from
ftp.ee.lbl.gov.  It has bug fixes over 2.5.2 and 2.5.3.]

> 1. Using the pattern
>    ([Ff](oot)?)?[Nn](ote)?(\.)?
>    instead of
>    (((F|f)oot(N|n)ote)|((N|n)ote)|((N|n)\.)|((F|f)(N|n)(\.)))
>    (in a very complicated flex program) caused the program to slow from
>    300K+/min to 100K/min (no other changes were done).

These two are not equivalent.  For example, the first can match "footnote."
but the second can only match "footnote".  This is almost certainly the
cause of the discrepancy - the slower scanner run is matching more tokens,
and/or having to do more backing up.

> 2. Which of these two are better: [Ff]oot or (F|f)oot ?

From a performance point of view, they're equivalent (modulo presumably
minor effects such as memory cache hit rates; and the presence of trailing
context, see below).  From a space point of view, the first is slightly
preferable.

> 3. I have a pattern that look like this:
>    pats {p1}|{p2}|{p3}|...|{p50}     (50 patterns ORd)
>
>    running yet another complicated program that includes the following rule:
>    <snext>{and}/{no4}{bb}{pats}
>
>    gets me to "too complicated - over 32,000 states"...

I can't tell from this example whether the trailing context is variable-length
or fixed-length (it could be the latter if {and} is fixed-length).  If it's
variable length, which flex -p will tell you, then this reflects a basic
performance problem, and if you can eliminate it by restructuring your
scanner, you will see significant improvement.

>    so I divided {pats} to {pats1}, {pats2},..., {pats5} each consists of about
>    10 patterns and changed the rule to be 5 rules.
>    This did compile, but what is the rule of thumb here ?

The rule is to avoid trailing context other than fixed-length, in which for
a/b, either the 'a' pattern or the 'b' pattern has a fixed length.  Use
of the '|' operator automatically makes the pattern variable length, so in
this case '[Ff]oot' is preferred to '(F|f)oot'.

> 4. I changed a rule that looked like this:
>    <snext8>{and}{bb}/{ROMAN}[^A-Za-z] { BEGIN...
>
>    to the next 2 rules:
>    <snext8>{and}{bb}/{ROMAN}[A-Za-z] { ECHO;}
>    <snext8>{and}{bb}/{ROMAN}         { BEGIN...
>
>    Again, I understand the using [^...] will cause a great performance loss

Actually, it doesn't cause any sort of performance loss.  It's a surprising
fact about regular expressions that they always match in linear time
regardless of how complex they are.

>    but are there any specific rules about it ?

See the "Performance Considerations" section of the man page, and also
the example in MISC/fastwc/.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-43
@unnumberedsec unnamed-faq-43
@example
@verbatim
To: Adoram Rogel <adoram@hybridge.com>
Subject: Re: Flex 2.5.2 performance questions
In-reply-to: Your message of Thu, 19 Sep 96 10:16:04 EDT.
Date: Thu, 19 Sep 96 09:58:00 PDT
From: Vern Paxson <vern>

> a lot about the backing up problem.
> I believe that there lies my biggest problem, and I'll try to improve
> it.

Since you have variable trailing context, this is a bigger performance
problem.  Fixing it is usually easier than fixing backing up, which in a
complicated scanner (yours seems to fit the bill) can be extremely
difficult to do correctly.

You also don't mention what flags you are using for your scanner.
-f makes a large speed difference, and -Cfe buys you nearly as much
speed but the resulting scanner is considerably smaller.

> I have an | operator in {and} and in {pats} so both of them are variable
> length.

-p should have reported this.

> Is changing one of them to fixed-length is enough ?

Yes.

> Is it possible to change the 32,000 states limit ?

Yes.  I've appended instructions on how.  Before you make this change,
though, you should think about whether there are ways to fundamentally
simplify your scanner - those are certainly preferable!

		Vern

To increase the 32K limit (on a machine with 32 bit integers), you increase
the magnitude of the following in flexdef.h:

#define JAMSTATE -32766 /* marks a reference to the state that always jams */
#define MAXIMUM_MNS 31999
#define BAD_SUBSCRIPT -32767
#define MAX_SHORT 32700

Adding a 0 or two after each should do the trick.
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-44
@unnumberedsec unnamed-faq-44
@example
@verbatim
To: Heeman_Lee@hp.com
Subject: Re: flex - multi-byte support?
In-reply-to: Your message of Thu, 03 Oct 1996 17:24:04 PDT.
Date: Fri, 04 Oct 1996 11:42:18 PDT
From: Vern Paxson <vern>

>      I assume as long as my *.l file defines the
>      range of expected character code values (in octal format), flex will
>      scan the file and read multi-byte characters correctly. But I have no
>      confidence in this assumption.

Your lack of confidence is justified - this won't work.

Flex has in it a widespread assumption that the input is processed
one byte at a time.  Fixing this is on the to-do list, but is involved,
so it won't happen any time soon.  In the interim, the best I can suggest
(unless you want to try fixing it yourself) is to write your rules in
terms of pairs of bytes, using definitions in the first section:

	X	\xfe\xc2
	...
	%%
	foo{X}bar	found_foo_fe_c2_bar();

etc.  Definitely a pain - sorry about that.

By the way, the email address you used for me is ancient, indicating you
have a very old version of flex.  You can get the most recent, 2.5.4, from
ftp.ee.lbl.gov.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-45
@unnumberedsec unnamed-faq-45
@example
@verbatim
To: moleary@primus.com
Subject: Re: Flex / Unicode compatibility question
In-reply-to: Your message of Tue, 22 Oct 1996 10:15:42 PDT.
Date: Tue, 22 Oct 1996 11:06:13 PDT
From: Vern Paxson <vern>

Unfortunately flex at the moment has a widespread assumption within it
that characters are processed 8 bits at a time.  I don't see any easy
fix for this (other than writing your rules in terms of double characters -
a pain).  I also don't know of a wider lex, though you might try surfing
the Plan 9 stuff because I know it's a Unicode system, and also the PCCT
toolkit (try searching say Alta Vista for "Purdue Compiler Construction
Toolkit").

Fixing flex to handle wider characters is on the long-term to-do list.
But since flex is a strictly spare-time project these days, this probably
won't happen for quite a while, unless someone else does it first.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-46
@unnumberedsec unnamed-faq-46
@example
@verbatim
To: Johan Linde <jl@theophys.kth.se>
Subject: Re: translation of flex
In-reply-to: Your message of Sun, 10 Nov 1996 09:16:36 PST.
Date: Mon, 11 Nov 1996 10:33:50 PST
From: Vern Paxson <vern>

> I'm working for the Swedish team translating GNU program, and I'm currently
> working with flex. I have a few questions about some of the messages which
> I hope you can answer.

All of the things you're wondering about, by the way, concerning flex
internals - probably the only person who understands what they mean in
English is me!  So I wouldn't worry too much about getting them right.
That said ...

> #: main.c:545
> msgid "  %d protos created\n"
>
> Does proto mean prototype?

Yes - prototypes of state compression tables.

> #: main.c:539
> msgid "  %d/%d (peak %d) template nxt-chk entries created\n"
>
> Here I'm mainly puzzled by 'nxt-chk'. I guess it means 'next-check'. (?)
> However, 'template next-check entries' doesn't make much sense to me. To be
> able to find a good translation I need to know a little bit more about it.

There is a scheme in the Aho/Sethi/Ullman compiler book for compressing
scanner tables.  It involves creating two pairs of tables.  The first has
"base" and "default" entries, the second has "next" and "check" entries.
The "base" entry is indexed by the current state and yields an index into
the next/check table.  The "default" entry gives what to do if the state
transition isn't found in next/check.  The "next" entry gives the next
state to enter, but only if the "check" entry verifies that this entry is
correct for the current state.  Flex creates templates of series of
next/check entries and then encodes differences from these templates as a
way to compress the tables.

> #: main.c:533
> msgid "  %d/%d base-def entries created\n"
>
> The same problem here for 'base-def'.

See above.

		Vern
@end verbatim
@end example
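
The base/default/next/check scheme described in the message above can be
sketched in C as follows. This is the textbook lookup loop, not flex's
generated code, and the tiny tables are made up by hand: state 1 stores all
three of its transitions, while state 0 stores only the one transition that
differs and names state 1 as its default.

@example
@verbatim
```c
/* Hand-made example tables for two states and three input symbols.
 * Transitions:  state 1: 0->1, 1->0, 2->1
 *               state 0: 0->1, 1->0, 2->0  (differs only on symbol 2) */
static const int base[2]  = { 1, 0 };         /* offset into nxt/check   */
static const int def[2]   = { 1, -1 };        /* fallback state, -1=none */
static const int nxt[4]   = { 1, 0, 1, 0 };   /* next state to enter     */
static const int check[4] = { 1, 1, 1, 0 };   /* owner state of each slot */

/* Look up the transition from state s on symbol c (0..2).
 * Returns the next state, or -1 if no transition exists. */
int next_state(int s, int c)
{
    while (s >= 0 && check[base[s] + c] != s)
        s = def[s];          /* slot belongs to another state: use default */
    return s >= 0 ? nxt[base[s] + c] : -1;
}
```
@end verbatim
@end example

Here four table slots describe six transitions; the saving grows when many
states share most of their transitions with a common template.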

@c TODO: Evaluate this faq.
@node unnamed-faq-47
@unnumberedsec unnamed-faq-47
@example
@verbatim
To: Xinying Li <xli@npac.syr.edu>
Subject: Re: FLEX ?
In-reply-to: Your message of Wed, 13 Nov 1996 17:28:38 PST.
Date: Wed, 13 Nov 1996 19:51:54 PST
From: Vern Paxson <vern>

> "unput()" them to input flow, question occurs. If I do this after I scan
> a carriage, the variable "yy_current_buffer->yy_at_bol" is changed. That
> means the carriage flag has gone.

You can control this by calling yy_set_bol().  It's described in the manual.

>      And if in pre-reading it goes to the end of file, is anything done
> to control the end of curren buffer and end of file?

No, there's no way to put back an end-of-file.

>      By the way I am using flex 2.5.2 and using the "-l".

The latest release is 2.5.4, by the way.  It fixes some bugs in 2.5.2 and
2.5.3.  You can get it from ftp.ee.lbl.gov.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-48
@unnumberedsec unnamed-faq-48
@example
@verbatim
To: Alain.ISSARD@st.com
Subject: Re: Start condition with FLEX
In-reply-to: Your message of Mon, 18 Nov 1996 09:45:02 PST.
Date: Mon, 18 Nov 1996 10:41:34 PST
From: Vern Paxson <vern>

> I am not able to use the start condition scope and to use the | (OR) with
> rules having start conditions.

The problem is that if you use '|' as a regular expression operator, for
example "a|b" meaning "match either 'a' or 'b'", then it must *not* have
any blanks around it.  If you instead want the special '|' *action* (which
from your scanner appears to be the case), which is a way of giving two
different rules the same action:

	foo	|
	bar	matched_foo_or_bar();

then '|' *must* be separated from the first rule by whitespace and *must*
be followed by a new line.  You *cannot* write it as:

	foo | bar	matched_foo_or_bar();

even though you might think you could because yacc supports this syntax.
The reason for this unfortunate incompatibility is historical, but it's
unlikely to be changed.

Your problems with start condition scope are simply due to syntax errors
from your use of '|' later confusing flex.

Let me know if you still have problems.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-49
@unnumberedsec unnamed-faq-49
@example
@verbatim
To: Gregory Margo <gmargo@newton.vip.best.com>
Subject: Re: flex-2.5.3 bug report
In-reply-to: Your message of Sat, 23 Nov 1996 16:50:09 PST.
Date: Sat, 23 Nov 1996 17:07:32 PST
From: Vern Paxson <vern>

> Enclosed is a lex file that "real" lex will process, but I cannot get
> flex to process it.  Could you try it and maybe point me in the right direction?

Your problem is that some of the definitions in the scanner use the '/'
trailing context operator, and have it enclosed in ()'s.  Flex does not
allow this operator to be enclosed in ()'s because doing so allows undefined
regular expressions such as "(a/b)+".  So the solution is to remove the
parentheses.  Note that you must also be building the scanner with the -l
option for AT&T lex compatibility.  Without this option, flex automatically
encloses the definitions in parentheses.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-50
@unnumberedsec unnamed-faq-50
@example
@verbatim
To: Thomas Hadig <hadig@toots.physik.rwth-aachen.de>
Subject: Re: Flex Bug ?
In-reply-to: Your message of Tue, 26 Nov 1996 14:35:01 PST.
Date: Tue, 26 Nov 1996 11:15:05 PST
From: Vern Paxson <vern>

> In my lexer code, i have the line :
> ^\*.*          { }
>
> Thus all lines starting with an astrix (*) are comment lines.
> This does not work !

I can't get this problem to reproduce - it works fine for me.  Note
though that if what you have is slightly different:

	COMMENT	^\*.*
	%%
	{COMMENT}	{ }

then it won't work, because flex pushes back macro definitions enclosed
in ()'s, so the rule becomes

	(^\*.*)		{ }

and now that the '^' operator is not at the immediate beginning of the
line, it's interpreted as just a regular character.  You can avoid this
behavior by using the "-l" lex-compatibility flag, or "%option lex-compat".

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-51
@unnumberedsec unnamed-faq-51
@example
@verbatim
To: Adoram Rogel <adoram@hybridge.com>
Subject: Re: Flex 2.5.4 BOF ???
In-reply-to: Your message of Tue, 26 Nov 1996 16:10:41 PST.
Date: Wed, 27 Nov 1996 10:56:25 PST
From: Vern Paxson <vern>

>     Organization(s)?/[a-z]
>
> This matched "Organizations" (looking in debug mode, the trailing s
> was matched with trailing context instead of the optional (s) in the
> end of the word.

That should only happen with lex.  Flex can properly match this pattern.
(That might be what you're saying, I'm just not sure.)

> Is there a way to avoid this dangerous trailing context problem ?

Unfortunately, there's no easy way.  On the other hand, I don't see why
it should be a problem.  Lex's matching is clearly wrong, and I'd hope
that usually the intent remains the same as expressed with the pattern,
so flex's matching will be correct.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-52
@unnumberedsec unnamed-faq-52
@example
@verbatim
To: Cameron MacKinnon <mackin@interlog.com>
Subject: Re: Flex documentation bug
In-reply-to: Your message of Mon, 02 Dec 1996 00:07:08 PST.
Date: Sun, 01 Dec 1996 22:29:39 PST
From: Vern Paxson <vern>

> I'm not sure how or where to submit bug reports (documentation or
> otherwise) for the GNU project stuff ...

Well, strictly speaking flex isn't part of the GNU project.  They just
distribute it because no one's written a decent GPL'd lex replacement.
So you should send bugs directly to me.  Those sent to the GNU folks
sometimes find their way to me, but some may drop between the cracks.

> In GNU Info, under the section 'Start Conditions', and also in the man
> page (mine's dated April '95) is a nice little snippet showing how to
> parse C quoted strings into a buffer, defined to be MAX_STR_CONST in
> size. Unfortunately, no overflow checking is ever done ...

This is already mentioned in the manual:

Finally, here's an example of how to  match  C-style  quoted
strings using exclusive start conditions, including expanded
escape sequences (but not including checking  for  a  string
that's too long):

The reason for not doing the overflow checking is that it will needlessly
clutter up an example whose main purpose is just to demonstrate how to
use flex.

The latest release is 2.5.4, by the way, available from ftp.ee.lbl.gov.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-53
@unnumberedsec unnamed-faq-53
@example
@verbatim
To: tsv@cs.UManitoba.CA
Subject: Re: Flex (reg)..
In-reply-to: Your message of Thu, 06 Mar 1997 23:50:16 PST.
Date: Thu, 06 Mar 1997 15:54:19 PST
From: Vern Paxson <vern>

> [:alpha:] ([:alnum:] | \\_)*

If your rule really has embedded blanks as shown above, then it won't
work, as the first blank delimits the rule from the action.  (It wouldn't
even compile ...)  You need instead:

[:alpha:]([:alnum:]|\\_)*

and that should work fine - there's no restriction on what can go inside
of ()'s except for the trailing context operator, '/'.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-54
@unnumberedsec unnamed-faq-54
@example
@verbatim
To: "Mike Stolnicki" <mstolnic@ford.com>
Subject: Re: FLEX help
In-reply-to: Your message of Fri, 30 May 1997 13:33:27 PDT.
Date: Fri, 30 May 1997 10:46:35 PDT
From: Vern Paxson <vern>

> We'd like to add "if-then-else", "while", and "for" statements to our
> language ...
> We've investigated many possible solutions.  The one solution that seems
> the most reasonable involves knowing the position of a TOKEN in yyin.

I strongly advise you to build a parse tree (abstract syntax tree) and
loop over that instead.  You'll find this has major benefits in keeping
your interpreter simple and extensible.

That said, the functionality you mention (get_position and set_position)
has been on the to-do list for a while.  As flex is a purely spare-time
project for me, there are no guarantees about when this will be added (in
particular, it for sure won't be for many months to come).

		Vern
@end verbatim
@end example
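
The parse-tree approach recommended above can be sketched as follows.
This is a toy illustration under stated assumptions, not code from flex
or from the original poster: the node kinds, the @code{mk} constructor
and the @code{run} evaluator are all hypothetical.  The point it shows is
that a loop re-executes a subtree, so the interpreter never needs to
reposition @code{yyin}.

@example
@verbatim
```c
#include <stdlib.h>

/* Hypothetical AST for a toy language with a single integer variable x. */
enum node_kind { N_DEC, N_WHILE_POS, N_SEQ };

struct node {
    enum node_kind kind;
    struct node *a;   /* loop body, or first statement of a sequence */
    struct node *b;   /* second statement of a sequence, else NULL */
};

/* Allocate one tree node; aborts if memory runs out. */
static struct node *mk(enum node_kind kind, struct node *a, struct node *b)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->kind = kind;
    n->a = a;
    n->b = b;
    return n;
}

/* Execute a tree against x; returns how many N_DEC nodes were executed. */
static int run(const struct node *n, int *x)
{
    int steps = 0;

    switch (n->kind) {
    case N_DEC:                  /* x = x - 1 */
        --*x;
        steps = 1;
        break;
    case N_WHILE_POS:            /* while (x > 0) body: walk the body again */
        while (*x > 0)
            steps += run(n->a, x);
        break;
    case N_SEQ:                  /* first statement, then second */
        steps = run(n->a, x);
        steps += run(n->b, x);
        break;
    }
    return steps;
}
```
@end verbatim
@end example

A parser would build such a tree once from the token stream; running
@code{run} on a @code{N_WHILE_POS} node whose body is a single
@code{N_DEC} counts x down to zero without rescanning any input.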

@c TODO: Evaluate this faq.
@node unnamed-faq-55
@unnumberedsec unnamed-faq-55
@example
@verbatim
To: Colin Paul Adams <colin@colina.demon.co.uk>
Subject: Re: Flex C++ classes and Bison
In-reply-to: Your message of 09 Aug 1997 17:11:41 PDT.
Date: Fri, 15 Aug 1997 10:48:19 PDT
From: Vern Paxson <vern>

> #define YY_DECL   int yylex (YYSTYPE *lvalp, struct parser_control
> *parm)
>
> I have been trying  to get this to work as a C++ scanner, but it does
> not appear to be possible (warning that it matches no declarations in
> yyFlexLexer, or something like that).
>
> Is this supposed to be possible, or is it being worked on (I DID
> notice the comment that scanner classes are still experimental, so I'm
> not too hopeful)?

What you need to do is derive a subclass from yyFlexLexer that provides
the above yylex() method, squirrels away lvalp and parm into member
variables, and then invokes yyFlexLexer::yylex() to do the regular scanning.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-56
@unnumberedsec unnamed-faq-56
@example
@verbatim
To: Mikael.Latvala@lmf.ericsson.se
Subject: Re: Possible mistake in Flex v2.5 document
In-reply-to: Your message of Fri, 05 Sep 1997 16:07:24 PDT.
Date: Fri, 05 Sep 1997 10:01:54 PDT
From: Vern Paxson <vern>

> In that example you show how to count comment lines when using
> C style /* ... */ comments. My question is, shouldn't you take into
> account a scenario where end of a comment marker occurs inside
> character or string literals?

The scanner certainly needs to also scan character and string literals.
However it does that (there's an example in the man page for strings), the
lexer will recognize the beginning of the literal before it runs across the
embedded "/*".  Consequently, it will finish scanning the literal before it
even considers the possibility of matching "/*".

Example:

	'([^']*|{ESCAPE_SEQUENCE})'

will match all the text between the ''s (inclusive).  So the lexer
considers this as a token beginning at the first ', and doesn't even
attempt to match other tokens inside it.

I think this subtlety is not worth putting in the manual, as I suspect
it would confuse more people than it would enlighten.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-57
@unnumberedsec unnamed-faq-57
@example
@verbatim
To: "Marty Leisner" <leisner@sdsp.mc.xerox.com>
Subject: Re: flex limitations
In-reply-to: Your message of Sat, 06 Sep 1997 11:27:21 PDT.
Date: Mon, 08 Sep 1997 11:38:08 PDT
From: Vern Paxson <vern>

> %%
> [a-zA-Z]+       /* skip a line */
>                 {  printf("got %s\n", yytext); }
> %%

What version of flex are you using?  If I feed this to 2.5.4, it complains:

	"bug.l", line 5: EOF encountered inside an action
	"bug.l", line 5: unrecognized rule
	"bug.l", line 5: fatal parse error

Not the world's greatest error message, but it manages to flag the problem.

(With the introduction of start condition scopes, flex can't accommodate
an action on a separate line, since it's ambiguous with an indented rule.)

You can get 2.5.4 from ftp.ee.lbl.gov.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-58
@unnumberedsec unnamed-faq-58
@example
@verbatim
To: uocarroll@deagostini.co.uk (Ultan O'Carroll)
Subject: Re: Flex repositries
In-reply-to: Your message of Fri, 12 Sep 1997 15:02:28 PDT.
Date: Fri, 12 Sep 1997 10:31:50 PDT
From: Vern Paxson <vern>

>      before I start beavering away I wonder if you know of any
>      place/libraries for flex
>      description files that might already do this or give me a head start ?

Unfortunately, no, I don't.  You might try asking on comp.compilers.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-59
@unnumberedsec unnamed-faq-59
@example
@verbatim
To: Adoram Rogel <adoram@hybridge.com>
Subject: Re: Conditional compiling in the definitions section
In-reply-to: Your message of Thu, 25 Sep 1997 11:22:42 PDT.
Date: Thu, 25 Sep 1997 10:56:31 PDT
From: Vern Paxson <vern>

> I'm trying to combine two large lex files that now differ only in
> about 10 lines in the definitions section.
> I would like to have something like this:
> #ifdef FFF
> it	\<IT\>
> #else
> it	\<I\>
> #endif
>
> Now, I can't add states for these, as I have already too many states
> and the program is very complicated, and I won't be able to handle
> 10 or 20 more states.
>
> Any trick to do this ?

You might try using m4, or the C preprocessor plus a sed script to
clean up the result (strip out the #line's).

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-60
@unnumberedsec unnamed-faq-60
@example
@verbatim
To: Steve Antoch <SteveAn@visio.com>
Subject: Re: lex and yacc grammars
In-reply-to: Your message of Mon, 17 Nov 1997 15:31:25 PST.
Date: Mon, 17 Nov 1997 15:27:01 PST
From: Vern Paxson <vern>

> Would you happen to know where I can find grammars for lex and yacc?

The flex sources have a grammar for (f)lex.  Dunno about yacc.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-61
@unnumberedsec unnamed-faq-61
@example
@verbatim
To: Bryan Housel <bryan@drawcomp.com>
Subject: Re: Question about Flex v2.5
In-reply-to: Your message of Tue, 11 Nov 1997 21:30:23 PST.
Date: Mon, 17 Nov 1997 17:12:21 PST
From: Vern Paxson <vern>

> It prints one of those "end of buffer.." messages for each character in the
> token...

This will happen if your LexerInput() function returns only one character
at a time, which can happen either if your scanner is "interactive", or
if the streams library on your platform always returns 1 for yyin->gcount().

Solution: override LexerInput() with a version that returns whole buffers.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-62
@unnumberedsec unnamed-faq-62
@example
@verbatim
To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
Subject: Re: Flex maximums
In-reply-to: Your message of Mon, 17 Nov 1997 17:16:06 PST.
Date: Mon, 17 Nov 1997 17:16:15 PST
From: Vern Paxson <vern>

> I took a quick look into the flex-sources and altered some #defines in
> flexdef.h:
>
> 	#define INITIAL_MNS 64000
> 	#define MNS_INCREMENT 1024000
> 	#define MAXIMUM_MNS 64000

The things to fix are to add a couple of zeroes to:

#define JAMSTATE -32766 /* marks a reference to the state that always jams */
#define MAXIMUM_MNS 31999
#define BAD_SUBSCRIPT -32767
#define MAX_SHORT 32700

and, if you get complaints about too many rules, make the following change too:

	#define YY_TRAILING_MASK 0x200000
	#define YY_TRAILING_HEAD_MASK 0x400000

- Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-63
@unnumberedsec unnamed-faq-63
@example
@verbatim
To: jimmey@lexis-nexis.com (Jimmey Todd)
Subject: Re: FLEX question regarding istream vs ifstream
In-reply-to: Your message of Mon, 08 Dec 1997 15:54:15 PST.
Date: Mon, 15 Dec 1997 13:21:35 PST
From: Vern Paxson <vern>

>         stdin_handle = YY_CURRENT_BUFFER;
>         ifstream fin( "aFile" );
>         yy_switch_to_buffer( yy_create_buffer( fin, YY_BUF_SIZE ) );
>
> What I'm wanting to do, is pass the contents of a file thru one set
> of rules and then pass stdin thru another set... It works great if, I
> don't use the C++ classes. But since everything else that I'm doing is
> in C++, I thought I'd be consistent.
>
> The problem is that 'yy_create_buffer' is expecting an istream* as its
> first argument (as stated in the man page). However, fin is an ifstream
> object. Any ideas on what I might be doing wrong? Any help would be
> appreciated. Thanks!!

You need to pass &fin, to turn it into an ifstream* instead of an ifstream.
Then its type will be compatible with the expected istream*, because ifstream
is derived from istream.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-64
@unnumberedsec unnamed-faq-64
@example
@verbatim
To: Enda Fadian <fadiane@piercom.ie>
Subject: Re: Question related to Flex man page?
In-reply-to: Your message of Tue, 16 Dec 1997 15:17:34 PST.
Date: Tue, 16 Dec 1997 14:17:09 PST
From: Vern Paxson <vern>

> Can you explain to me what is meant by a long-jump in relation to flex?

Using the longjmp() function while inside yylex() or a routine called by it.

> what is the flex activation frame.

Just yylex()'s stack frame.

> As far as I can see yyrestart will bring me back to the start of the input
> file and using flex++ is not really an option!

No, yyrestart() doesn't imply a rewind, even though its name might sound
like it does.  It tells the scanner to flush its internal buffers and
start reading from the given file at its present location.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-65
@unnumberedsec unnamed-faq-65
@example
@verbatim
To: hassan@larc.info.uqam.ca (Hassan Alaoui)
Subject: Re: Need urgent Help
In-reply-to: Your message of Sat, 20 Dec 1997 19:38:19 PST.
Date: Sun, 21 Dec 1997 21:30:46 PST
From: Vern Paxson <vern>

> /usr/lib/yaccpar: In function `int yyparse()':
> /usr/lib/yaccpar:184: warning: implicit declaration of function `int yylex(...)'
>
> ld: Undefined symbol
>    _yylex
>    _yyparse
>    _yyin

This is a known problem with Solaris C++ (and/or Solaris yacc).  I believe
the fix is to explicitly insert some 'extern "C"' statements for the
corresponding routines/symbols.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-66
@unnumberedsec unnamed-faq-66
@example
@verbatim
To: mc0307@mclink.it
Cc: gnu@prep.ai.mit.edu
Subject: Re: [mc0307@mclink.it: Help request]
In-reply-to: Your message of Fri, 12 Dec 1997 17:57:29 PST.
Date: Sun, 21 Dec 1997 22:33:37 PST
From: Vern Paxson <vern>

> This is my definition for float and integer types:
> . . .
> NZD          [1-9]
> ...
> I've tested my program on other lex versions (on Sun Solaris and HP
> UNIX) and it works well, so I think that my definitions are correct.
> Are there any differences between Lex and Flex?

There are indeed differences, as discussed in the man page.  The one
you are probably running into is that when flex expands a name definition,
it puts parentheses around the expansion, while lex does not.  There's
an example in the man page of how this can lead to different matching.
Flex's behavior complies with the POSIX standard (or at least with the
last POSIX draft I saw).

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-67
@unnumberedsec unnamed-faq-67
@example
@verbatim
To: hassan@larc.info.uqam.ca (Hassan Alaoui)
Subject: Re: Thanks
In-reply-to: Your message of Mon, 22 Dec 1997 16:06:35 PST.
Date: Mon, 22 Dec 1997 14:35:05 PST
From: Vern Paxson <vern>

> Thank you very much for your help. I compile and link well with C++ while
> declaring 'yylex ...' extern, But a little problem remains. I get a
> segmentation fault when executing (I linked with the lfl library) while it
> works well when using LEX instead of flex. Do you have some ideas about the
> reason for this ?

The one possible reason for this that comes to mind is if you've defined
yytext as "extern char yytext[]" (which is what lex uses) instead of
"extern char *yytext" (which is what flex uses).  If it's not that, then
I'm afraid I don't know what the problem might be.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-68
@unnumberedsec unnamed-faq-68
@example
@verbatim
To: "Bart Niswonger" <NISWONGR@almaden.ibm.com>
Subject: Re: flex 2.5: c++ scanners & start conditions
In-reply-to: Your message of Tue, 06 Jan 1998 10:34:21 PST.
Date: Tue, 06 Jan 1998 19:19:30 PST
From: Vern Paxson <vern>

> The problem is that when I do this (using %option c++) start
> conditions seem to not apply.

The BEGIN macro modifies the yy_start variable.  For C scanners, this
is a static with scope visible through the whole file.  For C++ scanners,
it's a member variable, so it only has visible scope within a member
function.  Your lexbegin() routine is not a member function when you
build a C++ scanner, so it's not modifying the correct yy_start.  The
diagnostic that indicates this is that you found you needed to add
a declaration of yy_start in order to get your scanner to compile when
using C++; instead, the correct fix is to make lexbegin() a member
function (by deriving from yyFlexLexer).

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-69
@unnumberedsec unnamed-faq-69
@example
@verbatim
To: "Boris Zinin" <boris@ippe.rssi.ru>
Subject: Re: current position in flex buffer
In-reply-to: Your message of Mon, 12 Jan 1998 18:58:23 PST.
Date: Mon, 12 Jan 1998 12:03:15 PST
From: Vern Paxson <vern>

> The problem is how to determine the current position in flex active
> buffer when a rule is matched....

You will need to keep track of this explicitly, such as by redefining
YY_USER_ACTION to count the number of characters matched.

The latest flex release, by the way, is 2.5.4, available from ftp.ee.lbl.gov.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-70
@unnumberedsec unnamed-faq-70
@example
@verbatim
To: Bik.Dhaliwal@bis.org
Subject: Re: Flex question
In-reply-to: Your message of Mon, 26 Jan 1998 13:05:35 PST.
Date: Tue, 27 Jan 1998 22:41:52 PST
From: Vern Paxson <vern>

> That requirement involves knowing
> the character position at which a particular token was matched
> in the lexer.

The way you have to do this is by explicitly keeping track of where
you are in the file, by counting the number of characters scanned
for each token (available in yyleng).  It may prove convenient to
do this by redefining YY_USER_ACTION, as described in the manual.

		Vern
@end verbatim
@end example
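
The bookkeeping described above can be sketched in C.  Flex expands
@code{YY_USER_ACTION} just before the matched rule's action, and
@code{yyleng} holds the length of the match.  So that the fragment
compiles on its own, @code{yyleng} is declared here as a stand-in and
the scanner is simulated by a hypothetical @code{fake_match} function;
the counters @code{tok_start} and @code{next_off} are likewise names
invented for this sketch.

@example
@verbatim
```c
/* Stand-in for the variable that flex sets to the length of the current match. */
static int yyleng;

/* Bookkeeping kept by the user, not by flex. */
static long tok_start = 0;   /* offset of the first character of the current token */
static long next_off  = 0;   /* offset just past the current token */

/* In a .l file this definition would sit in the %{ ... %} part of the
   definitions section.  The trailing semicolon is part of the macro on
   the assumption that the generated code does not add one. */
#define YY_USER_ACTION  tok_start = next_off; next_off += yyleng;

/* Hypothetical stand-in for the scanner matching a token of length len. */
static void fake_match(int len)
{
    yyleng = len;
    YY_USER_ACTION
}
```
@end verbatim
@end example

Inside an action, @code{tok_start} is then the character position at
which the token began.  Scanners that use @code{yymore()},
@code{yyless()} or @code{unput()} change what @code{yyleng} covers, so
they would probably need to adjust the counters by hand.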

@c TODO: Evaluate this faq.
@node unnamed-faq-71
@unnumberedsec unnamed-faq-71
@example
@verbatim
To: Vladimir Alexiev <vladimir@cs.ualberta.ca>
Subject: Re: flex: how to control start condition from parser?
In-reply-to: Your message of Mon, 26 Jan 1998 05:50:16 PST.
Date: Tue, 27 Jan 1998 22:45:37 PST
From: Vern Paxson <vern>

> It seems useful for the parser to be able to tell the lexer about such
> context dependencies, because then they don't have to be limited to
> local or sequential context.

One way to do this is to have the parser call a stub routine that's
included in the scanner's .l file, and consequently that has access to
BEGIN.  The only ugliness is that the parser can't pass in the state
it wants, because those aren't visible - but if you don't have many
such states, then using a different set of names doesn't seem like
too much of a burden.

While generating a .h file like you suggest is certainly cleaner,
flex development has come to a virtual stand-still :-(, so a workaround
like the above is much more pragmatic than waiting for a new feature.

		Vern
@end verbatim
@end example
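
The stub-routine workaround above might look roughly like this.  It is a
sketch under loud assumptions: so that it compiles without flex,
@code{BEGIN}, @code{INITIAL} and the start condition @code{STRING_SC}
are replaced by hand-written stand-ins (in a real @file{.l} file flex
supplies them), and the names @code{lex_mode}, @code{LEX_NORMAL},
@code{LEX_STRING} and @code{lex_set_mode} are hypothetical.

@example
@verbatim
```c
/* Stand-ins so that this fragment compiles on its own.  In a real .l
   file flex provides BEGIN and one #define per start condition; none
   of the next four lines would be written by hand. */
static int current_sc = 0;
#define BEGIN(sc)  (current_sc = (sc))
#define INITIAL    0
#define STRING_SC  1

/* Names the parser is allowed to see, for example via a small shared
   header.  They deliberately differ from the start condition names. */
enum lex_mode { LEX_NORMAL, LEX_STRING };

/* The stub.  It lives in the user-code section of the .l file, so
   BEGIN and the start condition names are in scope. */
void lex_set_mode(enum lex_mode mode)
{
    switch (mode) {
    case LEX_NORMAL: BEGIN(INITIAL);   break;
    case LEX_STRING: BEGIN(STRING_SC); break;
    }
}
```
@end verbatim
@end example

A grammar action would then call something like
@code{lex_set_mode(LEX_STRING)} and never mention a start condition
directly.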

@c TODO: Evaluate this faq.
@node unnamed-faq-72
@unnumberedsec unnamed-faq-72
@example
@verbatim
To: Barbara Denny <denny@3com.com>
Subject: Re: freebsd flex bug?
In-reply-to: Your message of Fri, 30 Jan 1998 12:00:43 PST.
Date: Fri, 30 Jan 1998 12:42:32 PST
From: Vern Paxson <vern>

> lex.yy.c:1996: parse error before `='

This is the key, identifying this error.  (It may help to pinpoint
it by using flex -L, so it doesn't generate #line directives in its
output.)  I will bet you heavy money that you have a start condition
name that is also a variable name, or something like that; flex spits
out #define's for each start condition name, mapping them to a number,
so you can wind up with:

	%x foo
	%%
		...
	%%
	void bar()
		{
		int foo = 3;
		}

and the penultimate line will turn into "int 1 = 3" after C preprocessing,
since flex will put "#define foo 1" in the generated scanner.

		Vern
@end verbatim
@end example
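
One way to sidestep the clash described above is to give start
conditions names that cannot collide with ordinary identifiers.  The
sketch below assumes a start condition declared as @code{SC_FOO}; the
@code{SC_} prefix and the value 1 are illustrative assumptions, and the
@code{#define} is written out by hand only to mimic what the message
says flex generates.

@example
@verbatim
```c
/* Mimics the #define that flex is described as emitting for "%x SC_FOO". */
#define SC_FOO 1

/* With the start condition named SC_FOO, the identifier foo is untouched. */
static int bar(void)
{
    int foo = 3;   /* would have become "int 1 = 3" had the condition been named foo */
    return foo + SC_FOO;
}
```
@end verbatim
@end example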

@c TODO: Evaluate this faq.
@node unnamed-faq-73
@unnumberedsec unnamed-faq-73
@example
@verbatim
To: Maurice Petrie <mpetrie@infoscigroup.com>
Subject: Re: Lost flex .l file
In-reply-to: Your message of Mon, 02 Feb 1998 14:10:01 PST.
Date: Mon, 02 Feb 1998 11:15:12 PST
From: Vern Paxson <vern>

> I am curious as to
> whether there is a simple way to backtrack from the generated source to
> reproduce the lost list of tokens we are searching on.

In theory, it's straight-forward to go from the DFA representation
back to a regular-expression representation - the two are isomorphic.
In practice, a huge headache, because you have to unpack all the tables
back into a single DFA representation, and then write a program to munch
on that and translate it into an RE.

Sorry for the less-than-happy news ...

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-74
@unnumberedsec unnamed-faq-74
@example
@verbatim
To: jimmey@lexis-nexis.com (Jimmey Todd)
Subject: Re: Flex performance question
In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
Date: Thu, 19 Feb 1998 08:48:51 PST
From: Vern Paxson <vern>

> What I have found, is that the smaller the data chunk, the faster the
> program executes. This is the opposite of what I expected. Should this be
> happening this way?

This is exactly what will happen if your input file has embedded NULs.
From the man page:

A final note: flex is slow when matching NUL's, particularly
when  a  token  contains multiple NUL's.  It's best to write
rules which match short amounts of text if it's  anticipated
that the text will often include NUL's.

So that's the first thing to look for.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-75
@unnumberedsec unnamed-faq-75
@example
@verbatim
To: jimmey@lexis-nexis.com (Jimmey Todd)
Subject: Re: Flex performance question
In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
Date: Thu, 19 Feb 1998 15:42:25 PST
From: Vern Paxson <vern>

So there are several problems.

First, to go fast, you want to match as much text as possible, which
your scanners don't in the case that what they're scanning is *not*
a <RN> tag.  So you want a rule like:

	[^<]+

Second, C++ scanners are particularly slow if they're interactive,
which they are by default.  Using -B speeds it up by a factor of 3-4
on my workstation.

Third, C++ scanners that use the istream interface are slow, because
of how poorly implemented istream's are.  I built two versions of
the following scanner:

	%%
	.*\n
	.*
	%%

and the C version inhales a 2.5MB file on my workstation in 0.8 seconds.
The C++ istream version, using -B, takes 3.8 seconds.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-76
@unnumberedsec unnamed-faq-76
@example
@verbatim
To: "Frescatore, David (CRD, TAD)" <frescatore@exc01crdge.crd.ge.com>
Subject: Re: FLEX 2.5 & THE YEAR 2000
In-reply-to: Your message of Wed, 03 Jun 1998 11:26:22 PDT.
Date: Wed, 03 Jun 1998 10:22:26 PDT
From: Vern Paxson <vern>

> I am researching the Y2K problem with General Electric R&D
> and need to know if there are any known issues concerning
> the above mentioned software and Y2K regardless of version.

There shouldn't be; all it ever does with the date is ask the system
for it and then print it out.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-77
@unnumberedsec unnamed-faq-77
@example
@verbatim
To: "Hans Dermot Doran" <htd@ibhdoran.com>
Subject: Re: flex problem
In-reply-to: Your message of Wed, 15 Jul 1998 21:30:13 PDT.
Date: Tue, 21 Jul 1998 14:23:34 PDT
From: Vern Paxson <vern>

> To overcome this, I gets() the stdin into a string and lex the string. The
> string is lexed OK except that the end of string isn't lexed properly
> (yy_scan_string()), that is the lexer doesn't recognise the end of string.

Flex doesn't contain mechanisms for recognizing buffer endpoints.  But if
you use fgets instead (which you should anyway, to protect against buffer
overflows), then the final \n will be preserved in the string, and you can
scan that in order to find the end of the string.

		Vern
@end verbatim
@end example
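
Reading the line with @code{fgets} and checking for the preserved
newline can be sketched as below.  The helper @code{read_line} and its
return convention are hypothetical; only the standard C calls
@code{fgets} and @code{strlen} are relied on.

@example
@verbatim
```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf (capacity cap bytes) using fgets.
   Returns  1 if a complete line ending in '\n' was stored,
            0 if the line was cut short (buffer full, or EOF before a newline),
           -1 on EOF or error with nothing read. */
static int read_line(FILE *in, char *buf, int cap)
{
    size_t n;

    if (fgets(buf, cap, in) == NULL)
        return -1;
    n = strlen(buf);
    return (n > 0 && buf[n - 1] == '\n') ? 1 : 0;
}
```
@end verbatim
@end example

When @code{read_line} returns 1, the buffer can be handed to
@code{yy_scan_string()} and a rule that matches the newline marks the
end of the input string.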

@c TODO: Evaluate this faq.
@node unnamed-faq-78
@unnumberedsec unnamed-faq-78
@example
@verbatim
To: soumen@almaden.ibm.com
Subject: Re: Flex++ 2.5.3 instance member vs. static member
In-reply-to: Your message of Mon, 27 Jul 1998 02:10:04 PDT.
Date: Tue, 28 Jul 1998 01:10:34 PDT
From: Vern Paxson <vern>

> %{
> int mylineno = 0;
> %}
> ws      [ \t]+
> alpha   [A-Za-z]
> dig     [0-9]
> %%
>
> Now you'd expect mylineno to be a member of each instance of class
> yyFlexLexer, but is this the case?  A look at the lex.yy.cc file seems to
> indicate otherwise; unless I am missing something the declaration of
> mylineno seems to be outside any class scope.
>
> How will this work if I want to run a multi-threaded application with each
> thread creating a FlexLexer instance?

Derive your own subclass and make mylineno a member variable of it.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-79
@unnumberedsec unnamed-faq-79
@example
@verbatim
To: Adoram Rogel <adoram@hybridge.com>
Subject: Re: More than 32K states change hangs
In-reply-to: Your message of Tue, 04 Aug 1998 16:55:39 PDT.
Date: Tue, 04 Aug 1998 22:28:45 PDT
From: Vern Paxson <vern>

> Vern Paxson,
>
> I followed your advice, posted on Usenet by you, and emailed to me
> personally by you, on how to overcome the 32K states limit. I'm running
> on Linux machines.
> I took the full source of version 2.5.4 and did the following changes in
> flexdef.h:
> #define JAMSTATE -327660
> #define MAXIMUM_MNS 319990
> #define BAD_SUBSCRIPT -327670
> #define MAX_SHORT 327000
>
> and compiled.
> All looked fine, including check and bigcheck, so I installed.

Hmmm, you shouldn't increase MAX_SHORT, though looking through my email
archives I see that I did indeed recommend doing so.  Try setting it back
to 32700; that should be enough that you no longer need -Ca.  If it still
hangs, then the interesting question is - where?

> Compiling the same program that hung with an out-of-the-box (RedHat 4.2
> distribution of Linux)
> flex 2.5.4 binary works.

Since Linux comes with source code, you should diff it against what
you have to see what problems they missed.

> Should I always compile with the -Ca option now ? even short and simple
> filters ?

No, definitely not.  It's meant to be for those situations where you
absolutely must squeeze every last cycle out of your scanner.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-80
@unnumberedsec unnamed-faq-80
@example
@verbatim
To: "Schmackpfeffer, Craig" <Craig.Schmackpfeffer@usa.xerox.com>
Subject: Re: flex output for static code portion
In-reply-to: Your message of Tue, 11 Aug 1998 11:55:30 PDT.
Date: Mon, 17 Aug 1998 23:57:42 PDT
From: Vern Paxson <vern>

> I would like to use flex under the hood to generate a binary file
> containing the data structures that control the parse.

This has been on the wish-list for a long time.  In principle it's
straight-forward - you redirect mkdata() et al's I/O to another file,
and modify the skeleton to have a start-up function that slurps these
into dynamic arrays.  The concerns are (1) the scanner generation code
is hairy and full of corner cases, so it's easy to get surprised when
going down this path :-( ; and (2) being careful about buffering so
that when the tables change you make sure the scanner starts in the
correct state and reading at the right point in the input file.

> I was wondering if you know of anyone who has used flex in this way.

I don't - but it seems like a reasonable project to undertake (unlike
numerous other flex tweaks :-).

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-81
@unnumberedsec unnamed-faq-81
@example
@verbatim
Received: from 131.173.17.11 (131.173.17.11 [131.173.17.11])
	by ee.lbl.gov (8.9.1/8.9.1) with ESMTP id AAA03838
	for <vern@ee.lbl.gov>; Thu, 20 Aug 1998 00:47:57 -0700 (PDT)
Received: from hal.cl-ki.uni-osnabrueck.de (hal.cl-ki.Uni-Osnabrueck.DE [131.173.141.2])
	by deimos.rz.uni-osnabrueck.de (8.8.7/8.8.8) with ESMTP id JAA34694
	for <vern@ee.lbl.gov>; Thu, 20 Aug 1998 09:47:55 +0200
Received: (from georg@localhost) by hal.cl-ki.uni-osnabrueck.de (8.6.12/8.6.12) id JAA34834 for vern@ee.lbl.gov; Thu, 20 Aug 1998 09:47:54 +0200
From: Georg Rehm <georg@hal.cl-ki.uni-osnabrueck.de>
Message-Id: <199808200747.JAA34834@hal.cl-ki.uni-osnabrueck.de>
Subject: "flex scanner push-back overflow"
To: vern@ee.lbl.gov
Date: Thu, 20 Aug 1998 09:47:54 +0200 (MEST)
Reply-To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
X-NoJunk: Do NOT send commercial mail, spam or ads to this address!
X-URL: http://www.cl-ki.uni-osnabrueck.de/~georg/
X-Mailer: ELM [version 2.4ME+ PL28 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Hi Vern,

Yesterday, I encountered a strange problem: I use the macro processor m4
to include some lengthy lists into a .l file. Following is a flex macro
definition that causes some serious pain in my neck:

AUTHOR           ("A. Boucard / L. Boucard"|"A. Dastarac / M. Levent"|"A.Boucaud / L.Boucaud"|"Abderrahim Lamchichi"|"Achmat Dangor"|"Adeline Toullier"|"Adewale Maja-Pearce"|"Ahmed Ziri"|"Akram Ellyas"|"Alain Bihr"|"Alain Gresh"|"Alain Guillemoles"|"Alain Joxe"|"Alain Morice"|"Alain Renon"|"Alain Zecchini"|"Albert Memmi"|"Alberto Manguel"|"Alex De Waal"|"Alfonso Artico"| [...])

The complete list contains about 10kB. When I try to "flex" this file
(on a Solaris 2.6 machine, using a modified flex 2.5.4; I only increased
some of the predefined values in flexdef.h), I get the error:

myflex/flex -8  sentag.tmp.l
flex scanner push-back overflow

When I remove the slashes in the macro definition everything works fine.
As I understand it, the double quotes escape the slash-character so it
really means "/" and not "trailing context". Furthermore, I tried to
escape the slashes with backslashes, but to no avail: the same error message
appeared when flexing the code.

Do you have an idea what's going on here?

Greetings from Germany,
	Georg
--
Georg Rehm                                     georg@cl-ki.uni-osnabrueck.de
Institute for Semantic Information Processing, University of Osnabrueck, FRG
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-82
@unnumberedsec unnamed-faq-82
@example
@verbatim
To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
Subject: Re: "flex scanner push-back overflow"
In-reply-to: Your message of Thu, 20 Aug 1998 09:47:54 PDT.
Date: Thu, 20 Aug 1998 07:05:35 PDT
From: Vern Paxson <vern>

> myflex/flex -8  sentag.tmp.l
> flex scanner push-back overflow

Flex itself uses a flex scanner.  That scanner is running out of buffer
space when it tries to unput() the humongous macro you've defined.  When
you remove the '/'s, you make it small enough so that it fits in the buffer;
removing spaces would do the same thing.

The fix is either to rethink why you're using such a big macro (perhaps
there's another, better way to do it), or to rebuild flex's own scan.c
with a larger value for

	#define YY_BUF_SIZE 16384

- Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-83
@unnumberedsec unnamed-faq-83
@example
@verbatim
To: Jan Kort <jan@research.techforce.nl>
Subject: Re: Flex
In-reply-to: Your message of Fri, 04 Sep 1998 12:18:43 +0200.
Date: Sat, 05 Sep 1998 00:59:49 PDT
From: Vern Paxson <vern>

> %%
>
> "TEST1\n"       { fprintf(stderr, "TEST1\n"); yyless(5); }
> ^\n             { fprintf(stderr, "empty line\n"); }
> .               { }
> \n              { fprintf(stderr, "new line\n"); }
>
> %%
> -- input ---------------------------------------
> TEST1
> -- output --------------------------------------
> TEST1
> empty line
> ------------------------------------------------

IMHO, it's not clear whether or not this is in fact a bug.  It depends
on whether you view yyless() as backing up in the input stream, or as
pushing new characters onto the beginning of the input stream.  Flex
interprets it as the latter (for implementation convenience, I'll admit),
and so considers the newline as in fact matching at the beginning of a
line, as after all the last token scanned an entire line and so the
scanner is now at the beginning of a new line.

I agree that this is counter-intuitive for yyless(), given its
functional description (it's less so for unput(), depending on whether
you're unput()'ing new text or scanned text).  But I don't plan to
change it any time soon, as it's a pain to do so.  Consequently,
you do indeed need to use yy_set_bol() and YY_AT_BOL() to tweak
your scanner into the behavior you desire.

Sorry for the less-than-completely-satisfactory answer.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-84
@unnumberedsec unnamed-faq-84
@example
@verbatim
To: Patrick Krusenotto <krusenot@mac-info-link.de>
Subject: Re: Problems with restarting flex-2.5.2-generated scanner
In-reply-to: Your message of Thu, 24 Sep 1998 10:14:07 PDT.
Date: Thu, 24 Sep 1998 23:28:43 PDT
From: Vern Paxson <vern>

> I am using flex-2.5.2 and bison 1.25 for Solaris and I am desperately
> trying to make my scanner restart with a new file after my parser stops
> with a parse error. When my compiler restarts, the parser always
> receives the token after the token (in the old file!) that caused the
> parser error.

I suspect the problem is that your parser has read ahead in order
to attempt to resolve an ambiguity, and when it's restarted it picks
up with that token rather than reading a fresh one.  If you're using
yacc, then the special "error" production can sometimes be used to
consume tokens in an attempt to get the parser into a consistent state.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-85
@unnumberedsec unnamed-faq-85
@example
@verbatim
To: Henric Jungheim <junghelh@pe-nelson.com>
Subject: Re: flex 2.5.4a
In-reply-to: Your message of Tue, 27 Oct 1998 16:41:42 PST.
Date: Tue, 27 Oct 1998 16:50:14 PST
From: Vern Paxson <vern>

> This brings up a feature request:  How about a command line
> option to specify the filename when reading from stdin?  That way one
> doesn't need to create a temporary file in order to get the "#line"
> directives to make sense.

Use -o combined with -t (per the man page description of -o).

> P.S., Is there any simple way to use non-blocking IO to parse multiple
> streams?

Simple, no.

One approach might be to return a magic character on EWOULDBLOCK and
have a rule

	.*<magic-character>	// put back .*, eat magic character

This is off the top of my head, not sure it'll work.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-86
@unnumberedsec unnamed-faq-86
@example
@verbatim
To: "Repko, Billy D" <billy.d.repko@intel.com>
Subject: Re: Compiling scanners
In-reply-to: Your message of Wed, 13 Jan 1999 10:52:47 PST.
Date: Thu, 14 Jan 1999 00:25:30 PST
From: Vern Paxson <vern>

> It appears that maybe it cannot find the lfl library.

The Makefile in the distribution builds it, so you should have it.
It's exceedingly trivial, just a main() that calls yylex() and
a yywrap() that always returns 1.

> %%
>       \n      ++num_lines; ++num_chars;
>       .       ++num_chars;

You can't indent your rules like this - that's where the errors are coming
from.  Flex copies indented text to the output file, it's how you do things
like

	int num_lines_seen = 0;

to declare local variables.

		Vern
@end verbatim
@end example
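
The indentation rule can be illustrated with a small complete scanner.
This is only a sketch: the two counters come from the quoted fragment,
while @code{num_lines_seen}, the @code{%option noyywrap} line and the
@code{main()} function are assumptions added so that the file builds
without @code{-lfl}.  The rules start in column one; the one indented
line is copied into @code{yylex()} as a local declaration.

@example
@verbatim
%{
#include <stdio.h>
int num_lines = 0, num_chars = 0;
%}
%option noyywrap
%%
	int num_lines_seen = 0;	/* indented, so copied into yylex() as a local */
\n	++num_lines; ++num_lines_seen; ++num_chars;
.	++num_chars;
%%
int main(void)
{
	yylex();
	printf("# of lines = %d, # of chars = %d\n", num_lines, num_chars);
	return 0;
}
@end verbatim
@end example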

@c TODO: Evaluate this faq.
@node unnamed-faq-87
@unnumberedsec unnamed-faq-87
@example
@verbatim
To: Erick Branderhorst <Erick.Branderhorst@asml.nl>
Subject: Re: flex input buffer
In-reply-to: Your message of Tue, 09 Feb 1999 13:53:46 PST.
Date: Tue, 09 Feb 1999 21:03:37 PST
From: Vern Paxson <vern>

> In the flex.skl file the size of the default input buffers is set.  Can you
> explain why this size is set and why it is such a high number.

It's large to optimize performance when scanning large files.  You can
safely make it a lot lower if needed.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-88
@unnumberedsec unnamed-faq-88
@example
@verbatim
To: "Guido Minnen" <guidomi@cogs.susx.ac.uk>
Subject: Re: Flex error message
In-reply-to: Your message of Wed, 24 Feb 1999 15:31:46 PST.
Date: Thu, 25 Feb 1999 00:11:31 PST
From: Vern Paxson <vern>

> I'm extending a larger scanner written in Flex and I keep running into
> problems. More specifically, I get the error message:
> "flex: input rules are too complicated (>= 32000 NFA states)"

Increase the definitions in flexdef.h for:

#define JAMSTATE -32766 /* marks a reference to the state that always jams */
#define MAXIMUM_MNS 31999
#define BAD_SUBSCRIPT -32767

recompile everything, and it should all work.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-89
@unnumberedsec unnamed-faq-89
@example
@verbatim
To: John Victor J <vjohn@its.soft.net>
Subject: Re: flex---is thread safe
In-reply-to: Your message of Sun, 23 May 1999 12:56:56 +0530.
Date: Sun, 23 May 1999 00:32:53 PDT
From: Vern Paxson <vern>

>      I would like to know whether flex is thread safe???

I take it you mean the scanners it generates and not flex itself.

The answer is (still) No, except if you use the -+ option to generate
a C++ scanning class (and if your stream library is thread-safe).

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-90
@unnumberedsec unnamed-faq-90
@example
@verbatim
To: "Dmitriy Goldobin" <gold@ems.chel.su>
Subject: Re: FLEX trouble
In-reply-to: Your message of Mon, 31 May 1999 18:44:49 PDT.
Date: Tue, 01 Jun 1999 00:15:07 PDT
From: Vern Paxson <vern>

>   I have a trouble with FLEX. Why rule "/*".*"*/" work properly,
> but rule "/*"(.|\n)*"*/" don't work ?

The second of these will have to scan the entire input stream (because
"(.|\n)*" matches an arbitrary amount of any text) in order to see if
it ends with "*/", terminating the comment.  That potentially will overflow
the input buffer.

>   More complex rule "/*"([^*]|(\*/[^/]))*"*/ give an error
> 'unrecognized rule'.

You can't use the '/' operator inside parentheses.  It's not clear
what "(a/b)*" actually means.

>   I now use workaround with state <comment>, but single-rule is
> better, i think.

Single-rule is nice but will always have the problem of either setting
restrictions on comments (like not allowing multi-line comments) and/or
running the risk of consuming the entire input stream, as noted above.

		Vern
@end verbatim
@end example
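
The start-condition workaround mentioned in the question might look
like the following sketch.  The condition name @code{comment} is taken
from the question; the individual rules are an assumption about how one
would typically write it, not the correspondent's actual code.  No
single rule matches more than one line of a comment, so the scanner
never has to hold a whole comment in its buffer.

@example
@verbatim
%x comment
%option noyywrap
%%
"/*"			BEGIN(comment);
<comment>[^*\n]*	/* eat anything that is not a '*' */
<comment>"*"+[^*/\n]*	/* eat '*'s not followed by '/' */
<comment>\n		/* comments may span lines */
<comment>"*"+"/"	BEGIN(INITIAL);
@end verbatim
@end example

Text outside comments is not matched by any rule here, so the default
rule echoes it.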

@c TODO: Evaluate this faq.
@node unnamed-faq-91
@unnumberedsec unnamed-faq-91
@example
@verbatim
To: vern@ee.lbl.gov
Date: Tue, 15 Jun 1999 08:55:43 -0700
From: "Aki Niimura" <neko@my-deja.com>
Subject: A question on flex C++ scanner

Dear Dr. Paxson,

I have been using flex for years.
It works very well on many projects.
Most case, I used it to generate a scanner on C language.
However, one project I needed to generate a scanner
on C++ language. Thanks to your enhancement, flex did
the job.

Currently, I'm working on enhancing my previous project.
I need to deal with multiple input streams (recursive
inclusion) in this scanner (C++).
I did similar thing for another scanner (C) as you
explained in your documentation.

The generated scanner (C++) has necessary methods:
- switch_to_buffer(struct yy_buffer_state *b)
- yy_create_buffer(istream *is, int sz)
- yy_delete_buffer(struct yy_buffer_state *b)

However, I couldn't figure out how to access current
buffer (yy_current_buffer).

yy_current_buffer is a protected member of yyFlexLexer.
I can't access it directly.
Then, I thought yy_create_buffer() with is = 0 might
return current stream buffer. But it seems not as far
as I checked the source. (flex 2.5.4)

I went through the Web in addition to Flex documentation.
However, it hasn't been successful, so far.

It is not my intention to bother you, but, can you
comment about how to obtain the current stream buffer?

Your response would be highly appreciated.

Best regards,
Aki Niimura

@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-92
@unnumberedsec unnamed-faq-92
@example
@verbatim
To: neko@my-deja.com
Subject: Re: A question on flex C++ scanner
In-reply-to: Your message of Tue, 15 Jun 1999 08:55:43 PDT.
Date: Tue, 15 Jun 1999 09:04:24 PDT
From: Vern Paxson <vern>

> However, I couldn't figure out how to access current
> buffer (yy_current_buffer).

Derive your own subclass from yyFlexLexer.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-93
@unnumberedsec unnamed-faq-93
@example
@verbatim
To: "Stones, Darren" <Darren.Stones@nectech.co.uk>
Subject: Re: You're the man to see?
In-reply-to: Your message of Wed, 23 Jun 1999 11:10:29 PDT.
Date: Wed, 23 Jun 1999 09:01:40 PDT
From: Vern Paxson <vern>

> I hope you can help me.  I am using Flex and Bison to produce an interpreted
> language.  However all goes well until I try to implement an IF statement or
> a WHILE.  I cannot get this to work as the parser parses all the conditions
> eg. the TRUE and FALSE conditions to check for a rule match.  So I cannot
> make a decision!!

You need to use the parser to build a parse tree (= abstract syntax tree),
and when that's all done you recursively evaluate the tree, binding variables
to values at that time.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-94
@unnumberedsec unnamed-faq-94
@example
@verbatim
To: Petr Danecek <petr@ics.cas.cz>
Subject: Re: flex - question
In-reply-to: Your message of Mon, 28 Jun 1999 19:21:41 PDT.
Date: Fri, 02 Jul 1999 16:52:13 PDT
From: Vern Paxson <vern>

> file, it takes an enormous amount of time. It is funny, because the
> source code has only 12 rules!!! I think it looks like an exponential
> growth.

Right, that's the problem - some patterns (those with a lot of
ambiguity, which yours have because at any given time the scanner can
be in the middle of all sorts of combinations of the different
rules) blow up exponentially.

For your rules, there is an easy fix.  Change the ".*" that comes after
the directory name to "[^ ]*".  With that in place, the rules are no
longer nearly so ambiguous, because then once one of the directories
has been matched, no other can be matched (since they all require a
leading blank).

If that's not an acceptable solution, then you can enter a start state
to pick up the .*\n after each directory is matched.

Also note that for speed, you'll want to add a ".*" rule at the end,
otherwise input that doesn't match any of the patterns will be scanned
very slowly, a character at a time.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-95
@unnumberedsec unnamed-faq-95
@example
@verbatim
To: Tielman Koekemoer <tielman@spi.co.za>
Subject: Re: Please help.
In-reply-to: Your message of Thu, 08 Jul 1999 13:20:37 PDT.
Date: Thu, 08 Jul 1999 08:20:39 PDT
From: Vern Paxson <vern>

> I was hoping you could help me with my problem.
>
> I tried compiling (gnu)flex on a Solaris 2.4 machine
> but when I ran make (after configure) I got an error.
>
> --------------------------------------------------------------
> gcc -c -I. -I. -g -O parse.c
> ./flex -t -p  ./scan.l >scan.c
> sh: ./flex: not found
> *** Error code 1
> make: Fatal error: Command failed for target `scan.c'
> -------------------------------------------------------------
>
> What's strange to me is that I'm only
> trying to install flex now. I then edited the Makefile
> and changed where it says "FLEX = flex" to "FLEX = lex"
> ( lex: the native Solaris one ) but then it complains about
> the "-p" option. Is there any way I can compile flex without
> using flex or lex?
>
> Thanks so much for your time.

You managed to step on the bootstrap sequence, which first copies
initscan.c to scan.c in order to build flex.  Try fetching a fresh
distribution from ftp.ee.lbl.gov.  (Or you can first try removing
".bootstrap" and doing a make again.)

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-96
@unnumberedsec unnamed-faq-96
@example
@verbatim
To: Tielman Koekemoer <tielman@spi.co.za>
Subject: Re: Please help.
In-reply-to: Your message of Fri, 09 Jul 1999 09:16:14 PDT.
Date: Fri, 09 Jul 1999 00:27:20 PDT
From: Vern Paxson <vern>

> First I removed .bootstrap (and ran make) - no luck. I downloaded the
> software but I still have the same problem. Is there anything else I
> could try.

Try:

	cp initscan.c scan.c
	touch scan.c
	make scan.o

If this last tries to first build scan.c from scan.l using ./flex, then
your "make" is broken, in which case compile scan.c to scan.o by hand.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-97
@unnumberedsec unnamed-faq-97
@example
@verbatim
To: Sumanth Kamenani <skamenan@crl.nmsu.edu>
Subject: Re: Error
In-reply-to: Your message of Mon, 19 Jul 1999 23:08:41 PDT.
Date: Tue, 20 Jul 1999 00:18:26 PDT
From: Vern Paxson <vern>

> I am getting a compilation error. The error is given as "unknown symbol- yylex".

The parser relies on calling yylex(), but you're instead using the C++ scanning
class, so you need to supply a yylex() "glue" function that calls an instance
"scanner" of the scanner class (e.g., "scanner->yylex()").

		Vern
@end verbatim
@end example
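
A glue function of the kind described might be sketched as follows.
The two rules and the token code @code{1} are placeholders, and keeping
a single function-local @code{yyFlexLexer} instance is an assumption;
a real scanner would return the token codes from the parser's header
and might manage the instance differently.

@example
@verbatim
%option c++ noyywrap
%%
[0-9]+	return 1;	/* hypothetical token code */
.|\n	;		/* skip everything else */
%%
/* Glue so that a parser expecting a global yylex() can call the class. */
int yylex(void)
{
	static yyFlexLexer scanner;	/* single shared instance: an assumption */
	return scanner.yylex();
}
@end verbatim
@end example

If the parser is compiled as C rather than C++, the glue function would
also need C linkage (@code{extern "C"}).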

@c TODO: Evaluate this faq.
@node unnamed-faq-98
@unnumberedsec unnamed-faq-98
@example
@verbatim
To: daniel@synchrods.synchrods.COM (Daniel Senderowicz)
Subject: Re: lex
In-reply-to: Your message of Mon, 22 Nov 1999 11:19:04 PST.
Date: Tue, 23 Nov 1999 15:54:30 PST
From: Vern Paxson <vern>

Well, your problem is the

switch (yybgin-yysvec-1) {      /* witchcraft */

at the beginning of lex rules.  "witchcraft" == "non-portable".  It's
assuming knowledge of the AT&T lex's internal variables.

For flex, you can probably do the equivalent using a switch on YYSTATE.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-99
@unnumberedsec unnamed-faq-99
@example
@verbatim
To: archow@hss.hns.com
Subject: Re: Regarding distribution of flex and yacc based grammars
In-reply-to: Your message of Sun, 19 Dec 1999 17:50:24 +0530.
Date: Wed, 22 Dec 1999 01:56:24 PST
From: Vern Paxson <vern>

> When we provide the customer with an object code distribution, is it
> necessary for us to provide source
> for the generated C files from flex and bison since they are generated by
> flex and bison ?

For flex, no.  I don't know what the current state of this is for bison.

> Also, is there any requirement for us to necessarily provide source for
> the grammar files which are fed into flex and bison ?

Again, for flex, no.

See the file "COPYING" in the flex distribution for the legalese.

		Vern
@end verbatim
@end example

@c TODO: Evaluate this faq.
@node unnamed-faq-100
@unnumberedsec unnamed-faq-100
@example
@verbatim
To: Martin Gallwey <gallweym@hyperion.moe.ul.ie>
Subject: Re: Flex, and self referencing rules
In-reply-to: Your message of Sun, 20 Feb 2000 01:01:21 PST.
Date: Sat, 19 Feb 2000 18:33:16 PST
From: Vern Paxson <vern>

> However, I do not use unput anywhere. I do use self-referencing
> rules like this:
>
> UnaryExpr               ({UnionExpr})|("-"{UnaryExpr})

You can't do this - flex is *not* a parser like yacc (which does indeed
allow recursion), it is a scanner that's confined to regular expressions.

		Vern
@end verbatim
@end example
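
This particular recursion happens to be expressible as a repetition:
zero or more minus signs followed by a @code{UnionExpr}.  A sketch,
assuming a placeholder definition of @code{UnionExpr} (the original is
not shown in the message) and a hypothetical action:

@example
@verbatim
Digit		[0-9]
UnionExpr	{Digit}+
UnaryExpr	"-"*{UnionExpr}
%%
{UnaryExpr}	printf("unary expression: %s\n", yytext);
@end verbatim
@end example

This only works when the recursive reference sits at one end of the
definition.  Nested constructs such as balanced parentheses are not
regular and belong in the parser.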

@c TODO: Evaluate this faq.
@node unnamed-faq-101
@unnumberedsec unnamed-faq-101
@example
@verbatim
To: slg3@lehigh.edu (SAMUEL L. GULDEN)
Subject: Re: Flex problem
In-reply-to: Your message of Thu, 02 Mar 2000 12:29:04 PST.
Date: Thu, 02 Mar 2000 23:00:46 PST
From: Vern Paxson <vern>

If this is exactly your program:

> digit [0-9]
> digits {digit}+
> whitespace [ \t\n]+
>
> %%
> "[" { printf("open_brac\n");}
> "]" { printf("close_brac\n");}
> "+" { printf("addop\n");}
> "*" { printf("multop\n");}
> {digits} { printf("NUMBER = %s\n", yytext);}
> whitespace ;

then the problem is that the last rule needs to be "{whitespace}" !

		Vern
@end verbatim
@end example
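
Without the braces, @samp{whitespace} is an ordinary pattern that
matches the ten literal characters of that word, so blanks and newlines
fall through to the default rule and are echoed.  The corrected rules
section, with the definitions unchanged from the message, would read:

@example
@verbatim
digit		[0-9]
digits		{digit}+
whitespace	[ \t\n]+

%%
"["		{ printf("open_brac\n"); }
"]"		{ printf("close_brac\n"); }
"+"		{ printf("addop\n"); }
"*"		{ printf("multop\n"); }
{digits}	{ printf("NUMBER = %s\n", yytext); }
{whitespace}	;
@end verbatim
@end example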