This is flex.info, produced by makeinfo version 4.5 from flex.texi.

INFO-DIR-SECTION Programming
START-INFO-DIR-ENTRY
* flex: (flex).      Fast lexical analyzer generator (lex replacement).
END-INFO-DIR-ENTRY


   The flex manual is placed under the same licensing conditions as the
rest of flex:

   Copyright (C) 1990, 1997 The Regents of the University of California.
All rights reserved.

   This code is derived from software contributed to Berkeley by Vern
Paxson.

   The United States Government has rights in this work pursuant to
contract no. DE-AC03-76SF00098 between the United States Department of
Energy and the University of California.

   Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

  1.  Redistributions of source code must retain the above copyright
     notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright
     notice, this list of conditions and the following disclaimer in the
     documentation and/or other materials provided with the
     distribution.
   Neither the name of the University nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.

   THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

File: flex.info,  Node: Debugging Options,  Next: Miscellaneous Options,  Prev: Options for Scanner Speed and Size,  Up: Scanner Options

Debugging Options
=================

`-b, --backup, `%option backup''
     Generate backing-up information to `lex.backup'.  This is a list of
     scanner states which require backing up and the input characters on
     which they do so.  By adding rules one can remove backing-up
     states.  If _all_ backing-up states are eliminated and `-Cf' or
     `-CF' is used, the generated scanner will run faster (see the
     `--perf-report' flag).  Only users who wish to squeeze every last
     cycle out of their scanners need worry about this option.  (*note
     Performance::).

`-d, --debug, `%option debug''
     makes the generated scanner run in "debug" mode.  Whenever a
     pattern is recognized and the global variable `yy_flex_debug' is
     non-zero (which is the default), the scanner will write to
     `stderr' a line of the form:


              --accepting rule at line 53 ("the matched text")

     The line number refers to the location of the rule in the file
     defining the scanner (i.e., the file that was fed to flex).
     Messages are also generated when the scanner backs up, accepts the
     default rule, reaches the end of its input buffer (or encounters a
     NUL; at this point, the two look the same as far as the scanner's
     concerned), or reaches an end-of-file.

`-p, --perf-report, `%option perf-report''
     generates a performance report to `stderr'.  The report consists of
     comments regarding features of the `flex' input file which will
     cause a serious loss of performance in the resulting scanner.  If
     you give the flag twice, you will also get comments regarding
     features that lead to minor performance losses.

     Note that the use of `REJECT' and of variable trailing context
     (*note Limitations::) entails a substantial performance penalty;
     use of `yymore()', the `^' operator, and the `--interactive' flag
     entails minor performance penalties.

`-s, --nodefault, `%option nodefault''
     causes the _default rule_ (that unmatched scanner input is echoed
     to `stdout') to be suppressed.  If the scanner encounters input
     that does not match any of its rules, it aborts with an error.
     This option is useful for finding holes in a scanner's rule set.

`-T, --trace, `%option trace''
     makes `flex' run in "trace" mode.  It will generate a lot of
     messages to `stderr' concerning the form of the input and the
     resultant non-deterministic and deterministic finite automata.
     This option is mostly for use in maintaining `flex'.

`-w, --nowarn, `%option nowarn''
     suppresses warning messages.

`-v, --verbose, `%option verbose''
     specifies that `flex' should write to `stderr' a summary of
     statistics regarding the scanner it generates.  Most of the
     statistics are meaningless to the casual `flex' user, but the
     first line identifies the version of `flex' (same as reported by
     `--version'), and the next line the flags used when generating the
     scanner, including those that are on by default.

`--warn, `%option warn''
     warns about certain things.  In particular, if the default rule can
     be matched but no default rule has been given, `flex' will warn
     you.  We recommend always using this option.



File: flex.info,  Node: Miscellaneous Options,  Prev: Debugging Options,  Up: Scanner Options

Miscellaneous Options
=====================

`-c'
     is a do-nothing option included for POSIX compliance.

`-h, -?, --help'
     generates a "help" summary of `flex''s options to `stdout' and
     then exits.

`-n'
     is another do-nothing option included only for POSIX compliance.

`-V, --version'
     prints the version number to `stdout' and exits.



File: flex.info,  Node: Performance,  Next: Cxx,  Prev: Scanner Options,  Up: Top

Performance Considerations
**************************

   The main design goal of `flex' is that it generate high-performance
scanners.  It has been optimized for dealing well with large sets of
rules.  Aside from the effects on scanner speed of the table compression
`-C' options outlined above, there are a number of options/actions
which degrade performance.  These are, from most expensive to least:


         REJECT
         arbitrary trailing context
     
         pattern sets that require backing up
         %option yylineno
         %array
     
         %option interactive
         %option always-interactive
     
         `^' beginning-of-line operator
         yymore()

   with the first two being quite expensive and the last two being
quite cheap.  Note also that `unput()' is implemented as a routine call
that potentially does quite a bit of work, while `yyless()' is a
quite-cheap macro.  So if you are just putting back some excess text you
scanned, use `yyless()'.

   `REJECT' should be avoided at all costs when performance is
important.  It is a particularly expensive option.

   There is one case when `%option yylineno' can be expensive. That is
when your patterns match long tokens that could _possibly_ contain a
newline character. There is no performance penalty for rules that can
not possibly match newlines, since flex does not need to check them for
newlines.  In general, you should avoid rules such as `[^f]+', which
match very long tokens, including newlines, and may possibly match your
entire file! A better approach is to separate `[^f]+' into two rules:


     %option yylineno
     %%
         [^f\n]+
         \n+

   The above scanner does not incur a performance penalty.
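
   To see where the cost comes from, the following C sketch shows the
kind of per-token check that line counting implies for a rule that
could match a newline.  It is only an illustration: `count_newlines' is
a hypothetical helper, not a `flex' function, and the code in a
generated scanner may well differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of flex): counts the newlines in a
   matched token of `len` characters.  A scanner that tracks line
   numbers has to do work of this kind for every token matched by a
   rule that could contain a newline. */
static size_t count_newlines(const char *text, size_t len)
{
    size_t i;
    size_t lines = 0;

    assert(text != NULL);
    for (i = 0; i < len; ++i) {
        if (text[i] == '\n')
            ++lines;
    }
    return lines;
}
```

   The work grows with the length of the token, which is why a rule
such as `[^f\n]+', which cannot contain a newline, needs no such check
at all.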

   Getting rid of backing up is messy and can often be an enormous
amount of work for a complicated scanner.  In principle, one begins by
using the `-b' flag to generate a `lex.backup' file.  For example, on
the input:


         %%
         foo        return TOK_KEYWORD;
         foobar     return TOK_KEYWORD;

   the file looks like:


         State #6 is non-accepting -
          associated rule line numbers:
                2       3
          out-transitions: [ o ]
          jam-transitions: EOF [ \001-n  p-\177 ]
     
         State #8 is non-accepting -
          associated rule line numbers:
                3
          out-transitions: [ a ]
          jam-transitions: EOF [ \001-`  b-\177 ]
     
         State #9 is non-accepting -
          associated rule line numbers:
                3
          out-transitions: [ r ]
          jam-transitions: EOF [ \001-q  s-\177 ]
     
         Compressed tables always back up.

   The first few lines tell us that there's a scanner state in which it
can make a transition on an 'o' but not on any other character, and
that in that state the currently scanned text does not match any rule.
The state occurs when trying to match the rules found at lines 2 and 3
in the input file.  If the scanner is in that state and then reads
something other than an 'o', it will have to back up to find a rule
which is matched.  With a bit of headscratching one can see that this
must be the state it's in when it has seen `fo'.  When this has
happened, if anything other than another `o' is seen, the scanner will
have to back up to simply match the `f' (by the default rule).

   The comment regarding State #8 indicates there's a problem when
`foob' has been scanned.  Indeed, on any character other than an `a',
the scanner will have to back up to accept "foo".  Similarly, the
comment for State #9 concerns when `fooba' has been scanned and an `r'
does not follow.

   The final comment reminds us that there's no point going to all the
trouble of removing backing up from the rules unless we're using `-Cf'
or `-CF', since there's no performance gain doing so with compressed
scanners.
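
   The states reported in `lex.backup' can also be modelled by hand.
The following C sketch is not `flex' output: `match_token' is a
hypothetical function that imitates a longest-match scanner for the two
rules `foo' and `foobar' plus the default rule, and reports how many
characters it read beyond the text it finally accepts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model (not flex code) of longest-match scanning for the
   rules "foo" and "foobar" plus the default single-character rule.
   Returns the length of the accepted token and stores in *backed_up
   the number of characters that were consumed past the accepted token
   and have to be given back to the input. */
static size_t match_token(const char *s, size_t *backed_up)
{
    static const char longest[] = "foobar";
    size_t seen = 0;          /* characters consumed so far */
    size_t last_accept = 0;   /* length at the last accepting state */

    assert(s != NULL && backed_up != NULL);
    *backed_up = 0;
    if (s[0] == '\0')
        return 0;             /* end of input, nothing to match */
    if (s[0] != 'f')
        return 1;             /* default rule, nothing read ahead */

    while (seen < sizeof longest - 1 && s[seen] == longest[seen]) {
        ++seen;
        /* Accepting after "f" (default rule), "foo" and "foobar". */
        if (seen == 1 || seen == 3 || seen == 6)
            last_accept = seen;
    }
    /* The next character continues no rule (a jam transition), so
       everything consumed after the last accepting state is undone. */
    *backed_up = seen - last_accept;
    return last_accept;
}
```

   For the input `fooba!' the sketch accepts `foo' and gives back two
characters, which corresponds to the situation reported for State #9.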

   The way to remove the backing up is to add "error" rules:


         %%
         foo         return TOK_KEYWORD;
         foobar      return TOK_KEYWORD;
     
         fooba       |
         foob        |
         fo          {
                     /* false alarm, not really a keyword */
                     return TOK_ID;
                     }

   Eliminating backing up among a list of keywords can also be done
using a "catch-all" rule:


         %%
         foo         return TOK_KEYWORD;
         foobar      return TOK_KEYWORD;
     
         [a-z]+      return TOK_ID;

   This is usually the best solution when appropriate.

   Backing up messages tend to cascade.  With a complicated set of rules
it's not uncommon to get hundreds of messages.  If one can decipher
them, though, it often only takes a dozen or so rules to eliminate the
backing up (though it's easy to make a mistake and have an error rule
accidentally match a valid token.  A possible future `flex' feature
will be to automatically add rules to eliminate backing up).

   It's important to keep in mind that you gain the benefits of
eliminating backing up only if you eliminate _every_ instance of
backing up.  Leaving just one means you gain nothing.

   _Variable_ trailing context (where both the leading and trailing
parts do not have a fixed length) entails almost the same performance
loss as `REJECT' (i.e., substantial).  So when possible a rule like:


         %%
         mouse|rat/(cat|dog)   run();

   is better written:


         %%
         mouse/cat|dog         run();
         rat/cat|dog           run();

   or as


         %%
         mouse|rat/cat         run();
         mouse|rat/dog         run();

   Note that here the special '|' action does _not_ provide any
savings, and can even make things worse (*note Limitations::).

   Another area where the user can increase a scanner's performance (and
one that's easier to implement) arises from the fact that the longer the
tokens matched, the faster the scanner will run.  This is because with
long tokens the processing of most input characters takes place in the
(short) inner scanning loop, and does not often have to go through the
additional work of setting up the scanning environment (e.g., `yytext')
for the action.  Recall the scanner for C comments:


         %x comment
         %%
                 int line_num = 1;
     
         "/*"         BEGIN(comment);
     
         <comment>[^*\n]*
         <comment>"*"+[^*/\n]*
         <comment>\n             ++line_num;
         <comment>"*"+"/"        BEGIN(INITIAL);

   This could be sped up by writing it as:


         %x comment
         %%
                 int line_num = 1;
     
         "/*"         BEGIN(comment);
     
         <comment>[^*\n]*
         <comment>[^*\n]*\n      ++line_num;
         <comment>"*"+[^*/\n]*
         <comment>"*"+[^*/\n]*\n ++line_num;
         <comment>"*"+"/"        BEGIN(INITIAL);

   Now instead of each newline requiring the processing of another
action, recognizing the newlines is distributed over the other rules to
keep the matched text as long as possible.  Note that _adding_ rules
does _not_ slow down the scanner!  The speed of the scanner is
independent of the number of rules or (modulo the considerations given
at the beginning of this section) how complicated the rules are with
regard to operators such as `*' and `|'.
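
   The saving can be estimated by counting how many actions each rule
set runs for a given comment body.  The following C sketch does this by
hand; `count_actions' is a hypothetical function written for this
illustration and only imitates the `<comment>' rules of the two
scanners above, it is not produced by `flex'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model (not flex code): counts the actions run while
   scanning `body`, the text that follows the opening of a C comment,
   up to and including the closing star-slash.  With merge_newlines
   equal to 0 it imitates the first rule set, where each newline is a
   token of its own; with a non-zero value it imitates the second rule
   set, where a trailing newline joins the preceding token. */
static int count_actions(const char *body, int merge_newlines)
{
    size_t p = 0;
    int actions = 0;

    assert(body != NULL);
    while (body[p] != '\0') {
        ++actions;
        if (body[p] == '\n') {         /* a newline with nothing before it */
            ++p;
            continue;
        }
        if (body[p] == '*') {
            while (body[p] == '*')
                ++p;
            if (body[p] == '/')        /* stars followed by a slash */
                return actions;
            while (body[p] != '\0' && body[p] != '*' &&
                   body[p] != '/' && body[p] != '\n')
                ++p;
        } else {
            while (body[p] != '\0' && body[p] != '*' && body[p] != '\n')
                ++p;
        }
        if (merge_newlines && body[p] == '\n')
            ++p;                       /* newline joins this token */
    }
    return actions;
}
```

   For a body consisting of two lines of text followed by the closing
delimiter, the first rule set runs five actions in this model and the
second runs three.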

   A final example in speeding up a scanner: suppose you want to scan
through a file containing identifiers and keywords, one per line and
with no other extraneous characters, and recognize all the keywords.  A
natural first approach is:


         %%
         asm      |
         auto     |
         break    |
         ... etc ...
         volatile |
         while    /* it's a keyword */
     
         .|\n     /* it's not a keyword */

   To eliminate the back-tracking, introduce a catch-all rule:


         %%
         asm      |
         auto     |
         break    |
         ... etc ...
         volatile |
         while    /* it's a keyword */
     
         [a-z]+   |
         .|\n     /* it's not a keyword */

   Now, if it's guaranteed that there's exactly one word per line, then
we can reduce the total number of matches by a half by merging in the
recognition of newlines with that of the other tokens:


         %%
         asm\n    |
         auto\n   |
         break\n  |
         ... etc ...
         volatile\n |
         while\n  /* it's a keyword */
     
         [a-z]+\n |
         .|\n     /* it's not a keyword */

   One has to be careful here, as we have now reintroduced backing up
into the scanner.  In particular, while _we_ know that there will never
be any characters in the input stream other than letters or newlines,
`flex' can't figure this out, and it will plan for possibly needing to
back up when it has scanned a token like `auto' and then the next
character is something other than a newline or a letter.  Previously it
would then just match the `auto' rule and be done, but now it has no
`auto' rule, only an `auto\n' rule.  To eliminate the possibility of
backing up, we could either duplicate all rules but without final
newlines, or, since we never expect to encounter such an input and
therefore don't care how it's classified, we can introduce one more
catch-all rule, this one which doesn't include a newline:


         %%
         asm\n    |
         auto\n   |
         break\n  |
         ... etc ...
         volatile\n |
         while\n  /* it's a keyword */
     
         [a-z]+\n |
         [a-z]+   |
         .|\n     /* it's not a keyword */

   Compiled with `-Cf', this is about as fast as one can get a `flex'
scanner to go for this particular problem.

   A final note: `flex' is slow when matching `NUL's, particularly when
a token contains multiple `NUL's.  It's best to write rules which match
_short_ amounts of text if it's anticipated that the text will often
include `NUL's.

   Another final note regarding performance: as mentioned in *Note
Matching::, dynamically resizing `yytext' to accommodate huge tokens is
a slow process because it presently requires that the (huge) token be
rescanned from the beginning.  Thus if performance is vital, you should
attempt to match "large" quantities of text but not "huge" quantities,
where the cutoff between the two is at about 8K characters per token.


File: flex.info,  Node: Cxx,  Next: Reentrant,  Prev: Performance,  Up: Top

Generating C++ Scanners
***********************

   *IMPORTANT*: the present form of the scanning class is _experimental_
and may change considerably between major releases.

   `flex' provides two different ways to generate scanners for use with
C++.  The first way is to simply compile a scanner generated by `flex'
using a C++ compiler instead of a C compiler.  You should not encounter
any compilation errors (*note Reporting Bugs::).  You can then use C++
code in your rule actions instead of C code.  Note that the default
input source for your scanner remains `yyin', and default echoing is
still done to `yyout'.  Both of these remain `FILE *' variables and not
C++ _streams_.

   You can also use `flex' to generate a C++ scanner class, using the
`-+' option (or, equivalently, `%option c++'), which is automatically
specified if the name of the `flex' executable ends in a '+', such as
`flex++'.  When using this option, `flex' defaults to generating the
scanner to the file `lex.yy.cc' instead of `lex.yy.c'.  The generated
scanner includes the header file `FlexLexer.h', which defines the
interface to two C++ classes.

   The first class, `FlexLexer', provides an abstract base class
defining the general scanner class interface.  It provides the
following member functions:

`const char* YYText()'
     returns the text of the most recently matched token, the
     equivalent of `yytext'.

`int YYLeng()'
     returns the length of the most recently matched token, the
     equivalent of `yyleng'.

`int lineno() const'
     returns the current input line number (see `%option yylineno'), or
     `1' if `%option yylineno' was not used.

`void set_debug( int flag )'
     sets the debugging flag for the scanner, equivalent to assigning to
     `yy_flex_debug' (*note Scanner Options::).  Note that you must
     build the scanner using `%option debug' to include debugging
     information in it.

`int debug() const'
     returns the current setting of the debugging flag.

   Also provided are member functions equivalent to
`yy_switch_to_buffer()', `yy_create_buffer()' (though the first
argument is an `istream*' object pointer and not a `FILE*'),
`yy_flush_buffer()', `yy_delete_buffer()', and `yyrestart()' (again,
the first argument is an `istream*' object pointer).

   The second class defined in `FlexLexer.h' is `yyFlexLexer', which is
derived from `FlexLexer'.  It defines the following additional member
functions:

`yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )'
     constructs a `yyFlexLexer' object using the given streams for input
     and output.  If not specified, the streams default to `cin' and
     `cout', respectively.

`virtual int yylex()'
     performs the same role as `yylex()' does for ordinary `flex'
     scanners: it scans the input stream, consuming tokens, until a
     rule's action returns a value.  If you derive a subclass `S' from
     `yyFlexLexer' and want to access the member functions and variables
     of `S' inside `yylex()', then you need to use `%option
     yyclass="S"' to inform `flex' that you will be using that subclass
     instead of `yyFlexLexer'.  In this case, rather than generating
     `yyFlexLexer::yylex()', `flex' generates `S::yylex()' (and also
     generates a dummy `yyFlexLexer::yylex()' that calls
     `yyFlexLexer::LexerError()' if called).

`virtual void switch_streams(istream* new_in = 0, ostream* new_out = 0)'
     reassigns `yyin' to `new_in' (if non-null) and `yyout' to
     `new_out' (if non-null), deleting the previous input buffer if
     `yyin' is reassigned.

`int yylex( istream* new_in, ostream* new_out = 0 )'
     first switches the input streams via `switch_streams( new_in,
     new_out )' and then returns the value of `yylex()'.

   In addition, `yyFlexLexer' defines the following protected virtual
functions which you can redefine in derived classes to tailor the
scanner:

`virtual int LexerInput( char* buf, int max_size )'
     reads up to `max_size' characters into `buf' and returns the
     number of characters read.  To indicate end-of-input, return 0
     characters.  Note that `interactive' scanners (see the `-B' and
     `-I' flags in *Note Scanner Options::) define the macro
     `YY_INTERACTIVE'.  If you redefine `LexerInput()' and need to take
     different actions depending on whether or not the scanner might be
     scanning an interactive input source, you can test for the
     presence of this name via `#ifdef' statements.

`virtual void LexerOutput( const char* buf, int size )'
     writes out `size' characters from the buffer `buf', which, while
     `NUL'-terminated, may also contain internal `NUL's if the
     scanner's rules can match text with `NUL's in them.

`virtual void LexerError( const char* msg )'
     reports a fatal error message.  The default version of this
     function writes the message to the stream `cerr' and exits.

   Note that a `yyFlexLexer' object contains its _entire_ scanning
state.  Thus you can use such objects to create reentrant scanners, but
see also *Note Reentrant::.  You can instantiate multiple instances of
the same `yyFlexLexer' class, and you can also combine multiple C++
scanner classes together in the same program using the `-P' option
discussed above.

   Finally, note that the `%array' feature is not available to C++
scanner classes; you must use `%pointer' (the default).

   Here is an example of a simple C++ scanner:


         // An example of using the flex C++ scanner class.
     
         %{
         int mylineno = 0;
         %}
     
         string  \"[^\n"]+\"
     
         ws      [ \t]+
     
         alpha   [A-Za-z]
         dig     [0-9]
         name    ({alpha}|{dig}|\$)({alpha}|{dig}|[_.\-/$])*
         num1    [-+]?{dig}+\.?([eE][-+]?{dig}+)?
         num2    [-+]?{dig}*\.{dig}+([eE][-+]?{dig}+)?
         number  {num1}|{num2}
     
         %%
     
         {ws}    /* skip blanks and tabs */
     
         "/*"    {
                 int c;
     
                 while((c = yyinput()) != 0)
                     {
                     if(c == '\n')
                         ++mylineno;
     
                     else if(c == '*')
                         {
                         if((c = yyinput()) == '/')
                             break;
                         else
                             unput(c);
                         }
                     }
                 }
     
         {number}  cout << "number " << YYText() << '\n';
     
         \n        mylineno++;
     
         {name}    cout << "name " << YYText() << '\n';
     
         {string}  cout << "string " << YYText() << '\n';
     
         %%
     
         int main( int /* argc */, char** /* argv */ )
             {
             FlexLexer* lexer = new yyFlexLexer;
             while(lexer->yylex() != 0)
                 ;
             return 0;
             }

   If you want to create multiple (different) lexer classes, you use the
`-P' flag (or the `prefix=' option) to rename each `yyFlexLexer' to
some other `xxFlexLexer'.  You then can include `<FlexLexer.h>' in your
other sources once per lexer class, first renaming `yyFlexLexer' as
follows:


         #undef yyFlexLexer
         #define yyFlexLexer xxFlexLexer
         #include <FlexLexer.h>
     
         #undef yyFlexLexer
         #define yyFlexLexer zzFlexLexer
         #include <FlexLexer.h>

   if, for example, you used `%option prefix="xx"' for one of your
scanners and `%option prefix="zz"' for the other.


File: flex.info,  Node: Reentrant,  Next: Lex and Posix,  Prev: Cxx,  Up: Top

Reentrant C Scanners
********************

   `flex' has the ability to generate a reentrant C scanner.  This is
accomplished by specifying `%option reentrant' (`-R').  The generated
scanner is both portable and safe to use in one or more separate
threads of control.  The most common use for reentrant scanners is from
within multi-threaded applications.  Any thread may create and execute
a reentrant `flex' scanner without the need for synchronization with
other threads.

* Menu:

* Reentrant Uses::
* Reentrant Overview::
* Reentrant Example::
* Reentrant Detail::
* Reentrant Functions::


File: flex.info,  Node: Reentrant Uses,  Next: Reentrant Overview,  Prev: Reentrant,  Up: Reentrant

Uses for Reentrant Scanners
===========================

   However, there are other uses for a reentrant scanner.  For example,
you could scan two or more files simultaneously to implement a `diff' at
the token level (i.e., instead of at the character level):


         /* Example of maintaining more than one active scanner. */
     
         int tok1, tok2;
     
         do {
             tok1 = yylex( scanner_1 );
             tok2 = yylex( scanner_2 );
     
             if( tok1 != tok2 )
                 printf("Files are different.");
     
         } while ( tok1 && tok2 );

   Another use for a reentrant scanner is recursion.  (Note that a
recursive scanner can also be created using a non-reentrant scanner and
buffer states. *Note Multiple Input Buffers::.)

   The following crude scanner supports the `eval' command by invoking
another instance of itself.


         /* Example of recursive invocation. */
     
         %option reentrant
     
         %%
         "eval(".+")"  {
                           yyscan_t scanner;
                           YY_BUFFER_STATE buf;
     
                           yylex_init( &scanner );
                           yytext[yyleng-1] = ' ';
     
                           buf = yy_scan_string( yytext + 5, scanner );
                           yylex( scanner );
     
                           yy_delete_buffer(buf,scanner);
                           yylex_destroy( scanner );
                      }
         ...
         %%


File: flex.info,  Node: Reentrant Overview,  Next: Reentrant Example,  Prev: Reentrant Uses,  Up: Reentrant

An Overview of the Reentrant API
================================

   The API for reentrant scanners is different from that for
non-reentrant scanners.  Here is a quick overview of the API:

   * `%option reentrant' must be specified.

   * All functions take one additional argument: `yyscanner'

   * All global variables are replaced by their macro equivalents.  (We
     tell you this because it may be important to you during debugging.)

   * `yylex_init' and `yylex_destroy' must be called before and after
     `yylex', respectively.

   * Accessor methods (get/set functions) provide access to common
     `flex' variables.

   * User-specific data can be stored in `yyextra'.


File: flex.info,  Node: Reentrant Example,  Next: Reentrant Detail,  Prev: Reentrant Overview,  Up: Reentrant

Reentrant Example
=================

   First, an example of a reentrant scanner:

         /* This scanner prints "//" comments. */
         %option reentrant stack
         %x COMMENT
         %%
         "//"                 yy_push_state( COMMENT, yyscanner);
         .|\n
         <COMMENT>\n          yy_pop_state( yyscanner );
         <COMMENT>[^\n]+      fprintf( yyout, "%s\n", yytext);
         %%
         int main ( int argc, char * argv[] )
         {
             yyscan_t scanner;
     
             yylex_init ( &scanner );
             yylex ( scanner );
             yylex_destroy ( scanner );
             return 0;
         }


File: flex.info,  Node: Reentrant Detail,  Next: Reentrant Functions,  Prev: Reentrant Example,  Up: Reentrant

The Reentrant API in Detail
===========================

   Here are the things you need to do or know to use the reentrant C
API of `flex'.

* Menu:

* Specify Reentrant::
* Extra Reentrant Argument::
* Global Replacement::
* Init and Destroy Functions::
* Accessor Methods::
* Extra Data::
* About yyscan_t::


File: flex.info,  Node: Specify Reentrant,  Next: Extra Reentrant Argument,  Prev: Reentrant Detail,  Up: Reentrant Detail

Declaring a Scanner As Reentrant
--------------------------------

   `%option reentrant' (`--reentrant') must be specified.

   Notice that `%option reentrant' is specified in the above example
(*note Reentrant Example::).  Had this option not been specified, `flex'
would have happily generated a non-reentrant scanner without
complaining. You may explicitly specify `%option noreentrant', if you
do _not_ want a reentrant scanner, although it is not necessary. The
default is to generate a non-reentrant scanner.


File: flex.info,  Node: Extra Reentrant Argument,  Next: Global Replacement,  Prev: Specify Reentrant,  Up: Reentrant Detail

The Extra Argument
------------------

   All functions take one additional argument: `yyscanner'.

   Notice that the calls to `yy_push_state' and `yy_pop_state' both
have an argument, `yyscanner', that is not present in a non-reentrant
scanner.  Here are the declarations of `yy_push_state' and
`yy_pop_state' in the generated scanner:


         static void yy_push_state  ( int new_state , yyscan_t yyscanner ) ;
         static void yy_pop_state  ( yyscan_t yyscanner  ) ;

   Notice that the argument `yyscanner' appears in the declaration of
both functions.  In fact, all `flex' functions in a reentrant scanner
have this additional argument.  It is always the last argument in the
argument list, it is always of type `yyscan_t' (which is typedef'd to
`void *') and it is always named `yyscanner'.  As you may have guessed,
`yyscanner' is a pointer to an opaque data structure encapsulating the
current state of the scanner.  For a list of function declarations, see
*Note Reentrant Functions::. Note that preprocessor macros, such as
`BEGIN', `ECHO', and `REJECT', do not take this additional argument.


File: flex.info,  Node: Global Replacement,  Next: Init and Destroy Functions,  Prev: Extra Reentrant Argument,  Up: Reentrant Detail

Global Variables Replaced By Macros
-----------------------------------

   All global variables in traditional flex have been replaced by macro
equivalents.

   Note that in the above example, `yyout' and `yytext' are not plain
variables. These are macros that will expand to their equivalent lvalue.
All of the familiar `flex' globals have been replaced by their macro
equivalents. In particular, `yytext', `yyleng', `yylineno', `yyin',
`yyout', `yyextra', `yylval', and `yylloc' are macros. You may safely
use these macros in actions as if they were plain variables. We only
tell you this so you don't expect to link to these variables
externally. Currently, each macro expands to a member of an internal
struct, e.g.,


     #define yytext (((struct yyguts_t*)yyscanner)->yytext_r)

   One important thing to remember about `yytext' and friends is that,
because `yytext' is not a global variable in a reentrant scanner, you
cannot access it directly from outside an action or from other
functions.  You must use an accessor method, e.g., `yyget_text', to
accomplish this.  (See below.)
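
   The following self-contained sketch imitates the idea in miniature.
It is not what `flex' generates: the names `toy_scan_t', `toy_guts_t',
`toy_text', `toy_leng', `toy_action' and `toy_get_text' are all
hypothetical.  It only shows how a macro can look like a variable while
really naming a member of a per-scanner structure, and why code without
a scanner handle in scope needs an accessor function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the idea; these names are illustrative
 * and are not what flex generates. */
typedef void *toy_scan_t;            /* opaque handle, like yyscan_t */

struct toy_guts_t {
    char *text_r;                    /* plays the role of yytext_r */
    int   leng_r;                    /* plays the role of yyleng_r */
};

/* These look like variables, but they only compile where a variable
 * named `scanner' is in scope. */
#define toy_text (((struct toy_guts_t *)scanner)->text_r)
#define toy_leng (((struct toy_guts_t *)scanner)->leng_r)

/* An "action": uses the macros as lvalues, as if they were globals. */
void toy_action(toy_scan_t scanner, char *match, int len)
{
    assert(scanner != NULL);
    toy_text = match;
    toy_leng = len;
}

/* Code outside an action goes through an accessor instead. */
char *toy_get_text(toy_scan_t scanner)
{
    assert(scanner != NULL);
    return ((struct toy_guts_t *)scanner)->text_r;
}
```

   Since the macros expand to members of a structure reached through
the handle, there is no symbol to link against, which is why external
code cannot refer to `yytext' directly.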


File: flex.info,  Node: Init and Destroy Functions,  Next: Accessor Methods,  Prev: Global Replacement,  Up: Reentrant Detail

Init and Destroy Functions
--------------------------

   `yylex_init' and `yylex_destroy' must be called before and after
`yylex', respectively.


         int yylex_init ( yyscan_t * ptr_yy_globals ) ;
         int yylex ( yyscan_t yyscanner ) ;
         int yylex_destroy ( yyscan_t yyscanner ) ;

   The function `yylex_init' must be called before calling any other
function. The argument to `yylex_init' is the address of an
uninitialized pointer to be filled in by `flex'. The contents of
`ptr_yy_globals' need not be initialized, since `flex' will overwrite
it anyway. The value stored in `ptr_yy_globals' should thereafter be
passed to `yylex' and `yylex_destroy'.  Flex does not save the
argument passed to `yylex_init', so it is safe to pass the address of a
local pointer to `yylex_init'.  The function `yylex' should be familiar
to you by now. The reentrant version takes one argument, which is the
value returned (via an argument) by `yylex_init'.  Otherwise, it
behaves the same as the non-reentrant version of `yylex'.

   `yylex_init' returns 0 (zero) on success, or non-zero on failure, in
which case, errno is set to one of the following values:

   * ENOMEM Memory allocation error. *Note memory-management::.

   * EINVAL Invalid argument.

   The function `yylex_destroy' should be called to free resources used
by the scanner. After `yylex_destroy' is called, the contents of
`yyscanner' should not be used.  Of course, there is no need to destroy
a scanner if you plan to reuse it.  A `flex' scanner (both reentrant
and non-reentrant) may be restarted by calling `yyrestart'.

   Below is an example of a program that creates a scanner, uses it,
then destroys it when done:


         int main ()
         {
             yyscan_t scanner;
             int tok;
     
             yylex_init(&scanner);
     
             while ((tok=yylex(scanner)) > 0)
                 printf("tok=%d  yytext=%s\n", tok, yyget_text(scanner));
     
             yylex_destroy(scanner);
             return 0;
         }


File: flex.info,  Node: Accessor Methods,  Next: Extra Data,  Prev: Init and Destroy Functions,  Up: Reentrant Detail

Accessing Variables with Reentrant Scanners
-------------------------------------------

   Accessor methods (get/set functions) provide access to common `flex'
variables.

   Many scanners that you build will be part of a larger project.
Portions of your project will need access to `flex' values, such as
`yytext'.  In a non-reentrant scanner, these values are global, so
there is no problem accessing them. However, in a reentrant scanner,
there are no global `flex' values. You cannot access them directly.
Instead, you must access `flex' values using accessor methods (get/set
functions). Each accessor method is named `yyget_NAME' or `yyset_NAME',
where `NAME' is the name of the `flex' variable you want. For example:


         /* Replace the last character of yytext with NUL ('\0'). */
         void chop ( yyscan_t scanner )
         {
             int len = yyget_leng( scanner );
             yyget_text( scanner )[len - 1] = '\0';
         }

   The above code may be called from within an action like this:


         %%
         .+\n    { chop( yyscanner );}

   You may find that `%option header-file' is particularly useful for
generating prototypes of all the accessor functions. *Note
option-header::.


File: flex.info,  Node: Extra Data,  Next: About yyscan_t,  Prev: Accessor Methods,  Up: Reentrant Detail

Extra Data
----------

   User-specific data can be stored in `yyextra'.

   In a reentrant scanner, it is unwise to use global variables to
communicate with or maintain state between different pieces of your
program.  However, you may need access to external data or invoke
external functions from within the scanner actions.  Likewise, you may
need to pass information to your scanner (e.g., open file descriptors,
or database connections).  In a non-reentrant scanner, the only way to
do this would be through the use of global variables.  `Flex' allows
you to store arbitrary, "extra" data in a scanner.  This data is
accessible through the accessor methods `yyget_extra' and `yyset_extra'
from outside the scanner, and through the shortcut macro `yyextra' from
within the scanner itself. They are defined as follows:


         #define YY_EXTRA_TYPE  void*
         YY_EXTRA_TYPE  yyget_extra ( yyscan_t scanner );
         void           yyset_extra ( YY_EXTRA_TYPE arbitrary_data , yyscan_t scanner);

   By default, `YY_EXTRA_TYPE' is defined as type `void *'.  You will
have to cast `yyextra' and the return value from `yyget_extra' to the
appropriate value each time you access the extra data.  To avoid
casting, you may override the default type by defining `YY_EXTRA_TYPE'
in section 1 of your scanner:


         /* An example of overriding YY_EXTRA_TYPE. */
         %{
         #include <sys/stat.h>
         #include <unistd.h>
         #define YY_EXTRA_TYPE  struct stat*
         %}
         %option reentrant
         %%
     
         __filesize__     printf( "%ld", (long) yyextra->st_size  );
         __lastmod__      printf( "%ld", (long) yyextra->st_mtime );
         %%
         void scan_file( char* filename )
         {
             yyscan_t scanner;
             struct stat buf;
     
             yylex_init ( &scanner );
             yyset_in( fopen(filename,"r"), scanner );
     
             stat( filename, &buf);
             yyset_extra( &buf, scanner );
             yylex ( scanner );
             yylex_destroy( scanner );
         }


File: flex.info,  Node: About yyscan_t,  Prev: Extra Data,  Up: Reentrant Detail

About yyscan_t
--------------

   `yyscan_t' is defined as:


          typedef void* yyscan_t;

   It is initialized by `yylex_init()' to point to an internal
structure. You should never access this value directly. In particular,
you should never attempt to free it; use `yylex_destroy' instead.


File: flex.info,  Node: Reentrant Functions,  Prev: Reentrant Detail,  Up: Reentrant

Functions and Macros Available in Reentrant C Scanners
======================================================

   The following functions are available in a reentrant scanner:


         char *yyget_text ( yyscan_t scanner );
         int yyget_leng ( yyscan_t scanner );
         FILE *yyget_in ( yyscan_t scanner );
         FILE *yyget_out ( yyscan_t scanner );
         int yyget_lineno ( yyscan_t scanner );
         YY_EXTRA_TYPE yyget_extra ( yyscan_t scanner );
         int  yyget_debug ( yyscan_t scanner );
     
         void yyset_debug ( int flag, yyscan_t scanner );
         void yyset_in  ( FILE * in_str , yyscan_t scanner );
         void yyset_out  ( FILE * out_str , yyscan_t scanner );
         void yyset_lineno ( int line_number , yyscan_t scanner );
         void yyset_extra ( YY_EXTRA_TYPE user_defined , yyscan_t scanner );

   There are no "set" functions for yytext and yyleng. This is
intentional.

   The following macro shortcuts are available in actions in a reentrant
scanner:


         yytext
         yyleng
         yyin
         yyout
         yylineno
         yyextra
         yy_flex_debug

   In a reentrant C scanner, support for yylineno is always present
(i.e., you may access yylineno), but the value is never modified by
`flex' unless `%option yylineno' is enabled. This is to allow the user
to maintain the line count independently of `flex'.
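
   If you do maintain the count yourself, a small helper is usually
enough.  The function below is a hypothetical sketch (the name
`count_newlines' is not part of `flex'); it counts the newlines in a
matched token so that the caller can advance its own line counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of flex): count the newlines in the
 * first `leng' bytes of `text'. */
int count_newlines(const char *text, int leng)
{
    int i, n = 0;

    assert(text != NULL && leng >= 0);
    for (i = 0; i < leng; i++)
        if (text[i] == '\n')
            n++;
    return n;
}
```

   Inside an action you might then write
`yylineno += count_newlines( yytext, yyleng );'.  Outside an action,
the same update would go through `yyget_lineno' and `yyset_lineno'.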

   The following functions and macros are made available when `%option
bison-bridge' (`--bison-bridge') is specified:


         YYSTYPE * yyget_lval ( yyscan_t scanner );
         void yyset_lval ( YYSTYPE * yylvalp , yyscan_t scanner );
         yylval

   The following functions and macros are made available when `%option
bison-locations' (`--bison-locations') is specified:


         YYLTYPE *yyget_lloc ( yyscan_t scanner );
         void yyset_lloc ( YYLTYPE * yyllocp , yyscan_t scanner );
         yylloc

   Support for yylval assumes that `YYSTYPE' is a valid type.  Support
for yylloc assumes that `YYLTYPE' is a valid type.  Typically, these
types are generated by `bison', and are included in section 1 of the
`flex' input.


File: flex.info,  Node: Lex and Posix,  Next: Memory Management,  Prev: Reentrant,  Up: Top

Incompatibilities with Lex and Posix
************************************

   `flex' is a rewrite of the AT&T Unix _lex_ tool (the two
implementations do not share any code, though), with some extensions and
incompatibilities, both of which are of concern to those who wish to
write scanners acceptable to both implementations.  `flex' is fully
compliant with the POSIX `lex' specification, except that when using
`%pointer' (the default), a call to `unput()' destroys the contents of
`yytext', which is counter to the POSIX specification.  In this section
we discuss all of the known areas of incompatibility between `flex',
AT&T `lex', and the POSIX specification.  `flex''s `-l' option turns on
maximum compatibility with the original AT&T `lex' implementation, at
the cost of a major loss in the generated scanner's performance.  We
note below which incompatibilities can be overcome using the `-l'
option.  `flex' is fully compatible with `lex' with the following
exceptions:

   * The undocumented `lex' scanner internal variable `yylineno' is not
     supported unless `-l' or `%option yylineno' is used.

   * `yylineno' should be maintained on a per-buffer basis, rather than
     a per-scanner (single global variable) basis.

   * `yylineno' is not part of the POSIX specification.

   * The `input()' routine is not redefinable, though it may be called
     to read characters following whatever has been matched by a rule.
     If `input()' encounters an end-of-file the normal `yywrap()'
     processing is done.  A "real" end-of-file is returned by `input()'
     as `EOF'.

   * Input is instead controlled by defining the `YY_INPUT()' macro.

   * The `flex' restriction that `input()' cannot be redefined is in
     accordance with the POSIX specification, which simply does not
     specify any way of controlling the scanner's input other than by
     making an initial assignment to `yyin'.

   * The `unput()' routine is not redefinable.  This restriction is in
     accordance with POSIX.

   * `flex' scanners are not as reentrant as `lex' scanners.  In
     particular, if you have an interactive scanner and an interrupt
     handler which long-jumps out of the scanner, and the scanner is
     subsequently called again, you may get the following message:


              fatal flex scanner internal error--end of buffer missed

     To reenter the scanner, first use:


              yyrestart( yyin );

     Note that this call will throw away any buffered input; usually
     this isn't a problem with an interactive scanner. *Note
     Reentrant::, for `flex''s reentrant API.

   * Also note that `flex' C++ scanner classes _are_ reentrant, so if
     using C++ is an option for you, you should use them instead.
     *Note Cxx::, and *Note Reentrant::  for details.

   * `output()' is not supported.  Output from the ECHO macro is done
     to the file-pointer `yyout' (default `stdout').

   * `output()' is not part of the POSIX specification.

   * `lex' does not support exclusive start conditions (%x), though they
     are in the POSIX specification.

   * When definitions are expanded, `flex' encloses them in parentheses.
     With `lex', the following:


              NAME    [A-Z][A-Z0-9]*
              %%
              foo{NAME}?      printf( "Found it\n" );
              %%

     will not match the string `foo' because when the macro is expanded
     the rule is equivalent to `foo[A-Z][A-Z0-9]*?'  and the precedence
     is such that the `?' is associated with `[A-Z0-9]*'.  With `flex',
     the rule will be expanded to `foo([A-Z][A-Z0-9]*)?' and so the
     string `foo' will match.

   * Note that if the definition begins with `^' or ends with `$' then
     it is _not_ expanded with parentheses, to allow these operators to
     appear in definitions without losing their special meanings.  But
     the `<s>', `/', and `<<EOF>>' operators cannot be used in a `flex'
     definition.

   * Using `-l' results in the `lex' behavior of no parentheses around
     the definition.

   * The POSIX specification is that the definition be enclosed in
     parentheses.

   * Some implementations of `lex' allow a rule's action to begin on a
     separate line, if the rule's pattern has trailing whitespace:


              %%
              foo|bar<space here>
                { foobar_action();}

     `flex' does not support this feature.

   * The `lex' `%r' (generate a Ratfor scanner) option is not
     supported.  It is not part of the POSIX specification.

   * After a call to `unput()', _yytext_ is undefined until the next
     token is matched, unless the scanner was built using `%array'.
     This is not the case with `lex' or the POSIX specification.  The
     `-l' option does away with this incompatibility.

   * The precedence of the `{,}' (numeric range) operator is different.
     The AT&T and POSIX specifications of `lex' interpret `abc{1,3}'
     as "match one, two, or three occurrences of `abc'", whereas `flex'
     interprets it as "match `ab' followed by one, two, or three
     occurrences of `c'".  The `-l' and `--posix' options do away with
     this incompatibility.

   * The precedence of the `^' operator is different.  `lex' interprets
     `^foo|bar' as "match either `foo' at the beginning of a line, or
     `bar' anywhere", whereas `flex' interprets it as "match either
     `foo' or `bar' if they come at the beginning of a line".  The
     latter is in agreement with the POSIX specification.

   * The special table-size declarations such as `%a' supported by
     `lex' are not required by `flex' scanners.  `flex' ignores them.

   * The name `FLEX_SCANNER' is `#define''d so scanners may be written
     for use with either `flex' or `lex'.  Scanners also include
     `YY_FLEX_MAJOR_VERSION',  `YY_FLEX_MINOR_VERSION' and
     `YY_FLEX_SUBMINOR_VERSION' indicating which version of `flex'
     generated the scanner. For example, for the 2.5.22 release, these
     defines would be 2,  5 and 22 respectively. If the version of
     `flex' being used is a beta version, then the symbol `FLEX_BETA'
     is defined.

   * The symbols `[[' and `]]' in the code sections of the input may
     conflict with the m4 delimiters. *Note M4 Dependency::.
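
   The `{,}' precedence item in the list above can be made concrete
by writing out both readings with explicit grouping.  The sketch below
does not run `flex' or `lex'; it uses the POSIX `regcomp' and `regexec'
functions with extended regular expressions as a stand-in.  The helper
`re_matches' and the macros `LEX_READING' and `FLEX_READING' are names
invented for this illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if `text' matches the POSIX extended regex `pattern',
 * 0 if it does not, and -1 if the pattern fails to compile.
 * The caller supplies any ^ and $ anchors. */
int re_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* The two readings of abc{1,3}, written with explicit grouping. */
#define LEX_READING   "^(abc){1,3}$"  /* one to three copies of abc */
#define FLEX_READING  "^abc{1,3}$"    /* ab, then one to three c's  */
```

   Under the first reading `abcabc' matches and `abccc' does not;
under the second reading it is the other way around.  A pattern that
must behave identically in both tools can avoid the ambiguity by
grouping explicitly, e.g., `(abc){1,3}'.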


   The following `flex' features are not included in `lex' or the POSIX
specification:

   * C++ scanners

   * %option

   * start condition scopes

   * start condition stacks

   * interactive/non-interactive scanners

   * yy_scan_string() and friends

   * yyterminate()

   * yy_set_interactive()

   * yy_set_bol()

   * YY_AT_BOL()

   * <<EOF>>

   * <*>

   * YY_DECL

   * YY_START

   * YY_USER_ACTION

   * YY_USER_INIT

   * #line directives

   * %{}'s around actions

   * reentrant C API

   * multiple actions on a line

   * almost all of the `flex' command-line options

   The feature "multiple actions on a line" refers to the fact that
with `flex' you can put multiple actions on the same line, separated
with semi-colons, while with `lex', the following:


         foo    handle_foo(); ++num_foos_seen;

   is (rather surprisingly) truncated to


         foo    handle_foo();

   `flex' does not truncate the action.  Actions that are not enclosed
in braces are simply terminated at the end of the line.


File: flex.info,  Node: Memory Management,  Next: Serialized Tables,  Prev: Lex and Posix,  Up: Top

Memory Management
*****************

   This chapter describes how flex handles dynamic memory, and how you
can override the default behavior.

* Menu:

* The Default Memory Management::
* Overriding The Default Memory Management::
* A Note About yytext And Memory::


File: flex.info,  Node: The Default Memory Management,  Next: Overriding The Default Memory Management,  Prev: Memory Management,  Up: Memory Management

The Default Memory Management
=============================

   Flex allocates dynamic memory during initialization, and once in a
while from within a call to yylex(). Initialization takes place during
the first call to yylex(). Thereafter, flex may reallocate more memory
if it needs to enlarge a buffer. As of version 2.5.9 Flex will clean up
all memory when you call `yylex_destroy'.  *Note faq-memory-leak::.

   Flex allocates dynamic memory for five purposes, listed below (1):

16kB for the input buffer.
     Flex allocates memory for the character buffer used to perform
     pattern matching.  Flex must read ahead from the input stream and
     store it in a large character buffer.  This buffer is typically
     the largest chunk of dynamic memory flex consumes. This buffer
     will grow if necessary, doubling the size each time.  Flex frees
     this memory when you call yylex_destroy().  The default size of
     this buffer (16384 bytes) is almost always too large.  The ideal
     size for this buffer is the length of the longest token expected,
     in bytes, plus a little more.  Flex will allocate a few extra
     bytes for housekeeping. Currently, to override the size of the
     input buffer you must `#define YY_BUF_SIZE' to whatever number of
     bytes you want. We don't plan to change this in the near future,
     but we reserve the right to do so if we ever add a more robust
     memory management API.

64kB for the REJECT state.
     This will only be allocated if you use REJECT.  The size is large
     enough to hold the same number of states as characters in the
     input buffer. If you override the size of the input buffer (via
     `YY_BUF_SIZE'), then you automatically override the size of this
     buffer as well.

100 bytes for the start condition stack.
     Flex allocates memory for the start condition stack. This is the
     stack used for pushing start states, i.e., with yy_push_state().
     It will grow if necessary.  Since the states are simply integers,
     this stack doesn't consume much memory.  This stack is not present
     if `%option stack' is not specified.  You will rarely need to tune
     this buffer. The ideal size for this stack is the maximum depth
     expected.  The memory for this stack is automatically destroyed
     when you call yylex_destroy(). *Note option-stack::.

40 bytes for each YY_BUFFER_STATE.
     Flex allocates memory for each YY_BUFFER_STATE. The buffer state
     itself is about 40 bytes, plus an additional large character
     buffer (described above.)  The initial buffer state is created
     during initialization, and with each call to yy_create_buffer().
     You can't tune the size of this, but you can tune the character
     buffer as described above. Any buffer state that you explicitly
     create by calling yy_create_buffer() is _NOT_ destroyed
     automatically. You must call yy_delete_buffer() to free the
     memory. The exception to this rule is that flex will delete the
     current buffer automatically when you call yylex_destroy(). If you
     delete the current buffer, be sure to set it to NULL.  That way,
     flex will not try to delete the buffer a second time (possibly
     crashing your program!) At the time of this writing, flex does not
     provide a growable stack for the buffer states.  You have to
     manage that yourself.  *Note Multiple Input Buffers::.

84 bytes for the reentrant scanner guts.
     Flex allocates about 84 bytes for the reentrant scanner structure
     when you call yylex_init(). It is destroyed when the user calls
     yylex_destroy().
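
   The doubling behavior of the input buffer described above is easy
to reason about with a little arithmetic.  The function below is a toy
model only, not the allocation code `flex' uses, and the name
`grown_size' is hypothetical.  It reports how large a buffer becomes,
and how many growth steps it takes, if the size doubles until a token
of a given length fits.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a buffer that doubles until it can hold `needed'
 * bytes.  Returns the final size, or 0 if doubling would overflow.
 * If `doublings' is not NULL it receives the number of growth steps
 * taken.  Illustration only; this is not flex's allocator. */
size_t grown_size(size_t initial, size_t needed, int *doublings)
{
    size_t size = initial;
    int steps = 0;

    assert(initial > 0);
    while (size < needed) {
        if (size > (size_t)-1 / 2) {   /* next doubling would overflow */
            size = 0;
            break;
        }
        size *= 2;
        steps++;
    }
    if (doublings != NULL)
        *doublings = steps;
    return size;
}
```

   For example, starting from the default of 16384 bytes, a 100000
byte token takes three doublings and ends with a 131072 byte buffer in
this model.  If you know your longest token in advance, choosing
`YY_BUF_SIZE' accordingly avoids those intermediate steps.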


   ---------- Footnotes ----------

   (1) The quantities given here are approximate, and may vary due to
host architecture, compiler configuration, or due to future
enhancements to flex.