This is flex.info, produced by makeinfo version 4.5 from flex.texi.

INFO-DIR-SECTION Programming
START-INFO-DIR-ENTRY
* flex: (flex).      Fast lexical analyzer generator (lex replacement).
END-INFO-DIR-ENTRY


   The flex manual is placed under the same licensing conditions as the
rest of flex:

   Copyright (C) 1990, 1997 The Regents of the University of California.
All rights reserved.

   This code is derived from software contributed to Berkeley by Vern
Paxson.

   The United States Government has rights in this work pursuant to
contract no. DE-AC03-76SF00098 between the United States Department of
Energy and the University of California.

   Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

  1.  Redistributions of source code must retain the above copyright
     notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright
     notice, this list of conditions and the following disclaimer in the
     documentation and/or other materials provided with the
     distribution.
   Neither the name of the University nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.

   THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

File: flex.info,  Node: How do I match any string not matched in the preceding rules?,  Next: I am trying to port code from AT&T lex that uses yysptr and yysbuf.,  Prev: How can I build a two-pass scanner?,  Up: FAQ

How do I match any string not matched in the preceding rules?
=============================================================

   One way to assign precedence is to place the more specific rules
first.  If two rules would match the same input (the same sequence of
characters), then the rule listed first in the `flex' input wins, e.g.,


     %%
     foo[a-zA-Z_]+    return FOO_ID;
     bar[a-zA-Z_]+    return BAR_ID;
     [a-zA-Z_]+       return GENERIC_ID;

   Note that the rule `[a-zA-Z_]+' must come *after* the others.  It
will match the same amount of text as the more specific rules, and in
that case the `flex' scanner will pick the first rule listed in your
scanner as the one to match.
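   The selection policy can be sketched in plain C.  The functions below
are hypothetical: they hand-code the three example patterns instead of
using anything `flex' generates, and only mimic its rule of preferring
the longest match and, on a tie, the rule listed first.

```c
#include <stddef.h>
#include <string.h>

/* Token codes standing in for the example's return values. */
enum { FOO_ID = 0, BAR_ID = 1, GENERIC_ID = 2, NO_MATCH = -1 };

/* True for characters in the class [a-zA-Z_]. */
static int is_id_char(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

/* Length of the run of [a-zA-Z_] characters at s. */
static size_t id_run(const char *s) {
    size_t n = 0;
    while (is_id_char(s[n]))
        n++;
    return n;
}

/* Length matched by prefix[a-zA-Z_]+ at s, or 0 if it does not match. */
static size_t prefixed_len(const char *s, const char *prefix) {
    size_t p = strlen(prefix);
    if (strncmp(s, prefix, p) != 0)
        return 0;
    size_t rest = id_run(s + p);
    return rest > 0 ? p + rest : 0;
}

/* Pick the winning rule: longest match first; on a tie, the rule with
   the lower index (listed earlier) wins because of the strict '>'. */
static int pick_rule(const char *s, size_t *len_out) {
    size_t len[3];
    len[FOO_ID] = prefixed_len(s, "foo");
    len[BAR_ID] = prefixed_len(s, "bar");
    len[GENERIC_ID] = id_run(s);

    int best = NO_MATCH;
    size_t best_len = 0;
    for (int r = 0; r < 3; r++) {
        if (len[r] > best_len) {
            best = r;
            best_len = len[r];
        }
    }
    if (len_out)
        *len_out = best_len;
    return best;
}
```

   Because the comparison uses a strict `>', `foobar' ties between the
first and third rules and goes to `FOO_ID', while a bare `foo' is too
short for the first rule and falls through to `GENERIC_ID'.  Listing the
generic rule first would make it win every tie instead.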


File: flex.info,  Node: I am trying to port code from AT&T lex that uses yysptr and yysbuf.,  Next: Is there a way to make flex treat NULL like a regular character?,  Prev: How do I match any string not matched in the preceding rules?,  Up: FAQ

I am trying to port code from AT&T lex that uses yysptr and yysbuf.
===================================================================

   Those are internal variables pointing into the AT&T scanner's input
buffer.  I imagine they're being manipulated in user versions of the
`input()' and `unput()' functions.  If so, what you need to do is
analyze those functions to figure out what they're doing, and then
replace `input()' with an appropriate definition of `YY_INPUT'.  You
shouldn't need to (and must not) replace `flex''s `unput()' function.
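   As a rough sketch, suppose the old `input()' read from an in-memory
string.  A `YY_INPUT' definition along these lines could take its
place.  `set_source', `src_text', `src_len' and `src_pos' are
hypothetical names; storing 0 in `result' is how the macro reports end
of input (that is the value of `YY_NULL').

```c
#include <string.h>

/* Hypothetical in-memory source that the old input() consumed. */
static const char *src_text = "";
static size_t src_len = 0;
static size_t src_pos = 0;

/* Point the scanner's input at a new string. */
static void set_source(const char *text) {
    src_text = text;
    src_len = strlen(text);
    src_pos = 0;
}

/* Copy up to max_size bytes into buf and store the count in result;
   a count of 0 tells the scanner it has reached end of input. */
#define YY_INPUT(buf, result, max_size)                              \
    do {                                                             \
        size_t left_ = src_len - src_pos;                            \
        size_t n_ = left_ < (size_t)(max_size) ? left_               \
                                               : (size_t)(max_size); \
        memcpy((buf), src_text + src_pos, n_);                       \
        src_pos += n_;                                               \
        (result) = n_;                                               \
    } while (0)
```

   In a real scanner these definitions would go inside a `%{ ... %}'
block in the definitions section, with `set_source()' called before the
first call to `yylex()'.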


File: flex.info,  Node: Is there a way to make flex treat NULL like a regular character?,  Next: Whenever flex can not match the input it says "flex scanner jammed".,  Prev: I am trying to port code from AT&T lex that uses yysptr and yysbuf.,  Up: FAQ

Is there a way to make flex treat NULL like a regular character?
================================================================

   Yes, `\0' and `\x00' should both do the trick.  Perhaps you have an
ancient version of `flex'.  At the time of writing, the latest release
was version 2.5.33.


File: flex.info,  Node: Whenever flex can not match the input it says "flex scanner jammed".,  Next: Why doesnt flex have non-greedy operators like perl does?,  Prev: Is there a way to make flex treat NULL like a regular character?,  Up: FAQ

Whenever flex can not match the input it says "flex scanner jammed".
====================================================================

   You need to add a rule that matches the otherwise-unmatched text.
e.g.,


     %option yylineno
     %%
     [[a bunch of rules here]]
     
     .	printf("bad input character '%s' at line %d\n", yytext, yylineno);

   See `%option default' for more information.


File: flex.info,  Node: Why doesnt flex have non-greedy operators like perl does?,  Next: Memory leak - 16386 bytes allocated by malloc.,  Prev: Whenever flex can not match the input it says "flex scanner jammed".,  Up: FAQ

Why doesn't flex have non-greedy operators like perl does?
==========================================================

   A DFA can do a non-greedy match by stopping the first time it enters
an accepting state, instead of consuming input until it determines that
no further matching is possible (a "jam" state).  This is actually
easier to implement than longest leftmost match (which flex does).

   But it's also much less useful than longest leftmost match.  When
you find yourself wishing for non-greedy matching, that's usually a sign
that you're trying to make the scanner do some parsing.  That's
generally the wrong approach, since a scanner lacks the power to do a
decent job of it.  It is better either to introduce a separate parser,
or to split the scanner into multiple scanners using (exclusive) start
conditions.

   For example, you might enter a separate start state once you've seen
the token `BEGIN'.  In that state, you might then have a regex that
matches `END' (to kick you out of the state), and perhaps `(.|\n)' to
get a single character within the chunk ...

   This approach also has much better error-reporting properties.
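   The two strategies can be compared on a toy DFA, hand-written in C
for the pattern `a.*b' (in this sketch `.' matches any character,
including newline, unlike in `flex').  This only illustrates the idea
and is not how `flex' builds or runs its tables: `longest_match' keeps
going until the automaton jams and reports the last accepting position
it passed, while `shortest_match' stops at the first one, which is what
a Perl-style `a.*?b' would return.

```c
#include <stddef.h>

/* States of a hand-built DFA for a.*b; JAM means no further match. */
enum { JAM = -1, START = 0, SEEN_A = 1, END_B = 2 };

/* Transition function of the toy DFA. */
static int step(int state, char c) {
    switch (state) {
    case START:  return c == 'a' ? SEEN_A : JAM;
    case SEEN_A: return c == 'b' ? END_B : SEEN_A;
    case END_B:  return c == 'b' ? END_B : SEEN_A;
    default:     return JAM;
    }
}

static int accepting(int state) { return state == END_B; }

/* Longest match: run until the DFA jams or input ends, remembering the
   last accepting position.  Returns the match length, or -1. */
static long longest_match(const char *s) {
    int state = START;
    long last = -1;
    for (size_t i = 0; s[i] != '\0'; i++) {
        state = step(state, s[i]);
        if (state == JAM)
            break;
        if (accepting(state))
            last = (long)i + 1;
    }
    return last;
}

/* Non-greedy match: stop the first time an accepting state is entered. */
static long shortest_match(const char *s) {
    int state = START;
    for (size_t i = 0; s[i] != '\0'; i++) {
        state = step(state, s[i]);
        if (state == JAM)
            return -1;
        if (accepting(state))
            return (long)i + 1;
    }
    return -1;
}
```

   On `axbyb', `longest_match' returns 5 and `shortest_match' returns 3.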


File: flex.info,  Node: Memory leak - 16386 bytes allocated by malloc.,  Next: How do I track the byte offset for lseek()?,  Prev: Why doesnt flex have non-greedy operators like perl does?,  Up: FAQ

Memory leak - 16386 bytes allocated by malloc.
==============================================

   UPDATED 2002-07-10: As of `flex' version 2.5.9, this leak means that
you did not call `yylex_destroy()'. If you are using an earlier version
of `flex', then read on.

   The leak is about 16426 bytes.  That is, (8192 * 2 + 2) for the
read-buffer, and about 40 for `struct yy_buffer_state' (depending upon
alignment). The leak is in the non-reentrant C scanner only (NOT in the
reentrant scanner, NOT in the C++ scanner). Since `flex' doesn't know
when you are done, the buffer is never freed.

   However, the leak won't multiply since the buffer is reused no
matter how many times you call `yylex()'.

   If you want to reclaim the memory when you are completely done
scanning, then you might try this:


     /* For non-reentrant C scanner only. */
     yy_delete_buffer(YY_CURRENT_BUFFER);
     yy_init = 1;

   Note: `yy_init' is an internal variable, and this approach hasn't
been tested in this situation.  Some other globals may need resetting as
well.


File: flex.info,  Node: How do I track the byte offset for lseek()?,  Next: How do I use my own I/O classes in a C++ scanner?,  Prev: Memory leak - 16386 bytes allocated by malloc.,  Up: FAQ

How do I track the byte offset for lseek()?
===========================================


     >   We thought that it would be possible to have this number through the
     >   evaluation of the following expression:
     >
     >   seek_position = (no_buffers)*YY_READ_BUF_SIZE + yy_c_buf_p - YY_CURRENT_BUFFER->yy_ch_buf

   While this is the right idea, it has two problems.  The first is that
it's possible that `flex' will request less than `YY_READ_BUF_SIZE'
during an invocation of `YY_INPUT' (or that your input source will
return less even though `YY_READ_BUF_SIZE' bytes were requested).  The
second problem is that when refilling its internal buffer, `flex' keeps
some characters from the previous buffer (because usually it's in the
middle of a match, and needs those characters to construct `yytext' for
the match once it's done).  Because of this, `yy_c_buf_p -
YY_CURRENT_BUFFER->yy_ch_buf' won't be exactly the number of characters
already read from the current buffer.

   An alternative solution is to count the number of characters you've
matched since starting to scan.  This can be done by using
`YY_USER_ACTION'.  For example,


     %{
     long num_chars = 0;  /* characters matched so far */
     #define YY_USER_ACTION num_chars += yyleng;
     %}

   (You need to be careful to update your bookkeeping if you use
`yymore()', `yyless()', `unput()', or `input()'.)
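   The bookkeeping can be sketched as a few C helpers.  They are
hypothetical, not part of `flex': `on_match' does what the
`YY_USER_ACTION' above does, and the others show the correction you
would make by hand around each `yyless()', `yymore()', `unput()' or
`input()' call.  The `unput()' adjustment assumes the character pushed
back was originally read from the input.

```c
/* Characters consumed from the input so far (hypothetical counter). */
static long num_chars = 0;

/* What YY_USER_ACTION does before each rule's action. */
static void on_match(int yyleng) { num_chars += yyleng; }

/* yyless(n) returns all but the first n matched characters to the
   input; they will be counted again when they are rescanned. */
static void on_yyless(int yyleng, int n) { num_chars -= yyleng - n; }

/* yymore(): the next match's yyleng also covers the current text, so
   remove the current length now to avoid counting it twice. */
static void on_yymore(int yyleng) { num_chars -= yyleng; }

/* input() consumes one character without going through YY_USER_ACTION. */
static void on_input(void) { num_chars += 1; }

/* unput(c) pushes one character back; it is counted again on rescan.
   Assumes c was originally read from the input. */
static void on_unput(void) { num_chars -= 1; }
```

   Inside an action, `num_chars - yyleng' is then the offset at which
the current token starts, assuming scanning began at offset 0 of the
file; that is the value you would pass to `lseek()'.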


File: flex.info,  Node: How do I use my own I/O classes in a C++ scanner?,  Next: How do I skip as many chars as possible?,  Prev: How do I track the byte offset for lseek()?,  Up: FAQ

How do I use my own I/O classes in a C++ scanner?
=================================================

   When the flex C++ scanning class rewrite finally happens, then this
sort of thing should become much easier.

   You can do this by passing the various functions (such as
`LexerInput()' and `LexerOutput()') NULL `iostream*' pointers, and then
dealing with your own I/O classes surreptitiously (i.e., stashing them
in special member variables).  This works because the only assumption
the lexer makes about the iostreams is that they're ultimately passed to
`LexerInput()' and `LexerOutput()', which then do whatever is necessary
with them.


File: flex.info,  Node: How do I skip as many chars as possible?,  Next: deleteme00,  Prev: How do I use my own I/O classes in a C++ scanner?,  Up: FAQ

How do I skip as many chars as possible?
========================================

   How do I skip as many chars as possible - without interfering with
the other patterns?

   In the example below, we want to skip over characters until we see
the phrase "endskip". The following will _NOT_ work correctly (do you
see why not?)


     /* INCORRECT SCANNER */
     %x SKIP
     %%
     <INITIAL>startskip   BEGIN(SKIP);
     ...
     <SKIP>"endskip"       BEGIN(INITIAL);
     <SKIP>.*             ;

   The problem is that the pattern `.*' matches through to the end of
the line, so it eats up any "endskip" that follows other text on that
line.  The simplest (but slow) fix is:


     <SKIP>"endskip"      BEGIN(INITIAL);
     <SKIP>.|\n           ;

   A faster fix makes the second rule match more, without letting it
match "endskip" plus something else.  For example:


     <SKIP>"endskip"     BEGIN(INITIAL);
     <SKIP>[^e]+         ;
     <SKIP>.             ;   /* so you eat up e's, too */
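   To see why the last version is faster, the SKIP rules can be
simulated in plain C on the text that follows `startskip'.  This is a
hand-written model of the rules, not `flex' output: each loop iteration
stands for one rule action, and each function returns how many actions
fire before `endskip' is reached.

```c
#include <string.h>

/* True if s begins with the terminating word. */
static int starts_endskip(const char *s) {
    return strncmp(s, "endskip", 7) == 0;
}

/* One character per action, as with the slow single-character rule.
   Stores the offset of "endskip" in *end, or -1 if it never appears. */
static int skip_slow(const char *s, long *end) {
    int actions = 0;
    size_t i = 0;
    while (s[i] != '\0' && !starts_endskip(s + i)) {
        i++;
        actions++;
    }
    *end = s[i] ? (long)i : -1;
    return actions;
}

/* Faster rules: a run of non-'e' characters ([^e]+) per action, or a
   lone 'e' that does not start "endskip". */
static int skip_fast(const char *s, long *end) {
    int actions = 0;
    size_t i = 0;
    while (s[i] != '\0' && !starts_endskip(s + i)) {
        if (s[i] != 'e') {
            while (s[i] != '\0' && s[i] != 'e')
                i++;
        } else {
            i++;
        }
        actions++;
    }
    *end = s[i] ? (long)i : -1;
    return actions;
}
```

   On `hello world endskip', the one-character version fires 12 actions
and the `[^e]+' version fires 3; both stop at offset 12.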


File: flex.info,  Node: deleteme00,  Next: Are certain equivalent patterns faster than others?,  Prev: How do I skip as many chars as possible?,  Up: FAQ

deleteme00
==========


     QUESTION:
     When was flex born?
     
     Vern Paxson took over the Software Tools lex project from Jef
     Poskanzer in 1982.  At that point it was written in Ratfor.  Around
     1987 or so, Paxson translated it into C, and a legend was born :-).


File: flex.info,  Node: Are certain equivalent patterns faster than others?,  Next: Is backing up a big deal?,  Prev: deleteme00,  Up: FAQ

Are certain equivalent patterns faster than others?
===================================================


     To: Adoram Rogel <adoram@orna.hybridge.com>
     Subject: Re: Flex 2.5.2 performance questions
     In-reply-to: Your message of Wed, 18 Sep 96 11:12:17 EDT.
     Date: Wed, 18 Sep 96 10:51:02 PDT
     From: Vern Paxson <vern>
     
     [Note, the most recent flex release is 2.5.4, which you can get from
     ftp.ee.lbl.gov.  It has bug fixes over 2.5.2 and 2.5.3.]
     
     > 1. Using the pattern
     >    ([Ff](oot)?)?[Nn](ote)?(\.)?
     >    instead of
     >    (((F|f)oot(N|n)ote)|((N|n)ote)|((N|n)\.)|((F|f)(N|n)(\.)))
     >    (in a very complicated flex program) caused the program to slow from
     >    300K+/min to 100K/min (no other changes were done).
     
     These two are not equivalent.  For example, the first can match "footnote."
     but the second can only match "footnote".  This is almost certainly the
     cause of the discrepancy - the slower scanner run is matching more tokens,
     and/or having to do more backing up.
     
     > 2. Which of these two are better: [Ff]oot or (F|f)oot ?
     
     From a performance point of view, they're equivalent (modulo presumably
     minor effects such as memory cache hit rates; and the presence of trailing
     context, see below).  From a space point of view, the first is slightly
     preferable.
     
     > 3. I have a pattern that look like this:
     >    pats {p1}|{p2}|{p3}|...|{p50}     (50 patterns ORd)
     >
     >    running yet another complicated program that includes the following rule:
     >    <snext>{and}/{no4}{bb}{pats}
     >
     >    gets me to "too complicated - over 32,000 states"...
     
     I can't tell from this example whether the trailing context is variable-length
     or fixed-length (it could be the latter if {and} is fixed-length).  If it's
     variable length, which flex -p will tell you, then this reflects a basic
     performance problem, and if you can eliminate it by restructuring your
     scanner, you will see significant improvement.
     
     >    so I divided {pats} to {pats1}, {pats2},..., {pats5} each consists of about
     >    10 patterns and changed the rule to be 5 rules.
     >    This did compile, but what is the rule of thumb here ?
     
     The rule is to avoid trailing context other than fixed-length, in which for
     a/b, either the 'a' pattern or the 'b' pattern has a fixed length.  Use
     of the '|' operator automatically makes the pattern variable length, so in
     this case '[Ff]oot' is preferred to '(F|f)oot'.
     
     > 4. I changed a rule that looked like this:
     >    <snext8>{and}{bb}/{ROMAN}[^A-Za-z] { BEGIN...
     >
     >    to the next 2 rules:
     >    <snext8>{and}{bb}/{ROMAN}[A-Za-z] { ECHO;}
     >    <snext8>{and}{bb}/{ROMAN}         { BEGIN...
     >
     >    Again, I understand the using [^...] will cause a great performance loss
     
     Actually, it doesn't cause any sort of performance loss.  It's a surprising
     fact about regular expressions that they always match in linear time
     regardless of how complex they are.
     
     >    but are there any specific rules about it ?
     
     See the "Performance Considerations" section of the man page, and also
     the example in MISC/fastwc/.
     
     		Vern


File: flex.info,  Node: Is backing up a big deal?,  Next: Can I fake multi-byte character support?,  Prev: Are certain equivalent patterns faster than others?,  Up: FAQ

Is backing up a big deal?
=========================


     To: Adoram Rogel <adoram@hybridge.com>
     Subject: Re: Flex 2.5.2 performance questions
     In-reply-to: Your message of Thu, 19 Sep 96 10:16:04 EDT.
     Date: Thu, 19 Sep 96 09:58:00 PDT
     From: Vern Paxson <vern>
     
     > a lot about the backing up problem.
     > I believe that there lies my biggest problem, and I'll try to improve
     > it.
     
     Since you have variable trailing context, this is a bigger performance
     problem.  Fixing it is usually easier than fixing backing up, which in a
     complicated scanner (yours seems to fit the bill) can be extremely
     difficult to do correctly.
     
     You also don't mention what flags you are using for your scanner.
     -f makes a large speed difference, and -Cfe buys you nearly as much
     speed but the resulting scanner is considerably smaller.
     
     > I have an | operator in {and} and in {pats} so both of them are variable
     > length.
     
     -p should have reported this.
     
     > Is changing one of them to fixed-length is enough ?
     
     Yes.
     
     > Is it possible to change the 32,000 states limit ?
     
     Yes.  I've appended instructions on how.  Before you make this change,
     though, you should think about whether there are ways to fundamentally
     simplify your scanner - those are certainly preferable!
     
     		Vern
     
     To increase the 32K limit (on a machine with 32 bit integers), you increase
     the magnitude of the following in flexdef.h:
     
     #define JAMSTATE -32766 /* marks a reference to the state that always jams */
     #define MAXIMUM_MNS 31999
     #define BAD_SUBSCRIPT -32767
     #define MAX_SHORT 32700
     
     Adding a 0 or two after each should do the trick.


File: flex.info,  Node: Can I fake multi-byte character support?,  Next: deleteme01,  Prev: Is backing up a big deal?,  Up: FAQ

Can I fake multi-byte character support?
========================================


     To: Heeman_Lee@hp.com
     Subject: Re: flex - multi-byte support?
     In-reply-to: Your message of Thu, 03 Oct 1996 17:24:04 PDT.
     Date: Fri, 04 Oct 1996 11:42:18 PDT
     From: Vern Paxson <vern>
     
     >      I assume as long as my *.l file defines the
     >      range of expected character code values (in octal format), flex will
     >      scan the file and read multi-byte characters correctly. But I have no
     >      confidence in this assumption.
     
     Your lack of confidence is justified - this won't work.
     
     Flex has in it a widespread assumption that the input is processed
     one byte at a time.  Fixing this is on the to-do list, but is involved,
     so it won't happen any time soon.  In the interim, the best I can suggest
     (unless you want to try fixing it yourself) is to write your rules in
     terms of pairs of bytes, using definitions in the first section:
     
     	X	\xfe\xc2
     	...
     	%%
     	foo{X}bar	found_foo_fe_c2_bar();
     
     etc.  Definitely a pain - sorry about that.
     
     By the way, the email address you used for me is ancient, indicating you
     have a very old version of flex.  You can get the most recent, 2.5.4, from
     ftp.ee.lbl.gov.
     
     		Vern
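   Writing such escapes by hand is error-prone, so a small C helper can
generate them.  `to_flex_escape' is a hypothetical utility, not part of
`flex'; it renders each byte of a string as a `\xHH' escape suitable
for a definition like `X' above.  For example, the UTF-8 encoding of
`é' (bytes C3 A9) becomes `\xc3\xa9'.

```c
#include <string.h>

/* Render each byte of s as a \xHH escape, e.g. the two bytes FE C2
   become the text \xfe\xc2, ready to paste into a flex definition.
   out must have room for 4 * strlen(s) + 1 characters. */
static void to_flex_escape(const char *s, char *out) {
    static const char hex[] = "0123456789abcdef";
    size_t n = strlen(s);
    for (size_t i = 0; i < n; i++) {
        unsigned char b = (unsigned char)s[i];
        out[4 * i]     = '\\';
        out[4 * i + 1] = 'x';
        out[4 * i + 2] = hex[b >> 4];
        out[4 * i + 3] = hex[b & 0x0f];
    }
    out[4 * n] = '\0';
}
```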


File: flex.info,  Node: deleteme01,  Next: Can you discuss some flex internals?,  Prev: Can I fake multi-byte character support?,  Up: FAQ

deleteme01
==========


     To: moleary@primus.com
     Subject: Re: Flex / Unicode compatibility question
     In-reply-to: Your message of Tue, 22 Oct 1996 10:15:42 PDT.
     Date: Tue, 22 Oct 1996 11:06:13 PDT
     From: Vern Paxson <vern>
     
     Unfortunately flex at the moment has a widespread assumption within it
     that characters are processed 8 bits at a time.  I don't see any easy
     fix for this (other than writing your rules in terms of double characters -
     a pain).  I also don't know of a wider lex, though you might try surfing
     the Plan 9 stuff because I know it's a Unicode system, and also the PCCT
     toolkit (try searching say Alta Vista for "Purdue Compiler Construction
     Toolkit").
     
     Fixing flex to handle wider characters is on the long-term to-do list.
     But since flex is a strictly spare-time project these days, this probably
     won't happen for quite a while, unless someone else does it first.
     
     		Vern


File: flex.info,  Node: Can you discuss some flex internals?,  Next: unput() messes up yy_at_bol,  Prev: deleteme01,  Up: FAQ

Can you discuss some flex internals?
====================================


     To: Johan Linde <jl@theophys.kth.se>
     Subject: Re: translation of flex
     In-reply-to: Your message of Sun, 10 Nov 1996 09:16:36 PST.
     Date: Mon, 11 Nov 1996 10:33:50 PST
     From: Vern Paxson <vern>
     
     > I'm working for the Swedish team translating GNU program, and I'm currently
     > working with flex. I have a few questions about some of the messages which
     > I hope you can answer.
     
     All of the things you're wondering about, by the way, concerning flex
     internals - probably the only person who understands what they mean in
     English is me!  So I wouldn't worry too much about getting them right.
     That said ...
     
     > #: main.c:545
     > msgid "  %d protos created\n"
     >
     > Does proto mean prototype?
     
     Yes - prototypes of state compression tables.
     
     > #: main.c:539
     > msgid "  %d/%d (peak %d) template nxt-chk entries created\n"
     >
     > Here I'm mainly puzzled by 'nxt-chk'. I guess it means 'next-check'. (?)
     > However, 'template next-check entries' doesn't make much sense to me. To be
     > able to find a good translation I need to know a little bit more about it.
     
     There is a scheme in the Aho/Sethi/Ullman compiler book for compressing
     scanner tables.  It involves creating two pairs of tables.  The first has
     "base" and "default" entries, the second has "next" and "check" entries.
     The "base" entry is indexed by the current state and yields an index into
     the next/check table.  The "default" entry gives what to do if the state
     transition isn't found in next/check.  The "next" entry gives the next
     state to enter, but only if the "check" entry verifies that this entry is
     correct for the current state.  Flex creates templates of series of
     next/check entries and then encodes differences from these templates as a
     way to compress the tables.
     
     > #: main.c:533
     > msgid "  %d/%d base-def entries created\n"
     >
     > The same problem here for 'base-def'.
     
     See above.
     
     		Vern
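   The lookup described above can be sketched in C.  The tables below
are a hand-built example for a three-state DFA over a three-symbol
alphabet, not tables produced by `flex', and the array names are
illustrative; `flex''s real tables use further refinements not shown
here.  State 0 stores all of its transitions; state 1 stores only the
one transition where it differs from state 0 and defaults to state 0
for the rest; state 2 behaves exactly like state 0 and stores nothing.
Five next/check slots (plus the base and default arrays) encode the
nine-entry full table.

```c
/* Input symbols: 0 = 'a', 1 = 'b', 2 = 'c'.
   Full transition table being encoded:
     state 0: a->1  b->0  c->0
     state 1: a->1  b->2  c->0
     state 2: a->1  b->0  c->0   (same as state 0) */

static const int base[3] = { 0, 2, 0 };   /* where each state's slots start */
static const int def[3]  = { -1, 0, 0 };  /* fallback state; -1 = none */

/* Slots 0-2 belong to state 0; slot 3 is state 1's 'b' entry;
   slot 4 is padding so that base[1] + 2 stays in range. */
static const int nxt[5] = { 1, 0, 0, 2, 0 };
static const int chk[5] = { 0, 0, 0, 1, -1 };

/* Follow the default chain until a slot's check entry names the state
   being looked up; return -1 if the chain runs out (a jam). */
static int next_state(int state, int sym) {
    for (;;) {
        int slot = base[state] + sym;
        if (chk[slot] == state)
            return nxt[slot];
        if (def[state] < 0)
            return -1;
        state = def[state];
    }
}
```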


File: flex.info,  Node: unput() messes up yy_at_bol,  Next: The | operator is not doing what I want,  Prev: Can you discuss some flex internals?,  Up: FAQ

unput() messes up yy_at_bol
===========================


     To: Xinying Li <xli@npac.syr.edu>
     Subject: Re: FLEX ?
     In-reply-to: Your message of Wed, 13 Nov 1996 17:28:38 PST.
     Date: Wed, 13 Nov 1996 19:51:54 PST
     From: Vern Paxson <vern>
     
     > "unput()" them to input flow, question occurs. If I do this after I scan
     > a carriage, the variable "YY_CURRENT_BUFFER->yy_at_bol" is changed. That
     > means the carriage flag has gone.
     
     You can control this by calling yy_set_bol().  It's described in the manual.
     
     >      And if in pre-reading it goes to the end of file, is anything done
     > to control the end of current buffer and end of file?
     
     No, there's no way to put back an end-of-file.
     
     >      By the way I am using flex 2.5.2 and using the "-l".
     
     The latest release is 2.5.4, by the way.  It fixes some bugs in 2.5.2 and
     2.5.3.  You can get it from ftp.ee.lbl.gov.
     
     		Vern


File: flex.info,  Node: The | operator is not doing what I want,  Next: Why can't flex understand this variable trailing context pattern?,  Prev: unput() messes up yy_at_bol,  Up: FAQ

The | operator is not doing what I want
=======================================


     To: Alain.ISSARD@st.com
     Subject: Re: Start condition with FLEX
     In-reply-to: Your message of Mon, 18 Nov 1996 09:45:02 PST.
     Date: Mon, 18 Nov 1996 10:41:34 PST
     From: Vern Paxson <vern>
     
     > I am not able to use the start condition scope and to use the | (OR) with
     > rules having start conditions.
     
     The problem is that if you use '|' as a regular expression operator, for
     example "a|b" meaning "match either 'a' or 'b'", then it must *not* have
     any blanks around it.  If you instead want the special '|' *action* (which
     from your scanner appears to be the case), which is a way of giving two
     different rules the same action:
     
     	foo	|
     	bar	matched_foo_or_bar();
     
     then '|' *must* be separated from the first rule by whitespace and *must*
     be followed by a new line.  You *cannot* write it as:
     
     	foo | bar	matched_foo_or_bar();
     
     even though you might think you could because yacc supports this syntax.
     The reason for this unfortunate incompatibility is historical, but it's
     unlikely to be changed.
     
     Your problems with start condition scope are simply due to syntax errors
     from your use of '|' later confusing flex.
     
     Let me know if you still have problems.
     
     		Vern


File: flex.info,  Node: Why can't flex understand this variable trailing context pattern?,  Next: The ^ operator isn't working,  Prev: The | operator is not doing what I want,  Up: FAQ

Why can't flex understand this variable trailing context pattern?
=================================================================


     To: Gregory Margo <gmargo@newton.vip.best.com>
     Subject: Re: flex-2.5.3 bug report
     In-reply-to: Your message of Sat, 23 Nov 1996 16:50:09 PST.
     Date: Sat, 23 Nov 1996 17:07:32 PST
     From: Vern Paxson <vern>
     
     > Enclosed is a lex file that "real" lex will process, but I cannot get
     > flex to process it.  Could you try it and maybe point me in the right direction?
     
     Your problem is that some of the definitions in the scanner use the '/'
     trailing context operator, and have it enclosed in ()'s.  Flex does not
     allow this operator to be enclosed in ()'s because doing so allows undefined
     regular expressions such as "(a/b)+".  So the solution is to remove the
     parentheses.  Note that you must also be building the scanner with the -l
     option for AT&T lex compatibility.  Without this option, flex automatically
     encloses the definitions in parentheses.
     
     		Vern


File: flex.info,  Node: The ^ operator isn't working,  Next: Trailing context is getting confused with trailing optional patterns,  Prev: Why can't flex understand this variable trailing context pattern?,  Up: FAQ

The ^ operator isn't working
============================


     To: Thomas Hadig <hadig@toots.physik.rwth-aachen.de>
     Subject: Re: Flex Bug ?
     In-reply-to: Your message of Tue, 26 Nov 1996 14:35:01 PST.
     Date: Tue, 26 Nov 1996 11:15:05 PST
     From: Vern Paxson <vern>
     
     > In my lexer code, i have the line :
     > ^\*.*          { }
     >
     > Thus all lines starting with an asterisk (*) are comment lines.
     > This does not work !
     
     I can't get this problem to reproduce - it works fine for me.  Note
     though that if what you have is slightly different:
     
     	COMMENT	^\*.*
     	%%
     	{COMMENT}	{ }
     
     then it won't work, because flex pushes back macro definitions enclosed
     in ()'s, so the rule becomes
     
     	(^\*.*)		{ }
     
     and now that the '^' operator is not at the immediate beginning of the
     line, it's interpreted as just a regular character.  You can avoid this
     behavior by using the "-l" lex-compatibility flag, or "%option lex-compat".
     
     		Vern


File: flex.info,  Node: Trailing context is getting confused with trailing optional patterns,  Next: Is flex GNU or not?,  Prev: The ^ operator isn't working,  Up: FAQ

Trailing context is getting confused with trailing optional patterns
====================================================================


     To: Adoram Rogel <adoram@hybridge.com>
     Subject: Re: Flex 2.5.4 BOF ???
     In-reply-to: Your message of Tue, 26 Nov 1996 16:10:41 PST.
     Date: Wed, 27 Nov 1996 10:56:25 PST
     From: Vern Paxson <vern>
     
     >     Organization(s)?/[a-z]
     >
     > This matched "Organizations" (looking in debug mode, the trailing s
     > was matched with trailing context instead of the optional (s) in the
     > end of the word.
     
     That should only happen with lex.  Flex can properly match this pattern.
     (That might be what you're saying, I'm just not sure.)
     
     > Is there a way to avoid this dangerous trailing context problem ?
     
     Unfortunately, there's no easy way.  On the other hand, I don't see why
     it should be a problem.  Lex's matching is clearly wrong, and I'd expect
     that the intent usually matches what the pattern expresses, so flex's
     matching will be correct.
     
     		Vern


File: flex.info,  Node: Is flex GNU or not?,  Next: ERASEME53,  Prev: Trailing context is getting confused with trailing optional patterns,  Up: FAQ

Is flex GNU or not?
===================


     To: Cameron MacKinnon <mackin@interlog.com>
     Subject: Re: Flex documentation bug
     In-reply-to: Your message of Mon, 02 Dec 1996 00:07:08 PST.
     Date: Sun, 01 Dec 1996 22:29:39 PST
     From: Vern Paxson <vern>
     
     > I'm not sure how or where to submit bug reports (documentation or
     > otherwise) for the GNU project stuff ...
     
     Well, strictly speaking flex isn't part of the GNU project.  They just
     distribute it because no one's written a decent GPL'd lex replacement.
     So you should send bugs directly to me.  Those sent to the GNU folks
     sometimes find their way to me, but some may fall through the cracks.
     
     > In GNU Info, under the section 'Start Conditions', and also in the man
     > page (mine's dated April '95) is a nice little snippet showing how to
     > parse C quoted strings into a buffer, defined to be MAX_STR_CONST in
     > size. Unfortunately, no overflow checking is ever done ...
     
     This is already mentioned in the manual:
     
     Finally, here's an example of how to  match  C-style  quoted
     strings using exclusive start conditions, including expanded
     escape sequences (but not including checking  for  a  string
     that's too long):
     
     The reason for not doing the overflow checking is that it will needlessly
     clutter up an example whose main purpose is just to demonstrate how to
     use flex.
     
     The latest release is 2.5.4, by the way, available from ftp.ee.lbl.gov.
     
     		Vern


File: flex.info,  Node: ERASEME53,  Next: I need to scan if-then-else blocks and while loops,  Prev: Is flex GNU or not?,  Up: FAQ

ERASEME53
=========


     To: tsv@cs.UManitoba.CA
     Subject: Re: Flex (reg)..
     In-reply-to: Your message of Thu, 06 Mar 1997 23:50:16 PST.
     Date: Thu, 06 Mar 1997 15:54:19 PST
     From: Vern Paxson <vern>
     
     > [:alpha:] ([:alnum:] | \\_)*
     
     If your rule really has embedded blanks as shown above, then it won't
     work, as the first blank delimits the rule from the action.  (It wouldn't
     even compile ...)  You need instead:
     
     [:alpha:]([:alnum:]|\\_)*
     
     and that should work fine - there's no restriction on what can go inside
     of ()'s except for the trailing context operator, '/'.
     
     		Vern


File: flex.info,  Node: I need to scan if-then-else blocks and while loops,  Next: ERASEME55,  Prev: ERASEME53,  Up: FAQ

I need to scan if-then-else blocks and while loops
==================================================


     To: "Mike Stolnicki" <mstolnic@ford.com>
     Subject: Re: FLEX help
     In-reply-to: Your message of Fri, 30 May 1997 13:33:27 PDT.
     Date: Fri, 30 May 1997 10:46:35 PDT
     From: Vern Paxson <vern>
     
     > We'd like to add "if-then-else", "while", and "for" statements to our
     > language ...
     > We've investigated many possible solutions.  The one solution that seems
     > the most reasonable involves knowing the position of a TOKEN in yyin.
     
     I strongly advise you to build a parse tree (abstract syntax tree) instead
     and loop over that.  You'll find this has major benefits in keeping
     your interpreter simple and extensible.
     
     That said, the functionality you mention for get_position and set_position
     has been on the to-do list for a while.  As flex is a purely spare-time
     project for me, no guarantees when this will be added (in particular, it
     for sure won't be for many months to come).
     
     		Vern
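
   The parse-tree approach can be sketched in C as follows.  This is a
minimal illustration, not part of flex: it assumes the parser (for
example, a yacc grammar) has already built the tree, and the node
layout, the single-letter variables, and the names eval() and
sum_below() are all hypothetical.  Executing a while loop just re-walks
the loop's condition and body subtrees, so nothing ever has to
reposition yyin.

```c
#include <assert.h>
#include <stddef.h>

/* Node kinds for a toy language: constants, variables, assignment,
   '+', '<', statement sequencing, and while loops. */
enum kind { N_NUM, N_VAR, N_ASSIGN, N_ADD, N_LESS, N_SEQ, N_WHILE };

struct node {
    enum kind kind;
    int value;              /* N_NUM: constant; N_VAR/N_ASSIGN: variable slot */
    struct node *a, *b;     /* children; their meaning depends on kind */
};

static int vars[26];        /* one slot per variable 'a'..'z' */

/* Evaluates an expression or executes a statement.  A while loop
   re-evaluates its subtrees; the input text is never re-scanned. */
int eval(const struct node *n)
{
    switch (n->kind) {
    case N_NUM:    return n->value;
    case N_VAR:    return vars[n->value];
    case N_ASSIGN: return vars[n->value] = eval(n->a);
    case N_ADD:    return eval(n->a) + eval(n->b);
    case N_LESS:   return eval(n->a) < eval(n->b);
    case N_SEQ:    eval(n->a); return eval(n->b);
    case N_WHILE:
        while (eval(n->a))
            eval(n->b);
        return 0;
    }
    assert(!"unknown node kind");
    return 0;
}

/* Builds the tree for
       s = 0; i = 0; while (i < limit) { s = s + i; i = i + 1; }
   runs it, and returns the final value of s. */
int sum_below(int limit)
{
    enum { VI = 'i' - 'a', VS = 's' - 'a' };          /* slots for i and s */
    struct node zero = {N_NUM, 0, NULL, NULL}, one = {N_NUM, 1, NULL, NULL};
    struct node lim  = {N_NUM, limit, NULL, NULL};
    struct node i    = {N_VAR, VI, NULL, NULL}, s = {N_VAR, VS, NULL, NULL};
    struct node s_init = {N_ASSIGN, VS, &zero, NULL};  /* s = 0     */
    struct node i_init = {N_ASSIGN, VI, &zero, NULL};  /* i = 0     */
    struct node cond   = {N_LESS, 0, &i, &lim};        /* i < limit */
    struct node s_sum  = {N_ADD, 0, &s, &i};           /* s + i     */
    struct node i_sum  = {N_ADD, 0, &i, &one};         /* i + 1     */
    struct node s_step = {N_ASSIGN, VS, &s_sum, NULL}; /* s = s + i */
    struct node i_step = {N_ASSIGN, VI, &i_sum, NULL}; /* i = i + 1 */
    struct node body   = {N_SEQ, 0, &s_step, &i_step};
    struct node loop   = {N_WHILE, 0, &cond, &body};
    struct node init   = {N_SEQ, 0, &s_init, &i_init};
    struct node prog   = {N_SEQ, 0, &init, &loop};
    eval(&prog);
    return vars[VS];
}
```

For example, sum_below(5) runs the loop body five times and returns
0+1+2+3+4 = 10.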


File: flex.info,  Node: ERASEME55,  Next: ERASEME56,  Prev: I need to scan if-then-else blocks and while loops,  Up: FAQ

ERASEME55
=========


     To: Colin Paul Adams <colin@colina.demon.co.uk>
     Subject: Re: Flex C++ classes and Bison
     In-reply-to: Your message of 09 Aug 1997 17:11:41 PDT.
     Date: Fri, 15 Aug 1997 10:48:19 PDT
     From: Vern Paxson <vern>
     
     > #define YY_DECL   int yylex (YYSTYPE *lvalp, struct parser_control
     > *parm)
     >
     > I have been trying  to get this to work as a C++ scanner, but it does
     > not appear to be possible (warning that it matches no declarations in
     > yyFlexLexer, or something like that).
     >
     > Is this supposed to be possible, or is it being worked on (I DID
     > notice the comment that scanner classes are still experimental, so I'm
     > not too hopeful)?
     
     What you need to do is derive a subclass from yyFlexLexer that provides
     the above yylex() method, squirrels away lvalp and parm into member
     variables, and then invokes yyFlexLexer::yylex() to do the regular scanning.
     
     		Vern


File: flex.info,  Node: ERASEME56,  Next: ERASEME57,  Prev: ERASEME55,  Up: FAQ

ERASEME56
=========


     To: Mikael.Latvala@lmf.ericsson.se
     Subject: Re: Possible mistake in Flex v2.5 document
     In-reply-to: Your message of Fri, 05 Sep 1997 16:07:24 PDT.
     Date: Fri, 05 Sep 1997 10:01:54 PDT
     From: Vern Paxson <vern>
     
     > In that example you show how to count comment lines when using
     > C style /* ... */ comments. My question is, shouldn't you take into
     > account a scenario where end of a comment marker occurs inside
     > character or string literals?
     
     The scanner certainly needs to also scan character and string literals.
     However it does that (there's an example in the man page for strings), the
     lexer will recognize the beginning of the literal before it runs across the
     embedded "/*".  Consequently, it will finish scanning the literal before it
     even considers the possibility of matching "/*".
     
     Example:
     
     	'([^']*|{ESCAPE_SEQUENCE})'
     
     will match all the text between the ''s (inclusive).  So the lexer
     considers this as a token beginning at the first ', and doesn't even
     attempt to match other tokens inside it.
     
     I think this subtlety is not worth putting in the manual, as I suspect
     it would confuse more people than it would enlighten.
     
     		Vern
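
   The left-to-right principle described above can be illustrated
without flex.  The hypothetical count_comments() below is a hand-written
C sketch, not flex-generated code: once it has entered a string or
character literal, it consumes everything up to the closing quote, so a
comment opener inside the literal is never considered.  For brevity it
handles backslash escapes but ignores // comments and other C details.

```c
#include <assert.h>
#include <stddef.h>

/* Counts C-style comments in src, scanning strictly left to right the way a
   flex scanner consumes its input: once a string or character literal has
   started, everything up to its closing quote belongs to the literal, so a
   comment opener inside it is never seen.  Simplified: handles backslash
   escapes inside literals, but ignores // comments and other C details. */
size_t count_comments(const char *src)
{
    size_t count = 0;
    const char *p = src;
    assert(src != NULL);
    while (*p != '\0') {
        if (*p == '"' || *p == '\'') {             /* literal starts first */
            char quote = *p++;
            while (*p != '\0' && *p != quote) {
                if (*p == '\\' && p[1] != '\0')
                    p++;                           /* skip escaped char */
                p++;
            }
            if (*p != '\0')
                p++;                               /* closing quote */
        } else if (p[0] == '/' && p[1] == '*') {  /* comment starts first */
            count++;
            p += 2;
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/'))
                p++;
            if (*p != '\0')
                p += 2;                            /* closing delimiter */
        } else {
            p++;
        }
    }
    return count;
}
```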


File: flex.info,  Node: ERASEME57,  Next: Is there a repository for flex scanners?,  Prev: ERASEME56,  Up: FAQ

ERASEME57
=========


     To: "Marty Leisner" <leisner@sdsp.mc.xerox.com>
     Subject: Re: flex limitations
     In-reply-to: Your message of Sat, 06 Sep 1997 11:27:21 PDT.
     Date: Mon, 08 Sep 1997 11:38:08 PDT
     From: Vern Paxson <vern>
     
     > %%
     > [a-zA-Z]+       /* skip a line */
     >                 {  printf("got %s\n", yytext); }
     > %%
     
     What version of flex are you using?  If I feed this to 2.5.4, it complains:
     
     	"bug.l", line 5: EOF encountered inside an action
     	"bug.l", line 5: unrecognized rule
     	"bug.l", line 5: fatal parse error
     
     Not the world's greatest error message, but it manages to flag the problem.
     
     (With the introduction of start condition scopes, flex can't accommodate
     an action on a separate line, since it's ambiguous with an indented rule.)
     
     You can get 2.5.4 from ftp.ee.lbl.gov.
     
     		Vern


File: flex.info,  Node: Is there a repository for flex scanners?,  Next: How can I conditionally compile or preprocess my flex input file?,  Prev: ERASEME57,  Up: FAQ

Is there a repository for flex scanners?
========================================

   Not that we know of. You might try asking on comp.compilers.


File: flex.info,  Node: How can I conditionally compile or preprocess my flex input file?,  Next: Where can I find grammars for lex and yacc?,  Prev: Is there a repository for flex scanners?,  Up: FAQ

How can I conditionally compile or preprocess my flex input file?
=================================================================

   Flex doesn't have a preprocessor like C does.  You might try using
m4, or the C preprocessor plus a sed script to clean up the result.


File: flex.info,  Node: Where can I find grammars for lex and yacc?,  Next: I get an end-of-buffer message for each character scanned.,  Prev: How can I conditionally compile or preprocess my flex input file?,  Up: FAQ

Where can I find grammars for lex and yacc?
===========================================

   In the sources for flex and bison.


File: flex.info,  Node: I get an end-of-buffer message for each character scanned.,  Next: unnamed-faq-62,  Prev: Where can I find grammars for lex and yacc?,  Up: FAQ

I get an end-of-buffer message for each character scanned.
==========================================================

   This will happen if your LexerInput() function returns only one
character at a time, which can happen either if your scanner is
"interactive", or if the streams library on your platform always
returns 1 for yyin->gcount().

   Solution: override LexerInput() with a version that returns whole
buffers.


File: flex.info,  Node: unnamed-faq-62,  Next: unnamed-faq-63,  Prev: I get an end-of-buffer message for each character scanned.,  Up: FAQ

unnamed-faq-62
==============


     To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
     Subject: Re: Flex maximums
     In-reply-to: Your message of Mon, 17 Nov 1997 17:16:06 PST.
     Date: Mon, 17 Nov 1997 17:16:15 PST
     From: Vern Paxson <vern>
     
     > I took a quick look into the flex-sources and altered some #defines in
     > flexdefs.h:
     >
     > 	#define INITIAL_MNS 64000
     > 	#define MNS_INCREMENT 1024000
     > 	#define MAXIMUM_MNS 64000
     
     The things to fix are to add a couple of zeroes to:
     
     	#define JAMSTATE -32766 /* marks a reference to the state that always jams */
     	#define MAXIMUM_MNS 31999
     	#define BAD_SUBSCRIPT -32767
     	#define MAX_SHORT 32700
     
     and, if you get complaints about too many rules, make the following change too:
     
     	#define YY_TRAILING_MASK 0x200000
     	#define YY_TRAILING_HEAD_MASK 0x400000
     
     - Vern


File: flex.info,  Node: unnamed-faq-63,  Next: unnamed-faq-64,  Prev: unnamed-faq-62,  Up: FAQ

unnamed-faq-63
==============


     To: jimmey@lexis-nexis.com (Jimmey Todd)
     Subject: Re: FLEX question regarding istream vs ifstream
     In-reply-to: Your message of Mon, 08 Dec 1997 15:54:15 PST.
     Date: Mon, 15 Dec 1997 13:21:35 PST
     From: Vern Paxson <vern>
     
     >         stdin_handle = YY_CURRENT_BUFFER;
     >         ifstream fin( "aFile" );
     >         yy_switch_to_buffer( yy_create_buffer( fin, YY_BUF_SIZE ) );
     >
     > What I'm wanting to do, is pass the contents of a file thru one set
     > of rules and then pass stdin thru another set... It works great if, I
     > don't use the C++ classes. But since everything else that I'm doing is
     > in C++, I thought I'd be consistent.
     >
     > The problem is that 'yy_create_buffer' is expecting an istream* as it's
     > first argument (as stated in the man page). However, fin is a ifstream
     > object. Any ideas on what I might be doing wrong? Any help would be
     > appreciated. Thanks!!
     
     You need to pass &fin, to turn it into an ifstream* instead of an ifstream.
     Then its type will be compatible with the expected istream*, because ifstream
     is derived from istream.
     
     		Vern


File: flex.info,  Node: unnamed-faq-64,  Next: unnamed-faq-65,  Prev: unnamed-faq-63,  Up: FAQ

unnamed-faq-64
==============


     To: Enda Fadian <fadiane@piercom.ie>
     Subject: Re: Question related to Flex man page?
     In-reply-to: Your message of Tue, 16 Dec 1997 15:17:34 PST.
     Date: Tue, 16 Dec 1997 14:17:09 PST
     From: Vern Paxson <vern>
     
     > Can you explain to me what is ment by a long-jump in relation to flex?
     
     Using the longjmp() function while inside yylex() or a routine called by it.
     
     > what is the flex activation frame.
     
     Just yylex()'s stack frame.
     
     > As far as I can see yyrestart will bring me back to the sart of the input
     > file and using flex++ isnot really an option!
     
     No, yyrestart() doesn't imply a rewind, even though its name might sound
     like it does.  It tells the scanner to flush its internal buffers and
     start reading from the given file at its present location.
     
     		Vern
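
   A minimal C sketch of what such a long jump looks like, using a
hypothetical fake_yylex() in place of a real scanner: longjmp() abandons
fake_yylex()'s stack frame (its activation frame) and resumes at the
setjmp() in the caller.  With a real flex scanner the abandoned frame
leaves the input buffer in an inconsistent state, which is why the man
page, as we read it, asks you to call yyrestart(yyin) before re-entering
the scanner.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf scan_error;   /* resume point for a fatal scanning error */
static int chars_scanned;    /* state updated by the "scanner" */

/* Hypothetical stand-in for yylex(): consumes characters one at a time
   and long-jumps out on '!' instead of returning normally. */
static int fake_yylex(const char *s)
{
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == '!')
            longjmp(scan_error, 1);  /* fake_yylex's frame is abandoned */
        chars_scanned++;
    }
    return 0;
}

/* Returns 0 if scanning finished normally, -1 if the scanner jumped out. */
int scan_with_recovery(const char *s)
{
    assert(s != NULL);
    chars_scanned = 0;
    if (setjmp(scan_error) != 0) {
        /* Reached via longjmp(): fake_yylex() never returned, so whatever
           it was in the middle of (for flex, its buffer state) is stale. */
        return -1;
    }
    return fake_yylex(s);
}
```

Here scan_with_recovery("ab!c") returns -1 with chars_scanned left at
2, the partial progress made before the jump.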


File: flex.info,  Node: unnamed-faq-65,  Next: unnamed-faq-66,  Prev: unnamed-faq-64,  Up: FAQ

unnamed-faq-65
==============


     To: hassan@larc.info.uqam.ca (Hassan Alaoui)
     Subject: Re: Need urgent Help
     In-reply-to: Your message of Sat, 20 Dec 1997 19:38:19 PST.
     Date: Sun, 21 Dec 1997 21:30:46 PST
     From: Vern Paxson <vern>
     
     > /usr/lib/yaccpar: In function `int yyparse()':
     > /usr/lib/yaccpar:184: warning: implicit declaration of function `int yylex(...)'
     >
     > ld: Undefined symbol
     >    _yylex
     >    _yyparse
     >    _yyin
     
     This is a known problem with Solaris C++ (and/or Solaris yacc).  I believe
     the fix is to explicitly insert some 'extern "C"' statements for the
     corresponding routines/symbols.
     
     		Vern


File: flex.info,  Node: unnamed-faq-66,  Next: unnamed-faq-67,  Prev: unnamed-faq-65,  Up: FAQ

unnamed-faq-66
==============


     To: mc0307@mclink.it
     Cc: gnu@prep.ai.mit.edu
     Subject: Re: [mc0307@mclink.it: Help request]
     In-reply-to: Your message of Fri, 12 Dec 1997 17:57:29 PST.
     Date: Sun, 21 Dec 1997 22:33:37 PST
     From: Vern Paxson <vern>
     
     > This is my definition for float and integer types:
     > . . .
     > NZD          [1-9]
     > ...
     > I've tested my program on other lex version (on UNIX Sun Solaris an HP
     > UNIX) and it work well, so I think that my definitions are correct.
     > There are any differences between Lex and Flex?
     
     There are indeed differences, as discussed in the man page.  The one
     you are probably running into is that when flex expands a name definition,
     it puts parentheses around the expansion, while lex does not.  There's
     an example in the man page of how this can lead to different matching.
     Flex's behavior complies with the POSIX standard (or at least with the
     last POSIX draft I saw).
     
     		Vern
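
   The effect of the added parentheses can be reproduced with POSIX
<regex.h>, independently of flex.  In this sketch NAME is a hypothetical
definition [A-Z][A-Z0-9]* used in the rule foo{NAME}?, and the two
patterns are hand-written versions of the flex-style expansion and of a
plain textual (lex-style) one.  The textual form would read
foo[A-Z][A-Z0-9]*?, which POSIX EREs leave undefined, so the sketch
spells out the same grouping explicitly.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* True if the POSIX extended regular expression `pattern` matches `text`.
   A pattern that fails to compile is reported as "no match". */
bool ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    bool matched = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}

/* Hypothetical definition:  NAME  [A-Z][A-Z0-9]*   rule:  foo{NAME}?
   flex wraps the expansion in parentheses, so ? applies to all of NAME: */
static const char *const FLEX_EXPANSION = "^foo([A-Z][A-Z0-9]*)?$";

/* Plain textual substitution gives foo[A-Z][A-Z0-9]*? , where ? binds
   only to the last piece; the same grouping, written legally for ERE: */
static const char *const LEX_EXPANSION = "^foo[A-Z]([A-Z0-9]*)?$";
```

With these definitions, "foo" matches only the flex-style pattern,
while "fooBAR1" matches both.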


File: flex.info,  Node: unnamed-faq-67,  Next: unnamed-faq-68,  Prev: unnamed-faq-66,  Up: FAQ

unnamed-faq-67
==============


     To: hassan@larc.info.uqam.ca (Hassan Alaoui)
     Subject: Re: Thanks
     In-reply-to: Your message of Mon, 22 Dec 1997 16:06:35 PST.
     Date: Mon, 22 Dec 1997 14:35:05 PST
     From: Vern Paxson <vern>
     
     > Thank you very much for your help. I compile and link well with C++ while
     > declaring 'yylex ...' extern, But a little problem remains. I get a
     > segmentation default when executing ( I linked with lfl library) while it
     > works well when using LEX instead of flex. Do you have some ideas about the
     > reason for this ?
     
     The one possible reason for this that comes to mind is if you've defined
     yytext as "extern char yytext[]" (which is what lex uses) instead of
     "extern char *yytext" (which is what flex uses).  If it's not that, then
     I'm afraid I don't know what the problem might be.
     
     		Vern


File: flex.info,  Node: unnamed-faq-68,  Next: unnamed-faq-69,  Prev: unnamed-faq-67,  Up: FAQ

unnamed-faq-68
==============


     To: "Bart Niswonger" <NISWONGR@almaden.ibm.com>
     Subject: Re: flex 2.5: c++ scanners & start conditions
     In-reply-to: Your message of Tue, 06 Jan 1998 10:34:21 PST.
     Date: Tue, 06 Jan 1998 19:19:30 PST
     From: Vern Paxson <vern>
     
     > The problem is that when I do this (using %option c++) start
     > conditions seem to not apply.
     
     The BEGIN macro modifies the yy_start variable.  For C scanners, this
     is a static with scope visible through the whole file.  For C++ scanners,
     it's a member variable, so it only has visible scope within a member
     function.  Your lexbegin() routine is not a member function when you
     build a C++ scanner, so it's not modifying the correct yy_start.  The
     diagnostic that indicates this is that you found you needed to add
     a declaration of yy_start in order to get your scanner to compile when
     using C++; instead, the correct fix is to make lexbegin() a member
     function (by deriving from yyFlexLexer).
     
     		Vern


File: flex.info,  Node: unnamed-faq-69,  Next: unnamed-faq-70,  Prev: unnamed-faq-68,  Up: FAQ

unnamed-faq-69
==============


     To: "Boris Zinin" <boris@ippe.rssi.ru>
     Subject: Re: current position in flex buffer
     In-reply-to: Your message of Mon, 12 Jan 1998 18:58:23 PST.
     Date: Mon, 12 Jan 1998 12:03:15 PST
     From: Vern Paxson <vern>
     
     > The problem is how to determine the current position in flex active
     > buffer when a rule is matched....
     
     You will need to keep track of this explicitly, such as by redefining
     YY_USER_ACTION to count the number of characters matched.
     
     The latest flex release, by the way, is 2.5.4, available from ftp.ee.lbl.gov.
     
     		Vern


File: flex.info,  Node: unnamed-faq-70,  Next: unnamed-faq-71,  Prev: unnamed-faq-69,  Up: FAQ

unnamed-faq-70
==============


     To: Bik.Dhaliwal@bis.org
     Subject: Re: Flex question
     In-reply-to: Your message of Mon, 26 Jan 1998 13:05:35 PST.
     Date: Tue, 27 Jan 1998 22:41:52 PST
     From: Vern Paxson <vern>
     
     > That requirement involves knowing
     > the character position at which a particular token was matched
     > in the lexer.
     
     The way you have to do this is by explicitly keeping track of where
     you are in the file, by counting the number of characters scanned
     for each token (available in yyleng).  It may prove convenient to
     do this by redefining YY_USER_ACTION, as described in the manual.
     
     		Vern
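
   A minimal sketch of such bookkeeping in C.  struct scan_pos,
SCAN_POS_START and track_token() are hypothetical helpers, not part of
flex; the idea is to call track_token() from YY_USER_ACTION, which flex
executes before each matched rule's action, passing yytext and yyleng.
Rules that call yyless() or yymore() would need additional adjustment,
since they change how much of the matched text is actually consumed.

```c
#include <assert.h>
#include <stddef.h>

/* Where the scanner is (next_*) and where the last token began (tok_*).
   Lines and columns are 1-based; offsets count bytes from the start. */
struct scan_pos {
    size_t next_offset, next_line, next_column;
    size_t tok_offset, tok_line, tok_column;
};

#define SCAN_POS_START { 0, 1, 1, 0, 1, 1 }

/* Intended to run before every rule's action, for example via
       #define YY_USER_ACTION track_token(&pos, yytext, yyleng);
   where pos is a hypothetical global initialized with SCAN_POS_START. */
void track_token(struct scan_pos *p, const char *text, size_t len)
{
    assert(p != NULL && text != NULL);
    p->tok_offset = p->next_offset;     /* token starts where we left off */
    p->tok_line = p->next_line;
    p->tok_column = p->next_column;
    for (size_t i = 0; i < len; i++) {  /* advance past the matched text */
        if (text[i] == '\n') {
            p->next_line++;
            p->next_column = 1;
        } else {
            p->next_column++;
        }
    }
    p->next_offset += len;
}
```

For instance, after the tokens "int", " " and "x", the position
recorded for "x" is offset 4, line 1, column 5.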


File: flex.info,  Node: unnamed-faq-71,  Next: unnamed-faq-72,  Prev: unnamed-faq-70,  Up: FAQ

unnamed-faq-71
==============


     To: Vladimir Alexiev <vladimir@cs.ualberta.ca>
     Subject: Re: flex: how to control start condition from parser?
     In-reply-to: Your message of Mon, 26 Jan 1998 05:50:16 PST.
     Date: Tue, 27 Jan 1998 22:45:37 PST
     From: Vern Paxson <vern>
     
     > It seems useful for the parser to be able to tell the lexer about such
     > context dependencies, because then they don't have to be limited to
     > local or sequential context.
     
     One way to do this is to have the parser call a stub routine that's
     included in the scanner's .l file, and consequently that has access to
     BEGIN.  The only ugliness is that the parser can't pass in the state
     it wants, because the start condition names aren't visible to it - but
     if you don't have many such states, then using a different set of
     names doesn't seem like too much of a burden.
     
     While generating a .h file like you suggest is certainly cleaner,
     flex development has come to a virtual stand-still :-(, so a workaround
     like the above is much more pragmatic than waiting for a new feature.
     
     		Vern


File: flex.info,  Node: unnamed-faq-72,  Next: unnamed-faq-73,  Prev: unnamed-faq-71,  Up: FAQ

unnamed-faq-72
==============


     To: Barbara Denny <denny@3com.com>
     Subject: Re: freebsd flex bug?
     In-reply-to: Your message of Fri, 30 Jan 1998 12:00:43 PST.
     Date: Fri, 30 Jan 1998 12:42:32 PST
     From: Vern Paxson <vern>
     
     > lex.yy.c:1996: parse error before `='
     
     This is the key to identifying the error.  (It may help to pinpoint
     it by using flex -L, so it doesn't generate #line directives in its
     output.)  I will bet you heavy money that you have a start condition
     name that is also a variable name, or something like that; flex spits
     out #define's for each start condition name, mapping them to a number,
     so you can wind up with:
     
     	%x foo
     	%%
     		...
     	%%
     	void bar()
     		{
     		int foo = 3;
     		}
     
     and the penultimate line will turn into "int 1 = 3" after C preprocessing,
     since flex will put "#define foo 1" in the generated scanner.
     
     		Vern
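
   The substitution can be observed directly with the C preprocessor.
In this sketch the #define stands in for the line flex generates for
%x foo, and the two-level stringizing macros (hypothetical names) turn
the declaration from bar() into a string after macro expansion, showing
what the compiler actually receives.

```c
#include <assert.h>
#include <string.h>

/* Stands in for the line flex generates for "%x foo" (per the reply above). */
#define foo 1

/* Two-level stringizing: the outer macro expands its argument first,
   so the inner # sees the text after foo has been replaced. */
#define STRINGIZE(x) #x
#define EXPAND_AND_STRINGIZE(x) STRINGIZE(x)

/* Returns the declaration from bar() as the compiler would see it. */
const char *declaration_after_preprocessing(void)
{
    return EXPAND_AND_STRINGIZE(int foo = 3);
}
```

The function returns "int 1 = 3", which is not a valid declaration.
Renaming either the variable or the start condition avoids the clash;
one way to prevent it systematically is a naming convention, for
example writing start condition names in capitals.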


File: flex.info,  Node: unnamed-faq-73,  Next: unnamed-faq-74,  Prev: unnamed-faq-72,  Up: FAQ

unnamed-faq-73
==============


     To: Maurice Petrie <mpetrie@infoscigroup.com>
     Subject: Re: Lost flex .l file
     In-reply-to: Your message of Mon, 02 Feb 1998 14:10:01 PST.
     Date: Mon, 02 Feb 1998 11:15:12 PST
     From: Vern Paxson <vern>
     
     > I am curious as to
     > whether there is a simple way to backtrack from the generated source to
     > reproduce the lost list of tokens we are searching on.
     
     In theory, it's straightforward to go from the DFA representation
     back to a regular-expression representation - the two are equivalent
     in expressive power.  In practice, it's a huge headache, because you
     have to unpack all the tables back into a single DFA representation,
     and then write a program to munch on that and translate it into an RE.
     
     Sorry for the less-than-happy news ...
     
     		Vern


File: flex.info,  Node: unnamed-faq-74,  Next: unnamed-faq-75,  Prev: unnamed-faq-73,  Up: FAQ

unnamed-faq-74
==============


     To: jimmey@lexis-nexis.com (Jimmey Todd)
     Subject: Re: Flex performance question
     In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
     Date: Thu, 19 Feb 1998 08:48:51 PST
     From: Vern Paxson <vern>
     
     > What I have found, is that the smaller the data chunk, the faster the
     > program executes. This is the opposite of what I expected. Should this be
     > happening this way?
     
     This is exactly what will happen if your input file has embedded NULs.
     From the man page:
     
     A final note: flex is slow when matching NUL's, particularly
     when  a  token  contains multiple NUL's.  It's best to write
     rules which match short amounts of text if it's  anticipated
     that the text will often include NUL's.
     
     So that's the first thing to look for.
     
     		Vern


File: flex.info,  Node: unnamed-faq-75,  Next: unnamed-faq-76,  Prev: unnamed-faq-74,  Up: FAQ

unnamed-faq-75
==============


     To: jimmey@lexis-nexis.com (Jimmey Todd)
     Subject: Re: Flex performance question
     In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
     Date: Thu, 19 Feb 1998 15:42:25 PST
     From: Vern Paxson <vern>
     
     So there are several problems.
     
     First, to go fast, you want to match as much text as possible, which
     your scanners don't in the case that what they're scanning is *not*
     a <RN> tag.  So you want a rule like:
     
     	[^<]+
     
     Second, C++ scanners are particularly slow if they're interactive,
     which they are by default.  Using -B speeds it up by a factor of 3-4
     on my workstation.
     
     Third, C++ scanners that use the istream interface are slow, because
     of how poorly implemented istreams are.  I built two versions of
     the following scanner:
     
     	%%
     	.*\n
     	.*
     	%%
     
     and the C version inhales a 2.5MB file on my workstation in 0.8 seconds.
     The C++ istream version, using -B, takes 3.8 seconds.
     
     		Vern
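
   The first point, matching as much text as possible, can be made
concrete by counting how many times a rule's action would run.  The
sketch below is a simplified, hypothetical model in C, not flex code:
"<...>" tags count as one token each, and the remaining text is taken
either one character at a time (roughly as with a catch-all . rule) or
as maximal runs (as with [^<]+).  Fewer, longer matches mean fewer
action invocations and less per-match overhead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of a complete "<...>" tag starting at s, or 0 if there is none. */
static size_t tag_len(const char *s)
{
    if (*s != '<')
        return 0;
    const char *end = strchr(s, '>');
    return end != NULL ? (size_t)(end - s) + 1 : 0;
}

/* Counts tokens (hence action invocations) in a simplified model of a
   tag scanner.  Tags are one token each; other text is taken one character
   per token when use_runs is 0 (like a catch-all "." rule), or as maximal
   runs of non-'<' characters when use_runs is nonzero (like [^<]+). */
size_t count_tokens(const char *s, int use_runs)
{
    size_t n = 0;
    assert(s != NULL);
    while (*s != '\0') {
        size_t len = tag_len(s);
        if (len == 0)
            len = (use_runs && *s != '<') ? strcspn(s, "<") : 1;
        s += len;
        n++;
    }
    return n;
}
```

On the input "hello <RN> world", the per-character model yields 13
tokens and the run-based one yields 3.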