Diffstat (limited to 'doc/flex.info-5')
-rw-r--r-- | doc/flex.info-5 | 1330 |
1 files changed, 1330 insertions, 0 deletions
diff --git a/doc/flex.info-5 b/doc/flex.info-5 new file mode 100644 index 0000000..c3b0c72 --- /dev/null +++ b/doc/flex.info-5 @@ -0,0 +1,1330 @@ +This is flex.info, produced by makeinfo version 4.3d from flex.texi. + +INFO-DIR-SECTION Programming +START-INFO-DIR-ENTRY +* flex: (flex). Fast lexical analyzer generator (lex replacement). +END-INFO-DIR-ENTRY + + + The flex manual is placed under the same licensing conditions as the +rest of flex: + + Copyright (C) 1990, 1997 The Regents of the University of California. +All rights reserved. + + This code is derived from software contributed to Berkeley by Vern +Paxson. + + The United States Government has rights in this work pursuant to +contract no. DE-AC03-76SF00098 between the United States Department of +Energy and the University of California. + + Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the + distribution. + Neither the name of the University nor the names of its contributors +may be used to endorse or promote products derived from this software +without specific prior written permission. + + THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED +WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. + +File: flex.info, Node: How do I match any string not matched in the preceding rules?, Next: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Prev: How can I build a two-pass scanner?, Up: FAQ + +How do I match any string not matched in the preceding rules? 
+============================================================= + + One way to assign precedence is to place the more specific rules +first. If two rules would match the same input (same sequence of +characters) then the first rule listed in the `flex' input wins, e.g., + + + %% + foo[a-zA-Z_]+ return FOO_ID; + bar[a-zA-Z_]+ return BAR_ID; + [a-zA-Z_]+ return GENERIC_ID; + + Note that the rule `[a-zA-Z_]+' must come *after* the others. It +will match the same amount of text as the more specific rules, and in +that case the `flex' scanner will pick the first rule listed in your +scanner as the one to match. + + +File: flex.info, Node: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Next: Is there a way to make flex treat NULL like a regular character?, Prev: How do I match any string not matched in the preceding rules?, Up: FAQ + +I am trying to port code from AT&T lex that uses yysptr and yysbuf. +=================================================================== + + Those are internal variables pointing into the AT&T scanner's input +buffer. I imagine they're being manipulated in user versions of the +`input()' and `unput()' functions. If so, what you need to do is +analyze those functions to figure out what they're doing, and then +replace `input()' with an appropriate definition of `YY_INPUT'. You +shouldn't need to (and must not) replace `flex''s `unput()' function. + + +File: flex.info, Node: Is there a way to make flex treat NULL like a regular character?, Next: Whenever flex can not match the input it says "flex scanner jammed"., Prev: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Up: FAQ + +Is there a way to make flex treat NULL like a regular character? +================================================================ + + Yes, `\0' and `\x00' should both do the trick. Perhaps you have an +ancient version of `flex'. The latest release is version 2.5.31. 
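One practical consequence of matching NUL bytes: the matched text then contains an embedded zero, so C string functions such as strlen() stop early, and an action should rely on the length flex reports in `yyleng' rather than on the terminator. The following is a minimal sketch in plain C, not part of flex; the helpers count_nul_bytes() and visible_prefix_len() are hypothetical names that stand in for code an action might run on (yytext, yyleng).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the NUL bytes in a token of known length,
 * e.g. the pair (yytext, yyleng) inside a scanner action. */
size_t count_nul_bytes(const char *text, size_t len)
{
    size_t n = 0;

    assert(text != NULL);
    for (size_t i = 0; i < len; i++) {
        if (text[i] == '\0')
            n++;
    }
    return n;
}

/* Hypothetical helper: returns how many leading bytes a terminator-based
 * routine such as strlen() would see, i.e. everything before the first
 * NUL, or the whole token when it contains no NUL at all. */
size_t visible_prefix_len(const char *text, size_t len)
{
    const char *nul;

    assert(text != NULL);
    nul = memchr(text, '\0', len);
    return nul != NULL ? (size_t)(nul - text) : len;
}
```

For a three-byte token consisting of 'a', NUL, 'b', the first helper reports one NUL and the second reports a visible prefix of one byte, which is why length-based handling is the safer choice once NUL is a legal input character.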
+ + +File: flex.info, Node: Whenever flex can not match the input it says "flex scanner jammed"., Next: Why doesnt flex have non-greedy operators like perl does?, Prev: Is there a way to make flex treat NULL like a regular character?, Up: FAQ + +Whenever flex can not match the input it says "flex scanner jammed". +==================================================================== + + You need to add a rule that matches the otherwise-unmatched text. +e.g., + + + %option yylineno + %% + [[a bunch of rules here]] + + . printf("bad input character '%s' at line %d\n", yytext, yylineno); + + See `%option default' for more information. + + +File: flex.info, Node: Why doesnt flex have non-greedy operators like perl does?, Next: Memory leak - 16386 bytes allocated by malloc., Prev: Whenever flex can not match the input it says "flex scanner jammed"., Up: FAQ + +Why doesn't flex have non-greedy operators like perl does? +========================================================== + + A DFA can do a non-greedy match by stopping the first time it enters +an accepting state, instead of consuming input until it determines that +no further matching is possible (a "jam" state). This is actually +easier to implement than longest leftmost match (which flex does). + + But it's also much less useful than longest leftmost match. In +general, when you find yourself wishing for non-greedy matching, that's +usually a sign that you're trying to make the scanner do some parsing. +That's generally the wrong approach, since it lacks the power to do a +decent job. Better is to either introduce a separate parser, or to +split the scanner into multiple scanners using (exclusive) start +conditions. + + You might have a separate start state once you've seen the `BEGIN'. +In that state, you might then have a regex that will match `END' (to +kick you out of the state), and perhaps `(.|\n)' to get a single +character within the chunk ... 
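The start-state idea can be sketched outside flex as a small hand-written state machine. This is only an illustration under stated assumptions: in a real scanner you would declare an exclusive start condition with `%x' and switch with `BEGIN()'; the names scan_state, ST_INITIAL, ST_CHUNK and count_chunk_chars() below are hypothetical and do not come from flex.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical states, mirroring INITIAL and one exclusive start condition. */
enum scan_state { ST_INITIAL, ST_CHUNK };

/*
 * Scans input, entering ST_CHUNK on "BEGIN" and leaving it on the nearest
 * following "END".  Returns the number of characters seen inside chunks
 * and stores the number of completed chunks in *chunks.
 */
size_t count_chunk_chars(const char *input, size_t *chunks)
{
    enum scan_state state = ST_INITIAL;
    size_t inside = 0;
    const char *p = input;

    assert(input != NULL && chunks != NULL);
    *chunks = 0;

    while (*p != '\0') {
        if (state == ST_INITIAL && strncmp(p, "BEGIN", 5) == 0) {
            state = ST_CHUNK;            /* like BEGIN(CHUNK) */
            p += 5;
        } else if (state == ST_CHUNK && strncmp(p, "END", 3) == 0) {
            state = ST_INITIAL;          /* like BEGIN(INITIAL) */
            ++*chunks;
            p += 3;
        } else {
            if (state == ST_CHUNK)
                ++inside;                /* the (.|\n) rule: one char at a time */
            ++p;
        }
    }
    return inside;
}
```

Because the chunk state consumes one character at a time and checks for the terminator first, the chunk ends at the first "END", which is the effect people usually want from a non-greedy operator.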
+ + This approach also has much better error-reporting properties. + + +File: flex.info, Node: Memory leak - 16386 bytes allocated by malloc., Next: How do I track the byte offset for lseek()?, Prev: Why doesnt flex have non-greedy operators like perl does?, Up: FAQ + +Memory leak - 16386 bytes allocated by malloc. +============================================== + + UPDATED 2002-07-10: As of `flex' version 2.5.9, this leak means that +you did not call `yylex_destroy()'. If you are using an earlier version +of `flex', then read on. + + The leak is about 16426 bytes. That is, (8192 * 2 + 2) for the +read-buffer, and about 40 for `struct yy_buffer_state' (depending upon +alignment). The leak is in the non-reentrant C scanner only (NOT in the +reentrant scanner, NOT in the C++ scanner). Since `flex' doesn't know +when you are done, the buffer is never freed. + + However, the leak won't multiply since the buffer is reused no +matter how many times you call `yylex()'. + + If you want to reclaim the memory when you are completely done +scanning, then you might try this: + + + /* For non-reentrant C scanner only. */ + yy_delete_buffer(YY_CURRENT_BUFFER); + yy_init = 1; + + Note: `yy_init' is an "internal variable", and hasn't been tested in +this situation. It is possible that some other globals may need +resetting as well. + + +File: flex.info, Node: How do I track the byte offset for lseek()?, Next: How do I use my own I/O classes in a C++ scanner?, Prev: Memory leak - 16386 bytes allocated by malloc., Up: FAQ + +How do I track the byte offset for lseek()? +=========================================== + + + > We thought that it would be possible to have this number through the + > evaluation of the following expression: + > + > seek_position = (no_buffers)*YY_READ_BUF_SIZE + yy_c_buf_p - YY_CURRENT_BUFFER->yy_ch_buf + + While this is the right idea, it has two problems. 
The first is that +it's possible that `flex' will request less than `YY_READ_BUF_SIZE' +during an invocation of `YY_INPUT' (or that your input source will +return less even though `YY_READ_BUF_SIZE' bytes were requested). The +second problem is that when refilling its internal buffer, `flex' keeps +some characters from the previous buffer (because usually it's in the +middle of a match, and needs those characters to construct `yytext' for +the match once it's done). Because of this, `yy_c_buf_p - +YY_CURRENT_BUFFER->yy_ch_buf' won't be exactly the number of characters +already read from the current buffer. + + An alternative solution is to count the number of characters you've +matched since starting to scan. This can be done by using +`YY_USER_ACTION'. For example, + + + #define YY_USER_ACTION num_chars += yyleng; + + (You need to be careful to update your bookkeeping if you use +`yymore()', `yyless()', `unput()', or `input()'.) + + +File: flex.info, Node: How do I use my own I/O classes in a C++ scanner?, Next: How do I skip as many chars as possible?, Prev: How do I track the byte offset for lseek()?, Up: FAQ + +How do I use my own I/O classes in a C++ scanner? +================================================= + + When the flex C++ scanning class rewrite finally happens, then this +sort of thing should become much easier. + + You can do this by passing the various functions (such as +`LexerInput()' and `LexerOutput()') NULL `iostream*''s, and then +dealing with your own I/O classes surreptitiously (i.e., stashing them +in special member variables). This works because the only assumption +about the lexer regarding what's done with the iostreams is that +they're ultimately passed to `LexerInput()' and `LexerOutput()', which +then do whatever is necessary with them. + + +File: flex.info, Node: How do I skip as many chars as possible?, Next: deleteme00, Prev: How do I use my own I/O classes in a C++ scanner?, Up: FAQ + +How do I skip as many chars as possible? 
+======================================== + + How do I skip as many chars as possible - without interfering with +the other patterns? + + In the example below, we want to skip over characters until we see +the phrase "endskip". The following will _NOT_ work correctly (do you +see why not?) + + + /* INCORRECT SCANNER */ + %x SKIP + %% + <INITIAL>startskip BEGIN(SKIP); + ... + <SKIP>"endskip" BEGIN(INITIAL); + <SKIP>.* ; + + The problem is that the pattern .* will eat up the word "endskip." +The simplest (but slow) fix is: + + + <SKIP>"endskip" BEGIN(INITIAL); + <SKIP>. ; + + The fix involves making the second rule match more, without making +it match "endskip" plus something else. So for example: + + + <SKIP>"endskip" BEGIN(INITIAL); + <SKIP>[^e]+ ; + <SKIP>. ;/* so you eat up e's, too */ + + +File: flex.info, Node: deleteme00, Next: Are certain equivalent patterns faster than others?, Prev: How do I skip as many chars as possible?, Up: FAQ + +deleteme00 +========== + + + QUESTION: + When was flex born? + + Vern Paxson took over + the Software Tools lex project from Jef Poskanzer in 1982. At that point it + was written in Ratfor. Around 1987 or so, Paxson translated it into C, and + a legend was born :-). + + +File: flex.info, Node: Are certain equivalent patterns faster than others?, Next: Is backing up a big deal?, Prev: deleteme00, Up: FAQ + +Are certain equivalent patterns faster than others? +=================================================== + + + To: Adoram Rogel <adoram@orna.hybridge.com> + Subject: Re: Flex 2.5.2 performance questions + In-reply-to: Your message of Wed, 18 Sep 96 11:12:17 EDT. + Date: Wed, 18 Sep 96 10:51:02 PDT + From: Vern Paxson <vern> + + [Note, the most recent flex release is 2.5.4, which you can get from + ftp.ee.lbl.gov. It has bug fixes over 2.5.2 and 2.5.3.] + + > 1. Using the pattern + > ([Ff](oot)?)?[Nn](ote)?(\.)? 
+ > instead of + > (((F|f)oot(N|n)ote)|((N|n)ote)|((N|n)\.)|((F|f)(N|n)(\.))) + > (in a very complicated flex program) caused the program to slow from + > 300K+/min to 100K/min (no other changes were done). + + These two are not equivalent. For example, the first can match "footnote." + but the second can only match "footnote". This is almost certainly the + cause of the discrepancy - the slower scanner run is matching more tokens, + and/or having to do more backing up. + + > 2. Which of these two are better: [Ff]oot or (F|f)oot ? + + From a performance point of view, they're equivalent (modulo presumably + minor effects such as memory cache hit rates; and the presence of trailing + context, see below). From a space point of view, the first is slightly + preferable. + + > 3. I have a pattern that look like this: + > pats {p1}|{p2}|{p3}|...|{p50} (50 patterns ORd) + > + > running yet another complicated program that includes the following rule: + > <snext>{and}/{no4}{bb}{pats} + > + > gets me to "too complicated - over 32,000 states"... + + I can't tell from this example whether the trailing context is variable-length + or fixed-length (it could be the latter if {and} is fixed-length). If it's + variable length, which flex -p will tell you, then this reflects a basic + performance problem, and if you can eliminate it by restructuring your + scanner, you will see significant improvement. + + > so I divided {pats} to {pats1}, {pats2},..., {pats5} each consists of about + > 10 patterns and changed the rule to be 5 rules. + > This did compile, but what is the rule of thumb here ? + + The rule is to avoid trailing context other than fixed-length, in which for + a/b, either the 'a' pattern or the 'b' pattern has a fixed length. Use + of the '|' operator automatically makes the pattern variable length, so in + this case '[Ff]oot' is preferred to '(F|f)oot'. + + > 4. I changed a rule that looked like this: + > <snext8>{and}{bb}/{ROMAN}[^A-Za-z] { BEGIN... 
+ > + > to the next 2 rules: + > <snext8>{and}{bb}/{ROMAN}[A-Za-z] { ECHO;} + > <snext8>{and}{bb}/{ROMAN} { BEGIN... + > + > Again, I understand the using [^...] will cause a great performance loss + + Actually, it doesn't cause any sort of performance loss. It's a surprising + fact about regular expressions that they always match in linear time + regardless of how complex they are. + + > but are there any specific rules about it ? + + See the "Performance Considerations" section of the man page, and also + the example in MISC/fastwc/. + + Vern + + +File: flex.info, Node: Is backing up a big deal?, Next: Can I fake multi-byte character support?, Prev: Are certain equivalent patterns faster than others?, Up: FAQ + +Is backing up a big deal? +========================= + + + To: Adoram Rogel <adoram@hybridge.com> + Subject: Re: Flex 2.5.2 performance questions + In-reply-to: Your message of Thu, 19 Sep 96 10:16:04 EDT. + Date: Thu, 19 Sep 96 09:58:00 PDT + From: Vern Paxson <vern> + + > a lot about the backing up problem. + > I believe that there lies my biggest problem, and I'll try to improve + > it. + + Since you have variable trailing context, this is a bigger performance + problem. Fixing it is usually easier than fixing backing up, which in a + complicated scanner (yours seems to fit the bill) can be extremely + difficult to do correctly. + + You also don't mention what flags you are using for your scanner. + -f makes a large speed difference, and -Cfe buys you nearly as much + speed but the resulting scanner is considerably smaller. + + > I have an | operator in {and} and in {pats} so both of them are variable + > length. + + -p should have reported this. + + > Is changing one of them to fixed-length is enough ? + + Yes. + + > Is it possible to change the 32,000 states limit ? + + Yes. I've appended instructions on how. 
Before you make this change, + though, you should think about whether there are ways to fundamentally + simplify your scanner - those are certainly preferable! + + Vern + + To increase the 32K limit (on a machine with 32 bit integers), you increase + the magnitude of the following in flexdef.h: + + #define JAMSTATE -32766 /* marks a reference to the state that always jams */ + #define MAXIMUM_MNS 31999 + #define BAD_SUBSCRIPT -32767 + #define MAX_SHORT 32700 + + Adding a 0 or two after each should do the trick. + + +File: flex.info, Node: Can I fake multi-byte character support?, Next: deleteme01, Prev: Is backing up a big deal?, Up: FAQ + +Can I fake multi-byte character support? +======================================== + + + To: Heeman_Lee@hp.com + Subject: Re: flex - multi-byte support? + In-reply-to: Your message of Thu, 03 Oct 1996 17:24:04 PDT. + Date: Fri, 04 Oct 1996 11:42:18 PDT + From: Vern Paxson <vern> + + > I assume as long as my *.l file defines the + > range of expected character code values (in octal format), flex will + > scan the file and read multi-byte characters correctly. But I have no + > confidence in this assumption. + + Your lack of confidence is justified - this won't work. + + Flex has in it a widespread assumption that the input is processed + one byte at a time. Fixing this is on the to-do list, but is involved, + so it won't happen any time soon. In the interim, the best I can suggest + (unless you want to try fixing it yourself) is to write your rules in + terms of pairs of bytes, using definitions in the first section: + + X \xfe\xc2 + ... + %% + foo{X}bar found_foo_fe_c2_bar(); + + etc. Definitely a pain - sorry about that. + + By the way, the email address you used for me is ancient, indicating you + have a very old version of flex. You can get the most recent, 2.5.4, from + ftp.ee.lbl.gov. 
+ + Vern + + +File: flex.info, Node: deleteme01, Next: Can you discuss some flex internals?, Prev: Can I fake multi-byte character support?, Up: FAQ + +deleteme01 +========== + + + To: moleary@primus.com + Subject: Re: Flex / Unicode compatibility question + In-reply-to: Your message of Tue, 22 Oct 1996 10:15:42 PDT. + Date: Tue, 22 Oct 1996 11:06:13 PDT + From: Vern Paxson <vern> + + Unfortunately flex at the moment has a widespread assumption within it + that characters are processed 8 bits at a time. I don't see any easy + fix for this (other than writing your rules in terms of double characters - + a pain). I also don't know of a wider lex, though you might try surfing + the Plan 9 stuff because I know it's a Unicode system, and also the PCCT + toolkit (try searching say Alta Vista for "Purdue Compiler Construction + Toolkit"). + + Fixing flex to handle wider characters is on the long-term to-do list. + But since flex is a strictly spare-time project these days, this probably + won't happen for quite a while, unless someone else does it first. + + Vern + + +File: flex.info, Node: Can you discuss some flex internals?, Next: unput() messes up yy_at_bol, Prev: deleteme01, Up: FAQ + +Can you discuss some flex internals? +==================================== + + + To: Johan Linde <jl@theophys.kth.se> + Subject: Re: translation of flex + In-reply-to: Your message of Sun, 10 Nov 1996 09:16:36 PST. + Date: Mon, 11 Nov 1996 10:33:50 PST + From: Vern Paxson <vern> + + > I'm working for the Swedish team translating GNU program, and I'm currently + > working with flex. I have a few questions about some of the messages which + > I hope you can answer. + + All of the things you're wondering about, by the way, concerning flex + internals - probably the only person who understands what they mean in + English is me! So I wouldn't worry too much about getting them right. + That said ... + + > #: main.c:545 + > msgid " %d protos created\n" + > + > Does proto mean prototype? 
+ + Yes - prototypes of state compression tables. + + > #: main.c:539 + > msgid " %d/%d (peak %d) template nxt-chk entries created\n" + > + > Here I'm mainly puzzled by 'nxt-chk'. I guess it means 'next-check'. (?) + > However, 'template next-check entries' doesn't make much sense to me. To be + > able to find a good translation I need to know a little bit more about it. + + There is a scheme in the Aho/Sethi/Ullman compiler book for compressing + scanner tables. It involves creating two pairs of tables. The first has + "base" and "default" entries, the second has "next" and "check" entries. + The "base" entry is indexed by the current state and yields an index into + the next/check table. The "default" entry gives what to do if the state + transition isn't found in next/check. The "next" entry gives the next + state to enter, but only if the "check" entry verifies that this entry is + correct for the current state. Flex creates templates of series of + next/check entries and then encodes differences from these templates as a + way to compress the tables. + + > #: main.c:533 + > msgid " %d/%d base-def entries created\n" + > + > The same problem here for 'base-def'. + + See above. + + Vern + + +File: flex.info, Node: unput() messes up yy_at_bol, Next: The | operator is not doing what I want, Prev: Can you discuss some flex internals?, Up: FAQ + +unput() messes up yy_at_bol +=========================== + + + To: Xinying Li <xli@npac.syr.edu> + Subject: Re: FLEX ? + In-reply-to: Your message of Wed, 13 Nov 1996 17:28:38 PST. + Date: Wed, 13 Nov 1996 19:51:54 PST + From: Vern Paxson <vern> + + > "unput()" them to input flow, question occurs. If I do this after I scan + > a carriage, the variable "YY_CURRENT_BUFFER->yy_at_bol" is changed. That + > means the carriage flag has gone. + + You can control this by calling yy_set_bol(). It's described in the manual. 
+ + > And if in pre-reading it goes to the end of file, is anything done + > to control the end of current buffer and end of file? + + No, there's no way to put back an end-of-file. + + > By the way I am using flex 2.5.2 and using the "-l". + + The latest release is 2.5.4, by the way. It fixes some bugs in 2.5.2 and + 2.5.3. You can get it from ftp.ee.lbl.gov. + + Vern + + +File: flex.info, Node: The | operator is not doing what I want, Next: Why can't flex understand this variable trailing context pattern?, Prev: unput() messes up yy_at_bol, Up: FAQ + +The | operator is not doing what I want +======================================= + + + To: Alain.ISSARD@st.com + Subject: Re: Start condition with FLEX + In-reply-to: Your message of Mon, 18 Nov 1996 09:45:02 PST. + Date: Mon, 18 Nov 1996 10:41:34 PST + From: Vern Paxson <vern> + + > I am not able to use the start condition scope and to use the | (OR) with + > rules having start conditions. + + The problem is that if you use '|' as a regular expression operator, for + example "a|b" meaning "match either 'a' or 'b'", then it must *not* have + any blanks around it. If you instead want the special '|' *action* (which + from your scanner appears to be the case), which is a way of giving two + different rules the same action: + + foo | + bar matched_foo_or_bar(); + + then '|' *must* be separated from the first rule by whitespace and *must* + be followed by a newline. You *cannot* write it as: + + foo | bar matched_foo_or_bar(); + + even though you might think you could because yacc supports this syntax. + The reason for this unfortunate incompatibility is historical, but it's + unlikely to be changed. + + Your problems with start condition scope are simply due to syntax errors + from your use of '|' later confusing flex. + + Let me know if you still have problems. 
+ + Vern + + +File: flex.info, Node: Why can't flex understand this variable trailing context pattern?, Next: The ^ operator isn't working, Prev: The | operator is not doing what I want, Up: FAQ + +Why can't flex understand this variable trailing context pattern? +================================================================= + + + To: Gregory Margo <gmargo@newton.vip.best.com> + Subject: Re: flex-2.5.3 bug report + In-reply-to: Your message of Sat, 23 Nov 1996 16:50:09 PST. + Date: Sat, 23 Nov 1996 17:07:32 PST + From: Vern Paxson <vern> + + > Enclosed is a lex file that "real" lex will process, but I cannot get + > flex to process it. Could you try it and maybe point me in the right direction? + + Your problem is that some of the definitions in the scanner use the '/' + trailing context operator, and have it enclosed in ()'s. Flex does not + allow this operator to be enclosed in ()'s because doing so allows undefined + regular expressions such as "(a/b)+". So the solution is to remove the + parentheses. Note that you must also be building the scanner with the -l + option for AT&T lex compatibility. Without this option, flex automatically + encloses the definitions in parentheses. + + Vern + + +File: flex.info, Node: The ^ operator isn't working, Next: Trailing context is getting confused with trailing optional patterns, Prev: Why can't flex understand this variable trailing context pattern?, Up: FAQ + +The ^ operator isn't working +============================ + + + To: Thomas Hadig <hadig@toots.physik.rwth-aachen.de> + Subject: Re: Flex Bug ? + In-reply-to: Your message of Tue, 26 Nov 1996 14:35:01 PST. + Date: Tue, 26 Nov 1996 11:15:05 PST + From: Vern Paxson <vern> + + > In my lexer code, i have the line : + > ^\*.* { } + > + > Thus all lines starting with an astrix (*) are comment lines. + > This does not work ! + + I can't get this problem to reproduce - it works fine for me. 
Note + though that if what you have is slightly different: + + COMMENT ^\*.* + %% + {COMMENT} { } + + then it won't work, because flex pushes back macro definitions enclosed + in ()'s, so the rule becomes + + (^\*.*) { } + + and now that the '^' operator is not at the immediate beginning of the + line, it's interpreted as just a regular character. You can avoid this + behavior by using the "-l" lex-compatibility flag, or "%option lex-compat". + + Vern + + +File: flex.info, Node: Trailing context is getting confused with trailing optional patterns, Next: Is flex GNU or not?, Prev: The ^ operator isn't working, Up: FAQ + +Trailing context is getting confused with trailing optional patterns +==================================================================== + + + To: Adoram Rogel <adoram@hybridge.com> + Subject: Re: Flex 2.5.4 BOF ??? + In-reply-to: Your message of Tue, 26 Nov 1996 16:10:41 PST. + Date: Wed, 27 Nov 1996 10:56:25 PST + From: Vern Paxson <vern> + + > Organization(s)?/[a-z] + > + > This matched "Organizations" (looking in debug mode, the trailing s + > was matched with trailing context instead of the optional (s) in the + > end of the word. + + That should only happen with lex. Flex can properly match this pattern. + (That might be what you're saying, I'm just not sure.) + + > Is there a way to avoid this dangerous trailing context problem ? + + Unfortunately, there's no easy way. On the other hand, I don't see why + it should be a problem. Lex's matching is clearly wrong, and I'd hope + that usually the intent remains the same as expressed with the pattern, + so flex's matching will be correct. + + Vern + + +File: flex.info, Node: Is flex GNU or not?, Next: ERASEME53, Prev: Trailing context is getting confused with trailing optional patterns, Up: FAQ + +Is flex GNU or not? +=================== + + + To: Cameron MacKinnon <mackin@interlog.com> + Subject: Re: Flex documentation bug + In-reply-to: Your message of Mon, 02 Dec 1996 00:07:08 PST. 
+ Date: Sun, 01 Dec 1996 22:29:39 PST + From: Vern Paxson <vern> + + > I'm not sure how or where to submit bug reports (documentation or + > otherwise) for the GNU project stuff ... + + Well, strictly speaking flex isn't part of the GNU project. They just + distribute it because no one's written a decent GPL'd lex replacement. + So you should send bugs directly to me. Those sent to the GNU folks + sometimes find their way to me, but some may drop between the cracks. + + > In GNU Info, under the section 'Start Conditions', and also in the man + > page (mine's dated April '95) is a nice little snippet showing how to + > parse C quoted strings into a buffer, defined to be MAX_STR_CONST in + > size. Unfortunately, no overflow checking is ever done ... + + This is already mentioned in the manual: + + Finally, here's an example of how to match C-style quoted + strings using exclusive start conditions, including expanded + escape sequences (but not including checking for a string + that's too long): + + The reason for not doing the overflow checking is that it will needlessly + clutter up an example whose main purpose is just to demonstrate how to + use flex. + + The latest release is 2.5.4, by the way, available from ftp.ee.lbl.gov. + + Vern + + +File: flex.info, Node: ERASEME53, Next: I need to scan if-then-else blocks and while loops, Prev: Is flex GNU or not?, Up: FAQ + +ERASEME53 +========= + + + To: tsv@cs.UManitoba.CA + Subject: Re: Flex (reg).. + In-reply-to: Your message of Thu, 06 Mar 1997 23:50:16 PST. + Date: Thu, 06 Mar 1997 15:54:19 PST + From: Vern Paxson <vern> + + > [:alpha:] ([:alnum:] | \\_)* + + If your rule really has embedded blanks as shown above, then it won't + work, as the first blank delimits the rule from the action. (It wouldn't + even compile ...) You need instead: + + [:alpha:]([:alnum:]|\\_)* + + and that should work fine - there's no restriction on what can go inside + of ()'s except for the trailing context operator, '/'. 
+ + Vern + + +File: flex.info, Node: I need to scan if-then-else blocks and while loops, Next: ERASEME55, Prev: ERASEME53, Up: FAQ + +I need to scan if-then-else blocks and while loops +================================================== + + + To: "Mike Stolnicki" <mstolnic@ford.com> + Subject: Re: FLEX help + In-reply-to: Your message of Fri, 30 May 1997 13:33:27 PDT. + Date: Fri, 30 May 1997 10:46:35 PDT + From: Vern Paxson <vern> + + > We'd like to add "if-then-else", "while", and "for" statements to our + > language ... + > We've investigated many possible solutions. The one solution that seems + > the most reasonable involves knowing the position of a TOKEN in yyin. + + I strongly advise you to instead build a parse tree (abstract syntax tree) + and loop over that instead. You'll find this has major benefits in keeping + your interpreter simple and extensible. + + That said, the functionality you mention for get_position and set_position + have been on the to-do list for a while. As flex is a purely spare-time + project for me, no guarantees when this will be added (in particular, it + for sure won't be for many months to come). + + Vern + + +File: flex.info, Node: ERASEME55, Next: ERASEME56, Prev: I need to scan if-then-else blocks and while loops, Up: FAQ + +ERASEME55 +========= + + + To: Colin Paul Adams <colin@colina.demon.co.uk> + Subject: Re: Flex C++ classes and Bison + In-reply-to: Your message of 09 Aug 1997 17:11:41 PDT. + Date: Fri, 15 Aug 1997 10:48:19 PDT + From: Vern Paxson <vern> + + > #define YY_DECL int yylex (YYSTYPE *lvalp, struct parser_control + > *parm) + > + > I have been trying to get this to work as a C++ scanner, but it does + > not appear to be possible (warning that it matches no declarations in + > yyFlexLexer, or something like that). + > + > Is this supposed to be possible, or is it being worked on (I DID + > notice the comment that scanner classes are still experimental, so I'm + > not too hopeful)? 
+ + What you need to do is derive a subclass from yyFlexLexer that provides + the above yylex() method, squirrels away lvalp and parm into member + variables, and then invokes yyFlexLexer::yylex() to do the regular scanning. + + Vern + + +File: flex.info, Node: ERASEME56, Next: ERASEME57, Prev: ERASEME55, Up: FAQ + +ERASEME56 +========= + + + To: Mikael.Latvala@lmf.ericsson.se + Subject: Re: Possible mistake in Flex v2.5 document + In-reply-to: Your message of Fri, 05 Sep 1997 16:07:24 PDT. + Date: Fri, 05 Sep 1997 10:01:54 PDT + From: Vern Paxson <vern> + + > In that example you show how to count comment lines when using + > C style /* ... */ comments. My question is, shouldn't you take into + > account a scenario where end of a comment marker occurs inside + > character or string literals? + + The scanner certainly needs to also scan character and string literals. + However it does that (there's an example in the man page for strings), the + lexer will recognize the beginning of the literal before it runs across the + embedded "/*". Consequently, it will finish scanning the literal before it + even considers the possibility of matching "/*". + + Example: + + '([^']*|{ESCAPE_SEQUENCE})' + + will match all the text between the ''s (inclusive). So the lexer + considers this as a token beginning at the first ', and doesn't even + attempt to match other tokens inside it. + + I think this subtlety is not worth putting in the manual, as I suspect + it would confuse more people than it would enlighten. + + Vern + + +File: flex.info, Node: ERASEME57, Next: Is there a repository for flex scanners?, Prev: ERASEME56, Up: FAQ + +ERASEME57 +========= + + + To: "Marty Leisner" <leisner@sdsp.mc.xerox.com> + Subject: Re: flex limitations + In-reply-to: Your message of Sat, 06 Sep 1997 11:27:21 PDT. 
+ Date: Mon, 08 Sep 1997 11:38:08 PDT + From: Vern Paxson <vern> + + > %% + > [a-zA-Z]+ /* skip a line */ + > { printf("got %s\n", yytext); } + > %% + + What version of flex are you using? If I feed this to 2.5.4, it complains: + + "bug.l", line 5: EOF encountered inside an action + "bug.l", line 5: unrecognized rule + "bug.l", line 5: fatal parse error + + Not the world's greatest error message, but it manages to flag the problem. + + (With the introduction of start condition scopes, flex can't accommodate + an action on a separate line, since it's ambiguous with an indented rule.) + + You can get 2.5.4 from ftp.ee.lbl.gov. + + Vern + + +File: flex.info, Node: Is there a repository for flex scanners?, Next: How can I conditionally compile or preprocess my flex input file?, Prev: ERASEME57, Up: FAQ + +Is there a repository for flex scanners? +======================================== + + Not that we know of. You might try asking on comp.compilers. + + +File: flex.info, Node: How can I conditionally compile or preprocess my flex input file?, Next: Where can I find grammars for lex and yacc?, Prev: Is there a repository for flex scanners?, Up: FAQ + +How can I conditionally compile or preprocess my flex input file? +================================================================= + + Flex doesn't have a preprocessor like C does. You might try using +m4, or the C preprocessor plus a sed script to clean up the result. + + +File: flex.info, Node: Where can I find grammars for lex and yacc?, Next: I get an end-of-buffer message for each character scanned., Prev: How can I conditionally compile or preprocess my flex input file?, Up: FAQ + +Where can I find grammars for lex and yacc? +=========================================== + + In the sources for flex and bison. 
+ + +File: flex.info, Node: I get an end-of-buffer message for each character scanned., Next: unnamed-faq-62, Prev: Where can I find grammars for lex and yacc?, Up: FAQ + +I get an end-of-buffer message for each character scanned. +========================================================== + + This will happen if your LexerInput() function returns only one +character at a time, which can happen either if your scanner is +"interactive", or if the streams library on your platform always +returns 1 for yyin->gcount(). + + Solution: override LexerInput() with a version that returns whole +buffers. + + +File: flex.info, Node: unnamed-faq-62, Next: unnamed-faq-63, Prev: I get an end-of-buffer message for each character scanned., Up: FAQ + +unnamed-faq-62 +============== + + + To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE + Subject: Re: Flex maximums + In-reply-to: Your message of Mon, 17 Nov 1997 17:16:06 PST. + Date: Mon, 17 Nov 1997 17:16:15 PST + From: Vern Paxson <vern> + + > I took a quick look into the flex-sources and altered some #defines in + > flexdefs.h: + > + > #define INITIAL_MNS 64000 + > #define MNS_INCREMENT 1024000 + > #define MAXIMUM_MNS 64000 + + The things to fix are to add a couple of zeroes to: + + #define JAMSTATE -32766 /* marks a reference to the state that always jams */ + #define MAXIMUM_MNS 31999 + #define BAD_SUBSCRIPT -32767 + #define MAX_SHORT 32700 + + and, if you get complaints about too many rules, make the following change too: + + #define YY_TRAILING_MASK 0x200000 + #define YY_TRAILING_HEAD_MASK 0x400000 + + - Vern + + +File: flex.info, Node: unnamed-faq-63, Next: unnamed-faq-64, Prev: unnamed-faq-62, Up: FAQ + +unnamed-faq-63 +============== + + + To: jimmey@lexis-nexis.com (Jimmey Todd) + Subject: Re: FLEX question regarding istream vs ifstream + In-reply-to: Your message of Mon, 08 Dec 1997 15:54:15 PST. 
+ Date: Mon, 15 Dec 1997 13:21:35 PST + From: Vern Paxson <vern> + + > stdin_handle = YY_CURRENT_BUFFER; + > ifstream fin( "aFile" ); + > yy_switch_to_buffer( yy_create_buffer( fin, YY_BUF_SIZE ) ); + > + > What I'm wanting to do, is pass the contents of a file thru one set + > of rules and then pass stdin thru another set... It works great if, I + > don't use the C++ classes. But since everything else that I'm doing is + > in C++, I thought I'd be consistent. + > + > The problem is that 'yy_create_buffer' is expecting an istream* as it's + > first argument (as stated in the man page). However, fin is a ifstream + > object. Any ideas on what I might be doing wrong? Any help would be + > appreciated. Thanks!! + + You need to pass &fin, to turn it into an ifstream* instead of an ifstream. + Then its type will be compatible with the expected istream*, because ifstream + is derived from istream. + + Vern + + +File: flex.info, Node: unnamed-faq-64, Next: unnamed-faq-65, Prev: unnamed-faq-63, Up: FAQ + +unnamed-faq-64 +============== + + + To: Enda Fadian <fadiane@piercom.ie> + Subject: Re: Question related to Flex man page? + In-reply-to: Your message of Tue, 16 Dec 1997 15:17:34 PST. + Date: Tue, 16 Dec 1997 14:17:09 PST + From: Vern Paxson <vern> + + > Can you explain to me what is ment by a long-jump in relation to flex? + + Using the longjmp() function while inside yylex() or a routine called by it. + + > what is the flex activation frame. + + Just yylex()'s stack frame. + + > As far as I can see yyrestart will bring me back to the sart of the input + > file and using flex++ isnot really an option! + + No, yyrestart() doesn't imply a rewind, even though its name might sound + like it does. It tells the scanner to flush its internal buffers and + start reading from the given file at its present location. 
+ + Vern + + +File: flex.info, Node: unnamed-faq-65, Next: unnamed-faq-66, Prev: unnamed-faq-64, Up: FAQ + +unnamed-faq-65 +============== + + + To: hassan@larc.info.uqam.ca (Hassan Alaoui) + Subject: Re: Need urgent Help + In-reply-to: Your message of Sat, 20 Dec 1997 19:38:19 PST. + Date: Sun, 21 Dec 1997 21:30:46 PST + From: Vern Paxson <vern> + + > /usr/lib/yaccpar: In function `int yyparse()': + > /usr/lib/yaccpar:184: warning: implicit declaration of function `int yylex(...)' + > + > ld: Undefined symbol + > _yylex + > _yyparse + > _yyin + + This is a known problem with Solaris C++ (and/or Solaris yacc). I believe + the fix is to explicitly insert some 'extern "C"' statements for the + corresponding routines/symbols. + + Vern + + +File: flex.info, Node: unnamed-faq-66, Next: unnamed-faq-67, Prev: unnamed-faq-65, Up: FAQ + +unnamed-faq-66 +============== + + + To: mc0307@mclink.it + Cc: gnu@prep.ai.mit.edu + Subject: Re: [mc0307@mclink.it: Help request] + In-reply-to: Your message of Fri, 12 Dec 1997 17:57:29 PST. + Date: Sun, 21 Dec 1997 22:33:37 PST + From: Vern Paxson <vern> + + > This is my definition for float and integer types: + > . . . + > NZD [1-9] + > ... + > I've tested my program on other lex version (on UNIX Sun Solaris an HP + > UNIX) and it work well, so I think that my definitions are correct. + > There are any differences between Lex and Flex? + + There are indeed differences, as discussed in the man page. The one + you are probably running into is that when flex expands a name definition, + it puts parentheses around the expansion, while lex does not. There's + an example in the man page of how this can lead to different matching. + Flex's behavior complies with the POSIX standard (or at least with the + last POSIX draft I saw). 
+ + Vern + + +File: flex.info, Node: unnamed-faq-67, Next: unnamed-faq-68, Prev: unnamed-faq-66, Up: FAQ + +unnamed-faq-67 +============== + + + To: hassan@larc.info.uqam.ca (Hassan Alaoui) + Subject: Re: Thanks + In-reply-to: Your message of Mon, 22 Dec 1997 16:06:35 PST. + Date: Mon, 22 Dec 1997 14:35:05 PST + From: Vern Paxson <vern> + + > Thank you very much for your help. I compile and link well with C++ while + > declaring 'yylex ...' extern, But a little problem remains. I get a + > segmentation default when executing ( I linked with lfl library) while it + > works well when using LEX instead of flex. Do you have some ideas about the + > reason for this ? + + The one possible reason for this that comes to mind is if you've defined + yytext as "extern char yytext[]" (which is what lex uses) instead of + "extern char *yytext" (which is what flex uses). If it's not that, then + I'm afraid I don't know what the problem might be. + + Vern + + +File: flex.info, Node: unnamed-faq-68, Next: unnamed-faq-69, Prev: unnamed-faq-67, Up: FAQ + +unnamed-faq-68 +============== + + + To: "Bart Niswonger" <NISWONGR@almaden.ibm.com> + Subject: Re: flex 2.5: c++ scanners & start conditions + In-reply-to: Your message of Tue, 06 Jan 1998 10:34:21 PST. + Date: Tue, 06 Jan 1998 19:19:30 PST + From: Vern Paxson <vern> + + > The problem is that when I do this (using %option c++) start + > conditions seem to not apply. + + The BEGIN macro modifies the yy_start variable. For C scanners, this + is a static with scope visible through the whole file. For C++ scanners, + it's a member variable, so it only has visible scope within a member + function. Your lexbegin() routine is not a member function when you + build a C++ scanner, so it's not modifying the correct yy_start. 
The + diagnostic that indicates this is that you found you needed to add + a declaration of yy_start in order to get your scanner to compile when + using C++; instead, the correct fix is to make lexbegin() a member + function (by deriving from yyFlexLexer). + + Vern + + +File: flex.info, Node: unnamed-faq-69, Next: unnamed-faq-70, Prev: unnamed-faq-68, Up: FAQ + +unnamed-faq-69 +============== + + + To: "Boris Zinin" <boris@ippe.rssi.ru> + Subject: Re: current position in flex buffer + In-reply-to: Your message of Mon, 12 Jan 1998 18:58:23 PST. + Date: Mon, 12 Jan 1998 12:03:15 PST + From: Vern Paxson <vern> + + > The problem is how to determine the current position in flex active + > buffer when a rule is matched.... + + You will need to keep track of this explicitly, such as by redefining + YY_USER_ACTION to count the number of characters matched. + + The latest flex release, by the way, is 2.5.4, available from ftp.ee.lbl.gov. + + Vern + + +File: flex.info, Node: unnamed-faq-70, Next: unnamed-faq-71, Prev: unnamed-faq-69, Up: FAQ + +unnamed-faq-70 +============== + + + To: Bik.Dhaliwal@bis.org + Subject: Re: Flex question + In-reply-to: Your message of Mon, 26 Jan 1998 13:05:35 PST. + Date: Tue, 27 Jan 1998 22:41:52 PST + From: Vern Paxson <vern> + + > That requirement involves knowing + > the character position at which a particular token was matched + > in the lexer. + + The way you have to do this is by explicitly keeping track of where + you are in the file, by counting the number of characters scanned + for each token (available in yyleng). It may prove convenient to + do this by redefining YY_USER_ACTION, as described in the manual. + + Vern + + +File: flex.info, Node: unnamed-faq-71, Next: unnamed-faq-72, Prev: unnamed-faq-70, Up: FAQ + +unnamed-faq-71 +============== + + + To: Vladimir Alexiev <vladimir@cs.ualberta.ca> + Subject: Re: flex: how to control start condition from parser? + In-reply-to: Your message of Mon, 26 Jan 1998 05:50:16 PST. 
+ Date: Tue, 27 Jan 1998 22:45:37 PST + From: Vern Paxson <vern> + + > It seems useful for the parser to be able to tell the lexer about such + > context dependencies, because then they don't have to be limited to + > local or sequential context. + + One way to do this is to have the parser call a stub routine that's + included in the scanner's .l file, and consequently that has access to + BEGIN. The only ugliness is that the parser can't pass in the state + it wants, because those aren't visible - but if you don't have many + such states, then using a different set of names doesn't seem like + too much of a burden. + + While generating a .h file like you suggest is certainly cleaner, + flex development has come to a virtual stand-still :-(, so a workaround + like the above is much more pragmatic than waiting for a new feature. + + Vern + + +File: flex.info, Node: unnamed-faq-72, Next: unnamed-faq-73, Prev: unnamed-faq-71, Up: FAQ + +unnamed-faq-72 +============== + + + To: Barbara Denny <denny@3com.com> + Subject: Re: freebsd flex bug? + In-reply-to: Your message of Fri, 30 Jan 1998 12:00:43 PST. + Date: Fri, 30 Jan 1998 12:42:32 PST + From: Vern Paxson <vern> + + > lex.yy.c:1996: parse error before `=' + + This is the key, identifying this error. (It may help to pinpoint + it by using flex -L, so it doesn't generate #line directives in its + output.) I will bet you heavy money that you have a start condition + name that is also a variable name, or something like that; flex spits + out #define's for each start condition name, mapping them to a number, + so you can wind up with: + + %x foo + %% + ... + %% + void bar() + { + int foo = 3; + } + + and the penultimate will turn into "int 1 = 3" after C preprocessing, + since flex will put "#define foo 1" in the generated scanner. 
+ + Vern + + +File: flex.info, Node: unnamed-faq-73, Next: unnamed-faq-74, Prev: unnamed-faq-72, Up: FAQ + +unnamed-faq-73 +============== + + + To: Maurice Petrie <mpetrie@infoscigroup.com> + Subject: Re: Lost flex .l file + In-reply-to: Your message of Mon, 02 Feb 1998 14:10:01 PST. + Date: Mon, 02 Feb 1998 11:15:12 PST + From: Vern Paxson <vern> + + > I am curious as to + > whether there is a simple way to backtrack from the generated source to + > reproduce the lost list of tokens we are searching on. + + In theory, it's straight-forward to go from the DFA representation + back to a regular-expression representation - the two are isomorphic. + In practice, a huge headache, because you have to unpack all the tables + back into a single DFA representation, and then write a program to munch + on that and translate it into an RE. + + Sorry for the less-than-happy news ... + + Vern + + +File: flex.info, Node: unnamed-faq-74, Next: unnamed-faq-75, Prev: unnamed-faq-73, Up: FAQ + +unnamed-faq-74 +============== + + + To: jimmey@lexis-nexis.com (Jimmey Todd) + Subject: Re: Flex performance question + In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST. + Date: Thu, 19 Feb 1998 08:48:51 PST + From: Vern Paxson <vern> + + > What I have found, is that the smaller the data chunk, the faster the + > program executes. This is the opposite of what I expected. Should this be + > happening this way? + + This is exactly what will happen if your input file has embedded NULs. + From the man page: + + A final note: flex is slow when matching NUL's, particularly + when a token contains multiple NUL's. It's best to write + rules which match short amounts of text if it's anticipated + that the text will often include NUL's. + + So that's the first thing to look for. 
+ + Vern + + +File: flex.info, Node: unnamed-faq-75, Next: unnamed-faq-76, Prev: unnamed-faq-74, Up: FAQ + +unnamed-faq-75 +============== + + + To: jimmey@lexis-nexis.com (Jimmey Todd) + Subject: Re: Flex performance question + In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST. + Date: Thu, 19 Feb 1998 15:42:25 PST + From: Vern Paxson <vern> + + So there are several problems. + + First, to go fast, you want to match as much text as possible, which + your scanners don't in the case that what they're scanning is *not* + a <RN> tag. So you want a rule like: + + [^<]+ + + Second, C++ scanners are particularly slow if they're interactive, + which they are by default. Using -B speeds it up by a factor of 3-4 + on my workstation. + + Third, C++ scanners that use the istream interface are slow, because + of how poorly implemented istream's are. I built two versions of + the following scanner: + + %% + .*\n + .* + %% + + and the C version inhales a 2.5MB file on my workstation in 0.8 seconds. + The C++ istream version, using -B, takes 3.8 seconds. + + Vern + |