Diffstat (limited to 'doc/flex.info-5')
-rw-r--r-- | doc/flex.info-5 | 1330 |
1 files changed, 0 insertions, 1330 deletions
diff --git a/doc/flex.info-5 b/doc/flex.info-5 deleted file mode 100644 index 8935ccf..0000000 --- a/doc/flex.info-5 +++ /dev/null @@ -1,1330 +0,0 @@ -This is flex.info, produced by makeinfo version 4.5 from flex.texi. - -INFO-DIR-SECTION Programming -START-INFO-DIR-ENTRY -* flex: (flex). Fast lexical analyzer generator (lex replacement). -END-INFO-DIR-ENTRY - - - The flex manual is placed under the same licensing conditions as the -rest of flex: - - Copyright (C) 1990, 1997 The Regents of the University of California. -All rights reserved. - - This code is derived from software contributed to Berkeley by Vern -Paxson. - - The United States Government has rights in this work pursuant to -contract no. DE-AC03-76SF00098 between the United States Department of -Energy and the University of California. - - Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are -met: - - 1. Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - 2. Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the - distribution. - Neither the name of the University nor the names of its contributors -may be used to endorse or promote products derived from this software -without specific prior written permission. - - THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED -WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF -MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. - -File: flex.info, Node: How do I match any string not matched in the preceding rules?, Next: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Prev: How can I build a two-pass scanner?, Up: FAQ - -How do I match any string not matched in the preceding rules? 
-============================================================= - - One way to assign precedence, is to place the more specific rules -first. If two rules would match the same input (same sequence of -characters) then the first rule listed in the `flex' input wins. e.g., - - - %% - foo[a-zA-Z_]+ return FOO_ID; - bar[a-zA-Z_]+ return BAR_ID; - [a-zA-Z_]+ return GENERIC_ID; - - Note that the rule `[a-zA-Z_]+' must come *after* the others. It -will match the same amount of text as the more specific rules, and in -that case the `flex' scanner will pick the first rule listed in your -scanner as the one to match. - - -File: flex.info, Node: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Next: Is there a way to make flex treat NULL like a regular character?, Prev: How do I match any string not matched in the preceding rules?, Up: FAQ - -I am trying to port code from AT&T lex that uses yysptr and yysbuf. -=================================================================== - - Those are internal variables pointing into the AT&T scanner's input -buffer. I imagine they're being manipulated in user versions of the -`input()' and `unput()' functions. If so, what you need to do is -analyze those functions to figure out what they're doing, and then -replace `input()' with an appropriate definition of `YY_INPUT'. You -shouldn't need to (and must not) replace `flex''s `unput()' function. - - -File: flex.info, Node: Is there a way to make flex treat NULL like a regular character?, Next: Whenever flex can not match the input it says "flex scanner jammed"., Prev: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Up: FAQ - -Is there a way to make flex treat NULL like a regular character? -================================================================ - - Yes, `\0' and `\x00' should both do the trick. Perhaps you have an -ancient version of `flex'. The latest release is version 2.5.33. 
- - -File: flex.info, Node: Whenever flex can not match the input it says "flex scanner jammed"., Next: Why doesnt flex have non-greedy operators like perl does?, Prev: Is there a way to make flex treat NULL like a regular character?, Up: FAQ - -Whenever flex can not match the input it says "flex scanner jammed". -==================================================================== - - You need to add a rule that matches the otherwise-unmatched text. -e.g., - - - %option yylineno - %% - [[a bunch of rules here]] - - . printf("bad input character '%s' at line %d\n", yytext, yylineno); - - See `%option default' for more information. - - -File: flex.info, Node: Why doesnt flex have non-greedy operators like perl does?, Next: Memory leak - 16386 bytes allocated by malloc., Prev: Whenever flex can not match the input it says "flex scanner jammed"., Up: FAQ - -Why doesn't flex have non-greedy operators like perl does? -========================================================== - - A DFA can do a non-greedy match by stopping the first time it enters -an accepting state, instead of consuming input until it determines that -no further matching is possible (a "jam" state). This is actually -easier to implement than longest leftmost match (which flex does). - - But it's also much less useful than longest leftmost match. In -general, when you find yourself wishing for non-greedy matching, that's -usually a sign that you're trying to make the scanner do some parsing. -That's generally the wrong approach, since it lacks the power to do a -decent job. Better is to either introduce a separate parser, or to -split the scanner into multiple scanners using (exclusive) start -conditions. - - You might have a separate start state once you've seen the `BEGIN'. -In that state, you might then have a regex that will match `END' (to -kick you out of the state), and perhaps `(.|\n)' to get a single -character within the chunk ... 
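The start-state idea above can be pictured with a small hand-written C loop. This is only a sketch of the control flow, not flex output and not flex's API: `scan_chunks`, the `INITIAL`/`CHUNK` enum, and the literal markers "BEGIN" and "END" are hypothetical names chosen for illustration. One state looks for "BEGIN"; the other looks for "END" and otherwise takes a single character at a time, which is the role the `(.|\n)' rule plays.

```c
#include <stddef.h>
#include <string.h>

/* Two states standing in for two (exclusive) start conditions. */
enum state { INITIAL, CHUNK };

/* Hypothetical helper, not part of flex: copies the text found between
 * "BEGIN" and "END" markers in `in` into `out` (capacity `cap`).
 * Returns 0 on success, -1 if a chunk is still open at end of input
 * or if `cap` is 0. Text that does not fit in `out` is dropped. */
int scan_chunks(const char *in, char *out, size_t cap)
{
    enum state st = INITIAL;
    size_t n = 0;

    if (cap == 0)
        return -1;

    while (*in != '\0') {
        if (st == INITIAL && strncmp(in, "BEGIN", 5) == 0) {
            st = CHUNK;            /* comparable to BEGIN(CHUNK) in an action */
            in += 5;
        } else if (st == CHUNK && strncmp(in, "END", 3) == 0) {
            st = INITIAL;          /* "END" kicks us out of the state */
            in += 3;
        } else {
            /* The single-character case: consume exactly one character,
             * keeping it only while inside a chunk. */
            if (st == CHUNK && n + 1 < cap)
                out[n++] = *in;
            in++;
        }
    }
    out[n] = '\0';

    /* An unterminated chunk is detected at one obvious place. */
    return st == CHUNK ? -1 : 0;
}
```

Calling `scan_chunks("xx BEGIN a\nb END yy", buf, sizeof buf)` with a `char buf[64]` collects the text between the markers, and an input such as "BEGIN open" makes the function return -1 because the chunk never closes.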
- - This approach also has much better error-reporting properties. - - -File: flex.info, Node: Memory leak - 16386 bytes allocated by malloc., Next: How do I track the byte offset for lseek()?, Prev: Why doesnt flex have non-greedy operators like perl does?, Up: FAQ - -Memory leak - 16386 bytes allocated by malloc. -============================================== - - UPDATED 2002-07-10: As of `flex' version 2.5.9, this leak means that -you did not call `yylex_destroy()'. If you are using an earlier version -of `flex', then read on. - - The leak is about 16426 bytes. That is, (8192 * 2 + 2) for the -read-buffer, and about 40 for `struct yy_buffer_state' (depending upon -alignment). The leak is in the non-reentrant C scanner only (NOT in the -reentrant scanner, NOT in the C++ scanner). Since `flex' doesn't know -when you are done, the buffer is never freed. - - However, the leak won't multiply since the buffer is reused no -matter how many times you call `yylex()'. - - If you want to reclaim the memory when you are completely done -scanning, then you might try this: - - - /* For non-reentrant C scanner only. */ - yy_delete_buffer(YY_CURRENT_BUFFER); - yy_init = 1; - - Note: `yy_init' is an "internal variable", and hasn't been tested in -this situation. It is possible that some other globals may need -resetting as well. - - -File: flex.info, Node: How do I track the byte offset for lseek()?, Next: How do I use my own I/O classes in a C++ scanner?, Prev: Memory leak - 16386 bytes allocated by malloc., Up: FAQ - -How do I track the byte offset for lseek()? -=========================================== - - - > We thought that it would be possible to have this number through the - > evaluation of the following expression: - > - > seek_position = (no_buffers)*YY_READ_BUF_SIZE + yy_c_buf_p - YY_CURRENT_BUFFER->yy_ch_buf - - While this is the right idea, it has two problems. 
The first is that -it's possible that `flex' will request less than `YY_READ_BUF_SIZE' -during an invocation of `YY_INPUT' (or that your input source will -return less even though `YY_READ_BUF_SIZE' bytes were requested). The -second problem is that when refilling its internal buffer, `flex' keeps -some characters from the previous buffer (because usually it's in the -middle of a match, and needs those characters to construct `yytext' for -the match once it's done). Because of this, `yy_c_buf_p - -YY_CURRENT_BUFFER->yy_ch_buf' won't be exactly the number of characters -already read from the current buffer. - - An alternative solution is to count the number of characters you've -matched since starting to scan. This can be done by using -`YY_USER_ACTION'. For example, - - - #define YY_USER_ACTION num_chars += yyleng; - - (You need to be careful to update your bookkeeping if you use -`yymore()', `yyless()', `unput()', or `input()'.) - - -File: flex.info, Node: How do I use my own I/O classes in a C++ scanner?, Next: How do I skip as many chars as possible?, Prev: How do I track the byte offset for lseek()?, Up: FAQ - -How do I use my own I/O classes in a C++ scanner? -================================================= - - When the flex C++ scanning class rewrite finally happens, then this -sort of thing should become much easier. - - You can do this by passing the various functions (such as -`LexerInput()' and `LexerOutput()') NULL `iostream*''s, and then -dealing with your own I/O classes surreptitiously (i.e., stashing them -in special member variables). This works because the only assumption -about the lexer regarding what's done with the iostream's is that -they're ultimately passed to `LexerInput()' and `LexerOutput()', which -then do whatever is necessary with them. - - -File: flex.info, Node: How do I skip as many chars as possible?, Next: deleteme00, Prev: How do I use my own I/O classes in a C++ scanner?, Up: FAQ - -How do I skip as many chars as possible? 
-======================================== - - How do I skip as many chars as possible - without interfering with -the other patterns? - - In the example below, we want to skip over characters until we see -the phrase "endskip". The following will _NOT_ work correctly (do you -see why not?) - - - /* INCORRECT SCANNER */ - %x SKIP - %% - <INITIAL>startskip BEGIN(SKIP); - ... - <SKIP>"endskip" BEGIN(INITIAL); - <SKIP>.* ; - - The problem is that the pattern .* will eat up the word "endskip." -The simplest (but slow) fix is: - - - <SKIP>"endskip" BEGIN(INITIAL); - <SKIP>. ; - - The fix involves making the second rule match more, without making -it match "endskip" plus something else. So for example: - - - <SKIP>"endskip" BEGIN(INITIAL); - <SKIP>[^e]+ ; - <SKIP>. ;/* so you eat up e's, too */ - - -File: flex.info, Node: deleteme00, Next: Are certain equivalent patterns faster than others?, Prev: How do I skip as many chars as possible?, Up: FAQ - -deleteme00 -========== - - - QUESTION: - When was flex born? - - Vern Paxson took over - the Software Tools lex project from Jef Poskanzer in 1982. At that point it - was written in Ratfor. Around 1987 or so, Paxson translated it into C, and - a legend was born :-). - - -File: flex.info, Node: Are certain equivalent patterns faster than others?, Next: Is backing up a big deal?, Prev: deleteme00, Up: FAQ - -Are certain equivalent patterns faster than others? -=================================================== - - - To: Adoram Rogel <adoram@orna.hybridge.com> - Subject: Re: Flex 2.5.2 performance questions - In-reply-to: Your message of Wed, 18 Sep 96 11:12:17 EDT. - Date: Wed, 18 Sep 96 10:51:02 PDT - From: Vern Paxson <vern> - - [Note, the most recent flex release is 2.5.4, which you can get from - ftp.ee.lbl.gov. It has bug fixes over 2.5.2 and 2.5.3.] - - > 1. Using the pattern - > ([Ff](oot)?)?[Nn](ote)?(\.)? 
- > instead of - > (((F|f)oot(N|n)ote)|((N|n)ote)|((N|n)\.)|((F|f)(N|n)(\.))) - > (in a very complicated flex program) caused the program to slow from - > 300K+/min to 100K/min (no other changes were done). - - These two are not equivalent. For example, the first can match "footnote." - but the second can only match "footnote". This is almost certainly the - cause in the discrepancy - the slower scanner run is matching more tokens, - and/or having to do more backing up. - - > 2. Which of these two are better: [Ff]oot or (F|f)oot ? - - From a performance point of view, they're equivalent (modulo presumably - minor effects such as memory cache hit rates; and the presence of trailing - context, see below). From a space point of view, the first is slightly - preferable. - - > 3. I have a pattern that look like this: - > pats {p1}|{p2}|{p3}|...|{p50} (50 patterns ORd) - > - > running yet another complicated program that includes the following rule: - > <snext>{and}/{no4}{bb}{pats} - > - > gets me to "too complicated - over 32,000 states"... - - I can't tell from this example whether the trailing context is variable-length - or fixed-length (it could be the latter if {and} is fixed-length). If it's - variable length, which flex -p will tell you, then this reflects a basic - performance problem, and if you can eliminate it by restructuring your - scanner, you will see significant improvement. - - > so I divided {pats} to {pats1}, {pats2},..., {pats5} each consists of about - > 10 patterns and changed the rule to be 5 rules. - > This did compile, but what is the rule of thumb here ? - - The rule is to avoid trailing context other than fixed-length, in which for - a/b, either the 'a' pattern or the 'b' pattern have a fixed length. Use - of the '|' operator automatically makes the pattern variable length, so in - this case '[Ff]oot' is preferred to '(F|f)oot'. - - > 4. I changed a rule that looked like this: - > <snext8>{and}{bb}/{ROMAN}[^A-Za-z] { BEGIN... 
- > - > to the next 2 rules: - > <snext8>{and}{bb}/{ROMAN}[A-Za-z] { ECHO;} - > <snext8>{and}{bb}/{ROMAN} { BEGIN... - > - > Again, I understand the using [^...] will cause a great performance loss - - Actually, it doesn't cause any sort of performance loss. It's a surprising - fact about regular expressions that they always match in linear time - regardless of how complex they are. - - > but are there any specific rules about it ? - - See the "Performance Considerations" section of the man page, and also - the example in MISC/fastwc/. - - Vern - - -File: flex.info, Node: Is backing up a big deal?, Next: Can I fake multi-byte character support?, Prev: Are certain equivalent patterns faster than others?, Up: FAQ - -Is backing up a big deal? -========================= - - - To: Adoram Rogel <adoram@hybridge.com> - Subject: Re: Flex 2.5.2 performance questions - In-reply-to: Your message of Thu, 19 Sep 96 10:16:04 EDT. - Date: Thu, 19 Sep 96 09:58:00 PDT - From: Vern Paxson <vern> - - > a lot about the backing up problem. - > I believe that there lies my biggest problem, and I'll try to improve - > it. - - Since you have variable trailing context, this is a bigger performance - problem. Fixing it is usually easier than fixing backing up, which in a - complicated scanner (yours seems to fit the bill) can be extremely - difficult to do correctly. - - You also don't mention what flags you are using for your scanner. - -f makes a large speed difference, and -Cfe buys you nearly as much - speed but the resulting scanner is considerably smaller. - - > I have an | operator in {and} and in {pats} so both of them are variable - > length. - - -p should have reported this. - - > Is changing one of them to fixed-length is enough ? - - Yes. - - > Is it possible to change the 32,000 states limit ? - - Yes. I've appended instructions on how. 
Before you make this change, - though, you should think about whether there are ways to fundamentally - simplify your scanner - those are certainly preferable! - - Vern - - To increase the 32K limit (on a machine with 32 bit integers), you increase - the magnitude of the following in flexdef.h: - - #define JAMSTATE -32766 /* marks a reference to the state that always jams */ - #define MAXIMUM_MNS 31999 - #define BAD_SUBSCRIPT -32767 - #define MAX_SHORT 32700 - - Adding a 0 or two after each should do the trick. - - -File: flex.info, Node: Can I fake multi-byte character support?, Next: deleteme01, Prev: Is backing up a big deal?, Up: FAQ - -Can I fake multi-byte character support? -======================================== - - - To: Heeman_Lee@hp.com - Subject: Re: flex - multi-byte support? - In-reply-to: Your message of Thu, 03 Oct 1996 17:24:04 PDT. - Date: Fri, 04 Oct 1996 11:42:18 PDT - From: Vern Paxson <vern> - - > I assume as long as my *.l file defines the - > range of expected character code values (in octal format), flex will - > scan the file and read multi-byte characters correctly. But I have no - > confidence in this assumption. - - Your lack of confidence is justified - this won't work. - - Flex has in it a widespread assumption that the input is processed - one byte at a time. Fixing this is on the to-do list, but is involved, - so it won't happen any time soon. In the interim, the best I can suggest - (unless you want to try fixing it yourself) is to write your rules in - terms of pairs of bytes, using definitions in the first section: - - X \xfe\xc2 - ... - %% - foo{X}bar found_foo_fe_c2_bar(); - - etc. Definitely a pain - sorry about that. - - By the way, the email address you used for me is ancient, indicating you - have a very old version of flex. You can get the most recent, 2.5.4, from - ftp.ee.lbl.gov. 
- - Vern - - -File: flex.info, Node: deleteme01, Next: Can you discuss some flex internals?, Prev: Can I fake multi-byte character support?, Up: FAQ - -deleteme01 -========== - - - To: moleary@primus.com - Subject: Re: Flex / Unicode compatibility question - In-reply-to: Your message of Tue, 22 Oct 1996 10:15:42 PDT. - Date: Tue, 22 Oct 1996 11:06:13 PDT - From: Vern Paxson <vern> - - Unfortunately flex at the moment has a widespread assumption within it - that characters are processed 8 bits at a time. I don't see any easy - fix for this (other than writing your rules in terms of double characters - - a pain). I also don't know of a wider lex, though you might try surfing - the Plan 9 stuff because I know it's a Unicode system, and also the PCCT - toolkit (try searching say Alta Vista for "Purdue Compiler Construction - Toolkit"). - - Fixing flex to handle wider characters is on the long-term to-do list. - But since flex is a strictly spare-time project these days, this probably - won't happen for quite a while, unless someone else does it first. - - Vern - - -File: flex.info, Node: Can you discuss some flex internals?, Next: unput() messes up yy_at_bol, Prev: deleteme01, Up: FAQ - -Can you discuss some flex internals? -==================================== - - - To: Johan Linde <jl@theophys.kth.se> - Subject: Re: translation of flex - In-reply-to: Your message of Sun, 10 Nov 1996 09:16:36 PST. - Date: Mon, 11 Nov 1996 10:33:50 PST - From: Vern Paxson <vern> - - > I'm working for the Swedish team translating GNU program, and I'm currently - > working with flex. I have a few questions about some of the messages which - > I hope you can answer. - - All of the things you're wondering about, by the way, concerning flex - internals - probably the only person who understands what they mean in - English is me! So I wouldn't worry too much about getting them right. - That said ... - - > #: main.c:545 - > msgid " %d protos created\n" - > - > Does proto mean prototype? 
- - Yes - prototypes of state compression tables. - - > #: main.c:539 - > msgid " %d/%d (peak %d) template nxt-chk entries created\n" - > - > Here I'm mainly puzzled by 'nxt-chk'. I guess it means 'next-check'. (?) - > However, 'template next-check entries' doesn't make much sense to me. To be - > able to find a good translation I need to know a little bit more about it. - - There is a scheme in the Aho/Sethi/Ullman compiler book for compressing - scanner tables. It involves creating two pairs of tables. The first has - "base" and "default" entries, the second has "next" and "check" entries. - The "base" entry is indexed by the current state and yields an index into - the next/check table. The "default" entry gives what to do if the state - transition isn't found in next/check. The "next" entry gives the next - state to enter, but only if the "check" entry verifies that this entry is - correct for the current state. Flex creates templates of series of - next/check entries and then encodes differences from these templates as a - way to compress the tables. - - > #: main.c:533 - > msgid " %d/%d base-def entries created\n" - > - > The same problem here for 'base-def'. - - See above. - - Vern - - -File: flex.info, Node: unput() messes up yy_at_bol, Next: The | operator is not doing what I want, Prev: Can you discuss some flex internals?, Up: FAQ - -unput() messes up yy_at_bol -=========================== - - - To: Xinying Li <xli@npac.syr.edu> - Subject: Re: FLEX ? - In-reply-to: Your message of Wed, 13 Nov 1996 17:28:38 PST. - Date: Wed, 13 Nov 1996 19:51:54 PST - From: Vern Paxson <vern> - - > "unput()" them to input flow, question occurs. If I do this after I scan - > a carriage, the variable "YY_CURRENT_BUFFER->yy_at_bol" is changed. That - > means the carriage flag has gone. - - You can control this by calling yy_set_bol(). It's described in the manual. 
- - > And if in pre-reading it goes to the end of file, is anything done - > to control the end of current buffer and end of file? - - No, there's no way to put back an end-of-file. - - > By the way I am using flex 2.5.2 and using the "-l". - - The latest release is 2.5.4, by the way. It fixes some bugs in 2.5.2 and - 2.5.3. You can get it from ftp.ee.lbl.gov. - - Vern - - -File: flex.info, Node: The | operator is not doing what I want, Next: Why can't flex understand this variable trailing context pattern?, Prev: unput() messes up yy_at_bol, Up: FAQ - -The | operator is not doing what I want -======================================= - - - To: Alain.ISSARD@st.com - Subject: Re: Start condition with FLEX - In-reply-to: Your message of Mon, 18 Nov 1996 09:45:02 PST. - Date: Mon, 18 Nov 1996 10:41:34 PST - From: Vern Paxson <vern> - - > I am not able to use the start condition scope and to use the | (OR) with - > rules having start conditions. - - The problem is that if you use '|' as a regular expression operator, for - example "a|b" meaning "match either 'a' or 'b'", then it must *not* have - any blanks around it. If you instead want the special '|' *action* (which - from your scanner appears to be the case), which is a way of giving two - different rules the same action: - - foo | - bar matched_foo_or_bar(); - - then '|' *must* be separated from the first rule by whitespace and *must* - be followed by a new line. You *cannot* write it as: - - foo | bar matched_foo_or_bar(); - - even though you might think you could because yacc supports this syntax. - The reason for this unfortunate incompatibility is historical, but it's - unlikely to be changed. - - Your problems with start condition scope are simply due to syntax errors - from your use of '|' later confusing flex. - - Let me know if you still have problems. 
- - Vern - - -File: flex.info, Node: Why can't flex understand this variable trailing context pattern?, Next: The ^ operator isn't working, Prev: The | operator is not doing what I want, Up: FAQ - -Why can't flex understand this variable trailing context pattern? -================================================================= - - - To: Gregory Margo <gmargo@newton.vip.best.com> - Subject: Re: flex-2.5.3 bug report - In-reply-to: Your message of Sat, 23 Nov 1996 16:50:09 PST. - Date: Sat, 23 Nov 1996 17:07:32 PST - From: Vern Paxson <vern> - - > Enclosed is a lex file that "real" lex will process, but I cannot get - > flex to process it. Could you try it and maybe point me in the right direction? - - Your problem is that some of the definitions in the scanner use the '/' - trailing context operator, and have it enclosed in ()'s. Flex does not - allow this operator to be enclosed in ()'s because doing so allows undefined - regular expressions such as "(a/b)+". So the solution is to remove the - parentheses. Note that you must also be building the scanner with the -l - option for AT&T lex compatibility. Without this option, flex automatically - encloses the definitions in parentheses. - - Vern - - -File: flex.info, Node: The ^ operator isn't working, Next: Trailing context is getting confused with trailing optional patterns, Prev: Why can't flex understand this variable trailing context pattern?, Up: FAQ - -The ^ operator isn't working -============================ - - - To: Thomas Hadig <hadig@toots.physik.rwth-aachen.de> - Subject: Re: Flex Bug ? - In-reply-to: Your message of Tue, 26 Nov 1996 14:35:01 PST. - Date: Tue, 26 Nov 1996 11:15:05 PST - From: Vern Paxson <vern> - - > In my lexer code, i have the line : - > ^\*.* { } - > - > Thus all lines starting with an astrix (*) are comment lines. - > This does not work ! - - I can't get this problem to reproduce - it works fine for me. 
Note - though that if what you have is slightly different: - - COMMENT ^\*.* - %% - {COMMENT} { } - - then it won't work, because flex pushes back macro definitions enclosed - in ()'s, so the rule becomes - - (^\*.*) { } - - and now that the '^' operator is not at the immediate beginning of the - line, it's interpreted as just a regular character. You can avoid this - behavior by using the "-l" lex-compatibility flag, or "%option lex-compat". - - Vern - - -File: flex.info, Node: Trailing context is getting confused with trailing optional patterns, Next: Is flex GNU or not?, Prev: The ^ operator isn't working, Up: FAQ - -Trailing context is getting confused with trailing optional patterns -==================================================================== - - - To: Adoram Rogel <adoram@hybridge.com> - Subject: Re: Flex 2.5.4 BOF ??? - In-reply-to: Your message of Tue, 26 Nov 1996 16:10:41 PST. - Date: Wed, 27 Nov 1996 10:56:25 PST - From: Vern Paxson <vern> - - > Organization(s)?/[a-z] - > - > This matched "Organizations" (looking in debug mode, the trailing s - > was matched with trailing context instead of the optional (s) in the - > end of the word. - - That should only happen with lex. Flex can properly match this pattern. - (That might be what you're saying, I'm just not sure.) - - > Is there a way to avoid this dangerous trailing context problem ? - - Unfortunately, there's no easy way. On the other hand, I don't see why - it should be a problem. Lex's matching is clearly wrong, and I'd hope - that usually the intent remains the same as expressed with the pattern, - so flex's matching will be correct. - - Vern - - -File: flex.info, Node: Is flex GNU or not?, Next: ERASEME53, Prev: Trailing context is getting confused with trailing optional patterns, Up: FAQ - -Is flex GNU or not? -=================== - - - To: Cameron MacKinnon <mackin@interlog.com> - Subject: Re: Flex documentation bug - In-reply-to: Your message of Mon, 02 Dec 1996 00:07:08 PST. 
- Date: Sun, 01 Dec 1996 22:29:39 PST - From: Vern Paxson <vern> - - > I'm not sure how or where to submit bug reports (documentation or - > otherwise) for the GNU project stuff ... - - Well, strictly speaking flex isn't part of the GNU project. They just - distribute it because no one's written a decent GPL'd lex replacement. - So you should send bugs directly to me. Those sent to the GNU folks - sometimes find their way to me, but some may drop between the cracks. - - > In GNU Info, under the section 'Start Conditions', and also in the man - > page (mine's dated April '95) is a nice little snippet showing how to - > parse C quoted strings into a buffer, defined to be MAX_STR_CONST in - > size. Unfortunately, no overflow checking is ever done ... - - This is already mentioned in the manual: - - Finally, here's an example of how to match C-style quoted - strings using exclusive start conditions, including expanded - escape sequences (but not including checking for a string - that's too long): - - The reason for not doing the overflow checking is that it will needlessly - clutter up an example whose main purpose is just to demonstrate how to - use flex. - - The latest release is 2.5.4, by the way, available from ftp.ee.lbl.gov. - - Vern - - -File: flex.info, Node: ERASEME53, Next: I need to scan if-then-else blocks and while loops, Prev: Is flex GNU or not?, Up: FAQ - -ERASEME53 -========= - - - To: tsv@cs.UManitoba.CA - Subject: Re: Flex (reg).. - In-reply-to: Your message of Thu, 06 Mar 1997 23:50:16 PST. - Date: Thu, 06 Mar 1997 15:54:19 PST - From: Vern Paxson <vern> - - > [:alpha:] ([:alnum:] | \\_)* - - If your rule really has embedded blanks as shown above, then it won't - work, as the first blank delimits the rule from the action. (It wouldn't - even compile ...) You need instead: - - [:alpha:]([:alnum:]|\\_)* - - and that should work fine - there's no restriction on what can go inside - of ()'s except for the trailing context operator, '/'. 
- - Vern - - -File: flex.info, Node: I need to scan if-then-else blocks and while loops, Next: ERASEME55, Prev: ERASEME53, Up: FAQ - -I need to scan if-then-else blocks and while loops -================================================== - - - To: "Mike Stolnicki" <mstolnic@ford.com> - Subject: Re: FLEX help - In-reply-to: Your message of Fri, 30 May 1997 13:33:27 PDT. - Date: Fri, 30 May 1997 10:46:35 PDT - From: Vern Paxson <vern> - - > We'd like to add "if-then-else", "while", and "for" statements to our - > language ... - > We've investigated many possible solutions. The one solution that seems - > the most reasonable involves knowing the position of a TOKEN in yyin. - - I strongly advise you to instead build a parse tree (abstract syntax tree) - and loop over that instead. You'll find this has major benefits in keeping - your interpreter simple and extensible. - - That said, the functionality you mention for get_position and set_position - have been on the to-do list for a while. As flex is a purely spare-time - project for me, no guarantees when this will be added (in particular, it - for sure won't be for many months to come). - - Vern - - -File: flex.info, Node: ERASEME55, Next: ERASEME56, Prev: I need to scan if-then-else blocks and while loops, Up: FAQ - -ERASEME55 -========= - - - To: Colin Paul Adams <colin@colina.demon.co.uk> - Subject: Re: Flex C++ classes and Bison - In-reply-to: Your message of 09 Aug 1997 17:11:41 PDT. - Date: Fri, 15 Aug 1997 10:48:19 PDT - From: Vern Paxson <vern> - - > #define YY_DECL int yylex (YYSTYPE *lvalp, struct parser_control - > *parm) - > - > I have been trying to get this to work as a C++ scanner, but it does - > not appear to be possible (warning that it matches no declarations in - > yyFlexLexer, or something like that). - > - > Is this supposed to be possible, or is it being worked on (I DID - > notice the comment that scanner classes are still experimental, so I'm - > not too hopeful)? 
- - What you need to do is derive a subclass from yyFlexLexer that provides - the above yylex() method, squirrels away lvalp and parm into member - variables, and then invokes yyFlexLexer::yylex() to do the regular scanning. - - Vern - - -File: flex.info, Node: ERASEME56, Next: ERASEME57, Prev: ERASEME55, Up: FAQ - -ERASEME56 -========= - - - To: Mikael.Latvala@lmf.ericsson.se - Subject: Re: Possible mistake in Flex v2.5 document - In-reply-to: Your message of Fri, 05 Sep 1997 16:07:24 PDT. - Date: Fri, 05 Sep 1997 10:01:54 PDT - From: Vern Paxson <vern> - - > In that example you show how to count comment lines when using - > C style /* ... */ comments. My question is, shouldn't you take into - > account a scenario where end of a comment marker occurs inside - > character or string literals? - - The scanner certainly needs to also scan character and string literals. - However it does that (there's an example in the man page for strings), the - lexer will recognize the beginning of the literal before it runs across the - embedded "/*". Consequently, it will finish scanning the literal before it - even considers the possibility of matching "/*". - - Example: - - '([^']*|{ESCAPE_SEQUENCE})' - - will match all the text between the ''s (inclusive). So the lexer - considers this as a token beginning at the first ', and doesn't even - attempt to match other tokens inside it. - - I think this subtlety is not worth putting in the manual, as I suspect - it would confuse more people than it would enlighten. - - Vern - - -File: flex.info, Node: ERASEME57, Next: Is there a repository for flex scanners?, Prev: ERASEME56, Up: FAQ - -ERASEME57 -========= - - - To: "Marty Leisner" <leisner@sdsp.mc.xerox.com> - Subject: Re: flex limitations - In-reply-to: Your message of Sat, 06 Sep 1997 11:27:21 PDT. 
- Date: Mon, 08 Sep 1997 11:38:08 PDT - From: Vern Paxson <vern> - - > %% - > [a-zA-Z]+ /* skip a line */ - > { printf("got %s\n", yytext); } - > %% - - What version of flex are you using? If I feed this to 2.5.4, it complains: - - "bug.l", line 5: EOF encountered inside an action - "bug.l", line 5: unrecognized rule - "bug.l", line 5: fatal parse error - - Not the world's greatest error message, but it manages to flag the problem. - - (With the introduction of start condition scopes, flex can't accommodate - an action on a separate line, since it's ambiguous with an indented rule.) - - You can get 2.5.4 from ftp.ee.lbl.gov. - - Vern - - -File: flex.info, Node: Is there a repository for flex scanners?, Next: How can I conditionally compile or preprocess my flex input file?, Prev: ERASEME57, Up: FAQ - -Is there a repository for flex scanners? -======================================== - - Not that we know of. You might try asking on comp.compilers. - - -File: flex.info, Node: How can I conditionally compile or preprocess my flex input file?, Next: Where can I find grammars for lex and yacc?, Prev: Is there a repository for flex scanners?, Up: FAQ - -How can I conditionally compile or preprocess my flex input file? -================================================================= - - Flex doesn't have a preprocessor like C does. You might try using -m4, or the C preprocessor plus a sed script to clean up the result. - - -File: flex.info, Node: Where can I find grammars for lex and yacc?, Next: I get an end-of-buffer message for each character scanned., Prev: How can I conditionally compile or preprocess my flex input file?, Up: FAQ - -Where can I find grammars for lex and yacc? -=========================================== - - In the sources for flex and bison. 
- - -File: flex.info, Node: I get an end-of-buffer message for each character scanned., Next: unnamed-faq-62, Prev: Where can I find grammars for lex and yacc?, Up: FAQ - -I get an end-of-buffer message for each character scanned. -========================================================== - - This will happen if your LexerInput() function returns only one -character at a time, which can happen either if your scanner is -"interactive", or if the streams library on your platform always -returns 1 for yyin->gcount(). - - Solution: override LexerInput() with a version that returns whole -buffers. - - -File: flex.info, Node: unnamed-faq-62, Next: unnamed-faq-63, Prev: I get an end-of-buffer message for each character scanned., Up: FAQ - -unnamed-faq-62 -============== - - - To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE - Subject: Re: Flex maximums - In-reply-to: Your message of Mon, 17 Nov 1997 17:16:06 PST. - Date: Mon, 17 Nov 1997 17:16:15 PST - From: Vern Paxson <vern> - - > I took a quick look into the flex-sources and altered some #defines in - > flexdefs.h: - > - > #define INITIAL_MNS 64000 - > #define MNS_INCREMENT 1024000 - > #define MAXIMUM_MNS 64000 - - The things to fix are to add a couple of zeroes to: - - #define JAMSTATE -32766 /* marks a reference to the state that always jams */ - #define MAXIMUM_MNS 31999 - #define BAD_SUBSCRIPT -32767 - #define MAX_SHORT 32700 - - and, if you get complaints about too many rules, make the following change too: - - #define YY_TRAILING_MASK 0x200000 - #define YY_TRAILING_HEAD_MASK 0x400000 - - - Vern - - -File: flex.info, Node: unnamed-faq-63, Next: unnamed-faq-64, Prev: unnamed-faq-62, Up: FAQ - -unnamed-faq-63 -============== - - - To: jimmey@lexis-nexis.com (Jimmey Todd) - Subject: Re: FLEX question regarding istream vs ifstream - In-reply-to: Your message of Mon, 08 Dec 1997 15:54:15 PST.
- Date: Mon, 15 Dec 1997 13:21:35 PST - From: Vern Paxson <vern> - - > stdin_handle = YY_CURRENT_BUFFER; - > ifstream fin( "aFile" ); - > yy_switch_to_buffer( yy_create_buffer( fin, YY_BUF_SIZE ) ); - > - > What I'm wanting to do, is pass the contents of a file thru one set - > of rules and then pass stdin thru another set... It works great if I - > don't use the C++ classes. But since everything else that I'm doing is - > in C++, I thought I'd be consistent. - > - > The problem is that 'yy_create_buffer' is expecting an istream* as its - > first argument (as stated in the man page). However, fin is an ifstream - > object. Any ideas on what I might be doing wrong? Any help would be - > appreciated. Thanks!! - - You need to pass &fin, to turn it into an ifstream* instead of an ifstream. - Then its type will be compatible with the expected istream*, because ifstream - is derived from istream. - - Vern - - -File: flex.info, Node: unnamed-faq-64, Next: unnamed-faq-65, Prev: unnamed-faq-63, Up: FAQ - -unnamed-faq-64 -============== - - - To: Enda Fadian <fadiane@piercom.ie> - Subject: Re: Question related to Flex man page? - In-reply-to: Your message of Tue, 16 Dec 1997 15:17:34 PST. - Date: Tue, 16 Dec 1997 14:17:09 PST - From: Vern Paxson <vern> - - > Can you explain to me what is meant by a long-jump in relation to flex? - - Using the longjmp() function while inside yylex() or a routine called by it. - - > what is the flex activation frame. - - Just yylex()'s stack frame. - - > As far as I can see yyrestart will bring me back to the start of the input - > file and using flex++ is not really an option! - - No, yyrestart() doesn't imply a rewind, even though its name might sound - like it does. It tells the scanner to flush its internal buffers and - start reading from the given file at its present location.
- - Vern - - -File: flex.info, Node: unnamed-faq-65, Next: unnamed-faq-66, Prev: unnamed-faq-64, Up: FAQ - -unnamed-faq-65 -============== - - - To: hassan@larc.info.uqam.ca (Hassan Alaoui) - Subject: Re: Need urgent Help - In-reply-to: Your message of Sat, 20 Dec 1997 19:38:19 PST. - Date: Sun, 21 Dec 1997 21:30:46 PST - From: Vern Paxson <vern> - - > /usr/lib/yaccpar: In function `int yyparse()': - > /usr/lib/yaccpar:184: warning: implicit declaration of function `int yylex(...)' - > - > ld: Undefined symbol - > _yylex - > _yyparse - > _yyin - - This is a known problem with Solaris C++ (and/or Solaris yacc). I believe - the fix is to explicitly insert some 'extern "C"' statements for the - corresponding routines/symbols. - - Vern - - -File: flex.info, Node: unnamed-faq-66, Next: unnamed-faq-67, Prev: unnamed-faq-65, Up: FAQ - -unnamed-faq-66 -============== - - - To: mc0307@mclink.it - Cc: gnu@prep.ai.mit.edu - Subject: Re: [mc0307@mclink.it: Help request] - In-reply-to: Your message of Fri, 12 Dec 1997 17:57:29 PST. - Date: Sun, 21 Dec 1997 22:33:37 PST - From: Vern Paxson <vern> - - > This is my definition for float and integer types: - > . . . - > NZD [1-9] - > ... - > I've tested my program on other lex version (on UNIX Sun Solaris an HP - > UNIX) and it work well, so I think that my definitions are correct. - > There are any differences between Lex and Flex? - - There are indeed differences, as discussed in the man page. The one - you are probably running into is that when flex expands a name definition, - it puts parentheses around the expansion, while lex does not. There's - an example in the man page of how this can lead to different matching. - Flex's behavior complies with the POSIX standard (or at least with the - last POSIX draft I saw). 
- - Vern - - -File: flex.info, Node: unnamed-faq-67, Next: unnamed-faq-68, Prev: unnamed-faq-66, Up: FAQ - -unnamed-faq-67 -============== - - - To: hassan@larc.info.uqam.ca (Hassan Alaoui) - Subject: Re: Thanks - In-reply-to: Your message of Mon, 22 Dec 1997 16:06:35 PST. - Date: Mon, 22 Dec 1997 14:35:05 PST - From: Vern Paxson <vern> - - > Thank you very much for your help. I compile and link well with C++ while - > declaring 'yylex ...' extern, But a little problem remains. I get a - > segmentation default when executing ( I linked with lfl library) while it - > works well when using LEX instead of flex. Do you have some ideas about the - > reason for this ? - - The one possible reason for this that comes to mind is if you've defined - yytext as "extern char yytext[]" (which is what lex uses) instead of - "extern char *yytext" (which is what flex uses). If it's not that, then - I'm afraid I don't know what the problem might be. - - Vern - - -File: flex.info, Node: unnamed-faq-68, Next: unnamed-faq-69, Prev: unnamed-faq-67, Up: FAQ - -unnamed-faq-68 -============== - - - To: "Bart Niswonger" <NISWONGR@almaden.ibm.com> - Subject: Re: flex 2.5: c++ scanners & start conditions - In-reply-to: Your message of Tue, 06 Jan 1998 10:34:21 PST. - Date: Tue, 06 Jan 1998 19:19:30 PST - From: Vern Paxson <vern> - - > The problem is that when I do this (using %option c++) start - > conditions seem to not apply. - - The BEGIN macro modifies the yy_start variable. For C scanners, this - is a static with scope visible through the whole file. For C++ scanners, - it's a member variable, so it only has visible scope within a member - function. Your lexbegin() routine is not a member function when you - build a C++ scanner, so it's not modifying the correct yy_start. 
The - diagnostic that indicates this is that you found you needed to add - a declaration of yy_start in order to get your scanner to compile when - using C++; instead, the correct fix is to make lexbegin() a member - function (by deriving from yyFlexLexer). - - Vern - - -File: flex.info, Node: unnamed-faq-69, Next: unnamed-faq-70, Prev: unnamed-faq-68, Up: FAQ - -unnamed-faq-69 -============== - - - To: "Boris Zinin" <boris@ippe.rssi.ru> - Subject: Re: current position in flex buffer - In-reply-to: Your message of Mon, 12 Jan 1998 18:58:23 PST. - Date: Mon, 12 Jan 1998 12:03:15 PST - From: Vern Paxson <vern> - - > The problem is how to determine the current position in flex active - > buffer when a rule is matched.... - - You will need to keep track of this explicitly, such as by redefining - YY_USER_ACTION to count the number of characters matched. - - The latest flex release, by the way, is 2.5.4, available from ftp.ee.lbl.gov. - - Vern - - -File: flex.info, Node: unnamed-faq-70, Next: unnamed-faq-71, Prev: unnamed-faq-69, Up: FAQ - -unnamed-faq-70 -============== - - - To: Bik.Dhaliwal@bis.org - Subject: Re: Flex question - In-reply-to: Your message of Mon, 26 Jan 1998 13:05:35 PST. - Date: Tue, 27 Jan 1998 22:41:52 PST - From: Vern Paxson <vern> - - > That requirement involves knowing - > the character position at which a particular token was matched - > in the lexer. - - The way you have to do this is by explicitly keeping track of where - you are in the file, by counting the number of characters scanned - for each token (available in yyleng). It may prove convenient to - do this by redefining YY_USER_ACTION, as described in the manual. - - Vern - - -File: flex.info, Node: unnamed-faq-71, Next: unnamed-faq-72, Prev: unnamed-faq-70, Up: FAQ - -unnamed-faq-71 -============== - - - To: Vladimir Alexiev <vladimir@cs.ualberta.ca> - Subject: Re: flex: how to control start condition from parser? - In-reply-to: Your message of Mon, 26 Jan 1998 05:50:16 PST. 
- Date: Tue, 27 Jan 1998 22:45:37 PST - From: Vern Paxson <vern> - - > It seems useful for the parser to be able to tell the lexer about such - > context dependencies, because then they don't have to be limited to - > local or sequential context. - - One way to do this is to have the parser call a stub routine that's - included in the scanner's .l file, and consequently that has access to - BEGIN. The only ugliness is that the parser can't pass in the state - it wants, because those aren't visible - but if you don't have many - such states, then using a different set of names doesn't seem like - too much of a burden. - - While generating a .h file like you suggest is certainly cleaner, - flex development has come to a virtual stand-still :-(, so a workaround - like the above is much more pragmatic than waiting for a new feature. - - Vern - - -File: flex.info, Node: unnamed-faq-72, Next: unnamed-faq-73, Prev: unnamed-faq-71, Up: FAQ - -unnamed-faq-72 -============== - - - To: Barbara Denny <denny@3com.com> - Subject: Re: freebsd flex bug? - In-reply-to: Your message of Fri, 30 Jan 1998 12:00:43 PST. - Date: Fri, 30 Jan 1998 12:42:32 PST - From: Vern Paxson <vern> - - > lex.yy.c:1996: parse error before `=' - - This is the key, identifying this error. (It may help to pinpoint - it by using flex -L, so it doesn't generate #line directives in its - output.) I will bet you heavy money that you have a start condition - name that is also a variable name, or something like that; flex spits - out #define's for each start condition name, mapping them to a number, - so you can wind up with: - - %x foo - %% - ... - %% - void bar() - { - int foo = 3; - } - - and the penultimate will turn into "int 1 = 3" after C preprocessing, - since flex will put "#define foo 1" in the generated scanner.
- - Vern - - -File: flex.info, Node: unnamed-faq-73, Next: unnamed-faq-74, Prev: unnamed-faq-72, Up: FAQ - -unnamed-faq-73 -============== - - - To: Maurice Petrie <mpetrie@infoscigroup.com> - Subject: Re: Lost flex .l file - In-reply-to: Your message of Mon, 02 Feb 1998 14:10:01 PST. - Date: Mon, 02 Feb 1998 11:15:12 PST - From: Vern Paxson <vern> - - > I am curious as to - > whether there is a simple way to backtrack from the generated source to - > reproduce the lost list of tokens we are searching on. - - In theory, it's straight-forward to go from the DFA representation - back to a regular-expression representation - the two are isomorphic. - In practice, a huge headache, because you have to unpack all the tables - back into a single DFA representation, and then write a program to munch - on that and translate it into an RE. - - Sorry for the less-than-happy news ... - - Vern - - -File: flex.info, Node: unnamed-faq-74, Next: unnamed-faq-75, Prev: unnamed-faq-73, Up: FAQ - -unnamed-faq-74 -============== - - - To: jimmey@lexis-nexis.com (Jimmey Todd) - Subject: Re: Flex performance question - In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST. - Date: Thu, 19 Feb 1998 08:48:51 PST - From: Vern Paxson <vern> - - > What I have found, is that the smaller the data chunk, the faster the - > program executes. This is the opposite of what I expected. Should this be - > happening this way? - - This is exactly what will happen if your input file has embedded NULs. - From the man page: - - A final note: flex is slow when matching NUL's, particularly - when a token contains multiple NUL's. It's best to write - rules which match short amounts of text if it's anticipated - that the text will often include NUL's. - - So that's the first thing to look for. 
- - Vern - - -File: flex.info, Node: unnamed-faq-75, Next: unnamed-faq-76, Prev: unnamed-faq-74, Up: FAQ - -unnamed-faq-75 -============== - - - To: jimmey@lexis-nexis.com (Jimmey Todd) - Subject: Re: Flex performance question - In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST. - Date: Thu, 19 Feb 1998 15:42:25 PST - From: Vern Paxson <vern> - - So there are several problems. - - First, to go fast, you want to match as much text as possible, which - your scanners don't in the case that what they're scanning is *not* - a <RN> tag. So you want a rule like: - - [^<]+ - - Second, C++ scanners are particularly slow if they're interactive, - which they are by default. Using -B speeds it up by a factor of 3-4 - on my workstation. - - Third, C++ scanners that use the istream interface are slow, because - of how poorly implemented istream's are. I built two versions of - the following scanner: - - %% - .*\n - .* - %% - - and the C version inhales a 2.5MB file on my workstation in 0.8 seconds. - The C++ istream version, using -B, takes 3.8 seconds. - - Vern -