Diffstat (limited to 'doc/flex.info-5')
-rw-r--r-- doc/flex.info-5 1330
1 files changed, 1330 insertions, 0 deletions
diff --git a/doc/flex.info-5 b/doc/flex.info-5
new file mode 100644
index 0000000..c3b0c72
--- /dev/null
+++ b/doc/flex.info-5
@@ -0,0 +1,1330 @@
+This is flex.info, produced by makeinfo version 4.3d from flex.texi.
+
+INFO-DIR-SECTION Programming
+START-INFO-DIR-ENTRY
+* flex: (flex). Fast lexical analyzer generator (lex replacement).
+END-INFO-DIR-ENTRY
+
+
+ The flex manual is placed under the same licensing conditions as the
+rest of flex:
+
+ Copyright (C) 1990, 1997 The Regents of the University of California.
+All rights reserved.
+
+ This code is derived from software contributed to Berkeley by Vern
+Paxson.
+
+ The United States Government has rights in this work pursuant to
+contract no. DE-AC03-76SF00098 between the United States Department of
+Energy and the University of California.
+
+ Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+ 1. Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+
+ 2. Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in the
+ documentation and/or other materials provided with the
+ distribution.
+ Neither the name of the University nor the names of its contributors
+may be used to endorse or promote products derived from this software
+without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
+WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
+
+File: flex.info, Node: How do I match any string not matched in the preceding rules?, Next: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Prev: How can I build a two-pass scanner?, Up: FAQ
+
+How do I match any string not matched in the preceding rules?
+=============================================================
+
+ One way to assign precedence is to place the more specific rules
+first. If two rules would match the same input (same sequence of
+characters) then the first rule listed in the `flex' input wins. e.g.,
+
+
+ %%
+ foo[a-zA-Z_]+ return FOO_ID;
+ bar[a-zA-Z_]+ return BAR_ID;
+ [a-zA-Z_]+ return GENERIC_ID;
+
+ Note that the rule `[a-zA-Z_]+' must come *after* the others. It
+will match the same amount of text as the more specific rules, and in
+that case the `flex' scanner will pick the first rule listed in your
+scanner as the one to match.
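To make the tie-breaking rule concrete, here is a small C sketch that emulates by hand (it is not flex output) how the scanner above chooses among its three rules: it computes how much text each rule would match at the start of the input, keeps the longest, and lets the earlier rule win a tie. The token codes and the helpers `classify', `ident_run' and `prefixed_match' are hypothetical names used only for this illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes standing in for the scanner's return values. */
enum token { NO_MATCH = 0, FOO_ID, BAR_ID, GENERIC_ID };

/* Length of the run of [a-zA-Z_] characters at the start of s. */
static size_t ident_run(const char *s)
{
    size_t n = 0;
    while (isalpha((unsigned char)s[n]) || s[n] == '_')
        n++;
    return n;
}

/* Length matched by prefix[a-zA-Z_]+ at the start of s, or 0 if the
   pattern does not match (the '+' needs at least one more character). */
static size_t prefixed_match(const char *s, const char *prefix)
{
    size_t p = strlen(prefix);
    if (strncmp(s, prefix, p) != 0)
        return 0;
    size_t rest = ident_run(s + p);
    return rest > 0 ? p + rest : 0;
}

/* Picks a rule the way flex does: the longest match wins, and when two
   rules match the same length, the one listed first wins. */
static enum token classify(const char *s, size_t *len)
{
    size_t lens[3] = {
        prefixed_match(s, "foo"),   /* foo[a-zA-Z_]+ */
        prefixed_match(s, "bar"),   /* bar[a-zA-Z_]+ */
        ident_run(s),               /* [a-zA-Z_]+    */
    };
    enum token best = NO_MATCH;
    size_t best_len = 0;
    for (int i = 0; i < 3; i++) {
        if (lens[i] > best_len) {   /* strict '>' keeps the earlier rule on a tie */
            best_len = lens[i];
            best = (enum token)(i + 1);
        }
    }
    *len = best_len;
    return best;
}
```

Note that `foo' on its own comes out as GENERIC_ID, because `foo[a-zA-Z_]+' needs at least one more identifier character after the prefix.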
+
+
+File: flex.info, Node: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Next: Is there a way to make flex treat NULL like a regular character?, Prev: How do I match any string not matched in the preceding rules?, Up: FAQ
+
+I am trying to port code from AT&T lex that uses yysptr and yysbuf.
+===================================================================
+
+ Those are internal variables pointing into the AT&T scanner's input
+buffer. I imagine they're being manipulated in user versions of the
+`input()' and `unput()' functions. If so, what you need to do is
+analyze those functions to figure out what they're doing, and then
+replace `input()' with an appropriate definition of `YY_INPUT'. You
+shouldn't need to (and must not) replace `flex''s `unput()' function.
+
+
+File: flex.info, Node: Is there a way to make flex treat NULL like a regular character?, Next: Whenever flex can not match the input it says "flex scanner jammed"., Prev: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Up: FAQ
+
+Is there a way to make flex treat NULL like a regular character?
+================================================================
+
+ Yes, `\0' and `\x00' should both do the trick. Perhaps you have an
+ancient version of `flex'. The latest release is version 2.5.31.
+
+
+File: flex.info, Node: Whenever flex can not match the input it says "flex scanner jammed"., Next: Why doesnt flex have non-greedy operators like perl does?, Prev: Is there a way to make flex treat NULL like a regular character?, Up: FAQ
+
+Whenever flex can not match the input it says "flex scanner jammed".
+====================================================================
+
+ You need to add a rule that matches the otherwise-unmatched text.
+e.g.,
+
+
+ %option yylineno
+ %%
+ [[a bunch of rules here]]
+
+ . printf("bad input character '%s' at line %d\n", yytext, yylineno);
+
+ See `%option default' for more information.
+
+
+File: flex.info, Node: Why doesnt flex have non-greedy operators like perl does?, Next: Memory leak - 16386 bytes allocated by malloc., Prev: Whenever flex can not match the input it says "flex scanner jammed"., Up: FAQ
+
+Why doesn't flex have non-greedy operators like perl does?
+==========================================================
+
+ A DFA can do a non-greedy match by stopping the first time it enters
+an accepting state, instead of consuming input until it determines that
+no further matching is possible (a "jam" state). This is actually
+easier to implement than longest leftmost match (which flex does).
+
+ But it's also much less useful than longest leftmost match. In
+general, when you find yourself wishing for non-greedy matching, that's
+usually a sign that you're trying to make the scanner do some parsing.
+That's generally the wrong approach, since it lacks the power to do a
+decent job. Better is to either introduce a separate parser, or to
+split the scanner into multiple scanners using (exclusive) start
+conditions.
+
+ You might have a separate start state once you've seen the `BEGIN'.
+In that state, you might then have a regex that will match `END' (to
+kick you out of the state), and perhaps `(.|\n)' to get a single
+character within the chunk ...
+
+ This approach also has much better error-reporting properties.
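The difference between the two strategies is easy to see on a hand-built DFA. The following C sketch runs one automaton two ways: stopping the first time it enters an accepting state, or, as flex does, running until the jam state and then backing up to the last accepting position. The DFA for `x.*y' and the function names are hypothetical, written for illustration rather than taken from flex.

```c
#include <stddef.h>

/* Hypothetical hand-built DFA for the pattern x.*y ('.' excludes newline).
   States: 0 = start, 1 = seen x, 2 = seen x...y (accepting).
   Returns the next state, or -1 for the jam state. */
static int step(int state, char c)
{
    if (c == '\0' || c == '\n')
        return -1;              /* end of input, or a newline '.' can't match */
    switch (state) {
    case 0:
        return c == 'x' ? 1 : -1;
    case 1:
    case 2:
        return c == 'y' ? 2 : 1;
    default:
        return -1;
    }
}

/* Length of the match at the start of s, or 0 if there is none.
   greedy != 0: keep going until the DFA jams, then back up to the last
                accepting position (longest match, as flex does).
   greedy == 0: stop the first time an accepting state is entered. */
static size_t match_len(const char *s, int greedy)
{
    int state = 0;
    size_t last_accept = 0;
    for (size_t i = 0;; i++) {
        state = step(state, s[i]);
        if (state < 0)
            break;              /* jammed */
        if (state == 2) {       /* accepting state */
            last_accept = i + 1;
            if (!greedy)
                break;
        }
    }
    return last_accept;
}
```

On the input `xAyBy', the greedy version returns 5 (the whole string), while the first-accept version returns 3 (just `xAy').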
+
+
+File: flex.info, Node: Memory leak - 16386 bytes allocated by malloc., Next: How do I track the byte offset for lseek()?, Prev: Why doesnt flex have non-greedy operators like perl does?, Up: FAQ
+
+Memory leak - 16386 bytes allocated by malloc.
+==============================================
+
+ UPDATED 2002-07-10: As of `flex' version 2.5.9, this leak means that
+you did not call `yylex_destroy()'. If you are using an earlier version
+of `flex', then read on.
+
+ The leak is about 16426 bytes: 16386 bytes (8192 * 2 + 2) for the
+read buffer, and about 40 bytes for `struct yy_buffer_state' (depending
+upon alignment). The leak is in the non-reentrant C scanner only (NOT in the
+reentrant scanner, NOT in the C++ scanner). Since `flex' doesn't know
+when you are done, the buffer is never freed.
+
+ However, the leak won't multiply since the buffer is reused no
+matter how many times you call `yylex()'.
+
+ If you want to reclaim the memory when you are completely done
+scanning, then you might try this:
+
+
+ /* For non-reentrant C scanner only. */
+ yy_delete_buffer(YY_CURRENT_BUFFER);
+ yy_init = 1;
+
+ Note: `yy_init' is an "internal variable", and hasn't been tested in
+this situation. It is possible that some other globals may need
+resetting as well.
+
+
+File: flex.info, Node: How do I track the byte offset for lseek()?, Next: How do I use my own I/O classes in a C++ scanner?, Prev: Memory leak - 16386 bytes allocated by malloc., Up: FAQ
+
+How do I track the byte offset for lseek()?
+===========================================
+
+
+ > We thought that it would be possible to have this number through the
+ > evaluation of the following expression:
+ >
+ > seek_position = (no_buffers)*YY_READ_BUF_SIZE + yy_c_buf_p - YY_CURRENT_BUFFER->yy_ch_buf
+
+ While this is the right idea, it has two problems. The first is that
+it's possible that `flex' will request less than `YY_READ_BUF_SIZE'
+during an invocation of `YY_INPUT' (or that your input source will
+return less even though `YY_READ_BUF_SIZE' bytes were requested). The
+second problem is that when refilling its internal buffer, `flex' keeps
+some characters from the previous buffer (because usually it's in the
+middle of a match, and needs those characters to construct `yytext' for
+the match once it's done). Because of this, `yy_c_buf_p -
+YY_CURRENT_BUFFER->yy_ch_buf' won't be exactly the number of characters
+already read from the current buffer.
+
+ An alternative solution is to count the number of characters you've
+matched since starting to scan. This can be done by using
+`YY_USER_ACTION'. For example,
+
+
+ #define YY_USER_ACTION num_chars += yyleng;
+
+ (You need to be careful to update your bookkeeping if you use
+`yymore()', `yyless()', `unput()', or `input()'.)
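A minimal sketch of that bookkeeping in C follows. It assumes a hypothetical `struct offset_tracker' that your code updates: `on_match' is what a `YY_USER_ACTION' would call with `yyleng', and the other (equally hypothetical) helpers are called by hand next to your calls to `yymore()', `yyless()', `unput()' and `input()'. The `yymore()' adjustment reflects the fact that the next match's `yytext' and `yyleng' also include the text that was already counted.

```c
/* Hypothetical bookkeeping for the number of characters consumed so far.
   on_match() is what a YY_USER_ACTION such as
       #define YY_USER_ACTION on_match(&tracker, yyleng);
   would call (tracker being a hypothetical global); the other helpers
   are called by hand next to the corresponding calls in your actions. */
struct offset_tracker {
    long num_chars;  /* characters consumed from the input so far */
    long more_len;   /* length already counted before a yymore() */
};

/* Every match: count yyleng, minus any text already counted because the
   previous action called yymore() (that text is still part of yytext). */
static void on_match(struct offset_tracker *t, long yyleng)
{
    t->num_chars += yyleng - t->more_len;
    t->more_len = 0;
}

/* After yymore(): the next match's yyleng will include this yyleng. */
static void on_yymore(struct offset_tracker *t, long yyleng)
{
    t->more_len = yyleng;
}

/* After yyless(n): yyleng - n characters go back to the input. */
static void on_yyless(struct offset_tracker *t, long yyleng, long n)
{
    t->num_chars -= yyleng - n;
}

/* After unput(c): one character goes back to the input. */
static void on_unput(struct offset_tracker *t)
{
    t->num_chars -= 1;
}

/* After input(): one more character has been consumed. */
static void on_input(struct offset_tracker *t)
{
    t->num_chars += 1;
}
```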
+
+
+File: flex.info, Node: How do I use my own I/O classes in a C++ scanner?, Next: How do I skip as many chars as possible?, Prev: How do I track the byte offset for lseek()?, Up: FAQ
+
+How do I use my own I/O classes in a C++ scanner?
+=================================================
+
+ When the flex C++ scanning class rewrite finally happens, then this
+sort of thing should become much easier.
+
+ You can do this by passing the various functions (such as
+`LexerInput()' and `LexerOutput()') NULL `iostream*' pointers, and then
+dealing with your own I/O classes surreptitiously (i.e., stashing them
+in special member variables). This works because the only assumption
+the lexer makes about the iostreams is that they are ultimately passed
+to `LexerInput()' and `LexerOutput()', which then do whatever is
+necessary with them.
+
+
+File: flex.info, Node: How do I skip as many chars as possible?, Next: deleteme00, Prev: How do I use my own I/O classes in a C++ scanner?, Up: FAQ
+
+How do I skip as many chars as possible?
+========================================
+
+ How do I skip as many chars as possible - without interfering with
+the other patterns?
+
+ In the example below, we want to skip over characters until we see
+the phrase "endskip". The following will _NOT_ work correctly (do you
+see why not?)
+
+
+ /* INCORRECT SCANNER */
+ %x SKIP
+ %%
+ <INITIAL>startskip BEGIN(SKIP);
+ ...
+ <SKIP>"endskip" BEGIN(INITIAL);
+ <SKIP>.* ;
+
+ The problem is that the pattern `.*' matches everything up to the end
+of the line, including the word "endskip". The simplest (but slow) fix
+is:
+
+
+ <SKIP>"endskip" BEGIN(INITIAL);
+ <SKIP>. ;
+
+ The fix involves making the second rule match more, without making
+it match "endskip" plus something else. So for example:
+
+
+ <SKIP>"endskip" BEGIN(INITIAL);
+ <SKIP>[^e]+ ;
+ <SKIP>. ; /* so you eat up e's, too */
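Why this works, and why it is faster than the one-character version, can be seen by emulating the three <SKIP> rules by hand. The C sketch below is a hypothetical stand-in for the generated scanner (the function name is made up): at each position it computes how long each rule's match would be, picks the longest, and lets the first-listed rule win a tie. A run of non-`e' characters is consumed in a single match, and "endskip" is never swallowed, because `[^e]+' cannot match its leading `e'.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical emulation of the <SKIP> rules
       <SKIP>"endskip"   BEGIN(INITIAL);
       <SKIP>[^e]+       ;
       <SKIP>.           ;
   using flex's policy: the longest match wins, ties go to the rule listed
   first. Returns the offset just past "endskip" (or the end of s if it
   never appears) and stores the number of rule matches in *matches. */
static size_t skip_to_end(const char *s, size_t *matches)
{
    static const char end_word[] = "endskip";
    const size_t end_len = sizeof end_word - 1;
    size_t pos = 0, n = 0;

    while (s[pos] != '\0') {
        size_t len_end = strncmp(s + pos, end_word, end_len) == 0 ? end_len : 0;
        size_t len_run = strcspn(s + pos, "e");  /* [^e]+ (0 if at an 'e') */
        size_t len_dot = 1;                      /* .  (only wins on an 'e') */
        n++;
        if (len_end >= len_run && len_end >= len_dot) {
            pos += len_end;                      /* "endskip": leave SKIP */
            break;
        }
        pos += len_run >= len_dot ? len_run : len_dot;
    }
    *matches = n;
    return pos;
}
```

For the input `abc e xyz endskip tail' it takes four matches (`[^e]+', `.', `[^e]+', "endskip") and stops at offset 17, just past "endskip"; the `<SKIP>. ;' version would need one match per skipped character.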
+
+
+File: flex.info, Node: deleteme00, Next: Are certain equivalent patterns faster than others?, Prev: How do I skip as many chars as possible?, Up: FAQ
+
+deleteme00
+==========
+
+
+ QUESTION:
+ When was flex born?
+
+ Vern Paxson took over
+ the Software Tools lex project from Jef Poskanzer in 1982. At that point it
+ was written in Ratfor. Around 1987 or so, Paxson translated it into C, and
+ a legend was born :-).
+
+
+File: flex.info, Node: Are certain equivalent patterns faster than others?, Next: Is backing up a big deal?, Prev: deleteme00, Up: FAQ
+
+Are certain equivalent patterns faster than others?
+===================================================
+
+
+ To: Adoram Rogel <adoram@orna.hybridge.com>
+ Subject: Re: Flex 2.5.2 performance questions
+ In-reply-to: Your message of Wed, 18 Sep 96 11:12:17 EDT.
+ Date: Wed, 18 Sep 96 10:51:02 PDT
+ From: Vern Paxson <vern>
+
+ [Note, the most recent flex release is 2.5.4, which you can get from
+ ftp.ee.lbl.gov. It has bug fixes over 2.5.2 and 2.5.3.]
+
+ > 1. Using the pattern
+ > ([Ff](oot)?)?[Nn](ote)?(\.)?
+ > instead of
+ > (((F|f)oot(N|n)ote)|((N|n)ote)|((N|n)\.)|((F|f)(N|n)(\.)))
+ > (in a very complicated flex program) caused the program to slow from
+ > 300K+/min to 100K/min (no other changes were done).
+
+ These two are not equivalent. For example, the first can match "footnote."
+ but the second can only match "footnote". This is almost certainly the
+ cause of the discrepancy - the slower scanner run is matching more tokens,
+ and/or having to do more backing up.
+
+ > 2. Which of these two are better: [Ff]oot or (F|f)oot ?
+
+ From a performance point of view, they're equivalent (modulo presumably
+ minor effects such as memory cache hit rates; and the presence of trailing
+ context, see below). From a space point of view, the first is slightly
+ preferable.
+
+ > 3. I have a pattern that look like this:
+ > pats {p1}|{p2}|{p3}|...|{p50} (50 patterns ORd)
+ >
+ > running yet another complicated program that includes the following rule:
+ > <snext>{and}/{no4}{bb}{pats}
+ >
+ > gets me to "too complicated - over 32,000 states"...
+
+ I can't tell from this example whether the trailing context is variable-length
+ or fixed-length (it could be the latter if {and} is fixed-length). If it's
+ variable length, which flex -p will tell you, then this reflects a basic
+ performance problem, and if you can eliminate it by restructuring your
+ scanner, you will see significant improvement.
+
+ > so I divided {pats} to {pats1}, {pats2},..., {pats5} each consists of about
+ > 10 patterns and changed the rule to be 5 rules.
+ > This did compile, but what is the rule of thumb here ?
+
+ The rule is to avoid trailing context unless it is fixed-length, meaning
+ that for a/b, either the 'a' pattern or the 'b' pattern has a fixed length.
+ Use of the '|' operator automatically makes the pattern variable length, so
+ in this case '[Ff]oot' is preferred to '(F|f)oot'.
+
+ > 4. I changed a rule that looked like this:
+ > <snext8>{and}{bb}/{ROMAN}[^A-Za-z] { BEGIN...
+ >
+ > to the next 2 rules:
+ > <snext8>{and}{bb}/{ROMAN}[A-Za-z] { ECHO;}
+ > <snext8>{and}{bb}/{ROMAN} { BEGIN...
+ >
+ > Again, I understand the using [^...] will cause a great performance loss
+
+ Actually, it doesn't cause any sort of performance loss. It's a surprising
+ fact about regular expressions that they always match in linear time
+ regardless of how complex they are.
+
+ > but are there any specific rules about it ?
+
+ See the "Performance Considerations" section of the man page, and also
+ the example in MISC/fastwc/.
+
+ Vern
+
+
+File: flex.info, Node: Is backing up a big deal?, Next: Can I fake multi-byte character support?, Prev: Are certain equivalent patterns faster than others?, Up: FAQ
+
+Is backing up a big deal?
+=========================
+
+
+ To: Adoram Rogel <adoram@hybridge.com>
+ Subject: Re: Flex 2.5.2 performance questions
+ In-reply-to: Your message of Thu, 19 Sep 96 10:16:04 EDT.
+ Date: Thu, 19 Sep 96 09:58:00 PDT
+ From: Vern Paxson <vern>
+
+ > a lot about the backing up problem.
+ > I believe that there lies my biggest problem, and I'll try to improve
+ > it.
+
+ Since you have variable trailing context, this is a bigger performance
+ problem. Fixing it is usually easier than fixing backing up, which in a
+ complicated scanner (yours seems to fit the bill) can be extremely
+ difficult to do correctly.
+
+ You also don't mention what flags you are using for your scanner.
+ -f makes a large speed difference, and -Cfe buys you nearly as much
+ speed but the resulting scanner is considerably smaller.
+
+ > I have an | operator in {and} and in {pats} so both of them are variable
+ > length.
+
+ -p should have reported this.
+
+ > Is changing one of them to fixed-length is enough ?
+
+ Yes.
+
+ > Is it possible to change the 32,000 states limit ?
+
+ Yes. I've appended instructions on how. Before you make this change,
+ though, you should think about whether there are ways to fundamentally
+ simplify your scanner - those are certainly preferable!
+
+ Vern
+
+ To increase the 32K limit (on a machine with 32 bit integers), you increase
+ the magnitude of the following in flexdef.h:
+
+ #define JAMSTATE -32766 /* marks a reference to the state that always jams */
+ #define MAXIMUM_MNS 31999
+ #define BAD_SUBSCRIPT -32767
+ #define MAX_SHORT 32700
+
+ Adding a 0 or two after each should do the trick.
+
+
+File: flex.info, Node: Can I fake multi-byte character support?, Next: deleteme01, Prev: Is backing up a big deal?, Up: FAQ
+
+Can I fake multi-byte character support?
+========================================
+
+
+ To: Heeman_Lee@hp.com
+ Subject: Re: flex - multi-byte support?
+ In-reply-to: Your message of Thu, 03 Oct 1996 17:24:04 PDT.
+ Date: Fri, 04 Oct 1996 11:42:18 PDT
+ From: Vern Paxson <vern>
+
+ > I assume as long as my *.l file defines the
+ > range of expected character code values (in octal format), flex will
+ > scan the file and read multi-byte characters correctly. But I have no
+ > confidence in this assumption.
+
+ Your lack of confidence is justified - this won't work.
+
+ Flex has in it a widespread assumption that the input is processed
+ one byte at a time. Fixing this is on the to-do list, but is involved,
+ so it won't happen any time soon. In the interim, the best I can suggest
+ (unless you want to try fixing it yourself) is to write your rules in
+ terms of pairs of bytes, using definitions in the first section:
+
+ X \xfe\xc2
+ ...
+ %%
+ foo{X}bar found_foo_fe_c2_bar();
+
+ etc. Definitely a pain - sorry about that.
+
+ By the way, the email address you used for me is ancient, indicating you
+ have a very old version of flex. You can get the most recent, 2.5.4, from
+ ftp.ee.lbl.gov.
+
+ Vern
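Writing those byte pairs by hand is error-prone. A small helper along these lines (a hypothetical C function, not part of flex) turns the raw bytes of a character into the `\xNN' escapes used in a definition such as `X \xfe\xc2'.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the bytes of a multi-byte character as a
   flex escape sequence such as \xfe\xc2, for use in a definition like
       X    \xfe\xc2
   bytes is a NUL-terminated string of the raw (non-zero) bytes.
   Returns 0 on success, -1 if out (of size outsize) is too small. */
static int bytes_to_flex_escape(const char *bytes, char *out, size_t outsize)
{
    size_t len = strlen(bytes);
    if (outsize < 4 * len + 1)
        return -1;                  /* each byte needs 4 chars: \xNN */
    for (size_t i = 0; i < len; i++)
        snprintf(out + 4 * i, outsize - 4 * i, "\\x%02x",
                 (unsigned)(unsigned char)bytes[i]);
    out[4 * len] = '\0';
    return 0;
}
```

For example, the bytes 0xfe 0xc2 produce the text `\xfe\xc2', ready to paste into the definitions section.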
+
+
+File: flex.info, Node: deleteme01, Next: Can you discuss some flex internals?, Prev: Can I fake multi-byte character support?, Up: FAQ
+
+deleteme01
+==========
+
+
+ To: moleary@primus.com
+ Subject: Re: Flex / Unicode compatibility question
+ In-reply-to: Your message of Tue, 22 Oct 1996 10:15:42 PDT.
+ Date: Tue, 22 Oct 1996 11:06:13 PDT
+ From: Vern Paxson <vern>
+
+ Unfortunately flex at the moment has a widespread assumption within it
+ that characters are processed 8 bits at a time. I don't see any easy
+ fix for this (other than writing your rules in terms of double characters -
+ a pain). I also don't know of a wider lex, though you might try surfing
+ the Plan 9 stuff because I know it's a Unicode system, and also the PCCT
+ toolkit (try searching say Alta Vista for "Purdue Compiler Construction
+ Toolkit").
+
+ Fixing flex to handle wider characters is on the long-term to-do list.
+ But since flex is a strictly spare-time project these days, this probably
+ won't happen for quite a while, unless someone else does it first.
+
+ Vern
+
+
+File: flex.info, Node: Can you discuss some flex internals?, Next: unput() messes up yy_at_bol, Prev: deleteme01, Up: FAQ
+
+Can you discuss some flex internals?
+====================================
+
+
+ To: Johan Linde <jl@theophys.kth.se>
+ Subject: Re: translation of flex
+ In-reply-to: Your message of Sun, 10 Nov 1996 09:16:36 PST.
+ Date: Mon, 11 Nov 1996 10:33:50 PST
+ From: Vern Paxson <vern>
+
+ > I'm working for the Swedish team translating GNU program, and I'm currently
+ > working with flex. I have a few questions about some of the messages which
+ > I hope you can answer.
+
+ All of the things you're wondering about, by the way, concerning flex
+ internals - probably the only person who understands what they mean in
+ English is me! So I wouldn't worry too much about getting them right.
+ That said ...
+
+ > #: main.c:545
+ > msgid " %d protos created\n"
+ >
+ > Does proto mean prototype?
+
+ Yes - prototypes of state compression tables.
+
+ > #: main.c:539
+ > msgid " %d/%d (peak %d) template nxt-chk entries created\n"
+ >
+ > Here I'm mainly puzzled by 'nxt-chk'. I guess it means 'next-check'. (?)
+ > However, 'template next-check entries' doesn't make much sense to me. To be
+ > able to find a good translation I need to know a little bit more about it.
+
+ There is a scheme in the Aho/Sethi/Ullman compiler book for compressing
+ scanner tables. It involves creating two pairs of tables. The first has
+ "base" and "default" entries, the second has "next" and "check" entries.
+ The "base" entry is indexed by the current state and yields an index into
+ the next/check table. The "default" entry gives what to do if the state
+ transition isn't found in next/check. The "next" entry gives the next
+ state to enter, but only if the "check" entry verifies that this entry is
+ correct for the current state. Flex creates templates of series of
+ next/check entries and then encodes differences from these templates as a
+ way to compress the tables.
+
+ > #: main.c:533
+ > msgid " %d/%d base-def entries created\n"
+ >
+ > The same problem here for 'base-def'.
+
+ See above.
+
+ Vern
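The base/default/next/check lookup described above can be sketched in a few lines of C. The three-state automaton, its tables and the function name `next_state' are hypothetical, chosen so that states 1 and 2 each differ from state 0 in a single transition; the sketch shows how a transition is resolved, not how flex builds its templates.

```c
/* A hypothetical 3-state DFA over the input symbols {0, 1, 2}. The full
   transition table is
       state 0: {1, 2, 0}
       state 1: {1, 2, 2}   differs from state 0 only on symbol 2
       state 2: {1, 0, 0}   differs from state 0 only on symbol 1
   so state 0 stores all three entries, and states 1 and 2 store only their
   differences and use state 0 as their default. */
enum { NUM_STATES = 3, TABLE_SIZE = 6 };

static const int base[NUM_STATES]  = { 0, 1, 3 };   /* index into next/check */
static const int deflt[NUM_STATES] = { -1, 0, 0 };  /* -1: no default state */
static const int next[TABLE_SIZE]  = { 1, 2, 0, 2, 0, -1 };  /* slot 5 unused */
static const int check[TABLE_SIZE] = { 0, 0, 0, 1, 2, -1 };  /* slot's owner */

/* Returns the state reached from `state` on input `symbol`, or -1 (jam). */
static int next_state(int state, int symbol)
{
    while (state >= 0) {
        int i = base[state] + symbol;
        if (i < TABLE_SIZE && check[i] == state)
            return next[i];       /* the slot really belongs to this state */
        state = deflt[state];     /* otherwise consult the default state */
    }
    return -1;
}
```

Here six next/check slots replace the nine entries of the full table. The savings grow with many input characters per state and many states whose rows are nearly identical, which is the situation this compression scheme is designed for.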
+
+
+File: flex.info, Node: unput() messes up yy_at_bol, Next: The | operator is not doing what I want, Prev: Can you discuss some flex internals?, Up: FAQ
+
+unput() messes up yy_at_bol
+===========================
+
+
+ To: Xinying Li <xli@npac.syr.edu>
+ Subject: Re: FLEX ?
+ In-reply-to: Your message of Wed, 13 Nov 1996 17:28:38 PST.
+ Date: Wed, 13 Nov 1996 19:51:54 PST
+ From: Vern Paxson <vern>
+
+ > "unput()" them to input flow, question occurs. If I do this after I scan
+ > a carriage, the variable "YY_CURRENT_BUFFER->yy_at_bol" is changed. That
+ > means the carriage flag has gone.
+
+ You can control this by calling yy_set_bol(). It's described in the manual.
+
+ > And if in pre-reading it goes to the end of file, is anything done
+ > to control the end of curren buffer and end of file?
+
+ No, there's no way to put back an end-of-file.
+
+ > By the way I am using flex 2.5.2 and using the "-l".
+
+ The latest release is 2.5.4, by the way. It fixes some bugs in 2.5.2 and
+ 2.5.3. You can get it from ftp.ee.lbl.gov.
+
+ Vern
+
+
+File: flex.info, Node: The | operator is not doing what I want, Next: Why can't flex understand this variable trailing context pattern?, Prev: unput() messes up yy_at_bol, Up: FAQ
+
+The | operator is not doing what I want
+=======================================
+
+
+ To: Alain.ISSARD@st.com
+ Subject: Re: Start condition with FLEX
+ In-reply-to: Your message of Mon, 18 Nov 1996 09:45:02 PST.
+ Date: Mon, 18 Nov 1996 10:41:34 PST
+ From: Vern Paxson <vern>
+
+ > I am not able to use the start condition scope and to use the | (OR) with
+ > rules having start conditions.
+
+ The problem is that if you use '|' as a regular expression operator, for
+ example "a|b" meaning "match either 'a' or 'b'", then it must *not* have
+ any blanks around it. If you instead want the special '|' *action* (which
+ from your scanner appears to be the case), which is a way of giving two
+ different rules the same action:
+
+ foo |
+ bar matched_foo_or_bar();
+
+ then '|' *must* be separated from the first rule by whitespace and *must*
+ be followed by a new line. You *cannot* write it as:
+
+ foo | bar matched_foo_or_bar();
+
+ even though you might think you could because yacc supports this syntax.
+ The reason for this unfortunate incompatibility is historical, but it's
+ unlikely to be changed.
+
+ Your problems with start condition scope are simply due to syntax errors
+ from your use of '|' later confusing flex.
+
+ Let me know if you still have problems.
+
+ Vern
+
+
+File: flex.info, Node: Why can't flex understand this variable trailing context pattern?, Next: The ^ operator isn't working, Prev: The | operator is not doing what I want, Up: FAQ
+
+Why can't flex understand this variable trailing context pattern?
+=================================================================
+
+
+ To: Gregory Margo <gmargo@newton.vip.best.com>
+ Subject: Re: flex-2.5.3 bug report
+ In-reply-to: Your message of Sat, 23 Nov 1996 16:50:09 PST.
+ Date: Sat, 23 Nov 1996 17:07:32 PST
+ From: Vern Paxson <vern>
+
+ > Enclosed is a lex file that "real" lex will process, but I cannot get
+ > flex to process it. Could you try it and maybe point me in the right direction?
+
+ Your problem is that some of the definitions in the scanner use the '/'
+ trailing context operator, and have it enclosed in ()'s. Flex does not
+ allow this operator to be enclosed in ()'s because doing so allows undefined
+ regular expressions such as "(a/b)+". So the solution is to remove the
+ parentheses. Note that you must also be building the scanner with the -l
+ option for AT&T lex compatibility. Without this option, flex automatically
+ encloses the definitions in parentheses.
+
+ Vern
+
+
+File: flex.info, Node: The ^ operator isn't working, Next: Trailing context is getting confused with trailing optional patterns, Prev: Why can't flex understand this variable trailing context pattern?, Up: FAQ
+
+The ^ operator isn't working
+============================
+
+
+ To: Thomas Hadig <hadig@toots.physik.rwth-aachen.de>
+ Subject: Re: Flex Bug ?
+ In-reply-to: Your message of Tue, 26 Nov 1996 14:35:01 PST.
+ Date: Tue, 26 Nov 1996 11:15:05 PST
+ From: Vern Paxson <vern>
+
+ > In my lexer code, i have the line :
+ > ^\*.* { }
+ >
+ > Thus all lines starting with an astrix (*) are comment lines.
+ > This does not work !
+
+ I can't get this problem to reproduce - it works fine for me. Note
+ though that if what you have is slightly different:
+
+ COMMENT ^\*.*
+ %%
+ {COMMENT} { }
+
+ then it won't work, because flex pushes back macro definitions enclosed
+ in ()'s, so the rule becomes
+
+ (^\*.*) { }
+
+ and now that the '^' operator is not at the immediate beginning of the
+ pattern, it's interpreted as just a regular character. You can avoid this
+ behavior by using the "-l" lex-compatibility flag, or "%option lex-compat".
+
+ Vern
+
+
+File: flex.info, Node: Trailing context is getting confused with trailing optional patterns, Next: Is flex GNU or not?, Prev: The ^ operator isn't working, Up: FAQ
+
+Trailing context is getting confused with trailing optional patterns
+====================================================================
+
+
+ To: Adoram Rogel <adoram@hybridge.com>
+ Subject: Re: Flex 2.5.4 BOF ???
+ In-reply-to: Your message of Tue, 26 Nov 1996 16:10:41 PST.
+ Date: Wed, 27 Nov 1996 10:56:25 PST
+ From: Vern Paxson <vern>
+
+ > Organization(s)?/[a-z]
+ >
+ > This matched "Organizations" (looking in debug mode, the trailing s
+ > was matched with trailing context instead of the optional (s) in the
+ > end of the word.
+
+ That should only happen with lex. Flex can properly match this pattern.
+ (That might be what you're saying, I'm just not sure.)
+
+ > Is there a way to avoid this dangerous trailing context problem ?
+
+ Unfortunately, there's no easy way. On the other hand, I don't see why
+ it should be a problem. Lex's matching is clearly wrong, and I'd hope
+ that usually the intent remains the same as expressed with the pattern,
+ so flex's matching will be correct.
+
+ Vern
+
+
+File: flex.info, Node: Is flex GNU or not?, Next: ERASEME53, Prev: Trailing context is getting confused with trailing optional patterns, Up: FAQ
+
+Is flex GNU or not?
+===================
+
+
+ To: Cameron MacKinnon <mackin@interlog.com>
+ Subject: Re: Flex documentation bug
+ In-reply-to: Your message of Mon, 02 Dec 1996 00:07:08 PST.
+ Date: Sun, 01 Dec 1996 22:29:39 PST
+ From: Vern Paxson <vern>
+
+ > I'm not sure how or where to submit bug reports (documentation or
+ > otherwise) for the GNU project stuff ...
+
+ Well, strictly speaking flex isn't part of the GNU project. They just
+ distribute it because no one's written a decent GPL'd lex replacement.
+ So you should send bugs directly to me. Those sent to the GNU folks
+ sometimes find their way to me, but some may fall through the cracks.
+
+ > In GNU Info, under the section 'Start Conditions', and also in the man
+ > page (mine's dated April '95) is a nice little snippet showing how to
+ > parse C quoted strings into a buffer, defined to be MAX_STR_CONST in
+ > size. Unfortunately, no overflow checking is ever done ...
+
+ This is already mentioned in the manual:
+
+ Finally, here's an example of how to match C-style quoted
+ strings using exclusive start conditions, including expanded
+ escape sequences (but not including checking for a string
+ that's too long):
+
+ The reason for not doing the overflow checking is that it will needlessly
+ clutter up an example whose main purpose is just to demonstrate how to
+ use flex.
+
+ The latest release is 2.5.4, by the way, available from ftp.ee.lbl.gov.
+
+ Vern
+
+
+File: flex.info, Node: ERASEME53, Next: I need to scan if-then-else blocks and while loops, Prev: Is flex GNU or not?, Up: FAQ
+
+ERASEME53
+=========
+
+
+ To: tsv@cs.UManitoba.CA
+ Subject: Re: Flex (reg)..
+ In-reply-to: Your message of Thu, 06 Mar 1997 23:50:16 PST.
+ Date: Thu, 06 Mar 1997 15:54:19 PST
+ From: Vern Paxson <vern>
+
+ > [:alpha:] ([:alnum:] | \\_)*
+
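+ If your rule really has embedded blanks as shown above, then it won't
+ work, as the first blank delimits the rule from the action. (It wouldn't
+ even compile ...) You need instead:
+
+ [[:alpha:]]([[:alnum:]]|_)*
+
+ and that should work fine - there's no restriction on what can go inside
+ of ()'s except for the trailing context operator, '/'. Note also that
+ character class expressions such as [:alpha:] are only recognized inside
+ a bracket expression, and that \\_ matches a backslash followed by an
+ underscore.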
+
+ Vern
+
+
+File: flex.info, Node: I need to scan if-then-else blocks and while loops, Next: ERASEME55, Prev: ERASEME53, Up: FAQ
+
+I need to scan if-then-else blocks and while loops
+==================================================
+
+
+ To: "Mike Stolnicki" <mstolnic@ford.com>
+ Subject: Re: FLEX help
+ In-reply-to: Your message of Fri, 30 May 1997 13:33:27 PDT.
+ Date: Fri, 30 May 1997 10:46:35 PDT
+ From: Vern Paxson <vern>
+
+ > We'd like to add "if-then-else", "while", and "for" statements to our
+ > language ...
+ > We've investigated many possible solutions. The one solution that seems
+ > the most reasonable involves knowing the position of a TOKEN in yyin.
+
+ I strongly advise you to instead build a parse tree (abstract syntax tree)
+ and loop over that instead. You'll find this has major benefits in keeping
+ your interpreter simple and extensible.
+
+ That said, the get_position and set_position functionality you mention
+ has been on the to-do list for a while. As flex is a purely spare-time
+ project for me, there are no guarantees when it will be added (in
+ particular, it certainly won't be for many months to come).
+
+ Vern
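To illustrate the suggestion, here is a minimal C sketch of a tree-walking interpreter. The node kinds and the functions `mk', `eval' and `free_tree' are hypothetical; in practice a parser (for example one generated by yacc or bison) would build the tree from the tokens your scanner returns. The point is that a `while' loop simply re-evaluates its subtree, so the scanner never has to seek back within `yyin'.

```c
#include <stdlib.h>

/* Hypothetical AST for a tiny language whose variables live in vars[]. */
enum kind { AST_NUM, AST_VAR, AST_ADD, AST_LT, AST_ASSIGN, AST_SEQ, AST_WHILE };

struct node {
    enum kind kind;
    int value;                  /* AST_NUM: literal; AST_VAR/AST_ASSIGN: variable index */
    struct node *left, *right;  /* operands; AST_ASSIGN: value in left;
                                   AST_SEQ: first/second; AST_WHILE: condition/body */
};

static struct node *mk(enum kind k, int value, struct node *left, struct node *right)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->kind = k;
    n->value = value;
    n->left = left;
    n->right = right;
    return n;
}

/* Walks the tree. A while loop just re-evaluates its subtrees, so the
   scanner never has to go back to an earlier position in the input. */
static int eval(const struct node *n, int *vars)
{
    switch (n->kind) {
    case AST_NUM:    return n->value;
    case AST_VAR:    return vars[n->value];
    case AST_ADD:    return eval(n->left, vars) + eval(n->right, vars);
    case AST_LT:     return eval(n->left, vars) < eval(n->right, vars);
    case AST_ASSIGN: return vars[n->value] = eval(n->left, vars);
    case AST_SEQ:    eval(n->left, vars); return eval(n->right, vars);
    case AST_WHILE:
        while (eval(n->left, vars))
            eval(n->right, vars);
        return 0;
    }
    return 0;
}

static void free_tree(struct node *n)
{
    if (n != NULL) {
        free_tree(n->left);
        free_tree(n->right);
        free(n);
    }
}
```

With vars[0] as `i' and vars[1] as `sum', both starting at 0, evaluating the tree for `while (i < 5) { sum = sum + i; i = i + 1; }' leaves 5 in `i' and 10 in `sum'.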
+
+
+File: flex.info, Node: ERASEME55, Next: ERASEME56, Prev: I need to scan if-then-else blocks and while loops, Up: FAQ
+
+ERASEME55
+=========
+
+
+ To: Colin Paul Adams <colin@colina.demon.co.uk>
+ Subject: Re: Flex C++ classes and Bison
+ In-reply-to: Your message of 09 Aug 1997 17:11:41 PDT.
+ Date: Fri, 15 Aug 1997 10:48:19 PDT
+ From: Vern Paxson <vern>
+
+ > #define YY_DECL int yylex (YYSTYPE *lvalp, struct parser_control
+ > *parm)
+ >
+ > I have been trying to get this to work as a C++ scanner, but it does
+ > not appear to be possible (warning that it matches no declarations in
+ > yyFlexLexer, or something like that).
+ >
+ > Is this supposed to be possible, or is it being worked on (I DID
+ > notice the comment that scanner classes are still experimental, so I'm
+ > not too hopeful)?
+
+ What you need to do is derive a subclass from yyFlexLexer that provides
+ the above yylex() method, squirrels away lvalp and parm into member
+ variables, and then invokes yyFlexLexer::yylex() to do the regular scanning.
+
+ Vern
+
+
+File: flex.info, Node: ERASEME56, Next: ERASEME57, Prev: ERASEME55, Up: FAQ
+
+ERASEME56
+=========
+
+
+ To: Mikael.Latvala@lmf.ericsson.se
+ Subject: Re: Possible mistake in Flex v2.5 document
+ In-reply-to: Your message of Fri, 05 Sep 1997 16:07:24 PDT.
+ Date: Fri, 05 Sep 1997 10:01:54 PDT
+ From: Vern Paxson <vern>
+
+ > In that example you show how to count comment lines when using
+ > C style /* ... */ comments. My question is, shouldn't you take into
+ > account a scenario where end of a comment marker occurs inside
+ > character or string literals?
+
+ The scanner certainly needs to also scan character and string literals.
+ However it does that (there's an example in the man page for strings), the
+ lexer will recognize the beginning of the literal before it runs across the
+ embedded "/*". Consequently, it will finish scanning the literal before it
+ even considers the possibility of matching "/*".
+
+ Example:
+
+ '([^'\\\n]|{ESCAPE_SEQUENCE})*'
+
+ will match all the text between the ''s (inclusive). So the lexer
+ considers this as a token beginning at the first ', and doesn't even
+ attempt to match other tokens inside it.
+
+ I think this subtlety is not worth putting in the manual, as I suspect
+ it would confuse more people than it would enlighten.
+
+ Vern
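+
+ The same ordering argument can be seen in a hand-written C++ analogue
+(not flex output; the function names are hypothetical). Once a string or
+character literal has started, it is consumed whole, so a "/*" inside it
+is never considered as the start of a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Skip a quoted literal that starts at src[i] (the opening quote) and return
// the index just past its closing quote. Backslash escapes are honored, so
// an escaped quote does not end the literal early.
static std::size_t skip_literal(const std::string &src, std::size_t i) {
    const char quote = src[i++];
    while (i < src.size() && src[i] != quote) {
        if (src[i] == '\\' && i + 1 < src.size())
            ++i;  // step over the escaped character
        ++i;
    }
    return i < src.size() ? i + 1 : i;  // unterminated literal: stop at end
}

// Count /* ... */ comments, ignoring any "/*" that appears inside a string
// or character literal: a literal is consumed whole before its contents
// could ever be examined as a possible comment opener.
int count_comments(const std::string &src) {
    int comments = 0;
    std::size_t i = 0;
    while (i < src.size()) {
        if (src[i] == '"' || src[i] == '\'') {
            i = skip_literal(src, i);
        } else if (src.compare(i, 2, "/*") == 0) {
            ++comments;
            const std::size_t end = src.find("*/", i + 2);
            i = (end == std::string::npos) ? src.size() : end + 2;
        } else {
            ++i;
        }
    }
    return comments;
}
```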
+
+
+File: flex.info, Node: ERASEME57, Next: Is there a repository for flex scanners?, Prev: ERASEME56, Up: FAQ
+
+ERASEME57
+=========
+
+
+ To: "Marty Leisner" <leisner@sdsp.mc.xerox.com>
+ Subject: Re: flex limitations
+ In-reply-to: Your message of Sat, 06 Sep 1997 11:27:21 PDT.
+ Date: Mon, 08 Sep 1997 11:38:08 PDT
+ From: Vern Paxson <vern>
+
+ > %%
+ > [a-zA-Z]+ /* skip a line */
+ > { printf("got %s\n", yytext); }
+ > %%
+
+ What version of flex are you using? If I feed this to 2.5.4, it complains:
+
+ "bug.l", line 5: EOF encountered inside an action
+ "bug.l", line 5: unrecognized rule
+ "bug.l", line 5: fatal parse error
+
+ Not the world's greatest error message, but it manages to flag the problem.
+
+ (With the introduction of start condition scopes, flex can't accommodate
+ an action on a separate line, since it's ambiguous with an indented rule.)
+
+ You can get 2.5.4 from ftp.ee.lbl.gov.
+
+ Vern
+
+
+File: flex.info, Node: Is there a repository for flex scanners?, Next: How can I conditionally compile or preprocess my flex input file?, Prev: ERASEME57, Up: FAQ
+
+Is there a repository for flex scanners?
+========================================
+
+ Not that we know of. You might try asking on comp.compilers.
+
+
+File: flex.info, Node: How can I conditionally compile or preprocess my flex input file?, Next: Where can I find grammars for lex and yacc?, Prev: Is there a repository for flex scanners?, Up: FAQ
+
+How can I conditionally compile or preprocess my flex input file?
+=================================================================
+
+ Flex doesn't have a preprocessor like C does. You might try using
+m4, or the C preprocessor plus a sed script to clean up the result.
+
+
+File: flex.info, Node: Where can I find grammars for lex and yacc?, Next: I get an end-of-buffer message for each character scanned., Prev: How can I conditionally compile or preprocess my flex input file?, Up: FAQ
+
+Where can I find grammars for lex and yacc?
+===========================================
+
+ In the sources for flex and bison.
+
+
+File: flex.info, Node: I get an end-of-buffer message for each character scanned., Next: unnamed-faq-62, Prev: Where can I find grammars for lex and yacc?, Up: FAQ
+
+I get an end-of-buffer message for each character scanned.
+==========================================================
+
+ This will happen if your LexerInput() function returns only one
+character at a time, which can happen either if your scanner is
+"interactive", or if the streams library on your platform always
+returns 1 for yyin->gcount().
+
+ Solution: override LexerInput() with a version that returns whole
+buffers.
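+
+ A sketch of such a bulk-reading routine, written as a free C++ function
+over std::istream so it compiles without FlexLexer.h. In a real scanner
+the same body would go into an override of LexerInput(char* buf, int
+max_size) in a class derived from yyFlexLexer, reading from yyin
+(assumption: check that signature against the FlexLexer.h of your flex
+version). It asks the stream buffer for a whole block with sgetn(), whose
+return value is the number of characters copied, so it does not depend
+on gcount().

```cpp
#include <cassert>
#include <istream>
#include <sstream>    // std::istringstream, for trying the function out
#include <streambuf>
#include <string>

// Fill as much of buf as possible in one call.
// Returns the number of characters read, 0 at end of input, -1 on error.
int bulk_lexer_input(std::istream &in, char *buf, int max_size) {
    std::streambuf *sb = in.rdbuf();
    if (sb == nullptr)
        return -1;                                 // no stream buffer: report an error
    std::streamsize n = sb->sgetn(buf, max_size);  // one bulk read, not one char
    return static_cast<int>(n);                    // 0 tells flex input is exhausted
}
```

+ If the scanner is interactive, flex deliberately reads one character
+at a time; generating a batch scanner (flex -B) avoids that case.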
+
+
+File: flex.info, Node: unnamed-faq-62, Next: unnamed-faq-63, Prev: I get an end-of-buffer message for each character scanned., Up: FAQ
+
+unnamed-faq-62
+==============
+
+
+ To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
+ Subject: Re: Flex maximums
+ In-reply-to: Your message of Mon, 17 Nov 1997 17:16:06 PST.
+ Date: Mon, 17 Nov 1997 17:16:15 PST
+ From: Vern Paxson <vern>
+
+ > I took a quick look into the flex-sources and altered some #defines in
+ > flexdefs.h:
+ >
+ > #define INITIAL_MNS 64000
+ > #define MNS_INCREMENT 1024000
+ > #define MAXIMUM_MNS 64000
+
+ The things to fix are to add a couple of zeroes to:
+
+ #define JAMSTATE -32766 /* marks a reference to the state that always jams */
+ #define MAXIMUM_MNS 31999
+ #define BAD_SUBSCRIPT -32767
+ #define MAX_SHORT 32700
+
+ and, if you get complaints about too many rules, make the following change too:
+
+ #define YY_TRAILING_MASK 0x200000
+ #define YY_TRAILING_HEAD_MASK 0x400000
+
+ - Vern
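+
+ One way to picture why the trailing-context masks limit the number of
+rules, as a sketch under an assumption not stated in the reply above:
+that flex packs a rule number and these flag bits into the same int, so
+every rule number must stay below the lowest flag bit. The helper
+functions are hypothetical; only the two mask values come from the reply.

```cpp
#include <cassert>

// The enlarged flag values suggested above.
constexpr int YY_TRAILING_MASK = 0x200000;
constexpr int YY_TRAILING_HEAD_MASK = 0x400000;

// Hypothetical helpers illustrating the assumed packing: a rule number is
// only unambiguous while it stays below the lowest flag bit, so raising the
// masks raises the ceiling on the number of rules.
constexpr bool rule_number_fits(int rule) {
    return rule >= 0 && rule < YY_TRAILING_MASK;
}
constexpr int mark_trailing(int rule) { return rule | YY_TRAILING_MASK; }
constexpr int rule_number(int entry) {
    return entry & ~(YY_TRAILING_MASK | YY_TRAILING_HEAD_MASK);
}
```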
+
+
+File: flex.info, Node: unnamed-faq-63, Next: unnamed-faq-64, Prev: unnamed-faq-62, Up: FAQ
+
+unnamed-faq-63
+==============
+
+
+ To: jimmey@lexis-nexis.com (Jimmey Todd)
+ Subject: Re: FLEX question regarding istream vs ifstream
+ In-reply-to: Your message of Mon, 08 Dec 1997 15:54:15 PST.
+ Date: Mon, 15 Dec 1997 13:21:35 PST
+ From: Vern Paxson <vern>
+
+ > stdin_handle = YY_CURRENT_BUFFER;
+ > ifstream fin( "aFile" );
+ > yy_switch_to_buffer( yy_create_buffer( fin, YY_BUF_SIZE ) );
+ >
+ > What I'm wanting to do, is pass the contents of a file thru one set
+ > of rules and then pass stdin thru another set... It works great if, I
+ > don't use the C++ classes. But since everything else that I'm doing is
+ > in C++, I thought I'd be consistent.
+ >
+ > The problem is that 'yy_create_buffer' is expecting an istream* as its
+ > first argument (as stated in the man page). However, fin is an ifstream
+ > object. Any ideas on what I might be doing wrong? Any help would be
+ > appreciated. Thanks!!
+
+ You need to pass &fin, to turn it into an ifstream* instead of an ifstream.
+ Then its type will be compatible with the expected istream*, because ifstream
+ is derived from istream.
+
+ Vern
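+
+ The type rule behind this answer can be checked at compile time. In the
+sketch below, read_word() is a hypothetical stand-in for any function
+that, like yy_create_buffer() in the question, takes an istream*.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <type_traits>

// An ifstream* converts implicitly to istream* because std::ifstream derives
// from std::istream; the ifstream object itself does not convert to a pointer.
static_assert(std::is_convertible<std::ifstream *, std::istream *>::value,
              "ifstream* -> istream* is an implicit derived-to-base conversion");
static_assert(!std::is_convertible<std::ifstream, std::istream *>::value,
              "passing the stream object itself is a type error");

// Hypothetical function taking an istream*; it reads one word.
std::string read_word(std::istream *s) {
    std::string word;
    *s >> word;
    return word;
}
```

+ In the questioner's scanner the call therefore becomes
+yy_switch_to_buffer( yy_create_buffer( &fin, YY_BUF_SIZE ) ).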
+
+
+File: flex.info, Node: unnamed-faq-64, Next: unnamed-faq-65, Prev: unnamed-faq-63, Up: FAQ
+
+unnamed-faq-64
+==============
+
+
+ To: Enda Fadian <fadiane@piercom.ie>
+ Subject: Re: Question related to Flex man page?
+ In-reply-to: Your message of Tue, 16 Dec 1997 15:17:34 PST.
+ Date: Tue, 16 Dec 1997 14:17:09 PST
+ From: Vern Paxson <vern>
+
+ > Can you explain to me what is meant by a long-jump in relation to flex?
+
+ Using the longjmp() function while inside yylex() or a routine called by it.
+
+ > What is the flex activation frame?
+
+ Just yylex()'s stack frame.
+
+ > As far as I can see yyrestart will bring me back to the start of the input
+ > file and using flex++ is not really an option!
+
+ No, yyrestart() doesn't imply a rewind, even though its name might sound
+ like it does. It tells the scanner to flush its internal buffers and
+ start reading from the given file at its present location.
+
+ Vern
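+
+ So, to really rescan a file from the top, rewind the stream yourself and
+then call yyrestart(). A minimal sketch with a hypothetical helper name;
+the yyrestart() call is left as a comment so the code compiles without a
+generated scanner.

```cpp
#include <cassert>
#include <cstdio>

// Rescan from the beginning: yyrestart(fp) alone keeps reading from fp's
// current position, so the file has to be rewound first.
void rewind_for_rescan(std::FILE *fp) {
    std::rewind(fp);    // back to offset 0; also clears EOF and error flags
    // yyrestart(fp);   // then have the scanner discard its buffered input
}
```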
+
+
+File: flex.info, Node: unnamed-faq-65, Next: unnamed-faq-66, Prev: unnamed-faq-64, Up: FAQ
+
+unnamed-faq-65
+==============
+
+
+ To: hassan@larc.info.uqam.ca (Hassan Alaoui)
+ Subject: Re: Need urgent Help
+ In-reply-to: Your message of Sat, 20 Dec 1997 19:38:19 PST.
+ Date: Sun, 21 Dec 1997 21:30:46 PST
+ From: Vern Paxson <vern>
+
+ > /usr/lib/yaccpar: In function `int yyparse()':
+ > /usr/lib/yaccpar:184: warning: implicit declaration of function `int yylex(...)'
+ >
+ > ld: Undefined symbol
+ > _yylex
+ > _yyparse
+ > _yyin
+
+ This is a known problem with Solaris C++ (and/or Solaris yacc). I believe
+ the fix is to explicitly insert some 'extern "C"' statements for the
+ corresponding routines/symbols.
+
+ Vern
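+
+ A sketch of those declarations as they might appear in a C++ source file
+that calls a scanner and parser compiled as C. The stub definitions at
+the end exist only so the example links on its own; in a real build they
+come from the C-compiled lex.yy.c and the yacc output.

```cpp
#include <cassert>
#include <cstdio>

// Without extern "C", the C++ compiler mangles these names and the linker
// looks for symbols the C-compiled scanner and parser never defined.
extern "C" {
int yylex(void);
int yyparse(void);
extern FILE *yyin;
}

// Stand-in definitions so the sketch links by itself (assumption: in a real
// program these live in the generated C files, not here).
extern "C" {
FILE *yyin = nullptr;
int yylex(void) { return 0; }                       // 0 = end of input
int yyparse(void) { return yylex() == 0 ? 0 : 1; }  // trivially "accepts"
}
```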
+
+
+File: flex.info, Node: unnamed-faq-66, Next: unnamed-faq-67, Prev: unnamed-faq-65, Up: FAQ
+
+unnamed-faq-66
+==============
+
+
+ To: mc0307@mclink.it
+ Cc: gnu@prep.ai.mit.edu
+ Subject: Re: [mc0307@mclink.it: Help request]
+ In-reply-to: Your message of Fri, 12 Dec 1997 17:57:29 PST.
+ Date: Sun, 21 Dec 1997 22:33:37 PST
+ From: Vern Paxson <vern>
+
+ > This is my definition for float and integer types:
+ > . . .
+ > NZD [1-9]
+ > ...
+ > I've tested my program on other lex versions (on UNIX Sun Solaris and HP
+ > UNIX) and it works well, so I think that my definitions are correct.
+ > Are there any differences between Lex and Flex?
+
+ There are indeed differences, as discussed in the man page. The one
+ you are probably running into is that when flex expands a name definition,
+ it puts parentheses around the expansion, while lex does not. There's
+ an example in the man page of how this can lead to different matching.
+ Flex's behavior complies with the POSIX standard (or at least with the
+ last POSIX draft I saw).
+
+ Vern
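+
+ The effect of the added parentheses can be reproduced with std::regex.
+The definition NAME [A-Z][A-Z0-9]* and the rule pattern foo{NAME}? are a
+hypothetical illustration, not the poster's definitions. Textual
+substitution (lex) lets the trailing ? bind only to the last element,
+while flex's parenthesized expansion makes all of NAME optional.

```cpp
#include <cassert>
#include <regex>
#include <string>

// lex-style expansion: foo[A-Z][A-Z0-9]*?  which means  foo[A-Z]([A-Z0-9]*)?
const std::regex lex_expansion("foo[A-Z]([A-Z0-9]*)?");

// flex-style expansion: foo([A-Z][A-Z0-9]*)?
const std::regex flex_expansion("foo([A-Z][A-Z0-9]*)?");

bool matches(const std::regex &re, const std::string &s) {
    return std::regex_match(s, re);
}
```

+ With these patterns "foo" alone matches the flex expansion but not the
+lex one, while "fooX1" matches both.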
+
+
+File: flex.info, Node: unnamed-faq-67, Next: unnamed-faq-68, Prev: unnamed-faq-66, Up: FAQ
+
+unnamed-faq-67
+==============
+
+
+ To: hassan@larc.info.uqam.ca (Hassan Alaoui)
+ Subject: Re: Thanks
+ In-reply-to: Your message of Mon, 22 Dec 1997 16:06:35 PST.
+ Date: Mon, 22 Dec 1997 14:35:05 PST
+ From: Vern Paxson <vern>
+
+ > Thank you very much for your help. I compile and link well with C++ while
+ > declaring 'yylex ...' extern, but a little problem remains. I get a
+ > segmentation fault when executing (I linked with the lfl library) while it
+ > works well when using LEX instead of flex. Do you have any ideas about the
+ > reason for this?
+
+ The one possible reason for this that comes to mind is if you've defined
+ yytext as "extern char yytext[]" (which is what lex uses) instead of
+ "extern char *yytext" (which is what flex uses). If it's not that, then
+ I'm afraid I don't know what the problem might be.
+
+ Vern
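+
+ A sketch of the matching declaration. By default (%pointer) flex defines
+yytext as a char* into its buffer, so code that declares it as an array
+reads the bytes of the pointer variable as if they were the token text.
+The buffer, its contents, and copy_token() are hypothetical stand-ins so
+the example runs without a generated scanner.

```cpp
#include <cassert>
#include <cstring>

// What other source files should say when they refer to flex's yytext.
extern char *yytext;        // correct for flex's default %pointer mode
// extern char yytext[];    // lex-style; mismatches flex's char* yytext

// Stand-in for the scanner's own definition (assumption: in a real program
// it lives in lex.yy.c and points at the current token).
static char demo_buffer[] = "identifier rest-of-input";
char *yytext = demo_buffer;

// Hypothetical helper: copy the current token of length len into dst.
void copy_token(char *dst, int len) {
    std::memcpy(dst, yytext, len);
    dst[len] = '\0';
}
```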
+
+
+File: flex.info, Node: unnamed-faq-68, Next: unnamed-faq-69, Prev: unnamed-faq-67, Up: FAQ
+
+unnamed-faq-68
+==============
+
+
+ To: "Bart Niswonger" <NISWONGR@almaden.ibm.com>
+ Subject: Re: flex 2.5: c++ scanners & start conditions
+ In-reply-to: Your message of Tue, 06 Jan 1998 10:34:21 PST.
+ Date: Tue, 06 Jan 1998 19:19:30 PST
+ From: Vern Paxson <vern>
+
+ > The problem is that when I do this (using %option c++) start
+ > conditions seem to not apply.
+
+ The BEGIN macro modifies the yy_start variable. For C scanners, this
+ is a static with scope visible through the whole file. For C++ scanners,
+ it's a member variable, so it only has visible scope within a member
+ function. Your lexbegin() routine is not a member function when you
+ build a C++ scanner, so it's not modifying the correct yy_start. The
+ diagnostic that indicates this is that you found you needed to add
+ a declaration of yy_start in order to get your scanner to compile when
+ using C++; instead, the correct fix is to make lexbegin() a member
+ function (by deriving from yyFlexLexer).
+
+ Vern
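+
+ A self-contained sketch contrasting the two versions. LexerBase stands
+in for yyFlexLexer, COMMENT is a hypothetical start condition, and the
+1 + 2 * condition encoding is assumed to mirror what flex's BEGIN and
+YY_START macros use.

```cpp
#include <cassert>

// Stand-in for yyFlexLexer. In a C++ scanner yy_start is a data member, so
// only member functions see the scanner's real start state.
class LexerBase {
public:
    virtual ~LexerBase() = default;
    // Same decoding as flex's YY_START macro (assumed).
    int current_condition() const { return (yy_start - 1) / 2; }
protected:
    int yy_start = 1;  // INITIAL, stored as 1 + 2 * 0
};

// Hypothetical start-condition number, as flex would #define for "%x COMMENT".
constexpr int COMMENT = 1;

// The fix: lexbegin() is a member of a class derived from the scanner, so it
// updates this scanner's yy_start, just as BEGIN does inside an action.
class MyLexer : public LexerBase {
public:
    void lexbegin(int condition) { yy_start = 1 + 2 * condition; }
};

// The bug: a free function with its own yy_start changes some other
// variable, so the scanner's start condition never moves.
static int yy_start = 1;
void lexbegin_free(int condition) { yy_start = 1 + 2 * condition; }
```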
+
+
+File: flex.info, Node: unnamed-faq-69, Next: unnamed-faq-70, Prev: unnamed-faq-68, Up: FAQ
+
+unnamed-faq-69
+==============
+
+
+ To: "Boris Zinin" <boris@ippe.rssi.ru>
+ Subject: Re: current position in flex buffer
+ In-reply-to: Your message of Mon, 12 Jan 1998 18:58:23 PST.
+ Date: Mon, 12 Jan 1998 12:03:15 PST
+ From: Vern Paxson <vern>
+
+ > The problem is how to determine the current position in flex active
+ > buffer when a rule is matched....
+
+ You will need to keep track of this explicitly, such as by redefining
+ YY_USER_ACTION to count the number of characters matched.
+
+ The latest flex release, by the way, is 2.5.4, available from ftp.ee.lbl.gov.
+
+ Vern
+
+
+File: flex.info, Node: unnamed-faq-70, Next: unnamed-faq-71, Prev: unnamed-faq-69, Up: FAQ
+
+unnamed-faq-70
+==============
+
+
+ To: Bik.Dhaliwal@bis.org
+ Subject: Re: Flex question
+ In-reply-to: Your message of Mon, 26 Jan 1998 13:05:35 PST.
+ Date: Tue, 27 Jan 1998 22:41:52 PST
+ From: Vern Paxson <vern>
+
+ > That requirement involves knowing
+ > the character position at which a particular token was matched
+ > in the lexer.
+
+ The way you have to do this is by explicitly keeping track of where
+ you are in the file, by counting the number of characters scanned
+ for each token (available in yyleng). It may prove convenient to
+ do this by redefining YY_USER_ACTION, as described in the manual.
+
+ Vern
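+
+ A sketch of that bookkeeping. In a .l file the macro would go in the
+definitions section, and flex runs it before each rule's action; here a
+hand-written loop plays the scanner (word_offsets(), char_offset and
+token_offset are hypothetical names, and yyleng is a local stand-in for
+flex's variable) so the example runs on its own.

```cpp
#include <cassert>
#include <cstring>

static long char_offset = 0;   // characters consumed so far
static long token_offset = 0;  // offset at which the current token starts
static int yyleng = 0;         // stand-in for flex's yyleng

// In a real scanner: record where this match starts, then advance.
#define YY_USER_ACTION { token_offset = char_offset; char_offset += yyleng; }

// Hypothetical "scanner": split text on spaces and record each word's offset.
int word_offsets(const char *text, long *offsets, int max) {
    char_offset = 0;
    int n = 0;
    for (const char *p = text; *p; p += yyleng) {
        // Length of this "match": one space, or a whole run of non-spaces.
        yyleng = (*p == ' ') ? 1 : static_cast<int>(std::strcspn(p, " "));
        YY_USER_ACTION
        if (*p != ' ' && n < max)
            offsets[n++] = token_offset;
    }
    return n;
}
```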
+
+
+File: flex.info, Node: unnamed-faq-71, Next: unnamed-faq-72, Prev: unnamed-faq-70, Up: FAQ
+
+unnamed-faq-71
+==============
+
+
+ To: Vladimir Alexiev <vladimir@cs.ualberta.ca>
+ Subject: Re: flex: how to control start condition from parser?
+ In-reply-to: Your message of Mon, 26 Jan 1998 05:50:16 PST.
+ Date: Tue, 27 Jan 1998 22:45:37 PST
+ From: Vern Paxson <vern>
+
+ > It seems useful for the parser to be able to tell the lexer about such
+ > context dependencies, because then they don't have to be limited to
+ > local or sequential context.
+
+ One way to do this is to have the parser call a stub routine that's
+ included in the scanner's .l file, and consequently that has access to
+ BEGIN. The only ugliness is that the parser can't pass in the state
+ it wants, because the start condition names aren't visible to it - but
+ if you don't have many such states, then using a different set of names
+ doesn't seem like too much of a burden.
+
+ While generating a .h file like you suggest is certainly cleaner,
+ flex development has come to a virtual stand-still :-(, so a workaround
+ like the above is much more pragmatic than waiting for a new feature.
+
+ Vern
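+
+ A sketch of that stub routine. The #defines below stand in for what
+flex generates in the scanner (BEGIN is modeled on flex's own
+"yy_start = 1 + 2 *" definition); IN_TYPE, LexMode, and the function
+names are hypothetical. The parser uses its own enum, and the stub maps
+it onto start conditions.

```cpp
#include <cassert>

// ---- This part would live in the scanner's .l file. ----
#define INITIAL 0
#define IN_TYPE 1
static int yy_start = 1;
#define BEGIN yy_start = 1 + 2 *

// Parser-visible names: the parser cannot see the #defines above.
enum LexMode { LEX_NORMAL, LEX_TYPE_CONTEXT };

// The stub routine the parser calls. It is compiled with the scanner, so
// BEGIN and the start-condition names are in scope.
void lexer_set_mode(LexMode mode) {
    switch (mode) {
    case LEX_NORMAL:       BEGIN INITIAL; break;
    case LEX_TYPE_CONTEXT: BEGIN IN_TYPE; break;
    }
}

// Decodes yy_start the way flex's YY_START does (for checking the sketch).
int lexer_current_condition() { return (yy_start - 1) / 2; }
```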
+
+
+File: flex.info, Node: unnamed-faq-72, Next: unnamed-faq-73, Prev: unnamed-faq-71, Up: FAQ
+
+unnamed-faq-72
+==============
+
+
+ To: Barbara Denny <denny@3com.com>
+ Subject: Re: freebsd flex bug?
+ In-reply-to: Your message of Fri, 30 Jan 1998 12:00:43 PST.
+ Date: Fri, 30 Jan 1998 12:42:32 PST
+ From: Vern Paxson <vern>
+
+ > lex.yy.c:1996: parse error before `='
+
+ This is the key line for identifying the error. (It may help to pinpoint
+ it by using flex -L, so it doesn't generate #line directives in its
+ output.) I will bet you heavy money that you have a start condition
+ name that is also a variable name, or something like that; flex spits
+ out #define's for each start condition name, mapping them to a number,
+ so you can wind up with:
+
+ %x foo
+ %%
+ ...
+ %%
+ void bar()
+ {
+ int foo = 3;
+ }
+
+ and the penultimate line will turn into "int 1 = 3" after C preprocessing,
+ since flex will put "#define foo 1" in the generated scanner.
+
+ Vern
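+
+ One simple way to avoid the clash is a naming convention, for example
+spelling start conditions in capitals so they never coincide with
+variable names. A sketch, where FOO_STATE and its value 1 are assumed
+for illustration:

```cpp
#include <cassert>

// For "%x FOO_STATE" flex emits a #define mapping the name to a number,
// along the lines of this one (the value 1 is assumed for the example).
#define FOO_STATE 1

// Because the start condition is spelled differently from every variable,
// this local is safe. Had the condition been named foo, the preprocessor
// would have turned the declaration into "int 1 = 3;".
int bar() {
    int foo = 3;
    return foo + FOO_STATE;
}
```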
+
+
+File: flex.info, Node: unnamed-faq-73, Next: unnamed-faq-74, Prev: unnamed-faq-72, Up: FAQ
+
+unnamed-faq-73
+==============
+
+
+ To: Maurice Petrie <mpetrie@infoscigroup.com>
+ Subject: Re: Lost flex .l file
+ In-reply-to: Your message of Mon, 02 Feb 1998 14:10:01 PST.
+ Date: Mon, 02 Feb 1998 11:15:12 PST
+ From: Vern Paxson <vern>
+
+ > I am curious as to
+ > whether there is a simple way to backtrack from the generated source to
+ > reproduce the lost list of tokens we are searching on.
+
+ In theory, it's straight-forward to go from the DFA representation
+ back to a regular-expression representation - the two are isomorphic.
+ In practice, a huge headache, because you have to unpack all the tables
+ back into a single DFA representation, and then write a program to munch
+ on that and translate it into an RE.
+
+ Sorry for the less-than-happy news ...
+
+ Vern
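+
+ For the "in theory" half, here is a sketch of the textbook
+state-elimination construction, starting from a plain transition table.
+Recovering such a table from flex's compressed tables, the hard part
+mentioned above, is not attempted. The function name is hypothetical,
+alphabet symbols are assumed to be plain letters or digits (no regex
+metacharacters), and the output is valid std::regex (ECMAScript) syntax.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

struct Edge {
    bool present;    // false: no transition (the empty language)
    std::string re;  // regex text when present; "" denotes the empty string
};
static const Edge kNone = {false, ""};

static Edge alt(const Edge &a, const Edge &b) {  // union of two edges
    if (!a.present) return b;
    if (!b.present) return a;
    if (a.re.empty() && b.re.empty()) return a;
    if (a.re.empty()) return {true, "(" + b.re + ")?"};
    if (b.re.empty()) return {true, "(" + a.re + ")?"};
    return {true, "(" + a.re + "|" + b.re + ")"};
}
static std::string seq(const std::string &a, const std::string &b) {
    if (a.empty()) return b;
    if (b.empty()) return a;
    return "(" + a + ")(" + b + ")";
}
static std::string star(const Edge &a) {  // no self-loop: nothing to repeat
    return (a.present && !a.re.empty()) ? "(" + a.re + ")*" : "";
}

// delta[s][c] is the next state from s on symbol alphabet[c].
std::string dfa_to_regex(const std::vector<std::vector<int>> &delta,
                         const std::string &alphabet, int start,
                         const std::vector<bool> &accepting) {
    const int n = static_cast<int>(delta.size());
    const int S = n, F = n + 1;  // fresh start and final states
    std::vector<std::vector<Edge>> R(n + 2, std::vector<Edge>(n + 2, kNone));
    for (int s = 0; s < n; ++s)
        for (std::size_t c = 0; c < alphabet.size(); ++c) {
            Edge &e = R[s][delta[s][c]];
            e = alt(e, Edge{true, std::string(1, alphabet[c])});
        }
    R[S][start] = Edge{true, ""};                    // epsilon into the old start
    for (int s = 0; s < n; ++s)
        if (accepting[s]) R[s][F] = Edge{true, ""};  // epsilon out of each accept
    for (int k = 0; k < n; ++k) {                    // eliminate every DFA state
        const std::string loop = star(R[k][k]);
        for (int i = 0; i < n + 2; ++i) {
            if (i == k || !R[i][k].present) continue;
            for (int j = 0; j < n + 2; ++j) {
                if (j == k || !R[k][j].present) continue;
                R[i][j] = alt(R[i][j], Edge{true, seq(seq(R[i][k].re, loop), R[k][j].re)});
            }
        }
        for (int i = 0; i < n + 2; ++i) R[i][k] = R[k][i] = kNone;
    }
    return R[S][F].present ? R[S][F].re : "[^\\s\\S]";  // nothing accepted
}
```

+ Even for small machines the resulting expression grows quickly, which
+is part of why recovering readable rules this way is a headache.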
+
+
+File: flex.info, Node: unnamed-faq-74, Next: unnamed-faq-75, Prev: unnamed-faq-73, Up: FAQ
+
+unnamed-faq-74
+==============
+
+
+ To: jimmey@lexis-nexis.com (Jimmey Todd)
+ Subject: Re: Flex performance question
+ In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
+ Date: Thu, 19 Feb 1998 08:48:51 PST
+ From: Vern Paxson <vern>
+
+ > What I have found, is that the smaller the data chunk, the faster the
+ > program executes. This is the opposite of what I expected. Should this be
+ > happening this way?
+
+ This is exactly what will happen if your input file has embedded NULs.
+ From the man page:
+
+ A final note: flex is slow when matching NUL's, particularly
+ when a token contains multiple NUL's. It's best to write
+ rules which match short amounts of text if it's anticipated
+ that the text will often include NUL's.
+
+ So that's the first thing to look for.
+
+ Vern
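+
+ A quick way to check an input file for embedded NULs, as a C++ sketch
+(count_nuls() is a hypothetical helper). For files, open the stream with
+std::ios::binary so no characters are translated on the way in.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Count embedded NUL bytes in everything remaining in the stream.
std::size_t count_nuls(std::istream &in) {
    const auto n = std::count(std::istreambuf_iterator<char>(in),
                              std::istreambuf_iterator<char>(), '\0');
    return static_cast<std::size_t>(n);
}
```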
+
+
+File: flex.info, Node: unnamed-faq-75, Next: unnamed-faq-76, Prev: unnamed-faq-74, Up: FAQ
+
+unnamed-faq-75
+==============
+
+
+ To: jimmey@lexis-nexis.com (Jimmey Todd)
+ Subject: Re: Flex performance question
+ In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
+ Date: Thu, 19 Feb 1998 15:42:25 PST
+ From: Vern Paxson <vern>
+
+ So there are several problems.
+
+ First, to go fast, you want to match as much text as possible, which
+ your scanners don't in the case that what they're scanning is *not*
+ a <RN> tag. So you want a rule like:
+
+ [^<]+
+
+ Second, C++ scanners are particularly slow if they're interactive,
+ which they are by default. Using -B speeds it up by a factor of 3-4
+ on my workstation.
+
+ Third, C++ scanners that use the istream interface are slow, because
+ of how poorly implemented istreams are. I built two versions of
+ the following scanner:
+
+ %%
+ .*\n
+ .*
+ %%
+
+ and the C version inhales a 2.5MB file on my workstation in 0.8 seconds.
+ The C++ istream version, using -B, takes 3.8 seconds.
+
+ Vern
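+
+ The first point, taking the longest possible run of plain text as one
+token as [^<]+ does, can be illustrated outside flex. In this C++ sketch
+(split_tags() is hypothetical, and tags are assumed to look like <...>)
+each tag and each maximal stretch of text between tags becomes a single
+token, so "some text <RN> more text" yields three tokens rather than one
+per character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

std::vector<std::string> split_tags(const std::string &s) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < s.size()) {
        std::size_t end;
        if (s[i] == '<') {
            end = s.find('>', i);  // a whole tag
            end = (end == std::string::npos) ? s.size() : end + 1;
        } else {
            end = s.find('<', i);  // the longest run of non-'<' text
            if (end == std::string::npos) end = s.size();
        }
        tokens.push_back(s.substr(i, end - i));
        i = end;
    }
    return tokens;
}
```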
+