Diffstat (limited to 'doc/flex.info-5')
-rw-r--r-- doc/flex.info-5 1330
1 files changed, 0 insertions, 1330 deletions
diff --git a/doc/flex.info-5 b/doc/flex.info-5
deleted file mode 100644
index 8935ccf..0000000
--- a/doc/flex.info-5
+++ /dev/null
@@ -1,1330 +0,0 @@
-This is flex.info, produced by makeinfo version 4.5 from flex.texi.
-
-INFO-DIR-SECTION Programming
-START-INFO-DIR-ENTRY
-* flex: (flex). Fast lexical analyzer generator (lex replacement).
-END-INFO-DIR-ENTRY
-
-
- The flex manual is placed under the same licensing conditions as the
-rest of flex:
-
- Copyright (C) 1990, 1997 The Regents of the University of California.
-All rights reserved.
-
- This code is derived from software contributed to Berkeley by Vern
-Paxson.
-
- The United States Government has rights in this work pursuant to
-contract no. DE-AC03-76SF00098 between the United States Department of
-Energy and the University of California.
-
- Redistribution and use in source and binary forms, with or without
-modification, are permitted provided that the following conditions are
-met:
-
- 1. Redistributions of source code must retain the above copyright
- notice, this list of conditions and the following disclaimer.
-
- 2. Redistributions in binary form must reproduce the above copyright
- notice, this list of conditions and the following disclaimer in the
- documentation and/or other materials provided with the
- distribution.
- Neither the name of the University nor the names of its contributors
-may be used to endorse or promote products derived from this software
-without specific prior written permission.
-
- THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
-WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
-MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
-
-File: flex.info, Node: How do I match any string not matched in the preceding rules?, Next: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Prev: How can I build a two-pass scanner?, Up: FAQ
-
-How do I match any string not matched in the preceding rules?
-=============================================================
-
- One way to assign precedence is to place the more specific rules
-first. If two rules would match the same input (the same sequence of
-characters), then the rule listed first in the `flex' input wins; e.g.,
-
-
- %%
- foo[a-zA-Z_]+ return FOO_ID;
- bar[a-zA-Z_]+ return BAR_ID;
- [a-zA-Z_]+ return GENERIC_ID;
-
- Note that the rule `[a-zA-Z_]+' must come *after* the others. It
-will match the same amount of text as the more specific rules, and in
-that case the `flex' scanner will pick the first rule listed in your
-scanner as the one to match. (Rule order only breaks ties; `flex'
-always prefers whichever rule matches the most text.)
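As an illustration only, the selection rule can be modelled in plain C. The matcher functions below (`ident_run', `prefixed_ident', `pick_rule') are hypothetical hand-written stand-ins for the three rules above. A real `flex' scanner runs one DFA for all rules at once, but it resolves the outcome the same way: longest match first, then earliest rule.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token codes from the example rules above. */
enum token { NO_MATCH = 0, FOO_ID, BAR_ID, GENERIC_ID };

/* Length of the longest run of [a-zA-Z_] starting at s. */
static size_t ident_run(const char *s)
{
    size_t n = 0;
    while (isalpha((unsigned char) s[n]) || s[n] == '_')
        n++;
    return n;
}

/* Longest match of prefix followed by [a-zA-Z_]+, or 0 if none. */
static size_t prefixed_ident(const char *s, const char *prefix)
{
    size_t plen = strlen(prefix);
    size_t rest;
    if (strncmp(s, prefix, plen) != 0)
        return 0;
    rest = ident_run(s + plen);
    return rest > 0 ? plen + rest : 0;   /* '+' needs at least one char */
}

/* Pick a rule the way flex does: the longest match wins, and among
 * rules matching the same length, the one listed first wins. */
enum token pick_rule(const char *s, size_t *len_out)
{
    const enum token toks[3] = { FOO_ID, BAR_ID, GENERIC_ID };
    size_t lens[3];
    enum token best = NO_MATCH;
    size_t best_len = 0;
    int i;

    lens[0] = prefixed_ident(s, "foo");   /* foo[a-zA-Z_]+ */
    lens[1] = prefixed_ident(s, "bar");   /* bar[a-zA-Z_]+ */
    lens[2] = ident_run(s);               /* [a-zA-Z_]+    */

    for (i = 0; i < 3; i++) {
        /* Strictly greater: a later rule of equal length never wins. */
        if (lens[i] > best_len) {
            best_len = lens[i];
            best = toks[i];
        }
    }
    *len_out = best_len;
    return best;
}
```

With this model, `foobar' comes back as `FOO_ID' because the first and third rules tie and the earlier one wins, while `foo' by itself comes back as `GENERIC_ID', since `foo[a-zA-Z_]+' needs at least one more identifier character after the prefix.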
-
-
-File: flex.info, Node: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Next: Is there a way to make flex treat NULL like a regular character?, Prev: How do I match any string not matched in the preceding rules?, Up: FAQ
-
-I am trying to port code from AT&T lex that uses yysptr and yysbuf.
-===================================================================
-
- Those are internal variables pointing into the AT&T scanner's input
-buffer. I imagine they're being manipulated in user versions of the
-`input()' and `unput()' functions. If so, what you need to do is
-analyze those functions to figure out what they're doing, and then
-replace `input()' with an appropriate definition of `YY_INPUT'. You
-shouldn't need to (and must not) replace `flex''s `unput()' function.
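A replacement `YY_INPUT' only has to copy at most `max_size' bytes into `buf' and report how many it stored, with 0 (`YY_NULL') meaning end of input. The sketch below assumes the old `input()' was reading from an in-memory string; `struct string_source' and `string_source_read' are hypothetical names, and a real port would adapt them to whatever the AT&T code was actually doing with `yysptr' and `yysbuf'.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical input source: an in-memory string that the scanner
 * should read instead of a FILE. */
struct string_source {
    const char *text;
    size_t len;
    size_t pos;          /* index of the next unread byte */
};

/* Follows the YY_INPUT contract: copy at most max_size bytes into buf
 * and return how many were copied; 0 (YY_NULL) signals end of input. */
size_t string_source_read(struct string_source *src, char *buf, size_t max_size)
{
    size_t avail = src->len - src->pos;
    size_t n = avail < max_size ? avail : max_size;
    memcpy(buf, src->text + src->pos, n);
    src->pos += n;
    return n;
}

/* In the scanner's definitions section one might then write
 * (the_source being a hypothetical global):
 *
 *   #define YY_INPUT(buf, result, max_size) \
 *       result = string_source_read(&the_source, buf, max_size)
 */
```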
-
-
-File: flex.info, Node: Is there a way to make flex treat NULL like a regular character?, Next: Whenever flex can not match the input it says "flex scanner jammed"., Prev: I am trying to port code from AT&T lex that uses yysptr and yysbuf., Up: FAQ
-
-Is there a way to make flex treat NULL like a regular character?
-================================================================
-
- Yes, `\0' and `\x00' should both do the trick. Perhaps you have an
-ancient version of `flex'. The latest release is version 2.5.33.
-
-
-File: flex.info, Node: Whenever flex can not match the input it says "flex scanner jammed"., Next: Why doesnt flex have non-greedy operators like perl does?, Prev: Is there a way to make flex treat NULL like a regular character?, Up: FAQ
-
-Whenever flex can not match the input it says "flex scanner jammed".
-====================================================================
-
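- This message means that no rule matched the input and the default
-rule (which echoes unmatched text) has been suppressed, e.g. with `-s'
-or `%option nodefault'. You need to add a rule that matches the
-otherwise-unmatched text. For example,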
-
-
- %option yylineno
- %%
- [[a bunch of rules here]]
-
- . printf("bad input character '%s' at line %d\n", yytext, yylineno);
-
- Since `.' does not match a newline, make sure newlines are handled by
-some rule as well. See `%option default' for more information.
-
-
-File: flex.info, Node: Why doesnt flex have non-greedy operators like perl does?, Next: Memory leak - 16386 bytes allocated by malloc., Prev: Whenever flex can not match the input it says "flex scanner jammed"., Up: FAQ
-
-Why doesn't flex have non-greedy operators like perl does?
-==========================================================
-
- A DFA can do a non-greedy match by stopping the first time it enters
-an accepting state, instead of consuming input until it determines that
-no further matching is possible (a "jam" state). This is actually
-easier to implement than longest leftmost match (which flex does).
-
- But it's also much less useful than longest leftmost match. In
-general, when you find yourself wishing for non-greedy matching, that's
-usually a sign that you're trying to make the scanner do some parsing.
-That's generally the wrong approach, since it lacks the power to do a
-decent job. Better is to either introduce a separate parser, or to
-split the scanner into multiple scanners using (exclusive) start
-conditions.
-
- For example, you might switch to a separate start state once you've
-seen the `BEGIN'. In that state, you might then have a regex that will
-match `END' (to kick you out of the state), and perhaps `(.|\n)' to get
-a single character within the chunk ...
-
- This approach also has much better error-reporting properties.
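The start-condition approach can be mimicked in plain C to show why it behaves like a non-greedy match: once inside a chunk, the first `END' ends it. This is a hypothetical hand-written sketch (`collect_chunks' is not part of `flex'); in a real scanner the two states would be `INITIAL' and an exclusive start condition, and the comments mark the rule each branch corresponds to.

```c
#include <stddef.h>
#include <string.h>

enum scan_state { OUTSIDE, IN_CHUNK };   /* like INITIAL and an %x state */

/* Copy every character found between "BEGIN" and "END" into out
 * (which must have room for strlen(in) + 1 bytes). Outside a chunk
 * only the "BEGIN" keyword matters; inside, only "END" does, and
 * everything else is taken one character at a time, like (.|\n). */
void collect_chunks(const char *in, char *out)
{
    enum scan_state state = OUTSIDE;
    size_t o = 0;

    while (*in) {
        if (state == OUTSIDE) {
            if (strncmp(in, "BEGIN", 5) == 0) {
                state = IN_CHUNK;        /* action: BEGIN(CHUNK); */
                in += 5;
            } else {
                in++;                    /* text outside chunks is ignored */
            }
        } else if (strncmp(in, "END", 3) == 0) {
            state = OUTSIDE;             /* action: BEGIN(INITIAL); */
            in += 3;
        } else {
            out[o++] = *in++;            /* the (.|\n) rule */
        }
    }
    out[o] = '\0';
}
```

Given two chunks in a row, this collects the text of each one separately, whereas a single greedy pattern such as `BEGIN(.|\n)*END' would run from the first `BEGIN' to the last `END'.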
-
-
-File: flex.info, Node: Memory leak - 16386 bytes allocated by malloc., Next: How do I track the byte offset for lseek()?, Prev: Why doesnt flex have non-greedy operators like perl does?, Up: FAQ
-
-Memory leak - 16386 bytes allocated by malloc.
-==============================================
-
- UPDATED 2002-07-10: As of `flex' version 2.5.9, this leak means that
-you did not call `yylex_destroy()'. If you are using an earlier version
-of `flex', then read on.
-
- The leak is about 16426 bytes. That is, (8192 * 2 + 2) for the
-read-buffer, and about 40 for `struct yy_buffer_state' (depending upon
-alignment). The leak is in the non-reentrant C scanner only (NOT in the
-reentrant scanner, NOT in the C++ scanner). Since `flex' doesn't know
-when you are done, the buffer is never freed.
-
- However, the leak won't multiply since the buffer is reused no
-matter how many times you call `yylex()'.
-
- If you want to reclaim the memory when you are completely done
-scanning, then you might try this:
-
-
- /* For non-reentrant C scanner only. */
- yy_delete_buffer(YY_CURRENT_BUFFER);
- yy_init = 1;
-
- Note: `yy_init' is an "internal variable", and hasn't been tested in
-this situation. It is possible that some other globals may need
-resetting as well.
-
-
-File: flex.info, Node: How do I track the byte offset for lseek()?, Next: How do I use my own I/O classes in a C++ scanner?, Prev: Memory leak - 16386 bytes allocated by malloc., Up: FAQ
-
-How do I track the byte offset for lseek()?
-===========================================
-
-
- > We thought that it would be possible to have this number through the
- > evaluation of the following expression:
- >
- > seek_position = (no_buffers)*YY_READ_BUF_SIZE + yy_c_buf_p - YY_CURRENT_BUFFER->yy_ch_buf
-
- While this is the right idea, it has two problems. The first is that
-it's possible that `flex' will request less than `YY_READ_BUF_SIZE'
-during an invocation of `YY_INPUT' (or that your input source will
-return less even though `YY_READ_BUF_SIZE' bytes were requested). The
-second problem is that when refilling its internal buffer, `flex' keeps
-some characters from the previous buffer (because usually it's in the
-middle of a match, and needs those characters to construct `yytext' for
-the match once it's done). Because of this, `yy_c_buf_p -
-YY_CURRENT_BUFFER->yy_ch_buf' won't be exactly the number of characters
-already read from the current buffer.
-
- An alternative solution is to count the number of characters you've
-matched since starting to scan. This can be done by using
-`YY_USER_ACTION'. For example,
-
-
- #define YY_USER_ACTION num_chars += yyleng;
-
- (You need to be careful to update your bookkeeping if you use
-`yymore()', `yyless()', `unput()', or `input()'.)
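To see why the `YY_USER_ACTION' approach works, the sketch below replaces the real scanner with a hypothetical toy tokenizer (`toy_scan' and `struct toy_scanner' are illustrative names, not `flex' API) and performs the same `num_chars += yyleng' bookkeeping after each match. The start offset of the current token is then `num_chars - yyleng'. If an action calls `yyless(n)', you would also subtract `yyleng - n', since those characters are handed back and will be counted again when rescanned.

```c
#include <ctype.h>
#include <stddef.h>

struct toy_scanner {
    const char *cursor;   /* the scanner's own position, opaque to the user */
    size_t num_chars;     /* user bookkeeping, as done by YY_USER_ACTION */
};

/* Hypothetical stand-in for yylex(): a token is a run of spaces or a
 * run of non-spaces. Returns its length (yyleng), 0 at end of input,
 * and points *yytext at the token text. */
size_t toy_scan(struct toy_scanner *sc, const char **yytext)
{
    const char *p = sc->cursor;
    size_t yyleng = 0;
    int space;

    if (*p == '\0')
        return 0;
    space = isspace((unsigned char) *p) != 0;
    while (p[yyleng] != '\0' &&
           (isspace((unsigned char) p[yyleng]) != 0) == space)
        yyleng++;

    sc->cursor += yyleng;
    *yytext = p;
    sc->num_chars += yyleng;   /* #define YY_USER_ACTION num_chars += yyleng; */
    return yyleng;
}
```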
-
-
-File: flex.info, Node: How do I use my own I/O classes in a C++ scanner?, Next: How do I skip as many chars as possible?, Prev: How do I track the byte offset for lseek()?, Up: FAQ
-
-How do I use my own I/O classes in a C++ scanner?
-=================================================
-
- When the flex C++ scanning class rewrite finally happens, then this
-sort of thing should become much easier.
-
- You can do this by passing the various functions (such as
-`LexerInput()' and `LexerOutput()') NULL `iostream*''s, and then
-dealing with your own I/O classes surreptitiously (i.e., stashing them
-in special member variables). This works because the only assumption
-the lexer makes about the iostreams is that they're ultimately passed
-to `LexerInput()' and `LexerOutput()', which then do whatever is
-necessary with them.
-
-
-File: flex.info, Node: How do I skip as many chars as possible?, Next: deleteme00, Prev: How do I use my own I/O classes in a C++ scanner?, Up: FAQ
-
-How do I skip as many chars as possible?
-========================================
-
- How do I skip as many chars as possible - without interfering with
-the other patterns?
-
- In the example below, we want to skip over characters until we see
-the phrase "endskip". The following will _NOT_ work correctly (do you
-see why not?)
-
-
- /* INCORRECT SCANNER */
- %x SKIP
- %%
- <INITIAL>startskip BEGIN(SKIP);
- ...
- <SKIP>"endskip" BEGIN(INITIAL);
- <SKIP>.* ;
-
- The problem is that the pattern `.*' will eat up the word "endskip"
-along with the rest of the line. The simplest (but slow) fix is:
-
-
- <SKIP>"endskip" BEGIN(INITIAL);
- <SKIP>.|\n ; /* . alone would not match newlines */
-
- A faster fix is to make the second rule match more text at a time,
-without letting it match "endskip" plus something else. For example:
-
-
- <SKIP>"endskip" BEGIN(INITIAL);
- <SKIP>[^e]+ ;
- <SKIP>. ; /* so you eat up e's, too */
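The three `<SKIP>' rules can be traced with a plain-C equivalent. `skip_to_endskip' is a hypothetical helper, not part of `flex'; each branch corresponds to one rule, and because a `[^e]+' run always stops at an `e', it can never swallow the start of "endskip".

```c
#include <stddef.h>
#include <string.h>

/* Mirrors the three <SKIP> rules: return a pointer just past the first
 * "endskip" in s, or NULL if there is none. */
const char *skip_to_endskip(const char *s)
{
    while (*s != '\0') {
        if (strncmp(s, "endskip", 7) == 0)
            return s + 7;              /* <SKIP>"endskip"  BEGIN(INITIAL); */
        if (*s != 'e')
            s += strcspn(s, "e");      /* <SKIP>[^e]+  eats a whole run */
        else
            s++;                       /* <SKIP>.  eats a lone 'e' */
    }
    return NULL;
}
```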
-
-
-File: flex.info, Node: deleteme00, Next: Are certain equivalent patterns faster than others?, Prev: How do I skip as many chars as possible?, Up: FAQ
-
-deleteme00
-==========
-
-
- QUESTION:
- When was flex born?
-
- Vern Paxson took over
- the Software Tools lex project from Jef Poskanzer in 1982. At that point it
- was written in Ratfor. Around 1987 or so, Paxson translated it into C, and
- a legend was born :-).
-
-
-File: flex.info, Node: Are certain equivalent patterns faster than others?, Next: Is backing up a big deal?, Prev: deleteme00, Up: FAQ
-
-Are certain equivalent patterns faster than others?
-===================================================
-
-
- To: Adoram Rogel <adoram@orna.hybridge.com>
- Subject: Re: Flex 2.5.2 performance questions
- In-reply-to: Your message of Wed, 18 Sep 96 11:12:17 EDT.
- Date: Wed, 18 Sep 96 10:51:02 PDT
- From: Vern Paxson <vern>
-
- [Note, the most recent flex release is 2.5.4, which you can get from
- ftp.ee.lbl.gov. It has bug fixes over 2.5.2 and 2.5.3.]
-
- > 1. Using the pattern
- > ([Ff](oot)?)?[Nn](ote)?(\.)?
- > instead of
- > (((F|f)oot(N|n)ote)|((N|n)ote)|((N|n)\.)|((F|f)(N|n)(\.)))
- > (in a very complicated flex program) caused the program to slow from
- > 300K+/min to 100K/min (no other changes were done).
-
- These two are not equivalent. For example, the first can match "footnote."
- but the second can only match "footnote". This is almost certainly the
- cause of the discrepancy - the slower scanner run is matching more tokens,
- and/or having to do more backing up.
-
- > 2. Which of these two are better: [Ff]oot or (F|f)oot ?
-
- From a performance point of view, they're equivalent (modulo presumably
- minor effects such as memory cache hit rates; and the presence of trailing
- context, see below). From a space point of view, the first is slightly
- preferable.
-
- > 3. I have a pattern that look like this:
- > pats {p1}|{p2}|{p3}|...|{p50} (50 patterns ORd)
- >
- > running yet another complicated program that includes the following rule:
- > <snext>{and}/{no4}{bb}{pats}
- >
- > gets me to "too complicated - over 32,000 states"...
-
- I can't tell from this example whether the trailing context is variable-length
- or fixed-length (it could be the latter if {and} is fixed-length). If it's
- variable length, which flex -p will tell you, then this reflects a basic
- performance problem, and if you can eliminate it by restructuring your
- scanner, you will see significant improvement.
-
- > so I divided {pats} to {pats1}, {pats2},..., {pats5} each consists of about
- > 10 patterns and changed the rule to be 5 rules.
- > This did compile, but what is the rule of thumb here ?
-
- The rule is to avoid trailing context other than fixed-length, meaning that
- for a/b, either the 'a' pattern or the 'b' pattern has a fixed length. Use
- of the '|' operator automatically makes the pattern variable length, so in
- this case '[Ff]oot' is preferred to '(F|f)oot'.
-
- > 4. I changed a rule that looked like this:
- > <snext8>{and}{bb}/{ROMAN}[^A-Za-z] { BEGIN...
- >
- > to the next 2 rules:
- > <snext8>{and}{bb}/{ROMAN}[A-Za-z] { ECHO;}
- > <snext8>{and}{bb}/{ROMAN} { BEGIN...
- >
- > Again, I understand the using [^...] will cause a great performance loss
-
- Actually, it doesn't cause any sort of performance loss. It's a surprising
- fact about regular expressions that they always match in linear time
- regardless of how complex they are.
-
- > but are there any specific rules about it ?
-
- See the "Performance Considerations" section of the man page, and also
- the example in MISC/fastwc/.
-
- Vern
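The linear-time claim follows from how a DFA scanner works: each input character costs one table lookup, whatever the pattern looked like. The sketch below hand-builds a table for `[Ff]oot' (which `(F|f)oot' also describes, so both can compile to the same table); the table layout, the character-class mapping, and the names are illustrative assumptions, not the tables `flex' actually emits.

```c
#include <stddef.h>

/* Table-driven DFA for the pattern [Ff]oot. States 0..4; state 4
 * accepts; JAM is the dead state. */
enum { NSTATES = 5, ACCEPT = 4, JAM = -1 };

/* Map an input byte to a character class, similar in spirit to the
 * equivalence classes flex builds with -Ce. */
static int char_class(unsigned char c)
{
    switch (c) {
    case 'F': case 'f': return 0;
    case 'o':           return 1;
    case 't':           return 2;
    default:            return 3;
    }
}

static const int transition[NSTATES][4] = {
    /* [Ff]   o    t   other */
    {   1,  JAM, JAM, JAM },   /* start     */
    { JAM,    2, JAM, JAM },   /* seen F    */
    { JAM,    3, JAM, JAM },   /* seen Fo   */
    { JAM,  JAM,   4, JAM },   /* seen Foo  */
    { JAM,  JAM, JAM, JAM },   /* seen Foot: accepting */
};

/* Returns 1 if s[0..n) is exactly matched by [Ff]oot. Each byte costs
 * one table lookup, so the running time is linear in n. */
int dfa_match(const char *s, size_t n)
{
    int state = 0;
    size_t i;
    for (i = 0; i < n && state != JAM; i++)
        state = transition[state][char_class((unsigned char) s[i])];
    return state == ACCEPT;
}
```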
-
-
-File: flex.info, Node: Is backing up a big deal?, Next: Can I fake multi-byte character support?, Prev: Are certain equivalent patterns faster than others?, Up: FAQ
-
-Is backing up a big deal?
-=========================
-
-
- To: Adoram Rogel <adoram@hybridge.com>
- Subject: Re: Flex 2.5.2 performance questions
- In-reply-to: Your message of Thu, 19 Sep 96 10:16:04 EDT.
- Date: Thu, 19 Sep 96 09:58:00 PDT
- From: Vern Paxson <vern>
-
- > a lot about the backing up problem.
- > I believe that there lies my biggest problem, and I'll try to improve
- > it.
-
- Since you have variable trailing context, this is a bigger performance
- problem. Fixing it is usually easier than fixing backing up, which in a
- complicated scanner (yours seems to fit the bill) can be extremely
- difficult to do correctly.
-
- You also don't mention what flags you are using for your scanner.
- -f makes a large speed difference, and -Cfe buys you nearly as much
- speed but the resulting scanner is considerably smaller.
-
- > I have an | operator in {and} and in {pats} so both of them are variable
- > length.
-
- -p should have reported this.
-
- > Is changing one of them to fixed-length is enough ?
-
- Yes.
-
- > Is it possible to change the 32,000 states limit ?
-
- Yes. I've appended instructions on how. Before you make this change,
- though, you should think about whether there are ways to fundamentally
- simplify your scanner - those are certainly preferable!
-
- Vern
-
- To increase the 32K limit (on a machine with 32 bit integers), you increase
- the magnitude of the following in flexdef.h:
-
- #define JAMSTATE -32766 /* marks a reference to the state that always jams */
- #define MAXIMUM_MNS 31999
- #define BAD_SUBSCRIPT -32767
- #define MAX_SHORT 32700
-
- Adding a 0 or two after each should do the trick.
-
-
-File: flex.info, Node: Can I fake multi-byte character support?, Next: deleteme01, Prev: Is backing up a big deal?, Up: FAQ
-
-Can I fake multi-byte character support?
-========================================
-
-
- To: Heeman_Lee@hp.com
- Subject: Re: flex - multi-byte support?
- In-reply-to: Your message of Thu, 03 Oct 1996 17:24:04 PDT.
- Date: Fri, 04 Oct 1996 11:42:18 PDT
- From: Vern Paxson <vern>
-
- > I assume as long as my *.l file defines the
- > range of expected character code values (in octal format), flex will
- > scan the file and read multi-byte characters correctly. But I have no
- > confidence in this assumption.
-
- Your lack of confidence is justified - this won't work.
-
- Flex has in it a widespread assumption that the input is processed
- one byte at a time. Fixing this is on the to-do list, but is involved,
- so it won't happen any time soon. In the interim, the best I can suggest
- (unless you want to try fixing it yourself) is to write your rules in
- terms of pairs of bytes, using definitions in the first section:
-
- X \xfe\xc2
- ...
- %%
- foo{X}bar found_foo_fe_c2_bar();
-
- etc. Definitely a pain - sorry about that.
-
- By the way, the email address you used for me is ancient, indicating you
- have a very old version of flex. You can get the most recent, 2.5.4, from
- ftp.ee.lbl.gov.
-
- Vern
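Writing such definitions by hand is error-prone, so a small helper can print the escape string for a character's bytes. The function below is a hypothetical sketch (not part of `flex'); it only formats bytes and knows nothing about any particular encoding.

```c
#include <stddef.h>
#include <stdio.h>

/* Write the bytes of one multi-byte character as a flex escape string,
 * e.g. {0xfe, 0xc2} -> \xfe\xc2, suitable for a definition such as
 *   X  \xfe\xc2
 * out must hold at least 4 * n + 1 bytes. */
void bytes_to_flex_escape(const unsigned char *bytes, size_t n, char *out)
{
    size_t i;
    out[0] = '\0';                       /* empty result when n == 0 */
    for (i = 0; i < n; i++)
        sprintf(out + 4 * i, "\\x%02x", (unsigned) bytes[i]);
}
```

The output for the bytes in the example above is `\xfe\xc2', ready to paste into a definition like `X \xfe\xc2'.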
-
-
-File: flex.info, Node: deleteme01, Next: Can you discuss some flex internals?, Prev: Can I fake multi-byte character support?, Up: FAQ
-
-deleteme01
-==========
-
-
- To: moleary@primus.com
- Subject: Re: Flex / Unicode compatibility question
- In-reply-to: Your message of Tue, 22 Oct 1996 10:15:42 PDT.
- Date: Tue, 22 Oct 1996 11:06:13 PDT
- From: Vern Paxson <vern>
-
- Unfortunately flex at the moment has a widespread assumption within it
- that characters are processed 8 bits at a time. I don't see any easy
- fix for this (other than writing your rules in terms of double characters -
- a pain). I also don't know of a wider lex, though you might try surfing
- the Plan 9 stuff because I know it's a Unicode system, and also the PCCT
- toolkit (try searching say Alta Vista for "Purdue Compiler Construction
- Toolkit").
-
- Fixing flex to handle wider characters is on the long-term to-do list.
- But since flex is a strictly spare-time project these days, this probably
- won't happen for quite a while, unless someone else does it first.
-
- Vern
-
-
-File: flex.info, Node: Can you discuss some flex internals?, Next: unput() messes up yy_at_bol, Prev: deleteme01, Up: FAQ
-
-Can you discuss some flex internals?
-====================================
-
-
- To: Johan Linde <jl@theophys.kth.se>
- Subject: Re: translation of flex
- In-reply-to: Your message of Sun, 10 Nov 1996 09:16:36 PST.
- Date: Mon, 11 Nov 1996 10:33:50 PST
- From: Vern Paxson <vern>
-
- > I'm working for the Swedish team translating GNU program, and I'm currently
- > working with flex. I have a few questions about some of the messages which
- > I hope you can answer.
-
- All of the things you're wondering about, by the way, concerning flex
- internals - probably the only person who understands what they mean in
- English is me! So I wouldn't worry too much about getting them right.
- That said ...
-
- > #: main.c:545
- > msgid " %d protos created\n"
- >
- > Does proto mean prototype?
-
- Yes - prototypes of state compression tables.
-
- > #: main.c:539
- > msgid " %d/%d (peak %d) template nxt-chk entries created\n"
- >
- > Here I'm mainly puzzled by 'nxt-chk'. I guess it means 'next-check'. (?)
- > However, 'template next-check entries' doesn't make much sense to me. To be
- > able to find a good translation I need to know a little bit more about it.
-
- There is a scheme in the Aho/Sethi/Ullman compiler book for compressing
- scanner tables. It involves creating two pairs of tables. The first has
- "base" and "default" entries, the second has "next" and "check" entries.
- The "base" entry is indexed by the current state and yields an index into
- the next/check table. The "default" entry gives what to do if the state
- transition isn't found in next/check. The "next" entry gives the next
- state to enter, but only if the "check" entry verifies that this entry is
- correct for the current state. Flex creates templates of series of
- next/check entries and then encodes differences from these templates as a
- way to compress the tables.
-
- > #: main.c:533
- > msgid " %d/%d base-def entries created\n"
- >
- > The same problem here for 'base-def'.
-
- See above.
-
- Vern
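The base/default/next/check lookup described above can be sketched in C with tiny hand-built tables. This is a simplified illustration under assumed names (`base', `def', `nxt', `chk'). Scanners generated by `flex' contain arrays named `yy_base', `yy_def', `yy_nxt' and `yy_chk' that play these roles, but their real lookup loop may involve additional details.

```c
/* A 3-state DFA over a 3-symbol alphabet (symbols 0, 1, 2).
 * Uncompressed, its transitions would be:
 *   state 0: 0->1  1->0  2->0
 *   state 1: 0->1  1->2  2->0   (differs from state 0 only on symbol 1)
 *   state 2: 0->1  1->0  2->0   (identical to state 0)
 * Compressed, state 0 stores its full row, state 1 stores only its one
 * differing entry and defaults to state 0, and state 2 stores nothing. */
enum { NSTATES = 3, NSYMS = 3, NSLOTS = 5 };

static const int base[NSTATES] = { 0, 2, 0 };   /* row start in nxt/chk */
static const int def[NSTATES]  = { -1, 0, 0 };  /* fallback state; -1 = none */
static const int nxt[NSLOTS]   = { 1, 0, 0, 2, 0 };
static const int chk[NSLOTS]   = { 0, 0, 0, 1, -1 };  /* slot owner; -1 = unused */

/* Next state for (state, sym), or -1 if no transition exists. */
int next_state(int state, int sym)
{
    /* The check entry confirms the slot really belongs to this state;
     * otherwise fall back to the default state and try again. */
    while (chk[base[state] + sym] != state) {
        state = def[state];
        if (state < 0)
            return -1;
    }
    return nxt[base[state] + sym];
}
```

Here the 9 transitions fit in 5 next/check slots, and state 2, being identical to state 0, stores nothing at all.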
-
-
-File: flex.info, Node: unput() messes up yy_at_bol, Next: The | operator is not doing what I want, Prev: Can you discuss some flex internals?, Up: FAQ
-
-unput() messes up yy_at_bol
-===========================
-
-
- To: Xinying Li <xli@npac.syr.edu>
- Subject: Re: FLEX ?
- In-reply-to: Your message of Wed, 13 Nov 1996 17:28:38 PST.
- Date: Wed, 13 Nov 1996 19:51:54 PST
- From: Vern Paxson <vern>
-
- > "unput()" them to input flow, question occurs. If I do this after I scan
- > a carriage, the variable "YY_CURRENT_BUFFER->yy_at_bol" is changed. That
- > means the carriage flag has gone.
-
- You can control this by calling yy_set_bol(). It's described in the manual.
-
- > And if in pre-reading it goes to the end of file, is anything done
- > to control the end of curren buffer and end of file?
-
- No, there's no way to put back an end-of-file.
-
- > By the way I am using flex 2.5.2 and using the "-l".
-
- The latest release is 2.5.4, by the way. It fixes some bugs in 2.5.2 and
- 2.5.3. You can get it from ftp.ee.lbl.gov.
-
- Vern
-
-
-File: flex.info, Node: The | operator is not doing what I want, Next: Why can't flex understand this variable trailing context pattern?, Prev: unput() messes up yy_at_bol, Up: FAQ
-
-The | operator is not doing what I want
-=======================================
-
-
- To: Alain.ISSARD@st.com
- Subject: Re: Start condition with FLEX
- In-reply-to: Your message of Mon, 18 Nov 1996 09:45:02 PST.
- Date: Mon, 18 Nov 1996 10:41:34 PST
- From: Vern Paxson <vern>
-
- > I am not able to use the start condition scope and to use the | (OR) with
- > rules having start conditions.
-
- The problem is that if you use '|' as a regular expression operator, for
- example "a|b" meaning "match either 'a' or 'b'", then it must *not* have
- any blanks around it. If you instead want the special '|' *action* (which
- from your scanner appears to be the case), which is a way of giving two
- different rules the same action:
-
- foo |
- bar matched_foo_or_bar();
-
- then '|' *must* be separated from the first rule by whitespace and *must*
- be followed by a new line. You *cannot* write it as:
-
- foo | bar matched_foo_or_bar();
-
- even though you might think you could because yacc supports this syntax.
- The reason for this unfortunate incompatibility is historical, but it's
- unlikely to be changed.
-
- Your problems with start condition scope are simply due to syntax errors
- from your use of '|' later confusing flex.
-
- Let me know if you still have problems.
-
- Vern
-
-
-File: flex.info, Node: Why can't flex understand this variable trailing context pattern?, Next: The ^ operator isn't working, Prev: The | operator is not doing what I want, Up: FAQ
-
-Why can't flex understand this variable trailing context pattern?
-=================================================================
-
-
- To: Gregory Margo <gmargo@newton.vip.best.com>
- Subject: Re: flex-2.5.3 bug report
- In-reply-to: Your message of Sat, 23 Nov 1996 16:50:09 PST.
- Date: Sat, 23 Nov 1996 17:07:32 PST
- From: Vern Paxson <vern>
-
- > Enclosed is a lex file that "real" lex will process, but I cannot get
- > flex to process it. Could you try it and maybe point me in the right direction?
-
- Your problem is that some of the definitions in the scanner use the '/'
- trailing context operator, and have it enclosed in ()'s. Flex does not
- allow this operator to be enclosed in ()'s because doing so allows undefined
- regular expressions such as "(a/b)+". So the solution is to remove the
- parentheses. Note that you must also be building the scanner with the -l
- option for AT&T lex compatibility. Without this option, flex automatically
- encloses the definitions in parentheses.
-
- Vern
-
-
-File: flex.info, Node: The ^ operator isn't working, Next: Trailing context is getting confused with trailing optional patterns, Prev: Why can't flex understand this variable trailing context pattern?, Up: FAQ
-
-The ^ operator isn't working
-============================
-
-
- To: Thomas Hadig <hadig@toots.physik.rwth-aachen.de>
- Subject: Re: Flex Bug ?
- In-reply-to: Your message of Tue, 26 Nov 1996 14:35:01 PST.
- Date: Tue, 26 Nov 1996 11:15:05 PST
- From: Vern Paxson <vern>
-
- > In my lexer code, i have the line :
- > ^\*.* { }
- >
- > Thus all lines starting with an astrix (*) are comment lines.
- > This does not work !
-
- I can't get this problem to reproduce - it works fine for me. Note
- though that if what you have is slightly different:
-
- COMMENT ^\*.*
- %%
- {COMMENT} { }
-
- then it won't work, because flex expands a definition by pushing its
- text back onto the input enclosed in ()'s, so the rule becomes
-
- (^\*.*) { }
-
- and now that the '^' operator is not at the very beginning of the
- pattern, it's interpreted as just a regular character. You can avoid this
- behavior by using the "-l" lex-compatibility flag, or "%option lex-compat".
-
- Vern
-
-
-File: flex.info, Node: Trailing context is getting confused with trailing optional patterns, Next: Is flex GNU or not?, Prev: The ^ operator isn't working, Up: FAQ
-
-Trailing context is getting confused with trailing optional patterns
-====================================================================
-
-
- To: Adoram Rogel <adoram@hybridge.com>
- Subject: Re: Flex 2.5.4 BOF ???
- In-reply-to: Your message of Tue, 26 Nov 1996 16:10:41 PST.
- Date: Wed, 27 Nov 1996 10:56:25 PST
- From: Vern Paxson <vern>
-
- > Organization(s)?/[a-z]
- >
- > This matched "Organizations" (looking in debug mode, the trailing s
- > was matched with trailing context instead of the optional (s) in the
- > end of the word.
-
- That should only happen with lex. Flex can properly match this pattern.
- (That might be what you're saying, I'm just not sure.)
-
- > Is there a way to avoid this dangerous trailing context problem ?
-
- Unfortunately, there's no easy way. On the other hand, I don't see why
- it should be a problem. Lex's matching is clearly wrong, and I'd hope
- that usually the intent remains the same as expressed with the pattern,
- so flex's matching will be correct.
-
- Vern
-
-
-File: flex.info, Node: Is flex GNU or not?, Next: ERASEME53, Prev: Trailing context is getting confused with trailing optional patterns, Up: FAQ
-
-Is flex GNU or not?
-===================
-
-
- To: Cameron MacKinnon <mackin@interlog.com>
- Subject: Re: Flex documentation bug
- In-reply-to: Your message of Mon, 02 Dec 1996 00:07:08 PST.
- Date: Sun, 01 Dec 1996 22:29:39 PST
- From: Vern Paxson <vern>
-
- > I'm not sure how or where to submit bug reports (documentation or
- > otherwise) for the GNU project stuff ...
-
- Well, strictly speaking flex isn't part of the GNU project. They just
- distribute it because no one's written a decent GPL'd lex replacement.
- So you should send bugs directly to me. Those sent to the GNU folks
- sometimes find their way to me, but some may fall through the cracks.
-
- > In GNU Info, under the section 'Start Conditions', and also in the man
- > page (mine's dated April '95) is a nice little snippet showing how to
- > parse C quoted strings into a buffer, defined to be MAX_STR_CONST in
- > size. Unfortunately, no overflow checking is ever done ...
-
- This is already mentioned in the manual:
-
- Finally, here's an example of how to match C-style quoted
- strings using exclusive start conditions, including expanded
- escape sequences (but not including checking for a string
- that's too long):
-
- The reason for not doing the overflow checking is that it will needlessly
- clutter up an example whose main purpose is just to demonstrate how to
- use flex.
-
- The latest release is 2.5.4, by the way, available from ftp.ee.lbl.gov.
-
- Vern
-
-
-File: flex.info, Node: ERASEME53, Next: I need to scan if-then-else blocks and while loops, Prev: Is flex GNU or not?, Up: FAQ
-
-ERASEME53
-=========
-
-
- To: tsv@cs.UManitoba.CA
- Subject: Re: Flex (reg)..
- In-reply-to: Your message of Thu, 06 Mar 1997 23:50:16 PST.
- Date: Thu, 06 Mar 1997 15:54:19 PST
- From: Vern Paxson <vern>
-
- > [:alpha:] ([:alnum:] | \\_)*
-
- If your rule really has embedded blanks as shown above, then it won't
- work, as the first blank delimits the rule from the action. (It wouldn't
- even compile ...) You need instead:
-
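- [[:alpha:]]([[:alnum:]]|\\_)*
-
- and that should work fine - there's no restriction on what can go inside
- of ()'s except for the trailing context operator, '/'. (Note that
- character class expressions such as [:alpha:] are only recognized inside
- a bracketed character class, hence the extra brackets.)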
-
- Vern
-
-
-File: flex.info, Node: I need to scan if-then-else blocks and while loops, Next: ERASEME55, Prev: ERASEME53, Up: FAQ
-
-I need to scan if-then-else blocks and while loops
-==================================================
-
-
- To: "Mike Stolnicki" <mstolnic@ford.com>
- Subject: Re: FLEX help
- In-reply-to: Your message of Fri, 30 May 1997 13:33:27 PDT.
- Date: Fri, 30 May 1997 10:46:35 PDT
- From: Vern Paxson <vern>
-
- > We'd like to add "if-then-else", "while", and "for" statements to our
- > language ...
- > We've investigated many possible solutions. The one solution that seems
- > the most reasonable involves knowing the position of a TOKEN in yyin.
-
- I strongly advise you to instead build a parse tree (abstract syntax tree)
- and loop over that instead. You'll find this has major benefits in keeping
- your interpreter simple and extensible.
-
- That said, the functionality you mention for get_position and set_position
- has been on the to-do list for a while. As flex is a purely spare-time
- project for me, no guarantees when this will be added (in particular, it
- for sure won't be for many months to come).
-
- Vern
-
-
-File: flex.info, Node: ERASEME55, Next: ERASEME56, Prev: I need to scan if-then-else blocks and while loops, Up: FAQ
-
-ERASEME55
-=========
-
-
- To: Colin Paul Adams <colin@colina.demon.co.uk>
- Subject: Re: Flex C++ classes and Bison
- In-reply-to: Your message of 09 Aug 1997 17:11:41 PDT.
- Date: Fri, 15 Aug 1997 10:48:19 PDT
- From: Vern Paxson <vern>
-
- > #define YY_DECL int yylex (YYSTYPE *lvalp, struct parser_control
- > *parm)
- >
- > I have been trying to get this to work as a C++ scanner, but it does
- > not appear to be possible (warning that it matches no declarations in
- > yyFlexLexer, or something like that).
- >
- > Is this supposed to be possible, or is it being worked on (I DID
- > notice the comment that scanner classes are still experimental, so I'm
- > not too hopeful)?
-
- What you need to do is derive a subclass from yyFlexLexer that provides
- the above yylex() method, squirrels away lvalp and parm into member
- variables, and then invokes yyFlexLexer::yylex() to do the regular scanning.
-
- Vern
-
-
-File: flex.info, Node: ERASEME56, Next: ERASEME57, Prev: ERASEME55, Up: FAQ
-
-ERASEME56
-=========
-
-
- To: Mikael.Latvala@lmf.ericsson.se
- Subject: Re: Possible mistake in Flex v2.5 document
- In-reply-to: Your message of Fri, 05 Sep 1997 16:07:24 PDT.
- Date: Fri, 05 Sep 1997 10:01:54 PDT
- From: Vern Paxson <vern>
-
- > In that example you show how to count comment lines when using
- > C style /* ... */ comments. My question is, shouldn't you take into
- > account a scenario where end of a comment marker occurs inside
- > character or string literals?
-
- The scanner certainly needs to also scan character and string literals.
- However it does that (there's an example in the man page for strings), the
- lexer will recognize the beginning of the literal before it runs across the
- embedded "/*". Consequently, it will finish scanning the literal before it
- even considers the possibility of matching "/*".
-
- Example:
-
- '([^']*|{ESCAPE_SEQUENCE})'
-
- will match all the text between the ''s (inclusive). So the lexer
- considers this as a token beginning at the first ', and doesn't even
- attempt to match other tokens inside it.
-
- I think this subtlety is not worth putting in the manual, as I suspect
- it would confuse more people than it would enlighten.
-
- Vern
-
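- The ordering argument can be illustrated with a hand-written C++
-loop (a simplified model, not flex output; count_comments() is a
-hypothetical helper). Because a literal is recognized as one token
-starting at its opening quote, a "/*" inside it is consumed as part of
-the literal and is never considered as the start of a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the number of C comments in src. String and character literals
// are consumed as whole tokens first, so "/*" inside them is ignored.
int count_comments(const std::string& src) {
    int comments = 0;
    std::size_t i = 0;
    while (i < src.size()) {
        char c = src[i];
        if (c == '"' || c == '\'') {
            // Literal token: consume through the matching unescaped quote.
            ++i;
            while (i < src.size() && src[i] != c) {
                i += (src[i] == '\\') ? 2 : 1;  // skip an escaped character
            }
            ++i;  // step past the closing quote (or past the end)
        } else if (src.compare(i, 2, "/*") == 0) {
            // Comment token: consume through the closing "*/".
            std::size_t end = src.find("*/", i + 2);
            ++comments;
            i = (end == std::string::npos) ? src.size() : end + 2;
        } else {
            ++i;  // any other character
        }
    }
    return comments;
}
```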
-
-File: flex.info, Node: ERASEME57, Next: Is there a repository for flex scanners?, Prev: ERASEME56, Up: FAQ
-
-ERASEME57
-=========
-
-
- To: "Marty Leisner" <leisner@sdsp.mc.xerox.com>
- Subject: Re: flex limitations
- In-reply-to: Your message of Sat, 06 Sep 1997 11:27:21 PDT.
- Date: Mon, 08 Sep 1997 11:38:08 PDT
- From: Vern Paxson <vern>
-
- > %%
- > [a-zA-Z]+ /* skip a line */
- > { printf("got %s\n", yytext); }
- > %%
-
- What version of flex are you using? If I feed this to 2.5.4, it complains:
-
- "bug.l", line 5: EOF encountered inside an action
- "bug.l", line 5: unrecognized rule
- "bug.l", line 5: fatal parse error
-
- Not the world's greatest error message, but it manages to flag the problem.
-
- (With the introduction of start condition scopes, flex can't accommodate
- an action on a separate line, since it's ambiguous with an indented rule.)
-
- You can get 2.5.4 from ftp.ee.lbl.gov.
-
- Vern
-
-
-File: flex.info, Node: Is there a repository for flex scanners?, Next: How can I conditionally compile or preprocess my flex input file?, Prev: ERASEME57, Up: FAQ
-
-Is there a repository for flex scanners?
-========================================
-
- Not that we know of. You might try asking on comp.compilers.
-
-
-File: flex.info, Node: How can I conditionally compile or preprocess my flex input file?, Next: Where can I find grammars for lex and yacc?, Prev: Is there a repository for flex scanners?, Up: FAQ
-
-How can I conditionally compile or preprocess my flex input file?
-=================================================================
-
- Flex doesn't have a preprocessor like C does. You might try using
-m4, or the C preprocessor plus a sed script to clean up the result.
-
-
-File: flex.info, Node: Where can I find grammars for lex and yacc?, Next: I get an end-of-buffer message for each character scanned., Prev: How can I conditionally compile or preprocess my flex input file?, Up: FAQ
-
-Where can I find grammars for lex and yacc?
-===========================================
-
- In the sources for flex and bison.
-
-
-File: flex.info, Node: I get an end-of-buffer message for each character scanned., Next: unnamed-faq-62, Prev: Where can I find grammars for lex and yacc?, Up: FAQ
-
-I get an end-of-buffer message for each character scanned.
-==========================================================
-
- This will happen if your LexerInput() function returns only one
-character at a time, which can happen either if your scanner is
-"interactive", or if the streams library on your platform always
-returns 1 for yyin->gcount().
-
- Solution: override LexerInput() with a version that returns whole
-buffers.
-
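- A sketch of a reader that returns whole buffers, assuming the input
-is not interactive (filling a buffer blocks until max_size characters
-or end of input arrive, which is why interactive scanners read one
-character at a time). It is written as a free function with the
-hypothetical name read_chunk() so it can be tried without flex; in a
-scanner the same body would go in an override of LexerInput(char* buf,
-int max_size) in a yyFlexLexer subclass. It counts characters itself
-instead of trusting gcount().

```cpp
#include <cassert>
#include <sstream>  // std::istream; std::istringstream is handy for trying it out

// Hypothetical free-function version of a whole-buffer LexerInput().
// Returns the number of characters stored in buf, 0 at end of input, -1 on error.
int read_chunk(std::istream& in, char* buf, int max_size) {
    assert(buf != nullptr && max_size > 0);
    int n = 0;
    char c;
    // Fill the buffer character by character; the count comes from n,
    // so a platform whose gcount() misreports cannot shrink it.
    while (n < max_size && in.get(c)) {
        buf[n++] = c;
    }
    if (in.bad()) return -1;  // unrecoverable stream error
    return n;
}
```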
-
-File: flex.info, Node: unnamed-faq-62, Next: unnamed-faq-63, Prev: I get an end-of-buffer message for each character scanned., Up: FAQ
-
-unnamed-faq-62
-==============
-
-
- To: Georg.Rehm@CL-KI.Uni-Osnabrueck.DE
- Subject: Re: Flex maximums
- In-reply-to: Your message of Mon, 17 Nov 1997 17:16:06 PST.
- Date: Mon, 17 Nov 1997 17:16:15 PST
- From: Vern Paxson <vern>
-
- > I took a quick look into the flex-sources and altered some #defines in
- > flexdefs.h:
- >
- > #define INITIAL_MNS 64000
- > #define MNS_INCREMENT 1024000
- > #define MAXIMUM_MNS 64000
-
- The things to fix are to add a couple of zeroes to:
-
- #define JAMSTATE -32766 /* marks a reference to the state that always jams */
- #define MAXIMUM_MNS 31999
- #define BAD_SUBSCRIPT -32767
- #define MAX_SHORT 32700
-
- and, if you get complaints about too many rules, make the following change too:
-
- #define YY_TRAILING_MASK 0x200000
- #define YY_TRAILING_HEAD_MASK 0x400000
-
- - Vern
-
-
-File: flex.info, Node: unnamed-faq-63, Next: unnamed-faq-64, Prev: unnamed-faq-62, Up: FAQ
-
-unnamed-faq-63
-==============
-
-
- To: jimmey@lexis-nexis.com (Jimmey Todd)
- Subject: Re: FLEX question regarding istream vs ifstream
- In-reply-to: Your message of Mon, 08 Dec 1997 15:54:15 PST.
- Date: Mon, 15 Dec 1997 13:21:35 PST
- From: Vern Paxson <vern>
-
- > stdin_handle = YY_CURRENT_BUFFER;
- > ifstream fin( "aFile" );
- > yy_switch_to_buffer( yy_create_buffer( fin, YY_BUF_SIZE ) );
- >
- > What I'm wanting to do, is pass the contents of a file thru one set
- > of rules and then pass stdin thru another set... It works great if, I
- > don't use the C++ classes. But since everything else that I'm doing is
- > in C++, I thought I'd be consistent.
- >
- > The problem is that 'yy_create_buffer' is expecting an istream* as its
- > first argument (as stated in the man page). However, fin is an ifstream
- > object. Any ideas on what I might be doing wrong? Any help would be
- > appreciated. Thanks!!
-
- You need to pass &fin, to turn it into an ifstream* instead of an ifstream.
- Then its type will be compatible with the expected istream*, because ifstream
- is derived from istream.
-
- Vern
-
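- The conversion can be checked in isolation. slurp() below is a
-hypothetical stand-in for any function that, like yy_create_buffer(),
-takes an istream*; the static_assert records the inheritance the answer
-relies on. In the question's code the call would become
-yy_create_buffer( &fin, YY_BUF_SIZE ).

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <string>
#include <type_traits>

// ifstream derives from istream, so &fin (an ifstream*) converts implicitly
// to the istream* that a function such as yy_create_buffer() expects.
static_assert(std::is_base_of<std::istream, std::ifstream>::value,
              "std::ifstream derives from std::istream");

// Hypothetical function taking istream*, standing in for yy_create_buffer().
std::string slurp(std::istream* in) {
    assert(in != nullptr);
    std::string text;
    char c;
    while (in->get(c)) text += c;  // read everything that is left
    return text;
}

// Usage, following the question's code:
//   std::ifstream fin("aFile");
//   slurp(&fin);   // OK: ifstream* -> istream*
//   slurp(fin);    // error: an ifstream object is not a pointer
```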
-
-File: flex.info, Node: unnamed-faq-64, Next: unnamed-faq-65, Prev: unnamed-faq-63, Up: FAQ
-
-unnamed-faq-64
-==============
-
-
- To: Enda Fadian <fadiane@piercom.ie>
- Subject: Re: Question related to Flex man page?
- In-reply-to: Your message of Tue, 16 Dec 1997 15:17:34 PST.
- Date: Tue, 16 Dec 1997 14:17:09 PST
- From: Vern Paxson <vern>
-
- > Can you explain to me what is meant by a long-jump in relation to flex?
-
- Using the longjmp() function while inside yylex() or a routine called by it.
-
- > what is the flex activation frame.
-
- Just yylex()'s stack frame.
-
- > As far as I can see yyrestart will bring me back to the start of the input
- > file and using flex++ is not really an option!
-
- No, yyrestart() doesn't imply a rewind, even though its name might sound
- like it does. It tells the scanner to flush its internal buffers and
- start reading from the given file at its present location.
-
- Vern
-
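- A toy C++ model of this behavior (the hypothetical ToyScanner class,
-not flex's implementation): the scanner reads ahead a chunk at a time,
-restart() drops whatever is buffered, and reading resumes at the
-stream's current position. Text that was read ahead but not yet scanned
-is lost, and getting back to the beginning means repositioning the
-stream yourself before restarting; with the C interface that would be
-something like rewind(yyin) followed by yyrestart(yyin).

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Toy model, not flex code: reads ahead a chunk at a time, like a scanner buffer.
class ToyScanner {
public:
    explicit ToyScanner(std::size_t chunk) : chunk_(chunk) { assert(chunk > 0); }

    // Like yyrestart(): discard buffered text and adopt `in` at its current
    // position. Nothing here seeks the stream back to the beginning.
    void restart(std::istream* in) {
        in_ = in;
        buf_.clear();
        pos_ = 0;
    }

    // Next input character, refilling the buffer when empty; '\0' at end of input.
    char get() {
        assert(in_ != nullptr);
        if (pos_ == buf_.size()) {
            buf_.assign(chunk_, '\0');
            in_->read(&buf_[0], static_cast<std::streamsize>(chunk_));
            buf_.resize(static_cast<std::size_t>(in_->gcount()));
            pos_ = 0;
            if (buf_.empty()) return '\0';
        }
        return buf_[pos_++];
    }

private:
    std::size_t chunk_;
    std::istream* in_ = nullptr;
    std::string buf_;
    std::size_t pos_ = 0;
};
```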
-
-File: flex.info, Node: unnamed-faq-65, Next: unnamed-faq-66, Prev: unnamed-faq-64, Up: FAQ
-
-unnamed-faq-65
-==============
-
-
- To: hassan@larc.info.uqam.ca (Hassan Alaoui)
- Subject: Re: Need urgent Help
- In-reply-to: Your message of Sat, 20 Dec 1997 19:38:19 PST.
- Date: Sun, 21 Dec 1997 21:30:46 PST
- From: Vern Paxson <vern>
-
- > /usr/lib/yaccpar: In function `int yyparse()':
- > /usr/lib/yaccpar:184: warning: implicit declaration of function `int yylex(...)'
- >
- > ld: Undefined symbol
- > _yylex
- > _yyparse
- > _yyin
-
- This is a known problem with Solaris C++ (and/or Solaris yacc). I believe
- the fix is to explicitly insert some 'extern "C"' statements for the
- corresponding routines/symbols.
-
- Vern
-
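- A sketch of that fix, assuming yyparse, yylex and yyin are defined in
-C-compiled objects and referenced from C++ code: wrapping their
-declarations in extern "C" makes the C++ compiler refer to the
-unmangled C symbol names. The definitions in the second block are
-hypothetical stand-ins so the sketch links by itself; in a real build
-the generated C sources provide them.

```cpp
#include <cassert>
#include <cstdio>

// Declarations a C++ file needs for symbols defined in C-compiled code.
// Without extern "C", C++ name mangling makes the linker look for
// different symbol names and report them as undefined.
extern "C" {
    int yyparse(void);    // from the yacc-generated parser
    int yylex(void);      // from the flex-generated scanner
    extern FILE* yyin;    // scanner input stream
}

// Hypothetical stand-in definitions so this sketch links on its own.
// In a real build, delete these: the generated C files provide them.
extern "C" {
    FILE* yyin = nullptr;
    int yylex(void) { return 0; }          // 0 = end of input
    int yyparse(void) { return yylex(); }  // trivially accepts empty input
}
```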
-
-File: flex.info, Node: unnamed-faq-66, Next: unnamed-faq-67, Prev: unnamed-faq-65, Up: FAQ
-
-unnamed-faq-66
-==============
-
-
- To: mc0307@mclink.it
- Cc: gnu@prep.ai.mit.edu
- Subject: Re: [mc0307@mclink.it: Help request]
- In-reply-to: Your message of Fri, 12 Dec 1997 17:57:29 PST.
- Date: Sun, 21 Dec 1997 22:33:37 PST
- From: Vern Paxson <vern>
-
- > This is my definition for float and integer types:
- > . . .
- > NZD [1-9]
- > ...
- > I've tested my program on other lex versions (on UNIX Sun Solaris and HP
- > UNIX) and it works well, so I think that my definitions are correct.
- > Are there any differences between Lex and Flex?
-
- There are indeed differences, as discussed in the man page. The one
- you are probably running into is that when flex expands a name definition,
- it puts parentheses around the expansion, while lex does not. There's
- an example in the man page of how this can lead to different matching.
- Flex's behavior complies with the POSIX standard (or at least with the
- last POSIX draft I saw).
-
- Vern
-
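- To illustrate the parenthesization difference, the sketch below
-writes out by hand, as ECMAScript std::regex patterns, the two
-expansions of a rule pattern foo{NAME}? with NAME defined as
-[A-Z][A-Z0-9]* (an assumed example; the question's own definitions are
-elided). Under flex the trailing ? applies to the whole parenthesized
-expansion, so plain "foo" matches; under lex's textual substitution it
-binds only to the last item, so at least one letter is required.

```cpp
#include <cassert>
#include <regex>
#include <string>

// flex expands {NAME} as "([A-Z][A-Z0-9]*)", so ? covers the whole definition.
const std::regex flex_style("foo([A-Z][A-Z0-9]*)?");

// lex pastes the text verbatim, giving foo[A-Z][A-Z0-9]*? where ? applies
// only to [A-Z0-9]*. Written here without "*?", which ECMAScript would
// read as a lazy star.
const std::regex lex_style("foo[A-Z]([A-Z0-9]*)?");

// Whole-string match, like a rule that must consume the entire token.
bool matches(const std::regex& re, const std::string& s) {
    return std::regex_match(s, re);
}
```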
-
-File: flex.info, Node: unnamed-faq-67, Next: unnamed-faq-68, Prev: unnamed-faq-66, Up: FAQ
-
-unnamed-faq-67
-==============
-
-
- To: hassan@larc.info.uqam.ca (Hassan Alaoui)
- Subject: Re: Thanks
- In-reply-to: Your message of Mon, 22 Dec 1997 16:06:35 PST.
- Date: Mon, 22 Dec 1997 14:35:05 PST
- From: Vern Paxson <vern>
-
- > Thank you very much for your help. I compile and link well with C++ while
- > declaring 'yylex ...' extern, But a little problem remains. I get a
- > segmentation fault when executing (I linked with the lfl library) while it
- > works well when using LEX instead of flex. Do you have some ideas about the
- > reason for this ?
-
- The one possible reason for this that comes to mind is if you've defined
- yytext as "extern char yytext[]" (which is what lex uses) instead of
- "extern char *yytext" (which is what flex uses). If it's not that, then
- I'm afraid I don't know what the problem might be.
-
- Vern
-
-
-File: flex.info, Node: unnamed-faq-68, Next: unnamed-faq-69, Prev: unnamed-faq-67, Up: FAQ
-
-unnamed-faq-68
-==============
-
-
- To: "Bart Niswonger" <NISWONGR@almaden.ibm.com>
- Subject: Re: flex 2.5: c++ scanners & start conditions
- In-reply-to: Your message of Tue, 06 Jan 1998 10:34:21 PST.
- Date: Tue, 06 Jan 1998 19:19:30 PST
- From: Vern Paxson <vern>
-
- > The problem is that when I do this (using %option c++) start
- > conditions seem to not apply.
-
- The BEGIN macro modifies the yy_start variable. For C scanners, this
- is a static with scope visible through the whole file. For C++ scanners,
- it's a member variable, so it only has visible scope within a member
- function. Your lexbegin() routine is not a member function when you
- build a C++ scanner, so it's not modifying the correct yy_start. The
- diagnostic that indicates this is that you found you needed to add
- a declaration of yy_start in order to get your scanner to compile when
- using C++; instead, the correct fix is to make lexbegin() a member
- function (by deriving from yyFlexLexer).
-
- Vern
-
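- The scoping problem can be reproduced in plain C++. ScannerStandIn is
-a hypothetical stand-in for the C++ scanner class holding a yy_start
-member, and the file-scope yy_start mimics the workaround described in
-the question. The state numbers are arbitrary; flex's real encoding of
-start conditions is not modeled.

```cpp
#include <cassert>

// Hypothetical stand-in for the C++ scanner class: the start state
// lives in a member variable, as it does for C++ scanners.
class ScannerStandIn {
public:
    int start_state() const { return yy_start; }

protected:
    int yy_start = 1;  // what BEGIN modifies inside the scanner
};

// The workaround from the question: a file-scope yy_start lets a
// non-member lexbegin() compile, but it is a different variable.
int yy_start = 1;
void lexbegin_free(int state) { yy_start = state; }

// The fix: make lexbegin() a member of a class derived from the scanner,
// so the name yy_start refers to the scanner's own member.
class MyScanner : public ScannerStandIn {
public:
    void lexbegin(int state) { yy_start = state; }
};
```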
-
-File: flex.info, Node: unnamed-faq-69, Next: unnamed-faq-70, Prev: unnamed-faq-68, Up: FAQ
-
-unnamed-faq-69
-==============
-
-
- To: "Boris Zinin" <boris@ippe.rssi.ru>
- Subject: Re: current position in flex buffer
- In-reply-to: Your message of Mon, 12 Jan 1998 18:58:23 PST.
- Date: Mon, 12 Jan 1998 12:03:15 PST
- From: Vern Paxson <vern>
-
- > The problem is how to determine the current position in flex active
- > buffer when a rule is matched....
-
- You will need to keep track of this explicitly, such as by redefining
- YY_USER_ACTION to count the number of characters matched.
-
- The latest flex release, by the way, is 2.5.4, available from ftp.ee.lbl.gov.
-
- Vern
-
-
-File: flex.info, Node: unnamed-faq-70, Next: unnamed-faq-71, Prev: unnamed-faq-69, Up: FAQ
-
-unnamed-faq-70
-==============
-
-
- To: Bik.Dhaliwal@bis.org
- Subject: Re: Flex question
- In-reply-to: Your message of Mon, 26 Jan 1998 13:05:35 PST.
- Date: Tue, 27 Jan 1998 22:41:52 PST
- From: Vern Paxson <vern>
-
- > That requirement involves knowing
- > the character position at which a particular token was matched
- > in the lexer.
-
- The way you have to do this is by explicitly keeping track of where
- you are in the file, by counting the number of characters scanned
- for each token (available in yyleng). It may prove convenient to
- do this by redefining YY_USER_ACTION, as described in the manual.
-
- Vern
-
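- A sketch of that bookkeeping, assuming a C++ scanner: PositionTracker
-is a hypothetical helper, and a .l file could hook it in through
-YY_USER_ACTION, which flex runs before each matched rule's action.
-Rules that give text back with yyless() or unput() would need extra
-adjustments not shown here.

```cpp
#include <cassert>

// Hypothetical helper holding the counters a YY_USER_ACTION hook would update.
struct PositionTracker {
    long token_start = 0;  // offset of the first character of the current token
    long next_offset = 0;  // offset just past the current token

    // Call once per match with the match length (yyleng in a flex scanner).
    void on_match(int yyleng) {
        assert(yyleng >= 0);
        token_start = next_offset;
        next_offset += yyleng;
    }
};

// In the .l file, for example (assumption):
//   %{
//   PositionTracker track;
//   #define YY_USER_ACTION track.on_match(yyleng);
//   %}
// Actions can then read track.token_start for the current token's offset.
```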
-
-File: flex.info, Node: unnamed-faq-71, Next: unnamed-faq-72, Prev: unnamed-faq-70, Up: FAQ
-
-unnamed-faq-71
-==============
-
-
- To: Vladimir Alexiev <vladimir@cs.ualberta.ca>
- Subject: Re: flex: how to control start condition from parser?
- In-reply-to: Your message of Mon, 26 Jan 1998 05:50:16 PST.
- Date: Tue, 27 Jan 1998 22:45:37 PST
- From: Vern Paxson <vern>
-
- > It seems useful for the parser to be able to tell the lexer about such
- > context dependencies, because then they don't have to be limited to
- > local or sequential context.
-
- One way to do this is to have the parser call a stub routine that's
- included in the scanner's .l file, and consequently that has access to
- BEGIN. The only ugliness is that the parser can't pass in the start
- condition it wants by name, because those names aren't visible to it; but
- if you don't have many such states, then using a different set of names
- doesn't seem like too much of a burden.
-
- While generating a .h file like you suggest is certainly cleaner,
- flex development has come to a virtual stand-still :-(, so a workaround
- like the above is much more pragmatic than waiting for a new feature.
-
- Vern
-
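- A sketch of such a stub, with hypothetical names throughout: the
-parser sees only the LexMode values from a shared header, and
-lexer_set_mode() lives in the .l file where the start conditions are
-visible. So that it compiles without flex,
-scanner_internal::current_state stands in for the state that BEGIN
-changes; in a real .l file each case would be a BEGIN(...) statement.

```cpp
#include <cassert>

// Hypothetical stand-ins for what exists inside the generated scanner:
// start condition numbers and the state that BEGIN sets.
namespace scanner_internal {
enum { INITIAL = 0, COMMENT_SC = 1, HEREDOC_SC = 2 };
int current_state = INITIAL;
}

// Parser-visible names, e.g. in a header shared with the parser (hypothetical).
enum LexMode { LEX_NORMAL, LEX_COMMENT, LEX_HEREDOC };

// Stub routine placed in the .l file, where start conditions are visible.
// In a real scanner each case would be BEGIN(INITIAL), BEGIN(COMMENT_SC), ...
void lexer_set_mode(LexMode mode) {
    using namespace scanner_internal;
    switch (mode) {
    case LEX_NORMAL:  current_state = INITIAL;    break;
    case LEX_COMMENT: current_state = COMMENT_SC; break;
    case LEX_HEREDOC: current_state = HEREDOC_SC; break;
    }
}
```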
-
-File: flex.info, Node: unnamed-faq-72, Next: unnamed-faq-73, Prev: unnamed-faq-71, Up: FAQ
-
-unnamed-faq-72
-==============
-
-
- To: Barbara Denny <denny@3com.com>
- Subject: Re: freebsd flex bug?
- In-reply-to: Your message of Fri, 30 Jan 1998 12:00:43 PST.
- Date: Fri, 30 Jan 1998 12:42:32 PST
- From: Vern Paxson <vern>
-
- > lex.yy.c:1996: parse error before `='
-
- This is the key to identifying the problem. (It may help to pinpoint
- it by using flex -L, so it doesn't generate #line directives in its
- output.) I will bet you heavy money that you have a start condition
- name that is also a variable name, or something like that; flex spits
- out #define's for each start condition name, mapping them to a number,
- so you can wind up with:
-
- %x foo
- %%
- ...
- %%
- void bar()
- {
- int foo = 3;
- }
-
- and the penultimate line will turn into "int 1 = 3" after C preprocessing,
- since flex will put "#define foo 1" in the generated scanner.
-
- Vern
-
-
-File: flex.info, Node: unnamed-faq-73, Next: unnamed-faq-74, Prev: unnamed-faq-72, Up: FAQ
-
-unnamed-faq-73
-==============
-
-
- To: Maurice Petrie <mpetrie@infoscigroup.com>
- Subject: Re: Lost flex .l file
- In-reply-to: Your message of Mon, 02 Feb 1998 14:10:01 PST.
- Date: Mon, 02 Feb 1998 11:15:12 PST
- From: Vern Paxson <vern>
-
- > I am curious as to
- > whether there is a simple way to backtrack from the generated source to
- > reproduce the lost list of tokens we are searching on.
-
- In theory, it's straightforward to go from the DFA representation
- back to a regular-expression representation, since the two are equivalent.
- In practice, a huge headache, because you have to unpack all the tables
- back into a single DFA representation, and then write a program to munch
- on that and translate it into an RE.
-
- Sorry for the less-than-happy news ...
-
- Vern
-
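- For the second half of that job, here is a sketch of one standard
-textbook method, state elimination, assuming the DFA has already been
-unpacked from flex's compressed tables into a plain list of transitions
-(not shown). dfa_to_regex() and its helpers are hypothetical names, and
-symbols are limited to alphanumeric characters because nothing is
-escaped. The result is equivalent to the DFA but is typically far
-larger than, and structurally unlike, the original rules.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>
#include <tuple>
#include <vector>

// Edge label: a regex fragment, or std::nullopt for "no edge".
using Label = std::optional<std::string>;

// Parenthesize a non-empty fragment so it stays atomic when concatenated or starred.
static std::string grp(const std::string& s) { return s.empty() ? s : "(" + s + ")"; }

// Union of an existing label with a new fragment ("" is the empty string).
static std::string unite(const Label& a, const std::string& b) {
    if (!a) return b;
    if (a->empty() && b.empty()) return "";
    if (a->empty()) return grp(b) + "?";
    if (b.empty()) return grp(*a) + "?";
    return grp(*a) + "|" + grp(b);
}

// State elimination. States are 0..n-1; edges are (from, symbol, to).
// Returns std::nullopt if no accepting state is reachable.
Label dfa_to_regex(int n, int start, const std::vector<int>& accepting,
                   const std::vector<std::tuple<int, char, int>>& edges) {
    assert(n > 0 && start >= 0 && start < n);
    const int S = n, F = n + 1, N = n + 2;  // fresh start and final states
    std::vector<std::vector<Label>> R(N, std::vector<Label>(N));
    R[S][start] = std::string();             // empty-string edges in and out
    for (int a : accepting) R[a][F] = std::string();
    for (const auto& [from, sym, to] : edges)
        R[from][to] = unite(R[from][to], std::string(1, sym));

    for (int k = 0; k < n; ++k) {  // remove original states one at a time
        const std::string loop =
            (R[k][k] && !R[k][k]->empty()) ? grp(*R[k][k]) + "*" : "";
        for (int i = 0; i < N; ++i) {
            if (i == k || !R[i][k]) continue;
            for (int j = 0; j < N; ++j) {
                if (j == k || !R[k][j]) continue;
                // Path i -> k (-> k)* -> j becomes a direct edge i -> j.
                R[i][j] = unite(R[i][j], grp(*R[i][k]) + loop + grp(*R[k][j]));
            }
        }
        for (int x = 0; x < N; ++x) R[x][k] = R[k][x] = std::nullopt;
    }
    return R[S][F];
}

// Convenience check: does the generated expression match all of s?
bool regex_accepts(const Label& re, const std::string& s) {
    return re && std::regex_match(s, std::regex(*re));
}
```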
-
-File: flex.info, Node: unnamed-faq-74, Next: unnamed-faq-75, Prev: unnamed-faq-73, Up: FAQ
-
-unnamed-faq-74
-==============
-
-
- To: jimmey@lexis-nexis.com (Jimmey Todd)
- Subject: Re: Flex performance question
- In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
- Date: Thu, 19 Feb 1998 08:48:51 PST
- From: Vern Paxson <vern>
-
- > What I have found, is that the smaller the data chunk, the faster the
- > program executes. This is the opposite of what I expected. Should this be
- > happening this way?
-
- This is exactly what will happen if your input file has embedded NULs.
- From the man page:
-
- A final note: flex is slow when matching NUL's, particularly
- when a token contains multiple NUL's. It's best to write
- rules which match short amounts of text if it's anticipated
- that the text will often include NUL's.
-
- So that's the first thing to look for.
-
- Vern
-
-
-File: flex.info, Node: unnamed-faq-75, Next: unnamed-faq-76, Prev: unnamed-faq-74, Up: FAQ
-
-unnamed-faq-75
-==============
-
-
- To: jimmey@lexis-nexis.com (Jimmey Todd)
- Subject: Re: Flex performance question
- In-reply-to: Your message of Thu, 19 Feb 1998 11:01:17 PST.
- Date: Thu, 19 Feb 1998 15:42:25 PST
- From: Vern Paxson <vern>
-
- So there are several problems.
-
- First, to go fast, you want to match as much text as possible, which
- your scanners don't in the case that what they're scanning is *not*
- a <RN> tag. So you want a rule like:
-
- [^<]+
-
- Second, C++ scanners are particularly slow if they're interactive,
- which they are by default. Using -B speeds it up by a factor of 3-4
- on my workstation.
-
- Third, C++ scanners that use the istream interface are slow, because
- of how poorly implemented istreams are. I built two versions of
- the following scanner:
-
- %%
- .*\n
- .*
- %%
-
- and the C version inhales a 2.5MB file on my workstation in 0.8 seconds.
- The C++ istream version, using -B, takes 3.8 seconds.
-
- Vern
-