author Will Estes <wlestes@users.sourceforge.net> 2002-07-30 15:59:02 +0000
committer Will Estes <wlestes@users.sourceforge.net> 2002-07-30 15:59:02 +0000
commit 8cbd7c94a048e42156617e82471db756448a171e
tree c6dd27864aaddb4c29339b6bfc84dd9a125c20df /faq.texi
parent d30efe4faa8d16f174a21eeb5787d549c153e65c
fix up some fatal bugs in the texinfo of the faq; begin the clean up; remove trailing and leading white space
Diffstat (limited to 'faq.texi')
-rw-r--r-- faq.texi 368
1 files changed, 162 insertions, 206 deletions
diff --git a/faq.texi b/faq.texi
index 7492d4f..ff020f1 100644
--- a/faq.texi
+++ b/faq.texi
@@ -43,12 +43,12 @@
* Can I build nested parsers that work with the same input file?::
* How can I match text only at the end of a file?::
* How can I make REJECT cascade across start condition boundaries?::
-* Why can't I use fast or full tables with interactive mode?::
+* Why cant I use fast or full tables with interactive mode?::
* How much faster is -F or -f than -C?::
-* If I have a simple grammar can't I just parse it with flex?::
-* Why doesn't yyrestart() set the start state back to INITIAL?::
+* If I have a simple grammar cant I just parse it with flex?::
+* Why doesnt yyrestart() set the start state back to INITIAL?::
* How can I match C-style comments?::
-* The '.' isn't working the way I expected.::
+* The period isnt working the way I expected.::
* Can I get the flex manual in another format?::
* Does there exist a "faster" NDFA->DFA algorithm?::
* How does flex compile the DFA so quickly?::
@@ -67,7 +67,7 @@
* I am trying to port code from AT&T lex that uses yysptr and yysbuf.::
* Is there a way to make flex treat NULL like a regular character?::
* Whenever flex can not match the input it says "flex scanner jammed".::
-* Why doesn't flex have non-greedy operators like perl does?::
+* Why doesnt flex have non-greedy operators like perl does?::
* Memory leak - 16386 bytes allocated by malloc.::
* How do I track the byte offset for lseek()?::
* unnamed-faq-16::
@@ -135,14 +135,11 @@
* unnamed-faq-101::
@end menu
-
@node When was flex born?
@unnumberedsec When was flex born?
-When was flex born?
-
Vern Paxson took over
-the Software Tools lex project from Jef Poskanzer in 1982. At that point it
+the @cite{Software Tools} lex project from Jef Poskanzer in 1982. At that point it
was written in Ratfor. Around 1987 or so, Paxson translated it into C, and
a legend was born :-).
@@ -151,7 +148,6 @@ a legend was born :-).
How do I expand \ escape sequences in C-style quoted strings?
-
A key point when scanning quoted strings is that you cannot (easily) write
a single rule that will precisely match the string if you allow things
like embedded escape sequences and newlines. If you try to match strings
@@ -163,7 +159,7 @@ matching non-escaped text, one for matching a single escape, one for
matching an embedded newline, and one for recognizing the end of the
string. Each of these rules is then faced with the question of where to
put its intermediary results. The best solution is for the rules to
-append their local value of yytext to the end of a "string literal"
+append their local value of yytext to the end of a ``string literal''
buffer. A rule like the escape-matcher will append to the buffer the
meaning of the escape sequence rather than the literal text in yytext.
In this way, yytext does not need to be modified at all.
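The buffer-appending approach described above can be sketched in plain C. This is only an illustration, not code from flex or its manual: the names `str_buf`, `str_append`, and `str_append_escape` are hypothetical, the buffer is a fixed size, and only a handful of escapes are handled. Each scanner rule would call one of these helpers from its action instead of modifying yytext.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size "string literal" buffer (illustrative names). */
#define STR_CAP 256
static char str_buf[STR_CAP];
static size_t str_len;

/* Start a new string literal, e.g. when the opening quote is seen. */
static void str_reset(void)
{
    str_len = 0;
    str_buf[0] = '\0';
}

/* Append n bytes of literal text, as a rule matching non-escaped text
 * would do with yytext and yyleng. Returns 0 on success, -1 if the
 * text (plus the terminating NUL) does not fit. */
static int str_append(const char *text, size_t n)
{
    if (n >= STR_CAP - str_len)
        return -1;
    memcpy(str_buf + str_len, text, n);
    str_len += n;
    str_buf[str_len] = '\0';
    return 0;
}

/* Append the meaning of a two-character escape such as "\\n", not its
 * literal text. Unknown escapes fall back to the escaped character. */
static int str_append_escape(const char *esc)
{
    char c;
    switch (esc[1]) {
    case 'n':  c = '\n'; break;
    case 't':  c = '\t'; break;
    case '\\': c = '\\'; break;
    case '"':  c = '"';  break;
    default:   c = esc[1]; break;
    }
    return str_append(&c, 1);
}
```

A rule for the closing quote would then hand `str_buf` to the parser; the text in yytext is never touched.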
@@ -171,15 +167,12 @@ In this way, yytext does not need to be modified at all.
@node Why do flex scanners call fileno if it is not ANSI compatible?
@unnumberedsec Why do flex scanners call fileno if it is not ANSI compatible?
-Why do flex scanners call fileno if it is not ANSI compatible?
-
-
Flex scanners call fileno() in order to get the file descriptor
corresponding to yyin. The file descriptor may be passed to
isatty() or read(), depending upon which %options you specified.
-If your system does not have fileno() support. To get rid of the
+If your system does not have fileno() support, to get rid of the
read() call, do not specify %option read. To get rid of the isatty()
-call, you must specify one of %option always-interactive or
+call, you must specify one of %option always-interactive or
%option never-interactive.
@node Does flex support recursive pattern definitions?
@@ -195,19 +188,16 @@ block "{"({block}|{statement})*"}"
@end verbatim
@end example
-No. You cannot have recursive definitions. The pattern-matching power of
+No. You cannot have recursive definitions. The pattern-matching power of
regular expressions in general (and therefore flex scanners, too) is
limited. In particular, regular expressions cannot "balance" parentheses
to an arbitrary degree. For example, it's impossible to write a regular
expression that matches all strings containing the same number of '@{'s
as '@}'s. For more powerful pattern matching, you need a parser, such
-as GNU bison.
-
-@node How do skip huge chunks of input (tens of megabytes) while using flex?
-@unnumberedsec How do skip huge chunks of input (tens of megabytes) while using flex?
-
-How do skip huge chunks of input (tens of megabytes) while using flex?
+as GNU bison.
+@node How do I skip huge chunks of input (tens of megabytes) while using flex?
+@unnumberedsec How do I skip huge chunks of input (tens of megabytes) while using flex?
Use fseek (or lseek) to position yyin, then call yyrestart().
@@ -216,7 +206,6 @@ Use fseek (or lseek) to position yyin, then call yyrestart().
Flex is not matching my patterns in the same order that I defined them.
-
This is indeed the natural way to expect it to work, however, flex picks the
rule that matches the most text (i.e., the longest possible input string).
This is because flex uses an entirely different matching technique
@@ -250,21 +239,20 @@ identifier rule so it no longer matches "data_". (Of course, you might
also not have the option of changing the input language ...)
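The longest-match behavior can be modeled outside flex with a small C sketch. The two matchers below are hypothetical stand-ins for a keyword rule `data` followed by an identifier rule `[a-z_]+`; they are assumptions for illustration, not flex internals. The selection loop keeps the longest match and, because it compares with a strict `>`, leaves ties with the rule listed first.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the rule: data   (returns match length, 0 if no match) */
static size_t match_data(const char *s)
{
    return strncmp(s, "data", 4) == 0 ? 4 : 0;
}

/* Stand-in for the rule: [a-z_]+ */
static size_t match_ident(const char *s)
{
    size_t n = 0;
    while (islower((unsigned char)s[n]) || s[n] == '_')
        ++n;
    return n;
}

/* Return the index of the winning rule (0 = data, 1 = identifier),
 * or -1 if nothing matches; *len receives the match length. */
static int pick_rule(const char *s, size_t *len)
{
    size_t (*rules[])(const char *) = { match_data, match_ident };
    int best = -1;
    size_t best_len = 0;

    for (int i = 0; i < 2; ++i) {
        size_t n = rules[i](s);
        if (n > best_len) {     /* strict: earlier rule wins a tie */
            best = i;
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}
```

With this model, the input "data_" selects the identifier rule because it matches five characters, while "data " selects the keyword rule because both rules match four characters and the keyword is listed first.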
@node My actions are executing out of order or sometimes not at all.
-@unnumberedsec My actions are executing out of order or sometimes not at all.
+@unnumberedsec My actions are executing out of order or sometimes not at all.
My actions are executing out of order or sometimes not at all. What's
happening?
-
Most likely, you have (in error) placed the opening @samp{@{} of the action
block on a different line than the rule, e.g.,
@example
@verbatim
^(foo|bar)
- { <<<--- WRONG!
+{ <<<--- WRONG!
- }
+}
@end verbatim
@end example
@@ -276,7 +264,7 @@ as follows:
@verbatim
^(foo|bar) { // CORRECT!
- }
+}
@end verbatim
@end example
@@ -287,7 +275,7 @@ How can I have multiple input sources feed into the same scanner at
the same time?
If...
-@itemize @w
+@itemize
@item
your scanner is free of backtracking (verified using flex's -b flag),
@item
@@ -324,7 +312,6 @@ IPC traffic from sockets, and it works fine.
Can I build nested parsers that work with the same input file?
-
This is not going to work without some additional effort. The reason is
that flex block-buffers the input it reads from yyin. This means that the
"outermost" yylex(), when called, will automatically slurp up the first 8K
@@ -340,7 +327,6 @@ exclusive start condition for each.
How can I match text only at the end of a file?
-
There is no way to write a rule which is "match this text, but only if
it comes at the end of the file". You can fake it, though, if you happen
to have a character lying around that you don't allow in your input.
@@ -359,7 +345,6 @@ real EOF next time it's called). Then you could write:
How can I make REJECT cascade across start condition boundaries?
-
You can do this as follows. Suppose you have a start condition A, and
after exhausting all of the possible matches in <A>, you want to try
matches in <INITIAL>. Then you could use the following:
@@ -373,27 +358,24 @@ matches in <INITIAL>. Then you could use the following:
<A>etc.
...
<A>.|\n {
- /* Shortest and last rule in <A>, so
- * cascaded REJECT's will eventually
- * wind up matching this rule. We want
- * to now switch to the initial state
- * and try matching from there instead.
- */
- yyless(0); /* put back matched text */
- BEGIN(INITIAL);
- }
+/* Shortest and last rule in <A>, so
+* cascaded REJECT's will eventually
+* wind up matching this rule. We want
+* to now switch to the initial state
+* and try matching from there instead.
+*/
+yyless(0); /* put back matched text */
+BEGIN(INITIAL);
+}
@end verbatim
@end example
-@node Why can't I use fast or full tables with interactive mode?
+@node Why cant I use fast or full tables with interactive mode?
@unnumberedsec Why can't I use fast or full tables with interactive mode?
-Why can't I use fast or full tables with interactive mode?
-
-
One of the assumptions
-flex makes is that interactive applications are inherently slow (for just
-that reason, they're waiting on a human).
+flex makes is that interactive applications are inherently slow (they're
+waiting on a human after all).
It has to do with how the scanner detects that it must be finished scanning
a token. For interactive scanners, after scanning each character the current
state is looked up in a table (essentially) to see whether there's a chance
@@ -406,36 +388,28 @@ as fast as possible.
Still, it seems reasonable to allow the user to choose to trade off a bit
of performance in this area to gain the corresponding flexibility. There
might be another reason, though, why fast scanners don't support the
-interactive option
+interactive option
@node How much faster is -F or -f than -C?
@unnumberedsec How much faster is -F or -f than -C?
How much faster is -F or -f than -C?
-
Much faster (factor of 2-3).
-@node If I have a simple grammar can't I just parse it with flex?
+@node If I have a simple grammar cant I just parse it with flex?
@unnumberedsec If I have a simple grammar can't I just parse it with flex?
-If I have a simple grammar, can't I just parse it with flex?
-
-
Is your grammar recursive? That's almost always a sign that you're
better off using a parser/scanner rather than just trying to use a scanner
alone.
-@node Why doesn't yyrestart() set the start state back to INITIAL?
+@node Why doesnt yyrestart() set the start state back to INITIAL?
@unnumberedsec Why doesn't yyrestart() set the start state back to INITIAL?
-Why doesn't yyrestart() set the start state back to INITIAL?
-
-
-
There are two reasons. The first is that there might
be programs that rely on the start state not changing across file changes.
The second is that with flex 2.4, use of yyrestart() is no longer required,
-so fixing the problem there doesn't solve the more general problem.
+so fixing the problem there doesn't solve the more general problem.
@node How can I match C-style comments?
@unnumberedsec How can I match C-style comments?
@@ -458,12 +432,11 @@ or, worse, this:
@end verbatim
@end example
-
The above rules will eat too much input, and blow up on things like:
@example
@verbatim
- /* a comment */ do_my_thing( "oops */" );
+/* a comment */ do_my_thing( "oops */" );
@end verbatim
@end example
@@ -472,22 +445,20 @@ Here is one way which allows you to track line information:
@example
@verbatim
<INITIAL>{
- "/*" BEGIN(IN_COMMENT);
+"/*" BEGIN(IN_COMMENT);
}
<IN_COMMENT>{
- "*/" BEGIN(INITIAL);
- [^*\n]+ // eat comment in chunks
- "*" // eat the lone star
- \n yylineno++;
+"*/" BEGIN(INITIAL);
+[^*\n]+ // eat comment in chunks
+"*" // eat the lone star
+\n yylineno++;
}
@end verbatim
@end example
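For comparison, the same state-machine idea can be written as a hand-rolled C function. This is a sketch under stated assumptions, not flex output: `skip_c_comment` is a hypothetical helper that takes a NUL-terminated string and a line counter. Like the IN_COMMENT rules, it stops at the first closing delimiter and counts newlines along the way.

```c
#include <stddef.h>

/* If src begins with a C-style comment, return the number of bytes it
 * occupies, including both delimiters. Return 0 if src does not start
 * with a comment or the comment is unterminated. Newlines seen while
 * scanning are added to *lineno (also in the unterminated case). */
static size_t skip_c_comment(const char *src, int *lineno)
{
    size_t i;

    if (src[0] != '/' || src[1] != '*')
        return 0;

    i = 2;                              /* just past the opening delimiter */
    while (src[i] != '\0') {
        if (src[i] == '*' && src[i + 1] == '/')
            return i + 2;               /* first closing delimiter ends it */
        if (src[i] == '\n')
            ++*lineno;
        ++i;
    }
    return 0;                           /* ran off the end: unterminated */
}
```

Because the scan ends at the first closing delimiter, a later one inside a string literal on the same line is left for the ordinary rules to handle.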
-@node The '.' isn't working the way I expected.
+@node The period isnt working the way I expected.
@unnumberedsec The '.' isn't working the way I expected.
-The '.' (dot) isn't working the way I expected.
-
Here are some tips for using @samp{.}:
@itemize
@@ -511,7 +482,6 @@ If you really want to match ANY character, including newlines, then use @code{(.
Finally, if you want to match a literal @samp{.} (a period), then use [.] or "."
@end itemize
-
@node Can I get the flex manual in another format?
@unnumberedsec Can I get the flex manual in another format?
@@ -522,7 +492,7 @@ You can use the "texi2*" tools to convert the manual to any format
you desire (e.g., @samp{texi2html}).
@node Does there exist a "faster" NDFA->DFA algorithm?
-@unnumberedsec Does there exist a "faster" NDFA->DFA algorithm?
+@unnumberedsec Does there exist a "faster" NDFA->DFA algorithm?
Does there exist a "faster" NDFA->DFA algorithm? Most standard texts (e.g.,
Aho), imply that NDFA->DFA can take exponential time, since there are
@@ -559,8 +529,7 @@ state can be done very quickly, by first comparing hash values.
How can I use more than 8192 rules?
-
-Flex is compiled with an upper limit of 8192 rules per scanner.
+Flex is compiled with an upper limit of 8192 rules per scanner.
If you need more than 8192 rules in your scanner, you'll have to recompile flex
with the following changes in flexdef.h:
@@ -583,7 +552,6 @@ is the best way to solve your problem.
How do I abandon a file in the middle of a scan and switch to a new file?
-
Just call yyrestart(newfile). Be sure to reset the start state if you want a
"fresh" start, since yyrestart does NOT reset the start state back to INITIAL.
@@ -599,13 +567,13 @@ can add to the beginning of your rules section:
@example
@verbatim
%%
- /* Must be indented! */
- static int did_init = 0;
+/* Must be indented! */
+static int did_init = 0;
- if ( ! did_init ){
- do_my_init();
- did_init = 1;
- }
+if ( ! did_init ){
+do_my_init();
+did_init = 1;
+}
@end verbatim
@end example
@@ -614,7 +582,6 @@ can add to the beginning of your rules section:
How do I execute code at termination (i.e., only after the last scan?)
-
You can specify an action for the <<EOF>> rule.
@node Where else can I find help?
@unnumberedsec Where else can I find help?
@@ -639,7 +606,7 @@ I get an error about undefined yywrap().
You must supply a yywrap() function of your own, or link to libfl.a
(which provides one), or use
- %option noyywrap
+%option noyywrap
in your source to say you don't want a yywrap() function.
See the manual page for more details concerning yywrap().
@@ -684,31 +651,30 @@ However, you can do this using multiple input buffers.
@verbatim
%%
macro/[a-z]+ {
- /* Saw the macro "macro" followed by extra stuff. */
- main_buffer = YY_CURRENT_BUFFER;
- expansion_buffer = yy_scan_string(expand(yytext));
- yy_switch_to_buffer(expansion_buffer);
- }
+/* Saw the macro "macro" followed by extra stuff. */
+main_buffer = YY_CURRENT_BUFFER;
+expansion_buffer = yy_scan_string(expand(yytext));
+yy_switch_to_buffer(expansion_buffer);
+}
<<EOF>> {
- if ( expansion_buffer )
- {
- // We were doing an expansion, return to where
- // we were.
- yy_switch_to_buffer(main_buffer);
- yy_delete_buffer(expansion_buffer);
- expansion_buffer = 0;
- }
- else
- yyterminate();
- }
+if ( expansion_buffer )
+{
+// We were doing an expansion, return to where
+// we were.
+yy_switch_to_buffer(main_buffer);
+yy_delete_buffer(expansion_buffer);
+expansion_buffer = 0;
+}
+else
+yyterminate();
+}
@end verbatim
@end example
You probably will want a stack of expansion buffers to allow nested macros.
From the above though hopefully the idea is clear.
-
@node How can I build a two-pass scanner?
@unnumberedsec How can I build a two-pass scanner?
@@ -726,7 +692,6 @@ tree, but the performance hit for the latter is usually an order of magnitude
smaller, since everything is already classified, in binary format, and
residing in memory.
-
@node How do I match any string not matched in the preceding rules?
@unnumberedsec How do I match any string not matched in the preceding rules?
@@ -762,7 +727,6 @@ what they're doing, and then replace input() with an appropriate definition of
YY_INPUT (see the flex man page). You shouldn't need to (and must not) replace
flex's unput() function.
-
@node Is there a way to make flex treat NULL like a regular character?
@unnumberedsec Is there a way to make flex treat NULL like a regular character?
@@ -771,7 +735,6 @@ Is there a way to make flex treat NULL like a regular character?
Yes, \0 and \x00 should both do the trick. Perhaps you have an ancient
version of flex. The latest release is version @value{VERSION}.
-
@node Whenever flex can not match the input it says "flex scanner jammed".
@unnumberedsec Whenever flex can not match the input it says "flex scanner jammed".
@@ -792,14 +755,12 @@ e.g.,
See %option default for more information.
-@node Why doesn't flex have non-greedy operators like perl does?
+@node Why doesnt flex have non-greedy operators like perl does?
@unnumberedsec Why doesn't flex have non-greedy operators like perl does?
-Why doesn't flex have non-greedy operators like perl does?
-
A DFA can do a non-greedy match by stopping
the first time it enters an accepting state, instead of consuming input until
-it determines that no further matching is possible (a "jam" state). This
+it determines that no further matching is possible (a ``jam'' state). This
is actually easier to implement than longest leftmost match (which flex does).
But it's also much less useful than longest leftmost match. In general,
@@ -807,7 +768,7 @@ when you find yourself wishing for non-greedy matching, that's usually a
sign that you're trying to make the scanner do some parsing. That's
generally the wrong approach, since it lacks the power to do a decent job.
Better is to either introduce a separate parser, or to split the scanner
-into multiple scanners using (exclusive) start conditions.
+into multiple scanners using (exclusive) start conditions.
You might have
a separate start state once you've seen the BEGIN. In that state, you
@@ -816,7 +777,6 @@ state), and perhaps (.|\n) to get a single character within the chunk ...
This approach also has much better error-reporting properties.
-
@node Memory leak - 16386 bytes allocated by malloc.
@unnumberedsec Memory leak - 16386 bytes allocated by malloc.
@anchor{faq-memory-leak}
@@ -827,10 +787,10 @@ on.
The leak is about 16426 bytes. That is, (8192 * 2 + 2) for the read-buffer, and
about 40 for struct yy_buffer_state (depending upon alignment). The leak is in
the non-reentrant C scanner only (NOT in the reentrant scanner, NOT in the C++
-scanner). Since flex doesn't know when you are done, the buffer is never freed.
+scanner). Since flex doesn't know when you are done, the buffer is never freed.
However, the leak won't multiply since the buffer is reused no matter how many
-times you call yylex().
+times you call yylex().
If you want to reclaim the memory when you are completely done scanning, then
you might try this:
@@ -853,7 +813,7 @@ situation. It is possible that some other globals may need resetting as well.
@verbatim
> We thought that it would be possible to have this number through the
> evaluation of the following expression:
->
+>
> seek_position = (no_buffers)*YY_READ_BUF_SIZE + yy_c_buf_p - yy_current_buffer->yy_ch_buf
@end verbatim
@end example
@@ -888,7 +848,7 @@ In-reply-to: Your message of Thu, 08 Dec 94 13:10:58 EST.
Date: Wed, 14 Dec 94 16:40:47 PST
From: Vern Paxson <vern>
-> We'd like to override the provided LexerInput() and LexerOutput()
+> We'd like to override the provided LexerInput() and LexerOutput()
> functions, but we'd like to *not* use iostreams. Instead, we'd like
> to use some of our own I/O classes. Is this possible?
@@ -914,10 +874,10 @@ patterns?
In the example below, we want to skip over characters until we see the phrase
"endskip". The following will @emph{NOT} work correctly (do you see why not?)
-
+
@example
@verbatim
- /* INCORRECT SCANNER */
+/* INCORRECT SCANNER */
%x SKIP
%%
<INITIAL>startskip BEGIN(SKIP);
@@ -975,7 +935,7 @@ Date: Wed, 18 Sep 96 10:51:02 PDT
From: Vern Paxson <vern>
[Note, the most recent flex release is 2.5.4, which you can get from
- ftp.ee.lbl.gov. It has bug fixes over 2.5.2 and 2.5.3.]
+ftp.ee.lbl.gov. It has bug fixes over 2.5.2 and 2.5.3.]
> 1. Using the pattern
> ([Ff](oot)?)?[Nn](ote)?(\.)?
@@ -998,10 +958,10 @@ preferable.
> 3. I have a pattern that look like this:
> pats {p1}|{p2}|{p3}|...|{p50} (50 patterns ORd)
->
+>
> running yet another complicated program that includes the following rule:
> <snext>{and}/{no4}{bb}{pats}
->
+>
> gets me to "too complicated - over 32,000 states"...
I can't tell from this example whether the trailing context is variable-length
@@ -1021,11 +981,11 @@ this case '[Ff]oot' is preferred to '(F|f)oot'.
> 4. I changed a rule that looked like this:
> <snext8>{and}{bb}/{ROMAN}[^A-Za-z] { BEGIN...
->
+>
> to the next 2 rules:
> <snext8>{and}{bb}/{ROMAN}[A-Za-z] { ECHO;}
> <snext8>{and}{bb}/{ROMAN} { BEGIN...
->
+>
> Again, I understand the using [^...] will cause a great performance loss
Actually, it doesn't cause any sort of performance loss. It's a surprising
@@ -1082,14 +1042,13 @@ simplify your scanner - those are certainly preferable!
Vern
-
To increase the 32K limit (on a machine with 32 bit integers), you increase
the magnitude of the following in flexdef.h:
- #define JAMSTATE -32766 /* marks a reference to the state that always jams */
- #define MAXIMUM_MNS 31999
- #define BAD_SUBSCRIPT -32767
- #define MAX_SHORT 32700
+#define JAMSTATE -32766 /* marks a reference to the state that always jams */
+#define MAXIMUM_MNS 31999
+#define BAD_SUBSCRIPT -32767
+#define MAX_SHORT 32700
Adding a 0 or two after each should do the trick.
@end verbatim
@@ -1106,9 +1065,9 @@ In-reply-to: Your message of Thu, 03 Oct 1996 17:24:04 PDT.
Date: Fri, 04 Oct 1996 11:42:18 PDT
From: Vern Paxson <vern>
-> I assume as long as my *.l file defines the
-> range of expected character code values (in octal format), flex will
-> scan the file and read multi-byte characters correctly. But I have no
+> I assume as long as my *.l file defines the
+> range of expected character code values (in octal format), flex will
+> scan the file and read multi-byte characters correctly. But I have no
> confidence in this assumption.
Your lack of confidence is justified - this won't work.
@@ -1183,14 +1142,14 @@ That said ...
> #: main.c:545
> msgid " %d protos created\n"
->
+>
> Does proto mean prototype?
Yes - prototypes of state compression tables.
> #: main.c:539
> msgid " %d/%d (peak %d) template nxt-chk entries created\n"
->
+>
> Here I'm mainly puzzled by 'nxt-chk'. I guess it means 'next-check'. (?)
> However, 'template next-check entries' doesn't make much sense to me. To be
> able to find a good translation I need to know a little bit more about it.
@@ -1208,7 +1167,7 @@ way to compress the tables.
> #: main.c:533
> msgid " %d/%d base-def entries created\n"
->
+>
> The same problem here for 'base-def'.
See above.
@@ -1228,14 +1187,14 @@ In-reply-to: Your message of Wed, 13 Nov 1996 17:28:38 PST.
Date: Wed, 13 Nov 1996 19:51:54 PST
From: Vern Paxson <vern>
-> "unput()" them to input flow, question occurs. If I do this after I scan
-> a carriage, the variable "yy_current_buffer->yy_at_bol" is changed. That
-> means the carriage flag has gone.
+> "unput()" them to input flow, question occurs. If I do this after I scan
+> a carriage, the variable "yy_current_buffer->yy_at_bol" is changed. That
+> means the carriage flag has gone.
You can control this by calling yy_set_bol(). It's described in the manual.
-> And if in pre-reading it goes to the end of file, is anything done
-> to control the end of curren buffer and end of file?
+> And if in pre-reading it goes to the end of file, is anything done
+> to control the end of curren buffer and end of file?
No, there's no way to put back an end-of-file.
@@ -1259,8 +1218,8 @@ In-reply-to: Your message of Mon, 18 Nov 1996 09:45:02 PST.
Date: Mon, 18 Nov 1996 10:41:34 PST
From: Vern Paxson <vern>
-> I am not able to use the start condition scope and to use the | (OR) with
-> rules having start conditions.
+> I am not able to use the start condition scope and to use the | (OR) with
+> rules having start conditions.
The problem is that if you use '|' as a regular expression operator, for
example "a|b" meaning "match either 'a' or 'b'", then it must *not* have
@@ -1328,7 +1287,7 @@ From: Vern Paxson <vern>
> In my lexer code, i have the line :
> ^\*.* { }
->
+>
> Thus all lines starting with an astrix (*) are comment lines.
> This does not work !
@@ -1364,7 +1323,7 @@ Date: Wed, 27 Nov 1996 10:56:25 PST
From: Vern Paxson <vern>
> Organization(s)?/[a-z]
->
+>
> This matched "Organizations" (looking in debug mode, the trailing s
> was matched with trailing context instead of the optional (s) in the
> end of the word.
@@ -1409,10 +1368,10 @@ sometimes find there way to me, but some may drop between the cracks.
This is already mentioned in the manual:
- Finally, here's an example of how to match C-style quoted
- strings using exclusive start conditions, including expanded
- escape sequences (but not including checking for a string
- that's too long):
+Finally, here's an example of how to match C-style quoted
+strings using exclusive start conditions, including expanded
+escape sequences (but not including checking for a string
+that's too long):
The reason for not doing the overflow checking is that it will needlessly
clutter up an example whose main purpose is just to demonstrate how to
@@ -1492,11 +1451,11 @@ From: Vern Paxson <vern>
> #define YY_DECL int yylex (YYSTYPE *lvalp, struct parser_control
> *parm)
->
+>
> I have been trying to get this to work as a C++ scanner, but it does
> not appear to be possible (warning that it matches no declarations in
> yyFlexLexer, or something like that).
->
+>
> Is this supposed to be possible, or is it being worked on (I DID
> notice the comment that scanner classes are still experimental, so I'm
> not too hopeful)?
@@ -1521,7 +1480,7 @@ Date: Fri, 05 Sep 1997 10:01:54 PDT
From: Vern Paxson <vern>
> In that example you show how to count comment lines when using
-> C style /* ... */ comments. My question is, shouldn't you take into
+> C style /* ... */ comments. My question is, shouldn't you take into
> account a scenario where end of a comment marker occurs inside
> character or string literals?
@@ -1590,9 +1549,9 @@ In-reply-to: Your message of Fri, 12 Sep 1997 15:02:28 PDT.
Date: Fri, 12 Sep 1997 10:31:50 PDT
From: Vern Paxson <vern>
-> before I start beavering away I wonder if you know of any
-> place/libraries for flex
-> desciption files that might already do this or give me a head start ?
+> before I start beavering away I wonder if you know of any
+> place/libraries for flex
+> desciption files that might already do this or give me a head start ?
Unfortunately, no, I don't. You might try asking on comp.compilers.
@@ -1619,11 +1578,11 @@ From: Vern Paxson <vern>
> #else
> it \<I\>
> #endif
->
+>
> Now, I can't add states for these, as I have already too many states
> and the program is very complicated, and I won't be able to handle
> 10 or 20 more states.
->
+>
> Any trick to do this ?
You might try using m4, or the C preprocessor plus a sed script to
@@ -1689,17 +1648,17 @@ From: Vern Paxson <vern>
> I took a quick look into the flex-sources and altered some #defines in
> flexdefs.h:
->
-> #define INITIAL_MNS 64000
-> #define MNS_INCREMENT 1024000
+>
+> #define INITIAL_MNS 64000
+> #define MNS_INCREMENT 1024000
> #define MAXIMUM_MNS 64000
The things to fix are to add a couple of zeroes to:
- #define JAMSTATE -32766 /* marks a reference to the state that always jams */
- #define MAXIMUM_MNS 31999
- #define BAD_SUBSCRIPT -32767
- #define MAX_SHORT 32700
+#define JAMSTATE -32766 /* marks a reference to the state that always jams */
+#define MAXIMUM_MNS 31999
+#define BAD_SUBSCRIPT -32767
+#define MAX_SHORT 32700
and, if you get complaints about too many rules, make the following change too:
@@ -1724,12 +1683,12 @@ From: Vern Paxson <vern>
> stdin_handle = YY_CURRENT_BUFFER;
> ifstream fin( "aFile" );
> yy_switch_to_buffer( yy_create_buffer( fin, YY_BUF_SIZE ) );
->
+>
> What I'm wanting to do, is pass the contents of a file thru one set
> of rules and then pass stdin thru another set... It works great if, I
> don't use the C++ classes. But since everything else that I'm doing is
> in C++, I thought I'd be consistent.
->
+>
> The problem is that 'yy_create_buffer' is expecting an istream* as it's
> first argument (as stated in the man page). However, fin is a ifstream
> object. Any ideas on what I might be doing wrong? Any help would be
@@ -1786,11 +1745,11 @@ From: Vern Paxson <vern>
> /usr/lib/yaccpar: In function `int yyparse()':
> /usr/lib/yaccpar:184: warning: implicit declaration of function `int yylex(...)'
->
-> ld: Undefined symbol
-> _yylex
-> _yyparse
-> _yyin
+>
+> ld: Undefined symbol
+> _yylex
+> _yyparse
+> _yyin
This is a known problem with Solaris C++ (and/or Solaris yacc). I believe
the fix is to explicitly insert some 'extern "C"' statements for the
@@ -1896,7 +1855,7 @@ In-reply-to: Your message of Mon, 12 Jan 1998 18:58:23 PST.
Date: Mon, 12 Jan 1998 12:03:15 PST
From: Vern Paxson <vern>
-> The problem is how to determine the current position in flex active
+> The problem is how to determine the current position in flex active
> buffer when a rule is matched....
You will need to keep track of this explicitly, such as by redefining
@@ -2011,7 +1970,7 @@ From: Vern Paxson <vern>
> I am curious as to
> whether there is a simple way to backtrack from the generated source to
-> reproduce the lost list of tokens we are searching on.
+> reproduce the lost list of tokens we are searching on.
In theory, it's straight-forward to go from the DFA representation
back to a regular-expression representation - the two are isomorphic.
@@ -2043,10 +2002,10 @@ From: Vern Paxson <vern>
This is exactly what will happen if your input file has embedded NULs.
From the man page:
- A final note: flex is slow when matching NUL's, particularly
- when a token contains multiple NUL's. It's best to write
- rules which match short amounts of text if it's anticipated
- that the text will often include NUL's.
+A final note: flex is slow when matching NUL's, particularly
+when a token contains multiple NUL's. It's best to write
+rules which match short amounts of text if it's anticipated
+that the text will often include NUL's.
So that's the first thing to look for.
@@ -2104,8 +2063,8 @@ In-reply-to: Your message of Wed, 03 Jun 1998 11:26:22 PDT.
Date: Wed, 03 Jun 1998 10:22:26 PDT
From: Vern Paxson <vern>
-> I am researching the Y2K problem with General Electric R&D
-> and need to know if there are any known issues concerning
+> I am researching the Y2K problem with General Electric R&D
+> and need to know if there are any known issues concerning
> the above mentioned software and Y2K regardless of version.
There shouldn't be, all it ever does with the date is ask the system
@@ -2157,12 +2116,12 @@ From: Vern Paxson <vern>
> alpha [A-Za-z]
> dig [0-9]
> %%
->
+>
> Now you'd expect mylineno to be a member of each instance of class
> yyFlexLexer, but is this the case? A look at the lex.yy.cc file seems to
> indicate otherwise; unless I am missing something the declaration of
> mylineno seems to be outside any class scope.
->
+>
> How will this work if I want to run a multi-threaded application with each
> thread creating a FlexLexer instance?
@@ -2184,7 +2143,7 @@ Date: Tue, 04 Aug 1998 22:28:45 PDT
From: Vern Paxson <vern>
> Vern Paxson,
->
+>
> I followed your advice, posted on Usenet bu you, and emailed to me
> personally by you, on how to overcome the 32K states limit. I'm running
> on Linux machines.
@@ -2194,7 +2153,7 @@ From: Vern Paxson <vern>
> #define MAXIMUM_MNS 319990
> #define BAD_SUBSCRIPT -327670
> #define MAX_SHORT 327000
->
+>
> and compiled.
> All looked fine, including check and bigcheck, so I installed.
@@ -2280,13 +2239,13 @@ Content-Transfer-Encoding: 7bit
Hi Vern,
Yesterday, I encountered a strange problem: I use the macro processor m4
-to include some lengthy lists into a .l file. Following is a flex macro
+to include some lengthy lists into a .l file. Following is a flex macro
definition that causes some serious pain in my neck:
AUTHOR ("A. Boucard / L. Boucard"|"A. Dastarac / M. Levent"|"A.Boucaud / L.Boucaud"|"Abderrahim Lamchichi"|"Achmat Dangor"|"Adeline Toullier"|"Adewale Maja-Pearce"|"Ahmed Ziri"|"Akram Ellyas"|"Alain Bihr"|"Alain Gresh"|"Alain Guillemoles"|"Alain Joxe"|"Alain Morice"|"Alain Renon"|"Alain Zecchini"|"Albert Memmi"|"Alberto Manguel"|"Alex De Waal"|"Alfonso Artico"| [...])
The complete list contains about 10kB. When I try to "flex" this file
-(on a Solaris 2.6 machine, using a modified flex 2.5.4 (I only increased
+(on a Solaris 2.6 machine, using a modified flex 2.5.4 (I only increased
some of the predefined values in flexdefs.h) I get the error:
myflex/flex -8 sentag.tmp.l
@@ -2298,11 +2257,11 @@ really means "/" and not "trailing context". Furthermore, I tried to
escape the slashes with backslashes, but with no use, the same error message
appeared when flexing the code.
-Do you have an idea what's going on here?
+Do you have an idea what's going on here?
Greetings from Germany,
Georg
---
+--
Georg Rehm georg@cl-ki.uni-osnabrueck.de
Institute for Semantic Information Processing, University of Osnabrueck, FRG
@end verbatim
@@ -2329,7 +2288,7 @@ removing spaces would do the same thing.
The fix is to either rethink how come you're using such a big macro and
perhaps there's another/better way to do it; or to rebuild flex's own
-scan.c with a larger value for
+scan.c with a larger value for
#define YY_BUF_SIZE 16384
@@ -2349,12 +2308,12 @@ Date: Sat, 05 Sep 1998 00:59:49 PDT
From: Vern Paxson <vern>
> %%
->
+>
> "TEST1\n" { fprintf(stderr, "TEST1\n"); yyless(5); }
> ^\n { fprintf(stderr, "empty line\n"); }
> . { }
> \n { fprintf(stderr, "new line\n"); }
->
+>
> %%
> -- input ---------------------------------------
> TEST1
@@ -2399,7 +2358,7 @@ From: Vern Paxson <vern>
> trying to make my scanner restart with a new file after my parser stops
> with a parse error. When my compiler restarts, the parser always
> receives the token after the token (in the old file!) that caused the
-> parser error.
+> parser error.
I suspect the problem is that your parser has read ahead in order
to attempt to resolve an ambiguity, and when it's restarted it picks
@@ -2516,10 +2475,10 @@ From: Vern Paxson <vern>
Increase the definitions in flexdef.h for:
- #define JAMSTATE -32766 /* marks a reference to the state that always j
+#define JAMSTATE -32766 /* marks a reference to the state that always j
ams */
- #define MAXIMUM_MNS 31999
- #define BAD_SUBSCRIPT -32767
+#define MAXIMUM_MNS 31999
+#define BAD_SUBSCRIPT -32767
recompile everything, and it should all work.
@@ -2599,11 +2558,11 @@ Date: Tue, 15 Jun 1999 08:55:43 -0700
From: "Aki Niimura" <neko@my-deja.com>
Message-ID: <KNONDOHDOBGAEAAA@my-deja.com>
Mime-Version: 1.0
-Cc:
+Cc:
X-Sent-Mail: on
-Reply-To:
+Reply-To:
X-Mailer: MailCity Service
-Subject: A question on flex C++ scanner
+Subject: A question on flex C++ scanner
X-Sender-Ip: 12.72.207.61
Organization: My Deja Email (http://www.my-deja.com:80)
Content-Type: text/plain; charset=us-ascii
@@ -2649,8 +2608,6 @@ Your response would be highly appreciated.
Best regards,
Aki Niimura
-
-
--== Sent via Deja.com http://www.deja.com/ ==--
Share what you know. Learn what you don't.
@end verbatim
@@ -2662,7 +2619,7 @@ Share what you know. Learn what you don't.
@example
@verbatim
To: neko@my-deja.com
-Subject: Re: A question on flex C++ scanner
+Subject: Re: A question on flex C++ scanner
In-reply-to: Your message of Tue, 15 Jun 1999 08:55:43 PDT.
Date: Tue, 15 Jun 1999 09:04:24 PDT
From: Vern Paxson <vern>
@@ -2750,10 +2707,10 @@ Date: Thu, 08 Jul 1999 08:20:39 PDT
From: Vern Paxson <vern>
> I was hoping you could help me with my problem.
->
+>
> I tried compiling (gnu)flex on a Solaris 2.4 machine
> but when I ran make (after configure) I got an error.
->
+>
> --------------------------------------------------------------
> gcc -c -I. -I. -g -O parse.c
> ./flex -t -p ./scan.l >scan.c
@@ -2761,14 +2718,14 @@ From: Vern Paxson <vern>
> *** Error code 1
> make: Fatal error: Command failed for target `scan.c'
> -------------------------------------------------------------
->
-> What's strange to me is that I'm only
-> trying to install flex now. I then edited the Makefile to
+>
+> What's strange to me is that I'm only
+> trying to install flex now. I then edited the Makefile to
> and changed where it says "FLEX = flex" to "FLEX = lex"
> ( lex: the native Solaris one ) but then it complains about
-> the "-p" option. Is there any way I can compile flex without
+> the "-p" option. Is there any way I can compile flex without
> using flex or lex?
->
+>
> Thanks so much for your time.
You managed to step on the bootstrap sequence, which first copies
@@ -2842,7 +2799,7 @@ From: Vern Paxson <vern>
Well, your problem is the
- switch (yybgin-yysvec-1) { /* witchcraft */
+switch (yybgin-yysvec-1) { /* witchcraft */
at the beginning of lex rules. "witchcraft" == "non-portable". It's
assuming knowledge of the AT&T lex's internal variables.
@@ -2895,7 +2852,7 @@ From: Vern Paxson <vern>
> However, I do not use unput anywhere. I do use self-referencing
> rules like this:
->
+>
> UnaryExpr ({UnionExpr})|("-"{UnaryExpr})
You can't do this - flex is *not* a parser like yacc (which does indeed
@@ -2921,7 +2878,7 @@ If this is exactly your program:
> digit [0-9]
> digits {digit}+
> whitespace [ \t\n]+
->
+>
> %%
> "[" { printf("open_brac\n");}
> "]" { printf("close_brac\n");}
@@ -2935,4 +2892,3 @@ then the problem is that the last rule needs to be "{whitespace}" !
Vern
@end verbatim
@end example
-