author Vern Paxson <vern@ee.lbl.gov> 1993-11-10 10:06:51 +0000
committer Vern Paxson <vern@ee.lbl.gov> 1993-11-10 10:06:51 +0000
commit fd48fa42d0b3c457e90842fa572edab21087d55b
tree dd19ee1d26d8f0f6cd51ab02556c97876c67dae5 /flex.1
parent 31395609cae40e75df0d326fdd03b5c830be7de3
2.4 documentation
Diffstat (limited to 'flex.1')
-rw-r--r-- flex.1 853
1 files changed, 665 insertions, 188 deletions
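Illustrative note (not part of the commit): the documentation added below warns that, under %pointer, calls to input() and unput() destroy the present contents of yytext. One way to cope is to copy the token before making such calls. The following is a minimal C sketch of that idea, not flex's own code; save_token is a hypothetical helper name, and its two arguments stand in for yytext and yyleng.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not a flex API): return a heap-allocated,
 * NUL-terminated copy of the first `leng` bytes of `text`.
 * memcpy() is used with an explicit length rather than strcpy()
 * because a matched token may itself contain NUL bytes.
 * Returns NULL if memory cannot be allocated.
 */
char *save_token(const char *text, size_t leng)
{
    char *copy;

    assert(text != NULL);

    copy = malloc(leng + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, text, leng);
    copy[leng] = '\0';
    return copy;
}
```

In a rule action this might be called as save_token(yytext, yyleng) before the first unput(); the caller owns the copy and releases it with free().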
diff --git a/flex.1 b/flex.1
index b5dad63..3d12fb9 100644
--- a/flex.1
+++ b/flex.1
@@ -1,9 +1,9 @@
-.TH FLEXDOC 1 "October 1993" "Version 2.4"
+.TH FLEXDOC 1 "November 1993" "Version 2.4"
.SH NAME
flexdoc \- documentation for flex, fast lexical analyzer generator
.SH SYNOPSIS
.B flex
-.B [\-bcdfinpstvFILT8 \-C[efmF] \-Sskeleton]
+.B [\-abcdfhinpstvwBFILTV78+ \-C[efmF] \-Pprefix \-Sskeleton]
.I [filename ...]
.SH DESCRIPTION
.I flex
@@ -311,6 +311,7 @@ expressions. These are:
<s1,s2,s3>r
same, but in any of start conditions s1,
s2, or s3
+ <*>r an r in any start condition, even an exclusive one.
<<EOF>> an end-of-file
@@ -318,6 +319,10 @@ expressions. These are:
an end-of-file when in start condition s1 or s2
.fi
+Note that inside of a character class, all regular expression operators
+lose their special meaning except escape ('\\') and the character class
+operators, '-', ']', and, at the beginning of the class, '^'.
+.PP
The regular expressions listed above are grouped according to
precedence, from highest precedence at the top to lowest at the bottom.
Those grouped together have equal precedence. For example,
@@ -362,9 +367,8 @@ characters explicitly present in the negated character class
(e.g., "[^A-Z\\n]"). This is unlike how many other regular
expression tools treat negated character classes, but unfortunately
the inconsistency is historically entrenched.
-Matching newlines means that a pattern like [^"]* can match an entire
-input (overflowing the scanner's input buffer) unless there's another
-quote in the input.
+Matching newlines means that a pattern like [^"]* can match the entire
+input unless there's another quote in the input.
.IP -
A rule can have at most one instance of trailing context (the '/' operator
or the '$' operator). The start condition, '^', and "<<EOF>>" patterns
@@ -436,6 +440,92 @@ input is:
.fi
which generates a scanner that simply copies its input (one character
at a time) to its output.
+.PP
+Note that
+.B yytext
+can be defined in two different ways: either as a character
+.I pointer
+or as a character
+.I array.
+You can control which definition
+.I flex
+uses by including one of the special directives
+.B %pointer
+or
+.B %array
+in the first (definitions) section of your flex input. The default is
+.B %pointer.
+The advantage of using
+.B %pointer
+is substantially faster scanning and no buffer overflow when matching
+very large tokens (unless you run out of dynamic memory). The disadvantage
+is that you are restricted in how your actions can modify
+.B yytext
+(see the next section), and calls to the
+.B input()
+and
+.B unput()
+functions destroy the present contents of
+.B yytext,
+which can be a considerable porting headache when moving between different
+.I lex
+versions.
+.PP
+The advantage of
+.B %array
+is that you can then modify
+.B yytext
+to your heart's content, and calls to
+.B input()
+and
+.B unput()
+do not destroy
+.B yytext
+(see below). Furthermore, existing
+.I lex
+programs sometimes access
+.B yytext
+externally using declarations of the form:
+.nf
+ extern char yytext[];
+.fi
+This definition is erroneous when used with
+.B %pointer,
+but correct for
+.B %array.
+.PP
+.B %array
+defines
+.B yytext
+to be an array of
+.B YYLMAX
+characters, which defaults to a fairly large value. You can change
+the size by simply #define'ing
+.B YYLMAX
+to a different value in the first section of your
+.I flex
+input. As mentioned above, with
+.B %pointer
+yytext grows dynamically to accommodate large tokens. While this means your
+.B %pointer
+scanner can accommodate very large tokens (such as matching entire blocks
+of comments), bear in mind that each time the scanner must resize
+.B yytext
+it also must rescan the entire token from the beginning, so matching such
+tokens can prove slow.
+.B yytext
+presently does
+.I not
+dynamically grow if a call to
+.B unput()
+results in too much text being pushed back; instead, a run-time error results.
+.PP
+Also note that you cannot use
+.B %array
+with C++ scanner classes
+(the
+.B \-+
+option; see below).
.SH ACTIONS
Each pattern in a rule has a corresponding action, which can be any
arbitrary C statement. The pattern ends at the first non-escaped
@@ -485,14 +575,25 @@ is called it continues processing tokens from where it last left
off until it either reaches
the end of the file or executes a return.
.PP
-Actions are free to modify yytext except for lengthening it (adding
+Actions are free to modify
+.B yytext
+except for lengthening it (adding
characters to its end--these will overwrite later characters in the
input stream). Modifying the final character of yytext may alter
whether when scanning resumes rules anchored with '^' are active.
Specifically, changing the final character of yytext to a newline will
activate such rules on the next scan, and changing it to anything else
will deactivate the rules. Users should not rely on this behavior being
-present in future releases.
+present in future releases. Finally, note that none of this paragraph
+applies when using
+.B %array
+(see above).
+.PP
+Actions are free to modify
+.B yyleng
+except they should not do so if the action also includes use of
+.B yymore()
+(see below).
.PP
There are a number of special directives which can be included within
an action:
@@ -758,7 +859,6 @@ is pointed at a new input file (in which case scanning continues from
that file), or
.B yyrestart()
is called.
-.I yyin
.B yyrestart()
takes one argument, a
.B FILE *
@@ -839,10 +939,7 @@ caller.
.PP
The default
.B yywrap()
-always returns 1. Presently, to redefine it you must first
-"#undef yywrap", as it is currently implemented as a macro. As indicated
-by the hedging in the previous sentence, it may be changed to
-a true function in the near future.
+always returns 1.
.PP
The scanner writes its
.B ECHO
@@ -929,6 +1026,18 @@ is equivalent to
.fi
.PP
+Also note that the special start-condition specifier
+.B <*>
+matches every start condition. Thus, the above example could also
+have been written:
+.nf
+
+ %x example
+ %%
+ <*>foo /* do something */
+
+.fi
+.PP
The default rule (to
.B ECHO
any unmatched character) remains active in start conditions.
@@ -1060,11 +1169,74 @@ macro. For example, the above assignments to
.I comment_caller
could instead be written
.nf
+
comment_caller = YY_START;
.fi
.PP
Note that start conditions do not have their own name-space; %s's and %x's
declare names in the same fashion as #define's.
+.PP
+Finally, here's an example of how to match C-style quoted strings using
+exclusive start conditions, including expanded escape sequences (but
+not including checking for a string that's too long):
+.nf
+
+ %x str
+
+ %%
+ char string_buf[MAX_STR_CONST];
+ char *string_buf_ptr;
+
+
+ \\" string_buf_ptr = string_buf; BEGIN(str);
+
+ <str>\\" { /* saw closing quote - all done */
+ BEGIN(INITIAL);
+ *string_buf_ptr = '\\0';
+ /* return string constant token type and
+ * value to parser
+ */
+ }
+
+ <str>\\n {
+ /* error - unterminated string constant */
+ /* generate error message */
+ }
+
+ <str>\\\\[0-7]{1,3} {
+ /* octal escape sequence */
+ unsigned int result;
+
+ (void) sscanf( yytext + 1, "%o", &result );
+
+ if ( result > 0xff )
+ ; /* error, constant is out-of-bounds */
+
+ *string_buf_ptr++ = result;
+ }
+
+ <str>\\\\[0-9]+ {
+ /* generate error - bad escape sequence; something
+ * like '\\48' or '\\0777777'
+ */
+ }
+
+ <str>\\\\n *string_buf_ptr++ = '\\n';
+ <str>\\\\t *string_buf_ptr++ = '\\t';
+ <str>\\\\r *string_buf_ptr++ = '\\r';
+ <str>\\\\b *string_buf_ptr++ = '\\b';
+ <str>\\\\f *string_buf_ptr++ = '\\f';
+
+ <str>\\\\(.|\\n) *string_buf_ptr++ = yytext[1];
+
+ <str>[^\\\\\\n\\"]+ {
+ char *yytext_ptr = yytext;
+
+ while ( *yytext_ptr )
+ *string_buf_ptr++ = *yytext_ptr++;
+ }
+
+.fi
.SH MULTIPLE INPUT BUFFERS
Some scanners (such as those which support "include" files)
require reading from several input streams. As
@@ -1324,53 +1496,18 @@ part of the scanner might look like:
[0-9]+ yylval = atoi( yytext ); return TOK_NUMBER;
.fi
-.SH TRANSLATION TABLE
-In the name of POSIX compliance,
-.I flex
-supports a
-.I translation table
-for mapping input characters into groups.
-The table is specified in the first section, and its format looks like:
-.nf
-
- %t
- 1 abcd
- 2 ABCDEFGHIJKLMNOPQRSTUVWXYZ
- 52 0123456789
- 6 \\t\\ \\n
- %t
-
-.fi
-This example specifies that the characters 'a', 'b', 'c', and 'd'
-are to all be lumped into group #1, upper-case letters
-in group #2, digits in group #52, tabs, blanks, and newlines into
-group #6, and
-.I
-no other characters will appear in the patterns.
-The group numbers are actually disregarded by
-.I flex;
-.B %t
-serves, though, to lump characters together. Given the above
-table, for example, the pattern "a(AA)*5" is equivalent to "d(ZQ)*0".
-They both say, "match any character in group #1, followed by
-zero-or-more pairs of characters
-from group #2, followed by a character from group #52." Thus
-.B %t
-provides a crude way for introducing equivalence classes into
-the scanner specification.
-.PP
-Note that the
-.B \-i
-option (see below) coupled with the equivalence classes which
-.I flex
-automatically generates take care of virtually all the instances
-when one might consider using
-.B %t.
-But what the hell, it's there if you want it.
.SH OPTIONS
.I flex
has the following options:
.TP
+.B \-a
+(``align'') instructs flex to trade off larger tables in the
+generated scanner for faster performance because the elements of
+the tables are better aligned for memory access and computation. On some RISC
+architectures, fetching and manipulating longwords is more efficient than
+with smaller-sized datums such as shortwords. This option can
+double the size of the tables used by your scanner.
+.TP
.B \-b
Generate backing-up information to
.I lex.backup.
@@ -1384,8 +1521,8 @@ or
is used, the generated scanner will run faster (see the
.B \-p
flag). Only users who wish to squeeze every last cycle out of their
-scanners need worry about this option. (See the section on PERFORMANCE
-CONSIDERATIONS below.)
+scanners need worry about this option. (See the section on Performance
+Considerations below.)
.TP
.B \-c
is a do-nothing, deprecated option included for POSIX compliance.
@@ -1441,6 +1578,13 @@ This option is equivalent to
.B \-Cf
(see below).
.TP
+.B \-h
+generates a "help" summary of
+.I flex's
+options to
+.I stderr
+and then exits.
+.TP
.B \-i
instructs
.I flex
@@ -1462,10 +1606,13 @@ POSIX compliance.
generates a performance report to stderr. The report
consists of comments regarding features of the
.I flex
-input file which will cause a loss of performance in the resulting scanner.
+input file which will cause a serious loss of performance in the resulting
+scanner. If you give the flag twice, you will also get comments regarding
+features that lead to minor performance losses.
+.IP
Note that the use of
.I REJECT
-and variable trailing context (see the BUGS section in flex(1))
+and variable trailing context (see the Bugs section in flex(1))
entails a substantial performance penalty; use of
.I yymore(),
the
@@ -1499,13 +1646,41 @@ should write to
a summary of statistics regarding the scanner it generates.
Most of the statistics are meaningless to the casual
.I flex
-user, but the
-first line identifies the version of
-.I flex,
-which is useful for figuring
-out where you stand with respect to patches and new releases,
-and the next two lines give the date when the scanner was created
-and a summary of the flags which were in effect.
+user, but the first line identifies the version of
+.I flex
+(same as reported by
+.B \-V),
+and the next line the flags used when generating the scanner, including
+those that are on by default.
+.TP
+.B \-w
+suppresses warning messages.
+.TP
+.B \-B
+instructs
+.I flex
+to generate a
+.I batch
+scanner, the opposite of
+.I interactive
+scanners generated by
+.B \-I
+(see below). In general, you use
+.B \-B
+when you are
+.I certain
+that your scanner will never be used interactively, and you want to
+squeeze a
+.I little
+more performance out of it. If your goal is instead to squeeze out a
+.I lot
+more performance, you should be using the
+.B \-Cf
+or
+.B \-CF
+options (discussed below), which turn on
+.B \-B
+automatically anyway.
.TP
.B \-F
specifies that the
@@ -1542,43 +1717,44 @@ instructs
.I flex
to generate an
.I interactive
-scanner. Normally, scanners generated by
-.I flex
-always look ahead one
-character before deciding that a rule has been matched. At the cost of
-some scanning overhead,
-.I flex
-will generate a scanner which only looks ahead
-when needed. Such scanners are called
-.I interactive
-because if you want to write a scanner for an interactive system such as a
-command shell, you will probably want the user's input to be terminated
-with a newline, and without
-.B \-I
-the user will have to type a character in addition to the newline in order
-to have the newline recognized. This leads to dreadful interactive
-performance.
+scanner. An interactive scanner is one that only looks ahead to decide
+what token has been matched if it absolutely must. It turns out that
+always looking one extra character ahead, even if the scanner has already
+seen enough text to disambiguate the current token, is a bit faster than
+only looking ahead when necessary. But scanners that always look ahead
+give dreadful interactive performance; for example, when a user types
+a newline, it is not recognized as a newline token until they enter
+.I another
+token, which often means typing in another whole line.
.IP
-If all this seems to confusing, here's the general rule: if a human will
-be typing in input to your scanner, use
-.B \-I,
-otherwise don't; if you don't care about squeezing the utmost performance
-from your scanner and you
-don't want to make any assumptions about the input to your scanner,
+.I Flex
+scanners default to
+.I interactive
+unless you use the
+.B \-Cf
+or
+.B \-CF
+table-compression options (see below). That's because if you're looking
+for high performance you should be using one of these options, so if you
+didn't,
+.I flex
+assumes you'd rather trade off a bit of run-time performance for intuitive
+interactive behavior. Note also that you
+.I cannot
use
-.B \-I.
-.IP
-Note,
.B \-I
-cannot be used in conjunction with
-.I full
-or
-.I fast tables,
-i.e., the
-.B \-f, \-F, \-Cf,
+in conjunction with
+.B \-Cf
or
-.B \-CF
-flags.
+.B \-CF.
+Thus, this option is not really needed; it is on by default for all those
+cases in which it is allowed.
+.IP
+You can force a scanner to
+.I not
+be interactive by using
+.B \-B
+(see above).
.TP
.B \-L
instructs
@@ -1614,29 +1790,73 @@ the form of the input and the resultant non-deterministic and deterministic
finite automata. This option is mostly for use in maintaining
.I flex.
.TP
-.B \-8
+.B \-V
+prints the version number to
+.I stderr
+and exits.
+.TP
+.B \-7
instructs
.I flex
-to generate an 8-bit scanner, i.e., one which can recognize 8-bit
-characters. On some sites,
-.I flex
-is installed with this option as the default. On others, the default
-is 7-bit characters. To see which is the case, check the verbose
-.B (\-v)
-output for "equivalence classes created". If the denominator of
-the number shown is 128, then by default
+to generate a 7-bit scanner, i.e., one which can only recognize 7-bit
+characters in its input. The advantage of using
+.B \-7
+is that the scanner's tables can be up to half the size of those generated
+using the
+.B \-8
+option (see below). The disadvantage is that such scanners often hang
+or crash if their input contains an 8-bit character.
+.IP
+Note, however, that unless you generate your scanner using the
+.B \-Cf
+or
+.B \-CF
+table compression options, use of
+.B \-7
+will save only a small amount of table space, and make your scanner
+considerably less portable.
+.I Flex's
+default behavior is to generate an 8-bit scanner unless you use
+.B \-Cf
+or
+.B \-CF,
+in which case
.I flex
-is generating 7-bit characters. If it is 256, then the default is
-8-bit characters and the
+defaults to generating 7-bit scanners unless your site was always
+configured to generate 8-bit scanners (as will often be the case
+with non-USA sites). You can tell whether flex generated a 7-bit
+or an 8-bit scanner by inspecting the flag summary in the
+.B \-v
+output as described above.
+.IP
+Note that if you use
+.B \-Cfe
+or
+.B \-CFe
+(those table compression options, but also using equivalence classes as
+discussed below), flex still defaults to generating an 8-bit
+scanner, since usually with these compression options full 8-bit tables
+are not much more expensive than 7-bit tables.
+.TP
.B \-8
-flag is not required (but may be a good idea to keep the scanner
-specification portable). Feeding a 7-bit scanner 8-bit characters
-will result in infinite loops, bus errors, or other such fireworks,
-so when in doubt, use the flag. Note that if equivalence classes
-are used, 8-bit scanners take only slightly more table space than
-7-bit scanners (128 bytes, to be exact); if equivalence classes are
-not used, however, then the tables may grow up to twice their
-7-bit size.
+instructs
+.I flex
+to generate an 8-bit scanner, i.e., one which can recognize 8-bit
+characters. This flag is only needed for scanners generated using
+.B \-Cf
+or
+.B \-CF,
+as otherwise flex defaults to generating an 8-bit scanner anyway.
+.IP
+See the discussion of
+.B \-7
+above for flex's default behavior and the tradeoffs between 7-bit
+and 8-bit scanners.
+.TP
+.B \-+
+specifies that you want flex to generate a C++
+scanner class. See the section on Generating C++ Scanners below for
+details.
.TP
.B \-C[efmF]
controls the degree of table compression.
@@ -1729,6 +1949,58 @@ compression.
is often a good compromise between speed and size for production
scanners.
.TP
+.B \-Pprefix
+changes the default
+.I "yy"
+prefix used by
+.I flex
+for all globally-visible variable and function names to instead be
+.I prefix.
+For example,
+.B \-Pfoo
+changes the name of
+.B yytext
+to
+.B footext.
+It also changes the name of the default output file from
+.B lex.yy.c
+to
+.B lex.foo.c.
+Here are all of the names affected:
+.nf
+
+ yyFlexLexer
+ yy_create_buffer
+ yy_delete_buffer
+ yy_flex_debug
+ yy_init_buffer
+ yy_load_buffer_state
+ yy_switch_to_buffer
+ yyin
+ yyleng
+ yylex
+ yyout
+ yyrestart
+ yytext
+ yywrap
+
+.fi
+Within your scanner itself, you can still refer to the global variables
+and functions using either version of their name; but externally, they
+have the modified name.
+.IP
+This option lets you easily link together multiple
+.I flex
+programs into the same executable. Note, though, that using this
+option also renames
+.B yywrap(),
+so you now
+.I must
+provide your own (appropriately-named) version of the routine for your
+scanner, as linking with
+.B \-lfl
+no longer provides one for you by default.
+.TP
.B \-Sskeleton_file
overrides the default skeleton file from which
.I flex
@@ -1739,8 +2011,12 @@ maintenance or development.
The main design goal of
.I flex
is that it generate high-performance scanners. It has been optimized
-for dealing well with large sets of rules. Aside from the effects
-of table compression on scanner speed outlined above,
+for dealing well with large sets of rules. Aside from the effects on
+scanner speed of the table compression
+.B \-C
+and
+.B \-a
+options outlined above,
there are a number of options/actions which degrade performance. These
are, from most expensive to least:
.nf
@@ -1901,8 +2177,15 @@ or as
Note that here the special '|' action does
.I not
provide any savings, and can even make things worse (see
-.B BUGS
-in flex(1)).
+.PP
+A final note regarding performance: as mentioned above in the section
+How the Input is Matched, dynamically resizing
+.B yytext
+to accommodate huge tokens is a slow process because it presently requires that
+the (huge) token be rescanned from the beginning. Thus if performance is
+vital, you should attempt to match "large" quantities of text but not
+"huge" quantities, where the cutoff between the two is at about 8K
+characters/token.
.PP
Another area where the user can increase a scanner's performance
(and one that's easier to implement) arises from the fact that
@@ -2047,6 +2330,192 @@ multiple NUL's.
It's best to write rules which match
.I short
amounts of text if it's anticipated that the text will often include NUL's.
+.SH GENERATING C++ SCANNERS
+.I flex
+provides two different ways to generate scanners for use with C++. The
+first way is to simply compile a scanner generated by
+.I flex
+using a C++ compiler instead of a C compiler. You should not encounter
+any compilation errors (please report any you find to the email address
+given in the Author section below). You can then use C++ code in your
+rule actions instead of C code. Note that the default input source for
+your scanner remains
+.I yyin,
+and default echoing is still done to
+.I yyout.
+Both of these remain
+.I FILE *
+variables and not C++
+.I streams.
+.PP
+You can also use
+.I flex
+to generate a C++ scanner class, using the
+.B \-+
+option, which is automatically specified if the name of the flex
+executable ends in a '+', such as
+.I flex++.
+When using this option, flex defaults to generating the scanner to the file
+.B lex.yy.cc
+instead of
+.B lex.yy.c.
+The generated scanner includes the header file
+.I FlexLexer.h,
+which defines the interface to two C++ classes.
+.PP
+The first class,
+.B FlexLexer,
+provides an abstract base class defining the general scanner class
+interface. It provides the following member functions:
+.TP
+.B const char* YYText()
+returns the text of the most recently matched token, the equivalent of
+.B yytext.
+.TP
+.B int YYLeng()
+returns the length of the most recently matched token, the equivalent of
+.B yyleng.
+.PP
+Also provided are member functions equivalent to
+.B yy_switch_to_buffer(),
+.B yy_create_buffer()
+(though the first argument is an
+.B istream*
+object pointer and not a
+.B FILE*),
+.B yy_delete_buffer(),
+and
+.B yyrestart()
+(again, the first argument is an
+.B istream*
+object pointer).
+.PP
+The second class defined in
+.I FlexLexer.h
+is
+.B yyFlexLexer,
+which is derived from
+.B FlexLexer.
+It defines the following additional member functions:
+.TP
+.B
+yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )
+constructs a
+.B yyFlexLexer
+object using the given streams for input and output. If not specified,
+the streams default to
+.B cin
+and
+.B cout,
+respectively.
+.TP
+.B virtual int yylex()
+performs the same role as
+.B yylex()
+does for ordinary flex scanners: it scans the input stream, consuming
+tokens, until a rule's action returns a value.
+.PP
+In addition,
+.B yyFlexLexer
+defines the following protected virtual functions which you can redefine
+in derived classes to tailor the scanner's input and output:
+.TP
+.B
+virtual int LexerInput( char* buf, int max_size )
+reads up to
+.B max_size
+characters into
+.B buf
+and returns the number of characters read. To indicate end-of-input,
+return 0 characters.
+.TP
+.B
+virtual void LexerOutput( const char* buf, int size )
+writes out
+.B size
+characters from the buffer
+.B buf,
+which, while NUL-terminated, may also contain "internal" NUL's if
+the scanner's rules can match text with NUL's in them.
+.PP
+Note that a
+.B yyFlexLexer
+object contains its
+.I entire
+scanning state. Thus you can use such objects to create reentrant
+scanners. You can instantiate multiple instances of the same
+.B yyFlexLexer
+class, and you can also combine multiple C++ scanner classes together
+in the same program using the
+.B \-P
+option discussed above.
+.PP
+Finally, note that the
+.B %array
+feature is not available to C++ scanner classes; you must use
+.B %pointer
+(the default).
+.PP
+Here is an example of a simple C++ scanner:
+.nf
+
+ // An example of using the flex C++ scanner class.
+
+ %{
+ int mylineno = 0;
+ %}
+
+ string \\"[^\\n"]+\\"
+
+ ws [ \\t]+
+
+ alpha [A-Za-z]
+ dig [0-9]
+ name ({alpha}|{dig}|\\$)({alpha}|{dig}|[_.\\-/$])*
+ num1 [-+]?{dig}+\\.?([eE][-+]?{dig}+)?
+ num2 [-+]?{dig}*\\.{dig}+([eE][-+]?{dig}+)?
+ number {num1}|{num2}
+
+ %%
+
+ {ws} /* skip blanks and tabs */
+
+ "/*" {
+ int c;
+
+ while((c = yyinput()) != 0)
+ {
+ if(c == '\\n')
+ ++mylineno;
+
+ else if(c == '*')
+ {
+ if((c = yyinput()) == '/')
+ break;
+ else
+ unput(c);
+ }
+ }
+ }
+
+ {number} cout << "number " << YYText() << '\\n';
+
+ \\n mylineno++;
+
+ {name} cout << "name " << YYText() << '\\n';
+
+ {string} cout << "string " << YYText() << '\\n';
+
+ %%
+
+ int main( int /* argc */, char** /* argv */ )
+ {
+ FlexLexer* lexer = new yyFlexLexer;
+ while(lexer->yylex() != 0)
+ ;
+ return 0;
+ }
+.fi
.SH INCOMPATIBILITIES WITH LEX AND POSIX
.I flex
is a rewrite of the Unix
@@ -2057,20 +2526,16 @@ are of concern to those who wish to write scanners acceptable
to either implementation. At present, the POSIX
.I lex
draft is
-very close to the original
+close to the original
.I lex
implementation, so some of these
incompatibilities are also in conflict with the POSIX draft. But
-the intent is that except as noted below,
+the intent is that ultimately
.I flex
-as it presently stands will
-ultimately be POSIX conformant (i.e., that those areas of conflict with
-the POSIX draft will be resolved in
-.I flex's
-favor). Please bear in
+will be fully POSIX-conformant. Please bear in
mind that all the comments which follow are with regard to the POSIX
.I draft
-standard of Summer 1989, and not the final document (or subsequent
+of Spring 1990 (draft 10), and not the final document (or subsequent
drafts); they are included so
.I flex
users can be aware of the standardization issues and those areas where
@@ -2138,11 +2603,7 @@ such writes are automatically flushed since
.I lex
scanners use
.B getchar()
-for their input. Also, when writing interactive scanners with
-.I flex,
-the
-.B \-I
-flag must be used.
+for their input.
.IP -
.I flex
scanners are not as reentrant as
@@ -2164,6 +2625,11 @@ To reenter the scanner, first use
.fi
Note that this call will throw away any buffered input; usually this
isn't a problem with an interactive scanner.
+.IP
+Also note that flex C++ scanner classes
+.I are
+reentrant, so if using C++ is an option for you, you should use
+them instead. See "Generating C++ Scanners" above for details.
.IP -
.B output()
is not supported.
@@ -2174,9 +2640,8 @@ macro is done to the file-pointer
(default
.I stdout).
.IP
-The POSIX draft mentions that an
.B output()
-routine exists but currently gives no details as to what it does.
+is not part of the POSIX draft.
.IP -
.I lex
does not support exclusive start conditions (%x), though they
@@ -2201,7 +2666,7 @@ and the precedence is such that the '?' is associated with
.I flex,
the rule will be expanded to
"foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.
-.PP
+.IP
Note that if the definition begins with
.B ^
or ends with
@@ -2235,17 +2700,6 @@ The
(generate a Ratfor scanner) option is not supported. It is not part
of the POSIX draft.
.IP -
-If you are providing your own yywrap() routine, you must include a
-"#undef yywrap" in the definitions section (section 1). Note that
-the "#undef" will have to be enclosed in %{}'s.
-.IP
-The POSIX draft
-specifies that yywrap() is a function and this is very unlikely to change; so
-.I flex users are warned
-that
-.B yywrap()
-is likely to be changed to a function in the near future.
-.IP -
After a call to
.B unput(),
.I yytext
@@ -2276,21 +2730,6 @@ or 'bar' anywhere", whereas
interprets it as "match either 'foo' or 'bar' if they come at the beginning
of a line". The latter is in agreement with the current POSIX draft.
.IP -
-To refer to yytext outside of the scanner source file,
-the correct definition with
-.I flex
-is "extern char *yytext" rather than "extern char yytext[]".
-This is contrary to the current POSIX draft but a point on which
-.I flex
-will not be changing, as the array representation entails a
-serious performance penalty. It is hoped that the POSIX draft will
-be emended to support the
-.I flex
-variety of declaration (as this is a fairly painless change to
-require of
-.I lex
-users).
-.IP -
.I yyin
is
.I initialized
@@ -2343,15 +2782,17 @@ or the POSIX draft standard:
yyterminate()
<<EOF>>
+ <*>
YY_DECL
+ YY_START
+ YY_USER_ACTION
#line directives
%{}'s around actions
- yyrestart()
- comments beginning with '#' (deprecated)
multiple actions on a line
.fi
-This last feature refers to the fact that with
+plus almost all of the flex flags.
+The last feature in the list refers to the fact that with
.I flex
you can put multiple actions on the same line, separated with
semi-colons, while with
@@ -2372,6 +2813,23 @@ is (rather surprisingly) truncated to
does not truncate the action. Actions that are not enclosed in
braces are simply terminated at the end of the line.
.SH DIAGNOSTICS
+If you receive errors when linking a
+.I flex
+scanner complaining about the following missing routines:
+.nf
+ yywrap
+ yy_flex_alloc
+ yy_flex_realloc
+ yy_flex_free
+.fi
+then you forgot to link your program with
+.B \-lfl.
+This run-time library is
+.I required
+for all
+.I flex
+scanners.
+.PP
.I warning, rule cannot be matched
indicates that the given rule
cannot be matched because it follows other rules that will
@@ -2390,8 +2848,8 @@ in a scanner suppresses this warning.
.PP
.I warning,
.B \-s
-.I option given but default rule
-.I can be matched
+.I
+option given but default rule can be matched
means that it is possible (perhaps only in a particular start condition)
that the default rule (match any single character) is the only one
that will match a particular input. Since
@@ -2426,20 +2884,41 @@ people who can argue compellingly that they need it.)
a scanner compiled with
.B \-s
has encountered an input string which wasn't matched by
-any of its rules.
-.PP
-.I flex input buffer overflowed -
-a scanner rule matched a string long enough to overflow the
-scanner's internal input buffer (16K bytes by default - controlled by
-.B YY_BUF_SIZE
-in "flex.skel". Note that to redefine this macro, you must first
-.B #undef
-it).
+any of its rules. This error can also occur due to internal problems.
+.PP
+.I token too large, exceeds YYLMAX -
+your scanner uses
+.B %array
+and one of its rules matched a string longer than the
+.B YYLMAX
+constant (8K bytes by default). You can increase the value by
+#define'ing
+.B YYLMAX
+in the definitions section of your
+.I flex
+input.
+.PP
+.I scanner requires \-8 flag to
+.I use the character 'x' -
+Your scanner specification includes recognizing the 8-bit character
+.I 'x'
+and you did not specify the \-8 flag, and your scanner defaulted to 7-bit
+because you used the
+.B \-Cf
+or
+.B \-CF
+table compression options. See the discussion of the
+.B \-7
+flag for details.
.PP
-.I scanner requires \-8 flag -
-Your scanner specification includes recognizing 8-bit characters and
-you did not specify the \-8 flag (and your site has not installed flex
-with \-8 as the default).
+.I flex scanner push-back overflow -
+you used
+.B unput()
+to push back so much text that the scanner's buffer could not hold
+both the pushed-back text and the current token in
+.B yytext.
+Ideally the scanner should dynamically resize the buffer in this case, but at
+present it does not.
.PP
.I
fatal flex scanner internal error--end of buffer missed -
@@ -2451,17 +2930,15 @@ reentering the scanner, use:
yyrestart( yyin );
.fi
-.PP
-.I too many %t classes! -
-You managed to put every single character into its own %t class.
-.I flex
-requires that at least one of the classes share characters.
+or, as noted above, switch to using the C++ scanner class.
.PP
.I too many start conditions in <> construct! -
you listed more start conditions in a <> construct than exist (so
you must have listed at least one of them twice).
-.SH DEFICIENCIES / BUGS
+.SH FILES
See flex(1).
+.SH DEFICIENCIES / BUGS
+Again, see flex(1).
.SH "SEE ALSO"
.PP
flex(1), lex(1), yacc(1), sed(1), awk(1).
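Illustrative note (not part of the commit): the quoted-string example added above converts an octal escape with sscanf() and then checks the result against 0xff. The same step can be sketched as a standalone C function. This is only a sketch under stated assumptions: decode_octal_escape is a hypothetical name, and strtoul() is swapped in for sscanf() so that malformed input such as '\48' is rejected in the same place as out-of-range values such as '\777'.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not a flex API): decode an escape of the form
 * backslash followed by octal digits, e.g. "\101".
 * On success, stores the byte in *out and returns 0.
 * Returns -1 if the text is not a well-formed octal escape or if the
 * value does not fit in one byte (greater than 0xff).
 */
int decode_octal_escape(const char *text, unsigned char *out)
{
    unsigned long value;
    char *end;

    assert(text != NULL && out != NULL);

    /* Require a backslash followed by at least one octal digit. */
    if (text[0] != '\\' || text[1] < '0' || text[1] > '7')
        return -1;

    /* Base 8: parsing stops at the first non-octal character. */
    value = strtoul(text + 1, &end, 8);

    /* Reject trailing characters (e.g. the '8' in "\48") and overflow. */
    if (*end != '\0' || value > 0xff)
        return -1;

    *out = (unsigned char) value;
    return 0;
}
```

Returning a status code keeps the error path explicit, so the store into the string buffer can only happen after the range check has passed.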