summaryrefslogtreecommitdiff
path: root/to.do/unicode/flex.1
diff options
context:
space:
mode:
Diffstat (limited to 'to.do/unicode/flex.1')
-rw-r--r--to.do/unicode/flex.14099
1 files changed, 4099 insertions, 0 deletions
diff --git a/to.do/unicode/flex.1 b/to.do/unicode/flex.1
new file mode 100644
index 0000000..545c58f
--- /dev/null
+++ b/to.do/unicode/flex.1
@@ -0,0 +1,4099 @@
+.TH FLEX 1 "April 1995" "Version 2.5"
+.SH NAME
+flex \- fast lexical analyzer generator
+.SH SYNOPSIS
+.B flex
+.B [\-bcdfhilnpstvwBFILTV78+? \-C[aefFmr] \-ooutput \-Pprefix \-Sskeleton]
+.B [\-\-help \-\-version]
+.I [filename ...]
+.SH OVERVIEW
+This manual describes
+.I flex,
+a tool for generating programs that perform pattern-matching on text. The
+manual includes both tutorial and reference sections:
+.nf
+
+ Description
+ a brief overview of the tool
+
+ Some Simple Examples
+
+ Format Of The Input File
+
+ Patterns
+ the extended regular expressions used by flex
+
+ How The Input Is Matched
+ the rules for determining what has been matched
+
+ Actions
+ how to specify what to do when a pattern is matched
+
+ The Generated Scanner
+ details regarding the scanner that flex produces;
+ how to control the input source
+
+ Start Conditions
+ introducing context into your scanners, and
+ managing "mini-scanners"
+
+ Multiple Input Buffers
+ how to manipulate multiple input sources; how to
+ scan from strings instead of files
+
+ End-of-file Rules
+ special rules for matching the end of the input
+
+ Miscellaneous Macros
+ a summary of macros available to the actions
+
+ Values Available To The User
+ a summary of values available to the actions
+
+ Interfacing With Yacc
+ connecting flex scanners together with yacc parsers
+
+ Options
+ flex command-line options, and the "%option"
+ directive
+
+ Performance Considerations
+ how to make your scanner go as fast as possible
+
+ Generating C++ Scanners
+ the (experimental) facility for generating C++
+ scanner classes
+
+ Incompatibilities With Lex And POSIX
+ how flex differs from AT&T lex and the POSIX lex
+ standard
+
+ Diagnostics
+ those error messages produced by flex (or scanners
+ it generates) whose meanings might not be apparent
+
+ Files
+ files used by flex
+
+ Deficiencies / Bugs
+ known problems with flex
+
+ See Also
+ other documentation, related tools
+
+ Author
+ includes contact information
+
+.fi
+.SH DESCRIPTION
+.I flex
+is a tool for generating
+.I scanners:
+programs which recognized lexical patterns in text.
+.I flex
+reads
+the given input files, or its standard input if no file names are given,
+for a description of a scanner to generate. The description is in
+the form of pairs
+of regular expressions and C code, called
+.I rules. flex
+generates as output a C source file,
+.B lex.yy.c,
+which defines a routine
+.B yylex().
+This file is compiled and linked with the
+.B \-lfl
+library to produce an executable. When the executable is run,
+it analyzes its input for occurrences
+of the regular expressions. Whenever it finds one, it executes
+the corresponding C code.
+.SH SOME SIMPLE EXAMPLES
+.PP
+First some simple examples to get the flavor of how one uses
+.I flex.
+The following
+.I flex
+input specifies a scanner which whenever it encounters the string
+"username" will replace it with the user's login name:
+.nf
+
+ %%
+ username printf( "%s", getlogin() );
+
+.fi
+By default, any text not matched by a
+.I flex
+scanner
+is copied to the output, so the net effect of this scanner is
+to copy its input file to its output with each occurrence
+of "username" expanded.
+In this input, there is just one rule. "username" is the
+.I pattern
+and the "printf" is the
+.I action.
+The "%%" marks the beginning of the rules.
+.PP
+Here's another simple example:
+.nf
+
+ int num_lines = 0, num_chars = 0;
+
+ %%
+ \\n ++num_lines; ++num_chars;
+ . ++num_chars;
+
+ %%
+ main()
+ {
+ yylex();
+ printf( "# of lines = %d, # of chars = %d\\n",
+ num_lines, num_chars );
+ }
+
+.fi
+This scanner counts the number of characters and the number
+of lines in its input (it produces no output other than the
+final report on the counts). The first line
+declares two globals, "num_lines" and "num_chars", which are accessible
+both inside
+.B yylex()
+and in the
+.B main()
+routine declared after the second "%%". There are two rules, one
+which matches a newline ("\\n") and increments both the line count and
+the character count, and one which matches any character other than
+a newline (indicated by the "." regular expression).
+.PP
+A somewhat more complicated example:
+.nf
+
+ /* scanner for a toy Pascal-like language */
+
+ %{
+ /* need this for the call to atof() below */
+ #include <math.h>
+ %}
+
+ DIGIT [0-9]
+ ID [a-z][a-z0-9]*
+
+ %%
+
+ {DIGIT}+ {
+ printf( "An integer: %s (%d)\\n", yytext,
+ atoi( yytext ) );
+ }
+
+ {DIGIT}+"."{DIGIT}* {
+ printf( "A float: %s (%g)\\n", yytext,
+ atof( yytext ) );
+ }
+
+ if|then|begin|end|procedure|function {
+ printf( "A keyword: %s\\n", yytext );
+ }
+
+ {ID} printf( "An identifier: %s\\n", yytext );
+
+ "+"|"-"|"*"|"/" printf( "An operator: %s\\n", yytext );
+
+ "{"[^}\\n]*"}" /* eat up one-line comments */
+
+ [ \\t\\n]+ /* eat up whitespace */
+
+ . printf( "Unrecognized character: %s\\n", yytext );
+
+ %%
+
+ main( argc, argv )
+ int argc;
+ char **argv;
+ {
+ ++argv, --argc; /* skip over program name */
+ if ( argc > 0 )
+ yyin = fopen( argv[0], "r" );
+ else
+ yyin = stdin;
+
+ yylex();
+ }
+
+.fi
+This is the beginnings of a simple scanner for a language like
+Pascal. It identifies different types of
+.I tokens
+and reports on what it has seen.
+.PP
+The details of this example will be explained in the following
+sections.
+.SH FORMAT OF THE INPUT FILE
+The
+.I flex
+input file consists of three sections, separated by a line with just
+.B %%
+in it:
+.nf
+
+ definitions
+ %%
+ rules
+ %%
+ user code
+
+.fi
+The
+.I definitions
+section contains declarations of simple
+.I name
+definitions to simplify the scanner specification, and declarations of
+.I start conditions,
+which are explained in a later section.
+.PP
+Name definitions have the form:
+.nf
+
+ name definition
+
+.fi
+The "name" is a word beginning with a letter or an underscore ('_')
+followed by zero or more letters, digits, '_', or '-' (dash).
+The definition is taken to begin at the first non-white-space character
+following the name and continuing to the end of the line.
+The definition can subsequently be referred to using "{name}", which
+will expand to "(definition)". For example,
+.nf
+
+ DIGIT [0-9]
+ ID [a-z][a-z0-9]*
+
+.fi
+defines "DIGIT" to be a regular expression which matches a
+single digit, and
+"ID" to be a regular expression which matches a letter
+followed by zero-or-more letters-or-digits.
+A subsequent reference to
+.nf
+
+ {DIGIT}+"."{DIGIT}*
+
+.fi
+is identical to
+.nf
+
+ ([0-9])+"."([0-9])*
+
+.fi
+and matches one-or-more digits followed by a '.' followed
+by zero-or-more digits.
+.PP
+The
+.I rules
+section of the
+.I flex
+input contains a series of rules of the form:
+.nf
+
+ pattern action
+
+.fi
+where the pattern must be unindented and the action must begin
+on the same line.
+.PP
+See below for a further description of patterns and actions.
+.PP
+Finally, the user code section is simply copied to
+.B lex.yy.c
+verbatim.
+It is used for companion routines which call or are called
+by the scanner. The presence of this section is optional;
+if it is missing, the second
+.B %%
+in the input file may be skipped, too.
+.PP
+In the definitions and rules sections, any
+.I indented
+text or text enclosed in
+.B %{
+and
+.B %}
+is copied verbatim to the output (with the %{}'s removed).
+The %{}'s must appear unindented on lines by themselves.
+.PP
+In the rules section,
+any indented or %{} text appearing before the
+first rule may be used to declare variables
+which are local to the scanning routine and (after the declarations)
+code which is to be executed whenever the scanning routine is entered.
+Other indented or %{} text in the rule section is still copied to the output,
+but its meaning is not well-defined and it may well cause compile-time
+errors (this feature is present for
+.I POSIX
+compliance; see below for other such features).
+.PP
+In the definitions section (but not in the rules section),
+an unindented comment (i.e., a line
+beginning with "/*") is also copied verbatim to the output up
+to the next "*/".
+.SH PATTERNS
+The patterns in the input are written using an extended set of regular
+expressions. These are:
+.nf
+
+ x match the character 'x'
+ . any character (byte) except newline
+ [xyz] a "character class"; in this case, the pattern
+ matches either an 'x', a 'y', or a 'z'
+ [abj-oZ] a "character class" with a range in it; matches
+ an 'a', a 'b', any letter from 'j' through 'o',
+ or a 'Z'
+ [^A-Z] a "negated character class", i.e., any character
+ but those in the class. In this case, any
+ character EXCEPT an uppercase letter.
+ [^A-Z\\n] any character EXCEPT an uppercase letter or
+ a newline
+ r* zero or more r's, where r is any regular expression
+ r+ one or more r's
+ r? zero or one r's (that is, "an optional r")
+ r{2,5} anywhere from two to five r's
+ r{2,} two or more r's
+ r{4} exactly 4 r's
+ {name} the expansion of the "name" definition
+ (see above)
+ "[xyz]\\"foo"
+ the literal string: [xyz]"foo
+ \\X if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
+ then the ANSI-C interpretation of \\x.
+ Otherwise, a literal 'X' (used to escape
+ operators such as '*')
+ \\0 a NUL character (ASCII code 0)
+ \\123 the character with octal value 123
+ \\x2a the character with hexadecimal value 2a
+ (r) match an r; parentheses are used to override
+ precedence (see below)
+
+
+ rs the regular expression r followed by the
+ regular expression s; called "concatenation"
+
+
+ r|s either an r or an s
+
+
+ r/s an r but only if it is followed by an s. The
+ text matched by s is included when determining
+ whether this rule is the "longest match",
+ but is then returned to the input before
+ the action is executed. So the action only
+ sees the text matched by r. This type
+ of pattern is called trailing context".
+ (There are some combinations of r/s that flex
+ cannot match correctly; see notes in the
+ Deficiencies / Bugs section below regarding
+ "dangerous trailing context".)
+ ^r an r, but only at the beginning of a line (i.e.,
+ which just starting to scan, or right after a
+ newline has been scanned).
+ r$ an r, but only at the end of a line (i.e., just
+ before a newline). Equivalent to "r/\\n".
+
+ Note that flex's notion of "newline" is exactly
+ whatever the C compiler used to compile flex
+ interprets '\\n' as; in particular, on some DOS
+ systems you must either filter out \\r's in the
+ input yourself, or explicitly use r/\\r\\n for "r$".
+
+
+ <s>r an r, but only in start condition s (see
+ below for discussion of start conditions)
+ <s1,s2,s3>r
+ same, but in any of start conditions s1,
+ s2, or s3
+ <*>r an r in any start condition, even an exclusive one.
+
+
+ <<EOF>> an end-of-file
+ <s1,s2><<EOF>>
+ an end-of-file when in start condition s1 or s2
+
+.fi
+Note that inside of a character class, all regular expression operators
+lose their special meaning except escape ('\\') and the character class
+operators, '-', ']', and, at the beginning of the class, '^'.
+.PP
+The regular expressions listed above are grouped according to
+precedence, from highest precedence at the top to lowest at the bottom.
+Those grouped together have equal precedence. For example,
+.nf
+
+ foo|bar*
+
+.fi
+is the same as
+.nf
+
+ (foo)|(ba(r*))
+
+.fi
+since the '*' operator has higher precedence than concatenation,
+and concatenation higher than alternation ('|'). This pattern
+therefore matches
+.I either
+the string "foo"
+.I or
+the string "ba" followed by zero-or-more r's.
+To match "foo" or zero-or-more "bar"'s, use:
+.nf
+
+ foo|(bar)*
+
+.fi
+and to match zero-or-more "foo"'s-or-"bar"'s:
+.nf
+
+ (foo|bar)*
+
+.fi
+.PP
+In addition to characters and ranges of characters, character classes
+can also contain character class
+.I expressions.
+These are expressions enclosed inside
+.B [:
+and
+.B :]
+delimiters (which themselves must appear between the '[' and ']' of the
+character class; other elements may occur inside the character class, too).
+The valid expressions are:
+.nf
+
+ [:alnum:] [:alpha:] [:blank:]
+ [:cntrl:] [:digit:] [:graph:]
+ [:lower:] [:print:] [:punct:]
+ [:space:] [:upper:] [:xdigit:]
+
+.fi
+These expressions all designate a set of characters equivalent to
+the corresponding standard C
+.B isXXX
+function. For example,
+.B [:alnum:]
+designates those characters for which
+.B isalnum()
+returns true - i.e., any alphabetic or numeric.
+Some systems don't provide
+.B isblank(),
+so flex defines
+.B [:blank:]
+as a blank or a tab.
+.PP
+For example, the following character classes are all equivalent:
+.nf
+
+ [[:alnum:]]
+ [[:alpha:][:digit:]
+ [[:alpha:]0-9]
+ [a-zA-Z0-9]
+
+.fi
+If your scanner is case-insensitive (the
+.B \-i
+flag), then
+.B [:upper:]
+and
+.B [:lower:]
+are equivalent to
+.B [:alpha:].
+.PP
+Some notes on patterns:
+.IP -
+A negated character class such as the example "[^A-Z]"
+above
+.I will match a newline
+unless "\\n" (or an equivalent escape sequence) is one of the
+characters explicitly present in the negated character class
+(e.g., "[^A-Z\\n]"). This is unlike how many other regular
+expression tools treat negated character classes, but unfortunately
+the inconsistency is historically entrenched.
+Matching newlines means that a pattern like [^"]* can match the entire
+input unless there's another quote in the input.
+.IP -
+A rule can have at most one instance of trailing context (the '/' operator
+or the '$' operator). The start condition, '^', and "<<EOF>>" patterns
+can only occur at the beginning of a pattern, and, as well as with '/' and '$',
+cannot be grouped inside parentheses. A '^' which does not occur at
+the beginning of a rule or a '$' which does not occur at the end of
+a rule loses its special properties and is treated as a normal character.
+.IP
+The following are illegal:
+.nf
+
+ foo/bar$
+ <sc1>foo<sc2>bar
+
+.fi
+Note that the first of these, can be written "foo/bar\\n".
+.IP
+The following will result in '$' or '^' being treated as a normal character:
+.nf
+
+ foo|(bar$)
+ foo|^bar
+
+.fi
+If what's wanted is a "foo" or a bar-followed-by-a-newline, the following
+could be used (the special '|' action is explained below):
+.nf
+
+ foo |
+ bar$ /* action goes here */
+
+.fi
+A similar trick will work for matching a foo or a
+bar-at-the-beginning-of-a-line.
+.SH HOW THE INPUT IS MATCHED
+When the generated scanner is run, it analyzes its input looking
+for strings which match any of its patterns. If it finds more than
+one match, it takes the one matching the most text (for trailing
+context rules, this includes the length of the trailing part, even
+though it will then be returned to the input). If it finds two
+or more matches of the same length, the
+rule listed first in the
+.I flex
+input file is chosen.
+.PP
+Once the match is determined, the text corresponding to the match
+(called the
+.I token)
+is made available in the global character pointer
+.B yytext,
+and its length in the global integer
+.B yyleng.
+The
+.I action
+corresponding to the matched pattern is then executed (a more
+detailed description of actions follows), and then the remaining
+input is scanned for another match.
+.PP
+If no match is found, then the
+.I default rule
+is executed: the next character in the input is considered matched and
+copied to the standard output. Thus, the simplest legal
+.I flex
+input is:
+.nf
+
+ %%
+
+.fi
+which generates a scanner that simply copies its input (one character
+at a time) to its output.
+.PP
+Note that
+.B yytext
+can be defined in two different ways: either as a character
+.I pointer
+or as a character
+.I array.
+You can control which definition
+.I flex
+uses by including one of the special directives
+.B %pointer
+or
+.B %array
+in the first (definitions) section of your flex input. The default is
+.B %pointer,
+unless you use the
+.B -l
+lex compatibility option, in which case
+.B yytext
+will be an array.
+The advantage of using
+.B %pointer
+is substantially faster scanning and no buffer overflow when matching
+very large tokens (unless you run out of dynamic memory). The disadvantage
+is that you are restricted in how your actions can modify
+.B yytext
+(see the next section), and calls to the
+.B unput()
+function destroys the present contents of
+.B yytext,
+which can be a considerable porting headache when moving between different
+.I lex
+versions.
+.PP
+The advantage of
+.B %array
+is that you can then modify
+.B yytext
+to your heart's content, and calls to
+.B unput()
+do not destroy
+.B yytext
+(see below). Furthermore, existing
+.I lex
+programs sometimes access
+.B yytext
+externally using declarations of the form:
+.nf
+ extern char yytext[];
+.fi
+This definition is erroneous when used with
+.B %pointer,
+but correct for
+.B %array.
+.PP
+.B %array
+defines
+.B yytext
+to be an array of
+.B YYLMAX
+characters, which defaults to a fairly large value. You can change
+the size by simply #define'ing
+.B YYLMAX
+to a different value in the first section of your
+.I flex
+input. As mentioned above, with
+.B %pointer
+yytext grows dynamically to accommodate large tokens. While this means your
+.B %pointer
+scanner can accommodate very large tokens (such as matching entire blocks
+of comments), bear in mind that each time the scanner must resize
+.B yytext
+it also must rescan the entire token from the beginning, so matching such
+tokens can prove slow.
+.B yytext
+presently does
+.I not
+dynamically grow if a call to
+.B unput()
+results in too much text being pushed back; instead, a run-time error results.
+.PP
+Also note that you cannot use
+.B %array
+with C++ scanner classes
+(the
+.B c++
+option; see below).
+.SH ACTIONS
+Each pattern in a rule has a corresponding action, which can be any
+arbitrary C statement. The pattern ends at the first non-escaped
+whitespace character; the remainder of the line is its action. If the
+action is empty, then when the pattern is matched the input token
+is simply discarded. For example, here is the specification for a program
+which deletes all occurrences of "zap me" from its input:
+.nf
+
+ %%
+ "zap me"
+
+.fi
+(It will copy all other characters in the input to the output since
+they will be matched by the default rule.)
+.PP
+Here is a program which compresses multiple blanks and tabs down to
+a single blank, and throws away whitespace found at the end of a line:
+.nf
+
+ %%
+ [ \\t]+ putchar( ' ' );
+ [ \\t]+$ /* ignore this token */
+
+.fi
+.PP
+If the action contains a '{', then the action spans till the balancing '}'
+is found, and the action may cross multiple lines.
+.I flex
+knows about C strings and comments and won't be fooled by braces found
+within them, but also allows actions to begin with
+.B %{
+and will consider the action to be all the text up to the next
+.B %}
+(regardless of ordinary braces inside the action).
+.PP
+An action consisting solely of a vertical bar ('|') means "same as
+the action for the next rule." See below for an illustration.
+.PP
+Actions can include arbitrary C code, including
+.B return
+statements to return a value to whatever routine called
+.B yylex().
+Each time
+.B yylex()
+is called it continues processing tokens from where it last left
+off until it either reaches
+the end of the file or executes a return.
+.PP
+Actions are free to modify
+.B yytext
+except for lengthening it (adding
+characters to its end--these will overwrite later characters in the
+input stream). This however does not apply when using
+.B %array
+(see above); in that case,
+.B yytext
+may be freely modified in any way.
+.PP
+Actions are free to modify
+.B yyleng
+except they should not do so if the action also includes use of
+.B yymore()
+(see below).
+.PP
+There are a number of special directives which can be included within
+an action:
+.IP -
+.B ECHO
+copies yytext to the scanner's output.
+.IP -
+.B BEGIN
+followed by the name of a start condition places the scanner in the
+corresponding start condition (see below).
+.IP -
+.B REJECT
+directs the scanner to proceed on to the "second best" rule which matched the
+input (or a prefix of the input). The rule is chosen as described
+above in "How the Input is Matched", and
+.B yytext
+and
+.B yyleng
+set up appropriately.
+It may either be one which matched as much text
+as the originally chosen rule but came later in the
+.I flex
+input file, or one which matched less text.
+For example, the following will both count the
+words in the input and call the routine special() whenever "frob" is seen:
+.nf
+
+ int word_count = 0;
+ %%
+
+ frob special(); REJECT;
+ [^ \\t\\n]+ ++word_count;
+
+.fi
+Without the
+.B REJECT,
+any "frob"'s in the input would not be counted as words, since the
+scanner normally executes only one action per token.
+Multiple
+.B REJECT's
+are allowed, each one finding the next best choice to the currently
+active rule. For example, when the following scanner scans the token
+"abcd", it will write "abcdabcaba" to the output:
+.nf
+
+ %%
+ a |
+ ab |
+ abc |
+ abcd ECHO; REJECT;
+ .|\\n /* eat up any unmatched character */
+
+.fi
+(The first three rules share the fourth's action since they use
+the special '|' action.)
+.B REJECT
+is a particularly expensive feature in terms of scanner performance;
+if it is used in
+.I any
+of the scanner's actions it will slow down
+.I all
+of the scanner's matching. Furthermore,
+.B REJECT
+cannot be used with the
+.I -Cf
+or
+.I -CF
+options (see below).
+.IP
+Note also that unlike the other special actions,
+.B REJECT
+is a
+.I branch;
+code immediately following it in the action will
+.I not
+be executed.
+.IP -
+.B yymore()
+tells the scanner that the next time it matches a rule, the corresponding
+token should be
+.I appended
+onto the current value of
+.B yytext
+rather than replacing it. For example, given the input "mega-kludge"
+the following will write "mega-mega-kludge" to the output:
+.nf
+
+ %%
+ mega- ECHO; yymore();
+ kludge ECHO;
+
+.fi
+First "mega-" is matched and echoed to the output. Then "kludge"
+is matched, but the previous "mega-" is still hanging around at the
+beginning of
+.B yytext
+so the
+.B ECHO
+for the "kludge" rule will actually write "mega-kludge".
+.PP
+Two notes regarding use of
+.B yymore().
+First,
+.B yymore()
+depends on the value of
+.I yyleng
+correctly reflecting the size of the current token, so you must not
+modify
+.I yyleng
+if you are using
+.B yymore().
+Second, the presence of
+.B yymore()
+in the scanner's action entails a minor performance penalty in the
+scanner's matching speed.
+.IP -
+.B yyless(n)
+returns all but the first
+.I n
+characters of the current token back to the input stream, where they
+will be rescanned when the scanner looks for the next match.
+.B yytext
+and
+.B yyleng
+are adjusted appropriately (e.g.,
+.B yyleng
+will now be equal to
+.I n
+). For example, on the input "foobar" the following will write out
+"foobarbar":
+.nf
+
+ %%
+ foobar ECHO; yyless(3);
+ [a-z]+ ECHO;
+
+.fi
+An argument of 0 to
+.B yyless
+will cause the entire current input string to be scanned again. Unless you've
+changed how the scanner will subsequently process its input (using
+.B BEGIN,
+for example), this will result in an endless loop.
+.PP
+Note that
+.B yyless
+is a macro and can only be used in the flex input file, not from
+other source files.
+.IP -
+.B unput(c)
+puts the character
+.I c
+back onto the input stream. It will be the next character scanned.
+The following action will take the current token and cause it
+to be rescanned enclosed in parentheses.
+.nf
+
+ {
+ int i;
+ /* Copy yytext because unput() trashes yytext */
+ char *yycopy = strdup( yytext );
+ unput( ')' );
+ for ( i = yyleng - 1; i >= 0; --i )
+ unput( yycopy[i] );
+ unput( '(' );
+ free( yycopy );
+ }
+
+.fi
+Note that since each
+.B unput()
+puts the given character back at the
+.I beginning
+of the input stream, pushing back strings must be done back-to-front.
+.PP
+An important potential problem when using
+.B unput()
+is that if you are using
+.B %pointer
+(the default), a call to
+.B unput()
+.I destroys
+the contents of
+.I yytext,
+starting with its rightmost character and devouring one character to
+the left with each call. If you need the value of yytext preserved
+after a call to
+.B unput()
+(as in the above example),
+you must either first copy it elsewhere, or build your scanner using
+.B %array
+instead (see How The Input Is Matched).
+.PP
+Finally, note that you cannot put back
+.B EOF
+to attempt to mark the input stream with an end-of-file.
+.IP -
+.B input()
+reads the next character from the input stream. For example,
+the following is one way to eat up C comments:
+.nf
+
+ %%
+ "/*" {
+ register int c;
+
+ for ( ; ; )
+ {
+ while ( (c = input()) != '*' &&
+ c != EOF )
+ ; /* eat up text of comment */
+
+ if ( c == '*' )
+ {
+ while ( (c = input()) == '*' )
+ ;
+ if ( c == '/' )
+ break; /* found the end */
+ }
+
+ if ( c == EOF )
+ {
+ error( "EOF in comment" );
+ break;
+ }
+ }
+ }
+
+.fi
+(Note that if the scanner is compiled using
+.B C++,
+then
+.B input()
+is instead referred to as
+.B yyinput(),
+in order to avoid a name clash with the
+.B C++
+stream by the name of
+.I input.)
+.IP -
+.B YY_FLUSH_BUFFER
+flushes the scanner's internal buffer
+so that the next time the scanner attempts to match a token, it will
+first refill the buffer using
+.B YY_INPUT
+(see The Generated Scanner, below). This action is a special case
+of the more general
+.B yy_flush_buffer()
+function, described below in the section Multiple Input Buffers.
+.IP -
+.B yyterminate()
+can be used in lieu of a return statement in an action. It terminates
+the scanner and returns a 0 to the scanner's caller, indicating "all done".
+By default,
+.B yyterminate()
+is also called when an end-of-file is encountered. It is a macro and
+may be redefined.
+.SH THE GENERATED SCANNER
+The output of
+.I flex
+is the file
+.B lex.yy.c,
+which contains the scanning routine
+.B yylex(),
+a number of tables used by it for matching tokens, and a number
+of auxiliary routines and macros. By default,
+.B yylex()
+is declared as follows:
+.nf
+
+ int yylex()
+ {
+ ... various definitions and the actions in here ...
+ }
+
+.fi
+(If your environment supports function prototypes, then it will
+be "int yylex( void )".) This definition may be changed by defining
+the "YY_DECL" macro. For example, you could use:
+.nf
+
+ #define YY_DECL float lexscan( a, b ) float a, b;
+
+.fi
+to give the scanning routine the name
+.I lexscan,
+returning a float, and taking two floats as arguments. Note that
+if you give arguments to the scanning routine using a
+K&R-style/non-prototyped function declaration, you must terminate
+the definition with a semi-colon (;).
+.PP
+Whenever
+.B yylex()
+is called, it scans tokens from the global input file
+.I yyin
+(which defaults to stdin). It continues until it either reaches
+an end-of-file (at which point it returns the value 0) or
+one of its actions executes a
+.I return
+statement.
+.PP
+If the scanner reaches an end-of-file, subsequent calls are undefined
+unless either
+.I yyin
+is pointed at a new input file (in which case scanning continues from
+that file), or
+.B yyrestart()
+is called.
+.B yyrestart()
+takes one argument, a
+.B FILE *
+pointer (which can be nil, if you've set up
+.B YY_INPUT
+to scan from a source other than
+.I yyin),
+and initializes
+.I yyin
+for scanning from that file. Essentially there is no difference between
+just assigning
+.I yyin
+to a new input file or using
+.B yyrestart()
+to do so; the latter is available for compatibility with previous versions
+of
+.I flex,
+and because it can be used to switch input files in the middle of scanning.
+It can also be used to throw away the current input buffer, by calling
+it with an argument of
+.I yyin;
+but better is to use
+.B YY_FLUSH_BUFFER
+(see above).
+Note that
+.B yyrestart()
+does
+.I not
+reset the start condition to
+.B INITIAL
+(see Start Conditions, below).
+.PP
+If
+.B yylex()
+stops scanning due to executing a
+.I return
+statement in one of the actions, the scanner may then be called again and it
+will resume scanning where it left off.
+.PP
+By default (and for purposes of efficiency), the scanner uses
+block-reads rather than simple
+.I getc()
+calls to read characters from
+.I yyin.
+The nature of how it gets its input can be controlled by defining the
+.B YY_INPUT
+macro.
+YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)". Its
+action is to place up to
+.I max_size
+characters in the character array
+.I buf
+and return in the integer variable
+.I result
+either the
+number of characters read or the constant YY_NULL (0 on Unix systems)
+to indicate EOF. The default YY_INPUT reads from the
+global file-pointer "yyin".
+.PP
+A sample definition of YY_INPUT (in the definitions
+section of the input file):
+.nf
+
+ %{
+ #define YY_INPUT(buf,result,max_size) \\
+ { \\
+ int c = getchar(); \\
+ result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \\
+ }
+ %}
+
+.fi
+This definition will change the input processing to occur
+one character at a time.
+.PP
+When the scanner receives an end-of-file indication from YY_INPUT,
+it then checks the
+.B yywrap()
+function. If
+.B yywrap()
+returns false (zero), then it is assumed that the
+function has gone ahead and set up
+.I yyin
+to point to another input file, and scanning continues. If it returns
+true (non-zero), then the scanner terminates, returning 0 to its
+caller. Note that in either case, the start condition remains unchanged;
+it does
+.I not
+revert to
+.B INITIAL.
+.PP
+If you do not supply your own version of
+.B yywrap(),
+then you must either use
+.B %option noyywrap
+(in which case the scanner behaves as though
+.B yywrap()
+returned 1), or you must link with
+.B \-lfl
+to obtain the default version of the routine, which always returns 1.
+.PP
+Three routines are available for scanning from in-memory buffers rather
+than files:
+.B yy_scan_string(), yy_scan_bytes(),
+and
+.B yy_scan_buffer().
+See the discussion of them below in the section Multiple Input Buffers.
+.PP
+The scanner writes its
+.B ECHO
+output to the
+.I yyout
+global (default, stdout), which may be redefined by the user simply
+by assigning it to some other
+.B FILE
+pointer.
+.SH START CONDITIONS
+.I flex
+provides a mechanism for conditionally activating rules. Any rule
+whose pattern is prefixed with "<sc>" will only be active when
+the scanner is in the start condition named "sc". For example,
+.nf
+
+ <STRING>[^"]* { /* eat up the string body ... */
+ ...
+ }
+
+.fi
+will be active only when the scanner is in the "STRING" start
+condition, and
+.nf
+
+ <INITIAL,STRING,QUOTE>\\. { /* handle an escape ... */
+ ...
+ }
+
+.fi
+will be active only when the current start condition is
+either "INITIAL", "STRING", or "QUOTE".
+.PP
+Start conditions
+are declared in the definitions (first) section of the input
+using unindented lines beginning with either
+.B %s
+or
+.B %x
+followed by a list of names.
+The former declares
+.I inclusive
+start conditions, the latter
+.I exclusive
+start conditions. A start condition is activated using the
+.B BEGIN
+action. Until the next
+.B BEGIN
+action is executed, rules with the given start
+condition will be active and
+rules with other start conditions will be inactive.
+If the start condition is
+.I inclusive,
+then rules with no start conditions at all will also be active.
+If it is
+.I exclusive,
+then
+.I only
+rules qualified with the start condition will be active.
+A set of rules contingent on the same exclusive start condition
+describe a scanner which is independent of any of the other rules in the
+.I flex
+input. Because of this,
+exclusive start conditions make it easy to specify "mini-scanners"
+which scan portions of the input that are syntactically different
+from the rest (e.g., comments).
+.PP
+If the distinction between inclusive and exclusive start conditions
+is still a little vague, here's a simple example illustrating the
+connection between the two. The set of rules:
+.nf
+
+ %s example
+ %%
+
+ <example>foo do_something();
+
+ bar something_else();
+
+.fi
+is equivalent to
+.nf
+
+ %x example
+ %%
+
+ <example>foo do_something();
+
+ <INITIAL,example>bar something_else();
+
+.fi
+Without the
+.B <INITIAL,example>
+qualifier, the
+.I bar
+pattern in the second example wouldn't be active (i.e., couldn't match)
+when in start condition
+.B example.
+If we just used
+.B <example>
+to qualify
+.I bar,
+though, then it would only be active in
+.B example
+and not in
+.B INITIAL,
+while in the first example it's active in both, because in the first
+example the
+.B example
+startion condition is an
+.I inclusive
+.B (%s)
+start condition.
+.PP
+Also note that the special start-condition specifier
+.B <*>
+matches every start condition. Thus, the above example could also
+have been written;
+.nf
+
+ %x example
+ %%
+
+ <example>foo do_something();
+
+ <*>bar something_else();
+
+.fi
+.PP
+The default rule (to
+.B ECHO
+any unmatched character) remains active in start conditions. It
+is equivalent to:
+.nf
+
+ <*>.|\\n ECHO;
+
+.fi
+.PP
+.B BEGIN(0)
+returns to the original state where only the rules with
+no start conditions are active. This state can also be
+referred to as the start-condition "INITIAL", so
+.B BEGIN(INITIAL)
+is equivalent to
+.B BEGIN(0).
+(The parentheses around the start condition name are not required but
+are considered good style.)
+.PP
+.B BEGIN
+actions can also be given as indented code at the beginning
+of the rules section. For example, the following will cause
+the scanner to enter the "SPECIAL" start condition whenever
+.B yylex()
+is called and the global variable
+.I enter_special
+is true:
+.nf
+
+ int enter_special;
+
+ %x SPECIAL
+ %%
+ if ( enter_special )
+ BEGIN(SPECIAL);
+
+ <SPECIAL>blahblahblah
+ ...more rules follow...
+
+.fi
+.PP
+To illustrate the uses of start conditions,
+here is a scanner which provides two different interpretations
+of a string like "123.456". By default it will treat it as
+three tokens, the integer "123", a dot ('.'), and the integer "456".
+But if the string is preceded earlier in the line by the string
+"expect-floats"
+it will treat it as a single token, the floating-point number
+123.456:
+.nf
+
+ %{
+ #include <math.h>
+ %}
+ %s expect
+
+ %%
+ expect-floats BEGIN(expect);
+
+ <expect>[0-9]+"."[0-9]+ {
+ printf( "found a float, = %f\\n",
+ atof( yytext ) );
+ }
+ <expect>\\n {
+ /* that's the end of the line, so
+ * we need another "expect-number"
+ * before we'll recognize any more
+ * numbers
+ */
+ BEGIN(INITIAL);
+ }
+
+ [0-9]+ {
+ printf( "found an integer, = %d\\n",
+ atoi( yytext ) );
+ }
+
+ "." printf( "found a dot\\n" );
+
+.fi
+Here is a scanner which recognizes (and discards) C comments while
+maintaining a count of the current input line.
+.nf
+
+ %x comment
+ %%
+ int line_num = 1;
+
+ "/*" BEGIN(comment);
+
+ <comment>[^*\\n]* /* eat anything that's not a '*' */
+ <comment>"*"+[^*/\\n]* /* eat up '*'s not followed by '/'s */
+ <comment>\\n ++line_num;
+ <comment>"*"+"/" BEGIN(INITIAL);
+
+.fi
+This scanner goes to a bit of trouble to match as much
+text as possible with each rule. In general, when attempting to write
+a high-speed scanner try to match as much possible in each rule, as
+it's a big win.
+.PP
+Note that start-conditions names are really integer values and
+can be stored as such. Thus, the above could be extended in the
+following fashion:
+.nf
+
+ %x comment foo
+ %%
+ int line_num = 1;
+ int comment_caller;
+
+ "/*" {
+ comment_caller = INITIAL;
+ BEGIN(comment);
+ }
+
+ ...
+
+ <foo>"/*" {
+ comment_caller = foo;
+ BEGIN(comment);
+ }
+
+ <comment>[^*\\n]* /* eat anything that's not a '*' */
+ <comment>"*"+[^*/\\n]* /* eat up '*'s not followed by '/'s */
+ <comment>\\n ++line_num;
+ <comment>"*"+"/" BEGIN(comment_caller);
+
+.fi
+Furthermore, you can access the current start condition using
+the integer-valued
+.B YY_START
+macro. For example, the above assignments to
+.I comment_caller
+could instead be written
+.nf
+
+ comment_caller = YY_START;
+
+.fi
+Flex provides
+.B YYSTATE
+as an alias for
+.B YY_START
+(since that is what's used by AT&T
+.I lex).
+.PP
+Note that start conditions do not have their own name-space; %s's and %x's
+declare names in the same fashion as #define's.
+.PP
+Finally, here's an example of how to match C-style quoted strings using
+exclusive start conditions, including expanded escape sequences (but
+not including checking for a string that's too long):
+.nf
+
+ %x str
+
+ %%
+ char string_buf[MAX_STR_CONST];
+ char *string_buf_ptr;
+
+
+ \\" string_buf_ptr = string_buf; BEGIN(str);
+
+ <str>\\" { /* saw closing quote - all done */
+ BEGIN(INITIAL);
+ *string_buf_ptr = '\\0';
+ /* return string constant token type and
+ * value to parser
+ */
+ }
+
+ <str>\\n {
+ /* error - unterminated string constant */
+ /* generate error message */
+ }
+
+ <str>\\\\[0-7]{1,3} {
+ /* octal escape sequence */
+ int result;
+
+ (void) sscanf( yytext + 1, "%o", &result );
+
+ if ( result > 0xff )
+ /* error, constant is out-of-bounds */
+
+ *string_buf_ptr++ = result;
+ }
+
+ <str>\\\\[0-9]+ {
+ /* generate error - bad escape sequence; something
+ * like '\\48' or '\\0777777'
+ */
+ }
+
+ <str>\\\\n *string_buf_ptr++ = '\\n';
+ <str>\\\\t *string_buf_ptr++ = '\\t';
+ <str>\\\\r *string_buf_ptr++ = '\\r';
+ <str>\\\\b *string_buf_ptr++ = '\\b';
+ <str>\\\\f *string_buf_ptr++ = '\\f';
+
+ <str>\\\\(.|\\n) *string_buf_ptr++ = yytext[1];
+
+ <str>[^\\\\\\n\\"]+ {
+ char *yptr = yytext;
+
+ while ( *yptr )
+ *string_buf_ptr++ = *yptr++;
+ }
+
+.fi
+.PP
+Often, such as in some of the examples above, you wind up writing a
+whole bunch of rules all preceded by the same start condition(s). Flex
+makes this a little easier and cleaner by introducing a notion of
+start condition
+.I scope.
+A start condition scope is begun with:
+.nf
+
+ <SCs>{
+
+.fi
+where
+.I SCs
+is a list of one or more start conditions. Inside the start condition
+scope, every rule automatically has the prefix
+.I <SCs>
+applied to it, until a
+.I '}'
+which matches the initial
+.I '{'.
+So, for example,
+.nf
+
+ <ESC>{
+ "\\\\n" return '\\n';
+ "\\\\r" return '\\r';
+ "\\\\f" return '\\f';
+ "\\\\0" return '\\0';
+ }
+
+.fi
+is equivalent to:
+.nf
+
+ <ESC>"\\\\n" return '\\n';
+ <ESC>"\\\\r" return '\\r';
+ <ESC>"\\\\f" return '\\f';
+ <ESC>"\\\\0" return '\\0';
+
+.fi
+Start condition scopes may be nested.
+.PP
+Three routines are available for manipulating stacks of start conditions:
+.TP
+.B void yy_push_state(int new_state)
+pushes the current start condition onto the top of the start condition
+stack and switches to
+.I new_state
+as though you had used
+.B BEGIN new_state
+(recall that start condition names are also integers).
+.TP
+.B void yy_pop_state()
+pops the top of the stack and switches to it via
+.B BEGIN.
+.TP
+.B int yy_top_state()
+returns the top of the stack without altering the stack's contents.
+.PP
+The start condition stack grows dynamically and so has no built-in
+size limitation. If memory is exhausted, program execution aborts.
+.PP
+To use start condition stacks, your scanner must include a
+.B %option stack
+directive (see Options below).
+.SH MULTIPLE INPUT BUFFERS
+Some scanners (such as those which support "include" files)
+require reading from several input streams. As
+.I flex
+scanners do a large amount of buffering, one cannot control
+where the next input will be read from by simply writing a
+.B YY_INPUT
+which is sensitive to the scanning context.
+.B YY_INPUT
+is only called when the scanner reaches the end of its buffer, which
+may be a long time after scanning a statement such as an "include"
+which requires switching the input source.
+.PP
+To negotiate these sorts of problems,
+.I flex
+provides a mechanism for creating and switching between multiple
+input buffers. An input buffer is created by using:
+.nf
+
+ YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
+
+.fi
+which takes a
+.I FILE
+pointer and a size and creates a buffer associated with the given
+file and large enough to hold
+.I size
+characters (when in doubt, use
+.B YY_BUF_SIZE
+for the size). It returns a
+.B YY_BUFFER_STATE
+handle, which may then be passed to other routines (see below). The
+.B YY_BUFFER_STATE
+type is a pointer to an opaque
+.B struct yy_buffer_state
+structure, so you may safely initialize YY_BUFFER_STATE variables to
+.B ((YY_BUFFER_STATE) 0)
+if you wish, and also refer to the opaque structure in order to
+correctly declare input buffers in source files other than that
+of your scanner. Note that the
+.I FILE
+pointer in the call to
+.B yy_create_buffer
+is only used as the value of
+.I yyin
+seen by
+.B YY_INPUT;
+if you redefine
+.B YY_INPUT
+so it no longer uses
+.I yyin,
+then you can safely pass a nil
+.I FILE
+pointer to
+.B yy_create_buffer.
+You select a particular buffer to scan from using:
+.nf
+
+ void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
+
+.fi
+switches the scanner's input buffer so subsequent tokens will
+come from
+.I new_buffer.
+Note that
+.B yy_switch_to_buffer()
+may be used by yywrap() to set things up for continued scanning, instead
+of opening a new file and pointing
+.I yyin
+at it. Note also that switching input sources via either
+.B yy_switch_to_buffer()
+or
+.B yywrap()
+does
+.I not
+change the start condition.
+.nf
+
+ void yy_delete_buffer( YY_BUFFER_STATE buffer )
+
+.fi
+is used to reclaim the storage associated with a buffer. (
+.B buffer
+can be nil, in which case the routine does nothing.)
+You can also clear the current contents of a buffer using:
+.nf
+
+ void yy_flush_buffer( YY_BUFFER_STATE buffer )
+
+.fi
+This function discards the buffer's contents,
+so the next time the scanner attempts to match a token from the
+buffer, it will first fill the buffer anew using
+.B YY_INPUT.
+.PP
+.B yy_new_buffer()
+is an alias for
+.B yy_create_buffer(),
+provided for compatibility with the C++ use of
+.I new
+and
+.I delete
+for creating and destroying dynamic objects.
+.PP
+Finally, the
+.B YY_CURRENT_BUFFER
+macro returns a
+.B YY_BUFFER_STATE
+handle to the current buffer.
+.PP
+Here is an example of using these features for writing a scanner
+which expands include files (the
+.B <<EOF>>
+feature is discussed below):
+.nf
+
+ /* the "incl" state is used for picking up the name
+ * of an include file
+ */
+ %x incl
+
+ %{
+ #define MAX_INCLUDE_DEPTH 10
+ YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
+ int include_stack_ptr = 0;
+ %}
+
+ %%
+ include BEGIN(incl);
+
+ [a-z]+ ECHO;
+ [^a-z\\n]*\\n? ECHO;
+
+ <incl>[ \\t]* /* eat the whitespace */
+ <incl>[^ \\t\\n]+ { /* got the include file name */
+ if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
+ {
+ fprintf( stderr, "Includes nested too deeply" );
+ exit( 1 );
+ }
+
+ include_stack[include_stack_ptr++] =
+ YY_CURRENT_BUFFER;
+
+ yyin = fopen( yytext, "r" );
+
+ if ( ! yyin )
+ error( ... );
+
+ yy_switch_to_buffer(
+ yy_create_buffer( yyin, YY_BUF_SIZE ) );
+
+ BEGIN(INITIAL);
+ }
+
+ <<EOF>> {
+ if ( --include_stack_ptr < 0 )
+ {
+ yyterminate();
+ }
+
+ else
+ {
+ yy_delete_buffer( YY_CURRENT_BUFFER );
+ yy_switch_to_buffer(
+ include_stack[include_stack_ptr] );
+ }
+ }
+
+.fi
+Three routines are available for setting up input buffers for
+scanning in-memory strings instead of files. All of them create
+a new input buffer for scanning the string, and return a corresponding
+.B YY_BUFFER_STATE
+handle (which you should delete with
+.B yy_delete_buffer()
+when done with it). They also switch to the new buffer using
+.B yy_switch_to_buffer(),
+so the next call to
+.B yylex()
+will start scanning the string.
+.TP
+.B yy_scan_string(const char *str)
+scans a NUL-terminated string.
+.TP
+.B yy_scan_bytes(const char *bytes, int len)
+scans
+.I len
+bytes (including possibly NUL's)
+starting at location
+.I bytes.
+.PP
+Note that both of these functions create and scan a
+.I copy
+of the string or bytes. (This may be desirable, since
+.B yylex()
+modifies the contents of the buffer it is scanning.) You can avoid the
+copy by using:
+.TP
+.B yy_scan_buffer(char *base, yy_size_t size)
+which scans in place the buffer starting at
+.I base,
+consisting of
+.I size
+bytes, the last two bytes of which
+.I must
+be
+.B YY_END_OF_BUFFER_CHAR
+(ASCII NUL).
+These last two bytes are not scanned; thus, scanning
+consists of
+.B base[0]
+through
+.B base[size-2],
+inclusive.
+.IP
+If you fail to set up
+.I base
+in this manner (i.e., forget the final two
+.B YY_END_OF_BUFFER_CHAR
+bytes), then
+.B yy_scan_buffer()
+returns a nil pointer instead of creating a new input buffer.
+.IP
+The type
+.B yy_size_t
+is an integral type to which you can cast an integer expression
+reflecting the size of the buffer.
+.SH END-OF-FILE RULES
+The special rule "<<EOF>>" indicates
+actions which are to be taken when an end-of-file is
+encountered and yywrap() returns non-zero (i.e., indicates
+no further files to process). The action must finish
+by doing one of four things:
+.IP -
+assigning
+.I yyin
+to a new input file (in previous versions of flex, after doing the
+assignment you had to call the special action
+.B YY_NEW_FILE;
+this is no longer necessary);
+.IP -
+executing a
+.I return
+statement;
+.IP -
+executing the special
+.B yyterminate()
+action;
+.IP -
+or, switching to a new buffer using
+.B yy_switch_to_buffer()
+as shown in the example above.
+.PP
+<<EOF>> rules may not be used with other
+patterns; they may only be qualified with a list of start
+conditions. If an unqualified <<EOF>> rule is given, it
+applies to
+.I all
+start conditions which do not already have <<EOF>> actions. To
+specify an <<EOF>> rule for only the initial start condition, use
+.nf
+
+ <INITIAL><<EOF>>
+
+.fi
+.PP
+These rules are useful for catching things like unclosed comments.
+An example:
+.nf
+
+ %x quote
+ %%
+
+ ...other rules for dealing with quotes...
+
+ <quote><<EOF>> {
+ error( "unterminated quote" );
+ yyterminate();
+ }
+ <<EOF>> {
+ if ( *++filelist )
+ yyin = fopen( *filelist, "r" );
+ else
+ yyterminate();
+ }
+
+.fi
+.SH MISCELLANEOUS MACROS
+The macro
+.B YY_USER_ACTION
+can be defined to provide an action
+which is always executed prior to the matched rule's action. For example,
+it could be #define'd to call a routine to convert yytext to lower-case.
+When
+.B YY_USER_ACTION
+is invoked, the variable
+.I yy_act
+gives the number of the matched rule (rules are numbered starting with 1).
+Suppose you want to profile how often each of your rules is matched. The
+following would do the trick:
+.nf
+
+ #define YY_USER_ACTION ++ctr[yy_act]
+
+.fi
+where
+.I ctr
+is an array to hold the counts for the different rules. Note that
+the macro
+.B YY_NUM_RULES
+gives the total number of rules (including the default rule, even if
+you use
+.B \-s),
+so a correct declaration for
+.I ctr
+is:
+.nf
+
+ int ctr[YY_NUM_RULES];
+
+.fi
+.PP
+The macro
+.B YY_USER_INIT
+may be defined to provide an action which is always executed before
+the first scan (and before the scanner's internal initializations are done).
+For example, it could be used to call a routine to read
+in a data table or open a logging file.
+.PP
+The macro
+.B yy_set_interactive(is_interactive)
+can be used to control whether the current buffer is considered
+.I interactive.
+An interactive buffer is processed more slowly,
+but must be used when the scanner's input source is indeed
+interactive to avoid problems due to waiting to fill buffers
+(see the discussion of the
+.B \-I
+flag below). A non-zero value
+in the macro invocation marks the buffer as interactive, a zero
+value as non-interactive. Note that use of this macro overrides
+.B %option always-interactive
+or
+.B %option never-interactive
+(see Options below).
+.B yy_set_interactive()
+must be invoked prior to beginning to scan the buffer that is
+(or is not) to be considered interactive.
+.PP
+The macro
+.B yy_set_bol(at_bol)
+can be used to control whether the current buffer's scanning
+context for the next token match is done as though at the
+beginning of a line. A non-zero macro argument makes rules anchored with
+'^' active, while a zero argument makes '^' rules inactive.
+.PP
+The macro
+.B YY_AT_BOL()
+returns true if the next token scanned from the current buffer
+will have '^' rules active, false otherwise.
+.PP
+In the generated scanner, the actions are all gathered in one large
+switch statement and separated using
+.B YY_BREAK,
+which may be redefined. By default, it is simply a "break", to separate
+each rule's action from the following rule's.
+Redefining
+.B YY_BREAK
+allows, for example, C++ users to
+#define YY_BREAK to do nothing (while being very careful that every
+rule ends with a "break" or a "return"!) to avoid suffering from
+unreachable statement warnings where because a rule's action ends with
+"return", the
+.B YY_BREAK
+is inaccessible.
+.SH VALUES AVAILABLE TO THE USER
+This section summarizes the various values available to the user
+in the rule actions.
+.IP -
+.B char *yytext
+holds the text of the current token. It may be modified but not lengthened
+(you cannot append characters to the end).
+.IP
+If the special directive
+.B %array
+appears in the first section of the scanner description, then
+.B yytext
+is instead declared
+.B char yytext[YYLMAX],
+where
+.B YYLMAX
+is a macro definition that you can redefine in the first section
+if you don't like the default value (generally 8KB). Using
+.B %array
+results in somewhat slower scanners, but the value of
+.B yytext
+becomes immune to calls to
+.I input()
+and
+.I unput(),
+which potentially destroy its value when
+.B yytext
+is a character pointer. The opposite of
+.B %array
+is
+.B %pointer,
+which is the default.
+.IP
+You cannot use
+.B %array
+when generating C++ scanner classes
+(the
+.B \-+
+flag).
+.IP -
+.B int yyleng
+holds the length of the current token.
+.IP -
+.B FILE *yyin
+is the file which by default
+.I flex
+reads from. It may be redefined but doing so only makes sense before
+scanning begins or after an EOF has been encountered. Changing it in
+the midst of scanning will have unexpected results since
+.I flex
+buffers its input; use
+.B yyrestart()
+instead.
+Once scanning terminates because an end-of-file
+has been seen, you can assign
+.I yyin
+at the new input file and then call the scanner again to continue scanning.
+.IP -
+.B void yyrestart( FILE *new_file )
+may be called to point
+.I yyin
+at the new input file. The switch-over to the new file is immediate
+(any previously buffered-up input is lost). Note that calling
+.B yyrestart()
+with
+.I yyin
+as an argument thus throws away the current input buffer and continues
+scanning the same input file.
+.IP -
+.B FILE *yyout
+is the file to which
+.B ECHO
+actions are done. It can be reassigned by the user.
+.IP -
+.B YY_CURRENT_BUFFER
+returns a
+.B YY_BUFFER_STATE
+handle to the current buffer.
+.IP -
+.B YY_START
+returns an integer value corresponding to the current start
+condition. You can subsequently use this value with
+.B BEGIN
+to return to that start condition.
+.SH INTERFACING WITH YACC
+One of the main uses of
+.I flex
+is as a companion to the
+.I yacc
+parser-generator.
+.I yacc
+parsers expect to call a routine named
+.B yylex()
+to find the next input token. The routine is supposed to
+return the type of the next token as well as putting any associated
+value in the global
+.B yylval.
+To use
+.I flex
+with
+.I yacc,
+one specifies the
+.B \-d
+option to
+.I yacc
+to instruct it to generate the file
+.B y.tab.h
+containing definitions of all the
+.B %tokens
+appearing in the
+.I yacc
+input. This file is then included in the
+.I flex
+scanner. For example, if one of the tokens is "TOK_NUMBER",
+part of the scanner might look like:
+.nf
+
+ %{
+ #include "y.tab.h"
+ %}
+
+ %%
+
+ [0-9]+ yylval = atoi( yytext ); return TOK_NUMBER;
+
+.fi
+.SH OPTIONS
+.I flex
+has the following options:
+.TP
+.B \-b
+Generate backing-up information to
+.I lex.backup.
+This is a list of scanner states which require backing up
+and the input characters on which they do so. By adding rules one
+can remove backing-up states. If
+.I all
+backing-up states are eliminated and
+.B \-Cf
+or
+.B \-CF
+is used, the generated scanner will run faster (see the
+.B \-p
+flag). Only users who wish to squeeze every last cycle out of their
+scanners need worry about this option. (See the section on Performance
+Considerations below.)
+.TP
+.B \-c
+is a do-nothing, deprecated option included for POSIX compliance.
+.TP
+.B \-d
+makes the generated scanner run in
+.I debug
+mode. Whenever a pattern is recognized and the global
+.B yy_flex_debug
+is non-zero (which is the default),
+the scanner will write to
+.I stderr
+a line of the form:
+.nf
+
+ --accepting rule at line 53 ("the matched text")
+
+.fi
+The line number refers to the location of the rule in the file
+defining the scanner (i.e., the file that was fed to flex). Messages
+are also generated when the scanner backs up, accepts the
+default rule, reaches the end of its input buffer (or encounters
+a NUL; at this point, the two look the same as far as the scanner's concerned),
+or reaches an end-of-file.
+.TP
+.B \-f
+specifies
+.I fast scanner.
+No table compression is done and stdio is bypassed.
+The result is large but fast. This option is equivalent to
+.B \-Cfr
+(see below).
+.TP
+.B \-h
+generates a "help" summary of
+.I flex's
+options to
+.I stdout
+and then exits.
+.B \-?
+and
+.B \-\-help
+are synonyms for
+.B \-h.
+.TP
+.B \-i
+instructs
+.I flex
+to generate a
+.I case-insensitive
+scanner. The case of letters given in the
+.I flex
+input patterns will
+be ignored, and tokens in the input will be matched regardless of case. The
+matched text given in
+.I yytext
+will have the preserved case (i.e., it will not be folded).
+.TP
+.B \-l
+turns on maximum compatibility with the original AT&T
+.I lex
+implementation. Note that this does not mean
+.I full
+compatibility. Use of this option costs a considerable amount of
+performance, and it cannot be used with the
+.B \-+, -f, -F, -Cf,
+or
+.B -CF
+options. For details on the compatibilities it provides, see the section
+"Incompatibilities With Lex And POSIX" below. This option also results
+in the name
+.B YY_FLEX_LEX_COMPAT
+being #define'd in the generated scanner.
+.TP
+.B \-n
+is another do-nothing, deprecated option included only for
+POSIX compliance.
+.TP
+.B \-p
+generates a performance report to stderr. The report
+consists of comments regarding features of the
+.I flex
+input file which will cause a serious loss of performance in the resulting
+scanner. If you give the flag twice, you will also get comments regarding
+features that lead to minor performance losses.
+.IP
+Note that the use of
+.B REJECT,
+.B %option yylineno,
+and variable trailing context (see the Deficiencies / Bugs section below)
+entails a substantial performance penalty; use of
+.I yymore(),
+the
+.B ^
+operator,
+and the
+.B \-I
+flag entail minor performance penalties.
+.TP
+.B \-s
+causes the
+.I default rule
+(that unmatched scanner input is echoed to
+.I stdout)
+to be suppressed. If the scanner encounters input that does not
+match any of its rules, it aborts with an error. This option is
+useful for finding holes in a scanner's rule set.
+.TP
+.B \-t
+instructs
+.I flex
+to write the scanner it generates to standard output instead
+of
+.B lex.yy.c.
+.TP
+.B \-v
+specifies that
+.I flex
+should write to
+.I stderr
+a summary of statistics regarding the scanner it generates.
+Most of the statistics are meaningless to the casual
+.I flex
+user, but the first line identifies the version of
+.I flex
+(same as reported by
+.B \-V),
+and the next line the flags used when generating the scanner, including
+those that are on by default.
+.TP
+.B \-w
+suppresses warning messages.
+.TP
+.B \-B
+instructs
+.I flex
+to generate a
+.I batch
+scanner, the opposite of
+.I interactive
+scanners generated by
+.B \-I
+(see below). In general, you use
+.B \-B
+when you are
+.I certain
+that your scanner will never be used interactively, and you want to
+squeeze a
+.I little
+more performance out of it. If your goal is instead to squeeze out a
+.I lot
+more performance, you should be using the
+.B \-Cf
+or
+.B \-CF
+options (discussed below), which turn on
+.B \-B
+automatically anyway.
+.TP
+.B \-F
+specifies that the
+.ul
+fast
+scanner table representation should be used (and stdio
+bypassed). This representation is
+about as fast as the full table representation
+.B (-f),
+and for some sets of patterns will be considerably smaller (and for
+others, larger). In general, if the pattern set contains both "keywords"
+and a catch-all, "identifier" rule, such as in the set:
+.nf
+
+ "case" return TOK_CASE;
+ "switch" return TOK_SWITCH;
+ ...
+ "default" return TOK_DEFAULT;
+ [a-z]+ return TOK_ID;
+
+.fi
+then you're better off using the full table representation. If only
+the "identifier" rule is present and you then use a hash table or some such
+to detect the keywords, you're better off using
+.B -F.
+.IP
+This option is equivalent to
+.B \-CFr
+(see below). It cannot be used with
+.B \-+.
+.TP
+.B \-I
+instructs
+.I flex
+to generate an
+.I interactive
+scanner. An interactive scanner is one that only looks ahead to decide
+what token has been matched if it absolutely must. It turns out that
+always looking one extra character ahead, even if the scanner has already
+seen enough text to disambiguate the current token, is a bit faster than
+only looking ahead when necessary. But scanners that always look ahead
+give dreadful interactive performance; for example, when a user types
+a newline, it is not recognized as a newline token until they enter
+.I another
+token, which often means typing in another whole line.
+.IP
+.I Flex
+scanners default to
+.I interactive
+unless you use the
+.B \-Cf
+or
+.B \-CF
+table-compression options (see below). That's because if you're looking
+for high-performance you should be using one of these options, so if you
+didn't,
+.I flex
+assumes you'd rather trade off a bit of run-time performance for intuitive
+interactive behavior. Note also that you
+.I cannot
+use
+.B \-I
+in conjunction with
+.B \-Cf
+or
+.B \-CF.
+Thus, this option is not really needed; it is on by default for all those
+cases in which it is allowed.
+.IP
+You can force a scanner to
+.I not
+be interactive by using
+.B \-B
+(see above).
+.TP
+.B \-L
+instructs
+.I flex
+not to generate
+.B #line
+directives. Without this option,
+.I flex
+peppers the generated scanner
+with #line directives so error messages in the actions will be correctly
+located with respect to either the original
+.I flex
+input file (if the errors are due to code in the input file), or
+.B lex.yy.c
+(if the errors are
+.I flex's
+fault -- you should report these sorts of errors to the email address
+given below).
+.TP
+.B \-T
+makes
+.I flex
+run in
+.I trace
+mode. It will generate a lot of messages to
+.I stderr
+concerning
+the form of the input and the resultant non-deterministic and deterministic
+finite automata. This option is mostly for use in maintaining
+.I flex.
+.TP
+.B \-V
+prints the version number to
+.I stdout
+and exits.
+.B \-\-version
+is a synonym for
+.B \-V.
+.TP
+.B \-7
+instructs
+.I flex
+to generate a 7-bit scanner, i.e., one which can only recognized 7-bit
+characters in its input. The advantage of using
+.B \-7
+is that the scanner's tables can be up to half the size of those generated
+using the
+.B \-8
+option (see below). The disadvantage is that such scanners often hang
+or crash if their input contains an 8-bit character.
+.IP
+Note, however, that unless you generate your scanner using the
+.B \-Cf
+or
+.B \-CF
+table compression options, use of
+.B \-7
+will save only a small amount of table space, and make your scanner
+considerably less portable.
+.I Flex's
+default behavior is to generate an 8-bit scanner unless you use the
+.B \-Cf
+or
+.B \-CF,
+in which case
+.I flex
+defaults to generating 7-bit scanners unless your site was always
+configured to generate 8-bit scanners (as will often be the case
+with non-USA sites). You can tell whether flex generated a 7-bit
+or an 8-bit scanner by inspecting the flag summary in the
+.B \-v
+output as described above.
+.IP
+Note that if you use
+.B \-Cfe
+or
+.B \-CFe
+(those table compression options, but also using equivalence classes as
+discussed see below), flex still defaults to generating an 8-bit
+scanner, since usually with these compression options full 8-bit tables
+are not much more expensive than 7-bit tables.
+.TP
+.B \-8
+instructs
+.I flex
+to generate an 8-bit scanner, i.e., one which can recognize 8-bit
+characters. This flag is only needed for scanners generated using
+.B \-Cf
+or
+.B \-CF,
+as otherwise flex defaults to generating an 8-bit scanner anyway.
+.IP
+See the discussion of
+.B \-7
+above for flex's default behavior and the tradeoffs between 7-bit
+and 8-bit scanners.
+.TP
+.B \-U
+instructs
+.I flex
+to generate a 16-bit scanner, i.e., one which can recognize Unicode
+characters. The tables of a scanner generated with
+.B \-U
+are always substantially larger than those of a 7- or 8-bit scanner, but there
+are three significant benefits if you need to scan Unicode. First, a 16-bit
+Unicode scanner is much faster than an equivalent 8-bit because it does not
+have to match NULs. Second, the generated scanner is portable - 8-bit Unicode
+scanners are non-portable because their patterns reflect the endianness of the
+platform on which they were written. Third, 16-bit patterns use the standard
+syntax of regular expressions with one small addition: escape sequences can
+specify 16-bit characters. For example, the patterns \177377 and \xFEFF both
+match the Unicode byte-order mark. Note the following related issues:
+.IP
+If your operating system distinguishes between text and binary file I/O,
+.I yyin
+and
+.I yyout
+should be opened in binary mode.
+.IP
+C++ support of Unicode varies. This currently limits 16-bit C++
+scanners to file I/O.
+.IP
+The
+.B \-C, -Cf,
+and
+.B \-CF
+table compression options are not available with
+.B \-U
+in order to keep table sizes within reason.
+.TP
+.B \-+
+specifies that you want flex to generate a C++
+scanner class. See the section on Generating C++ Scanners below for
+details.
+.TP
+.B \-C[aefFmr]
+controls the degree of table compression and, more generally, trade-offs
+between small scanners and fast scanners.
+.IP
+.B \-Ca
+("align") instructs flex to trade off larger tables in the
+generated scanner for faster performance because the elements of
+the tables are better aligned for memory access and computation. On some
+RISC architectures, fetching and manipulating longwords is more efficient
+than with smaller-sized units such as shortwords. This option can
+double the size of the tables used by a 7- or 8-bit scanner, and can
+quadruple those of a 16-bit scanner.
+.IP
+.B \-Ce
+directs
+.I flex
+to construct
+.I equivalence classes,
+i.e., sets of characters
+which have identical lexical properties (for example, if the only
+appearance of digits in the
+.I flex
+input is in the character class
+"[0-9]" then the digits '0', '1', ..., '9' will all be put
+in the same equivalence class). Equivalence classes usually give
+dramatic reductions in the final table/object file sizes (typically
+a factor of 2-5) and are pretty cheap performance-wise (one array
+look-up per character scanned).
+.IP
+.B \-Cf
+specifies that the
+.I full
+scanner tables should be generated -
+.I flex
+should not compress the
+tables by taking advantages of similar transition functions for
+different states. This option cannot be used with
+.B \-U.
+.IP
+.B \-CF
+specifies that the alternate fast scanner representation (described
+above under the
+.B \-F
+flag)
+should be used. This option cannot be used with
+.B \-+
+or
+.B \-U.
+.IP
+.B \-Cm
+directs
+.I flex
+to construct
+.I meta-equivalence classes,
+which are sets of equivalence classes (or characters, if equivalence
+classes are not being used) that are commonly used together. Meta-equivalence
+classes are often a big win when using compressed tables, but they
+have a moderate performance impact (one or two "if" tests and one
+array look-up per character scanned).
+.IP
+.B \-Cr
+causes the generated scanner to
+.I bypass
+use of the standard I/O library (stdio) for input. Instead of calling
+.B fread()
+or
+.B getc(),
+the scanner will use the
+.B read()
+system call, resulting in a performance gain which varies from system
+to system, but in general is probably negligible unless you are also using
+.B \-Cf
+or
+.B \-CF.
+Using
+.B \-Cr
+can cause strange behavior if, for example, you read from
+.I yyin
+using stdio prior to calling the scanner (because the scanner will miss
+whatever text your previous reads left in the stdio input buffer).
+.IP
+.B \-Cr
+has no effect if you define
+.B YY_INPUT
+(see The Generated Scanner above).
+.IP
+A lone
+.B \-C
+specifies that the scanner tables should be compressed but neither
+equivalence classes nor meta-equivalence classes should be used.
+This option cannot be used with
+.B \-U.
+.IP
+The options
+.B \-Cf
+or
+.B \-CF
+and
+.B \-Cm
+do not make sense together - there is no opportunity for meta-equivalence
+classes if the table is not being compressed. Otherwise the options
+may be freely mixed, and are cumulative.
+.IP
+The default setting is
+.B \-Cem,
+which specifies that
+.I flex
+should generate equivalence classes
+and meta-equivalence classes. This setting provides the highest
+degree of table compression. You can trade off
+faster-executing scanners at the cost of larger tables with
+the following generally being true:
+.nf
+
+ slowest & smallest
+ -Cem
+ -Cm
+ -Ce
+ -C
+ -C{f,F}e
+ -C{f,F}
+ -C{f,F}a
+ fastest & largest
+
+.fi
+Note that scanners with the smallest tables are usually generated and
+compiled the quickest, so
+during development you will usually want to use the default, maximal
+compression.
+.IP
+.B \-Cfe
+is often a good compromise between speed and size for production
+scanners.
+.TP
+.B \-ooutput
+directs flex to write the scanner to the file
+.B output
+instead of
+.B lex.yy.c.
+If you combine
+.B \-o
+with the
+.B \-t
+option, then the scanner is written to
+.I stdout
+but its
+.B #line
+directives (see the
+.B \\-L
+option above) refer to the file
+.B output.
+.TP
+.B \-Pprefix
+changes the default
+.I "yy"
+prefix used by
+.I flex
+for all globally-visible variable and function names to instead be
+.I prefix.
+For example,
+.B \-Pfoo
+changes the name of
+.B yytext
+to
+.B footext.
+It also changes the name of the default output file from
+.B lex.yy.c
+to
+.B lex.foo.c.
+Here are all of the names affected:
+.nf
+
+ yy_create_buffer
+ yy_delete_buffer
+ yy_flex_debug
+ yy_init_buffer
+ yy_flush_buffer
+ yy_load_buffer_state
+ yy_switch_to_buffer
+ yyin
+ yyleng
+ yylex
+ yylineno
+ yyout
+ yyrestart
+ yytext
+ yywrap
+
+.fi
+(If you are using a C++ scanner, then only
+.B yywrap
+and
+.B yyFlexLexer
+are affected.)
+Within your scanner itself, you can still refer to the global variables
+and functions using either version of their name; but externally, they
+have the modified name.
+.IP
+This option lets you easily link together multiple
+.I flex
+programs into the same executable. Note, though, that using this
+option also renames
+.B yywrap(),
+so you now
+.I must
+either
+provide your own (appropriately-named) version of the routine for your
+scanner, or use
+.B %option noyywrap,
+as linking with
+.B \-lfl
+no longer provides one for you by default.
+.TP
+.B \-Sskeleton_file
+overrides the default skeleton file from which
+.I flex
+constructs its scanners. You'll never need this option unless you are doing
+.I flex
+maintenance or development.
+.PP
+.I flex
+also provides a mechanism for controlling options within the
+scanner specification itself, rather than from the flex command-line.
+This is done by including
+.B %option
+directives in the first section of the scanner specification.
+You can specify multiple options with a single
+.B %option
+directive, and multiple directives in the first section of your flex input
+file.
+.PP
+Most options are given simply as names, optionally preceded by the
+word "no" (with no intervening whitespace) to negate their meaning.
+A number are equivalent to flex flags or their negation:
+.nf
+
+ 7bit -7 option
+ 8bit -8 option
+ align -Ca option
+ backup -b option
+ batch -B option
+ c++ -+ option
+
+ caseful or
+ case-sensitive opposite of -i (default)
+
+ case-insensitive or
+ caseless -i option
+
+ debug -d option
+ default opposite of -s option
+ ecs -Ce option
+ fast -F option
+ full -f option
+ interactive -I option
+ lex-compat -l option
+ meta-ecs -Cm option
+ perf-report -p option
+ read -Cr option
+ stdout -t option
+ verbose -v option
+ warn opposite of -w option
+ (use "%option nowarn" for -w)
+
+ array equivalent to "%array"
+ pointer equivalent to "%pointer" (default)
+
+.fi
+Some
+.B %option's
+provide features otherwise not available:
+.TP
+.B always-interactive
+instructs flex to generate a scanner which always considers its input
+"interactive". Normally, on each new input file the scanner calls
+.B isatty()
+in an attempt to determine whether
+the scanner's input source is interactive and thus should be read a
+character at a time. When this option is used, however, then no
+such call is made.
+.TP
+.B main
+directs flex to provide a default
+.B main()
+program for the scanner, which simply calls
+.B yylex().
+This option implies
+.B noyywrap
+(see below).
+.TP
+.B never-interactive
+instructs flex to generate a scanner which never considers its input
+"interactive" (again, no call made to
+.B isatty()).
+This is the opposite of
+.B always-interactive.
+.TP
+.B stack
+enables the use of start condition stacks (see Start Conditions above).
+.TP
+.B stdinit
+if set (i.e.,
+.B %option stdinit)
+initializes
+.I yyin
+and
+.I yyout
+to
+.I stdin
+and
+.I stdout,
+instead of the default of
+.I nil.
+Some existing
+.I lex
+programs depend on this behavior, even though it is not compliant with
+ANSI C, which does not require
+.I stdin
+and
+.I stdout
+to be compile-time constant.
+.TP
+.B yylineno
+directs
+.I flex
+to generate a scanner that maintains the number of the current line
+read from its input in the global variable
+.B yylineno.
+This option is implied by
+.B %option lex-compat.
+.TP
+.B yywrap
+if unset (i.e.,
+.B %option noyywrap),
+makes the scanner not call
+.B yywrap()
+upon an end-of-file, but simply assume that there are no more
+files to scan (until the user points
+.I yyin
+at a new file and calls
+.B yylex()
+again).
+.PP
+.I flex
+scans your rule actions to determine whether you use the
+.B REJECT
+or
+.B yymore()
+features. The
+.B reject
+and
+.B yymore
+options are available to override its decision as to whether you use the
+options, either by setting them (e.g.,
+.B %option reject)
+to indicate the feature is indeed used, or
+unsetting them to indicate it actually is not used
+(e.g.,
+.B %option noyymore).
+.PP
+Three options take string-delimited values, offset with '=':
+.nf
+
+ %option outfile="ABC"
+
+.fi
+is equivalent to
+.B -oABC,
+and
+.nf
+
+ %option prefix="XYZ"
+
+.fi
+is equivalent to
+.B -PXYZ.
+Finally,
+.nf
+
+ %option yyclass="foo"
+
+.fi
+only applies when generating a C++ scanner (
+.B \-+
+option). It informs
+.I flex
+that you have derived
+.B foo
+as a subclass of
+.B yyFlexLexer,
+so
+.I flex
+will place your actions in the member function
+.B foo::yylex()
+instead of
+.B yyFlexLexer::yylex().
+It also generates a
+.B yyFlexLexer::yylex()
+member function that emits a run-time error (by invoking
+.B yyFlexLexer::LexerError())
+if called.
+See Generating C++ Scanners, below, for additional information.
+.PP
+A number of options are available for lint purists who want to suppress
+the appearance of unneeded routines in the generated scanner. Each of the
+following, if unset
+(e.g.,
+.B %option nounput
+), results in the corresponding routine not appearing in
+the generated scanner:
+.nf
+
+ input, unput
+ yy_push_state, yy_pop_state, yy_top_state
+ yy_scan_buffer, yy_scan_bytes, yy_scan_string
+
+.fi
+(though
+.B yy_push_state()
+and friends won't appear anyway unless you use
+.B %option stack).
+.SH PERFORMANCE CONSIDERATIONS
+The main design goal of
+.I flex
+is that it generate high-performance scanners. It has been optimized
+for dealing well with large sets of rules. Aside from the effects on
+scanner speed of the table compression
+.B \-C
+options outlined above,
+there are a number of options/actions which degrade performance. These
+are, from most expensive to least:
+.nf
+
+ REJECT
+ %option yylineno
+ arbitrary trailing context
+
+ pattern sets that require backing up
+ %array
+ %option interactive
+ %option always-interactive
+
+ '^' beginning-of-line operator
+ yymore()
+
+.fi
+with the first three all being quite expensive and the last two
+being quite cheap. Note also that
+.B unput()
+is implemented as a routine call that potentially does quite a bit of
+work, while
+.B yyless()
+is a quite-cheap macro; so if just putting back some excess text you
+scanned, use
+.B yyless().
+.PP
+.B REJECT
+should be avoided at all costs when performance is important.
+It is a particularly expensive option.
+.PP
+Getting rid of backing up is messy and often may be an enormous
+amount of work for a complicated scanner. In principal, one begins
+by using the
+.B \-b
+flag to generate a
+.I lex.backup
+file. For example, on the input
+.nf
+
+ %%
+ foo return TOK_KEYWORD;
+ foobar return TOK_KEYWORD;
+
+.fi
+the file looks like:
+.nf
+
+ State #6 is non-accepting -
+ associated rule line numbers:
+ 2 3
+ out-transitions: [ o ]
+ jam-transitions: EOF [ \\001-n p-\\177 ]
+
+ State #8 is non-accepting -
+ associated rule line numbers:
+ 3
+ out-transitions: [ a ]
+ jam-transitions: EOF [ \\001-` b-\\177 ]
+
+ State #9 is non-accepting -
+ associated rule line numbers:
+ 3
+ out-transitions: [ r ]
+ jam-transitions: EOF [ \\001-q s-\\177 ]
+
+ Compressed tables always back up.
+
+.fi
+The first few lines tell us that there's a scanner state in
+which it can make a transition on an 'o' but not on any other
+character, and that in that state the currently scanned text does not match
+any rule. The state occurs when trying to match the rules found
+at lines 2 and 3 in the input file.
+If the scanner is in that state and then reads
+something other than an 'o', it will have to back up to find
+a rule which is matched. With
+a bit of headscratching one can see that this must be the
+state it's in when it has seen "fo". When this has happened,
+if anything other than another 'o' is seen, the scanner will
+have to back up to simply match the 'f' (by the default rule).
+.PP
+The comment regarding State #8 indicates there's a problem
+when "foob" has been scanned. Indeed, on any character other
+than an 'a', the scanner will have to back up to accept "foo".
+Similarly, the comment for State #9 concerns when "fooba" has
+been scanned and an 'r' does not follow.
+.PP
+The final comment reminds us that there's no point going to
+all the trouble of removing backing up from the rules unless
+we're using
+.B \-Cf
+or
+.B \-CF,
+since there's no performance gain doing so with compressed scanners.
+.PP
+The way to remove the backing up is to add "error" rules:
+.nf
+
+ %%
+ foo return TOK_KEYWORD;
+ foobar return TOK_KEYWORD;
+
+ fooba |
+ foob |
+ fo {
+ /* false alarm, not really a keyword */
+ return TOK_ID;
+ }
+
+.fi
+.PP
+Eliminating backing up among a list of keywords can also be
+done using a "catch-all" rule:
+.nf
+
+ %%
+ foo return TOK_KEYWORD;
+ foobar return TOK_KEYWORD;
+
+ [a-z]+ return TOK_ID;
+
+.fi
+This is usually the best solution when appropriate.
+.PP
+Backing up messages tend to cascade.
+With a complicated set of rules it's not uncommon to get hundreds
+of messages. If one can decipher them, though, it often
+only takes a dozen or so rules to eliminate the backing up (though
+it's easy to make a mistake and have an error rule accidentally match
+a valid token. A possible future
+.I flex
+feature will be to automatically add rules to eliminate backing up).
+.PP
+It's important to keep in mind that you gain the benefits of eliminating
+backing up only if you eliminate
+.I every
+instance of backing up. Leaving just one means you gain nothing.
+.PP
+.I Variable
+trailing context (where both the leading and trailing parts do not have
+a fixed length) entails almost the same performance loss as
+.B REJECT
+(i.e., substantial). So when possible a rule like:
+.nf
+
+ %%
+ mouse|rat/(cat|dog) run();
+
+.fi
+is better written:
+.nf
+
+ %%
+ mouse/cat|dog run();
+ rat/cat|dog run();
+
+.fi
+or as
+.nf
+
+ %%
+ mouse|rat/cat run();
+ mouse|rat/dog run();
+
+.fi
+Note that here the special '|' action does
+.I not
+provide any savings, and can even make things worse (see
+Deficiencies / Bugs below).
+.LP
+Another area where the user can increase a scanner's performance
+(and one that's easier to implement) arises from the fact that
+the longer the tokens matched, the faster the scanner will run.
+This is because with long tokens the processing of most input
+characters takes place in the (short) inner scanning loop, and
+does not often have to go through the additional work of setting up
+the scanning environment (e.g.,
+.B yytext)
+for the action. Recall the scanner for C comments:
+.nf
+
+ %x comment
+ %%
+ int line_num = 1;
+
+ "/*" BEGIN(comment);
+
+ <comment>[^*\\n]*
+ <comment>"*"+[^*/\\n]*
+ <comment>\\n ++line_num;
+ <comment>"*"+"/" BEGIN(INITIAL);
+
+.fi
+This could be sped up by writing it as:
+.nf
+
+ %x comment
+ %%
+ int line_num = 1;
+
+ "/*" BEGIN(comment);
+
+ <comment>[^*\\n]*
+ <comment>[^*\\n]*\\n ++line_num;
+ <comment>"*"+[^*/\\n]*
+ <comment>"*"+[^*/\\n]*\\n ++line_num;
+ <comment>"*"+"/" BEGIN(INITIAL);
+
+.fi
+Now instead of each newline requiring the processing of another
+action, recognizing the newlines is "distributed" over the other rules
+to keep the matched text as long as possible. Note that
+.I adding
+rules does
+.I not
+slow down the scanner! The speed of the scanner is independent
+of the number of rules or (modulo the considerations given at the
+beginning of this section) how complicated the rules are with
+regard to operators such as '*' and '|'.
+.PP
+A final example in speeding up a scanner: suppose you want to scan
+through a file containing identifiers and keywords, one per line
+and with no other extraneous characters, and recognize all the
+keywords. A natural first approach is:
+.nf
+
+ %%
+ asm |
+ auto |
+ break |
+ ... etc ...
+ volatile |
+ while /* it's a keyword */
+
+ .|\\n /* it's not a keyword */
+
+.fi
+To eliminate the back-tracking, introduce a catch-all rule:
+.nf
+
+ %%
+ asm |
+ auto |
+ break |
+ ... etc ...
+ volatile |
+ while /* it's a keyword */
+
+ [a-z]+ |
+ .|\\n /* it's not a keyword */
+
+.fi
+Now, if it's guaranteed that there's exactly one word per line,
+then we can reduce the total number of matches by a half by
+merging in the recognition of newlines with that of the other
+tokens:
+.nf
+
+ %%
+ asm\\n |
+ auto\\n |
+ break\\n |
+ ... etc ...
+ volatile\\n |
+ while\\n /* it's a keyword */
+
+ [a-z]+\\n |
+ .|\\n /* it's not a keyword */
+
+.fi
+One has to be careful here, as we have now reintroduced backing up
+into the scanner. In particular, while
+.I we
+know that there will never be any characters in the input stream
+other than letters or newlines,
+.I flex
+can't figure this out, and it will plan for possibly needing to back up
+when it has scanned a token like "auto" and then the next character
+is something other than a newline or a letter. Previously it would
+then just match the "auto" rule and be done, but now it has no "auto"
+rule, only a "auto\\n" rule. To eliminate the possibility of backing up,
+we could either duplicate all rules but without final newlines, or,
+since we never expect to encounter such an input and therefore don't
+how it's classified, we can introduce one more catch-all rule, this
+one which doesn't include a newline:
+.nf
+
+ %%
+ asm\\n |
+ auto\\n |
+ break\\n |
+ ... etc ...
+ volatile\\n |
+ while\\n /* it's a keyword */
+
+ [a-z]+\\n |
+ [a-z]+ |
+ .|\\n /* it's not a keyword */
+
+.fi
+Compiled with
+.B \-Cf,
+this is about as fast as one can get a
+.I flex
+scanner to go for this particular problem.
+.PP
+A final note:
+.I flex
+is slow when matching NUL's, particularly when a token contains
+multiple NUL's.
+It's best to write rules which match
+.I short
+amounts of text if it's anticipated that the text will often include NUL's.
+.PP
+Another final note regarding performance: as mentioned above in the section
+How the Input is Matched, dynamically resizing
+.B yytext
+to accommodate huge tokens is a slow process because it presently requires that
+the (huge) token be rescanned from the beginning. Thus if performance is
+vital, you should attempt to match "large" quantities of text but not
+"huge" quantities, where the cutoff between the two is at about 8K
+characters/token.
+.SH GENERATING C++ SCANNERS
+.I flex
+provides two different ways to generate scanners for use with C++. The
+first way is to simply compile a scanner generated by
+.I flex
+using a C++ compiler instead of a C compiler. You should not encounter
+any compilations errors (please report any you find to the email address
+given in the Author section below). You can then use C++ code in your
+rule actions instead of C code. Note that the default input source for
+your scanner remains
+.I yyin,
+and default echoing is still done to
+.I yyout.
+Both of these remain
+.I FILE *
+variables and not C++
+.I streams.
+.PP
+You can also use
+.I flex
+to generate a C++ scanner class, using the
+.B \-+
+option (or, equivalently,
+.B %option c++),
+which is automatically specified if the name of the flex
+executable ends in a '+', such as
+.I flex++.
+When using this option, flex defaults to generating the scanner to the file
+.B lex.yy.cc
+instead of
+.B lex.yy.c.
+The generated scanner includes the header file
+.I FlexLexer.h,
+which defines the interface to two C++ classes.
+.PP
+The first class,
+.B FlexLexer,
+provides an abstract base class defining the general scanner class
+interface. It provides the following member functions:
+.TP
+.B const char* YYText()
+returns the text of the most recently matched token, the equivalent of
+.B yytext.
+.TP
+.B int YYLeng()
+returns the length of the most recently matched token, the equivalent of
+.B yyleng.
+.TP
+.B int lineno() const
+returns the current input line number
+(see
+.B %option yylineno),
+or
+.B 1
+if
+.B %option yylineno
+was not used.
+.TP
+.B void set_debug( int flag )
+sets the debugging flag for the scanner, equivalent to assigning to
+.B yy_flex_debug
+(see the Options section above). Note that you must build the scanner
+using
+.B %option debug
+to include debugging information in it.
+.TP
+.B int debug() const
+returns the current setting of the debugging flag.
+.PP
+Also provided are member functions equivalent to
+.B yy_switch_to_buffer(),
+.B yy_create_buffer()
+(though the first argument is an
+.B istream*
+object pointer and not a
+.B FILE*),
+.B yy_flush_buffer(),
+.B yy_delete_buffer(),
+and
+.B yyrestart()
+(again, the first argument is a
+.B istream*
+object pointer).
+.PP
+The second class defined in
+.I FlexLexer.h
+is
+.B yyFlexLexer,
+which is derived from
+.B FlexLexer.
+It defines the following additional member functions:
+.TP
+.B
+yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )
+constructs a
+.B yyFlexLexer
+object using the given streams for input and output. If not specified,
+the streams default to
+.B cin
+and
+.B cout,
+respectively.
+.TP
+.B virtual int yylex()
+performs the same role is
+.B yylex()
+does for ordinary flex scanners: it scans the input stream, consuming
+tokens, until a rule's action returns a value. If you derive a subclass
+.B S
+from
+.B yyFlexLexer
+and want to access the member functions and variables of
+.B S
+inside
+.B yylex(),
+then you need to use
+.B %option yyclass="S"
+to inform
+.I flex
+that you will be using that subclass instead of
+.B yyFlexLexer.
+In this case, rather than generating
+.B yyFlexLexer::yylex(),
+.I flex
+generates
+.B S::yylex()
+(and also generates a dummy
+.B yyFlexLexer::yylex()
+that calls
+.B yyFlexLexer::LexerError()
+if called).
+.TP
+.B
+virtual void switch_streams(istream* new_in = 0,
+.B
+ostream* new_out = 0)
+reassigns
+.B yyin
+to
+.B new_in
+(if non-nil)
+and
+.B yyout
+to
+.B new_out
+(ditto), deleting the previous input buffer if
+.B yyin
+is reassigned.
+.TP
+.B
+int yylex( istream* new_in, ostream* new_out = 0 )
+first switches the input streams via
+.B switch_streams( new_in, new_out )
+and then returns the value of
+.B yylex().
+.PP
+In addition,
+.B yyFlexLexer
+defines the following protected virtual functions which you can redefine
+in derived classes to tailor the scanner:
+.TP
+.B
+virtual int LexerInput( char* buf, int max_size )
+reads up to
+.B max_size
+characters into
+.B buf
+and returns the number of characters read. To indicate end-of-input,
+return 0 characters. Note that "interactive" scanners (see the
+.B \-B
+and
+.B \-I
+flags) define the macro
+.B YY_INTERACTIVE.
+If you redefine
+.B LexerInput()
+and need to take different actions depending on whether or not
+the scanner might be scanning an interactive input source, you can
+test for the presence of this name via
+.B #ifdef.
+.TP
+.B
+virtual void LexerOutput( const char* buf, int size )
+writes out
+.B size
+characters from the buffer
+.B buf,
+which, while NUL-terminated, may also contain "internal" NUL's if
+the scanner's rules can match text with NUL's in them.
+.TP
+.B
+virtual void LexerError( const char* msg )
+reports a fatal error message. The default version of this function
+writes the message to the stream
+.B cerr
+and exits.
+.PP
+Note that a
+.B yyFlexLexer
+object contains its
+.I entire
+scanning state. Thus you can use such objects to create reentrant
+scanners. You can instantiate multiple instances of the same
+.B yyFlexLexer
+class, and you can also combine multiple C++ scanner classes together
+in the same program using the
+.B \-P
+option discussed above.
+.PP
+Finally, note that the
+.B %array
+feature is not available to C++ scanner classes; you must use
+.B %pointer
+(the default).
+.PP
+Here is an example of a simple C++ scanner:
+.nf
+
+ // An example of using the flex C++ scanner class.
+
+ %{
+ int mylineno = 0;
+ %}
+
+ string \\"[^\\n"]+\\"
+
+ ws [ \\t]+
+
+ alpha [A-Za-z]
+ dig [0-9]
+ name ({alpha}|{dig}|\\$)({alpha}|{dig}|[_.\\-/$])*
+ num1 [-+]?{dig}+\\.?([eE][-+]?{dig}+)?
+ num2 [-+]?{dig}*\\.{dig}+([eE][-+]?{dig}+)?
+ number {num1}|{num2}
+
+ %%
+
+ {ws} /* skip blanks and tabs */
+
+ "/*" {
+ int c;
+
+ while((c = yyinput()) != 0)
+ {
+ if(c == '\\n')
+ ++mylineno;
+
+ else if(c == '*')
+ {
+ if((c = yyinput()) == '/')
+ break;
+ else
+ unput(c);
+ }
+ }
+ }
+
+ {number} cout << "number " << YYText() << '\\n';
+
+ \\n mylineno++;
+
+ {name} cout << "name " << YYText() << '\\n';
+
+ {string} cout << "string " << YYText() << '\\n';
+
+ %%
+
+ int main( int /* argc */, char** /* argv */ )
+ {
+ FlexLexer* lexer = new yyFlexLexer;
+ while(lexer->yylex() != 0)
+ ;
+ return 0;
+ }
+.fi
+If you want to create multiple (different) lexer classes, you use the
+.B \-P
+flag (or the
+.B prefix=
+option) to rename each
+.B yyFlexLexer
+to some other
+.B xxFlexLexer.
+You then can include
+.B <FlexLexer.h>
+in your other sources once per lexer class, first renaming
+.B yyFlexLexer
+as follows:
+.nf
+
+ #undef yyFlexLexer
+ #define yyFlexLexer xxFlexLexer
+ #include <FlexLexer.h>
+
+ #undef yyFlexLexer
+ #define yyFlexLexer zzFlexLexer
+ #include <FlexLexer.h>
+
+.fi
+if, for example, you used
+.B %option prefix="xx"
+for one of your scanners and
+.B %option prefix="zz"
+for the other.
+.PP
+IMPORTANT: the present form of the scanning class is
+.I experimental
+and may change considerably between major releases.
+.SH INCOMPATIBILITIES WITH LEX AND POSIX
+.I flex
+is a rewrite of the AT&T Unix
+.I lex
+tool (the two implementations do not share any code, though),
+with some extensions and incompatibilities, both of which
+are of concern to those who wish to write scanners acceptable
+to either implementation. Flex is fully compliant with the POSIX
+.I lex
+specification, except that when using
+.B %pointer
+(the default), a call to
+.B unput()
+destroys the contents of
+.B yytext,
+which is counter to the POSIX specification.
+.PP
+In this section we discuss all of the known areas of incompatibility
+between flex, AT&T lex, and the POSIX specification.
+.PP
+.I flex's
+.B \-l
+option turns on maximum compatibility with the original AT&T
+.I lex
+implementation, at the cost of a major loss in the generated scanner's
+performance. We note below which incompatibilities can be overcome
+using the
+.B \-l
+option.
+.PP
+.I flex
+is fully compatible with
+.I lex
+with the following exceptions:
+.IP -
+The undocumented
+.I lex
+scanner internal variable
+.B yylineno
+is not supported unless
+.B \-l
+or
+.B %option yylineno
+is used.
+.IP
+.B yylineno
+should be maintained on a per-buffer basis, rather than a per-scanner
+(single global variable) basis.
+.IP
+.B yylineno
+is not part of the POSIX specification.
+.IP -
+The
+.B input()
+routine is not redefinable, though it may be called to read characters
+following whatever has been matched by a rule. If
+.B input()
+encounters an end-of-file the normal
+.B yywrap()
+processing is done. A ``real'' end-of-file is returned by
+.B input()
+as
+.I EOF.
+.IP
+Input is instead controlled by defining the
+.B YY_INPUT
+macro.
+.IP
+The
+.I flex
+restriction that
+.B input()
+cannot be redefined is in accordance with the POSIX specification,
+which simply does not specify any way of controlling the
+scanner's input other than by making an initial assignment to
+.I yyin.
+.IP -
+The
+.B unput()
+routine is not redefinable. This restriction is in accordance with POSIX.
+.IP -
+.I flex
+scanners are not as reentrant as
+.I lex
+scanners. In particular, if you have an interactive scanner and
+an interrupt handler which long-jumps out of the scanner, and
+the scanner is subsequently called again, you may get the following
+message:
+.nf
+
+ fatal flex scanner internal error--end of buffer missed
+
+.fi
+To reenter the scanner, first use
+.nf
+
+ yyrestart( yyin );
+
+.fi
+Note that this call will throw away any buffered input; usually this
+isn't a problem with an interactive scanner.
+.IP
+Also note that flex C++ scanner classes
+.I are
+reentrant, so if using C++ is an option for you, you should use
+them instead. See "Generating C++ Scanners" above for details.
+.IP -
+.B output()
+is not supported.
+Output from the
+.B ECHO
+macro is done to the file-pointer
+.I yyout
+(default
+.I stdout).
+.IP
+.B output()
+is not part of the POSIX specification.
+.IP -
+.I lex
+does not support exclusive start conditions (%x), though they
+are in the POSIX specification.
+.IP -
+When definitions are expanded,
+.I flex
+encloses them in parentheses.
+With lex, the following:
+.nf
+
+ NAME [A-Z][A-Z0-9]*
+ %%
+ foo{NAME}? printf( "Found it\\n" );
+ %%
+
+.fi
+will not match the string "foo" because when the macro
+is expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?"
+and the precedence is such that the '?' is associated with
+"[A-Z0-9]*". With
+.I flex,
+the rule will be expanded to
+"foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.
+.IP
+Note that if the definition begins with
+.B ^
+or ends with
+.B $
+then it is
+.I not
+expanded with parentheses, to allow these operators to appear in
+definitions without losing their special meanings. But the
+.B <s>, /,
+and
+.B <<EOF>>
+operators cannot be used in a
+.I flex
+definition.
+.IP
+Using
+.B \-l
+results in the
+.I lex
+behavior of no parentheses around the definition.
+.IP
+The POSIX specification is that the definition be enclosed in parentheses.
+.IP -
+Some implementations of
+.I lex
+allow a rule's action to begin on a separate line, if the rule's pattern
+has trailing whitespace:
+.nf
+
+ %%
+ foo|bar<space here>
+ { foobar_action(); }
+
+.fi
+.I flex
+does not support this feature.
+.IP -
+The
+.I lex
+.B %r
+(generate a Ratfor scanner) option is not supported. It is not part
+of the POSIX specification.
+.IP -
+After a call to
+.B unput(),
+.I yytext
+is undefined until the next token is matched, unless the scanner
+was built using
+.B %array.
+This is not the case with
+.I lex
+or the POSIX specification. The
+.B \-l
+option does away with this incompatibility.
+.IP -
+The precedence of the
+.B {}
+(numeric range) operator is different.
+.I lex
+interprets "abc{1,3}" as "match one, two, or
+three occurrences of 'abc'", whereas
+.I flex
+interprets it as "match 'ab'
+followed by one, two, or three occurrences of 'c'". The latter is
+in agreement with the POSIX specification.
+.IP -
+The precedence of the
+.B ^
+operator is different.
+.I lex
+interprets "^foo|bar" as "match either 'foo' at the beginning of a line,
+or 'bar' anywhere", whereas
+.I flex
+interprets it as "match either 'foo' or 'bar' if they come at the beginning
+of a line". The latter is in agreement with the POSIX specification.
+.IP -
+The special table-size declarations such as
+.B %a
+supported by
+.I lex
+are not required by
+.I flex
+scanners;
+.I flex
+ignores them.
+.IP -
+The name
+.B
+FLEX_SCANNER
+is #define'd so scanners may be written for use with either
+.I flex
+or
+.I lex.
+Scanners also include
+.B YY_FLEX_MAJOR_VERSION
+and
+.B YY_FLEX_MINOR_VERSION
+indicating which version of
+.I flex
+generated the scanner
+(for example, for the 2.5 release, these defines would be 2 and 5
+respectively).
+.PP
+The following
+.I flex
+features are not included in
+.I lex
+or the POSIX specification:
+.nf
+
+ C++ scanners
+ %option
+ start condition scopes
+ start condition stacks
+ interactive/non-interactive scanners
+ yy_scan_string() and friends
+ yyterminate()
+ yy_set_interactive()
+ yy_set_bol()
+ YY_AT_BOL()
+ <<EOF>>
+ <*>
+ YY_DECL
+ YY_START
+ YY_USER_ACTION
+ YY_USER_INIT
+ #line directives
+ %{}'s around actions
+ multiple actions on a line
+
+.fi
+plus almost all of the flex flags.
+The last feature in the list refers to the fact that with
+.I flex
+you can put multiple actions on the same line, separated with
+semi-colons, while with
+.I lex,
+the following
+.nf
+
+ foo handle_foo(); ++num_foos_seen;
+
+.fi
+is (rather surprisingly) truncated to
+.nf
+
+ foo handle_foo();
+
+.fi
+.I flex
+does not truncate the action. Actions that are not enclosed in
+braces are simply terminated at the end of the line.
+.SH DIAGNOSTICS
+.PP
+.I warning, rule cannot be matched
+indicates that the given rule
+cannot be matched because it follows other rules that will
+always match the same text as it. For
+example, in the following "foo" cannot be matched because it comes after
+an identifier "catch-all" rule:
+.nf
+
+ [a-z]+ got_identifier();
+ foo got_foo();
+
+.fi
+Using
+.B REJECT
+in a scanner suppresses this warning.
+.PP
+.I warning,
+.B \-s
+.I
+option given but default rule can be matched
+means that it is possible (perhaps only in a particular start condition)
+that the default rule (match any single character) is the only one
+that will match a particular input. Since
+.B \-s
+was given, presumably this is not intended.
+.PP
+.I reject_used_but_not_detected undefined
+or
+.I yymore_used_but_not_detected undefined -
+These errors can occur at compile time. They indicate that the
+scanner uses
+.B REJECT
+or
+.B yymore()
+but that
+.I flex
+failed to notice the fact, meaning that
+.I flex
+scanned the first two sections looking for occurrences of these actions
+and failed to find any, but somehow you snuck some in (via a #include
+file, for example). Use
+.B %option reject
+or
+.B %option yymore
+to indicate to flex that you really do use these features.
+.PP
+.I flex scanner jammed -
+a scanner compiled with
+.B \-s
+has encountered an input string which wasn't matched by
+any of its rules. This error can also occur due to internal problems.
+.PP
+.I token too large, exceeds YYLMAX -
+your scanner uses
+.B %array
+and one of its rules matched a string longer than the
+.B YYLMAX
+constant (8K bytes by default). You can increase the value by
+#define'ing
+.B YYLMAX
+in the definitions section of your
+.I flex
+input.
+.PP
+.I scanner requires \-8 flag to
+.I use the character 'x' -
+Your scanner specification includes recognizing the 8-bit character
+.I 'x'
+and you did not specify the \-8 flag, and your scanner defaulted to 7-bit
+because you used the
+.B \-Cf
+or
+.B \-CF
+table compression options. See the discussion of the
+.B \-7
+flag for details.
+.PP
+.I flex scanner push-back overflow -
+you used
+.B unput()
+to push back so much text that the scanner's buffer could not hold
+both the pushed-back text and the current token in
+.B yytext.
+Ideally the scanner should dynamically resize the buffer in this case, but at
+present it does not.
+.PP
+.I
+input buffer overflow, can't enlarge buffer because scanner uses REJECT -
+the scanner was working on matching an extremely large token and needed
+to expand the input buffer. This doesn't work with scanners that use
+.B
+REJECT.
+.PP
+.I
+fatal flex scanner internal error--end of buffer missed -
+This can occur in an scanner which is reentered after a long-jump
+has jumped out (or over) the scanner's activation frame. Before
+reentering the scanner, use:
+.nf
+
+ yyrestart( yyin );
+
+.fi
+or, as noted above, switch to using the C++ scanner class.
+.PP
+.I too many start conditions in <> construct! -
+you listed more start conditions in a <> construct than exist (so
+you must have listed at least one of them twice).
+.SH FILES
+.TP
+.B \-lfl
+library with which scanners must be linked.
+.TP
+.I lex.yy.c
+generated scanner (called
+.I lexyy.c
+on some systems).
+.TP
+.I lex.yy.cc
+generated C++ scanner class, when using
+.B -+.
+.TP
+.I <FlexLexer.h>
+header file defining the C++ scanner base class,
+.B FlexLexer,
+and its derived class,
+.B yyFlexLexer.
+.TP
+.I flex.skl
+skeleton scanner. This file is only used when building flex, not when
+flex executes.
+.TP
+.I lex.backup
+backing-up information for
+.B \-b
+flag (called
+.I lex.bck
+on some systems).
+.SH DEFICIENCIES / BUGS
+.PP
+Some trailing context
+patterns cannot be properly matched and generate
+warning messages ("dangerous trailing context"). These are
+patterns where the ending of the
+first part of the rule matches the beginning of the second
+part, such as "zx*/xy*", where the 'x*' matches the 'x' at
+the beginning of the trailing context. (Note that the POSIX draft
+states that the text matched by such patterns is undefined.)
+.PP
+For some trailing context rules, parts which are actually fixed-length are
+not recognized as such, leading to the abovementioned performance loss.
+In particular, parts using '|' or {n} (such as "foo{3}") are always
+considered variable-length.
+.PP
+Combining trailing context with the special '|' action can result in
+.I fixed
+trailing context being turned into the more expensive
+.I variable
+trailing context. For example, in the following:
+.nf
+
+ %%
+ abc |
+ xyz/def
+
+.fi
+.PP
+Use of
+.B unput()
+invalidates yytext and yyleng, unless the
+.B %array
+directive
+or the
+.B \-l
+option has been used.
+.PP
+Pattern-matching of NUL's is substantially slower than matching other
+characters.
+.PP
+Dynamic resizing of the input buffer is slow, as it entails rescanning
+all the text matched so far by the current (generally huge) token.
+.PP
+Due to both buffering of input and read-ahead, you cannot intermix
+calls to <stdio.h> routines, such as, for example,
+.B getchar(),
+with
+.I flex
+rules and expect it to work. Call
+.B input()
+instead.
+.PP
+The total table entries listed by the
+.B \-v
+flag excludes the number of table entries needed to determine
+what rule has been matched. The number of entries is equal
+to the number of DFA states if the scanner does not use
+.B REJECT,
+and somewhat greater than the number of states if it does.
+.PP
+.B REJECT
+cannot be used with the
+.B \-f
+or
+.B \-F
+options.
+.PP
+The
+.I flex
+internal algorithms need documentation.
+.SH SEE ALSO
+.PP
+lex(1), yacc(1), sed(1), awk(1).
+.PP
+John Levine, Tony Mason, and Doug Brown,
+.I Lex & Yacc,
+O'Reilly and Associates. Be sure to get the 2nd edition.
+.PP
+M. E. Lesk and E. Schmidt,
+.I LEX \- Lexical Analyzer Generator
+.PP
+Alfred Aho, Ravi Sethi and Jeffrey Ullman,
+.I Compilers: Principles, Techniques and Tools,
+Addison-Wesley (1986). Describes the pattern-matching techniques used by
+.I flex
+(deterministic finite automata).
+.SH AUTHOR
+Vern Paxson, with the help of many ideas and much inspiration from
+Van Jacobson. Original version by Jef Poskanzer. The fast table
+representation is a partial implementation of a design done by Van
+Jacobson. The implementation was done by Kevin Gong and Vern Paxson.
+.PP
+Thanks to the many
+.I flex
+beta-testers, feedbackers, and contributors, especially Francois Pinard,
+Casey Leedom,
+Robert Abramovitz,
+Stan Adermann, Terry Allen, David Barker-Plummer, John Basrai,
+Neal Becker, Nelson H.F. Beebe, benson@odi.com,
+Karl Berry, Peter A. Bigot, Simon Blanchard,
+Keith Bostic, Frederic Brehm, Ian Brockbank, Kin Cho, Nick Christopher,
+Brian Clapper, J.T. Conklin,
+Jason Coughlin, Bill Cox, Nick Cropper, Dave Curtis, Scott David
+Daniels, Chris G. Demetriou, Theo Deraadt,
+Mike Donahue, Chuck Doucette, Tom Epperly, Leo Eskin,
+Chris Faylor, Chris Flatters, Jon Forrest, Jeffrey Friedl,
+Joe Gayda, Kaveh R. Ghazi, Wolfgang Glunz,
+Eric Goldman, Christopher M. Gould, Ulrich Grepel, Peer Griebel,
+Jan Hajic, Charles Hemphill, NORO Hideo,
+Jarkko Hietaniemi, Scott Hofmann,
+Jeff Honig, Dana Hudes, Eric Hughes, John Interrante,
+Ceriel Jacobs, Michal Jaegermann, Sakari Jalovaara, Jeffrey R. Jones,
+Henry Juengst, Klaus Kaempf, Jonathan I. Kamens, Terrence O Kane,
+Amir Katz, ken@ken.hilco.com, Kevin B. Kenny,
+Steve Kirsch, Winfried Koenig, Marq Kole, Ronald Lamprecht,
+Greg Lee, Rohan Lenard, Craig Leres, John Levine, Steve Liddle,
+David Loffredo, Mike Long,
+Mohamed el Lozy, Brian Madsen, Malte, Joe Marshall,
+Bengt Martensson, Chris Metcalf,
+Luke Mewburn, Jim Meyering, R. Alexander Milowski, Erik Naggum,
+G.T. Nicol, Landon Noll, James Nordby, Marc Nozell,
+Richard Ohnemus, Karsten Pahnke,
+Sven Panne, Roland Pesch, Walter Pelissero, Gaumond
+Pierre, Esmond Pitt, Jef Poskanzer, Joe Rahmeh, Jarmo Raiha,
+Frederic Raimbault, Pat Rankin, Rick Richardson,
+Kevin Rodgers, Kai Uwe Rommel, Jim Roskind, Alberto Santini,
+Andreas Scherer, Darrell Schiebel, Raf Schietekat,
+Doug Schmidt, Philippe Schnoebelen, Andreas Schwab,
+Larry Schwimmer, Alex Siegel, Eckehard Stolz, Jan-Erik Strvmquist,
+Mike Stump, Paul Stuart, Dave Tallman, Ian Lance Taylor,
+Chris Thewalt, Richard M. Timoney, Jodi Tsai,
+Paul Tuinenga, Gary Weik, Frank Whaley, Gerhard Wilhelms, Kent Williams, Ken
+Yap, Ron Zellar, Nathan Zelle, David Zuhn,
+and those whose names have slipped my marginal
+mail-archiving skills but whose contributions are appreciated all the
+same.
+.PP
+Thanks to Keith Bostic, Jon Forrest, Noah Friedman,
+John Gilmore, Craig Leres, John Levine, Bob Mulcahy, G.T.
+Nicol, Francois Pinard, Rich Salz, and Richard Stallman for help with various
+distribution headaches.
+.PP
+Thanks to Esmond Pitt and Earle Horton for 8-bit character support; to
+Benson Margulies and Fred Burke for C++ support; to Kent Williams and Tom
+Epperly for C++ class support; to Ove Ewerlid for support of NUL's; and to
+Eric Hughes for support of multiple buffers.
+.PP
+This work was primarily done when I was with the Real Time Systems Group
+at the Lawrence Berkeley Laboratory in Berkeley, CA. Many thanks to all there
+for the support I received.
+.PP
+Send comments to vern@ee.lbl.gov.