summaryrefslogtreecommitdiff
path: root/MISC/flex.man
diff options
context:
space:
mode:
Diffstat (limited to 'MISC/flex.man')
-rw-r--r--MISC/flex.man3696
1 files changed, 3696 insertions, 0 deletions
diff --git a/MISC/flex.man b/MISC/flex.man
new file mode 100644
index 0000000..d41f5ba
--- /dev/null
+++ b/MISC/flex.man
@@ -0,0 +1,3696 @@
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+NAME
+ flex - fast lexical analyzer generator
+
+SYNOPSIS
+ flex [-bcdfhilnpstvwBFILTV78+? -C[aefFmr] -ooutput -Pprefix
+ -Sskeleton] [--help --version] [filename ...]
+
+OVERVIEW
+ This manual describes flex, a tool for generating programs
+ that perform pattern-matching on text. The manual includes
+ both tutorial and reference sections:
+
+ Description
+ a brief overview of the tool
+
+ Some Simple Examples
+
+ Format Of The Input File
+
+ Patterns
+ the extended regular expressions used by flex
+
+ How The Input Is Matched
+ the rules for determining what has been matched
+
+ Actions
+ how to specify what to do when a pattern is matched
+
+ The Generated Scanner
+ details regarding the scanner that flex produces;
+ how to control the input source
+
+ Start Conditions
+ introducing context into your scanners, and
+ managing "mini-scanners"
+
+ Multiple Input Buffers
+ how to manipulate multiple input sources; how to
+ scan from strings instead of files
+
+ End-of-file Rules
+ special rules for matching the end of the input
+
+ Miscellaneous Macros
+ a summary of macros available to the actions
+
+ Values Available To The User
+ a summary of values available to the actions
+
+ Interfacing With Yacc
+ connecting flex scanners together with yacc parsers
+
+
+
+
+Version 2.5 Last change: April 1995 1
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ Options
+ flex command-line options, and the "%option"
+ directive
+
+ Performance Considerations
+ how to make your scanner go as fast as possible
+
+ Generating C++ Scanners
+ the (experimental) facility for generating C++
+ scanner classes
+
+ Incompatibilities With Lex And POSIX
+ how flex differs from AT&T lex and the POSIX lex
+ standard
+
+ Diagnostics
+ those error messages produced by flex (or scanners
+ it generates) whose meanings might not be apparent
+
+ Files
+ files used by flex
+
+ Deficiencies / Bugs
+ known problems with flex
+
+ See Also
+ other documentation, related tools
+
+ Author
+ includes contact information
+
+
+DESCRIPTION
+ flex is a tool for generating scanners: programs which
+ recognized lexical patterns in text. flex reads the given
+ input files, or its standard input if no file names are
+ given, for a description of a scanner to generate. The
+ description is in the form of pairs of regular expressions
+ and C code, called rules. flex generates as output a C
+ source file, lex.yy.c, which defines a routine yylex(). This
+ file is compiled and linked with the -lfl library to produce
+ an executable. When the executable is run, it analyzes its
+ input for occurrences of the regular expressions. Whenever
+ it finds one, it executes the corresponding C code.
+
+SOME SIMPLE EXAMPLES
+ First some simple examples to get the flavor of how one uses
+ flex. The following flex input specifies a scanner which
+ whenever it encounters the string "username" will replace it
+ with the user's login name:
+
+ %%
+
+
+
+Version 2.5 Last change: April 1995 2
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ username printf( "%s", getlogin() );
+
+ By default, any text not matched by a flex scanner is copied
+ to the output, so the net effect of this scanner is to copy
+ its input file to its output with each occurrence of "user-
+ name" expanded. In this input, there is just one rule.
+ "username" is the pattern and the "printf" is the action.
+ The "%%" marks the beginning of the rules.
+
+ Here's another simple example:
+
+ int num_lines = 0, num_chars = 0;
+
+ %%
+ \n ++num_lines; ++num_chars;
+ . ++num_chars;
+
+ %%
+ main()
+ {
+ yylex();
+ printf( "# of lines = %d, # of chars = %d\n",
+ num_lines, num_chars );
+ }
+
+ This scanner counts the number of characters and the number
+ of lines in its input (it produces no output other than the
+ final report on the counts). The first line declares two
+ globals, "num_lines" and "num_chars", which are accessible
+ both inside yylex() and in the main() routine declared after
+ the second "%%". There are two rules, one which matches a
+ newline ("\n") and increments both the line count and the
+ character count, and one which matches any character other
+ than a newline (indicated by the "." regular expression).
+
+ A somewhat more complicated example:
+
+ /* scanner for a toy Pascal-like language */
+
+ %{
+ /* need this for the call to atof() below */
+ #include <math.h>
+ %}
+
+ DIGIT [0-9]
+ ID [a-z][a-z0-9]*
+
+ %%
+
+ {DIGIT}+ {
+ printf( "An integer: %s (%d)\n", yytext,
+ atoi( yytext ) );
+
+
+
+Version 2.5 Last change: April 1995 3
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ }
+
+ {DIGIT}+"."{DIGIT}* {
+ printf( "A float: %s (%g)\n", yytext,
+ atof( yytext ) );
+ }
+
+ if|then|begin|end|procedure|function {
+ printf( "A keyword: %s\n", yytext );
+ }
+
+ {ID} printf( "An identifier: %s\n", yytext );
+
+ "+"|"-"|"*"|"/" printf( "An operator: %s\n", yytext );
+
+ "{"[^}\n]*"}" /* eat up one-line comments */
+
+ [ \t\n]+ /* eat up whitespace */
+
+ . printf( "Unrecognized character: %s\n", yytext );
+
+ %%
+
+ main( argc, argv )
+ int argc;
+ char **argv;
+ {
+ ++argv, --argc; /* skip over program name */
+ if ( argc > 0 )
+ yyin = fopen( argv[0], "r" );
+ else
+ yyin = stdin;
+
+ yylex();
+ }
+
+ This is the beginnings of a simple scanner for a language
+ like Pascal. It identifies different types of tokens and
+ reports on what it has seen.
+
+ The details of this example will be explained in the follow-
+ ing sections.
+
+FORMAT OF THE INPUT FILE
+ The flex input file consists of three sections, separated by
+ a line with just %% in it:
+
+ definitions
+ %%
+ rules
+ %%
+ user code
+
+
+
+Version 2.5 Last change: April 1995 4
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ The definitions section contains declarations of simple name
+ definitions to simplify the scanner specification, and
+ declarations of start conditions, which are explained in a
+ later section.
+
+ Name definitions have the form:
+
+ name definition
+
+ The "name" is a word beginning with a letter or an under-
+ score ('_') followed by zero or more letters, digits, '_',
+ or '-' (dash). The definition is taken to begin at the
+ first non-white-space character following the name and con-
+ tinuing to the end of the line. The definition can subse-
+ quently be referred to using "{name}", which will expand to
+ "(definition)". For example,
+
+ DIGIT [0-9]
+ ID [a-z][a-z0-9]*
+
+ defines "DIGIT" to be a regular expression which matches a
+ single digit, and "ID" to be a regular expression which
+ matches a letter followed by zero-or-more letters-or-digits.
+ A subsequent reference to
+
+ {DIGIT}+"."{DIGIT}*
+
+ is identical to
+
+ ([0-9])+"."([0-9])*
+
+ and matches one-or-more digits followed by a '.' followed by
+ zero-or-more digits.
+
+ The rules section of the flex input contains a series of
+ rules of the form:
+
+ pattern action
+
+ where the pattern must be unindented and the action must
+ begin on the same line.
+
+ See below for a further description of patterns and actions.
+
+ Finally, the user code section is simply copied to lex.yy.c
+ verbatim. It is used for companion routines which call or
+ are called by the scanner. The presence of this section is
+ optional; if it is missing, the second %% in the input file
+ may be skipped, too.
+
+ In the definitions and rules sections, any indented text or
+ text enclosed in %{ and %} is copied verbatim to the output
+
+
+
+Version 2.5 Last change: April 1995 5
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ (with the %{}'s removed). The %{}'s must appear unindented
+ on lines by themselves.
+
+ In the rules section, any indented or %{} text appearing
+ before the first rule may be used to declare variables which
+ are local to the scanning routine and (after the declara-
+ tions) code which is to be executed whenever the scanning
+ routine is entered. Other indented or %{} text in the rule
+ section is still copied to the output, but its meaning is
+ not well-defined and it may well cause compile-time errors
+ (this feature is present for POSIX compliance; see below for
+ other such features).
+
+ In the definitions section (but not in the rules section),
+ an unindented comment (i.e., a line beginning with "/*") is
+ also copied verbatim to the output up to the next "*/".
+
+PATTERNS
+ The patterns in the input are written using an extended set
+ of regular expressions. These are:
+
+ x match the character 'x'
+ . any character (byte) except newline
+ [xyz] a "character class"; in this case, the pattern
+ matches either an 'x', a 'y', or a 'z'
+ [abj-oZ] a "character class" with a range in it; matches
+ an 'a', a 'b', any letter from 'j' through 'o',
+ or a 'Z'
+ [^A-Z] a "negated character class", i.e., any character
+ but those in the class. In this case, any
+ character EXCEPT an uppercase letter.
+ [^A-Z\n] any character EXCEPT an uppercase letter or
+ a newline
+ r* zero or more r's, where r is any regular expression
+ r+ one or more r's
+ r? zero or one r's (that is, "an optional r")
+ r{2,5} anywhere from two to five r's
+ r{2,} two or more r's
+ r{4} exactly 4 r's
+ {name} the expansion of the "name" definition
+ (see above)
+ "[xyz]\"foo"
+ the literal string: [xyz]"foo
+ \X if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
+ then the ANSI-C interpretation of \x.
+ Otherwise, a literal 'X' (used to escape
+ operators such as '*')
+ \0 a NUL character (ASCII code 0)
+ \123 the character with octal value 123
+ \x2a the character with hexadecimal value 2a
+ (r) match an r; parentheses are used to override
+ precedence (see below)
+
+
+
+Version 2.5 Last change: April 1995 6
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ rs the regular expression r followed by the
+ regular expression s; called "concatenation"
+
+
+ r|s either an r or an s
+
+
+ r/s an r but only if it is followed by an s. The
+ text matched by s is included when determining
+ whether this rule is the "longest match",
+ but is then returned to the input before
+ the action is executed. So the action only
+ sees the text matched by r. This type
+ of pattern is called trailing context".
+ (There are some combinations of r/s that flex
+ cannot match correctly; see notes in the
+ Deficiencies / Bugs section below regarding
+ "dangerous trailing context".)
+ ^r an r, but only at the beginning of a line (i.e.,
+ which just starting to scan, or right after a
+ newline has been scanned).
+ r$ an r, but only at the end of a line (i.e., just
+ before a newline). Equivalent to "r/\n".
+
+ Note that flex's notion of "newline" is exactly
+ whatever the C compiler used to compile flex
+ interprets '\n' as; in particular, on some DOS
+ systems you must either filter out \r's in the
+ input yourself, or explicitly use r/\r\n for "r$".
+
+
+ <s>r an r, but only in start condition s (see
+ below for discussion of start conditions)
+ <s1,s2,s3>r
+ same, but in any of start conditions s1,
+ s2, or s3
+ <*>r an r in any start condition, even an exclusive one.
+
+
+ <<EOF>> an end-of-file
+ <s1,s2><<EOF>>
+ an end-of-file when in start condition s1 or s2
+
+ Note that inside of a character class, all regular expres-
+ sion operators lose their special meaning except escape
+ ('\') and the character class operators, '-', ']', and, at
+ the beginning of the class, '^'.
+
+ The regular expressions listed above are grouped according
+ to precedence, from highest precedence at the top to lowest
+ at the bottom. Those grouped together have equal pre-
+ cedence. For example,
+
+
+
+Version 2.5 Last change: April 1995 7
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ foo|bar*
+
+ is the same as
+
+ (foo)|(ba(r*))
+
+ since the '*' operator has higher precedence than concatena-
+ tion, and concatenation higher than alternation ('|'). This
+ pattern therefore matches either the string "foo" or the
+ string "ba" followed by zero-or-more r's. To match "foo" or
+ zero-or-more "bar"'s, use:
+
+ foo|(bar)*
+
+ and to match zero-or-more "foo"'s-or-"bar"'s:
+
+ (foo|bar)*
+
+
+ In addition to characters and ranges of characters, charac-
+ ter classes can also contain character class expressions.
+ These are expressions enclosed inside [: and :] delimiters
+ (which themselves must appear between the '[' and ']' of the
+ character class; other elements may occur inside the charac-
+ ter class, too). The valid expressions are:
+
+ [:alnum:] [:alpha:] [:blank:]
+ [:cntrl:] [:digit:] [:graph:]
+ [:lower:] [:print:] [:punct:]
+ [:space:] [:upper:] [:xdigit:]
+
+ These expressions all designate a set of characters
+ equivalent to the corresponding standard C isXXX function.
+ For example, [:alnum:] designates those characters for which
+ isalnum() returns true - i.e., any alphabetic or numeric.
+ Some systems don't provide isblank(), so flex defines
+ [:blank:] as a blank or a tab.
+
+ For example, the following character classes are all
+ equivalent:
+
+ [[:alnum:]]
+ [[:alpha:][:digit:]
+ [[:alpha:]0-9]
+ [a-zA-Z0-9]
+
+ If your scanner is case-insensitive (the -i flag), then
+ [:upper:] and [:lower:] are equivalent to [:alpha:].
+
+ Some notes on patterns:
+
+ - A negated character class such as the example "[^A-Z]"
+
+
+
+Version 2.5 Last change: April 1995 8
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ above will match a newline unless "\n" (or an
+ equivalent escape sequence) is one of the characters
+ explicitly present in the negated character class
+ (e.g., "[^A-Z\n]"). This is unlike how many other reg-
+ ular expression tools treat negated character classes,
+ but unfortunately the inconsistency is historically
+ entrenched. Matching newlines means that a pattern
+ like [^"]* can match the entire input unless there's
+ another quote in the input.
+
+ - A rule can have at most one instance of trailing con-
+ text (the '/' operator or the '$' operator). The start
+ condition, '^', and "<<EOF>>" patterns can only occur
+ at the beginning of a pattern, and, as well as with '/'
+ and '$', cannot be grouped inside parentheses. A '^'
+ which does not occur at the beginning of a rule or a
+ '$' which does not occur at the end of a rule loses its
+ special properties and is treated as a normal charac-
+ ter.
+
+ The following are illegal:
+
+ foo/bar$
+ <sc1>foo<sc2>bar
+
+ Note that the first of these, can be written
+ "foo/bar\n".
+
+ The following will result in '$' or '^' being treated
+ as a normal character:
+
+ foo|(bar$)
+ foo|^bar
+
+ If what's wanted is a "foo" or a bar-followed-by-a-
+ newline, the following could be used (the special '|'
+ action is explained below):
+
+ foo |
+ bar$ /* action goes here */
+
+ A similar trick will work for matching a foo or a bar-
+ at-the-beginning-of-a-line.
+
+HOW THE INPUT IS MATCHED
+ When the generated scanner is run, it analyzes its input
+ looking for strings which match any of its patterns. If it
+ finds more than one match, it takes the one matching the
+ most text (for trailing context rules, this includes the
+ length of the trailing part, even though it will then be
+ returned to the input). If it finds two or more matches of
+ the same length, the rule listed first in the flex input
+
+
+
+Version 2.5 Last change: April 1995 9
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ file is chosen.
+
+ Once the match is determined, the text corresponding to the
+ match (called the token) is made available in the global
+ character pointer yytext, and its length in the global
+ integer yyleng. The action corresponding to the matched pat-
+ tern is then executed (a more detailed description of
+ actions follows), and then the remaining input is scanned
+ for another match.
+
+ If no match is found, then the default rule is executed: the
+ next character in the input is considered matched and copied
+ to the standard output. Thus, the simplest legal flex input
+ is:
+
+ %%
+
+ which generates a scanner that simply copies its input (one
+ character at a time) to its output.
+
+ Note that yytext can be defined in two different ways:
+ either as a character pointer or as a character array. You
+ can control which definition flex uses by including one of
+ the special directives %pointer or %array in the first
+ (definitions) section of your flex input. The default is
+ %pointer, unless you use the -l lex compatibility option, in
+ which case yytext will be an array. The advantage of using
+ %pointer is substantially faster scanning and no buffer
+ overflow when matching very large tokens (unless you run out
+ of dynamic memory). The disadvantage is that you are res-
+ tricted in how your actions can modify yytext (see the next
+ section), and calls to the unput() function destroys the
+ present contents of yytext, which can be a considerable
+ porting headache when moving between different lex versions.
+
+ The advantage of %array is that you can then modify yytext
+ to your heart's content, and calls to unput() do not destroy
+ yytext (see below). Furthermore, existing lex programs
+ sometimes access yytext externally using declarations of the
+ form:
+ extern char yytext[];
+ This definition is erroneous when used with %pointer, but
+ correct for %array.
+
+ %array defines yytext to be an array of YYLMAX characters,
+ which defaults to a fairly large value. You can change the
+ size by simply #define'ing YYLMAX to a different value in
+ the first section of your flex input. As mentioned above,
+ with %pointer yytext grows dynamically to accommodate large
+ tokens. While this means your %pointer scanner can accommo-
+ date very large tokens (such as matching entire blocks of
+ comments), bear in mind that each time the scanner must
+
+
+
+Version 2.5 Last change: April 1995 10
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ resize yytext it also must rescan the entire token from the
+ beginning, so matching such tokens can prove slow. yytext
+ presently does not dynamically grow if a call to unput()
+ results in too much text being pushed back; instead, a run-
+ time error results.
+
+ Also note that you cannot use %array with C++ scanner
+ classes (the c++ option; see below).
+
+ACTIONS
+ Each pattern in a rule has a corresponding action, which can
+ be any arbitrary C statement. The pattern ends at the first
+ non-escaped whitespace character; the remainder of the line
+ is its action. If the action is empty, then when the pat-
+ tern is matched the input token is simply discarded. For
+ example, here is the specification for a program which
+ deletes all occurrences of "zap me" from its input:
+
+ %%
+ "zap me"
+
+ (It will copy all other characters in the input to the out-
+ put since they will be matched by the default rule.)
+
+ Here is a program which compresses multiple blanks and tabs
+ down to a single blank, and throws away whitespace found at
+ the end of a line:
+
+ %%
+ [ \t]+ putchar( ' ' );
+ [ \t]+$ /* ignore this token */
+
+
+ If the action contains a '{', then the action spans till the
+ balancing '}' is found, and the action may cross multiple
+ lines. flex knows about C strings and comments and won't be
+ fooled by braces found within them, but also allows actions
+ to begin with %{ and will consider the action to be all the
+ text up to the next %} (regardless of ordinary braces inside
+ the action).
+
+ An action consisting solely of a vertical bar ('|') means
+ "same as the action for the next rule." See below for an
+ illustration.
+
+ Actions can include arbitrary C code, including return
+ statements to return a value to whatever routine called
+ yylex(). Each time yylex() is called it continues processing
+ tokens from where it last left off until it either reaches
+ the end of the file or executes a return.
+
+
+
+
+
+Version 2.5 Last change: April 1995 11
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ Actions are free to modify yytext except for lengthening it
+ (adding characters to its end--these will overwrite later
+ characters in the input stream). This however does not
+ apply when using %array (see above); in that case, yytext
+ may be freely modified in any way.
+
+ Actions are free to modify yyleng except they should not do
+ so if the action also includes use of yymore() (see below).
+
+ There are a number of special directives which can be
+ included within an action:
+
+ - ECHO copies yytext to the scanner's output.
+
+ - BEGIN followed by the name of a start condition places
+ the scanner in the corresponding start condition (see
+ below).
+
+ - REJECT directs the scanner to proceed on to the "second
+ best" rule which matched the input (or a prefix of the
+ input). The rule is chosen as described above in "How
+ the Input is Matched", and yytext and yyleng set up
+ appropriately. It may either be one which matched as
+ much text as the originally chosen rule but came later
+ in the flex input file, or one which matched less text.
+ For example, the following will both count the words in
+ the input and call the routine special() whenever
+ "frob" is seen:
+
+ int word_count = 0;
+ %%
+
+ frob special(); REJECT;
+ [^ \t\n]+ ++word_count;
+
+ Without the REJECT, any "frob"'s in the input would not
+ be counted as words, since the scanner normally exe-
+ cutes only one action per token. Multiple REJECT's are
+ allowed, each one finding the next best choice to the
+ currently active rule. For example, when the following
+ scanner scans the token "abcd", it will write "abcdab-
+ caba" to the output:
+
+ %%
+ a |
+ ab |
+ abc |
+ abcd ECHO; REJECT;
+ .|\n /* eat up any unmatched character */
+
+ (The first three rules share the fourth's action since
+ they use the special '|' action.) REJECT is a
+
+
+
+Version 2.5 Last change: April 1995 12
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ particularly expensive feature in terms of scanner per-
+ formance; if it is used in any of the scanner's actions
+ it will slow down all of the scanner's matching.
+ Furthermore, REJECT cannot be used with the -Cf or -CF
+ options (see below).
+
+ Note also that unlike the other special actions, REJECT
+ is a branch; code immediately following it in the
+ action will not be executed.
+
+ - yymore() tells the scanner that the next time it
+ matches a rule, the corresponding token should be
+ appended onto the current value of yytext rather than
+ replacing it. For example, given the input "mega-
+ kludge" the following will write "mega-mega-kludge" to
+ the output:
+
+ %%
+ mega- ECHO; yymore();
+ kludge ECHO;
+
+ First "mega-" is matched and echoed to the output.
+ Then "kludge" is matched, but the previous "mega-" is
+ still hanging around at the beginning of yytext so the
+ ECHO for the "kludge" rule will actually write "mega-
+ kludge".
+
+ Two notes regarding use of yymore(). First, yymore() depends
+ on the value of yyleng correctly reflecting the size of the
+ current token, so you must not modify yyleng if you are
+ using yymore(). Second, the presence of yymore() in the
+ scanner's action entails a minor performance penalty in the
+ scanner's matching speed.
+
+ - yyless(n) returns all but the first n characters of the
+ current token back to the input stream, where they will
+ be rescanned when the scanner looks for the next match.
+ yytext and yyleng are adjusted appropriately (e.g.,
+ yyleng will now be equal to n ). For example, on the
+ input "foobar" the following will write out "foobar-
+ bar":
+
+ %%
+ foobar ECHO; yyless(3);
+ [a-z]+ ECHO;
+
+ An argument of 0 to yyless will cause the entire
+ current input string to be scanned again. Unless
+ you've changed how the scanner will subsequently pro-
+ cess its input (using BEGIN, for example), this will
+ result in an endless loop.
+
+
+
+
+Version 2.5 Last change: April 1995 13
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ Note that yyless is a macro and can only be used in the flex
+ input file, not from other source files.
+
+ - unput(c) puts the character c back onto the input
+ stream. It will be the next character scanned. The
+ following action will take the current token and cause
+ it to be rescanned enclosed in parentheses.
+
+ {
+ int i;
+ /* Copy yytext because unput() trashes yytext */
+ char *yycopy = strdup( yytext );
+ unput( ')' );
+ for ( i = yyleng - 1; i >= 0; --i )
+ unput( yycopy[i] );
+ unput( '(' );
+ free( yycopy );
+ }
+
+ Note that since each unput() puts the given character
+ back at the beginning of the input stream, pushing back
+ strings must be done back-to-front.
+
+ An important potential problem when using unput() is that if
+ you are using %pointer (the default), a call to unput() des-
+ troys the contents of yytext, starting with its rightmost
+ character and devouring one character to the left with each
+ call. If you need the value of yytext preserved after a
+ call to unput() (as in the above example), you must either
+ first copy it elsewhere, or build your scanner using %array
+ instead (see How The Input Is Matched).
+
+ Finally, note that you cannot put back EOF to attempt to
+ mark the input stream with an end-of-file.
+
+ - input() reads the next character from the input stream.
+ For example, the following is one way to eat up C com-
+ ments:
+
+ %%
+ "/*" {
+ register int c;
+
+ for ( ; ; )
+ {
+ while ( (c = input()) != '*' &&
+ c != EOF )
+ ; /* eat up text of comment */
+
+ if ( c == '*' )
+ {
+ while ( (c = input()) == '*' )
+
+
+
+Version 2.5 Last change: April 1995 14
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ ;
+ if ( c == '/' )
+ break; /* found the end */
+ }
+
+ if ( c == EOF )
+ {
+ error( "EOF in comment" );
+ break;
+ }
+ }
+ }
+
+ (Note that if the scanner is compiled using C++, then
+ input() is instead referred to as yyinput(), in order
+ to avoid a name clash with the C++ stream by the name
+ of input.)
+
+ - YY_FLUSH_BUFFER flushes the scanner's internal buffer
+ so that the next time the scanner attempts to match a
+ token, it will first refill the buffer using YY_INPUT
+ (see The Generated Scanner, below). This action is a
+ special case of the more general yy_flush_buffer()
+ function, described below in the section Multiple Input
+ Buffers.
+
+ - yyterminate() can be used in lieu of a return statement
+ in an action. It terminates the scanner and returns a
+ 0 to the scanner's caller, indicating "all done". By
+ default, yyterminate() is also called when an end-of-
+ file is encountered. It is a macro and may be rede-
+ fined.
+
+THE GENERATED SCANNER
+ The output of flex is the file lex.yy.c, which contains the
+ scanning routine yylex(), a number of tables used by it for
+ matching tokens, and a number of auxiliary routines and mac-
+ ros. By default, yylex() is declared as follows:
+
+ int yylex()
+ {
+ ... various definitions and the actions in here ...
+ }
+
+ (If your environment supports function prototypes, then it
+ will be "int yylex( void )".) This definition may be
+ changed by defining the "YY_DECL" macro. For example, you
+ could use:
+
+ #define YY_DECL float lexscan( a, b ) float a, b;
+
+ to give the scanning routine the name lexscan, returning a
+
+
+
+Version 2.5 Last change: April 1995 15
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ float, and taking two floats as arguments. Note that if you
+ give arguments to the scanning routine using a K&R-
+ style/non-prototyped function declaration, you must ter-
+ minate the definition with a semi-colon (;).
+
+ Whenever yylex() is called, it scans tokens from the global
+ input file yyin (which defaults to stdin). It continues
+ until it either reaches an end-of-file (at which point it
+ returns the value 0) or one of its actions executes a return
+ statement.
+
+ If the scanner reaches an end-of-file, subsequent calls are
+ undefined unless either yyin is pointed at a new input file
+ (in which case scanning continues from that file), or yyres-
+ tart() is called. yyrestart() takes one argument, a FILE *
+ pointer (which can be nil, if you've set up YY_INPUT to scan
+ from a source other than yyin), and initializes yyin for
+ scanning from that file. Essentially there is no difference
+ between just assigning yyin to a new input file or using
+ yyrestart() to do so; the latter is available for compati-
+ bility with previous versions of flex, and because it can be
+ used to switch input files in the middle of scanning. It
+ can also be used to throw away the current input buffer, by
+ calling it with an argument of yyin; but better is to use
+ YY_FLUSH_BUFFER (see above). Note that yyrestart() does not
+ reset the start condition to INITIAL (see Start Conditions,
+ below).
+
+ If yylex() stops scanning due to executing a return state-
+ ment in one of the actions, the scanner may then be called
+ again and it will resume scanning where it left off.
+
+ By default (and for purposes of efficiency), the scanner
+ uses block-reads rather than simple getc() calls to read
+ characters from yyin. The nature of how it gets its input
+ can be controlled by defining the YY_INPUT macro.
+ YY_INPUT's calling sequence is
+ "YY_INPUT(buf,result,max_size)". Its action is to place up
+ to max_size characters in the character array buf and return
+ in the integer variable result either the number of charac-
+ ters read or the constant YY_NULL (0 on Unix systems) to
+ indicate EOF. The default YY_INPUT reads from the global
+ file-pointer "yyin".
+
+ A sample definition of YY_INPUT (in the definitions section
+ of the input file):
+
+ %{
+ #define YY_INPUT(buf,result,max_size) \
+ { \
+ int c = getchar(); \
+ result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
+
+
+
+Version 2.5 Last change: April 1995 16
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ }
+ %}
+
+ This definition will change the input processing to occur
+ one character at a time.
+
+ When the scanner receives an end-of-file indication from
+ YY_INPUT, it then checks the yywrap() function. If yywrap()
+ returns false (zero), then it is assumed that the function
+ has gone ahead and set up yyin to point to another input
+ file, and scanning continues. If it returns true (non-
+ zero), then the scanner terminates, returning 0 to its
+ caller. Note that in either case, the start condition
+ remains unchanged; it does not revert to INITIAL.
+
+ If you do not supply your own version of yywrap(), then you
+ must either use %option noyywrap (in which case the scanner
+ behaves as though yywrap() returned 1), or you must link
+ with -lfl to obtain the default version of the routine,
+ which always returns 1.
+
+ Three routines are available for scanning from in-memory
+ buffers rather than files: yy_scan_string(),
+ yy_scan_bytes(), and yy_scan_buffer(). See the discussion of
+ them below in the section Multiple Input Buffers.
+
+ The scanner writes its ECHO output to the yyout global
+ (default, stdout), which may be redefined by the user simply
+ by assigning it to some other FILE pointer.
+
+START CONDITIONS
+ flex provides a mechanism for conditionally activating
+ rules. Any rule whose pattern is prefixed with "<sc>" will
+ only be active when the scanner is in the start condition
+ named "sc". For example,
+
+ <STRING>[^"]* { /* eat up the string body ... */
+ ...
+ }
+
+ will be active only when the scanner is in the "STRING"
+ start condition, and
+
+ <INITIAL,STRING,QUOTE>\. { /* handle an escape ... */
+ ...
+ }
+
+ will be active only when the current start condition is
+ either "INITIAL", "STRING", or "QUOTE".
+
+ Start conditions are declared in the definitions (first)
+ section of the input using unindented lines beginning with
+
+
+
+Version 2.5 Last change: April 1995 17
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ either %s or %x followed by a list of names. The former
+ declares inclusive start conditions, the latter exclusive
+ start conditions. A start condition is activated using the
+ BEGIN action. Until the next BEGIN action is executed,
+ rules with the given start condition will be active and
+ rules with other start conditions will be inactive. If the
+ start condition is inclusive, then rules with no start con-
+ ditions at all will also be active. If it is exclusive,
+ then only rules qualified with the start condition will be
+ active. A set of rules contingent on the same exclusive
+ start condition describe a scanner which is independent of
+ any of the other rules in the flex input. Because of this,
+ exclusive start conditions make it easy to specify "mini-
+ scanners" which scan portions of the input that are syntac-
+ tically different from the rest (e.g., comments).
+
+ If the distinction between inclusive and exclusive start
+ conditions is still a little vague, here's a simple example
+ illustrating the connection between the two. The set of
+ rules:
+
+ %s example
+ %%
+
+ <example>foo do_something();
+
+ bar something_else();
+
+ is equivalent to
+
+ %x example
+ %%
+
+ <example>foo do_something();
+
+ <INITIAL,example>bar something_else();
+
+ Without the <INITIAL,example> qualifier, the bar pattern in
+ the second example wouldn't be active (i.e., couldn't match)
+ when in start condition example. If we just used <example>
+ to qualify bar, though, then it would only be active in
+ example and not in INITIAL, while in the first example it's
+ active in both, because in the first example the example
+ startion condition is an inclusive (%s) start condition.
+
+ Also note that the special start-condition specifier <*>
+ matches every start condition. Thus, the above example
+ could also have been written;
+
+ %x example
+ %%
+
+
+
+
+Version 2.5 Last change: April 1995 18
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ <example>foo do_something();
+
+ <*>bar something_else();
+
+
+ The default rule (to ECHO any unmatched character) remains
+ active in start conditions. It is equivalent to:
+
+ <*>.|\n ECHO;
+
+
+ BEGIN(0) returns to the original state where only the rules
+ with no start conditions are active. This state can also be
+ referred to as the start-condition "INITIAL", so
+ BEGIN(INITIAL) is equivalent to BEGIN(0). (The parentheses
+ around the start condition name are not required but are
+ considered good style.)
+
+ BEGIN actions can also be given as indented code at the
+ beginning of the rules section. For example, the following
+ will cause the scanner to enter the "SPECIAL" start condi-
+ tion whenever yylex() is called and the global variable
+ enter_special is true:
+
+ int enter_special;
+
+ %x SPECIAL
+ %%
+ if ( enter_special )
+ BEGIN(SPECIAL);
+
+ <SPECIAL>blahblahblah
+ ...more rules follow...
+
+
+ To illustrate the uses of start conditions, here is a
+ scanner which provides two different interpretations of a
+ string like "123.456". By default it will treat it as three
+ tokens, the integer "123", a dot ('.'), and the integer
+ "456". But if the string is preceded earlier in the line by
+ the string "expect-floats" it will treat it as a single
+ token, the floating-point number 123.456:
+
+ %{
+ #include <math.h>
+ %}
+ %s expect
+
+ %%
+ expect-floats BEGIN(expect);
+
+ <expect>[0-9]+"."[0-9]+ {
+
+
+
+Version 2.5 Last change: April 1995 19
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ printf( "found a float, = %f\n",
+ atof( yytext ) );
+ }
+ <expect>\n {
+ /* that's the end of the line, so
+ * we need another "expect-number"
+ * before we'll recognize any more
+ * numbers
+ */
+ BEGIN(INITIAL);
+ }
+
+ [0-9]+ {
+ printf( "found an integer, = %d\n",
+ atoi( yytext ) );
+ }
+
+ "." printf( "found a dot\n" );
+
+ Here is a scanner which recognizes (and discards) C comments
+ while maintaining a count of the current input line.
+
+ %x comment
+ %%
+ int line_num = 1;
+
+ "/*" BEGIN(comment);
+
+ <comment>[^*\n]* /* eat anything that's not a '*' */
+ <comment>"*"+[^*/\n]* /* eat up '*'s not followed by '/'s */
+ <comment>\n ++line_num;
+ <comment>"*"+"/" BEGIN(INITIAL);
+
+ This scanner goes to a bit of trouble to match as much text
+ as possible with each rule. In general, when attempting to
+ write a high-speed scanner try to match as much possible in
+ each rule, as it's a big win.
+
+ Note that start-conditions names are really integer values
+ and can be stored as such. Thus, the above could be
+ extended in the following fashion:
+
+ %x comment foo
+ %%
+ int line_num = 1;
+ int comment_caller;
+
+ "/*" {
+ comment_caller = INITIAL;
+ BEGIN(comment);
+ }
+
+
+
+
+Version 2.5 Last change: April 1995 20
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ ...
+
+ <foo>"/*" {
+ comment_caller = foo;
+ BEGIN(comment);
+ }
+
+ <comment>[^*\n]* /* eat anything that's not a '*' */
+ <comment>"*"+[^*/\n]* /* eat up '*'s not followed by '/'s */
+ <comment>\n ++line_num;
+ <comment>"*"+"/" BEGIN(comment_caller);
+
+ Furthermore, you can access the current start condition
+ using the integer-valued YY_START macro. For example, the
+ above assignments to comment_caller could instead be written
+
+ comment_caller = YY_START;
+
+ Flex provides YYSTATE as an alias for YY_START (since that
+ is what's used by AT&T lex).
+
+ Note that start conditions do not have their own name-space;
+ %s's and %x's declare names in the same fashion as
+ #define's.
+
+ Finally, here's an example of how to match C-style quoted
+ strings using exclusive start conditions, including expanded
+ escape sequences (but not including checking for a string
+ that's too long):
+
+ %x str
+
+ %%
+ char string_buf[MAX_STR_CONST];
+ char *string_buf_ptr;
+
+
+ \" string_buf_ptr = string_buf; BEGIN(str);
+
+ <str>\" { /* saw closing quote - all done */
+ BEGIN(INITIAL);
+ *string_buf_ptr = '\0';
+ /* return string constant token type and
+ * value to parser
+ */
+ }
+
+ <str>\n {
+ /* error - unterminated string constant */
+ /* generate error message */
+ }
+
+
+
+
+Version 2.5 Last change: April 1995 21
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ <str>\\[0-7]{1,3} {
+ /* octal escape sequence */
+ int result;
+
+ (void) sscanf( yytext + 1, "%o", &result );
+
+ if ( result > 0xff )
+ /* error, constant is out-of-bounds */
+
+ *string_buf_ptr++ = result;
+ }
+
+ <str>\\[0-9]+ {
+ /* generate error - bad escape sequence; something
+ * like '\48' or '\0777777'
+ */
+ }
+
+ <str>\\n *string_buf_ptr++ = '\n';
+ <str>\\t *string_buf_ptr++ = '\t';
+ <str>\\r *string_buf_ptr++ = '\r';
+ <str>\\b *string_buf_ptr++ = '\b';
+ <str>\\f *string_buf_ptr++ = '\f';
+
+ <str>\\(.|\n) *string_buf_ptr++ = yytext[1];
+
+ <str>[^\\\n\"]+ {
+ char *yptr = yytext;
+
+ while ( *yptr )
+ *string_buf_ptr++ = *yptr++;
+ }
+
+
+ Often, such as in some of the examples above, you wind up
+ writing a whole bunch of rules all preceded by the same
+ start condition(s). Flex makes this a little easier and
+ cleaner by introducing a notion of start condition scope. A
+ start condition scope is begun with:
+
+ <SCs>{
+
+ where SCs is a list of one or more start conditions. Inside
+ the start condition scope, every rule automatically has the
+ prefix <SCs> applied to it, until a '}' which matches the
+ initial '{'. So, for example,
+
+ <ESC>{
+ "\\n" return '\n';
+ "\\r" return '\r';
+ "\\f" return '\f';
+ "\\0" return '\0';
+
+
+
+Version 2.5 Last change: April 1995 22
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ }
+
+ is equivalent to:
+
+ <ESC>"\\n" return '\n';
+ <ESC>"\\r" return '\r';
+ <ESC>"\\f" return '\f';
+ <ESC>"\\0" return '\0';
+
+ Start condition scopes may be nested.
+
+ Three routines are available for manipulating stacks of
+ start conditions:
+
+ void yy_push_state(int new_state)
+ pushes the current start condition onto the top of the
+ start condition stack and switches to new_state as
+ though you had used BEGIN new_state (recall that start
+ condition names are also integers).
+
+ void yy_pop_state()
+ pops the top of the stack and switches to it via BEGIN.
+
+ int yy_top_state()
+ returns the top of the stack without altering the
+ stack's contents.
+
+ The start condition stack grows dynamically and so has no
+ built-in size limitation. If memory is exhausted, program
+ execution aborts.
+
+ To use start condition stacks, your scanner must include a
+ %option stack directive (see Options below).
+
+MULTIPLE INPUT BUFFERS
+ Some scanners (such as those which support "include" files)
+ require reading from several input streams. As flex
+ scanners do a large amount of buffering, one cannot control
+ where the next input will be read from by simply writing a
+ YY_INPUT which is sensitive to the scanning context.
+ YY_INPUT is only called when the scanner reaches the end of
+ its buffer, which may be a long time after scanning a state-
+ ment such as an "include" which requires switching the input
+ source.
+
+ To negotiate these sorts of problems, flex provides a
+ mechanism for creating and switching between multiple input
+ buffers. An input buffer is created by using:
+
+ YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
+
+ which takes a FILE pointer and a size and creates a buffer
+
+
+
+Version 2.5 Last change: April 1995 23
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ associated with the given file and large enough to hold size
+ characters (when in doubt, use YY_BUF_SIZE for the size).
+ It returns a YY_BUFFER_STATE handle, which may then be
+ passed to other routines (see below). The YY_BUFFER_STATE
+ type is a pointer to an opaque struct yy_buffer_state struc-
+ ture, so you may safely initialize YY_BUFFER_STATE variables
+ to ((YY_BUFFER_STATE) 0) if you wish, and also refer to the
+ opaque structure in order to correctly declare input buffers
+ in source files other than that of your scanner. Note that
+ the FILE pointer in the call to yy_create_buffer is only
+ used as the value of yyin seen by YY_INPUT; if you redefine
+ YY_INPUT so it no longer uses yyin, then you can safely pass
+ a nil FILE pointer to yy_create_buffer. You select a partic-
+ ular buffer to scan from using:
+
+ void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
+
+ switches the scanner's input buffer so subsequent tokens
+ will come from new_buffer. Note that yy_switch_to_buffer()
+ may be used by yywrap() to set things up for continued scan-
+ ning, instead of opening a new file and pointing yyin at it.
+ Note also that switching input sources via either
+ yy_switch_to_buffer() or yywrap() does not change the start
+ condition.
+
+ void yy_delete_buffer( YY_BUFFER_STATE buffer )
+
+ is used to reclaim the storage associated with a buffer. (
+ buffer can be nil, in which case the routine does nothing.)
+ You can also clear the current contents of a buffer using:
+
+ void yy_flush_buffer( YY_BUFFER_STATE buffer )
+
+ This function discards the buffer's contents, so the next
+ time the scanner attempts to match a token from the buffer,
+ it will first fill the buffer anew using YY_INPUT.
+
+ yy_new_buffer() is an alias for yy_create_buffer(), provided
+ for compatibility with the C++ use of new and delete for
+ creating and destroying dynamic objects.
+
+ Finally, the YY_CURRENT_BUFFER macro returns a
+ YY_BUFFER_STATE handle to the current buffer.
+
+ Here is an example of using these features for writing a
+ scanner which expands include files (the <<EOF>> feature is
+ discussed below):
+
+ /* the "incl" state is used for picking up the name
+ * of an include file
+ */
+ %x incl
+
+
+
+Version 2.5 Last change: April 1995 24
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ %{
+ #define MAX_INCLUDE_DEPTH 10
+ YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
+ int include_stack_ptr = 0;
+ %}
+
+ %%
+ include BEGIN(incl);
+
+ [a-z]+ ECHO;
+ [^a-z\n]*\n? ECHO;
+
+ <incl>[ \t]* /* eat the whitespace */
+ <incl>[^ \t\n]+ { /* got the include file name */
+ if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
+ {
+ fprintf( stderr, "Includes nested too deeply" );
+ exit( 1 );
+ }
+
+ include_stack[include_stack_ptr++] =
+ YY_CURRENT_BUFFER;
+
+ yyin = fopen( yytext, "r" );
+
+ if ( ! yyin )
+ error( ... );
+
+ yy_switch_to_buffer(
+ yy_create_buffer( yyin, YY_BUF_SIZE ) );
+
+ BEGIN(INITIAL);
+ }
+
+ <<EOF>> {
+ if ( --include_stack_ptr < 0 )
+ {
+ yyterminate();
+ }
+
+ else
+ {
+ yy_delete_buffer( YY_CURRENT_BUFFER );
+ yy_switch_to_buffer(
+ include_stack[include_stack_ptr] );
+ }
+ }
+
+ Three routines are available for setting up input buffers
+ for scanning in-memory strings instead of files. All of
+ them create a new input buffer for scanning the string, and
+ return a corresponding YY_BUFFER_STATE handle (which you
+
+
+
+Version 2.5 Last change: April 1995 25
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ should delete with yy_delete_buffer() when done with it).
+ They also switch to the new buffer using
+ yy_switch_to_buffer(), so the next call to yylex() will
+ start scanning the string.
+
+ yy_scan_string(const char *str)
+ scans a NUL-terminated string.
+
+ yy_scan_bytes(const char *bytes, int len)
+ scans len bytes (including possibly NUL's) starting at
+ location bytes.
+
+ Note that both of these functions create and scan a copy of
+ the string or bytes. (This may be desirable, since yylex()
+ modifies the contents of the buffer it is scanning.) You
+ can avoid the copy by using:
+
+ yy_scan_buffer(char *base, yy_size_t size)
+ which scans in place the buffer starting at base, con-
+ sisting of size bytes, the last two bytes of which must
+ be YY_END_OF_BUFFER_CHAR (ASCII NUL). These last two
+ bytes are not scanned; thus, scanning consists of
+ base[0] through base[size-2], inclusive.
+
+ If you fail to set up base in this manner (i.e., forget
+ the final two YY_END_OF_BUFFER_CHAR bytes), then
+ yy_scan_buffer() returns a nil pointer instead of
+ creating a new input buffer.
+
+ The type yy_size_t is an integral type to which you can
+ cast an integer expression reflecting the size of the
+ buffer.
+
+END-OF-FILE RULES
+ The special rule "<<EOF>>" indicates actions which are to be
+ taken when an end-of-file is encountered and yywrap()
+ returns non-zero (i.e., indicates no further files to pro-
+ cess). The action must finish by doing one of four things:
+
+ - assigning yyin to a new input file (in previous ver-
+ sions of flex, after doing the assignment you had to
+ call the special action YY_NEW_FILE; this is no longer
+ necessary);
+
+ - executing a return statement;
+
+ - executing the special yyterminate() action;
+
+ - or, switching to a new buffer using
+ yy_switch_to_buffer() as shown in the example above.
+
+
+
+
+
+Version 2.5 Last change: April 1995 26
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ <<EOF>> rules may not be used with other patterns; they may
+ only be qualified with a list of start conditions. If an
+ unqualified <<EOF>> rule is given, it applies to all start
+ conditions which do not already have <<EOF>> actions. To
+ specify an <<EOF>> rule for only the initial start condi-
+ tion, use
+
+ <INITIAL><<EOF>>
+
+
+ These rules are useful for catching things like unclosed
+ comments. An example:
+
+ %x quote
+ %%
+
+ ...other rules for dealing with quotes...
+
+ <quote><<EOF>> {
+ error( "unterminated quote" );
+ yyterminate();
+ }
+ <<EOF>> {
+ if ( *++filelist )
+ yyin = fopen( *filelist, "r" );
+ else
+ yyterminate();
+ }
+
+
+MISCELLANEOUS MACROS
+ The macro YY_USER_ACTION can be defined to provide an action
+ which is always executed prior to the matched rule's action.
+ For example, it could be #define'd to call a routine to con-
+ vert yytext to lower-case. When YY_USER_ACTION is invoked,
+ the variable yy_act gives the number of the matched rule
+ (rules are numbered starting with 1). Suppose you want to
+ profile how often each of your rules is matched. The fol-
+ lowing would do the trick:
+
+ #define YY_USER_ACTION ++ctr[yy_act]
+
+ where ctr is an array to hold the counts for the different
+ rules. Note that the macro YY_NUM_RULES gives the total
+ number of rules (including the default rule, even if you use
+ -s), so a correct declaration for ctr is:
+
+ int ctr[YY_NUM_RULES];
+
+
+ The macro YY_USER_INIT may be defined to provide an action
+ which is always executed before the first scan (and before
+
+
+
+Version 2.5 Last change: April 1995 27
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ the scanner's internal initializations are done). For exam-
+ ple, it could be used to call a routine to read in a data
+ table or open a logging file.
+
+ The macro yy_set_interactive(is_interactive) can be used to
+ control whether the current buffer is considered interac-
+ tive. An interactive buffer is processed more slowly, but
+ must be used when the scanner's input source is indeed
+ interactive to avoid problems due to waiting to fill buffers
+ (see the discussion of the -I flag below). A non-zero value
+ in the macro invocation marks the buffer as interactive, a
+ zero value as non-interactive. Note that use of this macro
+ overrides %option always-interactive or %option never-
+ interactive (see Options below). yy_set_interactive() must
+ be invoked prior to beginning to scan the buffer that is (or
+ is not) to be considered interactive.
+
+ The macro yy_set_bol(at_bol) can be used to control whether
+ the current buffer's scanning context for the next token
+ match is done as though at the beginning of a line. A non-
+ zero macro argument makes rules anchored with
+
+ The macro YY_AT_BOL() returns true if the next token scanned
+ from the current buffer will have '^' rules active, false
+ otherwise.
+
+ In the generated scanner, the actions are all gathered in
+ one large switch statement and separated using YY_BREAK,
+ which may be redefined. By default, it is simply a "break",
+ to separate each rule's action from the following rule's.
+ Redefining YY_BREAK allows, for example, C++ users to
+ #define YY_BREAK to do nothing (while being very careful
+ that every rule ends with a "break" or a "return"!) to avoid
+ suffering from unreachable statement warnings where because
+ a rule's action ends with "return", the YY_BREAK is inacces-
+ sible.
+
+VALUES AVAILABLE TO THE USER
+ This section summarizes the various values available to the
+ user in the rule actions.
+
+ - char *yytext holds the text of the current token. It
+ may be modified but not lengthened (you cannot append
+ characters to the end).
+
+ If the special directive %array appears in the first
+ section of the scanner description, then yytext is
+ instead declared char yytext[YYLMAX], where YYLMAX is a
+ macro definition that you can redefine in the first
+ section if you don't like the default value (generally
+ 8KB). Using %array results in somewhat slower
+ scanners, but the value of yytext becomes immune to
+
+
+
+Version 2.5 Last change: April 1995 28
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ calls to input() and unput(), which potentially destroy
+ its value when yytext is a character pointer. The
+ opposite of %array is %pointer, which is the default.
+
+ You cannot use %array when generating C++ scanner
+ classes (the -+ flag).
+
+ - int yyleng holds the length of the current token.
+
+ - FILE *yyin is the file which by default flex reads
+ from. It may be redefined but doing so only makes
+ sense before scanning begins or after an EOF has been
+ encountered. Changing it in the midst of scanning will
+ have unexpected results since flex buffers its input;
+ use yyrestart() instead. Once scanning terminates
+ because an end-of-file has been seen, you can assign
+ yyin at the new input file and then call the scanner
+ again to continue scanning.
+
+ - void yyrestart( FILE *new_file ) may be called to point
+ yyin at the new input file. The switch-over to the new
+ file is immediate (any previously buffered-up input is
+ lost). Note that calling yyrestart() with yyin as an
+ argument thus throws away the current input buffer and
+ continues scanning the same input file.
+
+ - FILE *yyout is the file to which ECHO actions are done.
+ It can be reassigned by the user.
+
+ - YY_CURRENT_BUFFER returns a YY_BUFFER_STATE handle to
+ the current buffer.
+
+ - YY_START returns an integer value corresponding to the
+ current start condition. You can subsequently use this
+ value with BEGIN to return to that start condition.
+
+INTERFACING WITH YACC
+ One of the main uses of flex is as a companion to the yacc
+ parser-generator. yacc parsers expect to call a routine
+ named yylex() to find the next input token. The routine is
+ supposed to return the type of the next token as well as
+ putting any associated value in the global yylval. To use
+ flex with yacc, one specifies the -d option to yacc to
+ instruct it to generate the file y.tab.h containing defini-
+ tions of all the %tokens appearing in the yacc input. This
+ file is then included in the flex scanner. For example, if
+ one of the tokens is "TOK_NUMBER", part of the scanner might
+ look like:
+
+ %{
+ #include "y.tab.h"
+ %}
+
+
+
+Version 2.5 Last change: April 1995 29
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ %%
+
+ [0-9]+ yylval = atoi( yytext ); return TOK_NUMBER;
+
+
+OPTIONS
+ flex has the following options:
+
+ -b Generate backing-up information to lex.backup. This is
+ a list of scanner states which require backing up and
+ the input characters on which they do so. By adding
+ rules one can remove backing-up states. If all
+ backing-up states are eliminated and -Cf or -CF is
+ used, the generated scanner will run faster (see the -p
+ flag). Only users who wish to squeeze every last cycle
+ out of their scanners need worry about this option.
+ (See the section on Performance Considerations below.)
+
+ -c is a do-nothing, deprecated option included for POSIX
+ compliance.
+
+ -d makes the generated scanner run in debug mode. When-
+ ever a pattern is recognized and the global
+ yy_flex_debug is non-zero (which is the default), the
+ scanner will write to stderr a line of the form:
+
+ --accepting rule at line 53 ("the matched text")
+
+ The line number refers to the location of the rule in
+ the file defining the scanner (i.e., the file that was
+ fed to flex). Messages are also generated when the
+ scanner backs up, accepts the default rule, reaches the
+ end of its input buffer (or encounters a NUL; at this
+ point, the two look the same as far as the scanner's
+ concerned), or reaches an end-of-file.
+
+ -f specifies fast scanner. No table compression is done
+ and stdio is bypassed. The result is large but fast.
+ This option is equivalent to -Cfr (see below).
+
+ -h generates a "help" summary of flex's options to stdout
+ and then exits. -? and --help are synonyms for -h.
+
+ -i instructs flex to generate a case-insensitive scanner.
+ The case of letters given in the flex input patterns
+ will be ignored, and tokens in the input will be
+ matched regardless of case. The matched text given in
+ yytext will have the preserved case (i.e., it will not
+ be folded).
+
+ -l turns on maximum compatibility with the original AT&T
+ lex implementation. Note that this does not mean full
+
+
+
+Version 2.5 Last change: April 1995 30
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ compatibility. Use of this option costs a considerable
+ amount of performance, and it cannot be used with the
+ -+, -f, -F, -Cf, or -CF options. For details on the
+ compatibilities it provides, see the section "Incompa-
+ tibilities With Lex And POSIX" below. This option also
+ results in the name YY_FLEX_LEX_COMPAT being #define'd
+ in the generated scanner.
+
+ -n is another do-nothing, deprecated option included only
+ for POSIX compliance.
+
+ -p generates a performance report to stderr. The report
+ consists of comments regarding features of the flex
+ input file which will cause a serious loss of perfor-
+ mance in the resulting scanner. If you give the flag
+ twice, you will also get comments regarding features
+ that lead to minor performance losses.
+
+ Note that the use of REJECT, %option yylineno, and
+ variable trailing context (see the Deficiencies / Bugs
+ section below) entails a substantial performance
+ penalty; use of yymore(), the ^ operator, and the -I
+ flag entail minor performance penalties.
+
+ -s causes the default rule (that unmatched scanner input
+ is echoed to stdout) to be suppressed. If the scanner
+ encounters input that does not match any of its rules,
+ it aborts with an error. This option is useful for
+ finding holes in a scanner's rule set.
+
+ -t instructs flex to write the scanner it generates to
+ standard output instead of lex.yy.c.
+
+ -v specifies that flex should write to stderr a summary of
+ statistics regarding the scanner it generates. Most of
+ the statistics are meaningless to the casual flex user,
+ but the first line identifies the version of flex (same
+ as reported by -V), and the next line the flags used
+ when generating the scanner, including those that are
+ on by default.
+
+ -w suppresses warning messages.
+
+ -B instructs flex to generate a batch scanner, the oppo-
+ site of interactive scanners generated by -I (see
+ below). In general, you use -B when you are certain
+ that your scanner will never be used interactively, and
+ you want to squeeze a little more performance out of
+ it. If your goal is instead to squeeze out a lot more
+ performance, you should be using the -Cf or -CF
+ options (discussed below), which turn on -B automati-
+ cally anyway.
+
+
+
+Version 2.5 Last change: April 1995 31
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ -F specifies that the fast scanner table representation
+ should be used (and stdio bypassed). This representa-
+ tion is about as fast as the full table representation
+ (-f), and for some sets of patterns will be consider-
+ ably smaller (and for others, larger). In general, if
+ the pattern set contains both "keywords" and a catch-
+ all, "identifier" rule, such as in the set:
+
+ "case" return TOK_CASE;
+ "switch" return TOK_SWITCH;
+ ...
+ "default" return TOK_DEFAULT;
+ [a-z]+ return TOK_ID;
+
+ then you're better off using the full table representa-
+ tion. If only the "identifier" rule is present and you
+ then use a hash table or some such to detect the key-
+ words, you're better off using -F.
+
+ This option is equivalent to -CFr (see below). It can-
+ not be used with -+.
+
+ -I instructs flex to generate an interactive scanner. An
+ interactive scanner is one that only looks ahead to
+ decide what token has been matched if it absolutely
+ must. It turns out that always looking one extra char-
+ acter ahead, even if the scanner has already seen
+ enough text to disambiguate the current token, is a bit
+ faster than only looking ahead when necessary. But
+ scanners that always look ahead give dreadful interac-
+ tive performance; for example, when a user types a new-
+ line, it is not recognized as a newline token until
+ they enter another token, which often means typing in
+ another whole line.
+
+ Flex scanners default to interactive unless you use the
+ -Cf or -CF table-compression options (see below).
+ That's because if you're looking for high-performance
+ you should be using one of these options, so if you
+ didn't, flex assumes you'd rather trade off a bit of
+ run-time performance for intuitive interactive
+ behavior. Note also that you cannot use -I in conjunc-
+ tion with -Cf or -CF. Thus, this option is not really
+ needed; it is on by default for all those cases in
+ which it is allowed.
+
+ You can force a scanner to not be interactive by using
+ -B (see above).
+
+ -L instructs flex not to generate #line directives.
+ Without this option, flex peppers the generated scanner
+ with #line directives so error messages in the actions
+
+
+
+Version 2.5 Last change: April 1995 32
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ will be correctly located with respect to either the
+ original flex input file (if the errors are due to code
+ in the input file), or lex.yy.c (if the errors are
+ flex's fault -- you should report these sorts of errors
+ to the email address given below).
+
+ -T makes flex run in trace mode. It will generate a lot
+ of messages to stderr concerning the form of the input
+ and the resultant non-deterministic and deterministic
+ finite automata. This option is mostly for use in
+ maintaining flex.
+
+ -V prints the version number to stdout and exits. --ver-
+ sion is a synonym for -V.
+
+ -7 instructs flex to generate a 7-bit scanner, i.e., one
+ which can only recognized 7-bit characters in its
+ input. The advantage of using -7 is that the scanner's
+ tables can be up to half the size of those generated
+ using the -8 option (see below). The disadvantage is
+ that such scanners often hang or crash if their input
+ contains an 8-bit character.
+
+ Note, however, that unless you generate your scanner
+ using the -Cf or -CF table compression options, use of
+ -7 will save only a small amount of table space, and
+ make your scanner considerably less portable. Flex's
+ default behavior is to generate an 8-bit scanner unless
+ you use the -Cf or -CF, in which case flex defaults to
+ generating 7-bit scanners unless your site was always
+ configured to generate 8-bit scanners (as will often be
+ the case with non-USA sites). You can tell whether
+ flex generated a 7-bit or an 8-bit scanner by inspect-
+ ing the flag summary in the -v output as described
+ above.
+
+ Note that if you use -Cfe or -CFe (those table compres-
+ sion options, but also using equivalence classes as
+ discussed see below), flex still defaults to generating
+ an 8-bit scanner, since usually with these compression
+ options full 8-bit tables are not much more expensive
+ than 7-bit tables.
+
+ -8 instructs flex to generate an 8-bit scanner, i.e., one
+ which can recognize 8-bit characters. This flag is
+ only needed for scanners generated using -Cf or -CF, as
+ otherwise flex defaults to generating an 8-bit scanner
+ anyway.
+
+ See the discussion of -7 above for flex's default
+ behavior and the tradeoffs between 7-bit and 8-bit
+ scanners.
+
+
+
+Version 2.5 Last change: April 1995 33
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ -+ specifies that you want flex to generate a C++ scanner
+ class. See the section on Generating C++ Scanners
+ below for details.
+
+ -C[aefFmr]
+ controls the degree of table compression and, more gen-
+ erally, trade-offs between small scanners and fast
+ scanners.
+
+ -Ca ("align") instructs flex to trade off larger tables
+ in the generated scanner for faster performance because
+ the elements of the tables are better aligned for
+ memory access and computation. On some RISC architec-
+ tures, fetching and manipulating longwords is more
+ efficient than with smaller-sized units such as short-
+ words. This option can double the size of the tables
+ used by your scanner.
+
+ -Ce directs flex to construct equivalence classes,
+ i.e., sets of characters which have identical lexical
+ properties (for example, if the only appearance of
+ digits in the flex input is in the character class
+ "[0-9]" then the digits '0', '1', ..., '9' will all be
+ put in the same equivalence class). Equivalence
+ classes usually give dramatic reductions in the final
+ table/object file sizes (typically a factor of 2-5) and
+ are pretty cheap performance-wise (one array look-up
+ per character scanned).
+
+ -Cf specifies that the full scanner tables should be
+ generated - flex should not compress the tables by tak-
+ ing advantages of similar transition functions for dif-
+ ferent states.
+
+ -CF specifies that the alternate fast scanner represen-
+ tation (described above under the -F flag) should be
+ used. This option cannot be used with -+.
+
+ -Cm directs flex to construct meta-equivalence classes,
+ which are sets of equivalence classes (or characters,
+ if equivalence classes are not being used) that are
+ commonly used together. Meta-equivalence classes are
+ often a big win when using compressed tables, but they
+ have a moderate performance impact (one or two "if"
+ tests and one array look-up per character scanned).
+
+ -Cr causes the generated scanner to bypass use of the
+ standard I/O library (stdio) for input. Instead of
+ calling fread() or getc(), the scanner will use the
+ read() system call, resulting in a performance gain
+ which varies from system to system, but in general is
+ probably negligible unless you are also using -Cf or
+
+
+
+Version 2.5 Last change: April 1995 34
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ -CF. Using -Cr can cause strange behavior if, for exam-
+ ple, you read from yyin using stdio prior to calling
+ the scanner (because the scanner will miss whatever
+ text your previous reads left in the stdio input
+ buffer).
+
+ -Cr has no effect if you define YY_INPUT (see The Gen-
+ erated Scanner above).
+
+ A lone -C specifies that the scanner tables should be
+ compressed but neither equivalence classes nor meta-
+ equivalence classes should be used.
+
+ The options -Cf or -CF and -Cm do not make sense
+ together - there is no opportunity for meta-equivalence
+ classes if the table is not being compressed. Other-
+ wise the options may be freely mixed, and are cumula-
+ tive.
+
+ The default setting is -Cem, which specifies that flex
+ should generate equivalence classes and meta-
+ equivalence classes. This setting provides the highest
+ degree of table compression. You can trade off
+ faster-executing scanners at the cost of larger tables
+ with the following generally being true:
+
+ slowest & smallest
+ -Cem
+ -Cm
+ -Ce
+ -C
+ -C{f,F}e
+ -C{f,F}
+ -C{f,F}a
+ fastest & largest
+
+ Note that scanners with the smallest tables are usually
+ generated and compiled the quickest, so during develop-
+ ment you will usually want to use the default, maximal
+ compression.
+
+ -Cfe is often a good compromise between speed and size
+ for production scanners.
+
+ -ooutput
+ directs flex to write the scanner to the file output
+ instead of lex.yy.c. If you combine -o with the -t
+ option, then the scanner is written to stdout but its
+ #line directives (see the -L option above) refer to the
+ file output.
+
+ -Pprefix
+
+
+
+Version 2.5 Last change: April 1995 35
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ changes the default yy prefix used by flex for all
+ globally-visible variable and function names to instead
+ be prefix. For example, -Pfoo changes the name of
+ yytext to footext. It also changes the name of the
+ default output file from lex.yy.c to lex.foo.c. Here
+ are all of the names affected:
+
+ yy_create_buffer
+ yy_delete_buffer
+ yy_flex_debug
+ yy_init_buffer
+ yy_flush_buffer
+ yy_load_buffer_state
+ yy_switch_to_buffer
+ yyin
+ yyleng
+ yylex
+ yylineno
+ yyout
+ yyrestart
+ yytext
+ yywrap
+
+ (If you are using a C++ scanner, then only yywrap and
+ yyFlexLexer are affected.) Within your scanner itself,
+ you can still refer to the global variables and func-
+ tions using either version of their name; but exter-
+ nally, they have the modified name.
+
+ This option lets you easily link together multiple flex
+ programs into the same executable. Note, though, that
+ using this option also renames yywrap(), so you now
+ must either provide your own (appropriately-named) ver-
+ sion of the routine for your scanner, or use %option
+ noyywrap, as linking with -lfl no longer provides one
+ for you by default.
+
+ -Sskeleton_file
+ overrides the default skeleton file from which flex
+ constructs its scanners. You'll never need this option
+ unless you are doing flex maintenance or development.
+
+ flex also provides a mechanism for controlling options
+ within the scanner specification itself, rather than from
+ the flex command-line. This is done by including %option
+ directives in the first section of the scanner specifica-
+ tion. You can specify multiple options with a single
+ %option directive, and multiple directives in the first sec-
+ tion of your flex input file.
+
+ Most options are given simply as names, optionally preceded
+ by the word "no" (with no intervening whitespace) to negate
+
+
+
+Version 2.5 Last change: April 1995 36
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ their meaning. A number are equivalent to flex flags or
+ their negation:
+
+ 7bit -7 option
+ 8bit -8 option
+ align -Ca option
+ backup -b option
+ batch -B option
+ c++ -+ option
+
+ caseful or
+ case-sensitive opposite of -i (default)
+
+ case-insensitive or
+ caseless -i option
+
+ debug -d option
+ default opposite of -s option
+ ecs -Ce option
+ fast -F option
+ full -f option
+ interactive -I option
+ lex-compat -l option
+ meta-ecs -Cm option
+ perf-report -p option
+ read -Cr option
+ stdout -t option
+ verbose -v option
+ warn opposite of -w option
+ (use "%option nowarn" for -w)
+
+ array equivalent to "%array"
+ pointer equivalent to "%pointer" (default)
+
+ Some %option's provide features otherwise not available:
+
+ always-interactive
+ instructs flex to generate a scanner which always con-
+ siders its input "interactive". Normally, on each new
+ input file the scanner calls isatty() in an attempt to
+ determine whether the scanner's input source is
+ interactive and thus should be read a character at a
+ time. When this option is used, however, then no such
+ call is made.
+
+ main directs flex to provide a default main() program for
+ the scanner, which simply calls yylex(). This option
+ implies noyywrap (see below).
+
+ never-interactive
+ instructs flex to generate a scanner which never con-
+ siders its input "interactive" (again, no call made to
+
+
+
+Version 2.5 Last change: April 1995 37
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ isatty()). This is the opposite of always-interactive.
+
+ stack
+ enables the use of start condition stacks (see Start
+ Conditions above).
+
+ stdinit
+ if set (i.e., %option stdinit) initializes yyin and
+ yyout to stdin and stdout, instead of the default of
+ nil. Some existing lex programs depend on this
+ behavior, even though it is not compliant with ANSI C,
+ which does not require stdin and stdout to be compile-
+ time constant.
+
+ yylineno
+ directs flex to generate a scanner that maintains the
+ number of the current line read from its input in the
+ global variable yylineno. This option is implied by
+ %option lex-compat.
+
+ yywrap
+ if unset (i.e., %option noyywrap), makes the scanner
+ not call yywrap() upon an end-of-file, but simply
+ assume that there are no more files to scan (until the
+ user points yyin at a new file and calls yylex()
+ again).
+
+ flex scans your rule actions to determine whether you use
+ the REJECT or yymore() features. The reject and yymore
+ options are available to override its decision as to whether
+ you use the options, either by setting them (e.g., %option
+ reject) to indicate the feature is indeed used, or unsetting
+ them to indicate it actually is not used (e.g., %option
+ noyymore).
+
+ Three options take string-delimited values, offset with '=':
+
+ %option outfile="ABC"
+
+ is equivalent to -oABC, and
+
+ %option prefix="XYZ"
+
+ is equivalent to -PXYZ. Finally,
+
+ %option yyclass="foo"
+
+ only applies when generating a C++ scanner ( -+ option). It
+ informs flex that you have derived foo as a subclass of
+ yyFlexLexer, so flex will place your actions in the member
+ function foo::yylex() instead of yyFlexLexer::yylex(). It
+ also generates a yyFlexLexer::yylex() member function that
+
+
+
+Version 2.5 Last change: April 1995 38
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ emits a run-time error (by invoking
+ yyFlexLexer::LexerError()) if called. See Generating C++
+ Scanners, below, for additional information.
+
+ A number of options are available for lint purists who want
+ to suppress the appearance of unneeded routines in the gen-
+ erated scanner. Each of the following, if unset (e.g.,
+ %option nounput ), results in the corresponding routine not
+ appearing in the generated scanner:
+
+ input, unput
+ yy_push_state, yy_pop_state, yy_top_state
+ yy_scan_buffer, yy_scan_bytes, yy_scan_string
+
+ (though yy_push_state() and friends won't appear anyway
+ unless you use %option stack).
+
+PERFORMANCE CONSIDERATIONS
+ The main design goal of flex is that it generate high-
+ performance scanners. It has been optimized for dealing
+ well with large sets of rules. Aside from the effects on
+ scanner speed of the table compression -C options outlined
+ above, there are a number of options/actions which degrade
+ performance. These are, from most expensive to least:
+
+ REJECT
+ %option yylineno
+ arbitrary trailing context
+
+ pattern sets that require backing up
+ %array
+ %option interactive
+ %option always-interactive
+
+ '^' beginning-of-line operator
+ yymore()
+
+ with the first three all being quite expensive and the last
+ two being quite cheap. Note also that unput() is imple-
+ mented as a routine call that potentially does quite a bit
+ of work, while yyless() is a quite-cheap macro; so if just
+ putting back some excess text you scanned, use yyless().
+
+ REJECT should be avoided at all costs when performance is
+ important. It is a particularly expensive option.
+
+ Getting rid of backing up is messy and often may be an enor-
+ mous amount of work for a complicated scanner. In princi-
+ pal, one begins by using the -b flag to generate a
+ lex.backup file. For example, on the input
+
+ %%
+
+
+
+Version 2.5 Last change: April 1995 39
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ foo return TOK_KEYWORD;
+ foobar return TOK_KEYWORD;
+
+ the file looks like:
+
+ State #6 is non-accepting -
+ associated rule line numbers:
+ 2 3
+ out-transitions: [ o ]
+ jam-transitions: EOF [ \001-n p-\177 ]
+
+ State #8 is non-accepting -
+ associated rule line numbers:
+ 3
+ out-transitions: [ a ]
+ jam-transitions: EOF [ \001-` b-\177 ]
+
+ State #9 is non-accepting -
+ associated rule line numbers:
+ 3
+ out-transitions: [ r ]
+ jam-transitions: EOF [ \001-q s-\177 ]
+
+ Compressed tables always back up.
+
+ The first few lines tell us that there's a scanner state in
+ which it can make a transition on an 'o' but not on any
+ other character, and that in that state the currently
+ scanned text does not match any rule. The state occurs when
+ trying to match the rules found at lines 2 and 3 in the
+ input file. If the scanner is in that state and then reads
+ something other than an 'o', it will have to back up to find
+ a rule which is matched. With a bit of headscratching one
+ can see that this must be the state it's in when it has seen
+ "fo". When this has happened, if anything other than
+ another 'o' is seen, the scanner will have to back up to
+ simply match the 'f' (by the default rule).
+
+ The comment regarding State #8 indicates there's a problem
+ when "foob" has been scanned. Indeed, on any character
+ other than an 'a', the scanner will have to back up to
+ accept "foo". Similarly, the comment for State #9 concerns
+ when "fooba" has been scanned and an 'r' does not follow.
+
+ The final comment reminds us that there's no point going to
+ all the trouble of removing backing up from the rules unless
+ we're using -Cf or -CF, since there's no performance gain
+ doing so with compressed scanners.
+
+ The way to remove the backing up is to add "error" rules:
+
+ %%
+
+
+
+Version 2.5 Last change: April 1995 40
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ foo return TOK_KEYWORD;
+ foobar return TOK_KEYWORD;
+
+ fooba |
+ foob |
+ fo {
+ /* false alarm, not really a keyword */
+ return TOK_ID;
+ }
+
+
+ Eliminating backing up among a list of keywords can also be
+ done using a "catch-all" rule:
+
+ %%
+ foo return TOK_KEYWORD;
+ foobar return TOK_KEYWORD;
+
+ [a-z]+ return TOK_ID;
+
+ This is usually the best solution when appropriate.
+
+ Backing up messages tend to cascade. With a complicated set
+ of rules it's not uncommon to get hundreds of messages. If
+ one can decipher them, though, it often only takes a dozen
+ or so rules to eliminate the backing up (though it's easy to
+ make a mistake and have an error rule accidentally match a
+ valid token. A possible future flex feature will be to
+ automatically add rules to eliminate backing up).
+
+ It's important to keep in mind that you gain the benefits of
+ eliminating backing up only if you eliminate every instance
+ of backing up. Leaving just one means you gain nothing.
+
+ Variable trailing context (where both the leading and trail-
+ ing parts do not have a fixed length) entails almost the
+ same performance loss as REJECT (i.e., substantial). So
+ when possible a rule like:
+
+ %%
+ mouse|rat/(cat|dog) run();
+
+ is better written:
+
+ %%
+ mouse/cat|dog run();
+ rat/cat|dog run();
+
+ or as
+
+ %%
+ mouse|rat/cat run();
+
+
+
+Version 2.5 Last change: April 1995 41
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ mouse|rat/dog run();
+
+ Note that here the special '|' action does not provide any
+ savings, and can even make things worse (see Deficiencies /
+ Bugs below).
+
+ Another area where the user can increase a scanner's perfor-
+ mance (and one that's easier to implement) arises from the
+ fact that the longer the tokens matched, the faster the
+ scanner will run. This is because with long tokens the pro-
+ cessing of most input characters takes place in the (short)
+ inner scanning loop, and does not often have to go through
+ the additional work of setting up the scanning environment
+ (e.g., yytext) for the action. Recall the scanner for C
+ comments:
+
+ %x comment
+ %%
+ int line_num = 1;
+
+ "/*" BEGIN(comment);
+
+ <comment>[^*\n]*
+ <comment>"*"+[^*/\n]*
+ <comment>\n ++line_num;
+ <comment>"*"+"/" BEGIN(INITIAL);
+
+ This could be sped up by writing it as:
+
+ %x comment
+ %%
+ int line_num = 1;
+
+ "/*" BEGIN(comment);
+
+ <comment>[^*\n]*
+ <comment>[^*\n]*\n ++line_num;
+ <comment>"*"+[^*/\n]*
+ <comment>"*"+[^*/\n]*\n ++line_num;
+ <comment>"*"+"/" BEGIN(INITIAL);
+
+ Now instead of each newline requiring the processing of
+ another action, recognizing the newlines is "distributed"
+ over the other rules to keep the matched text as long as
+ possible. Note that adding rules does not slow down the
+ scanner! The speed of the scanner is independent of the
+ number of rules or (modulo the considerations given at the
+ beginning of this section) how complicated the rules are
+ with regard to operators such as '*' and '|'.
+
+ A final example in speeding up a scanner: suppose you want
+ to scan through a file containing identifiers and keywords,
+
+
+
+Version 2.5 Last change: April 1995 42
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ one per line and with no other extraneous characters, and
+ recognize all the keywords. A natural first approach is:
+
+ %%
+ asm |
+ auto |
+ break |
+ ... etc ...
+ volatile |
+ while /* it's a keyword */
+
+ .|\n /* it's not a keyword */
+
+ To eliminate the back-tracking, introduce a catch-all rule:
+
+ %%
+ asm |
+ auto |
+ break |
+ ... etc ...
+ volatile |
+ while /* it's a keyword */
+
+ [a-z]+ |
+ .|\n /* it's not a keyword */
+
+ Now, if it's guaranteed that there's exactly one word per
+ line, then we can reduce the total number of matches by a
+ half by merging in the recognition of newlines with that of
+ the other tokens:
+
+ %%
+ asm\n |
+ auto\n |
+ break\n |
+ ... etc ...
+ volatile\n |
+ while\n /* it's a keyword */
+
+ [a-z]+\n |
+ .|\n /* it's not a keyword */
+
+ One has to be careful here, as we have now reintroduced
+ backing up into the scanner. In particular, while we know
+ that there will never be any characters in the input stream
+ other than letters or newlines, flex can't figure this out,
+ and it will plan for possibly needing to back up when it has
+ scanned a token like "auto" and then the next character is
+ something other than a newline or a letter. Previously it
+ would then just match the "auto" rule and be done, but now
+ it has no "auto" rule, only a "auto\n" rule. To eliminate
+ the possibility of backing up, we could either duplicate all
+
+
+
+Version 2.5 Last change: April 1995 43
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ rules but without final newlines, or, since we never expect
+ to encounter such an input and therefore don't how it's
+ classified, we can introduce one more catch-all rule, this
+ one which doesn't include a newline:
+
+ %%
+ asm\n |
+ auto\n |
+ break\n |
+ ... etc ...
+ volatile\n |
+ while\n /* it's a keyword */
+
+ [a-z]+\n |
+ [a-z]+ |
+ .|\n /* it's not a keyword */
+
+ Compiled with -Cf, this is about as fast as one can get a
+ flex scanner to go for this particular problem.
+
+ A final note: flex is slow when matching NUL's, particularly
+ when a token contains multiple NUL's. It's best to write
+ rules which match short amounts of text if it's anticipated
+ that the text will often include NUL's.
+
+ Another final note regarding performance: as mentioned above
+ in the section How the Input is Matched, dynamically resiz-
+ ing yytext to accommodate huge tokens is a slow process
+ because it presently requires that the (huge) token be res-
+ canned from the beginning. Thus if performance is vital,
+ you should attempt to match "large" quantities of text but
+ not "huge" quantities, where the cutoff between the two is
+ at about 8K characters/token.
+
+GENERATING C++ SCANNERS
+ flex provides two different ways to generate scanners for
+ use with C++. The first way is to simply compile a scanner
+ generated by flex using a C++ compiler instead of a C com-
+ piler. You should not encounter any compilations errors
+ (please report any you find to the email address given in
+ the Author section below). You can then use C++ code in
+ your rule actions instead of C code. Note that the default
+ input source for your scanner remains yyin, and default
+ echoing is still done to yyout. Both of these remain FILE *
+ variables and not C++ streams.
+
+ You can also use flex to generate a C++ scanner class, using
+ the -+ option (or, equivalently, %option c++), which is
+ automatically specified if the name of the flex executable
+ ends in a '+', such as flex++. When using this option, flex
+ defaults to generating the scanner to the file lex.yy.cc
+ instead of lex.yy.c. The generated scanner includes the
+
+
+
+Version 2.5 Last change: April 1995 44
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ header file FlexLexer.h, which defines the interface to two
+ C++ classes.
+
+ The first class, FlexLexer, provides an abstract base class
+ defining the general scanner class interface. It provides
+ the following member functions:
+
+ const char* YYText()
+ returns the text of the most recently matched token,
+ the equivalent of yytext.
+
+ int YYLeng()
+ returns the length of the most recently matched token,
+ the equivalent of yyleng.
+
+ int lineno() const
+ returns the current input line number (see %option
+ yylineno), or 1 if %option yylineno was not used.
+
+ void set_debug( int flag )
+ sets the debugging flag for the scanner, equivalent to
+ assigning to yy_flex_debug (see the Options section
+ above). Note that you must build the scanner using
+ %option debug to include debugging information in it.
+
+ int debug() const
+ returns the current setting of the debugging flag.
+
+ Also provided are member functions equivalent to
+ yy_switch_to_buffer(), yy_create_buffer() (though the first
+ argument is an istream* object pointer and not a FILE*),
+ yy_flush_buffer(), yy_delete_buffer(), and yyrestart()
+ (again, the first argument is a istream* object pointer).
+
+ The second class defined in FlexLexer.h is yyFlexLexer,
+ which is derived from FlexLexer. It defines the following
+ additional member functions:
+
+ yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )
+ constructs a yyFlexLexer object using the given streams
+ for input and output. If not specified, the streams
+ default to cin and cout, respectively.
+
+ virtual int yylex()
+ performs the same role is yylex() does for ordinary
+ flex scanners: it scans the input stream, consuming
+ tokens, until a rule's action returns a value. If you
+ derive a subclass S from yyFlexLexer and want to access
+ the member functions and variables of S inside yylex(),
+ then you need to use %option yyclass="S" to inform flex
+ that you will be using that subclass instead of yyFlex-
+ Lexer. In this case, rather than generating
+
+
+
+Version 2.5 Last change: April 1995 45
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ yyFlexLexer::yylex(), flex generates S::yylex() (and
+ also generates a dummy yyFlexLexer::yylex() that calls
+ yyFlexLexer::LexerError() if called).
+
+ virtual void switch_streams(istream* new_in = 0,
+ ostream* new_out = 0) reassigns yyin to new_in (if
+ non-nil) and yyout to new_out (ditto), deleting the
+ previous input buffer if yyin is reassigned.
+
+ int yylex( istream* new_in, ostream* new_out = 0 )
+ first switches the input streams via switch_streams(
+ new_in, new_out ) and then returns the value of
+ yylex().
+
+ In addition, yyFlexLexer defines the following protected
+ virtual functions which you can redefine in derived classes
+ to tailor the scanner:
+
+ virtual int LexerInput( char* buf, int max_size )
+ reads up to max_size characters into buf and returns
+ the number of characters read. To indicate end-of-
+ input, return 0 characters. Note that "interactive"
+ scanners (see the -B and -I flags) define the macro
+ YY_INTERACTIVE. If you redefine LexerInput() and need
+ to take different actions depending on whether or not
+ the scanner might be scanning an interactive input
+ source, you can test for the presence of this name via
+ #ifdef.
+
+ virtual void LexerOutput( const char* buf, int size )
+ writes out size characters from the buffer buf, which,
+ while NUL-terminated, may also contain "internal" NUL's
+ if the scanner's rules can match text with NUL's in
+ them.
+
+ virtual void LexerError( const char* msg )
+ reports a fatal error message. The default version of
+ this function writes the message to the stream cerr and
+ exits.
+
+ Note that a yyFlexLexer object contains its entire scanning
+ state. Thus you can use such objects to create reentrant
+ scanners. You can instantiate multiple instances of the
+ same yyFlexLexer class, and you can also combine multiple
+ C++ scanner classes together in the same program using the
+ -P option discussed above.
+
+ Finally, note that the %array feature is not available to
+ C++ scanner classes; you must use %pointer (the default).
+
+ Here is an example of a simple C++ scanner:
+
+
+
+
+Version 2.5 Last change: April 1995 46
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ // An example of using the flex C++ scanner class.
+
+ %{
+ int mylineno = 0;
+ %}
+
+ string \"[^\n"]+\"
+
+ ws [ \t]+
+
+ alpha [A-Za-z]
+ dig [0-9]
+ name ({alpha}|{dig}|\$)({alpha}|{dig}|[_.\-/$])*
+ num1 [-+]?{dig}+\.?([eE][-+]?{dig}+)?
+ num2 [-+]?{dig}*\.{dig}+([eE][-+]?{dig}+)?
+ number {num1}|{num2}
+
+ %%
+
+ {ws} /* skip blanks and tabs */
+
+ "/*" {
+ int c;
+
+ while((c = yyinput()) != 0)
+ {
+ if(c == '\n')
+ ++mylineno;
+
+ else if(c == '*')
+ {
+ if((c = yyinput()) == '/')
+ break;
+ else
+ unput(c);
+ }
+ }
+ }
+
+ {number} cout << "number " << YYText() << '\n';
+
+ \n mylineno++;
+
+ {name} cout << "name " << YYText() << '\n';
+
+ {string} cout << "string " << YYText() << '\n';
+
+ %%
+
+ int main( int /* argc */, char** /* argv */ )
+ {
+ FlexLexer* lexer = new yyFlexLexer;
+
+
+
+Version 2.5 Last change: April 1995 47
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ while(lexer->yylex() != 0)
+ ;
+ return 0;
+ }
+ If you want to create multiple (different) lexer classes,
+ you use the -P flag (or the prefix= option) to rename each
+ yyFlexLexer to some other xxFlexLexer. You then can include
+ <FlexLexer.h> in your other sources once per lexer class,
+ first renaming yyFlexLexer as follows:
+
+ #undef yyFlexLexer
+ #define yyFlexLexer xxFlexLexer
+ #include <FlexLexer.h>
+
+ #undef yyFlexLexer
+ #define yyFlexLexer zzFlexLexer
+ #include <FlexLexer.h>
+
+ if, for example, you used %option prefix="xx" for one of
+ your scanners and %option prefix="zz" for the other.
+
+ IMPORTANT: the present form of the scanning class is experi-
+ mental and may change considerably between major releases.
+
+INCOMPATIBILITIES WITH LEX AND POSIX
+ flex is a rewrite of the AT&T Unix lex tool (the two imple-
+ mentations do not share any code, though), with some exten-
+ sions and incompatibilities, both of which are of concern to
+ those who wish to write scanners acceptable to either imple-
+ mentation. Flex is fully compliant with the POSIX lex
+ specification, except that when using %pointer (the
+ default), a call to unput() destroys the contents of yytext,
+ which is counter to the POSIX specification.
+
+ In this section we discuss all of the known areas of incom-
+ patibility between flex, AT&T lex, and the POSIX specifica-
+ tion.
+
+ flex's -l option turns on maximum compatibility with the
+ original AT&T lex implementation, at the cost of a major
+ loss in the generated scanner's performance. We note below
+ which incompatibilities can be overcome using the -l option.
+
+ flex is fully compatible with lex with the following excep-
+ tions:
+
+ - The undocumented lex scanner internal variable yylineno
+ is not supported unless -l or %option yylineno is used.
+
+ yylineno should be maintained on a per-buffer basis,
+ rather than a per-scanner (single global variable)
+ basis.
+
+
+
+Version 2.5 Last change: April 1995 48
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ yylineno is not part of the POSIX specification.
+
+ - The input() routine is not redefinable, though it may
+ be called to read characters following whatever has
+ been matched by a rule. If input() encounters an end-
+ of-file the normal yywrap() processing is done. A
+ ``real'' end-of-file is returned by input() as EOF.
+
+ Input is instead controlled by defining the YY_INPUT
+ macro.
+
+ The flex restriction that input() cannot be redefined
+ is in accordance with the POSIX specification, which
+ simply does not specify any way of controlling the
+ scanner's input other than by making an initial assign-
+ ment to yyin.
+
+ - The unput() routine is not redefinable. This restric-
+ tion is in accordance with POSIX.
+
+ - flex scanners are not as reentrant as lex scanners. In
+ particular, if you have an interactive scanner and an
+ interrupt handler which long-jumps out of the scanner,
+ and the scanner is subsequently called again, you may
+ get the following message:
+
+ fatal flex scanner internal error--end of buffer missed
+
+ To reenter the scanner, first use
+
+ yyrestart( yyin );
+
+ Note that this call will throw away any buffered input;
+ usually this isn't a problem with an interactive
+ scanner.
+
+ Also note that flex C++ scanner classes are reentrant,
+ so if using C++ is an option for you, you should use
+ them instead. See "Generating C++ Scanners" above for
+ details.
+
+ - output() is not supported. Output from the ECHO macro
+ is done to the file-pointer yyout (default stdout).
+
+ output() is not part of the POSIX specification.
+
+ - lex does not support exclusive start conditions (%x),
+ though they are in the POSIX specification.
+
+ - When definitions are expanded, flex encloses them in
+ parentheses. With lex, the following:
+
+
+
+
+Version 2.5 Last change: April 1995 49
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ NAME [A-Z][A-Z0-9]*
+ %%
+ foo{NAME}? printf( "Found it\n" );
+ %%
+
+ will not match the string "foo" because when the macro
+ is expanded the rule is equivalent to "foo[A-Z][A-Z0-
+ 9]*?" and the precedence is such that the '?' is asso-
+ ciated with "[A-Z0-9]*". With flex, the rule will be
+ expanded to "foo([A-Z][A-Z0-9]*)?" and so the string
+ "foo" will match.
+
+ Note that if the definition begins with ^ or ends with
+ $ then it is not expanded with parentheses, to allow
+ these operators to appear in definitions without losing
+ their special meanings. But the <s>, /, and <<EOF>>
+ operators cannot be used in a flex definition.
+
+ Using -l results in the lex behavior of no parentheses
+ around the definition.
+
+ The POSIX specification is that the definition be
+ enclosed in parentheses.
+
+ - Some implementations of lex allow a rule's action to
+ begin on a separate line, if the rule's pattern has
+ trailing whitespace:
+
+ %%
+ foo|bar<space here>
+ { foobar_action(); }
+
+ flex does not support this feature.
+
+ - The lex %r (generate a Ratfor scanner) option is not
+ supported. It is not part of the POSIX specification.
+
+ - After a call to unput(), yytext is undefined until the
+ next token is matched, unless the scanner was built
+ using %array. This is not the case with lex or the
+ POSIX specification. The -l option does away with this
+ incompatibility.
+
+ - The precedence of the {} (numeric range) operator is
+ different. lex interprets "abc{1,3}" as "match one,
+ two, or three occurrences of 'abc'", whereas flex
+ interprets it as "match 'ab' followed by one, two, or
+ three occurrences of 'c'". The latter is in agreement
+ with the POSIX specification.
+
+ - The precedence of the ^ operator is different. lex
+ interprets "^foo|bar" as "match either 'foo' at the
+
+
+
+Version 2.5 Last change: April 1995 50
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ beginning of a line, or 'bar' anywhere", whereas flex
+ interprets it as "match either 'foo' or 'bar' if they
+ come at the beginning of a line". The latter is in
+ agreement with the POSIX specification.
+
+ - The special table-size declarations such as %a sup-
+ ported by lex are not required by flex scanners; flex
+ ignores them.
+
+ - The name FLEX_SCANNER is #define'd so scanners may be
+ written for use with either flex or lex. Scanners also
+ include YY_FLEX_MAJOR_VERSION and YY_FLEX_MINOR_VERSION
+ indicating which version of flex generated the scanner
+ (for example, for the 2.5 release, these defines would
+ be 2 and 5 respectively).
+
+ The following flex features are not included in lex or the
+ POSIX specification:
+
+ C++ scanners
+ %option
+ start condition scopes
+ start condition stacks
+ interactive/non-interactive scanners
+ yy_scan_string() and friends
+ yyterminate()
+ yy_set_interactive()
+ yy_set_bol()
+ YY_AT_BOL()
+ <<EOF>>
+ <*>
+ YY_DECL
+ YY_START
+ YY_USER_ACTION
+ YY_USER_INIT
+ #line directives
+ %{}'s around actions
+ multiple actions on a line
+
+ plus almost all of the flex flags. The last feature in the
+ list refers to the fact that with flex you can put multiple
+ actions on the same line, separated with semi-colons, while
+ with lex, the following
+
+ foo handle_foo(); ++num_foos_seen;
+
+ is (rather surprisingly) truncated to
+
+ foo handle_foo();
+
+ flex does not truncate the action. Actions that are not
+ enclosed in braces are simply terminated at the end of the
+
+
+
+Version 2.5 Last change: April 1995 51
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ line.
+
+DIAGNOSTICS
+ warning, rule cannot be matched indicates that the given
+ rule cannot be matched because it follows other rules that
+ will always match the same text as it. For example, in the
+ following "foo" cannot be matched because it comes after an
+ identifier "catch-all" rule:
+
+ [a-z]+ got_identifier();
+ foo got_foo();
+
+ Using REJECT in a scanner suppresses this warning.
+
+ warning, -s option given but default rule can be matched
+ means that it is possible (perhaps only in a particular
+ start condition) that the default rule (match any single
+ character) is the only one that will match a particular
+ input. Since -s was given, presumably this is not intended.
+
+ reject_used_but_not_detected undefined or
+ yymore_used_but_not_detected undefined - These errors can
+ occur at compile time. They indicate that the scanner uses
+ REJECT or yymore() but that flex failed to notice the fact,
+ meaning that flex scanned the first two sections looking for
+ occurrences of these actions and failed to find any, but
+ somehow you snuck some in (via a #include file, for exam-
+ ple). Use %option reject or %option yymore to indicate to
+ flex that you really do use these features.
+
+ flex scanner jammed - a scanner compiled with -s has encoun-
+ tered an input string which wasn't matched by any of its
+ rules. This error can also occur due to internal problems.
+
+ token too large, exceeds YYLMAX - your scanner uses %array
+ and one of its rules matched a string longer than the YYLMAX
+ constant (8K bytes by default). You can increase the value
+ by #define'ing YYLMAX in the definitions section of your
+ flex input.
+
+ scanner requires -8 flag to use the character 'x' - Your
+ scanner specification includes recognizing the 8-bit charac-
+ ter 'x' and you did not specify the -8 flag, and your
+ scanner defaulted to 7-bit because you used the -Cf or -CF
+ table compression options. See the discussion of the -7
+ flag for details.
+
+ flex scanner push-back overflow - you used unput() to push
+ back so much text that the scanner's buffer could not hold
+ both the pushed-back text and the current token in yytext.
+ Ideally the scanner should dynamically resize the buffer in
+ this case, but at present it does not.
+
+
+
+Version 2.5 Last change: April 1995 52
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ input buffer overflow, can't enlarge buffer because scanner
+ uses REJECT - the scanner was working on matching an
+ extremely large token and needed to expand the input buffer.
+ This doesn't work with scanners that use REJECT.
+
+ fatal flex scanner internal error--end of buffer missed -
+ This can occur in an scanner which is reentered after a
+ long-jump has jumped out (or over) the scanner's activation
+ frame. Before reentering the scanner, use:
+
+ yyrestart( yyin );
+
+ or, as noted above, switch to using the C++ scanner class.
+
+ too many start conditions in <> you listed more start condi-
+ tions in a <> construct than exist (so you must have listed
+ at least one of them twice).
+
+FILES
+ -lfl library with which scanners must be linked.
+
+ lex.yy.c
+ generated scanner (called lexyy.c on some systems).
+
+ lex.yy.cc
+ generated C++ scanner class, when using -+.
+
+ <FlexLexer.h>
+ header file defining the C++ scanner base class, Flex-
+ Lexer, and its derived class, yyFlexLexer.
+
+ flex.skl
+ skeleton scanner. This file is only used when building
+ flex, not when flex executes.
+
+ lex.backup
+ backing-up information for -b flag (called lex.bck on
+ some systems).
+
+DEFICIENCIES / BUGS
+ Some trailing context patterns cannot be properly matched
+ and generate warning messages ("dangerous trailing con-
+ text"). These are patterns where the ending of the first
+ part of the rule matches the beginning of the second part,
+ such as "zx*/xy*", where the 'x*' matches the 'x' at the
+ beginning of the trailing context. (Note that the POSIX
+ draft states that the text matched by such patterns is unde-
+ fined.)
+
+ For some trailing context rules, parts which are actually
+ fixed-length are not recognized as such, leading to the
+ abovementioned performance loss. In particular, parts using
+
+
+
+Version 2.5 Last change: April 1995 53
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ '|' or {n} (such as "foo{3}") are always considered
+ variable-length.
+
+ Combining trailing context with the special '|' action can
+ result in fixed trailing context being turned into the more
+ expensive variable trailing context. For example, in the
+ following:
+
+ %%
+ abc |
+ xyz/def
+
+
+ Use of unput() invalidates yytext and yyleng, unless the
+ %array directive or the -l option has been used.
+
+ Pattern-matching of NUL's is substantially slower than
+ matching other characters.
+
+ Dynamic resizing of the input buffer is slow, as it entails
+ rescanning all the text matched so far by the current (gen-
+ erally huge) token.
+
+ Due to both buffering of input and read-ahead, you cannot
+ intermix calls to <stdio.h> routines, such as, for example,
+ getchar(), with flex rules and expect it to work. Call
+ input() instead.
+
+ The total table entries listed by the -v flag excludes the
+ number of table entries needed to determine what rule has
+ been matched. The number of entries is equal to the number
+ of DFA states if the scanner does not use REJECT, and some-
+ what greater than the number of states if it does.
+
+ REJECT cannot be used with the -f or -F options.
+
+ The flex internal algorithms need documentation.
+
+SEE ALSO
+ lex(1), yacc(1), sed(1), awk(1).
+
+ John Levine, Tony Mason, and Doug Brown, Lex & Yacc,
+ O'Reilly and Associates. Be sure to get the 2nd edition.
+
+ M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator
+
+ Alfred Aho, Ravi Sethi and Jeffrey Ullman, Compilers: Prin-
+ ciples, Techniques and Tools, Addison-Wesley (1986).
+ Describes the pattern-matching techniques used by flex
+ (deterministic finite automata).
+
+
+
+
+
+Version 2.5 Last change: April 1995 54
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+AUTHOR
+ Vern Paxson, with the help of many ideas and much inspira-
+ tion from Van Jacobson. Original version by Jef Poskanzer.
+ The fast table representation is a partial implementation of
+ a design done by Van Jacobson. The implementation was done
+ by Kevin Gong and Vern Paxson.
+
+ Thanks to the many flex beta-testers, feedbackers, and con-
+ tributors, especially Francois Pinard, Casey Leedom, Robert
+ Abramovitz, Stan Adermann, Terry Allen, David Barker-
+ Plummer, John Basrai, Neal Becker, Nelson H.F. Beebe,
+ benson@odi.com, Karl Berry, Peter A. Bigot, Simon Blanchard,
+ Keith Bostic, Frederic Brehm, Ian Brockbank, Kin Cho, Nick
+ Christopher, Brian Clapper, J.T. Conklin, Jason Coughlin,
+ Bill Cox, Nick Cropper, Dave Curtis, Scott David Daniels,
+ Chris G. Demetriou, Theo Deraadt, Mike Donahue, Chuck
+ Doucette, Tom Epperly, Leo Eskin, Chris Faylor, Chris
+ Flatters, Jon Forrest, Jeffrey Friedl, Joe Gayda, Kaveh R.
+ Ghazi, Wolfgang Glunz, Eric Goldman, Christopher M. Gould,
+ Ulrich Grepel, Peer Griebel, Jan Hajic, Charles Hemphill,
+ NORO Hideo, Jarkko Hietaniemi, Scott Hofmann, Jeff Honig,
+ Dana Hudes, Eric Hughes, John Interrante, Ceriel Jacobs,
+ Michal Jaegermann, Sakari Jalovaara, Jeffrey R. Jones, Henry
+ Juengst, Klaus Kaempf, Jonathan I. Kamens, Terrence O Kane,
+ Amir Katz, ken@ken.hilco.com, Kevin B. Kenny, Steve Kirsch,
+ Winfried Koenig, Marq Kole, Ronald Lamprecht, Greg Lee,
+ Rohan Lenard, Craig Leres, John Levine, Steve Liddle, David
+ Loffredo, Mike Long, Mohamed el Lozy, Brian Madsen, Malte,
+ Joe Marshall, Bengt Martensson, Chris Metcalf, Luke Mewburn,
+ Jim Meyering, R. Alexander Milowski, Erik Naggum, G.T.
+ Nicol, Landon Noll, James Nordby, Marc Nozell, Richard
+ Ohnemus, Karsten Pahnke, Sven Panne, Roland Pesch, Walter
+ Pelissero, Gaumond Pierre, Esmond Pitt, Jef Poskanzer, Joe
+ Rahmeh, Jarmo Raiha, Frederic Raimbault, Pat Rankin, Rick
+ Richardson, Kevin Rodgers, Kai Uwe Rommel, Jim Roskind,
+ Alberto Santini, Andreas Scherer, Darrell Schiebel, Raf
+ Schietekat, Doug Schmidt, Philippe Schnoebelen, Andreas
+ Schwab, Larry Schwimmer, Alex Siegel, Eckehard Stolz, Jan-
+ Erik Strvmquist, Mike Stump, Paul Stuart, Dave Tallman, Ian
+ Lance Taylor, Chris Thewalt, Richard M. Timoney, Jodi Tsai,
+ Paul Tuinenga, Gary Weik, Frank Whaley, Gerhard Wilhelms,
+ Kent Williams, Ken Yap, Ron Zellar, Nathan Zelle, David
+ Zuhn, and those whose names have slipped my marginal mail-
+ archiving skills but whose contributions are appreciated all
+ the same.
+
+ Thanks to Keith Bostic, Jon Forrest, Noah Friedman, John
+ Gilmore, Craig Leres, John Levine, Bob Mulcahy, G.T. Nicol,
+ Francois Pinard, Rich Salz, and Richard Stallman for help
+ with various distribution headaches.
+
+
+
+
+
+Version 2.5 Last change: April 1995 55
+
+
+
+
+
+
+FLEX(1) USER COMMANDS FLEX(1)
+
+
+
+ Thanks to Esmond Pitt and Earle Horton for 8-bit character
+ support; to Benson Margulies and Fred Burke for C++ support;
+ to Kent Williams and Tom Epperly for C++ class support; to
+ Ove Ewerlid for support of NUL's; and to Eric Hughes for
+ support of multiple buffers.
+
+ This work was primarily done when I was with the Real Time
+ Systems Group at the Lawrence Berkeley Laboratory in Berke-
+ ley, CA. Many thanks to all there for the support I
+ received.
+
+ Send comments to vern@ee.lbl.gov.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Version 2.5 Last change: April 1995 56
+
+
+