Imported Upstream version 2.5.31

author: Manoj Srivastava <srivasta@golden-gryphon.com> 2003-12-03 22:33:17 -0800
committer: Manoj Srivastava <srivasta@golden-gryphon.com> 2003-12-03 22:33:17 -0800
commit: c2b22e08bd48278f2cf125f054c9f6286e345ff0 (patch)
tree: 3c0ab722c83ef33913ad293af7d56ce2c4e1fcc9 /doc/flex.info-1
parent: edc848712307fe5c881364e12e520e9fe58d9969 (diff)
1 files changed, 1251 insertions, 0 deletions
diff --git a/doc/flex.info-1 b/doc/flex.info-1
new file mode 100644
index 0000000..178d382
--- /dev/null
+++ b/doc/flex.info-1
@@ -0,0 +1,1251 @@
+This is flex.info, produced by makeinfo version 4.3d from flex.texi.
+
+INFO-DIR-SECTION Programming
+START-INFO-DIR-ENTRY
+* flex: (flex).      Fast lexical analyzer generator (lex replacement).
+END-INFO-DIR-ENTRY
+
+
+   The flex manual is placed under the same licensing conditions as the
+rest of flex:
+
+   Copyright (C) 1990, 1997 The Regents of the University of California.
+All rights reserved.
+
+   This code is derived from software contributed to Berkeley by Vern
+Paxson.
+
+   The United States Government has rights in this work pursuant to
+contract no. DE-AC03-76SF00098 between the United States Department of
+Energy and the University of California.
+
+   Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+  1.  Redistributions of source code must retain the above copyright
+     notice, this list of conditions and the following disclaimer.
+
+  2. Redistributions in binary form must reproduce the above copyright
+     notice, this list of conditions and the following disclaimer in the
+     documentation and/or other materials provided with the
+     distribution.
+
+   Neither the name of the University nor the names of its contributors
+may be used to endorse or promote products derived from this software
+without specific prior written permission.
+
+   THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
+WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
+
+File: flex.info,  Node: Top,  Next: Copyright,  Prev: (dir),  Up: (dir)
+
+flex
+****
+
+   This manual describes `flex', a tool for generating programs that
+perform pattern-matching on text.  The manual includes both tutorial and
+reference sections.
+
+   This edition of `The flex Manual' documents `flex' version 2.5.31.
+It was last updated on 27 March 2003.
+
+* Menu:
+
+* Copyright::
+* Reporting Bugs::
+* Introduction::
+* Simple Examples::
+* Format::
+* Patterns::
+* Matching::
+* Actions::
+* Generated Scanner::
+* Start Conditions::
+* Multiple Input Buffers::
+* EOF::
+* Misc Macros::
+* User Values::
+* Yacc::
+* Scanner Options::
+* Performance::
+* Cxx::
+* Reentrant::
+* Lex and Posix::
+* Memory Management::
+* Serialized Tables::
+* Diagnostics::
+* Limitations::
+* Bibliography::
+* FAQ::
+* Appendices::
+* Indices::
+
+ --- The Detailed Node Listing ---
+
+Format of the Input File
+
+* Definitions Section::
+* Rules Section::
+* User Code Section::
+* Comments in the Input::
+
+Scanner Options
+
+* Options for Specifing Filenames::
+* Options Affecting Scanner Behavior::
+* Code-Level And API Options::
+* Options for Scanner Speed and Size::
+* Debugging Options::
+* Miscellaneous Options::
+
+Reentrant C Scanners
+
+* Reentrant Uses::
+* Reentrant Overview::
+* Reentrant Example::
+* Reentrant Detail::
+* Reentrant Functions::
+
+The Reentrant API in Detail
+
+* Specify Reentrant::
+* Extra Reentrant Argument::
+* Global Replacement::
+* Init and Destroy Functions::
+* Accessor Methods::
+* Extra Data::
+* About yyscan_t::
+
+Memory Management
+
+* The Default Memory Management::
+* Overriding The Default Memory Management::
+* A Note About yytext And Memory::
+
+Serialized Tables
+
+* Creating Serialized Tables::
+* Loading and Unloading Serialized Tables::
+* Tables File Format::
+
+FAQ
+
+* When was flex born?::
+* How do I expand \ escape sequences in C-style quoted strings?::
+* Why do flex scanners call fileno if it is not ANSI compatible?::
+* Does flex support recursive pattern definitions?::
+* How do I skip huge chunks of input (tens of megabytes) while using flex?::
+* Flex is not matching my patterns in the same order that I defined them.::
+* My actions are executing out of order or sometimes not at all.::
+* How can I have multiple input sources feed into the same scanner at the same time?::
+* Can I build nested parsers that work with the same input file?::
+* How can I match text only at the end of a file?::
+* How can I make REJECT cascade across start condition boundaries?::
+* Why cant I use fast or full tables with interactive mode?::
+* How much faster is -F or -f than -C?::
+* If I have a simple grammar cant I just parse it with flex?::
+* Why doesnt yyrestart() set the start state back to INITIAL?::
+* How can I match C-style comments?::
+* The period isnt working the way I expected.::
+* Can I get the flex manual in another format?::
+* Does there exist a "faster" NDFA->DFA algorithm?::
+* How does flex compile the DFA so quickly?::
+* How can I use more than 8192 rules?::
+* How do I abandon a file in the middle of a scan and switch to a new file?::
+* How do I execute code only during initialization (only before the first scan)?::
+* How do I execute code at termination?::
+* Where else can I find help?::
+* Can I include comments in the "rules" section of the file?::
+* I get an error about undefined yywrap().::
+* How can I change the matching pattern at run time?::
+* How can I expand macros in the input?::
+* How can I build a two-pass scanner?::
+* How do I match any string not matched in the preceding rules?::
+* I am trying to port code from AT&T lex that uses yysptr and yysbuf.::
+* Is there a way to make flex treat NULL like a regular character?::
+* Whenever flex can not match the input it says "flex scanner jammed".::
+* Why doesnt flex have non-greedy operators like perl does?::
+* Memory leak - 16386 bytes allocated by malloc.::
+* How do I track the byte offset for lseek()?::
+* How do I use my own I/O classes in a C++ scanner?::
+* How do I skip as many chars as possible?::
+* deleteme00::
+* Are certain equivalent patterns faster than others?::
+* Is backing up a big deal?::
+* Can I fake multi-byte character support?::
+* deleteme01::
+* Can you discuss some flex internals?::
+* unput() messes up yy_at_bol::
+* The | operator is not doing what I want::
+* Why can't flex understand this variable trailing context pattern?::
+* The ^ operator isn't working::
+* Trailing context is getting confused with trailing optional patterns::
+* Is flex GNU or not?::
+* ERASEME53::
+* I need to scan if-then-else blocks and while loops::
+* ERASEME55::
+* ERASEME56::
+* ERASEME57::
+* Is there a repository for flex scanners?::
+* How can I conditionally compile or preprocess my flex input file?::
+* Where can I find grammars for lex and yacc?::
+* I get an end-of-buffer message for each character scanned.::
+* unnamed-faq-62::
+* unnamed-faq-63::
+* unnamed-faq-64::
+* unnamed-faq-65::
+* unnamed-faq-66::
+* unnamed-faq-67::
+* unnamed-faq-68::
+* unnamed-faq-69::
+* unnamed-faq-70::
+* unnamed-faq-71::
+* unnamed-faq-72::
+* unnamed-faq-73::
+* unnamed-faq-74::
+* unnamed-faq-75::
+* unnamed-faq-76::
+* unnamed-faq-77::
+* unnamed-faq-78::
+* unnamed-faq-79::
+* unnamed-faq-80::
+* unnamed-faq-81::
+* unnamed-faq-82::
+* unnamed-faq-83::
+* unnamed-faq-84::
+* unnamed-faq-85::
+* unnamed-faq-86::
+* unnamed-faq-87::
+* unnamed-faq-88::
+* unnamed-faq-90::
+* unnamed-faq-91::
+* unnamed-faq-92::
+* unnamed-faq-93::
+* unnamed-faq-94::
+* unnamed-faq-95::
+* unnamed-faq-96::
+* unnamed-faq-97::
+* unnamed-faq-98::
+* unnamed-faq-99::
+* unnamed-faq-100::
+* unnamed-faq-101::
+
+Appendices
+
+* Makefiles and Flex::
+* Bison Bridge::
+* M4 Dependency::
+
+Indices
+
+* Concept Index::
+* Index of Functions and Macros::
+* Index of Variables::
+* Index of Data Types::
+* Index of Hooks::
+* Index of Scanner Options::
+
+
+File: flex.info,  Node: Copyright,  Next: Reporting Bugs,  Prev: Top,  Up: Top
+
+Copyright
+*********
+
+
+   The flex manual is placed under the same licensing conditions as the
+rest of flex:
+
+   Copyright (C) 1990, 1997 The Regents of the University of California.
+All rights reserved.
+
+   This code is derived from software contributed to Berkeley by Vern
+Paxson.
+
+   The United States Government has rights in this work pursuant to
+contract no. DE-AC03-76SF00098 between the United States Department of
+Energy and the University of California.
+
+   Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+  1.  Redistributions of source code must retain the above copyright
+     notice, this list of conditions and the following disclaimer.
+
+  2. Redistributions in binary form must reproduce the above copyright
+     notice, this list of conditions and the following disclaimer in the
+     documentation and/or other materials provided with the
+     distribution.
+
+   Neither the name of the University nor the names of its contributors
+may be used to endorse or promote products derived from this software
+without specific prior written permission.
+
+   THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
+WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
+
+File: flex.info,  Node: Reporting Bugs,  Next: Introduction,  Prev: Copyright,  Up: Top
+
+Reporting Bugs
+**************
+
+   If you have problems with `flex' or think you have found a bug,
+please send mail detailing your problem to
+<lex-help@lists.sourceforge.net>. Patches are always welcome.
+
+
+File: flex.info,  Node: Introduction,  Next: Simple Examples,  Prev: Reporting Bugs,  Up: Top
+
+Introduction
+************
+
+   `flex' is a tool for generating "scanners".  A scanner is a program
+which recognizes lexical patterns in text.  The `flex' program reads
+the given input files, or its standard input if no file names are
+given, for a description of a scanner to generate.  The description is
+in the form of pairs of regular expressions and C code, called "rules".
+`flex' generates as output a C source file, `lex.yy.c' by default,
+which defines a routine `yylex()'.  This file can be compiled and
+linked with the flex runtime library to produce an executable.  When
+the executable is run, it analyzes its input for occurrences of the
+regular expressions.  Whenever it finds one, it executes the
+corresponding C code.
+
+
+File: flex.info,  Node: Simple Examples,  Next: Format,  Prev: Introduction,  Up: Top
+
+Some Simple Examples
+********************
+
+   First some simple examples to get the flavor of how one uses `flex'.
+
+   The following `flex' input specifies a scanner which, when it
+encounters the string `username', will replace it with the user's login
+name:
+
+
+         %%
+         username    printf( "%s", getlogin() );
+
+   By default, any text not matched by a `flex' scanner is copied to
+the output, so the net effect of this scanner is to copy its input file
+to its output with each occurrence of `username' expanded.  In this
+input, there is just one rule.  `username' is the "pattern" and the
+`printf' is the "action".  The `%%' symbol marks the beginning of the
+rules.
+
+   Here's another simple example:
+
+
+                 int num_lines = 0, num_chars = 0;
+     
+         %%
+         \n      ++num_lines; ++num_chars;
+         .       ++num_chars;
+     
+         %%
+         main()
+                 {
+                 yylex();
+                 printf( "# of lines = %d, # of chars = %d\n",
+                         num_lines, num_chars );
+                 }
+
+   This scanner counts the number of characters and the number of lines
+in its input. It produces no output other than the final report on the
+character and line counts.  The first line declares two globals,
+`num_lines' and `num_chars', which are accessible both inside `yylex()'
+and in the `main()' routine declared after the second `%%'.  There are
+two rules, one which matches a newline (`\n') and increments both the
+line count and the character count, and one which matches any character
+other than a newline (indicated by the `.' regular expression).
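+
+   The counting performed by the two rules can be sketched in plain C.
+This is only an illustration of what the rules compute, not the code
+that `flex' generates; the names `struct counts' and `count_text' are
+hypothetical, and the sketch works on a NUL-terminated string rather
+than on `yyin'.
+

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of flex: the two counters the scanner keeps. */
struct counts {
    int num_lines;
    int num_chars;
};

/* Count lines and characters the way the two rules do. */
struct counts count_text(const char *text)
{
    struct counts c = { 0, 0 };

    assert(text != NULL);
    for (const char *p = text; *p != '\0'; ++p) {
        if (*p == '\n')
            ++c.num_lines;   /* the `\n' rule also bumps the line count */
        ++c.num_chars;       /* both rules count the character itself */
    }
    return c;
}
```

+
+   Calling `count_text( "ab\ncd\n" )' yields two lines and six
+characters, since each newline is counted as a character as well.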
+
+   A somewhat more complicated example:
+
+
+         /* scanner for a toy Pascal-like language */
+     
+         %{
+         /* need this for the call to atof() below */
+         #include <math.h>
+         %}
+     
+         DIGIT    [0-9]
+         ID       [a-z][a-z0-9]*
+     
+         %%
+     
+         {DIGIT}+    {
+                     printf( "An integer: %s (%d)\n", yytext,
+                             atoi( yytext ) );
+                     }
+     
+         {DIGIT}+"."{DIGIT}*        {
+                     printf( "A float: %s (%g)\n", yytext,
+                             atof( yytext ) );
+                     }
+     
+         if|then|begin|end|procedure|function        {
+                     printf( "A keyword: %s\n", yytext );
+                     }
+     
+         {ID}        printf( "An identifier: %s\n", yytext );
+     
+         "+"|"-"|"*"|"/"   printf( "An operator: %s\n", yytext );
+     
+         "{"[^{}\n]*"}"     /* eat up one-line comments */
+     
+         [ \t\n]+          /* eat up whitespace */
+     
+         .           printf( "Unrecognized character: %s\n", yytext );
+     
+         %%
+     
+         main( argc, argv )
+         int argc;
+         char **argv;
+             {
+             ++argv, --argc;  /* skip over program name */
+             if ( argc > 0 )
+                     yyin = fopen( argv[0], "r" );
+             else
+                     yyin = stdin;
+     
+             yylex();
+             }
+
+   This is the beginning of a simple scanner for a language like
+Pascal.  It identifies different types of "tokens" and reports on what
+it has seen.
+
+   The details of this example will be explained in the following
+sections.
+
+
+File: flex.info,  Node: Format,  Next: Patterns,  Prev: Simple Examples,  Up: Top
+
+Format of the Input File
+************************
+
+   The `flex' input file consists of three sections, separated by a
+line containing only `%%'.
+
+
+         definitions
+         %%
+         rules
+         %%
+         user code
+
+* Menu:
+
+* Definitions Section::
+* Rules Section::
+* User Code Section::
+* Comments in the Input::
+
+
+File: flex.info,  Node: Definitions Section,  Next: Rules Section,  Prev: Format,  Up: Format
+
+Format of the Definitions Section
+=================================
+
+   The "definitions section" contains declarations of simple "name"
+definitions to simplify the scanner specification, and declarations of
+"start conditions", which are explained in a later section.
+
+   Name definitions have the form:
+
+
+         name definition
+
+   The `name' is a word beginning with a letter or an underscore (`_')
+followed by zero or more letters, digits, `_', or `-' (dash).  The
+definition is taken to begin at the first non-whitespace character
+following the name and continuing to the end of the line.  The
+definition can subsequently be referred to using `{name}', which will
+expand to `(definition)'.  For example,
+
+
+         DIGIT    [0-9]
+         ID       [a-z][a-z0-9]*
+
+   Defines `DIGIT' to be a regular expression which matches a single
+digit, and `ID' to be a regular expression which matches a letter
+followed by zero-or-more letters-or-digits.  A subsequent reference to
+
+
+         {DIGIT}+"."{DIGIT}*
+
+   is identical to
+
+
+         ([0-9])+"."([0-9])*
+
+   and matches one-or-more digits followed by a `.' followed by
+zero-or-more digits.
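+
+   As a rough illustration of what the expanded pattern accepts, the
+following C sketch matches it by hand at the start of a string.  The
+function name `match_float' is hypothetical, and the sketch assumes the
+"C" locale so that `isdigit()' accepts exactly `0' through `9'.
+

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: length of the prefix of s matched by
   ([0-9])+"."([0-9])*, or 0 if the pattern does not match at s. */
size_t match_float(const char *s)
{
    size_t i = 0;

    assert(s != NULL);
    while (isdigit((unsigned char)s[i]))   /* ([0-9])+ */
        ++i;
    if (i == 0 || s[i] != '.')             /* need a digit, then "." */
        return 0;
    ++i;
    while (isdigit((unsigned char)s[i]))   /* ([0-9])* */
        ++i;
    return i;
}
```

+
+   Note that `12.' is accepted (the trailing digits are optional) while
+`.5' is not (at least one leading digit is required).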
+
+   An unindented comment (i.e., a line beginning with `/*') is copied
+verbatim to the output up to the next `*/'.
+
+   Any _indented_ text or text enclosed in `%{' and `%}' is also copied
+verbatim to the output (with the %{ and %} symbols removed).  The %{
+and %} symbols must appear unindented on lines by themselves.
+
+   A `%top' block is similar to a `%{' ... `%}' block, except that the
+code in a `%top' block is relocated to the _top_ of the generated file,
+before any flex definitions (1).  The `%top' block is useful when you
+want certain preprocessor macros to be defined or certain files to be
+included before the generated code.  The single characters `{' and `}'
+delimit the `%top' block, as shown in the example below:
+
+
+         %top{
+             /* This code goes at the "top" of the generated file. */
+             #include <stdint.h>
+             #include <inttypes.h>
+         }
+
+   Multiple `%top' blocks are allowed, and their order is preserved.
+
+   ---------- Footnotes ----------
+
+   (1) Actually, `yyIN_HEADER' is defined before the `%top' block.
+
+
+File: flex.info,  Node: Rules Section,  Next: User Code Section,  Prev: Definitions Section,  Up: Format
+
+Format of the Rules Section
+===========================
+
+   The "rules" section of the `flex' input contains a series of rules
+of the form:
+
+
+         pattern   action
+
+   where the pattern must be unindented and the action must begin on
+the same line.  *Note Patterns::, for a further description of patterns
+and actions.
+
+   In the rules section, any indented or %{ %} enclosed text appearing
+before the first rule may be used to declare variables which are local
+to the scanning routine and (after the declarations) code which is to be
+executed whenever the scanning routine is entered.  Other indented or
+%{ %} text in the rule section is still copied to the output, but its
+meaning is not well-defined and it may well cause compile-time errors
+(this feature is present for POSIX compliance. *Note Lex and Posix::,
+for other such features).
+
+   Any _indented_ text or text enclosed in `%{' and `%}' is copied
+verbatim to the output (with the %{ and %} symbols removed).  The %{
+and %} symbols must appear unindented on lines by themselves.
+
+
+File: flex.info,  Node: User Code Section,  Next: Comments in the Input,  Prev: Rules Section,  Up: Format
+
+Format of the User Code Section
+===============================
+
+   The user code section is simply copied to `lex.yy.c' verbatim.  It
+is used for companion routines which call or are called by the scanner.
+The presence of this section is optional; if it is missing, the second
+`%%' in the input file may be skipped, too.
+
+
+File: flex.info,  Node: Comments in the Input,  Prev: User Code Section,  Up: Format
+
+Comments in the Input
+=====================
+
+   Flex supports C-style comments, that is, anything between /* and */
+is considered a comment. Whenever flex encounters a comment, it copies
+the entire comment verbatim to the generated source code. Comments may
+appear just about anywhere, but with the following exceptions:
+
+   * Comments may not appear in the Rules Section wherever flex is
+     expecting a regular expression. This means comments may not appear
+     at the beginning of a line, or immediately following a list of
+     scanner states.
+
+   * Comments may not appear on an `%option' line in the Definitions
+     Section.
+
+   If you want to follow a simple rule, then always begin a comment on a
+new line, with one or more whitespace characters before the initial
+`/*'.  This rule will work anywhere in the input file.
+
+   All the comments in the following example are valid:
+
+
+     %{
+     /* code block */
+     %}
+     
+     /* Definitions Section */
+     %x STATE_X
+     
+     %%
+         /* Rules Section */
+     ruleA   /* after regex */ { /* code block */ } /* after code block */
+             /* Rules Section (indented) */
+     <STATE_X>{
+     ruleC   ECHO;
+     ruleD   ECHO;
+     %{
+     /* code block */
+     %}
+     }
+     %%
+     /* User Code Section */
+
+
+File: flex.info,  Node: Patterns,  Next: Matching,  Prev: Format,  Up: Top
+
+Patterns
+********
+
+   The patterns in the input (see *Note Rules Section::) are written
+using an extended set of regular expressions.  These are:
+
+`x'
+     match the character 'x'
+
+`.'
+     any character (byte) except newline
+
+`[xyz]'
+     a "character class"; in this case, the pattern matches either an
+     'x', a 'y', or a 'z'
+
+`[abj-oZ]'
+     a "character class" with a range in it; matches an 'a', a 'b', any
+     letter from 'j' through 'o', or a 'Z'
+
+`[^A-Z]'
+     a "negated character class", i.e., any character but those in the
+     class.  In this case, any character EXCEPT an uppercase letter.
+
+`[^A-Z\n]'
+     any character EXCEPT an uppercase letter or a newline
+
+`r*'
+     zero or more r's, where r is any regular expression
+
+`r+'
+     one or more r's
+
+`r?'
+     zero or one r's (that is, "an optional r")
+
+`r{2,5}'
+     anywhere from two to five r's
+
+`r{2,}'
+     two or more r's
+
+`r{4}'
+     exactly 4 r's
+
+`{name}'
+     the expansion of the `name' definition (*note Format::).
+
+`"[xyz]\"foo"'
+     the literal string: `[xyz]"foo'
+
+`\X'
+     if X is `a', `b', `f', `n', `r', `t', or `v', then the ANSI-C
+     interpretation of `\x'.  Otherwise, a literal `X' (used to escape
+     operators such as `*')
+
+`\0'
+     a NUL character (ASCII code 0)
+
+`\123'
+     the character with octal value 123
+
+`\x2a'
+     the character with hexadecimal value 2a
+
+`(r)'
+     match an `r'; parentheses are used to override precedence (see
+     below)
+
+`rs'
+     the regular expression `r' followed by the regular expression `s';
+     called "concatenation"
+
+`r|s'
+     either an `r' or an `s'
+
+`r/s'
+     an `r' but only if it is followed by an `s'.  The text matched by
+     `s' is included when determining whether this rule is the longest
+     match, but is then returned to the input before the action is
+     executed.  So the action only sees the text matched by `r'.  This
+     type of pattern is called "trailing context".  (There are some
+     combinations of `r/s' that flex cannot match correctly. *Note
+     Limitations::, regarding dangerous trailing context.)
+
+`^r'
+     an `r', but only at the beginning of a line (i.e., when just
+     starting to scan, or right after a newline has been scanned).
+
+`r$'
+     an `r', but only at the end of a line (i.e., just before a
+     newline).  Equivalent to `r/\n'.
+
+     Note that `flex''s notion of "newline" is exactly whatever the C
+     compiler used to compile `flex' interprets `\n' as; in particular,
+     on some DOS systems you must either filter out `\r's in the input
+     yourself, or explicitly use `r/\r\n' for `r$'.
+
+`<s>r'
+     an `r', but only in start condition `s' (see *Note Start
+     Conditions:: for discussion of start conditions).
+
+`<s1,s2,s3>r'
+     same, but in any of start conditions `s1', `s2', or `s3'.
+
+`<*>r'
+     an `r' in any start condition, even an exclusive one.
+
+`<<EOF>>'
+     an end-of-file.
+
+`<s1,s2><<EOF>>'
+     an end-of-file when in start condition `s1' or `s2'
+
+   Note that inside of a character class, all regular expression
+operators lose their special meaning except escape (`\') and the
+character class operators, `-', `]', and, at the beginning of the
+class, `^'.
+
+   The regular expressions listed above are grouped according to
+precedence, from highest precedence at the top to lowest at the bottom.
+Those grouped together have equal precedence (see special note on the
+precedence of the repeat operator, `{}', under the documentation for
+the `--posix' POSIX compliance option).  For example,
+
+
+         foo|bar*
+
+   is the same as
+
+
+         (foo)|(ba(r*))
+
+   since the `*' operator has higher precedence than concatenation, and
+concatenation higher than alternation (`|').  This pattern therefore
+matches _either_ the string `foo' _or_ the string `ba' followed by
+zero-or-more `r''s.  To match `foo' or zero-or-more repetitions of the
+string `bar', use:
+
+
+         foo|(bar)*
+
+   And to match a sequence of zero or more repetitions of `foo' and
+`bar':
+
+
+         (foo|bar)*
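+
+   The difference between these groupings can be checked with a small
+hand-written C sketch.  The two function names are hypothetical; each
+returns the length of the prefix that its pattern matches.  This is not
+how `flex' itself matches patterns, only a way to see the effect of
+precedence.
+

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* foo|bar*  which groups as  (foo)|(ba(r*)).
   Returns the match length, or -1 when the pattern cannot match at s. */
long match_foo_or_ba_rstar(const char *s)
{
    assert(s != NULL);
    if (strncmp(s, "foo", 3) == 0)
        return 3;
    if (strncmp(s, "ba", 2) == 0)
        return 2 + (long)strspn(s + 2, "r");   /* "ba" then zero or more r's */
    return -1;
}

/* (foo|bar)*  always matches; the result is 0 for the empty match. */
long match_foo_bar_star(const char *s)
{
    long n = 0;

    assert(s != NULL);
    while (strncmp(s + n, "foo", 3) == 0 || strncmp(s + n, "bar", 3) == 0)
        n += 3;                                /* one more repetition */
    return n;
}
```

+
+   On the input `barbar', the first function stops after `bar' (three
+characters), while the second consumes all six.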
+
+   In addition to characters and ranges of characters, character classes
+can also contain "character class expressions".  These are expressions
+enclosed inside `[:' and `:]' delimiters (which themselves must appear
+between the `[' and `]' of the character class; other elements may
+occur inside the character class, too).  The valid expressions are:
+
+
+         [:alnum:] [:alpha:] [:blank:]
+         [:cntrl:] [:digit:] [:graph:]
+         [:lower:] [:print:] [:punct:]
+         [:space:] [:upper:] [:xdigit:]
+
+   These expressions all designate a set of characters equivalent to the
+corresponding standard C `isXXX' function.  For example, `[:alnum:]'
+designates those characters for which `isalnum()' returns true - i.e.,
+any alphabetic or numeric character.  Some systems don't provide
+`isblank()', so flex defines `[:blank:]' as a blank or a tab.
+
+   For example, the following character classes are all equivalent:
+
+
+         [[:alnum:]]
+         [[:alpha:][:digit:]]
+         [[:alpha:][0-9]]
+         [a-zA-Z0-9]
+
+   Some notes on patterns are in order.
+
+   * If your scanner is case-insensitive (the `-i' flag), then
+     `[:upper:]' and `[:lower:]' are equivalent to `[:alpha:]'.
+
+   * Character classes with ranges, such as `[a-Z]', should be used with
+     caution in a case-insensitive scanner if the range spans upper or
+     lowercase characters. Flex does not know if you want to fold all
+     upper and lowercase characters together, or if you want the
+     literal numeric range specified (with no case folding). When in
+     doubt, flex will assume that you meant the literal numeric range,
+     and will issue a warning. The exception to this rule is a
+     character range such as `[a-z]' or `[S-W]' where it is obvious
+     that you want case-folding to occur. Here are some examples with
+     the `-i' flag enabled:
+
+     Range        Result      Literal Range        Alternate Range
+     `[a-t]'      ok          `[a-tA-T]'           
+     `[A-T]'      ok          `[a-tA-T]'           
+     `[A-t]'      ambiguous   `[A-Z\[\\\]_`a-t]'   `[a-tA-T]'
+     `[_-{]'      ambiguous   `[_`a-z{]'           `[_`a-zA-Z{]'
+     `[@-C]'      ambiguous   `[@ABC]'             `[@A-Z\[\\\]_`abc]'
+
+   * A negated character class such as the example `[^A-Z]' above
+     _will_ match a newline unless `\n' (or an equivalent escape
+     sequence) is one of the characters explicitly present in the
+     negated character class (e.g., `[^A-Z\n]').  This is unlike how
+     many other regular expression tools treat negated character
+     classes, but unfortunately the inconsistency is historically
+     entrenched.  Matching newlines means that a pattern like `[^"]*'
+     can match the entire input unless there's another quote in the
+     input.
+
+   * A rule can have at most one instance of trailing context (the `/'
+     operator or the `$' operator).  The start condition, `^', and
+     `<<EOF>>' patterns can only occur at the beginning of a pattern,
+     and, as well as with `/' and `$', cannot be grouped inside
+     parentheses.  A `^' which does not occur at the beginning of a
+     rule or a `$' which does not occur at the end of a rule loses its
+     special properties and is treated as a normal character.
+
+   * The following are invalid:
+
+
+              foo/bar$
+              <sc1>foo<sc2>bar
+
+     Note that the first of these can be written `foo/bar\n'.
+
+   * The following will result in `$' or `^' being treated as a normal
+     character:
+
+
+              foo|(bar$)
+              foo|^bar
+
+     If the desired meaning is a `foo' or a
+     `bar'-followed-by-a-newline, the following could be used (the
+     special `|' action is explained below, *note Actions::):
+
+
+              foo      |
+              bar$     /* action goes here */
+
+     A similar trick will work for matching a `foo' or a
+     `bar'-at-the-beginning-of-a-line.
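+
+   The note above about negated character classes matching newlines
+can be illustrated with the standard C function `strcspn()'.  The two
+wrapper names are hypothetical; they return the number of characters
+that `[^"]*' and `[^"\n]*' would match at the start of a string.
+

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length matched by [^"]* at s: stops only at a double quote (or the end). */
size_t span_not_quote(const char *s)
{
    assert(s != NULL);
    return strcspn(s, "\"");
}

/* Length matched by [^"\n]* at s: stops at a double quote or a newline. */
size_t span_not_quote_or_newline(const char *s)
{
    assert(s != NULL);
    return strcspn(s, "\"\n");
}
```

+
+   For the text `ab', newline, `cd"e', the first form runs across the
+newline and matches five characters, while the second stops after two.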
+
+
+File: flex.info,  Node: Matching,  Next: Actions,  Prev: Patterns,  Up: Top
+
+How the Input Is Matched
+************************
+
+   When the generated scanner is run, it analyzes its input looking for
+strings which match any of its patterns.  If it finds more than one
+match, it takes the one matching the most text (for trailing context
+rules, this includes the length of the trailing part, even though it
+will then be returned to the input).  If it finds two or more matches of
+the same length, the rule listed first in the `flex' input file is
+chosen.
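+
+   The selection policy (longest match first, then earliest rule) can
+be sketched in C as follows.  A generated scanner reaches the same
+decision with a single table-driven automaton rather than by trying
+each rule in turn; the rule numbers and function names here are
+hypothetical, and the "C" locale is assumed.
+

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule numbers, in the order the rules would be listed. */
enum { RULE_NONE = -1, RULE_KEYWORD_IF = 0, RULE_IDENTIFIER = 1 };

/* Rule 0: the literal keyword "if". */
static size_t match_if(const char *s)
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

/* Rule 1: [a-z][a-z0-9]* */
static size_t match_identifier(const char *s)
{
    size_t i;

    if (!islower((unsigned char)s[0]))
        return 0;
    for (i = 1; islower((unsigned char)s[i]) || isdigit((unsigned char)s[i]); ++i)
        ;
    return i;
}

/* Pick the rule with the longest match; on a tie keep the earlier rule.
   Stores the match length in *len and returns the rule number. */
int choose_rule(const char *s, size_t *len)
{
    size_t (*rules[])(const char *) = { match_if, match_identifier };
    int best = RULE_NONE;

    assert(s != NULL && len != NULL);
    *len = 0;
    for (int r = 0; r < 2; ++r) {
        size_t n = rules[r](s);
        if (n > *len) {      /* strictly longer only, so ties keep the earlier rule */
            *len = n;
            best = r;
        }
    }
    return best;
}
```

+
+   With this sketch, `if' followed by a blank selects the keyword rule
+(both rules match two characters and the keyword rule is listed first),
+whereas `iffy' selects the identifier rule because it matches more text.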
+
+   Once the match is determined, the text corresponding to the match
+(called the "token") is made available in the global character pointer
+`yytext', and its length in the global integer `yyleng'.  The "action"
+corresponding to the matched pattern is then executed (*note
+Actions::), and then the remaining input is scanned for another match.
+
+   If no match is found, then the "default rule" is executed: the next
+character in the input is considered matched and copied to the standard
+output.  Thus, the simplest valid `flex' input is:
+
+
+         %%
+
+   which generates a scanner that simply copies its input (one
+character at a time) to its output.
+
+   Note that `yytext' can be defined in two different ways: either as a
+character _pointer_ or as a character _array_. You can control which
+definition `flex' uses by including one of the special directives
+`%pointer' or `%array' in the first (definitions) section of your flex
+input.  The default is `%pointer', unless you use the `-l' lex
+compatibility option, in which case `yytext' will be an array.  The
+advantage of using `%pointer' is substantially faster scanning and no
+buffer overflow when matching very large tokens (unless you run out of
+dynamic memory).  The disadvantage is that you are restricted in how
+your actions can modify `yytext' (*note Actions::), and calls to the
+`unput()' function destroy the present contents of `yytext', which can
+be a considerable porting headache when moving between different `lex'
+versions.
+
+   The advantage of `%array' is that you can then modify `yytext' to
+your heart's content, and calls to `unput()' do not destroy `yytext'
+(*note Actions::).  Furthermore, existing `lex' programs sometimes
+access `yytext' externally using declarations of the form:
+
+
+         extern char yytext[];
+
+   This definition is erroneous when used with `%pointer', but correct
+for `%array'.
+
+   The `%array' declaration defines `yytext' to be an array of `YYLMAX'
+characters, which defaults to a fairly large value.  You can change the
+size by simply #define'ing `YYLMAX' to a different value in the first
+section of your `flex' input.  As mentioned above, with `%pointer'
+yytext grows dynamically to accommodate large tokens.  While this means
+your `%pointer' scanner can accommodate very large tokens (such as
+matching entire blocks of comments), bear in mind that each time the
+scanner must resize `yytext' it also must rescan the entire token from
+the beginning, so matching such tokens can prove slow.  `yytext'
+presently does _not_ dynamically grow if a call to `unput()' results in
+too much text being pushed back; instead, a run-time error results.
+
+   Also note that you cannot use `%array' with C++ scanner classes
+(*note Cxx::).
+
+
+File: flex.info,  Node: Actions,  Next: Generated Scanner,  Prev: Matching,  Up: Top
+
+Actions
+*******
+
+   Each pattern in a rule has a corresponding "action", which can be
+any arbitrary C statement.  The pattern ends at the first non-escaped
+whitespace character; the remainder of the line is its action.  If the
+action is empty, then when the pattern is matched the input token is
+simply discarded.  For example, here is the specification for a program
+which deletes all occurrences of `zap me' from its input:
+
+
+         %%
+         "zap me"
+
+   This example will copy all other characters in the input to the
+output since they will be matched by the default rule.
+
+   Here is a program which compresses multiple blanks and tabs down to a
+single blank, and throws away whitespace found at the end of a line:
+
+
+         %%
+         [ \t]+        putchar( ' ' );
+         [ \t]+$       /* ignore this token */
+
+   If the action contains a `{', then the action spans till the
+balancing `}' is found, and the action may cross multiple lines.
+`flex' knows about C strings and comments and won't be fooled by braces
+found within them, but also allows actions to begin with `%{' and will
+consider the action to be all the text up to the next `%}' (regardless
+of ordinary braces inside the action).
+
+   An action consisting solely of a vertical bar (`|') means "same as
+the action for the next rule".  See below for an illustration.
+
+   Actions can include arbitrary C code, including `return' statements
+to return a value to whatever routine called `yylex()'.  Each time
+`yylex()' is called it continues processing tokens from where it last
+left off until it either reaches the end of the file or executes a
+return.
+
+   Actions are free to modify `yytext' except for lengthening it
+(adding characters to its end; these will overwrite later characters in
+the input stream).  This restriction does not apply when using `%array'
+(*note Matching::).  In that case, `yytext' may be freely modified in
+any way.
+
+   Actions are free to modify `yyleng' except they should not do so if
+the action also includes use of `yymore()' (see below).
+
+   There are a number of special directives which can be included
+within an action:
+
+`ECHO'
+     copies `yytext' to the scanner's output.
+
+`BEGIN'
+     followed by the name of a start condition places the scanner in the
+     corresponding start condition (see below).
+
+`REJECT'
+     directs the scanner to proceed on to the "second best" rule which
+     matched the input (or a prefix of the input).  The rule is chosen
+     as described above in *Note Matching::, and `yytext' and `yyleng'
+     set up appropriately.  It may either be one which matched as much
+     text as the originally chosen rule but came later in the `flex'
+     input file, or one which matched less text.  For example, the
+     following will both count the words in the input and call the
+     routine `special()' whenever `frob' is seen:
+
+
+                      int word_count = 0;
+              %%
+          
+              frob        special(); REJECT;
+              [^ \t\n]+   ++word_count;
+
+     Without the `REJECT', any occurrences of `frob' in the input would
+     not be counted as words, since the scanner normally executes only
+     one action per token.  Multiple uses of `REJECT' are allowed, each
+     one finding the next best choice to the currently active rule.  For
+     example, when the following scanner scans the token `abcd', it will
+     write `abcdabcaba' to the output:
+
+
+              %%
+              a        |
+              ab       |
+              abc      |
+              abcd     ECHO; REJECT;
+              .|\n     /* eat up any unmatched character */
+
+     The first three rules share the fourth's action since they use the
+     special `|' action.
+
+     `REJECT' is a particularly expensive feature in terms of scanner
+     performance; if it is used in _any_ of the scanner's actions it
+     will slow down _all_ of the scanner's matching.  Furthermore,
+     `REJECT' cannot be used with the `-Cf' or `-CF' options (*note
+     Scanner Options::).
+
+     Note also that unlike the other special actions, `REJECT' is a
+     _branch_: code immediately following it in the action will _not_
+     be executed.
+
+`yymore()'
+     tells the scanner that the next time it matches a rule, the
+     corresponding token should be _appended_ onto the current value of
+     `yytext' rather than replacing it.  For example, given the input
+     `mega-kludge' the following will write `mega-mega-kludge' to the
+     output:
+
+
+              %%
+              mega-    ECHO; yymore();
+              kludge   ECHO;
+
+     First `mega-' is matched and echoed to the output.  Then `kludge'
+     is matched, but the previous `mega-' is still hanging around at the
+     beginning of `yytext' so the `ECHO' for the `kludge' rule will
+     actually write `mega-kludge'.
+
+   Two notes regarding use of `yymore()'.  First, `yymore()' depends on
+the value of `yyleng' correctly reflecting the size of the current
+token, so you must not modify `yyleng' if you are using `yymore()'.
+Second, the presence of `yymore()' in the scanner's action entails a
+minor performance penalty in the scanner's matching speed.
+
+   `yyless(n)' returns all but the first `n' characters of the current
+token back to the input stream, where they will be rescanned when the
+scanner looks for the next match.  `yytext' and `yyleng' are adjusted
+appropriately (e.g., `yyleng' will now be equal to `n').  For example,
+on the input `foobar' the following will write out `foobarbar':
+
+
+         %%
+         foobar    ECHO; yyless(3);
+         [a-z]+    ECHO;
+
+   An argument of 0 to `yyless()' will cause the entire current input
+string to be scanned again.  Unless you've changed how the scanner will
+subsequently process its input (using `BEGIN', for example), this will
+result in an endless loop.
+
+   Note that `yyless()' is a macro and can only be used in the flex
+input file, not from other source files.
+
+   `unput(c)' puts the character `c' back onto the input stream.  It
+will be the next character scanned.  The following action will take the
+current token and cause it to be rescanned enclosed in parentheses:
+
+
+         {
+         int i;
+         /* Copy yytext because unput() trashes yytext */
+         char *yycopy = strdup( yytext );
+         unput( ')' );
+         for ( i = yyleng - 1; i >= 0; --i )
+             unput( yycopy[i] );
+         unput( '(' );
+         free( yycopy );
+         }
+
+   Note that since each `unput()' puts the given character back at the
+_beginning_ of the input stream, pushing back strings must be done
+back-to-front.
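+
+   The back-to-front rule follows from pushed-back characters behaving
+like a stack: the last character pushed is the first one read again.
+The sketch below is not `flex' code and does not use the scanner's
+buffer.  It is a small stand-alone C model, and the names `pb_unput()',
+`pb_input()', `wrap_in_parens()', `pb_drain()' and `PB_CAP' are all
+hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PB_CAP 64              /* hypothetical fixed push-back capacity */

char pb_stack[PB_CAP];         /* most recently pushed character is on top */
size_t pb_top = 0;

/* Model of unput(): the pushed character becomes the next one read. */
void pb_unput(char c)
{
    assert(pb_top < PB_CAP);   /* pushing back too much is an error */
    pb_stack[pb_top++] = c;
}

/* Model of input(): return the next character, or -1 when none is left. */
int pb_input(void)
{
    return pb_top > 0 ? (unsigned char)pb_stack[--pb_top] : -1;
}

/* Push back "(token)" so that it is re-read from left to right. */
void wrap_in_parens(const char *token)
{
    size_t i = strlen(token);

    pb_unput(')');             /* last character to be read goes in first */
    while (i > 0)
        pb_unput(token[--i]);  /* then the token, back to front */
    pb_unput('(');             /* first character to be read goes in last */
}

/* Read everything back into out (capacity cap, including the NUL). */
void pb_drain(char *out, size_t cap)
{
    size_t n = 0;
    int c;

    while (n + 1 < cap && (c = pb_input()) != -1)
        out[n++] = (char)c;
    out[n] = '\0';
}
```

+   Calling `wrap_in_parens("abc")' and then draining the model with
+`pb_drain()' reads the characters back in the order `(', `a', `b', `c',
+`)'.  Pushing the token front-to-back instead would reverse it.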
+
+   An important potential problem when using `unput()' is that if you
+are using `%pointer' (the default), a call to `unput()' _destroys_ the
+contents of `yytext', starting with its rightmost character and
+devouring one character to the left with each call.  If you need the
+value of `yytext' preserved after a call to `unput()' (as in the above
+example), you must either first copy it elsewhere, or build your
+scanner using `%array' instead (*note Matching::).
+
+   Finally, note that you cannot put back `EOF' to attempt to mark the
+input stream with an end-of-file.
+
+   `input()' reads the next character from the input stream.  For
+example, the following is one way to eat up C comments:
+
+
+         %%
+         "/*"        {
+                     register int c;
+     
+                     for ( ; ; )
+                         {
+                         while ( (c = input()) != '*' &&
+                                 c != EOF )
+                             ;    /* eat up text of comment */
+     
+                         if ( c == '*' )
+                             {
+                             while ( (c = input()) == '*' )
+                                 ;
+                             if ( c == '/' )
+                                 break;    /* found the end */
+                             }
+     
+                         if ( c == EOF )
+                             {
+                             error( "EOF in comment" );
+                             break;
+                             }
+                         }
+                     }
+
+   (Note that if the scanner is compiled using `C++', then `input()' is
+instead referred to as `yyinput()', in order to avoid a name clash with
+the `C++' stream by the name of `input'.)
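+
+   The control flow of the comment-eating action can be tried outside a
+scanner.  The following is a hedged, stand-alone adaptation: instead of
+calling `input()' it walks a NUL-terminated string, and the function
+name `skip_comment_body()' is an assumption made for this sketch.  The
+end of the string plays the role of `EOF'.

```c
#include <assert.h>
#include <stddef.h>

/*
 * text points just past an opening slash-star.  Return the offset just
 * past the closing star-slash, or -1 if the text ends before one is found.
 */
long skip_comment_body(const char *text)
{
    size_t i = 0;
    int c;

    assert(text != NULL);
    for (;;) {
        /* eat up text of comment until a star or the end of the text */
        while ((c = (unsigned char)text[i]) != '*' && c != '\0')
            ++i;
        if (c == '\0')
            return -1;             /* corresponds to "EOF in comment" */

        /* skip the whole run of stars */
        while ((c = (unsigned char)text[i]) == '*')
            ++i;
        if (c == '/')
            return (long)(i + 1);  /* found the end */
        if (c == '\0')
            return -1;             /* text ended right after a star */
        /* any other character: keep looking for the next star */
    }
}
```

+   For the text ` a **/ x' the function returns 6, the offset of the
+character after the closing delimiter.  For `abc', which never closes
+the comment, it returns -1.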
+
+   `YY_FLUSH_BUFFER()' flushes the scanner's internal buffer so that
+the next time the scanner attempts to match a token, it will first
+refill the buffer using `YY_INPUT()' (*note Generated Scanner::).  This
+action is a special case of the more general `yy_flush_buffer()'
+function, described below (*note Multiple Input Buffers::).
+
+   `yyterminate()' can be used in lieu of a return statement in an
+action.  It terminates the scanner and returns a 0 to the scanner's
+caller, indicating "all done".  By default, `yyterminate()' is also
+called when an end-of-file is encountered.  It is a macro and may be
+redefined.
+
+
+File: flex.info,  Node: Generated Scanner,  Next: Start Conditions,  Prev: Actions,  Up: Top
+
+The Generated Scanner
+*********************
+
+   The output of `flex' is the file `lex.yy.c', which contains the
+scanning routine `yylex()', a number of tables used by it for matching
+tokens, and a number of auxiliary routines and macros.  By default,
+`yylex()' is declared as follows:
+
+
+         int yylex()
+             {
+             ... various definitions and the actions in here ...
+             }
+
+   (If your environment supports function prototypes, then it will be
+`int yylex( void )'.)  This definition may be changed by defining the
+`YY_DECL' macro.  For example, you could use:
+
+
+         #define YY_DECL float lexscan( a, b ) float a, b;
+
+   to give the scanning routine the name `lexscan', returning a float,
+and taking two floats as arguments.  Note that if you give arguments to
+the scanning routine using a K&R-style/non-prototyped function
+declaration, you must terminate the definition with a semi-colon (;).
+
+   `flex' generates `C99' function definitions by default.  However, `flex'
+does have the ability to generate obsolete, er, `traditional', function
+definitions.  This is to support bootstrapping gcc on old systems.
+Unfortunately, traditional definitions prevent us from using any
+standard data types smaller than int (such as short, char, or bool) as
+function arguments.  For this reason, future versions of `flex' may
+generate standard C99 code only, leaving K&R-style functions to the
+historians.  Currently, if you do *not* want `C99' definitions, then
+you must use `%option noansi-definitions'.
+
+   Whenever `yylex()' is called, it scans tokens from the global input
+file `yyin' (which defaults to stdin).  It continues until it either
+reaches an end-of-file (at which point it returns the value 0) or one
+of its actions executes a `return' statement.
+
+   If the scanner reaches an end-of-file, subsequent calls are undefined
+unless either `yyin' is pointed at a new input file (in which case
+scanning continues from that file), or `yyrestart()' is called.
+`yyrestart()' takes one argument, a `FILE *' pointer (which can be
+NULL, if you've set up `YY_INPUT' to scan from a source other than
+`yyin'), and initializes `yyin' for scanning from that file.
+Essentially there is no difference between just assigning `yyin' to a
+new input file and using `yyrestart()' to do so; the latter is available
+for compatibility with previous versions of `flex', and because it can
+be used to switch input files in the middle of scanning.  It can also
+be used to throw away the current input buffer, by calling it with an
+argument of `yyin'; but it would be better to use `YY_FLUSH_BUFFER'
+(*note Actions::).  Note that `yyrestart()' does _not_ reset the start
+condition to `INITIAL' (*note Start Conditions::).
+
+   If `yylex()' stops scanning due to executing a `return' statement in
+one of the actions, the scanner may then be called again and it will
+resume scanning where it left off.
+
+   By default (and for purposes of efficiency), the scanner uses
+block-reads rather than simple `getc()' calls to read characters from
+`yyin'.  The nature of how it gets its input can be controlled by
+defining the `YY_INPUT' macro.  The calling sequence for `YY_INPUT()'
+is `YY_INPUT(buf,result,max_size)'.  Its action is to place up to
+`max_size' characters in the character array `buf' and return in the
+integer variable `result' either the number of characters read or the
+constant `YY_NULL' (0 on Unix systems) to indicate `EOF'.  The default
+`YY_INPUT' reads from the global file-pointer `yyin'.
+
+   Here is a sample definition of `YY_INPUT' (in the definitions
+section of the input file):
+
+
+         %{
+         #define YY_INPUT(buf,result,max_size) \
+             { \
+             int c = getchar(); \
+             result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
+             }
+         %}
+
+   This definition will change the input processing to occur one
+character at a time.
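+
+   The contract described above (place at most `max_size' characters
+in `buf', report how many were stored, report 0 at end of input) can be
+illustrated without a scanner.  The sketch below is an assumption-laden
+model, not part of `flex': `struct string_source' and
+`string_source_read()' are hypothetical names, and the data comes from
+an in-memory string rather than from `yyin'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory input source; not a flex data structure. */
struct string_source {
    const char *data;   /* text to hand out */
    size_t len;         /* total number of bytes in data */
    size_t pos;         /* number of bytes already handed out */
};

/*
 * Copy up to max_size bytes into buf and advance the source.
 * Return the number of bytes copied; 0 means end of input.
 */
size_t string_source_read(struct string_source *src, char *buf,
                          size_t max_size)
{
    size_t left, n;

    assert(src != NULL && buf != NULL);
    left = src->len - src->pos;
    n = left < max_size ? left : max_size;
    memcpy(buf, src->data + src->pos, n);
    src->pos += n;
    return n;
}
```

+   In a real scanner, a user-supplied `YY_INPUT' could call such a
+function and store its return value in `result', with a return of 0
+playing the role of `YY_NULL'.  That wiring is an assumption of this
+sketch; for the supported way to scan in-memory text, *note Scanning
+Strings::.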
+
+   When the scanner receives an end-of-file indication from `YY_INPUT', it
+then checks the `yywrap()' function.  If `yywrap()' returns false
+(zero), then it is assumed that the function has gone ahead and set up
+`yyin' to point to another input file, and scanning continues.  If it
+returns true (non-zero), then the scanner terminates, returning 0 to
+its caller.  Note that in either case, the start condition remains
+unchanged; it does _not_ revert to `INITIAL'.
+
+   If you do not supply your own version of `yywrap()', then you must
+either use `%option noyywrap' (in which case the scanner behaves as
+though `yywrap()' returned 1), or you must link with `-lfl' to obtain
+the default version of the routine, which always returns 1.
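+
+   As a rough illustration of the first case, a user-supplied
+`yywrap()' might step through a list of file names.  The sketch below
+is stand-alone C under stated assumptions: it declares its own `yyin'
+so that it compiles outside a scanner (a generated scanner already
+provides `yyin', so that line would be dropped there), and the
+`input_files' list is a hypothetical variable, not something `flex'
+defines.

```c
#include <assert.h>
#include <stdio.h>

FILE *yyin;                 /* stand-in: a real scanner already defines yyin */

const char **input_files;   /* hypothetical NULL-terminated list of names */

/*
 * Return 0 after pointing yyin at the next file that can be opened,
 * or 1 when no files remain, so that the scanner terminates.
 */
int yywrap(void)
{
    assert(input_files != NULL);

    /* close the file that just hit end-of-file, but never stdin */
    if (yyin != NULL && yyin != stdin) {
        fclose(yyin);
        yyin = NULL;
    }

    while (*input_files != NULL) {
        yyin = fopen(*input_files++, "r");
        if (yyin != NULL)
            return 0;       /* more input: scanning continues */
    }
    return 1;               /* nothing left: behave like the default */
}
```

+   Files that cannot be opened are skipped silently here; a real
+program would probably want to report them.  As noted above, the start
+condition is left unchanged when scanning moves on to the next file.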
+
+   For scanning from in-memory buffers (e.g., scanning strings), see
+*Note Scanning Strings::, and *Note Multiple Input Buffers::.
+
+   The scanner writes its `ECHO' output to the `yyout' global (by
+default, `stdout'), which the user may redirect simply by assigning
+some other `FILE' pointer to it.
+