*** empty log message ***

author: Vern Paxson <vern@ee.lbl.gov> 1990-02-26 17:59:14 +0000
committer: Vern Paxson <vern@ee.lbl.gov> 1990-02-26 17:59:14 +0000
commit: a4b5e58b7d2d495a77bc9f0ff4f1b0cda166626e (patch)
tree: a5a7d5bb4bf3d1e2690fd445a2bd1a7448fb785e /flex.1
parent: e234671be07e4dfa0ac24892392898da169007cb (diff)
1 files changed, 84 insertions, 50 deletions
diff --git a/flex.1 b/flex.1
index 2eef948..a05ca27 100644
--- a/flex.1
+++ b/flex.1
@@ -117,9 +117,9 @@ A somewhat more complicated example:
 
     "+"|"-"|"*"|"/"   printf( "An operator: %s\\n", yytext );
 
-    "{"[^}\\n]*"}"   /* eat up one-line comments */
+    "{"[^}\\n]*"}"     /* eat up one-line comments */
 
-    [ \\t\\n]+       /* eat up whitespace */
+    [ \\t\\n]+          /* eat up whitespace */
 
     .           printf( "Unrecognized character: %s\\n", yytext );
 
@@ -149,8 +149,9 @@ sections.
 .SH FORMAT OF THE INPUT FILE
 The
 .I flex
-input file consists of three sections, separated by
-.B %%:
+input file consists of three sections, separated by a line with just
+.B %%
+in it:
 .nf
 
     definitions
@@ -164,7 +165,7 @@ The
 .I definitions
 section contains declarations of simple
 .I name
-definitions to simplify the scanner specification and of
+definitions to simplify the scanner specification, and declarations of
 .I start conditions,
 which are explained in a later section.
 .LP
@@ -174,11 +175,11 @@ Name definitions have the form:
     name definition
 
 .fi
-The "name" is a word beginning with a letter or a '_'
-followed by zero or more letters, digits, '_', or '-'.
-The definition is taken to begin at the first non-white-space
-following the name and continue to the end of the line.
-Definition can subsequently be referred to using "{name}", which
+The "name" is a word beginning with a letter or an underscore ('_')
+followed by zero or more letters, digits, '_', or '-' (dash).
+The definition is taken to begin at the first non-white-space character
+following the name and to continue to the end of the line.
+The definition can subsequently be referred to using "{name}", which
 will expand to "(definition)".  For example,
 .nf
 
@@ -189,7 +190,7 @@ will expand to "(definition)".  For example,
 defines "DIGIT" to be a regular expression which matches a
 single digit, and
 "ID" to be a regular expression which matches a letter
-followed by zero-or-more letters or digits.
+followed by zero-or-more letters-or-digits.
 A subsequent reference to
 .nf
 
@@ -241,7 +242,7 @@ The %{}'s must appear unindented on lines by themselves.
 In the rules section,
 any indented or %{} text appearing before the
 first rule may be used to declare variables
-which are local to the scanning routine, and, after the declarations,
+which are local to the scanning routine and (after the declarations)
 code which is to be executed whenever the scanning routine is entered.
 Other indented or %{} text in the rule section is still copied to the output,
 but its meaning is not well-defined and it may well cause compile-time
@@ -251,7 +252,8 @@ compliance; see below for other such features).
 .LP
 In the definitions section, an unindented comment (i.e., a line
 beginning with "/*") is also copied verbatim to the output up
-to the next "*/".  Also, any line beginning with '#' is ignored.
+to the next "*/".  Also, any line in the definitions section
+beginning with '#' is ignored.
 .SH PATTERNS
 The patterns in the input are written using an extended set of regular
 expressions.  These are:
@@ -259,18 +261,16 @@ expressions.  These are:
 
     x          match the character 'x'
     .          any character except newline
-    [xyz]      an 'x', a 'y', or a 'z'
-    [abj-oZ]   an 'a', a 'b', any letter
-               from 'j' through 'o', or a 'Z'
-    [^A-Z]     any character EXCEPT an uppercase letter,
-               including a newline (unlike how many other
-               regular expression tools treat the '^'!).
-               This means that a pattern like [^"]* will
-               match an entire file (overflowing the input
-               buffer) unless there's another quote in
-               the input.
+    [xyz]      a "character class"; in this case, the pattern
+	         matches either an 'x', a 'y', or a 'z'
+    [abj-oZ]   a "character class" with a range in it; matches
+	         an 'a', a 'b', any letter from 'j' through 'o',
+                 or a 'Z'
+    [^A-Z]     a "negated character class", i.e., any character
+	         but those in the class.  In this case, any
+	         character EXCEPT an uppercase letter.
     [^A-Z\\n]   any character EXCEPT an uppercase letter or
-               a newline
+                 a newline
     r*         zero or more r's, where r is any regular expression
     r+         one or more r's
     r?         zero or one r's (that is, "an optional r")
@@ -281,32 +281,29 @@ expressions.  These are:
                (see above)
     "[xyz]\\"foo"
                the literal string: [xyz]"foo
-    \\x         if x is an 'a', 'b', 'f', 'n', 'r',
-               't', or 'v', then the ANSI-C
-               interpretation of \\x.  Otherwise,
-               a literal 'x' (used to escape
-               operators such as '*')
-    \\123      the character with octal value 123
-    \\x2a      the character with hexadecimal value 2a
-    (r)        match an r; parentheses are used
-               to override precedence (see below)
+    \\X         if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
+		 then the ANSI-C interpretation of \\X.
+		 Otherwise, a literal 'X' (used to escape
+                 operators such as '*')
+    \\123       the character with octal value 123
+    \\x2a       the character with hexadecimal value 2a
+    (r)        match an r; parentheses are used to override
+		 precedence (see below)
 
 
-    rs         the regular expression r followed
-               by the regular expression s; called
-               "concatenation"
+    rs         the regular expression r followed by the
+		 regular expression s; called "concatenation"
 
 
     r|s        either an r or an s
 
 
-    r/s        an r but only if it is followed by
-               an s.  The s is not part of the
-               matched text.  This type of
-               pattern is known as "trailing context".
+    r/s        an r but only if it is followed by an s.  The
+		 s is not part of the matched text.  This type
+		 of pattern is called "trailing context".
     ^r         an r, but only at the beginning of a line
-    r$         an r, but only at the end of a line
-               (r must not use trailing context)
+    r$         an r, but only at the end of a line.  Equivalent
+		 to "r/\\n".
 
 
     <s>r       an r, but only in start condition s (see
@@ -348,12 +345,40 @@ To match "foo" or zero-or-more "bar"'s, use:
     foo|(bar)*
 
 .fi
-and to match zero-or-more "foo"'s or "bar"'s:
+and to match zero-or-more "foo"'s-or-"bar"'s:
 .nf
 
     (foo|bar)*
 
 .fi
+.LP
+Some notes on patterns:
+.IP -
+A negated character class such as the example "[^A-Z]"
+above
+.I will match a newline
+unless "\\n" (or an equivalent escape sequence) is one of the
+characters explicitly present in the negated character class
+(e.g., "[^A-Z\\n]").  This is unlike how many other regular
+expression tools treat negated character classes, but unfortunately
+the inconsistency is historically entrenched.
+Matching newlines means that a pattern like [^"]* can match an entire
+input (overflowing the scanner's input buffer) unless there's another
+quote in the input.
+.IP -
+A rule can have at most one instance of trailing context (the '/' operator
+or the '$' operator).  The start condition, '^', and "<<EOF>>" patterns
+can only occur at the beginning of a pattern and, like '/' and '$',
+cannot be grouped inside parentheses.  The following are all illegal:
+.nf
+
+    foo/bar$
+    foo|(bar$)
+    foo|^bar
+    <sc1>foo<sc2>bar
+
+.fi
+(Note that the first of these, though, can be written "foo/bar\\n".)
 .SH HOW THE INPUT IS MATCHED
 When the generated scanner is run, it analyzes its input looking
 for strings which match any of its patterns.  If it finds more than
@@ -380,7 +405,7 @@ input is scanned for another match.
 .LP
 If no match is found, then the
 .I default rule
-is executed: the next character in the input is matched and
+is executed: the next character in the input is considered matched and
 copied to the standard output.  Thus, the simplest legal
 .I flex
 input is:
@@ -404,6 +429,9 @@ which deletes all occurrences of "zap me" from its input:
     "zap me"
 
 .fi
+(It will copy all other characters in the input to the output since
+they will be matched by the default rule.)
+.LP
 Here is a program which compresses multiple blanks and tabs down to
 a single blank, and throws away whitespace found at the end of a line:
 .nf
@@ -414,27 +442,33 @@ a single blank, and throws away whitespace found at the end of a line:
 
 .fi
 .LP
-If the action contains a '{', then the action spans till the balancing
-'}' is found, and the action may cross multiple lines.
+If the action contains a '{', then the action spans till the balancing '}'
+is found, and the action may cross multiple lines.
 .I flex 
 knows about C strings and comments and won't be fooled by braces found
 within them, but also allows actions to begin with
 .B %{
 and will consider the action to be all the text up to the next
-.B %}.
+.B %}
+(regardless of ordinary braces inside the action).
 .LP
 An action consisting solely of a vertical bar ('|') means "same as
-the action for the next rule.  See below for an illustration.
+the action for the next rule."  See below for an illustration.
 .LP
 Actions can include arbitrary C code, including
 .B return
-statements to return a value whatever routine called
+statements to return a value to whatever routine called
 .B yylex().
 Each time
 .B yylex()
 is called it continues processing tokens from where it last left
 off until it either reaches
-the end of the file or executes a return.
+the end of the file or executes a return.  Once it reaches an end-of-file,
+however, any subsequent call to
+.B yylex()
+will simply return immediately, unless
+.B yyrestart()
+is first called (see below).
 .LP
 Actions are not allowed to modify yytext or yyleng.
 .LP
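
The name-definition rules reworded in this patch can be illustrated with a minimal Python sketch: a name is a letter or underscore followed by letters, digits, '_', or '-'; the definition runs from the first non-white-space character after the name to the end of the line; and "{name}" expands to "(definition)". This is not how flex itself is implemented. The helper names parse_definitions and expand are hypothetical, and the sketch assumes each line arrives without its trailing newline and ignores quoting, nested definitions, and the other cases a real scanner generator must handle.

```python
import re

# Hypothetical helpers illustrating the documented rules; not flex's own code.

# A name: a letter or '_' followed by zero or more letters, digits, '_', or '-'.
# The definition starts at the first non-white-space character after the name.
NAME_DEF = re.compile(r"^([A-Za-z_][A-Za-z0-9_-]*)[ \t]+(\S.*)$")

# A reference such as {DIGIT}; a repeat count like {2,5} does not match
# because a name cannot begin with a digit.
NAME_REF = re.compile(r"\{([A-Za-z_][A-Za-z0-9_-]*)\}")


def parse_definitions(lines):
    """Collect 'name definition' lines into a dict (assumed helper)."""
    defs = {}
    for line in lines:
        m = NAME_DEF.match(line)
        if m:
            defs[m.group(1)] = m.group(2).rstrip()
    return defs


def expand(pattern, defs):
    """Replace each known {name} with (definition); leave unknown names alone."""
    def substitute(m):
        name = m.group(1)
        return "(" + defs[name] + ")" if name in defs else m.group(0)
    return NAME_REF.sub(substitute, pattern)


definitions = parse_definitions([
    "DIGIT    [0-9]",
    "ID       [a-z][a-z0-9]*",
])
expanded = expand('{DIGIT}+"."{DIGIT}*', definitions)
```

With the DIGIT and ID definitions above, expanded holds ([0-9])+"."([0-9])*. The parentheses around each substituted definition are what keep a following operator such as '+' applied to the whole definition rather than to its last element.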