Diffstat (limited to 'doc/html')
-rw-r--r-- doc/html/NON-AUTOTOOLS-BUILD.txt | 16
-rw-r--r-- doc/html/README.txt | 29
-rw-r--r-- doc/html/pcre.html | 35
-rw-r--r-- doc/html/pcre_config.html | 6
-rw-r--r-- doc/html/pcre_fullinfo.html | 16
-rw-r--r-- doc/html/pcrepattern.html | 66
-rw-r--r-- doc/html/pcresyntax.html | 25
7 files changed, 148 insertions, 45 deletions
diff --git a/doc/html/NON-AUTOTOOLS-BUILD.txt b/doc/html/NON-AUTOTOOLS-BUILD.txt index cddf3e0..3910059 100644 --- a/doc/html/NON-AUTOTOOLS-BUILD.txt +++ b/doc/html/NON-AUTOTOOLS-BUILD.txt @@ -1,6 +1,14 @@ Building PCRE without using autotools ------------------------------------- +NOTE: This document relates to PCRE releases that use the original API, with +library names libpcre, libpcre16, and libpcre32. January 2015 saw the first +release of a new API, known as PCRE2, with release numbers starting at 10.00 +and library names libpcre2-8, libpcre2-16, and libpcre2-32. The old libraries +(now called PCRE1) are still being maintained for bug fixes, but there will be +no new development. New projects are advised to use the new PCRE2 libraries. + + This document contains the following sections: General @@ -756,9 +764,9 @@ required. For details, please see this web site: http://www.zaconsultants.net -There is also a mirror here: - - http://www.vsoft-software.com/downloads.html +You may download PCRE from WWW.CBTTAPE.ORG, file 882. Everything, source and +executable, is in EBCDIC and native z/OS file formats and this is the +recommended download site. ========================== -Last Updated: 14 May 2013 +Last Updated: 25 June 2015 diff --git a/doc/html/README.txt b/doc/html/README.txt index 88f2dfd..4887ebf 100644 --- a/doc/html/README.txt +++ b/doc/html/README.txt @@ -1,7 +1,16 @@ README file for PCRE (Perl-compatible regular expression library) ----------------------------------------------------------------- -The latest release of PCRE is always available in three alternative formats +NOTE: This set of files relates to PCRE releases that use the original API, +with library names libpcre, libpcre16, and libpcre32. January 2015 saw the +first release of a new API, known as PCRE2, with release numbers starting at +10.00 and library names libpcre2-8, libpcre2-16, and libpcre2-32. 
The old +libraries (now called PCRE1) are still being maintained for bug fixes, but +there will be no new development. New projects are advised to use the new PCRE2 +libraries. + + +The latest release of PCRE1 is always available in three alternative formats from: ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.gz @@ -45,14 +54,16 @@ the 16-bit library, which processes strings of 16-bit values, and one for the 32-bit library, which processes strings of 32-bit values. The distribution also includes a set of C++ wrapper functions (see the pcrecpp man page for details), courtesy of Google Inc., which can be used to call the 8-bit PCRE library from -C++. +C++. Other C++ wrappers have been created from time to time. See, for example: +https://github.com/YasserAsmi/regexp, which aims to be simple and similar in +style to the C API. -In addition, there is a set of C wrapper functions (again, just for the 8-bit -library) that are based on the POSIX regular expression API (see the pcreposix -man page). These end up in the library called libpcreposix. Note that this just -provides a POSIX calling interface to PCRE; the regular expressions themselves -still follow Perl syntax and semantics. The POSIX API is restricted, and does -not give full access to all of PCRE's facilities. +The distribution also contains a set of C wrapper functions (again, just for +the 8-bit library) that are based on the POSIX regular expression API (see the +pcreposix man page). These end up in the library called libpcreposix. Note that +this just provides a POSIX calling interface to PCRE; the regular expressions +themselves still follow Perl syntax and semantics. The POSIX API is restricted, +and does not give full access to all of PCRE's facilities. The header file for the POSIX-style functions is called pcreposix.h. 
The official POSIX name is regex.h, but I did not want to risk possible problems @@ -988,4 +999,4 @@ pcre_xxx, one with the name pcre16_xx, and a third with the name pcre32_xxx. Philip Hazel Email local part: ph10 Email domain: cam.ac.uk -Last updated: 17 January 2014 +Last updated: 10 February 2015 diff --git a/doc/html/pcre.html b/doc/html/pcre.html index c2b29aa..c87b106 100644 --- a/doc/html/pcre.html +++ b/doc/html/pcre.html @@ -13,13 +13,24 @@ from the original man page. If there is any nonsense in it, please consult the man page, in case the conversion went wrong. <br> <ul> -<li><a name="TOC1" href="#SEC1">INTRODUCTION</a> -<li><a name="TOC2" href="#SEC2">SECURITY CONSIDERATIONS</a> -<li><a name="TOC3" href="#SEC3">USER DOCUMENTATION</a> -<li><a name="TOC4" href="#SEC4">AUTHOR</a> -<li><a name="TOC5" href="#SEC5">REVISION</a> +<li><a name="TOC1" href="#SEC1">PLEASE TAKE NOTE</a> +<li><a name="TOC2" href="#SEC2">INTRODUCTION</a> +<li><a name="TOC3" href="#SEC3">SECURITY CONSIDERATIONS</a> +<li><a name="TOC4" href="#SEC4">USER DOCUMENTATION</a> +<li><a name="TOC5" href="#SEC5">AUTHOR</a> +<li><a name="TOC6" href="#SEC6">REVISION</a> </ul> -<br><a name="SEC1" href="#TOC1">INTRODUCTION</a><br> +<br><a name="SEC1" href="#TOC1">PLEASE TAKE NOTE</a><br> +<P> +This document relates to PCRE releases that use the original API, +with library names libpcre, libpcre16, and libpcre32. January 2015 saw the +first release of a new API, known as PCRE2, with release numbers starting at +10.00 and library names libpcre2-8, libpcre2-16, and libpcre2-32. The old +libraries (now called PCRE1) are still being maintained for bug fixes, but +there will be no new development. New projects are advised to use the new PCRE2 +libraries. +</P> +<br><a name="SEC2" href="#TOC1">INTRODUCTION</a><br> <P> The PCRE library is a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl, with just a few @@ -115,7 +126,7 @@ clashes. 
In some environments, it is possible to control which external symbols are exported when a shared library is built, and in these cases the undocumented symbols are not exported. </P> -<br><a name="SEC2" href="#TOC1">SECURITY CONSIDERATIONS</a><br> +<br><a name="SEC3" href="#TOC1">SECURITY CONSIDERATIONS</a><br> <P> If you are using PCRE in a non-UTF application that permits users to supply arbitrary patterns for compilation, you should be aware of a feature that @@ -149,7 +160,7 @@ against this: see the PCRE_EXTRA_MATCH_LIMIT feature in the <a href="pcreapi.html"><b>pcreapi</b></a> page. </P> -<br><a name="SEC3" href="#TOC1">USER DOCUMENTATION</a><br> +<br><a name="SEC4" href="#TOC1">USER DOCUMENTATION</a><br> <P> The user documentation for PCRE comprises a number of different sections. In the "man" format, each of these is a separate "man page". In the HTML format, @@ -188,7 +199,7 @@ follows: In the "man" and HTML formats, there is also a short page for each C library function, listing its arguments and results. </P> -<br><a name="SEC4" href="#TOC1">AUTHOR</a><br> +<br><a name="SEC5" href="#TOC1">AUTHOR</a><br> <P> Philip Hazel <br> @@ -202,11 +213,11 @@ Putting an actual email address here seems to have been a spam magnet, so I've taken it away. If you want to email me, use my two initials, followed by the two digits 10, at the domain cam.ac.uk. </P> -<br><a name="SEC5" href="#TOC1">REVISION</a><br> +<br><a name="SEC6" href="#TOC1">REVISION</a><br> <P> -Last updated: 08 January 2014 +Last updated: 10 February 2015 <br> -Copyright © 1997-2014 University of Cambridge. +Copyright © 1997-2015 University of Cambridge. <br> <p> Return to the <a href="index.html">PCRE index page</a>. 
diff --git a/doc/html/pcre_config.html b/doc/html/pcre_config.html index bcdcdde..72fb9ca 100644 --- a/doc/html/pcre_config.html +++ b/doc/html/pcre_config.html @@ -39,8 +39,10 @@ arguments are as follows: <i>where</i> Points to where to put the data </pre> The <i>where</i> argument must point to an integer variable, except for -PCRE_CONFIG_MATCH_LIMIT and PCRE_CONFIG_MATCH_LIMIT_RECURSION, when it must -point to an unsigned long integer. The available codes are: +PCRE_CONFIG_MATCH_LIMIT, PCRE_CONFIG_MATCH_LIMIT_RECURSION, and +PCRE_CONFIG_PARENS_LIMIT, when it must point to an unsigned long integer, +and for PCRE_CONFIG_JITTARGET, when it must point to a const char*. +The available codes are: <pre> PCRE_CONFIG_JIT Availability of just-in-time compiler support (1=yes 0=no) diff --git a/doc/html/pcre_fullinfo.html b/doc/html/pcre_fullinfo.html index b88fc11..2b7c72b 100644 --- a/doc/html/pcre_fullinfo.html +++ b/doc/html/pcre_fullinfo.html @@ -57,6 +57,10 @@ The following information is available: PCRE_INFO_JITSIZE Size of JIT compiled code PCRE_INFO_LASTLITERAL Literal last data unit required PCRE_INFO_MINLENGTH Lower bound length of matching strings + PCRE_INFO_MATCHEMPTY Return 1 if the pattern can match an empty string, + 0 otherwise + PCRE_INFO_MATCHLIMIT Match limit if set, otherwise PCRE_ERROR_UNSET + PCRE_INFO_MAXLOOKBEHIND Length (in characters) of the longest lookbehind assertion PCRE_INFO_NAMECOUNT Number of named subpatterns PCRE_INFO_NAMEENTRYSIZE Size of name table entry PCRE_INFO_NAMETABLE Pointer to name table @@ -72,6 +76,7 @@ The following information is available: 2 if the first character is at the start of the data string or after a newline, and 0 otherwise + PCRE_INFO_RECURSIONLIMIT Recursion limit if set, otherwise PCRE_ERROR_UNSET PCRE_INFO_REQUIREDCHAR Literal last data unit required PCRE_INFO_REQUIREDCHARFLAGS Returns 1 if the last data character is set (which can then be retrieved using PCRE_INFO_REQUIREDCHAR); 0 otherwise @@ -79,14 +84,18 
@@ The following information is available: The <i>where</i> argument must point to an integer variable, except for the following <i>what</i> values: <pre> - PCRE_INFO_DEFAULT_TABLES const unsigned char * - PCRE_INFO_FIRSTTABLE const unsigned char * + PCRE_INFO_DEFAULT_TABLES const uint8_t * + PCRE_INFO_FIRSTCHARACTER uint32_t + PCRE_INFO_FIRSTTABLE const uint8_t * + PCRE_INFO_JITSIZE size_t + PCRE_INFO_MATCHLIMIT uint32_t PCRE_INFO_NAMETABLE PCRE_SPTR16 (16-bit library) PCRE_INFO_NAMETABLE PCRE_SPTR32 (32-bit library) PCRE_INFO_NAMETABLE const unsigned char * (8-bit library) PCRE_INFO_OPTIONS unsigned long int PCRE_INFO_SIZE size_t - PCRE_INFO_FIRSTCHARACTER uint32_t + PCRE_INFO_STUDYSIZE size_t + PCRE_INFO_RECURSIONLIMIT uint32_t PCRE_INFO_REQUIREDCHAR uint32_t </pre> The yield of the function is zero on success or: @@ -95,6 +104,7 @@ The yield of the function is zero on success or: the argument <i>where</i> was NULL PCRE_ERROR_BADMAGIC the "magic number" was not found PCRE_ERROR_BADOPTION the value of <i>what</i> was invalid + PCRE_ERROR_UNSET the option was not set </PRE> </P> <P> diff --git a/doc/html/pcrepattern.html b/doc/html/pcrepattern.html index c06d1e0..55034a7 100644 --- a/doc/html/pcrepattern.html +++ b/doc/html/pcrepattern.html @@ -329,7 +329,8 @@ A second use of backslash provides a way of encoding non-printing characters in patterns in a visible manner. There is no restriction on the appearance of non-printing characters, apart from the binary zero that terminates a pattern, but when a pattern is being prepared by text editing, it is often easier to use -one of the following escape sequences than the binary character it represents: +one of the following escape sequences than the binary character it represents. 
+In an ASCII or Unicode environment, these escapes are as follows: <pre> \a alarm, that is, the BEL character (hex 07) \cx "control-x", where x is any ASCII character @@ -353,19 +354,33 @@ data item (byte or 16-bit value) following \c has a value greater than 127, a compile-time error occurs. This locks out non-ASCII characters in all modes. </P> <P> -The \c facility was designed for use with ASCII characters, but with the -extension to Unicode it is even less useful than it once was. It is, however, -recognized when PCRE is compiled in EBCDIC mode, where data items are always -bytes. In this mode, all values are valid after \c. If the next character is a -lower case letter, it is converted to upper case. Then the 0xc0 bits of the -byte are inverted. Thus \cA becomes hex 01, as in ASCII (A is C1), but because -the EBCDIC letters are disjoint, \cZ becomes hex 29 (Z is E9), and other -characters also generate different values. +When PCRE is compiled in EBCDIC mode, \a, \e, \f, \n, \r, and \t +generate the appropriate EBCDIC code values. The \c escape is processed +as specified for Perl in the <b>perlebcdic</b> document. The only characters +that are allowed after \c are A-Z, a-z, or one of @, [, \, ], ^, _, or ?. Any +other character provokes a compile-time error. The sequence \c@ encodes +character code 0; the letters (in either case) encode characters 1-26 (hex 01 +to hex 1A); [, \, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and +\c? becomes either 255 (hex FF) or 95 (hex 5F). +</P> +<P> +Thus, apart from \c?, these escapes generate the same character code values as +they do in an ASCII environment, though the meanings of the values mostly +differ. For example, \cG always generates code value 7, which is BEL in ASCII +but DEL in EBCDIC. +</P> +<P> +The sequence \c? generates DEL (127, hex 7F) in an ASCII environment, but +because 127 is not a control character in EBCDIC, Perl makes it generate the +APC character. 
Unfortunately, there are several variants of EBCDIC. In most of +them the APC character has the value 255 (hex FF), but in the one Perl calls +POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC +values, PCRE makes \c? generate 95; otherwise it generates 255. </P> <P> After \0 up to two further octal digits are read. If there are fewer than two -digits, just those that are present are used. Thus the sequence \0\x\07 -specifies two binary zeros followed by a BEL character (code value 7). Make +digits, just those that are present are used. Thus the sequence \0\x\015 +specifies two binary zeros followed by a CR character (code value 13). Make sure you supply two digits after the initial zero if the pattern character that follows is itself an octal digit. </P> @@ -703,6 +718,7 @@ Armenian, Avestan, Balinese, Bamum, +Bassa_Vah, Batak, Bengali, Bopomofo, @@ -712,6 +728,7 @@ Buginese, Buhid, Canadian_Aboriginal, Carian, +Caucasian_Albanian, Chakma, Cham, Cherokee, @@ -722,11 +739,14 @@ Cypriot, Cyrillic, Deseret, Devanagari, +Duployan, Egyptian_Hieroglyphs, +Elbasan, Ethiopic, Georgian, Glagolitic, Gothic, +Grantha, Greek, Gujarati, Gurmukhi, @@ -746,40 +766,56 @@ Katakana, Kayah_Li, Kharoshthi, Khmer, +Khojki, +Khudawadi, Lao, Latin, Lepcha, Limbu, +Linear_A, Linear_B, Lisu, Lycian, Lydian, +Mahajani, Malayalam, Mandaic, +Manichaean, Meetei_Mayek, +Mende_Kikakui, Meroitic_Cursive, Meroitic_Hieroglyphs, Miao, +Modi, Mongolian, +Mro, Myanmar, +Nabataean, New_Tai_Lue, Nko, Ogham, +Ol_Chiki, Old_Italic, +Old_North_Arabian, +Old_Permic, Old_Persian, Old_South_Arabian, Old_Turkic, -Ol_Chiki, Oriya, Osmanya, +Pahawh_Hmong, +Palmyrene, +Pau_Cin_Hau, Phags_Pa, Phoenician, +Psalter_Pahlavi, Rejang, Runic, Samaritan, Saurashtra, Sharada, Shavian, +Siddham, Sinhala, Sora_Sompeng, Sundanese, @@ -797,8 +833,10 @@ Thaana, Thai, Tibetan, Tifinagh, +Tirhuta, Ugaritic, Vai, +Warang_Citi, Yi. </P> <P> @@ -3226,9 +3264,9 @@ Cambridge CB2 3QH, England. 
</P> <br><a name="SEC30" href="#TOC1">REVISION</a><br> <P> -Last updated: 08 January 2014 +Last updated: 14 June 2015 <br> -Copyright © 1997-2014 University of Cambridge. +Copyright © 1997-2015 University of Cambridge. <br> <p> Return to the <a href="index.html">PCRE index page</a>. diff --git a/doc/html/pcresyntax.html b/doc/html/pcresyntax.html index 89f3573..5896b9e 100644 --- a/doc/html/pcresyntax.html +++ b/doc/html/pcresyntax.html @@ -171,6 +171,7 @@ Armenian, Avestan, Balinese, Bamum, +Bassa_Vah, Batak, Bengali, Bopomofo, @@ -180,6 +181,7 @@ Buginese, Buhid, Canadian_Aboriginal, Carian, +Caucasian_Albanian, Chakma, Cham, Cherokee, @@ -190,11 +192,14 @@ Cypriot, Cyrillic, Deseret, Devanagari, +Duployan, Egyptian_Hieroglyphs, +Elbasan, Ethiopic, Georgian, Glagolitic, Gothic, +Grantha, Greek, Gujarati, Gurmukhi, @@ -214,40 +219,56 @@ Katakana, Kayah_Li, Kharoshthi, Khmer, +Khojki, +Khudawadi, Lao, Latin, Lepcha, Limbu, +Linear_A, Linear_B, Lisu, Lycian, Lydian, +Mahajani, Malayalam, Mandaic, +Manichaean, Meetei_Mayek, +Mende_Kikakui, Meroitic_Cursive, Meroitic_Hieroglyphs, Miao, +Modi, Mongolian, +Mro, Myanmar, +Nabataean, New_Tai_Lue, Nko, Ogham, +Ol_Chiki, Old_Italic, +Old_North_Arabian, +Old_Permic, Old_Persian, Old_South_Arabian, Old_Turkic, -Ol_Chiki, Oriya, Osmanya, +Pahawh_Hmong, +Palmyrene, +Pau_Cin_Hau, Phags_Pa, Phoenician, +Psalter_Pahlavi, Rejang, Runic, Samaritan, Saurashtra, Sharada, Shavian, +Siddham, Sinhala, Sora_Sompeng, Sundanese, @@ -265,8 +286,10 @@ Thaana, Thai, Tibetan, Tifinagh, +Tirhuta, Ugaritic, Vai, +Warang_Citi, Yi. </P> <br><a name="SEC8" href="#TOC1">CHARACTER CLASSES</a><br> |
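Note on the \c escape text changed in pcrepattern.html above: the mapping it describes for an ASCII environment can be sketched in a few lines of C. This is only an illustration, not code from PCRE. The function name ctrl_escape_ascii is hypothetical, the sketch assumes it is compiled with an ASCII execution character set, and it rejects every character outside the set that the new EBCDIC wording allows after \c (A-Z, a-z, @, [, \, ], ^, _, ?), which is stricter than what PCRE does in ASCII mode.

```c
#include <ctype.h>

/* Hypothetical helper, NOT part of the PCRE API.
 * Returns the code value that \cx yields in an ASCII environment,
 * or -1 if x is not one of the characters the documentation lists
 * as permitted after \c in EBCDIC mode.
 * Assumption: the program itself is compiled with ASCII character codes. */
int ctrl_escape_ascii(int x)
{
    if (x < 0 || x > 127)
        return -1;                 /* non-ASCII is never accepted after \c */

    if (islower(x))
        x = toupper(x);            /* letters are treated case-insensitively */

    /* '@' through '_' covers @, A-Z, [, \, ], ^ and _ in ASCII. */
    if (!((x >= '@' && x <= '_') || x == '?'))
        return -1;

    return x ^ 0x40;               /* invert bit 6 (hex 40) */
}
```

With this sketch, \c@ maps to 0, the letters map to 1-26, [ \ ] ^ _ map to 27-31, and \c? maps to 127 (DEL). The EBCDIC case differs only for \c?, which the documentation says becomes 255 or 95 depending on the EBCDIC variant; that part is not modelled here.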