Diffstat (limited to 'doc/html')
-rw-r--r-- doc/html/NON-AUTOTOOLS-BUILD.txt 16
-rw-r--r-- doc/html/README.txt 29
-rw-r--r-- doc/html/pcre.html 35
-rw-r--r-- doc/html/pcre_config.html 6
-rw-r--r-- doc/html/pcre_fullinfo.html 16
-rw-r--r-- doc/html/pcrepattern.html 66
-rw-r--r-- doc/html/pcresyntax.html 25
7 files changed, 148 insertions, 45 deletions
diff --git a/doc/html/NON-AUTOTOOLS-BUILD.txt b/doc/html/NON-AUTOTOOLS-BUILD.txt
index cddf3e0..3910059 100644
--- a/doc/html/NON-AUTOTOOLS-BUILD.txt
+++ b/doc/html/NON-AUTOTOOLS-BUILD.txt
@@ -1,6 +1,14 @@
Building PCRE without using autotools
-------------------------------------
+NOTE: This document relates to PCRE releases that use the original API, with
+library names libpcre, libpcre16, and libpcre32. January 2015 saw the first
+release of a new API, known as PCRE2, with release numbers starting at 10.00
+and library names libpcre2-8, libpcre2-16, and libpcre2-32. The old libraries
+(now called PCRE1) are still being maintained for bug fixes, but there will be
+no new development. New projects are advised to use the new PCRE2 libraries.
+
+
This document contains the following sections:
General
@@ -756,9 +764,9 @@ required. For details, please see this web site:
http://www.zaconsultants.net
-There is also a mirror here:
-
- http://www.vsoft-software.com/downloads.html
+You may download PCRE from WWW.CBTTAPE.ORG, file 882.  Everything, source and
+executable, is in EBCDIC and native z/OS file formats and this is the
+recommended download site.
==========================
-Last Updated: 14 May 2013
+Last Updated: 25 June 2015
diff --git a/doc/html/README.txt b/doc/html/README.txt
index 88f2dfd..4887ebf 100644
--- a/doc/html/README.txt
+++ b/doc/html/README.txt
@@ -1,7 +1,16 @@
README file for PCRE (Perl-compatible regular expression library)
-----------------------------------------------------------------
-The latest release of PCRE is always available in three alternative formats
+NOTE: This set of files relates to PCRE releases that use the original API,
+with library names libpcre, libpcre16, and libpcre32. January 2015 saw the
+first release of a new API, known as PCRE2, with release numbers starting at
+10.00 and library names libpcre2-8, libpcre2-16, and libpcre2-32. The old
+libraries (now called PCRE1) are still being maintained for bug fixes, but
+there will be no new development. New projects are advised to use the new PCRE2
+libraries.
+
+
+The latest release of PCRE1 is always available in three alternative formats
from:
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.gz
@@ -45,14 +54,16 @@ the 16-bit library, which processes strings of 16-bit values, and one for the
32-bit library, which processes strings of 32-bit values. The distribution also
includes a set of C++ wrapper functions (see the pcrecpp man page for details),
courtesy of Google Inc., which can be used to call the 8-bit PCRE library from
-C++.
+C++. Other C++ wrappers have been created from time to time. See, for example:
+https://github.com/YasserAsmi/regexp, which aims to be simple and similar in
+style to the C API.
-In addition, there is a set of C wrapper functions (again, just for the 8-bit
-library) that are based on the POSIX regular expression API (see the pcreposix
-man page). These end up in the library called libpcreposix. Note that this just
-provides a POSIX calling interface to PCRE; the regular expressions themselves
-still follow Perl syntax and semantics. The POSIX API is restricted, and does
-not give full access to all of PCRE's facilities.
+The distribution also contains a set of C wrapper functions (again, just for
+the 8-bit library) that are based on the POSIX regular expression API (see the
+pcreposix man page). These end up in the library called libpcreposix. Note that
+this just provides a POSIX calling interface to PCRE; the regular expressions
+themselves still follow Perl syntax and semantics. The POSIX API is restricted,
+and does not give full access to all of PCRE's facilities.
The header file for the POSIX-style functions is called pcreposix.h. The
official POSIX name is regex.h, but I did not want to risk possible problems
@@ -988,4 +999,4 @@ pcre_xxx, one with the name pcre16_xx, and a third with the name pcre32_xxx.
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
-Last updated: 17 January 2014
+Last updated: 10 February 2015
diff --git a/doc/html/pcre.html b/doc/html/pcre.html
index c2b29aa..c87b106 100644
--- a/doc/html/pcre.html
+++ b/doc/html/pcre.html
@@ -13,13 +13,24 @@ from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
-<li><a name="TOC1" href="#SEC1">INTRODUCTION</a>
-<li><a name="TOC2" href="#SEC2">SECURITY CONSIDERATIONS</a>
-<li><a name="TOC3" href="#SEC3">USER DOCUMENTATION</a>
-<li><a name="TOC4" href="#SEC4">AUTHOR</a>
-<li><a name="TOC5" href="#SEC5">REVISION</a>
+<li><a name="TOC1" href="#SEC1">PLEASE TAKE NOTE</a>
+<li><a name="TOC2" href="#SEC2">INTRODUCTION</a>
+<li><a name="TOC3" href="#SEC3">SECURITY CONSIDERATIONS</a>
+<li><a name="TOC4" href="#SEC4">USER DOCUMENTATION</a>
+<li><a name="TOC5" href="#SEC5">AUTHOR</a>
+<li><a name="TOC6" href="#SEC6">REVISION</a>
</ul>
-<br><a name="SEC1" href="#TOC1">INTRODUCTION</a><br>
+<br><a name="SEC1" href="#TOC1">PLEASE TAKE NOTE</a><br>
+<P>
+This document relates to PCRE releases that use the original API,
+with library names libpcre, libpcre16, and libpcre32. January 2015 saw the
+first release of a new API, known as PCRE2, with release numbers starting at
+10.00 and library names libpcre2-8, libpcre2-16, and libpcre2-32. The old
+libraries (now called PCRE1) are still being maintained for bug fixes, but
+there will be no new development. New projects are advised to use the new PCRE2
+libraries.
+</P>
+<br><a name="SEC2" href="#TOC1">INTRODUCTION</a><br>
<P>
The PCRE library is a set of functions that implement regular expression
pattern matching using the same syntax and semantics as Perl, with just a few
@@ -115,7 +126,7 @@ clashes. In some environments, it is possible to control which external symbols
are exported when a shared library is built, and in these cases the
undocumented symbols are not exported.
</P>
-<br><a name="SEC2" href="#TOC1">SECURITY CONSIDERATIONS</a><br>
+<br><a name="SEC3" href="#TOC1">SECURITY CONSIDERATIONS</a><br>
<P>
If you are using PCRE in a non-UTF application that permits users to supply
arbitrary patterns for compilation, you should be aware of a feature that
@@ -149,7 +160,7 @@ against this: see the PCRE_EXTRA_MATCH_LIMIT feature in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page.
</P>
-<br><a name="SEC3" href="#TOC1">USER DOCUMENTATION</a><br>
+<br><a name="SEC4" href="#TOC1">USER DOCUMENTATION</a><br>
<P>
The user documentation for PCRE comprises a number of different sections. In
the "man" format, each of these is a separate "man page". In the HTML format,
@@ -188,7 +199,7 @@ follows:
In the "man" and HTML formats, there is also a short page for each C library
function, listing its arguments and results.
</P>
-<br><a name="SEC4" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
@@ -202,11 +213,11 @@ Putting an actual email address here seems to have been a spam magnet, so I've
taken it away. If you want to email me, use my two initials, followed by the
two digits 10, at the domain cam.ac.uk.
</P>
-<br><a name="SEC5" href="#TOC1">REVISION</a><br>
+<br><a name="SEC6" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 08 January 2014
+Last updated: 10 February 2015
<br>
-Copyright &copy; 1997-2014 University of Cambridge.
+Copyright &copy; 1997-2015 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.
diff --git a/doc/html/pcre_config.html b/doc/html/pcre_config.html
index bcdcdde..72fb9ca 100644
--- a/doc/html/pcre_config.html
+++ b/doc/html/pcre_config.html
@@ -39,8 +39,10 @@ arguments are as follows:
<i>where</i> Points to where to put the data
</pre>
The <i>where</i> argument must point to an integer variable, except for
-PCRE_CONFIG_MATCH_LIMIT and PCRE_CONFIG_MATCH_LIMIT_RECURSION, when it must
-point to an unsigned long integer. The available codes are:
+PCRE_CONFIG_MATCH_LIMIT, PCRE_CONFIG_MATCH_LIMIT_RECURSION, and
+PCRE_CONFIG_PARENS_LIMIT, when it must point to an unsigned long integer,
+and for PCRE_CONFIG_JITTARGET, when it must point to a const char*.
+The available codes are:
<pre>
PCRE_CONFIG_JIT Availability of just-in-time compiler
support (1=yes 0=no)
diff --git a/doc/html/pcre_fullinfo.html b/doc/html/pcre_fullinfo.html
index b88fc11..2b7c72b 100644
--- a/doc/html/pcre_fullinfo.html
+++ b/doc/html/pcre_fullinfo.html
@@ -57,6 +57,10 @@ The following information is available:
PCRE_INFO_JITSIZE Size of JIT compiled code
PCRE_INFO_LASTLITERAL Literal last data unit required
PCRE_INFO_MINLENGTH Lower bound length of matching strings
+ PCRE_INFO_MATCHEMPTY Return 1 if the pattern can match an empty string,
+ 0 otherwise
+ PCRE_INFO_MATCHLIMIT Match limit if set, otherwise PCRE_ERROR_UNSET
+ PCRE_INFO_MAXLOOKBEHIND Length (in characters) of the longest lookbehind assertion
PCRE_INFO_NAMECOUNT Number of named subpatterns
PCRE_INFO_NAMEENTRYSIZE Size of name table entry
PCRE_INFO_NAMETABLE Pointer to name table
@@ -72,6 +76,7 @@ The following information is available:
2 if the first character is at the start of the data
string or after a newline, and
0 otherwise
+ PCRE_INFO_RECURSIONLIMIT Recursion limit if set, otherwise PCRE_ERROR_UNSET
PCRE_INFO_REQUIREDCHAR Literal last data unit required
PCRE_INFO_REQUIREDCHARFLAGS Returns 1 if the last data character is set (which can then
be retrieved using PCRE_INFO_REQUIREDCHAR); 0 otherwise
@@ -79,14 +84,18 @@ The following information is available:
The <i>where</i> argument must point to an integer variable, except for the
following <i>what</i> values:
<pre>
- PCRE_INFO_DEFAULT_TABLES const unsigned char *
- PCRE_INFO_FIRSTTABLE const unsigned char *
+ PCRE_INFO_DEFAULT_TABLES const uint8_t *
+ PCRE_INFO_FIRSTCHARACTER uint32_t
+ PCRE_INFO_FIRSTTABLE const uint8_t *
+ PCRE_INFO_JITSIZE size_t
+ PCRE_INFO_MATCHLIMIT uint32_t
PCRE_INFO_NAMETABLE PCRE_SPTR16 (16-bit library)
PCRE_INFO_NAMETABLE PCRE_SPTR32 (32-bit library)
PCRE_INFO_NAMETABLE const unsigned char * (8-bit library)
PCRE_INFO_OPTIONS unsigned long int
PCRE_INFO_SIZE size_t
- PCRE_INFO_FIRSTCHARACTER uint32_t
+ PCRE_INFO_STUDYSIZE size_t
+ PCRE_INFO_RECURSIONLIMIT uint32_t
PCRE_INFO_REQUIREDCHAR uint32_t
</pre>
The yield of the function is zero on success or:
@@ -95,6 +104,7 @@ The yield of the function is zero on success or:
the argument <i>where</i> was NULL
PCRE_ERROR_BADMAGIC the "magic number" was not found
PCRE_ERROR_BADOPTION the value of <i>what</i> was invalid
+ PCRE_ERROR_UNSET the option was not set
</PRE>
</P>
<P>
diff --git a/doc/html/pcrepattern.html b/doc/html/pcrepattern.html
index c06d1e0..55034a7 100644
--- a/doc/html/pcrepattern.html
+++ b/doc/html/pcrepattern.html
@@ -329,7 +329,8 @@ A second use of backslash provides a way of encoding non-printing characters
in patterns in a visible manner. There is no restriction on the appearance of
non-printing characters, apart from the binary zero that terminates a pattern,
but when a pattern is being prepared by text editing, it is often easier to use
-one of the following escape sequences than the binary character it represents:
+one of the following escape sequences than the binary character it represents.
+In an ASCII or Unicode environment, these escapes are as follows:
<pre>
\a alarm, that is, the BEL character (hex 07)
\cx "control-x", where x is any ASCII character
@@ -353,19 +354,33 @@ data item (byte or 16-bit value) following \c has a value greater than 127, a
compile-time error occurs. This locks out non-ASCII characters in all modes.
</P>
<P>
-The \c facility was designed for use with ASCII characters, but with the
-extension to Unicode it is even less useful than it once was. It is, however,
-recognized when PCRE is compiled in EBCDIC mode, where data items are always
-bytes. In this mode, all values are valid after \c. If the next character is a
-lower case letter, it is converted to upper case. Then the 0xc0 bits of the
-byte are inverted. Thus \cA becomes hex 01, as in ASCII (A is C1), but because
-the EBCDIC letters are disjoint, \cZ becomes hex 29 (Z is E9), and other
-characters also generate different values.
+When PCRE is compiled in EBCDIC mode, \a, \e, \f, \n, \r, and \t
+generate the appropriate EBCDIC code values. The \c escape is processed
+as specified for Perl in the <b>perlebcdic</b> document. The only characters
+that are allowed after \c are A-Z, a-z, or one of @, [, \, ], ^, _, or ?. Any
+other character provokes a compile-time error. The sequence \c@ encodes
+character code 0; the letters (in either case) encode characters 1-26 (hex 01
+to hex 1A); [, \, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and
+\c? becomes either 255 (hex FF) or 95 (hex 5F).
+</P>
+<P>
+Thus, apart from \c?, these escapes generate the same character code values as
+they do in an ASCII environment, though the meanings of the values mostly
+differ. For example, \cG always generates code value 7, which is BEL in ASCII
+but DEL in EBCDIC.
+</P>
+<P>
+The sequence \c? generates DEL (127, hex 7F) in an ASCII environment, but
+because 127 is not a control character in EBCDIC, Perl makes it generate the
+APC character. Unfortunately, there are several variants of EBCDIC. In most of
+them the APC character has the value 255 (hex FF), but in the one Perl calls
+POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
+values, PCRE makes \c? generate 95; otherwise it generates 255.
</P>
<P>
After \0 up to two further octal digits are read. If there are fewer than two
-digits, just those that are present are used. Thus the sequence \0\x\07
-specifies two binary zeros followed by a BEL character (code value 7). Make
+digits, just those that are present are used. Thus the sequence \0\x\015
+specifies two binary zeros followed by a CR character (code value 13). Make
sure you supply two digits after the initial zero if the pattern character that
follows is itself an octal digit.
</P>
@@ -703,6 +718,7 @@ Armenian,
Avestan,
Balinese,
Bamum,
+Bassa_Vah,
Batak,
Bengali,
Bopomofo,
@@ -712,6 +728,7 @@ Buginese,
Buhid,
Canadian_Aboriginal,
Carian,
+Caucasian_Albanian,
Chakma,
Cham,
Cherokee,
@@ -722,11 +739,14 @@ Cypriot,
Cyrillic,
Deseret,
Devanagari,
+Duployan,
Egyptian_Hieroglyphs,
+Elbasan,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
+Grantha,
Greek,
Gujarati,
Gurmukhi,
@@ -746,40 +766,56 @@ Katakana,
Kayah_Li,
Kharoshthi,
Khmer,
+Khojki,
+Khudawadi,
Lao,
Latin,
Lepcha,
Limbu,
+Linear_A,
Linear_B,
Lisu,
Lycian,
Lydian,
+Mahajani,
Malayalam,
Mandaic,
+Manichaean,
Meetei_Mayek,
+Mende_Kikakui,
Meroitic_Cursive,
Meroitic_Hieroglyphs,
Miao,
+Modi,
Mongolian,
+Mro,
Myanmar,
+Nabataean,
New_Tai_Lue,
Nko,
Ogham,
+Ol_Chiki,
Old_Italic,
+Old_North_Arabian,
+Old_Permic,
Old_Persian,
Old_South_Arabian,
Old_Turkic,
-Ol_Chiki,
Oriya,
Osmanya,
+Pahawh_Hmong,
+Palmyrene,
+Pau_Cin_Hau,
Phags_Pa,
Phoenician,
+Psalter_Pahlavi,
Rejang,
Runic,
Samaritan,
Saurashtra,
Sharada,
Shavian,
+Siddham,
Sinhala,
Sora_Sompeng,
Sundanese,
@@ -797,8 +833,10 @@ Thaana,
Thai,
Tibetan,
Tifinagh,
+Tirhuta,
Ugaritic,
Vai,
+Warang_Citi,
Yi.
</P>
<P>
@@ -3226,9 +3264,9 @@ Cambridge CB2 3QH, England.
</P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 08 January 2014
+Last updated: 14 June 2015
<br>
-Copyright &copy; 1997-2014 University of Cambridge.
+Copyright &copy; 1997-2015 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.
diff --git a/doc/html/pcresyntax.html b/doc/html/pcresyntax.html
index 89f3573..5896b9e 100644
--- a/doc/html/pcresyntax.html
+++ b/doc/html/pcresyntax.html
@@ -171,6 +171,7 @@ Armenian,
Avestan,
Balinese,
Bamum,
+Bassa_Vah,
Batak,
Bengali,
Bopomofo,
@@ -180,6 +181,7 @@ Buginese,
Buhid,
Canadian_Aboriginal,
Carian,
+Caucasian_Albanian,
Chakma,
Cham,
Cherokee,
@@ -190,11 +192,14 @@ Cypriot,
Cyrillic,
Deseret,
Devanagari,
+Duployan,
Egyptian_Hieroglyphs,
+Elbasan,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
+Grantha,
Greek,
Gujarati,
Gurmukhi,
@@ -214,40 +219,56 @@ Katakana,
Kayah_Li,
Kharoshthi,
Khmer,
+Khojki,
+Khudawadi,
Lao,
Latin,
Lepcha,
Limbu,
+Linear_A,
Linear_B,
Lisu,
Lycian,
Lydian,
+Mahajani,
Malayalam,
Mandaic,
+Manichaean,
Meetei_Mayek,
+Mende_Kikakui,
Meroitic_Cursive,
Meroitic_Hieroglyphs,
Miao,
+Modi,
Mongolian,
+Mro,
Myanmar,
+Nabataean,
New_Tai_Lue,
Nko,
Ogham,
+Ol_Chiki,
Old_Italic,
+Old_North_Arabian,
+Old_Permic,
Old_Persian,
Old_South_Arabian,
Old_Turkic,
-Ol_Chiki,
Oriya,
Osmanya,
+Pahawh_Hmong,
+Palmyrene,
+Pau_Cin_Hau,
Phags_Pa,
Phoenician,
+Psalter_Pahlavi,
Rejang,
Runic,
Samaritan,
Saurashtra,
Sharada,
Shavian,
+Siddham,
Sinhala,
Sora_Sompeng,
Sundanese,
@@ -265,8 +286,10 @@ Thaana,
Thai,
Tibetan,
Tifinagh,
+Tirhuta,
Ugaritic,
Vai,
+Warang_Citi,
Yi.
</P>
<br><a name="SEC8" href="#TOC1">CHARACTER CLASSES</a><br>