summaryrefslogtreecommitdiff
path: root/doc/html
diff options
context:
space:
mode:
Diffstat (limited to 'doc/html')
-rw-r--r--doc/html/NON-AUTOTOOLS-BUILD.txt764
-rw-r--r--doc/html/README.txt991
-rw-r--r--doc/html/index.html185
-rw-r--r--doc/html/pcre-config.html109
-rw-r--r--doc/html/pcre.html213
-rw-r--r--doc/html/pcre16.html384
-rw-r--r--doc/html/pcre32.html382
-rw-r--r--doc/html/pcre_assign_jit_stack.html76
-rw-r--r--doc/html/pcre_compile.html111
-rw-r--r--doc/html/pcre_compile2.html115
-rw-r--r--doc/html/pcre_config.html92
-rw-r--r--doc/html/pcre_copy_named_substring.html65
-rw-r--r--doc/html/pcre_copy_substring.html61
-rw-r--r--doc/html/pcre_dfa_exec.html129
-rw-r--r--doc/html/pcre_exec.html111
-rw-r--r--doc/html/pcre_free_study.html46
-rw-r--r--doc/html/pcre_free_substring.html46
-rw-r--r--doc/html/pcre_free_substring_list.html46
-rw-r--r--doc/html/pcre_fullinfo.html108
-rw-r--r--doc/html/pcre_get_named_substring.html68
-rw-r--r--doc/html/pcre_get_stringnumber.html57
-rw-r--r--doc/html/pcre_get_stringtable_entries.html60
-rw-r--r--doc/html/pcre_get_substring.html64
-rw-r--r--doc/html/pcre_get_substring_list.html61
-rw-r--r--doc/html/pcre_jit_exec.html108
-rw-r--r--doc/html/pcre_jit_stack_alloc.html55
-rw-r--r--doc/html/pcre_jit_stack_free.html48
-rw-r--r--doc/html/pcre_maketables.html48
-rw-r--r--doc/html/pcre_pattern_to_host_byte_order.html58
-rw-r--r--doc/html/pcre_refcount.html51
-rw-r--r--doc/html/pcre_study.html68
-rw-r--r--doc/html/pcre_utf16_to_host_byte_order.html57
-rw-r--r--doc/html/pcre_utf32_to_host_byte_order.html57
-rw-r--r--doc/html/pcre_version.html46
-rw-r--r--doc/html/pcreapi.html2922
-rw-r--r--doc/html/pcrebuild.html534
-rw-r--r--doc/html/pcrecallout.html286
-rw-r--r--doc/html/pcrecompat.html235
-rw-r--r--doc/html/pcrecpp.html368
-rw-r--r--doc/html/pcredemo.html426
-rw-r--r--doc/html/pcregrep.html759
-rw-r--r--doc/html/pcrejit.html452
-rw-r--r--doc/html/pcrelimits.html90
-rw-r--r--doc/html/pcrematching.html242
-rw-r--r--doc/html/pcrepartial.html509
-rw-r--r--doc/html/pcrepattern.html3235
-rw-r--r--doc/html/pcreperform.html195
-rw-r--r--doc/html/pcreposix.html290
-rw-r--r--doc/html/pcreprecompile.html163
-rw-r--r--doc/html/pcresample.html110
-rw-r--r--doc/html/pcrestack.html225
-rw-r--r--doc/html/pcresyntax.html538
-rw-r--r--doc/html/pcretest.html1158
-rw-r--r--doc/html/pcreunicode.html262
54 files changed, 17939 insertions, 0 deletions
diff --git a/doc/html/NON-AUTOTOOLS-BUILD.txt b/doc/html/NON-AUTOTOOLS-BUILD.txt
new file mode 100644
index 0000000..cddf3e0
--- /dev/null
+++ b/doc/html/NON-AUTOTOOLS-BUILD.txt
@@ -0,0 +1,764 @@
+Building PCRE without using autotools
+-------------------------------------
+
+This document contains the following sections:
+
+ General
+ Generic instructions for the PCRE C library
+ The C++ wrapper functions
+ Building for virtual Pascal
+ Stack size in Windows environments
+ Linking programs in Windows environments
+ Calling conventions in Windows environments
+ Comments about Win32 builds
+ Building PCRE on Windows with CMake
+ Use of relative paths with CMake on Windows
+ Testing with RunTest.bat
+ Building under Windows CE with Visual Studio 200x
+ Building under Windows with BCC5.5
+ Building using Borland C++ Builder 2007 (CB2007) and higher
+ Building PCRE on OpenVMS
+ Building PCRE on Stratus OpenVOS
+ Building PCRE on native z/OS and z/VM
+
+
+GENERAL
+
+I (Philip Hazel) have no experience of Windows or VMS sytems and how their
+libraries work. The items in the PCRE distribution and Makefile that relate to
+anything other than Linux systems are untested by me.
+
+There are some other comments and files (including some documentation in CHM
+format) in the Contrib directory on the FTP site:
+
+ ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib
+
+The basic PCRE library consists entirely of code written in Standard C, and so
+should compile successfully on any system that has a Standard C compiler and
+library. The C++ wrapper functions are a separate issue (see below).
+
+The PCRE distribution includes a "configure" file for use by the configure/make
+(autotools) build system, as found in many Unix-like environments. The README
+file contains information about the options for "configure".
+
+There is also support for CMake, which some users prefer, especially in Windows
+environments, though it can also be run in Unix-like environments. See the
+section entitled "Building PCRE on Windows with CMake" below.
+
+Versions of config.h and pcre.h are distributed in the PCRE tarballs under the
+names config.h.generic and pcre.h.generic. These are provided for those who
+build PCRE without using "configure" or CMake. If you use "configure" or CMake,
+the .generic versions are not used.
+
+
+GENERIC INSTRUCTIONS FOR THE PCRE C LIBRARY
+
+The following are generic instructions for building the PCRE C library "by
+hand". If you are going to use CMake, this section does not apply to you; you
+can skip ahead to the CMake section.
+
+ (1) Copy or rename the file config.h.generic as config.h, and edit the macro
+ settings that it contains to whatever is appropriate for your environment.
+
+ In particular, you can alter the definition of the NEWLINE macro to
+ specify what character(s) you want to be interpreted as line terminators.
+ In an EBCDIC environment, you MUST change NEWLINE, because its default
+ value is 10, an ASCII LF. The usual EBCDIC newline character is 21 (0x15,
+ NL), though in some cases it may be 37 (0x25).
+
+ When you compile any of the PCRE modules, you must specify -DHAVE_CONFIG_H
+ to your compiler so that config.h is included in the sources.
+
+ An alternative approach is not to edit config.h, but to use -D on the
+ compiler command line to make any changes that you need to the
+ configuration options. In this case -DHAVE_CONFIG_H must not be set.
+
+ NOTE: There have been occasions when the way in which certain parameters
+ in config.h are used has changed between releases. (In the configure/make
+ world, this is handled automatically.) When upgrading to a new release,
+ you are strongly advised to review config.h.generic before re-using what
+ you had previously.
+
+ (2) Copy or rename the file pcre.h.generic as pcre.h.
+
+ (3) EITHER:
+ Copy or rename file pcre_chartables.c.dist as pcre_chartables.c.
+
+ OR:
+ Compile dftables.c as a stand-alone program (using -DHAVE_CONFIG_H if
+ you have set up config.h), and then run it with the single argument
+ "pcre_chartables.c". This generates a set of standard character tables
+ and writes them to that file. The tables are generated using the default
+ C locale for your system. If you want to use a locale that is specified
+ by LC_xxx environment variables, add the -L option to the dftables
+ command. You must use this method if you are building on a system that
+ uses EBCDIC code.
+
+ The tables in pcre_chartables.c are defaults. The caller of PCRE can
+ specify alternative tables at run time.
+
+ (4) Ensure that you have the following header files:
+
+ pcre_internal.h
+ ucp.h
+
+ (5) For an 8-bit library, compile the following source files, setting
+ -DHAVE_CONFIG_H as a compiler option if you have set up config.h with your
+ configuration, or else use other -D settings to change the configuration
+ as required.
+
+ pcre_byte_order.c
+ pcre_chartables.c
+ pcre_compile.c
+ pcre_config.c
+ pcre_dfa_exec.c
+ pcre_exec.c
+ pcre_fullinfo.c
+ pcre_get.c
+ pcre_globals.c
+ pcre_jit_compile.c
+ pcre_maketables.c
+ pcre_newline.c
+ pcre_ord2utf8.c
+ pcre_refcount.c
+ pcre_string_utils.c
+ pcre_study.c
+ pcre_tables.c
+ pcre_ucd.c
+ pcre_valid_utf8.c
+ pcre_version.c
+ pcre_xclass.c
+
+ Make sure that you include -I. in the compiler command (or equivalent for
+ an unusual compiler) so that all included PCRE header files are first
+ sought in the current directory. Otherwise you run the risk of picking up
+ a previously-installed file from somewhere else.
+
+ Note that you must still compile pcre_jit_compile.c, even if you have not
+ defined SUPPORT_JIT in config.h, because when JIT support is not
+ configured, dummy functions are compiled. When JIT support IS configured,
+ pcre_jit_compile.c #includes sources from the sljit subdirectory, where
+ there should be 16 files, all of whose names begin with "sljit".
+
+ (6) Now link all the compiled code into an object library in whichever form
+ your system keeps such libraries. This is the basic PCRE C 8-bit library.
+ If your system has static and shared libraries, you may have to do this
+ once for each type.
+
+ (7) If you want to build a 16-bit library (as well as, or instead of the 8-bit
+ or 32-bit libraries) repeat steps 5-6 with the following files:
+
+ pcre16_byte_order.c
+ pcre16_chartables.c
+ pcre16_compile.c
+ pcre16_config.c
+ pcre16_dfa_exec.c
+ pcre16_exec.c
+ pcre16_fullinfo.c
+ pcre16_get.c
+ pcre16_globals.c
+ pcre16_jit_compile.c
+ pcre16_maketables.c
+ pcre16_newline.c
+ pcre16_ord2utf16.c
+ pcre16_refcount.c
+ pcre16_string_utils.c
+ pcre16_study.c
+ pcre16_tables.c
+ pcre16_ucd.c
+ pcre16_utf16_utils.c
+ pcre16_valid_utf16.c
+ pcre16_version.c
+ pcre16_xclass.c
+
+ (8) If you want to build a 32-bit library (as well as, or instead of the 8-bit
+ or 16-bit libraries) repeat steps 5-6 with the following files:
+
+ pcre32_byte_order.c
+ pcre32_chartables.c
+ pcre32_compile.c
+ pcre32_config.c
+ pcre32_dfa_exec.c
+ pcre32_exec.c
+ pcre32_fullinfo.c
+ pcre32_get.c
+ pcre32_globals.c
+ pcre32_jit_compile.c
+ pcre32_maketables.c
+ pcre32_newline.c
+ pcre32_ord2utf32.c
+ pcre32_refcount.c
+ pcre32_string_utils.c
+ pcre32_study.c
+ pcre32_tables.c
+ pcre32_ucd.c
+ pcre32_utf32_utils.c
+ pcre32_valid_utf32.c
+ pcre32_version.c
+ pcre32_xclass.c
+
+ (9) If you want to build the POSIX wrapper functions (which apply only to the
+ 8-bit library), ensure that you have the pcreposix.h file and then compile
+ pcreposix.c (remembering -DHAVE_CONFIG_H if necessary). Link the result
+ (on its own) as the pcreposix library.
+
+(10) The pcretest program can be linked with any combination of the 8-bit,
+ 16-bit and 32-bit libraries (depending on what you selected in config.h).
+ Compile pcretest.c and pcre_printint.c (again, don't forget
+ -DHAVE_CONFIG_H) and link them together with the appropriate library/ies.
+ If you compiled an 8-bit library, pcretest also needs the pcreposix
+ wrapper library unless you compiled it with -DNOPOSIX.
+
+(11) Run pcretest on the testinput files in the testdata directory, and check
+ that the output matches the corresponding testoutput files. There are
+ comments about what each test does in the section entitled "Testing PCRE"
+ in the README file. If you compiled more than one of the 8-bit, 16-bit and
+ 32-bit libraries, you need to run pcretest with the -16 option to do
+ 16-bit tests and with the -32 option to do 32-bit tests.
+
+ Some tests are relevant only when certain build-time options are selected.
+ For example, test 4 is for UTF-8/UTF-16/UTF-32 support, and will not run
+ if you have built PCRE without it. See the comments at the start of each
+ testinput file. If you have a suitable Unix-like shell, the RunTest script
+ will run the appropriate tests for you. The command "RunTest list" will
+ output a list of all the tests.
+
+ Note that the supplied files are in Unix format, with just LF characters
+ as line terminators. You may need to edit them to change this if your
+ system uses a different convention. If you are using Windows, you probably
+ should use the wintestinput3 file instead of testinput3 (and the
+ corresponding output file). This is a locale test; wintestinput3 sets the
+ locale to "french" rather than "fr_FR", and there some minor output
+ differences.
+
+(12) If you have built PCRE with SUPPORT_JIT, the JIT features will be tested
+ by the testdata files. However, you might also like to build and run
+ the freestanding JIT test program, pcre_jit_test.c.
+
+(13) If you want to use the pcregrep command, compile and link pcregrep.c; it
+ uses only the basic 8-bit PCRE library (it does not need the pcreposix
+ library).
+
+
+THE C++ WRAPPER FUNCTIONS
+
+The PCRE distribution also contains some C++ wrapper functions and tests,
+applicable to the 8-bit library, which were contributed by Google Inc. On a
+system that can use "configure" and "make", the functions are automatically
+built into a library called pcrecpp. It should be straightforward to compile
+the .cc files manually on other systems. The files called xxx_unittest.cc are
+test programs for each of the corresponding xxx.cc files.
+
+
+BUILDING FOR VIRTUAL PASCAL
+
+A script for building PCRE using Borland's C++ compiler for use with VPASCAL
+was contributed by Alexander Tokarev. Stefan Weber updated the script and added
+additional files. The following files in the distribution are for building PCRE
+for use with VP/Borland: makevp_c.txt, makevp_l.txt, makevp.bat, pcregexp.pas.
+
+
+STACK SIZE IN WINDOWS ENVIRONMENTS
+
+The default processor stack size of 1Mb in some Windows environments is too
+small for matching patterns that need much recursion. In particular, test 2 may
+fail because of this. Normally, running out of stack causes a crash, but there
+have been cases where the test program has just died silently. See your linker
+documentation for how to increase stack size if you experience problems. The
+Linux default of 8Mb is a reasonable choice for the stack, though even that can
+be too small for some pattern/subject combinations.
+
+PCRE has a compile configuration option to disable the use of stack for
+recursion so that heap is used instead. However, pattern matching is
+significantly slower when this is done. There is more about stack usage in the
+"pcrestack" documentation.
+
+
+LINKING PROGRAMS IN WINDOWS ENVIRONMENTS
+
+If you want to statically link a program against a PCRE library in the form of
+a non-dll .a file, you must define PCRE_STATIC before including pcre.h or
+pcrecpp.h, otherwise the pcre_malloc() and pcre_free() exported functions will
+be declared __declspec(dllimport), with unwanted results.
+
+
+CALLING CONVENTIONS IN WINDOWS ENVIRONMENTS
+
+It is possible to compile programs to use different calling conventions using
+MSVC. Search the web for "calling conventions" for more information. To make it
+easier to change the calling convention for the exported functions in the
+PCRE library, the macro PCRE_CALL_CONVENTION is present in all the external
+definitions. It can be set externally when compiling (e.g. in CFLAGS). If it is
+not set, it defaults to empty; the default calling convention is then used
+(which is what is wanted most of the time).
+
+
+COMMENTS ABOUT WIN32 BUILDS (see also "BUILDING PCRE ON WINDOWS WITH CMAKE")
+
+There are two ways of building PCRE using the "configure, make, make install"
+paradigm on Windows systems: using MinGW or using Cygwin. These are not at all
+the same thing; they are completely different from each other. There is also
+support for building using CMake, which some users find a more straightforward
+way of building PCRE under Windows.
+
+The MinGW home page (http://www.mingw.org/) says this:
+
+ MinGW: A collection of freely available and freely distributable Windows
+ specific header files and import libraries combined with GNU toolsets that
+ allow one to produce native Windows programs that do not rely on any
+ 3rd-party C runtime DLLs.
+
+The Cygwin home page (http://www.cygwin.com/) says this:
+
+ Cygwin is a Linux-like environment for Windows. It consists of two parts:
+
+ . A DLL (cygwin1.dll) which acts as a Linux API emulation layer providing
+ substantial Linux API functionality
+
+ . A collection of tools which provide Linux look and feel.
+
+ The Cygwin DLL currently works with all recent, commercially released x86 32
+ bit and 64 bit versions of Windows, with the exception of Windows CE.
+
+On both MinGW and Cygwin, PCRE should build correctly using:
+
+ ./configure && make && make install
+
+This should create two libraries called libpcre and libpcreposix, and, if you
+have enabled building the C++ wrapper, a third one called libpcrecpp. These are
+independent libraries: when you link with libpcreposix or libpcrecpp you must
+also link with libpcre, which contains the basic functions. (Some earlier
+releases of PCRE included the basic libpcre functions in libpcreposix. This no
+longer happens.)
+
+A user submitted a special-purpose patch that makes it easy to create
+"pcre.dll" under mingw32 using the "msys" environment. It provides "pcre.dll"
+as a special target. If you use this target, no other files are built, and in
+particular, the pcretest and pcregrep programs are not built. An example of how
+this might be used is:
+
+ ./configure --enable-utf --disable-cpp CFLAGS="-03 -s"; make pcre.dll
+
+Using Cygwin's compiler generates libraries and executables that depend on
+cygwin1.dll. If a library that is generated this way is distributed,
+cygwin1.dll has to be distributed as well. Since cygwin1.dll is under the GPL
+licence, this forces not only PCRE to be under the GPL, but also the entire
+application. A distributor who wants to keep their own code proprietary must
+purchase an appropriate Cygwin licence.
+
+MinGW has no such restrictions. The MinGW compiler generates a library or
+executable that can run standalone on Windows without any third party dll or
+licensing issues.
+
+But there is more complication:
+
+If a Cygwin user uses the -mno-cygwin Cygwin gcc flag, what that really does is
+to tell Cygwin's gcc to use the MinGW gcc. Cygwin's gcc is only acting as a
+front end to MinGW's gcc (if you install Cygwin's gcc, you get both Cygwin's
+gcc and MinGW's gcc). So, a user can:
+
+. Build native binaries by using MinGW or by getting Cygwin and using
+ -mno-cygwin.
+
+. Build binaries that depend on cygwin1.dll by using Cygwin with the normal
+ compiler flags.
+
+The test files that are supplied with PCRE are in UNIX format, with LF
+characters as line terminators. Unless your PCRE library uses a default newline
+option that includes LF as a valid newline, it may be necessary to change the
+line terminators in the test files to get some of the tests to work.
+
+
+BUILDING PCRE ON WINDOWS WITH CMAKE
+
+CMake is an alternative configuration facility that can be used instead of
+"configure". CMake creates project files (make files, solution files, etc.)
+tailored to numerous development environments, including Visual Studio,
+Borland, Msys, MinGW, NMake, and Unix. If possible, use short paths with no
+spaces in the names for your CMake installation and your PCRE source and build
+directories.
+
+The following instructions were contributed by a PCRE user. If they are not
+followed exactly, errors may occur. In the event that errors do occur, it is
+recommended that you delete the CMake cache before attempting to repeat the
+CMake build process. In the CMake GUI, the cache can be deleted by selecting
+"File > Delete Cache".
+
+1. Install the latest CMake version available from http://www.cmake.org/, and
+ ensure that cmake\bin is on your path.
+
+2. Unzip (retaining folder structure) the PCRE source tree into a source
+ directory such as C:\pcre. You should ensure your local date and time
+ is not earlier than the file dates in your source dir if the release is
+ very new.
+
+3. Create a new, empty build directory, preferably a subdirectory of the
+ source dir. For example, C:\pcre\pcre-xx\build.
+
+4. Run cmake-gui from the Shell envirornment of your build tool, for example,
+ Msys for Msys/MinGW or Visual Studio Command Prompt for VC/VC++. Do not try
+ to start Cmake from the Windows Start menu, as this can lead to errors.
+
+5. Enter C:\pcre\pcre-xx and C:\pcre\pcre-xx\build for the source and build
+ directories, respectively.
+
+6. Hit the "Configure" button.
+
+7. Select the particular IDE / build tool that you are using (Visual
+ Studio, MSYS makefiles, MinGW makefiles, etc.)
+
+8. The GUI will then list several configuration options. This is where
+ you can enable UTF-8 support or other PCRE optional features.
+
+9. Hit "Configure" again. The adjacent "Generate" button should now be
+ active.
+
+10. Hit "Generate".
+
+11. The build directory should now contain a usable build system, be it a
+ solution file for Visual Studio, makefiles for MinGW, etc. Exit from
+ cmake-gui and use the generated build system with your compiler or IDE.
+ E.g., for MinGW you can run "make", or for Visual Studio, open the PCRE
+ solution, select the desired configuration (Debug, or Release, etc.) and
+ build the ALL_BUILD project.
+
+12. If during configuration with cmake-gui you've elected to build the test
+ programs, you can execute them by building the test project. E.g., for
+ MinGW: "make test"; for Visual Studio build the RUN_TESTS project. The
+ most recent build configuration is targeted by the tests. A summary of
+ test results is presented. Complete test output is subsequently
+ available for review in Testing\Temporary under your build dir.
+
+
+USE OF RELATIVE PATHS WITH CMAKE ON WINDOWS
+
+A PCRE user comments as follows: I thought that others may want to know the
+current state of CMAKE_USE_RELATIVE_PATHS support on Windows. Here it is:
+
+-- AdditionalIncludeDirectories is only partially modified (only the
+ first path - see below)
+-- Only some of the contained file paths are modified - shown below for
+ pcre.vcproj
+-- It properly modifies
+
+I am sure CMake people can fix that if they want to. Until then one will
+need to replace existing absolute paths in project files with relative
+paths manually (e.g. from VS) - relative to project file location. I did
+just that before being told to try CMAKE_USE_RELATIVE_PATHS. Not a big
+deal.
+
+AdditionalIncludeDirectories="E:\builds\pcre\build;E:\builds\pcre\pcre-7.5;"
+AdditionalIncludeDirectories=".;E:\builds\pcre\pcre-7.5;"
+
+RelativePath="pcre.h"
+RelativePath="pcre_chartables.c"
+RelativePath="pcre_chartables.c.rule"
+
+
+TESTING WITH RUNTEST.BAT
+
+If configured with CMake, building the test project ("make test" or building
+ALL_TESTS in Visual Studio) creates (and runs) pcre_test.bat (and depending
+on your configuration options, possibly other test programs) in the build
+directory. Pcre_test.bat runs RunTest.Bat with correct source and exe paths.
+
+For manual testing with RunTest.bat, provided the build dir is a subdirectory
+of the source directory: Open command shell window. Chdir to the location
+of your pcretest.exe and pcregrep.exe programs. Call RunTest.bat with
+"..\RunTest.Bat" or "..\..\RunTest.bat" as appropriate.
+
+To run only a particular test with RunTest.Bat provide a test number argument.
+
+Otherwise:
+
+1. Copy RunTest.bat into the directory where pcretest.exe and pcregrep.exe
+ have been created.
+
+2. Edit RunTest.bat to indentify the full or relative location of
+ the pcre source (wherein which the testdata folder resides), e.g.:
+
+ set srcdir=C:\pcre\pcre-8.20
+
+3. In a Windows command environment, chdir to the location of your bat and
+ exe programs.
+
+4. Run RunTest.bat. Test outputs will automatically be compared to expected
+ results, and discrepancies will be identified in the console output.
+
+To independently test the just-in-time compiler, run pcre_jit_test.exe.
+To test pcrecpp, run pcrecpp_unittest.exe, pcre_stringpiece_unittest.exe and
+pcre_scanner_unittest.exe.
+
+
+BUILDING UNDER WINDOWS CE WITH VISUAL STUDIO 200x
+
+Vincent Richomme sent a zip archive of files to help with this process. They
+can be found in the file "pcre-vsbuild.zip" in the Contrib directory of the FTP
+site.
+
+
+BUILDING UNDER WINDOWS WITH BCC5.5
+
+Michael Roy sent these comments about building PCRE under Windows with BCC5.5:
+
+Some of the core BCC libraries have a version of PCRE from 1998 built in, which
+can lead to pcre_exec() giving an erroneous PCRE_ERROR_NULL from a version
+mismatch. I'm including an easy workaround below, if you'd like to include it
+in the non-unix instructions:
+
+When linking a project with BCC5.5, pcre.lib must be included before any of the
+libraries cw32.lib, cw32i.lib, cw32mt.lib, and cw32mti.lib on the command line.
+
+
+BUILDING USING BORLAND C++ BUILDER 2007 (CB2007) AND HIGHER
+
+A PCRE user sent these comments about this environment (see also the comment
+from another user that follows them):
+
+The XE versions of C++ Builder come with a RegularExpressionsCore class which
+contain a version of TPerlRegEx. However, direct use of the C PCRE library may
+be desirable.
+
+The default makevp.bat, however, supplied with PCRE builds a version of PCRE
+that is not usable with any version of C++ Builder because the compiler ships
+with an embedded version of PCRE, version 2.01 from 1998! [See also the note
+about BCC5.5 above.] If you want to use PCRE you'll need to rename the
+functions (pcre_compile to pcre_compile_bcc, etc) or do as I have done and just
+use the 16 bit versions. I'm using std::wstring everywhere anyway. Since the
+embedded version of PCRE does not have the 16 bit function names, there is no
+conflict.
+
+Building PCRE using a C++ Builder static library project file (recommended):
+
+1. Rename or remove pcre.h, pcreposi.h, and pcreposix.h from your C++ Builder
+original include path.
+
+2. Download PCRE from pcre.org and extract to a directory.
+
+3. Rename pcre_chartables.c.dist to pcre_chartables.c, pcre.h.generic to
+pcre.h, and config.h.generic to config.h.
+
+4. Edit pcre.h and pcre_config.c so that they include config.h.
+
+5. Edit config.h like so:
+
+Comment out the following lines:
+#define PACKAGE "pcre"
+#define PACKAGE_BUGREPORT ""
+#define PACKAGE_NAME "PCRE"
+#define PACKAGE_STRING "PCRE 8.32"
+#define PACKAGE_TARNAME "pcre"
+#define PACKAGE_URL ""
+#define PACKAGE_VERSION "8.32"
+
+Add the following lines:
+#ifndef SUPPORT_UTF
+#define SUPPORT_UTF 100 // any value is fine
+#endif
+
+#ifndef SUPPORT_UCP
+#define SUPPORT_UCP 101 // any value is fine
+#endif
+
+#ifndef SUPPORT_UCP
+#define SUPPORT_PCRE16 102 // any value is fine
+#endif
+
+#ifndef SUPPORT_UTF8
+#define SUPPORT_UTF8 103 // any value is fine
+#endif
+
+6. Build a C++ Builder project using the IDE. Go to File / New / Other and
+choose Static Library. You can name it pcre.cbproj or whatever. Now set your
+paths by going to Project / Options. Set the Include path. Do this from the
+"Base" option to apply to both Release and Debug builds. Now add the following
+files to the project:
+
+pcre.h
+pcre16_byte_order.c
+pcre16_chartables.c
+pcre16_compile.c
+pcre16_config.c
+pcre16_dfa_exec.c
+pcre16_exec.c
+pcre16_fullinfo.c
+pcre16_get.c
+pcre16_globals.c
+pcre16_maketables.c
+pcre16_newline.c
+pcre16_ord2utf16.c
+pcre16_printint.c
+pcre16_refcount.c
+pcre16_string_utils.c
+pcre16_study.c
+pcre16_tables.c
+pcre16_ucd.c
+pcre16_utf16_utils.c
+pcre16_valid_utf16.c
+pcre16_version.c
+pcre16_xclass.c
+
+//Optional
+pcre_version.c
+
+7. After compiling the .lib file, copy the .lib and header files to a project
+you want to use PCRE with. Enjoy.
+
+Optional ... Building PCRE using the makevp.bat file:
+
+1. Edit makevp_c.txt and makevp_l.txt and change all the names to the 16 bit
+versions.
+
+2. Edit makevp.bat and set the path to C++ Builder. Run makevp.bat.
+
+Another PCRE user added this comment:
+
+Another approach I successfully used for some years with BCB 5 and 6 was to
+make sure that include and library paths of PCRE are configured before the
+default paths of the IDE in the dialogs where one can manage those paths.
+Afterwards one can open the project files using a text editor and manually add
+the self created library for pcre itself, pcrecpp doesn't ship with the IDE, in
+the library nodes where the IDE manages its own libraries to link against in
+front of the IDE-own libraries. This way one can use the default PCRE function
+names without getting access violations on runtime.
+
+ <ALLLIB value="libpcre.lib $(LIBFILES) $(LIBRARIES) import32.lib cp32mt.lib"/>
+
+
+BUILDING PCRE ON OPENVMS
+
+Stephen Hoffman sent the following, in December 2012:
+
+"Here <http://labs.hoffmanlabs.com/node/1847> is a very short write-up on the
+OpenVMS port and here
+
+<http://labs.hoffmanlabs.com/labsnotes/pcre-vms-8_32.zip>
+
+is a zip with the OpenVMS files, and with one modified testing-related PCRE
+file." This is a port of PCRE 8.32.
+
+Earlier, Dan Mooney sent the following comments about building PCRE on OpenVMS.
+They relate to an older version of PCRE that used fewer source files, so the
+exact commands will need changing. See the current list of source files above.
+
+"It was quite easy to compile and link the library. I don't have a formal
+make file but the attached file [reproduced below] contains the OpenVMS DCL
+commands I used to build the library. I had to add #define
+POSIX_MALLOC_THRESHOLD 10 to pcre.h since it was not defined anywhere.
+
+The library was built on:
+O/S: HP OpenVMS v7.3-1
+Compiler: Compaq C v6.5-001-48BCD
+Linker: vA13-01
+
+The test results did not match 100% due to the issues you mention in your
+documentation regarding isprint(), iscntrl(), isgraph() and ispunct(). I
+modified some of the character tables temporarily and was able to get the
+results to match. Tests using the fr locale did not match since I don't have
+that locale loaded. The study size was always reported to be 3 less than the
+value in the standard test output files."
+
+=========================
+$! This DCL procedure builds PCRE on OpenVMS
+$!
+$! I followed the instructions in the non-unix-use file in the distribution.
+$!
+$ COMPILE == "CC/LIST/NOMEMBER_ALIGNMENT/PREFIX_LIBRARY_ENTRIES=ALL_ENTRIES
+$ COMPILE DFTABLES.C
+$ LINK/EXE=DFTABLES.EXE DFTABLES.OBJ
+$ RUN DFTABLES.EXE/OUTPUT=CHARTABLES.C
+$ COMPILE MAKETABLES.C
+$ COMPILE GET.C
+$ COMPILE STUDY.C
+$! I had to set POSIX_MALLOC_THRESHOLD to 10 in PCRE.H since the symbol
+$! did not seem to be defined anywhere.
+$! I edited pcre.h and added #DEFINE SUPPORT_UTF8 to enable UTF8 support.
+$ COMPILE PCRE.C
+$ LIB/CREATE PCRE MAKETABLES.OBJ, GET.OBJ, STUDY.OBJ, PCRE.OBJ
+$! I had to set POSIX_MALLOC_THRESHOLD to 10 in PCRE.H since the symbol
+$! did not seem to be defined anywhere.
+$ COMPILE PCREPOSIX.C
+$ LIB/CREATE PCREPOSIX PCREPOSIX.OBJ
+$ COMPILE PCRETEST.C
+$ LINK/EXE=PCRETEST.EXE PCRETEST.OBJ, PCRE/LIB, PCREPOSIX/LIB
+$! C programs that want access to command line arguments must be
+$! defined as a symbol
+$ PCRETEST :== "$ SYS$ROADSUSERS:[DMOONEY.REGEXP]PCRETEST.EXE"
+$! Arguments must be enclosed in quotes.
+$ PCRETEST "-C"
+$! Test results:
+$!
+$! The test results did not match 100%. The functions isprint(), iscntrl(),
+$! isgraph() and ispunct() on OpenVMS must not produce the same results
+$! as the system that built the test output files provided with the
+$! distribution.
+$!
+$! The study size did not match and was always 3 less on OpenVMS.
+$!
+$! Locale could not be set to fr
+$!
+=========================
+
+
+BUILDING PCRE ON STRATUS OPENVOS
+
+These notes on the port of PCRE to VOS (lightly edited) were supplied by
+Ashutosh Warikoo, whose email address has the local part awarikoo and the
+domain nse.co.in. The port was for version 7.9 in August 2009.
+
+1. Building PCRE
+
+I built pcre on OpenVOS Release 17.0.1at using GNU Tools 3.4a without any
+problems. I used the following packages to build PCRE:
+
+ ftp://ftp.stratus.com/pub/vos/posix/ga/posix.save.evf.gz
+
+Please read and follow the instructions that come with these packages. To start
+the build of pcre, from the root of the package type:
+
+ ./build.sh
+
+2. Installing PCRE
+
+Once you have successfully built PCRE, login to the SysAdmin group, switch to
+the root user, and type
+
+ [ !create_dir (master_disk)>usr --if needed ]
+ [ !create_dir (master_disk)>usr>local --if needed ]
+ !gmake install
+
+This installs PCRE and its man pages into /usr/local. You can add
+(master_disk)>usr>local>bin to your command search paths, or if you are in
+BASH, add /usr/local/bin to the PATH environment variable.
+
+4. Restrictions
+
+This port requires readline library optionally. However during the build I
+faced some yet unexplored errors while linking with readline. As it was an
+optional component I chose to disable it.
+
+5. Known Problems
+
+I ran the test suite, but you will have to be your own judge of whether this
+command, and this port, suits your purposes. If you find any problems that
+appear to be related to the port itself, please let me know. Please see the
+build.log file in the root of the package also.
+
+
+BUILDING PCRE ON NATIVE Z/OS AND Z/VM
+
+z/OS and z/VM are operating systems for mainframe computers, produced by IBM.
+The character code used is EBCDIC, not ASCII or Unicode. In z/OS, UNIX APIs and
+applications can be supported through UNIX System Services, and in such an
+environment PCRE can be built in the same way as in other systems. However, in
+native z/OS (without UNIX System Services) and in z/VM, special ports are
+required. For details, please see this web site:
+
+ http://www.zaconsultants.net
+
+There is also a mirror here:
+
+ http://www.vsoft-software.com/downloads.html
+
+==========================
+Last Updated: 14 May 2013
diff --git a/doc/html/README.txt b/doc/html/README.txt
new file mode 100644
index 0000000..88f2dfd
--- /dev/null
+++ b/doc/html/README.txt
@@ -0,0 +1,991 @@
+README file for PCRE (Perl-compatible regular expression library)
+-----------------------------------------------------------------
+
+The latest release of PCRE is always available in three alternative formats
+from:
+
+ ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.gz
+ ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.bz2
+ ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.zip
+
+There is a mailing list for discussion about the development of PCRE at
+pcre-dev@exim.org. You can access the archives and subscribe or manage your
+subscription here:
+
+ https://lists.exim.org/mailman/listinfo/pcre-dev
+
+Please read the NEWS file if you are upgrading from a previous release.
+The contents of this README file are:
+
+ The PCRE APIs
+ Documentation for PCRE
+ Contributions by users of PCRE
+ Building PCRE on non-Unix-like systems
+ Building PCRE without using autotools
+ Building PCRE using autotools
+ Retrieving configuration information
+ Shared libraries
+ Cross-compiling using autotools
+ Using HP's ANSI C++ compiler (aCC)
+ Compiling in Tru64 using native compilers
+ Using Sun's compilers for Solaris
+ Using PCRE from MySQL
+ Making new tarballs
+ Testing PCRE
+ Character tables
+ File manifest
+
+
+The PCRE APIs
+-------------
+
+PCRE is written in C, and it has its own API. There are three sets of
+functions, one for the 8-bit library, which processes strings of bytes, one for
+the 16-bit library, which processes strings of 16-bit values, and one for the
+32-bit library, which processes strings of 32-bit values. The distribution also
+includes a set of C++ wrapper functions (see the pcrecpp man page for details),
+courtesy of Google Inc., which can be used to call the 8-bit PCRE library from
+C++.
+
+In addition, there is a set of C wrapper functions (again, just for the 8-bit
+library) that are based on the POSIX regular expression API (see the pcreposix
+man page). These end up in the library called libpcreposix. Note that this just
+provides a POSIX calling interface to PCRE; the regular expressions themselves
+still follow Perl syntax and semantics. The POSIX API is restricted, and does
+not give full access to all of PCRE's facilities.
+
+The header file for the POSIX-style functions is called pcreposix.h. The
+official POSIX name is regex.h, but I did not want to risk possible problems
+with existing files of that name by distributing it that way. To use PCRE with
+an existing program that uses the POSIX API, pcreposix.h will have to be
+renamed or pointed at by a link.
+
+If you are using the POSIX interface to PCRE and there is already a POSIX regex
+library installed on your system, as well as worrying about the regex.h header
+file (as mentioned above), you must also take care when linking programs to
+ensure that they link with PCRE's libpcreposix library. Otherwise they may pick
+up the POSIX functions of the same name from the other library.
+
+One way of avoiding this confusion is to compile PCRE with the addition of
+-Dregcomp=PCREregcomp (and similarly for the other POSIX functions) to the
+compiler flags (CFLAGS if you are using "configure" -- see below). This has the
+effect of renaming the functions so that the names no longer clash. Of course,
+you have to do the same thing for your applications, or write them using the
+new names.
+
+
+Documentation for PCRE
+----------------------
+
+If you install PCRE in the normal way on a Unix-like system, you will end up
+with a set of man pages whose names all start with "pcre". The one that is just
+called "pcre" lists all the others. In addition to these man pages, the PCRE
+documentation is supplied in two other forms:
+
+ 1. There are files called doc/pcre.txt, doc/pcregrep.txt, and
+ doc/pcretest.txt in the source distribution. The first of these is a
+ concatenation of the text forms of all the section 3 man pages except
+ the listing of pcredemo.c and those that summarize individual functions.
+ The other two are the text forms of the section 1 man pages for the
+ pcregrep and pcretest commands. These text forms are provided for ease of
+ scanning with text editors or similar tools. They are installed in
+ <prefix>/share/doc/pcre, where <prefix> is the installation prefix
+ (defaulting to /usr/local).
+
+ 2. A set of files containing all the documentation in HTML form, hyperlinked
+ in various ways, and rooted in a file called index.html, is distributed in
+ doc/html and installed in <prefix>/share/doc/pcre/html.
+
+Users of PCRE have contributed files containing the documentation for various
+releases in CHM format. These can be found in the Contrib directory of the FTP
+site (see next section).
+
+
+Contributions by users of PCRE
+------------------------------
+
+You can find contributions from PCRE users in the directory
+
+ ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib
+
+There is a README file giving brief descriptions of what they are. Some are
+complete in themselves; others are pointers to URLs containing relevant files.
+Some of this material is likely to be well out-of-date. Several of the earlier
+contributions provided support for compiling PCRE on various flavours of
+Windows (I myself do not use Windows). Nowadays there is more Windows support
+in the standard distribution, so these contibutions have been archived.
+
+A PCRE user maintains downloadable Windows binaries of the pcregrep and
+pcretest programs here:
+
+ http://www.rexegg.com/pcregrep-pcretest.html
+
+
+Building PCRE on non-Unix-like systems
+--------------------------------------
+
+For a non-Unix-like system, please read the comments in the file
+NON-AUTOTOOLS-BUILD, though if your system supports the use of "configure" and
+"make" you may be able to build PCRE using autotools in the same way as for
+many Unix-like systems.
+
+PCRE can also be configured using the GUI facility provided by CMake's
+cmake-gui command. This creates Makefiles, solution files, etc. The file
+NON-AUTOTOOLS-BUILD has information about CMake.
+
+PCRE has been compiled on many different operating systems. It should be
+straightforward to build PCRE on any system that has a Standard C compiler and
+library, because it uses only Standard C functions.
+
+
+Building PCRE without using autotools
+-------------------------------------
+
+The use of autotools (in particular, libtool) is problematic in some
+environments, even some that are Unix or Unix-like. See the NON-AUTOTOOLS-BUILD
+file for ways of building PCRE without using autotools.
+
+
+Building PCRE using autotools
+-----------------------------
+
+If you are using HP's ANSI C++ compiler (aCC), please see the special note
+in the section entitled "Using HP's ANSI C++ compiler (aCC)" below.
+
+The following instructions assume the use of the widely used "configure; make;
+make install" (autotools) process.
+
+To build PCRE on system that supports autotools, first run the "configure"
+command from the PCRE distribution directory, with your current directory set
+to the directory where you want the files to be created. This command is a
+standard GNU "autoconf" configuration script, for which generic instructions
+are supplied in the file INSTALL.
+
+Most commonly, people build PCRE within its own distribution directory, and in
+this case, on many systems, just running "./configure" is sufficient. However,
+the usual methods of changing standard defaults are available. For example:
+
+CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local
+
+This command specifies that the C compiler should be run with the flags '-O2
+-Wall' instead of the default, and that "make install" should install PCRE
+under /opt/local instead of the default /usr/local.
+
+If you want to build in a different directory, just run "configure" with that
+directory as current. For example, suppose you have unpacked the PCRE source
+into /source/pcre/pcre-xxx, but you want to build it in /build/pcre/pcre-xxx:
+
+cd /build/pcre/pcre-xxx
+/source/pcre/pcre-xxx/configure
+
+PCRE is written in C and is normally compiled as a C library. However, it is
+possible to build it as a C++ library, though the provided building apparatus
+does not have any features to support this.
+
+There are some optional features that can be included or omitted from the PCRE
+library. They are also documented in the pcrebuild man page.
+
+. By default, both shared and static libraries are built. You can change this
+ by adding one of these options to the "configure" command:
+
+ --disable-shared
+ --disable-static
+
+ (See also "Shared libraries on Unix-like systems" below.)
+
+. By default, only the 8-bit library is built. If you add --enable-pcre16 to
+ the "configure" command, the 16-bit library is also built. If you add
+ --enable-pcre32 to the "configure" command, the 32-bit library is also built.
+ If you want only the 16-bit or 32-bit library, use --disable-pcre8 to disable
+ building the 8-bit library.
+
+. If you are building the 8-bit library and want to suppress the building of
+ the C++ wrapper library, you can add --disable-cpp to the "configure"
+ command. Otherwise, when "configure" is run without --disable-pcre8, it will
+ try to find a C++ compiler and C++ header files, and if it succeeds, it will
+ try to build the C++ wrapper.
+
+. If you want to include support for just-in-time compiling, which can give
+ large performance improvements on certain platforms, add --enable-jit to the
+ "configure" command. This support is available only for certain hardware
+ architectures. If you try to enable it on an unsupported architecture, there
+ will be a compile time error.
+
+. When JIT support is enabled, pcregrep automatically makes use of it, unless
+ you add --disable-pcregrep-jit to the "configure" command.
+
+. If you want to make use of the support for UTF-8 Unicode character strings in
+ the 8-bit library, or UTF-16 Unicode character strings in the 16-bit library,
+ or UTF-32 Unicode character strings in the 32-bit library, you must add
+ --enable-utf to the "configure" command. Without it, the code for handling
+ UTF-8, UTF-16 and UTF-8 is not included in the relevant library. Even
+ when --enable-utf is included, the use of a UTF encoding still has to be
+ enabled by an option at run time. When PCRE is compiled with this option, its
+ input can only either be ASCII or UTF-8/16/32, even when running on EBCDIC
+ platforms. It is not possible to use both --enable-utf and --enable-ebcdic at
+ the same time.
+
+. There are no separate options for enabling UTF-8, UTF-16 and UTF-32
+ independently because that would allow ridiculous settings such as requesting
+ UTF-16 support while building only the 8-bit library. However, the option
+ --enable-utf8 is retained for backwards compatibility with earlier releases
+ that did not support 16-bit or 32-bit character strings. It is synonymous with
+ --enable-utf. It is not possible to configure one library with UTF support
+ and the other without in the same configuration.
+
+. If, in addition to support for UTF-8/16/32 character strings, you want to
+ include support for the \P, \p, and \X sequences that recognize Unicode
+ character properties, you must add --enable-unicode-properties to the
+ "configure" command. This adds about 30K to the size of the library (in the
+ form of a property table); only the basic two-letter properties such as Lu
+ are supported.
+
+. You can build PCRE to recognize either CR or LF or the sequence CRLF or any
+ of the preceding, or any of the Unicode newline sequences as indicating the
+ end of a line. Whatever you specify at build time is the default; the caller
+ of PCRE can change the selection at run time. The default newline indicator
+ is a single LF character (the Unix standard). You can specify the default
+ newline indicator by adding --enable-newline-is-cr or --enable-newline-is-lf
+ or --enable-newline-is-crlf or --enable-newline-is-anycrlf or
+ --enable-newline-is-any to the "configure" command, respectively.
+
+ If you specify --enable-newline-is-cr or --enable-newline-is-crlf, some of
+ the standard tests will fail, because the lines in the test files end with
+ LF. Even if the files are edited to change the line endings, there are likely
+ to be some failures. With --enable-newline-is-anycrlf or
+ --enable-newline-is-any, many tests should succeed, but there may be some
+ failures.
+
+. By default, the sequence \R in a pattern matches any Unicode line ending
+ sequence. This is independent of the option specifying what PCRE considers to
+ be the end of a line (see above). However, the caller of PCRE can restrict \R
+ to match only CR, LF, or CRLF. You can make this the default by adding
+ --enable-bsr-anycrlf to the "configure" command (bsr = "backslash R").
+
+. When called via the POSIX interface, PCRE uses malloc() to get additional
+ storage for processing capturing parentheses if there are more than 10 of
+ them in a pattern. You can increase this threshold by setting, for example,
+
+ --with-posix-malloc-threshold=20
+
+ on the "configure" command.
+
+. PCRE has a counter that limits the depth of nesting of parentheses in a
+ pattern. This limits the amount of system stack that a pattern uses when it
+ is compiled. The default is 250, but you can change it by setting, for
+ example,
+
+ --with-parens-nest-limit=500
+
+. PCRE has a counter that can be set to limit the amount of resources it uses
+ when matching a pattern. If the limit is exceeded during a match, the match
+ fails. The default is ten million. You can change the default by setting, for
+ example,
+
+ --with-match-limit=500000
+
+ on the "configure" command. This is just the default; individual calls to
+ pcre_exec() can supply their own value. There is more discussion on the
+ pcreapi man page.
+
+. There is a separate counter that limits the depth of recursive function calls
+ during a matching process. This also has a default of ten million, which is
+ essentially "unlimited". You can change the default by setting, for example,
+
+ --with-match-limit-recursion=500000
+
+ Recursive function calls use up the runtime stack; running out of stack can
+ cause programs to crash in strange ways. There is a discussion about stack
+ sizes in the pcrestack man page.
+
+. The default maximum compiled pattern size is around 64K. You can increase
+ this by adding --with-link-size=3 to the "configure" command. In the 8-bit
+ library, PCRE then uses three bytes instead of two for offsets to different
+ parts of the compiled pattern. In the 16-bit library, --with-link-size=3 is
+ the same as --with-link-size=4, which (in both libraries) uses four-byte
+ offsets. Increasing the internal link size reduces performance. In the 32-bit
+ library, the only supported link size is 4.
+
+. You can build PCRE so that its internal match() function that is called from
+ pcre_exec() does not call itself recursively. Instead, it uses memory blocks
+ obtained from the heap via the special functions pcre_stack_malloc() and
+ pcre_stack_free() to save data that would otherwise be saved on the stack. To
+ build PCRE like this, use
+
+ --disable-stack-for-recursion
+
+ on the "configure" command. PCRE runs more slowly in this mode, but it may be
+ necessary in environments with limited stack sizes. This applies only to the
+ normal execution of the pcre_exec() function; if JIT support is being
+ successfully used, it is not relevant. Equally, it does not apply to
+ pcre_dfa_exec(), which does not use deeply nested recursion. There is a
+ discussion about stack sizes in the pcrestack man page.
+
+. For speed, PCRE uses four tables for manipulating and identifying characters
+ whose code point values are less than 256. By default, it uses a set of
+ tables for ASCII encoding that is part of the distribution. If you specify
+
+ --enable-rebuild-chartables
+
+ a program called dftables is compiled and run in the default C locale when
+ you obey "make". It builds a source file called pcre_chartables.c. If you do
+ not specify this option, pcre_chartables.c is created as a copy of
+ pcre_chartables.c.dist. See "Character tables" below for further information.
+
+. It is possible to compile PCRE for use on systems that use EBCDIC as their
+ character code (as opposed to ASCII/Unicode) by specifying
+
+ --enable-ebcdic
+
+ This automatically implies --enable-rebuild-chartables (see above). However,
+ when PCRE is built this way, it always operates in EBCDIC. It cannot support
+ both EBCDIC and UTF-8/16/32. There is a second option, --enable-ebcdic-nl25,
+ which specifies that the code value for the EBCDIC NL character is 0x25
+ instead of the default 0x15.
+
+. In environments where valgrind is installed, if you specify
+
+ --enable-valgrind
+
+ PCRE will use valgrind annotations to mark certain memory regions as
+ unaddressable. This allows it to detect invalid memory accesses, and is
+ mostly useful for debugging PCRE itself.
+
+. In environments where the gcc compiler is used and lcov version 1.6 or above
+ is installed, if you specify
+
+ --enable-coverage
+
+ the build process implements a code coverage report for the test suite. The
+ report is generated by running "make coverage". If ccache is installed on
+ your system, it must be disabled when building PCRE for coverage reporting.
+ You can do this by setting the environment variable CCACHE_DISABLE=1 before
+ running "make" to build PCRE. There is more information about coverage
+ reporting in the "pcrebuild" documentation.
+
+. The pcregrep program currently supports only 8-bit data files, and so
+ requires the 8-bit PCRE library. It is possible to compile pcregrep to use
+ libz and/or libbz2, in order to read .gz and .bz2 files (respectively), by
+ specifying one or both of
+
+ --enable-pcregrep-libz
+ --enable-pcregrep-libbz2
+
+ Of course, the relevant libraries must be installed on your system.
+
+. The default size (in bytes) of the internal buffer used by pcregrep can be
+ set by, for example:
+
+ --with-pcregrep-bufsize=51200
+
+ The value must be a plain integer. The default is 20480.
+
+. It is possible to compile pcretest so that it links with the libreadline
+ or libedit libraries, by specifying, respectively,
+
+ --enable-pcretest-libreadline or --enable-pcretest-libedit
+
+ If this is done, when pcretest's input is from a terminal, it reads it using
+ the readline() function. This provides line-editing and history facilities.
+ Note that libreadline is GPL-licenced, so if you distribute a binary of
+ pcretest linked in this way, there may be licensing issues. These can be
+ avoided by linking with libedit (which has a BSD licence) instead.
+
+ Enabling libreadline causes the -lreadline option to be added to the pcretest
+ build. In many operating environments with a sytem-installed readline
+ library this is sufficient. However, in some environments (e.g. if an
+ unmodified distribution version of readline is in use), it may be necessary
+ to specify something like LIBS="-lncurses" as well. This is because, to quote
+ the readline INSTALL, "Readline uses the termcap functions, but does not link
+ with the termcap or curses library itself, allowing applications which link
+ with readline the to choose an appropriate library." If you get error
+ messages about missing functions tgetstr, tgetent, tputs, tgetflag, or tgoto,
+ this is the problem, and linking with the ncurses library should fix it.
+
+The "configure" script builds the following files for the basic C library:
+
+. Makefile the makefile that builds the library
+. config.h build-time configuration options for the library
+. pcre.h the public PCRE header file
+. pcre-config script that shows the building settings such as CFLAGS
+ that were set for "configure"
+. libpcre.pc ) data for the pkg-config command
+. libpcre16.pc )
+. libpcre32.pc )
+. libpcreposix.pc )
+. libtool script that builds shared and/or static libraries
+
+Versions of config.h and pcre.h are distributed in the PCRE tarballs under the
+names config.h.generic and pcre.h.generic. These are provided for those who
+have to built PCRE without using "configure" or CMake. If you use "configure"
+or CMake, the .generic versions are not used.
+
+When building the 8-bit library, if a C++ compiler is found, the following
+files are also built:
+
+. libpcrecpp.pc data for the pkg-config command
+. pcrecpparg.h header file for calling PCRE via the C++ wrapper
+. pcre_stringpiece.h header for the C++ "stringpiece" functions
+
+The "configure" script also creates config.status, which is an executable
+script that can be run to recreate the configuration, and config.log, which
+contains compiler output from tests that "configure" runs.
+
+Once "configure" has run, you can run "make". This builds the the libraries
+libpcre, libpcre16 and/or libpcre32, and a test program called pcretest. If you
+enabled JIT support with --enable-jit, a test program called pcre_jit_test is
+built as well.
+
+If the 8-bit library is built, libpcreposix and the pcregrep command are also
+built, and if a C++ compiler was found on your system, and you did not disable
+it with --disable-cpp, "make" builds the C++ wrapper library, which is called
+libpcrecpp, as well as some test programs called pcrecpp_unittest,
+pcre_scanner_unittest, and pcre_stringpiece_unittest.
+
+The command "make check" runs all the appropriate tests. Details of the PCRE
+tests are given below in a separate section of this document.
+
+You can use "make install" to install PCRE into live directories on your
+system. The following are installed (file names are all relative to the
+<prefix> that is set when "configure" is run):
+
+ Commands (bin):
+ pcretest
+ pcregrep (if 8-bit support is enabled)
+ pcre-config
+
+ Libraries (lib):
+ libpcre16 (if 16-bit support is enabled)
+ libpcre32 (if 32-bit support is enabled)
+ libpcre (if 8-bit support is enabled)
+ libpcreposix (if 8-bit support is enabled)
+ libpcrecpp (if 8-bit and C++ support is enabled)
+
+ Configuration information (lib/pkgconfig):
+ libpcre16.pc
+ libpcre32.pc
+ libpcre.pc
+ libpcreposix.pc
+ libpcrecpp.pc (if C++ support is enabled)
+
+ Header files (include):
+ pcre.h
+ pcreposix.h
+ pcre_scanner.h )
+ pcre_stringpiece.h ) if C++ support is enabled
+ pcrecpp.h )
+ pcrecpparg.h )
+
+ Man pages (share/man/man{1,3}):
+ pcregrep.1
+ pcretest.1
+ pcre-config.1
+ pcre.3
+ pcre*.3 (lots more pages, all starting "pcre")
+
+ HTML documentation (share/doc/pcre/html):
+ index.html
+ *.html (lots more pages, hyperlinked from index.html)
+
+ Text file documentation (share/doc/pcre):
+ AUTHORS
+ COPYING
+ ChangeLog
+ LICENCE
+ NEWS
+ README
+ pcre.txt (a concatenation of the man(3) pages)
+ pcretest.txt the pcretest man page
+ pcregrep.txt the pcregrep man page
+ pcre-config.txt the pcre-config man page
+
+If you want to remove PCRE from your system, you can run "make uninstall".
+This removes all the files that "make install" installed. However, it does not
+remove any directories, because these are often shared with other programs.
+
+
+Retrieving configuration information
+------------------------------------
+
+Running "make install" installs the command pcre-config, which can be used to
+recall information about the PCRE configuration and installation. For example:
+
+ pcre-config --version
+
+prints the version number, and
+
+ pcre-config --libs
+
+outputs information about where the library is installed. This command can be
+included in makefiles for programs that use PCRE, saving the programmer from
+having to remember too many details.
+
+The pkg-config command is another system for saving and retrieving information
+about installed libraries. Instead of separate commands for each library, a
+single command is used. For example:
+
+ pkg-config --cflags pcre
+
+The data is held in *.pc files that are installed in a directory called
+<prefix>/lib/pkgconfig.
+
+
+Shared libraries
+----------------
+
+The default distribution builds PCRE as shared libraries and static libraries,
+as long as the operating system supports shared libraries. Shared library
+support relies on the "libtool" script which is built as part of the
+"configure" process.
+
+The libtool script is used to compile and link both shared and static
+libraries. They are placed in a subdirectory called .libs when they are newly
+built. The programs pcretest and pcregrep are built to use these uninstalled
+libraries (by means of wrapper scripts in the case of shared libraries). When
+you use "make install" to install shared libraries, pcregrep and pcretest are
+automatically re-built to use the newly installed shared libraries before being
+installed themselves. However, the versions left in the build directory still
+use the uninstalled libraries.
+
+To build PCRE using static libraries only you must use --disable-shared when
+configuring it. For example:
+
+./configure --prefix=/usr/gnu --disable-shared
+
+Then run "make" in the usual way. Similarly, you can use --disable-static to
+build only shared libraries.
+
+
+Cross-compiling using autotools
+-------------------------------
+
+You can specify CC and CFLAGS in the normal way to the "configure" command, in
+order to cross-compile PCRE for some other host. However, you should NOT
+specify --enable-rebuild-chartables, because if you do, the dftables.c source
+file is compiled and run on the local host, in order to generate the inbuilt
+character tables (the pcre_chartables.c file). This will probably not work,
+because dftables.c needs to be compiled with the local compiler, not the cross
+compiler.
+
+When --enable-rebuild-chartables is not specified, pcre_chartables.c is created
+by making a copy of pcre_chartables.c.dist, which is a default set of tables
+that assumes ASCII code. Cross-compiling with the default tables should not be
+a problem.
+
+If you need to modify the character tables when cross-compiling, you should
+move pcre_chartables.c.dist out of the way, then compile dftables.c by hand and
+run it on the local host to make a new version of pcre_chartables.c.dist.
+Then when you cross-compile PCRE this new version of the tables will be used.
+
+
+Using HP's ANSI C++ compiler (aCC)
+----------------------------------
+
+Unless C++ support is disabled by specifying the "--disable-cpp" option of the
+"configure" script, you must include the "-AA" option in the CXXFLAGS
+environment variable in order for the C++ components to compile correctly.
+
+Also, note that the aCC compiler on PA-RISC platforms may have a defect whereby
+needed libraries fail to get included when specifying the "-AA" compiler
+option. If you experience unresolved symbols when linking the C++ programs,
+use the workaround of specifying the following environment variable prior to
+running the "configure" script:
+
+ CXXLDFLAGS="-lstd_v2 -lCsup_v2"
+
+
+Compiling in Tru64 using native compilers
+-----------------------------------------
+
+The following error may occur when compiling with native compilers in the Tru64
+operating system:
+
+ CXX libpcrecpp_la-pcrecpp.lo
+cxx: Error: /usr/lib/cmplrs/cxx/V7.1-006/include/cxx/iosfwd, line 58: #error
+ directive: "cannot include iosfwd -- define __USE_STD_IOSTREAM to
+ override default - see section 7.1.2 of the C++ Using Guide"
+#error "cannot include iosfwd -- define __USE_STD_IOSTREAM to override default
+- see section 7.1.2 of the C++ Using Guide"
+
+This may be followed by other errors, complaining that 'namespace "std" has no
+member'. The solution to this is to add the line
+
+#define __USE_STD_IOSTREAM 1
+
+to the config.h file.
+
+
+Using Sun's compilers for Solaris
+---------------------------------
+
+A user reports that the following configurations work on Solaris 9 sparcv9 and
+Solaris 9 x86 (32-bit):
+
+ Solaris 9 sparcv9: ./configure --disable-cpp CC=/bin/cc CFLAGS="-m64 -g"
+ Solaris 9 x86: ./configure --disable-cpp CC=/bin/cc CFLAGS="-g"
+
+
+Using PCRE from MySQL
+---------------------
+
+On systems where both PCRE and MySQL are installed, it is possible to make use
+of PCRE from within MySQL, as an alternative to the built-in pattern matching.
+There is a web page that tells you how to do this:
+
+ http://www.mysqludf.org/lib_mysqludf_preg/index.php
+
+
+Making new tarballs
+-------------------
+
+The command "make dist" creates three PCRE tarballs, in tar.gz, tar.bz2, and
+zip formats. The command "make distcheck" does the same, but then does a trial
+build of the new distribution to ensure that it works.
+
+If you have modified any of the man page sources in the doc directory, you
+should first run the PrepareRelease script before making a distribution. This
+script creates the .txt and HTML forms of the documentation from the man pages.
+
+
+Testing PCRE
+------------
+
+To test the basic PCRE library on a Unix-like system, run the RunTest script.
+There is another script called RunGrepTest that tests the options of the
+pcregrep command. If the C++ wrapper library is built, three test programs
+called pcrecpp_unittest, pcre_scanner_unittest, and pcre_stringpiece_unittest
+are also built. When JIT support is enabled, another test program called
+pcre_jit_test is built.
+
+Both the scripts and all the program tests are run if you obey "make check" or
+"make test". For other environments, see the instructions in
+NON-AUTOTOOLS-BUILD.
+
+The RunTest script runs the pcretest test program (which is documented in its
+own man page) on each of the relevant testinput files in the testdata
+directory, and compares the output with the contents of the corresponding
+testoutput files. RunTest uses a file called testtry to hold the main output
+from pcretest. Other files whose names begin with "test" are used as working
+files in some tests.
+
+Some tests are relevant only when certain build-time options were selected. For
+example, the tests for UTF-8/16/32 support are run only if --enable-utf was
+used. RunTest outputs a comment when it skips a test.
+
+Many of the tests that are not skipped are run up to three times. The second
+run forces pcre_study() to be called for all patterns except for a few in some
+tests that are marked "never study" (see the pcretest program for how this is
+done). If JIT support is available, the non-DFA tests are run a third time,
+this time with a forced pcre_study() with the PCRE_STUDY_JIT_COMPILE option.
+This testing can be suppressed by putting "nojit" on the RunTest command line.
+
+The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
+libraries that are enabled. If you want to run just one set of tests, call
+RunTest with either the -8, -16 or -32 option.
+
+If valgrind is installed, you can run the tests under it by putting "valgrind"
+on the RunTest command line. To run pcretest on just one or more specific test
+files, give their numbers as arguments to RunTest, for example:
+
+ RunTest 2 7 11
+
+You can also specify ranges of tests such as 3-6 or 3- (meaning 3 to the
+end), or a number preceded by ~ to exclude a test. For example:
+
+ Runtest 3-15 ~10
+
+This runs tests 3 to 15, excluding test 10, and just ~13 runs all the tests
+except test 13. Whatever order the arguments are in, the tests are always run
+in numerical order.
+
+You can also call RunTest with the single argument "list" to cause it to output
+a list of tests.
+
+The first test file can be fed directly into the perltest.pl script to check
+that Perl gives the same results. The only difference you should see is in the
+first few lines, where the Perl version is given instead of the PCRE version.
+
+The second set of tests check pcre_fullinfo(), pcre_study(),
+pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
+detection, and run-time flags that are specific to PCRE, as well as the POSIX
+wrapper API. It also uses the debugging flags to check some of the internals of
+pcre_compile().
+
+If you build PCRE with a locale setting that is not the standard C locale, the
+character tables may be different (see next paragraph). In some cases, this may
+cause failures in the second set of tests. For example, in a locale where the
+isprint() function yields TRUE for characters in the range 128-255, the use of
+[:isascii:] inside a character class defines a different set of characters, and
+this shows up in this test as a difference in the compiled code, which is being
+listed for checking. Where the comparison test output contains [\x00-\x7f] the
+test will contain [\x00-\xff], and similarly in some other cases. This is not a
+bug in PCRE.
+
+The third set of tests checks pcre_maketables(), the facility for building a
+set of character tables for a specific locale and using them instead of the
+default tables. The tests make use of the "fr_FR" (French) locale. Before
+running the test, the script checks for the presence of this locale by running
+the "locale" command. If that command fails, or if it doesn't include "fr_FR"
+in the list of available locales, the third test cannot be run, and a comment
+is output to say why. If running this test produces instances of the error
+
+ ** Failed to set locale "fr_FR"
+
+in the comparison output, it means that locale is not available on your system,
+despite being listed by "locale". This does not mean that PCRE is broken.
+
+[If you are trying to run this test on Windows, you may be able to get it to
+work by changing "fr_FR" to "french" everywhere it occurs. Alternatively, use
+RunTest.bat. The version of RunTest.bat included with PCRE 7.4 and above uses
+Windows versions of test 2. More info on using RunTest.bat is included in the
+document entitled NON-UNIX-USE.]
+
+The fourth and fifth tests check the UTF-8/16/32 support and error handling and
+internal UTF features of PCRE that are not relevant to Perl, respectively. The
+sixth and seventh tests do the same for Unicode character properties support.
+
+The eighth, ninth, and tenth tests check the pcre_dfa_exec() alternative
+matching function, in non-UTF-8/16/32 mode, UTF-8/16/32 mode, and UTF-8/16/32
+mode with Unicode property support, respectively.
+
+The eleventh test checks some internal offsets and code size features; it is
+run only when the default "link size" of 2 is set (in other cases the sizes
+change) and when Unicode property support is enabled.
+
+The twelfth test is run only when JIT support is available, and the thirteenth
+test is run only when JIT support is not available. They test some JIT-specific
+features such as information output from pcretest about JIT compilation.
+
+The fourteenth, fifteenth, and sixteenth tests are run only in 8-bit mode, and
+the seventeenth, eighteenth, and nineteenth tests are run only in 16/32-bit
+mode. These are tests that generate different output in the two modes. They are
+for general cases, UTF-8/16/32 support, and Unicode property support,
+respectively.
+
+The twentieth test is run only in 16/32-bit mode. It tests some specific
+16/32-bit features of the DFA matching engine.
+
+The twenty-first and twenty-second tests are run only in 16/32-bit mode, when
+the link size is set to 2 for the 16-bit library. They test reloading
+pre-compiled patterns.
+
+The twenty-third and twenty-fourth tests are run only in 16-bit mode. They are
+for general cases, and UTF-16 support, respectively.
+
+The twenty-fifth and twenty-sixth tests are run only in 32-bit mode. They are
+for general cases, and UTF-32 support, respectively.
+
+
+Character tables
+----------------
+
+For speed, PCRE uses four tables for manipulating and identifying characters
+whose code point values are less than 256. The final argument of the
+pcre_compile() function is a pointer to a block of memory containing the
+concatenated tables. A call to pcre_maketables() can be used to generate a set
+of tables in the current locale. If the final argument for pcre_compile() is
+passed as NULL, a set of default tables that is built into the binary is used.
+
+The source file called pcre_chartables.c contains the default set of tables. By
+default, this is created as a copy of pcre_chartables.c.dist, which contains
+tables for ASCII coding. However, if --enable-rebuild-chartables is specified
+for ./configure, a different version of pcre_chartables.c is built by the
+program dftables (compiled from dftables.c), which uses the ANSI C character
+handling functions such as isalnum(), isalpha(), isupper(), islower(), etc. to
+build the table sources. This means that the default C locale which is set for
+your system will control the contents of these default tables. You can change
+the default tables by editing pcre_chartables.c and then re-building PCRE. If
+you do this, you should take care to ensure that the file does not get
+automatically re-generated. The best way to do this is to move
+pcre_chartables.c.dist out of the way and replace it with your customized
+tables.
+
+When the dftables program is run as a result of --enable-rebuild-chartables,
+it uses the default C locale that is set on your system. It does not pay
+attention to the LC_xxx environment variables. In other words, it uses the
+system's default locale rather than whatever the compiling user happens to have
+set. If you really do want to build a source set of character tables in a
+locale that is specified by the LC_xxx variables, you can run the dftables
+program by hand with the -L option. For example:
+
+ ./dftables -L pcre_chartables.c.special
+
+The first two 256-byte tables provide lower casing and case flipping functions,
+respectively. The next table consists of three 32-byte bit maps which identify
+digits, "word" characters, and white space, respectively. These are used when
+building 32-byte bit maps that represent character classes for code points less
+than 256.
+
+The final 256-byte table has bits indicating various character types, as
+follows:
+
+ 1 white space character
+ 2 letter
+ 4 decimal digit
+ 8 hexadecimal digit
+ 16 alphanumeric or '_'
+ 128 regular expression metacharacter or binary zero
+
+You should not alter the set of characters that contain the 128 bit, as that
+will cause PCRE to malfunction.
+
+
+File manifest
+-------------
+
+The distribution should contain the files listed below. Where a file name is
+given as pcre[16|32]_xxx it means that there are three files, one with the name
+pcre_xxx, one with the name pcre16_xx, and a third with the name pcre32_xxx.
+
+(A) Source files of the PCRE library functions and their headers:
+
+ dftables.c auxiliary program for building pcre_chartables.c
+ when --enable-rebuild-chartables is specified
+
+ pcre_chartables.c.dist a default set of character tables that assume ASCII
+ coding; used, unless --enable-rebuild-chartables is
+ specified, by copying to pcre[16]_chartables.c
+
+ pcreposix.c )
+ pcre[16|32]_byte_order.c )
+ pcre[16|32]_compile.c )
+ pcre[16|32]_config.c )
+ pcre[16|32]_dfa_exec.c )
+ pcre[16|32]_exec.c )
+ pcre[16|32]_fullinfo.c )
+ pcre[16|32]_get.c ) sources for the functions in the library,
+ pcre[16|32]_globals.c ) and some internal functions that they use
+ pcre[16|32]_jit_compile.c )
+ pcre[16|32]_maketables.c )
+ pcre[16|32]_newline.c )
+ pcre[16|32]_refcount.c )
+ pcre[16|32]_string_utils.c )
+ pcre[16|32]_study.c )
+ pcre[16|32]_tables.c )
+ pcre[16|32]_ucd.c )
+ pcre[16|32]_version.c )
+ pcre[16|32]_xclass.c )
+ pcre_ord2utf8.c )
+ pcre_valid_utf8.c )
+ pcre16_ord2utf16.c )
+ pcre16_utf16_utils.c )
+ pcre16_valid_utf16.c )
+ pcre32_utf32_utils.c )
+ pcre32_valid_utf32.c )
+
+ pcre[16|32]_printint.c ) debugging function that is used by pcretest,
+ ) and can also be #included in pcre_compile()
+
+ pcre.h.in template for pcre.h when built by "configure"
+ pcreposix.h header for the external POSIX wrapper API
+ pcre_internal.h header for internal use
+ sljit/* 16 files that make up the JIT compiler
+ ucp.h header for Unicode property handling
+
+ config.h.in template for config.h, which is built by "configure"
+
+ pcrecpp.h public header file for the C++ wrapper
+ pcrecpparg.h.in template for another C++ header file
+ pcre_scanner.h public header file for C++ scanner functions
+ pcrecpp.cc )
+ pcre_scanner.cc ) source for the C++ wrapper library
+
+ pcre_stringpiece.h.in template for pcre_stringpiece.h, the header for the
+ C++ stringpiece functions
+ pcre_stringpiece.cc source for the C++ stringpiece functions
+
+(B) Source files for programs that use PCRE:
+
+ pcredemo.c simple demonstration of coding calls to PCRE
+ pcregrep.c source of a grep utility that uses PCRE
+ pcretest.c comprehensive test program
+
+(C) Auxiliary files:
+
+ 132html script to turn "man" pages into HTML
+ AUTHORS information about the author of PCRE
+ ChangeLog log of changes to the code
+ CleanTxt script to clean nroff output for txt man pages
+ Detrail script to remove trailing spaces
+ HACKING some notes about the internals of PCRE
+ INSTALL generic installation instructions
+ LICENCE conditions for the use of PCRE
+ COPYING the same, using GNU's standard name
+ Makefile.in ) template for Unix Makefile, which is built by
+ ) "configure"
+ Makefile.am ) the automake input that was used to create
+ ) Makefile.in
+ NEWS important changes in this release
+ NON-UNIX-USE the previous name for NON-AUTOTOOLS-BUILD
+ NON-AUTOTOOLS-BUILD notes on building PCRE without using autotools
+ PrepareRelease script to make preparations for "make dist"
+ README this file
+ RunTest a Unix shell script for running tests
+ RunGrepTest a Unix shell script for pcregrep tests
+ aclocal.m4 m4 macros (generated by "aclocal")
+ config.guess ) files used by libtool,
+ config.sub ) used only when building a shared library
+ configure a configuring shell script (built by autoconf)
+ configure.ac ) the autoconf input that was used to build
+ ) "configure" and config.h
+ depcomp ) script to find program dependencies, generated by
+ ) automake
+ doc/*.3 man page sources for PCRE
+ doc/*.1 man page sources for pcregrep and pcretest
+ doc/index.html.src the base HTML page
+ doc/html/* HTML documentation
+ doc/pcre.txt plain text version of the man pages
+ doc/pcretest.txt plain text documentation of test program
+ doc/perltest.txt plain text documentation of Perl test program
+ install-sh a shell script for installing files
+ libpcre16.pc.in template for libpcre16.pc for pkg-config
+ libpcre32.pc.in template for libpcre32.pc for pkg-config
+ libpcre.pc.in template for libpcre.pc for pkg-config
+ libpcreposix.pc.in template for libpcreposix.pc for pkg-config
+ libpcrecpp.pc.in template for libpcrecpp.pc for pkg-config
+ ltmain.sh file used to build a libtool script
+ missing ) common stub for a few missing GNU programs while
+ ) installing, generated by automake
+ mkinstalldirs script for making install directories
+ perltest.pl Perl test program
+ pcre-config.in source of script which retains PCRE information
+ pcre_jit_test.c test program for the JIT compiler
+ pcrecpp_unittest.cc )
+ pcre_scanner_unittest.cc ) test programs for the C++ wrapper
+ pcre_stringpiece_unittest.cc )
+ testdata/testinput* test data for main library tests
+ testdata/testoutput* expected test results
+ testdata/grep* input and output for pcregrep tests
+ testdata/* other supporting test files
+
+(D) Auxiliary files for cmake support
+
+ cmake/COPYING-CMAKE-SCRIPTS
+ cmake/FindPackageHandleStandardArgs.cmake
+ cmake/FindEditline.cmake
+ cmake/FindReadline.cmake
+ CMakeLists.txt
+ config-cmake.h.in
+
+(E) Auxiliary files for VPASCAL
+
+ makevp.bat
+ makevp_c.txt
+ makevp_l.txt
+ pcregexp.pas
+
+(F) Auxiliary files for building PCRE "by hand"
+
+ pcre.h.generic ) a version of the public PCRE header file
+ ) for use in non-"configure" environments
+ config.h.generic ) a version of config.h for use in non-"configure"
+ ) environments
+
+(F) Miscellaneous
+
+ RunTest.bat a script for running tests under Windows
+
+Philip Hazel
+Email local part: ph10
+Email domain: cam.ac.uk
+Last updated: 17 January 2014
diff --git a/doc/html/index.html b/doc/html/index.html
new file mode 100644
index 0000000..352c55d
--- /dev/null
+++ b/doc/html/index.html
@@ -0,0 +1,185 @@
+<html>
+<!-- This is a manually maintained file that is the root of the HTML version of
+ the PCRE documentation. When the HTML documents are built from the man
+ page versions, the entire doc/html directory is emptied, this file is then
+ copied into doc/html/index.html, and the remaining files therein are
+ created by the 132html script.
+-->
+<head>
+<title>PCRE specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>Perl-compatible Regular Expressions (PCRE)</h1>
+<p>
+The HTML documentation for PCRE consists of a number of pages that are listed
+below in alphabetical order. If you are new to PCRE, please read the first one
+first.
+</p>
+
+<table>
+<tr><td><a href="pcre.html">pcre</a></td>
+ <td>&nbsp;&nbsp;Introductory page</td></tr>
+
+<tr><td><a href="pcre-config.html">pcre-config</a></td>
+ <td>&nbsp;&nbsp;Information about the installation configuration</td></tr>
+
+<tr><td><a href="pcre16.html">pcre16</a></td>
+ <td>&nbsp;&nbsp;Discussion of the 16-bit PCRE library</td></tr>
+
+<tr><td><a href="pcre32.html">pcre32</a></td>
+ <td>&nbsp;&nbsp;Discussion of the 32-bit PCRE library</td></tr>
+
+<tr><td><a href="pcreapi.html">pcreapi</a></td>
+ <td>&nbsp;&nbsp;PCRE's native API</td></tr>
+
+<tr><td><a href="pcrebuild.html">pcrebuild</a></td>
+ <td>&nbsp;&nbsp;Building PCRE</td></tr>
+
+<tr><td><a href="pcrecallout.html">pcrecallout</a></td>
+ <td>&nbsp;&nbsp;The <i>callout</i> facility</td></tr>
+
+<tr><td><a href="pcrecompat.html">pcrecompat</a></td>
+ <td>&nbsp;&nbsp;Compability with Perl</td></tr>
+
+<tr><td><a href="pcrecpp.html">pcrecpp</a></td>
+ <td>&nbsp;&nbsp;The C++ wrapper for the PCRE library</td></tr>
+
+<tr><td><a href="pcredemo.html">pcredemo</a></td>
+ <td>&nbsp;&nbsp;A demonstration C program that uses the PCRE library</td></tr>
+
+<tr><td><a href="pcregrep.html">pcregrep</a></td>
+ <td>&nbsp;&nbsp;The <b>pcregrep</b> command</td></tr>
+
+<tr><td><a href="pcrejit.html">pcrejit</a></td>
+ <td>&nbsp;&nbsp;Discussion of the just-in-time optimization support</td></tr>
+
+<tr><td><a href="pcrelimits.html">pcrelimits</a></td>
+ <td>&nbsp;&nbsp;Details of size and other limits</td></tr>
+
+<tr><td><a href="pcrematching.html">pcrematching</a></td>
+ <td>&nbsp;&nbsp;Discussion of the two matching algorithms</td></tr>
+
+<tr><td><a href="pcrepartial.html">pcrepartial</a></td>
+ <td>&nbsp;&nbsp;Using PCRE for partial matching</td></tr>
+
+<tr><td><a href="pcrepattern.html">pcrepattern</a></td>
+ <td>&nbsp;&nbsp;Specification of the regular expressions supported by PCRE</td></tr>
+
+<tr><td><a href="pcreperform.html">pcreperform</a></td>
+ <td>&nbsp;&nbsp;Some comments on performance</td></tr>
+
+<tr><td><a href="pcreposix.html">pcreposix</a></td>
+ <td>&nbsp;&nbsp;The POSIX API to the PCRE 8-bit library</td></tr>
+
+<tr><td><a href="pcreprecompile.html">pcreprecompile</a></td>
+ <td>&nbsp;&nbsp;How to save and re-use compiled patterns</td></tr>
+
+<tr><td><a href="pcresample.html">pcresample</a></td>
+ <td>&nbsp;&nbsp;Discussion of the pcredemo program</td></tr>
+
+<tr><td><a href="pcrestack.html">pcrestack</a></td>
+ <td>&nbsp;&nbsp;Discussion of PCRE's stack usage</td></tr>
+
+<tr><td><a href="pcresyntax.html">pcresyntax</a></td>
+ <td>&nbsp;&nbsp;Syntax quick-reference summary</td></tr>
+
+<tr><td><a href="pcretest.html">pcretest</a></td>
+ <td>&nbsp;&nbsp;The <b>pcretest</b> command for testing PCRE</td></tr>
+
+<tr><td><a href="pcreunicode.html">pcreunicode</a></td>
+ <td>&nbsp;&nbsp;Discussion of Unicode and UTF-8/UTF-16/UTF-32 support</td></tr>
+</table>
+
+<p>
+There are also individual pages that summarize the interface for each function
+in the library. There is a single page for each triple of 8-bit/16-bit/32-bit
+functions.
+</p>
+
+<table>
+
+<tr><td><a href="pcre_assign_jit_stack.html">pcre_assign_jit_stack</a></td>
+ <td>&nbsp;&nbsp;Assign stack for JIT matching</td></tr>
+
+<tr><td><a href="pcre_compile.html">pcre_compile</a></td>
+ <td>&nbsp;&nbsp;Compile a regular expression</td></tr>
+
+<tr><td><a href="pcre_compile2.html">pcre_compile2</a></td>
+ <td>&nbsp;&nbsp;Compile a regular expression (alternate interface)</td></tr>
+
+<tr><td><a href="pcre_config.html">pcre_config</a></td>
+ <td>&nbsp;&nbsp;Show build-time configuration options</td></tr>
+
+<tr><td><a href="pcre_copy_named_substring.html">pcre_copy_named_substring</a></td>
+ <td>&nbsp;&nbsp;Extract named substring into given buffer</td></tr>
+
+<tr><td><a href="pcre_copy_substring.html">pcre_copy_substring</a></td>
+ <td>&nbsp;&nbsp;Extract numbered substring into given buffer</td></tr>
+
+<tr><td><a href="pcre_dfa_exec.html">pcre_dfa_exec</a></td>
+ <td>&nbsp;&nbsp;Match a compiled pattern to a subject string
+ (DFA algorithm; <i>not</i> Perl compatible)</td></tr>
+
+<tr><td><a href="pcre_exec.html">pcre_exec</a></td>
+ <td>&nbsp;&nbsp;Match a compiled pattern to a subject string
+ (Perl compatible)</td></tr>
+
+<tr><td><a href="pcre_free_study.html">pcre_free_study</a></td>
+ <td>&nbsp;&nbsp;Free study data</td></tr>
+
+<tr><td><a href="pcre_free_substring.html">pcre_free_substring</a></td>
+ <td>&nbsp;&nbsp;Free extracted substring</td></tr>
+
+<tr><td><a href="pcre_free_substring_list.html">pcre_free_substring_list</a></td>
+ <td>&nbsp;&nbsp;Free list of extracted substrings</td></tr>
+
+<tr><td><a href="pcre_fullinfo.html">pcre_fullinfo</a></td>
+ <td>&nbsp;&nbsp;Extract information about a pattern</td></tr>
+
+<tr><td><a href="pcre_get_named_substring.html">pcre_get_named_substring</a></td>
+ <td>&nbsp;&nbsp;Extract named substring into new memory</td></tr>
+
+<tr><td><a href="pcre_get_stringnumber.html">pcre_get_stringnumber</a></td>
+ <td>&nbsp;&nbsp;Convert captured string name to number</td></tr>
+
+<tr><td><a href="pcre_get_stringtable_entries.html">pcre_get_stringtable_entries</a></td>
+ <td>&nbsp;&nbsp;Find table entries for given string name</td></tr>
+
+<tr><td><a href="pcre_get_substring.html">pcre_get_substring</a></td>
+ <td>&nbsp;&nbsp;Extract numbered substring into new memory</td></tr>
+
+<tr><td><a href="pcre_get_substring_list.html">pcre_get_substring_list</a></td>
+ <td>&nbsp;&nbsp;Extract all substrings into new memory</td></tr>
+
+<tr><td><a href="pcre_jit_exec.html">pcre_jit_exec</a></td>
+ <td>&nbsp;&nbsp;Fast path interface to JIT matching</td></tr>
+
+<tr><td><a href="pcre_jit_stack_alloc.html">pcre_jit_stack_alloc</a></td>
+ <td>&nbsp;&nbsp;Create a stack for JIT matching</td></tr>
+
+<tr><td><a href="pcre_jit_stack_free.html">pcre_jit_stack_free</a></td>
+ <td>&nbsp;&nbsp;Free a JIT matching stack</td></tr>
+
+<tr><td><a href="pcre_maketables.html">pcre_maketables</a></td>
+ <td>&nbsp;&nbsp;Build character tables in current locale</td></tr>
+
+<tr><td><a href="pcre_pattern_to_host_byte_order.html">pcre_pattern_to_host_byte_order</a></td>
+ <td>&nbsp;&nbsp;Convert compiled pattern to host byte order if necessary</td></tr>
+
+<tr><td><a href="pcre_refcount.html">pcre_refcount</a></td>
+ <td>&nbsp;&nbsp;Maintain reference count in compiled pattern</td></tr>
+
+<tr><td><a href="pcre_study.html">pcre_study</a></td>
+ <td>&nbsp;&nbsp;Study a compiled pattern</td></tr>
+
+<tr><td><a href="pcre_utf16_to_host_byte_order.html">pcre_utf16_to_host_byte_order</a></td>
+ <td>&nbsp;&nbsp;Convert UTF-16 string to host byte order if necessary</td></tr>
+
+<tr><td><a href="pcre_utf32_to_host_byte_order.html">pcre_utf32_to_host_byte_order</a></td>
+ <td>&nbsp;&nbsp;Convert UTF-32 string to host byte order if necessary</td></tr>
+
+<tr><td><a href="pcre_version.html">pcre_version</a></td>
+ <td>&nbsp;&nbsp;Return PCRE version and release date</td></tr>
+</table>
+
+</html>
diff --git a/doc/html/pcre-config.html b/doc/html/pcre-config.html
new file mode 100644
index 0000000..56a8060
--- /dev/null
+++ b/doc/html/pcre-config.html
@@ -0,0 +1,109 @@
+<html>
+<head>
+<title>pcre-config specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre-config man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<ul>
+<li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
+<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
+<li><a name="TOC3" href="#SEC3">OPTIONS</a>
+<li><a name="TOC4" href="#SEC4">SEE ALSO</a>
+<li><a name="TOC5" href="#SEC5">AUTHOR</a>
+<li><a name="TOC6" href="#SEC6">REVISION</a>
+</ul>
+<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
+<P>
+<b>pcre-config [--prefix] [--exec-prefix] [--version] [--libs]</b>
+<b> [--libs16] [--libs32] [--libs-cpp] [--libs-posix]</b>
+<b> [--cflags] [--cflags-posix]</b>
+</P>
+<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
+<P>
+<b>pcre-config</b> returns the configuration of the installed PCRE
+libraries and the options required to compile a program to use them. Some of
+the options apply only to the 8-bit, or 16-bit, or 32-bit libraries,
+respectively, and are
+not available if only one of those libraries has been built. If an unavailable
+option is encountered, the "usage" information is output.
+</P>
+<br><a name="SEC3" href="#TOC1">OPTIONS</a><br>
+<P>
+<b>--prefix</b>
+Writes the directory prefix used in the PCRE installation for architecture
+independent files (<i>/usr</i> on many systems, <i>/usr/local</i> on some
+systems) to the standard output.
+</P>
+<P>
+<b>--exec-prefix</b>
+Writes the directory prefix used in the PCRE installation for architecture
+dependent files (normally the same as <b>--prefix</b>) to the standard output.
+</P>
+<P>
+<b>--version</b>
+Writes the version number of the installed PCRE libraries to the standard
+output.
+</P>
+<P>
+<b>--libs</b>
+Writes to the standard output the command line options required to link
+with the 8-bit PCRE library (<b>-lpcre</b> on many systems).
+</P>
+<P>
+<b>--libs16</b>
+Writes to the standard output the command line options required to link
+with the 16-bit PCRE library (<b>-lpcre16</b> on many systems).
+</P>
+<P>
+<b>--libs32</b>
+Writes to the standard output the command line options required to link
+with the 32-bit PCRE library (<b>-lpcre32</b> on many systems).
+</P>
+<P>
+<b>--libs-cpp</b>
+Writes to the standard output the command line options required to link with
+PCRE's C++ wrapper library (<b>-lpcrecpp</b> <b>-lpcre</b> on many
+systems).
+</P>
+<P>
+<b>--libs-posix</b>
+Writes to the standard output the command line options required to link with
+PCRE's POSIX API wrapper library (<b>-lpcreposix</b> <b>-lpcre</b> on many
+systems).
+</P>
+<P>
+<b>--cflags</b>
+Writes to the standard output the command line options required to compile
+files that use PCRE (this may include some <b>-I</b> options, but is blank on
+many systems).
+</P>
+<P>
+<b>--cflags-posix</b>
+Writes to the standard output the command line options required to compile
+files that use PCRE's POSIX API wrapper library (this may include some <b>-I</b>
+options, but is blank on many systems).
+</P>
+<br><a name="SEC4" href="#TOC1">SEE ALSO</a><br>
+<P>
+<b>pcre(3)</b>
+</P>
+<br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
+<P>
+This manual page was originally written by Mark Baker for the Debian GNU/Linux
+system. It has been subsequently revised as a generic PCRE man page.
+</P>
+<br><a name="SEC6" href="#TOC1">REVISION</a><br>
+<P>
+Last updated: 24 June 2012
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre.html b/doc/html/pcre.html
new file mode 100644
index 0000000..c2b29aa
--- /dev/null
+++ b/doc/html/pcre.html
@@ -0,0 +1,213 @@
+<html>
+<head>
+<title>pcre specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<ul>
+<li><a name="TOC1" href="#SEC1">INTRODUCTION</a>
+<li><a name="TOC2" href="#SEC2">SECURITY CONSIDERATIONS</a>
+<li><a name="TOC3" href="#SEC3">USER DOCUMENTATION</a>
+<li><a name="TOC4" href="#SEC4">AUTHOR</a>
+<li><a name="TOC5" href="#SEC5">REVISION</a>
+</ul>
+<br><a name="SEC1" href="#TOC1">INTRODUCTION</a><br>
+<P>
+The PCRE library is a set of functions that implement regular expression
+pattern matching using the same syntax and semantics as Perl, with just a few
+differences. Some features that appeared in Python and PCRE before they
+appeared in Perl are also available using the Python syntax, there is some
+support for one or two .NET and Oniguruma syntax items, and there is an option
+for requesting some minor changes that give better JavaScript compatibility.
+</P>
+<P>
+Starting with release 8.30, it is possible to compile two separate PCRE
+libraries: the original, which supports 8-bit character strings (including
+UTF-8 strings), and a second library that supports 16-bit character strings
+(including UTF-16 strings). The build process allows either one or both to be
+built. The majority of the work to make this possible was done by Zoltan
+Herczeg.
+</P>
+<P>
+Starting with release 8.32 it is possible to compile a third separate PCRE
+library that supports 32-bit character strings (including UTF-32 strings). The
+build process allows any combination of the 8-, 16- and 32-bit libraries. The
+work to make this possible was done by Christian Persch.
+</P>
+<P>
+The three libraries contain identical sets of functions, except that the names
+in the 16-bit library start with <b>pcre16_</b> instead of <b>pcre_</b>, and the
+names in the 32-bit library start with <b>pcre32_</b> instead of <b>pcre_</b>. To
+avoid over-complication and reduce the documentation maintenance load, most of
+the documentation describes the 8-bit library, with the differences for the
+16-bit and 32-bit libraries described separately in the
+<a href="pcre16.html"><b>pcre16</b></a>
+and
+<a href="pcre32.html"><b>pcre32</b></a>
+pages. References to functions or structures of the form <i>pcre[16|32]_xxx</i>
+should be read as meaning "<i>pcre_xxx</i> when using the 8-bit library,
+<i>pcre16_xxx</i> when using the 16-bit library, or <i>pcre32_xxx</i> when using
+the 32-bit library".
+</P>
+<P>
+The current implementation of PCRE corresponds approximately with Perl 5.12,
+including support for UTF-8/16/32 encoded strings and Unicode general category
+properties. However, UTF-8/16/32 and Unicode support has to be explicitly
+enabled; it is not the default. The Unicode tables correspond to Unicode
+release 6.3.0.
+</P>
+<P>
+In addition to the Perl-compatible matching function, PCRE contains an
+alternative function that matches the same compiled patterns in a different
+way. In certain circumstances, the alternative function has some advantages.
+For a discussion of the two matching algorithms, see the
+<a href="pcrematching.html"><b>pcrematching</b></a>
+page.
+</P>
+<P>
+PCRE is written in C and released as a C library. A number of people have
+written wrappers and interfaces of various kinds. In particular, Google Inc.
+have provided a comprehensive C++ wrapper for the 8-bit library. This is now
+included as part of the PCRE distribution. The
+<a href="pcrecpp.html"><b>pcrecpp</b></a>
+page has details of this interface. Other people's contributions can be found
+in the <i>Contrib</i> directory at the primary FTP site, which is:
+<a href="ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre">ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre</a>
+</P>
+<P>
+Details of exactly which Perl regular expression features are and are not
+supported by PCRE are given in separate documents. See the
+<a href="pcrepattern.html"><b>pcrepattern</b></a>
+and
+<a href="pcrecompat.html"><b>pcrecompat</b></a>
+pages. There is a syntax summary in the
+<a href="pcresyntax.html"><b>pcresyntax</b></a>
+page.
+</P>
+<P>
+Some features of PCRE can be included, excluded, or changed when the library is
+built. The
+<a href="pcre_config.html"><b>pcre_config()</b></a>
+function makes it possible for a client to discover which features are
+available. The features themselves are described in the
+<a href="pcrebuild.html"><b>pcrebuild</b></a>
+page. Documentation about building PCRE for various operating systems can be
+found in the
+<a href="README.txt"><b>README</b></a>
+and
+<a href="NON-AUTOTOOLS-BUILD.txt"><b>NON-AUTOTOOLS_BUILD</b></a>
+files in the source distribution.
+</P>
+<P>
+The libraries contains a number of undocumented internal functions and data
+tables that are used by more than one of the exported external functions, but
+which are not intended for use by external callers. Their names all begin with
+"_pcre_" or "_pcre16_" or "_pcre32_", which hopefully will not provoke any name
+clashes. In some environments, it is possible to control which external symbols
+are exported when a shared library is built, and in these cases the
+undocumented symbols are not exported.
+</P>
+<br><a name="SEC2" href="#TOC1">SECURITY CONSIDERATIONS</a><br>
+<P>
+If you are using PCRE in a non-UTF application that permits users to supply
+arbitrary patterns for compilation, you should be aware of a feature that
+allows users to turn on UTF support from within a pattern, provided that PCRE
+was built with UTF support. For example, an 8-bit pattern that begins with
+"(*UTF8)" or "(*UTF)" turns on UTF-8 mode, which interprets patterns and
+subjects as strings of UTF-8 characters instead of individual 8-bit characters.
+This causes both the pattern and any data against which it is matched to be
+checked for UTF-8 validity. If the data string is very long, such a check might
+use sufficiently many resources as to cause your application to lose
+performance.
+</P>
+<P>
+One way of guarding against this possibility is to use the
+<b>pcre_fullinfo()</b> function to check the compiled pattern's options for UTF.
+Alternatively, from release 8.33, you can set the PCRE_NEVER_UTF option at
+compile time. This causes an compile time error if a pattern contains a
+UTF-setting sequence.
+</P>
+<P>
+If your application is one that supports UTF, be aware that validity checking
+can take time. If the same data string is to be matched many times, you can use
+the PCRE_NO_UTF[8|16|32]_CHECK option for the second and subsequent matches to
+save redundant checks.
+</P>
+<P>
+Another way that performance can be hit is by running a pattern that has a very
+large search tree against a string that will never match. Nested unlimited
+repeats in a pattern are a common example. PCRE provides some protection
+against this: see the PCRE_EXTRA_MATCH_LIMIT feature in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page.
+</P>
+<br><a name="SEC3" href="#TOC1">USER DOCUMENTATION</a><br>
+<P>
+The user documentation for PCRE comprises a number of different sections. In
+the "man" format, each of these is a separate "man page". In the HTML format,
+each is a separate page, linked from the index page. In the plain text format,
+the descriptions of the <b>pcregrep</b> and <b>pcretest</b> programs are in files
+called <b>pcregrep.txt</b> and <b>pcretest.txt</b>, respectively. The remaining
+sections, except for the <b>pcredemo</b> section (which is a program listing),
+are concatenated in <b>pcre.txt</b>, for ease of searching. The sections are as
+follows:
+<pre>
+ pcre this document
+ pcre-config show PCRE installation configuration information
+ pcre16 details of the 16-bit library
+ pcre32 details of the 32-bit library
+ pcreapi details of PCRE's native C API
+ pcrebuild building PCRE
+ pcrecallout details of the callout feature
+ pcrecompat discussion of Perl compatibility
+ pcrecpp details of the C++ wrapper for the 8-bit library
+ pcredemo a demonstration C program that uses PCRE
+ pcregrep description of the <b>pcregrep</b> command (8-bit only)
+ pcrejit discussion of the just-in-time optimization support
+ pcrelimits details of size and other limits
+ pcrematching discussion of the two matching algorithms
+ pcrepartial details of the partial matching facility
+ pcrepattern syntax and semantics of supported regular expressions
+ pcreperform discussion of performance issues
+ pcreposix the POSIX-compatible C API for the 8-bit library
+ pcreprecompile details of saving and re-using precompiled patterns
+ pcresample discussion of the pcredemo program
+ pcrestack discussion of stack usage
+ pcresyntax quick syntax reference
+ pcretest description of the <b>pcretest</b> testing command
+ pcreunicode discussion of Unicode and UTF-8/16/32 support
+</pre>
+In the "man" and HTML formats, there is also a short page for each C library
+function, listing its arguments and results.
+</P>
+<br><a name="SEC4" href="#TOC1">AUTHOR</a><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<P>
+Putting an actual email address here seems to have been a spam magnet, so I've
+taken it away. If you want to email me, use my two initials, followed by the
+two digits 10, at the domain cam.ac.uk.
+</P>
+<br><a name="SEC5" href="#TOC1">REVISION</a><br>
+<P>
+Last updated: 08 January 2014
+<br>
+Copyright &copy; 1997-2014 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre16.html b/doc/html/pcre16.html
new file mode 100644
index 0000000..f00859f
--- /dev/null
+++ b/doc/html/pcre16.html
@@ -0,0 +1,384 @@
+<html>
+<head>
+<title>pcre16 specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre16 man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<ul>
+<li><a name="TOC1" href="#SEC1">PCRE 16-BIT API BASIC FUNCTIONS</a>
+<li><a name="TOC2" href="#SEC2">PCRE 16-BIT API STRING EXTRACTION FUNCTIONS</a>
+<li><a name="TOC3" href="#SEC3">PCRE 16-BIT API AUXILIARY FUNCTIONS</a>
+<li><a name="TOC4" href="#SEC4">PCRE 16-BIT API INDIRECTED FUNCTIONS</a>
+<li><a name="TOC5" href="#SEC5">PCRE 16-BIT API 16-BIT-ONLY FUNCTION</a>
+<li><a name="TOC6" href="#SEC6">THE PCRE 16-BIT LIBRARY</a>
+<li><a name="TOC7" href="#SEC7">THE HEADER FILE</a>
+<li><a name="TOC8" href="#SEC8">THE LIBRARY NAME</a>
+<li><a name="TOC9" href="#SEC9">STRING TYPES</a>
+<li><a name="TOC10" href="#SEC10">STRUCTURE TYPES</a>
+<li><a name="TOC11" href="#SEC11">16-BIT FUNCTIONS</a>
+<li><a name="TOC12" href="#SEC12">SUBJECT STRING OFFSETS</a>
+<li><a name="TOC13" href="#SEC13">NAMED SUBPATTERNS</a>
+<li><a name="TOC14" href="#SEC14">OPTION NAMES</a>
+<li><a name="TOC15" href="#SEC15">CHARACTER CODES</a>
+<li><a name="TOC16" href="#SEC16">ERROR NAMES</a>
+<li><a name="TOC17" href="#SEC17">ERROR TEXTS</a>
+<li><a name="TOC18" href="#SEC18">CALLOUTS</a>
+<li><a name="TOC19" href="#SEC19">TESTING</a>
+<li><a name="TOC20" href="#SEC20">NOT SUPPORTED IN 16-BIT MODE</a>
+<li><a name="TOC21" href="#SEC21">AUTHOR</a>
+<li><a name="TOC22" href="#SEC22">REVISION</a>
+</ul>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<br><a name="SEC1" href="#TOC1">PCRE 16-BIT API BASIC FUNCTIONS</a><br>
+<P>
+<b>pcre16 *pcre16_compile(PCRE_SPTR16 <i>pattern</i>, int <i>options</i>,</b>
+<b> const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
+<b> const unsigned char *<i>tableptr</i>);</b>
+<br>
+<br>
+<b>pcre16 *pcre16_compile2(PCRE_SPTR16 <i>pattern</i>, int <i>options</i>,</b>
+<b> int *<i>errorcodeptr</i>,</b>
+<b> const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
+<b> const unsigned char *<i>tableptr</i>);</b>
+<br>
+<br>
+<b>pcre16_extra *pcre16_study(const pcre16 *<i>code</i>, int <i>options</i>,</b>
+<b> const char **<i>errptr</i>);</b>
+<br>
+<br>
+<b>void pcre16_free_study(pcre16_extra *<i>extra</i>);</b>
+<br>
+<br>
+<b>int pcre16_exec(const pcre16 *<i>code</i>, const pcre16_extra *<i>extra</i>,</b>
+<b> PCRE_SPTR16 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
+<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>);</b>
+<br>
+<br>
+<b>int pcre16_dfa_exec(const pcre16 *<i>code</i>, const pcre16_extra *<i>extra</i>,</b>
+<b> PCRE_SPTR16 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
+<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
+<b> int *<i>workspace</i>, int <i>wscount</i>);</b>
+</P>
+<br><a name="SEC2" href="#TOC1">PCRE 16-BIT API STRING EXTRACTION FUNCTIONS</a><br>
+<P>
+<b>int pcre16_copy_named_substring(const pcre16 *<i>code</i>,</b>
+<b> PCRE_SPTR16 <i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, PCRE_SPTR16 <i>stringname</i>,</b>
+<b> PCRE_UCHAR16 *<i>buffer</i>, int <i>buffersize</i>);</b>
+<br>
+<br>
+<b>int pcre16_copy_substring(PCRE_SPTR16 <i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, int <i>stringnumber</i>, PCRE_UCHAR16 *<i>buffer</i>,</b>
+<b> int <i>buffersize</i>);</b>
+<br>
+<br>
+<b>int pcre16_get_named_substring(const pcre16 *<i>code</i>,</b>
+<b> PCRE_SPTR16 <i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, PCRE_SPTR16 <i>stringname</i>,</b>
+<b> PCRE_SPTR16 *<i>stringptr</i>);</b>
+<br>
+<br>
+<b>int pcre16_get_stringnumber(const pcre16 *<i>code</i>,</b>
+<b>" PCRE_SPTR16 <i>name</i>);</b>
+<br>
+<br>
+<b>int pcre16_get_stringtable_entries(const pcre16 *<i>code</i>,</b>
+<b> PCRE_SPTR16 <i>name</i>, PCRE_UCHAR16 **<i>first</i>, PCRE_UCHAR16 **<i>last</i>);</b>
+<br>
+<br>
+<b>int pcre16_get_substring(PCRE_SPTR16 <i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, int <i>stringnumber</i>,</b>
+<b> PCRE_SPTR16 *<i>stringptr</i>);</b>
+<br>
+<br>
+<b>int pcre16_get_substring_list(PCRE_SPTR16 <i>subject</i>,</b>
+<b> int *<i>ovector</i>, int <i>stringcount</i>, PCRE_SPTR16 **<i>listptr</i>);</b>
+<br>
+<br>
+<b>void pcre16_free_substring(PCRE_SPTR16 <i>stringptr</i>);</b>
+<br>
+<br>
+<b>void pcre16_free_substring_list(PCRE_SPTR16 *<i>stringptr</i>);</b>
+</P>
+<br><a name="SEC3" href="#TOC1">PCRE 16-BIT API AUXILIARY FUNCTIONS</a><br>
+<P>
+<b>pcre16_jit_stack *pcre16_jit_stack_alloc(int <i>startsize</i>, int <i>maxsize</i>);</b>
+<br>
+<br>
+<b>void pcre16_jit_stack_free(pcre16_jit_stack *<i>stack</i>);</b>
+<br>
+<br>
+<b>void pcre16_assign_jit_stack(pcre16_extra *<i>extra</i>,</b>
+<b> pcre16_jit_callback <i>callback</i>, void *<i>data</i>);</b>
+<br>
+<br>
+<b>const unsigned char *pcre16_maketables(void);</b>
+<br>
+<br>
+<b>int pcre16_fullinfo(const pcre16 *<i>code</i>, const pcre16_extra *<i>extra</i>,</b>
+<b> int <i>what</i>, void *<i>where</i>);</b>
+<br>
+<br>
+<b>int pcre16_refcount(pcre16 *<i>code</i>, int <i>adjust</i>);</b>
+<br>
+<br>
+<b>int pcre16_config(int <i>what</i>, void *<i>where</i>);</b>
+<br>
+<br>
+<b>const char *pcre16_version(void);</b>
+<br>
+<br>
+<b>int pcre16_pattern_to_host_byte_order(pcre16 *<i>code</i>,</b>
+<b> pcre16_extra *<i>extra</i>, const unsigned char *<i>tables</i>);</b>
+</P>
+<br><a name="SEC4" href="#TOC1">PCRE 16-BIT API INDIRECTED FUNCTIONS</a><br>
+<P>
+<b>void *(*pcre16_malloc)(size_t);</b>
+<br>
+<br>
+<b>void (*pcre16_free)(void *);</b>
+<br>
+<br>
+<b>void *(*pcre16_stack_malloc)(size_t);</b>
+<br>
+<br>
+<b>void (*pcre16_stack_free)(void *);</b>
+<br>
+<br>
+<b>int (*pcre16_callout)(pcre16_callout_block *);</b>
+</P>
+<br><a name="SEC5" href="#TOC1">PCRE 16-BIT API 16-BIT-ONLY FUNCTION</a><br>
+<P>
+<b>int pcre16_utf16_to_host_byte_order(PCRE_UCHAR16 *<i>output</i>,</b>
+<b> PCRE_SPTR16 <i>input</i>, int <i>length</i>, int *<i>byte_order</i>,</b>
+<b> int <i>keep_boms</i>);</b>
+</P>
+<br><a name="SEC6" href="#TOC1">THE PCRE 16-BIT LIBRARY</a><br>
+<P>
+Starting with release 8.30, it is possible to compile a PCRE library that
+supports 16-bit character strings, including UTF-16 strings, as well as or
+instead of the original 8-bit library. The majority of the work to make this
+possible was done by Zoltan Herczeg. The two libraries contain identical sets
+of functions, used in exactly the same way. Only the names of the functions and
+the data types of their arguments and results are different. To avoid
+over-complication and reduce the documentation maintenance load, most of the
+PCRE documentation describes the 8-bit library, with only occasional references
+to the 16-bit library. This page describes what is different when you use the
+16-bit library.
+</P>
+<P>
+WARNING: A single application can be linked with both libraries, but you must
+take care when processing any particular pattern to use functions from just one
+library. For example, if you want to study a pattern that was compiled with
+<b>pcre16_compile()</b>, you must do so with <b>pcre16_study()</b>, not
+<b>pcre_study()</b>, and you must free the study data with
+<b>pcre16_free_study()</b>.
+</P>
+<br><a name="SEC7" href="#TOC1">THE HEADER FILE</a><br>
+<P>
+There is only one header file, <b>pcre.h</b>. It contains prototypes for all the
+functions in all libraries, as well as definitions of flags, structures, error
+codes, etc.
+</P>
+<br><a name="SEC8" href="#TOC1">THE LIBRARY NAME</a><br>
+<P>
+In Unix-like systems, the 16-bit library is called <b>libpcre16</b>, and can
+normally be accesss by adding <b>-lpcre16</b> to the command for linking an
+application that uses PCRE.
+</P>
+<br><a name="SEC9" href="#TOC1">STRING TYPES</a><br>
+<P>
+In the 8-bit library, strings are passed to PCRE library functions as vectors
+of bytes with the C type "char *". In the 16-bit library, strings are passed as
+vectors of unsigned 16-bit quantities. The macro PCRE_UCHAR16 specifies an
+appropriate data type, and PCRE_SPTR16 is defined as "const PCRE_UCHAR16 *". In
+very many environments, "short int" is a 16-bit data type. When PCRE is built,
+it defines PCRE_UCHAR16 as "unsigned short int", but checks that it really is a
+16-bit data type. If it is not, the build fails with an error message telling
+the maintainer to modify the definition appropriately.
+</P>
+<br><a name="SEC10" href="#TOC1">STRUCTURE TYPES</a><br>
+<P>
+The types of the opaque structures that are used for compiled 16-bit patterns
+and JIT stacks are <b>pcre16</b> and <b>pcre16_jit_stack</b> respectively. The
+type of the user-accessible structure that is returned by <b>pcre16_study()</b>
+is <b>pcre16_extra</b>, and the type of the structure that is used for passing
+data to a callout function is <b>pcre16_callout_block</b>. These structures
+contain the same fields, with the same names, as their 8-bit counterparts. The
+only difference is that pointers to character strings are 16-bit instead of
+8-bit types.
+</P>
+<br><a name="SEC11" href="#TOC1">16-BIT FUNCTIONS</a><br>
+<P>
+For every function in the 8-bit library there is a corresponding function in
+the 16-bit library with a name that starts with <b>pcre16_</b> instead of
+<b>pcre_</b>. The prototypes are listed above. In addition, there is one extra
+function, <b>pcre16_utf16_to_host_byte_order()</b>. This is a utility function
+that converts a UTF-16 character string to host byte order if necessary. The
+other 16-bit functions expect the strings they are passed to be in host byte
+order.
+</P>
+<P>
+The <i>input</i> and <i>output</i> arguments of
+<b>pcre16_utf16_to_host_byte_order()</b> may point to the same address, that is,
+conversion in place is supported. The output buffer must be at least as long as
+the input.
+</P>
+<P>
+The <i>length</i> argument specifies the number of 16-bit data units in the
+input string; a negative value specifies a zero-terminated string.
+</P>
+<P>
+If <i>byte_order</i> is NULL, it is assumed that the string starts off in host
+byte order. This may be changed by byte-order marks (BOMs) anywhere in the
+string (commonly as the first character).
+</P>
+<P>
+If <i>byte_order</i> is not NULL, a non-zero value of the integer to which it
+points means that the input starts off in host byte order, otherwise the
+opposite order is assumed. Again, BOMs in the string can change this. The final
+byte order is passed back at the end of processing.
+</P>
+<P>
+If <i>keep_boms</i> is not zero, byte-order mark characters (0xfeff) are copied
+into the output string. Otherwise they are discarded.
+</P>
+<P>
+The result of the function is the number of 16-bit units placed into the output
+buffer, including the zero terminator if the string was zero-terminated.
+</P>
+<br><a name="SEC12" href="#TOC1">SUBJECT STRING OFFSETS</a><br>
+<P>
+The lengths and starting offsets of subject strings must be specified in 16-bit
+data units, and the offsets within subject strings that are returned by the
+matching functions are in also 16-bit units rather than bytes.
+</P>
+<br><a name="SEC13" href="#TOC1">NAMED SUBPATTERNS</a><br>
+<P>
+The name-to-number translation table that is maintained for named subpatterns
+uses 16-bit characters. The <b>pcre16_get_stringtable_entries()</b> function
+returns the length of each entry in the table as the number of 16-bit data
+units.
+</P>
+<br><a name="SEC14" href="#TOC1">OPTION NAMES</a><br>
+<P>
+There are two new general option names, PCRE_UTF16 and PCRE_NO_UTF16_CHECK,
+which correspond to PCRE_UTF8 and PCRE_NO_UTF8_CHECK in the 8-bit library. In
+fact, these new options define the same bits in the options word. There is a
+discussion about the
+<a href="pcreunicode.html#utf16strings">validity of UTF-16 strings</a>
+in the
+<a href="pcreunicode.html"><b>pcreunicode</b></a>
+page.
+</P>
+<P>
+For the <b>pcre16_config()</b> function there is an option PCRE_CONFIG_UTF16
+that returns 1 if UTF-16 support is configured, otherwise 0. If this option is
+given to <b>pcre_config()</b> or <b>pcre32_config()</b>, or if the
+PCRE_CONFIG_UTF8 or PCRE_CONFIG_UTF32 option is given to <b>pcre16_config()</b>,
+the result is the PCRE_ERROR_BADOPTION error.
+</P>
+<br><a name="SEC15" href="#TOC1">CHARACTER CODES</a><br>
+<P>
+In 16-bit mode, when PCRE_UTF16 is not set, character values are treated in the
+same way as in 8-bit, non UTF-8 mode, except, of course, that they can range
+from 0 to 0xffff instead of 0 to 0xff. Character types for characters less than
+0xff can therefore be influenced by the locale in the same way as before.
+Characters greater than 0xff have only one case, and no "type" (such as letter
+or digit).
+</P>
+<P>
+In UTF-16 mode, the character code is Unicode, in the range 0 to 0x10ffff, with
+the exception of values in the range 0xd800 to 0xdfff because those are
+"surrogate" values that are used in pairs to encode values greater than 0xffff.
+</P>
+<P>
+A UTF-16 string can indicate its endianness by special code knows as a
+byte-order mark (BOM). The PCRE functions do not handle this, expecting strings
+to be in host byte order. A utility function called
+<b>pcre16_utf16_to_host_byte_order()</b> is provided to help with this (see
+above).
+</P>
+<br><a name="SEC16" href="#TOC1">ERROR NAMES</a><br>
+<P>
+The errors PCRE_ERROR_BADUTF16_OFFSET and PCRE_ERROR_SHORTUTF16 correspond to
+their 8-bit counterparts. The error PCRE_ERROR_BADMODE is given when a compiled
+pattern is passed to a function that processes patterns in the other
+mode, for example, if a pattern compiled with <b>pcre_compile()</b> is passed to
+<b>pcre16_exec()</b>.
+</P>
+<P>
+There are new error codes whose names begin with PCRE_UTF16_ERR for invalid
+UTF-16 strings, corresponding to the PCRE_UTF8_ERR codes for UTF-8 strings that
+are described in the section entitled
+<a href="pcreapi.html#badutf8reasons">"Reason codes for invalid UTF-8 strings"</a>
+in the main
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page. The UTF-16 errors are:
+<pre>
+ PCRE_UTF16_ERR1 Missing low surrogate at end of string
+ PCRE_UTF16_ERR2 Invalid low surrogate follows high surrogate
+ PCRE_UTF16_ERR3 Isolated low surrogate
+ PCRE_UTF16_ERR4 Non-character
+</PRE>
+</P>
+<br><a name="SEC17" href="#TOC1">ERROR TEXTS</a><br>
+<P>
+If there is an error while compiling a pattern, the error text that is passed
+back by <b>pcre16_compile()</b> or <b>pcre16_compile2()</b> is still an 8-bit
+character string, zero-terminated.
+</P>
+<br><a name="SEC18" href="#TOC1">CALLOUTS</a><br>
+<P>
+The <i>subject</i> and <i>mark</i> fields in the callout block that is passed to
+a callout function point to 16-bit vectors.
+</P>
+<br><a name="SEC19" href="#TOC1">TESTING</a><br>
+<P>
+The <b>pcretest</b> program continues to operate with 8-bit input and output
+files, but it can be used for testing the 16-bit library. If it is run with the
+command line option <b>-16</b>, patterns and subject strings are converted from
+8-bit to 16-bit before being passed to PCRE, and the 16-bit library functions
+are used instead of the 8-bit ones. Returned 16-bit strings are converted to
+8-bit for output. If both the 8-bit and the 32-bit libraries were not compiled,
+<b>pcretest</b> defaults to 16-bit and the <b>-16</b> option is ignored.
+</P>
+<P>
+When PCRE is being built, the <b>RunTest</b> script that is called by "make
+check" uses the <b>pcretest</b> <b>-C</b> option to discover which of the 8-bit,
+16-bit and 32-bit libraries has been built, and runs the tests appropriately.
+</P>
+<br><a name="SEC20" href="#TOC1">NOT SUPPORTED IN 16-BIT MODE</a><br>
+<P>
+Not all the features of the 8-bit library are available with the 16-bit
+library. The C++ and POSIX wrapper functions support only the 8-bit library,
+and the <b>pcregrep</b> program is at present 8-bit only.
+</P>
+<br><a name="SEC21" href="#TOC1">AUTHOR</a><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><a name="SEC22" href="#TOC1">REVISION</a><br>
+<P>
+Last updated: 12 May 2013
+<br>
+Copyright &copy; 1997-2013 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre32.html b/doc/html/pcre32.html
new file mode 100644
index 0000000..f96876e
--- /dev/null
+++ b/doc/html/pcre32.html
@@ -0,0 +1,382 @@
+<html>
+<head>
+<title>pcre32 specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre32 man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<ul>
+<li><a name="TOC1" href="#SEC1">PCRE 32-BIT API BASIC FUNCTIONS</a>
+<li><a name="TOC2" href="#SEC2">PCRE 32-BIT API STRING EXTRACTION FUNCTIONS</a>
+<li><a name="TOC3" href="#SEC3">PCRE 32-BIT API AUXILIARY FUNCTIONS</a>
+<li><a name="TOC4" href="#SEC4">PCRE 32-BIT API INDIRECTED FUNCTIONS</a>
+<li><a name="TOC5" href="#SEC5">PCRE 32-BIT API 32-BIT-ONLY FUNCTION</a>
+<li><a name="TOC6" href="#SEC6">THE PCRE 32-BIT LIBRARY</a>
+<li><a name="TOC7" href="#SEC7">THE HEADER FILE</a>
+<li><a name="TOC8" href="#SEC8">THE LIBRARY NAME</a>
+<li><a name="TOC9" href="#SEC9">STRING TYPES</a>
+<li><a name="TOC10" href="#SEC10">STRUCTURE TYPES</a>
+<li><a name="TOC11" href="#SEC11">32-BIT FUNCTIONS</a>
+<li><a name="TOC12" href="#SEC12">SUBJECT STRING OFFSETS</a>
+<li><a name="TOC13" href="#SEC13">NAMED SUBPATTERNS</a>
+<li><a name="TOC14" href="#SEC14">OPTION NAMES</a>
+<li><a name="TOC15" href="#SEC15">CHARACTER CODES</a>
+<li><a name="TOC16" href="#SEC16">ERROR NAMES</a>
+<li><a name="TOC17" href="#SEC17">ERROR TEXTS</a>
+<li><a name="TOC18" href="#SEC18">CALLOUTS</a>
+<li><a name="TOC19" href="#SEC19">TESTING</a>
+<li><a name="TOC20" href="#SEC20">NOT SUPPORTED IN 32-BIT MODE</a>
+<li><a name="TOC21" href="#SEC21">AUTHOR</a>
+<li><a name="TOC22" href="#SEC22">REVISION</a>
+</ul>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<br><a name="SEC1" href="#TOC1">PCRE 32-BIT API BASIC FUNCTIONS</a><br>
+<P>
+<b>pcre32 *pcre32_compile(PCRE_SPTR32 <i>pattern</i>, int <i>options</i>,</b>
+<b> const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
+<b> const unsigned char *<i>tableptr</i>);</b>
+<br>
+<br>
+<b>pcre32 *pcre32_compile2(PCRE_SPTR32 <i>pattern</i>, int <i>options</i>,</b>
+<b> int *<i>errorcodeptr</i>,</b>
+<b> const unsigned char *<i>tableptr</i>);</b>
+<br>
+<br>
+<b>pcre32_extra *pcre32_study(const pcre32 *<i>code</i>, int <i>options</i>,</b>
+<b> const char **<i>errptr</i>);</b>
+<br>
+<br>
+<b>void pcre32_free_study(pcre32_extra *<i>extra</i>);</b>
+<br>
+<br>
+<b>int pcre32_exec(const pcre32 *<i>code</i>, const pcre32_extra *<i>extra</i>,</b>
+<b> PCRE_SPTR32 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
+<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>);</b>
+<br>
+<br>
+<b>int pcre32_dfa_exec(const pcre32 *<i>code</i>, const pcre32_extra *<i>extra</i>,</b>
+<b> PCRE_SPTR32 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
+<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
+<b> int *<i>workspace</i>, int <i>wscount</i>);</b>
+</P>
+<br><a name="SEC2" href="#TOC1">PCRE 32-BIT API STRING EXTRACTION FUNCTIONS</a><br>
+<P>
+<b>int pcre32_copy_named_substring(const pcre32 *<i>code</i>,</b>
+<b> PCRE_SPTR32 <i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, PCRE_SPTR32 <i>stringname</i>,</b>
+<b> PCRE_UCHAR32 *<i>buffer</i>, int <i>buffersize</i>);</b>
+<br>
+<br>
+<b>int pcre32_copy_substring(PCRE_SPTR32 <i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, int <i>stringnumber</i>, PCRE_UCHAR32 *<i>buffer</i>,</b>
+<b> int <i>buffersize</i>);</b>
+<br>
+<br>
+<b>int pcre32_get_named_substring(const pcre32 *<i>code</i>,</b>
+<b> PCRE_SPTR32 <i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, PCRE_SPTR32 <i>stringname</i>,</b>
+<b> PCRE_SPTR32 *<i>stringptr</i>);</b>
+<br>
+<br>
+<b>int pcre32_get_stringnumber(const pcre32 *<i>code</i>,</b>
+<b> PCRE_SPTR32 <i>name</i>);</b>
+<br>
+<br>
+<b>int pcre32_get_stringtable_entries(const pcre32 *<i>code</i>,</b>
+<b> PCRE_SPTR32 <i>name</i>, PCRE_UCHAR32 **<i>first</i>, PCRE_UCHAR32 **<i>last</i>);</b>
+<br>
+<br>
+<b>int pcre32_get_substring(PCRE_SPTR32 <i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, int <i>stringnumber</i>,</b>
+<b> PCRE_SPTR32 *<i>stringptr</i>);</b>
+<br>
+<br>
+<b>int pcre32_get_substring_list(PCRE_SPTR32 <i>subject</i>,</b>
+<b> int *<i>ovector</i>, int <i>stringcount</i>, PCRE_SPTR32 **<i>listptr</i>);</b>
+<br>
+<br>
+<b>void pcre32_free_substring(PCRE_SPTR32 <i>stringptr</i>);</b>
+<br>
+<br>
+<b>void pcre32_free_substring_list(PCRE_SPTR32 *<i>stringptr</i>);</b>
+</P>
+<br><a name="SEC3" href="#TOC1">PCRE 32-BIT API AUXILIARY FUNCTIONS</a><br>
+<P>
+<b>pcre32_jit_stack *pcre32_jit_stack_alloc(int <i>startsize</i>, int <i>maxsize</i>);</b>
+<br>
+<br>
+<b>void pcre32_jit_stack_free(pcre32_jit_stack *<i>stack</i>);</b>
+<br>
+<br>
+<b>void pcre32_assign_jit_stack(pcre32_extra *<i>extra</i>,</b>
+<b> pcre32_jit_callback <i>callback</i>, void *<i>data</i>);</b>
+<br>
+<br>
+<b>const unsigned char *pcre32_maketables(void);</b>
+<br>
+<br>
+<b>int pcre32_fullinfo(const pcre32 *<i>code</i>, const pcre32_extra *<i>extra</i>,</b>
+<b> int <i>what</i>, void *<i>where</i>);</b>
+<br>
+<br>
+<b>int pcre32_refcount(pcre32 *<i>code</i>, int <i>adjust</i>);</b>
+<br>
+<br>
+<b>int pcre32_config(int <i>what</i>, void *<i>where</i>);</b>
+<br>
+<br>
+<b>const char *pcre32_version(void);</b>
+<br>
+<br>
+<b>int pcre32_pattern_to_host_byte_order(pcre32 *<i>code</i>,</b>
+<b> pcre32_extra *<i>extra</i>, const unsigned char *<i>tables</i>);</b>
+</P>
+<br><a name="SEC4" href="#TOC1">PCRE 32-BIT API INDIRECTED FUNCTIONS</a><br>
+<P>
+<b>void *(*pcre32_malloc)(size_t);</b>
+<br>
+<br>
+<b>void (*pcre32_free)(void *);</b>
+<br>
+<br>
+<b>void *(*pcre32_stack_malloc)(size_t);</b>
+<br>
+<br>
+<b>void (*pcre32_stack_free)(void *);</b>
+<br>
+<br>
+<b>int (*pcre32_callout)(pcre32_callout_block *);</b>
+</P>
+<br><a name="SEC5" href="#TOC1">PCRE 32-BIT API 32-BIT-ONLY FUNCTION</a><br>
+<P>
+<b>int pcre32_utf32_to_host_byte_order(PCRE_UCHAR32 *<i>output</i>,</b>
+<b> PCRE_SPTR32 <i>input</i>, int <i>length</i>, int *<i>byte_order</i>,</b>
+<b> int <i>keep_boms</i>);</b>
+</P>
+<br><a name="SEC6" href="#TOC1">THE PCRE 32-BIT LIBRARY</a><br>
+<P>
+Starting with release 8.32, it is possible to compile a PCRE library that
+supports 32-bit character strings, including UTF-32 strings, as well as or
+instead of the original 8-bit library. This work was done by Christian Persch,
+based on the work done by Zoltan Herczeg for the 16-bit library. All three
+libraries contain identical sets of functions, used in exactly the same way.
+Only the names of the functions and the data types of their arguments and
+results are different. To avoid over-complication and reduce the documentation
+maintenance load, most of the PCRE documentation describes the 8-bit library,
+with only occasional references to the 16-bit and 32-bit libraries. This page
+describes what is different when you use the 32-bit library.
+</P>
+<P>
+WARNING: A single application can be linked with all or any of the three
+libraries, but you must take care when processing any particular pattern
+to use functions from just one library. For example, if you want to study
+a pattern that was compiled with <b>pcre32_compile()</b>, you must do so
+with <b>pcre32_study()</b>, not <b>pcre_study()</b>, and you must free the
+study data with <b>pcre32_free_study()</b>.
+</P>
+<br><a name="SEC7" href="#TOC1">THE HEADER FILE</a><br>
+<P>
+There is only one header file, <b>pcre.h</b>. It contains prototypes for all the
+functions in all libraries, as well as definitions of flags, structures, error
+codes, etc.
+</P>
+<br><a name="SEC8" href="#TOC1">THE LIBRARY NAME</a><br>
+<P>
+In Unix-like systems, the 32-bit library is called <b>libpcre32</b>, and can
+normally be accesss by adding <b>-lpcre32</b> to the command for linking an
+application that uses PCRE.
+</P>
+<br><a name="SEC9" href="#TOC1">STRING TYPES</a><br>
+<P>
+In the 8-bit library, strings are passed to PCRE library functions as vectors
+of bytes with the C type "char *". In the 32-bit library, strings are passed as
+vectors of unsigned 32-bit quantities. The macro PCRE_UCHAR32 specifies an
+appropriate data type, and PCRE_SPTR32 is defined as "const PCRE_UCHAR32 *". In
+very many environments, "unsigned int" is a 32-bit data type. When PCRE is
+built, it defines PCRE_UCHAR32 as "unsigned int", but checks that it really is
+a 32-bit data type. If it is not, the build fails with an error message telling
+the maintainer to modify the definition appropriately.
+</P>
+<br><a name="SEC10" href="#TOC1">STRUCTURE TYPES</a><br>
+<P>
+The types of the opaque structures that are used for compiled 32-bit patterns
+and JIT stacks are <b>pcre32</b> and <b>pcre32_jit_stack</b> respectively. The
+type of the user-accessible structure that is returned by <b>pcre32_study()</b>
+is <b>pcre32_extra</b>, and the type of the structure that is used for passing
+data to a callout function is <b>pcre32_callout_block</b>. These structures
+contain the same fields, with the same names, as their 8-bit counterparts. The
+only difference is that pointers to character strings are 32-bit instead of
+8-bit types.
+</P>
+<br><a name="SEC11" href="#TOC1">32-BIT FUNCTIONS</a><br>
+<P>
+For every function in the 8-bit library there is a corresponding function in
+the 32-bit library with a name that starts with <b>pcre32_</b> instead of
+<b>pcre_</b>. The prototypes are listed above. In addition, there is one extra
+function, <b>pcre32_utf32_to_host_byte_order()</b>. This is a utility function
+that converts a UTF-32 character string to host byte order if necessary. The
+other 32-bit functions expect the strings they are passed to be in host byte
+order.
+</P>
+<P>
+The <i>input</i> and <i>output</i> arguments of
+<b>pcre32_utf32_to_host_byte_order()</b> may point to the same address, that is,
+conversion in place is supported. The output buffer must be at least as long as
+the input.
+</P>
+<P>
+The <i>length</i> argument specifies the number of 32-bit data units in the
+input string; a negative value specifies a zero-terminated string.
+</P>
+<P>
+If <i>byte_order</i> is NULL, it is assumed that the string starts off in host
+byte order. This may be changed by byte-order marks (BOMs) anywhere in the
+string (commonly as the first character).
+</P>
+<P>
+If <i>byte_order</i> is not NULL, a non-zero value of the integer to which it
+points means that the input starts off in host byte order, otherwise the
+opposite order is assumed. Again, BOMs in the string can change this. The final
+byte order is passed back at the end of processing.
+</P>
+<P>
+If <i>keep_boms</i> is not zero, byte-order mark characters (0xfeff) are copied
+into the output string. Otherwise they are discarded.
+</P>
+<P>
+The result of the function is the number of 32-bit units placed into the output
+buffer, including the zero terminator if the string was zero-terminated.
+</P>
+<br><a name="SEC12" href="#TOC1">SUBJECT STRING OFFSETS</a><br>
+<P>
+The lengths and starting offsets of subject strings must be specified in 32-bit
+data units, and the offsets within subject strings that are returned by the
+matching functions are in also 32-bit units rather than bytes.
+</P>
+<br><a name="SEC13" href="#TOC1">NAMED SUBPATTERNS</a><br>
+<P>
+The name-to-number translation table that is maintained for named subpatterns
+uses 32-bit characters. The <b>pcre32_get_stringtable_entries()</b> function
+returns the length of each entry in the table as the number of 32-bit data
+units.
+</P>
+<br><a name="SEC14" href="#TOC1">OPTION NAMES</a><br>
+<P>
+There are two new general option names, PCRE_UTF32 and PCRE_NO_UTF32_CHECK,
+which correspond to PCRE_UTF8 and PCRE_NO_UTF8_CHECK in the 8-bit library. In
+fact, these new options define the same bits in the options word. There is a
+discussion about the
+<a href="pcreunicode.html#utf32strings">validity of UTF-32 strings</a>
+in the
+<a href="pcreunicode.html"><b>pcreunicode</b></a>
+page.
+</P>
+<P>
+For the <b>pcre32_config()</b> function there is an option PCRE_CONFIG_UTF32
+that returns 1 if UTF-32 support is configured, otherwise 0. If this option is
+given to <b>pcre_config()</b> or <b>pcre16_config()</b>, or if the
+PCRE_CONFIG_UTF8 or PCRE_CONFIG_UTF16 option is given to <b>pcre32_config()</b>,
+the result is the PCRE_ERROR_BADOPTION error.
+</P>
+<br><a name="SEC15" href="#TOC1">CHARACTER CODES</a><br>
+<P>
+In 32-bit mode, when PCRE_UTF32 is not set, character values are treated in the
+same way as in 8-bit, non UTF-8 mode, except, of course, that they can range
+from 0 to 0x7fffffff instead of 0 to 0xff. Character types for characters less
+than 0xff can therefore be influenced by the locale in the same way as before.
+Characters greater than 0xff have only one case, and no "type" (such as letter
+or digit).
+</P>
+<P>
+In UTF-32 mode, the character code is Unicode, in the range 0 to 0x10ffff, with
+the exception of values in the range 0xd800 to 0xdfff because those are
+"surrogate" values that are ill-formed in UTF-32.
+</P>
+<P>
+A UTF-32 string can indicate its endianness by special code knows as a
+byte-order mark (BOM). The PCRE functions do not handle this, expecting strings
+to be in host byte order. A utility function called
+<b>pcre32_utf32_to_host_byte_order()</b> is provided to help with this (see
+above).
+</P>
+<br><a name="SEC16" href="#TOC1">ERROR NAMES</a><br>
+<P>
+The error PCRE_ERROR_BADUTF32 corresponds to its 8-bit counterpart.
+The error PCRE_ERROR_BADMODE is given when a compiled
+pattern is passed to a function that processes patterns in the other
+mode, for example, if a pattern compiled with <b>pcre_compile()</b> is passed to
+<b>pcre32_exec()</b>.
+</P>
+<P>
+There are new error codes whose names begin with PCRE_UTF32_ERR for invalid
+UTF-32 strings, corresponding to the PCRE_UTF8_ERR codes for UTF-8 strings that
+are described in the section entitled
+<a href="pcreapi.html#badutf8reasons">"Reason codes for invalid UTF-8 strings"</a>
+in the main
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page. The UTF-32 errors are:
+<pre>
+ PCRE_UTF32_ERR1 Surrogate character (range from 0xd800 to 0xdfff)
+ PCRE_UTF32_ERR2 Non-character
+ PCRE_UTF32_ERR3 Character &#62; 0x10ffff
+</PRE>
+</P>
+<br><a name="SEC17" href="#TOC1">ERROR TEXTS</a><br>
+<P>
+If there is an error while compiling a pattern, the error text that is passed
+back by <b>pcre32_compile()</b> or <b>pcre32_compile2()</b> is still an 8-bit
+character string, zero-terminated.
+</P>
+<br><a name="SEC18" href="#TOC1">CALLOUTS</a><br>
+<P>
+The <i>subject</i> and <i>mark</i> fields in the callout block that is passed to
+a callout function point to 32-bit vectors.
+</P>
+<br><a name="SEC19" href="#TOC1">TESTING</a><br>
+<P>
+The <b>pcretest</b> program continues to operate with 8-bit input and output
+files, but it can be used for testing the 32-bit library. If it is run with the
+command line option <b>-32</b>, patterns and subject strings are converted from
+8-bit to 32-bit before being passed to PCRE, and the 32-bit library functions
+are used instead of the 8-bit ones. Returned 32-bit strings are converted to
+8-bit for output. If both the 8-bit and the 16-bit libraries were not compiled,
+<b>pcretest</b> defaults to 32-bit and the <b>-32</b> option is ignored.
+</P>
+<P>
+When PCRE is being built, the <b>RunTest</b> script that is called by "make
+check" uses the <b>pcretest</b> <b>-C</b> option to discover which of the 8-bit,
+16-bit and 32-bit libraries has been built, and runs the tests appropriately.
+</P>
+<br><a name="SEC20" href="#TOC1">NOT SUPPORTED IN 32-BIT MODE</a><br>
+<P>
+Not all the features of the 8-bit library are available with the 32-bit
+library. The C++ and POSIX wrapper functions support only the 8-bit library,
+and the <b>pcregrep</b> program is at present 8-bit only.
+</P>
+<br><a name="SEC21" href="#TOC1">AUTHOR</a><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><a name="SEC22" href="#TOC1">REVISION</a><br>
+<P>
+Last updated: 12 May 2013
+<br>
+Copyright &copy; 1997-2013 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_assign_jit_stack.html b/doc/html/pcre_assign_jit_stack.html
new file mode 100644
index 0000000..b2eef70
--- /dev/null
+++ b/doc/html/pcre_assign_jit_stack.html
@@ -0,0 +1,76 @@
+<html>
+<head>
+<title>pcre_assign_jit_stack specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_assign_jit_stack man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>void pcre_assign_jit_stack(pcre_extra *<i>extra</i>,</b>
+<b> pcre_jit_callback <i>callback</i>, void *<i>data</i>);</b>
+<br>
+<br>
+<b>void pcre16_assign_jit_stack(pcre16_extra *<i>extra</i>,</b>
+<b> pcre16_jit_callback <i>callback</i>, void *<i>data</i>);</b>
+<br>
+<br>
+<b>void pcre32_assign_jit_stack(pcre32_extra *<i>extra</i>,</b>
+<b> pcre32_jit_callback <i>callback</i>, void *<i>data</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function provides control over the memory used as a stack at run-time by a
+call to <b>pcre[16|32]_exec()</b> with a pattern that has been successfully
+compiled with JIT optimization. The arguments are:
+<pre>
+ extra the data pointer returned by <b>pcre[16|32]_study()</b>
+ callback a callback function
+ data a JIT stack or a value to be passed to the callback
+ function
+</PRE>
+</P>
+<P>
+If <i>callback</i> is NULL and <i>data</i> is NULL, an internal 32K block on
+the machine stack is used.
+</P>
+<P>
+If <i>callback</i> is NULL and <i>data</i> is not NULL, <i>data</i> must
+be a valid JIT stack, the result of calling <b>pcre[16|32]_jit_stack_alloc()</b>.
+</P>
+<P>
+If <i>callback</i> not NULL, it is called with <i>data</i> as an argument at
+the start of matching, in order to set up a JIT stack. If the result is NULL,
+the internal 32K stack is used; otherwise the return value must be a valid JIT
+stack, the result of calling <b>pcre[16|32]_jit_stack_alloc()</b>.
+</P>
+<P>
+You may safely assign the same JIT stack to multiple patterns, as long as they
+are all matched in the same thread. In a multithread application, each thread
+must use its own JIT stack. For more details, see the
+<a href="pcrejit.html"><b>pcrejit</b></a>
+page.
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_compile.html b/doc/html/pcre_compile.html
new file mode 100644
index 0000000..95b4bec
--- /dev/null
+++ b/doc/html/pcre_compile.html
@@ -0,0 +1,111 @@
+<html>
+<head>
+<title>pcre_compile specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_compile man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>pcre *pcre_compile(const char *<i>pattern</i>, int <i>options</i>,</b>
+<b> const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
+<b> const unsigned char *<i>tableptr</i>);</b>
+<br>
+<br>
+<b>pcre16 *pcre16_compile(PCRE_SPTR16 <i>pattern</i>, int <i>options</i>,</b>
+<b> const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
+<b> const unsigned char *<i>tableptr</i>);</b>
+<br>
+<br>
+<b>pcre32 *pcre32_compile(PCRE_SPTR32 <i>pattern</i>, int <i>options</i>,</b>
+<b> const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
+<b> const unsigned char *<i>tableptr</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function compiles a regular expression into an internal form. It is the
+same as <b>pcre[16|32]_compile2()</b>, except for the absence of the
+<i>errorcodeptr</i> argument. Its arguments are:
+<pre>
+ <i>pattern</i> A zero-terminated string containing the
+ regular expression to be compiled
+ <i>options</i> Zero or more option bits
+ <i>errptr</i> Where to put an error message
+ <i>erroffset</i> Offset in pattern where error was found
+ <i>tableptr</i> Pointer to character tables, or NULL to
+ use the built-in default
+</pre>
+The option bits are:
+<pre>
+ PCRE_ANCHORED Force pattern anchoring
+ PCRE_AUTO_CALLOUT Compile automatic callouts
+ PCRE_BSR_ANYCRLF \R matches only CR, LF, or CRLF
+ PCRE_BSR_UNICODE \R matches all Unicode line endings
+ PCRE_CASELESS Do caseless matching
+ PCRE_DOLLAR_ENDONLY $ not to match newline at end
+ PCRE_DOTALL . matches anything including NL
+ PCRE_DUPNAMES Allow duplicate names for subpatterns
+ PCRE_EXTENDED Ignore white space and # comments
+ PCRE_EXTRA PCRE extra features
+ (not much use currently)
+ PCRE_FIRSTLINE Force matching to be before newline
+ PCRE_JAVASCRIPT_COMPAT JavaScript compatibility
+ PCRE_MULTILINE ^ and $ match newlines within data
+ PCRE_NEVER_UTF Lock out UTF, e.g. via (*UTF)
+ PCRE_NEWLINE_ANY Recognize any Unicode newline sequence
+ PCRE_NEWLINE_ANYCRLF Recognize CR, LF, and CRLF as newline
+ sequences
+ PCRE_NEWLINE_CR Set CR as the newline sequence
+ PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
+ PCRE_NEWLINE_LF Set LF as the newline sequence
+ PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren-
+ theses (named ones available)
+ PCRE_NO_AUTO_POSSESS Disable auto-possessification
+ PCRE_NO_START_OPTIMIZE Disable match-time start optimizations
+ PCRE_NO_UTF16_CHECK Do not check the pattern for UTF-16
+ validity (only relevant if
+ PCRE_UTF16 is set)
+ PCRE_NO_UTF32_CHECK Do not check the pattern for UTF-32
+ validity (only relevant if
+ PCRE_UTF32 is set)
+ PCRE_NO_UTF8_CHECK Do not check the pattern for UTF-8
+ validity (only relevant if
+ PCRE_UTF8 is set)
+ PCRE_UCP Use Unicode properties for \d, \w, etc.
+ PCRE_UNGREEDY Invert greediness of quantifiers
+ PCRE_UTF16 Run in <b>pcre16_compile()</b> UTF-16 mode
+ PCRE_UTF32 Run in <b>pcre32_compile()</b> UTF-32 mode
+ PCRE_UTF8 Run in <b>pcre_compile()</b> UTF-8 mode
+</pre>
+PCRE must be built with UTF support in order to use PCRE_UTF8/16/32 and
+PCRE_NO_UTF8/16/32_CHECK, and with UCP support if PCRE_UCP is used.
+</P>
+<P>
+The yield of the function is a pointer to a private data structure that
+contains the compiled pattern, or NULL if an error was detected. Note that
+compiling regular expressions with one version of PCRE for use with a different
+version is not guaranteed to work and may cause crashes.
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_compile2.html b/doc/html/pcre_compile2.html
new file mode 100644
index 0000000..9cd56a2
--- /dev/null
+++ b/doc/html/pcre_compile2.html
@@ -0,0 +1,115 @@
+<html>
+<head>
+<title>pcre_compile2 specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_compile2 man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>pcre *pcre_compile2(const char *<i>pattern</i>, int <i>options</i>,</b>
+<b> int *<i>errorcodeptr</i>,</b>
+<b> const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
+<b> const unsigned char *<i>tableptr</i>);</b>
+<br>
+<br>
+<b>pcre16 *pcre16_compile2(PCRE_SPTR16 <i>pattern</i>, int <i>options</i>,</b>
+<b> int *<i>errorcodeptr</i>,</b>
+<b> const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
+<b> const unsigned char *<i>tableptr</i>);</b>
+<br>
+<br>
+<b>pcre32 *pcre32_compile2(PCRE_SPTR32 <i>pattern</i>, int <i>options</i>,</b>
+<b>" int *<i>errorcodeptr</i>,£</b>
+<b> const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
+<b> const unsigned char *<i>tableptr</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function compiles a regular expression into an internal form. It is the
+same as <b>pcre[16|32]_compile()</b>, except for the addition of the
+<i>errorcodeptr</i> argument. The arguments are:
+<pre>
+ <i>pattern</i> A zero-terminated string containing the
+ regular expression to be compiled
+ <i>options</i> Zero or more option bits
+ <i>errorcodeptr</i> Where to put an error code
+ <i>errptr</i> Where to put an error message
+ <i>erroffset</i> Offset in pattern where error was found
+ <i>tableptr</i> Pointer to character tables, or NULL to
+ use the built-in default
+</pre>
+The option bits are:
+<pre>
+ PCRE_ANCHORED Force pattern anchoring
+ PCRE_AUTO_CALLOUT Compile automatic callouts
+ PCRE_BSR_ANYCRLF \R matches only CR, LF, or CRLF
+ PCRE_BSR_UNICODE \R matches all Unicode line endings
+ PCRE_CASELESS Do caseless matching
+ PCRE_DOLLAR_ENDONLY $ not to match newline at end
+ PCRE_DOTALL . matches anything including NL
+ PCRE_DUPNAMES Allow duplicate names for subpatterns
+ PCRE_EXTENDED Ignore white space and # comments
+ PCRE_EXTRA PCRE extra features
+ (not much use currently)
+ PCRE_FIRSTLINE Force matching to be before newline
+ PCRE_JAVASCRIPT_COMPAT JavaScript compatibility
+ PCRE_MULTILINE ^ and $ match newlines within data
+ PCRE_NEVER_UTF Lock out UTF, e.g. via (*UTF)
+ PCRE_NEWLINE_ANY Recognize any Unicode newline sequence
+ PCRE_NEWLINE_ANYCRLF Recognize CR, LF, and CRLF as newline
+ sequences
+ PCRE_NEWLINE_CR Set CR as the newline sequence
+ PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
+ PCRE_NEWLINE_LF Set LF as the newline sequence
+ PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren-
+ theses (named ones available)
+ PCRE_NO_AUTO_POSSESS Disable auto-possessification
+ PCRE_NO_START_OPTIMIZE Disable match-time start optimizations
+ PCRE_NO_UTF16_CHECK Do not check the pattern for UTF-16
+ validity (only relevant if
+ PCRE_UTF16 is set)
+ PCRE_NO_UTF32_CHECK Do not check the pattern for UTF-32
+ validity (only relevant if
+ PCRE_UTF32 is set)
+ PCRE_NO_UTF8_CHECK Do not check the pattern for UTF-8
+ validity (only relevant if
+ PCRE_UTF8 is set)
+ PCRE_UCP Use Unicode properties for \d, \w, etc.
+ PCRE_UNGREEDY Invert greediness of quantifiers
+ PCRE_UTF16 Run <b>pcre16_compile()</b> in UTF-16 mode
+ PCRE_UTF32 Run <b>pcre32_compile()</b> in UTF-32 mode
+ PCRE_UTF8 Run <b>pcre_compile()</b> in UTF-8 mode
+</pre>
+PCRE must be built with UTF support in order to use PCRE_UTF8/16/32 and
+PCRE_NO_UTF8/16/32_CHECK, and with UCP support if PCRE_UCP is used.
+</P>
+<P>
+The yield of the function is a pointer to a private data structure that
+contains the compiled pattern, or NULL if an error was detected. Note that
+compiling regular expressions with one version of PCRE for use with a different
+version is not guaranteed to work and may cause crashes.
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_config.html b/doc/html/pcre_config.html
new file mode 100644
index 0000000..bcdcdde
--- /dev/null
+++ b/doc/html/pcre_config.html
@@ -0,0 +1,92 @@
+<html>
+<head>
+<title>pcre_config specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_config man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>int pcre_config(int <i>what</i>, void *<i>where</i>);</b>
+</P>
+<P>
+<b>int pcre16_config(int <i>what</i>, void *<i>where</i>);</b>
+</P>
+<P>
+<b>int pcre32_config(int <i>what</i>, void *<i>where</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function makes it possible for a client program to find out which optional
+features are available in the version of the PCRE library it is using. The
+arguments are as follows:
+<pre>
+ <i>what</i> A code specifying what information is required
+ <i>where</i> Points to where to put the data
+</pre>
+The <i>where</i> argument must point to an integer variable, except for
+PCRE_CONFIG_MATCH_LIMIT and PCRE_CONFIG_MATCH_LIMIT_RECURSION, when it must
+point to an unsigned long integer. The available codes are:
+<pre>
+ PCRE_CONFIG_JIT Availability of just-in-time compiler
+ support (1=yes 0=no)
+ PCRE_CONFIG_JITTARGET String containing information about the
+ target architecture for the JIT compiler,
+ or NULL if there is no JIT support
+ PCRE_CONFIG_LINK_SIZE Internal link size: 2, 3, or 4
+ PCRE_CONFIG_PARENS_LIMIT Parentheses nesting limit
+ PCRE_CONFIG_MATCH_LIMIT Internal resource limit
+ PCRE_CONFIG_MATCH_LIMIT_RECURSION
+ Internal recursion depth limit
+ PCRE_CONFIG_NEWLINE Value of the default newline sequence:
+ 13 (0x000d) for CR
+ 10 (0x000a) for LF
+ 3338 (0x0d0a) for CRLF
+ -2 for ANYCRLF
+ -1 for ANY
+ PCRE_CONFIG_BSR Indicates what \R matches by default:
+ 0 all Unicode line endings
+ 1 CR, LF, or CRLF only
+ PCRE_CONFIG_POSIX_MALLOC_THRESHOLD
+ Threshold of return slots, above which
+ <b>malloc()</b> is used by the POSIX API
+ PCRE_CONFIG_STACKRECURSE Recursion implementation (1=stack 0=heap)
+ PCRE_CONFIG_UTF16 Availability of UTF-16 support (1=yes
+ 0=no); option for <b>pcre16_config()</b>
+ PCRE_CONFIG_UTF32 Availability of UTF-32 support (1=yes
+ 0=no); option for <b>pcre32_config()</b>
+ PCRE_CONFIG_UTF8 Availability of UTF-8 support (1=yes 0=no);
+ option for <b>pcre_config()</b>
+ PCRE_CONFIG_UNICODE_PROPERTIES
+ Availability of Unicode property support
+ (1=yes 0=no)
+</pre>
+The function yields 0 on success or PCRE_ERROR_BADOPTION otherwise. That error
+is also given if PCRE_CONFIG_UTF16 or PCRE_CONFIG_UTF32 is passed to
+<b>pcre_config()</b>, if PCRE_CONFIG_UTF8 or PCRE_CONFIG_UTF32 is passed to
+<b>pcre16_config()</b>, or if PCRE_CONFIG_UTF8 or PCRE_CONFIG_UTF16 is passed to
+<b>pcre32_config()</b>.
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_copy_named_substring.html b/doc/html/pcre_copy_named_substring.html
new file mode 100644
index 0000000..77b4804
--- /dev/null
+++ b/doc/html/pcre_copy_named_substring.html
@@ -0,0 +1,65 @@
+<html>
+<head>
+<title>pcre_copy_named_substring specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_copy_named_substring man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>int pcre_copy_named_substring(const pcre *<i>code</i>,</b>
+<b> const char *<i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, const char *<i>stringname</i>,</b>
+<b> char *<i>buffer</i>, int <i>buffersize</i>);</b>
+<br>
+<br>
+<b>int pcre16_copy_named_substring(const pcre16 *<i>code</i>,</b>
+<b> PCRE_SPTR16 <i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, PCRE_SPTR16 <i>stringname</i>,</b>
+<b> PCRE_UCHAR16 *<i>buffer</i>, int <i>buffersize</i>);</b>
+<br>
+<br>
+<b>int pcre32_copy_named_substring(const pcre32 *<i>code</i>,</b>
+<b> PCRE_SPTR32 <i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, PCRE_SPTR32 <i>stringname</i>,</b>
+<b> PCRE_UCHAR32 *<i>buffer</i>, int <i>buffersize</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This is a convenience function for extracting a captured substring, identified
+by name, into a given buffer. The arguments are:
+<pre>
+ <i>code</i> Pattern that was successfully matched
+ <i>subject</i> Subject that has been successfully matched
+ <i>ovector</i> Offset vector that <b>pcre[16|32]_exec()</b> used
+ <i>stringcount</i> Value returned by <b>pcre[16|32]_exec()</b>
+ <i>stringname</i> Name of the required substring
+ <i>buffer</i> Buffer to receive the string
+ <i>buffersize</i> Size of buffer
+</pre>
+The yield is the length of the substring, PCRE_ERROR_NOMEMORY if the buffer was
+too small, or PCRE_ERROR_NOSUBSTRING if the string name is invalid.
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_copy_substring.html b/doc/html/pcre_copy_substring.html
new file mode 100644
index 0000000..ecaebe8
--- /dev/null
+++ b/doc/html/pcre_copy_substring.html
@@ -0,0 +1,61 @@
+<html>
+<head>
+<title>pcre_copy_substring specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_copy_substring man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>int pcre_copy_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, int <i>stringnumber</i>, char *<i>buffer</i>,</b>
+<b> int <i>buffersize</i>);</b>
+<br>
+<br>
+<b>int pcre16_copy_substring(PCRE_SPTR16 <i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, int <i>stringnumber</i>, PCRE_UCHAR16 *<i>buffer</i>,</b>
+<b> int <i>buffersize</i>);</b>
+<br>
+<br>
+<b>int pcre32_copy_substring(PCRE_SPTR32 <i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, int <i>stringnumber</i>, PCRE_UCHAR32 *<i>buffer</i>,</b>
+<b> int <i>buffersize</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This is a convenience function for extracting a captured substring into a given
+buffer. The arguments are:
+<pre>
+ <i>subject</i> Subject that has been successfully matched
+ <i>ovector</i> Offset vector that <b>pcre[16|32]_exec()</b> used
+ <i>stringcount</i> Value returned by <b>pcre[16|32]_exec()</b>
+ <i>stringnumber</i> Number of the required substring
+ <i>buffer</i> Buffer to receive the string
+ <i>buffersize</i> Size of buffer
+</pre>
+The yield is the length of the string, PCRE_ERROR_NOMEMORY if the buffer was
+too small, or PCRE_ERROR_NOSUBSTRING if the string number is invalid.
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_dfa_exec.html b/doc/html/pcre_dfa_exec.html
new file mode 100644
index 0000000..5fff6a7
--- /dev/null
+++ b/doc/html/pcre_dfa_exec.html
@@ -0,0 +1,129 @@
+<html>
+<head>
+<title>pcre_dfa_exec specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_dfa_exec man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>int pcre_dfa_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
+<b> const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
+<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
+<b> int *<i>workspace</i>, int <i>wscount</i>);</b>
+<br>
+<br>
+<b>int pcre16_dfa_exec(const pcre16 *<i>code</i>, const pcre16_extra *<i>extra</i>,</b>
+<b> PCRE_SPTR16 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
+<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
+<b> int *<i>workspace</i>, int <i>wscount</i>);</b>
+<br>
+<br>
+<b>int pcre32_dfa_exec(const pcre32 *<i>code</i>, const pcre32_extra *<i>extra</i>,</b>
+<b> PCRE_SPTR32 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
+<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
+<b> int *<i>workspace</i>, int <i>wscount</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function matches a compiled regular expression against a given subject
+string, using an alternative matching algorithm that scans the subject string
+just once (<i>not</i> Perl-compatible). Note that the main, Perl-compatible,
+matching function is <b>pcre[16|32]_exec()</b>. The arguments for this function
+are:
+<pre>
+ <i>code</i> Points to the compiled pattern
+ <i>extra</i> Points to an associated <b>pcre[16|32]_extra</b> structure,
+ or is NULL
+ <i>subject</i> Points to the subject string
+ <i>length</i> Length of the subject string
+ <i>startoffset</i> Offset in the subject at which to start matching
+ <i>options</i> Option bits
+ <i>ovector</i> Points to a vector of ints for result offsets
+ <i>ovecsize</i> Number of elements in the vector
+ <i>workspace</i> Points to a vector of ints used as working space
+ <i>wscount</i> Number of elements in the vector
+</pre>
+The units for <i>length</i> and <i>startoffset</i> are bytes for
+<b>pcre_exec()</b>, 16-bit data items for <b>pcre16_exec()</b>, and 32-bit items
+for <b>pcre32_exec()</b>. The options are:
+<pre>
+ PCRE_ANCHORED Match only at the first position
+ PCRE_BSR_ANYCRLF \R matches only CR, LF, or CRLF
+ PCRE_BSR_UNICODE \R matches all Unicode line endings
+ PCRE_NEWLINE_ANY Recognize any Unicode newline sequence
+ PCRE_NEWLINE_ANYCRLF Recognize CR, LF, & CRLF as newline sequences
+ PCRE_NEWLINE_CR Recognize CR as the only newline sequence
+ PCRE_NEWLINE_CRLF Recognize CRLF as the only newline sequence
+ PCRE_NEWLINE_LF Recognize LF as the only newline sequence
+ PCRE_NOTBOL Subject is not the beginning of a line
+ PCRE_NOTEOL Subject is not the end of a line
+ PCRE_NOTEMPTY An empty string is not a valid match
+ PCRE_NOTEMPTY_ATSTART An empty string at the start of the subject
+ is not a valid match
+ PCRE_NO_START_OPTIMIZE Do not do "start-match" optimizations
+ PCRE_NO_UTF16_CHECK Do not check the subject for UTF-16
+ validity (only relevant if PCRE_UTF16
+ was set at compile time)
+ PCRE_NO_UTF32_CHECK Do not check the subject for UTF-32
+ validity (only relevant if PCRE_UTF32
+ was set at compile time)
+ PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
+ validity (only relevant if PCRE_UTF8
+ was set at compile time)
+ PCRE_PARTIAL ) Return PCRE_ERROR_PARTIAL for a partial
+ PCRE_PARTIAL_SOFT ) match if no full matches are found
+ PCRE_PARTIAL_HARD Return PCRE_ERROR_PARTIAL for a partial match
+ even if there is a full match as well
+ PCRE_DFA_SHORTEST Return only the shortest match
+ PCRE_DFA_RESTART Restart after a partial match
+</pre>
+There are restrictions on what may appear in a pattern when using this matching
+function. Details are given in the
+<a href="pcrematching.html"><b>pcrematching</b></a>
+documentation. For details of partial matching, see the
+<a href="pcrepartial.html"><b>pcrepartial</b></a>
+page.
+</P>
+<P>
+A <b>pcre[16|32]_extra</b> structure contains the following fields:
+<pre>
+ <i>flags</i> Bits indicating which fields are set
+ <i>study_data</i> Opaque data from <b>pcre[16|32]_study()</b>
+ <i>match_limit</i> Limit on internal resource use
+ <i>match_limit_recursion</i> Limit on internal recursion depth
+ <i>callout_data</i> Opaque data passed back to callouts
+ <i>tables</i> Points to character tables or is NULL
+ <i>mark</i> For passing back a *MARK pointer
+ <i>executable_jit</i> Opaque data from JIT compilation
+</pre>
+The flag bits are PCRE_EXTRA_STUDY_DATA, PCRE_EXTRA_MATCH_LIMIT,
+PCRE_EXTRA_MATCH_LIMIT_RECURSION, PCRE_EXTRA_CALLOUT_DATA,
+PCRE_EXTRA_TABLES, PCRE_EXTRA_MARK and PCRE_EXTRA_EXECUTABLE_JIT. For this
+matching function, the <i>match_limit</i> and <i>match_limit_recursion</i> fields
+are not used, and must not be set. The PCRE_EXTRA_EXECUTABLE_JIT flag and
+the corresponding variable are ignored.
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_exec.html b/doc/html/pcre_exec.html
new file mode 100644
index 0000000..18e1a13
--- /dev/null
+++ b/doc/html/pcre_exec.html
@@ -0,0 +1,111 @@
+<html>
+<head>
+<title>pcre_exec specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_exec man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>int pcre_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
+<b> const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
+<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>);</b>
+<br>
+<br>
+<b>int pcre16_exec(const pcre16 *<i>code</i>, const pcre16_extra *<i>extra</i>,</b>
+<b> PCRE_SPTR16 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
+<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>);</b>
+<br>
+<br>
+<b>int pcre32_exec(const pcre32 *<i>code</i>, const pcre32_extra *<i>extra</i>,</b>
+<b> PCRE_SPTR32 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
+<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function matches a compiled regular expression against a given subject
+string, using a matching algorithm that is similar to Perl's. It returns
+offsets to captured substrings. Its arguments are:
+<pre>
+ <i>code</i> Points to the compiled pattern
+ <i>extra</i> Points to an associated <b>pcre[16|32]_extra</b> structure,
+ or is NULL
+ <i>subject</i> Points to the subject string
+ <i>length</i> Length of the subject string
+ <i>startoffset</i> Offset in the subject at which to start matching
+ <i>options</i> Option bits
+ <i>ovector</i> Points to a vector of ints for result offsets
+ <i>ovecsize</i> Number of elements in the vector (a multiple of 3)
+</pre>
+The units for <i>length</i> and <i>startoffset</i> are bytes for
+<b>pcre_exec()</b>, 16-bit data items for <b>pcre16_exec()</b>, and 32-bit items
+for <b>pcre32_exec()</b>. The options are:
+<pre>
+ PCRE_ANCHORED Match only at the first position
+ PCRE_BSR_ANYCRLF \R matches only CR, LF, or CRLF
+ PCRE_BSR_UNICODE \R matches all Unicode line endings
+ PCRE_NEWLINE_ANY Recognize any Unicode newline sequence
+ PCRE_NEWLINE_ANYCRLF Recognize CR, LF, & CRLF as newline sequences
+ PCRE_NEWLINE_CR Recognize CR as the only newline sequence
+ PCRE_NEWLINE_CRLF Recognize CRLF as the only newline sequence
+ PCRE_NEWLINE_LF Recognize LF as the only newline sequence
+ PCRE_NOTBOL Subject string is not the beginning of a line
+ PCRE_NOTEOL Subject string is not the end of a line
+ PCRE_NOTEMPTY An empty string is not a valid match
+ PCRE_NOTEMPTY_ATSTART An empty string at the start of the subject
+ is not a valid match
+ PCRE_NO_START_OPTIMIZE Do not do "start-match" optimizations
+ PCRE_NO_UTF16_CHECK Do not check the subject for UTF-16
+ validity (only relevant if PCRE_UTF16
+ was set at compile time)
+ PCRE_NO_UTF32_CHECK Do not check the subject for UTF-32
+ validity (only relevant if PCRE_UTF32
+ was set at compile time)
+ PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
+ validity (only relevant if PCRE_UTF8
+ was set at compile time)
+ PCRE_PARTIAL ) Return PCRE_ERROR_PARTIAL for a partial
+ PCRE_PARTIAL_SOFT ) match if no full matches are found
+ PCRE_PARTIAL_HARD Return PCRE_ERROR_PARTIAL for a partial match
+ if that is found before a full match
+</pre>
+For details of partial matching, see the
+<a href="pcrepartial.html"><b>pcrepartial</b></a>
+page. A <b>pcre_extra</b> structure contains the following fields:
+<pre>
+ <i>flags</i> Bits indicating which fields are set
+ <i>study_data</i> Opaque data from <b>pcre[16|32]_study()</b>
+ <i>match_limit</i> Limit on internal resource use
+ <i>match_limit_recursion</i> Limit on internal recursion depth
+ <i>callout_data</i> Opaque data passed back to callouts
+ <i>tables</i> Points to character tables or is NULL
+ <i>mark</i> For passing back a *MARK pointer
+ <i>executable_jit</i> Opaque data from JIT compilation
+</pre>
+The flag bits are PCRE_EXTRA_STUDY_DATA, PCRE_EXTRA_MATCH_LIMIT,
+PCRE_EXTRA_MATCH_LIMIT_RECURSION, PCRE_EXTRA_CALLOUT_DATA,
+PCRE_EXTRA_TABLES, PCRE_EXTRA_MARK and PCRE_EXTRA_EXECUTABLE_JIT.
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_free_study.html b/doc/html/pcre_free_study.html
new file mode 100644
index 0000000..7f9e10e
--- /dev/null
+++ b/doc/html/pcre_free_study.html
@@ -0,0 +1,46 @@
+<html>
+<head>
+<title>pcre_free_study specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_free_study man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>void pcre_free_study(pcre_extra *<i>extra</i>);</b>
+</P>
+<P>
+<b>void pcre16_free_study(pcre16_extra *<i>extra</i>);</b>
+</P>
+<P>
+<b>void pcre32_free_study(pcre32_extra *<i>extra</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function is used to free the memory used for the data generated by a call
+to <b>pcre[16|32]_study()</b> when it is no longer needed. The argument must be the
+result of such a call.
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_free_substring.html b/doc/html/pcre_free_substring.html
new file mode 100644
index 0000000..1fe6610
--- /dev/null
+++ b/doc/html/pcre_free_substring.html
@@ -0,0 +1,46 @@
+<html>
+<head>
+<title>pcre_free_substring specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_free_substring man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>void pcre_free_substring(const char *<i>stringptr</i>);</b>
+</P>
+<P>
+<b>void pcre16_free_substring(PCRE_SPTR16 <i>stringptr</i>);</b>
+</P>
+<P>
+<b>void pcre32_free_substring(PCRE_SPTR32 <i>stringptr</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This is a convenience function for freeing the store obtained by a previous
+call to <b>pcre[16|32]_get_substring()</b> or <b>pcre[16|32]_get_named_substring()</b>.
+Its only argument is a pointer to the string.
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_free_substring_list.html b/doc/html/pcre_free_substring_list.html
new file mode 100644
index 0000000..c086178
--- /dev/null
+++ b/doc/html/pcre_free_substring_list.html
@@ -0,0 +1,46 @@
+<html>
+<head>
+<title>pcre_free_substring_list specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_free_substring_list man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>void pcre_free_substring_list(const char **<i>stringptr</i>);</b>
+</P>
+<P>
+<b>void pcre16_free_substring_list(PCRE_SPTR16 *<i>stringptr</i>);</b>
+</P>
+<P>
+<b>void pcre32_free_substring_list(PCRE_SPTR32 *<i>stringptr</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This is a convenience function for freeing the store obtained by a previous
+call to <b>pcre[16|32]_get_substring_list()</b>. Its only argument is a pointer to
+the list of string pointers.
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_fullinfo.html b/doc/html/pcre_fullinfo.html
new file mode 100644
index 0000000..b88fc11
--- /dev/null
+++ b/doc/html/pcre_fullinfo.html
@@ -0,0 +1,108 @@
+<html>
+<head>
+<title>pcre_fullinfo specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_fullinfo man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>int pcre_fullinfo(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
+<b> int <i>what</i>, void *<i>where</i>);</b>
+<br>
+<br>
+<b>int pcre16_fullinfo(const pcre16 *<i>code</i>, const pcre16_extra *<i>extra</i>,</b>
+<b> int <i>what</i>, void *<i>where</i>);</b>
+<br>
+<br>
+<b>int pcre32_fullinfo(const pcre32 *<i>code</i>, const pcre32_extra *<i>extra</i>,</b>
+<b> int <i>what</i>, void *<i>where</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function returns information about a compiled pattern. Its arguments are:
+<pre>
+ <i>code</i> Compiled regular expression
+ <i>extra</i> Result of <b>pcre[16|32]_study()</b> or NULL
+ <i>what</i> What information is required
+ <i>where</i> Where to put the information
+</pre>
+The following information is available:
+<pre>
+ PCRE_INFO_BACKREFMAX Number of highest back reference
+ PCRE_INFO_CAPTURECOUNT Number of capturing subpatterns
+ PCRE_INFO_DEFAULT_TABLES Pointer to default tables
+ PCRE_INFO_FIRSTBYTE Fixed first data unit for a match, or
+ -1 for start of string
+ or after newline, or
+ -2 otherwise
+ PCRE_INFO_FIRSTTABLE Table of first data units (after studying)
+ PCRE_INFO_HASCRORLF Return 1 if explicit CR or LF matches exist
+ PCRE_INFO_JCHANGED Return 1 if (?J) or (?-J) was used
+ PCRE_INFO_JIT Return 1 after successful JIT compilation
+ PCRE_INFO_JITSIZE Size of JIT compiled code
+ PCRE_INFO_LASTLITERAL Literal last data unit required
+ PCRE_INFO_MINLENGTH Lower bound length of matching strings
+ PCRE_INFO_NAMECOUNT Number of named subpatterns
+ PCRE_INFO_NAMEENTRYSIZE Size of name table entry
+ PCRE_INFO_NAMETABLE Pointer to name table
+ PCRE_INFO_OKPARTIAL Return 1 if partial matching can be tried
+ (always returns 1 after release 8.00)
+ PCRE_INFO_OPTIONS Option bits used for compilation
+ PCRE_INFO_SIZE Size of compiled pattern
+ PCRE_INFO_STUDYSIZE Size of study data
+ PCRE_INFO_FIRSTCHARACTER Fixed first data unit for a match
+ PCRE_INFO_FIRSTCHARACTERFLAGS Returns
+ 1 if there is a first data character set, which can
+ then be retrieved using PCRE_INFO_FIRSTCHARACTER,
+ 2 if the first character is at the start of the data
+ string or after a newline, and
+ 0 otherwise
+ PCRE_INFO_REQUIREDCHAR Literal last data unit required
+ PCRE_INFO_REQUIREDCHARFLAGS Returns 1 if the last data character is set (which can then
+ be retrieved using PCRE_INFO_REQUIREDCHAR); 0 otherwise
+</pre>
+The <i>where</i> argument must point to an integer variable, except for the
+following <i>what</i> values:
+<pre>
+ PCRE_INFO_DEFAULT_TABLES const unsigned char *
+ PCRE_INFO_FIRSTTABLE const unsigned char *
+ PCRE_INFO_NAMETABLE PCRE_SPTR16 (16-bit library)
+ PCRE_INFO_NAMETABLE PCRE_SPTR32 (32-bit library)
+ PCRE_INFO_NAMETABLE const unsigned char * (8-bit library)
+ PCRE_INFO_OPTIONS unsigned long int
+ PCRE_INFO_SIZE size_t
+ PCRE_INFO_FIRSTCHARACTER uint32_t
+ PCRE_INFO_REQUIREDCHAR uint32_t
+</pre>
+The yield of the function is zero on success or:
+<pre>
+ PCRE_ERROR_NULL the argument <i>code</i> was NULL
+ the argument <i>where</i> was NULL
+ PCRE_ERROR_BADMAGIC the "magic number" was not found
+ PCRE_ERROR_BADOPTION the value of <i>what</i> was invalid
+</PRE>
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_get_named_substring.html b/doc/html/pcre_get_named_substring.html
new file mode 100644
index 0000000..72924d9
--- /dev/null
+++ b/doc/html/pcre_get_named_substring.html
@@ -0,0 +1,68 @@
+<html>
+<head>
+<title>pcre_get_named_substring specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_get_named_substring man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>int pcre_get_named_substring(const pcre *<i>code</i>,</b>
+<b> const char *<i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, const char *<i>stringname</i>,</b>
+<b> const char **<i>stringptr</i>);</b>
+<br>
+<br>
+<b>int pcre16_get_named_substring(const pcre16 *<i>code</i>,</b>
+<b> PCRE_SPTR16 <i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, PCRE_SPTR16 <i>stringname</i>,</b>
+<b> PCRE_SPTR16 *<i>stringptr</i>);</b>
+<br>
+<br>
+<b>int pcre32_get_named_substring(const pcre32 *<i>code</i>,</b>
+<b> PCRE_SPTR32 <i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, PCRE_SPTR32 <i>stringname</i>,</b>
+<b> PCRE_SPTR32 *<i>stringptr</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This is a convenience function for extracting a captured substring by name. The
+arguments are:
+<pre>
+ <i>code</i> Compiled pattern
+ <i>subject</i> Subject that has been successfully matched
+ <i>ovector</i> Offset vector that <b>pcre[16|32]_exec()</b> used
+ <i>stringcount</i> Value returned by <b>pcre[16|32]_exec()</b>
+ <i>stringname</i> Name of the required substring
+ <i>stringptr</i> Where to put the string pointer
+</pre>
+The memory in which the substring is placed is obtained by calling
+<b>pcre[16|32]_malloc()</b>. The convenience function
+<b>pcre[16|32]_free_substring()</b> can be used to free it when it is no longer
+needed. The yield of the function is the length of the extracted substring,
+PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained, or
+PCRE_ERROR_NOSUBSTRING if the string name is invalid.
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_get_stringnumber.html b/doc/html/pcre_get_stringnumber.html
new file mode 100644
index 0000000..7324d78
--- /dev/null
+++ b/doc/html/pcre_get_stringnumber.html
@@ -0,0 +1,57 @@
+<html>
+<head>
+<title>pcre_get_stringnumber specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_get_stringnumber man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>int pcre_get_stringnumber(const pcre *<i>code</i>,</b>
+<b> const char *<i>name</i>);</b>
+<br>
+<br>
+<b>int pcre16_get_stringnumber(const pcre16 *<i>code</i>,</b>
+<b> PCRE_SPTR16 <i>name</i>);</b>
+<br>
+<br>
+<b>int pcre32_get_stringnumber(const pcre32 *<i>code</i>,</b>
+<b> PCRE_SPTR32 <i>name</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This convenience function finds the number of a named substring capturing
+parenthesis in a compiled pattern. Its arguments are:
+<pre>
+ <i>code</i> Compiled regular expression
+ <i>name</i> Name whose number is required
+</pre>
+The yield of the function is the number of the parenthesis if the name is
+found, or PCRE_ERROR_NOSUBSTRING otherwise. When duplicate names are allowed
+(PCRE_DUPNAMES is set), it is not defined which of the numbers is returned by
+<b>pcre[16|32]_get_stringnumber()</b>. You can obtain the complete list by calling
+<b>pcre[16|32]_get_stringtable_entries()</b>.
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_get_stringtable_entries.html b/doc/html/pcre_get_stringtable_entries.html
new file mode 100644
index 0000000..7990679
--- /dev/null
+++ b/doc/html/pcre_get_stringtable_entries.html
@@ -0,0 +1,60 @@
+<html>
+<head>
+<title>pcre_get_stringtable_entries specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_get_stringtable_entries man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>int pcre_get_stringtable_entries(const pcre *<i>code</i>,</b>
+<b> const char *<i>name</i>, char **<i>first</i>, char **<i>last</i>);</b>
+<br>
+<br>
+<b>int pcre16_get_stringtable_entries(const pcre16 *<i>code</i>,</b>
+<b> PCRE_SPTR16 <i>name</i>, PCRE_UCHAR16 **<i>first</i>, PCRE_UCHAR16 **<i>last</i>);</b>
+<br>
+<br>
+<b>int pcre32_get_stringtable_entries(const pcre32 *<i>code</i>,</b>
+<b> PCRE_SPTR32 <i>name</i>, PCRE_UCHAR32 **<i>first</i>, PCRE_UCHAR32 **<i>last</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This convenience function finds, for a compiled pattern, the first and last
+entries for a given name in the table that translates capturing parenthesis
+names into numbers. When names are required to be unique (PCRE_DUPNAMES is
+<i>not</i> set), it is usually easier to use <b>pcre[16|32]_get_stringnumber()</b>
+instead.
+<pre>
+ <i>code</i> Compiled regular expression
+ <i>name</i> Name whose entries required
+ <i>first</i> Where to return a pointer to the first entry
+ <i>last</i> Where to return a pointer to the last entry
+</pre>
+The yield of the function is the length of each entry, or
+PCRE_ERROR_NOSUBSTRING if none are found.
+</P>
+<P>
+There is a complete description of the PCRE native API, including the format of
+the table entries, in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page, and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_get_substring.html b/doc/html/pcre_get_substring.html
new file mode 100644
index 0000000..1a8e4f5
--- /dev/null
+++ b/doc/html/pcre_get_substring.html
@@ -0,0 +1,64 @@
+<html>
+<head>
+<title>pcre_get_substring specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_get_substring man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>int pcre_get_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, int <i>stringnumber</i>,</b>
+<b> const char **<i>stringptr</i>);</b>
+<br>
+<br>
+<b>int pcre16_get_substring(PCRE_SPTR16 <i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, int <i>stringnumber</i>,</b>
+<b> PCRE_SPTR16 *<i>stringptr</i>);</b>
+<br>
+<br>
+<b>int pcre32_get_substring(PCRE_SPTR32 <i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, int <i>stringnumber</i>,</b>
+<b> PCRE_SPTR32 *<i>stringptr</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This is a convenience function for extracting a captured substring. The
+arguments are:
+<pre>
+ <i>subject</i> Subject that has been successfully matched
+ <i>ovector</i> Offset vector that <b>pcre[16|32]_exec()</b> used
+ <i>stringcount</i> Value returned by <b>pcre[16|32]_exec()</b>
+ <i>stringnumber</i> Number of the required substring
+ <i>stringptr</i> Where to put the string pointer
+</pre>
+The memory in which the substring is placed is obtained by calling
+<b>pcre[16|32]_malloc()</b>. The convenience function
+<b>pcre[16|32]_free_substring()</b> can be used to free it when it is no longer
+needed. The yield of the function is the length of the substring,
+PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained, or
+PCRE_ERROR_NOSUBSTRING if the string number is invalid.
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_get_substring_list.html b/doc/html/pcre_get_substring_list.html
new file mode 100644
index 0000000..7e8c6bc
--- /dev/null
+++ b/doc/html/pcre_get_substring_list.html
@@ -0,0 +1,61 @@
+<html>
+<head>
+<title>pcre_get_substring_list specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_get_substring_list man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>int pcre_get_substring_list(const char *<i>subject</i>,</b>
+<b> int *<i>ovector</i>, int <i>stringcount</i>, const char ***<i>listptr</i>);</b>
+<br>
+<br>
+<b>int pcre16_get_substring_list(PCRE_SPTR16 <i>subject</i>,</b>
+<b> int *<i>ovector</i>, int <i>stringcount</i>, PCRE_SPTR16 **<i>listptr</i>);</b>
+<br>
+<br>
+<b>int pcre32_get_substring_list(PCRE_SPTR32 <i>subject</i>,</b>
+<b> int *<i>ovector</i>, int <i>stringcount</i>, PCRE_SPTR32 **<i>listptr</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This is a convenience function for extracting a list of all the captured
+substrings. The arguments are:
+<pre>
+ <i>subject</i> Subject that has been successfully matched
+ <i>ovector</i> Offset vector that <b>pcre[16|32]_exec</b> used
+ <i>stringcount</i> Value returned by <b>pcre[16|32]_exec</b>
+ <i>listptr</i> Where to put a pointer to the list
+</pre>
+The memory in which the substrings and the list are placed is obtained by
+calling <b>pcre[16|32]_malloc()</b>. The convenience function
+<b>pcre[16|32]_free_substring_list()</b> can be used to free it when it is no
+longer needed. A pointer to a list of pointers is put in the variable whose
+address is in <i>listptr</i>. The list is terminated by a NULL pointer. The
+yield of the function is zero on success or PCRE_ERROR_NOMEMORY if sufficient
+memory could not be obtained.
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_jit_exec.html b/doc/html/pcre_jit_exec.html
new file mode 100644
index 0000000..4ebb0cb
--- /dev/null
+++ b/doc/html/pcre_jit_exec.html
@@ -0,0 +1,108 @@
+<html>
+<head>
+<title>pcre_jit_exec specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_jit_exec man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>int pcre_jit_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
+<b> const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
+<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
+<b> pcre_jit_stack *<i>jstack</i>);</b>
+<br>
+<br>
+<b>int pcre16_jit_exec(const pcre16 *<i>code</i>, const pcre16_extra *<i>extra</i>,</b>
+<b> PCRE_SPTR16 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
+<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
+<b> pcre_jit_stack *<i>jstack</i>);</b>
+<br>
+<br>
+<b>int pcre32_jit_exec(const pcre32 *<i>code</i>, const pcre32_extra *<i>extra</i>,</b>
+<b> PCRE_SPTR32 <i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
+<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
+<b> pcre_jit_stack *<i>jstack</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function matches a compiled regular expression that has been successfully
+studied with one of the JIT options against a given subject string, using a
+matching algorithm that is similar to Perl's. It is a "fast path" interface to
+JIT, and it bypasses some of the sanity checks that <b>pcre_exec()</b> applies.
+It returns offsets to captured substrings. Its arguments are:
+<pre>
+ <i>code</i> Points to the compiled pattern
+ <i>extra</i> Points to an associated <b>pcre[16|32]_extra</b> structure,
+ or is NULL
+ <i>subject</i> Points to the subject string
+ <i>length</i> Length of the subject string, in bytes
+ <i>startoffset</i> Offset in bytes in the subject at which to
+ start matching
+ <i>options</i> Option bits
+ <i>ovector</i> Points to a vector of ints for result offsets
+ <i>ovecsize</i> Number of elements in the vector (a multiple of 3)
+ <i>jstack</i> Pointer to a JIT stack
+</pre>
+The allowed options are:
+<pre>
+ PCRE_NOTBOL Subject string is not the beginning of a line
+ PCRE_NOTEOL Subject string is not the end of a line
+ PCRE_NOTEMPTY An empty string is not a valid match
+ PCRE_NOTEMPTY_ATSTART An empty string at the start of the subject
+ is not a valid match
+ PCRE_NO_UTF16_CHECK Do not check the subject for UTF-16
+ validity (only relevant if PCRE_UTF16
+ was set at compile time)
+ PCRE_NO_UTF32_CHECK Do not check the subject for UTF-32
+ validity (only relevant if PCRE_UTF32
+ was set at compile time)
+ PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
+ validity (only relevant if PCRE_UTF8
+ was set at compile time)
+ PCRE_PARTIAL ) Return PCRE_ERROR_PARTIAL for a partial
+ PCRE_PARTIAL_SOFT ) match if no full matches are found
+ PCRE_PARTIAL_HARD Return PCRE_ERROR_PARTIAL for a partial match
+ if that is found before a full match
+</pre>
+However, the PCRE_NO_UTF[8|16|32]_CHECK options have no effect, as this check
+is never applied. For details of partial matching, see the
+<a href="pcrepartial.html"><b>pcrepartial</b></a>
+page. A <b>pcre_extra</b> structure contains the following fields:
+<pre>
+ <i>flags</i> Bits indicating which fields are set
+ <i>study_data</i> Opaque data from <b>pcre[16|32]_study()</b>
+ <i>match_limit</i> Limit on internal resource use
+ <i>match_limit_recursion</i> Limit on internal recursion depth
+ <i>callout_data</i> Opaque data passed back to callouts
+ <i>tables</i> Points to character tables or is NULL
+ <i>mark</i> For passing back a *MARK pointer
+ <i>executable_jit</i> Opaque data from JIT compilation
+</pre>
+The flag bits are PCRE_EXTRA_STUDY_DATA, PCRE_EXTRA_MATCH_LIMIT,
+PCRE_EXTRA_MATCH_LIMIT_RECURSION, PCRE_EXTRA_CALLOUT_DATA,
+PCRE_EXTRA_TABLES, PCRE_EXTRA_MARK and PCRE_EXTRA_EXECUTABLE_JIT.
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the JIT API in the
+<a href="pcrejit.html"><b>pcrejit</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_jit_stack_alloc.html b/doc/html/pcre_jit_stack_alloc.html
new file mode 100644
index 0000000..23ba450
--- /dev/null
+++ b/doc/html/pcre_jit_stack_alloc.html
@@ -0,0 +1,55 @@
+<html>
+<head>
+<title>pcre_jit_stack_alloc specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_jit_stack_alloc man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>pcre_jit_stack *pcre_jit_stack_alloc(int <i>startsize</i>,</b>
+<b> int <i>maxsize</i>);</b>
+<br>
+<br>
+<b>pcre16_jit_stack *pcre16_jit_stack_alloc(int <i>startsize</i>,</b>
+<b> int <i>maxsize</i>);</b>
+<br>
+<br>
+<b>pcre32_jit_stack *pcre32_jit_stack_alloc(int <i>startsize</i>,</b>
+<b> int <i>maxsize</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function is used to create a stack for use by the code compiled by the JIT
+optimization of <b>pcre[16|32]_study()</b>. The arguments are a starting size for
+the stack, and a maximum size to which it is allowed to grow. The result can be
+passed to the JIT run-time code by <b>pcre[16|32]_assign_jit_stack()</b>, or that
+function can set up a callback for obtaining a stack. A maximum stack size of
+512K to 1M should be more than enough for any pattern. For more details, see
+the
+<a href="pcrejit.html"><b>pcrejit</b></a>
+page.
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_jit_stack_free.html b/doc/html/pcre_jit_stack_free.html
new file mode 100644
index 0000000..8bd06e4
--- /dev/null
+++ b/doc/html/pcre_jit_stack_free.html
@@ -0,0 +1,48 @@
+<html>
+<head>
+<title>pcre_jit_stack_free specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_jit_stack_free man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>void pcre_jit_stack_free(pcre_jit_stack *<i>stack</i>);</b>
+</P>
+<P>
+<b>void pcre16_jit_stack_free(pcre16_jit_stack *<i>stack</i>);</b>
+</P>
+<P>
+<b>void pcre32_jit_stack_free(pcre32_jit_stack *<i>stack</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function is used to free a JIT stack that was created by
+<b>pcre[16|32]_jit_stack_alloc()</b> when it is no longer needed. For more details,
+see the
+<a href="pcrejit.html"><b>pcrejit</b></a>
+page.
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_maketables.html b/doc/html/pcre_maketables.html
new file mode 100644
index 0000000..3a7b5eb
--- /dev/null
+++ b/doc/html/pcre_maketables.html
@@ -0,0 +1,48 @@
+<html>
+<head>
+<title>pcre_maketables specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_maketables man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>const unsigned char *pcre_maketables(void);</b>
+</P>
+<P>
+<b>const unsigned char *pcre16_maketables(void);</b>
+</P>
+<P>
+<b>const unsigned char *pcre32_maketables(void);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function builds a set of character tables for character values less than
+256. These can be passed to <b>pcre[16|32]_compile()</b> to override PCRE's
+internal, built-in tables (which were made by <b>pcre[16|32]_maketables()</b> when
+PCRE was compiled). You might want to do this if you are using a non-standard
+locale. The function yields a pointer to the tables.
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_pattern_to_host_byte_order.html b/doc/html/pcre_pattern_to_host_byte_order.html
new file mode 100644
index 0000000..1b1c803
--- /dev/null
+++ b/doc/html/pcre_pattern_to_host_byte_order.html
@@ -0,0 +1,58 @@
+<html>
+<head>
+<title>pcre_pattern_to_host_byte_order specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_pattern_to_host_byte_order man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>int pcre_pattern_to_host_byte_order(pcre *<i>code</i>,</b>
+<b> pcre_extra *<i>extra</i>, const unsigned char *<i>tables</i>);</b>
+<br>
+<br>
+<b>int pcre16_pattern_to_host_byte_order(pcre16 *<i>code</i>,</b>
+<b> pcre16_extra *<i>extra</i>, const unsigned char *<i>tables</i>);</b>
+<br>
+<br>
+<b>int pcre32_pattern_to_host_byte_order(pcre32 *<i>code</i>,</b>
+<b> pcre32_extra *<i>extra</i>, const unsigned char *<i>tables</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function ensures that the bytes in 2-byte and 4-byte values in a compiled
+pattern are in the correct order for the current host. It is useful when a
+pattern that has been compiled on one host is transferred to another that might
+have different endianness. The arguments are:
+<pre>
+ <i>code</i> A compiled regular expression
+ <i>extra</i> Points to an associated <b>pcre[16|32]_extra</b> structure,
+ or is NULL
+ <i>tables</i> Pointer to character tables, or NULL to
+ set the built-in default
+</pre>
+The result is 0 for success, a negative PCRE_ERROR_xxx value otherwise.
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_refcount.html b/doc/html/pcre_refcount.html
new file mode 100644
index 0000000..bfb92e6
--- /dev/null
+++ b/doc/html/pcre_refcount.html
@@ -0,0 +1,51 @@
+<html>
+<head>
+<title>pcre_refcount specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_refcount man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>int pcre_refcount(pcre *<i>code</i>, int <i>adjust</i>);</b>
+</P>
+<P>
+<b>int pcre16_refcount(pcre16 *<i>code</i>, int <i>adjust</i>);</b>
+</P>
+<P>
+<b>int pcre32_refcount(pcre32 *<i>code</i>, int <i>adjust</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function is used to maintain a reference count inside a data block that
+contains a compiled pattern. Its arguments are:
+<pre>
+ <i>code</i> Compiled regular expression
+ <i>adjust</i> Adjustment to reference value
+</pre>
+The yield of the function is the adjusted reference value, which is constrained
+to lie between 0 and 65535.
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_study.html b/doc/html/pcre_study.html
new file mode 100644
index 0000000..af82f11
--- /dev/null
+++ b/doc/html/pcre_study.html
@@ -0,0 +1,68 @@
+<html>
+<head>
+<title>pcre_study specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_study man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>pcre_extra *pcre_study(const pcre *<i>code</i>, int <i>options</i>,</b>
+<b> const char **<i>errptr</i>);</b>
+<br>
+<br>
+<b>pcre16_extra *pcre16_study(const pcre16 *<i>code</i>, int <i>options</i>,</b>
+<b> const char **<i>errptr</i>);</b>
+<br>
+<br>
+<b>pcre32_extra *pcre32_study(const pcre32 *<i>code</i>, int <i>options</i>,</b>
+<b> const char **<i>errptr</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function studies a compiled pattern, to see if additional information can
+be extracted that might speed up matching. Its arguments are:
+<pre>
+ <i>code</i> A compiled regular expression
+ <i>options</i> Options for <b>pcre[16|32]_study()</b>
+ <i>errptr</i> Where to put an error message
+</pre>
+If the function succeeds, it returns a value that can be passed to
+<b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> via their <i>extra</i>
+arguments.
+</P>
+<P>
+If the function returns NULL, either it could not find any additional
+information, or there was an error. You can tell the difference by looking at
+the error value. It is NULL in first case.
+</P>
+<P>
+The only option is PCRE_STUDY_JIT_COMPILE. It requests just-in-time compilation
+if possible. If PCRE has been compiled without JIT support, this option is
+ignored. See the
+<a href="pcrejit.html"><b>pcrejit</b></a>
+page for further details.
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_utf16_to_host_byte_order.html b/doc/html/pcre_utf16_to_host_byte_order.html
new file mode 100644
index 0000000..18e7788
--- /dev/null
+++ b/doc/html/pcre_utf16_to_host_byte_order.html
@@ -0,0 +1,57 @@
+<html>
+<head>
+<title>pcre_utf16_to_host_byte_order specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_utf16_to_host_byte_order man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>int pcre16_utf16_to_host_byte_order(PCRE_UCHAR16 *<i>output</i>,</b>
+<b> PCRE_SPTR16 <i>input</i>, int <i>length</i>, int *<i>host_byte_order</i>,</b>
+<b> int <i>keep_boms</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function, which exists only in the 16-bit library, converts a UTF-16
+string to the correct order for the current host, taking account of any byte
+order marks (BOMs) within the string. Its arguments are:
+<pre>
+ <i>output</i> pointer to output buffer, may be the same as <i>input</i>
+ <i>input</i> pointer to input buffer
+ <i>length</i> number of 16-bit units in the input, or negative for
+ a zero-terminated string
+ <i>host_byte_order</i> a NULL value or a non-zero value pointed to means
+ start in host byte order
+ <i>keep_boms</i> if non-zero, BOMs are copied to the output string
+</pre>
+The result of the function is the number of 16-bit units placed into the output
+buffer, including the zero terminator if the string was zero-terminated.
+</P>
+<P>
+If <i>host_byte_order</i> is not NULL, it is set to indicate the byte order that
+is current at the end of the string.
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_utf32_to_host_byte_order.html b/doc/html/pcre_utf32_to_host_byte_order.html
new file mode 100644
index 0000000..772ae40
--- /dev/null
+++ b/doc/html/pcre_utf32_to_host_byte_order.html
@@ -0,0 +1,57 @@
+<html>
+<head>
+<title>pcre_utf32_to_host_byte_order specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_utf32_to_host_byte_order man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>int pcre32_utf32_to_host_byte_order(PCRE_UCHAR32 *<i>output</i>,</b>
+<b> PCRE_SPTR32 <i>input</i>, int <i>length</i>, int *<i>host_byte_order</i>,</b>
+<b> int <i>keep_boms</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function, which exists only in the 32-bit library, converts a UTF-32
+string to the correct order for the current host, taking account of any byte
+order marks (BOMs) within the string. Its arguments are:
+<pre>
+ <i>output</i> pointer to output buffer, may be the same as <i>input</i>
+ <i>input</i> pointer to input buffer
+ <i>length</i> number of 32-bit units in the input, or negative for
+ a zero-terminated string
+ <i>host_byte_order</i> a NULL value or a non-zero value pointed to means
+ start in host byte order
+ <i>keep_boms</i> if non-zero, BOMs are copied to the output string
+</pre>
+The result of the function is the number of 32-bit units placed into the output
+buffer, including the zero terminator if the string was zero-terminated.
+</P>
+<P>
+If <i>host_byte_order</i> is not NULL, it is set to indicate the byte order that
+is current at the end of the string.
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcre_version.html b/doc/html/pcre_version.html
new file mode 100644
index 0000000..d33e718
--- /dev/null
+++ b/doc/html/pcre_version.html
@@ -0,0 +1,46 @@
+<html>
+<head>
+<title>pcre_version specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre_version man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>const char *pcre_version(void);</b>
+</P>
+<P>
+<b>const char *pcre16_version(void);</b>
+</P>
+<P>
+<b>const char *pcre32_version(void);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function (even in the 16-bit and 32-bit libraries) returns a
+zero-terminated, 8-bit character string that gives the version number of the
+PCRE library and the date of its release.
+</P>
+<P>
+There is a complete description of the PCRE native API in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page and a description of the POSIX API in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcreapi.html b/doc/html/pcreapi.html
new file mode 100644
index 0000000..b401ecc
--- /dev/null
+++ b/doc/html/pcreapi.html
@@ -0,0 +1,2922 @@
+<html>
+<head>
+<title>pcreapi specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcreapi man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<ul>
+<li><a name="TOC1" href="#SEC1">PCRE NATIVE API BASIC FUNCTIONS</a>
+<li><a name="TOC2" href="#SEC2">PCRE NATIVE API STRING EXTRACTION FUNCTIONS</a>
+<li><a name="TOC3" href="#SEC3">PCRE NATIVE API AUXILIARY FUNCTIONS</a>
+<li><a name="TOC4" href="#SEC4">PCRE NATIVE API INDIRECTED FUNCTIONS</a>
+<li><a name="TOC5" href="#SEC5">PCRE 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a>
+<li><a name="TOC6" href="#SEC6">PCRE API OVERVIEW</a>
+<li><a name="TOC7" href="#SEC7">NEWLINES</a>
+<li><a name="TOC8" href="#SEC8">MULTITHREADING</a>
+<li><a name="TOC9" href="#SEC9">SAVING PRECOMPILED PATTERNS FOR LATER USE</a>
+<li><a name="TOC10" href="#SEC10">CHECKING BUILD-TIME OPTIONS</a>
+<li><a name="TOC11" href="#SEC11">COMPILING A PATTERN</a>
+<li><a name="TOC12" href="#SEC12">COMPILATION ERROR CODES</a>
+<li><a name="TOC13" href="#SEC13">STUDYING A PATTERN</a>
+<li><a name="TOC14" href="#SEC14">LOCALE SUPPORT</a>
+<li><a name="TOC15" href="#SEC15">INFORMATION ABOUT A PATTERN</a>
+<li><a name="TOC16" href="#SEC16">REFERENCE COUNTS</a>
+<li><a name="TOC17" href="#SEC17">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a>
+<li><a name="TOC18" href="#SEC18">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a>
+<li><a name="TOC19" href="#SEC19">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>
+<li><a name="TOC20" href="#SEC20">DUPLICATE SUBPATTERN NAMES</a>
+<li><a name="TOC21" href="#SEC21">FINDING ALL POSSIBLE MATCHES</a>
+<li><a name="TOC22" href="#SEC22">OBTAINING AN ESTIMATE OF STACK USAGE</a>
+<li><a name="TOC23" href="#SEC23">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>
+<li><a name="TOC24" href="#SEC24">SEE ALSO</a>
+<li><a name="TOC25" href="#SEC25">AUTHOR</a>
+<li><a name="TOC26" href="#SEC26">REVISION</a>
+</ul>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<br><a name="SEC1" href="#TOC1">PCRE NATIVE API BASIC FUNCTIONS</a><br>
+<P>
+<b>pcre *pcre_compile(const char *<i>pattern</i>, int <i>options</i>,</b>
+<b> const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
+<b> const unsigned char *<i>tableptr</i>);</b>
+<br>
+<br>
+<b>pcre *pcre_compile2(const char *<i>pattern</i>, int <i>options</i>,</b>
+<b> int *<i>errorcodeptr</i>,</b>
+<b> const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
+<b> const unsigned char *<i>tableptr</i>);</b>
+<br>
+<br>
+<b>pcre_extra *pcre_study(const pcre *<i>code</i>, int <i>options</i>,</b>
+<b> const char **<i>errptr</i>);</b>
+<br>
+<br>
+<b>void pcre_free_study(pcre_extra *<i>extra</i>);</b>
+<br>
+<br>
+<b>int pcre_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
+<b> const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
+<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>);</b>
+<br>
+<br>
+<b>int pcre_dfa_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
+<b> const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
+<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
+<b> int *<i>workspace</i>, int <i>wscount</i>);</b>
+</P>
+<br><a name="SEC2" href="#TOC1">PCRE NATIVE API STRING EXTRACTION FUNCTIONS</a><br>
+<P>
+<b>int pcre_copy_named_substring(const pcre *<i>code</i>,</b>
+<b> const char *<i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, const char *<i>stringname</i>,</b>
+<b> char *<i>buffer</i>, int <i>buffersize</i>);</b>
+<br>
+<br>
+<b>int pcre_copy_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, int <i>stringnumber</i>, char *<i>buffer</i>,</b>
+<b> int <i>buffersize</i>);</b>
+<br>
+<br>
+<b>int pcre_get_named_substring(const pcre *<i>code</i>,</b>
+<b> const char *<i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, const char *<i>stringname</i>,</b>
+<b> const char **<i>stringptr</i>);</b>
+<br>
+<br>
+<b>int pcre_get_stringnumber(const pcre *<i>code</i>,</b>
+<b> const char *<i>name</i>);</b>
+<br>
+<br>
+<b>int pcre_get_stringtable_entries(const pcre *<i>code</i>,</b>
+<b> const char *<i>name</i>, char **<i>first</i>, char **<i>last</i>);</b>
+<br>
+<br>
+<b>int pcre_get_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, int <i>stringnumber</i>,</b>
+<b> const char **<i>stringptr</i>);</b>
+<br>
+<br>
+<b>int pcre_get_substring_list(const char *<i>subject</i>,</b>
+<b> int *<i>ovector</i>, int <i>stringcount</i>, const char ***<i>listptr</i>);</b>
+<br>
+<br>
+<b>void pcre_free_substring(const char *<i>stringptr</i>);</b>
+<br>
+<br>
+<b>void pcre_free_substring_list(const char **<i>stringptr</i>);</b>
+</P>
+<br><a name="SEC3" href="#TOC1">PCRE NATIVE API AUXILIARY FUNCTIONS</a><br>
+<P>
+<b>int pcre_jit_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
+<b> const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
+<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
+<b> pcre_jit_stack *<i>jstack</i>);</b>
+<br>
+<br>
+<b>pcre_jit_stack *pcre_jit_stack_alloc(int <i>startsize</i>, int <i>maxsize</i>);</b>
+<br>
+<br>
+<b>void pcre_jit_stack_free(pcre_jit_stack *<i>stack</i>);</b>
+<br>
+<br>
+<b>void pcre_assign_jit_stack(pcre_extra *<i>extra</i>,</b>
+<b> pcre_jit_callback <i>callback</i>, void *<i>data</i>);</b>
+<br>
+<br>
+<b>const unsigned char *pcre_maketables(void);</b>
+<br>
+<br>
+<b>int pcre_fullinfo(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
+<b> int <i>what</i>, void *<i>where</i>);</b>
+<br>
+<br>
+<b>int pcre_refcount(pcre *<i>code</i>, int <i>adjust</i>);</b>
+<br>
+<br>
+<b>int pcre_config(int <i>what</i>, void *<i>where</i>);</b>
+<br>
+<br>
+<b>const char *pcre_version(void);</b>
+<br>
+<br>
+<b>int pcre_pattern_to_host_byte_order(pcre *<i>code</i>,</b>
+<b> pcre_extra *<i>extra</i>, const unsigned char *<i>tables</i>);</b>
+</P>
+<br><a name="SEC4" href="#TOC1">PCRE NATIVE API INDIRECTED FUNCTIONS</a><br>
+<P>
+<b>void *(*pcre_malloc)(size_t);</b>
+<br>
+<br>
+<b>void (*pcre_free)(void *);</b>
+<br>
+<br>
+<b>void *(*pcre_stack_malloc)(size_t);</b>
+<br>
+<br>
+<b>void (*pcre_stack_free)(void *);</b>
+<br>
+<br>
+<b>int (*pcre_callout)(pcre_callout_block *);</b>
+<br>
+<br>
+<b>int (*pcre_stack_guard)(void);</b>
+</P>
+<br><a name="SEC5" href="#TOC1">PCRE 8-BIT, 16-BIT, AND 32-BIT LIBRARIES</a><br>
+<P>
+As well as support for 8-bit character strings, PCRE also supports 16-bit
+strings (from release 8.30) and 32-bit strings (from release 8.32), by means of
+two additional libraries. They can be built as well as, or instead of, the
+8-bit library. To avoid too much complication, this document describes the
+8-bit versions of the functions, with only occasional references to the 16-bit
+and 32-bit libraries.
+</P>
+<P>
+The 16-bit and 32-bit functions operate in the same way as their 8-bit
+counterparts; they just use different data types for their arguments and
+results, and their names start with <b>pcre16_</b> or <b>pcre32_</b> instead of
+<b>pcre_</b>. For every option that has UTF8 in its name (for example,
+PCRE_UTF8), there are corresponding 16-bit and 32-bit names with UTF8 replaced
+by UTF16 or UTF32, respectively. This facility is in fact just cosmetic; the
+16-bit and 32-bit option names define the same bit values.
+</P>
+<P>
+References to bytes and UTF-8 in this document should be read as references to
+16-bit data units and UTF-16 when using the 16-bit library, or 32-bit data
+units and UTF-32 when using the 32-bit library, unless specified otherwise.
+More details of the specific differences for the 16-bit and 32-bit libraries
+are given in the
+<a href="pcre16.html"><b>pcre16</b></a>
+and
+<a href="pcre32.html"><b>pcre32</b></a>
+pages.
+</P>
+<br><a name="SEC6" href="#TOC1">PCRE API OVERVIEW</a><br>
+<P>
+PCRE has its own native API, which is described in this document. There are
+also some wrapper functions (for the 8-bit library only) that correspond to the
+POSIX regular expression API, but they do not give access to all the
+functionality. They are described in the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+documentation. Both of these APIs define a set of C function calls. A C++
+wrapper (again for the 8-bit library only) is also distributed with PCRE. It is
+documented in the
+<a href="pcrecpp.html"><b>pcrecpp</b></a>
+page.
+</P>
+<P>
+The native API C function prototypes are defined in the header file
+<b>pcre.h</b>, and on Unix-like systems the (8-bit) library itself is called
+<b>libpcre</b>. It can normally be accessed by adding <b>-lpcre</b> to the
+command for linking an application that uses PCRE. The header file defines the
+macros PCRE_MAJOR and PCRE_MINOR to contain the major and minor release numbers
+for the library. Applications can use these to include support for different
+releases of PCRE.
+</P>
+<P>
+In a Windows environment, if you want to statically link an application program
+against a non-dll <b>pcre.a</b> file, you must define PCRE_STATIC before
+including <b>pcre.h</b> or <b>pcrecpp.h</b>, because otherwise the
+<b>pcre_malloc()</b> and <b>pcre_free()</b> exported functions will be declared
+<b>__declspec(dllimport)</b>, with unwanted results.
+</P>
+<P>
+The functions <b>pcre_compile()</b>, <b>pcre_compile2()</b>, <b>pcre_study()</b>,
+and <b>pcre_exec()</b> are used for compiling and matching regular expressions
+in a Perl-compatible manner. A sample program that demonstrates the simplest
+way of using them is provided in the file called <i>pcredemo.c</i> in the PCRE
+source distribution. A listing of this program is given in the
+<a href="pcredemo.html"><b>pcredemo</b></a>
+documentation, and the
+<a href="pcresample.html"><b>pcresample</b></a>
+documentation describes how to compile and run it.
+</P>
+<P>
+Just-in-time compiler support is an optional feature of PCRE that can be built
+in appropriate hardware environments. It greatly speeds up the matching
+performance of many patterns. Simple programs can easily request that it be
+used if available, by setting an option that is ignored when it is not
+relevant. More complicated programs might need to make use of the functions
+<b>pcre_jit_stack_alloc()</b>, <b>pcre_jit_stack_free()</b>, and
+<b>pcre_assign_jit_stack()</b> in order to control the JIT code's memory usage.
+</P>
+<P>
+From release 8.32 there is also a direct interface for JIT execution, which
+gives improved performance. The JIT-specific functions are discussed in the
+<a href="pcrejit.html"><b>pcrejit</b></a>
+documentation.
+</P>
+<P>
+A second matching function, <b>pcre_dfa_exec()</b>, which is not
+Perl-compatible, is also provided. This uses a different algorithm for the
+matching. The alternative algorithm finds all possible matches (at a given
+point in the subject), and scans the subject just once (unless there are
+lookbehind assertions). However, this algorithm does not return captured
+substrings. A description of the two matching algorithms and their advantages
+and disadvantages is given in the
+<a href="pcrematching.html"><b>pcrematching</b></a>
+documentation.
+</P>
+<P>
+In addition to the main compiling and matching functions, there are convenience
+functions for extracting captured substrings from a subject string that is
+matched by <b>pcre_exec()</b>. They are:
+<pre>
+ <b>pcre_copy_substring()</b>
+ <b>pcre_copy_named_substring()</b>
+ <b>pcre_get_substring()</b>
+ <b>pcre_get_named_substring()</b>
+ <b>pcre_get_substring_list()</b>
+ <b>pcre_get_stringnumber()</b>
+ <b>pcre_get_stringtable_entries()</b>
+</pre>
+<b>pcre_free_substring()</b> and <b>pcre_free_substring_list()</b> are also
+provided, to free the memory used for extracted strings.
+</P>
+<P>
+The function <b>pcre_maketables()</b> is used to build a set of character tables
+in the current locale for passing to <b>pcre_compile()</b>, <b>pcre_exec()</b>,
+or <b>pcre_dfa_exec()</b>. This is an optional facility that is provided for
+specialist use. Most commonly, no special tables are passed, in which case
+internal tables that are generated when PCRE is built are used.
+</P>
+<P>
+The function <b>pcre_fullinfo()</b> is used to find out information about a
+compiled pattern. The function <b>pcre_version()</b> returns a pointer to a
+string containing the version of PCRE and its date of release.
+</P>
+<P>
+The function <b>pcre_refcount()</b> maintains a reference count in a data block
+containing a compiled pattern. This is provided for the benefit of
+object-oriented applications.
+</P>
+<P>
+The global variables <b>pcre_malloc</b> and <b>pcre_free</b> initially contain
+the entry points of the standard <b>malloc()</b> and <b>free()</b> functions,
+respectively. PCRE calls the memory management functions via these variables,
+so a calling program can replace them if it wishes to intercept the calls. This
+should be done before calling any PCRE functions.
+</P>
+<P>
+The global variables <b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> are also
+indirections to memory management functions. These special functions are used
+only when PCRE is compiled to use the heap for remembering data, instead of
+recursive function calls, when running the <b>pcre_exec()</b> function. See the
+<a href="pcrebuild.html"><b>pcrebuild</b></a>
+documentation for details of how to do this. It is a non-standard way of
+building PCRE, for use in environments that have limited stacks. Because of the
+greater use of memory management, it runs more slowly. Separate functions are
+provided so that special-purpose external code can be used for this case. When
+used, these functions are always called in a stack-like manner (last obtained,
+first freed), and always for memory blocks of the same size. There is a
+discussion about PCRE's stack usage in the
+<a href="pcrestack.html"><b>pcrestack</b></a>
+documentation.
+</P>
+<P>
+The global variable <b>pcre_callout</b> initially contains NULL. It can be set
+by the caller to a "callout" function, which PCRE will then call at specified
+points during a matching operation. Details are given in the
+<a href="pcrecallout.html"><b>pcrecallout</b></a>
+documentation.
+</P>
+<P>
+The global variable <b>pcre_stack_guard</b> initially contains NULL. It can be
+set by the caller to a function that is called by PCRE whenever it starts
+to compile a parenthesized part of a pattern. When parentheses are nested, PCRE
+uses recursive function calls, which use up the system stack. This function is
+provided so that applications with restricted stacks can force a compilation
+error if the stack runs out. The function should return zero if all is well, or
+non-zero to force an error.
+<a name="newlines"></a></P>
+<br><a name="SEC7" href="#TOC1">NEWLINES</a><br>
+<P>
+PCRE supports five different conventions for indicating line breaks in
+strings: a single CR (carriage return) character, a single LF (linefeed)
+character, the two-character sequence CRLF, any of the three preceding, or any
+Unicode newline sequence. The Unicode newline sequences are the three just
+mentioned, plus the single characters VT (vertical tab, U+000B), FF (form feed,
+U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
+(paragraph separator, U+2029).
+</P>
+<P>
+Each of the first three conventions is used by at least one operating system as
+its standard newline sequence. When PCRE is built, a default can be specified.
+The default default is LF, which is the Unix standard. When PCRE is run, the
+default can be overridden, either when a pattern is compiled, or when it is
+matched.
+</P>
+<P>
+At compile time, the newline convention can be specified by the <i>options</i>
+argument of <b>pcre_compile()</b>, or it can be specified by special text at the
+start of the pattern itself; this overrides any other settings. See the
+<a href="pcrepattern.html"><b>pcrepattern</b></a>
+page for details of the special character sequences.
+</P>
+<P>
+In the PCRE documentation the word "newline" is used to mean "the character or
+pair of characters that indicate a line break". The choice of newline
+convention affects the handling of the dot, circumflex, and dollar
+metacharacters, the handling of #-comments in /x mode, and, when CRLF is a
+recognized line ending sequence, the match position advancement for a
+non-anchored pattern. There is more detail about this in the
+<a href="#execoptions">section on <b>pcre_exec()</b> options</a>
+below.
+</P>
+<P>
+The choice of newline convention does not affect the interpretation of
+the \n or \r escape sequences, nor does it affect what \R matches, which is
+controlled in a similar way, but by separate options.
+</P>
+<br><a name="SEC8" href="#TOC1">MULTITHREADING</a><br>
+<P>
+The PCRE functions can be used in multi-threading applications, with the
+proviso that the memory management functions pointed to by <b>pcre_malloc</b>,
+<b>pcre_free</b>, <b>pcre_stack_malloc</b>, and <b>pcre_stack_free</b>, and the
+callout and stack-checking functions pointed to by <b>pcre_callout</b> and
+<b>pcre_stack_guard</b>, are shared by all threads.
+</P>
+<P>
+The compiled form of a regular expression is not altered during matching, so
+the same compiled pattern can safely be used by several threads at once.
+</P>
+<P>
+If the just-in-time optimization feature is being used, it needs separate
+memory stack areas for each thread. See the
+<a href="pcrejit.html"><b>pcrejit</b></a>
+documentation for more details.
+</P>
+<br><a name="SEC9" href="#TOC1">SAVING PRECOMPILED PATTERNS FOR LATER USE</a><br>
+<P>
+The compiled form of a regular expression can be saved and re-used at a later
+time, possibly by a different program, and even on a host other than the one on
+which it was compiled. Details are given in the
+<a href="pcreprecompile.html"><b>pcreprecompile</b></a>
+documentation, which includes a description of the
+<b>pcre_pattern_to_host_byte_order()</b> function. However, compiling a regular
+expression with one version of PCRE for use with a different version is not
+guaranteed to work and may cause crashes.
+</P>
+<br><a name="SEC10" href="#TOC1">CHECKING BUILD-TIME OPTIONS</a><br>
+<P>
+<b>int pcre_config(int <i>what</i>, void *<i>where</i>);</b>
+</P>
+<P>
+The function <b>pcre_config()</b> makes it possible for a PCRE client to
+discover which optional features have been compiled into the PCRE library. The
+<a href="pcrebuild.html"><b>pcrebuild</b></a>
+documentation has more details about these optional features.
+</P>
+<P>
+The first argument for <b>pcre_config()</b> is an integer, specifying which
+information is required; the second argument is a pointer to a variable into
+which the information is placed. The returned value is zero on success, or the
+negative error code PCRE_ERROR_BADOPTION if the value in the first argument is
+not recognized. The following information is available:
+<pre>
+ PCRE_CONFIG_UTF8
+</pre>
+The output is an integer that is set to one if UTF-8 support is available;
+otherwise it is set to zero. This value should normally be given to the 8-bit
+version of this function, <b>pcre_config()</b>. If it is given to the 16-bit
+or 32-bit version of this function, the result is PCRE_ERROR_BADOPTION.
+<pre>
+ PCRE_CONFIG_UTF16
+</pre>
+The output is an integer that is set to one if UTF-16 support is available;
+otherwise it is set to zero. This value should normally be given to the 16-bit
+version of this function, <b>pcre16_config()</b>. If it is given to the 8-bit
+or 32-bit version of this function, the result is PCRE_ERROR_BADOPTION.
+<pre>
+ PCRE_CONFIG_UTF32
+</pre>
+The output is an integer that is set to one if UTF-32 support is available;
+otherwise it is set to zero. This value should normally be given to the 32-bit
+version of this function, <b>pcre32_config()</b>. If it is given to the 8-bit
+or 16-bit version of this function, the result is PCRE_ERROR_BADOPTION.
+<pre>
+ PCRE_CONFIG_UNICODE_PROPERTIES
+</pre>
+The output is an integer that is set to one if support for Unicode character
+properties is available; otherwise it is set to zero.
+<pre>
+ PCRE_CONFIG_JIT
+</pre>
+The output is an integer that is set to one if support for just-in-time
+compiling is available; otherwise it is set to zero.
+<pre>
+ PCRE_CONFIG_JITTARGET
+</pre>
+The output is a pointer to a zero-terminated "const char *" string. If JIT
+support is available, the string contains the name of the architecture for
+which the JIT compiler is configured, for example "x86 32bit (little endian +
+unaligned)". If JIT support is not available, the result is NULL.
+<pre>
+ PCRE_CONFIG_NEWLINE
+</pre>
+The output is an integer whose value specifies the default character sequence
+that is recognized as meaning "newline". The values that are supported in
+ASCII/Unicode environments are: 10 for LF, 13 for CR, 3338 for CRLF, -2 for
+ANYCRLF, and -1 for ANY. In EBCDIC environments, CR, ANYCRLF, and ANY yield the
+same values. However, the value for LF is normally 21, though some EBCDIC
+environments use 37. The corresponding values for CRLF are 3349 and 3365. The
+default should normally correspond to the standard sequence for your operating
+system.
+<pre>
+ PCRE_CONFIG_BSR
+</pre>
+The output is an integer whose value indicates what character sequences the \R
+escape sequence matches by default. A value of 0 means that \R matches any
+Unicode line ending sequence; a value of 1 means that \R matches only CR, LF,
+or CRLF. The default can be overridden when a pattern is compiled or matched.
+<pre>
+ PCRE_CONFIG_LINK_SIZE
+</pre>
+The output is an integer that contains the number of bytes used for internal
+linkage in compiled regular expressions. For the 8-bit library, the value can
+be 2, 3, or 4. For the 16-bit library, the value is either 2 or 4 and is still
+a number of bytes. For the 32-bit library, the value is either 2 or 4 and is
+still a number of bytes. The default value of 2 is sufficient for all but the
+most massive patterns, since it allows the compiled pattern to be up to 64K in
+size. Larger values allow larger regular expressions to be compiled, at the
+expense of slower matching.
+<pre>
+ PCRE_CONFIG_POSIX_MALLOC_THRESHOLD
+</pre>
+The output is an integer that contains the threshold above which the POSIX
+interface uses <b>malloc()</b> for output vectors. Further details are given in
+the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+documentation.
+<pre>
+ PCRE_CONFIG_PARENS_LIMIT
+</pre>
+The output is a long integer that gives the maximum depth of nesting of
+parentheses (of any kind) in a pattern. This limit is imposed to cap the amount
+of system stack used when a pattern is compiled. It is specified when PCRE is
+built; the default is 250. This limit does not take into account the stack that
+may already be used by the calling application. For finer control over
+compilation stack usage, you can set a pointer to an external checking function
+in <b>pcre_stack_guard</b>.
+<pre>
+ PCRE_CONFIG_MATCH_LIMIT
+</pre>
+The output is a long integer that gives the default limit for the number of
+internal matching function calls in a <b>pcre_exec()</b> execution. Further
+details are given with <b>pcre_exec()</b> below.
+<pre>
+ PCRE_CONFIG_MATCH_LIMIT_RECURSION
+</pre>
+The output is a long integer that gives the default limit for the depth of
+recursion when calling the internal matching function in a <b>pcre_exec()</b>
+execution. Further details are given with <b>pcre_exec()</b> below.
+<pre>
+ PCRE_CONFIG_STACKRECURSE
+</pre>
+The output is an integer that is set to one if internal recursion when running
+<b>pcre_exec()</b> is implemented by recursive function calls that use the stack
+to remember their state. This is the usual way that PCRE is compiled. The
+output is zero if PCRE was compiled to use blocks of data on the heap instead
+of recursive function calls. In this case, <b>pcre_stack_malloc</b> and
+<b>pcre_stack_free</b> are called to manage memory blocks on the heap, thus
+avoiding the use of the stack.
+</P>
+<br><a name="SEC11" href="#TOC1">COMPILING A PATTERN</a><br>
+<P>
+<b>pcre *pcre_compile(const char *<i>pattern</i>, int <i>options</i>,</b>
+<b> const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
+<b> const unsigned char *<i>tableptr</i>);</b>
+<br>
+<br>
+<b>pcre *pcre_compile2(const char *<i>pattern</i>, int <i>options</i>,</b>
+<b> int *<i>errorcodeptr</i>,</b>
+<b> const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
+<b> const unsigned char *<i>tableptr</i>);</b>
+</P>
+<P>
+Either of the functions <b>pcre_compile()</b> or <b>pcre_compile2()</b> can be
+called to compile a pattern into an internal form. The only difference between
+the two interfaces is that <b>pcre_compile2()</b> has an additional argument,
+<i>errorcodeptr</i>, via which a numerical error code can be returned. To avoid
+too much repetition, we refer just to <b>pcre_compile()</b> below, but the
+information applies equally to <b>pcre_compile2()</b>.
+</P>
+<P>
+The pattern is a C string terminated by a binary zero, and is passed in the
+<i>pattern</i> argument. A pointer to a single block of memory that is obtained
+via <b>pcre_malloc</b> is returned. This contains the compiled code and related
+data. The <b>pcre</b> type is defined for the returned block; this is a typedef
+for a structure whose contents are not externally defined. It is up to the
+caller to free the memory (via <b>pcre_free</b>) when it is no longer required.
+</P>
+<P>
+Although the compiled code of a PCRE regex is relocatable, that is, it does not
+depend on memory location, the complete <b>pcre</b> data block is not
+fully relocatable, because it may contain a copy of the <i>tableptr</i>
+argument, which is an address (see below).
+</P>
+<P>
+The <i>options</i> argument contains various bit settings that affect the
+compilation. It should be zero if no options are required. The available
+options are described below. Some of them (in particular, those that are
+compatible with Perl, but some others as well) can also be set and unset from
+within the pattern (see the detailed description in the
+<a href="pcrepattern.html"><b>pcrepattern</b></a>
+documentation). For those options that can be different in different parts of
+the pattern, the contents of the <i>options</i> argument specifies their
+settings at the start of compilation and execution. The PCRE_ANCHORED,
+PCRE_BSR_<i>xxx</i>, PCRE_NEWLINE_<i>xxx</i>, PCRE_NO_UTF8_CHECK, and
+PCRE_NO_START_OPTIMIZE options can be set at the time of matching as well as at
+compile time.
+</P>
+<P>
+If <i>errptr</i> is NULL, <b>pcre_compile()</b> returns NULL immediately.
+Otherwise, if compilation of a pattern fails, <b>pcre_compile()</b> returns
+NULL, and sets the variable pointed to by <i>errptr</i> to point to a textual
+error message. This is a static string that is part of the library. You must
+not try to free it. Normally, the offset from the start of the pattern to the
+data unit that was being processed when the error was discovered is placed in
+the variable pointed to by <i>erroffset</i>, which must not be NULL (if it is,
+an immediate error is given). However, for an invalid UTF-8 or UTF-16 string,
+the offset is that of the first data unit of the failing character.
+</P>
+<P>
+Some errors are not detected until the whole pattern has been scanned; in these
+cases, the offset passed back is the length of the pattern. Note that the
+offset is in data units, not characters, even in a UTF mode. It may sometimes
+point into the middle of a UTF-8 or UTF-16 character.
+</P>
+<P>
+If <b>pcre_compile2()</b> is used instead of <b>pcre_compile()</b>, and the
+<i>errorcodeptr</i> argument is not NULL, a non-zero error code number is
+returned via this argument in the event of an error. This is in addition to the
+textual error message. Error codes and messages are listed below.
+</P>
+<P>
+If the final argument, <i>tableptr</i>, is NULL, PCRE uses a default set of
+character tables that are built when PCRE is compiled, using the default C
+locale. Otherwise, <i>tableptr</i> must be an address that is the result of a
+call to <b>pcre_maketables()</b>. This value is stored with the compiled
+pattern, and used again by <b>pcre_exec()</b> and <b>pcre_dfa_exec()</b> when the
+pattern is matched. For more discussion, see the section on locale support
+below.
+</P>
+<P>
+This code fragment shows a typical straightforward call to <b>pcre_compile()</b>:
+<pre>
+ pcre *re;
+ const char *error;
+ int erroffset;
+ re = pcre_compile(
+ "^A.*Z", /* the pattern */
+ 0, /* default options */
+ &error, /* for error message */
+ &erroffset, /* for error offset */
+ NULL); /* use default character tables */
+</pre>
+The following names for option bits are defined in the <b>pcre.h</b> header
+file:
+<pre>
+ PCRE_ANCHORED
+</pre>
+If this bit is set, the pattern is forced to be "anchored", that is, it is
+constrained to match only at the first matching point in the string that is
+being searched (the "subject string"). This effect can also be achieved by
+appropriate constructs in the pattern itself, which is the only way to do it in
+Perl.
+<pre>
+ PCRE_AUTO_CALLOUT
+</pre>
+If this bit is set, <b>pcre_compile()</b> automatically inserts callout items,
+all with number 255, before each pattern item. For discussion of the callout
+facility, see the
+<a href="pcrecallout.html"><b>pcrecallout</b></a>
+documentation.
+<pre>
+ PCRE_BSR_ANYCRLF
+ PCRE_BSR_UNICODE
+</pre>
+These options (which are mutually exclusive) control what the \R escape
+sequence matches. The choice is either to match only CR, LF, or CRLF, or to
+match any Unicode newline sequence. The default is specified when PCRE is
+built. It can be overridden from within the pattern, or by setting an option
+when a compiled pattern is matched.
+<pre>
+ PCRE_CASELESS
+</pre>
+If this bit is set, letters in the pattern match both upper and lower case
+letters. It is equivalent to Perl's /i option, and it can be changed within a
+pattern by a (?i) option setting. In UTF-8 mode, PCRE always understands the
+concept of case for characters whose values are less than 128, so caseless
+matching is always possible. For characters with higher values, the concept of
+case is supported if PCRE is compiled with Unicode property support, but not
+otherwise. If you want to use caseless matching for characters 128 and above,
+you must ensure that PCRE is compiled with Unicode property support as well as
+with UTF-8 support.
+<pre>
+ PCRE_DOLLAR_ENDONLY
+</pre>
+If this bit is set, a dollar metacharacter in the pattern matches only at the
+end of the subject string. Without this option, a dollar also matches
+immediately before a newline at the end of the string (but not before any other
+newlines). The PCRE_DOLLAR_ENDONLY option is ignored if PCRE_MULTILINE is set.
+There is no equivalent to this option in Perl, and no way to set it within a
+pattern.
+<pre>
+ PCRE_DOTALL
+</pre>
+If this bit is set, a dot metacharacter in the pattern matches a character of
+any value, including one that indicates a newline. However, it only ever
+matches one character, even if newlines are coded as CRLF. Without this option,
+a dot does not match when the current position is at a newline. This option is
+equivalent to Perl's /s option, and it can be changed within a pattern by a
+(?s) option setting. A negative class such as [^a] always matches newline
+characters, independent of the setting of this option.
+<pre>
+ PCRE_DUPNAMES
+</pre>
+If this bit is set, names used to identify capturing subpatterns need not be
+unique. This can be helpful for certain types of pattern when it is known that
+only one instance of the named subpattern can ever be matched. There are more
+details of named subpatterns below; see also the
+<a href="pcrepattern.html"><b>pcrepattern</b></a>
+documentation.
+<pre>
+ PCRE_EXTENDED
+</pre>
+If this bit is set, most white space characters in the pattern are totally
+ignored except when escaped or inside a character class. However, white space
+is not allowed within sequences such as (?&#62; that introduce various
+parenthesized subpatterns, nor within a numerical quantifier such as {1,3}.
+However, ignorable white space is permitted between an item and a following
+quantifier and between a quantifier and a following + that indicates
+possessiveness.
+</P>
+<P>
+White space did not used to include the VT character (code 11), because Perl
+did not treat this character as white space. However, Perl changed at release
+5.18, so PCRE followed at release 8.34, and VT is now treated as white space.
+</P>
+<P>
+PCRE_EXTENDED also causes characters between an unescaped # outside a character
+class and the next newline, inclusive, to be ignored. PCRE_EXTENDED is
+equivalent to Perl's /x option, and it can be changed within a pattern by a
+(?x) option setting.
+</P>
+<P>
+Which characters are interpreted as newlines is controlled by the options
+passed to <b>pcre_compile()</b> or by a special sequence at the start of the
+pattern, as described in the section entitled
+<a href="pcrepattern.html#newlines">"Newline conventions"</a>
+in the <b>pcrepattern</b> documentation. Note that the end of this type of
+comment is a literal newline sequence in the pattern; escape sequences that
+happen to represent a newline do not count.
+</P>
+<P>
+This option makes it possible to include comments inside complicated patterns.
+Note, however, that this applies only to data characters. White space characters
+may never appear within special character sequences in a pattern, for example
+within the sequence (?( that introduces a conditional subpattern.
+<pre>
+ PCRE_EXTRA
+</pre>
+This option was invented in order to turn on additional functionality of PCRE
+that is incompatible with Perl, but it is currently of very little use. When
+set, any backslash in a pattern that is followed by a letter that has no
+special meaning causes an error, thus reserving these combinations for future
+expansion. By default, as in Perl, a backslash followed by a letter with no
+special meaning is treated as a literal. (Perl can, however, be persuaded to
+give an error for this, by running it with the -w option.) There are at present
+no other features controlled by this option. It can also be set by a (?X)
+option setting within a pattern.
+<pre>
+ PCRE_FIRSTLINE
+</pre>
+If this option is set, an unanchored pattern is required to match before or at
+the first newline in the subject string, though the matched text may continue
+over the newline.
+<pre>
+ PCRE_JAVASCRIPT_COMPAT
+</pre>
+If this option is set, PCRE's behaviour is changed in some ways so that it is
+compatible with JavaScript rather than Perl. The changes are as follows:
+</P>
+<P>
+(1) A lone closing square bracket in a pattern causes a compile-time error,
+because this is illegal in JavaScript (by default it is treated as a data
+character). Thus, the pattern AB]CD becomes illegal when this option is set.
+</P>
+<P>
+(2) At run time, a back reference to an unset subpattern group matches an empty
+string (by default this causes the current matching alternative to fail). A
+pattern such as (\1)(a) succeeds when this option is set (assuming it can find
+an "a" in the subject), whereas it fails by default, for Perl compatibility.
+</P>
+<P>
+(3) \U matches an upper case "U" character; by default \U causes a compile
+time error (Perl uses \U to upper case subsequent characters).
+</P>
+<P>
+(4) \u matches a lower case "u" character unless it is followed by four
+hexadecimal digits, in which case the hexadecimal number defines the code point
+to match. By default, \u causes a compile time error (Perl uses it to upper
+case the following character).
+</P>
+<P>
+(5) \x matches a lower case "x" character unless it is followed by two
+hexadecimal digits, in which case the hexadecimal number defines the code point
+to match. By default, as in Perl, a hexadecimal number is always expected after
+\x, but it may have zero, one, or two digits (so, for example, \xz matches a
+binary zero character followed by z).
+<pre>
+ PCRE_MULTILINE
+</pre>
+By default, for the purposes of matching "start of line" and "end of line",
+PCRE treats the subject string as consisting of a single line of characters,
+even if it actually contains newlines. The "start of line" metacharacter (^)
+matches only at the start of the string, and the "end of line" metacharacter
+($) matches only at the end of the string, or before a terminating newline
+(except when PCRE_DOLLAR_ENDONLY is set). Note, however, that unless
+PCRE_DOTALL is set, the "any character" metacharacter (.) does not match at a
+newline. This behaviour (for ^, $, and dot) is the same as Perl.
+</P>
+<P>
+When PCRE_MULTILINE it is set, the "start of line" and "end of line" constructs
+match immediately following or immediately before internal newlines in the
+subject string, respectively, as well as at the very start and end. This is
+equivalent to Perl's /m option, and it can be changed within a pattern by a
+(?m) option setting. If there are no newlines in a subject string, or no
+occurrences of ^ or $ in a pattern, setting PCRE_MULTILINE has no effect.
+<pre>
+ PCRE_NEVER_UTF
+</pre>
+This option locks out interpretation of the pattern as UTF-8 (or UTF-16 or
+UTF-32 in the 16-bit and 32-bit libraries). In particular, it prevents the
+creator of the pattern from switching to UTF interpretation by starting the
+pattern with (*UTF). This may be useful in applications that process patterns
+from external sources. The combination of PCRE_UTF8 and PCRE_NEVER_UTF also
+causes an error.
+<pre>
+ PCRE_NEWLINE_CR
+ PCRE_NEWLINE_LF
+ PCRE_NEWLINE_CRLF
+ PCRE_NEWLINE_ANYCRLF
+ PCRE_NEWLINE_ANY
+</pre>
+These options override the default newline definition that was chosen when PCRE
+was built. Setting the first or the second specifies that a newline is
+indicated by a single character (CR or LF, respectively). Setting
+PCRE_NEWLINE_CRLF specifies that a newline is indicated by the two-character
+CRLF sequence. Setting PCRE_NEWLINE_ANYCRLF specifies that any of the three
+preceding sequences should be recognized. Setting PCRE_NEWLINE_ANY specifies
+that any Unicode newline sequence should be recognized.
+</P>
+<P>
+In an ASCII/Unicode environment, the Unicode newline sequences are the three
+just mentioned, plus the single characters VT (vertical tab, U+000B), FF (form
+feed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
+(paragraph separator, U+2029). For the 8-bit library, the last two are
+recognized only in UTF-8 mode.
+</P>
+<P>
+When PCRE is compiled to run in an EBCDIC (mainframe) environment, the code for
+CR is 0x0d, the same as ASCII. However, the character code for LF is normally
+0x15, though in some EBCDIC environments 0x25 is used. Whichever of these is
+not LF is made to correspond to Unicode's NEL character. EBCDIC codes are all
+less than 256. For more details, see the
+<a href="pcrebuild.html"><b>pcrebuild</b></a>
+documentation.
+</P>
+<P>
+The newline setting in the options word uses three bits that are treated
+as a number, giving eight possibilities. Currently only six are used (default
+plus the five values above). This means that if you set more than one newline
+option, the combination may or may not be sensible. For example,
+PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to PCRE_NEWLINE_CRLF, but
+other combinations may yield unused numbers and cause an error.
+</P>
+<P>
+The only time that a line break in a pattern is specially recognized when
+compiling is when PCRE_EXTENDED is set. CR and LF are white space characters,
+and so are ignored in this mode. Also, an unescaped # outside a character class
+indicates a comment that lasts until after the next line break sequence. In
+other circumstances, line break sequences in patterns are treated as literal
+data.
+</P>
+<P>
+The newline option that is set at compile time becomes the default that is used
+for <b>pcre_exec()</b> and <b>pcre_dfa_exec()</b>, but it can be overridden.
+<pre>
+ PCRE_NO_AUTO_CAPTURE
+</pre>
+If this option is set, it disables the use of numbered capturing parentheses in
+the pattern. Any opening parenthesis that is not followed by ? behaves as if it
+were followed by ?: but named parentheses can still be used for capturing (and
+they acquire numbers in the usual way). There is no equivalent of this option
+in Perl.
+<pre>
+ PCRE_NO_AUTO_POSSESS
+</pre>
+If this option is set, it disables "auto-possessification". This is an
+optimization that, for example, turns a+b into a++b in order to avoid
+backtracks into a+ that can never be successful. However, if callouts are in
+use, auto-possessification means that some of them are never taken. You can set
+this option if you want the matching functions to do a full unoptimized search
+and run all the callouts, but it is mainly provided for testing purposes.
+<pre>
+ PCRE_NO_START_OPTIMIZE
+</pre>
+This is an option that acts at matching time; that is, it is really an option
+for <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>. If it is set at compile time,
+it is remembered with the compiled pattern and assumed at matching time. This
+is necessary if you want to use JIT execution, because the JIT compiler needs
+to know whether or not this option is set. For details see the discussion of
+PCRE_NO_START_OPTIMIZE
+<a href="#execoptions">below.</a>
+<pre>
+ PCRE_UCP
+</pre>
+This option changes the way PCRE processes \B, \b, \D, \d, \S, \s, \W,
+\w, and some of the POSIX character classes. By default, only ASCII characters
+are recognized, but if PCRE_UCP is set, Unicode properties are used instead to
+classify characters. More details are given in the section on
+<a href="pcre.html#genericchartypes">generic character types</a>
+in the
+<a href="pcrepattern.html"><b>pcrepattern</b></a>
+page. If you set PCRE_UCP, matching one of the items it affects takes much
+longer. The option is available only if PCRE has been compiled with Unicode
+property support.
+<pre>
+ PCRE_UNGREEDY
+</pre>
+This option inverts the "greediness" of the quantifiers so that they are not
+greedy by default, but become greedy if followed by "?". It is not compatible
+with Perl. It can also be set by a (?U) option setting within the pattern.
+<pre>
+ PCRE_UTF8
+</pre>
+This option causes PCRE to regard both the pattern and the subject as strings
+of UTF-8 characters instead of single-byte strings. However, it is available
+only when PCRE is built to include UTF support. If not, the use of this option
+provokes an error. Details of how this option changes the behaviour of PCRE are
+given in the
+<a href="pcreunicode.html"><b>pcreunicode</b></a>
+page.
+<pre>
+ PCRE_NO_UTF8_CHECK
+</pre>
+When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is
+automatically checked. There is a discussion about the
+<a href="pcreunicode.html#utf8strings">validity of UTF-8 strings</a>
+in the
+<a href="pcreunicode.html"><b>pcreunicode</b></a>
+page. If an invalid UTF-8 sequence is found, <b>pcre_compile()</b> returns an
+error. If you already know that your pattern is valid, and you want to skip
+this check for performance reasons, you can set the PCRE_NO_UTF8_CHECK option.
+When it is set, the effect of passing an invalid UTF-8 string as a pattern is
+undefined. It may cause your program to crash or loop. Note that this option
+can also be passed to <b>pcre_exec()</b> and <b>pcre_dfa_exec()</b>, to suppress
+the validity checking of subject strings only. If the same string is being
+matched many times, the option can be safely set for the second and subsequent
+matchings to improve performance.
+</P>
+<br><a name="SEC12" href="#TOC1">COMPILATION ERROR CODES</a><br>
+<P>
+The following table lists the error codes than may be returned by
+<b>pcre_compile2()</b>, along with the error messages that may be returned by
+both compiling functions. Note that error messages are always 8-bit ASCII
+strings, even in 16-bit or 32-bit mode. As PCRE has developed, some error codes
+have fallen out of use. To avoid confusion, they have not been re-used.
+<pre>
+ 0 no error
+ 1 \ at end of pattern
+ 2 \c at end of pattern
+ 3 unrecognized character follows \
+ 4 numbers out of order in {} quantifier
+ 5 number too big in {} quantifier
+ 6 missing terminating ] for character class
+ 7 invalid escape sequence in character class
+ 8 range out of order in character class
+ 9 nothing to repeat
+ 10 [this code is not in use]
+ 11 internal error: unexpected repeat
+ 12 unrecognized character after (? or (?-
+ 13 POSIX named classes are supported only within a class
+ 14 missing )
+ 15 reference to non-existent subpattern
+ 16 erroffset passed as NULL
+ 17 unknown option bit(s) set
+ 18 missing ) after comment
+ 19 [this code is not in use]
+ 20 regular expression is too large
+ 21 failed to get memory
+ 22 unmatched parentheses
+ 23 internal error: code overflow
+ 24 unrecognized character after (?&#60;
+ 25 lookbehind assertion is not fixed length
+ 26 malformed number or name after (?(
+ 27 conditional group contains more than two branches
+ 28 assertion expected after (?(
+ 29 (?R or (?[+-]digits must be followed by )
+ 30 unknown POSIX class name
+ 31 POSIX collating elements are not supported
+ 32 this version of PCRE is compiled without UTF support
+ 33 [this code is not in use]
+ 34 character value in \x{} or \o{} is too large
+ 35 invalid condition (?(0)
+ 36 \C not allowed in lookbehind assertion
+ 37 PCRE does not support \L, \l, \N{name}, \U, or \u
+ 38 number after (?C is &#62; 255
+ 39 closing ) for (?C expected
+ 40 recursive call could loop indefinitely
+ 41 unrecognized character after (?P
+ 42 syntax error in subpattern name (missing terminator)
+ 43 two named subpatterns have the same name
+ 44 invalid UTF-8 string (specifically UTF-8)
+ 45 support for \P, \p, and \X has not been compiled
+ 46 malformed \P or \p sequence
+ 47 unknown property name after \P or \p
+ 48 subpattern name is too long (maximum 32 characters)
+ 49 too many named subpatterns (maximum 10000)
+ 50 [this code is not in use]
+ 51 octal value is greater than \377 in 8-bit non-UTF-8 mode
+ 52 internal error: overran compiling workspace
+ 53 internal error: previously-checked referenced subpattern
+ not found
+ 54 DEFINE group contains more than one branch
+ 55 repeating a DEFINE group is not allowed
+ 56 inconsistent NEWLINE options
+ 57 \g is not followed by a braced, angle-bracketed, or quoted
+ name/number or by a plain number
+ 58 a numbered reference must not be zero
+ 59 an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT)
+ 60 (*VERB) not recognized or malformed
+ 61 number is too big
+ 62 subpattern name expected
+ 63 digit expected after (?+
+ 64 ] is an invalid data character in JavaScript compatibility mode
+ 65 different names for subpatterns of the same number are
+ not allowed
+ 66 (*MARK) must have an argument
+ 67 this version of PCRE is not compiled with Unicode property
+ support
+ 68 \c must be followed by an ASCII character
+ 69 \k is not followed by a braced, angle-bracketed, or quoted name
+ 70 internal error: unknown opcode in find_fixedlength()
+ 71 \N is not supported in a class
+ 72 too many forward references
+ 73 disallowed Unicode code point (&#62;= 0xd800 && &#60;= 0xdfff)
+ 74 invalid UTF-16 string (specifically UTF-16)
+ 75 name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
+ 76 character value in \u.... sequence is too large
+ 77 invalid UTF-32 string (specifically UTF-32)
+ 78 setting UTF is disabled by the application
+ 79 non-hex character in \x{} (closing brace missing?)
+ 80 non-octal character in \o{} (closing brace missing?)
+ 81 missing opening brace after \o
+ 82 parentheses are too deeply nested
+ 83 invalid range in character class
+ 84 group name must start with a non-digit
+ 85 parentheses are too deeply nested (stack check)
+</pre>
+The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may
+be used if the limits were changed when PCRE was built.
+<a name="studyingapattern"></a></P>
+<br><a name="SEC13" href="#TOC1">STUDYING A PATTERN</a><br>
+<P>
+<b>pcre_extra *pcre_study(const pcre *<i>code</i>, int <i>options</i>,</b>
+<b> const char **<i>errptr</i>);</b>
+</P>
+<P>
+If a compiled pattern is going to be used several times, it is worth spending
+more time analyzing it in order to speed up the time taken for matching. The
+function <b>pcre_study()</b> takes a pointer to a compiled pattern as its first
+argument. If studying the pattern produces additional information that will
+help speed up matching, <b>pcre_study()</b> returns a pointer to a
+<b>pcre_extra</b> block, in which the <i>study_data</i> field points to the
+results of the study.
+</P>
+<P>
+The returned value from <b>pcre_study()</b> can be passed directly to
+<b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>. However, a <b>pcre_extra</b> block
+also contains other fields that can be set by the caller before the block is
+passed; these are described
+<a href="#extradata">below</a>
+in the section on matching a pattern.
+</P>
+<P>
+If studying the pattern does not produce any useful information,
+<b>pcre_study()</b> returns NULL by default. In that circumstance, if the
+calling program wants to pass any of the other fields to <b>pcre_exec()</b> or
+<b>pcre_dfa_exec()</b>, it must set up its own <b>pcre_extra</b> block. However,
+if <b>pcre_study()</b> is called with the PCRE_STUDY_EXTRA_NEEDED option, it
+returns a <b>pcre_extra</b> block even if studying did not find any additional
+information. It may still return NULL, however, if an error occurs in
+<b>pcre_study()</b>.
+</P>
+<P>
+The second argument of <b>pcre_study()</b> contains option bits. There are three
+further options in addition to PCRE_STUDY_EXTRA_NEEDED:
+<pre>
+ PCRE_STUDY_JIT_COMPILE
+ PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
+ PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
+</pre>
+If any of these are set, and the just-in-time compiler is available, the
+pattern is further compiled into machine code that executes much faster than
+the <b>pcre_exec()</b> interpretive matching function. If the just-in-time
+compiler is not available, these options are ignored. All undefined bits in the
+<i>options</i> argument must be zero.
+</P>
+<P>
+JIT compilation is a heavyweight optimization. It can take some time for
+patterns to be analyzed, and for one-off matches and simple patterns the
+benefit of faster execution might be offset by a much slower study time.
+Not all patterns can be optimized by the JIT compiler. For those that cannot be
+handled, matching automatically falls back to the <b>pcre_exec()</b>
+interpreter. For more details, see the
+<a href="pcrejit.html"><b>pcrejit</b></a>
+documentation.
+</P>
+<P>
+The third argument for <b>pcre_study()</b> is a pointer for an error message. If
+studying succeeds (even if no data is returned), the variable it points to is
+set to NULL. Otherwise it is set to point to a textual error message. This is a
+static string that is part of the library. You must not try to free it. You
+should test the error pointer for NULL after calling <b>pcre_study()</b>, to be
+sure that it has run successfully.
+</P>
+<P>
+When you are finished with a pattern, you can free the memory used for the
+study data by calling <b>pcre_free_study()</b>. This function was added to the
+API for release 8.20. For earlier versions, the memory could be freed with
+<b>pcre_free()</b>, just like the pattern itself. This will still work in cases
+where JIT optimization is not used, but it is advisable to change to the new
+function when convenient.
+</P>
+<P>
+This is a typical way in which <b>pcre_study</b>() is used (except that in a
+real application there should be tests for errors):
+<pre>
+ int rc;
+ pcre *re;
+ pcre_extra *sd;
+ re = pcre_compile("pattern", 0, &error, &erroroffset, NULL);
+ sd = pcre_study(
+ re, /* result of pcre_compile() */
+ 0, /* no options */
+ &error); /* set to NULL or points to a message */
+ rc = pcre_exec( /* see below for details of pcre_exec() options */
+ re, sd, "subject", 7, 0, 0, ovector, 30);
+ ...
+ pcre_free_study(sd);
+ pcre_free(re);
+</pre>
+Studying a pattern does two things: first, a lower bound for the length of
+subject string that is needed to match the pattern is computed. This does not
+mean that there are any strings of that length that match, but it does
+guarantee that no shorter strings match. The value is used to avoid wasting
+time by trying to match strings that are shorter than the lower bound. You can
+find out the value in a calling program via the <b>pcre_fullinfo()</b> function.
+</P>
+<P>
+Studying a pattern is also useful for non-anchored patterns that do not have a
+single fixed starting character. A bitmap of possible starting bytes is
+created. This speeds up finding a position in the subject at which to start
+matching. (In 16-bit mode, the bitmap is used for 16-bit values less than 256.
+In 32-bit mode, the bitmap is used for 32-bit values less than 256.)
+</P>
+<P>
+These two optimizations apply to both <b>pcre_exec()</b> and
+<b>pcre_dfa_exec()</b>, and the information is also used by the JIT compiler.
+The optimizations can be disabled by setting the PCRE_NO_START_OPTIMIZE option.
+You might want to do this if your pattern contains callouts or (*MARK) and you
+want to make use of these facilities in cases where matching fails.
+</P>
+<P>
+PCRE_NO_START_OPTIMIZE can be specified at either compile time or execution
+time. However, if PCRE_NO_START_OPTIMIZE is passed to <b>pcre_exec()</b>, (that
+is, after any JIT compilation has happened) JIT execution is disabled. For JIT
+execution to work with PCRE_NO_START_OPTIMIZE, the option must be set at
+compile time.
+</P>
+<P>
+There is a longer discussion of PCRE_NO_START_OPTIMIZE
+<a href="#execoptions">below.</a>
+<a name="localesupport"></a></P>
+<br><a name="SEC14" href="#TOC1">LOCALE SUPPORT</a><br>
+<P>
+PCRE handles caseless matching, and determines whether characters are letters,
+digits, or whatever, by reference to a set of tables, indexed by character
+code point. When running in UTF-8 mode, or in the 16- or 32-bit libraries, this
+applies only to characters with code points less than 256. By default,
+higher-valued code points never match escapes such as \w or \d. However, if
+PCRE is built with Unicode property support, all characters can be tested with
+\p and \P, or, alternatively, the PCRE_UCP option can be set when a pattern
+is compiled; this causes \w and friends to use Unicode property support
+instead of the built-in tables.
+</P>
+<P>
+The use of locales with Unicode is discouraged. If you are handling characters
+with code points greater than 128, you should either use Unicode support, or
+use locales, but not try to mix the two.
+</P>
+<P>
+PCRE contains an internal set of tables that are used when the final argument
+of <b>pcre_compile()</b> is NULL. These are sufficient for many applications.
+Normally, the internal tables recognize only ASCII characters. However, when
+PCRE is built, it is possible to cause the internal tables to be rebuilt in the
+default "C" locale of the local system, which may cause them to be different.
+</P>
+<P>
+The internal tables can always be overridden by tables supplied by the
+application that calls PCRE. These may be created in a different locale from
+the default. As more and more applications change to using Unicode, the need
+for this locale support is expected to die away.
+</P>
+<P>
+External tables are built by calling the <b>pcre_maketables()</b> function,
+which has no arguments, in the relevant locale. The result can then be passed
+to <b>pcre_compile()</b> as often as necessary. For example, to build and use
+tables that are appropriate for the French locale (where accented characters
+with values greater than 128 are treated as letters), the following code could
+be used:
+<pre>
+ setlocale(LC_CTYPE, "fr_FR");
+ tables = pcre_maketables();
+ re = pcre_compile(..., tables);
+</pre>
+The locale name "fr_FR" is used on Linux and other Unix-like systems; if you
+are using Windows, the name for the French locale is "french".
+</P>
+<P>
+When <b>pcre_maketables()</b> runs, the tables are built in memory that is
+obtained via <b>pcre_malloc</b>. It is the caller's responsibility to ensure
+that the memory containing the tables remains available for as long as it is
+needed.
+</P>
+<P>
+The pointer that is passed to <b>pcre_compile()</b> is saved with the compiled
+pattern, and the same tables are used via this pointer by <b>pcre_study()</b>
+and also by <b>pcre_exec()</b> and <b>pcre_dfa_exec()</b>. Thus, for any single
+pattern, compilation, studying and matching all happen in the same locale, but
+different patterns can be processed in different locales.
+</P>
+<P>
+It is possible to pass a table pointer or NULL (indicating the use of the
+internal tables) to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> (see the
+discussion below in the section on matching a pattern). This facility is
+provided for use with pre-compiled patterns that have been saved and reloaded.
+Character tables are not saved with patterns, so if a non-standard table was
+used at compile time, it must be provided again when the reloaded pattern is
+matched. Attempting to use this facility to match a pattern in a different
+locale from the one in which it was compiled is likely to lead to anomalous
+(usually incorrect) results.
+<a name="infoaboutpattern"></a></P>
+<br><a name="SEC15" href="#TOC1">INFORMATION ABOUT A PATTERN</a><br>
+<P>
+<b>int pcre_fullinfo(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
+<b> int <i>what</i>, void *<i>where</i>);</b>
+</P>
+<P>
+The <b>pcre_fullinfo()</b> function returns information about a compiled
+pattern. It replaces the <b>pcre_info()</b> function, which was removed from the
+library at version 8.30, after more than 10 years of obsolescence.
+</P>
+<P>
+The first argument for <b>pcre_fullinfo()</b> is a pointer to the compiled
+pattern. The second argument is the result of <b>pcre_study()</b>, or NULL if
+the pattern was not studied. The third argument specifies which piece of
+information is required, and the fourth argument is a pointer to a variable
+to receive the data. The yield of the function is zero for success, or one of
+the following negative numbers:
+<pre>
+ PCRE_ERROR_NULL the argument <i>code</i> was NULL
+ the argument <i>where</i> was NULL
+ PCRE_ERROR_BADMAGIC the "magic number" was not found
+ PCRE_ERROR_BADENDIANNESS the pattern was compiled with different
+ endianness
+ PCRE_ERROR_BADOPTION the value of <i>what</i> was invalid
+ PCRE_ERROR_UNSET the requested field is not set
+</pre>
+The "magic number" is placed at the start of each compiled pattern as an simple
+check against passing an arbitrary memory pointer. The endianness error can
+occur if a compiled pattern is saved and reloaded on a different host. Here is
+a typical call of <b>pcre_fullinfo()</b>, to obtain the length of the compiled
+pattern:
+<pre>
+ int rc;
+ size_t length;
+ rc = pcre_fullinfo(
+ re, /* result of pcre_compile() */
+ sd, /* result of pcre_study(), or NULL */
+ PCRE_INFO_SIZE, /* what is required */
+ &length); /* where to put the data */
+</pre>
+The possible values for the third argument are defined in <b>pcre.h</b>, and are
+as follows:
+<pre>
+ PCRE_INFO_BACKREFMAX
+</pre>
+Return the number of the highest back reference in the pattern. The fourth
+argument should point to an <b>int</b> variable. Zero is returned if there are
+no back references.
+<pre>
+ PCRE_INFO_CAPTURECOUNT
+</pre>
+Return the number of capturing subpatterns in the pattern. The fourth argument
+should point to an <b>int</b> variable.
+<pre>
+ PCRE_INFO_DEFAULT_TABLES
+</pre>
+Return a pointer to the internal default character tables within PCRE. The
+fourth argument should point to an <b>unsigned char *</b> variable. This
+information call is provided for internal use by the <b>pcre_study()</b>
+function. External callers can cause PCRE to use its internal tables by passing
+a NULL table pointer.
+<pre>
+ PCRE_INFO_FIRSTBYTE (deprecated)
+</pre>
+Return information about the first data unit of any matched string, for a
+non-anchored pattern. The name of this option refers to the 8-bit library,
+where data units are bytes. The fourth argument should point to an <b>int</b>
+variable. Negative values are used for special cases. However, this means that
+when the 32-bit library is in non-UTF-32 mode, the full 32-bit range of
+characters cannot be returned. For this reason, this value is deprecated; use
+PCRE_INFO_FIRSTCHARACTERFLAGS and PCRE_INFO_FIRSTCHARACTER instead.
+</P>
+<P>
+If there is a fixed first value, for example, the letter "c" from a pattern
+such as (cat|cow|coyote), its value is returned. In the 8-bit library, the
+value is always less than 256. In the 16-bit library the value can be up to
+0xffff. In the 32-bit library the value can be up to 0x10ffff.
+</P>
+<P>
+If there is no fixed first value, and if either
+<br>
+<br>
+(a) the pattern was compiled with the PCRE_MULTILINE option, and every branch
+starts with "^", or
+<br>
+<br>
+(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set
+(if it were set, the pattern would be anchored),
+<br>
+<br>
+-1 is returned, indicating that the pattern matches only at the start of a
+subject string or after any newline within the string. Otherwise -2 is
+returned. For anchored patterns, -2 is returned.
+<pre>
+ PCRE_INFO_FIRSTCHARACTER
+</pre>
+Return the value of the first data unit (non-UTF character) of any matched
+string in the situation where PCRE_INFO_FIRSTCHARACTERFLAGS returns 1;
+otherwise return 0. The fourth argument should point to an <b>uint_t</b>
+variable.
+</P>
+<P>
+In the 8-bit library, the value is always less than 256. In the 16-bit library
+the value can be up to 0xffff. In the 32-bit library in UTF-32 mode the value
+can be up to 0x10ffff, and up to 0xffffffff when not using UTF-32 mode.
+<pre>
+ PCRE_INFO_FIRSTCHARACTERFLAGS
+</pre>
+Return information about the first data unit of any matched string, for a
+non-anchored pattern. The fourth argument should point to an <b>int</b>
+variable.
+</P>
+<P>
+If there is a fixed first value, for example, the letter "c" from a pattern
+such as (cat|cow|coyote), 1 is returned, and the character value can be
+retrieved using PCRE_INFO_FIRSTCHARACTER. If there is no fixed first value, and
+if either
+<br>
+<br>
+(a) the pattern was compiled with the PCRE_MULTILINE option, and every branch
+starts with "^", or
+<br>
+<br>
+(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set
+(if it were set, the pattern would be anchored),
+<br>
+<br>
+2 is returned, indicating that the pattern matches only at the start of a
+subject string or after any newline within the string. Otherwise 0 is
+returned. For anchored patterns, 0 is returned.
+<pre>
+ PCRE_INFO_FIRSTTABLE
+</pre>
+If the pattern was studied, and this resulted in the construction of a 256-bit
+table indicating a fixed set of values for the first data unit in any matching
+string, a pointer to the table is returned. Otherwise NULL is returned. The
+fourth argument should point to an <b>unsigned char *</b> variable.
+<pre>
+ PCRE_INFO_HASCRORLF
+</pre>
+Return 1 if the pattern contains any explicit matches for CR or LF characters,
+otherwise 0. The fourth argument should point to an <b>int</b> variable. An
+explicit match is either a literal CR or LF character, or \r or \n.
+<pre>
+ PCRE_INFO_JCHANGED
+</pre>
+Return 1 if the (?J) or (?-J) option setting is used in the pattern, otherwise
+0. The fourth argument should point to an <b>int</b> variable. (?J) and
+(?-J) set and unset the local PCRE_DUPNAMES option, respectively.
+<pre>
+ PCRE_INFO_JIT
+</pre>
+Return 1 if the pattern was studied with one of the JIT options, and
+just-in-time compiling was successful. The fourth argument should point to an
+<b>int</b> variable. A return value of 0 means that JIT support is not available
+in this version of PCRE, or that the pattern was not studied with a JIT option,
+or that the JIT compiler could not handle this particular pattern. See the
+<a href="pcrejit.html"><b>pcrejit</b></a>
+documentation for details of what can and cannot be handled.
+<pre>
+ PCRE_INFO_JITSIZE
+</pre>
+If the pattern was successfully studied with a JIT option, return the size of
+the JIT compiled code, otherwise return zero. The fourth argument should point
+to a <b>size_t</b> variable.
+<pre>
+ PCRE_INFO_LASTLITERAL
+</pre>
+Return the value of the rightmost literal data unit that must exist in any
+matched string, other than at its start, if such a value has been recorded. The
+fourth argument should point to an <b>int</b> variable. If there is no such
+value, -1 is returned. For anchored patterns, a last literal value is recorded
+only if it follows something of variable length. For example, for the pattern
+/^a\d+z\d+/ the returned value is "z", but for /^a\dz\d/ the returned value
+is -1.
+</P>
+<P>
+Since for the 32-bit library using the non-UTF-32 mode, this function is unable
+to return the full 32-bit range of characters, this value is deprecated;
+instead the PCRE_INFO_REQUIREDCHARFLAGS and PCRE_INFO_REQUIREDCHAR values should
+be used.
+<pre>
+ PCRE_INFO_MATCH_EMPTY
+</pre>
+Return 1 if the pattern can match an empty string, otherwise 0. The fourth
+argument should point to an <b>int</b> variable.
+<pre>
+ PCRE_INFO_MATCHLIMIT
+</pre>
+If the pattern set a match limit by including an item of the form
+(*LIMIT_MATCH=nnnn) at the start, the value is returned. The fourth argument
+should point to an unsigned 32-bit integer. If no such value has been set, the
+call to <b>pcre_fullinfo()</b> returns the error PCRE_ERROR_UNSET.
+<pre>
+ PCRE_INFO_MAXLOOKBEHIND
+</pre>
+Return the number of characters (NB not data units) in the longest lookbehind
+assertion in the pattern. This information is useful when doing multi-segment
+matching using the partial matching facilities. Note that the simple assertions
+\b and \B require a one-character lookbehind. \A also registers a
+one-character lookbehind, though it does not actually inspect the previous
+character. This is to ensure that at least one character from the old segment
+is retained when a new segment is processed. Otherwise, if there are no
+lookbehinds in the pattern, \A might match incorrectly at the start of a new
+segment.
+<pre>
+ PCRE_INFO_MINLENGTH
+</pre>
+If the pattern was studied and a minimum length for matching subject strings
+was computed, its value is returned. Otherwise the returned value is -1. The
+value is a number of characters, which in UTF mode may be different from the
+number of data units. The fourth argument should point to an <b>int</b>
+variable. A non-negative value is a lower bound to the length of any matching
+string. There may not be any strings of that length that do actually match, but
+every string that does match is at least that long.
+<pre>
+ PCRE_INFO_NAMECOUNT
+ PCRE_INFO_NAMEENTRYSIZE
+ PCRE_INFO_NAMETABLE
+</pre>
+PCRE supports the use of named as well as numbered capturing parentheses. The
+names are just an additional way of identifying the parentheses, which still
+acquire numbers. Several convenience functions such as
+<b>pcre_get_named_substring()</b> are provided for extracting captured
+substrings by name. It is also possible to extract the data directly, by first
+converting the name to a number in order to access the correct pointers in the
+output vector (described with <b>pcre_exec()</b> below). To do the conversion,
+you need to use the name-to-number map, which is described by these three
+values.
+</P>
+<P>
+The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT gives
+the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size of each
+entry; both of these return an <b>int</b> value. The entry size depends on the
+length of the longest name. PCRE_INFO_NAMETABLE returns a pointer to the first
+entry of the table. This is a pointer to <b>char</b> in the 8-bit library, where
+the first two bytes of each entry are the number of the capturing parenthesis,
+most significant byte first. In the 16-bit library, the pointer points to
+16-bit data units, the first of which contains the parenthesis number. In the
+32-bit library, the pointer points to 32-bit data units, the first of which
+contains the parenthesis number. The rest of the entry is the corresponding
+name, zero terminated.
+</P>
+<P>
+The names are in alphabetical order. If (?| is used to create multiple groups
+with the same number, as described in the
+<a href="pcrepattern.html#dupsubpatternnumber">section on duplicate subpattern numbers</a>
+in the
+<a href="pcrepattern.html"><b>pcrepattern</b></a>
+page, the groups may be given the same name, but there is only one entry in the
+table. Different names for groups of the same number are not permitted.
+Duplicate names for subpatterns with different numbers are permitted,
+but only if PCRE_DUPNAMES is set. They appear in the table in the order in
+which they were found in the pattern. In the absence of (?| this is the order
+of increasing number; when (?| is used this is not necessarily the case because
+later subpatterns may have lower numbers.
+</P>
+<P>
+As a simple example of the name/number table, consider the following pattern
+after compilation by the 8-bit library (assume PCRE_EXTENDED is set, so white
+space - including newlines - is ignored):
+<pre>
+ (?&#60;date&#62; (?&#60;year&#62;(\d\d)?\d\d) - (?&#60;month&#62;\d\d) - (?&#60;day&#62;\d\d) )
+</pre>
+There are four named subpatterns, so the table has four entries, and each entry
+in the table is eight bytes long. The table is as follows, with non-printing
+bytes shows in hexadecimal, and undefined bytes shown as ??:
+<pre>
+ 00 01 d a t e 00 ??
+ 00 05 d a y 00 ?? ??
+ 00 04 m o n t h 00
+ 00 02 y e a r 00 ??
+</pre>
+When writing code to extract data from named subpatterns using the
+name-to-number map, remember that the length of the entries is likely to be
+different for each compiled pattern.
+<pre>
+ PCRE_INFO_OKPARTIAL
+</pre>
+Return 1 if the pattern can be used for partial matching with
+<b>pcre_exec()</b>, otherwise 0. The fourth argument should point to an
+<b>int</b> variable. From release 8.00, this always returns 1, because the
+restrictions that previously applied to partial matching have been lifted. The
+<a href="pcrepartial.html"><b>pcrepartial</b></a>
+documentation gives details of partial matching.
+<pre>
+ PCRE_INFO_OPTIONS
+</pre>
+Return a copy of the options with which the pattern was compiled. The fourth
+argument should point to an <b>unsigned long int</b> variable. These option bits
+are those specified in the call to <b>pcre_compile()</b>, modified by any
+top-level option settings at the start of the pattern itself. In other words,
+they are the options that will be in force when matching starts. For example,
+if the pattern /(?im)abc(?-i)d/ is compiled with the PCRE_EXTENDED option, the
+result is PCRE_CASELESS, PCRE_MULTILINE, and PCRE_EXTENDED.
+</P>
+<P>
+A pattern is automatically anchored by PCRE if all of its top-level
+alternatives begin with one of the following:
+<pre>
+ ^ unless PCRE_MULTILINE is set
+ \A always
+ \G always
+ .* if PCRE_DOTALL is set and there are no back references to the subpattern in which .* appears
+</pre>
+For such patterns, the PCRE_ANCHORED bit is set in the options returned by
+<b>pcre_fullinfo()</b>.
+<pre>
+ PCRE_INFO_RECURSIONLIMIT
+</pre>
+If the pattern set a recursion limit by including an item of the form
+(*LIMIT_RECURSION=nnnn) at the start, the value is returned. The fourth
+argument should point to an unsigned 32-bit integer. If no such value has been
+set, the call to <b>pcre_fullinfo()</b> returns the error PCRE_ERROR_UNSET.
+<pre>
+ PCRE_INFO_SIZE
+</pre>
+Return the size of the compiled pattern in bytes (for all three libraries). The
+fourth argument should point to a <b>size_t</b> variable. This value does not
+include the size of the <b>pcre</b> structure that is returned by
+<b>pcre_compile()</b>. The value that is passed as the argument to
+<b>pcre_malloc()</b> when <b>pcre_compile()</b> is getting memory in which to
+place the compiled data is the value returned by this option plus the size of
+the <b>pcre</b> structure. Studying a compiled pattern, with or without JIT,
+does not alter the value returned by this option.
+<pre>
+ PCRE_INFO_STUDYSIZE
+</pre>
+Return the size in bytes (for all three libraries) of the data block pointed to
+by the <i>study_data</i> field in a <b>pcre_extra</b> block. If <b>pcre_extra</b>
+is NULL, or there is no study data, zero is returned. The fourth argument
+should point to a <b>size_t</b> variable. The <i>study_data</i> field is set by
+<b>pcre_study()</b> to record information that will speed up matching (see the
+section entitled
+<a href="#studyingapattern">"Studying a pattern"</a>
+above). The format of the <i>study_data</i> block is private, but its length
+is made available via this option so that it can be saved and restored (see the
+<a href="pcreprecompile.html"><b>pcreprecompile</b></a>
+documentation for details).
+<pre>
+ PCRE_INFO_REQUIREDCHARFLAGS
+</pre>
+Returns 1 if there is a rightmost literal data unit that must exist in any
+matched string, other than at its start. The fourth argument should point to
+an <b>int</b> variable. If there is no such value, 0 is returned. If returning
+1, the character value itself can be retrieved using PCRE_INFO_REQUIREDCHAR.
+</P>
+<P>
+For anchored patterns, a last literal value is recorded only if it follows
+something of variable length. For example, for the pattern /^a\d+z\d+/ the
+returned value 1 (with "z" returned from PCRE_INFO_REQUIREDCHAR), but for
+/^a\dz\d/ the returned value is 0.
+<pre>
+ PCRE_INFO_REQUIREDCHAR
+</pre>
+Return the value of the rightmost literal data unit that must exist in any
+matched string, other than at its start, if such a value has been recorded. The
+fourth argument should point to an <b>uint32_t</b> variable. If there is no such
+value, 0 is returned.
+</P>
+<br><a name="SEC16" href="#TOC1">REFERENCE COUNTS</a><br>
+<P>
+<b>int pcre_refcount(pcre *<i>code</i>, int <i>adjust</i>);</b>
+</P>
+<P>
+The <b>pcre_refcount()</b> function is used to maintain a reference count in the
+data block that contains a compiled pattern. It is provided for the benefit of
+applications that operate in an object-oriented manner, where different parts
+of the application may be using the same compiled pattern, but you want to free
+the block when they are all done.
+</P>
+<P>
+When a pattern is compiled, the reference count field is initialized to zero.
+It is changed only by calling this function, whose action is to add the
+<i>adjust</i> value (which may be positive or negative) to it. The yield of the
+function is the new value. However, the value of the count is constrained to
+lie between 0 and 65535, inclusive. If the new value is outside these limits,
+it is forced to the appropriate limit value.
+</P>
+<P>
+Except when it is zero, the reference count is not correctly preserved if a
+pattern is compiled on one host and then transferred to a host whose byte-order
+is different. (This seems a highly unlikely scenario.)
+</P>
+<br><a name="SEC17" href="#TOC1">MATCHING A PATTERN: THE TRADITIONAL FUNCTION</a><br>
+<P>
+<b>int pcre_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
+<b> const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
+<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>);</b>
+</P>
+<P>
+The function <b>pcre_exec()</b> is called to match a subject string against a
+compiled pattern, which is passed in the <i>code</i> argument. If the
+pattern was studied, the result of the study should be passed in the
+<i>extra</i> argument. You can call <b>pcre_exec()</b> with the same <i>code</i>
+and <i>extra</i> arguments as many times as you like, in order to match
+different subject strings with the same pattern.
+</P>
+<P>
+This function is the main matching facility of the library, and it operates in
+a Perl-like manner. For specialist use there is also an alternative matching
+function, which is described
+<a href="#dfamatch">below</a>
+in the section about the <b>pcre_dfa_exec()</b> function.
+</P>
+<P>
+In most applications, the pattern will have been compiled (and optionally
+studied) in the same process that calls <b>pcre_exec()</b>. However, it is
+possible to save compiled patterns and study data, and then use them later
+in different processes, possibly even on different hosts. For a discussion
+about this, see the
+<a href="pcreprecompile.html"><b>pcreprecompile</b></a>
+documentation.
+</P>
+<P>
+Here is an example of a simple call to <b>pcre_exec()</b>:
+<pre>
+ int rc;
+ int ovector[30];
+ rc = pcre_exec(
+ re, /* result of pcre_compile() */
+ NULL, /* we didn't study the pattern */
+ "some string", /* the subject string */
+ 11, /* the length of the subject string */
+ 0, /* start at offset 0 in the subject */
+ 0, /* default options */
+ ovector, /* vector of integers for substring information */
+ 30); /* number of elements (NOT size in bytes) */
+<a name="extradata"></a></PRE>
+</P>
+<br><b>
+Extra data for <b>pcre_exec()</b>
+</b><br>
+<P>
+If the <i>extra</i> argument is not NULL, it must point to a <b>pcre_extra</b>
+data block. The <b>pcre_study()</b> function returns such a block (when it
+doesn't return NULL), but you can also create one for yourself, and pass
+additional information in it. The <b>pcre_extra</b> block contains the following
+fields (not necessarily in this order):
+<pre>
+ unsigned long int <i>flags</i>;
+ void *<i>study_data</i>;
+ void *<i>executable_jit</i>;
+ unsigned long int <i>match_limit</i>;
+ unsigned long int <i>match_limit_recursion</i>;
+ void *<i>callout_data</i>;
+ const unsigned char *<i>tables</i>;
+ unsigned char **<i>mark</i>;
+</pre>
+In the 16-bit version of this structure, the <i>mark</i> field has type
+"PCRE_UCHAR16 **".
+<br>
+<br>
+In the 32-bit version of this structure, the <i>mark</i> field has type
+"PCRE_UCHAR32 **".
+</P>
+<P>
+The <i>flags</i> field is used to specify which of the other fields are set. The
+flag bits are:
+<pre>
+ PCRE_EXTRA_CALLOUT_DATA
+ PCRE_EXTRA_EXECUTABLE_JIT
+ PCRE_EXTRA_MARK
+ PCRE_EXTRA_MATCH_LIMIT
+ PCRE_EXTRA_MATCH_LIMIT_RECURSION
+ PCRE_EXTRA_STUDY_DATA
+ PCRE_EXTRA_TABLES
+</pre>
+Other flag bits should be set to zero. The <i>study_data</i> field and sometimes
+the <i>executable_jit</i> field are set in the <b>pcre_extra</b> block that is
+returned by <b>pcre_study()</b>, together with the appropriate flag bits. You
+should not set these yourself, but you may add to the block by setting other
+fields and their corresponding flag bits.
+</P>
+<P>
+The <i>match_limit</i> field provides a means of preventing PCRE from using up a
+vast amount of resources when running patterns that are not going to match,
+but which have a very large number of possibilities in their search trees. The
+classic example is a pattern that uses nested unlimited repeats.
+</P>
+<P>
+Internally, <b>pcre_exec()</b> uses a function called <b>match()</b>, which it
+calls repeatedly (sometimes recursively). The limit set by <i>match_limit</i> is
+imposed on the number of times this function is called during a match, which
+has the effect of limiting the amount of backtracking that can take place. For
+patterns that are not anchored, the count restarts from zero for each position
+in the subject string.
+</P>
+<P>
+When <b>pcre_exec()</b> is called with a pattern that was successfully studied
+with a JIT option, the way that the matching is executed is entirely different.
+However, there is still the possibility of runaway matching that goes on for a
+very long time, and so the <i>match_limit</i> value is also used in this case
+(but in a different way) to limit how long the matching can continue.
+</P>
+<P>
+The default value for the limit can be set when PCRE is built; the default
+default is 10 million, which handles all but the most extreme cases. You can
+override the default by suppling <b>pcre_exec()</b> with a <b>pcre_extra</b>
+block in which <i>match_limit</i> is set, and PCRE_EXTRA_MATCH_LIMIT is set in
+the <i>flags</i> field. If the limit is exceeded, <b>pcre_exec()</b> returns
+PCRE_ERROR_MATCHLIMIT.
+</P>
+<P>
+A value for the match limit may also be supplied by an item at the start of a
+pattern of the form
+<pre>
+ (*LIMIT_MATCH=d)
+</pre>
+where d is a decimal number. However, such a setting is ignored unless d is
+less than the limit set by the caller of <b>pcre_exec()</b> or, if no such limit
+is set, less than the default.
+</P>
+<P>
+The <i>match_limit_recursion</i> field is similar to <i>match_limit</i>, but
+instead of limiting the total number of times that <b>match()</b> is called, it
+limits the depth of recursion. The recursion depth is a smaller number than the
+total number of calls, because not all calls to <b>match()</b> are recursive.
+This limit is of use only if it is set smaller than <i>match_limit</i>.
+</P>
+<P>
+Limiting the recursion depth limits the amount of machine stack that can be
+used, or, when PCRE has been compiled to use memory on the heap instead of the
+stack, the amount of heap memory that can be used. This limit is not relevant,
+and is ignored, when matching is done using JIT compiled code.
+</P>
+<P>
+The default value for <i>match_limit_recursion</i> can be set when PCRE is
+built; the default default is the same value as the default for
+<i>match_limit</i>. You can override the default by suppling <b>pcre_exec()</b>
+with a <b>pcre_extra</b> block in which <i>match_limit_recursion</i> is set, and
+PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in the <i>flags</i> field. If the limit
+is exceeded, <b>pcre_exec()</b> returns PCRE_ERROR_RECURSIONLIMIT.
+</P>
+<P>
+A value for the recursion limit may also be supplied by an item at the start of
+a pattern of the form
+<pre>
+ (*LIMIT_RECURSION=d)
+</pre>
+where d is a decimal number. However, such a setting is ignored unless d is
+less than the limit set by the caller of <b>pcre_exec()</b> or, if no such limit
+is set, less than the default.
+</P>
+<P>
+The <i>callout_data</i> field is used in conjunction with the "callout" feature,
+and is described in the
+<a href="pcrecallout.html"><b>pcrecallout</b></a>
+documentation.
+</P>
+<P>
+The <i>tables</i> field is provided for use with patterns that have been
+pre-compiled using custom character tables, saved to disc or elsewhere, and
+then reloaded, because the tables that were used to compile a pattern are not
+saved with it. See the
+<a href="pcreprecompile.html"><b>pcreprecompile</b></a>
+documentation for a discussion of saving compiled patterns for later use. If
+NULL is passed using this mechanism, it forces PCRE's internal tables to be
+used.
+</P>
+<P>
+<b>Warning:</b> The tables that <b>pcre_exec()</b> uses must be the same as those
+that were used when the pattern was compiled. If this is not the case, the
+behaviour of <b>pcre_exec()</b> is undefined. Therefore, when a pattern is
+compiled and matched in the same process, this field should never be set. In
+this (the most common) case, the correct table pointer is automatically passed
+with the compiled pattern from <b>pcre_compile()</b> to <b>pcre_exec()</b>.
+</P>
+<P>
+If PCRE_EXTRA_MARK is set in the <i>flags</i> field, the <i>mark</i> field must
+be set to point to a suitable variable. If the pattern contains any
+backtracking control verbs such as (*MARK:NAME), and the execution ends up with
+a name to pass back, a pointer to the name string (zero terminated) is placed
+in the variable pointed to by the <i>mark</i> field. The names are within the
+compiled pattern; if you wish to retain such a name you must copy it before
+freeing the memory of a compiled pattern. If there is no name to pass back, the
+variable pointed to by the <i>mark</i> field is set to NULL. For details of the
+backtracking control verbs, see the section entitled
+<a href="pcrepattern#backtrackcontrol">"Backtracking control"</a>
+in the
+<a href="pcrepattern.html"><b>pcrepattern</b></a>
+documentation.
+<a name="execoptions"></a></P>
+<br><b>
+Option bits for <b>pcre_exec()</b>
+</b><br>
+<P>
+The unused bits of the <i>options</i> argument for <b>pcre_exec()</b> must be
+zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_<i>xxx</i>,
+PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART,
+PCRE_NO_START_OPTIMIZE, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_HARD, and
+PCRE_PARTIAL_SOFT.
+</P>
+<P>
+If the pattern was successfully studied with one of the just-in-time (JIT)
+compile options, the only supported options for JIT execution are
+PCRE_NO_UTF8_CHECK, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY,
+PCRE_NOTEMPTY_ATSTART, PCRE_PARTIAL_HARD, and PCRE_PARTIAL_SOFT. If an
+unsupported option is used, JIT execution is disabled and the normal
+interpretive code in <b>pcre_exec()</b> is run.
+<pre>
+ PCRE_ANCHORED
+</pre>
+The PCRE_ANCHORED option limits <b>pcre_exec()</b> to matching at the first
+matching position. If a pattern was compiled with PCRE_ANCHORED, or turned out
+to be anchored by virtue of its contents, it cannot be made unachored at
+matching time.
+<pre>
+ PCRE_BSR_ANYCRLF
+ PCRE_BSR_UNICODE
+</pre>
+These options (which are mutually exclusive) control what the \R escape
+sequence matches. The choice is either to match only CR, LF, or CRLF, or to
+match any Unicode newline sequence. These options override the choice that was
+made or defaulted when the pattern was compiled.
+<pre>
+ PCRE_NEWLINE_CR
+ PCRE_NEWLINE_LF
+ PCRE_NEWLINE_CRLF
+ PCRE_NEWLINE_ANYCRLF
+ PCRE_NEWLINE_ANY
+</pre>
+These options override the newline definition that was chosen or defaulted when
+the pattern was compiled. For details, see the description of
+<b>pcre_compile()</b> above. During matching, the newline choice affects the
+behaviour of the dot, circumflex, and dollar metacharacters. It may also alter
+the way the match position is advanced after a match failure for an unanchored
+pattern.
+</P>
+<P>
+When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF, or PCRE_NEWLINE_ANY is set, and a
+match attempt for an unanchored pattern fails when the current position is at a
+CRLF sequence, and the pattern contains no explicit matches for CR or LF
+characters, the match position is advanced by two characters instead of one, in
+other words, to after the CRLF.
+</P>
+<P>
+The above rule is a compromise that makes the most common cases work as
+expected. For example, if the pattern is .+A (and the PCRE_DOTALL option is not
+set), it does not match the string "\r\nA" because, after failing at the
+start, it skips both the CR and the LF before retrying. However, the pattern
+[\r\n]A does match that string, because it contains an explicit CR or LF
+reference, and so advances only by one character after the first failure.
+</P>
+<P>
+An explicit match for CR of LF is either a literal appearance of one of those
+characters, or one of the \r or \n escape sequences. Implicit matches such as
+[^X] do not count, nor does \s (which includes CR and LF in the characters
+that it matches).
+</P>
+<P>
+Notwithstanding the above, anomalous effects may still occur when CRLF is a
+valid newline sequence and explicit \r or \n escapes appear in the pattern.
+<pre>
+ PCRE_NOTBOL
+</pre>
+This option specifies that first character of the subject string is not the
+beginning of a line, so the circumflex metacharacter should not match before
+it. Setting this without PCRE_MULTILINE (at compile time) causes circumflex
+never to match. This option affects only the behaviour of the circumflex
+metacharacter. It does not affect \A.
+<pre>
+ PCRE_NOTEOL
+</pre>
+This option specifies that the end of the subject string is not the end of a
+line, so the dollar metacharacter should not match it nor (except in multiline
+mode) a newline immediately before it. Setting this without PCRE_MULTILINE (at
+compile time) causes dollar never to match. This option affects only the
+behaviour of the dollar metacharacter. It does not affect \Z or \z.
+<pre>
+ PCRE_NOTEMPTY
+</pre>
+An empty string is not considered to be a valid match if this option is set. If
+there are alternatives in the pattern, they are tried. If all the alternatives
+match the empty string, the entire match fails. For example, if the pattern
+<pre>
+ a?b?
+</pre>
+is applied to a string not beginning with "a" or "b", it matches an empty
+string at the start of the subject. With PCRE_NOTEMPTY set, this match is not
+valid, so PCRE searches further into the string for occurrences of "a" or "b".
+<pre>
+ PCRE_NOTEMPTY_ATSTART
+</pre>
+This is like PCRE_NOTEMPTY, except that an empty string match that is not at
+the start of the subject is permitted. If the pattern is anchored, such a match
+can occur only if the pattern contains \K.
+</P>
+<P>
+Perl has no direct equivalent of PCRE_NOTEMPTY or PCRE_NOTEMPTY_ATSTART, but it
+does make a special case of a pattern match of the empty string within its
+<b>split()</b> function, and when using the /g modifier. It is possible to
+emulate Perl's behaviour after matching a null string by first trying the match
+again at the same offset with PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED, and then
+if that fails, by advancing the starting offset (see below) and trying an
+ordinary match again. There is some code that demonstrates how to do this in
+the
+<a href="pcredemo.html"><b>pcredemo</b></a>
+sample program. In the most general case, you have to check to see if the
+newline convention recognizes CRLF as a newline, and if so, and the current
+character is CR followed by LF, advance the starting offset by two characters
+instead of one.
+<pre>
+ PCRE_NO_START_OPTIMIZE
+</pre>
+There are a number of optimizations that <b>pcre_exec()</b> uses at the start of
+a match, in order to speed up the process. For example, if it is known that an
+unanchored match must start with a specific character, it searches the subject
+for that character, and fails immediately if it cannot find it, without
+actually running the main matching function. This means that a special item
+such as (*COMMIT) at the start of a pattern is not considered until after a
+suitable starting point for the match has been found. Also, when callouts or
+(*MARK) items are in use, these "start-up" optimizations can cause them to be
+skipped if the pattern is never actually used. The start-up optimizations are
+in effect a pre-scan of the subject that takes place before the pattern is run.
+</P>
+<P>
+The PCRE_NO_START_OPTIMIZE option disables the start-up optimizations, possibly
+causing performance to suffer, but ensuring that in cases where the result is
+"no match", the callouts do occur, and that items such as (*COMMIT) and (*MARK)
+are considered at every possible starting position in the subject string. If
+PCRE_NO_START_OPTIMIZE is set at compile time, it cannot be unset at matching
+time. The use of PCRE_NO_START_OPTIMIZE at matching time (that is, passing it
+to <b>pcre_exec()</b>) disables JIT execution; in this situation, matching is
+always done using interpretively.
+</P>
+<P>
+Setting PCRE_NO_START_OPTIMIZE can change the outcome of a matching operation.
+Consider the pattern
+<pre>
+ (*COMMIT)ABC
+</pre>
+When this is compiled, PCRE records the fact that a match must start with the
+character "A". Suppose the subject string is "DEFABC". The start-up
+optimization scans along the subject, finds "A" and runs the first match
+attempt from there. The (*COMMIT) item means that the pattern must match the
+current starting position, which in this case, it does. However, if the same
+match is run with PCRE_NO_START_OPTIMIZE set, the initial scan along the
+subject string does not happen. The first match attempt is run starting from
+"D" and when this fails, (*COMMIT) prevents any further matches being tried, so
+the overall result is "no match". If the pattern is studied, more start-up
+optimizations may be used. For example, a minimum length for the subject may be
+recorded. Consider the pattern
+<pre>
+ (*MARK:A)(X|Y)
+</pre>
+The minimum length for a match is one character. If the subject is "ABC", there
+will be attempts to match "ABC", "BC", "C", and then finally an empty string.
+If the pattern is studied, the final attempt does not take place, because PCRE
+knows that the subject is too short, and so the (*MARK) is never encountered.
+In this case, studying the pattern does not affect the overall match result,
+which is still "no match", but it does affect the auxiliary information that is
+returned.
+<pre>
+ PCRE_NO_UTF8_CHECK
+</pre>
+When PCRE_UTF8 is set at compile time, the validity of the subject as a UTF-8
+string is automatically checked when <b>pcre_exec()</b> is subsequently called.
+The entire string is checked before any other processing takes place. The value
+of <i>startoffset</i> is also checked to ensure that it points to the start of a
+UTF-8 character. There is a discussion about the
+<a href="pcreunicode.html#utf8strings">validity of UTF-8 strings</a>
+in the
+<a href="pcreunicode.html"><b>pcreunicode</b></a>
+page. If an invalid sequence of bytes is found, <b>pcre_exec()</b> returns the
+error PCRE_ERROR_BADUTF8 or, if PCRE_PARTIAL_HARD is set and the problem is a
+truncated character at the end of the subject, PCRE_ERROR_SHORTUTF8. In both
+cases, information about the precise nature of the error may also be returned
+(see the descriptions of these errors in the section entitled \fIError return
+values from\fP <b>pcre_exec()</b>
+<a href="#errorlist">below).</a>
+If <i>startoffset</i> contains a value that does not point to the start of a
+UTF-8 character (or to the end of the subject), PCRE_ERROR_BADUTF8_OFFSET is
+returned.
+</P>
+<P>
+If you already know that your subject is valid, and you want to skip these
+checks for performance reasons, you can set the PCRE_NO_UTF8_CHECK option when
+calling <b>pcre_exec()</b>. You might want to do this for the second and
+subsequent calls to <b>pcre_exec()</b> if you are making repeated calls to find
+all the matches in a single subject string. However, you should be sure that
+the value of <i>startoffset</i> points to the start of a character (or the end
+of the subject). When PCRE_NO_UTF8_CHECK is set, the effect of passing an
+invalid string as a subject or an invalid value of <i>startoffset</i> is
+undefined. Your program may crash or loop.
+<pre>
+ PCRE_PARTIAL_HARD
+ PCRE_PARTIAL_SOFT
+</pre>
+These options turn on the partial matching feature. For backwards
+compatibility, PCRE_PARTIAL is a synonym for PCRE_PARTIAL_SOFT. A partial match
+occurs if the end of the subject string is reached successfully, but there are
+not enough subject characters to complete the match. If this happens when
+PCRE_PARTIAL_SOFT (but not PCRE_PARTIAL_HARD) is set, matching continues by
+testing any remaining alternatives. Only if no complete match can be found is
+PCRE_ERROR_PARTIAL returned instead of PCRE_ERROR_NOMATCH. In other words,
+PCRE_PARTIAL_SOFT says that the caller is prepared to handle a partial match,
+but only if no complete match can be found.
+</P>
+<P>
+If PCRE_PARTIAL_HARD is set, it overrides PCRE_PARTIAL_SOFT. In this case, if a
+partial match is found, <b>pcre_exec()</b> immediately returns
+PCRE_ERROR_PARTIAL, without considering any other alternatives. In other words,
+when PCRE_PARTIAL_HARD is set, a partial match is considered to be more
+important that an alternative complete match.
+</P>
+<P>
+In both cases, the portion of the string that was inspected when the partial
+match was found is set as the first matching string. There is a more detailed
+discussion of partial and multi-segment matching, with examples, in the
+<a href="pcrepartial.html"><b>pcrepartial</b></a>
+documentation.
+</P>
+<br><b>
+The string to be matched by <b>pcre_exec()</b>
+</b><br>
+<P>
+The subject string is passed to <b>pcre_exec()</b> as a pointer in
+<i>subject</i>, a length in <i>length</i>, and a starting offset in
+<i>startoffset</i>. The units for <i>length</i> and <i>startoffset</i> are bytes
+for the 8-bit library, 16-bit data items for the 16-bit library, and 32-bit
+data items for the 32-bit library.
+</P>
+<P>
+If <i>startoffset</i> is negative or greater than the length of the subject,
+<b>pcre_exec()</b> returns PCRE_ERROR_BADOFFSET. When the starting offset is
+zero, the search for a match starts at the beginning of the subject, and this
+is by far the most common case. In UTF-8 or UTF-16 mode, the offset must point
+to the start of a character, or the end of the subject (in UTF-32 mode, one
+data unit equals one character, so all offsets are valid). Unlike the pattern
+string, the subject may contain binary zeroes.
+</P>
+<P>
+A non-zero starting offset is useful when searching for another match in the
+same subject by calling <b>pcre_exec()</b> again after a previous success.
+Setting <i>startoffset</i> differs from just passing over a shortened string and
+setting PCRE_NOTBOL in the case of a pattern that begins with any kind of
+lookbehind. For example, consider the pattern
+<pre>
+ \Biss\B
+</pre>
+which finds occurrences of "iss" in the middle of words. (\B matches only if
+the current position in the subject is not a word boundary.) When applied to
+the string "Mississipi" the first call to <b>pcre_exec()</b> finds the first
+occurrence. If <b>pcre_exec()</b> is called again with just the remainder of the
+subject, namely "issipi", it does not match, because \B is always false at the
+start of the subject, which is deemed to be a word boundary. However, if
+<b>pcre_exec()</b> is passed the entire string again, but with <i>startoffset</i>
+set to 4, it finds the second occurrence of "iss" because it is able to look
+behind the starting point to discover that it is preceded by a letter.
+</P>
+<P>
+Finding all the matches in a subject is tricky when the pattern can match an
+empty string. It is possible to emulate Perl's /g behaviour by first trying the
+match again at the same offset, with the PCRE_NOTEMPTY_ATSTART and
+PCRE_ANCHORED options, and then if that fails, advancing the starting offset
+and trying an ordinary match again. There is some code that demonstrates how to
+do this in the
+<a href="pcredemo.html"><b>pcredemo</b></a>
+sample program. In the most general case, you have to check to see if the
+newline convention recognizes CRLF as a newline, and if so, and the current
+character is CR followed by LF, advance the starting offset by two characters
+instead of one.
+</P>
+<P>
+If a non-zero starting offset is passed when the pattern is anchored, one
+attempt to match at the given offset is made. This can only succeed if the
+pattern does not require the match to be at the start of the subject.
+</P>
+<br><b>
+How <b>pcre_exec()</b> returns captured substrings
+</b><br>
+<P>
+In general, a pattern matches a certain portion of the subject, and in
+addition, further substrings from the subject may be picked out by parts of the
+pattern. Following the usage in Jeffrey Friedl's book, this is called
+"capturing" in what follows, and the phrase "capturing subpattern" is used for
+a fragment of a pattern that picks out a substring. PCRE supports several other
+kinds of parenthesized subpattern that do not cause substrings to be captured.
+</P>
+<P>
+Captured substrings are returned to the caller via a vector of integers whose
+address is passed in <i>ovector</i>. The number of elements in the vector is
+passed in <i>ovecsize</i>, which must be a non-negative number. <b>Note</b>: this
+argument is NOT the size of <i>ovector</i> in bytes.
+</P>
+<P>
+The first two-thirds of the vector is used to pass back captured substrings,
+each substring using a pair of integers. The remaining third of the vector is
+used as workspace by <b>pcre_exec()</b> while matching capturing subpatterns,
+and is not available for passing back information. The number passed in
+<i>ovecsize</i> should always be a multiple of three. If it is not, it is
+rounded down.
+</P>
+<P>
+When a match is successful, information about captured substrings is returned
+in pairs of integers, starting at the beginning of <i>ovector</i>, and
+continuing up to two-thirds of its length at the most. The first element of
+each pair is set to the offset of the first character in a substring, and the
+second is set to the offset of the first character after the end of a
+substring. These values are always data unit offsets, even in UTF mode. They
+are byte offsets in the 8-bit library, 16-bit data item offsets in the 16-bit
+library, and 32-bit data item offsets in the 32-bit library. <b>Note</b>: they
+are not character counts.
+</P>
+<P>
+The first pair of integers, <i>ovector[0]</i> and <i>ovector[1]</i>, identify the
+portion of the subject string matched by the entire pattern. The next pair is
+used for the first capturing subpattern, and so on. The value returned by
+<b>pcre_exec()</b> is one more than the highest numbered pair that has been set.
+For example, if two substrings have been captured, the returned value is 3. If
+there are no capturing subpatterns, the return value from a successful match is
+1, indicating that just the first pair of offsets has been set.
+</P>
+<P>
+If a capturing subpattern is matched repeatedly, it is the last portion of the
+string that it matched that is returned.
+</P>
+<P>
+If the vector is too small to hold all the captured substring offsets, it is
+used as far as possible (up to two-thirds of its length), and the function
+returns a value of zero. If neither the actual string matched nor any captured
+substrings are of interest, <b>pcre_exec()</b> may be called with <i>ovector</i>
+passed as NULL and <i>ovecsize</i> as zero. However, if the pattern contains
+back references and the <i>ovector</i> is not big enough to remember the related
+substrings, PCRE has to get additional memory for use during matching. Thus it
+is usually advisable to supply an <i>ovector</i> of reasonable size.
+</P>
+<P>
+There are some cases where zero is returned (indicating vector overflow) when
+in fact the vector is exactly the right size for the final match. For example,
+consider the pattern
+<pre>
+ (a)(?:(b)c|bd)
+</pre>
+If a vector of 6 elements (allowing for only 1 captured substring) is given
+with subject string "abd", <b>pcre_exec()</b> will try to set the second
+captured string, thereby recording a vector overflow, before failing to match
+"c" and backing up to try the second alternative. The zero return, however,
+does correctly indicate that the maximum number of slots (namely 2) have been
+filled. In similar cases where there is temporary overflow, but the final
+number of used slots is actually less than the maximum, a non-zero value is
+returned.
+</P>
+<P>
+The <b>pcre_fullinfo()</b> function can be used to find out how many capturing
+subpatterns there are in a compiled pattern. The smallest size for
+<i>ovector</i> that will allow for <i>n</i> captured substrings, in addition to
+the offsets of the substring matched by the whole pattern, is (<i>n</i>+1)*3.
+</P>
+<P>
+It is possible for capturing subpattern number <i>n+1</i> to match some part of
+the subject when subpattern <i>n</i> has not been used at all. For example, if
+the string "abc" is matched against the pattern (a|(z))(bc) the return from the
+function is 4, and subpatterns 1 and 3 are matched, but 2 is not. When this
+happens, both values in the offset pairs corresponding to unused subpatterns
+are set to -1.
+</P>
+<P>
+Offset values that correspond to unused subpatterns at the end of the
+expression are also set to -1. For example, if the string "abc" is matched
+against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not matched. The
+return from the function is 2, because the highest used capturing subpattern
+number is 1, and the offsets for for the second and third capturing subpatterns
+(assuming the vector is large enough, of course) are set to -1.
+</P>
+<P>
+<b>Note</b>: Elements in the first two-thirds of <i>ovector</i> that do not
+correspond to capturing parentheses in the pattern are never changed. That is,
+if a pattern contains <i>n</i> capturing parentheses, no more than
+<i>ovector[0]</i> to <i>ovector[2n+1]</i> are set by <b>pcre_exec()</b>. The other
+elements (in the first two-thirds) retain whatever values they previously had.
+</P>
+<P>
+Some convenience functions are provided for extracting the captured substrings
+as separate strings. These are described below.
+<a name="errorlist"></a></P>
+<br><b>
+Error return values from <b>pcre_exec()</b>
+</b><br>
+<P>
+If <b>pcre_exec()</b> fails, it returns a negative number. The following are
+defined in the header file:
+<pre>
+ PCRE_ERROR_NOMATCH (-1)
+</pre>
+The subject string did not match the pattern.
+<pre>
+ PCRE_ERROR_NULL (-2)
+</pre>
+Either <i>code</i> or <i>subject</i> was passed as NULL, or <i>ovector</i> was
+NULL and <i>ovecsize</i> was not zero.
+<pre>
+ PCRE_ERROR_BADOPTION (-3)
+</pre>
+An unrecognized bit was set in the <i>options</i> argument.
+<pre>
+ PCRE_ERROR_BADMAGIC (-4)
+</pre>
+PCRE stores a 4-byte "magic number" at the start of the compiled code, to catch
+the case when it is passed a junk pointer and to detect when a pattern that was
+compiled in an environment of one endianness is run in an environment with the
+other endianness. This is the error that PCRE gives when the magic number is
+not present.
+<pre>
+ PCRE_ERROR_UNKNOWN_OPCODE (-5)
+</pre>
+While running the pattern match, an unknown item was encountered in the
+compiled pattern. This error could be caused by a bug in PCRE or by overwriting
+of the compiled pattern.
+<pre>
+ PCRE_ERROR_NOMEMORY (-6)
+</pre>
+If a pattern contains back references, but the <i>ovector</i> that is passed to
+<b>pcre_exec()</b> is not big enough to remember the referenced substrings, PCRE
+gets a block of memory at the start of matching to use for this purpose. If the
+call via <b>pcre_malloc()</b> fails, this error is given. The memory is
+automatically freed at the end of matching.
+</P>
+<P>
+This error is also given if <b>pcre_stack_malloc()</b> fails in
+<b>pcre_exec()</b>. This can happen only when PCRE has been compiled with
+<b>--disable-stack-for-recursion</b>.
+<pre>
+ PCRE_ERROR_NOSUBSTRING (-7)
+</pre>
+This error is used by the <b>pcre_copy_substring()</b>,
+<b>pcre_get_substring()</b>, and <b>pcre_get_substring_list()</b> functions (see
+below). It is never returned by <b>pcre_exec()</b>.
+<pre>
+ PCRE_ERROR_MATCHLIMIT (-8)
+</pre>
+The backtracking limit, as specified by the <i>match_limit</i> field in a
+<b>pcre_extra</b> structure (or defaulted) was reached. See the description
+above.
+<pre>
+ PCRE_ERROR_CALLOUT (-9)
+</pre>
+This error is never generated by <b>pcre_exec()</b> itself. It is provided for
+use by callout functions that want to yield a distinctive error code. See the
+<a href="pcrecallout.html"><b>pcrecallout</b></a>
+documentation for details.
+<pre>
+ PCRE_ERROR_BADUTF8 (-10)
+</pre>
+A string that contains an invalid UTF-8 byte sequence was passed as a subject,
+and the PCRE_NO_UTF8_CHECK option was not set. If the size of the output vector
+(<i>ovecsize</i>) is at least 2, the byte offset to the start of the the invalid
+UTF-8 character is placed in the first element, and a reason code is placed in
+the second element. The reason codes are listed in the
+<a href="#badutf8reasons">following section.</a>
+For backward compatibility, if PCRE_PARTIAL_HARD is set and the problem is a
+truncated UTF-8 character at the end of the subject (reason codes 1 to 5),
+PCRE_ERROR_SHORTUTF8 is returned instead of PCRE_ERROR_BADUTF8.
+<pre>
+ PCRE_ERROR_BADUTF8_OFFSET (-11)
+</pre>
+The UTF-8 byte sequence that was passed as a subject was checked and found to
+be valid (the PCRE_NO_UTF8_CHECK option was not set), but the value of
+<i>startoffset</i> did not point to the beginning of a UTF-8 character or the
+end of the subject.
+<pre>
+ PCRE_ERROR_PARTIAL (-12)
+</pre>
+The subject string did not match, but it did match partially. See the
+<a href="pcrepartial.html"><b>pcrepartial</b></a>
+documentation for details of partial matching.
+<pre>
+ PCRE_ERROR_BADPARTIAL (-13)
+</pre>
+This code is no longer in use. It was formerly returned when the PCRE_PARTIAL
+option was used with a compiled pattern containing items that were not
+supported for partial matching. From release 8.00 onwards, there are no
+restrictions on partial matching.
+<pre>
+ PCRE_ERROR_INTERNAL (-14)
+</pre>
+An unexpected internal error has occurred. This error could be caused by a bug
+in PCRE or by overwriting of the compiled pattern.
+<pre>
+ PCRE_ERROR_BADCOUNT (-15)
+</pre>
+This error is given if the value of the <i>ovecsize</i> argument is negative.
+<pre>
+ PCRE_ERROR_RECURSIONLIMIT (-21)
+</pre>
+The internal recursion limit, as specified by the <i>match_limit_recursion</i>
+field in a <b>pcre_extra</b> structure (or defaulted) was reached. See the
+description above.
+<pre>
+ PCRE_ERROR_BADNEWLINE (-23)
+</pre>
+An invalid combination of PCRE_NEWLINE_<i>xxx</i> options was given.
+<pre>
+ PCRE_ERROR_BADOFFSET (-24)
+</pre>
+The value of <i>startoffset</i> was negative or greater than the length of the
+subject, that is, the value in <i>length</i>.
+<pre>
+ PCRE_ERROR_SHORTUTF8 (-25)
+</pre>
+This error is returned instead of PCRE_ERROR_BADUTF8 when the subject string
+ends with a truncated UTF-8 character and the PCRE_PARTIAL_HARD option is set.
+Information about the failure is returned as for PCRE_ERROR_BADUTF8. It is in
+fact sufficient to detect this case, but this special error code for
+PCRE_PARTIAL_HARD precedes the implementation of returned information; it is
+retained for backwards compatibility.
+<pre>
+ PCRE_ERROR_RECURSELOOP (-26)
+</pre>
+This error is returned when <b>pcre_exec()</b> detects a recursion loop within
+the pattern. Specifically, it means that either the whole pattern or a
+subpattern has been called recursively for the second time at the same position
+in the subject string. Some simple patterns that might do this are detected and
+faulted at compile time, but more complicated cases, in particular mutual
+recursions between two different subpatterns, cannot be detected until run
+time.
+<pre>
+ PCRE_ERROR_JIT_STACKLIMIT (-27)
+</pre>
+This error is returned when a pattern that was successfully studied using a
+JIT compile option is being matched, but the memory available for the
+just-in-time processing stack is not large enough. See the
+<a href="pcrejit.html"><b>pcrejit</b></a>
+documentation for more details.
+<pre>
+ PCRE_ERROR_BADMODE (-28)
+</pre>
+This error is given if a pattern that was compiled by the 8-bit library is
+passed to a 16-bit or 32-bit library function, or vice versa.
+<pre>
+ PCRE_ERROR_BADENDIANNESS (-29)
+</pre>
+This error is given if a pattern that was compiled and saved is reloaded on a
+host with different endianness. The utility function
+<b>pcre_pattern_to_host_byte_order()</b> can be used to convert such a pattern
+so that it runs on the new host.
+<pre>
+ PCRE_ERROR_JIT_BADOPTION
+</pre>
+This error is returned when a pattern that was successfully studied using a JIT
+compile option is being matched, but the matching mode (partial or complete
+match) does not correspond to any JIT compilation mode. When the JIT fast path
+function is used, this error may be also given for invalid options. See the
+<a href="pcrejit.html"><b>pcrejit</b></a>
+documentation for more details.
+<pre>
+ PCRE_ERROR_BADLENGTH (-32)
+</pre>
+This error is given if <b>pcre_exec()</b> is called with a negative value for
+the <i>length</i> argument.
+</P>
+<P>
+Error numbers -16 to -20, -22, and 30 are not used by <b>pcre_exec()</b>.
+<a name="badutf8reasons"></a></P>
+<br><b>
+Reason codes for invalid UTF-8 strings
+</b><br>
+<P>
+This section applies only to the 8-bit library. The corresponding information
+for the 16-bit and 32-bit libraries is given in the
+<a href="pcre16.html"><b>pcre16</b></a>
+and
+<a href="pcre32.html"><b>pcre32</b></a>
+pages.
+</P>
+<P>
+When <b>pcre_exec()</b> returns either PCRE_ERROR_BADUTF8 or
+PCRE_ERROR_SHORTUTF8, and the size of the output vector (<i>ovecsize</i>) is at
+least 2, the offset of the start of the invalid UTF-8 character is placed in
+the first output vector element (<i>ovector[0]</i>) and a reason code is placed
+in the second element (<i>ovector[1]</i>). The reason codes are given names in
+the <b>pcre.h</b> header file:
+<pre>
+ PCRE_UTF8_ERR1
+ PCRE_UTF8_ERR2
+ PCRE_UTF8_ERR3
+ PCRE_UTF8_ERR4
+ PCRE_UTF8_ERR5
+</pre>
+The string ends with a truncated UTF-8 character; the code specifies how many
+bytes are missing (1 to 5). Although RFC 3629 restricts UTF-8 characters to be
+no longer than 4 bytes, the encoding scheme (originally defined by RFC 2279)
+allows for up to 6 bytes, and this is checked first; hence the possibility of
+4 or 5 missing bytes.
+<pre>
+ PCRE_UTF8_ERR6
+ PCRE_UTF8_ERR7
+ PCRE_UTF8_ERR8
+ PCRE_UTF8_ERR9
+ PCRE_UTF8_ERR10
+</pre>
+The two most significant bits of the 2nd, 3rd, 4th, 5th, or 6th byte of the
+character do not have the binary value 0b10 (that is, either the most
+significant bit is 0, or the next bit is 1).
+<pre>
+ PCRE_UTF8_ERR11
+ PCRE_UTF8_ERR12
+</pre>
+A character that is valid by the RFC 2279 rules is either 5 or 6 bytes long;
+these code points are excluded by RFC 3629.
+<pre>
+ PCRE_UTF8_ERR13
+</pre>
+A 4-byte character has a value greater than 0x10fff; these code points are
+excluded by RFC 3629.
+<pre>
+ PCRE_UTF8_ERR14
+</pre>
+A 3-byte character has a value in the range 0xd800 to 0xdfff; this range of
+code points are reserved by RFC 3629 for use with UTF-16, and so are excluded
+from UTF-8.
+<pre>
+ PCRE_UTF8_ERR15
+ PCRE_UTF8_ERR16
+ PCRE_UTF8_ERR17
+ PCRE_UTF8_ERR18
+ PCRE_UTF8_ERR19
+</pre>
+A 2-, 3-, 4-, 5-, or 6-byte character is "overlong", that is, it codes for a
+value that can be represented by fewer bytes, which is invalid. For example,
+the two bytes 0xc0, 0xae give the value 0x2e, whose correct coding uses just
+one byte.
+<pre>
+ PCRE_UTF8_ERR20
+</pre>
+The two most significant bits of the first byte of a character have the binary
+value 0b10 (that is, the most significant bit is 1 and the second is 0). Such a
+byte can only validly occur as the second or subsequent byte of a multi-byte
+character.
+<pre>
+ PCRE_UTF8_ERR21
+</pre>
+The first byte of a character has the value 0xfe or 0xff. These values can
+never occur in a valid UTF-8 string.
+<pre>
+ PCRE_UTF8_ERR22
+</pre>
+This error code was formerly used when the presence of a so-called
+"non-character" caused an error. Unicode corrigendum #9 makes it clear that
+such characters should not cause a string to be rejected, and so this code is
+no longer in use and is never returned.
+</P>
+<br><a name="SEC18" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br>
+<P>
+<b>int pcre_copy_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, int <i>stringnumber</i>, char *<i>buffer</i>,</b>
+<b> int <i>buffersize</i>);</b>
+<br>
+<br>
+<b>int pcre_get_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, int <i>stringnumber</i>,</b>
+<b> const char **<i>stringptr</i>);</b>
+<br>
+<br>
+<b>int pcre_get_substring_list(const char *<i>subject</i>,</b>
+<b> int *<i>ovector</i>, int <i>stringcount</i>, const char ***<i>listptr</i>);</b>
+</P>
+<P>
+Captured substrings can be accessed directly by using the offsets returned by
+<b>pcre_exec()</b> in <i>ovector</i>. For convenience, the functions
+<b>pcre_copy_substring()</b>, <b>pcre_get_substring()</b>, and
+<b>pcre_get_substring_list()</b> are provided for extracting captured substrings
+as new, separate, zero-terminated strings. These functions identify substrings
+by number. The next section describes functions for extracting named
+substrings.
+</P>
+<P>
+A substring that contains a binary zero is correctly extracted and has a
+further zero added on the end, but the result is not, of course, a C string.
+However, you can process such a string by referring to the length that is
+returned by <b>pcre_copy_substring()</b> and <b>pcre_get_substring()</b>.
+Unfortunately, the interface to <b>pcre_get_substring_list()</b> is not adequate
+for handling strings containing binary zeros, because the end of the final
+string is not independently indicated.
+</P>
+<P>
+The first three arguments are the same for all three of these functions:
+<i>subject</i> is the subject string that has just been successfully matched,
+<i>ovector</i> is a pointer to the vector of integer offsets that was passed to
+<b>pcre_exec()</b>, and <i>stringcount</i> is the number of substrings that were
+captured by the match, including the substring that matched the entire regular
+expression. This is the value returned by <b>pcre_exec()</b> if it is greater
+than zero. If <b>pcre_exec()</b> returned zero, indicating that it ran out of
+space in <i>ovector</i>, the value passed as <i>stringcount</i> should be the
+number of elements in the vector divided by three.
+</P>
+<P>
+The functions <b>pcre_copy_substring()</b> and <b>pcre_get_substring()</b>
+extract a single substring, whose number is given as <i>stringnumber</i>. A
+value of zero extracts the substring that matched the entire pattern, whereas
+higher values extract the captured substrings. For <b>pcre_copy_substring()</b>,
+the string is placed in <i>buffer</i>, whose length is given by
+<i>buffersize</i>, while for <b>pcre_get_substring()</b> a new block of memory is
+obtained via <b>pcre_malloc</b>, and its address is returned via
+<i>stringptr</i>. The yield of the function is the length of the string, not
+including the terminating zero, or one of these error codes:
+<pre>
+ PCRE_ERROR_NOMEMORY (-6)
+</pre>
+The buffer was too small for <b>pcre_copy_substring()</b>, or the attempt to get
+memory failed for <b>pcre_get_substring()</b>.
+<pre>
+ PCRE_ERROR_NOSUBSTRING (-7)
+</pre>
+There is no substring whose number is <i>stringnumber</i>.
+</P>
+<P>
+The <b>pcre_get_substring_list()</b> function extracts all available substrings
+and builds a list of pointers to them. All this is done in a single block of
+memory that is obtained via <b>pcre_malloc</b>. The address of the memory block
+is returned via <i>listptr</i>, which is also the start of the list of string
+pointers. The end of the list is marked by a NULL pointer. The yield of the
+function is zero if all went well, or the error code
+<pre>
+ PCRE_ERROR_NOMEMORY (-6)
+</pre>
+if the attempt to get the memory block failed.
+</P>
+<P>
+When any of these functions encounter a substring that is unset, which can
+happen when capturing subpattern number <i>n+1</i> matches some part of the
+subject, but subpattern <i>n</i> has not been used at all, they return an empty
+string. This can be distinguished from a genuine zero-length substring by
+inspecting the appropriate offset in <i>ovector</i>, which is negative for unset
+substrings.
+</P>
+<P>
+The two convenience functions <b>pcre_free_substring()</b> and
+<b>pcre_free_substring_list()</b> can be used to free the memory returned by
+a previous call of <b>pcre_get_substring()</b> or
+<b>pcre_get_substring_list()</b>, respectively. They do nothing more than call
+the function pointed to by <b>pcre_free</b>, which of course could be called
+directly from a C program. However, PCRE is used in some situations where it is
+linked via a special interface to another programming language that cannot use
+<b>pcre_free</b> directly; it is for these cases that the functions are
+provided.
+</P>
+<br><a name="SEC19" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a><br>
+<P>
+<b>int pcre_get_stringnumber(const pcre *<i>code</i>,</b>
+<b> const char *<i>name</i>);</b>
+<br>
+<br>
+<b>int pcre_copy_named_substring(const pcre *<i>code</i>,</b>
+<b> const char *<i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, const char *<i>stringname</i>,</b>
+<b> char *<i>buffer</i>, int <i>buffersize</i>);</b>
+<br>
+<br>
+<b>int pcre_get_named_substring(const pcre *<i>code</i>,</b>
+<b> const char *<i>subject</i>, int *<i>ovector</i>,</b>
+<b> int <i>stringcount</i>, const char *<i>stringname</i>,</b>
+<b> const char **<i>stringptr</i>);</b>
+</P>
+<P>
+To extract a substring by name, you first have to find associated number.
+For example, for this pattern
+<pre>
+ (a+)b(?&#60;xxx&#62;\d+)...
+</pre>
+the number of the subpattern called "xxx" is 2. If the name is known to be
+unique (PCRE_DUPNAMES was not set), you can find the number from the name by
+calling <b>pcre_get_stringnumber()</b>. The first argument is the compiled
+pattern, and the second is the name. The yield of the function is the
+subpattern number, or PCRE_ERROR_NOSUBSTRING (-7) if there is no subpattern of
+that name.
+</P>
+<P>
+Given the number, you can extract the substring directly, or use one of the
+functions described in the previous section. For convenience, there are also
+two functions that do the whole job.
+</P>
+<P>
+Most of the arguments of <b>pcre_copy_named_substring()</b> and
+<b>pcre_get_named_substring()</b> are the same as those for the similarly named
+functions that extract by number. As these are described in the previous
+section, they are not re-described here. There are just two differences:
+</P>
+<P>
+First, instead of a substring number, a substring name is given. Second, there
+is an extra argument, given at the start, which is a pointer to the compiled
+pattern. This is needed in order to gain access to the name-to-number
+translation table.
+</P>
+<P>
+These functions call <b>pcre_get_stringnumber()</b>, and if it succeeds, they
+then call <b>pcre_copy_substring()</b> or <b>pcre_get_substring()</b>, as
+appropriate. <b>NOTE:</b> If PCRE_DUPNAMES is set and there are duplicate names,
+the behaviour may not be what you want (see the next section).
+</P>
+<P>
+<b>Warning:</b> If the pattern uses the (?| feature to set up multiple
+subpatterns with the same number, as described in the
+<a href="pcrepattern.html#dupsubpatternnumber">section on duplicate subpattern numbers</a>
+in the
+<a href="pcrepattern.html"><b>pcrepattern</b></a>
+page, you cannot use names to distinguish the different subpatterns, because
+names are not included in the compiled code. The matching process uses only
+numbers. For this reason, the use of different names for subpatterns of the
+same number causes an error at compile time.
+</P>
+<br><a name="SEC20" href="#TOC1">DUPLICATE SUBPATTERN NAMES</a><br>
+<P>
+<b>int pcre_get_stringtable_entries(const pcre *<i>code</i>,</b>
+<b> const char *<i>name</i>, char **<i>first</i>, char **<i>last</i>);</b>
+</P>
+<P>
+When a pattern is compiled with the PCRE_DUPNAMES option, names for subpatterns
+are not required to be unique. (Duplicate names are always allowed for
+subpatterns with the same number, created by using the (?| feature. Indeed, if
+such subpatterns are named, they are required to use the same names.)
+</P>
+<P>
+Normally, patterns with duplicate names are such that in any one match, only
+one of the named subpatterns participates. An example is shown in the
+<a href="pcrepattern.html"><b>pcrepattern</b></a>
+documentation.
+</P>
+<P>
+When duplicates are present, <b>pcre_copy_named_substring()</b> and
+<b>pcre_get_named_substring()</b> return the first substring corresponding to
+the given name that is set. If none are set, PCRE_ERROR_NOSUBSTRING (-7) is
+returned; no data is returned. The <b>pcre_get_stringnumber()</b> function
+returns one of the numbers that are associated with the name, but it is not
+defined which it is.
+</P>
+<P>
+If you want to get full details of all captured substrings for a given name,
+you must use the <b>pcre_get_stringtable_entries()</b> function. The first
+argument is the compiled pattern, and the second is the name. The third and
+fourth are pointers to variables which are updated by the function. After it
+has run, they point to the first and last entries in the name-to-number table
+for the given name. The function itself returns the length of each entry, or
+PCRE_ERROR_NOSUBSTRING (-7) if there are none. The format of the table is
+described above in the section entitled <i>Information about a pattern</i>
+<a href="#infoaboutpattern">above.</a>
+Given all the relevant entries for the name, you can extract each of their
+numbers, and hence the captured data, if any.
+</P>
+<br><a name="SEC21" href="#TOC1">FINDING ALL POSSIBLE MATCHES</a><br>
+<P>
+The traditional matching function uses a similar algorithm to Perl, which stops
+when it finds the first match, starting at a given point in the subject. If you
+want to find all possible matches, or the longest possible match, consider
+using the alternative matching function (see below) instead. If you cannot use
+the alternative function, but still need to find all possible matches, you
+can kludge it up by making use of the callout facility, which is described in
+the
+<a href="pcrecallout.html"><b>pcrecallout</b></a>
+documentation.
+</P>
+<P>
+What you have to do is to insert a callout right at the end of the pattern.
+When your callout function is called, extract and save the current matched
+substring. Then return 1, which forces <b>pcre_exec()</b> to backtrack and try
+other alternatives. Ultimately, when it runs out of matches, <b>pcre_exec()</b>
+will yield PCRE_ERROR_NOMATCH.
+</P>
+<br><a name="SEC22" href="#TOC1">OBTAINING AN ESTIMATE OF STACK USAGE</a><br>
+<P>
+Matching certain patterns using <b>pcre_exec()</b> can use a lot of process
+stack, which in certain environments can be rather limited in size. Some users
+find it helpful to have an estimate of the amount of stack that is used by
+<b>pcre_exec()</b>, to help them set recursion limits, as described in the
+<a href="pcrestack.html"><b>pcrestack</b></a>
+documentation. The estimate that is output by <b>pcretest</b> when called with
+the <b>-m</b> and <b>-C</b> options is obtained by calling <b>pcre_exec</b> with
+the values NULL, NULL, NULL, -999, and -999 for its first five arguments.
+</P>
+<P>
+Normally, if its first argument is NULL, <b>pcre_exec()</b> immediately returns
+the negative error code PCRE_ERROR_NULL, but with this special combination of
+arguments, it returns instead a negative number whose absolute value is the
+approximate stack frame size in bytes. (A negative number is used so that it is
+clear that no match has happened.) The value is approximate because in some
+cases, recursive calls to <b>pcre_exec()</b> occur when there are one or two
+additional variables on the stack.
+</P>
+<P>
+If PCRE has been compiled to use the heap instead of the stack for recursion,
+the value returned is the size of each block that is obtained from the heap.
+<a name="dfamatch"></a></P>
+<br><a name="SEC23" href="#TOC1">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a><br>
+<P>
+<b>int pcre_dfa_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
+<b> const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
+<b> int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
+<b> int *<i>workspace</i>, int <i>wscount</i>);</b>
+</P>
+<P>
+The function <b>pcre_dfa_exec()</b> is called to match a subject string against
+a compiled pattern, using a matching algorithm that scans the subject string
+just once, and does not backtrack. This has different characteristics to the
+normal algorithm, and is not compatible with Perl. Some of the features of PCRE
+patterns are not supported. Nevertheless, there are times when this kind of
+matching can be useful. For a discussion of the two matching algorithms, and a
+list of features that <b>pcre_dfa_exec()</b> does not support, see the
+<a href="pcrematching.html"><b>pcrematching</b></a>
+documentation.
+</P>
+<P>
+The arguments for the <b>pcre_dfa_exec()</b> function are the same as for
+<b>pcre_exec()</b>, plus two extras. The <i>ovector</i> argument is used in a
+different way, and this is described below. The other common arguments are used
+in the same way as for <b>pcre_exec()</b>, so their description is not repeated
+here.
+</P>
+<P>
+The two additional arguments provide workspace for the function. The workspace
+vector should contain at least 20 elements. It is used for keeping track of
+multiple paths through the pattern tree. More workspace will be needed for
+patterns and subjects where there are a lot of potential matches.
+</P>
+<P>
+Here is an example of a simple call to <b>pcre_dfa_exec()</b>:
+<pre>
+ int rc;
+ int ovector[10];
+ int wspace[20];
+ rc = pcre_dfa_exec(
+ re, /* result of pcre_compile() */
+ NULL, /* we didn't study the pattern */
+ "some string", /* the subject string */
+ 11, /* the length of the subject string */
+ 0, /* start at offset 0 in the subject */
+ 0, /* default options */
+ ovector, /* vector of integers for substring information */
+ 10, /* number of elements (NOT size in bytes) */
+ wspace, /* working space vector */
+ 20); /* number of elements (NOT size in bytes) */
+</PRE>
+</P>
+<br><b>
+Option bits for <b>pcre_dfa_exec()</b>
+</b><br>
+<P>
+The unused bits of the <i>options</i> argument for <b>pcre_dfa_exec()</b> must be
+zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_<i>xxx</i>,
+PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART,
+PCRE_NO_UTF8_CHECK, PCRE_BSR_ANYCRLF, PCRE_BSR_UNICODE, PCRE_NO_START_OPTIMIZE,
+PCRE_PARTIAL_HARD, PCRE_PARTIAL_SOFT, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART.
+All but the last four of these are exactly the same as for <b>pcre_exec()</b>,
+so their description is not repeated here.
+<pre>
+ PCRE_PARTIAL_HARD
+ PCRE_PARTIAL_SOFT
+</pre>
+These have the same general effect as they do for <b>pcre_exec()</b>, but the
+details are slightly different. When PCRE_PARTIAL_HARD is set for
+<b>pcre_dfa_exec()</b>, it returns PCRE_ERROR_PARTIAL if the end of the subject
+is reached and there is still at least one matching possibility that requires
+additional characters. This happens even if some complete matches have also
+been found. When PCRE_PARTIAL_SOFT is set, the return code PCRE_ERROR_NOMATCH
+is converted into PCRE_ERROR_PARTIAL if the end of the subject is reached,
+there have been no complete matches, but there is still at least one matching
+possibility. The portion of the string that was inspected when the longest
+partial match was found is set as the first matching string in both cases.
+There is a more detailed discussion of partial and multi-segment matching, with
+examples, in the
+<a href="pcrepartial.html"><b>pcrepartial</b></a>
+documentation.
+<pre>
+ PCRE_DFA_SHORTEST
+</pre>
+Setting the PCRE_DFA_SHORTEST option causes the matching algorithm to stop as
+soon as it has found one match. Because of the way the alternative algorithm
+works, this is necessarily the shortest possible match at the first possible
+matching point in the subject string.
+<pre>
+ PCRE_DFA_RESTART
+</pre>
+When <b>pcre_dfa_exec()</b> returns a partial match, it is possible to call it
+again, with additional subject characters, and have it continue with the same
+match. The PCRE_DFA_RESTART option requests this action; when it is set, the
+<i>workspace</i> and <i>wscount</i> options must reference the same vector as
+before because data about the match so far is left in them after a partial
+match. There is more discussion of this facility in the
+<a href="pcrepartial.html"><b>pcrepartial</b></a>
+documentation.
+</P>
+<br><b>
+Successful returns from <b>pcre_dfa_exec()</b>
+</b><br>
+<P>
+When <b>pcre_dfa_exec()</b> succeeds, it may have matched more than one
+substring in the subject. Note, however, that all the matches from one run of
+the function start at the same point in the subject. The shorter matches are
+all initial substrings of the longer matches. For example, if the pattern
+<pre>
+ &#60;.*&#62;
+</pre>
+is matched against the string
+<pre>
+ This is &#60;something&#62; &#60;something else&#62; &#60;something further&#62; no more
+</pre>
+the three matched strings are
+<pre>
+ &#60;something&#62;
+ &#60;something&#62; &#60;something else&#62;
+ &#60;something&#62; &#60;something else&#62; &#60;something further&#62;
+</pre>
+On success, the yield of the function is a number greater than zero, which is
+the number of matched substrings. The substrings themselves are returned in
+<i>ovector</i>. Each string uses two elements; the first is the offset to the
+start, and the second is the offset to the end. In fact, all the strings have
+the same start offset. (Space could have been saved by giving this only once,
+but it was decided to retain some compatibility with the way <b>pcre_exec()</b>
+returns data, even though the meaning of the strings is different.)
+</P>
+<P>
+The strings are returned in reverse order of length; that is, the longest
+matching string is given first. If there were too many matches to fit into
+<i>ovector</i>, the yield of the function is zero, and the vector is filled with
+the longest matches. Unlike <b>pcre_exec()</b>, <b>pcre_dfa_exec()</b> can use
+the entire <i>ovector</i> for returning matched strings.
+</P>
+<P>
+NOTE: PCRE's "auto-possessification" optimization usually applies to character
+repeats at the end of a pattern (as well as internally). For example, the
+pattern "a\d+" is compiled as if it were "a\d++" because there is no point
+even considering the possibility of backtracking into the repeated digits. For
+DFA matching, this means that only one possible match is found. If you really
+do want multiple matches in such cases, either use an ungreedy repeat
+("a\d+?") or set the PCRE_NO_AUTO_POSSESS option when compiling.
+</P>
+<br><b>
+Error returns from <b>pcre_dfa_exec()</b>
+</b><br>
+<P>
+The <b>pcre_dfa_exec()</b> function returns a negative number when it fails.
+Many of the errors are the same as for <b>pcre_exec()</b>, and these are
+described
+<a href="#errorlist">above.</a>
+There are in addition the following errors that are specific to
+<b>pcre_dfa_exec()</b>:
+<pre>
+ PCRE_ERROR_DFA_UITEM (-16)
+</pre>
+This return is given if <b>pcre_dfa_exec()</b> encounters an item in the pattern
+that it does not support, for instance, the use of \C or a back reference.
+<pre>
+ PCRE_ERROR_DFA_UCOND (-17)
+</pre>
+This return is given if <b>pcre_dfa_exec()</b> encounters a condition item that
+uses a back reference for the condition, or a test for recursion in a specific
+group. These are not supported.
+<pre>
+ PCRE_ERROR_DFA_UMLIMIT (-18)
+</pre>
+This return is given if <b>pcre_dfa_exec()</b> is called with an <i>extra</i>
+block that contains a setting of the <i>match_limit</i> or
+<i>match_limit_recursion</i> fields. This is not supported (these fields are
+meaningless for DFA matching).
+<pre>
+ PCRE_ERROR_DFA_WSSIZE (-19)
+</pre>
+This return is given if <b>pcre_dfa_exec()</b> runs out of space in the
+<i>workspace</i> vector.
+<pre>
+ PCRE_ERROR_DFA_RECURSE (-20)
+</pre>
+When a recursive subpattern is processed, the matching function calls itself
+recursively, using private vectors for <i>ovector</i> and <i>workspace</i>. This
+error is given if the output vector is not large enough. This should be
+extremely rare, as a vector of size 1000 is used.
+<pre>
+ PCRE_ERROR_DFA_BADRESTART (-30)
+</pre>
+When <b>pcre_dfa_exec()</b> is called with the <b>PCRE_DFA_RESTART</b> option,
+some plausibility checks are made on the contents of the workspace, which
+should contain data about the previous partial match. If any of these checks
+fail, this error is given.
+</P>
+<br><a name="SEC24" href="#TOC1">SEE ALSO</a><br>
+<P>
+<b>pcre16</b>(3), <b>pcre32</b>(3), <b>pcrebuild</b>(3), <b>pcrecallout</b>(3),
+<b>pcrecpp(3)</b>(3), <b>pcrematching</b>(3), <b>pcrepartial</b>(3),
+<b>pcreposix</b>(3), <b>pcreprecompile</b>(3), <b>pcresample</b>(3),
+<b>pcrestack</b>(3).
+</P>
+<br><a name="SEC25" href="#TOC1">AUTHOR</a><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><a name="SEC26" href="#TOC1">REVISION</a><br>
+<P>
+Last updated: 09 February 2014
+<br>
+Copyright &copy; 1997-2014 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcrebuild.html b/doc/html/pcrebuild.html
new file mode 100644
index 0000000..03c8cbe
--- /dev/null
+++ b/doc/html/pcrebuild.html
@@ -0,0 +1,534 @@
+<html>
+<head>
+<title>pcrebuild specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcrebuild man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<ul>
+<li><a name="TOC1" href="#SEC1">BUILDING PCRE</a>
+<li><a name="TOC2" href="#SEC2">PCRE BUILD-TIME OPTIONS</a>
+<li><a name="TOC3" href="#SEC3">BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a>
+<li><a name="TOC4" href="#SEC4">BUILDING SHARED AND STATIC LIBRARIES</a>
+<li><a name="TOC5" href="#SEC5">C++ SUPPORT</a>
+<li><a name="TOC6" href="#SEC6">UTF-8, UTF-16 AND UTF-32 SUPPORT</a>
+<li><a name="TOC7" href="#SEC7">UNICODE CHARACTER PROPERTY SUPPORT</a>
+<li><a name="TOC8" href="#SEC8">JUST-IN-TIME COMPILER SUPPORT</a>
+<li><a name="TOC9" href="#SEC9">CODE VALUE OF NEWLINE</a>
+<li><a name="TOC10" href="#SEC10">WHAT \R MATCHES</a>
+<li><a name="TOC11" href="#SEC11">POSIX MALLOC USAGE</a>
+<li><a name="TOC12" href="#SEC12">HANDLING VERY LARGE PATTERNS</a>
+<li><a name="TOC13" href="#SEC13">AVOIDING EXCESSIVE STACK USAGE</a>
+<li><a name="TOC14" href="#SEC14">LIMITING PCRE RESOURCE USAGE</a>
+<li><a name="TOC15" href="#SEC15">CREATING CHARACTER TABLES AT BUILD TIME</a>
+<li><a name="TOC16" href="#SEC16">USING EBCDIC CODE</a>
+<li><a name="TOC17" href="#SEC17">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a>
+<li><a name="TOC18" href="#SEC18">PCREGREP BUFFER SIZE</a>
+<li><a name="TOC19" href="#SEC19">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a>
+<li><a name="TOC20" href="#SEC20">DEBUGGING WITH VALGRIND SUPPORT</a>
+<li><a name="TOC21" href="#SEC21">CODE COVERAGE REPORTING</a>
+<li><a name="TOC22" href="#SEC22">SEE ALSO</a>
+<li><a name="TOC23" href="#SEC23">AUTHOR</a>
+<li><a name="TOC24" href="#SEC24">REVISION</a>
+</ul>
+<br><a name="SEC1" href="#TOC1">BUILDING PCRE</a><br>
+<P>
+PCRE is distributed with a <b>configure</b> script that can be used to build the
+library in Unix-like environments using the applications known as Autotools.
+Also in the distribution are files to support building using <b>CMake</b>
+instead of <b>configure</b>. The text file
+<a href="README.txt"><b>README</b></a>
+contains general information about building with Autotools (some of which is
+repeated below), and also has some comments about building on various operating
+systems. There is a lot more information about building PCRE without using
+Autotools (including information about using <b>CMake</b> and building "by
+hand") in the text file called
+<a href="NON-AUTOTOOLS-BUILD.txt"><b>NON-AUTOTOOLS-BUILD</b>.</a>
+You should consult this file as well as the
+<a href="README.txt"><b>README</b></a>
+file if you are building in a non-Unix-like environment.
+</P>
+<br><a name="SEC2" href="#TOC1">PCRE BUILD-TIME OPTIONS</a><br>
+<P>
+The rest of this document describes the optional features of PCRE that can be
+selected when the library is compiled. It assumes use of the <b>configure</b>
+script, where the optional features are selected or deselected by providing
+options to <b>configure</b> before running the <b>make</b> command. However, the
+same options can be selected in both Unix-like and non-Unix-like environments
+using the GUI facility of <b>cmake-gui</b> if you are using <b>CMake</b> instead
+of <b>configure</b> to build PCRE.
+</P>
+<P>
+If you are not using Autotools or <b>CMake</b>, option selection can be done by
+editing the <b>config.h</b> file, or by passing parameter settings to the
+compiler, as described in
+<a href="NON-AUTOTOOLS-BUILD.txt"><b>NON-AUTOTOOLS-BUILD</b>.</a>
+</P>
+<P>
+The complete list of options for <b>configure</b> (which includes the standard
+ones such as the selection of the installation directory) can be obtained by
+running
+<pre>
+ ./configure --help
+</pre>
+The following sections include descriptions of options whose names begin with
+--enable or --disable. These settings specify changes to the defaults for the
+<b>configure</b> command. Because of the way that <b>configure</b> works,
+--enable and --disable always come in pairs, so the complementary option always
+exists as well, but as it specifies the default, it is not described.
+</P>
+<br><a name="SEC3" href="#TOC1">BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br>
+<P>
+By default, a library called <b>libpcre</b> is built, containing functions that
+take string arguments contained in vectors of bytes, either as single-byte
+characters, or interpreted as UTF-8 strings. You can also build a separate
+library, called <b>libpcre16</b>, in which strings are contained in vectors of
+16-bit data units and interpreted either as single-unit characters or UTF-16
+strings, by adding
+<pre>
+ --enable-pcre16
+</pre>
+to the <b>configure</b> command. You can also build yet another separate
+library, called <b>libpcre32</b>, in which strings are contained in vectors of
+32-bit data units and interpreted either as single-unit characters or UTF-32
+strings, by adding
+<pre>
+ --enable-pcre32
+</pre>
+to the <b>configure</b> command. If you do not want the 8-bit library, add
+<pre>
+ --disable-pcre8
+</pre>
+as well. At least one of the three libraries must be built. Note that the C++
+and POSIX wrappers are for the 8-bit library only, and that <b>pcregrep</b> is
+an 8-bit program. None of these are built if you select only the 16-bit or
+32-bit libraries.
+</P>
+<br><a name="SEC4" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br>
+<P>
+The Autotools PCRE building process uses <b>libtool</b> to build both shared and
+static libraries by default. You can suppress one of these by adding one of
+<pre>
+ --disable-shared
+ --disable-static
+</pre>
+to the <b>configure</b> command, as required.
+</P>
+<br><a name="SEC5" href="#TOC1">C++ SUPPORT</a><br>
+<P>
+By default, if the 8-bit library is being built, the <b>configure</b> script
+will search for a C++ compiler and C++ header files. If it finds them, it
+automatically builds the C++ wrapper library (which supports only 8-bit
+strings). You can disable this by adding
+<pre>
+ --disable-cpp
+</pre>
+to the <b>configure</b> command.
+</P>
+<br><a name="SEC6" href="#TOC1">UTF-8, UTF-16 AND UTF-32 SUPPORT</a><br>
+<P>
+To build PCRE with support for UTF Unicode character strings, add
+<pre>
+ --enable-utf
+</pre>
+to the <b>configure</b> command. This setting applies to all three libraries,
+adding support for UTF-8 to the 8-bit library, support for UTF-16 to the 16-bit
+library, and support for UTF-32 to the to the 32-bit library. There are no
+separate options for enabling UTF-8, UTF-16 and UTF-32 independently because
+that would allow ridiculous settings such as requesting UTF-16 support while
+building only the 8-bit library. It is not possible to build one library with
+UTF support and another without in the same configuration. (For backwards
+compatibility, --enable-utf8 is a synonym of --enable-utf.)
+</P>
+<P>
+Of itself, this setting does not make PCRE treat strings as UTF-8, UTF-16 or
+UTF-32. As well as compiling PCRE with this option, you also have have to set
+the PCRE_UTF8, PCRE_UTF16 or PCRE_UTF32 option (as appropriate) when you call
+one of the pattern compiling functions.
+</P>
+<P>
+If you set --enable-utf when compiling in an EBCDIC environment, PCRE expects
+its input to be either ASCII or UTF-8 (depending on the run-time option). It is
+not possible to support both EBCDIC and UTF-8 codes in the same version of the
+library. Consequently, --enable-utf and --enable-ebcdic are mutually
+exclusive.
+</P>
+<br><a name="SEC7" href="#TOC1">UNICODE CHARACTER PROPERTY SUPPORT</a><br>
+<P>
+UTF support allows the libraries to process character codepoints up to 0x10ffff
+in the strings that they handle. On its own, however, it does not provide any
+facilities for accessing the properties of such characters. If you want to be
+able to use the pattern escapes \P, \p, and \X, which refer to Unicode
+character properties, you must add
+<pre>
+ --enable-unicode-properties
+</pre>
+to the <b>configure</b> command. This implies UTF support, even if you have
+not explicitly requested it.
+</P>
+<P>
+Including Unicode property support adds around 30K of tables to the PCRE
+library. Only the general category properties such as <i>Lu</i> and <i>Nd</i> are
+supported. Details are given in the
+<a href="pcrepattern.html"><b>pcrepattern</b></a>
+documentation.
+</P>
+<br><a name="SEC8" href="#TOC1">JUST-IN-TIME COMPILER SUPPORT</a><br>
+<P>
+Just-in-time compiler support is included in the build by specifying
+<pre>
+ --enable-jit
+</pre>
+This support is available only for certain hardware architectures. If this
+option is set for an unsupported architecture, a compile time error occurs.
+See the
+<a href="pcrejit.html"><b>pcrejit</b></a>
+documentation for a discussion of JIT usage. When JIT support is enabled,
+pcregrep automatically makes use of it, unless you add
+<pre>
+ --disable-pcregrep-jit
+</pre>
+to the "configure" command.
+</P>
+<br><a name="SEC9" href="#TOC1">CODE VALUE OF NEWLINE</a><br>
+<P>
+By default, PCRE interprets the linefeed (LF) character as indicating the end
+of a line. This is the normal newline character on Unix-like systems. You can
+compile PCRE to use carriage return (CR) instead, by adding
+<pre>
+ --enable-newline-is-cr
+</pre>
+to the <b>configure</b> command. There is also a --enable-newline-is-lf option,
+which explicitly specifies linefeed as the newline character.
+<br>
+<br>
+Alternatively, you can specify that line endings are to be indicated by the two
+character sequence CRLF. If you want this, add
+<pre>
+ --enable-newline-is-crlf
+</pre>
+to the <b>configure</b> command. There is a fourth option, specified by
+<pre>
+ --enable-newline-is-anycrlf
+</pre>
+which causes PCRE to recognize any of the three sequences CR, LF, or CRLF as
+indicating a line ending. Finally, a fifth option, specified by
+<pre>
+ --enable-newline-is-any
+</pre>
+causes PCRE to recognize any Unicode newline sequence.
+</P>
+<P>
+Whatever line ending convention is selected when PCRE is built can be
+overridden when the library functions are called. At build time it is
+conventional to use the standard for your operating system.
+</P>
+<br><a name="SEC10" href="#TOC1">WHAT \R MATCHES</a><br>
+<P>
+By default, the sequence \R in a pattern matches any Unicode newline sequence,
+whatever has been selected as the line ending sequence. If you specify
+<pre>
+ --enable-bsr-anycrlf
+</pre>
+the default is changed so that \R matches only CR, LF, or CRLF. Whatever is
+selected when PCRE is built can be overridden when the library functions are
+called.
+</P>
+<br><a name="SEC11" href="#TOC1">POSIX MALLOC USAGE</a><br>
+<P>
+When the 8-bit library is called through the POSIX interface (see the
+<a href="pcreposix.html"><b>pcreposix</b></a>
+documentation), additional working storage is required for holding the pointers
+to capturing substrings, because PCRE requires three integers per substring,
+whereas the POSIX interface provides only two. If the number of expected
+substrings is small, the wrapper function uses space on the stack, because this
+is faster than using <b>malloc()</b> for each call. The default threshold above
+which the stack is no longer used is 10; it can be changed by adding a setting
+such as
+<pre>
+ --with-posix-malloc-threshold=20
+</pre>
+to the <b>configure</b> command.
+</P>
+<br><a name="SEC12" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
+<P>
+Within a compiled pattern, offset values are used to point from one part to
+another (for example, from an opening parenthesis to an alternation
+metacharacter). By default, in the 8-bit and 16-bit libraries, two-byte values
+are used for these offsets, leading to a maximum size for a compiled pattern of
+around 64K. This is sufficient to handle all but the most gigantic patterns.
+Nevertheless, some people do want to process truly enormous patterns, so it is
+possible to compile PCRE to use three-byte or four-byte offsets by adding a
+setting such as
+<pre>
+ --with-link-size=3
+</pre>
+to the <b>configure</b> command. The value given must be 2, 3, or 4. For the
+16-bit library, a value of 3 is rounded up to 4. In these libraries, using
+longer offsets slows down the operation of PCRE because it has to load
+additional data when handling them. For the 32-bit library the value is always
+4 and cannot be overridden; the value of --with-link-size is ignored.
+</P>
+<br><a name="SEC13" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
+<P>
+When matching with the <b>pcre_exec()</b> function, PCRE implements backtracking
+by making recursive calls to an internal function called <b>match()</b>. In
+environments where the size of the stack is limited, this can severely limit
+PCRE's operation. (The Unix environment does not usually suffer from this
+problem, but it may sometimes be necessary to increase the maximum stack size.
+There is a discussion in the
+<a href="pcrestack.html"><b>pcrestack</b></a>
+documentation.) An alternative approach to recursion that uses memory from the
+heap to remember data, instead of using recursive function calls, has been
+implemented to work round the problem of limited stack size. If you want to
+build a version of PCRE that works this way, add
+<pre>
+ --disable-stack-for-recursion
+</pre>
+to the <b>configure</b> command. With this configuration, PCRE will use the
+<b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> variables to call memory
+management functions. By default these point to <b>malloc()</b> and
+<b>free()</b>, but you can replace the pointers so that your own functions are
+used instead.
+</P>
+<P>
+Separate functions are provided rather than using <b>pcre_malloc</b> and
+<b>pcre_free</b> because the usage is very predictable: the block sizes
+requested are always the same, and the blocks are always freed in reverse
+order. A calling program might be able to implement optimized functions that
+perform better than <b>malloc()</b> and <b>free()</b>. PCRE runs noticeably more
+slowly when built in this way. This option affects only the <b>pcre_exec()</b>
+function; it is not relevant for <b>pcre_dfa_exec()</b>.
+</P>
+<br><a name="SEC14" href="#TOC1">LIMITING PCRE RESOURCE USAGE</a><br>
+<P>
+Internally, PCRE has a function called <b>match()</b>, which it calls repeatedly
+(sometimes recursively) when matching a pattern with the <b>pcre_exec()</b>
+function. By controlling the maximum number of times this function may be
+called during a single matching operation, a limit can be placed on the
+resources used by a single call to <b>pcre_exec()</b>. The limit can be changed
+at run time, as described in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+documentation. The default is 10 million, but this can be changed by adding a
+setting such as
+<pre>
+ --with-match-limit=500000
+</pre>
+to the <b>configure</b> command. This setting has no effect on the
+<b>pcre_dfa_exec()</b> matching function.
+</P>
+<P>
+In some environments it is desirable to limit the depth of recursive calls of
+<b>match()</b> more strictly than the total number of calls, in order to
+restrict the maximum amount of stack (or heap, if --disable-stack-for-recursion
+is specified) that is used. A second limit controls this; it defaults to the
+value that is set for --with-match-limit, which imposes no additional
+constraints. However, you can set a lower limit by adding, for example,
+<pre>
+ --with-match-limit-recursion=10000
+</pre>
+to the <b>configure</b> command. This value can also be overridden at run time.
+</P>
+<br><a name="SEC15" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br>
+<P>
+PCRE uses fixed tables for processing characters whose code values are less
+than 256. By default, PCRE is built with a set of tables that are distributed
+in the file <i>pcre_chartables.c.dist</i>. These tables are for ASCII codes
+only. If you add
+<pre>
+ --enable-rebuild-chartables
+</pre>
+to the <b>configure</b> command, the distributed tables are no longer used.
+Instead, a program called <b>dftables</b> is compiled and run. This outputs the
+source for new set of tables, created in the default locale of your C run-time
+system. (This method of replacing the tables does not work if you are cross
+compiling, because <b>dftables</b> is run on the local host. If you need to
+create alternative tables when cross compiling, you will have to do so "by
+hand".)
+</P>
+<br><a name="SEC16" href="#TOC1">USING EBCDIC CODE</a><br>
+<P>
+PCRE assumes by default that it will run in an environment where the character
+code is ASCII (or Unicode, which is a superset of ASCII). This is the case for
+most computer operating systems. PCRE can, however, be compiled to run in an
+EBCDIC environment by adding
+<pre>
+ --enable-ebcdic
+</pre>
+to the <b>configure</b> command. This setting implies
+--enable-rebuild-chartables. You should only use it if you know that you are in
+an EBCDIC environment (for example, an IBM mainframe operating system). The
+--enable-ebcdic option is incompatible with --enable-utf.
+</P>
+<P>
+The EBCDIC character that corresponds to an ASCII LF is assumed to have the
+value 0x15 by default. However, in some EBCDIC environments, 0x25 is used. In
+such an environment you should use
+<pre>
+ --enable-ebcdic-nl25
+</pre>
+as well as, or instead of, --enable-ebcdic. The EBCDIC character for CR has the
+same value as in ASCII, namely, 0x0d. Whichever of 0x15 and 0x25 is <i>not</i>
+chosen as LF is made to correspond to the Unicode NEL character (which, in
+Unicode, is 0x85).
+</P>
+<P>
+The options that select newline behaviour, such as --enable-newline-is-cr,
+and equivalent run-time options, refer to these character values in an EBCDIC
+environment.
+</P>
+<br><a name="SEC17" href="#TOC1">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a><br>
+<P>
+By default, <b>pcregrep</b> reads all files as plain text. You can build it so
+that it recognizes files whose names end in <b>.gz</b> or <b>.bz2</b>, and reads
+them with <b>libz</b> or <b>libbz2</b>, respectively, by adding one or both of
+<pre>
+ --enable-pcregrep-libz
+ --enable-pcregrep-libbz2
+</pre>
+to the <b>configure</b> command. These options naturally require that the
+relevant libraries are installed on your system. Configuration will fail if
+they are not.
+</P>
+<br><a name="SEC18" href="#TOC1">PCREGREP BUFFER SIZE</a><br>
+<P>
+<b>pcregrep</b> uses an internal buffer to hold a "window" on the file it is
+scanning, in order to be able to output "before" and "after" lines when it
+finds a match. The size of the buffer is controlled by a parameter whose
+default value is 20K. The buffer itself is three times this size, but because
+of the way it is used for holding "before" lines, the longest line that is
+guaranteed to be processable is the parameter size. You can change the default
+parameter value by adding, for example,
+<pre>
+ --with-pcregrep-bufsize=50K
+</pre>
+to the <b>configure</b> command. The caller of \fPpcregrep\fP can, however,
+override this value by specifying a run-time option.
+</P>
+<br><a name="SEC19" href="#TOC1">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a><br>
+<P>
+If you add
+<pre>
+ --enable-pcretest-libreadline
+</pre>
+to the <b>configure</b> command, <b>pcretest</b> is linked with the
+<b>libreadline</b> library, and when its input is from a terminal, it reads it
+using the <b>readline()</b> function. This provides line-editing and history
+facilities. Note that <b>libreadline</b> is GPL-licensed, so if you distribute a
+binary of <b>pcretest</b> linked in this way, there may be licensing issues.
+</P>
+<P>
+Setting this option causes the <b>-lreadline</b> option to be added to the
+<b>pcretest</b> build. In many operating environments with a sytem-installed
+<b>libreadline</b> this is sufficient. However, in some environments (e.g.
+if an unmodified distribution version of readline is in use), some extra
+configuration may be necessary. The INSTALL file for <b>libreadline</b> says
+this:
+<pre>
+ "Readline uses the termcap functions, but does not link with the
+ termcap or curses library itself, allowing applications which link
+ with readline the to choose an appropriate library."
+</pre>
+If your environment has not been set up so that an appropriate library is
+automatically included, you may need to add something like
+<pre>
+ LIBS="-ncurses"
+</pre>
+immediately before the <b>configure</b> command.
+</P>
+<br><a name="SEC20" href="#TOC1">DEBUGGING WITH VALGRIND SUPPORT</a><br>
+<P>
+By adding the
+<pre>
+ --enable-valgrind
+</pre>
+option to to the <b>configure</b> command, PCRE will use valgrind annotations
+to mark certain memory regions as unaddressable. This allows it to detect
+invalid memory accesses, and is mostly useful for debugging PCRE itself.
+</P>
+<br><a name="SEC21" href="#TOC1">CODE COVERAGE REPORTING</a><br>
+<P>
+If your C compiler is gcc, you can build a version of PCRE that can generate a
+code coverage report for its test suite. To enable this, you must install
+<b>lcov</b> version 1.6 or above. Then specify
+<pre>
+ --enable-coverage
+</pre>
+to the <b>configure</b> command and build PCRE in the usual way.
+</P>
+<P>
+Note that using <b>ccache</b> (a caching C compiler) is incompatible with code
+coverage reporting. If you have configured <b>ccache</b> to run automatically
+on your system, you must set the environment variable
+<pre>
+ CCACHE_DISABLE=1
+</pre>
+before running <b>make</b> to build PCRE, so that <b>ccache</b> is not used.
+</P>
+<P>
+When --enable-coverage is used, the following addition targets are added to the
+<i>Makefile</i>:
+<pre>
+ make coverage
+</pre>
+This creates a fresh coverage report for the PCRE test suite. It is equivalent
+to running "make coverage-reset", "make coverage-baseline", "make check", and
+then "make coverage-report".
+<pre>
+ make coverage-reset
+</pre>
+This zeroes the coverage counters, but does nothing else.
+<pre>
+ make coverage-baseline
+</pre>
+This captures baseline coverage information.
+<pre>
+ make coverage-report
+</pre>
+This creates the coverage report.
+<pre>
+ make coverage-clean-report
+</pre>
+This removes the generated coverage report without cleaning the coverage data
+itself.
+<pre>
+ make coverage-clean-data
+</pre>
+This removes the captured coverage data without removing the coverage files
+created at compile time (*.gcno).
+<pre>
+ make coverage-clean
+</pre>
+This cleans all coverage data including the generated coverage report. For more
+information about code coverage, see the <b>gcov</b> and <b>lcov</b>
+documentation.
+</P>
+<br><a name="SEC22" href="#TOC1">SEE ALSO</a><br>
+<P>
+<b>pcreapi</b>(3), <b>pcre16</b>, <b>pcre32</b>, <b>pcre_config</b>(3).
+</P>
+<br><a name="SEC23" href="#TOC1">AUTHOR</a><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><a name="SEC24" href="#TOC1">REVISION</a><br>
+<P>
+Last updated: 12 May 2013
+<br>
+Copyright &copy; 1997-2013 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcrecallout.html b/doc/html/pcrecallout.html
new file mode 100644
index 0000000..53a937f
--- /dev/null
+++ b/doc/html/pcrecallout.html
@@ -0,0 +1,286 @@
+<html>
+<head>
+<title>pcrecallout specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcrecallout man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<ul>
+<li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
+<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
+<li><a name="TOC3" href="#SEC3">MISSING CALLOUTS</a>
+<li><a name="TOC4" href="#SEC4">THE CALLOUT INTERFACE</a>
+<li><a name="TOC5" href="#SEC5">RETURN VALUES</a>
+<li><a name="TOC6" href="#SEC6">AUTHOR</a>
+<li><a name="TOC7" href="#SEC7">REVISION</a>
+</ul>
+<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
+<P>
+<b>#include &#60;pcre.h&#62;</b>
+</P>
+<P>
+<b>int (*pcre_callout)(pcre_callout_block *);</b>
+</P>
+<P>
+<b>int (*pcre16_callout)(pcre16_callout_block *);</b>
+</P>
+<P>
+<b>int (*pcre32_callout)(pcre32_callout_block *);</b>
+</P>
+<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
+<P>
+PCRE provides a feature called "callout", which is a means of temporarily
+passing control to the caller of PCRE in the middle of pattern matching. The
+caller of PCRE provides an external function by putting its entry point in the
+global variable <i>pcre_callout</i> (<i>pcre16_callout</i> for the 16-bit
+library, <i>pcre32_callout</i> for the 32-bit library). By default, this
+variable contains NULL, which disables all calling out.
+</P>
+<P>
+Within a regular expression, (?C) indicates the points at which the external
+function is to be called. Different callout points can be identified by putting
+a number less than 256 after the letter C. The default value is zero.
+For example, this pattern has two callout points:
+<pre>
+ (?C1)abc(?C2)def
+</pre>
+If the PCRE_AUTO_CALLOUT option bit is set when a pattern is compiled, PCRE
+automatically inserts callouts, all with number 255, before each item in the
+pattern. For example, if PCRE_AUTO_CALLOUT is used with the pattern
+<pre>
+ A(\d{2}|--)
+</pre>
+it is processed as if it were
+<br>
+<br>
+(?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
+<br>
+<br>
+Notice that there is a callout before and after each parenthesis and
+alternation bar. If the pattern contains a conditional group whose condition is
+an assertion, an automatic callout is inserted immediately before the
+condition. Such a callout may also be inserted explicitly, for example:
+<pre>
+ (?(?C9)(?=a)ab|de)
+</pre>
+This applies only to assertion conditions (because they are themselves
+independent groups).
+</P>
+<P>
+Automatic callouts can be used for tracking the progress of pattern matching.
+The
+<a href="pcretest.html"><b>pcretest</b></a>
+program has a pattern qualifier (/C) that sets automatic callouts; when it is
+used, the output indicates how the pattern is being matched. This is useful
+information when you are trying to optimize the performance of a particular
+pattern.
+</P>
+<br><a name="SEC3" href="#TOC1">MISSING CALLOUTS</a><br>
+<P>
+You should be aware that, because of optimizations in the way PCRE compiles and
+matches patterns, callouts sometimes do not happen exactly as you might expect.
+</P>
+<P>
+At compile time, PCRE "auto-possessifies" repeated items when it knows that
+what follows cannot be part of the repeat. For example, a+[bc] is compiled as
+if it were a++[bc]. The <b>pcretest</b> output when this pattern is anchored and
+then applied with automatic callouts to the string "aaaa" is:
+<pre>
+ ---&#62;aaaa
+ +0 ^ ^
+ +1 ^ a+
+ +3 ^ ^ [bc]
+ No match
+</pre>
+This indicates that when matching [bc] fails, there is no backtracking into a+
+and therefore the callouts that would be taken for the backtracks do not occur.
+You can disable the auto-possessify feature by passing PCRE_NO_AUTO_POSSESS
+to <b>pcre_compile()</b>, or starting the pattern with (*NO_AUTO_POSSESS). If
+this is done in <b>pcretest</b> (using the /O qualifier), the output changes to
+this:
+<pre>
+ ---&#62;aaaa
+ +0 ^ ^
+ +1 ^ a+
+ +3 ^ ^ [bc]
+ +3 ^ ^ [bc]
+ +3 ^ ^ [bc]
+ +3 ^^ [bc]
+ No match
+</pre>
+This time, when matching [bc] fails, the matcher backtracks into a+ and tries
+again, repeatedly, until a+ itself fails.
+</P>
+<P>
+Other optimizations that provide fast "no match" results also affect callouts.
+For example, if the pattern is
+<pre>
+ ab(?C4)cd
+</pre>
+PCRE knows that any matching string must contain the letter "d". If the subject
+string is "abyz", the lack of "d" means that matching doesn't ever start, and
+the callout is never reached. However, with "abyd", though the result is still
+no match, the callout is obeyed.
+</P>
+<P>
+If the pattern is studied, PCRE knows the minimum length of a matching string,
+and will immediately give a "no match" return without actually running a match
+if the subject is not long enough, or, for unanchored patterns, if it has
+been scanned far enough.
+</P>
+<P>
+You can disable these optimizations by passing the PCRE_NO_START_OPTIMIZE
+option to the matching function, or by starting the pattern with
+(*NO_START_OPT). This slows down the matching process, but does ensure that
+callouts such as the example above are obeyed.
+</P>
+<br><a name="SEC4" href="#TOC1">THE CALLOUT INTERFACE</a><br>
+<P>
+During matching, when PCRE reaches a callout point, the external function
+defined by <i>pcre_callout</i> or <i>pcre[16|32]_callout</i> is called (if it is
+set). This applies to both normal and DFA matching. The only argument to the
+callout function is a pointer to a <b>pcre_callout</b> or
+<b>pcre[16|32]_callout</b> block. These structures contains the following
+fields:
+<pre>
+ int <i>version</i>;
+ int <i>callout_number</i>;
+ int *<i>offset_vector</i>;
+ const char *<i>subject</i>; (8-bit version)
+ PCRE_SPTR16 <i>subject</i>; (16-bit version)
+ PCRE_SPTR32 <i>subject</i>; (32-bit version)
+ int <i>subject_length</i>;
+ int <i>start_match</i>;
+ int <i>current_position</i>;
+ int <i>capture_top</i>;
+ int <i>capture_last</i>;
+ void *<i>callout_data</i>;
+ int <i>pattern_position</i>;
+ int <i>next_item_length</i>;
+ const unsigned char *<i>mark</i>; (8-bit version)
+ const PCRE_UCHAR16 *<i>mark</i>; (16-bit version)
+ const PCRE_UCHAR32 *<i>mark</i>; (32-bit version)
+</pre>
+The <i>version</i> field is an integer containing the version number of the
+block format. The initial version was 0; the current version is 2. The version
+number will change again in future if additional fields are added, but the
+intention is never to remove any of the existing fields.
+</P>
+<P>
+The <i>callout_number</i> field contains the number of the callout, as compiled
+into the pattern (that is, the number after ?C for manual callouts, and 255 for
+automatically generated callouts).
+</P>
+<P>
+The <i>offset_vector</i> field is a pointer to the vector of offsets that was
+passed by the caller to the matching function. When <b>pcre_exec()</b> or
+<b>pcre[16|32]_exec()</b> is used, the contents can be inspected, in order to
+extract substrings that have been matched so far, in the same way as for
+extracting substrings after a match has completed. For the DFA matching
+functions, this field is not useful.
+</P>
+<P>
+The <i>subject</i> and <i>subject_length</i> fields contain copies of the values
+that were passed to the matching function.
+</P>
+<P>
+The <i>start_match</i> field normally contains the offset within the subject at
+which the current match attempt started. However, if the escape sequence \K
+has been encountered, this value is changed to reflect the modified starting
+point. If the pattern is not anchored, the callout function may be called
+several times from the same point in the pattern for different starting points
+in the subject.
+</P>
+<P>
+The <i>current_position</i> field contains the offset within the subject of the
+current match pointer.
+</P>
+<P>
+When the <b>pcre_exec()</b> or <b>pcre[16|32]_exec()</b> is used, the
+<i>capture_top</i> field contains one more than the number of the highest
+numbered captured substring so far. If no substrings have been captured, the
+value of <i>capture_top</i> is one. This is always the case when the DFA
+functions are used, because they do not support captured substrings.
+</P>
+<P>
+The <i>capture_last</i> field contains the number of the most recently captured
+substring. However, when a recursion exits, the value reverts to what it was
+outside the recursion, as do the values of all captured substrings. If no
+substrings have been captured, the value of <i>capture_last</i> is -1. This is
+always the case for the DFA matching functions.
+</P>
+<P>
+The <i>callout_data</i> field contains a value that is passed to a matching
+function specifically so that it can be passed back in callouts. It is passed
+in the <i>callout_data</i> field of a <b>pcre_extra</b> or <b>pcre[16|32]_extra</b>
+data structure. If no such data was passed, the value of <i>callout_data</i> in
+a callout block is NULL. There is a description of the <b>pcre_extra</b>
+structure in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+documentation.
+</P>
+<P>
+The <i>pattern_position</i> field is present from version 1 of the callout
+structure. It contains the offset to the next item to be matched in the pattern
+string.
+</P>
+<P>
+The <i>next_item_length</i> field is present from version 1 of the callout
+structure. It contains the length of the next item to be matched in the pattern
+string. When the callout immediately precedes an alternation bar, a closing
+parenthesis, or the end of the pattern, the length is zero. When the callout
+precedes an opening parenthesis, the length is that of the entire subpattern.
+</P>
+<P>
+The <i>pattern_position</i> and <i>next_item_length</i> fields are intended to
+help in distinguishing between different automatic callouts, which all have the
+same callout number. However, they are set for all callouts.
+</P>
+<P>
+The <i>mark</i> field is present from version 2 of the callout structure. In
+callouts from <b>pcre_exec()</b> or <b>pcre[16|32]_exec()</b> it contains a
+pointer to the zero-terminated name of the most recently passed (*MARK),
+(*PRUNE), or (*THEN) item in the match, or NULL if no such items have been
+passed. Instances of (*PRUNE) or (*THEN) without a name do not obliterate a
+previous (*MARK). In callouts from the DFA matching functions this field always
+contains NULL.
+</P>
+<br><a name="SEC5" href="#TOC1">RETURN VALUES</a><br>
+<P>
+The external callout function returns an integer to PCRE. If the value is zero,
+matching proceeds as normal. If the value is greater than zero, matching fails
+at the current point, but the testing of other matching possibilities goes
+ahead, just as if a lookahead assertion had failed. If the value is less than
+zero, the match is abandoned, the matching function returns the negative value.
+</P>
+<P>
+Negative values should normally be chosen from the set of PCRE_ERROR_xxx
+values. In particular, PCRE_ERROR_NOMATCH forces a standard "no match" failure.
+The error number PCRE_ERROR_CALLOUT is reserved for use by callout functions;
+it will never be used by PCRE itself.
+</P>
+<br><a name="SEC6" href="#TOC1">AUTHOR</a><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><a name="SEC7" href="#TOC1">REVISION</a><br>
+<P>
+Last updated: 12 November 2013
+<br>
+Copyright &copy; 1997-2013 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcrecompat.html b/doc/html/pcrecompat.html
new file mode 100644
index 0000000..3e62266
--- /dev/null
+++ b/doc/html/pcrecompat.html
@@ -0,0 +1,235 @@
+<html>
+<head>
+<title>pcrecompat specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcrecompat man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+DIFFERENCES BETWEEN PCRE AND PERL
+</b><br>
+<P>
+This document describes the differences in the ways that PCRE and Perl handle
+regular expressions. The differences described here are with respect to Perl
+versions 5.10 and above.
+</P>
+<P>
+1. PCRE has only a subset of Perl's Unicode support. Details of what it does
+have are given in the
+<a href="pcreunicode.html"><b>pcreunicode</b></a>
+page.
+</P>
+<P>
+2. PCRE allows repeat quantifiers only on parenthesized assertions, but they do
+not mean what you might think. For example, (?!a){3} does not assert that the
+next three characters are not "a". It just asserts that the next character is
+not "a" three times (in principle: PCRE optimizes this to run the assertion
+just once). Perl allows repeat quantifiers on other assertions such as \b, but
+these do not seem to have any use.
+</P>
+<P>
+3. Capturing subpatterns that occur inside negative lookahead assertions are
+counted, but their entries in the offsets vector are never set. Perl sometimes
+(but not always) sets its numerical variables from inside negative assertions.
+</P>
+<P>
+4. Though binary zero characters are supported in the subject string, they are
+not allowed in a pattern string because it is passed as a normal C string,
+terminated by zero. The escape sequence \0 can be used in the pattern to
+represent a binary zero.
+</P>
+<P>
+5. The following Perl escape sequences are not supported: \l, \u, \L,
+\U, and \N when followed by a character name or Unicode value. (\N on its
+own, matching a non-newline character, is supported.) In fact these are
+implemented by Perl's general string-handling and are not part of its pattern
+matching engine. If any of these are encountered by PCRE, an error is
+generated by default. However, if the PCRE_JAVASCRIPT_COMPAT option is set,
+\U and \u are interpreted as JavaScript interprets them.
+</P>
+<P>
+6. The Perl escape sequences \p, \P, and \X are supported only if PCRE is
+built with Unicode character property support. The properties that can be
+tested with \p and \P are limited to the general category properties such as
+Lu and Nd, script names such as Greek or Han, and the derived properties Any
+and L&. PCRE does support the Cs (surrogate) property, which Perl does not; the
+Perl documentation says "Because Perl hides the need for the user to understand
+the internal representation of Unicode characters, there is no need to
+implement the somewhat messy concept of surrogates."
+</P>
+<P>
+7. PCRE does support the \Q...\E escape for quoting substrings. Characters in
+between are treated as literals. This is slightly different from Perl in that $
+and @ are also handled as literals inside the quotes. In Perl, they cause
+variable interpolation (but of course PCRE does not have variables). Note the
+following examples:
+<pre>
+ Pattern PCRE matches Perl matches
+
+ \Qabc$xyz\E abc$xyz abc followed by the contents of $xyz
+ \Qabc\$xyz\E abc\$xyz abc\$xyz
+ \Qabc\E\$\Qxyz\E abc$xyz abc$xyz
+</pre>
+The \Q...\E sequence is recognized both inside and outside character classes.
+</P>
+<P>
+8. Fairly obviously, PCRE does not support the (?{code}) and (??{code})
+constructions. However, there is support for recursive patterns. This is not
+available in Perl 5.8, but it is in Perl 5.10. Also, the PCRE "callout"
+feature allows an external function to be called during pattern matching. See
+the
+<a href="pcrecallout.html"><b>pcrecallout</b></a>
+documentation for details.
+</P>
+<P>
+9. Subpatterns that are called as subroutines (whether or not recursively) are
+always treated as atomic groups in PCRE. This is like Python, but unlike Perl.
+Captured values that are set outside a subroutine call can be reference from
+inside in PCRE, but not in Perl. There is a discussion that explains these
+differences in more detail in the
+<a href="pcrepattern.html#recursiondifference">section on recursion differences from Perl</a>
+in the
+<a href="pcrepattern.html"><b>pcrepattern</b></a>
+page.
+</P>
+<P>
+10. If any of the backtracking control verbs are used in a subpattern that is
+called as a subroutine (whether or not recursively), their effect is confined
+to that subpattern; it does not extend to the surrounding pattern. This is not
+always the case in Perl. In particular, if (*THEN) is present in a group that
+is called as a subroutine, its action is limited to that group, even if the
+group does not contain any | characters. Note that such subpatterns are
+processed as anchored at the point where they are tested.
+</P>
+<P>
+11. If a pattern contains more than one backtracking control verb, the first
+one that is backtracked onto acts. For example, in the pattern
+A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but a failure in C
+triggers (*PRUNE). Perl's behaviour is more complex; in many cases it is the
+same as PCRE, but there are examples where it differs.
+</P>
+<P>
+12. Most backtracking verbs in assertions have their normal actions. They are
+not confined to the assertion.
+</P>
+<P>
+13. There are some differences that are concerned with the settings of captured
+strings when part of a pattern is repeated. For example, matching "aba" against
+the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b".
+</P>
+<P>
+14. PCRE's handling of duplicate subpattern numbers and duplicate subpattern
+names is not as general as Perl's. This is a consequence of the fact the PCRE
+works internally just with numbers, using an external table to translate
+between numbers and names. In particular, a pattern such as (?|(?&#60;a&#62;A)|(?&#60;b)B),
+where the two capturing parentheses have the same number but different names,
+is not supported, and causes an error at compile time. If it were allowed, it
+would not be possible to distinguish which parentheses matched, because both
+names map to capturing subpattern number 1. To avoid this confusing situation,
+an error is given at compile time.
+</P>
+<P>
+15. Perl recognizes comments in some places that PCRE does not, for example,
+between the ( and ? at the start of a subpattern. If the /x modifier is set,
+Perl allows white space between ( and ? (though current Perls warn that this is
+deprecated) but PCRE never does, even if the PCRE_EXTENDED option is set.
+</P>
+<P>
+16. Perl, when in warning mode, gives warnings for character classes such as
+[A-\d] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE has no
+warning features, so it gives an error in these cases because they are almost
+certainly user mistakes.
+</P>
+<P>
+17. In PCRE, the upper/lower case character properties Lu and Ll are not
+affected when case-independent matching is specified. For example, \p{Lu}
+always matches an upper case letter. I think Perl has changed in this respect;
+in the release at the time of writing (5.16), \p{Lu} and \p{Ll} match all
+letters, regardless of case, when case independence is specified.
+</P>
+<P>
+18. PCRE provides some extensions to the Perl regular expression facilities.
+Perl 5.10 includes new features that are not in earlier versions of Perl, some
+of which (such as named parentheses) have been in PCRE for some time. This list
+is with respect to Perl 5.10:
+<br>
+<br>
+(a) Although lookbehind assertions in PCRE must match fixed length strings,
+each alternative branch of a lookbehind assertion can match a different length
+of string. Perl requires them all to have the same length.
+<br>
+<br>
+(b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $
+meta-character matches only at the very end of the string.
+<br>
+<br>
+(c) If PCRE_EXTRA is set, a backslash followed by a letter with no special
+meaning is faulted. Otherwise, like Perl, the backslash is quietly ignored.
+(Perl can be made to issue a warning.)
+<br>
+<br>
+(d) If PCRE_UNGREEDY is set, the greediness of the repetition quantifiers is
+inverted, that is, by default they are not greedy, but if followed by a
+question mark they are.
+<br>
+<br>
+(e) PCRE_ANCHORED can be used at matching time to force a pattern to be tried
+only at the first matching position in the subject string.
+<br>
+<br>
+(f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, and
+PCRE_NO_AUTO_CAPTURE options for <b>pcre_exec()</b> have no Perl equivalents.
+<br>
+<br>
+(g) The \R escape sequence can be restricted to match only CR, LF, or CRLF
+by the PCRE_BSR_ANYCRLF option.
+<br>
+<br>
+(h) The callout facility is PCRE-specific.
+<br>
+<br>
+(i) The partial matching facility is PCRE-specific.
+<br>
+<br>
+(j) Patterns compiled by PCRE can be saved and re-used at a later time, even on
+different hosts that have the other endianness. However, this does not apply to
+optimized data created by the just-in-time compiler.
+<br>
+<br>
+(k) The alternative matching functions (<b>pcre_dfa_exec()</b>,
+<b>pcre16_dfa_exec()</b> and <b>pcre32_dfa_exec()</b>,) match in a different way
+and are not Perl-compatible.
+<br>
+<br>
+(l) PCRE recognizes some special sequences such as (*CR) at the start of
+a pattern that set overall options that cannot be changed within the pattern.
+</P>
+<br><b>
+AUTHOR
+</b><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><b>
+REVISION
+</b><br>
+<P>
+Last updated: 10 November 2013
+<br>
+Copyright &copy; 1997-2013 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcrecpp.html b/doc/html/pcrecpp.html
new file mode 100644
index 0000000..b7eac3a
--- /dev/null
+++ b/doc/html/pcrecpp.html
@@ -0,0 +1,368 @@
+<html>
+<head>
+<title>pcrecpp specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcrecpp man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<ul>
+<li><a name="TOC1" href="#SEC1">SYNOPSIS OF C++ WRAPPER</a>
+<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
+<li><a name="TOC3" href="#SEC3">MATCHING INTERFACE</a>
+<li><a name="TOC4" href="#SEC4">QUOTING METACHARACTERS</a>
+<li><a name="TOC5" href="#SEC5">PARTIAL MATCHES</a>
+<li><a name="TOC6" href="#SEC6">UTF-8 AND THE MATCHING INTERFACE</a>
+<li><a name="TOC7" href="#SEC7">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a>
+<li><a name="TOC8" href="#SEC8">SCANNING TEXT INCREMENTALLY</a>
+<li><a name="TOC9" href="#SEC9">PARSING HEX/OCTAL/C-RADIX NUMBERS</a>
+<li><a name="TOC10" href="#SEC10">REPLACING PARTS OF STRINGS</a>
+<li><a name="TOC11" href="#SEC11">AUTHOR</a>
+<li><a name="TOC12" href="#SEC12">REVISION</a>
+</ul>
+<br><a name="SEC1" href="#TOC1">SYNOPSIS OF C++ WRAPPER</a><br>
+<P>
+<b>#include &#60;pcrecpp.h&#62;</b>
+</P>
+<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
+<P>
+The C++ wrapper for PCRE was provided by Google Inc. Some additional
+functionality was added by Giuseppe Maxia. This brief man page was constructed
+from the notes in the <i>pcrecpp.h</i> file, which should be consulted for
+further details. Note that the C++ wrapper supports only the original 8-bit
+PCRE library. There is no 16-bit or 32-bit support at present.
+</P>
+<br><a name="SEC3" href="#TOC1">MATCHING INTERFACE</a><br>
+<P>
+The "FullMatch" operation checks that supplied text matches a supplied pattern
+exactly. If pointer arguments are supplied, it copies matched sub-strings that
+match sub-patterns into them.
+<pre>
+ Example: successful match
+ pcrecpp::RE re("h.*o");
+ re.FullMatch("hello");
+
+ Example: unsuccessful match (requires full match):
+ pcrecpp::RE re("e");
+ !re.FullMatch("hello");
+
+ Example: creating a temporary RE object:
+ pcrecpp::RE("h.*o").FullMatch("hello");
+</pre>
+You can pass in a "const char*" or a "string" for "text". The examples below
+tend to use a const char*. You can, as in the different examples above, store
+the RE object explicitly in a variable or use a temporary RE object. The
+examples below use one mode or the other arbitrarily. Either could correctly be
+used for any of these examples.
+</P>
+<P>
+You must supply extra pointer arguments to extract matched subpieces.
+<pre>
+ Example: extracts "ruby" into "s" and 1234 into "i"
+ int i;
+ string s;
+ pcrecpp::RE re("(\\w+):(\\d+)");
+ re.FullMatch("ruby:1234", &s, &i);
+
+ Example: does not try to extract any extra sub-patterns
+ re.FullMatch("ruby:1234", &s);
+
+ Example: does not try to extract into NULL
+ re.FullMatch("ruby:1234", NULL, &i);
+
+ Example: integer overflow causes failure
+ !re.FullMatch("ruby:1234567891234", NULL, &i);
+
+ Example: fails because there aren't enough sub-patterns:
+ !pcrecpp::RE("\\w+:\\d+").FullMatch("ruby:1234", &s);
+
+ Example: fails because string cannot be stored in integer
+ !pcrecpp::RE("(.*)").FullMatch("ruby", &i);
+</pre>
+The provided pointer arguments can be pointers to any scalar numeric
+type, or one of:
+<pre>
+ string (matched piece is copied to string)
+ StringPiece (StringPiece is mutated to point to matched piece)
+ T (where "bool T::ParseFrom(const char*, int)" exists)
+ NULL (the corresponding matched sub-pattern is not copied)
+</pre>
+The function returns true iff all of the following conditions are satisfied:
+<pre>
+ a. "text" matches "pattern" exactly;
+
+ b. The number of matched sub-patterns is &#62;= number of supplied
+ pointers;
+
+ c. The "i"th argument has a suitable type for holding the
+ string captured as the "i"th sub-pattern. If you pass in
+ void * NULL for the "i"th argument, or a non-void * NULL
+ of the correct type, or pass fewer arguments than the
+ number of sub-patterns, "i"th captured sub-pattern is
+ ignored.
+</pre>
+CAVEAT: An optional sub-pattern that does not exist in the matched
+string is assigned the empty string. Therefore, the following will
+return false (because the empty string is not a valid number):
+<pre>
+ int number;
+ pcrecpp::RE::FullMatch("abc", "[a-z]+(\\d+)?", &number);
+</pre>
+The matching interface supports at most 16 arguments per call.
+If you need more, consider using the more general interface
+<b>pcrecpp::RE::DoMatch</b>. See <b>pcrecpp.h</b> for the signature for
+<b>DoMatch</b>.
+</P>
+<P>
+NOTE: Do not use <b>no_arg</b>, which is used internally to mark the end of a
+list of optional arguments, as a placeholder for missing arguments, as this can
+lead to segfaults.
+</P>
+<br><a name="SEC4" href="#TOC1">QUOTING METACHARACTERS</a><br>
+<P>
+You can use the "QuoteMeta" operation to insert backslashes before all
+potentially meaningful characters in a string. The returned string, used as a
+regular expression, will exactly match the original string.
+<pre>
+ Example:
+ string quoted = RE::QuoteMeta(unquoted);
+</pre>
+Note that it's legal to escape a character even if it has no special meaning in
+a regular expression -- so this function does that. (This also makes it
+identical to the perl function of the same name; see "perldoc -f quotemeta".)
+For example, "1.5-2.0?" becomes "1\.5\-2\.0\?".
+</P>
+<br><a name="SEC5" href="#TOC1">PARTIAL MATCHES</a><br>
+<P>
+You can use the "PartialMatch" operation when you want the pattern
+to match any substring of the text.
+<pre>
+ Example: simple search for a string:
+ pcrecpp::RE("ell").PartialMatch("hello");
+
+ Example: find first number in a string:
+ int number;
+ pcrecpp::RE re("(\\d+)");
+ re.PartialMatch("x*100 + 20", &number);
+ assert(number == 100);
+</PRE>
+</P>
+<br><a name="SEC6" href="#TOC1">UTF-8 AND THE MATCHING INTERFACE</a><br>
+<P>
+By default, pattern and text are plain text, one byte per character. The UTF8
+flag, passed to the constructor, causes both pattern and string to be treated
+as UTF-8 text, still a byte stream but potentially multiple bytes per
+character. In practice, the text is likelier to be UTF-8 than the pattern, but
+the match returned may depend on the UTF8 flag, so always use it when matching
+UTF8 text. For example, "." will match one byte normally but with UTF8 set may
+match up to three bytes of a multi-byte character.
+<pre>
+ Example:
+ pcrecpp::RE_Options options;
+ options.set_utf8();
+ pcrecpp::RE re(utf8_pattern, options);
+ re.FullMatch(utf8_string);
+
+ Example: using the convenience function UTF8():
+ pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
+ re.FullMatch(utf8_string);
+</pre>
+NOTE: The UTF8 flag is ignored if pcre was not configured with the
+<pre>
+ --enable-utf8 flag.
+</PRE>
+</P>
+<br><a name="SEC7" href="#TOC1">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a><br>
+<P>
+PCRE defines some modifiers to change the behavior of the regular expression
+engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to
+pass such modifiers to a RE class. Currently, the following modifiers are
+supported:
+<pre>
+ modifier description Perl corresponding
+
+ PCRE_CASELESS case insensitive match /i
+ PCRE_MULTILINE multiple lines match /m
+ PCRE_DOTALL dot matches newlines /s
+ PCRE_DOLLAR_ENDONLY $ matches only at end N/A
+ PCRE_EXTRA strict escape parsing N/A
+ PCRE_EXTENDED ignore white spaces /x
+ PCRE_UTF8 handles UTF8 chars built-in
+ PCRE_UNGREEDY reverses * and *? N/A
+ PCRE_NO_AUTO_CAPTURE disables capturing parens N/A (*)
+</pre>
+(*) Both Perl and PCRE allow non capturing parentheses by means of the
+"?:" modifier within the pattern itself. e.g. (?:ab|cd) does not
+capture, while (ab|cd) does.
+</P>
+<P>
+For a full account on how each modifier works, please check the
+PCRE API reference page.
+</P>
+<P>
+For each modifier, there are two member functions whose name is made
+out of the modifier in lowercase, without the "PCRE_" prefix. For
+instance, PCRE_CASELESS is handled by
+<pre>
+ bool caseless()
+</pre>
+which returns true if the modifier is set, and
+<pre>
+ RE_Options & set_caseless(bool)
+</pre>
+which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can be
+accessed through the <b>set_match_limit()</b> and <b>match_limit()</b> member
+functions. Setting <i>match_limit</i> to a non-zero value will limit the
+execution of pcre to keep it from doing bad things like blowing the stack or
+taking an eternity to return a result. A value of 5000 is good enough to stop
+stack blowup in a 2MB thread stack. Setting <i>match_limit</i> to zero disables
+match limiting. Alternatively, you can call <b>match_limit_recursion()</b>
+which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how much PCRE
+recurses. <b>match_limit()</b> limits the number of matches PCRE does;
+<b>match_limit_recursion()</b> limits the depth of internal recursion, and
+therefore the amount of stack that is used.
+</P>
+<P>
+Normally, to pass one or more modifiers to a RE class, you declare
+a <i>RE_Options</i> object, set the appropriate options, and pass this
+object to a RE constructor. Example:
+<pre>
+ RE_Options opt;
+ opt.set_caseless(true);
+ if (RE("HELLO", opt).PartialMatch("hello world")) ...
+</pre>
+RE_options has two constructors. The default constructor takes no arguments and
+creates a set of flags that are off by default. The optional parameter
+<i>option_flags</i> is to facilitate transfer of legacy code from C programs.
+This lets you do
+<pre>
+ RE(pattern,
+ RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);
+</pre>
+However, new code is better off doing
+<pre>
+ RE(pattern,
+ RE_Options().set_caseless(true).set_multiline(true))
+ .PartialMatch(str);
+</pre>
+If you are going to pass one of the most used modifiers, there are some
+convenience functions that return a RE_Options class with the
+appropriate modifier already set: <b>CASELESS()</b>, <b>UTF8()</b>,
+<b>MULTILINE()</b>, <b>DOTALL</b>(), and <b>EXTENDED()</b>.
+</P>
+<P>
+If you need to set several options at once, and you don't want to go through
+the pains of declaring a RE_Options object and setting several options, there
+is a parallel method that give you such ability on the fly. You can concatenate
+several <b>set_xxxxx()</b> member functions, since each of them returns a
+reference to its class object. For example, to pass PCRE_CASELESS,
+PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one statement, you may write:
+<pre>
+ RE(" ^ xyz \\s+ .* blah$",
+ RE_Options()
+ .set_caseless(true)
+ .set_extended(true)
+ .set_multiline(true)).PartialMatch(sometext);
+
+</PRE>
+</P>
+<br><a name="SEC8" href="#TOC1">SCANNING TEXT INCREMENTALLY</a><br>
+<P>
+The "Consume" operation may be useful if you want to repeatedly
+match regular expressions at the front of a string and skip over
+them as they match. This requires use of the "StringPiece" type,
+which represents a sub-range of a real string. Like RE, StringPiece
+is defined in the pcrecpp namespace.
+<pre>
+ Example: read lines of the form "var = value" from a string.
+ string contents = ...; // Fill string somehow
+ pcrecpp::StringPiece input(contents); // Wrap in a StringPiece
+
+ string var;
+ int value;
+ pcrecpp::RE re("(\\w+) = (\\d+)\n");
+ while (re.Consume(&input, &var, &value)) {
+ ...;
+ }
+</pre>
+Each successful call to "Consume" will set "var/value", and also
+advance "input" so it points past the matched text.
+</P>
+<P>
+The "FindAndConsume" operation is similar to "Consume" but does not
+anchor your match at the beginning of the string. For example, you
+could extract all words from a string by repeatedly calling
+<pre>
+ pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)
+</PRE>
+</P>
+<br><a name="SEC9" href="#TOC1">PARSING HEX/OCTAL/C-RADIX NUMBERS</a><br>
+<P>
+By default, if you pass a pointer to a numeric value, the
+corresponding text is interpreted as a base-10 number. You can
+instead wrap the pointer with a call to one of the operators Hex(),
+Octal(), or CRadix() to interpret the text in another base. The
+CRadix operator interprets C-style "0" (base-8) and "0x" (base-16)
+prefixes, but defaults to base-10.
+<pre>
+ Example:
+ int a, b, c, d;
+ pcrecpp::RE re("(.*) (.*) (.*) (.*)");
+ re.FullMatch("100 40 0100 0x40",
+ pcrecpp::Octal(&a), pcrecpp::Hex(&b),
+ pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));
+</pre>
+will leave 64 in a, b, c, and d.
+</P>
+<br><a name="SEC10" href="#TOC1">REPLACING PARTS OF STRINGS</a><br>
+<P>
+You can replace the first match of "pattern" in "str" with "rewrite".
+Within "rewrite", backslash-escaped digits (\1 to \9) can be
+used to insert text matching corresponding parenthesized group
+from the pattern. \0 in "rewrite" refers to the entire matching
+text. For example:
+<pre>
+ string s = "yabba dabba doo";
+ pcrecpp::RE("b+").Replace("d", &s);
+</pre>
+will leave "s" containing "yada dabba doo". The result is true if the pattern
+matches and a replacement occurs, false otherwise.
+</P>
+<P>
+<b>GlobalReplace</b> is like <b>Replace</b> except that it replaces all
+occurrences of the pattern in the string with the rewrite. Replacements are
+not subject to re-matching. For example:
+<pre>
+ string s = "yabba dabba doo";
+ pcrecpp::RE("b+").GlobalReplace("d", &s);
+</pre>
+will leave "s" containing "yada dada doo". It returns the number of
+replacements made.
+</P>
+<P>
+<b>Extract</b> is like <b>Replace</b>, except that if the pattern matches,
+"rewrite" is copied into "out" (an additional argument) with substitutions.
+The non-matching portions of "text" are ignored. Returns true iff a match
+occurred and the extraction happened successfully; if no match occurs, the
+string is left unaffected.
+</P>
+<br><a name="SEC11" href="#TOC1">AUTHOR</a><br>
+<P>
+The C++ wrapper was contributed by Google Inc.
+<br>
+Copyright &copy; 2007 Google Inc.
+<br>
+</P>
+<br><a name="SEC12" href="#TOC1">REVISION</a><br>
+<P>
+Last updated: 08 January 2012
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcredemo.html b/doc/html/pcredemo.html
new file mode 100644
index 0000000..894a930
--- /dev/null
+++ b/doc/html/pcredemo.html
@@ -0,0 +1,426 @@
+<html>
+<head>
+<title>pcredemo specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcredemo man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<ul>
+</ul>
+<PRE>
+/*************************************************
+* PCRE DEMONSTRATION PROGRAM *
+*************************************************/
+
+/* This is a demonstration program to illustrate the most straightforward ways
+of calling the PCRE regular expression library from a C program. See the
+pcresample documentation for a short discussion ("man pcresample" if you have
+the PCRE man pages installed).
+
+In Unix-like environments, if PCRE is installed in your standard system
+libraries, you should be able to compile this program using this command:
+
+gcc -Wall pcredemo.c -lpcre -o pcredemo
+
+If PCRE is not installed in a standard place, it is likely to be installed with
+support for the pkg-config mechanism. If you have pkg-config, you can compile
+this program using this command:
+
+gcc -Wall pcredemo.c `pkg-config --cflags --libs libpcre` -o pcredemo
+
+If you do not have pkg-config, you may have to use this:
+
+gcc -Wall pcredemo.c -I/usr/local/include -L/usr/local/lib \
+ -R/usr/local/lib -lpcre -o pcredemo
+
+Replace "/usr/local/include" and "/usr/local/lib" with wherever the include and
+library files for PCRE are installed on your system. Only some operating
+systems (e.g. Solaris) use the -R option.
+
+Building under Windows:
+
+If you want to statically link this program against a non-dll .a file, you must
+define PCRE_STATIC before including pcre.h, otherwise the pcre_malloc() and
+pcre_free() exported functions will be declared __declspec(dllimport), with
+unwanted results. So in this environment, uncomment the following line. */
+
+/* #define PCRE_STATIC */
+
+#include &lt;stdio.h&gt;
+#include &lt;string.h&gt;
+#include &lt;pcre.h&gt;
+
+#define OVECCOUNT 30 /* should be a multiple of 3 */
+
+
+int main(int argc, char **argv)
+{
+pcre *re;
+const char *error;
+char *pattern;
+char *subject;
+unsigned char *name_table;
+unsigned int option_bits;
+int erroffset;
+int find_all;
+int crlf_is_newline;
+int namecount;
+int name_entry_size;
+int ovector[OVECCOUNT];
+int subject_length;
+int rc, i;
+int utf8;
+
+
+/**************************************************************************
+* First, sort out the command line. There is only one possible option at *
+* the moment, "-g" to request repeated matching to find all occurrences, *
+* like Perl's /g option. We set the variable find_all to a non-zero value *
+* if the -g option is present. Apart from that, there must be exactly two *
+* arguments. *
+**************************************************************************/
+
+find_all = 0;
+for (i = 1; i &lt; argc; i++)
+ {
+ if (strcmp(argv[i], "-g") == 0) find_all = 1;
+ else break;
+ }
+
+/* After the options, we require exactly two arguments, which are the pattern,
+and the subject string. */
+
+if (argc - i != 2)
+ {
+ printf("Two arguments required: a regex and a subject string\n");
+ return 1;
+ }
+
+pattern = argv[i];
+subject = argv[i+1];
+subject_length = (int)strlen(subject);
+
+
+/*************************************************************************
+* Now we are going to compile the regular expression pattern, and handle *
+* and errors that are detected. *
+*************************************************************************/
+
+re = pcre_compile(
+ pattern, /* the pattern */
+ 0, /* default options */
+ &amp;error, /* for error message */
+ &amp;erroffset, /* for error offset */
+ NULL); /* use default character tables */
+
+/* Compilation failed: print the error message and exit */
+
+if (re == NULL)
+ {
+ printf("PCRE compilation failed at offset %d: %s\n", erroffset, error);
+ return 1;
+ }
+
+
+/*************************************************************************
+* If the compilation succeeded, we call PCRE again, in order to do a *
+* pattern match against the subject string. This does just ONE match. If *
+* further matching is needed, it will be done below. *
+*************************************************************************/
+
+rc = pcre_exec(
+ re, /* the compiled pattern */
+ NULL, /* no extra data - we didn't study the pattern */
+ subject, /* the subject string */
+ subject_length, /* the length of the subject */
+ 0, /* start at offset 0 in the subject */
+ 0, /* default options */
+ ovector, /* output vector for substring information */
+ OVECCOUNT); /* number of elements in the output vector */
+
+/* Matching failed: handle error cases */
+
+if (rc &lt; 0)
+ {
+ switch(rc)
+ {
+ case PCRE_ERROR_NOMATCH: printf("No match\n"); break;
+ /*
+ Handle other special cases if you like
+ */
+ default: printf("Matching error %d\n", rc); break;
+ }
+ pcre_free(re); /* Release memory used for the compiled pattern */
+ return 1;
+ }
+
+/* Match succeded */
+
+printf("\nMatch succeeded at offset %d\n", ovector[0]);
+
+
+/*************************************************************************
+* We have found the first match within the subject string. If the output *
+* vector wasn't big enough, say so. Then output any substrings that were *
+* captured. *
+*************************************************************************/
+
+/* The output vector wasn't big enough */
+
+if (rc == 0)
+ {
+ rc = OVECCOUNT/3;
+ printf("ovector only has room for %d captured substrings\n", rc - 1);
+ }
+
+/* Show substrings stored in the output vector by number. Obviously, in a real
+application you might want to do things other than print them. */
+
+for (i = 0; i &lt; rc; i++)
+ {
+ char *substring_start = subject + ovector[2*i];
+ int substring_length = ovector[2*i+1] - ovector[2*i];
+ printf("%2d: %.*s\n", i, substring_length, substring_start);
+ }
+
+
+/**************************************************************************
+* That concludes the basic part of this demonstration program. We have *
+* compiled a pattern, and performed a single match. The code that follows *
+* shows first how to access named substrings, and then how to code for *
+* repeated matches on the same subject. *
+**************************************************************************/
+
+/* See if there are any named substrings, and if so, show them by name. First
+we have to extract the count of named parentheses from the pattern. */
+
+(void)pcre_fullinfo(
+ re, /* the compiled pattern */
+ NULL, /* no extra data - we didn't study the pattern */
+ PCRE_INFO_NAMECOUNT, /* number of named substrings */
+ &amp;namecount); /* where to put the answer */
+
+if (namecount &lt;= 0) printf("No named substrings\n"); else
+ {
+ unsigned char *tabptr;
+ printf("Named substrings\n");
+
+ /* Before we can access the substrings, we must extract the table for
+ translating names to numbers, and the size of each entry in the table. */
+
+ (void)pcre_fullinfo(
+ re, /* the compiled pattern */
+ NULL, /* no extra data - we didn't study the pattern */
+ PCRE_INFO_NAMETABLE, /* address of the table */
+ &amp;name_table); /* where to put the answer */
+
+ (void)pcre_fullinfo(
+ re, /* the compiled pattern */
+ NULL, /* no extra data - we didn't study the pattern */
+ PCRE_INFO_NAMEENTRYSIZE, /* size of each entry in the table */
+ &amp;name_entry_size); /* where to put the answer */
+
+ /* Now we can scan the table and, for each entry, print the number, the name,
+ and the substring itself. */
+
+ tabptr = name_table;
+ for (i = 0; i &lt; namecount; i++)
+ {
+ int n = (tabptr[0] &lt;&lt; 8) | tabptr[1];
+ printf("(%d) %*s: %.*s\n", n, name_entry_size - 3, tabptr + 2,
+ ovector[2*n+1] - ovector[2*n], subject + ovector[2*n]);
+ tabptr += name_entry_size;
+ }
+ }
+
+
+/*************************************************************************
+* If the "-g" option was given on the command line, we want to continue *
+* to search for additional matches in the subject string, in a similar *
+* way to the /g option in Perl. This turns out to be trickier than you *
+* might think because of the possibility of matching an empty string. *
+* What happens is as follows: *
+* *
+* If the previous match was NOT for an empty string, we can just start *
+* the next match at the end of the previous one. *
+* *
+* If the previous match WAS for an empty string, we can't do that, as it *
+* would lead to an infinite loop. Instead, a special call of pcre_exec() *
+* is made with the PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED flags set. *
+* The first of these tells PCRE that an empty string at the start of the *
+* subject is not a valid match; other possibilities must be tried. The *
+* second flag restricts PCRE to one match attempt at the initial string *
+* position. If this match succeeds, an alternative to the empty string *
+* match has been found, and we can print it and proceed round the loop, *
+* advancing by the length of whatever was found. If this match does not *
+* succeed, we still stay in the loop, advancing by just one character. *
+* In UTF-8 mode, which can be set by (*UTF8) in the pattern, this may be *
+* more than one byte. *
+* *
+* However, there is a complication concerned with newlines. When the *
+* newline convention is such that CRLF is a valid newline, we must *
+* advance by two characters rather than one. The newline convention can *
+* be set in the regex by (*CR), etc.; if not, we must find the default. *
+*************************************************************************/
+
+if (!find_all) /* Check for -g */
+ {
+ pcre_free(re); /* Release the memory used for the compiled pattern */
+ return 0; /* Finish unless -g was given */
+ }
+
+/* Before running the loop, check for UTF-8 and whether CRLF is a valid newline
+sequence. First, find the options with which the regex was compiled; extract
+the UTF-8 state, and mask off all but the newline options. */
+
+(void)pcre_fullinfo(re, NULL, PCRE_INFO_OPTIONS, &amp;option_bits);
+utf8 = option_bits &amp; PCRE_UTF8;
+option_bits &amp;= PCRE_NEWLINE_CR|PCRE_NEWLINE_LF|PCRE_NEWLINE_CRLF|
+ PCRE_NEWLINE_ANY|PCRE_NEWLINE_ANYCRLF;
+
+/* If no newline options were set, find the default newline convention from the
+build configuration. */
+
+if (option_bits == 0)
+ {
+ int d;
+ (void)pcre_config(PCRE_CONFIG_NEWLINE, &amp;d);
+ /* Note that these values are always the ASCII ones, even in
+ EBCDIC environments. CR = 13, NL = 10. */
+ option_bits = (d == 13)? PCRE_NEWLINE_CR :
+ (d == 10)? PCRE_NEWLINE_LF :
+ (d == (13&lt;&lt;8 | 10))? PCRE_NEWLINE_CRLF :
+ (d == -2)? PCRE_NEWLINE_ANYCRLF :
+ (d == -1)? PCRE_NEWLINE_ANY : 0;
+ }
+
+/* See if CRLF is a valid newline sequence. */
+
+crlf_is_newline =
+ option_bits == PCRE_NEWLINE_ANY ||
+ option_bits == PCRE_NEWLINE_CRLF ||
+ option_bits == PCRE_NEWLINE_ANYCRLF;
+
+/* Loop for second and subsequent matches */
+
+for (;;)
+ {
+ int options = 0; /* Normally no options */
+ int start_offset = ovector[1]; /* Start at end of previous match */
+
+ /* If the previous match was for an empty string, we are finished if we are
+ at the end of the subject. Otherwise, arrange to run another match at the
+ same point to see if a non-empty match can be found. */
+
+ if (ovector[0] == ovector[1])
+ {
+ if (ovector[0] == subject_length) break;
+ options = PCRE_NOTEMPTY_ATSTART | PCRE_ANCHORED;
+ }
+
+ /* Run the next matching operation */
+
+ rc = pcre_exec(
+ re, /* the compiled pattern */
+ NULL, /* no extra data - we didn't study the pattern */
+ subject, /* the subject string */
+ subject_length, /* the length of the subject */
+ start_offset, /* starting offset in the subject */
+ options, /* options */
+ ovector, /* output vector for substring information */
+ OVECCOUNT); /* number of elements in the output vector */
+
+ /* This time, a result of NOMATCH isn't an error. If the value in "options"
+ is zero, it just means we have found all possible matches, so the loop ends.
+ Otherwise, it means we have failed to find a non-empty-string match at a
+ point where there was a previous empty-string match. In this case, we do what
+ Perl does: advance the matching position by one character, and continue. We
+ do this by setting the "end of previous match" offset, because that is picked
+ up at the top of the loop as the point at which to start again.
+
+ There are two complications: (a) When CRLF is a valid newline sequence, and
+ the current position is just before it, advance by an extra byte. (b)
+ Otherwise we must ensure that we skip an entire UTF-8 character if we are in
+ UTF-8 mode. */
+
+ if (rc == PCRE_ERROR_NOMATCH)
+ {
+ if (options == 0) break; /* All matches found */
+ ovector[1] = start_offset + 1; /* Advance one byte */
+ if (crlf_is_newline &amp;&amp; /* If CRLF is newline &amp; */
+ start_offset &lt; subject_length - 1 &amp;&amp; /* we are at CRLF, */
+ subject[start_offset] == '\r' &amp;&amp;
+ subject[start_offset + 1] == '\n')
+ ovector[1] += 1; /* Advance by one more. */
+ else if (utf8) /* Otherwise, ensure we */
+ { /* advance a whole UTF-8 */
+ while (ovector[1] &lt; subject_length) /* character. */
+ {
+ if ((subject[ovector[1]] &amp; 0xc0) != 0x80) break;
+ ovector[1] += 1;
+ }
+ }
+ continue; /* Go round the loop again */
+ }
+
+ /* Other matching errors are not recoverable. */
+
+ if (rc &lt; 0)
+ {
+ printf("Matching error %d\n", rc);
+ pcre_free(re); /* Release memory used for the compiled pattern */
+ return 1;
+ }
+
+ /* Match succeded */
+
+ printf("\nMatch succeeded again at offset %d\n", ovector[0]);
+
+ /* The match succeeded, but the output vector wasn't big enough. */
+
+ if (rc == 0)
+ {
+ rc = OVECCOUNT/3;
+ printf("ovector only has room for %d captured substrings\n", rc - 1);
+ }
+
+ /* As before, show substrings stored in the output vector by number, and then
+ also any named substrings. */
+
+ for (i = 0; i &lt; rc; i++)
+ {
+ char *substring_start = subject + ovector[2*i];
+ int substring_length = ovector[2*i+1] - ovector[2*i];
+ printf("%2d: %.*s\n", i, substring_length, substring_start);
+ }
+
+ if (namecount &lt;= 0) printf("No named substrings\n"); else
+ {
+ unsigned char *tabptr = name_table;
+ printf("Named substrings\n");
+ for (i = 0; i &lt; namecount; i++)
+ {
+ int n = (tabptr[0] &lt;&lt; 8) | tabptr[1];
+ printf("(%d) %*s: %.*s\n", n, name_entry_size - 3, tabptr + 2,
+ ovector[2*n+1] - ovector[2*n], subject + ovector[2*n]);
+ tabptr += name_entry_size;
+ }
+ }
+ } /* End of loop to find second and subsequent matches */
+
+printf("\n");
+pcre_free(re); /* Release memory used for the compiled pattern */
+return 0;
+}
+
+/* End of pcredemo.c */
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcregrep.html b/doc/html/pcregrep.html
new file mode 100644
index 0000000..dacbb49
--- /dev/null
+++ b/doc/html/pcregrep.html
@@ -0,0 +1,759 @@
+<html>
+<head>
+<title>pcregrep specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcregrep man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<ul>
+<li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
+<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
+<li><a name="TOC3" href="#SEC3">SUPPORT FOR COMPRESSED FILES</a>
+<li><a name="TOC4" href="#SEC4">BINARY FILES</a>
+<li><a name="TOC5" href="#SEC5">OPTIONS</a>
+<li><a name="TOC6" href="#SEC6">ENVIRONMENT VARIABLES</a>
+<li><a name="TOC7" href="#SEC7">NEWLINES</a>
+<li><a name="TOC8" href="#SEC8">OPTIONS COMPATIBILITY</a>
+<li><a name="TOC9" href="#SEC9">OPTIONS WITH DATA</a>
+<li><a name="TOC10" href="#SEC10">MATCHING ERRORS</a>
+<li><a name="TOC11" href="#SEC11">DIAGNOSTICS</a>
+<li><a name="TOC12" href="#SEC12">SEE ALSO</a>
+<li><a name="TOC13" href="#SEC13">AUTHOR</a>
+<li><a name="TOC14" href="#SEC14">REVISION</a>
+</ul>
+<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
+<P>
+<b>pcregrep [options] [long options] [pattern] [path1 path2 ...]</b>
+</P>
+<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
+<P>
+<b>pcregrep</b> searches files for character patterns, in the same way as other
+grep commands do, but it uses the PCRE regular expression library to support
+patterns that are compatible with the regular expressions of Perl 5. See
+<a href="pcresyntax.html"><b>pcresyntax</b>(3)</a>
+for a quick-reference summary of pattern syntax, or
+<a href="pcrepattern.html"><b>pcrepattern</b>(3)</a>
+for a full description of the syntax and semantics of the regular expressions
+that PCRE supports.
+</P>
+<P>
+Patterns, whether supplied on the command line or in a separate file, are given
+without delimiters. For example:
+<pre>
+ pcregrep Thursday /etc/motd
+</pre>
+If you attempt to use delimiters (for example, by surrounding a pattern with
+slashes, as is common in Perl scripts), they are interpreted as part of the
+pattern. Quotes can of course be used to delimit patterns on the command line
+because they are interpreted by the shell, and indeed quotes are required if a
+pattern contains white space or shell metacharacters.
+</P>
+<P>
+The first argument that follows any option settings is treated as the single
+pattern to be matched when neither <b>-e</b> nor <b>-f</b> is present.
+Conversely, when one or both of these options are used to specify patterns, all
+arguments are treated as path names. At least one of <b>-e</b>, <b>-f</b>, or an
+argument pattern must be provided.
+</P>
+<P>
+If no files are specified, <b>pcregrep</b> reads the standard input. The
+standard input can also be referenced by a name consisting of a single hyphen.
+For example:
+<pre>
+ pcregrep some-pattern /file1 - /file3
+</pre>
+By default, each line that matches a pattern is copied to the standard
+output, and if there is more than one file, the file name is output at the
+start of each line, followed by a colon. However, there are options that can
+change how <b>pcregrep</b> behaves. In particular, the <b>-M</b> option makes it
+possible to search for patterns that span line boundaries. What defines a line
+boundary is controlled by the <b>-N</b> (<b>--newline</b>) option.
+</P>
+<P>
+The amount of memory used for buffering files that are being scanned is
+controlled by a parameter that can be set by the <b>--buffer-size</b> option.
+The default value for this parameter is specified when <b>pcregrep</b> is built,
+with the default default being 20K. A block of memory three times this size is
+used (to allow for buffering "before" and "after" lines). An error occurs if a
+line overflows the buffer.
+</P>
+<P>
+Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater.
+BUFSIZ is defined in <b>&#60;stdio.h&#62;</b>. When there is more than one pattern
+(specified by the use of <b>-e</b> and/or <b>-f</b>), each pattern is applied to
+each line in the order in which they are defined, except that all the <b>-e</b>
+patterns are tried before the <b>-f</b> patterns.
+</P>
+<P>
+By default, as soon as one pattern matches a line, no further patterns are
+considered. However, if <b>--colour</b> (or <b>--color</b>) is used to colour the
+matching substrings, or if <b>--only-matching</b>, <b>--file-offsets</b>, or
+<b>--line-offsets</b> is used to output only the part of the line that matched
+(either shown literally, or as an offset), scanning resumes immediately
+following the match, so that further matches on the same line can be found. If
+there are multiple patterns, they are all tried on the remainder of the line,
+but patterns that follow the one that matched are not tried on the earlier part
+of the line.
+</P>
+<P>
+This behaviour means that the order in which multiple patterns are specified
+can affect the output when one of the above options is used. This is no longer
+the same behaviour as GNU grep, which now manages to display earlier matches
+for later patterns (as long as there is no overlap).
+</P>
+<P>
+Patterns that can match an empty string are accepted, but empty string
+matches are never recognized. An example is the pattern "(super)?(man)?", in
+which all components are optional. This pattern finds all occurrences of both
+"super" and "man"; the output differs from matching with "super|man" when only
+the matching substrings are being shown.
+</P>
+<P>
+If the <b>LC_ALL</b> or <b>LC_CTYPE</b> environment variable is set,
+<b>pcregrep</b> uses the value to set a locale when calling the PCRE library.
+The <b>--locale</b> option can be used to override this.
+</P>
+<br><a name="SEC3" href="#TOC1">SUPPORT FOR COMPRESSED FILES</a><br>
+<P>
+It is possible to compile <b>pcregrep</b> so that it uses <b>libz</b> or
+<b>libbz2</b> to read files whose names end in <b>.gz</b> or <b>.bz2</b>,
+respectively. You can find out whether your binary has support for one or both
+of these file types by running it with the <b>--help</b> option. If the
+appropriate support is not present, files are treated as plain text. The
+standard input is always so treated.
+</P>
+<br><a name="SEC4" href="#TOC1">BINARY FILES</a><br>
+<P>
+By default, a file that contains a binary zero byte within the first 1024 bytes
+is identified as a binary file, and is processed specially. (GNU grep also
+identifies binary files in this manner.) See the <b>--binary-files</b> option
+for a means of changing the way binary files are handled.
+</P>
+<br><a name="SEC5" href="#TOC1">OPTIONS</a><br>
+<P>
+The order in which some of the options appear can affect the output. For
+example, both the <b>-h</b> and <b>-l</b> options affect the printing of file
+names. Whichever comes later in the command line will be the one that takes
+effect. Similarly, except where noted below, if an option is given twice, the
+later setting is used. Numerical values for options may be followed by K or M,
+to signify multiplication by 1024 or 1024*1024 respectively.
+</P>
+<P>
+<b>--</b>
+This terminates the list of options. It is useful if the next item on the
+command line starts with a hyphen but is not an option. This allows for the
+processing of patterns and filenames that start with hyphens.
+</P>
+<P>
+<b>-A</b> <i>number</i>, <b>--after-context=</b><i>number</i>
+Output <i>number</i> lines of context after each matching line. If filenames
+and/or line numbers are being output, a hyphen separator is used instead of a
+colon for the context lines. A line containing "--" is output between each
+group of lines, unless they are in fact contiguous in the input file. The value
+of <i>number</i> is expected to be relatively small. However, <b>pcregrep</b>
+guarantees to have up to 8K of following text available for context output.
+</P>
+<P>
+<b>-a</b>, <b>--text</b>
+Treat binary files as text. This is equivalent to
+<b>--binary-files</b>=<i>text</i>.
+</P>
+<P>
+<b>-B</b> <i>number</i>, <b>--before-context=</b><i>number</i>
+Output <i>number</i> lines of context before each matching line. If filenames
+and/or line numbers are being output, a hyphen separator is used instead of a
+colon for the context lines. A line containing "--" is output between each
+group of lines, unless they are in fact contiguous in the input file. The value
+of <i>number</i> is expected to be relatively small. However, <b>pcregrep</b>
+guarantees to have up to 8K of preceding text available for context output.
+</P>
+<P>
+<b>--binary-files=</b><i>word</i>
+Specify how binary files are to be processed. If the word is "binary" (the
+default), pattern matching is performed on binary files, but the only output is
+"Binary file &#60;name&#62; matches" when a match succeeds. If the word is "text",
+which is equivalent to the <b>-a</b> or <b>--text</b> option, binary files are
+processed in the same way as any other file. In this case, when a match
+succeeds, the output may be binary garbage, which can have nasty effects if
+sent to a terminal. If the word is "without-match", which is equivalent to the
+<b>-I</b> option, binary files are not processed at all; they are assumed not to
+be of interest.
+</P>
+<P>
+<b>--buffer-size=</b><i>number</i>
+Set the parameter that controls how much memory is used for buffering files
+that are being scanned.
+</P>
+<P>
+<b>-C</b> <i>number</i>, <b>--context=</b><i>number</i>
+Output <i>number</i> lines of context both before and after each matching line.
+This is equivalent to setting both <b>-A</b> and <b>-B</b> to the same value.
+</P>
+<P>
+<b>-c</b>, <b>--count</b>
+Do not output individual lines from the files that are being scanned; instead
+output the number of lines that would otherwise have been shown. If no lines
+are selected, the number zero is output. If several files are are being
+scanned, a count is output for each of them. However, if the
+<b>--files-with-matches</b> option is also used, only those files whose counts
+are greater than zero are listed. When <b>-c</b> is used, the <b>-A</b>,
+<b>-B</b>, and <b>-C</b> options are ignored.
+</P>
+<P>
+<b>--colour</b>, <b>--color</b>
+If this option is given without any data, it is equivalent to "--colour=auto".
+If data is required, it must be given in the same shell item, separated by an
+equals sign.
+</P>
+<P>
+<b>--colour=</b><i>value</i>, <b>--color=</b><i>value</i>
+This option specifies under what circumstances the parts of a line that matched
+a pattern should be coloured in the output. By default, the output is not
+coloured. The value (which is optional, see above) may be "never", "always", or
+"auto". In the latter case, colouring happens only if the standard output is
+connected to a terminal. More resources are used when colouring is enabled,
+because <b>pcregrep</b> has to search for all possible matches in a line, not
+just one, in order to colour them all.
+<br>
+<br>
+The colour that is used can be specified by setting the environment variable
+PCREGREP_COLOUR or PCREGREP_COLOR. The value of this variable should be a
+string of two numbers, separated by a semicolon. They are copied directly into
+the control string for setting colour on a terminal, so it is your
+responsibility to ensure that they make sense. If neither of the environment
+variables is set, the default is "1;31", which gives red.
+</P>
+<P>
+<b>-D</b> <i>action</i>, <b>--devices=</b><i>action</i>
+If an input path is not a regular file or a directory, "action" specifies how
+it is to be processed. Valid values are "read" (the default) or "skip"
+(silently skip the path).
+</P>
+<P>
+<b>-d</b> <i>action</i>, <b>--directories=</b><i>action</i>
+If an input path is a directory, "action" specifies how it is to be processed.
+Valid values are "read" (the default in non-Windows environments, for
+compatibility with GNU grep), "recurse" (equivalent to the <b>-r</b> option), or
+"skip" (silently skip the path, the default in Windows environments). In the
+"read" case, directories are read as if they were ordinary files. In some
+operating systems the effect of reading a directory like this is an immediate
+end-of-file; in others it may provoke an error.
+</P>
+<P>
+<b>-e</b> <i>pattern</i>, <b>--regex=</b><i>pattern</i>, <b>--regexp=</b><i>pattern</i>
+Specify a pattern to be matched. This option can be used multiple times in
+order to specify several patterns. It can also be used as a way of specifying a
+single pattern that starts with a hyphen. When <b>-e</b> is used, no argument
+pattern is taken from the command line; all arguments are treated as file
+names. There is no limit to the number of patterns. They are applied to each
+line in the order in which they are defined until one matches.
+<br>
+<br>
+If <b>-f</b> is used with <b>-e</b>, the command line patterns are matched first,
+followed by the patterns from the file(s), independent of the order in which
+these options are specified. Note that multiple use of <b>-e</b> is not the same
+as a single pattern with alternatives. For example, X|Y finds the first
+character in a line that is X or Y, whereas if the two patterns are given
+separately, with X first, <b>pcregrep</b> finds X if it is present, even if it
+follows Y in the line. It finds Y only if there is no X in the line. This
+matters only if you are using <b>-o</b> or <b>--colo(u)r</b> to show the part(s)
+of the line that matched.
+</P>
+<P>
+<b>--exclude</b>=<i>pattern</i>
+Files (but not directories) whose names match the pattern are skipped without
+being processed. This applies to all files, whether listed on the command line,
+obtained from <b>--file-list</b>, or by scanning a directory. The pattern is a
+PCRE regular expression, and is matched against the final component of the file
+name, not the entire path. The <b>-F</b>, <b>-w</b>, and <b>-x</b> options do not
+apply to this pattern. The option may be given any number of times in order to
+specify multiple patterns. If a file name matches both an <b>--include</b>
+and an <b>--exclude</b> pattern, it is excluded. There is no short form for this
+option.
+</P>
+<P>
+<b>--exclude-from=</b><i>filename</i>
+Treat each non-empty line of the file as the data for an <b>--exclude</b>
+option. What constitutes a newline when reading the file is the operating
+system's default. The <b>--newline</b> option has no effect on this option. This
+option may be given more than once in order to specify a number of files to
+read.
+</P>
+<P>
+<b>--exclude-dir</b>=<i>pattern</i>
+Directories whose names match the pattern are skipped without being processed,
+whatever the setting of the <b>--recursive</b> option. This applies to all
+directories, whether listed on the command line, obtained from
+<b>--file-list</b>, or by scanning a parent directory. The pattern is a PCRE
+regular expression, and is matched against the final component of the directory
+name, not the entire path. The <b>-F</b>, <b>-w</b>, and <b>-x</b> options do not
+apply to this pattern. The option may be given any number of times in order to
+specify more than one pattern. If a directory matches both <b>--include-dir</b>
+and <b>--exclude-dir</b>, it is excluded. There is no short form for this
+option.
+</P>
+<P>
+<b>-F</b>, <b>--fixed-strings</b>
+Interpret each data-matching pattern as a list of fixed strings, separated by
+newlines, instead of as a regular expression. What constitutes a newline for
+this purpose is controlled by the <b>--newline</b> option. The <b>-w</b> (match
+as a word) and <b>-x</b> (match whole line) options can be used with <b>-F</b>.
+They apply to each of the fixed strings. A line is selected if any of the fixed
+strings are found in it (subject to <b>-w</b> or <b>-x</b>, if present). This
+option applies only to the patterns that are matched against the contents of
+files; it does not apply to patterns specified by any of the <b>--include</b> or
+<b>--exclude</b> options.
+</P>
+<P>
+<b>-f</b> <i>filename</i>, <b>--file=</b><i>filename</i>
+Read patterns from the file, one per line, and match them against
+each line of input. What constitutes a newline when reading the file is the
+operating system's default. The <b>--newline</b> option has no effect on this
+option. Trailing white space is removed from each line, and blank lines are
+ignored. An empty file contains no patterns and therefore matches nothing. See
+also the comments about multiple patterns versus a single pattern with
+alternatives in the description of <b>-e</b> above.
+<br>
+<br>
+If this option is given more than once, all the specified files are
+read. A data line is output if any of the patterns match it. A filename can
+be given as "-" to refer to the standard input. When <b>-f</b> is used, patterns
+specified on the command line using <b>-e</b> may also be present; they are
+tested before the file's patterns. However, no other pattern is taken from the
+command line; all arguments are treated as the names of paths to be searched.
+</P>
+<P>
+<b>--file-list</b>=<i>filename</i>
+Read a list of files and/or directories that are to be scanned from the given
+file, one per line. Trailing white space is removed from each line, and blank
+lines are ignored. These paths are processed before any that are listed on the
+command line. The filename can be given as "-" to refer to the standard input.
+If <b>--file</b> and <b>--file-list</b> are both specified as "-", patterns are
+read first. This is useful only when the standard input is a terminal, from
+which further lines (the list of files) can be read after an end-of-file
+indication. If this option is given more than once, all the specified files are
+read.
+</P>
+<P>
+<b>--file-offsets</b>
+Instead of showing lines or parts of lines that match, show each match as an
+offset from the start of the file and a length, separated by a comma. In this
+mode, no context is shown. That is, the <b>-A</b>, <b>-B</b>, and <b>-C</b>
+options are ignored. If there is more than one match in a line, each of them is
+shown separately. This option is mutually exclusive with <b>--line-offsets</b>
+and <b>--only-matching</b>.
+</P>
+<P>
+<b>-H</b>, <b>--with-filename</b>
+Force the inclusion of the filename at the start of output lines when searching
+a single file. By default, the filename is not shown in this case. For matching
+lines, the filename is followed by a colon; for context lines, a hyphen
+separator is used. If a line number is also being output, it follows the file
+name.
+</P>
+<P>
+<b>-h</b>, <b>--no-filename</b>
+Suppress the output filenames when searching multiple files. By default,
+filenames are shown when multiple files are searched. For matching lines, the
+filename is followed by a colon; for context lines, a hyphen separator is used.
+If a line number is also being output, it follows the file name.
+</P>
+<P>
+<b>--help</b>
+Output a help message, giving brief details of the command options and file
+type support, and then exit. Anything else on the command line is
+ignored.
+</P>
+<P>
+<b>-I</b>
+Treat binary files as never matching. This is equivalent to
+<b>--binary-files</b>=<i>without-match</i>.
+</P>
+<P>
+<b>-i</b>, <b>--ignore-case</b>
+Ignore upper/lower case distinctions during comparisons.
+</P>
+<P>
+<b>--include</b>=<i>pattern</i>
+If any <b>--include</b> patterns are specified, the only files that are
+processed are those that match one of the patterns (and do not match an
+<b>--exclude</b> pattern). This option does not affect directories, but it
+applies to all files, whether listed on the command line, obtained from
+<b>--file-list</b>, or by scanning a directory. The pattern is a PCRE regular
+expression, and is matched against the final component of the file name, not
+the entire path. The <b>-F</b>, <b>-w</b>, and <b>-x</b> options do not apply to
+this pattern. The option may be given any number of times. If a file name
+matches both an <b>--include</b> and an <b>--exclude</b> pattern, it is excluded.
+There is no short form for this option.
+</P>
+<P>
+<b>--include-from=</b><i>filename</i>
+Treat each non-empty line of the file as the data for an <b>--include</b>
+option. What constitutes a newline for this purpose is the operating system's
+default. The <b>--newline</b> option has no effect on this option. This option
+may be given any number of times; all the files are read.
+</P>
+<P>
+<b>--include-dir</b>=<i>pattern</i>
+If any <b>--include-dir</b> patterns are specified, the only directories that
+are processed are those that match one of the patterns (and do not match an
+<b>--exclude-dir</b> pattern). This applies to all directories, whether listed
+on the command line, obtained from <b>--file-list</b>, or by scanning a parent
+directory. The pattern is a PCRE regular expression, and is matched against the
+final component of the directory name, not the entire path. The <b>-F</b>,
+<b>-w</b>, and <b>-x</b> options do not apply to this pattern. The option may be
+given any number of times. If a directory matches both <b>--include-dir</b> and
+<b>--exclude-dir</b>, it is excluded. There is no short form for this option.
+</P>
+<P>
+<b>-L</b>, <b>--files-without-match</b>
+Instead of outputting lines from the files, just output the names of the files
+that do not contain any lines that would have been output. Each file name is
+output once, on a separate line.
+</P>
+<P>
+<b>-l</b>, <b>--files-with-matches</b>
+Instead of outputting lines from the files, just output the names of the files
+containing lines that would have been output. Each file name is output
+once, on a separate line. Searching normally stops as soon as a matching line
+is found in a file. However, if the <b>-c</b> (count) option is also used,
+matching continues in order to obtain the correct count, and those files that
+have at least one match are listed along with their counts. Using this option
+with <b>-c</b> is a way of suppressing the listing of files with no matches.
+</P>
+<P>
+<b>--label</b>=<i>name</i>
+This option supplies a name to be used for the standard input when file names
+are being output. If not supplied, "(standard input)" is used. There is no
+short form for this option.
+</P>
+<P>
+<b>--line-buffered</b>
+When this option is given, input is read and processed line by line, and the
+output is flushed after each write. By default, input is read in large chunks,
+unless <b>pcregrep</b> can determine that it is reading from a terminal (which
+is currently possible only in Unix-like environments). Output to terminal is
+normally automatically flushed by the operating system. This option can be
+useful when the input or output is attached to a pipe and you do not want
+<b>pcregrep</b> to buffer up large amounts of data. However, its use will affect
+performance, and the <b>-M</b> (multiline) option ceases to work.
+</P>
+<P>
+<b>--line-offsets</b>
+Instead of showing lines or parts of lines that match, show each match as a
+line number, the offset from the start of the line, and a length. The line
+number is terminated by a colon (as usual; see the <b>-n</b> option), and the
+offset and length are separated by a comma. In this mode, no context is shown.
+That is, the <b>-A</b>, <b>-B</b>, and <b>-C</b> options are ignored. If there is
+more than one match in a line, each of them is shown separately. This option is
+mutually exclusive with <b>--file-offsets</b> and <b>--only-matching</b>.
+</P>
+<P>
+<b>--locale</b>=<i>locale-name</i>
+This option specifies a locale to be used for pattern matching. It overrides
+the value in the <b>LC_ALL</b> or <b>LC_CTYPE</b> environment variables. If no
+locale is specified, the PCRE library's default (usually the "C" locale) is
+used. There is no short form for this option.
+</P>
+<P>
+<b>--match-limit</b>=<i>number</i>
+Processing some regular expression patterns can require a very large amount of
+memory, leading in some cases to a program crash if not enough is available.
+Other patterns may take a very long time to search for all possible matching
+strings. The <b>pcre_exec()</b> function that is called by <b>pcregrep</b> to do
+the matching has two parameters that can limit the resources that it uses.
+<br>
+<br>
+The <b>--match-limit</b> option provides a means of limiting resource usage
+when processing patterns that are not going to match, but which have a very
+large number of possibilities in their search trees. The classic example is a
+pattern that uses nested unlimited repeats. Internally, PCRE uses a function
+called <b>match()</b> which it calls repeatedly (sometimes recursively). The
+limit set by <b>--match-limit</b> is imposed on the number of times this
+function is called during a match, which has the effect of limiting the amount
+of backtracking that can take place.
+<br>
+<br>
+The <b>--recursion-limit</b> option is similar to <b>--match-limit</b>, but
+instead of limiting the total number of times that <b>match()</b> is called, it
+limits the depth of recursive calls, which in turn limits the amount of memory
+that can be used. The recursion depth is a smaller number than the total number
+of calls, because not all calls to <b>match()</b> are recursive. This limit is
+of use only if it is set smaller than <b>--match-limit</b>.
+<br>
+<br>
+There are no short forms for these options. The default settings are specified
+when the PCRE library is compiled, with the default default being 10 million.
+</P>
+<P>
+<b>-M</b>, <b>--multiline</b>
+Allow patterns to match more than one line. When this option is given, patterns
+may usefully contain literal newline characters and internal occurrences of ^
+and $ characters. The output for a successful match may consist of more than
+one line, the last of which is the one in which the match ended. If the matched
+string ends with a newline sequence the output ends at the end of that line.
+<br>
+<br>
+When this option is set, the PCRE library is called in "multiline" mode.
+There is a limit to the number of lines that can be matched, imposed by the way
+that <b>pcregrep</b> buffers the input file as it scans it. However,
+<b>pcregrep</b> ensures that at least 8K characters or the rest of the document
+(whichever is the shorter) are available for forward matching, and similarly
+the previous 8K characters (or all the previous characters, if fewer than 8K)
+are guaranteed to be available for lookbehind assertions. This option does not
+work when input is read line by line (see \fP--line-buffered\fP.)
+</P>
+<P>
+<b>-N</b> <i>newline-type</i>, <b>--newline</b>=<i>newline-type</i>
+The PCRE library supports five different conventions for indicating
+the ends of lines. They are the single-character sequences CR (carriage return)
+and LF (linefeed), the two-character sequence CRLF, an "anycrlf" convention,
+which recognizes any of the preceding three types, and an "any" convention, in
+which any Unicode line ending sequence is assumed to end a line. The Unicode
+sequences are the three just mentioned, plus VT (vertical tab, U+000B), FF
+(form feed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and
+PS (paragraph separator, U+2029).
+<br>
+<br>
+When the PCRE library is built, a default line-ending sequence is specified.
+This is normally the standard sequence for the operating system. Unless
+otherwise specified by this option, <b>pcregrep</b> uses the library's default.
+The possible values for this option are CR, LF, CRLF, ANYCRLF, or ANY. This
+makes it possible to use <b>pcregrep</b> to scan files that have come from other
+environments without having to modify their line endings. If the data that is
+being scanned does not agree with the convention set by this option,
+<b>pcregrep</b> may behave in strange ways. Note that this option does not
+apply to files specified by the <b>-f</b>, <b>--exclude-from</b>, or
+<b>--include-from</b> options, which are expected to use the operating system's
+standard newline sequence.
+</P>
+<P>
+<b>-n</b>, <b>--line-number</b>
+Precede each output line by its line number in the file, followed by a colon
+for matching lines or a hyphen for context lines. If the filename is also being
+output, it precedes the line number. This option is forced if
+<b>--line-offsets</b> is used.
+</P>
+<P>
+<b>--no-jit</b>
+If the PCRE library is built with support for just-in-time compiling (which
+speeds up matching), <b>pcregrep</b> automatically makes use of this, unless it
+was explicitly disabled at build time. This option can be used to disable the
+use of JIT at run time. It is provided for testing and working round problems.
+It should never be needed in normal use.
+</P>
+<P>
+<b>-o</b>, <b>--only-matching</b>
+Show only the part of the line that matched a pattern instead of the whole
+line. In this mode, no context is shown. That is, the <b>-A</b>, <b>-B</b>, and
+<b>-C</b> options are ignored. If there is more than one match in a line, each
+of them is shown separately. If <b>-o</b> is combined with <b>-v</b> (invert the
+sense of the match to find non-matching lines), no output is generated, but the
+return code is set appropriately. If the matched portion of the line is empty,
+nothing is output unless the file name or line number are being printed, in
+which case they are shown on an otherwise empty line. This option is mutually
+exclusive with <b>--file-offsets</b> and <b>--line-offsets</b>.
+</P>
+<P>
+<b>-o</b><i>number</i>, <b>--only-matching</b>=<i>number</i>
+Show only the part of the line that matched the capturing parentheses of the
+given number. Up to 32 capturing parentheses are supported, and -o0 is
+equivalent to <b>-o</b> without a number. Because these options can be given
+without an argument (see above), if an argument is present, it must be given in
+the same shell item, for example, -o3 or --only-matching=2. The comments given
+for the non-argument case above also apply to this case. If the specified
+capturing parentheses do not exist in the pattern, or were not set in the
+match, nothing is output unless the file name or line number are being printed.
+<br>
+<br>
+If this option is given multiple times, multiple substrings are output, in the
+order the options are given. For example, -o3 -o1 -o3 causes the substrings
+matched by capturing parentheses 3 and 1 and then 3 again to be output. By
+default, there is no separator (but see the next option).
+</P>
+<P>
+<b>--om-separator</b>=<i>text</i>
+Specify a separating string for multiple occurrences of <b>-o</b>. The default
+is an empty string. Separating strings are never coloured.
+</P>
+<P>
+<b>-q</b>, <b>--quiet</b>
+Work quietly, that is, display nothing except error messages. The exit
+status indicates whether or not any matches were found.
+</P>
+<P>
+<b>-r</b>, <b>--recursive</b>
+If any given path is a directory, recursively scan the files it contains,
+taking note of any <b>--include</b> and <b>--exclude</b> settings. By default, a
+directory is read as a normal file; in some operating systems this gives an
+immediate end-of-file. This option is a shorthand for setting the <b>-d</b>
+option to "recurse".
+</P>
+<P>
+<b>--recursion-limit</b>=<i>number</i>
+See <b>--match-limit</b> above.
+</P>
+<P>
+<b>-s</b>, <b>--no-messages</b>
+Suppress error messages about non-existent or unreadable files. Such files are
+quietly skipped. However, the return code is still 2, even if matches were
+found in other files.
+</P>
+<P>
+<b>-u</b>, <b>--utf-8</b>
+Operate in UTF-8 mode. This option is available only if PCRE has been compiled
+with UTF-8 support. All patterns (including those for any <b>--exclude</b> and
+<b>--include</b> options) and all subject lines that are scanned must be valid
+strings of UTF-8 characters.
+</P>
+<P>
+<b>-V</b>, <b>--version</b>
+Write the version numbers of <b>pcregrep</b> and the PCRE library to the
+standard output and then exit. Anything else on the command line is
+ignored.
+</P>
+<P>
+<b>-v</b>, <b>--invert-match</b>
+Invert the sense of the match, so that lines which do <i>not</i> match any of
+the patterns are the ones that are found.
+</P>
+<P>
+<b>-w</b>, <b>--word-regex</b>, <b>--word-regexp</b>
+Force the patterns to match only whole words. This is equivalent to having \b
+at the start and end of the pattern. This option applies only to the patterns
+that are matched against the contents of files; it does not apply to patterns
+specified by any of the <b>--include</b> or <b>--exclude</b> options.
+</P>
+<P>
+<b>-x</b>, <b>--line-regex</b>, <b>--line-regexp</b>
+Force the patterns to be anchored (each must start matching at the beginning of
+a line) and in addition, require them to match entire lines. This is equivalent
+to having ^ and $ characters at the start and end of each alternative branch in
+every pattern. This option applies only to the patterns that are matched
+against the contents of files; it does not apply to patterns specified by any
+of the <b>--include</b> or <b>--exclude</b> options.
+</P>
+<br><a name="SEC6" href="#TOC1">ENVIRONMENT VARIABLES</a><br>
+<P>
+The environment variables <b>LC_ALL</b> and <b>LC_CTYPE</b> are examined, in that
+order, for a locale. The first one that is set is used. This can be overridden
+by the <b>--locale</b> option. If no locale is set, the PCRE library's default
+(usually the "C" locale) is used.
+</P>
+<br><a name="SEC7" href="#TOC1">NEWLINES</a><br>
+<P>
+The <b>-N</b> (<b>--newline</b>) option allows <b>pcregrep</b> to scan files with
+different newline conventions from the default. Any parts of the input files
+that are written to the standard output are copied identically, with whatever
+newline sequences they have in the input. However, the setting of this option
+does not affect the interpretation of files specified by the <b>-f</b>,
+<b>--exclude-from</b>, or <b>--include-from</b> options, which are assumed to use
+the operating system's standard newline sequence, nor does it affect the way in
+which <b>pcregrep</b> writes informational messages to the standard error and
+output streams. For these it uses the string "\n" to indicate newlines,
+relying on the C I/O library to convert this to an appropriate sequence.
+</P>
+<br><a name="SEC8" href="#TOC1">OPTIONS COMPATIBILITY</a><br>
+<P>
+Many of the short and long forms of <b>pcregrep</b>'s options are the same
+as in the GNU <b>grep</b> program. Any long option of the form
+<b>--xxx-regexp</b> (GNU terminology) is also available as <b>--xxx-regex</b>
+(PCRE terminology). However, the <b>--file-list</b>, <b>--file-offsets</b>,
+<b>--include-dir</b>, <b>--line-offsets</b>, <b>--locale</b>, <b>--match-limit</b>,
+<b>-M</b>, <b>--multiline</b>, <b>-N</b>, <b>--newline</b>, <b>--om-separator</b>,
+<b>--recursion-limit</b>, <b>-u</b>, and <b>--utf-8</b> options are specific to
+<b>pcregrep</b>, as is the use of the <b>--only-matching</b> option with a
+capturing parentheses number.
+</P>
+<P>
+Although most of the common options work the same way, a few are different in
+<b>pcregrep</b>. For example, the <b>--include</b> option's argument is a glob
+for GNU <b>grep</b>, but a regular expression for <b>pcregrep</b>. If both the
+<b>-c</b> and <b>-l</b> options are given, GNU grep lists only file names,
+without counts, but <b>pcregrep</b> gives the counts.
+</P>
+<br><a name="SEC9" href="#TOC1">OPTIONS WITH DATA</a><br>
+<P>
+There are four different ways in which an option with data can be specified.
+If a short form option is used, the data may follow immediately, or (with one
+exception) in the next command line item. For example:
+<pre>
+ -f/some/file
+ -f /some/file
+</pre>
+The exception is the <b>-o</b> option, which may appear with or without data.
+Because of this, if data is present, it must follow immediately in the same
+item, for example -o3.
+</P>
+<P>
+If a long form option is used, the data may appear in the same command line
+item, separated by an equals character, or (with two exceptions) it may appear
+in the next command line item. For example:
+<pre>
+ --file=/some/file
+ --file /some/file
+</pre>
+Note, however, that if you want to supply a file name beginning with ~ as data
+in a shell command, and have the shell expand ~ to a home directory, you must
+separate the file name from the option, because the shell does not treat ~
+specially unless it is at the start of an item.
+</P>
+<P>
+The exceptions to the above are the <b>--colour</b> (or <b>--color</b>) and
+<b>--only-matching</b> options, for which the data is optional. If one of these
+options does have data, it must be given in the first form, using an equals
+character. Otherwise <b>pcregrep</b> will assume that it has no data.
+</P>
+<br><a name="SEC10" href="#TOC1">MATCHING ERRORS</a><br>
+<P>
+It is possible to supply a regular expression that takes a very long time to
+fail to match certain lines. Such patterns normally involve nested indefinite
+repeats, for example: (a+)*\d when matched against a line of a's with no final
+digit. The PCRE matching function has a resource limit that causes it to abort
+in these circumstances. If this happens, <b>pcregrep</b> outputs an error
+message and the line that caused the problem to the standard error stream. If
+there are more than 20 such errors, <b>pcregrep</b> gives up.
+</P>
+<P>
+The <b>--match-limit</b> option of <b>pcregrep</b> can be used to set the overall
+resource limit; there is a second option called <b>--recursion-limit</b> that
+sets a limit on the amount of memory (usually stack) that is used (see the
+discussion of these options above).
+</P>
+<br><a name="SEC11" href="#TOC1">DIAGNOSTICS</a><br>
+<P>
+Exit status is 0 if any matches were found, 1 if no matches were found, and 2
+for syntax errors, overlong lines, non-existent or inaccessible files (even if
+matches were found in other files) or too many matching errors. Using the
+<b>-s</b> option to suppress error messages about inaccessible files does not
+affect the return code.
+</P>
+<br><a name="SEC12" href="#TOC1">SEE ALSO</a><br>
+<P>
+<b>pcrepattern</b>(3), <b>pcresyntax</b>(3), <b>pcretest</b>(1).
+</P>
+<br><a name="SEC13" href="#TOC1">AUTHOR</a><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><a name="SEC14" href="#TOC1">REVISION</a><br>
+<P>
+Last updated: 03 April 2014
+<br>
+Copyright &copy; 1997-2014 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcrejit.html b/doc/html/pcrejit.html
new file mode 100644
index 0000000..210f1da
--- /dev/null
+++ b/doc/html/pcrejit.html
@@ -0,0 +1,452 @@
+<html>
+<head>
+<title>pcrejit specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcrejit man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<ul>
+<li><a name="TOC1" href="#SEC1">PCRE JUST-IN-TIME COMPILER SUPPORT</a>
+<li><a name="TOC2" href="#SEC2">8-BIT, 16-BIT AND 32-BIT SUPPORT</a>
+<li><a name="TOC3" href="#SEC3">AVAILABILITY OF JIT SUPPORT</a>
+<li><a name="TOC4" href="#SEC4">SIMPLE USE OF JIT</a>
+<li><a name="TOC5" href="#SEC5">UNSUPPORTED OPTIONS AND PATTERN ITEMS</a>
+<li><a name="TOC6" href="#SEC6">RETURN VALUES FROM JIT EXECUTION</a>
+<li><a name="TOC7" href="#SEC7">SAVING AND RESTORING COMPILED PATTERNS</a>
+<li><a name="TOC8" href="#SEC8">CONTROLLING THE JIT STACK</a>
+<li><a name="TOC9" href="#SEC9">JIT STACK FAQ</a>
+<li><a name="TOC10" href="#SEC10">EXAMPLE CODE</a>
+<li><a name="TOC11" href="#SEC11">JIT FAST PATH API</a>
+<li><a name="TOC12" href="#SEC12">SEE ALSO</a>
+<li><a name="TOC13" href="#SEC13">AUTHOR</a>
+<li><a name="TOC14" href="#SEC14">REVISION</a>
+</ul>
+<br><a name="SEC1" href="#TOC1">PCRE JUST-IN-TIME COMPILER SUPPORT</a><br>
+<P>
+Just-in-time compiling is a heavyweight optimization that can greatly speed up
+pattern matching. However, it comes at the cost of extra processing before the
+match is performed. Therefore, it is of most benefit when the same pattern is
+going to be matched many times. This does not necessarily mean many calls of a
+matching function; if the pattern is not anchored, matching attempts may take
+place many times at various positions in the subject, even for a single call.
+Therefore, if the subject string is very long, it may still pay to use JIT for
+one-off matches.
+</P>
+<P>
+JIT support applies only to the traditional Perl-compatible matching function.
+It does not apply when the DFA matching function is being used. The code for
+this support was written by Zoltan Herczeg.
+</P>
+<br><a name="SEC2" href="#TOC1">8-BIT, 16-BIT AND 32-BIT SUPPORT</a><br>
+<P>
+JIT support is available for all of the 8-bit, 16-bit and 32-bit PCRE
+libraries. To keep this documentation simple, only the 8-bit interface is
+described in what follows. If you are using the 16-bit library, substitute the
+16-bit functions and 16-bit structures (for example, <i>pcre16_jit_stack</i>
+instead of <i>pcre_jit_stack</i>). If you are using the 32-bit library,
+substitute the 32-bit functions and 32-bit structures (for example,
+<i>pcre32_jit_stack</i> instead of <i>pcre_jit_stack</i>).
+</P>
+<br><a name="SEC3" href="#TOC1">AVAILABILITY OF JIT SUPPORT</a><br>
+<P>
+JIT support is an optional feature of PCRE. The "configure" option --enable-jit
+(or equivalent CMake option) must be set when PCRE is built if you want to use
+JIT. The support is limited to the following hardware platforms:
+<pre>
+ ARM v5, v7, and Thumb2
+ Intel x86 32-bit and 64-bit
+ MIPS 32-bit
+ Power PC 32-bit and 64-bit
+ SPARC 32-bit (experimental)
+</pre>
+If --enable-jit is set on an unsupported platform, compilation fails.
+</P>
+<P>
+A program that is linked with PCRE 8.20 or later can tell if JIT support is
+available by calling <b>pcre_config()</b> with the PCRE_CONFIG_JIT option. The
+result is 1 when JIT is available, and 0 otherwise. However, a simple program
+does not need to check this in order to use JIT. The normal API is implemented
+in a way that falls back to the interpretive code if JIT is not available. For
+programs that need the best possible performance, there is also a "fast path"
+API that is JIT-specific.
+</P>
+<P>
+If your program may sometimes be linked with versions of PCRE that are older
+than 8.20, but you want to use JIT when it is available, you can test
+the values of PCRE_MAJOR and PCRE_MINOR, or the existence of a JIT macro such
+as PCRE_CONFIG_JIT, for compile-time control of your code.
+</P>
+<br><a name="SEC4" href="#TOC1">SIMPLE USE OF JIT</a><br>
+<P>
+You have to do two things to make use of the JIT support in the simplest way:
+<pre>
+ (1) Call <b>pcre_study()</b> with the PCRE_STUDY_JIT_COMPILE option for
+ each compiled pattern, and pass the resulting <b>pcre_extra</b> block to
+ <b>pcre_exec()</b>.
+
+ (2) Use <b>pcre_free_study()</b> to free the <b>pcre_extra</b> block when it is
+ no longer needed, instead of just freeing it yourself. This ensures that
+ any JIT data is also freed.
+</pre>
+For a program that may be linked with pre-8.20 versions of PCRE, you can insert
+<pre>
+ #ifndef PCRE_STUDY_JIT_COMPILE
+ #define PCRE_STUDY_JIT_COMPILE 0
+ #endif
+</pre>
+so that no option is passed to <b>pcre_study()</b>, and then use something like
+this to free the study data:
+<pre>
+ #ifdef PCRE_CONFIG_JIT
+ pcre_free_study(study_ptr);
+ #else
+ pcre_free(study_ptr);
+ #endif
+</pre>
+PCRE_STUDY_JIT_COMPILE requests the JIT compiler to generate code for complete
+matches. If you want to run partial matches using the PCRE_PARTIAL_HARD or
+PCRE_PARTIAL_SOFT options of <b>pcre_exec()</b>, you should set one or both of
+the following options in addition to, or instead of, PCRE_STUDY_JIT_COMPILE
+when you call <b>pcre_study()</b>:
+<pre>
+ PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
+ PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
+</pre>
+The JIT compiler generates different optimized code for each of the three
+modes (normal, soft partial, hard partial). When <b>pcre_exec()</b> is called,
+the appropriate code is run if it is available. Otherwise, the pattern is
+matched using interpretive code.
+</P>
+<P>
+In some circumstances you may need to call additional functions. These are
+described in the section entitled
+<a href="#stackcontrol">"Controlling the JIT stack"</a>
+below.
+</P>
+<P>
+If JIT support is not available, PCRE_STUDY_JIT_COMPILE etc. are ignored, and
+no JIT data is created. Otherwise, the compiled pattern is passed to the JIT
+compiler, which turns it into machine code that executes much faster than the
+normal interpretive code. When <b>pcre_exec()</b> is passed a <b>pcre_extra</b>
+block containing a pointer to JIT code of the appropriate mode (normal or
+hard/soft partial), it obeys that code instead of running the interpreter. The
+result is identical, but the compiled JIT code runs much faster.
+</P>
+<P>
+There are some <b>pcre_exec()</b> options that are not supported for JIT
+execution. There are also some pattern items that JIT cannot handle. Details
+are given below. In both cases, execution automatically falls back to the
+interpretive code. If you want to know whether JIT was actually used for a
+particular match, you should arrange for a JIT callback function to be set up
+as described in the section entitled
+<a href="#stackcontrol">"Controlling the JIT stack"</a>
+below, even if you do not need to supply a non-default JIT stack. Such a
+callback function is called whenever JIT code is about to be obeyed. If the
+execution options are not right for JIT execution, the callback function is not
+obeyed.
+</P>
+<P>
+If the JIT compiler finds an unsupported item, no JIT data is generated. You
+can find out if JIT execution is available after studying a pattern by calling
+<b>pcre_fullinfo()</b> with the PCRE_INFO_JIT option. A result of 1 means that
+JIT compilation was successful. A result of 0 means that JIT support is not
+available, or the pattern was not studied with PCRE_STUDY_JIT_COMPILE etc., or
+the JIT compiler was not able to handle the pattern.
+</P>
+<P>
+Once a pattern has been studied, with or without JIT, it can be used as many
+times as you like for matching different subject strings.
+</P>
+<br><a name="SEC5" href="#TOC1">UNSUPPORTED OPTIONS AND PATTERN ITEMS</a><br>
+<P>
+The only <b>pcre_exec()</b> options that are supported for JIT execution are
+PCRE_NO_UTF8_CHECK, PCRE_NO_UTF16_CHECK, PCRE_NO_UTF32_CHECK, PCRE_NOTBOL,
+PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, PCRE_PARTIAL_HARD, and
+PCRE_PARTIAL_SOFT.
+</P>
+<P>
+The only unsupported pattern items are \C (match a single data unit) when
+running in a UTF mode, and a callout immediately before an assertion condition
+in a conditional group.
+</P>
+<br><a name="SEC6" href="#TOC1">RETURN VALUES FROM JIT EXECUTION</a><br>
+<P>
+When a pattern is matched using JIT execution, the return values are the same
+as those given by the interpretive <b>pcre_exec()</b> code, with the addition of
+one new error code: PCRE_ERROR_JIT_STACKLIMIT. This means that the memory used
+for the JIT stack was insufficient. See
+<a href="#stackcontrol">"Controlling the JIT stack"</a>
+below for a discussion of JIT stack usage. For compatibility with the
+interpretive <b>pcre_exec()</b> code, no more than two-thirds of the
+<i>ovector</i> argument is used for passing back captured substrings.
+</P>
+<P>
+The error code PCRE_ERROR_MATCHLIMIT is returned by the JIT code if searching a
+very large pattern tree goes on for too long, as it is in the same circumstance
+when JIT is not used, but the details of exactly what is counted are not the
+same. The PCRE_ERROR_RECURSIONLIMIT error code is never returned by JIT
+execution.
+</P>
+<br><a name="SEC7" href="#TOC1">SAVING AND RESTORING COMPILED PATTERNS</a><br>
+<P>
+The code that is generated by the JIT compiler is architecture-specific, and is
+also position dependent. For those reasons it cannot be saved (in a file or
+database) and restored later like the bytecode and other data of a compiled
+pattern. Saving and restoring compiled patterns is not something many people
+do. More detail about this facility is given in the
+<a href="pcreprecompile.html"><b>pcreprecompile</b></a>
+documentation. It should be possible to run <b>pcre_study()</b> on a saved and
+restored pattern, and thereby recreate the JIT data, but because JIT
+compilation uses significant resources, it is probably not worth doing this;
+you might as well recompile the original pattern.
+<a name="stackcontrol"></a></P>
+<br><a name="SEC8" href="#TOC1">CONTROLLING THE JIT STACK</a><br>
+<P>
+When the compiled JIT code runs, it needs a block of memory to use as a stack.
+By default, it uses 32K on the machine stack. However, some large or
+complicated patterns need more than this. The error PCRE_ERROR_JIT_STACKLIMIT
+is given when there is not enough stack. Three functions are provided for
+managing blocks of memory for use as JIT stacks. There is further discussion
+about the use of JIT stacks in the section entitled
+<a href="#stackcontrol">"JIT stack FAQ"</a>
+below.
+</P>
+<P>
+The <b>pcre_jit_stack_alloc()</b> function creates a JIT stack. Its arguments
+are a starting size and a maximum size, and it returns a pointer to an opaque
+structure of type <b>pcre_jit_stack</b>, or NULL if there is an error. The
+<b>pcre_jit_stack_free()</b> function can be used to free a stack that is no
+longer needed. (For the technically minded: the address space is allocated by
+mmap or VirtualAlloc.)
+</P>
+<P>
+JIT uses far less memory for recursion than the interpretive code,
+and a maximum stack size of 512K to 1M should be more than enough for any
+pattern.
+</P>
+<P>
+The <b>pcre_assign_jit_stack()</b> function specifies which stack JIT code
+should use. Its arguments are as follows:
+<pre>
+ pcre_extra *extra
+ pcre_jit_callback callback
+ void *data
+</pre>
+The <i>extra</i> argument must be the result of studying a pattern with
+PCRE_STUDY_JIT_COMPILE etc. There are three cases for the values of the other
+two options:
+<pre>
+ (1) If <i>callback</i> is NULL and <i>data</i> is NULL, an internal 32K block
+ on the machine stack is used.
+
+ (2) If <i>callback</i> is NULL and <i>data</i> is not NULL, <i>data</i> must be
+ a valid JIT stack, the result of calling <b>pcre_jit_stack_alloc()</b>.
+
+ (3) If <i>callback</i> is not NULL, it must point to a function that is
+ called with <i>data</i> as an argument at the start of matching, in
+ order to set up a JIT stack. If the return from the callback
+ function is NULL, the internal 32K stack is used; otherwise the
+ return value must be a valid JIT stack, the result of calling
+ <b>pcre_jit_stack_alloc()</b>.
+</pre>
+A callback function is obeyed whenever JIT code is about to be run; it is not
+obeyed when <b>pcre_exec()</b> is called with options that are incompatible for
+JIT execution. A callback function can therefore be used to determine whether a
+match operation was executed by JIT or by the interpreter.
+</P>
+<P>
+You may safely use the same JIT stack for more than one pattern (either by
+assigning directly or by callback), as long as the patterns are all matched
+sequentially in the same thread. In a multithread application, if you do not
+specify a JIT stack, or if you assign or pass back NULL from a callback, that
+is thread-safe, because each thread has its own machine stack. However, if you
+assign or pass back a non-NULL JIT stack, this must be a different stack for
+each thread so that the application is thread-safe.
+</P>
+<P>
+Strictly speaking, even more is allowed. You can assign the same non-NULL stack
+to any number of patterns as long as they are not used for matching by multiple
+threads at the same time. For example, you can assign the same stack to all
+compiled patterns, and use a global mutex in the callback to wait until the
+stack is available for use. However, this is an inefficient solution, and not
+recommended.
+</P>
+<P>
+This is a suggestion for how a multithreaded program that needs to set up
+non-default JIT stacks might operate:
+<pre>
+ During thread initalization
+ thread_local_var = pcre_jit_stack_alloc(...)
+
+ During thread exit
+ pcre_jit_stack_free(thread_local_var)
+
+ Use a one-line callback function
+ return thread_local_var
+</pre>
+All the functions described in this section do nothing if JIT is not available,
+and <b>pcre_assign_jit_stack()</b> does nothing unless the <b>extra</b> argument
+is non-NULL and points to a <b>pcre_extra</b> block that is the result of a
+successful study with PCRE_STUDY_JIT_COMPILE etc.
+<a name="stackfaq"></a></P>
+<br><a name="SEC9" href="#TOC1">JIT STACK FAQ</a><br>
+<P>
+(1) Why do we need JIT stacks?
+<br>
+<br>
+PCRE (and JIT) is a recursive, depth-first engine, so it needs a stack where
+the local data of the current node is pushed before checking its child nodes.
+Allocating real machine stack on some platforms is difficult. For example, the
+stack chain needs to be updated every time if we extend the stack on PowerPC.
+Although it is possible, its updating time overhead decreases performance. So
+we do the recursion in memory.
+</P>
+<P>
+(2) Why don't we simply allocate blocks of memory with <b>malloc()</b>?
+<br>
+<br>
+Modern operating systems have a nice feature: they can reserve an address space
+instead of allocating memory. We can safely allocate memory pages inside this
+address space, so the stack could grow without moving memory data (this is
+important because of pointers). Thus we can allocate 1M address space, and use
+only a single memory page (usually 4K) if that is enough. However, we can still
+grow up to 1M anytime if needed.
+</P>
+<P>
+(3) Who "owns" a JIT stack?
+<br>
+<br>
+The owner of the stack is the user program, not the JIT studied pattern or
+anything else. The user program must ensure that if a stack is used by
+<b>pcre_exec()</b>, (that is, it is assigned to the pattern currently running),
+that stack must not be used by any other threads (to avoid overwriting the same
+memory area). The best practice for multithreaded programs is to allocate a
+stack for each thread, and return this stack through the JIT callback function.
+</P>
+<P>
+(4) When should a JIT stack be freed?
+<br>
+<br>
+You can free a JIT stack at any time, as long as it will not be used by
+<b>pcre_exec()</b> again. When you assign the stack to a pattern, only a pointer
+is set. There is no reference counting or any other magic. You can free the
+patterns and stacks in any order, anytime. Just <i>do not</i> call
+<b>pcre_exec()</b> with a pattern pointing to an already freed stack, as that
+will cause SEGFAULT. (Also, do not free a stack currently used by
+<b>pcre_exec()</b> in another thread). You can also replace the stack for a
+pattern at any time. You can even free the previous stack before assigning a
+replacement.
+</P>
+<P>
+(5) Should I allocate/free a stack every time before/after calling
+<b>pcre_exec()</b>?
+<br>
+<br>
+No, because this is too costly in terms of resources. However, you could
+implement some clever idea which release the stack if it is not used in let's
+say two minutes. The JIT callback can help to achieve this without keeping a
+list of the currently JIT studied patterns.
+</P>
+<P>
+(6) OK, the stack is for long term memory allocation. But what happens if a
+pattern causes stack overflow with a stack of 1M? Is that 1M kept until the
+stack is freed?
+<br>
+<br>
+Especially on embedded sytems, it might be a good idea to release memory
+sometimes without freeing the stack. There is no API for this at the moment.
+Probably a function call which returns with the currently allocated memory for
+any stack and another which allows releasing memory (shrinking the stack) would
+be a good idea if someone needs this.
+</P>
+<P>
+(7) This is too much of a headache. Isn't there any better solution for JIT
+stack handling?
+<br>
+<br>
+No, thanks to Windows. If POSIX threads were used everywhere, we could throw
+out this complicated API.
+</P>
+<br><a name="SEC10" href="#TOC1">EXAMPLE CODE</a><br>
+<P>
+This is a single-threaded example that specifies a JIT stack without using a
+callback.
+<pre>
+ int rc;
+ int ovector[30];
+ pcre *re;
+ pcre_extra *extra;
+ pcre_jit_stack *jit_stack;
+
+ re = pcre_compile(pattern, 0, &error, &erroffset, NULL);
+ /* Check for errors */
+ extra = pcre_study(re, PCRE_STUDY_JIT_COMPILE, &error);
+ jit_stack = pcre_jit_stack_alloc(32*1024, 512*1024);
+ /* Check for error (NULL) */
+ pcre_assign_jit_stack(extra, NULL, jit_stack);
+ rc = pcre_exec(re, extra, subject, length, 0, 0, ovector, 30);
+ /* Check results */
+ pcre_free(re);
+ pcre_free_study(extra);
+ pcre_jit_stack_free(jit_stack);
+
+</PRE>
+</P>
+<br><a name="SEC11" href="#TOC1">JIT FAST PATH API</a><br>
+<P>
+Because the API described above falls back to interpreted execution when JIT is
+not available, it is convenient for programs that are written for general use
+in many environments. However, calling JIT via <b>pcre_exec()</b> does have a
+performance impact. Programs that are written for use where JIT is known to be
+available, and which need the best possible performance, can instead use a
+"fast path" API to call JIT execution directly instead of calling
+<b>pcre_exec()</b> (obviously only for patterns that have been successfully
+studied by JIT).
+</P>
+<P>
+The fast path function is called <b>pcre_jit_exec()</b>, and it takes exactly
+the same arguments as <b>pcre_exec()</b>, plus one additional argument that
+must point to a JIT stack. The JIT stack arrangements described above do not
+apply. The return values are the same as for <b>pcre_exec()</b>.
+</P>
+<P>
+When you call <b>pcre_exec()</b>, as well as testing for invalid options, a
+number of other sanity checks are performed on the arguments. For example, if
+the subject pointer is NULL, or its length is negative, an immediate error is
+given. Also, unless PCRE_NO_UTF[8|16|32] is set, a UTF subject string is tested
+for validity. In the interests of speed, these checks do not happen on the JIT
+fast path, and if invalid data is passed, the result is undefined.
+</P>
+<P>
+Bypassing the sanity checks and the <b>pcre_exec()</b> wrapping can give
+speedups of more than 10%.
+</P>
+<br><a name="SEC12" href="#TOC1">SEE ALSO</a><br>
+<P>
+<b>pcreapi</b>(3)
+</P>
+<br><a name="SEC13" href="#TOC1">AUTHOR</a><br>
+<P>
+Philip Hazel (FAQ by Zoltan Herczeg)
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><a name="SEC14" href="#TOC1">REVISION</a><br>
+<P>
+Last updated: 17 March 2013
+<br>
+Copyright &copy; 1997-2013 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcrelimits.html b/doc/html/pcrelimits.html
new file mode 100644
index 0000000..ee5ebf0
--- /dev/null
+++ b/doc/html/pcrelimits.html
@@ -0,0 +1,90 @@
+<html>
+<head>
+<title>pcrelimits specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcrelimits man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+SIZE AND OTHER LIMITATIONS
+</b><br>
+<P>
+There are some size limitations in PCRE but it is hoped that they will never in
+practice be relevant.
+</P>
+<P>
+The maximum length of a compiled pattern is approximately 64K data units (bytes
+for the 8-bit library, 16-bit units for the 16-bit library, and 32-bit units for
+the 32-bit library) if PCRE is compiled with the default internal linkage size,
+which is 2 bytes for the 8-bit and 16-bit libraries, and 4 bytes for the 32-bit
+library. If you want to process regular expressions that are truly enormous,
+you can compile PCRE with an internal linkage size of 3 or 4 (when building the
+16-bit or 32-bit library, 3 is rounded up to 4). See the <b>README</b> file in
+the source distribution and the
+<a href="pcrebuild.html"><b>pcrebuild</b></a>
+documentation for details. In these cases the limit is substantially larger.
+However, the speed of execution is slower.
+</P>
+<P>
+All values in repeating quantifiers must be less than 65536.
+</P>
+<P>
+There is no limit to the number of parenthesized subpatterns, but there can be
+no more than 65535 capturing subpatterns. There is, however, a limit to the
+depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
+order to limit the amount of system stack used at compile time. The limit can
+be specified when PCRE is built; the default is 250.
+</P>
+<P>
+There is a limit to the number of forward references to subsequent subpatterns
+of around 200,000. Repeated forward references with fixed upper limits, for
+example, (?2){0,100} when subpattern number 2 is to the right, are included in
+the count. There is no limit to the number of backward references.
+</P>
+<P>
+The maximum length of name for a named subpattern is 32 characters, and the
+maximum number of named subpatterns is 10000.
+</P>
+<P>
+The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb
+is 255 for the 8-bit library and 65535 for the 16-bit and 32-bit libraries.
+</P>
+<P>
+The maximum length of a subject string is the largest positive number that an
+integer variable can hold. However, when using the traditional matching
+function, PCRE uses recursion to handle subpatterns and indefinite repetition.
+This means that the available stack space may limit the size of a subject
+string that can be processed by certain patterns. For a discussion of stack
+issues, see the
+<a href="pcrestack.html"><b>pcrestack</b></a>
+documentation.
+</P>
+<br><b>
+AUTHOR
+</b><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><b>
+REVISION
+</b><br>
+<P>
+Last updated: 05 November 2013
+<br>
+Copyright &copy; 1997-2013 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcrematching.html b/doc/html/pcrematching.html
new file mode 100644
index 0000000..a1af39b
--- /dev/null
+++ b/doc/html/pcrematching.html
@@ -0,0 +1,242 @@
+<html>
+<head>
+<title>pcrematching specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcrematching man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<ul>
+<li><a name="TOC1" href="#SEC1">PCRE MATCHING ALGORITHMS</a>
+<li><a name="TOC2" href="#SEC2">REGULAR EXPRESSIONS AS TREES</a>
+<li><a name="TOC3" href="#SEC3">THE STANDARD MATCHING ALGORITHM</a>
+<li><a name="TOC4" href="#SEC4">THE ALTERNATIVE MATCHING ALGORITHM</a>
+<li><a name="TOC5" href="#SEC5">ADVANTAGES OF THE ALTERNATIVE ALGORITHM</a>
+<li><a name="TOC6" href="#SEC6">DISADVANTAGES OF THE ALTERNATIVE ALGORITHM</a>
+<li><a name="TOC7" href="#SEC7">AUTHOR</a>
+<li><a name="TOC8" href="#SEC8">REVISION</a>
+</ul>
+<br><a name="SEC1" href="#TOC1">PCRE MATCHING ALGORITHMS</a><br>
+<P>
+This document describes the two different algorithms that are available in PCRE
+for matching a compiled regular expression against a given subject string. The
+"standard" algorithm is the one provided by the <b>pcre_exec()</b>,
+<b>pcre16_exec()</b> and <b>pcre32_exec()</b> functions. These work in the same
+as as Perl's matching function, and provide a Perl-compatible matching operation.
+The just-in-time (JIT) optimization that is described in the
+<a href="pcrejit.html"><b>pcrejit</b></a>
+documentation is compatible with these functions.
+</P>
+<P>
+An alternative algorithm is provided by the <b>pcre_dfa_exec()</b>,
+<b>pcre16_dfa_exec()</b> and <b>pcre32_dfa_exec()</b> functions; they operate in
+a different way, and are not Perl-compatible. This alternative has advantages
+and disadvantages compared with the standard algorithm, and these are described
+below.
+</P>
+<P>
+When there is only one possible way in which a given subject string can match a
+pattern, the two algorithms give the same answer. A difference arises, however,
+when there are multiple possibilities. For example, if the pattern
+<pre>
+ ^&#60;.*&#62;
+</pre>
+is matched against the string
+<pre>
+ &#60;something&#62; &#60;something else&#62; &#60;something further&#62;
+</pre>
+there are three possible answers. The standard algorithm finds only one of
+them, whereas the alternative algorithm finds all three.
+</P>
+<br><a name="SEC2" href="#TOC1">REGULAR EXPRESSIONS AS TREES</a><br>
+<P>
+The set of strings that are matched by a regular expression can be represented
+as a tree structure. An unlimited repetition in the pattern makes the tree of
+infinite size, but it is still a tree. Matching the pattern to a given subject
+string (from a given starting point) can be thought of as a search of the tree.
+There are two ways to search a tree: depth-first and breadth-first, and these
+correspond to the two matching algorithms provided by PCRE.
+</P>
+<br><a name="SEC3" href="#TOC1">THE STANDARD MATCHING ALGORITHM</a><br>
+<P>
+In the terminology of Jeffrey Friedl's book "Mastering Regular
+Expressions", the standard algorithm is an "NFA algorithm". It conducts a
+depth-first search of the pattern tree. That is, it proceeds along a single
+path through the tree, checking that the subject matches what is required. When
+there is a mismatch, the algorithm tries any alternatives at the current point,
+and if they all fail, it backs up to the previous branch point in the tree, and
+tries the next alternative branch at that level. This often involves backing up
+(moving to the left) in the subject string as well. The order in which
+repetition branches are tried is controlled by the greedy or ungreedy nature of
+the quantifier.
+</P>
+<P>
+If a leaf node is reached, a matching string has been found, and at that point
+the algorithm stops. Thus, if there is more than one possible match, this
+algorithm returns the first one that it finds. Whether this is the shortest,
+the longest, or some intermediate length depends on the way the greedy and
+ungreedy repetition quantifiers are specified in the pattern.
+</P>
+<P>
+Because it ends up with a single path through the tree, it is relatively
+straightforward for this algorithm to keep track of the substrings that are
+matched by portions of the pattern in parentheses. This provides support for
+capturing parentheses and back references.
+</P>
+<br><a name="SEC4" href="#TOC1">THE ALTERNATIVE MATCHING ALGORITHM</a><br>
+<P>
+This algorithm conducts a breadth-first search of the tree. Starting from the
+first matching point in the subject, it scans the subject string from left to
+right, once, character by character, and as it does this, it remembers all the
+paths through the tree that represent valid matches. In Friedl's terminology,
+this is a kind of "DFA algorithm", though it is not implemented as a
+traditional finite state machine (it keeps multiple states active
+simultaneously).
+</P>
+<P>
+Although the general principle of this matching algorithm is that it scans the
+subject string only once, without backtracking, there is one exception: when a
+lookaround assertion is encountered, the characters following or preceding the
+current point have to be independently inspected.
+</P>
+<P>
+The scan continues until either the end of the subject is reached, or there are
+no more unterminated paths. At this point, terminated paths represent the
+different matching possibilities (if there are none, the match has failed).
+Thus, if there is more than one possible match, this algorithm finds all of
+them, and in particular, it finds the longest. The matches are returned in
+decreasing order of length. There is an option to stop the algorithm after the
+first match (which is necessarily the shortest) is found.
+</P>
+<P>
+Note that all the matches that are found start at the same point in the
+subject. If the pattern
+<pre>
+ cat(er(pillar)?)?
+</pre>
+is matched against the string "the caterpillar catchment", the result will be
+the three strings "caterpillar", "cater", and "cat" that start at the fifth
+character of the subject. The algorithm does not automatically move on to find
+matches that start at later positions.
+</P>
+<P>
+PCRE's "auto-possessification" optimization usually applies to character
+repeats at the end of a pattern (as well as internally). For example, the
+pattern "a\d+" is compiled as if it were "a\d++" because there is no point
+even considering the possibility of backtracking into the repeated digits. For
+DFA matching, this means that only one possible match is found. If you really
+do want multiple matches in such cases, either use an ungreedy repeat
+("a\d+?") or set the PCRE_NO_AUTO_POSSESS option when compiling.
+</P>
+<P>
+There are a number of features of PCRE regular expressions that are not
+supported by the alternative matching algorithm. They are as follows:
+</P>
+<P>
+1. Because the algorithm finds all possible matches, the greedy or ungreedy
+nature of repetition quantifiers is not relevant. Greedy and ungreedy
+quantifiers are treated in exactly the same way. However, possessive
+quantifiers can make a difference when what follows could also match what is
+quantified, for example in a pattern like this:
+<pre>
+ ^a++\w!
+</pre>
+This pattern matches "aaab!" but not "aaa!", which would be matched by a
+non-possessive quantifier. Similarly, if an atomic group is present, it is
+matched as if it were a standalone pattern at the current point, and the
+longest match is then "locked in" for the rest of the overall pattern.
+</P>
+<P>
+2. When dealing with multiple paths through the tree simultaneously, it is not
+straightforward to keep track of captured substrings for the different matching
+possibilities, and PCRE's implementation of this algorithm does not attempt to
+do this. This means that no captured substrings are available.
+</P>
+<P>
+3. Because no substrings are captured, back references within the pattern are
+not supported, and cause errors if encountered.
+</P>
+<P>
+4. For the same reason, conditional expressions that use a backreference as the
+condition or test for a specific group recursion are not supported.
+</P>
+<P>
+5. Because many paths through the tree may be active, the \K escape sequence,
+which resets the start of the match when encountered (but may be on some paths
+and not on others), is not supported. It causes an error if encountered.
+</P>
+<P>
+6. Callouts are supported, but the value of the <i>capture_top</i> field is
+always 1, and the value of the <i>capture_last</i> field is always -1.
+</P>
+<P>
+7. The \C escape sequence, which (in the standard algorithm) always matches a
+single data unit, even in UTF-8, UTF-16 or UTF-32 modes, is not supported in
+these modes, because the alternative algorithm moves through the subject string
+one character (not data unit) at a time, for all active paths through the tree.
+</P>
+<P>
+8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not
+supported. (*FAIL) is supported, and behaves like a failing negative assertion.
+</P>
+<br><a name="SEC5" href="#TOC1">ADVANTAGES OF THE ALTERNATIVE ALGORITHM</a><br>
+<P>
+Using the alternative matching algorithm provides the following advantages:
+</P>
+<P>
+1. All possible matches (at a single point in the subject) are automatically
+found, and in particular, the longest match is found. To find more than one
+match using the standard algorithm, you have to do kludgy things with
+callouts.
+</P>
+<P>
+2. Because the alternative algorithm scans the subject string just once, and
+never needs to backtrack (except for lookbehinds), it is possible to pass very
+long subject strings to the matching function in several pieces, checking for
+partial matching each time. Although it is possible to do multi-segment
+matching using the standard algorithm by retaining partially matched
+substrings, it is more complicated. The
+<a href="pcrepartial.html"><b>pcrepartial</b></a>
+documentation gives details of partial matching and discusses multi-segment
+matching.
+</P>
+<br><a name="SEC6" href="#TOC1">DISADVANTAGES OF THE ALTERNATIVE ALGORITHM</a><br>
+<P>
+The alternative algorithm suffers from a number of disadvantages:
+</P>
+<P>
+1. It is substantially slower than the standard algorithm. This is partly
+because it has to search for all possible matches, but is also because it is
+less susceptible to optimization.
+</P>
+<P>
+2. Capturing parentheses and back references are not supported.
+</P>
+<P>
+3. Although atomic groups are supported, their use does not provide the
+performance advantage that it does for the standard algorithm.
+</P>
+<br><a name="SEC7" href="#TOC1">AUTHOR</a><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><a name="SEC8" href="#TOC1">REVISION</a><br>
+<P>
+Last updated: 12 November 2013
+<br>
+Copyright &copy; 1997-2012 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcrepartial.html b/doc/html/pcrepartial.html
new file mode 100644
index 0000000..4faeafc
--- /dev/null
+++ b/doc/html/pcrepartial.html
@@ -0,0 +1,509 @@
+<html>
+<head>
+<title>pcrepartial specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcrepartial man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<ul>
+<li><a name="TOC1" href="#SEC1">PARTIAL MATCHING IN PCRE</a>
+<li><a name="TOC2" href="#SEC2">PARTIAL MATCHING USING pcre_exec() OR pcre[16|32]_exec()</a>
+<li><a name="TOC3" href="#SEC3">PARTIAL MATCHING USING pcre_dfa_exec() OR pcre[16|32]_dfa_exec()</a>
+<li><a name="TOC4" href="#SEC4">PARTIAL MATCHING AND WORD BOUNDARIES</a>
+<li><a name="TOC5" href="#SEC5">FORMERLY RESTRICTED PATTERNS</a>
+<li><a name="TOC6" href="#SEC6">EXAMPLE OF PARTIAL MATCHING USING PCRETEST</a>
+<li><a name="TOC7" href="#SEC7">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec() OR pcre[16|32]_dfa_exec()</a>
+<li><a name="TOC8" href="#SEC8">MULTI-SEGMENT MATCHING WITH pcre_exec() OR pcre[16|32]_exec()</a>
+<li><a name="TOC9" href="#SEC9">ISSUES WITH MULTI-SEGMENT MATCHING</a>
+<li><a name="TOC10" href="#SEC10">AUTHOR</a>
+<li><a name="TOC11" href="#SEC11">REVISION</a>
+</ul>
+<br><a name="SEC1" href="#TOC1">PARTIAL MATCHING IN PCRE</a><br>
+<P>
+In normal use of PCRE, if the subject string that is passed to a matching
+function matches as far as it goes, but is too short to match the entire
+pattern, PCRE_ERROR_NOMATCH is returned. There are circumstances where it might
+be helpful to distinguish this case from other cases in which there is no
+match.
+</P>
+<P>
+Consider, for example, an application where a human is required to type in data
+for a field with specific formatting requirements. An example might be a date
+in the form <i>ddmmmyy</i>, defined by this pattern:
+<pre>
+ ^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$
+</pre>
+If the application sees the user's keystrokes one by one, and can check that
+what has been typed so far is potentially valid, it is able to raise an error
+as soon as a mistake is made, by beeping and not reflecting the character that
+has been typed, for example. This immediate feedback is likely to be a better
+user interface than a check that is delayed until the entire string has been
+entered. Partial matching can also be useful when the subject string is very
+long and is not all available at once.
+</P>
+<P>
+PCRE supports partial matching by means of the PCRE_PARTIAL_SOFT and
+PCRE_PARTIAL_HARD options, which can be set when calling any of the matching
+functions. For backwards compatibility, PCRE_PARTIAL is a synonym for
+PCRE_PARTIAL_SOFT. The essential difference between the two options is whether
+or not a partial match is preferred to an alternative complete match, though
+the details differ between the two types of matching function. If both options
+are set, PCRE_PARTIAL_HARD takes precedence.
+</P>
+<P>
+If you want to use partial matching with just-in-time optimized code, you must
+call <b>pcre_study()</b>, <b>pcre16_study()</b> or <b>pcre32_study()</b> with one
+or both of these options:
+<pre>
+ PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
+ PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
+</pre>
+PCRE_STUDY_JIT_COMPILE should also be set if you are going to run non-partial
+matches on the same pattern. If the appropriate JIT study mode has not been set
+for a match, the interpretive matching code is used.
+</P>
+<P>
+Setting a partial matching option disables two of PCRE's standard
+optimizations. PCRE remembers the last literal data unit in a pattern, and
+abandons matching immediately if it is not present in the subject string. This
+optimization cannot be used for a subject string that might match only
+partially. If the pattern was studied, PCRE knows the minimum length of a
+matching string, and does not bother to run the matching function on shorter
+strings. This optimization is also disabled for partial matching.
+</P>
+<br><a name="SEC2" href="#TOC1">PARTIAL MATCHING USING pcre_exec() OR pcre[16|32]_exec()</a><br>
+<P>
+A partial match occurs during a call to <b>pcre_exec()</b> or
+<b>pcre[16|32]_exec()</b> when the end of the subject string is reached
+successfully, but matching cannot continue because more characters are needed.
+However, at least one character in the subject must have been inspected. This
+character need not form part of the final matched string; lookbehind assertions
+and the \K escape sequence provide ways of inspecting characters before the
+start of a matched substring. The requirement for inspecting at least one
+character exists because an empty string can always be matched; without such a
+restriction there would always be a partial match of an empty string at the end
+of the subject.
+</P>
+<P>
+If there are at least two slots in the offsets vector when a partial match is
+returned, the first slot is set to the offset of the earliest character that
+was inspected. For convenience, the second offset points to the end of the
+subject so that a substring can easily be identified. If there are at least
+three slots in the offsets vector, the third slot is set to the offset of the
+character where matching started.
+</P>
+<P>
+For the majority of patterns, the contents of the first and third slots will be
+the same. However, for patterns that contain lookbehind assertions, or begin
+with \b or \B, characters before the one where matching started may have been
+inspected while carrying out the match. For example, consider this pattern:
+<pre>
+ /(?&#60;=abc)123/
+</pre>
+This pattern matches "123", but only if it is preceded by "abc". If the subject
+string is "xyzabc12", the first two offsets after a partial match are for the
+substring "abc12", because all these characters were inspected. However, the
+third offset is set to 6, because that is the offset where matching began.
+</P>
+<P>
+What happens when a partial match is identified depends on which of the two
+partial matching options are set.
+</P>
+<br><b>
+PCRE_PARTIAL_SOFT WITH pcre_exec() OR pcre[16|32]_exec()
+</b><br>
+<P>
+If PCRE_PARTIAL_SOFT is set when <b>pcre_exec()</b> or <b>pcre[16|32]_exec()</b>
+identifies a partial match, the partial match is remembered, but matching
+continues as normal, and other alternatives in the pattern are tried. If no
+complete match can be found, PCRE_ERROR_PARTIAL is returned instead of
+PCRE_ERROR_NOMATCH.
+</P>
+<P>
+This option is "soft" because it prefers a complete match over a partial match.
+All the various matching items in a pattern behave as if the subject string is
+potentially complete. For example, \z, \Z, and $ match at the end of the
+subject, as normal, and for \b and \B the end of the subject is treated as a
+non-alphanumeric.
+</P>
+<P>
+If there is more than one partial match, the first one that was found provides
+the data that is returned. Consider this pattern:
+<pre>
+ /123\w+X|dogY/
+</pre>
+If this is matched against the subject string "abc123dog", both
+alternatives fail to match, but the end of the subject is reached during
+matching, so PCRE_ERROR_PARTIAL is returned. The offsets are set to 3 and 9,
+identifying "123dog" as the first partial match that was found. (In this
+example, there are two partial matches, because "dog" on its own partially
+matches the second alternative.)
+</P>
+<br><b>
+PCRE_PARTIAL_HARD WITH pcre_exec() OR pcre[16|32]_exec()
+</b><br>
+<P>
+If PCRE_PARTIAL_HARD is set for <b>pcre_exec()</b> or <b>pcre[16|32]_exec()</b>,
+PCRE_ERROR_PARTIAL is returned as soon as a partial match is found, without
+continuing to search for possible complete matches. This option is "hard"
+because it prefers an earlier partial match over a later complete match. For
+this reason, the assumption is made that the end of the supplied subject string
+may not be the true end of the available data, and so, if \z, \Z, \b, \B,
+or $ are encountered at the end of the subject, the result is
+PCRE_ERROR_PARTIAL, provided that at least one character in the subject has
+been inspected.
+</P>
+<P>
+Setting PCRE_PARTIAL_HARD also affects the way UTF-8 and UTF-16
+subject strings are checked for validity. Normally, an invalid sequence
+causes the error PCRE_ERROR_BADUTF8 or PCRE_ERROR_BADUTF16. However, in the
+special case of a truncated character at the end of the subject,
+PCRE_ERROR_SHORTUTF8 or PCRE_ERROR_SHORTUTF16 is returned when
+PCRE_PARTIAL_HARD is set.
+</P>
+<br><b>
+Comparing hard and soft partial matching
+</b><br>
+<P>
+The difference between the two partial matching options can be illustrated by a
+pattern such as:
+<pre>
+ /dog(sbody)?/
+</pre>
+This matches either "dog" or "dogsbody", greedily (that is, it prefers the
+longer string if possible). If it is matched against the string "dog" with
+PCRE_PARTIAL_SOFT, it yields a complete match for "dog". However, if
+PCRE_PARTIAL_HARD is set, the result is PCRE_ERROR_PARTIAL. On the other hand,
+if the pattern is made ungreedy the result is different:
+<pre>
+ /dog(sbody)??/
+</pre>
+In this case the result is always a complete match because that is found first,
+and matching never continues after finding a complete match. It might be easier
+to follow this explanation by thinking of the two patterns like this:
+<pre>
+ /dog(sbody)?/ is the same as /dogsbody|dog/
+ /dog(sbody)??/ is the same as /dog|dogsbody/
+</pre>
+The second pattern will never match "dogsbody", because it will always find the
+shorter match first.
+</P>
+<br><a name="SEC3" href="#TOC1">PARTIAL MATCHING USING pcre_dfa_exec() OR pcre[16|32]_dfa_exec()</a><br>
+<P>
+The DFA functions move along the subject string character by character, without
+backtracking, searching for all possible matches simultaneously. If the end of
+the subject is reached before the end of the pattern, there is the possibility
+of a partial match, again provided that at least one character has been
+inspected.
+</P>
+<P>
+When PCRE_PARTIAL_SOFT is set, PCRE_ERROR_PARTIAL is returned only if there
+have been no complete matches. Otherwise, the complete matches are returned.
+However, if PCRE_PARTIAL_HARD is set, a partial match takes precedence over any
+complete matches. The portion of the string that was inspected when the longest
+partial match was found is set as the first matching string, provided there are
+at least two slots in the offsets vector.
+</P>
+<P>
+Because the DFA functions always search for all possible matches, and there is
+no difference between greedy and ungreedy repetition, their behaviour is
+different from the standard functions when PCRE_PARTIAL_HARD is set. Consider
+the string "dog" matched against the ungreedy pattern shown above:
+<pre>
+ /dog(sbody)??/
+</pre>
+Whereas the standard functions stop as soon as they find the complete match for
+"dog", the DFA functions also find the partial match for "dogsbody", and so
+return that when PCRE_PARTIAL_HARD is set.
+</P>
+<br><a name="SEC4" href="#TOC1">PARTIAL MATCHING AND WORD BOUNDARIES</a><br>
+<P>
+If a pattern ends with one of sequences \b or \B, which test for word
+boundaries, partial matching with PCRE_PARTIAL_SOFT can give counter-intuitive
+results. Consider this pattern:
+<pre>
+ /\bcat\b/
+</pre>
+This matches "cat", provided there is a word boundary at either end. If the
+subject string is "the cat", the comparison of the final "t" with a following
+character cannot take place, so a partial match is found. However, normal
+matching carries on, and \b matches at the end of the subject when the last
+character is a letter, so a complete match is found. The result, therefore, is
+<i>not</i> PCRE_ERROR_PARTIAL. Using PCRE_PARTIAL_HARD in this case does yield
+PCRE_ERROR_PARTIAL, because then the partial match takes precedence.
+</P>
+<br><a name="SEC5" href="#TOC1">FORMERLY RESTRICTED PATTERNS</a><br>
+<P>
+For releases of PCRE prior to 8.00, because of the way certain internal
+optimizations were implemented in the <b>pcre_exec()</b> function, the
+PCRE_PARTIAL option (predecessor of PCRE_PARTIAL_SOFT) could not be used with
+all patterns. From release 8.00 onwards, the restrictions no longer apply, and
+partial matching with can be requested for any pattern.
+</P>
+<P>
+Items that were formerly restricted were repeated single characters and
+repeated metasequences. If PCRE_PARTIAL was set for a pattern that did not
+conform to the restrictions, <b>pcre_exec()</b> returned the error code
+PCRE_ERROR_BADPARTIAL (-13). This error code is no longer in use. The
+PCRE_INFO_OKPARTIAL call to <b>pcre_fullinfo()</b> to find out if a compiled
+pattern can be used for partial matching now always returns 1.
+</P>
+<br><a name="SEC6" href="#TOC1">EXAMPLE OF PARTIAL MATCHING USING PCRETEST</a><br>
+<P>
+If the escape sequence \P is present in a <b>pcretest</b> data line, the
+PCRE_PARTIAL_SOFT option is used for the match. Here is a run of <b>pcretest</b>
+that uses the date example quoted above:
+<pre>
+ re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
+ data&#62; 25jun04\P
+ 0: 25jun04
+ 1: jun
+ data&#62; 25dec3\P
+ Partial match: 23dec3
+ data&#62; 3ju\P
+ Partial match: 3ju
+ data&#62; 3juj\P
+ No match
+ data&#62; j\P
+ No match
+</pre>
+The first data string is matched completely, so <b>pcretest</b> shows the
+matched substrings. The remaining four strings do not match the complete
+pattern, but the first two are partial matches. Similar output is obtained
+if DFA matching is used.
+</P>
+<P>
+If the escape sequence \P is present more than once in a <b>pcretest</b> data
+line, the PCRE_PARTIAL_HARD option is set for the match.
+</P>
+<br><a name="SEC7" href="#TOC1">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec() OR pcre[16|32]_dfa_exec()</a><br>
+<P>
+When a partial match has been found using a DFA matching function, it is
+possible to continue the match by providing additional subject data and calling
+the function again with the same compiled regular expression, this time setting
+the PCRE_DFA_RESTART option. You must pass the same working space as before,
+because this is where details of the previous partial match are stored. Here is
+an example using <b>pcretest</b>, using the \R escape sequence to set the
+PCRE_DFA_RESTART option (\D specifies the use of the DFA matching function):
+<pre>
+ re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
+ data&#62; 23ja\P\D
+ Partial match: 23ja
+ data&#62; n05\R\D
+ 0: n05
+</pre>
+The first call has "23ja" as the subject, and requests partial matching; the
+second call has "n05" as the subject for the continued (restarted) match.
+Notice that when the match is complete, only the last part is shown; PCRE does
+not retain the previously partially-matched string. It is up to the calling
+program to do that if it needs to.
+</P>
+<P>
+That means that, for an unanchored pattern, if a continued match fails, it is
+not possible to try again at a new starting point. All this facility is capable
+of doing is continuing with the previous match attempt. In the previous
+example, if the second set of data is "ug23" the result is no match, even
+though there would be a match for "aug23" if the entire string were given at
+once. Depending on the application, this may or may not be what you want.
+The only way to allow for starting again at the next character is to retain the
+matched part of the subject and try a new complete match.
+</P>
+<P>
+You can set the PCRE_PARTIAL_SOFT or PCRE_PARTIAL_HARD options with
+PCRE_DFA_RESTART to continue partial matching over multiple segments. This
+facility can be used to pass very long subject strings to the DFA matching
+functions.
+</P>
+<br><a name="SEC8" href="#TOC1">MULTI-SEGMENT MATCHING WITH pcre_exec() OR pcre[16|32]_exec()</a><br>
+<P>
+From release 8.00, the standard matching functions can also be used to do
+multi-segment matching. Unlike the DFA functions, it is not possible to
+restart the previous match with a new segment of data. Instead, new data must
+be added to the previous subject string, and the entire match re-run, starting
+from the point where the partial match occurred. Earlier data can be discarded.
+</P>
+<P>
+It is best to use PCRE_PARTIAL_HARD in this situation, because it does not
+treat the end of a segment as the end of the subject when matching \z, \Z,
+\b, \B, and $. Consider an unanchored pattern that matches dates:
+<pre>
+ re&#62; /\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d/
+ data&#62; The date is 23ja\P\P
+ Partial match: 23ja
+</pre>
+At this stage, an application could discard the text preceding "23ja", add on
+text from the next segment, and call the matching function again. Unlike the
+DFA matching functions, the entire matching string must always be available,
+and the complete matching process occurs for each call, so more memory and more
+processing time is needed.
+</P>
+<P>
+<b>Note:</b> If the pattern contains lookbehind assertions, or \K, or starts
+with \b or \B, the string that is returned for a partial match includes
+characters that precede the start of what would be returned for a complete
+match, because it contains all the characters that were inspected during the
+partial match.
+</P>
+<br><a name="SEC9" href="#TOC1">ISSUES WITH MULTI-SEGMENT MATCHING</a><br>
+<P>
+Certain types of pattern may give problems with multi-segment matching,
+whichever matching function is used.
+</P>
+<P>
+1. If the pattern contains a test for the beginning of a line, you need to pass
+the PCRE_NOTBOL option when the subject string for any call does start at the
+beginning of a line. There is also a PCRE_NOTEOL option, but in practice when
+doing multi-segment matching you should be using PCRE_PARTIAL_HARD, which
+includes the effect of PCRE_NOTEOL.
+</P>
+<P>
+2. Lookbehind assertions that have already been obeyed are catered for in the
+offsets that are returned for a partial match. However a lookbehind assertion
+later in the pattern could require even earlier characters to be inspected. You
+can handle this case by using the PCRE_INFO_MAXLOOKBEHIND option of the
+<b>pcre_fullinfo()</b> or <b>pcre[16|32]_fullinfo()</b> functions to obtain the
+length of the longest lookbehind in the pattern. This length is given in
+characters, not bytes. If you always retain at least that many characters
+before the partially matched string, all should be well. (Of course, near the
+start of the subject, fewer characters may be present; in that case all
+characters should be retained.)
+</P>
+<P>
+From release 8.33, there is a more accurate way of deciding which characters to
+retain. Instead of subtracting the length of the longest lookbehind from the
+earliest inspected character (<i>offsets[0]</i>), the match start position
+(<i>offsets[2]</i>) should be used, and the next match attempt started at the
+<i>offsets[2]</i> character by setting the <i>startoffset</i> argument of
+<b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>.
+</P>
+<P>
+For example, if the pattern "(?&#60;=123)abc" is partially
+matched against the string "xx123a", the three offset values returned are 2, 6,
+and 5. This indicates that the matching process that gave a partial match
+started at offset 5, but the characters "123a" were all inspected. The maximum
+lookbehind for that pattern is 3, so taking that away from 5 shows that we need
+only keep "123a", and the next match attempt can be started at offset 3 (that
+is, at "a") when further characters have been added. When the match start is
+not the earliest inspected character, <b>pcretest</b> shows it explicitly:
+<pre>
+ re&#62; "(?&#60;=123)abc"
+ data&#62; xx123a\P\P
+ Partial match at offset 5: 123a
+</PRE>
+</P>
+<P>
+3. Because a partial match must always contain at least one character, what
+might be considered a partial match of an empty string actually gives a "no
+match" result. For example:
+<pre>
+ re&#62; /c(?&#60;=abc)x/
+ data&#62; ab\P
+ No match
+</pre>
+If the next segment begins "cx", a match should be found, but this will only
+happen if characters from the previous segment are retained. For this reason, a
+"no match" result should be interpreted as "partial match of an empty string"
+when the pattern contains lookbehinds.
+</P>
+<P>
+4. Matching a subject string that is split into multiple segments may not
+always produce exactly the same result as matching over one single long string,
+especially when PCRE_PARTIAL_SOFT is used. The section "Partial Matching and
+Word Boundaries" above describes an issue that arises if the pattern ends with
+\b or \B. Another kind of difference may occur when there are multiple
+matching possibilities, because (for PCRE_PARTIAL_SOFT) a partial match result
+is given only when there are no completed matches. This means that as soon as
+the shortest match has been found, continuation to a new subject segment is no
+longer possible. Consider again this <b>pcretest</b> example:
+<pre>
+ re&#62; /dog(sbody)?/
+ data&#62; dogsb\P
+ 0: dog
+ data&#62; do\P\D
+ Partial match: do
+ data&#62; gsb\R\P\D
+ 0: g
+ data&#62; dogsbody\D
+ 0: dogsbody
+ 1: dog
+</pre>
+The first data line passes the string "dogsb" to a standard matching function,
+setting the PCRE_PARTIAL_SOFT option. Although the string is a partial match
+for "dogsbody", the result is not PCRE_ERROR_PARTIAL, because the shorter
+string "dog" is a complete match. Similarly, when the subject is presented to
+a DFA matching function in several parts ("do" and "gsb" being the first two)
+the match stops when "dog" has been found, and it is not possible to continue.
+On the other hand, if "dogsbody" is presented as a single string, a DFA
+matching function finds both matches.
+</P>
+<P>
+Because of these problems, it is best to use PCRE_PARTIAL_HARD when matching
+multi-segment data. The example above then behaves differently:
+<pre>
+ re&#62; /dog(sbody)?/
+ data&#62; dogsb\P\P
+ Partial match: dogsb
+ data&#62; do\P\D
+ Partial match: do
+ data&#62; gsb\R\P\P\D
+ Partial match: gsb
+</pre>
+5. Patterns that contain alternatives at the top level which do not all start
+with the same pattern item may not work as expected when PCRE_DFA_RESTART is
+used. For example, consider this pattern:
+<pre>
+ 1234|3789
+</pre>
+If the first part of the subject is "ABC123", a partial match of the first
+alternative is found at offset 3. There is no partial match for the second
+alternative, because such a match does not start at the same point in the
+subject string. Attempting to continue with the string "7890" does not yield a
+match because only those alternatives that match at one point in the subject
+are remembered. The problem arises because the start of the second alternative
+matches within the first alternative. There is no problem with anchored
+patterns or patterns such as:
+<pre>
+ 1234|ABCD
+</pre>
+where no string can be a partial match for both alternatives. This is not a
+problem if a standard matching function is used, because the entire match has
+to be rerun each time:
+<pre>
+ re&#62; /1234|3789/
+ data&#62; ABC123\P\P
+ Partial match: 123
+ data&#62; 1237890
+ 0: 3789
+</pre>
+Of course, instead of using PCRE_DFA_RESTART, the same technique of re-running
+the entire match can also be used with the DFA matching functions. Another
+possibility is to work with two buffers. If a partial match at offset <i>n</i>
+in the first buffer is followed by "no match" when PCRE_DFA_RESTART is used on
+the second buffer, you can then try a new match starting at offset <i>n+1</i> in
+the first buffer.
+</P>
+<br><a name="SEC10" href="#TOC1">AUTHOR</a><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><a name="SEC11" href="#TOC1">REVISION</a><br>
+<P>
+Last updated: 02 July 2013
+<br>
+Copyright &copy; 1997-2013 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcrepattern.html b/doc/html/pcrepattern.html
new file mode 100644
index 0000000..c06d1e0
--- /dev/null
+++ b/doc/html/pcrepattern.html
@@ -0,0 +1,3235 @@
+<html>
+<head>
+<title>pcrepattern specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcrepattern man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<ul>
+<li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION DETAILS</a>
+<li><a name="TOC2" href="#SEC2">SPECIAL START-OF-PATTERN ITEMS</a>
+<li><a name="TOC3" href="#SEC3">EBCDIC CHARACTER CODES</a>
+<li><a name="TOC4" href="#SEC4">CHARACTERS AND METACHARACTERS</a>
+<li><a name="TOC5" href="#SEC5">BACKSLASH</a>
+<li><a name="TOC6" href="#SEC6">CIRCUMFLEX AND DOLLAR</a>
+<li><a name="TOC7" href="#SEC7">FULL STOP (PERIOD, DOT) AND \N</a>
+<li><a name="TOC8" href="#SEC8">MATCHING A SINGLE DATA UNIT</a>
+<li><a name="TOC9" href="#SEC9">SQUARE BRACKETS AND CHARACTER CLASSES</a>
+<li><a name="TOC10" href="#SEC10">POSIX CHARACTER CLASSES</a>
+<li><a name="TOC11" href="#SEC11">COMPATIBILITY FEATURE FOR WORD BOUNDARIES</a>
+<li><a name="TOC12" href="#SEC12">VERTICAL BAR</a>
+<li><a name="TOC13" href="#SEC13">INTERNAL OPTION SETTING</a>
+<li><a name="TOC14" href="#SEC14">SUBPATTERNS</a>
+<li><a name="TOC15" href="#SEC15">DUPLICATE SUBPATTERN NUMBERS</a>
+<li><a name="TOC16" href="#SEC16">NAMED SUBPATTERNS</a>
+<li><a name="TOC17" href="#SEC17">REPETITION</a>
+<li><a name="TOC18" href="#SEC18">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a>
+<li><a name="TOC19" href="#SEC19">BACK REFERENCES</a>
+<li><a name="TOC20" href="#SEC20">ASSERTIONS</a>
+<li><a name="TOC21" href="#SEC21">CONDITIONAL SUBPATTERNS</a>
+<li><a name="TOC22" href="#SEC22">COMMENTS</a>
+<li><a name="TOC23" href="#SEC23">RECURSIVE PATTERNS</a>
+<li><a name="TOC24" href="#SEC24">SUBPATTERNS AS SUBROUTINES</a>
+<li><a name="TOC25" href="#SEC25">ONIGURUMA SUBROUTINE SYNTAX</a>
+<li><a name="TOC26" href="#SEC26">CALLOUTS</a>
+<li><a name="TOC27" href="#SEC27">BACKTRACKING CONTROL</a>
+<li><a name="TOC28" href="#SEC28">SEE ALSO</a>
+<li><a name="TOC29" href="#SEC29">AUTHOR</a>
+<li><a name="TOC30" href="#SEC30">REVISION</a>
+</ul>
+<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION DETAILS</a><br>
+<P>
+The syntax and semantics of the regular expressions that are supported by PCRE
+are described in detail below. There is a quick-reference syntax summary in the
+<a href="pcresyntax.html"><b>pcresyntax</b></a>
+page. PCRE tries to match Perl syntax and semantics as closely as it can. PCRE
+also supports some alternative regular expression syntax (which does not
+conflict with the Perl syntax) in order to provide some compatibility with
+regular expressions in Python, .NET, and Oniguruma.
+</P>
+<P>
+Perl's regular expressions are described in its own documentation, and
+regular expressions in general are covered in a number of books, some of which
+have copious examples. Jeffrey Friedl's "Mastering Regular Expressions",
+published by O'Reilly, covers regular expressions in great detail. This
+description of PCRE's regular expressions is intended as reference material.
+</P>
+<P>
+This document discusses the patterns that are supported by PCRE when one its
+main matching functions, <b>pcre_exec()</b> (8-bit) or <b>pcre[16|32]_exec()</b>
+(16- or 32-bit), is used. PCRE also has alternative matching functions,
+<b>pcre_dfa_exec()</b> and <b>pcre[16|32_dfa_exec()</b>, which match using a
+different algorithm that is not Perl-compatible. Some of the features discussed
+below are not available when DFA matching is used. The advantages and
+disadvantages of the alternative functions, and how they differ from the normal
+functions, are discussed in the
+<a href="pcrematching.html"><b>pcrematching</b></a>
+page.
+</P>
+<br><a name="SEC2" href="#TOC1">SPECIAL START-OF-PATTERN ITEMS</a><br>
+<P>
+A number of options that can be passed to <b>pcre_compile()</b> can also be set
+by special items at the start of a pattern. These are not Perl-compatible, but
+are provided to make these options accessible to pattern writers who are not
+able to change the program that processes the pattern. Any number of these
+items may appear, but they must all be together right at the start of the
+pattern string, and the letters must be in upper case.
+</P>
+<br><b>
+UTF support
+</b><br>
+<P>
+The original operation of PCRE was on strings of one-byte characters. However,
+there is now also support for UTF-8 strings in the original library, an
+extra library that supports 16-bit and UTF-16 character strings, and a
+third library that supports 32-bit and UTF-32 character strings. To use these
+features, PCRE must be built to include appropriate support. When using UTF
+strings you must either call the compiling function with the PCRE_UTF8,
+PCRE_UTF16, or PCRE_UTF32 option, or the pattern must start with one of
+these special sequences:
+<pre>
+ (*UTF8)
+ (*UTF16)
+ (*UTF32)
+ (*UTF)
+</pre>
+(*UTF) is a generic sequence that can be used with any of the libraries.
+Starting a pattern with such a sequence is equivalent to setting the relevant
+option. How setting a UTF mode affects pattern matching is mentioned in several
+places below. There is also a summary of features in the
+<a href="pcreunicode.html"><b>pcreunicode</b></a>
+page.
+</P>
+<P>
+Some applications that allow their users to supply patterns may wish to
+restrict them to non-UTF data for security reasons. If the PCRE_NEVER_UTF
+option is set at compile time, (*UTF) etc. are not allowed, and their
+appearance causes an error.
+</P>
+<br><b>
+Unicode property support
+</b><br>
+<P>
+Another special sequence that may appear at the start of a pattern is (*UCP).
+This has the same effect as setting the PCRE_UCP option: it causes sequences
+such as \d and \w to use Unicode properties to determine character types,
+instead of recognizing only characters with codes less than 128 via a lookup
+table.
+</P>
+<br><b>
+Disabling auto-possessification
+</b><br>
+<P>
+If a pattern starts with (*NO_AUTO_POSSESS), it has the same effect as setting
+the PCRE_NO_AUTO_POSSESS option at compile time. This stops PCRE from making
+quantifiers possessive when what follows cannot match the repeated item. For
+example, by default a+b is treated as a++b. For more details, see the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+documentation.
+</P>
+<br><b>
+Disabling start-up optimizations
+</b><br>
+<P>
+If a pattern starts with (*NO_START_OPT), it has the same effect as setting the
+PCRE_NO_START_OPTIMIZE option either at compile or matching time. This disables
+several optimizations for quickly reaching "no match" results. For more
+details, see the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+documentation.
+<a name="newlines"></a></P>
+<br><b>
+Newline conventions
+</b><br>
+<P>
+PCRE supports five different conventions for indicating line breaks in
+strings: a single CR (carriage return) character, a single LF (linefeed)
+character, the two-character sequence CRLF, any of the three preceding, or any
+Unicode newline sequence. The
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page has
+<a href="pcreapi.html#newlines">further discussion</a>
+about newlines, and shows how to set the newline convention in the
+<i>options</i> arguments for the compiling and matching functions.
+</P>
+<P>
+It is also possible to specify a newline convention by starting a pattern
+string with one of the following five sequences:
+<pre>
+ (*CR) carriage return
+ (*LF) linefeed
+ (*CRLF) carriage return, followed by linefeed
+ (*ANYCRLF) any of the three above
+ (*ANY) all Unicode newline sequences
+</pre>
+These override the default and the options given to the compiling function. For
+example, on a Unix system where LF is the default newline sequence, the pattern
+<pre>
+ (*CR)a.b
+</pre>
+changes the convention to CR. That pattern matches "a\nb" because LF is no
+longer a newline. If more than one of these settings is present, the last one
+is used.
+</P>
+<P>
+The newline convention affects where the circumflex and dollar assertions are
+true. It also affects the interpretation of the dot metacharacter when
+PCRE_DOTALL is not set, and the behaviour of \N. However, it does not affect
+what the \R escape sequence matches. By default, this is any Unicode newline
+sequence, for Perl compatibility. However, this can be changed; see the
+description of \R in the section entitled
+<a href="#newlineseq">"Newline sequences"</a>
+below. A change of \R setting can be combined with a change of newline
+convention.
+</P>
+<br><b>
+Setting match and recursion limits
+</b><br>
+<P>
+The caller of <b>pcre_exec()</b> can set a limit on the number of times the
+internal <b>match()</b> function is called and on the maximum depth of
+recursive calls. These facilities are provided to catch runaway matches that
+are provoked by patterns with huge matching trees (a typical example is a
+pattern with nested unlimited repeats) and to avoid running out of system stack
+by too much recursion. When one of these limits is reached, <b>pcre_exec()</b>
+gives an error return. The limits can also be set by items at the start of the
+pattern of the form
+<pre>
+ (*LIMIT_MATCH=d)
+ (*LIMIT_RECURSION=d)
+</pre>
+where d is any number of decimal digits. However, the value of the setting must
+be less than the value set (or defaulted) by the caller of <b>pcre_exec()</b>
+for it to have any effect. In other words, the pattern writer can lower the
+limits set by the programmer, but not raise them. If there is more than one
+setting of one of these limits, the lower value is used.
+</P>
+<br><a name="SEC3" href="#TOC1">EBCDIC CHARACTER CODES</a><br>
+<P>
+PCRE can be compiled to run in an environment that uses EBCDIC as its character
+code rather than ASCII or Unicode (typically a mainframe system). In the
+sections below, character code values are ASCII or Unicode; in an EBCDIC
+environment these characters may have different code values, and there are no
+code points greater than 255.
+</P>
+<br><a name="SEC4" href="#TOC1">CHARACTERS AND METACHARACTERS</a><br>
+<P>
+A regular expression is a pattern that is matched against a subject string from
+left to right. Most characters stand for themselves in a pattern, and match the
+corresponding characters in the subject. As a trivial example, the pattern
+<pre>
+ The quick brown fox
+</pre>
+matches a portion of a subject string that is identical to itself. When
+caseless matching is specified (the PCRE_CASELESS option), letters are matched
+independently of case. In a UTF mode, PCRE always understands the concept of
+case for characters whose values are less than 128, so caseless matching is
+always possible. For characters with higher values, the concept of case is
+supported if PCRE is compiled with Unicode property support, but not otherwise.
+If you want to use caseless matching for characters 128 and above, you must
+ensure that PCRE is compiled with Unicode property support as well as with
+UTF support.
+</P>
+<P>
+The power of regular expressions comes from the ability to include alternatives
+and repetitions in the pattern. These are encoded in the pattern by the use of
+<i>metacharacters</i>, which do not stand for themselves but instead are
+interpreted in some special way.
+</P>
+<P>
+There are two different sets of metacharacters: those that are recognized
+anywhere in the pattern except within square brackets, and those that are
+recognized within square brackets. Outside square brackets, the metacharacters
+are as follows:
+<pre>
+ \ general escape character with several uses
+ ^ assert start of string (or line, in multiline mode)
+ $ assert end of string (or line, in multiline mode)
+ . match any character except newline (by default)
+ [ start character class definition
+ | start of alternative branch
+ ( start subpattern
+ ) end subpattern
+ ? extends the meaning of (
+ also 0 or 1 quantifier
+ also quantifier minimizer
+ * 0 or more quantifier
+ + 1 or more quantifier
+ also "possessive quantifier"
+ { start min/max quantifier
+</pre>
+Part of a pattern that is in square brackets is called a "character class". In
+a character class the only metacharacters are:
+<pre>
+ \ general escape character
+ ^ negate the class, but only if the first character
+ - indicates character range
+ [ POSIX character class (only if followed by POSIX syntax)
+ ] terminates the character class
+</pre>
+The following sections describe the use of each of the metacharacters.
+</P>
+<br><a name="SEC5" href="#TOC1">BACKSLASH</a><br>
+<P>
+The backslash character has several uses. Firstly, if it is followed by a
+character that is not a number or a letter, it takes away any special meaning
+that character may have. This use of backslash as an escape character applies
+both inside and outside character classes.
+</P>
+<P>
+For example, if you want to match a * character, you write \* in the pattern.
+This escaping action applies whether or not the following character would
+otherwise be interpreted as a metacharacter, so it is always safe to precede a
+non-alphanumeric with backslash to specify that it stands for itself. In
+particular, if you want to match a backslash, you write \\.
+</P>
+<P>
+In a UTF mode, only ASCII numbers and letters have any special meaning after a
+backslash. All other characters (in particular, those whose codepoints are
+greater than 127) are treated as literals.
+</P>
+<P>
+If a pattern is compiled with the PCRE_EXTENDED option, most white space in the
+pattern (other than in a character class), and characters between a # outside a
+character class and the next newline, inclusive, are ignored. An escaping
+backslash can be used to include a white space or # character as part of the
+pattern.
+</P>
+<P>
+If you want to remove the special meaning from a sequence of characters, you
+can do so by putting them between \Q and \E. This is different from Perl in
+that $ and @ are handled as literals in \Q...\E sequences in PCRE, whereas in
+Perl, $ and @ cause variable interpolation. Note the following examples:
+<pre>
+ Pattern PCRE matches Perl matches
+
+ \Qabc$xyz\E abc$xyz abc followed by the contents of $xyz
+ \Qabc\$xyz\E abc\$xyz abc\$xyz
+ \Qabc\E\$\Qxyz\E abc$xyz abc$xyz
+</pre>
+The \Q...\E sequence is recognized both inside and outside character classes.
+An isolated \E that is not preceded by \Q is ignored. If \Q is not followed
+by \E later in the pattern, the literal interpretation continues to the end of
+the pattern (that is, \E is assumed at the end). If the isolated \Q is inside
+a character class, this causes an error, because the character class is not
+terminated.
+<a name="digitsafterbackslash"></a></P>
+<br><b>
+Non-printing characters
+</b><br>
+<P>
+A second use of backslash provides a way of encoding non-printing characters
+in patterns in a visible manner. There is no restriction on the appearance of
+non-printing characters, apart from the binary zero that terminates a pattern,
+but when a pattern is being prepared by text editing, it is often easier to use
+one of the following escape sequences than the binary character it represents:
+<pre>
+ \a alarm, that is, the BEL character (hex 07)
+ \cx "control-x", where x is any ASCII character
+ \e escape (hex 1B)
+ \f form feed (hex 0C)
+ \n linefeed (hex 0A)
+ \r carriage return (hex 0D)
+ \t tab (hex 09)
+ \0dd character with octal code 0dd
+ \ddd character with octal code ddd, or back reference
+ \o{ddd..} character with octal code ddd..
+ \xhh character with hex code hh
+ \x{hhh..} character with hex code hhh.. (non-JavaScript mode)
+ \uhhhh character with hex code hhhh (JavaScript mode only)
+</pre>
+The precise effect of \cx on ASCII characters is as follows: if x is a lower
+case letter, it is converted to upper case. Then bit 6 of the character (hex
+40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A (A is 41, Z is 5A),
+but \c{ becomes hex 3B ({ is 7B), and \c; becomes hex 7B (; is 3B). If the
+data item (byte or 16-bit value) following \c has a value greater than 127, a
+compile-time error occurs. This locks out non-ASCII characters in all modes.
+</P>
+<P>
+The \c facility was designed for use with ASCII characters, but with the
+extension to Unicode it is even less useful than it once was. It is, however,
+recognized when PCRE is compiled in EBCDIC mode, where data items are always
+bytes. In this mode, all values are valid after \c. If the next character is a
+lower case letter, it is converted to upper case. Then the 0xc0 bits of the
+byte are inverted. Thus \cA becomes hex 01, as in ASCII (A is C1), but because
+the EBCDIC letters are disjoint, \cZ becomes hex 29 (Z is E9), and other
+characters also generate different values.
+</P>
+<P>
+After \0 up to two further octal digits are read. If there are fewer than two
+digits, just those that are present are used. Thus the sequence \0\x\07
+specifies two binary zeros followed by a BEL character (code value 7). Make
+sure you supply two digits after the initial zero if the pattern character that
+follows is itself an octal digit.
+</P>
+<P>
+The escape \o must be followed by a sequence of octal digits, enclosed in
+braces. An error occurs if this is not the case. This escape is a recent
+addition to Perl; it provides way of specifying character code points as octal
+numbers greater than 0777, and it also allows octal numbers and back references
+to be unambiguously specified.
+</P>
+<P>
+For greater clarity and unambiguity, it is best to avoid following \ by a
+digit greater than zero. Instead, use \o{} or \x{} to specify character
+numbers, and \g{} to specify back references. The following paragraphs
+describe the old, ambiguous syntax.
+</P>
+<P>
+The handling of a backslash followed by a digit other than 0 is complicated,
+and Perl has changed in recent releases, causing PCRE also to change. Outside a
+character class, PCRE reads the digit and any following digits as a decimal
+number. If the number is less than 8, or if there have been at least that many
+previous capturing left parentheses in the expression, the entire sequence is
+taken as a <i>back reference</i>. A description of how this works is given
+<a href="#backreferences">later,</a>
+following the discussion of
+<a href="#subpattern">parenthesized subpatterns.</a>
+</P>
+<P>
+Inside a character class, or if the decimal number following \ is greater than
+7 and there have not been that many capturing subpatterns, PCRE handles \8 and
+\9 as the literal characters "8" and "9", and otherwise re-reads up to three
+octal digits following the backslash, using them to generate a data character.
+Any subsequent digits stand for themselves. For example:
+<pre>
+ \040 is another way of writing an ASCII space
+ \40 is the same, provided there are fewer than 40 previous capturing subpatterns
+ \7 is always a back reference
+ \11 might be a back reference, or another way of writing a tab
+ \011 is always a tab
+ \0113 is a tab followed by the character "3"
+ \113 might be a back reference, otherwise the character with octal code 113
+ \377 might be a back reference, otherwise the value 255 (decimal)
+ \81 is either a back reference, or the two characters "8" and "1"
+</pre>
+Note that octal values of 100 or greater that are specified using this syntax
+must not be introduced by a leading zero, because no more than three octal
+digits are ever read.
+</P>
+<P>
+By default, after \x that is not followed by {, from zero to two hexadecimal
+digits are read (letters can be in upper or lower case). Any number of
+hexadecimal digits may appear between \x{ and }. If a character other than
+a hexadecimal digit appears between \x{ and }, or if there is no terminating
+}, an error occurs.
+</P>
+<P>
+If the PCRE_JAVASCRIPT_COMPAT option is set, the interpretation of \x is
+as just described only when it is followed by two hexadecimal digits.
+Otherwise, it matches a literal "x" character. In JavaScript mode, support for
+code points greater than 256 is provided by \u, which must be followed by
+four hexadecimal digits; otherwise it matches a literal "u" character.
+</P>
+<P>
+Characters whose value is less than 256 can be defined by either of the two
+syntaxes for \x (or by \u in JavaScript mode). There is no difference in the
+way they are handled. For example, \xdc is exactly the same as \x{dc} (or
+\u00dc in JavaScript mode).
+</P>
+<br><b>
+Constraints on character values
+</b><br>
+<P>
+Characters that are specified using octal or hexadecimal numbers are
+limited to certain values, as follows:
+<pre>
+ 8-bit non-UTF mode less than 0x100
+ 8-bit UTF-8 mode less than 0x10ffff and a valid codepoint
+ 16-bit non-UTF mode less than 0x10000
+ 16-bit UTF-16 mode less than 0x10ffff and a valid codepoint
+ 32-bit non-UTF mode less than 0x100000000
+ 32-bit UTF-32 mode less than 0x10ffff and a valid codepoint
+</pre>
+Invalid Unicode codepoints are the range 0xd800 to 0xdfff (the so-called
+"surrogate" codepoints), and 0xffef.
+</P>
+<br><b>
+Escape sequences in character classes
+</b><br>
+<P>
+All the sequences that define a single character value can be used both inside
+and outside character classes. In addition, inside a character class, \b is
+interpreted as the backspace character (hex 08).
+</P>
+<P>
+\N is not allowed in a character class. \B, \R, and \X are not special
+inside a character class. Like other unrecognized escape sequences, they are
+treated as the literal characters "B", "R", and "X" by default, but cause an
+error if the PCRE_EXTRA option is set. Outside a character class, these
+sequences have different meanings.
+</P>
+<br><b>
+Unsupported escape sequences
+</b><br>
+<P>
+In Perl, the sequences \l, \L, \u, and \U are recognized by its string
+handler and used to modify the case of following characters. By default, PCRE
+does not support these escape sequences. However, if the PCRE_JAVASCRIPT_COMPAT
+option is set, \U matches a "U" character, and \u can be used to define a
+character by code point, as described in the previous section.
+</P>
+<br><b>
+Absolute and relative back references
+</b><br>
+<P>
+The sequence \g followed by an unsigned or a negative number, optionally
+enclosed in braces, is an absolute or relative back reference. A named back
+reference can be coded as \g{name}. Back references are discussed
+<a href="#backreferences">later,</a>
+following the discussion of
+<a href="#subpattern">parenthesized subpatterns.</a>
+</P>
+<br><b>
+Absolute and relative subroutine calls
+</b><br>
+<P>
+For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or
+a number enclosed either in angle brackets or single quotes, is an alternative
+syntax for referencing a subpattern as a "subroutine". Details are discussed
+<a href="#onigurumasubroutines">later.</a>
+Note that \g{...} (Perl syntax) and \g&#60;...&#62; (Oniguruma syntax) are <i>not</i>
+synonymous. The former is a back reference; the latter is a
+<a href="#subpatternsassubroutines">subroutine</a>
+call.
+<a name="genericchartypes"></a></P>
+<br><b>
+Generic character types
+</b><br>
+<P>
+Another use of backslash is for specifying generic character types:
+<pre>
+ \d any decimal digit
+ \D any character that is not a decimal digit
+ \h any horizontal white space character
+ \H any character that is not a horizontal white space character
+ \s any white space character
+ \S any character that is not a white space character
+ \v any vertical white space character
+ \V any character that is not a vertical white space character
+ \w any "word" character
+ \W any "non-word" character
+</pre>
+There is also the single sequence \N, which matches a non-newline character.
+This is the same as
+<a href="#fullstopdot">the "." metacharacter</a>
+when PCRE_DOTALL is not set. Perl also uses \N to match characters by name;
+PCRE does not support this.
+</P>
+<P>
+Each pair of lower and upper case escape sequences partitions the complete set
+of characters into two disjoint sets. Any given character matches one, and only
+one, of each pair. The sequences can appear both inside and outside character
+classes. They each match one character of the appropriate type. If the current
+matching point is at the end of the subject string, all of them fail, because
+there is no character to match.
+</P>
+<P>
+For compatibility with Perl, \s did not used to match the VT character (code
+11), which made it different from the the POSIX "space" class. However, Perl
+added VT at release 5.18, and PCRE followed suit at release 8.34. The default
+\s characters are now HT (9), LF (10), VT (11), FF (12), CR (13), and space
+(32), which are defined as white space in the "C" locale. This list may vary if
+locale-specific matching is taking place. For example, in some locales the
+"non-breaking space" character (\xA0) is recognized as white space, and in
+others the VT character is not.
+</P>
+<P>
+A "word" character is an underscore or any character that is a letter or digit.
+By default, the definition of letters and digits is controlled by PCRE's
+low-valued character tables, and may vary if locale-specific matching is taking
+place (see
+<a href="pcreapi.html#localesupport">"Locale support"</a>
+in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page). For example, in a French locale such as "fr_FR" in Unix-like systems,
+or "french" in Windows, some character codes greater than 127 are used for
+accented letters, and these are then matched by \w. The use of locales with
+Unicode is discouraged.
+</P>
+<P>
+By default, characters whose code points are greater than 127 never match \d,
+\s, or \w, and always match \D, \S, and \W, although this may vary for
+characters in the range 128-255 when locale-specific matching is happening.
+These escape sequences retain their original meanings from before Unicode
+support was available, mainly for efficiency reasons. If PCRE is compiled with
+Unicode property support, and the PCRE_UCP option is set, the behaviour is
+changed so that Unicode properties are used to determine character types, as
+follows:
+<pre>
+ \d any character that matches \p{Nd} (decimal digit)
+ \s any character that matches \p{Z} or \h or \v
+ \w any character that matches \p{L} or \p{N}, plus underscore
+</pre>
+The upper case escapes match the inverse sets of characters. Note that \d
+matches only decimal digits, whereas \w matches any Unicode digit, as well as
+any Unicode letter, and underscore. Note also that PCRE_UCP affects \b, and
+\B because they are defined in terms of \w and \W. Matching these sequences
+is noticeably slower when PCRE_UCP is set.
+</P>
+<P>
+The sequences \h, \H, \v, and \V are features that were added to Perl at
+release 5.10. In contrast to the other sequences, which match only ASCII
+characters by default, these always match certain high-valued code points,
+whether or not PCRE_UCP is set. The horizontal space characters are:
+<pre>
+ U+0009 Horizontal tab (HT)
+ U+0020 Space
+ U+00A0 Non-break space
+ U+1680 Ogham space mark
+ U+180E Mongolian vowel separator
+ U+2000 En quad
+ U+2001 Em quad
+ U+2002 En space
+ U+2003 Em space
+ U+2004 Three-per-em space
+ U+2005 Four-per-em space
+ U+2006 Six-per-em space
+ U+2007 Figure space
+ U+2008 Punctuation space
+ U+2009 Thin space
+ U+200A Hair space
+ U+202F Narrow no-break space
+ U+205F Medium mathematical space
+ U+3000 Ideographic space
+</pre>
+The vertical space characters are:
+<pre>
+ U+000A Linefeed (LF)
+ U+000B Vertical tab (VT)
+ U+000C Form feed (FF)
+ U+000D Carriage return (CR)
+ U+0085 Next line (NEL)
+ U+2028 Line separator
+ U+2029 Paragraph separator
+</pre>
+In 8-bit, non-UTF-8 mode, only the characters with codepoints less than 256 are
+relevant.
+<a name="newlineseq"></a></P>
+<br><b>
+Newline sequences
+</b><br>
+<P>
+Outside a character class, by default, the escape sequence \R matches any
+Unicode newline sequence. In 8-bit non-UTF-8 mode \R is equivalent to the
+following:
+<pre>
+ (?&#62;\r\n|\n|\x0b|\f|\r|\x85)
+</pre>
+This is an example of an "atomic group", details of which are given
+<a href="#atomicgroup">below.</a>
+This particular group matches either the two-character sequence CR followed by
+LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab,
+U+000B), FF (form feed, U+000C), CR (carriage return, U+000D), or NEL (next
+line, U+0085). The two-character sequence is treated as a single unit that
+cannot be split.
+</P>
+<P>
+In other modes, two additional characters whose codepoints are greater than 255
+are added: LS (line separator, U+2028) and PS (paragraph separator, U+2029).
+Unicode character property support is not needed for these characters to be
+recognized.
+</P>
+<P>
+It is possible to restrict \R to match only CR, LF, or CRLF (instead of the
+complete set of Unicode line endings) by setting the option PCRE_BSR_ANYCRLF
+either at compile time or when the pattern is matched. (BSR is an abbrevation
+for "backslash R".) This can be made the default when PCRE is built; if this is
+the case, the other behaviour can be requested via the PCRE_BSR_UNICODE option.
+It is also possible to specify these settings by starting a pattern string with
+one of the following sequences:
+<pre>
+ (*BSR_ANYCRLF) CR, LF, or CRLF only
+ (*BSR_UNICODE) any Unicode newline sequence
+</pre>
+These override the default and the options given to the compiling function, but
+they can themselves be overridden by options given to a matching function. Note
+that these special settings, which are not Perl-compatible, are recognized only
+at the very start of a pattern, and that they must be in upper case. If more
+than one of them is present, the last one is used. They can be combined with a
+change of newline convention; for example, a pattern can start with:
+<pre>
+ (*ANY)(*BSR_ANYCRLF)
+</pre>
+They can also be combined with the (*UTF8), (*UTF16), (*UTF32), (*UTF) or
+(*UCP) special sequences. Inside a character class, \R is treated as an
+unrecognized escape sequence, and so matches the letter "R" by default, but
+causes an error if PCRE_EXTRA is set.
+<a name="uniextseq"></a></P>
+<br><b>
+Unicode character properties
+</b><br>
+<P>
+When PCRE is built with Unicode character property support, three additional
+escape sequences that match characters with specific properties are available.
+When in 8-bit non-UTF-8 mode, these sequences are of course limited to testing
+characters whose codepoints are less than 256, but they do work in this mode.
+The extra escape sequences are:
+<pre>
+ \p{<i>xx</i>} a character with the <i>xx</i> property
+ \P{<i>xx</i>} a character without the <i>xx</i> property
+ \X a Unicode extended grapheme cluster
+</pre>
+The property names represented by <i>xx</i> above are limited to the Unicode
+script names, the general category properties, "Any", which matches any
+character (including newline), and some special PCRE properties (described
+in the
+<a href="#extraprops">next section).</a>
+Other Perl properties such as "InMusicalSymbols" are not currently supported by
+PCRE. Note that \P{Any} does not match any characters, so always causes a
+match failure.
+</P>
+<P>
+Sets of Unicode characters are defined as belonging to certain scripts. A
+character from one of these sets can be matched using a script name. For
+example:
+<pre>
+ \p{Greek}
+ \P{Han}
+</pre>
+Those that are not part of an identified script are lumped together as
+"Common". The current list of scripts is:
+</P>
+<P>
+Arabic,
+Armenian,
+Avestan,
+Balinese,
+Bamum,
+Batak,
+Bengali,
+Bopomofo,
+Brahmi,
+Braille,
+Buginese,
+Buhid,
+Canadian_Aboriginal,
+Carian,
+Chakma,
+Cham,
+Cherokee,
+Common,
+Coptic,
+Cuneiform,
+Cypriot,
+Cyrillic,
+Deseret,
+Devanagari,
+Egyptian_Hieroglyphs,
+Ethiopic,
+Georgian,
+Glagolitic,
+Gothic,
+Greek,
+Gujarati,
+Gurmukhi,
+Han,
+Hangul,
+Hanunoo,
+Hebrew,
+Hiragana,
+Imperial_Aramaic,
+Inherited,
+Inscriptional_Pahlavi,
+Inscriptional_Parthian,
+Javanese,
+Kaithi,
+Kannada,
+Katakana,
+Kayah_Li,
+Kharoshthi,
+Khmer,
+Lao,
+Latin,
+Lepcha,
+Limbu,
+Linear_B,
+Lisu,
+Lycian,
+Lydian,
+Malayalam,
+Mandaic,
+Meetei_Mayek,
+Meroitic_Cursive,
+Meroitic_Hieroglyphs,
+Miao,
+Mongolian,
+Myanmar,
+New_Tai_Lue,
+Nko,
+Ogham,
+Old_Italic,
+Old_Persian,
+Old_South_Arabian,
+Old_Turkic,
+Ol_Chiki,
+Oriya,
+Osmanya,
+Phags_Pa,
+Phoenician,
+Rejang,
+Runic,
+Samaritan,
+Saurashtra,
+Sharada,
+Shavian,
+Sinhala,
+Sora_Sompeng,
+Sundanese,
+Syloti_Nagri,
+Syriac,
+Tagalog,
+Tagbanwa,
+Tai_Le,
+Tai_Tham,
+Tai_Viet,
+Takri,
+Tamil,
+Telugu,
+Thaana,
+Thai,
+Tibetan,
+Tifinagh,
+Ugaritic,
+Vai,
+Yi.
+</P>
+<P>
+Each character has exactly one Unicode general category property, specified by
+a two-letter abbreviation. For compatibility with Perl, negation can be
+specified by including a circumflex between the opening brace and the property
+name. For example, \p{^Lu} is the same as \P{Lu}.
+</P>
+<P>
+If only one letter is specified with \p or \P, it includes all the general
+category properties that start with that letter. In this case, in the absence
+of negation, the curly brackets in the escape sequence are optional; these two
+examples have the same effect:
+<pre>
+ \p{L}
+ \pL
+</pre>
+The following general category property codes are supported:
+<pre>
+ C Other
+ Cc Control
+ Cf Format
+ Cn Unassigned
+ Co Private use
+ Cs Surrogate
+
+ L Letter
+ Ll Lower case letter
+ Lm Modifier letter
+ Lo Other letter
+ Lt Title case letter
+ Lu Upper case letter
+
+ M Mark
+ Mc Spacing mark
+ Me Enclosing mark
+ Mn Non-spacing mark
+
+ N Number
+ Nd Decimal number
+ Nl Letter number
+ No Other number
+
+ P Punctuation
+ Pc Connector punctuation
+ Pd Dash punctuation
+ Pe Close punctuation
+ Pf Final punctuation
+ Pi Initial punctuation
+ Po Other punctuation
+ Ps Open punctuation
+
+ S Symbol
+ Sc Currency symbol
+ Sk Modifier symbol
+ Sm Mathematical symbol
+ So Other symbol
+
+ Z Separator
+ Zl Line separator
+ Zp Paragraph separator
+ Zs Space separator
+</pre>
+The special property L& is also supported: it matches a character that has
+the Lu, Ll, or Lt property, in other words, a letter that is not classified as
+a modifier or "other".
+</P>
+<P>
+The Cs (Surrogate) property applies only to characters in the range U+D800 to
+U+DFFF. Such characters are not valid in Unicode strings and so
+cannot be tested by PCRE, unless UTF validity checking has been turned off
+(see the discussion of PCRE_NO_UTF8_CHECK, PCRE_NO_UTF16_CHECK and
+PCRE_NO_UTF32_CHECK in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+page). Perl does not support the Cs property.
+</P>
+<P>
+The long synonyms for property names that Perl supports (such as \p{Letter})
+are not supported by PCRE, nor is it permitted to prefix any of these
+properties with "Is".
+</P>
+<P>
+No character that is in the Unicode table has the Cn (unassigned) property.
+Instead, this property is assumed for any code point that is not in the
+Unicode table.
+</P>
+<P>
+Specifying caseless matching does not affect these escape sequences. For
+example, \p{Lu} always matches only upper case letters. This is different from
+the behaviour of current versions of Perl.
+</P>
+<P>
+Matching characters by Unicode property is not fast, because PCRE has to do a
+multistage table lookup in order to find a character's property. That is why
+the traditional escape sequences such as \d and \w do not use Unicode
+properties in PCRE by default, though you can make them do so by setting the
+PCRE_UCP option or by starting the pattern with (*UCP).
+</P>
+<br><b>
+Extended grapheme clusters
+</b><br>
+<P>
+The \X escape matches any number of Unicode characters that form an "extended
+grapheme cluster", and treats the sequence as an atomic group
+<a href="#atomicgroup">(see below).</a>
+Up to and including release 8.31, PCRE matched an earlier, simpler definition
+that was equivalent to
+<pre>
+ (?&#62;\PM\pM*)
+</pre>
+That is, it matched a character without the "mark" property, followed by zero
+or more characters with the "mark" property. Characters with the "mark"
+property are typically non-spacing accents that affect the preceding character.
+</P>
+<P>
+This simple definition was extended in Unicode to include more complicated
+kinds of composite character by giving each character a grapheme breaking
+property, and creating rules that use these properties to define the boundaries
+of extended grapheme clusters. In releases of PCRE later than 8.31, \X matches
+one of these clusters.
+</P>
+<P>
+\X always matches at least one character. Then it decides whether to add
+additional characters according to the following rules for ending a cluster:
+</P>
+<P>
+1. End at the end of the subject string.
+</P>
+<P>
+2. Do not end between CR and LF; otherwise end after any control character.
+</P>
+<P>
+3. Do not break Hangul (a Korean script) syllable sequences. Hangul characters
+are of five types: L, V, T, LV, and LVT. An L character may be followed by an
+L, V, LV, or LVT character; an LV or V character may be followed by a V or T
+character; an LVT or T character may be follwed only by a T character.
+</P>
+<P>
+4. Do not end before extending characters or spacing marks. Characters with
+the "mark" property always have the "extend" grapheme breaking property.
+</P>
+<P>
+5. Do not end after prepend characters.
+</P>
+<P>
+6. Otherwise, end the cluster.
+<a name="extraprops"></a></P>
+<br><b>
+PCRE's additional properties
+</b><br>
+<P>
+As well as the standard Unicode properties described above, PCRE supports four
+more that make it possible to convert traditional escape sequences such as \w
+and \s to use Unicode properties. PCRE uses these non-standard, non-Perl
+properties internally when PCRE_UCP is set. However, they may also be used
+explicitly. These properties are:
+<pre>
+ Xan Any alphanumeric character
+ Xps Any POSIX space character
+ Xsp Any Perl space character
+ Xwd Any Perl "word" character
+</pre>
+Xan matches characters that have either the L (letter) or the N (number)
+property. Xps matches the characters tab, linefeed, vertical tab, form feed, or
+carriage return, and any other character that has the Z (separator) property.
+Xsp is the same as Xps; it used to exclude vertical tab, for Perl
+compatibility, but Perl changed, and so PCRE followed at release 8.34. Xwd
+matches the same characters as Xan, plus underscore.
+</P>
+<P>
+There is another non-standard property, Xuc, which matches any character that
+can be represented by a Universal Character Name in C++ and other programming
+languages. These are the characters $, @, ` (grave accent), and all characters
+with Unicode code points greater than or equal to U+00A0, except for the
+surrogates U+D800 to U+DFFF. Note that most base (ASCII) characters are
+excluded. (Universal Character Names are of the form \uHHHH or \UHHHHHHHH
+where H is a hexadecimal digit. Note that the Xuc property does not match these
+sequences but the characters that they represent.)
+<a name="resetmatchstart"></a></P>
+<br><b>
+Resetting the match start
+</b><br>
+<P>
+The escape sequence \K causes any previously matched characters not to be
+included in the final matched sequence. For example, the pattern:
+<pre>
+ foo\Kbar
+</pre>
+matches "foobar", but reports that it has matched "bar". This feature is
+similar to a lookbehind assertion
+<a href="#lookbehind">(described below).</a>
+However, in this case, the part of the subject before the real match does not
+have to be of fixed length, as lookbehind assertions do. The use of \K does
+not interfere with the setting of
+<a href="#subpattern">captured substrings.</a>
+For example, when the pattern
+<pre>
+ (foo)\Kbar
+</pre>
+matches "foobar", the first substring is still set to "foo".
+</P>
+<P>
+Perl documents that the use of \K within assertions is "not well defined". In
+PCRE, \K is acted upon when it occurs inside positive assertions, but is
+ignored in negative assertions. Note that when a pattern such as (?=ab\K)
+matches, the reported start of the match can be greater than the end of the
+match.
+<a name="smallassertions"></a></P>
+<br><b>
+Simple assertions
+</b><br>
+<P>
+The final use of backslash is for certain simple assertions. An assertion
+specifies a condition that has to be met at a particular point in a match,
+without consuming any characters from the subject string. The use of
+subpatterns for more complicated assertions is described
+<a href="#bigassertions">below.</a>
+The backslashed assertions are:
+<pre>
+ \b matches at a word boundary
+ \B matches when not at a word boundary
+ \A matches at the start of the subject
+ \Z matches at the end of the subject
+ also matches before a newline at the end of the subject
+ \z matches only at the end of the subject
+ \G matches at the first matching position in the subject
+</pre>
+Inside a character class, \b has a different meaning; it matches the backspace
+character. If any other of these assertions appears in a character class, by
+default it matches the corresponding literal character (for example, \B
+matches the letter B). However, if the PCRE_EXTRA option is set, an "invalid
+escape sequence" error is generated instead.
+</P>
+<P>
+A word boundary is a position in the subject string where the current character
+and the previous character do not both match \w or \W (i.e. one matches
+\w and the other matches \W), or the start or end of the string if the
+first or last character matches \w, respectively. In a UTF mode, the meanings
+of \w and \W can be changed by setting the PCRE_UCP option. When this is
+done, it also affects \b and \B. Neither PCRE nor Perl has a separate "start
+of word" or "end of word" metasequence. However, whatever follows \b normally
+determines which it is. For example, the fragment \ba matches "a" at the start
+of a word.
+</P>
+<P>
+The \A, \Z, and \z assertions differ from the traditional circumflex and
+dollar (described in the next section) in that they only ever match at the very
+start and end of the subject string, whatever options are set. Thus, they are
+independent of multiline mode. These three assertions are not affected by the
+PCRE_NOTBOL or PCRE_NOTEOL options, which affect only the behaviour of the
+circumflex and dollar metacharacters. However, if the <i>startoffset</i>
+argument of <b>pcre_exec()</b> is non-zero, indicating that matching is to start
+at a point other than the beginning of the subject, \A can never match. The
+difference between \Z and \z is that \Z matches before a newline at the end
+of the string as well as at the very end, whereas \z matches only at the end.
+</P>
+<P>
+The \G assertion is true only when the current matching position is at the
+start point of the match, as specified by the <i>startoffset</i> argument of
+<b>pcre_exec()</b>. It differs from \A when the value of <i>startoffset</i> is
+non-zero. By calling <b>pcre_exec()</b> multiple times with appropriate
+arguments, you can mimic Perl's /g option, and it is in this kind of
+implementation where \G can be useful.
+</P>
+<P>
+Note, however, that PCRE's interpretation of \G, as the start of the current
+match, is subtly different from Perl's, which defines it as the end of the
+previous match. In Perl, these can be different when the previously matched
+string was empty. Because PCRE does just one match at a time, it cannot
+reproduce this behaviour.
+</P>
+<P>
+If all the alternatives of a pattern begin with \G, the expression is anchored
+to the starting match position, and the "anchored" flag is set in the compiled
+regular expression.
+</P>
+<br><a name="SEC6" href="#TOC1">CIRCUMFLEX AND DOLLAR</a><br>
+<P>
+The circumflex and dollar metacharacters are zero-width assertions. That is,
+they test for a particular condition being true without consuming any
+characters from the subject string.
+</P>
+<P>
+Outside a character class, in the default matching mode, the circumflex
+character is an assertion that is true only if the current matching point is at
+the start of the subject string. If the <i>startoffset</i> argument of
+<b>pcre_exec()</b> is non-zero, circumflex can never match if the PCRE_MULTILINE
+option is unset. Inside a character class, circumflex has an entirely different
+meaning
+<a href="#characterclass">(see below).</a>
+</P>
+<P>
+Circumflex need not be the first character of the pattern if a number of
+alternatives are involved, but it should be the first thing in each alternative
+in which it appears if the pattern is ever to match that branch. If all
+possible alternatives start with a circumflex, that is, if the pattern is
+constrained to match only at the start of the subject, it is said to be an
+"anchored" pattern. (There are also other constructs that can cause a pattern
+to be anchored.)
+</P>
+<P>
+The dollar character is an assertion that is true only if the current matching
+point is at the end of the subject string, or immediately before a newline at
+the end of the string (by default). Note, however, that it does not actually
+match the newline. Dollar need not be the last character of the pattern if a
+number of alternatives are involved, but it should be the last item in any
+branch in which it appears. Dollar has no special meaning in a character class.
+</P>
+<P>
+The meaning of dollar can be changed so that it matches only at the very end of
+the string, by setting the PCRE_DOLLAR_ENDONLY option at compile time. This
+does not affect the \Z assertion.
+</P>
+<P>
+The meanings of the circumflex and dollar characters are changed if the
+PCRE_MULTILINE option is set. When this is the case, a circumflex matches
+immediately after internal newlines as well as at the start of the subject
+string. It does not match after a newline that ends the string. A dollar
+matches before any newlines in the string, as well as at the very end, when
+PCRE_MULTILINE is set. When newline is specified as the two-character
+sequence CRLF, isolated CR and LF characters do not indicate newlines.
+</P>
+<P>
+For example, the pattern /^abc$/ matches the subject string "def\nabc" (where
+\n represents a newline) in multiline mode, but not otherwise. Consequently,
+patterns that are anchored in single line mode because all branches start with
+^ are not anchored in multiline mode, and a match for circumflex is possible
+when the <i>startoffset</i> argument of <b>pcre_exec()</b> is non-zero. The
+PCRE_DOLLAR_ENDONLY option is ignored if PCRE_MULTILINE is set.
+</P>
+<P>
+Note that the sequences \A, \Z, and \z can be used to match the start and
+end of the subject in both modes, and if all branches of a pattern start with
+\A it is always anchored, whether or not PCRE_MULTILINE is set.
+<a name="fullstopdot"></a></P>
+<br><a name="SEC7" href="#TOC1">FULL STOP (PERIOD, DOT) AND \N</a><br>
+<P>
+Outside a character class, a dot in the pattern matches any one character in
+the subject string except (by default) a character that signifies the end of a
+line.
+</P>
+<P>
+When a line ending is defined as a single character, dot never matches that
+character; when the two-character sequence CRLF is used, dot does not match CR
+if it is immediately followed by LF, but otherwise it matches all characters
+(including isolated CRs and LFs). When any Unicode line endings are being
+recognized, dot does not match CR or LF or any of the other line ending
+characters.
+</P>
+<P>
+The behaviour of dot with regard to newlines can be changed. If the PCRE_DOTALL
+option is set, a dot matches any one character, without exception. If the
+two-character sequence CRLF is present in the subject string, it takes two dots
+to match it.
+</P>
+<P>
+The handling of dot is entirely independent of the handling of circumflex and
+dollar, the only relationship being that they both involve newlines. Dot has no
+special meaning in a character class.
+</P>
+<P>
+The escape sequence \N behaves like a dot, except that it is not affected by
+the PCRE_DOTALL option. In other words, it matches any character except one
+that signifies the end of a line. Perl also uses \N to match characters by
+name; PCRE does not support this.
+</P>
+<br><a name="SEC8" href="#TOC1">MATCHING A SINGLE DATA UNIT</a><br>
+<P>
+Outside a character class, the escape sequence \C matches any one data unit,
+whether or not a UTF mode is set. In the 8-bit library, one data unit is one
+byte; in the 16-bit library it is a 16-bit unit; in the 32-bit library it is
+a 32-bit unit. Unlike a dot, \C always
+matches line-ending characters. The feature is provided in Perl in order to
+match individual bytes in UTF-8 mode, but it is unclear how it can usefully be
+used. Because \C breaks up characters into individual data units, matching one
+unit with \C in a UTF mode means that the rest of the string may start with a
+malformed UTF character. This has undefined results, because PCRE assumes that
+it is dealing with valid UTF strings (and by default it checks this at the
+start of processing unless the PCRE_NO_UTF8_CHECK, PCRE_NO_UTF16_CHECK or
+PCRE_NO_UTF32_CHECK option is used).
+</P>
+<P>
+PCRE does not allow \C to appear in lookbehind assertions
+<a href="#lookbehind">(described below)</a>
+in a UTF mode, because this would make it impossible to calculate the length of
+the lookbehind.
+</P>
+<P>
+In general, the \C escape sequence is best avoided. However, one
+way of using it that avoids the problem of malformed UTF characters is to use a
+lookahead to check the length of the next character, as in this pattern, which
+could be used with a UTF-8 string (ignore white space and line breaks):
+<pre>
+ (?| (?=[\x00-\x7f])(\C) |
+ (?=[\x80-\x{7ff}])(\C)(\C) |
+ (?=[\x{800}-\x{ffff}])(\C)(\C)(\C) |
+ (?=[\x{10000}-\x{1fffff}])(\C)(\C)(\C)(\C))
+</pre>
+A group that starts with (?| resets the capturing parentheses numbers in each
+alternative (see
+<a href="#dupsubpatternnumber">"Duplicate Subpattern Numbers"</a>
+below). The assertions at the start of each branch check the next UTF-8
+character for values whose encoding uses 1, 2, 3, or 4 bytes, respectively. The
+character's individual bytes are then captured by the appropriate number of
+groups.
+<a name="characterclass"></a></P>
+<br><a name="SEC9" href="#TOC1">SQUARE BRACKETS AND CHARACTER CLASSES</a><br>
+<P>
+An opening square bracket introduces a character class, terminated by a closing
+square bracket. A closing square bracket on its own is not special by default.
+However, if the PCRE_JAVASCRIPT_COMPAT option is set, a lone closing square
+bracket causes a compile-time error. If a closing square bracket is required as
+a member of the class, it should be the first data character in the class
+(after an initial circumflex, if present) or escaped with a backslash.
+</P>
+<P>
+A character class matches a single character in the subject. In a UTF mode, the
+character may be more than one data unit long. A matched character must be in
+the set of characters defined by the class, unless the first character in the
+class definition is a circumflex, in which case the subject character must not
+be in the set defined by the class. If a circumflex is actually required as a
+member of the class, ensure it is not the first character, or escape it with a
+backslash.
+</P>
+<P>
+For example, the character class [aeiou] matches any lower case vowel, while
+[^aeiou] matches any character that is not a lower case vowel. Note that a
+circumflex is just a convenient notation for specifying the characters that
+are in the class by enumerating those that are not. A class that starts with a
+circumflex is not an assertion; it still consumes a character from the subject
+string, and therefore it fails if the current pointer is at the end of the
+string.
+</P>
+<P>
+In UTF-8 (UTF-16, UTF-32) mode, characters with values greater than 255 (0xffff)
+can be included in a class as a literal string of data units, or by using the
+\x{ escaping mechanism.
+</P>
+<P>
+When caseless matching is set, any letters in a class represent both their
+upper case and lower case versions, so for example, a caseless [aeiou] matches
+"A" as well as "a", and a caseless [^aeiou] does not match "A", whereas a
+caseful version would. In a UTF mode, PCRE always understands the concept of
+case for characters whose values are less than 128, so caseless matching is
+always possible. For characters with higher values, the concept of case is
+supported if PCRE is compiled with Unicode property support, but not otherwise.
+If you want to use caseless matching in a UTF mode for characters 128 and
+above, you must ensure that PCRE is compiled with Unicode property support as
+well as with UTF support.
+</P>
+<P>
+Characters that might indicate line breaks are never treated in any special way
+when matching character classes, whatever line-ending sequence is in use, and
+whatever setting of the PCRE_DOTALL and PCRE_MULTILINE options is used. A class
+such as [^a] always matches one of these characters.
+</P>
+<P>
+The minus (hyphen) character can be used to specify a range of characters in a
+character class. For example, [d-m] matches any letter between d and m,
+inclusive. If a minus character is required in a class, it must be escaped with
+a backslash or appear in a position where it cannot be interpreted as
+indicating a range, typically as the first or last character in the class, or
+immediately after a range. For example, [b-d-z] matches letters in the range b
+to d, a hyphen character, or z.
+</P>
+<P>
+It is not possible to have the literal character "]" as the end character of a
+range. A pattern such as [W-]46] is interpreted as a class of two characters
+("W" and "-") followed by a literal string "46]", so it would match "W46]" or
+"-46]". However, if the "]" is escaped with a backslash it is interpreted as
+the end of range, so [W-\]46] is interpreted as a class containing a range
+followed by two other characters. The octal or hexadecimal representation of
+"]" can also be used to end a range.
+</P>
+<P>
+An error is generated if a POSIX character class (see below) or an escape
+sequence other than one that defines a single character appears at a point
+where a range ending character is expected. For example, [z-\xff] is valid,
+but [A-\d] and [A-[:digit:]] are not.
+</P>
+<P>
+Ranges operate in the collating sequence of character values. They can also be
+used for characters specified numerically, for example [\000-\037]. Ranges
+can include any characters that are valid for the current mode.
+</P>
+<P>
+If a range that includes letters is used when caseless matching is set, it
+matches the letters in either case. For example, [W-c] is equivalent to
+[][\\^_`wxyzabc], matched caselessly, and in a non-UTF mode, if character
+tables for a French locale are in use, [\xc8-\xcb] matches accented E
+characters in both cases. In UTF modes, PCRE supports the concept of case for
+characters with values greater than 128 only when it is compiled with Unicode
+property support.
+</P>
+<P>
+The character escape sequences \d, \D, \h, \H, \p, \P, \s, \S, \v,
+\V, \w, and \W may appear in a character class, and add the characters that
+they match to the class. For example, [\dABCDEF] matches any hexadecimal
+digit. In UTF modes, the PCRE_UCP option affects the meanings of \d, \s, \w
+and their upper case partners, just as it does when they appear outside a
+character class, as described in the section entitled
+<a href="#genericchartypes">"Generic character types"</a>
+above. The escape sequence \b has a different meaning inside a character
+class; it matches the backspace character. The sequences \B, \N, \R, and \X
+are not special inside a character class. Like any other unrecognized escape
+sequences, they are treated as the literal characters "B", "N", "R", and "X" by
+default, but cause an error if the PCRE_EXTRA option is set.
+</P>
+<P>
+A circumflex can conveniently be used with the upper case character types to
+specify a more restricted set of characters than the matching lower case type.
+For example, the class [^\W_] matches any letter or digit, but not underscore,
+whereas [\w] includes underscore. A positive character class should be read as
+"something OR something OR ..." and a negative class as "NOT something AND NOT
+something AND NOT ...".
+</P>
+<P>
+The only metacharacters that are recognized in character classes are backslash,
+hyphen (only where it can be interpreted as specifying a range), circumflex
+(only at the start), opening square bracket (only when it can be interpreted as
+introducing a POSIX class name, or for a special compatibility feature - see
+the next two sections), and the terminating closing square bracket. However,
+escaping other non-alphanumeric characters does no harm.
+</P>
+<br><a name="SEC10" href="#TOC1">POSIX CHARACTER CLASSES</a><br>
+<P>
+Perl supports the POSIX notation for character classes. This uses names
+enclosed by [: and :] within the enclosing square brackets. PCRE also supports
+this notation. For example,
+<pre>
+ [01[:alpha:]%]
+</pre>
+matches "0", "1", any alphabetic character, or "%". The supported class names
+are:
+<pre>
+ alnum letters and digits
+ alpha letters
+ ascii character codes 0 - 127
+ blank space or tab only
+ cntrl control characters
+ digit decimal digits (same as \d)
+ graph printing characters, excluding space
+ lower lower case letters
+ print printing characters, including space
+ punct printing characters, excluding letters and digits and space
+ space white space (the same as \s from PCRE 8.34)
+ upper upper case letters
+ word "word" characters (same as \w)
+ xdigit hexadecimal digits
+</pre>
+The default "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13),
+and space (32). If locale-specific matching is taking place, the list of space
+characters may be different; there may be fewer or more of them. "Space" used
+to be different to \s, which did not include VT, for Perl compatibility.
+However, Perl changed at release 5.18, and PCRE followed at release 8.34.
+"Space" and \s now match the same set of characters.
+</P>
+<P>
+The name "word" is a Perl extension, and "blank" is a GNU extension from Perl
+5.8. Another Perl extension is negation, which is indicated by a ^ character
+after the colon. For example,
+<pre>
+ [12[:^digit:]]
+</pre>
+matches "1", "2", or any non-digit. PCRE (and Perl) also recognize the POSIX
+syntax [.ch.] and [=ch=] where "ch" is a "collating element", but these are not
+supported, and an error is given if they are encountered.
+</P>
+<P>
+By default, characters with values greater than 128 do not match any of the
+POSIX character classes. However, if the PCRE_UCP option is passed to
+<b>pcre_compile()</b>, some of the classes are changed so that Unicode character
+properties are used. This is achieved by replacing certain POSIX classes by
+other sequences, as follows:
+<pre>
+ [:alnum:] becomes \p{Xan}
+ [:alpha:] becomes \p{L}
+ [:blank:] becomes \h
+ [:digit:] becomes \p{Nd}
+ [:lower:] becomes \p{Ll}
+ [:space:] becomes \p{Xps}
+ [:upper:] becomes \p{Lu}
+ [:word:] becomes \p{Xwd}
+</pre>
+Negated versions, such as [:^alpha:] use \P instead of \p. Three other POSIX
+classes are handled specially in UCP mode:
+</P>
+<P>
+[:graph:]
+This matches characters that have glyphs that mark the page when printed. In
+Unicode property terms, it matches all characters with the L, M, N, P, S, or Cf
+properties, except for:
+<pre>
+ U+061C Arabic Letter Mark
+ U+180E Mongolian Vowel Separator
+ U+2066 - U+2069 Various "isolate"s
+
+</PRE>
+</P>
+<P>
+[:print:]
+This matches the same characters as [:graph:] plus space characters that are
+not controls, that is, characters with the Zs property.
+</P>
+<P>
+[:punct:]
+This matches all characters that have the Unicode P (punctuation) property,
+plus those characters whose code points are less than 128 that have the S
+(Symbol) property.
+</P>
+<P>
+The other POSIX classes are unchanged, and match only characters with code
+points less than 128.
+</P>
+<br><a name="SEC11" href="#TOC1">COMPATIBILITY FEATURE FOR WORD BOUNDARIES</a><br>
+<P>
+In the POSIX.2 compliant library that was included in 4.4BSD Unix, the ugly
+syntax [[:&#60;:]] and [[:&#62;:]] is used for matching "start of word" and "end of
+word". PCRE treats these items as follows:
+<pre>
+ [[:&#60;:]] is converted to \b(?=\w)
+ [[:&#62;:]] is converted to \b(?&#60;=\w)
+</pre>
+Only these exact character sequences are recognized. A sequence such as
+[a[:&#60;:]b] provokes error for an unrecognized POSIX class name. This support is
+not compatible with Perl. It is provided to help migrations from other
+environments, and is best not used in any new patterns. Note that \b matches
+at the start and the end of a word (see
+<a href="#smallassertions">"Simple assertions"</a>
+above), and in a Perl-style pattern the preceding or following character
+normally shows which is wanted, without the need for the assertions that are
+used above in order to give exactly the POSIX behaviour.
+</P>
+<br><a name="SEC12" href="#TOC1">VERTICAL BAR</a><br>
+<P>
+Vertical bar characters are used to separate alternative patterns. For example,
+the pattern
+<pre>
+ gilbert|sullivan
+</pre>
+matches either "gilbert" or "sullivan". Any number of alternatives may appear,
+and an empty alternative is permitted (matching the empty string). The matching
+process tries each alternative in turn, from left to right, and the first one
+that succeeds is used. If the alternatives are within a subpattern
+<a href="#subpattern">(defined below),</a>
+"succeeds" means matching the rest of the main pattern as well as the
+alternative in the subpattern.
+</P>
+<br><a name="SEC13" href="#TOC1">INTERNAL OPTION SETTING</a><br>
+<P>
+The settings of the PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and
+PCRE_EXTENDED options (which are Perl-compatible) can be changed from within
+the pattern by a sequence of Perl option letters enclosed between "(?" and ")".
+The option letters are
+<pre>
+ i for PCRE_CASELESS
+ m for PCRE_MULTILINE
+ s for PCRE_DOTALL
+ x for PCRE_EXTENDED
+</pre>
+For example, (?im) sets caseless, multiline matching. It is also possible to
+unset these options by preceding the letter with a hyphen, and a combined
+setting and unsetting such as (?im-sx), which sets PCRE_CASELESS and
+PCRE_MULTILINE while unsetting PCRE_DOTALL and PCRE_EXTENDED, is also
+permitted. If a letter appears both before and after the hyphen, the option is
+unset.
+</P>
+<P>
+The PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and PCRE_EXTRA can be
+changed in the same way as the Perl-compatible options by using the characters
+J, U and X respectively.
+</P>
+<P>
+When one of these option changes occurs at top level (that is, not inside
+subpattern parentheses), the change applies to the remainder of the pattern
+that follows. If the change is placed right at the start of a pattern, PCRE
+extracts it into the global options (and it will therefore show up in data
+extracted by the <b>pcre_fullinfo()</b> function).
+</P>
+<P>
+An option change within a subpattern (see below for a description of
+subpatterns) affects only that part of the subpattern that follows it, so
+<pre>
+ (a(?i)b)c
+</pre>
+matches abc and aBc and no other strings (assuming PCRE_CASELESS is not used).
+By this means, options can be made to have different settings in different
+parts of the pattern. Any changes made in one alternative do carry on
+into subsequent branches within the same subpattern. For example,
+<pre>
+ (a(?i)b|c)
+</pre>
+matches "ab", "aB", "c", and "C", even though when matching "C" the first
+branch is abandoned before the option setting. This is because the effects of
+option settings happen at compile time. There would be some very weird
+behaviour otherwise.
+</P>
+<P>
+<b>Note:</b> There are other PCRE-specific options that can be set by the
+application when the compiling or matching functions are called. In some cases
+the pattern can contain special leading sequences such as (*CRLF) to override
+what the application has set or what has been defaulted. Details are given in
+the section entitled
+<a href="#newlineseq">"Newline sequences"</a>
+above. There are also the (*UTF8), (*UTF16),(*UTF32), and (*UCP) leading
+sequences that can be used to set UTF and Unicode property modes; they are
+equivalent to setting the PCRE_UTF8, PCRE_UTF16, PCRE_UTF32 and the PCRE_UCP
+options, respectively. The (*UTF) sequence is a generic version that can be
+used with any of the libraries. However, the application can set the
+PCRE_NEVER_UTF option, which locks out the use of the (*UTF) sequences.
+<a name="subpattern"></a></P>
+<br><a name="SEC14" href="#TOC1">SUBPATTERNS</a><br>
+<P>
+Subpatterns are delimited by parentheses (round brackets), which can be nested.
+Turning part of a pattern into a subpattern does two things:
+<br>
+<br>
+1. It localizes a set of alternatives. For example, the pattern
+<pre>
+ cat(aract|erpillar|)
+</pre>
+matches "cataract", "caterpillar", or "cat". Without the parentheses, it would
+match "cataract", "erpillar" or an empty string.
+<br>
+<br>
+2. It sets up the subpattern as a capturing subpattern. This means that, when
+the whole pattern matches, that portion of the subject string that matched the
+subpattern is passed back to the caller via the <i>ovector</i> argument of the
+matching function. (This applies only to the traditional matching functions;
+the DFA matching functions do not support capturing.)
+</P>
+<P>
+Opening parentheses are counted from left to right (starting from 1) to obtain
+numbers for the capturing subpatterns. For example, if the string "the red
+king" is matched against the pattern
+<pre>
+ the ((red|white) (king|queen))
+</pre>
+the captured substrings are "red king", "red", and "king", and are numbered 1,
+2, and 3, respectively.
+</P>
+<P>
+The fact that plain parentheses fulfil two functions is not always helpful.
+There are often times when a grouping subpattern is required without a
+capturing requirement. If an opening parenthesis is followed by a question mark
+and a colon, the subpattern does not do any capturing, and is not counted when
+computing the number of any subsequent capturing subpatterns. For example, if
+the string "the white queen" is matched against the pattern
+<pre>
+ the ((?:red|white) (king|queen))
+</pre>
+the captured substrings are "white queen" and "queen", and are numbered 1 and
+2. The maximum number of capturing subpatterns is 65535.
+</P>
+<P>
+As a convenient shorthand, if any option settings are required at the start of
+a non-capturing subpattern, the option letters may appear between the "?" and
+the ":". Thus the two patterns
+<pre>
+ (?i:saturday|sunday)
+ (?:(?i)saturday|sunday)
+</pre>
+match exactly the same set of strings. Because alternative branches are tried
+from left to right, and options are not reset until the end of the subpattern
+is reached, an option setting in one branch does affect subsequent branches, so
+the above patterns match "SUNDAY" as well as "Saturday".
+<a name="dupsubpatternnumber"></a></P>
+<br><a name="SEC15" href="#TOC1">DUPLICATE SUBPATTERN NUMBERS</a><br>
+<P>
+Perl 5.10 introduced a feature whereby each alternative in a subpattern uses
+the same numbers for its capturing parentheses. Such a subpattern starts with
+(?| and is itself a non-capturing subpattern. For example, consider this
+pattern:
+<pre>
+ (?|(Sat)ur|(Sun))day
+</pre>
+Because the two alternatives are inside a (?| group, both sets of capturing
+parentheses are numbered one. Thus, when the pattern matches, you can look
+at captured substring number one, whichever alternative matched. This construct
+is useful when you want to capture part, but not all, of one of a number of
+alternatives. Inside a (?| group, parentheses are numbered as usual, but the
+number is reset at the start of each branch. The numbers of any capturing
+parentheses that follow the subpattern start after the highest number used in
+any branch. The following example is taken from the Perl documentation. The
+numbers underneath show in which buffer the captured content will be stored.
+<pre>
+ # before ---------------branch-reset----------- after
+ / ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
+ # 1 2 2 3 2 3 4
+</pre>
+A back reference to a numbered subpattern uses the most recent value that is
+set for that number by any subpattern. The following pattern matches "abcabc"
+or "defdef":
+<pre>
+ /(?|(abc)|(def))\1/
+</pre>
+In contrast, a subroutine call to a numbered subpattern always refers to the
+first one in the pattern with the given number. The following pattern matches
+"abcabc" or "defabc":
+<pre>
+ /(?|(abc)|(def))(?1)/
+</pre>
+If a
+<a href="#conditions">condition test</a>
+for a subpattern's having matched refers to a non-unique number, the test is
+true if any of the subpatterns of that number have matched.
+</P>
+<P>
+An alternative approach to using this "branch reset" feature is to use
+duplicate named subpatterns, as described in the next section.
+</P>
+<br><a name="SEC16" href="#TOC1">NAMED SUBPATTERNS</a><br>
+<P>
+Identifying capturing parentheses by number is simple, but it can be very hard
+to keep track of the numbers in complicated regular expressions. Furthermore,
+if an expression is modified, the numbers may change. To help with this
+difficulty, PCRE supports the naming of subpatterns. This feature was not
+added to Perl until release 5.10. Python had the feature earlier, and PCRE
+introduced it at release 4.0, using the Python syntax. PCRE now supports both
+the Perl and the Python syntax. Perl allows identically numbered subpatterns to
+have different names, but PCRE does not.
+</P>
+<P>
+In PCRE, a subpattern can be named in one of three ways: (?&#60;name&#62;...) or
+(?'name'...) as in Perl, or (?P&#60;name&#62;...) as in Python. References to capturing
+parentheses from other parts of the pattern, such as
+<a href="#backreferences">back references,</a>
+<a href="#recursion">recursion,</a>
+and
+<a href="#conditions">conditions,</a>
+can be made by name as well as by number.
+</P>
+<P>
+Names consist of up to 32 alphanumeric characters and underscores, but must
+start with a non-digit. Named capturing parentheses are still allocated numbers
+as well as names, exactly as if the names were not present. The PCRE API
+provides function calls for extracting the name-to-number translation table
+from a compiled pattern. There is also a convenience function for extracting a
+captured substring by name.
+</P>
+<P>
+By default, a name must be unique within a pattern, but it is possible to relax
+this constraint by setting the PCRE_DUPNAMES option at compile time. (Duplicate
+names are also always permitted for subpatterns with the same number, set up as
+described in the previous section.) Duplicate names can be useful for patterns
+where only one instance of the named parentheses can match. Suppose you want to
+match the name of a weekday, either as a 3-letter abbreviation or as the full
+name, and in both cases you want to extract the abbreviation. This pattern
+(ignoring the line breaks) does the job:
+<pre>
+ (?&#60;DN&#62;Mon|Fri|Sun)(?:day)?|
+ (?&#60;DN&#62;Tue)(?:sday)?|
+ (?&#60;DN&#62;Wed)(?:nesday)?|
+ (?&#60;DN&#62;Thu)(?:rsday)?|
+ (?&#60;DN&#62;Sat)(?:urday)?
+</pre>
+There are five capturing substrings, but only one is ever set after a match.
+(An alternative way of solving this problem is to use a "branch reset"
+subpattern, as described in the previous section.)
+</P>
+<P>
+The convenience function for extracting the data by name returns the substring
+for the first (and in this example, the only) subpattern of that name that
+matched. This saves searching to find which numbered subpattern it was.
+</P>
+<P>
+If you make a back reference to a non-unique named subpattern from elsewhere in
+the pattern, the subpatterns to which the name refers are checked in the order
+in which they appear in the overall pattern. The first one that is set is used
+for the reference. For example, this pattern matches both "foofoo" and
+"barbar" but not "foobar" or "barfoo":
+<pre>
+ (?:(?&#60;n&#62;foo)|(?&#60;n&#62;bar))\k&#60;n&#62;
+
+</PRE>
+</P>
+<P>
+If you make a subroutine call to a non-unique named subpattern, the one that
+corresponds to the first occurrence of the name is used. In the absence of
+duplicate numbers (see the previous section) this is the one with the lowest
+number.
+</P>
+<P>
+If you use a named reference in a condition
+test (see the
+<a href="#conditions">section about conditions</a>
+below), either to check whether a subpattern has matched, or to check for
+recursion, all subpatterns with the same name are tested. If the condition is
+true for any one of them, the overall condition is true. This is the same
+behaviour as testing by number. For further details of the interfaces for
+handling named subpatterns, see the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+documentation.
+</P>
+<P>
+<b>Warning:</b> You cannot use different names to distinguish between two
+subpatterns with the same number because PCRE uses only the numbers when
+matching. For this reason, an error is given at compile time if different names
+are given to subpatterns with the same number. However, you can always give the
+same name to subpatterns with the same number, even when PCRE_DUPNAMES is not
+set.
+</P>
+<br><a name="SEC17" href="#TOC1">REPETITION</a><br>
+<P>
+Repetition is specified by quantifiers, which can follow any of the following
+items:
+<pre>
+ a literal data character
+ the dot metacharacter
+ the \C escape sequence
+ the \X escape sequence
+ the \R escape sequence
+ an escape such as \d or \pL that matches a single character
+ a character class
+ a back reference (see next section)
+ a parenthesized subpattern (including assertions)
+ a subroutine call to a subpattern (recursive or otherwise)
+</pre>
+The general repetition quantifier specifies a minimum and maximum number of
+permitted matches, by giving the two numbers in curly brackets (braces),
+separated by a comma. The numbers must be less than 65536, and the first must
+be less than or equal to the second. For example:
+<pre>
+ z{2,4}
+</pre>
+matches "zz", "zzz", or "zzzz". A closing brace on its own is not a special
+character. If the second number is omitted, but the comma is present, there is
+no upper limit; if the second number and the comma are both omitted, the
+quantifier specifies an exact number of required matches. Thus
+<pre>
+ [aeiou]{3,}
+</pre>
+matches at least 3 successive vowels, but may match many more, while
+<pre>
+ \d{8}
+</pre>
+matches exactly 8 digits. An opening curly bracket that appears in a position
+where a quantifier is not allowed, or one that does not match the syntax of a
+quantifier, is taken as a literal character. For example, {,6} is not a
+quantifier, but a literal string of four characters.
+</P>
+<P>
+In UTF modes, quantifiers apply to characters rather than to individual data
+units. Thus, for example, \x{100}{2} matches two characters, each of
+which is represented by a two-byte sequence in a UTF-8 string. Similarly,
+\X{3} matches three Unicode extended grapheme clusters, each of which may be
+several data units long (and they may be of different lengths).
+</P>
+<P>
+The quantifier {0} is permitted, causing the expression to behave as if the
+previous item and the quantifier were not present. This may be useful for
+subpatterns that are referenced as
+<a href="#subpatternsassubroutines">subroutines</a>
+from elsewhere in the pattern (but see also the section entitled
+<a href="#subdefine">"Defining subpatterns for use by reference only"</a>
+below). Items other than subpatterns that have a {0} quantifier are omitted
+from the compiled pattern.
+</P>
+<P>
+For convenience, the three most common quantifiers have single-character
+abbreviations:
+<pre>
+ * is equivalent to {0,}
+ + is equivalent to {1,}
+ ? is equivalent to {0,1}
+</pre>
+It is possible to construct infinite loops by following a subpattern that can
+match no characters with a quantifier that has no upper limit, for example:
+<pre>
+ (a?)*
+</pre>
+Earlier versions of Perl and PCRE used to give an error at compile time for
+such patterns. However, because there are cases where this can be useful, such
+patterns are now accepted, but if any repetition of the subpattern does in fact
+match no characters, the loop is forcibly broken.
+</P>
+<P>
+By default, the quantifiers are "greedy", that is, they match as much as
+possible (up to the maximum number of permitted times), without causing the
+rest of the pattern to fail. The classic example of where this gives problems
+is in trying to match comments in C programs. These appear between /* and */
+and within the comment, individual * and / characters may appear. An attempt to
+match C comments by applying the pattern
+<pre>
+ /\*.*\*/
+</pre>
+to the string
+<pre>
+ /* first comment */ not comment /* second comment */
+</pre>
+fails, because it matches the entire string owing to the greediness of the .*
+item.
+</P>
+<P>
+However, if a quantifier is followed by a question mark, it ceases to be
+greedy, and instead matches the minimum number of times possible, so the
+pattern
+<pre>
+ /\*.*?\*/
+</pre>
+does the right thing with the C comments. The meaning of the various
+quantifiers is not otherwise changed, just the preferred number of matches.
+Do not confuse this use of question mark with its use as a quantifier in its
+own right. Because it has two uses, it can sometimes appear doubled, as in
+<pre>
+ \d??\d
+</pre>
+which matches one digit by preference, but can match two if that is the only
+way the rest of the pattern matches.
+</P>
+<P>
+If the PCRE_UNGREEDY option is set (an option that is not available in Perl),
+the quantifiers are not greedy by default, but individual ones can be made
+greedy by following them with a question mark. In other words, it inverts the
+default behaviour.
+</P>
+<P>
+When a parenthesized subpattern is quantified with a minimum repeat count that
+is greater than 1 or with a limited maximum, more memory is required for the
+compiled pattern, in proportion to the size of the minimum or maximum.
+</P>
+<P>
+If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equivalent
+to Perl's /s) is set, thus allowing the dot to match newlines, the pattern is
+implicitly anchored, because whatever follows will be tried against every
+character position in the subject string, so there is no point in retrying the
+overall match at any position after the first. PCRE normally treats such a
+pattern as though it were preceded by \A.
+</P>
+<P>
+In cases where it is known that the subject string contains no newlines, it is
+worth setting PCRE_DOTALL in order to obtain this optimization, or
+alternatively using ^ to indicate anchoring explicitly.
+</P>
+<P>
+However, there are some cases where the optimization cannot be used. When .*
+is inside capturing parentheses that are the subject of a back reference
+elsewhere in the pattern, a match at the start may fail where a later one
+succeeds. Consider, for example:
+<pre>
+ (.*)abc\1
+</pre>
+If the subject is "xyz123abc123" the match point is the fourth character. For
+this reason, such a pattern is not implicitly anchored.
+</P>
+<P>
+Another case where implicit anchoring is not applied is when the leading .* is
+inside an atomic group. Once again, a match at the start may fail where a later
+one succeeds. Consider this pattern:
+<pre>
+ (?&#62;.*?a)b
+</pre>
+It matches "ab" in the subject "aab". The use of the backtracking control verbs
+(*PRUNE) and (*SKIP) also disable this optimization.
+</P>
+<P>
+When a capturing subpattern is repeated, the value captured is the substring
+that matched the final iteration. For example, after
+<pre>
+ (tweedle[dume]{3}\s*)+
+</pre>
+has matched "tweedledum tweedledee" the value of the captured substring is
+"tweedledee". However, if there are nested capturing subpatterns, the
+corresponding captured values may have been set in previous iterations. For
+example, after
+<pre>
+ /(a|(b))+/
+</pre>
+matches "aba" the value of the second captured substring is "b".
+<a name="atomicgroup"></a></P>
+<br><a name="SEC18" href="#TOC1">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a><br>
+<P>
+With both maximizing ("greedy") and minimizing ("ungreedy" or "lazy")
+repetition, failure of what follows normally causes the repeated item to be
+re-evaluated to see if a different number of repeats allows the rest of the
+pattern to match. Sometimes it is useful to prevent this, either to change the
+nature of the match, or to cause it fail earlier than it otherwise might, when
+the author of the pattern knows there is no point in carrying on.
+</P>
+<P>
+Consider, for example, the pattern \d+foo when applied to the subject line
+<pre>
+ 123456bar
+</pre>
+After matching all 6 digits and then failing to match "foo", the normal
+action of the matcher is to try again with only 5 digits matching the \d+
+item, and then with 4, and so on, before ultimately failing. "Atomic grouping"
+(a term taken from Jeffrey Friedl's book) provides the means for specifying
+that once a subpattern has matched, it is not to be re-evaluated in this way.
+</P>
+<P>
+If we use atomic grouping for the previous example, the matcher gives up
+immediately on failing to match "foo" the first time. The notation is a kind of
+special parenthesis, starting with (?&#62; as in this example:
+<pre>
+ (?&#62;\d+)foo
+</pre>
+This kind of parenthesis "locks up" the part of the pattern it contains once
+it has matched, and a failure further into the pattern is prevented from
+backtracking into it. Backtracking past it to previous items, however, works as
+normal.
+</P>
+<P>
+An alternative description is that a subpattern of this type matches the string
+of characters that an identical standalone pattern would match, if anchored at
+the current point in the subject string.
+</P>
+<P>
+Atomic grouping subpatterns are not capturing subpatterns. Simple cases such as
+the above example can be thought of as a maximizing repeat that must swallow
+everything it can. So, while both \d+ and \d+? are prepared to adjust the
+number of digits they match in order to make the rest of the pattern match,
+(?&#62;\d+) can only match an entire sequence of digits.
+</P>
+<P>
+Atomic groups in general can of course contain arbitrarily complicated
+subpatterns, and can be nested. However, when the subpattern for an atomic
+group is just a single repeated item, as in the example above, a simpler
+notation, called a "possessive quantifier" can be used. This consists of an
+additional + character following a quantifier. Using this notation, the
+previous example can be rewritten as
+<pre>
+ \d++foo
+</pre>
+Note that a possessive quantifier can be used with an entire group, for
+example:
+<pre>
+ (abc|xyz){2,3}+
+</pre>
+Possessive quantifiers are always greedy; the setting of the PCRE_UNGREEDY
+option is ignored. They are a convenient notation for the simpler forms of
+atomic group. However, there is no difference in the meaning of a possessive
+quantifier and the equivalent atomic group, though there may be a performance
+difference; possessive quantifiers should be slightly faster.
+</P>
+<P>
+The possessive quantifier syntax is an extension to the Perl 5.8 syntax.
+Jeffrey Friedl originated the idea (and the name) in the first edition of his
+book. Mike McCloskey liked it, so implemented it when he built Sun's Java
+package, and PCRE copied it from there. It ultimately found its way into Perl
+at release 5.10.
+</P>
+<P>
+PCRE has an optimization that automatically "possessifies" certain simple
+pattern constructs. For example, the sequence A+B is treated as A++B because
+there is no point in backtracking into a sequence of A's when B must follow.
+</P>
+<P>
+When a pattern contains an unlimited repeat inside a subpattern that can itself
+be repeated an unlimited number of times, the use of an atomic group is the
+only way to avoid some failing matches taking a very long time indeed. The
+pattern
+<pre>
+ (\D+|&#60;\d+&#62;)*[!?]
+</pre>
+matches an unlimited number of substrings that either consist of non-digits, or
+digits enclosed in &#60;&#62;, followed by either ! or ?. When it matches, it runs
+quickly. However, if it is applied to
+<pre>
+ aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+</pre>
+it takes a long time before reporting failure. This is because the string can
+be divided between the internal \D+ repeat and the external * repeat in a
+large number of ways, and all have to be tried. (The example uses [!?] rather
+than a single character at the end, because both PCRE and Perl have an
+optimization that allows for fast failure when a single character is used. They
+remember the last single character that is required for a match, and fail early
+if it is not present in the string.) If the pattern is changed so that it uses
+an atomic group, like this:
+<pre>
+ ((?&#62;\D+)|&#60;\d+&#62;)*[!?]
+</pre>
+sequences of non-digits cannot be broken, and failure happens quickly.
+<a name="backreferences"></a></P>
+<br><a name="SEC19" href="#TOC1">BACK REFERENCES</a><br>
+<P>
+Outside a character class, a backslash followed by a digit greater than 0 (and
+possibly further digits) is a back reference to a capturing subpattern earlier
+(that is, to its left) in the pattern, provided there have been that many
+previous capturing left parentheses.
+</P>
+<P>
+However, if the decimal number following the backslash is less than 10, it is
+always taken as a back reference, and causes an error only if there are not
+that many capturing left parentheses in the entire pattern. In other words, the
+parentheses that are referenced need not be to the left of the reference for
+numbers less than 10. A "forward back reference" of this type can make sense
+when a repetition is involved and the subpattern to the right has participated
+in an earlier iteration.
+</P>
+<P>
+It is not possible to have a numerical "forward back reference" to a subpattern
+whose number is 10 or more using this syntax because a sequence such as \50 is
+interpreted as a character defined in octal. See the subsection entitled
+"Non-printing characters"
+<a href="#digitsafterbackslash">above</a>
+for further details of the handling of digits following a backslash. There is
+no such problem when named parentheses are used. A back reference to any
+subpattern is possible using named parentheses (see below).
+</P>
+<P>
+Another way of avoiding the ambiguity inherent in the use of digits following a
+backslash is to use the \g escape sequence. This escape must be followed by an
+unsigned number or a negative number, optionally enclosed in braces. These
+examples are all identical:
+<pre>
+ (ring), \1
+ (ring), \g1
+ (ring), \g{1}
+</pre>
+An unsigned number specifies an absolute reference without the ambiguity that
+is present in the older syntax. It is also useful when literal digits follow
+the reference. A negative number is a relative reference. Consider this
+example:
+<pre>
+ (abc(def)ghi)\g{-1}
+</pre>
+The sequence \g{-1} is a reference to the most recently started capturing
+subpattern before \g, that is, is it equivalent to \2 in this example.
+Similarly, \g{-2} would be equivalent to \1. The use of relative references
+can be helpful in long patterns, and also in patterns that are created by
+joining together fragments that contain references within themselves.
+</P>
+<P>
+A back reference matches whatever actually matched the capturing subpattern in
+the current subject string, rather than anything matching the subpattern
+itself (see
+<a href="#subpatternsassubroutines">"Subpatterns as subroutines"</a>
+below for a way of doing that). So the pattern
+<pre>
+ (sens|respons)e and \1ibility
+</pre>
+matches "sense and sensibility" and "response and responsibility", but not
+"sense and responsibility". If caseful matching is in force at the time of the
+back reference, the case of letters is relevant. For example,
+<pre>
+ ((?i)rah)\s+\1
+</pre>
+matches "rah rah" and "RAH RAH", but not "RAH rah", even though the original
+capturing subpattern is matched caselessly.
+</P>
+<P>
+There are several different ways of writing back references to named
+subpatterns. The .NET syntax \k{name} and the Perl syntax \k&#60;name&#62; or
+\k'name' are supported, as is the Python syntax (?P=name). Perl 5.10's unified
+back reference syntax, in which \g can be used for both numeric and named
+references, is also supported. We could rewrite the above example in any of
+the following ways:
+<pre>
+ (?&#60;p1&#62;(?i)rah)\s+\k&#60;p1&#62;
+ (?'p1'(?i)rah)\s+\k{p1}
+ (?P&#60;p1&#62;(?i)rah)\s+(?P=p1)
+ (?&#60;p1&#62;(?i)rah)\s+\g{p1}
+</pre>
+A subpattern that is referenced by name may appear in the pattern before or
+after the reference.
+</P>
+<P>
+There may be more than one back reference to the same subpattern. If a
+subpattern has not actually been used in a particular match, any back
+references to it always fail by default. For example, the pattern
+<pre>
+ (a|(bc))\2
+</pre>
+always fails if it starts to match "a" rather than "bc". However, if the
+PCRE_JAVASCRIPT_COMPAT option is set at compile time, a back reference to an
+unset value matches an empty string.
+</P>
+<P>
+Because there may be many capturing parentheses in a pattern, all digits
+following a backslash are taken as part of a potential back reference number.
+If the pattern continues with a digit character, some delimiter must be used to
+terminate the back reference. If the PCRE_EXTENDED option is set, this can be
+white space. Otherwise, the \g{ syntax or an empty comment (see
+<a href="#comments">"Comments"</a>
+below) can be used.
+</P>
+<br><b>
+Recursive back references
+</b><br>
+<P>
+A back reference that occurs inside the parentheses to which it refers fails
+when the subpattern is first used, so, for example, (a\1) never matches.
+However, such references can be useful inside repeated subpatterns. For
+example, the pattern
+<pre>
+ (a|b\1)+
+</pre>
+matches any number of "a"s and also "aba", "ababbaa" etc. At each iteration of
+the subpattern, the back reference matches the character string corresponding
+to the previous iteration. In order for this to work, the pattern must be such
+that the first iteration does not need to match the back reference. This can be
+done using alternation, as in the example above, or by a quantifier with a
+minimum of zero.
+</P>
+<P>
+Back references of this type cause the group that they reference to be treated
+as an
+<a href="#atomicgroup">atomic group.</a>
+Once the whole group has been matched, a subsequent matching failure cannot
+cause backtracking into the middle of the group.
+<a name="bigassertions"></a></P>
+<br><a name="SEC20" href="#TOC1">ASSERTIONS</a><br>
+<P>
+An assertion is a test on the characters following or preceding the current
+matching point that does not actually consume any characters. The simple
+assertions coded as \b, \B, \A, \G, \Z, \z, ^ and $ are described
+<a href="#smallassertions">above.</a>
+</P>
+<P>
+More complicated assertions are coded as subpatterns. There are two kinds:
+those that look ahead of the current position in the subject string, and those
+that look behind it. An assertion subpattern is matched in the normal way,
+except that it does not cause the current matching position to be changed.
+</P>
+<P>
+Assertion subpatterns are not capturing subpatterns. If such an assertion
+contains capturing subpatterns within it, these are counted for the purposes of
+numbering the capturing subpatterns in the whole pattern. However, substring
+capturing is carried out only for positive assertions. (Perl sometimes, but not
+always, does do capturing in negative assertions.)
+</P>
+<P>
+For compatibility with Perl, assertion subpatterns may be repeated; though
+it makes no sense to assert the same thing several times, the side effect of
+capturing parentheses may occasionally be useful. In practice, there only three
+cases:
+<br>
+<br>
+(1) If the quantifier is {0}, the assertion is never obeyed during matching.
+However, it may contain internal capturing parenthesized groups that are called
+from elsewhere via the
+<a href="#subpatternsassubroutines">subroutine mechanism.</a>
+<br>
+<br>
+(2) If quantifier is {0,n} where n is greater than zero, it is treated as if it
+were {0,1}. At run time, the rest of the pattern match is tried with and
+without the assertion, the order depending on the greediness of the quantifier.
+<br>
+<br>
+(3) If the minimum repetition is greater than zero, the quantifier is ignored.
+The assertion is obeyed just once when encountered during matching.
+</P>
+<br><b>
+Lookahead assertions
+</b><br>
+<P>
+Lookahead assertions start with (?= for positive assertions and (?! for
+negative assertions. For example,
+<pre>
+ \w+(?=;)
+</pre>
+matches a word followed by a semicolon, but does not include the semicolon in
+the match, and
+<pre>
+ foo(?!bar)
+</pre>
+matches any occurrence of "foo" that is not followed by "bar". Note that the
+apparently similar pattern
+<pre>
+ (?!foo)bar
+</pre>
+does not find an occurrence of "bar" that is preceded by something other than
+"foo"; it finds any occurrence of "bar" whatsoever, because the assertion
+(?!foo) is always true when the next three characters are "bar". A
+lookbehind assertion is needed to achieve the other effect.
+</P>
+<P>
+If you want to force a matching failure at some point in a pattern, the most
+convenient way to do it is with (?!) because an empty string always matches, so
+an assertion that requires there not to be an empty string must always fail.
+The backtracking control verb (*FAIL) or (*F) is a synonym for (?!).
+<a name="lookbehind"></a></P>
+<br><b>
+Lookbehind assertions
+</b><br>
+<P>
+Lookbehind assertions start with (?&#60;= for positive assertions and (?&#60;! for
+negative assertions. For example,
+<pre>
+ (?&#60;!foo)bar
+</pre>
+does find an occurrence of "bar" that is not preceded by "foo". The contents of
+a lookbehind assertion are restricted such that all the strings it matches must
+have a fixed length. However, if there are several top-level alternatives, they
+do not all have to have the same fixed length. Thus
+<pre>
+ (?&#60;=bullock|donkey)
+</pre>
+is permitted, but
+<pre>
+ (?&#60;!dogs?|cats?)
+</pre>
+causes an error at compile time. Branches that match different length strings
+are permitted only at the top level of a lookbehind assertion. This is an
+extension compared with Perl, which requires all branches to match the same
+length of string. An assertion such as
+<pre>
+ (?&#60;=ab(c|de))
+</pre>
+is not permitted, because its single top-level branch can match two different
+lengths, but it is acceptable to PCRE if rewritten to use two top-level
+branches:
+<pre>
+ (?&#60;=abc|abde)
+</pre>
+In some cases, the escape sequence \K
+<a href="#resetmatchstart">(see above)</a>
+can be used instead of a lookbehind assertion to get round the fixed-length
+restriction.
+</P>
+<P>
+The implementation of lookbehind assertions is, for each alternative, to
+temporarily move the current position back by the fixed length and then try to
+match. If there are insufficient characters before the current position, the
+assertion fails.
+</P>
+<P>
+In a UTF mode, PCRE does not allow the \C escape (which matches a single data
+unit even in a UTF mode) to appear in lookbehind assertions, because it makes
+it impossible to calculate the length of the lookbehind. The \X and \R
+escapes, which can match different numbers of data units, are also not
+permitted.
+</P>
+<P>
+<a href="#subpatternsassubroutines">"Subroutine"</a>
+calls (see below) such as (?2) or (?&X) are permitted in lookbehinds, as long
+as the subpattern matches a fixed-length string.
+<a href="#recursion">Recursion,</a>
+however, is not supported.
+</P>
+<P>
+Possessive quantifiers can be used in conjunction with lookbehind assertions to
+specify efficient matching of fixed-length strings at the end of subject
+strings. Consider a simple pattern such as
+<pre>
+ abcd$
+</pre>
+when applied to a long string that does not match. Because matching proceeds
+from left to right, PCRE will look for each "a" in the subject and then see if
+what follows matches the rest of the pattern. If the pattern is specified as
+<pre>
+ ^.*abcd$
+</pre>
+the initial .* matches the entire string at first, but when this fails (because
+there is no following "a"), it backtracks to match all but the last character,
+then all but the last two characters, and so on. Once again the search for "a"
+covers the entire string, from right to left, so we are no better off. However,
+if the pattern is written as
+<pre>
+ ^.*+(?&#60;=abcd)
+</pre>
+there can be no backtracking for the .*+ item; it can match only the entire
+string. The subsequent lookbehind assertion does a single test on the last four
+characters. If it fails, the match fails immediately. For long strings, this
+approach makes a significant difference to the processing time.
+</P>
+<br><b>
+Using multiple assertions
+</b><br>
+<P>
+Several assertions (of any sort) may occur in succession. For example,
+<pre>
+ (?&#60;=\d{3})(?&#60;!999)foo
+</pre>
+matches "foo" preceded by three digits that are not "999". Notice that each of
+the assertions is applied independently at the same point in the subject
+string. First there is a check that the previous three characters are all
+digits, and then there is a check that the same three characters are not "999".
+This pattern does <i>not</i> match "foo" preceded by six characters, the first
+of which are digits and the last three of which are not "999". For example, it
+doesn't match "123abcfoo". A pattern to do that is
+<pre>
+ (?&#60;=\d{3}...)(?&#60;!999)foo
+</pre>
+This time the first assertion looks at the preceding six characters, checking
+that the first three are digits, and then the second assertion checks that the
+preceding three characters are not "999".
+</P>
+<P>
+Assertions can be nested in any combination. For example,
+<pre>
+ (?&#60;=(?&#60;!foo)bar)baz
+</pre>
+matches an occurrence of "baz" that is preceded by "bar" which in turn is not
+preceded by "foo", while
+<pre>
+ (?&#60;=\d{3}(?!999)...)foo
+</pre>
+is another pattern that matches "foo" preceded by three digits and any three
+characters that are not "999".
+<a name="conditions"></a></P>
+<br><a name="SEC21" href="#TOC1">CONDITIONAL SUBPATTERNS</a><br>
+<P>
+It is possible to cause the matching process to obey a subpattern
+conditionally or to choose between two alternative subpatterns, depending on
+the result of an assertion, or whether a specific capturing subpattern has
+already been matched. The two possible forms of conditional subpattern are:
+<pre>
+ (?(condition)yes-pattern)
+ (?(condition)yes-pattern|no-pattern)
+</pre>
+If the condition is satisfied, the yes-pattern is used; otherwise the
+no-pattern (if present) is used. If there are more than two alternatives in the
+subpattern, a compile-time error occurs. Each of the two alternatives may
+itself contain nested subpatterns of any form, including conditional
+subpatterns; the restriction to two alternatives applies only at the level of
+the condition. This pattern fragment is an example where the alternatives are
+complex:
+<pre>
+ (?(1) (A|B|C) | (D | (?(2)E|F) | E) )
+
+</PRE>
+</P>
+<P>
+There are four kinds of condition: references to subpatterns, references to
+recursion, a pseudo-condition called DEFINE, and assertions.
+</P>
+<br><b>
+Checking for a used subpattern by number
+</b><br>
+<P>
+If the text between the parentheses consists of a sequence of digits, the
+condition is true if a capturing subpattern of that number has previously
+matched. If there is more than one capturing subpattern with the same number
+(see the earlier
+<a href="#recursion">section about duplicate subpattern numbers),</a>
+the condition is true if any of them have matched. An alternative notation is
+to precede the digits with a plus or minus sign. In this case, the subpattern
+number is relative rather than absolute. The most recently opened parentheses
+can be referenced by (?(-1), the next most recent by (?(-2), and so on. Inside
+loops it can also make sense to refer to subsequent groups. The next
+parentheses to be opened can be referenced as (?(+1), and so on. (The value
+zero in any of these forms is not used; it provokes a compile-time error.)
+</P>
+<P>
+Consider the following pattern, which contains non-significant white space to
+make it more readable (assume the PCRE_EXTENDED option) and to divide it into
+three parts for ease of discussion:
+<pre>
+ ( \( )? [^()]+ (?(1) \) )
+</pre>
+The first part matches an optional opening parenthesis, and if that
+character is present, sets it as the first captured substring. The second part
+matches one or more characters that are not parentheses. The third part is a
+conditional subpattern that tests whether or not the first set of parentheses
+matched. If they did, that is, if subject started with an opening parenthesis,
+the condition is true, and so the yes-pattern is executed and a closing
+parenthesis is required. Otherwise, since no-pattern is not present, the
+subpattern matches nothing. In other words, this pattern matches a sequence of
+non-parentheses, optionally enclosed in parentheses.
+</P>
+<P>
+If you were embedding this pattern in a larger one, you could use a relative
+reference:
+<pre>
+ ...other stuff... ( \( )? [^()]+ (?(-1) \) ) ...
+</pre>
+This makes the fragment independent of the parentheses in the larger pattern.
+</P>
+<br><b>
+Checking for a used subpattern by name
+</b><br>
+<P>
+Perl uses the syntax (?(&#60;name&#62;)...) or (?('name')...) to test for a used
+subpattern by name. For compatibility with earlier versions of PCRE, which had
+this facility before Perl, the syntax (?(name)...) is also recognized.
+</P>
+<P>
+Rewriting the above example to use a named subpattern gives this:
+<pre>
+ (?&#60;OPEN&#62; \( )? [^()]+ (?(&#60;OPEN&#62;) \) )
+</pre>
+If the name used in a condition of this kind is a duplicate, the test is
+applied to all subpatterns of the same name, and is true if any one of them has
+matched.
+</P>
+<br><b>
+Checking for pattern recursion
+</b><br>
+<P>
+If the condition is the string (R), and there is no subpattern with the name R,
+the condition is true if a recursive call to the whole pattern or any
+subpattern has been made. If digits or a name preceded by ampersand follow the
+letter R, for example:
+<pre>
+ (?(R3)...) or (?(R&name)...)
+</pre>
+the condition is true if the most recent recursion is into a subpattern whose
+number or name is given. This condition does not check the entire recursion
+stack. If the name used in a condition of this kind is a duplicate, the test is
+applied to all subpatterns of the same name, and is true if any one of them is
+the most recent recursion.
+</P>
+<P>
+At "top level", all these recursion test conditions are false.
+<a href="#recursion">The syntax for recursive patterns</a>
+is described below.
+<a name="subdefine"></a></P>
+<br><b>
+Defining subpatterns for use by reference only
+</b><br>
+<P>
+If the condition is the string (DEFINE), and there is no subpattern with the
+name DEFINE, the condition is always false. In this case, there may be only one
+alternative in the subpattern. It is always skipped if control reaches this
+point in the pattern; the idea of DEFINE is that it can be used to define
+subroutines that can be referenced from elsewhere. (The use of
+<a href="#subpatternsassubroutines">subroutines</a>
+is described below.) For example, a pattern to match an IPv4 address such as
+"192.168.23.245" could be written like this (ignore white space and line
+breaks):
+<pre>
+ (?(DEFINE) (?&#60;byte&#62; 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
+ \b (?&byte) (\.(?&byte)){3} \b
+</pre>
+The first part of the pattern is a DEFINE group inside which a another group
+named "byte" is defined. This matches an individual component of an IPv4
+address (a number less than 256). When matching takes place, this part of the
+pattern is skipped because DEFINE acts like a false condition. The rest of the
+pattern uses references to the named group to match the four dot-separated
+components of an IPv4 address, insisting on a word boundary at each end.
+</P>
+<br><b>
+Assertion conditions
+</b><br>
+<P>
+If the condition is not in any of the above formats, it must be an assertion.
+This may be a positive or negative lookahead or lookbehind assertion. Consider
+this pattern, again containing non-significant white space, and with the two
+alternatives on the second line:
+<pre>
+ (?(?=[^a-z]*[a-z])
+ \d{2}-[a-z]{3}-\d{2} | \d{2}-\d{2}-\d{2} )
+</pre>
+The condition is a positive lookahead assertion that matches an optional
+sequence of non-letters followed by a letter. In other words, it tests for the
+presence of at least one letter in the subject. If a letter is found, the
+subject is matched against the first alternative; otherwise it is matched
+against the second. This pattern matches strings in one of the two forms
+dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits.
+<a name="comments"></a></P>
+<br><a name="SEC22" href="#TOC1">COMMENTS</a><br>
+<P>
+There are two ways of including comments in patterns that are processed by
+PCRE. In both cases, the start of the comment must not be in a character class,
+nor in the middle of any other sequence of related characters such as (?: or a
+subpattern name or number. The characters that make up a comment play no part
+in the pattern matching.
+</P>
+<P>
+The sequence (?# marks the start of a comment that continues up to the next
+closing parenthesis. Nested parentheses are not permitted. If the PCRE_EXTENDED
+option is set, an unescaped # character also introduces a comment, which in
+this case continues to immediately after the next newline character or
+character sequence in the pattern. Which characters are interpreted as newlines
+is controlled by the options passed to a compiling function or by a special
+sequence at the start of the pattern, as described in the section entitled
+<a href="#newlines">"Newline conventions"</a>
+above. Note that the end of this type of comment is a literal newline sequence
+in the pattern; escape sequences that happen to represent a newline do not
+count. For example, consider this pattern when PCRE_EXTENDED is set, and the
+default newline convention is in force:
+<pre>
+ abc #comment \n still comment
+</pre>
+On encountering the # character, <b>pcre_compile()</b> skips along, looking for
+a newline in the pattern. The sequence \n is still literal at this stage, so
+it does not terminate the comment. Only an actual character with the code value
+0x0a (the default newline) does so.
+<a name="recursion"></a></P>
+<br><a name="SEC23" href="#TOC1">RECURSIVE PATTERNS</a><br>
+<P>
+Consider the problem of matching a string in parentheses, allowing for
+unlimited nested parentheses. Without the use of recursion, the best that can
+be done is to use a pattern that matches up to some fixed depth of nesting. It
+is not possible to handle an arbitrary nesting depth.
+</P>
+<P>
+For some time, Perl has provided a facility that allows regular expressions to
+recurse (amongst other things). It does this by interpolating Perl code in the
+expression at run time, and the code can refer to the expression itself. A Perl
+pattern using code interpolation to solve the parentheses problem can be
+created like this:
+<pre>
+ $re = qr{\( (?: (?&#62;[^()]+) | (?p{$re}) )* \)}x;
+</pre>
+The (?p{...}) item interpolates Perl code at run time, and in this case refers
+recursively to the pattern in which it appears.
+</P>
+<P>
+Obviously, PCRE cannot support the interpolation of Perl code. Instead, it
+supports special syntax for recursion of the entire pattern, and also for
+individual subpattern recursion. After its introduction in PCRE and Python,
+this kind of recursion was subsequently introduced into Perl at release 5.10.
+</P>
+<P>
+A special item that consists of (? followed by a number greater than zero and a
+closing parenthesis is a recursive subroutine call of the subpattern of the
+given number, provided that it occurs inside that subpattern. (If not, it is a
+<a href="#subpatternsassubroutines">non-recursive subroutine</a>
+call, which is described in the next section.) The special item (?R) or (?0) is
+a recursive call of the entire regular expression.
+</P>
+<P>
+This PCRE pattern solves the nested parentheses problem (assume the
+PCRE_EXTENDED option is set so that white space is ignored):
+<pre>
+ \( ( [^()]++ | (?R) )* \)
+</pre>
+First it matches an opening parenthesis. Then it matches any number of
+substrings which can either be a sequence of non-parentheses, or a recursive
+match of the pattern itself (that is, a correctly parenthesized substring).
+Finally there is a closing parenthesis. Note the use of a possessive quantifier
+to avoid backtracking into sequences of non-parentheses.
+</P>
+<P>
+If this were part of a larger pattern, you would not want to recurse the entire
+pattern, so instead you could use this:
+<pre>
+ ( \( ( [^()]++ | (?1) )* \) )
+</pre>
+We have put the pattern into parentheses, and caused the recursion to refer to
+them instead of the whole pattern.
+</P>
+<P>
+In a larger pattern, keeping track of parenthesis numbers can be tricky. This
+is made easier by the use of relative references. Instead of (?1) in the
+pattern above you can write (?-2) to refer to the second most recently opened
+parentheses preceding the recursion. In other words, a negative number counts
+capturing parentheses leftwards from the point at which it is encountered.
+</P>
+<P>
+It is also possible to refer to subsequently opened parentheses, by writing
+references such as (?+2). However, these cannot be recursive because the
+reference is not inside the parentheses that are referenced. They are always
+<a href="#subpatternsassubroutines">non-recursive subroutine</a>
+calls, as described in the next section.
+</P>
+<P>
+An alternative approach is to use named parentheses instead. The Perl syntax
+for this is (?&name); PCRE's earlier syntax (?P&#62;name) is also supported. We
+could rewrite the above example as follows:
+<pre>
+ (?&#60;pn&#62; \( ( [^()]++ | (?&pn) )* \) )
+</pre>
+If there is more than one subpattern with the same name, the earliest one is
+used.
+</P>
+<P>
+This particular example pattern that we have been looking at contains nested
+unlimited repeats, and so the use of a possessive quantifier for matching
+strings of non-parentheses is important when applying the pattern to strings
+that do not match. For example, when this pattern is applied to
+<pre>
+ (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
+</pre>
+it yields "no match" quickly. However, if a possessive quantifier is not used,
+the match runs for a very long time indeed because there are so many different
+ways the + and * repeats can carve up the subject, and all have to be tested
+before failure can be reported.
+</P>
+<P>
+At the end of a match, the values of capturing parentheses are those from
+the outermost level. If you want to obtain intermediate values, a callout
+function can be used (see below and the
+<a href="pcrecallout.html"><b>pcrecallout</b></a>
+documentation). If the pattern above is matched against
+<pre>
+ (ab(cd)ef)
+</pre>
+the value for the inner capturing parentheses (numbered 2) is "ef", which is
+the last value taken on at the top level. If a capturing subpattern is not
+matched at the top level, its final captured value is unset, even if it was
+(temporarily) set at a deeper level during the matching process.
+</P>
+<P>
+If there are more than 15 capturing parentheses in a pattern, PCRE has to
+obtain extra memory to store data during a recursion, which it does by using
+<b>pcre_malloc</b>, freeing it via <b>pcre_free</b> afterwards. If no memory can
+be obtained, the match fails with the PCRE_ERROR_NOMEMORY error.
+</P>
+<P>
+Do not confuse the (?R) item with the condition (R), which tests for recursion.
+Consider this pattern, which matches text in angle brackets, allowing for
+arbitrary nesting. Only digits are allowed in nested brackets (that is, when
+recursing), whereas any characters are permitted at the outer level.
+<pre>
+ &#60; (?: (?(R) \d++ | [^&#60;&#62;]*+) | (?R)) * &#62;
+</pre>
+In this pattern, (?(R) is the start of a conditional subpattern, with two
+different alternatives for the recursive and non-recursive cases. The (?R) item
+is the actual recursive call.
+<a name="recursiondifference"></a></P>
+<br><b>
+Differences in recursion processing between PCRE and Perl
+</b><br>
+<P>
+Recursion processing in PCRE differs from Perl in two important ways. In PCRE
+(like Python, but unlike Perl), a recursive subpattern call is always treated
+as an atomic group. That is, once it has matched some of the subject string, it
+is never re-entered, even if it contains untried alternatives and there is a
+subsequent matching failure. This can be illustrated by the following pattern,
+which purports to match a palindromic string that contains an odd number of
+characters (for example, "a", "aba", "abcba", "abcdcba"):
+<pre>
+ ^(.|(.)(?1)\2)$
+</pre>
+The idea is that it either matches a single character, or two identical
+characters surrounding a sub-palindrome. In Perl, this pattern works; in PCRE
+it does not if the pattern is longer than three characters. Consider the
+subject string "abcba":
+</P>
+<P>
+At the top level, the first character is matched, but as it is not at the end
+of the string, the first alternative fails; the second alternative is taken
+and the recursion kicks in. The recursive call to subpattern 1 successfully
+matches the next character ("b"). (Note that the beginning and end of line
+tests are not part of the recursion).
+</P>
+<P>
+Back at the top level, the next character ("c") is compared with what
+subpattern 2 matched, which was "a". This fails. Because the recursion is
+treated as an atomic group, there are now no backtracking points, and so the
+entire match fails. (Perl is able, at this point, to re-enter the recursion and
+try the second alternative.) However, if the pattern is written with the
+alternatives in the other order, things are different:
+<pre>
+ ^((.)(?1)\2|.)$
+</pre>
+This time, the recursing alternative is tried first, and continues to recurse
+until it runs out of characters, at which point the recursion fails. But this
+time we do have another alternative to try at the higher level. That is the big
+difference: in the previous case the remaining alternative is at a deeper
+recursion level, which PCRE cannot use.
+</P>
+<P>
+To change the pattern so that it matches all palindromic strings, not just
+those with an odd number of characters, it is tempting to change the pattern to
+this:
+<pre>
+ ^((.)(?1)\2|.?)$
+</pre>
+Again, this works in Perl, but not in PCRE, and for the same reason. When a
+deeper recursion has matched a single character, it cannot be entered again in
+order to match an empty string. The solution is to separate the two cases, and
+write out the odd and even cases as alternatives at the higher level:
+<pre>
+ ^(?:((.)(?1)\2|)|((.)(?3)\4|.))
+</pre>
+If you want to match typical palindromic phrases, the pattern has to ignore all
+non-word characters, which can be done like this:
+<pre>
+ ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$
+</pre>
+If run with the PCRE_CASELESS option, this pattern matches phrases such as "A
+man, a plan, a canal: Panama!" and it works well in both PCRE and Perl. Note
+the use of the possessive quantifier *+ to avoid backtracking into sequences of
+non-word characters. Without this, PCRE takes a great deal longer (ten times or
+more) to match typical phrases, and Perl takes so long that you think it has
+gone into a loop.
+</P>
+<P>
+<b>WARNING</b>: The palindrome-matching patterns above work only if the subject
+string does not start with a palindrome that is shorter than the entire string.
+For example, although "abcba" is correctly matched, if the subject is "ababa",
+PCRE finds the palindrome "aba" at the start, then fails at top level because
+the end of the string does not follow. Once again, it cannot jump back into the
+recursion to try other alternatives, so the entire match fails.
+</P>
+<P>
+The second way in which PCRE and Perl differ in their recursion processing is
+in the handling of captured values. In Perl, when a subpattern is called
+recursively or as a subpattern (see the next section), it has no access to any
+values that were captured outside the recursion, whereas in PCRE these values
+can be referenced. Consider this pattern:
+<pre>
+ ^(.)(\1|a(?2))
+</pre>
+In PCRE, this pattern matches "bab". The first capturing parentheses match "b",
+then in the second group, when the back reference \1 fails to match "b", the
+second alternative matches "a" and then recurses. In the recursion, \1 does
+now match "b" and so the whole match succeeds. In Perl, the pattern fails to
+match because inside the recursive call \1 cannot access the externally set
+value.
+<a name="subpatternsassubroutines"></a></P>
+<br><a name="SEC24" href="#TOC1">SUBPATTERNS AS SUBROUTINES</a><br>
+<P>
+If the syntax for a recursive subpattern call (either by number or by
+name) is used outside the parentheses to which it refers, it operates like a
+subroutine in a programming language. The called subpattern may be defined
+before or after the reference. A numbered reference can be absolute or
+relative, as in these examples:
+<pre>
+ (...(absolute)...)...(?2)...
+ (...(relative)...)...(?-1)...
+ (...(?+1)...(relative)...
+</pre>
+An earlier example pointed out that the pattern
+<pre>
+ (sens|respons)e and \1ibility
+</pre>
+matches "sense and sensibility" and "response and responsibility", but not
+"sense and responsibility". If instead the pattern
+<pre>
+ (sens|respons)e and (?1)ibility
+</pre>
+is used, it does match "sense and responsibility" as well as the other two
+strings. Another example is given in the discussion of DEFINE above.
+</P>
+<P>
+All subroutine calls, whether recursive or not, are always treated as atomic
+groups. That is, once a subroutine has matched some of the subject string, it
+is never re-entered, even if it contains untried alternatives and there is a
+subsequent matching failure. Any capturing parentheses that are set during the
+subroutine call revert to their previous values afterwards.
+</P>
+<P>
+Processing options such as case-independence are fixed when a subpattern is
+defined, so if it is used as a subroutine, such options cannot be changed for
+different calls. For example, consider this pattern:
+<pre>
+ (abc)(?i:(?-1))
+</pre>
+It matches "abcabc". It does not match "abcABC" because the change of
+processing option does not affect the called subpattern.
+<a name="onigurumasubroutines"></a></P>
+<br><a name="SEC25" href="#TOC1">ONIGURUMA SUBROUTINE SYNTAX</a><br>
+<P>
+For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or
+a number enclosed either in angle brackets or single quotes, is an alternative
+syntax for referencing a subpattern as a subroutine, possibly recursively. Here
+are two of the examples used above, rewritten using this syntax:
+<pre>
+ (?&#60;pn&#62; \( ( (?&#62;[^()]+) | \g&#60;pn&#62; )* \) )
+ (sens|respons)e and \g'1'ibility
+</pre>
+PCRE supports an extension to Oniguruma: if a number is preceded by a
+plus or a minus sign it is taken as a relative reference. For example:
+<pre>
+ (abc)(?i:\g&#60;-1&#62;)
+</pre>
+Note that \g{...} (Perl syntax) and \g&#60;...&#62; (Oniguruma syntax) are <i>not</i>
+synonymous. The former is a back reference; the latter is a subroutine call.
+</P>
+<br><a name="SEC26" href="#TOC1">CALLOUTS</a><br>
+<P>
+Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl
+code to be obeyed in the middle of matching a regular expression. This makes it
+possible, amongst other things, to extract different substrings that match the
+same pair of parentheses when there is a repetition.
+</P>
+<P>
+PCRE provides a similar feature, but of course it cannot obey arbitrary Perl
+code. The feature is called "callout". The caller of PCRE provides an external
+function by putting its entry point in the global variable <i>pcre_callout</i>
+(8-bit library) or <i>pcre[16|32]_callout</i> (16-bit or 32-bit library).
+By default, this variable contains NULL, which disables all calling out.
+</P>
+<P>
+Within a regular expression, (?C) indicates the points at which the external
+function is to be called. If you want to identify different callout points, you
+can put a number less than 256 after the letter C. The default value is zero.
+For example, this pattern has two callout points:
+<pre>
+ (?C1)abc(?C2)def
+</pre>
+If the PCRE_AUTO_CALLOUT flag is passed to a compiling function, callouts are
+automatically installed before each item in the pattern. They are all numbered
+255. If there is a conditional group in the pattern whose condition is an
+assertion, an additional callout is inserted just before the condition. An
+explicit callout may also be set at this position, as in this example:
+<pre>
+ (?(?C9)(?=a)abc|def)
+</pre>
+Note that this applies only to assertion conditions, not to other types of
+condition.
+</P>
+<P>
+During matching, when PCRE reaches a callout point, the external function is
+called. It is provided with the number of the callout, the position in the
+pattern, and, optionally, one item of data originally supplied by the caller of
+the matching function. The callout function may cause matching to proceed, to
+backtrack, or to fail altogether.
+</P>
+<P>
+By default, PCRE implements a number of optimizations at compile time and
+matching time, and one side-effect is that sometimes callouts are skipped. If
+you need all possible callouts to happen, you need to set options that disable
+the relevant optimizations. More details, and a complete description of the
+interface to the callout function, are given in the
+<a href="pcrecallout.html"><b>pcrecallout</b></a>
+documentation.
+<a name="backtrackcontrol"></a></P>
+<br><a name="SEC27" href="#TOC1">BACKTRACKING CONTROL</a><br>
+<P>
+Perl 5.10 introduced a number of "Special Backtracking Control Verbs", which
+are still described in the Perl documentation as "experimental and subject to
+change or removal in a future version of Perl". It goes on to say: "Their usage
+in production code should be noted to avoid problems during upgrades." The same
+remarks apply to the PCRE features described in this section.
+</P>
+<P>
+The new verbs make use of what was previously invalid syntax: an opening
+parenthesis followed by an asterisk. They are generally of the form
+(*VERB) or (*VERB:NAME). Some may take either form, possibly behaving
+differently depending on whether or not a name is present. A name is any
+sequence of characters that does not include a closing parenthesis. The maximum
+length of name is 255 in the 8-bit library and 65535 in the 16-bit and 32-bit
+libraries. If the name is empty, that is, if the closing parenthesis
+immediately follows the colon, the effect is as if the colon were not there.
+Any number of these verbs may occur in a pattern.
+</P>
+<P>
+Since these verbs are specifically related to backtracking, most of them can be
+used only when the pattern is to be matched using one of the traditional
+matching functions, because these use a backtracking algorithm. With the
+exception of (*FAIL), which behaves like a failing negative assertion, the
+backtracking control verbs cause an error if encountered by a DFA matching
+function.
+</P>
+<P>
+The behaviour of these verbs in
+<a href="#btrepeat">repeated groups,</a>
+<a href="#btassert">assertions,</a>
+and in
+<a href="#btsub">subpatterns called as subroutines</a>
+(whether or not recursively) is documented below.
+<a name="nooptimize"></a></P>
+<br><b>
+Optimizations that affect backtracking verbs
+</b><br>
+<P>
+PCRE contains some optimizations that are used to speed up matching by running
+some checks at the start of each match attempt. For example, it may know the
+minimum length of matching subject, or that a particular character must be
+present. When one of these optimizations bypasses the running of a match, any
+included backtracking verbs will not, of course, be processed. You can suppress
+the start-of-match optimizations by setting the PCRE_NO_START_OPTIMIZE option
+when calling <b>pcre_compile()</b> or <b>pcre_exec()</b>, or by starting the
+pattern with (*NO_START_OPT). There is more discussion of this option in the
+section entitled
+<a href="pcreapi.html#execoptions">"Option bits for <b>pcre_exec()</b>"</a>
+in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+documentation.
+</P>
+<P>
+Experiments with Perl suggest that it too has similar optimizations, sometimes
+leading to anomalous results.
+</P>
+<br><b>
+Verbs that act immediately
+</b><br>
+<P>
+The following verbs act as soon as they are encountered. They may not be
+followed by a name.
+<pre>
+ (*ACCEPT)
+</pre>
+This verb causes the match to end successfully, skipping the remainder of the
+pattern. However, when it is inside a subpattern that is called as a
+subroutine, only that subpattern is ended successfully. Matching then continues
+at the outer level. If (*ACCEPT) in triggered in a positive assertion, the
+assertion succeeds; in a negative assertion, the assertion fails.
+</P>
+<P>
+If (*ACCEPT) is inside capturing parentheses, the data so far is captured. For
+example:
+<pre>
+ A((?:A|B(*ACCEPT)|C)D)
+</pre>
+This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by
+the outer parentheses.
+<pre>
+ (*FAIL) or (*F)
+</pre>
+This verb causes a matching failure, forcing backtracking to occur. It is
+equivalent to (?!) but easier to read. The Perl documentation notes that it is
+probably useful only when combined with (?{}) or (??{}). Those are, of course,
+Perl features that are not present in PCRE. The nearest equivalent is the
+callout feature, as for example in this pattern:
+<pre>
+ a+(?C)(*FAIL)
+</pre>
+A match with the string "aaaa" always fails, but the callout is taken before
+each backtrack happens (in this example, 10 times).
+</P>
+<br><b>
+Recording which path was taken
+</b><br>
+<P>
+There is one verb whose main purpose is to track how a match was arrived at,
+though it also has a secondary use in conjunction with advancing the match
+starting point (see (*SKIP) below).
+<pre>
+ (*MARK:NAME) or (*:NAME)
+</pre>
+A name is always required with this verb. There may be as many instances of
+(*MARK) as you like in a pattern, and their names do not have to be unique.
+</P>
+<P>
+When a match succeeds, the name of the last-encountered (*MARK:NAME),
+(*PRUNE:NAME), or (*THEN:NAME) on the matching path is passed back to the
+caller as described in the section entitled
+<a href="pcreapi.html#extradata">"Extra data for <b>pcre_exec()</b>"</a>
+in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+documentation. Here is an example of <b>pcretest</b> output, where the /K
+modifier requests the retrieval and outputting of (*MARK) data:
+<pre>
+ re&#62; /X(*MARK:A)Y|X(*MARK:B)Z/K
+ data&#62; XY
+ 0: XY
+ MK: A
+ XZ
+ 0: XZ
+ MK: B
+</pre>
+The (*MARK) name is tagged with "MK:" in this output, and in this example it
+indicates which of the two alternatives matched. This is a more efficient way
+of obtaining this information than putting each alternative in its own
+capturing parentheses.
+</P>
+<P>
+If a verb with a name is encountered in a positive assertion that is true, the
+name is recorded and passed back if it is the last-encountered. This does not
+happen for negative assertions or failing positive assertions.
+</P>
+<P>
+After a partial match or a failed match, the last encountered name in the
+entire match process is returned. For example:
+<pre>
+ re&#62; /X(*MARK:A)Y|X(*MARK:B)Z/K
+ data&#62; XP
+ No match, mark = B
+</pre>
+Note that in this unanchored example the mark is retained from the match
+attempt that started at the letter "X" in the subject. Subsequent match
+attempts starting at "P" and then with an empty string do not get as far as the
+(*MARK) item, but nevertheless do not reset it.
+</P>
+<P>
+If you are interested in (*MARK) values after failed matches, you should
+probably set the PCRE_NO_START_OPTIMIZE option
+<a href="#nooptimize">(see above)</a>
+to ensure that the match is always attempted.
+</P>
+<br><b>
+Verbs that act after backtracking
+</b><br>
+<P>
+The following verbs do nothing when they are encountered. Matching continues
+with what follows, but if there is no subsequent match, causing a backtrack to
+the verb, a failure is forced. That is, backtracking cannot pass to the left of
+the verb. However, when one of these verbs appears inside an atomic group or an
+assertion that is true, its effect is confined to that group, because once the
+group has been matched, there is never any backtracking into it. In this
+situation, backtracking can "jump back" to the left of the entire atomic group
+or assertion. (Remember also, as stated above, that this localization also
+applies in subroutine calls.)
+</P>
+<P>
+These verbs differ in exactly what kind of failure occurs when backtracking
+reaches them. The behaviour described below is what happens when the verb is
+not in a subroutine or an assertion. Subsequent sections cover these special
+cases.
+<pre>
+ (*COMMIT)
+</pre>
+This verb, which may not be followed by a name, causes the whole match to fail
+outright if there is a later matching failure that causes backtracking to reach
+it. Even if the pattern is unanchored, no further attempts to find a match by
+advancing the starting point take place. If (*COMMIT) is the only backtracking
+verb that is encountered, once it has been passed <b>pcre_exec()</b> is
+committed to finding a match at the current starting point, or not at all. For
+example:
+<pre>
+ a+(*COMMIT)b
+</pre>
+This matches "xxaab" but not "aacaab". It can be thought of as a kind of
+dynamic anchor, or "I've started, so I must finish." The name of the most
+recently passed (*MARK) in the path is passed back when (*COMMIT) forces a
+match failure.
+</P>
+<P>
+If there is more than one backtracking verb in a pattern, a different one that
+follows (*COMMIT) may be triggered first, so merely passing (*COMMIT) during a
+match does not always guarantee that a match must be at this starting point.
+</P>
+<P>
+Note that (*COMMIT) at the start of a pattern is not the same as an anchor,
+unless PCRE's start-of-match optimizations are turned off, as shown in this
+output from <b>pcretest</b>:
+<pre>
+ re&#62; /(*COMMIT)abc/
+ data&#62; xyzabc
+ 0: abc
+ data&#62; xyzabc\Y
+ No match
+</pre>
+For this pattern, PCRE knows that any match must start with "a", so the
+optimization skips along the subject to "a" before applying the pattern to the
+first set of data. The match attempt then succeeds. In the second set of data,
+the escape sequence \Y is interpreted by the <b>pcretest</b> program. It causes
+the PCRE_NO_START_OPTIMIZE option to be set when <b>pcre_exec()</b> is called.
+This disables the optimization that skips along to the first character. The
+pattern is now applied starting at "x", and so the (*COMMIT) causes the match
+to fail without trying any other starting points.
+<pre>
+ (*PRUNE) or (*PRUNE:NAME)
+</pre>
+This verb causes the match to fail at the current starting position in the
+subject if there is a later matching failure that causes backtracking to reach
+it. If the pattern is unanchored, the normal "bumpalong" advance to the next
+starting character then happens. Backtracking can occur as usual to the left of
+(*PRUNE), before it is reached, or when matching to the right of (*PRUNE), but
+if there is no match to the right, backtracking cannot cross (*PRUNE). In
+simple cases, the use of (*PRUNE) is just an alternative to an atomic group or
+possessive quantifier, but there are some uses of (*PRUNE) that cannot be
+expressed in any other way. In an anchored pattern (*PRUNE) has the same effect
+as (*COMMIT).
+</P>
+<P>
+The behaviour of (*PRUNE:NAME) is the not the same as (*MARK:NAME)(*PRUNE).
+It is like (*MARK:NAME) in that the name is remembered for passing back to the
+caller. However, (*SKIP:NAME) searches only for names set with (*MARK).
+<pre>
+ (*SKIP)
+</pre>
+This verb, when given without a name, is like (*PRUNE), except that if the
+pattern is unanchored, the "bumpalong" advance is not to the next character,
+but to the position in the subject where (*SKIP) was encountered. (*SKIP)
+signifies that whatever text was matched leading up to it cannot be part of a
+successful match. Consider:
+<pre>
+ a+(*SKIP)b
+</pre>
+If the subject is "aaaac...", after the first match attempt fails (starting at
+the first character in the string), the starting point skips on to start the
+next attempt at "c". Note that a possessive quantifer does not have the same
+effect as this example; although it would suppress backtracking during the
+first match attempt, the second attempt would start at the second character
+instead of skipping on to "c".
+<pre>
+ (*SKIP:NAME)
+</pre>
+When (*SKIP) has an associated name, its behaviour is modified. When it is
+triggered, the previous path through the pattern is searched for the most
+recent (*MARK) that has the same name. If one is found, the "bumpalong" advance
+is to the subject position that corresponds to that (*MARK) instead of to where
+(*SKIP) was encountered. If no (*MARK) with a matching name is found, the
+(*SKIP) is ignored.
+</P>
+<P>
+Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
+names that are set by (*PRUNE:NAME) or (*THEN:NAME).
+<pre>
+ (*THEN) or (*THEN:NAME)
+</pre>
+This verb causes a skip to the next innermost alternative when backtracking
+reaches it. That is, it cancels any further backtracking within the current
+alternative. Its name comes from the observation that it can be used for a
+pattern-based if-then-else block:
+<pre>
+ ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...
+</pre>
+If the COND1 pattern matches, FOO is tried (and possibly further items after
+the end of the group if FOO succeeds); on failure, the matcher skips to the
+second alternative and tries COND2, without backtracking into COND1. If that
+succeeds and BAR fails, COND3 is tried. If subsequently BAZ fails, there are no
+more alternatives, so there is a backtrack to whatever came before the entire
+group. If (*THEN) is not inside an alternation, it acts like (*PRUNE).
+</P>
+<P>
+The behaviour of (*THEN:NAME) is the not the same as (*MARK:NAME)(*THEN).
+It is like (*MARK:NAME) in that the name is remembered for passing back to the
+caller. However, (*SKIP:NAME) searches only for names set with (*MARK).
+</P>
+<P>
+A subpattern that does not contain a | character is just a part of the
+enclosing alternative; it is not a nested alternation with only one
+alternative. The effect of (*THEN) extends beyond such a subpattern to the
+enclosing alternative. Consider this pattern, where A, B, etc. are complex
+pattern fragments that do not contain any | characters at this level:
+<pre>
+ A (B(*THEN)C) | D
+</pre>
+If A and B are matched, but there is a failure in C, matching does not
+backtrack into A; instead it moves to the next alternative, that is, D.
+However, if the subpattern containing (*THEN) is given an alternative, it
+behaves differently:
+<pre>
+ A (B(*THEN)C | (*FAIL)) | D
+</pre>
+The effect of (*THEN) is now confined to the inner subpattern. After a failure
+in C, matching moves to (*FAIL), which causes the whole subpattern to fail
+because there are no more alternatives to try. In this case, matching does now
+backtrack into A.
+</P>
+<P>
+Note that a conditional subpattern is not considered as having two
+alternatives, because only one is ever used. In other words, the | character in
+a conditional subpattern has a different meaning. Ignoring white space,
+consider:
+<pre>
+ ^.*? (?(?=a) a | b(*THEN)c )
+</pre>
+If the subject is "ba", this pattern does not match. Because .*? is ungreedy,
+it initially matches zero characters. The condition (?=a) then fails, the
+character "b" is matched, but "c" is not. At this point, matching does not
+backtrack to .*? as might perhaps be expected from the presence of the |
+character. The conditional subpattern is part of the single alternative that
+comprises the whole pattern, and so the match fails. (If there was a backtrack
+into .*?, allowing it to match "b", the match would succeed.)
+</P>
+<P>
+The verbs just described provide four different "strengths" of control when
+subsequent matching fails. (*THEN) is the weakest, carrying on the match at the
+next alternative. (*PRUNE) comes next, failing the match at the current
+starting position, but allowing an advance to the next character (for an
+unanchored pattern). (*SKIP) is similar, except that the advance may be more
+than one character. (*COMMIT) is the strongest, causing the entire match to
+fail.
+</P>
+<br><b>
+More than one backtracking verb
+</b><br>
+<P>
+If more than one backtracking verb is present in a pattern, the one that is
+backtracked onto first acts. For example, consider this pattern, where A, B,
+etc. are complex pattern fragments:
+<pre>
+ (A(*COMMIT)B(*THEN)C|ABD)
+</pre>
+If A matches but B fails, the backtrack to (*COMMIT) causes the entire match to
+fail. However, if A and B match, but C fails, the backtrack to (*THEN) causes
+the next alternative (ABD) to be tried. This behaviour is consistent, but is
+not always the same as Perl's. It means that if two or more backtracking verbs
+appear in succession, all the the last of them has no effect. Consider this
+example:
+<pre>
+ ...(*COMMIT)(*PRUNE)...
+</pre>
+If there is a matching failure to the right, backtracking onto (*PRUNE) causes
+it to be triggered, and its action is taken. There can never be a backtrack
+onto (*COMMIT).
+<a name="btrepeat"></a></P>
+<br><b>
+Backtracking verbs in repeated groups
+</b><br>
+<P>
+PCRE differs from Perl in its handling of backtracking verbs in repeated
+groups. For example, consider:
+<pre>
+ /(a(*COMMIT)b)+ac/
+</pre>
+If the subject is "abac", Perl matches, but PCRE fails because the (*COMMIT) in
+the second repeat of the group acts.
+<a name="btassert"></a></P>
+<br><b>
+Backtracking verbs in assertions
+</b><br>
+<P>
+(*FAIL) in an assertion has its normal effect: it forces an immediate backtrack.
+</P>
+<P>
+(*ACCEPT) in a positive assertion causes the assertion to succeed without any
+further processing. In a negative assertion, (*ACCEPT) causes the assertion to
+fail without any further processing.
+</P>
+<P>
+The other backtracking verbs are not treated specially if they appear in a
+positive assertion. In particular, (*THEN) skips to the next alternative in the
+innermost enclosing group that has alternations, whether or not this is within
+the assertion.
+</P>
+<P>
+Negative assertions are, however, different, in order to ensure that changing a
+positive assertion into a negative assertion changes its result. Backtracking
+into (*COMMIT), (*SKIP), or (*PRUNE) causes a negative assertion to be true,
+without considering any further alternative branches in the assertion.
+Backtracking into (*THEN) causes it to skip to the next enclosing alternative
+within the assertion (the normal behaviour), but if the assertion does not have
+such an alternative, (*THEN) behaves like (*PRUNE).
+<a name="btsub"></a></P>
+<br><b>
+Backtracking verbs in subroutines
+</b><br>
+<P>
+These behaviours occur whether or not the subpattern is called recursively.
+Perl's treatment of subroutines is different in some cases.
+</P>
+<P>
+(*FAIL) in a subpattern called as a subroutine has its normal effect: it forces
+an immediate backtrack.
+</P>
+<P>
+(*ACCEPT) in a subpattern called as a subroutine causes the subroutine match to
+succeed without any further processing. Matching then continues after the
+subroutine call.
+</P>
+<P>
+(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine cause
+the subroutine match to fail.
+</P>
+<P>
+(*THEN) skips to the next alternative in the innermost enclosing group within
+the subpattern that has alternatives. If there is no such group within the
+subpattern, (*THEN) causes the subroutine match to fail.
+</P>
+<br><a name="SEC28" href="#TOC1">SEE ALSO</a><br>
+<P>
+<b>pcreapi</b>(3), <b>pcrecallout</b>(3), <b>pcrematching</b>(3),
+<b>pcresyntax</b>(3), <b>pcre</b>(3), <b>pcre16(3)</b>, <b>pcre32(3)</b>.
+</P>
+<br><a name="SEC29" href="#TOC1">AUTHOR</a><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><a name="SEC30" href="#TOC1">REVISION</a><br>
+<P>
+Last updated: 08 January 2014
+<br>
+Copyright &copy; 1997-2014 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcreperform.html b/doc/html/pcreperform.html
new file mode 100644
index 0000000..dda207f
--- /dev/null
+++ b/doc/html/pcreperform.html
@@ -0,0 +1,195 @@
+<html>
+<head>
+<title>pcreperform specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcreperform man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+PCRE PERFORMANCE
+</b><br>
+<P>
+Two aspects of performance are discussed below: memory usage and processing
+time. The way you express your pattern as a regular expression can affect both
+of them.
+</P>
+<br><b>
+COMPILED PATTERN MEMORY USAGE
+</b><br>
+<P>
+Patterns are compiled by PCRE into a reasonably efficient interpretive code, so
+that most simple patterns do not use much memory. However, there is one case
+where the memory usage of a compiled pattern can be unexpectedly large. If a
+parenthesized subpattern has a quantifier with a minimum greater than 1 and/or
+a limited maximum, the whole subpattern is repeated in the compiled code. For
+example, the pattern
+<pre>
+ (abc|def){2,4}
+</pre>
+is compiled as if it were
+<pre>
+ (abc|def)(abc|def)((abc|def)(abc|def)?)?
+</pre>
+(Technical aside: It is done this way so that backtrack points within each of
+the repetitions can be independently maintained.)
+</P>
+<P>
+For regular expressions whose quantifiers use only small numbers, this is not
+usually a problem. However, if the numbers are large, and particularly if such
+repetitions are nested, the memory usage can become an embarrassment. For
+example, the very simple pattern
+<pre>
+ ((ab){1,1000}c){1,3}
+</pre>
+uses 51K bytes when compiled using the 8-bit library. When PCRE is compiled
+with its default internal pointer size of two bytes, the size limit on a
+compiled pattern is 64K data units, and this is reached with the above pattern
+if the outer repetition is increased from 3 to 4. PCRE can be compiled to use
+larger internal pointers and thus handle larger compiled patterns, but it is
+better to try to rewrite your pattern to use less memory if you can.
+</P>
+<P>
+One way of reducing the memory usage for such patterns is to make use of PCRE's
+<a href="pcrepattern.html#subpatternsassubroutines">"subroutine"</a>
+facility. Re-writing the above pattern as
+<pre>
+ ((ab)(?2){0,999}c)(?1){0,2}
+</pre>
+reduces the memory requirements to 18K, and indeed it remains under 20K even
+with the outer repetition increased to 100. However, this pattern is not
+exactly equivalent, because the "subroutine" calls are treated as
+<a href="pcrepattern.html#atomicgroup">atomic groups</a>
+into which there can be no backtracking if there is a subsequent matching
+failure. Therefore, PCRE cannot do this kind of rewriting automatically.
+Furthermore, there is a noticeable loss of speed when executing the modified
+pattern. Nevertheless, if the atomic grouping is not a problem and the loss of
+speed is acceptable, this kind of rewriting will allow you to process patterns
+that PCRE cannot otherwise handle.
+</P>
+<br><b>
+STACK USAGE AT RUN TIME
+</b><br>
+<P>
+When <b>pcre_exec()</b> or <b>pcre[16|32]_exec()</b> is used for matching, certain
+kinds of pattern can cause it to use large amounts of the process stack. In
+some environments the default process stack is quite small, and if it runs out
+the result is often SIGSEGV. This issue is probably the most frequently raised
+problem with PCRE. Rewriting your pattern can often help. The
+<a href="pcrestack.html"><b>pcrestack</b></a>
+documentation discusses this issue in detail.
+</P>
+<br><b>
+PROCESSING TIME
+</b><br>
+<P>
+Certain items in regular expression patterns are processed more efficiently
+than others. It is more efficient to use a character class like [aeiou] than a
+set of single-character alternatives such as (a|e|i|o|u). In general, the
+simplest construction that provides the required behaviour is usually the most
+efficient. Jeffrey Friedl's book contains a lot of useful general discussion
+about optimizing regular expressions for efficient performance. This document
+contains a few observations about PCRE.
+</P>
+<P>
+Using Unicode character properties (the \p, \P, and \X escapes) is slow,
+because PCRE has to use a multi-stage table lookup whenever it needs a
+character's property. If you can find an alternative pattern that does not use
+character properties, it will probably be faster.
+</P>
+<P>
+By default, the escape sequences \b, \d, \s, and \w, and the POSIX
+character classes such as [:alpha:] do not use Unicode properties, partly for
+backwards compatibility, and partly for performance reasons. However, you can
+set PCRE_UCP if you want Unicode character properties to be used. This can
+double the matching time for items such as \d, when matched with
+a traditional matching function; the performance loss is less with
+a DFA matching function, and in both cases there is not much difference for
+\b.
+</P>
+<P>
+When a pattern begins with .* not in parentheses, or in parentheses that are
+not the subject of a backreference, and the PCRE_DOTALL option is set, the
+pattern is implicitly anchored by PCRE, since it can match only at the start of
+a subject string. However, if PCRE_DOTALL is not set, PCRE cannot make this
+optimization, because the . metacharacter does not then match a newline, and if
+the subject string contains newlines, the pattern may match from the character
+immediately following one of them instead of from the very start. For example,
+the pattern
+<pre>
+ .*second
+</pre>
+matches the subject "first\nand second" (where \n stands for a newline
+character), with the match starting at the seventh character. In order to do
+this, PCRE has to retry the match starting after every newline in the subject.
+</P>
+<P>
+If you are using such a pattern with subject strings that do not contain
+newlines, the best performance is obtained by setting PCRE_DOTALL, or starting
+the pattern with ^.* or ^.*? to indicate explicit anchoring. That saves PCRE
+from having to scan along the subject looking for a newline to restart at.
+</P>
+<P>
+Beware of patterns that contain nested indefinite repeats. These can take a
+long time to run when applied to a string that does not match. Consider the
+pattern fragment
+<pre>
+ ^(a+)*
+</pre>
+This can match "aaaa" in 16 different ways, and this number increases very
+rapidly as the string gets longer. (The * repeat can match 0, 1, 2, 3, or 4
+times, and for each of those cases other than 0 or 4, the + repeats can match
+different numbers of times.) When the remainder of the pattern is such that the
+entire match is going to fail, PCRE has in principle to try every possible
+variation, and this can take an extremely long time, even for relatively short
+strings.
+</P>
+<P>
+An optimization catches some of the more simple cases such as
+<pre>
+ (a+)*b
+</pre>
+where a literal character follows. Before embarking on the standard matching
+procedure, PCRE checks that there is a "b" later in the subject string, and if
+there is not, it fails the match immediately. However, when there is no
+following literal this optimization cannot be used. You can see the difference
+by comparing the behaviour of
+<pre>
+ (a+)*\d
+</pre>
+with the pattern above. The former gives a failure almost instantly when
+applied to a whole line of "a" characters, whereas the latter takes an
+appreciable time with strings longer than about 20 characters.
+</P>
+<P>
+In many cases, the solution to this kind of performance issue is to use an
+atomic group or a possessive quantifier.
+</P>
+<br><b>
+AUTHOR
+</b><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><b>
+REVISION
+</b><br>
+<P>
+Last updated: 25 August 2012
+<br>
+Copyright &copy; 1997-2012 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcreposix.html b/doc/html/pcreposix.html
new file mode 100644
index 0000000..18924cf
--- /dev/null
+++ b/doc/html/pcreposix.html
@@ -0,0 +1,290 @@
+<html>
+<head>
+<title>pcreposix specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcreposix man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<ul>
+<li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
+<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
+<li><a name="TOC3" href="#SEC3">COMPILING A PATTERN</a>
+<li><a name="TOC4" href="#SEC4">MATCHING NEWLINE CHARACTERS</a>
+<li><a name="TOC5" href="#SEC5">MATCHING A PATTERN</a>
+<li><a name="TOC6" href="#SEC6">ERROR MESSAGES</a>
+<li><a name="TOC7" href="#SEC7">MEMORY USAGE</a>
+<li><a name="TOC8" href="#SEC8">AUTHOR</a>
+<li><a name="TOC9" href="#SEC9">REVISION</a>
+</ul>
+<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
+<P>
+<b>#include &#60;pcreposix.h&#62;</b>
+</P>
+<P>
+<b>int regcomp(regex_t *<i>preg</i>, const char *<i>pattern</i>,</b>
+<b> int <i>cflags</i>);</b>
+<br>
+<br>
+<b>int regexec(regex_t *<i>preg</i>, const char *<i>string</i>,</b>
+<b> size_t <i>nmatch</i>, regmatch_t <i>pmatch</i>[], int <i>eflags</i>);</b>
+<b> size_t regerror(int <i>errcode</i>, const regex_t *<i>preg</i>,</b>
+<b> char *<i>errbuf</i>, size_t <i>errbuf_size</i>);</b>
+<br>
+<br>
+<b>void regfree(regex_t *<i>preg</i>);</b>
+</P>
+<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
+<P>
+This set of functions provides a POSIX-style API for the PCRE regular
+expression 8-bit library. See the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+documentation for a description of PCRE's native API, which contains much
+additional functionality. There is no POSIX-style wrapper for PCRE's 16-bit
+and 32-bit library.
+</P>
+<P>
+The functions described here are just wrapper functions that ultimately call
+the PCRE native API. Their prototypes are defined in the <b>pcreposix.h</b>
+header file, and on Unix systems the library itself is called
+<b>pcreposix.a</b>, so can be accessed by adding <b>-lpcreposix</b> to the
+command for linking an application that uses them. Because the POSIX functions
+call the native ones, it is also necessary to add <b>-lpcre</b>.
+</P>
+<P>
+I have implemented only those POSIX option bits that can be reasonably mapped
+to PCRE native options. In addition, the option REG_EXTENDED is defined with
+the value zero. This has no effect, but since programs that are written to the
+POSIX interface often use it, this makes it easier to slot in PCRE as a
+replacement library. Other POSIX options are not even defined.
+</P>
+<P>
+There are also some other options that are not defined by POSIX. These have
+been added at the request of users who want to make use of certain
+PCRE-specific features via the POSIX calling interface.
+</P>
+<P>
+When PCRE is called via these functions, it is only the API that is POSIX-like
+in style. The syntax and semantics of the regular expressions themselves are
+still those of Perl, subject to the setting of various PCRE options, as
+described below. "POSIX-like in style" means that the API approximates to the
+POSIX definition; it is not fully POSIX-compatible, and in multi-byte encoding
+domains it is probably even less compatible.
+</P>
+<P>
+The header for these functions is supplied as <b>pcreposix.h</b> to avoid any
+potential clash with other POSIX libraries. It can, of course, be renamed or
+aliased as <b>regex.h</b>, which is the "correct" name. It provides two
+structure types, <i>regex_t</i> for compiled internal forms, and
+<i>regmatch_t</i> for returning captured substrings. It also defines some
+constants whose names start with "REG_"; these are used for setting options and
+identifying error codes.
+</P>
+<br><a name="SEC3" href="#TOC1">COMPILING A PATTERN</a><br>
+<P>
+The function <b>regcomp()</b> is called to compile a pattern into an
+internal form. The pattern is a C string terminated by a binary zero, and
+is passed in the argument <i>pattern</i>. The <i>preg</i> argument is a pointer
+to a <b>regex_t</b> structure that is used as a base for storing information
+about the compiled regular expression.
+</P>
+<P>
+The argument <i>cflags</i> is either zero, or contains one or more of the bits
+defined by the following macros:
+<pre>
+ REG_DOTALL
+</pre>
+The PCRE_DOTALL option is set when the regular expression is passed for
+compilation to the native function. Note that REG_DOTALL is not part of the
+POSIX standard.
+<pre>
+ REG_ICASE
+</pre>
+The PCRE_CASELESS option is set when the regular expression is passed for
+compilation to the native function.
+<pre>
+ REG_NEWLINE
+</pre>
+The PCRE_MULTILINE option is set when the regular expression is passed for
+compilation to the native function. Note that this does <i>not</i> mimic the
+defined POSIX behaviour for REG_NEWLINE (see the following section).
+<pre>
+ REG_NOSUB
+</pre>
+The PCRE_NO_AUTO_CAPTURE option is set when the regular expression is passed
+for compilation to the native function. In addition, when a pattern that is
+compiled with this flag is passed to <b>regexec()</b> for matching, the
+<i>nmatch</i> and <i>pmatch</i> arguments are ignored, and no captured strings
+are returned.
+<pre>
+ REG_UCP
+</pre>
+The PCRE_UCP option is set when the regular expression is passed for
+compilation to the native function. This causes PCRE to use Unicode properties
+when matchine \d, \w, etc., instead of just recognizing ASCII values. Note
+that REG_UTF8 is not part of the POSIX standard.
+<pre>
+ REG_UNGREEDY
+</pre>
+The PCRE_UNGREEDY option is set when the regular expression is passed for
+compilation to the native function. Note that REG_UNGREEDY is not part of the
+POSIX standard.
+<pre>
+ REG_UTF8
+</pre>
+The PCRE_UTF8 option is set when the regular expression is passed for
+compilation to the native function. This causes the pattern itself and all data
+strings used for matching it to be treated as UTF-8 strings. Note that REG_UTF8
+is not part of the POSIX standard.
+</P>
+<P>
+In the absence of these flags, no options are passed to the native function.
+This means the the regex is compiled with PCRE default semantics. In
+particular, the way it handles newline characters in the subject string is the
+Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only
+<i>some</i> of the effects specified for REG_NEWLINE. It does not affect the way
+newlines are matched by . (they are not) or by a negative class such as [^a]
+(they are).
+</P>
+<P>
+The yield of <b>regcomp()</b> is zero on success, and non-zero otherwise. The
+<i>preg</i> structure is filled in on success, and one member of the structure
+is public: <i>re_nsub</i> contains the number of capturing subpatterns in
+the regular expression. Various error codes are defined in the header file.
+</P>
+<P>
+NOTE: If the yield of <b>regcomp()</b> is non-zero, you must not attempt to
+use the contents of the <i>preg</i> structure. If, for example, you pass it to
+<b>regexec()</b>, the result is undefined and your program is likely to crash.
+</P>
+<br><a name="SEC4" href="#TOC1">MATCHING NEWLINE CHARACTERS</a><br>
+<P>
+This area is not simple, because POSIX and Perl take different views of things.
+It is not possible to get PCRE to obey POSIX semantics, but then PCRE was never
+intended to be a POSIX engine. The following table lists the different
+possibilities for matching newline characters in PCRE:
+<pre>
+ Default Change with
+
+ . matches newline no PCRE_DOTALL
+ newline matches [^a] yes not changeable
+ $ matches \n at end yes PCRE_DOLLARENDONLY
+ $ matches \n in middle no PCRE_MULTILINE
+ ^ matches \n in middle no PCRE_MULTILINE
+</pre>
+This is the equivalent table for POSIX:
+<pre>
+ Default Change with
+
+ . matches newline yes REG_NEWLINE
+ newline matches [^a] yes REG_NEWLINE
+ $ matches \n at end no REG_NEWLINE
+ $ matches \n in middle no REG_NEWLINE
+ ^ matches \n in middle no REG_NEWLINE
+</pre>
+PCRE's behaviour is the same as Perl's, except that there is no equivalent for
+PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl, there is no way to stop
+newline from matching [^a].
+</P>
+<P>
+The default POSIX newline handling can be obtained by setting PCRE_DOTALL and
+PCRE_DOLLAR_ENDONLY, but there is no way to make PCRE behave exactly as for the
+REG_NEWLINE action.
+</P>
+<br><a name="SEC5" href="#TOC1">MATCHING A PATTERN</a><br>
+<P>
+The function <b>regexec()</b> is called to match a compiled pattern <i>preg</i>
+against a given <i>string</i>, which is by default terminated by a zero byte
+(but see REG_STARTEND below), subject to the options in <i>eflags</i>. These can
+be:
+<pre>
+ REG_NOTBOL
+</pre>
+The PCRE_NOTBOL option is set when calling the underlying PCRE matching
+function.
+<pre>
+ REG_NOTEMPTY
+</pre>
+The PCRE_NOTEMPTY option is set when calling the underlying PCRE matching
+function. Note that REG_NOTEMPTY is not part of the POSIX standard. However,
+setting this option can give more POSIX-like behaviour in some situations.
+<pre>
+ REG_NOTEOL
+</pre>
+The PCRE_NOTEOL option is set when calling the underlying PCRE matching
+function.
+<pre>
+ REG_STARTEND
+</pre>
+The string is considered to start at <i>string</i> + <i>pmatch[0].rm_so</i> and
+to have a terminating NUL located at <i>string</i> + <i>pmatch[0].rm_eo</i>
+(there need not actually be a NUL at that location), regardless of the value of
+<i>nmatch</i>. This is a BSD extension, compatible with but not specified by
+IEEE Standard 1003.2 (POSIX.2), and should be used with caution in software
+intended to be portable to other systems. Note that a non-zero <i>rm_so</i> does
+not imply REG_NOTBOL; REG_STARTEND affects only the location of the string, not
+how it is matched.
+</P>
+<P>
+If the pattern was compiled with the REG_NOSUB flag, no data about any matched
+strings is returned. The <i>nmatch</i> and <i>pmatch</i> arguments of
+<b>regexec()</b> are ignored.
+</P>
+<P>
+If the value of <i>nmatch</i> is zero, or if the value <i>pmatch</i> is NULL,
+no data about any matched strings is returned.
+</P>
+<P>
+Otherwise,the portion of the string that was matched, and also any captured
+substrings, are returned via the <i>pmatch</i> argument, which points to an
+array of <i>nmatch</i> structures of type <i>regmatch_t</i>, containing the
+members <i>rm_so</i> and <i>rm_eo</i>. These contain the offset to the first
+character of each substring and the offset to the first character after the end
+of each substring, respectively. The 0th element of the vector relates to the
+entire portion of <i>string</i> that was matched; subsequent elements relate to
+the capturing subpatterns of the regular expression. Unused entries in the
+array have both structure members set to -1.
+</P>
+<P>
+A successful match yields a zero return; various error codes are defined in the
+header file, of which REG_NOMATCH is the "expected" failure code.
+</P>
+<br><a name="SEC6" href="#TOC1">ERROR MESSAGES</a><br>
+<P>
+The <b>regerror()</b> function maps a non-zero errorcode from either
+<b>regcomp()</b> or <b>regexec()</b> to a printable message. If <i>preg</i> is not
+NULL, the error should have arisen from the use of that structure. A message
+terminated by a binary zero is placed in <i>errbuf</i>. The length of the
+message, including the zero, is limited to <i>errbuf_size</i>. The yield of the
+function is the size of buffer needed to hold the whole message.
+</P>
+<br><a name="SEC7" href="#TOC1">MEMORY USAGE</a><br>
+<P>
+Compiling a regular expression causes memory to be allocated and associated
+with the <i>preg</i> structure. The function <b>regfree()</b> frees all such
+memory, after which <i>preg</i> may no longer be used as a compiled expression.
+</P>
+<br><a name="SEC8" href="#TOC1">AUTHOR</a><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><a name="SEC9" href="#TOC1">REVISION</a><br>
+<P>
+Last updated: 09 January 2012
+<br>
+Copyright &copy; 1997-2012 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcreprecompile.html b/doc/html/pcreprecompile.html
new file mode 100644
index 0000000..decb1d6
--- /dev/null
+++ b/doc/html/pcreprecompile.html
@@ -0,0 +1,163 @@
+<html>
+<head>
+<title>pcreprecompile specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcreprecompile man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<ul>
+<li><a name="TOC1" href="#SEC1">SAVING AND RE-USING PRECOMPILED PCRE PATTERNS</a>
+<li><a name="TOC2" href="#SEC2">SAVING A COMPILED PATTERN</a>
+<li><a name="TOC3" href="#SEC3">RE-USING A PRECOMPILED PATTERN</a>
+<li><a name="TOC4" href="#SEC4">COMPATIBILITY WITH DIFFERENT PCRE RELEASES</a>
+<li><a name="TOC5" href="#SEC5">AUTHOR</a>
+<li><a name="TOC6" href="#SEC6">REVISION</a>
+</ul>
+<br><a name="SEC1" href="#TOC1">SAVING AND RE-USING PRECOMPILED PCRE PATTERNS</a><br>
+<P>
+If you are running an application that uses a large number of regular
+expression patterns, it may be useful to store them in a precompiled form
+instead of having to compile them every time the application is run.
+If you are not using any private character tables (see the
+<a href="pcre_maketables.html"><b>pcre_maketables()</b></a>
+documentation), this is relatively straightforward. If you are using private
+tables, it is a little bit more complicated. However, if you are using the
+just-in-time optimization feature, it is not possible to save and reload the
+JIT data.
+</P>
+<P>
+If you save compiled patterns to a file, you can copy them to a different host
+and run them there. If the two hosts have different endianness (byte order),
+you should run the <b>pcre[16|32]_pattern_to_host_byte_order()</b> function on the
+new host before trying to match the pattern. The matching functions return
+PCRE_ERROR_BADENDIANNESS if they detect a pattern with the wrong endianness.
+</P>
+<P>
+Compiling regular expressions with one version of PCRE for use with a different
+version is not guaranteed to work and may cause crashes, and saving and
+restoring a compiled pattern loses any JIT optimization data.
+</P>
+<br><a name="SEC2" href="#TOC1">SAVING A COMPILED PATTERN</a><br>
+<P>
+The value returned by <b>pcre[16|32]_compile()</b> points to a single block of
+memory that holds the compiled pattern and associated data. You can find the
+length of this block in bytes by calling <b>pcre[16|32]_fullinfo()</b> with an
+argument of PCRE_INFO_SIZE. You can then save the data in any appropriate
+manner. Here is sample code for the 8-bit library that compiles a pattern and
+writes it to a file. It assumes that the variable <i>fd</i> refers to a file
+that is open for output:
+<pre>
+ int erroroffset, rc, size;
+ char *error;
+ pcre *re;
+
+ re = pcre_compile("my pattern", 0, &error, &erroroffset, NULL);
+ if (re == NULL) { ... handle errors ... }
+ rc = pcre_fullinfo(re, NULL, PCRE_INFO_SIZE, &size);
+ if (rc &#60; 0) { ... handle errors ... }
+ rc = fwrite(re, 1, size, fd);
+ if (rc != size) { ... handle errors ... }
+</pre>
+In this example, the bytes that comprise the compiled pattern are copied
+exactly. Note that this is binary data that may contain any of the 256 possible
+byte values. On systems that make a distinction between binary and non-binary
+data, be sure that the file is opened for binary output.
+</P>
+<P>
+If you want to write more than one pattern to a file, you will have to devise a
+way of separating them. For binary data, preceding each pattern with its length
+is probably the most straightforward approach. Another possibility is to write
+out the data in hexadecimal instead of binary, one pattern to a line.
+</P>
+<P>
+Saving compiled patterns in a file is only one possible way of storing them for
+later use. They could equally well be saved in a database, or in the memory of
+some daemon process that passes them via sockets to the processes that want
+them.
+</P>
+<P>
+If the pattern has been studied, it is also possible to save the normal study
+data in a similar way to the compiled pattern itself. However, if the
+PCRE_STUDY_JIT_COMPILE was used, the just-in-time data that is created cannot
+be saved because it is too dependent on the current environment. When studying
+generates additional information, <b>pcre[16|32]_study()</b> returns a pointer to a
+<b>pcre[16|32]_extra</b> data block. Its format is defined in the
+<a href="pcreapi.html#extradata">section on matching a pattern</a>
+in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+documentation. The <i>study_data</i> field points to the binary study data, and
+this is what you must save (not the <b>pcre[16|32]_extra</b> block itself). The
+length of the study data can be obtained by calling <b>pcre[16|32]_fullinfo()</b>
+with an argument of PCRE_INFO_STUDYSIZE. Remember to check that
+<b>pcre[16|32]_study()</b> did return a non-NULL value before trying to save the
+study data.
+</P>
+<br><a name="SEC3" href="#TOC1">RE-USING A PRECOMPILED PATTERN</a><br>
+<P>
+Re-using a precompiled pattern is straightforward. Having reloaded it into main
+memory, called <b>pcre[16|32]_pattern_to_host_byte_order()</b> if necessary, you
+pass its pointer to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> in
+the usual way.
+</P>
+<P>
+However, if you passed a pointer to custom character tables when the pattern
+was compiled (the <i>tableptr</i> argument of <b>pcre[16|32]_compile()</b>), you
+must now pass a similar pointer to <b>pcre[16|32]_exec()</b> or
+<b>pcre[16|32]_dfa_exec()</b>, because the value saved with the compiled pattern
+will obviously be nonsense. A field in a <b>pcre[16|32]_extra()</b> block is used
+to pass this data, as described in the
+<a href="pcreapi.html#extradata">section on matching a pattern</a>
+in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+documentation.
+</P>
+<P>
+<b>Warning:</b> The tables that <b>pcre_exec()</b> and <b>pcre_dfa_exec()</b> use
+must be the same as those that were used when the pattern was compiled. If this
+is not the case, the behaviour is undefined.
+</P>
+<P>
+If you did not provide custom character tables when the pattern was compiled,
+the pointer in the compiled pattern is NULL, which causes the matching
+functions to use PCRE's internal tables. Thus, you do not need to take any
+special action at run time in this case.
+</P>
+<P>
+If you saved study data with the compiled pattern, you need to create your own
+<b>pcre[16|32]_extra</b> data block and set the <i>study_data</i> field to point
+to the reloaded study data. You must also set the PCRE_EXTRA_STUDY_DATA bit in
+the <i>flags</i> field to indicate that study data is present. Then pass the
+<b>pcre[16|32]_extra</b> block to the matching function in the usual way. If the
+pattern was studied for just-in-time optimization, that data cannot be saved,
+and so is lost by a save/restore cycle.
+</P>
+<br><a name="SEC4" href="#TOC1">COMPATIBILITY WITH DIFFERENT PCRE RELEASES</a><br>
+<P>
+In general, it is safest to recompile all saved patterns when you update to a
+new PCRE release, though not all updates actually require this.
+</P>
+<br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><a name="SEC6" href="#TOC1">REVISION</a><br>
+<P>
+Last updated: 12 November 2013
+<br>
+Copyright &copy; 1997-2013 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcresample.html b/doc/html/pcresample.html
new file mode 100644
index 0000000..aca9184
--- /dev/null
+++ b/doc/html/pcresample.html
@@ -0,0 +1,110 @@
+<html>
+<head>
+<title>pcresample specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcresample man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+PCRE SAMPLE PROGRAM
+</b><br>
+<P>
+A simple, complete demonstration program, to get you started with using PCRE,
+is supplied in the file <i>pcredemo.c</i> in the PCRE distribution. A listing of
+this program is given in the
+<a href="pcredemo.html"><b>pcredemo</b></a>
+documentation. If you do not have a copy of the PCRE distribution, you can save
+this listing to re-create <i>pcredemo.c</i>.
+</P>
+<P>
+The demonstration program, which uses the original PCRE 8-bit library, compiles
+the regular expression that is its first argument, and matches it against the
+subject string in its second argument. No PCRE options are set, and default
+character tables are used. If matching succeeds, the program outputs the
+portion of the subject that matched, together with the contents of any captured
+substrings.
+</P>
+<P>
+If the -g option is given on the command line, the program then goes on to
+check for further matches of the same regular expression in the same subject
+string. The logic is a little bit tricky because of the possibility of matching
+an empty string. Comments in the code explain what is going on.
+</P>
+<P>
+If PCRE is installed in the standard include and library directories for your
+operating system, you should be able to compile the demonstration program using
+this command:
+<pre>
+ gcc -o pcredemo pcredemo.c -lpcre
+</pre>
+If PCRE is installed elsewhere, you may need to add additional options to the
+command line. For example, on a Unix-like system that has PCRE installed in
+<i>/usr/local</i>, you can compile the demonstration program using a command
+like this:
+<pre>
+ gcc -o pcredemo -I/usr/local/include pcredemo.c -L/usr/local/lib -lpcre
+</pre>
+In a Windows environment, if you want to statically link the program against a
+non-dll <b>pcre.a</b> file, you must uncomment the line that defines PCRE_STATIC
+before including <b>pcre.h</b>, because otherwise the <b>pcre_malloc()</b> and
+<b>pcre_free()</b> exported functions will be declared
+<b>__declspec(dllimport)</b>, with unwanted results.
+</P>
+<P>
+Once you have compiled and linked the demonstration program, you can run simple
+tests like this:
+<pre>
+ ./pcredemo 'cat|dog' 'the cat sat on the mat'
+ ./pcredemo -g 'cat|dog' 'the dog sat on the cat'
+</pre>
+Note that there is a much more comprehensive test program, called
+<a href="pcretest.html"><b>pcretest</b>,</a>
+which supports many more facilities for testing regular expressions and both
+PCRE libraries. The
+<a href="pcredemo.html"><b>pcredemo</b></a>
+program is provided as a simple coding example.
+</P>
+<P>
+If you try to run
+<a href="pcredemo.html"><b>pcredemo</b></a>
+when PCRE is not installed in the standard library directory, you may get an
+error like this on some operating systems (e.g. Solaris):
+<pre>
+ ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or directory
+</pre>
+This is caused by the way shared library support works on those systems. You
+need to add
+<pre>
+ -R/usr/local/lib
+</pre>
+(for example) to the compile command to get round this problem.
+</P>
+<br><b>
+AUTHOR
+</b><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><b>
+REVISION
+</b><br>
+<P>
+Last updated: 10 January 2012
+<br>
+Copyright &copy; 1997-2012 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcrestack.html b/doc/html/pcrestack.html
new file mode 100644
index 0000000..af6406d
--- /dev/null
+++ b/doc/html/pcrestack.html
@@ -0,0 +1,225 @@
+<html>
+<head>
+<title>pcrestack specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcrestack man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+PCRE DISCUSSION OF STACK USAGE
+</b><br>
+<P>
+When you call <b>pcre[16|32]_exec()</b>, it makes use of an internal function
+called <b>match()</b>. This calls itself recursively at branch points in the
+pattern, in order to remember the state of the match so that it can back up and
+try a different alternative if the first one fails. As matching proceeds deeper
+and deeper into the tree of possibilities, the recursion depth increases. The
+<b>match()</b> function is also called in other circumstances, for example,
+whenever a parenthesized sub-pattern is entered, and in certain cases of
+repetition.
+</P>
+<P>
+Not all calls of <b>match()</b> increase the recursion depth; for an item such
+as a* it may be called several times at the same level, after matching
+different numbers of a's. Furthermore, in a number of cases where the result of
+the recursive call would immediately be passed back as the result of the
+current call (a "tail recursion"), the function is just restarted instead.
+</P>
+<P>
+The above comments apply when <b>pcre[16|32]_exec()</b> is run in its normal
+interpretive manner. If the pattern was studied with the
+PCRE_STUDY_JIT_COMPILE option, and just-in-time compiling was successful, and
+the options passed to <b>pcre[16|32]_exec()</b> were not incompatible, the matching
+process uses the JIT-compiled code instead of the <b>match()</b> function. In
+this case, the memory requirements are handled entirely differently. See the
+<a href="pcrejit.html"><b>pcrejit</b></a>
+documentation for details.
+</P>
+<P>
+The <b>pcre[16|32]_dfa_exec()</b> function operates in an entirely different way,
+and uses recursion only when there is a regular expression recursion or
+subroutine call in the pattern. This includes the processing of assertion and
+"once-only" subpatterns, which are handled like subroutine calls. Normally,
+these are never very deep, and the limit on the complexity of
+<b>pcre[16|32]_dfa_exec()</b> is controlled by the amount of workspace it is given.
+However, it is possible to write patterns with runaway infinite recursions;
+such patterns will cause <b>pcre[16|32]_dfa_exec()</b> to run out of stack. At
+present, there is no protection against this.
+</P>
+<P>
+The comments that follow do NOT apply to <b>pcre[16|32]_dfa_exec()</b>; they are
+relevant only for <b>pcre[16|32]_exec()</b> without the JIT optimization.
+</P>
+<br><b>
+Reducing <b>pcre[16|32]_exec()</b>'s stack usage
+</b><br>
+<P>
+Each time that <b>match()</b> is actually called recursively, it uses memory
+from the process stack. For certain kinds of pattern and data, very large
+amounts of stack may be needed, despite the recognition of "tail recursion".
+You can often reduce the amount of recursion, and therefore the amount of stack
+used, by modifying the pattern that is being matched. Consider, for example,
+this pattern:
+<pre>
+ ([^&#60;]|&#60;(?!inet))+
+</pre>
+It matches from wherever it starts until it encounters "&#60;inet" or the end of
+the data, and is the kind of pattern that might be used when processing an XML
+file. Each iteration of the outer parentheses matches either one character that
+is not "&#60;" or a "&#60;" that is not followed by "inet". However, each time a
+parenthesis is processed, a recursion occurs, so this formulation uses a stack
+frame for each matched character. For a long string, a lot of stack is
+required. Consider now this rewritten pattern, which matches exactly the same
+strings:
+<pre>
+ ([^&#60;]++|&#60;(?!inet))+
+</pre>
+This uses very much less stack, because runs of characters that do not contain
+"&#60;" are "swallowed" in one item inside the parentheses. Recursion happens only
+when a "&#60;" character that is not followed by "inet" is encountered (and we
+assume this is relatively rare). A possessive quantifier is used to stop any
+backtracking into the runs of non-"&#60;" characters, but that is not related to
+stack usage.
+</P>
+<P>
+This example shows that one way of avoiding stack problems when matching long
+subject strings is to write repeated parenthesized subpatterns to match more
+than one character whenever possible.
+</P>
+<br><b>
+Compiling PCRE to use heap instead of stack for <b>pcre[16|32]_exec()</b>
+</b><br>
+<P>
+In environments where stack memory is constrained, you might want to compile
+PCRE to use heap memory instead of stack for remembering back-up points when
+<b>pcre[16|32]_exec()</b> is running. This makes it run a lot more slowly, however.
+Details of how to do this are given in the
+<a href="pcrebuild.html"><b>pcrebuild</b></a>
+documentation. When built in this way, instead of using the stack, PCRE obtains
+and frees memory by calling the functions that are pointed to by the
+<b>pcre[16|32]_stack_malloc</b> and <b>pcre[16|32]_stack_free</b> variables. By
+default, these point to <b>malloc()</b> and <b>free()</b>, but you can replace
+the pointers to cause PCRE to use your own functions. Since the block sizes are
+always the same, and are always freed in reverse order, it may be possible to
+implement customized memory handlers that are more efficient than the standard
+functions.
+</P>
+<br><b>
+Limiting <b>pcre[16|32]_exec()</b>'s stack usage
+</b><br>
+<P>
+You can set limits on the number of times that <b>match()</b> is called, both in
+total and recursively. If a limit is exceeded, <b>pcre[16|32]_exec()</b> returns an
+error code. Setting suitable limits should prevent it from running out of
+stack. The default values of the limits are very large, and unlikely ever to
+operate. They can be changed when PCRE is built, and they can also be set when
+<b>pcre[16|32]_exec()</b> is called. For details of these interfaces, see the
+<a href="pcrebuild.html"><b>pcrebuild</b></a>
+documentation and the
+<a href="pcreapi.html#extradata">section on extra data for <b>pcre[16|32]_exec()</b></a>
+in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+documentation.
+</P>
+<P>
+As a very rough rule of thumb, you should reckon on about 500 bytes per
+recursion. Thus, if you want to limit your stack usage to 8Mb, you should set
+the limit at 16000 recursions. A 64Mb stack, on the other hand, can support
+around 128000 recursions.
+</P>
+<P>
+In Unix-like environments, the <b>pcretest</b> test program has a command line
+option (<b>-S</b>) that can be used to increase the size of its stack. As long
+as the stack is large enough, another option (<b>-M</b>) can be used to find the
+smallest limits that allow a particular pattern to match a given subject
+string. This is done by calling <b>pcre[16|32]_exec()</b> repeatedly with different
+limits.
+</P>
+<br><b>
+Obtaining an estimate of stack usage
+</b><br>
+<P>
+The actual amount of stack used per recursion can vary quite a lot, depending
+on the compiler that was used to build PCRE and the optimization or debugging
+options that were set for it. The rule of thumb value of 500 bytes mentioned
+above may be larger or smaller than what is actually needed. A better
+approximation can be obtained by running this command:
+<pre>
+ pcretest -m -C
+</pre>
+The <b>-C</b> option causes <b>pcretest</b> to output information about the
+options with which PCRE was compiled. When <b>-m</b> is also given (before
+<b>-C</b>), information about stack use is given in a line like this:
+<pre>
+ Match recursion uses stack: approximate frame size = 640 bytes
+</pre>
+The value is approximate because some recursions need a bit more (up to perhaps
+16 more bytes).
+</P>
+<P>
+If the above command is given when PCRE is compiled to use the heap instead of
+the stack for recursion, the value that is output is the size of each block
+that is obtained from the heap.
+</P>
+<br><b>
+Changing stack size in Unix-like systems
+</b><br>
+<P>
+In Unix-like environments, there is not often a problem with the stack unless
+very long strings are involved, though the default limit on stack size varies
+from system to system. Values from 8Mb to 64Mb are common. You can find your
+default limit by running the command:
+<pre>
+ ulimit -s
+</pre>
+Unfortunately, the effect of running out of stack is often SIGSEGV, though
+sometimes a more explicit error message is given. You can normally increase the
+limit on stack size by code such as this:
+<pre>
+ struct rlimit rlim;
+ getrlimit(RLIMIT_STACK, &rlim);
+ rlim.rlim_cur = 100*1024*1024;
+ setrlimit(RLIMIT_STACK, &rlim);
+</pre>
+This reads the current limits (soft and hard) using <b>getrlimit()</b>, then
+attempts to increase the soft limit to 100Mb using <b>setrlimit()</b>. You must
+do this before calling <b>pcre[16|32]_exec()</b>.
+</P>
+<br><b>
+Changing stack size in Mac OS X
+</b><br>
+<P>
+Using <b>setrlimit()</b>, as described above, should also work on Mac OS X. It
+is also possible to set a stack size when linking a program. There is a
+discussion about stack sizes in Mac OS X at this web site:
+<a href="http://developer.apple.com/qa/qa2005/qa1419.html">http://developer.apple.com/qa/qa2005/qa1419.html.</a>
+</P>
+<br><b>
+AUTHOR
+</b><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><b>
+REVISION
+</b><br>
+<P>
+Last updated: 24 June 2012
+<br>
+Copyright &copy; 1997-2012 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcresyntax.html b/doc/html/pcresyntax.html
new file mode 100644
index 0000000..89f3573
--- /dev/null
+++ b/doc/html/pcresyntax.html
@@ -0,0 +1,538 @@
+<html>
+<head>
+<title>pcresyntax specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcresyntax man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<ul>
+<li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a>
+<li><a name="TOC2" href="#SEC2">QUOTING</a>
+<li><a name="TOC3" href="#SEC3">CHARACTERS</a>
+<li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
+<li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTIES FOR \p and \P</a>
+<li><a name="TOC6" href="#SEC6">PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P</a>
+<li><a name="TOC7" href="#SEC7">SCRIPT NAMES FOR \p AND \P</a>
+<li><a name="TOC8" href="#SEC8">CHARACTER CLASSES</a>
+<li><a name="TOC9" href="#SEC9">QUANTIFIERS</a>
+<li><a name="TOC10" href="#SEC10">ANCHORS AND SIMPLE ASSERTIONS</a>
+<li><a name="TOC11" href="#SEC11">MATCH POINT RESET</a>
+<li><a name="TOC12" href="#SEC12">ALTERNATION</a>
+<li><a name="TOC13" href="#SEC13">CAPTURING</a>
+<li><a name="TOC14" href="#SEC14">ATOMIC GROUPS</a>
+<li><a name="TOC15" href="#SEC15">COMMENT</a>
+<li><a name="TOC16" href="#SEC16">OPTION SETTING</a>
+<li><a name="TOC17" href="#SEC17">NEWLINE CONVENTION</a>
+<li><a name="TOC18" href="#SEC18">WHAT \R MATCHES</a>
+<li><a name="TOC19" href="#SEC19">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
+<li><a name="TOC20" href="#SEC20">BACKREFERENCES</a>
+<li><a name="TOC21" href="#SEC21">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
+<li><a name="TOC22" href="#SEC22">CONDITIONAL PATTERNS</a>
+<li><a name="TOC23" href="#SEC23">BACKTRACKING CONTROL</a>
+<li><a name="TOC24" href="#SEC24">CALLOUTS</a>
+<li><a name="TOC25" href="#SEC25">SEE ALSO</a>
+<li><a name="TOC26" href="#SEC26">AUTHOR</a>
+<li><a name="TOC27" href="#SEC27">REVISION</a>
+</ul>
+<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
+<P>
+The full syntax and semantics of the regular expressions that are supported by
+PCRE are described in the
+<a href="pcrepattern.html"><b>pcrepattern</b></a>
+documentation. This document contains a quick-reference summary of the syntax.
+</P>
+<br><a name="SEC2" href="#TOC1">QUOTING</a><br>
+<P>
+<pre>
+ \x where x is non-alphanumeric is a literal x
+ \Q...\E treat enclosed characters as literal
+</PRE>
+</P>
+<br><a name="SEC3" href="#TOC1">CHARACTERS</a><br>
+<P>
+<pre>
+ \a alarm, that is, the BEL character (hex 07)
+ \cx "control-x", where x is any ASCII character
+ \e escape (hex 1B)
+ \f form feed (hex 0C)
+ \n newline (hex 0A)
+ \r carriage return (hex 0D)
+ \t tab (hex 09)
+ \0dd character with octal code 0dd
+ \ddd character with octal code ddd, or backreference
+ \o{ddd..} character with octal code ddd..
+ \xhh character with hex code hh
+ \x{hhh..} character with hex code hhh..
+</pre>
+Note that \0dd is always an octal code, and that \8 and \9 are the literal
+characters "8" and "9".
+</P>
+<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
+<P>
+<pre>
+ . any character except newline;
+ in dotall mode, any character whatsoever
+ \C one data unit, even in UTF mode (best avoided)
+ \d a decimal digit
+ \D a character that is not a decimal digit
+ \h a horizontal white space character
+ \H a character that is not a horizontal white space character
+ \N a character that is not a newline
+ \p{<i>xx</i>} a character with the <i>xx</i> property
+ \P{<i>xx</i>} a character without the <i>xx</i> property
+ \R a newline sequence
+ \s a white space character
+ \S a character that is not a white space character
+ \v a vertical white space character
+ \V a character that is not a vertical white space character
+ \w a "word" character
+ \W a "non-word" character
+ \X a Unicode extended grapheme cluster
+</pre>
+By default, \d, \s, and \w match only ASCII characters, even in UTF-8 mode
+or in the 16- bit and 32-bit libraries. However, if locale-specific matching is
+happening, \s and \w may also match characters with code points in the range
+128-255. If the PCRE_UCP option is set, the behaviour of these escape sequences
+is changed to use Unicode properties and they match many more characters.
+</P>
+<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br>
+<P>
+<pre>
+ C Other
+ Cc Control
+ Cf Format
+ Cn Unassigned
+ Co Private use
+ Cs Surrogate
+
+ L Letter
+ Ll Lower case letter
+ Lm Modifier letter
+ Lo Other letter
+ Lt Title case letter
+ Lu Upper case letter
+ L& Ll, Lu, or Lt
+
+ M Mark
+ Mc Spacing mark
+ Me Enclosing mark
+ Mn Non-spacing mark
+
+ N Number
+ Nd Decimal number
+ Nl Letter number
+ No Other number
+
+ P Punctuation
+ Pc Connector punctuation
+ Pd Dash punctuation
+ Pe Close punctuation
+ Pf Final punctuation
+ Pi Initial punctuation
+ Po Other punctuation
+ Ps Open punctuation
+
+ S Symbol
+ Sc Currency symbol
+ Sk Modifier symbol
+ Sm Mathematical symbol
+ So Other symbol
+
+ Z Separator
+ Zl Line separator
+ Zp Paragraph separator
+ Zs Space separator
+</PRE>
+</P>
+<br><a name="SEC6" href="#TOC1">PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P</a><br>
+<P>
+<pre>
+ Xan Alphanumeric: union of properties L and N
+ Xps POSIX space: property Z or tab, NL, VT, FF, CR
+ Xsp Perl space: property Z or tab, NL, VT, FF, CR
+ Xuc Univerally-named character: one that can be
+ represented by a Universal Character Name
+ Xwd Perl word: property Xan or underscore
+</pre>
+Perl and POSIX space are now the same. Perl added VT to its space character set
+at release 5.18 and PCRE changed at release 8.34.
+</P>
+<br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
+<P>
+Arabic,
+Armenian,
+Avestan,
+Balinese,
+Bamum,
+Batak,
+Bengali,
+Bopomofo,
+Brahmi,
+Braille,
+Buginese,
+Buhid,
+Canadian_Aboriginal,
+Carian,
+Chakma,
+Cham,
+Cherokee,
+Common,
+Coptic,
+Cuneiform,
+Cypriot,
+Cyrillic,
+Deseret,
+Devanagari,
+Egyptian_Hieroglyphs,
+Ethiopic,
+Georgian,
+Glagolitic,
+Gothic,
+Greek,
+Gujarati,
+Gurmukhi,
+Han,
+Hangul,
+Hanunoo,
+Hebrew,
+Hiragana,
+Imperial_Aramaic,
+Inherited,
+Inscriptional_Pahlavi,
+Inscriptional_Parthian,
+Javanese,
+Kaithi,
+Kannada,
+Katakana,
+Kayah_Li,
+Kharoshthi,
+Khmer,
+Lao,
+Latin,
+Lepcha,
+Limbu,
+Linear_B,
+Lisu,
+Lycian,
+Lydian,
+Malayalam,
+Mandaic,
+Meetei_Mayek,
+Meroitic_Cursive,
+Meroitic_Hieroglyphs,
+Miao,
+Mongolian,
+Myanmar,
+New_Tai_Lue,
+Nko,
+Ogham,
+Old_Italic,
+Old_Persian,
+Old_South_Arabian,
+Old_Turkic,
+Ol_Chiki,
+Oriya,
+Osmanya,
+Phags_Pa,
+Phoenician,
+Rejang,
+Runic,
+Samaritan,
+Saurashtra,
+Sharada,
+Shavian,
+Sinhala,
+Sora_Sompeng,
+Sundanese,
+Syloti_Nagri,
+Syriac,
+Tagalog,
+Tagbanwa,
+Tai_Le,
+Tai_Tham,
+Tai_Viet,
+Takri,
+Tamil,
+Telugu,
+Thaana,
+Thai,
+Tibetan,
+Tifinagh,
+Ugaritic,
+Vai,
+Yi.
+</P>
+<br><a name="SEC8" href="#TOC1">CHARACTER CLASSES</a><br>
+<P>
+<pre>
+ [...] positive character class
+ [^...] negative character class
+ [x-y] range (can be used for hex characters)
+ [[:xxx:]] positive POSIX named set
+ [[:^xxx:]] negative POSIX named set
+
+ alnum alphanumeric
+ alpha alphabetic
+ ascii 0-127
+ blank space or tab
+ cntrl control character
+ digit decimal digit
+ graph printing, excluding space
+ lower lower case letter
+ print printing, including space
+ punct printing, excluding alphanumeric
+ space white space
+ upper upper case letter
+ word same as \w
+ xdigit hexadecimal digit
+</pre>
+In PCRE, POSIX character set names recognize only ASCII characters by default,
+but some of them use Unicode properties if PCRE_UCP is set. You can use
+\Q...\E inside a character class.
+</P>
+<br><a name="SEC9" href="#TOC1">QUANTIFIERS</a><br>
+<P>
+<pre>
+ ? 0 or 1, greedy
+ ?+ 0 or 1, possessive
+ ?? 0 or 1, lazy
+ * 0 or more, greedy
+ *+ 0 or more, possessive
+ *? 0 or more, lazy
+ + 1 or more, greedy
+ ++ 1 or more, possessive
+ +? 1 or more, lazy
+ {n} exactly n
+ {n,m} at least n, no more than m, greedy
+ {n,m}+ at least n, no more than m, possessive
+ {n,m}? at least n, no more than m, lazy
+ {n,} n or more, greedy
+ {n,}+ n or more, possessive
+ {n,}? n or more, lazy
+</PRE>
+</P>
+<br><a name="SEC10" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
+<P>
+<pre>
+ \b word boundary
+ \B not a word boundary
+ ^ start of subject
+ also after internal newline in multiline mode
+ \A start of subject
+ $ end of subject
+ also before newline at end of subject
+ also before internal newline in multiline mode
+ \Z end of subject
+ also before newline at end of subject
+ \z end of subject
+ \G first matching position in subject
+</PRE>
+</P>
+<br><a name="SEC11" href="#TOC1">MATCH POINT RESET</a><br>
+<P>
+<pre>
+ \K reset start of match
+</pre>
+\K is honoured in positive assertions, but ignored in negative ones.
+</P>
+<br><a name="SEC12" href="#TOC1">ALTERNATION</a><br>
+<P>
+<pre>
+ expr|expr|expr...
+</PRE>
+</P>
+<br><a name="SEC13" href="#TOC1">CAPTURING</a><br>
+<P>
+<pre>
+ (...) capturing group
+ (?&#60;name&#62;...) named capturing group (Perl)
+ (?'name'...) named capturing group (Perl)
+ (?P&#60;name&#62;...) named capturing group (Python)
+ (?:...) non-capturing group
+ (?|...) non-capturing group; reset group numbers for
+ capturing groups in each alternative
+</PRE>
+</P>
+<br><a name="SEC14" href="#TOC1">ATOMIC GROUPS</a><br>
+<P>
+<pre>
+ (?&#62;...) atomic, non-capturing group
+</PRE>
+</P>
+<br><a name="SEC15" href="#TOC1">COMMENT</a><br>
+<P>
+<pre>
+ (?#....) comment (not nestable)
+</PRE>
+</P>
+<br><a name="SEC16" href="#TOC1">OPTION SETTING</a><br>
+<P>
+<pre>
+ (?i) caseless
+ (?J) allow duplicate names
+ (?m) multiline
+ (?s) single line (dotall)
+ (?U) default ungreedy (lazy)
+ (?x) extended (ignore white space)
+ (?-...) unset option(s)
+</pre>
+The following are recognized only at the very start of a pattern or after one
+of the newline or \R options with similar syntax. More than one of them may
+appear.
+<pre>
+ (*LIMIT_MATCH=d) set the match limit to d (decimal number)
+ (*LIMIT_RECURSION=d) set the recursion limit to d (decimal number)
+ (*NO_AUTO_POSSESS) no auto-possessification (PCRE_NO_AUTO_POSSESS)
+ (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
+ (*UTF8) set UTF-8 mode: 8-bit library (PCRE_UTF8)
+ (*UTF16) set UTF-16 mode: 16-bit library (PCRE_UTF16)
+ (*UTF32) set UTF-32 mode: 32-bit library (PCRE_UTF32)
+ (*UTF) set appropriate UTF mode for the library in use
+ (*UCP) set PCRE_UCP (use Unicode properties for \d etc)
+</pre>
+Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
+limits set by the caller of pcre_exec(), not increase them.
+</P>
+<br><a name="SEC17" href="#TOC1">NEWLINE CONVENTION</a><br>
+<P>
+These are recognized only at the very start of the pattern or after option
+settings with a similar syntax.
+<pre>
+ (*CR) carriage return only
+ (*LF) linefeed only
+ (*CRLF) carriage return followed by linefeed
+ (*ANYCRLF) all three of the above
+ (*ANY) any Unicode newline sequence
+</PRE>
+</P>
+<br><a name="SEC18" href="#TOC1">WHAT \R MATCHES</a><br>
+<P>
+These are recognized only at the very start of the pattern or after option
+setting with a similar syntax.
+<pre>
+ (*BSR_ANYCRLF) CR, LF, or CRLF
+ (*BSR_UNICODE) any Unicode newline sequence
+</PRE>
+</P>
+<br><a name="SEC19" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
+<P>
+<pre>
+ (?=...) positive look ahead
+ (?!...) negative look ahead
+ (?&#60;=...) positive look behind
+ (?&#60;!...) negative look behind
+</pre>
+Each top-level branch of a look behind must be of a fixed length.
+</P>
+<br><a name="SEC20" href="#TOC1">BACKREFERENCES</a><br>
+<P>
+<pre>
+ \n reference by number (can be ambiguous)
+ \gn reference by number
+ \g{n} reference by number
+ \g{-n} relative reference by number
+ \k&#60;name&#62; reference by name (Perl)
+ \k'name' reference by name (Perl)
+ \g{name} reference by name (Perl)
+ \k{name} reference by name (.NET)
+ (?P=name) reference by name (Python)
+</PRE>
+</P>
+<br><a name="SEC21" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
+<P>
+<pre>
+ (?R) recurse whole pattern
+ (?n) call subpattern by absolute number
+ (?+n) call subpattern by relative number
+ (?-n) call subpattern by relative number
+ (?&name) call subpattern by name (Perl)
+ (?P&#62;name) call subpattern by name (Python)
+ \g&#60;name&#62; call subpattern by name (Oniguruma)
+ \g'name' call subpattern by name (Oniguruma)
+ \g&#60;n&#62; call subpattern by absolute number (Oniguruma)
+ \g'n' call subpattern by absolute number (Oniguruma)
+ \g&#60;+n&#62; call subpattern by relative number (PCRE extension)
+ \g'+n' call subpattern by relative number (PCRE extension)
+ \g&#60;-n&#62; call subpattern by relative number (PCRE extension)
+ \g'-n' call subpattern by relative number (PCRE extension)
+</PRE>
+</P>
+<br><a name="SEC22" href="#TOC1">CONDITIONAL PATTERNS</a><br>
+<P>
+<pre>
+ (?(condition)yes-pattern)
+ (?(condition)yes-pattern|no-pattern)
+
+ (?(n)... absolute reference condition
+ (?(+n)... relative reference condition
+ (?(-n)... relative reference condition
+ (?(&#60;name&#62;)... named reference condition (Perl)
+ (?('name')... named reference condition (Perl)
+ (?(name)... named reference condition (PCRE)
+ (?(R)... overall recursion condition
+ (?(Rn)... specific group recursion condition
+ (?(R&name)... specific recursion condition
+ (?(DEFINE)... define subpattern for reference
+ (?(assert)... assertion condition
+</PRE>
+</P>
+<br><a name="SEC23" href="#TOC1">BACKTRACKING CONTROL</a><br>
+<P>
+The following act immediately they are reached:
+<pre>
+ (*ACCEPT) force successful match
+ (*FAIL) force backtrack; synonym (*F)
+ (*MARK:NAME) set name to be passed back; synonym (*:NAME)
+</pre>
+The following act only when a subsequent match failure causes a backtrack to
+reach them. They all force a match failure, but they differ in what happens
+afterwards. Those that advance the start-of-match point do so only if the
+pattern is not anchored.
+<pre>
+ (*COMMIT) overall failure, no advance of starting point
+ (*PRUNE) advance to next starting character
+ (*PRUNE:NAME) equivalent to (*MARK:NAME)(*PRUNE)
+ (*SKIP) advance to current matching position
+ (*SKIP:NAME) advance to position corresponding to an earlier
+ (*MARK:NAME); if not found, the (*SKIP) is ignored
+ (*THEN) local failure, backtrack to next alternation
+ (*THEN:NAME) equivalent to (*MARK:NAME)(*THEN)
+</PRE>
+</P>
+<br><a name="SEC24" href="#TOC1">CALLOUTS</a><br>
+<P>
+<pre>
+ (?C) callout
+ (?Cn) callout with data n
+</PRE>
+</P>
+<br><a name="SEC25" href="#TOC1">SEE ALSO</a><br>
+<P>
+<b>pcrepattern</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3),
+<b>pcrematching</b>(3), <b>pcre</b>(3).
+</P>
+<br><a name="SEC26" href="#TOC1">AUTHOR</a><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><a name="SEC27" href="#TOC1">REVISION</a><br>
+<P>
+Last updated: 08 January 2014
+<br>
+Copyright &copy; 1997-2014 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcretest.html b/doc/html/pcretest.html
new file mode 100644
index 0000000..839fabf
--- /dev/null
+++ b/doc/html/pcretest.html
@@ -0,0 +1,1158 @@
+<html>
+<head>
+<title>pcretest specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcretest man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<ul>
+<li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
+<li><a name="TOC2" href="#SEC2">INPUT DATA FORMAT</a>
+<li><a name="TOC3" href="#SEC3">PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a>
+<li><a name="TOC4" href="#SEC4">COMMAND LINE OPTIONS</a>
+<li><a name="TOC5" href="#SEC5">DESCRIPTION</a>
+<li><a name="TOC6" href="#SEC6">PATTERN MODIFIERS</a>
+<li><a name="TOC7" href="#SEC7">DATA LINES</a>
+<li><a name="TOC8" href="#SEC8">THE ALTERNATIVE MATCHING FUNCTION</a>
+<li><a name="TOC9" href="#SEC9">DEFAULT OUTPUT FROM PCRETEST</a>
+<li><a name="TOC10" href="#SEC10">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a>
+<li><a name="TOC11" href="#SEC11">RESTARTING AFTER A PARTIAL MATCH</a>
+<li><a name="TOC12" href="#SEC12">CALLOUTS</a>
+<li><a name="TOC13" href="#SEC13">NON-PRINTING CHARACTERS</a>
+<li><a name="TOC14" href="#SEC14">SAVING AND RELOADING COMPILED PATTERNS</a>
+<li><a name="TOC15" href="#SEC15">SEE ALSO</a>
+<li><a name="TOC16" href="#SEC16">AUTHOR</a>
+<li><a name="TOC17" href="#SEC17">REVISION</a>
+</ul>
+<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
+<P>
+<b>pcretest [options] [input file [output file]]</b>
+<br>
+<br>
+<b>pcretest</b> was written as a test program for the PCRE regular expression
+library itself, but it can also be used for experimenting with regular
+expressions. This document describes the features of the test program; for
+details of the regular expressions themselves, see the
+<a href="pcrepattern.html"><b>pcrepattern</b></a>
+documentation. For details of the PCRE library function calls and their
+options, see the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+,
+<a href="pcre16.html"><b>pcre16</b></a>
+and
+<a href="pcre32.html"><b>pcre32</b></a>
+documentation.
+</P>
+<P>
+The input for <b>pcretest</b> is a sequence of regular expression patterns and
+strings to be matched, as described below. The output shows the result of each
+match. Options on the command line and the patterns control PCRE options and
+exactly what is output.
+</P>
+<P>
+As PCRE has evolved, it has acquired many different features, and as a result,
+<b>pcretest</b> now has rather a lot of obscure options for testing every
+possible feature. Some of these options are specifically designed for use in
+conjunction with the test script and data files that are distributed as part of
+PCRE, and are unlikely to be of use otherwise. They are all documented here,
+but without much justification.
+</P>
+<br><a name="SEC2" href="#TOC1">INPUT DATA FORMAT</a><br>
+<P>
+Input to <b>pcretest</b> is processed line by line, either by calling the C
+library's <b>fgets()</b> function, or via the <b>libreadline</b> library (see
+below). In Unix-like environments, <b>fgets()</b> treats any bytes other than
+newline as data characters. However, in some Windows environments character 26
+(hex 1A) causes an immediate end of file, and no further data is read. For
+maximum portability, therefore, it is safest to use only ASCII characters in
+<b>pcretest</b> input files.
+</P>
+<br><a name="SEC3" href="#TOC1">PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br>
+<P>
+From release 8.30, two separate PCRE libraries can be built. The original one
+supports 8-bit character strings, whereas the newer 16-bit library supports
+character strings encoded in 16-bit units. From release 8.32, a third library
+can be built, supporting character strings encoded in 32-bit units. The
+<b>pcretest</b> program can be used to test all three libraries. However, it is
+itself still an 8-bit program, reading 8-bit input and writing 8-bit output.
+When testing the 16-bit or 32-bit library, the patterns and data strings are
+converted to 16- or 32-bit format before being passed to the PCRE library
+functions. Results are converted to 8-bit for output.
+</P>
+<P>
+References to functions and structures of the form <b>pcre[16|32]_xx</b> below
+mean "<b>pcre_xx</b> when using the 8-bit library, <b>pcre16_xx</b> when using
+the 16-bit library, or <b>pcre32_xx</b> when using the 32-bit library".
+</P>
+<br><a name="SEC4" href="#TOC1">COMMAND LINE OPTIONS</a><br>
+<P>
+<b>-8</b>
+If both the 8-bit library has been built, this option causes the 8-bit library
+to be used (which is the default); if the 8-bit library has not been built,
+this option causes an error.
+</P>
+<P>
+<b>-16</b>
+If both the 8-bit or the 32-bit, and the 16-bit libraries have been built, this
+option causes the 16-bit library to be used. If only the 16-bit library has been
+built, this is the default (so has no effect). If only the 8-bit or the 32-bit
+library has been built, this option causes an error.
+</P>
+<P>
+<b>-32</b>
+If both the 8-bit or the 16-bit, and the 32-bit libraries have been built, this
+option causes the 32-bit library to be used. If only the 32-bit library has been
+built, this is the default (so has no effect). If only the 8-bit or the 16-bit
+library has been built, this option causes an error.
+</P>
+<P>
+<b>-b</b>
+Behave as if each pattern has the <b>/B</b> (show byte code) modifier; the
+internal form is output after compilation.
+</P>
+<P>
+<b>-C</b>
+Output the version number of the PCRE library, and all available information
+about the optional features that are included, and then exit with zero exit
+code. All other options are ignored.
+</P>
+<P>
+<b>-C</b> <i>option</i>
+Output information about a specific build-time option, then exit. This
+functionality is intended for use in scripts such as <b>RunTest</b>. The
+following options output the value and set the exit code as indicated:
+<pre>
+ ebcdic-nl the code for LF (= NL) in an EBCDIC environment:
+ 0x15 or 0x25
+ 0 if used in an ASCII environment
+ exit code is always 0
+ linksize the configured internal link size (2, 3, or 4)
+ exit code is set to the link size
+ newline the default newline setting:
+ CR, LF, CRLF, ANYCRLF, or ANY
+ exit code is always 0
+ bsr the default setting for what \R matches:
+ ANYCRLF or ANY
+ exit code is always 0
+</pre>
+The following options output 1 for true or 0 for false, and set the exit code
+to the same value:
+<pre>
+ ebcdic compiled for an EBCDIC environment
+ jit just-in-time support is available
+ pcre16 the 16-bit library was built
+ pcre32 the 32-bit library was built
+ pcre8 the 8-bit library was built
+ ucp Unicode property support is available
+ utf UTF-8 and/or UTF-16 and/or UTF-32 support
+ is available
+</pre>
+If an unknown option is given, an error message is output; the exit code is 0.
+</P>
+<P>
+<b>-d</b>
+Behave as if each pattern has the <b>/D</b> (debug) modifier; the internal
+form and information about the compiled pattern is output after compilation;
+<b>-d</b> is equivalent to <b>-b -i</b>.
+</P>
+<P>
+<b>-dfa</b>
+Behave as if each data line contains the \D escape sequence; this causes the
+alternative matching function, <b>pcre[16|32]_dfa_exec()</b>, to be used instead
+of the standard <b>pcre[16|32]_exec()</b> function (more detail is given below).
+</P>
+<P>
+<b>-help</b>
+Output a brief summary these options and then exit.
+</P>
+<P>
+<b>-i</b>
+Behave as if each pattern has the <b>/I</b> modifier; information about the
+compiled pattern is given after compilation.
+</P>
+<P>
+<b>-M</b>
+Behave as if each data line contains the \M escape sequence; this causes
+PCRE to discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings by
+calling <b>pcre[16|32]_exec()</b> repeatedly with different limits.
+</P>
+<P>
+<b>-m</b>
+Output the size of each compiled pattern after it has been compiled. This is
+equivalent to adding <b>/M</b> to each regular expression. The size is given in
+bytes for both libraries.
+</P>
+<P>
+<b>-O</b>
+Behave as if each pattern has the <b>/O</b> modifier, that is disable
+auto-possessification for all patterns.
+</P>
+<P>
+<b>-o</b> <i>osize</i>
+Set the number of elements in the output vector that is used when calling
+<b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b> to be <i>osize</i>. The
+default value is 45, which is enough for 14 capturing subexpressions for
+<b>pcre[16|32]_exec()</b> or 22 different matches for
+<b>pcre[16|32]_dfa_exec()</b>.
+The vector size can be changed for individual matching calls by including \O
+in the data line (see below).
+</P>
+<P>
+<b>-p</b>
+Behave as if each pattern has the <b>/P</b> modifier; the POSIX wrapper API is
+used to call PCRE. None of the other options has any effect when <b>-p</b> is
+set. This option can be used only with the 8-bit library.
+</P>
+<P>
+<b>-q</b>
+Do not output the version number of <b>pcretest</b> at the start of execution.
+</P>
+<P>
+<b>-S</b> <i>size</i>
+On Unix-like systems, set the size of the run-time stack to <i>size</i>
+megabytes.
+</P>
+<P>
+<b>-s</b> or <b>-s+</b>
+Behave as if each pattern has the <b>/S</b> modifier; in other words, force each
+pattern to be studied. If <b>-s+</b> is used, all the JIT compile options are
+passed to <b>pcre[16|32]_study()</b>, causing just-in-time optimization to be set
+up if it is available, for both full and partial matching. Specific JIT compile
+options can be selected by following <b>-s+</b> with a digit in the range 1 to
+7, which selects the JIT compile modes as follows:
+<pre>
+ 1 normal match only
+ 2 soft partial match only
+ 3 normal match and soft partial match
+ 4 hard partial match only
+ 6 soft and hard partial match
+ 7 all three modes (default)
+</pre>
+If <b>-s++</b> is used instead of <b>-s+</b> (with or without a following digit),
+the text "(JIT)" is added to the first output line after a match or no match
+when JIT-compiled code was actually used.
+<br>
+<br>
+Note that there are pattern options that can override <b>-s</b>, either
+specifying no studying at all, or suppressing JIT compilation.
+<br>
+<br>
+If the <b>/I</b> or <b>/D</b> option is present on a pattern (requesting output
+about the compiled pattern), information about the result of studying is not
+included when studying is caused only by <b>-s</b> and neither <b>-i</b> nor
+<b>-d</b> is present on the command line. This behaviour means that the output
+from tests that are run with and without <b>-s</b> should be identical, except
+when options that output information about the actual running of a match are
+set.
+<br>
+<br>
+The <b>-M</b>, <b>-t</b>, and <b>-tm</b> options, which give information about
+resources used, are likely to produce different output with and without
+<b>-s</b>. Output may also differ if the <b>/C</b> option is present on an
+individual pattern. This uses callouts to trace the the matching process, and
+this may be different between studied and non-studied patterns. If the pattern
+contains (*MARK) items there may also be differences, for the same reason. The
+<b>-s</b> command line option can be overridden for specific patterns that
+should never be studied (see the <b>/S</b> pattern modifier below).
+</P>
+<P>
+<b>-t</b>
+Run each compile, study, and match many times with a timer, and output the
+resulting times per compile, study, or match (in milliseconds). Do not set
+<b>-m</b> with <b>-t</b>, because you will then get the size output a zillion
+times, and the timing will be distorted. You can control the number of
+iterations that are used for timing by following <b>-t</b> with a number (as a
+separate item on the command line). For example, "-t 1000" iterates 1000 times.
+The default is to iterate 500000 times.
+</P>
+<P>
+<b>-tm</b>
+This is like <b>-t</b> except that it times only the matching phase, not the
+compile or study phases.
+</P>
+<P>
+<b>-T</b> <b>-TM</b>
+These behave like <b>-t</b> and <b>-tm</b>, but in addition, at the end of a run,
+the total times for all compiles, studies, and matches are output.
+</P>
+<br><a name="SEC5" href="#TOC1">DESCRIPTION</a><br>
+<P>
+If <b>pcretest</b> is given two filename arguments, it reads from the first and
+writes to the second. If it is given only one filename argument, it reads from
+that file and writes to stdout. Otherwise, it reads from stdin and writes to
+stdout, and prompts for each line of input, using "re&#62;" to prompt for regular
+expressions, and "data&#62;" to prompt for data lines.
+</P>
+<P>
+When <b>pcretest</b> is built, a configuration option can specify that it should
+be linked with the <b>libreadline</b> library. When this is done, if the input
+is from a terminal, it is read using the <b>readline()</b> function. This
+provides line-editing and history facilities. The output from the <b>-help</b>
+option states whether or not <b>readline()</b> will be used.
+</P>
+<P>
+The program handles any number of sets of input on a single input file. Each
+set starts with a regular expression, and continues with any number of data
+lines to be matched against that pattern.
+</P>
+<P>
+Each data line is matched separately and independently. If you want to do
+multi-line matches, you have to use the \n escape sequence (or \r or \r\n,
+etc., depending on the newline setting) in a single line of input to encode the
+newline sequences. There is no limit on the length of data lines; the input
+buffer is automatically extended if it is too small.
+</P>
+<P>
+An empty line signals the end of the data lines, at which point a new regular
+expression is read. The regular expressions are given enclosed in any
+non-alphanumeric delimiters other than backslash, for example:
+<pre>
+ /(a|bc)x+yz/
+</pre>
+White space before the initial delimiter is ignored. A regular expression may
+be continued over several input lines, in which case the newline characters are
+included within it. It is possible to include the delimiter within the pattern
+by escaping it, for example
+<pre>
+ /abc\/def/
+</pre>
+If you do so, the escape and the delimiter form part of the pattern, but since
+delimiters are always non-alphanumeric, this does not affect its interpretation.
+If the terminating delimiter is immediately followed by a backslash, for
+example,
+<pre>
+ /abc/\
+</pre>
+then a backslash is added to the end of the pattern. This is done to provide a
+way of testing the error condition that arises if a pattern finishes with a
+backslash, because
+<pre>
+ /abc\/
+</pre>
+is interpreted as the first line of a pattern that starts with "abc/", causing
+pcretest to read the next line as a continuation of the regular expression.
+</P>
+<br><a name="SEC6" href="#TOC1">PATTERN MODIFIERS</a><br>
+<P>
+A pattern may be followed by any number of modifiers, which are mostly single
+characters, though some of these can be qualified by further characters.
+Following Perl usage, these are referred to below as, for example, "the
+<b>/i</b> modifier", even though the delimiter of the pattern need not always be
+a slash, and no slash is used when writing modifiers. White space may appear
+between the final pattern delimiter and the first modifier, and between the
+modifiers themselves. For reference, here is a complete list of modifiers. They
+fall into several groups that are described in detail in the following
+sections.
+<pre>
+ <b>/8</b> set UTF mode
+ <b>/9</b> set PCRE_NEVER_UTF (locks out UTF mode)
+ <b>/?</b> disable UTF validity check
+ <b>/+</b> show remainder of subject after match
+ <b>/=</b> show all captures (not just those that are set)
+
+ <b>/A</b> set PCRE_ANCHORED
+ <b>/B</b> show compiled code
+ <b>/C</b> set PCRE_AUTO_CALLOUT
+ <b>/D</b> same as <b>/B</b> plus <b>/I</b>
+ <b>/E</b> set PCRE_DOLLAR_ENDONLY
+ <b>/F</b> flip byte order in compiled pattern
+ <b>/f</b> set PCRE_FIRSTLINE
+ <b>/G</b> find all matches (shorten string)
+ <b>/g</b> find all matches (use startoffset)
+ <b>/I</b> show information about pattern
+ <b>/i</b> set PCRE_CASELESS
+ <b>/J</b> set PCRE_DUPNAMES
+ <b>/K</b> show backtracking control names
+ <b>/L</b> set locale
+ <b>/M</b> show compiled memory size
+ <b>/m</b> set PCRE_MULTILINE
+ <b>/N</b> set PCRE_NO_AUTO_CAPTURE
+ <b>/O</b> set PCRE_NO_AUTO_POSSESS
+ <b>/P</b> use the POSIX wrapper
+ <b>/Q</b> test external stack check function
+ <b>/S</b> study the pattern after compilation
+ <b>/s</b> set PCRE_DOTALL
+ <b>/T</b> select character tables
+ <b>/U</b> set PCRE_UNGREEDY
+ <b>/W</b> set PCRE_UCP
+ <b>/X</b> set PCRE_EXTRA
+ <b>/x</b> set PCRE_EXTENDED
+ <b>/Y</b> set PCRE_NO_START_OPTIMIZE
+ <b>/Z</b> don't show lengths in <b>/B</b> output
+
+ <b>/&#60;any&#62;</b> set PCRE_NEWLINE_ANY
+ <b>/&#60;anycrlf&#62;</b> set PCRE_NEWLINE_ANYCRLF
+ <b>/&#60;cr&#62;</b> set PCRE_NEWLINE_CR
+ <b>/&#60;crlf&#62;</b> set PCRE_NEWLINE_CRLF
+ <b>/&#60;lf&#62;</b> set PCRE_NEWLINE_LF
+ <b>/&#60;bsr_anycrlf&#62;</b> set PCRE_BSR_ANYCRLF
+ <b>/&#60;bsr_unicode&#62;</b> set PCRE_BSR_UNICODE
+ <b>/&#60;JS&#62;</b> set PCRE_JAVASCRIPT_COMPAT
+
+</PRE>
+</P>
+<br><b>
+Perl-compatible modifiers
+</b><br>
+<P>
+The <b>/i</b>, <b>/m</b>, <b>/s</b>, and <b>/x</b> modifiers set the PCRE_CASELESS,
+PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when
+<b>pcre[16|32]_compile()</b> is called. These four modifier letters have the same
+effect as they do in Perl. For example:
+<pre>
+ /caseless/i
+
+</PRE>
+</P>
+<br><b>
+Modifiers for other PCRE options
+</b><br>
+<P>
+The following table shows additional modifiers for setting PCRE compile-time
+options that do not correspond to anything in Perl:
+<pre>
+ <b>/8</b> PCRE_UTF8 ) when using the 8-bit
+ <b>/?</b> PCRE_NO_UTF8_CHECK ) library
+
+ <b>/8</b> PCRE_UTF16 ) when using the 16-bit
+ <b>/?</b> PCRE_NO_UTF16_CHECK ) library
+
+ <b>/8</b> PCRE_UTF32 ) when using the 32-bit
+ <b>/?</b> PCRE_NO_UTF32_CHECK ) library
+
+ <b>/9</b> PCRE_NEVER_UTF
+ <b>/A</b> PCRE_ANCHORED
+ <b>/C</b> PCRE_AUTO_CALLOUT
+ <b>/E</b> PCRE_DOLLAR_ENDONLY
+ <b>/f</b> PCRE_FIRSTLINE
+ <b>/J</b> PCRE_DUPNAMES
+ <b>/N</b> PCRE_NO_AUTO_CAPTURE
+ <b>/O</b> PCRE_NO_AUTO_POSSESS
+ <b>/U</b> PCRE_UNGREEDY
+ <b>/W</b> PCRE_UCP
+ <b>/X</b> PCRE_EXTRA
+ <b>/Y</b> PCRE_NO_START_OPTIMIZE
+ <b>/&#60;any&#62;</b> PCRE_NEWLINE_ANY
+ <b>/&#60;anycrlf&#62;</b> PCRE_NEWLINE_ANYCRLF
+ <b>/&#60;cr&#62;</b> PCRE_NEWLINE_CR
+ <b>/&#60;crlf&#62;</b> PCRE_NEWLINE_CRLF
+ <b>/&#60;lf&#62;</b> PCRE_NEWLINE_LF
+ <b>/&#60;bsr_anycrlf&#62;</b> PCRE_BSR_ANYCRLF
+ <b>/&#60;bsr_unicode&#62;</b> PCRE_BSR_UNICODE
+ <b>/&#60;JS&#62;</b> PCRE_JAVASCRIPT_COMPAT
+</pre>
+The modifiers that are enclosed in angle brackets are literal strings as shown,
+including the angle brackets, but the letters within can be in either case.
+This example sets multiline matching with CRLF as the line ending sequence:
+<pre>
+ /^abc/m&#60;CRLF&#62;
+</pre>
+As well as turning on the PCRE_UTF8/16/32 option, the <b>/8</b> modifier causes
+all non-printing characters in output strings to be printed using the
+\x{hh...} notation. Otherwise, those less than 0x100 are output in hex without
+the curly brackets.
+</P>
+<P>
+Full details of the PCRE options are given in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+documentation.
+</P>
+<br><b>
+Finding all matches in a string
+</b><br>
+<P>
+Searching for all possible matches within each subject string can be requested
+by the <b>/g</b> or <b>/G</b> modifier. After finding a match, PCRE is called
+again to search the remainder of the subject string. The difference between
+<b>/g</b> and <b>/G</b> is that the former uses the <i>startoffset</i> argument to
+<b>pcre[16|32]_exec()</b> to start searching at a new point within the entire
+string (which is in effect what Perl does), whereas the latter passes over a
+shortened substring. This makes a difference to the matching process if the
+pattern begins with a lookbehind assertion (including \b or \B).
+</P>
+<P>
+If any call to <b>pcre[16|32]_exec()</b> in a <b>/g</b> or <b>/G</b> sequence matches
+an empty string, the next call is done with the PCRE_NOTEMPTY_ATSTART and
+PCRE_ANCHORED flags set in order to search for another, non-empty, match at the
+same point. If this second match fails, the start offset is advanced, and the
+normal match is retried. This imitates the way Perl handles such cases when
+using the <b>/g</b> modifier or the <b>split()</b> function. Normally, the start
+offset is advanced by one character, but if the newline convention recognizes
+CRLF as a newline, and the current character is CR followed by LF, an advance
+of two is used.
+</P>
+<br><b>
+Other modifiers
+</b><br>
+<P>
+There are yet more modifiers for controlling the way <b>pcretest</b>
+operates.
+</P>
+<P>
+The <b>/+</b> modifier requests that as well as outputting the substring that
+matched the entire pattern, <b>pcretest</b> should in addition output the
+remainder of the subject string. This is useful for tests where the subject
+contains multiple copies of the same substring. If the <b>+</b> modifier appears
+twice, the same action is taken for captured substrings. In each case the
+remainder is output on the following line with a plus character following the
+capture number. Note that this modifier must not immediately follow the /S
+modifier because /S+ and /S++ have other meanings.
+</P>
+<P>
+The <b>/=</b> modifier requests that the values of all potential captured
+parentheses be output after a match. By default, only those up to the highest
+one actually used in the match are output (corresponding to the return code
+from <b>pcre[16|32]_exec()</b>). Values in the offsets vector corresponding to
+higher numbers should be set to -1, and these are output as "&#60;unset&#62;". This
+modifier gives a way of checking that this is happening.
+</P>
+<P>
+The <b>/B</b> modifier is a debugging feature. It requests that <b>pcretest</b>
+output a representation of the compiled code after compilation. Normally this
+information contains length and offset values; however, if <b>/Z</b> is also
+present, this data is replaced by spaces. This is a special feature for use in
+the automatic test scripts; it ensures that the same output is generated for
+different internal link sizes.
+</P>
+<P>
+The <b>/D</b> modifier is a PCRE debugging feature, and is equivalent to
+<b>/BI</b>, that is, both the <b>/B</b> and the <b>/I</b> modifiers.
+</P>
+<P>
+The <b>/F</b> modifier causes <b>pcretest</b> to flip the byte order of the
+2-byte and 4-byte fields in the compiled pattern. This facility is for testing
+the feature in PCRE that allows it to execute patterns that were compiled on a
+host with a different endianness. This feature is not available when the POSIX
+interface to PCRE is being used, that is, when the <b>/P</b> pattern modifier is
+specified. See also the section about saving and reloading compiled patterns
+below.
+</P>
+<P>
+The <b>/I</b> modifier requests that <b>pcretest</b> output information about the
+compiled pattern (whether it is anchored, has a fixed first character, and
+so on). It does this by calling <b>pcre[16|32]_fullinfo()</b> after compiling a
+pattern. If the pattern is studied, the results of that are also output. In
+this output, the word "char" means a non-UTF character, that is, the value of a
+single data item (8-bit, 16-bit, or 32-bit, depending on the library that is
+being tested).
+</P>
+<P>
+The <b>/K</b> modifier requests <b>pcretest</b> to show names from backtracking
+control verbs that are returned from calls to <b>pcre[16|32]_exec()</b>. It causes
+<b>pcretest</b> to create a <b>pcre[16|32]_extra</b> block if one has not already
+been created by a call to <b>pcre[16|32]_study()</b>, and to set the
+PCRE_EXTRA_MARK flag and the <b>mark</b> field within it, every time that
+<b>pcre[16|32]_exec()</b> is called. If the variable that the <b>mark</b> field
+points to is non-NULL for a match, non-match, or partial match, <b>pcretest</b>
+prints the string to which it points. For a match, this is shown on a line by
+itself, tagged with "MK:". For a non-match it is added to the message.
+</P>
+<P>
+The <b>/L</b> modifier must be followed directly by the name of a locale, for
+example,
+<pre>
+ /pattern/Lfr_FR
+</pre>
+For this reason, it must be the last modifier. The given locale is set,
+<b>pcre[16|32]_maketables()</b> is called to build a set of character tables for
+the locale, and this is then passed to <b>pcre[16|32]_compile()</b> when compiling
+the regular expression. Without an <b>/L</b> (or <b>/T</b>) modifier, NULL is
+passed as the tables pointer; that is, <b>/L</b> applies only to the expression
+on which it appears.
+</P>
+<P>
+The <b>/M</b> modifier causes the size in bytes of the memory block used to hold
+the compiled pattern to be output. This does not include the size of the
+<b>pcre[16|32]</b> block; it is just the actual compiled data. If the pattern is
+successfully studied with the PCRE_STUDY_JIT_COMPILE option, the size of the
+JIT compiled code is also output.
+</P>
+<P>
+The <b>/Q</b> modifier is used to test the use of <b>pcre_stack_guard</b>. It
+must be followed by '0' or '1', specifying the return code to be given from an
+external function that is passed to PCRE and used for stack checking during
+compilation (see the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+documentation for details).
+</P>
+<P>
+The <b>/S</b> modifier causes <b>pcre[16|32]_study()</b> to be called after the
+expression has been compiled, and the results used when the expression is
+matched. There are a number of qualifying characters that may follow <b>/S</b>.
+They may appear in any order.
+</P>
+<P>
+If <b>/S</b> is followed by an exclamation mark, <b>pcre[16|32]_study()</b> is
+called with the PCRE_STUDY_EXTRA_NEEDED option, causing it always to return a
+<b>pcre_extra</b> block, even when studying discovers no useful information.
+</P>
+<P>
+If <b>/S</b> is followed by a second S character, it suppresses studying, even
+if it was requested externally by the <b>-s</b> command line option. This makes
+it possible to specify that certain patterns are always studied, and others are
+never studied, independently of <b>-s</b>. This feature is used in the test
+files in a few cases where the output is different when the pattern is studied.
+</P>
+<P>
+If the <b>/S</b> modifier is followed by a + character, the call to
+<b>pcre[16|32]_study()</b> is made with all the JIT study options, requesting
+just-in-time optimization support if it is available, for both normal and
+partial matching. If you want to restrict the JIT compiling modes, you can
+follow <b>/S+</b> with a digit in the range 1 to 7:
+<pre>
+ 1 normal match only
+ 2 soft partial match only
+ 3 normal match and soft partial match
+ 4 hard partial match only
+ 6 soft and hard partial match
+ 7 all three modes (default)
+</pre>
+If <b>/S++</b> is used instead of <b>/S+</b> (with or without a following digit),
+the text "(JIT)" is added to the first output line after a match or no match
+when JIT-compiled code was actually used.
+</P>
+<P>
+Note that there is also an independent <b>/+</b> modifier; it must not be given
+immediately after <b>/S</b> or <b>/S+</b> because this will be misinterpreted.
+</P>
+<P>
+If JIT studying is successful, the compiled JIT code will automatically be used
+when <b>pcre[16|32]_exec()</b> is run, except when incompatible run-time options
+are specified. For more details, see the
+<a href="pcrejit.html"><b>pcrejit</b></a>
+documentation. See also the <b>\J</b> escape sequence below for a way of
+setting the size of the JIT stack.
+</P>
+<P>
+Finally, if <b>/S</b> is followed by a minus character, JIT compilation is
+suppressed, even if it was requested externally by the <b>-s</b> command line
+option. This makes it possible to specify that JIT is never to be used for
+certain patterns.
+</P>
+<P>
+The <b>/T</b> modifier must be followed by a single digit. It causes a specific
+set of built-in character tables to be passed to <b>pcre[16|32]_compile()</b>. It
+is used in the standard PCRE tests to check behaviour with different character
+tables. The digit specifies the tables as follows:
+<pre>
+ 0 the default ASCII tables, as distributed in
+ pcre_chartables.c.dist
+ 1 a set of tables defining ISO 8859 characters
+</pre>
+In table 1, some characters whose codes are greater than 128 are identified as
+letters, digits, spaces, etc.
+</P>
+<br><b>
+Using the POSIX wrapper API
+</b><br>
+<P>
+The <b>/P</b> modifier causes <b>pcretest</b> to call PCRE via the POSIX wrapper
+API rather than its native API. This supports only the 8-bit library. When
+<b>/P</b> is set, the following modifiers set options for the <b>regcomp()</b>
+function:
+<pre>
+ /i REG_ICASE
+ /m REG_NEWLINE
+ /N REG_NOSUB
+ /s REG_DOTALL )
+ /U REG_UNGREEDY ) These options are not part of
+ /W REG_UCP ) the POSIX standard
+ /8 REG_UTF8 )
+</pre>
+The <b>/+</b> modifier works as described above. All other modifiers are
+ignored.
+</P>
+<br><b>
+Locking out certain modifiers
+</b><br>
+<P>
+PCRE can be compiled with or without support for certain features such as
+UTF-8/16/32 or Unicode properties. Accordingly, the standard tests are split up
+into a number of different files that are selected for running depending on
+which features are available. When updating the tests, it is all too easy to
+put a new test into the wrong file by mistake; for example, to put a test that
+requires UTF support into a file that is used when it is not available. To help
+detect such mistakes as early as possible, there is a facility for locking out
+specific modifiers. If an input line for <b>pcretest</b> starts with the string
+"&#60; forbid " the following sequence of characters is taken as a list of
+forbidden modifiers. For example, in the test files that must not use UTF or
+Unicode property support, this line appears:
+<pre>
+ &#60; forbid 8W
+</pre>
+This locks out the /8 and /W modifiers. An immediate error is given if they are
+subsequently encountered. If the character string contains &#60; but not &#62;, all the
+multi-character modifiers that begin with &#60; are locked out. Otherwise, such
+modifiers must be explicitly listed, for example:
+<pre>
+ &#60; forbid &#60;JS&#62;&#60;cr&#62;
+</pre>
+There must be a single space between &#60; and "forbid" for this feature to be
+recognised. If there is not, the line is interpreted either as a request to
+re-load a pre-compiled pattern (see "SAVING AND RELOADING COMPILED PATTERNS"
+below) or, if there is a another &#60; character, as a pattern that uses &#60; as its
+delimiter.
+</P>
+<br><a name="SEC7" href="#TOC1">DATA LINES</a><br>
+<P>
+Before each data line is passed to <b>pcre[16|32]_exec()</b>, leading and trailing
+white space is removed, and it is then scanned for \ escapes. Some of these
+are pretty esoteric features, intended for checking out some of the more
+complicated features of PCRE. If you are just testing "ordinary" regular
+expressions, you probably don't need any of these. The following escapes are
+recognized:
+<pre>
+ \a alarm (BEL, \x07)
+ \b backspace (\x08)
+ \e escape (\x27)
+ \f form feed (\x0c)
+ \n newline (\x0a)
+ \qdd set the PCRE_MATCH_LIMIT limit to dd (any number of digits)
+ \r carriage return (\x0d)
+ \t tab (\x09)
+ \v vertical tab (\x0b)
+ \nnn octal character (up to 3 octal digits); always
+ a byte unless &#62; 255 in UTF-8 or 16-bit or 32-bit mode
+ \o{dd...} octal character (any number of octal digits}
+ \xhh hexadecimal byte (up to 2 hex digits)
+ \x{hh...} hexadecimal character (any number of hex digits)
+ \A pass the PCRE_ANCHORED option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
+ \B pass the PCRE_NOTBOL option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
+ \Cdd call pcre[16|32]_copy_substring() for substring dd after a successful match (number less than 32)
+ \Cname call pcre[16|32]_copy_named_substring() for substring "name" after a successful match (name termin-
+ ated by next non alphanumeric character)
+ \C+ show the current captured substrings at callout time
+ \C- do not supply a callout function
+ \C!n return 1 instead of 0 when callout number n is reached
+ \C!n!m return 1 instead of 0 when callout number n is reached for the nth time
+ \C*n pass the number n (may be negative) as callout data; this is used as the callout return value
+ \D use the <b>pcre[16|32]_dfa_exec()</b> match function
+ \F only shortest match for <b>pcre[16|32]_dfa_exec()</b>
+ \Gdd call pcre[16|32]_get_substring() for substring dd after a successful match (number less than 32)
+ \Gname call pcre[16|32]_get_named_substring() for substring "name" after a successful match (name termin-
+ ated by next non-alphanumeric character)
+ \Jdd set up a JIT stack of dd kilobytes maximum (any number of digits)
+ \L call pcre[16|32]_get_substringlist() after a successful match
+ \M discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings
+ \N pass the PCRE_NOTEMPTY option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>; if used twice, pass the
+ PCRE_NOTEMPTY_ATSTART option
+ \Odd set the size of the output vector passed to <b>pcre[16|32]_exec()</b> to dd (any number of digits)
+ \P pass the PCRE_PARTIAL_SOFT option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>; if used twice, pass the
+ PCRE_PARTIAL_HARD option
+ \Qdd set the PCRE_MATCH_LIMIT_RECURSION limit to dd (any number of digits)
+ \R pass the PCRE_DFA_RESTART option to <b>pcre[16|32]_dfa_exec()</b>
+ \S output details of memory get/free calls during matching
+ \Y pass the PCRE_NO_START_OPTIMIZE option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
+ \Z pass the PCRE_NOTEOL option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
+ \? pass the PCRE_NO_UTF[8|16|32]_CHECK option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
+ \&#62;dd start the match at offset dd (optional "-"; then any number of digits); this sets the <i>startoffset</i>
+ argument for <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
+ \&#60;cr&#62; pass the PCRE_NEWLINE_CR option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
+ \&#60;lf&#62; pass the PCRE_NEWLINE_LF option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
+ \&#60;crlf&#62; pass the PCRE_NEWLINE_CRLF option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
+ \&#60;anycrlf&#62; pass the PCRE_NEWLINE_ANYCRLF option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
+ \&#60;any&#62; pass the PCRE_NEWLINE_ANY option to <b>pcre[16|32]_exec()</b> or <b>pcre[16|32]_dfa_exec()</b>
+</pre>
+The use of \x{hh...} is not dependent on the use of the <b>/8</b> modifier on
+the pattern. It is recognized always. There may be any number of hexadecimal
+digits inside the braces; invalid values provoke error messages.
+</P>
+<P>
+Note that \xhh specifies one byte rather than one character in UTF-8 mode;
+this makes it possible to construct invalid UTF-8 sequences for testing
+purposes. On the other hand, \x{hh} is interpreted as a UTF-8 character in
+UTF-8 mode, generating more than one byte if the value is greater than 127.
+When testing the 8-bit library not in UTF-8 mode, \x{hh} generates one byte
+for values less than 256, and causes an error for greater values.
+</P>
+<P>
+In UTF-16 mode, all 4-digit \x{hhhh} values are accepted. This makes it
+possible to construct invalid UTF-16 sequences for testing purposes.
+</P>
+<P>
+In UTF-32 mode, all 4- to 8-digit \x{...} values are accepted. This makes it
+possible to construct invalid UTF-32 sequences for testing purposes.
+</P>
+<P>
+The escapes that specify line ending sequences are literal strings, exactly as
+shown. No more than one newline setting should be present in any data line.
+</P>
+<P>
+A backslash followed by anything else just escapes the anything else. If
+the very last character is a backslash, it is ignored. This gives a way of
+passing an empty line as data, since a real empty line terminates the data
+input.
+</P>
+<P>
+The <b>\J</b> escape provides a way of setting the maximum stack size that is
+used by the just-in-time optimization code. It is ignored if JIT optimization
+is not being used. Providing a stack that is larger than the default 32K is
+necessary only for very complicated patterns.
+</P>
+<P>
+If \M is present, <b>pcretest</b> calls <b>pcre[16|32]_exec()</b> several times,
+with different values in the <i>match_limit</i> and <i>match_limit_recursion</i>
+fields of the <b>pcre[16|32]_extra</b> data structure, until it finds the minimum
+numbers for each parameter that allow <b>pcre[16|32]_exec()</b> to complete without
+error. Because this is testing a specific feature of the normal interpretive
+<b>pcre[16|32]_exec()</b> execution, the use of any JIT optimization that might
+have been set up by the <b>/S+</b> qualifier of <b>-s+</b> option is disabled.
+</P>
+<P>
+The <i>match_limit</i> number is a measure of the amount of backtracking
+that takes place, and checking it out can be instructive. For most simple
+matches, the number is quite small, but for patterns with very large numbers of
+matching possibilities, it can become large very quickly with increasing length
+of subject string. The <i>match_limit_recursion</i> number is a measure of how
+much stack (or, if PCRE is compiled with NO_RECURSE, how much heap) memory is
+needed to complete the match attempt.
+</P>
+<P>
+When \O is used, the value specified may be higher or lower than the size set
+by the <b>-O</b> command line option (or defaulted to 45); \O applies only to
+the call of <b>pcre[16|32]_exec()</b> for the line in which it appears.
+</P>
+<P>
+If the <b>/P</b> modifier was present on the pattern, causing the POSIX wrapper
+API to be used, the only option-setting sequences that have any effect are \B,
+\N, and \Z, causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively,
+to be passed to <b>regexec()</b>.
+</P>
+<br><a name="SEC8" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br>
+<P>
+By default, <b>pcretest</b> uses the standard PCRE matching function,
+<b>pcre[16|32]_exec()</b> to match each data line. PCRE also supports an
+alternative matching function, <b>pcre[16|32]_dfa_test()</b>, which operates in a
+different way, and has some restrictions. The differences between the two
+functions are described in the
+<a href="pcrematching.html"><b>pcrematching</b></a>
+documentation.
+</P>
+<P>
+If a data line contains the \D escape sequence, or if the command line
+contains the <b>-dfa</b> option, the alternative matching function is used.
+This function finds all possible matches at a given point. If, however, the \F
+escape sequence is present in the data line, it stops after the first match is
+found. This is always the shortest possible match.
+</P>
+<br><a name="SEC9" href="#TOC1">DEFAULT OUTPUT FROM PCRETEST</a><br>
+<P>
+This section describes the output when the normal matching function,
+<b>pcre[16|32]_exec()</b>, is being used.
+</P>
+<P>
+When a match succeeds, <b>pcretest</b> outputs the list of captured substrings
+that <b>pcre[16|32]_exec()</b> returns, starting with number 0 for the string that
+matched the whole pattern. Otherwise, it outputs "No match" when the return is
+PCRE_ERROR_NOMATCH, and "Partial match:" followed by the partially matching
+substring when <b>pcre[16|32]_exec()</b> returns PCRE_ERROR_PARTIAL. (Note that
+this is the entire substring that was inspected during the partial match; it
+may include characters before the actual match start if a lookbehind assertion,
+\K, \b, or \B was involved.) For any other return, <b>pcretest</b> outputs
+the PCRE negative error number and a short descriptive phrase. If the error is
+a failed UTF string check, the offset of the start of the failing character and
+the reason code are also output, provided that the size of the output vector is
+at least two. Here is an example of an interactive <b>pcretest</b> run.
+<pre>
+ $ pcretest
+ PCRE version 8.13 2011-04-30
+
+ re&#62; /^abc(\d+)/
+ data&#62; abc123
+ 0: abc123
+ 1: 123
+ data&#62; xyz
+ No match
+</pre>
+Unset capturing substrings that are not followed by one that is set are not
+returned by <b>pcre[16|32]_exec()</b>, and are not shown by <b>pcretest</b>. In the
+following example, there are two capturing substrings, but when the first data
+line is matched, the second, unset substring is not shown. An "internal" unset
+substring is shown as "&#60;unset&#62;", as for the second data line.
+<pre>
+ re&#62; /(a)|(b)/
+ data&#62; a
+ 0: a
+ 1: a
+ data&#62; b
+ 0: b
+ 1: &#60;unset&#62;
+ 2: b
+</pre>
+If the strings contain any non-printing characters, they are output as \xhh
+escapes if the value is less than 256 and UTF mode is not set. Otherwise they
+are output as \x{hh...} escapes. See below for the definition of non-printing
+characters. If the pattern has the <b>/+</b> modifier, the output for substring
+0 is followed by the the rest of the subject string, identified by "0+" like
+this:
+<pre>
+ re&#62; /cat/+
+ data&#62; cataract
+ 0: cat
+ 0+ aract
+</pre>
+If the pattern has the <b>/g</b> or <b>/G</b> modifier, the results of successive
+matching attempts are output in sequence, like this:
+<pre>
+ re&#62; /\Bi(\w\w)/g
+ data&#62; Mississippi
+ 0: iss
+ 1: ss
+ 0: iss
+ 1: ss
+ 0: ipp
+ 1: pp
+</pre>
+"No match" is output only if the first match attempt fails. Here is an example
+of a failure message (the offset 4 that is specified by \&#62;4 is past the end of
+the subject string):
+<pre>
+ re&#62; /xyz/
+ data&#62; xyz\&#62;4
+ Error -24 (bad offset value)
+</PRE>
+</P>
+<P>
+If any of the sequences <b>\C</b>, <b>\G</b>, or <b>\L</b> are present in a
+data line that is successfully matched, the substrings extracted by the
+convenience functions are output with C, G, or L after the string number
+instead of a colon. This is in addition to the normal full list. The string
+length (that is, the return from the extraction function) is given in
+parentheses after each string for <b>\C</b> and <b>\G</b>.
+</P>
+<P>
+Note that whereas patterns can be continued over several lines (a plain "&#62;"
+prompt is used for continuations), data lines may not. However newlines can be
+included in data by means of the \n escape (or \r, \r\n, etc., depending on
+the newline sequence setting).
+</P>
+<br><a name="SEC10" href="#TOC1">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a><br>
+<P>
+When the alternative matching function, <b>pcre[16|32]_dfa_exec()</b>, is used (by
+means of the \D escape sequence or the <b>-dfa</b> command line option), the
+output consists of a list of all the matches that start at the first point in
+the subject where there is at least one match. For example:
+<pre>
+ re&#62; /(tang|tangerine|tan)/
+ data&#62; yellow tangerine\D
+ 0: tangerine
+ 1: tang
+ 2: tan
+</pre>
+(Using the normal matching function on this data finds only "tang".) The
+longest matching string is always given first (and numbered zero). After a
+PCRE_ERROR_PARTIAL return, the output is "Partial match:", followed by the
+partially matching substring. (Note that this is the entire substring that was
+inspected during the partial match; it may include characters before the actual
+match start if a lookbehind assertion, \K, \b, or \B was involved.)
+</P>
+<P>
+If <b>/g</b> is present on the pattern, the search for further matches resumes
+at the end of the longest match. For example:
+<pre>
+ re&#62; /(tang|tangerine|tan)/g
+ data&#62; yellow tangerine and tangy sultana\D
+ 0: tangerine
+ 1: tang
+ 2: tan
+ 0: tang
+ 1: tan
+ 0: tan
+</pre>
+Since the matching function does not support substring capture, the escape
+sequences that are concerned with captured substrings are not relevant.
+</P>
+<br><a name="SEC11" href="#TOC1">RESTARTING AFTER A PARTIAL MATCH</a><br>
+<P>
+When the alternative matching function has given the PCRE_ERROR_PARTIAL return,
+indicating that the subject partially matched the pattern, you can restart the
+match with additional subject data by means of the \R escape sequence. For
+example:
+<pre>
+ re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
+ data&#62; 23ja\P\D
+ Partial match: 23ja
+ data&#62; n05\R\D
+ 0: n05
+</pre>
+For further information about partial matching, see the
+<a href="pcrepartial.html"><b>pcrepartial</b></a>
+documentation.
+</P>
+<br><a name="SEC12" href="#TOC1">CALLOUTS</a><br>
+<P>
+If the pattern contains any callout requests, <b>pcretest</b>'s callout function
+is called during matching. This works with both matching functions. By default,
+the called function displays the callout number, the start and current
+positions in the text at the callout time, and the next pattern item to be
+tested. For example:
+<pre>
+ ---&#62;pqrabcdef
+ 0 ^ ^ \d
+</pre>
+This output indicates that callout number 0 occurred for a match attempt
+starting at the fourth character of the subject string, when the pointer was at
+the seventh character of the data, and when the next pattern item was \d. Just
+one circumflex is output if the start and current positions are the same.
+</P>
+<P>
+Callouts numbered 255 are assumed to be automatic callouts, inserted as a
+result of the <b>/C</b> pattern modifier. In this case, instead of showing the
+callout number, the offset in the pattern, preceded by a plus, is output. For
+example:
+<pre>
+ re&#62; /\d?[A-E]\*/C
+ data&#62; E*
+ ---&#62;E*
+ +0 ^ \d?
+ +3 ^ [A-E]
+ +8 ^^ \*
+ +10 ^ ^
+ 0: E*
+</pre>
+If a pattern contains (*MARK) items, an additional line is output whenever
+a change of latest mark is passed to the callout function. For example:
+<pre>
+ re&#62; /a(*MARK:X)bc/C
+ data&#62; abc
+ ---&#62;abc
+ +0 ^ a
+ +1 ^^ (*MARK:X)
+ +10 ^^ b
+ Latest Mark: X
+ +11 ^ ^ c
+ +12 ^ ^
+ 0: abc
+</pre>
+The mark changes between matching "a" and "b", but stays the same for the rest
+of the match, so nothing more is output. If, as a result of backtracking, the
+mark reverts to being unset, the text "&#60;unset&#62;" is output.
+</P>
+<P>
+The callout function in <b>pcretest</b> returns zero (carry on matching) by
+default, but you can use a \C item in a data line (as described above) to
+change this and other parameters of the callout.
+</P>
+<P>
+Inserting callouts can be helpful when using <b>pcretest</b> to check
+complicated regular expressions. For further information about callouts, see
+the
+<a href="pcrecallout.html"><b>pcrecallout</b></a>
+documentation.
+</P>
+<br><a name="SEC13" href="#TOC1">NON-PRINTING CHARACTERS</a><br>
+<P>
+When <b>pcretest</b> is outputting text in the compiled version of a pattern,
+bytes other than 32-126 are always treated as non-printing characters are are
+therefore shown as hex escapes.
+</P>
+<P>
+When <b>pcretest</b> is outputting text that is a matched part of a subject
+string, it behaves in the same way, unless a different locale has been set for
+the pattern (using the <b>/L</b> modifier). In this case, the <b>isprint()</b>
+function to distinguish printing and non-printing characters.
+</P>
+<br><a name="SEC14" href="#TOC1">SAVING AND RELOADING COMPILED PATTERNS</a><br>
+<P>
+The facilities described in this section are not available when the POSIX
+interface to PCRE is being used, that is, when the <b>/P</b> pattern modifier is
+specified.
+</P>
+<P>
+When the POSIX interface is not in use, you can cause <b>pcretest</b> to write a
+compiled pattern to a file, by following the modifiers with &#62; and a file name.
+For example:
+<pre>
+ /pattern/im &#62;/some/file
+</pre>
+See the
+<a href="pcreprecompile.html"><b>pcreprecompile</b></a>
+documentation for a discussion about saving and re-using compiled patterns.
+Note that if the pattern was successfully studied with JIT optimization, the
+JIT data cannot be saved.
+</P>
+<P>
+The data that is written is binary. The first eight bytes are the length of the
+compiled pattern data followed by the length of the optional study data, each
+written as four bytes in big-endian order (most significant byte first). If
+there is no study data (either the pattern was not studied, or studying did not
+return any data), the second length is zero. The lengths are followed by an
+exact copy of the compiled pattern. If there is additional study data, this
+(excluding any JIT data) follows immediately after the compiled pattern. After
+writing the file, <b>pcretest</b> expects to read a new pattern.
+</P>
+<P>
+A saved pattern can be reloaded into <b>pcretest</b> by specifying &#60; and a file
+name instead of a pattern. There must be no space between &#60; and the file name,
+which must not contain a &#60; character, as otherwise <b>pcretest</b> will
+interpret the line as a pattern delimited by &#60; characters. For example:
+<pre>
+ re&#62; &#60;/some/file
+ Compiled pattern loaded from /some/file
+ No study data
+</pre>
+If the pattern was previously studied with the JIT optimization, the JIT
+information cannot be saved and restored, and so is lost. When the pattern has
+been loaded, <b>pcretest</b> proceeds to read data lines in the usual way.
+</P>
+<P>
+You can copy a file written by <b>pcretest</b> to a different host and reload it
+there, even if the new host has opposite endianness to the one on which the
+pattern was compiled. For example, you can compile on an i86 machine and run on
+a SPARC machine. When a pattern is reloaded on a host with different
+endianness, the confirmation message is changed to:
+<pre>
+ Compiled pattern (byte-inverted) loaded from /some/file
+</pre>
+The test suite contains some saved pre-compiled patterns with different
+endianness. These are reloaded using "&#60;!" instead of just "&#60;". This suppresses
+the "(byte-inverted)" text so that the output is the same on all hosts. It also
+forces debugging output once the pattern has been reloaded.
+</P>
+<P>
+File names for saving and reloading can be absolute or relative, but note that
+the shell facility of expanding a file name that starts with a tilde (~) is not
+available.
+</P>
+<P>
+The ability to save and reload files in <b>pcretest</b> is intended for testing
+and experimentation. It is not intended for production use because only a
+single pattern can be written to a file. Furthermore, there is no facility for
+supplying custom character tables for use with a reloaded pattern. If the
+original pattern was compiled with custom tables, an attempt to match a subject
+string using a reloaded pattern is likely to cause <b>pcretest</b> to crash.
+Finally, if you attempt to load a file that is not in the correct format, the
+result is undefined.
+</P>
+<br><a name="SEC15" href="#TOC1">SEE ALSO</a><br>
+<P>
+<b>pcre</b>(3), <b>pcre16</b>(3), <b>pcre32</b>(3), <b>pcreapi</b>(3),
+<b>pcrecallout</b>(3),
+<b>pcrejit</b>, <b>pcrematching</b>(3), <b>pcrepartial</b>(d),
+<b>pcrepattern</b>(3), <b>pcreprecompile</b>(3).
+</P>
+<br><a name="SEC16" href="#TOC1">AUTHOR</a><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><a name="SEC17" href="#TOC1">REVISION</a><br>
+<P>
+Last updated: 09 February 2014
+<br>
+Copyright &copy; 1997-2014 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
diff --git a/doc/html/pcreunicode.html b/doc/html/pcreunicode.html
new file mode 100644
index 0000000..ab36bc6
--- /dev/null
+++ b/doc/html/pcreunicode.html
@@ -0,0 +1,262 @@
+<html>
+<head>
+<title>pcreunicode specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcreunicode man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+UTF-8, UTF-16, UTF-32, AND UNICODE PROPERTY SUPPORT
+</b><br>
+<P>
+As well as UTF-8 support, PCRE also supports UTF-16 (from release 8.30) and
+UTF-32 (from release 8.32), by means of two additional libraries. They can be
+built as well as, or instead of, the 8-bit library.
+</P>
+<br><b>
+UTF-8 SUPPORT
+</b><br>
+<P>
+In order process UTF-8 strings, you must build PCRE's 8-bit library with UTF
+support, and, in addition, you must call
+<a href="pcre_compile.html"><b>pcre_compile()</b></a>
+with the PCRE_UTF8 option flag, or the pattern must start with the sequence
+(*UTF8) or (*UTF). When either of these is the case, both the pattern and any
+subject strings that are matched against it are treated as UTF-8 strings
+instead of strings of individual 1-byte characters.
+</P>
+<br><b>
+UTF-16 AND UTF-32 SUPPORT
+</b><br>
+<P>
+In order process UTF-16 or UTF-32 strings, you must build PCRE's 16-bit or
+32-bit library with UTF support, and, in addition, you must call
+<a href="pcre16_compile.html"><b>pcre16_compile()</b></a>
+or
+<a href="pcre32_compile.html"><b>pcre32_compile()</b></a>
+with the PCRE_UTF16 or PCRE_UTF32 option flag, as appropriate. Alternatively,
+the pattern must start with the sequence (*UTF16), (*UTF32), as appropriate, or
+(*UTF), which can be used with either library. When UTF mode is set, both the
+pattern and any subject strings that are matched against it are treated as
+UTF-16 or UTF-32 strings instead of strings of individual 16-bit or 32-bit
+characters.
+</P>
+<br><b>
+UTF SUPPORT OVERHEAD
+</b><br>
+<P>
+If you compile PCRE with UTF support, but do not use it at run time, the
+library will be a bit bigger, but the additional run time overhead is limited
+to testing the PCRE_UTF[8|16|32] flag occasionally, so should not be very big.
+</P>
+<br><b>
+UNICODE PROPERTY SUPPORT
+</b><br>
+<P>
+If PCRE is built with Unicode character property support (which implies UTF
+support), the escape sequences \p{..}, \P{..}, and \X can be used.
+The available properties that can be tested are limited to the general
+category properties such as Lu for an upper case letter or Nd for a decimal
+number, the Unicode script names such as Arabic or Han, and the derived
+properties Any and L&. Full lists is given in the
+<a href="pcrepattern.html"><b>pcrepattern</b></a>
+and
+<a href="pcresyntax.html"><b>pcresyntax</b></a>
+documentation. Only the short names for properties are supported. For example,
+\p{L} matches a letter. Its Perl synonym, \p{Letter}, is not supported.
+Furthermore, in Perl, many properties may optionally be prefixed by "Is", for
+compatibility with Perl 5.6. PCRE does not support this.
+<a name="utf8strings"></a></P>
+<br><b>
+Validity of UTF-8 strings
+</b><br>
+<P>
+When you set the PCRE_UTF8 flag, the byte strings passed as patterns and
+subjects are (by default) checked for validity on entry to the relevant
+functions. The entire string is checked before any other processing takes
+place. From release 7.3 of PCRE, the check is according the rules of RFC 3629,
+which are themselves derived from the Unicode specification. Earlier releases
+of PCRE followed the rules of RFC 2279, which allows the full range of 31-bit
+values (0 to 0x7FFFFFFF). The current check allows only values in the range U+0
+to U+10FFFF, excluding the surrogate area. (From release 8.33 the so-called
+"non-character" code points are no longer excluded because Unicode corrigendum
+#9 makes it clear that they should not be.)
+</P>
+<P>
+Characters in the "Surrogate Area" of Unicode are reserved for use by UTF-16,
+where they are used in pairs to encode codepoints with values greater than
+0xFFFF. The code points that are encoded by UTF-16 pairs are available
+independently in the UTF-8 and UTF-32 encodings. (In other words, the whole
+surrogate thing is a fudge for UTF-16 which unfortunately messes up UTF-8 and
+UTF-32.)
+</P>
+<P>
+If an invalid UTF-8 string is passed to PCRE, an error return is given. At
+compile time, the only additional information is the offset to the first byte
+of the failing character. The run-time functions <b>pcre_exec()</b> and
+<b>pcre_dfa_exec()</b> also pass back this information, as well as a more
+detailed reason code if the caller has provided memory in which to do this.
+</P>
+<P>
+In some situations, you may already know that your strings are valid, and
+therefore want to skip these checks in order to improve performance, for
+example in the case of a long subject string that is being scanned repeatedly.
+If you set the PCRE_NO_UTF8_CHECK flag at compile time or at run time, PCRE
+assumes that the pattern or subject it is given (respectively) contains only
+valid UTF-8 codes. In this case, it does not diagnose an invalid UTF-8 string.
+</P>
+<P>
+Note that passing PCRE_NO_UTF8_CHECK to <b>pcre_compile()</b> just disables the
+check for the pattern; it does not also apply to subject strings. If you want
+to disable the check for a subject string you must pass this option to
+<b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>.
+</P>
+<P>
+If you pass an invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set, the result
+is undefined and your program may crash.
+<a name="utf16strings"></a></P>
+<br><b>
+Validity of UTF-16 strings
+</b><br>
+<P>
+When you set the PCRE_UTF16 flag, the strings of 16-bit data units that are
+passed as patterns and subjects are (by default) checked for validity on entry
+to the relevant functions. Values other than those in the surrogate range
+U+D800 to U+DFFF are independent code points. Values in the surrogate range
+must be used in pairs in the correct manner.
+</P>
+<P>
+If an invalid UTF-16 string is passed to PCRE, an error return is given. At
+compile time, the only additional information is the offset to the first data
+unit of the failing character. The run-time functions <b>pcre16_exec()</b> and
+<b>pcre16_dfa_exec()</b> also pass back this information, as well as a more
+detailed reason code if the caller has provided memory in which to do this.
+</P>
+<P>
+In some situations, you may already know that your strings are valid, and
+therefore want to skip these checks in order to improve performance. If you set
+the PCRE_NO_UTF16_CHECK flag at compile time or at run time, PCRE assumes that
+the pattern or subject it is given (respectively) contains only valid UTF-16
+sequences. In this case, it does not diagnose an invalid UTF-16 string.
+However, if an invalid string is passed, the result is undefined.
+<a name="utf32strings"></a></P>
+<br><b>
+Validity of UTF-32 strings
+</b><br>
+<P>
+When you set the PCRE_UTF32 flag, the strings of 32-bit data units that are
+passed as patterns and subjects are (by default) checked for validity on entry
+to the relevant functions. This check allows only values in the range U+0
+to U+10FFFF, excluding the surrogate area U+D800 to U+DFFF.
+</P>
+<P>
+If an invalid UTF-32 string is passed to PCRE, an error return is given. At
+compile time, the only additional information is the offset to the first data
+unit of the failing character. The run-time functions <b>pcre32_exec()</b> and
+<b>pcre32_dfa_exec()</b> also pass back this information, as well as a more
+detailed reason code if the caller has provided memory in which to do this.
+</P>
+<P>
+In some situations, you may already know that your strings are valid, and
+therefore want to skip these checks in order to improve performance. If you set
+the PCRE_NO_UTF32_CHECK flag at compile time or at run time, PCRE assumes that
+the pattern or subject it is given (respectively) contains only valid UTF-32
+sequences. In this case, it does not diagnose an invalid UTF-32 string.
+However, if an invalid string is passed, the result is undefined.
+</P>
+<br><b>
+General comments about UTF modes
+</b><br>
+<P>
+1. Codepoints less than 256 can be specified in patterns by either braced or
+unbraced hexadecimal escape sequences (for example, \x{b3} or \xb3). Larger
+values have to use braced sequences.
+</P>
+<P>
+2. Octal numbers up to \777 are recognized, and in UTF-8 mode they match
+two-byte characters for values greater than \177.
+</P>
+<P>
+3. Repeat quantifiers apply to complete UTF characters, not to individual
+data units, for example: \x{100}{3}.
+</P>
+<P>
+4. The dot metacharacter matches one UTF character instead of a single data
+unit.
+</P>
+<P>
+5. The escape sequence \C can be used to match a single byte in UTF-8 mode, or
+a single 16-bit data unit in UTF-16 mode, or a single 32-bit data unit in
+UTF-32 mode, but its use can lead to some strange effects because it breaks up
+multi-unit characters (see the description of \C in the
+<a href="pcrepattern.html"><b>pcrepattern</b></a>
+documentation). The use of \C is not supported in the alternative matching
+function <b>pcre[16|32]_dfa_exec()</b>, nor is it supported in UTF mode by the
+JIT optimization of <b>pcre[16|32]_exec()</b>. If JIT optimization is requested
+for a UTF pattern that contains \C, it will not succeed, and so the matching
+will be carried out by the normal interpretive function.
+</P>
+<P>
+6. The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
+test characters of any code value, but, by default, the characters that PCRE
+recognizes as digits, spaces, or word characters remain the same set as in
+non-UTF mode, all with values less than 256. This remains true even when PCRE
+is built to include Unicode property support, because to do otherwise would
+slow down PCRE in many common cases. Note in particular that this applies to
+\b and \B, because they are defined in terms of \w and \W. If you really
+want to test for a wider sense of, say, "digit", you can use explicit Unicode
+property tests such as \p{Nd}. Alternatively, if you set the PCRE_UCP option,
+the way that the character escapes work is changed so that Unicode properties
+are used to determine which characters match. There are more details in the
+section on
+<a href="pcrepattern.html#genericchartypes">generic character types</a>
+in the
+<a href="pcrepattern.html"><b>pcrepattern</b></a>
+documentation.
+</P>
+<P>
+7. Similarly, characters that match the POSIX named character classes are all
+low-valued characters, unless the PCRE_UCP option is set.
+</P>
+<P>
+8. However, the horizontal and vertical white space matching escapes (\h, \H,
+\v, and \V) do match all the appropriate Unicode characters, whether or not
+PCRE_UCP is set.
+</P>
+<P>
+9. Case-insensitive matching applies only to characters whose values are less
+than 128, unless PCRE is built with Unicode property support. A few Unicode
+characters such as Greek sigma have more than two codepoints that are
+case-equivalent. Up to and including PCRE release 8.31, only one-to-one case
+mappings were supported, but later releases (with Unicode property support) do
+treat as case-equivalent all versions of characters such as Greek sigma.
+</P>
+<br><b>
+AUTHOR
+</b><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><b>
+REVISION
+</b><br>
+<P>
+Last updated: 27 February 2013
+<br>
+Copyright &copy; 1997-2013 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>