summaryrefslogtreecommitdiff
path: root/README
diff options
context:
space:
mode:
authorMatthew Vernon <matthew@debian.org>2018-02-24 12:07:04 +0000
committerMatthew Vernon <matthew@debian.org>2018-02-24 12:07:04 +0000
commite98c3314cf9e05aa99f5e192862ec37f29b7dbb5 (patch)
treeb69bb3feb63a4fd79ad8a6e55865228f6fde04eb /README
parent92b17f0eb8fddd7117c5344a1e1177daec21995a (diff)
New upstream version 10.31
Diffstat (limited to 'README')
-rw-r--r--README212
1 files changed, 125 insertions, 87 deletions
diff --git a/README b/README
index 03d67f6..52859a9 100644
--- a/README
+++ b/README
@@ -15,8 +15,8 @@ subscribe or manage your subscription here:
https://lists.exim.org/mailman/listinfo/pcre-dev
-Please read the NEWS file if you are upgrading from a previous release.
-The contents of this README file are:
+Please read the NEWS file if you are upgrading from a previous release. The
+contents of this README file are:
The PCRE2 APIs
Documentation for PCRE2
@@ -44,8 +44,8 @@ wrappers.
The distribution does contain a set of C wrapper functions for the 8-bit
library that are based on the POSIX regular expression API (see the pcre2posix
-man page). These can be found in a library called libpcre2posix. Note that this
-just provides a POSIX calling interface to PCRE2; the regular expressions
+man page). These can be found in a library called libpcre2-posix. Note that
+this just provides a POSIX calling interface to PCRE2; the regular expressions
themselves still follow Perl syntax and semantics. The POSIX API is restricted,
and does not give full access to all of PCRE2's facilities.
@@ -58,8 +58,8 @@ renamed or pointed at by a link.
If you are using the POSIX interface to PCRE2 and there is already a POSIX
regex library installed on your system, as well as worrying about the regex.h
header file (as mentioned above), you must also take care when linking programs
-to ensure that they link with PCRE2's libpcre2posix library. Otherwise they may
-pick up the POSIX functions of the same name from the other library.
+to ensure that they link with PCRE2's libpcre2-posix library. Otherwise they
+may pick up the POSIX functions of the same name from the other library.
One way of avoiding this confusion is to compile PCRE2 with the addition of
-Dregcomp=PCRE2regcomp (and similarly for the other POSIX functions) to the
@@ -95,10 +95,9 @@ PCRE2 documentation is supplied in two other forms:
Building PCRE2 on non-Unix-like systems
---------------------------------------
-For a non-Unix-like system, please read the comments in the file
-NON-AUTOTOOLS-BUILD, though if your system supports the use of "configure" and
-"make" you may be able to build PCRE2 using autotools in the same way as for
-many Unix-like systems.
+For a non-Unix-like system, please read the file NON-AUTOTOOLS-BUILD, though if
+your system supports the use of "configure" and "make" you may be able to build
+PCRE2 using autotools in the same way as for many Unix-like systems.
PCRE2 can also be configured using CMake, which can be run in various ways
(command line, GUI, etc). This creates Makefiles, solution files, etc. The file
@@ -172,21 +171,24 @@ library. They are also documented in the pcre2build man page.
give large performance improvements on certain platforms, add --enable-jit to
the "configure" command. This support is available only for certain hardware
architectures. If you try to enable it on an unsupported architecture, there
- will be a compile time error.
-
-. If you do not want to make use of the support for UTF-8 Unicode character
- strings in the 8-bit library, UTF-16 Unicode character strings in the 16-bit
- library, or UTF-32 Unicode character strings in the 32-bit library, you can
- add --disable-unicode to the "configure" command. This reduces the size of
- the libraries. It is not possible to configure one library with Unicode
- support, and another without, in the same configuration.
+ will be a compile time error. If you are running under SELinux you may also
+ want to add --enable-jit-sealloc, which enables the use of an execmem
+ allocator in JIT that is compatible with SELinux. This has no effect if JIT
+ is not enabled.
+
+. If you do not want to make use of the default support for UTF-8 Unicode
+ character strings in the 8-bit library, UTF-16 Unicode character strings in
+ the 16-bit library, or UTF-32 Unicode character strings in the 32-bit
+ library, you can add --disable-unicode to the "configure" command. This
+ reduces the size of the libraries. It is not possible to configure one
+ library with Unicode support, and another without, in the same configuration.
+ It is also not possible to use --enable-ebcdic (see below) with Unicode
+ support, so if this option is set, you must also use --disable-unicode.
When Unicode support is available, the use of a UTF encoding still has to be
enabled by setting the PCRE2_UTF option at run time or starting a pattern
with (*UTF). When PCRE2 is compiled with Unicode support, its input can only
- either be ASCII or UTF-8/16/32, even when running on EBCDIC platforms. It is
- not possible to use both --enable-unicode and --enable-ebcdic at the same
- time.
+ either be ASCII or UTF-8/16/32, even when running on EBCDIC platforms.
As well as supporting UTF strings, Unicode support includes support for the
\P, \p, and \X sequences that recognize Unicode character properties.
@@ -196,20 +198,14 @@ library. They are also documented in the pcre2build man page.
or starting a pattern with (*UCP).
. You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any
- of the preceding, or any of the Unicode newline sequences, as indicating the
- end of a line. Whatever you specify at build time is the default; the caller
- of PCRE2 can change the selection at run time. The default newline indicator
- is a single LF character (the Unix standard). You can specify the default
- newline indicator by adding --enable-newline-is-cr, --enable-newline-is-lf,
- --enable-newline-is-crlf, --enable-newline-is-anycrlf, or
- --enable-newline-is-any to the "configure" command, respectively.
-
- If you specify --enable-newline-is-cr or --enable-newline-is-crlf, some of
- the standard tests will fail, because the lines in the test files end with
- LF. Even if the files are edited to change the line endings, there are likely
- to be some failures. With --enable-newline-is-anycrlf or
- --enable-newline-is-any, many tests should succeed, but there may be some
- failures.
+ of the preceding, or any of the Unicode newline sequences, or the NUL (zero)
+ character as indicating the end of a line. Whatever you specify at build time
+ is the default; the caller of PCRE2 can change the selection at run time. The
+ default newline indicator is a single LF character (the Unix standard). You
+ can specify the default newline indicator by adding --enable-newline-is-cr,
+ --enable-newline-is-lf, --enable-newline-is-crlf,
+ --enable-newline-is-anycrlf, --enable-newline-is-any, or
+ --enable-newline-is-nul to the "configure" command, respectively.
. By default, the sequence \R in a pattern matches any Unicode line ending
sequence. This is independent of the option specifying what PCRE2 considers
@@ -231,49 +227,44 @@ library. They are also documented in the pcre2build man page.
--with-parens-nest-limit=500
-. PCRE2 has a counter that can be set to limit the amount of resources it uses
- when matching a pattern. If the limit is exceeded during a match, the match
- fails. The default is ten million. You can change the default by setting, for
- example,
+. PCRE2 has a counter that can be set to limit the amount of computing resource
+ it uses when matching a pattern. If the limit is exceeded during a match, the
+ match fails. The default is ten million. You can change the default by
+ setting, for example,
--with-match-limit=500000
on the "configure" command. This is just the default; individual calls to
- pcre2_match() can supply their own value. There is more discussion on the
- pcre2api man page.
+ pcre2_match() or pcre2_dfa_match() can supply their own value. There is more
+ discussion in the pcre2api man page (search for pcre2_set_match_limit).
+
+. There is a separate counter that limits the depth of nested backtracking
+ during a matching process, which indirectly limits the amount of heap memory
+ that is used. This also has a default of ten million, which is essentially
+ "unlimited". You can change the default by setting, for example,
+
+ --with-match-limit-depth=5000
-. There is a separate counter that limits the depth of recursive function calls
- during a matching process. This also has a default of ten million, which is
- essentially "unlimited". You can change the default by setting, for example,
+ There is more discussion in the pcre2api man page (search for
+ pcre2_set_depth_limit).
- --with-match-limit-recursion=500000
+. You can also set an explicit limit on the amount of heap memory used by
+ the pcre2_match() interpreter:
- Recursive function calls use up the runtime stack; running out of stack can
- cause programs to crash in strange ways. There is a discussion about stack
- sizes in the pcre2stack man page.
+ --with-heap-limit=500
+
+ The units are kilobytes. This limit does not apply when the JIT optimization
+ (which has its own memory control features) is used. There is more discussion
+ on the pcre2api man page (search for pcre2_set_heap_limit).
. In the 8-bit library, the default maximum compiled pattern size is around
- 64K. You can increase this by adding --with-link-size=3 to the "configure"
- command. PCRE2 then uses three bytes instead of two for offsets to different
- parts of the compiled pattern. In the 16-bit library, --with-link-size=3 is
- the same as --with-link-size=4, which (in both libraries) uses four-byte
- offsets. Increasing the internal link size reduces performance in the 8-bit
- and 16-bit libraries. In the 32-bit library, the link size setting is
- ignored, as 4-byte offsets are always used.
-
-. You can build PCRE2 so that its internal match() function that is called from
- pcre2_match() does not call itself recursively. Instead, it uses memory
- blocks obtained from the heap to save data that would otherwise be saved on
- the stack. To build PCRE2 like this, use
-
- --disable-stack-for-recursion
-
- on the "configure" command. PCRE2 runs more slowly in this mode, but it may
- be necessary in environments with limited stack sizes. This applies only to
- the normal execution of the pcre2_match() function; if JIT support is being
- successfully used, it is not relevant. Equally, it does not apply to
- pcre2_dfa_match(), which does not use deeply nested recursion. There is a
- discussion about stack sizes in the pcre2stack man page.
+ 64K bytes. You can increase this by adding --with-link-size=3 to the
+ "configure" command. PCRE2 then uses three bytes instead of two for offsets
+ to different parts of the compiled pattern. In the 16-bit library,
+ --with-link-size=3 is the same as --with-link-size=4, which (in both
+ libraries) uses four-byte offsets. Increasing the internal link size reduces
+ performance in the 8-bit and 16-bit libraries. In the 32-bit library, the
+ link size setting is ignored, as 4-byte offsets are always used.
. For speed, PCRE2 uses four tables for manipulating and identifying characters
whose code point values are less than 256. By default, it uses a set of
@@ -339,12 +330,23 @@ library. They are also documented in the pcre2build man page.
Of course, the relevant libraries must be installed on your system.
-. The default size (in bytes) of the internal buffer used by pcre2grep can be
- set by, for example:
+. The default starting size (in bytes) of the internal buffer used by pcre2grep
+ can be set by, for example:
--with-pcre2grep-bufsize=51200
- The value must be a plain integer. The default is 20480.
+ The value must be a plain integer. The default is 20480. The amount of memory
+ used by pcre2grep is actually three times this number, to allow for "before"
+ and "after" lines. If very long lines are encountered, the buffer is
+ automatically enlarged, up to a fixed maximum size.
+
+. The default maximum size of pcre2grep's internal buffer can be set by, for
+ example:
+
+ --with-pcre2grep-max-bufsize=2097152
+
+ The default is either 1048576 or the value of --with-pcre2grep-bufsize,
+ whichever is the larger.
. It is possible to compile pcre2test so that it links with the libreadline
or libedit libraries, by specifying, respectively,
@@ -369,6 +371,29 @@ library. They are also documented in the pcre2build man page.
tgetflag, or tgoto, this is the problem, and linking with the ncurses library
should fix it.
+. There is a special option called --enable-fuzz-support for use by people who
+ want to run fuzzing tests on PCRE2. At present this applies only to the 8-bit
+ library. If set, it causes an extra library called libpcre2-fuzzsupport.a to
+ be built, but not installed. This contains a single function called
+ LLVMFuzzerTestOneInput() whose arguments are a pointer to a string and the
+ length of the string. When called, this function tries to compile the string
+ as a pattern, and if that succeeds, to match it. This is done both with no
+ options and with some random options bits that are generated from the string.
+ Setting --enable-fuzz-support also causes a binary called pcre2fuzzcheck to
+ be created. This is normally run under valgrind or used when PCRE2 is
+ compiled with address sanitizing enabled. It calls the fuzzing function and
+ outputs information about it is doing. The input strings are specified by
+ arguments: if an argument starts with "=" the rest of it is a literal input
+ string. Otherwise, it is assumed to be a file name, and the contents of the
+ file are the test string.
+
+. Releases before 10.30 could be compiled with --disable-stack-for-recursion,
+ which caused pcre2_match() to use individual blocks on the heap for
+ backtracking instead of recursive function calls (which use the stack). This
+ is now obsolete since pcre2_match() was refactored always to use the heap (in
+ a much more efficient way than before). This option is retained for backwards
+ compatibility, but has no effect other than to output a warning.
+
The "configure" script builds the following files for the basic C library:
. Makefile the makefile that builds the library
@@ -543,7 +568,7 @@ script creates the .txt and HTML forms of the documentation from the man pages.
Testing PCRE2
-------------
+-------------
To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
There is another script called RunGrepTest that tests the pcre2grep command.
@@ -635,32 +660,43 @@ with the perltest.sh script, and test 5 checking PCRE2-specific things.
Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
non-UTF mode and UTF-mode with Unicode property support, respectively.
-Test 8 checks some internal offsets and code size features; it is run only when
-the default "link size" of 2 is set (in other cases the sizes change) and when
-Unicode support is enabled.
+Test 8 checks some internal offsets and code size features, but it is run only
+when Unicode support is enabled. The output is different in 8-bit, 16-bit, and
+32-bit modes and for different link sizes, so there are different output files
+for each mode and link size.
Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
16-bit and 32-bit modes. These are tests that generate different output in
8-bit mode. Each pair are for general cases and Unicode support, respectively.
+
Test 13 checks the handling of non-UTF characters greater than 255 by
pcre2_dfa_match() in 16-bit and 32-bit modes.
-Test 14 contains a number of tests that must not be run with JIT. They check,
+Test 14 contains some special UTF and UCP tests that give different output for
+different code unit widths.
+
+Test 15 contains a number of tests that must not be run with JIT. They check,
among other non-JIT things, the match-limiting features of the intepretive
matcher.
-Test 15 is run only when JIT support is not available. It checks that an
+Test 16 is run only when JIT support is not available. It checks that an
attempt to use JIT has the expected behaviour.
-Test 16 is run only when JIT support is available. It checks JIT complete and
+Test 17 is run only when JIT support is available. It checks JIT complete and
partial modes, match-limiting under JIT, and other JIT-specific features.
-Tests 17 and 18 are run only in 8-bit mode. They check the POSIX interface to
+Tests 18 and 19 are run only in 8-bit mode. They check the POSIX interface to
the 8-bit library, without and with Unicode support, respectively.
-Test 19 checks the serialization functions by writing a set of compiled
+Test 20 checks the serialization functions by writing a set of compiled
patterns to a file, and then reloading and checking them.
+Tests 21 and 22 test \C support when the use of \C is not locked out, without
+and with UTF support, respectively. Test 23 tests \C when it is locked out.
+
+Tests 24 and 25 test the experimental pattern conversion functions, without and
+with UTF support, respectively.
+
Character tables
----------------
@@ -679,7 +715,7 @@ specified for ./configure, a different version of pcre2_chartables.c is built
by the program dftables (compiled from dftables.c), which uses the ANSI C
character handling functions such as isalnum(), isalpha(), isupper(),
islower(), etc. to build the table sources. This means that the default C
-locale which is set for your system will control the contents of these default
+locale that is set for your system will control the contents of these default
tables. You can change the default tables by editing pcre2_chartables.c and
then re-building PCRE2. If you do this, you should take care to ensure that the
file does not get automatically re-generated. The best way to do this is to
@@ -734,8 +770,10 @@ The distribution should contain the files listed below.
src/pcre2_compile.c )
src/pcre2_config.c )
src/pcre2_context.c )
+ src/pcre2_convert.c )
src/pcre2_dfa_match.c )
src/pcre2_error.c )
+ src/pcre2_extuni.c )
src/pcre2_find_bracket.c )
src/pcre2_jit_compile.c )
src/pcre2_jit_match.c ) sources for the functions in the library,
@@ -757,6 +795,7 @@ The distribution should contain the files listed below.
src/pcre2_xclass.c )
src/pcre2_printint.c debugging function that is used by pcre2test,
+ src/pcre2_fuzzsupport.c function for (optional) fuzzing support
src/config.h.in template for config.h, when built by "configure"
src/pcre2.h.in template for pcre2.h when built by "configure"
@@ -772,7 +811,6 @@ The distribution should contain the files listed below.
src/pcre2demo.c simple demonstration of coding calls to PCRE2
src/pcre2grep.c source of a grep utility that uses PCRE2
src/pcre2test.c comprehensive test program
- src/pcre2_printint.c part of pcre2test
src/pcre2_jit_test.c JIT test program
(C) Auxiliary files:
@@ -814,7 +852,7 @@ The distribution should contain the files listed below.
libpcre2-8.pc.in template for libpcre2-8.pc for pkg-config
libpcre2-16.pc.in template for libpcre2-16.pc for pkg-config
libpcre2-32.pc.in template for libpcre2-32.pc for pkg-config
- libpcre2posix.pc.in template for libpcre2posix.pc for pkg-config
+ libpcre2-posix.pc.in template for libpcre2-posix.pc for pkg-config
ltmain.sh file used to build a libtool script
missing ) common stub for a few missing GNU programs while
) installing, generated by automake
@@ -837,12 +875,12 @@ The distribution should contain the files listed below.
(E) Auxiliary files for building PCRE2 "by hand"
- pcre2.h.generic ) a version of the public PCRE2 header file
+ src/pcre2.h.generic ) a version of the public PCRE2 header file
) for use in non-"configure" environments
- config.h.generic ) a version of config.h for use in non-"configure"
+ src/config.h.generic ) a version of config.h for use in non-"configure"
) environments
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
-Last updated: 01 April 2016
+Last updated: 12 September 2017