New upstream version 10.31

author: Matthew Vernon <matthew@debian.org> 2018-02-24 12:07:04 +0000
committer: Matthew Vernon <matthew@debian.org> 2018-02-24 12:07:04 +0000
commit: e98c3314cf9e05aa99f5e192862ec37f29b7dbb5 (patch)
tree: b69bb3feb63a4fd79ad8a6e55865228f6fde04eb /README
parent: 92b17f0eb8fddd7117c5344a1e1177daec21995a (diff)
1 files changed, 125 insertions, 87 deletions
diff --git a/README b/README
index 03d67f6..52859a9 100644
--- a/README
+++ b/README
@@ -15,8 +15,8 @@ subscribe or manage your subscription here:
 
    https://lists.exim.org/mailman/listinfo/pcre-dev
 
-Please read the NEWS file if you are upgrading from a previous release.
-The contents of this README file are:
+Please read the NEWS file if you are upgrading from a previous release. The
+contents of this README file are:
 
   The PCRE2 APIs
   Documentation for PCRE2
@@ -44,8 +44,8 @@ wrappers.
 
 The distribution does contain a set of C wrapper functions for the 8-bit
 library that are based on the POSIX regular expression API (see the pcre2posix
-man page). These can be found in a library called libpcre2posix. Note that this
-just provides a POSIX calling interface to PCRE2; the regular expressions
+man page). These can be found in a library called libpcre2-posix. Note that
+this just provides a POSIX calling interface to PCRE2; the regular expressions
 themselves still follow Perl syntax and semantics. The POSIX API is restricted,
 and does not give full access to all of PCRE2's facilities.
 
@@ -58,8 +58,8 @@ renamed or pointed at by a link.
 If you are using the POSIX interface to PCRE2 and there is already a POSIX
 regex library installed on your system, as well as worrying about the regex.h
 header file (as mentioned above), you must also take care when linking programs
-to ensure that they link with PCRE2's libpcre2posix library. Otherwise they may
-pick up the POSIX functions of the same name from the other library.
+to ensure that they link with PCRE2's libpcre2-posix library. Otherwise they
+may pick up the POSIX functions of the same name from the other library.
 
 One way of avoiding this confusion is to compile PCRE2 with the addition of
 -Dregcomp=PCRE2regcomp (and similarly for the other POSIX functions) to the
@@ -95,10 +95,9 @@ PCRE2 documentation is supplied in two other forms:
 Building PCRE2 on non-Unix-like systems
 ---------------------------------------
 
-For a non-Unix-like system, please read the comments in the file
-NON-AUTOTOOLS-BUILD, though if your system supports the use of "configure" and
-"make" you may be able to build PCRE2 using autotools in the same way as for
-many Unix-like systems.
+For a non-Unix-like system, please read the file NON-AUTOTOOLS-BUILD, though if
+your system supports the use of "configure" and "make" you may be able to build
+PCRE2 using autotools in the same way as for many Unix-like systems.
 
 PCRE2 can also be configured using CMake, which can be run in various ways
 (command line, GUI, etc). This creates Makefiles, solution files, etc. The file
@@ -172,21 +171,24 @@ library. They are also documented in the pcre2build man page.
   give large performance improvements on certain platforms, add --enable-jit to
   the "configure" command. This support is available only for certain hardware
   architectures. If you try to enable it on an unsupported architecture, there
-  will be a compile time error.
-
-. If you do not want to make use of the support for UTF-8 Unicode character
-  strings in the 8-bit library, UTF-16 Unicode character strings in the 16-bit
-  library, or UTF-32 Unicode character strings in the 32-bit library, you can
-  add --disable-unicode to the "configure" command. This reduces the size of
-  the libraries. It is not possible to configure one library with Unicode
-  support, and another without, in the same configuration.
+  will be a compile time error. If you are running under SELinux you may also
+  want to add --enable-jit-sealloc, which enables the use of an execmem
+  allocator in JIT that is compatible with SELinux. This has no effect if JIT
+  is not enabled.
+
+. If you do not want to make use of the default support for UTF-8 Unicode
+  character strings in the 8-bit library, UTF-16 Unicode character strings in
+  the 16-bit library, or UTF-32 Unicode character strings in the 32-bit
+  library, you can add --disable-unicode to the "configure" command. This
+  reduces the size of the libraries. It is not possible to configure one
+  library with Unicode support, and another without, in the same configuration.
+  It is also not possible to use --enable-ebcdic (see below) with Unicode
+  support, so if this option is set, you must also use --disable-unicode.
 
   When Unicode support is available, the use of a UTF encoding still has to be
   enabled by setting the PCRE2_UTF option at run time or starting a pattern
   with (*UTF). When PCRE2 is compiled with Unicode support, its input can only
-  either be ASCII or UTF-8/16/32, even when running on EBCDIC platforms. It is
-  not possible to use both --enable-unicode and --enable-ebcdic at the same
-  time.
+  either be ASCII or UTF-8/16/32, even when running on EBCDIC platforms.
 
   As well as supporting UTF strings, Unicode support includes support for the
   \P, \p, and \X sequences that recognize Unicode character properties.
@@ -196,20 +198,14 @@ library. They are also documented in the pcre2build man page.
   or starting a pattern with (*UCP).
 
 . You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any
-  of the preceding, or any of the Unicode newline sequences, as indicating the
-  end of a line. Whatever you specify at build time is the default; the caller
-  of PCRE2 can change the selection at run time. The default newline indicator
-  is a single LF character (the Unix standard). You can specify the default
-  newline indicator by adding --enable-newline-is-cr, --enable-newline-is-lf,
-  --enable-newline-is-crlf, --enable-newline-is-anycrlf, or
-  --enable-newline-is-any to the "configure" command, respectively.
-
-  If you specify --enable-newline-is-cr or --enable-newline-is-crlf, some of
-  the standard tests will fail, because the lines in the test files end with
-  LF. Even if the files are edited to change the line endings, there are likely
-  to be some failures. With --enable-newline-is-anycrlf or
-  --enable-newline-is-any, many tests should succeed, but there may be some
-  failures.
+  of the preceding, or any of the Unicode newline sequences, or the NUL (zero)
+  character as indicating the end of a line. Whatever you specify at build time
+  is the default; the caller of PCRE2 can change the selection at run time. The
+  default newline indicator is a single LF character (the Unix standard). You
+  can specify the default newline indicator by adding --enable-newline-is-cr,
+  --enable-newline-is-lf, --enable-newline-is-crlf,
+  --enable-newline-is-anycrlf, --enable-newline-is-any, or
+  --enable-newline-is-nul to the "configure" command, respectively.
 
 . By default, the sequence \R in a pattern matches any Unicode line ending
   sequence. This is independent of the option specifying what PCRE2 considers
@@ -231,49 +227,44 @@ library. They are also documented in the pcre2build man page.
 
   --with-parens-nest-limit=500
 
-. PCRE2 has a counter that can be set to limit the amount of resources it uses
-  when matching a pattern. If the limit is exceeded during a match, the match
-  fails. The default is ten million. You can change the default by setting, for
-  example,
+. PCRE2 has a counter that can be set to limit the amount of computing resource
+  it uses when matching a pattern. If the limit is exceeded during a match, the
+  match fails. The default is ten million. You can change the default by
+  setting, for example,
 
   --with-match-limit=500000
 
   on the "configure" command. This is just the default; individual calls to
-  pcre2_match() can supply their own value. There is more discussion on the
-  pcre2api man page.
+  pcre2_match() or pcre2_dfa_match() can supply their own value. There is more
+  discussion in the pcre2api man page (search for pcre2_set_match_limit).
+
+. There is a separate counter that limits the depth of nested backtracking
+  during a matching process, which indirectly limits the amount of heap memory
+  that is used. This also has a default of ten million, which is essentially
+  "unlimited". You can change the default by setting, for example,
+
+  --with-match-limit-depth=5000
 
-. There is a separate counter that limits the depth of recursive function calls
-  during a matching process. This also has a default of ten million, which is
-  essentially "unlimited". You can change the default by setting, for example,
+  There is more discussion in the pcre2api man page (search for
+  pcre2_set_depth_limit).
 
-  --with-match-limit-recursion=500000
+. You can also set an explicit limit on the amount of heap memory used by
+  the pcre2_match() interpreter:
 
-  Recursive function calls use up the runtime stack; running out of stack can
-  cause programs to crash in strange ways. There is a discussion about stack
-  sizes in the pcre2stack man page.
+  --with-heap-limit=500
+
+  The units are kilobytes. This limit does not apply when the JIT optimization
+  (which has its own memory control features) is used. There is more discussion
+  on the pcre2api man page (search for pcre2_set_heap_limit).
 
 . In the 8-bit library, the default maximum compiled pattern size is around
-  64K. You can increase this by adding --with-link-size=3 to the "configure"
-  command. PCRE2 then uses three bytes instead of two for offsets to different
-  parts of the compiled pattern. In the 16-bit library, --with-link-size=3 is
-  the same as --with-link-size=4, which (in both libraries) uses four-byte
-  offsets. Increasing the internal link size reduces performance in the 8-bit
-  and 16-bit libraries. In the 32-bit library, the link size setting is
-  ignored, as 4-byte offsets are always used.
-
-. You can build PCRE2 so that its internal match() function that is called from
-  pcre2_match() does not call itself recursively. Instead, it uses memory
-  blocks obtained from the heap to save data that would otherwise be saved on
-  the stack. To build PCRE2 like this, use
-
-  --disable-stack-for-recursion
-
-  on the "configure" command. PCRE2 runs more slowly in this mode, but it may
-  be necessary in environments with limited stack sizes. This applies only to
-  the normal execution of the pcre2_match() function; if JIT support is being
-  successfully used, it is not relevant. Equally, it does not apply to
-  pcre2_dfa_match(), which does not use deeply nested recursion. There is a
-  discussion about stack sizes in the pcre2stack man page.
+  64K bytes. You can increase this by adding --with-link-size=3 to the
+  "configure" command. PCRE2 then uses three bytes instead of two for offsets
+  to different parts of the compiled pattern. In the 16-bit library,
+  --with-link-size=3 is the same as --with-link-size=4, which (in both
+  libraries) uses four-byte offsets. Increasing the internal link size reduces
+  performance in the 8-bit and 16-bit libraries. In the 32-bit library, the
+  link size setting is ignored, as 4-byte offsets are always used.
 
 . For speed, PCRE2 uses four tables for manipulating and identifying characters
   whose code point values are less than 256. By default, it uses a set of
@@ -339,12 +330,23 @@ library. They are also documented in the pcre2build man page.
 
   Of course, the relevant libraries must be installed on your system.
 
-. The default size (in bytes) of the internal buffer used by pcre2grep can be
-  set by, for example:
+. The default starting size (in bytes) of the internal buffer used by pcre2grep
+  can be set by, for example:
 
   --with-pcre2grep-bufsize=51200
 
-  The value must be a plain integer. The default is 20480.
+  The value must be a plain integer. The default is 20480. The amount of memory
+  used by pcre2grep is actually three times this number, to allow for "before"
+  and "after" lines. If very long lines are encountered, the buffer is
+  automatically enlarged, up to a fixed maximum size.
+
+. The default maximum size of pcre2grep's internal buffer can be set by, for
+  example:
+
+  --with-pcre2grep-max-bufsize=2097152
+
+  The default is either 1048576 or the value of --with-pcre2grep-bufsize,
+  whichever is the larger.
 
 . It is possible to compile pcre2test so that it links with the libreadline
   or libedit libraries, by specifying, respectively,
@@ -369,6 +371,29 @@ library. They are also documented in the pcre2build man page.
   tgetflag, or tgoto, this is the problem, and linking with the ncurses library
   should fix it.
 
+. There is a special option called --enable-fuzz-support for use by people who
+  want to run fuzzing tests on PCRE2. At present this applies only to the 8-bit
+  library. If set, it causes an extra library called libpcre2-fuzzsupport.a to
+  be built, but not installed. This contains a single function called
+  LLVMFuzzerTestOneInput() whose arguments are a pointer to a string and the
+  length of the string. When called, this function tries to compile the string
+  as a pattern, and if that succeeds, to match it. This is done both with no
+  options and with some random options bits that are generated from the string.
+  Setting --enable-fuzz-support also causes a binary called pcre2fuzzcheck to
+  be created. This is normally run under valgrind or used when PCRE2 is
+  compiled with address sanitizing enabled. It calls the fuzzing function and
+  outputs information about it is doing. The input strings are specified by
+  arguments: if an argument starts with "=" the rest of it is a literal input
+  string. Otherwise, it is assumed to be a file name, and the contents of the
+  file are the test string.
+
+. Releases before 10.30 could be compiled with --disable-stack-for-recursion,
+  which caused pcre2_match() to use individual blocks on the heap for
+  backtracking instead of recursive function calls (which use the stack). This
+  is now obsolete since pcre2_match() was refactored always to use the heap (in
+  a much more efficient way than before). This option is retained for backwards
+  compatibility, but has no effect other than to output a warning.
+
 The "configure" script builds the following files for the basic C library:
 
 . Makefile             the makefile that builds the library
@@ -543,7 +568,7 @@ script creates the .txt and HTML forms of the documentation from the man pages.
 
 
 Testing PCRE2
-------------
+-------------
 
 To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
 There is another script called RunGrepTest that tests the pcre2grep command.
@@ -635,32 +660,43 @@ with the perltest.sh script, and test 5 checking PCRE2-specific things.
 Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
 non-UTF mode and UTF-mode with Unicode property support, respectively.
 
-Test 8 checks some internal offsets and code size features; it is run only when
-the default "link size" of 2 is set (in other cases the sizes change) and when
-Unicode support is enabled.
+Test 8 checks some internal offsets and code size features, but it is run only
+when Unicode support is enabled. The output is different in 8-bit, 16-bit, and
+32-bit modes and for different link sizes, so there are different output files
+for each mode and link size.
 
 Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
 16-bit and 32-bit modes. These are tests that generate different output in
 8-bit mode. Each pair are for general cases and Unicode support, respectively.
+
 Test 13 checks the handling of non-UTF characters greater than 255 by
 pcre2_dfa_match() in 16-bit and 32-bit modes.
 
-Test 14 contains a number of tests that must not be run with JIT. They check,
+Test 14 contains some special UTF and UCP tests that give different output for
+different code unit widths.
+
+Test 15 contains a number of tests that must not be run with JIT. They check,
 among other non-JIT things, the match-limiting features of the intepretive
 matcher.
 
-Test 15 is run only when JIT support is not available. It checks that an
+Test 16 is run only when JIT support is not available. It checks that an
 attempt to use JIT has the expected behaviour.
 
-Test 16 is run only when JIT support is available. It checks JIT complete and
+Test 17 is run only when JIT support is available. It checks JIT complete and
 partial modes, match-limiting under JIT, and other JIT-specific features.
 
-Tests 17 and 18 are run only in 8-bit mode. They check the POSIX interface to
+Tests 18 and 19 are run only in 8-bit mode. They check the POSIX interface to
 the 8-bit library, without and with Unicode support, respectively.
 
-Test 19 checks the serialization functions by writing a set of compiled
+Test 20 checks the serialization functions by writing a set of compiled
 patterns to a file, and then reloading and checking them.
 
+Tests 21 and 22 test \C support when the use of \C is not locked out, without
+and with UTF support, respectively. Test 23 tests \C when it is locked out.
+
+Tests 24 and 25 test the experimental pattern conversion functions, without and
+with UTF support, respectively.
+
 
 Character tables
 ----------------
@@ -679,7 +715,7 @@ specified for ./configure, a different version of pcre2_chartables.c is built
 by the program dftables (compiled from dftables.c), which uses the ANSI C
 character handling functions such as isalnum(), isalpha(), isupper(),
 islower(), etc. to build the table sources. This means that the default C
-locale which is set for your system will control the contents of these default
+locale that is set for your system will control the contents of these default
 tables. You can change the default tables by editing pcre2_chartables.c and
 then re-building PCRE2. If you do this, you should take care to ensure that the
 file does not get automatically re-generated. The best way to do this is to
@@ -734,8 +770,10 @@ The distribution should contain the files listed below.
   src/pcre2_compile.c      )
   src/pcre2_config.c       )
   src/pcre2_context.c      )
+  src/pcre2_convert.c      )
   src/pcre2_dfa_match.c    )
   src/pcre2_error.c        )
+  src/pcre2_extuni.c       )
   src/pcre2_find_bracket.c )
   src/pcre2_jit_compile.c  )
   src/pcre2_jit_match.c    ) sources for the functions in the library,
@@ -757,6 +795,7 @@ The distribution should contain the files listed below.
   src/pcre2_xclass.c       )
 
   src/pcre2_printint.c     debugging function that is used by pcre2test,
+  src/pcre2_fuzzsupport.c  function for (optional) fuzzing support
 
   src/config.h.in          template for config.h, when built by "configure"
   src/pcre2.h.in           template for pcre2.h when built by "configure"
@@ -772,7 +811,6 @@ The distribution should contain the files listed below.
   src/pcre2demo.c          simple demonstration of coding calls to PCRE2
   src/pcre2grep.c          source of a grep utility that uses PCRE2
   src/pcre2test.c          comprehensive test program
-  src/pcre2_printint.c     part of pcre2test
   src/pcre2_jit_test.c     JIT test program
 
 (C) Auxiliary files:
@@ -814,7 +852,7 @@ The distribution should contain the files listed below.
   libpcre2-8.pc.in         template for libpcre2-8.pc for pkg-config
   libpcre2-16.pc.in        template for libpcre2-16.pc for pkg-config
   libpcre2-32.pc.in        template for libpcre2-32.pc for pkg-config
-  libpcre2posix.pc.in      template for libpcre2posix.pc for pkg-config
+  libpcre2-posix.pc.in     template for libpcre2-posix.pc for pkg-config
   ltmain.sh                file used to build a libtool script
   missing                  ) common stub for a few missing GNU programs while
                            )   installing, generated by automake
@@ -837,12 +875,12 @@ The distribution should contain the files listed below.
 
 (E) Auxiliary files for building PCRE2 "by hand"
 
-  pcre2.h.generic         ) a version of the public PCRE2 header file
+  src/pcre2.h.generic     ) a version of the public PCRE2 header file
                           )   for use in non-"configure" environments
-  config.h.generic        ) a version of config.h for use in non-"configure"
+  src/config.h.generic    ) a version of config.h for use in non-"configure"
                           )   environments
 
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 01 April 2016
+Last updated: 12 September 2017
author	Matthew Vernon <matthew@debian.org>	2018-02-24 12:07:04 +0000
committer	Matthew Vernon <matthew@debian.org>	2018-02-24 12:07:04 +0000
commit	e98c3314cf9e05aa99f5e192862ec37f29b7dbb5 (patch)
tree	b69bb3feb63a4fd79ad8a6e55865228f6fde04eb /README
parent	92b17f0eb8fddd7117c5344a1e1177daec21995a (diff)