New upstream version 10.31

author: Matthew Vernon <matthew@debian.org> 2018-02-24 12:07:04 +0000
committer: Matthew Vernon <matthew@debian.org> 2018-02-24 12:07:04 +0000
commit: e98c3314cf9e05aa99f5e192862ec37f29b7dbb5 (patch)
tree: b69bb3feb63a4fd79ad8a6e55865228f6fde04eb /ChangeLog
parent: 92b17f0eb8fddd7117c5344a1e1177daec21995a (diff)
1 files changed, 737 insertions, 0 deletions
diff --git a/ChangeLog b/ChangeLog
index 3dcebb9..7f520bf 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -2,6 +2,743 @@ Change Log for PCRE2
 --------------------
 
 
+Version 10.31 12-February-2018
+------------------------------
+
+1. Fix typo (missing ]) in VMS code in pcre2test.c.
+
+2. Replace the replicated code for matching extended Unicode grapheme sequences
+(which got a lot more complicated by change 10.30/49) by a single subroutine
+that is called by both pcre2_match() and pcre2_dfa_match().
+
+3. Add idempotent guard to pcre2_internal.h.
+
+4. Add new pcre2_config() options: PCRE2_CONFIG_NEVER_BACKSLASH_C and
+PCRE2_CONFIG_COMPILED_WIDTHS.
+
+5. Cut out \C tests in the JIT regression tests when NEVER_BACKSLASH_C is
+defined (e.g. by --enable-never-backslash-C).
+
+6. Defined public names for all the pcre2_compile() error numbers, and used
+the public names in pcre2_convert.c.
+
+7. Fixed a small memory leak in pcre2test (convert contexts).
+
+8. Added two casts to compile.c and one to match.c to avoid compiler warnings.
+
+9. Added code to pcre2grep when compiled under VMS to set the symbol
+PCRE2GREP_RC to the exit status, because VMS does not distinguish between
+exit(0) and exit(1).
+
+10. Added the -LM (list modifiers) option to pcre2test. Also made -C complain
+about a bad option only if the following argument item does not start with a
+hyphen.
+
+11. pcre2grep was truncating components of file names to 128 characters when
+processing files with the -r option, and also (some very odd code) truncating
+path names to 512 characters. There is now a check on the absolute length of
+full path file names, which may be up to 2047 characters long.
+
+12. When an assertion contained (*ACCEPT) it caused all open capturing groups
+to be closed (as for a non-assertion ACCEPT), which was wrong and could lead to
+misbehaviour for subsequent references to groups that started outside the
+assertion. ACCEPT in an assertion now closes only those groups that were
+started within that assertion. Fixes oss-fuzz issues 3852 and 3891.
+
+13. Multiline matching in pcre2grep was misbehaving if the pattern matched
+within a line, and then matched again at the end of the line and over into
+subsequent lines. Behaviour was different with and without colouring, and
+sometimes context lines were incorrectly printed and/or line endings were lost.
+All these issues should now be fixed.
+
+14. If --line-buffered was specified for pcre2grep when input was from a
+compressed file (.gz or .bz2) a segfault occurred. (Line buffering should be
+ignored for compressed files.)
+
+15. Although pcre2_jit_match checks whether the pattern is compiled
+in a given mode, it was also expected that at least one mode is available.
+This is fixed and pcre2_jit_match returns with PCRE2_ERROR_JIT_BADOPTION
+when the pattern is not optimized by JIT at all.
+
+16. The line number and related variables such as match counts in pcre2grep
+were all int variables, causing overflow when files with more than 2147483647
+lines were processed (assuming 32-bit ints). They have all been changed to
+unsigned long ints.
+
+17. If a backreference with a minimum repeat count of zero was first in a
+pattern, apart from assertions, an incorrect first matching character could be
+recorded. For example, for the pattern /(?=(a))\1?b/, "b" was incorrectly set
+as the first character of a match.
+
+18. Characters in a leading positive assertion are considered for recording a
+first character of a match when the rest of the pattern does not provide one.
+However, a character in a non-assertive group within a leading assertion such
+as in the pattern /(?=(a))\1?b/ caused this process to fail. This was an
+infelicity rather than an outright bug, because it did not affect the result of
+a match, just its speed. (In fact, in this case, the starting 'a' was
+subsequently picked up in the study.)
+
+19. A minor tidy in pcre2_match(): making all PCRE2_ERROR_ returns use "return"
+instead of "RRETURN" saves unwinding the backtracks in these cases (only one
+didn't).
+
+20. Allocate a single callout block on the stack at the start of pcre2_match()
+and set its never-changing fields once only. Do the same for pcre2_dfa_match().
+
+21. Save the extra compile options (set in the compile context) with the
+compiled pattern (they were not previously saved), add PCRE2_INFO_EXTRAOPTIONS
+to retrieve them, and update pcre2test to show them.
+
+22. Added PCRE2_CALLOUT_STARTMATCH and PCRE2_CALLOUT_BACKTRACK bits to a new
+field callout_flags in callout blocks. The bits are set by pcre2_match(), but
+not by JIT or pcre2_dfa_match(). Their settings are shown in pcre2test callouts
+if the callout_extra subject modifier is set. These bits are provided to help
+with tracking how a backtracking match is proceeding.
+
+23. Updated the pcre2demo.c demonstration program, which was missing the extra
+code for -g that handles the case when \K in an assertion causes the match to
+end at the original start point. Also arranged for it to detect when \K causes
+the end of a match to be before its start.
+
+24. Similar to 23 above, strange things (including loops) could happen in
+pcre2grep when \K was used in an assertion when --colour was used or in
+multiline mode. The "end at original start point" bug is fixed, and if the end
+point is found to be before the start point, they are swapped.
+
+25. When PCRE2_FIRSTLINE without PCRE2_NO_START_OPTIMIZE was used in non-JIT
+matching (both pcre2_match() and pcre2_dfa_match()) and the matched string
+started with the first code unit of a newline sequence, matching failed because
+it was not tried at the newline.
+
+26. Code for giving up a non-partial match after failing to find a starting
+code unit anywhere in the subject was missing when searching for one of a
+number of code units (the bitmap case) in both pcre2_match() and
+pcre2_dfa_match(). This was a missing optimization rather than a bug.
+
+27. Tidied up the ACROSSCHAR macro to be like FORWARDCHAR and BACKCHAR, using a
+pointer argument rather than a code unit value. This should not have affected
+the generated code.
+
+28. The JIT compiler has been updated.
+
+29. Avoid pointer overflow for unset captures in pcre2_substring_list_get().
+This could not actually cause a crash because it was always used in a memcpy()
+call with zero length.
+
+30. Some internal structures have a variable-length ovector[] as their last
+element. Their actual memory is obtained dynamically, giving an ovector of
+appropriate length. However, they are defined in the structure as
+ovector[NUMBER], where NUMBER is large so that array bound checkers don't
+grumble. The value of NUMBER was 10000, but a fuzzer exceeded 5000 capturing
+groups, making the ovector larger than this. The number has been increased to
+131072, which allows for the maximum number of captures (65535) plus the
+overall match. This fixes oss-fuzz issue 5415.
+
+31. Auto-possessification at the end of a capturing group was dependent on what
+follows the group (e.g. /(a+)b/ would auto-possessify the a+) but this caused
+incorrect behaviour when the group was called recursively from elsewhere in the
+pattern where something different might follow. This bug is an unforseen
+consequence of change #1 for 10.30 - the implementation of backtracking into
+recursions. Iterators at the ends of capturing groups are no longer considered
+for auto-possessification if the pattern contains any recursions. Fixes
+Bugzilla #2232.
+
+
+Version 10.30 14-August-2017
+----------------------------
+
+1. The main interpreter, pcre2_match(), has been refactored into a new version
+that does not use recursive function calls (and therefore the stack) for
+remembering backtracking positions. This makes --disable-stack-for-recursion a
+NOOP. The new implementation allows backtracking into recursive group calls in
+patterns, making it more compatible with Perl, and also fixes some other
+hard-to-do issues such as #1887 in Bugzilla. The code is also cleaner because
+the old code had a number of fudges to try to reduce stack usage. It seems to
+run no slower than the old code.
+
+A number of bugs in the refactored code were subsequently fixed during testing
+before release, but after the code was made available in the repository. These
+bugs were never in fully released code, but are noted here for the record.
+
+  (a) If a pattern had fewer capturing parentheses than the ovector supplied in
+      the match data block, a memory error (detectable by ASAN) occurred after
+      a match, because the external block was being set from non-existent
+      internal ovector fields. Fixes oss-fuzz issue 781.
+
+  (b) A pattern with very many capturing parentheses (when the internal frame
+      size was greater than the initial frame vector on the stack) caused a
+      crash. A vector on the heap is now set up at the start of matching if the
+      vector on the stack is not big enough to handle at least 10 frames.
+      Fixes oss-fuzz issue 783.
+
+  (c) Handling of (*VERB)s in recursions was wrong in some cases.
+
+  (d) Captures in negative assertions that were used as conditions were not
+      happening if the assertion matched via (*ACCEPT).
+
+  (e) Mark values were not being passed out of recursions.
+
+  (f) Refactor some code in do_callout() to avoid picky compiler warnings about
+      negative indices. Fixes oss-fuzz issue 1454.
+
+  (g) Similarly refactor the way the variable length ovector is addressed for
+      similar reasons. Fixes oss-fuzz issue 1465.
+
+2. Now that pcre2_match() no longer uses recursive function calls (see above),
+the "match limit recursion" value seems misnamed. It still exists, and limits
+the depth of tree that is searched. To avoid future confusion, it has been
+renamed as "depth limit" in all relevant places (--with-depth-limit,
+(*LIMIT_DEPTH), pcre2_set_depth_limit(), etc) but the old names are still
+available for backwards compatibility.
+
+3. Hardened pcre2test so as to reduce the number of bugs reported by fuzzers:
+
+  (a) Check for malloc failures when getting memory for the ovector (POSIX) or
+      the match data block (non-POSIX).
+
+4. In the 32-bit library in non-UTF mode, an attempt to find a Unicode property
+for a character with a code point greater than 0x10ffff (the Unicode maximum)
+caused a crash.
+
+5. If a lookbehind assertion that contained a back reference to a group
+appearing later in the pattern was compiled with the PCRE2_ANCHORED option,
+undefined actions (often a segmentation fault) could occur, depending on what
+other options were set. An example assertion is (?<!\1(abc)) where the
+reference \1 precedes the group (abc). This fixes oss-fuzz issue 865.
+
+6. Added the PCRE2_INFO_FRAMESIZE item to pcre2_pattern_info() and arranged for
+pcre2test to use it to output the frame size when the "framesize" modifier is
+given.
+
+7. Reworked the recursive pattern matching in the JIT compiler to follow the
+interpreter changes.
+
+8. When the zero_terminate modifier was specified on a pcre2test subject line
+for global matching, unpredictable things could happen. For example, in UTF-8
+mode, the pattern //g,zero_terminate read random memory when matched against an
+empty string with zero_terminate. This was a bug in pcre2test, not the library.
+
+9. Moved some Windows-specific code in pcre2grep (introduced in 10.23/13) out
+of the section that is compiled when Unix-style directory scanning is
+available, and into a new section that is always compiled for Windows.
+
+10. In pcre2test, explicitly close the file after an error during serialization
+or deserialization (the "load" or "save" commands).
+
+11. Fix memory leak in pcre2_serialize_decode() when the input is invalid.
+
+12. Fix potential NULL dereference in pcre2_callout_enumerate() if called with
+a NULL pattern pointer when Unicode support is available.
+
+13. When the 32-bit library was being tested by pcre2test, error messages that
+were longer than 64 code units could cause a buffer overflow. This was a bug in
+pcre2test.
+
+14. The alternative matching function, pcre2_dfa_match() misbehaved if it
+encountered a character class with a possessive repeat, for example [a-f]{3}+.
+
+15. The depth (formerly recursion) limit now applies to DFA matching (as
+of 10.23/36); pcre2test has been upgraded so that \=find_limits works with DFA
+matching to find the minimum value for this limit.
+
+16. Since 10.21, if pcre2_match() was called with a null context, default
+memory allocation functions were used instead of whatever was used when the
+pattern was compiled.
+
+17. Changes to the pcre2test "memory" modifier on a subject line. These apply
+only to pcre2_match():
+
+  (a) Warn if null_context is set on both pattern and subject, because the
+      memory details cannot then be shown.
+
+  (b) Remember (up to a certain number of) memory allocations and their
+      lengths, and list only the lengths, so as to be system-independent.
+      (In practice, the new interpreter never has more than 2 blocks allocated
+      simultaneously.)
+
+18. Make pcre2test detect an error return from pcre2_get_error_message(), give
+a message, and abandon the run (this would have detected #13 above).
+
+19. Implemented PCRE2_ENDANCHORED.
+
+20. Applied Jason Hood's patches (slightly modified) to pcre2grep, to implement
+the --output=text (-O) option and the inbuilt callout echo.
+
+21. Extend auto-anchoring etc. to ignore groups with a zero qualifier and
+single-branch conditions with a false condition (e.g. DEFINE) at the start of a
+branch. For example, /(?(DEFINE)...)^A/ and /(...){0}^B/ are now flagged as
+anchored.
+
+22. Added an explicit limit on the amount of heap used by pcre2_match(), set by
+pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx). Upgraded pcre2test to show the
+heap limit along with other pattern information, and to find the minimum when
+the find_limits modifier is set.
+
+23. Write to the last 8 bytes of the pcre2_real_code structure when a compiled
+pattern is set up so as to initialize any padding the compiler might have
+included. This avoids valgrind warnings when a compiled pattern is copied, in
+particular when it is serialized.
+
+24. Remove a redundant line of code left in accidentally a long time ago.
+
+25. Remove a duplication typo in pcre2_tables.c
+
+26. Correct an incorrect cast in pcre2_valid_utf.c
+
+27. Update pcre2test, remove some unused code in pcre2_match(), and upgrade the
+tests to improve coverage.
+
+28. Some fixes/tidies as a result of looking at Coverity Scan output:
+
+    (a) Typo: ">" should be ">=" in opcode check in pcre2_auto_possess.c.
+    (b) Added some casts to avoid "suspicious implicit sign extension".
+    (c) Resource leaks in pcre2test in rare error cases.
+    (d) Avoid warning for never-use case OP_TABLE_LENGTH which is just a fudge
+        for checking at compile time that tables are the right size.
+    (e) Add missing "fall through" comment.
+
+29. Implemented PCRE2_EXTENDED_MORE and related /xx and (?xx) features.
+
+30. Implement (?n: for PCRE2_NO_AUTO_CAPTURE, because Perl now has this.
+
+31. If more than one of "push", "pushcopy", or "pushtablescopy" were set in
+pcre2test, a crash could occur.
+
+32. Make -bigstack in RunTest allocate a 64Mb stack (instead of 16 MB) so that
+all the tests can run with clang's sanitizing options.
+
+33. Implement extra compile options in the compile context and add the first
+one: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.
+
+34. Implement newline type PCRE2_NEWLINE_NUL.
+
+35. A lookbehind assertion that had a zero-length branch caused undefined
+behaviour when processed by pcre2_dfa_match(). This is oss-fuzz issue 1859.
+
+36. The match limit value now also applies to pcre2_dfa_match() as there are
+patterns that can use up a lot of resources without necessarily recursing very
+deeply. (Compare item 10.23/36.) This should fix oss-fuzz #1761.
+
+37. Implement PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.
+
+38. Fix returned offsets from regexec() when REG_STARTEND is used with a
+starting offset greater than zero.
+
+39. Implement REG_PEND (GNU extension) for the POSIX wrapper.
+
+40. Implement the subject_literal modifier in pcre2test, and allow jitstack on
+pattern lines.
+
+41. Implement PCRE2_LITERAL and use it to support REG_NOSPEC.
+
+42. Implement PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD for the benefit
+of pcre2grep.
+
+43. Re-implement pcre2grep's -F, -w, and -x options using PCRE2_LITERAL,
+PCRE2_EXTRA_MATCH_WORD, and PCRE2_EXTRA_MATCH_LINE. This fixes two bugs:
+
+    (a) The -F option did not work for fixed strings containing \E.
+    (b) The -w option did not work for patterns with multiple branches.
+
+44. Added configuration options for the SELinux compatible execmem allocator in
+JIT.
+
+45. Increased the limit for searching for a "must be present" code unit in
+subjects from 1000 to 2000 for 8-bit searches, since they use memchr() and are
+much faster.
+
+46. Arrange for anchored patterns to record and use "first code unit" data,
+because this can give a fast "no match" without searching for a "required code
+unit". Previously only non-anchored patterns did this.
+
+47. Upgraded the Unicode tables from Unicode 8.0.0 to Unicode 10.0.0.
+
+48. Add the callout_no_where modifier to pcre2test.
+
+49. Update extended grapheme breaking rules to the latest set that are in
+Unicode Standard Annex #29.
+
+50. Added experimental foreign pattern conversion facilities
+(pcre2_pattern_convert() and friends).
+
+51. Change the macro FWRITE, used in pcre2grep, to FWRITE_IGNORE because FWRITE
+is defined in a system header in cygwin. Also modified some of the #ifdefs in
+pcre2grep related to Windows and Cygwin support.
+
+52. Change 3(g) for 10.23 was a bit too zealous. If a hyphen that follows a
+character class is the last character in the class, Perl does not give a
+warning. PCRE2 now also treats this as a literal.
+
+53. Related to 52, though PCRE2 was throwing an error for [[:digit:]-X] it was
+not doing so for [\d-X] (and similar escapes), as is documented.
+
+54. Fixed a MIPS issue in the JIT compiler reported by Joshua Kinard.
+
+55. Fixed a "maybe uninitialized" warning for class_uchardata in \p handling in
+pcre2_compile() which could never actually trigger (code should have been cut
+out when Unicode support is disabled).
+
+
+Version 10.23 14-February-2017
+------------------------------
+
+1. Extended pcre2test with the utf8_input modifier so that it is able to
+generate all possible 16-bit and 32-bit code unit values in non-UTF modes.
+
+2. In any wide-character mode (8-bit UTF or any 16-bit or 32-bit mode), without
+PCRE2_UCP set, a negative character type such as \D in a positive class should
+cause all characters greater than 255 to match, whatever else is in the class.
+There was a bug that caused this not to happen if a Unicode property item was
+added to such a class, for example [\D\P{Nd}] or [\W\pL].
+
+3. There has been a major re-factoring of the pcre2_compile.c file. Most syntax
+checking is now done in the pre-pass that identifies capturing groups. This has
+reduced the amount of duplication and made the code tidier. While doing this,
+some minor bugs and Perl incompatibilities were fixed, including:
+
+  (a) \Q\E in the middle of a quantifier such as A+\Q\E+ is now ignored instead
+      of giving an invalid quantifier error.
+
+  (b) {0} can now be used after a group in a lookbehind assertion; previously
+      this caused an "assertion is not fixed length" error.
+
+  (c) Perl always treats (?(DEFINE) as a "define" group, even if a group with
+      the name "DEFINE" exists. PCRE2 now does likewise.
+
+  (d) A recursion condition test such as (?(R2)...) must now refer to an
+      existing subpattern.
+
+  (e) A conditional recursion test such as (?(R)...) misbehaved if there was a
+      group whose name began with "R".
+
+  (f) When testing zero-terminated patterns under valgrind, the terminating
+      zero is now marked "no access". This catches bugs that would otherwise
+      show up only with non-zero-terminated patterns.
+
+  (g) A hyphen appearing immediately after a POSIX character class (for example
+      /[[:ascii:]-z]/) now generates an error. Perl does accept this as a
+      literal, but gives a warning, so it seems best to fail it in PCRE.
+
+  (h) An empty \Q\E sequence may appear after a callout that precedes an
+      assertion condition (it is, of course, ignored).
+
+One effect of the refactoring is that some error numbers and messages have
+changed, and the pattern offset given for compiling errors is not always the
+right-most character that has been read. In particular, for a variable-length
+lookbehind assertion it now points to the start of the assertion. Another
+change is that when a callout appears before a group, the "length of next
+pattern item" that is passed now just gives the length of the opening
+parenthesis item, not the length of the whole group. A length of zero is now
+given only for a callout at the end of the pattern. Automatic callouts are no
+longer inserted before and after explicit callouts in the pattern.
+
+A number of bugs in the refactored code were subsequently fixed during testing
+before release, but after the code was made available in the repository. Many
+of the bugs were discovered by fuzzing testing. Several of them were related to
+the change from assuming a zero-terminated pattern (which previously had
+required non-zero terminated strings to be copied). These bugs were never in
+fully released code, but are noted here for the record.
+
+  (a) An overall recursion such as (?0) inside a lookbehind assertion was not
+      being diagnosed as an error.
+
+  (b) In utf mode, the length of a *MARK (or other verb) name was being checked
+      in characters instead of code units, which could lead to bad code being
+      compiled, leading to unpredictable behaviour.
+
+  (c) In extended /x mode, characters whose code was greater than 255 caused
+      a lookup outside one of the global tables. A similar bug existed for wide
+      characters in *VERB names.
+
+  (d) The amount of memory needed for a compiled pattern was miscalculated if a
+      lookbehind contained more than one toplevel branch and the first branch
+      was of length zero.
+
+  (e) In UTF-8 or UTF-16 modes with PCRE2_EXTENDED (/x) set and a non-zero-
+      terminated pattern, if a # comment ran on to the end of the pattern, one
+      or more code units past the end were being read.
+
+  (f) An unterminated repeat at the end of a non-zero-terminated pattern (e.g.
+      "{2,2") could cause reading beyond the pattern.
+
+  (g) When reading a callout string, if the end delimiter was at the end of the
+      pattern one further code unit was read.
+
+  (h) An unterminated number after \g' could cause reading beyond the pattern.
+
+  (i) An insufficient memory size was being computed for compiling with
+      PCRE2_AUTO_CALLOUT.
+
+  (j) A conditional group with an assertion condition used more memory than was
+      allowed for it during parsing, so too many of them could therefore
+      overrun a buffer.
+
+  (k) If parsing a pattern exactly filled the buffer, the internal test for
+      overrun did not check when the final META_END item was added.
+
+  (l) If a lookbehind contained a subroutine call, and the called group
+      contained an option setting such as (?s), and the PCRE2_ANCHORED option
+      was set, unpredictable behaviour could occur. The underlying bug was
+      incorrect code and insufficient checking while searching for the end of
+      the called subroutine in the parsed pattern.
+
+  (m) Quantifiers following (*VERB)s were not being diagnosed as errors.
+
+  (n) The use of \Q...\E in a (*VERB) name when PCRE2_ALT_VERBNAMES and
+      PCRE2_AUTO_CALLOUT were both specified caused undetermined behaviour.
+
+  (o) If \Q was preceded by a quantified item, and the following \E was
+      followed by '?' or '+', and there was at least one literal character
+      between them, an internal error "unexpected repeat" occurred (example:
+      /.+\QX\E+/).
+
+  (p) A buffer overflow could occur while sorting the names in the group name
+      list (depending on the order in which the names were seen).
+
+  (q) A conditional group that started with a callout was not doing the right
+      check for a following assertion, leading to compiling bad code. Example:
+      /(?(C'XX))?!XX/
+
+  (r) If a character whose code point was greater than 0xffff appeared within
+      a lookbehind that was within another lookbehind, the calculation of the
+      lookbehind length went wrong and could provoke an internal error.
+
+  (t) The sequence \E- or \Q\E- after a POSIX class in a character class caused
+      an internal error. Now the hyphen is treated as a literal.
+
+4. Back references are now permitted in lookbehind assertions when there are
+no duplicated group numbers (that is, (?| has not been used), and, if the
+reference is by name, there is only one group of that name. The referenced
+group must, of course be of fixed length.
+
+5. pcre2test has been upgraded so that, when run under valgrind with valgrind
+support enabled, reading past the end of the pattern is detected, both when
+compiling and during callout processing.
+
+6. \g{+<number>} (e.g. \g{+2} ) is now supported. It is a "forward back
+reference" and can be useful in repetitions (compare \g{-<number>} ). Perl does
+not recognize this syntax.
+
+7. Automatic callouts are no longer generated before and after callouts in the
+pattern.
+
+8. When pcre2test was outputing information from a callout, the caret indicator
+for the current position in the subject line was incorrect if it was after an
+escape sequence for a character whose code point was greater than \x{ff}.
+
+9. Change 19 for 10.22 had a typo (PCRE_STATIC_RUNTIME should be
+PCRE2_STATIC_RUNTIME). Fix from David Gaussmann.
+
+10. Added --max-buffer-size to pcre2grep, to allow for automatic buffer
+expansion when long lines are encountered. Original patch by Dmitry
+Cherniachenko.
+
+11. If pcre2grep was compiled with JIT support, but the library was compiled
+without it (something that neither ./configure nor CMake allow, but it can be
+done by editing config.h), pcre2grep was giving a JIT error. Now it detects
+this situation and does not try to use JIT.
+
+12. Added some "const" qualifiers to variables in pcre2grep.
+
+13. Added Dmitry Cherniachenko's patch for colouring output in Windows
+(untested by me). Also, look for GREP_COLOUR or GREP_COLOR if the environment
+variables PCRE2GREP_COLOUR and PCRE2GREP_COLOR are not found.
+
+14. Add the -t (grand total) option to pcre2grep.
+
+15. A number of bugs have been mended relating to match start-up optimizations
+when the first thing in a pattern is a positive lookahead. These all applied
+only when PCRE2_NO_START_OPTIMIZE was *not* set:
+
+    (a) A pattern such as (?=.*X)X$ was incorrectly optimized as if it needed
+        both an initial 'X' and a following 'X'.
+    (b) Some patterns starting with an assertion that started with .* were
+        incorrectly optimized as having to match at the start of the subject or
+        after a newline. There are cases where this is not true, for example,
+        (?=.*[A-Z])(?=.{8,16})(?!.*[\s]) matches after the start in lines that
+        start with spaces. Starting .* in an assertion is no longer taken as an
+        indication of matching at the start (or after a newline).
+
+16. The "offset" modifier in pcre2test was not being ignored (as documented)
+when the POSIX API was in use.
+
+17. Added --enable-fuzz-support to "configure", causing an non-installed
+library containing a test function that can be called by fuzzers to be
+compiled. A non-installed  binary to run the test function locally, called
+pcre2fuzzcheck is also compiled.
+
+18. A pattern with PCRE2_DOTALL (/s) set but not PCRE2_NO_DOTSTAR_ANCHOR, and
+which started with .* inside a positive lookahead was incorrectly being
+compiled as implicitly anchored.
+
+19. Removed all instances of "register" declarations, as they are considered
+obsolete these days and in any case had become very haphazard.
+
+20. Add strerror() to pcre2test for failed file opening.
+
+21. Make pcre2test -C list valgrind support when it is enabled.
+
+22. Add the use_length modifier to pcre2test.
+
+23. Fix an off-by-one bug in pcre2test for the list of names for 'get' and
+'copy' modifiers.
+
+24. Add PCRE2_CALL_CONVENTION into the prototype declarations in pcre2.h as it
+is apparently needed there as well as in the function definitions. (Why did
+nobody ask for this in PCRE1?)
+
+25. Change the _PCRE2_H and _PCRE2_UCP_H guard macros in the header files to
+PCRE2_H_IDEMPOTENT_GUARD and PCRE2_UCP_H_IDEMPOTENT_GUARD to be more standard
+compliant and unique.
+
+26. pcre2-config --libs-posix was listing -lpcre2posix instead of
+-lpcre2-posix. Also, the CMake build process was building the library with the
+wrong name.
+
+27. In pcre2test, give some offset information for errors in hex patterns.
+This uses the C99 formatting sequence %td, except for MSVC which doesn't
+support it - %lu is used instead.
+
+28. Implemented pcre2_code_copy_with_tables(), and added pushtablescopy to
+pcre2test for testing it.
+
+29. Fix small memory leak in pcre2test.
+
+30. Fix out-of-bounds read for partial matching of /./ against an empty string
+when the newline type is CRLF.
+
+31. Fix a bug in pcre2test that caused a crash when a locale was set either in
+the current pattern or a previous one and a wide character was matched.
+
+32. The appearance of \p, \P, or \X in a substitution string when
+PCRE2_SUBSTITUTE_EXTENDED was set caused a segmentation fault (NULL
+dereference).
+
+33. If the starting offset was specified as greater than the subject length in
+a call to pcre2_substitute() an out-of-bounds memory reference could occur.
+
+34. When PCRE2 was compiled to use the heap instead of the stack for recursive
+calls to match(), a repeated minimizing caseless back reference, or a
+maximizing one where the two cases had different numbers of code units,
+followed by a caseful back reference, could lose the caselessness of the first
+repeated back reference (example: /(Z)(a)\2{1,2}?(?-i)\1X/i should match ZaAAZX
+but didn't).
+
+35. When a pattern is too complicated, PCRE2 gives up trying to find a minimum
+matching length and just records zero. Typically this happens when there are
+too many nested or recursive back references. If the limit was reached in
+certain recursive cases it failed to be triggered and an internal error could
+be the result.
+
+36. The pcre2_dfa_match() function now takes note of the recursion limit for
+the internal recursive calls that are used for lookrounds and recursions within
+the pattern.
+
+37. More refactoring has got rid of the internal could_be_empty_branch()
+function (around 400 lines of code, including comments) by keeping track of
+could-be-emptiness as the pattern is compiled instead of scanning compiled
+groups. (This would have been much harder before the refactoring of #3 above.)
+This lifts a restriction on the number of branches in a group (more than about
+1100 would give "pattern is too complicated").
+
+38. Add the "-ac" command line option to pcre2test as a synonym for "-pattern
+auto_callout".
+
+39. In a library with Unicode support, incorrect data was compiled for a
+pattern with PCRE2_UCP set without PCRE2_UTF if a class required all wide
+characters to match (for example, /[\s[:^ascii:]]/).
+
+40. The callout_error modifier has been added to pcre2test to make it possible
+to return PCRE2_ERROR_CALLOUT from a callout.
+
+41. A minor change to pcre2grep: colour reset is now "<esc>[0m" instead of
+"<esc>[00m".
+
+42. The limit in the auto-possessification code that was intended to catch
+overly-complicated patterns and not spend too much time auto-possessifying was
+being reset too often, resulting in very long compile times for some patterns.
+Now such patterns are no longer completely auto-possessified.
+
+43. Applied Jason Hood's revised patch for RunTest.bat.
+
+44. Added a new Windows script RunGrepTest.bat, courtesy of Jason Hood.
+
+45. Minor cosmetic fix to pcre2test: move a variable that is not used under
+Windows into the "not Windows" code.
+
+46. Applied Jason Hood's patches to upgrade pcre2grep under Windows and tidy
+some of the code:
+
+  * normalised the Windows condition by ensuring WIN32 is defined;
+  * enables the callout feature under Windows;
+  * adds globbing (Microsoft's implementation expands quoted args),
+    using a tweaked opendirectory;
+  * implements the is_*_tty functions for Windows;
+  * --color=always will write the ANSI sequences to file;
+  * add sequences 4 (underline works on Win10) and 5 (blink as bright
+    background, relatively standard on DOS/Win);
+  * remove the (char *) casts for the now-const strings;
+  * remove GREP_COLOUR (grep's command line allowed the 'u', but not
+    the environment), parsing GREP_COLORS instead;
+  * uses the current colour if not set, rather than black;
+  * add print_match for the undefined case;
+  * fixes a typo.
+
+In addition, colour settings containing anything other than digits and
+semicolon are ignored, and the colour controls are no longer output for empty
+strings.
+
+47. Detecting patterns that are too large inside the length-measuring loop
+saves processing ridiculously long patterns to their end.
+
+48. Ignore PCRE2_CASELESS when processing \h, \H, \v, and \V in classes as it
+just wastes time. In the UTF case it can also produce redundant entries in
+XCLASS lists caused by characters with multiple other cases and pairs of
+characters in the same "not-x" sublists.
+
+49. A pattern such as /(?=(a\K))/ can report the end of the match being before
+its start; pcre2test was not handling this correctly when using the POSIX
+interface (it was OK with the native interface).
+
+50. In pcre2grep, ignore all JIT compile errors. This means that pcre2grep will
+continue to work, falling back to interpretation if anything goes wrong with
+JIT.
+
+51. Applied patches from Christian Persch to configure.ac to make use of the
+AC_USE_SYSTEM_EXTENSIONS macro and to test for functions used by the JIT
+modules.
+
+52. Minor fixes to pcre2grep from Jason Hood:
+    * fixed some spacing;
+    * Windows doesn't usually use single quotes, so I've added a define
+      to use appropriate quotes [in an example];
+    * LC_ALL was displayed as "LCC_ALL";
+    * numbers 11, 12 & 13 should end in "th";
+    * use double quotes in usage message.
+
+53. When autopossessifying, skip empty branches without recursion, to reduce
+stack usage for the benefit of clang with -fsanitize-address, which uses huge
+stack frames. Example pattern: /X?(R||){3335}/. Fixes oss-fuzz issue 553.
+
+54. A pattern with very many explicit back references to a group that is a long
+way from the start of the pattern could take a long time to compile because
+searching for the referenced group in order to find the minimum length was
+being done repeatedly. Now up to 128 group minimum lengths are cached and the
+attempt to find a minimum length is abandoned if there is a back reference to a
+group whose number is greater than 128. (In that case, the pattern is so
+complicated that this optimization probably isn't worth it.) This fixes
+oss-fuzz issue 557.
+
+55. Issue 32 for 10.22 below was not correctly fixed. If pcre2grep in multiline
+mode with --only-matching matched several lines, it restarted scanning at the
+next line instead of moving on to the end of the matched string, which can be
+several lines after the start.
+
+56. Applied Jason Hood's new patch for RunGrepTest.bat that updates it in line
+with updates to the non-Windows version.
+
+
+
 Version 10.22 29-July-2016
 --------------------------
author	Matthew Vernon <matthew@debian.org>	2018-02-24 12:07:04 +0000
committer	Matthew Vernon <matthew@debian.org>	2018-02-24 12:07:04 +0000
commit	e98c3314cf9e05aa99f5e192862ec37f29b7dbb5 (patch)
tree	b69bb3feb63a4fd79ad8a6e55865228f6fde04eb /ChangeLog
parent	92b17f0eb8fddd7117c5344a1e1177daec21995a (diff)