field | value | date
---|---|---
author | Matthew Vernon <matthew@debian.org> | 2018-10-26 19:26:32 +0100
committer | Matthew Vernon <matthew@debian.org> | 2018-10-26 19:26:32 +0100
commit | 1cab70503159c32de523a1762614b6829687a116 |
tree | 26068f08ea492a5d12216d014a1a58372c12d360 /testdata |
parent | 39c4b070d68976779cdb3f2a9f886de962870a37 |
parent | b03dbaae48971b62fe6ce174a8dfbbcaf1314d7e |
Merge tag '10.32'
Upstream version 10.32
Diffstat (limited to 'testdata')
-rw-r--r-- | testdata/grepinput | 6
-rw-r--r-- | testdata/grepoutput | 34
-rw-r--r-- | testdata/testinput1 | 90
-rw-r--r-- | testdata/testinput15 | 13
-rw-r--r-- | testdata/testinput17 | 9
-rw-r--r-- | testdata/testinput18 | 4
-rw-r--r-- | testdata/testinput2 | 78
-rw-r--r-- | testdata/testinput22 | 6
-rw-r--r-- | testdata/testinput4 | 49
-rw-r--r-- | testdata/testinput5 | 65
-rw-r--r-- | testdata/testinput6 | 13
-rw-r--r-- | testdata/testoutput1 | 127
-rw-r--r-- | testdata/testoutput15 | 13
-rw-r--r-- | testdata/testoutput17 | 9
-rw-r--r-- | testdata/testoutput18 | 15
-rw-r--r-- | testdata/testoutput2 | 214
-rw-r--r-- | testdata/testoutput22-16 | 8
-rw-r--r-- | testdata/testoutput22-32 | 8
-rw-r--r-- | testdata/testoutput22-8 | 8
-rw-r--r-- | testdata/testoutput4 | 73
-rw-r--r-- | testdata/testoutput5 | 101
-rw-r--r-- | testdata/testoutput6 | 16
-rw-r--r-- | testdata/testoutput8-16-4 | 1022
-rw-r--r-- | testdata/testoutputEBC | 3
24 files changed, 1859 insertions, 125 deletions
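Assuming the per-file counts follow git's `--stat` convention (for a text file, the number shown is that file's inserted plus deleted lines), the 24 counts in the diffstat should add up to the 1859 insertions plus 125 deletions in the summary line. The following is a small Python sanity check with the numbers transcribed by hand from the diffstat; the dictionary and constant names are illustrative only.

```python
# Changed-line counts per file, transcribed from the diffstat
# (every file lives under testdata/). Assumption: each count is
# insertions + deletions for that file, as in git's --stat output.
diffstat = {
    "grepinput": 6,
    "grepoutput": 34,
    "testinput1": 90,
    "testinput15": 13,
    "testinput17": 9,
    "testinput18": 4,
    "testinput2": 78,
    "testinput22": 6,
    "testinput4": 49,
    "testinput5": 65,
    "testinput6": 13,
    "testoutput1": 127,
    "testoutput15": 13,
    "testoutput17": 9,
    "testoutput18": 15,
    "testoutput2": 214,
    "testoutput22-16": 8,
    "testoutput22-32": 8,
    "testoutput22-8": 8,
    "testoutput4": 73,
    "testoutput5": 101,
    "testoutput6": 16,
    "testoutput8-16-4": 1022,
    "testoutputEBC": 3,
}

# Totals from the summary line "24 files changed, 1859 insertions, 125 deletions".
FILES, INSERTIONS, DELETIONS = 24, 1859, 125

total_changed = sum(diffstat.values())
print(len(diffstat), total_changed, INSERTIONS + DELETIONS)  # 24 1984 1984

# The per-file counts and the summary line agree.
assert len(diffstat) == FILES
assert total_changed == INSERTIONS + DELETIONS
```

The check passes, which confirms that the summary totals are consistent with the per-file counts. Note that testdata/testoutput8-16-4 alone accounts for more than half of the changed lines.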
diff --git a/testdata/grepinput b/testdata/grepinput index b01643d..1e2ceb4 100644 --- a/testdata/grepinput +++ b/testdata/grepinput @@ -1,6 +1,6 @@ This is a file of miscellaneous text that is used as test data for checking -that the pcregrep command is working correctly. The file must be more than 24K -long so that it needs more than a single read() call to process it. New +that the pcregrep command is working correctly. The file must be more than +24KiB long so that it needs more than a single read() call to process it. New features should be added at the end, because some of the tests involve the output of line numbers, and we don't want these to change. @@ -9,7 +9,7 @@ In the middle of a line, PATTERN appears. This pattern is in lower case. -Here follows a whole lot of stuff that makes the file over 24K long. +Here follows a whole lot of stuff that makes the file over 24KiB long. ------------------------------------------------------------------------------- The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the diff --git a/testdata/grepoutput b/testdata/grepoutput index e49c2b2..2bd69be 100644 --- a/testdata/grepoutput +++ b/testdata/grepoutput @@ -346,7 +346,7 @@ RC=0 ./testdata/grepinput-9- ./testdata/grepinput:10:This pattern is in lower case. ./testdata/grepinput-11- -./testdata/grepinput-12-Here follows a whole lot of stuff that makes the file over 24K long. +./testdata/grepinput-12-Here follows a whole lot of stuff that makes the file over 24KiB long. ./testdata/grepinput-13- -- ./testdata/grepinput:623:Check up on PATTERN near the end. @@ -379,6 +379,7 @@ RC=0 ./testdata/grepinputx RC=0 ---------------------------- Test 37 ----------------------------- +24KiB long so that it needs more than a single read() call to process it. New aaaaa0 aaaaa2 010203040506 @@ -465,11 +466,11 @@ fox [1;31mjumps[0m This time it [1;31mjumps[0m and [1;31mjumps[0m and [1;31mjumps[0m. 
RC=0 ---------------------------- Test 53 ------------------------------ -36972,6 -36990,4 -37024,4 -37066,5 -37083,4 +36976,6 +36994,4 +37028,4 +37070,5 +37087,4 RC=0 ---------------------------- Test 54 ------------------------------ 595:15,6 @@ -519,8 +520,8 @@ RC=0 pcre2grep: pcre2_match() gave error -47 while matching text that starts: This is a file of miscellaneous text that is used as test data for checking -that the pcregrep command is working correctly. The file must be more than 24K -long so that it needs more than a single read +that the pcregrep command is working correctly. The file must be more than +24KiB long so that it needs more than a single re pcre2grep: Error -46, -47, -53 or -63 means that a resource limit was exceeded. pcre2grep: Check your regex for nested unlimited loops. @@ -529,8 +530,8 @@ RC=1 pcre2grep: pcre2_match() gave error -53 while matching text that starts: This is a file of miscellaneous text that is used as test data for checking -that the pcregrep command is working correctly. The file must be more than 24K -long so that it needs more than a single read +that the pcregrep command is working correctly. The file must be more than +24KiB long so that it needs more than a single re pcre2grep: Error -46, -47, -53 or -63 means that a resource limit was exceeded. pcre2grep: Check your regex for nested unlimited loops. 
@@ -814,11 +815,11 @@ RC=0 615:0,12 RC=0 ---------------------------- Test 112 ----------------------------- -37168,12 -37180,12 -37192,12 -37204,12 -37216,12 +37172,12 +37184,12 +37196,12 +37208,12 +37220,12 RC=0 ---------------------------- Test 113 ----------------------------- 480 @@ -945,3 +946,6 @@ RC=0 RC=0 [1;31ma[0mb[1;31mc[0md RC=0 +---------------------------- Test 126 ----------------------------- +ABC +RC=0 diff --git a/testdata/testinput1 b/testdata/testinput1 index 9a9c5fd..d8615ee 100644 --- a/testdata/testinput1 +++ b/testdata/testinput1 @@ -2184,6 +2184,11 @@ Blah blah blaH blah +/((?i)blah)\s+(?m)A(?i:\1)/ + blah ABLAH +\= Expect no match + blah aBLAH + /(?>a*)*/ a aa @@ -5157,14 +5162,6 @@ name)/mark /A(*MARK:A)A+(*SKIP:B)(B|Z) | AAC/x,mark AAAC -/a(*PRUNE:X)bc|qq/mark,no_start_optimize -\= Expect no match - axy - -/a(*THEN:X)bc|qq/mark,no_start_optimize -\= Expect no match - axy - /(?=a(*MARK:A)b)..x/mark abxy \= Expect no match @@ -6189,4 +6186,81 @@ ef) x/x,mark /(?=a+)a(a+)++b/ aab +/(?<=\G.)/g,aftertext + abc + +/(?<=(?=.)?)/ + +/(?<=(?=.)?+)/ + +/(?<=(?=.)*)/ + +/(?<=(?=.){4,5})/ + +/(?<=(?=.){4,5}x)/ + +/a(?=.(*:X))(*SKIP:X)(*F)|(.)/ + abc + +/a(?>(*:X))(*SKIP:X)(*F)|(.)/ + abc + +/a(?:(*:X))(*SKIP:X)(*F)|(.)/ + abc + +#pattern no_start_optimize + +/(?>a(*:1))(?>b(*:1))(*SKIP:1)x|.*/ + abc + +/(?>a(*:1))(?>b)(*SKIP:1)x|.*/ + abc + +#subject mark + +/a(*ACCEPT:X)b/ + abc + +/(?=a(*ACCEPT:QQ)bc)axyz/ + axyz + +/(?(DEFINE)(a(*ACCEPT:X)))(?1)b/ + abc + +/a(*F:X)b/ + abc + +/(?(DEFINE)(a(*F:X)))(?1)b/ + abc + +/a(*COMMIT:X)b/ + abc + +/(?(DEFINE)(a(*COMMIT:X)))(?1)b/ + abc + +/a+(*:Z)b(*COMMIT:X)(*SKIP:Z)c|.*/ + aaaabd + +/a+(*:Z)b(*COMMIT:X)(*SKIP:X)c|.*/ + aaaabd + +/a(*COMMIT:X)b/ + axabc + +#pattern -no_start_optimize +#subject -mark + +/(.COMMIT)(*COMMIT::::::::::interal error:::)/ + +/(*COMMIT:)/ + +/(*COMMIT:]w)/ + +/(?i)A(?^)B(?^x:C D)(?^i)e f/ + aBCDE F +\= Expect no match + aBCDEF + AbCDe f + # End of testinput1 diff --git 
a/testdata/testinput15 b/testdata/testinput15 index cd12ad1..2ef6672 100644 --- a/testdata/testinput15 +++ b/testdata/testinput15 @@ -46,32 +46,45 @@ /(*LIMIT_DEPTH=4294967280)abc/I /(a+)*zz/ +\= Expect no match aaaaaaaaaaaaaz +\= Expect limit exceeded aaaaaaaaaaaaaz\=match_limit=3000 /(a+)*zz/ +\= Expect limit exceeded aaaaaaaaaaaaaz\=depth_limit=10 /(*LIMIT_MATCH=3000)(a+)*zz/I +\= Expect limit exceeded aaaaaaaaaaaaaz +\= Expect limit exceeded aaaaaaaaaaaaaz\=match_limit=60000 /(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I +\= Expect limit exceeded aaaaaaaaaaaaaz /(*LIMIT_MATCH=60000)(a+)*zz/I +\= Expect no match aaaaaaaaaaaaaz +\= Expect limit exceeded aaaaaaaaaaaaaz\=match_limit=3000 /(*LIMIT_DEPTH=10)(a+)*zz/I +\= Expect limit exceeded aaaaaaaaaaaaaz +\= Expect limit exceeded aaaaaaaaaaaaaz\=depth_limit=1000 /(*LIMIT_DEPTH=10)(*LIMIT_DEPTH=1000)(a+)*zz/I +\= Expect no match aaaaaaaaaaaaaz /(*LIMIT_DEPTH=1000)(a+)*zz/I +\= Expect no match aaaaaaaaaaaaaz +\= Expect limit exceeded aaaaaaaaaaaaaz\=depth_limit=10 # These three have infinitely nested recursions. diff --git a/testdata/testinput17 b/testdata/testinput17 index 9a73ef1..0944151 100644 --- a/testdata/testinput17 +++ b/testdata/testinput17 @@ -160,10 +160,13 @@ aaaaaaaaaaaaaz\=match_limit=3000 /(*LIMIT_MATCH=3000)(a+)*zz/I +\= Expect limit exceeded aaaaaaaaaaaaaz +\= Expect limit exceeded aaaaaaaaaaaaaz\=match_limit=60000 /(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I +\= Expect limit exceeded aaaaaaaaaaaaaz /(*LIMIT_MATCH=60000)(a+)*zz/I @@ -175,12 +178,15 @@ # These three have infinitely nested recursions. 
/((?2))((?1))/ +\= Expect JIT stack limit reached abc /((?(R2)a+|(?1)b))()/ +\= Expect JIT stack limit reached aaaabcde /(?(R)a*(?1)|((?R))b)/ +\= Expect JIT stack limit reached aaaabcde # Invalid options disable JIT when called via pcre2_match(), causing the @@ -277,7 +283,8 @@ /[axm]{7}/ /(.|.)*?bx/ - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabax +\= Expect limit exceeded + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabax\=match_limit=10000000 # Test JIT disable diff --git a/testdata/testinput18 b/testdata/testinput18 index 755a0c9..563a506 100644 --- a/testdata/testinput18 +++ b/testdata/testinput18 @@ -134,4 +134,8 @@ /a\b(c/literal,posix,dotall +/((a)(b)?(c))/posix + 123ace + 123ace\=posix_startend=2:6 + # End of testdata/testinput18 diff --git a/testdata/testinput2 b/testdata/testinput2 index 5d3a80e..fc94b35 100644 --- a/testdata/testinput2 +++ b/testdata/testinput2 @@ -910,6 +910,8 @@ /[:x:]/I +/\F/I + /\l/I /\L/I @@ -2949,10 +2951,11 @@ /abc(*:)pqr/ -/abc(*FAIL:123)xyz/ +/(*COMMIT:X)/B # This should, and does, fail. In Perl, it does not, which I think is a # bug because replacing the B in the pattern by (B|D) does make it fail. +# Turning off Perl's optimization by inserting (??{""}) also makes it fail. 
/A(*COMMIT)B/aftertext,mark \= Expect no match @@ -4007,6 +4010,9 @@ /(?(VERSION>=10.0)yes|no)/I yesno +/(?(VERSION>=10.04)yes|no)/ + yesno + /(?(VERSION=8)yes){3}/BI,aftertext yesno @@ -4643,6 +4649,9 @@ B)x/alt_verbnames,mark /(?=a\K)/replace=z BaCaD + +/(?<=\K.)/g,replace=- + ab /(?'abcdefghijklmnopqrstuvwxyzABCDEFG'toolong)/ @@ -4935,6 +4944,9 @@ a)"xI //replace=0 \=offset=7 +/(?<=\G.)/g,replace=+ + abc + ".+\QX\E+"B,no_auto_possess ".+\QX\E+"B,auto_callout,no_auto_possess @@ -5429,4 +5441,68 @@ a)"xI /(?=a+)a(a+)++b/B +/(?<=(?=.){4,5}x)/B + +# Perl behaves differently with these when optimization is turned off + +/a(*PRUNE:X)bc|qq/mark,no_start_optimize +\= Expect no match + axy + +/a(*THEN:X)bc|qq/mark,no_start_optimize +\= Expect no match + axy + +/(?^x-i)AB/ + +/(?^-i)AB/ + +/(?x-i-i)/ + +/(?(?=^))b/I + abc + +/(?(?=^)|)b/I + abc + +/(?(?=^)|^)b/I + bbc +\= Expect no match + abc + +/(?(1)^|^())/I + +/(?(1)^())b/I + +/(?(1)^())+b/I,aftertext + abc + +/(?(1)^()|^)+b/I,aftertext + bbc +\= Expect no match + abc + +/(?(1)^()|^)*b/I,aftertext + bbc + abc + xbc + +/(?(1)^())+b/I,aftertext + abc + +/(?(1)^a()|^a)+b/I,aftertext + abc +\= Expect no match + bbc + +/(?(1)^|^(a))+b/I,aftertext + abc +\= Expect no match + bbc + +/(?(1)^a()|^a)*b/I,aftertext + abc + bbc + xbc + # End of testinput2 diff --git a/testdata/testinput22 b/testdata/testinput22 index e6d4053..5e01fdc 100644 --- a/testdata/testinput22 +++ b/testdata/testinput22 @@ -98,4 +98,10 @@ \= Expect no match - tests \C at end of subject ab +/\C[^\v]+\x80/utf + [AΏBŀC] + +/\C[^\d]+\x80/utf + [AΏBŀC] + # End of testinput22 diff --git a/testdata/testinput4 b/testdata/testinput4 index 0ef7b8e..a27b6af 100644 --- a/testdata/testinput4 +++ b/testdata/testinput4 @@ -1394,28 +1394,15 @@ \x{6e9} \x{6ef} \x{6fa} -\= Expect no match - \x{650} - \x{651} - \x{652} - \x{653} - \x{654} - \x{655} - + /^\p{Cyrillic}/utf \x{1d2b} /^\p{Common}/utf - \x{589} - \x{60c} - \x{61f} - \x{964} - \x{965} + \x{2116} + \x{1D183} 
/^\p{Inherited}/utf - \x{64b} - \x{654} - \x{655} \x{200c} \= Expect no match \x{64a} @@ -2300,5 +2287,35 @@ \x{123}\x{122}\x{123} \= Expect no match \x{123}\x{124}\x{123} + +/\N{U+1234}/utf + \x{1234} + +/[\N{U+1234}]/utf + \x{1234} + +# Test the full list of Unicode "Pattern White Space" characters that are to +# be ignored by /x. The pattern lines below may show up oddly in text editors +# or when listed to the screen. Note that characters such as U+2002, which are +# matched as space by \h and \v are *not* "Pattern White Space". + +/A
B/x,utf + AB + +/A B/x,utf + A\x{2002}B +\= Expect no match + AB + +# ------- + +/[^\x{100}-\x{ffff}]*[\x80-\xff]/utf + \x{99}\x{99}\x{99} + +/[^\x{100}-\x{ffff}ABC]*[\x80-\xff]/utf + \x{99}\x{99}\x{99} + +/[^\x{100}-\x{ffff}]*[\x80-\xff]/i,utf + \x{99}\x{99}\x{99} # End of testinput4 diff --git a/testdata/testinput5 b/testdata/testinput5 index 0366136..687de32 100644 --- a/testdata/testinput5 +++ b/testdata/testinput5 @@ -2030,8 +2030,8 @@ # to test 4. /^(\p{Adlam}+)(\p{Bhaiksuki}+)(\p{Marchen}+)(\p{Newa}+)(\p{Osage}+) - (\p{Tangut}+)(\p{Masaram_Gondi}+)(\p{Nushu}+)(\p{Soyombo}+) - (\p{Zanabazar_Square}+)/x,utf + (\p{Tangut}+)(\p{Masaram_Gondi}+)(\p{Nushu}+)(\p{Soyombo}+) + (\p{Zanabazar_Square}+)/x,utf \x{1E900}\x{1E924}\x{1E953}\x{11C00}\x{11C2D}\x{11C3E}\x{11C70}\x{11C77}\x{11CAB}\x{11400}\x{1142F}\x{11455}\x{104B0}\x{104D8}\x{104FB}\x{16FE0}\x{18800}\x{18AF2}\x{11D00}\x{11D3A}\x{11D59}\x{16FE1}\x{1B170}\x{1B2FB}\x{11A50}\x{11A58}\x{11AA2}\x{11A00}\x{11A07}\x{11A47} /^\x{1E900}\x{104B0}/i,utf @@ -2041,23 +2041,70 @@ /^(?:(\X)(?C))+$/utf \x{1E900}\x{1E924}\x{1E953}\x{11C00}\x{11C2D}\x{11C3E}\x{11C70}\x{11C77}\x{11CAB}\x{11400}\x{1142F}\x{11455}\x{104B0}\x{104D8}\x{104FB}\x{16FE0}\x{18800}\x{18AF2}\x{11D00}\x{11D3A}\x{11D59}\x{16FE1}\x{1B170}\x{1B2FB}\x{11A50}\x{11A58}\x{11AA2}\x{11A00}\x{11A07}\x{11A47}\=callout_capture,callout_no_where -# These two are here because JIT is not yet updated. Also, the very first data -# line is handled differently by Perl. +# Similarly for Unicode 11.0.0 + +/^(\p{Dogra}+)(\p{Gunjala_Gondi}+)(\p{Hanifi_Rohingya}+)(\p{Makasar}+) + (\p{Medefaidrin}+)(\p{Old_Sogdian}+)(\p{Sogdian}+)/x,utf + \x{11800}\x{11da9}\x{10d27}\x{11ee0}\x{16e48}\x{10f27}\x{10f30} + +# These two are here because of differences from Perl. 
/^\X/utf A\x{200d}B A ZWJ - \x{261D}\x{1F3FB}B E_Base E_Modifier - \x{1F466}\x{1F3FF}B E_Base_GAZ E_Modifier - \x{200d}\x{1F3A4}B ZWJ Glue_After_ZWJ - \x{200d}\x{1F469}B ZWJ E_Base_GAZ + \x{261d}\x{261d}B Extended_Pictographic Extended_Pictographic + \x{261D}\x{1F3FB}B Extended_Pictographic Extend \x{1F1E6}\x{1F1E7}B RegionalIndicator RegionalIndicator - \x{261D}\x{E0100}\x{1F3FB}B E_Base Extend E_Modifier + \x{261D}\x{1F3FB}\x{261d}B Extended_Pictographic Extend E-P + \x{261D}\x{1F3FB}\x{200d}\x{261d}B Extended_Pictographic Extend ZWJ E-P # Regional indicators /^(\X)(\X)/utf,aftertext \x{1F1E6}\x{1F1E7}\x{1F1E7}B \x{1F1E6}\x{1F1E7}\x{1F1E7}\x{1F1E6}B + +# More differences from Perl + +/^[\p{Arabic}]/utf +\= Expect no match + \x{650} + \x{651} + \x{652} + \x{653} + \x{654} + \x{655} + +/^\p{Common}/utf + \x{589} + \x{60c} + \x{61f} + \x{964} + \x{965} + +/^\p{Inherited}/utf + \x{64b} + \x{654} + \x{655} + \x{1D1AA} +/\N{U+}/ + +/\N{U+}/utf + +/\N{U}/ + +# This tests the non-UTF Unicode NEL pattern whitespace character, only +# recognized by PCRE2 with /x when there is Unicode support. + +/A +
B/x + AB + +# This tests Unicode Pattern White Space characters in verb names when they +# are being processed with PCRE2_EXTENDED. Note: there are UTF-8 characters +# with code points greater than 255 between A, B, and C in the pattern. + +/(*: AB
C)abc/x,utf,mark,alt_verbnames + abc # End of testinput5 diff --git a/testdata/testinput6 b/testdata/testinput6 index e2f00c0..f7dedb2 100644 --- a/testdata/testinput6 +++ b/testdata/testinput6 @@ -4874,6 +4874,14 @@ \= Expect depth limit exceeded a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00] +/(*LIMIT_HEAP=0)^((.)(?1)|.)$/ +\= Expect heap limit exceeded + a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00] + +/(*LIMIT_HEAP=50000)^((.)(?1)|.)$/ +\= Expect success + a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00] + /(02-)?[0-9]{3}-[0-9]{3}/ 02-123-123 @@ -4929,8 +4937,9 @@ /(?<=|abc)/endanchored abcde\=aftertext -/(*LIMIT_MATCH=100).*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00\x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););/no_dotstar_anchor -.*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00\x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?);); 
+/(*LIMIT_MATCH=100).*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00\x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););/no_dotstar_anchor +\= Expect limit exceeded +.*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00\x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?);); /\n/firstline xyz\nabc diff --git a/testdata/testoutput1 b/testdata/testoutput1 index 9c55be9..77b9ff0 100644 --- a/testdata/testoutput1 +++ b/testdata/testoutput1 @@ -3346,6 +3346,14 @@ No match 0: blaH blah 1: blaH +/((?i)blah)\s+(?m)A(?i:\1)/ + blah ABLAH + 0: blah ABLAH + 1: blah +\= Expect no match + blah aBLAH +No match + /(?>a*)*/ a 0: a @@ -8282,16 +8290,6 @@ No match, mark = m AAAC 0: AAC -/a(*PRUNE:X)bc|qq/mark,no_start_optimize -\= Expect no match - axy -No match, mark = X - -/a(*THEN:X)bc|qq/mark,no_start_optimize -\= Expect no match - axy -No match, mark = X - /(?=a(*MARK:A)b)..x/mark abxy 0: abx @@ -9822,4 +9820,113 @@ No match 0: aab 1: a +/(?<=\G.)/g,aftertext + abc + 0: + 0+ bc + 0: + 0+ c + 0: + 0+ + +/(?<=(?=.)?)/ + +/(?<=(?=.)?+)/ + +/(?<=(?=.)*)/ + +/(?<=(?=.){4,5})/ + +/(?<=(?=.){4,5}x)/ + +/a(?=.(*:X))(*SKIP:X)(*F)|(.)/ + abc + 0: a + 1: a + +/a(?>(*:X))(*SKIP:X)(*F)|(.)/ + abc + 0: a + 1: a + +/a(?:(*:X))(*SKIP:X)(*F)|(.)/ + abc + 0: b 
+ 1: b + +#pattern no_start_optimize + +/(?>a(*:1))(?>b(*:1))(*SKIP:1)x|.*/ + abc + 0: abc + +/(?>a(*:1))(?>b)(*SKIP:1)x|.*/ + abc + 0: abc + +#subject mark + +/a(*ACCEPT:X)b/ + abc + 0: a +MK: X + +/(?=a(*ACCEPT:QQ)bc)axyz/ + axyz + 0: axyz +MK: QQ + +/(?(DEFINE)(a(*ACCEPT:X)))(?1)b/ + abc + 0: ab +MK: X + +/a(*F:X)b/ + abc +No match, mark = X + +/(?(DEFINE)(a(*F:X)))(?1)b/ + abc +No match, mark = X + +/a(*COMMIT:X)b/ + abc + 0: ab +MK: X + +/(?(DEFINE)(a(*COMMIT:X)))(?1)b/ + abc + 0: ab +MK: X + +/a+(*:Z)b(*COMMIT:X)(*SKIP:Z)c|.*/ + aaaabd + 0: bd + +/a+(*:Z)b(*COMMIT:X)(*SKIP:X)c|.*/ + aaaabd +No match, mark = X + +/a(*COMMIT:X)b/ + axabc +No match, mark = X + +#pattern -no_start_optimize +#subject -mark + +/(.COMMIT)(*COMMIT::::::::::interal error:::)/ + +/(*COMMIT:)/ + +/(*COMMIT:]w)/ + +/(?i)A(?^)B(?^x:C D)(?^i)e f/ + aBCDE F + 0: aBCDE F +\= Expect no match + aBCDEF +No match + AbCDe f +No match + # End of testinput1 diff --git a/testdata/testoutput15 b/testdata/testoutput15 index b2068d0..d09e781 100644 --- a/testdata/testoutput15 +++ b/testdata/testoutput15 @@ -124,12 +124,15 @@ Last code unit = 'c' Subject length lower bound = 3 /(a+)*zz/ +\= Expect no match aaaaaaaaaaaaaz No match +\= Expect limit exceeded aaaaaaaaaaaaaz\=match_limit=3000 Failed: error -47: match limit exceeded /(a+)*zz/ +\= Expect limit exceeded aaaaaaaaaaaaaz\=depth_limit=10 Failed: error -53: matching depth limit exceeded @@ -139,8 +142,10 @@ Match limit = 3000 Starting code units: a z Last code unit = 'z' Subject length lower bound = 2 +\= Expect limit exceeded aaaaaaaaaaaaaz Failed: error -47: match limit exceeded +\= Expect limit exceeded aaaaaaaaaaaaaz\=match_limit=60000 Failed: error -47: match limit exceeded @@ -150,6 +155,7 @@ Match limit = 3000 Starting code units: a z Last code unit = 'z' Subject length lower bound = 2 +\= Expect limit exceeded aaaaaaaaaaaaaz Failed: error -47: match limit exceeded @@ -159,8 +165,10 @@ Match limit = 60000 Starting code units: a z Last code 
unit = 'z' Subject length lower bound = 2 +\= Expect no match aaaaaaaaaaaaaz No match +\= Expect limit exceeded aaaaaaaaaaaaaz\=match_limit=3000 Failed: error -47: match limit exceeded @@ -170,8 +178,10 @@ Depth limit = 10 Starting code units: a z Last code unit = 'z' Subject length lower bound = 2 +\= Expect limit exceeded aaaaaaaaaaaaaz Failed: error -53: matching depth limit exceeded +\= Expect limit exceeded aaaaaaaaaaaaaz\=depth_limit=1000 Failed: error -53: matching depth limit exceeded @@ -181,6 +191,7 @@ Depth limit = 1000 Starting code units: a z Last code unit = 'z' Subject length lower bound = 2 +\= Expect no match aaaaaaaaaaaaaz No match @@ -190,8 +201,10 @@ Depth limit = 1000 Starting code units: a z Last code unit = 'z' Subject length lower bound = 2 +\= Expect no match aaaaaaaaaaaaaz No match +\= Expect limit exceeded aaaaaaaaaaaaaz\=depth_limit=10 Failed: error -53: matching depth limit exceeded diff --git a/testdata/testoutput17 b/testdata/testoutput17 index a0606a7..acf00e0 100644 --- a/testdata/testoutput17 +++ b/testdata/testoutput17 @@ -300,8 +300,10 @@ Starting code units: a z Last code unit = 'z' Subject length lower bound = 2 JIT compilation was successful +\= Expect limit exceeded aaaaaaaaaaaaaz Failed: error -47: match limit exceeded +\= Expect limit exceeded aaaaaaaaaaaaaz\=match_limit=60000 Failed: error -47: match limit exceeded @@ -312,6 +314,7 @@ Starting code units: a z Last code unit = 'z' Subject length lower bound = 2 JIT compilation was successful +\= Expect limit exceeded aaaaaaaaaaaaaz Failed: error -47: match limit exceeded @@ -332,14 +335,17 @@ Failed: error -47: match limit exceeded # These three have infinitely nested recursions. 
/((?2))((?1))/ +\= Expect JIT stack limit reached abc Failed: error -46: JIT stack limit reached /((?(R2)a+|(?1)b))()/ +\= Expect JIT stack limit reached aaaabcde Failed: error -46: JIT stack limit reached /(?(R)a*(?1)|((?R))b)/ +\= Expect JIT stack limit reached aaaabcde Failed: error -46: JIT stack limit reached @@ -516,7 +522,8 @@ Failed: error -46: JIT stack limit reached /[axm]{7}/ /(.|.)*?bx/ - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabax +\= Expect limit exceeded + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabax\=match_limit=10000000 Failed: error -47: match limit exceeded # Test JIT disable diff --git a/testdata/testoutput18 b/testdata/testoutput18 index d51423d..d6e3c71 100644 --- a/testdata/testoutput18 +++ b/testdata/testoutput18 @@ -46,6 +46,7 @@ defabc\=noteol 0: def 1: def + 2: <unset> 3: def /the quick brown fox/ @@ -206,4 +207,18 @@ No match: POSIX code 17: match failed /a\b(c/literal,posix,dotall Failed: POSIX code 16: bad argument at offset 0 +/((a)(b)?(c))/posix + 123ace + 0: ac + 1: ac + 2: a + 3: <unset> + 4: c + 123ace\=posix_startend=2:6 + 0: ac + 1: ac + 2: a + 3: <unset> + 4: c + # End of testdata/testinput18 diff --git a/testdata/testoutput2 b/testdata/testoutput2 index fcaac8f..ecf0d80 100644 --- a/testdata/testoutput2 +++ b/testdata/testoutput2 @@ -3244,20 +3244,23 @@ Failed: error 113 at offset 0: POSIX collating elements are not supported /[:x:]/I Failed: error 112 at offset 0: POSIX named classes are supported only within a class +/\F/I +Failed: error 137 at offset 2: PCRE2 does not support \F, \L, \l, \N{name}, \U, or \u + /\l/I -Failed: error 137 at offset 2: PCRE does not support \L, \l, \N{name}, \U, or \u +Failed: error 137 at offset 2: PCRE2 does not support \F, \L, \l, \N{name}, \U, or \u /\L/I -Failed: error 137 at offset 2: PCRE does not support \L, \l, \N{name}, \U, or \u +Failed: error 137 at offset 2: PCRE2 does not support \F, \L, \l, \N{name}, \U, or \u /\N{name}/I -Failed: error 137 at offset 2: PCRE does not support \L, \l, \N{name}, \U, 
or \u +Failed: error 137 at offset 2: PCRE2 does not support \F, \L, \l, \N{name}, \U, or \u /\u/I -Failed: error 137 at offset 2: PCRE does not support \L, \l, \N{name}, \U, or \u +Failed: error 137 at offset 2: PCRE2 does not support \F, \L, \l, \N{name}, \U, or \u /\U/I -Failed: error 137 at offset 2: PCRE does not support \L, \l, \N{name}, \U, or \u +Failed: error 137 at offset 2: PCRE2 does not support \F, \L, \l, \N{name}, \U, or \u /a{1,3}b/ungreedy ab @@ -10154,11 +10157,17 @@ Failed: error 166 at offset 10: (*MARK) must have an argument /abc(*:)pqr/ Failed: error 166 at offset 6: (*MARK) must have an argument -/abc(*FAIL:123)xyz/ -Failed: error 159 at offset 10: an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT) +/(*COMMIT:X)/B +------------------------------------------------------------------ + Bra + *COMMIT X + Ket + End +------------------------------------------------------------------ # This should, and does, fail. In Perl, it does not, which I think is a # bug because replacing the B in the pattern by (B|D) does make it fail. +# Turning off Perl's optimization by inserting (??{""}) also makes it fail. /A(*COMMIT)B/aftertext,mark \= Expect no match @@ -13188,7 +13197,7 @@ Failed: error 167 at offset 5: non-hex character in \x{} (closing brace missing? Failed: error 167 at offset 7: non-hex character in \x{} (closing brace missing?) 
/^A\x{/ -Failed: error 178 at offset 5: digits missing in \x{} or \o{} +Failed: error 178 at offset 5: digits missing in \x{} or \o{} or \N{U+} /[ab]++/B,no_auto_possess ------------------------------------------------------------------ @@ -13402,7 +13411,7 @@ Failed: error 133 at offset 7: parentheses are too deeply nested (stack check) Failed: error 155 at offset 2: missing opening brace after \o /\o{}/ -Failed: error 178 at offset 3: digits missing in \x{} or \o{} +Failed: error 178 at offset 3: digits missing in \x{} or \o{} or \N{U+} /\o{whatever}/ Failed: error 164 at offset 3: non-octal character in \o{} (closing brace missing?) @@ -13410,7 +13419,7 @@ Failed: error 164 at offset 3: non-octal character in \o{} (closing brace missin /\xthing/ /\x{}/ -Failed: error 178 at offset 3: digits missing in \x{} or \o{} +Failed: error 178 at offset 3: digits missing in \x{} or \o{} or \N{U+} /\x{whatever}/ Failed: error 167 at offset 3: non-hex character in \x{} (closing brace missing?) @@ -13483,6 +13492,10 @@ Subject length lower bound = 2 yesno 0: yes +/(?(VERSION>=10.04)yes|no)/ + yesno + 0: yes + /(?(VERSION=8)yes){3}/BI,aftertext ------------------------------------------------------------------ Bra @@ -13537,7 +13550,7 @@ Failed: error 179 at offset 11: syntax error or number too big in (?(VERSION con Failed: error 179 at offset 16: syntax error or number too big in (?(VERSION condition /(?(VERSION=10.101)yes|no)/ -Failed: error 179 at offset 17: syntax error or number too big in (?(VERSION condition +Failed: error 179 at offset 16: syntax error or number too big in (?(VERSION condition /abcd/I Capturing subpattern count = 0 @@ -14899,7 +14912,11 @@ Subject length lower bound = 1 /(?=a\K)/replace=z BaCaD -Failed: error -60: match with end before start is not supported +Failed: error -60: match with end before start or start moved backwards is not supported + +/(?<=\K.)/g,replace=- + ab +Failed: error -60: match with end before start or start moved backwards is 
not supported /(?'abcdefghijklmnopqrstuvwxyzABCDEFG'toolong)/ Failed: error 148 at offset 36: subpattern name is too long (maximum 32 characters) @@ -15545,6 +15562,10 @@ Failed: error -57 at offset 2 in replacement: bad escape sequence in replacement \=offset=7 Failed: error -33: bad offset value +/(?<=\G.)/g,replace=+ + abc + 3: a+b+c+ + ".+\QX\E+"B,no_auto_possess ------------------------------------------------------------------ Bra @@ -16575,8 +16596,175 @@ No match End ------------------------------------------------------------------ +/(?<=(?=.){4,5}x)/B +------------------------------------------------------------------ + Bra + AssertB + Reverse + Assert + Any + Ket + x + Ket + Ket + End +------------------------------------------------------------------ + +# Perl behaves differently with these when optimization is turned off + +/a(*PRUNE:X)bc|qq/mark,no_start_optimize +\= Expect no match + axy +No match, mark = X + +/a(*THEN:X)bc|qq/mark,no_start_optimize +\= Expect no match + axy +No match, mark = X + +/(?^x-i)AB/ +Failed: error 194 at offset 4: invalid hyphen in option setting + +/(?^-i)AB/ +Failed: error 194 at offset 3: invalid hyphen in option setting + +/(?x-i-i)/ +Failed: error 194 at offset 5: invalid hyphen in option setting + +/(?(?=^))b/I +Capturing subpattern count = 0 +Last code unit = 'b' +Subject length lower bound = 1 + abc + 0: b + +/(?(?=^)|)b/I +Capturing subpattern count = 0 +First code unit = 'b' +Subject length lower bound = 1 + abc + 0: b + +/(?(?=^)|^)b/I +Capturing subpattern count = 0 +Compile options: <none> +Overall options: anchored +First code unit = 'b' +Subject length lower bound = 1 + bbc + 0: b +\= Expect no match + abc +No match + +/(?(1)^|^())/I +Capturing subpattern count = 1 +Max back reference = 1 +May match empty string +Compile options: <none> +Overall options: anchored +Subject length lower bound = 0 + +/(?(1)^())b/I +Capturing subpattern count = 1 +Max back reference = 1 +Last code unit = 'b' +Subject length lower 
bound = 1 + +/(?(1)^())+b/I,aftertext +Capturing subpattern count = 1 +Max back reference = 1 +Last code unit = 'b' +Subject length lower bound = 1 + abc + 0: b + 0+ c + +/(?(1)^()|^)+b/I,aftertext +Capturing subpattern count = 1 +Max back reference = 1 +Compile options: <none> +Overall options: anchored +First code unit = 'b' +Subject length lower bound = 1 + bbc + 0: b + 0+ bc +\= Expect no match + abc +No match + +/(?(1)^()|^)*b/I,aftertext +Capturing subpattern count = 1 +Max back reference = 1 +First code unit = 'b' +Subject length lower bound = 1 + bbc + 0: b + 0+ bc + abc + 0: b + 0+ c + xbc + 0: b + 0+ c + +/(?(1)^())+b/I,aftertext +Capturing subpattern count = 1 +Max back reference = 1 +Last code unit = 'b' +Subject length lower bound = 1 + abc + 0: b + 0+ c + +/(?(1)^a()|^a)+b/I,aftertext +Capturing subpattern count = 1 +Max back reference = 1 +Compile options: <none> +Overall options: anchored +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 2 + abc + 0: ab + 0+ c +\= Expect no match + bbc +No match + +/(?(1)^|^(a))+b/I,aftertext +Capturing subpattern count = 1 +Max back reference = 1 +Compile options: <none> +Overall options: anchored +Last code unit = 'b' +Subject length lower bound = 1 + abc + 0: ab + 0+ c + 1: a +\= Expect no match + bbc +No match + +/(?(1)^a()|^a)*b/I,aftertext +Capturing subpattern count = 1 +Max back reference = 1 +Last code unit = 'b' +Subject length lower bound = 1 + abc + 0: ab + 0+ c + bbc + 0: b + 0+ bc + xbc + 0: b + 0+ c + # End of testinput2 -Error -65: PCRE2_ERROR_BADDATA (unknown error number) +Error -70: PCRE2_ERROR_BADDATA (unknown error number) Error -62: bad serialized data Error -2: partial match Error -1: no match diff --git a/testdata/testoutput22-16 b/testdata/testoutput22-16 index 88f827c..df29e14 100644 --- a/testdata/testoutput22-16 +++ b/testdata/testoutput22-16 @@ -171,4 +171,12 @@ No match ab No match +/\C[^\v]+\x80/utf + [AΏBŀC] +No match + +/\C[^\d]+\x80/utf + [AΏBŀC] +No match + 
# End of testinput22 diff --git a/testdata/testoutput22-32 b/testdata/testoutput22-32 index ac485fc..f0b7984 100644 --- a/testdata/testoutput22-32 +++ b/testdata/testoutput22-32 @@ -169,4 +169,12 @@ No match ab No match +/\C[^\v]+\x80/utf + [AΏBŀC] +No match + +/\C[^\d]+\x80/utf + [AΏBŀC] +No match + # End of testinput22 diff --git a/testdata/testoutput22-8 b/testdata/testoutput22-8 index 3d31fbc..0a04aa8 100644 --- a/testdata/testoutput22-8 +++ b/testdata/testoutput22-8 @@ -173,4 +173,12 @@ No match ab No match +/\C[^\v]+\x80/utf + [AΏBŀC] +No match + +/\C[^\d]+\x80/utf + [AΏBŀC] +No match + # End of testinput22 diff --git a/testdata/testoutput4 b/testdata/testoutput4 index 6056e6d..ba3df37 100644 --- a/testdata/testoutput4 +++ b/testdata/testoutput4 @@ -2293,43 +2293,18 @@ No match 0: \x{6ef} \x{6fa} 0: \x{6fa} -\= Expect no match - \x{650} -No match - \x{651} -No match - \x{652} -No match - \x{653} -No match - \x{654} -No match - \x{655} -No match - + /^\p{Cyrillic}/utf \x{1d2b} 0: \x{1d2b} /^\p{Common}/utf - \x{589} - 0: \x{589} - \x{60c} - 0: \x{60c} - \x{61f} - 0: \x{61f} - \x{964} - 0: \x{964} - \x{965} - 0: \x{965} + \x{2116} + 0: \x{2116} + \x{1D183} + 0: \x{1d183} /^\p{Inherited}/utf - \x{64b} - 0: \x{64b} - \x{654} - 0: \x{654} - \x{655} - 0: \x{655} \x{200c} 0: \x{200c} \= Expect no match @@ -3728,5 +3703,43 @@ No match \= Expect no match \x{123}\x{124}\x{123} No match + +/\N{U+1234}/utf + \x{1234} + 0: \x{1234} + +/[\N{U+1234}]/utf + \x{1234} + 0: \x{1234} + +# Test the full list of Unicode "Pattern White Space" characters that are to +# be ignored by /x. The pattern lines below may show up oddly in text editors +# or when listed to the screen. Note that characters such as U+2002, which are +# matched as space by \h and \v are *not* "Pattern White Space". + +/A
B/x,utf + AB + 0: AB + +/A B/x,utf + A\x{2002}B + 0: A\x{2002}B +\= Expect no match + AB +No match + +# ------- + +/[^\x{100}-\x{ffff}]*[\x80-\xff]/utf + \x{99}\x{99}\x{99} + 0: \x{99}\x{99}\x{99} + +/[^\x{100}-\x{ffff}ABC]*[\x80-\xff]/utf + \x{99}\x{99}\x{99} + 0: \x{99}\x{99}\x{99} + +/[^\x{100}-\x{ffff}]*[\x80-\xff]/i,utf + \x{99}\x{99}\x{99} + 0: \x{99}\x{99}\x{99} # End of testinput4 diff --git a/testdata/testoutput5 b/testdata/testoutput5 index 4b3171c..51caa18 100644 --- a/testdata/testoutput5 +++ b/testdata/testoutput5 @@ -4593,8 +4593,8 @@ No match # to test 4. /^(\p{Adlam}+)(\p{Bhaiksuki}+)(\p{Marchen}+)(\p{Newa}+)(\p{Osage}+) - (\p{Tangut}+)(\p{Masaram_Gondi}+)(\p{Nushu}+)(\p{Soyombo}+) - (\p{Zanabazar_Square}+)/x,utf + (\p{Tangut}+)(\p{Masaram_Gondi}+)(\p{Nushu}+)(\p{Soyombo}+) + (\p{Zanabazar_Square}+)/x,utf \x{1E900}\x{1E924}\x{1E953}\x{11C00}\x{11C2D}\x{11C3E}\x{11C70}\x{11C77}\x{11CAB}\x{11400}\x{1142F}\x{11455}\x{104B0}\x{104D8}\x{104FB}\x{16FE0}\x{18800}\x{18AF2}\x{11D00}\x{11D3A}\x{11D59}\x{16FE1}\x{1B170}\x{1B2FB}\x{11A50}\x{11A58}\x{11AA2}\x{11A00}\x{11A07}\x{11A47} 0: \x{1e900}\x{1e924}\x{1e953}\x{11c00}\x{11c2d}\x{11c3e}\x{11c70}\x{11c77}\x{11cab}\x{11400}\x{1142f}\x{11455}\x{104b0}\x{104d8}\x{104fb}\x{16fe0}\x{18800}\x{18af2}\x{11d00}\x{11d3a}\x{11d59}\x{16fe1}\x{1b170}\x{1b2fb}\x{11a50}\x{11a58}\x{11aa2}\x{11a00}\x{11a07}\x{11a47} 1: \x{1e900}\x{1e924}\x{1e953} @@ -4667,24 +4667,35 @@ Callout 0: last capture = 1 0: \x{1e900}\x{1e924}\x{1e953}\x{11c00}\x{11c2d}\x{11c3e}\x{11c70}\x{11c77}\x{11cab}\x{11400}\x{1142f}\x{11455}\x{104b0}\x{104d8}\x{104fb}\x{16fe0}\x{18800}\x{18af2}\x{11d00}\x{11d3a}\x{11d59}\x{16fe1}\x{1b170}\x{1b2fb}\x{11a50}\x{11a58}\x{11aa2}\x{11a00}\x{11a07}\x{11a47} 1: \x{11a00}\x{11a07}\x{11a47} -# These two are here because JIT is not yet updated. Also, the very first data -# line is handled differently by Perl. 
+# Similarly for Unicode 11.0.0 + +/^(\p{Dogra}+)(\p{Gunjala_Gondi}+)(\p{Hanifi_Rohingya}+)(\p{Makasar}+) + (\p{Medefaidrin}+)(\p{Old_Sogdian}+)(\p{Sogdian}+)/x,utf + \x{11800}\x{11da9}\x{10d27}\x{11ee0}\x{16e48}\x{10f27}\x{10f30} + 0: \x{11800}\x{11da9}\x{10d27}\x{11ee0}\x{16e48}\x{10f27}\x{10f30} + 1: \x{11800} + 2: \x{11da9} + 3: \x{10d27} + 4: \x{11ee0} + 5: \x{16e48} + 6: \x{10f27} + 7: \x{10f30} + +# These two are here because of differences from Perl. /^\X/utf A\x{200d}B A ZWJ 0: A\x{200d} - \x{261D}\x{1F3FB}B E_Base E_Modifier + \x{261d}\x{261d}B Extended_Pictographic Extended_Pictographic + 0: \x{261d}\x{261d} + \x{261D}\x{1F3FB}B Extended_Pictographic Extend 0: \x{261d}\x{1f3fb} - \x{1F466}\x{1F3FF}B E_Base_GAZ E_Modifier - 0: \x{1f466}\x{1f3ff} - \x{200d}\x{1F3A4}B ZWJ Glue_After_ZWJ - 0: \x{200d}\x{1f3a4} - \x{200d}\x{1F469}B ZWJ E_Base_GAZ - 0: \x{200d}\x{1f469} \x{1F1E6}\x{1F1E7}B RegionalIndicator RegionalIndicator 0: \x{1f1e6}\x{1f1e7} - \x{261D}\x{E0100}\x{1F3FB}B E_Base Extend E_Modifier - 0: \x{261d}\x{e0100}\x{1f3fb} + \x{261D}\x{1F3FB}\x{261d}B Extended_Pictographic Extend E-P + 0: \x{261d}\x{1f3fb}\x{261d} + \x{261D}\x{1F3FB}\x{200d}\x{261d}B Extended_Pictographic Extend ZWJ E-P + 0: \x{261d}\x{1f3fb}\x{200d}\x{261d} # Regional indicators @@ -4699,6 +4710,70 @@ Callout 0: last capture = 1 0+ B 1: \x{1f1e6}\x{1f1e7} 2: \x{1f1e7}\x{1f1e6} + +# More differences from Perl + +/^[\p{Arabic}]/utf +\= Expect no match + \x{650} +No match + \x{651} +No match + \x{652} +No match + \x{653} +No match + \x{654} +No match + \x{655} +No match + +/^\p{Common}/utf + \x{589} + 0: \x{589} + \x{60c} + 0: \x{60c} + \x{61f} + 0: \x{61f} + \x{964} + 0: \x{964} + \x{965} + 0: \x{965} + +/^\p{Inherited}/utf + \x{64b} + 0: \x{64b} + \x{654} + 0: \x{654} + \x{655} + 0: \x{655} + \x{1D1AA} + 0: \x{1d1aa} + +/\N{U+}/ +Failed: error 193 at offset 2: \N{U+dddd} is supported only in Unicode (UTF) mode + +/\N{U+}/utf +Failed: error 178 at offset 5: digits missing in \x{} or 
\o{} or \N{U+} + +/\N{U}/ +Failed: error 137 at offset 2: PCRE2 does not support \F, \L, \l, \N{name}, \U, or \u + +# This tests the non-UTF Unicode NEL pattern whitespace character, only +# recognized by PCRE2 with /x when there is Unicode support. + +/A +
B/x + AB + 0: AB + +# This tests Unicode Pattern White Space characters in verb names when they +# are being processed with PCRE2_EXTENDED. Note: there are UTF-8 characters +# with code points greater than 255 between A, B, and C in the pattern. +/(*: AB
C)abc/x,utf,mark,alt_verbnames + abc + 0: abc +MK: ABC # End of testinput5 diff --git a/testdata/testoutput6 b/testdata/testoutput6 index b409fe0..caec833 100644 --- a/testdata/testoutput6 +++ b/testdata/testoutput6 @@ -7667,12 +7667,23 @@ No match a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00] Failed: error -53: matching depth limit exceeded +/(*LIMIT_HEAP=0)^((.)(?1)|.)$/ +\= Expect heap limit exceeded + a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00] +Failed: error -63: heap limit exceeded + +/(*LIMIT_HEAP=50000)^((.)(?1)|.)$/ +\= Expect success + a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00] + 0: a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00] + /(02-)?[0-9]{3}-[0-9]{3}/ 02-123-123 0: 02-123-123 /^(a(?2))(b)(?1)/ abbab\=find_limits +Minimum heap limit = 0 Minimum match limit = 4 Minimum depth limit = 2 0: abbab @@ -7749,8 +7760,9 @@ No match 0: 0+ -/(*LIMIT_MATCH=100).*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00\x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););/no_dotstar_anchor 
-.*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00\x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?);); +/(*LIMIT_MATCH=100).*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00\x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););/no_dotstar_anchor +\= Expect limit exceeded +.*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00\x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?);); Failed: error -47: match limit exceeded /\n/firstline diff --git a/testdata/testoutput8-16-4 b/testdata/testoutput8-16-4 new file mode 100644 index 0000000..722b0e1 --- /dev/null +++ b/testdata/testoutput8-16-4 @@ -0,0 +1,1022 @@ +# There are two sorts of patterns in this test. A number of them are +# representative patterns whose lengths and offsets are checked. This is just a +# doublecheck test to ensure the sizes don't go horribly wrong when something +# is changed. 
The operation of these patterns is checked in other tests. +# +# This file also contains tests whose output varies with code unit size and/or +# link size. Unicode support is required for these tests. There are separate +# output files for each code unit size and link size. + +#pattern fullbincode,memory + +/((?i)b)/ +Memory allocation (code space): 32 +------------------------------------------------------------------ + 0 12 Bra + 3 6 CBra 1 + 7 /i b + 9 6 Ket + 12 12 Ket + 15 End +------------------------------------------------------------------ + +/(?s)(.*X|^B)/ +Memory allocation (code space): 48 +------------------------------------------------------------------ + 0 20 Bra + 3 8 CBra 1 + 7 AllAny* + 9 X + 11 6 Alt + 14 ^ + 15 B + 17 14 Ket + 20 20 Ket + 23 End +------------------------------------------------------------------ + +/(?s:.*X|^B)/ +Memory allocation (code space): 46 +------------------------------------------------------------------ + 0 19 Bra + 3 7 Bra + 6 AllAny* + 8 X + 10 6 Alt + 13 ^ + 14 B + 16 13 Ket + 19 19 Ket + 22 End +------------------------------------------------------------------ + +/^[[:alnum:]]/ +Memory allocation (code space): 50 +------------------------------------------------------------------ + 0 21 Bra + 3 ^ + 4 [0-9A-Za-z] + 21 21 Ket + 24 End +------------------------------------------------------------------ + +/#/Ix +Memory allocation (code space): 14 +------------------------------------------------------------------ + 0 3 Bra + 3 3 Ket + 6 End +------------------------------------------------------------------ +Capturing subpattern count = 0 +May match empty string +Options: extended +Subject length lower bound = 0 + +/a#/Ix +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 a + 5 5 Ket + 8 End +------------------------------------------------------------------ +Capturing subpattern count = 0 +Options: extended +First code unit = 'a' +Subject length 
lower bound = 1 + +/x?+/ +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 x?+ + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/x++/ +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 x++ + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/x{1,3}+/ +Memory allocation (code space): 24 +------------------------------------------------------------------ + 0 8 Bra + 3 x + 5 x{0,2}+ + 8 8 Ket + 11 End +------------------------------------------------------------------ + +/(x)*+/ +Memory allocation (code space): 34 +------------------------------------------------------------------ + 0 13 Bra + 3 Braposzero + 4 6 CBraPos 1 + 8 x + 10 6 KetRpos + 13 13 Ket + 16 End +------------------------------------------------------------------ + +/^((a+)(?U)([ab]+)(?-U)([bc]+)(\w*))/ +Memory allocation (code space): 166 +------------------------------------------------------------------ + 0 79 Bra + 3 ^ + 4 72 CBra 1 + 8 6 CBra 2 + 12 a+ + 14 6 Ket + 17 22 CBra 3 + 21 [ab]+? 
+ 39 22 Ket + 42 22 CBra 4 + 46 [bc]+ + 64 22 Ket + 67 6 CBra 5 + 71 \w*+ + 73 6 Ket + 76 72 Ket + 79 79 Ket + 82 End +------------------------------------------------------------------ + +"8J\$WE\<\.rX\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\<EjmhUZ\?\.akp2dF\>qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b" +Memory allocation (code space): 1652 +------------------------------------------------------------------ + 0 822 Bra + 3 8J$WE<.rX+ix[d1b!H#?vV0vrK:ZH1=2M>iV;?aPhFB<*vW@QW@sO9}cfZA-i'w%hKd6gt1UJP,15_#QY$M^Mss_U/]&LK9[5vQub^w[KDD<EjmhUZ?.akp2dF>qmj;2}YWFdYx.Ap]hjCPTP(n28k+3;o&WXqs/gOXdr$:r'do0;b4c(f_Gr="\4)[01T7ajQJvL$W~mL_sS/4h:x*[ZN=KLs&L5zX//>it,o:aU(;Z>pW&T7oP'2K^E:x9'c[%z-,64JQ5AeH_G#KijUKghQw^\vea3a?kka_G$8#`*kynsxzBLru']k_[7FrVx}^=$blx>s-N%j;D*aZDnsw:YKZ%Q.Kne9#hP?+b3(SOvL,^;&u5@?5C5Bhb=m-vEh_L15Jl]U)0RP6{q%L^_z5E'Dw6X +821 \b +822 822 Ket +825 End +------------------------------------------------------------------ + +"\$\<\.X\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\<EjmhUZ\?\.akp2dF\>qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b" +Memory allocation (code space): 1632 +------------------------------------------------------------------ + 0 812 
Bra + 3 $<.X+ix[d1b!H#?vV0vrK:ZH1=2M>iV;?aPhFB<*vW@QW@sO9}cfZA-i'w%hKd6gt1UJP,15_#QY$M^Mss_U/]&LK9[5vQub^w[KDD<EjmhUZ?.akp2dF>qmj;2}YWFdYx.Ap]hjCPTP(n28k+3;o&WXqs/gOXdr$:r'do0;b4c(f_Gr="\4)[01T7ajQJvL$W~mL_sS/4h:x*[ZN=KLs&L5zX//>it,o:aU(;Z>pW&T7oP'2K^E:x9'c[%z-,64JQ5AeH_G#KijUKghQw^\vea3a?kka_G$8#`*kynsxzBLru']k_[7FrVx}^=$blx>s-N%j;D*aZDnsw:YKZ%Q.Kne9#hP?+b3(SOvL,^;&u5@?5C5Bhb=m-vEh_L15Jl]U)0RP6{q%L^_z5E'Dw6X +811 \b +812 812 Ket +815 End +------------------------------------------------------------------ + +/(a(?1)b)/ +Memory allocation (code space): 42 +------------------------------------------------------------------ + 0 17 Bra + 3 11 CBra 1 + 7 a + 9 3 Recurse + 12 b + 14 11 Ket + 17 17 Ket + 20 End +------------------------------------------------------------------ + +/(a(?1)+b)/ +Memory allocation (code space): 54 +------------------------------------------------------------------ + 0 23 Bra + 3 17 CBra 1 + 7 a + 9 6 SBra + 12 3 Recurse + 15 6 KetRmax + 18 b + 20 17 Ket + 23 23 Ket + 26 End +------------------------------------------------------------------ + +/a(?P<name1>b|c)d(?P<longername2>e)/ +Memory allocation (code space): 68 +------------------------------------------------------------------ + 0 30 Bra + 3 a + 5 6 CBra 1 + 9 b + 11 5 Alt + 14 c + 16 11 Ket + 19 d + 21 6 CBra 2 + 25 e + 27 6 Ket + 30 30 Ket + 33 End +------------------------------------------------------------------ + +/(?:a(?P<c>c(?P<d>d)))(?P<a>a)/ +Memory allocation (code space): 84 +------------------------------------------------------------------ + 0 38 Bra + 3 23 Bra + 6 a + 8 15 CBra 1 + 12 c + 14 6 CBra 2 + 18 d + 20 6 Ket + 23 15 Ket + 26 23 Ket + 29 6 CBra 3 + 33 a + 35 6 Ket + 38 38 Ket + 41 End +------------------------------------------------------------------ + +/(?P<a>a)...(?P=a)bbb(?P>a)d/ +Memory allocation (code space): 64 +------------------------------------------------------------------ + 0 28 Bra + 3 6 CBra 1 + 7 a + 9 6 Ket + 12 Any + 13 Any + 14 Any + 15 \1 + 
17 bbb + 23 3 Recurse + 26 d + 28 28 Ket + 31 End +------------------------------------------------------------------ + +/abc(?C255)de(?C)f/ +Memory allocation (code space): 62 +------------------------------------------------------------------ + 0 27 Bra + 3 abc + 9 Callout 255 10 1 + 15 de + 19 Callout 0 16 1 + 25 f + 27 27 Ket + 30 End +------------------------------------------------------------------ + +/abcde/auto_callout +Memory allocation (code space): 106 +------------------------------------------------------------------ + 0 49 Bra + 3 Callout 255 0 1 + 9 a + 11 Callout 255 1 1 + 17 b + 19 Callout 255 2 1 + 25 c + 27 Callout 255 3 1 + 33 d + 35 Callout 255 4 1 + 41 e + 43 Callout 255 5 0 + 49 49 Ket + 52 End +------------------------------------------------------------------ + +/\x{100}/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{100} + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/\x{1000}/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{1000} + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/\x{10000}/utf +Memory allocation (code space): 20 +------------------------------------------------------------------ + 0 6 Bra + 3 \x{10000} + 6 6 Ket + 9 End +------------------------------------------------------------------ + +/\x{100000}/utf +Memory allocation (code space): 20 +------------------------------------------------------------------ + 0 6 Bra + 3 \x{100000} + 6 6 Ket + 9 End +------------------------------------------------------------------ + +/\x{10ffff}/utf +Memory allocation (code space): 20 +------------------------------------------------------------------ + 0 6 Bra + 3 \x{10ffff} + 6 6 Ket + 9 End +------------------------------------------------------------------ + +/\x{110000}/utf +Failed: error 134 at 
offset 9: character code point value in \x{} or \o{} is too large + +/[\x{ff}]/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{ff} + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/[\x{100}]/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{100} + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/\x80/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{80} + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/\xff/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{ff} + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/\x{0041}\x{2262}\x{0391}\x{002e}/I,utf +Memory allocation (code space): 30 +------------------------------------------------------------------ + 0 11 Bra + 3 A\x{2262}\x{391}. + 11 11 Ket + 14 End +------------------------------------------------------------------ +Capturing subpattern count = 0 +Options: utf +First code unit = 'A' +Last code unit = '.' 
+Subject length lower bound = 4 + +/\x{D55c}\x{ad6d}\x{C5B4}/I,utf +Memory allocation (code space): 26 +------------------------------------------------------------------ + 0 9 Bra + 3 \x{d55c}\x{ad6d}\x{c5b4} + 9 9 Ket + 12 End +------------------------------------------------------------------ +Capturing subpattern count = 0 +Options: utf +First code unit = \x{d55c} +Last code unit = \x{c5b4} +Subject length lower bound = 3 + +/\x{65e5}\x{672c}\x{8a9e}/I,utf +Memory allocation (code space): 26 +------------------------------------------------------------------ + 0 9 Bra + 3 \x{65e5}\x{672c}\x{8a9e} + 9 9 Ket + 12 End +------------------------------------------------------------------ +Capturing subpattern count = 0 +Options: utf +First code unit = \x{65e5} +Last code unit = \x{8a9e} +Subject length lower bound = 3 + +/[\x{100}]/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{100} + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/[Z\x{100}]/utf +Memory allocation (code space): 60 +------------------------------------------------------------------ + 0 26 Bra + 3 [Z\x{100}] + 26 26 Ket + 29 End +------------------------------------------------------------------ + +/^[\x{100}\E-\Q\E\x{150}]/utf +Memory allocation (code space): 32 +------------------------------------------------------------------ + 0 12 Bra + 3 ^ + 4 [\x{100}-\x{150}] + 12 12 Ket + 15 End +------------------------------------------------------------------ + +/^[\QĀ\E-\QŐ\E]/utf +Memory allocation (code space): 32 +------------------------------------------------------------------ + 0 12 Bra + 3 ^ + 4 [\x{100}-\x{150}] + 12 12 Ket + 15 End +------------------------------------------------------------------ + +/^[\QĀ\E-\QŐ\E/utf +Failed: error 106 at offset 13: missing terminating ] for character class + +/[\p{L}]/ +Memory allocation (code space): 30 
+------------------------------------------------------------------ + 0 11 Bra + 3 [\p{L}] + 11 11 Ket + 14 End +------------------------------------------------------------------ + +/[\p{^L}]/ +Memory allocation (code space): 30 +------------------------------------------------------------------ + 0 11 Bra + 3 [\P{L}] + 11 11 Ket + 14 End +------------------------------------------------------------------ + +/[\P{L}]/ +Memory allocation (code space): 30 +------------------------------------------------------------------ + 0 11 Bra + 3 [\P{L}] + 11 11 Ket + 14 End +------------------------------------------------------------------ + +/[\P{^L}]/ +Memory allocation (code space): 30 +------------------------------------------------------------------ + 0 11 Bra + 3 [\p{L}] + 11 11 Ket + 14 End +------------------------------------------------------------------ + +/[abc\p{L}\x{0660}]/utf +Memory allocation (code space): 66 +------------------------------------------------------------------ + 0 29 Bra + 3 [a-c\p{L}\x{660}] + 29 29 Ket + 32 End +------------------------------------------------------------------ + +/[\p{Nd}]/utf +Memory allocation (code space): 30 +------------------------------------------------------------------ + 0 11 Bra + 3 [\p{Nd}] + 11 11 Ket + 14 End +------------------------------------------------------------------ + +/[\p{Nd}+-]+/utf +Memory allocation (code space): 64 +------------------------------------------------------------------ + 0 28 Bra + 3 [+\-\p{Nd}]++ + 28 28 Ket + 31 End +------------------------------------------------------------------ + +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/i,utf +Memory allocation (code space): 36 +------------------------------------------------------------------ + 0 14 Bra + 3 /i A\x{391}\x{10427}\x{ff3a}\x{1fb0} + 14 14 Ket + 17 End +------------------------------------------------------------------ + +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/utf +Memory allocation (code space): 36 
+------------------------------------------------------------------ + 0 14 Bra + 3 A\x{391}\x{10427}\x{ff3a}\x{1fb0} + 14 14 Ket + 17 End +------------------------------------------------------------------ + +/[\x{105}-\x{109}]/i,utf +Memory allocation (code space): 30 +------------------------------------------------------------------ + 0 11 Bra + 3 [\x{104}-\x{109}] + 11 11 Ket + 14 End +------------------------------------------------------------------ + +/( ( (?(1)0|) )* )/x +Memory allocation (code space): 70 +------------------------------------------------------------------ + 0 31 Bra + 3 25 CBra 1 + 7 Brazero + 8 17 SCBra 2 + 12 7 Cond + 15 1 Cond ref + 17 0 + 19 3 Alt + 22 10 Ket + 25 17 KetRmax + 28 25 Ket + 31 31 Ket + 34 End +------------------------------------------------------------------ + +/( (?(1)0|)* )/x +Memory allocation (code space): 56 +------------------------------------------------------------------ + 0 24 Bra + 3 18 CBra 1 + 7 Brazero + 8 7 SCond + 11 1 Cond ref + 13 0 + 15 3 Alt + 18 10 KetRmax + 21 18 Ket + 24 24 Ket + 27 End +------------------------------------------------------------------ + +/[a]/ +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 a + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/[a]/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 a + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/[\xaa]/ +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{aa} + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/[\xaa]/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{aa} + 5 5 Ket + 8 End 
+------------------------------------------------------------------ + +/[^a]/ +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 [^a] + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/[^a]/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 [^a] + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/[^\xaa]/ +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 [^\x{aa}] + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/[^\xaa]/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 [^\x{aa}] + 5 5 Ket + 8 End +------------------------------------------------------------------ + +#pattern -memory + +/[^\d]/utf,ucp +------------------------------------------------------------------ + 0 11 Bra + 3 [^\p{Nd}] + 11 11 Ket + 14 End +------------------------------------------------------------------ + +/[[:^alpha:][:^cntrl:]]+/utf,ucp +------------------------------------------------------------------ + 0 15 Bra + 3 [\P{L}\P{Cc}]++ + 15 15 Ket + 18 End +------------------------------------------------------------------ + +/[[:^cntrl:][:^alpha:]]+/utf,ucp +------------------------------------------------------------------ + 0 15 Bra + 3 [\P{Cc}\P{L}]++ + 15 15 Ket + 18 End +------------------------------------------------------------------ + +/[[:alpha:]]+/utf,ucp +------------------------------------------------------------------ + 0 12 Bra + 3 [\p{L}]++ + 12 12 Ket + 15 End +------------------------------------------------------------------ + +/[[:^alpha:]\S]+/utf,ucp +------------------------------------------------------------------ + 0 15 Bra + 3 [\P{L}\P{Xsp}]++ + 15 15 Ket + 18 End 
+------------------------------------------------------------------ + +/abc(d|e)(*THEN)x(123(*THEN)4|567(b|q)(*THEN)xx)/ +------------------------------------------------------------------ + 0 70 Bra + 3 abc + 9 6 CBra 1 + 13 d + 15 5 Alt + 18 e + 20 11 Ket + 23 *THEN + 24 x + 26 13 CBra 2 + 30 123 + 36 *THEN + 37 4 + 39 28 Alt + 42 567 + 48 6 CBra 3 + 52 b + 54 5 Alt + 57 q + 59 11 Ket + 62 *THEN + 63 xx + 67 41 Ket + 70 70 Ket + 73 End +------------------------------------------------------------------ + +/(((a\2)|(a*)\g<-1>))*a?/ +------------------------------------------------------------------ + 0 52 Bra + 3 Brazero + 4 43 SCBra 1 + 8 36 Once + 11 15 CBra 2 + 15 8 CBra 3 + 19 a + 21 \2 + 23 8 Ket + 26 15 Alt + 29 6 CBra 4 + 33 a* + 35 6 Ket + 38 29 Recurse + 41 30 Ket + 44 36 Ket + 47 43 KetRmax + 50 a?+ + 52 52 Ket + 55 End +------------------------------------------------------------------ + +/((?+1)(\1))/ +------------------------------------------------------------------ + 0 28 Bra + 3 22 Once + 6 16 CBra 1 + 10 13 Recurse + 13 6 CBra 2 + 17 \1 + 19 6 Ket + 22 16 Ket + 25 22 Ket + 28 28 Ket + 31 End +------------------------------------------------------------------ + +"(?1)(?#?'){2}(a)" +------------------------------------------------------------------ + 0 18 Bra + 3 9 Recurse + 6 9 Recurse + 9 6 CBra 1 + 13 a + 15 6 Ket + 18 18 Ket + 21 End +------------------------------------------------------------------ + +/.((?2)(?R)|\1|$)()/ +------------------------------------------------------------------ + 0 39 Bra + 3 Any + 4 25 Once + 7 10 CBra 1 + 11 32 Recurse + 14 0 Recurse + 17 5 Alt + 20 \1 + 22 4 Alt + 25 $ + 26 19 Ket + 29 25 Ket + 32 4 CBra 2 + 36 4 Ket + 39 39 Ket + 42 End +------------------------------------------------------------------ + +/.((?3)(?R)()(?2)|\1|$)()/ +------------------------------------------------------------------ + 0 49 Bra + 3 Any + 4 35 Once + 7 20 CBra 1 + 11 42 Recurse + 14 0 Recurse + 17 4 CBra 2 + 21 4 Ket + 24 17 
Recurse + 27 5 Alt + 30 \1 + 32 4 Alt + 35 $ + 36 29 Ket + 39 35 Ket + 42 4 CBra 3 + 46 4 Ket + 49 49 Ket + 52 End +------------------------------------------------------------------ + +/(?1)()((((((\1++))\x85)+)|))/ +------------------------------------------------------------------ + 0 69 Bra + 3 6 Recurse + 6 4 CBra 1 + 10 4 Ket + 13 53 CBra 2 + 17 43 CBra 3 + 21 36 CBra 4 + 25 29 CBra 5 + 29 20 CBra 6 + 33 13 CBra 7 + 37 6 Once + 40 \1+ + 43 6 Ket + 46 13 Ket + 49 20 Ket + 52 \x{85} + 54 29 KetRmax + 57 36 Ket + 60 3 Alt + 63 46 Ket + 66 53 Ket + 69 69 Ket + 72 End +------------------------------------------------------------------ + +# Check the absolute limit on nesting (?| etc. This varies with code unit +# width because the workspace is a different number of bytes. It will fail +# with link size 2 in 8-bit and 16-bit but not in 32-bit. + +/(?|(?|(?J:(?|(?x:(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(
?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?| +))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) +/parens_nest_limit=1000,-fullbincode + +# Use "expand" to create some very long patterns with nested parentheses, in +# order to test workspace overflow. Again, this varies with code unit width, +# and even when it fails in two modes, the error offset differs. It also varies +# with link size - hence multiple tests with different values. 
+ +/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000 + +/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000 + +/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000 +Failed: error 186 at offset 12820: regular expression is too complicated + +/(?(1)(?1)){8,}+()/debug +------------------------------------------------------------------ + 0 110 Bra + 3 97 Once + 6 8 Cond + 9 1 Cond ref + 11 103 Recurse + 14 8 Ket + 17 8 Cond + 20 1 Cond ref + 22 103 Recurse + 25 8 Ket + 28 8 Cond + 31 1 Cond ref + 33 103 Recurse + 36 8 Ket + 39 8 Cond + 42 1 Cond ref + 44 103 Recurse + 47 8 Ket + 50 8 Cond + 53 1 Cond ref + 55 103 Recurse + 58 8 Ket + 61 8 Cond + 64 1 Cond ref + 66 103 Recurse + 69 8 Ket + 72 8 Cond + 75 1 Cond ref + 77 103 Recurse + 80 8 Ket + 83 14 SBraPos + 86 8 SCond + 89 1 Cond ref + 91 103 Recurse + 94 8 Ket + 97 14 KetRpos +100 97 Ket +103 4 CBra 1 +107 4 Ket +110 110 Ket +113 End +------------------------------------------------------------------ +Capturing subpattern count = 1 +Max back reference = 1 +May match empty string +Subject length lower bound = 0 + abcd + 0: + 1: + +/(?(1)|a(?1)b){2,}+()/debug +------------------------------------------------------------------ + 0 58 Bra + 3 45 Once + 6 5 Cond + 9 1 Cond ref + 11 10 Alt + 14 a + 16 51 Recurse + 19 b + 21 15 Ket + 24 21 SBraPos + 27 5 SCond + 30 1 Cond ref + 32 10 Alt + 35 a + 37 51 Recurse + 40 b + 42 15 Ket + 45 21 KetRpos + 48 45 Ket + 51 4 CBra 1 + 55 4 Ket + 58 58 Ket + 61 End +------------------------------------------------------------------ +Capturing subpattern count = 1 +Max back reference = 1 +May match empty string +Subject length lower bound = 0 + abcde +No match + +/((?1)(?2)(?3)(?4)(?5)(?6)(?7)(?8)(?9)(?9)(?8)(?7)(?6)(?5)(?4)(?3)(?2)(?1)(?0)){2,}()()()()()()()()()/debug +------------------------------------------------------------------ + 0 194 Bra + 3 61 CBra 1 + 7 3 
Recurse + 10 131 Recurse + 13 138 Recurse + 16 145 Recurse + 19 152 Recurse + 22 159 Recurse + 25 166 Recurse + 28 173 Recurse + 31 180 Recurse + 34 180 Recurse + 37 173 Recurse + 40 166 Recurse + 43 159 Recurse + 46 152 Recurse + 49 145 Recurse + 52 138 Recurse + 55 131 Recurse + 58 3 Recurse + 61 0 Recurse + 64 61 Ket + 67 61 SCBra 1 + 71 3 Recurse + 74 131 Recurse + 77 138 Recurse + 80 145 Recurse + 83 152 Recurse + 86 159 Recurse + 89 166 Recurse + 92 173 Recurse + 95 180 Recurse + 98 180 Recurse +101 173 Recurse +104 166 Recurse +107 159 Recurse +110 152 Recurse +113 145 Recurse +116 138 Recurse +119 131 Recurse +122 3 Recurse +125 0 Recurse +128 61 KetRmax +131 4 CBra 2 +135 4 Ket +138 4 CBra 3 +142 4 Ket +145 4 CBra 4 +149 4 Ket +152 4 CBra 5 +156 4 Ket +159 4 CBra 6 +163 4 Ket +166 4 CBra 7 +170 4 Ket +173 4 CBra 8 +177 4 Ket +180 4 CBra 9 +184 4 Ket +187 4 CBra 10 +191 4 Ket +194 194 Ket +197 End +------------------------------------------------------------------ +Capturing subpattern count = 10 +May match empty string +Subject length lower bound = 0 + +/([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00](*ACCEPT)/ +Failed: error 114 at offset 509: missing closing parenthesis + 
+/([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00](*ACCEPT)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))/-fullbincode + +# End of testinput8 diff --git a/testdata/testoutputEBC b/testdata/testoutputEBC index 03e179a..4edc8f9 100644 --- a/testdata/testoutputEBC +++ b/testdata/testoutputEBC @@ -1,3 +1,4 @@ +PCRE2 version 10.32-RC1 2018-02-19 # This is a specialized test for checking, when PCRE2 is compiled with the # EBCDIC option but in an ASCII environment, that newline, white space, and \c # functionality is working. It catches cases where explicit values such as 0x0a @@ -200,6 +201,6 @@ No match 0: \xff /\&/ -Failed: error 168 at offset 2: \c\x20must\x20be\x20followed\x20by\x20a\x20letter\x20or\x20one\x20of\x20[\]^_\x3f +Failed: error 168 at offset 3: \c\x20must\x20be\x20followed\x20by\x20a\x20letter\x20or\x20one\x20of\x20[\]^_\x3f # End |