Diffstat (limited to 'doc/pcre2unicode.3')
-rw-r--r-- doc/pcre2unicode.3 | 33
1 files changed, 18 insertions, 15 deletions
diff --git a/doc/pcre2unicode.3 b/doc/pcre2unicode.3
index 59e226e..253d4b6 100644
--- a/doc/pcre2unicode.3
+++ b/doc/pcre2unicode.3
@@ -1,4 +1,4 @@
-.TH PCRE2UNICODE 3 "16 October 2015" "PCRE2 10.21"
+.TH PCRE2UNICODE 3 "03 July 2016" "PCRE2 10.22"
.SH NAME
PCRE - Perl-compatible regular expressions (revised API)
.SH "UNICODE AND UTF SUPPORT"
@@ -57,18 +57,21 @@ individual code units.
In UTF modes, the dot metacharacter matches one UTF character instead of a
single code unit.
.P
-The escape sequence \eC can be used to match a single code unit, in a UTF mode,
+The escape sequence \eC can be used to match a single code unit in a UTF mode,
but its use can lead to some strange effects because it breaks up multi-unit
characters (see the description of \eC in the
.\" HREF
\fBpcre2pattern\fP
.\"
-documentation). The use of \eC is not supported by the alternative matching
-function \fBpcre2_dfa_match()\fP when in UTF mode. Its use provokes a
-match-time error. The JIT optimization also does not support \eC in UTF mode.
-If JIT optimization is requested for a UTF pattern that contains \eC, it will
-not succeed, and so the matching will be carried out by the normal interpretive
-function.
+documentation).
+.P
+The use of \eC is not supported by the alternative matching function
+\fBpcre2_dfa_match()\fP when in UTF-8 or UTF-16 mode, that is, when a character
+may consist of more than one code unit. The use of \eC in these modes provokes
+a match-time error. Also, the JIT optimization does not support \eC in these
+modes. If JIT optimization is requested for a UTF-8 or UTF-16 pattern that
+contains \eC, it will not succeed, and so when \fBpcre2_match()\fP is called,
+the matching will be carried out by the normal interpretive function.
.P
The character escapes \eb, \eB, \ed, \eD, \es, \eS, \ew, and \eW correctly test
characters of any code value, but, by default, the characters that PCRE2
@@ -232,9 +235,9 @@ never occur in a valid UTF-8 string.
.sp
The following negative error codes are given for invalid UTF-16 strings:
.sp
- PCRE_UTF16_ERR1 Missing low surrogate at end of string
- PCRE_UTF16_ERR2 Invalid low surrogate follows high surrogate
- PCRE_UTF16_ERR3 Isolated low surrogate
+ PCRE2_ERROR_UTF16_ERR1 Missing low surrogate at end of string
+ PCRE2_ERROR_UTF16_ERR2 Invalid low surrogate follows high surrogate
+ PCRE2_ERROR_UTF16_ERR3 Isolated low surrogate
.sp
.
.
@@ -244,8 +247,8 @@ The following negative error codes are given for invalid UTF-16 strings:
.sp
The following negative error codes are given for invalid UTF-32 strings:
.sp
- PCRE_UTF32_ERR1 Surrogate character (range from 0xd800 to 0xdfff)
- PCRE_UTF32_ERR2 Code point is greater than 0x10ffff
+ PCRE2_ERROR_UTF32_ERR1 Surrogate character (0xd800 to 0xdfff)
+ PCRE2_ERROR_UTF32_ERR2 Code point is greater than 0x10ffff
.sp
.
.
@@ -263,6 +266,6 @@ Cambridge, England.
.rs
.sp
.nf
-Last updated: 16 October 2015
-Copyright (c) 1997-2015 University of Cambridge.
+Last updated: 03 July 2016
+Copyright (c) 1997-2016 University of Cambridge.
.fi
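The renamed error codes above cover three UTF-16 failure categories and two UTF-32 failure categories. The C code below is a minimal, self-contained sketch of how input can be classified into those same categories. It is not PCRE2's own validation code. The enum names, the functions check_utf16() and check_utf32(), and the convention of reporting the offset of the offending code unit are all hypothetical. With PCRE2 itself, you would instead compare the negative return value of the matching function against PCRE2_ERROR_UTF16_ERR1 and the other codes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes that mirror the error categories listed in the
   man page. They are NOT the numeric values of PCRE2's
   PCRE2_ERROR_UTF16_ERR* / PCRE2_ERROR_UTF32_ERR* macros. */
enum utf16_check {
  UTF16_OK = 0,
  UTF16_MISSING_LOW_AT_END,     /* cf. PCRE2_ERROR_UTF16_ERR1 */
  UTF16_INVALID_LOW_AFTER_HIGH, /* cf. PCRE2_ERROR_UTF16_ERR2 */
  UTF16_ISOLATED_LOW            /* cf. PCRE2_ERROR_UTF16_ERR3 */
};

enum utf32_check {
  UTF32_OK = 0,
  UTF32_SURROGATE,              /* cf. PCRE2_ERROR_UTF32_ERR1 */
  UTF32_TOO_LARGE               /* cf. PCRE2_ERROR_UTF32_ERR2 */
};

/* Record the offending code-unit offset if the caller asked for it. */
static void set_offset(size_t *erroffset, size_t i)
{
  if (erroffset != NULL) *erroffset = i;
}

/* Validate len UTF-16 code units. On failure, *erroffset (if non-NULL)
   receives the offset of the code unit where the problem was found. */
enum utf16_check check_utf16(const uint16_t *s, size_t len, size_t *erroffset)
{
  assert(s != NULL || len == 0);
  for (size_t i = 0; i < len; i++) {
    uint16_t c = s[i];
    if (c >= 0xd800 && c <= 0xdbff) {          /* high surrogate */
      if (i + 1 >= len) {                      /* string ends after it */
        set_offset(erroffset, i);
        return UTF16_MISSING_LOW_AT_END;
      }
      if (s[i + 1] < 0xdc00 || s[i + 1] > 0xdfff) {
        set_offset(erroffset, i);              /* next unit is not a low surrogate */
        return UTF16_INVALID_LOW_AFTER_HIGH;
      }
      i++;                                     /* consume the low surrogate */
    } else if (c >= 0xdc00 && c <= 0xdfff) {   /* low surrogate with no high */
      set_offset(erroffset, i);
      return UTF16_ISOLATED_LOW;
    }
  }
  return UTF16_OK;
}

/* Validate len UTF-32 code units using the same offset convention. */
enum utf32_check check_utf32(const uint32_t *s, size_t len, size_t *erroffset)
{
  assert(s != NULL || len == 0);
  for (size_t i = 0; i < len; i++) {
    if (s[i] >= 0xd800 && s[i] <= 0xdfff) {    /* surrogates are not characters */
      set_offset(erroffset, i);
      return UTF32_SURROGATE;
    }
    if (s[i] > 0x10ffff) {                     /* beyond the Unicode range */
      set_offset(erroffset, i);
      return UTF32_TOO_LARGE;
    }
  }
  return UTF32_OK;
}
```

For example, a lone 0xdc00 code unit is reported as UTF16_ISOLATED_LOW at offset 0, which corresponds to the "Isolated low surrogate" category. In UTF-32, 0x10ffff is accepted, but 0x110000 falls into the "greater than 0x10ffff" category.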