Diffstat (limited to 'doc/html/pcre2unicode.html')
-rw-r--r-- doc/html/pcre2unicode.html | 16
1 files changed, 11 insertions, 5 deletions
diff --git a/doc/html/pcre2unicode.html b/doc/html/pcre2unicode.html
index 448a221..24f6d93 100644
--- a/doc/html/pcre2unicode.html
+++ b/doc/html/pcre2unicode.html
@@ -26,7 +26,8 @@ you must call
with the PCRE2_UTF option flag, or the pattern must start with the sequence
(*UTF). When either of these is the case, both the pattern and any subject
strings that are matched against it are treated as UTF strings instead of
-strings of individual one-code-unit characters.
+strings of individual one-code-unit characters. There are also some other
+changes to the way characters are handled, as documented below.
</P>
<P>
If you do not need Unicode support you can build PCRE2 without it, in which
@@ -53,12 +54,17 @@ compatibility with Perl 5.6. PCRE2 does not support this.
WIDE CHARACTERS AND UTF MODES
</b><br>
<P>
-Codepoints less than 256 can be specified in patterns by either braced or
+Code points less than 256 can be specified in patterns by either braced or
unbraced hexadecimal escape sequences (for example, \x{b3} or \xb3). Larger
values have to use braced sequences. Unbraced octal code points up to \777 are
also recognized; larger ones can be coded using \o{...}.
</P>
<P>
+The escape sequence \N{U+&#60;hex digits&#62;} is recognized as another way of
+specifying a Unicode character by code point in a UTF mode. It is not allowed
+in non-UTF modes.
+</P>
+<P>
In UTF modes, repeat quantifiers apply to complete UTF characters, not to
individual code units.
</P>
@@ -116,7 +122,7 @@ CASE-EQUIVALENCE IN UTF MODES
Case-insensitive matching in a UTF mode makes use of Unicode properties except
for characters whose code points are less than 128 and that have at most two
case-equivalent values. For these, a direct table lookup is used for speed. A
-few Unicode characters such as Greek sigma have more than two codepoints that
+few Unicode characters such as Greek sigma have more than two code points that
are case-equivalent, and these are treated as such.
</P>
<br><b>
@@ -294,9 +300,9 @@ Cambridge, England.
REVISION
</b><br>
<P>
-Last updated: 17 May 2017
+Last updated: 02 September 2018
<br>
-Copyright &copy; 1997-2017 University of Cambridge.
+Copyright &copy; 1997-2018 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
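
Note on the "repeat quantifiers apply to complete UTF characters" paragraph
touched by this diff: the distinction between characters and code units can be
sketched outside PCRE2. The example below is only an analogy and rests on an
assumption: it uses Python's built-in re module (which is not PCRE2), where a
str pattern behaves roughly like a UTF mode and a bytes pattern behaves roughly
like an 8-bit non-UTF mode. The variable names are illustrative.

```python
import re

# U+00E9 (e with acute accent) is one character but two UTF-8 code units.
pattern = "\u00e9{2}"
subject = "\u00e9\u00e9"

# Character-level matching: {2} repeats the whole character.
char_level = re.fullmatch(pattern, subject)

# Code-unit-level matching: the same text as raw UTF-8 bytes.
# The pattern is now b"\xc3\xa9{2}", so {2} repeats only the last byte.
unit_pattern = pattern.encode("utf-8")
unit_subject = subject.encode("utf-8")
unit_level = re.fullmatch(unit_pattern, unit_subject)

# The byte-level pattern instead matches 0xC3 followed by 0xA9 twice,
# which is not valid UTF-8 for any repeated character.
unit_level_odd = re.fullmatch(unit_pattern, b"\xc3\xa9\xa9")

print(char_level is not None)
print(unit_level is not None)
print(unit_level_odd is not None)
```

In the character-level case the quantifier covers the full two-byte sequence,
which is the behaviour the documentation describes for UTF modes; in the
byte-level case it binds to a single code unit.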