Diffstat (limited to 'doc/html/pcre2unicode.html')
-rw-r--r-- | doc/html/pcre2unicode.html | 16 |
1 files changed, 11 insertions, 5 deletions
diff --git a/doc/html/pcre2unicode.html b/doc/html/pcre2unicode.html
index 448a221..24f6d93 100644
--- a/doc/html/pcre2unicode.html
+++ b/doc/html/pcre2unicode.html
@@ -26,7 +26,8 @@ you must call
 with the PCRE2_UTF option flag, or the pattern must start with the sequence
 (*UTF). When either of these is the case, both the pattern and any subject
 strings that are matched against it are treated as UTF strings instead of
-strings of individual one-code-unit characters.
+strings of individual one-code-unit characters. There are also some other
+changes to the way characters are handled, as documented below.
 </P>
 <P>
 If you do not need Unicode support you can build PCRE2 without it, in which
@@ -53,12 +54,17 @@ compatibility with Perl 5.6. PCRE2 does not support this.
 WIDE CHARACTERS AND UTF MODES
 </b><br>
 <P>
-Codepoints less than 256 can be specified in patterns by either braced or
+Code points less than 256 can be specified in patterns by either braced or
 unbraced hexadecimal escape sequences (for example, \x{b3} or \xb3). Larger
 values have to use braced sequences. Unbraced octal code points up to \777 are
 also recognized; larger ones can be coded using \o{...}.
 </P>
 <P>
+The escape sequence \N{U+<hex digits>} is recognized as another way of
+specifying a Unicode character by code point in a UTF mode. It is not allowed
+in non-UTF modes.
+</P>
+<P>
 In UTF modes, repeat quantifiers apply to complete UTF characters, not to
 individual code units.
 </P>
@@ -116,7 +122,7 @@ CASE-EQUIVALENCE IN UTF MODES
 Case-insensitive matching in a UTF mode makes use of Unicode properties except
 for characters whose code points are less than 128 and that have at most two
 case-equivalent values. For these, a direct table lookup is used for speed. A
-few Unicode characters such as Greek sigma have more than two codepoints that
+few Unicode characters such as Greek sigma have more than two code points that
 are case-equivalent, and these are treated as such.
 </P>
 <br><b>
@@ -294,9 +300,9 @@ Cambridge, England.
 REVISION
 </b><br>
 <P>
-Last updated: 17 May 2017
+Last updated: 02 September 2018
 <br>
-Copyright © 1997-2017 University of Cambridge.
+Copyright © 1997-2018 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.