Updates to the UTF-8 documentation

Signed-off-by: Steve Bennett <steveb@workware.net.au>
author: Steve Bennett <steveb@workware.net.au> 2010-11-02 21:20:36 +1000
committer: Steve Bennett <steveb@workware.net.au> 2010-11-17 07:57:38 +1000
commit: 84ae3392d8b001acb9731be6d95821f32704e3e6 (patch)
tree: 1c9ccea82fd3d62ea4473fa769d23ce6c299304d /README.utf-8
parent: 1c0d153ae8ba3ce430cee55723ed86909453ff65 (diff)
1 files changed, 12 insertions, 5 deletions
diff --git a/README.utf-8 b/README.utf-8
index ad6c7b5..eca528c 100644
--- a/README.utf-8
+++ b/README.utf-8
@@ -98,11 +98,18 @@ unicode data table at http://unicode.org/Public/UNIDATA/UnicodeData.txt
 
 Working with Binary Data and non-UTF-8 encodings
 ------------------------------------------------
-If it is necessary to work with both UTF-8 and binary data (bytes
->= 0x80), or non-UTF-8 encodings you will need to arrange for the
-data to be converted between UTF-8 on input and output.  Individual
-characters can be converted from Unicode to UTF-8 with the
-utf8_fromunicode() function and the reverse with utf8_tounicode().
+Almost all Jim commands will work identically with binary data and
+UTF-8 encoded data, including read, gets, puts and 'string eq'.  It
+is only certain string manipulation commands which will operate
+differently.  For example, 'string index' will return UTF-8 characters,
+not bytes.
+
+If it is necessary to manipulate strings containing binary, non-ASCII
+data (bytes >= 0x80), there are two options.
+
+1. Build Jim without UTF-8 support
+2. Arrange to encode and decode binary data or data in other encodings
+   to UTF-8 before manipulation.
 
 Internal Details
 ----------------
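
Illustration (not part of the patch): the second option in the new README text, encoding binary data to UTF-8 before manipulation, could look roughly like the sketch below. It is written in Python rather than Jim Tcl, and it assumes one particular scheme that the README does not specify: each byte value 0x00-0xFF is mapped to the Unicode code point of the same value. The helper names binary_to_utf8 and utf8_to_binary are hypothetical.

```python
# Sketch only: shows why character-oriented indexing differs from byte
# indexing, and how binary data can round-trip through UTF-8.
# Assumption: each byte maps to the code point of the same value
# (this is what the "latin-1" codec does).

def binary_to_utf8(data: bytes) -> bytes:
    """Encode arbitrary bytes as valid UTF-8 (hypothetical helper)."""
    return data.decode("latin-1").encode("utf-8")

def utf8_to_binary(data: bytes) -> bytes:
    """Reverse binary_to_utf8 (hypothetical helper)."""
    return data.decode("utf-8").encode("latin-1")

raw = bytes([0x41, 0xE9, 0x42])   # 'A', a byte >= 0x80, 'B'
encoded = binary_to_utf8(raw)     # 0xE9 becomes the two bytes 0xC3 0xA9
text = encoded.decode("utf-8")    # character view of the encoded data

print(len(raw), len(encoded), len(text))
print(hex(encoded[1]), hex(ord(text[1])))
print(utf8_to_binary(encoded) == raw)
```

After encoding, index 1 of the character view is the single character U+00E9, while index 1 of the byte view is only the first half of its two-byte UTF-8 sequence. That is the same distinction the README draws for 'string index'.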