author Steve Bennett <steveb@workware.net.au> 2010-11-02 21:20:36 +1000
committer Steve Bennett <steveb@workware.net.au> 2010-11-17 07:57:38 +1000
commit 84ae3392d8b001acb9731be6d95821f32704e3e6
tree 1c9ccea82fd3d62ea4473fa769d23ce6c299304d /README.utf-8
parent 1c0d153ae8ba3ce430cee55723ed86909453ff65
Updates to the UTF-8 documentation
Signed-off-by: Steve Bennett <steveb@workware.net.au>
Diffstat (limited to 'README.utf-8')
-rw-r--r-- README.utf-8 17
1 files changed, 12 insertions, 5 deletions
diff --git a/README.utf-8 b/README.utf-8
index ad6c7b5..eca528c 100644
--- a/README.utf-8
+++ b/README.utf-8
@@ -98,11 +98,18 @@ unicode data table at http://unicode.org/Public/UNIDATA/UnicodeData.txt
Working with Binary Data and non-UTF-8 encodings
------------------------------------------------
-If it is necessary to work with both UTF-8 and binary data (bytes
->= 0x80), or non-UTF-8 encodings you will need to arrange for the
-data to be converted between UTF-8 on input and output. Individual
-characters can be converted from Unicode to UTF-8 with the
-utf8_fromunicode() function and the reverse with utf8_tounicode().
+Almost all Jim commands will work identically with binary data and
+UTF-8 encoded data, including read, gets, puts and 'string eq'. It
+is only certain string manipulation commands which operate
+differently. For example, 'string index' will return UTF-8 characters,
+not bytes.
+
+If it is necessary to manipulate strings containing binary, non-ASCII
+data (bytes >= 0x80), there are two options.
+
+1. Build Jim without UTF-8 support
+2. Arrange to encode and decode binary data or data in other encodings
+ to UTF-8 before manipulation.
Internal Details
----------------
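The difference between character and byte indexing that affects commands such as 'string index' can be illustrated in C. This is a minimal sketch, not Jim's implementation: the names utf8_seq_len(), utf8_char_count() and utf8_index_offset() are hypothetical, and the code assumes well-formed UTF-8 (an invalid lead byte is simply counted as one byte). For the 6-byte string "h\xC3\xA9llo" (héllo), character index 2 maps to byte offset 3, whereas a byte-oriented index of 2 would land on the second byte of the two-byte character é.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence that starts with lead byte c.
 * Hypothetical helper; assumes well-formed input. */
int utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;           /* 0xxxxxxx: ASCII */
    if ((c & 0xE0) == 0xC0) return 2; /* 110xxxxx: 2-byte sequence */
    if ((c & 0xF0) == 0xE0) return 3; /* 1110xxxx: 3-byte sequence */
    if ((c & 0xF8) == 0xF0) return 4; /* 11110xxx: 4-byte sequence */
    return 1;                         /* invalid lead byte: count as one */
}

/* Number of characters (not bytes) in the first nbytes of s. */
size_t utf8_char_count(const char *s, size_t nbytes)
{
    size_t i = 0, n = 0;
    assert(s != NULL || nbytes == 0);
    while (i < nbytes) {
        i += utf8_seq_len((unsigned char)s[i]);
        n++;
    }
    return n;
}

/* Byte offset of character index idx, or -1 if idx is out of range.
 * This is the extra work a character-based index has to do. */
long utf8_index_offset(const char *s, size_t nbytes, size_t idx)
{
    size_t i = 0;
    assert(s != NULL || nbytes == 0);
    while (i < nbytes) {
        if (idx == 0)
            return (long)i;
        i += utf8_seq_len((unsigned char)s[i]);
        idx--;
    }
    return -1;
}
```

Because every character lookup has to walk the string from the start, character indexing is linear in the string length, while byte indexing is constant time.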
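Option 2 in the text requires an agreed mapping between binary bytes and UTF-8. One common choice, assumed here rather than taken from Jim, is to treat each byte 0x00-0xFF as the code point with the same value (the ISO-8859-1 mapping), so that bytes >= 0x80 become two-byte UTF-8 sequences that can be manipulated safely and then decoded back without loss. The C sketch below writes the conversion out by hand; bytes_to_utf8() and utf8_to_bytes() are hypothetical names and are not the utf8_fromunicode()/utf8_tounicode() functions mentioned in the older text.

```c
#include <assert.h>
#include <stddef.h>

/* Encode each byte of a binary buffer as the code point of the same value
 * (U+0000..U+00FF), written as UTF-8. `out` must have room for 2 * n
 * bytes. Returns the number of bytes written. */
size_t bytes_to_utf8(const unsigned char *in, size_t n, char *out)
{
    size_t j = 0;
    assert(out != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        unsigned char b = in[i];
        if (b < 0x80) {
            out[j++] = (char)b;                   /* ASCII: unchanged */
        } else {
            out[j++] = (char)(0xC0 | (b >> 6));   /* lead byte 0xC2 or 0xC3 */
            out[j++] = (char)(0x80 | (b & 0x3F)); /* continuation byte */
        }
    }
    return j;
}

/* Reverse of bytes_to_utf8(). Returns the number of bytes written to `out`
 * (at most n), or -1 if the input holds a character above U+00FF or a
 * malformed sequence, since such data cannot map back to a single byte. */
long utf8_to_bytes(const char *in, size_t n, unsigned char *out)
{
    size_t i = 0, j = 0;
    assert(out != NULL || n == 0);
    while (i < n) {
        unsigned char c = (unsigned char)in[i];
        if (c < 0x80) {
            out[j++] = c;
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < n &&
                   ((unsigned char)in[i + 1] & 0xC0) == 0x80) {
            /* Combine the 2 payload bits of the lead byte with the
             * 6 payload bits of the continuation byte. */
            out[j++] = (unsigned char)(((c & 0x1F) << 6) |
                                       ((unsigned char)in[i + 1] & 0x3F));
            i += 2;
        } else {
            return -1;
        }
    }
    return (long)j;
}
```

Since every byte value round-trips, the encoded form can pass through character-based commands and be converted back afterwards; the cost is that each byte >= 0x80 occupies two bytes while it is in UTF-8 form.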