summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--README41
1 files changed, 41 insertions, 0 deletions
diff --git a/README b/README
new file mode 100644
index 0000000..d515d51
--- /dev/null
+++ b/README
@@ -0,0 +1,41 @@
+libguess - a high-speed character set detection library
+-------------------------------------------------------
+
+libguess is free, but copyrighted software. See COPYING for details.
+
+Using libguess in your programs:
+--------------------------------
+
+libguess has two functions:
+
+ - libguess_determine_encoding(const char *inbuf, int length, const char *region);
+
+ This detects a character set. Returns an appropriate charset name
+ that can be passed to iconv_open(). Region is the name of the language
+ or region that the data is related to, e.g. 'Baltic' for the Baltic states,
+ or 'Japanese' for Japan.
+
+ - libguess_validate_utf8(const char *inbuf, int length);
+
+ This employs libguess's DFA-based character set validation rules to ensure
+ that a string is pure UTF-8. GLib's UTF-8 validation functions are broken,
+ for example.
+
+Just include libguess.h and link to libguess to get these functions in your
+program. For your convenience, a pkg-config file is also supplied.
+
+How does libguess work:
+-----------------------
+
+libguess employs discrete-finite automata to deduce the character set of the
+input buffer. The advantage of this is that all character sets can be checked
+in parallel, and quickly. Right now, libguess passes a byte to each DFA on the
+same pass, meaning that the winning character set can be deduced as efficiently
+as possible.
+
+libguess is fully reentrant, using only local stack memory for DFA operations.
+
+I found a bug:
+--------------
+
+Please report your bug to http://jira.atheme.org/ against the libguess component.