diff options
-rw-r--r-- | README | 41 |
1 files changed, 41 insertions, 0 deletions
@@ -0,0 +1,41 @@ +libguess - a high-speed character set detection library +------------------------------------------------------- + +libguess is free, but copyrighted software. See COPYING for details. + +Using libguess in your programs: +-------------------------------- + +libguess has two functions: + + - libguess_determine_encoding(const char *inbuf, int length, const char *region); + + This detects a character set. Returns an appropriate charset name + that can be passed to iconv_open(). Region is the name of the language + or region that the data is related to, e.g. 'Baltic' for the Baltic states, + or 'Japanese' for Japan. + + - libguess_validate_utf8(const char *inbuf, int length); + + This employs libguess's DFA-based character set validation rules to ensure + that a string is pure UTF-8. GLib's UTF-8 validation functions are broken, + for example. + +Just include libguess.h and link to libguess to get these functions in your +program. For your convenience, a pkg-config file is also supplied. + +How does libguess work: +----------------------- + +libguess employs discrete-finite automata to deduce the character set of the +input buffer. The advantage of this is that all character sets can be checked +in parallel, and quickly. Right now, libguess passes a byte to each DFA on the +same pass, meaning that the winning character set can be deduced as efficiently +as possible. + +libguess is fully reentrant, using only local stack memory for DFA operations. + +I found a bug: +-------------- + +Please report your bug to http://jira.atheme.org/ against the libguess component. |