summaryrefslogtreecommitdiff
path: root/doc
diff options
context:
space:
mode:
authorMichal Čihař <michal@cihar.com>2017-11-08 13:46:27 +0100
committerMichal Čihař <michal@cihar.com>2017-11-08 13:46:27 +0100
commit0bdce0843a42d6e4957d4b37b0a0644fcdd765fc (patch)
treef10c5a8edbee4c8797f78a57e08ee26947e3a7c8 /doc
parenta24195ca78c6b46d2c1f0c8b9301a35bd9d9c981 (diff)
New upstream version 0.5.2
Diffstat (limited to 'doc')
-rw-r--r--doc/DICTFILE_FORMAT352
-rw-r--r--doc/sdcv.1104
-rw-r--r--doc/uk/sdcv.184
3 files changed, 540 insertions, 0 deletions
diff --git a/doc/DICTFILE_FORMAT b/doc/DICTFILE_FORMAT
new file mode 100644
index 0000000..d1b1d9d
--- /dev/null
+++ b/doc/DICTFILE_FORMAT
@@ -0,0 +1,352 @@
+Format for StarDict dictionary files
+------------------------------------
+
+StarDict homepage: http://stardict.sourceforge.net
+
+{0}. Number and Byte-order Conventions
+When you record the numbers that identify sizes, offsets, etc., you
+should use 32-bit numbers, such as you might represent with a glong.
+
+In order to make StarDict work on different platforms, these numbers
+must be in network byte order. You can ensure the correct byte order
+by using the g_htonl() function when creating dictionary files.
+Conversely, you should use g_ntohl() when reading dictionary files.
+
+Strings should be encoded in UTF-8.
+
+
+{1}. Files
+Every dictionary consists of three files:
+(1). somedict.ifo
+(2). somedict.idx or somedict.idx.gz
+(3). somedict.dict or somedict.dict.dz
+
+You can use gzip -9 to compress the .idx file. If the .idx file are not
+compressed, the loading can be fast and save memory when using, compress it
+will make the .idx file load into memory and make the quering fast when using.
+
+You can use dictzip to compress the .dict file.
+"dictzip" uses the same compression algorithm and file format as does gzip,
+but provides a table that can be used to randomly access compressed blocks
+in the file. The use of 50-64kB blocks for compression typically degrades
+compression by less than 10%, while maintaining acceptable random access
+capabilities for all data in the file. As an added benefit, files
+compressed with dictzip can be decompressed with gunzip.
+For more information about dictzip, refer to DICT project, please see:
+http://www.dict.org
+
+Stardict will search for the .ifo file, then open the .idx or
+.idx.gz file and the .dict.dz or .dict file which is in the same directory and
+has the same base name.
+
+
+
+{2}. The ".ifo" file's format.
+The .ifo file has the following format:
+
+StarDict's dict ifo file
+version=2.4.2
+[options]
+
+Note that the current "version" string must be "2.4.2". If it's not,
+then StarDict will refuse to read the file.
+
+[options]
+---------
+In the example above, [options] expands to any of the following lines
+specifying information about the dictionary. Each option is a keyword
+followed by an equal sign, then the value of that option, then a
+newline. The options may be appear in any order.
+
+Note that the dictionary must have at least a bookname, a wordcount and a
+idxfilesize, or the load will fail. All other information is optional. All
+strings should be encoded in UTF-8.
+
+Available options:
+
+bookname= // required
+wordcount= // required
+idxfilesize= // required
+author=
+email=
+website=
+description=
+date=
+sametypesequence= // very important.
+
+
+wordcount is the count of word entries in .idx file, it must be right.
+
+idxfilesize is the size(in bytes) of the .idx file, even the .idx is compressed
+to a .idx.gz file, this entry must record the original .idx file's size, and it
+must be right too. The .gz file don't contain its original size information,
+but knowing the original size can speed up the extraction to memory, as you
+don't need to call realloc() for many times.
+
+
+The "sametypesequence" option is described in further detail below.
+
+***
+sametypesequence
+
+You should first familiarize yourself with the .dict file format
+described in the next section so that you can understand what effect
+this option has on the .dict file.
+
+If the sametypesequence option is set, it tells StarDict that each
+word's data in the .dict file will have the same sequence of datatypes.
+In this case, we expect a .dict file that's been optimized in two
+ways: the type identifiers should be omitted, and the size marker for
+the last data entry of each word should be omitted.
+
+Let's consider some concrete examples of the sametypesequence option.
+
+Suppose that a dictionary records many .wav files, and so sets:
+ sametypesequence=W
+In this case, each word's entry in the .dict file consists solely of a
+wav file. In the .dict file, you would leave out the 'W' character
+before each entry, and you would also omit the 32-bit integer at the
+front of each .wav entry that would normally give the entry's length.
+You can do this since the length is known from the information in the
+idx file.
+
+As another example, suppose a dictionary contains phonetic information
+and a meaning for each word. The sametypesequence option for this
+dictionary would be:
+ sametypesequence=tm
+Once again, you can omit the 't' and 'm' characters before each data
+entry in the .dict file. In addition, you should omit the terminating
+'\0' for the 'm' entry for each word in the .dict file, as the length
+of the meaning string can be inferred from the length of the phonetic
+string (still indicated by a terminating '\0') and the length of the
+entire word entry (listed in the .idx file).
+
+So for cases where the last data entry for each word normally requires
+a terminating '\0' character, you should omit this character in the
+dict file. And for cases where the last data entry for each word
+normally requires an initial 32-bit number giving the length of the
+field (such as WAV and PNG entries), you must omit this number in the
+dictionary.
+
+Every dictionary should try to use the sametypesequence feature to
+save disk space.
+***
+
+
+{3}. The ".idx" file's format.
+The .idx file is just a word list.
+
+The word list is a sorted list of word entries.
+
+Each entry in the word list contains three fields, one after the other:
+ word_str; // a utf-8 string terminated by '\0'.
+ word_data_offset; // word data's offset in .dict file
+ word_data_size; // word data's total size in .dict file
+
+word_str gives the string representing this word. It's the string
+that is "looked up" by the StarDict.
+
+word_data_offset and word_data_size should both be 32-bit numbers in
+network byte order.
+
+No two entries should have the same "word_str". In other words,
+(strcmp(s1, s2) != 0).
+
+The length of "word_str" should be less than 256. In other words,
+(strlen(word) < 256).
+
+The word list must be sorted by calling stardict_strcmp() on the "word_str"
+fields. If the word list order is wrong, StarDict will fail to function
+correctly!
+
+============
+gint stardict_strcmp(const gchar *s1, const gchar *s2)
+{
+ gint a;
+ a = g_ascii_strcasecmp(s1, s2);
+ if (a == 0)
+ return strcmp(s1, s2);
+ else
+ return a;
+}
+============
+g_ascii_strcasecmp() is a glib function:
+Unlike the BSD strcasecmp() function, this only recognizes standard
+ASCII letters and ignores the locale, treating all non-ASCII characters
+as if they are not letters.
+
+stardict_strcmp() works fine with English characters, but the other
+locale characters' sorting is not so good. There should be a _strcmp
+function which handles the utf-8 string sorting better. If you know
+one, email me :)
+
+g_utf8_collate()? This is a locale-dependent funcition. So if you look
+up Chinese characters while in the Chinese locale, it works fine. But
+if you are in some other locale then the lookup will fail, as the
+order is not the same as in the Chinese locale (which was used when
+creating the dictionary).
+
+g_utf8_to_ucs4() then do comparing? This sounds like a good solution, but..
+
+The complete solution can be found in "Unicode Technical Standard #10: Unicode
+Collation Algorithm", http://www.unicode.org/reports/tr10/
+
+I hope glib will provide a locale-independent g_utf8_collate() soon.
+http://bugzilla.gnome.org/show_bug.cgi?id=112798
+
+
+
+{4}. The ".dict" file's format.
+The .dict file is a pure data sequence, as the offset and size of each
+word is recorded in the corresponding .idx file.
+
+If the "sametypesequence" option is not used in the .ifo file, then
+the .dict file has fields in the following order:
+==============
+word_1_data_1_type; // a single char identifying the data type
+word_1_data_1_data; // the data
+word_1_data_2_type;
+word_1_data_2_data;
+...... // the number of data entries for each word is determined by
+ // word_data_size in .idx file
+word_2_data_1_type;
+word_2_data_1_data;
+......
+==============
+It's important to note that each field in each word indicates its
+own length, as described below. The number of possible fields per
+word is also not fixed, and is determined by simply reading data until
+you've read word_data_size bytes for that word.
+
+
+Suppose the "sametypesequence" option is used in the .idx file, and
+the option is set like this:
+sametypesequence=tm
+Then the .dict file will look like this:
+==============
+word_1_data_1_data
+word_1_data_2_data
+word_2_data_1_data
+word_2_data_2_data
+......
+==============
+The first data entry for each word will have a terminating '\0', but
+the second entry will not have a terminating '\0'. The omissions of
+the type chars and of the last field's size information are the
+optimizations required by the "sametypesequence" option described
+above.
+
+
+Type identifiers
+----------------
+Here are the single-character type identifiers that may be used with
+the "sametypesequence" option in the .idx file, or may appear in the
+dict file itself if the "sametypesequence" option is not used.
+
+Lower-case characters signify that a field's size is determined by a
+terminating '\0', while upper-case characters indicate that the data
+begins with a 32-bit integer that gives the length of the data field.
+
+'m'
+Word's pure text meaning.
+The data should be a utf-8 string ending with '\0'.
+
+'l'
+Word's pure text meaning.
+The data is NOT a utf-8 string, but is instead a string in locale
+encoding, ending with '\0'. Sometimes using this type will save disk
+space, but its use is discouraged.
+
+'g'
+A utf-8 string which is marked up with the Pango text markup language.
+For more information about this markup language, See the "Pango
+Reference Manual."
+You might have it installed locally at:
+file:///usr/share/gtk-doc/html/pango/PangoMarkupFormat.html
+
+'t'
+English phonetic string.
+The data should be a utf-8 string ending with '\0'.
+
+Here are some utf-8 phonetic characters:
+θʃŋʧðʒæıʌʊɒɛəɑɜɔˌˈːˑ
+æɑɒʌәєŋvθðʃʒːɡˏˊˋ
+
+'y'
+Chinese YinBiao.
+The data should be a utf-8 string ending with '\0'.
+
+
+'W'
+wav file.
+The data begins with a network byte-ordered glong to identify the wav
+file's size, immediately followed by the file's content.
+
+'P'
+png file.
+The data begins with a network byte-ordered glong to identify the png
+file's size, immediately followed by the file's content.
+
+'X'
+this type identifier is reserved for experimental extensions.
+
+
+{5}. Tree Dictionary
+The tree dictionary support is used for information viewing, etc.
+
+A tree dictionary contains three file: sometreedict.ifo, sometreedict.tdx.gz
+and sometreedict.dict.dz.
+
+It is better to compress the .tdx file, as it is always load into memory.
+
+The .ifo file has the following format:
+
+StarDict's treedict ifo file
+version=2.4.2
+[options]
+
+Available options:
+
+bookname= // required
+tdxfilesize= // required
+wordcount=
+author=
+email=
+website=
+description=
+date=
+sametypesequence=
+
+wordcount is only used for info view in the dict manage dialog, so it is not
+important in tree dictionary.
+
+The .tdx file is just the word list.
+-----------
+The word list is a tree list of word entries.
+
+Each entry in the word list contains four fields, one after the other:
+ word_str; // a utf-8 string terminated by '\0'.
+ word_data_offset; // word data's offset in .dict file
+ word_data_size; // word data's total size in .dict file. it can be 0.
+ word_subentry_count; //have many sub word this entry has, 0 means none.
+
+Subentry is immidiately followed by its parent entry. This make the order is
+just as when a tree list with all its nodes extended, then sort from top to
+bottom.
+
+The .dict file's format is the same as the normal dictionary.
+
+
+
+{6}. More information.
+You can read "src/lib.cpp", "src/dictmanagedlg.cpp" and
+"src/tools/*.cpp" for more information.
+
+If you have any questions, email me. :)
+
+Thanks to Will Robinson <wsr23@stanford.edu> for cleaning up this file's
+English.
+
+Hu Zheng <huzheng_001@163.com>
+http://forlinux.yeah.net
+2003.11.11
diff --git a/doc/sdcv.1 b/doc/sdcv.1
new file mode 100644
index 0000000..86351b7
--- /dev/null
+++ b/doc/sdcv.1
@@ -0,0 +1,104 @@
+.TH SDCV 1 "2006-04-24" "sdcv-0.4.2"
+.SH NAME
+sdcv \- console version of StarDict program
+.SH SYNOPSIS
+.B sdcv
+[
+.BI options
+]
+[list of words]
+.SH DESCRIPTION
+.I sdcv
+is a simple, cross-platform text-based utility
+for working with dictionaries in StarDict format.
+Each word from "list of words" may be a string
+with a leading '/' for using a Fuzzy search algorithm,
+with a leading '|' for using full-text search,
+and the string may contain '?' and '*' for regexp search.
+It works in interactive and non-interactive mode.
+To exit from interactive mode press Ctrl+D.
+In interactive mode,
+if sdcv was compiled with readline library support,
+you can use the UP and DOWN keys to cycle through history.
+.SH OPTIONS
+.TP 8
+.B "\-h \-\-help"
+Display help message and exit
+.TP 8
+.B "\-v \-\-verbose"
+Display version and exit
+.TP 8
+.B "\-l \-\-list\-dicts"
+Display list of available dictionaries and exit
+.TP 8
+.B "\-u \-\-use\-dict filename"
+For search use only dictionary with this bookname
+.TP 8
+.B "\-n \-\-non\-interactive"
+For use in scripts
+.TP 8
+.B "\-x \-\-only\-data\-dir"
+For use in scripts: only use the dictionaries in data-dir, do not search in user and system directories
+.TP 8
+.B "\-e \-\-exact\-search"
+Do not fuzzy-search for similar words, only return exact matches
+.TP 8
+.B "\-j \-\-json"
+Print the results of list-dicts and searches as json, not as plain text.
+For use in automatically processing the results of a dictionary lookup.
+.TP 8
+.B "\-\-utf8\-output"
+Force sdcv to not convert to locale charset, output in utf8
+.TP 8
+.B "\-\-utf8\-input"
+Force sdcv to not convert from locale charset, assume that
+input is in utf8
+.TP 8
+.B "\-\-data\-dir path/to/directory"
+Use this directory as the path to the stardict data directory. This means that
+sdcv searches for dictionaries in data-dir/dic directory.
+.TP 8
+.B "\-\-color"
+Use ANSI escape codes for colorizing sdcv output (does not work with json output).
+.SH FILES
+.TP
+/usr/share/stardict/dic
+.TP
+$(HOME)/.stardict/dic
+
+Place where sdcv expects to find dictionaries.
+Instead of /usr/share/stardict/dic you can use any directory
+you want, just set the STARDICT_DATA_DIR environment variable.
+For example, if you have dictionaries in /mnt/data/stardict-dicts/dic,
+set STARDICT_DATA_DIR to /mnt/data/stardict-dicts.
+.TP
+$(HOME)/.sdcv_history
+
+This file includes the last $(SDCV_HISTSIZE) words, which you sought with sdcv.
+SDCV uses this file only if it was compiled with readline library support.
+.TP
+$(HOME)/.sdcv_ordering
+
+This is a text file containing one dictionary bookname per line.
+It specifies in which order the results of a search should be shown.
+.SH ENVIRONMENT
+Environment Variables Used By \fIsdcv\fR:
+.TP 20
+.B STARDICT_DATA_DIR
+If set, sdcv uses this variable as the data directory, this means that sdcv
+searches dictionaries in $\fBSTARDICT_DATA_DIR\fR\\dic
+.TP 20
+.B SDCV_HISTSIZE
+If set, sdcv writes in $(HOME)/.sdcv_history the last $(SDCV_HISTSIZE) words,
+which you look up using sdcv. If it is not set, then the last 2000 words are saved in $(HOME)/.sdcv_history.
+.TP 20
+.B SDCV_PAGER
+If SDCV_PAGER is set, its value is used as the name of the program
+to use to display the dictionary article.
+.SH BUGS
+Email bug reports to dushistov at mail dot ru. Be sure to include the word
+"sdcv" somewhere in the "Subject:" field.
+.SH AUTHORS
+Evgeniy A. Dushistov, Hu Zheng
+.SH SEE ALSO
+stardict(1), http://sdcv.sourceforge.net/, http://stardict.sourceforge.net
diff --git a/doc/uk/sdcv.1 b/doc/uk/sdcv.1
new file mode 100644
index 0000000..ff3b270
--- /dev/null
+++ b/doc/uk/sdcv.1
@@ -0,0 +1,84 @@
+.TH SDCV 1 "2004-12-06" "sdcv-0.4"
+.SH NAME
+sdcv \- консольна версія Зоряного словника [Stardict]
+.SH SYNOPSIS
+.B sdcv
+[
+.BI options
+]
+[list of words]
+.SH DESCRIPTION
+.I sdcv
+sdcv проста, міжплатформена текстова утиліта для роботи із
+словниками у форматі Зоряного словника [StarDict].
+Слово зі "списку слів", може бути рядком з початковим слешем '/'
+щоб задіяти нечіткий пошуковий алгоритм, рядок, може
+містити '?' і '*' для використання пошуку з регулярними виразами.
+Утиліта працює в діалоговому та не в інтерактивному режимах.
+Щоб вийти з діалогового режиму натискають Ctrl+D.
+У діалоговому режимі, якщо sdcv був скомпільований з підтримкою
+бібліотеки readline, Ви можете використовувати клавіші ДОГОРИ
+та ВНИЗ для роботи з хронологією.
+.SH OPTIONS
+.TP 8
+.B "\-h \-\-help"
+відображає повідомлення довідки та виходить
+.TP 8
+.B "\-v \-\-verbose"
+відображає версію та виходить
+.TP 8
+.B "\-l \-\-list\-dicts"
+відображає список доступних словників та виходить
+.TP 8
+.B "\-u \-\-use\-dict filename"
+для пошуку з використанням лише словника з цим іменем(bookname)
+.TP 8
+.B "\-n \-\-non\-interactive"
+для використання в скриптах
+.TP 8
+.B "\-\-utf8\-output"
+Заставити sdcv розмовляти не в системному кодуванні locale, а робити вивід в utf8
+.TP 8
+.B "\-\-utf8\-input"
+Заставити sdcv слухати не в системному кодуванні locale, а припускати що це
+ввід в utf8
+.TP 8
+.B "\-\-data\-dir path/to/directory"
+Використовуйте цю теку як шлях до теки даних зоряного словника [stardict].
+Це значає, що sdcv шукає словники у теці data-dir/dic.
+.SH FILES
+.TP
+/usr/share/stardict/dic
+.TP
+$(HOME)/.stardict/dic
+
+Місце, де sdcv очікує знайти словники.
+Замість шляху /usr/share/stardict/dic Ви можете використовувати все,
+що Ви хочете, лише встановіть змінну оточення STARDICT_DATA_DIR.
+Наприклад, якщо Ви маєте словники у теці /mnt/data/stardict-dicts/dic,
+встановіть STARDICT_DATA_DIR у /mnt/data/stardict-dicts.
+.TP
+$(HOME)/.sdcv_history
+
+Цей файл містить останні $(SDCV_HISTSIZE) слова, які Ви шукали з sdcv.
+SDCV використовує цей файл при умові, якщо sdcv був скомпільований
+з підтримкою бібліотеки readline.
+
+.SH ENVIRONMENT
+Змінні оточення для \fIsdcv\fR:
+.TP 20
+.B STARDICT_DATA_DIR
+Якщо встановлена, sdcv використає цю змінну як теку даних, це означає,
+що sdcv шукатиме словники у $\fBSTARDICT_DATA_DIR\fR\dic
+.TP 20
+.B SDCV_HISTSIZE
+Якщо встановлена, sdcv писатиме у $(HOME)/.sdcv_history лише
+останні $(SDCV_HISTSIZE) слова, які Ви шукали з sdcv. Якщо не встановлена,
+то збірігатиметься останніх 2000 слів у $(HOME)/.sdcv_history.
+.SH BUGS
+Звіти про помилки висилайте на адресу dushistov на mail крапка ru.
+Не забувайте включати слово "sdcv" десь у полі "Тема:".
+.SH AUTHORS
+Эвгений А. Душистов, Hu Zheng
+.SH SEE ALSO
+stardict(1), http://sdcv.sourceforge.net/, http://stardict.sourceforge.net