/**
 * <p>Parsers for different file formats and data types.</p>
 * 
 * <p>The general use case for any parser is to create objects from an
 * {@link java.io.InputStream} (e.g. by reading a data file).
 * The objects are packed into a
 * {@link de.lmu.ifi.dbs.elki.datasource.bundle.MultipleObjectsBundle}, which
 * a {@link de.lmu.ifi.dbs.elki.datasource.DatabaseConnection}
 * in turn uses to fill a {@link de.lmu.ifi.dbs.elki.database.Database}
 * with the corresponding objects.</p>
 * <p>By default (i.e., if the user does not specify anything else),
 * any {@link de.lmu.ifi.dbs.elki.KDDTask} will
 * use a {@link de.lmu.ifi.dbs.elki.database.StaticArrayDatabase}, which
 * in turn uses a {@link de.lmu.ifi.dbs.elki.datasource.FileBasedDatabaseConnection}
 * and a {@link de.lmu.ifi.dbs.elki.datasource.parser.DoubleVectorLabelParser}
 * to parse the specified data file, so that the database is filled with
 * {@link de.lmu.ifi.dbs.elki.data.DoubleVector} objects.</p>
 * 
 * <p>Thus, the standard way to use a data set from a real-valued vector space
 * is to prepare it as a file in the following format
 * (as expected by {@link de.lmu.ifi.dbs.elki.datasource.parser.DoubleVectorLabelParser}):</p>
 * <ul>
 *  <li>One point per line, attributes separated by whitespace.</li>
 *  <li>Several labels may be given per point. A label must not be parseable as a double.</li>
 *  <li>Lines starting with &quot;#&quot; are ignored.</li>
 *  <li>An index can be specified to identify an entry to be treated as the class label.
 *      This index counts all entries (numeric values and labels alike), starting at 0.</li>
 *  <li>Files can be gzip-compressed.</li>
 * </ul>
 * <p>This file format is also suitable for, e.g., gnuplot.</p>
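 * 
 * <p>As a rough illustration of the format described above, a small data file
 * might look as follows. The values and label names are made up for
 * illustration and are not taken from any ELKI data set:</p>
 * <pre>
 * # two numeric attributes followed by one label
 * 0.5 1.25 clusterA
 * 0.75 1.5 clusterA
 * 3.0 4.5 clusterB
 * </pre>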
 * 
 * <p>For an example file that follows these requirements, see
 * <a href="http://www.dbs.ifi.lmu.de/research/KDD/ELKI/datasets/example/exampledata.txt">exampledata.txt</a>.
 * </p>
 */
/*
This file is part of ELKI:
Environment for Developing KDD-Applications Supported by Index-Structures

Copyright (C) 2012
Ludwig-Maximilians-Universität München
Lehr- und Forschungseinheit für Datenbanksysteme
ELKI Development Team

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.
*/
package de.lmu.ifi.dbs.elki.datasource.parser;