blob: c21ab31fb5559bb34a884b44abe28efba0552ddf (
plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
|
/**
* <p>Parsers for different file formats and data types.</p>
*
* <p>The general use-case for any parser is to create objects out of an
* {@link java.io.InputStream} (e.g. by reading a data file).
* The objects are packed in a
* {@link de.lmu.ifi.dbs.elki.datasource.bundle.MultipleObjectsBundle} which,
* in turn, is used by a {@link de.lmu.ifi.dbs.elki.datasource.DatabaseConnection}-Object
* to fill a {@link de.lmu.ifi.dbs.elki.database.Database}
* containing the corresponding objects.</p>
* <p>By default (i.e., if the user does not specify any specific requests),
* any {@link de.lmu.ifi.dbs.elki.KDDTask} will
* use the {@link de.lmu.ifi.dbs.elki.database.StaticArrayDatabase} which,
* in turn, will use a {@link de.lmu.ifi.dbs.elki.datasource.FileBasedDatabaseConnection}
* and a {@link de.lmu.ifi.dbs.elki.datasource.parser.DoubleVectorLabelParser}
* to parse a specified data file creating
* a {@link de.lmu.ifi.dbs.elki.database.StaticArrayDatabase}
* containing {@link de.lmu.ifi.dbs.elki.data.DoubleVector}-Objects.</p>
*
* <p>Thus, the standard procedure to use a data set of a real-valued vector space
* is to prepare the data set in a file of the following format
* (as suitable to {@link de.lmu.ifi.dbs.elki.datasource.parser.DoubleVectorLabelParser}):
* <ul>
* <li>One point per line, attributes separated by whitespace.</li>
* <li>Several labels may be given per point. A label must not be parseable as double.</li>
* <li>Lines starting with "#" will be ignored.</li>
* <li>An index can be specified to identify an entry to be treated as class label.
* This index counts all entries (numeric and labels as well) starting with 0.</li>
* <li>Files can be gzip compressed.</li>
* </ul>
* This file format is e.g. also suitable to gnuplot.
* </p>
*
* <p>As an example file following these requirements consider e.g.:
* <a href="http://www.dbs.ifi.lmu.de/research/KDD/ELKI/datasets/example/exampledata.txt">exampledata.txt</a>
* </p>
*
* @apiviz.exclude java.io.*
* @apiviz.exclude de.lmu.ifi.dbs.elki.utilities.*
*/
/*
This file is part of ELKI:
Environment for Developing KDD-Applications Supported by Index-Structures
Copyright (C) 2013
Ludwig-Maximilians-Universität München
Lehr- und Forschungseinheit für Datenbanksysteme
ELKI Development Team
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
package de.lmu.ifi.dbs.elki.datasource.parser;
|