.TH FORMATRPSDB 1 2004-10-20 NCBI "NCBI Tools User's Manual"
.SH NAME
formatrpsdb \- Build databases for RPS Blast
.SH SYNOPSIS
.B formatrpsdb
[\|\fB\-\fP\|]
[\|\fB\-E\fP\ \fIN\fP\|]
[\|\fB\-G\fP\ \fIN\fP\|]
[\|\fB\-S\fP\ \fIX\fP\|]
[\|\fB\-U\fP\ \fIstr\fP\|]
[\|\fB\-b\fP\|]
[\|\fB\-f\fP\ \fIX\fP\|]
\fB\-i\fP\ \fIfilename\fP
[\|\fB\-l\fP\ \fIfilename\fP\|]
[\|\fB\-n\fP\ \fIstr\fP\|]
[\|\fB\-o\fP\|]
[\|\fB\-t\fP\ \fIstr\fP\|]
[\|\fB\-v\fP\ \fIN\fP\|]
.SH DESCRIPTION
\fBFormatrpsdb\fP is a utility that converts a collection of input
sequences into a database suitable for use with Reverse Position
Specific (RPS) Blast.
Each input sequence, together with its position-specific scoring
matrix (PSSM), is ASN.1 encoded into a PssmWithParameters (or
`scoremat') object and resides in a separate file.
Scoremat objects can be created using \fBblastpgp\fP.
\fBFormatrpsdb\fP is given a list of these files and produces the
corresponding database.

\fBFormatrpsdb\fP is designed to perform the work of \fBformatdb\fP,
\fBmakemat\fP and \fBcopymat\fP simultaneously, without generating the
large number of intermediate files these utilities would need to
create an RPS Blast database.
Further, scoremat objects are in more general use than the binary
format \fBmakemat\fP requires.
It is hoped that direct manipulation of scoremat objects will
encourage conversion of more diverse sequence collections into RPS
Blast databases.

Databases generated by \fBformatrpsdb\fP are binary compatible with
databases generated by \fBformatdb\fP/\fBmakemat\fP/\fBcopymat\fP,
although the database files will in general not be byte-for-byte
identical.
.SH OPTIONS
A summary of options is included below.
.TP
\fB\-\fP
Print usage message
.TP
\fB\-E\fP\ \fIN\fP
The gap extension penalty (if not specified in the scoremat; default = 1)
.TP
\fB\-G\fP\ \fIN\fP
The gap opening penalty (if not specified in the scoremat; default = 11)
.TP
\fB\-S\fP\ \fIX\fP
For scoremats that contain only residue frequencies, the scaling
factor to apply when creating PSSMs (default = 100)
.TP
\fB\-U\fP\ \fIstr\fP
Underlying score matrix (if not specified in the scoremat; default = BLOSUM62)
.TP
\fB\-b\fP
Scoremat files are binary (rather than text) ASN.1.
.TP
\fB\-f\fP\ \fIX\fP
Threshold for extending hits for RPS database (default = 11)
.TP
\fB\-i\fP\ \fIfilename\fP
Input file containing a list of ASN.1 scoremat filenames
.TP
\fB\-l\fP\ \fIfilename\fP
Log file name (default = formatrpsdb.log)
.TP
\fB\-n\fP\ \fIstr\fP
Base name of output database (same as input file if not specified)
.TP
\fB\-o\fP
Create index files for database
.TP
\fB\-t\fP\ \fIstr\fP
Title for database file
.TP
\fB\-v\fP\ \fIN\fP
Database volume size in millions of letters (default = 0, meaning no
limit)
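.SH EXAMPLES
The following is a minimal sketch of a typical run, not a definitive
recipe.
The file names
.I scoremats.txt
and
.IR *.asn ,
the database name
.IR mydb ,
the title string, and the one-filename-per-line layout of the list file
are assumptions made for illustration; only the options themselves are
taken from this page.
.PP
.nf
# Collect text ASN.1 scoremat files into a list (assumed: one name per line)
ls *.asn > scoremats.txt
# Build a database named mydb with a title and index files
formatrpsdb \-i scoremats.txt \-n mydb \-t "Example RPS database" \-o
.fi
.PP
If the scoremat files are binary ASN.1, add \fB\-b\fP to the second
command.
Omitting \fB\-n\fP would name the database after the input file, here
.IR scoremats.txt .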
.SH AUTHOR
The National Center for Biotechnology Information.
.SH SEE ALSO
.BR blast (1),
.BR copymat (1),
.BR formatdb (1),
.BR makemat (1),
formatrpsdb.html