diff options
author | Aaron M. Ucko <ucko@debian.org> | 2005-03-24 18:31:57 +0000 |
---|---|---|
committer | Aaron M. Ucko <ucko@debian.org> | 2005-03-24 18:31:57 +0000 |
commit | ccba467ae4f393d7acce357a9847bfe1fb77ccc7 (patch) | |
tree | cd38978b0dd2149bb2804c7cd81e300ca13f241a /doc/blast/impala.html | |
parent | 9dd1ccc4b3f1bac2a7dda6ff84c690c05aaca0af (diff) |
To prepare to load ncbi (6.1.20040616) into
ncbi-tools6/branches/upstream/current, perform 9 renames.
* ncbi-tools6/branches/upstream/current/doc/blast/blast.html: Renamed
from ncbi-tools6/branches/upstream/current/doc/blast.txt.
* ncbi-tools6/branches/upstream/current/doc/blast/blastclust.html:
Renamed from
ncbi-tools6/branches/upstream/current/doc/blastclust.txt.
* ncbi-tools6/branches/upstream/current/doc/blast/fastacmd.html:
Renamed from ncbi-tools6/branches/upstream/current/doc/fastacmd.txt.
* ncbi-tools6/branches/upstream/current/doc/blast/formatdb.html:
Renamed from ncbi-tools6/branches/upstream/current/doc/formatdb.txt.
* ncbi-tools6/branches/upstream/current/doc/blast/impala.html: Renamed
from ncbi-tools6/branches/upstream/current/doc/impala.txt.
* ncbi-tools6/branches/upstream/current/doc/blast/megablast.html:
Renamed from ncbi-tools6/branches/upstream/current/doc/megablast.txt.
* ncbi-tools6/branches/upstream/current/doc/blast/netblast.html:
Renamed from ncbi-tools6/branches/upstream/current/doc/netblast.txt.
* ncbi-tools6/branches/upstream/current/doc/blast/rpsblast.html:
Renamed from ncbi-tools6/branches/upstream/current/doc/rpsblast.txt.
* ncbi-tools6/branches/upstream/current/doc/blast/blastall.html:
Renamed from ncbi-tools6/branches/upstream/current/doc/README-qm.
Diffstat (limited to 'doc/blast/impala.html')
-rw-r--r-- | doc/blast/impala.html | 212 |
1 files changed, 212 insertions, 0 deletions
diff --git a/doc/blast/impala.html b/doc/blast/impala.html new file mode 100644 index 00000000..1d3d5cde --- /dev/null +++ b/doc/blast/impala.html @@ -0,0 +1,212 @@ +IMPALA: Integrating Matrix Profiles And Local Alignments + +1. Files in Distribution + +The following IMPALA source code files are distributed: + +copymat.c +impatool.c +makemat.c +posit2.c +profiles.c +newkar.c +profiles.h +Makefile + +2. Compilation + +Run the following commands in the directory, containing IMPALA source code files: + +make makemat +make copymat +make impala + +This will result in three binary executable files: + +makemat : primary profile preprocessor + (converts a collection of binary profiles, created by the -C option + of PSI-BLAST, into portable ASCII form); + +copymat : secondary profile preprocessor + (converts ASCII matrices, produced by the primary preprocessor, + into database that can be read into memory quickly); + +impala : search program (searches a database of score + matrices, prepared by copymat, producing BLAST-like output). + +3. Conversion of profiles into searchable database + +3.1. Primary preprocessing + +Prepare the following files: + +i. a collection of PSI-BLAST-generated profiles with arbitrary + names and suffix .chk; + +ii. a collection of "profile master sequences", associated with + the profiles, each in a separate file with arbitrary name and a 3 character + suffix starting with c; + the sequences can have deflines; they need not be sequences in nr or + in any other sequence database; if the sequences have deflines, then + the deflines must be unique. + +iii. a list of profile file names, one per line, named + <database_name>.pn; + +iv. a list of master sequence file names, one per line, in the same + order as a list of profile names, named + <database_name>.sn; + +The following files will be created: + +i. a collection of ASCII files, corresponding to each of the + original profiles, named + <profile_name>.mtx; + +ii. a list of ASCII matrix files, named + <database_name>.mn; + +iii. ASCII file with auxiliary information, named + <database_name>.aux; + +Arguments to makemat: + + -P database name (required) + -G Cost to open a gap (optional) + default = 11 + -E Cost to extend a gap (optional) + default = 1 + -U Underlying amino acid scoring matrix (optional) + default = BLOSUM62 + -d Underlying sequence database used to create profiles (optional) + default = nr + -z Effective size of sequence database given by -d + default = current size of -d option + Note: It may make sense to use -z without -d when the + profiles were created with an older, smaller version of an + existing database + -S Scaling factor for matrix outputs to avoid round-off problems + default = PRO_DEFAULT_SCALING_UP (currently defined as 100) + Use 1.0 to have no scaling + Output scores will be scaled back down to a unit scale to make + them look more like BLAST scores, but we found working with a larger + scale to help with roundoff problems. + -H get help (overrides all other arguments) +Note: It is not enforced that the values of -G and -E passed to makemat +were actually used in making the checkpoints. However, the values fed +in to makemat are propagated to copymat and impala. + +3.1. Secondary preprocessing + +Prepare the following files: + +i. a collection of ASCII files, corresponding to each of the + original profiles, named + <profile_name>.mtx +(created by makemat); + +ii. a collection of "profile master sequences", associated with + the profiles, each in a separate file with arbitrary name and a 3 character + suffix starting with c. + +iii. a list of ASCII_matrix files, named + <database_name>.mn + (created by makemat); + +iv. a list of master sequence file names, one per + line, in the same order as a list of matrix names, named + <database_name>.sn; + +v. ASCII file with auxiliary information, named + <database_name>.aux +(created by makemat); + +The files input to copymatices are in ASCII format and thus portable +between machines with different encodings for machine-readable files + +The following files will be created: + +i. a huge binary file, containing all profile matrices, named + <database_name>.mat; + +Arguments to copymat + + -P database name (required) + -H get help (overrides all other arguments) + +4. Search + +Before you start searching, check that you have copies of or soft +links to all the files associated with the PSSM library. If the +library has K PSSMs, you should have + + K files with names ending in .mtx + K files with names ending in a 3-letter extension starting with c + 1 file with name ending in .pn + 1 file with name ending in .sn + 1 file with name ending in .aux + 1 file with name ending in .mn + 1 file with name ending in .mat + +Arguments to impala + + -i query sequence file (required) + -P database of profiles (required) + -o output file (optional) + default = stdout + -e Expectation value threshold (E), (optional, same as for BLAST) + default = 10 + -m alignment view (optional, same as for BLAST) + -z effective length of database (optional) + -1 = length given via -z option to makemat + default (0) implies length is actual length of profile library + adjusted for end effects + -H get help (overrides all other options) + +5. Directory convention + + Since IMPALA requires a large number of files, it may be convenient +to store your impala files in various directories. For copymat, +makemat, and impala the following parsing convention applies +to the string that follows the -P argument. +If the string starts with a '/', then it is deemed to be a full +path name. Whatever prefix occurs upto and including the rightmost +'/' is deemed to be a prefix that should be prepended to all +file names in the .sn, .pn, and .mn files. + +Example: If you call any of the 3 programs including the + argument -P /foo/bar/wolf1187 +then + /foo/bar/ is prepended to every filename listed in + wolf1187.pn + wolf1187.sn + wolf1187.mn + before opening the file, but the files + wolf1187.pn + wolf1187.sn + wolf1187.mn + themselves are not changed. + +6. Output + +IMPALA output closely mimics output of BLASTP family programs and +should be compatible with SEALS BLAST parsers. + +Send suggestions, comments, complaints only to Alejandro Schaffer +schaffer@helix.nih.gov + + +Reference: + + Schaffer, A.A., Wolf, Y.I., Ponting, C.P. Koonin, E.V., +Aravind, L., Altschul, S. F., IMPALA: Matching a Protein Sequence +Against a Collection of PSI-BLAST-Constructed Position-Specific +Score Matrices, Bioninformatics, to appear. + +Please cite the above paper if you publish any results computed by IMPALA. + + + + + + |