Last updated: Fri, March 16, 2001
File Locator: ftp://ftp.ncbi.nih.gov/toolbox/xml/asn2xml/README.asn2xml

asn2xml

Program Description: asn2xml is a utility program that reads
sequence data in ASN.1 and writes the sequence data out as "full XML". For
a further description of "full XML", refer to the NCBI Data in XML Doc.

Binary: ftp://ftp.ncbi.nih.gov/toolbox/xml/asn2xml/
Source: ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/ncbi.tar.Z   
NCBI XML DTDs: ftp://ftp.ncbi.nih.gov/toolbox/xml/xmlspecs/
NCBI Data in XML Doc: ftp://ftp.ncbi.nih.gov/toolbox/xml/ncbixml.txt

BASIC INSTRUCTIONS:
1. Obtain a GenBank ASN.1 data file from
ftp://ftp.ncbi.nih.gov/ncbi-asn1/

/ncbi-asn1/daily/ directory: ASN.1 Cumulative Update: gbcu.aso.gz
/ncbi-asn1/daily-nc/ directory: contains individual files for each day's
new or updated entries since close-of-data for the last GenBank Release,
in ASN.1 format.

Additional documentation:
/ncbi-asn1/README.asn1
/ncbi-asn1/README.asn1.daily-nc
/ncbi-asn1/README.asn1.daily

2. Download the appropriate asn2xml binary for your platform from
ftp://ftp.ncbi.nih.gov/toolbox/xml/asn2xml/
     asn2xml.alphaOSF1.tar.Z      OSF1 V5.1
     asn2xml.linux.tar.Z          Intel x86/Linux
     asn2xml.sgi.tar.Z            IRIX 6.4/IRIX 6.5 
     asn2xml.sgi5.tar.Z           IRIX 5.3
     asn2xml.solaris.tar.Z        Sparc/Solaris 2.6/2.7/8
     asn2xml.solarisintel.tar.Z   Intel x86/Solaris 2.7/8
     asn2xml.win32.exe            Intel x86/Microsoft NT or Windows 95 (32bit)
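
The choice of archive can be sketched as a small shell function. This is
only an illustration: the function name asn2xml_archive is hypothetical,
and the mapping from uname values (for example "sun4u" or "i86pc") to the
archive names listed above is an assumption, not something NCBI documents.

```shell
# Hypothetical helper: map uname-style values to an archive name from the
# list above. The uname patterns used here are assumptions.
# Usage: asn2xml_archive [os] [machine] [release]
asn2xml_archive() {
    _os=${1:-$(uname -s)}
    _machine=${2:-$(uname -m)}
    _release=${3:-$(uname -r)}
    case "$_os" in
        Linux)
            case "$_machine" in
                i?86) echo "asn2xml.linux.tar.Z" ;;
                *) echo "no asn2xml binary listed for $_os/$_machine" >&2; return 1 ;;
            esac ;;
        OSF1)
            echo "asn2xml.alphaOSF1.tar.Z" ;;
        IRIX|IRIX64)
            case "$_release" in
                5.*) echo "asn2xml.sgi5.tar.Z" ;;   # IRIX 5.3
                *)   echo "asn2xml.sgi.tar.Z" ;;    # IRIX 6.4/6.5
            esac ;;
        SunOS)
            case "$_machine" in
                sun4*) echo "asn2xml.solaris.tar.Z" ;;       # Sparc
                i86pc) echo "asn2xml.solarisintel.tar.Z" ;;  # Intel x86
                *) echo "no asn2xml binary listed for $_os/$_machine" >&2; return 1 ;;
            esac ;;
        *)
            echo "no asn2xml binary listed for $_os" >&2
            return 1 ;;
    esac
}
```

Called with no arguments, the function reads the values from uname on the
current machine. Platforms that are not in the list above make it return a
non-zero status instead of guessing.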

3. Running the program. As usual with any NCBI application, passing a
single hyphen with no other arguments prints basic help.

prompt> asn2xml -

asn2xml 1.0   arguments:

  -i  Filename for asn.1 input [File In]
    default = stdin
  -e  Input is a Seq-entry [T/F]  Optional
    default = F
  -b  Input asnfile in binary mode [T/F]  Optional
    default = T
  -o  Filename for XML output [File Out]  Optional
    default = stdout
  -l  Log errors to file named: [File Out]  Optional

Example:
(Efficient method: decompress on the fly and pipe the data into asn2xml)

Download ftp://ftp.ncbi.nih.gov/ncbi-asn1/daily-nc/nc0305.aso.gz, then run:
gunzip -c nc0305.aso.gz | asn2xml -l error.log > nc0305.xml

or decompress the file first:
Download ftp://ftp.ncbi.nih.gov/ncbi-asn1/daily-nc/nc0305.aso.gz, then run:
gunzip nc0305.aso.gz
asn2xml -i nc0305.aso -o nc0305.xml2 -l error3.log
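
For more than one file, the piped form can be wrapped in a shell function.
The following is a minimal sketch, not an NCBI-supplied script: the
function name convert_aso_gz and the ASN2XML variable (the path to the
asn2xml binary, defaulting to "asn2xml" on the PATH) are hypothetical.
Only the -l option and the stdin/stdout defaults described above are used.

```shell
# Hypothetical wrapper: convert one gzip-compressed binary ASN.1 file to
# XML by streaming it through asn2xml, as in the piped example.
# ASN2XML is an assumed variable naming the asn2xml binary to run.
# Usage: convert_aso_gz nc0305.aso.gz  ->  nc0305.xml and nc0305.log
convert_aso_gz() {
    _in=$1
    _base=${_in%.aso.gz}   # strip the .aso.gz suffix
    # asn2xml reads stdin and writes stdout by default; -l names the error log
    gunzip -c "$_in" | "${ASN2XML:-asn2xml}" -l "$_base.log" > "$_base.xml"
}
```

A loop such as

for f in *.aso.gz; do convert_aso_gz "$f"; done

would then convert every daily file in the current directory. Note that
the exit status of a pipeline is that of asn2xml, so a failure in gunzip
alone is not reported.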



----------------

Notes:

1. The uncompressed XML file is about 10 times the size of the compressed
binary ASN.1 file, so the XML output can be extremely large. Note that the
ASN.1 contains more than the GenBank files: it includes other databases
such as PDB and RefSeq, the gaps in HTG records, and the quality scores of
HTG records. For further information, read
ftp://ftp.ncbi.nih.gov/ncbi-asn1/README.asn1
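
Before converting, the disk space needed can be roughly estimated from the
factor of 10 given above. This is a sketch only: the function name
estimate_xml_bytes is hypothetical, and the real ratio will vary from
file to file.

```shell
# Hypothetical helper: estimate the uncompressed XML size in bytes as
# 10 times the size of the compressed ASN.1 file (the rough ratio quoted
# in the note above; actual output may differ).
# Usage: estimate_xml_bytes nc0305.aso.gz
estimate_xml_bytes() {
    _bytes=$(wc -c < "$1") || return 1
    echo $(( _bytes * 10 ))
}
```

Comparing the result with the free space reported by df gives an early
warning before a long conversion is started.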


Email any questions not answered in this documentation to:
info@ncbi.nih.gov