-rw-r--r-- | INSTALL | 2
-rw-r--r-- | NEWS | 36
-rw-r--r-- | PKG-INFO | 25
-rw-r--r-- | README | 79
-rwxr-xr-x | scripts/dtrx | 220
-rw-r--r-- | setup.py | 28
-rw-r--r-- | tests/compare.py | 23
-rw-r--r-- | tests/test-2_all.deb | bin | 0 -> 590 bytes
-rw-r--r-- | tests/tests.yml | 32
9 files changed, 308 insertions, 137 deletions
@@ -28,7 +28,7 @@ rpm archives rpm2cpio, cpio deb archives - ar, tar, zcat + ar, tar, zcat, bzcat, lzcat gem archives tar, zcat @@ -1,6 +1,42 @@ Changes in dtrx =============== +Version 6.5 +----------- + +Enhancements +~~~~~~~~~~~~ + + * When you list archive contents with -l or -t, dtrx will start printing + results much faster than it used to. There's a small chance that it + will print some incorrect listings if it misdetects the archive type of + a given file, but it will show you an error message when that happens. + + * dtrx recognizes more kinds of compressed tar archives by their + extension. + + * You can now extract newer .deb packages that are compressed with bzip2 + or lzma. + +Bug fixes +~~~~~~~~~ + + * When extracting an archive that contained a file with a mismatched + filename, the prompt would offer you a chance to "rename the directory" + instead of "rename the file." This wording has been fixed, along with + some other wording adjustments in the prompts generally. + + * Perform more reliable detection of the terminal size, and improve word + wrapping on prompts. + +Other changes +~~~~~~~~~~~~~ + + * The README is now written like a man page, and can be converted to a man + page by using rst2man_. + +.. _rst2man: http://docutils.sourceforge.net/sandbox/manpage-writer/ + Version 6.4 ----------- @@ -1,10 +1,31 @@ Metadata-Version: 1.0 Name: dtrx -Version: 6.4 +Version: 6.5 Summary: Script to intelligently extract multiple archive types Home-page: http://www.brettcsmith.org/2007/dtrx/ Author: Brett Smith Author-email: brettcsmith@brettcsmith.org License: GNU General Public License, version 3 or later -Description: UNKNOWN +Download-URL: http://www.brettcsmith.org/2007/dtrx/ +Description: dtrx extracts archives in a number of different + formats; it currently supports tar, zip (including self-extracting + .exe files), cpio, rpm, deb, gem, 7z, cab, rar, and InstallShield + files. 
It can also decompress files compressed with gzip, bzip2, + lzma, or compress. + + In addition to providing one command to handle many different archive + types, dtrx also aids the user by extracting contents consistently. + By default, everything will be written to a dedicated directory + that's named after the archive. dtrx will also change the + permissions to ensure that the owner can read and write all those + files. Platform: UNKNOWN +Classifier: Development Status :: 5 - Production/Stable +Classifier: Environment :: Console +Classifier: Intended Audience :: End Users/Desktop +Classifier: Intended Audience :: System Administrators +Classifier: License :: OSI Approved :: GNU General Public License (GPL) +Classifier: Natural Language :: English +Classifier: Operating System :: POSIX +Classifier: Programming Language :: Python +Classifier: Topic :: Utilities @@ -1,12 +1,48 @@ -dtrx - Intelligent archive extraction -===================================== - -Introduction ------------- +==== +dtrx +==== + +---------------------------------- +cleanly extract many archive types +---------------------------------- + +:Author: Brett Smith <brettcsmith@brettcsmith.org> +:Date: 2009-07-04 +:Copyright: + + dtrx 6.5 is copyright © 2006-2009 Brett Smith and others. Feel free to + send comments, bug reports, patches, and so on. You can find the latest + version of dtrx on its home page at + <http://www.brettcsmith.org/2007/dtrx/>. + + dtrx is free software; you can redistribute it and/or modify it under the + terms of the GNU General Public License as published by the Free Software + Foundation; either version 3 of the License, or (at your option) any + later version. + + This program is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General + Public License for more details. 
+ + You should have received a copy of the GNU General Public License along + with this program; if not, see <http://www.gnu.org/licenses/>. + +:Version: 6.5 +:Manual section: 1 + +SYNOPSIS +======== + +dtrx [OPTIONS] ARCHIVE [ARCHIVE ...] + +DESCRIPTION +=========== dtrx extracts archives in a number of different formats; it currently -supports tar, zip, cpio, rpm, deb, gem, 7z, cab, and rar files. It can -also decompress files compressed with gzip, bzip2, lzma, or compress. +supports tar, zip (including self-extracting .exe files), cpio, rpm, deb, +gem, 7z, cab, rar, and InstallShield files. It can also decompress files +compressed with gzip, bzip2, lzma, or compress. In addition to providing one command to handle many different archive types, dtrx also aids the user by extracting contents consistently. By @@ -14,14 +50,14 @@ default, everything will be written to a dedicated directory that's named after the archive. dtrx will also change the permissions to ensure that the owner can read and write all those files. -Running dtrx ------------- - To run dtrx, simply call it with the archive(s) you wish to extract as arguments. For example:: $ dtrx coreutils-5.*.tar.gz +OPTIONS +======= + dtrx supports a number of options to mandate specific behavior: -r, --recursive @@ -84,26 +120,3 @@ dtrx supports a number of options to mandate specific behavior: --version Display dtrx's version, copyright, and license information. - -Other Useful Information ------------------------- - -dtrx 6.4 is copyright ⓒ 2006, 2007, 2008 `Brett Smith`_ and others. Feel -free to send comments, bug reports, patches, and so on. You can find the -latest version of dtrx on `its home page`_. - -.. _`Brett Smith`: mailto:brettcsmith@brettcsmith.org -.. 
_`its home page`: http://www.brettcsmith.org/2007/dtrx/ - -dtrx is free software; you can redistribute it and/or modify it under the -terms of the GNU General Public License as published by the Free Software -Foundation; either version 3 of the License, or (at your option) any later -version. - -This program is distributed in the hope that it will be useful, but WITHOUT -ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or -FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for -more details. - -You should have received a copy of the GNU General Public License along -with this program; if not, see <http://www.gnu.org/licenses/>. diff --git a/scripts/dtrx b/scripts/dtrx index 70e7965..f053989 100755 --- a/scripts/dtrx +++ b/scripts/dtrx @@ -2,8 +2,8 @@ # -*- coding: utf-8 -*- # # dtrx -- Intelligently extract various archive types. -# Copyright ⓒ 2006, 2007, 2008 Brett Smith <brettcsmith@brettcsmith.org> -# Copyright ⓒ 2008 Peter Kelemen <Peter.Kelemen@gmail.com> +# Copyright © 2006-2009 Brett Smith <brettcsmith@brettcsmith.org> +# Copyright © 2008 Peter Kelemen <Peter.Kelemen@gmail.com> # # This program is free software; you can redistribute it and/or modify it # under the terms of the GNU General Public License as published by the @@ -21,6 +21,7 @@ # Python 2.3 string methods: 'rfind', 'rindex', 'rjust', 'rstrip' import errno +import fcntl import logging import mimetypes import optparse @@ -29,9 +30,12 @@ import re import shutil import signal import stat +import string +import struct import subprocess import sys import tempfile +import termios import textwrap import traceback @@ -40,10 +44,10 @@ try: except NameError: from sets import Set as set -VERSION = "6.4" +VERSION = "6.5" VERSION_BANNER = """dtrx version %s -Copyright ⓒ 2006, 2007, 2008 Brett Smith <brettcsmith@brettcsmith.org> -Copyright ⓒ 2008 Peter Kelemen <Peter.Kelemen@gmail.com> +Copyright © 2006-2009 Brett Smith <brettcsmith@brettcsmith.org> +Copyright © 2008 Peter 
Kelemen <Peter.Kelemen@gmail.com> This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the @@ -168,12 +172,21 @@ class BaseExtractor(object): return index return None + def add_process(self, processes, command, stdin, stdout): + try: + processes.append(subprocess.Popen(command, stdin=stdin, + stdout=stdout, + stderr=self.stderr)) + except OSError, error: + if error.errno == errno.ENOENT: + raise ExtractorUnusable("could not run %s" % (command[0],)) + raise + def run_pipes(self, final_stdout=None): if not self.pipes: return elif final_stdout is None: - # FIXME: Buffering this might be dumb. - final_stdout = tempfile.TemporaryFile() + final_stdout = open('/dev/null', 'w') num_pipes = len(self.pipes) last_pipe = num_pipes - 1 processes = [] @@ -186,14 +199,7 @@ class BaseExtractor(object): stdout = final_stdout else: stdout = subprocess.PIPE - try: - processes.append(subprocess.Popen(command, stdin=stdin, - stdout=stdout, - stderr=self.stderr)) - except OSError, error: - if error.errno == errno.ENOENT: - raise ExtractorUnusable("could not run %s" % (command[0],)) - raise + self.add_process(processes, command, stdin, stdout) self.exit_codes = [pipe.wait() for pipe in processes] self.archive.close() for index in range(last_pipe): @@ -285,17 +291,25 @@ class BaseExtractor(object): self.archive.close() os.chdir(old_path) - def get_filenames(self): - self.pipe(self.list_pipe, "listing") - self.run_pipes() - self.check_success(False) - self.archive.seek(0, 0) + def get_filenames(self, internal=False): + if not internal: + self.pipe(self.list_pipe, "listing") + processes = [] + stdin = self.archive + for command in [pipe[0] for pipe in self.pipes]: + self.add_process(processes, command, stdin, subprocess.PIPE) + stdin = processes[-1].stdout + get_output_line = processes[-1].stdout.readline while True: - line = self.archive.readline() + line = get_output_line() if not line: - 
self.archive.close() - return + break yield line.rstrip('\n') + self.exit_codes = [pipe.wait() for pipe in processes] + self.archive.close() + for process in processes: + process.stdout.close() + self.check_success(False) class CompressionExtractor(BaseExtractor): @@ -377,11 +391,25 @@ class RPMExtractor(CpioExtractor): class DebExtractor(TarExtractor): file_type = 'Debian package' + data_re = re.compile(r'^data\.tar\.[a-z0-9]+$') def prepare(self): - self.pipe(['ar', 'p', self.filename, 'data.tar.gz'], - "data.tar.gz extraction") - self.pipe(['zcat'], "data.tar.gz decompression") + self.pipe(['ar', 't', self.filename], "finding package data file") + for filename in self.get_filenames(internal=True): + if self.data_re.match(filename): + data_filename = filename + break + else: + raise ExtractorError(".deb contains no data.tar file") + self.archive.seek(0, 0) + self.pipes.pop() + # self.pipes = start_pipes + encoding = mimetypes.guess_type(data_filename)[1] + if not encoding: + raise ExtractorError("data.tar file has unrecognized encoding") + self.pipe(['ar', 'p', self.filename, data_filename], + "extracting data.tar from .deb") + self.pipe([self.decoders[encoding]], "decoding data.tar") def basename(self): pieces = os.path.basename(self.filename).split('_') @@ -471,7 +499,7 @@ class SevenExtractor(NoPipeExtractor): if fn_index is not None: break else: - fn_index = line.rindex(' ') + 1 + fn_index = string.rindex(line, ' ') + 1 elif fn_index is not None: yield line[fn_index:] self.archive.close() @@ -661,11 +689,16 @@ class BombHandler(BaseHandler): class BasePolicy(object): try: - width = int(os.environ['COLUMNS']) - except (KeyError, ValueError): + size = fcntl.ioctl(sys.stdout.fileno(), termios.TIOCGWINSZ, + struct.pack("HHHH", 0, 0, 0, 0)) + width = struct.unpack("HHHH", size)[1] + except IOError: width = 80 - wrapper = textwrap.TextWrapper(width=width - 1) - + width = width - 1 + choice_wrapper = textwrap.TextWrapper(width=width, initial_indent=' * ', + 
subsequent_indent=' ', + break_long_words=False) + def __init__(self, options): self.current_policy = None if options.batch: @@ -673,15 +706,10 @@ class BasePolicy(object): else: self.permanent_policy = None - def wrap(self, question, filename): - # Note: This function assumes the filename is the first thing in the - # question text, and that's the only place it appears. - if len(self.wrapper.wrap(filename + ' a')) > 1: - return [filename] + self.wrapper.wrap(question[3:]) - return self.wrapper.wrap(question % (filename,)) - def ask_question(self, question): - question = question + self.choices + question = question + ["You can:"] + for choice in self.choices: + question.extend(self.choice_wrapper.wrap(choice)) while True: print "\n".join(question) try: @@ -693,6 +721,19 @@ class BasePolicy(object): except KeyError: print + def wrap(self, question, *args): + words = question.split() + for arg in args: + words[words.index('%s')] = arg + result = [words.pop(0)] + for word in words: + extend = '%s %s' % (result[-1], word) + if len(extend) > self.width: + result.append(word) + else: + result[-1] = extend + return result + def __cmp__(self, other): return cmp(self.current_policy, other) @@ -700,10 +741,9 @@ class BasePolicy(object): class OneEntryPolicy(BasePolicy): answers = {'h': EXTRACT_HERE, 'i': EXTRACT_WRAP, 'r': EXTRACT_RENAME, '': EXTRACT_WRAP} - choices = ["You can:", - " * extract it Inside another directory", - " * extract it and Rename the directory", - " * extract it Here"] + choice_template = ["extract the %s _I_nside a new directory named %s", + "extract the %s and _R_ename it %s", + "extract the %s _H_ere"] prompt = "What do you want to do? 
(I/r/h) " def __init__(self, options): @@ -724,11 +764,14 @@ class OneEntryPolicy(BasePolicy): raise ValueError("bad value %s for default policy" % (default,)) def prep(self, archive_filename, extractor): - question = self.wrap(("%%s contains one %s, but its name " + - "doesn't match.") % - (extractor.content_type,), archive_filename) + question = self.wrap( + "%s contains one %s but its name doesn't match.", + archive_filename, extractor.content_type) question.append(" Expected: " + extractor.basename()) question.append(" Actual: " + extractor.content_name) + choice_vars = (extractor.content_type, extractor.basename()) + self.choices = [text % choice_vars[:text.count('%s')] + for text in self.choice_template] self.current_policy = (self.permanent_policy or self.ask_question(question)) @@ -739,12 +782,11 @@ class OneEntryPolicy(BasePolicy): class RecursionPolicy(BasePolicy): answers = {'o': RECURSE_ONCE, 'a': RECURSE_ALWAYS, 'n': RECURSE_NOT_NOW, 'v': RECURSE_NEVER, 'l': RECURSE_LIST, '': RECURSE_NOT_NOW} - choices = ["You can:", - " * Always extract included archives", - " * extract included archives this Once", - " * choose Not to extract included archives", - " * neVer extract included archives", - " * List included archives"] + choices = ["_A_lways extract included archives during this session", + "extract included archives this _O_nce", + "choose _N_ot to extract included archives this once", + "ne_V_er extract included archives during this session", + "_L_ist included archives"] prompt = "What do you want to do? 
(a/o/N/v/l) " def __init__(self, options): @@ -759,10 +801,9 @@ class RecursionPolicy(BasePolicy): if (self.permanent_policy is not None) or (archive_count == 0): self.current_policy = self.permanent_policy or RECURSE_NOT_NOW return - question = self.wrap(("%%s contains %s other archive file(s), " + - "out of %s file(s) total.") % - (archive_count, extractor.file_count), - current_filename) + question = self.wrap( + "%s contains %s other archive file(s), out of %s file(s) total.", + current_filename, archive_count, extractor.file_count) if target == '.': target = '' included_root = extractor.included_root @@ -840,8 +881,10 @@ class ExtractorBuilder(object): for extension in ext_info.get('extensions', ()): extension_map.setdefault(extension, []).append((ext_name, None)) - for mapping in (('tar', 'bzip2', 'tar.bz2'), + for mapping in (('tar', 'bzip2', 'tar.bz2', 'tbz2', 'tb2', 'tbz'), ('tar', 'gzip', 'tar.gz', 'tgz'), + ('tar', 'lzma', 'tar.lzma', 'tlz'), + ('tar', 'compress', 'tar.Z', 'taz'), ('compress', 'gzip', 'Z', 'gz'), ('compress', 'bzip2', 'bz2'), ('compress', 'lzma', 'lzma')): @@ -936,6 +979,7 @@ class BaseAction(object): self.options = options self.filenames = filenames self.target = None + self.do_print = False def report(self, function, *args): try: @@ -945,15 +989,20 @@ class BaseAction(object): logger.debug(''.join(traceback.format_exception(*sys.exc_info()))) return error + def show_filename(self, filename): + if len(self.filenames) < 2: + return + elif self.do_print: + print + else: + self.do_print = True + print "%s:" % (filename,) + class ExtractionAction(BaseAction): handlers = [FlatHandler, OverwriteHandler, MatchHandler, EmptyHandler, BombHandler] - def __init__(self, options, filenames): - BaseAction.__init__(self, options, filenames) - self.did_print = False - def get_handler(self, extractor): if extractor.content_type in ONE_ENTRY_UNKNOWN: self.options.one_entry_policy.prep(self.current_filename, @@ -967,11 +1016,7 @@ class 
ExtractionAction(BaseAction): def show_extraction(self, extractor): if self.options.log_level > logging.INFO: return - elif self.did_print: - print - else: - self.did_print = True - print "%s:" % (self.current_filename,) + self.show_filename(self.current_filename) if extractor.contents is None: print self.current_handler.target return @@ -1007,29 +1052,28 @@ class ExtractionAction(BaseAction): class ListAction(BaseAction): - def __init__(self, options, filenames): - BaseAction.__init__(self, options, filenames) - self.count = 0 - - def get_list(self, extractor): - # Note: The reason I'm getting all the filenames up front is - # because if we run into trouble partway through the archive, we'll - # try another extractor. So before we display anything we have to - # be sure this one is successful. We maybe don't have to be quite - # this conservative but this is the easy way out for now. - self.filelist = list(extractor.get_filenames()) - - def show_list(self, filename): - self.count += 1 - if len(self.filenames) != 1: - if self.count > 1: - print - print "%s:" % (filename,) - print '\n'.join(self.filelist) - + def list_filenames(self, extractor, filename): + # We get a line first to make sure there's not going to be some + # basic error before we show what filename we're listing. 
+ filename_lister = extractor.get_filenames() + try: + first_line = filename_lister.next() + except StopIteration: + self.show_filename(filename) + else: + self.did_list = True + self.show_filename(filename) + print first_line + for line in filename_lister: + print line + def run(self, filename, extractor): - return (self.report(self.get_list, extractor) or - self.report(self.show_list, filename)) + self.did_list = False + error = self.report(self.list_filenames, extractor, filename) + if error and self.did_list: + logger.error("lister failed: ignore above listing for %s" % + (filename,)) + return error class ExtractorApplication(object): @@ -3,11 +3,33 @@ from distutils.core import setup setup(name="dtrx", - version = "6.4", + version = "6.5", description = "Script to intelligently extract multiple archive types", author = "Brett Smith", author_email = "brettcsmith@brettcsmith.org", url = "http://www.brettcsmith.org/2007/dtrx/", + download_url = "http://www.brettcsmith.org/2007/dtrx/", scripts = ['scripts/dtrx'], - license = "GNU General Public License, version 3 or later" - ) + license = "GNU General Public License, version 3 or later", + classifiers = ['Development Status :: 5 - Production/Stable', + 'Environment :: Console', + 'Intended Audience :: End Users/Desktop', + 'Intended Audience :: System Administrators', + 'License :: OSI Approved :: GNU General Public License (GPL)', + 'Natural Language :: English', + 'Operating System :: POSIX', + 'Programming Language :: Python', + 'Topic :: Utilities'], + long_description = """dtrx extracts archives in a number of different + formats; it currently supports tar, zip (including self-extracting + .exe files), cpio, rpm, deb, gem, 7z, cab, rar, and InstallShield + files. It can also decompress files compressed with gzip, bzip2, + lzma, or compress. + + In addition to providing one command to handle many different archive + types, dtrx also aids the user by extracting contents consistently. 
+ By default, everything will be written to a dedicated directory + that's named after the archive. dtrx will also change the + permissions to ensure that the owner can read and write all those + files.""" + ) diff --git a/tests/compare.py b/tests/compare.py index cdbcc5d..7927467 100644 --- a/tests/compare.py +++ b/tests/compare.py @@ -1,7 +1,8 @@ #!/usr/bin/env python +# -*- coding: utf-8 -*- # # compare.py -- High-level tests for dtrx. -# Copyright (c) 2006, 2007, 2008 Brett Smith <brettcsmith@brettcsmith.org>. +# Copyright © 2006-2009 Brett Smith <brettcsmith@brettcsmith.org>. # # This program is free software; you can redistribute it and/or modify it # under the terms of the GNU General Public License as published by the @@ -54,8 +55,13 @@ class ExtractorTest(object): setattr(self, 'options', kwargs.get('options', '-n').split()) setattr(self, 'filenames', kwargs.get('filenames', '').split()) for key in ('directory', 'prerun', 'posttest', 'baseline', 'error', - 'grep', 'antigrep', 'input', 'output', 'cleanup'): + 'input', 'output', 'cleanup'): setattr(self, key, kwargs.get(key, None)) + for key in ('grep', 'antigrep'): + value = kwargs.get(key, []) + if isinstance(value, str): + value = [value] + setattr(self, key, value) def get_results(self, commands, stdin=None): print >>output_buffer, "Output from %s:" % (' '.join(commands),) @@ -165,12 +171,13 @@ class ExtractorTest(object): return None def grep_output(self, output): - if self.grep and (not re.search(self.grep.replace(' ', '\\s+'), - output, re.MULTILINE)): - return "output did not match %s" % (self.grep) - elif self.antigrep and re.search(self.antigrep.replace(' ', '\\s+'), - output, re.MULTILINE): - return "output matched antigrep %s" % (self.antigrep) + for pattern in self.grep: + if not re.search(pattern.replace(' ', '\\s+'), output, + re.MULTILINE): + return "output did not match %s" % (pattern) + for pattern in self.antigrep: + if re.search(pattern.replace(' ', '\\s+'), output, re.MULTILINE): + 
return "output matched antigrep %s" % (self.antigrep) return None def check_output(self, output): diff --git a/tests/test-2_all.deb b/tests/test-2_all.deb Binary files differ new file mode 100644 index 0000000..2ed2886 --- /dev/null +++ b/tests/test-2_all.deb diff --git a/tests/tests.yml b/tests/tests.yml index 54e79cd..a0758f0 100644 --- a/tests/tests.yml +++ b/tests/tests.yml @@ -29,6 +29,13 @@ cd test-1.23 ar p ../$1 data.tar.gz | tar -zx +- name: .deb with LZMA compression + filenames: test-2_all.deb + baseline: | + mkdir test-2 + cd test-2 + ar p ../$1 data.tar.lzma | lzcat | tar -x + - name: basic .gem filenames: test-1.23.gem baseline: | @@ -339,6 +346,12 @@ cd test-onefile tar -zxf ../$1 +- name: prompt wording with one file + options: "" + filenames: test-onefile.tar.gz + input: i + grep: file _I_nside + - name: one file extracted with rename, with Expected text options: "" filenames: test-onefile.tar.gz @@ -358,7 +371,7 @@ - name: bomb with preceding dot in the table filenames: test-dot-first-bomb.tar.gz options: "" - antigrep: one entry + antigrep: one baseline: | mkdir test-dot-first-bomb cd test-dot-first-bomb @@ -512,6 +525,22 @@ grep: "^1/2/3$" antigrep: "^dtrx:" +- name: listing multiple file with misleading extensions + options: -l + filenames: trickery.tar.gz trickery.tar.gz + prerun: cp ${1}test-1.23.zip ${1}trickery.tar.gz + cleanup: rm -f ${1}trickery.tar.gz + output: | + trickery.tar.gz: + 1/2/3 + a/b + foobar + + trickery.tar.gz: + 1/2/3 + a/b + foobar + - name: non-archive error filenames: /dev/null error: true @@ -606,7 +635,6 @@ directory: busydir filenames: ../test-onedir.tar.gz output: | - ../test-onedir.tar.gz: test/ test/foobar test/quux
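
The NEWS entry for 6.5 mentions more reliable terminal size detection and improved word wrapping on prompts. The two pieces can be sketched as standalone functions, as below. This is a minimal illustration only, ported to Python 3 and not the code from the commit: the names `terminal_width` and `wrap_question` are hypothetical, and the sketch assumes a POSIX system (where `fcntl` and `termios` exist) and templates in which every `%s` stands alone as its own word.

```python
import fcntl
import struct
import sys
import termios


def terminal_width(default=80):
    """Return the terminal's column count, or `default` when it is unknown.

    Hypothetical helper; mirrors the TIOCGWINSZ query shown in the diff.
    """
    try:
        # TIOCGWINSZ fills a struct winsize: rows, columns, x pixels, y pixels.
        size = fcntl.ioctl(sys.stdout.fileno(), termios.TIOCGWINSZ,
                           struct.pack("HHHH", 0, 0, 0, 0))
    except (OSError, ValueError):
        # stdout is not a terminal (pipe, file, captured or closed stream).
        return default
    columns = struct.unpack("HHHH", size)[1]
    # Some pseudo-terminals report 0 columns; fall back in that case too.
    return columns or default


def wrap_question(question, width, *args):
    """Greedily wrap `question` to `width` columns.

    Each standalone '%s' word in the template is replaced by one argument
    AFTER splitting, so an argument containing spaces (such as a filename)
    is never broken across lines.
    """
    words = question.split()
    for arg in args:
        words[words.index('%s')] = str(arg)
    lines = [words.pop(0)]
    for word in words:
        candidate = '%s %s' % (lines[-1], word)
        if len(candidate) > width:
            lines.append(word)       # start a new output line
        else:
            lines[-1] = candidate    # word still fits on the current line
    return lines


if __name__ == "__main__":
    width = terminal_width() - 1
    template = "%s contains one %s but its name doesn't match."
    for line in wrap_question(template, width, "a.tar.gz", "file"):
        print(line)
```

Substituting arguments after the split is the design point: a long filename may overflow a line on its own, but it is never split in the middle, which keeps it easy to copy from the prompt.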