39

I just untar'd an archive that produced a mess of files into my tidy directory. For example:

user@comp:~/tidy$ tar xvf myarchive.tar
file1
file2
dir1/
dir1/file1
dir1/subdir1/
dir1/subdir1/file1
dir2/
dir2/file1
...

I was expecting that the tar file would have been organized in a single folder (i.e., myarchive/), but it wasn't! Now I have some 190 files and directories that have digitally barfed in what was an organized directory. These untar'd files need to be cleaned up.

Is there any way to "undo" this and delete the files and directories that were extracted from this archive?


Thanks for the excellent answers below. In summary, here is what works, in two steps: (1) delete the files, and (2) delete the empty directory structure in reverse packing order (so that inner directories are deleted before the directories that contain them):

tar tf myarchive.tar | xargs -d'\n' rm
tar tf myarchive.tar | tac | xargs -d'\n' rmdir

Safer yet, preview the commands as a dry run first by putting echo right after the xargs options, in front of rm or rmdir.
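A minimal sketch of such a dry run, assuming GNU tar, xargs and tac. The scratch directory and the small sample archive are made up for the demo; only the last two commands matter:

```shell
# Self-contained demo in a scratch directory (all names are hypothetical).
work=$(mktemp -d)
cd "$work" || exit 1

# Build a small archive with no top-level folder, then drop the sources.
mkdir -p src/dir1/subdir1
touch src/file1 src/dir1/file1 src/dir1/subdir1/file1
tar cf myarchive.tar -C src file1 dir1
rm -r src

# Extract it, as in the question.
tar xf myarchive.tar

# Dry run: echo prints the rm and rmdir command lines instead of running them.
tar tf myarchive.tar | xargs -d'\n' echo rm
tar tf myarchive.tar | tac | xargs -d'\n' echo rmdir
```

Nothing is deleted here. Once the printed command lines look right, drop the echo.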

Mike T
  • I guess you could list the files in the archive and delete them from the current directory, but that feels potentially data destructive (data you want to keep). I also have no idea how to write a bash script, so I can't help there. – Bob May 15 '12 at 12:03
  • Fortunately, nothing was overwritten! – Mike T May 15 '12 at 12:04
  • I'm not after the rep and I'm afraid I will sound cranky no matter how I put this, which I'm not (I liked slhck's answer as well and I +1:ed it, and honestly: ±15 rep is _not_ my world), but you end up using my suggested answer with pipes and `xargs` (`tac` instead of `sort -r` is just cosmetics), but you accept the answer with process substitution that, as you explained in the comments, did not fit you? Also, please give the `xargs -d'\n'` switch in your post if you want to summarize for future users, so they won't get bitten by spaces in file names. – Daniel Andersson May 16 '12 at 06:07
  • @DanielAndersson, I never understood the necessity of `-d'\n'` until now, and upon further analysis your answer is actually closer to what I used. – Mike T May 16 '12 at 07:17
  • Totally fine with that too, liked @Daniel's solution :) The necessity of `-d'\n'` lies in the fact that if you don't tell `xargs` to split arguments on new lines (which is what you're feeding) but on spaces, then a file with the name `folder1/some file` will be read as `folder1/some` and `name`. – slhck May 16 '12 at 07:58

5 Answers

44
tar tf archive.tar

will list the contents line by line.

This can be piped to xargs directly, but beware: do the deletion very carefully. You don't want to just rm -r everything that tar tf tells you, since it might include directories that were not empty before unpacking!

You could do

tar tf archive.tar | xargs -d'\n' rm -v
tar tf archive.tar | sort -r | xargs -d'\n' rmdir -v

to first remove all files that were in the archive, and then the directories that are left empty.

sort -r is needed to delete the deepest directories first. Otherwise, if dir1 contains only an empty directory dir2, the rmdir pass will leave dir1 behind, since it was not yet empty when rmdir tried it. (glennjackman suggested tac instead of sort -r in the comments to the accepted answer, which also works since tar lists each directory before its contents.)

This will generate a lot of

rm: cannot remove `dir/': Is a directory

and

rmdir: failed to remove `dir/': Directory not empty
rmdir: failed to remove `file': Not a directory

Shut this up with 2>/dev/null if it annoys you, but I'd prefer to keep as much information on the process as possible.

And don't do it until you are sure that you match the right files. And perhaps try rm -i to confirm everything. And have backups, eat your breakfast, brush your teeth, etc.
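To see why the two passes are safe for directories that were not empty before unpacking, here is a rough, self-contained sketch assuming GNU tar and xargs. The scratch directory, the sample archive and keep.txt are invented for the demo:

```shell
# Self-contained demo in a scratch directory (all names are hypothetical).
work=$(mktemp -d)
cd "$work" || exit 1

# Build an archive with no top-level folder.
mkdir -p src/dir1/subdir1 src/dir2
touch src/file1 src/dir1/file1 src/dir1/subdir1/file1 src/dir2/file1
tar cf archive.tar -C src file1 dir1 dir2
rm -r src

# A directory that existed before unpacking and holds an unrelated file.
mkdir dir2
touch dir2/keep.txt

tar xf archive.tar

# Pass 1: remove files. Errors for directory entries are expected, so they
# are hidden here, and "|| true" ignores the failure status xargs reports.
tar tf archive.tar | xargs -d'\n' rm -v 2>/dev/null || true
# Pass 2: remove directories, deepest first; non-empty ones are left alone.
tar tf archive.tar | sort -r | xargs -d'\n' rmdir -v 2>/dev/null || true

# Show what is left.
ls -R
```

In this setup dir1 and everything under it goes away, while dir2 stays because keep.txt was never part of the archive.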

Daniel Andersson
13

List the contents of the tar file like so:

tar tzf myarchive.tar.gz

Then, delete those file names by iterating over that list:

while IFS= read -r file; do echo "$file"; done < <(tar tzf myarchive.tar.gz)

This will still just list the files that would be deleted. Replace echo with rm if you're really sure these are the ones you want to remove. And maybe make a backup to be sure.

In a second pass, remove the directories that are left over:

while IFS= read -r file; do rmdir "$file"; done < <(tar tzf myarchive.tar.gz)

Since rmdir only removes empty directories, this prevents directories that already existed before (and still contain other files) from being deleted.
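As a variation (my own sketch, not something from the answer as posted), both passes can be folded into one loop by reading the listing in reverse, so that the contents of a directory are handled before the directory itself. This assumes GNU tar and tac; the scratch directory and sample archive are made up:

```shell
# Demo setup in a scratch directory (all names are hypothetical).
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p src/dir1/subdir1
touch src/file1 "src/dir1/some file" src/dir1/subdir1/file1
tar czf myarchive.tar.gz -C src file1 dir1
rm -r src
tar xzf myarchive.tar.gz

# Walk the listing in reverse so that entries inside a directory come before
# the directory itself, then pick rm or rmdir per entry.
tar tzf myarchive.tar.gz | tac | while IFS= read -r entry; do
    if [ -d "$entry" ] && [ ! -L "$entry" ]; then
        # rmdir refuses non-empty directories, so unrelated files survive.
        rmdir -- "$entry" 2>/dev/null || echo "kept: $entry"
    elif [ -e "$entry" ] || [ -L "$entry" ]; then
        rm -- "$entry"
    fi
done
```

Reading line by line with IFS= and read -r keeps names with spaces intact, as in the loops above. To preview first, put echo in front of rm and rmdir.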


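Another nice trick by @glennjackman, which reverses the listing so that the deepest entries come first. Note that it needs the plain tar tf listing; the verbose tar tvf output would also hand permissions, sizes and dates to rm. Again, remove echo when done.

tar tf myarchive.tar | tac | xargs -d'\n' echo rm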

This could then be followed by the normal rmdir cleanup.

slhck
  • Strange way to write a pipe. – Stéphane Gimenez May 15 '12 at 12:14
  • It's *not* a pipe. It's [process substitution](http://mywiki.wooledge.org/ProcessSubstitution) and I prefer this over simple piping when used in combination with `while` to loop over a set of records. Just got used to it. @sté – slhck May 15 '12 at 12:20
  • These remove the files in the current directory, but then I get a large list of "rm: cannot remove `dir1/file1': Permission denied" errors. I think they need to be deleted in the *reverse* order that they were extracted, so the files are removed before the directories. – Mike T May 15 '12 at 12:21
  • If a directory is removed with `rm -rf`, the files inside are already deleted. Are you sure you still have actual permissions to delete those files? @mik – slhck May 15 '12 at 12:22
  • Ah yes, I needed to `chmod` those directories in order to remove them. This archive file was pretty messed up .. also it had an extension .tar.zip .. thanks! – Mike T May 15 '12 at 12:26
  • Sorry for the little delay, I noticed that using `rm -rf` could delete files that were not from the archive but inside a directory that has the same name as one from the archive. Better be careful here and use `rmdir` in a second pass. – Stéphane Gimenez May 15 '12 at 12:36
  • Actually the second pass with `rmdir` needs to be run for each level of nesting of directories. So it will clean out `subdir1` on the first pass, but leave `dir1` since it tried to delete this first when it wasn't empty at the time. This command could be done once if the file list can be reverse sorted. – Mike T May 15 '12 at 12:49
  • Tricky problem indeed, @Mike. I don't know if there's an easy solution to do this other than trying to sort the paths by their nesting level, then remove the files, and *then* start with removing the innermost directories. – slhck May 15 '12 at 12:56
  • If you want to delete them in the reverse order: `tar tf arch.tar | tac | xargs echo rm` (remove the echo when you're confident) – glenn jackman May 15 '12 at 13:30
  • @glennjackman That will only reverse the order of the listing, but the problem is reversing the order *by depth*. So basically, you'd need to delete the files in a backwards breadth-first order. – slhck May 15 '12 at 13:35
  • @slhck, based on the example in the question, same thing. – glenn jackman May 15 '12 at 13:36
  • @glennjackman Ah, correct, didn't see the updated example. Added your line to the answer so it's easier to see. Seems like a great trick. – slhck May 15 '12 at 13:43
  • the order from `tar tf` preserves depth, so `tac` is working correctly – Mike T May 15 '12 at 13:43
2

Here's a possibility that will take the extracted files and move them to a subdirectory, cleaning up your main folder.

    #!/usr/bin/perl -w

    use strict;
    use Getopt::Long;

    my $clean_folder = "clean";
    my $DRY_RUN;
    die "Usage: $0 [--dry] [--clean=dir-name]\n"
        if ( !GetOptions("dry!" => \$DRY_RUN,
                         "clean=s" => \$clean_folder));

    # Protect the 'clean_folder' string from shell substitution
    $clean_folder =~ s/'/'\\''/g;

    # Process the "tar tv" listing and output a shell script.
    print "#!/bin/sh\n" if ( !$DRY_RUN );
    while (<>)
    {
        chomp;

        # Strip out permissions string and the directory entry from the 'tar' list
        my $perms = substr($_, 0, 10);
        my $dirent = substr($_, 48);

        # Drop entries that are in subdirectories
        next if ( $dirent =~ m:/.: );

        # If we're in "dry run" mode, just list the permissions and the directory
        # entries.
        #
        if ( $DRY_RUN )
        {
            print "$perms|$dirent\n";
            next;
        }

        # Emit the shell code to clean up the folder
        $dirent =~ s/'/'\\''/g;
        print "mv -i '$dirent' '$clean_folder'/.\n";
    }

Save this to the file fix-tar.pl and then execute it like this:

$ tar tvf myarchive.tar | perl fix-tar.pl --dry

This will confirm that your tar list is like mine. You should get output like:

-rw-rw-r--|batch
-rw-rw-r--|book-report.png
-rwx------|CaseReports.png
-rw-rw-r--|caseTree.png
-rw-rw-r--|tree.png
drwxrwxr-x|sample/

If that looks good, then run it again like this:

$ mkdir cleanup
$ tar tvf myarchive.tar | perl fix-tar.pl --clean=cleanup > fixup.sh

The fixup.sh script will be the shell commands that will move the top-level files and directories into a "clean" folder (in this instance, the folder called cleanup). Have a peek through this script to confirm that it's all kosher. If it is, you can now clean up your mess with:

$ sh fixup.sh

I prefer this kind of cleanup because it doesn't destroy anything that isn't already destroyed by being overwritten by that initial tar xv.

Note: if that initial dry run output doesn't look right, you should be able to fiddle with the numbers in the two substr function calls until they look proper. The $perms variable is used only for the dry run so really only the $dirent substring needs to be proper.

One other thing: you may need to use the tar option --numeric-owner if the user names and/or group names in the tar listing make the names start in an unpredictable column.
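If the column positions are too fiddly, a rough alternative (my own sketch, not the script above) is to take the names from the plain tar tf listing, which has no columns to parse. It assumes GNU tar and mv; the sample archive and the cleanup folder name are made up:

```shell
# Demo setup in a scratch directory (all names are hypothetical).
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p src/sample
touch src/batch src/tree.png src/sample/inner.txt
tar cf myarchive.tar -C src batch tree.png sample
rm -r src
tar xf myarchive.tar

mkdir cleanup
# Keep only the first path component of each entry, de-duplicate the names,
# and move each top-level file or directory into the cleanup folder.
tar tf myarchive.tar | cut -d/ -f1 | sort -u | while IFS= read -r entry; do
    if [ "$entry" != . ] && [ -e "$entry" ]; then
        # -n: never overwrite something that is already in cleanup/
        mv -n -- "$entry" cleanup/
    fi
done
```

mv -n is used instead of mv -i here because an interactive prompt inside the loop would read its answer from the piped listing. Replacing mv with echo mv gives a preview.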

S2VpdGgA
2

That kind of (antisocial) archive is called a tar bomb because of what it does. Once one of these "explodes" on you, the solutions in the other answers are way better than what I would have suggested.

The best "solution", however, is to prevent the problem in the first place.

The easiest (laziest) way to do that is to always unpack a tar archive into an empty directory. If it includes a top level directory, then you just move that to the desired destination. If not, then just rename your working directory (the one that was empty) and move that to the desired location.

If you just want to get it right the first time, you can run tar -tvf archive-file.tar | less and it will list the contents of the archive so you can see how it is structured and then do what is necessary to extract it to the desired location to start with.

The t option also comes in handy if you want to inspect the contents of an archive just to see if it has something you're looking for in it. If it does, you can, optionally, just extract the file(s) you want.
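The inspect-first habit can be sketched roughly like this, assuming GNU tar. The scratch directory, the sample archive and the directory name extracted are all made up for the demo:

```shell
# Demo setup in a scratch directory (all names are hypothetical).
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p src/dir1
touch src/file1 src/file2 src/dir1/file1
tar cf archive-file.tar -C src file1 file2 dir1
rm -r src

# Count the distinct top-level names in the listing.
top_count=$(tar tf archive-file.tar | cut -d/ -f1 | sort -u | wc -l)
echo "top-level entries: $top_count"

if [ "$top_count" -gt 1 ]; then
    # Looks like a tar bomb: unpack into a fresh, empty directory instead.
    mkdir extracted
    tar xf archive-file.tar -C extracted
else
    tar xf archive-file.tar
fi
```

The count is only a heuristic: a single top-level entry could still be a lone file rather than a folder, so a quick look at the listing is still worthwhile.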

Joe
0
tar -tf fly.tar | cut -d"/" -f1 | uniq | xargs rm -rf
ZygD