25

Why does this not work?

ls *.txt | xargs cat > all.txt

(I want to join the contents of all text files into a single 'all.txt' file.) find with -exec should also work, but I would really like to understand the xargs syntax.

Thanks

ajo
  • 1
    Though [don't use `ls` for this](https://mywiki.wooledge.org/ParsingLs). If you really can't use `cat *.txt >all.txt` then try `printf '%s\0' *.txt | xargs -r0 cat >all` and then `mv all all.txt` to avoid having the file referencing itself. – tripleee Jul 24 '19 at 05:52

4 Answers

30

ls *.txt | xargs cat >> all.txt

might work a bit better, since >> appends to all.txt instead of truncating it first. (The redirection belongs to xargs as a whole, so the shell opens all.txt once, not once per file.)

By the way, cat *.txt >all.txt would also work. :-)

Janne Pikkarainen
  • 7
    The cat *.txt >all.txt is naturally better. Thanks – ajo Sep 28 '10 at 11:11
  • 1
    However, ... | xargs cat >> all.txt (or > all.txt) always returns the error xargs: unmatched single quote ... Is it because xargs takes everything after it as the command? – ajo Sep 28 '10 at 11:12
  • 1
    Do you have filenames with spaces? If so, then use something like "find /your/path -iname '*.txt' -print0 | xargs -0 cat >>all.txt" instead – Janne Pikkarainen Sep 28 '10 at 11:17
  • 1
    no, I replaced all the filename spaces with _. But thinking of it, some filenames are likely to include single quotes as in listing_O'Connor_.txt, this might be the problem! – ajo Sep 28 '10 at 11:29
  • Yes, that's the problem then. :) The easiest and sanest way is to use find with -print0 combined with xargs -0. The whole chain then uses the NUL character as the separator, so whitespace and special characters are taken care of automatically. – Janne Pikkarainen Sep 28 '10 at 11:37
  • Indeed: After removing the single-quotes in some filenames via "s/'/_/g" *.txt, the command works OK!! But could it be done from within xargs via some option??? – ajo Sep 28 '10 at 11:40
  • OK, find isn't bad either... I remember having similar problems when the file names contained spaces!! – ajo Sep 28 '10 at 11:42
  • find is much better than ls in case you need to recurse into subdirectories. – Janne Pikkarainen Sep 28 '10 at 11:44
  • for the latter: what about ls -R? (apart from the directory line?!) – ajo Sep 28 '10 at 11:51
  • ls -R may be fine as human-readable output, but not so much if you need to process it with xargs or other tools. ls -R does not print the full path with every filename, whereas find or tree will. That makes scripting a lot easier. When scripting or piping stuff, please get rid of ls and use more advanced tools :-) – Janne Pikkarainen Sep 28 '10 at 11:54
  • This is a potentially very dangerous command. If all.txt already exists, it is matched by *.txt, so cat may read from the file it is appending to, and the file can grow until it fills all available disk space. – Dan Loewenherz Aug 28 '14 at 15:43
3

If some of your file names contain ', " or a space, xargs will fail because of the separator problem: by default it splits its input on whitespace and treats quotes specially.

In general never run xargs without -0 as it will come back and bite you some day.
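A small sketch of the failure and the fix, assuming an xargs that supports -0 (GNU and BSD both do) and a scratch directory from mktemp. The file names are made up for illustration:

```shell
# Scratch directory with one awkward (hypothetical) file name.
dir=$(mktemp -d)
printf 'one\n' > "$dir/listing_O'Connor_.txt"
printf 'two\n' > "$dir/plain.txt"

# Default parsing: xargs treats ' as the start of a quoted string,
# so the unbalanced quote makes it give up with an error.
if printf '%s\n' "$dir"/*.txt | xargs cat > /dev/null 2>&1; then
    plain_status=ok
else
    plain_status=failed
fi

# NUL-delimited names are passed to cat untouched.
joined=$(printf '%s\0' "$dir"/*.txt | xargs -0 cat)

echo "default parsing: $plain_status"
printf '%s\n' "$joined"
```

The glob is expanded by the shell and handed to the printf builtin, so ls is not needed to produce the list.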

Consider using GNU Parallel instead:

ls *.txt | parallel cat > tmp/all.txt

or if you prefer:

ls *.txt | parallel cat >> tmp/all.txt

Learn more about GNU Parallel http://www.youtube.com/watch?v=OpaiGYxkSuQ

Ole Tange
1

all.txt is a file in the same directory and is matched by *.txt, so cat gets confused when it is asked to read from the same file it is writing to.

On the other hand:

ls *.txt | xargs cat > tmp/all.txt

This reads the text files in your current directory and writes them to all.txt in the tmp subdirectory (which must already exist), where *.txt does not match it.
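If the result has to end up next to the inputs, one option (suggested in a comment on the question) is to write to a name that *.txt does not match and rename it afterwards. A sketch with hypothetical file names in a scratch directory, assuming an xargs that supports -0:

```shell
# Two made-up input files in a scratch directory.
dir=$(mktemp -d)
printf 'a\n' > "$dir/a.txt"
printf 'b\n' > "$dir/b.txt"

# "all" has no .txt suffix, so the glob cannot pick it up as an input.
printf '%s\0' "$dir"/*.txt | xargs -0 cat > "$dir/all"

# Rename only after cat has finished reading every input.
mv "$dir/all" "$dir/all.txt"

result=$(cat "$dir/all.txt")
printf '%s\n' "$result"
```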

Jeremy Smyth
0

You could also come across a command line length limitation. Part of the reason for using xargs is that it splits up the input into safe command-line-sized chunks. So, imagine a situation in which you have hundreds of thousands of .txt files in the directory. ls *.txt will fail with an "Argument list too long" error, because the shell expands the glob into more arguments than a single command can accept. You would need to do

ls | grep '.txt$' | xargs cat > /some/other/path/all.txt

.txt$ in this case is a regular expression matching everything that ends in .txt (so it's not exactly like *.txt, since if you have a file called atxt, then *.txt would not match it, but the regular expression would.)

The use of another path is because, as other answers have pointed out, all.txt is matched by the pattern *.txt so there would be a conflict between input and output.
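The chunking can be seen on a small scale by forcing a tiny batch size with -n. This is only an illustration with made-up names: in real use xargs derives the batch size from the system's argument length limit, and because the redirection is attached to xargs itself, the output of every batch still lands in the same file.

```shell
# Six made-up names, at most two per invocation of echo.
batches=$(printf '%s\n' f1.txt f2.txt f3.txt f4.txt f5.txt f6.txt | xargs -n 2 echo)
printf '%s\n' "$batches"

# Each output line corresponds to one separate run of the command.
count=$(printf '%s\n' "$batches" | wc -l | tr -d ' ')
echo "invocations: $count"
```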

Note that if you have any files with ' in their names (and this may be the cause of the unmatched single quote error), you would want to do

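printf '%s\0' * | grep -z '.txt$' | xargs -0 cat > /some/other/path/all.txt

Here printf (a shell builtin in bash, so not subject to the command line length limit) writes each file name followed by a \0 (aka NUL) character. The -z (--null-data) option tells GNU grep to read and write records terminated by \0 instead of the default newline, and the -0 option tells xargs to expect its input in the same format. This works even if you have file names with newlines in them. (Piping ls into grep --null does not do this: ls still separates names with newlines, and --null only changes the byte grep prints after a file name in its own output.)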

Brian Minton