
I have 100,000 URLs of small files to download. I would like to use 10 threads, and pipelining is a must. I concatenate the results into one file. My current approach is:

cat URLS | xargs -P5 -- curl >> OUTPUT

Is there a better option that will show the progress of the whole operation? It must work from the command line.

William Entriken
  • "Would like to use 10 threads and pipelining is a must. I concatenate the result to one file." So the order doesn't matter? – Bobby Aug 16 '13 at 13:21
  • Use [GNU parallel](http://www.gnu.org/software/parallel/), it will even keep the order of the output. If you tag your question accordingly, you might be lucky and [the author](http://superuser.com/users/41337/ole-tange) might chime in ;-) – Adrian Frühwirth Aug 16 '13 at 14:37
  • Order is not an issue. Tagged with gnu-parallel, good idea. Is it possible to use parallel and still get the pipelining in curl? – William Entriken Aug 16 '13 at 15:45
  • Don't you get the files intermingled when you do that? Unless your webserver is single-threaded, I don't see how you would avoid having two processes writing simultaneously to your output file. – rici Aug 16 '13 at 16:30
  • Mangling, jumbling are all not a problem for me. – William Entriken Aug 16 '13 at 20:18

1 Answer

cat URLS | parallel -k -P10 curl >> OUTPUT

or if progress is more important:

cat URLS | parallel -k -P10 --eta curl >> OUTPUT

or:

cat URLS | parallel -k -P10 --progress curl >> OUTPUT
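On the pipelining question from the comments: curl can reuse one connection for several URLs to the same host when they are given on a single command line, so handing each curl process a batch of URLs may get close to what is wanted. The following is only a sketch under that assumption. The sample URLs, the batch sizes, and the `urls_file` and `batches` variables are made up for illustration, and the dry run uses `echo` so that nothing is downloaded.

```shell
# Hypothetical sample list; the real input would be the URLS file from the question.
urls_file=$(mktemp)
printf 'http://example.com/%s\n' a b c d e > "$urls_file"

# Dry run: "echo" prints each curl command line instead of running it.
# -n 2 puts at most 2 URLs on each command line (use a larger value in practice).
batches=$(xargs -n 2 echo curl -s < "$urls_file")
printf '%s\n' "$batches"

# Real run (assumes curl is installed): 10 processes, up to 100 URLs each.
# xargs -P 10 -n 100 curl -s < "$urls_file" >> OUTPUT
# GNU parallel equivalent, with a progress display:
# parallel -j 10 -n 100 --progress curl -s < "$urls_file" >> OUTPUT

rm -f "$urls_file"
```

With batching, the progress display counts batches rather than individual URLs, so a smaller `-n` value gives finer-grained progress at the cost of more curl processes.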

The 10-second installation will try to do a full installation; if that fails, a personal installation; if that fails, a minimal installation.

wget -O - pi.dk/3 | sh

Watch the intro video for a quick introduction: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Ole Tange
  • I had tried this installer `wget -O - pi.dk/3 | sh` but seem to have gotten some lame excuse for parallel that really does nothing: `parallel [OPTIONS] command -- arguments / for each argument, run command with argument, in parallel` – William Entriken Aug 18 '13 at 14:55
  • Ah, I had to uninstall moreutils first. `apt-get remove moreutils` – William Entriken Aug 18 '13 at 15:01
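
The moreutils clash described in these comments can be checked before running anything. The sketch below assumes that GNU parallel answers `--version` with a line starting with "GNU parallel" and that any other `parallel` on the PATH does not; the function name `which_parallel` is made up for illustration.

```shell
# Report which "parallel" is first on PATH.
# Assumption: only GNU parallel prints a --version line starting "GNU parallel".
which_parallel() {
  if ! command -v parallel >/dev/null 2>&1; then
    echo "not installed"
  elif parallel --version 2>/dev/null </dev/null | grep -q '^GNU parallel'; then
    echo "GNU parallel"
  else
    echo "other (possibly moreutils)"
  fi
}

which_parallel
```

If this reports something other than GNU parallel, removing the conflicting package as in the comment above, or calling GNU parallel by its full path, should avoid the confusion.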