Linux command to concatenate a file to itself n times

Question

I've taken a plain-text book from Project Gutenberg (around 0.5 MB) that I want to concatenate to itself n times to generate a large text file for benchmarking some algorithms. Is there a Linux command I can use to achieve this? cat sounds ideal, but it doesn't seem to play nicely with concatenating a file onto itself, and it doesn't directly address the n-times part of the question.

Use some kind of loop with appending? So repeat cat foo.txt >> bar.txt, and wrap that up in something that will run the command that many times? – Journeyman Geek, Sep 22 '11 at 12:32

Journeyman Geek · Accepted Answer · 2011-09-22T13:04:10.380

42

There are two parts to this. First, use cat to write the text file to standard output and use the >> redirection to append that output to another file, e.g. cat foo.txt >> bar.txt will append foo.txt to bar.txt.

Then run it n times with:

for i in {1..n};do cat foo.txt >> bar.txt; done

replacing n in that command with your number.
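One caveat when the count lives in a variable: bash performs brace expansion before variable expansion, so `{1..$n}` does not expand to a sequence. A minimal sketch that uses seq instead; the scratch directory, the sample file contents and n=5 are assumptions made up for illustration:

```shell
# Hypothetical scratch setup so the sketch can run anywhere
work=$(mktemp -d)
printf 'line one\nline two\n' > "$work/foo.txt"

n=5                      # number of copies wanted (assumed value)
: > "$work/bar.txt"      # start from an empty output file

# seq sees the expanded value of $n, unlike {1..$n}
for i in $(seq 1 "$n"); do
    cat "$work/foo.txt" >> "$work/bar.txt"
done

wc -l < "$work/bar.txt"  # line count of the result: 2 lines x 5 copies
```

A C-style loop, for ((i = 0; i < n; i++)), is another option in bash if seq is not available.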

If you use csh, there's the 'repeat' command.

The repeat-related parts of the answer are copied from here, and I tested it on an Ubuntu 11.04 system with the default bash shell.

edited Sep 22 '11 at 13:04

answered Sep 22 '11 at 12:43

Journeyman Geek


Fun fact: this actually works without replacing 'n', in which case it'll execute the body once for each character between ASCII '1' and ASCII 'n' (so 62 times). But `{1..12}` will correctly run the body 12 times. – Arnout Engelen Mar 25 '16 at 20:25

You might want to just redirect the whole pipeline, rather than appending in each iteration: `for i in {1..n};do cat foo.txt; done > bar.txt` – Toby Speight Mar 02 '17 at 12:59
This doesn't give you the exponential growth size... :) – rogerdpack Jan 20 '22 at 20:04

Toby Speight · Answer 2 · 2021-07-01T17:30:11.833

5

You certainly can use cat for this:

$ cat /tmp/f
foo
$ cat /tmp/f /tmp/f
foo
foo

To get $n copies, you could use yes piped into head -n $n:

$ yes /tmp/f | head -n 10
/tmp/f
/tmp/f
/tmp/f
/tmp/f
/tmp/f
/tmp/f
/tmp/f
/tmp/f
/tmp/f
/tmp/f

Putting that together gives

yes /tmp/f | head -n $n | xargs cat >/tmp/output
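As a quick check of that pipeline, here is a sketch with a two-line sample file. The scratch directory, the file names and n=4 are made up for illustration:

```shell
# Hypothetical scratch setup
work=$(mktemp -d)
printf 'foo\nbar\n' > "$work/f"
n=4                                   # assumed number of copies

# yes repeats the file name, head keeps n of them, xargs hands them to cat
yes "$work/f" | head -n "$n" | xargs cat > "$work/output"

wc -l < "$work/output"                # line count: 2 lines x 4 copies
```

xargs splits its input on whitespace, so this assumes the path contains no spaces. For very large n, xargs may run cat more than once, but the combined output is the same.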

edited Jul 01 '21 at 17:30

answered Mar 02 '17 at 12:58

Toby Speight


phicr · Answer 3 · 2017-03-02T10:42:01.147

I am bored so here are a few more methods on how to concatenate a file to itself, mostly with head as a crutch. Pardon me if I overexplain myself, I just like saying things :P

Assuming N is the number of self concatenations you want to do and that your file is named file.

Variables:

linecount=$(<file wc -l)

total_repeats=$(echo "2^$N - 1" | bc) # obtained through the power of MATH

total_lines=$((linecount*(total_repeats+1)))

tmp=$(mktemp --suffix .concat.self)

Given a copy of file called file2, total_repeats is the number of times file would need to be added to file2 to make it the same as if file was concatenated to itself N times.

Said MATH is here, more or less: MATH (gist)

It's first-semester computer science stuff, but it's been a while since I did an induction proof, so I can't get over it... (also, this class of recursion is pretty well known to be 2^Loops, so there is that too...)
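Spelled out: each self-concatenation doubles the file, so after N rounds it holds 2^N copies of the original, which is the original plus 2^N - 1 appended copies. A small sketch that checks this with shell arithmetic instead of bc; N=3 and the copies counter are illustrative, not part of the methods below:

```shell
N=3                                  # assumed number of self-concatenations
total_repeats=$(( (1 << N) - 1 ))    # 2^N - 1, computed without bc

# Simulate N rounds of "append the file to itself" by counting copies
copies=1
i=0
while [ "$i" -lt "$N" ]; do
    copies=$((copies * 2))           # each round doubles the content
    i=$((i + 1))
done

echo "copies=$copies total_repeats=$total_repeats"
```

The simulated copy count always comes out one higher than total_repeats, which is why total_lines is computed from total_repeats+1.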

POSIX

I use a few non-posix things but they are not essential. For my purposes:

 yes() { while true; do echo "$1"; done; }

Oh, I only used that. Oh well, the section is already here...

Methods

head with linecount tracking.

ln=$linecount
for i in $(seq 1 $N); do
    <file head -n $ln >> file;
    ln=$((ln*2))
done

No temp file, no cat, not even too much math yet, all joy.

tee with MATH

<file tee -a file | head -n $total_lines > $tmp
cat $tmp > file

Here tee is reading from file but perpetually appending to it, so it will keep reading the file on repeat until head stops it. And we know when to stop it because of MATH. The appending goes overboard though, so I used a temp file. You could trim the excess lines from file too.

eval, the lord of darkness!

eval "cat $(yes file | head -n $((total_repeats+1)) | tr '\n' ' ')" > $tmp
cat $tmp > file

This just expands to cat file file file ... and evals it. You can do it without the $tmp file, too:

eval "cat $(yes file | head -n $total_repeats | tr '\n' ' ')" |
  head -n $((total_lines-linecount)) >> file

The second head "tricks" cat by putting a middle man between it and the write operation. You could trick cat with another cat as well but that has inconsistent behavior. Try this:

test_double_cat() {
    local Expected=0
    local Got=0
    local R=0
    local file="$(mktemp --suffix .double.cat)"
    for i in $(seq 1 100); do

        printf "" > $file
        echo "1" >> $file
        echo "2" >> $file
        echo "3" >> $file

        Expected=$((3*$(<"$file" wc -l)))

        cat $file $file | cat >> $file

        Got=$(<"$file" wc -l)

        [ "$Expected" = "$Got" ] && R="$((R+1))"
    done
    echo "Got it right $R/100"
    rm $file
}

sed:

<file tr '\n' '\0' |
    sed -e "s/.*/$(yes '\0' | head -n $total_repeats | tr -d '\n')/g" |
        tr '\0' '\n' >> file

Forces sed into reading the entire file as a line, captures all of it, then pastes it $total_repeats number of times.

This will fail, of course, if you have any null characters in your file. In that case, pick a character that you know isn't there.

find_missing_char() {
  local file="${1:-/dev/stdin}"

  firstbyte="$(<$file fold -w1 | od -An -tuC | sort -un | head -n 1)"
  if [ ! "$firstbyte" = "0" ]; then
    echo "\0"
  else
    printf "\\$(printf '%03o\t' $((firstbyte-1)) )"
  fi
}
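An alternative sketch for the same goal: scan for the lowest byte value that does not occur in the file at all. The function name find_unused_byte and the sample file are hypothetical; it prints the decimal byte value rather than the character itself:

```shell
# Print the decimal value of the lowest byte in 1..255 that never
# occurs in the given file; return 1 if every value is in use.
find_unused_byte() {
    # od -v dumps every byte as an unsigned decimal; one value per line after tr
    used=$(od -An -v -tu1 "$1" | tr -s ' ' '\n' | sort -un)
    b=1
    while [ "$b" -le 255 ]; do
        # grep -x matches whole lines only, so 1 does not match 10
        if ! printf '%s\n' "$used" | grep -qx "$b"; then
            echo "$b"
            return 0
        fi
        b=$((b + 1))
    done
    return 1
}

# Hypothetical sample containing the bytes 1 and 2 plus some text
sample=$(mktemp)
printf '\001\002hello\n' > "$sample"
unused=$(find_unused_byte "$sample")
echo "$unused"
rm "$sample"
```

The value can be turned back into a character with printf "\\$(printf '%03o' "$unused")" and used in place of '\0' in the tr calls.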

That's all for now lads, I hope this arbitrary answer didn't bother anyone. I tested all of them many times but I am only a two-year shell user so keep that in mind I guess. Now to sleep...

rm $tmp

score 0 · Answer 4 · answered Feb 09 '21 at 11:44

You might be able to use tee for this. tee -a x x will append the same lines twice to file x.

Now we need to write x $N times. We can do that with yes x|head -n $N, giving

<file tee -a $(yes outfile|head -n $N)

Demo:

$ cat foo
foo
bar
$ tee -a $(yes x|head -5) <foo >/dev/null
$ cat x
foo
bar
foo
bar
foo
bar
foo
bar
foo
bar
