23

Why does redirecting the output of a file to itself produce a blank file?

Stated in Bash, why do

less foo.txt > foo.txt

and

fold foo.txt > foo.txt

produce an empty foo.txt? Since an append such as less eggs.py >> eggs.py produces two copies of the text in eggs.py, one might expect that an overwrite would produce one copy of the text.

Note: I'm not saying this is a bug; it more likely points to something deep about Unix.

slhck
seewalker

3 Answers

23

When you use >, the shell opens the file in truncation mode before it starts the command, so the file's contents are removed before the command attempts to read it.
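A minimal sketch of this, run in a scratch directory created with mktemp so that no real file is harmed. The names demo_dir, size_before and size_after are illustrative and not part of the question.

```shell
# Throwaway directory and sample file (illustrative setup).
demo_dir=$(mktemp -d)
printf 'some text\n' > "$demo_dir/foo.txt"

size_before=$(wc -c < "$demo_dir/foo.txt")

# The shell opens foo.txt for writing, which truncates it, *before* it starts
# fold. fold therefore reads an already-empty file and writes nothing back.
fold "$demo_dir/foo.txt" > "$demo_dir/foo.txt"

size_after=$(wc -c < "$demo_dir/foo.txt")
echo "before: $((size_before)) bytes, after: $((size_after)) bytes"
```

The second byte count is zero: by the time fold opens the file for reading, there is nothing left to read.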

When you use >>, the file is opened in append mode, so the existing data is preserved. It is, however, still risky to use the same file as input and output in this case. If the file is too large to fit in the command's input buffer, the command can end up reading the data it has just appended, and the file might grow indefinitely until the file system is full (or your disk quota is reached).
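For contrast, a sketch of the append case on a deliberately tiny file, again in a scratch directory (demo_dir and line_count are illustrative names). It assumes a typical fold that buffers its output, so the whole one-line file is read before anything is written back. With a large file the outcome can be very different, so this is not something to try on real data.

```shell
# Throwaway directory and a one-line sample file (illustrative setup).
demo_dir=$(mktemp -d)
printf 'print("eggs")\n' > "$demo_dir/eggs.py"

# >> opens eggs.py in append mode, so the existing line survives
# and fold can still read it.
fold "$demo_dir/eggs.py" >> "$demo_dir/eggs.py"

# Count the lines: the original one plus the copy that fold appended.
line_count=$(wc -l < "$demo_dir/eggs.py")
echo "eggs.py now has $((line_count)) lines"
```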

Should you want to use a file both as input and output with a command that doesn't support in place modification, you can use a couple of workarounds:

  • Use an intermediary file and overwrite the original one when done, and only if no error occurred while running the utility (this is the safest and most common way).

    fold foo.txt > fold.txt.$$ && mv fold.txt.$$ foo.txt
    
  • Avoid the intermediary file, at the risk of partial or complete data loss should an error or interruption occur. In this example, foo.txt is opened as the standard input of a subshell (the commands inside the parentheses) before the file is deleted. The old inode stays alive because the subshell keeps it open while the data is read. The file written by the inner utility (here fold) has the same name (foo.txt) but points to a different inode, because the old directory entry has been removed; technically, there are two different "files" with the same name during the process. When the subshell ends, the old inode is released and its data is lost. Make sure you have enough space to store both the old file and the new one at the same time, otherwise you'll lose data.

    (rm foo.txt; fold > foo.txt) < foo.txt
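The claim that the two foo.txt files are different inodes can be checked with ls -i. This is a sketch in a scratch directory on a typical local file system; demo_dir, old_inode and new_inode are illustrative names, and fold -w 10 is used only so that the output visibly differs from the input.

```shell
# Throwaway directory and sample file (illustrative setup).
demo_dir=$(mktemp -d)
printf 'a line that is longer than ten columns\n' > "$demo_dir/foo.txt"

# Inode number of the original file (first field of `ls -i`).
old_inode=$(ls -i "$demo_dir/foo.txt" | awk '{print $1}')

# Same trick as above: stdin keeps the old inode open while rm removes the
# name and fold writes a brand-new file under that name.
(rm "$demo_dir/foo.txt"; fold -w 10 > "$demo_dir/foo.txt") < "$demo_dir/foo.txt"

new_inode=$(ls -i "$demo_dir/foo.txt" | awk '{print $1}')
echo "old inode: $old_inode, new inode: $new_inode"
cat "$demo_dir/foo.txt"
```

The two numbers differ because the old inode is still held open when the new file is created, so its number cannot be reused at that point.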
    
jlliagre
  • `sponge` from [moreutils](http://joeyh.name/code/moreutils/) can also help. `fold foo.txt | sponge foo.txt` – or `fold foo.txt | sponge !$` should also do. – slhck May 19 '13 at 11:29
  • @slhck Indeed, sponge could do the job too. However, being neither specified by POSIX nor mainstream in Unix like OSes, it is unlikely to be present. – jlliagre May 19 '13 at 20:17
  • It's not like it can't be *made* present though ;) – slhck May 19 '13 at 21:44
8

The file is opened for writing by the shell before the application has a chance to read it. Opening the file for writing truncates it.

Ignacio Vazquez-Abrams
1

In Bash, the redirection ... > foo.txt empties foo.txt before the command on its left-hand side runs.

One might use command substitution and print its result as a workaround. This solution takes fewer additional characters than those in the other answers:

printf '%s\n' "$(less foo.txt)" > foo.txt

Beware: this command does not preserve any trailing newline(s) in foo.txt. See the comments below for more information.

Here, the command substitution $(...) is evaluated before the stream redirection operator >, hence the preservation of information.
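The comments on this answer suggest a way around the lost trailing newlines: append a sentinel character inside the command substitution and strip it afterwards. A sketch of both variants follows, using fold in place of less and a scratch directory; demo_dir, plain_bytes and sentinel_bytes are illustrative names.

```shell
# Throwaway directory (illustrative setup).
demo_dir=$(mktemp -d)

# Plain variant: $(...) strips all trailing newlines and printf adds
# exactly one back, so the two extra blank lines are lost.
printf 'line\n\n\n' > "$demo_dir/foo.txt"
printf '%s\n' "$(fold "$demo_dir/foo.txt")" > "$demo_dir/foo.txt"
plain_bytes=$(wc -c < "$demo_dir/foo.txt")

# Sentinel variant: the trailing q protects the newlines from being stripped,
# and ${tmp%q} removes the q again before the file is written.
printf 'line\n\n\n' > "$demo_dir/foo.txt"
tmp=$(fold "$demo_dir/foo.txt"; printf q)
printf '%s' "${tmp%q}" > "$demo_dir/foo.txt"
sentinel_bytes=$(wc -c < "$demo_dir/foo.txt")

echo "input: 7 bytes, plain: $((plain_bytes)) bytes, sentinel: $((sentinel_bytes)) bytes"
```

Either way the whole output is held in a shell variable or argument, so this is only reasonable for text files of modest size; shells generally cannot carry NUL bytes through a command substitution.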

ljleb
  • @KamilMaciorowski: Actually, there is `tmp=$(cmd; printf q);  printf '%s' "${tmp%q}"`. But you missed another issue with this answer: it says “subshell” when it means “command substitution”.  Yes, command substitutions are generally subshells, but not vice versa, and subshells, in general, are no help for this problem. – Scott - Слава Україні May 29 '19 at 17:08
  • @KamilMaciorowski I feel so bad for missing all of this. Thanks for pointing all of this out. For your (4)th point: would backquotes do the trick, i.e. preserve trailing newline(s)? – ljleb May 30 '19 at 20:20
  • @Scott thanks for your reply. I changed "subshell" for "command substitution". By the way, I wonder what's the exact difference between the two. – ljleb May 30 '19 at 20:22
  • No, backquotes (backticks) strip trailing newline characters as well. – Kamil Maciorowski May 30 '19 at 20:23
  • Alright then, I added a warning message for now. I'll remove it if I find a solution. – ljleb May 30 '19 at 20:31
  • Well, now the answer is not that bad. Even with the warning there's one more problem: POSIX requires any non-empty text file to end with a newline character (otherwise the last line is incomplete). So `%s\n` as format would be better. But if the file is binary, `%s` may be better. In any case you're risking the new content is not exactly what it should be. Scott's approach can fix this; it's far from being elegant though. – Kamil Maciorowski May 30 '19 at 20:40