33

I've been studying the command line and learned that `|` (pipeline) is meant to redirect the output of one command to the input of another. So why doesn't the command `ls | file` work?

The input of `file` is one or more filenames, like `file filename1 filename2`.

The output of `ls` is a list of the directories and files in a folder, so I thought `ls | file` was supposed to show the file type of every file in a folder.

When I use it however, the output is:

    Usage: file [-bcEhikLlNnprsvz0] [--apple] [--mime-encoding] [--mime-type]
        [-e testname] [-F separator] [-f namefile] [-m magicfiles] file ...
    file -C [-m magicfiles]
    file [--help]

As if there was some error in the usage of the `file` command.

Willem Van Onsem
IanC
  • 2
    If you are using plain `ls`, it indicates that you want all files in the current directory handled by the `file` command. ... So why not simply do `file *`, which will reply with a line for every file and folder. – Knud Larsen Jul 06 '16 at 16:07
  • `file *` is the smartest way, I was just wondering why using `ls` output was not working. Doubt cleared :) – IanC Jul 06 '16 at 16:24
  • 6
    The premise is flawed: "file input is one or more filenames, like file filename1 filename2". That isn't input. Those are command-line arguments, as @John Kugelman points out below. – Monty Harder Jul 06 '16 at 19:23
  • 3
    Tangentially, [parsing `ls`](http://mywiki.wooledge.org/ParsingLs) is generally a bad idea. – kojiro Jul 07 '16 at 10:41
  • GNU coreutils added `ls --zero` (i.e., end each output line with NUL and not newline "\n")(as of 2022-06-11). This avoids issues with special characters (including spaces) in filenames. See [ls source code](https://github.com/coreutils/coreutils/blob/master/src/ls.c) – user2514157 Jun 12 '22 at 01:46

8 Answers

70

The fundamental issue is that `file` expects file names as command-line arguments, not on stdin. When you write `ls | file`, the output of `ls` is passed to `file` as input. Not as arguments, as input.

What's the difference?

  • Command-line arguments are when you write flags and file names after a command, as in cmd arg1 arg2 arg3. In shell scripts these arguments are available as the variables $1, $2, $3, etc. In C you'd access them via the char **argv and int argc arguments to main().

  • Standard input, stdin, is a stream of data. Some programs like cat or wc read from stdin when they're not given any command-line arguments. In a shell script you can use read to get a single line of input. In C you can use scanf() or getchar(), among various options.
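To make the distinction concrete, here is a minimal sketch. `show_inputs` is a hypothetical helper written just for this illustration (it is not a standard tool); it reports what it received as arguments and what it received on stdin.

```shell
#!/bin/sh
# show_inputs: hypothetical helper that reports its command-line
# arguments and then every line it can read from standard input.
show_inputs() {
    echo "argument count: $#"
    for arg in "$@"; do
        echo "argument: $arg"
    done
    while IFS= read -r line; do
        echo "stdin line: $line"
    done
}

# Names given as arguments; stdin is deliberately empty here.
show_inputs file1 file2 </dev/null

# Names sent through a pipe: they arrive on stdin, and the
# argument count stays at zero.
printf 'file1\nfile2\n' | show_inputs
```

The second call is the situation `ls | file` creates: the names are there, but only on stdin, where `file` does not look by default.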

file does not normally read from stdin. It expects at least one file name to be passed as an argument. That's why it prints out usage when you write ls | file, because you didn't pass an argument.

You could use xargs to convert stdin into arguments, as in ls | xargs file. Still, as terdon mentions, parsing ls is a bad idea. The most direct way to do this is simply:

    file *
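As a rough illustration of why the glob is safer than `ls | xargs`, the sketch below counts how many arguments each approach produces for a file whose name contains a space. `count_args` and the file names are made up for the example, and everything happens in a scratch directory.

```shell
#!/bin/sh
# count_args: hypothetical helper that prints how many arguments it got.
count_args() {
    echo "$#"
}

# Scratch directory so the example does not touch real files.
workdir=$(mktemp -d)
touch "$workdir/plain.txt" "$workdir/two words.txt"

# The shell expands * into exactly one argument per file.
glob_count=$(cd "$workdir" && count_args *)

# xargs splits its input on blanks as well as newlines, so the name
# containing a space is handed over as two separate arguments.
xargs_count=$(cd "$workdir" && ls | xargs sh -c 'echo "$#"' sh)

echo "glob:  $glob_count argument(s)"
echo "xargs: $xargs_count argument(s)"

rm -rf "$workdir"
```

With two files in the directory, the glob yields two arguments while the `xargs` pipeline yields three.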
John Kugelman
  • 2
    Or force `file` to get filenames from its input, using `ls | file -f -`. Still a bad idea ofc. – spectras Jul 06 '16 at 17:24
  • @spectras that just tells file to read the filenames from the stdin. – Braiam Jul 06 '16 at 17:27
  • 2
    @Braiam> That's the point. And that pipes `ls`'s output into `file`'s stdin. Try it out. – spectras Jul 06 '16 at 17:28
  • @spectras Sorry, I wasn't clear: it makes `file` use stdin as the source of the filenames. But anyway, it's illogical to call `ls` and pipe its output to `file` when `file *` would work. – Braiam Jul 06 '16 at 17:30
  • 4
    @Braiam> Indeed it's wasteful and dangerous. But it works and it's nice to have it to compare to better options if the OP is learning to use redirections. For completeness I could also mention `file $(ls)`, which also works, in yet another way. – spectras Jul 06 '16 at 17:33
  • 2
    I think after reading all the answers I have a bigger picture of the issue, even though I think I'll need further reading to really understand it all. First, apparently piping and redirecting don't pass the output as _arguments_, but as _STDIN_. I still have to read further to understand this better, but from a superficial search, _arguments_ seem to be text passed to the program in an array, and _STDIN_ a way of pooling information from a file or an output (not all programs being designed to work with this "pooling"). – IanC Jul 06 '16 at 18:21
  • 3
    Second, using _ls_ to make a list of filenames seems like a bad idea, because of special characters that are accepted on filenames but can end up in a misleading output on _ls_. Since it uses _newlines_ as a separator between filenames and filenames can contain _newlines_ and other special characters, the final output might not be precise. – IanC Jul 06 '16 at 18:23
  • 1
    @terdon> please re-read my comment, it's not `ls | file -`, it's `ls | file -f -`. You forgot the `-f` flag. – spectras Jul 06 '16 at 21:24
  • 1
    @spectras so I did! Sorry about that, you're quite right. – terdon Jul 06 '16 at 21:26
18

Because, as you say, the input of file has to be filenames. The output of ls, however, is just text. That it happens to be a list of file names doesn't change the fact that it is simply text and not the location of files on the hard drive.

When you see output printed on the screen, what you see is text. Whether that text is a poem or a list of filenames makes no difference to the computer. All it knows is that it is text. This is why you can pass the output of ls to programs that take text as input (although you really, really shouldn't):

    $ ls / | grep etc
    etc

So, to use the output of a command that lists file names as text (such as ls or find) as input for a command that takes filenames, you need to use some tricks. The typical tool for this is xargs:

    $ ls
    file1 file2

    $ ls | xargs wc
     9  9 38 file1
     5  5 20 file2
    14 14 58 total

As I said before, though, you really don't want to be parsing the output of `ls`. Something like `find` is better (the `-print0` prints a `\0` instead of a newline after each file name, and the `-0` of `xargs` lets it deal with such input; this is a trick to make your commands work with filenames containing newlines):

    $ find . -type f -print0 | xargs -0 wc
     9  9 38 ./file1
     5  5 20 ./file2
    14 14 58 total

`find` also has its own way of doing this, without needing `xargs` at all:

    $ find . -type f -exec wc {} +
     9  9 38 ./file1
     5  5 20 ./file2
    14 14 58 total

Finally, you can also use a shell loop, as below. Note, however, that in most cases `xargs` will be much faster and more efficient:

    $ for file in *; do wc "$file"; done
     9  9 38 file1
     5  5 20 file2
terdon
  • A side-issue is that `file` doesn't appear to actually read stdin unless given an explicit `-` placeholder: compare `file foo`, `echo foo | file`, and `echo foo | file -`; in fact that's probably the reason for the usage message in the OPs case (i.e. it's not really because the output of `ls` is "simply text", but rather because the argument list to `file` is empty) – steeldriver Jul 06 '16 at 17:01
  • @steeldriver yeah. AFAIK that's the case for all programs that expect files and not text as input. They just ignore stdin by default. Note that `echo foo | file -` doesn't actually run `file` on the file `foo` but on the stdin stream. – terdon Jul 06 '16 at 17:06
  • Well there are odd ducks (?!) like `cat` that accept stdin without `-`, except when given file arguments as well, I think? – steeldriver Jul 06 '16 at 17:32
  • @steeldriver or like `paste`, yeah. – terdon Jul 06 '16 at 17:35
  • 3
    This answer fails to explain the difference between stdin and command line arguments, and so, despite being more on point than the accepted answer, is still deeply misleading for the same reason. – zwol Jul 06 '16 at 17:52
  • @zwol I figured that might be a little bit beyond the scope of what the OP was looking for here, considering that they're only starting out. A discussion of input streams, files, pointers etc would probably make this answer more complicated than informative. – terdon Jul 06 '16 at 17:54
  • 5
    @terdon I think that's a serious error in this case. "file(1) takes the list of files to operate on as command line arguments, not as standard input" is _fundamental_ to understanding why the OP's command didn't work, and the distinction is fundamental to shell scripting in general; you are not doing them any favors by glossing over it. – zwol Jul 06 '16 at 17:57
6

learned that '|' (pipeline) is meant to redirect the output from a command to the input of another one.

It doesn't "redirect" the output; it takes the output of one program and uses it as the input of another. `file`, however, doesn't take input but filenames as arguments, which are then tested. Neither redirection nor piping passes these filenames as arguments, and piping is what you are doing.

What you can do is read the filenames from a file with the `--files-from` option if you have a file that lists all the files you want to test; otherwise just pass the paths to your files as arguments.
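A minimal sketch of the `--files-from` approach, assuming `file` is installed. The scratch directory, `notes.txt`, `subdir`, and `names.list` are all invented for the example.

```shell
#!/bin/sh
# Scratch directory with one regular file and one directory to test.
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/notes.txt"
mkdir "$workdir/subdir"

# The list file holds one path per line.
list="$workdir/names.list"
printf '%s\n' "$workdir/notes.txt" "$workdir/subdir" > "$list"
names_count=$(wc -l < "$list")
echo "names listed: $names_count"

# file reads the names to test from the list instead of from its arguments.
file --files-from "$list"

rm -rf "$workdir"
```

Because the list uses one name per line, a filename that itself contains a newline cannot be represented this way.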

Braiam
6

The accepted answer explains why the pipe command doesn't work straightaway, and with the file * command, it offers a simple, straightforward solution.

I'd like to suggest another alternative that might come in handy at some time. The trick is using the backtick (`) character. The backtick is explained in great detail here. In short, it takes the output of the command enclosed in the backticks and substitutes it as a string into the remaining command.

So, find `ls` will take the output of the ls command, and substitute it as arguments for the find command. This is longer and more complicated than the accepted solution, but variants of this may be helpful in other situations.
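Here is a small sketch of what command substitution hands to the receiving command. `count_args` is a hypothetical helper and the file names are invented; the point is only that backticks and `$(...)` behave the same way, and that the unquoted result is split into words.

```shell
#!/bin/sh
# count_args: hypothetical helper that prints how many arguments it got.
count_args() {
    echo "$#"
}

workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b c.txt"

# Backticks and $(...) are two spellings of the same substitution.
old_style=$(cd "$workdir" && count_args `ls`)
new_style=$(cd "$workdir" && count_args $(ls))

echo "backticks: $old_style argument(s)"
echo "dollar-parens: $new_style argument(s)"

rm -rf "$workdir"
```

Both forms report three arguments for two files, because the substituted text is split at the space in `b c.txt`.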

Schmuddi
  • I'm reading a book about using the command line on Linux (the doubt came from me experimenting with it), and coincidentally I just read about "command substitution". You can use either _$(command)_ or _``command``_ (can't find the backslash code on my phone) to expand the output of a command in bash and use it as a parameter to other commands. Really useful, even though using it in this case (with _ls_) would still result in some issues because of the special characters in some filenames. – IanC Jul 07 '16 at 00:28
  • @IanC Unfortunately, most books and tutorials out there about bash are garbage, tainted with bad practices, deprecated syntax, subtle bugs; (the only) trustworthy references out there are the bash developers, that is, [the manual](https://www.gnu.org/software/bash/manual/) and the [#bash IRC channel](http://webchat.freenode.net/?channels=bash) on freenode (also check out the resources linked in the channel topic). – ignis Jul 07 '16 at 21:25
  • 1
    Using command substitution can be really helpful at times, but in this context it's pretty perverse - especially with ls. – Joe Jul 17 '16 at 07:57
  • also related: http://unix.stackexchange.com/a/5782/107266 – matth Feb 15 '17 at 11:51
5

The output of ls through a pipe is a solid block of data with 0x0a separating each line - ie a linefeed character - and file gets this as one parameter, where it expects multiple characters to work on one at a time.

As a general rule, never use ls to generate a data source for other commands - one day it'll pipe .. into rm and then you're in trouble!

Better to use a loop, such as for i in *; do file "$i" ; done which will produce the output you want, predictably. The quotes are there in case of filenames with spaces.

Mark Williams
  • 8
    easier: `file *` ;-) – Wayne_Yux Jul 06 '16 at 16:05
  • True - it works in this situation; but perhaps not universally as helpful! – Mark Williams Jul 06 '16 at 16:06
  • Indeed, I just ran `ls > lscommandoutput.txt` and opened the file in a Hexadecimal editor. Even when the output is just one file, there is a 0x0A on the end of the filename, which is probably messing with the `file` call. `file *` is definitely the easiest call, I was just intrigued why the command didn't work. Thanks! – IanC Jul 06 '16 at 16:21
  • 3
    @IanC I really can't stress enough that parsing the output of `ls` is a [very, very bad idea](http://mywiki.wooledge.org/ParsingLs). Not only because you might pass it to something harmful such as `rm`, more importantly because it breaks on any non-standard file name. – terdon Jul 06 '16 at 16:49
  • 5
    The first paragraph is somewhere between misleading and straight nonsense. Linefeeds have no relevance. The second paragraph is right for the wrong reason. It's bad to parse ls, but not because it might be somehow magically "piped" to rm. – John Kugelman Jul 06 '16 at 17:23
  • 1
    Does `rm` take filenames from standard input? I think not. Also, as a general rule, `ls` has been one of the principal examples of a data source for the use of Unix pipelines since the beginning of Unix. That's why it defaults to a simple one-filename-per-line with no attributes or adornments when its output is a pipe, unlike its usual default formatting when the output is the terminal. – davidbak Jul 06 '16 at 17:42
  • @terdon - is it really a _very, very bad idea_, or just an ordinary bad idea? It breaks on filenames with a linefeed in them. I don't have a Linux system handy at the moment, but how many such files are on the mounted filesystem on your desktop? Improperly programmed utility programs might not be able to handle filenames with embedded nulls but how many such files are on the mounted filesystem on your desktop? – davidbak Jul 06 '16 at 17:48
  • 1
    This answer is misleading, it doesn't address the difference between *standard input* and *command line arguments* which is what's really wrong here. – zwol Jul 06 '16 at 17:51
  • @davidbak on *my* system, none. That doesn't mean you should ignore the possibility. Since anything except `/` and `\0` is allowed in *nix file names, your programs should always be ready to deal with such cases. However, parsing `ls` can also fail on spaces (for example `for i in $(ls); do echo "$i"; done`) and filenames with spaces are actually quite common. For more than you ever wanted to know about why it should be avoided see [here](http://mywiki.wooledge.org/ParsingLs) and [here](http://unix.stackexchange.com/q/128985/22222). – terdon Jul 06 '16 at 18:28
  • 1
    @terdon - avoided in programs, yes. But in commands, typed on the commandline, it's fine, so long as you are confident in the contents of the subset of your own filesystem that you are working on. – Dewi Morgan Jul 06 '16 at 18:41
  • @DewiMorgan It fails in many easy-to-encounter subtle cases, dash after space, `touch 'a b' a b`, question mark, you're not going to find a complete checklist. – ignis Jul 07 '16 at 10:13
  • 2
    @DewiMorgan This website is mainly targeted at a non-technical audience, so spreading/encouraging bad habits here does harm and does nothing good. On unix.SE or other tech community, whose users have the knowledge/means to aim very close to their feet without shooting the feet themselves, your point might hold (regarding other practices) but here it does not make your comment look smart. – ignis Jul 07 '16 at 10:27
4

If you want to use a pipe to feed `file`, use the option `-f`, which is normally followed by a filename, but you can also use a single hyphen `-` to read from stdin:

    $ ls
    cow.pdf  some.txt
    $ ls | file -f -
    cow.pdf:       PDF document, version 1.4
    some.txt:        ASCII text

The trick with the hyphen - works with a lot of the standard command-line utils (although it is -- sometimes), so it is always worth a try.
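As one more illustration of the hyphen convention, `cat` treats a lone `-` among its file arguments as "read standard input at this point". The sketch below uses a made-up `header.txt` in a scratch directory.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'first line\n' > "$workdir/header.txt"

# cat prints header.txt first, then whatever arrives on stdin,
# because - stands in for standard input in the argument list.
combined=$(printf 'second line\n' | cat "$workdir/header.txt" -)
printf '%s\n' "$combined"

rm -rf "$workdir"
```

Whether a given tool honors `-` this way is up to that tool, so checking its man page is still worthwhile.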

The tool `xargs` is much more powerful and in most cases only needed if the argument list is too long (see this post for details).

deamentiaemundi
  • When is it `--`? I've never seen that. `--` is typically the "end of flags" indicator. – John Kugelman Jul 06 '16 at 20:34
  • Yes, but I found it in a couple of instances (ab)used in that way by the programmer. I cannot remember where exactly (will add a comment if I do) but I remember the curses I uttered when I found it out and these curses were definitely NSFW ;-) – deamentiaemundi Jul 06 '16 at 21:09
2

It works if you use a command like the one below:

    ls | xargs file

It works better for me.

SuperKrish
1

This should also work:

    file $(ls)

as also discussed here: https://unix.stackexchange.com/questions/5778/whats-the-difference-between-stuff-and-stuff

matth
  • The `$()` method is the same as the backtick method in http://askubuntu.com/a/795744/158442 – muru Feb 15 '17 at 11:07