Recursive bash script to collect information about each file in a directory structure

Question

How do I work recursively through a directory tree, execute a specific command on each file, and output the path, filename, extension, file size and some other specific text to a single file, in bash?

lol, thanks for the edit; i'll be the first to admit i overcomplicate things, because i'm used to being asked 800 irrelevant questions in the hooman world; so i try to answer the obvious ones in the questions; i'll learn though :-) — SPooKYiNeSS, Oct 25 '17 at 17:34
OK, I think the question is pretty clear on what should be done, go through directory tree and output info about each file. The question is pretty clear, and judging by amount of answers already, people understand it fairly well. The 3 votes for being unclear are really not deserved to this question — Sergiy Kolodyazhnyy, Oct 25 '17 at 21:14

pa4080 · Accepted Answer · 2017-11-09T15:56:01.020

While the find solutions are simple and powerful, I decided to create a more complicated solution based on this interesting function, which I saw a few days ago.

More explanations, and two other scripts based on the current one, are provided here.

1. Create an executable script file called walk, located in /usr/local/bin so that it is accessible as a shell command:

sudo touch /usr/local/bin/walk
sudo chmod +x /usr/local/bin/walk
sudo nano /usr/local/bin/walk

Copy the script content below and paste it into nano: Shift+Insert to paste; Ctrl+O and Enter to save; Ctrl+X to exit.

2. The content of the script walk is:

#!/bin/bash

# Colourise the output
RED='\033[0;31m'        # Red
GRE='\033[0;32m'        # Green
YEL='\033[1;33m'        # Yellow
NCL='\033[0m'           # No Color

file_specification() {
        FILE_NAME="$(basename "${entry}")"
        DIR="$(dirname "${entry}")"
        NAME="${FILE_NAME%.*}"
        EXT="${FILE_NAME##*.}"
        SIZE="$(du -sh "${entry}" | cut -f1)"

        printf "%*s${GRE}%s${NCL}\n"                    $((indent+4)) '' "${entry}"
        printf "%*s\tFile name:\t${YEL}%s${NCL}\n"      $((indent+4)) '' "$FILE_NAME"
        printf "%*s\tDirectory:\t${YEL}%s${NCL}\n"      $((indent+4)) '' "$DIR"
        printf "%*s\tName only:\t${YEL}%s${NCL}\n"      $((indent+4)) '' "$NAME"
        printf "%*s\tExtension:\t${YEL}%s${NCL}\n"      $((indent+4)) '' "$EXT"
        printf "%*s\tFile size:\t${YEL}%s${NCL}\n"      $((indent+4)) '' "$SIZE"
}

walk() {
        local indent="${2:-0}"
        printf "\n%*s${RED}%s${NCL}\n\n" "$indent" '' "$1"
        # If the entry is a file do some operations
        for entry in "$1"/*; do [[ -f "$entry" ]] && file_specification; done
        # If the entry is a directory call walk() == create recursion
        for entry in "$1"/*; do [[ -d "$entry" ]] && walk "$entry" $((indent+4)); done
}

# If no path is given use the current directory, otherwise convert relative to absolute; Exec walk()
# Stop if the given directory cannot be entered, so walk() never runs on an empty path
if [[ -n "${1}" ]]; then cd "${1}" || exit 1; fi
ABS_PATH="${PWD}"
walk "${ABS_PATH}"
echo

3. Explanation:

The main mechanism of the walk() function is pretty well described by Zanna in her answer, so I will describe only the new part.
Within the walk() function I've added this loop:
```
for entry in "$1"/*; do [[ -f "$entry" ]] && file_specification; done
```
This means that the function file_specification() is executed for each $entry that is a file.
The function file_specification() has two parts. The first part gets data related to the file: name, path, size, etc. The second part outputs the data in a well-formatted form using the printf command. If you want to tweak the script you should read about this command, for example in this article.
The function file_specification() is a good place to put the specific command that should be executed for each file. Use this format:
```
command "${entry}"
```
Or you can save the output of the command in a variable and then printf this variable:
```
MY_VAR="$(command "${entry}")"
printf "%*s\tFile size:\t${YEL}%s${NCL}\n" $((indent+4)) '' "$MY_VAR"
```
Or directly printf the output of the command:
```
printf "%*s\tFile size:\t${YEL}%s${NCL}\n" $((indent+4)) '' "$(command "${entry}")"
```
The section at the beginning, called Colourise the output, initialises a few variables that are used within the printf commands to colourise the output. You can find more about this here.
At the bottom of the script there is an additional condition that deals with absolute and relative paths.
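
As a minimal sketch of the "specific command for each file" idea, the helper below captures the output of one command and formats it with printf. The helper name line_count_report and the choice of `wc -l` as the per-file command are illustrative assumptions, not part of the original script; any other command could take its place.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not in the original script): report the number of
# lines in one file. `wc -l` stands in for "the specific command".
line_count_report() {
    target="$1"
    # Capture the command output once; the arithmetic expansion strips any
    # padding that some wc implementations add around the number.
    lines=$(( $(wc -l < "$target") ))
    # Format the captured value the same way the script formats its fields
    printf 'Line count:\t%d\t%s\n' "$lines" "$target"
}
```

Inside file_specification() it would be called as `line_count_report "${entry}"`.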

4. Examples of usage:

To run walk for the current directory:

walk      # no argument is needed,
walk ./   # but this form also works

To run walk for any child directory:

walk <directory name>
walk ./<directory name>
walk <directory name>/<sub directory>

To run walk for any other directory:
```
walk /full/path/to/<directory name>
```
To create a text file, based on the walk output:
```
walk > output.file
```

To create an output file without colour codes (source):

walk | sed -r "s/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[mGK]//g" > output.file
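
The `\x1B` escape in the command above is not guaranteed by POSIX sed. A slightly more portable sketch, assuming only POSIX sed and printf, builds the ESC character with printf instead. The function name strip_colours is hypothetical, and the pattern is simplified to "any run of digits and semicolons", which is broader than the original expression.

```shell
#!/usr/bin/env bash

# Hypothetical filter: remove ANSI colour/erase sequences from stdin.
strip_colours() {
    # Produce a literal ESC byte so the sed expression needs no \x escape
    esc="$(printf '\033')"
    # ESC [ <digits and semicolons> followed by m, G or K
    sed "s/${esc}\[[0-9;]*[mGK]//g"
}
```

Usage would then be `walk | strip_colours > output.file`.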

5. Demonstration of usage:

@pbhj, under Ubuntu I'm using [Peek](http://www.omgubuntu.co.uk/2016/08/peek-desktop-gif-screen-recorder-linux). It is simple and nice, but it sometimes crashes and doesn't have editing abilities. Most of my GIFs are created under Windows, where I'm recording the window of the VNC connection. I have a separate desktop machine that I mainly use for MS Office and GIF creation :) The tool that I'm using there is [ScreenToGif](http://www.screentogif.com/). It is open source, free, and has a powerful editor and processing mechanism. Unfortunately I can't find a tool like ScreenToGif for Ubuntu. – pa4080, Nov 09 '17 at 22:08

score 14 · Answer 2 · answered Oct 25 '17 at 09:43

I'm slightly perplexed as to why nobody has posted it yet, but bash does have recursive capabilities if you enable the globstar option and use the ** glob. As such, you can write an (almost) pure bash script that uses recursive globstar like this:

#!/usr/bin/env bash

shopt -s globstar

for i in ./**/*
do
    if [ -f "$i" ];
    then
        printf "Path: %s\n" "${i%/*}" # shortest suffix removal
        printf "Filename: %s\n" "${i##*/}" # longest prefix removal
        printf "Extension: %s\n"  "${i##*.}"
        printf "Filesize: %s\n" "$(du -b "$i" | awk '{print $1}')"
        # some other command can go here
        printf "\n\n"
    fi
done

Notice that here we use parameter expansion to get the parts of filename we want and we're not relying on external commands except for getting the file size with du and cleaning output with awk.

And as it traverses your directory tree, your output should look something like this:

Path: ./glibc/glibc-2.23/benchtests
Filename: sprintf-source.c
Extension: c
Filesize: 326

Standard rules of script usage apply: make sure it is executable with chmod +x ./myscript.sh and run it from current directory via ./myscript.sh or place it in ~/bin and run source ~/.profile.
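
One caveat with the parameter expansions in the script: `${i##*.}` returns the whole file name when the name contains no dot. The sketch below shows the same expansions with a guard for that case. The helper name split_path and the `|` separator are assumptions made for illustration only.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print "directory|filename|extension" for one path.
split_path() {
    path="$1"
    dir="${path%/*}"       # shortest suffix removal: the directory part
    file="${path##*/}"     # longest prefix removal: the file name
    case "$file" in
        ?*.*) ext="${file##*.}" ;;  # a dot after at least one character
        *)    ext="" ;;             # no extension (Makefile, .bashrc)
    esac
    printf '%s|%s|%s\n' "$dir" "$file" "$ext"
}
```

With this guard, `split_path ./Makefile` reports an empty extension rather than `Makefile`, and a dotfile such as `./.bashrc` is treated as having no extension.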

If you're printing the full filename what extra does "extension" give you? Perhaps you really want the MIME information that `"$(file "$i")"` (in the above script as second part of a printf) would return? — pbhj, Nov 09 '17 at 21:42
@pbhj To me personally ? Nothing. But OP who asked the question asked for `output the path, filename, extension, filesize `, so the answer matches what is asked. :) — Sergiy Kolodyazhnyy, Nov 09 '17 at 22:42

score 12 · Answer 3 · edited Oct 25 '17 at 15:09


You can use find to do the job

find /path/ -type f -exec ls -alh {} \;

This will help you if you just want to list all files with size.

-exec lets you execute a custom command or script for each file. Ending it with \; runs the command once per file; ending it with + instead passes as many file names as possible to a single invocation of the command.
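
The difference between the two terminators can be seen by counting how often the command is started. This is a small sketch; the function names are hypothetical, and `sh -c 'echo run'` merely stands in for a real command.

```shell
#!/usr/bin/env bash

# With \; find starts the command once for every file it finds
count_runs_per_file() {
    find "$1" -type f -exec sh -c 'echo run' _ {} \; | wc -l | tr -d ' '
}

# With + find appends many file names to one invocation of the command
count_runs_batched() {
    find "$1" -type f -exec sh -c 'echo run' _ {} + | wc -l | tr -d ' '
}
```

For a directory holding a handful of files, the first function prints the number of files, while the second prints 1 because all names fit into a single command line.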

answered Oct 25 '17 at 05:11 by Rajesh Rajendran · edited Oct 25 '17 at 15:09 by αғsнιη

This is nice, but it does not answer all the requirements OP mentioned. – αғsнιη Oct 25 '17 at 08:46

@αғsнιη I just gave him a template to work on. I know, this is not a complete answer to this question, as I think the question itself is broad in scope. – Rajesh Rajendran Oct 25 '17 at 08:52

αғsнιη · Answer 4 · score 6 · 2017-10-25T08:22:04.727

With find only.

find /path/ -type f -printf "path:%h  fileName:%f  size:%kKB Some Text\n" > to_single_file

Or you could use the following instead:

find -type f -not -name "to_single_file"  -execdir sh -c '
    printf "%s %s %s %s Some Text\n" "$PWD" "${1#./}" "${1##*.}" $(stat -c %s "$1")
' _ {} \; > to_single_file
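
A variation on the second command, sketched under the assumption that GNU `stat -c` may not be available: it batches files with `+` so that one shell handles many files, and measures size with `wc -c`. The function name file_report and the tab-separated layout are illustrative choices.

```shell
#!/usr/bin/env bash

# Hypothetical helper: one tab-separated line per regular file under $1:
# directory, file name, extension (empty if none), size in bytes.
file_report() {
    find "$1" -type f -exec sh -c '
        for f in "$@"; do
            name="${f##*/}"
            case "$name" in
                ?*.*) ext="${name##*.}" ;;
                *)    ext="" ;;
            esac
            # Arithmetic expansion strips padding some wc versions emit
            size=$(( $(wc -c < "$f") ))
            printf "%s\t%s\t%s\t%s\n" "${f%/*}" "$name" "$ext" "$size"
        done
    ' _ {} +
}
```

It could be used as `file_report /path/ > to_single_file`. Because the file names are passed as arguments and always quoted, names containing spaces are handled without changes to IFS.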

answered Oct 25 '17 at 07:45 · edited Oct 25 '17 at 08:22

Elegant and simple (if you know about `find -printf`). +1 – David Foerster Oct 25 '17 at 13:18

Benubird · Answer 5 · score 1 · 2017-10-25T11:10:53.067

If you know how deep the tree is, the easiest way will be to use the wildcard *.

Write up everything you want to do as a shell script or a function:

function thing() { ... }

then run for i in *; do thing "$i"; done, for i in */*; do thing "$i"; done, ... etc

Within your function/script, you can use some simple tests to single out the files you want to work with and do whatever you need to with them.
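
A minimal sketch of this approach for a tree that is two levels deep. The body given to thing() here is purely hypothetical (the answer leaves it open), and walk_two_levels is an illustrative name.

```shell
#!/usr/bin/env bash

# Hypothetical body for thing(): act only on regular files
thing() {
    [ -f "$1" ] || return 0        # skip directories and unmatched globs
    printf 'file: %s\n' "$1"
}

# Visit entries one and two levels below the directory given as $1
walk_two_levels() {
    for i in "$1"/* "$1"/*/*; do
        thing "$i"                 # quoted, so names with spaces survive
    done
}
```

Each glob match becomes exactly one loop item, so file names containing spaces are passed to thing() intact as long as "$i" stays quoted.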

answered Oct 25 '17 at 09:35 · edited Oct 25 '17 at 11:10

"this won't work if any of your filenames have spaces in them" ... because you forgot to quote your variables! Use "$i" instead of `$i`. – muru Oct 25 '17 at 09:38
@muru no, the reason it doesn't work is because the "for" loop splits on spaces - "*/*" gets expanded into a space-separated list of all files. You can work around this, e.g. by messing with the IFS, but at that point you might as well just use find. – Benubird Oct 25 '17 at 10:52
@pa4080 not relevant to this answer, but that looks super useful anyway, thanks! – Benubird Oct 25 '17 at 10:54
I think you don't understand how `for i in */*` works. Here, test it: `for i in */*; do printf "|%s|\n" "$i"; done` – muru Oct 25 '17 at 10:56
Here is evidence of the importance of quotation marks: https://i.stack.imgur.com/oYSj2.png – pa4080 Oct 25 '17 at 11:06
@muru You're right; that's very interesting, I wonder how that actually works. I guess it must be interacting with the wildcard somehow to recognize spaces as part of the object, as I know it splits on spaces normally. Thanks for pointing that out. – Benubird Oct 25 '17 at 11:10

score 1 · Answer 6 · answered Nov 09 '17 at 15:51

find can do this:

find ./ -type f -printf 'Size:%s\nPath:%h\nName:%f\n'

Have a look at man find for other file properties.

If you really need the extension, you can add this:

find ./ -type f -printf 'Size:%s\nPath:%h\nName:%f\nExtension:' -exec sh -c 'printf "%s\n\n" "${1##*.}"' _ {} \;
