32

I need to create a list of checksums of the files that are inside a directory, including any subdirectories.

The command that I try to execute is the following:

 sha256sum -b * 

Usage:


 -b = Read in Binary.

 * = Matches all files in the current directory, whatever their extension.

With the command I get the following output:

sha256sum: test0: Is a directory
e3d748fdf10adca15c96d77a38aa0447fa87af9c297cb0b75e314cc313367daf *test1.txt
db0c7a354881fe2dd1b45642a68f6a971c7421e8fdffe56ffa7c740111e07274 *test2.txt

Instead of just reporting that test0 is a directory, the command should also generate checksums for the files inside it.

Do you recommend always using -b in any type of file? In what cases should -t be used?

Is it possible to filter the types of files I want to omit in the verification, without having to add all the files I want to admit? What command should I execute?

I looked for help but did not find anything related.

MarianoM

6 Answers

34

You can use find to find all files in the directory tree, and let it run sha256sum. The following command line will create checksums for the files in the current directory and its subdirectories.

find . -type f -exec sha256sum {} \;
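As a rough illustration of how the pieces of this command fit together: find substitutes each path it finds for {}. Ending the command with \; runs sha256sum once per file, while ending it with + hands many paths to a single run. The sketch below assumes a throwaway directory created with mktemp, used only for the demo.

```shell
#!/bin/sh
# Hypothetical demo tree; in practice you would cd into your own directory.
demo=$(mktemp -d)
mkdir -p "$demo/test0"
printf 'one\n' > "$demo/test1.txt"
printf 'two\n' > "$demo/test0/nested.txt"
cd "$demo" || exit 1

# '{} \;' starts one sha256sum process for every file found.
per_file=$(find . -type f -exec sha256sum {} \; | sort)

# '{} +' passes many paths to each sha256sum call; the output lines are the same.
batched=$(find . -type f -exec sha256sum {} + | sort)

printf '%s\n' "$batched"
```

Both forms produce the same list; the + form is usually faster on large trees because fewer processes are started.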

I don't use the options -b and -t, but if you wish, you can use -b for all files. The only difference that I notice is the asterisk in front of each file name.
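To make the difference concrete, here is a small sketch (assuming GNU sha256sum on Linux and a scratch directory that exists only for the demo). It hashes the same file with and without -b and then feeds the -b line to the checker.

```shell
#!/bin/sh
demo=$(mktemp -d)                      # hypothetical scratch directory
cd "$demo" || exit 1
printf 'hello\n' > test1.txt

text_line=$(sha256sum test1.txt)       # default: two spaces before the name
binary_line=$(sha256sum -b test1.txt)  # -b: a space and an asterisk before the name
printf '%s\n%s\n' "$text_line" "$binary_line"

# The checker accepts the asterisk form as well.
printf '%s\n' "$binary_line" > sums.txt
sha256sum -c sums.txt
check_status=$?
```

On Linux the hash itself is identical in both lines; only the marker in front of the file name changes.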

sudodus
  • Excellent! And why should we add find instead of containing the option within the same sha256sum program? Does this usually happen? – MarianoM Nov 09 '18 at 08:48
  • Now I do not understand the use of the curly braces {} well. I was reading a bit more but I found that "it can be used as a placeholder for each file that locates the search command" what does that mean? Does it refer to the coloring of the text or some other reason? I tried inserting a route / test and accepted it. This confuses me even more. It's just a curiosity to learn more about the parameters used. – MarianoM Nov 09 '18 at 08:52
  • Using `find` is a good way to find files in subdirectories, and with the `-exec` option it can run a command on them. Each file found by `find` replaces the placeholder `{}`, so in your case `sha256sum` will work on each of the files one after another. – sudodus Nov 09 '18 at 08:53
  • Thank you so much for everything. As a clarification, due to tests that I was doing, if this command is going to be used; you should not use the -b option if you do not want to have to edit the text later because when you run (sha256sum -c) you can not find the path of the files. However, I wonder if there will be a difference between using -b or not. – MarianoM Nov 09 '18 at 12:01
  • I think the asterisk (`*`) in the output is the only difference. *Maybe* (I am guessing here) long ago there was some difference (that some characters in binary files could create problems like truncation of the process). – sudodus Nov 09 '18 at 14:34
  • 1
    This is a good idea, but it doesn't work. It depends on the order in which `find` finds files, which appears to be implementation dependent. On two different debian machines I have here, the order is changed. I am not sure why this is the case. – user3728501 Sep 04 '20 at 10:26
  • @user3728501, Please tell me what does not work: 1. Does `find` fail to find all files in the current directory and its subdirectories; 2. Does the activation of `sha256sum` fail to run; 3. Or is something else failing for you? - If I understand correctly, this should work in all current versions of Ubuntu, but I have not tested it in Debian. – sudodus Sep 04 '20 at 10:40
  • @sudodus See my answer below – user3728501 Sep 04 '20 at 10:44
  • 1
    @MarianoM the -b flag means to open the file in binary mode. This would only make a difference on systems where binary and text mode are different, for example Windows uses `\r\n` for line endings and text mode will convert that to `\n`. For any Linux, binary and text modes should be the same. – Mark Ransom Aug 23 '22 at 20:17
13

TL;DR

cd /path/to/working/directory
sha256sum <(find . -type f -exec sha256sum {} \; | sort)

Intro

A more complete answer to the one above, which fixes the problem with find "finding" files in different orders on different systems.

Redirecting output to a file, compare with diff

Firstly, you probably want to redirect the output to a file for comparison with diff. For this you would use

find . -type f -exec sha256sum {} \; > file1.lst

Then on your other system

find . -type f -exec sha256sum {} \; > file2.lst
rsync file2.lst user@host:/home/user/file2.lst
ssh user@host
diff file1.lst file2.lst # might not match due to order

Fixing order of files found with find by piping to sort

Here I am assuming you are doing something similar to what I required this for - copying files from one system to another over a network and verifying the integrity of those files.

What I found was that the order in which find finds files can vary between two systems, even when the OS is "Debian" in both cases.

Therefore, one needs to sort the output in the text files.

sort file1.lst > file1sorted.lst
sort file2.lst > file2sorted.lst
diff file1.lst file2.lst # bad
diff file1sorted.lst file2sorted.lst # ok

You can do the find and sort all in one line, while redirecting the output to a file.

find . -type f -exec sha256sum {} \; | sort > file1.lst
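A plain sort orders the lines by hash, because the hash is the first field. If a list ordered by path is more practical, sorting on the second field is one option, and the saved list can later be checked with sha256sum -c. The following is only a sketch: the demo tree and the list location are assumptions, and --quiet is a GNU sha256sum option.

```shell
#!/bin/sh
demo=$(mktemp -d)           # hypothetical demo tree
list=$(mktemp)              # checksum list kept outside the tree
mkdir -p "$demo/sub"
printf 'alpha\n' > "$demo/b.txt"
printf 'beta\n'  > "$demo/sub/a.txt"
cd "$demo" || exit 1

# Field 1 is the hash, so sort from field 2 onwards to order by path.
# LC_ALL=C makes the ordering byte-wise and independent of the locale.
find . -type f -exec sha256sum {} + | LC_ALL=C sort -k 2 > "$list"
cat "$list"

# Later, from the matching directory (possibly on the other machine):
# --quiet prints only failures; the exit status is 0 when everything matches.
sha256sum -c --quiet "$list"
verify_status=$?
```

Keeping the list outside the tree avoids hashing the list file itself on the next run.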

Other sha/md5 sums

You might want a longer digest. To use the 512-bit version, simply run:

find . -type f -exec sha512sum {} \; | sort > file1.lst

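Alternatively, 256 bits might be overkill for what you are doing (for example, if you only need to detect accidental corruption, not deliberate tampering), in which case run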

find . -type f -exec md5sum {} \; | sort > file1.lst

A complete 1 line command to compare 2 directories with 1 shasum output

Now, if you have many files and do not want to save the output to a file, you could simply shasum the output. To do this, use

sha256sum <(find . -type f -exec sha256sum {} \; | sort)

The pipe to sort is required to ensure the output is sorted before computing the final sha256sum. Without this, if find finds files in a different order, despite the shasums for each file being correct, the overall shasum will depend on the order.
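The same combined digest can also be produced with an ordinary pipeline, which works in shells that lack <( ). This is a sketch under assumptions: tree_digest and the two demo directories are hypothetical names, not part of any tool.

```shell
#!/bin/sh
# tree_digest DIR: print one digest summarising every file under DIR.
tree_digest() {
    ( cd "$1" || exit 1
      # Sort the per-file lines first so traversal order cannot change the result.
      find . -type f -exec sha256sum {} + | LC_ALL=C sort | sha256sum | cut -d ' ' -f 1 )
}

# Hypothetical demo: two copies of the same tree, created in a different order.
a=$(mktemp -d); b=$(mktemp -d)
printf 'x\n' > "$a/1.txt"; printf 'y\n' > "$a/2.txt"
printf 'y\n' > "$b/2.txt"; printf 'x\n' > "$b/1.txt"

tree_digest "$a"
tree_digest "$b"
```

Because the function changes into the directory first, the paths in the per-file lines are relative, so two copies stored in different places still give the same digest.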

Problem relating to diff output and paths used

You may have some path which looks like

/A/B/C/*

where * stands for the subdirectories and files you are interested in checksumming. If A, B and C are directories that each contain only one subfolder, you might accidentally run your checksum command from the wrong level, resulting in the following

sort1.txt
sha256sum1    ./A/B/C/file1

sort2.txt
sha256sum2    ./B/C/file1

Even if sha256sum1 = sha256sum2, diff will say the files are different. (They are, because the paths start from different base directories.)

Here is a short Python 3 script that compares the sums line by line and ignores the paths, which solves this problem.

#!/usr/bin/env python3
# Compare only the checksum column of two sorted sha256sum listings.
file1_name = "sort1.txt"
file2_name = "sort2.txt"

# Read both listings; the with block closes the files afterwards.
with open(file1_name, 'r') as file1, open(file2_name, 'r') as file2:
    file1_lines = file1.readlines()
    file2_lines = file2.readlines()

if len(file1_lines) == len(file2_lines):
    print("line numbers ok")
    for line1, line2 in zip(file1_lines, file2_lines):
        # The checksum is everything before the first space on each line.
        shasum1 = line1.split(' ', 1)[0]
        shasum2 = line2.split(' ', 1)[0]
        if shasum1 != shasum2:
            print("shasum error: ", line1)
else:
    print("Error: file ", file1_name, " number of lines != ", file2_name, " number of lines")
print("done")

I initially wanted to write a shell script to do this, but I got bored trying to figure out how to do it, so went back to Python.

This makes me think that writing a Python script to do the entire thing would actually have been easier, except for the find command.
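For comparison, a shell version of the same check is fairly short. The sketch below compares only the checksum column of two sorted lists; compare_sums and the demo files are hypothetical names, and unlike the Python script it reports differences through diff's output and exit status.

```shell
#!/bin/sh
# compare_sums FILE1 FILE2: compare only the checksum column of two sorted lists.
compare_sums() {
    sums1=$(mktemp); sums2=$(mktemp)
    cut -d ' ' -f 1 "$1" > "$sums1"    # keep the hash, drop the path
    cut -d ' ' -f 1 "$2" > "$sums2"
    diff "$sums1" "$sums2"
    rc=$?
    rm -f "$sums1" "$sums2"
    return "$rc"
}

# Hypothetical demo: the same tree listed from two different base directories.
work=$(mktemp -d); lists=$(mktemp -d)
mkdir -p "$work/A/B/C"
printf 'data\n' > "$work/A/B/C/file1"
( cd "$work"   && find . -type f -exec sha256sum {} + | sort ) > "$lists/sort1.txt"
( cd "$work/A" && find . -type f -exec sha256sum {} + | sort ) > "$lists/sort2.txt"

if compare_sums "$lists/sort1.txt" "$lists/sort2.txt"; then
    echo "checksums match"
else
    echo "checksums differ"
fi
```

A plain diff of the two lists would report a difference here because the paths differ, while the column-only comparison does not.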

user3728501
  • 1
    +1: I see what you mean. Sorting helps, when `find` finds files in different order. – sudodus Sep 04 '20 at 10:55
  • 1
    @sudodus It's possible this is only relevant when doing this across different systems - on the same machine presumably the same results occur in different copies of the same directory contents. (aka: presumably `find` behaves consistently) – user3728501 Sep 04 '20 at 12:51
  • When doing this, usually between windows and linux, I use `get-filehash *.*` in powershell, and `find . -type f -exec sha256sum {} \;` in bash. I just copy the output of both to new files in notepad++, uppercase everything, and replace everything from the end of the hash to the start of the filename with `\t`, then run a csv plugin to `SELECT * FROM THIS ORDER BY Col2` and then dump that back into the file, then run compare, which when it's a `good copy` tells me the files match, then closes them and I'm done. That being said I don't have to do this often enough to warrant an actual script. – gattsbr Jan 17 '21 at 17:13
  • good comment about sorting to accepted question, but your `TL;DR` makes one integral sum, not list of sums for each file. That IMHO should be mentioned there. Sorting by file name, not shasum would be practical for many IMHO, I could not write one-liner right away now... – Martian2020 Feb 13 '21 at 06:53
  • The inner command `find . -type f -exec sha256sum {} +` runs in a fraction of the time for me, producing the exact same list. I have literally no idea what `\;` and `+` even mean. I'm such a noob my bash environment is running in Windows and hardly 'legit'! Any idea what might be causing this massive speed improvement for me? – ne1410s Jul 12 '22 at 20:03
  • @ne1410s It might be you have a bash environment/interpreter which is not compliant with the expected "standard" and is therefore interpreting your command differently to how you expect. (And therefore not running.) Unless you are hashing very small files. Is there anything to indicate it isn't working beyond how long it takes. Using your disk transfer speed you can estimate how long you would expect the hashing process to take for a certain file size. Just calculate the read+write times. You can assume processing time is zero, unless you have very high speed IO. – user3728501 Jul 19 '22 at 08:58
  • People have done it in Python many years ago: https://pypi.org/project/cfv/ Don't reinvent the wheel out of boredom, help maintaining wheels keep spinning. (Pun intended.) – LiveWireBT Dec 18 '22 at 15:19
  • @LiveWireBT That requires (presumably) installing something which you (presumably) won't find by default on most systems. Regardless, can you add a more detailed answer explaining how to use it? – user3728501 Dec 20 '22 at 10:55
  • @user3728501 https://askubuntu.com/a/1446160/40581 – LiveWireBT Dec 20 '22 at 14:55
5

Late answer, but for the sake of documentation...

The other answers suggest to call sha256sum via find and the -exec option. This has the effect that sha256sum is called once for each file, which is a significant overhead for the OS.

A more efficient solution is to convert the find results to command line arguments by piping it through xargs and call sha256sum that way. xargs runs sha256sum once or in large batches if there are too many lines.

find /path/to/your/dir -type f | xargs sha256sum -b

If you have filenames containing whitespace, use the -print0 flag in find and the -0 flag in xargs so that paths are terminated with \0

find /path/to/your/dir -type f -print0 | xargs -0 sha256sum -b
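If the list also needs a stable order, the NUL-separated paths can be sorted before they reach xargs. This is a sketch that assumes GNU sort (for -z) and GNU xargs (for -r); hash_tree_sorted and the demo files are hypothetical.

```shell
#!/bin/sh
# hash_tree_sorted DIR: hash every file under DIR, ordered by path.
hash_tree_sorted() {
    ( cd "$1" || exit 1
      # -print0 / -z / -0 keep paths NUL-separated, so spaces and newlines are safe.
      # -r (GNU xargs) skips running sha256sum when no files are found.
      find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum )
}

# Hypothetical demo tree with a space in one file name.
demo=$(mktemp -d)
printf 'a\n' > "$demo/with space.txt"
printf 'b\n' > "$demo/plain.txt"
hash_tree_sorted "$demo"
```

Sorting the paths rather than the output keeps the batching benefit of xargs while making the order reproducible across machines.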
Bernd Gloss
  • @Alexander why? xargs is invoking shasum with a list of file paths. Shasum supports passing N filepaths in as arguments, in which case it prints out ` ` – mindlace Dec 28 '21 at 23:01
  • This is a good point, but ignores the original question, which requires a single output, sorted. Try: `cd /path/to/your/dir; find . -type f -print0 | LC_COLLATE=C sort -z | xargs -0 sha256sum` (thx to https://unix.stackexchange.com/a/553172/26736 for the `LC_COLLATE` trick) – Jesse Glick Mar 23 '23 at 11:16
3

To include all files in subdirectories use double asterisks:

sha256sum /path/to/your/dir/**

This requires bash's globstar option. If it is not enabled, turn it on with shopt -s globstar. See this question for more details.
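One caveat: with globstar on, ** matches directories as well as files, so sha256sum prints an "Is a directory" message for each of them, and hidden files are skipped unless dotglob is set. A possible way around the first point is to test each match, as in this sketch. It assumes bash is installed; hash_globstar and the demo tree are hypothetical names.

```shell
#!/bin/sh
# hash_globstar DIR: hash regular files matched by DIR/** using bash's globstar.
hash_globstar() {
    bash -c '
        shopt -s globstar            # make ** descend into subdirectories
        for f in "$1"/**; do
            # ** also matches directories, which sha256sum cannot hash.
            if [ -f "$f" ]; then sha256sum "$f"; fi
        done
    ' _ "$1"
}

# Hypothetical demo tree with a nested file.
demo=$(mktemp -d)
mkdir -p "$demo/sub/deep"
printf 'top\n'   > "$demo/top.txt"
printf 'inner\n' > "$demo/sub/deep/inner.txt"
hash_globstar "$demo"
```

The loop starts one sha256sum per file, so for large trees the find-based approaches are likely to be faster.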

  • This answer does not answer the OP's question. It does not act recursively. – theYnot Mar 17 '22 at 06:33
  • It requires [globbing](https://stackoverflow.com/a/28199633/4182169) to be enabled. If it is not, try to enable it: `shopt -s globstar` – Valentin Safonnikov Apr 22 '22 at 13:00
  • 1
    Might be good to edit your answer to include the additional content – theYnot Apr 27 '22 at 05:37
  • 1
    Good one-liner. As a side note, it is easy to store and check: `sha256sum /path/to/your/dir/** > allfolder.sha256`, then the check: `sha256sum -c allfolder.sha256`. If $? is different from 0, you know that at least one of the files has changed. – user2173392 Aug 21 '23 at 13:36
2

Short answer: sha256deep


It feels very wrong being directed to this FAQ here as one of the most relevant answers now. sha*deep|md5deep have existed for years, moved to the hashdeep package some years ago and have been maintained because... well sha256sum has very limited scope of functionality.


On another note:

I used CFV in the past for such tasks, but it was removed from Ubuntu and was one of the latest projects to find new maintainers willing to port it to Python 3. After finding this question and its many answers, and realizing that pipx exists, I just went back to CFV.

# Install pipx
python3 -m pip install --user pipx

# Install CFV
pipx install cfv

# Hash the current directory recursively and create a file containing the
# hashes, named after the directory
cfv -Crrt sha256

Is it possible to filter the types of files I want to omit in the verification, without having to add all the files I want to admit? What command should I execute?

That is where find comes in handy to create a list of the files you want to hash. You experiment with find and its exclusion tests (such as ! -name or -prune; find has no --exclude option) until the output matches what you need, then you redirect find's output to a file and run cfv -Crrt sha256 -f file_list
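As an illustration of that kind of filtering with find alone, the sketch below hashes everything except a few unwanted patterns. The patterns (*.log, *.tmp, a .git directory), the function name hash_except and the demo tree are all assumptions chosen for the example.

```shell
#!/bin/sh
# hash_except DIR: hash every file under DIR except the excluded patterns.
hash_except() {
    ( cd "$1" || exit 1
      # -name .git -prune  : do not descend into any .git directory
      # ! -name '*.log'    : skip files whose names match the pattern
      find . -name .git -prune -o \
             -type f ! -name '*.log' ! -name '*.tmp' -exec sha256sum {} + )
}

# Hypothetical demo tree: one file to keep, two things to leave out.
demo=$(mktemp -d)
mkdir -p "$demo/.git"
printf 'keep\n'  > "$demo/keep.txt"
printf 'noise\n' > "$demo/skip.log"
printf 'cfg\n'   > "$demo/.git/config"
hash_except "$demo"
```

Each extra ! -name test removes one more pattern, so only the unwanted types need to be listed, not the wanted ones.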

LiveWireBT
0

If you want to sort files in the folder by filesize and then do the sha, you can run

for i in $(ls -Sr .); do sha256sum $i; done;
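The loop above splits the output of ls on whitespace, so file names containing spaces break it, and directories produce errors. A more defensive variant is sketched here. It relies on GNU extensions (find -printf, sort -z, cut -z, xargs -r), and hash_by_size and the demo files are hypothetical.

```shell
#!/bin/sh
# hash_by_size DIR: hash the regular files directly inside DIR, smallest first.
hash_by_size() {
    ( cd "$1" || exit 1
      # Emit "size<TAB>path" records separated by NUL, sort numerically on size,
      # then drop the size column before handing the paths to sha256sum.
      find . -maxdepth 1 -type f -printf '%s\t%p\0' |
          sort -z -n |
          cut -z -f 2- |
          xargs -0 -r sha256sum )
}

# Hypothetical demo: a larger file with a space in its name and a small one.
demo=$(mktemp -d)
printf 'a much longer line of text\n' > "$demo/big file.txt"
printf 'x\n' > "$demo/small.txt"
hash_by_size "$demo"
```

Like ls -Sr, this lists the smallest file first; dropping -maxdepth 1 would extend it to subdirectories.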
Scholtz