177

I sometimes get files from my clients that have the wrong file extension. For example, the name is image.jpg but the file is actually a TIFF image. In many cases I can clarify it by opening the file in a text editor, looking at the first few bytes, then deducing which file type it is.

This works for me with JPEG, TIFF, GIF and PDF files. However, there are many more file types out there.

Is it possible to automate identification of the correct file type by analyzing the data the file contains?

Stevoisiak
Martin
  • 48
    For those interested the `file` command does this on *nix machines. – boehj Apr 24 '11 at 12:37
  • 11
    I do not understand why this question is off-topic (after 3 years). I do not ask for a specific software (I reworded my question to underline this). I just ask for a solution. – Martin Dec 22 '14 at 10:13
  • 3
    I don't understand why 26 people think that boehj's *nix-related comment above "adds something useful to the post". This question is tagged `windows`, but the comment implies: "You can't do that on Windows, you must use *nix instead". So? The comment is directed "for those interested". In what? Changing computers? **`:(`** – Aacini Sep 08 '15 at 14:47
  • 5
    @Aacini useful for *nix people who come here from google. – jingyu9575 Nov 14 '15 at 14:46
  • Moved to http://softwarerecs.stackexchange.com/questions/36519/determine-file-type-from-its-content-not-trusting-the-extension – Nicolas Raoul Sep 28 '16 at 06:48
  • 5
    @Aacini Also, Windows 10 now supports bash, so `file` is now a valid answer to this question (although I haven't tested it). – ThatMatthew Aug 15 '17 at 13:30

6 Answers

171

You can use the TrID tool, which identifies files using a growing library of file type definitions.

Screenshot

Wildcards are supported, so in your example you could put all the images to be examined in one folder, e.g. C:\verifyimages, and then run:

trid C:\verifyimages\*

This will examine all files in the verifyimages folder.


There is also a GUI version available, TrIDNet:

Screenshot

There is documentation available on how you can easily integrate TrID or TrIDNet into Windows Explorer and Total Commander:

Windows Explorer

Total Commander

Gaff
  • 6
    Do note that it indicates it is not licensed for commercial use, only personal use – Chris Magnuson Jan 31 '15 at 17:31
  • 5
    I had some trouble figuring out which download files were necessary to use this program. So this comment is to aid in that. You'll need to download two files. First, either the command line utility or the GUI utility. Second, a folder of XML definitions called "TrID XML defs". Place the definition XML files in the same directory as TrID. Then scan definitions. Finally you can start using it. – mrtsherman Mar 26 '15 at 15:40
  • Thanks, mrtsherman, for the clarification. I was confused as well. Docs could be improved, but nice tool! – Woodchuck Oct 06 '18 at 17:26
58

file

File tests each argument in an attempt to classify it. There are three sets of tests, performed in this order: filesystem tests, magic number tests, and language tests. The first test that succeeds causes the file type to be printed.

The type printed will usually contain one of the words text (the file contains only printing characters and a few common control characters and is probably safe to read on an ASCII terminal), executable (the file contains the result of compiling a program in a form understandable to some UNIX kernel or another), or data meaning anything else (data is usually “binary” or non-printable). Exceptions are well-known file formats (core files, tar archives) that are known to contain binary data.
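The "magic number tests" mentioned above can be illustrated with a small Python sketch. It is not how `file` is implemented; it only compares the first bytes of a file against a hand-written table covering the four formats named in the question. The names `SIGNATURES`, `sniff_bytes` and `sniff_file` are made up for this example, and a real tool relies on a far larger signature database.

```python
# Illustrative magic-number check. SIGNATURES is a tiny hand-picked subset,
# not the database used by `file` or TrID.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"%PDF-", "PDF"),
]


def sniff_bytes(header: bytes) -> str:
    """Return a format name if the header starts with a known signature."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 16 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_bytes(f.read(16))


if __name__ == "__main__":
    import sys

    # Usage: python sniff.py image.jpg other.pdf
    for p in sys.argv[1:]:
        print(f"{p}: {sniff_file(p)}")
```

A file named image.jpg whose first bytes are `II*` followed by a zero byte would be reported as TIFF, which is the mismatch described in the question.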

Ignacio Vazquez-Abrams
  • 1
    `file` is standard, but on older systems (especially non-Linux) not very knowledgeable. For Ubuntu etc it should be quite respectable and even installed as standard. – Thorbjørn Ravn Andersen Apr 24 '11 at 13:28
  • I didn't think file existed on Windows installs at all (don't have a windows box to test with) – Anm Apr 24 '11 at 18:59
  • 1
    @Anm_LA, it isn't standard at all on Windows, but the link in the answer is to a port of the GNU version of `file` to Windows. If other *nix commands are interesting to you as a Windows user, then poke around [that site](http://gnuwin32.sourceforge.net/) to find all kinds of gems. – RBerteig Apr 24 '11 at 19:54
  • 2
    I very much doubt that `file` is an expert on files made by Windows applications. – Robin Green Apr 24 '11 at 20:23
  • 6
    @Robin: You're welcome to test it. – Ignacio Vazquez-Abrams Apr 24 '11 at 20:27
  • 13
    @Robin: I very much doubt you've used `file` at all, and yet you've almost made up your mind about its effectiveness. – tzot Apr 24 '11 at 23:24
  • 1
    The linked-to version of file is 5.09, which was released back in [2011](ftp://ftp.astron.com/pub/file/). To get a much more recent version with an updated libmagic database (5.20, released in 2014, as of this writing), download and install [Cygwin](https://www.cygwin.com/). – Chris Magnuson Jan 31 '15 at 21:27
  • The linked tool doesn't work on Windows 10. – Gqqnbig Apr 16 '19 at 01:18
  • 2
    @Gqqnbig, that version of `file.exe` is a decade old and the overall status of `gnuwin32` has been `unmaintained` since 2013, as per Wikipedia. The modern approach is to use git-for-win: https://git-scm.com/download/win, which bundles Unix utilities (latest versions). After installation, you should have `%ProgramFiles%\Git\usr\bin` in `PATH` with `file.exe` in it. For Windows 10 you may also enable Windows Subsystem for Linux (WSL), install a distro of your choice (ubuntu, fedora, alpine, gentoo, etc.), enter it and do `file /mnt/c/your/path/in/windows/filename.extension` (the `/c/` part represents the C: drive). – vulcan raven May 25 '19 at 10:57
15

I used to work for the French National Library, building a digital archive system that contains not only digitized books but also millions of digital artefacts with all kinds of strange file types. We used JHOVE to recognize file formats.

JHOVE is open source and is maintained by JSTOR and the Harvard University Library. It is rather simple to use.

Nicolas Raoul
  • cool! but does it recognize proprietary formats like TrID does? anyways, I *do* have some uses to identify subformats/variants of non-proprietary formats (or, to be precise, proprietary 'extensions' to standardized formats), so this would come in handy. thank you for the heads-up! – pepoluan Apr 24 '11 at 14:00
9

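A modern approach that may appeal is to use Git for Windows. Run git-bash.exe and then run the command file path/to/file (Git Bash treats an unquoted backslash as an escape character, so use forward slashes or quote the path). An example output might be: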

TestFile.ico: MS Windows icon resource - 1 icon, 128x128, 32 bits/pixel

Alternatively, use the command file -i path/to/file, which might give:

TestFile.ico: image/vnd.microsoft.icon; charset=binary
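The MIME output shown above is easy to post-process. The following Python sketch parses one line of `file -i` output and looks up a plausible extension. It assumes a `file` executable is on the PATH and accepts `-i` as in this answer (some platforms use a different flag). The names `MIME_TO_EXT`, `parse_file_mime_line`, `suggested_extension` and `mime_of` are hypothetical, and the mapping is a small illustrative subset.

```python
import subprocess

# Illustrative subset only; extend it with the MIME types you actually meet.
MIME_TO_EXT = {
    "image/jpeg": ".jpg",
    "image/tiff": ".tif",
    "image/gif": ".gif",
    "application/pdf": ".pdf",
    "image/vnd.microsoft.icon": ".ico",
}


def parse_file_mime_line(line: str):
    """Split 'name: type/subtype; charset=x' into (name, mime, charset)."""
    # rpartition keeps colons inside the name (e.g. a drive letter) intact.
    name, sep, rest = line.strip().rpartition(": ")
    if not sep:
        raise ValueError(f"unexpected output: {line!r}")
    mime, _, params = rest.partition(";")
    charset = params.strip().partition("=")[2]
    return name, mime.strip(), charset


def suggested_extension(mime: str):
    """Return an extension for a known MIME type, or None."""
    return MIME_TO_EXT.get(mime)


def mime_of(path: str) -> str:
    """Run `file -i` on a path (assumes `file` is installed and on PATH)."""
    result = subprocess.run(
        ["file", "-i", path], capture_output=True, text=True, check=True
    )
    return parse_file_mime_line(result.stdout)[1]
```

Comparing `suggested_extension(mime_of(path))` with the extension the file currently has is one way to list candidates for renaming; the renaming itself is left out here on purpose.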
AlainD
  • Thank you! At least I can get the mime types and construct an `mv` batch to fix the file extensions. Perhaps someone with more time can [automate the process into a program](https://superuser.com/q/1635525/117986) :) – ADTC Mar 23 '21 at 17:22
3

You can check the file type from any computer, including Windows machines, at

http://www.checkfiletype.com

  • 3
    Welcome to Super User! Please read [how to recommend software in answers](https://meta.superuser.com/questions/5329/how-do-i-recommend-software-in-my-answers), particularly the bits in **bold**; then edit your answer to follow the guidelines there. This applies even though you are recommending a website! Cheers – bertieb Jun 04 '18 at 11:33
2

I use Oracle's OutsideIn libraries in my programs. They are not free, but they work well, especially for images. According to the marketing material, they support over 500 file types.