De-dup Files from the Terminal


Here's a way to remove duplicate files using a CLI tool.

Install fdupes

There are many tooling options. I have used fdupes to good effect.

Get it (on Debian or Ubuntu):

sudo apt install fdupes

Interactive phases

If you want to delete the duplicate files in the current directory, run this command:

fdupes . -d

This will detect duplicates, then open a terminal UI for interactively deleting them. There are 3 phases to this process:

  1. Selection - which files will be operated on
  2. Tag - mark for deletion (or preservation)
  3. Execute - do the actual filesystem deletion
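Before working through those phases, it can be reassuring to see what fdupes considers a duplicate without deleting anything. Here's a minimal sketch, assuming fdupes is on your PATH. The scratch directory and file names are made up for illustration; without -d, fdupes only lists the sets it finds.

```shell
# Scratch directory with hypothetical sample files, so nothing real is touched.
workdir=$(mktemp -d)
printf 'photo bytes\n' > "$workdir/IMG_7618.JPEG"
printf 'photo bytes\n' > "$workdir/IMG_7618 1.JPEG"   # identical copy
printf 'other photo\n' > "$workdir/IMG_0001.JPEG"     # unique file

if command -v fdupes >/dev/null 2>&1; then
    # Without -d, fdupes only prints the duplicate sets; it deletes nothing.
    fdupes "$workdir"
    # -m (--summarize) prints a short summary instead of the full list.
    fdupes -m "$workdir"
else
    echo "fdupes not found on PATH" >&2
fi

# All three files should still be present after the preview.
file_count=$(find "$workdir" -type f | wc -l | tr -d ' ')
echo "files remaining: $file_count"
rm -rf "$workdir"
```

On a real photo folder, you'd point fdupes at that directory instead of the scratch one.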

First, selection: There are many methods for selection. To see these (and other helpful commands in this interactive mode), type help.

For our example, we'll delete the files that follow the macOS naming convention for duplicates created on import: the copy gets " 1", " 2", and so on appended to its base name (before the extension) for the 1st, 2nd, etc. duplicate.

(A note here: fdupes is smarter at detecting dupes than just checking file names. It actually compares the files' bytes, because, for example, IMG_1.JPG and IMG_1 1.JPG aren't necessarily duplicates.)
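To make the name-versus-content point concrete, here's a small sketch using the standard cmp utility. This illustrates the idea only and is not how fdupes is implemented; the file names and contents are hypothetical.

```shell
# Scratch directory so no real files are touched.
workdir=$(mktemp -d)

# Hypothetical samples: one true copy, one look-alike name with different content.
printf 'same bytes\n'  > "$workdir/IMG_1.JPG"
printf 'same bytes\n'  > "$workdir/IMG_1 1.JPG"
printf 'other bytes\n' > "$workdir/IMG_1 2.JPG"

# cmp -s prints nothing and exits 0 only when both files are byte-identical.
is_dup() {
    if cmp -s "$1" "$2"; then echo "duplicate"; else echo "different"; fi
}

result_same=$(is_dup "$workdir/IMG_1.JPG" "$workdir/IMG_1 1.JPG")
result_diff=$(is_dup "$workdir/IMG_1.JPG" "$workdir/IMG_1 2.JPG")
echo "IMG_1 1.JPG is: $result_same"
echo "IMG_1 2.JPG is: $result_diff"

rm -rf "$workdir"
```

Both " 1" and " 2" look like copies by name, but only the first one really is.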

Selection

OK, on to the selection. At the interactive prompt, I'll type:

sele  1.JPG

This will select any files that have this pattern at the end of their file name. They will be highlighted in the UI. Note the two spaces in this command. The first is for the command, separating sele from the file pattern. The second is for the first character of the pattern itself.

We could repeat this selection for " 2.JPG", " 3.MOV" etc.
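If you'd like a feel for what a suffix pattern like this catches before opening fdupes, a shell glob gives a rough analogue. This is only a sketch: the glob is the shell's matching, not fdupes's, and the sample file names are hypothetical.

```shell
# Scratch directory with hypothetical file names.
workdir=$(mktemp -d)
touch "$workdir/IMG_7618.JPG" "$workdir/IMG_7618 1.JPG" \
      "$workdir/IMG_6923 1.JPG" "$workdir/IMG_6923 2.JPG"

# List names ending in " 1.JPG". The space inside the quotes is part of the
# pattern, just like the second space in the sele command.
matches=$(cd "$workdir" && ls -1 -- *" 1.JPG")
echo "$matches"

rm -rf "$workdir"
```

Note that the glob is case-sensitive, so " 1.jpg" would need its own pattern here.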

For this use case, selecting by pattern is faster than the default method. By default, your duplicates are brought up in sets and you choose which file to preserve in each set. With 300 sets, that takes a long time.

Output can look like this (note this is already-tagged output):

Set 1 of 349:

  1 [ ] ./IMG_7618.JPEG
  2 [-] ./IMG_7618 1.JPEG

Set 2 of 349:

    [-] ./IMG_0188 3.MOV
    [+] ./IMG_0188 1.MOV

Set 3 of 349:

    [+] ./IMG_0177 1.MOV
    [-] ./IMG_0177 3.MOV

Set 4 of 349:

    [+] ./IMG_6923 1.JPG
    [-] ./IMG_6923 2.JPG

...

With the prompt:

( Preserve files [1 - 2, all, help] ):

Tagging

Now with a selection made, we want to tag, or mark, that selection for deletion. We do that with this command:

ds

That will put a - sign in front of the selected files, like [-] ./IMG_6923 2.JPG.

Execution

Finally, we get those files deleted. Beware, this command is the one that counts: files will be removed from the filesystem upon execution.

To delete, type:

prune

(or press the Delete key).

Automatic delete

If you don't want an interactive delete, possibly in a simpler case, you can detect duplicates and delete them automatically with this command:

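fdupes --order=name -d -N .

--order=name will sort the files in each set by name, -d will delete found duplicates, and -N (or --noprompt) skips the prompt: fdupes keeps the first file in each set and deletes the rest without asking. The trailing . is the directory to scan.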
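Since the automatic mode doesn't ask before deleting, it may be worth a trial run on throwaway files first. Here's a minimal sketch, assuming fdupes is on your PATH; the scratch directory and file names are made up for illustration.

```shell
# Scratch directory with hypothetical files: two identical, one unique.
workdir=$(mktemp -d)
printf 'same\n'   > "$workdir/a.txt"
printf 'same\n'   > "$workdir/a 1.txt"
printf 'unique\n' > "$workdir/b.txt"

if command -v fdupes >/dev/null 2>&1; then
    # Same flags as above, pointed at the scratch directory instead of "."
    fdupes --order=name -d -N "$workdir"
else
    echo "fdupes not found on PATH; nothing deleted" >&2
fi

# With fdupes installed, one of the two identical files should be gone.
remaining=$(find "$workdir" -type f | wc -l | tr -d ' ')
echo "files remaining: $remaining"
ls -1 "$workdir"   # shows which copy was kept

rm -rf "$workdir"
```

Checking which copy survives here tells you whether the name ordering keeps the file you'd want before you run it on real data.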

For other options on automatic processes, use fdupes --help.

Now go out and regain that hard drive space!