Make This Tech Work (Archive)

Python script to remove duplicate files

2018-11-15 10:46:28

 


Remove duplicate files with a Python script on Windows (might work on Mac as well)

Here's a Python script to remove duplicate files in a Windows file tree. It may also work on Mac, but I haven't tested it there. A file counts as a duplicate when its name starts with "1-" and a file with the same name minus that prefix exists in the same directory. The comments in the script explain how it works. When you're ready to actually delete the duplicates, uncomment the "os.remove" line (highlighted). Run as it is below, the script prints "removing <file>" for each duplicate but doesn't delete anything.

 



This script is based on the code here: https://thispointer.com/python-how-to-get-list-of-files-in-directory-and-sub-directories/.

You can add a usage section to this script by using the script in my other post here

This script can also be found on GitHub here: https://github.com/martyh1/python

[code language="python" wraplines="false" collapse="false" highlight="27,45"]
import os
from pathlib import Path

'''
    This script uses os lib methods to traverse a directory tree and do the following:
    - check each file and see if another file exists with the same name but prepended with a string.  In this case, the string is "1-".

    I created this because the MusicBee music software for PC created duplicates when I synced/transferred music to my Sony NW-A45 HD music player, since I
      hadn't set up the settings in MusicBee correctly.

    This script could be easily modified to find duplicate files using other search criteria.
'''

def getAllFilesInTree(directoryName):
    # Get the list of duplicate files (based on a specified string) in directory tree
    entriesInCurrentDir = os.listdir(directoryName)
    allFiles = list()
    # Iterate over all the entries
    for entry in entriesInCurrentDir:
        # Create full path
        fullPath = os.path.join(directoryName, entry)
        # If entry is a directory then get the list of files in this directory
        if os.path.isdir(fullPath):
            allFiles = allFiles + getAllFilesInTree(fullPath)
        else:
            fileName = entry  # just for naming convenience, use a new var.  Assuming if it is NOT a dir then it's a file.
            if fileName[0:2] == '1-':   # <--- this is the string to check for
                my_file = Path(os.path.join(directoryName, fileName[2:]))
                if my_file.is_file():
                    allFiles.append(fullPath)

    return allFiles        

def main():

    startingDirectory = '.'

    # Get the list of duplicate files (based on a specified string) in directory tree at given path
    allFiles = getAllFilesInTree(startingDirectory)

    # Print (and remove) the duplicate files
    for fileName in allFiles:
        print("removing {}".format(fileName))
        #os.remove(fileName) # <--- uncomment this line to actually remove each duplicate file

if __name__ == '__main__':
    main()[/code]
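
As a rough sketch of the "other search criteria" idea mentioned in the script's comments, here is one way the same check could be written with a configurable prefix and a dry-run switch instead of a commented-out line. This is not part of the original script: the names `find_prefixed_duplicates`, `remove_files`, `prefix` and `dry_run` are made up for illustration, and it swaps the manual recursion for `os.walk`. It also checks the directory listing rather than calling `Path.is_file()`, so treat it as an approximation of the script above, not a drop-in replacement.

```python
import os
import tempfile


def find_prefixed_duplicates(root, prefix="1-"):
    """Return paths of files named prefix + X that sit next to a file named X.

    Hypothetical helper: 'prefix' plays the role of the hard-coded '1-' string.
    """
    if not prefix:
        # An empty prefix would make every file match itself
        raise ValueError("prefix must not be empty")
    duplicates = []
    # os.walk visits every directory under root, so no manual recursion is needed
    for dir_path, _dir_names, file_names in os.walk(root):
        names = set(file_names)
        for name in file_names:
            original = name[len(prefix):]
            if name.startswith(prefix) and original in names:
                duplicates.append(os.path.join(dir_path, name))
    return sorted(duplicates)


def remove_files(paths, dry_run=True):
    """Print each path and delete it only when dry_run is False."""
    for path in paths:
        print("removing {}".format(path))
        if not dry_run:
            os.remove(path)


# Small self-check in a throwaway directory, so no real files are touched
with tempfile.TemporaryDirectory() as tmp:
    for name in ("song.mp3", "1-song.mp3", "1-only.mp3"):
        open(os.path.join(tmp, name), "w").close()
    found = find_prefixed_duplicates(tmp)
    remove_files(found)                 # dry run: prints, deletes nothing
    still_there = os.path.exists(found[0])
    remove_files(found, dry_run=False)  # now the duplicate is deleted
    remaining = sorted(os.listdir(tmp))
```

In this sketch "1-only.mp3" is left alone because there is no "only.mp3" beside it, which matches the behaviour of the original check. Keeping `dry_run=True` as the default means a careless run only prints what it would delete.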