Solved

long running compare process

Posted on 2004-08-25
Last Modified: 2010-04-23
Hi Experts,

I have a folder with subfolders containing files like

c:\aaa\001\0001.txt c:\aaa\001\0002.txt
c:\aaa\002\0001.txt
c:\aaa\003\0001.txt c:\aaa\003\0002.txt

I need to get rid of duplicate files. They might be in different subfolders and have different names, but the file content
may be the same.

For example, c:\aaa\002\0001.txt may be the same as c:\aaa\003\0002.txt

It will be a long-running process, since it compares each file to all the rest, but that is OK. The content is plain text and can be
checked at whatever level is fastest.

I wrote it in JS and C but would like to see it in VB.

Please help.
Question by:fpoyavo
LVL 85

Accepted Solution

by:
Mike Tomlinson earned 500 total points
ID: 11898120
Here ya go...

Imports System.IO

Public Class Form1
    Inherits System.Windows.Forms.Form

#Region " Windows Form Designer generated code "

    Public Sub New()
        MyBase.New()

        'This call is required by the Windows Form Designer.
        InitializeComponent()

        'Add any initialization after the InitializeComponent() call

    End Sub

    'Form overrides dispose to clean up the component list.
    Protected Overloads Overrides Sub Dispose(ByVal disposing As Boolean)
        If disposing Then
            If Not (components Is Nothing) Then
                components.Dispose()
            End If
        End If
        MyBase.Dispose(disposing)
    End Sub

    'Required by the Windows Form Designer
    Private components As System.ComponentModel.IContainer

    'NOTE: The following procedure is required by the Windows Form Designer
    'It can be modified using the Windows Form Designer.  
    'Do not modify it using the code editor.
    Friend WithEvents Label1 As System.Windows.Forms.Label
    Friend WithEvents cmdRootPath As System.Windows.Forms.Button
    Friend WithEvents rootPath As System.Windows.Forms.TextBox
    Friend WithEvents cmdSearch As System.Windows.Forms.Button
    Friend WithEvents Label3 As System.Windows.Forms.Label
    Friend WithEvents lblCurrentFile As System.Windows.Forms.Label
    Friend WithEvents ProgressBar1 As System.Windows.Forms.ProgressBar
    Friend WithEvents lblDeleted As System.Windows.Forms.Label
    Friend WithEvents Label4 As System.Windows.Forms.Label
    Friend WithEvents FolderBrowserDialog1 As System.Windows.Forms.FolderBrowserDialog
    <System.Diagnostics.DebuggerStepThrough()> Private Sub InitializeComponent()
        Me.Label1 = New System.Windows.Forms.Label
        Me.rootPath = New System.Windows.Forms.TextBox
        Me.cmdRootPath = New System.Windows.Forms.Button
        Me.cmdSearch = New System.Windows.Forms.Button
        Me.Label3 = New System.Windows.Forms.Label
        Me.lblCurrentFile = New System.Windows.Forms.Label
        Me.ProgressBar1 = New System.Windows.Forms.ProgressBar
        Me.lblDeleted = New System.Windows.Forms.Label
        Me.Label4 = New System.Windows.Forms.Label
        Me.FolderBrowserDialog1 = New System.Windows.Forms.FolderBrowserDialog
        Me.SuspendLayout()
        '
        'Label1
        '
        Me.Label1.Location = New System.Drawing.Point(8, 8)
        Me.Label1.Name = "Label1"
        Me.Label1.Size = New System.Drawing.Size(80, 16)
        Me.Label1.TabIndex = 0
        Me.Label1.Text = "Root Path:"
        Me.Label1.TextAlign = System.Drawing.ContentAlignment.MiddleRight
        '
        'rootPath
        '
        Me.rootPath.Anchor = CType(((System.Windows.Forms.AnchorStyles.Top Or System.Windows.Forms.AnchorStyles.Left) _
                    Or System.Windows.Forms.AnchorStyles.Right), System.Windows.Forms.AnchorStyles)
        Me.rootPath.Location = New System.Drawing.Point(88, 8)
        Me.rootPath.Name = "rootPath"
        Me.rootPath.Size = New System.Drawing.Size(504, 20)
        Me.rootPath.TabIndex = 1
        Me.rootPath.Text = "C:\"
        '
        'cmdRootPath
        '
        Me.cmdRootPath.Anchor = CType((System.Windows.Forms.AnchorStyles.Top Or System.Windows.Forms.AnchorStyles.Right), System.Windows.Forms.AnchorStyles)
        Me.cmdRootPath.Location = New System.Drawing.Point(600, 8)
        Me.cmdRootPath.Name = "cmdRootPath"
        Me.cmdRootPath.Size = New System.Drawing.Size(72, 24)
        Me.cmdRootPath.TabIndex = 2
        Me.cmdRootPath.Text = "Select Path"
        '
        'cmdSearch
        '
        Me.cmdSearch.Anchor = CType((System.Windows.Forms.AnchorStyles.Top Or System.Windows.Forms.AnchorStyles.Right), System.Windows.Forms.AnchorStyles)
        Me.cmdSearch.Location = New System.Drawing.Point(600, 40)
        Me.cmdSearch.Name = "cmdSearch"
        Me.cmdSearch.Size = New System.Drawing.Size(72, 24)
        Me.cmdSearch.TabIndex = 8
        Me.cmdSearch.Text = "Search"
        '
        'Label3
        '
        Me.Label3.Location = New System.Drawing.Point(8, 48)
        Me.Label3.Name = "Label3"
        Me.Label3.Size = New System.Drawing.Size(72, 16)
        Me.Label3.TabIndex = 10
        Me.Label3.Text = "Searching:"
        Me.Label3.TextAlign = System.Drawing.ContentAlignment.TopRight
        '
        'lblCurrentFile
        '
        Me.lblCurrentFile.Location = New System.Drawing.Point(88, 48)
        Me.lblCurrentFile.Name = "lblCurrentFile"
        Me.lblCurrentFile.Size = New System.Drawing.Size(504, 32)
        Me.lblCurrentFile.TabIndex = 9
        '
        'ProgressBar1
        '
        Me.ProgressBar1.Location = New System.Drawing.Point(8, 136)
        Me.ProgressBar1.Name = "ProgressBar1"
        Me.ProgressBar1.Size = New System.Drawing.Size(664, 16)
        Me.ProgressBar1.TabIndex = 11
        '
        'lblDeleted
        '
        Me.lblDeleted.Location = New System.Drawing.Point(88, 88)
        Me.lblDeleted.Name = "lblDeleted"
        Me.lblDeleted.Size = New System.Drawing.Size(504, 32)
        Me.lblDeleted.TabIndex = 12
        '
        'Label4
        '
        Me.Label4.Location = New System.Drawing.Point(8, 88)
        Me.Label4.Name = "Label4"
        Me.Label4.Size = New System.Drawing.Size(72, 16)
        Me.Label4.TabIndex = 13
        Me.Label4.Text = "Deleted:"
        Me.Label4.TextAlign = System.Drawing.ContentAlignment.TopRight
        '
        'Form1
        '
        Me.AutoScaleBaseSize = New System.Drawing.Size(5, 13)
        Me.ClientSize = New System.Drawing.Size(680, 158)
        Me.Controls.Add(Me.Label4)
        Me.Controls.Add(Me.lblDeleted)
        Me.Controls.Add(Me.ProgressBar1)
        Me.Controls.Add(Me.Label3)
        Me.Controls.Add(Me.lblCurrentFile)
        Me.Controls.Add(Me.cmdSearch)
        Me.Controls.Add(Me.cmdRootPath)
        Me.Controls.Add(Me.rootPath)
        Me.Controls.Add(Me.Label1)
        Me.Name = "Form1"
        Me.Text = "Find and delete duplicate (contents) text files"
        Me.ResumeLayout(False)

    End Sub

#End Region

    Private fileList As ArrayList
    Private duplicatesDeleted As Integer

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        RootPathChanged(Nothing, Nothing)
    End Sub

    Private Sub cmdRootPath_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdRootPath.Click
        If FolderBrowserDialog1.ShowDialog = DialogResult.OK Then
            rootPath.Text = FolderBrowserDialog1.SelectedPath
        End If
    End Sub

    Private Sub RootPathChanged(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles rootPath.TextChanged
        cmdSearch.Enabled = Directory.Exists(rootPath.Text)
    End Sub

    Private Sub cmdSearch_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdSearch.Click

        cmdRootPath.Enabled = False
        rootPath.Enabled = False
        cmdSearch.Enabled = False
        ProgressBar1.Value = 0

        fileList = New ArrayList
        lblCurrentFile.Text = ""
        lblDeleted.Text = ""

        Label3.Text = "Searching:"
        searchForTargetFiles(rootPath.Text)
        lblCurrentFile.Text = ""

        If fileList.Count >= 2 Then
            compareFiles()
            MsgBox(duplicatesDeleted & " duplicate(s) deleted", MsgBoxStyle.Information, "Done")
        Else
            MsgBox("No Duplicates Found")
        End If

        lblCurrentFile.Text = ""
        lblDeleted.Text = ""
        ProgressBar1.Value = 0
        cmdSearch.Enabled = True
        rootPath.Enabled = True
        cmdRootPath.Enabled = True
    End Sub

    Private Sub searchForTargetFiles(ByVal pathToSearch As String)
        Dim fi As FileInfo
        Dim di As DirectoryInfo
        Dim subDI As DirectoryInfo

        lblCurrentFile.Text = pathToSearch
        lblCurrentFile.Refresh()
        Application.DoEvents()
        di = New DirectoryInfo(pathToSearch)
        For Each fi In di.GetFiles("*.txt")
            fileList.Add(fi.FullName)
        Next

        For Each subDI In di.GetDirectories
            Application.DoEvents()
            searchForTargetFiles(subDI.FullName)
        Next
    End Sub

    Private Sub compareFiles()
        Dim i As Integer
        Dim j As Integer
        Dim filename1 As String
        Dim filename2 As String
        Dim sw As StreamReader
        Dim contents1 As String
        Dim contents2 As String
        Dim readError As Boolean
        Dim p As Integer

        Label3.Text = "Comparing:"
        Label3.Refresh()
        i = 0
        duplicatesDeleted = 0
        While i <= fileList.Count - 1
            filename1 = fileList(i)
            lblCurrentFile.Text = filename1
            lblCurrentFile.Refresh()
            Try
                readError = False
                sw = New StreamReader(filename1)
                contents1 = sw.ReadToEnd
                sw.Close()
            Catch ex As Exception
                readError = True
                MsgBox(filename1 & vbCrLf & vbCrLf & ex.Message, MsgBoxStyle.Critical, "Unable to read file")
            End Try

            If Not readError Then
                j = i + 1
                While j <= fileList.Count - 1
                    Try
                        readError = False
                        filename2 = fileList(j)
                        sw = New StreamReader(filename2)
                        contents2 = sw.ReadToEnd
                        sw.Close()
                    Catch ex As Exception
                        readError = True
                        MsgBox(filename2 & vbCrLf & vbCrLf & ex.Message, MsgBoxStyle.Critical, "Unable to read file")
                    End Try

                    If Not readError Then
                        If contents2.Equals(contents1) Then
                            Try
                                File.Delete(filename2)
                                duplicatesDeleted = duplicatesDeleted + 1
                                lblDeleted.Text = filename2
                                fileList.RemoveAt(j)

                                ' do not increment j
                                ' file was removed from arraylist
                                ' and everything shifted down
                            Catch ex As Exception
                                MsgBox(filename2 & vbCrLf & vbCrLf & ex.Message, MsgBoxStyle.Critical, "Unable to delete duplicate file")
                                j = j + 1 ' move to next file
                            End Try
                        Else
                            j = j + 1 ' move to next file
                        End If
                    Else
                        j = j + 1 ' skip the unreadable file so the loop cannot spin forever
                    End If
                    Application.DoEvents()
                End While
            End If

            i = i + 1
            p = CInt(i / fileList.Count * 100)
            ProgressBar1.Value = p
            Application.DoEvents()
        End While
    End Sub

End Class
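
A note on speed: the code above re-reads the second file for every pair, so the work grows with the square of the file count. One common alternative, sketched here and not part of the original answer, is to hash each file once and treat files with equal digests as duplicates. The sketch assumes a newer .NET Framework than the code above (generics, Using blocks, SearchOption), and the names DuplicateFinder, HashFile and FindDuplicates are made up for illustration. It also compares raw bytes, not decoded text, and it only reports duplicates without deleting anything.

```vb
Imports System.Collections.Generic
Imports System.IO
Imports System.Security.Cryptography

Module DuplicateFinder

    ' Hypothetical helper: returns the SHA-256 digest of a file as a hex string.
    Private Function HashFile(ByVal filePath As String) As String
        Using sha As SHA256 = SHA256.Create()
            Using fs As FileStream = File.OpenRead(filePath)
                Return BitConverter.ToString(sha.ComputeHash(fs))
            End Using
        End Using
    End Function

    ' Hypothetical helper: returns every *.txt file under root whose bytes
    ' match a file seen earlier in the scan. The first copy is not listed.
    Public Function FindDuplicates(ByVal root As String) As List(Of String)
        Dim seen As New Dictionary(Of String, String)   ' digest -> first path with that digest
        Dim duplicates As New List(Of String)

        For Each filePath As String In Directory.GetFiles(root, "*.txt", SearchOption.AllDirectories)
            Dim digest As String = HashFile(filePath)
            If seen.ContainsKey(digest) Then
                duplicates.Add(filePath)
            Else
                seen.Add(digest, filePath)
            End If
        Next

        Return duplicates
    End Function

End Module
```

Each file is read exactly once this way. If you want to be extra careful before calling File.Delete on an entry in the returned list, you could still do a byte-for-byte comparison against the first file with the same digest.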
LVL 1

Author Comment

by:fpoyavo
ID: 11902218
Wow. Let me try it.
LVL 1

Author Comment

by:fpoyavo
ID: 11902425
Idle Mind,

Hmm... it runs and scans the directory and subdirectories, but it does not find dups.

Thank you.
LVL 1

Author Comment

by:fpoyavo
ID: 11902451
Sorry, it does.

Thanks a lot.