Solved

long running compare process

Posted on 2004-08-25
Last Modified: 2010-04-23
Hi Experts,

I have a folder with subfolders containing files like:

c:\aaa\001\0001.txt c:\aaa\001\0002.txt
c:\aaa\002\0001.txt
c:\aaa\003\0001.txt c:\aaa\003\0002.txt

I need to get rid of duplicate files. Duplicates may be in different subfolders and have different names, but their
content is the same.

For example, c:\aaa\002\0001.txt may be the same as c:\aaa\003\0002.txt.

It will be a long-running process, since it compares each file to all the rest, but that is OK. The content is plain text
and can be compared at any level, whichever is faster.

I wrote it in JS and C but would like to see it in VB.

Please help.
Question by:fpoyavo

Accepted Solution

by:
Mike Tomlinson earned 500 total points
ID: 11898120
Here ya go...

Imports System.IO

Public Class Form1
    Inherits System.Windows.Forms.Form

#Region " Windows Form Designer generated code "

    Public Sub New()
        MyBase.New()

        'This call is required by the Windows Form Designer.
        InitializeComponent()

        'Add any initialization after the InitializeComponent() call

    End Sub

    'Form overrides dispose to clean up the component list.
    Protected Overloads Overrides Sub Dispose(ByVal disposing As Boolean)
        If disposing Then
            If Not (components Is Nothing) Then
                components.Dispose()
            End If
        End If
        MyBase.Dispose(disposing)
    End Sub

    'Required by the Windows Form Designer
    Private components As System.ComponentModel.IContainer

    'NOTE: The following procedure is required by the Windows Form Designer
    'It can be modified using the Windows Form Designer.  
    'Do not modify it using the code editor.
    Friend WithEvents Label1 As System.Windows.Forms.Label
    Friend WithEvents cmdRootPath As System.Windows.Forms.Button
    Friend WithEvents rootPath As System.Windows.Forms.TextBox
    Friend WithEvents cmdSearch As System.Windows.Forms.Button
    Friend WithEvents Label3 As System.Windows.Forms.Label
    Friend WithEvents lblCurrentFile As System.Windows.Forms.Label
    Friend WithEvents ProgressBar1 As System.Windows.Forms.ProgressBar
    Friend WithEvents lblDeleted As System.Windows.Forms.Label
    Friend WithEvents Label4 As System.Windows.Forms.Label
    Friend WithEvents FolderBrowserDialog1 As System.Windows.Forms.FolderBrowserDialog
    <System.Diagnostics.DebuggerStepThrough()> Private Sub InitializeComponent()
        Me.Label1 = New System.Windows.Forms.Label
        Me.rootPath = New System.Windows.Forms.TextBox
        Me.cmdRootPath = New System.Windows.Forms.Button
        Me.cmdSearch = New System.Windows.Forms.Button
        Me.Label3 = New System.Windows.Forms.Label
        Me.lblCurrentFile = New System.Windows.Forms.Label
        Me.ProgressBar1 = New System.Windows.Forms.ProgressBar
        Me.lblDeleted = New System.Windows.Forms.Label
        Me.Label4 = New System.Windows.Forms.Label
        Me.FolderBrowserDialog1 = New System.Windows.Forms.FolderBrowserDialog
        Me.SuspendLayout()
        '
        'Label1
        '
        Me.Label1.Location = New System.Drawing.Point(8, 8)
        Me.Label1.Name = "Label1"
        Me.Label1.Size = New System.Drawing.Size(80, 16)
        Me.Label1.TabIndex = 0
        Me.Label1.Text = "Root Path:"
        Me.Label1.TextAlign = System.Drawing.ContentAlignment.MiddleRight
        '
        'rootPath
        '
        Me.rootPath.Anchor = CType(((System.Windows.Forms.AnchorStyles.Top Or System.Windows.Forms.AnchorStyles.Left) _
                    Or System.Windows.Forms.AnchorStyles.Right), System.Windows.Forms.AnchorStyles)
        Me.rootPath.Location = New System.Drawing.Point(88, 8)
        Me.rootPath.Name = "rootPath"
        Me.rootPath.Size = New System.Drawing.Size(504, 20)
        Me.rootPath.TabIndex = 1
        Me.rootPath.Text = "C:\"
        '
        'cmdRootPath
        '
        Me.cmdRootPath.Anchor = CType((System.Windows.Forms.AnchorStyles.Top Or System.Windows.Forms.AnchorStyles.Right), System.Windows.Forms.AnchorStyles)
        Me.cmdRootPath.Location = New System.Drawing.Point(600, 8)
        Me.cmdRootPath.Name = "cmdRootPath"
        Me.cmdRootPath.Size = New System.Drawing.Size(72, 24)
        Me.cmdRootPath.TabIndex = 2
        Me.cmdRootPath.Text = "Select Path"
        '
        'cmdSearch
        '
        Me.cmdSearch.Anchor = CType((System.Windows.Forms.AnchorStyles.Top Or System.Windows.Forms.AnchorStyles.Right), System.Windows.Forms.AnchorStyles)
        Me.cmdSearch.Location = New System.Drawing.Point(600, 40)
        Me.cmdSearch.Name = "cmdSearch"
        Me.cmdSearch.Size = New System.Drawing.Size(72, 24)
        Me.cmdSearch.TabIndex = 8
        Me.cmdSearch.Text = "Search"
        '
        'Label3
        '
        Me.Label3.Location = New System.Drawing.Point(8, 48)
        Me.Label3.Name = "Label3"
        Me.Label3.Size = New System.Drawing.Size(72, 16)
        Me.Label3.TabIndex = 10
        Me.Label3.Text = "Searching:"
        Me.Label3.TextAlign = System.Drawing.ContentAlignment.TopRight
        '
        'lblCurrentFile
        '
        Me.lblCurrentFile.Location = New System.Drawing.Point(88, 48)
        Me.lblCurrentFile.Name = "lblCurrentFile"
        Me.lblCurrentFile.Size = New System.Drawing.Size(504, 32)
        Me.lblCurrentFile.TabIndex = 9
        '
        'ProgressBar1
        '
        Me.ProgressBar1.Location = New System.Drawing.Point(8, 136)
        Me.ProgressBar1.Name = "ProgressBar1"
        Me.ProgressBar1.Size = New System.Drawing.Size(664, 16)
        Me.ProgressBar1.TabIndex = 11
        '
        'lblDeleted
        '
        Me.lblDeleted.Location = New System.Drawing.Point(88, 88)
        Me.lblDeleted.Name = "lblDeleted"
        Me.lblDeleted.Size = New System.Drawing.Size(504, 32)
        Me.lblDeleted.TabIndex = 12
        '
        'Label4
        '
        Me.Label4.Location = New System.Drawing.Point(8, 88)
        Me.Label4.Name = "Label4"
        Me.Label4.Size = New System.Drawing.Size(72, 16)
        Me.Label4.TabIndex = 13
        Me.Label4.Text = "Deleted:"
        Me.Label4.TextAlign = System.Drawing.ContentAlignment.TopRight
        '
        'Form1
        '
        Me.AutoScaleBaseSize = New System.Drawing.Size(5, 13)
        Me.ClientSize = New System.Drawing.Size(680, 158)
        Me.Controls.Add(Me.Label4)
        Me.Controls.Add(Me.lblDeleted)
        Me.Controls.Add(Me.ProgressBar1)
        Me.Controls.Add(Me.Label3)
        Me.Controls.Add(Me.lblCurrentFile)
        Me.Controls.Add(Me.cmdSearch)
        Me.Controls.Add(Me.cmdRootPath)
        Me.Controls.Add(Me.rootPath)
        Me.Controls.Add(Me.Label1)
        Me.Name = "Form1"
        Me.Text = "Find and delete duplicate (contents) text files"
        Me.ResumeLayout(False)

    End Sub

#End Region

    Private fileList As ArrayList
    Private duplicatesDeleted As Integer

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        RootPathChanged(Nothing, Nothing)
    End Sub

    Private Sub cmdRootPath_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdRootPath.Click
        If FolderBrowserDialog1.ShowDialog = DialogResult.OK Then
            rootPath.Text = FolderBrowserDialog1.SelectedPath
        End If
    End Sub

    Private Sub RootPathChanged(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles rootPath.TextChanged
        cmdSearch.Enabled = Directory.Exists(rootPath.Text)
    End Sub

    Private Sub cmdSearch_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdSearch.Click
        cmdRootPath.Enabled = False
        rootPath.Enabled = False
        cmdSearch.Enabled = False
        ProgressBar1.Value = 0

        fileList = New ArrayList
        lblCurrentFile.Text = ""
        lblDeleted.Text = ""

        Label3.Text = "Searching:"
        searchForTargetFiles(rootPath.Text)
        lblCurrentFile.Text = ""

        If fileList.Count >= 2 Then
            compareFiles()
            MsgBox(duplicatesDeleted & " duplicate(s) deleted", MsgBoxStyle.Information, "Done")
        Else
            MsgBox("Fewer than two .txt files found; nothing to compare")
        End If

        lblCurrentFile.Text = ""
        lblDeleted.Text = ""
        ProgressBar1.Value = 0
        cmdSearch.Enabled = True
        rootPath.Enabled = True
        cmdRootPath.Enabled = True
    End Sub

    ' Recursively collects the full path of every *.txt file under pathToSearch into fileList.
    Private Sub searchForTargetFiles(ByVal pathToSearch As String)
        Dim fi As FileInfo
        Dim di As DirectoryInfo
        Dim subDI As DirectoryInfo

        lblCurrentFile.Text = pathToSearch
        lblCurrentFile.Refresh()
        Application.DoEvents()
        di = New DirectoryInfo(pathToSearch)
        For Each fi In di.GetFiles("*.txt")
            fileList.Add(fi.FullName)
        Next

        For Each subDI In di.GetDirectories
            Application.DoEvents()
            searchForTargetFiles(subDI.FullName)
        Next
    End Sub

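    ' Compares each file with every file after it in fileList and deletes
    ' the later file whenever the two contents are identical.
    Private Sub compareFiles()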
        Dim i As Integer
        Dim j As Integer
        Dim filename1 As String
        Dim filename2 As String
        Dim sw As StreamReader
        Dim contents1 As String
        Dim contents2 As String
        Dim readError As Boolean
        Dim p As Integer

        Label3.Text = "Comparing:"
        Label3.Refresh()
        i = 0
        duplicatesDeleted = 0
        While i <= fileList.Count - 1
            filename1 = fileList(i)
            lblCurrentFile.Text = filename1
            lblCurrentFile.Refresh()
            Try
                readError = False
                sw = New StreamReader(filename1)
                contents1 = sw.ReadToEnd
                sw.Close()
            Catch ex As Exception
                readError = True
                MsgBox(filename1 & vbCrLf & vbCrLf & ex.Message, MsgBoxStyle.Critical, "Unable to read file")
            End Try

            If Not readError Then
                j = i + 1
                While j <= fileList.Count - 1
                    Try
                        readError = False
                        filename2 = fileList(j)
                        sw = New StreamReader(filename2)
                        contents2 = sw.ReadToEnd
                        sw.Close()
                    Catch ex As Exception
                        readError = True
                        MsgBox(filename2 & vbCrLf & vbCrLf & ex.Message, MsgBoxStyle.Critical, "Unable to read file")
                    End Try

                    If Not readError Then
                        If contents2.Equals(contents1) Then
                            Try
                                File.Delete(filename2)
                                duplicatesDeleted = duplicatesDeleted + 1
                                lblDeleted.Text = filename2
                                fileList.RemoveAt(j)

                                ' do not increment j
                                ' file was removed from arraylist
                                ' and everything shifted down
                            Catch ex As Exception
                                MsgBox(filename2 & vbCrLf & vbCrLf & ex.Message, MsgBoxStyle.Critical, "Unable to delete duplicate file")
                                j = j + 1 ' move to next file
                            End Try
                        Else
                            j = j + 1 ' move to next file
                        End If
                    Else
                        j = j + 1 ' skip the unreadable file so the loop cannot stall on it
                    End If
                    Application.DoEvents()
                End While
            End If

            i = i + 1
            p = CInt(i / fileList.Count * 100)
            ProgressBar1.Value = p
            Application.DoEvents()
        End While
    End Sub

End Class
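
A note on speed: the code above rereads file j for every file i, so the work grows with the square of the number of files. A common alternative, sketched here under some assumptions, is to group files by length first and hash only the files that share a length. This is not the code from the accepted answer: DuplicateFinder, HashFile and FindDuplicates are hypothetical names, it assumes .NET 2.0 or later (generics, Using, SearchOption.AllDirectories), and it only lists duplicates instead of deleting them. It also compares raw bytes, so two files holding the same text in different encodings would not match, unlike the StreamReader comparison above.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Security.Cryptography

' Hypothetical console sketch, not part of the original form.
Module DuplicateFinder

    ' Returns the SHA-256 hash of the file's raw bytes as a hex string.
    Function HashFile(ByVal filePath As String) As String
        Using sha As SHA256 = SHA256.Create()
            Using fs As FileStream = File.OpenRead(filePath)
                Return BitConverter.ToString(sha.ComputeHash(fs))
            End Using
        End Using
    End Function

    ' Returns the paths of files whose content matches an earlier file.
    Function FindDuplicates(ByVal root As String) As List(Of String)
        ' Step 1: bucket by length; files of different sizes cannot be identical.
        Dim bySize As New Dictionary(Of Long, List(Of String))
        For Each filePath As String In Directory.GetFiles(root, "*.txt", SearchOption.AllDirectories)
            Dim size As Long = New FileInfo(filePath).Length
            If Not bySize.ContainsKey(size) Then bySize(size) = New List(Of String)
            bySize(size).Add(filePath)
        Next

        ' Step 2: within each bucket, hash each file once and look for repeats.
        Dim duplicates As New List(Of String)
        For Each bucket As List(Of String) In bySize.Values
            If bucket.Count < 2 Then Continue For
            Dim seen As New Dictionary(Of String, String)
            For Each filePath As String In bucket
                Dim h As String = HashFile(filePath)
                If seen.ContainsKey(h) Then
                    duplicates.Add(filePath) ' same size and same hash as seen(h)
                Else
                    seen(h) = filePath
                End If
            Next
        Next
        Return duplicates
    End Function

    Sub Main(ByVal args As String())
        If args.Length <> 1 Then
            Console.WriteLine("Usage: DuplicateFinder <root folder>")
            Return
        End If
        For Each dup As String In FindDuplicates(args(0))
            Console.WriteLine(dup)
        Next
    End Sub

End Module
```

Each file is read at most once, and files with a unique length are never read at all. Which file of a duplicate pair gets reported depends on the order Directory.GetFiles returns, which is not guaranteed. If the list is used for deleting, a final byte-by-byte check before File.Delete would rule out the (extremely unlikely) case of a hash collision.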

Author Comment

by:fpoyavo
ID: 11902218
Wow. Let me try it.

Author Comment

by:fpoyavo
ID: 11902425
Idle Mind,

Hmm... it runs and scans the directory and subdirectories, but it does not find any duplicates.

Thank you.

Author Comment

by:fpoyavo
ID: 11902451
Sorry, it does.

Thanks a lot.