Solved

Long-running compare process

Posted on 2004-08-25
Last Modified: 2010-04-23
Hi Experts,

I have a folder with subfolders containing files like:

c:\aaa\001\0001.txt c:\aaa\001\0002.txt
c:\aaa\002\0001.txt
c:\aaa\003\0001.txt c:\aaa\003\0002.txt

I need to get rid of duplicate files. They might be in different subfolders and have different names, but the file content
may be the same.

For example, c:\aaa\002\0001.txt may be the same as c:\aaa\003\0002.txt.

It will be a long-running process, since it will compare each file to all the rest, but that is OK. The content is plain text and can be
compared at whatever level is fastest.

I wrote it in JS and C but would like to see it in VB.

Please help.
Question by:fpoyavo
4 Comments
 

Accepted Solution

by:
Mike Tomlinson earned 500 total points
ID: 11898120
Here ya go...

Imports System.IO

Public Class Form1
    Inherits System.Windows.Forms.Form

#Region " Windows Form Designer generated code "

    Public Sub New()
        MyBase.New()

        'This call is required by the Windows Form Designer.
        InitializeComponent()

        'Add any initialization after the InitializeComponent() call

    End Sub

    'Form overrides dispose to clean up the component list.
    Protected Overloads Overrides Sub Dispose(ByVal disposing As Boolean)
        If disposing Then
            If Not (components Is Nothing) Then
                components.Dispose()
            End If
        End If
        MyBase.Dispose(disposing)
    End Sub

    'Required by the Windows Form Designer
    Private components As System.ComponentModel.IContainer

    'NOTE: The following procedure is required by the Windows Form Designer
    'It can be modified using the Windows Form Designer.  
    'Do not modify it using the code editor.
    Friend WithEvents Label1 As System.Windows.Forms.Label
    Friend WithEvents cmdRootPath As System.Windows.Forms.Button
    Friend WithEvents rootPath As System.Windows.Forms.TextBox
    Friend WithEvents cmdSearch As System.Windows.Forms.Button
    Friend WithEvents Label3 As System.Windows.Forms.Label
    Friend WithEvents lblCurrentFile As System.Windows.Forms.Label
    Friend WithEvents ProgressBar1 As System.Windows.Forms.ProgressBar
    Friend WithEvents lblDeleted As System.Windows.Forms.Label
    Friend WithEvents Label4 As System.Windows.Forms.Label
    Friend WithEvents FolderBrowserDialog1 As System.Windows.Forms.FolderBrowserDialog
    <System.Diagnostics.DebuggerStepThrough()> Private Sub InitializeComponent()
        Me.Label1 = New System.Windows.Forms.Label
        Me.rootPath = New System.Windows.Forms.TextBox
        Me.cmdRootPath = New System.Windows.Forms.Button
        Me.cmdSearch = New System.Windows.Forms.Button
        Me.Label3 = New System.Windows.Forms.Label
        Me.lblCurrentFile = New System.Windows.Forms.Label
        Me.ProgressBar1 = New System.Windows.Forms.ProgressBar
        Me.lblDeleted = New System.Windows.Forms.Label
        Me.Label4 = New System.Windows.Forms.Label
        Me.FolderBrowserDialog1 = New System.Windows.Forms.FolderBrowserDialog
        Me.SuspendLayout()
        '
        'Label1
        '
        Me.Label1.Location = New System.Drawing.Point(8, 8)
        Me.Label1.Name = "Label1"
        Me.Label1.Size = New System.Drawing.Size(80, 16)
        Me.Label1.TabIndex = 0
        Me.Label1.Text = "Root Path:"
        Me.Label1.TextAlign = System.Drawing.ContentAlignment.MiddleRight
        '
        'rootPath
        '
        Me.rootPath.Anchor = CType(((System.Windows.Forms.AnchorStyles.Top Or System.Windows.Forms.AnchorStyles.Left) _
                    Or System.Windows.Forms.AnchorStyles.Right), System.Windows.Forms.AnchorStyles)
        Me.rootPath.Location = New System.Drawing.Point(88, 8)
        Me.rootPath.Name = "rootPath"
        Me.rootPath.Size = New System.Drawing.Size(504, 20)
        Me.rootPath.TabIndex = 1
        Me.rootPath.Text = "C:\"
        '
        'cmdRootPath
        '
        Me.cmdRootPath.Anchor = CType((System.Windows.Forms.AnchorStyles.Top Or System.Windows.Forms.AnchorStyles.Right), System.Windows.Forms.AnchorStyles)
        Me.cmdRootPath.Location = New System.Drawing.Point(600, 8)
        Me.cmdRootPath.Name = "cmdRootPath"
        Me.cmdRootPath.Size = New System.Drawing.Size(72, 24)
        Me.cmdRootPath.TabIndex = 2
        Me.cmdRootPath.Text = "Select Path"
        '
        'cmdSearch
        '
        Me.cmdSearch.Anchor = CType((System.Windows.Forms.AnchorStyles.Top Or System.Windows.Forms.AnchorStyles.Right), System.Windows.Forms.AnchorStyles)
        Me.cmdSearch.Location = New System.Drawing.Point(600, 40)
        Me.cmdSearch.Name = "cmdSearch"
        Me.cmdSearch.Size = New System.Drawing.Size(72, 24)
        Me.cmdSearch.TabIndex = 8
        Me.cmdSearch.Text = "Search"
        '
        'Label3
        '
        Me.Label3.Location = New System.Drawing.Point(8, 48)
        Me.Label3.Name = "Label3"
        Me.Label3.Size = New System.Drawing.Size(72, 16)
        Me.Label3.TabIndex = 10
        Me.Label3.Text = "Searching:"
        Me.Label3.TextAlign = System.Drawing.ContentAlignment.TopRight
        '
        'lblCurrentFile
        '
        Me.lblCurrentFile.Location = New System.Drawing.Point(88, 48)
        Me.lblCurrentFile.Name = "lblCurrentFile"
        Me.lblCurrentFile.Size = New System.Drawing.Size(504, 32)
        Me.lblCurrentFile.TabIndex = 9
        '
        'ProgressBar1
        '
        Me.ProgressBar1.Location = New System.Drawing.Point(8, 136)
        Me.ProgressBar1.Name = "ProgressBar1"
        Me.ProgressBar1.Size = New System.Drawing.Size(664, 16)
        Me.ProgressBar1.TabIndex = 11
        '
        'lblDeleted
        '
        Me.lblDeleted.Location = New System.Drawing.Point(88, 88)
        Me.lblDeleted.Name = "lblDeleted"
        Me.lblDeleted.Size = New System.Drawing.Size(504, 32)
        Me.lblDeleted.TabIndex = 12
        '
        'Label4
        '
        Me.Label4.Location = New System.Drawing.Point(8, 88)
        Me.Label4.Name = "Label4"
        Me.Label4.Size = New System.Drawing.Size(72, 16)
        Me.Label4.TabIndex = 13
        Me.Label4.Text = "Deleted:"
        Me.Label4.TextAlign = System.Drawing.ContentAlignment.TopRight
        '
        'Form1
        '
        Me.AutoScaleBaseSize = New System.Drawing.Size(5, 13)
        Me.ClientSize = New System.Drawing.Size(680, 158)
        Me.Controls.Add(Me.Label4)
        Me.Controls.Add(Me.lblDeleted)
        Me.Controls.Add(Me.ProgressBar1)
        Me.Controls.Add(Me.Label3)
        Me.Controls.Add(Me.lblCurrentFile)
        Me.Controls.Add(Me.cmdSearch)
        Me.Controls.Add(Me.cmdRootPath)
        Me.Controls.Add(Me.rootPath)
        Me.Controls.Add(Me.Label1)
        Me.Name = "Form1"
        Me.Text = "Find and delete duplicate (contents) text files"
        Me.ResumeLayout(False)

    End Sub

#End Region

    Private fileList As ArrayList
    Private duplicatesDeleted As Integer

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        RootPathChanged(Nothing, Nothing)
    End Sub

    Private Sub cmdRootPath_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdRootPath.Click
        If FolderBrowserDialog1.ShowDialog = DialogResult.OK Then
            rootPath.Text = FolderBrowserDialog1.SelectedPath
        End If
    End Sub

    Private Sub RootPathChanged(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles rootPath.TextChanged
        cmdSearch.Enabled = Directory.Exists(rootPath.Text)
    End Sub

    Private Sub cmdSearch_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdSearch.Click

        cmdRootPath.Enabled = False
        rootPath.Enabled = False
        cmdSearch.Enabled = False
        ProgressBar1.Value = 0

        fileList = New ArrayList
        lblCurrentFile.Text = ""
        lblDeleted.Text = ""

        Label3.Text = "Searching:"
        searchForTargetFiles(rootPath.Text)
        lblCurrentFile.Text = ""

        If fileList.Count >= 2 Then
            compareFiles()
            MsgBox(duplicatesDeleted & " duplicate(s) deleted", MsgBoxStyle.Information, "Done")
        Else
            MsgBox("No Duplicates Found")
        End If

        lblCurrentFile.Text = ""
        lblDeleted.Text = ""
        ProgressBar1.Value = 0
        cmdSearch.Enabled = True
        rootPath.Enabled = True
        cmdRootPath.Enabled = True
    End Sub

    Private Sub searchForTargetFiles(ByVal pathToSearch As String)
        Dim fi As FileInfo
        Dim di As DirectoryInfo
        Dim subDI As DirectoryInfo

        lblCurrentFile.Text = pathToSearch
        lblCurrentFile.Refresh()
        Application.DoEvents()
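        Try
            di = New DirectoryInfo(pathToSearch)
            For Each fi In di.GetFiles("*.txt")
                fileList.Add(fi.FullName)
            Next

            For Each subDI In di.GetDirectories
                Application.DoEvents()
                searchForTargetFiles(subDI.FullName)
            Next
        Catch ex As UnauthorizedAccessException
            ' Skip folders the current user cannot read (common when scanning from C:\)
        End Try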
    End Sub

    Private Sub compareFiles()
        Dim i As Integer
        Dim j As Integer
        Dim filename1 As String
        Dim filename2 As String
        Dim sw As StreamReader
        Dim contents1 As String
        Dim contents2 As String
        Dim readError As Boolean
        Dim p As Integer

        Label3.Text = "Comparing:"
        Label3.Refresh()
        i = 0
        duplicatesDeleted = 0
        While i <= fileList.Count - 1
            filename1 = fileList(i)
            lblCurrentFile.Text = filename1
            lblCurrentFile.Refresh()
            Try
                readError = False
                sw = New StreamReader(filename1)
                contents1 = sw.ReadToEnd
                sw.Close()
            Catch ex As Exception
                readError = True
                MsgBox(filename1 & vbCrLf & vbCrLf & ex.Message, MsgBoxStyle.Critical, "Unable to read file")
            End Try

            If Not readError Then
                j = i + 1
                While j <= fileList.Count - 1
                    Try
                        readError = False
                        filename2 = fileList(j)
                        sw = New StreamReader(filename2)
                        contents2 = sw.ReadToEnd
                        sw.Close()
                    Catch ex As Exception
                        readError = True
                        MsgBox(filename2 & vbCrLf & vbCrLf & ex.Message, MsgBoxStyle.Critical, "Unable to read file")
                        j = j + 1 ' skip the unreadable file; otherwise this loop never advances
                    End Try

                    If Not readError Then
                        If contents2.Equals(contents1) Then
                            Try
                                File.Delete(filename2)
                                duplicatesDeleted = duplicatesDeleted + 1
                                lblDeleted.Text = filename2
                                fileList.RemoveAt(j)

                                ' do not increment j
                                ' file was removed from arraylist
                                ' and everything shifted down
                            Catch ex As Exception
                                MsgBox(filename2 & vbCrLf & vbCrLf & ex.Message, MsgBoxStyle.Critical, "Unable to delete duplicate file")
                                j = j + 1 ' move to next file
                            End Try
                        Else
                            j = j + 1 ' move to next file
                        End If
                    End If
                    Application.DoEvents()
                End While
            End If

            i = i + 1
            p = CInt(i / fileList.Count * 100)
            ProgressBar1.Value = p
            Application.DoEvents()
        End While
    End Sub

End Class
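
The code above re-reads every remaining file for each file it checks, so the work grows with the square of the file count. Since the question says the content "can be checked on any level whatever is faster", here is a rough sketch of one common alternative. It is not part of the original answer. It groups files by length first and then by a SHA-256 hash of their bytes, so each file is read at most once. The module name DuplicateFinder and the function FindDuplicateGroups are made up for illustration, and the generic collections and SearchOption.AllDirectories assume .NET 2.0 or later. It only reports duplicate groups and deletes nothing.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Security.Cryptography

' Hypothetical helper module, not from the original answer.
Module DuplicateFinder

    ' Returns groups of *.txt files under rootDir whose bytes hash to the same value.
    ' Files are bucketed by length first, so only same-sized files are ever hashed.
    Public Function FindDuplicateGroups(ByVal rootDir As String) As List(Of List(Of String))
        ' Pass 1: file length -> paths with that length
        Dim bySize As New Dictionary(Of Long, List(Of String))
        For Each filePath As String In Directory.GetFiles(rootDir, "*.txt", SearchOption.AllDirectories)
            Dim size As Long = New FileInfo(filePath).Length
            If Not bySize.ContainsKey(size) Then bySize(size) = New List(Of String)
            bySize(size).Add(filePath)
        Next

        Dim groups As New List(Of List(Of String))
        Using sha As SHA256 = SHA256.Create()
            For Each sameSize As List(Of String) In bySize.Values
                ' A file with a unique length cannot have a duplicate
                If sameSize.Count < 2 Then Continue For

                ' Pass 2: content hash -> paths with that hash
                Dim byHash As New Dictionary(Of String, List(Of String))
                For Each filePath As String In sameSize
                    Dim key As String
                    Using fs As FileStream = File.OpenRead(filePath)
                        key = Convert.ToBase64String(sha.ComputeHash(fs))
                    End Using
                    If Not byHash.ContainsKey(key) Then byHash(key) = New List(Of String)
                    byHash(key).Add(filePath)
                Next

                For Each sameHash As List(Of String) In byHash.Values
                    If sameHash.Count > 1 Then groups.Add(sameHash)
                Next
            Next
        End Using
        Return groups
    End Function

    Sub Main(ByVal args As String())
        If args.Length <> 1 Then
            Console.WriteLine("usage: DuplicateFinder <rootDir>")
            Return
        End If
        ' Print one line per group of identical files
        For Each dupes As List(Of String) In FindDuplicateGroups(args(0))
            Console.WriteLine(String.Join(", ", dupes.ToArray()))
        Next
    End Sub

End Module
```

Two caveats. This compares raw bytes, so two files with the same text but different encodings or line endings count as different, whereas the StreamReader version above compares decoded text. Also, Directory.GetFiles with SearchOption.AllDirectories throws if it hits a folder it cannot read, so point it at a folder like c:\aaa rather than C:\. If you delete based on the groups, a final byte-for-byte check of each pair is a cheap safeguard.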

Author Comment

by:fpoyavo
ID: 11902218
Wow. Let me try it.

Author Comment

by:fpoyavo
ID: 11902425
Idle Mind,

Hmm... it runs and scans the directory and subdirectories, but it does not find dups.

Thank you.

Author Comment

by:fpoyavo
ID: 11902451
Sorry, it does.

Thanks a lot.