Long-running compare process

Posted on 2004-08-25
Last Modified: 2010-04-23
Hi Experts,

I have a folder with subfolders containing files like these:

c:\aaa\001\0001.txt c:\aaa\001\0002.txt
c:\aaa\002\0001.txt
c:\aaa\003\0001.txt c:\aaa\003\0002.txt

I need to get rid of duplicate files. They may be in different subfolders and have different names, but their contents may be the same.

For example, c:\aaa\002\0001.txt may be the same as c:\aaa\003\0002.txt.

It will be a long-running process, since each file is compared with all the others, but that is OK. The content is plain text and can be compared at whatever level is fastest.

I have written it in JS and C but would like to see it in VB.

Please help.
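The question allows the contents to be compared at any level. One cheap level is the raw bytes: compare the file lengths first (files of different sizes cannot match) and only then compare bytes, stopping at the first difference, which avoids decoding both files into strings. A minimal VB.NET sketch, assuming .NET 2.0 or later; FileCompare, FilesAreEqual and ReadFull are hypothetical names chosen for illustration. Because it is strictly byte-for-byte, two files with the same text but a different encoding or byte-order mark count as different.

```vbnet
Imports System
Imports System.IO

Public Module FileCompare

    ' Hypothetical helper: returns True when two files have byte-for-byte
    ' identical contents. Lengths are checked first, so files of different
    ' sizes are rejected without being opened; otherwise both files are read
    ' in 64 KB chunks and the comparison stops at the first differing byte.
    Public Function FilesAreEqual(ByVal path1 As String, ByVal path2 As String) As Boolean
        Dim info1 As New FileInfo(path1)
        Dim info2 As New FileInfo(path2)
        If info1.Length <> info2.Length Then Return False

        Const ChunkSize As Integer = 65536
        Dim buf1(ChunkSize - 1) As Byte
        Dim buf2(ChunkSize - 1) As Byte

        Using fs1 As New FileStream(path1, FileMode.Open, FileAccess.Read), _
              fs2 As New FileStream(path2, FileMode.Open, FileAccess.Read)
            Do
                Dim n1 As Integer = ReadFull(fs1, buf1)
                Dim n2 As Integer = ReadFull(fs2, buf2)
                If n1 <> n2 Then Return False ' one file ended early
                If n1 = 0 Then Exit Do        ' both files ended together
                For k As Integer = 0 To n1 - 1
                    If buf1(k) <> buf2(k) Then Return False
                Next
            Loop
        End Using
        Return True
    End Function

    ' Stream.Read may return fewer bytes than requested, so keep reading
    ' until the buffer is full or the end of the stream is reached.
    Private Function ReadFull(ByVal s As Stream, ByVal buffer As Byte()) As Integer
        Dim total As Integer = 0
        Do While total < buffer.Length
            Dim n As Integer = s.Read(buffer, total, buffer.Length - total)
            If n = 0 Then Exit Do ' end of stream
            total += n
        Loop
        Return total
    End Function

End Module
```

For example, FileCompare.FilesAreEqual("c:\aaa\002\0001.txt", "c:\aaa\003\0002.txt") returns True only if the two files are identical byte for byte.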
Question by:fpoyavo

Accepted Solution

by: Mike Tomlinson
ID: 11898120
Here ya go...

Imports System.IO

Public Class Form1
    Inherits System.Windows.Forms.Form

#Region " Windows Form Designer generated code "

    Public Sub New()
        MyBase.New()

        'This call is required by the Windows Form Designer.
        InitializeComponent()

        'Add any initialization after the InitializeComponent() call

    End Sub

    'Form overrides dispose to clean up the component list.
    Protected Overloads Overrides Sub Dispose(ByVal disposing As Boolean)
        If disposing Then
            If Not (components Is Nothing) Then
                components.Dispose()
            End If
        End If
        MyBase.Dispose(disposing)
    End Sub

    'Required by the Windows Form Designer
    Private components As System.ComponentModel.IContainer

    'NOTE: The following procedure is required by the Windows Form Designer
    'It can be modified using the Windows Form Designer.  
    'Do not modify it using the code editor.
    Friend WithEvents Label1 As System.Windows.Forms.Label
    Friend WithEvents cmdRootPath As System.Windows.Forms.Button
    Friend WithEvents rootPath As System.Windows.Forms.TextBox
    Friend WithEvents cmdSearch As System.Windows.Forms.Button
    Friend WithEvents Label3 As System.Windows.Forms.Label
    Friend WithEvents lblCurrentFile As System.Windows.Forms.Label
    Friend WithEvents ProgressBar1 As System.Windows.Forms.ProgressBar
    Friend WithEvents lblDeleted As System.Windows.Forms.Label
    Friend WithEvents Label4 As System.Windows.Forms.Label
    Friend WithEvents FolderBrowserDialog1 As System.Windows.Forms.FolderBrowserDialog
    <System.Diagnostics.DebuggerStepThrough()> Private Sub InitializeComponent()
        Me.Label1 = New System.Windows.Forms.Label
        Me.rootPath = New System.Windows.Forms.TextBox
        Me.cmdRootPath = New System.Windows.Forms.Button
        Me.cmdSearch = New System.Windows.Forms.Button
        Me.Label3 = New System.Windows.Forms.Label
        Me.lblCurrentFile = New System.Windows.Forms.Label
        Me.ProgressBar1 = New System.Windows.Forms.ProgressBar
        Me.lblDeleted = New System.Windows.Forms.Label
        Me.Label4 = New System.Windows.Forms.Label
        Me.FolderBrowserDialog1 = New System.Windows.Forms.FolderBrowserDialog
        Me.SuspendLayout()
        '
        'Label1
        '
        Me.Label1.Location = New System.Drawing.Point(8, 8)
        Me.Label1.Name = "Label1"
        Me.Label1.Size = New System.Drawing.Size(80, 16)
        Me.Label1.TabIndex = 0
        Me.Label1.Text = "Root Path:"
        Me.Label1.TextAlign = System.Drawing.ContentAlignment.MiddleRight
        '
        'rootPath
        '
        Me.rootPath.Anchor = CType(((System.Windows.Forms.AnchorStyles.Top Or System.Windows.Forms.AnchorStyles.Left) _
                    Or System.Windows.Forms.AnchorStyles.Right), System.Windows.Forms.AnchorStyles)
        Me.rootPath.Location = New System.Drawing.Point(88, 8)
        Me.rootPath.Name = "rootPath"
        Me.rootPath.Size = New System.Drawing.Size(504, 20)
        Me.rootPath.TabIndex = 1
        Me.rootPath.Text = "C:\"
        '
        'cmdRootPath
        '
        Me.cmdRootPath.Anchor = CType((System.Windows.Forms.AnchorStyles.Top Or System.Windows.Forms.AnchorStyles.Right), System.Windows.Forms.AnchorStyles)
        Me.cmdRootPath.Location = New System.Drawing.Point(600, 8)
        Me.cmdRootPath.Name = "cmdRootPath"
        Me.cmdRootPath.Size = New System.Drawing.Size(72, 24)
        Me.cmdRootPath.TabIndex = 2
        Me.cmdRootPath.Text = "Select Path"
        '
        'cmdSearch
        '
        Me.cmdSearch.Anchor = CType((System.Windows.Forms.AnchorStyles.Top Or System.Windows.Forms.AnchorStyles.Right), System.Windows.Forms.AnchorStyles)
        Me.cmdSearch.Location = New System.Drawing.Point(600, 40)
        Me.cmdSearch.Name = "cmdSearch"
        Me.cmdSearch.Size = New System.Drawing.Size(72, 24)
        Me.cmdSearch.TabIndex = 8
        Me.cmdSearch.Text = "Search"
        '
        'Label3
        '
        Me.Label3.Location = New System.Drawing.Point(8, 48)
        Me.Label3.Name = "Label3"
        Me.Label3.Size = New System.Drawing.Size(72, 16)
        Me.Label3.TabIndex = 10
        Me.Label3.Text = "Searching:"
        Me.Label3.TextAlign = System.Drawing.ContentAlignment.TopRight
        '
        'lblCurrentFile
        '
        Me.lblCurrentFile.Location = New System.Drawing.Point(88, 48)
        Me.lblCurrentFile.Name = "lblCurrentFile"
        Me.lblCurrentFile.Size = New System.Drawing.Size(504, 32)
        Me.lblCurrentFile.TabIndex = 9
        '
        'ProgressBar1
        '
        Me.ProgressBar1.Location = New System.Drawing.Point(8, 136)
        Me.ProgressBar1.Name = "ProgressBar1"
        Me.ProgressBar1.Size = New System.Drawing.Size(664, 16)
        Me.ProgressBar1.TabIndex = 11
        '
        'lblDeleted
        '
        Me.lblDeleted.Location = New System.Drawing.Point(88, 88)
        Me.lblDeleted.Name = "lblDeleted"
        Me.lblDeleted.Size = New System.Drawing.Size(504, 32)
        Me.lblDeleted.TabIndex = 12
        '
        'Label4
        '
        Me.Label4.Location = New System.Drawing.Point(8, 88)
        Me.Label4.Name = "Label4"
        Me.Label4.Size = New System.Drawing.Size(72, 16)
        Me.Label4.TabIndex = 13
        Me.Label4.Text = "Deleted:"
        Me.Label4.TextAlign = System.Drawing.ContentAlignment.TopRight
        '
        'Form1
        '
        Me.AutoScaleBaseSize = New System.Drawing.Size(5, 13)
        Me.ClientSize = New System.Drawing.Size(680, 158)
        Me.Controls.Add(Me.Label4)
        Me.Controls.Add(Me.lblDeleted)
        Me.Controls.Add(Me.ProgressBar1)
        Me.Controls.Add(Me.Label3)
        Me.Controls.Add(Me.lblCurrentFile)
        Me.Controls.Add(Me.cmdSearch)
        Me.Controls.Add(Me.cmdRootPath)
        Me.Controls.Add(Me.rootPath)
        Me.Controls.Add(Me.Label1)
        Me.Name = "Form1"
        Me.Text = "Find and delete duplicate (contents) text files"
        Me.ResumeLayout(False)

    End Sub

#End Region

    Private fileList As ArrayList
    Private duplicatesDeleted As Integer

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        RootPathChanged(Nothing, Nothing)
    End Sub

    Private Sub cmdRootPath_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdRootPath.Click
        If FolderBrowserDialog1.ShowDialog = DialogResult.OK Then
            rootPath.Text = FolderBrowserDialog1.SelectedPath
        End If
    End Sub

    Private Sub RootPathChanged(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles rootPath.TextChanged
        cmdSearch.Enabled = Directory.Exists(rootPath.Text)
    End Sub

    Private Sub cmdSearch_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdSearch.Click
        ' disable the inputs while the search/compare runs
        cmdRootPath.Enabled = False
        rootPath.Enabled = False
        cmdSearch.Enabled = False
        ProgressBar1.Value = 0

        fileList = New ArrayList
        lblCurrentFile.Text = ""
        lblDeleted.Text = ""

        ' step 1: collect every .txt file under the root path
        Label3.Text = "Searching:"
        searchForTargetFiles(rootPath.Text)
        lblCurrentFile.Text = ""

        ' step 2: compare contents and delete duplicates
        If fileList.Count >= 2 Then
            compareFiles()
            MsgBox(duplicatesDeleted & " duplicate(s) deleted", MsgBoxStyle.Information, "Done")
        Else
            MsgBox("No Duplicates Found")
        End If

        ' reset the UI
        lblCurrentFile.Text = ""
        lblDeleted.Text = ""
        ProgressBar1.Value = 0
        cmdSearch.Enabled = True
        rootPath.Enabled = True
        cmdRootPath.Enabled = True
    End Sub

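    Private Sub searchForTargetFiles(ByVal pathToSearch As String)
        Dim fi As FileInfo
        Dim di As DirectoryInfo
        Dim subDI As DirectoryInfo

        lblCurrentFile.Text = pathToSearch
        lblCurrentFile.Refresh()
        Application.DoEvents()
        di = New DirectoryInfo(pathToSearch)
        Try
            ' collect every .txt file in this folder
            For Each fi In di.GetFiles("*.txt")
                fileList.Add(fi.FullName)
            Next

            ' then recurse into each subfolder
            For Each subDI In di.GetDirectories
                Application.DoEvents()
                searchForTargetFiles(subDI.FullName)
            Next
        Catch ex As UnauthorizedAccessException
            ' skip folders we are not allowed to read, such as
            ' "System Volume Information" when searching from C:\
        End Try
    End Sub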

    Private Sub compareFiles()
        Dim i As Integer
        Dim j As Integer
        Dim filename1 As String
        Dim filename2 As String
        Dim sr As StreamReader
        Dim contents1 As String
        Dim contents2 As String
        Dim readError As Boolean
        Dim p As Integer

        Label3.Text = "Comparing:"
        Label3.Refresh()
        i = 0
        duplicatesDeleted = 0
        While i <= fileList.Count - 1
            filename1 = CStr(fileList(i))
            lblCurrentFile.Text = filename1
            lblCurrentFile.Refresh()
            Try
                readError = False
                sr = New StreamReader(filename1)
                Try
                    contents1 = sr.ReadToEnd
                Finally
                    sr.Close() ' release the file even if the read fails
                End Try
            Catch ex As Exception
                readError = True
                MsgBox(filename1 & vbCrLf & vbCrLf & ex.Message, MsgBoxStyle.Critical, "Unable to read file")
            End Try

            If Not readError Then
                ' compare file i against every file after it in the list
                j = i + 1
                While j <= fileList.Count - 1
                    filename2 = CStr(fileList(j))
                    Try
                        readError = False
                        sr = New StreamReader(filename2)
                        Try
                            contents2 = sr.ReadToEnd
                        Finally
                            sr.Close()
                        End Try
                    Catch ex As Exception
                        readError = True
                        MsgBox(filename2 & vbCrLf & vbCrLf & ex.Message, MsgBoxStyle.Critical, "Unable to read file")
                    End Try

                    If readError Then
                        ' drop the unreadable file so the loop moves on;
                        ' leaving j unchanged here would retry it forever
                        fileList.RemoveAt(j)
                    ElseIf contents2.Equals(contents1) Then
                        Try
                            File.Delete(filename2)
                            duplicatesDeleted = duplicatesDeleted + 1
                            lblDeleted.Text = filename2
                            fileList.RemoveAt(j)

                            ' do not increment j
                            ' file was removed from arraylist
                            ' and everything shifted down
                        Catch ex As Exception
                            MsgBox(filename2 & vbCrLf & vbCrLf & ex.Message, MsgBoxStyle.Critical, "Unable to delete duplicate file")
                            j = j + 1 ' move to next file
                        End Try
                    Else
                        j = j + 1 ' move to next file
                    End If
                    Application.DoEvents()
                End While
            End If

            i = i + 1
            p = CInt(i / fileList.Count * 100)
            ProgressBar1.Value = p
            Application.DoEvents()
        End While
    End Sub

End Class
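The pairwise loop above reads each remaining file once for every earlier file, so the number of reads grows roughly with the square of the file count. If that becomes too slow, a common alternative (a different technique, not part of this answer) is to group files by size and hash only the files that share a size, so every file is read at most once. A sketch, assuming .NET 2.0 or later for generics and SearchOption.AllDirectories, and assuming every folder under the root is readable (Directory.GetFiles throws UnauthorizedAccessException otherwise); DuplicateFinder and FindDuplicateGroups are illustrative names. It hashes raw bytes, so unlike the StreamReader comparison above, a byte-order-mark difference makes two files count as different.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Security.Cryptography

Public Module DuplicateFinder

    ' Hypothetical helper: returns groups of files (matching pattern, anywhere
    ' under rootPath) whose contents are identical. Each group has 2+ paths.
    Public Function FindDuplicateGroups(ByVal rootPath As String, ByVal pattern As String) As List(Of List(Of String))
        ' Step 1: bucket every matching file by its length in bytes.
        ' Files of different lengths can never be duplicates.
        Dim bySize As New Dictionary(Of Long, List(Of String))
        For Each filePath As String In Directory.GetFiles(rootPath, pattern, SearchOption.AllDirectories)
            Dim info As New FileInfo(filePath)
            If Not bySize.ContainsKey(info.Length) Then bySize(info.Length) = New List(Of String)
            bySize(info.Length).Add(filePath)
        Next

        ' Step 2: inside each bucket with 2+ files, group the files by SHA-256 hash.
        Dim groups As New List(Of List(Of String))
        Using sha As SHA256 = SHA256.Create()
            For Each bucket As List(Of String) In bySize.Values
                If bucket.Count < 2 Then Continue For ' a unique size cannot have a duplicate
                Dim byHash As New Dictionary(Of String, List(Of String))
                For Each filePath As String In bucket
                    Dim key As String
                    Using fs As FileStream = File.OpenRead(filePath)
                        key = BitConverter.ToString(sha.ComputeHash(fs))
                    End Using
                    If Not byHash.ContainsKey(key) Then byHash(key) = New List(Of String)
                    byHash(key).Add(filePath)
                Next
                ' Step 3: every hash shared by 2+ files forms a duplicate group.
                For Each sameHash As List(Of String) In byHash.Values
                    If sameHash.Count > 1 Then groups.Add(sameHash)
                Next
            Next
        End Using
        Return groups
    End Function

End Module
```

To reproduce the answer's behaviour, a caller could keep the first path in each group and pass the others to File.Delete. A SHA-256 collision between different files is vanishingly unlikely, but comparing group members byte for byte before deleting would rule it out entirely.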

Author Comment

by:fpoyavo
ID: 11902218
Wow. Let me try it.

Author Comment

by:fpoyavo
ID: 11902425
Idle Mind,

Hmm... it runs and scans the directory and subdirectories, but it does not find any dups.

Thank you.

Author Comment

by:fpoyavo
ID: 11902451
Sorry, it does.

Thanks a lot.