Go Premium for a chance to win a PS4. Enter to Win

x
?
Solved

long running compare process

Posted on 2004-08-25
4
Medium Priority
?
166 Views
Last Modified: 2010-04-23
Hi Experts,

I have folder with subfolders containing files like

c:\aaa\001\0001.txt c:\aaa\001\0002.txt
c:\aaa\002\0001.txt
c:\aaa\003\0001.txt c:\aaa\003\0002.txt

I need to get rid of duplicate files. They might be in different subfolders and having different names but file content
may be the same.

Like : c:\aaa\002\0001.txt  may be same as c:\aaa\003\0002.txt

It will be long running process since it will compare each file to the rest but it is OK. Content is plain text and can be
checked on any level whatever is faster.

I wrote it in JS and C but would see it in VB.

Please help.
0
Comment
Question by:fpoyavo
  • 3
4 Comments
 
LVL 86

Accepted Solution

by:
Mike Tomlinson earned 2000 total points
ID: 11898120
Here ya go...

Imports System.IO

Public Class Form1
    Inherits System.Windows.Forms.Form

#Region " Windows Form Designer generated code "

    Public Sub New()
        MyBase.New()

        'This call is required by the Windows Form Designer.
        InitializeComponent()

        'Add any initialization after the InitializeComponent() call

    End Sub

    'Form overrides dispose to clean up the component list.
    Protected Overloads Overrides Sub Dispose(ByVal disposing As Boolean)
        If disposing Then
            If Not (components Is Nothing) Then
                components.Dispose()
            End If
        End If
        MyBase.Dispose(disposing)
    End Sub

    'Required by the Windows Form Designer
    Private components As System.ComponentModel.IContainer

    'NOTE: The following procedure is required by the Windows Form Designer
    'It can be modified using the Windows Form Designer.  
    'Do not modify it using the code editor.
    Friend WithEvents Label1 As System.Windows.Forms.Label
    Friend WithEvents cmdRootPath As System.Windows.Forms.Button
    Friend WithEvents rootPath As System.Windows.Forms.TextBox
    Friend WithEvents cmdSearch As System.Windows.Forms.Button
    Friend WithEvents Label3 As System.Windows.Forms.Label
    Friend WithEvents lblCurrentFile As System.Windows.Forms.Label
    Friend WithEvents ProgressBar1 As System.Windows.Forms.ProgressBar
    Friend WithEvents lblDeleted As System.Windows.Forms.Label
    Friend WithEvents Label4 As System.Windows.Forms.Label
    Friend WithEvents FolderBrowserDialog1 As System.Windows.Forms.FolderBrowserDialog
    <System.Diagnostics.DebuggerStepThrough()> Private Sub InitializeComponent()
        Me.Label1 = New System.Windows.Forms.Label
        Me.rootPath = New System.Windows.Forms.TextBox
        Me.cmdRootPath = New System.Windows.Forms.Button
        Me.cmdSearch = New System.Windows.Forms.Button
        Me.Label3 = New System.Windows.Forms.Label
        Me.lblCurrentFile = New System.Windows.Forms.Label
        Me.ProgressBar1 = New System.Windows.Forms.ProgressBar
        Me.lblDeleted = New System.Windows.Forms.Label
        Me.Label4 = New System.Windows.Forms.Label
        Me.FolderBrowserDialog1 = New System.Windows.Forms.FolderBrowserDialog
        Me.SuspendLayout()
        '
        'Label1
        '
        Me.Label1.Location = New System.Drawing.Point(8, 8)
        Me.Label1.Name = "Label1"
        Me.Label1.Size = New System.Drawing.Size(80, 16)
        Me.Label1.TabIndex = 0
        Me.Label1.Text = "Root Path:"
        Me.Label1.TextAlign = System.Drawing.ContentAlignment.MiddleRight
        '
        'rootPath
        '
        Me.rootPath.Anchor = CType(((System.Windows.Forms.AnchorStyles.Top Or System.Windows.Forms.AnchorStyles.Left) _
                    Or System.Windows.Forms.AnchorStyles.Right), System.Windows.Forms.AnchorStyles)
        Me.rootPath.Location = New System.Drawing.Point(88, 8)
        Me.rootPath.Name = "rootPath"
        Me.rootPath.Size = New System.Drawing.Size(504, 20)
        Me.rootPath.TabIndex = 1
        Me.rootPath.Text = "C:\"
        '
        'cmdRootPath
        '
        Me.cmdRootPath.Anchor = CType((System.Windows.Forms.AnchorStyles.Top Or System.Windows.Forms.AnchorStyles.Right), System.Windows.Forms.AnchorStyles)
        Me.cmdRootPath.Location = New System.Drawing.Point(600, 8)
        Me.cmdRootPath.Name = "cmdRootPath"
        Me.cmdRootPath.Size = New System.Drawing.Size(72, 24)
        Me.cmdRootPath.TabIndex = 2
        Me.cmdRootPath.Text = "Select Path"
        '
        'cmdSearch
        '
        Me.cmdSearch.Anchor = CType((System.Windows.Forms.AnchorStyles.Top Or System.Windows.Forms.AnchorStyles.Right), System.Windows.Forms.AnchorStyles)
        Me.cmdSearch.Location = New System.Drawing.Point(600, 40)
        Me.cmdSearch.Name = "cmdSearch"
        Me.cmdSearch.Size = New System.Drawing.Size(72, 24)
        Me.cmdSearch.TabIndex = 8
        Me.cmdSearch.Text = "Search"
        '
        'Label3
        '
        Me.Label3.Location = New System.Drawing.Point(8, 48)
        Me.Label3.Name = "Label3"
        Me.Label3.Size = New System.Drawing.Size(72, 16)
        Me.Label3.TabIndex = 10
        Me.Label3.Text = "Searching:"
        Me.Label3.TextAlign = System.Drawing.ContentAlignment.TopRight
        '
        'lblCurrentFile
        '
        Me.lblCurrentFile.Location = New System.Drawing.Point(88, 48)
        Me.lblCurrentFile.Name = "lblCurrentFile"
        Me.lblCurrentFile.Size = New System.Drawing.Size(504, 32)
        Me.lblCurrentFile.TabIndex = 9
        '
        'ProgressBar1
        '
        Me.ProgressBar1.Location = New System.Drawing.Point(8, 136)
        Me.ProgressBar1.Name = "ProgressBar1"
        Me.ProgressBar1.Size = New System.Drawing.Size(664, 16)
        Me.ProgressBar1.TabIndex = 11
        '
        'lblDeleted
        '
        Me.lblDeleted.Location = New System.Drawing.Point(88, 88)
        Me.lblDeleted.Name = "lblDeleted"
        Me.lblDeleted.Size = New System.Drawing.Size(504, 32)
        Me.lblDeleted.TabIndex = 12
        '
        'Label4
        '
        Me.Label4.Location = New System.Drawing.Point(8, 88)
        Me.Label4.Name = "Label4"
        Me.Label4.Size = New System.Drawing.Size(72, 16)
        Me.Label4.TabIndex = 13
        Me.Label4.Text = "Deleted:"
        Me.Label4.TextAlign = System.Drawing.ContentAlignment.TopRight
        '
        'Form1
        '
        Me.AutoScaleBaseSize = New System.Drawing.Size(5, 13)
        Me.ClientSize = New System.Drawing.Size(680, 158)
        Me.Controls.Add(Me.Label4)
        Me.Controls.Add(Me.lblDeleted)
        Me.Controls.Add(Me.ProgressBar1)
        Me.Controls.Add(Me.Label3)
        Me.Controls.Add(Me.lblCurrentFile)
        Me.Controls.Add(Me.cmdSearch)
        Me.Controls.Add(Me.cmdRootPath)
        Me.Controls.Add(Me.rootPath)
        Me.Controls.Add(Me.Label1)
        Me.Name = "Form1"
        Me.Text = "Find and delete duplicate (contents) text files"
        Me.ResumeLayout(False)

    End Sub

#End Region

    Private fileList As ArrayList
    Private duplicatesDeleted As Integer

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        RootPathChanged(Nothing, Nothing)
    End Sub

    Private Sub cmdRootPath_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdRootPath.Click
        If FolderBrowserDialog1.ShowDialog = DialogResult.OK Then
            rootPath.Text = FolderBrowserDialog1.SelectedPath
        End If
    End Sub

    Private Sub RootPathChanged(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles rootPath.TextChanged
        cmdSearch.Enabled = Directory.Exists(rootPath.Text)
    End Sub

    Private Sub cmdSearch_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdSearch.Click
        Dim de As DictionaryEntry
        Dim fi As FileInfo
        Dim fileName As String

        cmdRootPath.Enabled = False
        rootPath.Enabled = False
        cmdSearch.Enabled = False
        ProgressBar1.Value = 0

        fileList = New ArrayList
        lblCurrentFile.Text = ""
        lblDeleted.Text = ""

        Label3.Text = "Searching:"
        searchForTargetFiles(rootPath.Text)
        lblCurrentFile.Text = ""

        If fileList.Count >= 2 Then
            compareFiles()
            MsgBox(duplicatesDeleted & " duplicate(s) deleted", MsgBoxStyle.Information, "Done")
        Else
            MsgBox("No Duplicates Found")
        End If

        lblCurrentFile.Text = ""
        lblDeleted.Text = ""
        ProgressBar1.Value = 0
        cmdSearch.Enabled = True
        rootPath.Enabled = True
        cmdRootPath.Enabled = True
    End Sub

    Private Sub searchForTargetFiles(ByVal pathToSearch As String)
        Dim fi As FileInfo
        Dim di As DirectoryInfo
        Dim subDI As DirectoryInfo

        lblCurrentFile.Text = pathToSearch
        lblCurrentFile.Refresh()
        Application.DoEvents()
        di = New DirectoryInfo(pathToSearch)
        For Each fi In di.GetFiles("*.txt")
            fileList.Add(fi.FullName)
        Next

        For Each subDI In di.GetDirectories
            Application.DoEvents()
            searchForTargetFiles(subDI.FullName)
        Next
    End Sub

    Private Sub compareFiles()
        Dim i As Integer
        Dim j As Integer
        Dim filename1 As String
        Dim filename2 As String
        Dim sw As StreamReader
        Dim contents1 As String
        Dim contents2 As String
        Dim readError As Boolean
        Dim p As Integer

        Label3.Text = "Comparing:"
        Label3.Refresh()
        i = 0
        duplicatesDeleted = 0
        While i <= fileList.Count - 1
            filename1 = fileList(i)
            lblCurrentFile.Text = filename1
            lblCurrentFile.Refresh()
            Try
                readError = False
                sw = New StreamReader(filename1)
                contents1 = sw.ReadToEnd
                sw.Close()
            Catch ex As Exception
                readError = True
                MsgBox(filename1 & vbCrLf & vbCrLf & ex.Message, MsgBoxStyle.Critical, "Unable to read file")
            End Try

            If Not readError Then
                j = i + 1
                While j <= fileList.Count - 1
                    Try
                        readError = False
                        filename2 = fileList(j)
                        sw = New StreamReader(filename2)
                        contents2 = sw.ReadToEnd
                        sw.Close()
                    Catch ex As Exception
                        readError = True
                        MsgBox(filename2 & vbCrLf & vbCrLf & ex.Message, MsgBoxStyle.Critical, "Unable to read file")
                    End Try

                    If Not readError Then
                        If contents2.Equals(contents1) Then
                            Try
                                File.Delete(filename2)
                                duplicatesDeleted = duplicatesDeleted + 1
                                lblDeleted.Text = filename2
                                fileList.RemoveAt(j)

                                ' do not increment j
                                ' file was removed from arraylist
                                ' and everything shifted down
                            Catch ex As Exception
                                MsgBox(filename2 & vbCrLf & vbCrLf & ex.Message, MsgBoxStyle.Critical, "Unable to delete duplicate file")
                                j = j + 1 ' move to next file
                            End Try
                        Else
                            j = j + 1 ' move to next file
                        End If
                    End If
                    Application.DoEvents()
                End While
            End If

            i = i + 1
            p = CInt(i / fileList.Count * 100)
            ProgressBar1.Value = p
            Application.DoEvents()
        End While
    End Sub

End Class
0
 
LVL 1

Author Comment

by:fpoyavo
ID: 11902218
Wow. Let me try it.
0
 
LVL 1

Author Comment

by:fpoyavo
ID: 11902425
Idle Mind,

Hmm...it runs and scans directory and subdirectories but it does not find dups.

Thank you.
0
 
LVL 1

Author Comment

by:fpoyavo
ID: 11902451
Sorry It does.

Thanks a lot.
0

Featured Post

[Webinar] Cloud and Mobile-First Strategy

Maybe you’ve fully adopted the cloud since the beginning. Or maybe you started with on-prem resources but are pursuing a “cloud and mobile first” strategy. Getting to that end state has its challenges. Discover how to build out a 100% cloud and mobile IT strategy in this webinar.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This tutorial demonstrates one way to create an application that runs without any Forms but still has a GUI presence via an Icon in the System Tray. The magic lies in Inheriting from the ApplicationContext Class and passing that to Application.Ru…
I think the Typed DataTable and Typed DataSet are very good options when working with data, but I don't like auto-generated code. First, I create an Abstract Class for my DataTables Common Code.  This class Inherits from DataTable. Also, it can …
This course is ideal for IT System Administrators working with VMware vSphere and its associated products in their company infrastructure. This course teaches you how to install and maintain this virtualization technology to store data, prevent vuln…
Integration Management Part 2
Suggested Courses
Course of the Month12 days, 5 hours left to enroll

916 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question