Avatar of Arvind Kumar
Arvind Kumar

asked on 

Repair corrupted files and Clean, Trim, Loop through Folder

Hello. I am nil at VBA. Probably will never use excel again after current use,
I have 2500 Excel 2013 (XLSX) files, corrupted because recovered after HDD Volume was deleted by mistake.
Hence two questions.
1-When file opened, excel asks to repair, I click Yes, in few seconds corrected file ready to be save.Possible to have VBA to Automate the process?
2-Many  files need to be trimmed for extra space, Non printing characters to be removed, and characters like ***++Ášê›‰$‰$‰$‰$å„%f/1ûmÆìsŽ(©fàÂŒ9òBËÙ•!‹ØJˆ to be removed. Is it possible to have a VBA for these three, the loop through each file in folder until all done?

Please let me know if I needed to post both questions separately. Thank you.
VBA

Avatar of undefined
Last Comment
Sam Jacobs
Avatar of Sam Jacobs
Sam Jacobs
Flag of United States of America image

For #1, do you have a corrupt file you can share? I don;t know how to create one to test code on.
For #2, do you want multiple non-alpha characters removed (e.g. ** or ++)?
Avatar of Arvind Kumar
Arvind Kumar

ASKER

Thank you so much for kindness. Two files are attached. On with strange characters, all need to be removed.
Second file is the one which asks to be corrected. It has personal information. Will file be available to other members to view? Any way I can restrict it to only you? Thank you.
All--XLSX-69380-c.XLSX
Avatar of Sam Jacobs
Sam Jacobs
Flag of United States of America image

You can send me a direct message and attach the file there.
Avatar of Arvind Kumar
Arvind Kumar

ASKER

I am so sorry, but I do not know how to send direct message. I do not see any option to do that.
May be, I do not have full privileges yet because I just signed up little while ago.
Avatar of Arvind Kumar
Arvind Kumar

ASKER

I am so sorry, but I do not know how to send direct message. I do not see any option to do that.
May be, I do not have full privileges yet because I just signed up little while ago.
Avatar of Sam Jacobs
Sam Jacobs
Flag of United States of America image

Hi Arvind ... I just sent you a DM ... you can simply reply to it.
For the future, you can simply hover over my picture, and a Message button should appear.
Avatar of Sam Jacobs
Sam Jacobs
Flag of United States of America image

Arvind,

It seems that Excel will not allow you to overwrite a recovered file without user intervention, so dump all the original files into one directory, and the code below will write the recovered/modified spreadsheets to a new directory. You can always delete the original directory if you so choose, but it's probably a good idea to keep the original files anyway.
The code removes all non-alpha characters (anything not A-Z, a-z, 0-9, or a space).
Unfortunately, the code has no way to know that something like "h1WPBFmSw3fp8" is junk, so it can't remove it.
Please add the following code to a VBA module. Modify the directory names at the top. Dump the original files into the "ToClean" directory, and make sure to create the "Cleaned" directory.  
Option Explicit

Const DirectoryToClean = "C:\Data\Experts Exchange\Excel\ToClean"
Const CleanedDirectory = "C:\Data\Experts Exchange\Excel\Cleaned"

Public Function StripNonAsciiChars(ByVal InputString As String) As String
    Dim i As Integer
    Dim RegEx As Object
    Set RegEx = CreateObject("VBScript.RegExp")
    With RegEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = True
        .Pattern = "[^a-zA-Z0-9 ]"
        StripNonAsciiChars = Trim(RegEx.Replace(InputString, ""))
    End With
End Function

Sub CleanWorkBook()

Dim lastRow As Long
lastRow = Cells(ActiveSheet.Rows.Count, 1).End(xlUp).Row
Dim lastCol As Integer
Dim colIdx As Integer
Dim rowIdx As Long
Dim sht As Worksheet
Dim strCell As String
Dim bHaveChars As Boolean
Set sht = ActiveSheet

For rowIdx = 1 To lastRow
    If rowIdx > lastRow Then Exit For
    bHaveChars = False
    lastCol = Cells(rowIdx, ActiveSheet.Columns.Count).End(xlToLeft).Column
    'Debug.Print "Checking Row: " & CStr(rowIdx) & " ... Cols: " & CStr(lastCol)
    
    For colIdx = 1 To lastCol
        strCell = StripNonAsciiChars(sht.Cells(rowIdx, colIdx))
        sht.Cells(rowIdx, colIdx) = strCell
        If Len(strCell) > 0 Then bHaveChars = True
    Next
    If bHaveChars = False Then
        ' delete the row
        Rows(rowIdx).Delete
        rowIdx = rowIdx - 1
        lastRow = lastRow - 1
    End If
Next

End Sub

Sub CleanDirectory()

Dim idx As Integer
Dim fso As Object       '// Scripting.fileSystemObject
Set fso = CreateObject("Scripting.FileSystemObject")
    
Dim fld As Object       '// Scripting.folder
Set fld = fso.GetFolder(DirectoryToClean)
    
Dim fl As Object        '// Scripting.File
Dim wb As Workbook

Dim ToCleanName As String
Dim CleanedName As String

For Each fl In fld.Files
    ToCleanName = DirectoryToClean & "\" & fl.Name
    CleanedName = CleanedDirectory & "\" & fl.Name
    Debug.Print "Cleaning: " & fl.Name
    Application.StatusBar = "Cleaning: " & ToCleanName
    DoEvents
    Set wb = Workbooks.Open(ToCleanName, , , , , , , , , , , , , , True)
    CleanWorkBook
    wb.SaveAs Filename:=CleanedName
    wb.Close saveChanges:=True
Next
    
Application.StatusBar = "Ready."
Debug.Print "Done!"

End Sub

Open in new window

Avatar of Arvind Kumar
Arvind Kumar

ASKER

Hello Sam:
Hope, you had some rest. I had appointments since morning. I am just starting to deploy to VBA, and will be back to you.
Arvind.
Avatar of Sam Jacobs
Sam Jacobs
Flag of United States of America image

Arvind,

Here i the updated code. If there's no corruption (unprintable characters) within the first 10 lines, I assume the rest of the file is ok too.

Sam
Option Explicit

Const DirectoryToClean = "C:\Data\Experts Exchange\Excel\ToClean"
Const CleanedDirectory = "C:\Data\Experts Exchange\Excel\Cleaned"

Public Function StripNonAsciiChars(ByVal InputString As String) As String
    Dim i As Integer
    Dim RegEx As Object
    Set RegEx = CreateObject("VBScript.RegExp")
    With RegEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = True
        .Pattern = "[^a-zA-Z0-9 ]"
        StripNonAsciiChars = Trim(RegEx.Replace(InputString, ""))
    End With
End Function

' Do not execute this macro directly
' it is called from CleanDirectory

Sub CleanWorkBook()

Dim lastRow As Long
lastRow = Cells(ActiveSheet.Rows.Count, 1).End(xlUp).Row
Dim lastCol As Integer
Dim colIdx As Integer
Dim rowIdx As Long
Dim sht As Worksheet
Dim strCell As String
Dim bHaveChars As Boolean
Dim bCorruptionDetected As Boolean

Set sht = ActiveSheet
bCorruptionDetected = False

For rowIdx = 1 To lastRow
    If rowIdx > lastRow Then Exit For
     ' if no corruption in the first 10 lines, assume the rest of the file is ok
    If rowIdx > 10 And bCorruptionDetected = False Then
        Exit For
    End If
    bHaveChars = False
    lastCol = Cells(rowIdx, ActiveSheet.Columns.Count).End(xlToLeft).Column
    'Debug.Print "Checking Row: " & CStr(rowIdx) & " ... Cols: " & CStr(lastCol)
    
    For colIdx = 1 To lastCol
        strCell = StripNonAsciiChars(sht.Cells(rowIdx, colIdx))
        If Len(strCell) < Len(sht.Cells(rowIdx, colIdx)) Then
            sht.Cells(rowIdx, colIdx) = strCell
            bCorruptionDetected = True
        End If
        If Len(strCell) > 0 Then bHaveChars = True
    Next
    If bHaveChars = False Then
        ' delete the row
        Rows(rowIdx).Delete
        rowIdx = rowIdx - 1
        lastRow = lastRow - 1
    End If
Next

End Sub

Sub CleanDirectory()

Dim idx As Integer
Dim fso As Object       '// Scripting.fileSystemObject
Set fso = CreateObject("Scripting.FileSystemObject")
    
Dim fld As Object       '// Scripting.folder
Set fld = fso.GetFolder(DirectoryToClean)
    
Dim fl As Object        '// Scripting.File
Dim wb As Workbook

Dim ToCleanName As String
Dim CleanedName As String

For Each fl In fld.Files
    ToCleanName = DirectoryToClean & "\" & fl.Name
    CleanedName = CleanedDirectory & "\" & fl.Name
    Debug.Print "Cleaning: " & fl.Name
    Application.StatusBar = "Cleaning: " & ToCleanName
    DoEvents
    Application.ScreenUpdating = False
    Set wb = Workbooks.Open(ToCleanName, , , , , , , , , , , , , , True)
    CleanWorkBook
    wb.SaveAs Filename:=CleanedName
    wb.Close saveChanges:=True
    Application.ScreenUpdating = True
Next
    
Application.ScreenUpdating = True
Application.StatusBar = "Ready."
Debug.Print "Done!"

End Sub

Open in new window

Avatar of Arvind Kumar
Arvind Kumar

ASKER

Hello Sam:
Screen Update negative and quick sampling if file is corrupted, these both really expedited.
VBA does skip files as you desired.
Now average time of processed files is about 4 to 5 minutes for 6-8 MB file. Removes about 1 to 1.5 MB of junk. I think, Time is acceptable.

I will feed back if there is new information. In the mean time, Thanks again.
Arvind
ASKER CERTIFIED SOLUTION
Avatar of Sam Jacobs
Sam Jacobs
Flag of United States of America image

Blurred text
THIS SOLUTION IS ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
Avatar of Arvind Kumar
Arvind Kumar

ASKER

Amazing nice disposition, extremely prompt response and follow up, fantastic synthesis of VBA.
I thought this VBA not possible. I just tried my luck and Sam Jacobs proved his mettle with so much dedication, worked until 2 AM to author the VBA. He engaged and performed way above and beyond what I could hope for.
He is an inspiration - truly.
I am very happy, and highly recommend Sam Jacobs.
Arvind
Avatar of Sam Jacobs
Sam Jacobs
Flag of United States of America image

Thank you, Arvind.
VBA
VBA

Visual Basic for Applications (VBA) enables building user-defined functions (UDFs), automating processes and accessing Windows API and other low-level functionality through dynamic-link libraries (DLLs). VBA is closely related to Visual Basic and uses the Visual Basic Runtime Library, but it can normally only run code within a host application rather than as a standalone program. It can, however, be used to control one application from another via OLE Automation. VBA is built into most Microsoft Office applications.

17K
Questions
--
Followers
--
Top Experts
Get a personalized solution from industry experts
Ask the experts
Read over 600 more reviews

TRUSTED BY

IBM logoIntel logoMicrosoft logoUbisoft logoSAP logo
Qualcomm logoCitrix Systems logoWorkday logoErnst & Young logo
High performer badgeUsers love us badge
LinkedIn logoFacebook logoX logoInstagram logoTikTok logoYouTube logo