Identifying Spreadsheets with different headers

daiwhyte
daiwhyte used Ask the Experts™
on
Hi,

Ive got 300+ csv files which have very similar headers. There appears to be approx 3 different types of csv files. Some have 25 columns whilst others have 26/27/28 columns

Is there a way or program which can group the various csv's based on the amount of colums in the csv file?
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Commented:
Hi.
I would use a variant of the VBA code below.
I have not tested it, but it should give you an idea
gordontm

Sub fileloop()
Dim MyDir As String
Dim strPath As String
Dim vaFileName As Variant
im i As Integer
   
    MyDir = ActiveWorkbook.Path ' current path
	    strPath = MyDir & "\files" ' files subdir
	 
	    With Application.FileSearch
	        .NewSearch
	        .LookIn = strPath
	        .SearchSubFolders = False
	        .Filename = ".csv"
	 
	        If .Execute > 0 Then
	 
            For Each vaFileName In .FoundFiles
	                ' open the workbook
	                Workbooks.Open vaFileName
	        
	                With ActiveWorkbook
                            currentName=activeworkbook.name
	                    count=ActiveSheet.UsedRange.Columns.Count
	                    .SaveAs  currentName+trim(count)
	                    .Close
	                End With
	            Next
	        End If
	    End With
	End Sub

Open in new window

Author

Commented:
what do I do with that code? Use the excel vb editor? Sorry very limited knowledge on using code.
I see the Author has tagged Excel 2003 but Just to note, Application.FileSearch no longer works in 2007.
When you say group, do you mean a list in excel according to number of columns or actual movement of the files to different folders?
 
Exploring SharePoint 2016

Explore SharePoint 2016, the web-based, collaborative platform that integrates with Microsoft Office to provide intranets, secure document management, and collaboration so you can develop your online and offline capabilities.

Commented:
Save a blank excel in the same directory as the CSV sheets
From Excel, go to the VBA Editor (ALT-F11)
Right click "This Workbook" and press View Code
Paste the code
Put the cursor on the first line of the code
Press F5

I am sure you will get an error of some kind, this will need to be fixed.

BTW what my code should do is to save all the csvs with the same name but with the number of columns at the end.
(probably better to put the number of columns at the front, then the directory can be sorted)

Author

Commented:
Ok, here is a screenshot of the error which was generated after following the above procedure.
experts.jpg

Author

Commented:
Ok, here is the latest code which Ive tried running in Excel 2003 but nothing happens when I follow your instructions. No errors are generated either.



Sub fileloop()
Dim MyDir As String
Dim strPath As String
Dim vaFileName As Variant
Dim i As Integer
   
    MyDir = ActiveWorkbook.Path 'C:\Documents and Settings\jw\Desktop\pt data'
        strPath = MyDir & "\files" '1'
     
        With Application.FileSearch
            .NewSearch
            .LookIn = strPath
            .SearchSubFolders = False
            .Filename = ".csv"
     
            If .Execute > 0 Then
     
            For Each vaFileName In .FoundFiles
                    ' open the workbook
                    Workbooks.Open vaFileName
           
                    With ActiveWorkbook
                            currentName = ActiveWorkbook.Name
                        Count = ActiveSheet.UsedRange.Columns.Count
                        .SaveAs currentName + Trim(Count)
                        .Close
                    End With
                Next
            End If
        End With
    End Sub


Also, a screenshot for you.
experts.jpg

Author

Commented:
Ok, moved on a bit. It was not looking in the right place for the files. When I hit F5 with the code below, it asks me to save a file which I guess thats what I want but its appending the count to the end of the file so for example:-

data file.csv1

Now the files have more than 1 column so the code is not counting the amount of columns used. The files have between 24 and 28 columns of data in them so I think it needs another tweak.

Sub fileloop()
Dim MyDir As String
Dim strPath As String
Dim vaFileName As Variant
Dim i As Integer
   
    MyDir = ActiveWorkbook.Path 'C:\Documents and Settings\jw\Desktop\pt data\'
        strPath = MyDir 'C:\Documents and Settings\jw\Desktop\pt data\'
     
        With Application.FileSearch
            .NewSearch
            .LookIn = strPath
            .SearchSubFolders = False
            .Filename = ".csv"
     
            If .Execute > 0 Then
     
            For Each vaFileName In .FoundFiles
                    ' open the workbook
                    Workbooks.Open vaFileName
           
                    With ActiveWorkbook
                            currentName = ActiveWorkbook.Name
                        Count = ActiveSheet.UsedRange.Columns.Count
                        .SaveAs currentName + Trim(Count)
                        .Close
                    End With
                Next
            End If
        End With
    End Sub

Author

Commented:
Ok, moved the count to the beginning of the saved file name but now Im getting a lot of entries appearing in the right hand Explorer window of VB Editor. Ive got 500+ files to interrogate so will need to stop this window from getting populated.

Also, latest code

Sub fileloop()
Dim MyDir As String
Dim strPath As String
Dim vaFileName As Variant
Dim i As Integer
   
    MyDir = ActiveWorkbook.Path 'C:\Documents and Settings\jw\Desktop\pt data\'
        strPath = MyDir 'C:\Documents and Settings\jw\Desktop\pt data\'
     
        With Application.FileSearch
            .NewSearch
            .LookIn = strPath
            .SearchSubFolders = False
            .Filename = ".csv"
     
            If .Execute > 0 Then
     
            For Each vaFileName In .FoundFiles
                    ' open the workbook
                    Workbooks.Open vaFileName
           
                    With ActiveWorkbook
                            currentName = ActiveWorkbook.Name
                        Count = ActiveSheet.UsedRange.Columns.Count
                        .SaveAs Trim(Count) + "-" + currentName
                        .Close
                    End With
                Next
            End If
        End With
    End Sub

experts.jpg

Commented:
Hi, you seem to have made great progress!
Good you added the hyphen also....  to the new name
The line .close should close all the files from Excel, and it should disappear from the right hand view as well

Is the file open in Excel as well, or is it just open in the project window?

You could try .close true (which closes the file and automatically saves) instead of .close

Good luck!




Author

Commented:
The only problem I have now gordontim is its not detecting the correct amount of columns used. The code is appending the file name with "1" for all spreadsheets. Any idea why this is happening?

Author

Commented:
Also, I cannot find where the spreadsheets are being saved to?? they are not in the active folder?

Commented:
With regard to your first comment, use the function shown below instead to determine the number of columns:
You could call it by saying count=countColumns(activeWorkbook.name,activeSheet.name)
There are more elegant ways of doing this, but it should work.

With regard to your second comment, you could replace the line .SaveAs Trim(Count) + "-" + currentName
with .SaveAs "C:\temp\"+Trim(Count) + "-" + currentName

Function countColumns(aWorkbook, aSheet)
j = 1
With Workbooks(aWorkbook).Sheets(aSheet)
  Do
    If Trim(.Cells(1, j)) = "" Then countColumns = j - 1: Exit Do
    j = j + 1
  Loop
End With
End Function

Open in new window

Author

Commented:
gordontm, Ive changed the .save function and now that is working.

Struggling with the final piece of the puzzle. Ive inserted the function command at the top of the code and when I press F5 to run the code it errors - please see the attached. I suspect its how Ive inserted the function.
experts.jpg

Author

Commented:
ok, Ive created a module and pasted the function into that. But its still throws the same error when running. It opens the first spreadsheet but when it tries to count the columns, thats when the error pops.

Run-time error '9'
Subscript out of range.

Commented:
Yes, the placing of the function is not the problem...

Maybe add the lines
 debug.print activeworkbook.name
debug.print activesheet.name
after the line saying Function(aWorkbook....)


then press F8 to go through the function line by line and tell me on what exact line there is the error? and also tell me what is printed in the debug window (press CTRL+G to see it)

Author

Commented:
When going through the code using the F8 key, it hightlights the line 'With Workbooks(aWorkbook).Sheets(aSheet)' in yellow, once i hit F8 to move to the next line, it errors. In the Immediate window, this is displayed

01.01.10 Failed.csv
01.01.10 Failed

This looks like the file name of the file which its trying to detect the amount of lines.

Commented:
maybe try replacieing the line With Workbooks(aWorkbook).Sheets(aSheet) with With ActiveSheet

Author

Commented:
Spot on, its just running through the 500+ files. Hopefully it will complete the run through, will post an update by the points are surely your gordontm - thank you.

Commented:
great! look forward to hearing from you!

Author

Commented:
Brillant solution, excellent support throughout. Thank you Gordon.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial