Solved

Manipulating existing Excel rows and columns to new format

Posted on 2014-03-20
14
214 Views
Last Modified: 2014-03-27
I have several very large Excel spreadsheets containing customer information, each record consisting of multiple rows (variable).  I need to coerce the existing data into a standardized format with a fixed number of rows, concatenating some cells into one and breaking others into separate rows.  While I don't think the VBA required to perform any one of these tasks may not be all that difficult, when I tried doing this myself I quickly hit a wall and gave up.  Perhaps someone here can assist...

The existing spreadsheet looks like this:
Existing Excel spreadsheet layout
Notice how the Address field can be comprised of either 3 or 4 lines (depending on whether or not there is a "Suite #) and the Business Classification(s) and Countries where work is performed fields can span multiple rows.  I need to coerce the existing spreadsheet such that:
the Address field is a concatenation of street address and optional suite # separated by a comma
the City, ST Zip information is separated into individually labeled rows (City, ST and Postal Code)
the Country info is separated into its own labeled row (Country)
the Business Classification(s) info is concatenated into one cell separated by commas
the Countries where work is performed info is concatenated into one cell separated by commas

I would like to get the resultant spreadsheet post-manipulation to have one customer record per row with each field (Name, Address, City, ST, Postal Code, Country, etc) in its own column.

The Excel spreadsheets I currently have comprise several hundreds of thousands of records spread across multiple spreadsheets.  Any assistance with the VBA necessary to massage the data into the desired format is greatly appreciated.  Thank you.
0
Comment
Question by:Lloyd Charlier
  • 5
  • 5
  • 3
  • +1
14 Comments
 
LVL 13

Expert Comment

by:Norm Dickinson
Comment Utility
In the long run the best way to approach this kind of data is to convert/import it into Access.
0
 
LVL 7

Expert Comment

by:COACHMAN99
Comment Utility
why not add another column and add formula to concatenate the fields?
e.g. address = B2 & B3 & B4
0
 

Author Comment

by:Lloyd Charlier
Comment Utility
Norm,

Ultimately, all of this data will be imported into Access.  The only reason I asked for an Excel-to-Excel manipulation is that I thought (perhaps ignorantly) that it might be easier to implement.  Once in the desired Excel format, I already know how to import it into Access.  If you can provide guidance on the necessary VBA/queries to perform this import into Access directly, please have at it! <grin>

Thanks.
0
 
LVL 13

Expert Comment

by:Norm Dickinson
Comment Utility
I think that to do it right, you will want to map out all of the data in a normalization map, so that you know ahead of time what fields and tables you will be putting the data into in order to achieve a workable relational database from your existing flat file. That said, manipulating the data in Excel is probably going to be a bit redundant in that regard and may be very difficult to pull off. For example, some records are going to vary in size by record, such as Countries where Work is Performed. Do you really want to take the time to put that into a comma delimited field in a spreadsheet only to have to break it back out into a table in Access? From the other perspective, this is exactly the kind of data import nearly any large database faces and is fairly common for an import. I think setting up the database with the best normalization possible and then fitting the data into the database is going to be far easier than manipulating the data as it sits and then importing it. Of course I am assuming that you are going to import into a relational database, as opposed to a flat file database, and that the benefits of the relational database concept are going to be realized.
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
ok lets take it one step at a time and be practical

Appreciate you correct if my understanding is wrong

1) Your present data is layed out in a (or several) worksheet using only 2 columns A and B like the picture you have attached.

2) if 1) is correct then all the records have the label in A and the data in B and they follow ie when record1 ends, record 2 starts after the last row of record 1)

3) if 2) is correct Are all the Fields have the exact same layout (for multiple rows Filed) like Address, Buisness Classifications, Country where work is Performed
They all start with the Label Address, Buisness Classification etc.. and if data span on a second row the label is blank, the next row label is blank and the data end when there is a new field. Please confirm also this pattern is all along the data. Like you will never have Address on row1 then Address on row 2 for the same record.

4) You need a VBA that will convert this multi/non table layout data in Data Like table format having 1 row per record and having all the fields in Columns like you very clearly explained all along in your post.

Regards
gowflow
0
 

Author Comment

by:Lloyd Charlier
Comment Utility
gowflow:

Thanks for the response.  I understand your concerns and will try to clear things up a bit.

The data is currently laid out in two (2) columns only.  The number of rows for each customer varies (some have an address comprising two lines if they have a suite # - others just one).  All of the "Address" info (street, optional suite, city, st zip, country) is spread across multiple rows in the "B" column but fronted by a joined 'A' label cell (Address).  The same is true for "Business Classification(s)" and "Countries where work is performed" (i.e. the data spans one or more rows in the 'B' column but is fronted by a joined 'A' label cell.  While I understand and appreciate data normalization for the latter two fields (placing the possible values in their own tables), I suggested concatenation instead for the sake of expediency and the fact that I have no visibility into all of the possible values.  For what it's worth, the importance of the latter two fields is low.

Does this answer your questions or do you require additional clarification?
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
yes it does but best is to have a sample of your data (that you can alter to remove confidentiality) and the more you have the better it is and will build a macro for you.
gowflow
0
What Should I Do With This Threat Intelligence?

Are you wondering if you actually need threat intelligence? The answer is yes. We explain the basics for creating useful threat intelligence.

 
LVL 13

Expert Comment

by:Norm Dickinson
Comment Utility
It is really a toss up as to manipulating the data in the current format as gowflow suggests or importing it into a new database and doing the manipulation there. The advantage to the gowflow method would be if you continue to use the data for a period of time prior to actually launching the database.
0
 

Author Comment

by:Lloyd Charlier
Comment Utility
gowflow,

See attached file...  Thanks.
Engineering-Firms.xlsx
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
Perfect for the file,

One last re

•the City, ST Zip information is separated into individually labeled rows (City, ST and Postal Code)


•the Country info is separated into its own labeled row (Country)

the end result you want
Col A    Col B        Col C   Col D     Col E                Col F
Name   Address   City     State   Postal Code    Country

? pls advise for these (shall we break also Address in its constituants or leave the Adress altogether ?

gowflow
0
 

Author Comment

by:Lloyd Charlier
Comment Utility
gowflow:
In a perfect world, I suppose the 2nd optional address line (suite, apt, unit, ...) should be stored separately ("Address2").  Practically speaking however, either approach would be fine.
0
 
LVL 29

Accepted Solution

by:
gowflow earned 500 total points
Comment Utility
Hi

Sorry it took sometime, as was a bit busy and needed to make sure data is converted correctly.

I have attached the macro in the file posted pls make sure macros are enabled and in sheet1 Activate the button Transfer Data to Table and check results in DataTable sheet.

Here is the code for that.

Sub CreateDataTable()
Dim WS As Worksheet
Dim WSTable As Worksheet
Dim MaxRow As Long, I As Long, J As Long
Dim Title As Variant, CSC As Variant

'---> Disable Events
With Application
    .EnableEvents = False
    .DisplayAlerts = False
    .ScreenUpdating = False
End With

Set WS = ActiveSheet
Set WSTable = Sheets("DataTable")
MaxRow = WS.Range("B" & WS.Rows.Count).End(xlUp).Row

'---> Clean DataTable
WSTable.Cells.Delete

'---> Generate Header
Title = Array("Name", "Address1", "Address2", "City", "State", "Postal Code", "Country", "Phone", "Fax", "Website", "Primary Contact", "Description", "Total Number of employees", "Business Classifications", "Countries where work is performed")
WSTable.Range(WSTable.Range("A1"), WSTable.Cells(1, 15)).Value = Title
WSTable.UsedRange.EntireColumn.AutoFit
J = 1

'---> Start Process Creating DataTable
For I = 1 To MaxRow
    
    Select Case WS.Cells(I, "A")
        Case "Name:", "Phone:", "Fax:", "Website:", "Primary Contact:", "Description:", "Total Number of employees:"
            
            '---> Increment Row
            If WS.Cells(I, "A") = "Name:" Then
                J = J + 1
                WSTable.UsedRange.EntireColumn.AutoFit
            End If
            
            For K = 0 To UBound(Title)
                If InStr(1, WS.Cells(I, "A"), Title(K)) <> 0 Then
                    Exit For
                End If
            Next K
            WSTable.Cells(J, K + 1) = WS.Cells(I, "B")
        
        Case "Address:"
            '---> Address1
            WSTable.Cells(J, 2) = WS.Cells(I, "B")
            I = I + 1
            
            '---> Address2
            If WS.Range("A" & I).End(xlDown).Row - I > 2 Then
            'If InStr(1, LCase(WS.Cells(I, "B")), "suite") <> 0 Then
                WSTable.Cells(J, 3) = WS.Cells(I, "B")
                I = I + 1
            End If
                
            '---> City, State, Postal
            '---> City is all before the comma
            CSC = Split(WS.Cells(I, "B"), ",")
            WSTable.Cells(J, 4) = CSC(0)
            
            CSC = Split(Trim(CSC(1)), " ")
            For K = LBound(CSC) To UBound(CSC)
                WSTable.Cells(J, K + 5) = CSC(K)
            Next K
            I = I + 1
            
            '---> Country
            WSTable.Cells(J, 7) = WS.Cells(I, "B")
            
        Case "Business Classifications:", "Countries where work is performed:"
            For K = 0 To UBound(Title)
                If InStr(1, WS.Cells(I, "A"), Title(K)) <> 0 Then
                    Exit For
                End If
            Next K
            Do
                If WSTable.Cells(J, K + 1) <> "" Then WSTable.Cells(J, K + 1) = WSTable.Cells(J, K + 1) & ", "
                WSTable.Cells(J, K + 1) = WSTable.Cells(J, K + 1) & WS.Cells(I, "B")
                I = I + 1
            Loop Until WS.Cells(I, "A") <> ""
            I = I - 1
            
    End Select

Next I

'---> Enable Events
With Application
    .EnableEvents = False
    .DisplayAlerts = False
    .ScreenUpdating = False
End With


MsgBox ("All data Transfered successfully for " & J & " records")

End Sub

Open in new window



Let me know your comments.
gowflow
Engineering-Firms.xlsm
0
 

Author Closing Comment

by:Lloyd Charlier
Comment Utility
gowflow,

Your script works perfectly!  I have already run it against multiple spreadsheets, the resultant spreadsheets I subsequently imported into Access to build a 150000+ record database.  I need to tweak it a bit so as to accommodate some anomalous record layouts I've encountered but all-in-all your work will prove to be a real productivity boon for me going forward.  Thanks again for the great job!
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
Your welcome and glad I could help.
gowflow
0

Featured Post

IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

HOW TO: Install and Configure VMware vSphere Hypervisor 6.5 (ESXi 6.5), Step by Step Tutorial with screenshots. From Download, Checking Media, to Completed Installation.
Never store passwords in plain text or just their hash: it seems a no-brainier, but there are still plenty of people doing that. I present the why and how on this subject, offering my own real life solution that you can implement right away, bringin…
Viewers will learn how to maximize accessibility options in an Excel workbook for users with accessibility issues.
Viewers will learn how to create a PivotTable and make basic changes to it in Excel 2013.

763 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

6 Experts available now in Live!

Get 1:1 Help Now