Link to home
Start Free TrialLog in
Avatar of Lloyd Charlier
Lloyd CharlierFlag for United States of America

asked on

Manipulating existing Excel rows and columns to new format

I have several very large Excel spreadsheets containing customer information, each record consisting of multiple rows (variable).  I need to coerce the existing data into a standardized format with a fixed number of rows, concatenating some cells into one and breaking others into separate rows.  While I don't think the VBA required to perform any one of these tasks may not be all that difficult, when I tried doing this myself I quickly hit a wall and gave up.  Perhaps someone here can assist...

The existing spreadsheet looks like this:
User generated image
Notice how the Address field can be comprised of either 3 or 4 lines (depending on whether or not there is a "Suite #) and the Business Classification(s) and Countries where work is performed fields can span multiple rows.  I need to coerce the existing spreadsheet such that:
the Address field is a concatenation of street address and optional suite # separated by a comma
the City, ST Zip information is separated into individually labeled rows (City, ST and Postal Code)
the Country info is separated into its own labeled row (Country)
the Business Classification(s) info is concatenated into one cell separated by commas
the Countries where work is performed info is concatenated into one cell separated by commas

I would like to get the resultant spreadsheet post-manipulation to have one customer record per row with each field (Name, Address, City, ST, Postal Code, Country, etc) in its own column.

The Excel spreadsheets I currently have comprise several hundreds of thousands of records spread across multiple spreadsheets.  Any assistance with the VBA necessary to massage the data into the desired format is greatly appreciated.  Thank you.
Avatar of Norm Dickinson
Norm Dickinson

In the long run the best way to approach this kind of data is to convert/import it into Access.
why not add another column and add formula to concatenate the fields?
e.g. address = B2 & B3 & B4
Avatar of Lloyd Charlier

ASKER

Norm,

Ultimately, all of this data will be imported into Access.  The only reason I asked for an Excel-to-Excel manipulation is that I thought (perhaps ignorantly) that it might be easier to implement.  Once in the desired Excel format, I already know how to import it into Access.  If you can provide guidance on the necessary VBA/queries to perform this import into Access directly, please have at it! <grin>

Thanks.
I think that to do it right, you will want to map out all of the data in a normalization map, so that you know ahead of time what fields and tables you will be putting the data into in order to achieve a workable relational database from your existing flat file. That said, manipulating the data in Excel is probably going to be a bit redundant in that regard and may be very difficult to pull off. For example, some records are going to vary in size by record, such as Countries where Work is Performed. Do you really want to take the time to put that into a comma delimited field in a spreadsheet only to have to break it back out into a table in Access? From the other perspective, this is exactly the kind of data import nearly any large database faces and is fairly common for an import. I think setting up the database with the best normalization possible and then fitting the data into the database is going to be far easier than manipulating the data as it sits and then importing it. Of course I am assuming that you are going to import into a relational database, as opposed to a flat file database, and that the benefits of the relational database concept are going to be realized.
Avatar of Jacques Geday
ok lets take it one step at a time and be practical

Appreciate you correct if my understanding is wrong

1) Your present data is layed out in a (or several) worksheet using only 2 columns A and B like the picture you have attached.

2) if 1) is correct then all the records have the label in A and the data in B and they follow ie when record1 ends, record 2 starts after the last row of record 1)

3) if 2) is correct Are all the Fields have the exact same layout (for multiple rows Filed) like Address, Buisness Classifications, Country where work is Performed
They all start with the Label Address, Buisness Classification etc.. and if data span on a second row the label is blank, the next row label is blank and the data end when there is a new field. Please confirm also this pattern is all along the data. Like you will never have Address on row1 then Address on row 2 for the same record.

4) You need a VBA that will convert this multi/non table layout data in Data Like table format having 1 row per record and having all the fields in Columns like you very clearly explained all along in your post.

Regards
gowflow
gowflow:

Thanks for the response.  I understand your concerns and will try to clear things up a bit.

The data is currently laid out in two (2) columns only.  The number of rows for each customer varies (some have an address comprising two lines if they have a suite # - others just one).  All of the "Address" info (street, optional suite, city, st zip, country) is spread across multiple rows in the "B" column but fronted by a joined 'A' label cell (Address).  The same is true for "Business Classification(s)" and "Countries where work is performed" (i.e. the data spans one or more rows in the 'B' column but is fronted by a joined 'A' label cell.  While I understand and appreciate data normalization for the latter two fields (placing the possible values in their own tables), I suggested concatenation instead for the sake of expediency and the fact that I have no visibility into all of the possible values.  For what it's worth, the importance of the latter two fields is low.

Does this answer your questions or do you require additional clarification?
yes it does but best is to have a sample of your data (that you can alter to remove confidentiality) and the more you have the better it is and will build a macro for you.
gowflow
It is really a toss up as to manipulating the data in the current format as gowflow suggests or importing it into a new database and doing the manipulation there. The advantage to the gowflow method would be if you continue to use the data for a period of time prior to actually launching the database.
gowflow,

See attached file...  Thanks.
Engineering-Firms.xlsx
Perfect for the file,

One last re

•the City, ST Zip information is separated into individually labeled rows (City, ST and Postal Code)


•the Country info is separated into its own labeled row (Country)

the end result you want
Col A    Col B        Col C   Col D     Col E                Col F
Name   Address   City     State   Postal Code    Country

? pls advise for these (shall we break also Address in its constituants or leave the Adress altogether ?

gowflow
gowflow:
In a perfect world, I suppose the 2nd optional address line (suite, apt, unit, ...) should be stored separately ("Address2").  Practically speaking however, either approach would be fine.
ASKER CERTIFIED SOLUTION
Avatar of Jacques Geday
Jacques Geday
Flag of Canada image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
gowflow,

Your script works perfectly!  I have already run it against multiple spreadsheets, the resultant spreadsheets I subsequently imported into Access to build a 150000+ record database.  I need to tweak it a bit so as to accommodate some anomalous record layouts I've encountered but all-in-all your work will prove to be a real productivity boon for me going forward.  Thanks again for the great job!
Your welcome and glad I could help.
gowflow