Improve company productivity with a Business Account.Sign Up

  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 388
  • Last Modified:

importing text data into excel

Hi Experts;

I have a text file that i need to bring into Excel.  Problem is the data from the text file is all in one column, when brought into the Excel file i need it separated into separate columns.  Here is an Example of what i have;
Anna Lee Abelson 26 Phillips St. Andover, MA  01810
Robert Abisi 349 South Main Street Andover, Ma. 01810
Abdesslem Aboutori 6 Miller St Methuen, Ma 01840
Denise  Abraham 56 Porter Rd. Andover, MA  01810
Abrams, Mr. M. 13 York Street Andover, MA  01810
Rafael Acevedo 42 Boxford Street Lawrence, Ma. 01843
 George & Allison Adam 37 Smithshire Andover, MA  01810

Here is what i need it to look line;
FName        LName           Address                                     City               State    ZIP

Denise                     Abraham       56 Porter Rd.                             Andover       MA      01810
Mr. M.                     Abrams          13 York Street                           Andover        MA      01810
Rafael                      Acevedo        42 Boxford Street                     Lawrence      Ma.      01843
George &  Allison  Adam             37 Smithshire                            Andover        MA  01810

is there a way to bring in the data from the text file into Excel that will separate them into each column?
4 Solutions
dani gammonProduct Operations ManagerCommented:
Hi there,

You can use "Text to Columns" to separate your columns.

Highlight your column and go to: Data --> Text to Columns --> choose 'delimited' --> click next --> choose 'space' --> click next --> click finish.

Your numbers will be separated into separate columns but you'll need to adjust some of the data.
if you go to the data tab and 'get external data, from text' you can open the file directly and choose 'delimited' with 'space'

This will not be perfect as it will separate "George & Alison" into three separate cells and 'St Methuen' into two for example (street address will be worse). correcting that will be a manual chore.

Given the way the data has been presented there is no choice but a manual approach.

If you have the resource and the time you could go through the text and separate appropriate groups with commas then use the same approach but use 'delimited by', comma which might be easier than editing in excel
BeaucairejAuthor Commented:
There are over 3600 line entries and that will not be possible.  i need an automated process, if possible.

thanks for the help.
Anne TroyEast Coast ManagerCommented:
We may be able to help if we can see your file, but as you have explained and demonstrated the data, there would be no automated method. However, there may be methods that would lessen the scrubbing requirements.
There is no automated way, in excel or *any* other computer application, to identify the correct resolution of "George Smith" compared to "George & Alison Smith" unless you have a list of valid first names, surnames and special processing to recognise the & and all the data has been accurately typed. Even then it would be confused by "Michael York 23 York St York" unless it also had a list of street names and states to compare to as well. Add in the fact that there are differences in spelling (MA vs Ma.) and you have a further set of problems to deal with. By the time you have completed a set of rules and supporting lists you will have exceeded the time needed to clean the data by eye - its one of the few things human's do better than computers.

The only method to clean name and address data effectively - without a dedicated application using all the above lists and rules is to have people review it, this is a perennial problem in institutions who deal with large amounts of typed data and no-one has managed a 100% solution despite the millions spent on it.

however all that attention to the problem has led to methods of scrubbing which can be applied as dreamboat suggests.

For example: Once you have loaded the data as described above you can count the number of fields created in each row and that would give you an idea of which are clearly wrong and which are almost right (though they would still need an individual check). Similarly it would relatively easy to identify well formed (though not necessarily valid) zip codes (because there is a comma and rules about what a zip code is whereas the name/address data is subject to variations and the vagaries of human data entry translation, misspelling and abbreviation)

The approach would be to run through multiple passes of the data and extract the correct items generated by each pass until you have a final smaller set with intractable problems that have to be individually corrected. Given the sample and the small number of records you have I would estimate that  you could handle the whole list in a few of days  or with small outlay you could hand it over to a commercial data cleaning firm.

For example: apply the text to columns idea from D4N1 above and put your field names at the top of the first 6 columns then name the subsequent columns as Extra1, Extra2 etc.
use autofilter to identify all items where all these extra's are blank and then cut and paste that set to a new sheet. Manually review the sheet and confirm the contents then go back to the main sheet and filter on an individual Extra being blank, you will then have another group of records, which you know are incorrect, and you manually fix those and so on.

Given a similar problem at a local client when they migrated to a new accounting system I employed a team of students to work a holiday weekend and we cleaned 10,000 records quite effectively
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

The 14th Annual Expert Award Winners

The results are in! Meet the top members of our 2017 Expert Awards. Congratulations to all who qualified!

Tackle projects and never again get stuck behind a technical roadblock.
Join Now