Improve Data Cleaning Macro

The solution to Q-28534211, attached, cleans up electoral roll address data into:
Flat ID
Building ID
Street
Address3
Address4
Postal Town
Postcode

1 anomaly remains to solve:

1. Cases where there is only one number block, and that is the Building ID, the data is erroneously shifted one field to the right. Additional code needs to examine the last word in column L in the attached example Q-28534211.xls, after the code has run) matches a lookup table of road types:
"Road, Street, Avenue, Drive, Close, Lane, Crescent, Parade, Place, Way, Row, Mews, Hill, Gardens, Grove, Walk, Square, Boulevard"
AND column I is blank, shift all three fields one to the left.
LVL 1
hawkeye_zzzAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

aikimarkCommented:
Please post some sample lines that are not parsed properly, including how they SHOULD be parsed.
0
hawkeye_zzzAuthor Commented:
See attached.
Shows my instructions weren't accurate, there are two cases not one:
1) if a 'road type' appears in column L, and column I is blank, move J to I, K to J and L to K
2) if a 'road type' appears in column L, and column I is not blank but column J is blank, move K to J and L to K.
I've expressed it from the point of view of the first pass having been done, easier to explain that way, but better if it can all be done in one pass.
Q-28534211-showing-new-issue.xls
0
aikimarkCommented:
The lines you identified now seem to parse correctly. Please go back and check that I haven't introduced any parsing errors with the prior data.
Q-28534211.xls
0
Get your problem seen by more experts

Be seen. Boost your question’s priority for more expert views and faster solutions

hawkeye_zzzAuthor Commented:
This works beautifully thank you.
There are no errors from this negating the previous, but could I ask please if you could you add code for two last anomalies:

1. If the parsed result has a value in Flat ID, but nothing in Building ID, move the value from Flat ID to Building ID.

2. If the parsed result has an "identifiable road" in 'Address 3', and blank 'Flat ID', check 'Road' is an "identifiable road" Else move 'Building ID' to 'Flat ID', 'Road' to 'Building ID', and 'Address 3' to 'Road'

("Identifiable Road" is what we've already used: Road|Street|Avenue|Drive|Close|Lane|Crescent|Parade|Place|Way|Row|Mews|Hill|Gardens|Grove|Walk|Square|Boulevard)

I attach an updated version of the code, where I've added additional types of Flat ID, now: Flat|Court|Cottage|House|Building|Floor|Lodge|Villas|View|Mansions

I'm happy to make them another question for these two additions if you want. Just ask.
0
aikimarkCommented:
what attachment?
0
hawkeye_zzzAuthor Commented:
Sorry it didn't attach. See now.
Q-28534211-revised.xls
0
aikimarkCommented:
I don't understand your description of "anomalies".  Is this related to any particular rows in the workbook you attached?

What have you done to the workbook?  It looks like the one I posted earlier.
0
hawkeye_zzzAuthor Commented:
I had only changed the code in the attachment above. ID: 40405050, to add additional types of Flat ID

I've now changed the test page to show examples of the 2 anomalies.
hope that's clear
Q-28534211-revised.xls
0
aikimarkCommented:
I need you to think about this in data pattern terms.  In the following examples, the cell data has been converted into a tilde-delimited string.
Anomaly 1: The Cottage~Double Name Way

Since you already defined flat (data) as those containing one of the following strings:
Flat|Court|Cottage|House|Building|Floor|Lodge|Villas|View|Mansions

How is this anomaly going to fit into that pattern?

=============================
I think the anomaly 2 examples have to do with the order in which the patterns are checked.  This is why I needed you to double-check the parsing.  This should be easy to find and fix.
0
aikimarkCommented:
Actually, a new pattern was required.  Anomaly 2 is handled.  Of course, you'll have to check all the rows again.
0
hawkeye_zzzAuthor Commented:
I think you have a more sophisticated sense of these patterns so I may not be thinking clearly from your point of view, but I've been seeing the patterns in terms of looking for the number blocks primarily, whereas you've done it very successfully in terms of key term recognition.

In number block terms:
if there are no number blocks, the Cottage~House~Building Flat ID should go in the Building ID, because there's no building ID number.
If there are two number blocks, the second one goes in Building ID and everything to the left of it goes in Flat ID.

Doing it by key word is a different approach. It has produced a great result, and may be incompatible with the number block way of looking at things. I don't want to rewrite the whole thing, just tidy up these last two - as long as it's not too complex - that hadn't come out quite right.
In other words:
for Anomaly 1, there's no Building ID, so shift the Flat ID into Building ID
for Anomaly 2, there's a road/street name in L and not in K, so we need to shift everything left one column.

in Anomaly 2, 'Lodge' is a flat ID so I'm not sure why it's all ended up one column along. But there are also lots of unique flatnames like 'brookbank' which we can't keyword, and that may be why they've ended up shifted right one column.

Does that help?
0
aikimarkCommented:
Does that help?
Maybe.

if there are no number blocks, the Cottage~House~Building Flat ID should go in the Building ID, because there's no building ID number.
Are there any other rows that have no numbers that might be affected by this?
0
hawkeye_zzzAuthor Commented:
No, as far as I can see, every instance where there is no number block at all, we've dropped the data into Flat ID and in these cases should drop into building ID instead.
0
aikimarkCommented:
It's after midnight here on the East coast.  I'll look at this when I awake.
0
aikimarkCommented:
I reassigned the column field mapping for anomaly 1.  Please test thoroughly.
Q-28534211-revised--1-.xls
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
hawkeye_zzzAuthor Commented:
This is fantastic! thank you! That works really well.

I have two more little anomalies which I have posted to a new question - should be quite quick ones:
http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/Excel/Q_28545779.html
0
hawkeye_zzzAuthor Commented:
Fantastic! great code :)
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft Excel

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.