?
Solved

Improve Data Cleaning Macro

Posted on 2014-10-25
17
Medium Priority
?
90 Views
Last Modified: 2014-10-28
The solution to Q-28534211, attached, cleans up electoral roll address data into:
Flat ID
Building ID
Street
Address3
Address4
Postal Town
Postcode

1 anomaly remains to solve:

1. Cases where there is only one number block, and that is the Building ID, the data is erroneously shifted one field to the right. Additional code needs to examine the last word in column L in the attached example Q-28534211.xls, after the code has run) matches a lookup table of road types:
"Road, Street, Avenue, Drive, Close, Lane, Crescent, Parade, Place, Way, Row, Mews, Hill, Gardens, Grove, Walk, Square, Boulevard"
AND column I is blank, shift all three fields one to the left.
0
Comment
Question by:hawkeye_zzz
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 9
  • 8
17 Comments
 
LVL 46

Expert Comment

by:aikimark
ID: 40404237
Please post some sample lines that are not parsed properly, including how they SHOULD be parsed.
0
 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40404363
See attached.
Shows my instructions weren't accurate, there are two cases not one:
1) if a 'road type' appears in column L, and column I is blank, move J to I, K to J and L to K
2) if a 'road type' appears in column L, and column I is not blank but column J is blank, move K to J and L to K.
I've expressed it from the point of view of the first pass having been done, easier to explain that way, but better if it can all be done in one pass.
Q-28534211-showing-new-issue.xls
0
 
LVL 46

Expert Comment

by:aikimark
ID: 40404397
The lines you identified now seem to parse correctly. Please go back and check that I haven't introduced any parsing errors with the prior data.
Q-28534211.xls
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40404970
This works beautifully thank you.
There are no errors from this negating the previous, but could I ask please if you could you add code for two last anomalies:

1. If the parsed result has a value in Flat ID, but nothing in Building ID, move the value from Flat ID to Building ID.

2. If the parsed result has an "identifiable road" in 'Address 3', and blank 'Flat ID', check 'Road' is an "identifiable road" Else move 'Building ID' to 'Flat ID', 'Road' to 'Building ID', and 'Address 3' to 'Road'

("Identifiable Road" is what we've already used: Road|Street|Avenue|Drive|Close|Lane|Crescent|Parade|Place|Way|Row|Mews|Hill|Gardens|Grove|Walk|Square|Boulevard)

I attach an updated version of the code, where I've added additional types of Flat ID, now: Flat|Court|Cottage|House|Building|Floor|Lodge|Villas|View|Mansions

I'm happy to make them another question for these two additions if you want. Just ask.
0
 
LVL 46

Expert Comment

by:aikimark
ID: 40405046
what attachment?
0
 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40405050
Sorry it didn't attach. See now.
Q-28534211-revised.xls
0
 
LVL 46

Expert Comment

by:aikimark
ID: 40405071
I don't understand your description of "anomalies".  Is this related to any particular rows in the workbook you attached?

What have you done to the workbook?  It looks like the one I posted earlier.
0
 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40405612
I had only changed the code in the attachment above. ID: 40405050, to add additional types of Flat ID

I've now changed the test page to show examples of the 2 anomalies.
hope that's clear
Q-28534211-revised.xls
0
 
LVL 46

Expert Comment

by:aikimark
ID: 40405630
I need you to think about this in data pattern terms.  In the following examples, the cell data has been converted into a tilde-delimited string.
Anomaly 1: The Cottage~Double Name Way

Since you already defined flat (data) as those containing one of the following strings:
Flat|Court|Cottage|House|Building|Floor|Lodge|Villas|View|Mansions

How is this anomaly going to fit into that pattern?

=============================
I think the anomaly 2 examples have to do with the order in which the patterns are checked.  This is why I needed you to double-check the parsing.  This should be easy to find and fix.
0
 
LVL 46

Expert Comment

by:aikimark
ID: 40405636
Actually, a new pattern was required.  Anomaly 2 is handled.  Of course, you'll have to check all the rows again.
0
 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40405652
I think you have a more sophisticated sense of these patterns so I may not be thinking clearly from your point of view, but I've been seeing the patterns in terms of looking for the number blocks primarily, whereas you've done it very successfully in terms of key term recognition.

In number block terms:
if there are no number blocks, the Cottage~House~Building Flat ID should go in the Building ID, because there's no building ID number.
If there are two number blocks, the second one goes in Building ID and everything to the left of it goes in Flat ID.

Doing it by key word is a different approach. It has produced a great result, and may be incompatible with the number block way of looking at things. I don't want to rewrite the whole thing, just tidy up these last two - as long as it's not too complex - that hadn't come out quite right.
In other words:
for Anomaly 1, there's no Building ID, so shift the Flat ID into Building ID
for Anomaly 2, there's a road/street name in L and not in K, so we need to shift everything left one column.

in Anomaly 2, 'Lodge' is a flat ID so I'm not sure why it's all ended up one column along. But there are also lots of unique flatnames like 'brookbank' which we can't keyword, and that may be why they've ended up shifted right one column.

Does that help?
0
 
LVL 46

Expert Comment

by:aikimark
ID: 40405655
Does that help?
Maybe.

if there are no number blocks, the Cottage~House~Building Flat ID should go in the Building ID, because there's no building ID number.
Are there any other rows that have no numbers that might be affected by this?
0
 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40405661
No, as far as I can see, every instance where there is no number block at all, we've dropped the data into Flat ID and in these cases should drop into building ID instead.
0
 
LVL 46

Expert Comment

by:aikimark
ID: 40405683
It's after midnight here on the East coast.  I'll look at this when I awake.
0
 
LVL 46

Accepted Solution

by:
aikimark earned 2000 total points
ID: 40406559
I reassigned the column field mapping for anomaly 1.  Please test thoroughly.
Q-28534211-revised--1-.xls
0
 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40408169
This is fantastic! thank you! That works really well.

I have two more little anomalies which I have posted to a new question - should be quite quick ones:
http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/Excel/Q_28545779.html
0
 
LVL 1

Author Closing Comment

by:hawkeye_zzz
ID: 40408170
Fantastic! great code :)
0

Featured Post

Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This code takes an Excel list of URL’s and adds a header titled “URL List”. It then searches through all URL’s in column “A”, looking for duplicates. When a duplicate is found, it is moved to the top of the list. The duplicate URL’s are then highlig…
This article describes how you can use Custom Document Properties to store settings and other information in your workbook so that they will be available the next time you open the workbook.
Many functions in Excel can make decisions. The most simple of these is the IF function: it returns a value depending on whether a condition you describe is true or false. Once you get the hang of using the IF function, you will find it easier to us…
Finds all prime numbers in a range requested and places them in a public primes() array. I've demostrated a template size of 30 (2 * 3 * 5) but larger templates can be built such 210  (2 * 3 * 5 * 7) or 2310  (2 * 3 * 5 * 7 * 11). The larger templa…

770 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question