Solved

Improve Data Cleaning Macro

Posted on 2014-10-25
17
82 Views
Last Modified: 2014-10-28
The solution to Q-28534211, attached, cleans up electoral roll address data into:
Flat ID
Building ID
Street
Address3
Address4
Postal Town
Postcode

1 anomaly remains to solve:

1. Cases where there is only one number block, and that is the Building ID, the data is erroneously shifted one field to the right. Additional code needs to examine the last word in column L in the attached example Q-28534211.xls, after the code has run) matches a lookup table of road types:
"Road, Street, Avenue, Drive, Close, Lane, Crescent, Parade, Place, Way, Row, Mews, Hill, Gardens, Grove, Walk, Square, Boulevard"
AND column I is blank, shift all three fields one to the left.
0
Comment
Question by:hawkeye_zzz
  • 9
  • 8
17 Comments
 
LVL 45

Expert Comment

by:aikimark
ID: 40404237
Please post some sample lines that are not parsed properly, including how they SHOULD be parsed.
0
 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40404363
See attached.
Shows my instructions weren't accurate, there are two cases not one:
1) if a 'road type' appears in column L, and column I is blank, move J to I, K to J and L to K
2) if a 'road type' appears in column L, and column I is not blank but column J is blank, move K to J and L to K.
I've expressed it from the point of view of the first pass having been done, easier to explain that way, but better if it can all be done in one pass.
Q-28534211-showing-new-issue.xls
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40404397
The lines you identified now seem to parse correctly. Please go back and check that I haven't introduced any parsing errors with the prior data.
Q-28534211.xls
0
 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40404970
This works beautifully thank you.
There are no errors from this negating the previous, but could I ask please if you could you add code for two last anomalies:

1. If the parsed result has a value in Flat ID, but nothing in Building ID, move the value from Flat ID to Building ID.

2. If the parsed result has an "identifiable road" in 'Address 3', and blank 'Flat ID', check 'Road' is an "identifiable road" Else move 'Building ID' to 'Flat ID', 'Road' to 'Building ID', and 'Address 3' to 'Road'

("Identifiable Road" is what we've already used: Road|Street|Avenue|Drive|Close|Lane|Crescent|Parade|Place|Way|Row|Mews|Hill|Gardens|Grove|Walk|Square|Boulevard)

I attach an updated version of the code, where I've added additional types of Flat ID, now: Flat|Court|Cottage|House|Building|Floor|Lodge|Villas|View|Mansions

I'm happy to make them another question for these two additions if you want. Just ask.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40405046
what attachment?
0
 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40405050
Sorry it didn't attach. See now.
Q-28534211-revised.xls
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40405071
I don't understand your description of "anomalies".  Is this related to any particular rows in the workbook you attached?

What have you done to the workbook?  It looks like the one I posted earlier.
0
 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40405612
I had only changed the code in the attachment above. ID: 40405050, to add additional types of Flat ID

I've now changed the test page to show examples of the 2 anomalies.
hope that's clear
Q-28534211-revised.xls
0
Maximize Your Threat Intelligence Reporting

Reporting is one of the most important and least talked about aspects of a world-class threat intelligence program. Here’s how to do it right.

 
LVL 45

Expert Comment

by:aikimark
ID: 40405630
I need you to think about this in data pattern terms.  In the following examples, the cell data has been converted into a tilde-delimited string.
Anomaly 1: The Cottage~Double Name Way

Since you already defined flat (data) as those containing one of the following strings:
Flat|Court|Cottage|House|Building|Floor|Lodge|Villas|View|Mansions

How is this anomaly going to fit into that pattern?

=============================
I think the anomaly 2 examples have to do with the order in which the patterns are checked.  This is why I needed you to double-check the parsing.  This should be easy to find and fix.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40405636
Actually, a new pattern was required.  Anomaly 2 is handled.  Of course, you'll have to check all the rows again.
0
 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40405652
I think you have a more sophisticated sense of these patterns so I may not be thinking clearly from your point of view, but I've been seeing the patterns in terms of looking for the number blocks primarily, whereas you've done it very successfully in terms of key term recognition.

In number block terms:
if there are no number blocks, the Cottage~House~Building Flat ID should go in the Building ID, because there's no building ID number.
If there are two number blocks, the second one goes in Building ID and everything to the left of it goes in Flat ID.

Doing it by key word is a different approach. It has produced a great result, and may be incompatible with the number block way of looking at things. I don't want to rewrite the whole thing, just tidy up these last two - as long as it's not too complex - that hadn't come out quite right.
In other words:
for Anomaly 1, there's no Building ID, so shift the Flat ID into Building ID
for Anomaly 2, there's a road/street name in L and not in K, so we need to shift everything left one column.

in Anomaly 2, 'Lodge' is a flat ID so I'm not sure why it's all ended up one column along. But there are also lots of unique flatnames like 'brookbank' which we can't keyword, and that may be why they've ended up shifted right one column.

Does that help?
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40405655
Does that help?
Maybe.

if there are no number blocks, the Cottage~House~Building Flat ID should go in the Building ID, because there's no building ID number.
Are there any other rows that have no numbers that might be affected by this?
0
 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40405661
No, as far as I can see, every instance where there is no number block at all, we've dropped the data into Flat ID and in these cases should drop into building ID instead.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40405683
It's after midnight here on the East coast.  I'll look at this when I awake.
0
 
LVL 45

Accepted Solution

by:
aikimark earned 500 total points
ID: 40406559
I reassigned the column field mapping for anomaly 1.  Please test thoroughly.
Q-28534211-revised--1-.xls
0
 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40408169
This is fantastic! thank you! That works really well.

I have two more little anomalies which I have posted to a new question - should be quite quick ones:
http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/Excel/Q_28545779.html
0
 
LVL 1

Author Closing Comment

by:hawkeye_zzz
ID: 40408170
Fantastic! great code :)
0

Featured Post

What Is Threat Intelligence?

Threat intelligence is often discussed, but rarely understood. Starting with a precise definition, along with clear business goals, is essential.

Join & Write a Comment

What is a Form List Box? (skip if you know this) The forms List Box is the alternative to the ActiveX list box. If you are using excel 2007, you first make sure you have a developer tab (click the Orb)->"Excel Options"->Popular->"Show Developer tab…
This code takes an Excel list of URL’s and adds a header titled “URL List”. It then searches through all URL’s in column “A”, looking for duplicates. When a duplicate is found, it is moved to the top of the list. The duplicate URL’s are then highlig…
The view will learn how to download and install SIMTOOLS and FORMLIST into Excel, how to use SIMTOOLS to generate a Monte Carlo simulation of 30 sales calls, and how to calculate the conditional probability based on the results of the Monte Carlo …
This Micro Tutorial will demonstrate in Google Sheets how to use the HYPERLINK function to create live links inside your spreadsheet.

759 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

21 Experts available now in Live!

Get 1:1 Help Now