Solved

Improve Data Cleaning Macro

Posted on 2014-10-25
17
88 Views
Last Modified: 2014-10-28
The solution to Q-28534211, attached, cleans up electoral roll address data into:
Flat ID
Building ID
Street
Address3
Address4
Postal Town
Postcode

1 anomaly remains to solve:

1. Cases where there is only one number block, and that is the Building ID, the data is erroneously shifted one field to the right. Additional code needs to examine the last word in column L in the attached example Q-28534211.xls, after the code has run) matches a lookup table of road types:
"Road, Street, Avenue, Drive, Close, Lane, Crescent, Parade, Place, Way, Row, Mews, Hill, Gardens, Grove, Walk, Square, Boulevard"
AND column I is blank, shift all three fields one to the left.
0
Comment
Question by:hawkeye_zzz
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 9
  • 8
17 Comments
 
LVL 45

Expert Comment

by:aikimark
ID: 40404237
Please post some sample lines that are not parsed properly, including how they SHOULD be parsed.
0
 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40404363
See attached.
Shows my instructions weren't accurate, there are two cases not one:
1) if a 'road type' appears in column L, and column I is blank, move J to I, K to J and L to K
2) if a 'road type' appears in column L, and column I is not blank but column J is blank, move K to J and L to K.
I've expressed it from the point of view of the first pass having been done, easier to explain that way, but better if it can all be done in one pass.
Q-28534211-showing-new-issue.xls
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40404397
The lines you identified now seem to parse correctly. Please go back and check that I haven't introduced any parsing errors with the prior data.
Q-28534211.xls
0
Creating Instructional Tutorials  

For Any Use & On Any Platform

Contextual Guidance at the moment of need helps your employees/users adopt software o& achieve even the most complex tasks instantly. Boost knowledge retention, software adoption & employee engagement with easy solution.

 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40404970
This works beautifully thank you.
There are no errors from this negating the previous, but could I ask please if you could you add code for two last anomalies:

1. If the parsed result has a value in Flat ID, but nothing in Building ID, move the value from Flat ID to Building ID.

2. If the parsed result has an "identifiable road" in 'Address 3', and blank 'Flat ID', check 'Road' is an "identifiable road" Else move 'Building ID' to 'Flat ID', 'Road' to 'Building ID', and 'Address 3' to 'Road'

("Identifiable Road" is what we've already used: Road|Street|Avenue|Drive|Close|Lane|Crescent|Parade|Place|Way|Row|Mews|Hill|Gardens|Grove|Walk|Square|Boulevard)

I attach an updated version of the code, where I've added additional types of Flat ID, now: Flat|Court|Cottage|House|Building|Floor|Lodge|Villas|View|Mansions

I'm happy to make them another question for these two additions if you want. Just ask.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40405046
what attachment?
0
 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40405050
Sorry it didn't attach. See now.
Q-28534211-revised.xls
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40405071
I don't understand your description of "anomalies".  Is this related to any particular rows in the workbook you attached?

What have you done to the workbook?  It looks like the one I posted earlier.
0
 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40405612
I had only changed the code in the attachment above. ID: 40405050, to add additional types of Flat ID

I've now changed the test page to show examples of the 2 anomalies.
hope that's clear
Q-28534211-revised.xls
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40405630
I need you to think about this in data pattern terms.  In the following examples, the cell data has been converted into a tilde-delimited string.
Anomaly 1: The Cottage~Double Name Way

Since you already defined flat (data) as those containing one of the following strings:
Flat|Court|Cottage|House|Building|Floor|Lodge|Villas|View|Mansions

How is this anomaly going to fit into that pattern?

=============================
I think the anomaly 2 examples have to do with the order in which the patterns are checked.  This is why I needed you to double-check the parsing.  This should be easy to find and fix.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40405636
Actually, a new pattern was required.  Anomaly 2 is handled.  Of course, you'll have to check all the rows again.
0
 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40405652
I think you have a more sophisticated sense of these patterns so I may not be thinking clearly from your point of view, but I've been seeing the patterns in terms of looking for the number blocks primarily, whereas you've done it very successfully in terms of key term recognition.

In number block terms:
if there are no number blocks, the Cottage~House~Building Flat ID should go in the Building ID, because there's no building ID number.
If there are two number blocks, the second one goes in Building ID and everything to the left of it goes in Flat ID.

Doing it by key word is a different approach. It has produced a great result, and may be incompatible with the number block way of looking at things. I don't want to rewrite the whole thing, just tidy up these last two - as long as it's not too complex - that hadn't come out quite right.
In other words:
for Anomaly 1, there's no Building ID, so shift the Flat ID into Building ID
for Anomaly 2, there's a road/street name in L and not in K, so we need to shift everything left one column.

in Anomaly 2, 'Lodge' is a flat ID so I'm not sure why it's all ended up one column along. But there are also lots of unique flatnames like 'brookbank' which we can't keyword, and that may be why they've ended up shifted right one column.

Does that help?
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40405655
Does that help?
Maybe.

if there are no number blocks, the Cottage~House~Building Flat ID should go in the Building ID, because there's no building ID number.
Are there any other rows that have no numbers that might be affected by this?
0
 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40405661
No, as far as I can see, every instance where there is no number block at all, we've dropped the data into Flat ID and in these cases should drop into building ID instead.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40405683
It's after midnight here on the East coast.  I'll look at this when I awake.
0
 
LVL 45

Accepted Solution

by:
aikimark earned 500 total points
ID: 40406559
I reassigned the column field mapping for anomaly 1.  Please test thoroughly.
Q-28534211-revised--1-.xls
0
 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40408169
This is fantastic! thank you! That works really well.

I have two more little anomalies which I have posted to a new question - should be quite quick ones:
http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/Excel/Q_28545779.html
0
 
LVL 1

Author Closing Comment

by:hawkeye_zzz
ID: 40408170
Fantastic! great code :)
0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

How to quickly and accurately populate Word documents with Excel data, charts and images (including Automated Bookmark generation) David Miller (dlmille) Synopsis In this article you’ll learn how to use ExcelToWord! to copy data,charts, shapes …
This tutorial explains how to create a series of drop-down lists that are dependent upon prior selections to guide (“force”) the user to make the correct selection and reduce data errors within Microsoft Excel. Excel 2010 was used for this tutorial;…
This Micro Tutorial demonstrate the bugs in Microsoft Excel for Mac with Pivot Charts.
This Micro Tutorial will demonstrate on a Mac how to change the sort order for chart legend values and decrpyt the intimidating chart menu.

688 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question