Solved

Improve Data Cleaning Macro

Posted on 2014-10-25
17
84 Views
Last Modified: 2014-10-28
The solution to Q-28534211, attached, cleans up electoral roll address data into:
Flat ID
Building ID
Street
Address3
Address4
Postal Town
Postcode

1 anomaly remains to solve:

1. Cases where there is only one number block, and that is the Building ID, the data is erroneously shifted one field to the right. Additional code needs to examine the last word in column L in the attached example Q-28534211.xls, after the code has run) matches a lookup table of road types:
"Road, Street, Avenue, Drive, Close, Lane, Crescent, Parade, Place, Way, Row, Mews, Hill, Gardens, Grove, Walk, Square, Boulevard"
AND column I is blank, shift all three fields one to the left.
0
Comment
Question by:hawkeye_zzz
  • 9
  • 8
17 Comments
 
LVL 45

Expert Comment

by:aikimark
ID: 40404237
Please post some sample lines that are not parsed properly, including how they SHOULD be parsed.
0
 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40404363
See attached.
Shows my instructions weren't accurate, there are two cases not one:
1) if a 'road type' appears in column L, and column I is blank, move J to I, K to J and L to K
2) if a 'road type' appears in column L, and column I is not blank but column J is blank, move K to J and L to K.
I've expressed it from the point of view of the first pass having been done, easier to explain that way, but better if it can all be done in one pass.
Q-28534211-showing-new-issue.xls
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40404397
The lines you identified now seem to parse correctly. Please go back and check that I haven't introduced any parsing errors with the prior data.
Q-28534211.xls
0
PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40404970
This works beautifully thank you.
There are no errors from this negating the previous, but could I ask please if you could you add code for two last anomalies:

1. If the parsed result has a value in Flat ID, but nothing in Building ID, move the value from Flat ID to Building ID.

2. If the parsed result has an "identifiable road" in 'Address 3', and blank 'Flat ID', check 'Road' is an "identifiable road" Else move 'Building ID' to 'Flat ID', 'Road' to 'Building ID', and 'Address 3' to 'Road'

("Identifiable Road" is what we've already used: Road|Street|Avenue|Drive|Close|Lane|Crescent|Parade|Place|Way|Row|Mews|Hill|Gardens|Grove|Walk|Square|Boulevard)

I attach an updated version of the code, where I've added additional types of Flat ID, now: Flat|Court|Cottage|House|Building|Floor|Lodge|Villas|View|Mansions

I'm happy to make them another question for these two additions if you want. Just ask.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40405046
what attachment?
0
 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40405050
Sorry it didn't attach. See now.
Q-28534211-revised.xls
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40405071
I don't understand your description of "anomalies".  Is this related to any particular rows in the workbook you attached?

What have you done to the workbook?  It looks like the one I posted earlier.
0
 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40405612
I had only changed the code in the attachment above. ID: 40405050, to add additional types of Flat ID

I've now changed the test page to show examples of the 2 anomalies.
hope that's clear
Q-28534211-revised.xls
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40405630
I need you to think about this in data pattern terms.  In the following examples, the cell data has been converted into a tilde-delimited string.
Anomaly 1: The Cottage~Double Name Way

Since you already defined flat (data) as those containing one of the following strings:
Flat|Court|Cottage|House|Building|Floor|Lodge|Villas|View|Mansions

How is this anomaly going to fit into that pattern?

=============================
I think the anomaly 2 examples have to do with the order in which the patterns are checked.  This is why I needed you to double-check the parsing.  This should be easy to find and fix.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40405636
Actually, a new pattern was required.  Anomaly 2 is handled.  Of course, you'll have to check all the rows again.
0
 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40405652
I think you have a more sophisticated sense of these patterns so I may not be thinking clearly from your point of view, but I've been seeing the patterns in terms of looking for the number blocks primarily, whereas you've done it very successfully in terms of key term recognition.

In number block terms:
if there are no number blocks, the Cottage~House~Building Flat ID should go in the Building ID, because there's no building ID number.
If there are two number blocks, the second one goes in Building ID and everything to the left of it goes in Flat ID.

Doing it by key word is a different approach. It has produced a great result, and may be incompatible with the number block way of looking at things. I don't want to rewrite the whole thing, just tidy up these last two - as long as it's not too complex - that hadn't come out quite right.
In other words:
for Anomaly 1, there's no Building ID, so shift the Flat ID into Building ID
for Anomaly 2, there's a road/street name in L and not in K, so we need to shift everything left one column.

in Anomaly 2, 'Lodge' is a flat ID so I'm not sure why it's all ended up one column along. But there are also lots of unique flatnames like 'brookbank' which we can't keyword, and that may be why they've ended up shifted right one column.

Does that help?
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40405655
Does that help?
Maybe.

if there are no number blocks, the Cottage~House~Building Flat ID should go in the Building ID, because there's no building ID number.
Are there any other rows that have no numbers that might be affected by this?
0
 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40405661
No, as far as I can see, every instance where there is no number block at all, we've dropped the data into Flat ID and in these cases should drop into building ID instead.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40405683
It's after midnight here on the East coast.  I'll look at this when I awake.
0
 
LVL 45

Accepted Solution

by:
aikimark earned 500 total points
ID: 40406559
I reassigned the column field mapping for anomaly 1.  Please test thoroughly.
Q-28534211-revised--1-.xls
0
 
LVL 1

Author Comment

by:hawkeye_zzz
ID: 40408169
This is fantastic! thank you! That works really well.

I have two more little anomalies which I have posted to a new question - should be quite quick ones:
http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/Excel/Q_28545779.html
0
 
LVL 1

Author Closing Comment

by:hawkeye_zzz
ID: 40408170
Fantastic! great code :)
0

Featured Post

Master Your Team's Linux and Cloud Stack

Come see why top tech companies like Mailchimp and Media Temple use Linux Academy to build their employee training programs.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Dynamic Chart Range 13 37
VBA: insert new column and re-adapat string with lower letterS 4 30
Excel VBA User Form Help 21 29
Request to review costing formula 3 36
Introduction While answering a recent question (http:/Q_27311462.html), I created an alternative function to the Excel Concatenate() function that you might find useful.  I tested several solutions and share the results in this article as well as t…
Convert between Excel file formats (.XLS, .XLSX, .XLSM) with/without macro option David Miller (dlmille) Intro Over this past Fall, I've had the opportunity to see several similar requests and have developed a couple related solutions associate…
Graphs within dashboards are meant to be dynamic, representing data from a period of time that will change each time the dashboard is updated with new data. Rather than update each graph to point to a different set within a static set of data, t…
This Micro Tutorial will demonstrate in Google Sheets how to use the HYPERLINK function to create live links inside your spreadsheet.

810 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question