Charles
asked on
Macro to Tidy address data in Excel
See attached file showing various types of rough data in columns A to E
For each type (one row per type), the output in terms of clean data is in columns G-N.
Currently I am cleaning this data using a series of text-to-columns, Left/right formulas and sorting. I wonder if there is a better way if someone has a clever way of writing a macro to make it faster.
There are anything from 20,000 rows upwards in each set of data.
Assume numbers in this example file could be any number from 1-100
Assume 'Name' could be any name or word - I'm just showing the number of words and spaces for each type.
It doesn't matter if macro isn't 100%, if it does 70% that helps.
For each type (one row per type), the output in terms of clean data is in columns G-N.
Currently I am cleaning this data using a series of text-to-columns, Left/right formulas and sorting. I wonder if there is a better way if someone has a clever way of writing a macro to make it faster.
There are anything from 20,000 rows upwards in each set of data.
Assume numbers in this example file could be any number from 1-100
Assume 'Name' could be any name or word - I'm just showing the number of words and spaces for each type.
It doesn't matter if macro isn't 100%, if it does 70% that helps.
@Hawkeye_zzz: Can you please attach your example file?
I'm sure an automated solution - based on your description - can be generated. But we'll need to see the data. :-)
-Glenn
I'm sure an automated solution - based on your description - can be generated. But we'll need to see the data. :-)
-Glenn
ASKER
Sorry! don't know why it didn't upload before.
One other thing, these columns are part of a wider set. If any sorting is involved, we'll need to sort all columns together or just add a key column to this data.
Sample-Address-formats.xls
One other thing, these columns are part of a wider set. If any sorting is involved, we'll need to sort all columns together or just add a key column to this data.
Sample-Address-formats.xls
seems to be very messed up data :-)
Thanks for posting an example, but in order to code a good set of logic, it will be better to see an actual sample of address information. Redact it, if needed; there doesn't appear to be any personal information in this set, however.
For example, one issue is in row 11: you have "City" in Address Line 4, but it does not show on the converted fields to the right.
This also appears to be English addresses; if so, having a lookup table of Postal Codes will help strip those out properly (and verify their entry).
-Glenn
For example, one issue is in row 11: you have "City" in Address Line 4, but it does not show on the converted fields to the right.
This also appears to be English addresses; if so, having a lookup table of Postal Codes will help strip those out properly (and verify their entry).
-Glenn
Is this data already in Excel or are you importing it from an ASCII text file?
ASKER
Thank you all for your questions.
The original data is voter data so has to be redacted to be untraceable.
However I have left the spaces and words as they are, and from 70,000 records, given one example of each type I can find. Thus where I have written "name road" this could be holly road, klaxon drive, winterbottom crescent etc. the only real identifier here are the spaces, and the way the numbers fall. So for example the postcode is easy to pick out, it's always the rightmost data and it's always letter/doubleletter-number -space-num ber-letter -letter. But it will not always be in the rightmost column.
Then if in the rest of it there is only one number (1 or 15 or 109 or 14b or 27c) with a space after it, and no other numbers in the group, that is definitely the house identifier, and what follows is the street name. But the street name could be two or three words (Maverly street or Great Toad Street), the only way we'd know is to recognise key terms like: "road, street, crescent, drive, avenue" and if these terms are in the macro I can learn and add new terms as they come up.
If there are two numbers (not including the postcode) such as 14 sunny mansions 27 High street new town, the 27 here would be the house identifier and 14 Sunny Mansions would be the Flat identifier. Other flat identifiers would have to be recognised such as G-F, T-F, U-F, again if they are listed in the macro I can add to the list and learn as I use it. the point is always that two numbers in a group, the rightmost one is the street identifier and anything before it should be dumped in the flat identifier field.
Sorry about the missing 'City' in row 11. City in that case should be in the rightmost field before the postcode.
Yes these are all British addresses. It's not a bad idea Glen, but I don't have a full postcode database of the UK to match against and they are expensive.
I hope that helps. If I don't get the perfect answer here, I am happy to award points and begin a new question with new points to further refine the macro once I've added new information.
The original data is voter data so has to be redacted to be untraceable.
However I have left the spaces and words as they are, and from 70,000 records, given one example of each type I can find. Thus where I have written "name road" this could be holly road, klaxon drive, winterbottom crescent etc. the only real identifier here are the spaces, and the way the numbers fall. So for example the postcode is easy to pick out, it's always the rightmost data and it's always letter/doubleletter-number
Then if in the rest of it there is only one number (1 or 15 or 109 or 14b or 27c) with a space after it, and no other numbers in the group, that is definitely the house identifier, and what follows is the street name. But the street name could be two or three words (Maverly street or Great Toad Street), the only way we'd know is to recognise key terms like: "road, street, crescent, drive, avenue" and if these terms are in the macro I can learn and add new terms as they come up.
If there are two numbers (not including the postcode) such as 14 sunny mansions 27 High street new town, the 27 here would be the house identifier and 14 Sunny Mansions would be the Flat identifier. Other flat identifiers would have to be recognised such as G-F, T-F, U-F, again if they are listed in the macro I can add to the list and learn as I use it. the point is always that two numbers in a group, the rightmost one is the street identifier and anything before it should be dumped in the flat identifier field.
Sorry about the missing 'City' in row 11. City in that case should be in the rightmost field before the postcode.
Yes these are all British addresses. It's not a bad idea Glen, but I don't have a full postcode database of the UK to match against and they are expensive.
I hope that helps. If I don't get the perfect answer here, I am happy to award points and begin a new question with new points to further refine the macro once I've added new information.
Have you looked at address validation software?
You might qualify for a free copy:
http://www.poweredbypaf.com/developer/
There are some free digital town/postal code data sources:
http://www.dangibbs.co.uk/journal/free-uk-postcode-towns-counties-database
================
You can use references, such as Wikipedia to get your postal code city validation.
http://en.wikipedia.org/wiki/List_of_postcode_districts_in_the_United_Kingdom
Note: The larger postal districts have a live link to a detailed page with the towns/cities
Here's a 16 page table of the postal code and city reference
http://www.doogal.co.uk/PostcodeDistricts.php
Here is a tab-separated list of just the cities/towns in the UK:
While your worksheet does show some examples, it doesn't tell me in what form the data currently resides.
With a postal code/city reference, you can parse the right most data from each entry (if it exists). Next, you need to make several passes at the remaining text, looking for the most complex first and then trying to match against the least complex patterns.
You might qualify for a free copy:
http://www.poweredbypaf.com/developer/
There are some free digital town/postal code data sources:
http://www.dangibbs.co.uk/journal/free-uk-postcode-towns-counties-database
================
You can use references, such as Wikipedia to get your postal code city validation.
http://en.wikipedia.org/wiki/List_of_postcode_districts_in_the_United_Kingdom
Note: The larger postal districts have a live link to a detailed page with the towns/cities
Here's a 16 page table of the postal code and city reference
http://www.doogal.co.uk/PostcodeDistricts.php
Here is a tab-separated list of just the cities/towns in the UK:
Abingdon, Oxfordshire Abingdon
Accrington Accrington
Acle Acle
Acton, London Acton
Adlington, Lancashire Adlington
Alcester Alcester
Aldeburgh Aldeburgh
Aldershot Aldershot
Alford, Lincolnshire Alford
Alfreton Alfreton
Alnwick Alnwick
Alsager Alsager
Alston, Cumbria Alston
Alton, Hampshire Alton
Altrincham Altrincham
Amble Amble
Ambleside Ambleside
Amersham Amersham
Amesbury Amesbury
Ampthill Ampthill
Andover, Hampshire Andover
Appleby-in-Westmorland Appleby-in-Westmorland
Arlesey Arlesey
Arundel Arundel
Ashbourne, Derbyshire Ashbourne
Ashburton, Devon Ashburton
Ashby-de-la-Zouch Ashby-de-la-Zouch
Ashby Woulds Ashby Woulds
Ashford, Kent Ashford
Ashington Ashington
Ashton-under-Lyne Ashton-under-Lyne
Askern Askern
Aspatria Aspatria
Atherstone Atherstone
Attleborough Attleborough
Axbridge Axbridge
Axminster Axminster
Aylesbury Aylesbury
Aylsham Aylsham
Bacup Bacup
Bakewell Bakewell
Banbury Banbury
Barking Barking
Barnard Castle Barnard Castle
Barnes, London Barnes
Barnet Barnet
Barnoldswick Barnoldswick
Barnsley Barnsley
Barnstaple Barnstaple
Barrow-in-Furness Barrow-in-Furness
Barton-upon-Humber Barton-upon-Humber
Basingstoke Basingstoke
Batley Batley
Battle, East Sussex Battle
Bawtry Bawtry
Beaconsfield Beaconsfield
Beaminster Beaminster
Bebington Bebington
Beccles Beccles
Beckenham Beckenham
Bedale Bedale
Bedford Bedford
Bedworth Bedworth
Belper Belper
Bentham, North Yorkshire Bentham
Berkeley, Gloucestershire Berkeley
Berkhamsted Berkhamsted
Berwick-upon-Tweed Berwick-upon-Tweed
Beverley Beverley
Bewdley Bewdley
Bexhill-on-Sea Bexhill-on-Sea
Bexley Bexley
Bicester Bicester
Biddulph Biddulph
Bideford Bideford
Biggleswade Biggleswade
Billericay Billericay
Billingham Billingham
Bilston Bilston
Bingham, Nottinghamshire Bingham
Bingley Bingley
Birchwood Birchwood
Birkenhead Birkenhead
Bishop Auckland Bishop Auckland
Bishop's Castle Bishop's Castle
Bishop's Stortford Bishop's Stortford
Bishop's Waltham Bishop's Waltham
Blackburn Blackburn
Blackpool Blackpool
Blackrod Blackrod
Blackwater and Hawley Blackwater and Hawley
Blandford Forum Blandford Forum
Bletchley and Fenny Stratford Bletchley and Fenny Stratford
Blyth, Northumberland Blyth
Bodmin Bodmin
Bognor Regis Bognor Regis
Bollington Bollington
Bolsover Bolsover
Bolton Bolton
Bootle Bootle
Boroughbridge Boroughbridge
Boston, Lincolnshire Boston
Bottesford, Lincolnshire Bottesford
Bourne, Lincolnshire Bourne
Bournemouth Bournemouth
Bovey Tracey Bovey Tracey
Brackley Brackley
Bradford-on-Avon Bradford-on-Avon
Brading Brading
Bradley Stoke Bradley Stoke
Bradninch Bradninch
Braintree, Essex Braintree
Brampton, Carlisle Brampton
Brandon, Suffolk Brandon
Braunstone Town Braunstone Town
Brentford Brentford
Brentwood, Essex Brentwood
Bridgnorth Bridgnorth
Bridgwater Bridgwater
Bridlington Bridlington
Bridport Bridport
Brierfield, Lancashire Brierfield
Brierley Brierley
Brigg Brigg
Brighouse Brighouse
Brightlingsea Brightlingsea
Brixham Brixham
Broadstairs and St Peter's Broadstairs and St Peter's
Bromborough Bromborough
Bromley Bromley
Bromsgrove Bromsgrove
Bromyard and Winslow Bromyard and Winslow
Broseley Broseley
Broughton, Lincolnshire Broughton
Broughton-in-Furness Broughton-in-Furness
Bruton Bruton
Buckfastleigh Buckfastleigh
Buckingham Buckingham
Bude-Stratton Bude-Stratton
Budleigh Salterton Budleigh Salterton
Bulwell Bulwell
Bungay, Suffolk Bungay
Buntingford Buntingford
Burford Burford
Burgess Hill Burgess Hill
Burgh-le-Marsh Burgh-le-Marsh
Burnham-on-Crouch Burnham-on-Crouch
Burnham-on-Sea and Highbridge Burnham-on-Sea and Highbridge
Burnley Burnley
Burntwood Burntwood
Burslem Burslem
Burton Latimer Burton Latimer
Burton upon Trent Burton upon Trent
Bury Bury
Bury St Edmunds Bury St Edmunds
Bushey Bushey
Buxton Buxton
Caistor Caistor
Callington, Cornwall Callington
Calne Calne
Camborne Camborne
Camelford Camelford
Cannock Cannock
Canvey Island Canvey Island
Carnforth Carnforth
Carlton Colville Carlton Colville
Carshalton Carshalton
Carterton, Oxfordshire Carterton
Castle Cary Castle Cary
Castleford Castleford
Chagford Chagford
Chapel-en-le-Frith Chapel-en-le-Frith
Chard, Somerset Chard
Charlbury Charlbury
Chatham, Kent Chatham
Chatteris Chatteris
Cheadle, Staffordshire Cheadle
Cheltenham Cheltenham
Chertsey Chertsey
Chesham Chesham
Cheshunt Cheshunt
Chesterfield Chesterfield
Chester-le-Street Chester-le-Street
Chickerell Chickerell
Chingford Chingford
Chippenham Chippenham
Chipping Campden Chipping Campden
Chipping Norton, Oxfordshire Chipping Norton
Chipping Sodbury Chipping Sodbury
Chorley Chorley
Chorleywood Chorleywood
Christchurch, Dorset Christchurch
Chudleigh Chudleigh
Chulmleigh Chulmleigh
Church Stretton Church Stretton
Cinderford Cinderford
Cirencester Cirencester
Clare, Suffolk Clare
Clay Cross Clay Cross
Cleator Moor Cleator Moor
Cleethorpes Cleethorpes
Cleobury Mortimer Cleobury Mortimer
Clevedon Clevedon
Clitheroe Clitheroe
Clun Clun
Cockermouth Cockermouth
Coggeshall Coggeshall
Colburn, North Yorkshire Colburn
Colchester Colchester
Coleford, Gloucestershire Coleford
Coleshill, Warwickshire Coleshill
Colne Colne
Colyton, Devon Colyton
Congleton Congleton
Conisbrough Conisbrough
Corbridge Corbridge
Corby Corby
Corringham, Essex Corringham
Corsham Corsham
Cotgrave Cotgrave
Cowes Cowes
Coulsdon Coulsdon
Cramlington Cramlington
Cranbrook, Kent Cranbrook
Craven Arms Craven Arms
Crawley Crawley
Crediton Crediton
Crewe Crewe
Crewkerne Crewkerne
Cricklade Cricklade
Cromer Cromer
Crosby, Merseyside Crosby
Crowborough Crowborough
Croydon Croydon
Crowland Crowland
Crowle, Lincolnshire Crowle
Cullompton Cullompton
Dagenham Dagenham
Dalton Town with Newton Dalton Town with Newton
Darley Dale Darley Dale
Darlington Darlington
Dartford Dartford
Dartmouth, Devon Dartmouth
Darwen Darwen
Daventry Daventry
Dawley Dawley
Dawlish Dawlish
Deal, Kent Deal
Dereham Dereham
Desborough Desborough
Devizes Devizes
Dewsbury Dewsbury
Didcot Didcot
Dinnington St John's Dinnington St John's
Diss Diss
Doncaster Doncaster
Dorchester, Dorset Dorchester
Dorking Dorking
Dover Dover
Dovercourt Dovercourt
Downham Market Downham Market
Driffield Driffield
Droitwich Spa Droitwich Spa
Dronfield Dronfield
Dudley Dudley
Dukinfield Dukinfield
Dulverton Dulverton
Dunstable Dunstable
Dunwich Dunwich
Dursley Dursley
Ealing Ealing
Earl Shilton Earl Shilton
Earley Earley
Easingwold Easingwold
East Cowes East Cowes
East Grinstead East Grinstead
East Ham East Ham
Eastbourne Eastbourne
Eastleigh Eastleigh
Retford East Retford
Eastwood, Nottinghamshire Eastwood
Eccles, Greater Manchester Eccles
Eccleshall Eccleshall
Edenbridge, Kent Edenbridge
Edgware Edgware
Edmonton, London Edmonton
Egremont, Cumbria Egremont
Elland Elland
Ellesmere, Shropshire Ellesmere
Ellesmere Port Ellesmere Port
Elstree and Borehamwood Elstree and Borehamwood
Emsworth Emsworth
Enfield Town Enfield
Epping Epping
Epworth, Lincolnshire Epworth
Erith Erith
Eton, Berkshire Eton
Evesham Evesham
Exmouth Exmouth
Eye, Suffolk Eye
Fairford Fairford
Fakenham Fakenham
Falmouth, Cornwall Falmouth
Fareham Fareham
Faringdon Faringdon
Farnham Farnham
Faversham Faversham
Fazeley Fazeley
Featherstone Featherstone
Felixstowe Felixstowe
Ferndown Ferndown
Ferryhill Ferryhill
Filey Filey
Filton Filton
Finchley Finchley
Fleet, Hampshire Fleet
Fleetwood Fleetwood
Flitwick Flitwick
Folkestone Folkestone
Fordbridge Fordbridge
Fordingbridge Fordingbridge
Fordwich Fordwich
Fowey Fowey
Framlingham Framlingham
Frinton and Walton Frinton and Walton
Frodsham Frodsham
Frome Frome
Gainsborough, Lincolnshire Gainsborough
Garstang Garstang
Gateshead Gateshead
Gillingham, Dorset Gillingham
Gillingham, Kent Gillingham
Glastonbury Glastonbury
Glossop Glossop
Godalming Godalming
Godmanchester Godmanchester
Goole Goole
Gorleston Gorleston
Gosport Gosport
Grange-over-Sands Grange-over-Sands
Grantham Grantham
Gravesend, Kent Gravesend
Grays Grays
Great Dunmow Great Dunmow
Great Torrington Great Torrington
Great Yarmouth Great Yarmouth
Greater Willington Greater Willington
Grimsby Grimsby
Guildford Guildford
Guisborough Guisborough
Hadleigh, Suffolk Hadleigh
Hailsham Hailsham
Halesowen Halesowen
Halesworth Halesworth
Halifax, West Yorkshire Halifax
Halstead Halstead
Haltwhistle Haltwhistle
Redenhall with Harleston Redenhall with Harleston
Harlow Harlow
Harpenden Harpenden
Harrogate Harrogate
Harrow, London Harrow
Hartland, Devon Hartland
Hartlepool Hartlepool
Harwich Harwich
Harworth and Bircotes Harworth and Bircotes
Haslemere Haslemere
Haslingden Haslingden
Hastings Hastings
Hatfield, Hertfordshire Hatfield
Hatherleigh Hatherleigh
Havant Havant
Haverhill, Suffolk Haverhill
Haxby Haxby
Hayle Hayle
Haywards Heath Haywards Heath
Heanor and Loscoe Heanor and Loscoe
Heathfield, East Sussex Heathfield
Hebden Royd Hebden Royd
Hedge End Hedge End
Hednesford Hednesford
Hedon Hedon
Helmsley Helmsley
Helston Helston
Hemel Hempstead Hemel Hempstead
Hemsworth Hemsworth
Hendon Hendon
Henley-in-Arden Henley-in-Arden
Henley-on-Thames Henley-on-Thames
Hertford Hertford
Hessle Hessle
Hetton-le-Hole Hetton
Hexham Hexham
Heywood, Greater Manchester Heywood
Higham Ferrers Higham Ferrers
Highworth Highworth
High Wycombe High Wycombe
Hinckley Hinckley
Hingham, Norfolk Hingham
Hitchin Hitchin
Hoddesdon Hoddesdon
Holbeach Holbeach
Holsworthy, Devon Holsworthy
Holt, Norfolk Holt
Honiton Honiton
Horley Horley
Horncastle, Lincolnshire Horncastle
Hornsea Hornsea
Hornsey Hornsey
Horsforth Horsforth
Horsham Horsham
Horwich Horwich
Houghton Regis Houghton Regis
Howden Howden
Huddersfield Huddersfield
Hungerford Hungerford
Hunstanton Hunstanton
Huntingdon Huntingdon
Hyde, Greater Manchester Hyde
Hythe, Kent Hythe
Ilford Ilford
Ilfracombe Ilfracombe
Ilkeston Ilkeston
Ilkley Ilkley
Ilminster Ilminster
Immingham Immingham
Ingleby Barwick Ingleby Barwick
Ipswich Ipswich
Irthlingborough Irthlingborough
Ivybridge Ivybridge
Jarrow Jarrow
Keighley Keighley
Kempston Kempston
Kendal Kendal
Kenilworth Kenilworth
Kesgrave Kesgrave
Keswick, Cumbria Keswick
Kettering Kettering
Keynsham Keynsham
Kidderminster Kidderminster
Kidsgrove Kidsgrove
Kimberley, Nottinghamshire Kimberley
Kingsbridge Kingsbridge
King's Lynn King's Lynn
Kingston-upon-Thames Kingston-upon-Thames
Kington, Herefordshire Kington
Kirkby-in-Ashfield Kirkby-in-Ashfield
Kirkby Lonsdale Kirkby Lonsdale
Kirkby Stephen Kirkby Stephen
Kirkbymoorside Kirkbymoorside
Kirkham, Lancashire Kirkham
Kirton-in-Lindsey Kirton-in-Lindsey
Knaresborough Knaresborough
Knutsford Knutsford
Langport Langport
Launceston, Cornwall Launceston
Leatherhead Leatherhead
Lechlade Lechlade
Ledbury Ledbury
Leek, Staffordshire Leek
Leigh, Greater Manchester Leigh
Leighton-Linslade Leighton-Linslade
Leigh-on-Sea Leigh-on-Sea
Leiston Leiston
Leominster Leominster
Letchworth Garden City Letchworth Garden City
Lewes Lewes
Leyburn Leyburn
Leyton Leyton
Liskeard Liskeard
Littlehampton Littlehampton
Loddon, Norfolk Loddon
Loftus, North Yorkshire Loftus
Long Sutton, Lincolnshire Long Sutton
Longridge Longridge
Longtown, Cumbria Longtown
Looe Looe
Lostwithiel Lostwithiel
Loughborough Loughborough
Loughton Loughton
Louth, Lincolnshire Louth
Lowestoft Lowestoft
Ludgershall, Wiltshire Ludgershall
Ludlow Ludlow
Luton Luton
Lutterworth Lutterworth
Lydd Lydd
Lydney Lydney
Lyme Regis Lyme Regis
Lynton and Lynmouth Lynton and Lynmouth
Lytham St Annes Lytham St Annes
Mablethorpe and Sutton Mablethorpe and Sutton
Macclesfield Macclesfield
Madeley, Shropshire Madeley
Maghull Maghull
Maidenhead Maidenhead
Maidstone Maidstone
Maldon, Essex Maldon
Malmesbury, Wiltshire Malmesbury
Maltby, South Yorkshire Maltby
Malton, North Yorkshire Malton
Malvern, Worcestershire Malvern
Manningtree Manningtree
Mansfield Mansfield
Marazion Marazion
March, Cambridgeshire March
Margate Margate
Market Bosworth Market Bosworth
Market Deeping Market Deeping
Market Drayton Market Drayton
Market Harborough Market Harborough
Market Rasen Market Rasen
Market Weighton Market Weighton
Marlborough, Wiltshire Marlborough
Marlow, Buckinghamshire Marlow
Maryport Maryport
Masham Masham
Matlock, Derbyshire Matlock
Medlar with Wesham Medlar with Wesham
Melksham Melksham
Meltham Meltham
Melton Mowbray Melton Mowbray
Mere, Wiltshire Mere
Mexborough Mexborough
Middleham Middleham
Middlesbrough Middlesbrough
Middleton, Greater Manchester Middleton
Middlewich Middlewich
Midhurst Midhurst
Midsomer Norton Midsomer Norton
Mildenhall, Suffolk Mildenhall
Millom Millom
Minchinhampton Minchinhampton
Minehead Minehead
Minster, Swale Minster
Mirfield Mirfield
Mitcham Mitcham
Mitcheldean Mitcheldean
Morecambe Morecambe
Moretonhampstead Moretonhampstead
Moreton-in-Marsh Moreton-in-Marsh
Morley, West Yorkshire Morley
Morpeth, Northumberland Morpeth
Mossley Mossley
Much Wenlock Much Wenlock
Nailsea Nailsea
Nailsworth Nailsworth
Nantwich Nantwich
Needham Market Needham Market
Nelson, Lancashire Nelson
Neston Neston
New Alresford New Alresford
New Mills New Mills
New Milton New Milton
New Romney New Romney
Newark-on-Trent Newark-on-Trent
Newbiggin-by-the-Sea Newbiggin-by-the-Sea
Newbury, Berkshire Newbury
Newcastle-under-Lyme Newcastle-under-Lyme
Newent Newent
Newhaven, East Sussex Newhaven
Newlyn Newlyn
Newmarket, Suffolk Newmarket
Newport, Isle of Wight Newport
Newport, Shropshire Newport
Newport Pagnell Newport Pagnell
Newquay Newquay
Newton Abbot Newton Abbot
Newton-le-Willows Newton-le-Willows
Normanton, West Yorkshire Normanton
North Hykeham North Hykeham
North Petherton North Petherton
North Tawton North Tawton
North Walsham North Walsham
Northallerton Northallerton
Northam, Devon Northam
Northampton Northampton
Northfleet Northfleet
Northleach with Eastington Northleach with Eastington
Northwich Northwich
Norton-on-Derwent Norton-on-Derwent
Nuneaton Nuneaton
Oakengates Oakengates
Oakham Oakham
Okehampton Okehampton
Oldbury, West Midlands Oldbury
Oldham Oldham
Ollerton and Boughton Ollerton and Boughton
Olney, Buckinghamshire Olney
Chipping Ongar Ongar
Orford, Suffolk Orford
Ormskirk Ormskirk
Ossett Ossett
Oswestry Oswestry
Otley Otley
Ottery St Mary Ottery St Mary
Oundle Oundle
Paddock Wood Paddock Wood
Padiham Padiham
Padstow Padstow
Paignton Paignton
Painswick Painswick
Partington, Greater Manchester Partington
Patchway Patchway
Pateley Bridge Pateley Bridge
Peacehaven Peacehaven
Penistone Penistone
Penkridge Penkridge
Penrith, Cumbria Penrith
Penryn, Cornwall Penryn
Penwortham Penwortham
Penzance Penzance
Pershore Pershore
Peterlee Peterlee
Petersfield Petersfield
Petworth Petworth
Pickering, North Yorkshire Pickering
Pocklington Pocklington
Polegate Polegate
Pontefract Pontefract
Ponteland Ponteland
Poole Poole
Porthleven Porthleven
Portishead and North Weston Portishead and North Weston
Isle of Portland Portland
Potton Potton
Poynton-with-Worth Poynton-with-Worth
Preesall Preesall
Prescot Prescot
Princes Risborough Princes Risborough
Prudhoe Prudhoe
Pudsey Pudsey
Queenborough Queenborough-in-Sheppey
Radstock Radstock
Ramsey, Cambridgeshire Ramsey
Ramsgate Ramsgate
Raunds Raunds
Rawtenstall Rawtenstall
Rayleigh, Essex Rayleigh
Reading, Berkshire Reading
Redcar Redcar
Redruth Redruth
Reepham, Norfolk Reepham
Reigate Reigate
Richmond, London Richmond
Richmond, North Yorkshire Richmond
Ringwood Ringwood
Ripley, Derbyshire Ripley
Ripon Ripon
Rochdale Rochdale
Rochester, Kent Rochester
Rochford Rochford
Romford Romford
Romsey Romsey
Ross-on-Wye Ross-on-Wye
Rothbury Rothbury
Rotherham Rotherham
Rothwell, Northamptonshire Rothwell
Rothwell, West Yorkshire Rothwell
Rowley Regis Rowley Regis
Royal Leamington Spa Royal Leamington Spa
Royal Tunbridge Wells Royal Tunbridge Wells
Royal Wootton Bassett Royal Wootton Bassett
Royston, Hertfordshire Royston
Rugby, Warwickshire Rugby
Rugeley Rugeley
Rushden Rushden
Ryde Ryde
Rye, East Sussex Rye
Saffron Walden Saffron Walden
St Austell St Austell
St Blazey St Blaise
St Columb Major St Columb Major
St Helens, Merseyside St Helens
St Ives, Cambridgeshire St Ives
St Ives, Cornwall St Ives
St Just-in-Penwith St Just-in-Penwith
St Mawes St Mawes
St Neots St Neots
Salcombe Salcombe
Sale, Greater Manchester Sale
Saltash Saltash
Sandbach Sandbach
Sandhurst, Berkshire Sandhurst
Sandiacre Sandiacre
Sandown Sandown
Sandwich, Kent Sandwich
Sandy, Bedfordshire Sandy
Sawbridgeworth Sawbridgeworth
Saxmundham Saxmundham
Scarborough, North Yorkshire Scarborough
Scunthorpe Scunthorpe
Seaford, East Sussex Seaford
Seaham Seaham
Seaton, Devon Seaton
Sedbergh Sedbergh
Selby Selby
Selsey Selsey
Settle Settle
Sevenoaks Sevenoaks
Shaftesbury Shaftesbury
Shanklin Shanklin
Shefford, Bedfordshire Shefford
Shepshed Shepshed
Shepton Mallet Shepton Mallet
Sherborne Sherborne
Sheringham Sheringham
Shifnal Shifnal
Shildon Shildon
Shipston-on-Stour Shipston-on-Stour
Shirebrook Shirebrook
Shoreham-by-Sea Shoreham-by-Sea
Shrewsbury Shrewsbury
Sidmouth Sidmouth
Silloth Silloth
Silsden Silsden
Sittingbourne Sittingbourne
Skegness Skegness
Skelmersdale Skelmersdale
Skelton-in-Cleveland Skelton-in-Cleveland
Skipton Skipton
Sleaford Sleaford
Slough Slough
Smethwick Smethwick
Snaith and Cowick Snaith and Cowick
Snodland Snodland
Soham Soham
Solihull Solihull
Somerton Somerton
South Cave South Cave
South Elmsall South Elmsall
South Kirkby and Moorthorpe South Kirkby and Moorthorpe
South Molton South Molton
South Petherton South Petherton
South Shields South Shields
South Woodham Ferrers South Woodham Ferrers
Southam Southam
Southall Southall
Southborough, Kent Southborough
Southend-on-Sea Southend-on-Sea
Southgate, London Southgate
Southminster Southminster
Southport Southport
Southsea Southsea
Southwell, Nottinghamshire Southwell
Southwick, Hampshire Southwick
Southwold Southwold
Spalding, Lincolnshire Spalding
Spennymoor Spennymoor
Spilsby Spilsby
Stafford Stafford
Staines-upon-Thames Staines-upon-Thames
Stainforth, South Yorkshire Stainforth
Stalbridge Stalbridge
Stalham Stalham
Stalybridge Stalybridge
Stamford, Lincolnshire Stamford
Stanley, County Durham Stanley
Stanhope, County Durham Stanhope
Stapleford, Nottinghamshire Stapleford
Staveley, Derbyshire Staveley
Stevenage Stevenage
Steyning Steyning
St Mary Cray St Mary Cray
Stockport Stockport
Stocksbridge Stocksbridge
Stockton-on-Tees Stockton-on-Tees
Stone, Staffordshire Stone
Stonehouse, Gloucestershire Stonehouse
Stony Stratford Stony Stratford
Stotfold Stotfold
Stourbridge Stourbridge
Stourport-on-Severn Stourport-on-Severn
Stowmarket Stowmarket
Stow-on-the-Wold Stow-on-the-Wold
Stratford-upon-Avon Stratford-upon-Avon
Stretford Stretford
Strood Strood
Stroud, Gloucestershire Stroud
Sturminster Newton Sturminster Newton
Sudbury, Suffolk Sudbury
Surbiton Surbiton
Sutton, London Sutton
Sutton Coldfield Sutton Coldfield
Swaffham Swaffham
Swanage Swanage
Swanley Swanley
Swanscombe and Greenhithe Swanscombe and Greenhithe
Swindon Swindon
Syston Syston
Tadcaster Tadcaster
Tadley Tadley
Tamworth, Staffordshire Tamworth
Taunton Taunton
Tavistock, Devon Tavistock
Teignmouth Teignmouth
Telscombe Telscombe
Tenbury Wells Tenbury Wells
Tenterden Tenterden
Tetbury Tetbury
Tewkesbury Tewkesbury
Thame Thame
Thatcham Thatcham
Thaxted Thaxted
Thetford Thetford
Thirsk Thirsk
Thornaby-on-Tees Thornaby-on-Tees
Thornbury, South Gloucestershire Thornbury
Thorne, South Yorkshire Thorne
Thorpe St Andrew Thorpe St Andrew
Thrapston Thrapston
Tickhill Tickhill
Tidworth Tidworth
Tipton Tipton
Tisbury, Wiltshire Tisbury
Tiverton, Devon Tiverton
Todmorden Todmorden
Tonbridge Tonbridge
Topsham, Devon Topsham
Torpoint Torpoint
Torquay Torquay
Totnes Totnes
Tottenham Tottenham
Totton and Eling Totton and Eling
Tow Law Tow Law
Towcester Towcester
Tring Tring
Trowbridge Trowbridge
Twickenham Twickenham
Tynemouth Tynemouth
Uckfield Uckfield
Ulverston Ulverston
Uppingham Uppingham
Upton-upon-Severn Upton-upon-Severn
Uttoxeter Uttoxeter
Uxbridge Uxbridge
Ventnor Ventnor
Verwood Verwood
Wadebridge Wadebridge
Wadhurst Wadhurst
Wainfleet, Lincolnshire Wainfleet All Saints
Wallasey Wallasey
Wallsend Wallsend
Wallingford, Oxfordshire Wallingford
Walsall Walsall
Waltham Abbey (town) Waltham Abbey
Waltham Cross Waltham Cross
Walthamstow Walthamstow
Walton-on-Thames Walton-on-Thames
Wantage Wantage
Ware, Hertfordshire Ware
Wareham, Dorset Wareham
Warminster Warminster
Warrington Warrington
Warwick Warwick
Watchet Watchet
Watford Watford
Wath-upon-Dearne Wath-upon-Dearne
Watlington, Oxfordshire Watlington
Watton, Norfolk Watton
Wellingborough Wellingborough
Wellington, Shropshire Wellington
Wellington, Somerset Wellington
Wells-next-the-Sea Wells-next-the-Sea
Wem Wem
Wembley Wembley
Wendover Wendover
West Bedlington West Bedlington
West Bromwich West Bromwich
West Ham West Ham
West Malling West Malling
West Mersea West Mersea
West Tilbury West Tilbury
Westbury, Wiltshire Westbury
Westerham Westerham
Westhoughton Westhoughton
Weston-super-Mare Weston-super-Mare
Wetherby Wetherby
Weybridge Weybridge
Weymouth, Dorset Weymouth
Whaley Bridge Whaley Bridge
Whitby Whitby
Whitchurch, Hampshire Whitchurch
Whitchurch, Shropshire Whitchurch
Whitehaven Whitehaven
Whitehill, Hampshire Whitehill
Whitnash Whitnash
Whittlesey Whittlesey
Whitworth, Lancashire Whitworth
Wickham Wickham
Wickwar Wickwar
Widnes Widnes
Wigan Wigan
Wigton Wigton
Willenhall Willenhall
Willesden Willesden
Wilton, Wiltshire Wilton
Wimbledon, London Wimbledon
Wimborne Minster Wimborne Minster
Wincanton Wincanton
Winchcombe Winchcombe
Winchelsea Winchelsea
Windermere, Cumbria Windermere
Windsor, Berkshire Windsor
Winsford Winsford
Winslow, Buckinghamshire Winslow
Winterton, Lincolnshire Winterton
Wirksworth Wirksworth
Wisbech Wisbech
Witham Witham
Withernsea Withernsea
Witney Witney
Wiveliscombe Wiveliscombe
Wivenhoe Wivenhoe
Woburn, Bedfordshire Woburn
Woburn Sands Woburn Sands
Woking Woking
Wokingham Wokingham
Wolsingham Wolsingham
Wolverton and Greenleys Wolverton and Greenleys
Wood Green Wood Green
Woodbridge, Suffolk Woodbridge
Woodley, Berkshire Woodley
Woodstock, Oxfordshire Woodstock
Wooler Wooler
Workington Workington
Worksop Worksop
Worthing Worthing
Wotton-under-Edge Wotton-under-Edge
Wragby Wragby
Wymondham Wymondham
================While your worksheet does show some examples, it doesn't tell me in what form the data currently resides.
With a postal code/city reference, you can parse the right most data from each entry (if it exists). Next, you need to make several passes at the remaining text, looking for the most complex first and then trying to match against the least complex patterns.
aikimark's suggestions are just the thing you're going to need to get you from, say a 50% to 60% hit rate to probably over 90%. I think it's totally worth looking into.
In the meantime, I've made some progress strictly on a parsing level. I know enough about the UK Postal Code nomenclature so that those codes can be identified by general alphanumeric matching. Fortunately, your raw data isn't too badly convoluted to pull that out. Where you'll start having issues is properly allocating space for missing cities/towns or improper street/flat assignment.
I'll post an example workbook tomorrow with basic formulas set up to get you started.
-Glenn
In the meantime, I've made some progress strictly on a parsing level. I know enough about the UK Postal Code nomenclature so that those codes can be identified by general alphanumeric matching. Fortunately, your raw data isn't too badly convoluted to pull that out. Where you'll start having issues is properly allocating space for missing cities/towns or improper street/flat assignment.
I'll post an example workbook tomorrow with basic formulas set up to get you started.
-Glenn
To see an example of multiple patterns being applied to data, look through this question thread about parsing citation data.
http:Q_28254706.html
It is likely that your records that have no town identifier will probably need to be manually parsed or placed into a "bad" list.
http:Q_28254706.html
It is likely that your records that have no town identifier will probably need to be manually parsed or placed into a "bad" list.
ASKER
Thanks for all your comments. aikimark that's very helpful, I've downloaded some PAF files (which seem to be just samples) and that should be useful, but only as a verification step and I'll have to clean up the addresses first.
It's clear I haven't phrased my question well. Let me therefore have another go and break it down into some smaller easier chunks that should more obviously turn into the macro I'm looking for:
Assume 7 columns (A to G) of raw data, and 7 columns (H to N) of Output data. (Flat Identifier, House Identifier, Street Name, 2nd Line, Locality, Post Town, Postcode)
1. Find the postcode and place in column N.
- this might be as simple as taking the data in the rightmost data-containing column of A-G
- It might add some basic verification to check that field is a postcode.
2. Where there is one clear number block in the leftmost field, (1, 22b, 115, etc), drop that number block into column I, and the rest of that field into column J, and the remaining fields dropped into J-M unchanged.
3. Where there are two clear number blocks in the raw data, take everything to the left of the 2nd number block and place in column H, then run step 2 on the rest
4. Where the single number block is not to the far left of the raw data, split everything to the left of it off into column H, then run step 2 on the rest.
5. Where there is no number block, place the first field in column H, leave I blank, and put the next data in column J and following columns.
6. If the sole number block has the word "flat" before it, place that field in column H and leave column I blank
7. If there is a sole number block, but the next but one word after that is from a set list (named list called "HouseBlockNames") which would include words like: Mansions, Court, House etc., then that whole field goes in column H rather than being split over I and J.
8. The icing on the cake would be to take take the PAF file, and verify the now-cleaned data against it.
Add a help note to explain how to change the macro if raw data is in eg. columns X onwards and the output is columns AF onwards.
Hopefully these steps can be recreated in a macro that I can run on the data carrying out most of the tidying up with one button click.
It's clear I haven't phrased my question well. Let me therefore have another go and break it down into some smaller easier chunks that should more obviously turn into the macro I'm looking for:
Assume 7 columns (A to G) of raw data, and 7 columns (H to N) of Output data. (Flat Identifier, House Identifier, Street Name, 2nd Line, Locality, Post Town, Postcode)
1. Find the postcode and place in column N.
- this might be as simple as taking the data in the rightmost data-containing column of A-G
- It might add some basic verification to check that field is a postcode.
2. Where there is one clear number block in the leftmost field, (1, 22b, 115, etc), drop that number block into column I, and the rest of that field into column J, and the remaining fields dropped into J-M unchanged.
3. Where there are two clear number blocks in the raw data, take everything to the left of the 2nd number block and place in column H, then run step 2 on the rest
4. Where the single number block is not to the far left of the raw data, split everything to the left of it off into column H, then run step 2 on the rest.
5. Where there is no number block, place the first field in column H, leave I blank, and put the next data in column J and following columns.
6. If the sole number block has the word "flat" before it, place that field in column H and leave column I blank
7. If there is a sole number block, but the next but one word after that is from a set list (named list called "HouseBlockNames") which would include words like: Mansions, Court, House etc., then that whole field goes in column H rather than being split over I and J.
8. The icing on the cake would be to take take the PAF file, and verify the now-cleaned data against it.
Add a help note to explain how to change the macro if raw data is in eg. columns X onwards and the output is columns AF onwards.
Hopefully these steps can be recreated in a macro that I can run on the data carrying out most of the tidying up with one button click.
In what form is the RAW DATA?
ASKER
Excel, just like the attached (same file as before with a few errors corrected)
In the main data there are additional columns for name etc. otherwise the same.
Sample-Address-formats.xls
In the main data there are additional columns for name etc. otherwise the same.
Sample-Address-formats.xls
@hawkeye
That worksheet shows mapping. Are you saying that the raw data is a mapping example?
The workbook is your attempt to communicate the different possible address formats to us and their mapping.
I would expect to have you tell us that the worksheet is sequentially populated cells or that the raw data is an ASCII text file or that the raw data is a tab (or other character) delimited file
That worksheet shows mapping. Are you saying that the raw data is a mapping example?
The workbook is your attempt to communicate the different possible address formats to us and their mapping.
I would expect to have you tell us that the worksheet is sequentially populated cells or that the raw data is an ASCII text file or that the raw data is a tab (or other character) delimited file
ASKER
Hi Aikimark,
I attach a file with a number of tabbed examples. I've left one anonymised row in each example.
I received these as tab delimited files which I've opened and saved in Excel. If there's a way of operating on them as raw tab files please explain how I would do that. I assumed I would have to open them in excel to run excel macros on them.
One problem I'm having is that there are often too many rows of data for Excel so when I open in Excel I am losing data. I'm assuming I have to open in a text programme and chunk the data into separate files. Another option is to upload directly into Salesforce (Force.com platform) which is the database I'm using ultimately, and write code to operate in salesforce. The wider question is maybe I should be asking this question under programming algorithms directly for salesforce, but I am very familiar with excel and I don't see any salesforce headings so I have planned to clean in excel and upload, and thus keep a good understanding of the processes so I can spot errors. I am currently cleaning all the data in excel manually using a combination of sorting, text-to-columns, left and right functions, delete spaces etc. I was hoping here to simply speed up that very manual process with a macro to do the majority of that work but ultimately, once I understand the programming steps, I would try and get an algorithm written in Salesforce and upload raw data. I'd rather get the process working in Excel where I can understand it and follow the steps and track the output results as a first stage.
Does that answer your question?
Example.xlsx
I attach a file with a number of tabbed examples. I've left one anonymised row in each example.
I received these as tab delimited files which I've opened and saved in Excel. If there's a way of operating on them as raw tab files please explain how I would do that. I assumed I would have to open them in excel to run excel macros on them.
One problem I'm having is that there are often too many rows of data for Excel so when I open in Excel I am losing data. I'm assuming I have to open in a text programme and chunk the data into separate files. Another option is to upload directly into Salesforce (Force.com platform) which is the database I'm using ultimately, and write code to operate in salesforce. The wider question is maybe I should be asking this question under programming algorithms directly for salesforce, but I am very familiar with excel and I don't see any salesforce headings so I have planned to clean in excel and upload, and thus keep a good understanding of the processes so I can spot errors. I am currently cleaning all the data in excel manually using a combination of sorting, text-to-columns, left and right functions, delete spaces etc. I was hoping here to simply speed up that very manual process with a macro to do the majority of that work but ultimately, once I understand the programming steps, I would try and get an algorithm written in Salesforce and upload raw data. I'd rather get the process working in Excel where I can understand it and follow the steps and track the output results as a first stage.
Does that answer your question?
Example.xlsx
too many rowsYou have more than 1 million rows of data?!? If so, you should be using a database.
You have an alignment problem, too. Examples 1,3,5 seem to be the same and Examples 2,6 seem to be the same. Examples 4 and 7 are two different sets of data (from each other and the other two sets).
Elector Number Prefix Elector Number Elector Number Suffix Elector Markers Elector DOB Elector Name PostCode Address1 Address2 Address3 Address4 Address5 Address6
Elector Number Prefix Elector Number Elector Number Suffix Elector Markers Elector DOB Elector Surname Elector Forename PostCode Address1 Address2 Address3 Address4 Address5 Address6
Elector Number Prefix Elector Number Elector Number Suffix Elector Markers Elector DOB Elector Name PostCode Address1 Address2 Address3 Address4 Address5 Address6
Elector Number Prefix Elector Number Elector Number Suffix Elector Markers Elector DOB Elector Name PostCode Address1 Address2 Address3 Address4 Address5 Address6
Elector Number Prefix Elector Number Elector Number Suffix Elector Markers Elector DOB Elector Name PostCode Address1 Address2 Address3 Address4 Address5 Address6
Elector Number Prefix Elector Number Elector Number Suffix Elector Markers Elector DOB Elector Surname Elector Forename PostCode Address1 Address2 Address3 Address4 Address5 Address6
Import Code Constituency Code Polling District Code Full Roll Number Title First Name Initials Last Name Suffix Date Of Birth Gender Address (all one line) Mobile Telephone Number Home Telephone Number Email Address Mosaic Code Address Line 1 Address Line 2 Address Line 3 Address Line 4 Address Line 5 Address Line 6 Postcode House Identifier Street Name Election (Last Known VI) Voting Intention (Last Time) Voting Intention (Last Known) Voting Intention (This Time)
The good news is that each set has a PostCode column. This means that we won't have to look through the columns to find that pattern for the town lookup.Please double-check the anonymized data you posted. Example 4 looks strange.
Although there is an inconsistency in the Example 7 data, it might be the best of the extracts from a parsing perspective.
ASKER
Sorry for the delay I've been away without internet access.
too many rows?
...sorry, I was locked in xls. and 64k lines. I can do it in xlsx and 1 million is more than enough.
Yes some Councils deliver data in very similar formats. You're right, we can ignore 3, 5 and 6 as being identical formats to others. Even then, it is negligible for me to rearrange the columns manually into an identical form where we have columns A-F as Address 1-6 and G as Postcode. So you can assume that this has been done, with blank columns created in H-N ready for the processed data as the starting point for the macro.
Ignore the fact that example 7 has 'address in one line' different from 'address in separate lines, I over-randomised the data - they are in fact the same.
too many rows?
...sorry, I was locked in xls. and 64k lines. I can do it in xlsx and 1 million is more than enough.
Yes some Councils deliver data in very similar formats. You're right, we can ignore 3, 5 and 6 as being identical formats to others. Even then, it is negligible for me to rearrange the columns manually into an identical form where we have columns A-F as Address 1-6 and G as Postcode. So you can assume that this has been done, with blank columns created in H-N ready for the processed data as the starting point for the macro.
Ignore the fact that example 7 has 'address in one line' different from 'address in separate lines, I over-randomised the data - they are in fact the same.
* So, this data comes from different sources?
* Does this (raw) data come to you in CSV format or have you done some editing to create the CSV (or Excel workbook) files we see in this thread?
* Will the one line address datum be available from all your sources or is that only from a subset of the data sources?
* Have you assembled your city/postalcode table?
* Does this (raw) data come to you in CSV format or have you done some editing to create the CSV (or Excel workbook) files we see in this thread?
* Will the one line address datum be available from all your sources or is that only from a subset of the data sources?
* Have you assembled your city/postalcode table?
ASKER
Yes, every Council in the country is a different source, each with their own format.
They all vary, but seem to boil down to a handful of different types, in any case it looks like they all have columns somewhere that equate to Address 1-5 and Postcode. (We'll also need to look at names in a future question - Address is the important one). It won't be difficult to create a macro for each council putting their columns in the right place before processing. But we could automate that: eg. Look for column heading "address 1", cut and paste/insert into A... etc.
The data seems to come in tab delimited formats, some may come in csv, but in any case all can by opened in excel.
The one line of data was just one random line I picked out and changed parts of words in, to give a sense of where the spaces are and how many words etc. There will generally be 50k to 100k rows in a file. When I've cleaned it in excel I will upload it to a Salesforce database.
I attach a lookup of City to Postcode-FirstPart
Postcode-FirstPart-Lookup.xlsx
They all vary, but seem to boil down to a handful of different types, in any case it looks like they all have columns somewhere that equate to Address 1-5 and Postcode. (We'll also need to look at names in a future question - Address is the important one). It won't be difficult to create a macro for each council putting their columns in the right place before processing. But we could automate that: eg. Look for column heading "address 1", cut and paste/insert into A... etc.
The data seems to come in tab delimited formats, some may come in csv, but in any case all can by opened in excel.
The one line of data was just one random line I picked out and changed parts of words in, to give a sense of where the spaces are and how many words etc. There will generally be 50k to 100k rows in a file. When I've cleaned it in excel I will upload it to a Salesforce database.
I attach a lookup of City to Postcode-FirstPart
Postcode-FirstPart-Lookup.xlsx
Now obfuscate at least one address of each type you listed in the Sample Address Formats workbook. Do NOT obfuscate the postal code or town data for those rows. It would be best if they were all in the same 'style' -- place on a single worksheet based on one of the Example # worksheets.
You can paste either a .CSV or tab-delimited file.
You can paste either a .CSV or tab-delimited file.
ASKER
Hi Aikimark,
Is this what you're looking for? See attached.
Example2-Raw-Data.csv
Example2-columns-aligned.csv
Is this what you're looking for? See attached.
Example2-Raw-Data.csv
Example2-columns-aligned.csv
Thanks.
What is (with full stop)?
I won't be able to spend any time on this until tomorrow (Sunday).
What is (with full stop)?
I won't be able to spend any time on this until tomorrow (Sunday).
ASKER
That's just me pointing out there is a full stop there, probably you didn't need to know that!
I've cleaned up the files a bit more for that, and getting rid of spare lines and headers, see attached.
Sunday is fine, thanks
Example2-Raw-Data.csv
Example2-columns-aligned.csv
I've cleaned up the files a bit more for that, and getting rid of spare lines and headers, see attached.
Sunday is fine, thanks
Example2-Raw-Data.csv
Example2-columns-aligned.csv
I discovered that your aligned sample has some high-range ASCII values (160) that look like space characters (32), but aren't.
ASKER
Can you treat anything that looks like a space, as a space?
I can programmatically replace the 160.
Here is my first pass results for matching data. Assume that the town and postcode values are correct.
1. EN4 should indicate a town of Barnet. However, two of your EN4 lines have a town of Hertfordshire
2. the following postcode column values have a trailing space character:
"WS3 2RG "
"WS2 7LB "
"WS3 3SS "
Here is my first pass results for matching data. Assume that the town and postcode values are correct.
The House~5 Road Name~Village
FlatID The House
HouseID 5
Street Road Name
2ndLine Village
The House~5 Road Name~Village
FlatID The House
HouseID 5
Street Road Name
2ndLine Village
5 Road Name~Village
HouseID 5
Street Road Name
2ndLine Village
5 Road Name~Village
HouseID 5
Street Road Name
2ndLine Village
HouseName~5 Road Name~Village
FlatID HouseName
HouseID 5
Street Road Name
2ndLine Village
HouseName~5 Road Name~Village
FlatID HouseName
HouseID 5
Street Road Name
2ndLine Village
HouseName~5 Road Name~Village
FlatID HouseName
HouseID 5
Street Road Name
2ndLine Village
817 Browns Lane
HouseID 817
Street Browns Lane
Flat 3~7-9 Double Name Avenue
FlatID Flat 3
HouseID 7-9
Street Double Name Avenue
HouseName~Name Road
FlatID HouseName
Street Name Road
Flat 3~158 Name Road
FlatID Flat 3
HouseID 158
Street Name Road
St.Name Sch. Building~85 Name Road
HouseID 85
Street Name Road
Lower Flat~18 Name Avenue
FlatID Lower Flat
HouseID 18
Street Name Avenue
Flat 7~23 Name Avenue
FlatID Flat 7
HouseID 23
Street Name Avenue
5 Name Cottages~Name Road
HouseID 5
Street Name Cottages
2ndLine Name Road
The Cottage~Double Name Way
FlatID The Cottage
Street Double Name Way
729 The Name
HouseID 729
Street The Name
Double Name~702 The RoadName
HouseID 702
Street The RoadName
Three Part Name~The RoadName
FlatID Three Part Name
Street The RoadName
Flat 7~501 Name Lane
FlatID Flat 7
HouseID 501
Street Name Lane
5 Name Court~7A Name Road
HouseID 5
Street Name Court
2ndLine 7A Name Road
4 Name Road
HouseID 4
Street Name Road
5 Name Court~1B Trinity Avenue
HouseID 5
Street Name Court
2ndLine 1B Trinity Avenue
Flat 3~702 Name Road
FlatID Flat 3
HouseID 702
Street Name Road
Lower Flat~57 Name Road
FlatID Lower Flat
HouseID 57
Street Name Road
5 Name Cottages~Village
HouseID 5
Street Name Cottages
2ndLine Village
4 Name House~The Name~Mapledurwell
HouseID 4
Street Name House
2ndLine The Name~Mapledurwell
2 Name Farm~St Double Name
HouseID 2
Street Name Farm
2ndLine St Double Name
Name Hotel~6 Name Road
HouseID 6
Street Name Road
7 Name Court
HouseID 7
Street Name Court
78 Name Close
HouseID 78
Street Name Close
9 Name Court~Double Name Close
HouseID 9
Street Name Court
2ndLine Double Name Close
Name House~75 Name Road~Village
FlatID Name House
HouseID 75
Street Name Road
2ndLine Village
Name House~57A Double Name
HouseID 57A
Street Double Name
57 Name Lane
HouseID 57
Street Name Lane
118 Name Lane~Essington
HouseID 118
Street Name Lane
2ndLine Essington
Flat~6 Name Street
FlatID Flat
HouseID 6
Street Name Street
5 Name Road
HouseID 5
Street Name Road
Exceptions:1. EN4 should indicate a town of Barnet. However, two of your EN4 lines have a town of Hertfordshire
2. the following postcode column values have a trailing space character:
"WS3 2RG "
"WS2 7LB "
"WS3 3SS "
ASKER
Thanks Aikimark, this is all great.
First of all your exceptions:
1. EN4 giving county rather than town - we could insert the town before the county, or replace the county. This is straying beyond the bounds of the question so I think I should make that a separate question. To deal with it we'd have to lookup both post-town and county, put them in specific fields and search and delete them if we find them in the given address. Then we'd have to adjust for misspellings. It would be nice to get the data that clean, but it's not necessary for now, and I can't afford the PAF licence (yet) so maybe I will post this up as an additional question when I have all the lookup data in place.
2. Trailing space in postcode - it would be nice to lose it but not essential.
Second the errors in your first pass:
Please note that I have been looking for something to dramatically speed things up, not necessarily to give me perfection, so don't worry if we don't solve everything, but here are the ones that could be better and some ideas as to how one might solve them:
line 48 - Type A - St.Name Sch. Building would be in Flat ID
line 59 - Type B: "5 Name Cottages" should be Flat ID: 5; House ID: Name Cottages; Street: Name Road
line 63 - Type C: "The Cottage" would be house ID
line 69 - Type A: same as 48: "double name" would be in Flat ID
line 72 - Type C same as 63: "Three Part Name" would be house ID
line 79 - Type D: "5 Name Court" would be Flat ID; 7A is House ID and Name Road is the Street
Line 86 - Type D same as 79 "5 Name Court" would be Flat ID
Line 102 - Type E: "Mapledurell" would be third line
line 119 - Type D same as 79 "9 Name Court" would be Flat ID
Line 128 - Type A same as 48: Name House would be Flat ID
Type A we would know because there is a number block not in the leftmost field. Can you do something like "if there's a number block 'in the middle' put anything left of it in 'Flat ID'"?
Type B, would be solved by that Type A step before treating as normal.
Type C, we'd need a rule like: "if there's no number block put the first field in House ID and the next field in Street ID" that may not be perfect but I think it would solve most.
Type D, is like Type A, only there's a leading number block in addition to the 'one in the middle'. Same rule as A would solve this as well though I think.
Type E, you may have only put them both in line 2 to save space, but we have 6 address fields so can use them.
I attach a file with the errors next to the lines referenced if that helps.
If any of those are particularly hard I don't mind if we spin those off into another question.
Many thanks
Charles
Algorithm-Output-with-errors-19-Oct.xlsx
First of all your exceptions:
1. EN4 giving county rather than town - we could insert the town before the county, or replace the county. This is straying beyond the bounds of the question so I think I should make that a separate question. To deal with it we'd have to lookup both post-town and county, put them in specific fields and search and delete them if we find them in the given address. Then we'd have to adjust for misspellings. It would be nice to get the data that clean, but it's not necessary for now, and I can't afford the PAF licence (yet) so maybe I will post this up as an additional question when I have all the lookup data in place.
2. Trailing space in postcode - it would be nice to lose it but not essential.
Second the errors in your first pass:
Please note that I have been looking for something to dramatically speed things up, not necessarily to give me perfection, so don't worry if we don't solve everything, but here are the ones that could be better and some ideas as to how one might solve them:
line 48 - Type A - St.Name Sch. Building would be in Flat ID
line 59 - Type B: "5 Name Cottages" should be Flat ID: 5; House ID: Name Cottages; Street: Name Road
line 63 - Type C: "The Cottage" would be house ID
line 69 - Type A: same as 48: "double name" would be in Flat ID
line 72 - Type C same as 63: "Three Part Name" would be house ID
line 79 - Type D: "5 Name Court" would be Flat ID; 7A is House ID and Name Road is the Street
Line 86 - Type D same as 79 "5 Name Court" would be Flat ID
Line 102 - Type E: "Mapledurell" would be third line
line 119 - Type D same as 79 "9 Name Court" would be Flat ID
Line 128 - Type A same as 48: Name House would be Flat ID
Type A we would know because there is a number block not in the leftmost field. Can you do something like "if there's a number block 'in the middle' put anything left of it in 'Flat ID'"?
Type B, would be solved by that Type A step before treating as normal.
Type C, we'd need a rule like: "if there's no number block put the first field in House ID and the next field in Street ID" that may not be perfect but I think it would solve most.
Type D, is like Type A, only there's a leading number block in addition to the 'one in the middle'. Same rule as A would solve this as well though I think.
Type E, you may have only put them both in line 2 to save space, but we have 6 address fields so can use them.
I attach a file with the errors next to the lines referenced if that helps.
If any of those are particularly hard I don't mind if we spin those off into another question.
Many thanks
Charles
Algorithm-Output-with-errors-19-Oct.xlsx
what is Type A,B,C,D,E?
I'll take a look at your comments, The first one you pointed out was one I actually expected to see. It has to do with the order of the patterns I try.
I'll take a look at your comments, The first one you pointed out was one I actually expected to see. It has to do with the order of the patterns I try.
Please take a look at the attached workbook.
I copied the address data columns into a new workbook and added the parsed columns to the right. Immediately to the right of the PostalCode column, I added some codes that let me know what exceptions happened in the program.
ZLU= zip/postal code lookup failure
CLU= city lookup failure
Note: all the 160 characters have been changed to spaces and the postal code values have been trimmed of leading/trailing spaces.
Q-28534211.xls
I copied the address data columns into a new workbook and added the parsed columns to the right. Immediately to the right of the PostalCode column, I added some codes that let me know what exceptions happened in the program.
ZLU= zip/postal code lookup failure
CLU= city lookup failure
Note: all the 160 characters have been changed to spaces and the postal code values have been trimmed of leading/trailing spaces.
Q-28534211.xls
ASKER
Hi Aikimark,
This is terrific, it looks like there are only two 'errors' left - the CLU and the Type B.
The Type A, Type B etc. were just my attempts to categorise the errors, in the order I found them. (made up labels). Although I was wrong about 119 above being Type D, it's Type B, which is the only one it looks like remains unsolved in your recent attachment.
Re. CLU errors - Currently you seem to be leaving a blank when a city isn't found. Can we lookup the city from the postcode in that case and resolve the address anyway?
Re. Type B errors - I'm not sure how we could solve these. There's no second number block to help us locate the road. The only indication that we're putting the building name in the road field is that there's another field following it with an identifiable name ("road", "avenue", "close", "way", "street", "crescent" etc.). We could have a lookup, and if the last word in that field (column L in your example Q-28534211) matches a lookup table of road types, and column I is blank, shift all three fields one to the left.
If you can resolve those two it would be amazing. I don't mind moving the Type Bs to a new question.
What is the mechanism then for the execution? a macro in excel or something running externally to the file?
This is terrific, it looks like there are only two 'errors' left - the CLU and the Type B.
The Type A, Type B etc. were just my attempts to categorise the errors, in the order I found them. (made up labels). Although I was wrong about 119 above being Type D, it's Type B, which is the only one it looks like remains unsolved in your recent attachment.
Re. CLU errors - Currently you seem to be leaving a blank when a city isn't found. Can we lookup the city from the postcode in that case and resolve the address anyway?
Re. Type B errors - I'm not sure how we could solve these. There's no second number block to help us locate the road. The only indication that we're putting the building name in the road field is that there's another field following it with an identifiable name ("road", "avenue", "close", "way", "street", "crescent" etc.). We could have a lookup, and if the last word in that field (column L in your example Q-28534211) matches a lookup table of road types, and column I is blank, shift all three fields one to the left.
If you can resolve those two it would be amazing. I don't mind moving the Type Bs to a new question.
What is the mechanism then for the execution? a macro in excel or something running externally to the file?
How many CLU errors do you think you'll have? (percentage or absolute number)
I assume that you would create some aligned version of your data and run the code against it. The final version of the code would prompt you to select the data in the postalcode column.
identifiable name ("road", "avenue", "close", "way", "street", "crescent" etc.)What goes into the etc list?
I assume that you would create some aligned version of your data and run the code against it. The final version of the code would prompt you to select the data in the postalcode column.
ASKER
I have no idea how many CLU errors we might encounter. We could assume the same ratio as our test data.
Re, identifiable names - I think I would start with that list and add to it when I found others from eye-scanning the data. I would either create a named array called "RoadNames", or follow your instructions on where to put the list.
Are you now able to compile the final code?
If it's not an excel macro please give me a guide how to run it.
Re, identifiable names - I think I would start with that list and add to it when I found others from eye-scanning the data. I would either create a named array called "RoadNames", or follow your instructions on where to put the list.
Are you now able to compile the final code?
If it's not an excel macro please give me a guide how to run it.
The macro is in the workbook I posted. You are welcome to add other patterns or change existing patterns.
ASKER
Ok got it.
I'm getting a Run-time error 9 "Subscript out of range" when it hits:
Set wkb = Workbooks("Postcode-FirstP art-Lookup .xlsx").
I have that file open, and moved it to the same folder as the open file with the data. Do you know why it wouldn't work?
I tried deleting that line, moving the worksheet into the file and changing the next line to:
Set wks = Worksheets("Postcode Simple Lookup")
That seemed to work.
Next run time error (which I couldn't work out) is on line:
dicZipCities(vData(lngRow, 1)) = dicZipCities(vData(lngRow, 1)) & "^" & vData(lngRow, 2)
More generally, I don't understand your objective with the dicZipCities, or what the CLU error is? For example in rows 18 and 19 of your example file throwing up the CLU error, EN4 is Barnet, and Barnet is in Field B, what is stopping it parsing out. Do we need this error and why not just parse those rows anyway? I don't mind having not perfect data eg as per the 'Type B' issue above, but blank lines are not good. Can you change it to leave an error code and parse anyway?
Once this is finished, can I post your completed file to another question and create a new question around the 'type B' problem?
I'm getting a Run-time error 9 "Subscript out of range" when it hits:
Set wkb = Workbooks("Postcode-FirstP
I have that file open, and moved it to the same folder as the open file with the data. Do you know why it wouldn't work?
I tried deleting that line, moving the worksheet into the file and changing the next line to:
Set wks = Worksheets("Postcode Simple Lookup")
That seemed to work.
Next run time error (which I couldn't work out) is on line:
dicZipCities(vData(lngRow,
More generally, I don't understand your objective with the dicZipCities, or what the CLU error is? For example in rows 18 and 19 of your example file throwing up the CLU error, EN4 is Barnet, and Barnet is in Field B, what is stopping it parsing out. Do we need this error and why not just parse those rows anyway? I don't mind having not perfect data eg as per the 'Type B' issue above, but blank lines are not good. Can you change it to leave an error code and parse anyway?
Once this is finished, can I post your completed file to another question and create a new question around the 'type B' problem?
That workbook must be open when the code runs. I know that's a bit slack, but I was concentrating on the validation code. I thought I'd included that requirement in an earlier comment. I apologize for to omission.
I'm using a dictionary object to do the postalcode -> city lookups. I thought about using a VLookup() function, but you have several duplicate postalcode entries and I would need to use a filter to get all of them. The dictionary object is much faster than any of the Excel data lookup methods.
===================
This raises an interesting question...if there are multiple cities for a postalcode value, which one should be used if the row doesn't have a matching city?
I'm using a dictionary object to do the postalcode -> city lookups. I thought about using a VLookup() function, but you have several duplicate postalcode entries and I would need to use a filter to get all of them. The dictionary object is much faster than any of the Excel data lookup methods.
===================
This raises an interesting question...if there are multiple cities for a postalcode value, which one should be used if the row doesn't have a matching city?
ASKER
I don't mind having the workbook open. The problem is the workbook is open, and it still doesn't work. I wonder why?
But I can put the worksheet in the file which solves it.
I didn't know about the dictionary tool. It looks pretty cool.
I hadn't realised there were postcode values with multiple cities. I would have no problem, if the row doesn't have either of the two alternatives, with just going with the first in the list.
Can you fix it so it does that?
And do you know why it broke on that line?
But I can put the worksheet in the file which solves it.
I didn't know about the dictionary tool. It looks pretty cool.
I hadn't realised there were postcode values with multiple cities. I would have no problem, if the row doesn't have either of the two alternatives, with just going with the first in the list.
Can you fix it so it does that?
And do you know why it broke on that line?
What is the active worksheet when you started the routine?
ASKER
Q-28534211.xls is the active workbook, I click within it several times to be sure.
I run the macro by alt.t.m.m and choosing the macro and clicking run.
I run the macro by alt.t.m.m and choosing the macro and clicking run.
Are you running it against the data in the workbook I posted or different data? If different, is the data consistent with the aligned sample data? (cols A:F = Address1:Address6; col G = postcode)
What are the values of
vData(lngRow, 1)
vData(lngRow, 2)
when the code fails?
What are the values of
vData(lngRow, 1)
vData(lngRow, 2)
when the code fails?
ASKER
I tried running both against new data in an aligned sample, and against the data you posted, same error.
How/where would I find the values of
vData(lngRow, 1)
vData(lngRow, 2)?
How/where would I find the values of
vData(lngRow, 1)
vData(lngRow, 2)?
When the code errors, you will be in debug mode. You should be able to hover your mouse over the variables. You can also display the values in the Immediate Window by preceding the variable with a question mark. You can also select an expression and click on the eye glasses toolbar button.
ASKER
Thanks, found the values.
Sorry this is going so slow, thank you for your patience:
I got the two bugs (hadn't noticed the postcode name had dashes in the code, and row 619 had an n/a#, both now rectified).
The macro now runs.
Running on your data works perfectly.
Running on some new data delivers error ZLU in the header row (expected) and ZLU2 on the next 56 lines then stops.
The data is Address 1-6 and postcode (A-G), new sheet. I checked the Postcodes (RG24 and RG27) are in the postcode lookup file.
Sorry this is going so slow, thank you for your patience:
I got the two bugs (hadn't noticed the postcode name had dashes in the code, and row 619 had an n/a#, both now rectified).
The macro now runs.
Running on your data works perfectly.
Running on some new data delivers error ZLU in the header row (expected) and ZLU2 on the next 56 lines then stops.
The data is Address 1-6 and postcode (A-G), new sheet. I checked the Postcodes (RG24 and RG27) are in the postcode lookup file.
The operating range is hard coded. It is 57 rows tall to get all the sample data. For testing purposes, you can change that range address to accommodate more data rows for your testing.
ASKER
I got the range ok. Now the whole lot is ZLU2 error. What is this telling me?
What does the code currently look like?
ASKER
I don't understand the question
The only thing I have changed is G57 is now G60249
The only thing I have changed is G57 is now G60249
earlier, you wrote that you had commented some lines of code.
ASKER
All I did was put the postcode into the same file and remove the line looking for it in another workbook. I didn't save those changes, and later solved the problem by re-naming the postcode file to match the code and reverting to the original code. So nothing has been altered in the code other than changing the array from G57 to G60249.
The oustanding issues are:
1. What the ZLU2 error is telling us
2. What the CLU error is telling us, and parsing those lines, rather than leaving them blank
The oustanding issues are:
1. What the ZLU2 error is telling us
2. What the CLU error is telling us, and parsing those lines, rather than leaving them blank
look for the ZLU2 string literal in the code.
Are you sure the dictionary contains anything? You might want to set a breakpoint at the place where the dictionary is being filled and step through the code. Also, the dictionary defaults to case sensitive key comparisons. If you have lower case zip codes, this will need to change.
Are you sure the dictionary contains anything? You might want to set a breakpoint at the place where the dictionary is being filled and step through the code. Also, the dictionary defaults to case sensitive key comparisons. If you have lower case zip codes, this will need to change.
ASKER
Ok, I finally figured that out. A non-breaking space in the data.
Excel couldn't find and replace it (or said it had but hadn't). Is there a better way to get rid of those than opening in notepad, copying to word (so I can see the things), find&replace, copy back into notepad, save as csv and reopen in Excel?
Can I ask you to resolve the CLU error as the last point before closing? So these lines are parsed out rather than leaving the destination blank. I have tried to understand the code but can't work out what it's doing.
Excel couldn't find and replace it (or said it had but hadn't). Is there a better way to get rid of those than opening in notepad, copying to word (so I can see the things), find&replace, copy back into notepad, save as csv and reopen in Excel?
Can I ask you to resolve the CLU error as the last point before closing? So these lines are parsed out rather than leaving the destination blank. I have tried to understand the code but can't work out what it's doing.
Else
Set rngSrc = wks.Range(rngSrc.Cells(1, 1), rngSrc.Cells(1, rngFind.Column))
Set rngFind = rngSrc.Find(dicZipCities(strZipCode), , , xlWhole, , xlPrevious)
If rngFind Is Nothing Then
rng.Offset(0, 1).Value = "CLU"
Exit Do
What is the error this has found? and can we drop the error code in there but proceed anyway to parse that line?
Please upload a workbook with the non-breaking space. I will add it to the 160 character cleanup.
When getting CLU errors, can you determine if it is in cases where there are multiple cities associated with a postal code? I just realized that this lookup statement would never find a city in such cases, since the city names have been put in a delimited string in the dictionary.
When getting CLU errors, can you determine if it is in cases where there are multiple cities associated with a postal code? I just realized that this lookup statement would never find a city in such cases, since the city names have been put in a delimited string in the dictionary.
ASKER
I attach the non-breaking space example.
re. CLU errors. In this case the Code is EN4 and only has one city associated with it: Barnet.
The problem is the City is in a field after a comma as follows:
Address1
5 Name Mews
Address2
Hadley Wood, Barnet
Address3
Hertfordshire
Address4
EN4 0ED
Address5
Address6
PostCode
EN4 0ED
Having a field with two items separated by a comma throws up no errors if the city is visible in another field
If I add Barnet in another field the code works
This error appears to indicate that the city found in the postcode lookup is not in an existing standalone field.
In this case I would like it to go ahead and add the city anyway rather than stop the process, perhaps leave the CLU code as a notifier.
There are 56 cases of one postcode 2 cities. I'll put those cases together with the 'Type B' error in another question.
Example-of-non-breaking-space.xls
re. CLU errors. In this case the Code is EN4 and only has one city associated with it: Barnet.
The problem is the City is in a field after a comma as follows:
Address1
5 Name Mews
Address2
Hadley Wood, Barnet
Address3
Hertfordshire
Address4
EN4 0ED
Address5
Address6
PostCode
EN4 0ED
Having a field with two items separated by a comma throws up no errors if the city is visible in another field
If I add Barnet in another field the code works
This error appears to indicate that the city found in the postcode lookup is not in an existing standalone field.
In this case I would like it to go ahead and add the city anyway rather than stop the process, perhaps leave the CLU code as a notifier.
There are 56 cases of one postcode 2 cities. I'll put those cases together with the 'Type B' error in another question.
Example-of-non-breaking-space.xls
did you come up with a list of road (category) strings for me?
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Sorry, I listed all the road strings and didn't submit, here they are:
Road, Street, Avenue, Drive, Close, Lane, Crescent, Parade, Place, Way, Row, Mews, Hill, Gardens, Grove, Walk, Square, Boulevard.
Thanks for the above, I'll test it shortly. Nice touch to open the PC file if not open. Thanks.
Road, Street, Avenue, Drive, Close, Lane, Crescent, Parade, Place, Way, Row, Mews, Hill, Gardens, Grove, Walk, Square, Boulevard.
Thanks for the above, I'll test it shortly. Nice touch to open the PC file if not open. Thanks.
ASKER
A little problem with the clean up sub.
Line:
vData = wks.Range(wks.Cells(lngRow , 1), parmCell).Value
"Method 'range' of object '_Worksheet' failed"
Only for the data that had the non-breaking space problem (for other data it worked perfectly), it also failed with the same error when I replaced the non-breaking space with a normal space. Do you know what the problem might be?
Line:
vData = wks.Range(wks.Cells(lngRow
"Method 'range' of object '_Worksheet' failed"
Only for the data that had the non-breaking space problem (for other data it worked perfectly), it also failed with the same error when I replaced the non-breaking space with a normal space. Do you know what the problem might be?
Is the non-breaking space the same as the ASCII 160 character or something else?
ASKER
It's the same data as before with the non-breaking space
Presumably this is only searching for those two characters and replacing if it finds them. why would it break at that line? What kind of thing would make it break there? I don"t understand he code enough.
Presumably this is only searching for those two characters and replacing if it finds them. why would it break at that line? What kind of thing would make it break there? I don"t understand he code enough.
Have you clicked the Reset toolbar button (Run Reset) or close/reopened the workbook?
ASKER
Trying it on my mac laptop and it breaks on
Run-time error 429.
ActiveX component can't create object:
Set oRE = CreateObject("vbscript.reg exp")
might be because it's a mac
Yes I've clicked the reset button, but I'll check in case I haven't reopened when I get back to PC.
It would be helpful if you could say what that line does, then I'll be able to trace error more intelligently.
Run-time error 429.
ActiveX component can't create object:
Set oRE = CreateObject("vbscript.reg
might be because it's a mac
Yes I've clicked the reset button, but I'll check in case I haven't reopened when I get back to PC.
It would be helpful if you could say what that line does, then I'll be able to trace error more intelligently.
No MACs allowed. Windows only.
ASKER
What does that line do? Can you write out the logical statement in plain English? I can't locate the error if I don't know what it's doing. I had assumed it would be something like: "look for a 160 or n-b-space character, if found, replace with space, else carry on" If that's the case it's hard to see what would make that code stop.
vData = wks.Range(wks.Cells(lngRow, 1), parmCell).Value
This statement transfers the data in the cells from column 1 through the postalcode column on the current row (in the calling routine) into the vData variable. vData is a variant that contains an array.
ASKER
Interesting:
following reboot of the PC:
1. Running the code on the Basingstoke data which had previously had the n-b-spaces replaced, and moved into the file with the code: code runs and works.
2. Running the code on the Basingstoke data in its original file with three files open (Basingstoke data, file with the code, postal code file): appears to run (you wait about as long as per (1)), but ultimately it stops, no errors nothing happens.
3. Move the Basingstoke data (moving the whole sheet) into the file with the code, then running it, causes this error.
4. Saving the file as per (3), closing Excel and reopening it then running the code as you suggested, works.
So it looks like moving a sheet into a new workbook does something to mess up the vData variable, it doesn't matter what I just know to not do that.
Interesting that (2) doesn't work. Do you know why?
(It doesn't matter, I just have to transfer the data into the file with the code, save, close and re-open), but curious if you have an explanation.
On that basis I'm accepting the answer and transferring those two outstanding elements to a new question, since they weren't part of the original brief. Thank you for working so hard on this.
following reboot of the PC:
1. Running the code on the Basingstoke data which had previously had the n-b-spaces replaced, and moved into the file with the code: code runs and works.
2. Running the code on the Basingstoke data in its original file with three files open (Basingstoke data, file with the code, postal code file): appears to run (you wait about as long as per (1)), but ultimately it stops, no errors nothing happens.
3. Move the Basingstoke data (moving the whole sheet) into the file with the code, then running it, causes this error.
4. Saving the file as per (3), closing Excel and reopening it then running the code as you suggested, works.
So it looks like moving a sheet into a new workbook does something to mess up the vData variable, it doesn't matter what I just know to not do that.
Interesting that (2) doesn't work. Do you know why?
(It doesn't matter, I just have to transfer the data into the file with the code, save, close and re-open), but curious if you have an explanation.
On that basis I'm accepting the answer and transferring those two outstanding elements to a new question, since they weren't part of the original brief. Thank you for working so hard on this.
I don't know what might be happening to cause those errors. Here is a slightly better version of the cleanup routine.
During my testing, I did encounter a few instances of invoking the code and it going off into never land. At least one time, I found that I had double-clicked a cell and was in edit mode. This will certainly cause a problem.
It is possible to restructure this so that you would point to a CSV file (file open dialog) and the code would detect the six address# columns and the postcode column. The original and cleaned data could be added to the current or a new worksheet. However, that would probably only be a worthwhile effort if you were going to do this often.
Sub CleanRowData(parmCell As Range)
Static wks As Worksheet
Static rng As Range
Dim vData As Variant
Dim strValue As String
Dim lngCol As Long
Dim lngRow As Long
Dim lngBefore As Long
If wks Is Nothing Then
Set wks = parmCell.Worksheet
End If
lngRow = parmCell.Row
vData = wks.Range(wks.Cells(lngRow, 1), parmCell).Value
For lngCol = 1 To UBound(vData, 2)
lngBefore = Len(vData(1, lngCol))
strValue = Trim(vData(1, lngCol))
If (lngBefore <> Len(strValue)) Or (InStr(strValue, Chr(160)) <> 0) Then
wks.Cells(lngRow, lngCol).Value = Replace(strValue, Chr(160), " ")
End If
Next
End Sub
During my testing, I did encounter a few instances of invoking the code and it going off into never land. At least one time, I found that I had double-clicked a cell and was in edit mode. This will certainly cause a problem.
It is possible to restructure this so that you would point to a CSV file (file open dialog) and the code would detect the six address# columns and the postcode column. The original and cleaned data could be added to the current or a new worksheet. However, that would probably only be a worthwhile effort if you were going to do this often.
If you are getting adequate results, it doesn't seem that adding the road names is warranted.
ASKER
We rely on knowing the street name so this last step would really help. I have asked it under the following Question: ID: http:Q_28544577.html
ASKER
Great solution, did more than I originally asked and thanks for putting so much work in to get there.
I can't see it.