Link to home
Start Free TrialLog in
Avatar of Sam OZ
Sam OZFlag for Australia

asked on

Finding Rows breaking the data integrity rule

I have an Sql Server  table with following data   ( My data is not  exactly street data , but this is the sample format) 

Country    State   District   ZIPCode     Street  

Note:  I can take this to an excel sheet or MS Access database , A solution based on this is also fine

The current data  seems to be corrupt . How can I check the data integrity . What I mean by that is -  checking where it   breaks the top down approach 

 Ideally, if you visualize it in a form with picklist , Selecting a Country from the picklist for country will show states in the country and then  selecting State will show District ... then ZIPCode under District and Street under ZIPCode

How can I get the rows of  corrupt data breaking this rule ? 

Avatar of Saqib Husain
Saqib Husain
Flag of Pakistan image

Your best starting point would be uploading a sample file. For me, I would request an excel file.
Avatar of John Tsioumpris
John Tsioumpris
Flag of Greece image

Link to home
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
If you import the data into Access, you can create a query that looks for Null values in any of the rows.
I created something similar some time ago using Dynamic Data Validation; selection in one cell restricted the selection in the next cell to a limited number of options related to first cell; third cell was then restricted to entries related to second cell. This was quite cumbersome but worked quite well using various lists and INDIRECT function in the Data Validation selection.

Because it used Data Validation I was able to use the Data Validation checker on the Data Tab, Data tools group and it would add a ring around invalid data. There is an article about it on my profile if you want to take a look but that is somewhat after the horse has bolted.

Are you able to create a list of correct combinations? This would have repetitions in all but the last level. Sample Data below; each of the District references are unique but the levels above are repeated as necessary.
AState A1District A1.1
AState A1District A1.2
AState A1District A1.3
AState A1District A1.4
AState A2District A2.1
AState A2District A2.2
AState A2District A2.3
AState A3District A3.1
AState A3District A3.2
AState A3District A3.3
BState B1District B1.1
BState B1District B1.2
BState B1District B1.3
BState B1District B1.4
BState B2District B2.1
BState B2District B2.2
BState B2District B2.3
BState B3District B3.1
BState B3District B3.2
BState B3District B3.3
BState B3District B3.4
CState C1District C1.1
CState C1District C1.2
CState C1District C1.3
CState C1District C1.4
CState C2District C2.1
CState C2District C2.2
CState C2District C2.3
CState C2District C2.4
CState C2District C2.5

You could then do effectively a reverse lookup. You could find the listed State in the lookup and compare Country in the lookup list to make sure it is correct; find district in the list to make sure State is correct etc etc.

If it truly were Address data, I suspect there would be an issue with Street as the same Street could occur in multiple ZIPs. Could this occur in your real data, ie one value could occur in multiple options at the higher level?
Are you sure you totally understand the underlying data about Country State District ZIPCode and Street?
Maps often seem straightforward and top down, but in reality they mix geography, history, political divisions and USPS conventions, and it is easy to get confused.
If you give us and excel download which demonstrates your idea of “corruption” we will certainly be able to help you.
In the meantime, do you think the following demonstrates “corruption”
  • Pick sequence A shows  zip code 89439 has exactly 2 streets  Main and Washington
  • But pick sequence B shows the same zip code 89439 has exactly 3 streets Adam, Lincoln and Main
If you answered “yes that is corruption” then you are wrong !
This is because zip codes do not follow political boundaries. For instance zip code 89439 contains several cities and some of those cities cross the border between California and Nevada.
And of course some of the streets also cross the border, and some do not.
For example “Main Street” might start in Reno NV, then pass through Reno CA and end in Verdi CA.
But Lincoln Street might only be in Reno NV
Pick sequence A might show
Level 1: Country Picked USA
Level 2: State picked Nevada
Level 3: City picked Reno
Level 4: ZIP Code picked 89439
Level 5: Street might have drop down with exactly two streets  Main and Washington
Pick sequence B might show
Level 1: Country Picked USA
Level 2: State picked California
Level 3: City picked Reno
Level 4: ZIPCode picked 89439
Level 5: Street might have drop down with exactly three streets  Adam, Lincoln and Main
And, for what it is worth, there are 13  US Census' ZIP Codes with two states. 42223, 59221, 63673, 71749, 73949, 81137, 84536, 86044, 86515, 88063, 89439 & 97635.
I am hope you realize that "Country    State   District   ZIPCode     Street   "  would normally NOT have a "top down approach".  

For instance if "Main Street" may go through districts "A", "M" and "Z" and each district would have many streets.

and near a go through OH " and

does not make much sense to say "Date integrity" means "checking where it   breaks the top down approach " .
For instance Main Street

sam oz, it has been 3 weeks since you asked this question.  Please review the experts advice and see if you have more questions. If not, please close the thread.

Avatar of Sam OZ


closing the question