AsNanda
asked on
Need a cursor for Select Statement
Hi I have a select statement that runs against 2 tables , these two tables have more then 5 milliion records, tableA has 4.5 mill and tableB has around 50000.
The columns in the tables are properly indexed but it takes a lot of time the way i am doing the select query.Can someone help this to make it faster.
tableA
name
age
zip
zip_4
tableB
city
zip
zip_4
I need to select the data from both the tables where the zips match but the way the data is in secod table
i hv
city zip zip_4
a 98765 9999
b 98765 1234
c 98765 2345
d 88989 9999
e 67676 9999
table A has the data in a different way they have all zip_4 in XXXX but not 9999
so i have to run two passes
the first select does
here is the query
select * from tableA A,table B B
where a.zip+a.zip_4=b.zip+b.zip_ 4
union
select * from tableA A,table B B
where a.zip=b.zip
and
b.zip+b.zip_4 not in
(
select b.zip+b.zip_4 from tableA A,table B B
where a.zip+a.zip_4=b.zip+b.zip_ 4
)
Can someone tell me an easier way so that query accessing 4.5 mill records doesnt take that long ..
I finaly need to build a table out of this data ... i run a truncate / insert statement .. or something ...
Thanks
Ash
The columns in the tables are properly indexed but it takes a lot of time the way i am doing the select query.Can someone help this to make it faster.
tableA
name
age
zip
zip_4
tableB
city
zip
zip_4
I need to select the data from both the tables where the zips match but the way the data is in secod table
i hv
city zip zip_4
a 98765 9999
b 98765 1234
c 98765 2345
d 88989 9999
e 67676 9999
table A has the data in a different way they have all zip_4 in XXXX but not 9999
so i have to run two passes
the first select does
here is the query
select * from tableA A,table B B
where a.zip+a.zip_4=b.zip+b.zip_
union
select * from tableA A,table B B
where a.zip=b.zip
and
b.zip+b.zip_4 not in
(
select b.zip+b.zip_4 from tableA A,table B B
where a.zip+a.zip_4=b.zip+b.zip_
)
Can someone tell me an easier way so that query accessing 4.5 mill records doesnt take that long ..
I finaly need to build a table out of this data ... i run a truncate / insert statement .. or something ...
Thanks
Ash
Try this:
SELECT Name, Age, ZIP, ZIP_4, ISNULL(B.City, C.City) AS City
FROM TableA A LEFT OUTER JOIN TableB B
ON A.ZIP = B.ZIP AND A.ZIP_4 = B.ZIP_4
LEFT OUTER JOIN TableC C
ON A.ZIP = C.ZIP
SELECT Name, Age, ZIP, ZIP_4, ISNULL(B.City, C.City) AS City
FROM TableA A LEFT OUTER JOIN TableB B
ON A.ZIP = B.ZIP AND A.ZIP_4 = B.ZIP_4
LEFT OUTER JOIN TableC C
ON A.ZIP = C.ZIP
Ooops, the second LEFT OUTER JOIN should be TableB and not TableC
SELECT Name, Age, ZIP, ZIP_4, ISNULL(B.City, C.City) AS City
FROM TableA A LEFT OUTER JOIN TableB B
ON A.ZIP = B.ZIP AND A.ZIP_4 = B.ZIP_4
LEFT OUTER JOIN TableB C
ON A.ZIP = C.ZIP
SELECT Name, Age, ZIP, ZIP_4, ISNULL(B.City, C.City) AS City
FROM TableA A LEFT OUTER JOIN TableB B
ON A.ZIP = B.ZIP AND A.ZIP_4 = B.ZIP_4
LEFT OUTER JOIN TableB C
ON A.ZIP = C.ZIP
Don't use
a.zip+a.zip_4=b.zip+b.zip_ 4
but
a.zip=b.zip and a.zip_4=b.zip_4
a.zip+a.zip_4=b.zip+b.zip_
but
a.zip=b.zip and a.zip_4=b.zip_4
ASKER
When i say the data is different i mean that in table B the Cities i have are
City Zip zip_4
City1 98765 9999
City2 98765 1245
City3 98765 2345
City4 98789 9999
City5 98666 9999
What i am trying to say is that every 4 digit zip code will belong to the same city if the Zip_4='9999' but its possible that their are zip codes(Zip_4) in their which belong to another city i.e in the example of Zip 98765 where 987651245 belong to city2 and 987652345 belong to city3. So all the Zips from 987650001-987651244 belong to City1 as Zip_4 is 9999 for City 1 but then Zip_4=1245 goes in to City2.
So technically 9999(in table B) is just to minimize the amount of data and not include every zip code.
But my table A does not have 9999 in Zip_4 i.e an actual zip_4 is used their.
So i will have
98765 0001
98765 0002
98765 0003
....
98765 1245
.....
98789 0001
98789 0002
...
and so on
Now for me to get the exact zips and not duplicating the data the only way i can do it by doing the multiple pass i am doing...a match for all the 9 digits and get their city and then match the fives digits of the zip excluding the 9 digists i have matched
Does this makes sense?
Ash
City Zip zip_4
City1 98765 9999
City2 98765 1245
City3 98765 2345
City4 98789 9999
City5 98666 9999
What i am trying to say is that every 4 digit zip code will belong to the same city if the Zip_4='9999' but its possible that their are zip codes(Zip_4) in their which belong to another city i.e in the example of Zip 98765 where 987651245 belong to city2 and 987652345 belong to city3. So all the Zips from 987650001-987651244 belong to City1 as Zip_4 is 9999 for City 1 but then Zip_4=1245 goes in to City2.
So technically 9999(in table B) is just to minimize the amount of data and not include every zip code.
But my table A does not have 9999 in Zip_4 i.e an actual zip_4 is used their.
So i will have
98765 0001
98765 0002
98765 0003
....
98765 1245
.....
98789 0001
98789 0002
...
and so on
Now for me to get the exact zips and not duplicating the data the only way i can do it by doing the multiple pass i am doing...a match for all the 9 digits and get their city and then match the fives digits of the zip excluding the 9 digists i have matched
Does this makes sense?
Ash
Try this. This is the same as the one above except that I added the 9999 in the second LEFT OUTER JOIN:
SELECT Name, Age, ZIP, ZIP_4, ISNULL(B.City, C.City) AS City
FROM TableA A LEFT OUTER JOIN TableB B
ON A.ZIP = B.ZIP AND A.ZIP_4 = B.ZIP_4
LEFT OUTER JOIN TableB C
ON A.ZIP = C.ZIP AND C.ZIP_4 = '9999'
SELECT Name, Age, ZIP, ZIP_4, ISNULL(B.City, C.City) AS City
FROM TableA A LEFT OUTER JOIN TableB B
ON A.ZIP = B.ZIP AND A.ZIP_4 = B.ZIP_4
LEFT OUTER JOIN TableB C
ON A.ZIP = C.ZIP AND C.ZIP_4 = '9999'
ASKER
Francisco,
The format of the table does not exactly match the way i put the table strucutes , i have more columns then just column city i want to pull in like state,timeZone,Area,Region ...etc..et c from table B, how would i include those?
Ash
The format of the table does not exactly match the way i put the table strucutes , i have more columns then just column city i want to pull in like state,timeZone,Area,Region
Ash
You can try this:
SELECT *
FROM TableA A INNER JOIN TableB B
ON A.ZIP = B.ZIP AND A.ZIP_4 = B.ZIP_4
UNION ALL
SELECT *
FROM TableA A INNER JOIN TableB B
ON A.ZIP = B.ZIP AND B.ZIP_4 = '9999'
WHERE NOT EXISTS (SELECT 'X' FROM TableB C
WHERE A.ZIP = C.ZIP AND A.ZIP_4 = C.ZIP_4)
SELECT *
FROM TableA A INNER JOIN TableB B
ON A.ZIP = B.ZIP AND A.ZIP_4 = B.ZIP_4
UNION ALL
SELECT *
FROM TableA A INNER JOIN TableB B
ON A.ZIP = B.ZIP AND B.ZIP_4 = '9999'
WHERE NOT EXISTS (SELECT 'X' FROM TableB C
WHERE A.ZIP = C.ZIP AND A.ZIP_4 = C.ZIP_4)
Select * from TableA A INNER JOIN TableB B
on A.ZIP = B.ZIP and A.ZIP_4 = case B.ZIP_4 when '9999' then A.ZIP_4 else B.ZIP_4
This will force any TableA row that matches on B.ZIP and does not match on B.Zip_4 to match to the '9999' row in TableB instead.
Since IF an A.ZIP row matches a '9999' row in TableB on the ZIP field, then the exact match of the TableA row (all nine digits) will never occur in TableB. These are two exclusive sets, or I am missing something.
I'm not sure what you are asking in the follow-up on columns for State, timeZone, unless you are still concerned about the UNION, which I eliminated.
Tom
on A.ZIP = B.ZIP and A.ZIP_4 = case B.ZIP_4 when '9999' then A.ZIP_4 else B.ZIP_4
This will force any TableA row that matches on B.ZIP and does not match on B.Zip_4 to match to the '9999' row in TableB instead.
Since IF an A.ZIP row matches a '9999' row in TableB on the ZIP field, then the exact match of the TableA row (all nine digits) will never occur in TableB. These are two exclusive sets, or I am missing something.
I'm not sure what you are asking in the follow-up on columns for State, timeZone, unless you are still concerned about the UNION, which I eliminated.
Tom
folderol, your query will work but it will contain duplicate records from tableA. Based on the sample data in tableB
City Zip zip_4
City1 98765 9999
City2 98765 1245
City3 98765 2345
City4 98789 9999
City5 98666 9999
A record in tableA where the ZIP = 98765 and ZIP_4 = 1245 will produce 2 records in the output, one for City1 and another for City2.
City Zip zip_4
City1 98765 9999
City2 98765 1245
City3 98765 2345
City4 98789 9999
City5 98666 9999
A record in tableA where the ZIP = 98765 and ZIP_4 = 1245 will produce 2 records in the output, one for City1 and another for City2.
ASKER
Yes
Francisco is right if we just do the comparison your way tom it will produce 2 rows
if tableA had jim 21 91765 1245
Name Age City1 Zip Zip_4
jim 21 City1 91765 9999
jim 21 City2 91765 1245
but jim can belong toonly 1 city and that is with Zip_4 1245.
I ran your query Francisco but it is not returining all the records ... Some of them are missing ..
What will happen if either a Zip or Zip_4 is null?
Ash
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
I strongly suspect this is already true, but just to be sure:
For best performance, tableB should have an index on (zip, zip_4) [in that order].
[TableA very probably does not need an index for this query.]
For best performance, tableB should have an index on (zip, zip_4) [in that order].
[TableA very probably does not need an index for this query.]
ASKER
Yes i have the indexes Franciscos Q changed the time to 4 mins... i still hv to test Scotts query ...
I will let u guys know how it worked out..
thanks
Ash
I will let u guys know how it worked out..
thanks
Ash
SELECT *
FROM tableA A
INNER JOIN tableB B ON B.zip = A.zip
I don't fully understand this:
>> table A has the data in a different way they have all zip_4 in XXXX but not 9999 <<
If this is incorrect, perhaps you could give some sample data from the other table and the expected results from the combined data?