Solved

Oracle 10i = City name issue query issue

Posted on 2013-06-05
35
437 Views
Last Modified: 2013-06-07
Hello all

I'm pulling a list of all cities in a table by using this SQL below:
SELECT DISTINCT CITY FROM LOCATION
ORDER BY CITY; 

Open in new window


But  what i'm trying to do is to be able to extract all cities where the 13 first characters are the same on 2 different city names.

Ex:
If i have a city named: BARRINGTON PASSAGE NORTH and also a city name called: BARRINGTON PASSAGE SOUTH.

The 13 first characters on both cities are: BARRINGTON PA


I need to pull all those different cities with the same 13 first chars.

In the first column, it would give me the full city name and in column 2 the 13 first characters.

Than it would order the column by the 13 chars column.

How can i do that?

Thanks again
0
Comment
Question by:Wilder1626
  • 15
  • 9
  • 4
  • +2
35 Comments
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39222923
>>where the 13 first characters are the same on 2 different city names.


Extracting the first 13 characters is easy: substr(city,1,13).

I'm not understanding the actual requirement on what to do with them once we've extracted them.

Can you provide some sample data and expected results?
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39222944
>>If i have a city named: BARRINGTON PASSAGE NORTH

Did you edit this while I was typing of did I completely miss it?

I understand your requirements now.  Give me a minute...
0
 
LVL 18

Expert Comment

by:UnifiedIS
ID: 39222959
This is how I would do it in T-SQL, hope it helps:

SELECT COUNT(*) AS MatchCount,
 substr(city,1,13) AS PartialCity
FROM LOCATION
GROUP BY  substr(city,1,13)
HAVING COUNT(*) > 1
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39222967
Try this:

select city, first_13 from 
(
select city, substr(city,1,13) first_13, count(substr(city,1,13)) over(partition by substr(city,1,13)) substr_count
from location
)
where substr_count > 1
order by first_13
/

Open in new window



>>This is how I would do it in T-SQL, hope it helps:

That doesn't return the city.  Just the same substrings.
0
 
LVL 18

Expert Comment

by:UnifiedIS
ID: 39222968
Added the lookup to the location table to get the full cities and joined the count query by the first 13 characters:

SELECT L.city, A.PartialCity
FROM Location L
INNER JOIN
(SELECT COUNT(*) AS MatchCount,
 substr(city,1,13) AS PartialCity
FROM LOCATION
GROUP BY  substr(city,1,13)
HAVING COUNT(*) > 1
) A
ON A.PartialCity = substr(L.city, 1,13)
ORDER BY A.PartialCity
0
 
LVL 11

Author Comment

by:Wilder1626
ID: 39222981
wow, ok, let me try all these
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39222993
The one in http:#a39222968 hits the table twice.

Test them both but I feel mine will be more efficient.
0
 
LVL 11

Author Comment

by:Wilder1626
ID: 39223115
Ok, i like both, it gives me a result.

But if i only want to pull the ones in conflict!
For example, if i have :
BARRINGTON PASSAGE NORTH
BARRINGTON PASSAGE SOUTH
MISSISSAGA

Then the SQL will only give me :
BARRINGTON PASSAGE NORTH
BARRINGTON PASSAGE SOUTH


SELECT CITY, FIRST_13 FROM 
(
SELECT DISTINCT CITY, SUBSTR(CITY,1,13) FIRST_13, COUNT(SUBSTR(CITY,1,13)) OVER(PARTITION BY SUBSTR(CITY,1,13)) SUBSTR_COUNT
FROM LOCATION
)
WHERE SUBSTR_COUNT > 1
ORDER BY FIRST_13
/

Open in new window

0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39223131
>>But if i only want to pull the ones in conflict!

Both of them seem to do this.  Where are they failing?

Check out the following test case.

Both methods return:
BARRINGTON PASSAGE SOUTH                           BARRINGTON PA
BARRINGTON PASSAGE NORTH                           BARRINGTON PA

drop table tab1 purge;
create table tab1 (city varchar2(50));

insert into tab1 values('BARRINGTON PASSAGE NORTH');
insert into tab1 values('BARRINGTON PASSAGE SOUTH');
insert into tab1 values('MISSISSAGA');
commit;

select city, first_13 from 
(
select city, substr(city,1,13) first_13, count(substr(city,1,13)) over(partition by substr(city,1,13)) substr_count
from tab1
)
where substr_count > 1
order by first_13
/


SELECT L.city, A.PartialCity
FROM tab1 L
INNER JOIN 
(SELECT COUNT(*) AS MatchCount,
 substr(city,1,13) AS PartialCity
FROM tab1
GROUP BY  substr(city,1,13)
HAVING COUNT(*) > 1 
) A
ON A.PartialCity = substr(L.city, 1,13)
ORDER BY A.PartialCity 
/

Open in new window

0
 
LVL 11

Author Comment

by:Wilder1626
ID: 39223179
i understand that they are all pulled but for example MISSISSAUGA city, no other city name look like it. so i should not have it in my query. Only if more than 1 city name have the same 13 first chars.

Its a small detail but it will short the list since i have more than 7000 city names in USA and Canada
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39223188
>>i should not have it in my query.

Neither of the queries above brings it back.
0
 
LVL 11

Author Comment

by:Wilder1626
ID: 39223222
If you look in the city name 1 text file, you will see the extract info

i see all cities and not just the duplicated 13 chars
city-name1.txt
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39223245
Then there is more than 1 city with that name in the location table.

To change this to ONLY dupes that are 13 characters long try the code below.

Notice I added two DUPLICATE cities.
drop table tab1 purge;
create table tab1 (city varchar2(50));

insert into tab1 values('BARRINGTON PASSAGE NORTH');
insert into tab1 values('BARRINGTON PASSAGE SOUTH');
insert into tab1 values('MISSISSAGA');
insert into tab1 values('DUPLICATE');
insert into tab1 values('DUPLICATE');
commit;

select city, first_13 from 
(
select city, substr(city,1,13) first_13, count(substr(city,1,13)) over(partition by substr(city,1,13)) substr_count
from tab1
)
where substr_count > 1 and length(first_13) = 13
order by first_13
/

Open in new window


Now to further the requirements:  Original city > 13 characters and dupes of 13 characters:

...
where substr_count > 1 and length(first_13) = 13 and length(city)>13
...
0
 
LVL 11

Author Comment

by:Wilder1626
ID: 39223263
i understand what you are saying. in the table i may see 100 type MISSISSAUGA.

But since it is the same name, it should not count them as duplicated name.

They only time it would is if i would have MISSISSAUGA and MISSISSAUGA1 for example.

I tried to add the DISTINCT but still pull all cities with more than 13 chars

SELECT DISTINCT CITY, FIRST_13 FROM 
(
SELECT DISTINCT CITY, SUBSTR(CITY,1,13) FIRST_13, COUNT(SUBSTR(CITY,1,13)) OVER(PARTITION BY SUBSTR(CITY,1,13)) SUBSTR_COUNT
FROM LOCATION
)
WHERE SUBSTR_COUNT > 1 AND LENGTH(FIRST_13) = 13
ORDER BY FIRST_13
/

Open in new window

0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39223273
>>They only time it would is if i would have MISSISSAUGA and MISSISSAUGA1 for example.

MISSISSAUGAZ is only 11 characters.  I don't see why it would be considered a dupe with MISSISSAUGA1 given your current requirement of 13 characters.

Did you see the additional where clause outside of the code block above:
where substr_count > 1 and length(first_13) = 13 and length(city)>13


Please add to my test case above and post expected results.
0
 
LVL 11

Author Comment

by:Wilder1626
ID: 39223294
OK, here is another data set

If you take the first city name result: ANNAPOLIS ROYAL,ANNAPOLIS ROY

I only have 1 ANNAPOLIS ROY that link to ANNAPOLIS ROYAL

So it should not be pulled.

Unless i would of had:
ANNAPOLIS ROYAL
ANNAPOLIS ROYALB

Then it would pull
ANNAPOLIS ROYAL,ANNAPOLIS ROY
ANNAPOLIS ROYALB,ANNAPOLIS ROY
city-name2.txt
0
IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39223453
OK, I think I finally understand.

Give me a little while to work on it.

I'll post what I get if no other Expert beats me to it.
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39223483
One last potential issue.

Given the test case below, what are your expected results (I added another BARRINGTON PASSAGE NORTH):
drop table tab1 purge;
create table tab1 (city varchar2(50));

insert into tab1 values('BARRINGTON PASSAGE NORTH');
insert into tab1 values('BARRINGTON PASSAGE NORTH');
insert into tab1 values('BARRINGTON PASSAGE SOUTH');
insert into tab1 values('MISSISSAGA');
insert into tab1 values('DUPLICATE');
insert into tab1 values('DUPLICATE');
insert into tab1 values('ANNAPOLIS ROYAL');
insert into tab1 values('ANNAPOLIS ROYAL');
commit;

Open in new window

0
 
LVL 11

Author Comment

by:Wilder1626
ID: 39223504
from your list, i should get these below only:

BARRINGTON PASSAGE SOUTH, BARRINGTON PA
BARRINGTON PASSAGE NORTH, BARRINGTON PA
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39223529
Thanks.  That was what I was guessing.

This is ugly but I think it works.  I'll see if I can clean it up some.  If not, hopefully another Expert will be able to find a cleaner way.

select city, first_13 from 
(
	select city,
		substr(city,1,13) first_13,
		count(substr(city,1,13)) over(partition by substr(city,1,13)) substr_count,
		count(substr(city,1,13)) over(partition by city) city_count
	from (
		select city from (
			select city, row_number() over(partition by city order by city) city_row_num
			from tab1
		)
		where city_row_num = 1
	)
)
where substr_count > 1 and substr_count > city_count
order by first_13
/

Open in new window

0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39223540
This is a little cleaner:

select city, first_13 from 
(
	select city,
		substr(city,1,13) first_13,
		count(substr(city,1,13)) over(partition by substr(city,1,13)) substr_count
	from (
		select city from (
			select city, row_number() over(partition by city order by city) city_row_num
			from tab1
		)
		where city_row_num = 1
	)
)
where substr_count > 1
order by first_13
/

Open in new window

0
 
LVL 31

Expert Comment

by:awking00
ID: 39223619
select l.* from location l,
(select city, substr(city,1,13) first13
 ,rank() over (partition by substr(city,1,13) order by city) rnk
 from location) x
where substr(l.city,1,13) = x.first13
and x.rnk > 1;
0
 
LVL 31

Assisted Solution

by:awking00
awking00 earned 166 total points
ID: 39223628
If first13 is also needed -
select l.*, x.first13 from location l,
(select city, SUBSTR(CITY,1,13) first13
 ,rank() over (partition by substr(city,1,13) order by city) rnk
 from location) x
where substr(l.city,1,13) = x.first13
and x.rnk > 1
;
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39223645
awking00,

Using the test case I posted in http:#a39223483, I get an extra BARRINGTON PASSAGE NORTH.

It also hits the table twice.  It does however do fewer sorts...

Might be a wash depending on the size of the table.
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39224711
I got a bit lost along the way about the requirements, here is a mixture of using row_number() and rank(), in 2 variations. The upper variations identifies the 'problems' (one record of each), the lower variation shows each problem record
select
*
from (
	
            SELECT city
                , SUBSTR(CITY, 1, 13) first13
                , row_number() OVER (PARTITION BY substr(city, 1, 13) ORDER BY city ) rnum
                , rank()       OVER (PARTITION BY substr(city, 1, 13) ORDER BY city ) rnk
            FROM LOCATION
      ) x
WHERE ( x.rnk > 1 OR x.rnum > 1 )
;


SELECT
      l.*
    , x.first13
    , x.rnum
    , x.rnk
FROM LOCATION l
INNER JOIN (
            SELECT city
                , SUBSTR(CITY, 1, 13) first13
                , row_number() OVER (PARTITION BY substr(city, 1, 13) ORDER BY city ) rnum
                , rank()       OVER (PARTITION BY substr(city, 1, 13) ORDER BY city ) rnk
            FROM LOCATION
            ) x ON l.city = x.city
WHERE ( x.rnk > 1 OR x.rnum > 1 )
;

Open in new window

http://sqlfiddle.com/#!4/e7536/6 further test cases might prove useful
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39224715
actually I'd like to know the full table definition really -
I assume there's an index on city, and an ID field.
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39224758
mmm, adding another repeat in a city makes for further repeats using a join, using where exists avoids this potential
create table location (city varchar2(50));
 
insert into location values('BARRINGTON PASSAGE NORTH');
insert into location values('BARRINGTON PASSAGE NORTH');
insert into location values('BARRINGTON PASSAGE NORTH');
insert into location values('BARRINGTON PASSAGE SOUTH');
insert into location values('MISSISSAGA');
insert into location values('DUPLICATE');
insert into location values('DUPLICATE');
insert into location values('ANNAPOLIS ROYAL');
insert into location values('ANNAPOLIS ROYAL');

SELECT
*
FROM LOCATION
WHERE EXISTS (
              SELECT 1 FROM (
                              SELECT
                                    city
                                  , row_number() OVER (PARTITION BY substr(city, 1, 13) ORDER BY city ) rnum
                                  --, rank()       OVER (PARTITION BY substr(city, 1, 13) ORDER BY city ) rnk
                              FROM LOCATION
                             ) x
              WHERE LOCATION.city = x.city
              AND x.rnum > 1 -- ( x.rnk > 1 OR x.rnum > 1 )
             )
;	

Open in new window

http://sqlfiddle.com/#!4/0552d/1

{edit: row_number() by itself is sufficient: http://sqlfiddle.com/#!4/0552d/2 }
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39225362
PortletPaul,

Given the test case in http:#a39223483, I'm unable to produce the expected results in http:#a39223504 using any of those.

Here is how I understand the requirements:
For any different city names with a length greater than 13 characters, show the distinct ones that match on the first 13 characters.
0
 
LVL 31

Expert Comment

by:awking00
ID: 39225426
>>Using the test case I posted in http:#a39223483, I get an extra BARRINGTON PASSAGE NORTH.<<
While it may degrade performance, using distinct would resolve that issue.
0
 
LVL 48

Assisted Solution

by:PortletPaul
PortletPaul earned 167 total points
ID: 39225442
right - thanks - now I see: this produces the expected results:
CITY                                           FIRST13
BARRINGTON PASSAGE NORTH   BARRINGTON PA
BARRINGTON PASSAGE SOUTH   BARRINGTON PA
SELECT DISTINCT
	  l.*
	, x.first13
FROM LOCATION l
INNER JOIN (
            SELECT city
                , SUBSTR(CITY, 1, 13) first13
            FROM LOCATION
            ) x ON substr(l.city, 1, 13) = x.first13
               AND l.city <> x.city
;

Open in new window

http://sqlfiddle.com/#!4/0552d/9
0
 
LVL 76

Accepted Solution

by:
slightwv (䄆 Netminder) earned 167 total points
ID: 39225539
To summarize for Wilder1626:

There are now 3 queries that appear to give you what you want.

I suggest you test them all and pick the one that works best you for.

Here is a complete test case with 7000 random 'cities' added to you can see how they scale.

I've turned on autotrace so you can see which ones perform better than others.

Your actual mileage may vary since I don't know what possible indexes you have.

drop table tab1 purge;
create table tab1 (city varchar2(50));

insert into tab1 values('BARRINGTON PASSAGE NORTH');
insert into tab1 values('BARRINGTON PASSAGE NORTH');
insert into tab1 values('BARRINGTON PASSAGE NORTH');
insert into tab1 values('BARRINGTON PASSAGE SOUTH');
insert into tab1 values('MISSISSAGA');
insert into tab1 values('DUPLICATE');
insert into tab1 values('DUPLICATE');
insert into tab1 values('ANNAPOLIS ROYAL');
insert into tab1 values('ANNAPOLIS ROYAL');
commit;

begin
	for i in 1..7000 loop
		insert into tab1 values(dbms_random.string('a',25));
	end loop;
	commit;
end;
/


set autotrace traceonly

--slightwv
select city, first_13 from 
(
	select city,
		substr(city,1,13) first_13,
		count(substr(city,1,13)) over(partition by substr(city,1,13)) substr_count
	from (
		select city from (
			select city, row_number() over(partition by city order by city) city_row_num
			from tab1
		)
		where city_row_num = 1
	)
)
where substr_count > 1
order by first_13
/

--awking00
select distinct l.*, x.first13 from tab1 l,
(
	select city, SUBSTR(CITY,1,13) first13 ,rank() over (partition by substr(city,1,13) order by city) rnk
 	from tab1
) x
where substr(l.city,1,13) = x.first13
and x.rnk > 1
; 

--PortletPaul
SELECT DISTINCT
	  l.*
	, x.first13
FROM tab1 l
INNER JOIN (
            SELECT city
                , SUBSTR(CITY, 1, 13) first13
            FROM tab1
            ) x ON substr(l.city, 1, 13) = x.first13
               AND l.city <> x.city
;
                                            
set autotrace off

Open in new window

0
 
LVL 11

Author Comment

by:Wilder1626
ID: 39225595
i all

wow thanks for your help.

I will test them all and let you know later.

Thanks again
0
 
LVL 11

Author Closing Comment

by:Wilder1626
ID: 39228501
Hi all

Thank you so mutch for all your help, the 3 versions are very good.

Many thanks
0

Featured Post

Threat Intelligence Starter Resources

Integrating threat intelligence can be challenging, and not all companies are ready. These resources can help you build awareness and prepare for defense.

Join & Write a Comment

How to Unravel a Tricky Query Introduction If you browse through the Oracle zones or any of the other database-related zones you'll come across some complicated solutions and sometimes you'll just have to wonder how anyone came up with them.  …
PL/SQL can be a very powerful tool for working directly with database tables. Being able to loop will allow you to perform more complex operations, but can be a little tricky to write correctly. This article will provide examples of basic loops alon…
This video shows, step by step, how to configure Oracle Heterogeneous Services via the Generic Gateway Agent in order to make a connection from an Oracle session and access a remote SQL Server database table.
This video shows how to Export data from an Oracle database using the Datapump Export Utility.  The corresponding Datapump Import utility is also discussed and demonstrated.

758 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

22 Experts available now in Live!

Get 1:1 Help Now