Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 446
  • Last Modified:

Oracle 10i = City name issue query issue

Hello all

I'm pulling a list of all cities in a table by using this SQL below:
SELECT DISTINCT CITY FROM LOCATION
ORDER BY CITY; 

Open in new window


But  what i'm trying to do is to be able to extract all cities where the 13 first characters are the same on 2 different city names.

Ex:
If i have a city named: BARRINGTON PASSAGE NORTH and also a city name called: BARRINGTON PASSAGE SOUTH.

The 13 first characters on both cities are: BARRINGTON PA


I need to pull all those different cities with the same 13 first chars.

In the first column, it would give me the full city name and in column 2 the 13 first characters.

Than it would order the column by the 13 chars column.

How can i do that?

Thanks again
0
Wilder1626
Asked:
Wilder1626
  • 15
  • 9
  • 4
  • +2
3 Solutions
 
slightwv (䄆 Netminder) Commented:
>>where the 13 first characters are the same on 2 different city names.


Extracting the first 13 characters is easy: substr(city,1,13).

I'm not understanding the actual requirement on what to do with them once we've extracted them.

Can you provide some sample data and expected results?
0
 
slightwv (䄆 Netminder) Commented:
>>If i have a city named: BARRINGTON PASSAGE NORTH

Did you edit this while I was typing of did I completely miss it?

I understand your requirements now.  Give me a minute...
0
 
UnifiedISCommented:
This is how I would do it in T-SQL, hope it helps:

SELECT COUNT(*) AS MatchCount,
 substr(city,1,13) AS PartialCity
FROM LOCATION
GROUP BY  substr(city,1,13)
HAVING COUNT(*) > 1
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
slightwv (䄆 Netminder) Commented:
Try this:

select city, first_13 from 
(
select city, substr(city,1,13) first_13, count(substr(city,1,13)) over(partition by substr(city,1,13)) substr_count
from location
)
where substr_count > 1
order by first_13
/

Open in new window



>>This is how I would do it in T-SQL, hope it helps:

That doesn't return the city.  Just the same substrings.
0
 
UnifiedISCommented:
Added the lookup to the location table to get the full cities and joined the count query by the first 13 characters:

SELECT L.city, A.PartialCity
FROM Location L
INNER JOIN
(SELECT COUNT(*) AS MatchCount,
 substr(city,1,13) AS PartialCity
FROM LOCATION
GROUP BY  substr(city,1,13)
HAVING COUNT(*) > 1
) A
ON A.PartialCity = substr(L.city, 1,13)
ORDER BY A.PartialCity
0
 
Wilder1626Author Commented:
wow, ok, let me try all these
0
 
slightwv (䄆 Netminder) Commented:
The one in http:#a39222968 hits the table twice.

Test them both but I feel mine will be more efficient.
0
 
Wilder1626Author Commented:
Ok, i like both, it gives me a result.

But if i only want to pull the ones in conflict!
For example, if i have :
BARRINGTON PASSAGE NORTH
BARRINGTON PASSAGE SOUTH
MISSISSAGA

Then the SQL will only give me :
BARRINGTON PASSAGE NORTH
BARRINGTON PASSAGE SOUTH


SELECT CITY, FIRST_13 FROM 
(
SELECT DISTINCT CITY, SUBSTR(CITY,1,13) FIRST_13, COUNT(SUBSTR(CITY,1,13)) OVER(PARTITION BY SUBSTR(CITY,1,13)) SUBSTR_COUNT
FROM LOCATION
)
WHERE SUBSTR_COUNT > 1
ORDER BY FIRST_13
/

Open in new window

0
 
slightwv (䄆 Netminder) Commented:
>>But if i only want to pull the ones in conflict!

Both of them seem to do this.  Where are they failing?

Check out the following test case.

Both methods return:
BARRINGTON PASSAGE SOUTH                           BARRINGTON PA
BARRINGTON PASSAGE NORTH                           BARRINGTON PA

drop table tab1 purge;
create table tab1 (city varchar2(50));

insert into tab1 values('BARRINGTON PASSAGE NORTH');
insert into tab1 values('BARRINGTON PASSAGE SOUTH');
insert into tab1 values('MISSISSAGA');
commit;

select city, first_13 from 
(
select city, substr(city,1,13) first_13, count(substr(city,1,13)) over(partition by substr(city,1,13)) substr_count
from tab1
)
where substr_count > 1
order by first_13
/


SELECT L.city, A.PartialCity
FROM tab1 L
INNER JOIN 
(SELECT COUNT(*) AS MatchCount,
 substr(city,1,13) AS PartialCity
FROM tab1
GROUP BY  substr(city,1,13)
HAVING COUNT(*) > 1 
) A
ON A.PartialCity = substr(L.city, 1,13)
ORDER BY A.PartialCity 
/

Open in new window

0
 
Wilder1626Author Commented:
i understand that they are all pulled but for example MISSISSAUGA city, no other city name look like it. so i should not have it in my query. Only if more than 1 city name have the same 13 first chars.

Its a small detail but it will short the list since i have more than 7000 city names in USA and Canada
0
 
slightwv (䄆 Netminder) Commented:
>>i should not have it in my query.

Neither of the queries above brings it back.
0
 
Wilder1626Author Commented:
If you look in the city name 1 text file, you will see the extract info

i see all cities and not just the duplicated 13 chars
city-name1.txt
0
 
slightwv (䄆 Netminder) Commented:
Then there is more than 1 city with that name in the location table.

To change this to ONLY dupes that are 13 characters long try the code below.

Notice I added two DUPLICATE cities.
drop table tab1 purge;
create table tab1 (city varchar2(50));

insert into tab1 values('BARRINGTON PASSAGE NORTH');
insert into tab1 values('BARRINGTON PASSAGE SOUTH');
insert into tab1 values('MISSISSAGA');
insert into tab1 values('DUPLICATE');
insert into tab1 values('DUPLICATE');
commit;

select city, first_13 from 
(
select city, substr(city,1,13) first_13, count(substr(city,1,13)) over(partition by substr(city,1,13)) substr_count
from tab1
)
where substr_count > 1 and length(first_13) = 13
order by first_13
/

Open in new window


Now to further the requirements:  Original city > 13 characters and dupes of 13 characters:

...
where substr_count > 1 and length(first_13) = 13 and length(city)>13
...
0
 
Wilder1626Author Commented:
i understand what you are saying. in the table i may see 100 type MISSISSAUGA.

But since it is the same name, it should not count them as duplicated name.

They only time it would is if i would have MISSISSAUGA and MISSISSAUGA1 for example.

I tried to add the DISTINCT but still pull all cities with more than 13 chars

SELECT DISTINCT CITY, FIRST_13 FROM 
(
SELECT DISTINCT CITY, SUBSTR(CITY,1,13) FIRST_13, COUNT(SUBSTR(CITY,1,13)) OVER(PARTITION BY SUBSTR(CITY,1,13)) SUBSTR_COUNT
FROM LOCATION
)
WHERE SUBSTR_COUNT > 1 AND LENGTH(FIRST_13) = 13
ORDER BY FIRST_13
/

Open in new window

0
 
slightwv (䄆 Netminder) Commented:
>>They only time it would is if i would have MISSISSAUGA and MISSISSAUGA1 for example.

MISSISSAUGAZ is only 11 characters.  I don't see why it would be considered a dupe with MISSISSAUGA1 given your current requirement of 13 characters.

Did you see the additional where clause outside of the code block above:
where substr_count > 1 and length(first_13) = 13 and length(city)>13


Please add to my test case above and post expected results.
0
 
Wilder1626Author Commented:
OK, here is another data set

If you take the first city name result: ANNAPOLIS ROYAL,ANNAPOLIS ROY

I only have 1 ANNAPOLIS ROY that link to ANNAPOLIS ROYAL

So it should not be pulled.

Unless i would of had:
ANNAPOLIS ROYAL
ANNAPOLIS ROYALB

Then it would pull
ANNAPOLIS ROYAL,ANNAPOLIS ROY
ANNAPOLIS ROYALB,ANNAPOLIS ROY
city-name2.txt
0
 
slightwv (䄆 Netminder) Commented:
OK, I think I finally understand.

Give me a little while to work on it.

I'll post what I get if no other Expert beats me to it.
0
 
slightwv (䄆 Netminder) Commented:
One last potential issue.

Given the test case below, what are your expected results (I added another BARRINGTON PASSAGE NORTH):
drop table tab1 purge;
create table tab1 (city varchar2(50));

insert into tab1 values('BARRINGTON PASSAGE NORTH');
insert into tab1 values('BARRINGTON PASSAGE NORTH');
insert into tab1 values('BARRINGTON PASSAGE SOUTH');
insert into tab1 values('MISSISSAGA');
insert into tab1 values('DUPLICATE');
insert into tab1 values('DUPLICATE');
insert into tab1 values('ANNAPOLIS ROYAL');
insert into tab1 values('ANNAPOLIS ROYAL');
commit;

Open in new window

0
 
Wilder1626Author Commented:
from your list, i should get these below only:

BARRINGTON PASSAGE SOUTH, BARRINGTON PA
BARRINGTON PASSAGE NORTH, BARRINGTON PA
0
 
slightwv (䄆 Netminder) Commented:
Thanks.  That was what I was guessing.

This is ugly but I think it works.  I'll see if I can clean it up some.  If not, hopefully another Expert will be able to find a cleaner way.

select city, first_13 from 
(
	select city,
		substr(city,1,13) first_13,
		count(substr(city,1,13)) over(partition by substr(city,1,13)) substr_count,
		count(substr(city,1,13)) over(partition by city) city_count
	from (
		select city from (
			select city, row_number() over(partition by city order by city) city_row_num
			from tab1
		)
		where city_row_num = 1
	)
)
where substr_count > 1 and substr_count > city_count
order by first_13
/

Open in new window

0
 
slightwv (䄆 Netminder) Commented:
This is a little cleaner:

select city, first_13 from 
(
	select city,
		substr(city,1,13) first_13,
		count(substr(city,1,13)) over(partition by substr(city,1,13)) substr_count
	from (
		select city from (
			select city, row_number() over(partition by city order by city) city_row_num
			from tab1
		)
		where city_row_num = 1
	)
)
where substr_count > 1
order by first_13
/

Open in new window

0
 
awking00Commented:
select l.* from location l,
(select city, substr(city,1,13) first13
 ,rank() over (partition by substr(city,1,13) order by city) rnk
 from location) x
where substr(l.city,1,13) = x.first13
and x.rnk > 1;
0
 
awking00Commented:
If first13 is also needed -
select l.*, x.first13 from location l,
(select city, SUBSTR(CITY,1,13) first13
 ,rank() over (partition by substr(city,1,13) order by city) rnk
 from location) x
where substr(l.city,1,13) = x.first13
and x.rnk > 1
;
0
 
slightwv (䄆 Netminder) Commented:
awking00,

Using the test case I posted in http:#a39223483, I get an extra BARRINGTON PASSAGE NORTH.

It also hits the table twice.  It does however do fewer sorts...

Might be a wash depending on the size of the table.
0
 
PortletPaulCommented:
I got a bit lost along the way about the requirements, here is a mixture of using row_number() and rank(), in 2 variations. The upper variations identifies the 'problems' (one record of each), the lower variation shows each problem record
select
*
from (
	
            SELECT city
                , SUBSTR(CITY, 1, 13) first13
                , row_number() OVER (PARTITION BY substr(city, 1, 13) ORDER BY city ) rnum
                , rank()       OVER (PARTITION BY substr(city, 1, 13) ORDER BY city ) rnk
            FROM LOCATION
      ) x
WHERE ( x.rnk > 1 OR x.rnum > 1 )
;


SELECT
      l.*
    , x.first13
    , x.rnum
    , x.rnk
FROM LOCATION l
INNER JOIN (
            SELECT city
                , SUBSTR(CITY, 1, 13) first13
                , row_number() OVER (PARTITION BY substr(city, 1, 13) ORDER BY city ) rnum
                , rank()       OVER (PARTITION BY substr(city, 1, 13) ORDER BY city ) rnk
            FROM LOCATION
            ) x ON l.city = x.city
WHERE ( x.rnk > 1 OR x.rnum > 1 )
;

Open in new window

http://sqlfiddle.com/#!4/e7536/6 further test cases might prove useful
0
 
PortletPaulCommented:
actually I'd like to know the full table definition really -
I assume there's an index on city, and an ID field.
0
 
PortletPaulCommented:
mmm, adding another repeat in a city makes for further repeats using a join, using where exists avoids this potential
create table location (city varchar2(50));
 
insert into location values('BARRINGTON PASSAGE NORTH');
insert into location values('BARRINGTON PASSAGE NORTH');
insert into location values('BARRINGTON PASSAGE NORTH');
insert into location values('BARRINGTON PASSAGE SOUTH');
insert into location values('MISSISSAGA');
insert into location values('DUPLICATE');
insert into location values('DUPLICATE');
insert into location values('ANNAPOLIS ROYAL');
insert into location values('ANNAPOLIS ROYAL');

SELECT
*
FROM LOCATION
WHERE EXISTS (
              SELECT 1 FROM (
                              SELECT
                                    city
                                  , row_number() OVER (PARTITION BY substr(city, 1, 13) ORDER BY city ) rnum
                                  --, rank()       OVER (PARTITION BY substr(city, 1, 13) ORDER BY city ) rnk
                              FROM LOCATION
                             ) x
              WHERE LOCATION.city = x.city
              AND x.rnum > 1 -- ( x.rnk > 1 OR x.rnum > 1 )
             )
;	

Open in new window

http://sqlfiddle.com/#!4/0552d/1

{edit: row_number() by itself is sufficient: http://sqlfiddle.com/#!4/0552d/2 }
0
 
slightwv (䄆 Netminder) Commented:
PortletPaul,

Given the test case in http:#a39223483, I'm unable to produce the expected results in http:#a39223504 using any of those.

Here is how I understand the requirements:
For any different city names with a length greater than 13 characters, show the distinct ones that match on the first 13 characters.
0
 
awking00Commented:
>>Using the test case I posted in http:#a39223483, I get an extra BARRINGTON PASSAGE NORTH.<<
While it may degrade performance, using distinct would resolve that issue.
0
 
PortletPaulCommented:
right - thanks - now I see: this produces the expected results:
CITY                                           FIRST13
BARRINGTON PASSAGE NORTH   BARRINGTON PA
BARRINGTON PASSAGE SOUTH   BARRINGTON PA
SELECT DISTINCT
	  l.*
	, x.first13
FROM LOCATION l
INNER JOIN (
            SELECT city
                , SUBSTR(CITY, 1, 13) first13
            FROM LOCATION
            ) x ON substr(l.city, 1, 13) = x.first13
               AND l.city <> x.city
;

Open in new window

http://sqlfiddle.com/#!4/0552d/9
0
 
slightwv (䄆 Netminder) Commented:
To summarize for Wilder1626:

There are now 3 queries that appear to give you what you want.

I suggest you test them all and pick the one that works best you for.

Here is a complete test case with 7000 random 'cities' added to you can see how they scale.

I've turned on autotrace so you can see which ones perform better than others.

Your actual mileage may vary since I don't know what possible indexes you have.

drop table tab1 purge;
create table tab1 (city varchar2(50));

insert into tab1 values('BARRINGTON PASSAGE NORTH');
insert into tab1 values('BARRINGTON PASSAGE NORTH');
insert into tab1 values('BARRINGTON PASSAGE NORTH');
insert into tab1 values('BARRINGTON PASSAGE SOUTH');
insert into tab1 values('MISSISSAGA');
insert into tab1 values('DUPLICATE');
insert into tab1 values('DUPLICATE');
insert into tab1 values('ANNAPOLIS ROYAL');
insert into tab1 values('ANNAPOLIS ROYAL');
commit;

begin
	for i in 1..7000 loop
		insert into tab1 values(dbms_random.string('a',25));
	end loop;
	commit;
end;
/


set autotrace traceonly

--slightwv
select city, first_13 from 
(
	select city,
		substr(city,1,13) first_13,
		count(substr(city,1,13)) over(partition by substr(city,1,13)) substr_count
	from (
		select city from (
			select city, row_number() over(partition by city order by city) city_row_num
			from tab1
		)
		where city_row_num = 1
	)
)
where substr_count > 1
order by first_13
/

--awking00
select distinct l.*, x.first13 from tab1 l,
(
	select city, SUBSTR(CITY,1,13) first13 ,rank() over (partition by substr(city,1,13) order by city) rnk
 	from tab1
) x
where substr(l.city,1,13) = x.first13
and x.rnk > 1
; 

--PortletPaul
SELECT DISTINCT
	  l.*
	, x.first13
FROM tab1 l
INNER JOIN (
            SELECT city
                , SUBSTR(CITY, 1, 13) first13
            FROM tab1
            ) x ON substr(l.city, 1, 13) = x.first13
               AND l.city <> x.city
;
                                            
set autotrace off

Open in new window

0
 
Wilder1626Author Commented:
i all

wow thanks for your help.

I will test them all and let you know later.

Thanks again
0
 
Wilder1626Author Commented:
Hi all

Thank you so mutch for all your help, the 3 versions are very good.

Many thanks
0

Featured Post

Get quick recovery of individual SharePoint items

Free tool – Veeam Explorer for Microsoft SharePoint, enables fast, easy restores of SharePoint sites, documents, libraries and lists — all with no agents to manage and no additional licenses to buy.

  • 15
  • 9
  • 4
  • +2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now