Link to home
Start Free TrialLog in
Avatar of stopm
stopmFlag for United Kingdom of Great Britain and Northern Ireland

asked on

Sql (oracle) select distinct on two rows and order by count

Hi
In one table i have two columns  'Names'  and 'Numbers' .
There are multiple rows for the unique  'Names'  and they can have with different  'Numbers'

eg

Names  |  Number |
John        1234
John        1234
John        1234
John        4567
John        6789

I want a distinct list of the 'Names' and a count of how many different 'Numbers' each name has ordered  highest to lowest so in example above

Names |  Count
John           3

There is also a stored date field so only want results for last 24 hours eg stored > sysdate -1.

Preferably though i would like to include the 'Numbers' field as well but still order by highest count first

eg
Names |  Numbers
John          1234
John          4567
John          6789
Steve        2312
Steve        4765

Probably a real simple one but combinations i tried didn't get results i thought they would.
Avatar of slightwv (䄆 Netminder)
slightwv (䄆 Netminder)

Try this:
select names, count(mynumber) from
( select distinct names,mynumber from tab1 where somedate_column >= trunc(sysdate-1)
)
group by names
/

Open in new window

oops...  missed some requirements.  The sample data threw me off.

I'm confused.

You first want the name and a count of distinct numbers then a list of names and distinct numbers.

Which is it?  I see these as two different selects.

Can you clarify?
OK,  see if this test case gives you what you need.

It doesn't have the date logic but I provided that in the post above.

If there are problems, please add to the test case and explain what you are adding and why.

drop table tab1 purge;
create table tab1 (names varchar2(5), mynumber number);

insert into tab1 values('John',1234);
insert into tab1 values('John',1234);
insert into tab1 values('John',1234);
insert into tab1 values('John',4567);
insert into tab1 values('John',6789);
insert into tab1 values('John',5);
--
insert into tab1 values('Steve',2);
insert into tab1 values('Steve',3);
insert into tab1 values('Steve',3);
commit;



select names, max(number_count) from 
(
select names,
	mynumber,
	count(*) over(partition by names, mynumber order by mynumber) number_count
from tab1
)
group by names
order by 2 desc
/



select names, mynumber from
(
select distinct names, mynumber, number_count
from
(
select names,
	mynumber,
	count(*) over(partition by names, mynumber order by mynumber) number_count
from tab1
)
order by names,number_count desc
)
/

Open in new window

I'm not an Oracle expert, this works in MS SQL Server (using a derived-table):
SELECT DISTINCT
	a.Name
,	a.Number
,	b.RowCount
FROM	SomeTable	a
JOIN	(	SELECT	Name
		,	Number
		,	COUNT(*) as RowCount
		FROM	SomeTable
		GROUP
		BY	Name
		,	Number
	)	b	ON	a.Name = b.Name
			AND	a.Number = b.Number

OUTPUT:

	Name	Number	RowCount
	------	------	--------
	John	1234	3
	John	4567	1
	John	6789	1
	Steve	2312	1
	Steve	4765	1

Open in new window

>>I'm not an Oracle expert, this works in MS SQL Server (using a derived-table):

Syntax also appears to work in Oracle in my 11gR2 database.

It hits the table twice which could be an issue.  The examples I posted should only hit it once.
This variant also works in 11g (no nesting)

SELECT DISTINCT
        name
      , mynumber
      , count(*) over(partition BY name, mynumber) AS number_count
FROM Table1
ORDER BY number_count DESC
      
      
but you do get all 3 columns returned

{+edit, order by isn't required in the count}
Avatar of stopm

ASKER

Hi Slightwv,

So going back to your first sql will this one count blanks as i hadn't realised that some rows  have a Name but no Number. (i don't want to count the blanks)

eg

Names  Numbers
John       1234
John      
John       1234

As for the other question,  yes they are two different statements and my preferred one would be to return the names and the numbers with the Name that has the highest amount of combinations at the top of the list.

Names |  Numbers
John          1234
John          4567
John          6789
Steve        2312
Steve        4765

Just going to try the other select statement you sent.
Thanks
yes, nulls will be counted

Names  Numbers
John       1234
John      
John       1234

should produce

John <null> 1
John 1234 2
>> (i don't want to count the blanks)

Add "and numbers is not null" to the inner query?

If the column contains spaces:
and trim(numbers) is not null
Avatar of stopm

ASKER

Sorry for delay in replying- Easter break

Most of the above queries give me a total count of all rows in the table of Names and Numbers but  I need just the unique combinations of Names and Numbers ordered by the highest count of these first (as per example below count below 4,3,2)
Note:- The table has many more rows where the Names and the Numbers are identical but i don't need to know this number only the distinct pairings.

Eg:-

Names |  Numbers | Count of unique combinations
John          1234              4
John          4567              4
John          6789              4
John          9876              4
Steve        2312              3
Steve        4765              3
Steve        5678              3
Colin         7777              2
Colin         6666              2              

This query below from 'slightwv' does give me a count of the unique combinations of Names and Numbers but would be good if i can get as above and include the Numbers field as well or even just the Names and Numbers without the count field providing they are ordered in highest number of  distinct combinations descending:-

select  names, count(numbers) from ( select distinct name, numbers
from tab1 where numbers is not null))
group by names order by count(numbers) desc

Names |  Count
John           4
Steve         3
Colin          2

Thanks,  and hope this makes sense.
Avatar of stopm

ASKER

Increased points as it looks a bit more complicated then i thought originally
Sorry but I still don't understand.

Looks like a simple tweak to the second query in http:#a40005407 returns the numbers and count.

Change:
select names, mynumber from

To:
select names, mynumber, number_count from

Making that change with the sample data above I get:
NAMES   MYNUMBER NUMBER_COUNT
----- ---------- ------------
John        1234            3
John           5            1
John        4567            1
John        6789            1
Steve          3            2
Steve          2            1

Open in new window


Also looks like the example from John_Vidmar also shows similar results even though it hits the table twice.

Instead of trying to explain what you want please provide sample raw data and expected results.  Similar to the test case I posted.  It has the base table and sample data.  then all we need to see is the actual expected results.
count() over( ... order by ..)
there is no point or need for order by inside that over(...)

and nesting of the query isn't required either because distinct is a row operator (it's applied after all rows are known)

 
    SELECT DISTINCT
            names
          , mynumber
          , count(*) over(partition BY names, mynumber) AS number_count
    FROM Tab1
    ORDER BY names, number_count DESC, mynumber ASC
    
    

**[Results][2]**:
    
    | NAMES | MYNUMBER | NUMBER_COUNT |
    |-------|----------|--------------|
    |  John |     1234 |            3 |
    |  John |        5 |            1 |
    |  John |     4567 |            1 |
    |  John |     6789 |            1 |
    | Steve |        3 |            2 |
    | Steve |        2 |            1 |



  [1]: http://sqlfiddle.com/#!4/0eddb/3

Open in new window

Avatar of stopm

ASKER

Thanks slightwv

Ref your comment :-
Regarding the second query in http:#a40005407 

Yes that query does return the results i want , except it does not order them by highest number of unique combinations 1st it orders them by the Names field descending.
 (In my table the Names are actually stored as digits to represent peoples names (eg John = 04, Steve =02 etc.. Thought it would be simpler to use names in example  but maybe that was not such a good idea!)
 
Sorry actual data from my table is a bit sensitive to post on line.

-------------------------------------------------------
Making these changes below only gave me count of the number of  rows which i am not interested in. eg I don't need to know that John 1234 has 3 entries. I only want it to be above Steve because it has 4 unique entries and Steve only has 2


NAMES   MYNUMBER NUMBER_COUNT
----- ---------- ------------
John        1234            3


Change:
select names, mynumber from

To:
select names, mynumber, number_count from

Thanks again
>>" except it does not order them by highest number of unique combinations 1st it orders them by the Names field descending."

a. that is trivially easy to fix, just alter the order by clause to suit
    SELECT DISTINCT
            names
          , mynumber
          , count(*) over(partition BY names, mynumber) AS number_count
    FROM Tab1
    ORDER BY number_count DESC, names, mynumber ASC -- change line this to suit

Open in new window

, and

b. that is what you actually asked for in the question: the provided list is ordered by name first
Preferably though i would like to include the 'Numbers' field as well but still order by highest count first

eg
Names |  Numbers
John          1234
John          4567
John          6789
Steve        2312
Steve        4765
Avatar of stopm

ASKER

Hi PotrletPaul

Your query is returning a count of all the rows as opposed to only a count of the unique combinations of Names and Numbers
eg.  There are thousands of rows  for John 123, John 4567 and John 6789  in the table but i don't need a count of all  these rows i just need a count of the unique parings , in this example would = 3.  
I only ever  wanted the list ordered by highest number of these unique combinations 1st    but grouped together by Names.   So if  'Steve'  has got more unique combinations then 'John'  put all of 'Steve's' unique combinations at the top of the list above 'John's'

Sorry if that was not clear in my statement

Thanks
Please:  As I posted in http:#a40013846

Instead of trying to explain what you want please provide sample raw data and expected results.  Similar to the test case I posted.  It has the base table and sample data.  then all we need to see is the actual expected results.

Given all your explanations once we see the raw data along with the expected results we'll be able to post a working test case.
Avatar of stopm

ASKER

Sorry as i said earlier i cant post the raw data on line.
>>Sorry as i said earlier i cant post the raw data on line.

I'm not asking for real data.  Just dummy data that is close enough to your actual structure that when we post working SQL, you can apply it to your data.

You've provided dummy expected results in just about every post but  you have never provided the raw dummy data to go with it.

What I am looking for is a sample test case like I posted in http:#a40005407

It has a table and sample data.  The only thing I'm missing is:
Given that table and data, what results do you want from it.

Make it up but make it close to your actual situation.

For example say you are really working with payroll data and have:
table employee_pay(ssn varchar2(9), salary number)
111-22-3333, 10000000
666-44-9999, 10
...

Can you not post a dummy table here:
table fuzzy_pickle(flavor varchar2(9), calories number)
dill, 10
kohser, 500
...

Same table structure, same data types, similar data.  All you need to do is provide the expected results from the data you provide.

Then any query against fuzzy_pickle would run against your salary table.  Just change the names.
ASKER CERTIFIED SOLUTION
Avatar of PortletPaul
PortletPaul
Flag of Australia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
See?  Paul's post above has table, data and results.  nothing in it is 'real'.

If that post isn't what you are after, take that example, add or remove data from it and provide the results you are after.

Don't explain 'why' it doesn't work, just post the results as you want to see them.
utterly agree with slightwv

you have implied sample data, but not actually provided any
e.g. the result list at comment 40013818

(which is what I have based my latest queries on)

e.g. the following would have been clear, with less words

Sample Data:
Table1
NAMES        NUMBERS
Colin        6666
Colin        6666
Colin        6666
Colin        7777
Colin        7777
Colin        7777
John         1234
John         1234
John         1234
John         4567
John         4567
John         4567
John         6789
John         6789
John         6789
John         9876
John         9876
John         9876
Steve        2312
Steve        2312
Steve        2312
Steve        4765
Steve        4765
Steve        4765
Steve        5678
Steve        5678
Steve        5678

Expected Result:
Names |  Numbers | Count of unique combinations
John          1234              4
John          4567              4
John          6789              4
John          9876              4
Steve        2312              3
Steve        4765              3
Steve        5678              3
Colin         7777              2
Colin         6666              2      

assuming of course that I have interpreted comment 40013818 correctly
Avatar of stopm

ASKER

Many Thanks PortletPaul your a star,  both your query's look to work perfectly but still doing some testing to confirm.
Much appreciated
Thanks for the award of points, but you have graded the answer as B. Quite frankly I cannot possibly see how that was warranted.
What grade should I award?

A B grade means the solution given lacked some information or required you to do a good amount of extra work to resolve the problem. When closing the question, the asker should explain why a B grade was awarded.
Did the answer lack information?
Did you have to perform extra work to get it to work for you?
Did you explain why you chose the B grading?

(And by the way, you have flagged the wrong comment as the accepted answer, which should have been comment ID: 40016346)

I see you quite often award B's and C's
You really need to read that article above. If you feel a question is simple to answer, adjust the points downward. You should not use grading for scaling the percieved difficulty.

I do realise reaching the answer probably took longer than you expected, but hopefully we have not only explained why that was true, but also shown you techniques to express your requirements in future.
Avatar of stopm

ASKER

Hi SouthMod

After reading the Grade Award info i can see that i should have given this question an A grade, as the answer  did fulfil all requirements.

Not sure if you can change retrospectively?
Thanks
@stopm

SouthMod did alter set the accepted answer to  ID: 40016346 and the grading to A so there's no more to be done, but thank you for agreeing to that change.

If you did need to change something you may "Request Attention" (a link you will find under the question) and ask the moderators for assistance.

Paul