Link to home
Start Free TrialLog in
Avatar of cmulcahy
cmulcahy

asked on

Oracle's dense_rank() in PostgreSQL

I'm trying to convert a bit of SQL that I got in the Oracle forum to PostgreSQL.  My primary DBMS is Oracle but I'd like to see this run in PostgreSQL if possible.

Here is my scenario:

--Given:
--create table xref (
--  combined  varchar(20),
--  memberid     varchar(20)
--);
--
--insert into xref(combined, memberid) values ('1', '111');
--insert into xref(combined, memberid) values ('1', '222');
--insert into xref(combined, memberid) values ('1', '333');
--insert into xref(combined, memberid) values ('1', '444');
--insert into xref(combined, memberid) values ('2', '555');
--insert into xref(combined, memberid) values ('2', '666');
--insert into xref(combined, memberid) values ('2', '777');
--insert into xref(combined, memberid) values ('2', '888');
--insert into xref(combined, memberid) values ('2', '999');
--insert into xref(combined, memberid) values ('3', '987');



SELECT newid, oldid
  FROM (
         SELECT memberid oldid,
                last_value(memberid) over (
                                         PARTITION BY combined ORDER BY memberid
                                         ROWS BETWEEN CURRENT ROW AND unbounded
                                         following
                                        ) newid
           FROM xref
);

--NEWID         OLDID
--------------- -------------
--444           111
--444           222
--444           333
--444           444
--999           555
--999           666
--999           777
--999           888
--999           999
--987           987
--10 rows selected
--
--

Any ideas?

Thanks!
ASKER CERTIFIED SOLUTION
Avatar of earth man2
earth man2
Flag of United Kingdom of Great Britain and Northern Ireland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of cmulcahy
cmulcahy

ASKER

Of course I'm an optimist.  If I were to face reality all the time, I am not sure I could get out of bed each day.

Is there a way to produce the output I'm after without dense_rank()?  I want to take the input shown, of all of the similar COMBINED fields, take the alphabetically last one as the NEWID and use all of the values like it as OLDID?

I'm thinking subselects.  Now that I think about it a bit more, I think I may have the answer but will need to do some testing.  I still welcome answers here!

Thanks!
I would still like to get dense_rank()-like functionality out of PostgreSQL, but..

select sNew.newMemberID, xref.memberid
  from xref, (select combined, max(memberid) as newMemberID
                   from xref
                  group by combined) sNew
 where xref.combined = sNew.combined;

This produces the output achieved with dense_rank() in Oracle, albeit a bit less efficiently, which is a bit of a problem with my 14 million row data set but at least it produces that output.

If there were an add-on module or plpgsql script that would produce the output I get from dense_rank(), I would be a very happy guy.

Thanks!
What exactly does dense_rank() do? FWIW, you can build your own aggregate functions using PL/pgSQL and CREATE AGGREGATE. Is that what you're looking for?
You can improve performance for this type of problem using set returning functions.   The programming goes on behind the stored procedure interface instead of within the SQL statement.
rank() and dense_rank() in Oracle assign a ranking to each row in a result set:

Given the data (arbitrarily chosen):
  Keyid     Value
  1            Blue
  9            Green
  2            Apple

select dense_rank(), value, keyid from the_data order by value;

  1     Apple     2
  2     Blue       1
  3     Green    9

It just numbers, sequentially, the rows in the result set.  If I had two Blue values in the result set, it would have returned 2  Blue twice.  It will duplicate the rank for duplicated values.

rank() does much the same thing, except when it encounters duplicates, it will assign the second value the same rank as the first, but skip the rank that it would have had.  For example, rank 2 would have been duplicated but 3 would have been skipped.

See:
http://www.oracle-base.com/articles/9i/RankDenseRankFirstLastAnalyticFunctions.php

I'm a decent C programmer, though I'm much better with C++.  Anyone care to at least get me started writing a function?  Of course, a plpgsql function is always welcome!

Thanks!