asked on

Position of an item in a large recordset

Hope this is not too convoluted. I have shown one solution, but I question the approach.
I'm hoping that someone will have a better idea. There is probably some repetition in the description,
I wanted to convey the idea, not optimize it to fewest words.

Problem description:
Determine what ordinal position a single value occurs in an ordered list of all values, and, express this as a percentage.
Essentially asking: "what's the rating of my value in the entire set of values?"
Mean and standard deviation don't really work on this because the data set is not close to normal.

Simple example :
There are 100 rows, values 1 to 10, lots of duplicates. But containing only the distinct values 1 to 10.
Your number is 8
So your value is in the top 80% of all values.
Real case - there are million rows of real numbers. Ranging from -75.234234 ( whatever) to +92.23423 ( again, whatever)
------------------------------------------------------------------
Keep reading...trying to make it clear. Thanks

Assume table of n rows, with numeric column containing random numbers. There are duplicate values.
Problem is to determine what position a specific value is in this set of numbers, when it's reduced to a duplicate free set.

I can get a list of distinct ordered values and the rowid like this:

select Targetvalue , DENSE_RANK() over (ORDER BY Targetvalue ) as rowid from mydatatable

Let's say there are one million rows, now it's reduced to distinct 400,000.

I want to find the rowid, or placement of a specific value in the ordered set of all values,
then divide by the total rows ( the max rowid will be the same value) and get the percentage placement of the single sought value.

This is does work, but maybe someone has a better approach:

-- Find the value, easy:
1) Make a prepared table containing the distinct ordered values and the row ids.
select Targetvalue , DENSE_RANK() over (ORDER BY Targetvalue ) as rowid into _OrderedValues from Mytable

2) Then just do a look up, get the rowid, or absolute placement in the set.
select rowid from _OrderedValues where targetvalue = X

Ok, now, what percentage placement is my target value in the set?
I need the total number of rows, and the rowid for the sought value.
rowid / max(rowid) = %

This works but again, I question the efficiency of the entire approach:

select rowid,targetvalue , cast(rowid as real)/ cast(maxrow as real) from
_OrderedValues join (select MAX(rowid)maxrow from _OrderedValues ) A on maxrow > 0
where targetvalue = 8.322303 ( example random value sought)

This example returns 0.9123968 in my data.
I can state that target value is in the top 91% of the entire set.

As these values don't change very often, I'm thinking it's better to make various lookup tables to avoid having
the full data set crunched over every time a user makes a request.

Or does anyone have a efficient way to do this on a pretty large table without having
to make some canned tables for look ups?

Hope this is clearly enough stated.
Thanks

Ioannis Paraskevopoulos

Hi,

I really do not know if my approach would be better or more efficient, but it might get you away of some writes.

1. Use a CTE for your distinct values:

;
WITH MyDenseRankCTE AS
(
    SELECT Targetvalue  , 
                  DENSE_RANK() over (ORDER BY Targetvalue  ) as rowid  
    FROM   mydatatable
)

Open in new window

If you would select from MyDenseRankCTE (you can name it whatever else you like) it would get the exact same results as _OrderedValues table.

You may now do your calculation:

SELECT @rowid / MAX(rowid) AS Percentage 
FROM   MyDenseRankCTE

Open in new window

Summing up:

DECLARE @rowid AS FLOAT ;
SET @rowid = 8.322303 ;

WITH MyDenseRankCTE AS
(
    SELECT Targetvalue  , 
                  DENSE_RANK() over (ORDER BY Targetvalue  ) as rowid  
    FROM   mydatatable
)

SELECT @rowid / MAX(rowid) AS Percentage 
FROM   MyDenseRankCTE ;

Open in new window

Again, i do not know if this would be more efficient. You could have a look at the execution plans maybe.

Giannis

ASKER CERTIFIED SOLUTION

DBAduck - Ben Miller

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Scott Pletcher

You could also create a separate table that held all unique values with a count of each value.

The advantage of this is that triggers could maintain that table dynamically. Then, when you looked up a value, you'd just have to add predetermined totals less than the current value, not resequence the entire data set over and over.

Periodically, "just to be safe", you could do a complete rebuild of the unique values total table to make sure it accurately matches the data.

awalkinthepark

ASKER

thanks.