Position of an item in a large recordset

Hope this is not too convoluted. I have shown one solution, but I question the approach.
I'm hoping that someone will have a better idea.  There is probably some repetition in the description,
I wanted to convey the idea, not optimize it to fewest words.

Problem description:  
Determine what ordinal position a single value occurs in an ordered list of all values, and, express this as a percentage.  
Essentially asking:  "what's the rating of my value in the entire set of values?"
Mean and standard deviation don't really work on this because the data set is not close to normal.


Simple example :
There are 100 rows, values 1 to 10, lots of duplicates. But containing only the distinct values 1 to 10.
Your number is 8
So your value is in the top 80% of all values.
Real case - there are million rows of real numbers. Ranging from  -75.234234 ( whatever) to +92.23423 ( again, whatever)
------------------------------------------------------------------
Keep reading...trying to make it clear. Thanks

Assume table of n rows, with  numeric column containing random numbers.  There are  duplicate values.
Problem is to  determine what position a specific value is in this set of numbers, when it's reduced to a duplicate free set.

I can get a list of distinct ordered values and the rowid like this:

select  Targetvalue  , DENSE_RANK() over (ORDER BY Targetvalue  ) as rowid  from  mydatatable

Let's say there are one million rows, now it's reduced to distinct 400,000.

I want to find the rowid, or placement of a specific value in the ordered set of all values,
then  divide by the total rows ( the max rowid will be the same value)  and get the  percentage placement of the single sought value.

This is does work, but maybe someone has a better approach:

-- Find the value, easy:
1) Make a prepared table containing the distinct ordered values and the row ids.
select  Targetvalue  , DENSE_RANK() over (ORDER BY Targetvalue  ) as rowid  into _OrderedValues  from  Mytable

2) Then just do a look up, get the rowid, or absolute placement in the set.
select rowid from _OrderedValues   where  targetvalue = X


Ok, now, what percentage placement is my target value in  the set?
I need the total number of rows, and the rowid for the sought value.
rowid / max(rowid) = %

This works but again,  I question the efficiency of the entire approach:

select rowid,targetvalue , cast(rowid as real)/ cast(maxrow as real) from
_OrderedValues  join (select  MAX(rowid)maxrow  from  _OrderedValues  )  A  on  maxrow > 0
where  targetvalue = 8.322303 ( example random value sought)

This  example returns  0.9123968 in my data.
I can state that  target value is in the top 91% of the entire set.

As these values don't change very often, I'm thinking it's better to make various lookup tables to avoid having
the full data set crunched over every time a user makes a request.

Or does anyone have a efficient way to do this on a pretty large table without having
to make some canned tables for look ups?

Hope this is clearly enough stated.
Thanks
awalkintheparkAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Ioannis ParaskevopoulosCommented:
Hi,

I really do not know if my approach would be better or more efficient, but it might get you away of some writes.

1. Use a CTE for your distinct values:

;
WITH MyDenseRankCTE AS
(
    SELECT Targetvalue  , 
                  DENSE_RANK() over (ORDER BY Targetvalue  ) as rowid  
    FROM   mydatatable
)

Open in new window


If you would select from MyDenseRankCTE (you can name it whatever else you like) it would get the exact same results as _OrderedValues table.

You may now do your calculation:

SELECT @rowid / MAX(rowid) AS Percentage 
FROM   MyDenseRankCTE 

Open in new window


Summing up:

DECLARE @rowid AS FLOAT ;
SET @rowid = 8.322303 ;

WITH MyDenseRankCTE AS
(
    SELECT Targetvalue  , 
                  DENSE_RANK() over (ORDER BY Targetvalue  ) as rowid  
    FROM   mydatatable
)

SELECT @rowid / MAX(rowid) AS Percentage 
FROM   MyDenseRankCTE ;

Open in new window


Again, i do not know if this would be more efficient. You could have a look at the execution plans maybe.

Giannis
0
DBAduck - Ben MillerPrincipal ConsultantCommented:
As long as you have an index to support your query, the DENSE_RANK is the way to go if you do not have a distinct recordset as it will give you the appropriate position based on the duplicates.

Then you have the right calculation for the rowid and max rowid.

Put an index on the table to support (cover) your query and it is a good solution. Without knowing more about the purpose or make up of the data set, this seems like the right solution except I would not use a Temp Table necessarily.

Maybe this:
DECLARE @row int, @maxrow int;

WITH OrderedValues
AS (
select  Targetvalue  , 
            DENSE_RANK() over (ORDER BY Targetvalue  ) as rowid 
from  Mytable
)
select TOP 1 @row = rowid, 
            @maxrow = (SELECT MAX(rowid) FROM OrderedValues) 
from OrderedValues   where  targetvalue = X;

SELECT CAST(@row as real) / CAST(@maxrow as real) as percentage

Open in new window


Something like that and as long as you have an index on the MyTable like TargetValue as the index, you should be just fine in speed (relatively).
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Scott PletcherSenior DBACommented:
You could also create a separate table that held all unique values with a count of each value.

The advantage of this is that triggers could maintain that table dynamically.  Then, when you looked up a value, you'd just have to add predetermined totals less than the current value, not resequence the entire data set over and over.

Periodically, "just to be safe", you could do a complete rebuild of the unique values total table to make sure it accurately matches the data.
0
awalkintheparkAuthor Commented:
thanks.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.