Hope this is not too convoluted. I have shown one solution, but I question the approach.
I'm hoping that someone will have a better idea. There is probably some repetition in the description,
I wanted to convey the idea, not optimize it to fewest words.
Determine what ordinal position a single value occurs in an ordered list of all values, and, express this as a percentage.
Essentially asking: "what's the rating of my value in the entire set of values?"
Mean and standard deviation don't really work on this because the data set is not close to normal.
Simple example :
There are 100 rows, values 1 to 10, lots of duplicates. But containing only the distinct values 1 to 10.
Your number is 8
So your value is in the top 80% of all values.
Real case - there are million rows of real numbers. Ranging from -75.234234 ( whatever) to +92.23423 ( again, whatever)
Keep reading...trying to make it clear. Thanks
Assume table of n rows, with numeric column containing random numbers. There are duplicate values.
Problem is to determine what position a specific value is in this set of numbers, when it's reduced to a duplicate free set.
I can get a list of distinct ordered values and the rowid like this:
select Targetvalue , DENSE_RANK() over (ORDER BY Targetvalue ) as rowid from mydatatable
Let's say there are one million rows, now it's reduced to distinct 400,000.
I want to find the rowid, or placement of a specific value in the ordered set of all values,
then divide by the total rows ( the max rowid will be the same value) and get the percentage placement of the single sought value.
This is does work, but maybe someone has a better approach:
-- Find the value, easy:
1) Make a prepared table containing the distinct ordered values and the row ids.
select Targetvalue , DENSE_RANK() over (ORDER BY Targetvalue ) as rowid into _OrderedValues from Mytable
2) Then just do a look up, get the rowid, or absolute placement in the set.
select rowid from _OrderedValues where targetvalue = X
Ok, now, what percentage placement is my target value in the set?
I need the total number of rows, and the rowid for the sought value.
rowid / max(rowid) = %
This works but again, I question the efficiency of the entire approach:
select rowid,targetvalue , cast(rowid as real)/ cast(maxrow as real) from
_OrderedValues join (select MAX(rowid)maxrow from _OrderedValues ) A on maxrow > 0
where targetvalue = 8.322303 ( example random value sought)
This example returns 0.9123968 in my data.
I can state that target value is in the top 91% of the entire set.
As these values don't change very often, I'm thinking it's better to make various lookup tables to avoid having
the full data set crunched over every time a user makes a request.
Or does anyone have a efficient way to do this on a pretty large table without having
to make some canned tables for look ups?
Hope this is clearly enough stated.