I am looking to create summary information on numerical data sets within SQL Server 2005 Express. These need to include additional aggregate statistics such as Percentiles (flexible percentiles preferentially i.e add the nth value of the percentile statistic as an argument). I am not sure as to the best approach but have been considering creating an Aggregate Function.

I am surprised that looking into doing this i have not found very much information on the web.

I am considering using a standard method such as that used in Excel to produce a percentile which would require ordering the aggregate group subset and determining the relevant rank of the percentile argument (simplistically 50th Percentile: 50/100 * n = rank) .

One concern i have is that i believe aggregate functions may split and merge subsets which would not work for ranking functions?

If anyone has experience of implementing statistical functions in this way or alternative methods i would be grateful in hearing about it.

Thanks

The script gives the percentile of each number in the series where as the requirement is to return the value of a specific (argument) percentile.

E.g. for values 0.37, 0.01, 0.28, 0.20 The summary query would look something along the lines:

Grouping Min Median Average Max 5%ile 95%ile

July-2009 0.01 0.24 0.22 0.37 0.04 0.36

The 50% percentile (median) falls between ranked values 2 and 3, 0.22 and 0.28 to give a returned value of 0.24.

So I could put any chosen group against it such as this:

SELECT GroupFields, Min(values), Percentile(values, 0.5) Avg(values), Max(values), Percentile(values, 0.05), Percentile(values, 0.95)

GROUP BY GroupFields