How to calculate Median for each group from a large set of data?

Posted on 2007-07-23
Last Modified: 2012-06-21
I know how to calculate Median from Ken's book 'The Guru's Guide to Transact-SQL', but the example given in the book only deals with one set (group) of numbers. In my case, I need to group by each code and before group by I have a total of 18 million records so using cross join (one way talked about in the book to get Median) might take a long time. If using Identity, i wonder how to insert id which starts 1 again when a new group starts. Thanks!
Question by:qinyan
    LVL 6

    Accepted Solution

    Identity won't work if the numbers need to start over on each group, so you'll need to populate the new id column yourself.  If you're using SQL 2005, look at the functions RANK, ROW_NUMBER, and NTILE, (aka, "ranking functions") which are designed for problems like this.  If not, you'll have to go ahead and write your own code to create the ID column and populate it yourself.  I'm thinking a cursor would be the most straightforward way, but with 18 million rows, that could take awhile.  But at least that way you have full control over exactly how the id gets assigned.

    Author Comment

    I'm using SQL 2005 but it says Rank / Row_Number is not a recognized function name. How could that be? Thanks!
    LVL 6

    Assisted Solution

    It's possible that you have SQL 2005 client utilities but that the back end is SQL 2000.  Or I suppose it could be a database running in SQL 2000 compatibility mode (if that's possible).  Anyway, those are things to check.

    Author Comment

    I tested on a db running on SQL 2005 and it worked. Thanks!

    Featured Post

    Why You Should Analyze Threat Actor TTPs

    After years of analyzing threat actor behavior, it’s become clear that at any given time there are specific tactics, techniques, and procedures (TTPs) that are particularly prevalent. By analyzing and understanding these TTPs, you can dramatically enhance your security program.

    Join & Write a Comment

    In this article—a derivative of my blog post (—I will explore a few different perspectives on which week today's date falls within using Microsoft SQL Server. First, to frame this stu…
    Performance is the key factor for any successful data integration project, knowing the type of transformation that you’re using is the first step on optimizing the SSIS flow performance, by utilizing the correct transformation or the design alternat…
    Using examples as well as descriptions, and references to Books Online, show the different Recovery Models available in SQL Server and explain, as well as show how full, differential and transaction log backups are performed
    Via a live example, show how to setup several different housekeeping processes for a SQL Server.

    746 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    15 Experts available now in Live!

    Get 1:1 Help Now