Improve company productivity with a Business Account.Sign Up

x
?
Solved

Selecting an even sample across multiple categories

Posted on 2014-03-31
8
Medium Priority
?
233 Views
Last Modified: 2014-04-07
Hi

I have to provide an extract of data from a large table, sampling evenly across a number of categories. I can't provide the actual data, but say the table contains an id, a supplier ref, a transaction date and a transaction status (say one of 'Open', 'Closed', 'Deleted').

I need to provide a sample of 100 id's from each of the suppliers (say there are 6), with an even(ish) spread in each sample across the whole range of dates (they could be bucketed into months or quarters), and all 3 statuses.

Can anyone suggest an elegant way of doing this (without hard coding the statuses or anything, as there may be an arbitrary number of these)

Thanks

Stuart
0
Comment
Question by:andrewssd3
  • 4
  • 3
8 Comments
 
LVL 9

Expert Comment

by:rfportilla
ID: 39966871
This can probably be done in a single query, but I would need a clearer explanation.  I don't know what you mean by a spread.  I assume representative data, but I would need to know what fields and how you determine what is representative

In either case, I would start by using "select distinct supplier, category" to get a list of unique combinations of supplier and category, then join this back to the main table or view and then group on relevant fields.  This is a generic approach.
0
 
LVL 17

Author Comment

by:andrewssd3
ID: 39966883
By spread, I just mean I would like the data to contain rows with examples from a representative range of the dates (i.e. not every single date, but not all from the last 2 weeks).  Similarly with the statuses, although I would like each status to be represented in the sample.
0
 
LVL 9

Expert Comment

by:rfportilla
ID: 39966924
That's not specific enough.  The date range is just a grouping.  Do you want a max value, min value, avg, sum, etc.?
0
Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

 
LVL 17

Author Comment

by:andrewssd3
ID: 39966975
No, I wasn't being clear - I don't want any aggregation, just a sample of individual rows from the database
0
 
LVL 9

Assisted Solution

by:rfportilla
rfportilla earned 800 total points
ID: 39966987
Then you should look into window functions.  Here is a good starting point.  

http://technet.microsoft.com/en-us/library/ms189461.aspx

I think the ranking function is probably where you want to go.
0
 
LVL 41

Accepted Solution

by:
Sharath earned 1200 total points
ID: 39973477
I have tried simulate your data and pick random sample records for all supplier_ref covering all transaction_status.
Check if this works for you. I am picking atmost 5 random records for each status and supplier.
SELECT ID,supplier_ref,transaction_date,transaction_status 
  FROM (SELECT *, 
               ROW_NUMBER() 
                 OVER ( 
                   PARTITION BY supplier_ref, transaction_status 
                   ORDER BY NEWID()) rn 
          FROM myTable) t1 
 WHERE rn <= ABS(CAST(NEWID() AS BINARY(6)) %5) + 1 
 ORDER BY supplier_ref, 
          transaction_status 

Open in new window


http://sqlfiddle.com/#!3/209c0/2
You can try executing the query multiple times and see how the query is returing random data.
Let me know if you can adopt this to your need.
0
 
LVL 17

Author Closing Comment

by:andrewssd3
ID: 39979753
Thanks for your comments. I gave Sharath more points as he gave me a coded solution, although rfportilla pointed me to the documentation for a similar idea.
0
 
LVL 9

Expert Comment

by:rfportilla
ID: 39983095
I don't like to give a man a fish if I can get him to learn. ;-)  In either case, I'm glad we could help.
0

Featured Post

Upgrade your Question Security!

Your question, your audience. Choose who sees your identity—and your question—with question security.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

An alternative to the "For XML" way of pivoting and concatenating result sets into strings, and an easy introduction to "common table expressions" (CTEs). Being someone who is always looking for alternatives to "work your data", I came across this …
An introductory discussion about Oracle Analytic Functions which are used to calculate or compute Aggregate values, based on a group of rows.
SQL Database Recovery Software repairs the MDF & NDF Files, corrupted due to hardware related issues or software related errors. Provides preview of recovered database objects and allows saving in either MSSQL, CSV, HTML or XLS format. Ensures recov…
Stellar Phoenix SQL Database Repair software easily fixes the suspect mode issue of SQL Server database. It is a simple process to bring the database from suspect mode to normal mode. Check out the video and fix the SQL database suspect mode problem.

584 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question