Solved

Selecting an even sample across multiple categories

Posted on 2014-03-31
8
218 Views
Last Modified: 2014-04-07
Hi

I have to provide an extract of data from a large table, sampling evenly across a number of categories. I can't provide the actual data, but say the table contains an id, a supplier ref, a transaction date and a transaction status (say one of 'Open', 'Closed', 'Deleted').

I need to provide a sample of 100 id's from each of the suppliers (say there are 6), with an even(ish) spread in each sample across the whole range of dates (they could be bucketed into months or quarters), and all 3 statuses.

Can anyone suggest an elegant way of doing this (without hard coding the statuses or anything, as there may be an arbitrary number of these)

Thanks

Stuart
0
Comment
Question by:andrewssd3
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 3
8 Comments
 
LVL 9

Expert Comment

by:rfportilla
ID: 39966871
This can probably be done in a single query, but I would need a clearer explanation.  I don't know what you mean by a spread.  I assume representative data, but I would need to know what fields and how you determine what is representative

In either case, I would start by using "select distinct supplier, category" to get a list of unique combinations of supplier and category, then join this back to the main table or view and then group on relevant fields.  This is a generic approach.
0
 
LVL 17

Author Comment

by:andrewssd3
ID: 39966883
By spread, I just mean I would like the data to contain rows with examples from a representative range of the dates (i.e. not every single date, but not all from the last 2 weeks).  Similarly with the statuses, although I would like each status to be represented in the sample.
0
 
LVL 9

Expert Comment

by:rfportilla
ID: 39966924
That's not specific enough.  The date range is just a grouping.  Do you want a max value, min value, avg, sum, etc.?
0
Forrester Webinar: xMatters Delivers 261% ROI

Guest speaker Dean Davison, Forrester Principal Consultant, explains how a Fortune 500 communication company using xMatters found these results: Achieved a 261% ROI, Experienced $753,280 in net present value benefits over 3 years and Reduced MTTR by 91% for tier 1 incidents.

 
LVL 17

Author Comment

by:andrewssd3
ID: 39966975
No, I wasn't being clear - I don't want any aggregation, just a sample of individual rows from the database
0
 
LVL 9

Assisted Solution

by:rfportilla
rfportilla earned 200 total points
ID: 39966987
Then you should look into window functions.  Here is a good starting point.  

http://technet.microsoft.com/en-us/library/ms189461.aspx

I think the ranking function is probably where you want to go.
0
 
LVL 41

Accepted Solution

by:
Sharath earned 300 total points
ID: 39973477
I have tried simulate your data and pick random sample records for all supplier_ref covering all transaction_status.
Check if this works for you. I am picking atmost 5 random records for each status and supplier.
SELECT ID,supplier_ref,transaction_date,transaction_status 
  FROM (SELECT *, 
               ROW_NUMBER() 
                 OVER ( 
                   PARTITION BY supplier_ref, transaction_status 
                   ORDER BY NEWID()) rn 
          FROM myTable) t1 
 WHERE rn <= ABS(CAST(NEWID() AS BINARY(6)) %5) + 1 
 ORDER BY supplier_ref, 
          transaction_status 

Open in new window


http://sqlfiddle.com/#!3/209c0/2
You can try executing the query multiple times and see how the query is returing random data.
Let me know if you can adopt this to your need.
0
 
LVL 17

Author Closing Comment

by:andrewssd3
ID: 39979753
Thanks for your comments. I gave Sharath more points as he gave me a coded solution, although rfportilla pointed me to the documentation for a similar idea.
0
 
LVL 9

Expert Comment

by:rfportilla
ID: 39983095
I don't like to give a man a fish if I can get him to learn. ;-)  In either case, I'm glad we could help.
0

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
SQL Server Shrink hurting performance? 4 40
Formating field in mysql Advance formatting 1 41
SQL server client app 3 32
SQL question - help with insert for missing info 5 21
In this article I will describe the Detach & Attach method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
How to leverage one TLS certificate to encrypt Microsoft SQL traffic and Remote Desktop Services, versus creating multiple tickets for the same server.
With Secure Portal Encryption, the recipient is sent a link to their email address directing them to the email laundry delivery page. From there, the recipient will be required to enter a user name and password to enter the page. Once the recipient …

726 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question