Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win

x
?
Solved

Selecting an even sample across multiple categories

Posted on 2014-03-31
8
Medium Priority
?
226 Views
Last Modified: 2014-04-07
Hi

I have to provide an extract of data from a large table, sampling evenly across a number of categories. I can't provide the actual data, but say the table contains an id, a supplier ref, a transaction date and a transaction status (say one of 'Open', 'Closed', 'Deleted').

I need to provide a sample of 100 id's from each of the suppliers (say there are 6), with an even(ish) spread in each sample across the whole range of dates (they could be bucketed into months or quarters), and all 3 statuses.

Can anyone suggest an elegant way of doing this (without hard coding the statuses or anything, as there may be an arbitrary number of these)

Thanks

Stuart
0
Comment
Question by:andrewssd3
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 3
8 Comments
 
LVL 9

Expert Comment

by:rfportilla
ID: 39966871
This can probably be done in a single query, but I would need a clearer explanation.  I don't know what you mean by a spread.  I assume representative data, but I would need to know what fields and how you determine what is representative

In either case, I would start by using "select distinct supplier, category" to get a list of unique combinations of supplier and category, then join this back to the main table or view and then group on relevant fields.  This is a generic approach.
0
 
LVL 17

Author Comment

by:andrewssd3
ID: 39966883
By spread, I just mean I would like the data to contain rows with examples from a representative range of the dates (i.e. not every single date, but not all from the last 2 weeks).  Similarly with the statuses, although I would like each status to be represented in the sample.
0
 
LVL 9

Expert Comment

by:rfportilla
ID: 39966924
That's not specific enough.  The date range is just a grouping.  Do you want a max value, min value, avg, sum, etc.?
0
NFR key for Veeam Agent for Linux

Veeam is happy to provide a free NFR license for one year.  It allows for the non‑production use and valid for five workstations and two servers. Veeam Agent for Linux is a simple backup tool for your Linux installations, both on‑premises and in the public cloud.

 
LVL 17

Author Comment

by:andrewssd3
ID: 39966975
No, I wasn't being clear - I don't want any aggregation, just a sample of individual rows from the database
0
 
LVL 9

Assisted Solution

by:rfportilla
rfportilla earned 800 total points
ID: 39966987
Then you should look into window functions.  Here is a good starting point.  

http://technet.microsoft.com/en-us/library/ms189461.aspx

I think the ranking function is probably where you want to go.
0
 
LVL 41

Accepted Solution

by:
Sharath earned 1200 total points
ID: 39973477
I have tried simulate your data and pick random sample records for all supplier_ref covering all transaction_status.
Check if this works for you. I am picking atmost 5 random records for each status and supplier.
SELECT ID,supplier_ref,transaction_date,transaction_status 
  FROM (SELECT *, 
               ROW_NUMBER() 
                 OVER ( 
                   PARTITION BY supplier_ref, transaction_status 
                   ORDER BY NEWID()) rn 
          FROM myTable) t1 
 WHERE rn <= ABS(CAST(NEWID() AS BINARY(6)) %5) + 1 
 ORDER BY supplier_ref, 
          transaction_status 

Open in new window


http://sqlfiddle.com/#!3/209c0/2
You can try executing the query multiple times and see how the query is returing random data.
Let me know if you can adopt this to your need.
0
 
LVL 17

Author Closing Comment

by:andrewssd3
ID: 39979753
Thanks for your comments. I gave Sharath more points as he gave me a coded solution, although rfportilla pointed me to the documentation for a similar idea.
0
 
LVL 9

Expert Comment

by:rfportilla
ID: 39983095
I don't like to give a man a fish if I can get him to learn. ;-)  In either case, I'm glad we could help.
0

Featured Post

Get free NFR key for Veeam Availability Suite 9.5

Veeam is happy to provide a free NFR license (1 year, 2 sockets) to all certified IT Pros. The license allows for the non-production use of Veeam Availability Suite v9.5 in your home lab, without any feature limitations. It works for both VMware and Hyper-V environments

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In this article I will describe the Detach & Attach method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
One of the most important things in an application is the query performance. This article intends to give you good tips to improve the performance of your queries.
This tutorial will teach you the special effect of super speed similar to the fictional character Wally West aka "The Flash" After Shake : http://www.videocopilot.net/presets/after_shake/ All lightning effects with instructions : http://www.mediaf…
In a question here at Experts Exchange (https://www.experts-exchange.com/questions/29062564/Adobe-acrobat-reader-DC.html), a member asked how to create a signature in Adobe Acrobat Reader DC (the free Reader product, not the paid, full Acrobat produ…

596 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question