Solved

Selecting an even sample across multiple categories

Posted on 2014-03-31
8
216 Views
Last Modified: 2014-04-07
Hi

I have to provide an extract of data from a large table, sampling evenly across a number of categories. I can't provide the actual data, but say the table contains an id, a supplier ref, a transaction date and a transaction status (say one of 'Open', 'Closed', 'Deleted').

I need to provide a sample of 100 id's from each of the suppliers (say there are 6), with an even(ish) spread in each sample across the whole range of dates (they could be bucketed into months or quarters), and all 3 statuses.

Can anyone suggest an elegant way of doing this (without hard coding the statuses or anything, as there may be an arbitrary number of these)

Thanks

Stuart
0
Comment
Question by:andrewssd3
  • 4
  • 3
8 Comments
 
LVL 9

Expert Comment

by:rfportilla
ID: 39966871
This can probably be done in a single query, but I would need a clearer explanation.  I don't know what you mean by a spread.  I assume representative data, but I would need to know what fields and how you determine what is representative

In either case, I would start by using "select distinct supplier, category" to get a list of unique combinations of supplier and category, then join this back to the main table or view and then group on relevant fields.  This is a generic approach.
0
 
LVL 17

Author Comment

by:andrewssd3
ID: 39966883
By spread, I just mean I would like the data to contain rows with examples from a representative range of the dates (i.e. not every single date, but not all from the last 2 weeks).  Similarly with the statuses, although I would like each status to be represented in the sample.
0
 
LVL 9

Expert Comment

by:rfportilla
ID: 39966924
That's not specific enough.  The date range is just a grouping.  Do you want a max value, min value, avg, sum, etc.?
0
PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

 
LVL 17

Author Comment

by:andrewssd3
ID: 39966975
No, I wasn't being clear - I don't want any aggregation, just a sample of individual rows from the database
0
 
LVL 9

Assisted Solution

by:rfportilla
rfportilla earned 200 total points
ID: 39966987
Then you should look into window functions.  Here is a good starting point.  

http://technet.microsoft.com/en-us/library/ms189461.aspx

I think the ranking function is probably where you want to go.
0
 
LVL 40

Accepted Solution

by:
Sharath earned 300 total points
ID: 39973477
I have tried simulate your data and pick random sample records for all supplier_ref covering all transaction_status.
Check if this works for you. I am picking atmost 5 random records for each status and supplier.
SELECT ID,supplier_ref,transaction_date,transaction_status 
  FROM (SELECT *, 
               ROW_NUMBER() 
                 OVER ( 
                   PARTITION BY supplier_ref, transaction_status 
                   ORDER BY NEWID()) rn 
          FROM myTable) t1 
 WHERE rn <= ABS(CAST(NEWID() AS BINARY(6)) %5) + 1 
 ORDER BY supplier_ref, 
          transaction_status 

Open in new window


http://sqlfiddle.com/#!3/209c0/2
You can try executing the query multiple times and see how the query is returing random data.
Let me know if you can adopt this to your need.
0
 
LVL 17

Author Closing Comment

by:andrewssd3
ID: 39979753
Thanks for your comments. I gave Sharath more points as he gave me a coded solution, although rfportilla pointed me to the documentation for a similar idea.
0
 
LVL 9

Expert Comment

by:rfportilla
ID: 39983095
I don't like to give a man a fish if I can get him to learn. ;-)  In either case, I'm glad we could help.
0

Featured Post

Windows Server 2016: All you need to know

Learn about Hyper-V features that increase functionality and usability of Microsoft Windows Server 2016. Also, throughout this eBook, you’ll find some basic PowerShell examples that will help you leverage the scripts in your environments!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

In this article I will describe the Copy Database Wizard method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
This article describes how to use the timestamp of existing data in a database to allow Tableau to calculate the prior work day instead of relying on case statements or if statements to calculate the days of the week.
Established in 1997, Technology Architects has become one of the most reputable technology solutions companies in the country. TA have been providing businesses with cost effective state-of-the-art solutions and unparalleled service that is designed…

837 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question