Solved

Selecting an even sample across multiple categories

Posted on 2014-03-31
Last Modified: 2014-04-07
Hi

I have to provide an extract of data from a large table, sampling evenly across a number of categories. I can't provide the actual data, but say the table contains an id, a supplier ref, a transaction date and a transaction status (say one of 'Open', 'Closed', 'Deleted').

I need to provide a sample of 100 ids from each of the suppliers (say there are 6), with an even(ish) spread in each sample across the whole range of dates (they could be bucketed into months or quarters) and across all 3 statuses.

Can anyone suggest an elegant way of doing this, without hard-coding the statuses or anything else, as there may be an arbitrary number of them?

Thanks

Stuart
Question by:andrewssd3

Expert Comment

by:rfportilla
ID: 39966871
This can probably be done in a single query, but I would need a clearer explanation. I don't know what you mean by a spread. I assume you mean representative data, but I would need to know which fields are involved and how you determine what is representative.

In either case, I would start by using "select distinct supplier, category" to get a list of unique combinations of supplier and category, then join this back to the main table or view and then group on relevant fields.  This is a generic approach.

Author Comment

by:andrewssd3
ID: 39966883
By spread, I just mean I would like the data to contain rows with examples from a representative range of the dates (i.e. not every single date, but not all from the last 2 weeks).  Similarly with the statuses, although I would like each status to be represented in the sample.

Expert Comment

by:rfportilla
ID: 39966924
That's not specific enough. The date range is just a grouping. Do you want a max value, min value, avg, sum, etc. within each group?

Author Comment

by:andrewssd3
ID: 39966975
No, I wasn't being clear. I don't want any aggregation, just a sample of individual rows from the database.

Assisted Solution

by:rfportilla
rfportilla earned 200 total points
ID: 39966987
Then you should look into window functions. Here is a good starting point:

http://technet.microsoft.com/en-us/library/ms189461.aspx

I think the ranking functions are probably where you want to go.
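
As a rough illustration of the ranking-function idea, the sketch below numbers the rows within each supplier in random order and keeps the first 100. The table name myTable and the column names id, supplier_ref, transaction_date and transaction_status are assumptions based on the description in the question, and the syntax assumes SQL Server.

```sql
-- Sketch only: myTable and its column names are assumed from the question.
SELECT id, supplier_ref, transaction_date, transaction_status
  FROM (SELECT id, supplier_ref, transaction_date, transaction_status,
               -- number each supplier's rows in a random order
               ROW_NUMBER() OVER (PARTITION BY supplier_ref
                                  ORDER BY NEWID()) AS rn
          FROM myTable) AS t
 WHERE rn <= 100   -- keep at most 100 rows per supplier
 ORDER BY supplier_ref, rn;
```

On its own this gives a plain random sample per supplier; it does not guarantee that every status or every part of the date range is represented.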

Accepted Solution

by:Sharath
Sharath earned 300 total points
ID: 39973477
I have tried to simulate your data and pick random sample records for every supplier_ref, covering every transaction_status.
Check whether this works for you. It picks at most 5 random records for each status and supplier.
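-- Number the rows within each supplier/status group in random order.
SELECT ID, supplier_ref, transaction_date, transaction_status
  FROM (SELECT *,
               ROW_NUMBER()
                 OVER (
                   PARTITION BY supplier_ref, transaction_status
                   ORDER BY NEWID()) rn
          FROM myTable) t1
 -- Random cutoff between 1 and 5, so at most 5 rows survive per group.
 WHERE rn <= ABS(CAST(NEWID() AS BINARY(6)) % 5) + 1
 ORDER BY supplier_ref,
          transaction_status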

http://sqlfiddle.com/#!3/209c0/2
You can execute the query several times to see that it returns different random data each time.
Let me know if you can adapt this to your needs.
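
One possible way to adapt the query above to the original requirement (100 ids per supplier, spread across months and statuses) is sketched below. It is not part of the answer as posted. It assumes SQL Server, the same myTable columns, and that transaction_date is a date or datetime column; bucketing by calendar month is just one choice. The idea is to number rows randomly within each supplier/status/month group, then fill each supplier's 100 slots round-robin: first one row from every group, then a second row from every group, and so on.

```sql
-- Sketch only: table and column names are assumed from the question.
WITH ranked AS (
    SELECT id, supplier_ref, transaction_date, transaction_status,
           -- random position of the row inside its status/month group
           ROW_NUMBER() OVER (
               PARTITION BY supplier_ref, transaction_status,
                            YEAR(transaction_date), MONTH(transaction_date)
               ORDER BY NEWID()) AS stratum_rn
      FROM myTable
),
picked AS (
    SELECT id, supplier_ref, transaction_date, transaction_status,
           -- rows with stratum_rn = 1 come first, then 2, and so on,
           -- so every group is visited before any group is used twice
           ROW_NUMBER() OVER (
               PARTITION BY supplier_ref
               ORDER BY stratum_rn, NEWID()) AS supplier_rn
      FROM ranked
)
SELECT id, supplier_ref, transaction_date, transaction_status
  FROM picked
 WHERE supplier_rn <= 100   -- at most 100 rows per supplier
 ORDER BY supplier_ref, transaction_date;
```

No statuses are hard-coded, so new status values are picked up automatically. If a supplier has more than 100 status/month groups, some groups will be left out of that supplier's sample; using quarters instead of months reduces the number of groups.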

Author Closing Comment

by:andrewssd3
ID: 39979753
Thanks for your comments. I gave Sharath more points as he gave me a coded solution, although rfportilla pointed me to the documentation for a similar idea.

Expert Comment

by:rfportilla
ID: 39983095
I don't like to give a man a fish if I can get him to learn. ;-)  In either case, I'm glad we could help.