• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 184
  • Last Modified:

Getting a Random Record Sample

I'm trying to produce a random sample of records from table PopulationBondsCash_Temp. This produces the total sample size for each user.  For example, if @Recordcount = 16, then this code produces 16 records for every UserID.  What I really need is that if the total sample size is 16 and then divide the total sample size evenly among each of the users so if there were 4 users there would we 4 records per user.  


DECLARE @Level Int;
SET @Level = 1;
DECLARE @RecordCount Int;
SET @RecordCount = NULL;

SELECT  @RecordCount = COUNT(*)
FROM    PopulationBondsCash_Temp

SELECT *
FROM (
SELECT       ActivityDate,
      AlienFileLocation,
      UserID,
      BondAmount,
      BondNumber,
      COUNT(UserID) OVER (Partition by UserID) AS UserCount,
      ROW_NUMBER() OVER (partition by UserID order by NEWID()) rn
FROM     PopulationBondsCash_Temp            
) a
where rn <= (CASE
    --Level 1
      WHEN @Level = 1 AND @RecordCount BETWEEN 2 AND 8 THEN 2
      WHEN @Level = 1 AND @RecordCount BETWEEN 9 AND 15 THEN 2
      WHEN @Level = 1 AND @RecordCount BETWEEN 16 AND 25 THEN 3
      WHEN @Level = 1 AND @RecordCount BETWEEN 26 AND 50 THEN 5      
      --Level 2
      WHEN @Level = 2 AND @RecordCount BETWEEN 2 AND 8 THEN 2
      WHEN @Level = 2 AND @RecordCount BETWEEN 9 AND 15 THEN 3
      WHEN @Level = 2 AND @RecordCount BETWEEN 16 AND 25 THEN 5
      WHEN @Level = 2 AND @RecordCount BETWEEN 26 AND 50 THEN 8      
      --Level 3
      WHEN @Level = 3 AND @RecordCount BETWEEN 2 AND 8 THEN 3
      WHEN @Level = 3 AND @RecordCount BETWEEN 9 AND 15 THEN 5
      WHEN @Level = 3 AND @RecordCount BETWEEN 16 AND 25 THEN 8
      WHEN @Level = 3 AND @RecordCount BETWEEN 26 AND 50 THEN 13      
ELSE
    100
  END)
0
evanburen
Asked:
evanburen
1 Solution
 
Guy Hengel [angelIII / a3]Billing EngineerCommented:
what about using TABLESAMPLE ?
http://msdn.microsoft.com/en-us/library/ms189108.aspx

anyhow, your requirement is not easy to be done in effient sql, at least not in 1 single pass, as you won't know the "total" until you actually run the sql.
now, say there are N users and T records
also, will there be T/N records for all N users?
what if T is 21, and N = 4, where to "cut", and which "user" to cut?
0
 
evanburenAuthor Commented:
Yea, it's pretty difficult.  I think I'll have to first count the total record count, then total for each UserID and then store those values in a table and then pull the sample based on those.

Thanks
0

Featured Post

Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

Tackle projects and never again get stuck behind a technical roadblock.
Join Now