creating a stratified systematic sample of a table

Posted on 2006-05-18
Last Modified: 2008-03-04
I have a large table from which i write-out a text file so that users of the data can read the text file into the statistical software of their choice (e.g. SAS, SPSS, Stata, etc).  The dataset i have is rather large (3.1 million records; 145 columns), and, in testing, i found the statistical software to be very slow.  I'd like to create a sample dataset that has a thousandth of the records (3.1 K records), but is still representative of the table as a whole.  This would allow users to more easily develop sytax in the language of their stat software and test it in a reasonable amount of time, then when they're confident that it works, run it on the whole dataset.

I want to create a query that orders the table by state, then county, select a random sample from the first 1000 of these ordered records, then take every thousandth record starting from the randomly-selected record.  I want write these data to a text file, either directly or from a temporary table or a view.

Any ideas on how to write the query that selects a random sample of the first 1000 ordered records, then selects every thousandth record after that?

thanks in advance,

Question by:med39
    LVL 50

    Expert Comment

    which version of sql server are you using,...

    using the NEWID() function to order the table and taking the top 3100 rows maybe the cheapest "random" method

    select top 3100 *
      from yourtable
     order by newid()

    if you really want to start at a particular row and then tale every 1000th then we need to determine what we
    can use for sequencing...

    you can use the bcp utility   to create the actual file from the eventual query..
    LVL 2

    Author Comment


    I'm using vesion 8.0 of the Query Analyzer and the Enterprise Manager.

    LVL 50

    Accepted Solution

    if you must go the random start then every 1000th.
    then this is probably one of those times i'd actually recommend using a cursor..

    so declare a cursor for the select statement

    open the cursor

    and then use
    to get the random start position

    and then in a while loop

    to get the other 3099ish rows..


    declare cursor samplecur for
      select * from yourtable
         order by state,county,...

    open samplecur

    (Declare @startpos int
     set @startpos = rand(datepart(ms,getdate())* 1000

    Fetch Absolute @startpos
      into .... (list of bvariables)

    while @@fetch_status = 0
    Insert into SampleTable
      values (list of variables)

       fetch relative 1000
         into ... list of variable

    then you can use the BCP command to output the sample table to a text file..

    LVL 2

    Author Comment


    Thanks, LowFatSpread!  (love the name!)


    Featured Post

    How to run any project with ease

    Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
    - Combine task lists, docs, spreadsheets, and chat in one
    - View and edit from mobile/offline
    - Cut down on emails

    Join & Write a Comment

    In this article—a derivative of my blog post (—I will explore a few different perspectives on which week today's date falls within using Microsoft SQL Server. First, to frame this stu…
    Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
    Via a live example, show how to shrink a transaction log file down to a reasonable size.
    Viewers will learn how the fundamental information of how to create a table.

    754 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    16 Experts available now in Live!

    Get 1:1 Help Now