creating a stratified systematic sample of a table
Posted on 2006-05-18
I have a large table from which i write-out a text file so that users of the data can read the text file into the statistical software of their choice (e.g. SAS, SPSS, Stata, etc). The dataset i have is rather large (3.1 million records; 145 columns), and, in testing, i found the statistical software to be very slow. I'd like to create a sample dataset that has a thousandth of the records (3.1 K records), but is still representative of the table as a whole. This would allow users to more easily develop sytax in the language of their stat software and test it in a reasonable amount of time, then when they're confident that it works, run it on the whole dataset.
I want to create a query that orders the table by state, then county, select a random sample from the first 1000 of these ordered records, then take every thousandth record starting from the randomly-selected record. I want write these data to a text file, either directly or from a temporary table or a view.
Any ideas on how to write the query that selects a random sample of the first 1000 ordered records, then selects every thousandth record after that?
thanks in advance,