Link to home
Start Free TrialLog in
Avatar of evanburen
evanburenFlag for United States of America

asked on

Variable in TABLESAMPLE Clause

Hi

I am trying to base the size of my table sample based on the number of records in the table.  In this example, if my record count is between 2 and 8, then I want my sample size to be 2 records from the all of the records in the table.  If the record count is between 9 and 15, then the sample size is 3.  The error message I get is "Variables are not allowed in the TABLESAMPLE or REPEATABLE clauses"

CREATE PROCEDURE p_CashSample
      @RecordCount Int = NULL OUTPUT      
AS
BEGIN

SET @RecordCount = (SELECT COUNT(CashID) FROM PopulationBondsCash)

IF (@RecordCount BETWEEN 2 AND 8)
      SELECT TOP 2 * FROM PopulationBondsCash TABLESAMPLE (@RecordCount ROWS)
ELSE IF (@RecordCount BETWEEN 9 AND 15)
      SELECT TOP 3 * FROM PopulationBondsCash TABLESAMPLE (@RecordCount ROWS)
ELSE IF (@RecordCount BETWEEN 16 AND 25)
      SELECT TOP 5 * FROM PopulationBondsCash TABLESAMPLE (@RecordCount ROWS)
END
GO
SOLUTION
Avatar of JestersGrind
JestersGrind
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Avatar of Anthony Perkins
Anthony Perkins
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Hi evanburen,

I think you are using 2 techniques simultaneously to narrow down the resultset. Since you are already using TOP keyword with values like 2,3,5 etc, you do not need to use TABLESAMPLE keyword as it would also do the same as TOP keyword. So simply remove the TABLESAMPLE clause and your results will still be good as you want them to narrow down to.

Regards
Kamal
Avatar of evanburen

ASKER

I think using the CASE statement with the NEWID() function is easier to read which is the one I'm going with although using dynamic SQL will also work.   Using TOP instead of TABLESAMPLE or NEWID() does not give you a random sample of records. It just gives you the first x records.
@acperkins

Ordering by NEWID() is brilliant!  Thank you for posting that.  I learned something new.  

Greg

Greg,

Thanks.  As it turns out I learned something too, as I had to read up on TABLESAMPLE clause.  I am not sure if it was just me, but I was less than impressed by the results produced by it and in fact SQL Server BOL seemed to agree with me:

TABLESAMPLE SYSTEM returns an approximate percentage of rows and generates a random value for each physical 8-KB page in the table. Based on the random value for a page and the percentage specified in the query, a page is either included in the sample or excluded. Each page that is included returns all rows in the sample result set. For example, if you specify TABLESAMPLE SYSTEM 10 PERCENT, SQL Server returns all the rows on approximately 10 percent of the specified data pages of the table. If the rows are evenly distributed on the pages of the table, and if there is a sufficient number of pages in the table, the number of rows returned should approximate the sample size that is requested. However, because the random value that is generated for each page is independent of the values that are generated for any other page, a larger, or smaller, percentage of pages than have been requested might be returned. The TOP(n) operator can be used to limit the number of rows to a specified maximum.
...
The TABLESAMPLE SYSTEM clause should be used with some caution, and with some understanding of some of the implications of using sampling. For example, a join of two tables is likely to return a match for each row in both tables; however, if TABLESAMPLE SYSTEM is specified for either of the two tables, some rows returned from the unsampled table are unlikely to have a matching row in the sampled table. This behavior might lead you to suspect that a data consistency problem exists in the underlying tables, when the data is actually valid. Similarly, if TABLESAMPLE SYSTEM is specified for both tables that are joined, the perceived problem could be even worse.


Anthony