Variable in TABLESAMPLE Clause

Hi

I am trying to base the size of my table sample based on the number of records in the table.  In this example, if my record count is between 2 and 8, then I want my sample size to be 2 records from the all of the records in the table.  If the record count is between 9 and 15, then the sample size is 3.  The error message I get is "Variables are not allowed in the TABLESAMPLE or REPEATABLE clauses"

CREATE PROCEDURE p_CashSample
      @RecordCount Int = NULL OUTPUT      
AS
BEGIN

SET @RecordCount = (SELECT COUNT(CashID) FROM PopulationBondsCash)

IF (@RecordCount BETWEEN 2 AND 8)
      SELECT TOP 2 * FROM PopulationBondsCash TABLESAMPLE (@RecordCount ROWS)
ELSE IF (@RecordCount BETWEEN 9 AND 15)
      SELECT TOP 3 * FROM PopulationBondsCash TABLESAMPLE (@RecordCount ROWS)
ELSE IF (@RecordCount BETWEEN 16 AND 25)
      SELECT TOP 5 * FROM PopulationBondsCash TABLESAMPLE (@RecordCount ROWS)
END
GO
evanburenAsked:
Who is Participating?
 
Anthony PerkinsCommented:
Or you could scrap the whole TABLESAMPLE approach and use the equivalent:
CREATE PROCEDURE p_CashSample
    @RecordCount Int = NULL OUTPUT
AS 
SELECT  @RecordCount = COUNT(*)
FROM    PopulationBondsCash

SELECT TOP (CASE WHEN @RecordCount BETWEEN 2 AND 8 THEN 2
                 WHEN @RecordCount BETWEEN 9 AND 15 THEN 3
                 WHEN @RecordCount BETWEEN 16 AND 25 THEN 5
                 ELSE 6
            END)
        *
FROM    PopulationBondsCash
ORDER BY NEWID()

Open in new window

0
 
JestersGrindCommented:
You will probably have to use dynamic SQL like this.

Greg


CREATE PROCEDURE p_CashSample
      @RecordCount Int = NULL OUTPUT      
AS
BEGIN

DECLARE @SQL VARCHAR(MAX),
		@Top VARCHAR(5)

SET @RecordCount = (SELECT COUNT(CashID) FROM PopulationBondsCash)

IF (@RecordCount BETWEEN 2 AND 8)
      SET @Top = '2'
ELSE IF (@RecordCount BETWEEN 9 AND 15)
      SET @Top = '3'
ELSE IF (@RecordCount BETWEEN 16 AND 25)
      SET @Top = '5'

SET @SQL = 'SELECT TOP ' + @Top + ' * FROM PopulationBondsCash TABLESAMPLE (' + CONVERT(VARCHAR(10), @RecordCount) + ' ROWS)'

EXECUTE sp_executesql @SQL

END
GO

Open in new window

0
 
kamalranjanCommented:
Hi evanburen,

I think you are using 2 techniques simultaneously to narrow down the resultset. Since you are already using TOP keyword with values like 2,3,5 etc, you do not need to use TABLESAMPLE keyword as it would also do the same as TOP keyword. So simply remove the TABLESAMPLE clause and your results will still be good as you want them to narrow down to.

Regards
Kamal
0
Cloud Class® Course: Microsoft Office 2010

This course will introduce you to the interfaces and features of Microsoft Office 2010 Word, Excel, PowerPoint, Outlook, and Access. You will learn about the features that are shared between all products in the Office suite, as well as the new features that are product specific.

 
evanburenAuthor Commented:
I think using the CASE statement with the NEWID() function is easier to read which is the one I'm going with although using dynamic SQL will also work.   Using TOP instead of TABLESAMPLE or NEWID() does not give you a random sample of records. It just gives you the first x records.
0
 
JestersGrindCommented:
@acperkins

Ordering by NEWID() is brilliant!  Thank you for posting that.  I learned something new.  

Greg

0
 
Anthony PerkinsCommented:
Greg,

Thanks.  As it turns out I learned something too, as I had to read up on TABLESAMPLE clause.  I am not sure if it was just me, but I was less than impressed by the results produced by it and in fact SQL Server BOL seemed to agree with me:

TABLESAMPLE SYSTEM returns an approximate percentage of rows and generates a random value for each physical 8-KB page in the table. Based on the random value for a page and the percentage specified in the query, a page is either included in the sample or excluded. Each page that is included returns all rows in the sample result set. For example, if you specify TABLESAMPLE SYSTEM 10 PERCENT, SQL Server returns all the rows on approximately 10 percent of the specified data pages of the table. If the rows are evenly distributed on the pages of the table, and if there is a sufficient number of pages in the table, the number of rows returned should approximate the sample size that is requested. However, because the random value that is generated for each page is independent of the values that are generated for any other page, a larger, or smaller, percentage of pages than have been requested might be returned. The TOP(n) operator can be used to limit the number of rows to a specified maximum.
...
The TABLESAMPLE SYSTEM clause should be used with some caution, and with some understanding of some of the implications of using sampling. For example, a join of two tables is likely to return a match for each row in both tables; however, if TABLESAMPLE SYSTEM is specified for either of the two tables, some rows returned from the unsampled table are unlikely to have a matching row in the sampled table. This behavior might lead you to suspect that a data consistency problem exists in the underlying tables, when the data is actually valid. Similarly, if TABLESAMPLE SYSTEM is specified for both tables that are joined, the perceived problem could be even worse.


Anthony
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.