We help IT Professionals succeed at work.

dummy data

If you want *lot* of dummy data in the test environment quickly for testing, what are some options available to accomplish the same, given that constrains should be obeyed and kept. can it be programmed?
Comment
Watch Question

Database Consultant
Top Expert 2009
Commented:
Adam MenkesC# ASP.NET Developer
Top Expert 2010
Commented:
Yes, it can be programmed, but you would have to setup the rules in code. For example, if dummy data needs to have A-Z only, with or without spaces, no numbers, with numbers, minimum / maximum length, etc, you would need to program all that.

The also have these programs (http://www.red-gate.com/products/sql_data_generator/index.htm and http://www.sqlmanager.net/products/mssql/datagenerator, and google for others)

You could also read the settings from INFORMATION_SCHEMA.COLUMNS to determine your min/max values and whether to do NULL or not.

You can randomize values as well. For example, suppose your string was 'abcdefghij' and you only wanted to randomize data with these characters. You could use a modulus (%) function with rand() to get a substring for each character and also randomize the character length.

Adam MenkesC# ASP.NET Developer
Top Expert 2010
Commented:
Here is an example selecting a random number between 1000 and 1500. I have it looping 100 times.
DECLARE @minValue int
DECLARE @maxValue int
DECLARE @result   int
DECLARE @ctr INT

SET @minValue = 1000
SET @maxValue = 1500
SET @ctr = 1

WHILE @ctr < 100 BEGIN
  SELECT @result  = @minValue + CEILING(RAND(CAST(NEWID() AS VARBINARY)) * (@maxvalue - @minvalue))
  PRINT @result
  SET @ctr = @ctr + 1
END

Open in new window

Adam MenkesC# ASP.NET Developer
Top Expert 2010
Commented:
Here is a sample generating text with a minimum of 3 characters and a maximum of 15. I also added a space and a period to the allowed values (A-Z, a-z).

Some sample results:
byoOOkGKWzwBHY
gbiAHO
ICNDpu
VBIkjYryWpJZM
pwSwh
GKf
HCFrUseUpxHPu
fRYhokYRGhiGpN
Gdpez.AlcKF
MDoFostGay
uAcFsCoKPSM
lcdOssRjSKND
CNDnFyZm
WPyUdMgVN
MhgauwdnnXe
Ncp
jjL
MteO.yqJQUfq
DECLARE @minValue int
DECLARE @maxValue int
DECLARE @ctr      int
DECLARE @cnt      int
DECLARE @innerCnt int
DECLARE @innerCtr int
DECLARE @tempChar CHAR(1)
DECLARE @result   VARCHAR(20)

SET NOCOUNT ON

DECLARE @charTable TABLE (
  ID INT IDENTITY NOT NULL PRIMARY KEY,
  Data CHAR(1)
)
-- Add A-Z and a-z
SET @ctr = 65
WHILE @ctr <= 122 BEGIN
  IF @ctr NOT BETWEEN 91 AND 96 BEGIN
    INSERT INTO @charTable 
    SELECT CHAR(@ctr)
  END  
  SET @ctr = @ctr + 1
END

-- Add space and period 
INSERT INTO @charTable 
SELECT CHAR(32) UNION ALL 
SELECT CHAR(46)

SELECT * FROM @charTable
SELECT @cnt = COUNT(*) FROM @charTable

SET @ctr = 1
SET @minValue = 3
SET @maxValue = 15

WHILE @ctr < 100 BEGIN
  SET @innerCnt = 1
  SET @result = ''
  SET @innerCtr = @minValue + FLOOR(RAND(CAST(NEWID() AS VARBINARY)) * (@maxvalue - @minvalue))
  WHILE @innerCnt <= @innerCtr BEGIN
    SELECT @tempChar = Data 
      FROM @charTable
      WHERE ID = (1 + (FLOOR(RAND(CAST(NEWID() AS VARBINARY)) * @cnt)))
    SET @result = @result + @tempChar  
    SET @innerCnt = @innerCnt + 1
  END
  PRINT @result
  SET @ctr = @ctr + 1
END

Open in new window

Author

Commented:
so you would do this for each and every column- so it is a lot of programming, right?

referential integrity gets trickier?

Adam MenkesC# ASP.NET Developer
Top Expert 2010
Commented:
Well, I would write a function that takes parameters for specific things, and use INFORMATION_SCHEMA.COLUMNS for min/max ranges and to determine null values or not. I might even setup rules so that character data cannot start with space, period, other characters.

Yes, RI gets trickier. You would have to generate the data on the master table, and then do random selection from that table to insert into the child table.

For simple random data, you can use the methods I wrote above. For dates, it is basically the same thing. You would set your date ranges (min / max) and get a DATEDIFF and add it to the minimum date.

If you are going to have some complex scenarios, you might look for either a free data generator or purchase one from one of the links above or search google and make sure it meets the criteria you need.

For around $300, it would save hours and hours of programming, testing, debugging, etc that may well be worth the investment if you are going to have massive amounts of data.

But, going back to the function, no I would not do it for each column. I would read the schema and generate data. The function / stored procedure would have to loop through the columns to build an insert statement, but it would be something like this:

function GenerateData(tablename)

rather than
insert into tablename (field1, field2, field3) values (generateVarchar(3,15), generateint(1000,1500), generatebool())




Author

Commented:
great idea. Thanks much experts!