Hi,
Thanks for that info. How would I create the loop to insert new rows (i.e. double the rows with distinct values) instead of updating the existing column values?
Thanks!
Main Topics
Browse All TopicsHi,
I am trying to generate a table containing 25 columns and populate with some sample data. My aim is to get around 300k rows. I have generated the table with 300 distinct rows of sample data using powerdesigner. Does anybody know a nice way of blowing up this table to 300k rows without duplicating any of the data?? I tried using the STUFF function to take the last 2 digits of the ID and append to each column as follows: -
insert into test1 (C1)
select stuff(C1,5,2,substring(con
from test1
But this method gives me duplicate values when I execute distinct count. Can some1 please help?
But this
This Question has been solved and asker verified All Experts Exchange premium technology solutions are available to subscription members.
Experts Exchange has been collecting answers to technology questions since 1996…3 million and counting! If you have a question, chances are we already have your answer.
If you can't find the exact answer you're looking for, ask our exclusive community of 50,000 experts. You’ll get a personalized answer from a trusted professional.
Thousands of free tech tips, tricks, how-to’s and tutorials are available in our peer reviewed articles section. See for yourself how smart our experts are, no login required.
Access the answers to your technology questions today.
30-day free trial. Register in 60 seconds.
Members of the expert community talk about why the experience at Experts Exchange is different than what you will find anywhere else.

Try it out and discover for yourself.
30-day free trial. Register in 60 seconds.
Join the community of experts here and help other tech pros by answering question in your area of expertise. You can earn FREE access to all Experts Exchange's premium features and resources.
Just as an example:
declare @max int
declare @cur int
select @cur = 1, @max = 300000 -- test with a smaller number first as I haven't tested it
while @cur < @max
begin
insert into TEST1 values (str(rand()), str(rand(@cur)), str(rand()), ....)
select @cur = @cur + 1
end
You can use different seed values to manipulate random generator as well, I just put @cur as seed as an example.
You have another problem as well. Doing 300K individual inserts in Sybase IQ will take a very long time. IQ simply is not made for fast individual record manipulation.
If you have to do all this inside IQ, create the table without any indexes (except the default FPs of course) and populate it. Start with a few thousand rows of random data. For now, leave the IDs blank. Then, use INSERT/SELECT to double the number of rows each time. If you start with 9375 seed rows and double it 5 times, you will wind up with 300K.
After you have all the rows you need, you can go back and assign the IDs in bulk. You can either use something like UPDATE table SET IDcol = NUMBER(*). Alternatively, you can use the ROWID with something like... UPDATE table SET IDcol = ROWID(table). Of course, if you need a character prefix or something else, you can add that into the expression you use in the UPDATE
Of course, you can always use scripts or another database (ASA or ASE) to generate the data, BCP it out and do a LOAD TABLE to get it into IQ.
Regards,
Bill
Hi Bill,
Many thanks for your reply. As always, your help is very much valued & appreciated.
Its a little more tricky, because my table has 25 columns which are char(6). I designed it with powerdesigner and also used PD to generate 300 rows of sample data. I have been using INSERT/SELECT to double the rows within IQ but the problem is that I end up with duplicate rows. I then tried to use RAND() within my INSERT/SELECT statement but I am still ending up with duplicate rows.
insert into test1 (C1, C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C15, C16, C17, C18, C19, C20, C21,
C22, C23, C24, C25)
select stuff(C1,5,2,substring(con
select stuff(C2,5,2,substring(con
select stuff(C3,5,2,substring(con
........................
etc
Running this statement twice now gives me 1200 rows with only 900 distinct values in each column.
select top 5 c1 from test1
c1
'H0A9HT'
'RU812M'
'FACK8O'
'UBC86S'
'3REOAK'
select top 5 c1 from test1 where ROWID("Test1") > 300
c1
'H0A9.5'
'RU81.5'
'FACK.5'
'UBC8.5'
'3REO.5'
select top 5 c1 from test1 where ROWID("Test1") > 600
c1
'H0A9.9'
'RU81.9'
'FACK.9'
'UBC8.9'
'3REO.9'
select top 5 c1 from test1 where ROWID("Test1") > 900
c1
'H0A9.9'
'RU81.9'
'FACK.9'
'UBC8.9'
'3REO.9'
Any ideas why my data is being duplicated and not giving me different values using the rand() function???
Thanks
You were not specific about what you wanted. If I understand correctly, you want each value in each of the 26 columns to be unique across all 300,000 rows.
Using a RAND function will not guarantee any degree of uniqueness at all. How could it? They are random numbers. It is a bit like rolling a die (one dice) six times and expecting to get a 1, 2, 3,, 4, 5, and 6. It ain't likely to happen; roughly a 1 in 65 chance (precisely 5*4*3*2 in 6^5 chances).
You might want to examine your real requirements. It would be very unusual for there to be 300,000 rows of 26 six character values and not have any duplicates. Unless you are generating lottery tokens or something, real life will give you dupes much as you are getting with your random number generation.
If you really need each value to be unique, you will have to create each and every value explicitly. 26 columns cries out for each column to begin with a character of the alphabet. The remaining 5 characters can be a zero-padded hex number. You can adjust the starting offset for each one so they are not in total lockstep; you have plenty of room since 5 hex digits gives you just over 1 million possible values.
So what I would do is create the list of numbers from 1 to 300,000, then generate each column value from that by adding the column prefix (A-Z) and calculating a hex number from the ID (1-300000) and the offset for that column.
You can do this by creating a single column table with the IDs and a helper table with 26 columns of INT and the offset for each one. You can then generate the entire target table in an INSERT/SELECT by joining your ID table with the Column Offset table and using an expression to generate each column value, e.g.
INSERT INTO TargetTable
SELECT ID,
'A' + RIGHT(INTTOHEX(ID+CS.COL_A
'B'.........
FROM ID_Table,
Column_Offset CS
Regards,
Bill
I agree that "Using a RAND function will not guarantee the uniqueness". But possibility of uniqueness depends on how that function is used. A dice has 6 sides, a two digit number has only 99 possibilities. If you use rand function 6 times to generate 6 different characters, you would have 308,915,776 possibilities and the chance of uniqueness would be high enough even though it is not guaranteed.
I think using some seed rows is not a good idea.
The problem is that the procedure SybaseUK is using does not generate six individual random characters for each of the 25 fields for each row. It looks as if he is simply stuffing a couple of characters in. Besides, if you did try calling RAND 150 times for each row, I suspect there would be a performance issue. I am pretty sure that RAND is an ASA function, not an IQ function, so all the data would have to pass back and forth between the two engines. This kind of thing should be better documented in IQ but as of now it is more experiencial than anything else.
Bottom line, if you need guaranteed uniqueness, you cannot use random functions.
Regards,
Bill
I might be wrong, but my understanding is this is a one off process to run and prepare a sample data for a test. So performance shouldn't be an issue. Also data can be prepared on another server and then loaded into IQ. Also if it is the case and there are a few duplicate rows, they can be deleted.
You are right saying that instead of using seed rows generating all rows at once would be a better approach. I am not really sure whether using rand function to create six character data is easier or your method is easier. But, I think they are both valid methods.
There might be around 300 duplicate rows out of 300K rows when rand() is used, but they can be deleted easily. Also it depends on how many charaters are used. If small caps and numbers are used, the possibility of uniqueness would be much higher.
Hi alpmoon,
I have tried using the loop method you have suggested but I am still getting the same results as I did without the loop. Can you please look at my code and tell me whats wrong?
declare @max int
declare @cur int
declare @char char(2)
select @cur = 1, @max = 3
while @cur < @max
begin
select @char = substring(convert(char, rand(@cur)),2,2)
insert into Test1 (C1)
select
stuff(C1,5,2,@char) from Test1
select @cur = @cur + 1
end
As you can see I am using stuff to append the last 2 random numbers to replace the last 2 characters of C1. I cannot use literal values therefore I am using INSERT/SELECT...
Any help appreciated please....
Hi Bill,
alpmoon is right in that the sample data does not need to 100% unique and I agree with him. However, the loop method isn't working presently and I am stuck. I desperately need to get around 300k rows and as alpmoon suggested, I can easily remove any duplicates.
The problem I am facing is that the rand() function is not producing random values which I can then stuff into C1, C2, C3 etc....
I think the main issue is rand() generates a number between 0 and 1. So you should multiply that number and find a way to convert it to a character. One way of doing it:
@char1 = char(rand()*100000) % 26) + 65)
This should produce a character between A and Z. I haven't tested it. Some corrections might be needed.
Then you can repeat this in the loop:
declare @max int
declare @cur int
declare @char2 char(2)
select @cur = 1, @max = 3
while @cur < @max
begin
select @char2 = char(rand()*100000) % 26) + 65)+char(rand()*100000) % 26) + 65)
insert into Test1 (C1)
select stuff(C1,5,2,@char2) from Test1
select @cur = @cur + 1
end
But, you can also repeat it six times to generate the whole string:
select @char6 = char(rand()*100000) % 26) + 65)+char(rand()*100000) % 26) + 65)+char(rand()*100000) % 26) + 65)+char(rand()*100000) % 26) + 65)+char(rand()*100000) % 26) + 65)+char(rand()*100000) % 26) + 65)
and insert it rather than using insert .. select
Bill you are correct that RAND() will not guarantee uniqueness but a combination of 25 columns with random values appended should be unique (across a single row)....
INSERT/SELECT is doing this process in batches of 300 and thats the reason I think I am not getting a true random value. I need a way of stuffing a new random value (even though there are only 99 possibilities) into the respective columns for every row.
Can you suggest any solution please?
P.S. I'm just trying to generate a modest size table for testing purposes...
Now, I understand what you want better. If you only want to generate only two digit, you should multiply the result of rand() with at least 100:
insert into test1 (C1, C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C15, C16, C17, C18, C19, C20, C21,
C22, C23, C24, C25)
select stuff(C1,5,2,substring(con
select stuff(C2,5,2,substring(con
select stuff(C3,5,2,substring(con
....
or use module additionally:
insert into test1 (C1, C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C15, C16, C17, C18, C19, C20, C21,
C22, C23, C24, C25)
select stuff(C1,5,2,substring(con
select stuff(C2,5,2,substring(con
select stuff(C3,5,2,substring(con
......
alpmoon,
Yeah, this gets a log clearer now that we know sybaseUK does not each column value to be unique; just the entire row.
That code will work 90 percent of the time but you still have a problem with single digit numbers. You need to ensure there is a leading zero or you will wind up with about 10 percent of the values being too short.
Try...
select stuff(C1,5,2,right('0'+con
Come to think of it, STUFF is slower than substring and concatenation. We could use...
select substring(C1,1,4)+right('0
Finally, instead of 100000, I would choose a prime number in that vicinity. Something tells me you'll get better (more random) results than using a multiplier that is an even multiple of the modulo.
Regards,
Bill
The text of the syntax error might be useful but I think I know what the issue is.
The modulo operator only works on integers. You will have to add a CONVERT(INT,..... to the expression to get it to function properly.
You know, since you are going to grab two digits by position and then convert them into a string anyway, why bother with the modulo. Try something like....
select stuff(C1,5,2,substring(con
Rega
Bill
Hi Bill,
This is what I began with!
I appreciate that I will get some column values which will be duplicates but the chances of the entire row being a duplicate (or identical) should be should be very less likely. But I am getting many rows which are completely identical so I know there is a bug in my code or the loop.
I was thinking the purpose of the loop is to do single row by row inserts and thus giving me a true random value each time. But I am not doing single row by row inserts, rather I am doubling the amount of rows.
How do I do single row by row inserts but only change/replace the last 2 characters of each value with a random value????
This is getting very frustrating now.
Thanks for all your help.
Hi Guys,
Please disregard all the previous messages. I have decided to scrap my 300 rows of sample data and start from scratch.
My table has an ID (auto-identity) column and 25 columns of char(6). I would like to populate this table with random data and have atleast 300k rows that are unique....please advise the best way to go about
many thanks!
I am correcting the syntax there:
declare @max int
declare @cur int
declare @char6 char(6)
select @cur = 1, @max = 300
while @cur < @max
begin
select @char6 = char((rand()*100000 % 26) + 65)+char((rand()*100000 % 26) + 65) +
char((rand()*100000 % 26) + 65)+char((rand()*100000 % 26) + 65) +
char((rand()*100000 % 26) + 65)+char((rand()*100000 % 26) + 65)
insert into Test1 (C1)
select @char6
select @cur = @cur + 1
end
I have tested it for one column and it works fine(covers A to Z). You can repeat @char6 variable as many as you can.
Hi guys,
I now have the following and it works fine but it takes a very long time and also eventually runs out of space. Is there a quicker way of doing this?
declare @max int
declare @cur int
//declare @char6 char(6)
select @cur = 1, @max = 100000
while @cur < @max
begin
insert into Test1 (C1, C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C15, C16, C17, C18, C19, C20, C21,
C22, C23, C24, C25)
select stuff(inttohex(convert(int
stuff(inttohex(convert(int
stuff(inttohex(convert(int
stuff(inttohex(convert(int
stuff(inttohex(convert(int
stuff(inttohex(convert(int
stuff(inttohex(convert(int
stuff(inttohex(convert(int
stuff(inttohex(convert(int
stuff(inttohex(convert(int
stuff(inttohex(convert(int
stuff(inttohex(convert(int
stuff(inttohex(convert(int
stuff(inttohex(convert(int
stuff(inttohex(convert(int
stuff(inttohex(convert(int
stuff(inttohex(convert(int
stuff(inttohex(convert(int
stuff(inttohex(convert(int
stuff(inttohex(convert(int
stuff(inttohex(convert(int
stuff(inttohex(convert(int
stuff(inttohex(convert(int
stuff(inttohex(convert(int
stuff(inttohex(convert(int
select @cur = @cur + 1
end
Actually the code I have tested was for ASE. If you have an ASE server. You can extend it for 25 columns and run for 300K rows. Then bcp out the data and insert into IQ. Inserting data into IQ one row at a time is not a good idea at all as Bill said before.
Or if you have ASA, you can do it in ASA as well. If you don't both, directing output to a text file and then eliminating unnecessary lines and tabs by sed might be a better option without writing a shell script.
All I can do is repeat my suggestion from early in the thread....
So what I would do is create the list of numbers from 1 to 300,000, then generate each column value from that by adding the column prefix (A-Z) and calculating a hex number from the ID (1-300000) and the offset for that column.
You can do this by creating a single column table with the IDs and a helper table with 26 columns of INT and the offset for each one. You can then generate the entire target table in an INSERT/SELECT by joining your ID table with the Column Offset table and using an expression to generate each column value, e.g.
INSERT INTO TargetTable
SELECT ID,
'A' + RIGHT(INTTOHEX(ID+CS.COL_A
'B'.........
FROM ID_Table,
Column_Offset CS
This will do all the inserts in one pass and be at least 100 and probably closer to 1000 times faster than the individual inserts you were doing.
How you create the ID_Table rows is irrelevant. You could create a list of 300000 numbers in a shell script, stuff it in a flat file, and then do a LOAD TABLE to get it in.
Regards,
Bill
Hi,
I have now inserted the data into ASE and I have unique rows of data...
I want to bcp the data out and have it pipe delimited and row delimited by new line. Can you please show me how I can format the data so it is pipe delimited and row delimited by new line? I am using windows.....
I will then load this flat file into IQ
Thanks,
Hi,
Can anybody explain why a simple query like the following takes around 6 seconds in ASE but is taking around 4 mins in IQ????????????
select C1, count(*)
from test1
group by C1
having count(*) > 1
I have loaded the IQ table by generating the random data in ASE and BCP'ing the data out. I am very confused why it is taking so long for this query to run.....
Can some1 please have a look at my query plan also and help me?
Business Accounts
Answer for Membership
by: alpmoonPosted on 2009-05-06 at 04:23:58ID: 24313274
You can use rand() function for that purpose. There is a short article about the usage:
il?id=648
http://www.sybase.com/deta
You can develop a similar loop to create rows.