Solved

Sybase IQ generating distinct rows for dummy table

Posted on 2009-05-06
32
2,678 Views
Last Modified: 2012-05-06
Hi,

I am trying to generate a table containing 25 columns and populate with some sample data. My aim is to get around 300k rows. I have generated the table with 300 distinct rows of sample data using powerdesigner. Does anybody know a nice way of blowing up this table to 300k rows without duplicating any of the data?? I tried using the STUFF function to take the last 2 digits of the ID and append to each column as follows: -

insert into test1 (C1)
select  stuff(C1,5,2,substring(convert(char, ID), length(convert(varchar, ID))-1,2))
from test1

But this method gives me duplicate values when I execute distinct count. Can some1 please help?

But this


create table TEST1 (

   ID                   int                            null default autoincrement,

   C1                   char(6)                        null,

   C2                   char(6)                        null,

   C3                   char(6)                        null,

   C4                   char(6)                        null,

   C5                   char(6)                        null,

   C6                   char(6)                        null,

   C7                   char(6)                        null,

   C8                   char(6)                        null,

   C9                   char(6)                        null,

   C10                  char(6)                        null,

   C11                  char(6)                        null,

   C12                  char(6)                        null,

   C13                  char(6)                        null,

   C14                  char(6)                        null,

   C15                  char(6)                        null,

   C16                  char(6)                        null,

   C17                  char(6)                        null,

   C18                  char(6)                        null,

   C19                  char(6)                        null,

   C20                  char(6)                        null,

   C21                  char(6)                        null,

   C22                  char(6)                        null,

   C23                  char(6)                        null,

   C24                  char(6)                        null,

   C25                  char(6)                        null

);
 

insert into TEST1 (C1, C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C15, C16, C17, C18, C19, C20, C21, C22, C23, C24, C25) values ('H0A9HT', 'SGYO53', '27V8L2', '9EM508', ' MX12I', '5KP34L', '2S40AO', '2XIYW6', '0G5COU', 'T2QLOI', '4AQXA8', 'YIA80X', ' 6L8S2', 'IFR7NN', '8223LK', 'XHEKTW', '304GVR', 'W EMVT', '08G1JX', 'LT3CVB', 'GT38TX', '21DHHX', '5MKAWW', 'BNCISN', '1XDDJ2');
 

insert into TEST1 (C1, C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C15, C16, C17, C18, C19, C20, C21, C22, C23, C24, C25) values ('RU812M', 'M6 7WG', '3HUU4A', 'QU55G9', 'RJMNOY', 'WRDJMH', 'V8KGCT', 'UWONM1', '6C3O Q', '9Y  6G', '2H927L', 'TVUT U', 'R5RG0N', '9SUJ37', 'VTOO2Y', '4XPLMN', 'OLMPU8', 'WU6VLD', 'P70A3X', ' 12VPO', '0MI3TX', 'G2E1W2', 'NNBMI5', 'NOH5C9', '73MKPD');

Open in new window

0
Comment
Question by:SybaseUk
  • 17
  • 8
  • 7
32 Comments
 
LVL 13

Expert Comment

by:alpmoon
Comment Utility
You can use rand() function for that purpose. There is a short article about the usage:

http://www.sybase.com/detail?id=648

You can develop a similar loop to create rows.
0
 

Author Comment

by:SybaseUk
Comment Utility
Hi,

Thanks for that info. How would I create the loop to insert new rows (i.e. double the rows with distinct values) instead of updating the existing column values?

Thanks!
0
 
LVL 13

Expert Comment

by:alpmoon
Comment Utility
Just as an example:

declare @max int
declare @cur int

select @cur = 1, @max = 300000 -- test with a smaller number first as I haven't tested it

while @cur < @max
begin

insert into TEST1 values (str(rand()), str(rand(@cur)), str(rand()), ....)

select @cur = @cur + 1

end

You can use different seed values to manipulate random generator as well, I just put @cur as seed as an example.
0
 
LVL 19

Expert Comment

by:grant300
Comment Utility
You have another problem as well.  Doing 300K individual inserts in Sybase IQ will take a very long time.  IQ simply is not made for fast individual record manipulation.

If you have to do all this inside IQ, create the table without any indexes (except the default FPs of course) and populate it.  Start with a few thousand rows of random data.  For now, leave the IDs blank.  Then, use INSERT/SELECT to double the number of rows each time.  If you start with 9375 seed rows and double it 5 times, you will wind up with 300K.

After you have all the rows you need, you can go back and assign the IDs in bulk.    You can either use something like UPDATE table SET IDcol = NUMBER(*).  Alternatively, you can use the ROWID with something like... UPDATE table SET IDcol = ROWID(table).  Of course, if you need a character prefix or something else, you can add that into the expression you use in the UPDATE

Of course, you can always use scripts or another database (ASA or ASE) to generate the data, BCP it out and do a LOAD TABLE to get it into IQ.

Regards,
Bill
0
 

Author Comment

by:SybaseUk
Comment Utility
Hi Bill,

Many thanks for your reply. As always, your help is very much valued & appreciated.

Its a little more tricky, because my table has 25 columns which are char(6). I designed it with powerdesigner and also used PD to generate 300 rows of sample data. I have been using INSERT/SELECT to double the rows within IQ but the problem is that I end up with duplicate rows. I then tried to use RAND() within my INSERT/SELECT statement but I am still ending up with duplicate rows.

insert into test1 (C1, C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C15, C16, C17, C18, C19, C20, C21,
C22, C23, C24, C25)
select  stuff(C1,5,2,substring(convert(char, rand()), 2,2))
select  stuff(C2,5,2,substring(convert(char, rand()), 2,2))
select  stuff(C3,5,2,substring(convert(char, rand()), 2,2))
........................
etc

Running this statement twice now gives me 1200 rows with only 900 distinct values in each column.

select top 5 c1 from test1

c1
'H0A9HT'
'RU812M'
'FACK8O'
'UBC86S'
'3REOAK'

select top 5 c1 from test1 where ROWID("Test1") > 300

c1
'H0A9.5'
'RU81.5'
'FACK.5'
'UBC8.5'
'3REO.5'

select top 5 c1 from test1 where ROWID("Test1") > 600

c1
'H0A9.9'
'RU81.9'
'FACK.9'
'UBC8.9'
'3REO.9'

select top 5 c1 from test1 where ROWID("Test1") > 900

c1
'H0A9.9'
'RU81.9'
'FACK.9'
'UBC8.9'
'3REO.9'

Any ideas why my data is being duplicated and not giving me different values using the rand() function???

Thanks
0
 

Author Comment

by:SybaseUk
Comment Utility
applmoon,

I need to use stuff to append the first 2 s.f's of the generated random value to the existing column C1, C2, C3 respectively.....

So I can't use literal values, I must use INSERT/SELECT. However, I am still getting duplicates........
0
 
LVL 19

Accepted Solution

by:
grant300 earned 250 total points
Comment Utility
You were not specific about what you wanted.  If I understand correctly, you want each value in each of the 26 columns to be unique across all 300,000 rows.

Using a RAND function will not guarantee any degree of uniqueness at all.  How could it?  They are random numbers.  It is a bit like rolling a die (one dice) six times and expecting to get a 1, 2, 3,, 4, 5, and 6.  It ain't likely to happen; roughly a 1 in 65 chance (precisely 5*4*3*2 in 6^5 chances).

You might want to examine your real requirements.  It would be very unusual for there to be 300,000 rows of 26 six character values and not have any duplicates.  Unless you are generating lottery tokens or something, real life will give you dupes much as you are getting with your random number generation.

If you really need each value to be unique, you will have to create each and every value explicitly.  26 columns cries out for each column to begin with a character of the alphabet.  The remaining 5 characters can be a zero-padded hex number.  You can adjust the starting offset for each one so they are not in total lockstep; you have plenty of room since 5 hex digits gives you just over 1 million possible values.

So what I would do is create the list of numbers from 1 to 300,000, then generate each column value from that by adding the column prefix (A-Z) and calculating a hex number from the ID (1-300000) and the offset for that column.

You can do this by creating a single column table with the IDs and a helper table with 26 columns of INT and the offset for each one.  You can then generate the entire target table in an INSERT/SELECT by joining your ID table with the Column Offset table and using an expression to generate each column value, e.g.

INSERT INTO TargetTable
SELECT ID,
                'A' + RIGHT(INTTOHEX(ID+CS.COL_A),5),
                'B'.........
FROM ID_Table,
            Column_Offset CS

Regards,
Bill
0
 
LVL 13

Expert Comment

by:alpmoon
Comment Utility
I agree that "Using a RAND function will not guarantee the uniqueness". But possibility of uniqueness depends on how that function is used. A dice has 6 sides, a two digit number has only 99 possibilities. If you use rand function 6 times to generate 6 different characters, you would have 308,915,776 possibilities and the chance of uniqueness would be high enough even though it is not guaranteed.

I think using some seed rows is not a good idea.
0
 
LVL 19

Expert Comment

by:grant300
Comment Utility
The problem is that the procedure SybaseUK is using does not generate six individual random characters for each of the 25 fields for each row.  It looks as if he is simply stuffing a couple of characters in.  Besides, if you did try calling RAND 150 times for each row, I suspect there would be a performance issue.  I am pretty sure that RAND is an ASA function, not an IQ function, so all the data would have to pass back and forth between the two engines.  This kind of thing should be better documented in IQ but as of now it is more experiencial than anything else.

Bottom line, if you need guaranteed uniqueness, you cannot use random functions.

Regards,
Bill
0
 
LVL 13

Expert Comment

by:alpmoon
Comment Utility
I might be wrong, but my understanding is this is a one off process to run and prepare a sample data for a test. So performance shouldn't be an issue. Also data can be prepared on another server and then loaded into IQ. Also if it is the case and there are a few duplicate rows, they can be deleted.

You are right saying that instead of using seed rows generating all rows at once would be a better approach. I am not really sure whether using rand function to create six character data is easier or your method is easier. But, I think they are both valid methods.

There might be around 300 duplicate rows out of 300K rows when rand() is used, but they can be deleted easily. Also it depends on how many charaters are used. If small caps and numbers are used, the possibility of uniqueness would be much higher.
0
 

Author Comment

by:SybaseUk
Comment Utility
Hi alpmoon,

I have tried using the loop method you have suggested but I am still getting the same results as I did without the loop. Can you please look at my code and tell me whats wrong?

declare @max int
declare @cur int
declare @char char(2)

select @cur = 1, @max = 3

while @cur < @max
begin
select @char = substring(convert(char, rand(@cur)),2,2)

insert into Test1 (C1)
select
stuff(C1,5,2,@char) from Test1

select @cur = @cur + 1

end

As you can see I am using stuff to append the last 2 random numbers to replace the last 2 characters of C1. I cannot use literal values therefore I am using INSERT/SELECT...

Any help appreciated please....
0
 

Author Comment

by:SybaseUk
Comment Utility
Hi Bill,

alpmoon is right in that the sample data does not need to 100% unique and I agree with him. However, the loop method isn't working presently and I am stuck. I desperately need to get around 300k rows and as alpmoon suggested, I can easily remove any duplicates.

The problem I am facing is that the rand() function is not producing random values which I can then stuff into C1, C2, C3 etc....

0
 
LVL 13

Assisted Solution

by:alpmoon
alpmoon earned 250 total points
Comment Utility
I think the main issue is rand() generates a number between 0 and 1. So you should multiply that number and find a way to convert it to a character. One way of doing it:

@char1 = char(rand()*100000) % 26) + 65)

This should produce a character between A and Z. I haven't tested it. Some corrections might be needed.

Then you can repeat this in the loop:

declare @max int
declare @cur int
declare @char2 char(2)

select @cur = 1, @max = 3

while @cur < @max
begin
select @char2 = char(rand()*100000) % 26) + 65)+char(rand()*100000) % 26) + 65)

insert into Test1 (C1)
select stuff(C1,5,2,@char2) from Test1

select @cur = @cur + 1

end

But, you can also repeat it six times to generate the whole string:

select @char6 = char(rand()*100000) % 26) + 65)+char(rand()*100000) % 26) + 65)+char(rand()*100000) % 26) + 65)+char(rand()*100000) % 26) + 65)+char(rand()*100000) % 26) + 65)+char(rand()*100000) % 26) + 65)

and insert it rather than using insert .. select
0
 

Author Comment

by:SybaseUk
Comment Utility
Bill you are correct that RAND() will not guarantee uniqueness but a combination of 25 columns with random values appended should be unique (across a single row)....

INSERT/SELECT is doing this process in batches of 300 and thats the reason I think I am not getting a true random value. I need a way of stuffing a new random value (even though there are only 99 possibilities) into the respective columns for every row.

Can you suggest any solution please?

P.S. I'm just trying to generate a modest size table for testing purposes...
0
 
LVL 13

Expert Comment

by:alpmoon
Comment Utility
Now, I understand what you want better. If you only want to generate only two digit, you should multiply the result of rand() with at least 100:

insert into test1 (C1, C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C15, C16, C17, C18, C19, C20, C21,
C22, C23, C24, C25)
select  stuff(C1,5,2,substring(convert(char, rand()*100), 2,2))
select  stuff(C2,5,2,substring(convert(char, rand()*100), 2,2))
select  stuff(C3,5,2,substring(convert(char, rand()*100), 2,2))
....

or use module additionally:

insert into test1 (C1, C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C15, C16, C17, C18, C19, C20, C21,
C22, C23, C24, C25)
select  stuff(C1,5,2,substring(convert(char, rand()*100000 % 100), 2,2))
select  stuff(C2,5,2,substring(convert(char, rand()*100000 % 100), 2,2))
select  stuff(C3,5,2,substring(convert(char, rand()*100000 % 100), 2,2))
......
0
 
LVL 19

Expert Comment

by:grant300
Comment Utility
alpmoon,

Yeah, this gets a log clearer now that we know sybaseUK does not each column value to be unique; just the entire row.

That code will work 90 percent of the time but you still have a problem with single digit numbers.  You need to ensure there is a leading zero or you will wind up with about 10 percent of the values being too short.

Try...
select stuff(C1,5,2,right('0'+convert(varchar,rand()*100000 % 100),2))
Come to think of it, STUFF is slower than substring and concatenation.  We could use...
select substring(C1,1,4)+right('0'+convert(varchar,rand()*100000 % 100),2)

Finally, instead of 100000, I would choose a prime number in that vicinity.  Something tells me you'll get better (more random) results than using a multiplier that is an even multiple of the modulo.

Regards,
Bill
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 

Author Comment

by:SybaseUk
Comment Utility
hi guys,

i was trying to convert the rand() output to char and then take a substring of that i.e. 2 digits and stuff it into C1.

so for example:

select stuff('H0A9HT', 5,2,substring(convert(char, rand()), 2,2))

would produce the result:

H0A935

the last 2 digits being random values
0
 

Author Comment

by:SybaseUk
Comment Utility
Hi,

I get a syntax error from executing the following statement: -

select stuff(C1,5,2,right('0'+convert(varchar,rand()*100000 % 100),2))

If I remove % 100 then the statement executes but I dont think its doing what it should be????

select stuff(C1,5,2,right('0'+convert(varchar,rand()*100000),2))
0
 
LVL 19

Expert Comment

by:grant300
Comment Utility
The text of the syntax error might be useful but I think I know what the issue is.

The modulo operator only works on integers.  You will have to add a CONVERT(INT,..... to the expression to get it to function properly.

You know, since you are going to grab two digits by position and then convert them into a string anyway, why bother with the modulo.  Try something like....

select stuff(C1,5,2,substring(convert(varchar,rand(),3,2)))

Regards,
Bill
0
 

Author Comment

by:SybaseUk
Comment Utility
Hi Bill,

This is what I began with!

I appreciate that I will get some column values which will be duplicates but the chances of the entire row being a duplicate (or identical) should be should be very less likely. But I am getting many rows which are completely identical so I know there is a bug in my code or the loop.

I was thinking the purpose of the loop is to do single row by row inserts and thus giving me a true random value each time. But I am not doing single row by row inserts, rather I am doubling the amount of rows.

How do I do single row by row inserts but only change/replace the last 2 characters of each value with a random value????

This is getting very frustrating now.

Thanks for all your help.  
0
 

Author Comment

by:SybaseUk
Comment Utility
Hi Guys,

Please disregard all the previous messages. I have decided to scrap my 300 rows of sample data and start from scratch.

My table has an ID (auto-identity) column and 25 columns of char(6). I would like to populate this table with random data and have atleast 300k rows that are unique....please advise the best way to go about

many thanks!
0
 
LVL 13

Expert Comment

by:alpmoon
Comment Utility
I am correcting the syntax there:

declare @max int
declare @cur int
declare @char6 char(6)

select @cur = 1, @max = 300

while @cur < @max
begin
select @char6 = char((rand()*100000 % 26) + 65)+char((rand()*100000 % 26) + 65) +
  char((rand()*100000 % 26) + 65)+char((rand()*100000 % 26) + 65) +
  char((rand()*100000 % 26) + 65)+char((rand()*100000 % 26) + 65)

insert into Test1 (C1)
select @char6

select @cur = @cur + 1

end

I have tested it for one column and it works fine(covers A to Z). You can repeat @char6 variable as many as you can.
0
 

Author Comment

by:SybaseUk
Comment Utility
alpmoon,

I have tried the above and I get a syntax error. If I execute this on its own: -

select char((rand()*100000 % 26) + 65)

I get syntax error near '(end of line)' message. Please help me resolve this.
0
 

Author Comment

by:SybaseUk
Comment Utility
Hi guys,

I now have the following and it works fine but it takes a very long time and also eventually runs out of space. Is there a quicker way of doing this?

declare @max int
declare @cur int
//declare @char6 char(6)

select @cur = 1, @max = 100000

while @cur < @max
begin


insert into Test1 (C1, C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C15, C16, C17, C18, C19, C20, C21,
C22, C23, C24, C25)
select stuff(inttohex(convert(int,rand()*100000)),1,3,'A'),
       stuff(inttohex(convert(int,rand()*100000)),1,3,'B'),
 stuff(inttohex(convert(int,rand()*100000)),1,3,'C'),
 stuff(inttohex(convert(int,rand()*100000)),1,3,'D'),
 stuff(inttohex(convert(int,rand()*100000)),1,3,'E'),
 stuff(inttohex(convert(int,rand()*100000)),1,3,'F'),
 stuff(inttohex(convert(int,rand()*100000)),1,3,'G'),
 stuff(inttohex(convert(int,rand()*100000)),1,3,'H'),
 stuff(inttohex(convert(int,rand()*100000)),1,3,'I'),
 stuff(inttohex(convert(int,rand()*100000)),1,3,'J'),
 stuff(inttohex(convert(int,rand()*100000)),1,3,'K'),
 stuff(inttohex(convert(int,rand()*100000)),1,3,'L'),
 stuff(inttohex(convert(int,rand()*100000)),1,3,'M'),
 stuff(inttohex(convert(int,rand()*100000)),1,3,'N'),
 stuff(inttohex(convert(int,rand()*100000)),1,3,'O'),
 stuff(inttohex(convert(int,rand()*100000)),1,3,'P'),
 stuff(inttohex(convert(int,rand()*100000)),1,3,'Q'),
 stuff(inttohex(convert(int,rand()*100000)),1,3,'R'),
 stuff(inttohex(convert(int,rand()*100000)),1,3,'S'),
 stuff(inttohex(convert(int,rand()*100000)),1,3,'T'),
 stuff(inttohex(convert(int,rand()*100000)),1,3,'U'),
 stuff(inttohex(convert(int,rand()*100000)),1,3,'V'),
 stuff(inttohex(convert(int,rand()*100000)),1,3,'W'),
 stuff(inttohex(convert(int,rand()*100000)),1,3,'X'),
 stuff(inttohex(convert(int,rand()*100000)),1,3,'Y')

select @cur = @cur + 1

end
0
 

Author Comment

by:SybaseUk
Comment Utility
Do you think it would be easier to write a shell script to create a data file containing delimited rows and then load these into IQ???

Can anyone please help me write a C shell script to create a test data file??

Thanks,
0
 

Author Comment

by:SybaseUk
Comment Utility
P.S. I need your help to write a shell script that will create the data (300K rows) each containing a unique id and random data made up of 25 columns of char(6).....

Please let me know whether I shud ask a new question and I would be happy to do that....

Many Thanks,
0
 
LVL 13

Expert Comment

by:alpmoon
Comment Utility
Actually the code I have tested was for ASE. If you have an ASE server. You can extend it for 25 columns and run for 300K rows. Then bcp out the data and insert into IQ. Inserting data into IQ one row at a time is not a good idea at all as Bill said before.

Or if you have ASA, you can do it in ASA as well. If you don't both, directing output to a text file and then eliminating unnecessary lines and tabs by sed might be a better option without writing a shell script.
0
 
LVL 19

Expert Comment

by:grant300
Comment Utility
All I can do is repeat my suggestion from early in the thread....

So what I would do is create the list of numbers from 1 to 300,000, then generate each column value from that by adding the column prefix (A-Z) and calculating a hex number from the ID (1-300000) and the offset for that column.You can do this by creating a single column table with the IDs and a helper table with 26 columns of INT and the offset for each one.  You can then generate the entire target table in an INSERT/SELECT by joining your ID table with the Column Offset table and using an expression to generate each column value, e.g.INSERT INTO TargetTableSELECT ID,                'A' + RIGHT(INTTOHEX(ID+CS.COL_A),5),                'B'.........FROM ID_Table,            Column_Offset CS
This will do all the inserts in one pass and be at least 100 and probably closer to 1000 times faster than the individual inserts you were doing.

How you create the ID_Table rows is irrelevant.  You could create a list of 300000 numbers in a shell script, stuff it in a flat file, and then do a LOAD TABLE to get it in.

Regards,
Bill
0
 

Author Comment

by:SybaseUk
Comment Utility
Hi,

I have now inserted the data into ASE and I have unique rows of data...

I want to bcp the data out and have it pipe delimited and row delimited by new line. Can you please show me how I can format the data so it is pipe delimited and row delimited by new line? I am using windows.....

I will then load this flat file into IQ

Thanks,
0
 

Author Comment

by:SybaseUk
Comment Utility
Hi,

Can anybody explain why a simple query like the following takes around 6 seconds in ASE but is taking around 4 mins in IQ????????????

select C1, count(*)
from test1
group by C1
having count(*) > 1

I have loaded the IQ table by generating the random data in ASE and BCP'ing the data out. I am very confused why it is taking so long for this query to run.....

Can some1 please have a look at my query plan also and help me?
queryplan.doc
0
 
LVL 19

Expert Comment

by:grant300
Comment Utility
That one sounds like a whole new topic and question.  Why don't you close off this one an open a new one.  Give the points to Alpmoon.

Regards,
Bill
0
 

Author Closing Comment

by:SybaseUk
Comment Utility
I am splitting the points 50/50 as both of you participated equally in the final solution..
0

Featured Post

Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

Join & Write a Comment

A procedure for exporting installed hotfix details of remote computers using powershell
A safe way to clean winsxs folder from your windows server 2008 R2 editions
This demo shows you how to set up the containerized NetScaler CPX with NetScaler Management and Analytics System in a non-routable Mesos/Marathon environment for use with Micro-Services applications.
When you create an app prototype with Adobe XD, you can insert system screens -- sharing or Control Center, for example -- with just a few clicks. This video shows you how. You can take the full course on Experts Exchange at http://bit.ly/XDcourse.

771 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now