Solved

Creating a huge text file containing a repeated randomized string in C

Posted on 2013-01-30
Last Modified: 2013-02-02
Hi there,

I am trying to create a huge text file containing a repeated randomized string in C.

I can randomize the string values and I can write to the file. What I want is a file of a large size, say 500 MB.

Is there a smart way to do this?

Regards.
Question by:jazzIIIlove
17 Comments
 
LVL 60

Accepted Solution

by:
Julian Hansen earned 1252 total points
ID: 38834885
Is this a one-off function or do you need to do this often?

When you say "smart way" what do you mean?

If you need to write the values to the file, then buffer the string in a large memory block and write that block repeatedly. I am assuming what you want is this:

Random string: abcd1234

And you want a 500 MB file comprising multiple repeats of this string, so
abcd1234abcd1234abcd1234 ... abcd1234 = 500MB

In which case I would create a buffer (say 50K), fill it with the randomized string (make the buffer size a multiple of the string length) and then write the buffer repeatedly:
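/* Sketch only (uses C++ new/delete); read the caveat on the fill loop. */
int bufflen = 50 * 1024;
int filesize = 500 * 1024 * 1024;  /* target size in bytes */
int strsize;
char * buffer = new char[bufflen];
char randstr[9];
*buffer = '\0';
strcpy(randstr, "abcd1234");
strsize = strlen(randstr);

/* Caveat: as written, this loop counts bufflen itself down past zero and
   appends more than the buffer holds; real code needs a separate counter
   for the fill so that bufflen keeps its value for the write loop. */
while (bufflen >= 0) {
  strcat(buffer, randstr);
  bufflen -= strsize;
}
FILE * pfile = fopen("bigfile.txt", "wt");
for(int i=0; i < filesize/bufflen;i++)
{
  fwrite(buffer, 1, bufflen, pfile);
}
fclose(pfile);
delete[] buffer;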
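
For readers who want something that compiles as plain C, here is a minimal sketch of the buffer-and-repeat idea. The names write_repeated and BLOCK_HINT are illustrative and not from the thread. It also assumes that overshooting the target by up to one block is acceptable, and it opens the file in binary mode ("wb") so the byte count is exact on Windows.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative block size hint (about 50K, as suggested above). */
#define BLOCK_HINT (50 * 1024)

/* Writes `pattern` back to back, in whole blocks, until at least `target`
   bytes have been written. Returns the byte count, or 0 on any failure. */
size_t write_repeated(const char *path, const char *pattern, size_t target)
{
    size_t plen = strlen(pattern);
    if (plen == 0 || plen > BLOCK_HINT) return 0;

    /* Round the block size down to a whole number of patterns. */
    size_t blen = (BLOCK_HINT / plen) * plen;
    char *block = malloc(blen);
    if (block == NULL) return 0;

    /* Fill by offset: memcpy does not rescan the buffer the way strcat
       does, and the block needs no terminating '\0'. */
    for (size_t off = 0; off < blen; off += plen)
        memcpy(block + off, pattern, plen);

    FILE *fp = fopen(path, "wb");
    if (fp == NULL) { free(block); return 0; }

    size_t written = 0;
    while (written < target) {
        if (fwrite(block, 1, blen, fp) != blen) break;
        written += blen;
    }
    int failed = (written < target);
    if (fclose(fp) != 0) failed = 1;
    free(block);
    return failed ? 0 : written;
}
```

A call such as write_repeated("bigfile.txt", "abcd1234", 500u * 1024 * 1024) would aim for roughly 500 MB. The fill loop keeps its own offset, so the block length is still intact when the write loop runs.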

LVL 13

Assisted Solution

by:Hugh McCurdy
Hugh McCurdy earned 252 total points
ID: 38835049
I think Julian's answer is pretty good.  However, it's hard to know without really understanding the purpose of the project.

For instance, is the string for some sort of security scheme?  If not, what is it for?  The answer could help with finding an answer that suits your actual need, whatever it is.

Also, what do you mean by "repetitive"?  Do you want to repeat the random string until you get to 500MB, or do you want one very long string that is randomized?

It occurs to me that a simple approach, if you are using the GNU compiler and libraries, is to make your string from a sequence of unique characters and then call strfry(), which will "randomize" your string.  Then you can repeatedly write that out (if that's what you mean).
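
Since strfry() is a GNU extension, a portable stand-in may be useful with other compilers. The sketch below is a plain Fisher-Yates shuffle, not strfry() itself. The name shuffle_string is hypothetical, and rand() % i carries a slight modulo bias that should not matter for test data.

```c
#include <stdlib.h>
#include <string.h>

/* Shuffles the characters of s in place using Fisher-Yates.
   Illustrative helper; seed the generator with srand() beforehand. */
void shuffle_string(char *s)
{
    size_t n = strlen(s);
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;   /* 0 <= j < i */
        char tmp = s[i - 1];
        s[i - 1] = s[j];
        s[j] = tmp;
    }
}
```

The string must be writable, so pass a char array such as char s[] = "abcd1234"; rather than a string literal.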
LVL 12

Author Comment

by:jazzIIIlove
ID: 38835615
Hi guys;

I like Julian's approach. To clarify, it's a one-off thing, and the file content is as follows:

Test A:
12 0.4
.....
.....
Test B:
....
.....
Test Z:
.....
....
This is the schema, where the letter increments from A to Z but has to stay a single character.
The numbers (12 0.4) are tab and space delimited, and the rows can repeat until the file is huge.

Regards.

I am using VS and its compiler, not GNU, but I can also use GNU C for this.
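
One way to read this schema is sketched below, under assumptions: the function name write_sections, the tab-only row format "%d\t%.1f", the value ranges, and the even split of the target across the 26 sections are all illustrative choices, not details given in the thread.

```c
#include <stdio.h>
#include <stdlib.h>

/* Writes sections "Test A:" through "Test Z:", each followed by random
   rows, until every section holds at least target/26 characters.
   Returns the total number of characters written, or -1 on error. */
long long write_sections(FILE *fp, long long target)
{
    long long per_section = target / 26;
    long long total = 0;

    for (char c = 'A'; c <= 'Z'; c++) {
        long long section = 0;
        int n = fprintf(fp, "Test %c:\n", c);
        if (n < 0) return -1;
        section += n;

        /* fprintf returns the number of characters it produced, so the
           running size can be tracked without querying the file. */
        while (section < per_section) {
            n = fprintf(fp, "%d\t%.1f\n", rand() % 100, (rand() % 100) / 10.0);
            if (n < 0) return -1;
            section += n;
        }
        total += section;
    }
    return total;
}
```

On Windows, a stream opened in text mode expands each "\n" to two bytes on disk, so the file will come out somewhat larger than the returned count.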
LVL 13

Expert Comment

by:Hugh McCurdy
ID: 38836049
I think Julian's approach is good too.  I was just concerned about what you are trying to do but now it appears you just want to make some test data.  I think you are good to go with Julian's answer.
LVL 12

Author Comment

by:jazzIIIlove
ID: 38841490
I also think Julian's approach is good, but I think there is a problem in the while loop: when bufflen reaches 0 or below, the loop exits with bufflen at 0 or a negative value, and the for loop fails.

So I added this code to the solution. It works well with this, but let's see what Julian says.

while (bufflen >= 0) {
		strcat(buffer, randstr);
		actual = bufflen;
		bufflen -= strsize;  
		if(bufflen <= 0)  
			break;
	}
	bufflen = actual;


Regards.

P.S. Also there is a linked question in the link:
http://www.experts-exchange.com/Programming/Languages/C/Q_28016182.html
LVL 60

Assisted Solution

by:Julian Hansen
Julian Hansen earned 1252 total points
ID: 38841582
My solution was pseudocode to illustrate a point, and was predicated on bufflen being a multiple of strsize.

For the algorithm to work as it is, you need to make the buffer size a multiple of the length of the string you are replicating.
LVL 12

Author Comment

by:jazzIIIlove
ID: 38842444
Hi;

Thanks for the information.

I have reworked your algorithm; the code is below. The problem is the line
for(int i = 0; i < filesize/100;i++)

That is how I am trying to populate the file: the inner for loop should be the main source of randomization and repetition. The code runs smoothly, but I cannot get files larger than about 2.5 MB. All in all, the numbers produced by the loop
for(int i = 1000; i < 20000; i+=1000){
are what should make the file larger. I want to see at least 500 MB or so, but the file seems to populate extremely slowly, and I am afraid there will be a problem with the buffer.

The code follows; its quality is poor, I know. What do you think? How can I reach around 500 MB without getting stuck?

# include <stdio.h>
# include <string.h>
# include <time.h>
# include <stdlib.h>
# include <math.h>
# define filesize 50000

char * construct(char c, int bufflen);

void main()
{		
	int strsize;

	FILE *pfile = fopen("bigfile.txt","wt");		 
	int m = 65;
	srand(time(NULL));
	for(int i = 65; i <= 90;i++)
	{
		int bufflen = 50 * 1024;	
		char * buffer = new char[bufflen];
		*buffer = '\0';
		char * randstr = construct(i, bufflen);
		strsize = strlen(randstr);

		bufflen = strsize;	
		fwrite(randstr, 1, bufflen, pfile);

	}
	fclose(pfile);
}

char * construct(char c,int bufflen)
{		
	char randstr[1000000] = "Operator ";	

	randstr[9] = c;
	strcat(randstr,":\n");

	for(int i = 0; i < filesize/100;i++)
	{
		int j = 0;
		int r[50];
		double p[50];
		double tr[50];
		for(int i = 1000; i < 20000; i+=1000){
			r[j] = rand() % i + 10;
			p[j] = ((double)(rand() % i)/(double)RAND_MAX+rand() % (j+1));	 	

			char integer_string[32];
			sprintf(integer_string, "%d", r[j]);

			char double_string[32];
			sprintf(double_string, "%.2f", p[j]);
			strcat(randstr, integer_string);
			strcat(randstr, " \t");
			strcat(randstr, double_string);
			strcat(randstr, "\n");
			j++;
		}	
	}
	return randstr;
}

LVL 60

Assisted Solution

by:Julian Hansen
Julian Hansen earned 1252 total points
ID: 38842685
There is a lot about your algorithm I don't understand.

Why are you storing values in the p and r arrays? Why not just use a single integer and a single double, as you don't seem to use the values after you have added them to the string?

I have not compiled and tested this, but is this not in essence what you are trying to do? (We can address the speed issues later; first we need to understand what you are trying to achieve.)
# include <stdio.h>
# include <string.h>
# include <time.h>
# include <stdlib.h>
# include <math.h>
# define filesize 50000

char * construct(char c);

int main(void)
{
	FILE * pfile = fopen("bigfile.txt","wt");
	if (pfile == NULL)
		return 1;

	srand((unsigned)time(NULL));

	/* one section per letter, 'A' (65) to 'Z' (90) */
	for(int i = 65; i <= 90;i++)
	{
		char * randstr = construct((char)i);
		size_t len = strlen(randstr);
		fwrite(randstr, 1, len, pfile);
	}

	fclose(pfile);
	return 0;
}

/* the unused bufflen parameter has been dropped */
char * construct(char c)
{
	/* static so the pointer stays valid after return; reused on every call */
	static char randstr[1000000];

	/* writes the header and terminates the string in one step */
	sprintf(randstr, "Operator %c:\n", c);

	for(int i = 0; i < filesize/100;i++)
	{
		int j = 0;
		int r;
		double p;

		for(int range = 1000; range < 20000; range+=1000){
			r = rand() % range + 10;
			p = ((double)(rand() % range)/(double)RAND_MAX+rand() % (++j));

			char result_string[32];
			sprintf(result_string, "%d\t%.2f\n", r, p);
			strcat(randstr, result_string);
		}
	}
	return randstr;
}

LVL 85

Assisted Solution

by:ozo
ozo earned 496 total points
ID: 38842841
# include <stdio.h>
# include <string.h>
# include <time.h>
# include <stdlib.h>
# include <math.h>
# define filesize 50000

void construct(char c,FILE *pfile);
void main()
{		
	FILE * pfile = fopen("bigfile.txt","wt");		 
	
	srand(time(NULL));
	
	for(int i = 65; i <= 90;i++)
	{
		construct(i, pfile);
	}
	
	fclose(pfile);
}

void construct(char c,FILE *pfile)
{		
  	
	fprintf(pfile,"Operator%c:\n",c);

	for(int i = 0; i < filesize/100;i++)
	{
		int j = 0;
		int r;
		double p;
		for(int i = 1000; i < 20000; i+=1000){
			r = rand() % i + 10;
			p = ((double)(rand() % i)/(double)RAND_MAX+rand() % (++j));	 	

			fprintf(pfile, "%d\t%.2f\n", r, p);
		}	
	}
}

LVL 60

Assisted Solution

by:Julian Hansen
Julian Hansen earned 1252 total points
ID: 38843329
Ozo has optimised further. I think where things got confused is that the original recommendation was to write a pre-created buffer multiple times to the same file to generate a large file.

Your updated code shows that each buffer is essentially different, so there is no benefit in pre-creating the buffer. As Ozo has done, it makes more sense to simply output the data to the file.
LVL 12

Author Comment

by:jazzIIIlove
ID: 38843374
Yup, that seems true. Sorry, it was evolving in my mind.

The code is clean, yet I cannot get a huge file size. The /100 makes it smaller; if I remove it, my machine seems stuck.

Also, is there a need to free the pointer?

Regards.
LVL 12

Author Comment

by:jazzIIIlove
ID: 38845046
Another question:

Do you think a multithreaded approach would help this run faster, or better?

#pragma omp parallel for
	for(int i = 0; i < filesize;i++)

Or should I put this on the outer loop?

#pragma omp parallel for
for(int i = 65; i <= 90;i++)
      {

Or should I open a new question?

Regards.
LVL 85

Expert Comment

by:ozo
ID: 38845199
Which code seems stuck?
The one that writes directly to the file,
or the one that repeatedly scans to the end of a large buffer in order to append to it?
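
To illustrate the difference: strcat has to walk the whole destination string to find its end on every call, so building a long buffer that way costs time proportional to the square of its length. A minimal sketch of the usual fix is to remember where the end is. The struct appender and the append function are hypothetical names introduced here for illustration.

```c
#include <stdio.h>
#include <string.h>

/* A buffer that remembers its own length. Illustrative type. */
struct appender {
    char  *buf;   /* storage supplied by the caller */
    size_t cap;   /* total capacity in bytes */
    size_t len;   /* characters used so far, excluding the '\0' */
};

/* Appends text at the tracked end without rescanning the buffer.
   Returns 1 on success, 0 if the text does not fit. */
int append(struct appender *a, const char *text)
{
    size_t n = strlen(text);
    if (n >= a->cap - a->len) return 0;     /* keep room for the '\0' */
    memcpy(a->buf + a->len, text, n + 1);   /* copies the '\0' as well */
    a->len += n;
    return 1;
}
```

Each call now costs time proportional to the appended text only, whatever the buffer already holds.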
LVL 12

Author Comment

by:jazzIIIlove
ID: 38845310
Thanks, your question is extremely wise. When I step through the code in the debugger line by line there is no slowness at all, but when I run it, the loop takes too long to execute. As you can see, I changed the loop condition from filesize/100 to filesize, where filesize is 500000. I can produce around 2 GB with no failure, but the creation takes time.

Do you think I should go for a threaded solution by putting that pragma line on the for loop in the construct function, on the loop in main, or on both? Do you think it can produce a notable improvement?

I went for the pragma parallel idea from:
http://stackoverflow.com/questions/4835192/threaded-for-loop-in-c
http://www.viva64.com/en/a/0054/

Regards.
LVL 85

Assisted Solution

by:ozo
ozo earned 496 total points
ID: 38846210
Unless you are using quadratic time string shuffling operations, which I've told you how to avoid, your bottleneck will probably be  just disk IO.
LVL 60

Assisted Solution

by:Julian Hansen
Julian Hansen earned 1252 total points
ID: 38846502
Disk IO is always going to be a bottleneck. Parallelising your code is not going to achieve anything, because you still have to wait for the disk operations to complete.

What you want to do instead is make sure that the chunks you write to disk are as big as possible.
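
When the data goes out through many small fprintf calls, one way to get larger chunks without restructuring the code is to give the stream a bigger stdio buffer with the standard setvbuf call. This is a sketch, not a measured result. The helper name open_buffered is hypothetical, and whether a larger buffer helps on a given machine is something to time rather than assume.

```c
#include <stdio.h>

/* Opens `path` for writing with a fully buffered stream of `bufsize`
   bytes. Returns NULL on failure. Illustrative helper. */
FILE *open_buffered(const char *path, size_t bufsize)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL) return NULL;

    /* setvbuf must be called before any other operation on the stream.
       Passing NULL lets the library allocate the buffer itself. */
    if (setvbuf(fp, NULL, _IOFBF, bufsize) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

With a call such as open_buffered("bigfile.txt", 1 << 20), the library collects output and hands it to the operating system in blocks of up to about 1 MB.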
LVL 12

Author Closing Comment

by:jazzIIIlove
ID: 38847242
Thanks guys.