markov123

asked on

Creating Dynamic Array

My program reads text files of different sizes. The way I would do it is to store all characters in an array and then process them.

What I have in mind is to create a dynamic array where the size will be determined by the file size.


I have two questions:

1. Is my approach right? If yes, please show me how to create a dynamic array, keeping in mind that I will be reading text files of different sizes. I would appreciate a clear example since I am new to C.

2. If my approach is wrong, please let me know the correct way, with an example.

Thanks in advance,

Mark
SOLUTION
sunnycoder
>>1. Is my approach right?
Just an addition to sunnycoder's comments: if you know that the files you're going to be reading won't be bigger than a certain limit you set (e.g. 100 kB), then this approach is OK. If, however, you might also deal with very large files, then reading them into memory completely is probably not the best approach (unless you have a LOT of FAST memory and a FAST hard-disk-to-memory transfer line). That's why sunnycoder suggested using a buffered approach: you fill up the buffer, process it, then fill it up again...
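A minimal sketch of what that buffered loop could look like (the chunk size is arbitrary and the file name is just a placeholder):

#include <stdio.h>

#define CHUNK_SIZE 4096              /* arbitrary buffer size */

int main(void)
{
    char buffer[CHUNK_SIZE];
    size_t n;
    FILE *fp = fopen("input.txt", "rb");   /* placeholder file name */

    if (fp == NULL)
        return 1;

    /* fill up the buffer, process it, then fill it up again */
    while ((n = fread(buffer, 1, sizeof buffer, fp)) > 0)
    {
        /* process the n bytes currently held in buffer */
    }

    fclose(fp);
    return 0;
}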
Hi markov123,

Good advice so far.  :)

If this were my application, I'd consider using the mmap() function (in unix/linux) to map the file to memory.  Then the operating system will do all of the work and all you have to do is process the data in the array.
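A rough sketch of that idea, in case it helps (unix/linux only; the file name is a placeholder and error handling is kept minimal):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    struct stat st;
    char *data;
    int fd;

    fd = open("input.txt", O_RDONLY);   /* placeholder file name */
    if (fd < 0 || fstat(fd, &st) < 0)
        return 1;

    /* map the whole file read-only; the OS pages it in as needed */
    data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED)
        return 1;

    /* process the st.st_size characters in data[] like an ordinary array */

    munmap(data, st.st_size);
    close(fd);
    return 0;
}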


Good Luck!
Kent
tushar_comp


Hi,

You can use ftell() to get the size of the file in bytes, then allocate that many bytes with malloc() or calloc() through a char pointer, read the whole file into that buffer, and do your operations on it.
I think this approach will work out.
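Roughly, something like this sketch (the file name is a placeholder and a real program would check every return value):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *fp;
    long size;
    char *data;

    fp = fopen("input.txt", "rb");   /* placeholder file name */
    if (fp == NULL)
        return 1;

    fseek(fp, 0, SEEK_END);          /* jump to the end ...                */
    size = ftell(fp);                /* ... so ftell() reports the size    */
    rewind(fp);                      /* and go back to the start           */

    data = malloc(size + 1);         /* one extra byte for a terminator    */
    if (data == NULL)
        return 1;

    fread(data, 1, size, fp);        /* read the whole file into the array */
    data[size] = '\0';

    /* do your operations on data here */

    free(data);
    fclose(fp);
    return 0;
}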
 
If I am on the right path to answering your query, let me know.

 Tushar
Hello,

I think that regardless of the size of the files, reading the whole file in first and then processing it is not that good an idea, unless you wish to process the file as fast as you can. But even in that case there are better ways.
I would recommend the following way of processing files of different sizes; it is not exactly simple, but it is very effective.

Create a cyclic buffer of length n consisting of the nodes described below (which I will call WordNode). Each node will have a char[] (let's call it word) and a pointer to a WordNode (let's call it nextptr). Each nextptr will point to the next node in the sequence, BUT the LAST node will point to the FIRST node. Imagine it as a chain: each ring holds one word and the chain is closed.
Also, 3 pointers (headptr, readptr, processptr) will initially point to the first node of the buffer.

After setting up this structure, you create 2 processes -or more, depending on the application- (readProcess(), processProcess()) -see fork() on how to create a process-. The read process will read a word from the file and put it in the node that readptr points to; then you move readptr to readptr->nextptr, which is the next node of the buffer.
processProcess() will get the word from the node that processptr points to and will then move this pointer to the next node (processptr->nextptr). The size of the word array and the size of the buffer are up to you, and you can tune them to optimise your code in terms of memory and speed.
One thing you need to be careful about is that (apart from initialisation) processptr must never be at the same node as readptr.

The good thing about this technique is that you use the minimum required memory with the maximum possible speed -in terms of design-. You may also optimise your code by tuning the buffer size; you could, for example, acquire the file size and set the buffer size based on it.
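To make the structure concrete, here is a rough sketch of the node type and the circular linking (WORD_LEN, RING_SIZE and make_ring are only illustrative names and sizes; error handling is minimal):

#include <stdlib.h>

#define WORD_LEN  64     /* arbitrary maximum length of one word */
#define RING_SIZE 16     /* arbitrary buffer length n            */

typedef struct WordNode {
    char word[WORD_LEN];
    struct WordNode *nextptr;
} WordNode;

/* build a chain of RING_SIZE nodes; the last node points back to the first */
WordNode *make_ring(void)
{
    WordNode *head = NULL;
    WordNode *prev = NULL;
    int i;

    for (i = 0; i < RING_SIZE; i++) {
        WordNode *node = malloc(sizeof *node);
        if (node == NULL)
            return NULL;             /* a real program would free the partial ring */
        if (head == NULL)
            head = node;
        else
            prev->nextptr = node;
        prev = node;
    }
    prev->nextptr = head;            /* close the chain */
    return head;
}

headptr, readptr and processptr would all start out at the node this returns.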
The bad thing is that it is a little more complex than what you have in mind.
The most usual and simple approach -for small projects- is what sunnycoder told you:

char buffer[MAX_BUF_LEN];
fptr = fopen(...)
while (fgets(buffer, MAX_BUF_LEN, fptr))
{
       //process one line of input present in buffer
}
fclose(fptr);

I do not know if I helped you; it is my first time posting a code solution. Please let me know if you need any help with what I have told you. I would be happy to help.

Hi rongasa,

Welcome to Experts-Exchange... hope you have a good time here. Regarding your post:
>The good thing about this technique is that you use the minimum required memory with the maximum possible speed
>-in terms of design-
How do you determine the size of word[]? If you allocate it dynamically, you would be spending a lot of time malloc'ing; malloc()s are a lot more expensive than your usual code. If you allocate it statically, how do you determine the size of the longest word?

If you have it static, then what is the advantage of maintaining a circular list? It could just as well have been a single array that you treat as a circular buffer using the two pointers... that would save you the memory spent on maintaining next pointers: 4 bytes per node (you won't need headptr in either case).
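A sketch of that single-array variant, with two indices playing the role of the two pointers (the names and the size are only illustrative):

#define RING_SIZE 1024               /* arbitrary buffer size */

static char ring[RING_SIZE];         /* one static array treated as a circular buffer  */
static int read_pos = 0;             /* where the reader stores the next character     */
static int proc_pos = 0;             /* where the processor fetches the next character */

/* reader side: store one character and advance, wrapping at the end */
void put_char(char c)
{
    ring[read_pos] = c;
    read_pos = (read_pos + 1) % RING_SIZE;
}

/* processor side: fetch one character and advance, wrapping at the end */
char get_char(void)
{
    char c = ring[proc_pos];
    proc_pos = (proc_pos + 1) % RING_SIZE;
    return c;
}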

While I am of the opinion that the best way depends on the situation you are in, this does not seem convincing - it may be of academic interest only.

Cheers!
sunnycoder
OK, what kind of code you will use always depends on the application, this is true. My point is the following, though, and it applies to some applications -which means that your solution is no less or more significant/correct than mine, I hope this is clear. At least it is clear to me-

After the cyclic buffer has been created (that is the initialisation of the problem... so it is common not to worry about its cost -especially the malloc time-), you have two processes acting in parallel!! This means that for big files you start processing the file from the moment you read the first word, and you continuously read the file at the same time. So you lose no time.
The buffer is a dynamic list whose nodes contain a static array that I name word. This static array, word, can have the minimum length of the unit that we wish to process, whether it is a word, a line or a single character.

I simply do not wish to convince anyone of what is the best way to do something, nor to start comments upon comments on this. However, regarding the "academic interest only": I need to mention that this design is used by many OSes for keyboard input, in electronic filters and in other applications. I have also used it in a gatekeeper server at my company, have run lint on that code, and the results were very good. Once again, I do not wish to offend anybody nor to exhibit my "expertise" in something; I just offer my opinion and one solution that I like.
Kind Regards,
Rongas
If I'm right, sunnycoder didn't mean to imply that your solution is a bad one. He just said (and I agree) that it could be made faster by using a static cyclic buffer, instead of allocating and de-allocating it dynamically. A malloc call does take a considerable amount of time, so if you can eliminate it, you speed up the reading of the file. And since the processing of the file depends on the read speed, you can also speed up the processing time, thus speeding up the whole application.

A lot depends on the amount of processing you have to do for each "word". And of course on the transfer speed between hard disk and memory.

I'll go even further: to optimise for speed, it's best to read blocks from the file that are as big as possible. The smaller the blocks are, the higher the transfer overhead will be, and thus the slower you'll be able to read the file. Obviously reading in the whole file at once is the optimal solution (when optimizing for speed), as I've mentioned earlier in this thread, but the size of the file and of the available memory limit the usability of this solution for certain applications. Thus a buffer (cyclic or not) should be used, and preferably a large buffer.

Reading in one word at a time (what's the size of a word for you, btw?) is not optimal, especially not if you have to allocate memory for each word separately, every time. But it can have its advantages for certain applications. A solution depends a lot on the problem, and there's almost never a "solution that fits all".

ASKER

I would like to thank you all for the valuable information. At the moment I am learning C so I cannot implement all the stuff you recommended.

Maybe showing you part of the program I wrote will help explain what I am after:

while ((in_char = fgetc(rfile)) != EOF)
{
      /* process in_char */
}



The problem here is that once execution leaves the while loop, in_char contains only the last character read in.

What I need is to accumulate the characters into a string variable, giving me the freedom to process the string.


char in_str[50];

so I can include something like this within the above loop, e.g.

   
{
   ...
   ...

   in_str[i] = in_char;
   ++i;
}


This is where I face difficulties.

Thanks,

M
Instead of reading character by character, you can read in a whole string using fgets() (as shown in the code in sunnycoder's first post, e.g.). Is that what you're looking for?

http://www.cplusplus.com/ref/cstdio/fgets.html
OK, I got the program to work using fgets().

while (fgets(inchars, 80, infile) !=NULL)
 {
    fputs(inchars,  outfile);
    count++;
   
    for (i = 0; i < strlen(inchars); ++i)
    {         
        ptr = inchars;
       (*ptr + strlen); <---- why this does not seem to work
    }
}/*endwhile*/

 puts(ptr);

This program reads from a text file and then writes to another file using fputs(inchars, outfile). In the for loop I am trying to accumulate inchars, move a pointer by the number of characters read, and write the next line. Unfortunately, when I print out, I only get the last line.

I would appreciate any pointers :)
There is a typo there; the pointer increment line should read:

(*ptr + strlen(inchars));
ASKER CERTIFIED SOLUTION
I have managed to do the above, but this is not what I want. To put it in one statement:

"I want to read a file and copy contents of that file into an array".

As you can see from here:

     ptr = inchars;
     (*ptr + strlen(inchars));

ptr is pointing to an array

char outchars[1024];
char *ptr;

ptr = outchars;


This may not be the most efficient method, but I am learning C and experimenting with simple stuff before getting to complex ones.

What I need now is to move the pointer depending on the number of characters read, but it doesn't seem to work.

Please enlighten me, thanks.

M
>> "I want to read a file and copy contents of that file into an array".
That's what the code I gave does (it also copies the file to another file because your code also did that) ... which part of it doesn't do what you wanted ?

>>     ptr = inchars;
>>     (*ptr + strlen(inchars));

That's one of the mistakes in your code. If ptr is a char*, then *ptr dereferences that pointer and gives you a char. Adding a value to it just computes a new character value and throws the result away, so neither the character nor the pointer changes. To actually advance the pointer, you have to assign the result back to it:

     ptr = inchars;
     ptr += strlen(inchars);

This will move ptr to the end of the text in the inchars array. If you check the code I gave you, you'll see that I did something similar.
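For illustration, a stripped-down sketch of that accumulation idea (the array sizes and file name are placeholders, and it simply assumes the whole file fits into outchars):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char outchars[1024];                 /* assumes the whole file fits in here */
    char inchars[80];
    char *ptr = outchars;
    FILE *infile = fopen("infile.txt", "r");   /* placeholder file name */

    if (infile == NULL)
        return 1;

    while (fgets(inchars, sizeof inchars, infile) != NULL)
    {
        strcpy(ptr, inchars);            /* append the line just read ...   */
        ptr += strlen(inchars);          /* ... and advance past it         */
    }
    fclose(infile);

    puts(outchars);                      /* the accumulated file contents   */
    return 0;
}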

>> This may not be the most efficient method, but I am learning C and experimenting with simple stuff before getting to complex ones.
No problem, that's the best way to learn :)

>> What I need now is to move the pointer depending on the number of characters read, but it doesn't seem to work.
See my earlier comment about *ptr

>> Please enlighten me, thanks.
I hope I did ... if not, let me know.