Reading a huge text file in C# or Java

Hi there;

This is a hypothetical question. Say that I am trying to read an extremely large text file in C# or Java. First, how can I read a huge file in those languages?

Second, how can I be sure that, for each line I read, the reading process won't collapse?

I need to extract the content of each line and work on the extracted atoms, which are blank-delimited. I am planning to start a thread for each line to do this processing, but should I join each thread to the main program, or what would be a better strategy? There are no write operations on the file; assume that it is prepopulated.

Can you give me some strategies for this scenario in Java and C#?

Regards.

P.S. The culprit is the huge file size. I have to be sure that nothing will collapse once I open the file for reading and read it. I could also use C or PHP for this.
jazzIIIlove Asked:
Göran Andersson Commented:
For a small file you could just read all of it at once, but for a large file you would want to use a StreamReader and read one line at a time, so that only the current line has to be held in memory.

If you just read lines and start a new thread for each line, you will quickly start a huge number of threads and congest the system. Instead, you should start a limited number of threads and have them pick up work from a synchronised queue. You then read lines from the file and put them in the queue, and when the queue reaches a certain size you simply wait for the threads to take items from it before you continue reading from the file.
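
The queue strategy described above might look roughly like this in Java. It is only a sketch under some assumptions: the class name `BoundedLineQueue`, the method `countTokens`, and the choice of counting tokens as the "work" are made up for illustration, and a C# version would use StreamReader with a comparable bounded collection instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

class BoundedLineQueue {
    // Sentinel that tells a worker to stop. It is compared by identity,
    // so a line in the file with the same text cannot be mistaken for it.
    private static final String POISON = new String("<<end-of-input>>");

    // Hypothetical helper: reads every line, lets `workers` threads split the
    // lines on blanks, and returns the total number of atoms found.
    static long countTokens(BufferedReader reader, int workers, int queueCapacity)
            throws IOException, InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueCapacity);
        AtomicLong total = new AtomicLong();
        List<Thread> threads = new ArrayList<>();

        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String line = queue.take();   // waits while the queue is empty
                        if (line == POISON) {
                            break;
                        }
                        String trimmed = line.trim();
                        if (!trimmed.isEmpty()) {
                            total.addAndGet(trimmed.split("\\s+").length);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }

        try {
            String line;
            while ((line = reader.readLine()) != null) {
                queue.put(line);                      // waits while the queue is full
            }
        } finally {
            // One sentinel per worker, even if reading failed part way through.
            for (int i = 0; i < workers; i++) {
                queue.put(POISON);
            }
        }
        for (Thread t : threads) {
            t.join();                                 // main thread waits for the workers
        }
        return total.get();
    }
}
```

Because the queue has a fixed capacity, the reading thread is held back automatically whenever the workers fall behind, so memory use stays bounded regardless of the file size.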
AndyAinscow (Freelance programmer / Consultant) Commented:
>>Now, first how can i read a huge file in those languages?
Exactly the same way as a small file - the size makes no difference to the functions you use to read.

>>Second, how can i be sure that for each line I read, there won't be a collapse in the reading process?
What do you mean collapse?

>>I need to extract the content per line and work on the extracted atoms which are blank delimited. I am planning to have a thread for this very process for each line
Awk - do you mean having millions of threads?
CEHJ Commented:
The size of the file is immaterial. What counts is the size of the buffers into which that file is read. For a BufferedReader the default is 8192 characters, but you can make it any size you like.
Since your file is essentially a csv file you have absolutely no problems with buffering.
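
As a rough illustration of the buffering point, here is a minimal Java sketch that reads a file line by line through a BufferedReader with an explicitly chosen buffer size. The class name `BufferedLineReader`, the method `scan`, the UTF-8 encoding, and the buffer size used in the example call are all assumptions for the sake of the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BufferedLineReader {
    // Hypothetical helper: returns {number of lines, number of blank-delimited atoms}.
    // Only one line is held in memory at a time, whatever the file size.
    static long[] scan(Path file, int bufferChars) throws IOException {
        long lines = 0;
        long atoms = 0;
        // The second constructor argument is the buffer size in characters.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8),
                bufferChars)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines++;
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    atoms += trimmed.split("\\s+").length;
                }
            }
        }
        return new long[] {lines, atoms};
    }
}
```

A call such as `BufferedLineReader.scan(path, 1 << 16)` would use a 65536-character buffer. Whether a larger buffer actually helps is something to measure on the real file rather than assume.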

>>I am planning to have a thread for this very process for each line but should I join the thread to the main program or what can be the strategy?
If you're right about a multi-threaded approach being a good one - again there's no problem. One thread could read the file into a queue and that queue could be processed by threads from a thread pool. Of course you would have to justify to yourself that such a relatively complex approach was a good strategy.
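
If you do go the thread pool route, one way it could be sketched in Java is with a ThreadPoolExecutor whose work queue is bounded. The names `PooledLineProcessor` and `longestAtom`, and the task of finding the longest atom, are invented for the example. The CallerRunsPolicy is my own choice here: when the backlog is full, the reading thread processes the line itself, which slows reading down instead of piling up work.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PooledLineProcessor {
    // Hypothetical helper: returns the length of the longest blank-delimited atom.
    static int longestAtom(BufferedReader reader, int poolSize, int backlog)
            throws IOException, InterruptedException {
        AtomicInteger longest = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(backlog),            // at most `backlog` waiting lines
                new ThreadPoolExecutor.CallerRunsPolicy());   // full queue: the reader does the work
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                final String current = line;
                pool.execute(() -> {
                    for (String atom : current.trim().split("\\s+")) {
                        longest.accumulateAndGet(atom.length(), Math::max);
                    }
                });
            }
        } finally {
            pool.shutdown();                                  // accept no new tasks
        }
        // Plays the role of "joining": wait until all submitted lines are processed.
        pool.awaitTermination(1, TimeUnit.HOURS);
        return longest.get();
    }
}
```

Waiting on the pool as a whole replaces joining individual threads, so the main program never has to track one thread per line.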
Hugh McCurdy Commented:
I'm guessing this is related to the desire to create a large file with random data that you have since posted about.

I'm pretty much with the others: just read the file.

This comment about collapse puzzles me. Are you worried about running out of RAM? If so, why? I'm wondering if this is more about memory leaks than something else. Can't tell.
jazzIIIlove (Author) Commented:
Ah,
Thanks for the strategies and comments. I think I resolved this.

Regards.
Question has a verified solution.
