Solved

Try to read a huge text file via C# or Java.

Posted on 2013-01-25
Last Modified: 2013-02-02
Hi there;

This is a hypothetical question. Say that I am trying to read an extremely huge text file via C# or Java. Now, first, how can I read a huge file in those languages?

Second, how can I be sure that for each line I read, there won't be a collapse in the reading process?

I need to extract the content per line and work on the extracted atoms which are blank delimited. I am planning to have a thread for this very process for each line but should I join the thread to the main program or what can be the strategy? There is no write operation on the file, and assume that the file is prepopulated.

Can you give me some strategies for this scenario in Java and C#?

Regards.

P.S. The culprit is the huge file size; I have to be sure that it won't collapse once I open the file for reading and read it. I can also go for C or PHP for this need.
Question by:jazzIIIlove
5 Comments
 
LVL 44

Assisted Solution

by:AndyAinscow
AndyAinscow earned 167 total points
ID: 38817851
>>Now, first, how can I read a huge file in those languages?
Exactly the same way as a small file - the size makes no difference to the functions you use to read.

>>Second, how can I be sure that for each line I read, there won't be a collapse in the reading process?
What do you mean by collapse?

>>I need to extract the content per line and work on the extracted atoms which are blank delimited. I am planning to have a thread for this very process for each line
Awk - do you mean having millions of threads?
 
LVL 29

Accepted Solution

by:Göran Andersson
Göran Andersson earned 167 total points
ID: 38817871
For a small file you could just read all of it at once, but for a large file you would want to use a StreamReader.

If you just read lines and start a new thread for each line, you will quickly start a huge number of threads and congest the system. Instead you should start a limited number of threads, and have them poll a synchronised queue to pick up work. Then you read lines from the file and put them in the queue, and when the queue reaches a certain size you just wait for the threads to pick items from it before continuing to read from the file.
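As a rough Java sketch of the strategy described above (the answer itself refers to C#'s StreamReader; this swaps in Java's BufferedReader and ArrayBlockingQueue): a fixed number of worker threads take lines from a bounded queue, and the reader blocks whenever the queue is full. The class name, the method name, the POISON stop marker and the token counting used as a stand-in for the real per-line work are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example class: one reader thread feeds a bounded queue,
// a fixed number of workers drain it.
class BoundedLineQueue {
    // Stop marker, compared by identity so no real line can be mistaken for it.
    private static final String POISON = new String("POISON");

    // Reads every line from source (and closes it), returns the total number
    // of blank-delimited tokens seen by the workers.
    static long process(Reader source, int workers, int queueCapacity) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueCapacity);
        AtomicLong tokenCount = new AtomicLong();
        List<Thread> threads = new ArrayList<>();

        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String line = queue.take();      // waits until a line is available
                        if (line == POISON) {
                            return;                      // no more work for this worker
                        }
                        String trimmed = line.trim();
                        if (!trimmed.isEmpty()) {
                            // Placeholder for the real work on the "atoms".
                            tokenCount.addAndGet(trimmed.split("\\s+").length);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }

        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                queue.put(line);                         // blocks while the queue is full
            }
            for (int i = 0; i < workers; i++) {
                queue.put(POISON);                       // one stop marker per worker
            }
            for (Thread t : threads) {
                t.join();                                // wait for all workers to finish
            }
        } catch (IOException e) {
            threads.forEach(Thread::interrupt);
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            threads.forEach(Thread::interrupt);
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while processing", e);
        }
        return tokenCount.get();
    }
}
```

Because the queue has a fixed capacity, memory use stays bounded by roughly queueCapacity lines no matter how large the file is. The main thread joins the workers only once, after the whole file has been queued.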
 
LVL 86

Assisted Solution

by:CEHJ
CEHJ earned 166 total points
ID: 38818251
The size of the file is immaterial. What counts is the size of the buffers into which that file is read. For a BufferedReader the default is 8192 characters, but you can make it any size you like.
Since your file is essentially a CSV file, you have absolutely no problems with buffering.
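A minimal Java sketch of that point, assuming a UTF-8 file whose path is passed as the first command-line argument: only one buffer and one line are held in memory at a time, so the file size does not matter. The class name, the 64 Ki buffer size and the token counting are hypothetical choices for illustration, not anything prescribed in the thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical example class: single-threaded, line-by-line scan.
class BufferedLineScan {
    // Assumed buffer size: 64 Ki chars instead of the 8192-char default.
    static final int BUFFER_CHARS = 64 * 1024;

    // Streams the source line by line (closing it at the end) and counts
    // the blank-delimited tokens; never holds more than one line at a time.
    static long countTokens(Reader source, int bufferChars) {
        long tokens = 0;
        try (BufferedReader reader = new BufferedReader(source, bufferChars)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    // Split on runs of whitespace; replace with the real work.
                    tokens += trimmed.split("\\s+").length;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        Reader in = new InputStreamReader(
                Files.newInputStream(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println(countTokens(in, BUFFER_CHARS));
    }
}
```

A larger buffer mainly reduces the number of underlying read calls; whether it helps noticeably is something to measure on the actual file and disk.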

>>I am planning to have a thread for this very process for each line but should I join the thread to the main program or what can be the strategy?
If you're right about a multi-threaded approach being a good one - again there's no problem. One thread could read the file into a queue and that queue could be processed by threads from a thread pool. Of course you would have to justify to yourself that such a relatively complex approach was a good strategy.
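One way this could look in Java, as a sketch only: a ThreadPoolExecutor with a bounded work queue, where CallerRunsPolicy makes the reading thread process a line itself whenever the queue is full, which throttles reading without any extra code. The class name, method names, pool parameters and the one-hour wait are assumptions; counting tokens stands in for the real per-line work.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example class: one reader thread submits lines to a thread pool.
class PooledLineProcessor {

    // Reads every line from source (and closes it), processes each line on the
    // pool, and returns the total number of blank-delimited tokens.
    static long process(Reader source, int poolSize, int queueCapacity) {
        AtomicLong tokenCount = new AtomicLong();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),       // bounded backlog of lines
                new ThreadPoolExecutor.CallerRunsPolicy());    // full queue: reader does the work

        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                final String current = line;
                pool.execute(() -> tokenCount.addAndGet(countTokens(current)));
            }
        } catch (IOException e) {
            pool.shutdownNow();
            throw new UncheckedIOException(e);
        }

        pool.shutdown();                                       // no new tasks; finish queued ones
        try {
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {   // assumed upper bound on the wait
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return tokenCount.get();
    }

    // Placeholder for the real work on one line's blank-delimited atoms.
    static int countTokens(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}
```

If the per-line work is as cheap as splitting a string, the queueing overhead may well outweigh any gain, so it is worth timing this against a plain single-threaded loop before adopting it.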
 
LVL 13

Expert Comment

by:Hugh McCurdy
ID: 38835077
I'm guessing this is related to the desire to create a large file with random data that you have since posted about.

I'm pretty much with the others: just read the file.

This comment about collapse puzzles me. Are you worried about running out of RAM? If so, why? I'm wondering if this is more about memory leaks than something else. Can't tell.
 
LVL 12

Author Comment

by:jazzIIIlove
ID: 38847247
Ah,
Thanks for the strategies and comments. I think I resolved this.

Regards.
