Solved

Try to read a huge text file via C# or Java.

Posted on 2013-01-25
5
684 Views
Last Modified: 2013-02-02
Hi there;

This is a hypotetical question. Say that, I am trying to read extremely huge text file via C# or Java. Now, first how can i read a huge file in those languages?

Second, how can i be sure that for each line I read, there won't be a collapse in the reading process?

I need to extract the content per line and work on the extracted atoms which are blank delimited. I am planning to have a thread for this very process for each line but should I join the thread to the main program or what can be the strategy? There is no write operation on the file, and assume that the file is prepopulated.

Can you give me some strategies above this scenario in Java and C#?

Regards.

P.S. The culprit is the huge file size, I have to be sure that it won't collapse once I open the file for read purpose and read. I can also go for C or PHP for this need.
0
Comment
Question by:jazzIIIlove
5 Comments
 
LVL 44

Assisted Solution

by:AndyAinscow
AndyAinscow earned 167 total points
ID: 38817851
>>Now, first how can i read a huge file in those languages?
Exactly the same way as a small file - the size makes no difference to the functions you use to read.

>>Second, how can i be sure that for each line I read, there won't be a collapse in the reading process?
What do you mean collapse?

>>I need to extract the content per line and work on the extracted atoms which are blank delimited. I am planning to have a thread for this very process for each line
Awk - do you mean having millions of threads ?
0
 
LVL 29

Accepted Solution

by:
Göran Andersson earned 167 total points
ID: 38817871
For a small file you could just read all of it at once, but for a large file you would want to use a StreamReader.

If you just read lines and start a new thread for each line, you will quickly start a huge number of threads and congest the system. Instead you should start a limited number of threads, and have them polling a synchronised queue to pick up work from. Then you read lines from the file and put in the queue, and when the queue reaches a certain size you just wait for the threads to pick items from it before continuing to read from the file.
0
 
LVL 86

Assisted Solution

by:CEHJ
CEHJ earned 166 total points
ID: 38818251
The size of the file is immaterial. What counts is the size of the buffers into which that file is read. For a BufferedReader the default is 8192 bytes but you can make it any size you like.
Since your file is essentially a csv file you have absolutely no problems with buffering.

I am planning to have a thread for this very process for each line but should I join the thread to the main program or what can be the strategy?
If you're right about a multi-threaded approach being a good one - again there's no problem. One thread could read the file into a queue and that queue could be processed by threads from a thread pool. Of course you would have to justify to yourself that such a relatively complex approach was a good strategy.
0
 
LVL 13

Expert Comment

by:Hugh McCurdy
ID: 38835077
I'm guessing this is related to the desire to create a large file with random data that you since posted.

I'm pretty much with the others, just read the file.  

This comment about collapse puzzles me.  Are you worried about running out of RAM?   If so, why?  I'm wondering if this is more about memory leaks than something else.  Can't tell.
0
 
LVL 12

Author Comment

by:jazzIIIlove
ID: 38847247
Ah,
Thanks for the strategies and comments. I think I resolved this.

Regards.
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
LINQ - C# to VB convertion 12 45
Angular JS Route 3 45
servlet example issue 6 27
WCF Service Application cannot connect from TCP terminal 1 11
Introduction Java can be integrated with native programs using an interface called JNI(Java Native Interface). Native programs are programs which can directly run on the processor. JNI is simply a naming and calling convention so that the JVM (Java…
Entity Framework is a powerful tool to help you interact with the DataBase but still doesn't help much when we have a Stored Procedure that returns more than one resultset. The solution takes some of out-of-the-box thinking; read on!
The goal of this video is to provide viewers with basic examples to understand recursion in the C programming language.
The goal of this video is to provide viewers with basic examples to understand and use switch statements in the C programming language.

929 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

14 Experts available now in Live!

Get 1:1 Help Now