Solved

Try to read a huge text file via C# or Java.

Posted on 2013-01-25
Last Modified: 2013-02-02
Hi there;

This is a hypothetical question. Say I am trying to read an extremely large text file in C# or Java. First, how can I read a huge file in those languages?

Second, how can I be sure that, for each line I read, the reading process won't collapse?

I need to extract the content of each line and work on the extracted atoms, which are blank-delimited. I am planning to have a thread do this processing for each line, but should I join the thread to the main program, or what would the strategy be? There are no write operations on the file; assume that the file is prepopulated.

Can you give me some strategies for this scenario in Java and C#?

Regards.

P.S. The culprit is the huge file size; I have to be sure that it won't collapse once I open the file and read it. I can also go for C or PHP for this need.
Question by:jazzIIIlove
5 Comments
 
LVL 44

Assisted Solution

by:AndyAinscow
AndyAinscow earned 167 total points
ID: 38817851
>>Now, first how can i read a huge file in those languages?
Exactly the same way as a small file - the size makes no difference to the functions you use to read.

>>Second, how can i be sure that for each line I read, there won't be a collapse in the reading process?
What do you mean collapse?

>>I need to extract the content per line and work on the extracted atoms which are blank delimited. I am planning to have a thread for this very process for each line
Awk - do you mean having millions of threads?
 
LVL 29

Accepted Solution

by:Göran Andersson
Göran Andersson earned 167 total points
ID: 38817871
For a small file you could just read all of it at once, but for a large file you would want to use a StreamReader and read it one line at a time, so that only the current line has to be in memory.

If you just read lines and start a new thread for each line, you will quickly start a huge number of threads and congest the system. Instead you should start a limited number of threads and have them poll a synchronised queue to pick up work. Then you read lines from the file and put them in the queue, and when the queue reaches a certain size you just wait for the threads to pick items from it before continuing to read from the file.
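A minimal Java sketch of the setup described above, under a few assumptions: the class name `BoundedQueueLineProcessor`, the method `countTokens`, and the "work" (counting blank-delimited atoms) are all made up for illustration, and the end-of-input sentinel is just one common way to stop the workers. The bounded `ArrayBlockingQueue` is what makes the reader wait when the workers fall behind.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example class: a fixed set of workers fed through a bounded queue.
public class BoundedQueueLineProcessor {
    // Sentinel meaning "no more lines". It is compared by identity,
    // so a real line with the same text is never mistaken for it.
    private static final String POISON = new String("<end-of-input>");

    /** Reads lines from source, processes them on workerCount threads, returns the atom count. */
    public static long countTokens(Reader source, int workerCount, int queueCapacity) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueCapacity);
        AtomicLong total = new AtomicLong();
        List<Thread> workers = new ArrayList<>();

        for (int i = 0; i < workerCount; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        String line = queue.take();      // blocks while the queue is empty
                        if (line == POISON) {
                            return;                      // reader has finished
                        }
                        total.addAndGet(countBlankDelimited(line));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();  // stop quietly if asked to
                }
            });
            worker.start();
            workers.add(worker);
        }

        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                queue.put(line);                         // blocks while the queue is full
            }
            for (int i = 0; i < workerCount; i++) {
                queue.put(POISON);                       // one sentinel per worker
            }
            for (Thread worker : workers) {
                worker.join();                           // wait until all lines are processed
            }
        } catch (IOException e) {
            workers.forEach(Thread::interrupt);          // do not leave workers blocked
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            workers.forEach(Thread::interrupt);
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while processing", e);
        }
        return total.get();
    }

    // Stand-in for the real per-line work: counts the blank-delimited atoms.
    static int countBlankDelimited(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        Reader demo = new StringReader("a b c\nd e\n\nf\n");
        System.out.println(countTokens(demo, 4, 16));    // prints 6
    }
}
```

For a real file you would pass a file-backed `Reader` instead of the `StringReader` used in the demo. The join at the end answers the "should I join the thread" part of the question: the main thread waits for a handful of workers, not for one thread per line.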
 
LVL 86

Assisted Solution

by:CEHJ
CEHJ earned 166 total points
ID: 38818251
The size of the file is immaterial. What counts is the size of the buffers into which that file is read. For a BufferedReader the default is 8192 characters, but you can make it any size you like via the constructor.
Since your file is essentially a CSV file, read one line at a time, you have absolutely no problems with buffering.
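To illustrate the buffering point, here is a small Java sketch that streams the input line by line through a `BufferedReader` with an explicit buffer size. The class name `BufferedLineReader`, the method `countTokens`, and the 64K buffer are assumptions chosen for the example, not recommendations; memory use depends on the buffer and the longest line, not on the file size.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical example class: single-threaded, line-at-a-time reading.
public class BufferedLineReader {
    /** Streams source through a buffer of bufferChars characters and counts blank-delimited atoms. */
    public static long countTokens(Reader source, int bufferChars) {
        long tokens = 0;
        try (BufferedReader reader = new BufferedReader(source, bufferChars)) {
            String line;
            // Only the current line is held in memory at any time.
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    tokens += trimmed.split("\\s+").length;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A StringReader stands in for the huge file in this demo.
        System.out.println(countTokens(new StringReader("alpha beta\ngamma\n"), 64 * 1024)); // prints 3
    }
}
```

For an actual file, the `Reader` could be something like `new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8)`, where `path` and the charset are whatever applies to your data.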

>>I am planning to have a thread for this very process for each line but should I join the thread to the main program or what can be the strategy?
If you're right about a multi-threaded approach being a good one - again there's no problem. One thread could read the file into a queue and that queue could be processed by threads from a thread pool. Of course you would have to justify to yourself that such a relatively complex approach was a good strategy.
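If you do go the thread-pool route, one possible shape in Java is sketched below. Everything named here (`PooledLineProcessor`, `countTokens`, the batch size, the one-hour wait) is an assumption for illustration. Two choices are mine and not part of the advice above: lines are handed over in batches to cut per-task overhead, and the pool uses a bounded work queue with `CallerRunsPolicy`, so when the pool is saturated the reading thread processes a batch itself instead of piling up work in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example class: one reading thread feeding a fixed-size thread pool.
public class PooledLineProcessor {
    /** Reads source on the calling thread and counts blank-delimited atoms on a pool. */
    public static long countTokens(Reader source, int poolSize, int batchSize) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(poolSize * 2),        // bounded backlog of batches
                new ThreadPoolExecutor.CallerRunsPolicy());    // reader helps out when the backlog is full
        AtomicLong total = new AtomicLong();

        try (BufferedReader reader = new BufferedReader(source)) {
            List<String> batch = new ArrayList<>(batchSize);
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                if (batch.size() == batchSize) {
                    submit(pool, batch, total);
                    batch = new ArrayList<>(batchSize);        // never reuse a list a worker still holds
                }
            }
            if (!batch.isEmpty()) {
                submit(pool, batch, total);                    // last, partially filled batch
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            pool.shutdown();                                   // accept no new work, finish what is queued
        }

        try {
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return total.get();
    }

    // Stand-in for the real per-line work: counts the atoms in one batch of lines.
    private static void submit(ThreadPoolExecutor pool, List<String> batch, AtomicLong total) {
        pool.execute(() -> {
            long count = 0;
            for (String line : batch) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    count += trimmed.split("\\s+").length;
                }
            }
            total.addAndGet(count);
        });
    }

    public static void main(String[] args) {
        System.out.println(countTokens(new StringReader("a b\nc d e\nf\n"), 2, 2)); // prints 6
    }
}
```

Whether this beats a plain single-threaded loop depends on how expensive the per-line work is; for cheap work such as splitting on blanks, reading the file is usually the bottleneck, so it is worth measuring before committing to the extra complexity.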
 
LVL 13

Expert Comment

by:Hugh McCurdy
ID: 38835077
I'm guessing this is related to the question you have since posted about creating a large file with random data.

I'm pretty much with the others: just read the file.

This comment about collapse puzzles me. Are you worried about running out of RAM? If so, why? I'm wondering if this is more about memory leaks than something else. Can't tell.
 
LVL 12

Author Comment

by:jazzIIIlove
ID: 38847247
Ah,
Thanks for the strategies and comments. I think I resolved this.

Regards.