Solved

Need help saving a large file

Posted on 2006-10-28
Last Modified: 2010-05-18
If you have used any of the main file-sharing apps out there, you will notice that most of them save a large empty file at the beginning of the download job.
For example, if you are downloading a 1 GB file called bigfile.zip, they will save a 1 GB file called bigfile.zip immediately.

However, what is this file made of? Zeros, maybe? Anyway, over the course of the download they write over this file with the real data as it is downloaded.

I am wondering how I should go about doing this, because I need to write an app that saves very big files.
Question by:joshuadavidlee
 
LVL 6

Accepted Solution

by:
der_jth earned 500 total points
Preallocate the space like this:

      using (FileStream fs = new FileStream(@"d:\test.dat", FileMode.Create)) {
        fs.SetLength(500000000); // Allocate 500 megs
      }

After that, just open a normal write handle on the file and use Seek calls to write to the appropriate position in the file stream. As a rule, there are no file-system-wide guarantees about the contents of the preallocated region. I think NTFS zero-fills it, but as far as I can tell, the contents could be anything. Make sure you keep track of what you've already written so that the resulting file won't be corrupted (this applies regardless of what the initial content is).
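A minimal sketch of the seek-and-write step, assuming the file was already preallocated with SetLength as above. The class and method names (ChunkWriter, WriteChunk) are made up for illustration:

```csharp
using System;
using System.IO;

static class ChunkWriter
{
    // Writes one downloaded chunk at its byte offset inside an already preallocated file.
    public static void WriteChunk(string path, long offset, byte[] data)
    {
        // FileMode.Open keeps the existing contents; FileMode.Create would truncate the file.
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Write, FileShare.ReadWrite))
        {
            fs.Seek(offset, SeekOrigin.Begin); // jump to the chunk's position
            fs.Write(data, 0, data.Length);    // overwrite the placeholder bytes
        }
    }
}
```

Opening the file once per chunk keeps the sketch short; a real downloader would probably hold the stream open for the whole job.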

If you have further questions, just ask.
 

Author Comment

by:joshuadavidlee
Thanks. Yeah, I was wondering what you do if there is no log file of what you have written. I cannot seem to find any log file in the main downloading apps out there, so it would be nice to know how to tell which portions have been written without a log file.

Any advice on that would be great.
 
LVL 6

Expert Comment

by:der_jth
Well... There are various approaches one could use. I don't know which ones the mainstream download managers use. Some viable alternatives include:

1) Maintain the data in memory. Various data structures will do here. Of course, if your application has to survive process shutdowns, this won't be an option.

2) Write it to some sort of file, not necessarily one that is very visible to the user. It's quite possible for the application to maintain such information but keep it in isolated storage (see <http://www.dotnetdevs.com/articles/IsolatedStorage.aspx>) or even just a temp directory.

3) Keep the write log together with the data. For example, for a file of N bytes you could allocate somewhat more than N bytes and write the log data after the actual file data. Once the file is completely downloaded, you just truncate the file at N bytes and you're done.
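Option 3 could look roughly like the sketch below. It assumes fixed-size chunks and one flag byte per chunk stored directly after the N payload bytes. That layout and the names (TrailerLog, SaveFlags, LoadFlags, Finish) are illustrative choices, not something an existing download manager is known to use:

```csharp
using System;
using System.IO;

static class TrailerLog
{
    // Stores one "done" flag byte per chunk, starting right after the payload bytes.
    public static void SaveFlags(string path, long payloadLength, bool[] done)
    {
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Write))
        {
            fs.Seek(payloadLength, SeekOrigin.Begin);
            foreach (bool d in done)
                fs.WriteByte(d ? (byte)1 : (byte)0);
        }
    }

    // Reads the flags back; a missing or short trailer counts as "not downloaded".
    public static bool[] LoadFlags(string path, long payloadLength, int chunkCount)
    {
        bool[] done = new bool[chunkCount];
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            fs.Seek(payloadLength, SeekOrigin.Begin);
            for (int i = 0; i < chunkCount; i++)
                done[i] = fs.ReadByte() == 1; // ReadByte returns -1 at end of file
        }
        return done;
    }

    // Cuts the trailer off once every chunk has been written.
    public static void Finish(string path, long payloadLength)
    {
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Write))
            fs.SetLength(payloadLength);
    }
}
```

One drawback of this layout is that the file on disk is slightly larger than the final file until Finish runs.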

Regardless of which strategy you pick, it's probably a good idea to store the "ready information" as an array of byte position ranges. You could have, for example,
struct PositionRange { public long lowBound, highBound; }

and then store the readiness data as a PositionRange[] (or a List<PositionRange> or whatever suits you). That way, you could have one position range indicating that bytes 1-2000 have been downloaded (and written to the file) and another one stating that bytes 8000-24000 are ready as well. Then you can reasonably easily calculate that bytes 2001-7999 still need to be retrieved, as well as all bytes after position 24000. Of course, once you get bytes 6000-7999 downloaded, you'll probably want to merge the ranges 6000-7999 and 8000-24000 into 6000-24000 to avoid creating huge numbers of range objects and thus consuming memory. This will be a very easy exercise anyway (compared to the other parts involved in creating a download manager).
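The merge step could be sketched as follows, assuming both bounds are inclusive and the list is kept sorted by lowBound. The constructor and the RangeTracker helper are additions for illustration:

```csharp
using System;
using System.Collections.Generic;

struct PositionRange
{
    public long lowBound, highBound; // both inclusive (an assumption)
    public PositionRange(long low, long high) { lowBound = low; highBound = high; }
}

static class RangeTracker
{
    // Adds a finished range to a sorted list, merging it with overlapping or adjacent ranges.
    public static void Add(List<PositionRange> ranges, PositionRange added)
    {
        List<PositionRange> result = new List<PositionRange>();
        bool inserted = false;
        foreach (PositionRange r in ranges)
        {
            if (r.highBound + 1 < added.lowBound)
            {
                result.Add(r); // lies entirely before the new range
            }
            else if (added.highBound + 1 < r.lowBound)
            {
                // lies entirely after the new range: emit the new range first, once
                if (!inserted) { result.Add(added); inserted = true; }
                result.Add(r);
            }
            else
            {
                // overlapping or touching: grow the new range to cover both
                added = new PositionRange(Math.Min(r.lowBound, added.lowBound),
                                          Math.Max(r.highBound, added.highBound));
            }
        }
        if (!inserted) result.Add(added);
        ranges.Clear();
        ranges.AddRange(result);
    }
}
```

With the ranges 1-2000 and 8000-24000 in the list, adding 6000-7999 leaves two entries: 1-2000 and 6000-24000.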
 

Author Comment

by:joshuadavidlee
I like the one where you suggest writing the log to the end of the file.
OK, so I am downloading and saving in 1 MB chunks.
So there is no way of examining each offset in the big file to determine which ones have been written to, I guess?
I mean, if they were all zeros initially, how would you know whether the actual file is meant to have all zeros at that particular offset?

 
LVL 6

Expert Comment

by:der_jth
Exactly. There's no way you can tell, so you just have to create some sort of metadata. If you always have 1 MB chunks, you could consider just allocating a bool array of sufficient size, with a single boolean per 1 MB chunk. That way you could avoid a lot of hassle with the position ranges. Even with a reasonably sizable 5 GB file, you would only allocate about 5000 booleans - hardly a memory concern :-)
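A small sketch of the bool-array idea, assuming 1 MB means 1024 * 1024 bytes and that a shorter last chunk still gets its own flag. ChunkMap and its members are hypothetical names:

```csharp
using System;

static class ChunkMap
{
    public const int ChunkSize = 1024 * 1024; // 1 MB chunks, as in the thread

    // One flag per chunk; the division rounds up so a partial last chunk is counted.
    public static bool[] Create(long fileLength)
    {
        return new bool[(int)((fileLength + ChunkSize - 1) / ChunkSize)];
    }

    // Index of the first chunk that still has to be downloaded, or -1 if all are done.
    public static int NextMissing(bool[] done)
    {
        for (int i = 0; i < done.Length; i++)
            if (!done[i]) return i;
        return -1;
    }

    // Byte offset in the big file where the given chunk starts.
    public static long OffsetOf(int chunkIndex)
    {
        return (long)chunkIndex * ChunkSize;
    }
}
```

With binary megabytes, a file of exactly 5 GB (5 * 1024 * 1024 * 1024 bytes) needs 5120 flags, in line with the rough figure above.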
 

Author Comment

by:joshuadavidlee
Right, OK, thanks. I will attempt all of this tomorrow and then accept your answer.
 

Author Comment

by:joshuadavidlee
OK, so I got it all implemented. My app, like some other apps, uses a header file (a.k.a. torrent file) that contains a list of hash codes, one for each offset. So a log file is not required: for broken downloads you can always recheck each offset against the proper hash code in the header.
 
LVL 6

Expert Comment

by:der_jth
I don't know about Torrent's hashing mechanisms, but it is at least theoretically possible for a block of the empty file to produce the same hash as the real data. This is, in practice, highly unlikely. A more practical point is, perhaps, whether it is necessary to distinguish faulty blocks from blocks that have not been downloaded yet.
 

Author Comment

by:joshuadavidlee
Well, let me tell you, the ENTIRE file-sharing community is based on the HOPE AND PRAYER that there will be no collisions when it comes to hashing, lol.

Anyway, all I do on startup of a resumed download is begin at the first offset and hash it to see if it's correct. If not, I start a download thread for that offset; if it is correct, I move on to the next offset.
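A rough sketch of that startup check, assuming fixed-size chunks and one SHA-1 hash per chunk taken from the header file. The hash algorithm, the expectedHashes array, and the ResumeChecker name are assumptions for illustration; parsing the header is not shown:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

static class ResumeChecker
{
    // Returns the indexes of chunks whose on-disk hash does not match the expected hash.
    public static List<int> FindChunksToDownload(string path, int chunkSize, byte[][] expectedHashes)
    {
        List<int> toDownload = new List<int>();
        byte[] buffer = new byte[chunkSize];
        using (SHA1 sha1 = SHA1.Create())
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            for (int i = 0; i < expectedHashes.Length; i++)
            {
                // Read may return fewer bytes than requested, so loop until the chunk is full or EOF.
                int read = 0;
                while (read < chunkSize)
                {
                    int n = fs.Read(buffer, read, chunkSize - read);
                    if (n == 0) break;
                    read += n;
                }
                byte[] actual = sha1.ComputeHash(buffer, 0, read);
                if (!SameBytes(actual, expectedHashes[i]))
                    toDownload.Add(i); // wrong or never written: fetch it again
            }
        }
        return toDownload;
    }

    static bool SameBytes(byte[] a, byte[] b)
    {
        if (a.Length != b.Length) return false;
        for (int i = 0; i < a.Length; i++)
            if (a[i] != b[i]) return false;
        return true;
    }
}
```

This treats corrupted chunks and chunks that were never downloaded the same way, which matches the approach described above.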