

Need help saving a large file

Posted on 2006-10-28
Medium Priority
Last Modified: 2010-05-18
If you have used any of the main file sharing apps out there, you will notice how most of them save a large empty file at the beginning of the download job.
For example, if you are downloading a 1 GB file called bigfile.zip, they will save a 1 GB file called bigfile.zip immediately.

However, what is this file made of? Zeros, maybe? Anyway, over the course of the download they write over this file with the real data as it is downloaded.

I am wondering how I should go about doing this, because I need to write an app that saves very big files.
Question by:joshuadavidlee

Accepted Solution

der_jth earned 2000 total points
ID: 17828190
Preallocate the space like this:

      // Requires: using System.IO;
      using (FileStream fs = new FileStream(@"d:\test.dat", FileMode.Create)) {
        fs.SetLength(500000000); // Allocate 500 megs (500,000,000 bytes)
      }

After that, just open a normal write handle on the file and use Seek calls to write at the appropriate position in the file stream. As a rule, there are no file-system-wide guarantees about the contents of the newly allocated space. I think NTFS zero-fills it, but as far as I can tell, the contents could be anything. Make sure you keep track of what you've already written so that the resulting file won't be corrupted (this applies regardless of what the initial content is).
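
The preallocate-then-seek approach described above could look roughly like the following. This is only a sketch: the class name PreallocatedFile and the helpers Preallocate and WriteAt are made up for illustration, and the file is reopened for every block purely to keep the example short.

```csharp
using System;
using System.IO;

static class PreallocatedFile
{
    // Create the file at its final size. The contents of the new space are not guaranteed.
    public static void Preallocate(string path, long length)
    {
        using (FileStream fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            fs.SetLength(length);
        }
    }

    // Write one downloaded block at its byte offset inside the preallocated file.
    public static void WriteAt(string path, long offset, byte[] data)
    {
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Write))
        {
            fs.Seek(offset, SeekOrigin.Begin);
            fs.Write(data, 0, data.Length);
        }
    }
}
```

In a real downloader you would probably keep one FileStream open for the whole job instead of reopening the file per block.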

If you have further questions, just ask.

Author Comment

ID: 17828192
Thanks. Yeah, I was wondering what you do if there is no log file of what you have written. I cannot seem to find any log file in the main downloading apps out there, so it would be nice to know how to tell which portions have been written without a log file.

any advice on that would be great

Expert Comment

ID: 17828220
Well... There are various approaches one could use. I don't know which ones the mainstream download managers use. Some viable alternatives include:

1) Maintain the data in memory. Various data structures will do here. Of course, if your application has to survive process shutdowns, this won't be an option.

2) Write it to some sort of file, not necessarily very visible to the user. It's quite possible for the application to maintain such information, but keep it in isolated storage (see <http://www.dotnetdevs.com/articles/IsolatedStorage.aspx>) or even just a temp directory.

3) Keep the write logs together with the data. For example, you could allocate N bytes for an N-byte file, but then write the log data after the actual file data (which makes the file temporarily longer than N bytes). Once the file is completely downloaded, you just truncate the file at N bytes and you're done.
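
Option 3 might be sketched as below. The layout is an assumption made up for this example, not what any particular download manager does: one flag byte per chunk is stored directly after the N data bytes, and the names TrailerLog, Create, MarkDone, IsDone and Finish are hypothetical.

```csharp
using System;
using System.IO;

static class TrailerLog
{
    // Preallocate dataLength bytes, followed by one flag byte per chunk.
    public static void Create(string path, long dataLength, int chunkCount)
    {
        using (FileStream fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            fs.SetLength(dataLength);
            fs.Seek(dataLength, SeekOrigin.Begin);
            // Write explicit zeros so every flag starts out as "not done".
            fs.Write(new byte[chunkCount], 0, chunkCount);
        }
    }

    // Set the flag for one chunk. Call this only after the chunk data has been written.
    public static void MarkDone(string path, long dataLength, int chunkIndex)
    {
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Write))
        {
            fs.Seek(dataLength + chunkIndex, SeekOrigin.Begin);
            fs.WriteByte(1);
        }
    }

    public static bool IsDone(string path, long dataLength, int chunkIndex)
    {
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            fs.Seek(dataLength + chunkIndex, SeekOrigin.Begin);
            return fs.ReadByte() == 1;
        }
    }

    // Download complete: cut the log off so only the N data bytes remain.
    public static void Finish(string path, long dataLength)
    {
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Write))
        {
            fs.SetLength(dataLength);
        }
    }
}
```

The application still has to know N (the real data length) from somewhere else, for example from the download request itself, in order to find the log again after a restart.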

Regardless of which strategy you pick, it's probably a good idea to store the "ready information" as an array of byte position ranges. You could have, for example,
struct PositionRange { public long lowBound, highBound; }

and then store the readiness data as a PositionRange[] (or a List<PositionRange> or whatever suits you). That way, you could have a position range indicating that bytes 1-2000 have been downloaded (and written to the file) and another one stating that bytes 8000-24000 are ready as well. Then you can reasonably easily calculate that bytes 2001-7999 need to be retrieved, as well as all bytes after position 24000. Of course, once you get bytes 6000-7999 downloaded, you'll probably want to merge the ranges 6000-7999 and 8000-24000 into 6000-24000 to avoid creating huge numbers of range objects and thus consuming memory. This'll be a very easy exercise anyway (compared to the other parts involved in creating a download manager).
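
The merging step could be sketched like this. It reuses the PositionRange struct from above, with a constructor added for convenience, and assumes both bounds are inclusive and that the list is kept sorted by lowBound. The RangeLog class and its Add method are names invented for this example.

```csharp
using System;
using System.Collections.Generic;

struct PositionRange
{
    public long lowBound, highBound; // inclusive byte positions
    public PositionRange(long low, long high) { lowBound = low; highBound = high; }
}

static class RangeLog
{
    // Insert a finished range, merging it with any ranges it overlaps or touches.
    // 'ranges' is assumed to be sorted by lowBound and free of overlaps.
    public static void Add(List<PositionRange> ranges, PositionRange added)
    {
        List<PositionRange> result = new List<PositionRange>();
        bool placed = false;
        foreach (PositionRange r in ranges)
        {
            if (r.highBound + 1 < added.lowBound)
            {
                result.Add(r); // entirely before the new range
            }
            else if (added.highBound + 1 < r.lowBound)
            {
                if (!placed) { result.Add(added); placed = true; }
                result.Add(r); // entirely after the new range
            }
            else
            {
                // Overlapping or adjacent: absorb r into the new range.
                added.lowBound = Math.Min(added.lowBound, r.lowBound);
                added.highBound = Math.Max(added.highBound, r.highBound);
            }
        }
        if (!placed) result.Add(added);
        ranges.Clear();
        ranges.AddRange(result);
    }
}
```

Starting from 1-2000 and 8000-24000, adding 6000-7999 leaves the two ranges 1-2000 and 6000-24000. The gaps between consecutive ranges are then exactly the bytes that still need to be fetched.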



Author Comment

ID: 17828229
I like the one where you suggest writing the log to the end of the file, perhaps.
OK, so I am downloading and saving in 1 MB chunks.
So there is no way of examining each offset in the big file to determine which ones have been written to, I guess, right?
I mean, if they were initially all zeros, how do you know whether the actual file itself was meant to have all zeros at that particular offset?


Expert Comment

ID: 17828255
Exactly. There's no way you can tell, so you just have to create some sort of metadata. If you always have 1 MB chunks, you could consider just allocating a bool array of sufficient size, indicating each 1 MB chunk with a single boolean. That way you could avoid a lot of hassle with the position ranges. Even with a reasonably sizable 5 GB file, you would only allocate roughly 5000 booleans - hardly an issue for memory consumption :-)
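
A minimal sketch of that bool array, assuming fixed 1 MB chunks with a possibly shorter last chunk. The ChunkMap class and its members are illustrative names only.

```csharp
using System;

class ChunkMap
{
    public const long ChunkSize = 1024 * 1024; // 1 MB chunks, as in this thread
    private readonly bool[] done;

    public ChunkMap(long fileLength)
    {
        // Round up so a short final chunk still gets its own flag.
        long count = (fileLength + ChunkSize - 1) / ChunkSize;
        done = new bool[count];
    }

    public int Count { get { return done.Length; } }
    public void MarkDone(int chunk) { done[chunk] = true; }
    public bool IsDone(int chunk) { return done[chunk]; }

    // Index of the first chunk still missing, or -1 when everything is downloaded.
    public int NextMissing()
    {
        for (int i = 0; i < done.Length; i++)
        {
            if (!done[i]) return i;
        }
        return -1;
    }
}
```

With 1 MB taken as 1024 * 1024 bytes, a 5 GB file needs 5120 flags. The array only lives in memory, so it has to be persisted somewhere if the download must survive a restart.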

Author Comment

ID: 17828256
Right, OK, thanks. I will attempt this all tomorrow and then accept your answer.

Author Comment

ID: 17838172
OK, so I got it all implemented. I would say that because my app (like some other apps) uses a header file, a.k.a. a torrent file, which contains a list of hash codes for each offset, a log file is not required: for broken downloads you can always recheck each offset against the proper hash code in the header.

Expert Comment

ID: 17840018
I don't know about Torrent's hashing mechanisms, but it is at least theoretically possible for the block of the empty file to produce the same hash as the real data. This is, in practice, highly unlikely. A more practical point is, perhaps, the question of whether or not separating faulty blocks from non-downloaded ones is necessary.

Author Comment

ID: 17840028
Well, let me tell you, the ENTIRE file sharing community is based on the HOPE AND PRAYER that there will be no collisions when it comes to hashing lol

Anyway, all I do on startup of a resumed download is begin at the first offset and hash it to see if it's correct. If not, I start a download thread for that offset; if it is correct, I move on to the next offset.
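
The resume check described above might be sketched as follows. It assumes SHA-1 hashes, one per fixed-size chunk, already loaded from the header into a byte[][]. The ResumeCheck class and FindBadChunks method are hypothetical names, and instead of starting a download thread the sketch simply returns the indexes of the chunks that need to be fetched again.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

static class ResumeCheck
{
    // Returns the indexes of all chunks whose hash does not match the expected one.
    public static List<int> FindBadChunks(string path, int chunkSize, byte[][] expectedHashes)
    {
        List<int> bad = new List<int>();
        byte[] buffer = new byte[chunkSize];
        using (SHA1 sha = SHA1.Create())
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            for (int i = 0; i < expectedHashes.Length; i++)
            {
                fs.Seek((long)i * chunkSize, SeekOrigin.Begin);
                // Read may return fewer bytes than requested, so loop until the chunk is full
                // or the end of the file is reached (the last chunk may be short).
                int read = 0, n;
                while (read < chunkSize && (n = fs.Read(buffer, read, chunkSize - read)) > 0)
                {
                    read += n;
                }
                byte[] actual = sha.ComputeHash(buffer, 0, read);
                if (!SameBytes(actual, expectedHashes[i])) bad.Add(i);
            }
        }
        return bad;
    }

    private static bool SameBytes(byte[] a, byte[] b)
    {
        if (a.Length != b.Length) return false;
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i]) return false;
        }
        return true;
    }
}
```

This rehashes the whole file on every resume, which can take a while for very large files; that is the price of not keeping a separate log.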


Question has a verified solution.
