We help IT Professionals succeed at work.

We've partnered with Certified Experts, Carl Webster and Richard Faulkner, to bring you two Citrix podcasts. Learn about 2020 trends and get answers to your biggest Citrix questions!Listen Now

x

Need help saving a large file

joshuadavidlee
on
Medium Priority
243 Views
Last Modified: 2010-05-18
if you have used any of the main file sharing apps out there you will notice how most of them save a large empty file at the begining of the download job.
For example if you are downloading a 1 Gb file called bigfile.zip they will save a 1 GB file called bigfile.zip immediatly

however this file is made of what? zeros maybe? anyways over the course of the download they right over this file with the real data as it is downloaded.

I am wondering how i should go about doing this because i need to write a app that saves big big files.
Comment
Watch Question

Commented:
Preallocate the space like this:

      using (FileStream fs = new FileStream(@"d:\test.dat", FileMode.Create)) {
        fs.SetLength(500000000); // Allocate 500 megs
      }

After that, just open a normal write handle onto the file and use Seek calls to write to the appropriate position in the created file stream. As a rule, there are no file-system wide guarantees on the file contents. I think NTFS makes them empty, but as far as I can tell, results could be anything. Make sure you keep track on what you've already written so that the resultant file won't be corrupted (this applies regardless of what the initial content is).

If you have further questions, just ask.

Not the solution you were looking for? Getting a personalized solution is easy.

Ask the Experts

Author

Commented:
thanks, yeah i was wondering what u do if there is no log file of what u have written, i can not seem to find any log file in the main downloading apps out there, so it would be nice to know how to tell what portions have been written without a log file

any advice on that would be great

Commented:
Well... There are various approaches one could use. I don't know which ones the mainstream download managers use. Some viable alternatives include:

1) Maintain the data in memory. Various data structures will do here. Of course, if your application has to survive process shutdowns, this won't be an option.

2) Write it in some sort of a file, not necessarily very visible to the user. It's quite possible for the application to maintain such an information, but keep it in isolated storage (see <http://www.dotnetdevs.com/articles/IsolatedStorage.aspx>) or even just a temp directory.

3) Keep the write logs together with the data. For example, you could allocate N bytes for a file of N byte size, but then write the log data after the actual file data. Once the file is completely downloaded, you just truncate the file at N bytes and you're ready.

Regardless of which strategy you pick, it's probably a good idea to store the "ready information" as an array of byte position ranges. You could have, for example,
struct PositionRange { public long lowBound, highBound; }

and then store the readiness data as a PositionRange[] (or a List<PositionRange> or whatever suits you). That way, you could have a position range indicating that bytes 1-2000 have been downloaded (and written to the file) and another one stating that bytes 8000-24000 are ready as well. Then you can reasonably easily calculate that bytes 2000-8000 need to be retrieved, as well as all bytes after position 24000. Of course, once you get bytes 6000-7999 downloaded, you'll probably want to merge the ranges 6000-7999 and 8000-24000 to 6000-24000 to avoid creating huge numbers of range objects and thus consuming memory. This'll be a very easy exercise anyway (compared to the other parts involved in creating a download manager).

Author

Commented:
i like the one where u suggest writing the log to the end of the file perhaps
ok so  i am downloading and saving in 1MB chunks.
so there is no way of examining each offset in the big file to determine which ones have been written to i guess right?
i mean if they were initially all zeros how do you know the actual file itself meant to have all zeros also in that particular offset?

Commented:
Exactly. There's no way you can tell, so you just have to create some sort of metadata. If you always have 1 MB chunks, you could consider just allocating a bool array of sufficient size, indicating each 1 MB chunk with a single boolean. That way you could avoid a lot of hassle with the position ranges. Even with a reasonably sizeful 5 GB file, you would only allocate 5000 booleans - hardly an issue about memory consumption :-)

Author

Commented:
right ok thanks i will attemp this all tomorrow and then accept your answer then

Author

Commented:
ok so i got it all implemented, and i would say that because my app and some apps use a header file aka torret file and it contains a list of hashcodes for each offset therefore a log file is not required because for broken downloads you can always recheck each offset againt the properhashcode in the header

Commented:
I don't know about Torrent's hashing mechanisms, but it is at least theoretically possible for the block of the empty file to produce the same hash as the real data. This is, in practice, highly unlikely. A more practical point is, perhaps, the question of whether or not separating faulty blocks from non-downloaded ones is necessary.

Author

Commented:
well let me tell you the ENTIRE filesharing community is based on the HOPE AND PRAYER that there will be no collissions when it comes to hashing lol

anyways all i do on startup of a resumed download is begin at the first offest and hash ti to see if its correct, if not i start a download thread for that offset, if it is correct then i move on to the next offset
Access more of Experts Exchange with a free account
Thanks for using Experts Exchange.

Create a free account to continue.

Limited access with a free account allows you to:

  • View three pieces of content (articles, solutions, posts, and videos)
  • Ask the experts questions (counted toward content limit)
  • Customize your dashboard and profile

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

OR

Please enter a first name

Please enter a last name

8+ characters (letters, numbers, and a symbol)

By clicking, you agree to the Terms of Use and Privacy Policy.