
Need help saving a large file

If you have used any of the main file-sharing apps out there, you will notice that most of them save a large empty file at the beginning of the download job. For example, if you are downloading a 1 GB file called bigfile.zip, they immediately save a 1 GB file called bigfile.zip.

What is this file made of, though? Zeros, maybe? Anyway, over the course of the download they write over this file with the real data as it arrives.

I am wondering how I should go about doing this, because I need to write an app that saves very big files.

Asked by: joshuadavidlee

der_jth commented:
Preallocate the space like this:

      using (FileStream fs = new FileStream(@"d:\test.dat", FileMode.Create)) {
        fs.SetLength(500000000); // Allocate 500 megs
      }

After that, just open a normal write handle onto the file and use Seek calls to write to the appropriate position in the file stream. As a rule, there are no guarantees across file systems about the initial file contents. I think NTFS makes them empty (zeros), but as far as I can tell you shouldn't count on any particular content. Make sure you keep track of what you've already written so that the resulting file won't be corrupted (this applies regardless of what the initial content is).
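A minimal sketch of the write step described above, assuming the file has already been preallocated as shown and that each downloaded chunk arrives together with its absolute byte offset. `ChunkWriter` and `WriteChunk` are hypothetical names; the calls used are the standard `System.IO.FileStream` `Seek` and `Write`.

```csharp
using System.IO;

public static class ChunkWriter
{
    // Hypothetical helper: writes one downloaded chunk at its absolute byte offset.
    // Assumes the file was already preallocated (e.g. with SetLength),
    // so this never truncates or recreates it.
    public static void WriteChunk(string path, long offset, byte[] data)
    {
        // FileMode.Open: fail if the file is missing instead of creating an empty one.
        // FileShare.ReadWrite: allow other handles to open the file and write other chunks.
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write, FileShare.ReadWrite))
        {
            fs.Seek(offset, SeekOrigin.Begin); // jump to the chunk's position
            fs.Write(data, 0, data.Length);    // overwrite the placeholder bytes
        }
    }
}
```

Opening a short-lived handle per chunk keeps the sketch simple; a real download manager might instead keep one handle open per worker thread.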

If you have further questions, just ask.
 
joshuadavidlee (Author) commented:
Thanks, yeah. I was wondering what you do if there is no log file of what you have written. I can't seem to find any log file in the main download apps out there, so it would be nice to know how to tell which portions have been written without a log file.

Any advice on that would be great.
 
der_jth commented:
Well... There are various approaches one could use. I don't know which ones the mainstream download managers use. Some viable alternatives include:

1) Keep the bookkeeping data (which parts have been written) in memory. Various data structures will do here. Of course, if your application has to survive process shutdowns, this won't be an option.

2) Write it to some sort of file, not necessarily one that is very visible to the user. The application can maintain such information but keep it in isolated storage (see <http://www.dotnetdevs.com/articles/IsolatedStorage.aspx>) or even just in a temp directory.

3) Keep the write log together with the data. For example, for a file of N bytes you could allocate N bytes plus room for the log, and write the log data after the actual file data. Once the file is completely downloaded, you just truncate the file at N bytes and you're done.
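A rough sketch of option 3, assuming a hypothetical layout of one status byte per chunk stored right after the N data bytes (this is not a standard format). On resume the app still has to know N and the chunk count from somewhere else, so the sketch simply takes them as parameters. `TrailingLog` and its methods are hypothetical names.

```csharp
using System.IO;

// Hypothetical layout (an assumption, not a standard format):
// [0, dataLength)                         = the real file data
// [dataLength, dataLength + chunkCount)   = one status byte per chunk (1 = written)
public static class TrailingLog
{
    public static void Create(string path, long dataLength, int chunkCount)
    {
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            // Reserve room for the data plus the trailing log.
            fs.SetLength(dataLength + chunkCount);
            // Explicitly zero the log region instead of relying on the file system.
            fs.Seek(dataLength, SeekOrigin.Begin);
            fs.Write(new byte[chunkCount], 0, chunkCount);
        }
    }

    // Call only after the chunk's data has actually been written.
    public static void MarkWritten(string path, long dataLength, int chunkIndex)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write))
        {
            fs.Seek(dataLength + chunkIndex, SeekOrigin.Begin);
            fs.WriteByte(1);
        }
    }

    public static bool[] ReadLog(string path, long dataLength, int chunkCount)
    {
        var status = new bool[chunkCount];
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            fs.Seek(dataLength, SeekOrigin.Begin);
            for (int i = 0; i < chunkCount; i++)
                status[i] = fs.ReadByte() == 1; // ReadByte returns -1 at end of file
        }
        return status;
    }

    // Once every chunk is present, cut off the log so only the real data remains.
    public static void Finish(string path, long dataLength)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write))
        {
            fs.SetLength(dataLength);
        }
    }
}
```

Writing the chunk data before its status byte matters: if the process dies in between, the chunk is merely downloaded twice rather than wrongly marked complete.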

Regardless of which strategy you pick, it's probably a good idea to store the "ready" information as an array of byte position ranges. You could have, for example,

      struct PositionRange { public long lowBound, highBound; }

and then store the readiness data as a PositionRange[] (or a List<PositionRange>, or whatever suits you). That way, you could have one position range indicating that bytes 0-1999 have been downloaded (and written to the file) and another stating that bytes 8000-24000 are ready as well. Then you can fairly easily calculate that bytes 2000-7999 still need to be retrieved, as well as everything after byte 24000. Of course, once you get bytes 6000-7999 downloaded, you'll want to merge the ranges 6000-7999 and 8000-24000 into 6000-24000, to avoid creating huge numbers of range objects and thus consuming memory. This'll be a very easy exercise anyway (compared to the other parts involved in creating a download manager).
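A sketch of that range bookkeeping, using the same two fields as the struct above. It assumes half-open ranges [lowBound, highBound), an illustrative choice that makes merging adjacent ranges simple: inclusive bytes 8000-24000 are stored as (8000, 24001). `RangeSet`, `Add` and `Missing` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Half-open byte range [lowBound, highBound); the exclusive upper bound is an assumption.
public struct PositionRange
{
    public long lowBound, highBound;
    public PositionRange(long low, long high) { lowBound = low; highBound = high; }
}

public class RangeSet
{
    // Kept sorted by lowBound; ranges never overlap or touch.
    private readonly List<PositionRange> ranges = new List<PositionRange>();

    public IReadOnlyList<PositionRange> Ranges => ranges;

    // Records [low, high) as written, merging it with any overlapping or adjacent ranges.
    public void Add(long low, long high)
    {
        var result = new List<PositionRange>();
        int i = 0;
        // Ranges that end before the new one starts stay as they are.
        while (i < ranges.Count && ranges[i].highBound < low) result.Add(ranges[i++]);
        // Absorb every range that overlaps or touches [low, high).
        while (i < ranges.Count && ranges[i].lowBound <= high)
        {
            low = Math.Min(low, ranges[i].lowBound);
            high = Math.Max(high, ranges[i].highBound);
            i++;
        }
        result.Add(new PositionRange(low, high));
        // Ranges after the merged one are copied unchanged.
        while (i < ranges.Count) result.Add(ranges[i++]);
        ranges.Clear();
        ranges.AddRange(result);
    }

    // Returns the gaps in [0, fileLength) that still need to be downloaded.
    public List<PositionRange> Missing(long fileLength)
    {
        var gaps = new List<PositionRange>();
        long pos = 0;
        foreach (var r in ranges)
        {
            if (r.lowBound > pos) gaps.Add(new PositionRange(pos, r.lowBound));
            pos = Math.Max(pos, r.highBound);
        }
        if (pos < fileLength) gaps.Add(new PositionRange(pos, fileLength));
        return gaps;
    }
}
```

With bytes 0-1999 and 8000-24000 recorded, `Missing(30000)` yields (2000, 8000) and (24001, 30000); adding (6000, 8000) afterwards collapses the second range into (6000, 24001).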
 
joshuadavidlee (Author) commented:
I like the one where you suggest writing the log to the end of the file.
OK, so I am downloading and saving in 1 MB chunks.
So there is no way of examining each offset in the big file to determine which ones have been written to, I guess?
I mean, if they were initially all zeros, how would you know whether the actual file is also meant to have all zeros at that particular offset?

 
der_jth commented:
Exactly. There's no way you can tell, so you just have to create some sort of metadata. If you always have 1 MB chunks, you could consider just allocating a bool array of sufficient size, with a single boolean for each 1 MB chunk. That way you could avoid a lot of hassle with the position ranges. Even a fairly large 5 GB file needs only about 5,000 booleans (5,120 if you count 1 GB as 1024 MB), which is hardly an issue for memory consumption :-)
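A sketch of the per-chunk bool array, assuming a fixed chunk size of 1 MB (1,048,576 bytes) where only the last chunk may be shorter. `ChunkMap` and its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-chunk status map: one bool per fixed-size chunk.
public class ChunkMap
{
    public const long ChunkSize = 1024 * 1024; // 1 MB chunks, as in the thread
    private readonly bool[] done;

    public long FileLength { get; }

    public ChunkMap(long fileLength)
    {
        FileLength = fileLength;
        // Round up so a partial final chunk still gets its own flag.
        long count = (fileLength + ChunkSize - 1) / ChunkSize;
        done = new bool[(int)count];
    }

    public int ChunkCount => done.Length;

    // Byte offset where a chunk starts in the file.
    public long OffsetOf(int chunk) => chunk * ChunkSize;

    // Length of a chunk; only the last one can be shorter than ChunkSize.
    public long LengthOf(int chunk) => Math.Min(ChunkSize, FileLength - OffsetOf(chunk));

    public void MarkDone(int chunk) => done[chunk] = true;

    // Indexes of chunks that still have to be downloaded.
    public IEnumerable<int> Pending()
    {
        for (int i = 0; i < done.Length; i++)
            if (!done[i]) yield return i;
    }
}
```

For a 5 GB file counted in binary units this allocates 5,120 flags, in line with the estimate above.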
 
joshuadavidlee (Author) commented:
Right, OK, thanks. I will attempt all this tomorrow and accept your answer then.
 
joshuadavidlee (Author) commented:
OK, so I got it all implemented. I would add that a log file isn't required for my app (and some other apps), because it uses a header file, a.k.a. a torrent file, that contains a list of hash codes, one for each offset. For broken downloads you can always recheck each offset against the proper hash code in the header.
 
der_jth commented:
I don't know about BitTorrent's hashing mechanisms, but it is at least theoretically possible for a block of the empty file to produce the same hash as the real data. In practice, this is highly unlikely. A more practical question is perhaps whether you need to distinguish faulty blocks from blocks that simply haven't been downloaded yet.
 
joshuadavidlee (Author) commented:
Well, let me tell you, the ENTIRE file-sharing community is based on the HOPE AND PRAYER that there will be no collisions when it comes to hashing lol

Anyway, all I do on startup of a resumed download is begin at the first offset and hash it to see if it's correct. If not, I start a download thread for that offset; if it is correct, I move on to the next offset.
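A sketch of that resume check, assuming the header supplies one SHA-1 hash per fixed-size chunk. BitTorrent v1 .torrent files use SHA-1 piece hashes; whether your own header format does the same is an assumption. `ResumeCheck`, `FindBadChunks` and its parameters are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class ResumeCheck
{
    // Returns the indexes of chunks whose on-disk bytes don't match the expected hash,
    // i.e. the chunks that still need a download thread.
    public static List<int> FindBadChunks(string path, long fileLength, int chunkSize,
                                          IList<byte[]> expectedHashes)
    {
        var bad = new List<int>();
        var buffer = new byte[chunkSize];
        using (var sha1 = SHA1.Create())
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            for (int i = 0; i < expectedHashes.Count; i++)
            {
                long offset = (long)i * chunkSize;
                // The last chunk may be shorter than chunkSize.
                int length = (int)Math.Min(chunkSize, fileLength - offset);
                fs.Seek(offset, SeekOrigin.Begin);

                // Read can return fewer bytes than requested, so loop until the chunk is full.
                int read = 0;
                while (read < length)
                {
                    int n = fs.Read(buffer, read, length - read);
                    if (n == 0) break; // file is shorter than expected
                    read += n;
                }

                byte[] actual = sha1.ComputeHash(buffer, 0, read);
                if (read != length || !actual.SequenceEqual(expectedHashes[i]))
                    bad.Add(i);
            }
        }
        return bad;
    }
}
```

As der_jth notes, this check cannot tell a corrupt chunk from one that was never downloaded; both simply come back as "needs downloading", which is usually all a resume needs.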