Solved

Need help saving a large file

Posted on 2006-10-28
9
216 Views
Last Modified: 2010-05-18
if you have used any of the main file sharing apps out there you will notice how most of them save a large empty file at the begining of the download job.
For example if you are downloading a 1 Gb file called bigfile.zip they will save a 1 GB file called bigfile.zip immediatly

however this file is made of what? zeros maybe? anyways over the course of the download they right over this file with the real data as it is downloaded.

I am wondering how i should go about doing this because i need to write a app that saves big big files.
0
Comment
Question by:joshuadavidlee
  • 5
  • 4
9 Comments
 
LVL 6

Accepted Solution

by:
der_jth earned 500 total points
ID: 17828190
Preallocate the space like this:

      using (FileStream fs = new FileStream(@"d:\test.dat", FileMode.Create)) {
        fs.SetLength(500000000); // Allocate 500 megs
      }

After that, just open a normal write handle onto the file and use Seek calls to write to the appropriate position in the created file stream. As a rule, there are no file-system wide guarantees on the file contents. I think NTFS makes them empty, but as far as I can tell, results could be anything. Make sure you keep track on what you've already written so that the resultant file won't be corrupted (this applies regardless of what the initial content is).

If you have further questions, just ask.
0
 

Author Comment

by:joshuadavidlee
ID: 17828192
thanks, yeah i was wondering what u do if there is no log file of what u have written, i can not seem to find any log file in the main downloading apps out there, so it would be nice to know how to tell what portions have been written without a log file

any advice on that would be great
0
 
LVL 6

Expert Comment

by:der_jth
ID: 17828220
Well... There are various approaches one could use. I don't know which ones the mainstream download managers use. Some viable alternatives include:

1) Maintain the data in memory. Various data structures will do here. Of course, if your application has to survive process shutdowns, this won't be an option.

2) Write it in some sort of a file, not necessarily very visible to the user. It's quite possible for the application to maintain such an information, but keep it in isolated storage (see <http://www.dotnetdevs.com/articles/IsolatedStorage.aspx>) or even just a temp directory.

3) Keep the write logs together with the data. For example, you could allocate N bytes for a file of N byte size, but then write the log data after the actual file data. Once the file is completely downloaded, you just truncate the file at N bytes and you're ready.

Regardless of which strategy you pick, it's probably a good idea to store the "ready information" as an array of byte position ranges. You could have, for example,
struct PositionRange { public long lowBound, highBound; }

and then store the readiness data as a PositionRange[] (or a List<PositionRange> or whatever suits you). That way, you could have a position range indicating that bytes 1-2000 have been downloaded (and written to the file) and another one stating that bytes 8000-24000 are ready as well. Then you can reasonably easily calculate that bytes 2000-8000 need to be retrieved, as well as all bytes after position 24000. Of course, once you get bytes 6000-7999 downloaded, you'll probably want to merge the ranges 6000-7999 and 8000-24000 to 6000-24000 to avoid creating huge numbers of range objects and thus consuming memory. This'll be a very easy exercise anyway (compared to the other parts involved in creating a download manager).
0
Networking for the Cloud Era

Join Microsoft and Riverbed for a discussion and demonstration of enhancements to SteelConnect:
-One-click orchestration and cloud connectivity in Azure environments
-Tight integration of SD-WAN and WAN optimization capabilities
-Scalability and resiliency equal to a data center

 

Author Comment

by:joshuadavidlee
ID: 17828229
i like the one where u suggest writing the log to the end of the file perhaps
ok so  i am downloading and saving in 1MB chunks.
so there is no way of examining each offset in the big file to determine which ones have been written to i guess right?
i mean if they were initially all zeros how do you know the actual file itself meant to have all zeros also in that particular offset?

0
 
LVL 6

Expert Comment

by:der_jth
ID: 17828255
Exactly. There's no way you can tell, so you just have to create some sort of metadata. If you always have 1 MB chunks, you could consider just allocating a bool array of sufficient size, indicating each 1 MB chunk with a single boolean. That way you could avoid a lot of hassle with the position ranges. Even with a reasonably sizeful 5 GB file, you would only allocate 5000 booleans - hardly an issue about memory consumption :-)
0
 

Author Comment

by:joshuadavidlee
ID: 17828256
right ok thanks i will attemp this all tomorrow and then accept your answer then
0
 

Author Comment

by:joshuadavidlee
ID: 17838172
ok so i got it all implemented, and i would say that because my app and some apps use a header file aka torret file and it contains a list of hashcodes for each offset therefore a log file is not required because for broken downloads you can always recheck each offset againt the properhashcode in the header
0
 
LVL 6

Expert Comment

by:der_jth
ID: 17840018
I don't know about Torrent's hashing mechanisms, but it is at least theoretically possible for the block of the empty file to produce the same hash as the real data. This is, in practice, highly unlikely. A more practical point is, perhaps, the question of whether or not separating faulty blocks from non-downloaded ones is necessary.
0
 

Author Comment

by:joshuadavidlee
ID: 17840028
well let me tell you the ENTIRE filesharing community is based on the HOPE AND PRAYER that there will be no collissions when it comes to hashing lol

anyways all i do on startup of a resumed download is begin at the first offest and hash ti to see if its correct, if not i start a download thread for that offset, if it is correct then i move on to the next offset
0

Featured Post

Free Tool: Port Scanner

Check which ports are open to the outside world. Helps make sure that your firewall rules are working as intended.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction This article series is supposed to shed some light on the use of IDisposable and objects that inherit from it. In essence, a more apt title for this article would be: using (IDisposable) {}. I’m just not sure how many people would ge…
This article is for Object-Oriented Programming (OOP) beginners. An Interface contains declarations of events, indexers, methods and/or properties. Any class which implements the Interface should provide the concrete implementation for each Inter…
Microsoft Active Directory, the widely used IT infrastructure, is known for its high risk of credential theft. The best way to test your Active Directory’s vulnerabilities to pass-the-ticket, pass-the-hash, privilege escalation, and malware attacks …

860 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question