frize asked:

Calc MD5 before downloading

Hi, I use Indy's 'idhttp' to download files, but here is what I want:

Before downloading a file from the server, I want to check its MD5, because I must be sure the file has not already been downloaded and saved.

Suppose the following scenario:

Yesterday I downloaded x0.myfile, whose MD5 is 76ba2f8c4deb6f5cc9275dbdb63fd811, and today I want to download other files, among them x0.myfile. When the download process starts, it should skip x0.myfile because it was already downloaded yesterday.

Yes, I could use FileExists('x0.myfile') or compare x0.myfile's size and then skip it, but here I want to work with MD5.

I put this file, update.ini, on my remote server:

update.ini

[File01]
Name=x0.myfile
MD5=76ba2f8c4deb6f5cc9275dbdb63fd811
[File02]
Name=x1.myfile
MD5=d908cd707036920c642cf3652c044e07
.
.
So only the file x1.myfile will be downloaded.

Thank you.

aikimark

In order to calculate a file's MD5 hash, you need to read the file's contents. That would require your download program to download the file, which is what you are trying to avoid.

I think what is needed is to calculate the MD5 hash on the server and download just the MD5 hash values.  These can be compared to the MD5 hash values from the prior downloads or current MD5 hash values of the files (if the user is allowed to change the files).
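
Fetching only the hash list could look roughly like the sketch below. It assumes the hashes live in a small text file such as update.ini on the server; `DownloadManifestText` and the URL in the usage note are hypothetical names, not something from this thread.

```pascal
uses
  Classes, SysUtils, IdHTTP;

// Downloads the small manifest (the list of MD5 values), not the payload
// files themselves, and returns its text.
// Hypothetical helper: the name and the parameter are assumptions.
function DownloadManifestText(const ManifestURL: string): string;
var
  Http: TIdHTTP;
begin
  Http := TIdHTTP.Create(nil);
  try
    // TIdHTTP.Get raises an exception on HTTP errors such as 404
    Result := Http.Get(ManifestURL);
  finally
    Http.Free;
  end;
end;
```

A call such as `Lines.Text := DownloadManifestText('http://example.com/update.ini');` (placeholder URL) would then put the manifest into a TStringList for parsing.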

Add the TPLockbox library to your project:
http://sourceforge.net/projects/tplockbox/
As aikimark said, computing an MD5 requires the program to have the file's contents, because the hash is built from every byte of the file.
Therefore you need:

1) to store on the server a separate file that contains the MD5 of each uploaded file, and download only that file (update.ini).
2) (the first time) for each filename, to compare the MD5 stored in the downloaded text file with the MD5 of the actual file you have locally.
3) once you have performed the test, to download every file whose MD5 doesn't match the value in the downloaded ini file.
4) to write a local file that contains the MD5 of all the local files, which now match the files listed in the ini you downloaded.

5) on any subsequent check (other than the first), to save time by comparing the downloaded MD5 file with the ini you saved as "local" during the previous update. That way, from the second time on, you compare nothing but two strings per file (the first time you have to open each local file to check its contents against the downloaded MD5 value; after that you simply compare the last saved MD5 value of your local file with the server's). The check itself then takes almost no time, and time is spent only on the download.
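
For step 2, the MD5 of a local file has to be computed from its bytes. Here is a minimal sketch, assuming Indy 10 is installed (the asker already uses Indy) and using Indy's hash class instead of the TPLockbox library suggested above. `FileMD5` is a hypothetical helper name, and on Indy 9 the hashing call is different.

```pascal
uses
  Classes, SysUtils, IdHashMessageDigest;

// Returns the MD5 of a local file as a lowercase hex string.
// Assumption: Indy 10, where TIdHashMessageDigest5 offers HashStreamAsHex.
function FileMD5(const FileName: string): string;
var
  Hasher: TIdHashMessageDigest5;
  Stream: TFileStream;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Hasher := TIdHashMessageDigest5.Create;
    try
      // Lowercase so the result compares cleanly with the values in update.ini
      Result := LowerCase(Hasher.HashStreamAsHex(Stream));
    finally
      Hasher.Free;
    end;
  finally
    Stream.Free;
  end;
end;
```

The result can be compared with the MD5 string from the downloaded ini using `SameText`, which ignores differences in letter case.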
frize (ASKER)


@ioannisa: could you please give me an example of comparing the files' MD5 values? I can compare 2 files, but not more than that.



function Comp_MD5(Fl_, _Fl: String): Boolean;
var
  Str, Strs: TStringList;
begin
  Result := False;
  {
    Str gets its values from a locally saved file
    named saved.txt, whose contents are ** Only MD5 **:

    76ba2f8c4deb6f5cc9275dbdb63fd811  --> x0.myfile
    6e96992103ab0af8b88a1ce2de73280d  --> x1.myfile
    0efa8efe1adeca5b1d60a0b2536d9d13  --> x2.myfile
  }
  Str := TStringList.Create;
  {
    Strs gets its values from update.ini
    after it has been downloaded and its MD5 values have been parsed:

    76ba2f8c4deb6f5cc9275dbdb63fd811  --> x0.myfile
    6e96992103ab0af8b88a1ce2de73280d  --> x1.myfile
    0efa8efe1adeca5b1d60a0b2536d9d13  --> x2.myfile
    037dcf77a1ea143f865e6fe5c70e1581  --> x3.myfile
    d908cd707036920c642cf3652c044e07  --> x4.myfile
  }
  Strs := TStringList.Create;
  try
    Str.LoadFromFile('saved.txt');
    // How should the comparison routine between Str and Strs look here?
    // if <the comparison succeeds> then
    //   Result := True;
  finally
    Str.Free;
    Strs.Free;
  end;
end;


frize (ASKER)

So, as you can see, normally only the files x3.myfile and x4.myfile will be downloaded.

>>** Only MD5 **

I'm a bit confused by this, since the example line
76ba2f8c4deb6f5cc9275dbdb63fd811  --> x0.myfile

contains a non-MD5 value "  --> x0.myfile"

===========
One would iterate the items in the shorter list (I assume that to be the update.ini sourced list) and then do a lookup and compare for the matching item in the larger list.

If you don't find a match, that would identify a new file that needs to be downloaded.

You might need to identify which files to delete.  My guess would be to use a hash value of all zeroes.
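
The lookup described above can be sketched with two TStringList objects holding one hash per line. This is only an illustration: `FindNewHashes` and its parameter names are hypothetical. It walks the update.ini list, whichever list happens to be shorter.

```pascal
uses
  Classes;

// Fills Missing with every hash from UpdateHashes that does not occur in
// SavedHashes. Each list is expected to hold one MD5 string per line.
procedure FindNewHashes(SavedHashes, UpdateHashes, Missing: TStrings);
var
  I: Integer;
begin
  Missing.Clear;
  for I := 0 to UpdateHashes.Count - 1 do
    // IndexOf returns -1 when the string is not in the list
    if SavedHashes.IndexOf(UpdateHashes[I]) < 0 then
      Missing.Add(UpdateHashes[I]);
end;
```

With bare hashes, the result only says that some file is new. It cannot say which one, because the lists carry no file names.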


frize (ASKER)

@aikimark:
76ba2f8c4deb6f5cc9275dbdb63fd811  --> x0.myfile

The '  --> x0.myfile' part is only there to show which file this MD5 value belongs to in my example.

Strs will get its values from update.ini after it has been downloaded and the MD5 values have been parsed.
The real values will be as follows:
76ba2f8c4deb6f5cc9275dbdb63fd811
6e96992103ab0af8b88a1ce2de73280d
0efa8efe1adeca5b1d60a0b2536d9d13
037dcf77a1ea143f865e6fe5c70e1581
d908cd707036920c642cf3652c044e07

ASKER CERTIFIED SOLUTION
Ioannis Anifantakis

[This solution is only available to members of Experts Exchange.]

@frize

You really ought to rethink this and identify your strings.  Look at the TIniFile class or look at a name=value string configuration processed with TStringList controls.

There might not always be an x0.myfile file.

// Assume you have a file called "local.ini" with these contents:
// [myblock]
// somebool=false
// someint=5
//
// [my_other_block]
// someString=I am a String

//-------------------------------------

uses IniFiles, SysUtils, Forms;

procedure blablabla;
var
  ini: TIniFile;
  myBool: Boolean;
  myInt: Integer;
  myString: String;
begin
  // ExtractFilePath already ends with a path delimiter, so no extra '\' is needed
  ini := TIniFile.Create(ExtractFilePath(Application.ExeName) + 'local.ini');
  try
    // TO READ VALUES
    // you pass: the block name, the variable of the block, the default value if the variable is not found
    myBool := ini.ReadBool('myblock', 'somebool', True);
    myInt := ini.ReadInteger('myblock', 'someint', 0);
    myString := ini.ReadString('my_other_block', 'someString', '');

    // TO WRITE VALUES
    // you pass: the block name, the variable of the block, the value you will write in that variable
    ini.WriteString('my_other_block', 'someString', 'blabla');
  finally
    ini.Free;
  end;
end;


frize (ASKER)

@ioannisa: the problem is not reading from or writing to local.ini.
The problem is comparing the MD5 values between the saved local.ini and update.ini.

////////////////
LocalFile's contents :

76ba2f8c4deb6f5cc9275dbdb63fd811
6e96992103ab0af8b88a1ce2de73280d
0efa8efe1adeca5b1d60a0b2536d9d13
037dcf77a1ea143f865e6fe5c70e1581
d908cd707036920c642cf3652c044e07

update.ini MD5 values:

76ba2f8c4deb6f5cc9275dbdb63fd811
6e96992103ab0af8b88a1ce2de73280d
0efa8efe1adeca5b1d60a0b2536d9d13
037dcf77a1ea143f865e6fe5c70e1581

The problem is how to compare the MD5 values.

Something like this pseudocode:

for i := LocalFile.Count - 1 downto 0 do
  for j := update.Count - 1 downto 0 do
    // if the MD5 value found in LocalFile = the MD5 value found in update then
    //   don't download
    // else
    //   download only the file whose MD5 value doesn't match the one in LocalFile


You can't compare these without having something like:

file1=76ba2f8c4deb6f5cc9275dbdb63fd811
file2=6e96992103ab0af8b88a1ce2de73280d
file3=0efa8efe1adeca5b1d60a0b2536d9d13
file4=037dcf77a1ea143f865e6fe5c70e1581
file5=d908cd707036920c642cf3652c044e07

That is you need to associate your values with a field name (that of a file).

-----------------------------------------------------

As I explained above, you need a client dataset with two columns (filename and md5). That way you always know which filename has which md5.

Then, if you enumerate the files with the code I gave you above, you can find the local filenames and produce an md5 for each file you enumerate.

I really don't get your problem.

If you just have a bunch of lines without any association to a file, of course you won't manage it.

It's really up to you how you build it, but of course, to get the value of a variable, we must first know which variable we are talking about.
Previously I showed you this:

file1=76ba2f8c4deb6f5cc9275dbdb63fd811
file2=6e96992103ab0af8b88a1ce2de73280d
file3=0efa8efe1adeca5b1d60a0b2536d9d13
file4=037dcf77a1ea143f865e6fe5c70e1581
file5=d908cd707036920c642cf3652c044e07

Assume that, instead of file1, file2, file3, you have the real filenames:

project1.exe=76ba2f8c4deb6f5cc9275dbdb63fd811
someTextFile.txt=6e96992103ab0af8b88a1ce2de73280d
moreFile.dll=0efa8efe1adeca5b1d60a0b2536d9d13
subProject.bat=037dcf77a1ea143f865e6fe5c70e1581
settings.ini=d908cd707036920c642cf3652c044e07

Then the file you download should have the same format:
project1.exe=76ba2f8c4deb6f5cc9275dbdb63fd811
someTextFile.txt=6e96992103ab0af8b88a1ce2de73280d
moreFile.dll=0efa8efe1adeca5b1d60a0b2536d9d13
subProject.bat=037dcf77a1ea143f865e6fe5c70e1581
settings.ini=d908cd707036920c642cf3652c044e07

So when you search for project1.exe in one file, check whether it exists in the other file as well. Once you have matched the filenames, compare the md5 that follows the "=".

So if you find project1.exe in both the local and the update files, check the md5 in both files to see whether it matches.

But you can't do this unless you give each MD5 a name, so that you know which file it belongs to.
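
With filename=md5 lines, TStrings does most of the matching through its Names and Values properties. Here is a minimal sketch, assuming both lists have already been loaded in that format; `FindFilesToDownload` and its parameter names are hypothetical.

```pascal
uses
  Classes, SysUtils;

// Adds to ToDownload every file name in Remote whose MD5 is missing from
// Local or differs from it. Both lists hold 'filename=md5' lines.
procedure FindFilesToDownload(Local, Remote, ToDownload: TStrings);
var
  I: Integer;
  FileName: string;
begin
  ToDownload.Clear;
  for I := 0 to Remote.Count - 1 do
  begin
    FileName := Remote.Names[I];
    if FileName = '' then
      Continue; // skip blank lines and lines without '='
    // Local.Values[] returns '' when the name is absent, so a file that
    // is not listed locally never matches a non-empty remote MD5
    if not SameText(Local.Values[FileName], Remote.ValueFromIndex[I]) then
      ToDownload.Add(FileName);
  end;
end;
```

For the example above, where both lists are identical, ToDownload stays empty. `ValueFromIndex` is available from Delphi 7 onward. On older versions, the value can be read with `Remote.Values[FileName]` instead.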

@frize

you will need to iterate all the sections in both INI files (with TIniFile class) and transfer that data (both file name and MD5 hash value) to a data structure that facilitates searching or is easily searched.

You iterate through the update.ini content and, for each file section, look for that file name in the other INI file's content. If a matching file name is found, compare the MD5 string values. If the MD5 values are different, you need to download the file.

If there is no matching file name, you know you need to download.

Look back at my previous comment about the possibility of deleting a file.
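
Moving the INI sections into a searchable structure could look like the sketch below. It assumes the layout of the asker's update.ini (one section per file, with Name and MD5 keys) and uses TMemIniFile from the IniFiles unit. `LoadManifest` is a hypothetical name.

```pascal
uses
  Classes, SysUtils, IniFiles;

// Reads every section of the INI file at IniPath and stores one
// 'Name=MD5' line per section in Dest.
procedure LoadManifest(const IniPath: string; Dest: TStrings);
var
  Ini: TMemIniFile;
  Sections: TStringList;
  I: Integer;
  FileName, Hash: string;
begin
  Dest.Clear;
  Sections := TStringList.Create;
  try
    Ini := TMemIniFile.Create(IniPath);
    try
      Ini.ReadSections(Sections);
      for I := 0 to Sections.Count - 1 do
      begin
        FileName := Ini.ReadString(Sections[I], 'Name', '');
        Hash := Ini.ReadString(Sections[I], 'MD5', '');
        // Ignore sections that lack either key
        if (FileName <> '') and (Hash <> '') then
          Dest.Add(FileName + '=' + LowerCase(Hash));
      end;
    finally
      Ini.Free;
    end;
  finally
    Sections.Free;
  end;
end;
```

Loading both the saved INI and the downloaded update.ini this way gives two lists that can be searched by file name. A name that appears only in the saved list is a candidate for deletion.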
Sorry if anyone has already posted this idea.
I read all your posts but didn't see it.

So, @frize:

SERVER SIDE >
1) Just place a PHP file on your server.
2) Inside this PHP file, say calc.php, use GET to read the requested file name, for example:
    calc.php?file=ToBeDownloaded.zip
3) Now your server knows what you want to download.
    Inside the PHP file, work out the md5 value of that file.
    Here's how: http://www.fastsum.com/developers/md5-parser.php
    (It's free)
4) The only thing left to do is echo the value:
    <?php echo $MyMD5value; ?>


LOCAL SIDE >
1) Say you want to download the file "ToBeDownloaded.zip".
    The only thing you have to do is request this URL with Delphi (Indy):
    http://www.myserver_is_waiting.com/calc.php?file=ToBeDownloaded.zip
2) The server outputs only one thing, the md5.
    Save this output to a string.
3) Compare it with your saved md5s.

That's it.

Hope this helps you.

@CodedK

I think this is a push pattern.  The client doesn't know what files to pull down as changed or new without guidance from the update.ini file.  Thanks for the fastsum.com link.
@aikimark

Another scenario would be 2 PHP files.

1) On the server side, the first PHP file could provide a full list of md5 codes for all the files inside a folder.
2) The second file would start the download of a file, given the missing md5 code.
frize (ASKER)

Thank you all for your posts.
@ioannisa, what if I have 100 files or more? Will this method take a long time to check and compare the MD5 values?

@CodedK: To be honest with you, I have never worked with PHP. I'm sure I would declare a war between eggs and chicken.
@frize

Do you allow the user to change these files?  If not, then you only need to compare the MD5 hash strings in the two ini files.
@frize

... it would make your life much easier. I hope you'll get the chance to learn it.
In this situation, if you follow scenario #2, you will only need 10-15 lines of code.