How to calculate MD5 or SHA checksum of a file in SSIS

I need to calculate the checksum  MD5 or SHA of an entire file as a whole and not as individual rows.
SSIS can calculate checksum of rows pretty easily. There is also a data flow task available for this as a download but thats not what I am looking for.

I want to calculate checksum of a file...
Can anyone help me with this?

I am not good with the scripting task so if you are giving script solutions plz provide then script if it can be easily made and guide me accordingly as I dont know how to work with it. (I am sorry if I am  being demanding :P its just that I dont have a lot of experience with that)

Any other solutions?

All the help is greatly appreciated.
I am new to SSIS so please bear with me.
Thanks.
esotericmeeAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Megan BrooksSQL Server ConsultantCommented:
I didn't know how to do this either but I copied a few lines of code from http://sharpertutorials.com/calculate-md5-checksum-file and created a script task from it.

I have included both the package file (rename .xml to .dtsx) and the main part of the script, below.

There are two independent sections to the script, one for MD5 and one for SHA1. Part of the coding is redundant and could be combined. The results are stored in the package variables MD5hash and SHA1hash. The script also logs information events showing the hash values, which you can view using the Log Events window in the SSIS designer (this is in the package script, but not in the code sample below).

The script uses a File Connection Manager to hold the name of the file to be read. The connection manager is only used to obtain the file name; it does not actually read the file.

I don't know if this is exactly what you were looking for, but it shouldn't be hard to modify to do other things.



            //Calculate MD5 hash from file
            FileStream file = new FileStream(Dts.Connections["Input File Connection"].ConnectionString, FileMode.Open);
            MD5 md5 = new MD5CryptoServiceProvider();
            byte[] retVal = md5.ComputeHash(file);
            file.Close();

            //Store MD5 hash in package variable
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < retVal.Length; i++)
            {
                sb.Append(retVal[i].ToString("x2"));
            }

            string sMD5hash = sb.ToString();

            Dts.Variables["MD5hash"].Value = sb.ToString();

            //Calculate SHA1 hash from file
            FileStream file2 = new FileStream(Dts.Connections["Input File Connection"].ConnectionString, FileMode.Open);
            SHA1 sha1 = new SHA1CryptoServiceProvider();
            byte[] retVal2 = sha1.ComputeHash(file2);
            file2.Close();

            //Store SHA1 hash in package variable
            StringBuilder sb2 = new StringBuilder();
            for (int i = 0; i < retVal2.Length; i++)
            {
                sb2.Append(retVal2[i].ToString("x2"));
            }

            string sSHA1hash = sb2.ToString();

            Dts.Variables["SHA1hash"].Value = sb2.ToString();

Open in new window

md5sha1hash.xml
0
esotericmeeAuthor Commented:
oh thanks a lot ..This is brillilant... this is exactly what I was looking for..Thank you so much..
Now jus a lil more help ( I am sorry I am new to using this script task )

so I opened the xml file... do I have to copy the whole XML and C# script from that ( if I paste the entire code then Script editor is showing a lot of errors) . When I paste only the C# part from the code then it doesnt show any errors... So please just tell what to paste from that XML file coz it also has some garbage value in it... but when the C# is pasted it jus looks fine..

I have created 2 variables MD5Hash and SHA5Hash in SSIS and now I go ahead and add them in the read & write variables in the SCRIPT task right?

I also didnt know where do I actually pass in the file for which the Checksum has to be calculated...
Do I pass in as a read only variable?


and once again I really appreciate your time and all the help...
Thanks a lot!!
0
Megan BrooksSQL Server ConsultantCommented:
You are welcome.

You will need to rename the .XML file as a .DTSX file (EE doesn't allow .DTSX to be uploaded). Once you have done that, you can add it to an existing SSIS project in Visual Studio (or BIDS) by placing the file in the project directory, right-clicking the project name in Solution Explorer, and using "Add-->Existing Item" to add it to the project. Don't try to use the "Add-->Existing Package" option.

The File Connection Manager (named "Input File Connection" in the sample) contains the filename, but you could use a variable instead. Use the connection manager if the filename does not change, and use a variable if the filename is going to come from somewhere else (calculated, from a database, etc.).

To use a variable for the filename you would create the package variable, add the name to the script task on the Script pane under "ReadOnlyVariables", and modify the script code to reference Dts.Variables["variablename"].Value in place of Dts.Connections["Input File Connection"].ConnectionString in the FileStream method calls.

The script task does need to declare the output variables under ReadWriteVariables on the Script pane. You can see this in the sample.
0
The Ultimate Tool Kit for Technolgy Solution Provi

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy for valuable how-to assets including sample agreements, checklists, flowcharts, and more!

esotericmeeAuthor Commented:
Thanks a Lot..This is definitely the solution I was looking for...
but I got one question.. May be I must have done something wrong....

I added the package just as u instructed.. I added a derived column task in dataflow, which follows your Script Task, to capture the values of the variables MD5 and SHA1 and write them to a flat file destination...

When I try running the package for different files , I am getting the same checksum values for both MD5 and SHA1 (MD5 = -1071628257 and SHA1 = 27 ) I was getting truncation error for This derived column so I chose to redirect the rows when truncation occurs and then in the redirected rows I get these same exact values even if I choose different files in the" Input File Connection manager"...
Isnt it supposed to generate different values for different input files??
I mean I am using this checksum for validation purposes, so if it generates same values for all files, then its kind of defying the purpose of our validation.

If you think I did somethin wrong somewhere like redirecting rows or something please let me know.
If you get a chance could u check the task on ur machine for different files, if you are getting different checksum values for different input files, then its me who has goofed up somewhere. But if thats not the case then let me know what can be changed to make it work.

Thanks a Lot for all your help.
I really appreciate. You truly are an expert!!
I am loving this forum..
All you guys are great..
0
Megan BrooksSQL Server ConsultantCommented:
The MD5 and SHA1 hashes are not the same length so they can't be the same. Likewise, a hash can't be the same for two files with different content -- that is the whole idea of calculating the hash. You can see the results (below) when I ran the sample package on two different input files.
The output of the cryptographic methods is a byte stream which represents a series of bits. The sample code that I used to create this package converts that byte stream to hexadecimal, byte by byte, and builds a human-readable text string from it. If you need something different, you may need to modify that code. Right now, that is what goes into the variables. There is no way they could contain values like -1071628257 or 27. See the code sample below for what they look like.
I think something is happening in the derived column task. Upload the package if you'd like and I will look at it. You will have to rename it to a .xml file.

File 1:
    MD5 = cb1f0d50d96b73d91089fe87fb74665b
    SHA1 = e7d57edd26f1a1effc3921a6a79725381df8524f

File 2:
    MD5 = f1423122f1124c5623e3d7006c990f2f
    SHA1 = d6653a9926c8783906f9b72fae73cd05986e9a12

Open in new window

0
esotericmeeAuthor Commented:
Hi,
When I see the output window I see the similar values for MDA5 and SHA1 but when I open the text file where these variable values are written they are what I said earlier 1071628257 or 27.

I have uploaded the package definition file as an XML.
Please if you get a chance take a look at it and let me where I might be going wrong.

Thanks a lot for your help..
I appreciate it..
CheckSum.XML
0
Megan BrooksSQL Server ConsultantCommented:
For whatever reason, the derived column transformation was defaulting to a string length of 0 or 1 for user string variables. I am not sure why this is (I don't ordinarily need to add variables to the pipeline this way), but the widths were showing as zero when I looked at your package. I added a suitable type cast and the problem went away. I also removed the error output since it isn't needed. The truncation was due to the derived columns being too narrow.
The numbers you were seeing in the error output file were error information columns, not hashes.
The expressions I used for the derived column component columns are
(DT_WSTR,256)@[User::MD5hash]
and
(DT_WSTR,256)@[User::SHA1hash]
They are the same as what you were using except that a type cast has been added to increase the width.
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
esotericmeeAuthor Commented:
Oh wonderful....Now it works just the way I wanted it too...
you are awesome...Thanks a lot...

Actually My goal here is that I have to calculate a checksum for a file and also do a rowcount for the same file and store these 2 values along with the data file in a folder and this file will then be sent to some other company where they will do the checksum and rowcount on their side and validate it with the text file we send them..
So what i was doing is I calculated and stored the rowcount in a variable and now the checksum will also be in a variable.. so Using Derived column I wanted to write the rows to a text file...
thats the only reason I am using this..


I want the checksum and rowcount values from SSIS variables in 1 row in a text file..
Is there another easier way to do this other than using derived columns?
I will be using ur script task to calculate the checksum n store it in variables.. and the RowCount transformation is storing that value in a variable too..
I just need to write these variables into a text file...

But anyway..
I really appreciate your help..
Thanks a lot..You rock!!!
Happy Easter :)
0
esotericmeeAuthor Commented:
Very Helpful and excellent Guidance...
10 on 10
0
Megan BrooksSQL Server ConsultantCommented:
You are welcome. The easiest way I see to do the rest is to create a script component with an asynchronous output to count rows and copy that and the hash variables to the pipeline. This replaces the derived column component. I have included a sample package below (again, rename .xml to .dtsx). You can do as you like with the script component output (which is a single row); I wrote it to a flat file, similar to what you had done earlier.
CheckSum2.xml
0
Ashribad sahooCommented:
how to take the file name from database
0
Anthony PerkinsCommented:
Ashribad sahoo,

This question was answered over six years ago.  You should start a new  question.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.