Office 2007 docx file corrupted when retrieved from SQL 2005 DB in C++

The attached code uses chunking to store a .doc file as binary data in a SQL Server database. It works fine for Office 2003 .doc files, but Office 2007 .docx files are retrieved from the database incorrectly.

When I try to open a file retrieved from the DB, Word says it's corrupted; if you select Repair, the original file is recovered.

I have been at a loss to explain the difference. I know a docx is essentially a zip file containing XML files, but the files are stored as binary in the database, in a column of type image. I feel that this method of storage should be OK, but I am not sure.

I have read online that SQL 2005 stores docx files with an extra byte. However, I have compared the byte array length and the content at the end, and they are similar for the file being read in and the data read out of the DB.
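A byte-for-byte comparison of the original file against the file written back to disk (rather than against the in-memory array) can make a length or content difference easier to spot. The following is a minimal sketch in portable C++; the names `readFile`, `compareBytes` and `FileDiff` are hypothetical helpers introduced for illustration, not part of the posted code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Reads a whole file into memory in binary mode (empty vector if it cannot be opened).
std::vector<char> readFile(const std::string& path)
{
    std::ifstream in(path, std::ios::in | std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Result of comparing two byte buffers (hypothetical helper type).
struct FileDiff {
    std::size_t sizeA;          // length of the first buffer
    std::size_t sizeB;          // length of the second buffer
    std::size_t firstMismatch;  // offset of the first differing byte, or the
                                // shorter length if one buffer is a prefix of the other
    bool identical;             // true only if length and content both match
};

// Walks both buffers until the first difference or the end of the shorter one.
FileDiff compareBytes(const std::vector<char>& a, const std::vector<char>& b)
{
    const std::size_t common = std::min(a.size(), b.size());
    std::size_t i = 0;
    while (i < common && a[i] == b[i]) {
        ++i;
    }
    assert(i <= common);
    return FileDiff{a.size(), b.size(), i, i == a.size() && i == b.size()};
}
```

Calling `compareBytes(readFile(originalPath), readFile(retrievedPath))` would then report both lengths and the offset where the two files first diverge.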

This has halted our migration to Office 2007 and any help would be greatly appreciated.

Many thanks
// storing the data on DB
 
	// get the file in memory
	ULONG datasize = (ULONG)file.GetLength();
	file.Close();
 
	int counter=0;
	char ch;
	SAFEARRAY FAR *psa;
	SAFEARRAYBOUND rgsabound[1];
	rgsabound[0].lLbound = 0;
	rgsabound[0].cElements = datasize;
	psa = SafeArrayCreate(VT_UI1,1,rgsabound);
	long index1 = 0;
	std::ifstream in(filename, std::ios::in | std::ios::binary);
	while(!in.eof())
	{
		in.get(ch);
		HRESULT hr = SafeArrayPutElement(psa,&index1,(void*)&ch);
		index1++;
	} 
	in.close();
 
	// update the CV image data
	cmd.Format("select * from CV where DocID = %ld", cd.m_nDocID);
	pCVInfo->CursorType = adOpenKeyset;
	pCVInfo->LockType = adLockOptimistic;
	pCVInfo->Open(_bstr_t(cmd), _variant_t((IDispatch*)m_pConn,true),adOpenKeyset,adLockOptimistic,adCmdText);
	_variant_t varChunk;
	varChunk.vt = VT_ARRAY|VT_UI1;
	varChunk.parray = psa;
	pCVInfo->ADOFields->GetItem("Data")->AppendChunk(varChunk);
	if(pCVInfo->Update()==S_OK)
	{
		pCVInfo->Close();
		bRet = true;
	}
	pCVInfo=NULL;
 
 
// retrieving the data from DB
 
	CString filter;
	filter.Format("select data from CV where DocID = %ld", docID);
 
	const int nChunkSize = 1024;
	_RecordsetPtr pCVInfo = NULL;
 
	pCVInfo.CreateInstance(__uuidof(Recordset));
	pCVInfo->CursorType = adOpenStatic;
	pCVInfo->LockType = adLockOptimistic;
	pCVInfo->Open(_bstr_t(filter), _variant_t((IDispatch*)m_pConn,true),adOpenStatic,adLockReadOnly,adCmdText);
 
	ULONG datasize = pCVInfo->ADOFields->Item["Data"]->ActualSize;
	//Create a safe array to store the array of BYTES  
	ULONG lngOffSet = 0;
	std::ofstream out(filename, std::ios::out | std::ios::binary);
	UCHAR chData;
	while(lngOffSet < datasize)
	{
		_variant_t varChunk = pCVInfo->ADOFields->Item["Data"]->GetChunk(nChunkSize);
		//Copy the data only upto the Actual Size of Field.  
		for(long index=0;index<=(nChunkSize-1);index++)
		{
			HRESULT hr = SafeArrayGetElement(varChunk.parray,&index,(void*)&chData);
			out.put((char)chData);
		}
		lngOffSet = lngOffSet + nChunkSize;
	}
	lngOffSet = 0;		
	out.close();
 
	pCVInfo->Close();
	pCVInfo = NULL;

Asked by husaam
Zoppo commented:
Hi husaam,

I guess the problem is that your function writes too much at the end of the file when the last chunk is shorter than 'nChunkSize': the 'for' loop always writes 'nChunkSize' chars, even when 'lngOffSet + nChunkSize' exceeds 'datasize'.

I found a sample in MSDN showing how to use this. There, the return value of SafeArrayGetElement is evaluated to detect when no more data is present. Maybe this works for you too:

...
		for(long index=0;index<=(nChunkSize-1);index++)
		{
			HRESULT hr = SafeArrayGetElement(varChunk.parray,&index,(void*)&chData);
			if ( SUCCEEDED( hr ) )
			{
				out.put((char)chData);
			}
			else
			{
				break;
			}
		}
		lngOffSet = lngOffSet + nChunkSize;
...
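
The same idea can be shown without ADO: write only as many bytes as each chunk actually holds. The sketch below is portable C++ under stated assumptions. `getChunk` is a hypothetical stand-in that mimics a chunk read returning fewer bytes at the end of the data, and `readAllChunks` is an illustrative name; neither is part of the ADO API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a chunked field read: returns up to chunkSize
// bytes starting at offset, and fewer once the end of the data is reached.
std::vector<unsigned char> getChunk(const std::vector<unsigned char>& field,
                                    std::size_t offset, std::size_t chunkSize)
{
    assert(chunkSize > 0);
    const std::size_t start = std::min(offset, field.size());
    const std::size_t count = std::min(chunkSize, field.size() - start);
    return std::vector<unsigned char>(field.begin() + start,
                                      field.begin() + start + count);
}

// Copies the field chunk by chunk, appending only the bytes each chunk holds,
// so a short final chunk never pads the output up to a multiple of chunkSize.
std::vector<unsigned char> readAllChunks(const std::vector<unsigned char>& field,
                                         std::size_t chunkSize)
{
    std::vector<unsigned char> out;
    std::size_t offset = 0;
    while (offset < field.size()) {
        const std::vector<unsigned char> chunk = getChunk(field, offset, chunkSize);
        out.insert(out.end(), chunk.begin(), chunk.end());
        offset += chunk.size();  // advance by what was really read
    }
    return out;
}
```

With 2500 bytes of data and a chunk size of 1024, the last chunk holds 452 bytes and the output is 2500 bytes long, not 3072. In the real code, the number of elements in the returned array could presumably be obtained with SafeArrayGetUBound instead of relying on a failing SafeArrayGetElement call; that is an assumption worth checking against the documentation.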


Hope that helps,

ZOPPO
husaam (author) commented:
Fantastic Zoppo,

you were spot on.
I guess Word in Office 2003 did not mind the extra characters, but Word 12 does.

Thanks once again,
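
A possible explanation, not confirmed in this thread: because a docx is a zip container, a reader expects the zip end-of-central-directory record to sit at the end of the file, so padding written after it may be enough to trigger the corruption warning. The sketch below is a rough diagnostic in portable C++ that counts the bytes following that record. The function name `trailingBytesAfterZipEnd` is hypothetical, and the backward scan over the whole buffer is a simplification of what real zip readers do.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the number of bytes that follow the last zip end-of-central-directory
// record (signature 50 4B 05 06, 22 bytes plus an optional comment),
// or -1 if no complete record is found.
long trailingBytesAfterZipEnd(const std::vector<unsigned char>& data)
{
    const std::size_t kEocdMinSize = 22;
    if (data.size() < kEocdMinSize) {
        return -1;
    }
    // Scan backwards so the first match is the record closest to the end.
    for (std::size_t pos = data.size() - kEocdMinSize + 1; pos-- > 0; ) {
        assert(pos + kEocdMinSize <= data.size());
        if (data[pos] == 0x50 && data[pos + 1] == 0x4B &&
            data[pos + 2] == 0x05 && data[pos + 3] == 0x06) {
            // The comment length is a little-endian 16-bit value at offset 20.
            const std::size_t commentLen =
                static_cast<std::size_t>(data[pos + 20]) |
                (static_cast<std::size_t>(data[pos + 21]) << 8);
            const std::size_t end = pos + kEocdMinSize + commentLen;
            if (end > data.size()) {
                continue;  // record would run past the buffer; keep looking
            }
            return static_cast<long>(data.size() - end);
        }
    }
    return -1;
}
```

A result of 0 means nothing follows the record; a positive result on the retrieved file but not on the original would point to padding added during retrieval.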
husaam (author) commented:
Thanks!
Zoppo commented:
Yes, it seems so ...

you're welcome, I'm glad I could help you.

Have a nice day,

best regards,

ZOPPO
C++

Question has a verified solution.