Link to home
Start Free TrialLog in
holsen32 asked:

Inserting large files in SQLServer

I am having trouble inserting large binary files into SQLServer. From what I can tell, my implementation is more or less an exact copy of the code samples in MSDN. The code works well for small files, but when they reach, say, 15 MB in size, SQLServer starts consuming memory rapidly (something like 500 MB of RAM for a 15 MB file).

I have tried running the client application both on the same computer as the database and on a separate computer, and I have tried both SQLServer 7.0 and SQLServer 2000, on Windows NT4 and Windows 2000, always with the same result.

The database element is of type "image".

I would really appreciate it if someone could help me with this. It does not matter if I have to use a totally different method for inserting the file, as long as it does the job!



bool CDBInsertFile::AddBinary(ULONG rowid,const char* binaryFile)
{
     bool rc = false;
     if(binaryFile == NULL || *binaryFile == 0 || !VerifyAttachment(binaryFile))
          rc = true;
     else if(rowid != 0 && AllocStatement())
     {
          CString statement;
          statement.Format("UPDATE t_file SET c_binary = ? WHERE c_rowid = %d",rowid);

          if(Check(SQLPrepare(m_hstmt,(SQLCHAR *)(const char *)statement,SQL_NTS)))
          {
               // declare data-at-exec with the real file length (not 0, as in the MSDN samples)
               SQLINTEGER     cbBinary=SQL_LEN_DATA_AT_EXEC(GetFileLen(binaryFile));
               // bind parameter 1; the token value 1 is handed back later by SQLParamData
               SQLBindParameter(m_hstmt,1,SQL_PARAM_INPUT,SQL_C_BINARY,SQL_LONGVARBINARY,0,0,(SQLPOINTER)1,0,&cbBinary);
               SQLRETURN sql_rc = SQLExecute(m_hstmt);
               while(sql_rc == SQL_NEED_DATA)
               {
                    SQLPOINTER     pToken;
                    if((sql_rc = SQLParamData(m_hstmt,&pToken)) == SQL_NEED_DATA && (int(pToken) == 1))
                    {
                         FILE* fp = fopen(binaryFile,"rb");
                         if(fp)
                         {
                              char buf[1024];
                              int len;
                              while((len = fread(buf,1,1024,fp)) > 0)
                              {
                                    Sleep(1); // brief yield between chunks
                                   if(!Check(SQLPutData(m_hstmt,buf,len)))
                                        ResolveError();
                              }
                              fclose(fp);
                         }
                    }
               }
               if(Check(sql_rc))
                    rc = true;
               else ResolveError();
          }
          DeallocStatement();
     }
     return rc;
}
bkdc

Sorry, but I can't understand why on Earth you would want to insert a 15 MB picture into a db (I guess you're not the proud owner of a 500-piece server farm).
I admit I have no idea why MSSQL requires so much memory, but I think it would be wiser (though I don't know whether it helps here) to store the picture itself as an external file and keep only a link to it in the database.
holsen32 (ASKER)

I do need to insert the file into the database, otherwise there would not be a problem. The file itself can be any kind of binary file, i.e. most likely not an image. The database element in which the binary file is stored, however, is of type "image".

jasonclarke

I have seen something similar using ADO and Oracle, and we think the performance degradation was due to the way the database created rollback segments and the like, so maybe the thing to do is to put the database into some sort of load mode (i.e. temporarily turn off the ability to roll back, check constraints, etc.) while the data is inserted.

You will have to check your API reference for how to do that with SQL Server, however.
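
As a starting point (untested on my side, and 'mydb' is just a placeholder), SQLServer's per-database options can apparently be set through the same ODBC connection, something like:

// sketch only: enable minimally-logged operations and log truncation on checkpoint
SQLExecDirect( m_hstmt, (SQLCHAR*)"EXEC sp_dboption 'mydb', 'select into/bulkcopy', 'true'", SQL_NTS );
SQLExecDirect( m_hstmt, (SQLCHAR*)"EXEC sp_dboption 'mydb', 'trunc. log on chkpt.', 'true'", SQL_NTS );

Note these only reduce logging pressure; the engine will still keep enough information to roll back the statement that is currently executing.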

I suspect that, in general, this is more of a SQL Server issue than a C++ one.  You should probably post a link (0-point question) referencing this question in the appropriate database forums.
BTW, one thing you could consider if the file is *actually* not of type image - is to compress it before you load it...there are some good software components out there for this - look at http://www.xceedsoft.com/ for examples - (this is what I did).
I think that "jasonclarke" is talking about Transactions for the most part here.  Are implicit transactions turned on?  Are you executing this query as part of a Transaction?

Also, zlib (http://www.gzip.org/zlib/) gives you free compression code.
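
A minimal one-shot example with zlib (error handling omitted; src/srcLen stand for whatever buffer you read the file into):

#include <zlib.h>

// zlib's one-shot API; the destination must be at least srcLen + 0.1% + 12 bytes
uLongf destLen = srcLen + srcLen / 1000 + 12;
Bytef* dest = new Bytef[destLen];
int zrc = compress( dest, &destLen, (const Bytef*)src, srcLen );
// zrc == Z_OK on success; destLen now holds the compressed size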

-=- James.
It is not necessarily just a function of transactions...there are 'implicit' rollbacks that might occur, for example if a constraint is violated.
Thank you all very much for your help. It seems very likely that the problem could be solved by changing the way SQLServer handles commits and rollbacks. It would of course be nice to be able to roll back the transaction if something goes wrong, but I would have settled for just being able to insert the file. I tried both auto-commit mode, which is the default in SQLServer, and manual-commit mode / implicit transactions, both with the same result.
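
(For reference, the two modes are toggled in ODBC roughly like this, where m_hdbc stands for the connection handle:)

// manual-commit mode: statements join one transaction until SQLEndTran
SQLSetConnectAttr( m_hdbc, SQL_ATTR_AUTOCOMMIT, (SQLPOINTER)SQL_AUTOCOMMIT_OFF, 0 );
// ... SQLExecute / SQLPutData as before, then:
SQLEndTran( SQL_HANDLE_DBC, m_hdbc, SQL_COMMIT );
// back to auto-commit, the default:
SQLSetConnectAttr( m_hdbc, SQL_ATTR_AUTOCOMMIT, (SQLPOINTER)SQL_AUTOCOMMIT_ON, 0 );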

I did, however, find another way to solve the problem. The macro SQL_LEN_DATA_AT_EXEC() takes the length of the binary data as input. In the MSDN examples, however, they (without exception, as far as I could find) supplied the value zero (0). This works fine for small files, but when the files grow the server starts consuming memory and eventually you get a function sequence error. So, simply by supplying the actual length, the problem disappeared. I actually did this in the code segment that I supplied above, but there was an identical section in the program that supplied the value zero, and it was that code that didn't work.
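
In short, the difference comes down to this one line:

// MSDN-style: length not declared up front; fine for small files,
// but the server's memory use balloons on large ones
SQLINTEGER cbBinary = SQL_LEN_DATA_AT_EXEC(0);

// working version: declare the real length so the driver and server
// know how much data to expect
SQLINTEGER cbBinary = SQL_LEN_DATA_AT_EXEC(GetFileLen(binaryFile));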

Anyway, the problem is solved!

Thank you all for your input!


// holsen32
That is why I asked: "Are implicit transactions turned on?"  IME, saving/losing the data in the "deleted" and "inserted" tables is usually not too bad.  But then again, I never stored large BLOBs directly in the DB, just pointers to their location.

Anyway, glad to hear that your problem is solved.

-=- James.
> That is why I asked: "Are implicit transactions
> turned on?"  

There is still more to it than just this, I think.  Consider what happens if an SQL query is occurring at the same time the database is updating the table.  The DB engine must be able to ensure that the query remains consistent throughout its lifetime, regardless of whether the change is committed or not.  Databases are pretty hot on data consistency too - what should the database do if the system goes down part way through updating a BLOB, which is after all a potentially long process?
> There is still more to it than just this, I think.  [...]

I understand that there are things that you may have little or no control over, but that does not mean that you should not try everything at your disposal.

> what should the database do if the system goes down part way through updating a BLOB - which is after all a
> potentially long process?

Depends on what settings/modes are in place at the time of the update.  Internally, the RDBMS needs to make sure that its internal data is not corrupted, but that has nothing (really) to do with the use of transactions to make sure that user data is consistent.

A column of 80 CHARs that is missing half its user data does not bother the RDBMS, but it might bother the user!

Besides, that is why we never put really large objects into the database.  Since the database can end up storing a pointer to the data (stored externally) anyway, which can happen if the data is large enough, we just managed the pointers ourselves and took the load of the transfer off the DB server.
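
A rough sketch of what I mean (table and column names are made up; sSrcFile/sManagedPath/rowid stand for the source file, the managed target path, and the row key; quote-escaping of the path is omitted):

// 1) copy the blob to a managed location; the DB never sees the bytes
CopyFile( sSrcFile, sManagedPath, FALSE );

// 2) store only the location; the row stays small
char szStmt[512];
sprintf( szStmt, "UPDATE t_file SET c_path = '%s' WHERE c_rowid = %lu", sManagedPath, rowid );
SQLExecDirect( m_hstmt, (SQLCHAR*)szStmt, SQL_NTS );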

-=- James.
> I understand that there are things that you may have
> little or no control over, but that does not mean
> that you should not try everything at your disposal

I agree; I thought you were suggesting that just controlling whether transactions were on or off was all there was to think about.  My point was only that there may well be more to it than that.
Agreed.

-=- James.
DanRollins
While playing with the code, my SQL 7 server seemed to stop at about the 5MB point.

I went in and changed the database growth and log growth to 20MB and the problem went away (I still see small pauses at 5MB and 10MB, but the transaction completes without error).
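
(If you want to make the same change from code rather than from Enterprise Manager, the T-SQL equivalent is roughly this; the database and logical file names are placeholders:)

SQLExecDirect( m_hStmt, (SQLCHAR*)"ALTER DATABASE mydb MODIFY FILE (NAME = mydb_data, FILEGROWTH = 20MB)", SQL_NTS );
SQLExecDirect( m_hStmt, (SQLCHAR*)"ALTER DATABASE mydb MODIFY FILE (NAME = mydb_log, FILEGROWTH = 20MB)", SQL_NTS );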

I took some liberties with your code, so something I did there might be key.

1) I delete the record and insert anew each time.
2) Removed an outer loop
3) I'm moving larger chunks.
4) I added the "final" call to SQLParamData(), as shown in the MS example code.  Without it I got an error when I did SQLFreeStmt().

I have an ODBC connection using Named Pipes transport.
 
Incidentally, at 1024 bytes per chunk, it took 74 seconds to store a 15MB file.  At 4096 it took 33 seconds, and at 8192 it took 16 seconds, so the chunk size is certainly relevant.
=--==-=-=-=-=-=-=-=-=-
ErrRet CDb::SqlInsertFile( int nIdx, LPCSTR sFile )
{
    ErrRet    eRet;
    SQLRETURN rc;

    FILE* fp= fopen( sFile,"rb" );
    if ( !fp ) {
        AfxMessageBox("PULLLLEEEEEEZE!");
        return( IS_ERR );
    }

    fseek( fp, 0, SEEK_END );
    int nFileLen= ftell(fp);
    fseek( fp, 0, SEEK_SET );

    SqlPrintfExec("DELETE FROM t_file WHERE nIdx= %d", nIdx );
    SqlPrintfExec("INSERT INTO t_file (nIdx) VALUES (%d)", nIdx );

    CString  sStmt;
    sStmt.Format("UPDATE t_file SET cBinaryData= ? WHERE nIdx= %d", nIdx );

     eRet= SqlPrepare( sStmt );  // SQLAllocStatement and SQLPrepare

    SQLINTEGER cbBinary= SQL_LEN_DATA_AT_EXEC( nFileLen );
    rc= SQLBindParameter(m_hStmt,1,SQL_PARAM_INPUT,SQL_C_BINARY,SQL_LONGVARBINARY,0,0,(SQLPOINTER)1,0,&cbBinary);

    rc= SQLExecute( m_hStmt );  // returns SQL_NEED_DATA because of the bound data-at-exec parameter

    gnDoneSoFar= 0;
    SQLPOINTER pToken;
    rc= SQLParamData(m_hStmt, &pToken );
    if ( rc== SQL_NEED_DATA && pToken == (SQLPOINTER)1 )  {
        char buf[8192];
        int len;
        while( (len= fread(buf,1,8192,fp)) > 0) {
            DoDelayMs(1); // let UI update
            rc= SQLPutData( m_hStmt, buf, len );
            if ( rc != 0 ) {
                SqlCheckAndLogErr(rc);  // bummer!
                break;
            }
            gnDoneSoFar += len; // UI only
        }
    }
    rc= SQLParamData(m_hStmt, &pToken );  // the "final" call that completes the data-at-exec sequence (see note 4)
 
    fclose( fp );
    eRet= SqlComplete();
    return( eRet );
}
 
=--==-=-=-=-=-=-=-=-=-
You'll see that I used my own CDatabase-derived class, so some calls are named SqlXxx (not SQLXxx) and that I use m_hStmt (not CDatabase's m_hstmt).  If you have any questions, let me know.

-- Dan
ASKER CERTIFIED SOLUTION

DanRollins
Just a small comment:

> CString  sStmt;
> sStmt.Format("UPDATE t_file SET cBinaryData= ? WHERE nIdx= %d", nIdx );

If you are going to use this in anything that is supposed to be high performance, or is multi-threaded with no multi-threaded heap management, I would suggest, at a minimum, not using the CString and just using a stack-allocated TCHAR buffer and _stprintf(...).  That will be faster than the dynamic allocation, and will not parse the format string twice, as CString::Format(...) does.
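
What I mean is something like this (one pass over the format string, no heap allocation):

// needs <tchar.h>; _stprintf maps to sprintf or swprintf depending on the build
TCHAR szStmt[128];
_stprintf( szStmt, _T("UPDATE t_file SET cBinaryData= ? WHERE nIdx= %d"), nIdx );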

-=- James.


come on jtwine...
We are talking about a function that takes 74 seconds to complete... and you identify a performance issue that *may* have a 1 millisecond impact?

-- Dan

> We are talking about a function that takes 74 seconds to complete

We are talking about a solution that may be used as an example elsewhere, where the other actions might not take so long to complete.

> and you identify a performace issue that *may* have a 1 millisecond impact?

True, in trivial apps, 1ms is likely to not even be noticed.  However, in real-world server projects, 1ms per client request starts to add up.

(...Or more: if multiple threads are all hitting the heap at once, the memory allocation may block, and if the SQL query string gets long enough, the "parsing twice" problem wastes real time.)

As I said: "If you are going to use this in anything that is supposed to be high performance, or is multi-threaded with no multi-threaded-heap-management."

"High performance" means just that: high performance; not "well, it's kinda fast, when compared to the time other actions take...".

And as I said above, it was a suggestion, nothing more.  Feel free to ignore or trivialize it; I have seen others do (much) worse.

-=- James.
holsen32,
Thanks for accepting my comment as an answer.

As an expert here at EE, I am striving to obtain a "4.0" grade point average.  You have awarded me a "B" which degrades my average.  I'm sure that you intended no harm, but I would appreciate it if you would answer a few questions -- to help me attain my goal:

1) Was my comment unclear?  Is my use of the English language inadequate?

2) What could I have added that would bring my comment up from a "B" to an "A" ?

3) Are you aware that a comment is not necessarily intended as a "final answer" and that it is possible for you to reply -- asking for clarification and/or followup?

4) Did I offend you in any way?

5) Are you aware that it costs you the same number of points to award an Excellent (A)?  Do you understand that Experts prefer getting A's, and that we will often spend more time helping users who have a history of awarding A's?

Thanks!

-- Dan
As a comment on DanRollins' comment...
You have all given valuable input on my upload problem, even though the discussion sometimes tended to drift off in other directions - though often into interesting conversation. For example, jasonclarke and jtwine have given very useful input on how to attempt to work around the problem by changing the way the database handles/logs transactions. I did not manage to actually solve the upload problem based on their input, but their input, together with my "research based on their input", gave me a better understanding of how the database actually handles transactions.

The reason that I accepted your comment as an answer was that you gave an actual example of how the problem could be solved, and that you actually found some errors that I had made in the code. I also made the assumption, right or wrong, that you were more eager to get the points than the others, who rather wanted to share their knowledge.

The reason that I did not give your answer a higher rating was that the problem had actually already been solved a few days earlier. I could of course have given you a top rating even though your comment did not actually solve the problem - not saying that it could not have, had it only arrived a few days earlier - but would this have been right?

It was, however, never my intention to offend you in any way by giving you the lower grade, and I actually did not even know that it affected an average score somewhere. I should probably have given you a higher grade, though it would have felt strange to give a top grade to a possible solution of a problem that had already been solved.

Again, thank you all for your help!

// holsen32