Quick/cheap method of concatenating _bstr_t or wchar_t * strings

Posted on 2002-06-14
Last Modified: 2012-05-04
I have an ADO recordset object which is giving me _bstr_t or wchar_t * values like so:

pVal = (wchar_t *) (_bstr_t) m_pRecS->Fields->GetItem( lIndex )->GetValue();

However, I am experiencing shocking performance when I try to manipulate/concatenate the resulting _bstr_t or wchar_t * with another _bstr_t or wchar_t *.

Is there an efficient way of concatenating these types together? I have tried using the _bstr_t += operator and the wcscat( wcTarget, wcSource ) function, but both seem to be very slow.

Any help will be much appreciated.

Another alternative would be to make ADO give me back unicode strings if this is possible.
Question by:duncanlatimer
 

Author Comment

by:duncanlatimer
ID: 7078035
Oops, what I meant was give me back single-byte (non-Unicode) strings. Sorry.
 
LVL 86

Expert Comment

by:jkr
ID: 7078097
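'wchar_t*' is in fact a pointer to a UNICODE string. Have you tried stringstreams? E.g.

#include <sstream>
#include <string>
#include <comutil.h>   // _bstr_t

using namespace std;

wstringstream wss;

wss << ((const wchar_t *) (_bstr_t) m_pRecS->Fields->GetItem( lIndex )->GetValue());

// wss.str() returns a temporary wstring, so keep a copy alive before taking
// c_str(); calling wss.str().c_str() directly leaves a dangling pointer.
const wstring sAllConcatenated = wss.str();
const wchar_t* pAllConcatenated = sAllConcatenated.c_str();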
 
LVL 12

Expert Comment

by:pjknibbs
ID: 7078137
Are you sure it's the concatenation operation which is slowing you down, rather than the ADO operations necessary to return the strings from the database in the first place?
 

Author Comment

by:duncanlatimer
ID: 7078392
jkr, I tried the wstringstream, but it gave me the same performance as _bstr_t.

pjknibbs, I'm pretty sure the inefficiency is in the concatenation of _bstr_t and wchar_t, because if I leave out the concatenation I gain about 25% performance.

The best performance is achieved when simply bringing back variant_t, but I haven't got any way of concatenating two of them together, so they are no good to me.
 

LVL 30

Expert Comment

by:Axter
ID: 7078473
Can you post your code?
The section that is getting the performance hit.
 

Author Comment

by:duncanlatimer
ID: 7078671
Code below:

const DatabaseReader & DatabaseReader::ReadValues()
{
   m_sValue.erase();
   
   try
   {
      m_bOnBoundary = false;

      for( long lIndex( 0 ); lIndex < m_pRecS->Fields->GetCount(); ++lIndex )
      {
         DString sCol = m_ColMap.find( lIndex )->second;
         DString sVal;
   
         if( m_pRecS->Fields->GetItem( lIndex )->GetValue().vt == VT_NULL )
         {
            sVal = "NULL";
         }
         else
         {
///////////////////////////////////////////
// The following bit was really slow
// but I have now changed this as follows:
//
//  _variant_t varVal = m_pRecS->Fields->GetItem( lIndex )->GetValue();
//        
//         switch( varVal.vt )
//         {
//         case VT_NULL:
//            sVal = "NULL";
//            break;
//
//         case VT_BSTR:
//            {
//               // crude wide -> narrow: keep only the low byte of each wchar_t
//               const wchar_t * pwc = varVal.bstrVal ? varVal.bstrVal : L"";
//               char szTmp[1024];
//               size_t i(0);
//
//               for( ; pwc[i] && i < sizeof( szTmp ) - 1; ++i )
//               {
//                  szTmp[i] = (char) pwc[i];
//               }
//               szTmp[i] = 0;
//
//               sVal = DString( szTmp );
//            }
//            break;
//            ......
// before doing this I tried casting the variant_t to a _bstr_t * and a wchar_t *;
// as this did not work I resorted to the desperate measures you see above
//
///////////////////////////////////////////
            sVal = (_bstr_t)m_pRecS->Fields->GetItem( lIndex )->GetValue();
         }
         
         m_rSelector.DoDataMapping( sCol, sVal );

         if( sCol == m_sBoundaryField && sVal != m_sBoundaryFieldLastValue )
         {
            m_bOnBoundary = true;
            m_sBoundaryFieldLastValue = sVal;
         }

         m_sValue += sVal;
         m_sValue += ",";
      }
 
      if( !m_sValue.empty() ) m_sValue.resize( m_sValue.length() -1 );

      m_pRecS->MoveNext();

      if( m_pRecS->adoEOF == VARIANT_TRUE ) m_bFinished = true;

   }
   catch (...)
   {
      throw e_DatabaseException( "error reading recordset" );
   }

   static int iCounter(0);

   ++iCounter;
   
   if( m_bStats && !(iCounter%100) )
   {
      cout << "\rcompleted " << (int)(((float)iCounter / (float)m_iRows)*100) << "%   ";
   }

   return *this;
}

 
LVL 30

Expert Comment

by:Axter
ID: 7079646
Is DString a CString descendant?
If so, you should be able to do the following:
sVal = varVal.bstrVal;   // a BSTR is already a wchar_t *, so no cast is needed
 
LVL 49

Expert Comment

by:DanRollins
ID: 7082234
Are you concatenating really *long* strings?  Or are you concatenating *many* strings?  (no way to tell from the code).

If so, I ran into a similar problem while generating giant XML files (250K). With these string operations, the + operator checks whether the resulting string will fit in the existing buffer and, if not, it must allocate a buffer big enough for the combined length. At 250K, that is two calls to strlen (or equivalent), an allocation of 250K, a move of 250K, a move of the new part, and a free of the old buffer -- for EACH concatenation.

Here is how I solved that:  I concatenate into a temporary cache string and when that string gets longer than a few KB, I then add it to the final string and empty the cache string (without deallocating it).  The result is vastly fewer trips to the allocation pool.  I also decided to track the current length of the cache manually and 'concatenate in place' (avoiding the many calls to strlen) but that was the lesser improvement.
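The "track the length yourself and concatenate in place" trick is also why the wcscat( wcTarget, wcSource ) attempt in the question was slow: wcscat has to walk the whole destination to find its terminating null on every call, so n appends to one growing buffer cost O(n²) character reads. Below is a minimal sketch in standard C++, assuming one buffer sized up front for the longest row; the WideAppender class is hypothetical, not from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <vector>

// Hypothetical helper: appends wide strings into one preallocated buffer and
// remembers where the text ends, so no append has to rescan what is already
// there (wcscat does, which makes n appends cost O(n^2) character reads).
class WideAppender {
public:
    explicit WideAppender(std::size_t capacity)
        : buf_(capacity + 1, L'\0'), len_(0) {}

    void append(const wchar_t* s) {
        const std::size_t n = std::wcslen(s);    // scan only the new piece
        if (len_ + n >= buf_.size())
            throw std::length_error("WideAppender: buffer too small");
        std::wmemcpy(&buf_[len_], s, n);          // copy straight onto the end
        len_ += n;
        buf_[len_] = L'\0';                       // keep the text null-terminated
        assert(len_ < buf_.size());
    }

    // Reuse the same buffer for the next row without freeing it.
    void clear() { len_ = 0; buf_[0] = L'\0'; }

    std::size_t length() const { return len_; }
    const wchar_t* c_str() const { return &buf_[0]; }

private:
    std::vector<wchar_t> buf_;
    std::size_t len_;
};
```

For example, appending L"42", L"," and L"NULL" to a WideAppender row(256) leaves row.c_str() equal to L"42,NULL" and row.length() equal to 7; clear() resets it for the next row.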

In my app, the XML generation went from an agonizing 30 seconds to less than one second.  It was breathtaking.  I am so proud.

-- Dan
 
LVL 49

Expert Comment

by:DanRollins
ID: 7082252
>>Another alternative would be to make ADO give me back [non-] unicode strings...

That is not possible. UNICODE, and all of the inefficiency that it brings into every part of string processing, irritates the heck out of me. But I'm old school and I never sell into the far-east market. Just buck up and do the conversion. Note that the _bstr_t does a conversion behind your back when you need it (whenever you cast to char*). If I am remembering right, you actually carry around two copies of the string (inefficient as hell for long strings, rarely a problem with short ones). In your case, I suggest NOT doing a conversion from UNICODE to ANSI until the very end.
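A sketch of "convert once at the very end" in portable C++, assuming the whole output has been accumulated in a std::wstring and that the C library's current locale can represent every character (the default "C" locale handles plain ASCII; on Windows, WideCharToMultiByte is the usual API for this step). NarrowOnce is a hypothetical helper built on std::wcstombs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a finished wide string to a narrow (multibyte)
// string in one pass, using the current C locale. Build everything as wide
// text first and call this once, instead of converting every field.
std::string NarrowOnce(const std::wstring& wide) {
    // First call with a null destination only measures the result.
    const std::size_t n = std::wcstombs(nullptr, wide.c_str(), 0);
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("NarrowOnce: character not representable");
    std::string narrow(n, '\0');
    if (n > 0) {
        // Writing exactly n bytes converts everything but the terminator,
        // which std::string already supplies.
        const std::size_t written = std::wcstombs(&narrow[0], wide.c_str(), n);
        assert(written == n);
        (void)written;
    }
    return narrow;
}
```

The point is the call count: one conversion per finished string rather than one per field.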

-- Dan
 

Author Comment

by:duncanlatimer
ID: 7083280
DString is actually a descendant of std::string. The MS-provided std::string seems happy to construct itself from a _bstr_t; it just doesn't seem to do it very fast.

The strings are not very long, but there are 62 million concatenations required (93 cols x 671K rows).

Can I get ADO to give me back the entire row, rather than one column at a time?
 

Author Comment

by:duncanlatimer
ID: 7083370
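I am now trying to use a std::basic_string<wchar_t> and this is giving me much better performance. Any ideas? (I'm still doing my own conversions!)

code now:
// declared earlier in ReadValues():
//    std::basic_string<wchar_t> wsVal;
//    wchar_t szTmp[512];   // "%f" of a large double can need 300+ characters
_variant_t varVal = m_pRecS->Fields->GetItem( lIndex )->GetValue();

         switch( varVal.vt )
         {
         case VT_NULL:
            wsVal = L"NULL";
            break;

         case VT_BSTR:
            // a BSTR carries its own length, so skip wcslen; a NULL BSTR means ""
            if( varVal.bstrVal )
               wsVal.assign( varVal.bstrVal, SysStringLen( varVal.bstrVal ) );
            else
               wsVal.erase();
            break;
         
         case VT_I2:
            swprintf( szTmp, sizeof( szTmp ) / sizeof( szTmp[0] ), L"%d", (int) varVal.iVal );
            wsVal = szTmp;
            break;

         case VT_I4:
            swprintf( szTmp, sizeof( szTmp ) / sizeof( szTmp[0] ), L"%ld", (long) varVal.lVal );
            wsVal = szTmp;
            break;
         
         case VT_R4:
            swprintf( szTmp, sizeof( szTmp ) / sizeof( szTmp[0] ), L"%f", (double) varVal.fltVal );
            wsVal = szTmp;
            break;

         case VT_R8:
            swprintf( szTmp, sizeof( szTmp ) / sizeof( szTmp[0] ), L"%f", varVal.dblVal );
            wsVal = szTmp;
            break;

         // VT_DECIMAL (14) keeps its value in decVal, not dblVal, so it
         // goes through the generic _bstr_t conversion below
         default:
            {
               _bstr_t bs = (_bstr_t) varVal;
               wsVal = bs.length() ? (const wchar_t *) bs : L"";
            }
            break;
         }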
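For comparison, a hypothetical portable sketch of the same per-type switch, assuming C++17 and using std::variant as a stand-in for VARIANT. FieldValue and AppendField are illustrative names only, not ADO API; std::monostate plays the role of VT_NULL. Each field's text is appended straight onto the row string instead of going through an intermediate like wsVal.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical stand-in for an ADO field value (std::monostate ~ VT_NULL).
using FieldValue =
    std::variant<std::monostate, std::wstring, short, long, float, double>;

// Append one field's text onto the row, mirroring the VT_NULL / VT_BSTR /
// VT_I2 / VT_I4 / VT_R4 / VT_R8 cases of the switch.
void AppendField(std::wstring& row, const FieldValue& v) {
    assert(!v.valueless_by_exception());
    std::visit([&row](const auto& x) {
        using T = std::decay_t<decltype(x)>;
        if constexpr (std::is_same_v<T, std::monostate>)
            row += L"NULL";                                   // VT_NULL
        else if constexpr (std::is_same_v<T, std::wstring>)
            row += x;                                         // VT_BSTR
        else if constexpr (std::is_integral_v<T>)
            row += std::to_wstring(static_cast<long>(x));     // VT_I2 / VT_I4
        else
            row += std::to_wstring(static_cast<double>(x));   // VT_R4 / VT_R8
    }, v);
}
```

std::to_wstring for a double uses the same "%f" style as the swprintf calls above, so 2.5 becomes L"2.500000".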
 
LVL 49

Accepted Solution

by:
DanRollins earned 200 total points
ID: 7084672
>>there are 62 million concatenations required.

Then you will see ENORMOUS benefit from using the technique I described. Concatenate into a temporary buffer, then when it is about to overflow, concatenate the temp buffer to the final output buffer and clear the temp buffer.
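A minimal sketch of that temp-buffer scheme in standard C++, assuming std::wstring for both the cache and the final output. Dan's actual code isn't posted; CachedConcat and its 4096-character default flush threshold are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper for "concatenate into a cache, flush when it gets big":
// most appends touch only the small cache, and the large result string is
// grown once per flush instead of once per piece.
class CachedConcat {
public:
    explicit CachedConcat(std::size_t flushAt = 4096) : flushAt_(flushAt) {
        assert(flushAt_ > 0);
        cache_.reserve(flushAt_ * 2);   // headroom so the cache rarely regrows
    }

    CachedConcat& operator+=(const std::wstring& piece) {
        cache_ += piece;
        if (cache_.size() >= flushAt_)
            flush();
        return *this;
    }

    // Move the cached text into the final string. clear() empties the cache
    // but, in common implementations, keeps its allocated buffer for reuse.
    void flush() {
        result_ += cache_;
        cache_.clear();
    }

    // Flush first so the caller always sees the complete text.
    const std::wstring& str() {
        flush();
        return result_;
    }

private:
    std::size_t flushAt_;
    std::wstring cache_;
    std::wstring result_;
};
```

With std::wstring, whose += already grows geometrically in common implementations, the gain is smaller than with string classes that reallocate on every growing append (the behaviour described earlier in the thread); the structure is the same either way.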

>> Can I get ADO to give me back the entire row...
I can't think of any reasonable way to do that.

-- Dan
 
LVL 30

Expert Comment

by:Axter
ID: 7084697
>>I am now trying to use a std::basic_string<wchar_t> and
>>this is giving me much better performance, any
>>ideas? (still doing my own conversions!)

If you have a rough idea how large your std::basic_string<wchar_t> will get, reserve that approximate size up front.

Example:
std::basic_string<wchar_t> MyString;
MyString.reserve(MyApproximateSize);

That will speed it up a bit.
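A small sketch that makes the reserve() point measurable, assuming the final length can be estimated as pieces x piece length. CountReallocations is a hypothetical helper that counts how often capacity() changes while appending.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: count how many times appending `pieces` copies of
// `piece` forces the string onto a new buffer, optionally after reserving
// the expected final length up front.
std::size_t CountReallocations(std::size_t pieces, const std::wstring& piece,
                               bool reserveFirst) {
    std::wstring s;
    if (reserveFirst)
        s.reserve(pieces * piece.size());    // expected final length
    std::size_t reallocations = 0;
    std::size_t cap = s.capacity();
    for (std::size_t i = 0; i < pieces; ++i) {
        s += piece;
        if (s.capacity() != cap) {            // capacity changed: new buffer
            ++reallocations;
            cap = s.capacity();
        }
    }
    assert(s.size() == pieces * piece.size());
    return reallocations;
}
```

With the reserve the count is 0, because the buffer never has to grow; without it, the exact count depends on the library's growth policy.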
 

Author Comment

by:duncanlatimer
ID: 7089679
Dan, I am accepting your comment as the answer because the caching gave me the most benefit, although the comments of all were helpful.

My 42 minute process now takes under 8.

Cheers.
 
LVL 49

Expert Comment

by:DanRollins
ID: 7089917
Thanks.  I did think of something else for you:

>> >> Can I get ADO to give me back the entire row...
>>I can't think of any reasonable way to do that.

You are making zillions of roundtrips to the database. If there is any way to write a stored procedure to gather more data in one call, then you will eliminate oodles of overhead. Just a thought.

-- Dan
 

Author Comment

by:duncanlatimer
ID: 7092045
Thanks, that sounds like a great idea, I'll give it a try.
 

Author Comment

by:duncanlatimer
ID: 7092280
Run time now less than 3 minutes, cheers again.