allmer asked:

Memory management CString (CString? - std||mfc), string(stl).

Hi experts,
I was wondering:
1 is CString part of the ANSI standard?
2 if I use CString and / or string do I have to take care of memory?
 
With CString I know I have to call ReleaseBuffer() after I have called GetBuffer().
Do I have to call something like string c; c = "Hallo"; c.Empty();
or does the memory get deallocated when the string goes out of scope, as with intrinsic types?
Same question for CString.

Thanks,
Jens
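
On the scope question above: a std::string releases its buffer in its destructor, so no explicit call is needed when it goes out of scope. The sketch below is only an illustration under assumptions. CountingAlloc, CountedString, g_live_bytes and live_bytes_inside_scope are hypothetical names introduced here to make the allocation visible, and the code assumes a C++11 compiler.

```cpp
#include <cstddef>
#include <new>
#include <string>

// Hypothetical counter: bytes currently allocated through CountingAlloc.
static long g_live_bytes = 0;

// Hypothetical minimal allocator that only tracks how much memory is live.
template <class T>
struct CountingAlloc {
    typedef T value_type;
    CountingAlloc() {}
    template <class U> CountingAlloc(const CountingAlloc<U>&) {}
    T* allocate(std::size_t n) {
        g_live_bytes += static_cast<long>(n * sizeof(T));
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) {
        g_live_bytes -= static_cast<long>(n * sizeof(T));
        ::operator delete(p);
    }
};
template <class T, class U>
bool operator==(const CountingAlloc<T>&, const CountingAlloc<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAlloc<T>&, const CountingAlloc<U>&) { return false; }

// Same class template as std::string, only with the counting allocator.
typedef std::basic_string<char, std::char_traits<char>, CountingAlloc<char> > CountedString;

long live_bytes_inside_scope() {
    CountedString s(1000, 'x');  // long enough to need a heap buffer
    return g_live_bytes;         // non-zero while s is alive
}                                // s is destroyed here; no Empty()/clear() needed
```

After live_bytes_inside_scope() returns, g_live_bytes should be back at 0, which is the behaviour the question asks about.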
SOLUTION
jkr
This solution is only available to members.
ASKER CERTIFIED SOLUTION
This solution is only available to members.
allmer (ASKER):

Thanks a lot,
you made it all clear to me ;-)
But now I have to rewrite everything that contains CString :(
But well, no problem, I am learning a lot here.
Cheers,
Jens
Actually, CString is a better string class than std::string. The latter, though part of the C++ standard, has a very small interface that focuses too much on handling different char types while neglecting elementary string functionality. E.g. there isn't an operator + function, so the following statements will fail:

    string s1 = "ABC";
    string s2 = "XYZ";
    string s3 = s1 + s2;       // fails
    string s4 =  s1 + "123"; // fails

If you want to build a string from many other strings or string literals, you either have to use string streams or append every single component with a separate += statement.

Although you could add functionality by defining new operator functions, doing so would make your solution as proprietary as it would be with any other non-standard string class. Actually, std::string is a poor string class whose biggest advantage is that it is available everywhere.

Thus, before replacing CString with std::string, you should check that std::string has the functionality you require.

Regards, Alex


 
allmer (ASKER):

Thanks for the insight, Alex.
I think in my <string> class the operator+ function is overloaded.
It is included as
#include <string>
using namespace std;
I hope Windows includes unchanged standard libraries ... but you never know.
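
For what it's worth, the C++ standard library does declare operator+ for std::string, including the overloads that take a string literal on either side, so the four statements from the earlier post should compile on a conforming implementation. A minimal sketch (join_demo is a hypothetical name used only for this example):

```cpp
#include <string>

// Builds two strings with operator+ and returns them joined by '|'.
std::string join_demo() {
    std::string s1 = "ABC";
    std::string s2 = "XYZ";
    std::string s3 = s1 + s2;     // string + string
    std::string s4 = s1 + "123";  // string + string literal
    return s3 + "|" + s4;
}
```

Calling join_demo() should yield "ABCXYZ|ABC123".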

But the main issue for me is portability.
I am developing this in a Windows environment, but as soon as it
is running (which it does right now) I have to get it running
on a UNIX server. So I am trying to be as ANSI as possible (except for the GUI).
Best regards,
Jens.
>>But the main issue for me is portability.
FYI:
The CString class is much more efficient than the std::string class that comes with the VC++ compiler.

So if you switch from CString to std::string, you'll lose efficiency.
I would recommend you use the CString class, and use a CString clone in the UNIX environment.

The following are links to some CString clones:
http://www.codeguru.com/Cpp/Cpp/string/alts/article.php/c5645/
http://www.codeguru.com/Cpp/Cpp/string/alts/article.php/c2781/

CString is not only more efficient, but the interface is far superior to std::string.
allmer (ASKER):

What do you mean by efficiency?
I don't care much about the interface.
The only thing I need is pure runtime speed.

...So I changed everything from CString to string.
And to me it seems like the version using CString is a lot faster than the one using string.
At least 10 times faster.

This is what I do:
I read in text from a text file (line by line) and use string += string.
Then I do a lot of searching within the text in memory,
which is actually faster than loading the text.

The CString concatenation and loading may take up to 10 min for a 100 MB file.
The string version I killed after about 25 min.

So I will look into the classes you mentioned or simply use character arrays,
which will probably be the fastest approach.
Since I am planning on mapping the whole file to memory, this will probably be the
method of choice.
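
One thing that may be worth trying before giving up on std::string for the loading step: reserve the whole file size up front, so that the repeated += does not have to grow the buffer. This is only a sketch under assumptions, not a measured fix for the timings above. load_lines is a hypothetical helper name, and the file is assumed to fit in memory.

```cpp
#include <fstream>
#include <string>

// Reads a text file line by line into one string, reserving the
// total size first so that += should not need to reallocate.
std::string load_lines(const char* path) {
    std::ifstream in(path, std::ios::binary);
    std::string text;
    if (!in) return text;

    // Determine the file size, then rewind.
    in.seekg(0, std::ios::end);
    std::streamoff size = in.tellg();
    in.seekg(0, std::ios::beg);
    if (size > 0) {
        // +1 covers a final line that has no trailing newline.
        text.reserve(static_cast<std::string::size_type>(size) + 1);
    }

    std::string line;
    while (std::getline(in, line)) {
        text += line;
        text += '\n';
    }
    return text;
}
```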

Thanks Axter.

>>So I will look into the classes you mentioned or simply use character arrays.
>>Which will probably be the fastest approach.
I had a big project that I had to port from MFC to a UNIX platform.
Since I wanted both versions of the program to use the same class, I tried to create a portable CString that would work on VC++ as well as (or better than) the MFC CString class.
I wasted over 3 weeks in this failed attempt.
I couldn't get a class to work as well as the MFC CString class.

In the end, I just used a modified version of the MFC CString class for the UNIX port.

The implementation for the CString class comes with VC++, so you have the full source if you need it, and most of it can be ported to UNIX.


>>So I will look into the classes you mentioned or simply use character arrays.
>>Which will probably be the fastest approach.

Don't count on it being faster using character arrays.  I recommend you test it out before you invest too much time in it.

>>Since I am planning on mapping the whole file to memory this will probably be the
>>method of choice.

Yes, file mapping will give you a lot more efficiency.  It would probably be the best way to beat CString efficiency.
But remember that the API functions used for Windows file mapping are different from the UNIX (POSIX) functions.
If you use file mapping, I recommend you use a wrapper class, that will work on both UNIX/Linux and Windows.
In the implementation of your wrapper class, you can put #ifdef WIN32 directives to separate UNIX code from Windows.
One last point...
The MFC CString class can be passed to a variable argument function.
Example:
      CString data = "Hello World!";
      printf("Test %s\n", data);

This works with MFC CString, but fails with most CString clones.
If you try this with CStr, it will fail.
And of course, this fails with std::string.

To make things worse, this type of failure is hard to find, since you don't get a compile error, and usually you will not get a runtime failure.
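
For std::string, the usual way around this is to pass the character pointer explicitly with c_str() instead of the object itself. A minimal sketch (format_greeting is a hypothetical name, the 64-byte buffer is an arbitrary choice for the example, and std::snprintf assumes C++11):

```cpp
#include <cstdio>
#include <string>

// Formats through a printf-style function by handing over a const char*.
std::string format_greeting(const std::string& data) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "Test %s", data.c_str());
    return std::string(buf);
}
```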
>>So I will look into the classes you mentioned or simply use character arrays.
>>Which will probably be the fastest approach.

There are mostly two reasons why CString is fast:

- Reference counting: any copy of a CString holds only a pointer to the original buffer, and a shared counter tracks the number of references
  (so the buffer can be deleted when the counter reaches 0). Strings are copied whenever they are passed or returned by value.
- CString allocates more memory than the size parameter requires. The minimum allocation is 64 bytes, then there are steps of 128, 256 and 512 bytes depending
  on the required size or resize argument. Only CStrings bigger than 512 bytes get a precise allocation. Thus, when concatenating strings, CString normally doesn't need any
  reallocation. However, I once tried to load 10000 database records, each of them holding 20 or more CString members. That program failed to run on computers with
  less than 32 MB of main memory (OK, that was some time ago).
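
To illustrate the reference counting idea from the first point, here is a small sketch. It is not the CString implementation. CowString is a hypothetical class that uses std::shared_ptr (C++11) as the shared counter and copies the buffer only when a shared string is modified.

```cpp
#include <memory>
#include <string>

// Hypothetical copy-on-write string: copies share one buffer until a write.
class CowString {
public:
    explicit CowString(const char* s)
        : buf_(std::make_shared<std::string>(s)) {}

    // The implicit copy constructor copies only the pointer
    // and increments the shared reference count.

    const std::string& get() const { return *buf_; }
    long use_count() const { return buf_.use_count(); }

    void append(const char* s) {
        // Detach first if another CowString still shares the buffer.
        if (buf_.use_count() > 1) {
            buf_ = std::make_shared<std::string>(*buf_);
        }
        *buf_ += s;
    }

private:
    std::shared_ptr<std::string> buf_;
};
```

Copying a CowString is cheap because no characters are copied; the cost is paid only by the first write to a shared buffer.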

The only thing left to speed up is the case where you know the exact size of your buffer(s) and only need read-only access to them. BTW, you can make any string class faster by using a constructor that takes an initial allocation size big enough to cover all modifications you intend to make later. That works because no string class I know of deallocates a large buffer once it has been allocated. So,

    string largeBuf(1000000, '\0');
    largeBuf.resize(0);

would get you an empty string that is very speedy as long as you don't need more than 1000000 characters.
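
With std::string the same effect can also be requested directly through reserve(), which guarantees capacity for at least the given number of characters without filling the buffer first. A small sketch (append_without_realloc is a hypothetical name for this check):

```cpp
#include <cstddef>
#include <string>

// Reserves n characters, appends n characters one by one and reports
// whether the buffer stayed at the same address the whole time.
bool append_without_realloc(std::size_t n) {
    std::string s;
    s.reserve(n);
    const char* before = s.data();
    for (std::size_t i = 0; i < n; ++i) {
        s += 'x';
    }
    return s.data() == before && s.size() == n;
}
```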

Regards, Alex
>> Reference counting: that means that any copy of a CString uses only a pointer to the original
>>buffer and a common counter is used to count the number of references
Yes, but std::string also has reference counting and it's much slower than CString.  So I wouldn't give this as the reason why it's fast.

Also, all the CString clones I've created use reference counting, and none of them beat CString's speed.


>>- CString allocates more memory than required by the size parameter. I mean the minimum
>>allocation is 64 byte, then there are steps of 128, 256, 512 bytes depending
>>would get you an empty string that is very speedy unless you need more than 1000000 string
>>length.

The size of the initial allocation also doesn't seem to be a significant factor.  I've played around with the size, and CString still beats std::string by a factor of 10 to 1.

Here's a little test program:

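// Headers assumed for an MFC console build; the original post omitted them.
#include <afx.h>      // CString (also pulls in the Windows headers)
#include <tchar.h>    // _tmain, TCHAR
#include <string>
#include <cstdio>
#include <cstdlib>
#include <cstring>

int dummy_var = 0;    // keeps the optimizer from discarding the loops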

void function1(const char*Data)
{
      CString strData = Data;
      dummy_var += strData[99];
}

void function2(const char*Data)
{
      std::string strData = Data;
      dummy_var += strData[99];
}

void function3(const char*Data)
{
      CString strData;
      strData = Data;
      dummy_var += strData[99];
}

void function4(const char*Data)
{
      std::string strData(513, 0);
      strData = Data;
      dummy_var += strData[99];
}

void function5(const char*Data)
{
      std::string strData(999, 0);
      strData = Data;
      dummy_var += strData[99];
}

int _tmain(int argc, TCHAR* argv[], TCHAR* envp[])
{
      const int QtyTimesTest = 99999;
      DWORD StartTime, LenTime1=0, LenTime2=0, LenTime3=0, LenTime4=0, LenTime5=0;
      const int SizeData = 512;
      char DummyData[SizeData+1];
      memset(DummyData, 'x', SizeData);
      DummyData[SizeData] = 0;
      
      {
            StartTime = GetTickCount();
            for (int i = 0;i < QtyTimesTest;++i)
            {
                  function1(DummyData);
            }
            LenTime1 += (GetTickCount() - StartTime);
      }
      
      {
            StartTime = GetTickCount();
            for (int i = 0;i < QtyTimesTest;++i)
            {
                  function2(DummyData);
            }
            LenTime2 += (GetTickCount() - StartTime);
      }
      
      {
            StartTime = GetTickCount();
            for (int i = 0;i < QtyTimesTest;++i)
            {
                  function3(DummyData);
            }
            LenTime3 += (GetTickCount() - StartTime);
      }
      
      {
            StartTime = GetTickCount();
            for (int i = 0;i < QtyTimesTest;++i)
            {
                  function4(DummyData);
            }
            LenTime4 += (GetTickCount() - StartTime);
      }
      
      {
            StartTime = GetTickCount();
            for (int i = 0;i < QtyTimesTest;++i)
            {
                  function5(DummyData);
            }
            LenTime5 += (GetTickCount() - StartTime);
      }
      
      // DWORD is an unsigned long, hence %lu
      printf("Time duration is as follow:\n\
            LenTime1 = %lu\n\
            LenTime2 = %lu\n\
            LenTime3 = %lu\n\
            LenTime4 = %lu\n\
            LenTime5 = %lu\n",
         LenTime1, LenTime2, LenTime3, LenTime4, LenTime5);
      if (!dummy_var) printf("dummy %i", dummy_var);
      system("pause");
      return 0;
}
allmer (ASKER):

Hi Alex,
I was going through similar ideas as you mentioned above.
I have to deal with text files called fasta "databases".
They are organized like this:
https://www.experts-exchange.com/questions/20943958/ifstream-get-parts-of-a-formatted-textfile.html
I always compare everything to a running Java version I have, and
concatenating strings was even slower than Java, which is supposed to be 100 times slower than C++.
Concatenating CStrings was pretty quick, about 100 times as fast as Java, as predicted.
Then I tried a fixed buffer slightly larger than the largest chunk in the "database".
But that turned out to be very slow, too, since most parts of the db were relatively small.
The fastest way I have found so far is marking a point at the beginning and at the end of each chunk in
the text file, thus getting the required size for the char[], and then reading the whole piece into the char[].
I am actually reading the file twice in this case, but it still proves faster than all the
other approaches.
But then again, this could be an error in measurement.
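
The two-pass idea described above could look roughly like this. It is a sketch under assumptions, not the code from this project: read_chunk is a hypothetical helper, the begin and end offsets are assumed to have been found in a first pass, and the stream is assumed to be opened in binary mode so that offsets match bytes.

```cpp
#include <cstddef>
#include <fstream>
#include <vector>

// Reads the bytes in [begin, end) into a buffer of exactly that size.
std::vector<char> read_chunk(std::ifstream& in,
                             std::streamoff begin, std::streamoff end) {
    std::vector<char> buf;
    if (end <= begin) return buf;
    buf.resize(static_cast<std::size_t>(end - begin));
    in.clear();                        // drop a possible EOF flag from pass one
    in.seekg(begin, std::ios::beg);
    in.read(&buf[0], static_cast<std::streamsize>(buf.size()));
    buf.resize(static_cast<std::size_t>(in.gcount()));  // keep only what was read
    return buf;
}
```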
I will set up two questions on Monday regarding file mapping, to get rid of all these problems.
The only time-critical part is the file reading; all other uses of string are not time critical.
Thanks for your suggestions.

And hopefully a nice discussion on Monday ;-)
The above program returns the following results when compiled as a ***RELEASE*** build (VC++ 6.0):

Time duration is as follow:
                LenTime1 = 297
                LenTime2 = 2140
                LenTime3 = 313
                LenTime4 = 4375
                LenTime5 = 3953
Press any key to continue . . .

Functions 4 and 5 initialize std::string with enough space for the target data; however, CString still outperforms std::string by a factor of MORE than 10 to 1.

I'm not entirely certain why CString is so much more efficient than std::string, but I believe it might be because the std::string implementation that comes with VC++ is very poorly optimized.
And I believe that CString might have been made specifically for the VC++ compiler, so as to produce the best optimization.
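
For anyone who wants to repeat the std::string half of the measurement on UNIX, where GetTickCount() is not available, a portable timing loop could be sketched with std::chrono (C++11). This is an assumption-laden variant, not the program above: time_std_string_ms and g_sink are hypothetical names, and the result says nothing about CString.

```cpp
#include <chrono>
#include <string>

// Sink variable so the loop body cannot be optimized away entirely.
static int g_sink = 0;

// Times `iterations` constructions of a std::string from `data`,
// returning the elapsed wall-clock time in milliseconds.
long long time_std_string_ms(const char* data, int iterations) {
    std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::string s = data;
        g_sink += s[0];
    }
    return std::chrono::duration_cast<std::chrono::milliseconds>(
               std::chrono::steady_clock::now() - start).count();
}
```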
>> I'm not entirely certain why CString is so much more efficient then std::string

Maybe CString does inlining for the test function and std::string doesn't.

Axter,

do you know whether std::string is thread-safe? My own string class is made thread-safe, and I had to enter a critical section when updating the string.

Maybe I'll find time tomorrow to debug both implementations.

Regards, Alex
>>do you know whether std::string is thread-safe?

I'm not sure.