

Is the data in the C++ string class contiguous?

Posted on 2003-03-29
Medium Priority
Last Modified: 2006-11-17
Is the data in the STL's string class guaranteed to be contiguous?

I'm asking because I'm just starting to do spot optimization of C++ code using inline assembly language. What I've tried so far is using the string class's data() member function to return a pointer to the first character in the string. This pointer and the length of the string are then passed to an assembly routine, which performs various manipulations on the string. So far it's working fine on all the fairly short strings I've tried.

There are a few places that mention that the C++ string class encapsulates C-style strings, and C-style strings are just character arrays, which are contiguous in memory. But I want to make sure that the characters in the STL's string class are guaranteed to be contiguous in memory. If not, these assembly routines are going to have problems in the future.
Question by:Posit
LVL 11

Expert Comment

ID: 8232034
data() returns a const char* pointing at contiguous memory that contains the data of the string. c_str() does the same thing after making sure there is a \0 at the end of the string in use. Note that the pointer is not valid across modification of the string.
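A minimal sketch of the read-only pattern described above, assuming the assembly routine only reads the characters. The function checksum is a hypothetical stand-in for such a routine and is not part of the original question; the point is only how the pointer and length are obtained and when the pointer has to be fetched again.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a read-only assembly routine: it needs only
// a pointer to the first character and the number of characters.
unsigned checksum(const char* p, std::size_t n) {
    assert(p != 0 || n == 0);
    unsigned sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += static_cast<unsigned char>(p[i]);
    return sum;
}

unsigned checksum_of(const std::string& s) {
    // data() yields a const char*. Before C++11 it is not required to be
    // null-terminated, so always pass size() alongside it.
    return checksum(s.data(), s.size());
}

unsigned checksum_after_append(std::string s, const std::string& tail) {
    const char* stale = s.data();  // valid only until s is modified
    (void)stale;                   // kept only to illustrate the hazard
    s += tail;                     // may reallocate; 'stale' must not be used now
    return checksum(s.data(), s.size());  // fetch the pointer again after modifying
}
```

Calling checksum_of(std::string("abc")) passes the three characters and the length 3 to the routine without copying anything.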
LVL 15

Accepted Solution

efn earned 400 total points
ID: 8232578
As a practical matter, what you are doing may well work, but it is not strictly guaranteed to work.  But the problem is not that the memory is not contiguous.

Each of the data() and c_str() functions returns a pointer to an array, and the elements of the array are guaranteed to be contiguous.  But you are not supposed to change the contents of the array.  (The pointer is a pointer to const characters.)  The C++ standard says "Requires: The program shall not alter any of the values stored in the character array."  (Section 21.3.6, paragraph 4).  Presumably, if you break this rule, you are venturing into the territory of undefined behavior, where the string class thereafter may or may not work correctly.

Even if you could change the contents of the array, it is also not guaranteed to be where the string keeps its data, although it may be.  The standard says "the member returns a pointer to the initial element of an array whose first size() elements equal the corresponding elements of the string controlled by *this."  So the string class is allowed to keep its data somewhere else in some other format and just produce a copy of it represented as a character array when you call one of those functions.  

A typical implementation of the string class is likely to store its data in a character array and give you a pointer to the actual data, which is why your manipulations are likely to work.  But the specification of the string class does not guarantee that they will work with any standard-conforming implementation.  Of course, inline assembly language isn't guaranteed to work with any standard-conforming implementation either, so all this may not matter to you.
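One way to stay within the rules quoted above is to copy the characters into a buffer you own, let the routine modify that buffer, and build a new string from the result. The following is only a sketch: uppercase_in_place is a hypothetical stand-in for the in-place assembly routine, and std::vector<char> is assumed to store its elements contiguously (which the 2003 revision of the standard guarantees and which implementations do in practice).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an assembly routine that modifies characters in place.
void uppercase_in_place(char* p, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        p[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(p[i])));
}

std::string transform_copy(const std::string& s) {
    if (s.empty())
        return s;                               // nothing to pass to the routine
    std::vector<char> buf(s.begin(), s.end());  // writable copy owned by us
    assert(!buf.empty());                       // &buf[0] needs at least one element
    uppercase_in_place(&buf[0], buf.size());    // routine touches only our buffer
    return std::string(buf.begin(), buf.end()); // copy the result back into a string
}
```

This costs two copies per call, so it trades some of the speed the assembly was meant to gain for behavior that does not depend on how the string class stores its data.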

LVL 28

Expert Comment

ID: 8232759
efn has summarized this nicely.  As a practical consideration, the memory layout of std::string is *not* the same across different STL implementations.

I remember an excellent presentation by Scott Meyers (of Effective C++ fame) where he analyzed the memory performance of different std::string classes from different (and commonly used) STL implementations.  They're all over the place.  For example, some use one layout for short strings (<= 15 chars) and another for longer strings.  There's all kinds of jiggery-pokery going on.

So this is a case where the STL implementations really do differ. If you make assumptions about the memory layout, a different version of your compiler (e.g. Visual C++ 7 vs Visual C++ 6) may well give you a completely different result, because vendors usually plug in a different STL implementation with each release. Never mind mixing across platforms.
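If you are curious which layout your own library uses, a small probe like the one below can tell you whether a given string's characters sit inside the string object itself or in a separate allocation. This is only an illustrative sketch; stored_inside_object is a hypothetical helper name, and the answer for short strings depends entirely on the implementation.

```cpp
#include <functional>
#include <string>

// Reports whether the character data lives inside the string object itself
// (as with a short-string layout) rather than in a separate allocation.
bool stored_inside_object(const std::string& s) {
    const char* begin = reinterpret_cast<const char*>(&s);
    const char* end = begin + sizeof(s);
    const char* p = s.data();
    std::less<const char*> before;  // gives a usable ordering even for unrelated pointers
    return !before(p, begin) && before(p, end);
}
```

Running it on a short string and on a long one with each compiler you care about shows quickly whether the library switches layouts by length.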


Author Comment

ID: 8233161
The STL implementation is VC++ 6.0.

It sounds like it may be better to convert a string class object to a C-style char array before using inline assembly on it. I was hoping to avoid the overhead of doing this, though, since the whole point of using inline assembly was to optimize performance. I would rather be certain, however, than risk "undefined behavior".
