Is data in the C++ string class contiguous?

Posted on 2003-03-29
Medium Priority
Last Modified: 2006-11-17
Is the data in the STL's string class guaranteed to be contiguous?

I'm asking because I'm just starting to do spot optimization of C++ code using inline assembly language. What I've tried so far is using the string class's data() member function to return a pointer to the first character in the string. This pointer and the length of the string are then passed to an assembly routine, which performs various manipulations on the string. So far it's working fine on all the fairly short strings I've tried.

There are a few places that mention that the C++ string class encapsulates C-style strings, and C-style strings are just character arrays, which are contiguous in memory. But I want to make sure that the characters in the STL's string class are guaranteed to be contiguous in memory. If not, these assembly routines are going to have problems in the future.
Question by: Posit
LVL 11

Expert Comment

ID: 8232034
data() returns a const char* pointing at contiguous memory that contains the data of the string. c_str() does the same thing after making sure there is a \0 at the end of the string in use. Note that the pointer is not valid across modification of the string.
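
To illustrate the point about pointer validity, here is a minimal sketch. It assumes a read-only routine; sum_bytes is a hypothetical stand-in for the assembly routine, and sum_after_append is an illustrative helper, neither of which comes from the original question. The idea is simply to fetch the pointer and length again after every modification rather than caching them.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a read-only assembly routine: it only
// reads n bytes starting at p and returns their sum.
unsigned long sum_bytes(const char* p, std::size_t n) {
    unsigned long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += static_cast<unsigned char>(p[i]);
    }
    return total;
}

// Appends to the string, then hands the routine a freshly obtained
// pointer and length. A pointer taken from s.data() before the append
// must not be reused here, because the append may reallocate.
unsigned long sum_after_append(std::string& s, const std::string& extra) {
    s += extra;
    return sum_bytes(s.data(), s.size());
}
```

Because the routine only reads through a const char*, this usage stays within what data() promises.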
LVL 15

Accepted Solution

efn earned 400 total points
ID: 8232578
As a practical matter, what you are doing may well work, but it is not strictly guaranteed to work.  But the problem is not that the memory is not contiguous.

Each of the data() and c_str() functions returns a pointer to an array, and the elements of the array are guaranteed to be contiguous.  But you are not supposed to change the contents of the array.  (The pointer is a pointer to const characters.)  The C++ standard says "Requires: The program shall not alter any of the values stored in the character array."  (Section 21.3.6, paragraph 4).  Presumably, if you break this rule, you are venturing into the territory of undefined behavior, where the string class thereafter may or may not work correctly.

Even if you could change the contents of the array, it is also not guaranteed to be where the string keeps its data, although it may be.  The standard says "the member returns a pointer to the initial element of an array whose first size() elements equal the corresponding elements of the string controlled by *this."  So the string class is allowed to keep its data somewhere else in some other format and just produce a copy of it represented as a character array when you call one of those functions.  

A typical implementation of the string class is likely to store its data in a character array and give you a pointer to the actual data, which is why your manipulations are likely to work.  But the specification of the string class does not guarantee that they will work with any standard-conforming implementation.  Of course, inline assembly language isn't guaranteed to work with any standard-conforming implementation either, so all this may not matter to you.

LVL 28

Expert Comment

ID: 8232759
efn's summarized this nicely.  As a practical consideration, the memory layout of different STL implementations of std::string is *not* the same.

I remember an excellent presentation by Scott Meyers (of Effective C++ fame) where he analyzed the memory performance of different std::string classes from different (and commonly used) STL implementations.  They're all over the place.  For example, some use one layout for short strings (<= 15 chars) and another for longer strings.  There's all kinds of jiggery-pokery going on.
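
One way to see this kind of layout difference for yourself is sketched below. It is only a probe, not something to build on: data_is_inside_object is a hypothetical helper that checks whether the pointer returned by data() falls inside the string object's own footprint, which is what you would expect from an implementation that stores short strings in place. The answer for short strings depends entirely on the library in use.

```cpp
#include <functional>
#include <string>

// Returns true if the character data lives inside the std::string
// object itself rather than in separately allocated storage.
bool data_is_inside_object(const std::string& s) {
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end = obj_begin + sizeof(std::string);
    const char* p = s.data();
    // std::less provides a total order over pointers, so comparing
    // pointers into unrelated storage is well defined here.
    std::less<const char*> lt;
    return !lt(p, obj_begin) && lt(p, obj_end);
}
```

Calling this with a short string and with a long one (say, 1000 characters) on different compilers shows whether a given implementation switches layouts. The long string should always report false, since its characters cannot fit inside the object.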

So this is a case where STL implementations really do differ. If you make assumptions about the memory layout, a different version of your compiler (e.g. Visual C++ 7 vs Visual C++ 6) may well give you a completely different result, because each release usually plugs in a different STL implementation. Mixing across platforms is riskier still.


Author Comment

ID: 8233161
The STL implementation is VC++ 6.0.

It sounds like it may be better to convert a string class object to a C-style char array before using inline assembly on it. I was hoping to avoid the overhead of doing this, though, since the whole point of using inline assembly was to optimize performance. I would rather be certain, however, than risk "undefined behavior".
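
A minimal sketch of that copy-based approach follows, assuming a std::vector<char> as the mutable buffer (its elements are guaranteed contiguous as of C++03). process_buffer is a hypothetical stand-in for the assembly routine, shown here as a plain loop that upper-cases ASCII letters. process_string is likewise an illustrative name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the in-place assembly routine: it
// upper-cases ASCII letters in the n bytes starting at p.
void process_buffer(char* p, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (p[i] >= 'a' && p[i] <= 'z') {
            p[i] = static_cast<char>(p[i] - 'a' + 'A');
        }
    }
}

// Copies the string into a contiguous, writable buffer, lets the
// routine modify the copy, then assigns the result back to the string.
void process_string(std::string& s) {
    if (s.empty()) {
        return; // &buf[0] is not valid on an empty vector
    }
    std::vector<char> buf(s.begin(), s.end());
    process_buffer(&buf[0], buf.size());
    s.assign(buf.begin(), buf.end());
}
```

The cost is one copy in and one copy out per call. Whether that outweighs the gain from the assembly routine would have to be measured for the strings in question.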

Question has a verified solution.
