?
Solved

fixed-length string buffer allocated on stack v.s. std::string on heap

Posted on 2006-06-07
12
Medium Priority
?
1,690 Views
Last Modified: 2013-12-14
At times I find I have a string that I know will not exceed a certain length (e.g. a file name).  It seems wasteful to use a std::string for this type of thing because std::string allocates memory on the heap when it could instead be done on the stack.  I'm concerned somewhat about heap fragmentation and new/delete overhead.  I know certain implementations of std::string have a "small string optimization" so that small strings are actually allocated on the stack, but on VC++2005 this size is only 16 characters.

char* will work, but I'd want to guard against buffer overflow, and it does seem a bit inefficient to use O(n) strlen to retrieve string length.

I might be looking for is a template class like boost::array (http://www.boost.org/doc/html/array.html), but I think boost::array<char, n> might not be the most suitable for strings.  For example, size() will not return a value less than n, and I think I'd have to write my own safe versions of functions like strcat.  Though boost::array checks for overflow, std::copy on boost::array seemed to allow overflow of the buffer without warning when I tested it.  The template approach also results in generation of multiple copies of executable code if not done carefully.

No, I'm not going to use the MS safe string library (for portability reasons).

Maybe my concerns about std::string performance are not entirely warranted, but the overhead does seem avoidable.  It would seem good also to avoid writing my own string library too.  Maybe the Safe C String Library (http://www.zork.org/safestr/) will do, but that uses malloc, though it does allow a custom memory allocator.

Surely someone else has already written a useful and well tested class for allocating small strings on the stack and manipulating those strings safely and efficiently(?)
0
Comment
Question by:ext2
  • 5
  • 5
  • 2
12 Comments
 
LVL 4

Expert Comment

by:Woodster
ID: 16859386
You said that you know that the string will not exceed a certain length.

If you know it will not exceed a certain length and you want to allocate a fixed length buffer for the string (again - knowing it will not exceed this length), why do you need to check for buffer overflow.

Obviously it is generally always a good habit to fall into to avoid any memory problems but for the sake of this problem no checking is required if the buffer size length will never be exceeded.
0
 
LVL 17

Expert Comment

by:rstaveley
ID: 16860170
You can use your own allocator with basic_string
0
 
LVL 2

Author Comment

by:ext2
ID: 16865345
The desired buffer length check is only intended as a safeguard for detecting logic errors in the program, possibly only in the debug build.  Indeed, it's not otherwise needed.  Exceeding the buffer length should result in an assertion.

I understand that custom allocators on basic_string are not necessarily portable, especially if they contain state, such as a pointer to a stack-allocated char buffer as would be desired here (http://groups.google.com/group/comp.lang.c++.moderated/browse_frm/thread/dfe1ee97e67ae0e0/99317fd76dff9139?tvc=1&q=basic_string+allocator+custom&hl=en#99317fd76dff9139).

What I was looking for was something like this (http://johnpanzer.com/tsc_cuj/ToolboxOfStrings.html), but that doesn't seem to be maintained.  It gives a bunch of compiler errors under MSVC++ and G++.  It would be nice if something like that were in boost and well maintained.

Some other ideas are here (http://www.ddj.com/dept/cpp/184402023), but none do what I want: either they do dynamic allocation (rather than stack allocation) or they don't store the buffer length inside the buffer (e.g. the BSD-like strlcpy or the MS secure strcpy_s).  The article acknowledges that the latter are prone to buffer overflow for that reason.

Actually, it looks like VC++2005 does buffer overflow detection on built-in char arrays when the "/RTCs" (stack frame run-time error checking) is enabled during debugging (http://msdn2.microsoft.com/en-us/library/8wtf2dfz(d=ide).aspx).  I did some tests, and it seems to work.  I think that may be what I want, and it allows me to continue using the familiar C string functions.

Sometimes I wonder why simple things (like strings) can't be simpler in C/C++!
0
What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

 
LVL 17

Expert Comment

by:rstaveley
ID: 16866522
Stack allocation using _alloca (Microsoft) or the GCC variable sized array extension (cf C99) is of course not portable. It also requires a certain amount of house-keeping to adjust the stack frame pointer etc, so I'd be interested to see how it compares with heap allocation via malloc.

However, you can do other things with allocators. If you needed many strings which were (say) guaranteed never to be longer than 255 characters (like Pascal strings), you could quite easily implement a FixedAllocator, which allocated a fixed 256 bytes each time from a chunk of memory reserved for strings (i.e. one big malloc).

However, you'll find that any functions you write which need to be able to cope with basic_string<> with different allocators will need to be written in template form, because the static type of std::basic_string<char,std::char_traits<char>,ext2::MyFixedAllocator> differs from that of std::basic_string<char,std::char_traits<char>,std::allocator<char> >. That's a show-stopper for a lot of projects :-)

> it looks like VC++2005 does buffer overflow detection

Buffer overflow detection comes at a price too. You'd be advised to profile that and compare it against std::string.
0
 
LVL 4

Expert Comment

by:Woodster
ID: 16866532
As far as I know the only way you are going to guarantee stack allocation is by decalring the buffer as an array within the desired function.  Any other dynamic memory allocation will more than likely end up on the heap which is what you are trying to avoid here.

Templates are not necessarily portable so that will also not fit the purpose that you are looking for.  We had to remove some very basic templates from an app becuase it was not portable to PocketPC devices with the version of the compiler we have.  his may have changed in later versions though.
0
 
LVL 2

Author Comment

by:ext2
ID: 16867397
Here's the type of implementation I was thinking of:

template <int N>
struct char_array
{
    char s[N];
    int length;
};

int main() {
    char_array<5> n;
    ...
}

The character data is allocated on the stack, and this requires no special thing like _alloca.  It's fairly simple.  It would be nice if someone already had a string library based on that concept to avoid reinventing the wheel.

I might relook at basic_string with a custom allocator.  Some people had some warnings about that, which concerned me some.  Using a FixedAllocator to allocate from one big malloc-ed memory area as you mention may do the trick, and it might be operated similar to a stack storage area.  When I say "portability" I mean that only loosely since I really only care about the latest Intel-based MSVC++ and G++ compilers and maybe a few others.

>> it looks like VC++2005 does buffer overflow detection
>Buffer overflow detection comes at a price too. You'd be
>advised to profile that and compare it against std::string.

VC++2005 buffer overflow detection is only intended for debugging because VC++2005 disables these checks when optimization is turned on.  I'm not sure yet whether the overflow detection handles some, most, or all circumstances, but at least it seems less flexible (e.g. can't be used in release builds).

The below code demonstrates an idea I had.  It's somewhat crazy.  It attempts to detect overflows at both compile-time and run-time.  The former is the unique part.  In certain cases (not all), this code triggers a compile-time error when it knows that the fixed-buffer will overflow.  The way this works is to make the buffer size and the "maximum length of data that could possibly exist in the buffer at run-time" both be part of the type.  This means that as you modify the buffer, the type may change.  That may be a bit cumbersome, expecially for a statically typed language like C++.  It would be easier if C++ allowed me to associate mutable data with static types--C++ template metaprogramming has its limits.

#include <cassert>
#include <cstring>
#include <iostream>

using namespace std;

// compile time assertion.
// implemented with template metaprogramming.
template<bool B> class compiler {};
template<> class compiler<true> {
    public: static const int ASSERT = 1;
};
template<> class compiler<false> {
    private: static const int ASSERT = 1;
};

// character string using fixed-sized buffer
template<int N, int M=N>
struct char_buffer {
    // number of characters in buffer
    static const int size = N;
    // maximum number of characters that might be used in buffer
    static const int maxlength = M;
    // character buffer
    char s[N];
    // make behave like char[].
    operator char*() { return s; }
};
template<int N, int M=N>
struct char_const_buffer {
    // number of characters in buffer
    static const int size = N;
    // maximum number of characters that might be used in buffer
    static const int maxlength = M;
    // character buffer
    char s[N];
    // make behave like char[].
    operator const char*() { return s; }
};

// get compile-time information on char_buffer or char[].
// implemented with template metaprogramming.
template <class T>
struct info {
    static const int size = T::size;
    static const int maxlength = T::maxlength;
};
template <int N>
struct info<const char[N]> {
    static const int size = N;
    static const int maxlength = N;
};
template <int N>
struct info<char[N]> {
    static const int size = N;
    static const int maxlength = N;
};
template <>
struct info<const char*> {
    static const int size = INT_MAX;
    static const int maxlength = INT_MAX;
};
template <>
struct info<char*> {
    static const int size = INT_MAX;
    static const int maxlength = INT_MAX;
};

// safe template version of strcpy
template<class D, class S>
inline char_buffer< info<D>::size, info<S>::maxlength >&
strcpy_t(D& d, S& s)
{
    compiler<info<D>::size >= info<S>::maxlength>::ASSERT;
    assert(info<D>::size >= strlen(s)+1);
    strcpy(d, s);
    return *reinterpret_cast<
        char_buffer< info<D>::size, info<S>::maxlength >*
    >(static_cast<char*>(d));
}

// safe template version of strcat
template<class D, class S>
inline char_buffer< info<D>::size, info<D>::maxlength + info<S>::maxlength - 1 >&
strcat_t(D& d, S& s)
{
    compiler<info<D>::maxlength + info<S>::maxlength - 1 <= info<D>::size>::ASSERT;
    assert(strlen(d) + strlen(s) - 1 <= info<D>::size);
    strcat(d, s);
    return *reinterpret_cast<
        char_buffer< info<D>::size, info<D>::maxlength + info<S>::maxlength - 1 >*
    >(static_cast<char*>(d));
}

// downcast max maxlength
template<int M, class S>
char_buffer< info<S>::size, M >&
maxlength_cast(S& s) {
    compiler< M >= 0 && M <= info<S>::size >::ASSERT;
    assert(strlen(s) + 1 <= M);
    return *reinterpret_cast<
        char_buffer< info<S>::size, M >*
    >(static_cast<char*>(s));
}
template<int M, class S>
char_const_buffer< info<S>::size, M >&
maxlength_const_cast(S& s) {
    compiler< M >= 0 && M <= info<S>::size >::ASSERT;
    assert(strlen(s) + 1 <= M);
    return *reinterpret_cast<
        char_const_buffer< info<S>::size, M >*
    >(const_cast<char*>(static_cast<const char*>(s)));
}


int main()
{
    char s1[5] = "a";
    char s2[6] = "b";

    char_buffer<5,5>& s1b = strcat_t(strcpy_t(s1, maxlength_cast<2>(s2)), "zzz");
    //strcat_t(s1b, "z"); // overflow triggers compile time assertion.

    const char * p = "ab";
    strcat_t(maxlength_cast<2>(s2), maxlength_const_cast<3>(p));

    cout << s1 << " " << s2 << endl;
}

0
 
LVL 17

Accepted Solution

by:
rstaveley earned 300 total points
ID: 16868028
There are two weaknesses with template <int N> struct char_array that spring to mind:

(1) You cannot use a variable for N; it needs to be known at compile time.
(2) Any function that takes a char_array<> parameter, needs to be templatised (because each different value of N results in a different static type) and you will be subject to code bloat of you deal with many different values of N.

You could improve matters for (2) perhaps by using a base class, which wasn't templatised and provided you with the length of the string, but not the storage. Then your functions should be implemented for the base class (e.g. see function f() below):

--------8<--------
#include <iostream>
#include <cstring>

template<typename CharT> struct basic_str_base {
      size_t capacity;
      CharT buf[1];            // One character array last data member in struct

      basic_str_base(size_t capacity) : capacity(capacity) {}
};

template<typename CharT,size_t Capacity> struct basic_str : public basic_str_base<CharT> {
      CharT buf_ex[Capacity-1];

      basic_str() : basic_str_base<CharT>(Capacity) {}
      basic_str(const char* str) : basic_str_base<CharT>(Capacity) {strncpy(buf,str,Capacity);}
      basic_str(const basic_str_base<CharT>& rhs) : basic_str_base<CharT>(Capacity) {strncpy(buf,rhs.buf,Capacity);}
};

typedef basic_str_base<char> str_base;
typedef basic_str_base<wchar_t> wstr_base;

void f(const str_base& s)
{
      std::cout << "Capacity of s is " << s.capacity << " and contents are '" << s.buf << "'\n";
}

int main()
{
      typedef basic_str<char,128> str128;
      typedef basic_str<char,512> str512;
      typedef basic_str<char,1024> str1024;

      str128 my128("my128");
      str1024 my1024("my1024");
      str512 my128copy(my128);

      f(my128);
      f(my1024);
      f(my128copy);
}
--------8<--------
0
 
LVL 17

Expert Comment

by:rstaveley
ID: 16868093
Afterthought...

Beware of slicing, if you do go the str_base route though.

--------8<--------
void f(const str_base /*&*/ s)    // Oops didn't make it a reference, and passing by copy is very dangerous here!!
{
     std::cout << "Capacity of s is " << s.capacity << " and contents are '"
                 << s.buf    /* <- Don't expect the 2nd character to have been copied */
                 << "'\n";
}
--------8<--------

You can guard against accidental passing by copy by making the copy constructor private.

Also, you will not be able to use an STL container of str_base for the same reason.
0
 
LVL 2

Author Comment

by:ext2
ID: 16870322
> (1) You cannot use a variable for N; it needs to be known at compile time.

Yes.  More precisely, the maximum value of N must be known if we allow space to remain unused (e.g. a file name might have a maximum length of 255 chars).

> (2) Any function that takes a char_array<> parameter, needs to be templatised..will be subject to code bloat...

Not necessarily.  For this reason, my strcpy_t and strcat_t template functions are both marked as inline and are merely thin wrappers around the regular functions.

Concerning splitting the char buffer as above, I don't believe the standard guarantees that buf and buf_ex are placed adjacent to each other, though the compiler likely will in practice, and there are compiler extensions than can force padding constraints.

Below is the simpler version of my code that only does run-time checks (not compile-time given before, which I'm not sure are that practical in C++).

#include <iostream>
using namespace std;

template<int N> class char_array
{
private:
    char m_buf[N];
    int m_length;
public:
    static const int capacity = N;

    char_array() : m_length(1) {
        assert(capacity >= 1);
        m_buf[0] = '\0';
    }
    char_array(char * s) {
        m_length = strlen(s) + 1;
        assert(capacity >= m_length);
        memcpy(m_buf, s, m_length);
    }
    template< template<int M> class char_array2, int M >
    char_array(char_array2<M>& other) : m_length(other.m_length) {
        assert(capacity >= other.m_length);
        memcpy(m_buf, other.m_buf, other.m_length);
    }

    char * c_str() { return m_buf; }
    int length() { return m_length; }

    template <int M> friend class char_array;
};

template <class D, class S>
void strcat_t(D& d, S& s) {
    assert(D::capacity >= d.length() + s.length() - 1);
    strcat(d.c_str(), s.c_str());
}

int main()
{
    char_array<6> s1("abc");
    char_array<4> s2(s1);
    char_array<3> s3("zz");
    strcat_t(s1, s3);

    cout << s1.c_str() << " " << s2.c_str() << endl;
    return 0;
}




0
 
LVL 2

Author Comment

by:ext2
ID: 16870347
Probably some errors above (e.g. strcat_t does not update m_length).
0
 
LVL 17

Expert Comment

by:rstaveley
ID: 16871219
> Concerning splitting the char buffer as above, I don't believe the standard guarantees that buf and buf_ex are placed adjacent to each other, though the compiler likely will in practice, and there are compiler extensions than can force padding constraints.

The standard does guarantee you the order of member placement as long as you don't put an intervening access specifier (i.e. public/protected/private). Here is the relevant bit from ISO/IEC 14882:1998(E):

9.2.12 12 Nonstatic data members of a (nonunion) class declared without an intervening access specifier allocated so that later members have higher addresses within a class object.

Padding doesn't matter because you will have at least enough buffer size. So your first character will go into buf and subsequent characters will either go buf_ex or padding between the two.
0
 
LVL 2

Author Comment

by:ext2
ID: 16879585
You make a good point about the padding.  I still mainly like the generic programming (boost::array like) approach with inlining.  Here's my current code (checked_array.hpp) on this: http://math2.org/david/checked_array/
0

Featured Post

VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

IntroductionThis article is the second in a three part article series on the Visual Studio 2008 Debugger.  It provides tips in setting and using breakpoints. If not familiar with this debugger, you can find a basic introduction in the EE article loc…
Update (December 2011): Since this article was published, the things have changed for good for Android native developers. The Sequoyah Project (http://www.eclipse.org/sequoyah/) automates most of the tasks discussed in this article. You can even fin…
The viewer will learn how to use and create keystrokes in Netbeans IDE 8.0 for Windows.
The viewer will learn how to pass data into a function in C++. This is one step further in using functions. Instead of only printing text onto the console, the function will be able to perform calculations with argumentents given by the user.
Suggested Courses

850 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question