forums_mp asked:
char* versus vector<char>


The development environment centers around Qt - which might not be relevant here.  My question, though, centers around char* versus a vector of char and the ramifications.  So now consider:

In a header file
//  header.h
  struct msg_header_t {
     unsigned int  msg_id;
     unsigned int  msg_count;
     unsigned int  msg_size;
     unsigned int  msg_chksum;
  };

 struct t01_msg
 {
    msg_header_t msg_header_;
    // more
 };

 struct incoming_msg {
    t01_msg    t01_msg_;
 };

////////// end header.h

// later
class MainWinImpl : public MainWin
{
  Q_OBJECT
public:
  MainWinImpl(QWidget *parent = 0L);
  ~MainWinImpl();

private:

  QSocket        *mSocket;   // socket the data arrive on (assumed; used in slotRead below)
  char           *mBuffer;
  int             mBufPos;
  int             mMsgSize;
  int             mCount;    // message counter (assumed; used in the constructor below)

};
#endif


MainWinImpl::MainWinImpl(QWidget *parent): MainWin(parent)
{
  // later
  mCount = 0;
  mBufPos = 0;
  mBuffer = new char[10000];
  mMsgSize = -1;

  // more
}

void MainWinImpl::slotRead()
{
  int l = mSocket->readBlock(mBuffer + mBufPos, 10000 - mBufPos);  // read only what still fits in the buffer
  mBufPos += l;
  if(mMsgSize == -1)
    if(mBufPos >= sizeof(msg_header_t))
    {
      // Header ready
      msg_header_t *header = (msg_header_t *)mBuffer;
      mMsgSize = header->msg_size;
    }
  if(mMsgSize != -1)
    if(mBufPos >= mMsgSize)
    {
      // parse message
      incoming_msg *msg = (incoming_msg *)mBuffer;
      parseMessage(msg);
      for(int i = mMsgSize; i < mBufPos; i++)
        mBuffer[i - mMsgSize] = mBuffer[i];
      mBufPos -= mMsgSize;
    }
}

Would it be safe to replace the dynamic allocation (below) with a vector of char?  Surely there won't be any contiguous-memory issues here?

  mBuffer = new char[10000];

Conversely - there's no guarantee that the vector's storage is contiguous, given:
 
    void ReadData(void* destination, unsigned count);

    std::vector<char> v(count);
    ReadData(&v[0], count);

So this would/might be better:

    char* b(new char[count]);
    ReadData(b, count);
  // later - at destruction time
   delete[] b;

Axter:

Hi forums_mp,
>> Conversely - there's no guarantee that the vector's storage is contiguous, given:

Yes, there is.  In accordance with the C++ standard, a vector's storage is guaranteed to be contiguous.

I recommend using vector over the new[] operator.  It's much safer and easier to use.
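For illustration, something along these lines should work with the readBlock API from your code (just a sketch, not tested; 'readIntoVector' is a made-up helper name):

#include <vector>

// sketch only: QSocket::readBlock is the API quoted in the question,
// everything else here is assumed
int readIntoVector(QSocket *socket, std::vector<char> &buffer)
{
  buffer.resize(10000);                       // replaces: mBuffer = new char[10000]
  int len = socket->readBlock(&buffer[0],     // &buffer[0] points at contiguous storage
                              buffer.size());
  return len;                                 // no delete[] needed - the vector cleans up
}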


David Maisonave :-)
Cheers!
ASKER CERTIFIED SOLUTION from efn (content available to members only)
cup:
How fast do you want the code to be?  Accessing mBuffer[x] when mBuffer is a vector is a lot slower than if it were a char array.  Just step into it in the debugger and you'll realize how bad it is.  Good excuse to get a faster machine though :)
>>Accessing mbuffer[x] when mbuffer is a vector is a lot slower than if it were a char array.

That depends on how you access it.
If you're iterating through the buffer and performance is an issue, then you do want to go with a vector, because a vector can outperform a regular buffer when iterating if you use iterators vs. operator[].
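For example (sketch only), the two access styles look like this; which one wins depends on the compiler and optimization settings:

#include <vector>

void zeroFill(std::vector<char> &buf)
{
  // indexed access via operator[]
  for (std::vector<char>::size_type i = 0; i < buf.size(); ++i)
    buf[i] = 0;

  // the same loop via iterators - on some compilers this optimizes better
  for (std::vector<char>::iterator it = buf.begin(); it != buf.end(); ++it)
    *it = 0;
}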
I compiled the above code with VC++ 6.0 in release mode, and I got the following results:
Timing results:
C_Style_static_2dArray = 93156
C_Style_dynamic_2dArray = 106031
vec_2dArray op[] = 75563
vec_2dArray iterators = 84578
Press any key to continue . . .

The above results show the vector outperforming the C-style buffers even when using operator[].

FYI:  Ignore the 2dArray wording.  I extracted this test from a 2dArray test program and forgot to change the names.
>>>> How fast do you want the code to be?

The performance issue can be ignored, especially when handling a single buffer. The difference for one call is a matter of nanoseconds and can't be measured.

>>>> char* buffer = new char[10000];
>>>> vector<char> v(10000);

The technique of creating one large buffer, big enough to meet all requirements now, might need to be rethought:

(1) Times are changing, and 10000 bytes now may be too small tomorrow.
(2) If messages actually have only a few bytes, a 10000-byte buffer seems like overkill.

A better idea might be to send the size of a message at the very beginning. Then a read operation first reads the size and allocates the needed buffer before a second read.
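Roughly like this (a sketch only, using the msg_header_t and readBlock API from the question; a real version must check return values and loop until all bytes have arrived, and whether msg_size includes the header is an assumption):

#include <vector>

void readOneMessage(QSocket *socket)
{
  // first read: just the fixed-size header
  msg_header_t header;
  socket->readBlock(reinterpret_cast<char *>(&header), sizeof(header));

  // second read: allocate exactly what the header announced
  std::vector<char> payload(header.msg_size);      // assumes msg_size = payload bytes
  if (!payload.empty())
    socket->readBlock(&payload[0], payload.size());

  // parse header + payload here
}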

Regards, Alex
>>> So I would highly recommend against using this type of logic if you have a program that is sending many packets.

Axter, I didn't recommend sending two packets, but rather using a struct like this:

struct message
{
     unsigned int    siz;
     unsigned char   buf[128];
};

Then, recv would read the 'siz' member, allocate a new buffer, and read the rest.


Regards, Alex

forums_mp (Asker):


Appreciate all the responses.  What it amounts to - in effect - is rethinking my approach to C-style arrays.  As efn pointed out:  "In most cases, if you are dealing with an API that wants a character array, it will be simplest just to use a character array".   However, faced with this API

int          readBlock( char *data, uint maxlen );

Now I've got three choices with regard to reading data transmitted to me across Ethernet.  I 'forgot' max size.

////
  char buffer[1000];
  int len = socket->readBlock(&buffer[0], 1000);  // or sizeof(buffer) for maxlen

OR

////
  char *buffer = new char[1000];
  int len = socket->readBlock(&buffer[0], 1000);  // note: sizeof(buffer) would give the pointer size here, so pass the count explicitly
  // must delete here though
  delete [] buffer;

OR

////
  vector<char> buffer(1000);
  int len = socket->readBlock(&buffer[0], buffer.size());

The downside has been pointed out by Alex, in that 1000 bytes today might not be enough tomorrow.  However (Alex), I'm somewhat confused about how to accomplish this:
" Then a read operation first reads the size and allocates the needed buffer before a second read. "
IOW, readBlock expects a maxlen param.  It's only after I've performed the read that I'm aware of the actual length of the data.  So ..

Axter:
>>>>>>>>A more efficient method would be to assume a minimum size for the packet, and include in the minimum size packet data to indicate if there’s going to be a follow-up packet, and the size of the follow-up packet.

Could you elaborate on this?  I don't think I'm following you here.  First, note that ALL messages include a header.  The header includes the msg_size parameter.

  struct msg_header_t {
     unsigned int  msg_id;
     unsigned int  msg_count;
     unsigned int  msg_size;
     unsigned int  msg_chksum;
  };

 
 
>>Could you elaborate on this?  I don't think I'm following you here.  First, note that ALL messages include a header.  The header includes the msg_size parameter.

When you send your packet, you can include information indicating that there's a second packet coming.
struct msg_header_t {
     unsigned int  msg_id;
     unsigned int  msg_count;
     unsigned int  msg_size;
     unsigned int  msg_chksum;
     unsigned int  msg_This_Packet_Size;
     unsigned int  msg_Second_Packet_Size;
};

Your receiving code can check the msg_Second_Packet_Size value, and if that value is greater than zero, it knows that there is a second packet and it knows the total size of that second packet.
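On the receiving side that could look roughly like this (sketch only - the field names are from the struct above, the read calls are simplified and unchecked):

#include <vector>

void readPacketPair(QSocket *socket)
{
  msg_header_t header;
  socket->readBlock(reinterpret_cast<char *>(&header), sizeof(header));

  // first packet: its size is announced in the header
  std::vector<char> first(header.msg_This_Packet_Size);
  if (!first.empty())
    socket->readBlock(&first[0], first.size());

  // follow-up packet only if the header says there is one
  if (header.msg_Second_Packet_Size > 0)
  {
    std::vector<char> second(header.msg_Second_Packet_Size);
    socket->readBlock(&second[0], second.size());
  }
}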


>>Axter, I didn't recommend sending two packets, but rather using a struct like this:

Sorry, I misunderstood your comment.
Perhaps a std::string might be a better choice.  A string has both a reserved capacity and a separate length.  You can preallocate the string's buffer (string.reserve(size_type)), and even if you append more data than the buffer will hold, it will automatically reallocate its buffer to accommodate the new data.

std::strings are comfortable containing '\0' characters.  You can append buffers onto them using string.append(const char *srcbuf, size_t size).  They work well with output stream << operators.
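For example (a sketch, with the same caveats about checking readBlock's return value as before):

#include <string>

void appendChunk(QSocket *socket, std::string &data)
{
  data.reserve(10000);                 // preallocate capacity; the length stays unchanged

  char chunk[1024];
  int len = socket->readBlock(chunk, sizeof(chunk));
  if (len > 0)
    data.append(chunk, len);           // copies raw bytes, embedded '\0's included

  // data.size() is the total number of bytes collected so far
}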
>>Perhaps a std::string might be a better choice.

Many C++ programmers consider it inappropriate to store binary data in a std::string container, and recommend using std::vector<char> over std::string for binary storage.
>>>> Many C++ programmers consider it inappropriate to store binary data in a std::string

Yes, and I am one of these ;-)  

However, I derived String and Buffer classes from my own StringBuf base class and it worked fine, including appropriate cast operators.

Regards, Alex
>Many C++ programmers consider it inappropriate to store binary data in a std::string container, and recommend using std::vector<char> over std::string for binary storage.

Where can I read up on this?
While I'm on the topic of API limitations, etc., I'd like to pose another question:


#ifndef  HEADER_H
#define HEADER_H

  struct msg_header_t {
     unsigned int  msg_id;
     unsigned int  msg_count;
     unsigned int  msg_size;
     unsigned int  msg_chksum;
  };
 struct t01_msg
 {
  signed int       xdircosine  :16;  // X Direction Cosine  - LSB 2^-15
  signed int       ydircosine  :16;  // Y Direction Cosine  - LSB 2^-15
  signed int       zdircosine  :16;  // Z Direction Cosine  - LSB 2^-15
    // more
 };

 struct outgoing_msg {
    msg_header_t msg_header_;
    t01_msg  t01_msg_;
    // more
 };

#endif

// header main_win.h

# include "header.h"
class main_win_impl : public main_win
{
  Q_OBJECT
public:
  main_win_impl(QWidget *parent);
  ~main_win_impl();

// later
private:
  QSocket         *socket;
  outgoing_msg  mPacket;
};

// main_win_impl.cpp
// later
#include <algorithm> //required for std::swap

#define ByteSwap5(x) ByteSwap((unsigned char *) &x,sizeof(x))

static void ByteSwap(unsigned char * b, int n)
{
   register int i = 0;
   register int j = n-1;
   while (i<j)
   {
      std::swap(b[i], b[j]);
      i++, j--;
   }
}

main_win_impl::main_win_impl(QWidget *parent)
  : main_win (parent)
{
  // later
  socket = new QSocket();
}

void main_win_impl::send_message()
{
  if ( socket->state() == QSocket::Connected )
  {
     mPacket.msg_header_.msg_count  = mCount,  mCount++;
//   for (int idx(0); idx < sizeof(mPacket); ++idx)
//      ByteSwap5(  );
    socket->writeBlock( (char *)&mPacket, sizeof(mPacket));
  }
}

Before I call writeBlock I'd like to 'perform an endian conversion' on mPacket.   I've tried reinterpret_cast on mPacket to a char* with little success... How would I achieve this?

NOTE:  I obtained advice on handling bit fields when dealing with different machines.  I'm investigating how to achieve the end result based on the advice given.  In the meantime....

>>>> Where can I read up on this?

One of the major requirements for a string class is to be a substitute for a C char array. So zero termination is one of the first properties a string class has to support. Binary data in a string definitely destroys that property.

Regards, Alex
Are you talking generally about a class that implements C-style null-terminated strings, or about std::string?  std::string specifically supports null characters in the body of the string.  My copy of the C++ standard defines std::string like this:

"For a charlike type charT, the template class basic_string describes objects that can store a sequence consisting of a varying number of arbitrary charlike objects (21)."

Nothing specifically is said about null characters, but I believe that "arbitrary charlike objects" covers it.  

A std::string is basically implemented as a byte buffer, an allocation-length int, and a used-length int.  This enables std::string to contain arbitrary characters, including null characters.

Of course, if you use std::string.c_str() on a string containing null characters, you're going to be disappointed.
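A tiny demonstration of the difference:

#include <cstdio>
#include <cstring>
#include <string>

int main()
{
  std::string s;
  s.append("ab\0cd", 5);               // five bytes, one of them a '\0'

  std::printf("s.size()          = %u\n", static_cast<unsigned>(s.size()));               // 5
  std::printf("strlen(s.c_str()) = %u\n", static_cast<unsigned>(std::strlen(s.c_str()))); // 2
  return 0;
}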
>>>> 'perform an endian conversion'

The endian issue only applies to integer types (int16, int32, int64). You may use htons/htonl to convert int16/int32 from little-endian (PC) to big-endian, and ntohs/ntohl for the inverse operation.

C++ casting of C types doesn't change byte order and is independent of endianness.

Generally, most hardware is little-endian while the network is big-endian (hton means host-to-network). You should always use the same endianness when sending messages. The sender converts all integer data to the required endianness - if necessary. The receiver has to decide whether it needs to convert back or not.
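For the msg_header_t from this question, the sending side could do something like this (a sketch; <arpa/inet.h> is the POSIX header - on Windows the same functions come from <winsock2.h> - and only the plain unsigned int members are handled, not the bit fields):

#include <arpa/inet.h>   // htonl/ntohl (use <winsock2.h> on Windows)

// convert the integer header members to network byte order before sending
void headerToNetwork(msg_header_t &h)
{
  h.msg_id     = htonl(h.msg_id);
  h.msg_count  = htonl(h.msg_count);
  h.msg_size   = htonl(h.msg_size);
  h.msg_chksum = htonl(h.msg_chksum);
}

// the receiver applies ntohl to the same members right after reading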

Regards, Alex

Only integer types. I see.

This brings about an interesting dilemma.  The PC is LITTLE endian.  The PowerPC is BIG.  I'm using the APIs provided by Qt to transmit messages, so I'll have to do the conversion up front.  That said, I don't think hton applies in this case.  Am I way off?
>>>> I'm using the APIs provided by QT to transmit messages

That doesn't matter as long as the API doesn't take care of integer members in your struct. If you send the whole buffer, endianness isn't an issue when transmitting data from a PC. The receiver (the Mac) would need to use htonl on all integer members. If the sender is big-endian, you have to convert integers with ntohs/ntohl before sending. The PC already gets the correct values and doesn't need to convert.

Note, when sending structs from one platform to another, you need to take care about alignment. Different compilers may have different alignments. A struct should be portable if all double, int64 or struct members have an offset divisible by 8 and all other members sit at a 4-byte offset. You can test this by printing sizeof(my_struct) on both platforms; the results must be equal.
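A quick way to do that check, in the same spirit (sketch; print this on both platforms and compare):

#include <cstddef>   // offsetof
#include <cstdio>

void printHeaderLayout()
{
  std::printf("sizeof(msg_header_t) = %u\n", static_cast<unsigned>(sizeof(msg_header_t)));
  std::printf("offsetof(msg_id)     = %u\n", static_cast<unsigned>(offsetof(msg_header_t, msg_id)));
  std::printf("offsetof(msg_count)  = %u\n", static_cast<unsigned>(offsetof(msg_header_t, msg_count)));
  std::printf("offsetof(msg_size)   = %u\n", static_cast<unsigned>(offsetof(msg_header_t, msg_size)));
  std::printf("offsetof(msg_chksum) = %u\n", static_cast<unsigned>(offsetof(msg_header_t, msg_chksum)));
}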

Regards, Alex
From your last post:  >>>> endianness isn't an issue when transmitting data from a PC.
Maybe I misunderstood your 11:25PM PDT post.  More specifically:
>>  The sender converts all integer data to the needed endianness - if necessary.
I'm of the impression that the sender - in this case a PC (little endian) - needs to perform the conversion prior to sending to the receiver.

Similarly, the receiver - when it's ready to transmit - needs to convert prior to sending.


>>> Note, when sending structs from one platform to another, you need to take care about
>>> alignment. Different compilers may have different alignments. A struct should be
>>> portable if all double, int64 or struct members have an offset divisible by 8
>>> and all other members sit at a 4-byte offset. You can test this by printing sizeof(my_struct) on both platforms; the results must be equal.

I did, and in this case they're equal.   Here's what - in summary - I've come to learn about this issue:

1. Take a good look at the documentation for your compiler(s). Look especially for things like the alignment of bitfields within a storage unit, the order of allocation of bitfields, and how bitfields that span multiple storage units are handled.

OR

2. For each compiler, create a header and source file containing the structures with the bitfields and some functions to read/write the bitfields (optionally, you could include a couple of functions to convert the structures to/from a buffer of bytes). The bitfields should be declared such that the compiler gives them the same memory layout as the external interface requires.

OR

3. As a fall-back, create a header and source file with a straightforward structure (no bitfields) that can hold the data of the interface, and the same read/write functions as before. In this case, the read/write functions have to perform some bit manipulation to convert the data in the structure into the form required by the interface.


4. Configure the build system such that for known compilers the compiler-specific header and implementation are used, and for unknown compilers the fall-back implementation.

///////////////////
It appears option 3 is my most viable path...  Like all of us, I like to do it right the first time and have a fallback design that I know is sound.  I'm still studying how to achieve my objective though.  (I wish Experts Exchange had a Qt forum, so I could work with an expert - REAL code, like we did before - on this.)
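For what it's worth, a sketch of what the option-3 direction could look like for the t01_msg fields above (the struct and helper names are made up, and big-endian wire order is an assumption):

// plain struct, no bit fields - holds the values in ordinary 16-bit types
struct t01_msg_portable
{
  short xdircosine;   // X Direction Cosine - LSB 2^-15
  short ydircosine;   // Y Direction Cosine - LSB 2^-15
  short zdircosine;   // Z Direction Cosine - LSB 2^-15
};

// write one 16-bit value into a byte buffer, big-endian (assumed wire order)
static void put16(unsigned char *p, unsigned short v)
{
  p[0] = static_cast<unsigned char>(v >> 8);
  p[1] = static_cast<unsigned char>(v & 0xFF);
}

// serialize the message into 'buf' (at least 6 bytes) in the external format
void packT01(const t01_msg_portable &m, unsigned char *buf)
{
  put16(buf + 0, static_cast<unsigned short>(m.xdircosine));
  put16(buf + 2, static_cast<unsigned short>(m.ydircosine));
  put16(buf + 4, static_cast<unsigned short>(m.zdircosine));
}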
Just ran Axter's test and it fell over in a heap.  [(++s)&8] gives either 0 or 8, and 8 is outside the array bounds.   I changed the 8 to 7 and got some strange results on VS.NET 2003.

For chars, there is virtually no difference.
For shorts and integers, vectors outperform arrays
For long long, arrays outperform vectors.
>> Just ran Axter's test and it fell over in a heap

Could you elaborate on this?  "Fell over in a heap"??
cup, by the way, post your results...
"Fell over in a heap" = SEGV and crashitis.

Results (number of iterations = 10000)

char
C_Style_static_2dArray = 50
C_Style_dynamic_2dArray = 100
vec_2dArray op[] = 100
vec_2dArray iterators = 90

short
C_Style_static_2dArray = 60
C_Style_dynamic_2dArray = 181
vec_2dArray op[] = 140
vec_2dArray iterators = 90

long
C_Style_static_2dArray = 50
C_Style_dynamic_2dArray = 100
vec_2dArray op[] = 100
vec_2dArray iterators = 100

long long
C_Style_static_2dArray = 50
C_Style_dynamic_2dArray = 181
vec_2dArray op[] = 190
vec_2dArray iterators = 180

These were done on Visual C++ 2005 Beta.  I got completely different results on Visual C++ 2003, but I cannot rerun that test as I've removed both the compiler and the executables.  Didn't see any point in having four variants (6.0, 2002, 2003, 2005) of Microsoft compilers.  Results are almost the same on Linux and on CodeWarrior.  I've changed Axter's test to

template <class type> void tryit ()
{
      const type SrcBuf[] = {1,2,3,4,5,6,7,8};
      const int QtyTestItera = 10000, BufferSize = 999;
...
replacing all the significant ints with type.  100000 was just too long to wait.  The main looks like

int _tmain(int argc, _TCHAR* argv[])
{
      printf ("char\n"); tryit <char>();
      printf ("short\n"); tryit <short>();
      printf ("long\n"); tryit <long>();
      printf ("long long\n"); tryit <long long>();
     system("pause");

     return 0;
}

Basically, iterators are faster if you are using chars and shorts.  Once you get bigger than an int, the performance is almost the same.  Using memory from the stack is somehow always faster than that from the heap.