Pointer cast

Consider the source snippet below. The line 'Type  *pFoo_ = *(Type **)buf;' produces a core dump ("encountered a problem and needs to close") on my box.

It's unclear to me why or when 'Type  *pFoo_ = *(Type **)buf;' would be valid.  I tried a contrived example:

 // unsigned char buff [ 100 ] [ 200 ];
  unsigned char **buff = new unsigned char* [ 300 ] ;
  Type  *pFoo_ = *(Type **)buff;
  if ( pFoo_ ) {
    int xx = 15 ;
    pFoo_->abc = 15 ;
    std::cout <<  pFoo_->abc << std::endl;
  }

The code still bombed, hence I'm confused about how the code above (albeit with a contrived 'Type') works in the source code I'm looking at.  Clarity appreciated, thanks.


#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>

int main() {

  struct Type {
    int abc ; 
  };

  unsigned char *buf = new unsigned char [ 100 ] ;
  Type  *pFoo = (Type *)buf; 
  if ( pFoo ) {
    pFoo->abc = 15 ;
    std::cout <<  pFoo->abc << std::endl; 
  }
  std::cout << ".... " << std::endl; 

  Type  *pFoo_ = *(Type **)buf; 
  if ( pFoo_ ) {
    int xx = 15 ; 
    pFoo_->abc = 15 ;
    std::cout <<  pFoo_->abc << std::endl; 
  }

  return ( EXIT_SUCCESS ) ;
}


forums_mpAsked:
 
rushtoshankarSr Manager Software DevelopmentCommented:
Sorry for the confusion. I slightly misunderstood the question.

The statement "It is simple difference between one dimension vs two dimensional array access" should have been
"It is a simple difference between a pointer and a pointer to a pointer".
Don't confuse a pointer to a pointer with a two-dimensional array.

e.g.
int a, *p = &a, **ptp = &p;
int **two = (int**) malloc(100);
two[0] = (int*) 0x1010;


pFoo_ = *(Type **)ptp is valid,
while
pFoo_ = *(Type **)two is always invalid, because of what happens when you then do pFoo_->abc.

Take the first statement:
pFoo_ = *(Type **)ptp  ===> pFoo_ = *ptp (ptp is now treated as a Type **)
===> pFoo_ = p (which is treated as a Type *)

pFoo_ now points to the address held in p, which is the address of a.
pFoo_->abc = 10 alters the value of a, because abc and a have the same address.

Now look at the second statement:
pFoo_ = *(Type **)two  ===> pFoo_ = *two (two is now treated as a Type **)
===> pFoo_ = 0x1010 (which is treated as a Type *)

pFoo_ now points to 0x1010.
The statement pFoo_->abc = 10 tries to alter the value at 0x1010, which is not a valid address.
 
rushtoshankarSr Manager Software DevelopmentCommented:
It is simple difference between one dimension vs two dimensional array access

--- Type  *pFoo = (Type *)buf; ----
In this case, pFoo points to the same place that buf points to.
e.g. the address of buf is 0x1000 and the address of pFoo is 0x2000.
The value at 0x1000 is 0x11000 (the value stored in buf).
The value at 0x2000 is also 0x11000 (the value stored in pFoo).

So buf points to 0x11000, and so does pFoo.

--- Type  *pFoo_ = *(Type **)buf; ----  (for clarity, read the right-hand side as *((Type **)buf): cast first, then dereference)
e.g. the address of buf is 0x1000 and the address of pFoo_ is 0x2000.
The value at 0x1000 is 0x11000 (the value stored in buf).
The data at 0x11000 is 0x1234 (this is the value at the location pointed to by buf).

The statement stores that data in pFoo_,
so the value at 0x2000 is 0x1234 and not 0x11000.

Hope this helps. If you still have any doubt, we are here to help :)

 
tampnicCommented:
The code doesn't work as written because the cast treats buf as a doubly-indirected pointer, when buf only has one level of indirection. The cast is applied to buf first, and the result of the cast is then dereferenced.

You need to say  'Type  *pFoo_ = *(Type **)&buf; '
 
&buf is doubly-indirected, it gets dereferenced to Type*, then the assignment occurs correctly.

Cheers,
  Chris


#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>

int main() {

  struct Type {
    int abc ; 
  };

  unsigned char *buf = new unsigned char [ 100 ] ;
  Type  *pFoo = (Type *)buf; 
  if ( pFoo ) {
    pFoo->abc = 15 ;
    std::cout <<  pFoo->abc << std::endl; 
  }
  std::cout << ".... " << std::endl; 

  Type  *pFoo_ = *(Type **)&buf; 
  if ( pFoo_ ) {
    int xx = 15 ; 
    pFoo_->abc = 15 ;
    std::cout <<  pFoo_->abc << std::endl; 
  }

  return ( EXIT_SUCCESS ) ;
}


 
tampnicCommented:
To clarify my post ...

*(Type**) ... will pass compiler type-checking as it boils down to a Type*.

When you make the assignment "Type  *pFoo_ = *(Type **)buf;" the compiler understands this as "make pFoo_ equal to the contents of the Type double-pointer buf". However "buf" holds the value of a pointer to Type, not a double-pointer, so you have to assign the address of buf (&buf) for this to work correctly.

Cheers,
  Chris
 
 
tampnicCommented:
rushtoshankar: message 37006562 simply restates my solution in 37006529 with a bit of example code. Maybe it's a language thing? A "doubly-indirected pointer", sometimes shortened to "double-pointer", means exactly the same as "pointer-to-pointer".

Cheers,
  Chris
 
rushtoshankarSr Manager Software DevelopmentCommented:
I accept that my statement restates your solution.
My comment just has the detailed steps to make the things clear to understand. That is all.
Actually, I didn't  refresh this page when i submitted my comment.

There is a small difference between the terms "double pointer" and "pointer to pointer" (depending on the context).
"Pointer to pointer" can only refer to a pointer that can be dereferenced twice to reach the value in memory.
"Double pointer" can be used for that scenario as well as in a two-dimensional array context.
"Pointer to pointer" carries a slightly different meaning when you use it to refer to a two-dimensional array.
It is like the difference between a square and a rectangle.
 
evilrixSenior Software Engineer (Avast)Commented:
>> It's unclear to me why or when 'Type  *pFoo_ = *(Type **)buf; would be valid.

In your example, not only are you casting from a char pointer to a Type pointer, you are also changing the level of indirection as part of that cast. According to the C++ standard, the result of casting from one pointer type to another is unspecified, so it may or may not do what you want, and using the resulting pointer may also (as you've just seen) cause your application to crash.

Specifically, the standard states:

"A pointer to an object can be explicitly converted to a pointer to an object of different type. Except that
converting an rvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types
and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type
yields the original pointer value, the result of such a pointer conversion is unspecified."

In other words, the only safe thing you can do when casting a pointer to a different type is to cast it back to the original type. Anything else you do with this pointer will result in unspecified behaviour.

Unspecified behavior: "behavior, for a well-formed program construct and correct data, that depends on the implementation. The implementation is not required to document which behavior occurs."

There are also issues that relate to strict aliasing.

So, to put this another way: unless you really, really understand what you are doing, casting from a char array to a type should be avoided.
 
evilrixSenior Software Engineer (Avast)Commented:
And just to explain the semantics of what your snippet example does...

You are creating an array of pointers to char (none of which point to valid memory). You are then casting the pointer to that array to be a pointer to a pointer to Type and dereferencing it to get a pointer to Type. That pointer is just whatever uninitialised value was sitting in the first array element. You then try to access member abc through it... so you are now dereferencing an address that is completely invalid and writing to whatever happens to be there.

If this were ever to work it would be by pure chance and certainly not design.

Your second full code example differs slightly in that you have an array of chars and not pointers to char, but otherwise the problem is exactly the same: you are treating uninitialised memory as if it held a valid pointer.

FWIW, if you really want to do something like this you can use placement new (which I am guessing is what you were trying to do?). This works because it actually initialises the memory properly.

It's worth reading what C++Lite has to say about this though as there are caveats to be aware of.
#include <iostream>
#include <new>   // placement new

struct Type {
   int abc ; 
};

int main()
{
   unsigned char *buff = new unsigned char [ 300 ] ;
   Type  *pFoo_ = new (buff) Type; // 'creates' Type in buff array (the first sizeof(Type) bytes are used!)
   if ( pFoo_ ) {
      int xx = 15 ; 
      pFoo_->abc = 15 ;
      std::cout <<  pFoo_->abc << std::endl; 
   }

   pFoo_->~Type(); // we have to explicitly call the destructor.

   delete [] buff; // delete the buffer
}

 
tampnicCommented:
Evilrix: The way I understand the piece of the standard you quoted, applied to this specific instance, is that unspecified behaviour occurs if one tried to access Type  *pFoo_ when cast back to the original type i.e. (unsigned char*)pFoo_. The poster's original snippet didn't attempt that, so the compiler can produce working code (VS10 on Win7 and GCC 4.5.1 on Linux both produce working executables when the original code is amended to include the ampersand I suggested) ... or am I not reading that correctly? (I'm not 100% confident here!)

AFAI can tell, he's allocated memory on the heap with "new" using an array of bytes (in the C standard 'unsigned char' and 'byte' are equivalent I believe) then dumped some Type objects into that memory through a pointer to the beginning of the array. "void *buf = new unsigned char [ 100 ] ;" might be a better declaration, as use of the void pointer hints that you are going to cast other types into the buffer.

IMO it's sloppy not to use stronger typing in the memory allocation, as the code is only dumping one type of variable into the buffer. If there is a true requirement to dump different types into the buffer, maybe a redesign rather than a refactor is necessary to facilitate stronger typing. Loosely typed memory can be very difficult to debug.

Cheers,
  Chris
 
evilrixSenior Software Engineer (Avast)Commented:
>>  or am I not reading that correctly? (I'm not 100% confident here!)

"Except that converting an rvalue of type “pointer to T1” to the type “pointer to T2” and back to its original type
yields the original pointer value, the result of such a pointer conversion is unspecified."

...is pretty clear to me.

The result of using the cast pointer (apart from casting back to the original pointer) is unspecified... that isn't the same as undefined.

Note, unspecified and undefined have very specific definitions in the standard. Unspecified basically means platform/compiler specific, and undefined means the standard places no requirements at all on what the program does.

The cast is unspecified but the example code will result in undefined behaviour regardless because it is trying to dereference uninitialised memory. In other words, there are multiple issues with the examples shown.

As I also noted, this cast breaks the rules of strict aliasing. Only the following are well defined as far as the standard is concerned:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
* a type compatible with the effective type of the object,
* a qualified version of a type compatible with the effective type of the object,
* a type that is the signed or unsigned type corresponding to the effective type of the object,
* a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
* an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
* a character type.

Anything else is either unspecified or undefined.

>> then dumped some Type objects into that memory
Well, that's not what the code shows... all I see is a pointer cast of uninitialised memory (allocated, but as yet uninitialised) and then an attempt to use it in a different type context. That is always going to end in tears.

>> so the compiler can produce working code
Well, yes, the compiler will produce an executable, but what that executable does is undefined because it is using memory that was never initialised as a Type.

>> in the C standard 'unsigned char' and 'byte' are equivalent I believe
Not if by 'byte' you mean exactly 8 bits (this question is C++, so it is that standard that applies, but regarding the definition of a char the C and C++ standards are aligned). A char is defined as being "large enough to store any member of the implementation’s basic character set", and sizeof(char) is 1 by definition. Although it is most common for an unsigned char to be 8 bits, that width is not prescribed by the standard.

But, apart from all that -- the code examples are flawed because at no point is the memory in buff ever initialised to be of type Type. Using placement new will do this. If Type is a POD (Plain Old Data) type then an existing object may also be binary copied into buff using memcpy.



 
tampnicCommented:
>>Well, that's not what the code shows... all I see is a pointer cast of uninitialised memory (allocated, but as yet uninitialised) and then an attempt to use it in a different type context. That is always going to end in tears.
>>But, apart from all that -- the code examples are flawed because at no point is the memory in buff ever initialised to be of type Type.

This is clearer to me now due to your emphasis on initialisation - thanks.

I was thinking about general use of void pointers returned by memory allocation functions and casting them to the appropriate type later, something I've seen done (bug fixing wasn't easy because it wasn't always obvious what type of object was in the buffer). The proper practice is to cast to the correct type in the *right hand side* of the assignment, so the rvalue is typed appropriately. The rvalue in the original code assignment shouldn't have been recast. As you said, the dereference of "buf" is unspecified. Alles klar!
 
/* UNSPECIFIED */
  void *buf = malloc( 100 ) ;
  Type  *pFoo = (Type *)buf; 
  if ( pFoo ) {
    pFoo->abc = 15 ;
    std::cout <<  pFoo->abc << std::endl; 
  }
  Type  *pFoo_ = *(Type **)&buf; // dereference of (Type **)&buf is unspecified behaviour
  etc etc

/* BETTER */
  Type *buf = (Type *)malloc( sizeof(Type) ) ; 
  Type  *pFoo = buf; 
  if ( pFoo ) {
    pFoo->abc = 15 ;
    std::cout <<  pFoo->abc << std::endl; 
  }

  Type  *pFoo_ = *(Type **)&buf; // buf is a Type* so dereference of &buf is OK
etc etc



>>>> in the C standard 'unsigned char' and 'byte' are equivalent I believe
>><snip>A char is defined as being "large enough to store any member of the implementation’s basic character set".<snip>
From wikipedia .... "The C and C++ programming languages, for example, define byte as an "addressable unit of data storage large enough to hold any member of the basic character set of the execution environment" (clause 3.6 of the C standard)."

... so 'byte' and 'char' look like they are aligned in their definition, depending on your trust of wikipedia :-) I can't get a free copy of the standard to check this directly.

Cheers,
  Chris
 
evilrixSenior Software Engineer (Avast)Commented:
>> This is clearer to me now due to your emphasis on initialisation - thanks.
Any time.

>> addressable unit of data storage large enough to hold any member of the basic character set
Large enough only defines a minimum size. There is no reason a char, for example, couldn't be 16 bits.
http://www.parashift.com/c++-faq-lite/intrinsic-types.html#faq-26.4
 
evilrixSenior Software Engineer (Avast)Commented:
Sorry, I forgot to address this...

>> I was thinking about general use of void pointers returned by memory allocation functions

The malloc function is a special case because it returns a void * which has no defined type and is guaranteed to return memory of the correct alignment to be suitable for casting to another type.

More specifically: "The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object in the space allocated".

It should be noted that malloc is a C and NOT C++ memory allocator and, as such, the result is only valid for casting to POD types. Attempting, for example, to cast the memory to a class that has a constructor or virtual functions, will result in undefined (and probably fatal) behaviour. Of course, you can use placement new to create objects in memory allocated by malloc but, again, there are issues with data alignment that make this problematic.

NB. I find that when working with C++ it is generally better (and safer) not to think of it as a superset of C because, although much of the syntax is shared, the semantics most definitely are not.
 
tampnicCommented:
Excellent in-depth knowledge Ricky - thanks for the enlightenment.

I'm off to write some templates and research design patterns in the ongoing rewrite of "struct brain" into "class brain_plusplus" :-)

Cheers,
  Chris
 
forums_mpAuthor Commented:
I was torn on the best solution.  The individuals who responded provided clear and concise arguments - to include examples which are always invaluable.
Question has a verified solution.
