Solved

Pointer cast

Posted on 2011-10-21
Last Modified: 2012-05-12
Consider the source snippet below. The line 'Type  *pFoo_ = *(Type **)buf;' produces a core dump ("encountered a problem and needs to close") on my box.

It's unclear to me why or when 'Type  *pFoo_ = *(Type **)buf;' would be valid. I tried a contrived example:

 // unsigned char buff [ 100 ] [ 200 ];
  unsigned char **buff = new unsigned char* [ 300 ] ;
  Type  *pFoo_ = *(Type **)buff;
  if ( pFoo_ ) {
    int xx = 15 ;
    pFoo_->abc = 15 ;
    std::cout <<  pFoo_->abc << std::endl;
  }

The code still bombed, so I'm confused about how the code above works in the source code I'm looking at (albeit with a contrived 'Type' here). Clarity appreciated, thanks.


#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>

int main() {

  struct Type {
    int abc ; 
  };

  unsigned char *buf = new unsigned char [ 100 ] ;
  Type  *pFoo = (Type *)buf; 
  if ( pFoo ) {
    pFoo->abc = 15 ;
    std::cout <<  pFoo->abc << std::endl; 
  }
  std::cout << ".... " << std::endl; 

  Type  *pFoo_ = *(Type **)buf; 
  if ( pFoo_ ) {
    int xx = 15 ; 
    pFoo_->abc = 15 ;
    std::cout <<  pFoo_->abc << std::endl; 
  }

  return ( EXIT_SUCCESS ) ;
}

Question by:forums_mp
15 Comments
 
LVL 6

Expert Comment

by:rushtoshankar
ID: 37006334
It is simple difference between one dimension vs two dimensional array access

--- Type  *pFoo = (Type *)buf; ----
In this case, pFoo points to the same place that buf points to.
e.g. the address of buf is 0x1000 and the address of pFoo is 0x2000.
The value at 0x1000 is 0x11000 (the value stored in buf).
The value at 0x2000 is also 0x11000 (the value stored in pFoo).

So buf points to 0x11000, and so does pFoo.

--- Type  *pFoo_ = *(Type **)buf; ----  (for clarity, I rewrite the statement as  pFoo_ = *((Type **)buf);)
e.g. the address of buf is 0x1000 and the address of pFoo_ is 0x2000.
The value at 0x1000 is 0x11000 (the value stored in buf).
The data at 0x11000 is 0x1234 (this is the value at the location pointed to by buf).

The statement stores that data in pFoo_,
so the value at 0x2000 is 0x1234 and not 0x11000.

Hope this helps. If you still have any doubt, we are here to help :)

 
LVL 7

Expert Comment

by:tampnic
ID: 37006370
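The code doesn't work as written because the cast treats buf as a doubly-indirected pointer. '(Type **)buf' says that the address held in buf is the address of a 'Type *', so the dereference reads the first pointer-sized bytes of the (uninitialised) buffer and uses them as a pointer.

You need to say  'Type  *pFoo_ = *(Type **)&buf; '
 
&buf is doubly-indirected: it is the address of a pointer. It gets dereferenced to a Type*, which is the value of buf itself, and then the assignment occurs correctly.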

Cheers,
  Chris


#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>

int main() {

  struct Type {
    int abc ; 
  };

  unsigned char *buf = new unsigned char [ 100 ] ;
  Type  *pFoo = (Type *)buf; 
  if ( pFoo ) {
    pFoo->abc = 15 ;
    std::cout <<  pFoo->abc << std::endl; 
  }
  std::cout << ".... " << std::endl; 

  Type  *pFoo_ = *(Type **)&buf; 
  if ( pFoo_ ) {
    int xx = 15 ; 
    pFoo_->abc = 15 ;
    std::cout <<  pFoo_->abc << std::endl; 
  }

  return ( EXIT_SUCCESS ) ;
}

 
LVL 7

Expert Comment

by:tampnic
ID: 37006529
To clarify my post ...

*(Type**) ... will pass compiler type-checking as it boils down to a Type*.

When you make the assignment "Type  *pFoo_ = *(Type **)buf;" the compiler understands this as "treat the value of buf as the address of a Type pointer, and make pFoo_ equal to the pointer stored there". However, "buf" holds the address of the data itself, not the address of a pointer, so you have to cast the address of buf (&buf) for this to work correctly.

Cheers,
  Chris
 

 
LVL 6

Accepted Solution

by:
rushtoshankar earned 100 total points
ID: 37006562
Sorry for the confusion. I slightly misunderstood the question.

The statement "It is simple difference between one dimension vs two dimensional array access" should have been
"It is simple difference between pointer vs pointer to pointer".
Don't confuse a pointer to pointer with a two-dimensional array.

e.g.
int a, *p = &a, **ptp = &p;
int **two = (int **) malloc(100);
two[0] = (int *) 0x1010;


pFoo_ = *(Type **)ptp is valid
while
pFoo_ = *(Type **)two is invalid, because of what happens when you then do pFoo_->abc.

In the first statement,
pFoo_ = *(Type **)ptp  ===> pFoo_ = *ptp (ptp is now treated as a Type **)
===> pFoo_ = p (which is treated as a Type *)

pFoo_ now points to the address held in p, which is the address of a.
pFoo_->abc = 10 alters the value of a, because abc and a have the same address.

In the second statement,
pFoo_ = *(Type **)two  ===> pFoo_ = *two (two is now treated as a Type **)
===> pFoo_ = 0x1010 (which is treated as a Type *)

pFoo_ now points to 0x1010.
The statement pFoo_->abc = 10 tries to alter the value at 0x1010, which is not a valid address.
 
LVL 7

Expert Comment

by:tampnic
ID: 37007114
rushtoshankar: message 37006562 simply restates my solution in 37006529 with a bit of example code. Maybe it's a language thing? A "doubly-indirected pointer", sometimes shortened to "double-pointer", means exactly the same as "pointer-to-pointer".

Cheers,
  Chris
 
LVL 6

Expert Comment

by:rushtoshankar
ID: 37007198
I accept that my comment restates your solution.
It just spells out the detailed steps to make things easier to understand. That is all.
Actually, I hadn't refreshed this page when I submitted my comment.

There is a small difference between the terms "double pointer" and "pointer to pointer" (depending on the context).
"Pointer to pointer" can only refer to a pointer that is dereferenced twice to reach the value in memory.
"Double pointer" can be used both in that scenario and in the context of a two-dimensional array.
"Pointer to pointer" has a slightly different meaning when you use it to refer to a two-dimensional array.
It is like the difference between a square and a rectangle.
 
LVL 40

Expert Comment

by:evilrix
ID: 37007996
>> It's unclear to me why or when 'Type  *pFoo_ = *(Type **)buf; would be valid.

In your example, not only are you casting from a char pointer to a Type pointer, but you are also changing the level of indirection as part of that cast. The result of this is unspecified according to the C++ standard (meaning the outcome depends on the implementation). The result of casting from one pointer type to another and then trying to use that pointer is unspecified. So it may or may not do what you want, and it may also (as you've just seen) cause your application to crash.

Specifically, the standard states:

"A pointer to an object can be explicitly converted to a pointer to an object of different type. Except that
converting an rvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types
and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type
yields the original pointer value, the result of such a pointer conversion is unspecified."

In other words, the only safe thing you can do when casting a pointer to a different type is to cast it back to the original type. Anything else you do with this pointer will result in unspecified behaviour.

Unspecified behavior: "behavior, for a well-formed program construct and correct data, that depends on the implementation. The implementation is not required to document which behavior occurs."

There are also issues that relate to strict aliasing.

So, to put this another way: unless you really, really understand what you are doing, casting from a char array to another type should be avoided.
 
LVL 40

Expert Comment

by:evilrix
ID: 37008138
And just to explain the semantics of what your snippet example does...

You are creating an array of pointers to char (none of which point to valid memory). You are then casting a pointer to that array of pointers to char to a pointer to a pointer to Type, and dereferencing it to get a pointer to Type. You then try to access member abc, so you are now writing through a pointer whose value is whatever happened to be in that uninitialised memory.

If this were ever to work, it would be by pure chance and certainly not by design.

Your second full code example differs slightly in that you have an array of chars and not pointers to char, but otherwise the problem is exactly the same: you are treating uninitialised memory as a pointer and dereferencing it.

FWIW, if you really want to do something like this you can use placement new (which I am guessing is what you were trying to do?). This works because it actually initialises the memory properly.

It's worth reading what the C++ FAQ Lite has to say about this, though, as there are caveats to be aware of.
#include <iostream>
#include <new>      // placement new

struct Type {
   int abc ; 
};

int main()
{
   unsigned char *buff = new unsigned char [ 300 ] ;
   Type  *pFoo_ = new (buff) Type; // 'creates' Type in buff array (the first sizeof(Type) bytes are used!)
   if ( pFoo_ ) {
      int xx = 15 ; 
      pFoo_->abc = 15 ;
      std::cout <<  pFoo_->abc << std::endl; 
   }

   pFoo_->~Type(); // we have to explicitly call the destructor.

   delete [] buff; // delete the buffer
}

 
LVL 7

Expert Comment

by:tampnic
ID: 37010857
Evilrix: The way I understand the piece of the standard you quoted, applied to this specific instance, is that unspecified behaviour occurs if one tries to access Type  *pFoo_ when cast back to the original type, i.e. (unsigned char*)pFoo_. The poster's original snippet didn't attempt that, so the compiler can produce working code (VS10 on Win7 and GCC 4.5.1 on Linux both produce working executables when the original code is amended to include the ampersand I suggested) ... or am I not reading that correctly? (I'm not 100% confident here!)

AFAICT, he's allocated memory on the heap with "new" using an array of bytes (in the C standard 'unsigned char' and 'byte' are equivalent I believe) then dumped some Type objects into that memory through a pointer to the beginning of the array. "void *buf = new unsigned char [ 100 ] ;" might be a better declaration, as use of the void pointer hints that you are going to cast other types into the buffer.

IMO it's sloppy not to use stronger typing in the memory allocation, as the code is only dumping one type of variable into the buffer. If there is a true requirement to dump different types into the buffer, maybe a redesign rather than a refactor is necessary to facilitate stronger typing. Loosely typed memory can be very difficult to debug.

Cheers,
  Chris
 
LVL 40

Assisted Solution

by:evilrix
evilrix earned 100 total points
ID: 37010953
>>  or am I not reading that correctly? (I'm not 100% confident here!)

"Except that converting an rvalue of type “pointer to T1” to the type “pointer to T2” and back to its original type
yields the original pointer value, the result of such a pointer conversion is unspecified."

...is pretty clear to me.

The result of using the cast pointer (apart from casting back to the original pointer) is unspecified... that isn't the same as undefined.

Note, unspecified and undefined have very specific definitions in the standard. Unspecified basically means the result depends on the platform/compiler, and undefined means the standard imposes no requirements at all on what the program does.

The cast is unspecified but the example code will result in undefined behaviour regardless because it is trying to dereference uninitialised memory. In other words, there are multiple issues with the examples shown.

As I also noted, this cast breaks the rules of strict aliasing. Only the following are well defined as far as the standard is concerned:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
* a type compatible with the effective type of the object,
* a qualified version of a type compatible with the effective type of the object,
* a type that is the signed or unsigned type corresponding to the effective type of the object,
* a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
* an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
* a character type.

Anything else is either unspecified or undefined.

>> then dumped some Type objects into that memory
Well, that's not what the code shows... all I see is a pointer cast of uninitialised memory (allocated, but as yet uninitialised) and then an attempt to use it in a different type context. That is always going to end in tears.

>> so the compiler can produce working code
Well, yes, the compiler will produce an executable, but what that executable does is undefined because it is using uninitialised memory as if it held a Type.

>> in the C standard 'unsigned char' and 'byte' are equivalent I believe
No, they are not (although this question is C++ so it is that standard that applies but regarding the definition of a char the C and C++ standards are aligned). A char is defined as being "large enough to store any member of the implementation’s basic character set". That is the only definition the standard gives. Although it is mostly common for an unsigned char to be 8 bits it is not prescribed by the standard.

But, apart from all that -- the code examples are flawed because at no point is the memory in buff ever initialised to be of type Type. Using placement new will do this. If Type is a POD (Plain Old Data) type then an existing object may also be binary copied into buff using memcpy.



 
LVL 7

Expert Comment

by:tampnic
ID: 37011058
>>Well, that's not what the code shows... all I see is a pointer cast of uninitialised memory (allocated, but as yet uninitialised) and then an attempt to use it in a different type context. That is always going to end in tears.
>>But, apart from all that -- the code examples are flawed because at no point is the memory in buff ever initialised to be of type Type.

This is clearer to me now due to your emphasis on initialisation - thanks.

I was thinking about general use of void pointers returned by memory allocation functions and casting them to the appropriate type later, something I've seen done (bug fixing wasn't easy because it wasn't always obvious what type of object was in the buffer). The proper practice is to cast to the correct type in the *right hand side* of the assignment, so the rvalue is typed appropriately. The rvalue in the original code assignment shouldn't have been recast. As you said, the dereference of "buf" is unspecified. Alles klar!
 
/* UNSPECIFIED */
  void *buf = malloc( 100 ) ;
  Type  *pFoo = (Type *)buf; 
  if ( pFoo ) {
    pFoo->abc = 15 ;
    std::cout <<  pFoo->abc << std::endl; 
  }
  Type  *pFoo_ = *(Type **)&buf; // dereference of (Type **)&buf is unspecified behaviour
  etc etc

/* BETTER */
  Type *buf = (Type *)malloc( sizeof(Type) ) ; 
  Type  *pFoo = buf; 
  if ( pFoo ) {
    pFoo->abc = 15 ;
    std::cout <<  pFoo->abc << std::endl; 
  }

  Type  *pFoo_ = *(Type **)&buf; // buf is a Type* so dereference of &buf is OK
etc etc



>>>> in the C standard 'unsigned char' and 'byte' are equivalent I believe
>><snip>A char is defined as being "large enough to store any member of the implementation’s basic character set".<snip>
From wikipedia .... "The C and C++ programming languages, for example, define byte as an "addressable unit of data storage large enough to hold any member of the basic character set of the execution environment" (clause 3.6 of the C standard)."

... so 'byte' and 'char' look like they are aligned in their definition, depending on your trust of wikipedia :-) I can't get a free copy of the standard to check this directly.

Cheers,
  Chris
 
LVL 40

Expert Comment

by:evilrix
ID: 37011372
>> This is clearer to me now due to your emphasis on initialisation - thanks.
Any time.

>> addressable unit of data storage large enough to hold any member of the basic character set
Large enough only defines a minimum size. There is no reason a char, for example, couldn't be 16 bits.
http://www.parashift.com/c++-faq-lite/intrinsic-types.html#faq-26.4
 
LVL 40

Expert Comment

by:evilrix
ID: 37011416
Sorry, I forgot to address this...

>> I was thinking about general use of void pointers returned by memory allocation functions

The malloc function is a special case because it returns a void * which has no defined type and is guaranteed to return memory of the correct alignment to be suitable for casting to another type.

More specifically: "The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object in the space allocated".

It should be noted that malloc is a C and NOT C++ memory allocator and, as such, the result is only valid for casting to POD types. Attempting, for example, to cast the memory to a class that has a constructor or virtual functions, will result in undefined (and probably fatal) behaviour. Of course, you can use placement new to create objects in memory allocated by malloc but, again, there are issues with data alignment that make this problematic.

NB. I find that when working with C++ it is generally better (and safer) not to think of it as a superset of C because, although the syntax largely is, the semantics are most definitely not.
 
LVL 7

Expert Comment

by:tampnic
ID: 37011484
Excellent in-depth knowledge Ricky - thanks for the enlightenment.

I'm off to write some templates and research design patterns in the ongoing rewrite of "struct brain" into "class brain_plusplus" :-)

Cheers,
  Chris
 

Author Closing Comment

by:forums_mp
ID: 37041796
I was torn on the best solution.  The individuals who responded provided clear and concise arguments - to include examples which are always invaluable.