reinterpret_cast on an array for initialising a structure!

Ah hello.

Please consider the following code snippet:

#include <cstdio>

short * getData()
{
	static short data[] = { -1, 99, 1234, 768 };
	return data;
}

struct SimpleStructure
{
	int m_n1, m_n2;
};

int main( void )
{
	SimpleStructure* ss = reinterpret_cast<SimpleStructure*>(getData());

	printf("%i %i", ss->m_n1, ss->m_n2);
}



After running the program, I see the values

6553599 50332882

printed.

When I change array returned by getData() to be an array of integers, I see

-1 99

printed: I get the general principle - the memory locations occupied by the array are "initialising" the members of the structure.  But I cannot see why we get the different results when we use the short type instead.  It is something to do with how the bit pattern in the memory is being interpreted, but I cannot put my finger on the exact reason.

Can someone explain please?

Thanks in advance.
mrwad99 asked:
phoffric commented:
>> No, it's got nothing to do with endianness
>> how a computer populates a value within a given datatype
The population of shorts on the stack is:

    -1        99        1234      768       // decimal shorts
    0xFFFF    0x0063    0x04D2    0x0300    // hex shorts

The memory layout is FFFF0063 04D20300 (broken into two 32-bit words).
On a little-endian platform, the least significant byte sits at the lowest address; in this case the 32-bit pattern FFFF0063 is interpreted as the integer 0x0063FFFF.

On a big-endian platform, the same 32-bit pattern is interpreted as the integer 0xFFFF0063. jkr is correct in that the results depend on endianness.

(Note that on either platform, an int or a short is a signed value; in this case, on a big-endian platform, the interpreted 0xFFFF0063 result is negative.)

jkr commented:
>> It is something to do with how the bit pattern in the memory is being interpreted, but I
>> cannot put my finger on the exact reason.

Byte ordering: http://en.wikipedia.org/wiki/Endianness - two consecutive shorts in memory, interpreted as a single integer, give a completely different value depending on the byte order and the memory layout. Adding that the result would be different again on a non-Intel CPU will probably not give much comfort.

mrwad99 (Author) commented:
Um....

OK.

As a WORD (same size as short) in Windows calculator,

-1 is 1111111111111111
99 is 0000000001100011

When I change calc from the binary equivalent of -1 back to decimal, I get 65535.

The value in the struct is 6553599: it seems that the "99" has just been stuck on the end of 65535.  Coincidence?  Probably.

Assuming my memory layout is 11111111111111110000000001100011, how can I apply the endianness to get my answer of 6553599?

Sorry, I know this might be a really deep question - I didn't realise I had allocated 500points for it, but I guess it is going to be worth it :)

phoffric commented:
Hello mrwad99,

Here's a transformation that may interest you. Using the calculator, I have converted the short decimal values to hex values, and did the same thing for your results. There is a nice pattern that shows up:

INPUT:
    -1    99    1234   768     // decimal
    FF    63    4D2    300     // hex

OUTPUT:
    6553599    50332882        // decimal
    63FFFF     30004D2         // hex

mrwad99 (Author) commented:
Thanks phoffric.

OK, I think I see a bit of a pattern: it seems the hex values have had their order swapped and then been combined.

What I really want to understand though is how the memory layout of the two short values gets interpreted as the integer value.  jkr has mentioned endianness (thanks jkr) but there are still some pieces of the puzzle missing...can you help with this?

Orcbighter commented:
It is quite simple!
You have declared an array of shorts and, from a programming point of view, these addresses are contiguous.
You then cast it to a struct with, from a programming point of view, two contiguous integers.
When you cast the result from getData() into the SimpleStructure struct, you effectively moved the first short into the bottom half of the first integer, and the second short into the top half of the first integer. The same happened with the second pair of shorts.
So now you are reading an integer whose top word has been set to 0x0063 and whose bottom word has been set to 0xFFFF, giving a result of 0x0063FFFF, or 6553599 in decimal.

mrwad99 (Author) commented:
Thanks for participating Orcbighter :)

I see what you're saying, but...

1) Assuming that each short is 2 bytes, 16 bits, we have, as phoffric states

0xFF and 0x63

Each of these is only 1 byte, so we need to pad out with zeros.

Hence 0xFF becomes 0x00FF

How do you get 0xFFFF - that is two extra Fs!

2) Assuming 0x63FFFF is correct, why has the high order word taken the value of the *second* short, in other words, why is the result not 0xFFFF63?  Is this something to do with endianness?

Thanks.

Orcbighter commented:
No, it's got nothing to do with endianness and everything to do with alignment and how a computer populates a value within a given datatype!
A short is two bytes. If you assign it the value -1 in decimal, the short will contain the value 0xFFFF, not 0xFF.
Look in the debugger and examine the values of data; turn on the hex display and you will see.

mrwad99 (Author) commented:
Yes, I am sorry: a single hex digit is 4 bits, not 8: therefore 4 hex digits make up 16 bits (2 bytes).  My mistake :)

I would like to ask for some clarification, if possible, on what you mean by alignment.  Assuming the structure contained the values...

0xFFFF 0x0063

...I thought that treating this series of 32 bits as one 32 bit integer would result in

0xFFFF0063

In other words, we simply remove the "space" between the values.  Why are we swapping the order, i.e.

0x0063 0xFFFF

...before combining them to give the answer of 0x0063FFFF

?


Orcbighter commented:
As stated by phoffric, it has nothing to do with endianness. It has to do with how a computer loads up the address space of a given data type, which can be thought of as right-to-left.
thus:
An integer is 32 bits long (on a 32 bit OS), and a short is 16 bits long, shown thus

|15|14|13|12|11|10|09|08|07|06|05|04|03|02|01|00|  -> short

|31|30|29|28|27|26|25|24|23|22|21|20|19|18|17|16|
|15|14|13|12|11|10|09|08|07|06|05|04|03|02|01|00|  -> integer

|15|14|13|12|11|10|09|08|07|06|05|04|03|02|01|00|
|15|14|13|12|11|10|09|08|07|06|05|04|03|02|01|00|  -> 2 shorts

Now,
 1 in binary is 0000000000000001
-1 in binary is 1111111111111111

Now, load -1 into the first 16 bits of the longword (integer), the rightmost 16 bits.
Then load 99 into the next 16 bits of the longword, because that's the way the computer does it:
00000000011000111111111111111111
and when you look at that address space as a decimal you get 6553599, and as a hex value, 63FFFF.

Your misunderstanding is thinking that the loading of the integer happens left-to-right.

phoffric commented:
Orcbighter wrote:
>> As stated by phoffric, it has nothing to do with endianness.
    It is ok to disagree and then we sort things out, but I was very clear that it has everything to do with endianness. So, please do not misquote me.

I wrote:
>> "jkr is correct in that the results depend on endianness."
    I do not see how my statement could be misinterpreted by Orcbighter.

Orcbighter wrote:
>> "Your misunderstanding is thinking that the loading of the integer happens left-to-right."
    In the OP, there is no loading of int values; there is only loading of shorts, and the -1 is going to be at the lowest address of the data array regardless of the endianness. For the given memory layout that occurs when loading the short data array (which does not change in the main program), it is the endianness, as both jkr and I are saying, that gives the different values to an int.

     When looking at an int in the debugger, you will see what appears to be a swapping (if little endian), but this is the system understanding how to interpret the byte ordering. There is no physical swap of the bytes in memory; it is just how it is presented by the debugger and the program.

     The reinterpret_cast is not loading anything. It facilitates the reinterpretation of the memory layout; it just tells the compiler, in this case, to accept the code, with the programmer accepting the consequences.

mrwad99 (Author) commented:
phoffric - right, OK.  I have gone over all the comments, read a simple article on endianness (http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Data/endian.html) and think I understand what is happening.  Thanks.

On the understanding that what you are saying about low/high order bytes is correct, and from the article linked above quoting

"For example, DEC and IBMs(?) are little endian, while Motorolas and Suns are big endian"

I compiled/ran my little program on a Sun Solaris 8 machine, and got the output

-65437 80872192

Now, according to Windows calculator, 80872192 *is* 04D20300 (which is great), but -65437 *is not* FFFF0063 : FFFF0063 is 4294901859.

Any ideas on the discrepancy here?

Thanks again.

phoffric commented:
>> but -65437 *is not* FFFF0063 : FFFF0063 is 4294901859.
      On my Windows calculator, -65437 is FFFFFFFFFFFF0063: a 64-bit integer, with the extra F's simply being a sign extension. On my calculator, a value shorter than 64 bits has its msb equal to 0 (viewed as a 64-bit integer), so the integer is positive. But if your computer has a 32-bit word, then FFFF0063 is negative, not positive.

     A 32-bit int, FFFF0063, corresponds to -65437.
     A 64-bit int, FFFFFFFFFFFF0063, also corresponds to -65437.

mrwad99 (Author) commented:
Thanks for coming back phoffric.

I am confused.

1) In calculator, in decimal mode I enter -65437.  I then change the display to Hex, and the value changes to FFFF0063.  I then change to hex mode, enter FFFF0063 then change to decimal mode.  I get the answer 4294901859!  Why is this??

2)
>> On my windows calculator, -65437 is FFFFFFFFFFFF0063, which is a 64-bit integer with the extra F's simply being a sign-extension. On my calculator, anything less than 64 bits means that the msb is 0 (for a 64-bit integer) and so the integer is positive

I get that too; but only if QWORD is selected in Hex mode.  Can you explain why we have added an extra 8 Fs on when in QWORD mode?  Why aren't they zeros?

3)
>> But, if your computer has a 32-bit word, then FFFF0063 is negative and not positive.

I thought words were two bytes - 16 bits - regardless of the computer?

4)
>> A 32-bit int, FFFF0063, corresponds to -65437.
     A 64-bit int, FFFFFFFFFFFF0063, also corresponds to -65437.

Pasting both of those into calculator gives 4294901859 and 18446744073709486179 respectively.

??


phoffric commented:
When I use the Windows 7 calculator in Programmer mode, type in all F's, and convert to decimal, I get -1. On Windows XP I do see what you are talking about: it converts a negative decimal number to the correct hex, but when you go back to decimal it treats that hex value as unsigned.

It is best to use the debugger on your platform, inspecting both memory layouts and individual shorts and ints in both decimal and hex for your situation.

phoffric commented:
>> I thought words were two bytes - 16 bits - regardless of the computer?
Modern processors, including embedded systems, usually have a word size of 8, 16, 24, 32 or 64 bits, while modern general purpose computers usually use 32 or 64 bits.
             http://en.wikipedia.org/wiki/Word_(computer_architecture)

mrwad99 (Author) commented:
Thanks very much all for helping me understand this :o)
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.