mrwad99 asked:
reinterpret_cast on an array for initialising a structure!

Ah hello.

Please consider the following code snippet:

#include <cstdio>

short * getData()
{
	static short data[] = { -1, 99, 1234, 768 };
	return data;
}

struct SimpleStructure
{
	int m_n1, m_n2;
};

int main( void )
{
	// Reinterpret the four shorts (8 bytes) as a structure of two ints
	SimpleStructure* ss = reinterpret_cast<SimpleStructure*>(getData());

	printf("%i %i", ss->m_n1, ss->m_n2);
}



After running the program, I see the values

6553599 50332882

printed.

When I change the array returned by getData() to be an array of integers, I see

-1 99

printed. I get the general principle: the memory locations occupied by the array are "initialising" the members of the structure. But I cannot see why we get different results when we use the short type instead. It is something to do with how the bit pattern in memory is being interpreted, but I cannot put my finger on the exact reason.
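For clarity, the int version I am describing looks roughly like this (getDataInt is just an illustrative name; on my machine it prints -1 99):

#include <cstdio>

// Same experiment with an int array: each struct member now maps
// onto exactly one array element, so the output is "-1 99".
int * getDataInt()
{
	static int data[] = { -1, 99, 1234, 768 };
	return data;
}

struct SimpleStructure
{
	int m_n1, m_n2;
};

int main( void )
{
	SimpleStructure* ss = reinterpret_cast<SimpleStructure*>(getDataInt());
	printf("%i %i", ss->m_n1, ss->m_n2);
}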

Can someone explain please?

Thanks in advance.
SOLUTION
jkr (Germany):
[Solution text available to Experts Exchange members only.]
mrwad99 (Asker):
Um....

OK.

As a WORD (same size as short) in Windows calculator,

-1 is 1111111111111111
99 is 0000000001100011

When I change calc from the binary equivalent of -1 back to decimal, I get 65535.

The value in the struct is 6553599: it seems that the "99" has just been stuck on the end of 65535.  Coincidence?  Probably.

Assuming my memory layout is 11111111111111110000000001100011, how can I apply the endianness to get my answer of 6553599?

Sorry, I know this might be a really deep question - I didn't realise I had allocated 500 points for it, but I guess it is going to be worth it :)
phoffric:

Hello mrwad99,

Here's a transformation that may interest you. Using the calculator, I converted the short decimal values to hex, and did the same for your results. A nice pattern shows up:

INPUT:
-1,  99,  1234, 768
FF  63    4D2  300

OUTPUT:
6553599    50332882
  63FFFF     30004D2
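If you want to reproduce the table in code rather than the calculator, here is a small sketch (the int values shown assume a little-endian x86 machine):

#include <cstdio>

int main()
{
	short data[] = { -1, 99, 1234, 768 };

	// Each short as an unsigned 16-bit hex pattern.
	for (int i = 0; i < 4; ++i)
		printf("%6d -> 0x%04X\n", data[i], (unsigned)(unsigned short)data[i]);

	// The same memory reinterpreted as two ints, printed in hex.
	// On a little-endian x86 machine this prints 0x0063FFFF and 0x030004D2.
	int* p = reinterpret_cast<int*>(data);
	printf("%d -> 0x%08X\n", p[0], (unsigned)p[0]);
	printf("%d -> 0x%08X\n", p[1], (unsigned)p[1]);
}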
mrwad99 (Asker):
Thanks phoffric.

OK, I think I see a bit of a pattern: it seems the hex values have had their order swapped and then been combined.

What I really want to understand though is how the memory layout of the two short values gets interpreted as the integer value.  jkr has mentioned endianness (thanks jkr) but there are still some pieces of the puzzle missing...can you help with this?
Orcbighter:

It is quite simple!
You have declared an array of shorts and, from a programming point of view, these addresses are contiguous.
You then cast it to a struct with, from a programming point of view, two contiguous integers.
When you cast the result from getData() into the SimpleStructure struct, you effectively moved the first short into the top half of the first integer, and the second short into the second half of the first integer. You did the same with the second pair of shorts.
So, now you are reading an integer where the top word has been set to 0xFFFF and the bottom word has been set to 0x63, giving a result of 63FFFF in hex, or 6553599 in decimal.
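To see how 0x0063FFFF comes out numerically, here is a minimal sketch (it assumes a little-endian machine such as x86):

#include <cstdio>

int main()
{
	unsigned short first  = (unsigned short)-1;   // 0xFFFF, the first array element
	unsigned short second = 99;                   // 0x0063, the second array element

	// On a little-endian machine the first short occupies the low 16 bits
	// of the int and the second short occupies the high 16 bits.
	unsigned int combined = ((unsigned int)second << 16) | first;

	printf("0x%08X = %u\n", combined, combined);  // prints 0x0063FFFF = 6553599
}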
mrwad99 (Asker):
Thanks for participating Orcbighter :)

I see what you are saying, but...

1) Assuming that each short is 2 bytes, 16 bits, we have, as phoffric states

0xFF and 0x63

Each of these is only 1 byte, so we need to pad out with zeros.

Hence 0xFF becomes 0x00FF

How do you get 0xFFFF - that is two extra Fs!

2) Assuming 0x63FFFF is correct, why has the high order word taken the value of the *second* short, in other words, why is the result not 0xFFFF63?  Is this something to do with endianness?

Thanks.
Orcbighter:

No, it's got nothing to do with endianness and everything to do with alignment and how a computer populates a value within a given datatype!
A short is two bytes. If you assign it the value of -1 in decimal, the short will contain the value 0xFFFF, not 0xFF.
Look in the debugger and examine the values of data; turn on the hex display and you will see.
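If the debugger is not handy, a quick sketch that prints the 16-bit patterns directly:

#include <cstdio>

int main()
{
	short s = -1;
	short t = 99;

	// A short is 16 bits, so -1 in two's complement is all 16 bits set: 0xFFFF.
	printf("%d -> 0x%04X\n", s, (unsigned)(unsigned short)s);   // -1 -> 0xFFFF
	printf("%d -> 0x%04X\n", t, (unsigned)(unsigned short)t);   // 99 -> 0x0063
}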
mrwad99 (Asker):
Yes, I am sorry: a single hex digit is 4 bits, not 8, so four hex digits make up 16 bits (2 bytes).  My mistake :)

I would like to ask for some clarification, if possible, on what you mean by alignment.  Assuming the structure contained the values...

0xFFFF 0x0063

...I thought that treating this series of 32 bits as one 32 bit integer would result in

0xFFFF0063

In other words, we simply remove the "space" between the values.  Why are we swapping the order, i.e.

0x0063 0xFFFF

...before combining them to give the answer of 0x0063FFFF

?

ASKER CERTIFIED SOLUTION
[Solution text available to Experts Exchange members only.]
Orcbighter:

As stated by phoffric, it has nothing to do with endianness. It has to do with how a computer loads up the address space of a given data type, which can be thought of as right-to-left.

An integer is 32 bits long (on a 32-bit OS) and a short is 16 bits long, shown thus:

|15|14|13|12|11|10|09|08|07|06|05|04|03|02|01|00|  -> short

|31|30|29|28|27|26|25|24|23|22|21|20|19|18|17|16|
|15|14|13|12|11|10|09|08|07|06|05|04|03|02|01|00|  -> integer

|15|14|13|12|11|10|09|08|07|06|05|04|03|02|01|00|
|15|14|13|12|11|10|09|08|07|06|05|04|03|02|01|00|  -> 2 shorts

Now,
 1 in binary is 0000000000000001
-1 in binary is 1111111111111111

Now, load -1 into the first 16 bits of the longword (integer), the rightmost 16 bits.
Now load 99 into the next 16 bits of the longword, because that's the way the computer does it:
00000000011000111111111111111111
and when you look at that address space as a decimal you get 6553599, and as a hex value, 63FFFF.

Your misunderstanding is thinking that the loading of the integer happens left-to-right.
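A minimal sketch that dumps the raw bytes of the array in address order may also help here (the output shown assumes a little-endian x86 machine):

#include <cstdio>

int main()
{
	short data[] = { -1, 99, 1234, 768 };

	// On a little-endian x86 machine this prints:
	//   FF FF 63 00 D2 04 00 03
	// The least significant byte of each short is stored first, so reading
	// the first four bytes back as one int gives 0x0063FFFF, not 0xFFFF0063.
	const unsigned char* bytes = reinterpret_cast<const unsigned char*>(data);
	for (unsigned i = 0; i < sizeof data; ++i)
		printf("%02X ", bytes[i]);
	printf("\n");
}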
phoffric:

Orcbighter wrote:
>> As stated by phoffric, it has nothing to do with endianness.
    It is ok to disagree and then we sort things out, but I was very clear that it has everything to do with endianness. So, please do not misquote me.

I wrote:
>> "jkr is correct in that the results depend on endianess."
    I do not see how my statement could be misinterpreted by Orcbighter.

Orcbighter wrote:
>> "Your misunderstanding is thinking that the loading of the integer happens left-to-right."
    In the OP, there is no loading of int values; there is only loading of shorts, and the -1 is going to be at the low-order address of the data array regardless of the endianness. For the given memory layout that occurs when loading the short data array (which does not change in the main program), it is the endianness, as both jkr and I are saying, that gives the different values to an int.

     When looking at an int in the debugger, you will see what appears to be a swapping (if little endian), but this is the system understanding how to interpret the byte ordering. There is no physical swap of the bytes in memory; it is just how it is presented by the debugger and the program.

     The reinterpret_cast is not loading anything. It is facilitating the reinterpretation of the memory layout. It is just telling the compiler, in this case, to accept the code, and that the programmer will accept the consequences.
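For what it's worth, here is a small sketch that checks the byte ordering of the machine it runs on, by inspecting the first byte of an int holding 1:

#include <cstdio>

int main()
{
	int one = 1;
	const unsigned char* p = reinterpret_cast<const unsigned char*>(&one);

	// Little endian stores the low-order byte first, so the first byte is 1;
	// big endian stores the high-order byte first, so the first byte is 0.
	printf("first byte = %u -> %s-endian\n",
	       (unsigned)p[0], p[0] == 1 ? "little" : "big");
}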
mrwad99 (Asker):
phoffric - right, OK.  I have gone over all the comments, read a simple article on endianness (http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Data/endian.html) and think I understand what is happening.  Thanks.

On the understanding that what you are saying about low/high order bytes is correct, and from the article linked above quoting

"For example, DEC and IBMs(?) are little endian, while Motorolas and Suns are big endian"

I compiled/ran my little program on a Sun Solaris 8 machine, and got the output

-65437 80872192

Now, according to Windows calculator, 80872192 *is* 04D20300 (which is great), but -65437 *is not* FFFF0063 : FFFF0063 is 4294901859.

Any ideas on the discrepancy here?

Thanks again.
phoffric:

>> but -65437 *is not* FFFF0063 : FFFF0063 is 4294901859.
      On my Windows calculator, -65437 is FFFFFFFFFFFF0063, which is a 64-bit integer with the extra F's simply being a sign extension. On my calculator, anything less than 64 bits means that the MSB is 0 (for a 64-bit integer) and so the integer is positive. But if your computer has a 32-bit word, then FFFF0063 is negative and not positive.

     A 32-bit int, FFFF0063, corresponds to -65437.
     A 64-bit int, FFFFFFFFFFFF0063, also corresponds to -65437.
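A small sketch of the sign extension described above (it assumes a 32-bit int and a 64-bit long long, which is typical on desktop platforms):

#include <cstdio>

int main()
{
	// 0xFFFF0063 interpreted as a signed 32-bit int: the top bit is set,
	// so the value is negative, namely -65437.
	int n32 = (int)0xFFFF0063u;
	printf("%d\n", n32);                                   // -65437

	// Widening to 64 bits sign-extends: the upper 32 bits fill with F's,
	// and the value stays -65437.
	long long n64 = n32;
	printf("%lld = 0x%016llX\n", n64, (unsigned long long)n64);
	// prints: -65437 = 0xFFFFFFFFFFFF0063
}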
mrwad99 (Asker):
Thanks for coming back phoffric.

I am confused.

1) In calculator, in decimal mode I enter -65437.  I then change the display to Hex, and the value changes to FFFF0063.  I then change to hex mode, enter FFFF0063 then change to decimal mode.  I get the answer 4294901859!  Why is this??

2)
>> On my windows calculator, -65437 is FFFFFFFFFFFF0063, which is a 64-bit integer with the extra F's simply being a sign-extension. On my calculator, anything less than 64 bits means that the msb is 0 (for a 64-bit integer) and so the integer is positive

I get that too; but only if QWORD is selected in Hex mode.  Can you explain why we have added an extra 8 Fs on when in QWORD mode?  Why aren't they zeros?

3)
>> But, if your computer has a 32-bit word, then FFFF0063 is negative and not positive.

I thought words were two bytes - 16 bits - regardless of the computer?

4)
>> A 32-bit int, FFFF0063, corresponds to -65437.
     A 64-bit int, FFFFFFFFFFFF0063, also corresponds to -65437.

Pasting both of those into calculator gives 4294901859 and 18446744073709486179 respectively.

??

phoffric:

When I use the Windows 7 calculator in Programmer mode, type in all F's, and convert to decimal, I get -1. On Windows XP I do see what you are talking about: it can take a decimal negative number and give the correct hex, but when you go back to decimal it treats the hex as unsigned.

It is best to use the debugger on your platform, inspecting both memory layouts and individual shorts and ints in both decimal and hex for your situation.
>> I thought words were two bytes - 16 bits - regardless of the computer?
Modern processors, including embedded systems, usually have a word size of 8, 16, 24, 32 or 64 bits, while modern general purpose computers usually use 32 or 64 bits.
             http://en.wikipedia.org/wiki/Word_(computer_architecture)
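To see what sizes your own platform uses, a trivial sketch:

#include <cstdio>

int main()
{
	// Sizes are platform dependent; on a typical desktop system
	// short is 2 bytes and int is 4 bytes.
	printf("short     : %u bytes\n", (unsigned)sizeof(short));
	printf("int       : %u bytes\n", (unsigned)sizeof(int));
	printf("long      : %u bytes\n", (unsigned)sizeof(long));
	printf("long long : %u bytes\n", (unsigned)sizeof(long long));
	printf("void*     : %u bytes\n", (unsigned)sizeof(void*));
}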
mrwad99 (Asker):
Thanks very much all for helping me understand this :o)