Solved

reinterpret_cast on an array for initialising a structure!

Posted on 2012-08-28
Medium Priority
953 Views
Last Modified: 2012-09-03
Ah hello.

Please consider the following code snippet:

#include <cstdio>

short * getData()
{
	static short data[] = { -1, 99, 1234, 768 };
	return data;
}

struct SimpleStructure
{
	int m_n1, m_n2;
};

int main( void )
{
	SimpleStructure* ss = reinterpret_cast<SimpleStructure*>(getData());

	printf("%i %i", ss->m_n1, ss->m_n2);
}



After running the program, I see the values

6553599 50332882

printed.

When I change the array returned by getData() to an array of integers, I see

-1 99

printed: I get the general principle - the memory locations occupied by the array are "initialising" the members of the structure.  But I cannot see why we get different results when using the short type instead.  It is something to do with how the bit pattern in memory is being interpreted, but I cannot put my finger on the exact reason.

Can someone explain please?

Thanks in advance.
Question by:mrwad99

17 Comments
 
LVL 86

Assisted Solution

by:jkr
jkr earned 400 total points
ID: 38341899
>> It is something to do with how the bit pattern in the memory is being interpreted, but I
>> cannot put my finger on the exact reason.

Byte-ordering: http://en.wikipedia.org/wiki/Endianness - two consecutive shorts in memory interpreted as an integer are completely different, given the memory layout. Adding that this would be less difficult on a non-Intel CPU will probably not give much comfort.
 
LVL 19

Author Comment

by:mrwad99
ID: 38341999
Um....

OK.

As a WORD (same size as short) in Windows calculator,

-1 is 1111111111111111
99 is 0000000001100011

When I change calc from the binary equivalent of -1 back to decimal, I get 65535.

The value in the struct is 6553599: it seems that the "99" has just been stuck on the end of 65535.  Coincidence?  Probably.

Assuming my memory layout is 11111111111111110000000001100011, how can I apply the endianness to get my answer of 6553599?

Sorry, I know this might be a really deep question - I didn't realise I had allocated 500 points for it, but I guess it is going to be worth it :)
 
LVL 32

Expert Comment

by:phoffric
ID: 38342406
Hello mrwad99,

Here's a transformation that may interest you. Using the calculator, I have converted the short decimal values to hex values, and did the same thing for your results. There is a nice pattern that shows up:

INPUT:
     -1        99      1234       768
     FF        63       4D2       300

OUTPUT:
6553599  50332882
 63FFFF   30004D2
 
LVL 19

Author Comment

by:mrwad99
ID: 38344799
Thanks phoffric.

OK, I think I see a bit of a pattern: it seems the hex values have had their order swapped and then been combined.

What I really want to understand though is how the memory layout of the two short values gets interpreted as the integer value.  jkr has mentioned endianness (thanks jkr) but there are still some pieces of the puzzle missing...can you help with this?
 
LVL 9

Expert Comment

by:Orcbighter
ID: 38345262
It is quite simple!
You have declared an array of shorts and, from a programming point of view, these addresses are contiguous.
You then cast it to a struct with, from a programming point of view, two contiguous integers.
When you cast the result from getData into the SimpleStructure struct, you effectively moved the first short into the top half of the first integer, and the second short into the second half of the first integer. You did the same with the second pair of shorts.
So, now you are reading an integer where the top word has been set to 0xFFFF in hex, and the bottom word has been set to 0x63, giving a result of 0x63FFFF or 6553599 in decimal.
 
LVL 19

Author Comment

by:mrwad99
ID: 38345443
Thanks for participating Orcbighter :)

I see what you're saying, but...

1) Assuming that each short is 2 bytes, 16 bits, we have, as phoffric states

0xFF and 0x63

Each of these is only 1 byte, so we need to pad out with zeros.

Hence 0xFF becomes 0x00FF

How do you get 0xFFFF - that is two extra Fs!

2) Assuming 0x63FFFF is correct, why has the high order word taken the value of the *second* short, in other words, why is the result not 0xFFFF63?  Is this something to do with endianness?

Thanks.
 
LVL 9

Expert Comment

by:Orcbighter
ID: 38346236
No, it's got nothing to do with endianness and everything to do with alignment and how a computer populates a value within a given data type!
A short is two bytes. If you assign it the value -1 in decimal, the short will contain the value 0xFFFF, not 0xFF.
Look in the debugger and examine the values of data; turn on the hex display and you will see.
 
LVL 19

Author Comment

by:mrwad99
ID: 38346298
Yes, I am sorry: a single hex digit is 4 bits, not 8: therefore 4 hex digits make up 16 bits (2 bytes).  My mistake :)

I would like to ask for some clarification, if possible, on what you mean by alignment.  Assuming the structure contained the values...

0xFFFF 0x0063

...I thought that treating this series of 32 bits as one 32 bit integer would result in

0xFFFF0063

In other words, we simply remove the "space" between the values.  Why are we swapping the order, i.e.

0x0063 0xFFFF

...before combining them to give the answer of 0x0063FFFF

?

 
LVL 32

Accepted Solution

by:phoffric
phoffric earned 1600 total points
ID: 38347707
>> No, it's got nothing to do with endianness
>> how a computer populates a value within a given data type
The population of shorts on the stack is:
      -1       99     1234      768   // decimal shorts
  0xFFFF   0x0063   0x04D2   0x0300   // hex shorts

The memory layout is FFFF0063 04D20300 (broken into two 32-bit words).
On a little-endian platform, the least significant bytes are at the lowest addresses; in this case the 32-bit pattern FFFF0063 is interpreted as the integer 0x0063FFFF.

On a big-endian platform, the same 32-bit pattern is interpreted as the integer 0xFFFF0063. jkr is correct in that the results depend on endianness.

(Note that on either platform, an int or a short is considered a signed value; in this case, on a big-endian platform, the interpreted 0xFFFF0063 result is negative.)
 
LVL 9

Expert Comment

by:Orcbighter
ID: 38348118
As stated by phoffric, it has nothing to do with endianness. It has to do with how a computer loads up the address space of a given data type, which can be thought of as right-to-left.
thus:
An integer is 32 bits long (on a 32 bit OS), and a short is 16 bits long, shown thus

|15|14|13|12|11|10|09|08|07|06|05|04|03|02|01|00|  -> short

|31|30|29|28|27|26|25|24|23|22|21|20|19|18|17|16|
|15|14|13|12|11|10|09|08|07|06|05|04|03|02|01|00|  -> integer

|15|14|13|12|11|10|09|08|07|06|05|04|03|02|01|00|
|15|14|13|12|11|10|09|08|07|06|05|04|03|02|01|00|  -> 2 shorts

Now,
 1 in binary is 0000000000000001
-1 in binary is 1111111111111111

now, load -1 into the first 16 bits of the longword (integer), the rightmost 16 bits.
Now load 99 into the second 16 bits of the longword, because that's the way the computer does it:
00000000011000111111111111111111
and when you look at that address space as a decimal you get 6553599, and as a hex value, 0x63FFFF.

Your misunderstanding is thinking that the loading of the integer happens left-to-right.
 
LVL 32

Expert Comment

by:phoffric
ID: 38348240
Orcbighter wrote:
>> As stated by phoffric, it has nothing to do with endianness.
    It is ok to disagree and then we sort things out, but I was very clear that it has everything to do with endianness. So, please do not misquote me.

I wrote:
>> "jkr is correct in that the results depend on endianness."
    I do not see how my statement could be misinterpreted by Orcbighter.

Orcbighter wrote:
>> "Your misunderstanding is thinking that the loading of the integer happens left-to-right."
    In the OP, there is no loading of int values; there is only loading of shorts; and the -1 is going to be at the lowest address of the data array regardless of the endianness. For the given memory layout that occurs when loading the short data array (which does not change in the main program), it is the endianness, as both jkr and I are saying, that gives the different values to an int.

     When looking at an int in the debugger, you will see what appears to be a swapping (if little endian), but this is the system understanding how to interpret the byte ordering. There is no physical swap of the bytes in memory; it is just how it is presented by the debugger and the program.

     The reinterpret_cast is not loading anything. It is facilitating the reinterpretation of the memory layout. It is just telling the compiler in this case to accept the code, and that the programmer will accept the consequences.
 
LVL 19

Author Comment

by:mrwad99
ID: 38349348
phoffric - right, OK.  I have gone over all the comments, read a simple article on endianness (http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Data/endian.html) and think I understand what is happening.  Thanks.

On the understanding that what you are saying about low/high order bytes is correct, and from the article linked above quoting

"For example, DEC and IBMs(?) are little endian, while Motorolas and Suns are big endian"

I compiled/ran my little program on a Sun Solaris 8 machine, and got the output

-65437 80872192

Now, according to Windows calculator, 80872192 *is* 04D20300 (which is great), but -65437 *is not* FFFF0063 : FFFF0063 is 4294901859.

Any ideas on the discrepancy here?

Thanks again.
 
LVL 32

Expert Comment

by:phoffric
ID: 38350377
>> but -65437 *is not* FFFF0063 : FFFF0063 is 4294901859.
      On my windows calculator, -65437 is FFFFFFFFFFFF0063, which is a 64-bit integer with the extra F's simply being a sign-extension. On my calculator, anything less than 64 bits means that the msb is 0 (for a 64-bit integer) and so the integer is positive. But, if your computer has a 32-bit word, then FFFF0063 is negative and not positive.

     A 32-bit int, FFFF0063, corresponds to -65437.
     A 64-bit int, FFFFFFFFFFFF0063, also corresponds to -65437.
 
LVL 19

Author Comment

by:mrwad99
ID: 38351005
Thanks for coming back phoffric.

I am confused.

1) In calculator, in decimal mode I enter -65437.  I then change the display to Hex, and the value changes to FFFF0063.  I then change to hex mode, enter FFFF0063 then change to decimal mode.  I get the answer 4294901859!  Why is this??

2)
>> On my windows calculator, -65437 is FFFFFFFFFFFF0063, which is a 64-bit integer with the extra F's simply being a sign-extension. On my calculator, anything less than 64 bits means that the msb is 0 (for a 64-bit integer) and so the integer is positive

I get that too; but only if QWORD is selected in Hex mode.  Can you explain why we have added an extra 8 Fs on when in QWORD mode?  Why aren't they zeros?

3)
>> But, if your computer has a 32-bit word, then FFFF0063 is negative and not positive.

I thought words were two bytes - 16 bits - regardless of the computer?

4)
>> A 32-bit int, FFFF0063, corresponds to -65437.
     A 64-bit int, FFFFFFFFFFFF0063, also corresponds to -65437.

Pasting both of those into calculator gives 4294901859 and 18446744073709486179 respectively.

??

 
LVL 32

Expert Comment

by:phoffric
ID: 38351721
When I use the Windows 7 calculator in Programmer's mode, type in all F's, and convert to decimal, I get -1. On Windows XP I do see what you are talking about - it can convert a negative decimal number to the correct hex, but on going back to decimal it treats the hex as unsigned.

It is best to use the debugger on your platform, inspecting both the memory layout and individual shorts and ints, in both decimal and hex, for your situation.
 
LVL 32

Expert Comment

by:phoffric
ID: 38356316
>> I thought words were two bytes - 16 bits - regardless of the computer?
Modern processors, including embedded systems, usually have a word size of 8, 16, 24, 32 or 64 bits, while modern general purpose computers usually use 32 or 64 bits.
             http://en.wikipedia.org/wiki/Word_(computer_architecture)
0
 
LVL 19

Author Closing Comment

by:mrwad99
ID: 38360553
Thanks very much all for helping me understand this :o)