forums_mp

asked on

memcpy .. endianness more

Having a hard time wrapping my head around a solution to this.

I'm dealing with memory-mapped I/O where data is copied to the processor's memory. In other words, there are address locations where your data is stored and retrieved. An FPGA is responsible for moving the data to/from the memory-mapped locations.

Now consider the case where the raw data is akin to: {0x50, 0xB0, 2, 0xFE, 0xCA, 0xBE, 0xBA, ...} (see source for more)

Now memcpy copies the bytes as-is, so the resulting unsigned short is interpreted with the local processor's endianness. That said, I end up with 0xB050 (see source), which is not what I want. I'd like to maintain the format 0x50B0 (the unsigned short value is mapped to bit-field proxies) and then byte-swap if necessary, or have the end user specify the desired format. Makes sense?
#include <iostream>
#include <cstring>   // for std::memcpy
#include <vector>

typedef std::vector<char> CHAR_VEC;

// Copy sizeof(unsigned short) bytes from 'source' into 'tgt' and
// return a pointer just past the bytes consumed.
char* Get(unsigned short& tgt, char* source)
{
  std::memcpy(&tgt, source, sizeof(unsigned short));
  return source + sizeof(unsigned short);
}

int main()
{
  const unsigned int MAX = 64;
  CHAR_VEC Buffer(MAX);
  Buffer[0]  = static_cast<char>(0x50);
  Buffer[1]  = static_cast<char>(0xB0);
  Buffer[2]  = 2;
  Buffer[4]  = static_cast<char>(0xFE);
  Buffer[5]  = static_cast<char>(0xCA);
  Buffer[6]  = static_cast<char>(0xBE);
  Buffer[7]  = static_cast<char>(0xBA);
  Buffer[10] = static_cast<char>(0xEF);
  Buffer[11] = static_cast<char>(0xBE);
  Buffer[16] = static_cast<char>(0xBD);
  Buffer[17] = static_cast<char>(0xBA);

  for (unsigned odx = 0; odx < MAX; odx += sizeof(unsigned short))
  {
    unsigned short dummy = 0;
    Get(dummy, &Buffer[odx]);
    std::cout << std::hex << dummy << '\n';
  }
  return 0;
}


Infinity08

A memcpy does not change the byte ordering.

>>   Buffer [ 0 ] = 0x50 ;
>>   Buffer [ 1 ] = 0xB0 ;

You place the unsigned short value 0x50B0 in the buffer in big endian byte order.

This means that any code operating on that buffer, and trying to extract that value from the buffer, needs to be aware that the byte ordering is big endian.
If the current platform is big endian itself, nothing needs to be done.
However, if the current platform is a little endian platform, the bytes need to be swapped before using the value.

In other words: the byte swapping needs to happen on the platform where the value will be used. Depending on the endianness of that platform, a byte swap may or may not need to occur.
To add to above comment:

You are safe with bytes as long as you don't try to interpret a sequence of two or more bytes as an integer. If you do, you have to handle endianness for every single integer you put into or get out of the byte buffer, and bit fields make that task even worse.

If you want an easy way out, simply convert your integers to strings and you no longer need to care about endianness.
forums_mp

ASKER

'If you want an easy way out, simply convert your integers to strings and you no longer need to care about endianness'
As in convert the elements within 'Buffer' as a string?
My first C++ database had only string fields. I never had a problem with it for about 15 years. Many serialization concepts serialize text only. Look at XML: it is pure ASCII (7-bit).

Note, endianness surely can be handled properly without converting numbers to strings. But bit fields shouldn't be used when transferring between platforms of different endianness. That is neither efficient nor does it make any sense.
|| Note, endianness surely can be handled properly without converting numbers to strings.
Agreed, except I'm not inclined to change the aircraft software today (big endian). The recipient of the aircraft data is little endian, and that's today's change, hence the current focus. I'm just questioning the current implementation that's in place. The code does a memcpy of the data and then treats bit 0 as bit 15, etc. Seems silly/cumbersome to me.
>> then treats bit 0 as bit 15, etc.

You wouldn't need to operate on the bit level. Just a simple swap of the bytes is sufficient for transferring from a big endian to a little endian machine.
ASKER CERTIFIED SOLUTION
itsmeandnobodyelse
I think I'm starting to understand where my confusion arises. The documentation defines 30 words, with each word 16 bits. MSB is bit 0. Let's review the proxy layout for one of the words (word 1) in a 30-word message. Call it X:

[00..03] hexadecimal digit 1  = 0
[04..07] hexadecimal digit 2  = 1
[08..11] hexadecimal digit 3  = 0
[12..15] hexadecimal digit 4  = 2

When the source transmits the data, the raw data in the receiver's buffer for X is:
0x0201
When memcpy'd (the receiver is little endian), X on the receiver is:
0x0102
Since MSB is bit 0 on the source, the code executing on the receiver for the range 0..3 must shift the value 0x0102 right by 12 bits (sizeof(short) on the receiver is 16 bits), keeping the zero.

For the range 4..7, shift right by 8 bits (i.e. eliminate 0x02), then mask to eliminate the 0; you're left with 0x01.

In my mind, once the endian issue has been settled, 0..3 should be 2, not 0. I guess that's where I was going wrong?

Swapping from little endian to big endian (and vice versa), simply implies swapping the (8bit) bytes around.

So, 0x02 0x01 in memory becomes 0x01 0x02. Both bytes have simply been swapped around.

You don't need to work on the nibble (corresponding to a hexadecimal digit) level, and certainly not on the bit level.
|| You don't need to work on the nibble (corresponding to a hexadecimal digit)
|| level, and certainly not on the bit level.
I'm confused. How would I get hexadecimal digit 2 to reflect 1 (as called out in the document) if I don't work on the nibble?
By swapping the bytes around.

0x01 and 0x02 are two bytes (note that each byte is represented by two hexadecimal digits). Swap them around, and you change the endianness.
To add to above:

The unit which is transferred between machines of different endianness is the byte. So a byte is always bits 0..7 before and after transfer (regardless of whether bit 0 is 'big' or 'little') and needs no conversion. When transferring a 16-bit integer (2 bytes), the byte order is swapped and you need to correct that. The same applies to 32-bit and 64-bit integers. In any case, *only* the byte order needs to be reversed, and only if you handle integers with a bit count greater than 8 (16, 32, 64).

Note, you must not get confused by the representation or output of numbers. We generally output numbers with the least significant digit (hex or decimal) on the right. That also applies to bit-shifting, where the direction of increasing significance runs from right to left. On the contrary, we display arrays and their elements from left to right, i.e. array[0] is leftmost, and for explaining endian issues I also prefer a presentation where bit 0 of an entity is on the left. And it seems that was also your preference (as in http:#31386073).
infinity08 and itsmeandnobodyelse - pardon my ignorance here.   Let's rehash:

Upon receipt of the data from the host:   The data in the raw buffer (array of char) is:
0x0201
When memcpy'd to an unsigned short, the resulting value is:
0x0102
(infinity is saying) just 'swap them around'.  So I'm back to
0x0201

Hex digit 1=1, Hex digit 2=0, Hex digit 3=1, Hex digit 4 =0.
That won't work.

I understand, or at least I think I do, the implication surrounding multi-byte sequences and endianness. My argument here is that because the documentation treats things backwards, the proxy 0..3 (documentation) means 12..15, 4..7 (documentation) means 8..11, etc., for the value 0x0102.
>> The data in the raw buffer (array of char) is:
>> 0x0201

The sending side sent the value 0x0201 in big endian order, which means that the message consists of two bytes 0x02 and 0x01, in that order.
The receiving side receives those two bytes. It knows that it is in big endian order. Since the receiving side is a little endian machine, it needs to swap around the bytes (in order to convert from big endian to little endian), so the two bytes become 0x01 and 0x02, in that order. When the receiving side interprets those two (swapped) bytes as a little endian 16bit value, it obtains the value 0x0201, which is precisely the value that was sent. Job well done.


Again : do not consider bits or nibbles. Just bytes. There are two bytes in the 16bit value. In order to switch from big endian to little endian, you simply swap those two bytes around.

There's no issue with bit numbering, or anything like that. It's a simple matter of re-arranging bytes.
|| There's no issue with bit numbering, or anything like that. It's a simple
||matter of re-arranging bytes.
So what would the value of hex digit 4 equate to, given the obtained value of 0x0201 after interpreting?
I don't understand what you're asking, but you seem to be mixing up several things.


0x0201 is an integer value (written down in hexadecimal notation). In decimal, the value would be 513.

* When this value is stored in memory on a big endian machine, the memory will contain the bytes 0x02 (decimal : 2) and 0x01 (decimal : 1) in that order.

* When this value is stored in memory on a little endian machine, the memory will contain the bytes 0x01 (decimal : 1) and 0x02 (decimal : 2) in that order.

That is the only difference between little endian and big endian : the order in which bytes are stored. It is still the same value - just the way the value is stored, is different.

When transferring data from a big endian machine to a little endian machine (or vice versa), the order of the bytes has to be changed to match the target machine's endianness. Otherwise, the data would be interpreted incorrectly. So, when receiving data, the bytes are re-ordered.

In this specific example, the target machine receives the bytes 0x02 and 0x01 (in that order). Because a switch from big endian to little endian has to happen, these bytes need to be swapped around to 0x01 and 0x02 (in that order). The byte ordering is then consistent with that of a little endian machine, and the value can be read from memory as 0x0201 (the same value that was sent).
|| I don't understand what you're asking,
And herein lies the problem.  Recall that the transaction between source and target is 30 words where each word is 16 bits.   Each word is laid out in 'proxy' form.

The target machine, upon receipt of source data, needs to evaluate a handful of the proxies and then perform some action. Pseudocode:
  struct word_1 {
     unsigned short hex_digit_1 : 4;
     unsigned short hex_digit_2 : 4;
     unsigned short hex_digit_3 : 4;
     unsigned short hex_digit_4 : 4;
  };

word_1 obj;
if ( obj.hex_digit_4 == 2 ) {
  // good message received
  // do work
}
else {
  // handle bad message
}

I understand the endian issues and the need to read the value the same way it was sent, i.e. 0x0201. If I do that, obj.hex_digit_4 would be 0, not 2 per the documentation.

By the way, if an expert has source for dealing with bit proxies I'd love to see it/pay for it with points privately if that's possible.
SOLUTION
|| That is not what your original question was about :)
|| You're changing the context (or you didn't make it clear from the beginning).
I'll buy that.

Given your initial response, and especially the response of itsme at 04/08/10 12:54 AM, ID: 30095092, it became clear to me that the issue goes beyond endianness.

|| In order to avoid any further confusion, could you please just show the documentation ?
I'll be able to show a snippet but not the entire document.

|| In order to avoid any further confusion, could you please just show the documentation ?

The attached reflects word 1 of a 30 word message.  
As mentioned before, upon receipt of the data from the host, the data in the raw buffer (array of char) is:
0x0201
When memcpy'd to an unsigned short, the resulting value is:
0x0102


word-doc.pdf
Ok. Thanks for that. My previous post (http:#32066014) would cover that.

Bit fields should only be used if you can be absolutely sure that the layout of the bit fields corresponds to that of the message. Unfortunately, in C or C++, there's no such absolute certainty. Some compilers might provide it though, but when you depend on that, be aware that you always run the risk that this behavior changes (when using another compiler, or even just another version of the current compiler).
You might try to get a word from the host where all 4 hex digits were different. That would prove that you only need to swap the bytes of the word you want to convert from network order to host order like in

    // given s is the structure you received and unsigned short is 16 bits

    unsigned short * pus = (unsigned short *)&s;
    pus[1] = ntohs(pus[1]);  // byte-swap word 1 in place

Got sidetracked with vacation. One final question on this: am I correct in saying that the notation in the document is backwards?

It's all convention. Whether you start numbering from the most significant bit or from the least significant bit is not important, as long as you do it consistently.