
Size of struct in MFC is invalid?

I define a struct in MFC, for example:
struct Sample {   // "Sample" is an illustrative tag name
    char aa;
    short bb;
};
but its size does not match the total size of its members.
It seems to round the size up to a multiple of 4.
I use a DLL in MFC and I want to pass it a struct.
How can I do that?


Asked: javadp
 
FengYuan commented:
1) Check the struct member alignment option of the compiler and make sure it's the same for the DLL and the EXE.

2) Use #pragma pack

#pragma pack(push, 4)

struct ....

#pragma pack(pop)
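Point 1 is easier to keep true if the pragma lives in one header that both projects include and the compiler checks the layout. A sketch, assuming C++11 `static_assert`. The file name `shared_params.h`, the struct `DllParams` and its `double` member are hypothetical. The `double` is there so that `pack(4)` actually changes something: for the `char`/`short` struct from the question, `pack(4)` gives the same layout as the default, because `#pragma pack(n)` only caps each member's alignment at n and `short` needs just 2.

```cpp
// shared_params.h (hypothetical file name): include this ONE header from both
// the DLL project and the EXE project, so both compile the struct with the
// same packing no matter which /Zp option each project uses.
#ifndef SHARED_PARAMS_H
#define SHARED_PARAMS_H

#include <cstddef>   // offsetof

#pragma pack(push, 4)   // remember the current packing, then cap member alignment at 4
struct DllParams {      // hypothetical struct passed across the DLL boundary
    char   aa;          // offset 0
    short  bb;          // offset 2 (1 pad byte before it)
    double cc;          // offset 4 here; offset 8 under MSVC's default /Zp8
};
#pragma pack(pop)       // restore the packing that was in effect before this header

// Compile-time guards: if someone edits the pragma or the members, the build
// fails loudly instead of DLL and EXE silently disagreeing at run time.
static_assert(offsetof(DllParams, cc) == 4, "DllParams layout changed");
static_assert(sizeof(DllParams) == 12, "DllParams layout changed");

#endif // SHARED_PARAMS_H
```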
 
Triskelion commented:
You can also fix this by packing it on a 1-byte boundary.

Project->Settings->C/C++
Choose the 'Code Generation' category
Set 'Struct member alignment' to 1 byte.
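To see what the asker ran into and what 1-byte packing changes, here is a small sketch comparing the two layouts. It assumes a 2-byte `short` with 2-byte alignment, as on mainstream x86/x64 compilers. The struct names `DefaultLayout` and `PackedLayout` are hypothetical.

```cpp
#include <cstddef>   // offsetof
#include <cstdio>    // std::printf

// The struct from the question with default alignment: the compiler puts one
// pad byte after aa so that bb starts on a 2-byte boundary.
struct DefaultLayout {
    char  aa;   // offset 0
    short bb;   // offset 2
};              // sizeof == 4

// The same members packed on a 1-byte boundary: no padding at all.
#pragma pack(push, 1)
struct PackedLayout {
    char  aa;   // offset 0
    short bb;   // offset 1 (misaligned)
};              // sizeof == 3
#pragma pack(pop)

// Prints both layouts so the difference can be checked on a given compiler.
void print_layouts() {
    std::printf("default: bb at %zu, size %zu\n",
                offsetof(DefaultLayout, bb), sizeof(DefaultLayout));
    std::printf("packed : bb at %zu, size %zu\n",
                offsetof(PackedLayout, bb), sizeof(PackedLayout));
}
```

The default size is not a "multiple of 4" rule as such: the size is rounded up to a multiple of the struct's alignment, which here is that of `short`, 2.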
 
peterchen092700 commented:
I wouldn't change the alignment option for the entire project. There might be some place that relies on /Zp8...

To declare a byte-packed structure, use

#pragma pack(push, 1) // can be 1, 2, 4, 8

struct Sample {   // tag name added so the declaration is valid; any name works
   char aa;
   short bb;
};

#pragma pack(pop)

(Yes, this is almost what FengYuan suggested.)

You can also add your own pad bytes if you need to.
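Explicit pad bytes make the member offsets of this struct independent of the packing setting: once no member is misaligned, the compiler has no reason to insert hidden padding. A sketch; the names `SampleExplicit`, `SampleExplicitPacked` and `pad0` are hypothetical.

```cpp
#include <cstddef>   // offsetof

// The question's members plus an explicit pad byte, so bb sits on its
// natural 2-byte boundary without any help from the compiler.
struct SampleExplicit {
    char  aa;     // offset 0
    char  pad0;   // offset 1: explicit padding (hypothetical name), keep it zero
    short bb;     // offset 2
};

// The same declaration under 1-byte packing ends up with identical offsets.
#pragma pack(push, 1)
struct SampleExplicitPacked {
    char  aa;
    char  pad0;
    short bb;
};
#pragma pack(pop)

static_assert(sizeof(SampleExplicit) == 4, "unexpected hidden padding");
static_assert(sizeof(SampleExplicitPacked) == sizeof(SampleExplicit), "layouts differ");
static_assert(offsetof(SampleExplicitPacked, bb) == offsetof(SampleExplicit, bb), "layouts differ");
```

Only the offsets are guaranteed to match: the packed version's own alignment drops to 1, so an instance of it placed at an odd address would still leave `bb` misaligned.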


Peter


 
Triskelion commented:
peterchen, I have always used a byte alignment of 1.
I have NEVER needed to have any other byte alignment.

I have been involved with projects where I was required to use /Zp1 for network message passing of odd-sized messages, but I've never had to use anything larger.
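For the network case, a 1-byte-packed struct lets the in-memory image be copied straight into a send buffer. A minimal sketch: `OddMessage`, `write_message` and `read_message` are hypothetical names, the actual socket call is left out, and the `short` is copied in the host's byte order (little-endian on x86), so a peer with a different byte order would have to convert it.

```cpp
#include <cassert>   // assert
#include <cstddef>   // std::size_t
#include <cstring>   // std::memcpy

#pragma pack(push, 1)   // no padding: the in-memory image is the wire image
struct OddMessage {     // hypothetical 3-byte message
    char  type;         // offset 0
    short value;        // offset 1, misaligned on purpose
};
#pragma pack(pop)

static_assert(sizeof(OddMessage) == 3, "message must be 3 bytes on the wire");

// Copies msg into an outgoing buffer (at least 3 bytes); returns bytes written.
std::size_t write_message(const OddMessage& msg, unsigned char* out) {
    std::memcpy(out, &msg, sizeof msg);
    return sizeof msg;
}

// Rebuilds a message from `len` received bytes.
OddMessage read_message(const unsigned char* in, std::size_t len) {
    assert(len >= sizeof(OddMessage));   // caller must pass a whole message
    OddMessage msg;
    std::memcpy(&msg, in, sizeof msg);
    return msg;
}
```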
 
peterchen092700 commented:
>> but I've never had to use larger.
Yes, but you don't know what other libraries are involved in javadp's problem, and they might...
 
Axter commented:
>>I have NEVER needed to have any other byte alignment.

You lose optimization with a lower number. You don't want to use a lower number if you don't have to.
 
Triskelion commented:
The optimization lost only has to do with what's passed over the stack, right???
 
peterchen092700 commented:
No, actually, it affects each and every access to a misaligned struct member ("misaligned" roughly means that the address is not a multiple of the member's size). Note that "struct" in this sense includes "class".

Misaligned reads are costly because they (partly) defeat the benefits of the cache.

a) A misaligned read could require *two* cache lines to be filled instead of one, and this probably isn't done in parallel...

b) Even when reading from the cache, two read operations are necessary.

c) If my mind serves me right, misaligned reads are typically handled by an exception-like mechanism that adds quite a bit of overhead (about a dozen ticks for a read from cache).
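The "roughly" in the definition above can be made precise: an access is misaligned when the address is not a multiple of the type's alignment requirement (`alignof`), which for `char`, `short` and `int` on x86 equals their size. A sketch of that check, assuming C++11 and a 2-byte `alignof(short)`; `is_aligned`, `Packed` and `packed_bb_is_aligned` are hypothetical names. The offset is added to a byte pointer instead of taking `&p.bb` directly, because taking the address of a packed member can yield an unaligned `short*` (GCC warns about exactly that).

```cpp
#include <cassert>   // assert
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>   // std::uintptr_t

#pragma pack(push, 1)
struct Packed {   // the question's struct with 1-byte packing
    char  aa;     // offset 0
    short bb;     // offset 1
};
#pragma pack(pop)

// True if address p is a multiple of `alignment` (e.g. alignof(short)).
bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0);   // a zero alignment would divide by zero
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Places a Packed object on a short boundary and checks where bb lands.
// With a 2-byte alignof(short), bb is one byte past an even address,
// so this returns false.
bool packed_bb_is_aligned() {
    alignas(alignof(short)) Packed p{};
    const unsigned char* base = reinterpret_cast<const unsigned char*>(&p);
    return is_aligned(base + offsetof(Packed, bb), alignof(short));
}
```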

There's one downside to "correct" alignment: if you work on large chunks of memory, and the well-aligned structures exceed the cache while the packed ones don't, packed *might* be faster. But only *might*.
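The size side of that trade-off is plain arithmetic. A sketch, assuming the question's struct takes 4 bytes with default alignment and 3 bytes packed (as on common x86 compilers); `Aligned`, `Packed1` and `footprint` are hypothetical names.

```cpp
#include <cstddef>   // std::size_t

struct Aligned {       // default alignment: 1 pad byte after aa
    char  aa;
    short bb;
};                     // 4 bytes on common x86 compilers

#pragma pack(push, 1)
struct Packed1 {       // 1-byte packing: no padding, bb misaligned
    char  aa;
    short bb;
};                     // 3 bytes
#pragma pack(pop)

// Bytes occupied by an array of n objects of type T.
template <typename T>
constexpr std::size_t footprint(std::size_t n) {
    return n * sizeof(T);
}
```

For 1,000,000 elements that is 4,000,000 bytes aligned versus 3,000,000 packed, 25% less memory to pull through the cache. Whether that outweighs the cost of the misaligned accesses depends on the cache size and the access pattern, hence the *might*.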

So choose your option wisely...

Peter
 
DanRollins commented:
peterchen,
>>If my mind serves me right, misaligned reads are typically handled by an exception-like mechanism...

I'll bet dollars to donuts that you come from a Macintosh 68xxx background, where that was (is?) true. Not so with Intel, but one can expect a clock cycle or two of wasted memory access.

-- Dan
 
peterchen092700 commented:
You lost the dollars, but I'd prefer the donuts, thanks.
All the assembly I did was Intel (286 through the original Pentium), oh, and the KC85/1, not to forget... Anyway, maybe I picked this up from some Motorola article, or in my boss's office.

Peter the Mind-like-a-how-do-you-call-the-thing-to-rinse-rice?
 
Triskelion commented:
Done?
 
fl0yd commented:
peterchen,
You are somewhat right about the 'exception-like mechanism'. Accessing data that is not aligned on DWORD boundaries in memory can become very time-consuming. Especially when the data crosses a cache-line boundary (32 bytes on the PIII), it gets really nasty: in this case you will definitely get a pipeline stall which costs up to 17 clock cycles.
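Whether a piece of data straddles two cache lines is a simple modulo check. A sketch, assuming 0 < size <= line and using the 32-byte PIII line size quoted above as the default; `crosses_cache_line` is a hypothetical helper, and addresses are passed as integers so the check can run at compile time.

```cpp
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uintptr_t

// True if an object of `size` bytes starting at address `addr` spans two
// cache lines of `line` bytes. Assumes 0 < size <= line.
constexpr bool crosses_cache_line(std::uintptr_t addr, std::size_t size,
                                  std::size_t line = 32) {
    return addr % line + size > line;
}

// A 4-byte DWORD at a multiple of 4 can never cross a 32-byte line, because
// 32 is itself a multiple of 4: addr % 32 is at most 28, and 28 + 4 == 32.
static_assert(!crosses_cache_line(28, 4), "aligned DWORD fits in the line");
// The same DWORD two bytes later does cross: 30 + 4 > 32.
static_assert(crosses_cache_line(30, 4), "misaligned DWORD straddles lines");
```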
Bear in mind that lining up the data well in your class definition can greatly improve speed.
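One common way to line up the data is to declare members from largest to smallest alignment, so the compiler needs fewer pad bytes. A sketch, assuming a 4-byte `int` with 4-byte alignment (as on 32- and 64-bit Windows and on Linux for x86); the struct names are hypothetical.

```cpp
#include <cstddef>   // offsetof

// Members in "natural" order: the int must start on a 4-byte boundary,
// so the compiler inserts pad bytes around the chars.
struct Scattered {
    char a;   // offset 0, followed by 3 pad bytes
    int  b;   // offset 4
    char c;   // offset 8, followed by 3 tail pad bytes so that b stays
              // aligned in every element of an array of Scattered
};            // sizeof == 12

// The same members, largest first: the two chars share one padded tail.
struct Ordered {
    int  b;   // offset 0
    char a;   // offset 4
    char c;   // offset 5, followed by 2 tail pad bytes
};            // sizeof == 8

static_assert(sizeof(Ordered) < sizeof(Scattered), "reordering saves space");
```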
Assembly I did so far: Z80, Pentium, PII, PIII, Athlon.
-- .:fl0yd:.
 
peterchen092700 commented:
>>PIII, Athlon
Still worth the trouble? My WATCOM compiler always used to beat my own ASM...
(just curious)
 
fl0yd commented:
>>Still worth the trouble?
Rarely, but yes, sometimes it still has to be asm. CPUs have gotten really powerful, and so have compilers. At the same time, they have both become very complex. This is why any decent C-compiler beats poorly written asm-code. Starting with the Pentium, if you try to hand-optimize your code you should make sure that you are *ENTIRELY* familiar with the CPU architecture.
Some examples when you will need to use assembly:
* to take advantage of MMX/SSE/3DNow! extensions for parallel computation on large amounts of similar data, mainly in graphics- or sound-related applications. As far as I know, only Intel's compiler uses MMX/SSE if the source code contains the right hints.
* if you are writing an operating system. For example, to read or alter access permissions for memory you will have to talk to the CPU directly.
* if you would like to use Intel's performance registers to profile an application, there is no other way than using assembly. These registers contain information on the number of clock cycles since the CPU started, the number of cache hits/misses for memory accesses, the number of correctly predicted or mispredicted branches at conditional jumps, etc.
* whenever you find your C-compiler messing things up.
hope that clarifies things,
.:fl0yd:.
 
peterchen092700 commented:
Sure, it's typically the latter that keeps me involved ;)
 
DanRollins commented:
>>... a pipeline stall which costs up to 17 clock cycles... it gets really nasty.

hmmm.  Let me find a napkin to flip over...  Here's a stub of a pencil.  I'll just lick the lead... lesseee...  running a slowish 1GHz machine, 17 clock cycles is about... 17/1,000,000,000-ths of a second.  So if that happens really-really often, say One Meeeelion times... my program will take "up to" 1/50th of a second longer to run than otherwise.  With all of that waiting around for the computer, where am I gunna find time to wash my hair?
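Dan's napkin, written out with his own numbers (1 GHz clock, 17-cycle stall, one million stalls):

```latex
t = \frac{17\ \text{cycles} \times 10^{6}}{10^{9}\ \text{cycles/s}}
  = 1.7 \times 10^{-2}\ \text{s} = 17\ \text{ms} \approx \frac{1}{59}\ \text{s}
```

This stays within his "up to 1/50th of a second" (20 ms).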

>>This is why any decent C-compiler beats poorly written asm-code.

I think it would be closer to the mark to say that any commercial optimizing C compiler could beat all but the very best hand-optimized ASM code.

And I'd add that, with very few exceptions (video drivers, MPG encoders), the few-percent gain would be lost in the noise on a 1GHz processor. High-performance games such as Quake3 are written in C. Any company demanding all-ASM code in, for instance, a device driver has missed its deadline by so much and so often that it is probably out of business by now. Exception: Government contracts :)

"Writing in ASM" conjures up an image of the three-ream printout to output "Hello World" but in reality, you can write your entire program in C/C++ and drop-down to some inline ASM for a few functions and end up with a program that is most likely within a percent or two of pure ASM performance.

-- Dan
 
peterchen092700 commented:
Yep, that's my conclusion, mostly, except that the pipeline stall doesn't matter...
 
fl0yd commented:
Dan,
>>my program will take "up to" 1/50th of a second longer to run than otherwise
If you put it that way, it doesn't sound like anything to worry about. But let me have that napkin again, and the pencil. Now let's say you are writing a render kernel and you are aiming at 100fps, for the sake of easy algebra. With an algorithm that constantly stalls the pipeline in every frame, you are wasting half of the real power just waiting for the CPU to catch up with itself, i.e. aiming at 100fps you will get 50fps due to poor data layout only.
>>So if that happens really-really often, say One Meeeelion times
Let's use that napkin again to show that 1 million times isn't quite 'really-really often': take a 512x512 bitmap at 32bpp, which is 1MB, or roughly 1 million bytes. Going through it byte by byte, you have your 'one million times'.
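The bitmap figures, written out. The second line adds an assumption of its own: a worst case in which every one of those byte accesses pays the full 17-cycle stall on Dan's 1 GHz machine, which is an illustration, not a measurement.

```latex
512 \times 512\ \text{pixels} \times 4\ \tfrac{\text{bytes}}{\text{pixel}} = 1\,048\,576\ \text{bytes} = 1\ \text{MiB}

T_{\text{frame}} = \frac{1\ \text{s}}{100} = 10\ \text{ms}, \qquad
T_{\text{stall}} = \frac{1\,048\,576 \times 17\ \text{cycles}}{10^{9}\ \text{cycles/s}} \approx 17.8\ \text{ms}
```

In that worst case the stalls alone would take longer than the entire 10 ms frame budget at 100fps.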
>>I think it would be closer to the mark to say that any commercial optimizing C compiler could beat all but the very best hand-optimized ASM code.
Agreed, that's what I meant by decent...
>>but in reality, you can write your entire program in C/C++ and drop-down to some inline ASM for a few functions and end up with a program that is most likely within a percent or two of pure ASM performance.
Again, I totally agree: the last game written entirely in asm was NBA 96, a long time ago.
To make my point clear: there are situations, however few, when it makes sense to use assembly. Inline assembly is sufficient most of the time. If anyone asked me to do it in pure asm, I'd be questioning his sanity and looking for something else as soon as possible.

regards,
.:fl0yd:.
 
DanRollins commented:
>>aiming at 100fps you will get 50fps due to poor data layout only.

That 'aiming' business could explain my crappy frag counts.  

I agree that there is no reason to lose even one cycle if there is no cost involved.  And aligning data to avoid a potential problem is basically free.

-- Dan