Solved

size of struct in mfc is invalid?

Posted on 2001-09-07
Last Modified: 2013-11-20
I define a struct in MFC, for example:
struct {
    char aa;
    short bb;
};
but its size does not match the sum of the sizes of its members.
Probably the compiler pads it to a multiple of 4.
I use a DLL in MFC and I want to pass it a struct.
How can I do that?
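
To make the mismatch concrete, here is a minimal sketch of the struct from the question with the padding computed at compile time. The name Sample and the two constants are made up for illustration. On typical compilers with default settings, one pad byte follows aa so that bb lands on an even offset, which is why sizeof reports 4 rather than 3.

```cpp
#include <cstddef>   // offsetof, std::size_t

// The struct from the question, given a name (Sample is a made-up name).
struct Sample {
    char  aa;   // 1 byte
    short bb;   // 2 bytes, normally placed at an even offset
};

// Sum of the member sizes: what the question expected sizeof to return.
constexpr std::size_t kMemberBytes = sizeof(char) + sizeof(short);

// Bytes of padding the compiler added under its default alignment setting.
constexpr std::size_t kPaddingBytes = sizeof(Sample) - kMemberBytes;
```

Printing sizeof(Sample), offsetof(Sample, bb) and kPaddingBytes from both the EXE and the DLL is a quick way to see whether the two builds agree on the layout.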


Question by:javadp
19 Comments
 
LVL 5

Expert Comment

by:FengYuan
1) Check the struct alignment option for the compiler, make sure it's the same in DLL and EXE.

2) Use #pragma pack

#pragma pack(push, 4)

struct ....

#pragma pack(pop)
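
A sketch of how both suggestions could be combined in one header that the DLL and the EXE both include. The struct name DllMessage and the idea of a shared header are assumptions, not something from the question. The static_assert lines make a build fail if its layout differs from the expected one.

```cpp
#include <cstddef>   // offsetof

// Hypothetical shared header contents, included by both the DLL and the EXE.
#pragma pack(push, 4)          // fix the packing regardless of project settings
struct DllMessage {            // DllMessage is a made-up name
    char  aa;
    short bb;
};
#pragma pack(pop)              // restore whatever packing was in effect before

// Compile-time checks: a build that lays the struct out differently stops here.
static_assert(sizeof(DllMessage) == 4, "unexpected size for DllMessage");
static_assert(offsetof(DllMessage, bb) == 2, "unexpected offset for bb");
```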
0
 
LVL 6

Expert Comment

by:Triskelion
You can also fix this by packing it on a 1-byte boundary.

Project -> Settings -> C/C++
Choose the 'Code Generation' category.
Set 'Struct member alignment' to 1 byte.
0
 
LVL 7

Accepted Solution

by:
peterchen092700 earned 50 total points
I wouldn't change the alignment option for the entire project. There might be one place that relies on /Zp8...

to declare a byte packed structure, use

#pragma pack(push, 1) // can be 1,2,4,8

struct {
   char aa;
   short bb;
};

#pragma pack(pop)

(yes - this is almost what FengYuan suggested.)

You can also add your own pad bytes if you need them.
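
A small sketch contrasting the two options, assuming a compiler that supports #pragma pack (MSVC, GCC and Clang do). The names Packed, Padded and pad0 are made up. Packed has no padding, so bb sits at an odd offset; Padded spells the pad byte out, so the layout no longer depends on the alignment setting.

```cpp
#include <cstddef>   // offsetof

#pragma pack(push, 1)
struct Packed {        // byte-packed: no padding anywhere
    char  aa;
    short bb;          // lands at offset 1, so it is misaligned
};
#pragma pack(pop)

struct Padded {        // default packing, with the padding written out by hand
    char  aa;
    char  pad0;        // explicit pad byte (made-up name) keeps bb at offset 2
    short bb;
};

static_assert(sizeof(Packed) == 3, "Packed should have no padding");
static_assert(sizeof(Padded) == 4, "Padded should need no hidden padding");
```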


Peter

0
 
LVL 6

Expert Comment

by:Triskelion
peterchen, I have always used a byte alignment of 1.
I have NEVER needed to have any other byte alignment.

I have been involved with projects where I was required to use /Zp1 for network message passing of odd sized messages, but I've never had to use larger.
0
 
LVL 7

Expert Comment

by:peterchen092700
>> but I've never had to use larger.
yes, but you don't know what other libraries are involved in javadp's problem. And they might...
0
 
LVL 30

Expert Comment

by:Axter
>>I have NEVER needed to have any other byte alignment.

You lose optimization with a lower number.  You don't want to use a lower number if you don't have to.
0
 
LVL 6

Expert Comment

by:Triskelion
The optimization lost only has to do with what's passed over the stack, right???
0
 
LVL 7

Expert Comment

by:peterchen092700
no, actually, it affects each and every access to misaligned struct members. ("misaligned" is roughly: the address is not a multiple of the member size). Note that "struct" in this sense includes "class".

misaligned reads are costly since they (partly) break the benefits of the cache.

a) misaligned reads could require *two* cache lines to be filled instead of one. And this probably isn't done in parallel...

b) even when reading from cache, two read operations are necessary

c) If my mind serves me right, misaligned reads are typically handled by an exception-like mechanism that adds quite some overhead (about a dozen ticks for a read-from-cache)

There's one downside of "correct" alignment: if you work on larger chunks of memory, and the well-aligned structure exceeds the cache while the packed structure doesn't, packed *might* be faster. But only *might*.

So choose your option wisely...
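
The rough definition of "misaligned" above, and point a) about touching two cache lines, can be written down as two small checks. This is only a sketch: the function names are made up, and the 32-byte default line size is an assumption that happens to match the Pentium III figure mentioned elsewhere in this thread. Both functions expect size to be greater than zero.

```cpp
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uintptr_t

// "Misaligned" in the rough sense used above: the address is not a
// multiple of the access size. Requires size > 0.
constexpr bool is_misaligned(std::uintptr_t addr, std::size_t size) {
    return addr % size != 0;
}

// True if the bytes [addr, addr + size) span two cache lines.
// line_size is an assumption; pass the real value for the target CPU.
constexpr bool crosses_cache_line(std::uintptr_t addr, std::size_t size,
                                  std::size_t line_size = 32) {
    return addr / line_size != (addr + size - 1) / line_size;
}
```

Note that a misaligned access does not always cross a line: a 2-byte read at address 1 is misaligned but stays inside one line, while a 4-byte read at address 30 needs both the first and the second 32-byte line.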

Peter
0
 
LVL 49

Expert Comment

by:DanRollins
peterchen,
>>If my mind serves me right, misaligned reads are typically handled by an exception-like mechanism...

I'll bet dollars to donuts that you come from a Macintosh 68xxx background where that was (is?) true.  Not so with Intel, but one can expect a clock cycle or two in wasted memory access.

-- Dan
0
 
LVL 7

Expert Comment

by:peterchen092700
you lost the dollars, but I'd prefer the donuts, thanks.
All assembly I did was Intel (286..original Pentium), oh and KC85/1 not to forget... anyway, might be I picked this up in some Motorola article, or at my boss' office.

Peter the Mind-like-a-how-do-you-call-the-thing-to-rinse-rice?
0
 
LVL 6

Expert Comment

by:Triskelion
Done?
0
 
LVL 8

Expert Comment

by:fl0yd
peterchen,
you are somewhat right about the 'exception-like mechanism'. accessing data that is not aligned at DWORD-boundaries in memory can become very time-consuming. especially when data crosses the cache-line-boundary (PIII: 32 bytes) it gets really nasty. in this case you will definitely get a pipeline stall which costs up to 17 clock cycles.
Bear in mind that lining up the data well in your class definition can greatly improve speed.
Assembly I did so far: Z80, Pentium, PII, PIII, Athlon.
-- .:fl0yd:.
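
One way to read "lining up the data well" is ordering members so that little padding is needed. The sketch below uses two made-up structs with the same members in a different order. On typical 64-bit x86 compilers the first is 24 bytes and the second 16, though the exact numbers depend on the compiler and target.

```cpp
#include <cstddef>   // std::size_t

struct Scattered {   // made-up name: small members on both sides of the double
    char   a;        // usually followed by padding so that d is aligned
    double d;
    char   b;        // usually followed by tail padding
};

struct Grouped {     // made-up name: largest member first, small ones together
    double d;
    char   a;
    char   b;        // a and b share the space after d
};

// Bytes saved per object by reordering; the value is platform-dependent.
constexpr std::size_t kBytesSaved = sizeof(Scattered) - sizeof(Grouped);
```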
0
 
LVL 7

Expert Comment

by:peterchen092700
>>PIII, Athlon
Still worth the trouble? My WATCOM compiler always used to beat my own ASM...
(just curious)
0
 
LVL 8

Expert Comment

by:fl0yd
>>Still worth the trouble?
Rarely, but yes, sometimes it still has to be asm. CPUs have gotten really powerful and so have compilers. At the same time they both became very complex. This is why any decent C-compiler beats poorly written asm-code. Starting with the Pentium, if you try to hand-optimize your code you should make sure that you are *ENTIRELY* familiar with the CPU-architecture.
Some examples when you will need to use assembly:
* to take advantage of MMX/SSE/3DNow! extensions for parallel computation of large amounts of similar data - mainly in graphics- or sound-related applications. As far as I know there is only Intel's compiler that uses MMX/SSE if the source code contains the correct hints.
* if you are writing an operating system. For example, to read or alter access permissions for memory you will have to talk to the CPU directly.
* if you would like to use Intel's performance registers to profile an application there is no other way than using assembly. Those registers contain information on the number of clock cycles since the cpu started, the number of cache hits/misses for memory accesses, the number of correct or mispredicted branches taken at conditional jumps, etc.
* whenever you find your C-compiler messing things up.
hope that clarifies things,
.:fl0yd:.
0
 
LVL 7

Expert Comment

by:peterchen092700
sure, it's typically the latter that keeps me involved ;)
0
 
LVL 49

Expert Comment

by:DanRollins
>>... a pipeline stall which costs up to 17 clock cycles... it gets really nasty.

hmmm.  Let me find a napkin to flip over...  Here's a stub of a pencil.  I'll just lick the lead... lesseee...  running a slowish 1GHz machine, 17 clock cycles is about... 17/1,000,000,000-ths of a second.  So if that happens really-really often, say One Meeeelion times... my program will take "up to" 1/50th of a second longer to run than otherwise.  With all of that waiting around for the computer, where am I gunna find time to wash my hair?

>>This is why any decent C-compiler beats poorly written asm-code.

I think it would be closer to the mark to say that any commercial optimizing C compiler could beat all but the very best hand-optimized ASM code.

And I'd add that with very few exceptions (video drivers, MPG encoders) the few percent gain would be lost in the noise on a 1GHz processor.  High-performance games such as Quake3 are written in C.  Any company demanding all-ASM code in, for instance, a device driver, has missed their deadline by so much and so often that they are probably out of business by now.  Exception: Government contracts :)

"Writing in ASM" conjures up an image of the three-ream printout to output "Hello World" but in reality, you can write your entire program in C/C++ and drop-down to some inline ASM for a few functions and end up with a program that is most likely within a percent or two of pure ASM performance.

-- Dan
0
 
LVL 7

Expert Comment

by:peterchen092700
Yep, that's my conclusion, mostly, except that the pipeline stall doesn't matter...
0
 
LVL 8

Expert Comment

by:fl0yd
Dan,
>>my program will take "up to" 1/50th of a second longer to run than otherwise
If you put it that way it doesn't sound like anything to worry about. But let me have that napkin again, and the pencil. Now let's say you are writing a render kernel and you are aiming at 100fps, for the sake of easy algebra. Using the algorithm that incorporates this constant stalling of the pipeline for every frame you are wasting half of the real power just waiting for the cpu to catch up on itself -- i.e. aiming at 100fps you will get 50fps due to poor data layout only.
>>So if that happens really-really often, say One Meeeelion times
Let's just use that napkin again to show that 1 million times isn't quite 'really-really often': Take a bitmap with 32bpp and 512x512 pixels in size == 1MB == roughly 1 million bytes. Going through it byte by byte you have your 'one million times'.
>>I think it would be closer to the mark to say that any commercial optimizing C compiler could beat all but the very best hand-optimized ASM code.
Agreed - that's what I meant by decent...
>>but in reality, you can write your entire program in C/C++ and drop-down to some inline ASM for a few functions and end up with a program that is most likely within a percent or two of pure ASM performance.
Again, I totally agree - the last game written entirely in asm was NBA 96, a long time ago.
To make my point clear: There are situations, however few, when it makes sense to use assembly. Inline assembly is sufficient most of the time. If anyone asked me to do it in pure asm, I'd question his sanity and look for something else as soon as possible.
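
The napkin arithmetic from both sides of this exchange can be checked with a few constants. This is a sketch using only the numbers quoted in the thread (17 cycles per stall, a 1GHz clock, a 512x512 bitmap at 32bpp, a 100fps target). The function and constant names are made up, and the assumption that every single byte access stalls is a deliberate worst case, not a measurement.

```cpp
#include <cstdint>   // std::uint64_t

// Seconds lost to stalls: stalls * cycles per stall / clock rate in Hz.
constexpr double stall_seconds(std::uint64_t stalls,
                               std::uint64_t cycles_per_stall,
                               double clock_hz) {
    return static_cast<double>(stalls * cycles_per_stall) / clock_hz;
}

// Size of an uncompressed bitmap in bytes.
constexpr std::uint64_t bitmap_bytes(std::uint64_t width,
                                     std::uint64_t height,
                                     std::uint64_t bits_per_pixel) {
    return width * height * bits_per_pixel / 8;
}

// 512 x 512 pixels at 32bpp is 1,048,576 bytes.
constexpr std::uint64_t kBitmapBytes = bitmap_bytes(512, 512, 32);

// Worst case: one 17-cycle stall per byte on a 1GHz clock, about 0.018 s.
constexpr double kWorstCaseSeconds = stall_seconds(kBitmapBytes, 17, 1e9);

// Time available per frame at 100fps: 0.01 s.
constexpr double kFrameBudgetSeconds = 1.0 / 100.0;
```

Under that worst-case assumption, a single pass over the bitmap costs more than the whole 10 ms frame budget, which is the point being made about render loops. With a million stalls per program run rather than per frame, the same 17 ms is the "1/50th of a second" from the earlier comment.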

regards,
.:fl0yd:.
0
 
LVL 49

Expert Comment

by:DanRollins
>>aiming at 100fps you will get 50fps due to poor data layout only.

That 'aiming' business could explain my crappy frag counts.  

I agree that there is no reason to lose even one cycle if there is no cost involved.  And aligning data to avoid a potential problem is basically free.

-- Dan
0
