Allocated Buffer alignment rules/differences on the heap/stack ???

Posted on 2000-04-25
Last Modified: 2010-04-02
Hi guys,

Using Borland cpp Builder 4.x

In the project properties I have set (actually is default) :

So, when I allocate memory I expect that memory to be QUAD WORD aligned.  That seems to work but not always.
Are there rules to this ??

E.g. does this only apply on the heap or also on the stack ??
e.g. is there a difference between :
BYTE Buffer[2048] and
BYTE *Buffer = new BYTE[2048]

Also, I have several structures which are declared such that they are BYTE packed (#pragma option -a1)

When I next allocate memory, e.g.
BytePackedStructure Buffer ;
Buffer is NOT necessarily QUAD WORD aligned ??

Again, are there rules I should be aware of ??

Maybe I can declare the structures such that besides Byte packed, they are also QUAD WORD aligned when allocated ???

Input would be greatly appreciated !

Question by:sneeuw

Expert Comment

ID: 2749473
I got the following comments upon my investigation into this problem:


Byte aligns to 8-bit boundaries.
Word aligns to 16-bit boundaries.
Double Word aligns to 32-bit boundaries. Data with type sizes of less than 4 bytes are aligned on their type size.
Quad Word aligns to 64-bit boundaries. Data with type sizes of less than 8 bytes are aligned on their type size.
Here is directly from the horse's mouth ( from a person developing the bcc32 compiler )

> Structs are aligned at the smallest of the following 2: the current
> alignment and the struct alignment.  If the largest member inside your
> struct is 4 bytes, the alignment of your struct is 4 bytes, and your struct
> will be aligned on 4 bytes no matter if the overall alignment is set to 8
> bytes.  The overall alignment only has effect on variables whose size is
> larger than 8 bytes, and those will be aligned on 8 bytes.  If you have to
> have your struct aligned on 8 bytes, make sure it has a member whose size
> is 8 bytes or more.

: "Luca Garulli" <> wrote:
>But why in the first case A.b starts at the second byte when sizeof(string)
>is 16 ???!
There is no reason for this in the light of this information:
Word, double word, and quad word alignment force integer-size and larger items to be aligned on memory addresses that are multiples of the type chosen.
Double word alignment aligns non-character data at 32-bit word (4-byte) boundaries. Data with type sizes of less than four bytes are aligned on their type size.
Either this is a documentation problem, or the compiler does not completely enforce alignment rules.
Anyway, I went to
and submitted a bug report for you; FWIW, this
is a more direct link.
Thanks for bringing this up.
LVL 22

Expert Comment

ID: 2749875
I don't think there is a bug here.

The BCB docs are unclear on this, but usually this sort of option only affects the alignment of static data (i.e. global variables, static data members, local static variables).  The option usually does not affect dynamically allocated memory.  This is because the memory can be allocated by an allocator that is specified at link time or run time and that does not obey this switch (like a user-defined operator new, etc.).

>> E.g. does this only apply on the heap or also on the stack ??
Probably neither.  Probably only to statics.  An implementation is required to provide a default allocator (operator new) that enforces the alignment requirements of the platform, but your alignment needs may be stricter than the basic hardware's.  If so, there are ways around this: basically, allocate more memory than is required (by the alignment amount minus 1) and then use the aligned portion of that memory.  You might even need to use placement new to initialize the memory.  You can even create an operator new overload that does this for you.

>> I have several structures which are
>> declared such that they are BYTE packed
>> #pragma option -a1)
There is a good chance the compiler will not make any effort to align these structures when they are allocated on the heap or anywhere else.  If you can remove the byte packing option, it may try to align the structure, but there is no guarantee, as this is totally implementation defined.
LVL 22

Expert Comment

ID: 2749888
From the C++ draft standard

Discussing the minimum alignment requirements of an allocated memory buffer.

    The pointer returned shall be suitably aligned
    so that it can be converted to a pointer of any
    complete object type and then used to access the
    object or array in the storage allocated (until
    the storage is explicitly deallocated by a call
    to a corresponding deallocation function)
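
The guarantee quoted above can be checked at run time. This is only a minimal sketch, assuming a C++11 or later compiler (so not BCB 4), and the helper names is_aligned and default_new_is_max_aligned are made up for illustration. It only shows that the default allocation function meets the platform's strictest fundamental alignment on mainstream implementations; it says nothing about stricter, driver-specific requirements.

```cpp
#include <cstddef>   // std::size_t, std::max_align_t
#include <cstdint>   // std::uintptr_t
#include <new>       // ::operator new, ::operator delete

// Hypothetical helper: true when the address of p is a multiple of
// `alignment` (alignment must be non-zero).
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical helper: allocates `size` bytes with the default allocation
// function and reports whether the block meets the strictest fundamental
// alignment of the platform.
inline bool default_new_is_max_aligned(std::size_t size) {
    void* p = ::operator new(size);
    bool ok = is_aligned(p, alignof(std::max_align_t));
    ::operator delete(p);
    return ok;
}
```

Calling default_new_is_max_aligned(2048) is expected to return true on common desktop compilers, but treat that as an observation about those implementations rather than a promise about every allocator.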

LVL 15

Expert Comment

ID: 2751066
nietod is right. Compiler's align option affects only the static/global data.

You have to allocate a larger buffer and align the pointer yourself.
LVL 22

Expert Comment

ID: 2751181
Well, it depends on the compiler.  For example VC guarantees 16 byte or 32 byte (I can't remember which) alignment for default heap allocations.

Author Comment

ID: 2752377

With all this information now at hand, it seems to me there is never a real guarantee that a normal allocation will deliver an aligned pointer.

I can of course write my own aligning code (e.g. overload new) but then I would need to allocate all data using this function (also the stuff I now allocate on the stack).

I have the impression I have some REAL experts in this discussion (far more than what I can say about me).  Therefore the following question, to see whether what I'm going to suggest will carry a noticeable performance penalty !??

All code talking to the driver that needs the alignment eventually goes through one single function.
If in that single function I do something like this :

void IO_function (BYTE *Buffer, int Size, BYTE IO)
{
  BYTE *Aligned_Buffer = new_align(Size) ;
  if (IO == IN)
  {
    memcpy(Aligned_Buffer,Buffer,Size) ;
    IO_Stuff_To_Driver(Aligned_Buffer) ;
  }
  else // IO == OUT
  {
    IO_Stuff_To_Driver(Aligned_Buffer) ;
    memcpy(Buffer,Aligned_Buffer,Size) ;
  }
  // Aligned_Buffer must be released here, in whatever way matches new_align()
}

Maybe allocate a fixed buffer on the stack, which this function can always use (initialised in constructor of class)

So, the question ...
Is this a severe performance penalty for IO that will happen every 10 milliseconds ??
Suggestions / thoughts / input ??


Author Comment

ID: 2752383
new_align() of course being the code that allocates a properly aligned buffer !
LVL 22

Accepted Solution

nietod earned 150 total points
ID: 2752466
>> (also the stuff I now allocate on the stack)
Providing alignment on the stack (locals) is usually extremely hard, so it pretty much isn't done.  At least you probably won't find a compiler that does more than the absolute minimum required by the hardware.  On x86 computers you can probably rely on WORD (2 byte) alignment for values that are larger than a byte.

I'm not sure I understand all the details of your solution.  But that extra copy of the data will take time (assuming one operand is not aligned, probably about 2 clock cycles per 32 bits to be copied).  If the data is short, that should not be too bad.  You might instead try an approach where the allocated buffer is made a little larger, by the alignment amount minus 1 (i.e. if the data must be aligned to a 32 bit (4 byte) boundary, the buffer has an extra 4-1 = 3 bytes).  Then there is guaranteed to be a position within the buffer that is suitably aligned, and there will be sufficient space after it.  The function can fill in data starting at the aligned position and then return to the caller a pointer to the start of the data, i.e. the pointer returned may be the pointer to the buffer the caller specified, or it might be a pointer a little way into the buffer.
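
The over-allocation idea described above could look roughly like this. It is a sketch only: the class name AlignedBuffer and its interface are invented for illustration, and it is written in plain C++98 style plus <cstdint>, which an old compiler may not ship.

```cpp
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uintptr_t

typedef unsigned char BYTE;

// Hypothetical helper: owns a raw block that is (alignment - 1) bytes larger
// than requested and exposes the first suitably aligned address inside it.
class AlignedBuffer {
public:
    // `alignment` must be non-zero; it does not have to be a power of two.
    AlignedBuffer(std::size_t size, std::size_t alignment)
        : raw_(new BYTE[size + alignment - 1]),
          aligned_(align_up(raw_, alignment)) {}

    ~AlignedBuffer() { delete[] raw_; }   // always free the original pointer

    // Pointer to `size` usable bytes, aligned as requested.
    BYTE* data() const { return aligned_; }

private:
    AlignedBuffer(const AlignedBuffer&);             // not copyable
    AlignedBuffer& operator=(const AlignedBuffer&);  // not assignable

    // Rounds p up to the next multiple of `alignment` (or returns p unchanged
    // if it is already aligned). At most alignment - 1 bytes are skipped.
    static BYTE* align_up(BYTE* p, std::size_t alignment) {
        std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
        std::size_t misalign = static_cast<std::size_t>(addr % alignment);
        return misalign ? p + (alignment - misalign) : p;
    }

    BYTE* raw_;      // what new[] returned; this is what must be deleted
    BYTE* aligned_;  // first aligned address inside the raw block
};
```

For example, AlignedBuffer buf(2048, 8); gives buf.data() on an 8 byte boundary with 2048 usable bytes behind it. The important detail is that the pointer handed to delete[] is the original one, not the aligned one.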

>> Maybe allocate a fixed buffer on the stack,
>> which this function can always use (initialised
>> in constructor of
A constructor can't allocate data on the stack that can remain after the constructor ends.  You pretty much have to use dynamic allocation for this.

Author Comment

ID: 2753360

Since I'm allowing all sorts of modules to send commands to the function that actually talks to the driver and since all those modules may allocate memory as they want, I'm never sure what I will get.

I hoped to be able to specify some settings that had to be applied before building but that's now out of the question.

Bottomline, I need to make sure the buffer is aligned properly before I send it to the driver and that is then done in the IO function.

A possibility is to copy everything always in a properly aligned buffer even if it may take some time.
64 K would apparently take 17000 clock cycles so that's like a fifth of a millisecond on a 100 MHz system.
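
As a rough check of that estimate, here is a small sketch of the arithmetic. The function name copy_time_us is made up, and the cycles-per-32-bit figure is a guess rather than a measurement: about 1 cycle per 32 bits reproduces the 17000 cycle figure, while the "2 clock cycles per 32 bits" mentioned earlier for an unaligned operand roughly doubles it.

```cpp
// Hypothetical helper: estimated time in microseconds to copy `bytes` bytes,
// assuming the copy moves 32 bits (4 bytes) at a time.
//   cycles_per_dword : assumed cost of copying one 32-bit word
//   clock_mhz        : CPU clock in MHz, i.e. cycles per microsecond
inline double copy_time_us(unsigned long bytes,
                           double cycles_per_dword,
                           double clock_mhz) {
    double cycles = (bytes / 4.0) * cycles_per_dword;
    return cycles / clock_mhz;
}
```

With 65536 bytes on a 100 MHz system this gives about 164 microseconds at 1 cycle per 32 bits and about 328 microseconds at 2 cycles, so somewhere between 1.6% and 3.3% of a 10 millisecond period under these assumptions.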

And yes, making sure that in-between buffer is properly aligned is done the way you describe.

It's a pity there is no golden trick with no performance penalty but I guess there never is !? ;-)

Nietod, you're of course right when you say I can't allocate that buffer on the stack in the constructor. Not thought through well !  My mistake (a mistake I wouldn't make while coding I think, but not thought through when I was writing the comment ;-)

> On x86 computers you can probably rely on WORD (2 byte) alignment for values that are larger than a byte.

I thought I saw BYTE alignment when allocating memory for packed structures !!??

Let's conclude this story ... unless someone else has a brilliant idea !??
LVL 22

Expert Comment

ID: 2753728
>> Since I'm allowing all sorts of modules to
>> send commands to the function that actually
>> talks to the driver and since all those modules
>> may allocate memory as they want I 'm never
>> sure what I will get.
Perhaps you need to use a class to help allocate the memory.  There are probably other ways a class or classes can help the design too.

>>  though I saw BYTE aligned when allocating
>> memory for packed structures !!??
Yes and no.  There are two issues: the alignment of the start of the structure, and the alignment of the items in the structure relative to the start of the structure.  On an x86 you are pretty much guaranteed that the start of a local structure will have WORD alignment.  (A compiler would have to go out of its way to not provide this, so it's pretty safe to rely on.)  So if a structure starts on a WORD boundary, any of its data members that are WORD aligned relative to the start of the structure are also WORD aligned in memory.  But when you BYTE pack a structure, many of its data members may not be WORD aligned relative to the start of the structure, so they will not be WORD aligned in memory.
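
The second issue (member offsets relative to the start of the structure) can be made visible with offsetof. This is a sketch, assuming a compiler that accepts #pragma pack(push, 1) (GCC, Clang and MSVC do) in place of the -a1 option, and a platform where short is 2 bytes and int is 4 bytes. The structure names are hypothetical.

```cpp
#include <cstddef>   // offsetof

// Byte packed: no padding is inserted between members.
#pragma pack(push, 1)
struct PackedHeader {
    unsigned char  tag;     // offset 0
    unsigned short length;  // offset 1, not WORD aligned within the struct
    unsigned int   value;   // offset 3, not DWORD aligned within the struct
};
#pragma pack(pop)

// Same members with the compiler's default (natural) alignment.
struct NaturalHeader {
    unsigned char  tag;     // offset 0
    unsigned short length;  // offset 2 after one padding byte
    unsigned int   value;   // offset 4
};
```

Under those assumptions sizeof(PackedHeader) is 7 and sizeof(NaturalHeader) is 8. Even if a PackedHeader object itself starts on an even address, its length and value members land on odd addresses.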

Author Comment

ID: 2754812
One other 'extra' question.

When I allocate the memory I need each time (using a special new function, e.g. from another class) in the one function through which all communication goes, ... will that take up time ??

E.g. what I mean :

Does it take long to :

BYTE * Buffer = class_new(1000) ;


delete[] Buffer ;

Provided the class_new function doesn't take long.

E.g. also, how long does it take to :

BYTE *Buffer = new BYTE[1000] ;

Can this ever cause a performance penalty ?? or should I declare the buffer only once (globally) and each time use the already declared buffer ??

Hope this makes sense !?

LVL 22

Expert Comment

ID: 2754857
I think I understand your question.  Are you asking whether it is costly (in terms of time) to allocate memory?

Allocating memory is fairly costly.  The memory is allocated from a heap (it doesn't have to be, but that is realistically the only way).  A heap is in many ways a linked list or binary tree, and the allocation requires that the tree/list be searched to find a block that satisfies the allocation.  Then the heap must be changed to satisfy the allocation, which involves altering other parts of the heap.  This takes a fair amount of time.

But you should always consider the 80-20 rule.  You need to worry about the bottlenecks in your program, not the bulk of the code.  If after doing an allocation you will read data from disk into the buffer, then the bottleneck will be the read operation, which will take 100s or probably 1000s of times longer (even for reading a few bytes) than allocating the space.  In other words, you could allocate the memory, delete it, allocate it again, delete it again 100s of times and then do the read and probably never notice the difference!  But if you are copying data already in memory into the buffer, then the allocation is likely to take much longer than the copy operation.  In that case the situation is reversed.  If you repeatedly allocate and delete, it will be noticeable; however, if you repeatedly did the copy (i.e. copied the same information 10 times in a row), you probably wouldn't notice the difference.

You really need to think carefully about where the bottlenecks are in the program, and use a profiler if possible, and optimize the bottlenecks.  Optimizing the rest of the code tends to be a waste of time and may make the code less safe and harder to manage.
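
If profiling does show the per-call allocation to be a bottleneck, one option is to allocate the buffer once and reuse it on every call. A minimal sketch follows; the class name ScratchBuffer and its load function are invented for illustration, and alignment handling is deliberately left out to keep it short.

```cpp
#include <cstddef>   // std::size_t
#include <cstring>   // std::memcpy
#include <vector>

typedef unsigned char BYTE;

// Hypothetical helper: allocates its storage once, in the constructor, and
// reuses the same block for every transfer, so the per-call path never
// touches the heap.
class ScratchBuffer {
public:
    explicit ScratchBuffer(std::size_t capacity) : storage_(capacity) {}

    // Copies `size` bytes from src into the reused storage and returns a
    // pointer to the copy, or a null pointer if the data does not fit.
    BYTE* load(const BYTE* src, std::size_t size) {
        if (storage_.empty() || size > storage_.size()) {
            return 0;
        }
        if (size != 0) {
            std::memcpy(&storage_[0], src, size);
        }
        return &storage_[0];
    }

    std::size_t capacity() const { return storage_.size(); }

private:
    std::vector<BYTE> storage_;  // allocated once, freed by the destructor
};
```

An object like this would live as a member of the class that owns the IO function, so it is created once and every call to load returns the same block. Whether that is worth the extra state is exactly the kind of question a profiler should answer first.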

Author Comment

ID: 2755023
Thanks !!!

Job well done ;-)
