• Status: Solved
  • Priority: Medium
  • Security: Public

Allocated buffer alignment: rules and differences on the heap vs. the stack?

Hi guys,

Using Borland cpp Builder 4.x

In the project properties I have set (actually is default) :

So, when I allocate memory I expect that memory to be QUAD WORD aligned.  That seems to work but not always.
Are there rules to this ??

E.g. does this only apply on the heap or also on the stack ??
e.g. is there a difference between :
BYTE Buffer[2048] and
BYTE *Buffer = new BYTE[2048]

Also, I have several structures which are declared such that they are BYTE packed (#pragma option -a1)

When I next allocate memory, e.g.
BytePackedStructure Buffer ;
Buffer is NOT necessarily QUAD WORD aligned ??

Again, are there rules I should be aware of ??

Maybe I can declare the structures such that besides Byte packed, they are also QUAD WORD aligned when allocated ???

Input would be greatly appreciated !

1 Solution
I got the following comments upon my investigation into this problem:


Byte aligns to 8-bit boundaries.
Word aligns to 16-bit boundaries.
Double Word aligns to 32-bit boundaries. Data with type sizes of less than 4 bytes are aligned on their type size.
Quad Word aligns to 64-bit boundaries. Data with type sizes of less than 8 bytes are aligned on their type size.
Here is directly from the horse's mouth ( from a person developing the bcc32 compiler )

> Structs are aligned at the smallest of the following 2: the current
> alignment and the struct alignment.  If the largest member inside your
> struct is 4 bytes, the alignment of your struct is 4 bytes, and your struct
> will be aligned on 4 bytes no matter if the overall alignment is set to 8
> bytes.  The overall alignment only has effect on variables whose size is
> larger than 8 bytes, and those will be aligned on 8 bytes.  If you have to
> have your struct aligned on 8 bytes, make sure it has a member whose size
> is 8 bytes or more.
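
The rule quoted above (a struct is aligned no more strictly than its most demanding member) can be checked on a current compiler. This is only a sketch: it uses C++11 `alignof`, which Borland C++ Builder 4 does not have, and the struct names are made up for illustration. The exact numbers depend on the platform ABI, so the checks compare against the member types rather than against fixed byte counts.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical structs, for illustration only.
struct SmallMembers {   // largest member is an int (typically 4 bytes)
    char tag;
    int  value;
};

struct HasDouble {      // largest member is a double (typically 8 bytes)
    char   tag;
    double value;
};

// A struct takes the alignment of its most strictly aligned member.
static_assert(alignof(SmallMembers) == alignof(int), "follows the int member");
static_assert(alignof(HasDouble) == alignof(double), "follows the double member");

// Returns true when p is a multiple of alignment bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

A local `HasDouble` variable should then satisfy `is_aligned(&variable, alignof(HasDouble))`, while nothing promises more than that.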

: "Luca Garulli" <l.garulli@tin.it> wrote:
>But why in the first case A.b starts at the second byte when sizeof(string)
>is 16 ???!
There is no reason for this in the light of this information:
Word, double word, and quad word alignment force integer-size and larger items to be aligned on memory addresses that are multiples of the type chosen.
Double word alignment aligns non-character data at 32-bit word (4-byte) boundaries. Data with type sizes of less than four bytes are aligned on their type size.
Either this is a documentation problem, or the compiler does not completely enforce alignment rules.
Anyway, I went to
and submitted a bug report for you; FWIW, this
is a more direct link.
Thanks for bringing this up.
I don't think there is a bug here.

The BCB docs are unclear on this, but usually this sort of option only affects the alignment of static data (i.e. global variables, static data members, local static variables).  The option usually does not affect dynamically allocated memory.  This is because the memory can be allocated by an allocator that is specified at link time or run time and that does not obey this switch (like a user-defined operator new, etc.).

>> E.g. does this only apply on the heap or also on the stack ??
Probably neither.  Probably only to statics.  An implementation is required to provide a default allocator (operator new) that enforces the alignment requirements of the platform, but your alignment needs may be stricter than the basic hardware's.  If so, there are ways around this: basically, allocate more memory than is required (by the alignment amount minus 1) and then use the aligned portion of that memory.  You might even need to use placement new to initialize the memory.  You can even create an operator new overload that does this for you.
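
The over-allocation approach described above could look roughly like this. It is a minimal sketch, not Borland-specific code: `AlignedBlock`, `allocate_aligned` and `free_aligned` are hypothetical names, and the alignment is assumed to be a power of two.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: keeps the raw block (needed for delete[])
// together with the aligned pointer inside it.
struct AlignedBlock {
    unsigned char* raw;      // what new[] returned; pass this to delete[]
    unsigned char* aligned;  // first address in the block that is a multiple of the alignment
};

// Allocates size usable bytes starting on an alignment-byte boundary.
// Assumption: alignment is a power of two.
inline AlignedBlock allocate_aligned(std::size_t size, std::size_t alignment) {
    AlignedBlock block;
    // Worst case the raw pointer is alignment - 1 bytes short of a boundary.
    block.raw = new unsigned char[size + alignment - 1];
    std::uintptr_t address = reinterpret_cast<std::uintptr_t>(block.raw);
    std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    std::uintptr_t rounded = (address + mask) & ~mask;  // round up to the next boundary
    block.aligned = block.raw + (rounded - address);
    return block;
}

// Releases the block through the pointer that new[] originally returned.
inline void free_aligned(AlignedBlock& block) {
    delete[] block.raw;
    block.raw = nullptr;
    block.aligned = nullptr;
}
```

The caller works with `block.aligned` and must never pass that pointer to `delete[]`; only `block.raw` is valid for deallocation, which is why both are kept together.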

>> I have several structures which are
>> declared such that they are BYTE packed
>> #pragma option -a1)
There is a good chance the compiler will not make any effort to align these structures when they are allocated on the heap or anywhere else.  If you can remove the byte packing option, it may try to align the structure, but there is no guarantee, as this is entirely implementation-defined.
From the C++ draft standard

Discussing the minimum alignment requirements of an allocated memory buffer:

    The pointer returned shall be suitably aligned so that it can be
    converted to a pointer of any complete object type and then used to
    access the object or array in the storage allocated (until the
    storage is explicitly deallocated by a call to a corresponding
    deallocation function).
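
What that wording means in practice can be probed on a current compiler. The sketch below assumes C++11 (`std::max_align_t`), which is not available in BCB 4, and the function name is hypothetical. It only reports what the default `operator new` returns; it does not make the result any more aligned than the implementation chooses.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical probe: does the default allocation function return storage
// aligned for the most strictly aligned fundamental type?
inline bool new_result_is_max_aligned(std::size_t size) {
    void* p = ::operator new(size);
    bool ok = reinterpret_cast<std::uintptr_t>(p) % alignof(std::max_align_t) == 0;
    ::operator delete(p);
    return ok;
}
```

If a driver needs a stricter boundary than `alignof(std::max_align_t)`, this guarantee is not enough and the buffer has to be aligned by hand.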

nietod is right. Compiler's align option affects only the static/global data.

You have to allocate the greater buffer and align the pointer yourself.
Well, it depends on the compiler.  For example VC guarantees 16 byte or 32 byte (I can't remember which) alignment for default heap allocations.
sneeuw (Author) commented:

With all this information now at hand, it seems to me there is never a real guarantee that a normal allocation will deliver an aligned pointer.

I can of course write my own aligning code (e.g. overload new) but then I would need to allocate all data using this function (also the stuff I now allocate on the stack).

I have the impression I have some REAL experts in this discussion (far more than I can say about myself).  Therefore the following question, to see whether what I'm going to suggest will cause a noticeable performance penalty !??

All code talking to the driver that needs the alignment eventually goes through one single function.
If in that single function I do something like this :

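void IO_function (BYTE *Buffer, int Size, BYTE IO)
{
  BYTE *Aligned_Buffer = new_align(Size) ;
  if (IO == IN)
  {
    // IO == IN : copy the caller's data into the aligned buffer first
    memcpy(Aligned_Buffer,Buffer,Size) ;
    IO_Stuff_To_Driver(Aligned_Buffer) ;
  }
  else
  {
    // IO == OUT : let the driver fill the aligned buffer, then copy back
    IO_Stuff_To_Driver(Aligned_Buffer) ;
    memcpy(Buffer,Aligned_Buffer,Size) ;
  }
  // Aligned_Buffer must be released here by whatever matches new_align()
}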

Maybe allocate a fixed buffer on the stack, which this function can always use (initialised in constructor of class)

So, the question ...
Is this a severe performance penalty for IO that will happen every 10 milliseconds ??
Suggestions / thoughts / input ??

sneeuw (Author) commented:
new_align() of course being the code that allocates a properly aligned buffer !
>> (also the stuff I now allocate on the stack)
Providing alignment on the stack (locals) is usually extremely hard, so it pretty much isn't done.  At least you probably won't find a compiler that does more than the absolute minimum required by the hardware.  On x86 computers you can probably rely on WORD (2 byte) alignment for values that are larger than a byte.

I'm not sure I understand all the details of your solution.  But that extra copy of the data will take time (assuming one operand is not aligned, probably about 2 clock cycles per 32 bits to be copied).  If the data is short, that should not be too bad.  You might instead try an approach where the allocated buffer is made a little larger, by the alignment amount minus 1 (i.e. if the data must be aligned to a 32 bit (4 byte) boundary, the buffer has an extra 4-1 = 3 bytes).  Then there is guaranteed to be a position within the buffer that is suitably aligned, with sufficient space after it.  The function can fill in data starting at the aligned position and then return to the caller a pointer to the start of the data, i.e. the pointer returned may be the pointer to the buffer the caller specified, or it might be a pointer a little way into the buffer.

>> Maybe allocate a fixed buffer on the stack,
>> which this function can always use (initialised
>> in constructor of
A constructor can't allocate data on the stack that can remain after the constructor ends.  You pretty much have to use dynamic allocation for this.
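
One way to combine the points above is an object that dynamically allocates a single over-sized buffer in its constructor and reuses the aligned part of it for every I/O call. The following is a sketch under assumptions: `AlignedBounceBuffer` and its member names are hypothetical, it relies on C++11 (`std::align`), and the actual driver call is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical helper class: owns one heap buffer for its whole lifetime and
// exposes the suitably aligned region inside it.
class AlignedBounceBuffer {
public:
    AlignedBounceBuffer(std::size_t capacity, std::size_t alignment)
        : storage_(capacity + alignment - 1), capacity_(capacity) {
        void* p = storage_.data();
        std::size_t space = storage_.size();
        // std::align moves p forward to the first suitably aligned address.
        aligned_ = static_cast<unsigned char*>(std::align(alignment, capacity, p, space));
    }

    // Copying would leave aligned_ pointing into the other object's storage.
    AlignedBounceBuffer(const AlignedBounceBuffer&) = delete;
    AlignedBounceBuffer& operator=(const AlignedBounceBuffer&) = delete;

    // Copies caller data into the aligned region; returns the pointer to hand
    // to the driver, or nullptr if the data does not fit.
    unsigned char* stage_in(const unsigned char* src, std::size_t size) {
        if (size > capacity_) return nullptr;
        std::memcpy(aligned_, src, size);
        return aligned_;
    }

    // Copies data the driver wrote into the aligned region back to the caller.
    bool stage_out(unsigned char* dst, std::size_t size) const {
        if (size > capacity_) return false;
        std::memcpy(dst, aligned_, size);
        return true;
    }

    unsigned char* data() { return aligned_; }

private:
    std::vector<unsigned char> storage_;  // capacity + alignment - 1 bytes
    std::size_t capacity_;
    unsigned char* aligned_;
};
```

Because the buffer is allocated once, the per-call cost is only the `memcpy`, and callers remain free to allocate their own data however they like.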
sneeuw (Author) commented:

Since I'm allowing all sorts of modules to send commands to the function that actually talks to the driver and since all those modules may allocate memory as they want I 'm never sure what I will get.

I hoped to be able to specify some settings that had to be applied before building but that's now out of the question.

Bottomline, I need to make sure the buffer is aligned properly before I send it to the driver and that is then done in the IO function.

A possibility is to copy everything always in a properly aligned buffer even if it may take some time.
64 K would apparently take 17000 clock cycles so that's like a fifth of a millisecond on a 100 MHz system.
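
As a rough cross-check of that estimate, the arithmetic can be written out. Everything here is an assumption for illustration: the function names are made up, the copy is modelled as a fixed number of cycles per 32-bit word, and cache effects are ignored. At one cycle per word, 64 K comes to 16384 cycles (about 0.16 ms at 100 MHz), close to the figure above; at the two cycles per word mentioned earlier for an unaligned operand it would be about twice that.

```cpp
#include <cstddef>

// Cycles needed to copy `bytes` bytes at a fixed cost per 32-bit word.
constexpr std::size_t copy_cycles(std::size_t bytes, std::size_t cycles_per_dword) {
    return bytes / 4 * cycles_per_dword;
}

// The same estimate expressed in milliseconds for a given clock rate.
constexpr double copy_time_ms(std::size_t bytes, std::size_t cycles_per_dword, double clock_hz) {
    return static_cast<double>(copy_cycles(bytes, cycles_per_dword)) / clock_hz * 1000.0;
}

static_assert(copy_cycles(64 * 1024, 1) == 16384, "one cycle per 32-bit word");
static_assert(copy_cycles(64 * 1024, 2) == 32768, "two cycles per 32-bit word");
```

Either way the copy stays well below the 10 millisecond I/O period under these assumptions.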

And yes, making sure that the in-between buffer is properly aligned is done the way you describe.

It's a pity there is no golden trick with no performance penalty but I guess there never is !? ;-)

Nietod, you're of course right when you say I can't allocate that buffer on the stack in the constructor. Not thought through well !  My mistake (a mistake I wouldn't make while coding I think, but not thought through when I was writing the comment ;-)

> On x86 computers you can probably rely on WORD (2 byte) alignment for values that are larger than a byte.

I thought I saw BYTE alignment when allocating memory for packed structures !!??

Let's conclude this story ... unless someone else has a brilliant idea !??
>> Since I'm allowing all sorts of modules to
>> send commands to the function that actually
>> talks to the driver and since all those modules
>> may allocate memory as they want I 'm never
>> sure what I will get.
Perhaps you need to use a class to help allocate the memory.  There are probably other ways a class or classes can help the design too.

>> I thought I saw BYTE alignment when allocating
>> memory for packed structures !!??
Yes and no.  There are two issues: the alignment of the start of the structure, and the alignment of the items in the structure relative to the start of the structure.  On an x86 you are pretty much guaranteed that the start of a local structure will have WORD alignment.  (A compiler would have to go out of its way to not provide this, so it's pretty safe to rely on.)  So if a structure starts on a WORD boundary, any data member whose offset from the start of the structure is a multiple of 2 is also WORD aligned.  But when you BYTE pack a structure, many of its data members may sit at odd offsets from the start of the structure, so they will not be WORD aligned.
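
The difference between the two issues shows up in member offsets. The sketch below uses `#pragma pack(push, 1)`, which GCC, Clang and MSVC accept, as a stand-in for Borland's `#pragma option -a1`; the struct names and fields are made up for illustration, and the natural-layout offsets assume a typical platform where a 32-bit integer is 4-byte aligned.

```cpp
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
struct PackedRecord {          // hypothetical byte-packed layout
    std::uint8_t  kind;        // offset 0
    std::uint32_t length;      // offset 1: odd, so never WORD aligned relative to the start
    std::uint16_t checksum;    // offset 5: odd as well
};
#pragma pack(pop)

struct NaturalRecord {         // same members with default packing
    std::uint8_t  kind;        // offset 0
    std::uint32_t length;      // offset 4 on typical platforms (3 padding bytes before it)
    std::uint16_t checksum;    // offset 8
};

static_assert(sizeof(PackedRecord) == 7, "no padding between or after members");
static_assert(offsetof(PackedRecord, length) == 1, "directly after the 1-byte member");
static_assert(offsetof(PackedRecord, checksum) == 5, "directly after the 4-byte member");
static_assert(alignof(PackedRecord) == 1, "the struct itself may start at any byte");
static_assert(offsetof(NaturalRecord, length) % alignof(std::uint32_t) == 0,
              "default packing keeps members on their own boundaries");
```

Even if a `PackedRecord` happens to start on an 8-byte boundary, `length` and `checksum` still land on odd addresses.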
sneeuw (Author) commented:
One other 'extra' question.

When I allocate the memory I need each time (using a special new function, e.g. from another class) in the one function through which all communication goes, ... will that take up time ??

E.g. what I mean :

Does it take long to :

BYTE * Buffer = class_new(1000) ;


delete[] Buffer ;

Provided the class_new function doesn't take long.

E.g. also, how long does it take to :

BYTE *Buffer = new BYTE[1000] ;

Can this ever cause a performance penalty ?? or should I declare the buffer only once (globally) and each time use the already declared buffer ??

Hope this makes sense !?

I think I understand your question.  Are you asking whether it is costly (in terms of time) to allocate memory?

Allocating memory is fairly costly.  The memory is allocated from a heap (it doesn't have to be, but that is realistically the only way).  A heap is in many ways a linked list or binary tree, and the allocation requires that the tree/list be searched to find a block that satisfies the allocation; then the heap must be changed to satisfy the allocation, which involves altering other parts of the heap.  This takes a fair amount of time.

But you should always consider the 80-20 rule.  You need to worry about the bottlenecks in your program, not the bulk of the code.  If after doing an allocation you will read data from disk into the buffer, then the bottleneck will be the read operation, which will take 100s or probably 1000s of times longer than allocating the space (even for reading a few bytes).  In other words, you could allocate the memory, delete it, allocate it again and delete it again 100s of times, then do the read, and probably never notice the difference!  But if you are copying data already in memory into the buffer, then the allocation is likely to take much longer than the copy operation.  In that case the situation is reversed.  If you repeatedly allocate and delete, it will be noticeable; however, if you repeatedly did the copy (i.e. copied the same information 10 times in a row) you probably wouldn't notice the difference.

You really need to think carefully about where the bottlenecks are in the program, and use a profiler if possible, and optimize the bottlenecks.  Optimizing the rest of the code tends to be a waste of time and may make the code less safe and harder to manage.
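
When no profiler is at hand, a crude timing comparison can at least show which of the two variants matters. This is only a sketch using C++11 `<chrono>`; the function names are hypothetical, `size` is assumed to be greater than zero, and an optimizing compiler may remove or merge some of the allocations, so the numbers are indicative at best.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Variant 1: allocate and free a buffer on every round, then copy into it.
inline unsigned allocate_each_time(const unsigned char* src, std::size_t size, std::size_t rounds) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < rounds; ++i) {
        unsigned char* buffer = new unsigned char[size];
        std::memcpy(buffer, src, size);
        sum += buffer[i % size];   // use the data so the copy is not dead code
        delete[] buffer;
    }
    return sum;
}

// Variant 2: allocate once, reuse the same buffer for every copy.
inline unsigned reuse_buffer(const unsigned char* src, std::size_t size, std::size_t rounds) {
    std::vector<unsigned char> buffer(size);
    unsigned sum = 0;
    for (std::size_t i = 0; i < rounds; ++i) {
        std::memcpy(buffer.data(), src, size);
        sum += buffer[i % size];
    }
    return sum;
}

// Runs fn once and returns the elapsed wall-clock time in milliseconds.
template <typename Fn>
double elapsed_ms(Fn fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling `elapsed_ms` with a lambda around each variant, using the real buffer size and a realistic number of rounds, gives a first impression of whether the allocation is anywhere near the 10 millisecond budget.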
sneeuw (Author) commented:
Thanks !!!

Job well done ;-)
