Solved

How code organization is related to performance (MIPS architecture)

Posted on 2001-06-25
6
425 Views
Last Modified: 2010-04-15
I'm working on code for packet processing device running on MIPS processor using Windriver's VxWorks Operating System. Recently I started measuring the performance of the system. I found that minor code changes not related at all to the core function is causing big performance difference. Studying it further I found that adding dummy code of few bytes on certain locations in the code may hit the performance up to 25%.
I did not find any linkage to any compiler/linker switches or to the linking order.
I know MIPS is sensitive to code alignment but I'm sure the compiler / Linker are taking care of that.
Any Idea what can cause such a big difference in performance?

Appreciate your help,

Dan
0
Comment
Question by:Dan_Shule
6 Comments
 
LVL 5

Expert Comment

by:djbusychild
ID: 6226228
well, the easiest thing to do is look at the actual assembly code after you disassemble it and look at what it's doing. Where you put code, how your structures are packed, etc.. there can be many variants. Long time ago I did this for Alpha... write C code, disassemble it and analyze the assembly code to fine tune.
0
 
LVL 5

Expert Comment

by:djbusychild
ID: 6226230
If you haven't yet. You should def. read this book. Strongly recommended.

http://www.amazon.com/exec/obidos/ASIN/1558604286/o/qid=993525930/sr=2-3/ref=aps_sr_b_1_3/104-3865377-6940723
0
 
LVL 2

Accepted Solution

by:
Slordak earned 200 total points
ID: 6233463
Speaking generally, there are a huge number of things which can impact the performance of a given block of code, and the performance can in fact vary widely with just a few changes which don't influence the core algorithm.  It is true that with an efficient optimizing compiler, most of these can be addressed by the compiler (instruction scheduling, loop unrolling, register allocation), but there are still cases where this is not true.  The best architectures tend to be those where the compiler can work together with the processor to improve performance, such as by giving hints about which way a branch will typically be taken or by pre-fetching code which will be needed soon.

Some examples of areas in which minor changes can have relatively large impacts:

1) Cache sizes / Working sets - If the code and data for a given routine fits entirely in the L1 cache of the processor, it's easy to see how this will give the best performance.  Adding a few stray variables or chunks of code which push the size over the limit can result in having to fetch cache blocks in and out, in effect thrashing the cache.

  An interesting example of this... Let's say your data cache with LRU replacement policy can cache 32 elements of some array you're looping through.  If you have an array of size 32, you're in luck, the entire array is cached after going through the loop once.  However consider if you change the size to 33.  Now, when you hit the very last element, the first element is replaced.  Next you attempt to work with the first element, but it's been replaced, so you fetch the first element and replace the second element with it, etc. etc.  Now, nothing ever hits in the cache on each and every iteration!  Very bad.

2) Instruction & register scheduling - While admittedly this likely doesn't apply to any embedded MIPS processors, on "out of order execution" processors, having the right types of instructions available at the right times with minimal interdependencies greatly increases the average number of instructions which can be executed per clock.  The compiler usually takes care of this, thankfully, but it is possible to give bad hints even on sequential execution processors by doing things such as asking to have infrequently used variables always stored in registers (by declaring variables of type "register") when the register could more effectively be used for other values.

  It's more than likely that what you're seeing is in some way related to #1 above, either that you've crossed page boundaries by adding a few lines of code or that you've increased the size of the working set by just enough to greatly reduce the efficiency of the cache.
0
Maximize Your Threat Intelligence Reporting

Reporting is one of the most important and least talked about aspects of a world-class threat intelligence program. Here’s how to do it right.

 

Author Comment

by:Dan_Shule
ID: 6256658
Looking further into this I believe it has to do with the use of packed structures in the code.
Using packed structures eases handling of TCP/IP packets by enabling direct access to the various fields. It also improve the performance, for example, when preparing a packet to be sent, a block copy can be used instead of byte by byte copy.
It seems like the packed structures cause the alignment problem. I was sure the compiler is taking care of padding aneven structures...
Am I supposed to take care of the padding manually? How?
Is there any special treatment for packed structure that I'm missing?

Thanks
0
 
LVL 2

Expert Comment

by:ssr
ID: 6725306
Be careful.  All CPUs have different assumptions about how structures can/should be packed.  The compiler takes care of it, the best it can.  It depends on how optimized the compiler is for your particular CPU.  There are generally LOTS of rules you have to know if you want to manually optimize this in assembly language.

Perhaps it would be good to step back for a minute.

Here is how I would recommend you approach it:

1. What is the performance problem?  Too long to do a certain operation?  Too much memory used?  Is the CPU waiting for data, or is the rest of the program waiting for the CPU?

2. What are the exact steps to reproduce the performance problem?  You'll need to know this in order to verify that any enhancements you make actually work.

3. Where in the code does it look like the problem is?  Use comments and timing to figure out which blocks are running too slow or using too much memory.  On Windows, you can use GetTickCount() to see how long a particular block of code takes to run.

When you find the block (large or small) where the problem seems to be, then shift to the next part.  Diagram on your white board, how data moves, and how control goes to figure out what is really happening (not what you think is happening.  Things to look for:

* Is there a loop that gets executed many times (like millions or more)?
* Are you doing some searching?
* Are you doing some sorting?
* Are there large structures (linked lists, queues, etc.) being created and destroyed multiple times?

If you are having a problem in the search or sort code, then consider changing algorithms.  Maybe you started with a quick and easy bubble sort, but now you have so much data, that you need to change to a quicksort.  Maybe you are using a simple linear search, but now you need to upgrade to a hash table.  Maybe there is something you are doing repeatedly inside a loop that can be moved outside the loop, and the intermediate result saved.

In general, consider looking at things in this order, to maximize your time, and only fix what needs to be fixed:

1. Algorithms
2. Unnecessary memory allocation/deallocation
3. Unnecessary file I/O
4. Unnecessary repeated calls to same function
5. Loop optimization
6. Unnecessary parameter passing
7. Functions that can be moved inline, instead of having the overhead of a call.

Those are a few thoughts.

SSR
0
 

Author Comment

by:Dan_Shule
ID: 6949175
Your comments are interesting though It did not help to solve my specific problem.

Thanks
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Have you thought about creating an iPhone application (app), but didn't even know where to get started? Here's how: ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ Important pre-programming comments: I’ve never tri…
This tutorial is posted by Aaron Wojnowski, administrator at SDKExpert.net.  To view more iPhone tutorials, visit www.sdkexpert.net. This is a very simple tutorial on finding the user's current location easily. In this tutorial, you will learn ho…
The goal of this video is to provide viewers with basic examples to understand and use pointers in the C programming language.
The goal of this video is to provide viewers with basic examples to understand opening and writing to files in the C programming language.

744 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now