Embedded Programming using C++

Posted on 2003-03-16
Medium Priority
Last Modified: 2016-05-20
I had been reading up on many books about robotics.  Building the platform and other physical parts of the robot are not hard, but can be accomplished.

I have one huge problem, however.  A robot cannot run without a brain, right?  So that's where embedded micontrollers come into play.  My problem is I don't know how to even start programming them.  I don't want to waste huge amounts of money buying the BASIC program for PIC Microcontrollers or buy Parallax Stamps that also uses the BASIC programming.  I have some background in C++.  I am taking my 2nd semester in C++.  I'm learning object orientated programming.  I heard from someone that it'll help me with the embedded chip programming, but I'm not sure how it will help me.

Getting to my main question, how do I even start to program embedded microchips like the PIC16F84 or the PIC16F627?  If you know how to program other microcontrollers, please answer as well.  If there are books out there, can someone suggest one or more?  I already have one called "Programming Robot Controllers", but all the programming is done in C.  I'm learning C++ and the coding in C is a bit different.  

If possible, can someone include a tutorial or instructions on how to start programming a microcontroller in C++?  Or sample codes on how to program a microcontroller.  Please explain the code, if possible.  If there are any internet links that relate to my question feel free to post it here.  If anyone has any experience with embedded chip programming, please help.  For the person that has experience in embedded chip programming, and doesn't mind, can I have your email address, considering this won't be the only obstacle block in my way.  Thank you.
Question by:Hailfire
  • 2
LVL 12

Expert Comment

ID: 8151343
First off, I suggest you check out if GCC can be used to generate code for your microcontroller. If it can then this is the easiest way, you can generate a cross-platform compiler that runs under CYGWIN or Linux and can generate code for that microcontroller.

You probably also need to get a linker program which can take compiler output and create executables. The GNU ld should work fine for this purpose.

The next thing you need is some kind of loader program which can read that executable and write the code and data to specified place in the controller's memory. How this communication is done vary from controller to controller and also how you build the hardware around it.

One way that might work in your case is to simply have a serial line connection on that controller and then have a loader program that send the data in some specified format over the serial line. In that controller you should then probably have some ROM code (firmware code) that is such that it receives the data in the specified format and then places the data in memory at the specified location(s) and then when the loading is done set the instruction pointer register of the microcontroller to point to the first instruction to execute. Possibly also send some acknowledge over the serial line when it is done so that the other end (the loading program) will know that the data has been received without errors.

You should also use some CRC-16 or CRC-32 scheme or something to verify that the data has been transmitted ok.

Most likely the controller have a very small mini OS that in your case can be as simple as just the bare C/C++ library that you want to support and implementation of the functions you require. Some minimal file system support. You probably don't have a disk on the robot but you probably have some arms and legs and whatever can move around and each independent unit can probably be represented as a "file". The serial line connection that was used to download the program can also be used by the program to read additional data after it, in this case you probably should arrange that the loader program after sending the executable file over should be ready to send over any data that the program requests. Then when you disconnect that connection, the robot is on its own - or at least the serial connection isn't available any more. It might still have an IR connection for example. Or perhaps you used an IR connection to upload the executable to it. In that case you don't have any serial connection.

Either way, the loading program need to communicate with the CPU in the robot in some way or another, this also means that it must be able to send it interrupt and that interrupt can indicate that it wants to send a new program to run. The mini-OS or the ROM code (firmware) can then receive the new program and replace the old program with the new.

Some simple interrupt handling and some simple I/O is therefore needed.

Once you have those set up, I believe GCC should be able to provide you a good cross platform which you can sit on Linux or Windows or whatever platform you choose to develop your program. If you're smart you also write or get a simple emulator program so that you debug and test your code on Linux/windows/whatever before you upload the program to the robot.


Author Comment

ID: 8176453
Didn't know what happened, I have questions on the things you posted.  I read up on the GCC.  Isn't it just a compiler created by GNU?  Also on the site it has G++ which is associated with C++.  I do have microsoft visual studio.net.  I can write C++ code, compile, and link the code there.  So can I use that to write code for embedded chip programming?  

When you're programming for the chip, how can one associate the code to the pins in the microcontroller?

What is CRC-16 or CRC-32 scheme?

When C++ code is compiled and run.  Looking at the code, it just runs down in one direction.  In a robot, there are many things that need to run at a single time, and it's constantly scanning with its sensors so that it does not collide into anything.  So how would you program in C++ the many tasks that are to be done by the robot?  By using interrupts?
LVL 12

Accepted Solution

Salte earned 200 total points
ID: 8179993
Yes, GCC is a compiler created by GNU.

However, it is a generic compiler. It can generate compiled executable for Intel, sparc, alpha, etc etc.

It has certain restrictions to what kind of CPU it likes. The CPU must be at least 32 bit. I.e. sizeof(int) == 4 is minimum so a 16 bit or 8 bit CPU won't work.

In addition if you want to make gcc generate code for a CPU it doesn't already have code generator for you have to define the tables used by the code generator to generate the code.

However, the great thing is that you can sit on an intel or some other regular CPU on your workstation and compile code that will run on your embedded CPU and that is exactly what you can do in your case. I.e. you can generate a compiler that runs on a PC but generate code for some other CPU - such as the CPU that you have in your robot.

What the compiler then produces is an assembly file for that CPU. You will typically feed that assembly code into some assembler (for example gas) which can also be a cross assembler so that it runs on your PC but generate a .o file that contain instructions for your robot CPU. You can then even use a linker to link together all code and produce an "executable". This executable will of course not run on the PC (unless you have some emulation software) but if you feed it to some loader program on the robot it can load it into memory of the robot and execute it.

This loader program - at least a primitive version of it - must exist in ROM and is probably best developed using emulation software, so that you know it will work when you make the ROM chip.

All CPUs have a special signal input which is a 'reset' where the CPU is placed in a known state and the instruction pointer register is pointing to a specific location of RAM or ROM and then instructions will start to go. Typically since the reset is supposed to invalidate the RAM and treat it as all garbage you will typically arrange it so that this specific location points to a ROM address.

The code at that ROM address will then - for a robot CPU - typically do something like this:

step 1. Sanity check - check that everything works the way it is supposed to work. I/O ports etc.

step 2. Wait for input from an external connection. This external connection is supposed to feed it with software to run and until it get that software it will continue to wait.

step 3. When input arrives it is typically a form of "executable" which is essentially a file or stream of data that looks something like this:

first a header. The header will probably give important information as what is the value of the first code address of this file. At what address in RAM should the file's segments be stored. (the base address) etc etc.

Some executable formats allow a file to be stored 'anywhere', the code is so-called 'position independent code' while other formats require the executable data to be stored at an exact specific address. Since you have complete control of the robot CPU you might make it easy on your self and just arrange that the file is always stored starting from address X where X is an address in RAM (not ROM) which has room for the program's segments.

After the header the segments typically follow one after the other. Most compilers and systems count on at least the following kind of segments. The names may vary I use the names as found on Unix:

This is the segment that contain the program code. The instructions. The compiler generate instructions and places them in .text segments and the linker will combine all the .text segments of the .o files together into one .text segment of the executable file. The start address should typically point to some place in this segment. If you make the code always start at a fixed address the linker will know the address of where the .text segment will be stored in memory and will resolve all references based on that assumption. If you make position independent code it will not make any such assumption.

This is the segment that contain space for all global variables. Static variables (class static, function static and static global variables) are also included in this segment. There may be two .data segments actually, if so the .data segment only contain those global data that can be modified while the constant variables and read only data (string literals, const arrays etc) are placed in a separate .rodata segment. Some systems have no .rodata segment and put them all in .data. In a small CPU you will probably not have paging or anything so splitting in .data and .rodata won't make much sense. It is only used in systems that run some form of paging system so that the .rodata segment can have pages that are marked as 'read only' and any instruction that attempt to modify the data will be trapped.

This segment contain room for stack, heap and data that are all initialized to 0. The point is that this segment isn't stored in the file as the previous two segments. Instead only the size of this segment is stored in the file and so the loader when loading the program into RAM will just zero out n bytes (n is the size of the segment) and say here is the .bss segment.

This means that when loaded into RAM the program will typically look like this (some systems might put .data segment before .text but I don't know of any such system, I think most put code before data):

Here is all the code
Here is all the data
First part is any data that have 0 initialization and is of such size that it was placed in .bss instead of .data.
After that comes the start of the heap.
After that comes the stack.

Note that some systems won't place stack and heap in .bss. Typically big OSes (Linux, Win2000, and similar systems) will instead just let the stack and heap be allocated in any free location of the process' virtual memory space and so the executable file doesn't even declare space for the stack.

MS-DOS on the other hand puts the heap (at least one of them - it has several heaps) and stack in the .bss segment. MS-DOS also uses segment registers so SS is set to point to .bss while DS and ES is set to .data and CS is set to .text segment (well, to be honest the exact info here depends on the model, I will not go into the details here).

So, your ROM code should essentially read the file, decode the header and then place the .text segment at one part of RAM, the .data segment after it and the .bss segment after that. It should then probably initialize some registers so that the program knows where to find the data etc. It is possible that the program was compiled with the assumption that one register was supposed to point to where the stack starts so whereever this loading program figure out the stack should start that is the value it sets that register to etc.

Finally it then do a jump to the start address of the program (the start address is also typically found in the header).

You can think of other ways of doing this though. For example in Unix the start address is the function named main() and it is a function so one way you could start is to simply look at the executable's symbol table (typically an additional segment which isn't loaded into memory per se but which is in the file and have a table of all the external symbols of the program) and find the symbol 'main' and then do a CALL instruction to the address of that symbol. However, I would imagine that it is easier if you simply let the linker place that symbol's value into the header some place (the linker must of course know all the external symbols and their value).

Then when you do a JUMP or CALL (all CPUs have that kind of instructions) to the starting point the program is executing. Typically when the program exits you clean up and then go back to the ROM code that waits for another program from the same I/O port.

CRC-16 and CRC-32 are two systems that are used to ensure that data has been received correctly. The idea is that when you send data over some IR channel or some cable it is possible that you send one data value over but the receiver read a slightly different value. CRC-16 and CRC-32 are systems made to detect when that happens and the receiver can then say "please send me those data again, I got error last time you tried to send them to me". The sender can then resent those data until they come over correctly.

It is simply a scheme to prevent that data gets corrupted when sent over some form of communication channel. A "communication channel" is here very generic. A .zip file uses CRC-32 to ensure that when it unpacks the file, it is not corrupted compred to when it originally packed it. TCP/IP uses CRC-32 when you send a packet over the network to make sure that the header gets over without corruption.

CRC-32 stands for "Cyclic Redundancy Check 32 bits" and is a very popular scheme for doing this. Part of the reason for its popularity is that it is fast and compact. You can send a megabyte of data and then send a 32 bit value and if that megabyte of data has been corrupted you will most likely detect it like 99.99999997671694% of the times.

True, a 32 bit value can only hold 4 billion different values and so it is thinkable that you get a corruption which happen to be such that it would generate exactly the same CRC-32 value but this is extremely unlikely, the chance is very close to pow(2,-32) a very small number, so the chance I mentioned above can be written as:

(100 * (1-pow(2.0,-32))) percent which is 99.99999997671694 percent.

Here is how CRC-32 works. The formal description may appear to be a very slow algorithm but the beauty of CRC-32 is that you can implement it using a simple lookup table of 256 entries where you just translate a byte by lookup instead of doing all the shifting and XOR-ing.

The math behind it is simple enough. First, let me introduce the ring Z2. This is a set of only two different number (0 and 1) and it is a ring, i.e. you have the operations + and * defined over the ring. We also have subtraction which is the opposite of + and we even have division defined in this ring. Addition is easy enough:

0 + 0 == 0, 0 + 1 == 1, 1 + 0 == 1 and 1 + 1 == 0

I.e. it is exactly the same as regular binary addition except that we ignore carry. It is also the same as the XOR operation so that is why there is XOR-ing involved when we work with CRC-32.

Note that 0 + 0 == 0 and 1 + 1 == 0, since 0 and 1 are the only two values here it means that x + x == 0 for all x. This means that x == -x and it also means that x - y == x + y. so subtraction and addition is the same thing!

Multiplication is also immediate and is the same as AND since 0 * 0 == 0, 0 * 1 == 0, 1 * 0 == 0 and 1 * 1 == 1.

Division is also possible but not really interesting, we will divide but not the plain numbers of Z2.

Now, let's introduce the set of polynomials over Z2, a polynomial over Z2 is a polynomial where the coefficients are elements of Z2, so they're either 0 or 1.

A(x) == A[n]*pow(x,n) + A[n-1]*pow(x,n-1)+...A2*x*x+A1*x+A0

Here all the A[k] are elements of Z2 and are either 0 or 1.

Now, if you have such a polynomial you can divide this polynomial with another polynomial B(x) over Z2. This will give you a quotient and a remainder just as in regular division:

A(x) == Q(x) * B(x) + R(x)

If B(x) is a polynomial of degree m then the polynomial Q(x) is of the degree n-m and R(x) is of degree < m.

However, we don't really care that much about the polynomial Q(x), the one of interest is R(x).

What happen if we compute A(x) + R(x)?

Specifically what happens if try to divide A(x) + R(x) with B(x)?

A(x) + R(x) == Q(x) * B(x) + R(x) + R(x)

However, all the coefficients are elements of Z2 so x + x == 0 so R(x) + R(x) == 0 and so this is Q(x) * B(x) so A(x) + R(x) divides evenly with B(x) or B(x) is a factor of A(x) + R(x).

So, if you have a stream of bits that you want to send over to receiver, what you do is to compute R(x) and you send over A(x) + R(x) instead of just A(x). The receiver can then do the same algorithm and divided the data he receives with B(x) and if he get a remainder different from 0 he knows the data has been corrupted.

For this to work the polynomial B(x) cannot be any polynomial but is carefully chosen. It isn't only one particular, there are several polynomials and it's not like one doesn't work at all while another works perfectly and there's none in the middle. It is a gradual so some are better than others. However, the CRC-32 means that B(x) is of degree 32. Since it is of degree 32 the first coefficient must be 1 and so usually aren't stored but is implicitely given. The other 32 coefficients of the 32 degree polynomial (A second degree polynomial has 3 coefficients) is typically stored in a 32 bit constant int which is part of the algorithm's parameters.

For every polynomial B(x) which is of degree 32 and which can be used for this algorithm there is a different CRC-32 algorithm. So The polynomial is here an important input and just knowing that some data are "CRC-32" is therefore not enough to calculate the CRC for the data, you have to know the specific polynomial. Both .zip files and TCP/IP are both CRC-32 but I think they use different polynomial for example.

CRC-16 is exactly the same but it uses a polynomial of degree 16 instead of 32.

Another thing to note, if you consider data to be a sequence of 32 bit quantities, then you can consider the whole stream of 32 * K bits you want to send as one big polynomial over Z2 of degree 32 * K:

A(x) == A[32*K]*pow(x,32*K)+A{32*K-1]*pow(x,32*K-1) +
        + A[2]*x*x + A[1]*x + A[0].

What happens if you send one more 32 bit word? it is simply the same as if you multiply the polynomial with pow(x,32).

So if we let A(x) instead be of degree (32*K+32) we get:

A(x) == A[32*K+32]*pow(x,32*K+32)+....
        ... + A[2]*pow(x,34) + A[1]*pow(x,33) +

Now if we compute the remainder for this polynomial we get:

A(x) == Q(x)*B(x) + R(x)

and we can now add this new polynomial to A(x):

A(x) + R(x) == Q(x)*B(x) + R(x) + R(x) == Q(x) * B(x)

So this gives the scheme for computing and detecting CRC errors:

One, initialze the 'CRC' slot with 0, compute the CRC using the value 0 for lower 32 bits. Then instead of sending the lower 32 bits we send the computed CRC bits instead.

The receiver just receives the bits and compute the CRC using the same algorithm and if he doesn't get a CRC value of 0 he will complain.

The actual computation involves a long division of the huge polynomial which is all the bits of the original data strung together in one big polynomial. The good side, each of the coefficients here are members of Z2 and are really easy to compute.

We just simply do a regular division of polynomials keeping in mind that the coefficients are members of Z2 and we therefore uses XOR instead of add or subtract and we use AND instead of multiply, division is also very easy. Divisor cannot be 0 and so the only value you can divide by is 1 and dividing x by 1 always give x as result and 0 remainder.

It is also clear that since we look at the stream of data as a stream of bits the algorithm also depends on how we combine the bits together, which bit we consider to be the first (most significant coefficient) and which bit we consider to be the last (least significant coefficient), so if we consider MSB or LSB of a byte first and byte order etc are also important when computing the CRC and it is important that both receiver and sender uses exactly the same CRC algorithm in all respects.

In the example below I will use a polynomial of degree 4 as my B(x) so this is a CRC-4 algorithm. You probably cannot use that in practice but it shows the principles involved. A CRC-32 algorithm is simply the same but with a 32 degree polynomial (and you will probably have lots more data to send).

Polynomial B(x) = pow(x,4) + x*x + x + 1

Data to send: 1110 1011 0010 1101

This correcponds to a polynomial:

pow(x,15) + pow(x,14) + pow(x,13) + pow(x,11) +
pow(x,9) + pow(x,8) + pow(x,5) + pow(x,3) + x*x + 1.

Let me use the notation Pn to mean pow(x,n) so the polynomial B(x) == P4 + P2 + P1 + P0.

The polynomial A gets four 0 bits added (multiply by P4) so it becomes:

A(x) == P19 + P18 + P17 + P15 + P13 + P12 + P9 + P7 + P6 + P4

We then want to do the long division of A(x) by B(x)

This means that we want to find a power which is such that when you multiply B(x) by Pk we get a value equal to P19, obviously P15 is such a value and so the first bit of the coefficient would be P15, we don't really care about the quotient Q(x) but we need to store this value long enough to compute the remainder polynomial R(x) which is our CRC value.

Q(x) == P15 + V(x) where V(x) is of degree < 15.

B(x) * Q(x) == B(x)*P15 + B(x)*V(x) == P19+P17+P16+P15 + B(x)*V(x) since V(x) is of degree less than 15 B(x) * V(x) is of degree less than 19 and so P19 is the only coefficient for P19 which matches exactly the coefficent for A(x) and we can therefore subtract this from A(x). Since we are about to destroy the bits of A(x) it is important that we send them off before we destroy them.

Actually, you can compute this without actually destroying the polynomial A(x) but it doesn't really matter since you hardly ever (in software) program it like this. Instead you make lookup tables and just take the first byte of data and  combine with the "CRC so far" and do a table lookup to find how the CRC will change due to the bits of A(x). However, in order to generate those tables you have to write code that actually do the shifting of bits and XOR-ing and AND-ing.

So we find the Q(x) == P15 + some polynomial of degree less than 15 (called V(x) above).

A(x) - P15*B(x) will then give a polynomial of degree less than 19 (one bit less) and which is such that the remainder of that new polynomial when divided by B(x) is the same as the remainder of the original polynomial A(x).

However, if you were to send the data this way you will easily see that A(x) - P15 * B(x) == A(x) + P15*B(x) will affect not only bit P19 but also affect bit P15 so before you do this modification you must have already sent the bits P19..P15. Once you've sent those bits you can do this, it will destroy A but only destroy the top 4 bits (which you already sent) and you will get a new polynomial D(x) such that D(x) == V(x)*B(x) + R(x) and have exactly the same remainder as the original polynomial A(x), you can then send off bit 14 and then repeat this process until you have sent off the whole polynomial. However, you never send off the 4 lowest bits (they are all 0) since you send the bits from R(x) instead.

The receiver just receive all bits and as soon as it has bits 19 to 14 it can start to divide this by B(x) and use the bit from the polynomial Q(x) and subtract A(x) - P15*B(x) and so on. Of course, receiver must do this on a copy since it would otherwise destroy the message.

However, you don't actually have to destroy the message even in a bit by bit algorithm. It is in this case only 4 bit at a time that are 'destroyed' so if you have a separate variable holding those 4 bits you can ignore 4 bits of the data and use those 4 bits instead to make your 'destroyed polynomial' and do the division on that instead of the actual polynomial. In this way you can do the division and calculate the remainder without ever destroying the original data. You will lose the quotient polynomial since you reuse the 4 bit data storage for every new bit but that doesn't matter since we aren't interested in Q(x) anyway. We only care about R(x) and that is kept. In fact that very same bits that are used are exactly the remainder polynomial when the process is done:

When you do a long division you rely on the fact that:

a == bq + r

if a is split in several parts, so you have a == a1 * N + a2 where N is some large value and 0 <= a2 < N then you get:

a == bq + r == a1 * N + a2 and you get:

if a1 == b*q1+r1 then you get:

a == a1*N + a2 == (b*q1+r1)*N + a2 == b*q1*N + (r1*N+a2)

and so you get

r1*N+a2 == b*q2 + r2

In this q2 must be 0 <= q2 < N (since r1 < b and so r1*N < b*N and since a2 < N we get that r1*N + a2 < b*N + N and it follows that 0 <= q2 < N. If q2 >= N we would have b*q2 >= b*N and it would be greater than r1*N+a2.

So r1 which is CRC so far is the one we combine with the later bits of a (a2) to compute the new CRC so far (r2) and when this process is done the "CRC so far" is the final CRC value.

Thus you don't actually have to destroy the message A(x) at all as long as you take it in chunks of n bits where n is the degree of the CRC polynomial B(x).

Hope this explains it, if you do a search on the net for CRC you will find lots of details and also example polynomials as well as exact parameters for existing CRC algorithms used in various places.

Of course, I should also mention that in addition to the polynomial and orientation (most significant bit is first or least significant bit is first) there are also other parameters that control the exact algorithm. Details about that can be found around the net.


Expert Comment

ID: 9502130
No comment has been added lately, so it's time to clean up this TA.
I will leave a recommendation in the Cleanup topic area that this question is:

Answered by: Salte

Please leave any comments here within the next seven days.


EE Cleanup Volunteer

Featured Post

Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

  Included as part of the C++ Standard Template Library (STL) is a collection of generic containers. Each of these containers serves a different purpose and has different pros and cons. It is often difficult to decide which container to use and …
Introduction This article is a continuation of the C/C++ Visual Studio Express debugger series. Part 1 provided a quick start guide in using the debugger. Part 2 focused on additional topics in breakpoints. As your assignments become a little more …
The goal of the tutorial is to teach the user how to use functions in C++. The video will cover how to define functions, how to call functions and how to create functions prototypes. Microsoft Visual C++ 2010 Express will be used as a text editor an…
The viewer will be introduced to the technique of using vectors in C++. The video will cover how to define a vector, store values in the vector and retrieve data from the values stored in the vector.

615 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question