
C handling of HTML forms

GSD4ME asked
I'm a first-timer trying to develop a web page and have got so far in the development cycle before running out of knowledge.
I can create HTML forms on my web page and have figured out how to submit the data back to my CGI program on the ISP.

However, what I don't understand is in what form the data arrives at my program. I want to write the CGI program in 'C' (I don't know much else I'm afraid and am too old to handle anything new like Java or Perl!) and need to extract the form data sent to me.

Secondly, when I have got stuff to send back, how do I do it to update the form? I think I understand that I need to send back HTML 'commands' to my web page, but how does my web page 'know' what to do with the data sent? Does the page autorefresh when it receives the HTML? And how do I ensure that the data I am sending back to my page goes in the correct areas of my page?

How you retrieve the data submitted by a form depends on the method used.

If you used the GET method to submit the data, you can get the data from the environment variable 'QUERY_STRING'. It holds everything that comes after the first question mark '?' in an address.

For example, if your program is at this address:


and the user calls this address:


Then you'll have the string 'stuff' in the QUERY_STRING environment variable.

Back to GET forms: when you make a GET form with, for example, a username and a password field and submit it, the browser will send the user to the following address:


In other words, you'll get the following in the QUERY_STRING environment variable:

username=USERNAME&password=PASSWORD

Where username and password are the names of the two fields, and USERNAME and PASSWORD are the values that the (hypothetical) user entered.
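
As a rough illustration of the GET case, here is a minimal sketch in C. The function name read_get_data is made up for this example; the only real interface it relies on is getenv() and the QUERY_STRING variable described above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies the GET data into buf.
   Returns 0 on success, -1 if QUERY_STRING is not set
   (e.g. the program was run from a shell) or does not fit in buf. */
int read_get_data(char *buf, size_t size)
{
    const char *qs = getenv("QUERY_STRING");

    if (qs == NULL || size == 0 || strlen(qs) >= size)
        return -1;
    strcpy(buf, qs);   /* safe: the length was checked just above */
    return 0;
}
```

A CGI program would typically call this once near the top of main() with a fixed-size buffer and print an error page if it returns -1.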

Let's talk now about the other method: POST.

When the information is sent using the POST method, the data is available to the program on the standard input. The data is formatted in the same way as with the GET method:

username=USERNAME&password=PASSWORD

or whatever (according to your form and data).

To help you decide how much memory to set aside for the data, there is the 'CONTENT_LENGTH' environment variable, which is supposed to contain the length of the submitted data. You can use it as you see fit. Be aware, however, that 'CONTENT_LENGTH' is UNRELIABLE. The length is calculated and provided by the user agent (the browser) and passed as is to your program. A buggy browser or a cracker could send an inaccurate length. If it is too small, you reserve less than the content needs, which can lead to buffer overruns; if it is very large, your program may reserve memory until it runs out of RAM completely.
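
One way to guard against both problems is sketched below. The function name read_post_data and the MAX_POST_BYTES cap are assumptions made for this example, not part of CGI itself. The idea is to reject a missing, malformed or oversized length, and never to read more bytes than were allocated.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_POST_BYTES 8192   /* hypothetical cap; pick one that suits your form */

/* Hypothetical helper: reads the POST body from `in` (stdin in a real CGI
   program). Returns a malloc'd, NUL-terminated string that the caller must
   free, or NULL if the length is missing, malformed or too large. */
char *read_post_data(FILE *in, const char *content_length)
{
    char *end;
    char *buf;
    long len;
    size_t got;

    if (content_length == NULL)
        return NULL;
    len = strtol(content_length, &end, 10);
    if (end == content_length || *end != '\0' || len < 0 || len > MAX_POST_BYTES)
        return NULL;

    buf = malloc((size_t)len + 1);
    if (buf == NULL)
        return NULL;
    got = fread(buf, 1, (size_t)len, in);   /* never reads more than len bytes */
    buf[got] = '\0';                        /* terminate at what actually arrived */
    return buf;
}
```

In a CGI program the call would look like read_post_data(stdin, getenv("CONTENT_LENGTH")).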

With all that said, you should now know how to collect the data from a GET or a POST request.

Now comes the hard part.

The data that you receive must be parsed before you can make use of it!

As you must have noticed, the data is sent from a form to a program in an encoded form, like this:

name=value

where name is the name of the field, and value is the value of the field. For more than one field:

name1=value1&name2=value2

As you can see, different fields are separated by an ampersand '&'. Now what if you need spaces in the sent data? It'll look like this:

name=John+Smith

That is for a person called (John Smith). As you can see, all spaces are turned into plus signs '+'.

The last (and worst) part of parsing an encoded address is the % encoding, like this:

name=John%20Smith

The above is another way to put a space in a query string, using the % encoding. It is the percent sign '%' followed by the two characters forming the hexadecimal value of the entered character. In the above example the character was the space ' '. In the ASCII table, a space has code 32. 32 in hexadecimal is 20. So the space is represented in the address as '%20'.

So, the four steps of parsing an encoded form data string are:

1) Separate fields by '&'
2) Separate key from value by '='
3) Convert all '+' signs to spaces
4) Convert all '%HX' to characters with values equal to the decimal equivalent of the hex 'HX' number.
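
The four steps above can be sketched in C roughly as follows. The names hex_value, url_decode, parse_form and struct form_field are invented for this example. The sketch modifies the input string in place, which works because a decoded string is never longer than its encoded form.

```c
#include <stdio.h>
#include <string.h>

struct form_field { char *key; char *value; };   /* hypothetical type */

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Steps 3 and 4: decode '+' and '%XX' in place.
   Note that "%00" decodes to a NUL byte and so ends the string early. */
void url_decode(char *s)
{
    char *out = s;

    while (*s != '\0') {
        if (*s == '+') {
            *out++ = ' ';
            s++;
        } else if (*s == '%' && hex_value(s[1]) >= 0 && hex_value(s[2]) >= 0) {
            *out++ = (char)(hex_value(s[1]) * 16 + hex_value(s[2]));
            s += 3;
        } else {
            *out++ = *s++;   /* ordinary character, or a malformed '%' kept as is */
        }
    }
    *out = '\0';
}

/* Steps 1 and 2, then decoding of each piece. Stores up to max_fields
   fields (pointing into data) and returns how many were stored. */
size_t parse_form(char *data, struct form_field *fields, size_t max_fields)
{
    size_t count = 0;
    char *field = data;

    while (field != NULL && count < max_fields) {
        char *next = strchr(field, '&');     /* step 1 */
        char *value;

        if (next != NULL)
            *next++ = '\0';
        value = strchr(field, '=');          /* step 2 */
        if (value != NULL)
            *value++ = '\0';
        else
            value = field + strlen(field);   /* no '=': empty value */
        if (*field != '\0') {                /* skip empty pieces such as "&&" */
            url_decode(field);               /* steps 3 and 4 */
            url_decode(value);
            fields[count].key = field;
            fields[count].value = value;
            count++;
        }
        field = next;
    }
    return count;
}
```

The order matters: splitting happens before decoding, because an encoded '%26' inside a value would otherwise turn into a '&' and be mistaken for a field separator.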


Now, when a user submits a form, the browser actually takes the user to a different page, which is your program. So your program has to print THE WHOLE HTML page that will show up on the user's screen, with each and every HTML tag.

Printing works as usual, using printf() and the other standard output functions.

You must also note that the first part of what you print must be the page header, followed by an empty line (\n\n) then comes your HTML page. The only required part of a page header is the content type of your page. For HTML pages it is text/html. So before you print any HTML content, make sure that you print this:

Content-type: text/html

It is usually a standard line at the beginning of all CGI programs:

printf("Content-type: text/html\n\n");

(Watch the capitalization in Content-type)
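
Putting the header and the page together, a reply could be printed along these lines. This is only a sketch: print_escaped, print_reply and the page content are made up for the example. The escaping step is an addition not discussed above; it is there so that a submitted value containing '<' or '&' is displayed as text instead of being interpreted as HTML.

```c
#include <stdio.h>

/* Hypothetical helper: writes text to out, replacing the characters
   that are special in HTML with their entity equivalents. */
void print_escaped(FILE *out, const char *text)
{
    for (; *text != '\0'; text++) {
        switch (*text) {
        case '&': fputs("&amp;", out);  break;
        case '<': fputs("&lt;", out);   break;
        case '>': fputs("&gt;", out);   break;
        case '"': fputs("&quot;", out); break;
        default:  fputc(*text, out);    break;
        }
    }
}

/* Hypothetical helper: writes the header, the empty line,
   and then the whole HTML page. */
void print_reply(FILE *out, const char *username)
{
    fputs("Content-type: text/html\n\n", out);
    fputs("<html><head><title>Result</title></head><body>\n", out);
    fputs("<p>Hello, ", out);
    print_escaped(out, username);
    fputs("</p>\n</body></html>\n", out);
}
```

In a CGI program the call would be print_reply(stdout, value), where value is one of the decoded form fields.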


I guess that'll be all! :)


That is VERY comprehensive and VERY helpful.

What I find most surprising is that to 'refresh' the page I have to send the WHOLE of my page back as HTML.
I guess that I am going to have to redesign my web pages to make them simpler, as the form resides on a page that has lots of other things associated with it - instructions on what to do, HTML form items etc. etc.!

Can I easily create a sub-page that displays the data sent back from the ISP/my program rather than 'refreshing' the original?


There are a couple of ways to do that ..

One way is using frames, but I'm not sure if the page will look exactly the way you want it. From what I understand, it is acceptable for you to change the layout of the page. In that case frames, as I said, will do fine. But they won't work on (very) old browsers.

An 'iframe' element might look more like what you want, but it started out as an IE extension and very old browsers (such as Netscape 4) do not support it.

There is also another way that is server side, so there is no need to worry about browser compatibility. It is called SSI, Server Side Includes. It has a very easy, HTML-like syntax that enables you to include the output of a CGI program directly inside your web page. You might then write a CGI program that outputs the HTML for the form when the page is simply viewed, and prints the real output when the form data has been submitted. However, I'm not sure exactly how that is done. You might want to search Google for SSI or go directly to the Apache docs ( http://httpd.apache.com/ ) and search there.

BTW, if the form only does a specific and small job then it's probably much easier and more efficient to use a scripting language like PHP.

PHP is very similar to C in general concepts and has a very similar library of functions.

You might even get a full working piece of PHP code just by asking for it in the PHP section here in EE.


Thanks. Am starting to implement the ideas on my web site but it may take a while!
