C Programming: substring extraction

I am trying to create an HTTP GET request, which requires the following format:
GET {Request-URI} HTTP/1.1
Host: {host_name}

If I have a string such as "http://www.csun.edu/~steve", how do I extract the {host_name} part (http://www.csun.edu), and how do I extract the {Request-URI} part (/~steve)?

I need a strategy that will do this dynamically. In other words, the next string might be "http://www.cnn.com/money", or no directory structure at all like "http://www.google.com".
Bill Bach (President) Commented:
You can do this via brute force, or through the use of a parser.  I'd do something like brute force, if you know that all input is a valid URL.  The logic would look like:
1) Detect and strip off the "http://", as this is not part of the host name.
2) Set HostName and Request strings to "".
3) Copy characters off to the HostName string until you find the next "/" or EOL.  
4) If EOL, exit and return strings as needed.
5) Copy from current position to EOL to Request string.
6) Exit and return strings as needed.

Can you create the working code from this description?
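For comparison, here is a minimal sketch of how the steps above might look in C. The function name split_url and its parameters are hypothetical, not from this thread. It also assumes that a URL with no path should produce "/" rather than an empty request string, since an HTTP/1.1 request line needs a non-empty path; that is a small departure from step 4 as written.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the thread): splits url into a host part
 * and a request part. Returns 0 on success, -1 if a buffer is too small. */
static int split_url(const char *url, char *host, size_t host_size,
                     char *request, size_t request_size)
{
    const char *prefix = "http://";
    size_t prefix_len = strlen(prefix);

    /* Step 1: skip a leading "http://" if it is present. */
    if (strncmp(url, prefix, prefix_len) == 0)
        url += prefix_len;

    /* Step 3: the host runs up to the next '/' or the end of the string. */
    size_t host_len = strcspn(url, "/");
    if (host_len >= host_size)
        return -1;
    memcpy(host, url, host_len);
    host[host_len] = '\0';            /* always terminate the copy */

    /* Steps 4-5: everything from the '/' onward is the request.
     * Assumption: fall back to "/" when the URL has no path. */
    const char *path = url + host_len;
    if (*path == '\0')
        path = "/";
    if (strlen(path) >= request_size)
        return -1;
    strcpy(request, path);
    return 0;
}
```

Called with "http://www.csun.edu/~steve" and two sufficiently large buffers, this should leave "www.csun.edu" in host and "/~steve" in request.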
pzozulka (Author) Commented:
Thank you very much. I think this is exactly what I was looking for. I will try to implement it in code, and will get back soon with my solution.
pzozulka (Author) Commented:
I think I got it:
// parse host to strip http:// and any dirs
// *************************************
char src[50];
char dest[100];
char *httpProto = "http://";

if (strstr(host, httpProto) != NULL) { // does host contain http://
	memset(dest, '\0', sizeof(dest));
	strcpy(dest, host + strlen(httpProto));
	fprintf(configfp, "Stripped HTTP:\n%s\n", dest);
	// http:// stripped -- use dest
}
// *************************************

// copy chars off to the hostName until you find "/" or '\0'
// *************************************
char request[50];
char hostName[100];
int a, b = 0, forwardSlash = 0;

if (strstr(host, httpProto) != NULL) { // host contains http://
	for (a = 0; dest[a] != '\0'; a++) { // copy domain name into hostName
		if (dest[a] != '/') {
			hostName[a] = dest[a];
		}
		else { // forward slash found
			forwardSlash = 1;
			break;
		}
	}
	while ((forwardSlash == 1) && (dest[a] != '\0')) { // copy the rest into request
		request[b] = dest[a];
		a++;
		b++;
	}
	fprintf(configfp, "Request: \n%s\n", request);
}
else { // host didn't contain http:// to begin with
	// extract directory after domain name
}

pzozulka (Author) Commented:
I know I didn't have to extract hostName twice (once in the IF and once in the ELSE) depending on whether the original host name had an http:// in it or not. I just realized that.
pzozulka (Author) Commented:
I'm doing something wrong. Here's what I get:

Bill Bach (President) Commented:
You never null-terminate the hostName string.  Add line 30.5:
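The snippet for that line is not preserved here. As a rough illustration of the idea only, the sketch below copies characters up to the first '/' and then writes the terminating '\0' that a character-by-character copy loop does not add by itself. The helper name copy_until_slash and its parameters are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper (not from the thread): copies src into dst up to the
 * first '/' or the end of src, never writing more than dst_size bytes.
 * Returns the number of characters copied, excluding the terminator. */
static size_t copy_until_slash(const char *src, char *dst, size_t dst_size)
{
    size_t i = 0;

    if (dst_size == 0)
        return 0;

    while (src[i] != '\0' && src[i] != '/' && i + 1 < dst_size) {
        dst[i] = src[i];
        i++;
    }

    /* Without this line dst holds the copied characters followed by
     * whatever was already in the buffer, so printing it with %s reads
     * past the intended end of the string. */
    dst[i] = '\0';
    return i;
}
```

The returned count is also the index of the '/' in src (when one was found), so the caller can continue from there to copy the request part.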
pzozulka (Author) Commented:
I also have another urgent question, if you could please take a look at:

Bill Bach (President) Commented:
Looks like two good answers to that one already.  I agree that allocating memory in a function is a bad idea, and I also agree that the issue is in the malloc to begin with. "buffer" is defined as "char **", but the result of the malloc call (which returns "char *") gets assigned to buffer directly.  Depending on the first few bytes in the string buffer, those bytes will be interpreted as another pointer, and it breaks.
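The code from that other question is not shown in this thread, so the following is only a hypothetical sketch of the pointer-level point: when a function receives a "char **", the malloc result belongs in "*out" (the caller's "char *"), not in the "char **" itself. The name make_copy and its parameters are made up for illustration, and a caller-allocated buffer would sidestep the issue entirely.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the thread): stores a freshly allocated
 * copy of text through out. Returns 0 on success, -1 on allocation failure.
 * The caller is responsible for calling free() on *out. */
static int make_copy(char **out, const char *text)
{
    /* Problematic pattern described above:
     *     out = malloc(strlen(text) + 1);
     * That puts the string storage where a "pointer to pointer" is
     * expected, so a later *out reads string bytes as if they were an
     * address. */
    char *copy = malloc(strlen(text) + 1);
    if (copy == NULL)
        return -1;

    strcpy(copy, text);
    *out = copy;   /* assign through the char **, one level down */
    return 0;
}
```

A caller would declare a plain "char *p", pass "&p", and free p when finished.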
Question has a verified solution.
