open a URL and print its contents

Posted on 2004-10-27
Last Modified: 2010-04-15
OK, this applies to both OSes: windows and linux.

i'm trying to open a URL, and display its contents on the console. much like wget would do, only simpler.

basically i need some code to perform the http request and print the content. that's all.
i tried several approaches but they mostly need MFC or some other technologies.
simple C would do.

if it's not possible then is there any way to do it without MFC or the like?
no external libraries (like libcurl), it's just a small and simple tool.

thanks so much
Question by:urif
    LVL 22

    Expert Comment

    basically you need to open a socket to the server, port 80.
    send a string of the format "GET URL"    optionally followed by options
    You'll get back a response code, hopefully 200, "OK", followed by the web page.

    Some pickier web servers require some options like "HTTP/1.1", but many don't

    You can experiment with telnet:

    c:>telnet 80
    GET /
    200 OK
    <!DOCTYPE HTML...>  ...

    to do it from a program, look up the networking functions like socket(), connect(), etc...  these come with good examples.
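
    To check the response code from a program, something along these lines may do. It is only a sketch: http_status is a made-up helper name, and it assumes the reply starts with a status line of the form "HTTP/1.0 200 OK". A bare "GET /" with no HTTP version may get the page back with no status line at all, in which case the helper returns -1.

```c
#include <stdio.h>

/* Hypothetical helper: pull the numeric status code out of the first
 * line of an HTTP response such as "HTTP/1.0 200 OK".
 * Returns the code, or -1 if the text does not start with a status
 * line (for example when the server sends the page with no headers). */
int http_status(const char *response)
{
    int major, minor, code;

    if (sscanf(response, "HTTP/%d.%d %d", &major, &minor, &code) == 3)
        return code;
    return -1;
}
```

    A return value of 200 means the page follows; 403 or 404 mean the server refused or could not find it.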

    LVL 7

    Expert Comment

    Here are the functions you need to call in sequence. I think that should get you started. You can use man pages to figure out the details.

    /* Open a socket and get a socket descriptor. This is analogous to using the open() function to open a file */

    /* Set socket options - not mandatory to call this */

    /* Connect to the Webserver host/port. You will need to dig around a bit to figure out how to fill in the sockaddr struct. */

    /* Send the HTTP GET command to the server. If the server supports HTTP 1.0, all you need to say is GET /path/to/file HTTP/1.0 followed by a blank line. With HTTP 1.1 you also need to send at least a Host: header. */

    /* Read the data sent back by the server */

    /* Do whatever you want with the data */
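
    Put together, the sequence above might look roughly like this on Linux. Treat it as a sketch under assumptions, not tested against your server: http_get and its parameters are names made up for illustration, getaddrinfo() is used to fill in the address struct, and a Host: header is sent even though plain HTTP 1.0 does not require one, because servers that host several sites usually need it. On Windows the Winsock setup would have to be added.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Hypothetical helper: fetch http://host:port/path and copy the raw
 * reply (headers and body) to out. Returns 0 on success, -1 on error. */
int http_get(const char *host, const char *port, const char *path, FILE *out)
{
    struct addrinfo hints, *res, *p;
    char req[1024], buf[4096];
    ssize_t n;
    size_t sent = 0, len;
    int fd = -1;

    /* Resolve the host name (or dotted IP) into socket addresses. */
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    /* Open a socket and connect, trying each address in turn. */
    for (p = res; p != NULL; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
            break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    if (fd < 0)
        return -1;

    /* Send the GET command; with HTTP/1.0 the server closes when done. */
    len = (size_t)snprintf(req, sizeof(req),
                           "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n", path, host);
    if (len >= sizeof(req)) { close(fd); return -1; }
    while (sent < len) {
        n = write(fd, req + sent, len - sent);
        if (n <= 0) { close(fd); return -1; }
        sent += (size_t)n;
    }

    /* Read until the server closes the connection (read returns 0). */
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        fwrite(buf, 1, (size_t)n, out);

    close(fd);
    return n < 0 ? -1 : 0;
}
```

    Calling http_get(host, "80", "/", stdout) from main() would print the reply, headers included, on the console.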

    LVL 7

    Expert Comment

    Forgot to add - for write() and read(), you use the socket descriptor (the one you got back from the socket call) like a file descriptor.
    LVL 7

    Expert Comment

    Also forgot - you need to close() the socket descriptor when done.

    Author Comment

    thanks everyone.

    ravs can you be more specific?
    i know about the sockets, but for some reason it doesn't work under windows and on linux i get a lot of "cannot open socket" kind of errors.

    do you have a working example?

    thanks again
    LVL 7

    Expert Comment

    What errors do you get?


    Author Comment

    ravs never mind. thanks for trying to help anyway
    as i say

    > i need some code to perform the html request and print the content

    i don't have the time (the project is a huge one and there is the asm part that is taking most of my time) to start trying to build the code. if you have a working piece of code, great, if not thanks so much anyway
    LVL 5

    Accepted Solution

    this is fairly easy to do along the lines suggested by others.
    still, here is something, see if it works for you.

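    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    int main(int argc, char **argv)
    {
          struct sockaddr_in to;
          int sockfd;
          char data[1024];
          int n;

          char *text = "GET / HTTP/1.0\r\n\r\n";   // "/" asks for the root page; put the page you want here
          char *ip = "";     // usual dotted decimal representation of the ip of the site u want to retrieve from.
                             // in our example

          memset(&to, 0, sizeof(to));
          to.sin_family = AF_INET;
          to.sin_port = htons(80);
          if (!inet_aton(ip, &to.sin_addr)) {     // inet_aton returns 0 for a bad address
                printf("bad ip address\n");
                return 1;
          }

          sockfd = socket(AF_INET, SOCK_STREAM, 0);
          if (sockfd < 0) {
                perror("socket");
                return 1;
          }

          if (connect(sockfd, (struct sockaddr *)&to, sizeof(to)) < 0) {     // connect returns 0 on success
                printf("connection failed\n");
                return 1;
          }

          write(sockfd, text, strlen(text));
          n = read(sockfd, data, sizeof(data) - 1);     // this you should modify to handle the general case, to read repeatedly from the
                                                        // socket till u get EOF
          if (n < 0)
                n = 0;
          data[n] = 0;     // terminate right after the last byte read

          printf("%s\n", data);
          close(sockfd);
          return 0;
    }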

    pretty primitive. but i guess it will work. the ip and the page you specify can be made to come from the command line.
    you should do that. a better approach would be to parse out the first parts from the page name, and use gethostbyname
    to get the ip. well i think you get the general idea to proceed from here
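
    For the parsing part, a rough sketch could look like the following. split_url is a made-up helper name; it assumes a plain http:// URL and does not handle a port number such as host:8080.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split "http://host/path" into host and path.
 * Returns 0 on success, -1 if the URL does not start with http://
 * or a part does not fit in the buffer supplied for it. */
int split_url(const char *url, char *host, size_t hostsz,
              char *path, size_t pathsz)
{
    const char *prefix = "http://";
    const char *start, *slash;
    size_t hostlen;

    if (strncmp(url, prefix, strlen(prefix)) != 0)
        return -1;
    start = url + strlen(prefix);

    /* The host runs up to the first '/' or to the end of the string. */
    slash = strchr(start, '/');
    hostlen = slash ? (size_t)(slash - start) : strlen(start);
    if (hostlen == 0 || hostlen >= hostsz)
        return -1;
    memcpy(host, start, hostlen);
    host[hostlen] = '\0';

    /* No path in the URL means the root page. */
    if (snprintf(path, pathsz, "%s", slash ? slash : "/") >= (int)pathsz)
        return -1;
    return 0;
}
```

    The host part can then be handed to gethostbyname(), or to getaddrinfo(), which is the newer call for the same job, and the path part goes on the GET line.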

    hope this helps.
    LVL 7

    Expert Comment

    ASM - that's assembly language, right?

    Urif, it will take about 15 mins for an assembly programmer like you to get the C program running. I can give you working code, but not do your assignment (ok, ok, insignificant part of big project) for you. I would at least like to see you try - somehow, the "I'm too busy" doesn't motivate me too much.

    Oh, but never mind, I see that van_dy has done it anyway.

    LVL 5

    Expert Comment

    yea, what part of your project(which incorporates
    something like dirty web client) would need to be coded
    in assembly anyways... we would like to know and help
    you in that too :D

    Author Comment

    thanks everyone, let me try the code.

    ravs, while i would generally agree with you, i am pressed for time; that's why i wound up asking for a piece of code, however insignificant it is. as you said, yes, an asm programmer would come up with the code fast, but since i am generally a low level system programmer (mostly kernel related work that has to do with crypto), i haven't done much with sockets and i have my head totally occupied with the asm.

    now, what project has asm and a web thing in it? well, i can't tell you. but the socket part has to do with getting a specific piece of data from a server that sits half way across the world and that only understands http requests to send the data. since this is "trivial" i didn't want to even take my attention away from the assembler function, which as it is, is really a pain.

    next time i will try "harder" i promise, but not now. i am sorry and i am sorry if i offended anyone it wasn't my intention.

    as soon as i try the code i'll post my reply, and maybe, if you are good, some assembler for you to "try" to decode :)

    Author Comment

    your code worked but with some changes. i had to add some lines, but it essentially worked.
    the only problem is portability to win32 platform.

    thanks so much for the help
    LVL 5

    Expert Comment

    On windows you will need to do something like WSAStartup.
    i haven't programmed on that platform, you may consider taking
    a look at the initial pages of beej's guide to set it right for windows
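
    As a rough idea of what that setup involves, here is a sketch that has only been reasoned through, not run on windows. net_init, net_cleanup and net_close are made-up wrapper names. It assumes Winsock 2.2 and linking against ws2_32, and it keeps the socket in an int for simplicity, although Winsock's own type is SOCKET.

```c
#ifdef _WIN32
#include <winsock2.h>
#else
#include <unistd.h>
#endif

/* Hypothetical wrappers so the rest of the program can stay the same
 * on both platforms. net_init and net_close return 0 on success. */
int net_init(void)
{
#ifdef _WIN32
    WSADATA wsa;
    /* Ask for Winsock 2.2; this must succeed before any socket() call. */
    return WSAStartup(MAKEWORD(2, 2), &wsa) == 0 ? 0 : -1;
#else
    return 0;   /* nothing to set up on linux */
#endif
}

void net_cleanup(void)
{
#ifdef _WIN32
    WSACleanup();
#endif
}

/* On windows a socket is closed with closesocket(), not close(). */
int net_close(int fd)
{
#ifdef _WIN32
    return closesocket((SOCKET)fd);
#else
    return close(fd);
#endif
}
```

    Also note that on windows you send and receive on a socket with send() and recv(); read() and write() do not work on sockets there.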

    Author Comment

    as far as i know you might need either an sdk from microsoft, or something similar.
    do you know if mingw (gcc for windows) supports sockets?
    LVL 5

    Expert Comment

    I really can't guide you much there because i haven't
    programmed on windows at all. However please see this link
    and find out if the stuff is useful.

    Author Comment

    hmmm something strange happened.

    ok, i tried the code you posted using google as the reference, everything is ok. i get:


    then the whole html code of the page.
    now, i changed the page to the server and page i am trying to access and nothing happens. i just get the connected, but when it goes into the while() loop for the read != eof it just stays there.
    now the page is accessible thru a webbrowser and wget as well, so there is no problem there.
    any ideas?
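
    It is hard to tell from here why that server never finishes. While debugging, a receive timeout at least keeps the program from waiting forever. This is only a sketch: drain_socket and its seconds parameter are made-up names, and it does not fix whatever makes the server go quiet.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: copy everything from a connected socket to out,
 * giving up if the server stays silent for more than `seconds`.
 * Returns the number of bytes copied, or -1 on error or timeout. */
long drain_socket(int fd, FILE *out, int seconds)
{
    struct timeval tv;
    char buf[4096];
    ssize_t n;
    long total = 0;

    /* Make recv() fail instead of blocking forever. */
    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
        return -1;

    /* recv() returns 0 at end of stream and -1 on error or timeout. */
    while ((n = recv(fd, buf, sizeof(buf), 0)) > 0) {
        fwrite(buf, 1, (size_t)n, out);
        total += n;
    }
    return n < 0 ? -1 : total;
}
```

    If this returns -1 after the timeout, the server accepted the connection but sent nothing more, which usually points at the request itself rather than at the read loop.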


    Author Comment

    by the way, on windows cygwin gives you the answer, it supports sockets and gcc, so basically i can compile the same code with minor adjustments

    Author Comment

    ok, i removed the HTTP/1.0 from the code and this is what i got

    <TITLE>403 Forbidden</TITLE>
    You don't have permission to access http://xxx.yyy
    on this server.<P>
    <ADDRESS>Apache/1.3.9 Server at yyy.yyy Port 80</ADDRESS>

    but from the browser and from the same computer it works...
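
    One guess, and it is only a guess: the error text echoes a full http:// address, which suggests the GET line carried the whole URL instead of just the path, and a server hosting several sites also wants the site name in a Host: header, which a browser always sends. A request built along these lines may be worth a try. build_request is a made-up helper name.

```c
#include <stdio.h>

/* Hypothetical helper: build an HTTP/1.0 request that names only the
 * path on the GET line and puts the server name in a Host header.
 * Returns the request length, or -1 if it does not fit in buf. */
int build_request(char *buf, size_t bufsz, const char *host, const char *path)
{
    int n = snprintf(buf, bufsz,
                     "GET %s HTTP/1.0\r\n"
                     "Host: %s\r\n"
                     "Connection: close\r\n"
                     "\r\n",
                     path, host);

    if (n < 0 || (size_t)n >= bufsz)
        return -1;
    return n;
}
```

    The resulting string is what gets passed to write() on the socket; the path must start with "/" and the host is the bare server name without http://.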
