How to retrieve a binary file over HTTP in C on Linux

Posted on 2007-10-19
Last Modified: 2010-04-15
Hello, I'm writing a program in C to download videos from YouTube. It's based on youtube-dl, which is written in Python. I know that libcurl and other libraries exist to help with this, but I'd like to do it over plain TCP to better understand the process. I can already exchange messages with the site (I receive HTML), but I can't retrieve the binary content. My code is the following:

#include <sys/socket.h>
#include <arpa/inet.h>
#include <netdb.h>

#include <errno.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <regex.h>

char * http_post(const char * url, const char * params);
#define MAX_BUFFER 204800
char buffer[MAX_BUFFER+1];
char* match(const char *string, char *pattern)
{
    regex_t    re;
    regmatch_t m;
    char * tparam = NULL;
    size_t len;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) {
        return NULL;      /* pattern failed to compile */
    }
    if (regexec(&re, string, 1, &m, 0) != 0) {
        regfree(&re);
        return NULL;      /* no match */
    }
    regfree(&re);
    /* the whole match looks like ,t:'VALUE' so skip the 4-character
       prefix and drop the closing quote */
    len = (size_t)(m.rm_eo - m.rm_so) - 5;
    tparam = (char*)malloc(len + 1);
    if (tparam == NULL) {
        return NULL;
    }
    memcpy(tparam, &(string[m.rm_so+4]), len);
    tparam[len] = '\0';   /* strncpy left the result unterminated */
    return tparam;
}

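int main(){
   char data_location[50];
   FILE * fd = NULL;
   char * tparam = NULL;
   char video_id[]= {"V36AJg6L_3o&mode=related&search="};
   char * ret;

   /* buffer must already hold the watch page HTML at this point; the
      request that fetches it does not appear in the post */
   tparam = match(buffer,"[,{]t:'([^']*)'");
   if(tparam == NULL){
      printf("t parameter not found\n");
      return 1;
   }
   char tmp[256];   /* 64 bytes overflowed with this video_id */
   snprintf(tmp,sizeof(tmp),"/get_video?video_id=%s&t=%s",video_id, tparam);
   free(tparam);
   http_post("", tmp);
   char * ptr = strstr(buffer,"Location:");
   if(ptr != NULL){
      char *ptr_end = strstr(ptr+10,"\n");
      int i = 0;
      /* copy the redirect target without overrunning data_location */
      while(ptr_end != NULL && ptr+10 < ptr_end && i < (int)sizeof(data_location)-1){
         data_location[i] = *(ptr+10) ;
         i++; ptr++;
      }
      data_location[i] = 0 ;
   }
   ret = http_post("","/get_video?video_id=V36AJg6L_3o&");
   if((fd = fopen("video.flv","wb")) == NULL){
      printf("error opening file\n");
      return 1;
   }
   /* only the tail of a commented-out block survives here in the post,
      so the code that writes the response to the file is not shown */
   fclose(fd);
   return 0 ;
}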

char * http_post(const char * url, const char * params){
   int connectionFd;
   int in;
   unsigned long limit = MAX_BUFFER ,index =0;
   struct sockaddr_in servaddr;
   struct hostent * hptr = NULL;
   char tmp[2048];   // 512 bytes was too small for these request headers
   if(strcmp(url,"") == 0){ // here is the problem. We have the right URL, which is
                            // hardcoded below, but I can't receive the binary.
                            // The headers mimic Firefox.
      // The post cuts the request off after the Connection header. HTTP/1.1
      // also requires a Host header; it and the Cookie header discussed in
      // the answers are not shown.
      snprintf(tmp,sizeof(tmp),"GET /get_video?video_id=V36AJg6L_3o& HTTP/1.1\r\n\
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0\r\n\
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nAccept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png, */*;q=0.5\r\n\
Accept-Language: en-us,en;q=0.5\r\n\
Accept-Encoding: gzip,deflate\r\n\
Keep-Alive: 300\r\n\
Connection: keep-alive\r\n\
\r\n");
   }else{
      // Header lines must start in column 0; leading spaces turn them into
      // continuation lines of the previous header.
      snprintf(tmp,sizeof(tmp),"POST %s HTTP/1.0\r\n\
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0\r\n\
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nAccept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png, */*;q=0.5\r\n\
Accept-Language: en-us,en;q=0.5\r\n\r\n",params);
   }
   connectionFd = socket(AF_INET,SOCK_STREAM,0);
   if(connectionFd < 0){
      printf("error creating socket: %s\n",strerror(errno));
      return NULL;
   }
   memset(&servaddr, 0, sizeof(servaddr));
   servaddr.sin_family = AF_INET;
   servaddr.sin_port = htons(80);
   // url doubles as the host name here; the empty string passed from main()
   // is not a resolvable name
   hptr = gethostbyname2(url,AF_INET);
   if(hptr == NULL){
      printf("could not resolve host name\n");
      close(connectionFd);
      return NULL;
   }
   // copy exactly h_length bytes; casting to unsigned long reads 8 bytes on 64-bit
   memcpy(&servaddr.sin_addr, hptr->h_addr_list[0], hptr->h_length);
   if(connect(connectionFd,(struct sockaddr*)&servaddr,sizeof(servaddr)) ){
      printf("error connecting: %s\n",strerror(errno));
      close(connectionFd);
      return NULL;
   }
   write(connectionFd, tmp, strlen(tmp));
   while((in=read(connectionFd, &(buffer[index]), limit)) >0 ){
      index += in;
      limit -= in;
   }
   buffer[index] = 0 ;
   close(connectionFd);
   return buffer;
}
Question by:fabytes

    Accepted Solution

    All binary data that you receive will be in MIME format (base64-encoded text), so you need to read that and then decode it. You will find portable source code all over the net, the most prominent being the "uudecode" source code in

    Assisted Solution


    Hi, fabytes

    if(strcmp(url,"") == 0){ // here is the problem. We have the right URL, which is
                             // hardcoded below, but I can't receive the binary.
                             // The headers mimic Firefox.

    Do you mean that this condition doesn't work as it should?
    If the problem is not in that condition, I see two other possible problems:

    1) The cookie is incorrect. You are using a fixed cookie rather than the dynamic one the server sends, so I guess the cookie value should be captured from the server's response.

    2) Your read loop is incorrect:
    while((in=read(connectionFd, &(buffer[index]), limit)) >0 ){
             index += in;
             limit -= in;
    }

    Don't rely on read() returning zero. Rely on the Content-Length value, taken from the headers of the response to the GET/POST request. When the connection is keep-alive, the server does not close it after the response, so your loop blocks at the end of the data. Also, don't read everything into the same 200 KB buffer; it is too small for large binary content.

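For point 1, a sketch of how the cookie could be taken from the response instead of being hardcoded. collect_cookies is a hypothetical helper, not part of the original program; it assumes the response headers sit in a NUL-terminated buffer (like the question's global buffer) and keeps only the name=value part of each Set-Cookie header, ignoring attributes such as path and expires:

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Collect the name=value part of every Set-Cookie header into out,
   joined with "; " as a Cookie request header expects.
   headers must be NUL-terminated and start with the status line.
   Returns the number of cookies stored. */
int collect_cookies(const char *headers, char *out, size_t outlen)
{
    const char *line = headers;
    size_t used = 0;
    int count = 0;

    if (outlen == 0)
        return 0;
    out[0] = '\0';

    /* Walk the header lines and stop at the blank line before the body. */
    while (line != NULL && *line != '\0' && strncmp(line, "\r\n", 2) != 0) {
        if (strncasecmp(line, "Set-Cookie:", 11) == 0) {
            const char *val = line + 11;
            while (*val == ' ' || *val == '\t')
                val++;
            /* Attributes (path, expires, ...) follow the first ';'. */
            size_t len = strcspn(val, ";\r\n");
            size_t sep = count > 0 ? 2 : 0;
            if (len > 0 && used + sep + len < outlen) {
                if (sep > 0) {
                    memcpy(out + used, "; ", 2);
                    used += 2;
                }
                memcpy(out + used, val, len);
                used += len;
                out[used] = '\0';
                count++;
            }
        }
        line = strstr(line, "\r\n");
        if (line != NULL)
            line += 2;
    }
    return count;
}
```

The collected string can then be sent back on the next request as one extra header line, for example built with snprintf(line, sizeof(line), "Cookie: %s\r\n", jar), where jar is the output buffer.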
