cURL: stopping an HTTP transaction before it's finished

I'm trying to download a webpage, but the page is very large (>500 KB). Instead of downloading the entire thing, I want to check for relevant details near the beginning, parse them if they are present, and then quit either way and check again at a regular interval. Otherwise, the bandwidth on my server will be destroyed.

I'm using CURLOPT_WRITEFUNCTION, which passes the HTML data to the write function in chunks. I noticed that if you return a size that differs from the size of the data passed in, curl cancels the transfer and reports an error. The problem is that I need to know whether doing this will cause trouble. My intention is just to cancel and start the HTTP request anew later, but it seems "hacky", and I don't trust that it won't be problematic.

Simple example of what I'm doing now:

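#include <string.h>
#include <unistd.h>
#include <curl/curl.h>

/* Bounded search: the buffer curl passes in is not NUL-terminated, so strstr() is unsafe on it. */
static int contains(const char *buf, size_t len, const char *needle)
{
    size_t n = strlen(needle);
    for (size_t i = 0; i + n <= len; i++)
        if (memcmp(buf + i, needle, n) == 0)
            return 1;
    return 0;
}

size_t write_data(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    size_t total = size * nmemb; //number of bytes in this chunk
    (void)userdata;
    if (contains(ptr, total, "my data"))
    {
        //parse my data
        return 0; //any value other than total cancels the transfer (CURLE_WRITE_ERROR)
    }
    return total;
}

int main()
{
    CURLcode res;
    while(1)
    {
        CURL *curl = curl_easy_init();
        if(!curl)
            return 1;
        curl_easy_setopt(curl, CURLOPT_URL, "http://example.com");
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_data);
        curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);
        curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
        res = curl_easy_perform(curl); //CURLE_WRITE_ERROR here means the callback cancelled
        curl_easy_cleanup(curl);
        sleep(15); //usleep() is not guaranteed to accept values of one second or more
    }
    return 0;
}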

Asked by stevedell

ste5an (Senior Developer) commented:
Use a range request (RFC 2616, section 14.35.2, "Range Retrieval Requests"), so that the server only sends the first part of the page.
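
With libcurl, a byte range can be requested through CURLOPT_RANGE, which takes a string such as "0-4095". The following is only a minimal sketch of the two pieces around that option, assuming the interesting data sits in the first N bytes. The helper names format_range and range_was_honored are made up for illustration, and the libcurl calls are shown in a comment so that the snippet compiles without the library:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: build the "first-last" string that CURLOPT_RANGE
 * expects, e.g. "0-4095" for the first 4096 bytes.
 * Returns 1 on success, 0 if nbytes is 0 or the buffer is too small. */
static int format_range(char *buf, size_t buflen, unsigned long nbytes)
{
    if (nbytes == 0)
        return 0;
    int n = snprintf(buf, buflen, "0-%lu", nbytes - 1);
    return n > 0 && (size_t)n < buflen;
}

/* Hypothetical helper: 206 Partial Content means the server honored the
 * range; a plain 200 means it ignored the header and sent the full body. */
static int range_was_honored(long http_status)
{
    return http_status == 206;
}

/* Wiring into libcurl (not compiled here):
 *
 *   char range[32];
 *   if (format_range(range, sizeof range, 4096))
 *       curl_easy_setopt(curl, CURLOPT_RANGE, range);
 *   res = curl_easy_perform(curl);
 *   long status = 0;
 *   curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &status);
 *   if (!range_was_honored(status)) { ... the server sent everything ... }
 */
```

A server that ignores the Range header will still send the whole page, so the abort logic in the write callback remains useful as a safety net.
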
jmcg (Owner) commented:
Since not all sites honor the range request, the method you are using is often preferred.

There is a wrinkle, though: it's not guaranteed that your callback will be provided a full buffer, or that the string you are looking for does not cross chunk boundaries, so you may need to accumulate content for a few calls until you've received enough to make your determination of whether to continue to accept the remainder of the transfer.

And your logic should be to continue the transfer once you've decided not to abort it. Aborting the transfer, once it has passed your inspection, just to restart it, seems wasteful.
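
The accumulation described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: the struct scan_state, the SCAN_LIMIT of 8 KiB, and the function names are all hypothetical, and the sketch aborts once it has either found the marker or scanned SCAN_LIMIT bytes without finding it. The callback has the same signature as a libcurl write callback but does not need the libcurl headers to compile:

```c
#include <stddef.h>
#include <string.h>

#define SCAN_LIMIT 8192 /* assumption: the marker, if present, is in the first 8 KiB */

/* Hypothetical per-transfer state, handed to the callback via CURLOPT_WRITEDATA. */
struct scan_state {
    char        buf[SCAN_LIMIT];
    size_t      len;    /* bytes accumulated so far */
    const char *needle; /* string to look for */
    int         found;  /* set to 1 once needle has been seen */
};

/* Bounded substring search; the data curl delivers is not NUL-terminated. */
static int contains(const char *buf, size_t len, const char *needle)
{
    size_t n = strlen(needle);
    if (n == 0 || n > len)
        return 0;
    for (size_t i = 0; i + n <= len; i++)
        if (memcmp(buf + i, needle, n) == 0)
            return 1;
    return 0;
}

/* Returning anything other than size * nmemb makes curl abort the transfer
 * and curl_easy_perform() return CURLE_WRITE_ERROR. */
static size_t scan_cb(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    struct scan_state *st = userdata;
    size_t total = size * nmemb;
    size_t room = SCAN_LIMIT - st->len;
    size_t take = total < room ? total : room;

    if (take > 0) {
        memcpy(st->buf + st->len, ptr, take); /* append this chunk */
        st->len += take;
    }
    /* Search the whole accumulated buffer, so a marker that straddles
     * two chunks is still found. */
    if (contains(st->buf, st->len, st->needle)) {
        st->found = 1;
        return 0;   /* decision made: abort on purpose */
    }
    if (st->len == SCAN_LIMIT)
        return 0;   /* scanned far enough without a match: abort */
    return total;   /* undecided: keep receiving */
}
```

To use it, a zero-initialised struct scan_state with needle set would be passed with curl_easy_setopt(curl, CURLOPT_WRITEDATA, &st) alongside CURLOPT_WRITEFUNCTION. After curl_easy_perform() returns CURLE_WRITE_ERROR, st.found and st.len tell a deliberate abort apart from a genuine failure.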

sarabande commented:
You could return CURL_WRITEFUNC_PAUSE from the write callback and then cancel the request by calling the cleanup function.

Sara