Alan Pollenz

I need help with PHP RollingCurl class

I am running a development site and a production site on an Ionos cloud server.  The server runs on CentOS Linux 7.6.1810 (Core), PHP 7.2.18, and Plesk Onyx Version 17.8.11 Update #53.

I am attempting to use php cURL via the RollingCurl class, but when my parent script cURLs into my child script, RollingCurl only spawns 7 or so children and then waits for one to complete before starting another.  For example, the following is from my log file:

2019-05-09 14:46:27: 027093: Jumpseat Reservation Request ID 6: Started processing at 14:46:26.62034600... Finished processing at 14:46:27.24872900... Submitted to the jumpseat reservation system at 14:46:27.24897800 and status updated to "Processed"... Processing time: 0.62863802909851
2019-05-09 14:46:27: 027093: Jumpseat Reservation Request ID 1: Started processing at 14:46:26.62530500... Finished processing at 14:46:27.25049400... Submitted to the jumpseat reservation system at 14:46:27.25062500 and status updated to "Processed"... Processing time: 0.62532806396484
2019-05-09 14:46:27: 027093: Jumpseat Reservation Request ID 5: Started processing at 14:46:26.62124700... Finished processing at 14:46:27.28338800... Submitted to the jumpseat reservation system at 14:46:27.28359300 and status updated to "Processed"... Processing time: 0.66235399246216
2019-05-09 14:46:27: 027093: Jumpseat Reservation Request ID 7: Started processing at 14:46:26.62410900... Finished processing at 14:46:27.29295000... Submitted to the jumpseat reservation system at 14:46:27.29310400 and status updated to "Processed"... Processing time: 0.66899991035461
2019-05-09 14:46:27: 027093: Jumpseat Reservation Request ID 9: Started processing at 14:46:26.62833800... Finished processing at 14:46:27.29748200... Submitted to the jumpseat reservation system at 14:46:27.29766300 and status updated to "Processed"... Processing time: 0.66932892799377
2019-05-09 14:46:27: 027093: Jumpseat Reservation Request ID 8: Started processing at 14:46:26.62410900... Finished processing at 14:46:27.38582100... Submitted to the jumpseat reservation system at 14:46:27.38604300 and status updated to "Processed"... Processing time: 0.76193881034851
2019-05-09 14:46:27: 027093: Jumpseat Reservation Request ID 10: Started processing at 14:46:26.63221700... Finished processing at 14:46:27.40373300... Submitted to the jumpseat reservation system at 14:46:27.40392500 and status updated to "Processed"... Processing time: 0.77171611785889

....

2019-05-09 14:46:28: 027093: Jumpseat Reservation Request ID 2: Started processing at 14:46:27.58376800... Finished processing at 14:46:28.04593400... Submitted to the jumpseat reservation system at 14:46:28.04617200 and status updated to "Processed"... Processing time: 0.46241307258606
2019-05-09 14:46:28: 027093: Jumpseat Reservation Request ID 3: Started processing at 14:46:27.58375100... Finished processing at 14:46:28.08335500... Submitted to the jumpseat reservation system at 14:46:28.08365400 and status updated to "Processed"... Processing time: 0.49991297721863
2019-05-09 14:46:28: 027093: Jumpseat Reservation Request ID 4: Started processing at 14:46:27.58375100... Finished processing at 14:46:28.12980700... Submitted to the jumpseat reservation system at 14:46:28.12998500 and status updated to "Processed"... Processing time: 0.54624199867249

Note that the first 7 children start at virtually the same time, and then there is a pause of nearly one second before child number 8 starts.  I know that rolling thunder has a limit as to how many children can be running simultaneously, and for testing I opened this up to 200 (20 is the default).

I have attached a text file with the RollingThunder.php class file, my parent file, and the child file that is used when spawning each child.  Note that $securebaseurl is used so I can easily change between development and production sites.  The child process makes a call to a paid third party provider to submit the reservation request.

Any advice you can give on server/php/mysql or other configuration settings, or the code itself, that would allow more child requests to be processed simultaneously, would be much appreciated.
gr8gonzo

No code was attached. Also, what's RollingThunder?
Alan Pollenz

ASKER

It's actually RollingCurl (oops)... a class that starts another cURL request when a single request finishes, rather than waiting for the batch to finish.

  EE.txt

File should now be embedded.
A few quick comments:
1. The child file looks like it might have some token or authentication info in the RestClient instantiation. If that's sensitive, please redact it and re-upload.

2. Not part of the problem, but a good habit to avoid:
if ($rowcount != '1') { // didn't get reservation to process
...when you compare a number (in this case, an integer containing the # of rows) to a string (the '1', because of the quotes), you force PHP to cast one of the values before it can compare them. Not a huge deal when you do it once, but it's a bad habit that needs to be broken. When you do it in a larger project and PHP has to cast things many times, it slowly chips away at performance. On top of that, if you ever get into strongly-typed languages, this just won't work - you'll get type errors instead.
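
To make the point above concrete, here is a minimal sketch of loose versus strict comparison. It assumes $rowcount really holds an integer (for example from a row-count property); if your code gets the count back as a string, cast it once with (int) before comparing.

```php
<?php
// $rowcount stands in for the integer row count from the query (assumed value).
$rowcount = 1;

// Loose comparison: PHP converts the string '1' before comparing.
$looseMatch = ($rowcount == '1');

// Strict comparison: value and type must both match, so nothing is cast.
$strictMatch = ($rowcount === 1);
$mixedTypes  = ($rowcount === '1'); // int vs. string never matches strictly

if ($rowcount !== 1) {
    // didn't get reservation to process
}
```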

3. Your child code uses $xml_filename but I don't see it defined anywhere.

4. I don't see the delay you're referring to. It looks like the child requests are finishing and the next one is starting again almost immediately...

Request ID 6: 14:46:27.2489 = Submitted to the jumpseat reservation system...
...
Request ID 10: 14:46:27.4039 = Submitted to the jumpseat reservation system...
...
Request ID 2: 14:46:27.5837 = Started processing
Oh wait - I think I see what you're doing.

You're trying to modify window_size on the RollingCurl instance, but it's a private property, so you can't do that. Just pass it in your execute call:

BEFORE:
    $rc = new RollingCurl();
    $rc->window_size = 200;
    foreach ($urls as $url) {
        $request = new RollingCurlRequest($url);
        $rc->add($request);
    }
    $rc->execute();

AFTER:
    $rc = new RollingCurl();
    // $rc->window_size = 200; // <--- COMMENT OUT
    foreach ($urls as $url) {
        $request = new RollingCurlRequest($url);
        $rc->add($request);
    }
    $rc->execute(200); // <-- SET TO 200

RollingCurl, eh... Some serious Thrillseeking...

RollingCurl is a bit off the beaten track + your code differs a good bit from the example code.

Likely a good start will be to correctly call add() using your $options... so something like this...

    // Create the RollingCurl instance itself, not a request ($url isn't defined yet here)
    $rc = new \RollingCurl\RollingCurl();

    foreach ($urls as $url) {
        $request = new \RollingCurl\Request($url);
        $rc->add(
           $request->addOptions($options)
        );
    }

Or something similar.

Your code never seems to pass any $options, so I'm unsure if this means CURLOPT_TIMEOUT is infinite or 0 seconds for every request.

If the default is 0 seconds, then all your requests may start, then instantly end. Unsure.
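
For what it's worth, libcurl documents a CURLOPT_TIMEOUT of 0 as "never time out", not "time out instantly". Either way, setting the timeouts explicitly removes the guesswork. A minimal sketch with plain cURL follows; the helper name buildCurlOptions, the URL, and the 5 and 30 second values are all assumptions for illustration. The same array could be handed to whatever options mechanism your RollingCurl version exposes.

```php
<?php
// Hypothetical helper: per-request cURL options with explicit timeouts.
function buildCurlOptions(int $connectTimeout = 5, int $totalTimeout = 30): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => $connectTimeout, // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $totalTimeout,   // seconds allowed for the whole transfer
    ];
}

$options = buildCurlOptions();

// Placeholder URL; no request is sent until curl_exec() is called.
$ch = curl_init('https://example.com/child.php');
$applied = curl_setopt_array($ch, $options);
```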

Likely the overall problem relates to "The child process makes a call to a paid third party provider to submit the reservation request".

This doesn't account for the fact almost every sensible system runs fail2ban + blocks attack traffic.

Your code will register as an attacker, because you aren't honoring any rate limits... if rate limits are in effect...

If you hit a rate limit, fail2ban will likely kick in + kill off any open connections, then block any future connections for some time period.

Bottom Line: Answering your question will require knowing a lot about the reservation system, then digging into your code to ensure you've honored any of their rate limits.

Too many variables to guess, at first glance.

Likely next step: Contact your reservation system provider + open a support ticket to ask about rate limits, if this data isn't published as part of their API docs.
Hah! I think gr8gonzo + I caught the same code quirk... which suggests it's a good idea for you to look at that area of code... after you determine the rate limits of the service you're calling.
One last thought - if you're running the parent and child on the same system, it may be more efficient to use pcntl/posix extensions to handle parallel requests instead of using cURL. Otherwise, you could be taking up valuable child workers / processes on the web server. Usually web servers have a limited number of concurrent requests they will handle at one time (which may bottleneck your script anyway, unless the web server is configured to handle 200+ concurrent requests). On top of that, each child is making an outbound request to the jumpseat reservation API / service, so that's another 200 connections.

Simply put, it's pretty inefficient to do 1-request-per-call via the web server like this. Plus, the amount of overhead in the HTTP request is going to add up if you have a large volume of requests to process. It would be like having 10 friends all driving to the supermarket and back to the house, where they buy one item each trip. The driving time adds up over time, and it would be faster to just have each friend pick up 100 items per trip.
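
The pcntl suggestion can be sketched roughly like this. It is only an outline under assumptions: it must run from the command line, processJob() is a hypothetical stand-in for your real child logic, and the limit of 4 children is arbitrary. If the pcntl extension isn't loaded, the sketch simply runs the jobs one after another.

```php
<?php
// Hypothetical stand-in for the real per-reservation work.
function processJob(int $jobId): bool
{
    usleep(10000); // pretend to do some work
    return true;
}

// Runs every job, keeping at most $maxChildren forked processes alive at once.
// Returns the number of jobs that reported success.
function runInParallel(array $jobIds, int $maxChildren = 4): int
{
    if (!function_exists('pcntl_fork')) {
        // Fallback when pcntl is unavailable: process sequentially.
        return count(array_filter(array_map('processJob', $jobIds)));
    }

    $running = [];   // pid => job ID
    $succeeded = 0;

    // Waits for any one child to exit and records its result.
    $reap = function () use (&$running, &$succeeded): void {
        $pid = pcntl_wait($status);
        if ($pid <= 0) {          // error or no children left
            $running = [];
            return;
        }
        if (pcntl_wifexited($status) && pcntl_wexitstatus($status) === 0) {
            $succeeded++;
        }
        unset($running[$pid]);
    };

    foreach ($jobIds as $jobId) {
        if (count($running) >= $maxChildren) {
            $reap();              // window is full: wait for a free slot
        }
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('fork failed');
        }
        if ($pid === 0) {
            // Child process: do one job, then exit with a status code.
            exit(processJob($jobId) ? 0 : 1);
        }
        $running[$pid] = $jobId;  // parent: remember the child
    }

    while ($running) {
        $reap();                  // drain the remaining children
    }
    return $succeeded;
}

$succeededCount = runInParallel([1, 2, 3, 4, 5, 6], 3);
```

Note that each forked child should open its own database connection rather than reuse one inherited from the parent.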

One thing you can do is have each child be responsible for grabbing multiple tasks and performing them all, and then looping so they take advantage of their existing resources/connections. The logic looks like this:

1. Child script generates a GUID
(https://www.php.net/manual/en/function.com-create-guid.php - check the comments for Linux versions)

2. Child script runs query to "take" up to 20 unassigned jobs/tasks:
UPDATE table SET worker="GUID from step 1", status='Assigned' WHERE worker IS NULL and status='Waiting' AND other criteria here LIMIT 20;

3. Check # of affected rows. If the # is greater than 0, go fetch the jobs
SELECT * FROM table WHERE worker="GUID from step 1"

4. Loop through each row, and for each row, process the job.

5. After each job, update the status:
UPDATE table SET status='Finished or Failed or whatever' WHERE row_id=<row ID>

6. After all jobs are processed, loop back to step 1, and keep going.

This methodology allows you to kick off a small number of ongoing workers (and you can do it from the command line instead of the web server, to save resources) that should be fairly efficient, especially if they're repeating network calls to the same remote host (they can reuse the same SSL / TLS handshake, which can save you a few hundred milliseconds off of subsequent calls). It'll also likely be preferred by the remote host.
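
The six steps above can be sketched as follows. Everything named here is an assumption for illustration: the jobs table, its id/worker/status columns, and the $processJob callback. The demo uses an in-memory SQLite database so it runs anywhere, which is why the claim query uses a subquery. On MySQL, the "UPDATE ... LIMIT 20" form from step 2 works directly. A random hex string stands in for the GUID.

```php
<?php
// Steps 2-3: claim up to $batchSize waiting jobs for this worker, then fetch them.
function claimJobs(PDO $db, string $workerId, int $batchSize = 20): array
{
    $claim = $db->prepare(
        "UPDATE jobs SET worker = :worker, status = 'Assigned'
         WHERE id IN (SELECT id FROM jobs
                      WHERE worker IS NULL AND status = 'Waiting'
                      LIMIT :batch)"
    );
    $claim->bindValue(':worker', $workerId);
    $claim->bindValue(':batch', $batchSize, PDO::PARAM_INT);
    $claim->execute();

    if ($claim->rowCount() === 0) {
        return []; // nothing left to take
    }

    $fetch = $db->prepare("SELECT * FROM jobs WHERE worker = :worker");
    $fetch->execute([':worker' => $workerId]);
    return $fetch->fetchAll(PDO::FETCH_ASSOC);
}

// Steps 1 and 4-6: keep claiming and processing batches until none remain.
function runWorker(PDO $db, callable $processJob): int
{
    $done = 0;
    $finish = $db->prepare("UPDATE jobs SET status = :status WHERE id = :id");

    while (true) {
        $workerId = bin2hex(random_bytes(16)); // step 1: fresh ID per batch
        $jobs = claimJobs($db, $workerId);
        if ($jobs === []) {
            break; // a long-running worker would sleep and poll again here
        }
        foreach ($jobs as $job) {
            $ok = $processJob($job);           // step 4: do the actual work
            $finish->execute([                 // step 5: record the outcome
                ':status' => $ok ? 'Finished' : 'Failed',
                ':id'     => $job['id'],
            ]);
            $done++;
        }
    }
    return $done;
}

// Demo with 45 fake jobs and a callback that always succeeds.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE jobs (id INTEGER PRIMARY KEY, worker TEXT, status TEXT)");
for ($i = 0; $i < 45; $i++) {
    $db->exec("INSERT INTO jobs (worker, status) VALUES (NULL, 'Waiting')");
}
$processed = runWorker($db, function (array $job): bool {
    return true;
});
```

Because the claim is a single UPDATE, two workers running at the same time should not end up with the same job.
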
And I agree with David - most "established" services (even if you're paying for them) will have measures to protect themselves from DDoS / flood attacks. Throwing 200 concurrent hits at the service will likely trigger some red flags somewhere and you might get a not-so-fun phone call asking you to rethink your plan, or maybe they'll just block you and suddenly your script will stop working for no obvious reason.

If the 3rd party service is explicitly set up to handle high-volume concurrent calls, then 200 calls probably isn't an issue, but just be careful and don't assume they are.
Wow... go out for an anniversary dinner and come back to this much great info.  You’ve all given me a lot to digest and think about.

Thanks to all and I will look at things in the morning.
Okay... I've looked at all the suggestions and tested the first few solutions mentioned by gr8gonzo and David, none of which had any effect... but thank you!

I know nothing about pcntl/posix, but as gr8gonzo suggested I will look into that, as well as configuring the server to handle more requests.

And I will also try gr8gonzo's 6 step solution above.

Thanks again.

Alan
I would double check to make sure you're modifying the right script, too. I would have expected PHP to throw an exception when you tried to set window size directly, which would prevent the script from reaching the execute line. So maybe throw in some echo or a logging line and make sure it shows when you run the script.
Thanks gr8... Looking at the class, there is a function that accepts the window size parameter:

    private function rolling_curl($window_size = null) {
        if ($window_size)
            $this->window_size = $window_size;

        // make sure the rolling window isn't greater than the # of urls
        if (sizeof($this->requests) < $this->window_size)
            $this->window_size = sizeof($this->requests);

        if ($this->window_size < 2) {
            throw new RollingCurlException("Window size must be greater than 1");
        }

I changed the window size to 2, and the child executed 2 requests, then a pause, then 2 more, etc.  When I set the window size to 10, it processes 7, waits until one finishes, starts #8, etc.