  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 214

Create 20 threads each launching a php script

Hi,
I have a system (Linux RH7.3) written in PHP, and part of the system is to go and parse HTML pages. The demand is now too high for the script and therefore I need to launch multiple versions of the PHP script. I cannot just increase the number of cron jobs, as each cycle has to be passed a reference number of the client it is to action.

What I need to do (I have no experience of Perl or Python) is create a CGI app I can call from a cron job. Once called, it creates a number of threads (var x, passed as a parameter) and each thread calls a version of the PHP script. Once all the PHP scripts have returned their responses, the CGI closes.

Does this make sense? I hope so, lol.

How is it best to do this and can it indeed be done?

So, to summarise:
How can I create a Perl (or similar) script that is passed 2 vars (customer_id, thread_qty), where each thread then calls a PHP script? Once the PHP script completes, the thread is closed; once all threads are closed, the CGI stops.

Thanks for any help
S
Cheers
Asked by: 08718712060
1 Solution
 
MercantilumCommented:
If you call a PHP script, you may do better to call fork() instead:

  See http://iis1.cps.unizar.es/Oreilly/perl/learn/ch14_04.htm

You can create n forked processes doing their PHP job; at the end of the script is a loop like

     while (wait() != -1) {};

that waits for all the processes to end before continuing.

It should be easy to implement.
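Putting the pieces together, a minimal sketch of the idea (the PHP script path and the argument order passed to it are assumptions, not the asker's real names):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fork $thread_qty children, each exec'ing the PHP script with the
# customer reference; the parent then waits for all of them.
my ($customer_id, $thread_qty) = @ARGV;
$thread_qty = 0 unless defined $thread_qty;   # nothing to launch if unset

for my $n (1 .. $thread_qty) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                          # child process
        exec('php', '/path/to/script.php', $customer_id, $n)
            or die "exec failed: $!";
    }
}

while (wait() != -1) {}                       # reap every child before exiting
```

You would call it from cron as e.g. `perl launcher.pl 42 5` to run 5 copies of the PHP script for customer 42.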
 
08718712060Author Commented:
I guess easy if you know what you are doing lol...

The fork() looks good, but how would I deal with the parameters? Can I just read them out as in PHP, i.e. if I pass vars $x $y, are these vars available within the Perl script as $x and $y, or do I need to retrieve them with some sort of param(n) command?

Cheers
 
MercantilumCommented:
All variables set before the fork() are accessible with the same values after the fork().
However, after a child changes a value, it doesn't affect the other fork() children, or the parent.

E.g. before fork(), $x == 3;
After fork() it is still 3, but if it is modified by one child, that won't affect the other children, or the parent.
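A tiny runnable demonstration of this copy-on-fork behaviour: the child gets its own copy of $x, so changing it does not affect the parent.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = 3;
my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {        # child: sees its own copy of $x
    $x = 99;            # modifies only the child's copy
    exit 0;
}
waitpid($pid, 0);       # parent: wait for the child to finish
print "parent still sees \$x = $x\n";   # prints: parent still sees $x = 3
```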

 
08718712060Author Commented:
So basically the fork() operates the same way as a Windows thread: all variables are duplicated to each thread, and each thread has its own independent copy of the initial vars?

**I apologise for my lack of knowledge on this, but I am mainly a Delphi developer who has only recently turned to PHP.**

Cheers
 
MercantilumCommented:
No problem, we're here to help.

Threads all share the same memory space, and therefore when one thread modifies a value, all threads see the new value.
After a fork, a new process is created, much as if you open two Notepads: when you modify the text of one, it does not modify the text of the second.
However, in that case you start two different Notepads, while with fork() you duplicate the process and its data.
So you end up with two different processes, each of them with its own memory space, and no worry that one will affect the other; however, when the child was born :) it inherited the same memory space, and therefore the variables and their values at the time of the fork() etc.

Please have a look to http://www.experts-exchange.com/Programming/Programming_Languages/C/Q_20891566.html for info.
 
08718712060Author Commented:
Hi Mercantilum,
Thanks for the referral, it was an interesting synopsis of multi-thread/process. I am going to take 10 minutes out while I clarify in my own mind exactly which way I think I should go, and then come back to you if that is OK? A well constructed query is obviously more use and less time-wasting than lots of piddly comments from me, lol.

PS: Don't suppose you do paid contracts, do you????

Cheers
Stu
BRB
 
08718712060Author Commented:
Hi Mercantilum,

Project Aim
At regular intervals a cron job launches a PHP script; this script finds several pieces of information in a MySQL db and returns crucial values such as the ids of pages it needs to investigate.

The script then cycles through these pages and returns the HTML into a var, which it then searches for a text match. At first this worked no problem, as there were only about 10 pages and 10 or so keywords to locate. We now need to search say 20 pages, and each page has to be searched for about 300 words. Needless to say we are running into timeouts, especially as each page has to be recalled with the associated keyword, i.e. 100 keywords and 10 pages = 100 * 10 requests!!

This could be overcome by overriding the Apache timeouts with PHP, but the problem comes if a page cannot be retrieved for whatever reason, i.e. site down etc.: the rest of the pages are not processed, as the script is just sat waiting for the current page, and you end up with the nightmare situation of a hung script with strict instructions NOT to time out :-(.

I thought the best way round this would be to create a multi-threaded process which is called by the cron job; the MT process would then obtain a list of all the pages and, for each page required, generate a thread. The child thread would then either create an array of keywords and do the searching, or launch a PHP script which could then return the results to the db. This way, even if one thread times out, we can basically scrap the process and cancel all the threads, which will in turn set the status back to "Not run" (or something similar), and this would obviously retry the next time the cron was run.

What we have at the moment is the flags set to "Running" to avoid the cron job re-launching that report_id, and then they stay there because things are hanging and timing out.

I really appreciate your input on this and, like I said, I am willing to pay should you agree to code it for me (obviously with a detailed spec / schematic etc.).

Cheers
Stu
 
MercantilumCommented:
Could you search for the words before the content is HTML (faster, to my mind)?

If you need to find 300 words in 20 pages, the best is maybe, before the first search, to build a tree of words, so that the search will be much quicker.
For instance, I don't know exactly what you want from the search, but simple hashes in Perl make a kind of tree.
Doing 300 times: if (exists $tree{$myword}) will be incredibly faster than a sequential search through all the words in 20 pages each time.

Reading your last comment, I'm not sure that threading or forking will really improve the situation.
Do you have several processors in your box? If not, 10 threads each doing a job of 1 second (pure CPU) will take the same time as 1 thread doing the 10 jobs in 10 seconds.
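A small sketch of the hash-as-index idea, with invented sample data: build the word tree once from the page text, and each keyword check is then a single hash lookup instead of rescanning every word of every page.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @page_words = qw(the quick brown fox jumps over the lazy dog);

my %tree;
$tree{$_}++ for @page_words;          # build the index once

for my $kw ('fox', 'cat') {           # per keyword: one O(1) lookup
    print "$kw found ($tree{$kw} times)\n" if exists $tree{$kw};
}
```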
 
08718712060Author Commented:
Hi Mercantilum,
I have now acquired a "Perl Cookbook" (O'Reilly) and all of your above comments are starting to become clearer. I will have a try and see how I get on, but failing that I may have to rent a developer (hint hint!)

Cheers for your help so far :-)
S
 
MercantilumCommented:
Feel free to ask in Perl :)
In Perl you can build an "index" very quickly (20 pages is not a lot in memory).
E.g.

# Read your pages and make an index/tree of words; each entry counts
# how many times the word appears in the pages.
# The script takes the lines of your pages as input (like: cat pages.txt | perl thisscript)
# It takes ... one Perl line

while (<STDIN>) { for (split) { $tree{$_}++; } }

# To check whether your words are in the pages (e.g. display the words found),
# with your list of words in an array @words,
# it takes ... one more line!

for (@words) { print "$_\n" if (exists $tree{$_}); }

Enjoy perl :)
 
08718712060Author Commented:
Hi Mercantilum,
I'm not sure if you are still monitoring this post, but if so I could do with a small favour :-P

I have decided on the following action (a bit bizarre, but I see it as a quick work-around). I will be running a PHP script from cron that carries out all the logging and filtering of records requiring report generation. I will then cycle through the relevant engine records and create 3 arrays (Keyphrase and System will not change, but the Engine array will be generated per engine record):
array1 : Engine Array
This will contain all the necessary information about the local engine to query.
array2 : Keyphrase Array
This will contain the list of keyphrases, already URL-encoded.
array3 : System Array
This will contain all the associated record ids etc. required for progress logging.

At this point I want to launch a Perl script, passing it the above 3 arrays. All the Perl script is required to do is generate the number of threads required (i.e. the number of keyphrases in array2). Each thread will launch a PHP script, passing it the 3 arrays as above.

This way I am only modifying the system as far as inserting the Perl script to generate multiple threads, which will hopefully avoid TO issues and also speed things up!

My question is: how can I do this simple task?
PS: I will open a new post and issue points if required.

Cheers
S
 
MercantilumCommented:
So you call PHP to call Perl to call PHP ...
Again, I wonder whether you should first call Perl, while your input is still the log, process it as much as you can, then finally call PHP to get the final output format.
Again, unless you have a powerful multiprocessor system, and depending on exactly how the data is to be processed, I'm not sure you need threads or forking.
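That said, if you do go the fork route, a minimal launcher sketch under assumptions (the PHP caller joins each of the three arrays with "|" and passes them as three command-line strings; report.php and the argument order are hypothetical names):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Expected call: perl launcher.pl ENGINE "kw1|kw2|kw3" "id1|id2"
my ($engine, $keyphrases, $system) = @ARGV;
my @keyphrase = defined $keyphrases ? split(/\|/, $keyphrases) : ();

for my $kw (@keyphrase) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                  # one child per keyphrase
        exec('php', '/path/to/report.php', $engine, $kw, $system)
            or die "exec failed: $!";
    }
}
while (wait() != -1) {}               # parent returns only when all are done
```

On the PHP side this could be invoked with something like shell_exec() with escapeshellarg() around each of the three joined strings, so delimiters and spaces survive the shell.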
 
08718712060Author Commented:
Hi Mercantilum
That's the way I am going to do it, which I know is a weird way of doing it, but there are so many functions within the existing system that I do not want to port to Perl just yet.
The final system is running on a multi-processor server now, so I will see how it all goes.

Cheers for all of your help,
S
