Solved

Create 20 threads each launching a php script

Posted on 2004-04-21
13
209 Views
Last Modified: 2013-12-25
Hi,
I have a system (Linux RH7.3) written in php, but part of the system has to go and parse html pages. The demand is now too high for the script and therefore I need to launch multiple versions of the php script. I cannot just increase the number of cron jobs, as each cycle has to be passed a reference number for the client it is to action.

What I need to do (I have no experience of perl or python) is create a cgi app I can call from a cron job. Once called, it creates a number of threads (var x, passed as a parameter), and each thread calls a copy of the php script. Once all the php scripts have returned their responses, the cgi closes.

Does this make sense? I hope so, lol.

How is it best to do this and can it indeed be done?

So, to summarise
How can I create a perl (or similar) script that is passed 2 vars (customer_id, thread_qty) and then has each thread call a php script? Once the php script completes, the thread is closed; once all threads are closed, the cgi stops.

Thanks for any help
S
Cheers
Question by:08718712060
13 Comments
 
LVL 10

Expert Comment

by:Mercantilum
ID: 10880717
If you call a php script, you may be better off calling fork() instead:

  See http://iis1.cps.unizar.es/Oreilly/perl/learn/ch14_04.htm

You can create n forked processes, each doing its php job; at the end of the script is a loop like

     while (wait() != -1) {}

which waits for all the processes to end before continuing.

It should be easy to implement.
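A minimal sketch of that fork-and-wait pattern, as an editorial illustration: the helper below forks one child per job and reaps them all before returning. The worker command shown in the usage comment ('php worker.php ...') is a placeholder, not part of the original system.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fork one child per job; each child replaces itself with the given
# command via exec(). The parent returns once every child has exited.
sub run_parallel {
    my @jobs = @_;                        # each job is an array ref: a command to run
    for my $job (@jobs) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                  # child process
            exec(@$job) or die "exec failed: $!";
        }
    }
    while (wait() != -1) {}               # reap children until none remain
}

# Hypothetical usage: one PHP worker per requested thread, e.g.
# run_parallel(map { ['php', 'worker.php', $customer_id, $_] } 1 .. $thread_qty);
```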
 

Author Comment

by:08718712060
ID: 10886346
I guess it's easy if you know what you are doing lol...

The fork() looks good, but how would I deal with the parameters? Can I just read them out as in php, ie if I pass vars $x and $y, are they available within the perl script as $x and $y, or do I need to retrieve them with some sort of param(n) command?

Cheers
 
LVL 10

Expert Comment

by:Mercantilum
ID: 10886360
All variables set before the fork() are accessible with the same values after the fork().
However, if a child then changes a value, it doesn't affect the other fork() children, or the father.

E.g. before fork(), $x == 3;
After fork() it is still 3, but if the child modifies it, that won't affect the other children, or the father.
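A tiny self-contained demonstration of this (the values are made up for illustration): the parent's copy of $x is unchanged by the child's assignment.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = 3;                     # set before the fork: both processes start with 3

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {               # child: sees $x == 3, then changes its own copy
    $x = 99;
    exit 0;
}

waitpid($pid, 0);              # parent: wait for the child to finish
print "parent still sees x = $x\n";    # the child's change is not visible here
```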
 

Author Comment

by:08718712060
ID: 10886426
So basically the fork() operates the same way as a windows thread: all variables are duplicated to each thread, and each thread has its own independent copy of the initial vars?

**I apologise for my lack of knowledge on this, but I am mainly a Delphi developer who has only recently turned to php.**

Cheers
 
LVL 10

Expert Comment

by:Mercantilum
ID: 10886490
No problem, we're here to help.

Threads all share the same memory space, and therefore when one thread modifies a value, all threads see the new value.
After a fork, a new process is created, the same as if you open two notepads: when you modify the text in one, it does not modify the text in the 2nd.
However, in that case you start 2 different notepads.
With fork(), you duplicate the process and its data.
So you end up with two different processes, each with its own memory space, and no worry that one will affect the other; however, when the child was born :) it inherited the same memory space, and therefore the variables and their values at the time of the fork() etc...

Please have a look at http://www.experts-exchange.com/Programming/Programming_Languages/C/Q_20891566.html for more info.
 

Author Comment

by:08718712060
ID: 10886567
Hi Mercantilum,
Thanks for the referral, it was an interesting synopsis of multi-threads/processes. I am going to take 10 minutes out whilst I clarify in my own mind exactly which way I should go, and then come back to you, if that is OK? A well-constructed query is obviously more use, and less time-wasting, than lots of piddly comments from me lol.

PS: Don't suppose you do paid contracts do you????

Cheers
Stu
BRB
 

Author Comment

by:08718712060
ID: 10886789
Hi Mercantilum,

Project Aim
At regular intervals a cron job launches a php script; this script finds several pieces of information in a mysql db and returns crucial values such as the ids of the pages it needs to investigate.

The script then cycles through these pages and reads the html into a var, which it then searches for a text match. At first this worked no problem, as there were only about 10 pages and 10 or so keywords to locate. We now need to search say 20 pages, and each page has to be searched for about 300 words. Needless to say we are running into timeouts, especially as each page has to be recalled with the associated keyword, ie 100 kw's and 10 pages = 100 * 10 requests!!

This could be overcome by overriding the apache timeouts with php, but the problem comes if a page cannot be retrieved for whatever reason, ie site down etc: the rest of the pages are not processed, as the script just sits waiting for the current page, and you end up with the nightmare situation of a hung script with strict instructions NOT to timeout :-(.

I thought the best way round this would be to create a multi-threaded process called by the cron job; the mt process would then obtain a list of all the pages and generate a thread for each page required. The child thread would then either create an array of keywords and do the searching, or launch a php script which could then return the results to the db. This way, even if one thread times out, we can basically scrap the process and cancel all threads, which will in turn set the status back to "Not run" (or something similar), and the report would obviously be retried the next time the cron was run.

What we have at the moment is flags set to "Running" to stop the cron job re-launching that report_id, and then they stay there because things are hanging and timing out.

I really appreciate your input on this and, like I said, I am willing to pay should you agree to code it for me (obviously with a detailed spec / schematic etc).

Cheers
Stu
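On the hung-page problem specifically: if the fetching were moved into the Perl side, a per-request timeout would stop one dead site from stalling the whole run. A sketch under that assumption, using the HTTP::Tiny core module; the URL handling below is an editorial stand-in, not part of the existing system.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;                            # in the Perl core since 5.14

my @page_urls = @ARGV;                     # stand-in for the page list from the db
my $ua = HTTP::Tiny->new(timeout => 15);   # give up on any one page after 15s

for my $url (@page_urls) {
    my $res = $ua->get($url);
    if ($res->{success}) {
        my $html = $res->{content};
        # ... search $html for the keywords and log the result ...
    } else {
        # a dead site costs at most the timeout; the other pages still run
        warn "skipping $url: $res->{status} $res->{reason}\n";
    }
}
```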
 
LVL 10

Accepted Solution

by:
Mercantilum earned 500 total points
ID: 10889969
Could you search for the words before it is HTML (faster to my mind)?

If you need to find 300 words in 20 pages, maybe the best is, before the first search, to make a tree of words, so that each search will be much quicker.
For instance, I don't know exactly what you want from the search, but simple hashes in Perl make a kind of tree.
Doing 300 times: if (exists $tree{$myword}) will be incredibly faster than a sequential search through all the words of 20 pages each time....

Reading your last comment, I'm not sure threading or forking will really improve the situation.
Do you have several processors in your box? If not, 10 threads each doing a 1-second job (pure cpu) will take the same 10 seconds as 1 thread doing the 10 jobs.
 

Author Comment

by:08718712060
ID: 10890218
Hi Mercantilum,
I have now acquired a "Perl Cookbook" (O'Reilly publishers) and all of your above comments are starting to become clearer. I will have a try and see how I get on, but failing that I may have to rent a developer (hint hint!)

Cheers for your help so far :-)
S
 
LVL 10

Expert Comment

by:Mercantilum
ID: 10890318
Feel free to ask in Perl :)
In perl you can build an "index" very quickly (20 pages is not a lot in memory).
E.g.

# Read your pages and make an index/tree of words; each entry counts the number of times the word appears
# The script takes the lines of your pages as input (like: cat pages.txt | perl thisscript)
# It takes ... one Perl line

while (<STDIN>) {  for (split) { $tree{$_}++;  } }

# To check if your words are in the pages (e.g. display the words in the pages)
# Your list of words is in an array @words
# It takes... one more line!

for (@words) { print "$_\n" if (exists $tree{$_}); }

Enjoy perl :)
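Joined together as one self-contained script, the idea looks like this (the sample page text and keyword list below are invented for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @words = qw(sale discount shipping);     # stand-in for the ~300 keywords

# Stand-in for the fetched page text; in practice this would be
# the concatenated content of the 20 pages.
my $pages = "big sale today free shipping on all orders";

# Build the index/tree: one hash entry per distinct word, counting occurrences
my %tree;
$tree{$_}++ for split ' ', $pages;

# Each lookup is a single hash probe instead of a scan of every word
for my $w (@words) {
    print "$w: found $tree{$w} time(s)\n" if exists $tree{$w};
}
```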
 

Author Comment

by:08718712060
ID: 10904878
Hi Mercantilum,
I'm not sure if you are still monitoring this post, but if so I could do with a small favour :-P

I have decided on the following action (a bit bizarre, but I see it as a quick work-around). I will be running a php script from cron that carries out all the logging and filtering of records requiring report generation. I will then cycle through the relevant engine records and create 3 arrays (Keyphrase and System will not change, but the engine array will be generated per engine record):
array1 : Engine Array
This will contain all the necessary information about the local engine to query
array2 : Keyphrase Array
This will contain the list of keyphrases, already url-encoded
array3 : System Array
This will contain all the associated record ids etc required for progress logging.

At this point I want to launch a perl script, passing it the above 3 arrays. All the perl script is required to do is generate the number of threads required (ie the number of keyphrases in array2). Each thread will launch a php script, passing it the 3 arrays as above.

This way I am only having to modify the system as far as inserting the perl script to generate multiple threads, which will hopefully avoid TO issues and also speed things up!

My question is: how can I do this simple task?
PS: I will open a new post and issue points if required.

Cheers
S
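A sketch of what that launcher could look like, assuming the three arrays are flattened into command-line arguments (keyphrases comma-separated) and each child runs a hypothetical search_worker.php; all names here are placeholders, not the real system's:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical usage: launcher.pl <engine-data> <kp1,kp2,...> <system-ids>
my ($engine, $keyphrases, $system_ids) = @ARGV;
my @phrases = split /,/, ($keyphrases // '');

for my $kp (@phrases) {
    my $pid = fork();                 # one child per keyphrase
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # the child becomes the PHP worker for this single keyphrase
        exec('php', 'search_worker.php', $engine, $kp, $system_ids)
            or die "exec failed: $!";
    }
}

while (wait() != -1) {}               # do not exit until every worker is done
```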
 
LVL 10

Expert Comment

by:Mercantilum
ID: 10905290
So you call php to call perl to call php ...
Again, I wonder whether you should first call perl, while your input is still the log, process it as much as you can, and then finally call php to get the final output format.
Again, unless you have a powerful multiprocessor system, and depending on exactly how the data is to be processed, I'm not sure you need threads or forking.
 

Author Comment

by:08718712060
ID: 10909320
Hi Mercantilum
That's the way I am going to do it, which I know is weird, but there are so many functions within the existing system that I do not want to port to perl just yet.
The final system is now running on a multi-processor server, so I will see how it all goes.

Cheers for all of your help,
S
