Solved

Create 20 threads each launching a php script

Posted on 2004-04-21
13
211 Views
Last Modified: 2013-12-25
Hi,
I have a system (Linux RH7.3) written in PHP; part of the system goes out and parses HTML pages. Demand is now too high for a single script, so I need to launch multiple copies of the PHP script. I cannot just add more cron jobs, because each run has to be passed the reference number of the client it is to act on.

What I need to do (I have no experience of Perl or Python) is create a CGI app I can call from a cron job. Once called, it creates a number of threads (a var x, passed as a parameter), and each thread calls a copy of the PHP script. Once all the PHP scripts have returned their responses, the CGI closes?

Does this make sense? I hope so lol.

How is it best to do this and can it indeed be done?

So, to summarise
How can I create a Perl (or similar) script that is passed 2 vars (customer_id, thread_qty), where each thread then goes and calls a PHP script? Once the PHP script completes, its thread is closed; once all threads are closed, the CGI stops.

Thanks for any help
Cheers
S
Question by:08718712060
13 Comments
 
LVL 10

Expert Comment

by:Mercantilum
ID: 10880717
If you need to launch several copies of a PHP script, calling fork() may serve you better:

  See http://iis1.cps.unizar.es/Oreilly/perl/learn/ch14_04.htm

You can create n forked processes, each doing its PHP job; at the end of the script a loop like

     while (wait() != -1) {}

waits for all the processes to end before continuing.

It should be easy to implement.
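A minimal, self-contained sketch of that pattern: one child per client reference, then the wait() loop. The PHP script path and the client ids here are assumptions, just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One forked child per client reference (ids are made up for the example).
my @client_ids = (101, 102, 103, 104, 105);

for my $client_id (@client_ids) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: in the real system this would be something like
        #   exec 'php', '/path/to/parse.php', $client_id;
        # (hypothetical script path and argument handling).
        exit 0;
    }
}

# Parent: wait() returns -1 once every child has been reaped.
my $reaped = 0;
$reaped++ while wait() != -1;
print "$reaped children finished\n";
```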
 

Author Comment

by:08718712060
ID: 10886346
I guess easy if you know what you are doing lol...

The fork() looks good, but how would I deal with the parameters? Can I just read them out as in PHP, i.e. if I pass the vars $x and $y, are they available within the Perl script as $x and $y, or do I need to retrieve them with some sort of param(n) command?

Cheers
 
LVL 10

Expert Comment

by:Mercantilum
ID: 10886360
All variables set before the fork() are accessible with the same values after the fork().
However, once a child changes a value, the change doesn't affect the other fork() children, or the parent.

E.g. before fork(), $x == 3;
After fork() it is still 3, but if the child modifies it, the change won't affect the other children, or the parent.
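A tiny sketch showing both points at once: command-line parameters arrive in @ARGV (there is no param(n) style lookup needed for a plain script), and each forked child gets its own copy of the data. The variable names are just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parameters passed on the command line arrive in @ARGV,
# e.g.  perl demo.pl 3   ->   $ARGV[0] eq '3'
my $x = @ARGV ? $ARGV[0] : 3;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: changing $x here alters only the child's copy.
    $x = 99;
    exit 0;
}

waitpid($pid, 0);
print "parent still sees x = $x\n";   # unchanged by the child
```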
 

Author Comment

by:08718712060
ID: 10886426
So basically fork() operates the same way as a Windows thread: all variables are duplicated into each thread, and each thread has its own independent copy of the initial vars?

**I apologise for my lack of knowledge on this, but I am mainly a Delphi developer who has only recently turned to PHP.**

Cheers
 
LVL 10

Expert Comment

by:Mercantilum
ID: 10886490
No problem, we're here to help.

Threads all share the same memory space, so when one thread modifies a value, all threads see the new value.
After a fork(), a new process is created: the same as if you open two Notepads, where modifying the text in one does not modify the text in the second.
However, in that case you start two different Notepads, while with fork() you duplicate the process and its data.
So you end up with two different processes, each with its own memory space, and no worry that one will affect the other; however, when the child was born :) it inherited the same memory space, and therefore the variables and their values at the time of the fork(), etc.

Please have a look at http://www.experts-exchange.com/Programming/Programming_Languages/C/Q_20891566.html for more info.
 

Author Comment

by:08718712060
ID: 10886567
Hi Mercantilum,
Thanks for the referral; it was an interesting synopsis of multi-thread/process. I am going to take 10 minutes out while I clarify in my own mind exactly which way I think I should go, and then come back to you if that is OK? A well-constructed query is obviously more use and less time-wasting than lots of piddly comments from me lol.

PS: Don't suppose you do paid contracts, do you????

Cheers
Stu
BRB
 

Author Comment

by:08718712060
ID: 10886789
Hi Mercantilum,

Project Aim
At regular intervals a cron job launches a PHP script; this script finds several pieces of information in a MySQL db and returns crucial values such as the ids of the pages it needs to investigate.

The script then cycles through these pages and reads the HTML into a var, which it then searches for a text match. At first this worked no problem, as there were only about 10 pages and 10 or so keywords to locate. We now need to search, say, 20 pages, and each page has to be searched for about 300 words. Needless to say we are running into timeouts, especially as each page has to be re-requested for each associated keyword, ie 100 kw's and 10 pages = 100 * 10 requests!!

This could be overcome by overriding the Apache timeouts with PHP, but the problem comes if a page cannot be retrieved for whatever reason (ie site down etc): the rest of the pages are not processed, as the script just sits waiting for the current page, and you end up with the nightmare situation of a hung script with strict instructions NOT to time out :-(.

I thought the best way round this would be to create a multi-threaded process called by the cron job. The MT process would obtain a list of all the pages and generate a thread for each page required. The child thread would then either build an array of keywords and do the searching itself, or launch a PHP script which would return the results to the db. This way, even if one thread times out, we can basically scrap the process and cancel all the threads, which will in turn set the status back to "Not run" (or something similar), and this would obviously be retried the next time the cron was run.

What we have at the moment is flags set to "Running" (to stop the cron job re-launching that report_id), and then they stay there because things are hanging and timing out.

I really appreciate your input on this, and like I said I am willing to pay should you agree to code it for me (obviously with a detailed spec / schematic etc).

Cheers
Stu
 
LVL 10

Accepted Solution

by:
Mercantilum earned 500 total points
ID: 10889969
Could you search for the words before the content is rendered as HTML (faster, to my mind)?

If you need to find 300 words in 20 pages, the best approach is maybe, before the first search, to build a tree of words, so that each lookup is much quicker.
For instance, I don't know exactly what you want from the search, but simple hashes in Perl make a kind of tree.
Doing, 300 times, if (exists $tree{$myword}) will be incredibly faster than a sequential search through all the words of 20 pages each time....

Reading your last comment, I'm not sure that threading or forking will really improve the situation.
Do you have several processors in your box? If not, 10 threads each doing a 1-second job (pure CPU) will take the same time as 1 thread doing the 10 jobs sequentially: 10 seconds.
 

Author Comment

by:08718712060
ID: 10890218
Hi Mercantilum,
I have now acquired a "Perl Cookbook" (O'Reilly) and all of your above comments are starting to become clearer. I will have a try and see how I get on, but failing that I may have to rent a developer (hint hint!)

Cheers for your help so far :-)
S
 
LVL 10

Expert Comment

by:Mercantilum
ID: 10890318
Feel free to ask in Perl :)
In Perl you can build an "index" very quickly (20 pages is not a lot in memory).
E.g.

# Read your pages and build an index/tree of words; each entry counts how many times the word appears
# The script takes the lines of your pages as input (like: cat pages.txt | perl thisscript)
# It takes ... one Perl line

while (<STDIN>) { for (split) { $tree{$_}++; } }

# To check whether your words are in the pages (e.g. display the words found in the pages)
# Your list of words is in an array @words
# It takes... one more line!

for (@words) { print "$_\n" if (exists $tree{$_}); }

Enjoy perl :)
 

Author Comment

by:08718712060
ID: 10904878
Hi Mercantilum,
I'm not sure if you are still monitoring this post, but if so I could do with a small favour :-P

I have decided on the following course of action (a bit bizarre, but I see it as a quick work-around). I will be running a PHP script from cron that carries out all the logging and filtering of records requiring report generation. I will then cycle through the relevant engine records and create 3 arrays (Keyphrase and System will not change, but the engine array will be generated per engine record):
array1 : Engine Array
This will contain all the necessary information about the local engine to query.
array2 : Keyphrase Array
This will contain the list of keyphrases, already URL-encoded.
array3 : System Array
This will contain all the associated record ids etc required for progress logging.

At this point I want to launch a Perl script, passing it the above 3 arrays. All the Perl script is required to do is generate the number of threads required (ie the number of keyphrases in array2). Each thread will launch a PHP script, passing it the 3 arrays as above.

This way I only have to modify the system as far as inserting the Perl script to generate multiple threads, which will hopefully avoid timeout issues and also speed things up!

My question is: how can I do this simple task?
PS: I will open a new post and issue points if required.

Cheers
S
 
LVL 10

Expert Comment

by:Mercantilum
ID: 10905290
So you call PHP to call Perl to call PHP ...
Again, I wonder whether you should first call Perl while your input is still the log, process it as much as you can there, and only finally call PHP to produce the final output format.
Again, unless you have a powerful multiprocessor system, and depending on exactly how the data is to be processed, I'm not sure you need threads or forking.
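If you do go the Perl-launcher route, one simple way to hand the arrays across process boundaries is to serialise each one into a single delimited argument. This is only a sketch: the script names, the sample data, and the '|' delimiter are all assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Serialise each array into one delimited command-line argument.
my @keyphrases = ('blue+widgets', 'cheap+widgets', 'widget+repair');
my @system_ids = (7, 8, 9);

my $kp_arg  = join '|', @keyphrases;
my $sys_arg = join '|', @system_ids;

# Each forked child would then launch PHP along the lines of:
#   exec 'php', 'search.php', $engine_arg, $kp_arg, $sys_arg;
# and the PHP side would recover the list with explode('|', $argv[2]).

# Round-trip check on the Perl side:
my @back = split /\|/, $kp_arg;
print scalar(@back), " keyphrases recovered\n";
```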
 

Author Comment

by:08718712060
ID: 10909320
Hi Mercantilum
That's the way I am going to do it, which I know is a weird way of doing it, but there are so many functions within the existing system that I do not want to port to Perl just yet.
The final system is running on a multi processor server now so I will see how it all goes.

Cheers for all of your help,
S

Question has a verified solution.
