Solved

Create 20 threads each launching a php script

Posted on 2004-04-21 · 206 Views · Last Modified: 2013-12-25
Hi,
I have a system (Linux RH7.3) written in PHP, but part of the system has to go and parse HTML pages. The demand is now too high for the script, and therefore I need to launch multiple versions of the PHP script. I cannot just increase the number of cron jobs, as each cycle has to be passed a reference number for the client it is to action.

What I need to do (no experience of Perl or Python) is create a CGI app I can call from a cron job, which, once called, creates a number of threads (var x, passed by the call as a parameter), each thread calling a copy of the PHP script. Once all the PHP scripts have returned their responses, the CGI closes.

Does this make sense? I hope so lol.

How is it best to do this and can it indeed be done?

So, to summarise
How can I create a Perl (or similar) script that is passed 2 vars (customer_id, thread_qty), and have each thread go and call a PHP script? Once the PHP script completes, the thread is closed; once all threads are closed, the CGI stops.

Thanks for any help
S
Cheers
Question by:08718712060
Expert Comment by:Mercantilum
If you call a PHP script, you may do better to call fork() instead:

  See http://iis1.cps.unizar.es/Oreilly/perl/learn/ch14_04.htm

You can create n forked processes, each doing its PHP job; at the end of the script a loop like

     while (wait() != -1) {}

waits for all the processes to end before continuing.

It should be easy to implement.
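
For example, a minimal sketch (the PHP binary path and job.php are placeholders for your own paths; customer_id and thread_qty come in on the command line, as in your summary):

    #!/usr/bin/perl
    # Launch $n copies of a PHP script in parallel, then wait for all of them.
    my ($customer, $n) = @ARGV;      # customer_id, thread_qty
    $n ||= 20;

    for my $i (1 .. $n) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                                        # child process
            system('/usr/bin/php', 'job.php', $customer, $i);   # run one PHP job
            exit 0;                                             # child must exit itself
        }
    }

    while (wait() != -1) {}    # parent: returns only when every child is done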
Author Comment by:08718712060
I guess it's easy if you know what you are doing lol...

The fork() looks good, but how would I deal with the parameters? Can I just read them out as in PHP, i.e. if I pass vars $x and $y, are they available within the Perl script as $x and $y, or do I need to retrieve them with some sort of param(n) command?

Cheers
Expert Comment by:Mercantilum
All variables set before the fork() are accessible with the same values after the fork().
However, if a child changes a value, it doesn't affect the other fork() children, or the parent.

E.g. before fork(), $x == 3;
after fork() it is still 3, but if the child modifies it, the change won't affect the other children, or the parent.
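
For example, a small demo:

    my $x = 3;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {                 # child: has its own copy of $x
        $x = 99;
        print "child:  x = $x\n";    # prints 99
        exit 0;
    }

    waitpid($pid, 0);                # parent waits for the child
    print "parent: x = $x\n";        # still prints 3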
Author Comment by:08718712060
So basically fork() operates the same way as a Windows thread: all variables are duplicated to each thread, and each thread has its own independent copy of the initial vars?

**I apologise for my lack of knowledge on this, but I am mainly a Delphi developer who has only recently turned to PHP.**

Cheers
Expert Comment by:Mercantilum
No problem, we're here to help.

Threads all share the same memory space, so when one thread modifies a value, all threads see the new value.
After a fork(), a new process is created; it's like opening two Notepads: when you modify the text of one, it does not modify the text of the second.
In that case, though, you started two different Notepads, while with fork() you duplicate the process and its data.
So you end up with two different processes, each with its own memory space, and no worry that one will affect the other; however, when the child was born :) it inherited the same memory space, and therefore the variables and their values at the time of the fork(), etc...

Please have a look at http://www.experts-exchange.com/Programming/Programming_Languages/C/Q_20891566.html for more info.
Author Comment by:08718712060
Hi Mercantilum,
Thanks for the referral, it was an interesting synopsis of multi-threading vs multi-processing. I am going to take 10 minutes out whilst I clarify in my own mind exactly which way I think I should go, and then come back to you if that is OK? A well-constructed query is obviously more use, and less time-wasting, than lots of piddly comments from me lol.

PS: Don't suppose you do paid contracts, do you????

Cheers
Stu
BRB
Author Comment by:08718712060
Hi Mercantilum,

Project Aim
At regular intervals a cron job launches a PHP script; this script finds several pieces of information in a MySQL db and returns crucial values such as the IDs of the pages it needs to investigate.

The script then cycles through these pages and reads the HTML into a var, which it then searches for a text match. The problem we face is that at first this worked no problem, as there were only about 10 pages and 10 or so keywords to locate. We now need to search say 20 pages, and each page has to be searched for about 300 words. Needless to say, we are running into timeouts, especially as each page has to be re-fetched for each associated keyword, i.e. 100 keywords and 10 pages = 100 * 10 = 1,000 requests!!

This could be overcome by overriding the Apache timeouts with PHP, but the problem comes if a page cannot be retrieved for whatever reason (i.e. site down, etc.): the rest of the pages are not processed, as the script just sits waiting for the current page, and you end up with the nightmare situation of a hung script with strict instructions NOT to time out :-(.

I thought the best way round this would be to create a multi-threaded process called by the cron job; the process would obtain a list of all pages and generate a thread for each page required. The child thread would then either create an array of keywords and do the searching, or launch a PHP script which could return the results to the db. This way, even if one thread times out, we can basically scrap the process and cancel all threads, which will in turn set the status back to "Not run" (or something similar), and it would obviously retry the next time the cron was run.

What we have at the moment is the flags set to "Running" (to avoid the cron job re-launching that report_id), and then they stay there because things are hanging and timing out.

I really appreciate your input on this and, like I said, I am willing to pay should you agree to code it for me (obviously with a detailed spec / schematic etc.).

Cheers
Stu
Accepted Solution by:Mercantilum (earned 500 total points)
Could you search for the words in the raw text, before it is HTML (faster, to my mind)?

If you need to find 300 words in 20 pages, the best approach may be to build a tree of words before the first search, so that each lookup is much quicker.
For instance (I don't know exactly what you want from the search), simple hashes in Perl make a kind of tree.
Doing if (exists $tree{$myword}) 300 times will be incredibly faster than a sequential search through all the words of all 20 pages each time.
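
A rough sketch of the idea ($page_text and @words are just illustrative names):

    my $page_text = "the stripped text of your 20 pages ...";
    my @words     = ('some', 'keywords');    # your ~300 keywords

    my %tree;
    $tree{$_}++ for split ' ', $page_text;   # build the index once

    for my $word (@words) {                  # then 300 cheap hash lookups
        print "found: $word ($tree{$word} times)\n" if exists $tree{$word};
    }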

Reading your last comment, I'm not sure threading or forking will really improve the situation.
Do you have several processors in your box? If not, 10 threads each doing a 1-second job (pure CPU) will take the same time as 1 thread doing the 10 jobs in 10 seconds.
Author Comment by:08718712060
Hi Mercantilum,
I have now acquired a "Perl Cookbook" (O'Reilly) and all of your above comments are starting to become clearer. I will have a try and see how I get on, but failing that I may have to rent a developer (hint hint!)

Cheers for your help so far :-)
S
Expert Comment by:Mercantilum
Feel free to ask in Perl :)
In Perl you can build an "index" very quickly (20 pages is not a lot in memory).
E.g.

# Read your pages and build an index/tree of words; each entry holds
# the number of times the word appears.
# The script takes the lines of your pages as input (like: cat pages.txt | perl thisscript)
# It takes ... one Perl line

while (<STDIN>) { for (split) { $tree{$_}++ } }

# To check which of your words are in the pages (e.g. display the words found)
# Your list of words is in an array @words
# It takes ... one more line!

for (@words) { print "$_\n" if exists $tree{$_}; }

Enjoy Perl :)
Author Comment by:08718712060
Hi Mercantilum,
I'm not sure if you are still monitoring this post, but if so I could do with a small favour :-P

I have decided on the following action (a bit bizarre, but I see it as a quick work-around). I will be running a PHP script from cron that carries out all the logging and filtering of records requiring report generation. I will then cycle through the relevant engine records and create 3 arrays (the keyphrase and system arrays will not change, but the engine array will be generated per engine record):
array1 : Engine Array
This will contain all the necessary information about the local engine to query.
array2 : Keyphrase Array
This will contain the list of keyphrases, already URL-encoded.
array3 : System Array
This will contain all the associated record IDs etc. required for progress logging.

At this point I want to launch a Perl script, passing it the above 3 arrays. All the Perl script is required to do is generate the number of threads required (i.e. the number of keyphrases in array2). Each thread will launch a PHP script, passing it the 3 arrays as above.

This way I am only modifying the system as far as inserting the Perl script to generate multiple threads, which will hopefully avoid timeout issues and also speed things up!

My question is: how can I do this simple task?
PS: I will open a new post and issue points if required.

Cheers
S
Expert Comment by:Mercantilum
So you call PHP to call Perl to call PHP ...
Again, I wonder whether you should first call Perl, while your input is still the raw log, process it as much as you can, and only then call PHP to produce the final output format.
And again, unless you have a powerful multiprocessor system, and depending on exactly how the data is to be processed, I'm not sure you need threads or forking.
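
That said, if you do want the wrapper, the mechanics could look like this (a rough sketch; I'm assuming the three arrays arrive as comma-separated command-line arguments, and worker.php stands in for your PHP script):

    #!/usr/bin/perl
    # Hypothetical usage: launcher.pl <engine> <keyphrases> <system>
    my ($engine, $keyphrases, $system) = @ARGV;

    my @pids;
    for my $kp (split /,/, $keyphrases) {    # one child per keyphrase
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                     # child: hand off to the PHP script
            exec '/usr/bin/php', 'worker.php', $engine, $kp, $system
                or die "exec failed: $!";
        }
        push @pids, $pid;
    }
    waitpid($_, 0) for @pids;                # exit only when every child is done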
Author Comment by:08718712060
Hi Mercantilum
That's the way I am going to do it, which I know is a weird way of doing it, but there are so many functions within the existing system that I do not want to port to Perl just yet.
The final system is running on a multi-processor server now, so I will see how it all goes.

Cheers for all of your help,
S