t3chguy
asked on
Optimizing Script Performance
I have a script that I've written in PHP.
I wrote a script for a company, but it's taking an extremely long time to run. All table columns are indexed properly, yet the script can sometimes take up to 20 minutes.
I can provide code examples if needed, but first allow me to explain how the script runs to see if it can be optimized or organized a little better.
A user selects a start date, end date, and location from a form. On submission, an AJAX call passes the POST variables to the main script, run_audit.php.
The script begins by connecting to an external API, which returns XML data. It loops through the XML and inserts records into the database. I then make another call to the API to return different data for the specified location, and update the table I just wrote to with the correct type for each bus record I inserted. Unfortunately these are two separate API calls and actions.
I notice that the update step takes a little while to run. I've tried pulling the bus numbers out and looping through them to update, and I've also tried looping through the returned API results and updating. Both take about the same time.
The program then makes a call to an MSSQL server and runs a query that returns the employee name, employee id, and job title. This part takes a little while as well, but I'm assuming that the MSSQL tables I'm pulling from are not indexed properly [this part is not my issue, as I'm just connecting to another person's table].
I put the results of the employee query into a while loop. Within this while loop, I have a foreach loop that iterates over each day of the week.
Now within each day, I have to run queries to get the data initially inserted from the API call.
For each person, for each day of the week, I run queries for the bus number they drove, the time it went OUT of the lot, and the time it came back IN, for each day segment.
Day Segments are broken up like this: AM, Mid, PM, and Late PM.
The loop is iterating over each employee [~200] per day [5 days] per segment [4 per day] with both an IN and an OUT query.
By my calculations, my script is running about 8,000 queries (200 × 5 × 4 × 2), which could be the reason the script is extremely slow.
As each piece of data is collected, I add it to an array, then run json_encode on that array to pass it back to the JavaScript, which builds HTML tables dynamically from the returned data.
I'm wondering if there is a better way to structure the process so this script doesn't take 20 minutes.
I have attached an image below of the JSON array that is being returned from the script.
Again, I know without code it's almost impossible to visualize, and will be happy to provide if needed.
Thank you in advance.
SH.png
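One common way to attack a per-row pattern like the 8,000-query loop described above is to fetch everything for the date range in a single query and index the rows in memory. The sketch below is only illustrative: the table and column names (bus_events, event_day, segment, direction, etc.) are assumptions, not the asker's actual schema.

```php
<?php
// Hypothetical sketch: fetch every IN/OUT event for the whole date range in
// ONE query, then index the rows by employee, day, segment, and direction.
$sql = "SELECT employee_id, event_day, segment, direction, bus_number, event_time
        FROM bus_events
        WHERE event_day BETWEEN ? AND ? AND location_id = ?";
$stmt = mysqli_prepare($link, $sql);
mysqli_stmt_bind_param($stmt, 'ssi', $start_date, $end_date, $location_id);
mysqli_stmt_execute($stmt);
$result = mysqli_stmt_get_result($stmt);

$lookup = [];
while ($row = mysqli_fetch_assoc($result)) {
    // e.g. $lookup[42]['2016-03-01']['AM']['OUT'] = ['bus' => ..., 'time' => ...]
    $lookup[$row['employee_id']][$row['event_day']][$row['segment']][$row['direction']] = [
        'bus'  => $row['bus_number'],
        'time' => $row['event_time'],
    ];
}

// The 200 x 5 x 4 x 2 loop then becomes pure array access, no queries:
// $am_out = $lookup[$emp_id][$day]['AM']['OUT'] ?? null;
```

With this shape, the nested employee/day/segment loop only reads from `$lookup`, so the query count drops from thousands to a handful regardless of how many employees there are.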
ASKER
In the example where there are 200 employees at a location, 1,000 entries exist in the final array (200 employees × 5 days), with 44 keys nested in each of those 1,000 entries.
I have not given EXPLAIN SELECT a shot yet, but I will try that shortly and post back later this evening with more detailed information.
Maybe show us the CREATE TABLE statements, too. I have a feeling that anything getting data about 200 employees should be lightning fast.
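For reference, EXPLAIN is run by prefixing the SELECT. A minimal sketch follows; the query shown is a placeholder, not the actual audit query, though auto_pay_data and fleet_number appear in the code posted later in this thread.

```php
<?php
// Hedged sketch: run EXPLAIN on one of the per-segment queries to see which
// indexes MySQL actually uses. Replace the placeholder query with a real one.
$result = mysqli_query(
    $link,
    "EXPLAIN SELECT bus_number, in_time
     FROM auto_pay_data
     WHERE fleet_number = '1234'"
) or die(mysqli_error($link));

while ($row = mysqli_fetch_assoc($result)) {
    // Watch the 'key' and 'rows' columns: key = NULL or a large rows estimate
    // means the query is scanning the table instead of using an index.
    print_r($row);
}
```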
ASKER
Will do, thank you!
ASKER
Sorry it has taken so long to get back to this. I've sort of torn the script apart and put it together piece by piece.
I'm thinking the problem is I am connecting to both a MySQL and MSSQL database server.
My initial code is a call to an API which returns data in 1,089ms.
Then I loop through that data and if the bus is a certain subtype, I run an update query.
# Loop through the new array to update bus sub-types.
foreach ($asset_array['asset'] as $key => $arr) {
    if ($arr['subtype'] == 'Wheelchair') {
        echo $arr['fleet'] . ' - ' . $arr['subtype'] . '<br />';
        $upd_bus_types = mysqli_query($link, "UPDATE auto_pay_data SET bus_type = 'Wheelchair' WHERE fleet_number = '" . $arr['fleet'] . "'")
            or die("Cannot update bus type - " . $arr['fleet'] . ": " . mysqli_error($link));
    }
}
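The loop above issues one UPDATE per wheelchair bus. As a hedged sketch (assuming the fleet numbers can be parameterized, and noting that a prepared statement also removes the string-interpolation risk in the original), the same work can be done in a single round trip:

```php
<?php
// Sketch: collect the wheelchair fleet numbers first, then issue ONE UPDATE
// with an IN (...) list instead of one UPDATE per bus.
$fleets = [];
foreach ($asset_array['asset'] as $arr) {
    if ($arr['subtype'] == 'Wheelchair') {
        $fleets[] = $arr['fleet'];
    }
}

if ($fleets) {
    // Build one '?' placeholder per fleet number.
    $placeholders = implode(',', array_fill(0, count($fleets), '?'));
    $stmt = mysqli_prepare(
        $link,
        "UPDATE auto_pay_data SET bus_type = 'Wheelchair' WHERE fleet_number IN ($placeholders)"
    );
    // Bind every fleet number as a string ('s' repeated once per value).
    mysqli_stmt_bind_param($stmt, str_repeat('s', count($fleets)), ...$fleets);
    mysqli_stmt_execute($stmt);
}
```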
The API plus the DB updates completes in about 2400 ms.
From here, I go back to MSSQL and run a query to pull all employees out of a table and loop through them.
I first run the following query (field / table names changed for security):
$emp_sql = sqlsrv_query(
    $mssql_link,
    "SELECT [tbl_Profile].i_EmployeeID, EmployeeFirstName, EmployeeLastName, EmployeeJobTitle " .
    "FROM [dbname].[dbo].[tbl_Profile] " .
    "LEFT JOIN [dbname].[dbo].[tbl_EmpProfile] " .
    "ON [dbname].[dbo].[tbl_Profile].i_EmployeeID = [" . $cic_db_name . "].[dbo].[tbl_EmpProfile].i_EmployeeID " .
    "WHERE EmployeeJobStatus = '1' AND (i_EmployeeJobTitle = '1' OR i_EmployeeJobTitle = '6')",
    array(),
    array("Scrollable" => "buffered")
);
This returns 246 rows. I'm able to loop through two different employee time tables as well as the bus number table, and this by itself returns in 7493ms.
The problem comes in when I run both the API + update query in addition to the previous mssql script, the script just takes a long, long time to run...about 20 minutes.
It's important to note that within the employee loop I just posted, I'll run 3 queries in mssql and about 10 queries in mysql.
Do I need to do something differently when I'm switching between mysql queries and mssql queries?
Thanks.
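Before restructuring anything, it may help to time each stage explicitly rather than guess which connection is slow. A minimal sketch using microtime(); the comments mark where the existing stages would go:

```php
<?php
// Sketch: wrap each stage in timers so the slow stage is measured, not guessed.
// No changes to the queries themselves.
$t0 = microtime(true);
// ... stage 1: API call + MySQL bus-type updates ...
$t1 = microtime(true);
// ... stage 2: MSSQL employee query + per-employee loop ...
$t2 = microtime(true);

error_log(sprintf(
    "api+mysql: %.1f s, mssql loop: %.1f s",
    $t1 - $t0,
    $t2 - $t1
));
```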
ASKER
After digging into this a lot more, allow me to explain what is actually happening.
As I previously stated, I'm looping through ~246 employees, and within that loop, I'm running queries for each day of the week.
As an example, my loop looks like this:
foreach ($employee as $v) {
    foreach ($day as $d) {
        // Do a bunch of queries, data manipulations, etc.
    }
}
In addition, right before I close the "day" loop, I'm building an array to store all the data generated on this page.
At the end of the loop, I do a
json_encode($report_data)
, which returns data back to the initial AJAX call and displays it in an HTML table. The queries and data manipulation themselves are comparatively fast: about 30 seconds to run all of them.
The bottleneck occurs when I build that final array. It seems to come to a complete standstill, in fact.
Again, in that second loop:
foreach ($employee as $v) {
    foreach ($day as $d) {
        // Do a bunch of queries, data manipulations, etc.
        // ...
        $report_data[] = array(
            # Common fields
            "day" => $d,
            "date" => $date,
            "first_name" => trim($row['c_EmployeeFirstName']),
            "last_name" => trim($row['c_EmployeeLastName']),
            "employee_id" => $row['i_EmployeeID'],
            "job_title" => trim($job_title),
            # AM segment specific
            "am_bus_number" => $am_bus_number,
            "am_in_time" => $am_in_time,
            "am_out_time" => $am_out_time,
            "am_zonar_out" => $am_zonar_out,
            "am_time_diff_in" => $am_time_diff_in,
            "am_zonar_in" => $am_zonar_in,
            "am_time_diff_out" => $am_time_diff_out,
            "am_zonar_out_violation" => $am_zonar_out_violation,
            "am_zonar_in_violation" => $am_zonar_in_violation,
            # Mid segment specific
            "mid_bus_number" => $mid_bus_number,
            "mid_in_time" => $mid_in_time,
            "mid_out_time" => $mid_out_time,
            "mid_zonar_out" => $mid_zonar_out,
            "mid_time_diff_in" => $mid_time_diff_in,
            "mid_zonar_in" => $mid_zonar_in,
            "mid_time_diff_out" => $mid_time_diff_out,
            "mid_zonar_out_violation" => $mid_zonar_out_violation,
            "mid_zonar_in_violation" => $mid_zonar_in_violation,
            # PM segment specific
            "pm_bus_number" => $pm_bus_number,
            "pm_in_time" => $pm_in_time,
            "pm_out_time" => $pm_out_time,
            "pm_zonar_out" => $pm_zonar_out,
            "pm_time_diff_in" => $pm_time_diff_in,
            "pm_zonar_in" => $pm_zonar_in,
            "pm_time_diff_out" => $pm_time_diff_out,
            "pm_zonar_out_violation" => $pm_zonar_out_violation,
            "pm_zonar_in_violation" => $pm_zonar_in_violation,
            # LPM segment specific
            "lpm_bus_number" => $lpm_bus_number,
            "lpm_in_time" => $lpm_in_time,
            "lpm_out_time" => $lpm_out_time,
            "lpm_zonar_out" => $lpm_zonar_out,
            "lpm_time_diff_in" => $lpm_time_diff_in,
            "lpm_zonar_in" => $lpm_zonar_in,
            "lpm_time_diff_out" => $lpm_time_diff_out,
            "lpm_zonar_out_violation" => $lpm_zonar_out_violation,
            "lpm_zonar_in_violation" => $lpm_zonar_in_violation
        );
    }
}
How can I get this data put into the array quicker, so that I can still echo it through with json_encode and pass it back to my html page?
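If appending to $report_data truly stalls, one thing worth checking is whether the script is approaching PHP's memory_limit, since a large nested array plus json_encode can double peak memory. A hedged sketch; $row_data here is a hypothetical stand-in for the 44-key row built in the loop above:

```php
<?php
// Sketch: log memory use every 100 rows while building the report array.
// If memory_get_usage() climbs toward memory_limit, the 'standstill' is
// likely memory pressure rather than the array append itself.
$i = 0;
foreach ($employee as $v) {
    foreach ($day as $d) {
        // ... queries and manipulations ...
        $report_data[] = $row_data; // $row_data: the 44-key row built above
        if (++$i % 100 === 0) {
            error_log(sprintf(
                "rows: %d, mem: %.1f MB",
                $i,
                memory_get_usage(true) / 1048576
            ));
        }
    }
}
```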
ASKER CERTIFIED SOLUTION
Can you comment out the array-building and json_encode stages to verify whether the script time is affected?
What is the expected run time?
Considering the number of iterations (20 minutes for ~4,000 iterations), you're spending about 300 ms per iteration.
So the best optimization is to reduce the number of iterations, or to cut the per-iteration time to 100 ms or less, to meet your expectations.
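As a sanity check on where the time actually goes: building and encoding an array of the size described (1,000 rows × 44 keys) is cheap in isolation. This standalone sketch uses synthetic data, not the real report, and typically completes in well under a second, which suggests the stall is not the array append or json_encode itself:

```php
<?php
// Sketch: build 1,000 rows of 44 keys each with dummy values, then json_encode.
$start = microtime(true);
$report_data = [];
for ($i = 0; $i < 1000; $i++) {
    $row = [];
    for ($k = 0; $k < 44; $k++) {
        $row["field_$k"] = "value_{$i}_{$k}";
    }
    $report_data[] = $row;
}
$json = json_encode($report_data);

printf(
    "built + encoded %d rows in %.3f s\n",
    count($report_data),
    microtime(true) - $start
);
```

If this finishes quickly on the same server while the real script stalls, the problem is in the per-iteration queries or in memory pressure, not in the array construction.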