Laurent Belin

asked on

Creating specially formatted text files

Hi all!

I created a small "tool" in PHP to extract data from MySQL into text files with fixed-width fields.
This tool will be used to produce mailing labels for magazines, so the data must be formatted into fixed-width fields. Basically, I need to be able to set a fixed length for each field and fill the empty space between fields with "pad characters".

I managed to successfully produce the text file with the padding characters (using sprintf)!
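For anyone curious, the padding boils down to this (a minimal standalone example with made-up values):

```php
<?php
// "%08d" zero-pads a number on the left to 8 digits;
// "%-31s" left-justifies a string and pads it with spaces to 31 chars.
$cod_n = 42;
$nominativo = "Rossi,Mario";
$field1 = sprintf("%08d", $cod_n);       // "00000042"
$field2 = sprintf("%-31s", $nominativo); // "Rossi,Mario" plus 20 spaces
// substr() truncates values that are longer than the fixed width
$via_ok = substr("Via Giuseppe Garibaldi 1234567890 extra", 0, 31);
print($field1 . $field2 . $via_ok . "\r\n");
?>
```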

...but...I'm having some performance issues when using this script on Windows 2000 Server with IIS 5.0.

This is the script I wrote:
The user fills in an HTML form where he enters the name of the magazine ($nome_rivista), issue number ($num_rivista), issue year ($anno_rivista), magazine code ($cod_rivista), and the total number of records ($max_results) he wants to extract.

[CODE]
<?php
// not needed in Mac environment
//set_time_limit(60);
//error_reporting(E_ALL ^ E_NOTICE);
/********************************************
PHP - text extraction code
for magazine labels.
Page: output_weekit_mod.php
Developer: Lollo
Created: 07/01/2005
Modified: 20/01/2005
*********************************************/
// Database Connection
include 'inc/pbf_db.php';

// Post from form: set max number of records to extract
$max_results = $_POST['max_results'];

/********************************************
Post from form: magazine code and description
of the records being extracted
*********************************************/
$nome_rivista = $_POST['nome_rivista'];
$num_rivista = $_POST['num_rivista'];
$anno_rivista = $_POST['anno_rivista'];
$cod_rivista = $_POST['cod_rivista'];


// create description of magazine
$des_contatto = ("Invio ".$nome_rivista." "."numero ".$num_rivista." "."anno"." ".$anno_rivista);


/********************************************
Insert the above data into contatti to keep
track of the magazine labels being created.
*********************************************/
$sql_contatti = mysql_query("INSERT INTO contatti (des_contatto, data_contatto, cod_rivista)
            VALUES('$des_contatto', now(), '$cod_rivista')") or die (mysql_error());

// Grab the id for later use
$cod_contatto = mysql_insert_id();

/********************************************
Perform MySQL query to extract records that
will be used to produce the magazine labels
*********************************************/
$sql_extract = mysql_query("SELECT persone.cod_n, persone.cognome, persone.nome, persone.nomeaz1, persone.via1, persone.civico, persone.cap, persone.localita, persone.provincia, persone.sesso, persone.tit_accad, persone.itaester, persone.via_aggiunta FROM persone, diritti_riviste, ultimo_contatto
WHERE diritti_riviste.cod_rivista = '644'
and persone.cod_n = diritti_riviste.cod_n
and diritti_riviste.cod_n = ultimo_contatto.cod_n
and diritti_riviste.cod_rivista = ultimo_contatto.cod_rivista
and diritti_riviste.cod_priorita BETWEEN 1 AND 3
and sospeso = 'N'
and cancellato = 'N'
and privacy = 'N'
and do_not_mail = 'N'
and estero = '0'
and LENGTH(CONCAT(cognome,nome)) < 30
ORDER BY diritti_riviste.cod_priorita, ultimo_contatto.data_uc DESC
LIMIT 0, $max_results");

// loop to extract data
while($row = mysql_fetch_array($sql_extract)) {

//set variable for easier editing
//$cod_rivista = $row['cod_rivista'];
$cod_n = $row['cod_n'];
$cognome = $row['cognome'];
$nome = $row['nome'];
$nomeaz1 = $row['nomeaz1'];
$via1 = $row['via1'];
$civico = $row['civico'];
$cap = $row['cap'];
$localita = $row['localita'];
$provincia = $row['provincia'];
$sesso = $row['sesso'];
$tit_accad = $row['tit_accad'];
$itaester = $row['itaester'];
$via_aggiunta = $row['via_aggiunta'];


/********************************************
Perform a MySQL query to insert into
contatti_pers the following records:
$cod_contatto, $cod_rivista, $cod_n
This is used as a flag to determine which
records received a certain magazine.
*********************************************/

$sql_contatti_pers = mysql_query("INSERT INTO contatti_pers (cod_contatto, cod_rivista, cod_n)
            VALUES('$cod_contatto', $cod_rivista, '$cod_n')") or die (mysql_error());

//edit variables for correct formatting
$nominativo = $cognome.",".$nome;
$via = trim($via1).",".trim($civico);
$via_ok = substr($via,0,31);
$via_aggiunta1 = substr($via_aggiunta,0,31);
$nomeaz1 = substr($nomeaz1,0,31);
$localita_ok = substr($localita,0,22);

/********************************************
Set the automatic download section
*********************************************/
// Output buffer not needed in Mac environment
//ob_start();
header("Content-type: application/octet-stream");
header("Content-Disposition: attachment; filename=file_mod3_weekit.txt");
header("Pragma: no-cache");
header("Expires: 0");
// test print to screen (str_pad or printf)
print($cod_rivista."0".sprintf("%08d", $cod_n).sprintf("%-31s", $nominativo).sprintf("%-31s", $nomeaz1).sprintf("%-31s", $via_ok).sprintf("%-5s", $cap).sprintf("%-22s", $localita_ok).sprintf("%-2s", $provincia)." "." ".sprintf("%01d", $sesso).sprintf("%03d", $tit_accad).sprintf("%-2s", $itaester).sprintf("%04d", $anno_rivista).sprintf("%02d", $num_rivista).sprintf("%-52s", "  500").sprintf("%-31s", $via_aggiunta1)."C"."\r\n");

}

if(!$sql_extract){
      echo 'Errore - 001';
} else {

exit();

}

/********************************************
Here I need to redirect the user (operator)
extracting the labels to a thank-you page
(e.g. "file download complete!")
...but I can't manage to get it to work...although
this is the least of my problems.
*********************************************/
//header("Location: index.php") or scrpt("index.php");

?>

[/CODE]

I keep on getting the following message when I extract more than 2,000 records on a Win system:
[quote]
Maximum execution time of 30 seconds exceeded in
Fatal error:  Maximum execution time of 30 seconds exceeded in...
[/quote]
I get this error printed at the end of my text file...and obviously the script has only managed to print the first 1,800 or so records.

I tried to fix this by inserting set_time_limit(60); at the top of the script.   That way I manage to extract at least 10,000 records without getting the above error...but it isn't enough...and on top of that the script is very slow to execute (5 minutes vs 15 seconds on my Powerbook).

My problem is that I have to extract at least 25,000 records at a time. When I try to extract 20,000 records I get the following error:

[quote]
Server: Microsoft-IIS/5.0 Date: Fri, 14 Jan 2005 11:29:56 GMT Connection:
close Content-Length: 186 Content-Type: text/html  <head><title>CGI Application
Timeout</title></head>
<body><h1>CGI Timeout</h1>The specified CGI application exceeded the allowed
time for processing.  The server has deleted the process.

[/quote]

Why do I get this CGI error???  ...and why can't I extract at least 25,000 records?  What is wrong with this script?  Why is it so slow on Windows 2000 Server? Would I have the same problems on a Linux system?

Obviously, the script runs PERFECTLY on my Powerbook G4 (1 GHz, 768 MB RAM) running Mac OS X 10.3, MySQL 4.0.22-standard, PHP 4.3.9 and Apache/1.3.33 (Darwin).  It can extract 25,000 or even 50,000 records in less than 20 seconds.

The client's server is an IBM Netfinity 3500 (PII ?MHz, 384 MB RAM) running Windows 2000 Server, MySQL 4.1.9, PHP 4.3.10 and IIS 5.0.

Please help me optimize this script for Windows, or tell me how I can make it faster.

Thanks,
-L.
huji
Hi
1) You say "some performance issues..." but don't specify what that exactly means... Just an idea: if your code works on Linux or Unix servers, and not on Windows ones, the possible reason is that Windows uses different characters for the end of line. Try using Chr(13).Chr(10) at the end of the lines of text created by your code, and see if that solves the problem.....
2) Unrelated: Your code is open to "SQL injection". Translation: it can be hacked! If you don't know what SQL injection is, try searching for the phrase in Google and read the first two or three results...
A simple solution is like this:
replace all single quotes (    '    ) with a pair of single quotes (     ''     ) [note that this is not a double quote (     "     )] in all variables which are going to appear somewhere in your SQL commands.
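A minimal sketch of that quote-doubling idea (the helper name is mine, nothing standard; MySQL accepts '' as an escaped quote inside a single-quoted string):

```php
<?php
// Double every single quote so user input cannot terminate the
// quoted SQL string early.
function escape_for_sql($value) {
    return str_replace("'", "''", $value);
}

$nome_rivista = escape_for_sql("L'Espresso");
$sql = "INSERT INTO contatti (des_contatto) VALUES ('$nome_rivista')";
// $sql is now: INSERT INTO contatti (des_contatto) VALUES ('L''Espresso')
?>
```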
3) Recommendation: You can ask CS to move this question to PHP topic area, which is more suitable for this question.
Wish I can help
Huji
Well, I guess what I said in point 1 is already covered in your code by using \r\n.... Please provide a detailed description of what happens that shouldn't happen.
Huji
Laurent Belin
ASKER
Hi Huji,

By performance issues I mean that the script doesn't extract the total number of records I request.   It stops executing after 4-5 minutes and gives me the errors I mentioned above.  Furthermore, it takes a very long time to execute (4-5 minutes) while it only takes 20 seconds on my Powerbook.

-L.
To recap: I keep getting that error when I try to extract more than 22,000 records on the Windows system (it works great on the Mac OS X system), and the script only manages to print the first 1,800 or so records before timing out.  Do you see something wrong with the script that could cause these errors?

Thanks for your help,
-L.
Mmm... I have few ideas at the moment... Maybe the second server is very busy? Are both servers serving only your site?
...
The other speed-limiting step may be the SQL connection. If the drivers for the database are not fast enough, the SQL command may take long to return results.. I haven't tried MySQL on Windows yet... I know MS SQL and Access and others are fast enough.....
I'll be searching for related material on the web and post it here.
Wish I can help
huji
The servers are only serving this script.  The funny thing is that it runs fine on my Powerbook but only partially works on the Windows server.

Do you think that the script is well written?

- Somebody suggested to me the following:
This is an IIS error, and not a PHP error.
In the Internet Information Services snap-in, select the computer icon and open its property sheets.
Under Master Properties, select WWW Service, and then click the Edit button
Click the Home Directory tab.
Click the Configuration button.
Click the Process Options tab, and then type the timeout period in the CGI Script Timeout box.

I don't know if the above can help you...help me. ;-)

Let me know.

Thanks,
-L.
Well, that prevents getting the "timeout" error, but I'm thinking that it shouldn't take that much time at all...
Maybe it simply is the MySQL query with its joins that slows everything down?  
After all, it tries to extract more than 20,000 records at once and at the same time it tries to insert the same number of records.
Should I try to split the script into 2 parts?  Although it works perfectly on my Powerbook...

Strange....

I am guessing that you just have a mirror of the table on your Powerbook. The data on the two MySQL servers may match, but that doesn't mean the tables do! If big changes have been made to the table on the server, the table may have become fragmented, needing the good old 'optimize' command:

OPTIMIZE TABLE contatti_pers;

or even:

OPTIMIZE TABLE contatti_pers, persone, diritti_riviste, ultimo_contatto, ...;

Note that these commands will lock the tables and may take a while, therefore I recommend disabling the scripts (or IIS) before running them, just to be safe.

Furthermore, you can create table indexes to speed up your queries. (And if the table(s) on your Powerbook have indexes and the ones on the server don't, that may be the problem.) You can use multiple-column indexes to further speed up certain queries. For example, in order to speed this query up:

SELECT persone.cod_n, persone.cognome, persone.nome, persone.nomeaz1, persone.via1, persone.civico, persone.cap, persone.localita, persone.provincia, persone.sesso, persone.tit_accad, persone.itaester, persone.via_aggiunta FROM persone, diritti_riviste, ultimo_contatto
WHERE diritti_riviste.cod_rivista = '644'
and persone.cod_n = diritti_riviste.cod_n
and diritti_riviste.cod_n = ultimo_contatto.cod_n
and diritti_riviste.cod_rivista = ultimo_contatto.cod_rivista
and diritti_riviste.cod_priorita BETWEEN 1 AND 3
and sospeso = 'N'
and cancellato = 'N'
and privacy = 'N'
and do_not_mail = 'N'
and estero = '0'
and LENGTH(CONCAT(cognome,nome)) < 30
ORDER BY diritti_riviste.cod_priorita, ultimo_contatto.data_uc DESC
LIMIT 0, $max_results

I would probably create indexes like:

CREATE INDEX i_seem_to_use_this_a_lot ON diritti_riviste (cod_n, cod_rivista, cod_priorita);
CREATE INDEX misc_options ON <(what table is this?)> (sospeso, cancellato, privacy, do_not_mail, estero);

You may want to check your SQL documentation anyhow.

If you have problems finding bottlenecks with multiple SQL queries, remember that you can always measure the time it takes to execute a query:

function getmicrotime() { /* Right off the PHP manual */
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}

$s = getmicrotime();
$sql_extract = mysql_query("SELECT persone.cod_n, persone.cognome, persone.nome, persone.nomeaz1, persone.via1, persone.civico, ...
$e = getmicrotime();

echo "The big query took ".($e-$s)." seconds!\n";

$s = getmicrotime();
$sql_contatti = mysql_query("INSERT INTO contatti (des_contatto, data_contatto, cod_rivista)
          VALUES('$des_contatto', now(), '$cod_rivista')") or die (mysql_error());
$e = getmicrotime();

/* Indexes slow down INSERT operations, see if this one is taking too long. */
echo "The insert query took ".($e-$s)." seconds!\n";
Learning!!!!

[May I ask a question? Don't we have that OPTIMIZE thing for other databases, like Access, etc.?]
huji,

I do not know about other databases, but they may have some specific optimize commands as well. (Or, they may optimize tables automatically.)

The inner workings of MySQL may be different. Here is a snippet of the OPTIMIZE TABLE page (4.6.1) from the MySQL 4.1.1-alpha manual:

OPTIMIZE TABLE should be used if you have deleted a large part of a table or if you have made many changes to a table with variable-length rows (tables that have VARCHAR, BLOB, or TEXT columns). Deleted records are maintained in a linked list and subsequent INSERT operations reuse old record positions. You can use OPTIMIZE TABLE to reclaim the unused space and to defragment the datafile.

In most setups you don't have to run OPTIMIZE TABLE at all. Even if you do a lot of updates to variable length rows it's not likely that you need to do this more than once a month/week and only on certain tables.

(...)

OPTIMIZE TABLE works the following way:

If the table has deleted or split rows, repair the table.
If the index pages are not sorted, sort them.
If the statistics are not up to date (and the repair couldn't be done by sorting the index), update them.
Note that the table is locked during the time OPTIMIZE TABLE is running!
Also:

Some databases may not need optimizations at all. (With, of course, drawbacks on other parts.)

The ext2 filesystem, for example, has been shown not to gain much from defragmentation on typical Linux systems, to the best of my knowledge.
Thanks aib_42 for your tips.

I'll try optimizing the tables on the server and measuring the time the 2 main queries take to execute.  (BTW, the DB on the server is just a dump of the DB from my powerbook)   I'll try these first thing on Monday morning.

How can I echo the time the queries take to execute without conflicting with the download section:
/********************************************
Set the automatic download section
*********************************************/
// Output buffer not needed in Mac environment
//ob_start();
header("Content-type: application/octet-stream");
header("Content-Disposition: attachment; filename=file_mod3_weekit.txt");
header("Pragma: no-cache");
header("Expires: 0");
// test print to screen (str_pad or printf)
print($cod_rivista."0".sprintf("%08d", $cod_n).sprintf("%-31s", $nominativo).sprintf("%-31s", $nomeaz1).sprintf("%-31s", $via_ok).sprintf("%-5s", $cap).sprintf("%-22s", $localita_ok).sprintf("%-2s", $provincia)." "." ".sprintf("%01d", $sesso).sprintf("%03d", $tit_accad).sprintf("%-2s", $itaester).sprintf("%04d", $anno_rivista).sprintf("%02d", $num_rivista).sprintf("%-52s", "  500").sprintf("%-31s", $via_aggiunta1)."C"."\r\n");

Do you see anything else in the script that could be creating the IIS errors?  Maybe the insert query in the loop?

It is strange that I need to use the ob_start() on the Windows system but not on the Mac system.

Well...thanks again...and I will let you know on Monday.  If you see anything else that is abnormal with the script please let me know.

-L.
>How can I echo the time the queries take to execute without going in conflict with the download section?

Hmm, you could print it to another file. That way, if you also printed something like "starting query..." before the query, and "query ended. time: %f seconds" afterwards, you would still see the "starting query" part if your script timed out...
Another way of testing the script is running it from the command line, given that it doesn't depend on HTTP variables such as $_POST, $_GET, $_SESSION... (Copying the query to a stand-alone php file and running it is one way)

But why not conflict with the download section? Have the time printed IN your text file.. You can delete it manually, later.
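A sketch of that logging idea (the log file name is arbitrary):

```php
<?php
// Append query timings to a separate file so they never end up
// inside the downloaded label file.
function getmicrotime() {
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}

$log = fopen("timing.log", "a");
fwrite($log, "starting query...\n");
$start = getmicrotime();
// ... run the big SELECT here ...
fwrite($log, sprintf("query ended. time: %f seconds\n", getmicrotime() - $start));
fclose($log);
?>
```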

>It is strange that I need to use the ob_start() on the Windows system but not on the Mac system.

ob_start() has a lot to do with the web server - especially, how it buffers script output and handles the header section. Do not leave the line commented; have it the way that works on both systems. You might also want to stop buffering and flush the buffer as soon as the file is written - use ob_end_flush().
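Put together, the buffering pattern looks roughly like this (the record loop is stubbed out here; in the real script it would be the mysql_fetch_array() loop):

```php
<?php
ob_start();                                   // buffer all output from here on
header("Content-type: application/octet-stream");
header("Content-Disposition: attachment; filename=file_mod3_weekit.txt");

// stand-in for the mysql_fetch_array() loop
foreach (array("record one", "record two") as $line) {
    print($line . "\r\n");
}

$buffered = ob_get_contents();                // just to show what is in the buffer
ob_end_flush();                               // send headers and buffered body together
?>
```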

>Do you see anything else in the script that could be creating the IIS errors?  Maybe the insert query in the loop?

IIS errors?? Oh, this:

Maximum execution time of 30 seconds exceeded in
Fatal error:  Maximum execution time of 30 seconds exceeded in...

I don't really regard it as an error - it just says that your script has been running for more than 30 seconds. Try my optimizations, see if they work. It's amazing how much the right table indexes can speed up your queries (x50 is not an unusual number)

If you're worried about the insert query, time that too. The bottleneck is probably the big, big query, but you may also want to check the other queries for slowness.

My apologies in advance if there's anything wrong with this reply; I cannot concentrate much right now, due to noise...
Hi aib_42,

I have tried your optimizations (optimize table and create indexes)....but no luck!

I keep on getting:  "<b>Fatal error</b>:  Maximum execution time of 30 seconds exceeded in....."

I have a feeling that the ob_start() is creating this slowness.  
It is strange that I don't need the ob_start() on my Powerbook but I do need it on the Windows system...otherwise it gives me the headers error: "Cannot modify header information - headers already sent by...."

I have even tried to use Apache 1.3 instead of IIS 5.0...but same old error....

What can I do?  Maybe I can try installing Linux instead of Windows 2000.

Please let me know.

-L.
Hi aib_42,

I just noticed something weird...but it seems to be good news:

If I don't call ob_start() on the Windows system, it writes the entire file (all the records I want, even more than 22,000) in less than 20 seconds.....the only problem is that it writes the header error between each record:

<b>Warning</b>:  Cannot modify header information - headers already sent by (output started at d:\www\pbf.apache\output_weekit_mod3.php:135) in <b>d:\www\pbf.apache\output_weekit_mod3.php</b> on line <b>130</b><br />

How can I avoid this header error?

Please let me know.

Ciao,
-L.
Ahh, so you were not doing ob_end_flush() ?

PHP should do it automatically when the script finishes executing, but put it on the last line before ?> anyway and run the script again...

The "Cannot modify header information" error is the result of printing something to the standard output before using header(). In order to display the output immediately (immediately because output buffering is not turned on using ob_start) the web server sends the headers immediately, then starts sending whatever data it has. After that, you cannot use the header() function because the headers have already been sent:

*****
HTTP/1.1 200 OK
Date: Mon, 24 Jan 2005 19:33:18 GMT
Server: Apache
Content-Type: text/html;charset=ISO-8859-1
Blah: blah
Blah: blah
Connection: close

And here comes the output of the script.
There is no point in putting a header line here:
Pragma: no-cache
Because it would be treated as normal content.
*****

So, what output is there before the header() lines?
Note that there may be none but the web server could still go ahead and print the headers immediately since buffering is turned off. In this case, you can never use header() without ob_start()...

It would still be a good idea to buffer the output; see the first two lines of this reply.
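For what it's worth, the error's wording (output started at line 135, header() called on line 130) suggests the header() calls sit inside the while loop, so on every iteration after the first they run after output has already been sent. A sketch of that reading (the record loop is stubbed out here): send the headers exactly once, before the loop:

```php
<?php
// Send the headers exactly once, before any record output.
header("Content-type: application/octet-stream");
header("Content-Disposition: attachment; filename=file_mod3_weekit.txt");
header("Pragma: no-cache");
header("Expires: 0");

$records = array();              // stand-in for mysql_fetch_array() results
for ($i = 1; $i <= 3; $i++) {
    $records[] = sprintf("%08d", $i) . sprintf("%-31s", "Nominativo $i");
}
foreach ($records as $line) {
    print($line . "\r\n");       // all output happens after the headers
}
?>
```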
ASKER CERTIFIED SOLUTION
aib_42
aib_42,

Thank you very very much!!!!!!!!!

-L.