Hello,
I have a script that I use to parse flat log files and INSERT the info into a mySQL table. There are frequently very similar log entries, but for the most part, each is unique. Obviously I want to log the similar entries but skip anything that may have already been INSERTED. (For reasons beyond my control, random duplicate entries show up in subsequent log files about 5% of the time.)
What I have been doing is a "SELECT * WHERE..." on a few key columns (all of which are indexed - that sped things up A LOT!) to look for a match. If it finds one, obviously it skips the INSERT statement. However, when no match is found, the new information is added.
Here (buried in some PHP seudo-code) is pretty much what I have been using to now:
/* start pseudo-code **************************
**********
**********
******/
// $dupe will look at the DB and see if there is already an entry that matches what we are about INSERT
$dupe = "SELECT * FROM x WHERE this = '$that' AND this_too = '$that_too' AND this_one = '$that_one'";
$result = mysql_query( $dupe ) or die ( 'Unable to check for duplicates.' );
$num = mysql_numrows( $result );
if ( $num != 0 ) {
// A matching log entry was found - do not insert a duplicate value.
echo "Duplicate.<br>\n";
} else {
// Insert new info
$sql = "INSERT INTO x_print (
id,
this,
this_too,
this_one,
blah_1,
blah_2
blah_3,
etc
) VALUES (
'',
'$that',
'$that_too',
'$that_one',
'$blah_1',
'$blah_2',
'$blah_3',
'$etc'
)";
if (!mysql_query ($sql)) {
echo mysql_error($Link);
die('The query could not be executed!');
}
echo "Inserted!<br>\n";
}
/* end seudo-code **************************
**********
**********
******/
This method is AT LEAST 10x slower than the INSERT statements alone. Some of these log files have 100,000+ lines, and will generate 5,000 to 10,000 rows. Therefore, that's 5,000 to 10,000 SELECT statements too.
Is there something fundementaly wrong with how I am going about this?
Is there a better way?
Thanks,
Tony