Updating a large MySQL DB

I am working with a pretty large (for me anyway) MySQL DB. It is about 260mb and 500,000 records.

We are receiving data from a third party as zipped DBF files. The data they upload to the server is uploaded 5 to 6 times per day and each time builds upon the previous upload. So for instance the DBF they upload in the morning might have 600 records while the DBF at the end of the afternoon might have 6,000  records but will still include the original 600 records.

I have working scripts that use a couple of crontabs to unzip the zip, archive it, and put the DBF file into a specific location to be read. I also have a script that should INSERT the contents of it into the large, 500,000 record DB.

I'm having an issue where it works fine at smaller (<10,000) record sizes but doesn't do anything to update my large DB. I think that it is taking too long because it is trying to identify duplicate records so that it doesn't import them in.

What is a good strategy to take to both import the records into the large DB but to also keep from having duplicate records?
LVL 1
befidledAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

ofirproCommented:
try removing indexes when doing the add, then adding them after it is complete, this MIGHT shave some time off the entire process.

what method are you using for duplicate detection, maybe indexes could be added there?
befidledAuthor Commented:
I'm not really doing anything for duplicate detection other than matching on a primary key. What would a good practice be in my situation?
ofirproCommented:
add the distinct clause for faster way of avoiding duplicates
for duplicate records try select distinct * from table

http://www.cryer.co.uk/brian/sql/sql_delete_duplicates.htm
Announcing the Winners!

The results are in for the 15th Annual Expert Awards! Congratulations to the winners, and thank you to everyone who participated in the nominations. We are so grateful for the valuable contributions experts make on a daily basis. Click to read more about this year’s recipients!

befidledAuthor Commented:
Ok, so are you suggesting that I don't worry about duplicates going into the database but to filter them out in my queries?
ofirproCommented:
as long as it is faster, thats a possibiliy. maybe even remove them the duplicates later.

also an idea that i haven tried this myself, you can try to disable duplicates in the SQL table stracture by making the entitiy unique then try to add the records from the other DB, it might just ignore the duplicates (maybe with errors) but still add the non duplicates
befidledAuthor Commented:
how do I make the entity unique?
ofirproCommented:
it is recomended you make a backup before making changes to the table. that being said:

its easiest to add remove constraint with SQL Server Management Studio, less change things will go wrong. Here’s how:
1.Open SQL Server Management Studio.
2.Expand the Tables folder of the database where you wish to create the constraint.
3.Right-click the table where you wish to add the constraint and click Design.
4.In Table Designer, click Indexes/Keys.
5.Click Add.
6.Choose Unique Key in the Type drop-down list."

http://databases.about.com/od/sqlserver/a/unique.htm

or
The following example adds a unique constraint on the column column_name in the table whose name is table_name. The name of the constraint is constraint_name.

 alter table table_name
  add constraint constraint_name
  unique (column_name)

to remove the constraint
alter table table-name disable constraint constraint_name;

http://www.adp-gmbh.ch/ora/sql/alter_table.html
Ray PaseurCommented:
Here is a teaching example that shows how to handle UNIQUE columns in MySQL.
<?php // RAY_temp_borsec.php
error_reporting(E_ALL);
echo "<pre>";

// TEST DATA FROM THE POST AT EE
$xml = <<<ENDXML
<?xml version="1.0"?>
<MetaData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <TableLinks>
    <TableDefinition>
      <Name>Name1</Name>
      <Links>
              <TableDefinition>
                    <Name>SubName1Link1</Name>
                    <Links />
              </TableDefinition>
              <TableDefinition>
                    <Name>SubName1Link2</Name>
                    <Links />
                  </TableDefinition>
        </Links>
    </TableDefinition>

       <TableDefinition>
      <Name>Name2</Name>
      <Links>
              <TableDefinition>
                    <Name>SubName2Link1</Name>
                    <Links />
              </TableDefinition>
              <TableDefinition>
                    <Name>SubName2Link2</Name>
                    <Links />
                  </TableDefinition>
        </Links>
    </TableDefinition>
  </TableLinks>
</MetaData>
ENDXML;

// MAKE AN OBJECT
$obj = SimpleXML_Load_String($xml);

// ACTIVATE THIS TO VISUALIZE THE OBJECT
// var_dump($obj);

// ITERATE OVER THE OBJECT
foreach ($obj->TableLinks->TableDefinition as $t)
{
    $n = (string)$t->Name;
    echo PHP_EOL . $n;

    // ITERATE OVER THE SUB-OBJECT
    foreach ($t->Links->TableDefinition as $x)
    {
        $s = (string)$x->Name;
        echo PHP_EOL . ' ' . $s;
    }
}

Open in new window

Ray PaseurCommented:
Oops.  Sorry about that.  SELECT ALL is not the same as SELECT ALL, COPY!

Here is the example.
<?php // RAY_mysql_example_6.php
error_reporting(E_ALL);

// IMPORTANT PAGES FROM THE MANUALS
// MAN PAGE: http://us2.php.net/manual/en/ref.mysql.php
// MAN PAGE: http://us2.php.net/manual/en/mysql.installation.php
// MAN PAGE: http://us.php.net/manual/en/function.mysql-error.php

// CONNECTION AND SELECTION VARIABLES FOR THE DATABASE
$db_host = "localhost"; // PROBABLY THIS IS OK
$db_name = "??";        // GET THESE FROM YOUR HOSTING COMPANY
$db_user = "??";
$db_word = "??";

// OPEN A CONNECTION TO THE DATA BASE SERVER
// MAN PAGE: http://us2.php.net/manual/en/function.mysql-connect.php
if (!$db_connection = mysql_connect("$db_host", "$db_user", "$db_word"))
{
    $errmsg = mysql_errno() . ' ' . mysql_error();
    echo "<br/>NO DB CONNECTION: ";
    echo "<br/> $errmsg <br/>";
}

// SELECT THE MYSQL DATA BASE
// MAN PAGE: http://us2.php.net/manual/en/function.mysql-select-db.php
if (!$db_sel = mysql_select_db($db_name, $db_connection))
{
    $errmsg = mysql_errno() . ' ' . mysql_error();
    echo "<br/>NO DB SELECTION: ";
    echo "<br/> $errmsg <br/>";
    die('NO DATA BASE');
}
// IF WE GOT THIS FAR WE CAN DO QUERIES



// HOW TO INSERT A UNIQUE VALUE AND HANDLE DUPLICATE VALUES
// ESCAPING THE EXTERNAL DATA
$safe_unique_thing = mysql_real_escape_string($external_unique_thing);

// CONSTRUCTING THE QUERY
$isql   = "INSERT INTO table ( my_unique_thing ) VALUES ( '$safe_unique_thing' )";

// RUN THE QUERY AND TEST FOR SUCCESS
$err    = FALSE;
if (!$i = mysql_query("$isql"))
{

    // IF ERROR IS NOT 1062, THIS IS A BAD THING
    $err = mysql_errno();
    if ($err != 1062)
    {
       /* HANDLE MYSQL ERROR CONDITION */
    }

    // IF ERROR IS 1062, THIS IS A DUPLICATE KEY
    else
    {
       echo htmlentities($external_unique_thing) . ' ALREADY EXISTS IN THE TABLE';
    }
}
if (!$err)
{
    echo htmlentities($external_unique_thing) . ' HAS BEEN INSERTED INTO THE TABLE';
}

Open in new window

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Ray PaseurCommented:
To offer much help beyond that, we would need more specifics about your application - test data samples, etc.  If you want to create those and show us, there may be some other suggestions that would be helpful.

You might also want to learn about this function.  It can be very helpful with long-running jobs, if it is used correctly.  
http://us3.php.net/manual/en/function.set-time-limit.php
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
MySQL Server

From novice to tech pro — start learning today.