befidled asked:

Updating a large MySQL DB

I am working with a pretty large (for me, anyway) MySQL DB. It is about 260 MB and 500,000 records.

We are receiving data from a third party as zipped DBF files. They upload data to the server 5 to 6 times per day, and each upload builds upon the previous one. So, for instance, the DBF they upload in the morning might have 600 records, while the DBF at the end of the afternoon might have 6,000 records but will still include the original 600.

I have working scripts that use a couple of crontabs to unzip the zip file, archive it, and put the DBF file into a specific location to be read. I also have a script that should INSERT its contents into the large, 500,000-record DB.

I'm having an issue where it works fine at smaller record counts (<10,000) but doesn't do anything to update my large DB. I think it is taking too long because it is trying to identify duplicate records so that it doesn't import them.

What is a good strategy for importing the records into the large DB while avoiding duplicate records?
ofirpro:

Try removing the indexes before doing the insert, then adding them back after it is complete; this MIGHT shave some time off the entire process.
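Roughly like this, as a sketch (the connection details, table name, index name, and column are all invented for illustration):

<?php
// Rough sketch of the drop-then-rebuild idea. big_table, idx_record, and
// record_no are hypothetical placeholders, as are the credentials.
$db = new mysqli('localhost', 'user', 'password', 'mydb');

// Drop the secondary index so the bulk INSERT doesn't maintain it row by row
$db->query("ALTER TABLE big_table DROP INDEX idx_record");

// ... run the bulk INSERTs from the DBF file here ...

// Rebuild the index once, after the load is complete
$db->query("ALTER TABLE big_table ADD INDEX idx_record (record_no)");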

What method are you using for duplicate detection? Maybe indexes could be added there.
befidled (Asker):

I'm not really doing anything for duplicate detection other than matching on a primary key. What would a good practice be in my situation?
Add the DISTINCT clause as a faster way of avoiding duplicates. For duplicate records, try SELECT DISTINCT * FROM table.

http://www.cryer.co.uk/brian/sql/sql_delete_duplicates.htm
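The page linked above removes duplicates after the fact; one common MySQL form of that idea is a self-join DELETE. A rough sketch with invented names:

<?php
// Rough sketch: delete duplicate rows, keeping the lowest id for each record_no.
// Assumes big_table has an auto-increment id and a record_no that should be
// unique (both names are invented for illustration).
$db = new mysqli('localhost', 'user', 'password', 'mydb');

$db->query("DELETE t1
              FROM big_table t1
              JOIN big_table t2
                ON t1.record_no = t2.record_no
               AND t1.id > t2.id");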
OK, so are you suggesting that I not worry about duplicates going into the database, but instead filter them out in my queries?
As long as it is faster, that's a possibility. You could maybe even remove the duplicates later.

Also an idea, though I haven't tried this myself: you can try to prevent duplicates in the SQL table structure by making the entity unique, then try to add the records from the other DB. It might just ignore the duplicates (maybe with errors) but still add the non-duplicates (see the sketch after the constraint example below).
How do I make the entity unique?
It is recommended that you make a backup before making changes to the table. That being said:

It's easiest to add or remove a constraint with SQL Server Management Studio; there's less chance that things will go wrong. Here's how:
1. Open SQL Server Management Studio.
2. Expand the Tables folder of the database where you wish to create the constraint.
3. Right-click the table where you wish to add the constraint and click Design.
4. In Table Designer, click Indexes/Keys.
5. Click Add.
6. Choose Unique Key in the Type drop-down list.

http://databases.about.com/od/sqlserver/a/unique.htm

or
The following example adds a unique constraint on the column column_name in the table whose name is table_name. The name of the constraint is constraint_name.

 alter table table_name
  add constraint constraint_name
  unique (column_name)

To remove the constraint:

 alter table table_name
  disable constraint constraint_name;

(Note: DISABLE CONSTRAINT is Oracle syntax, as on the page linked below; the MySQL equivalent is ALTER TABLE table_name DROP INDEX constraint_name.)

http://www.adp-gmbh.ch/ora/sql/alter_table.html
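Tying the two ideas together: once the UNIQUE key exists, MySQL's INSERT IGNORE turns duplicate-key errors into warnings, so duplicate rows are skipped while non-duplicates still load. A rough sketch (names and credentials are placeholders):

<?php
// Sketch: record_no is assumed to carry the UNIQUE constraint added above.
$db = new mysqli('localhost', 'user', 'password', 'mydb');

// The first row duplicates an existing record_no and is silently skipped;
// the second row is new and still gets inserted.
$db->query("INSERT IGNORE INTO big_table (record_no, payload)
            VALUES (600, 'already loaded this morning'),
                   (601, 'new this afternoon')");

echo $db->affected_rows;  // counts only the rows actually inserted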
Here is a teaching example that shows how to handle UNIQUE columns in MySQL.
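A minimal sketch of the idea, assuming placeholder connection credentials and a throwaway table named demo with a UNIQUE key on record_no:

<?php
error_reporting(E_ALL);
echo "<pre>";

// CONNECTION CREDENTIALS ARE PLACEHOLDERS
$db = new mysqli('localhost', 'user', 'password', 'test');

// A THROWAWAY TABLE WITH A UNIQUE COLUMN
$db->query("CREATE TEMPORARY TABLE demo
            ( record_no INT NOT NULL
            , payload   VARCHAR(32)
            , UNIQUE KEY uq_record (record_no)
            )");

// THE FIRST INSERT SUCCEEDS
$db->query("INSERT INTO demo (record_no, payload) VALUES (1, 'first')");

// A PLAIN INSERT OF A DUPLICATE FAILS (MYSQL ERROR 1062)
if (!$db->query("INSERT INTO demo (record_no, payload) VALUES (1, 'dupe')"))
{
    echo PHP_EOL . 'DUPLICATE REJECTED: ' . $db->error;
}

// INSERT IGNORE SKIPS THE DUPLICATE WITHOUT AN ERROR
$db->query("INSERT IGNORE INTO demo (record_no, payload) VALUES (1, 'dupe')");
echo PHP_EOL . 'ROWS ADDED BY INSERT IGNORE: ' . $db->affected_rows;

// ON DUPLICATE KEY UPDATE OVERWRITES THE EXISTING ROW INSTEAD
$db->query("INSERT INTO demo (record_no, payload) VALUES (1, 'newer')
            ON DUPLICATE KEY UPDATE payload = VALUES(payload)");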

ASKER CERTIFIED SOLUTION

Ray Paseur:
To offer much help beyond that, we would need more specifics about your application (test data samples, etc.). If you want to create those and show us, there may be some other suggestions that would be helpful.

You might also want to learn about the set_time_limit() function. It can be very helpful with long-running jobs, if it is used correctly.
http://us3.php.net/manual/en/function.set-time-limit.php
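For instance, near the top of the import script (a minimal sketch):

<?php
// Lift the default max_execution_time for a long-running import script.
// A limit of 0 means "run until finished"; appropriate for a cron job,
// risky on a public-facing page.
set_time_limit(0);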