Synchronize duplicate fields in two different MYSQL databases using a PHP script

I need to synchronize duplicate fields in two different MYSQL databases using a PHP script

The script needs to login to both databases using separate credentials, and then the query needs to compare the fields in DB1 to those in DB2.
Two options need exist, with separate submit buttons :
1) update DB2 to match the associated fields in DB1
2) ADD any new records to DB2, that exist in DB1

DB1

TABLE_d1_a
Field_d1_a1 ID PK (auto increment)
Field_d1_a2
Field_d1_a3
Field_d1_a4

TABLE_d1_b
Field_d1_b1
Field_d1_b2 ID FK
Field_d1_b3
Field_d1_b4

TABLE_d1_c
Field_d1_c1
Field_d1_c2 ID FK
Field_d1_c3
Field_d1_c4

TABLE_d1_d
Field_d1_d1
Field_d1_d2 ID FK
Field_d1_d3
Field_d1_d4

DB2

TABLE_d2_a
Field_d2_a1
Field_d2_a2 ID FK
Field_d2_a3

TABLE_d2_b
Field_d2_b1
Field_d2_b2 ID FK
Field_d2_b3
Field_d2_b4

TABLE_d2_d
Field_d2_d1 ID FK
Field_d2_d3
Field_d2_d4
ggjonesAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Roger BaklundCommented:
I see there is a primary key (PK) on table TABLE_d1_a. Are there no primary keys or unique indexes on any of the other tables? If not, how to differentiate between an update and a new record?

Which database is local, and which is remote? Or should this run from a third server, so that both databases are remote? Or is it two local databases?

How many rows (approximately) are there in each table on the remote server(s)?

What are the datatypes for each of the fields? A real database schema would be usefull...

I assume Field_d1_a1 is mapped to Field_d2_a1, Field_d1_a2 is mapped to Field_d2_a2 and so on, is this correct?

How is the table TABLE_d1_c involved?

How are the fields Field_d1_a4 and Field_d1_d2 involved?
0
ggjonesAuthor Commented:
... thanks for responding. To address your queries;

I see there is a primary key (PK) on table TABLE_d1_a. Are there no primary keys or unique indexes on any of the other tables? If not, how to differentiate between an update and a new record?

>> my apologies. Assume the first field of each table is a PK (auto increment)

Which database is local, and which is remote? Or should this run from a third server, so that both databases are remote? Or is it two local databases?

>> both DBs are remote

How many rows (approximately) are there in each table on the remote server(s)?

>> assume less than 10k

What are the datatypes for each of the fields? A real database schema would be usefull...

>> assume varchar

I assume Field_d1_a1 is mapped to Field_d2_a1, Field_d1_a2 is mapped to Field_d2_a2 and so on, is this correct?

>> yes, correct

How is the table TABLE_d1_c involved?

>> for illustrative purposes. i.e. some tables in DB1 do NOT exist in DB2

How are the fields Field_d1_a4 and Field_d1_d2 involved?

>> for illustrative purposes. i.e. some fields in DB1 tables do NOT exist in DB2 tables

regards,

G.J.
0
Roger BaklundCommented:
Ok, try the code below. Backup your tables first, this code is poorly tested!

Note that all rows from each table is loaded at once, one table at the time. This requires a lot of RAM, but from the numbers you have provided, it should not be a problem (3*255*10K < 8M).

It will take a while to run it, depending on your bandwidth and load on all three servers involved.
<?php
// enable PHP error reporting
ini_set('display_errors', 1);
error_reporting(E_ALL);
 
if(isset($_POST['sync'])) {
  set_time_limit(0); # no time limit
  echo '<html><head><title>db sync</title></head><body>
<h1>db sync</h1>';
  // connect
  $db1 = mysql_connect('host1','user1','password1') 
    or die(mysql_error());
  if(!mysql_select_db('dbname1',$db1)) 
    die('Could not select database 1');
  $db2 = mysql_connect('host2','user2','password2')
    or die(mysql_error());
  if(!mysql_select_db('dbname2',$db2))
    die('Could not select database 2');
 
  // db schema
  $schema1 = array(
    array('table'=>'TABLE_d1_a','key'=>'Field_d1_a1',
          'fields'=>array('Field_d1_a2','Field_d1_a3')),
    array('table'=>'TABLE_d1_b','key'=>'Field_d1_b1',
          'fields'=>array('Field_d1_b2','Field_d1_b3','Field_d1_b4')),
    array('table'=>'TABLE_d1_d','key'=>'Field_d1_d1',
          'fields'=>array('Field_d1_d3','Field_d1_d4')),
  );
  $schema2 = array(
     array('table'=>'TABLE_d2_a','key'=>'Field_d2_a1',
           'fields'=>array('Field_d2_a2','Field_d2_a3')),
     array('table'=>'TABLE_d2_b','key'=>'Field_d2_b1',
           'fields'=>array('Field_d2_b2','Field_d2_b3','Field_d2_b4')),
     array('table'=>'TABLE_d2_d','key'=>'Field_d2_d1',
           'fields'=>array('Field_d2_d3','Field_d2_d4')),
  );
 
  // sync
  if(isset($_POST['add'])) {
    for($i=0;$i<count($schema1);$i++) {      
      echo 'Fetching rows from '.$schema1[$i]['table'].'<br />';
      $res = mysql_query('select * from '.$schema1[$i]['table'].' order by '.$schema1[$i]['key'],$db1);
      echo mysql_num_rows($res).' rows fetched<br />';
      $affected = 0;
      while($row = mysql_fetch_assoc($res)) {
        $data = array();
        foreach($schema1[$i]['fields'] as $field) 
          $data[] = mysql_real_escape_string($row[$field]);
        $query = 'insert ignore into '.$schema2[$i]['table'].' ('.$schema2[$i]['key'].','.
          implode(',',$schema2[$i]['fields']).') values ('.$row[$schema1[$i]['key']].',"'.
          implode('","',$data).'")';
        $res2 = mysql_query($query,$db2);
        if(!$res2) die('<br />Insert query failed!<br />Query:'.htmlentities($query).'<br />Error:'.mysql_error());
        $affected+=mysql_affected_rows($db2);
      }
      mysql_free_result($res);
      echo 'Inserted rows: '.$affected.'<br />';
    }
  } elseif(isset($_POST['update'])) {
    for($i=0;$i<count($schema1);$i++) {      
      echo 'Fetching rows from '.$schema1[$i]['table'].'<br />';
      $res = mysql_query('select * from '.$schema1[$i]['table'].' order by '.$schema1[$i]['key'],$db1);
      echo mysql_num_rows($res).' rows fetched<br />';
      $affected = 0;
      while($row = mysql_fetch_assoc($res)) {
        $data = array();
        for($j=0;$j<count($schema2[$i]['fields']);$j++) 
          $data[] = $schema2[$i]['fields'][$j].'="'.mysql_real_escape_string($row[$schema1[$i]['fields'][$j]]).'"';
        $query = 'update '.$schema2[$i]['table'].' set '.
          implode(',',$data).' where '.$schema2[$i]['key'].'='.$row[$schema1[$i]['key']];
        $res2 = mysql_query($query,$db2);
        if(!$res2) die('Update query failed!<br />Query:'.htmlentities($query).'<br />Error:'.mysql_error());
        $affected+=mysql_affected_rows($db2);
      }
      mysql_free_result($res);
      echo 'Updated rows: '.$affected.'<br />';
    }    
  }
  echo '<a href="'.$_SERVER['SCRIPT_NAME'].'">back</a>';
  echo '</body></html>';
} else { 
?><html><head><title>db sync</title></head><body>
<h1>db sync</h1>
<form method="post" action="<?php echo $_SERVER['SCRIPT_NAME']?>">
<input type="hidden" name="sync" value="1" />
<input type="submit" name="update" value="update DB2 to match the associated fields in DB1"/>
<input type="submit" name="add" value="ADD any new records to DB2, that exist in DB1"/>
</form></body></html>
<?php } ?>

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Keep up with what's happening at Experts Exchange!

Sign up to receive Decoded, a new monthly digest with product updates, feature release info, continuing education opportunities, and more.

ggjonesAuthor Commented:
.. this truly is a beautiful thing.... thanks for putting your head around it.

I see what you mean about resource use. But if one is to compare all fields in all Tables, are there any alternative methods?

While the high-level logic is clear to me, In terms of testing/de-bugging, the devil will certainlly be in the details. I would expect to have a system built sufficiant to test this soution over the next ten days or so, whereupon, it will either work seamlessly... or it won't, and I'll need to report back error messsages.

regards,

G.J.
0
ggjonesAuthor Commented:
This works brilliantly ( albeit on a small 300k test DB). - thank you very much cxr

I'm reminded however, that I need a third mode of update - the ability to sync records by specific Business.

For example, the attached arrays describe the specific fields in question.
The BUSINESS table PK is 'business_id'. Most other tables include 'business_id_fk'.

If I select a specific Business from a drop-down menu, is there a way to then sync the 2 DBs only for records with that ID, across all tables?

regards, GJ
  // db schema
  $schema1 = array(
    array('table'=>'business','key'=>'business_id',
          'fields'=>array('business_date_modified','business_name','business_nickname','business_city','business_category','business_keyphrase_1','business_keyphrase_2','business_url','business_show')),
    array('table'=>'branding','key'=>'branding_id',
          'fields'=>array('branding_date_modified','branding_title_suffix','branding_description_prefix','branding_description_suffix')),
    array('table'=>'content','key'=>'content_id',
          'fields'=>array('content_date_modified','business_id_fk','content_title','content_description','content_keywords','content_script')),
    array('table'=>'keyphrases','key'=>'keyphrase_id',
          'fields'=>array('business_id_fk','keyphrase_words')),
    array('table'=>'search','key'=>'search_id',
          'fields'=>array('search_date_modified','business_id_fk','search_title','search_show','search_entry')),
    array('table'=>'location','key'=>'location_id',
          'fields'=>array('location_date_modified','business_id_fk','location_name','location_unit_show','location_unit','location_street_address','location_po_box_show','location_po_box','location_city','location_prov','location_postal_code','location_phone_1_show','location_phone_1','location_website_link','location_website_url','location_map_show','location_map_latitude','location_map_longitude')),
  );

Open in new window

0
ggjonesAuthor Commented:
Thanks cxr ... this works well. I have a follow-up question posted about a possible refinement - would you mind reviewing it?  regards, GJ
0
Roger BaklundCommented:
Yes, that is possible. Add a dropdown to select the business, and append a WHERE clause to the SELECT statements if a business was selected and the table specific fields array contains the foreign key (only the branding table does not).
0
ggjonesAuthor Commented:
thanks for responding cxr...

so if I understand you then, this is the type of thing:

$res = mysql_query('select * from '.$schema1[$i]['table'].' order by '.$schema1[$i]['key'] WHERE (business_id_fk = '. $myVar .' OR business_id = '. $myVar) ,$db1);

But.. will this create an error for tables that have only one or none of business_id_fk OR business_id ?
0
Roger BaklundCommented:
Yes, you must test if there is a fk column in that table, and the WHERE clause must be before the ORDER BY clause:  SELECT ... FROM ... WHERE ... ORDER BY ...

If the posted business_id is stored as $business_id, something like this (not tested):
$res = mysql_query('select * from '.$schema1[$i]['table'].
  (($business_id and in_array('business_id_fk',$schema1[$i]['fields']))?
  ' where business_id_fk='.$business_id:'').
  ' order by '.$schema1[$i]['key'],$db1);

Open in new window

0
Roger BaklundCommented:
hm... that only works for the tables with business_id_fk, not for the main business table. Maybe it is better to split this statement into parts (still not tested):
$sql = 'select * from '.$schema1[$i]['table'];
if($business_id > 0) {
  if(in_array('business_id_fk',$schema1[$i]['fields'])) 
    $sql.=' where business_id_fk='.$business_id;
  else if(in_array('business_id',$schema1[$i]['fields'])) 
    $sql.=' where business_id='.$business_id;
}
$res = mysql_query($sql.' order by '.$schema1[$i]['key'],$db1);  

Open in new window

0
Roger BaklundCommented:
A small modification, this should work better:
$sql = 'select * from '.$schema1[$i]['table'];
if($business_id > 0) {
  if(in_array('business_id_fk',$schema1[$i]['fields'])) 
    $sql.=' where business_id_fk='.$business_id;
  else if($schema1[$i]['key']=='business_id')) 
    $sql.=' where business_id='.$business_id;
}
$res = mysql_query($sql.' order by '.$schema1[$i]['key'],$db1);  

Open in new window

0
ggjonesAuthor Commented:
Thank you cxr, this seems to work fine, after brief testing (with a small DB).

I do have one question though - notice that with both UPDATE and ADD, the "branding" table is still being "fetched from", despite there being no business_id or business_id_fk field in that table.

Is there an efficiency to be be made , to exclude the "branding" table when querying is restricted to business_id?
UPDATE by business_ID :
 
Fetching records from branding
8 records fetched
Existing Records Updated: 0
Fetching records from business
1 records fetched
Existing Records Updated: 1
Fetching records from content
1 records fetched
Existing Records Updated: 0
Fetching records from keyphrases
1 records fetched
Existing Records Updated: 0
Fetching records from search
1 records fetched
Existing Records Updated: 0
Fetching records from location
1 records fetched
Existing Records Updated: 0
 
 
ADD by business_ID :
 
Fetching records from branding
8 records fetched
New Records Added: 0
Fetching records from business
1 records fetched
New Records Added: 0
Fetching records from content
1 records fetched
New Records Added: 0
Fetching records from keyphrases
2 records fetched
New Records Added: 1
Fetching records from search
1 records fetched
New Records Added: 0
Fetching records from location
1 records fetched
New Records Added: 0

Open in new window

0
Roger BaklundCommented:
The easiest solution is to use an "untrue" expression, so that no records are fetched:
$sql = 'select * from '.$schema1[$i]['table'];
if($business_id > 0) {
  if(in_array('business_id_fk',$schema1[$i]['fields'])) 
    $sql.=' where business_id_fk='.$business_id;
  else if($schema1[$i]['key']=='business_id')) 
    $sql.=' where business_id='.$business_id;
  else 
    $sql.=' where 1=2';  # allways false
}
$res = mysql_query($sql.' order by '.$schema1[$i]['key'],$db1);  

Open in new window

0
ggjonesAuthor Commented:

yes, of course!
The thing works fabulously now... again, thank you very much cxr.

regards,

GJ.
0
gr8gonzoConsultantCommented:
Just a quick note - if you're looking to synchronize an entire database or selection of databases, then using dual-master replication is a far more efficient and easy way to go, and it's built into MySQL. Just google around for "dual-master replication mysql" and there are several guides to setting it up.
0
ggjonesAuthor Commented:
thanks for the reply gr8gonzo..

Its not quite the entire DB, rather a subset of fields from a subset of tables (in order to reduce overhead)

regards,

GJ
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
PHP

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.