Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 595
  • Last Modified:

Mysql combine merge replace issues

I have a database for our webmail (roundcube) that has become fairly garbled over the past year. Originally usernames were formated like so 'john.smith+domain.com' but was later changed to 'john.smith@domain.com". Even more recently the "+" has been replaced by "@" in all instances within the database. However this has caused some issues, there are now duplicate fields for all addresses.  Recently peoples address books have not been showing up because they were linked to the older username format which had a different userID value in the Contacts table. I know how to do this manually or even semi-manually (running update and replace query) but even then I would have to do it for each address. Any help would be greatly appreciated.

Example in an attempt to clarify.

In the 'users' table.

| User_ID |                  | username |
   32                   john.smith@domain.com
   103                  john.smith@domain.com
   17                   becky.smith@domain.com
   55                   becky.smith@domain.com


In the 'contacts' table the user_id is used to link the addresses (contact_id's) to the proper username. So where im at now we have 'user_id' 103 with about a hundred contacts and only five for 'user_id' 32 but when logging in to the webmail through the john.smith@domain.com address only the contacts from 'user_id' 32 are visable in the address book.

Is there a easy way to either A) link all the user_id's that have the same 'username' field value
B) replace all one user_id in the Contacts table with the other, C) anything that would possibly make fixing this easier!
0
jshiffty
Asked:
jshiffty
  • 9
  • 6
  • 3
  • +2
3 Solutions
 
HainKurtSr. System AnalystCommented:
use this query and your address table
select * from (
  select user_id, (select min(c2.user_id) from contacts c2 where c2.username=c1.username) new_user_id
  from contact c1
) x where user_id <> new_user_id

| User_ID | new_user_ID |
  103        32
   55        17  

this query shows you what to update and what should be the new value

Open in new window

0
 
HainKurtSr. System AnalystCommented:
or just use this

update addresses a
set user_id = (select min(c1.user_id) from contacts  c1 where c1.username=(select c2.username from contacts c2 where c2.user_id=a.user_id))

or create a function:

function getNewID(username as string)
... this function will return min user id outlined @ 26177398 for a given username

example:
getNewID(55) = 32
getNewID(103) = 17
getNewID(17) = 17

then use this function and above update

update addresses a
set user_id = getNewID(user_id)
 
0
 
jshifftyAuthor Commented:
Thanks for all the help very much appreciated! I'll be trying to work through what you've provided me and see what I can get to work. As I said thank you again!

The database contains almost 300 duplicated usernames, does this mean I would have to manually code which to use as the correct user_id for each?  something I'm more and more thinking might be an inevitable resolution.  
0
Get free NFR key for Veeam Availability Suite 9.5

Veeam is happy to provide a free NFR license (1 year, 2 sockets) to all certified IT Pros. The license allows for the non-production use of Veeam Availability Suite v9.5 in your home lab, without any feature limitations. It works for both VMware and Hyper-V environments

 
HainKurtSr. System AnalystCommented:
do it manually if it is 300
and after that delete duplicates (which has no addresses anymore, like 103 & 55 in the sample)
after that create a unique index on username and you are good to go, no more duplicate issue...
0
 
HainKurtSr. System AnalystCommented:
above solution will take max a few hours ;)
if updating those 300 and deleting orphans will take more than this, use above solution...
and you will get experience which you will definetly need in the future ;)
0
 
SharathData EngineerCommented:
Can you provide sample data from all your tables and how exactly you want to update the records?
0
 
jshifftyAuthor Commented:
User table gives each email address a user_id which is then used in the contacts table. What i need is to search for all the duplicate email address and resolve a single user_id for them.

Following that I will need to alter the Contacts table to reflect this new user id and switch any contacts from the old user_id.

So victoria has a few hundred contacts on her login with the user_id of 2, however on her login with user_id of 4 only has a few. I need to find an intelligent way to give these duplicates a single user_id and then in the contacts table change all values with both user_id of 2 and user_id of 4 to this new single address.


Hopefully that was clear, i don't know sometimes or not! Thanks for all your help
Userstable.jpg
contacttable.jpg
0
 
jshifftyAuthor Commented:
BTW as mentioned this is but a snipped of the table, I think there are around 300 duplicates in total
0
 
HainKurtSr. System AnalystCommented:
what happened to this query?

update addresses a
set user_id = (select min(c1.user_id) from contacts  c1 where c1.username=(select c2.username from contacts c2 where c2.user_id=a.user_id))


it should update all contact info and put them into one account...  after this query you should get

"a few hundred contacts on her login with the user_id of 2, however on her login with user_id of 4 only has a few"
-->
"a few hundred + 2 contacts on her login with the user_id of 2, however on her login with user_id of 4 only 0"
0
 
jshifftyAuthor Commented:
Hain,
I hadn't had a chance to try to run any query's until today because of travel but after testing a bit today this has what I have gotten.

The original query was changed to update the 'users' table so it resulted in this.

update users a
set user_id = (select min(c1.user_id) from contacts  c1 where c1.username=(select c2.username from contacts c2 where c2.user_id=a.user_id))

which gave me this error. #1054 - Unknown column 'c1.username' in 'where clause'

So I changed the 'username' to 'email' which I think should be the correct field. Which became this
update users a
set user_id = (select min(c1.user_id) from contacts  c1 where c1.email=(select c2.email from contacts c2 where c2.user_id=a.user_id))


Which resulted in this error : #1242 - Subquery returns more than 1 row

Not exactly sure where to go next though. I thought maybe use with the ANY command?? But as I said I only have a basic understanding. All and any help is again appreciated!!
0
 
jshifftyAuthor Commented:
After looking a bit more im sticking with being stuck at the first change.

update users a
set user_id = (select min(c1.user_id) from contacts  c1 where c1.username=(select c2.username from contacts c2 where c2.user_id=a.user_id))
0
 
HainKurtSr. System AnalystCommented:
try this:

update users a
set user_id = (select min(c1.user_id) from contacts  c1 where c1.email=a.email)

you used username in your previous posts, but last post mention about email, confusing a bit...
0
 
jshifftyAuthor Commented:
Let me explain where I think some of the confusion and difficulty is coming from. I was a bit confused when testing a few things and think that may have really hindered things a bit.

These are the fields for the 'Users' table.
user_id | username | mail_host | alias | created | last_login | language

Here are the fields for the 'Contacts' table.
contact_id |  user_id |  changed |  del | name | email | firstname | surname | vcard  

Firstly, I was completely wrong about using the 'email' field from the Contacts table. This is actually the e-mail address of the contact themself and not the user. So the only way the 'Users' and 'Contacts' tables are linked is through the 'user_id' field.

Given that , would I need to change email to 'user_id' in this query??

Sorry for all the confusion and thank you for all your help.
0
 
Richard QuadlingSenior Software DeverloperCommented:
I'm coming in as a response to the request for more help.

What is the difference between a "contact" and a "user"?

Are you expecting a one-for-one relationship between the tables?

If so, this sounds like you should just merge the 2 tables into 1.

0
 
jshifftyAuthor Commented:
The user table stores the information regarding 'user_id' and the actual email address of the user. (address person uses to login to webmail)

The contact table contains the address book contacts of each user. As in user with email address of john@domain.com has user_id of 145 established by the 'users' table. All the email addresses of his contacts from his address book are stored in 'contacts' table with the user_id of 145 they are all given a contact_id but have the same user_id. So within the contacts table there are multiple instances of the same user_id.

currently what im doing is using this query for every single email address that has multiple user_ids(tedious and time consuming as i have to input each user_id for all address)

update contacts
SET user_id=replace(user_id, 113,68)
where user_id like '113' ;



 but i would prefer if there was a way i could do the following

search for the email addresses(called 'usernames' in the 'users' table) that have multiple 'user_id's associated with them in the 'users' table
delete the duplicate 'user_id' with the higher numerical value
replace all instances of the deleted 'user_id' in the 'contacts' table with the lower numerical 'user_id'
0
 
Richard QuadlingSenior Software DeverloperCommented:
Aha!

So, this will be at least 2 queries. The first to update the contacts
table with the appropriate minimized ID and the second to delete the
duplicates.

The first step I would take is to build a qurey which will be used as
a sub query later on.

SELECT MIN(user_id) as user_id, username FROM users GROUP BY username

But as there is a date column (last login), I would use the most
recent login as the potentially most accurate entry.

So ...

SELECT user_id, username  FROM users INNER JOIN (SELECT
MAX(last_login), username GROUP BY username) LastLogin ON
users.username = LastLogin.username

This will give you the ID of the most recent login for each username.

Whichever you choose, think of this as the NewID table. Create a view
for it if you want. Makes things a lot easier.

Next step is to update the contacts with the appropriate new id.

UPDATE contacts
SET user_id = NewID.user_ID
FROM contacts INNER JOIN users ON contacts.user_ID = users.users_ID
INNER JOIN NewID on NewID.username = users.username

What is this saying?

On the contacts table, we are going to update the user_id column with
the user_id of the NewId view.

If you haven't made a view then ...

UPDATE contacts
SET user_id = NewID.user_ID
FROM contacts INNER JOIN users ON contacts.user_ID = users.users_ID
INNER JOIN (SELECT user_id, username  FROM users INNER JOIN (SELECT
MAX(last_login), username GROUP BY username) LastLogin ON
users.username = LastLogin.username) NewID on NewID.username =
users.username




Now all you need to do next is delete all users whose userid is not in
the NewID view.

DELETE FROM Users
WHERE user_id NOT IN (SELECT UserID FROM NewID)

or

DELETE FROM Users
WHERE user_id NOT IN (SELECT UserID FROM ((SELECT MAX(last_login),
username GROUP BY username) LastLogin ON users.username =
LastLogin.username)))




NOTE: All untested.

If you can do this on a backup, please do.
0
 
jshifftyAuthor Commented:
Recieving this error. I am logged in securely and using phpMyAdmin 3.2.4 through Cpanel with MySQL 5.1.42

#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'FROM contacts INNER JOIN users ON contacts.user_id = users.users_id INNER JOIN ' at line 3

when trying to run this query

SELECT user_id, username  FROM users INNER JOIN (SELECT
MAX(last_login), username GROUP BY username) LastLogin ON
users.username = LastLogin.username

0
 
Richard QuadlingSenior Software DeverloperCommented:
Doh!


SELECT
      MAX(last_login) as LastLogin,
      username
GROUP BY
      username


selects the last login for each username.



SELECT
      user_id,
      username
FROM
      users
      INNER JOIN
            (
            SELECT
                  MAX(last_login) as LastLogin,
                  username
            GROUP BY
                  username
            ) LastLogin
            ON
                  users.LastLogin = LastLogin.LastLogin
                  AND
                  users.username = LastLogin.username

Selects the user_id and username where the lastlogin and username match the subquery.

Sorry about that.
0
 
Ray PaseurCommented:
A stray thought, in case you decide to mark one or more of the columns UNIQUE.  MySQL will throw error #1062 if you try to violate the UNIQUE nature of the data, like with a duplicate field.  You will need to test for this and handle it differently from other MySQL errors.

Best of luck with it, ~Ray
// HOW TO INSERT A UNIQUE VALUE AND HANDLE DUPLICATE VALUES
// ESCAPING THE EXTERNAL DATA
$my_unique_thing = mysql_real_escape_string($external_unique_thing);

// CONSTRUCTING THE QUERY
$isql   = "INSERT INTO table ( my_unique_thing ) VALUES ( '$my_unique_thing' )";

// RUN THE QUERY AND TEST FOR SUCCESS
$err    = FALSE;
if (!$i = mysql_query("$isql"))
{

// IF ERROR IS NOT 1062, THIS IS A BAD THING
   $err = mysql_errno();
   if ($err != 1062)
   {
      /* HANDLE MYSQL ERROR CONDITION */
   }

// IF ERROR IS 1062, THIS IS A DUPLICATE KEY
   else
   {
      echo htmlentities($external_unique_thing) . ' ALREADY EXISTS IN THE TABLE';
   }
}
if (!$err)
{
   echo htmlentities($external_unique_thing) . ' HAS BEEN INSERTED INTO THE TABLE';
}

Open in new window

0
 
jshifftyAuthor Commented:
No solution worked properly and ended up having to do it all pretty tediously. Regardless everyone did help, in at least compiling duplicate lists. thanks for the  assistance.
0

Featured Post

Prepare for your VMware VCP6-DCV exam.

Josh Coen and Jason Langer have prepared the latest edition of VCP study guide. Both authors have been working in the IT field for more than a decade, and both hold VMware certifications. This 163-page guide covers all 10 of the exam blueprint sections.

  • 9
  • 6
  • 3
  • +2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now