Link to home
Start Free TrialLog in
Avatar of jshiffty
jshifftyFlag for United States of America

asked on

Mysql combine merge replace issues

I have a database for our webmail (roundcube) that has become fairly garbled over the past year. Originally usernames were formated like so 'john.smith+domain.com' but was later changed to 'john.smith@domain.com". Even more recently the "+" has been replaced by "@" in all instances within the database. However this has caused some issues, there are now duplicate fields for all addresses.  Recently peoples address books have not been showing up because they were linked to the older username format which had a different userID value in the Contacts table. I know how to do this manually or even semi-manually (running update and replace query) but even then I would have to do it for each address. Any help would be greatly appreciated.

Example in an attempt to clarify.

In the 'users' table.

| User_ID |                  | username |
   32                   john.smith@domain.com
   103                  john.smith@domain.com
   17                   becky.smith@domain.com
   55                   becky.smith@domain.com


In the 'contacts' table the user_id is used to link the addresses (contact_id's) to the proper username. So where im at now we have 'user_id' 103 with about a hundred contacts and only five for 'user_id' 32 but when logging in to the webmail through the john.smith@domain.com address only the contacts from 'user_id' 32 are visable in the address book.

Is there a easy way to either A) link all the user_id's that have the same 'username' field value
B) replace all one user_id in the Contacts table with the other, C) anything that would possibly make fixing this easier!
Avatar of HainKurt
HainKurt
Flag of Canada image

use this query and your address table
select * from (
  select user_id, (select min(c2.user_id) from contacts c2 where c2.username=c1.username) new_user_id
  from contact c1
) x where user_id <> new_user_id

| User_ID | new_user_ID |
  103        32
   55        17  

this query shows you what to update and what should be the new value

Open in new window

SOLUTION
Avatar of HainKurt
HainKurt
Flag of Canada image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of jshiffty

ASKER

Thanks for all the help very much appreciated! I'll be trying to work through what you've provided me and see what I can get to work. As I said thank you again!

The database contains almost 300 duplicated usernames, does this mean I would have to manually code which to use as the correct user_id for each?  something I'm more and more thinking might be an inevitable resolution.  
do it manually if it is 300
and after that delete duplicates (which has no addresses anymore, like 103 & 55 in the sample)
after that create a unique index on username and you are good to go, no more duplicate issue...
above solution will take max a few hours ;)
if updating those 300 and deleting orphans will take more than this, use above solution...
and you will get experience which you will definetly need in the future ;)
Avatar of Sharath S
Can you provide sample data from all your tables and how exactly you want to update the records?
User table gives each email address a user_id which is then used in the contacts table. What i need is to search for all the duplicate email address and resolve a single user_id for them.

Following that I will need to alter the Contacts table to reflect this new user id and switch any contacts from the old user_id.

So victoria has a few hundred contacts on her login with the user_id of 2, however on her login with user_id of 4 only has a few. I need to find an intelligent way to give these duplicates a single user_id and then in the contacts table change all values with both user_id of 2 and user_id of 4 to this new single address.


Hopefully that was clear, i don't know sometimes or not! Thanks for all your help
Userstable.jpg
contacttable.jpg
BTW as mentioned this is but a snipped of the table, I think there are around 300 duplicates in total
what happened to this query?

update addresses a
set user_id = (select min(c1.user_id) from contacts  c1 where c1.username=(select c2.username from contacts c2 where c2.user_id=a.user_id))


it should update all contact info and put them into one account...  after this query you should get

"a few hundred contacts on her login with the user_id of 2, however on her login with user_id of 4 only has a few"
-->
"a few hundred + 2 contacts on her login with the user_id of 2, however on her login with user_id of 4 only 0"
Hain,
I hadn't had a chance to try to run any query's until today because of travel but after testing a bit today this has what I have gotten.

The original query was changed to update the 'users' table so it resulted in this.

update users a
set user_id = (select min(c1.user_id) from contacts  c1 where c1.username=(select c2.username from contacts c2 where c2.user_id=a.user_id))

which gave me this error. #1054 - Unknown column 'c1.username' in 'where clause'

So I changed the 'username' to 'email' which I think should be the correct field. Which became this
update users a
set user_id = (select min(c1.user_id) from contacts  c1 where c1.email=(select c2.email from contacts c2 where c2.user_id=a.user_id))


Which resulted in this error : #1242 - Subquery returns more than 1 row

Not exactly sure where to go next though. I thought maybe use with the ANY command?? But as I said I only have a basic understanding. All and any help is again appreciated!!
After looking a bit more im sticking with being stuck at the first change.

update users a
set user_id = (select min(c1.user_id) from contacts  c1 where c1.username=(select c2.username from contacts c2 where c2.user_id=a.user_id))
try this:

update users a
set user_id = (select min(c1.user_id) from contacts  c1 where c1.email=a.email)

you used username in your previous posts, but last post mention about email, confusing a bit...
Let me explain where I think some of the confusion and difficulty is coming from. I was a bit confused when testing a few things and think that may have really hindered things a bit.

These are the fields for the 'Users' table.
user_id | username | mail_host | alias | created | last_login | language

Here are the fields for the 'Contacts' table.
contact_id |  user_id |  changed |  del | name | email | firstname | surname | vcard  

Firstly, I was completely wrong about using the 'email' field from the Contacts table. This is actually the e-mail address of the contact themself and not the user. So the only way the 'Users' and 'Contacts' tables are linked is through the 'user_id' field.

Given that , would I need to change email to 'user_id' in this query??

Sorry for all the confusion and thank you for all your help.
I'm coming in as a response to the request for more help.

What is the difference between a "contact" and a "user"?

Are you expecting a one-for-one relationship between the tables?

If so, this sounds like you should just merge the 2 tables into 1.

The user table stores the information regarding 'user_id' and the actual email address of the user. (address person uses to login to webmail)

The contact table contains the address book contacts of each user. As in user with email address of john@domain.com has user_id of 145 established by the 'users' table. All the email addresses of his contacts from his address book are stored in 'contacts' table with the user_id of 145 they are all given a contact_id but have the same user_id. So within the contacts table there are multiple instances of the same user_id.

currently what im doing is using this query for every single email address that has multiple user_ids(tedious and time consuming as i have to input each user_id for all address)

update contacts
SET user_id=replace(user_id, 113,68)
where user_id like '113' ;



 but i would prefer if there was a way i could do the following

search for the email addresses(called 'usernames' in the 'users' table) that have multiple 'user_id's associated with them in the 'users' table
delete the duplicate 'user_id' with the higher numerical value
replace all instances of the deleted 'user_id' in the 'contacts' table with the lower numerical 'user_id'
Aha!

So, this will be at least 2 queries. The first to update the contacts
table with the appropriate minimized ID and the second to delete the
duplicates.

The first step I would take is to build a qurey which will be used as
a sub query later on.

SELECT MIN(user_id) as user_id, username FROM users GROUP BY username

But as there is a date column (last login), I would use the most
recent login as the potentially most accurate entry.

So ...

SELECT user_id, username  FROM users INNER JOIN (SELECT
MAX(last_login), username GROUP BY username) LastLogin ON
users.username = LastLogin.username

This will give you the ID of the most recent login for each username.

Whichever you choose, think of this as the NewID table. Create a view
for it if you want. Makes things a lot easier.

Next step is to update the contacts with the appropriate new id.

UPDATE contacts
SET user_id = NewID.user_ID
FROM contacts INNER JOIN users ON contacts.user_ID = users.users_ID
INNER JOIN NewID on NewID.username = users.username

What is this saying?

On the contacts table, we are going to update the user_id column with
the user_id of the NewId view.

If you haven't made a view then ...

UPDATE contacts
SET user_id = NewID.user_ID
FROM contacts INNER JOIN users ON contacts.user_ID = users.users_ID
INNER JOIN (SELECT user_id, username  FROM users INNER JOIN (SELECT
MAX(last_login), username GROUP BY username) LastLogin ON
users.username = LastLogin.username) NewID on NewID.username =
users.username




Now all you need to do next is delete all users whose userid is not in
the NewID view.

DELETE FROM Users
WHERE user_id NOT IN (SELECT UserID FROM NewID)

or

DELETE FROM Users
WHERE user_id NOT IN (SELECT UserID FROM ((SELECT MAX(last_login),
username GROUP BY username) LastLogin ON users.username =
LastLogin.username)))




NOTE: All untested.

If you can do this on a backup, please do.
Recieving this error. I am logged in securely and using phpMyAdmin 3.2.4 through Cpanel with MySQL 5.1.42

#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'FROM contacts INNER JOIN users ON contacts.user_id = users.users_id INNER JOIN ' at line 3

when trying to run this query

SELECT user_id, username  FROM users INNER JOIN (SELECT
MAX(last_login), username GROUP BY username) LastLogin ON
users.username = LastLogin.username

ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
No solution worked properly and ended up having to do it all pretty tediously. Regardless everyone did help, in at least compiling duplicate lists. thanks for the  assistance.