Solved

manipulate duplicate data

Posted on 2014-11-22
Last Modified: 2014-11-22
My DOCUMENT table has duplicate entries when I group by name and size.

The query below returns the result that follows it:

SELECT name, COUNT(*), id, filename FROM DOCUMENT GROUP BY name, size HAVING COUNT(*) > 1

name          COUNT(*)      id      filename
--------------------------------------------------------------
docu1             45               33     fname1
docu2             85               59     fname2
docu3             43               33     fname5

I want to change the "filename" of every recurring entry to the filename of the first entry in its group. (That means the first record of each group stays unchanged.)

Can anybody help me?
Thank you so much.
0
Question by:myyis
15 Comments
 
LVL 58

Expert Comment

by:Gary
ID: 40459837
So where, for example, the ID is 33, all entries should be fname1 (not fname5, etc.)?
0
 

Author Comment

by:myyis
ID: 40459844
Like this

name      size        id       filename
---------------------------------------------
docu1      10         33      fname1
docu1      10         34      fname4    change  to  fname1
docu1      10         35      fname6    change  to  fname1
docu2      20         36      fname7
docu2      20         36      fname8    change  to  fname7

I also need the list of the old (changed) "filename" values ("fname4", "fname6", "fname8").

Thank you
0
 
LVL 58

Accepted Solution

by:
Gary earned 500 total points
ID: 40459861
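Try:
-- Relies on MySQL's permissive GROUP BY (ONLY_FULL_GROUP_BY disabled):
-- the server keeps one row per (name, size) group, which is not guaranteed to be the first.
UPDATE table1 a
INNER JOIN
(SELECT name, size, filename FROM table1 GROUP BY name, size) b
ON a.name = b.name AND a.size = b.size
SET a.filename = b.filename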



You asked for a "list of the old (changed) filename": a list where?
0

 

Author Comment

by:myyis
ID: 40459872
If possible, using a SELECT somewhere.
0
 
LVL 58

Expert Comment

by:Gary
ID: 40459877
Yes, but to do what with it?
You cannot get a result set back from an UPDATE, though. It's one or the other.
0
 

Author Comment

by:myyis
ID: 40459880
I will use the list to delete the repeating documents.
0
 

Author Comment

by:myyis
ID: 40459884
I mean delete them from the directory. So maybe I can use a SELECT first, then a second query to make the change.
0
 
LVL 58

Expert Comment

by:Gary
ID: 40459889
Delete from what directory?
This question is in the MySQL zone.
0
 

Author Comment

by:myyis
ID: 40459900
Yeah, I know.
If you can also provide the SELECT query, I can use its result set of the records that will be changed.
Thank you.
0
 
LVL 58

Expert Comment

by:Gary
ID: 40459915
You can use this, which gives a comma-separated field called dupes that contains all the grouped filenames:

SELECT GROUP_CONCAT(filename) AS dupes FROM table1 GROUP BY name
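
A minimal sketch of the same idea, restricted to groups that actually contain duplicates and grouped by (name, size) as in the question. It runs against an in-memory SQLite database through Python's standard sqlite3 module purely so it can be tried without a server. The table layout and the sample rows (including the unique docu3 row) are assumptions made up for illustration. Each list still includes the group's first filename.

```python
import sqlite3

# Hypothetical table and rows, modelled on the example in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE document (id INTEGER, name TEXT, size INTEGER, filename TEXT)"
)
conn.executemany(
    "INSERT INTO document VALUES (?, ?, ?, ?)",
    [
        (33, "docu1", 10, "fname1"),
        (34, "docu1", 10, "fname4"),
        (35, "docu1", 10, "fname6"),
        (37, "docu3", 30, "fname9"),  # unique entry, should not be listed
    ],
)

# GROUP_CONCAT is available in both MySQL and SQLite.
# HAVING COUNT(*) > 1 drops the groups that contain a single row.
query = """
SELECT name, size, GROUP_CONCAT(filename) AS dupes
FROM document
GROUP BY name, size
HAVING COUNT(*) > 1
"""
result = conn.execute(query).fetchall()

for name, size, dupes in result:
    # The order inside the concatenated list is not guaranteed, so sort it.
    print(name, size, sorted(dupes.split(",")))
```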


0
 

Author Comment

by:myyis
ID: 40459980
Thank you for the SELECT, but it also returns the filenames that are unique.
I need something like this: ("fname4", "fname6", "fname8"). Please see above.
0
 
LVL 58

Expert Comment

by:Gary
ID: 40460000
Is ID a unique auto-increment field?
0
 

Author Comment

by:myyis
ID: 40460007
No, the PK is (ID,ORID)
0
 
LVL 58

Assisted Solution

by:Gary
Gary earned 500 total points
ID: 40460031
SELECT filename
FROM table1 a
WHERE filename NOT IN 
(SELECT filename FROM (select name,filename from table1 group by name) b) 
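
Since ID alone is not unique, a variant that does not depend on which row GROUP BY happens to keep may be safer. The following is only a sketch under assumptions: "first entry" is taken to mean the smallest (id, orid) within each (name, size) group, the lowercase column names and sample rows are hypothetical, and it uses Python's standard sqlite3 module with an in-memory database so it can be run without a MySQL server. It follows the "SELECT first, then change" idea from earlier in the thread: one query lists the rows to change together with their old and new filenames, then each row is updated through its primary key.

```python
import sqlite3

# Hypothetical schema based on the thread: primary key (id, orid),
# duplicates share the same (name, size).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE document (id INTEGER, orid INTEGER, name TEXT, size INTEGER, "
    "filename TEXT, PRIMARY KEY (id, orid))"
)
conn.executemany(
    "INSERT INTO document VALUES (?, ?, ?, ?, ?)",
    [
        (33, 1, "docu1", 10, "fname1"),
        (34, 1, "docu1", 10, "fname4"),
        (35, 1, "docu1", 10, "fname6"),
        (36, 1, "docu2", 20, "fname7"),
        (36, 2, "docu2", 20, "fname8"),
        (37, 1, "docu3", 30, "fname9"),  # unique entry, must stay untouched
    ],
)

# Step 1: for every row d that is not the first of its group, return its key,
# its current filename, and the filename of the group's first row f.
# NOT EXISTS keeps only the f that has no earlier row in the same group.
plan_sql = """
SELECT d.id, d.orid, d.filename AS old_filename, f.filename AS new_filename
FROM document d
JOIN document f
  ON f.name = d.name AND f.size = d.size
WHERE (f.id < d.id OR (f.id = d.id AND f.orid < d.orid))
  AND NOT EXISTS (
        SELECT 1 FROM document e
        WHERE e.name = f.name AND e.size = f.size
          AND (e.id < f.id OR (e.id = f.id AND e.orid < f.orid)))
ORDER BY d.id, d.orid
"""
plan = conn.execute(plan_sql).fetchall()

# Step 2: apply the change row by row through the primary key.
conn.executemany(
    "UPDATE document SET filename = ? WHERE id = ? AND orid = ?",
    [(new, id_, orid) for id_, orid, _old, new in plan],
)
conn.commit()

# The old filenames are what would be deleted from the directory afterwards.
old_filenames = [old for _id, _orid, old, _new in plan]
print(old_filenames)  # ['fname4', 'fname6', 'fname8']
```

Keeping the SELECT separate from the UPDATE means the list of old filenames is captured before anything is overwritten.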


0
 

Author Closing Comment

by:myyis
ID: 40460036
Great!
0
