Help with removing duplicate data elements using VB.NET?

Hi,

How do you remove duplicate data elements? For example, if I have File1 below:

File1:

<Root>
  <Table1>
    <ID>1</ID>
    <SN>10411</SN>
    <ItemA>DATA</ItemA>
    <ItemX>DATAX</ItemX>
    <ItemY>DATAY</ItemY>
    <ItemZ>DATAZ</ItemZ>
    <SN>10411</SN>
    <ItemXX>A</ItemXX>
    <ItemYY>B</ItemYY>
    <ItemZZ>C</ItemZZ>
  </Table1>
  <Table1>
    <ID>2</ID>
    <SN>10411</SN>
    <ItemA>DATA1</ItemA>
    <ItemX>DATAX1</ItemX>
    <ItemY>DATAY1</ItemY>
    <ItemZ>DATAZ1</ItemZ>
    <SN>10411</SN>
    <ItemXX>D</ItemXX>
    <ItemYY>E</ItemYY>
    <ItemZZ>F</ItemZZ>
  </Table1>
  <Table1>
    <ID>3</ID>
    <SN>10412</SN>
    <ItemA>DATA2</ItemA>
    <ItemX>DATAX2</ItemX>
    <ItemY>DATAY2</ItemY>
    <ItemZ>DATAZ2</ItemZ>
    <SN>10412</SN>
    <ItemXX>G</ItemXX>
    <ItemYY>H</ItemYY>
    <ItemZZ>I</ItemZZ>
  </Table1>
</Root>

How do I obtain File2?

File2:

<Root>
  <Table1>
    <ID>1</ID>
    <SN>10411</SN>
    <ItemA>DATA</ItemA>
    <ItemX>DATAX</ItemX>
    <ItemY>DATAY</ItemY>
    <ItemZ>DATAZ</ItemZ>
    <ItemXX>A</ItemXX>
    <ItemYY>B</ItemYY>
    <ItemZZ>C</ItemZZ>
  </Table1>
  <Table1>
    <ID>2</ID>
    <SN>10411</SN>
    <ItemA>DATA1</ItemA>
    <ItemX>DATAX1</ItemX>
    <ItemY>DATAY1</ItemY>
    <ItemZ>DATAZ1</ItemZ>
    <ItemXX>D</ItemXX>
    <ItemYY>E</ItemYY>
    <ItemZZ>F</ItemZZ>
  </Table1>
  <Table1>
    <ID>3</ID>
    <SN>10412</SN>
    <ItemA>DATA2</ItemA>
    <ItemX>DATAX2</ItemX>
    <ItemY>DATAY2</ItemY>
    <ItemZ>DATAZ2</ItemZ>
    <ItemXX>G</ItemXX>
    <ItemYY>H</ItemYY>
    <ItemZZ>I</ItemZZ>
  </Table1>
</Root>

Thanks,

Victor
vcharles Asked:
AndyAinscow (Freelance programmer / Consultant) Commented:
The logic is as follows:
1) Open the file for reading. Open another file for writing.
2) Read an element if there is one; otherwise quit.
3) Has this element already been read?
3a) Yes: go to 2.
3b) No: write it to the output, then go to 2.

As to point 3 - you have to decide what is a duplicate.
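
The steps above can be sketched in VB.NET roughly as follows. This is a minimal sketch, not code from any poster: the helper name RemoveDuplicates and its keySelector parameter are hypothetical, it assumes duplicates are judged among the children of each <Table1>, and it loads the whole document with XDocument instead of reading and writing element by element. The keySelector function is where you decide what counts as a duplicate (point 3).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

Module DedupeSketch
    ' Hypothetical helper: within each <Table1>, keep the first element for
    ' each key and drop later ones. keySelector decides what "duplicate" means.
    Function RemoveDuplicates(doc As XDocument,
                              keySelector As Func(Of XElement, String)) As XDocument
        For Each table As XElement In doc.Descendants("Table1").ToList()
            Dim seen As New HashSet(Of String)()
            ' ToList() takes a snapshot so removing nodes does not disturb the loop
            For Each child As XElement In table.Elements().ToList()
                ' HashSet.Add returns False when the key was already present
                If Not seen.Add(keySelector(child)) Then
                    child.Remove()
                End If
            Next
        Next
        Return doc
    End Function

    Sub Main()
        Dim doc = XDocument.Parse(
            "<Root><Table1><ID>1</ID><SN>10411</SN><ItemA>DATA</ItemA><SN>10411</SN></Table1></Root>")
        ' Assumption for this example: a duplicate has the same name and the same value
        RemoveDuplicates(doc, Function(e) e.Name.LocalName & "=" & e.Value)
        ' Prints <Root><Table1><ID>1</ID><SN>10411</SN><ItemA>DATA</ItemA></Table1></Root>
        Console.WriteLine(doc.ToString(SaveOptions.DisableFormatting))
    End Sub
End Module
```

To write the result to a second file, as in step 1, call doc.Save with the output path after RemoveDuplicates returns.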
vcharles (Author) Commented:
Hi,

I need to keep the data in the first <SN> element and remove the other <SN> elements in the same table.

Victor
AndyAinscow (Freelance programmer / Consultant) Commented:
OK. Keep a list of the <SN> values already seen (10411, 10412, ...).
Read an item.
If its value is already in the list, ignore the element and go to the next one. Otherwise, add the value to the list and write the item to the output file.
vcharles (Author) Commented:
Hi,

Can you please send me an example in VB.NET?

Victor
Fernando Soto (Retired) Commented:
Hi Victor;

The following code will do what you wish.

' Requires Imports System.Linq and Imports System.Xml.Linq at the top of the file

' Load the XML document from the file system
Dim xdoc As XDocument = XDocument.Load("File path to the XML file")

' For each Table1 element, group its child elements by name and keep
' only the groups that contain more than one element
Dim results = From table In xdoc.Descendants("Table1") _
              Select table.Elements.GroupBy(Function(n) n.Name) _
              .Where(Function(g) g.Count() > 1)

' Remove the second element of each group
For Each grouping In results
    For Each node As IGrouping(Of XName, XElement) In grouping
        node(1).Remove()
    Next
Next

' Save the updated document
xdoc.Save("File Path to save to")

AndyAinscow (Freelance programmer / Consultant) Commented:
>>Can you please send me an example in VB.NET.

I'll help you to do it yourself - what have you got that doesn't work?
vcharles (Author) Commented:
Hi Fernando,

I tried the code, but the duplicate <SN> element was not removed. The only difference in my XML file is that <Root> was changed to <NewDataSet>, but I don't think that is why it does not work.

Thank you all for the comments.

Victor
Fernando Soto (Retired) Commented:
I think you need to look at your XML data file. The sample XML you posted works fine on my machine with the code I posted. Remember that XML is case-sensitive, even though Visual Basic code itself is not. The node names also need to match the ones in your sample exactly; otherwise the code will most likely not work.
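
A quick way to check both points is a small console test like the one below. It is only a sketch with made-up inline XML, assuming a <NewDataSet> root with no namespace; it shows that Descendants finds <Table1> under any root name but matches names case-sensitively. If the real file declares a default xmlns on the root, a plain "Table1" lookup would not match either.

```vbnet
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module NameMatchSketch
    Sub Main()
        ' Same shape as the sample, but with <NewDataSet> as the root
        Dim doc = XDocument.Parse(
            "<NewDataSet><Table1><SN>10411</SN><SN>10411</SN></Table1></NewDataSet>")

        ' Descendants searches below any root, so renaming the root does not matter
        Console.WriteLine(doc.Descendants("Table1").Count())   ' prints 1

        ' Element names are compared case-sensitively, so this finds nothing
        Console.WriteLine(doc.Descendants("table1").Count())   ' prints 0
    End Sub
End Module
```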
vcharles (Author) Commented:
Hi Fernando,

I finally figured out why the code did not work. I assumed that if the solution removed duplicate <SN> elements, it would not matter how many of them I have within a table.

I am merging data from 6 DataGrids into an XML file. Each grid contains <SN>, but I want only one <SN> per table in my XML file. Below is an example of the actual format. Sorry for not being clearer in my initial post.

The solution removes only one duplicate <SN> within each table instead of all five.

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Root>
  <Table1>
    <ID>1</ID>
    <SN>10411</SN>
    <ItemA>DATA</ItemA>
    <ItemX>DATAX</ItemX>
    <ItemY>DATAY</ItemY>
    <ItemZ>DATAZ</ItemZ>
    <SN>10411</SN>
    <ItemXX>A</ItemXX>
    <ItemYY>B</ItemYY>
    <ItemZZ>C</ItemZZ>
    <SN>10411</SN>
    <ItemXX>A</ItemXX>
    <ItemYY>B</ItemYY>
    <ItemZZ>C</ItemZZ>
    <SN>10411</SN>
    <ItemXX>A</ItemXX>
    <ItemYY>B</ItemYY>
    <ItemZZ>C</ItemZZ>
    <SN>10411</SN>
    <ItemXX>A</ItemXX>
    <ItemYY>B</ItemYY>
    <ItemZZ>C</ItemZZ>
    <SN>10411</SN>
    <ItemXX>A</ItemXX>
    <ItemYY>B</ItemYY>
    <ItemZZ>C</ItemZZ>
  </Table1>
  <Table1>
    <ID>1</ID>
    <SN>10411</SN>
    <ItemA>DATA</ItemA>
    <ItemX>DATAX</ItemX>
    <ItemY>DATAY</ItemY>
    <ItemZ>DATAZ</ItemZ>
    <SN>10411</SN>
    <ItemXX>A</ItemXX>
    <ItemYY>B</ItemYY>
    <ItemZZ>C</ItemZZ>
    <SN>10411</SN>
    <ItemXX>A</ItemXX>
    <ItemYY>B</ItemYY>
    <ItemZZ>C</ItemZZ>
    <SN>10411</SN>
    <ItemXX>A</ItemXX>
    <ItemYY>B</ItemYY>
    <ItemZZ>C</ItemZZ>
    <SN>10411</SN>
    <ItemXX>A</ItemXX>
    <ItemYY>B</ItemYY>
    <ItemZZ>C</ItemZZ>
    <SN>10411</SN>
    <ItemXX>A</ItemXX>
    <ItemYY>B</ItemYY>
    <ItemZZ>C</ItemZZ>
  </Table1>
  <Table1>
    <ID>1</ID>
    <SN>10411</SN>
    <ItemA>DATA</ItemA>
    <ItemX>DATAX</ItemX>
    <ItemY>DATAY</ItemY>
    <ItemZ>DATAZ</ItemZ>
    <SN>10411</SN>
    <ItemXX>A</ItemXX>
    <ItemYY>B</ItemYY>
    <ItemZZ>C</ItemZZ>
    <SN>10411</SN>
    <ItemXX>A</ItemXX>
    <ItemYY>B</ItemYY>
    <ItemZZ>C</ItemZZ>
    <SN>10411</SN>
    <ItemXX>A</ItemXX>
    <ItemYY>B</ItemYY>
    <ItemZZ>C</ItemZZ>
    <SN>10411</SN>
    <ItemXX>A</ItemXX>
    <ItemYY>B</ItemYY>
    <ItemZZ>C</ItemZZ>
    <SN>10411</SN>
    <ItemXX>A</ItemXX>
    <ItemYY>B</ItemYY>
    <ItemZZ>C</ItemZZ>
  </Table1>
</Root>
Fernando Soto (Retired) Commented:
Hi Victor;

Try this out for the new format of the XML.

' Requires Imports System.Linq and Imports System.Xml.Linq at the top of the file

' Load the XML document from the file system
Dim xdoc As XDocument = XDocument.Load("File path to the XML file")

' For each Table1 element, group its child elements by name and keep
' only the groups that contain more than one element
Dim results = From table In xdoc.Descendants("Table1") _
              Select table.Elements.GroupBy(Function(n) n.Name) _
              .Where(Function(g) g.Count() > 1)

' In each group, remove every later element whose value equals the first one
For Each grouping In results
    For Each nodeGroup As IGrouping(Of XName, XElement) In grouping
        Dim first = nodeGroup(0).Value
        Dim count As Integer = nodeGroup.Count() - 1
        For idx As Integer = 1 To count Step 1
            If nodeGroup(idx).Value = first Then
                nodeGroup(idx).Remove()
            End If
        Next
    Next
Next

' Save the updated document
xdoc.Save("File Path to save to")
       

vcharles (Author) Commented:
It works.
Thank You.
Victor
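
For comparison, when every repeat of a name inside a table should go regardless of its value, the index loop can be replaced by Skip(1) combined with the Remove extension method from System.Xml.Linq. The sketch below is not the accepted code: the helper name RemoveRepeatedNames and the inline XML are made up for illustration, and unlike the code above it does not compare values before removing.

```vbnet
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module SkipRemoveSketch
    ' Hypothetical helper: in every <Table1>, keep the first element of each
    ' name and remove all later elements with the same name
    Sub RemoveRepeatedNames(doc As XDocument)
        For Each table As XElement In doc.Descendants("Table1").ToList()
            ' ToList() materialises the groups before the tree is modified
            For Each g In table.Elements().GroupBy(Function(n) n.Name).ToList()
                g.Skip(1).Remove()
            Next
        Next
    End Sub

    Sub Main()
        Dim doc = XDocument.Parse(
            "<Root><Table1><ID>1</ID><SN>10411</SN><ItemXX>A</ItemXX>" &
            "<SN>10411</SN><ItemXX>A</ItemXX><SN>10411</SN><ItemXX>A</ItemXX></Table1></Root>")

        RemoveRepeatedNames(doc)

        ' Prints <Root><Table1><ID>1</ID><SN>10411</SN><ItemXX>A</ItemXX></Table1></Root>
        Console.WriteLine(doc.ToString(SaveOptions.DisableFormatting))
    End Sub
End Module
```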
vcharles (Author) Commented:
Hi Fernando,

How do I modify the code to keep the last value and remove the first 4 duplicates, instead of keeping the first instance and removing the next 5 duplicates?
Thanks,
Victor
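
One possible way to keep the last occurrence instead of the first is sketched below. It is an illustration only, not code from this thread: the helper name KeepLastOfEachName and the inline XML are hypothetical, and it assumes that every earlier element with the same name should be removed regardless of its value. The sample uses different ItemXX values so the effect is visible.

```vbnet
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module KeepLastSketch
    ' Hypothetical helper: for each element name inside one table,
    ' remove every occurrence except the last
    Sub KeepLastOfEachName(table As XElement)
        For Each g In table.Elements().GroupBy(Function(n) n.Name).ToList()
            ' Take(Count - 1) selects everything before the last element
            g.Take(g.Count() - 1).Remove()
        Next
    End Sub

    Sub Main()
        Dim doc = XDocument.Parse(
            "<Root><Table1><ID>1</ID><SN>10411</SN><ItemXX>A</ItemXX>" &
            "<SN>10411</SN><ItemXX>B</ItemXX></Table1></Root>")

        For Each table As XElement In doc.Descendants("Table1").ToList()
            KeepLastOfEachName(table)
        Next

        ' Prints <Root><Table1><ID>1</ID><SN>10411</SN><ItemXX>B</ItemXX></Table1></Root>
        Console.WriteLine(doc.ToString(SaveOptions.DisableFormatting))
    End Sub
End Module
```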
Fernando Soto (Retired) Commented:
I do not understand. They are all duplicates of one another, so what difference does it make whether you keep the first or the last?
vcharles (Author) Commented:
Hi Fernando,

This issue is resolved.

Thanks
Question has a verified solution.
