Help with combining data in xml file based on identical country using VB.NET

Hi,

How do combine all records with identical <COUNTRY> in create one record combining data from other data elements, for example, If I have xml file in Part A, how do I create a new xml file as shown in art B? The example is only for one record, the xml file in part A will contain multiple records with similar situation.

Part A.
<NewDataSet>
  <AOP5>
    <SCOUNTRY>ITA 10411</SCOUNTRY>
    <SN></SN>
    <COUNTRY>ITA</COUNTRY>
    <AGD4></AGD4>
  </AOP5
</NewDataSet>
  <AOP5>
    <SCOUNTRY></SCOUNTRY>
    <SN>10411</SN>
    <COUNTRY>ITA</COUNTRY>
    <AGD4></AGD4>
  </AOP5
  <AOP5>
    <SCOUNTRY></SCOUNTRY>
    <SN></SN>
    <COUNTRY>ITA</COUNTRY>
    <AGD4>M777</AGD4>
  </AOP5>
</NewDataSet>

Part B:
<NewDataSet>
  <AOP5>
    <SCOUNTRY>ITA 10411</SCOUNTRY>
    <SN>10411</SN>
    <COUNTRY>ITA</COUNTRY>
    <AGD4>M777</AGD4>
  </AOP5>
</NewDataSet>

Thanks,

Victor
vcharlesAsked:
Who is Participating?
 
louisfrConnect With a Mentor Commented:
With the correction I made earlier, Fernando's code would have worked as expected. Here is the corrected code, including the change from COUNTRY to NSC:
Dim xdoc As XDocument = XDocument.Load("Path to the XML document with input data ")
'' New document being created
Dim newXdoc As New XDocument(New XElement("Root"))
'' Get all the parent nodes
Dim parentNodes = (From aop5 As XElement In xdoc.Descendants("AOP5") _
                   Group By aop5.Element("NSC").Value Into Group _
                   Select Group).ToList()

'' Select all the needed nodes and place them in a new document 
For Each p As XElement() In parentNodes
    Dim nodes = (From n As XElement In p.Descendants().ToList() _
                 Where Not String.IsNullOrEmpty(n.Value)
                 Group By n.Name Into Group _
                 Select Group.First()).ToList()

     '' Create the new elements of the document
    Dim addNodes = New XElement("NewDataSet", nodes)
    newXdoc.Root.Add(addNodes)
Next
 '' Save the new document
newXdoc.Save("Path and file name to store on disk")

Open in new window

0
 
Fernando SotoRetiredCommented:
Hi Victor;

Your original document had a single value over all like records, for example one AGD4 having a value and all other records having an empty values in the node. Is it possible to have multiple values in the same node type across all records like shown below? If yes how do you want to handle these multiple values?

<NewDataSet>
  <AOP5>
    <SCOUNTRY>ITA 10411</SCOUNTRY>
    <SN>2012</SN>
    <COUNTRY>ITA</COUNTRY>
    <AGD4></AGD4>
  </AOP5
</NewDataSet>
  <AOP5>
    <SCOUNTRY>WA 1020</SCOUNTRY>
    <SN>10411</SN>
    <COUNTRY>ITA</COUNTRY>
    <AGD4>F999</AGD4>
  </AOP5
  <AOP5>
    <SCOUNTRY>FL 2031</SCOUNTRY>
    <SN>1141</SN>
    <COUNTRY>ABC</COUNTRY>
    <AGD4>M777</AGD4>
  </AOP5>
</NewDataSet>

Open in new window

0
 
vcharlesAuthor Commented:
Hi,

If this occurs the first record containing a data element  with data shoud overwrite the other records, for example if I have:\ the following in Part A:

Part A.
 <NewDataSet>
   <AOP5>
     <SCOUNTRY>ITA 10411</SCOUNTRY>
     <SN></SN>
     <COUNTRY>ITA</COUNTRY>
     <AGD4>M800</AGD4>
   </AOP5
 </NewDataSet>
   <AOP5>
     <SCOUNTRY></SCOUNTRY>
     <SN>10412</SN>
     <COUNTRY>ITA</COUNTRY>
     <AGD4></AGD4>
   </AOP5
   <AOP5>
     <SCOUNTRY></SCOUNTRY>
     <SN>10411</SN>
     <COUNTRY>ITA</COUNTRY>
     <AGD4>M777</AGD4>
   </AOP5>
 </NewDataSet>

Part B should contain:

Part B:
 <NewDataSet>
   <AOP5>
     <SCOUNTRY>ITA 10411</SCOUNTRY>
     <SN>10412</SN>
     <COUNTRY>ITA</COUNTRY>
     <AGD4>M800</AGD4>
   </AOP5>
 </NewDataSet>
0
Cloud Class® Course: MCSA MCSE Windows Server 2012

This course teaches how to install and configure Windows Server 2012 R2.  It is the first step on your path to becoming a Microsoft Certified Solutions Expert (MCSE).

 
Fernando SotoConnect With a Mentor RetiredCommented:
Hi Victory;

This should do what you need.

Dim xdoc As XDocument = XDocument.Load("Path to the XML document with input data ")
'' New document being created
Dim newXdoc As New XDocument(New XElement("Root"))
'' Get all the parent nodes
Dim parentNodes = (From aop5 As XElement In xdoc.Descendants("AOP5") _
                   Group By aop5.Name Into Group _
                   Select Group).ToList()

'' Select all the needed nodes and place them in a new document 
For Each p As XElement() In parentNodes
    Dim nodes = (From n As XElement In p.Descendants().ToList() _
                 Where Not String.IsNullOrEmpty(n.Value)
                 Group By n.Name Into Group _
                 Select Group.First()).ToList()

     '' Create the new elements of the document
    Dim addNodes = New XElement("NewDataSet", nodes)
    newXdoc.Root.Add(addNodes)
Next
 '' Save the new document
newXdoc.Save("Path and file name to store on disk")

Open in new window

0
 
vcharlesAuthor Commented:
Thank you, I will get back to you.
0
 
vcharlesAuthor Commented:
Hi again,

I don't see where <Country> is mentioned to create one record from multiple identical values. Can you please explain the code to me.

Thanks,

Victor
0
 
vcharlesAuthor Commented:
Hi,

unfortunately it does not work, only one record was returned. I need to merge data from all data elements where country is the same and eliminate duplicate data in Country.

Victor
0
 
Fernando SotoRetiredCommented:
Hi Victor;

Please post you code as you implemented it and the file that you used as the input data.
0
 
louisfrCommented:
I don't see where <Country> is mentioned to create one record from multiple identical values.
In the sample, replace
Dim parentNodes = (From aop5 As XElement In xdoc.Descendants("AOP5") _
                    Group By aop5.Name Into Group _
                    Select Group).ToList()

Open in new window

with
Dim parentNodes = (From aop5 As XElement In xdoc.Descendants("AOP5") _
                    Group By aop5.Element("COUNTRY").Value Into Group _
                    Select Group).ToList()

Open in new window

0
 
Fernando SotoRetiredCommented:
Lets start from the beginning. In Part A in your question you have only AOP5 node names and no other. Does this mean that each file only handles one of these node type in each document you may have? Or can one document have different node types at this level?

You state that you want, "identical <COUNTRY>", but in your example you only show nodes of this, <COUNTRY>ITA</COUNTRY>, all having the same value. so it is difficult to help with something where all the information is not stated.

When dealing with XML we need accurate information on the structure that is to be operated on and what the output is to be.  

Please post an accurate representation of the input XML with all possible types and what you want the output to look like. Something like a mini specification your boss would give you to work on a project.
0
 
vcharlesAuthor Commented:
Hi,
 I will post a better example later today.
Thanks,
Victor
0
 
vcharlesAuthor Commented:
Hi,

Below is a better example with three different NSC values (Part A) with duplicate values in three records, which need to be combined in three records in Part B.


Part A.
 <NewDataSet>
   <AOP5>
     <SCOUNTRY>ITA 10411</SCOUNTRY>
     <SN></SN>
     <NSC>100</NSC>
     <AGD4></AGD4>
   </AOP5
   <AOP5>
     <SCOUNTRY></SCOUNTRY>
     <SN>10411</SN>
     <NSC>100</NSC>
     <AGD4></AGD4>
   </AOP5
   <AOP5>
     <SCOUNTRY></SCOUNTRY>
     <SN></SN>
     <NSC>100</NSC>
     <AGD4>M777</AGD4>
   </AOP5>
 <AOP5>
     <SCOUNTRY>ITA 10411</SCOUNTRY>
     <SN></SN>
     <NSC>101</NSC>
     <AGD4></AGD4>
   </AOP5
 </NewDataSet>
   <AOP5>
     <SCOUNTRY></SCOUNTRY>
     <SN>10411</SN>
     <NSC>101</NSC>
     <AGD4></AGD4>
   </AOP5
   <AOP5>
     <SCOUNTRY></SCOUNTRY>
     <SN></SN>
     <NSC>101</NSC>
     <AGD4>M777</AGD4>
   </AOP5>
 <AOP5>
     <SCOUNTRY>ITA 10411</SCOUNTRY>
     <SN></SN>
     <NSC>10111</NSC>
     <AGD4></AGD4>
   </AOP5
 </NewDataSet>
   <AOP5>
     <SCOUNTRY></SCOUNTRY>
     <SN>10411</SN>
     <NSC>10111</NSC>
     <AGD4></AGD4>
   </AOP5
   <AOP5>
     <SCOUNTRY></SCOUNTRY>
     <SN></SN>
     <NSC>10111</NSC>
     <AGD4>M777</AGD4>
   </AOP5>
 </NewDataSet>

Desired Results:

Part B:
 <NewDataSet>
   <AOP5>
     <SCOUNTRY>ITA 10411</SCOUNTRY>
     <SN>10411</SN>
     <NSC>100</NSC>
     <AGD4>M777</AGD4>
   </AOP5>
<AOP5>
     <SCOUNTRY>ITA 10411</SCOUNTRY>
     <SN>10411</SN>
     <NSC>101</NSC>
     <AGD4>M777</AGD4>
   </AOP5>
<AOP5>
     <SCOUNTRY>ITA 10411</SCOUNTRY>
     <SN>10411</SN>
     <NSC>10111</NSC>
     <AGD4>M777</AGD4>
   </AOP5>
 </NewDataSet>

Thanks,

Victor
0
 
vcharlesAuthor Commented:
Thank you, will try it and get back to you.
Victor
0
 
vcharlesAuthor Commented:
Hi,

I'm get only one record for each NSC but they do not include values from the other data elements of the other rows which contained the same NSC.

Victor
0
 
vcharlesAuthor Commented:
Hi,

The code works but I see the problem. The data elements with no data include "N/A", therefore  N/A is included in the results eventhough in another record with same NSC there is an existing value in that data element.  When I remove N/A in all data elements the results only returns data elements with data.
Is there a way to include the empty data elements with my results for all the records?

Thanks,

Victor
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.