andre72

Parse XML file and find nodes with a special attribute

Hi,

I have SVG documents that look like this:
<svg>
   <svg xmlns="test">
       <svg xmlns="sbutest1"></svg>
       <svg xmlns="sbutest2"></svg>
   </svg>
   <svg xmlns="test2">
       <svg xmlns="sbutest3"></svg>
   </svg>
</svg>

Now I have to parse it and save the innermost child elements into their own files (subtest1.xml, subtest2.xml, etc.).
Any idea how to do this?

Thanks,

Andre
abel

Not sure what the "this" refers to in that last sentence. You say that you parsed it and saved them in separate files. What is your next task that you have trouble with? Or did you have problems with that task, if so, what? Can you show your C# code and point to the place where it goes wrong or that you have trouble with?
The code below should get you started.
using System.Xml;

XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);
foreach (XmlNode node1 in doc.ChildNodes)
  foreach (XmlNode node2 in node1.ChildNodes)
    foreach (XmlNode node3 in node2.ChildNodes)
    {
      // save node3 to a file, e.g. by writing out node3.OuterXml
    }

andre72

ASKER

Thanks dericstone, that's nearly what I'm looking for, but what I showed was just an example. More levels of child nodes are also possible, e.g.:

   <svg xmlns="test">
       <svg xmlns="sbutest1">
            <svg xmlns="subsubtest1">
                <svg xmlns="subsubsubtest1"></svg>
                 <svg xmlns="subsubsubtest2">
                         <svg xmlns="subsubsubsubtest2" />
                 </svg>
           </svg>
      </svg>

would also be possible ...
       <svg xmlns="sbutest2"></svg>
  </svg>
andre72

ASKER

Sorry, copy-paste mistake ;-)

abel

andre, I asked the questions in my first comment for a reason. Your code shows a very unlikely scenario for SVG data (is it SVG at all?). You ask for attributes, but there is not a single attribute in your XML code; there are only namespace attributes which, despite the name, are not attributes.

Using regular foreach loops is not the approach you want here. Instead, LINQ to XML, or SelectNodes / XPathSelectElements with an XPath expression, is the kind of approach you should be after.

Please take the time to give us better insight into what you want, so that we can give you a precise answer.
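
As a rough illustration of the LINQ to XML route mentioned above, here is a minimal sketch. It assumes that "the nodes that are needed" are the elements that carry their own xmlns="..." declaration; the class name NamespaceRoots, the methods Find and SaveAll, and the file-name prefix are all made up for this example and do not come from the thread.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class NamespaceRoots
{
    // Returns, in document order, every element that carries its own
    // default-namespace declaration (an xmlns="..." on the element itself).
    public static List<XElement> Find(XDocument doc)
    {
        return doc.Descendants()
                  .Where(e => e.Attribute("xmlns") != null)
                  .ToList();
    }

    // Writes each match, including its children, to <prefix>1.xml, <prefix>2.xml, ...
    // (hypothetical naming scheme, chosen only for this sketch).
    public static void SaveAll(XDocument doc, string prefix)
    {
        int n = 0;
        foreach (XElement e in Find(doc))
        {
            e.Save(prefix + (++n) + ".xml");
        }
    }
}
```

Called as NamespaceRoots.SaveAll(XDocument.Parse(xml), "subtest"), this would write one file per matching element. Nested matches are saved twice, once inside their ancestor's file and once on their own; whether that is wanted depends on the actual requirement.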
andre72

ASKER

I'm sorry about that, abel. You're right, I was in a bit of a hurry when I wrote my first post.
And as I'm not good with SVG files, I thought xmlns="" was an attribute like in "normal" XML files...
Ok, here we go again...
The XML (and yes, it is always SVG) looks like this:
   <svg xmlns="test">
       <svg xmlns="sbutest1">
            <svg xmlns="subsubtest1">
                <svg xmlns="subsubsubtest1"></svg>
                 <svg xmlns="subsubsubtest2">
                         <svg xmlns="subsubsubsubtest2" />
                 </svg>
           </svg>
      </svg>
       <svg xmlns="sbutest2"></svg>
  </svg>

So I have to read it recursively, but only the nodes with xmlns="xyz" are needed.
When I find one, I need to save it together with all of its child nodes.
I've also included an example (just to test that the recursion works); its behaviour is a bit of a mystery to me:
It works in principle, but doc.Load(file); takes about 10 seconds for a 4 KB SVG file.
With doc.XmlResolver = null; it is much faster, but then there are no more recursive calls?!?
Also, I'm not sure whether this is really a good solution, as I'm a novice with XML...
Thanks,

Andre
XmlDocument doc = new XmlDocument();
doc.XmlResolver = null;   // external resources such as the DTD are not resolved
doc.Load(file);
GetNode(doc.DocumentElement);
 
        private void GetNode(XmlNode inXmlNode)
        {
            // Attributes is null for non-element nodes (text, comments, ...)
            XmlAttributeCollection xmlAttrs = inXmlNode.Attributes;
            XmlNode xmlAttr = xmlAttrs != null ? xmlAttrs.GetNamedItem("xmlns") : null;
 
            if (inXmlNode.HasChildNodes && xmlAttr != null)
            {
                Console.WriteLine(inXmlNode.OuterXml.Trim());
                // declared locally so that each recursive call has its own copies
                XmlNodeList nodeList = inXmlNode.ChildNodes;
                for (int i = 0; i <= nodeList.Count - 1; i++)
                {
                    XmlNode xNode = nodeList[i];
                    GetNode(xNode);
                }
            }
            else
            {
                if (xmlAttr != null)
                {
                    Console.WriteLine(inXmlNode.OuterXml.Trim());
                }
            }
        }

andre72

ASKER

Argh, another mistake: xmlns is of course not always present.
<svg id="test">
    <svg name="sbutest1">
        <svg xmlns="subsubtest1"> <!-- save from here 1 -->
            <svg></svg>
            <svg xmlns="subsubsubtest2"> <!-- save from here 2 -->
                <svg xmlns="subsubsubsubtest2" />
            </svg> <!-- to here 2 -->
        </svg> <!-- to here 1 -->
    </svg>
    <svg xmlns="sbutest2"></svg>
</svg>

abel

I'm still quite surprised by the structure of your SVG. I have to take your word for it that it looks the way it does, and I assume for now that what you put inside the xmlns attributes (namespace attributes) is something starting with "http:" or "urn:". If not, the file is not XML + Namespaces compliant and parsers should raise an error (but in the case of the xmlns attribute, they can be lenient).

Normal SVG files have a structure like in the code example below (from http://www.w3schools.com/svg/radial2.svg). As you can see, it only has one xmlns attribute. It is allowed that the attribute is repeated, but the parts that have a different namespace (i.e., a different attribute value) are not part of the SVG spec and cannot be parsed as SVG.

Your problem in general can be best attacked with XSLT. I'll come up shortly (not sure if it'll be tonight) with an example in both C# and XSLT, which does what you want: get every element + child nodes that have a certain namespace.

-- Abel --

PS: your code is not working because you are asking for an attribute, and there isn't any. You cannot "just" ask for a node with a certain namespace, because a namespace is a scope that starts on the element where it is declared. That means that elements that do not carry the xmlns attribute themselves can still be part of the result of your search. This is in the nature of XML and cannot be changed.


<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
 
<svg width="100%" height="100%" version="1.1"
xmlns="http://www.w3.org/2000/svg">
 
    <defs>
        <radialGradient id="grey_blue" cx="20%" cy="40%" r="50%" fx="50%" fy="50%">
            <stop offset="0%" style="stop-color:rgb(200,200,200);stop-opacity:0"/>
            <stop offset="100%" style="stop-color:rgb(0,0,255);stop-opacity:1"/>
        </radialGradient>
    </defs>
 
    <ellipse cx="230" cy="200" rx="110" ry="100"
    style="fill:url(#grey_blue)"/>
 
</svg>
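
The scoping rule from the PS can be checked with a few lines of LINQ to XML. This is only a small sketch: the class name NamespaceScope, the method NamespacesOf and the sample namespace urn:x are invented for this illustration.

```csharp
using System.Linq;
using System.Xml.Linq;

static class NamespaceScope
{
    // Returns the namespace name of every element, in document order.
    // An element without its own xmlns declaration reports the namespace
    // it inherits from the nearest ancestor that declares one.
    public static string[] NamespacesOf(string xml)
    {
        return XDocument.Parse(xml)
                        .Descendants()
                        .Select(e => e.Name.NamespaceName)
                        .ToArray();
    }
}
```

For the input <a><b xmlns="urn:x"><c/></b></a> this yields an empty string for a and urn:x for both b and c, even though c has no xmlns attribute of its own.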


ASKER CERTIFIED SOLUTION
abel
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Here's the code in C#. It is really that simple. Just change the paths to match your setup, and make sure to reference System.Xml.

// simplest way of transforming XML with an XSLT stylesheet
using System.Xml.Xsl;
 
XslCompiledTransform xslt = new XslCompiledTransform(true);   // true = enable XSLT debugging
xslt.Load("transform.xslt");
xslt.Transform("input.xml", "output.xml");

andre72

ASKER

abel, this is a really great solution and idea! All in all, I learned a lot from it. Thanks!

abel

You're welcome, glad it helped.