Math errors in MSXML

Posted on 2013-12-26
Last Modified: 2014-02-14
I'm going crazy with math errors (approximations?) in MSXML (Windows 7)...

Take this XML:



Damn... why does    $o.selectnodes("/xml/number/text()[.</xml/now/text()-/xml/difference/text()]").length    return 4 instead of 3?
Question by:lucavilla

Accepted Solution

Geert Bormans earned 500 total points
ID: 39740728
This is not a Microsoft parser issue; it is an XPath 1.0 issue.
XPath 1.0 mandates that numbers are calculated using the double type (IEEE 754 double precision),
so the loss of precision is predictable, and all XPath processors I tested this with return 4.

You can fix this by using XPath 2.0 (although there is no Microsoft implementation) with xs:decimal.
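The effect can be reproduced outside any XPath processor. The sketch below is only an illustration in Python: the values are hypothetical (the XML from the question is not shown in this thread), Python's float stands in for the XPath 1.0 double, and Decimal stands in for xs:decimal.

```python
from decimal import Decimal

# Hypothetical stand-ins for /xml/number, /xml/now and /xml/difference.
numbers = ["0.05", "0.09", "0.1", "0.2"]
now, difference = "1.1", "1.0"


def count_below(values, now, difference, number_type):
    """Count the values that are strictly below now - difference."""
    threshold = number_type(now) - number_type(difference)
    return sum(1 for value in values if number_type(value) < threshold)


# float is an IEEE 754 double, like every XPath 1.0 number.
as_double = count_below(numbers, now, difference, float)

# Decimal does exact decimal arithmetic, like xs:decimal in XPath 2.0.
as_decimal = count_below(numbers, now, difference, Decimal)

print(as_double, as_decimal)
```

With these values the double version counts 3 and the decimal version counts 2, because 1.1 - 1.0 evaluates to slightly more than 0.1 in double arithmetic, so the value 0.1 slips under the threshold.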

I also want to note that you are counting text() nodes, not number elements.
You should avoid using text() in most cases.
I recommend this XPath instead of yours (though it does not fix your issue):
/xml/number[number(.) < (number(/xml/now) - number(/xml/difference))]

Author Comment

ID: 39740807
Thanks Geert

About "number()": I thought that, since text() works, text() would be faster because number() has to do an (unneeded) conversion. Are you sure that using number() would be better?

Actually I need to extensively track my XML elements with timestamps (like "130102.123452") and to make extractions specifying time differences like for example all the timestamps that are "< now - 10 seconds".
How would you solve this need?

Expert Comment

by:Geert Bormans
ID: 39740864
As soon as you do numeric calculations, there is an implicit cast to numbers. By making it explicit, you help the processor: it no longer needs the logic to infer that you want to use the value as a number, because you tell it so. In theory using number() is faster, though your processor might optimize, in which case it does not matter. I consider it self-preservation: make it obvious to yourself that you need the value to be a number.

text() is a different discussion. number() is a (casting) function. text() is a node test. That is a different thing; it belongs in the box with node(), comment(), etc.
/xml/number/text()[$some-condition] selects a text node, not the element number,
but no one guarantees that there is only one text() node in there. So as a best practice, avoid text(). If you need casting to a string, use string(), and use text() only when you need to explicitly address a specific text() node, e.g. following-sibling::text()[1] if you need to test whether the next node is a text node. Excessive use of text() is on the top 5 list of common XPath/XSLT mistakes.

There is no relation in my previous comment between dropping text() and adding number(). I hope I clarified that now
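To make the "more than one text() node" point concrete, here is a small sketch using Python's xml.dom.minidom instead of MSXML. The sample document is made up for the illustration; a comment is enough to split an element's content into two text nodes.

```python
from xml.dom import minidom, Node

# Made-up sample: the comment splits the content of <number> in two.
doc = minidom.parseString("<xml><number>12<!-- note -->34</number></xml>")
number = doc.getElementsByTagName("number")[0]

# All text nodes that are direct children of <number>.
text_nodes = [n for n in number.childNodes if n.nodeType == Node.TEXT_NODE]

# What a "first text() node" selection would see.
first_text = text_nodes[0].data

# What the string value of the element would be.
all_text = "".join(n.data for n in text_nodes)

print(len(text_nodes), first_text, all_text)
```

Here the first text node holds only "12", while the element's string value is "1234", which is why selecting the element and converting it is usually the safer habit.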

Working around the precision issues:
- use format-number() to force an exact number of decimals on your numbers (the result is now a string)
- use substring-before(., '.') and substring-after(., '.') to get the integer and decimal parts as comparable integers
- do the calculation on the integers at full precision
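The steps above can be sketched outside XPath as well. The Python below is an illustration only: the helper name to_scaled_int and the choice of 6 decimals are assumptions, not something from the thread. It pads the fractional part to a fixed width and then works on plain integers, which are exact.

```python
def to_scaled_int(text, decimals=6):
    """Turn a decimal string into an integer scaled by 10**decimals.

    Mirrors the idea of splitting on '.' and padding the fraction to a
    fixed number of digits so that both parts compare as integers.
    """
    whole, _, fraction = text.strip().partition(".")
    if len(fraction) > decimals:
        raise ValueError("more fractional digits than the fixed width allows")
    return int(whole + fraction.ljust(decimals, "0"))


# Timestamp format taken from the thread; the arithmetic values are made up.
stamp = to_scaled_int("130102.123452")

# Integer arithmetic is exact, so the comparison is no longer off by a hair.
exact_match = to_scaled_int("1.1") - to_scaled_int("1.0") == to_scaled_int("0.1")

print(stamp, exact_match)
```

The fixed width matters: if one value had 6 decimals and another 3, comparing the raw digit strings would compare different scales, which is the same reason format-number() comes first in the list.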

Author Comment

ID: 39743567
Thanks Gertone, you clarified to me some things.

A question: supposing that we want to take the number inside <now></now>, excluding any text inside any children (if any), what would be the best msxml command?

I tried the following without success so I'm still a little confused:

Maybe I must not use number() here because msxml can only extract text (with ".text")?
And is the filter "[.]" there to avoid including any children?

Author Comment

ID: 39743574
Another question: I'm seeing that I get correct results if I remove the decimal point from the timestamps, even if I add the leading "20" for the year (so that I have timestamps like "20131228154710").

What are the limits of the double type? Do I still have to worry about imprecision if I use timestamps of this length without decimals? (I would only perform date differences or comparisons.)

Expert Comment

by:Geert Bormans
ID: 39744523
[ ] indicates a predicate
. is short for self::node()
[.] doesn't do anything in general
I told you to avoid text(); it does not do anything in this case and can only break your code.
number() casts a node to a number. You can use it in tests inside such an XPath, but I don't think it is a good idea to wrap the whole expression passed to selectSingleNode in number(). Why cast to a number if you need a node that you want to get the .text from?
should be

As a short option you can indeed just drop the '.' (use translate() for that).
Make sure that you use format-number() first to ensure the number of decimals is aligned if you do that.

I think you will be safe. A double uses 64 bits, 53 of which hold the significand, so every integer up to 2^53 (9,007,199,254,740,992, roughly 15 to 16 decimal digits) is represented exactly. A 14-digit timestamp such as 20131228154710 fits comfortably.
(I quickly checked Wikipedia and some Java and .NET references, and they all agree that the maximum for the double type is 1.79769313486231570E+308, but that is the range limit, not the precision limit.)
Of course there is an XPath processor implementation layered on top of your type system, so you had better test some values first.
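A quick way to test the integer limit outside the XPath processor is sketched below in Python, whose float is the same IEEE 754 double. This only checks the number type itself, not MSXML's handling of it; the constant name LARGEST_EXACT is just a label chosen for the example.

```python
import sys

# Every integer up to 2**53 has an exact double representation.
LARGEST_EXACT = 2 ** 53

# Timestamp format from the thread: 14 digits, no decimal point.
timestamp = 20131228154710

# Round-tripping through a double loses nothing for this value.
fits_exactly = int(float(timestamp)) == timestamp

# Just past 2**53, neighbouring integers start to collide.
collides = float(LARGEST_EXACT) == float(LARGEST_EXACT + 1)

# The often-quoted maximum is about range, not about exact digits.
max_double = sys.float_info.max

print(fits_exactly, collides, max_double)
```

So the maximum of about 1.8E+308 says how large a double can get, while 2**53 says how far integers stay exact; the 14-digit timestamps are well below the second limit.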

Author Comment

ID: 39760323
Problem: "supposing that we want to take the number inside <now></now>, excluding any text inside any children (if any), what would be the best msxml command?"

I tested with this:


Your solution $oDoc.SelectSingleNode("/xml/now").text  returns "130102.123456123foobar", which is not what I want,

while $oDoc.SelectSingleNode("/xml/now/text()").text  returns "130102.123456123", which is what I want.

I saw that even when I want to write/overwrite that number, "text()" is necessary to avoid losing the "<abc>foobar</abc>" forever!
Therefore the right command to write is also $oDoc.SelectSingleNode("/xml/now/text()").text  =  123456.112233444

About the "[.]" you were right: it seems to have no effect...

Expert Comment

by:Geert Bormans
ID: 39760421
You are throwing new XML into the equation
(note that I optimize the XPath based on the XML I see, not on all possible XMLs that could occur, unless you give me that information).

By using selectSingleNode and text() you only get the first text() node. By choosing the higher-up element (now) you get the stringified content of now, which was safer with the earlier XML.

I wanted to point out a risk; of course you can find examples that break my suggestion
(by changing requirements after the development, you start to sound like a real customer, by the way ;-)

I was ringing the warning bell about the text() node.
Your XPath only selects the first text() node in your now element.
If you are sure that the first actual text node in the now element is ALWAYS the one you are after, then OK, you have found a good use case for text(). Note that whatever is between </abc> and </now> is another text node, not found by your XPath.

Whilst you think your example breaks my point, it actually reinforces it.

Now you have proven that other nodes might exist inside the <now> element, try some of those
<now> <abc>foobar</abc>130102.123456123</now>




You could claim that the only realistic one is the first, and you might find out it works as you please. OK, but please note that it works because of a flaw in the msxml parser; it would break with any other XML parser, and it would definitely break if your XML had a DTD or a schema (are you sure it will never have one?).

So, if you are certain that text() works for you, and you are comfortable it will not break in any of your use cases, please use it. If it breaks at some point, I hope I gave you some things to think about regarding why that could happen.

If I were to do this task, and I were to go for the safe route, I would select all child nodes of the now element, combine all the text nodes into one and use that. For putting the nodes back, I would reconstruct from the child element nodes found
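The write-back half of that safe route can be sketched as follows. This is Python with xml.dom.minidom rather than MSXML, the helper name set_direct_text is hypothetical, and the sample document is reconstructed from the values quoted earlier in the thread, so treat it as an assumption about the real XML.

```python
from xml.dom import minidom, Node


def set_direct_text(element, new_text):
    """Replace the element's own text nodes, keeping its child elements."""
    # Copy the child list first because it is modified while looping.
    for child in list(element.childNodes):
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            element.removeChild(child)
    text_node = element.ownerDocument.createTextNode(new_text)
    # Put the new text in front of the remaining child elements (if any).
    element.insertBefore(text_node, element.firstChild)


# Assumed shape of the test document discussed in the thread.
doc = minidom.parseString(
    "<xml><now>130102.123456123<abc>foobar</abc> </now></xml>"
)
now = doc.getElementsByTagName("now")[0]
set_direct_text(now, "123456.112233444")
result = now.toxml()

print(result)
```

The child element <abc> survives the update, and every text node that belonged directly to <now> (including the stray one after </abc>) is replaced by a single new one. Placing the new text first is a design choice for this sketch; other layouts may need a different position.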
[caveat: my colleagues often tell me I am taking too much control, but they admit I am the one that needs the least time fixing production glitches :-) ]

Author Comment

ID: 39760536
Hehe Gertone, you always have an eagle eye, but I didn't change requirements; I just copied and pasted the question I posted on 2013-12-28 at 15:40:41, where I asked for a solution excluding any subchildren of the element "now" (if any).

Anyway, I understand your point that text(), while it has the advantage (for me, in general use) of skipping the text nodes of subchildren, is limited to the first text node directly under the element.

So... is there a final solution in msxml + XPath 1 that both avoids the subchildren's text nodes and includes all the text nodes directly under the element?

Expert Comment

by:Geert Bormans
ID: 39760605
Try selectNodes on the text() nodes instead of selectSingleNode;
at least you get all of them.
Then use your .NET code to join all the text nodes from the node list into one string.

Author Comment

ID: 39760621
It seems that selectNodes doesn't support the ".text" property.
It gives an error...   while with selectSingleNode it works...

Expert Comment

by:Geert Bormans
ID: 39760648
selectNodes returns a list of nodes, not a single node, so you need to iterate over the list, get the .text of each single node, and concatenate those into one string.
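The iterate-and-concatenate step looks roughly like this. The sketch uses Python's xml.dom.minidom as a stand-in for MSXML, so the loop over childNodes plays the role of iterating the result of selectNodes("/xml/now/text()"); the helper name direct_text and the sample document are assumptions made for the example.

```python
from xml.dom import minidom, Node


def direct_text(element):
    """Join the text nodes that are direct children of the element."""
    parts = []
    for child in element.childNodes:
        # Skip child elements (and their text); keep only the element's own text.
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            parts.append(child.data)
    return "".join(parts)


# Made-up sample with text on both sides of the child element.
doc = minidom.parseString(
    "<xml><now>130102.<abc>foobar</abc>123456123</now></xml>"
)
now = doc.getElementsByTagName("now")[0]
joined = direct_text(now)

print(joined)
```

Here a "first text node only" selection would return just "130102.", while the joined result contains both pieces and still leaves out "foobar".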

