Solved

math errors in msxml

Posted on 2013-12-26
Last Modified: 2014-02-14
I'm going crazy with math errors (approximations?) in msxml (Windows 7)...

Take this XML:

<xml>
	<now>130102.123456</now>
	<difference>0.000001</difference>
	<number>130102.123456</number>
	<number>130102.123455</number>
	<number>130102.123454</number>
	<number>130102.123453</number>
	<number>130102.123452</number>
</xml>


Damn... why does $o.selectnodes("/xml/number/text()[.</xml/now/text()-/xml/difference/text()]").length return 4 (instead of 3)?
Question by:lucavilla
 
LVL 60

Accepted Solution

by:
Geert Bormans earned 500 total points
ID: 39740728
This is not a Microsoft parser issue; it is an XPath 1.0 issue.
XPath 1.0 mandates that numbers are calculated as double-precision floating point (IEEE 754),
so the loss of precision is predictable, and all XPath processors I tested this with return 4.

You can fix this by using XPath 2.0 (although there is no Microsoft implementation) and xs:decimal.

I also want to note that you are counting text() nodes, not number elements;
you should avoid using text() in most cases.
I recommend this XPath instead of yours (though it does not fix your issue):
/xml/number[number(.) < (number(/xml/now) - number(/xml/difference))]
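
The effect is not specific to msxml and can be reproduced with any IEEE 754 double. A minimal sketch in Python, assuming Python's float stands in for the XPath 1.0 number type and decimal.Decimal stands in for xs:decimal (the variable names are illustrative, not part of any XPath API):

```python
from decimal import Decimal

now = "130102.123456"
difference = "0.000001"
numbers = [
    "130102.123456",
    "130102.123455",
    "130102.123454",
    "130102.123453",
    "130102.123452",
]

# Double arithmetic: the result of the subtraction is rounded to the nearest
# double, which can differ from the double nearest to 130102.123455.
float_threshold = float(now) - float(difference)
float_matches = [n for n in numbers if float(n) < float_threshold]

# Decimal arithmetic: the subtraction is exact, so 130102.123455 is equal to
# the threshold and is not counted as "less than".
decimal_threshold = Decimal(now) - Decimal(difference)
decimal_matches = [n for n in numbers if Decimal(n) < decimal_threshold]

print("double :", len(float_matches), float_matches)
print("decimal:", len(decimal_matches), decimal_matches)
```

With doubles the count comes out as 4, matching what the question reports; with exact decimals it is the expected 3.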
 

Author Comment

by:lucavilla
ID: 39740807
Thanks Geert

:(((
About "number()": I thought that, since text() works, text() would be faster because number() has to do an (unneeded) conversion. Are you sure that using number() would be better?

Actually I need to extensively track my XML elements with timestamps (like "130102.123452") and to make extractions specifying time differences like for example all the timestamps that are "< now - 10 seconds".
How would you solve this need?
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 39740864
As soon as you do numeric calculations, there is an implicit cast to numbers. By making it explicit, you help the processor: it no longer needs the logic to infer that you want to use the value as a number, because you tell it so. In theory using number() is faster, though your processor might optimize, and then it does not matter. So I consider it self-preservation: make it obvious to yourself that you need it to be a number.

text() is a different discussion. number() is a (casting) function. text() is a node test. That is a different thing. It belongs in the box with node() and comment() etc...
/xml/number/text()[$some-condition] selects a text node, not the element number
but no one guarantees that there is only one text() node in there. So as a best practice, avoid text(). If you need a cast to a string, use string(), and use text() only when you need to explicitly address a specific text() node, e.g. following-sibling::text()[1] if you need to test whether the next node is a text node. Excessive use of text() is on the top 5 list of common XPath/XSLT mistakes.

There is no relation in my previous comment between dropping text() and adding number(). I hope I clarified that now

Working around the precision issues:
- use format-number() to force an exact number of decimals on your numbers (the result is a string)
- use substring-before(., '.') and substring-after(., '.') to get the integer and decimal parts as comparable integers
- do the calculation on the integers at full precision
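
The workaround in the list above can be sketched outside XPath. This is a minimal Python illustration, assuming six decimals per value; the helper name to_micro_units is hypothetical, and the string padding plays the role that format-number() plays in XPath:

```python
def to_micro_units(text: str, decimals: int = 6) -> int:
    """Turn a string such as '130102.123456' into the integer 130102123456.

    Assumes a non-negative value; extra decimals are truncated, not rounded.
    """
    whole, _, fraction = text.strip().partition(".")
    # Pad or cut the fraction to a fixed width so every value uses one scale.
    fraction = (fraction + "0" * decimals)[:decimals]
    return int(whole + fraction)


now = to_micro_units("130102.123456")
difference = to_micro_units("0.000001")
numbers = [
    "130102.123456",
    "130102.123455",
    "130102.123454",
    "130102.123453",
    "130102.123452",
]

# Integer subtraction and comparison are exact, so no rounding can creep in.
matches = [n for n in numbers if to_micro_units(n) < now - difference]
print(len(matches), matches)
```

Because everything is an integer by the time the comparison happens, 130102.123455 is equal to the threshold and is excluded, leaving three matches.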
 

Author Comment

by:lucavilla
ID: 39743567
Thanks Gertone, you clarified to me some things.

A question: supposing that we want to take that number inside <now></now>, excluding any text inside any children (if any), what would be the best msxml command?

I tried the following without success so I'm still a little confused:
$oDoc.SelectSingleNode("number(/xml/now/text()[.])").text

Maybe I must not use number() here because msxml can only extract text (with ".text")?
and is the filter "[.]" to avoid inclusion of any children(s)?
 

Author Comment

by:lucavilla
ID: 39743574
Another question:  I'm seeing that I have correct results if I remove the decimal point in timestamps, even if I add the leading "20" for year   (so that I have timestamps like "20131228154710").

What are the limits of the double type? Do I still have to worry about imprecision if I use timestamps of this length without decimals? (I would only perform date differences or comparisons.)
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 39744523
[ ] indicates a predicate
. is short for self::node()
[.] doesn't do anything in general
I told you to avoid text(), it does not do anything in this case and can only break your code
number() casts a node to a number, you can use it in tests in such an XPath, but I don't think it is a good idea to have the global expression inside a selectSingleNode wrapped in a number. Why cast to a number if you need a node that you want to get the .text from
$oDoc.SelectSingleNode("number(/xml/now/text()[.])").text
should be
$oDoc.SelectSingleNode("/xml/now").text

As a short option you can indeed just drop the '.' (use translate() for that).
Make sure that you use format-number() first to ensure the number of decimals is aligned if you do that.

I think you will be safe. A double uses 64 bits, 53 of which hold the significand, so every integer up to 2^53 (about 9 × 10^15) is represented exactly; a 14-digit timestamp without decimals fits comfortably.
(I quickly checked Wikipedia and some Java and .NET references and they all seem to agree that the maximum for the double type is 1.79769313486231570E+308, but that is the range, not the precision.)
Of course there is an XPath processor implementation layered on top of your type system, so you had better test some values first.
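
A quick way to check those limits, sketched in Python on the assumption that its float behaves like the double used by the XPath processor (both are IEEE 754 doubles, but the processor itself should still be tested separately):

```python
import sys

MAX_EXACT_INT = 2 ** 53          # integers up to here are all exact in a double
timestamp = 20131228154710       # 14-digit timestamp from the question

print(sys.float_info.max)        # largest finite double (the range)
print(sys.float_info.mant_dig)   # significand bits (the precision)

# The timestamp is far below 2**53, so converting it to a double loses nothing.
fits = timestamp < MAX_EXACT_INT
round_trips = int(float(timestamp)) == timestamp

# Just above 2**53, neighbouring integers start to collapse onto one double.
collapses = float(2 ** 53) == float(2 ** 53 + 1)

print(fits, round_trips, collapses)
```

Comparisons and subtractions of such integer timestamps are therefore exact as long as every operand and result stays below 2^53.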

Author Comment

by:lucavilla
ID: 39760323
Problem: "supposing that we want to take that number inside <now></now>, excluding any text inside any children (if any), what would be the best msxml command?"

I tested with this:
<xml>
<now>130102.123456123<abc>foobar</abc></now>
</xml>


Your solution $oDoc.SelectSingleNode("/xml/now").text  returns "130102.123456123foobar" that is not what I want.

while $oDoc.SelectSingleNode("/xml/now/text()").text  returns "130102.123456123" that is what I want.

I saw that even when I want to write/overwrite that number, "text()" is necessary to avoid losing the "<abc>foobar</abc>" forever!
Therefore the right command to write is also $oDoc.SelectSingleNode("/xml/now/text()").text  =  123456.112233444

About the "[.]" you were right: it seems to not have any effect...
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 39760421
You are throwing new XML into the equation
(note that I optimize the XPath based on the XML I see, not on all possible XMLs that could occur, unless you give me that information)

By using selectSingleNode and text() you only get the first text node. By choosing the element higher up (now) you get the stringified content of now, which is safer for the earlier XML.

I wanted to point out a risk, of course you can find examples that break my suggestion
(by changing requirements after the development, you start to sound like a real customer by the way ;-)

I was raising the warning bell on the text() node.
Your Xpath only selects the first text() node in your now element
If you are sure that the first actual text node in the now element is ALWAYS the one you are after, then OK, you have found a good use case for text(). Note that whatever is between </abc> and </now> is another text node, not found by your XPath.

While you think your example breaks my point, it actually reinforces it.

Now you have proven that other nodes might exist inside the <now> element, try some of those
<now> <abc>foobar</abc>130102.123456123</now>

<now>1<abc>foobar</abc>30102.123456123</now>

<now>130102<decimalpoint/>123456123</now>

<now>130102.12<!--glitch-->3456123</now>

You could claim that the only realistic one is the first. And you might find out it works as you please... OK, please note that that one works because of a flaw in the msxml parser, it would break with any other XML parser... and it would definitely break if your XML had a DTD or a schema (are you sure it will never have one?)

So, if you are certain that text() works for you, and you are comfortable it will not break in any of your use cases, please use it. If it breaks at one point, I hope I gave you some things to think about why that could happen

If I were to do this task, and I were to go for the safe route, I would select all child nodes of the now element, combine all the text nodes into one and use that. For putting the nodes back, I would reconstruct from the child element nodes found
[caveat: my colleagues often tell me I am taking too much control, but they admit I am the one that needs the least time fixing production glitches :-) ]
 

Author Comment

by:lucavilla
ID: 39760536
Hehe Gertone, you always have an eagle eye, but I didn't change the requirements; I just copied and pasted the question I posted on 2013-12-28 at 15:40:41, where I asked for a solution that excludes any children of the element "now" (if any).

Anyway, I understand your point that text(), while having the advantage (for me, in general use) of skipping the text nodes of child elements, is limited to the first text node directly under the element.

So... is there a final solution in msxml + XPath 1 that avoids the children's text nodes while including all the text nodes directly under the element?
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 39760605
try selectNodes on the text() nodes instead of selectSingleNode,
at least you get all of them
and use your .net code to join all the text nodes from the node array into one
 

Author Comment

by:lucavilla
ID: 39760621
It seems that selectNodes doesn't support the ".text" property.
It gives an error...   while with selectSingleNode it works...
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 39760648
selectNodes returns an array of nodes, not a single node, so you need to iterate the array and get the .text of each single node and concat that to a string.
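
The same idea, iterating the direct text nodes and concatenating them, can be sketched with Python's xml.dom.minidom as a stand-in for msxml (an assumption made so the example is runnable; the helper name own_text is hypothetical, and CDATA sections are not handled):

```python
from xml.dom import minidom


def own_text(xml_source: str, tag: str) -> str:
    """Join the text nodes that are direct children of the first <tag> element."""
    doc = minidom.parseString(xml_source)
    element = doc.getElementsByTagName(tag)[0]
    parts = [
        child.data
        for child in element.childNodes
        if child.nodeType == child.TEXT_NODE  # skips child elements and comments
    ]
    return "".join(parts).strip()


samples = [
    "<xml><now>130102.123456123<abc>foobar</abc></now></xml>",
    "<xml><now> <abc>foobar</abc>130102.123456123</now></xml>",
    "<xml><now>1<abc>foobar</abc>30102.123456123</now></xml>",
    "<xml><now>130102.12<!--glitch-->3456123</now></xml>",
]
for sample in samples:
    print(own_text(sample, "now"))
```

Each sample yields the number without the text of the abc child, including the cases where a child element or comment splits the value into several text nodes.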
 

Author Closing Comment

by:lucavilla
ID: 39859945
...