math errors in msxml

I'm going crazy with math errors (approximations?) in msxml (Windows 7)...

Take this XML:

<xml>
	<now>130102.123456</now>
	<difference>0.000001</difference>
	<number>130102.123456</number>
	<number>130102.123455</number>
	<number>130102.123454</number>
	<number>130102.123453</number>
	<number>130102.123452</number>
</xml>

Damn.... why does $o.selectnodes("/xml/number/text()[.</xml/now/text()-/xml/difference/text()]").length return 4? (instead of 3)

ASKER CERTIFIED SOLUTION

Gertone (Geert Bormans)

This solution is only available to members.

lucavilla

ASKER

Thanks Geert

:(((
About "number()": I thought that, since text() works, text() would be faster because number() has to do an (unneeded) conversion. Are you sure that using number() would be better?

Actually I need to track my XML elements extensively with timestamps (like "130102.123452") and to extract them by time difference, for example all the timestamps that are "< now - 10 seconds".
How would you solve this need?

Gertone (Geert Bormans)

As soon as you do numeric calculations, there is an implicit cast to numbers. By making it explicit, you help the processor: it no longer needs the logic to infer that you want to use the value as a number, because you tell it so. In theory using number() is faster, though your processor might optimize, and then it does not matter. So I consider that self-preservation. Make it obvious to yourself that you need it to be a number.

text() is a different discussion. number() is a (casting) function. text() is a node test. That is a different thing. It belongs in the box with node() and comment() etc...
/xml/number/text()[$some-condition] selects a text node, not the element number
but no one guarantees that there is only one text() node in there. So as a best practice, avoid text(). If you need a cast to a string, use string(), and use text() only when you need to explicitly address a specific text() node, e.g. following-sibling::text()[1] if you need to test whether the next node is a text node. Excessive use of text() is on the top 5 list of common XPath/XSLT mistakes.

There is no relation in my previous comment between dropping text() and adding number(). I hope I clarified that now

Working around the precision issues
- use format-number() to force an exact number of decimals to your numbers (now it is a string)
- use substring-before(., '.') and substring-after() to get the integer and decimal part as comparable integers.
- do the calculation on the integers at full precision
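
The integer-based workaround can be sketched outside XPath. The Python below is only an illustration, not msxml code: the helper name to_micro and the choice of six decimal digits are assumptions taken from the sample XML, and negative values are not handled. It compares the exact integer result with the same test done on binary doubles, which is what XPath 1.0 numbers are.

```python
def to_micro(stamp: str, decimals: int = 6) -> int:
    """Turn text like '130102.123456' into one integer (hypothetical helper)."""
    whole, _, frac = stamp.strip().partition(".")
    # Pad or cut the fractional part so every value has the same number of digits.
    frac = (frac + "0" * decimals)[:decimals]
    return int(whole) * 10**decimals + int(frac)

now = "130102.123456"
difference = "0.000001"
numbers = ["130102.123456", "130102.123455", "130102.123454",
           "130102.123453", "130102.123452"]

# Integer arithmetic: exact, no rounding.
limit = to_micro(now) - to_micro(difference)
exact_matches = [n for n in numbers if to_micro(n) < limit]

# The same comparison on doubles; the boundary value may land on either side.
float_limit = float(now) - float(difference)
float_matches = [n for n in numbers if float(n) < float_limit]

print(len(exact_matches), exact_matches)
print(len(float_matches), float_matches)
```

The integer version always selects the three values strictly below the limit. The double version can differ by one at the boundary, which matches the symptom in the question.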

lucavilla

ASKER

Thanks Gertone, you clarified some things for me.

A question: supposing that we want to take the number inside <now></now>, excluding any text inside any child elements (if any), what would be the best msxml command?

I tried the following without success so I'm still a little confused:
$oDoc.SelectSingleNode("number(/xml/now/text()[.])").text

Maybe I must not use number() here because msxml can only extract text (with ".text")?
and is the filter "[.]" to avoid inclusion of any children(s)?

lucavilla

ASKER

Another question: I'm seeing that I have correct results if I remove the decimal point in timestamps, even if I add the leading "20" for year (so that I have timestamps like "20131228154710").

What are the limits of the double type? Do I still have to worry about imprecision if I use timestamps of this length without decimals? (I would only perform date differences or comparisons.)

Gertone (Geert Bormans)

[ ] indicates a predicate
. is short for self::node()
[.] doesn't do anything in general
I told you to avoid text(); it does not do anything in this case and can only break your code
number() casts a node to a number. You can use it in tests inside such an XPath, but I don't think it is a good idea to wrap the whole expression passed to selectSingleNode in number(). Why cast to a number if you need a node that you want to get the .text from?
$oDoc.SelectSingleNode("number(/xml/now/text()[.])").text
should be
$oDoc.SelectSingleNode("/xml/now").text

As a short option you can indeed just drop the '.' (use translate() for that)
Make sure that you use format-number() first to ensure the number of decimals is aligned if you do that

I think you will be safe. A double uses 64 bits, 53 of which hold the significand, so every integer up to 2^53 (9007199254740992, roughly 15 to 16 decimal digits) is represented exactly. A 14-digit timestamp without decimals fits well within that.
(I quickly checked Wikipedia and some Java and .NET references and they all seem to agree that the maximum for the double type is 1.79769313486231570E+308, but that figure describes the range, not the precision.)
Of course there is an XPath processor implementation layered on top of your type system, so you'd better test some first
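
A quick way to check the limits of a double is to try them in any language that uses IEEE 754 doubles. The Python sketch below is such a check, not msxml itself; the variable names are only illustrative, and the timestamp is the one from the question.

```python
import sys

stamp = 20131228154710          # timestamp without decimal point, as in the question
limit = 2**53                   # 9007199254740992

# Integers up to 2**53 survive a round trip through a double unchanged.
round_trip_ok = int(float(stamp)) == stamp
print(round_trip_ok)

# Neighbouring timestamps stay distinguishable, so differences are exact.
step = float(stamp) - float(stamp - 1)
print(step)

# Just above 2**53, neighbouring integers start to collapse into one double.
collapsed = float(limit + 1) == float(limit)
print(collapsed)

# The often-quoted maximum describes range, not precision.
print(sys.float_info.max)
```

As long as the timestamps stay below 2^53 and contain no fractional part, comparisons and differences on them should not suffer from rounding.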

lucavilla

ASKER

Problem: "supposing that we want to take the number inside <now></now>, excluding any text inside any child elements (if any), what would be the best msxml command?"

I tested with this:

<xml>
<now>130102.123456123<abc>foobar</abc></now>
</xml>

Your solution $oDoc.SelectSingleNode("/xml/now").text returns "130102.123456123foobar" that is not what I want.

while $oDoc.SelectSingleNode("/xml/now/text()").text returns "130102.123456123" that is what I want.

I saw that even when I want to write/overwrite that number, "text()" is necessary to avoid losing the "<abc>foobar</abc>" forever!
Therefore the right command for writing is also $oDoc.SelectSingleNode("/xml/now/text()").text = 123456.112233444

About the "[.]" you were right: it seems to not have any effect...

Gertone (Geert Bormans)

You are throwing new XML into the equation
(note that I optimize the XPath based on the XML I see, not on all possible XMLs that could occur, unless you give me that information)

By using selectSingleNode and text() you only get the first text node. By choosing the higher-up element (now) you get the stringified content of now... safer in the earlier XML

I wanted to point out a risk, of course you can find examples that break my suggestion
(by changing requirements after the development, you start to sound like a real customer by the way ;-)

I was raising the warning bell on the text() node.
Your Xpath only selects the first text() node in your now element
If you are sure that the first actual text node in the now element is ALWAYS the one you are after, then OK, you have found a good use case for text(). Note that whatever is between </abc> and </now> is another text node, not found by your XPath

Whilst you think your example breaks my point, it actually reinforces it

Now that you have shown that other nodes might exist inside the <now> element, try some of these:
<now> <abc>foobar</abc>130102.123456123</now>

<now>1<abc>foobar</abc>30102.123456123</now>

<now>130102<decimalpoint/>123456123</now>

<now>130102.123456123</now>

You could claim that the only realistic one is the first. And you might find out it works as you please... OK, please note that that one works because of a flaw in the msxml parser, it would break with any other XML parser... and it would definitely break if your XML had a DTD or a schema (are you sure it will never have one?)

So, if you are certain that text() works for you, and you are comfortable it will not break in any of your use cases, please use it. If it breaks at one point, I hope I gave you some things to think about why that could happen
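
To see what "the first text node" means for each of the four variants above, here is a small sketch. It uses Python's xml.dom.minidom as a stand-in, which is an assumption on my part and not msxml; minidom keeps whitespace-only text nodes, so it behaves like the stricter parsers mentioned above. The helper name first_text is hypothetical.

```python
from xml.dom.minidom import parseString

variants = [
    "<now> <abc>foobar</abc>130102.123456123</now>",
    "<now>1<abc>foobar</abc>30102.123456123</now>",
    "<now>130102<decimalpoint/>123456123</now>",
    "<now>130102.123456123</now>",
]

def first_text(xml: str):
    """Return the data of the first text node that is a direct child of the root."""
    root = parseString(xml).documentElement
    for child in root.childNodes:
        if child.nodeType == child.TEXT_NODE:
            return child.data
    return None

results = [first_text(v) for v in variants]
for xml, text in zip(variants, results):
    print(repr(text), "<-", xml)
```

Only the last variant yields the full number. In the first one a whitespace-preserving parser returns the single space before <abc>.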

If I were to do this task, and I were to go for the safe route, I would select all child nodes of the now element, combine all the text nodes into one and use that. For putting the nodes back, I would reconstruct from the child element nodes found
[caveat: my colleagues often tell me I am taking too much control, but they admit I am the one that needs the least time fixing production glitches :-) ]

lucavilla

ASKER

Hehe Gertone, you always have an eagle eye, but I didn't change requirements. I just copied and pasted the question I posted on 2013-12-28 at 15:40:41, where I asked for a solution excluding any child elements of the element "now" (if any).

Anyway, I understand your point that text(), while having the advantage (for me, in general use) of skipping the text nodes of child elements, is limited to the first text node directly inside the element.

So... is there a final solution in msxml + XPath 1 that avoids the text nodes of child elements while including all the text nodes directly inside the element?

Gertone (Geert Bormans)

try selectNodes on the text() nodes instead of selectSingleNode,
at least you get all of them
and use your .net code to join all the text nodes from the node array into one

lucavilla

ASKER

It seems that selectNodes doesn't support the ".text" property.
It gives an error... while with selectSingleNode it works...

Gertone (Geert Bormans)

selectNodes returns a list of nodes, not a single node, so you need to iterate over the list, get the .text of each single node, and concatenate those into one string.
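
The iterate-and-concatenate idea looks roughly like this. It is a sketch in Python with xml.dom.minidom rather than msxml (an assumption for illustration), and own_text is a hypothetical helper name. It joins every text node that is a direct child of <now> and skips text inside child elements.

```python
from xml.dom.minidom import parseString

def own_text(element) -> str:
    """Join all text nodes that are direct children; skip text inside child elements."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    )

doc = parseString("<xml><now>1<abc>foobar</abc>30102.123456123</now></xml>")
now = doc.getElementsByTagName("now")[0]

joined = own_text(now)
print(joined)
```

With msxml the same loop would run over the node list returned by selectNodes("/xml/now/text()"), reading .text from each item.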

lucavilla

ASKER

...