generate-id, input document, intermediate document

Posted on 2009-04-27
Medium Priority
944 Views
Last Modified: 2013-11-18
To simplify things I have decided to first transform the input document into an intermediate form, and then to transform that into the final output.

During both phases I am using "generate-id" to form internal links within the document. In phase #1 the id's are generated from the input document. In phase #2 the id's are generated from the intermediate document.

Here is an example:

INPUT DOCUMENT:

<input-root>
  <a/>
</input-root>

INTERMEDIATE DOCUMENT:

<int-root>
   <a xml:id="GeneratedID1"/>
   <b/>
</int-root>

OUTPUT

<output-root>
  <a xml:id="GeneratedID1"/>
  <b xml:id="GeneratedID2"/>
</output-root>

Is it possible that "generate-id" will return "GeneratedID1" again despite being from a different document tree, but of a similar structure?

Phase #1 is stored in a variable, and then Phase #2 is generated from that. So both passes are completed within the same transform.

I am using XSLT 2.0 with Saxon.
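A minimal sketch of the two-phase approach described above, with both phases running inside a single XSLT 2.0 transformation. It assumes the element names from the small example (input-root, int-root, output-root) and uses hypothetical mode names phase1/phase2; it is an illustration, not the poster's actual stylesheet.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- Phase #1: build the intermediate document inside a variable
         (in XSLT 2.0 this is a temporary tree that can be navigated) -->
    <xsl:variable name="intermediate">
      <xsl:apply-templates select="input-root" mode="phase1"/>
    </xsl:variable>
    <!-- Phase #2: transform the intermediate document into the final output -->
    <xsl:apply-templates select="$intermediate/int-root" mode="phase2"/>
  </xsl:template>

  <!-- Phase #1: IDs here are generated from nodes of the input document -->
  <xsl:template match="input-root" mode="phase1">
    <int-root>
      <xsl:for-each select="a">
        <a xml:id="{generate-id(.)}"/>
      </xsl:for-each>
      <b/>
    </int-root>
  </xsl:template>

  <!-- Phase #2: keep xml:id values from phase #1; generate the missing ones
       from nodes of the intermediate document -->
  <xsl:template match="int-root" mode="phase2">
    <output-root>
      <xsl:for-each select="*">
        <xsl:element name="{local-name()}">
          <xsl:attribute name="xml:id"
              select="if (@xml:id) then string(@xml:id) else generate-id(.)"/>
        </xsl:element>
      </xsl:for-each>
    </output-root>
  </xsl:template>
</xsl:stylesheet>
```

Phase #1 writes its generated values into xml:id, and phase #2 copies any existing xml:id through instead of regenerating it, so the identifiers assigned in phase #1 survive into the output.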
Question by: numberkruncher
19 Comments
 
LVL 11

Expert Comment

by:kmartin7
ID: 24246652
>Is it possible that "generate-id" will return "GeneratedID1" again despite being from a different document tree, but of a similar structure?

I am a little confused as to exactly what you are doing and what you want to get, but generate-id() will most likely produce duplicate ID values across different documents. Not knowing exactly how you are going about it, though, I must reserve specific comment.

Can you expound on exactly what you are trying to do?

 
LVL 13

Author Comment

by:numberkruncher
ID: 24246757
I am using a custom document format for technical documentation. The XSLT uses "generate-id" to construct page jumps between various elements within the output document. This part of the system is almost set in stone, but I could make minor amendments if necessary.

This custom document is transformed into HTML output.

The input document contains data of a different nature which:
   - Contains a subset of the custom document format for documentation purposes.
   - During transformation into HTML can benefit from some of the custom document format.

DATA (WITH SUBSET OF CUSTOM FORMAT)  =>  CUSTOM FORMAT => HTML


   <!-- Phase #1 -->
   
     
   

   <!-- Phase #2 -->
   


During phase #1 page jumps are constructed using "generate-id" from the DATA nodes.

During phase #2 other page jumps are constructed using "generate-id" from the intermediate CUSTOM FORMAT nodes.

I actually have an implementation of this which appears to be working perfectly. But I need to make sure that the system is not going to fail in freak cases where Phase #1 and Phase #2 both generate the same unique ID.

I hope that this helps clarify my question.
 
LVL 11

Expert Comment

by:kmartin7
ID: 24246878
It is my understanding that you cannot be certain that the generate-id() function will generate IDs that are unique across disparate documents. Even the W3C spec notes that a generated identifier is not guaranteed to be distinct from unique IDs already specified in the source document:

16.6.4 generate-id
generate-id() as xs:string
generate-id($node as node()?) as xs:string
The generate-id function returns a string that uniquely identifies a given node. The unique identifier must consist of ASCII alphanumeric characters and must start with an alphabetic character. Thus, the string is syntactically an XML name. An implementation is free to generate an identifier in any convenient way provided that it always generates the same identifier for the same node and that different identifiers are always generated from different nodes. An implementation is under no obligation to generate the same identifiers each time a document is transformed. There is no guarantee that a generated unique identifier will be distinct from any unique IDs specified in the source document. If the argument is the empty sequence, the result is the zero-length string. If the argument is omitted, it defaults to the context node. (emphasis mine).
We use an EXSLT UUID implementation that we include in our XSLTs. You call it in basically the same way as generate-id(), but you must declare a uuid namespace to use it. Let me know if you are interested and I'll post it (probably tomorrow, since I am getting ready to go to bed for the night).


 
LVL 13

Author Comment

by:numberkruncher
ID: 24246916
How do UUID values work around this issue?

Are generated UUID values compatible with HTML ID's so that they can be accessed via JavaScript with "document.getElementById"?
 
LVL 11

Expert Comment

by:kmartin7
ID: 24249986
Using this will ensure you get a unique ID on every element, even between multiple documents. No ID is the same. It is an NCName value, which is proper for ID values.
 
LVL 13

Author Comment

by:numberkruncher
ID: 24249994
Will this function always return the same UID for the same element of the same document?
 
LVL 11

Expert Comment

by:kmartin7
ID: 24250084
You can at least try it, if you want. Copy the code below and create a new XSLT named "uuid.xsl", then include it into your existing XSLT using the include element. You must also declare an additional namespace, "uuid" as shown below (you can choose to exclude any result prefixes like I have shown below):

<xsl:stylesheet xmlns:uuid="http://www.uuid.org" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0" exclude-result-prefixes="uuid">

<!-- here is the include -->
<xsl:include href="uuid.xsl"/>

<!-- rest of your existing stylesheet here -->

</xsl:stylesheet>

Then, in your existing XSLT, replace each generate-id() call with the uuid function like so (or similarly):

<xsl:attribute name="id">
      <xsl:value-of select="uuid:get-id()"/>
</xsl:attribute>

Let me know,

kmartin7
<xsl:stylesheet xmlns:uuid="http://www.uuid.org" xmlns:math="http://exslt.org/math" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
	<!-- Functions in the uuid: namespace are used to calculate a UUID The method used is a derived timestamp method, 
	which is explained here: http://www.famkruithof.net/guid-uuid-timebased.html 
	and here: http://www.ietf.org/rfc/rfc4122.txt -->
	
	<!-- Returns the UUID -->
	<xsl:function name="uuid:get-uuid" as="xs:string*">
		<xsl:variable name="ts" select="uuid:ts-to-hex(uuid:generate-timestamp())"/>
		<xsl:value-of separator="-" select=" substring($ts, 8, 8), substring($ts, 4, 4), string-join((uuid:get-uuid-version(), substring($ts, 1, 3)), ''), uuid:generate-clock-id(), uuid:get-network-node()"/>
	</xsl:function>
 
	<!-- internal auxiliary function: with Saxon, calling generate-id on this freshly created node gives a more unique result than using a variable containing a node -->
	<xsl:function name="uuid:_get-node">
		<xsl:comment/>
	</xsl:function>
 
	<!-- generates some kind of unique id -->
	<xsl:function name="uuid:get-id" as="xs:string">
		<xsl:sequence select="generate-id(uuid:_get-node())"/>
	</xsl:function>
 
	<!-- should return the next nr in sequence, but this can't be done in xslt. Instead, it returns a guaranteed unique number -->
	<xsl:function name="uuid:next-nr" as="xs:integer">
		<xsl:variable name="node">
			<xsl:comment/>
		</xsl:variable>
		<xsl:sequence select=" xs:integer(replace( generate-id($node), '\D', ''))"/>
	</xsl:function>
 
	<!-- internal function that returns hex digits only -->
	<xsl:function name="uuid:_hex-only" as="xs:string">
		<xsl:param name="string"/>
		<xsl:param name="count"/>
		<xsl:sequence select=" substring(replace( $string, '[^0-9a-fA-F]', '') , 1, $count)"/>
	</xsl:function>
 
	<!-- may as well be defined as returning the same seq each time -->
	<xsl:variable name="_clock" select="uuid:get-id()"/>
	<xsl:function name="uuid:generate-clock-id" as="xs:string">
		<xsl:sequence select="uuid:_hex-only($_clock, 4)"/>
	</xsl:function>
 
	<!-- returns the network node, this one is 'random', but must be the same within calls. The least-significant bit must be '1' when it is not a real MAC address (in this case it is set to '1') -->
	<xsl:function name="uuid:get-network-node" as="xs:string">
		<xsl:sequence select="uuid:_hex-only('09-17-3F-13-E4-C5', 12)"/>
	</xsl:function>
 
	<!-- returns version, for timestamp uuids, this is "1" -->
	<xsl:function name="uuid:get-uuid-version" as="xs:string">
		<xsl:sequence select="'1'"/>
	</xsl:function>
 
	<!-- Generates a timestamp of the amount of 100 nanosecond intervals from 15 October 1582, in UTC time. -->
	<xsl:function name="uuid:generate-timestamp">
		<!-- date calculation automatically goes correct when you add the timezone information, in this case that is UTC. -->
		<xsl:variable name="duration-from-1582" as="xs:dayTimeDuration">
			<xsl:sequence select=" current-dateTime() - xs:dateTime('1582-10-15T00:00:00.000Z')"/>
		</xsl:variable>
		<xsl:variable name="random-offset" as="xs:integer">
			<xsl:sequence select="uuid:next-nr() mod 10000"></xsl:sequence>
		</xsl:variable>
 
		<!-- do the math to get the 100 nano second intervals -->
		<xsl:sequence select="(days-from-duration($duration-from-1582) * 24 * 60 * 60 + hours-from-duration($duration-from-1582) * 60 * 60 + minutes-from-duration($duration-from-1582) * 60 + seconds-from-duration($duration-from-1582)) * 1000 * 10000 + $random-offset"/>
	</xsl:function>
	<!-- simple non-generalized function to convert from timestamp to hex -->
	<xsl:function name="uuid:ts-to-hex">
		<xsl:param name="dec-val"/>
		<xsl:value-of separator="" select=" for $i in 1 to 15 return (0 to 9, tokenize('A B C D E F', ' ')) [ $dec-val idiv xs:integer(math:power(16, 15 - $i)) mod 16 + 1 ]"/>
	</xsl:function>
</xsl:stylesheet>

 
LVL 11

Expert Comment

by:kmartin7
ID: 24250104
>Will this function always return the same UID for the same element of the same document?
No. I am sorry, but I thought you wanted different IDs each time. My bad. lol.
 
LVL 13

Author Comment

by:numberkruncher
ID: 24250144
Thanks for your example kmartin7, I will give that a try.

> No. I am sorry, but I thought you wanted different IDs each time. My bad. lol.
Perhaps I am not explaining myself very well.

I do want to generate different ID's each time. What I meant was, if something gets changed in the input document, then it will need to be re-transformed into HTML so that users can benefit from the latest version. Will the re-transformed version have the same UID's as previously?

This was working with "generate-id" of Saxon, but clearly this should not have been something to rely upon according to the W3C specification.
 
LVL 11

Expert Comment

by:kmartin7
ID: 24250191
>Perhaps I am not explaining myself very well.
No, it was me. I was horribly exhausted last night when I originally replied, and went to bed soon afterwards. I just didn't read it well.

>something gets changed in the input document, then it will need to be re-transformed into HTML so that users can benefit from the latest version.
Does this mean new elements can be added? If new elements are added, it will be difficult to keep the ID values stable. If only the data within elements changes, then it is possible.
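As a sketch of the case where only element content changes, an ID can be derived from the element's position in the tree instead of from generate-id(). The function name my:path-id and its namespace URI are hypothetical. Because the value depends only on sibling positions, adding or removing an element earlier in the document shifts the IDs that follow it, which is exactly the limitation mentioned above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:hypothetical"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical helper: builds an ID from the element's position path,
       e.g. "id-1.3.2" for the 2nd child of the 3rd child of the root element.
       The value is stable across runs only while the element structure
       (not the text content) stays the same. -->
  <xsl:function name="my:path-id" as="xs:string">
    <xsl:param name="node" as="element()"/>
    <xsl:sequence select="concat('id-', string-join(
        for $e in $node/ancestor-or-self::*
        return string(count($e/preceding-sibling::*) + 1), '.'))"/>
  </xsl:function>
</xsl:stylesheet>
```

For example, it could stand in for generate-id(.) in an attribute value template such as xml:id="{my:path-id(.)}".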
 
LVL 13

Author Comment

by:numberkruncher
ID: 24250690
Yes, new elements can be added, and elements can also be removed.

I am starting to think that it might be simpler to run the input document through another transform which generates the missing "xml:id" attributes whilst preserving the present ones. Then replace the input document with the updated input document.

I think that this would probably resolve all of these issues, because the id's will then become fixed.

I am not entirely sure how I would do this. I think it would be a case of:
   1. Update input document identifiers and place within variable.
   2. Output contents of variable to overwrite the input document.
   3. Perform phase#1 transform to variable and store in another variable.
   4. Then perform phase#2 transform on the second variable and output HTML.

What do you think of this idea? Is it feasible? Would it be best to implement this purely in XSLT 2.0, or to write a C# wrapper?
 
LVL 11

Expert Comment

by:kmartin7
ID: 24252558
It's hard to say, without actually seeing your input and expected output. It sounds like (from your last post) that some of the original input XML has IDs? Is this the case?

You might be able to use the document() function to combine new elements with your existing output; this would retain the previous ID values. It might be tricky, but it might also be doable.
 
LVL 13

Author Comment

by:numberkruncher
ID: 24252865
Yes, the input already contains numerous id's, so I was suggesting what if I overwrote the original input file with one which was fully populated with id's. That way id's would be preserved even when content is added, changed or removed.

Would the document function work in this instance? From my understanding Saxon caches content by path, so if I attempt to re-open the input file (even though it has been modified), it would still show the original cached version.
 
LVL 11

Expert Comment

by:kmartin7
ID: 24253629
It seems that you should write a choose statement that retains the IDs from the existing document, otherwise add using generate-id(). Perhaps you could post some before and after code snippets? That would go a long way towards helping me understand exactly what you need.
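A minimal sketch of that idea as an identity transform: elements that already carry an xml:id are copied unchanged, and elements without one get a generated value. The 'gen-' prefix is a hypothetical choice, added because the spec quoted earlier gives no guarantee that generated values differ from IDs already present in the source. The generated values only become fixed if the output of this pass is saved and reused as the input for later runs.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copies attributes and nodes through unchanged,
       including any xml:id that is already present -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Higher default priority than the identity template: elements that
       have no xml:id get a generated one (hypothetical 'gen-' prefix).
       The attribute is added before any child nodes, as XSLT requires. -->
  <xsl:template match="*[not(@xml:id)]">
    <xsl:copy>
      <xsl:attribute name="xml:id" select="concat('gen-', generate-id(.))"/>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```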
 
LVL 13

Author Comment

by:numberkruncher
ID: 24254598
I don't have a problem with adding the missing id's.

Basically I am suggesting,

To avoid:
  - Conflicting id's (because as you say, technically I am working with multiple documents).
  - Problems with inconsistent id's where content has been changed in the source file.

That I could:
   - Open the input file, for all elements that do not have id's, generate a unique one. For all elements that already have one, simply preserve it.
   - Overwrite the input file with the one which now has id's for all elements.
   - Generate the phase #1, and finally phase #2 transforms.

Essentially, this would mean that I do not need to generate id's during phase #1, because the input document will already have sufficient id's. Phase #2 will still need to generate unique id's, BUT I can easily prefix these with a unique prefix to avoid conflict.

My question is simply: do you recommend adding an additional 3rd pass to ensure that all elements have identifiers? To me it seems like something which should be relatively straightforward, but at what cost? Is it going to kill my web server? I guess that is the question.

I am not sure if there is a way in XSLT to overwrite and reload the input file to continue with phase #1, or whether I would need to write this functionality into the website using C#.

Thanks for your persistence kmartin7, it is greatly appreciated!
 
LVL 11

Accepted Solution

by: kmartin7 (earned 1500 total points)
ID: 24259632
>Open the input file, for all elements that do not have id's, generate a unique one. For all elements that already have one, simply preserve it.

This is completely doable.

>Overwrite the input file with the one which now has id's for all elements.

This is confusing to me. "Overwrite the input file" Are you sure you want to overwrite (replace) the input file?
>My question is simply, do you recommend adding an additional 3rd parse to ensure that all elements have identifiers?

Do you need all elements to have IDs? Unless you plan to access them in some way, there is no need to. So in this case (as I understand it) I would not recommend a 3rd phase. Is there a reason to ensure all elements have an ID?
I apologize for not having a better understanding - I just want to make sure I fully grasp what your issues are.
Thanks,
kmartin7
 
LVL 13

Author Comment

by:numberkruncher
ID: 24259829
> I apologize for not having a better understanding - I just want to make sure I fully grasp what your issues are.

No problem, I am probably not explaining myself very well.

> This is confusing to me. "Overwrite the input file" Are you sure you want to overwrite (replace) the input file?

You are right, there should not be any reason to overwrite the input file. I am making this problem far more complicated than it really needs to be. When multiple transform passes are undertaken, the possibility of clashing generated identifiers can be avoided simply by adding a unique prefix to the generated value.

I have written a function which automatically tests for an existing "xml:id", and if none is present, generates one and adds a unique prefix. In the second document pass I can then resolve identifiers with a different unique prefix (see below).

Thanks for your guidance, whilst it has not directly solved my problem, it has certainly pointed me in the right direction.
<!-- Normal "CUSTOM FORMAT" function used during phase #2 -->
<!-- (assumes the doc:, ex1: and xs: prefixes are declared on xsl:stylesheet) -->
<xsl:function name="doc:resolve-id" as="xs:string">
    <xsl:param name="node" as="node()"/>
    <!-- reuse an existing xml:id; otherwise generate one with a phase-specific prefix -->
    <xsl:sequence select="if (string-length($node/@xml:id) gt 0) then string($node/@xml:id) else concat('doc-', generate-id($node))"/>
</xsl:function>
 
<!-- Specialized "DATA" function used during phase #1 -->
<xsl:function name="ex1:resolve-id" as="xs:string">
    <xsl:param name="node" as="node()"/>
    <xsl:sequence select="if (string-length($node/@xml:id) gt 0) then string($node/@xml:id) else concat('ex1-', generate-id($node))"/>
</xsl:function>

 
LVL 13

Author Comment

by:numberkruncher
ID: 24259842
This function is used as:
<xsl:variable name="local-id" select="doc:resolve-id(.)"/>
 
<!-- Variable then contains an ID -->
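For illustration, here is a hypothetical HTML stylesheet showing how the resolved ID might be used both for an anchor and for a page jump to it. The section/title element names and the urn:example:doc namespace URI are assumptions; the function repeats the doc:resolve-id logic so the stylesheet is self-contained. Since generate-id() returns the same identifier for the same node within one transformation, the link and its target agree.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:doc="urn:example:doc"
    exclude-result-prefixes="xs doc">

  <!-- Same logic as doc:resolve-id: reuse xml:id, else a prefixed generated ID -->
  <xsl:function name="doc:resolve-id" as="xs:string">
    <xsl:param name="node" as="node()"/>
    <xsl:sequence select="if (string-length($node/@xml:id) gt 0)
                          then string($node/@xml:id)
                          else concat('doc-', generate-id($node))"/>
  </xsl:function>

  <!-- Hypothetical table of contents: each link points at a section's ID -->
  <xsl:template match="/">
    <html>
      <body>
        <ul>
          <xsl:for-each select="//section">
            <li><a href="#{doc:resolve-id(.)}"><xsl:value-of select="title"/></a></li>
          </xsl:for-each>
        </ul>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- The same function on the same node gives the anchor the link targets -->
  <xsl:template match="section">
    <xsl:variable name="local-id" select="doc:resolve-id(.)"/>
    <div id="{$local-id}">
      <xsl:apply-templates/>
    </div>
  </xsl:template>
</xsl:stylesheet>
```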

 
LVL 13

Author Closing Comment

by:numberkruncher
ID: 31575253
Thanks again!