generate-id, input document, intermediate document

To simplify things I have decided to first transform the input document into an intermediate form, and then to transform that into the final output.

During both phases I am using "generate-id" to form internal links within the document. In phase #1 the id's are generated from the input document. In phase #2 the id's are generated from the intermediate document.

Here is an example:

INPUT DOCUMENT:

<input-root>
  <a/>
</input-root>

INTERMEDIATE DOCUMENT:

<int-root>
   <a xml:id="GeneratedID1"/>
   <b/>
</int-root>

OUTPUT

<output-root>
  <a xml:id="GeneratedID1"/>
  <b xml:id="GeneratedID2"/>
</output-root>

Is it possible that "generate-id" will return "GeneratedID1" again despite being from a different document tree, but of a similar structure?

Phase #1 is stored in a variable, and then Phase #2 is generated from that. So both parses are completed within the same transform.

I am using XSLT 2.0 with Saxon.
numberkruncherAsked:
 
kmartin7Commented:
>Open the input file, for all elements that do not have id's, generate a unique one. For all elements that already have one, simply preserve it.

This is completely doable.

>Overwrite the input file with the one which now has id's for all elements.

This part confuses me: "Overwrite the input file". Are you sure you want to overwrite (replace) the input file?

>My question is simply, do you recommend adding an additional 3rd parse to ensure that all elements have identifiers?

Do you need all elements to have IDs? Unless you plan to access them in some way, there is no need to. So in this case (as I understand it) I would not recommend a 3rd phase. Is there a reason to ensure all elements have an ID?
I apologize for not having a better understanding - I just want to make sure I fully grasp what your issues are.
Thanks,
kmartin7
 
kmartin7Commented:
>Is it possible that "generate-id" will return "GeneratedID1" again despite being from a different document tree, but of a similar structure?

I am a little confused as to exactly what you are doing and what you want to get, but using generate-id() will most likely produce duplicate id values across different documents. Without knowing exactly how you are going about it, I must reserve specific comment.

Can you expound on exactly what you are trying to do?

 
numberkruncherAuthor Commented:
I am using a custom document format for technical documentation. The XSLT uses "generate-id" to construct page jumps between various elements within the output document. This part of the system is almost set in stone, but I could make minor amendments if necessary.

This custom document is transformed into HTML output.

The input document contains data of a different nature which:
   - Contains a subset of the custom document format for documentation purposes.
   - During transformation into HTML can benefit from some of the custom document format.

DATA (WITH SUBSET OF CUSTOM FORMAT)  =>  CUSTOM FORMAT => HTML


   <!-- Phase #1 -->
   
     
   

   <!-- Phase #2 -->
   


During phase #1 page jumps are constructed using "generate-id" from the DATA nodes.

During phase #2 other page jumps are constructed using "generate-id" from the intermediate CUSTOM FORMAT nodes.

I actually have an implementation of this which appears to be working perfectly. But I need to make sure that the system is not going to fail in freak cases where Phase #1 and Phase #2 both generate the same unique ID.

I hope that this helps clarify my question.
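
The phase markup above appears to have been stripped when the post was rendered, so the following is only a minimal sketch of the general shape being described, not the original code. It assumes the element names from the example in the question (input-root, int-root, output-root, a, b) and uses two hypothetical mode names, "phase1" and "phase2". Phase #1 is built into a variable and phase #2 is applied to that temporary tree within the same transform.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Entry point: run phase #1 into a variable, then phase #2 over it -->
  <xsl:template match="/">
    <!-- Phase #1: input document => intermediate document (temporary tree) -->
    <xsl:variable name="intermediate">
      <xsl:apply-templates select="*" mode="phase1"/>
    </xsl:variable>

    <!-- Phase #2: intermediate document => final output -->
    <xsl:apply-templates select="$intermediate/*" mode="phase2"/>
  </xsl:template>

  <!-- Phase #1: ids are generated from nodes of the input document -->
  <xsl:template match="input-root" mode="phase1">
    <int-root>
      <xsl:apply-templates select="a" mode="phase1"/>
      <b/>
    </int-root>
  </xsl:template>

  <xsl:template match="a" mode="phase1">
    <a xml:id="{generate-id(.)}"/>
  </xsl:template>

  <!-- Phase #2: ids are generated from nodes of the temporary tree -->
  <xsl:template match="int-root" mode="phase2">
    <output-root>
      <xsl:apply-templates select="*" mode="phase2"/>
    </output-root>
  </xsl:template>

  <!-- Keep an id assigned in phase #1, otherwise generate one now -->
  <xsl:template match="a | b" mode="phase2">
    <xsl:element name="{name()}">
      <xsl:attribute name="xml:id"
          select="if (@xml:id) then @xml:id else generate-id(.)"/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
```

The concern is whether the generate-id() call in the last template can ever return a value already handed out by the call in phase #1.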
 
kmartin7Commented:
It is my understanding that you cannot be certain that generate-id() will always generate unique IDs across disparate documents. Even the W3C spec states that a generated ID is not guaranteed to be distinct from the IDs already specified in a source document:

16.6.4 generate-id
generate-id() as xs:string
generate-id($node as node()?) as xs:string
The generate-id function returns a string that uniquely identifies a given node. The unique identifier must consist of ASCII alphanumeric characters and must start with an alphabetic character. Thus, the string is syntactically an XML name. An implementation is free to generate an identifier in any convenient way provided that it always generates the same identifier for the same node and that different identifiers are always generated from different nodes. An implementation is under no obligation to generate the same identifiers each time a document is transformed. There is no guarantee that a generated unique identifier will be distinct from any unique IDs specified in the source document. If the argument is the empty sequence, the result is the zero-length string. If the argument is omitted, it defaults to the context node. (emphasis mine).
We use an exslt uuid implementation that we include in our XSLTs. You call it basically the same, but you must use a uuid namespace to implement. Let me know if you are interested and I'll post (probably tomorrow since I am getting ready to go to bed for the night).


 
numberkruncherAuthor Commented:
How do UUID values work around this issue?

Are generated UUID values compatible with HTML ID's so that they can be accessed via JavaScript with "document.getElementById"?
 
kmartin7Commented:
Using this will ensure you get a unique ID on every element, even between multiple documents. No ID is the same. It is an NCName value, which is proper for ID values.
 
numberkruncherAuthor Commented:
Will this function always return the same UID for the same element of the same document?
 
kmartin7Commented:
You can at least try it, if you want. Copy the code below and create a new XSLT named "uuid.xsl", then include it into your existing XSLT using the include element. You must also declare an additional namespace, "uuid" as shown below (you can choose to exclude any result prefixes like I have shown below):

<xsl:stylesheet xmlns:uuid="http://www.uuid.org" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0" exclude-result-prefixes="uuid">

<!-- here is the include --><xsl:include href="uuid.xsl"/>

<!-- rest of your existing stylesheet here -->

</xsl:stylesheet>

Then in your existing XSLT, replace each generate-id() call with the uuid function like so (or similarly):

<xsl:attribute name="id">
      <xsl:value-of select="uuid:get-id()"/>
</xsl:attribute>

Let me know,

kmartin7
<xsl:stylesheet xmlns:uuid="http://www.uuid.org" xmlns:math="http://exslt.org/math" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
	<!-- Functions in the uuid: namespace are used to calculate a UUID The method used is a derived timestamp method, 
	which is explained here: http://www.famkruithof.net/guid-uuid-timebased.html 
	and here: http://www.ietf.org/rfc/rfc4122.txt -->
	
	<!-- Returns the UUID -->
	<xsl:function name="uuid:get-uuid" as="xs:string*">
		<xsl:variable name="ts" select="uuid:ts-to-hex(uuid:generate-timestamp())"/>
		<xsl:value-of separator="-" select=" substring($ts, 8, 8), substring($ts, 4, 4), string-join((uuid:get-uuid-version(), substring($ts, 1, 3)), ''), uuid:generate-clock-id(), uuid:get-network-node()"/>
	</xsl:function>
 
	<!-- internal aux. function: with Saxon, this creates a more-unique result with generate-id than when just using a variable containing a node -->
	<xsl:function name="uuid:_get-node">
		<xsl:comment/>
	</xsl:function>
 
	<!-- generates some kind of unique id -->
	<xsl:function name="uuid:get-id" as="xs:string">
		<xsl:sequence select="generate-id(uuid:_get-node())"/>
	</xsl:function>
 
	<!-- should return the next nr in sequence, but this can't be done in xslt. Instead, it returns a guaranteed unique number -->
	<xsl:function name="uuid:next-nr" as="xs:integer">
		<xsl:variable name="node">
			<xsl:comment/>
		</xsl:variable>
		<xsl:sequence select=" xs:integer(replace( generate-id($node), '\D', ''))"/>
	</xsl:function>
 
	<!-- internal function for returning hex digits only -->
	<xsl:function name="uuid:_hex-only" as="xs:string">
		<xsl:param name="string"/>
		<xsl:param name="count"/>
		<xsl:sequence select=" substring(replace( $string, '[^0-9a-fA-F]', '') , 1, $count)"/>
	</xsl:function>
 
	<!-- may as well be defined as returning the same seq each time -->
	<xsl:variable name="_clock" select="uuid:get-id()"/>
	<xsl:function name="uuid:generate-clock-id" as="xs:string">
		<xsl:sequence select="uuid:_hex-only($_clock, 4)"/>
	</xsl:function>
 
	<!-- returns the network node, this one is 'random', but must be the same within calls. The least-significant bit must be '1' when it is not a real MAC address (in this case it is set to '1') -->
	<xsl:function name="uuid:get-network-node" as="xs:string">
		<xsl:sequence select="uuid:_hex-only('09-17-3F-13-E4-C5', 12)"/>
	</xsl:function>
 
	<!-- returns version, for timestamp uuids, this is "1" -->
	<xsl:function name="uuid:get-uuid-version" as="xs:string">
		<xsl:sequence select="'1'"/>
	</xsl:function>
 
	<!-- Generates a timestamp of the amount of 100 nanosecond intervals from 15 October 1582, in UTC time. -->
	<xsl:function name="uuid:generate-timestamp">
		<!-- date calculation automatically goes correct when you add the timezone information, in this case that is UTC. -->
		<xsl:variable name="duration-from-1582" as="xs:dayTimeDuration">
			<xsl:sequence select=" current-dateTime() - xs:dateTime('1582-10-15T00:00:00.000Z')"/>
		</xsl:variable>
		<xsl:variable name="random-offset" as="xs:integer">
			<xsl:sequence select="uuid:next-nr() mod 10000"></xsl:sequence>
		</xsl:variable>
 
		<!-- do the math to get the 100 nano second intervals -->
		<xsl:sequence select="(days-from-duration($duration-from-1582) * 24 * 60 * 60 + hours-from-duration($duration-from-1582) * 60 * 60 + minutes-from-duration($duration-from-1582) * 60 + seconds-from-duration($duration-from-1582)) * 1000 * 10000 + $random-offset"/>
	</xsl:function>
	<!-- simple non-generalized function to convert from timestamp to hex -->
	<xsl:function name="uuid:ts-to-hex">
		<xsl:param name="dec-val"/>
		<xsl:value-of separator="" select=" for $i in 1 to 15 return (0 to 9, tokenize('A B C D E F', ' ')) [ $dec-val idiv xs:integer(math:power(16, 15 - $i)) mod 16 + 1 ]"/>
	</xsl:function>
</xsl:stylesheet>

 
kmartin7Commented:
>Will this function always return the same UID for the same element of the same document?
No. I am sorry, but I thought you wanted different IDs each time. My bad. lol.
 
numberkruncherAuthor Commented:
Thanks for your example kmartin7, I will give that a try.

> No. I am sorry, but I thought you wanted different IDs each time. My bad. lol.
Perhaps I am not explaining myself very well.

I do want to generate different ID's each time. What I meant was, if something gets changed in the input document, then it will need to be re-transformed into HTML so that users can benefit from the latest version. Will the re-transformed version have the same UID's as previously?

This was working with "generate-id" of Saxon, but clearly this should not have been something to rely upon according to the W3C specification.
 
kmartin7Commented:
>Perhaps I am not explaining myself very well.
No, it was me. I was horribly exhausted last night when I originally replied, and went to bed soon afterwards. I just didn't read it well.

>something gets changed in the input document, then it will need to be re-transformed into HTML so that users can benefit from the latest version.
Does this mean new elements can be added? If new elements are added, it will be difficult to keep ID values stable. If only the data within elements changes, then it is possible.
 
numberkruncherAuthor Commented:
Yes, new elements can be added, and elements can also be removed.

I am starting to think that it might be simpler to run the input document through another transform which generates the missing "xml:id" attributes whilst preserving the present ones. Then replace the input document with the updated input document.

I think that this would probably resolve all of these issues, because the id's will then become fixed.

I am not entirely sure how I would do this. I think it would be a case of:
   1. Update input document identifiers and place within variable.
   2. Output contents of variable to overwrite the input document.
   3. Perform phase#1 transform to variable and store in another variable.
   4. Then perform phase#2 transform on the second variable and output HTML.

What do you think of this idea? Is it feasible? Would it be best to implement this purely in XSLT 2.0 or to write a C# wrapper?
 
kmartin7Commented:
It's hard to say, without actually seeing your input and expected output. It sounds like (from your last post) that some of the original input XML has IDs? Is this the case?

You might be able to use the document() function and combine new elements with your existing output, this will retain previous ID elements. Might be tricky, but also might be doable.
 
numberkruncherAuthor Commented:
Yes, the input already contains numerous id's, so I was suggesting what if I overwrote the original input file with one which was fully populated with id's. That way id's would be preserved even when content is added, changed or removed.

Would the document function work in this instance? From my understanding Saxon caches content by path, so if I attempt to re-open the input file (even though it has been modified), it would still show the original cached version.
 
kmartin7Commented:
It seems that you should write a choose statement that retains the IDs from the existing document, otherwise add using generate-id(). Perhaps you could post some before and after code snippets? That would go a long way towards helping me understand exactly what you need.
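
To make that suggestion concrete, here is a minimal sketch of an identity-style transform that keeps any existing xml:id and adds one with generate-id() only where it is missing or empty. It is written with xsl:if, which is equivalent to a one-branch choose here, and it assumes nothing about your element names, so treat it as a starting point and not as a drop-in for your stylesheet.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Copy attributes, text, comments and processing instructions unchanged -->
  <xsl:template match="@* | text() | comment() | processing-instruction()">
    <xsl:copy/>
  </xsl:template>

  <!-- Copy each element; add xml:id only when it is missing or empty -->
  <xsl:template match="*">
    <xsl:copy>
      <!-- Existing attributes first, so a present xml:id is preserved -->
      <xsl:apply-templates select="@*"/>
      <!-- string-length() of a missing attribute is 0 -->
      <xsl:if test="string-length(@xml:id) = 0">
        <xsl:attribute name="xml:id" select="generate-id(.)"/>
      </xsl:if>
      <!-- Children last: attributes must be written before child nodes -->
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Note that the generated values are still subject to the caveat in the spec text quoted earlier: nothing guarantees they differ from ids that already exist in the document.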
 
numberkruncherAuthor Commented:
I don't have a problem with adding the missing id's.

Basically I am suggesting,

To avoid:
  - Conflicting id's (because as you say, technically I am working with multiple documents).
  - Problems with inconsistent id's where content has been changed in the source file.

That I could:
   - Open the input file, for all elements that do not have id's, generate a unique one. For all elements that already have one, simply preserve it.
   - Overwrite the input file with the one which now has id's for all elements.
   - Generate the phase #1, and finally phase #2 transforms.

Essentially, this would mean that I do not need to generate id's during phase #1 because the input document will already have plenty sufficient id's. Phase #2 will still need to generate unique id's, BUT I can easily prefix these with a unique prefix to avoid conflict.

My question is simply, do you recommend adding an additional 3rd parse to ensure that all elements have identifiers? To me it seems like something which should be relatively straightforward, but at what cost? I guess the question is whether it is going to kill my web server.

I am not sure if there is a way in XSLT to overwrite and reload the input file to continue with phase #1, or whether I would need to write this functionality into the website using C#.

Thanks for your persistence kmartin7, it is greatly appreciated!
 
numberkruncherAuthor Commented:
> I apologize for not having a better understanding - I just want to make sure I fully grasp what your issues are.

No problem, I am probably not explaining myself very well.

> This is confusing to me. "Overwrite the input file" Are you sure you want to overwrite (replace) the input file?

You are right, there should not be any reasons to overwrite the input file. I am making this problem far more complicated than it really needs to be. When multiple transform parses are undertaken the possibility of clashing generated identifiers can be avoided simply by adding a unique prefix to the generated value.

I have written a function which automatically tests for an existing "xml:id", and if none is present, it generates one and adds a unique prefix. In the second document parse I can then resolve identifiers with a different unique prefix (see below).

Thanks for your guidance, whilst it has not directly solved my problem, it has certainly pointed me in the right direction.
<!-- Normal "CUSTOM FORMAT" function used during phase #2 -->
<xsl:function name="doc:resolve-id">
    <xsl:param name="node"/>
    <xsl:value-of select="if(string-length($node/@xml:id)>0)then $node/@xml:id else concat('doc-',generate-id($node))"/>
</xsl:function>
 
<!-- Specialized "DATA" function used during phase #1 -->
<xsl:function name="ex1:resolve-id">
    <xsl:param name="node"/>
    <xsl:value-of select="if(string-length($node/@xml:id)>0)then $node/@xml:id else concat('ex1-',generate-id($node))"/>
</xsl:function>

 
numberkruncherAuthor Commented:
This function is used as:
<xsl:variable name="local-id" select="doc:resolve-id(.)"/>
 
<!-- Variable then contains an ID -->
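
For anyone wanting to try this end to end, here is a small self-contained sketch of how the variable might feed a page jump. The namespace URI bound to "doc", and the "section" and "title" element names, are placeholders made up for illustration; they are not taken from the real stylesheet. The function body mirrors doc:resolve-id above, with types added.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:doc="http://example.com/doc"
    exclude-result-prefixes="xs doc">

  <!-- Same logic as doc:resolve-id above; the namespace URI is a placeholder -->
  <xsl:function name="doc:resolve-id" as="xs:string">
    <xsl:param name="node" as="node()"/>
    <xsl:sequence select="if (string-length($node/@xml:id) > 0)
                          then string($node/@xml:id)
                          else concat('doc-', generate-id($node))"/>
  </xsl:function>

  <!-- Hypothetical element names: a section with a title child -->
  <xsl:template match="section">
    <xsl:variable name="local-id" select="doc:resolve-id(.)"/>
    <!-- The id on the div is the jump target; the link points back at it -->
    <div id="{$local-id}">
      <a href="#{$local-id}"><xsl:value-of select="title"/></a>
      <xsl:apply-templates select="* except title"/>
    </div>
  </xsl:template>
</xsl:stylesheet>
```

One thing to watch: the prefix only separates generated values from each other, so a hand-written xml:id that itself starts with "doc-" or "ex1-" could still collide with a generated one.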

 
numberkruncherAuthor Commented:
Thanks again!
Question has a verified solution.
