Tree regular expressions

Posted on 2007-08-12
Medium Priority
Last Modified: 2013-11-05

If I understand correctly, RELAX NG is a tree regular expression language for XML, right?

"Classical" regular expressions can also be used to do substitutions, such as:
$ sed -e "s/\([0-9]\)\([a-z]\)/CHANGED(\2 \1)/g"
This is a sample text with 2a inside.
This is a sample text with CHANGED(a 2) inside.

Is it possible to use RELAX NG to match parts of an XML document and make substitutions?
It would be an advanced version of XSLT.

For instance, I could look for "card" elements that do not have an email, and insert a default one.


Do you know whether a solution for this already exists?

Best regards,
DAvid Portabella

Question by:dportabella

Expert Comment

by:Geert Bormans
ID: 19678614
RELAX NG is above all a schema language: it is for validation only.
One thing it does differently from W3C XML Schema is that it does not change the XML document
(W3C XML Schema does things such as supplying default values for attributes).
RELAX NG does not even do that; it just tells you whether an XML document is valid according to the schema or not.

The term "tree regular expression" is meant to express the following.
RELAX NG describes "patterns" of XML documents (like a regex describes a pattern of a string).
XML documents are like trees (hierarchical), hence RELAX NG describes tree patterns:
it is a sort of regular expression language for XML documents.
A RELAX NG validator checks whether the document matches the patterns in the schema;
if it does, the document is "valid".
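
To make the "match only, never modify" idea concrete, here is a minimal sketch in Python. It is not RELAX NG and does not use RELAX NG syntax; the PATTERNS table, the is_valid function and the addressBook root element are all made up for illustration. It only shows a checker that walks the tree and answers yes or no without touching the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical pattern format (NOT RELAX NG syntax): for each element name,
# the child element names that are required and those that are optional.
PATTERNS = {
    "addressBook": {"required": set(), "optional": {"card"}},
    "card": {"required": {"name"}, "optional": {"email"}},
    "name": {"required": set(), "optional": set()},
    "email": {"required": set(), "optional": set()},
}


def is_valid(element):
    """Return True if the element and all its descendants match PATTERNS."""
    pattern = PATTERNS.get(element.tag)
    if pattern is None:
        return False  # unknown element name
    child_names = {child.tag for child in element}
    if not pattern["required"] <= child_names:
        return False  # a required child is missing
    if not child_names <= pattern["required"] | pattern["optional"]:
        return False  # an unexpected child is present
    return all(is_valid(child) for child in element)


good = ET.fromstring("<addressBook><card><name>john</name></card></addressBook>")
bad = ET.fromstring("<addressBook><card><email>j@example.com</email></card></addressBook>")

print(is_valid(good))
print(is_valid(bad))  # this card has no name child
```

The checker only reports a result; the parsed documents are left exactly as they were, which is the point being made about validation.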

You can't use a RELAX NG schema to change your document
... that would be a transformation, which you need to express in XSLT.

You could build an application that alters the document based on a RELAX NG schema.
That application would be best developed in XSLT
(I am not aware of such an application floating around anywhere).
You could also abuse Schematron's alerting mechanism to achieve what you want,
but I am quite convinced that XSLT is really the way to go with this.

Using XSLT's push mechanism with apply-templates,
and starting from an identity transform,
the XSLT to achieve what you need would be fairly simple:

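<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <!-- identity transform: copy every node and its attributes unchanged -->
    <xsl:template match="node()">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>
    <!-- copy each card, appending a default email when none is present -->
    <xsl:template match="card">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates select="node()"/>
            <xsl:if test="not(email)">
                <email><xsl:value-of select="name"/><xsl:text>@example.com</xsl:text></email>
            </xsl:if>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>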

but I suspect you need a more generic approach?




Author Comment

ID: 19678708

Thanks for the info.
With a regular expression, you can say whether a string is valid or not.
Furthermore, you can use it to transform the string.

I don't see why that could not be also the case with trees, instead of strings.

Thanks also for the XSLT example; however, the example of the email was a toy example.
As you say, I would need a more generic approach.
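
One way to picture such a generic approach, as a rough sketch only: keep the "pattern" and the "substitution" as separate pieces and apply a list of such rules over the whole tree, much like the s/pattern/replacement/ pairs of a string regex. The code below uses Python's standard xml.etree.ElementTree; the names rewrite, card_without_email and add_default_email and the cards root element are hypothetical, and this is neither RELAX NG nor any existing tree-regex tool.

```python
import xml.etree.ElementTree as ET


def rewrite(root, rules):
    """Apply every (matches, action) rule to every element of the tree."""
    # list() takes a snapshot, so actions may add children while we loop
    for element in list(root.iter()):
        for matches, action in rules:
            if matches(element):
                action(element)
    return root


def card_without_email(element):
    # the "pattern": a card element that has no email child
    return element.tag == "card" and element.find("email") is None


def add_default_email(card):
    # the "substitution": append <email>NAME@example.com</email>
    email = ET.SubElement(card, "email")
    email.text = card.findtext("name", default="unknown") + "@example.com"


doc = ET.fromstring(
    "<cards>"
    "<card><name>john</name></card>"
    "<card><name>fred</name><email>fred@example.net</email></card>"
    "</cards>"
)
rewrite(doc, [(card_without_email, add_default_email)])
print(ET.tostring(doc, encoding="unicode"))
```

New rules can be added to the list without touching rewrite itself, which is roughly what a tree-level "substitution language" would offer, minus a declarative pattern syntax.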

Before your message, I tried to look around for "Tree regular expressions substitutions" but I did not find anything. With your message, I realized that the correct keyword is "transformation" instead of "substitution". Looking around for "Tree regular expressions transformation", I found:
>Parse::Eyapp introduces a new language called Tree Regular Expressions that easies the transformation of trees.

which might do the job.

However, I would prefer to find a more standard project (maybe using RELAX NG).

I do think that other people have thought of extending RELAX NG to use it also for transformation.
Any idea of how to find such a project? (even if it is not based on RELAX NG)


Accepted Solution

Geert Bormans earned 1500 total points
ID: 19678732
>I don't see why that could not be also the case with trees, instead of strings.

You are correct, there is no reason why that could not be the case,
but RELAX NG aims at validation only, not substitution.

I am not aware of any projects extending RELAX NG in the way you described.

I came across this
I do not know whether this is useful, or whether any of Tadeusz' work is available online,
but it might be worthwhile sending him an email



Author Comment

ID: 19679202
Geert, thanks for the info.

I wrote an email to him. Let's see.
I also realized that I only need to look for "transformation language for XML",
and several alternatives to XSLT appear.

One that seems to be what I was looking for is Xcerpt: https://sourceforge.net/projects/xcerpt/
which is also implemented in Java (which I need).
Unfortunately, this project is still in development and there is very little documentation.

So, still looking for an appropriate package.
The problem is not yet solved, but I think that I should already award you the points.

Many thanks,
DAvid Portabella

Expert Comment

by:Geert Bormans
ID: 19680125
Good luck
(I'll give Xcerpt a look)


Question has a verified solution.
