Tree regular expressions

Posted on 2007-08-12
Last Modified: 2013-11-05

If I understand correctly, RELAX NG is a tree regular expression language for XML, right?

With "classical" regular expressions, you can also perform substitutions, such as:
$ sed -e "s/\([0-9]\)\([a-z]\)/CHANGED(\2 \1)/g"
This is a sample text with 2a inside.
This is a sample text with CHANGED(a 2) inside.
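
For comparison, the same substitution can be sketched in Python with re.sub. The function name swap_digit_letter is made up for this illustration; the pattern and the replacement are taken directly from the sed command above.

```python
import re


def swap_digit_letter(text):
    """Rewrite every digit followed by a lowercase letter as CHANGED(letter digit)."""
    # Group 1 captures the digit, group 2 the letter, exactly as in the sed pattern.
    return re.sub(r"([0-9])([a-z])", r"CHANGED(\2 \1)", text)


print(swap_digit_letter("This is a sample text with 2a inside."))
```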

Is it possible to use RELAX NG to match parts of an XML document and make substitutions?
It would be an advanced version of XSLT.

For instance, I could look for "card" elements that do not have an email, and insert a default one.


Do you know whether a solution for this already exists?

Best regards,
DAvid Portabella

Question by:dportabella

    Expert Comment

    by:Geert Bormans
    RELAX NG is above all a schema language: it is for validation only.
    One way it differs from W3C XML Schema is that it does not change the XML document
    (W3C XML Schema does things such as supplying default values for attributes).
    RELAX NG doesn't even do that; it only tells you whether an XML document is valid according to the schema.

    The term "tree regular expression" is meant as follows.
    RELAX NG describes "patterns" of XML documents (just as a regex describes a pattern of a string).
    XML documents are trees (hierarchical), hence RELAX NG describes tree patterns:
    it is a sort of regular expression language for XML documents.
    A RELAX NG validator checks whether the document matches the patterns in the schema;
    if it does, the document is "valid".
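
To illustrate the idea of a pattern over a tree, here is a minimal Python sketch that validates one level of an XML tree by applying an ordinary regular expression to the sequence of child element names. This is not RELAX NG, and it is not how a RELAX NG validator works internally. The addressBook root element, the content model "one name, then an optional email", and the helper names are all assumptions chosen for the example.

```python
import re
import xml.etree.ElementTree as ET

# Assumed content model for <card>: one <name>, then an optional <email>.
CARD_CONTENT = re.compile(r"name( email)?")


def child_signature(elem):
    """Return the child element names of elem as one space-separated string."""
    return " ".join(child.tag for child in elem)


def is_valid(root):
    """True if root is <addressBook> holding only <card> children that match CARD_CONTENT."""
    if root.tag != "addressBook":
        return False
    for child in root:
        if child.tag != "card":
            return False
        if not CARD_CONTENT.fullmatch(child_signature(child)):
            return False
    return True
```

Like a validator, is_valid only answers yes or no; it never modifies the tree.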

    You can't use a RELAX NG schema to change your document;
    that would be a transformation, which you need to express in XSLT.

    You could build an application that alters the document based on a RELAX NG schema.
    That application would be best developed in XSLT
    (I am not aware of such an application floating around anywhere).
    You could also abuse Schematron's alerting mechanism to achieve what you want,
    but I am quite convinced that XSLT is really the way to go here.

    Using XSLT's push mechanism (apply-templates)
    and starting from an identity transform,
    the XSLT to achieve what you need is fairly simple:

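    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
        <!-- identity transform: copy every node and its attributes unchanged -->
        <xsl:template match="node()">
            <xsl:copy>
                <xsl:copy-of select="@*"/>
                <xsl:apply-templates select="node()"/>
            </xsl:copy>
        </xsl:template>
        <!-- card: copy as above, then append an email built from the name if none is present -->
        <xsl:template match="card">
            <xsl:copy>
                <xsl:copy-of select="@*"/>
                <xsl:apply-templates select="node()"/>
                <xsl:if test="not(email)">
                    <email><xsl:value-of select="name"/><xsl:text></xsl:text></email>
                </xsl:if>
            </xsl:copy>
        </xsl:template>
    </xsl:stylesheet>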

    but I suspect you need a more generic approach?




    Author Comment


    Thanks for the info.
    With a regular expression, you can say whether a string is valid or not.
    Furthermore, you can use it to transform the string.

    I don't see why that could not be also the case with trees, instead of strings.
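
As a rough illustration of that idea, the sketch below pairs a regular expression over child element names with a rewrite action, so that matching a tree pattern triggers a change to the tree. It is a hand-rolled toy in Python, not an existing tree regular expression language. The rewrite and add_default_email helpers and the example.invalid domain are assumptions made up for this example.

```python
import re
import xml.etree.ElementTree as ET


def rewrite(root, tag, child_pattern, action):
    """Call action(elem) on each <tag> element whose child-name sequence matches child_pattern.

    Returns the number of elements that were rewritten.
    """
    regex = re.compile(child_pattern)
    count = 0
    # Snapshot the matches first, because the action may modify the tree.
    for elem in list(root.iter(tag)):
        signature = " ".join(child.tag for child in elem)
        if regex.fullmatch(signature):
            action(elem)
            count += 1
    return count


def add_default_email(card):
    """Append an <email> derived from the card's <name> (assumed default domain)."""
    email = ET.SubElement(card, "email")
    email.text = card.findtext("name", default="unknown") + "@example.invalid"


doc = ET.fromstring(
    "<cards>"
    "<card><name>ann</name></card>"
    "<card><name>bob</name><email>bob@x.org</email></card>"
    "</cards>"
)
# Pattern "name" matches only cards whose sole child is <name>, i.e. cards without an email.
changed = rewrite(doc, "card", r"name", add_default_email)
```

The pattern plays the role of the left-hand side of a sed substitution and the action plays the role of the replacement.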

    Thanks also for the XSLT example; however, the example of the email was a toy example.
    As you say, I would need a more generic approach.

    Before your message, I tried to look around for "Tree regular expressions substitutions" but I did not find anything. With your message, I realized that the correct keyword is "transformation" instead of "substitution". Looking around for "Tree regular expressions transformation", I found:
    >Parse::Eyapp introduces a new language called Tree Regular Expressions that easies the transformation of trees.

    which maybe could do the job.

    However, I would prefer to find a more standard project (maybe using RELAX NG).

    I do think that other people have thought of extending RELAX NG to use it also for transformation.
    Any idea of how to find such a project? (even if it is not based on RELAX NG)


    Accepted Solution

    >I don't see why that could not be also the case with trees, instead of strings.

    You are correct, there is no reason why that could not be the case,
    but RELAX NG aims at validation only, not substitution.

    I am not aware of projects extending RELAX NG in the way you describe.

    I came across this
    I don't know whether it is useful, or whether any of Tadeusz' work is available online,
    but it might be worthwhile sending him an email.



    Author Comment

    Geert, thanks for the info.

    I wrote an email to him. Let's see.
    I also realized that I only need to look for "transformation language for XML",
    and several alternatives to XSLT appear.

    One that seems to be what I was looking for is Xcerpt:
    which is also implemented in Java (which I need).
    Unfortunately this project is still in development and there is very little documentation.

    So, still looking for an appropriate package.
    The problem is not yet solved, but I think that I should already award you the points.

    Many thanks,
    DAvid Portabella

    Expert Comment

    by:Geert Bormans
    good luck
    (I'll give Xcerpt a look)
