Gary Samuels

asked on

A regular expression to add a digit to the end

I have strings that start with product_code="2. The 2 can be followed by any number of digits or other characters, and the value ends with a quote mark. Any arbitrary text may follow the closing quote, for example:

product_code="24967">
product_code="24967" attribute_code="Color">
product_code="260088MP" attribute_code="Optional Gems">
product_code="24967-A">

Can someone help me out with a regular expression that would add the number 3 to the end of each product code, for example:

product_code="249673">
product_code="249673" attribute_code="Color">
product_code="260088MP3" attribute_code="Optional Gems">
product_code="24967-A3">

Thanks
Gertone (Geert Bormans)

I strongly recommend that you don't change XML without using XML tools.
It is generally not a good idea to change XML with regular expressions.

A very simple XSLT would do the trick

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>
    
    <xsl:template match="@product_code">
        <xsl:attribute name="product_code">
            <xsl:value-of select="."/>
            <xsl:text>3</xsl:text>
        </xsl:attribute>
    </xsl:template>
</xsl:stylesheet>



You are not very explicit about which programming language you want to use.
Well, you can use XSLT from just about any programming language, or standalone on the command line. Just let me know how you need to run this.
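For anyone who would rather stay XML-aware without XSLT, here is a rough Python sketch of the same idea using the standard library's xml.etree.ElementTree. Like the stylesheet above, it appends the 3 to every product_code attribute, not only those starting with 2. The sample document and its element names are made up for illustration, and it assumes the real file is well-formed XML with a single root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the element names are invented for illustration.
SAMPLE = """<products>
  <product product_code="24967"/>
  <product product_code="260088MP" attribute_code="Optional Gems"/>
  <option product_code='24967-A'/>
</products>"""


def append_to_product_codes(xml_text: str, suffix: str = "3") -> str:
    """Return xml_text with suffix appended to every product_code attribute."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the root element and all of its descendants.
    for element in root.iter():
        code = element.get("product_code")
        if code is not None:
            element.set("product_code", code + suffix)
    # encoding="unicode" makes tostring() return a str instead of bytes.
    return ET.tostring(root, encoding="unicode")


result = append_to_product_codes(SAMPLE)
print(result)
```

Because the parser handles the attribute delimiters, it makes no difference whether the source uses double or single quotes around the value.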
If you'd like to disregard Gertone's good advice, you'd want something like:
$string = preg_replace ('/(product_code="2[^"]*)"/', '${1}3"', $string);

It can't be run more than once, or it will add another 3 to the end of each code. Be sure to back up your data first in case it doesn't work as expected.
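In case PHP isn't handy, the same substitution can be sketched in Python. This is only an illustration of the pattern above run against the four sample lines from the question. The function name is made up, and Python spells the group reference as \g<1> where PHP uses ${1}.

```python
import re

# Same pattern as the preg_replace call: capture everything from
# product_code="2 up to (but not including) the closing quote.
PATTERN = re.compile(r'(product_code="2[^"]*)"')


def append_three(text: str) -> str:
    # \g<1> is the unambiguous way to write "group 1" when a digit follows.
    return PATTERN.sub(r'\g<1>3"', text)


lines = [
    'product_code="24967">',
    'product_code="24967" attribute_code="Color">',
    'product_code="260088MP" attribute_code="Optional Gems">',
    'product_code="24967-A">',
]
converted = [append_three(line) for line in lines]
for line in converted:
    print(line)

# Running the substitution again appends a second 3, as warned above.
twice = append_three(converted[0])
```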

ASKER

I have a text file, formatted like an XML file, which will be used to import data into a database. After the import the file will be trashed. I simply need to append the number 3 to the end of each product code before I import the data, so an XSL stylesheet will not help.

It's really nothing more than a search-and-replace routine that I need to do. Because it's a text file, there's really no programming language involved.

Attached is a small portion of the file.
sample.xml
Sorry, I should have been clearer about what I need. I'm looking for something that can be used in a search-and-replace text editor that accepts regular expressions.
Text editors generally don't have great regular expression engines. If you have PHP or Perl, then a very short script will easily do the trick. Do you have either?
Not really, other than Dreamweaver, which can be used to build PHP pages. I've been using Dreamweaver as a text editor, and I discovered that searching for product_code="2[A-Za-z0-9._-]* correctly selects the string I'm looking for, but I can't figure out how to replace the matched string with the same string followed by 3.
ASKER CERTIFIED SOLUTION
mark_harris231

As an aside, I can strongly recommend the freeware editor, Programmer's Notepad (www.pnotepad.org).  It supports a wide variety of programming formats and includes a pretty robust regex engine.
SOLUTION
Good catch, Terry. I may be wrong, but I think some regex engines limit you to nine or fewer capture groups because of the inherent ambiguity of two-digit group references (although some languages support "named" capture groups).
How strange. I tried this across several text editors that allow regular expression and got the same results on all of them.

Return: $1 3$2  = product_code="260162 3">

If I remove the space after the $1 then it returns, $13"

I then tried it in a little $3.00 app for Mac called "Patterns" and it works!

YaHoo! thanks
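The $13 result looks like the usual ambiguity between "group 1 followed by a literal 3" and "group 13". As a rough illustration only, here is how the same situation plays out in Python's re module, which uses \1 and \g<1> where editors tend to use $1. The pattern is the two-group one quoted later in this thread, and the behaviour of any particular editor may well differ.

```python
import re

pattern = re.compile(r'(product_code="2[A-Za-z0-9._-]*?)(")')
line = 'product_code="260162">'

# Ambiguous form: "\13" is read as a reference to group 13, which this
# two-group pattern does not have, so Python rejects the template.
try:
    pattern.sub(r'\13\2', line)
    ambiguous_failed = False
except re.error:
    ambiguous_failed = True

# Unambiguous form: \g<1> ends the group number before the literal 3.
fixed = pattern.sub(r'\g<1>3\g<2>', line)
print(fixed)
```

The PHP snippet earlier in the thread sidesteps the same problem by writing ${1}3 instead of $13.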
Well yeah, the details of your workflow were not too explicit :-)

Given that you load the XML into a database, have you considered manipulating the value on import? It is likely easy to tell the database to concatenate a 3 to that value on import... just a thought.

This "Patterns" app, is that embeddable in Dreamweaver? Or did you decide to use another editor alongside Dreamweaver? If the latter, and you are changing the workflow anyway, why not change it for real automation?

Again, I would like to note one of the risks of processing XML as text...

An attribute value delimiter in XML can be either a " or a '.
Both are considered equivalent by an XML parser,
so you cannot reliably predict which one the serialisation mechanism of an XML tool will use.
It is " now, but it can be ' tomorrow, without warning (seriously, I have been there many times before).

Even if you control the serialisation of the XML (and if you do, why not add the 3 at serialisation time?), I would at least change the regex from
(product_code="2[A-Za-z0-9._-]*?)(")
to
(product_code=.2[A-Za-z0-9._-]*)(.)
or
(product_code=["']2[A-Za-z0-9._-]*?)(["'])
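A small Python sketch of the quote-agnostic idea, for illustration only. It goes one step further than the last pattern above by using a backreference so that the closing delimiter has to be the same character as the opening one. The function name is hypothetical, and it assumes the value itself contains neither kind of quote.

```python
import re

# Group 2 captures whichever quote opens the value; the trailing \2 then
# requires that same character to close it.
PATTERN = re.compile(r"""(product_code=(["'])2[^"']*)\2""")


def append_three_any_quote(text: str) -> str:
    # Re-emit group 1 (name, opening quote, value), the 3, then the quote.
    return PATTERN.sub(r"\g<1>3\g<2>", text)


double_quoted = append_three_any_quote('product_code="24967" attribute_code="Color">')
single_quoted = append_three_any_quote("product_code='24967-A'>")
print(double_quoted)
print(single_quoted)
```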
"Patterns" is an app for Mac OS X and cannot be embedded in Dreamweaver.

This was a one-time project. The XML formatted text file has now been imported and I'll never have to do it again. Thanks to all for the help.