Solved

Regular Expression to match Xml tags

Posted on 2006-06-27
Last Modified: 2013-11-19
I am writing a WebDAV query which needs to use a property which is in hex (i.e. begins with 0x).

This is not allowed in Xml elements.  Therefore I need to replace the 0x with x and vice versa as the information goes in and out of an Xml document.

I am looking for a regular expression that will do the replacement of 0x to x.  However it must only match the elements in hex, not data.

So basically

<proptag:0x823D0003>5</proptag:0x823D0003>
would need to become
<proptag:x823D0003>5</proptag:x823D0003>

<mapirecurring:0x00008223 dt:dt="boolean">1</mapirecurring:0x00008223>
would need to become
<mapirecurring:x00008223 dt:dt="boolean">1</mapirecurring:x00008223>

Note that the namespace prefix can vary, the actual hex value can vary, and there could be additional attributes like in the above dt:dt="boolean"

A regular expression that could be used to do this replacement (ideally one that will work in ASP.NET with C#) would be highly appreciated.  I have linked to this question from the C# area as well.
Question by:mrichmon
 
LVL 37

Expert Comment

by:Harisha M G
ID: 16996764
Hi,

Find: "(</?\w+:)0(x\d+)"
Replace with: $1$2

C#:

Regex.Replace(yourString, @"(</?\w+:)0(x\d+)", "$1$2",
    RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Singleline);


Hope that helps


---
Harish
0
 
LVL 35

Author Comment

by:mrichmon
ID: 17005745
Thanks for the comments, but it doesn't work at all.  There are several problems, one of which I could have told you without even testing (but I did test).

1) You only look for digits after the 0x, but it is a hex number which means that there could be 0-9 or A-F or a-f
2) Even assuming that my hex number was all digits it doesn't work.

I tested like this:
string test = "<mapirecurring:0x00008223 dt:dt=\"boolean\">1</mapirecurring:0x00008223>";
Response.Write("Regex results: " + Regex.Replace("test", @"(</?\w+:)0(x\d+)", "$1$2", RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Singleline));

And the result was:

Regex results: 1

So basically it turned my whole string into the number 1.  Not good.

Also, can you explain the $1$2 notation?  I am guessing the problem is there, but I don't know what it is, so I can't test it.
0
 
LVL 37

Accepted Solution

by:
Harisha M G earned 500 total points
ID: 17005785
$1$2 is same as \1\2

If you put "test" inside quotes, what should it search? Pass the variable instead:

Response.Write("Regex results: " + Regex.Replace(test, @"(</?\w+:)0(x[\da-f]+)", "$1$2", RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Singleline));

Tested
<mapirecurring:0x00008223 dt:dt=\"boolean\">1</mapirecurring:0x00008223>
Returns
<mapirecurring:x00008223 dt:dt=\"boolean\">1</mapirecurring:x00008223>

Also note the changed regex.
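
For anyone who needs both directions (the question asks for "vice versa"), here is a minimal sketch built on the pattern above. The class and method names are made up for illustration, and two things are additions not found in this thread: the reverse pattern, and the lookahead (?=[\s/>]) that requires the tag name to end right after the hex digits. Treat it as a starting point under those assumptions rather than a tested solution for every WebDAV response.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative and not from the thread.
public static class HexTagNames
{
    // Group 1: "<" or "</" plus the namespace prefix and colon.
    // Group 2: "x" plus the hex digits.
    // The lookahead (an assumption added here) requires the name to end there.
    private static readonly Regex ToXml = new Regex(
        @"(</?\w+:)0(x[\da-f]+)(?=[\s/>])", RegexOptions.IgnoreCase);

    // Reverse direction (also an assumption): same shape, without the leading 0.
    private static readonly Regex FromXml = new Regex(
        @"(</?\w+:)(x[\da-f]+)(?=[\s/>])", RegexOptions.IgnoreCase);

    // <proptag:0x823D0003> becomes <proptag:x823D0003>
    public static string StripZero(string xml)
    {
        return ToXml.Replace(xml, "$1$2");
    }

    // <proptag:x823D0003> becomes <proptag:0x823D0003>
    // ${1} keeps the literal 0 from being read as part of the group number.
    public static string RestoreZero(string xml)
    {
        return FromXml.Replace(xml, "${1}0$2");
    }
}
```

Usage would be something like HexTagNames.StripZero(xmlGoingIn) before parsing and HexTagNames.RestoreZero(xmlComingOut) afterwards. One caveat with the reverse direction: an ordinary element whose local name happens to be "x" followed only by hex letters (say, ns:xface) would also get a 0 added, so it is only safe if no such names occur in your documents.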
0
 
LVL 35

Author Comment

by:mrichmon
ID: 17010472
If you put test in quotes it would search the string "test", which is not what I want, but that was just a typing error when posting here to the forum.

I double checked and my code is correct - no quotes around test.  test was the variable containing the string as I showed.

I still don't understand this: $1$2 is same as \1\2

Can you explain what it does?

However, I just tried your new expression and it still doesn't work, which doesn't surprise me since you only accounted for the a-f (which I knew how to do) but not the fundamental problem.

<mapirecurring:0x00008223 dt:dt=\"boolean\">1</mapirecurring:0x00008223>
Returns
1

<mapirecurring:0x00008223 dt:dt=\"boolean\">Joe</mapirecurring:0x00008223>
Returns
Joe

So you are basically stripping out my entire xml tags and getting only the inner text.
0
 
LVL 37

Expert Comment

by:Harisha M G
ID: 17013703
$1 means the first captured group, and $2 means the second.

Try \1\2 instead of $1$2 and see whether that corrects the problem
0
 
LVL 37

Expert Comment

by:Harisha M G
ID: 17013722
Captured group is the match that occurs inside the parenthesis.

So,

$1 = \1 = (</?\w+:)

And

$2 = \2 = (x[\da-f]+)
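
To make the groups concrete, here is a small sketch that lists what each group captures and compares the two replacement notations. The class and method names are hypothetical. In .NET, $1 and $2 are what the replacement string understands; \1 and \2 are backreferences for use inside the pattern, so in a replacement string they are inserted as literal text.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative only.
public static class GroupDemo
{
    private static readonly Regex Pattern = new Regex(
        @"(</?\w+:)0(x[\da-f]+)", RegexOptions.IgnoreCase);

    // Returns one "group1|group2" entry per match so the captures are visible.
    public static List<string> Captures(string input)
    {
        List<string> result = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            result.Add(m.Groups[1].Value + "|" + m.Groups[2].Value);
        }
        return result;
    }

    // Applies the given replacement string, e.g. "$1$2" or @"\1\2".
    public static string Replace(string input, string replacement)
    {
        return Pattern.Replace(input, replacement);
    }
}
```

For the input <proptag:0x823D0003>5</proptag:0x823D0003>, Captures returns "<proptag:|x823D0003" and "</proptag:|x823D0003". Replace with "$1$2" gives <proptag:x823D0003>5</proptag:x823D0003>, while Replace with @"\1\2" gives \1\2>5\1\2> because the backslash form is not expanded in a replacement string.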
0
 
LVL 35

Author Comment

by:mrichmon
ID: 17013805
When I try \1\2 I get a compilation error:  Compiler Error Message: CS1009: Unrecognized escape sequence

If I escape it as "\\1\\2" or even @"\1\2" I get:

\1\2 dt:dt="boolean">Joe\1\2>
0
 
LVL 37

Expert Comment

by:Harisha M G
ID: 17014048
http://www.fileformat.info/tool/regex.htm

Put in the appropriate values and see whether it works for your various inputs. I am not that good at ASP.NET (however, I do know C#).
0
 
LVL 35

Author Comment

by:mrichmon
ID: 17014683
Okay figured out the problem.  It was returning the correct results, just not displaying them to the screen.  My fault.  So I think it should work.
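
The thread does not say exactly what hid the output. One plausible cause, offered only as an assumption, is that the browser treated the written tags as markup and rendered just the inner text ("1" or "Joe"). If that is the situation, HTML-encoding the string before writing it makes the tags visible. A minimal sketch with a hypothetical helper name:

```csharp
using System;
using System.Net;

// Hypothetical helper; assumes the result is being written into an HTML page.
public static class DisplayHelper
{
    // Encodes <, > and & so the browser shows the tags instead of parsing them.
    public static string ForBrowser(string xml)
    {
        return WebUtility.HtmlEncode(xml);
    }
}
```

In the page this would be used as Response.Write(DisplayHelper.ForBrowser(result)). Viewing the page source instead of the rendered page is another way to check what the regex really returned.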

Thanks.
0
 
LVL 37

Expert Comment

by:Harisha M G
ID: 17014688
Why "B" ? Did it not solve your problem ?

Anyways, glad to help
0
 
LVL 35

Author Comment

by:mrichmon
ID: 17014700
Whoops. meant to hit A.

I will fix it.
0
 
LVL 37

Expert Comment

by:Harisha M G
ID: 17014729
Hey, didn't know you are a mod/page editor  !!

Which TA ?
0
 
LVL 35

Author Comment

by:mrichmon
ID: 17014781
I'm the primary PE in a bunch of small ones... Microsoft Project, EAI, SAP, etc
0
 
LVL 37

Expert Comment

by:Harisha M G
ID: 17014800
Ah, I see :)

Thanks for the grade correction !
0


Question has a verified solution.
