Extra Spacing In Pattern Matching

This is a follow-on from this question:
http://www.experts-exchange.com/Programming/Languages/Regular_Expressions/Q_28370416.html

This is my regex:
<b>([\s|\r\n]+)?<a href="(.+)?.html">(.+)?([\s|\r\n]+)?(.+)</a></b>

This is the replacement I am using:
<a href="$2.html">$3 $5</a>

Here is the sample data I am using.
1.  <p><b>  <a href="http://www.thefrugallife.com/auto_lease.html">Getting Out of an Auto [LF] 
  Lease</a></b></p>

2.  <b><a href="http://www.thefrugallife.com/ants1.html">Click[LF]
      here!</a></b>

3.  <p><b>
          <a href="http://www.thefrugallife.com/new_car.html">New Car vs. Used Car</a></b> [LF]
</p>

Now here are the results I am getting:
1.  <p><a href="http://www.thefrugallife.com/auto_lease.html">Getting Out of an Auto  Lease</a></p>

2.  <a href="http://www.thefrugallife.com/ants1.html">Click here!</a>

3.  <p><a href="http://www.thefrugallife.com/new_car.html">New Car vs. Used Ca r</a> 
</p>

With the space in the replacement between $3 and $5, #2 comes out right, but #1 has two spaces between "Auto" and "Lease", and #3 has a space inside the word "Car", which comes out as "Ca r".

The problem seems to be that the LF (shown above as [LF]) is sometimes preceded by a space. How can I account for this in the regex?

Thanks,

Randal
Asked by sharingsunshine
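A minimal sketch that reproduces the three results above, assuming the samples use bare LF line breaks and that Dreamweaver's find/replace follows JavaScript regex rules. Because `.` stops at a line break, the greedy `(.+)?` for `$3` in #1 runs to the end of the first line and keeps the space before the LF, and the literal space in the replacement adds a second one. In #3 there is no line break inside the link text at all, so the engine backtracks until `$3` is `New Car vs. Used Ca` and the required `(.+)` takes the final `r`. (Inside a character class `|` is a literal pipe, and `\s` already includes `\r` and `\n`.)

```typescript
// Reproduces the question's find/replace with JavaScript's regex engine.
// Assumptions: bare LF ("\n") line breaks, and Dreamweaver behaving like JavaScript.
// "/" is escaped as "\/" inside a JS regex literal; otherwise the pattern is unchanged.
const originalFind =
  /<b>([\s|\r\n]+)?<a href="(.+)?.html">(.+)?([\s|\r\n]+)?(.+)<\/a><\/b>/g;
const originalReplace = '<a href="$2.html">$3 $5</a>';

// The three samples from the question, with [LF] written as "\n".
const questionSamples: string[] = [
  '<p><b>  <a href="http://www.thefrugallife.com/auto_lease.html">Getting Out of an Auto \n  Lease</a></b></p>',
  '<b><a href="http://www.thefrugallife.com/ants1.html">Click\n      here!</a></b>',
  '<p><b>\n          <a href="http://www.thefrugallife.com/new_car.html">New Car vs. Used Car</a></b> \n</p>',
];

const originalResults = questionSamples.map((s) => s.replace(originalFind, originalReplace));

// JSON.stringify makes the doubled space in #1 and the "\n" left in #3 visible.
originalResults.forEach((r) => console.log(JSON.stringify(r)));
```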
 
Dan Craciun (IT Consultant) commented:
Yup, it's the \r\n vs \n problem.

I opened your file in HxD. The line endings are all 0x0A (\n).

I opened your file in DW (Dreamweaver), and my regex worked with no problems. The result is attached.

After saving, the line endings were all 0x0D0A (\r\n), which means DW converted them to Windows style when it opened the file.
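A small sketch of why the line endings matter, assuming JavaScript regex rules, where `.` matches neither `\r` nor `\n`: `(.|\r\n)` can only step over a CR LF pair, and `(.|\n)` only over a bare LF. The `[\s\S]` idiom, which the thread does not use, matches any character and so works with either ending.

```typescript
// Assumes JavaScript regex rules: "." matches neither "\r" nor "\n".
// Each pattern is anchored with ^...$, so it must consume the whole string.
const crlfOnly = /^(.|\r\n)*$/; // steps over a CR LF pair, but not a bare LF
const lfOnly = /^(.|\n)*$/; // steps over a bare LF, but not the CR of CR LF
const anyEnding = /^[\s\S]*$/; // [\s\S] matches every character, line breaks included

const lfText = "Click\n      here!"; // LF only, like the posted file
const crlfText = "Click\r\n      here!"; // CR LF, like the file after DW saved it

const endingChecks: Array<[string, RegExp]> = [
  ["(.|\\r\\n)", crlfOnly],
  ["(.|\\n)", lfOnly],
  ["[\\s\\S]", anyEnding],
];

for (const [label, re] of endingChecks) {
  console.log(label, "LF:", re.test(lfText), "CRLF:", re.test(crlfText));
}
```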

Long story short, if you use DW on a Mac, use the following:
find: <b>[\s|\n]*<a href="(.*).html">((.|\n)*?)</a>[\s|\n]*</b>
replace: <a href="$1.html">$2</a>
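A quick check of this find/replace on LF-only copies of the three samples, as a sketch that assumes JavaScript regex rules. It strips the `<b>` tags from all three; the line breaks inside the link text of #1 and #2 are left in place, because `((.|\n)*?)` copies them into `$2`.

```typescript
// Assumptions: LF-only line breaks (as in the posted file) and JavaScript regex rules.
// "/" is escaped as "\/" inside a JS regex literal; otherwise the pattern is unchanged.
const macFind = /<b>[\s|\n]*<a href="(.*).html">((.|\n)*?)<\/a>[\s|\n]*<\/b>/g;
const macReplace = '<a href="$1.html">$2</a>';

// The three samples from the question, with [LF] written as "\n".
const macSamples: string[] = [
  '<p><b>  <a href="http://www.thefrugallife.com/auto_lease.html">Getting Out of an Auto \n  Lease</a></b></p>',
  '<b><a href="http://www.thefrugallife.com/ants1.html">Click\n      here!</a></b>',
  '<p><b>\n          <a href="http://www.thefrugallife.com/new_car.html">New Car vs. Used Car</a></b> \n</p>',
];

const macResults = macSamples.map((s) => s.replace(macFind, macReplace));

// All three lose their <b> tags; "\n" inside the link text is kept.
macResults.forEach((r) => console.log(JSON.stringify(r)));
```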

Attachment: 123-mod.html
 
Dan Craciun (IT Consultant) commented:
I think you might need two regexes. The first:
<b>[\s|\r\n]*<a href="(.*).html">((.|\r\n)*?)</a>[\s|\r\n]*</b>

with the replacement
<a href="$1.html">$2</a>

will take care of the <b> tags, leaving only the line breaks to be solved.

HTH,
Dan
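A sketch of what this first regex does to LF-only copies of the samples, assuming JavaScript regex rules and bare LF line breaks (which is what HxD showed in the posted file). `(.|\r\n)` cannot step over a lone `\n`, so only #3, whose link text has no line break, matches; this is consistent with the author's report that examples 1 and 2 are passed over.

```typescript
// Assumptions: LF-only line breaks and JavaScript regex rules.
// "/" is escaped as "\/" inside a JS regex literal; otherwise the pattern is unchanged.
const crlfFind = /<b>[\s|\r\n]*<a href="(.*).html">((.|\r\n)*?)<\/a>[\s|\r\n]*<\/b>/g;
const crlfReplace = '<a href="$1.html">$2</a>';

// The three samples from the question, with [LF] written as "\n".
const lfSamples: string[] = [
  '<p><b>  <a href="http://www.thefrugallife.com/auto_lease.html">Getting Out of an Auto \n  Lease</a></b></p>',
  '<b><a href="http://www.thefrugallife.com/ants1.html">Click\n      here!</a></b>',
  '<p><b>\n          <a href="http://www.thefrugallife.com/new_car.html">New Car vs. Used Car</a></b> \n</p>',
];

const crlfResults = lfSamples.map((s) => s.replace(crlfFind, crlfReplace));

// true where the find/replace actually changed the sample
const changed = crlfResults.map((r, i) => r !== lfSamples[i]);
console.log(changed); // expected: [false, false, true]
```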
 
sharingsunshine (Author) commented:
That only takes care of the third example in the data I am using. Examples 1 and 2 are being passed over.
 
Dan Craciun (IT Consultant) commented:
For the line breaks you can use:

regex: .html">(.*?)[\r\n]+\s*?(\w+[\w\s]*)</a>
repl: .html">$1$2</a>
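A sketch of this second pass on #1 and #2 after the `<b>` tags are gone, again assuming LF line breaks and JavaScript regex rules, next to a hypothetical alternative that is not from the thread. The thread's pattern joins #1 with a single space only because the space before the LF stays in `$1`, and it does not match #2 at all, since `(\w+[\w\s]*)` cannot consume the `!` before `</a>`. The alternative replaces the spaces and tabs around the first line break in the link text with exactly one space; it assumes the link text contains no other tags and collapses one line break per link per run.

```typescript
// Assumptions: LF-only line breaks and JavaScript regex rules.
// Input: #1 and #2 after the first pass has removed the <b> tags.
const linkSamples: string[] = [
  '<a href="http://www.thefrugallife.com/auto_lease.html">Getting Out of an Auto \n  Lease</a>',
  '<a href="http://www.thefrugallife.com/ants1.html">Click\n      here!</a>',
];

// The line-break regex and replacement from the thread ("/" escaped as "\/").
const joinFind = /.html">(.*?)[\r\n]+\s*?(\w+[\w\s]*)<\/a>/g;
const joinReplace = '.html">$1$2</a>';

// Hypothetical alternative, not from the thread: swap the spaces/tabs around the
// first line break in the link text for one space. Assumes no tags inside the
// link text ([^<]); "\r?\n" accepts both LF and CR LF.
const collapseFind = /(<a href="[^"]*">[^<]*?)[ \t]*\r?\n[ \t]*([^<]*<\/a>)/g;
const collapseReplace = "$1 $2";

const joinResults = linkSamples.map((s) => s.replace(joinFind, joinReplace));
const collapseResults = linkSamples.map((s) => s.replace(collapseFind, collapseReplace));

joinResults.forEach((r) => console.log("thread:", JSON.stringify(r)));
collapseResults.forEach((r) => console.log("alternative:", JSON.stringify(r)));
```

Both replacement strings use the same `$1`/`$2` syntax as the thread, so the alternative may also work in a find/replace dialog that supports it; it has not been tried in Dreamweaver.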

Dan Craciun (IT Consultant) commented:
Weird. In RegexBuddy, the results of the replacements are:

<a href="http://www.thefrugallife.com/auto_lease.html">Getting Out of an Auto [LF] 
  Lease</a>
<a href="http://www.thefrugallife.com/ants1.html">Click[LF]
      here!</a>
<a href="http://www.thefrugallife.com/new_car.html">New Car vs. Used Car</a>

Dan Craciun (IT Consultant) commented:
I just tested this in Dreamweaver CS6 using your test data.

[Screenshots: before replace / after replace]
 
sharingsunshine (Author) commented:
I have RegexBuddy too, and it won't highlight the first and second examples. Consequently, when I do the replace, it only removes the <b> tags from the third one.

Are you using JavaScript as the regex engine?
 
sharingsunshine (Author) commented:
Then there must be something different between my test data and yours. How do we find the differences?
 
Dan Craciun (IT Consultant) commented:
RegexBuddy is just a testing tool. Instead of figuring out what's different between my setup and yours (I'm on a PC, you're on a Mac, and the line breaks are different: \r\n vs \n), let's focus on the end result.

Where do you want to use that regex: in Dreamweaver on multiple files, or on a web page?
 
sharingsunshine (Author) commented:
In Dreamweaver CS5, on all the files in the site. In case it matters, my RegexBuddy is on Windows 7 too; I use VMware Fusion to run both.
 
Dan Craciun (IT Consultant) commented:
OK, that means your sample was mangled by the formatting on EE.

Please post the sample as a file, so the line breaks aren't modified.
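If a file can't be posted, the line endings can also be checked in code instead of a hex editor. A minimal sketch; `countLineEndings` is a hypothetical name, and the counts are only meaningful if the file is read without any line-ending conversion (for example as raw bytes decoded one-to-one).

```typescript
// Hypothetical helper: counts line-ending styles in a string, mirroring what the
// HxD check reads from the bytes (0x0D 0x0A = CRLF, lone 0x0A = LF, lone 0x0D = CR).
interface LineEndingCounts {
  crlf: number;
  lf: number;
  cr: number;
}

function countLineEndings(text: string): LineEndingCounts {
  const counts: LineEndingCounts = { crlf: 0, lf: 0, cr: 0 };
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (ch === "\r" && text.charAt(i + 1) === "\n") {
      counts.crlf++;
      i++; // skip the "\n" that belongs to this CRLF pair
    } else if (ch === "\r") {
      counts.cr++;
    } else if (ch === "\n") {
      counts.lf++;
    }
  }
  return counts;
}

// Example: LF text (like the posted file) versus the same text with CRLF endings.
const lfCounts = countLineEndings("Click\n      here!</a></b>");
const crlfCounts = countLineEndings("Click\r\n      here!</a></b>");
console.log(lfCounts, crlfCounts);
```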
 
sharingsunshine (Author) commented:
This is a page like the ones I am trying to pattern-match:

Attachment: 123.html

Thanks,
 
sharingsunshine (Author) commented:
That's great, thanks for getting to the bottom of the problem.
Question has a verified solution.
