Solved

Extra Spacing In Pattern Matching

Posted on 2014-02-21
Last Modified: 2014-02-21
This is a follow-on from this question:
http://www.experts-exchange.com/Programming/Languages/Regular_Expressions/Q_28370416.html

This is my regex
<b>([\s|\r\n]+)?<a href="(.+)?.html">(.+)?([\s|\r\n]+)?(.+)</a></b>

This is the replacement I am using
<a href="$2.html">$3 $5</a>

Here is the sample data I am using.
1.  <p><b>  <a href="http://www.thefrugallife.com/auto_lease.html">Getting Out of an Auto [LF] 
  Lease</a></b></p>

2.  <b><a href="http://www.thefrugallife.com/ants1.html">Click[LF]
      here!</a></b>

3.  <p><b>
          <a href="http://www.thefrugallife.com/new_car.html">New Car vs. Used Car</a></b> [LF]
</p>

Now here are the results I am getting
1.  <p><a href="http://www.thefrugallife.com/auto_lease.html">Getting Out of an Auto  Lease</a></p>

2.  <a href="http://www.thefrugallife.com/ants1.html">Click here!</a>

3.  <p><a href="http://www.thefrugallife.com/new_car.html">New Car vs. Used Ca r</a> 
</p>

With a space in the replacement between $3 and $5, I get good results on #2, but #1 ends up with two spaces between "Auto" and "Lease", and #3 has a problem: "Car" comes out as "Ca r", with a space between the a and the r.

The problem seems to be that the LF (shown above as [LF]) sometimes has a space between it and the preceding text. How can I account for this in the regex?

Thanks,

Randal
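
Editor's sketch: both symptoms can be reproduced outside Dreamweaver. The following is a minimal illustration in Python, assuming LF-only line endings and Python's `\2`-style backreferences in place of `$2`. The variable names are made up for this example, and Dreamweaver's own regex engine may differ in details.

```python
import re

# Pattern and replacement from the question; Python writes $2 as \2.
PATTERN = r'<b>([\s|\r\n]+)?<a href="(.+)?.html">(.+)?([\s|\r\n]+)?(.+)</a></b>'
REPLACEMENT = r'<a href="\2.html">\3 \5</a>'

# Sample 3: the link text sits on one line. The greedy (.+)? takes everything
# except the last character, because the mandatory (.+) must match something.
sample3 = ('<b>\n          <a href="http://www.thefrugallife.com/new_car.html">'
           'New Car vs. Used Car</a></b>')
m3 = re.search(PATTERN, sample3)
print(repr(m3.group(3)), repr(m3.group(5)))

# Sample 1: the space before the line break is captured in group 3, and the
# replacement then adds a second space between group 3 and group 5.
sample1 = ('<b>  <a href="http://www.thefrugallife.com/auto_lease.html">'
           'Getting Out of an Auto \n  Lease</a></b>')
m1 = re.search(PATTERN, sample1)
print(repr(m1.group(3)), repr(m1.group(5)))

result1 = re.sub(PATTERN, REPLACEMENT, sample1)
result3 = re.sub(PATTERN, REPLACEMENT, sample3)
print(result1)
print(result3)
```

In other words, the stray space in "Ca r" comes from the replacement itself: when there is no line break, groups 3 and 5 split a single line of text and the literal space lands between them.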
Question by:sharingsunshine
13 Comments
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39877286
I think you might need 2 regexes. The first:
<b>[\s|\r\n]*<a href="(.*).html">((.|\r\n)*?)</a>[\s|\r\n]*</b>

with the replacement
<a href="$1.html">$2</a>

will take care of the <b> tags, leaving only the line breaks to be solved.

HTH,
Dan
 

Author Comment

by:sharingsunshine
ID: 39877314
That only takes care of the 3rd example in my sample data. Examples 1 and 2 are being passed over.
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39877315
For the line breaks you can use:

regex: .html">(.*?)[\r\n]+\s*?(\w+[\w\s]*)</a>
repl: .html">$1$2</a>
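
Editor's sketch: the replacement `$1$2` puts nothing between the two captures, so text with no space before the line break (for instance `Click[LF]here`) would be joined as `Clickhere`. An alternative that is not from this thread is to replace each line break inside the link text, together with any whitespace around it, with a single space. The Python version below is only an illustration of that idea; the function name `join_wrapped_link_text` and the two helper patterns are hypothetical, and the callback approach has no direct equivalent in Dreamweaver's find/replace dialog.

```python
import re

# An opening <a href="..."> tag, its text (possibly spanning lines), and </a>.
ANCHOR = re.compile(r'(<a href="[^"]*">)(.*?)(</a>)', re.DOTALL)

# A line break (LF or CRLF) plus any whitespace on either side of it.
LINE_BREAK_RUN = re.compile(r'\s*\r?\n\s*')


def join_wrapped_link_text(html: str) -> str:
    """Collapse each line break inside link text into exactly one space."""
    def fix(match: re.Match) -> str:
        opening, text, closing = match.groups()
        return opening + LINE_BREAK_RUN.sub(' ', text) + closing

    return ANCHOR.sub(fix, html)


print(join_wrapped_link_text(
    '<a href="auto_lease.html">Getting Out of an Auto \n  Lease</a>'))
print(join_wrapped_link_text(
    '<a href="ants1.html">Click\n      here!</a>'))
```

Because the whitespace on both sides of the break is consumed, it does not matter whether a space precedes the LF, and link text that already sits on one line is left alone.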


 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39877320
Weird. In RegexBuddy, the results of the replacements are:

<a href="http://www.thefrugallife.com/auto_lease.html">Getting Out of an Auto [LF] 
  Lease</a>
<a href="http://www.thefrugallife.com/ants1.html">Click[LF]
      here!</a>
<a href="http://www.thefrugallife.com/new_car.html">New Car vs. Used Car</a>

 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39877387
Just tested in Dreamweaver CS6, using your test data.

[Attached screenshots: before replace, after replace]
 

Author Comment

by:sharingsunshine
ID: 39877389
I have RegexBuddy too and it won't highlight the 1st and 2nd examples.  Consequently, when I do the replace it only removes the <b> tags from the 3rd one.

Are you using JavaScript as the regex engine?
 

Author Comment

by:sharingsunshine
ID: 39877393
Then there must be something different between my test data and yours. So how do we find the differences?
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39877418
RegexBuddy is just a testing tool. Instead of figuring out what's different between my setup and yours (I'm on a PC, you're on a Mac, and the line breaks are different: \r\n vs \n), let's focus on the end result.

Where do you want to use that regex? In Dreamweaver on multiple files?
On a web page?
 

Author Comment

by:sharingsunshine
ID: 39877483
In Dreamweaver CS5, on all the files in the site. If it matters, my RegexBuddy is on Windows 7 too. I have VMware Fusion so I can run both.
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39877496
OK, that means your sample was mangled by the formatting on EE.

Please post the sample as a file, so the line breaks aren't modified.
 

Author Comment

by:sharingsunshine
ID: 39877653
This is a page like the ones I am trying to pattern match.

123.html

Thanks,
 
LVL 35

Accepted Solution

by:
Dan Craciun earned 500 total points
ID: 39877738
Yup, it's the \r\n vs \n problem.

Opened your file in HxD. The line endings are all 0x0A (\n)

I opened your file in DW, and my regex worked, no problems. You have the result attached.

After saving, the line endings were all 0x0D 0x0A (\r\n). Which means DW changed the endings on open to match Windows style.
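
Editor's sketch: the same check can be done without a hex editor by counting the byte sequences directly. This is a minimal Python illustration; the function name `count_line_endings` is hypothetical, and the file has to be read in binary mode so that nothing translates the line endings on the way in.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF (0x0D 0x0A), lone LF (0x0A) and lone CR (0x0D) endings."""
    crlf = data.count(b'\r\n')
    return {
        'CRLF': crlf,
        'LF': data.count(b'\n') - crlf,  # LFs that are not part of a CRLF
        'CR': data.count(b'\r') - crlf,  # CRs that are not part of a CRLF
    }


# In-memory demonstration with one ending of each kind.
print(count_line_endings(b'first\r\nsecond\nthird\r'))

# For a real file (name taken from the attachment in this thread):
# with open('123.html', 'rb') as handle:
#     print(count_line_endings(handle.read()))
```

A file that reports only `LF` endings matches what HxD showed for the original upload; a file that reports only `CRLF` matches the copy saved from Dreamweaver on Windows.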

Long story short, if you use DW on a Mac, use the following:
find: <b>[\s|\n]*<a href="(.*).html">((.|\n)*?)</a>[\s|\n]*</b>
replace: <a href="$1.html">$2</a>
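
Editor's sketch: the difference between the two patterns can be checked on the sample data from the question. The Python illustration below assumes the data uses LF-only line endings, as found in the uploaded file, and writes `$1`/`$2` as `\1`/`\2`. The constant names are made up for the example, and Python's regex engine is not identical to Dreamweaver's.

```python
import re

# Python writes the $1/$2 backreferences as \1/\2.
REPLACEMENT = r'<a href="\1.html">\2</a>'

# First pattern from the thread (expects \r\n inside the link text).
CRLF_PATTERN = r'<b>[\s|\r\n]*<a href="(.*).html">((.|\r\n)*?)</a>[\s|\r\n]*</b>'
# Pattern from the accepted solution (expects \n inside the link text).
LF_PATTERN = r'<b>[\s|\n]*<a href="(.*).html">((.|\n)*?)</a>[\s|\n]*</b>'

# The three samples from the question, with LF-only line endings.
SAMPLE_LF = (
    '<p><b>  <a href="http://www.thefrugallife.com/auto_lease.html">'
    'Getting Out of an Auto \n'
    '  Lease</a></b></p>\n'
    '\n'
    '<b><a href="http://www.thefrugallife.com/ants1.html">Click\n'
    '      here!</a></b>\n'
    '\n'
    '<p><b>\n'
    '          <a href="http://www.thefrugallife.com/new_car.html">'
    'New Car vs. Used Car</a></b> \n'
    '</p>\n'
)

# subn returns the new text and the number of replacements made.
_, crlf_hits = re.subn(CRLF_PATTERN, REPLACEMENT, SAMPLE_LF)
cleaned, lf_hits = re.subn(LF_PATTERN, REPLACEMENT, SAMPLE_LF)

print('CRLF pattern replacements:', crlf_hits)
print('LF pattern replacements:', lf_hits)
print(cleaned)
```

On LF-only data the `\r\n` version can only match links whose text stays on one line, which is consistent with only the 3rd example being matched earlier in the thread. Two side notes: inside a character class `|` is a literal pipe rather than alternation, and `\s` already covers `\r` and `\n`, so `[\s|\n]*` behaves like `\s*` apart from also accepting `|`; and writing the line break as `\r?\n` accepts both conventions.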


123-mod.html
 

Author Closing Comment

by:sharingsunshine
ID: 39877922
That's great, thanks for getting to the bottom of the problem.
