Solved

Extra Spacing In Pattern Matching

Posted on 2014-02-21
Last Modified: 2014-02-21
This is a follow-on from this question:
http://www.experts-exchange.com/Programming/Languages/Regular_Expressions/Q_28370416.html

This is my regex
<b>([\s|\r\n]+)?<a href="(.+)?.html">(.+)?([\s|\r\n]+)?(.+)</a></b>


This is the replacement I am using
<a href="$2.html">$3 $5</a>


Here is the sample data I am using.
1.  <p><b>  <a href="http://www.thefrugallife.com/auto_lease.html">Getting Out of an Auto [LF] 
  Lease</a></b></p>

2.  <b><a href="http://www.thefrugallife.com/ants1.html">Click[LF]
      here!</a></b>

3.  <p><b>
          <a href="http://www.thefrugallife.com/new_car.html">New Car vs. Used Car</a></b> [LF]
</p>


Now here are the results I am getting
1.  <p><a href="http://www.thefrugallife.com/auto_lease.html">Getting Out of an Auto  Lease</a></p>

2.  <a href="http://www.thefrugallife.com/ants1.html">Click here!</a>

3.  <p><a href="http://www.thefrugallife.com/new_car.html">New Car vs. Used Ca r</a> 
</p>


With the space between $3 and $5 in the replacement, I get good results on #2, but #1 ends up with two spaces between "Auto" and "Lease", and in #3 the word "Car" comes out as "Ca r", with a space between the a and the r.

The problem seems to be that the LF, shown above as [LF], sometimes has a space between it and the preceding text.  So how can I account for this in the regex?

Thanks,

Randal
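
For anyone following along, here is a minimal sketch that reproduces results #1 and #3 outside Dreamweaver. It assumes Python's re module behaves like the Dreamweaver engine for this particular pattern, and it writes the replacement as \3 \5 because Python does not use the $3 $5 syntax. The variable names are made up for illustration.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(
    r'<b>([\s|\r\n]+)?<a href="(.+)?.html">(.+)?([\s|\r\n]+)?(.+)</a></b>'
)
# Python spelling of the replacement <a href="$2.html">$3 $5</a>.
REPLACEMENT = r'<a href="\2.html">\3 \5</a>'

# Samples 1 and 3 from the question, with [LF] written as a real "\n".
sample_1 = ('<p><b>  <a href="http://www.thefrugallife.com/auto_lease.html">'
            'Getting Out of an Auto \n  Lease</a></b></p>')
sample_3 = ('<p><b>\n          <a href="http://www.thefrugallife.com/new_car.html">'
            'New Car vs. Used Car</a></b> \n</p>')

m1 = PATTERN.search(sample_1)
m3 = PATTERN.search(sample_3)

# Groups 3 and 5 are the two pieces joined by the hard-coded space.
groups_1 = (m1.group(3), m1.group(5))  # ('Getting Out of an Auto ', 'Lease')
groups_3 = (m3.group(3), m3.group(5))  # ('New Car vs. Used Ca', 'r')

result_1 = PATTERN.sub(REPLACEMENT, sample_1)
result_3 = PATTERN.sub(REPLACEMENT, sample_3)
print(result_1)
print(result_3)
```

Because the final (.+) must consume at least one character, the engine hands it the "r" of "Car" when the link text has no line break, and the space in the replacement then splits the word. In #1 the space before the line break stays inside group 3, so it doubles up with the space in the replacement.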
Question by:sharingsunshine
13 Comments
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 39877286
I think you might need 2 regexes. The first:
<b>[\s|\r\n]*<a href="(.*).html">((.|\r\n)*?)</a>[\s|\r\n]*</b>

with the replacement
<a href="$1.html">$2</a>

will take care of the <b> tags, leaving only the line breaks to be solved.

HTH,
Dan
 

Author Comment

by:sharingsunshine
ID: 39877314
That will only take care of the 3rd example in the data I am using.  Examples 1 and 2 are being passed over.
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 39877315
For the line breaks you can use:

regex: .html">(.*?)[\r\n]+\s*?(\w+[\w\s]*)</a>
repl: .html">$1$2</a>

 
LVL 34

Expert Comment

by:Dan Craciun
ID: 39877320
Weird. In RegexBuddy, the results of the replacements are:

<a href="http://www.thefrugallife.com/auto_lease.html">Getting Out of an Auto [LF] 
  Lease</a>
<a href="http://www.thefrugallife.com/ants1.html">Click[LF]
      here!</a>
<a href="http://www.thefrugallife.com/new_car.html">New Car vs. Used Car</a>

 
LVL 34

Expert Comment

by:Dan Craciun
ID: 39877387
Just tested in Dreamweaver CS6, using your test data.

[Screenshots: before replace, after replace]
 

Author Comment

by:sharingsunshine
ID: 39877389
I have RegexBuddy too and it won't highlight the 1st and 2nd examples.  Consequently, when I do the replace it only removes the <b> tags from the 3rd one.

Are you using JavaScript as the regex engine?

 

Author Comment

by:sharingsunshine
ID: 39877393
Then there must be something different between my test data and yours.  So how do we find the differences?
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 39877418
RegexBuddy is just a testing tool. Instead of figuring out what's different between my setup and yours (I'm on a PC, you're on a Mac, and the line breaks are different: \r\n vs \n), let's focus on the end result.

Where do you want to use that regex? In Dreamweaver on multiple files?
On a web page?
 

Author Comment

by:sharingsunshine
ID: 39877483
In Dreamweaver CS5, on all the files in the site.  If this matters, my RegexBuddy is on Windows 7 too.  I use VMware Fusion to run both.
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 39877496
OK, that means your sample was mangled by the formatting on EE.

Please post the sample as a file, so the line breaks aren't modified.
 

Author Comment

by:sharingsunshine
ID: 39877653
This is a page like the ones I am trying to pattern match.

123.html

Thanks,
 
LVL 34

Accepted Solution

by:
Dan Craciun earned 500 total points
ID: 39877738
Yup, it's the \r\n vs \n problem.
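
A rough way to see the difference, assuming Python's re module as a stand-in for the Dreamweaver engine: the first regex from earlier in this thread spells the line break as \r\n, so it can only cross a line break inside the link text when the data really contains \r\n. The sample strings below are the three examples from the question with [LF] written as a real line feed.

```python
import re

# The first regex suggested in this thread, unchanged.
FIRST_PATTERN = re.compile(
    r'<b>[\s|\r\n]*<a href="(.*).html">((.|\r\n)*?)</a>[\s|\r\n]*</b>'
)

# The three samples with bare LF line endings (Mac/Unix style).
SAMPLES_LF = [
    '<p><b>  <a href="http://www.thefrugallife.com/auto_lease.html">'
    'Getting Out of an Auto \n  Lease</a></b></p>',
    '<b><a href="http://www.thefrugallife.com/ants1.html">Click\n      here!</a></b>',
    '<p><b>\n          <a href="http://www.thefrugallife.com/new_car.html">'
    'New Car vs. Used Car</a></b> \n</p>',
]
# The same samples with CRLF line endings (Windows style).
SAMPLES_CRLF = [s.replace("\n", "\r\n") for s in SAMPLES_LF]

matched_lf = [bool(FIRST_PATTERN.search(s)) for s in SAMPLES_LF]
matched_crlf = [bool(FIRST_PATTERN.search(s)) for s in SAMPLES_CRLF]

print(matched_lf)    # [False, False, True]
print(matched_crlf)  # [True, True, True]
```

With LF-only data, only the third sample matches, because it is the only one with no line break between <a ...> and </a>. That lines up with the report that examples 1 and 2 were passed over.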

Opened your file in HxD. The line endings are all 0x0A (\n)

I opened your file in DW, and my regex worked, no problems. You have the result attached.

After saving, the line endings were all 0x0D0A (\r\n). Which means DW changed the endings on open to match Windows style.
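
The same check can be done without a hex editor. This is only a sketch: count_line_endings is a hypothetical helper, and the file name in the usage comment is just the attachment name from this thread.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF and bare CR line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    # Every CRLF contains one \n and one \r, so subtract those to get bare ones.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    return {"CRLF": crlf, "LF": lf, "CR": cr}


# Usage, reading the file in binary mode so Python does not translate newlines:
# with open("123.html", "rb") as handle:
#     print(count_line_endings(handle.read()))
print(count_line_endings(b"one\ntwo\r\nthree\rfour"))
# {'CRLF': 1, 'LF': 1, 'CR': 1}
```

Running it on the file before and after a save in DW should show whether the endings were rewritten.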

Long story short, if you use DW on a Mac, use the following:
find: <b>[\s|\n]*<a href="(.*).html">((.|\n)*?)</a>[\s|\n]*</b>
replace: <a href="$1.html">$2</a>

123-mod.html
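
If the cleanup can run outside Dreamweaver, a sketch like the one below avoids depending on the line endings at all and also collapses the line break inside the link text. This is an alternative under stated assumptions, not the regex from this thread: it uses Python's re module, it matches any character with [\s\S] instead of naming \n or \r\n, and LINK_IN_BOLD, unwrap_bold_link and clean are hypothetical names.

```python
import re

# <b>, optional whitespace, a link to a .html page, link text that may span
# lines (any characters up to the first </a>), optional whitespace, </b>.
LINK_IN_BOLD = re.compile(
    r'<b>\s*<a href="([^"]*\.html)">((?:(?!</a>)[\s\S])*)</a>\s*</b>'
)


def unwrap_bold_link(match: re.Match) -> str:
    """Rebuild the link without <b>, collapsing whitespace runs to one space."""
    url = match.group(1)
    # str.split() with no arguments splits on any whitespace, including \r and \n.
    text = " ".join(match.group(2).split())
    return f'<a href="{url}">{text}</a>'


def clean(html: str) -> str:
    return LINK_IN_BOLD.sub(unwrap_bold_link, html)


sample = ('<p><b>  <a href="http://www.thefrugallife.com/auto_lease.html">'
          'Getting Out of an Auto \n  Lease</a></b></p>')
print(clean(sample))
# <p><a href="http://www.thefrugallife.com/auto_lease.html">Getting Out of an Auto Lease</a></p>
```

Collapsing the whitespace in a function, rather than with a second capture group and a hard-coded space, means a link with no line break is left alone and a space before the line break does not get doubled.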
 

Author Closing Comment

by:sharingsunshine
ID: 39877922
That's great, thanks for getting to the bottom of the problem.
