Solved

Extra Spacing In Pattern Matching

Posted on 2014-02-21
Last Modified: 2014-02-21
This is a follow-up to this question:
http://www.experts-exchange.com/Programming/Languages/Regular_Expressions/Q_28370416.html

This is my regex
<b>([\s|\r\n]+)?<a href="(.+)?.html">(.+)?([\s|\r\n]+)?(.+)</a></b>



This is the replacement I am using
<a href="$2.html">$3 $5</a>



Here is the sample data I am using.
1.  <p><b>  <a href="http://www.thefrugallife.com/auto_lease.html">Getting Out of an Auto [LF] 
  Lease</a></b></p>

2.  <b><a href="http://www.thefrugallife.com/ants1.html">Click[LF]
      here!</a></b>

3.  <p><b>
          <a href="http://www.thefrugallife.com/new_car.html">New Car vs. Used Car</a></b> [LF]
</p>



Now here are the results I am getting
1.  <p><a href="http://www.thefrugallife.com/auto_lease.html">Getting Out of an Auto  Lease</a></p>

2.  <a href="http://www.thefrugallife.com/ants1.html">Click here!</a>

3.  <p><a href="http://www.thefrugallife.com/new_car.html">New Car vs. Used Ca r</a> 
</p>



With a space in the replacement between $3 and $5, I get good results on #2, but #1 has two spaces between "Auto" and "Lease", and in #3 the word "Car" comes out as "Ca r", with a space between the "a" and the "r".

The problem seems to be that the line feed (shown above as [LF]) sometimes has a space between it and the preceding text. How can I account for this in the regex?
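The two symptoms described above can be reproduced outside Dreamweaver. The following is a minimal sketch in Python, whose `re` module is an assumption here and is not the engine Dreamweaver uses; `$n` in the replacement is rewritten as `\g<n>`, and `show_groups` is a hypothetical helper. It suggests that in #1 the trailing space before the line feed is captured in group 3, so the literal space in the replacement doubles it, while in #3 there is no line break at all, so the mandatory `(.+)` in group 5 forces group 3 to give up the final "r".

```python
import re

# Pattern and replacement from the question; $n is written \g<n> for Python.
PATTERN = re.compile(
    r'<b>([\s|\r\n]+)?<a href="(.+)?.html">(.+)?([\s|\r\n]+)?(.+)</a></b>'
)
REPLACEMENT = r'<a href="\g<2>.html">\g<3> \g<5></a>'


def show_groups(sample: str):
    """Return (group 3, group 5, replaced text) for one sample string."""
    m = PATTERN.search(sample)
    return m.group(3), m.group(5), PATTERN.sub(REPLACEMENT, sample)


# Samples 1 and 3 from the question, with [LF] written as a real "\n".
one = ('<b>  <a href="http://www.thefrugallife.com/auto_lease.html">'
       'Getting Out of an Auto \n  Lease</a></b>')
three = ('<b>\n          <a href="http://www.thefrugallife.com/new_car.html">'
         'New Car vs. Used Car</a></b>')

for sample in (one, three):
    g3, g5, replaced = show_groups(sample)
    print(repr(g3), repr(g5))
    print(replaced)
```

Group 3 for sample 1 keeps its trailing space, and group 5 for sample 3 is the single letter "r", which matches the results reported above.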

Thanks,

Randal
Question by:sharingsunshine

Expert Comment

by:Dan Craciun
I think you might need 2 regexes. The first:
<b>[\s|\r\n]*<a href="(.*).html">((.|\r\n)*?)</a>[\s|\r\n]*</b>


with the replacement
<a href="$1.html">$2</a>


will take care of the <b> tags, leaving only the line breaks to be solved.

HTH,
Dan

Author Comment

by:sharingsunshine
That only takes care of the 3rd example in my sample data. Examples 1 and 2 are being passed over.

Expert Comment

by:Dan Craciun
For the line breaks you can use:

regex: .html">(.*?)[\r\n]+\s*?(\w+[\w\s]*)</a>
repl: .html">$1$2</a>
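As a hedged alternative for this second pass, the sketch below (Python, not Dreamweaver's engine) collapses any whitespace run that contains a line break into a single space inside the link text. `LINK_TEXT`, `LINE_BREAK_RUN`, and `join_wrapped_link_text` are hypothetical names introduced for illustration, and the approach assumes the link text contains no nested tags. It does not care whether a space precedes the break or whether the text ends in punctuation such as "here!".

```python
import re

# Hypothetical helper patterns, not taken from the thread.
# Matches the text between .html"> and </a>, assuming no nested tags.
LINK_TEXT = re.compile(r'(\.html">)([^<]*)(</a>)')
# Any whitespace run that contains at least one CR or LF.
LINE_BREAK_RUN = re.compile(r'\s*[\r\n]+\s*')


def join_wrapped_link_text(html: str) -> str:
    """Replace line-break runs inside link text with one space."""
    def fix(m: re.Match) -> str:
        text = LINE_BREAK_RUN.sub(" ", m.group(2)).strip()
        return m.group(1) + text + m.group(3)
    return LINK_TEXT.sub(fix, html)


wrapped = ('<a href="http://www.thefrugallife.com/ants1.html">Click\n'
           '      here!</a>')
print(join_wrapped_link_text(wrapped))
```

Because the break and its surrounding spaces are replaced as one unit, the same function handles both `\n` and `\r\n` input.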



Expert Comment

by:Dan Craciun
Weird. In RegexBuddy, the results of the replacements are:

<a href="http://www.thefrugallife.com/auto_lease.html">Getting Out of an Auto [LF] 
  Lease</a>
<a href="http://www.thefrugallife.com/ants1.html">Click[LF]
      here!</a>
<a href="http://www.thefrugallife.com/new_car.html">New Car vs. Used Car</a>



Expert Comment

by:Dan Craciun
Just tested in Dreamweaver CS6, using your test data.

[Attached screenshots: before replace, after replace]

Author Comment

by:sharingsunshine
I have RegexBuddy too and it won't highlight the 1st and 2nd examples.  Consequently, when I do the replace it only removes the <b> tags from the 3rd one.

Are you using Javascript as the regex engine?

Author Comment

by:sharingsunshine
Then there must be something different between my test data and yours. How do we find the differences?

Expert Comment

by:Dan Craciun
RegexBuddy is just a testing tool. Instead of figuring out what's different between my setup and yours (I'm on a PC, you're on a Mac, and the line breaks are different: \r\n vs \n), let's focus on the end result.

Where do you want to use that regex? In Dreamweaver on multiple files?
On a web page?

Author Comment

by:sharingsunshine
In Dreamweaver CS5, on all the files in the site. If this matters, my RegexBuddy is on Windows 7 too. I use VMware Fusion to run both.

Expert Comment

by:Dan Craciun
OK, that means your sample was mangled by the formatting on EE.

Please post the sample as a file, so the line breaks aren't modified.

Author Comment

by:sharingsunshine
This is a page like the ones I am trying to pattern match.

123.html

Thanks,

Accepted Solution

by:Dan Craciun (earned 500 total points)
Yup, it's the \r\n vs \n problem.

Opened your file in HxD. The line endings are all 0x0A (\n)

I opened your file in DW, and my regex worked, no problems. You have the result attached.

After saving, the line endings were all 0x0D0A (\r\n), which means DW changed the endings on open to match Windows style.
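The same check can be done without a hex editor. This is a small sketch in Python, assuming the file is read as raw bytes; `count_line_endings` is a hypothetical helper, not something used in the thread. It counts CRLF pairs first and then subtracts them from the raw LF and CR counts so that each ending is reported once.

```python
def count_line_endings(data: bytes) -> dict:
    """Count Windows (CRLF), Unix (LF), and lone CR line endings."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LF bytes not preceded by CR
    cr = data.count(b"\r") - crlf   # CR bytes not followed by LF
    return {"CRLF": crlf, "LF": lf, "CR": cr}


# In-memory examples; for a real file, read it in binary mode, e.g.
# count_line_endings(open("123.html", "rb").read())
print(count_line_endings(b"<b>\n<a href=...>\n"))
print(count_line_endings(b"<b>\r\n<a href=...>\r\n"))
```

Reading in binary mode matters here, because text mode may translate line endings before the count is taken.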

Long story short, if you use DW on a Mac, use the following:
find: <b>[\s|\n]*<a href="(.*).html">((.|\n)*?)</a>[\s|\n]*</b>
replace: <a href="$1.html">$2</a>
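To illustrate why the earlier pattern skipped examples 1 and 2 on LF-only data, here is a hedged sketch in Python (again an assumption, since Dreamweaver uses its own engine and details such as what `.` matches can differ). With `(.|\r\n)`, neither alternative can consume a lone `\n`, so link text that wraps across lines never matches. `PORTABLE` is an assumed variant that is not from the thread: `\s` already covers both `\r` and `\n`, and `[\s\S]` matches any character, so it should not depend on the line-ending style.

```python
import re

# The two patterns from the thread.
WINDOWS_STYLE = re.compile(
    r'<b>[\s|\r\n]*<a href="(.*).html">((.|\r\n)*?)</a>[\s|\r\n]*</b>')
MAC_STYLE = re.compile(
    r'<b>[\s|\n]*<a href="(.*).html">((.|\n)*?)</a>[\s|\n]*</b>')
# Assumed line-ending-agnostic variant (not from the thread).
PORTABLE = re.compile(
    r'<b>\s*<a href="([^"]*)\.html">([\s\S]*?)</a>\s*</b>')

# Sample 2 from the question with LF and with CRLF line endings.
lf_sample = ('<b><a href="http://www.thefrugallife.com/ants1.html">Click\n'
             '      here!</a></b>')
crlf_sample = lf_sample.replace("\n", "\r\n")

for name, pattern in (("windows", WINDOWS_STYLE),
                      ("mac", MAC_STYLE),
                      ("portable", PORTABLE)):
    print(name,
          bool(pattern.search(lf_sample)),
          bool(pattern.search(crlf_sample)))

# Same replacement as in the thread, with $n written \g<n> for Python.
print(PORTABLE.sub(r'<a href="\g<1>.html">\g<2></a>', lf_sample))
```

The portable variant only strips the <b> tags; the line break inside the link text still needs a second pass.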


123-mod.html

Author Closing Comment

by:sharingsunshine
That's great, thanks for getting to the bottom of the problem.