
What regex will remove duplicate rel="nofollow" tags?

I had this question after viewing "Python error - Need Help".

I created this regex to remove the duplicate rel="nofollow" tags using grep in TextWrangler, but I am not sure how to add it to the Python regex code.

rel="nofollow"(\s|\n|\n\r)rel="nofollow"



Replace with:
rel="nofollow"

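A minimal sketch of how the TextWrangler pattern could be used directly with Python's `re.sub`. Two details are worth noting: `\s` already matches `\n` and `\r`, and a Windows line ending is `\r\n` (not `\n\r`). The original alternation `(\s|\n|\n\r)` also matches exactly one character, so it misses a `\r\n` pair. The sketch therefore simplifies it to `\s+`. The sample input string is hypothetical.

```python
import re

# The TextWrangler pattern, simplified: \s already matches spaces, tabs,
# \n and \r, so (\s|\n|\n\r) can become \s+ (one or more whitespace chars).
duplicate_nofollow = re.compile(r'rel="nofollow"\s+rel="nofollow"')

# Hypothetical input: two adjacent attributes split by a Windows CRLF line break.
html = '<a href="http://www.theherbsplace.com/" rel="nofollow"\r\nrel="nofollow">'

# The replacement is plain text, so it needs no escaping.
fixed = duplicate_nofollow.sub('rel="nofollow"', html)
print(fixed)  # <a href="http://www.theherbsplace.com/" rel="nofollow">
```

One pass of this pattern only removes pairs. Three adjacent attributes would still leave a duplicate, which a repeated-group pattern such as `(rel="nofollow"\s*)+` avoids.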

Asked by: sharingsunshine
 
pepr commented:
With respect to your previous question, you can use the following code. However, consider it a quick hack: it would not work if the original page contained the rel="nofollow" attribute in another location (that is, if the duplicates were not adjacent). The proper, robust solution would use an HTML parser:
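import re
import urllib.request  # Python 3 replacement for the original urllib2

with urllib.request.urlopen('http://www.theherbsplacenews.com/') as website:
    # Assumes the page is UTF-8 encoded.
    html = website.read().decode('utf-8')   # the content of the page

# Keep a copy of the unmodified page for comparison.
with open('original_document.html', 'w', encoding='utf-8') as f:
    f.write(html)

# Insert rel="nofollow" after every quoted URL that starts with
# http://www.theherbsplace.com/ (this may create adjacent duplicates).
rexURL = re.compile(r'("http://www\.theherbsplace\.com/.*?")')
result = rexURL.sub(r'\1 rel="nofollow"', html)

# Collapse runs of adjacent rel="nofollow" attributes (separated only by
# whitespace) into a single one.
rexDoubledNofollow = re.compile(r'(rel="nofollow"\s*)+')
result = rexDoubledNofollow.sub(r'\1', result)

# Save the modified page.
with open('new_document.html', 'w', encoding='utf-8') as f:
    f.write(result)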


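The \s* matches zero or more whitespace characters, including tabs and newlines. It is appended to the searched sequence, and the whole sequence is captured as a group (the part enclosed in parentheses, referred to as \1 in the following sub call). The + after the group means one or more occurrences. Because a repeated group keeps only its last repetition, \1 holds a single rel="nofollow" (plus its trailing whitespace), so the whole run of duplicates is replaced by one attribute.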
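The behaviour of `\1` with a repeated group can be checked on a small, made-up tag (the `sample` string below is hypothetical). The whole match spans both attributes, while group 1 holds only the last repetition, so substituting `\1` leaves a single attribute.

```python
import re

# Same pattern as in the answer: one or more rel="nofollow" attributes,
# each followed by optional whitespace; the group is repeated by +.
rexDoubledNofollow = re.compile(r'(rel="nofollow"\s*)+')

# Hypothetical tag with a duplicate split across a newline.
sample = '<a href="http://www.theherbsplace.com/" rel="nofollow"\n   rel="nofollow" style="x">'

m = rexDoubledNofollow.search(sample)
print(repr(m.group(0)))  # 'rel="nofollow"\n   rel="nofollow" '  (the whole run)
print(repr(m.group(1)))  # 'rel="nofollow" '  (last repetition only)

collapsed = rexDoubledNofollow.sub(r'\1', sample)
print(collapsed)  # <a href="http://www.theherbsplace.com/" rel="nofollow" style="x">
```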
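The answer mentions that a robust fix would use an HTML parser. Below is a minimal sketch of that idea using only Python's standard `html.parser` module. It is not the answerer's code: the class `NofollowRewriter`, the function `add_single_nofollow`, and the `TARGET_PREFIX` constant are hypothetical names. Because it works on parsed `<a>` tags rather than raw text, it gives each matching link exactly one `rel="nofollow"` wherever the existing `rel` attributes sit in the tag. Other tags are copied back verbatim. End tags come out in lowercase, and some unusual constructs (for example CDATA sections or malformed comments) may not be reproduced exactly.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical: links starting with this prefix get rel="nofollow".
TARGET_PREFIX = 'http://www.theherbsplace.com/'


class NofollowRewriter(HTMLParser):
    """Re-emits a document, giving each <a> whose href starts with
    TARGET_PREFIX exactly one rel="nofollow" attribute."""

    def __init__(self):
        # convert_charrefs=False so entities reach handle_entityref/charref
        # and can be written back unchanged.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _rewrite(self, tag, attrs, self_closing):
        href = dict(attrs).get('href') or ''
        if tag != 'a' or not href.startswith(TARGET_PREFIX):
            return self.get_starttag_text()  # copy the original tag verbatim
        # Drop every existing rel attribute (duplicates included), add one.
        kept = [(k, v) for k, v in attrs if k != 'rel']
        kept.append(('rel', 'nofollow'))
        parts = [tag]
        for k, v in kept:
            # Values arrive unescaped, so escape them again when rebuilding.
            parts.append(k if v is None else f'{k}="{escape(v)}"')
        return '<' + ' '.join(parts) + (' />' if self_closing else '>')

    def handle_starttag(self, tag, attrs):
        self.out.append(self._rewrite(tag, attrs, self_closing=False))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._rewrite(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        self.out.append(f'</{tag}>')  # tag names are reported in lowercase

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f'&{name};')

    def handle_charref(self, name):
        self.out.append(f'&#{name};')

    def handle_comment(self, data):
        self.out.append(f'<!--{data}-->')

    def handle_decl(self, decl):
        self.out.append(f'<!{decl}>')

    def handle_pi(self, data):
        self.out.append(f'<?{data}>')


def add_single_nofollow(html):
    parser = NofollowRewriter()
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)


if __name__ == '__main__':
    sample = ('<p><a href="http://www.theherbsplace.com/" rel="nofollow" '
              'rel="nofollow">x</a> &amp; <a href="/other">y</a></p>')
    print(add_single_nofollow(sample))
```

As written, any other `rel` values (such as `noopener`) on matching links are discarded. A fuller version could merge them with `nofollow` instead.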
 
pepr commented:
I have noticed a bug in the original page:
<a 1="" href="http://www.theherbsplace.com/" imageanchor=" rel="nofollow" style="...



Notice the 1="" and the imageanchor=" without its closing double quote.
 
sharingsunshine (author) commented:
Thanks for the help. As for the other exceptions you pointed out, I will just have to fix them as I find them.