Removing bad chars from URL with .htaccess

Posted on 2011-05-06
Medium Priority
Last Modified: 2012-05-11
Hi Experts,

I am having a problem on one of my websites. Googlebot keeps picking up URLs like this: /life/mortgage-life-%E2%80%8Binsurance

when the actual URL is this: life/mortgage-life-insurance

Those characters decode to something that definitely isn't in any of my source files, so I assume the link comes from an external site. Where it comes from doesn't really bother me; what I need is an .htaccess mod_rewrite rule that removes those bad characters from the URL.

What I have come up with so far (by googling) is attached. It removes the characters from the URL, but it doesn't put the rest of the URL back, so the redirect goes to life/mortgage-life-

I need it to put "insurance" back on the end, i.e. remove only %E2%80%8B and nothing else.

How can I do this? I have tried a few regex generators, but none seem to be able to do it.

Many thanks!

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(.*)\%e2%80%8b(.*)\ HTTP/ [NC] 
RewriteRule ^.*$ http://www.example.com/%1 [R=301,L]


Question by:temmygray

Expert Comment

ID: 35705871

I have no clue how Apache works, but I see that your line #1 is not exactly what you are looking for: in *nix, %e2%80%8b is not equal to %E2%80%8B.

Have you tried it with the right case?

LVL 75

Expert Comment

by:käµfm³d 👽
ID: 35705903

"BUT I see your line #1 is not exactly what you are looking for : in *nix, %e2%80%8b is not equal to %E2%80%8B"

The "NC" flag stands for "nocase", so the pattern is matched case-insensitively   ; )

I might be missing something, but can you try adding the "no escape" flag?
RewriteRule ^.*$ http://www.example.com/%1 [R=301,NE,L]



Accepted Solution

käµfm³d   👽 earned 2000 total points
ID: 35706526
Ooops! I see now that I am indeed missing something  = )

Ignore the previous comment. You have two capture groups, but you are only referring to one of them in your replacement syntax, namely "%1". Try adding the second group to the replacement (below).

Each set of parentheses acts as its own capture group; the groups are numbered sequentially, starting from 1, going from left to right. In your rule above, the group to the left of "\%e2%80%8b" is group 1; to the right is group 2. With your example URL, group 1 already ends with the hyphen ("life/mortgage-life-"), so the two groups are joined directly.
RewriteRule ^.*$ http://www.example.com/%1%2 [R=301,L]
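Putting the pieces together, here is a minimal sketch of how the complete .htaccess block might look with both capture groups in the redirect target. It assumes mod_rewrite is enabled and that the rules sit in the site root's .htaccess; www.example.com is the placeholder host from the question. For reference, %E2%80%8B is the percent-encoded UTF-8 form of U+200B (zero-width space). In the example URL the first group already ends with the hyphen, so the sketch joins the groups as %1%2:

```apache
RewriteEngine On

# THE_REQUEST holds the raw request line, e.g.
#   GET /life/mortgage-life-%E2%80%8Binsurance HTTP/1.1
# so the percent-encoded bytes are still visible to the pattern.
#   %1 = everything between the leading "/" and the bad sequence
#   %2 = everything after the bad sequence, up to " HTTP/"
# NC makes the hex digits match in either case (%e2 or %E2).
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(.*)\%e2%80%8b(.*)\ HTTP/ [NC]

# %N refers to groups captured by the RewriteCond above
# ($N would refer to groups in the RewriteRule pattern instead).
RewriteRule ^.*$ http://www.example.com/%1%2 [R=301,L]
```

For the request shown in the question, %1 would be "life/mortgage-life-" and %2 would be "insurance", so the redirect target becomes http://www.example.com/life/mortgage-life-insurance.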



Author Closing Comment

ID: 35706677
Perfect. Thanks!

Expert Comment

by:käµfm³d 👽
ID: 35706696
NP. Glad to help  = )


Question has a verified solution.
