Removing bad chars from URL with .htaccess

Hi Experts,

I am having a problem on one of my websites. Googlebot keeps picking up URLs like this: /life/mortgage-life-%E2%80%8Binsurance

When the actual URL is this: life/mortgage-life-insurance

Those extra characters definitely aren't in any of my source files, so I am assuming they come from a link on an external site. Where they come from doesn't really bother me; what I need is an .htaccess mod_rewrite rule that removes those bad characters from the URL.

What I have come up with so far (by googling) is below. It removes the characters from the URL, but it doesn't put the rest of the URL back, so the redirect goes to life/mortgage-life-

I need it to put the "insurance" back on the end and remove only %E2%80%8B.

How can I do this? I have tried a few regex generators, but none seem able to do it.

Many thanks!

Tom
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(.*)\%e2%80%8b(.*)\ HTTP/ [NC] 
RewriteRule ^.*$ http://www.example.com/%1 [R=301,L]


Asked by: temmygray
 
käµfm³d 👽 Commented:
Ooops! I see now that I am indeed missing something  = )

Ignore my previous comment (the "no escape" flag suggestion). You have two capture groups, but you are only referring to one of them in your replacement syntax, namely "%1". Try adding the second group to the replacement (below).

Each set of parentheses is its own capture group; groups are numbered sequentially from 1, going from left to right. In your rule above, the group to the left of "\%e2%80%8b" is group 1; the group to the right is group 2. For your example URL, group 1 already ends with the hyphen ("life/mortgage-life-"), so the two groups are joined directly, with no separator.
RewriteRule ^.*$ http://www.example.com/%1%2 [R=301,L]
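
For reference, here is how the condition from the question and the adjusted rule might look together in an .htaccess file. This is a minimal sketch rather than a tested configuration: it assumes mod_rewrite is enabled and that www.example.com (the placeholder host from the question) is the redirect target. The groups are written as %1%2, with no separator, because for the example request group 1 already ends in the hyphen.

```apache
RewriteEngine On

# THE_REQUEST is the raw request line, for example:
#   GET /life/mortgage-life-%E2%80%8Binsurance HTTP/1.1
# %1 = text between the leading "/" and the unwanted sequence ("life/mortgage-life-")
# %2 = text after the unwanted sequence ("insurance")
# NC makes the match case-insensitive, so %e2%80%8b also matches %E2%80%8B.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(.*)\%e2%80%8b(.*)\ HTTP/ [NC]

# Redirect permanently to the same path with the sequence removed.
# www.example.com is the placeholder host used in the question.
RewriteRule ^.*$ http://www.example.com/%1%2 [R=301,L]
```

With this in place, a request for /life/mortgage-life-%E2%80%8Binsurance should be answered with a 301 to http://www.example.com/life/mortgage-life-insurance. Because the captured text comes from the raw request line, any other percent-encoded characters in the path are already encoded; if those end up double-encoded in the redirect, the NE flag is worth testing.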

 
Piloute Commented:
Hi

I have no clue how Apache works BUT I see your line #1 is not exactly what you are looking for : in *nix, %e2%80%8b is not equal to %E2%80%8B

Have you tried it with the right case?

P
 
käµfm³d 👽 Commented:
@Piloute

"BUT I see your line #1 is not exactly what you are looking for : in *nix, %e2%80%8b is not equal to %E2%80%8B"
The "NC" flag stands for "nocase": it makes the match case-insensitive   ; )


@temmygray
I might be missing something, but can you try adding the "no escape" (NE) flag?
RewriteRule ^.*$ http://www.example.com/%1 [R=301,NE,L]

 
temmygray (Author) Commented:
Perfect. Thanks!
 
käµfm³d 👽 Commented:
NP. Glad to help  = )
Question has a verified solution.
