Solved

Rewriting URLs that end in .htm in htaccess

Posted on 2012-03-20
12
314 Views
Last Modified: 2012-06-22
I am using the following rewriterule:

RewriteRule ^(.+)\.htm[^l]+$ $1.htm [L,NC,QSA,R=301]

I thought it was working to do what I want which is to drop any extra characters after urls that end in .htm unless the next character is an l or a querystring. I just learned that a URL of http://www.romancestuck.com/quotes/movie-quotes.htmCarol doesn't cut off the Carol part and redirect to http://www.romancestuck.com/quotes/movie-quotes.htm How can I change this rewriterule to work?

Thanks!
0
Comment
Question by:webstuck5
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 6
  • 5
12 Comments
 
LVL 15

Expert Comment

by:babuno5
ID: 37748483
Here you go

      RewriteRule ^(.*)\.htm(.+)$ /$1.htm [R=301,L,QSA,NC]


Hope the above helps
0
 
LVL 50

Accepted Solution

by:
Steve Bink earned 500 total points
ID: 37748909
Here is my response from your previous question:

Because there is a lower-case "L" at the end.  Perhaps a small modification:

RewriteRule ^/?(.+)\.htm[^l].*$ $1.htm [L,NC,QSA,R=301]

Open in new window

                                           
The new regex looks for any string that may or may not start with a literal "/" character, followed by a sequence of one or more characters, to be later identified as group 1, followed by the literal string ".htm", followed by any character that is not "L" (case-insensitive), followed by 0 or more characters.
0
 

Author Comment

by:webstuck5
ID: 37749367
I just don't understand why to look for any string that may or may not start with a literal "/" character. What does the "/" character have to do with anything?
0
Save the day with this special offer from ATEN!

Save 30% on the CV211 using promo code EXPERTS30 now through April 30th. The ATEN CV211 connects a laptop directly to any server allowing you instant access to perform data maintenance and local operations, for quick troubleshooting, updating, service and repair.

 
LVL 50

Expert Comment

by:Steve Bink
ID: 37750167
I posted the reasoning for that in the other question.
0
 

Author Comment

by:webstuck5
ID: 37759414
I just noticed that someone is linking to:

http://www.romancestuck.com/quotes/movie-quotes.htmLove

I wonder if there is a better way to write this redirect so that if a URL ends in .htm or .html then everything is cut off after that except for querystrings. I am trying to set up this one redirect to handle as many possible bad URL links as possible.
0
 
LVL 50

Expert Comment

by:Steve Bink
ID: 37759625
The better question is why these requests are coming in.  

Try this:

RewriteRule ^/?(.+)\.(html?).*$ $1.$2 [L,NC,QSA,R=301]

Open in new window

0
 

Author Comment

by:webstuck5
ID: 37760006
Google webmaster tools lists all these bad URL links to my site. Most of the links are meaningless however Google webmaster tools will keep displaying these URL link errors if I don't fix or redirect them. So, I am trying to set up as few redirects as possible to fix them so I don't have to keep seeing these error messages. Sorry to bug you again but your last redirect didn't work the way I was hoping. It redirected:

http://www.romancestuck.com/quotes/movie-quotes.htmLove

to

http://www.romancestuck.com/quotes/movie-quotes.htmL

and showed a message that said it caused too many redirects. I was hoping that it would redirect to the actual URL of:

http://www.romancestuck.com/quotes/movie-quotes.htm

The only page that I have that ends in .html is the main index.html page, every other page ends in .htm

In .htaccess, I now have:

  # REDIRECT URLS ENDING IN index.html
  RewriteRule ^/*index\.html.*$ / [L,NC,QSA,R=301]

So, any redirect after that should actually only need to worry about pages that end with .htm and cut off anything after .htm. Sorry to keep bothering you but I think I am getting to a really good redirect rule that will solve a ton of bad links all at once.
0
 
LVL 50

Expert Comment

by:Steve Bink
ID: 37760934
I think you're going about this the wrong way.  Using the webmaster tools, you should be removing these bad links.  Alternatively, have Google re-index your site.  Right now, you're addressing a symptom, not the problem itself, and that rarely does what you want it to do.

RewriteCond %{REQUEST_FILENAME} !index.html$
RewriteRule ^/?(.+)\.htm.+$ /$1.htm [NC,QSA,R=301]

Open in new window

0
 

Author Comment

by:webstuck5
ID: 37761030
These are links from other sites so I can't remove them and most are things like forum posts so I can't really write the webmaster to have them update the links. The problem is that a lot of people apparently don't know how to create web links, but I don't think I can do much about that. :) So in my .htaccess, I should try:

# REDIRECT URLS ENDING IN index.html
RewriteRule ^/*index\.html.*$ / [L,NC,QSA,R=301]

RewriteCond %{REQUEST_FILENAME} !index.html$
RewriteRule ^/?(.+)\.htm.+$ /$1.htm [NC,QSA,R=301]

Correct?
0
 
LVL 50

Expert Comment

by:Steve Bink
ID: 37761243
While your explanation is true, the burden for proper linking is on the one creating the link.  We all wish people would do everything right every time, but, people being people, that doesn't happen.  You're trying to account for an infinite range of errors, and that just is not going to happen.  You'll find (eventually) that it is a monumental waste of time to try.  

The first rule is liable to create a redirection loop, depending on how your DefaultDocument directive is set.  I would remove it, but if you are determined to keep it, make sure you use the [NS] modifier to prevent the loop.  Also, the first atom should be "/?".

Otherwise, that looks correct.
0
 

Author Comment

by:webstuck5
ID: 37761304
What I am trying to do is redirect as many as possible of these bad links with as few rewriterules as possible. A lot I need to setup individual redirects. I really don't have a choice. I can ignore these bad links listed in Google webmaster tools but then I have to go through them to find the bad links that actually need to be fixed. I can mark these bad links as fixed and have them come back up when the Google crawler finds them again, but that seems silly. So, I thought redirecting these bad links with as few as redirects as possible would be the best option.
0
 

Author Comment

by:webstuck5
ID: 37762115
I am using:

 REDIRECT URLS ENDING IN index.html
RewriteRule ^/?/*index\.html.*$ / [L,NC,NS,QSA,R=301]

# REDIRECT ANY URLS THAT DON'T CONTAIN index.html AND END IN .htm THAT ARE NOT FOLLOWED A querystring
  RewriteCond %{REQUEST_FILENAME} !index.html$
  RewriteRule ^/?(.+)\.htm.+$ /$1.htm [NC,QSA,R=301]

It seems to be working well. Thanks for all your help again!
0

Featured Post

Visualize your virtual and backup environments

Create well-organized and polished visualizations of your virtual and backup environments when planning VMware vSphere, Microsoft Hyper-V or Veeam deployments. It helps you to gain better visibility and valuable business insights.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
ProxyPass - Problem 5 224
Recompile PHP 5.5 with zip support for a newbe 8 138
Can't connect to WAMP server 5 79
Cpanel file manager 8 59
It is possible to boost certain documents at query time in Solr. Query time boosting can be a powerful resource for finding the most relevant and "best" content. Of course the more information you index, the more fields you will be able to use for y…
Introduction This article is intended for those who are new to PHP error handling (https://www.experts-exchange.com/articles/11769/And-by-the-way-I-am-New-to-PHP.html).  It addresses one of the most common problems that plague beginning PHP develop…
Finding and deleting duplicate (picture) files can be a time consuming task. My wife and I, our three kids and their families all share one dilemma: Managing our pictures. Between desktops, laptops, phones, tablets, and cameras; over the last decade…

730 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question