Rewriting URLs that end in .htm in htaccess

I am using the following RewriteRule:

RewriteRule ^(.+)\.htm[^l]+$ $1.htm [L,NC,QSA,R=301]

I thought it did what I want, which is to drop any extra characters after URLs that end in .htm, unless the next character is an "l" or the start of a query string. I just learned that the URL http://www.romancestuck.com/quotes/movie-quotes.htmCarol is not redirected to http://www.romancestuck.com/quotes/movie-quotes.htm; the "Carol" part is not cut off. How can I change this RewriteRule so that it works?

Thanks!
Asked by: webstuck5
babuno5 commented:
Here you go

      RewriteRule ^(.*)\.htm(.+)$ /$1.htm [R=301,L,QSA,NC]


Hope the above helps
Steve Bink commented:
Here is my response from your previous question:

Because there is a lower-case "L" at the end.  Perhaps a small modification:

RewriteRule ^/?(.+)\.htm[^l].*$ $1.htm [L,NC,QSA,R=301]

The new regex looks for any string that may or may not start with a literal "/" character, followed by a sequence of one or more characters, to be later identified as group 1, followed by the literal string ".htm", followed by any character that is not "L" (case-insensitive), followed by 0 or more characters.
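
To see why the original pattern misses .htmCarol while the modified one catches it, the two regexes can be tried outside Apache. This is only a sketch under stated assumptions: Python's re module stands in for Apache's regex engine (these simple patterns should behave the same in both), re.IGNORECASE mimics the [NC] flag, and the rewrite helper is a hypothetical name, not something mod_rewrite provides. The sample paths omit the leading slash, as they would in a per-directory .htaccess context.

```python
import re

# Assumption: Python's re is used as a stand-in for Apache's regex engine.
# re.IGNORECASE plays the role of the [NC] flag.
ORIGINAL = re.compile(r"^(.+)\.htm[^l]+$", re.IGNORECASE)
MODIFIED = re.compile(r"^/?(.+)\.htm[^l].*$", re.IGNORECASE)


def rewrite(pattern, path):
    """Hypothetical helper: return the redirect target, or None if no match."""
    m = pattern.match(path)
    return f"{m.group(1)}.htm" if m else None


carol = "quotes/movie-quotes.htmCarol"
love = "quotes/movie-quotes.htmLove"

# [^l]+ cannot consume the trailing "l" of "Carol", so the original fails.
print(rewrite(ORIGINAL, carol))  # None
# [^l] only has to match the "C"; .* takes the rest.
print(rewrite(MODIFIED, carol))  # quotes/movie-quotes.htm
# With case-insensitive matching, [^l] also rejects an upper-case "L".
print(rewrite(MODIFIED, love))   # None
```

Neither pattern touches a genuine .html or .htm URL, since both require at least one non-"l" character directly after ".htm".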

webstuck5 (author) commented:
I just don't understand why the pattern looks for a string that may or may not start with a literal "/" character. What does the "/" character have to do with anything?
Steve Bink commented:
I posted the reasoning for that in the other question.
webstuck5 (author) commented:
I just noticed that someone is linking to:

http://www.romancestuck.com/quotes/movie-quotes.htmLove

I wonder if there is a better way to write this redirect so that if a URL ends in .htm or .html, everything after that is cut off except for query strings. I am trying to set up this one redirect to handle as many bad links as possible.
Steve Bink commented:
The better question is why these requests are coming in.  

Try this:

RewriteRule ^/?(.+)\.(html?).*$ $1.$2 [L,NC,QSA,R=301]

webstuck5 (author) commented:
Google Webmaster Tools lists all these bad links to my site. Most of the links are meaningless; however, Google Webmaster Tools will keep displaying these link errors if I don't fix or redirect them. So I am trying to set up as few redirects as possible to fix them, so I don't have to keep seeing these error messages. Sorry to bug you again, but your last redirect didn't work the way I was hoping. It redirected:

http://www.romancestuck.com/quotes/movie-quotes.htmLove

to

http://www.romancestuck.com/quotes/movie-quotes.htmL

and showed a message that said it caused too many redirects. I was hoping that it would redirect to the actual URL of:

http://www.romancestuck.com/quotes/movie-quotes.htm

The only page I have that ends in .html is the main index.html page; every other page ends in .htm.

In .htaccess, I now have:

  # REDIRECT URLS ENDING IN index.html
  RewriteRule ^/*index\.html.*$ / [L,NC,QSA,R=301]

So, any redirect after that should actually only need to worry about pages that end with .htm and cut off anything after .htm. Sorry to keep bothering you but I think I am getting to a really good redirect rule that will solve a ton of bad links all at once.
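
The "too many redirects" message reported above can be reproduced in a small simulation. This is a sketch, not Apache itself: Python's re module is assumed as a stand-in for the regex engine, re.IGNORECASE mimics [NC], and follow is a hypothetical helper that re-applies the rule to each redirect target the way a browser following 301 responses would. Paths are shown without the leading slash, as in a per-directory context.

```python
import re

# The rule under test: ^/?(.+)\.(html?).*$ -> $1.$2 with [NC].
RULE = re.compile(r"^/?(.+)\.(html?).*$", re.IGNORECASE)


def follow(path, limit=5):
    """Hypothetical helper: apply the rule up to `limit` times, recording
    every path requested along the way."""
    hops = [path]
    for _ in range(limit):
        m = RULE.match(hops[-1])
        if not m:
            break  # rule no longer matches, the redirect chain ends
        hops.append(f"{m.group(1)}.{m.group(2)}")
    return hops


hops = follow("quotes/movie-quotes.htmLove")
print(hops[1])    # quotes/movie-quotes.htmL
print(len(hops))  # 6: the rule matched on every one of the 5 attempts
```

Two things stand out in this sketch. Under case-insensitive matching, "html?" captures the "L" of "Love", which explains the .htmL target. And because the trailing ".*" can match nothing, the rule also matches its own target, so the chain never terminates; the same holds for a clean URL ending in .htm.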
Steve Bink commented:
I think you're going about this the wrong way.  Using the webmaster tools, you should be removing these bad links.  Alternatively, have Google re-index your site.  Right now, you're addressing a symptom, not the problem itself, and that rarely does what you want it to do.

RewriteCond %{REQUEST_FILENAME} !index.html$
RewriteRule ^/?(.+)\.htm.+$ /$1.htm [NC,QSA,R=301]

webstuck5 (author) commented:
These are links from other sites, so I can't remove them, and most are things like forum posts, so I can't really write to the webmasters to have them update the links. The problem is that a lot of people apparently don't know how to create web links, but I don't think I can do much about that. :) So in my .htaccess, I should try:

# REDIRECT URLS ENDING IN index.html
RewriteRule ^/*index\.html.*$ / [L,NC,QSA,R=301]

RewriteCond %{REQUEST_FILENAME} !index.html$
RewriteRule ^/?(.+)\.htm.+$ /$1.htm [NC,QSA,R=301]

Correct?
Steve Bink commented:
While your explanation is true, the burden for proper linking is on the one creating the link.  We all wish people would do everything right every time, but, people being people, that doesn't happen.  You're trying to account for an infinite range of errors, and that just is not going to happen.  You'll find (eventually) that it is a monumental waste of time to try.  

The first rule is liable to create a redirection loop, depending on how your DirectoryIndex directive is set.  I would remove it, but if you are determined to keep it, make sure you use the [NS] flag to prevent the loop.  Also, the first atom should be "/?".

Otherwise, that looks correct.
webstuck5 (author) commented:
What I am trying to do is redirect as many of these bad links as possible with as few RewriteRules as possible. For a lot of them I need to set up individual redirects. I really don't have a choice. I can ignore the bad links listed in Google Webmaster Tools, but then I have to go through them to find the ones that actually need to be fixed. I can mark them as fixed and have them come back when the Google crawler finds them again, but that seems silly. So I thought redirecting these bad links with as few redirects as possible would be the best option.
webstuck5 (author) commented:
I am using:

# REDIRECT URLS ENDING IN index.html
RewriteRule ^/?/*index\.html.*$ / [L,NC,NS,QSA,R=301]

# REDIRECT ANY URLS THAT DON'T END IN index.html AND HAVE EXTRA CHARACTERS AFTER .htm (querystrings ARE KEPT)
  RewriteCond %{REQUEST_FILENAME} !index.html$
  RewriteRule ^/?(.+)\.htm.+$ /$1.htm [NC,QSA,R=301]

It seems to be working well. Thanks for all your help again!
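
As a rough check of the final pair of rules, their matching logic can be simulated against a few sample paths. This is a sketch under assumptions: Python's re module stands in for Apache's regex engine, re.IGNORECASE mimics [NC], the request path is used as an approximation of %{REQUEST_FILENAME}, and redirect_target is a hypothetical helper name. Query strings are ignored here, since Apache matches RewriteRule patterns against the path only.

```python
import re

# Rule 1: ^/?/*index\.html.*$ -> /            [L,NC,NS,QSA,R=301]
INDEX_RULE = re.compile(r"^/?/*index\.html.*$", re.IGNORECASE)
# Rule 2: ^/?(.+)\.htm.+$     -> /$1.htm      [NC,QSA,R=301]
TRIM_RULE = re.compile(r"^/?(.+)\.htm.+$", re.IGNORECASE)
# RewriteCond %{REQUEST_FILENAME} !index.html$  (no [NC], so case-sensitive)
INDEX_COND = re.compile(r"index.html$")


def redirect_target(path):
    """Hypothetical helper: return the redirect target for a path without
    its leading slash, or None if neither rule applies."""
    if INDEX_RULE.match(path):
        return "/"
    m = TRIM_RULE.match(path)
    # Approximation: the path stands in for REQUEST_FILENAME in the condition.
    if m and not INDEX_COND.search(path):
        return f"/{m.group(1)}.htm"
    return None


for path in ("quotes/movie-quotes.htmLove",
             "quotes/movie-quotes.htmCarol",
             "quotes/movie-quotes.htm",
             "index.html"):
    print(path, "->", redirect_target(path))
```

In this sketch the clean .htm URL is left alone because the second pattern requires at least one character after ".htm", so the redirect does not loop. One caveat: any path ending in .html other than index.html would also be redirected to .htm, which is only safe because index.html is the site's sole .html page.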