
Rewriting URLs that end in .htm in .htaccess

Posted on 2012-03-20
308 Views
Last Modified: 2012-06-22
I am using the following rewriterule:

RewriteRule ^(.+)\.htm[^l]+$ $1.htm [L,NC,QSA,R=301]

I thought it was doing what I want, which is to drop any extra characters after URLs that end in .htm unless the next character is an "l" or a query string. I just learned that for a URL like http://www.romancestuck.com/quotes/movie-quotes.htmCarol, the "Carol" part is not cut off and the URL is not redirected to http://www.romancestuck.com/quotes/movie-quotes.htm. How can I change this RewriteRule so that it works?

Thanks!
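One way to see why the rule misses this URL is to try the pattern outside Apache. This is only an approximation: it uses Python's re module as a stand-in for Apache's PCRE, models the [NC] flag as re.IGNORECASE, and assumes the .htaccess file sits in the document root, so the path being matched is "quotes/movie-quotes.htmCarol" with no leading slash. The helper name rewrite is hypothetical.

```python
import re

# [NC] is modeled with re.IGNORECASE; the path has no leading slash,
# as it would in a document-root .htaccess file (an assumption).
PATTERN = r"^(.+)\.htm[^l]+$"


def rewrite(path):
    """Return the $1.htm target if the rule matches, else None (hypothetical helper)."""
    m = re.search(PATTERN, path, re.IGNORECASE)
    return m.group(1) + ".htm" if m else None


print(rewrite("quotes/movie-quotes.htmCarol"))  # None: "Carol" ends in "l"
print(rewrite("quotes/movie-quotes.htmCaro"))   # quotes/movie-quotes.htm
print(rewrite("quotes/movie-quotes.html"))      # None
```

Because "[^l]+$" has to consume everything after ".htm" right up to the end of the path, a single "l" or "L" anywhere in that tail is enough to make the whole rule fail.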
Question by:webstuck5
12 Comments
 
LVL 15

Expert Comment

by:babuno5
ID: 37748483
Here you go

      RewriteRule ^(.*)\.htm(.+)$ /$1.htm [R=301,L,QSA,NC]


Hope the above helps
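A rough Python check of this rule (Python's re standing in for PCRE, [NC] modeled as re.IGNORECASE, the path matched without its leading slash as in a document-root .htaccess, and a hypothetical rewrite helper) suggests it does trim the "Carol" suffix. Because "(.+)" accepts any character after ".htm", though, it also catches the "l" of ".html", which the question wanted to leave alone.

```python
import re

# The suggested rule, with [NC] modeled as re.IGNORECASE (an assumption).
PATTERN = r"^(.*)\.htm(.+)$"


def rewrite(path):
    """Return the /$1.htm target if the rule matches, else None (hypothetical helper)."""
    m = re.search(PATTERN, path, re.IGNORECASE)
    return "/" + m.group(1) + ".htm" if m else None


print(rewrite("quotes/movie-quotes.htmCarol"))  # /quotes/movie-quotes.htm
print(rewrite("index.html"))                    # /index.htm
print(rewrite("quotes/movie-quotes.htm"))       # None
```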
 
LVL 50

Accepted Solution

by:
Steve Bink earned 500 total points
ID: 37748909
Here is my response from your previous question:

Because there is a lower-case "l" at the end of "Carol", and [^l]+$ requires every character after ".htm" to be something other than "l".  Perhaps a small modification:

RewriteRule ^/?(.+)\.htm[^l].*$ $1.htm [L,NC,QSA,R=301]

The new regex matches any string that may or may not start with a literal "/" character, followed by one or more characters (captured as group 1), followed by the literal string ".htm", followed by a single character that is not "L" (case-insensitive), followed by zero or more characters.
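A small Python check of the modified pattern (Python's re standing in for PCRE, re.IGNORECASE for [NC], a path matched without its leading slash, and a hypothetical rewrite helper) shows the effect of the change: only the first character after ".htm" is tested, so the rest of "Carol" no longer has to be free of "l", while ".html" is still left alone.

```python
import re

# Accepted pattern; [NC] modeled as re.IGNORECASE (an assumption).
PATTERN = r"^/?(.+)\.htm[^l].*$"


def rewrite(path):
    """Return the $1.htm target if the rule matches, else None (hypothetical helper)."""
    m = re.search(PATTERN, path, re.IGNORECASE)
    return m.group(1) + ".htm" if m else None


# Only the single character right after ".htm" is checked against [^l].
print(rewrite("quotes/movie-quotes.htmCarol"))  # quotes/movie-quotes.htm
print(rewrite("index.html"))                    # None
print(rewrite("quotes/movie-quotes.htm"))       # None
```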
 

Author Comment

by:webstuck5
ID: 37749367
I just don't understand why the pattern looks for a string that may or may not start with a literal "/" character. What does the "/" character have to do with anything?
 
LVL 50

Expert Comment

by:Steve Bink
ID: 37750167
I posted the reasoning for that in the other question.
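The usual reason for the optional leading slash is that mod_rewrite hands the pattern a different path depending on where the rule lives. In a per-directory context such as .htaccess, the directory prefix is stripped, so a rule in the document root sees "quotes/movie-quotes.htmCarol"; in the server or <VirtualHost> configuration it sees "/quotes/movie-quotes.htmCarol". "^/?" lets one pattern match both forms, and group 1 comes out the same either way. A minimal Python sketch, with re standing in for PCRE and a hypothetical group1 helper:

```python
import re

PATTERN = r"^/?(.+)\.htm[^l].*$"


def group1(path):
    """Return capture group 1 for path, or None if there is no match (hypothetical helper)."""
    m = re.search(PATTERN, path, re.IGNORECASE)
    return m.group(1) if m else None


# .htaccess (per-directory) form vs. server/<VirtualHost> form of the same request.
print(group1("quotes/movie-quotes.htmCarol"))   # quotes/movie-quotes
print(group1("/quotes/movie-quotes.htmCarol"))  # quotes/movie-quotes
```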
 

Author Comment

by:webstuck5
ID: 37759414
I just noticed that someone is linking to:

http://www.romancestuck.com/quotes/movie-quotes.htmLove

I wonder if there is a better way to write this redirect so that, if a URL has .htm or .html followed by extra characters, everything after the extension is cut off except the query string. I am trying to set up this one redirect to handle as many bad links as possible.
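The accepted rule does not fire for this URL because of the [NC] flag: case-insensitive matching makes [^l] reject "L" as well as "l", and the character right after ".htm" in "htmLove" is "L". A Python approximation (assuming re.IGNORECASE behaves like [NC]; matches is a hypothetical helper) shows the difference:

```python
import re

PATTERN = r"^/?(.+)\.htm[^l].*$"


def matches(path, nc):
    """True if the accepted pattern matches; nc=True models the [NC] flag (hypothetical helper)."""
    flags = re.IGNORECASE if nc else 0
    return re.search(PATTERN, path, flags) is not None


print(matches("quotes/movie-quotes.htmLove", nc=True))   # False: [^l] also rejects "L"
print(matches("quotes/movie-quotes.htmLove", nc=False))  # True: "L" passes [^l]
```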
 
LVL 50

Expert Comment

by:Steve Bink
ID: 37759625
The better question is why these requests are coming in.  

Try this:

RewriteRule ^/?(.+)\.(html?).*$ $1.$2 [L,NC,QSA,R=301]


Author Comment

by:webstuck5
ID: 37760006
Google Webmaster Tools lists all these bad links to my site. Most of the links are meaningless; however, Google Webmaster Tools will keep displaying these link errors if I don't fix or redirect them. So I am trying to set up as few redirects as possible to fix them, so I don't have to keep seeing these error messages. Sorry to bug you again, but your last redirect didn't work the way I was hoping. It redirected:

http://www.romancestuck.com/quotes/movie-quotes.htmLove

to

http://www.romancestuck.com/quotes/movie-quotes.htmL

and showed a message that said it caused too many redirects. I was hoping that it would redirect to the actual URL of:

http://www.romancestuck.com/quotes/movie-quotes.htm
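The loop can be traced from the pattern alone. With [NC], the optional "l?" in "(html?)" also matches the capital "L" in "htmLove", so $2 becomes "htmL"; the redirected path then matches the same rule again and is mapped to itself, which is consistent with the "too many redirects" message. A Python approximation (re.IGNORECASE standing in for [NC], path without its leading slash, hypothetical rewrite helper):

```python
import re

# Rule from the earlier reply; [NC] modeled as re.IGNORECASE (an assumption).
PATTERN = r"^/?(.+)\.(html?).*$"


def rewrite(path):
    """Return the $1.$2 target if the rule matches, else None (hypothetical helper)."""
    m = re.search(PATTERN, path, re.IGNORECASE)
    return m.group(1) + "." + m.group(2) if m else None


first = rewrite("quotes/movie-quotes.htmLove")
second = rewrite(first)
print(first)   # quotes/movie-quotes.htmL
print(second)  # quotes/movie-quotes.htmL (maps to itself, so the redirects never stop)
```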

The only page I have that ends in .html is the main index.html page; every other page ends in .htm.

In .htaccess, I now have:

  # REDIRECT URLS ENDING IN index.html
  RewriteRule ^/*index\.html.*$ / [L,NC,QSA,R=301]

So any redirect after that should only need to handle pages that end in .htm and cut off anything after .htm. Sorry to keep bothering you, but I think I am getting close to a really good redirect rule that will fix a ton of bad links all at once.
 
LVL 50

Expert Comment

by:Steve Bink
ID: 37760934
I think you're going about this the wrong way.  Using the webmaster tools, you should be removing these bad links.  Alternatively, have Google re-index your site.  Right now, you're addressing a symptom, not the problem itself, and that rarely does what you want it to do.

RewriteCond %{REQUEST_FILENAME} !index.html$
RewriteRule ^/?(.+)\.htm.+$ /$1.htm [NC,QSA,R=301]
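As a rough check of this rule in Python (re.IGNORECASE for [NC], path without its leading slash, and a hypothetical rewrite helper that models only the RewriteRule, not the RewriteCond): it trims anything after ".htm", leaves a correct ".htm" URL alone because ".+" needs at least one extra character, and would also turn "index.html" into "/index.htm", which is the case the RewriteCond line is there to exclude.

```python
import re

# Catch-all rule from above; [NC] modeled as re.IGNORECASE (an assumption).
# Only the RewriteRule is modeled here, not the RewriteCond.
PATTERN = r"^/?(.+)\.htm.+$"


def rewrite(path):
    """Return the /$1.htm target if the rule matches, else None (hypothetical helper)."""
    m = re.search(PATTERN, path, re.IGNORECASE)
    return "/" + m.group(1) + ".htm" if m else None


print(rewrite("quotes/movie-quotes.htmLove"))   # /quotes/movie-quotes.htm
print(rewrite("quotes/movie-quotes.htmCarol"))  # /quotes/movie-quotes.htm
print(rewrite("quotes/movie-quotes.htm"))       # None: nothing follows ".htm"
print(rewrite("index.html"))                    # /index.htm, hence the RewriteCond
```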

 

Author Comment

by:webstuck5
ID: 37761030
These are links from other sites, so I can't remove them, and most are in places like forum posts, so I can't really write to the webmasters to have them update the links. The problem is that a lot of people apparently don't know how to create web links, but I don't think I can do much about that. :) So in my .htaccess, I should try:

# REDIRECT URLS ENDING IN index.html
RewriteRule ^/*index\.html.*$ / [L,NC,QSA,R=301]

RewriteCond %{REQUEST_FILENAME} !index.html$
RewriteRule ^/?(.+)\.htm.+$ /$1.htm [NC,QSA,R=301]

Correct?
 
LVL 50

Expert Comment

by:Steve Bink
ID: 37761243
While your explanation is true, the burden for proper linking is on the one creating the link.  We all wish people would do everything right every time, but, people being people, that doesn't happen.  You're trying to account for an infinite range of errors, and that just is not going to happen.  You'll find (eventually) that it is a monumental waste of time to try.  

The first rule is liable to create a redirection loop, depending on how your DirectoryIndex directive is set.  I would remove it, but if you are determined to keep it, make sure you use the [NS] flag to prevent the loop.  Also, the pattern should start with "/?" instead of "/*".

Otherwise, that looks correct.
 

Author Comment

by:webstuck5
ID: 37761304
What I am trying to do is redirect as many of these bad links as possible with as few RewriteRules as possible. For a lot of them I need to set up individual redirects; I really don't have a choice. I can ignore the bad links listed in Google Webmaster Tools, but then I have to go through them to find the ones that actually need to be fixed. I can mark them as fixed, but they come back up when the Google crawler finds them again, which seems silly. So I thought redirecting these bad links with as few redirects as possible would be the best option.
 

Author Comment

by:webstuck5
ID: 37762115
I am using:

# REDIRECT URLS ENDING IN index.html
RewriteRule ^/?/*index\.html.*$ / [L,NC,NS,QSA,R=301]

# REDIRECT URLS (OTHER THAN index.html) WITH EXTRA CHARACTERS AFTER .htm; THE querystring IS NOT PART OF THE MATCH AND IS KEPT
  RewriteCond %{REQUEST_FILENAME} !index.html$
  RewriteRule ^/?(.+)\.htm.+$ /$1.htm [NC,QSA,R=301]

It seems to be working well. Thanks for all your help again!
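To see how the two rules interact, they can be modeled in order in Python. This is a simplified sketch, not how mod_rewrite is implemented: re.IGNORECASE stands in for [NC], paths are matched without a leading slash as in a document-root .htaccess, and %{REQUEST_FILENAME} (really a full filesystem path) is approximated by the request path itself. The function names are hypothetical.

```python
import re

# Simplified model of the two rules above, applied in order (hypothetical names).
# [NC] -> re.IGNORECASE; paths have no leading slash, as in a document-root
# .htaccess; %{REQUEST_FILENAME} is approximated by the path itself.


def index_rule(path):
    # RewriteRule ^/?/*index\.html.*$ / [L,NC,NS,QSA,R=301]
    if re.search(r"^/?/*index\.html.*$", path, re.IGNORECASE):
        return "/"
    return None


def htm_rule(path):
    # RewriteCond %{REQUEST_FILENAME} !index.html$  (case-sensitive, no [NC])
    if re.search(r"index.html$", path):
        return None
    # RewriteRule ^/?(.+)\.htm.+$ /$1.htm [NC,QSA,R=301]
    m = re.search(r"^/?(.+)\.htm.+$", path, re.IGNORECASE)
    return "/" + m.group(1) + ".htm" if m else None


def redirect_target(path):
    """Return the 301 target for a request path, or None if neither rule fires."""
    for rule in (index_rule, htm_rule):
        target = rule(path)
        if target is not None:
            return target
    return None


for p in ("index.html", "quotes/movie-quotes.htmLove", "quotes/movie-quotes.htm"):
    print(p, "->", redirect_target(p))
```

In this model a request that matches the first rule never reaches the second, which mirrors the [L] flag; neither the [NS] behavior nor DirectoryIndex handling is modeled.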