Solved

Rewriting URLs that end in .htm in htaccess

Posted on 2012-03-20
Last Modified: 2012-06-22
I am using the following rewriterule:

RewriteRule ^(.+)\.htm[^l]+$ $1.htm [L,NC,QSA,R=301]

I thought it was working to do what I want, which is to drop any extra characters after URLs that end in .htm unless the next character is an l or a querystring. I just learned that a URL of http://www.romancestuck.com/quotes/movie-quotes.htmCarol doesn't cut off the Carol part and redirect to http://www.romancestuck.com/quotes/movie-quotes.htm. How can I change this rewriterule to work?

Thanks!
Question by:webstuck5
LVL 15

Expert Comment

by:babuno5
ID: 37748483
Here you go

      RewriteRule ^(.*)\.htm(.+)$ /$1.htm [R=301,L,QSA,NC]


Hope the above helps
 
LVL 50

Accepted Solution

by:
Steve Bink earned 500 total points
ID: 37748909
Here is my response from your previous question:

Because there is a lower-case "L" at the end.  Perhaps a small modification:

RewriteRule ^/?(.+)\.htm[^l].*$ $1.htm [L,NC,QSA,R=301]

The new regex looks for any string that may or may not start with a literal "/" character, followed by a sequence of one or more characters, to be later identified as group 1, followed by the literal string ".htm", followed by any character that is not "L" (case-insensitive), followed by 0 or more characters.
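The difference between the two patterns can be checked outside Apache. The following is a minimal sketch, assuming Python's `re` module behaves like mod_rewrite's regex engine for these simple constructs and that `re.IGNORECASE` stands in for the [NC] flag; the helper name `rewrite` is made up for illustration.

```python
import re

# Python's re module stands in for mod_rewrite's regex engine here.
# re.IGNORECASE mimics the [NC] flag.
ORIGINAL = re.compile(r"^(.+)\.htm[^l]+$", re.IGNORECASE)
MODIFIED = re.compile(r"^/?(.+)\.htm[^l].*$", re.IGNORECASE)


def rewrite(pattern, path):
    """Return the redirect target "$1.htm" if the rule matches, else None."""
    match = pattern.match(path)
    return f"{match.group(1)}.htm" if match else None


# In a per-directory (.htaccess) context the leading slash is already removed.
for path in ("quotes/movie-quotes.htmCarol",
             "quotes/movie-quotes.html",
             "quotes/movie-quotes.htm",
             "quotes/movie-quotes.htmLove"):
    print(path,
          "| original:", rewrite(ORIGINAL, path),
          "| modified:", rewrite(MODIFIED, path))
```

In this sketch the original pattern rejects `.htmCarol` because `[^l]+$` requires every trailing character, including the final "l", to be something other than "l". The modified pattern only tests the first character after `.htm`. Because of the case-insensitive match, a suffix that starts with "L", such as `.htmLove`, is still not rewritten.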
 

Author Comment

by:webstuck5
ID: 37749367
I just don't understand why to look for any string that may or may not start with a literal "/" character. What does the "/" character have to do with anything?

 
LVL 50

Expert Comment

by:Steve Bink
ID: 37750167
I posted the reasoning for that in the other question.
 

Author Comment

by:webstuck5
ID: 37759414
I just noticed that someone is linking to:

http://www.romancestuck.com/quotes/movie-quotes.htmLove

I wonder if there is a better way to write this redirect so that if a URL ends in .htm or .html, everything after that is cut off except for querystrings. I am trying to set up this one redirect to handle as many bad URL links as possible.
 
LVL 50

Expert Comment

by:Steve Bink
ID: 37759625
The better question is why these requests are coming in.  

Try this:

RewriteRule ^/?(.+)\.(html?).*$ $1.$2 [L,NC,QSA,R=301]

 

Author Comment

by:webstuck5
ID: 37760006
Google webmaster tools lists all these bad URL links to my site. Most of the links are meaningless; however, Google webmaster tools will keep displaying these URL link errors if I don't fix or redirect them. So, I am trying to set up as few redirects as possible to fix them so I don't have to keep seeing these error messages. Sorry to bug you again, but your last redirect didn't work the way I was hoping. It redirected:

http://www.romancestuck.com/quotes/movie-quotes.htmLove

to

http://www.romancestuck.com/quotes/movie-quotes.htmL

and showed a message that said it caused too many redirects. I was hoping that it would redirect to the actual URL of:

http://www.romancestuck.com/quotes/movie-quotes.htm

The only page that I have that ends in .html is the main index.html page, every other page ends in .htm

In .htaccess, I now have:

  # REDIRECT URLS ENDING IN index.html
  RewriteRule ^/*index\.html.*$ / [L,NC,QSA,R=301]

So, any redirect after that should actually only need to worry about pages that end with .htm and cut off anything after .htm. Sorry to keep bothering you but I think I am getting to a really good redirect rule that will solve a ton of bad links all at once.
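The redirect to `.htmL` and the "too many redirects" message reported above can be reproduced with a small simulation. This is only a sketch, assuming Python's `re` module as a stand-in for mod_rewrite's regex engine and `re.IGNORECASE` for [NC]; the function `follow_redirects` and its `limit` parameter are hypothetical names used for illustration.

```python
import re

# Stand-in for: RewriteRule ^/?(.+)\.(html?).*$ $1.$2 [L,NC,QSA,R=301]
RULE = re.compile(r"^/?(.+)\.(html?).*$", re.IGNORECASE)


def follow_redirects(path, limit=5):
    """Apply the rule repeatedly, as a browser following 301s would.

    Returns the list of paths visited. Reaching `limit` hops without
    a non-matching path indicates a redirect loop.
    """
    visited = [path]
    for _ in range(limit):
        match = RULE.match(visited[-1])
        if match is None:
            break  # no rule matched: the server would serve this path
        visited.append(f"{match.group(1)}.{match.group(2)}")
    return visited


hops = follow_redirects("quotes/movie-quotes.htmLove")
print(hops)
```

Two things appear to combine here. With a case-insensitive match, the optional `l?` captures the capital "L" of "Love", so the target becomes `.htmL`. And because the trailing `.*` can match nothing, the redirect target matches the same rule again, so every request keeps redirecting to itself.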
 
LVL 50

Expert Comment

by:Steve Bink
ID: 37760934
I think you're going about this the wrong way.  Using the webmaster tools, you should be removing these bad links.  Alternatively, have Google re-index your site.  Right now, you're addressing a symptom, not the problem itself, and that rarely does what you want it to do.

RewriteCond %{REQUEST_FILENAME} !index.html$
RewriteRule ^/?(.+)\.htm.+$ /$1.htm [NC,QSA,R=301]

 

Author Comment

by:webstuck5
ID: 37761030
These are links from other sites so I can't remove them and most are things like forum posts so I can't really write the webmaster to have them update the links. The problem is that a lot of people apparently don't know how to create web links, but I don't think I can do much about that. :) So in my .htaccess, I should try:

# REDIRECT URLS ENDING IN index.html
RewriteRule ^/*index\.html.*$ / [L,NC,QSA,R=301]

RewriteCond %{REQUEST_FILENAME} !index.html$
RewriteRule ^/?(.+)\.htm.+$ /$1.htm [NC,QSA,R=301]

Correct?
 
LVL 50

Expert Comment

by:Steve Bink
ID: 37761243
While your explanation is true, the burden for proper linking is on the one creating the link.  We all wish people would do everything right every time, but, people being people, that doesn't happen.  You're trying to account for an infinite range of errors, and that just is not going to happen.  You'll find (eventually) that it is a monumental waste of time to try.  

The first rule is liable to create a redirection loop, depending on how your DirectoryIndex directive is set.  I would remove it, but if you are determined to keep it, make sure you use the [NS] flag to prevent the loop.  Also, the first atom should be "/?".

Otherwise, that looks correct.
 

Author Comment

by:webstuck5
ID: 37761304
What I am trying to do is redirect as many of these bad links as possible with as few rewriterules as possible. For a lot of them I need to set up individual redirects. I really don't have a choice. I can ignore these bad links listed in Google webmaster tools, but then I have to go through them to find the bad links that actually need to be fixed. I can mark these bad links as fixed and have them come back up when the Google crawler finds them again, but that seems silly. So, I thought redirecting these bad links with as few redirects as possible would be the best option.
 

Author Comment

by:webstuck5
ID: 37762115
I am using:

# REDIRECT URLS ENDING IN index.html
RewriteRule ^/?/*index\.html.*$ / [L,NC,NS,QSA,R=301]

# REDIRECT ANY URLS THAT DON'T END IN index.html AND HAVE EXTRA CHARACTERS AFTER .htm (querystrings ARE KEPT)
RewriteCond %{REQUEST_FILENAME} !index.html$
RewriteRule ^/?(.+)\.htm.+$ /$1.htm [NC,QSA,R=301]

It seems to be working well. Thanks for all your help again!
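The behavior of the final pair of rules can be sanity-checked with a rough simulation. This sketch assumes Python's `re` module as a stand-in for mod_rewrite's regex engine, and it approximates `%{REQUEST_FILENAME}` with the request path, which is a simplification. The function name `redirect_target` is hypothetical. The query string is not part of what a RewriteRule pattern matches, so it is left out.

```python
import re

# Stand-ins for the two RewriteRule patterns; re.IGNORECASE mimics [NC].
INDEX_RULE = re.compile(r"^/?/*index\.html.*$", re.IGNORECASE)
HTM_RULE = re.compile(r"^/?(.+)\.htm.+$", re.IGNORECASE)
# The RewriteCond pattern is case-sensitive and is negated with "!" in Apache.
INDEX_COND = re.compile(r"index.html$")


def redirect_target(path):
    """Return the 301 target for a request path, or None if served as-is."""
    if INDEX_RULE.match(path):
        return "/"
    # Assumption: the request path approximates %{REQUEST_FILENAME}.
    if INDEX_COND.search(path):
        return None
    match = HTM_RULE.match(path)
    if match:
        return f"/{match.group(1)}.htm"
    return None


for path in ("quotes/movie-quotes.htmLove",
             "quotes/movie-quotes.htmCarol",
             "quotes/movie-quotes.htm",
             "index.html"):
    print(path, "->", redirect_target(path))
```

Because `.+` after `.htm` needs at least one extra character, a clean `.htm` URL no longer matches, which is what avoids the earlier redirect loop. Note that under these rules a URL ending in `.html` other than index.html is also trimmed to `.htm`, which fits a site where only the index page uses `.html`.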
Question has a verified solution.
