The Basics of .htaccess Files and URL Redirection

If you've heard about htaccess and it sounds like it does what you want, but you're not sure how it works... well, you're in the right place. Read on.

Some Basics

#1. It's a file and its filename is .htaccess (yes, with a dot in the front).

#2. It's an Apache feature. Other web servers will not use .htaccess files (at least not without some custom plug-in).

#3. It's an extension of the Apache configuration. There is nothing you can do in the .htaccess file that you cannot also do in the main Apache configuration, but the most popular use of .htaccess is to set up redirection using the mod_rewrite module. If you don't know what mod_rewrite is but you're on a shared hosting provider, it's probably enabled.

#4. It applies ONLY to the location / folder that it's in (and any subfolders). This means you can configure how Apache behaves within a specific folder just by creating the .htaccess file inside that folder.

#5. It will ONLY work if the main Apache configuration has been set up so that "AllowOverride" permits it for that folder, meaning it is set to something other than "None" (or if that folder is inside of a folder tree with it turned on). This is usually the case on shared hosting providers.
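
For anyone who does have access to the main configuration, here is a minimal sketch of what #5 can look like. The path /var/www/example is a hypothetical document root, and FileInfo is only one possible setting (it is the override class that covers the mod_rewrite directives; "All" would permit everything):

```apache
# Main server configuration (for example httpd.conf), not an .htaccess file.
# "/var/www/example" is a hypothetical path used only for illustration.
<Directory "/var/www/example">
    # Let .htaccess files in this tree use FileInfo-class directives,
    # which include RewriteEngine, RewriteCond and RewriteRule.
    AllowOverride FileInfo
    # mod_rewrite inside .htaccess files also needs symlink following enabled.
    Options +FollowSymLinks
</Directory>
```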

A Sample Redirection

Okay, on to the fun stuff. Let's say we have visitors that go to http://domain.com and http://www.domain.com but we want ALL of them to end up on the "www" version. We would create an .htaccess file and put it into the base directory for our website, and then put the following contents inside that file:

RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]


It's easier than it looks. Let's break this down into something that regular humans can read. The first line simply turns on the rewrite engine, which is what makes redirects possible. Just like you only turn your car engine on when you want to go somewhere (otherwise you waste gas), you only need to turn on the RewriteEngine where you're about to use it. There is no point enabling it in folders that have no rewrite rules.

Conditions and Rules

Next, we have the two lines RewriteCond and RewriteRule. Redirection has two basic components - conditions (RewriteCond) and rules (RewriteRule). In programming-speak, it's your basic if/then logic:

if(condition is met)
{
  execute this rule
}

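Conditions and rules work together: a set of conditions applies only to the single rule that immediately follows it, and a rule can have as many conditions in front of it as you like (or none at all, in which case it always runs when its pattern matches). This means that if you have this kind of configuration: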

RewriteCond green
RewriteCond eggs
RewriteCond ham
RewriteRule do_not_eat.html


...then you will only get redirected to the "do_not_eat.html" page if ALL three conditions are met (green eggs AND ham). If you didn't want to eat green eggs OR green ham (which is likely if you are not a raccoon), then you might want to throw an extra [OR] condition after your "eggs" line:

RewriteCond green
RewriteCond eggs [OR]
RewriteCond ham
RewriteRule do_not_eat.html


RewriteCond

Of course, conditions are a little more complex than that. What is it about "green" that we are trying to match? Are we trying to see if the word "green" appears in the domain name? Or maybe we're trying to see if it appears in a filename? What if we just want to see if that file even exists? Obviously, we need a bit more information in our condition for it to be valid. So let's go back to the "www" condition from above:

RewriteCond %{HTTP_HOST} !^www\.


We know that RewriteCond is just the keyword that indicates that we're defining a condition, which leaves us with the "%{HTTP_HOST}" and "!^www\." parts. The first part is the subject of the condition. What is the condition examining?

We're looking at %{HTTP_HOST}, which contains the host name that the browser sent with the request. That is (usually) the domain name from the address that someone typed in.

So if someone clicks on a link that takes them to https://www.mysite.com/mypicture.jpg, the %{HTTP_HOST} would contain "www.mysite.com", while if someone typed in a URL of http://mysite.com/myphoto.jpg, the %{HTTP_HOST} would contain "mysite.com".

Variables are handy like this, and there's a whole list of them here:

  http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewritecond

Next is the pattern we are searching for, which is a regular expression. This article isn't meant to cover how to perform regular expressions, so if you don't know them, then you are best off learning about them here:

  http://www.regular-expressions.info/

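Essentially, the pattern !^www\. is checking to see if %{HTTP_HOST} does NOT start with "www.". The regular expression itself is ^www\. and the leading "!" is not part of it; it is RewriteCond's way of saying "does not match". So if the requested domain does NOT start with "www." then the following rule will apply: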

RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
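
Conditions are not limited to %{HTTP_HOST}. As a rough sketch of the "does that file even exist?" case raised earlier, the following uses the %{REQUEST_FILENAME} variable with the -f (is a regular file) and -d (is a directory) tests. The index.php handler is a hypothetical name, not something from this article:

```apache
RewriteEngine On
# Skip requests that map to a real file...
RewriteCond %{REQUEST_FILENAME} !-f
# ...or to a real directory.
RewriteCond %{REQUEST_FILENAME} !-d
# Everything else is quietly handed to a (hypothetical) index.php.
RewriteRule ^ index.php [L]
```

Both conditions must be true for the rule to run, so existing images, stylesheets and folders are served normally.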


RewriteRule

The basic syntax of the above line is:

RewriteRule FROM TO [FLAGS]


Even though we can check the domain by looking at %{HTTP_HOST}, the RewriteRule works on the part of the URL that comes after the domain. Inside an .htaccess file, Apache also strips off the path to the folder that holds the file (including the leading slash) before matching. So if your original URL is http://mysite.com/myphoto.jpg and the .htaccess file is in the base directory, the "FROM" portion of the rule is matched against "myphoto.jpg". The above RewriteRule uses a regular expression "^(.*)$" to match that entire path. The parentheses capture the value into a backreference called $1 so that when we use $1 later, it will be replaced with "myphoto.jpg".

Next comes the TO. Here we build a new URL starting with "http://www." and then we add in whatever is inside %{HTTP_HOST}, so if the original URL was http://mysite.com/myphoto.jpg, we should now have:

  http://www.mysite.com

Next we add a forward slash and the contents of $1, which we know contains "myphoto.jpg" now, so the new URL becomes:

  http://www.mysite.com/myphoto.jpg

You might ask, "Why do we have to write the slash ourselves?" Because the leading slash was stripped before matching, $1 doesn't have one, so the rule has to put it back. Be careful if you ever move this rule into the main server or virtual host configuration (outside of any <Directory> section): there, the path is matched WITH its leading slash, so $1 would be "/myphoto.jpg" and the same rule would produce "http://www.mysite.com//myphoto.jpg". Apache is pretty good about handling accidental extra slashes, but in that context you would want to drop the extra slash from the rule.
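
If you would rather not think about whether the leading slash is present, one common way to write the rule is to make the slash optional and keep it outside the parentheses. This is a sketch of that idea rather than the only correct form:

```apache
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.
# "^/?" matches an optional leading slash without capturing it, so for a
# request like /myphoto.jpg, $1 is "myphoto.jpg" whether or not Apache
# stripped the slash before matching.
RewriteRule ^/?(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
```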

RewriteRule Flags

Finally we come to the flags [R=301,L]. That is actually two flags together: "R=301" and "L". The first flag, "R", stands for "Redirect". Without this flag, a rule that points to another path on the same site is handled quietly inside Apache so that the visitor doesn't know about it (the URL will look like what they originally put in). However, if you WANT the visitor to be aware of it (so that they bookmark the "www" version of the site, for example) the "R"edirect will send instructions to the user's web browser to try again at the new URL, and the new URL will show up in their address bar. (A target that is a full URL on a different host name, like ours, gets sent to the browser as a redirect even without "R", but then it goes out as a temporary 302 instead of a 301.)
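
To make the difference concrete, here is a sketch of the same kind of rule with and without "R", assuming the .htaccess file sits in the site's base directory. The file names are hypothetical:

```apache
RewriteEngine On
# Internal rewrite: the visitor asks for /about and is quietly served
# about.html; the address bar still shows /about.
RewriteRule ^about$ about.html [L]

# External redirect: the browser is told to request /contact.html instead,
# and the address bar changes to the new URL.
RewriteRule ^contact$ /contact.html [R=301,L]
```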

There are different redirect codes, and 301 is the most common. A 301 redirect code is considered a "permanent" redirect. It's like telling the user, "Hey, we here at www.mysite.com never intend for the non-www domain to work, but we're redirecting you and letting you know so that you don't just see an error message or a blank page." There are other redirect codes as well, such as 302 (which is what you get if you write "R" without a number), 303, 307, and 308, but you shouldn't use them until you've researched them a bit more thoroughly. I rarely see anything other than 301s anymore.
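
As a hedged illustration of a non-301 case, a temporary redirect might look like the following. The paths are made up, and whether 302 is the right code for your situation is something to check first:

```apache
RewriteEngine On
# Temporarily send visitors from /sale to a seasonal page. A 302 tells
# browsers and search engines that the original URL is still the real one.
RewriteRule ^sale/?$ /holiday-sale.html [R=302,L]
```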

The next flag is "L" which stands for "Last Rule". In many cases, you may have lots of different redirects for different situations:

RewriteEngine on
RewriteCond green
RewriteCond eggs
RewriteCond ham
RewriteRule do_not_eat.html

RewriteCond house
RewriteCond mouse
RewriteRule seriously_guys_will_not_eat.html


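When you specify the "L" flag, you're saying, "Hey, if this RewriteRule gets applied, then don't even bother looking at the rest of the rules and conditions - we're done here! Just do the redirect now!" So if "do_not_eat.html" had an [L] at the end of the line, and if the conditions for it were met, then the house/mouse/etc stuff would be ignored. One caveat: for quiet rewrites (no "R") inside an .htaccess file, Apache may run the rewritten URL through the rules again from the top, so "L" ends the current pass rather than guaranteeing that no rule will ever touch the request again.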
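
Here is a small sketch of "L" with real patterns instead of green eggs and ham. The page names are hypothetical:

```apache
RewriteEngine On
# A request for /old-page.html matches this rule, and [L] ends
# processing for this pass right here.
RewriteRule ^old-page\.html$ /new-page.html [R=301,L]

# This broader rule would also match old-page.html, but it is never
# reached for that request because the rule above already finished.
RewriteRule ^old-(.*)$ /archive/$1 [R=301,L]
```

A request for /old-news.html skips the first rule (its pattern doesn't match) and is handled by the second one.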

You can see all the flags here if you scroll down a bit:

  http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriterule

You should now have a decent idea of the basics of redirection inside .htaccess files. All you have to do is change the conditions and rules to fit your needs! There are some good sites out there that have some handy, common conditions and rules that you can copy-and-paste and tweak so you don't have to build everything from scratch. I personally like this one:

  http://www.askapache.com/htaccess/modrewrite-tips-tricks.html

Getting back to .htaccess, remember that .htaccess itself is not limited to redirection/rewrite configuration. It can handle the basic Apache configuration directives for folders, so you can use it for more things, like overriding php.ini values (when PHP runs as an Apache module), setting custom 404 error handlers, and stuff like that. It's a very handy tool, but also very dangerous. Since changes take effect as soon as you save, you need to be certain that you don't have typos or other mistakes in the file. If you have mistakes in an .htaccess file, you can accidentally block ALL access to every file in a website until the mistake is fixed. So always test your .htaccess files in a separate subfolder or on a separate server if you can, and always keep a copy of the original file in case you need to roll back your changes!
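
A few non-rewrite directives, sketched as they might appear in an .htaccess file. The error page path and the upload size are made-up values, and each directive only works if the server's AllowOverride setting permits it:

```apache
# Serve a custom page for "not found" errors (hypothetical path).
ErrorDocument 404 /errors/not-found.html

# Turn off automatic directory listings for this folder and its subfolders.
Options -Indexes

# Override a php.ini value. This directive only exists when PHP is loaded
# as an Apache module; on other setups it triggers a server error.
php_value upload_max_filesize 16M
```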

One final note for Apache administrators who are reading this: this next section is very important for you.

Ease of Use vs. Performance

Setup of .htaccess files is easy and quick. You can make updates and have them take effect as soon as you save the file. If you need another .htaccess file, you just create it and it starts to work. It doesn't get much simpler.

However, that "easy and quick" benefit comes with a significant performance cost. The main Apache configuration gets processed only ONCE, when the web server starts or restarts. But EVERY single time ANYONE asks for ANY file within a directory that has "AllowOverride" enabled, Apache has to first check for .htaccess files in that folder and in its parent folders, and then process whatever it finds.

So let's say that you have this folder structure:

/www.mysite.com/.htaccess
/www.mysite.com/images/jpg/photo_of_me.jpg

When someone tries to request the download of http://www.mysite.com/images/jpg/photo_of_me.jpg, Apache goes and looks in the base folder for .htaccess, and then it looks in the "images" folder for .htaccess, and then finally the "jpg" folder. You can have multiple .htaccess files, so it has to check all the valid locations first. After it checks these three folders, it pulls in all the .htaccess files that it finds, processes the contents into a set of rules that Apache understands, and then goes to see if any of the rules apply.

Now think of a normal website that uses 15 images in its layout. That means 15 individual web requests, plus the original one for the web page. That means Apache has to check those 3 folders a total of 15-16 times, read the .htaccess file(s) in 15-16 times, and process the contents 15-16 times... all for that one single visit to your web page.

If you only get one visitor a day, it's not a problem. When Apache has to serve up thousands of visitors per hour, it will start slowing down because of all the folders it has to check, all the .htaccess files it has to read, and all the processing it has to do. So if you are the only administrator on your server and you don't have users who need to edit their own .htaccess files, then turn off "AllowOverride" wherever it shows up (just set it to "None") and you'll save a lot of unnecessary hits on the disk and a lot of wasted CPU usage in the long run.

If you use an application that makes use of an .htaccess file, just copy the contents into the appropriate section in your main Apache configuration and gracefully restart Apache.
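
As a sketch of that move, the "www" redirect from earlier could live in the main configuration like this. The path /var/www/example is a hypothetical stand-in for the folder that used to hold the .htaccess file:

```apache
# Main server configuration; "/var/www/example" is a hypothetical path.
<Directory "/var/www/example">
    # No more .htaccess lookups for this tree.
    AllowOverride None

    # Rules copied from the former .htaccess file in this folder.
    RewriteEngine On
    RewriteCond %{HTTP_HOST} !^www\.
    RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
</Directory>
```

Inside a <Directory> section the rules are matched the same way as in an .htaccess file, so they can usually be copied over unchanged.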
Author:gr8gonzo
5 Comments
 
LVL 75

Expert Comment

by:käµfm³d 👽
I think you should add a blurb about what URL redirection is. I can't begin to tell you how many questions I've seen where people expect .htaccess to take this:

    http://www.example.com?some=querystring

to this:

    http://www.example.com/querystring

In other words, there seems to be confusion that .htaccess produces "SEO" links that the user will see in his address bar. While I know of at least one product which can do this, .htaccess on Apache does not, so far as I know.
 
LVL 26

Expert Comment

by:arober11
 
LVL 35

Author Comment

by:gr8gonzo
Hi kaufmed,

Sorry, I didn't see your comment earlier. This bit of code should do what you're talking about:

RewriteEngine On
RewriteCond %{QUERY_STRING} ^some=([^&]+)$
RewriteRule ^.*$ http://%{HTTP_HOST}/%1.php? [R=302,L]
 
LVL 75

Expert Comment

by:käµfm³d 👽
@arober11

I didn't get a chance to read the articles in depth, but I scanned over them and I don't believe they address what I am talking about. Can you point me to the section of either article which addresses what I mention below?

@gr8gonzo

Not how to do it (i.e. what code to use), but rather what it is. As I mentioned, my understanding of "SEO" URLs and .htaccess file is that the .htaccess file takes a SEO link that a user clicked on, and it effectively pre-processes the request before the web server gets a chance to inspect the request. The rewrite engine takes the SEO link and turns it into the traditional querystring version. In my experience, many people do not realize this. They believe that the .htaccess file is going to modify the outgoing links in their HTML files to be more SEO-friendly. As I understand, this is not the case. One should embed the SEO-friendly URLs into their HTML, and then craft their .htaccess file(s) in such a way as to un-SEO the URL when it comes back to the server--before the web server sees which resource was requested.

If I'm decidedly ignorant in this regard, please let me know  : )
 
LVL 35

Author Comment

by:gr8gonzo
@kaufmed, you're correct in that htaccess files don't do that, although I haven't come across anyone else that seems to think that's what htaccess files do. Then again, that could just be my own selection bias. I think the comments so far should help clear that up, though.