URL Aliasing, Redirection, Rewriting and Reverse Proxying using Apache HTTPD

Over the last year I have answered the same few basic URL rewriting questions several times, so I thought I might as well have a stab at explaining the basics, providing a few useful links and consolidating some of the most common queries into a single article.
So let us start at the very beginning, by defining the term URL (Uniform Resource Locator).

URLs, Requests and 404s

As the (link) explains, a URL is just a textual address for a Web-based resource. It consists of two to four parts and is either entered manually in a browser’s address bar or picked up by following an HTML link, e.g. http://www.somsite.com:81/subdir/index.php?someVar=2&anotherVar=xxxx

Where the constituent parts are:
The Scheme (Protocol) e.g. http://
The Hostname or IP address and, optionally, a port e.g.  www.somsite.com:81
The Path (URI) to the desired resource on the site e.g.  /subdir/index.php
The Query String (CGI parameters) passed to the resource e.g.  ?someVar=2&anotherVar=xxxx
Note:
1) The Path and Query String are optional.
2) A browser-only anchor / bookmark (fragment) may be found at the very end of the URL e.g. #sectionTwo
This is acted on by the browser alone and is never sent to the web server.

On receipt of a request a Web server, such as Apache HTTPD, will generally either return the requested resource or an HTTP 404 "Not Found" error. Hopefully clear enough so far, but from time to time a site owner may wish to:
1) Restructure the directory tree or naming convention on their site.
2) Remove content from the site.
3) Merge multiple sites into one.
4) Obscure / simplify / alias the URLs used to access a resource on their site.
5) Incorporate content hosted on a separate Web or Application server or service.
If the site owner performs any of the first 3, and possibly the 4th, any Search Engine results, 3rd party site links or user bookmarks for a once valid resource will result in a 404 "Not Found" error in the user’s browser, possibly impacting the site's traffic and / or revenue. To work around this issue Apache provides a number of options, from the ability to define custom error page(s), rather than the plain white pages, to the ability to Alias, Redirect, Rewrite or Proxy individual requests. Proxying also supports the 5th option above.

Custom Error pages

Let us start with the custom error page option. These can be set either globally for the site or at the directory level, via ErrorDocument directives e.g.
ErrorDocument 404 /errors/pageNolongerExists.html
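
The same directive accepts other status codes and other kinds of target. The following is only a minimal sketch: the extra /errors/ file names and the www.somesite.com host are placeholders I have made up for illustration.

```apache
# Local page for missing resources (a URL path, relative to the DocumentRoot)
ErrorDocument 404 /errors/pageNolongerExists.html
# A different local page for forbidden requests (hypothetical file name)
ErrorDocument 403 /errors/accessDenied.html
# A short inline message instead of a page
ErrorDocument 500 "Sorry, something went wrong on the server."
# A full URL makes Apache redirect the browser to the error page,
# so the browser no longer receives the original status code
ErrorDocument 410 http://www.somesite.com/errors/contentRemoved.html
```

Local paths are usually the better choice, as they preserve the original error status for browsers and search engines.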

An ErrorDocument directive can be added either to Apache’s main httpd.conf (sometimes called apache2.conf) or to a .htaccess file within the site's document tree. It’s recommended that at least one stylized page be provided per site, ideally with some helpful text and a few working links, so as to not overly worry customers with a sticky keyboard or dyslexic fingers.  While it avoids presenting the user with a plain white screen, the ErrorDocument approach does not provide a seamless transition for the user, so Redirection or Rewriting is a popular choice. Before I go further I should explain the difference between Redirection and Rewriting.
* Redirection = The simple re-mapping of an entered PATH to a new path or full URL.
* Rewriting = The re-mapping of any combination of the URL's constituent parts to a new path or URL, possibly adding or removing elements in the process.

Redirection and Aliasing

Before I go through Apache’s Redirection and Aliasing options I'll briefly mention an alternative mechanism that achieves the same result: adding an HTML meta refresh tag to each existing static HTML page, and / or altering any existing application or script to serve the tag in its pages e.g.
<meta http-equiv="refresh" content="0;url=http://www.newsite.com/newLocation/">

The above tag asks the user’s browser to redirect immediately to the page: http://www.newsite.com/newLocation/
This approach leaves the Web server configuration unchanged, which may be your only option on a hosted solution, but it can be rather time consuming to roll out and will also change the URL seen in the user’s browser.
Now back to Apache. Using a mod_alias Redirect is slightly more efficient than the equivalent mod_rewrite RewriteRule, but I should re-emphasise that Redirection only works on the URI (PATH) element of a URL. If you wish to take account of other elements of the URL, environment variables or remote content, you’ll need a Rewriting or Proxying solution, but you should still read this section for background information.

Assuming your Apache instance has the module loaded (as root type: apache2ctl -t -D DUMP_MODULES | egrep "proxy|alias|rewrite") or is able to load it via an httpd.conf:
LoadModule  alias_module  modules/mod_alias.so
directive, Aliasing and Redirection are available to you. If you are unfamiliar with the terms this may help:
* Alias (sometimes called an "Internal Rewrite") = Serve the requested file from another directory on the same server. This option provides an entirely Apache based work around for a straight directory rename or directory split e.g.  /oldDir is now /newDirectory
* Redirection = A joint server and browser solution to work around a change of the site's Host Name, structure or object name. Rather than sending a 404 error back for a moved / deleted object, when a rule has been defined for the resource Apache will send a 30x response code and a new URL to the requestor, indicating the resource can now be found at the new URL e.g.
www.oldSite.com/products   is now www.newSite.com/catalog
The response code indicates whether the redirection is Temporary (code 302, which is the default for Redirect, or 307) or Permanent (code 301). In either case the browser requests the new URL and displays it in the address bar. The difference is that a Permanent redirect may be cached by browsers and tells search engines to replace the old URL with the new one, whereas a Temporary redirect tells them to carry on using the old URL.
The following gives an indication of what can be achieved through use of mod_alias directives.
# Specify the new, absolute, location for the content that used to be found in /oldDir
Alias      /oldDir    /srv/www/htdocs/newDirectory

# Specify the new location for any jpeg files that were formerly in the products directory 
AliasMatch ^/products/(.*\.jpg)$ /srv/www/htdocs/images/$1
# Note: For an explanation of (.*\.jpg)$ and $1 see: Regular Expressions

# Indicates the content of /oldDirectory can now be found on a different Web-site (a temporary 302, as no status is given).
Redirect           /oldDirectory     http://www.newsite.com/somDirectory 

# A Temporary (302) redirection within the site; the browser is still sent to, and will display, the new URL.
RedirectTemp  /oldDirectory  http://www.oldSite.com/newDirectory

# Off load all requests for images to a remote Content Management service.
RedirectMatch ([^.\/]*\.jpg)$   http://www.contentManagementService.com/someDirectory/$1
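
Redirect also takes an optional status argument, which covers the permanent-move and removed-content cases from the list at the start of this article. A minimal sketch, with placeholder paths and host names of my own choosing:

```apache
# Explicit permanent redirect (301)
Redirect permanent  /oldDirectory     http://www.newsite.com/somDirectory
# Equivalent shorthand directive
RedirectPermanent   /oldProducts      http://www.newsite.com/catalog
# Explicit temporary redirect (302), the same as giving no status at all
Redirect temp       /seasonal         http://www.newsite.com/winterSale
# Content removed with no replacement: answer 410 Gone (no target URL is given)
Redirect gone       /retiredDirectory
```

Stating the status explicitly makes the intent obvious to whoever maintains the file next.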

It should be noted that Alias directives can only be added to the httpd.conf, whilst Redirect* directives can also be added to a .htaccess file, assuming permission has been granted via a suitable AllowOverride FileInfo directive.

Also bear in mind the order of precedence (hierarchy) between the Alias, ProxyPass, Redirect, RedirectMatch and RewriteRule directives. When contradictory directives appear, the rule with the highest precedence (lowest number) in the following list will be processed:
1)      ProxyPass in the httpd.conf
2)      RedirectMatch in the lowest level .htaccess  e.g. /images/products/.htaccess
3)      RedirectMatch in a higher level .htaccess    e.g. /images/.htaccess  then  /.htaccess
4)      RedirectMatch in the httpd.conf
5)      Redirect in the lowest level .htaccess
6)      Redirect in a higher level .htaccess
7)      Redirect in the httpd.conf
8)      Alias in the httpd.conf
9)      RewriteRule lowest level .htaccess
10)      RewriteRule higher level .htaccess
11)      RewriteRule in the httpd.conf

Also note:
1) If there are contradictory rules of the same type within a single file, the first rule to appear will be processed.
2) Rules added to the httpd.conf, or to Include'd *.conf files, are only read and loaded into memory at server start. This is more efficient, but you will have to restart the Apache instance to pick up a rule change.  .htaccess based rules are read, loaded and interpreted for every request hitting the server, so any change to a rule takes effect immediately, but there is an overhead in using this approach.

Rewriting


Now on to the meaty toy: mod_rewrite. This module offers the ability to manipulate a URL in every imaginable way, including manipulating the URL’s Scheme and Query String. The modifications can also be made conditional and can incorporate, in the new URL, any of the HTTP Header variables (passed with the request rather than in the URL) or any external variable or resource accessible by Apache e.g. the time, the user ID of the body requesting the resource, the machine making the request, a CGI parameter....
The module is also able to block access to a resource, again based on the same array of factors, Proxy requests on to another server or site, load balance requests across multiple servers, set environment variables for another module to use, skip other rules, call an external program to process the URL, Internally Rewrite (Alias) requests, etc...
But before you leap in it's worth reading a couple of the basic mod_rewrite guides out on the web e.g.
URL Rewriting for Beginners
And also refresh or acquire a basic knowledge of Regular Expressions, see the section below.

As with mod_alias, your Apache instance needs to have the module either built in or loaded, so check Apache (as above) and the httpd.conf for a:
LoadModule  rewrite_module  modules/mod_rewrite.so
and, if you're planning to use .htaccess based rules, a suitable Options FollowSymLinks together with an AllowOverride FileInfo.

As I mentioned above, a RewriteRule can replicate an unconditional Redirect, or be made dependent on one or more conditions. The syntax for both is as follows:

# URI pattern only
RewriteRule   <URI Pattern>       <New URL>   [Response codes and modifiers]

# URI pattern AND a Conditional Rule:
RewriteCond   %{<Some variable>}  <Pattern>   [Logical operation and modifiers]
RewriteRule   <URI Pattern>       <New URL>   [Response code, Operation or modifier]

Note: a rule can have multiple Conditions, joined via logical ANDs (the default) or ORs (the [OR] flag).
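
A minimal sketch of both forms of joining conditions; the host names and the /account/ path are placeholders I have assumed for illustration:

```apache
RewriteEngine On

# Conditions are ANDed by default: both must hold for the rule to fire
RewriteCond %{HTTPS}           !on
RewriteCond %{REQUEST_METHOD}  ^GET$
RewriteRule ^/?account/(.*)$   https://%{HTTP_HOST}/account/$1   [R=302,L]

# [OR] joins a condition to the one that follows it
RewriteCond %{HTTP_HOST}       ^old\.somesite\.com$              [NC,OR]
RewriteCond %{HTTP_HOST}       ^legacy\.somesite\.com$           [NC]
RewriteRule ^/?(.*)$           http://www.somesite.com/$1        [R=301,L]
```

Each group of RewriteCond lines applies only to the single RewriteRule that immediately follows it.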

The common response codes, flags and variables can be found in the following lists:

Common Response codes, Operations and modifiers:
* NC       Not Case Sensitive
* OR       Logical OR - join this condition to the next with an OR rather than the default AND
* F        Fail - Block access to the resource.
* L        Last - If the rule matches do not look at any further rules.
* QSA      Query String Append - Append any passed Query String to any new values specified in the <NewURL>
* P        Proxy the request, via mod_proxy, on to the server specified in <NewURL>, and serve back the response under the local URL.
* PT       Pass Through - internally modify the URL, then hand the result on so that Alias, Redirect and similar directives can act on it.
* R=###    HTTP return code passed back to the browser e.g. 301, 302, 307, 404.  If omitted and the <NewURL> is local, then Apache will internally fetch and serve the <NewURL> rather than redirect the browser to it.
* E        Assign a value to an environment variable.


Common Variables:
* REQUEST_URI        The requested URI, the bit between the Hostname and the Query String e.g. /somedir/somefile.html
* QUERY_STRING       The CGI parameters, the bits after the "?" in the URL
* HTTPS              Whether the request was SSL encrypted (https:// rather than http://)
* HTTP_REFERER       If reached via a link, the page the link was on.
* HTTP_HOST          The Host name or IP address used to reach this Apache instance e.g. www.somesite.com, mail.somesite.com, 11.22.33.44
* HTTP_USER_AGENT    Browser or bot type
* REMOTE_HOST        The host name of the remote computer, where it can be resolved (REMOTE_ADDR holds its IP address)
* REMOTE_USER        The User ID of the logged in user, if using HTTP authentication.
* REQUEST_METHOD     The HTTP method e.g. GET or POST

From this sub-set we might as well attempt to create a few rules using the above syntax e.g.
#Required just once - enables Rewriting.
RewriteEngine On

#Internally alter (Alias) the first 2 characters of URI PATHs starting: /VV  to: /XX, ignoring the case, and continue on to the next rule (default action).
RewriteRule ^/?VV(.*)  /XX$1  [NC]

# If the requested PATH starts with "XX", followed by one or more non / characters, and ends in a /
# AND that does not resolve to a File, Directory or Symbolic link 
# Then: internally redirect the request to the /index.php page and pass the rest of the URI, the bit after /XX, in the CGI variable: restOfURI, 
#        and ignore all further RewriteRules
RewriteCond %{REQUEST_FILENAME}  !-f
RewriteCond %{REQUEST_FILENAME}  !-d
# -l tests for a symbolic link (-s would test for a file with a size greater than zero)
RewriteCond %{REQUEST_FILENAME}  !-l
RewriteRule ^/?XX([^/]+)/$       /index.php?restOfURI=$1           [NC,L] 

# If the request is for page /products/show.php
# And a (passed category parameter of value {20,21,22,23,24,25}
#        Or product code starting with 'a' or 'A' and followed by a series of digits)
# Then internally redirect the request to page: /products/temptNotAvailable.html, and ignore all other RewriteRules
# (^|&) and (&|$) anchor each parameter, so that e.g. category=200 does not match
RewriteCond %{QUERY_STRING}      (^|&)category=2[0-5](&|$)         [NC,OR]
RewriteCond %{QUERY_STRING}      (^|&)product=a[0-9]+(&|$)         [NC]
RewriteRule ^/?products/show\.php$  /products/temptNotAvailable.html [L] 

Note: There may be a slight difference in the RewriteRule syntax, but not the RewriteCond syntax, between a .htaccess and an httpd.conf based rule. The URI extracted from say:
http://www.somesite.com/XX123
will be /XX123 if the rule is located in your httpd.conf, but just XX123 when the same rule is located in your root .htaccess file. This is because the module strips the per-directory prefix (for the root .htaccess, the leading /) from the URI before matching a .htaccess based rule. So if you are looking for URIs starting with a particular string remember:
# .htaccess Format
RewriteRule ^XX([^/]+) index.php  [NC,L]
# httpd.conf Format (the new URL needs its leading / here)
RewriteRule ^/XX([^/]+) /index.php  [NC,L]
# Will work in either, but less efficient
RewriteRule ^/?XX([^/]+) /index.php  [NC,L]

Debugging:
If you have access to the httpd.conf and are experiencing difficulties, add the following to either your VirtualHost or global definition, and restart Apache:
RewriteLog      /tmp/tmp_rewrite.log
RewriteLogLevel 9
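
Note that RewriteLog and RewriteLogLevel only exist in Apache 2.2 and earlier. From Apache 2.4 onwards the rewrite trace is written to the normal error log and is switched on through a per-module LogLevel. A minimal sketch, assuming a 2.4 server; trace3 is an arbitrary choice on my part (the levels run from trace1 to trace8):

```apache
# Apache 2.4+: keep the general log level at warn, raise only mod_rewrite
LogLevel warn rewrite:trace3
```

The relevant entries can then be picked out of the error log by searching for the string rewrite: and the level should be lowered again once the issue is resolved.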

Once in place just browse to the erroneous URL, then check the log file.  The log will show the values being compared with each pattern in your rules. Once you have identified and resolved the issue remember to either remove the lines or set the RewriteLogLevel to 0, and restart Apache.

If you don't have access to the httpd.conf, I suggest you attempt the table based approach described in the How to define your own Redirects or Rewrites section. If you just wish to see the Browser->Server->Browser side of things, I suggest you use Firefox and the Live HTTP headers plug-in.

Reverse Proxying


As briefly mentioned above, Proxying describes the process where, for a given request, your Apache server is configured to go off to another server, or servers, request that the resource be served by that server, obtain the results and then present them back, possibly modifying a few links in the process. The returned objects appear to the requestor as if they originated from your Apache server. It should be noted the backend conversation is not restricted to HTTP. For example Apache could pull a file from a file server using a ProxyPass ... ftp://, or a page from a Tomcat Application server using a ProxyPass ... ajp:// or just a JkMount. Then again it's just as happy using an http:// URL in either a ProxyPass or a RewriteRule to request the remote resource via plain HTTP or HTTPS.

One fairly important point is that the ProxyPass family of directives is restricted to the httpd.conf, either globally or within a VirtualHost; there is no .htaccess option. This is also a fairly complex area and I'd recommend you read: Running a Reverse Proxy in Apache  and / or The Apache Tomcat Connector - Generic HowTo before going further.

As in the previous sections, Apache will first need to load the appropriate modules e.g.
LoadModule  proxy_module          modules/mod_proxy.so
LoadModule  proxy_http_module     modules/mod_proxy_http.so
#LoadModule proxy_ajp_module      modules/mod_proxy_ajp.so
#LoadModule proxy_ftp_module      modules/mod_proxy_ftp.so
#LoadModule jk_module             modules/mod_jk.so
#LoadModule proxy_balancer_module modules/mod_proxy_balancer.so

Then add a suitable set of directives, either to the main httpd.conf for a global rule or to a specific VirtualHost definition e.g.
ProxyRequests Off
ProxyPass        /application http://www.otherSite.com/bar
ProxyPassReverse /application http://www.otherSite.com/bar
<Proxy *>
Order deny,allow
Allow from all
</Proxy>
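
The Order / Allow lines above are the Apache 2.2 access control syntax. On Apache 2.4 the same intent is normally written with Require. A minimal sketch of the equivalent block, assuming a 2.4 server and the same placeholder backend:

```apache
ProxyRequests Off
ProxyPass        /application http://www.otherSite.com/bar
ProxyPassReverse /application http://www.otherSite.com/bar
<Proxy *>
    # Apache 2.4 equivalent of: Order deny,allow / Allow from all
    Require all granted
</Proxy>
```

Keeping ProxyRequests Off matters in both versions, as it stops the server being used as an open forward proxy.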

It's also possible to replace the unconditional ProxyPass directive with a conditional RewriteRule e.g.
#The following directive can be made conditional by replacing it with the RewriteRules below:
#ProxyPass        /application http://www.otherSite.com/bar

# e.g. Only Proxy requests for HTTP user: Fred
# LA-U: looks ahead to the authenticated user, which is not yet set when httpd.conf rules run
RewriteCond   %{LA-U:REMOTE_USER}  ^fred$                          [NC]
RewriteRule   ^/application(.*)$   http://www.otherSite.com/bar$1  [P,L]
# Everyone else is refused
RewriteRule   ^/application        -                               [F]

Note: You still require the rest of the Proxy directives above when using a RewriteRule instead of a ProxyPass.

Regular Expressions (RegEx)

A Regular Expression is just the term used for a string description syntax. You'll find the RegEx syntax used by Apache is more or less common with that used in the majority of computer systems and applications these days, so it is handy to learn. It should also be noted that a RegEx's primary purpose within a Rewrite or Redirect directive is just to simplify your rules, through the ability to wildcard elements of the pattern. For example:
The following logic, looking for a PATH starting with either  /aaa/, /bbb/ or /ccc/:
RewriteCond %{REQUEST_URI}  ^/aaa/  [OR]
RewriteCond %{REQUEST_URI}  ^/bbb/  [OR]
RewriteCond %{REQUEST_URI}  ^/ccc/  
...

Could be re-written using the regular expression:
RewriteCond %{REQUEST_URI}  ^/(aaa|bbb|ccc)/  
...

Similarly  the following parameter matching logic:
RewriteCond %{QUERY_STRING}   cat=1   [OR]
RewriteCond %{QUERY_STRING}   cat=2   [OR]
RewriteCond %{QUERY_STRING}   cat=3   
...

Could be re-written using the regular expression:
RewriteCond %{QUERY_STRING}   cat=[1-3]
...

As well as simplifying the process, you’ll probably find a basic grasp of Regular Expressions vital in achieving what you want. So I suggest you download and read the following cheat sheets, along with the covering articles, then possibly scan over my scribblings below:
Regular Expression syntax:
!    Not the following pattern (a mod_rewrite prefix placed before the whole pattern, rather than part of the RegEx itself)
( )  Assign the text matched by the pattern inside the parentheses to a variable.
[ ]  Any one of the characters enclosed in the square brackets
(a|b) Either a or b
.    Any single character
\.   A Period character
\\   A Back slash character
\(   An opening parenthesis character
?    ZERO or one occurrence of the previous pattern
*    ZERO or more occurrences of the previous pattern
+    One or more occurrences of the previous pattern

A few common regular expressions in use:
[A-Z0-9]      An upper case Alphanumeric character (either case when used with the NC flag)
[^/\-_]       Any character other than:  / - _ 
^/?xxx        An optional leading /, so either /xxx or xxx will match
xxx/?$        An optional trailing /, so either xxx/ or xxx will match
[A-Z]*        ZERO or more Alphabetic characters 
[A-Z]+        One or more Alphabetic characters 
!/xxxx        Not String /xxxx (note there is no space after the !)
/(cat|dog)/   Either /cat/ or /dog/
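
The ! prefix combined with an alternation is the usual way to say "anything except a few names". A minimal sketch, assuming a front controller at /index.php and a hypothetical page parameter; the directory names are placeholders:

```apache
RewriteEngine On
# Leave /index.php itself and anything under /images, /css or /status untouched
RewriteCond %{REQUEST_URI}  !^/(index\.php|images|css|status)(/|$)
# Everything else is handed to the front controller
RewriteRule ^/?(.*)$        /index.php?page=$1  [QSA,L]
```

Excluding the target script in the condition is what stops the rule from matching its own output when it is used in a .htaccess file.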

Note:
1) Multiple pairs of parentheses ( ) can be used in both RewriteRule and RewriteCond patterns, but the resulting variables follow different naming conventions. The elements captured in a RewriteRule are automatically named  $1, $2, $3, ... $9, while those captured in a RewriteCond end up with the names %1, %2, %3, ... %9.
2) A MAXIMUM of 9 variables can be assigned from each type of pattern, limiting you to a maximum of 18 pattern variables in your <NewURL>.
3) You can mix and match variables extracted from a RewriteRule or RewriteCond pattern in your <NewURL> e.g.

#Convert URI: /JumpedTheQuickFoxBrown.html  to: /The_Quick_Brown_Fox_Jumped.html
RewriteCond  %{REQUEST_URI}                        /?Jumped(The)Quick(Fox)Brown\.html
RewriteRule  /?(Jumped)The(Quick)Fox(Brown)\.html  /%1_$2_$3_%2_$1.html                    [L,R=301]

If you wish to test the above rule simply add a file by the name of: The_Quick_Brown_Fox_Jumped.html to your document root (the content below should do), add the rule and navigate to /JumpedTheQuickFoxBrown.html
<HTML>This is a test</HTML>

How to define your own Redirects or Rewrites:

I would suggest you start by creating a table, possibly in a spreadsheet, with the following columns:
1)      Rule number, just start at 1 and increment.
2)      Source (the URL the user enters / clicks on).
3)      Source pattern - If needed, create a RegEx to cover the Source string.
4)      Destination (the page / location the user is to be served).
5)      Additional Conditions:  {CGI Parameters, Protocol, Host name, Case Sensitivity, Cookie, User signed in}.
6)      Type of Redirect: {Permanent, Temporary, Internal (Alias), Proxy, Deny, Last}
7)      Apache directive.
Populate the table with any redirects you currently have in either your httpd.conf or .htaccess files.
Add your new Rule(s) to the sheet.
Sort on the "Source" column, then scan through the list and look for any overlap. If there is an overlap and:
* The Destination URLs differ, see if you can find an additional condition to distinguish the two rules.
* The Destination URLs and Conditions are the same, see if you can devise a Source pattern that covers both URLs.

Once happy apply the New rule / rules to your Apache configuration.

A few Common Requests:

Solutions for a fair number of the "how to rewrite X to Y" posts in the Apache HTTPD zone can be found in Apache’s own basic or advanced guides.
I’ll also add a few others for good measure:
1) Redirecting from a non www. to the www. domain e.g. http://mail.somesite.com/   to  http://www.somesite.com/
RewriteCond %{HTTP_HOST} !^www\.                    [NC]
RewriteRule ^/?(.*)$     http://www.somesite.com/$1 [L,R=301]
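
The reverse direction, from the www. name to the bare domain, follows the same shape. A minimal sketch that reuses whatever host name was requested instead of hard coding one:

```apache
RewriteEngine On
# %1 holds the part of the host name captured by the RewriteCond, i.e. everything after the leading www.
RewriteCond %{HTTP_HOST} ^www\.(.+)$  [NC]
RewriteRule ^/?(.*)$     http://%1/$1 [L,R=301]
```

Because the target is built from %1, the same two lines can be shared by several VirtualHosts.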

2) If the requested PATH starts with "XX", and does not resolve to a File, Directory or Symbolic link, redirect to the index.php page:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# -l tests for a symbolic link (-s would test for a file with a size greater than zero)
RewriteCond %{REQUEST_FILENAME} !-l
RewriteRule ^XX([^/]+) index.php  [NC,L] 

3) How do you apply a condition to several rules? Simply by skipping 'n' rules if the condition does not match e.g.
#Skip the next 3 rules IF Not a request for /product/special
RewriteCond %{REQUEST_URI}  !^/product/special [NC]
RewriteRule .*  -  [S=3]
# Note: ([^/]+) captures a whole path segment; ([^/])+ would capture only its last character
RewriteRule ^/?product/special/([^/]+)$                   /product/main.php?cat=special&month=$1  [L]
RewriteRule ^/?product/special/([^/]+)/([^/]+)$           /product/main.php?cat=special&month=$1&cat=$2  [L]
RewriteRule ^/?product/special/([^/]+)/([^/]+)/([^/]+)$   /product/main.php?cat=special&month=$1&cat=$2&item=$3  [L]
# End of /product/special block

4) Block access to your images, unless the request has apparently come from a page on your site:
# Only accept image requests from your own site
RewriteCond %{HTTP_REFERER}  !www\.yoursite\.com [NC]
RewriteRule \.(jpg|gif|png)$  -                  [F,NC,L]

5) How to make the contents of a sub-directory appear to be your Document root, with a few exceptions for admin and status apps:
# Rewrite all requests for URLs not starting with: /theSubDir, /status, /admin or /manager into /theSubDir/
RewriteCond %{REQUEST_URI}  !^/(theSubDir|status|admin|manager)
RewriteRule  .*    /theSubDir%{REQUEST_URI}    [L]

6) Only permit indirect access to your index.php, via an internal redirect from another URI:
#Attempt to block direct requests to the index.php script; only internally rewritten requests should get through 
RewriteRule ^/?AppName       /index.php        [L,NC]
RewriteCond %{THE_REQUEST}   index\.php        [NC]
RewriteRule index\.php       -                 [F,NC]

7) Force all requests for files in the /secure/ directory to HTTPS (Assumes your server already has a set of SSL certificates installed) and for /insecure/ files to http:
RewriteCond %{HTTPS} !on
RewriteRule ^/?secure/  https://%{SERVER_NAME}%{REQUEST_URI} [L,R=301]
RewriteCond %{HTTPS} on
RewriteRule ^/?insecure/  http://%{SERVER_NAME}%{REQUEST_URI} [L,R=301]

(C) Andrew Roberts, Oct 2010
Author:arober11
4 Comments
 

Expert Comment

by:harvest-soft
How do we match the case where the Query String is blank?

An example that matches anything except a few words would also be worthwhile,
e.g. * but (xyz|images|css|other)

Author Comment

by:arober11

The following condition will ensure the QUERY_STRING is blank:

RewriteCond %{QUERY_STRING}  ^$

Expert Comment

by:gr8gonzo
Nice job, arober11. I haven't used Apache for reverse-proxying before.