J_K_1

How to redirect 15K URLs to a new site

I have 15K URLs that are indexed by Google and have a good page rank.

I developed a new Joomla site where the URL structure uses a category and a listing id.

The old site's URL (cfusion) uses only a listing ID.

How would I redirect the old URLs to the new URLs without causing a huge performance hit on my site?

Here are the specs:
- Shared hosting linux/apache

- Was going to use .htaccess mod_rewrite:

RewriteCond %{QUERY_STRING}  ^fuseAction=books\.viewBook&book_id=1$ [NC]
RewriteRule ^cfusion/index\.cfm$ /index.php?option=com_content&view=article&catid=21:yafiction&id=1 [R=301,NE,NC,L]

- The URL structure is:
FROM:
http://www.mysite.net/cfusion/index.cfm?fuseAction=books.viewBook&book_id=1
TO:
http://www.mysite.net/index.php?option=com_content&view=article&catid=21:yafiction&id=1

There are 4 categories (21,22,23,24) in the new URL. I was going to just hard code all of the rewrites and put them in the .htaccess, but I have been told that will significantly slow my site down.

I am looking for an answer that
1. doesn't slow the site to a crawl
2. keeps Google happy, so I don't lose my first-page results on Google


gr8gonzo

Okay, so since you're on shared hosting, you can't really turn off overrides (per the previous question you asked), and you can't put all 30,000 lines into the main Apache config, but what you CAN do is move all the redirects into a PHP script where you can perform database lookups and such instead of hardcoding 15,000 URLs.

First, clear all of the old redirects out of your .htaccess file (make a backup just in case), and then just add this one line:

ErrorDocument 404 /redirect.php

Save and close the .htaccess file. Now whenever a request comes in and it would normally get a 404 error, the request will first be passed to the redirect.php script. You can intercept that request, do your own coding/scripting to figure out what you want to do with the request, and then redirect with PHP to the correct page.

You'll also have to rename cfusion/index.cfm so that the request goes to the redirect script.

Then create your redirect.php script like this:

<?php

// Our new URL will default to be blank
$newURL = "";

// Extract info from old URL
$oldURL = $_SERVER["REQUEST_URI"];
// Dots are escaped so they only match literal dots; the "i" flag makes the match case-insensitive
if(preg_match('/cfusion\/index\.cfm\?fuseAction=books\.viewBook&book_id=([0-9]+)/i',$oldURL,$matches))
{
        // Found a request for the old URL! Now we have to perform some logic to redirect properly...
        $oldBookID = $matches[1];

        // Placeholder values: figure out what the new values should be based on $oldBookID (maybe a database lookup?)
        $id = 1;
        $catID = 21;
        $categoryName = "yafiction";

        // Create your new URL
        $newURL = "/index.php?option=com_content&view=article&catid={$catID}:{$categoryName}&id={$id}";
}

if($newURL)
{
        // We got a new URL, so redirect with a 301
        header( "HTTP/1.1 301 Moved Permanently" );
        header( "Location: {$newURL}" );
        // Stop here so nothing else is sent after the redirect headers
        exit;
}
else
{
        // No new URL was provided, so we'll return the normal 404
        header( "HTTP/1.1 404 Not Found" );
}
?>

You'll have to change the guts of the script a bit so that it knows what the new URL will be (I'm assuming you have a database table that contains the mappings from old-to-new URLs or else some logic on how to create the new URLs), but you get the idea.

Now, just a few notes:

1. This uses 301 redirects, which are permanent redirects. ANY redirect will have some page ranking loss (there's no way around that), but 301 redirects will result in the least amount of loss in rankings because you're informing Google that the redirection is permanent - it's not just some SEO trick. This is the type of redirect you were doing before with mod_rewrite.

2. Even though you're running a PHP script (which does add some processing overhead), it should still be faster overall and a lighter server load than reading and running those 30,000 lines of .htaccess code for every single request, especially since the PHP redirect script will only be executed for 404s, so it shouldn't impact things like images and such (unless you have a ton of missing images somehow).

3. You could increase performance even further by moving the .htaccess file into your "cfusion" subdirectory. This way, no .htaccess file needs to be loaded at all for most of your site / pages / resources. It will only read the .htaccess file when someone tries to hit a URL that would reside inside the /cfusion subdirectory. If you don't have that subdirectory, create it, and then move the file in there.
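
As a rough sketch of note 3, the file at /cfusion/.htaccess could contain nothing but the error handler line. The path /redirect.php is an assumption here: it presumes the script sits in the site's document root, as in the example above.

```apache
# /cfusion/.htaccess (sketch; assumes redirect.php lives in the document root)
# Apache only consults this file for requests under /cfusion/.
ErrorDocument 404 /redirect.php
```

Keep in mind that any .htaccess file in the document root itself is still read for every request, including requests under /cfusion/, so this only saves work if the root file stays small or is absent.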
J_K_1

ASKER

Wow. I'm continually amazed at the depth of these responses. Thanks for the thoughtful post!

Ok, so let me break this down and see if it is something I can do
First, clear out all of the ... then redirect with PHP to the correct page.
This makes sense, no problem here

You'll also have to rename cfusion/index.cfm so that the request goes to the redirect script.
The cfusion site is hosted on a different server than the new site. When I have everything ready to go, I was going to point the domain to the new site. I'm guessing I don't need to rename cfusion/index.cfm since it will be sitting lame on the old server.

Then create your redirect.php script like this:
Oh crap. I know next to nothing about PHP scripting. I mean, I can copy and paste, but yeah....that doesn't count :)
I read through the code you posted, and it makes sense, based on your comments. I would be able to create a .csv file with the URLs grouped by category, and I could import it into a MySQL database. I would need some hand-holding for linking the script to the database properly.

Now, just a few notes:
1 and 2 make sense, thanks for the clear explanation

3. You could increase performance even further by moving the .htaccess file into your "cfusion" subdirectory. .. If you don't have that subdirectory, create it, and then move the file in there.
So I would create the folder /cfusion and put the .htaccess file there, containing the ErrorDocument 404 /redirect.php line that you instructed me to create. I don't mean to confuse the situation here, but I am using .htaccess with Joomla to create SEF URLs. Maybe I should use a Joomla extension like sh404SEF instead, and get rid of the .htaccess file.


suPHP_ConfigPath /home/mywebhostingusername/public_html
##
# @version $Id: htaccess.txt 14401 2010-01-26 14:10:00Z louis $
# @package Joomla
# @copyright Copyright (C) 2005 - 2010 Open Source Matters. All rights reserved.
# @license http://www.gnu.org/copyleft/gpl.html GNU/GPL
# Joomla! is Free Software
##
#####################################################
#  READ THIS COMPLETELY IF YOU CHOOSE TO USE THIS FILE
#
# The line just below this section: 'Options +FollowSymLinks' may cause problems
# with some server configurations.  It is required for use of mod_rewrite, but may already
# be set by your server administrator in a way that dissallows changing it in
# your .htaccess file.  If using it causes your server to error out, comment it out (add # to
# beginning of line), reload your site in your browser and test your sef url's.  If they work,
# it has been set by your server administrator and you do not need it set here.
#
#####################################################
##  Can be commented out if causes errors, see notes above.
Options +FollowSymLinks
#
#  mod_rewrite in use
RewriteEngine on


########## Begin - Rewrite rules to block out some common exploits
## If you experience problems on your site block out the operations listed below
## This attempts to block the most common type of exploit `attempts` to Joomla!
#
## Deny access to extension xml files (uncomment out to activate)
#<Files ~ "\.xml$">
#Order allow,deny
#Deny from all
#Satisfy all
#</Files>
## End of deny access to extension xml files
RewriteCond %{QUERY_STRING} mosConfig_[a-zA-Z_]{1,21}(=|\%3D) [OR]
# Block out any script trying to base64_encode crap to send via URL
RewriteCond %{QUERY_STRING} base64_encode.*\(.*\) [OR]
# Block out any script that includes a <script> tag in URL
RewriteCond %{QUERY_STRING} (\<|%3C).*script.*(\>|%3E) [NC,OR]
# Block out any script trying to set a PHP GLOBALS variable via URL
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
# Block out any script trying to modify a _REQUEST variable via URL
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
# Send all blocked request to homepage with 403 Forbidden error!
RewriteRule ^(.*)$ index.php [F,L]
#
########## End - Rewrite rules to block out some common exploits

#  Uncomment following line if your webserver's URL
#  is not directly related to physical file paths.
#  Update Your Joomla! Directory (just / for root)
RewriteBase /


########## Begin - Joomla! core SEF Section
#
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/index.php
RewriteCond %{REQUEST_URI} (/|\.php|\.html|\.htm|\.feed|\.pdf|\.raw|/[^.]*)$  [NC]
RewriteRule (.*) index.php
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
#
########## End - Joomla! core SEF Section


gr8gonzo

I unfortunately don't know too much about Joomla - I've only ever edited a few things on it. However, we can probably help with the PHP scripting if you can walk through the mapping of the old-to-new URLs. For example, is there some kind of logic that looks like:

Old ID=1, New ID=1 and Category ID=21
Old ID=2, New ID=2 and Category ID=23
...etc... ?

You had mentioned a CSV file - not sure if that's what contains all the old and new URLs?
ASKER CERTIFIED SOLUTION
J_K_1

J_K_1

ASKER

I've requested that this question be closed as follows:

Accepted answer: 0 points for J_K_1's comment http:/Q_27108600.html#36015779

for the following reason:

figured it out myself

gr8gonzo

My apologies, I didn't see the last comment, but it sounds like you are going to use the approach I suggested with the error handler and PHP script. If that's the case, then it would be nice if you sent some of the points my way instead of closing it out. If you didn't use that method, then you should post your final solution so others can benefit from it.
J_K_1

ASKER

No, I found out that the category ID doesn't matter. It still passes the book ID regardless of what the category ID is set to.

So I have the rewrite as:

RewriteCond %{QUERY_STRING} ^fuseAction=books\.viewBook&book_id=(.*)$ [NC]
RewriteRule ^cfusion/index\.cfm$ /yafiction/%1? [R=301,NE,NC,L]
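
For anyone reusing the rule above: %1 is the value captured by the RewriteCond pattern, and the trailing ? on the target drops the old query string from the redirect. A slightly stricter variant, offered only as a sketch and assuming book IDs are always numeric (as they are in the example URLs), might look like this:

```apache
# Sketch: same redirect, but only for numeric book IDs (an assumption, not confirmed above)
RewriteCond %{QUERY_STRING} ^fuseAction=books\.viewBook&book_id=([0-9]+)$ [NC]
RewriteRule ^cfusion/index\.cfm$ /yafiction/%1? [R=301,NE,NC,L]
```

With the original (.*) pattern, an empty or malformed book_id would also be redirected; with ([0-9]+) such requests fall through to whatever rules follow.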
