• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 532

Shorten URLs produced by rss2html script

I have a peculiar problem: the URLs produced by the rss2html script are far too long, and since the script loads many items on a page, the page becomes very heavy.

Here's a typical URL:

http://site1.com/agents/ads.php?XMLFILE=http://www.site2.com/housing/sale/home/-price_0_75000/-/usa:tx:fortworth/%2b%2b30/%3fq%3d%2522owner%2bfinancing%2522%2bor%2b%2522seller%2bfinancing%2522%26v%3drss%26s%3dprice&TEMPLATE=2.html&GUID=http://granburytx.site2.com/j_rsxx_/3083260335-496u570,1243/realestate.site2.com___e2IdeSEZp5hkawkqbThWfsD0uw2YdD0g1fZNJIf3zN9qjgnlMmJe8b25W-TGGnkR3yNr21EHlyspsjK8rNBewbCQbOZi_BGyrXCvkzFQICQHrn2Dt_X9IpSrLAjdN8P-XKEWutirsQY,&MAXITEMS=1

Above is a link from the search results page to the listing details page. Because the search results page has over 50 listings, I was thinking of somehow hashing each URL and then decoding the hash on the listing details page with file_get_contents() or some other function.

So my question is how can I store or hash the URL above in the shortest possible form, and then retrieve it on the details page?
Asked by: greenerpastures
3 Solutions
 
Julian Hansen Commented:
One way is to store the URL in the session or a database and then use a unique id (autonumber) or UUID to reference the URL.

session_start(); // required before $_SESSION can be used

// Store the URL (once) and note its id for the short link
$hashedurls = isset($_SESSION['hashedurls']) ? $_SESSION['hashedurls'] : array();
$urlid = array_search($url, $hashedurls);
if ($urlid === false) {
   $hashedurls[] = $url;
   $urlid = count($hashedurls) - 1;
   $_SESSION['hashedurls'] = $hashedurls;
}

// To access a url on the details page
$urlid = (int) $_GET['urlid']; // Modify appropriately to check for existence and sanitise
$hashedurls = isset($_SESSION['hashedurls']) ? $_SESSION['hashedurls'] : array();
if (!empty($hashedurls[$urlid])) {
   // use $hashedurls[$urlid] here
}

This assumes you want to shorten these per session. If you want a global list of shortened URLs, then you would need to consider using a database or file storage.
 
greenerpastures (Author) Commented:
Each page will have up to 100 URLs, and a visitor may visit 5-10 pages, so there are potentially up to 1,000 URLs per visitor to store. Will this be too much for session storage to handle, perhaps leading to longer page loads?

I was thinking more along the lines of the crc32() function, but how do I "unhash" with crc32()?

Could any other PHP hashing function be used to hash and unhash a URL?
 
Julian Hansen Commented:
I was thinking more along the lines of the crc32() function, but how do I "unhash" with crc32()?
You can't: a hash is one-way; it gives you a short index into longer data, but it cannot be reversed.
Given the amount of data you are looking at (say 50 KB per user), it is probably not that much in today's terms. That said, I am very resource-shy, so I would probably not use this approach myself.
You could use a database, but again there is a time cost in accessing the DB.
You could do a combination, i.e. keep 100 URLs in the session and, if a URL is not found there, go to the DB; maintain an access count, or simply move the last-accessed URL to the top so the most-used URLs stay at the top and the one at the bottom gets bumped out.
Finally, if the URLs are all similar, you can parse them, extract only the bits that are unique to each, and then rebuild the URL from those values on the details page (see the sketch below).
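A minimal sketch of that parse-and-rebuild idea, based on the example URL in the question. It assumes TEMPLATE and MAXITEMS are constant for every listing and only XMLFILE and GUID vary; the file name details.php and the parameter names xml/guid are placeholders, so adjust to your real links.

<?php
// Sketch of the parse-and-rebuild approach (assumptions noted above).

// On the search results page: keep only the pieces that vary per listing.
function shortLink($longUrl) {
    $q = array();
    parse_str(parse_url($longUrl, PHP_URL_QUERY), $q);
    return 'details.php?xml=' . urlencode($q['XMLFILE'])
         . '&guid=' . urlencode($q['GUID']);
}

// On the details page: rebuild the full URL from the constant and variable parts.
function rebuildLink() {
    return 'http://site1.com/agents/ads.php'
         . '?XMLFILE=' . urlencode($_GET['xml'])
         . '&TEMPLATE=2.html'
         . '&GUID=' . urlencode($_GET['guid'])
         . '&MAXITEMS=1';
}
?>

How much this saves depends on how much of the URL really is constant, so check a few real links first.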

 
Eddie Shipman (All-around developer) Commented:
I agree with julianH: store the URLs in a database on the server, write a PHP script that returns a URL based on an identifier used in the anchor's href, use Ajax to retrieve the URL from that script in the anchor's click event, and then redirect the customer to the returned URL.
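A minimal sketch of the lookup script that approach implies, assuming a table short_url with columns id and long_url (the table, column and file names and the credentials are placeholders; the client-side Ajax call and redirect are left out):

<?php // geturl.php - hypothetical endpoint: echoes the stored long URL for a given id
$db = new mysqli('localhost', 'user', 'password', 'mydb');

$id = isset($_GET['id']) ? (int) $_GET['id'] : 0;

$stmt = $db->prepare('SELECT long_url FROM short_url WHERE id = ?');
$stmt->bind_param('i', $id);
$stmt->execute();
$stmt->bind_result($longUrl);

if ($stmt->fetch()) {
    echo $longUrl;  // the click handler would redirect to this value
} else {
    header('HTTP/1.0 404 Not Found');
}

The anchors on the search results page would then carry only the numeric id instead of the full URL.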
 
greenerpastures (Author) Commented:
Not sure about the database... I was trying to find a simpler solution. I have seen a caching script someplace that would store the text in a caching file and then locate it when needed. Has anybody come across such a script?

What about replacing the long URLs with short ones on the search results page, and having a clicked short URL redirect to the original long URL?
Are there any other PHP scripts that could be used to create temporary short URLs and then redirect to the original URLs?
 
Ray Paseur Commented:
Not sure about the database... I was trying to find a simpler solution.
There isn't a simpler solution if you want to do this yourself.  

Have you tried this service?
https://bitly.com/
 
Julian Hansen Commented:
The URLs are dynamic, as I understand it, so that service will not really be suitable in this case.
would store the text in a caching file and then locate it when needed

You can do that. I did not mention it because accessing a file in a shared environment means locking and releasing the file (if a single file is used), or using something like a session id to create a temporary file, but then you need to clean up old files.

What about replacing the long URLs with short ones on the search results page, and once a URL is clicked it would redirect to the original long URL?

Well, that is a hash, isn't it?
If you have a way of working out the longer URL from the short one, then obviously that would be the way to go, which is why I asked whether there were parts of the URL that were static.
 
Ray Paseur Commented:
Please post a link to the web page that illustrates this phenomenon.  I'd like to see the real thing in action.

As far as dynamic URLs go, I believe that Bit.ly has an API to cover that part. But if you want to try to replicate the functionality of Bit.ly, you can do it fairly easily. Generate a random, unique string of 6 characters and store it in the data base along with the true URL. Put that string into the URL of a master script. The master script takes the string, makes a SELECT from the data base to acquire the true URL, and issues a header("Location") to redirect the client to the true URL. A sketch of such a master script appears after the listing below.

A random unique string generator would work something like this.  You would not need a separate table to hold the unique strings - just mark the random string column UNIQUE.
<?php // RAY_random_unique_string.php
error_reporting(E_ALL);
echo "<pre>";

// GENERATE A SHORT UNIQUE RANDOM STRING FOR USE AS SOME KIND OF KEY
// WE DELIBERATELY OMIT LOOK-ALIKE LETTERS LIKE O and 0, I and 1.
// NOTE THAT THE DATA BASE MUST HAVE THE rand_key FIELD DEFINED AS "UNIQUE"
// NOTE THAT THE LENGTH ARGUMENT MUST MATCH THROUGHOUT SO WE DEFINE() IT.

define('ARG_LENGTH', 6);

// CONNECTION AND SELECTION VARIABLES FOR THE DATABASE
$db_host = "??"; // PROBABLY 'localhost' IS OK
$db_user = "??";
$db_word = "??";
$db_name = "??";

// OPEN A CONNECTION TO THE DATA BASE SERVER
// MAN PAGE: http://php.net/manual/en/function.mysql-connect.php
if (!$db_connection = mysql_connect("$db_host", "$db_user", "$db_word"))
{
    $errmsg = mysql_errno() . ' ' . mysql_error();
    echo "<br/>NO DB CONNECTION: ";
    echo "<br/> $errmsg <br/>";
}

// SELECT THE MYSQL DATA BASE
// MAN PAGE: http://php.net/manual/en/function.mysql-select-db.php
if (!$db_sel = mysql_select_db($db_name, $db_connection))
{
    $errmsg = mysql_errno() . ' ' . mysql_error();
    echo "<br/>NO DB SELECTION: ";
    echo "<br/> $errmsg <br/>";
    die('NO DATA BASE');
}
// IF WE GOT THIS FAR WE CAN DO QUERIES


// FUNCTION TO CREATE A DATABASE TABLE
function create_myTable()
{
    $length = ARG_LENGTH;

    mysql_query("DROP TABLE myTable");
    $psql
    =
    " CREATE TEMPORARY TABLE myTable
    ( _key        INT(8)            NOT NULL AUTO_INCREMENT
    , rand_key    VARCHAR($length)  UNIQUE NOT NULL DEFAULT '?'
    , PRIMARY KEY(`_key`)
    ) ENGINE=INNODB DEFAULT CHARSET=ascii
    "
    ;
    if (!mysql_query($psql)) die( "FAIL: $psql <br/>" . mysql_error() );
}


// FUNCTION TO MAKE A RANDOM STRING
function random_string()
{
    // POSSIBLE COMBINATIONS > 530MM IF LENGTH IS 6
    //          1...5...10...15...20...25...30.
   $alphabet = "ABCDEFGHJKMNPQRSTUVWXYZ23456789";
   $string   = "";
   while(strlen($string) < ARG_LENGTH)
   {
       $string .= substr($alphabet, mt_rand(0, strlen($alphabet) - 1), 1);
   }
   return($string);
}


// FUNCTION TO ENSURE THE RANDOM STRING IS UNIQUE
function make_random_key()
{
    $rand_key = '';

    // GENERATE A UNIQUE AND RANDOM TOKEN
    while ($rand_key == '')
    {
        $rand_key = random_string(ARG_LENGTH);
        $isql     = "INSERT INTO myTable ( rand_key ) VALUES ( '$rand_key' )";

        // IF QUERY ERROR
        if (!mysql_query($isql))
        {
            $err = mysql_errno();

            // DUPLICATE UNIQUE FIELD ON rand_key
            if ($err == 1062)
            {
                // ACTIVATE THIS TO VISUALIZE KEY COLLISIONS
                // echo PHP_EOL . $rand_key;

                // NULLIFY THIS KEY AND TRY AGAIN
                $rand_key = '';
            }

            // OTHER QUERY ERROR
            else
            {
                /* HANDLE FATAL QUERY ERROR ($isql) */
            }
        }
    }
    return $rand_key;
}


// SHOW HOW TO MAKE LOTS OF UNIQUE AND RANDOM STRINGS
create_myTable();

$kount = 0;
$array = array();
while ($kount < 100)
{
    $array[] = make_random_key();
    $kount++;
}

print_r($array);

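For completeness, a minimal sketch of the "master script" described above. It is not part of the listing; the table short_url with columns rand_key and long_url, the file name go.php and the mysqli credentials are all assumptions:

<?php // go.php - hypothetical redirect script: /go.php?k=AB2C3D sends the client to the stored URL
$db = new mysqli('localhost', 'user', 'password', 'mydb');

$key = isset($_GET['k']) ? $_GET['k'] : '';

// Accept only keys in the expected alphabet and length
if (!preg_match('/^[A-Z2-9]{6}$/', $key)) {
    die('Invalid key');
}

$stmt = $db->prepare('SELECT long_url FROM short_url WHERE rand_key = ? LIMIT 1');
$stmt->bind_param('s', $key);
$stmt->execute();
$stmt->bind_result($longUrl);

if ($stmt->fetch()) {
    header('Location: ' . $longUrl); // redirect the client to the true URL
    exit;
}

header('HTTP/1.0 404 Not Found');
echo 'Unknown short URL';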

 
Slick812 Commented:
greetings greenerpastures, hashes aren't meant to give you back the original. You could use some of the PHP compression functions to reduce the size of the string, but then you'd have to base64-encode the result to use it in a URL, and it would end up near the size of the original URL, so nothing is gained.

This looks like a real-estate listing for a certain home type (price, size, location, etc.), so this must be a "PER USER" list of RSS URLs. The best way to store that is with a PHP SESSION; that's what I would do, if I understand what you have said about this so far. I really do NOT see that you would need any sort of "hash" or random unique URL lookup, since this is a "PER USER" list and would be regenerated if the web page user looked up a homes listing in another price range after they already did one price-range lookup.

<?php
session_start();

// IMPORTANT - the function generateRSS below would be YOUR URL rss link generator NOT MINE
function generateRSS($amt = 20) {
// this is here just to show you some code to use, your link generator would be different
	$reAry = array();
	for ($i=0; $i < $amt; ++$i) {
		$reAry[] = 'http://site1.com/agents/ads.php?XMLFILE=http://www.site2.com/housing/sale/home/-price_0_7500'.$i;
		}
	return $reAry;
	}


$rssResult = generateRSS(6);// YOU need to use your own URL production
// $rssResult is an array here since I do not know your return from your code

// Below is some code to try and show you to store and generate URLS
$storeURLS = array();
for ($i=0; $i < count($rssResult); ++$i) {
// I make a storage array, that has it's elements as= 'home0001'
// this has a maximum of 10,000 links, but can be increased 
	$num = str_pad(''.$i, 4, "0", STR_PAD_LEFT);//0002
	$storeURLS['home'.$num] = $rssResult[$i]; 
	}
// place the array into a session slot
$_SESSION['homes'] = $storeURLS;
echo 'RSS SESSION= ',$_SESSION['homes']['home0003'],' |<br />';

//below is some code to add the URLS to a link, but your code would be different
foreach ($_SESSION['homes'] as $key => $value) {
	echo '<a href="http://site1.com/show?hm=',$key,'">Click for ',$key,'</a><br />';
	}
?>

This will generate links like
"http://site1.com/show?hm=home0003"

http://site1.com is YOUR site, which will fetch the $_GET value for "hm",
then use the session $_SESSION['homes'] to retrieve the long URL, and then you can redirect the page to that URL, as sketched below.
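For illustration, a minimal sketch of what the receiving script behind http://site1.com/show might look like (the file name show.php and the 404 handling are assumptions, not part of the code above):

<?php // show.php - hypothetical receiver for links like /show?hm=home0003
session_start();

$hm = isset($_GET['hm']) ? $_GET['hm'] : '';

if (isset($_SESSION['homes'][$hm])) {
    header('Location: ' . $_SESSION['homes'][$hm]); // redirect to the stored long URL
    exit;
}

header('HTTP/1.0 404 Not Found');
echo 'Listing not found';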

Hope this gives you some ideas; ask questions if you need more info.
 
greenerpastures (Author) Commented:
thanks
 
Eddie Shipman (All-around developer) Commented:
FYI, take a look at this recent PHPMaster article:
http://phpmaster.com/building-your-own-url-shortener/
