Asked by EffinGood

Can you explain this PHP Snippet?

Hello experts,

I'm working with a bit of code that was passed down to me, and I can't quite parse it well enough to fully understand it. It's a crawler - so it's doing a lot of matching.

Could someone run through this and give me an explanation of what this is doing?

I have written some PHP, but some of this is new to me, particularly starting with
 if($linksInArray[$Counter] == "" || $linksInArray[$Counter] == "#")
    continue;

Thank you!

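function get_a_href($url){
    // Strip any markup from the URL and encode special characters.
    $url = htmlentities(strip_tags($url));

    // Split on '/'. For "http://host/path", element 1 is the empty string
    // between the two slashes and element 2 is the host name.
    $ExplodeUrlInArray = explode('/',$url);
    $SubDomainName = $ExplodeUrlInArray[1]; // not used below
    $DomainName = $ExplodeUrlInArray[2];

    // Start with empty result arrays so the return value is always defined.
    $InternalDomainsInArray = array();
    $ExternalDomainsInArray = array();
    $InternalLinkCount = 0;
    $ExternalLinkCount = 0;

    // Fetch the page. The @ suppresses the warning if the request fails.
    $file = @file_get_contents($url);
    if($file === false)
    {
        return array(
            'ExternalLinks'=>$ExternalDomainsInArray,
            'InternalLinks'=>$InternalDomainsInArray
        );
    }

    // Capture every href value; $patterns[2] holds the text between the quotes.
    $h1count = preg_match_all('/(href=["\'])(.*?)(["\'])/i',$file,$patterns);
    $linksInArray = $patterns[2];
    $CountOfLinks = count($linksInArray);

    for($Counter=0;$Counter<$CountOfLinks;$Counter++)
    {
        // Skip empty hrefs and bare "#" anchors.
        if($linksInArray[$Counter] == "" || $linksInArray[$Counter] == "#")
            continue;

        // Skip javascript: pseudo-links.
        preg_match('/javascript:/', $linksInArray[$Counter],$CheckJavascriptLink);
        if($CheckJavascriptLink != NULL)
            continue;

        // Drop the query string, if any, keeping only the part before '?'.
        $Link = $linksInArray[$Counter];
        preg_match('/\?/', $linksInArray[$Counter],$CheckForArgumentsInUrl);
        if($CheckForArgumentsInUrl != NULL)
        {
            $ExplodeLink = explode('?',$linksInArray[$Counter]);
            $Link = $ExplodeLink[0];
        }

        // Does the link mention this site's host name? preg_quote() keeps the
        // dots in the host name from acting as regex wildcards.
        preg_match('/'.preg_quote($DomainName,'/').'/',$Link,$Check);
        if($Check == NULL)
        {
            // No host match: a link with a scheme is external,
            // a link without one is relative and therefore internal.
            preg_match('/https?:\/\//',$Link,$ExternalLinkCheck);
            if($ExternalLinkCheck == NULL)
            {
                $InternalDomainsInArray[$InternalLinkCount] = $Link;
                $InternalLinkCount++;
            }
            else
            {
                $ExternalDomainsInArray[$ExternalLinkCount] = $Link;
                $ExternalLinkCount++;
            }
        }
        else
        {
            // Host name found in the link: internal.
            $InternalDomainsInArray[$InternalLinkCount] = $Link;
            $InternalLinkCount++;
        }
    }

    $LinksResultsInArray = array(
        'ExternalLinks'=>$ExternalDomainsInArray,
        'InternalLinks'=>$InternalDomainsInArray
    );
    return $LinksResultsInArray;
}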
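
For anyone reading along, the first few lines of the function depend on how explode('/', ...) splits a URL, so here is a minimal sketch of that step. The URL is a made-up placeholder, and the parse_url() call at the end is only a suggested built-in alternative, not something the original code uses.

```php
<?php
// Hypothetical URL, used only to illustrate the split.
$url = 'http://www.example.com/products/index.html';

// explode() cuts the string at every '/'. The two slashes after the
// scheme leave an empty element between them.
$parts = explode('/', $url);

// $parts[0] is 'http:'
// $parts[1] is ''  (the empty string between the two slashes)
// $parts[2] is 'www.example.com'
print_r($parts);

// parse_url() is a built-in that returns the host directly, without
// relying on the position of the slashes.
$host = parse_url($url, PHP_URL_HOST);
echo $host, "\n";
```

So in the function, the variable named $SubDomainName ends up holding an empty string for a URL of this shape, and $DomainName holds the full host name.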


SOLUTION
Dan Craciun

This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
You might want to consider applying a coding standard to the script.  If you just indent the control structures in a sensible manner, a lot of the logic will be visible.
EffinGood (ASKER)

Hi Dan,

Thanks man, that # was throwing me off. Couldn't figure that one out! It's looking for an anchor. Check.
To make it easy on myself, especially my future self that may have to change the code, I usually use more parentheses to group the statements to make it clearer what I think I'm doing.
if(($linksInArray[$Counter] == "") || ($linksInArray[$Counter] == "#"))
    continue;
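
A small self-contained sketch of what that test does inside a loop, in case it helps. The href values are invented for illustration. Since == binds more tightly than || in PHP, the extra parentheses make the grouping explicit without changing the result.

```php
<?php
// Hypothetical href values, like the ones the regex might capture.
$links = array('', '#', 'about.html', 'javascript:void(0)', 'http://example.org/');

$kept = array();
foreach ($links as $link) {
    // An empty href or a bare "#" anchor points nowhere useful, so skip it.
    if (($link == "") || ($link == "#")) {
        continue; // jump straight to the next iteration of the loop
    }
    // Only reached when the link was not skipped above.
    $kept[] = $link;
}

// $kept now holds 'about.html', 'javascript:void(0)', 'http://example.org/'
print_r($kept);
```

Without braces, continue is the only statement controlled by the if, which is why consistent indentation matters so much in the original.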


Wow, thank you gentlemen. I wasn't 100% sure on how to break up points on your deeeelish answers. You make a lady feel special. Thank you!
Glad we were able to help!  Thanks for the points and thanks for using EE, ~Ray