Heather Ritchey asked:

Checking https returns 301

I know this is going to be hard for me to explain, so please ask if there's other info that will help clarify.

We have a tool here: http://www.getsearchified.com/ where you enter the URL of any site and it scans a few things to return SEO recommendations. The problem is that it automatically checks http:// instead of using what you entered, so you can't choose between checking an http:// site and an https:// site. The tool fails if it doesn't get a 200 response, so on an https site it sees a 301 because it looks at the site as http:// first. The tool was built a really long time ago, so I think it just wasn't fully thought out for https sites, since there weren't many SSL sites yet. We can't even remember the programmer we hired to get it going, so we can't reach out to them for advice.

The tool was built using Smarty, so I "think" the code that needs editing is in the two attached files.

Any help on this would be great.
seoinspector.php
easywebfetch.php

Ray Paseur:

Please post the code for the is_subdomain() function, and any other dependencies you know of, thanks.

Also a good design strategy, going forward, is to avoid the use of private visibility.  When you have private methods or properties, it makes it impossible to extend the class effectively.  You can only add to it, not replace functionality, and with private properties, you can't "see" the data the object is using.
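
To make the visibility point concrete, here is a minimal sketch. All four classes are hypothetical and exist only for this illustration; none of them is part of EasyWebFetch. It shows that a subclass method with the same name does not replace a private method, while a protected method can be overridden.

```php
<?php
// Hypothetical classes for illustration only; none of these exist in EasyWebFetch.

class PrivateFetcher
{
    // Private: calls made from inside this class always resolve to this method.
    private function port()
    {
        return 80;
    }

    public function describe()
    {
        return 'port ' . $this->port();
    }
}

class SecurePrivateFetcher extends PrivateFetcher
{
    // This adds a new method; it does not replace PrivateFetcher::port().
    public function port()
    {
        return 443;
    }
}

class ProtectedFetcher
{
    // Protected: subclasses may override this method.
    protected function port()
    {
        return 80;
    }

    public function describe()
    {
        return 'port ' . $this->port();
    }
}

class SecureProtectedFetcher extends ProtectedFetcher
{
    protected function port()
    {
        return 443;
    }
}

$a = new SecurePrivateFetcher();
$b = new SecureProtectedFetcher();

echo $a->describe(), PHP_EOL; // the parent's private port() is still used
echo $b->describe(), PHP_EOL; // the subclass override is used
```

With private visibility, the only way to change what describe() reports is to edit the original class, which is the limitation described above.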

Heather Ritchey (ASKER):

That function is in the attached file. Hopefully it also has the others that may need to be seen. Appreciate you taking a look. Like I said it was made a very long time ago and by a smarty programmer that we can no longer reach.
extensions.php
See if this works for you.  I don't have exhaustive testing tools, but I made a change near line 202 in the EasyWebFetch class to incorporate HTTPS.  I would also add that if we were building this today, we would probably add a dependency on cURL - it abstracts away a lot of the issues like the redirects and generally makes the code package smaller and easier to test.  (I think private methods and properties work against any testing strategies, too).

EasyWebFetch may throw a Notice because of an undefined index.  I did not try to address that.
[17-Mar-2017 15:56:57 America/Chicago] PHP Notice:  Undefined index: transfer_encoding in /home/iconoun/public_html/demo/easywebfetch.php on line 106
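
For reference, a Notice like this one usually goes away once the code checks that the header exists before reading it. A minimal sketch, assuming the header array has the shape produced by parseHeaders() (lower-cased keys with "-" replaced by "_"); header_value() is a hypothetical helper, not part of the class:

```php
<?php
// Hypothetical helper, not part of EasyWebFetch: read a parsed header without
// triggering "Undefined index" when the server did not send it.
function header_value(array $headers, $name, $default = null)
{
    // parseHeaders() stores keys lower-cased with "-" replaced by "_".
    $key = strtolower(str_replace('-', '_', $name));
    return isset($headers[$key]) ? $headers[$key] : $default;
}

$headers = array('content_length' => '313', 'connection' => 'close');

// The header is absent here, so the default (null) is compared and no Notice is raised.
$isChunked = header_value($headers, 'Transfer-Encoding') === 'chunked';
var_dump($isChunked);
```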


Here is a script that will return 200 OK.  It will respond on either HTTP or HTTPS.
http://www.iconoun.com/demo/temp_dzynit_200.php
<?php // demo/temp_dzynit_200.php
/**
 * https://www.experts-exchange.com/questions/29009903/Checking-https-returns-301.html
 *
 * https://en.wikipedia.org/wiki/HTTP_302
 */
error_reporting(E_ALL);


// CREATE OUR WEB PAGE IN HTML5 FORMAT, USING HEREDOC SYNTAX
$htm = <<<HTML5
<!DOCTYPE html>
<html dir="ltr" lang="en-US">
<head>
<meta charset="utf-8" />
<meta name="robots" content="noindex, nofollow" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>HTML5 Page in UTF-8 Encoding</title>
</head>
<body>

<noscript>Your browsing experience will be much better with JavaScript enabled!</noscript>

<p>Hello World</p>

</body>
</html>
HTML5;


// RENDER THE WEB PAGE
echo $htm;



Here is a script that will redirect with 302 to the 200 script on HTTPS.
http://www.iconoun.com/demo/temp_dzynit_302.php
<?php // demo/temp_dzynit_302.php
/**
 * https://www.experts-exchange.com/questions/29009903/Checking-https-returns-301.html
 *
 * https://en.wikipedia.org/wiki/HTTP_302
 */
error_reporting(E_ALL);


// A URL THAT WILL NOT REDIRECT
$url = 'https://www.iconoun.com/demo/temp_dzynit_200.php';


header('HTTP/1.1 302 Found');
header("Location: $url");
exit;



Here is a script that will make an automated call to the 302 script, triggering the redirection.
https://www.iconoun.com/demo/temp_dzynit.php
<?php // demo/temp_dzynit.php
/**
 * https://www.experts-exchange.com/questions/29009903/Checking-https-returns-301.html
 */
error_reporting(E_ALL);
require_once('easywebfetch.php');


// A URL THAT WILL REDIRECT TO HTTPS
$url = 'http://iconoun.com/demo/temp_dzynit_302.php';
echo PHP_EOL . $url;


/** ACTIVATE THIS TO PROVE THAT THE 302 REDIRECT WORKS
$htm = file_get_contents($url);
echo htmlentities($htm);
**/


$ewf = new EasyWebFetch();
$ewf->get($url);
echo '<pre>';
print_r($ewf);



Here is the EasyWebFetch with my changes near line 202.
<?php
/*
 * EasyWebFetch - Fetch a page by opening socket connection, no dependencies
 *
 * PHP version 5
 *
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program.  If not, see <http://www.gnu.org/licenses/>.
 *
 * @author    Nashruddin Amin <me@nashruddin.com>
 * @copyright Nashruddin Amin 2008
 * @license   GNU General Public License 3.0
 * @package   EasyWebFetch
 * @version   1.1
 */

class EasyWebFetch
{
    private $_request_url;
    private $_host;
    private $_path;
    private $_query;
    private $_fragment;
    private $_headers_only;
    private $_portnum       = 80;
    private $_user_agent    = "SimpleHttpClient/1.0";
    private $_req_timeout   = 120;
    private $_maxredirs     = 5;
    private $_extra_headers = null;

    private $_use_proxy     = false;
    private $_proxy_host;
    private $_proxy_port;
    private $_proxy_user;
    private $_proxy_pass;

    private $_status;
    private $_resp_headers;
    private $_resp_body;

    private $_is_error;
    private $_errmsg;

    /**
     * class constructor
     */
    public function __construct()
    {
        $this->_resp_headers = array();
        $this->_resp_body    = "";
    }

    /**
     * get the requested page
     *
     * @param string  $url          URL of the requested page
     * @param boolean $headers_only true to return headers only,
     *                              false to return headers and body
     *
     * @return  boolean true on success, false on failure
     */
    public function get($url = '', $headers_only = false)
    {
        $this->_request_url  = $url;
        $this->_headers_only = $headers_only;

        $redir = 0;

        while(($redir++) <= $this->_maxredirs) {
            $this->parseUrl($this->_request_url);

            if (($response = @$this->makeRequest()) == false) {
                return(false);
            }

            /* split head and body */
            $neck = strpos($response, "\r\n\r\n");
            $head = substr($response, 0, $neck);
            $body = substr($response, $neck+2);

            /* read response headers */
            $this->_resp_headers = $this->parseHeaders($head);

            /* check for redirects */
            if ($this->getStatus() == 301 || $this->getStatus() == 302) {
                $follow = $this->_resp_headers['location'];
                $this->_request_url = $this->setFullPath($follow, $this->_request_url);
                continue;
            } else {
                /* no redirects, start reading response body */
                break;
            }
        }

        /* read the body part */
        if ($this->_resp_headers['transfer_encoding'] == 'chunked') {
            $this->_resp_body = $this->joinChunks($body);
        } else {
            $this->_resp_body = $body;
        }

        return(true);
    }

    /**
     * build HTTP header and perform HTTP request
     *
     * @return  mixed   HTTP response on success, false on failure
     */
    private function makeRequest()
    {
        $method     = ($this->_headers_only == true) ? "HEAD" : "GET";
        $proxy_auth = base64_encode("$this->_proxy_user:$this->_proxy_pass");
        $response   = "";

        if ($this->_use_proxy) {
            $headers = "$method $this->_request_url HTTP/1.1\r\n"
                     . "Host: $this->_host\r\n"
                     . "Proxy-Authorization: Basic $proxy_auth\r\n"
                     . "User-Agent: $this->_user_agent\r\n"
                     . "Connection: Close\r\n";
            if ($this->_extra_headers) {
                foreach ($this->_extra_headers as $header) $headers .= $header . "\r\n";
            }
            $headers .= "\r\n";

            $fp = fsockopen($this->_proxy_host, $this->_proxy_port, $errno, $errmsg, $this->_req_timeout);
        } else {
            $headers = "$method $this->_path$this->_query$this->_fragment HTTP/1.1\r\n"
                     . "Host: $this->_host\r\n"
                     . "User-Agent: $this->_user_agent\r\n"
                     . "Connection: Close\r\n";
            if ($this->_extra_headers) {
                foreach ($this->_extra_headers as $header) $headers .= $header . "\r\n";
            }
            $headers .= "\r\n";

            $fp = fsockopen($this->_host, $this->_portnum, $errno, $errmsg, $this->_req_timeout);
        }

        if (!$fp) {
            $this->_is_error = true;
            $this->_errmsg   = "Unknown error";
            return(false);
        }
        fwrite($fp, $headers);

        while(!feof($fp)) {
            $response .= fgets($fp, 4096);
        }
        fclose($fp);

        return($response);
    }

    /**
     * parse the requested URL to its host, path, query and fragment
     *
     * @return void
     */
    private function parseUrl($url)
    {
        $url = str_replace('http:///','http://', $url); //Heather
        $this->_host     = parse_url($url, PHP_URL_HOST);
        $this->_path     = parse_url($url, PHP_URL_PATH);
        $this->_query    = parse_url($url, PHP_URL_QUERY);
        $this->_fragment = parse_url($url, PHP_URL_FRAGMENT);

        if (empty($this->_path)) {
            $this->_path = '/';
        }
    }

    /**
     * get the full path of the page to redirect. if the requested page is
     * http://www.example.com and it redirects to redirpage.html, then the
     * new request is http://www.example.com/redirpage.html
     *
     * @param string $loc           new location from the HTTP response headers
     * @param string $parent_url    the parent's URL
     *
     * @return string  full path of the page to redirect
     */
    private function setFullPath($loc, $parent_url)
    {
        $parent_url = preg_replace("/\/[^\/]*$/", "", $parent_url);

        if (strpos($loc, 'http://') !== false) {
            return($loc);
        }

        if (strpos($loc, 'https://') !== false) {
            return($loc);
        }

        if (strpos($loc, '../') === false) {
            return("$parent_url/$loc");
        }

        while (strpos($loc, '../') !== false) {
            $loc        = preg_replace("/^\.\.\//", "", $loc);
            $parent_url = preg_replace("/\/[^\/]+$/", "", $parent_url);
        }

        return("$parent_url/$loc");
    }

    /**
     * parse HTTP response headers to array
     *
     * @param string $string HTTP response headers
     *
     * @return array
     */
    private function parseHeaders($string)
    {
        $string  = trim($string);
        $headers = array();

        $lines = explode("\r\n", $string);

        $headers['http_status'] = $lines[0];

        /* read HTTP _status in first line */
        preg_match('/HTTP\/(\\d\\.\\d)\\s*(\\d+)\\s*(.*)/', $lines[0], $m);
        $this->_status = $m[2];

        array_splice($lines, 0, 1); /* remove first line */

        foreach ($lines as $line) {
            list($key, $val) = explode(': ', $line);

            $key = str_replace("-", "_", $key);
            $key = strtolower($key);
            $val = trim($val);

            $headers[$key] = $val;
        }
        return($headers);
    }

    /**
     * join parts of the HTTP response body with chunked transfer-encoding
     *
     * @param string $chunks HTTP response body
     *
     * @return string full body
     */
    private function joinChunks($chunks)
    {
        preg_match("/\r\n([0-9a-z]+)(;?.*)\r\n/", $chunks, $match);
        $size = hexdec($match[1]);

        $body = "";

        while($size > 0) {
            /* remove line with chunk size */
            $chunks = preg_replace("/\r\n.+\r\n/m", "", $chunks, 1);

            $part   = substr($chunks, 0, $size);
            $chunks = substr($chunks, $size);

            $body .= $part;

            /* get next chunk size */
            preg_match("/\r\n([0-9a-z]+)(;?.*)\r\n/", $chunks, $match);
            $size = hexdec($match[1]);
        }
        return($body);
    }

    /**
     * sets the maximum timeout (in seconds)
     *
     * @param int $seconds the maximum connection timeout in seconds
     */
    public function setTimeout($seconds) {
        $this->_req_timeout = $seconds;
    }

    /**
     * set the requested URL
     *
     * @param string $url URL of the requested page
     */
    public function setRequestUrl($url)
    {
        $this->_request_url = $url;
    }

    /**
     * set to return headers only
     *
     * @param boolean $headers_only true to return headers only,
     *                              false to return headers and body
     */
    public function returnHeadersOnly($headers_only)
    {
        $this->_headers_only = $headers_only;
    }

    /**
     * set proxy host and port
     *
     * @param string $hostport proxy host and proxy port in format proxy_host:proxy_port
     */
    public function setProxyHost($hostport)
    {
        list($this->_proxy_host, $this->_proxy_port) = explode(':', $hostport);
        $this->_use_proxy = true;
    }

    /**
     * Set a custom UserAgent for the request
     *
     * @param string $useragent Check out http://www.useragentstring.com/pages/useragentstring.php for a list of valid user agents.
     */
    public function setUserAgent($useragent) {
        $this->_user_agent = $useragent;
    }

    /**
     * Set any additional http request headers
     * such as "Accept-Encoding: gzip, deflate"
     *
     * @param string|array One or more header lines
     */
    public function setExtraHeaders($headers) {
        if ( is_array($headers) ) {
            $this->_extra_headers = $headers;
        } else {
            $this->_extra_headers[] = $headers;
        }
    }

    /**
     * set proxy user and password
     *
     * @param string $userpass proxy user and password in format proxy_user:proxy_password
     */
    public function setProxyUser($userpass)
    {
        list($this->_proxy_user, $this->_proxy_pass) = explode(':', $userpass);
    }

    /**
     * get the HTTP response status (200, 404, etc)
     *
     * @return string
     */
    public function getStatus()
    {
        return($this->_status);
    }

    /**
     * get the requested URL
     *
     * @return string
     */
    public function getRequestUrl()
    {
        return($this->_request_url);
    }

    /**
     * set maximum redirects
     *
     * @param int $maxredirs
     */
    public function setMaxRedirs($maxredirs)
    {
        $this->_maxredirs = $maxredirs;
    }

    /**
     * get HTTP response headers
     *
     * @param string $header Set this param if you want an individual header. cAsE-inSENSiTivE
     * @return array|string
     */
    public function getHeaders($header = null)
    {
        if ($header) {
            foreach (array_keys($this->_resp_headers) as $array_key ) {
                if ($array_key === strtolower($header) ) return( $this->_resp_headers[$array_key] ) ;
            }
        } else {
            return($this->_resp_headers);
        }
    }

    /**
     * get the HTTP response body, usually in HTML
     *
     * @return string
     */
    public function getContents()
    {
        return($this->_resp_body);
    }

    /**
     * get error message
     *
     * @return string
     */
    public function getErrorMessage()
    {
        return($this->_errmsg);
    }

    /**
     * print debug information
     */
    private function debug($text)
    {
        print "$text\n";
    }
}
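
A side note on the https test in setFullPath(): because it uses strpos(...) !== false, it matches "https://" anywhere in the Location value, not only at the start. A slightly stricter variant is sketched below. The helper name is hypothetical, and this is one possible approach, not the code that was tested above.

```php
<?php
// Hypothetical helper, not part of the posted class: decide whether a
// Location value is already an absolute http(s) URL.
function is_absolute_http_url($loc)
{
    // Anchored at the start and case-insensitive, so "HTTPS://..." matches
    // and "/login?next=http://example.com/" does not.
    return preg_match('#^https?://#i', $loc) === 1;
}

var_dump(is_absolute_http_url('https://www.example.com/'));        // absolute URL
var_dump(is_absolute_http_url('/login?next=http://example.com/')); // relative path with a URL in the query
var_dump(is_absolute_http_url('redirpage.html'));                  // relative path
```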


Darn. Still doesn't work.
It seems like somewhere it's forcing http instead of just accepting what's in the input field, but I can't find where. It would help if at least a full URL, whether http or https, would work. I could then change the wording to say to enter the full URL until I can code a check to find out whether the site uses SSL.
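
A rough sketch of what "accept what's in the input field" could look like. The function name and the choice of http:// as the default are assumptions made for this example; this is not code from the tool.

```php
<?php
// Hypothetical helper: the function name and the http:// default are
// assumptions, not code from the tool.
function normalize_input_url($input)
{
    $input = trim($input);
    if ($input === '') {
        return '';
    }
    // Keep an explicit http:// or https:// scheme exactly as entered.
    if (preg_match('#^https?://#i', $input) === 1) {
        return $input;
    }
    // No scheme given: fall back to http:// and let redirects decide.
    return 'http://' . ltrim($input, '/');
}

echo normalize_input_url('https://www.securitymailbox.com/'), PHP_EOL;
echo normalize_input_url('www.example.com'), PHP_EOL;
```

This only helps if the code that makes the request also honors the scheme; a URL that keeps its https:// prefix but is still fetched over plain HTTP would behave the same as before.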
What do you mean by "doesn't work?"  Surely there must be a symptom you can tell us about! My tests worked perfectly - you have the links posted above along with the scripts, so you can see how I was testing.  What URL are you testing?
When you try it here: http://www.getsearchified.com/
You can enter https://www.securitymailbox.com/ in the input box and run it, but it still gives the redirect notice with status 301 and won't complete.
If you put this one in: http://www.nationaloff-roadjeepassociation.com/ you can see it brings back results.

Somehow, somewhere, it seems to be forcing the tool to check sites using only http, and then it won't complete when it gets the 301 status that comes back when the site automatically forces https. If I could at least get it to continue running even when it gets a 301, it would be great.
I think you're seeing a different kind of event.  The Security Mailbox site sets a header 301 (not the 302, the 301 is a "moved permanently" header), but then (surprise!) the site also produces an HTML document instead of redirecting the browser.  This is kind of odd behavior.  This appears to be something it is doing in lieu of a header("Location...").  The request does, in fact, complete.  Here is the HTML document it produced.  Somehow the EasyWebFetch is being frustrated by the Security Mailbox site.
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://www.securitymailbox.com/">here</a>.</p>
<hr>
<address>Apache Server at www.securitymailbox.com Port 80</address>
</body></html>


Here is the complete output from my script
https://www.securitymailbox.com/<br />
<b>Notice</b>:  Undefined index: transfer_encoding in <b>/home/iconoun/public_html/demo/easywebfetch.php</b> on line <b>106</b><br />
<pre>EasyWebFetch Object
(
    [_request_url:EasyWebFetch:private] => https://www.securitymailbox.com/
    [_host:EasyWebFetch:private] => www.securitymailbox.com
    [_path:EasyWebFetch:private] => /
    [_query:EasyWebFetch:private] => 
    [_fragment:EasyWebFetch:private] => 
    [_headers_only:EasyWebFetch:private] => 
    [_portnum:EasyWebFetch:private] => 80
    [_user_agent:EasyWebFetch:private] => SimpleHttpClient/1.0
    [_req_timeout:EasyWebFetch:private] => 120
    [_maxredirs:EasyWebFetch:private] => 5
    [_extra_headers:EasyWebFetch:private] => 
    [_use_proxy:EasyWebFetch:private] => 
    [_proxy_host:EasyWebFetch:private] => 
    [_proxy_port:EasyWebFetch:private] => 
    [_proxy_user:EasyWebFetch:private] => 
    [_proxy_pass:EasyWebFetch:private] => 
    [_status:EasyWebFetch:private] => 301
    [_resp_headers:EasyWebFetch:private] => Array
        (
            [http_status] => HTTP/1.1 301 Moved Permanently
            [date] => Fri, 17 Mar 2017 22:56:29 GMT
            [server] => Apache
            [location] => https://www.securitymailbox.com/
            [content_length] => 313
            [connection] => close
            [content_type] => text/html; charset=iso-8859-1
        )

    [_resp_body:EasyWebFetch:private] => 
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://www.securitymailbox.com/">here</a>.</p>
<hr>
<address>Apache Server at www.securitymailbox.com Port 80</address>
</body></html>

    [_is_error:EasyWebFetch:private] => 
    [_errmsg:EasyWebFetch:private] => 
)


When I read the URL https://www.securitymailbox.com/ with PHP file_get_contents(), I get the correct HTML document that represents what you see on your browser when you visit that URL.  It may be time to refactor EasyWebFetch.  But before you incur that expense, try testing with other HTTP-to-HTTPS redirection sites.   I think http://twitter.com might be a good test case.
Testing with Twitter, it looks like the script identifies the location, but has a logic error that prevents it from going to that location.  I'll try a few more tests.
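
One likely candidate for that logic error, judging only from the class as posted: parseUrl() never reads the URL scheme and $_portnum stays at 80, so after a redirect to https:// the next request goes to the same host over plain HTTP on port 80 and gets the same 301 back until the redirect limit runs out. The object dump above is consistent with this (an https:// request URL with _portnum 80). Below is a minimal sketch of picking the transport and port from the scheme. It assumes PHP was built with OpenSSL so that the ssl:// socket transport is available, and socket_target() is a hypothetical helper.

```php
<?php
// Hypothetical helper: returns the host string and port to pass to fsockopen().
function socket_target($url)
{
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    $host   = parse_url($url, PHP_URL_HOST);
    $port   = parse_url($url, PHP_URL_PORT);

    if ($scheme === 'https') {
        // "ssl://" asks PHP to negotiate TLS on the socket (requires OpenSSL).
        return array('ssl://' . $host, $port ? $port : 443);
    }
    return array($host, $port ? $port : 80);
}

list($host, $port) = socket_target('https://www.securitymailbox.com/');
echo $host, ':', $port, PHP_EOL;

// Inside makeRequest() the connection would then look roughly like:
// $fp = fsockopen($host, $port, $errno, $errmsg, $this->_req_timeout);
```

The Host: request header should still carry the bare host name, without the ssl:// prefix.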
This site does the same thing: https://www.sempdx.org, and so does the Twitter URL. There must be a way to change the code somewhere so it won't care what server status it gets.
I don't think EasyWebFetch cares now about the HTTPS part (I haven't posted a new copy here yet, but I will before the end of this exercise).  But it looks like EasyWebFetch has no provision for handling cookies.  I'm not 100% sure this is the issue, but I can envision something like this...

HTTP request received at server
Server looks for a cookie that says, "I've been here before."
If no cookie, server sets the cookie and redirects.

If the client, in this case the EasyWebFetch script, does not accept and return the cookie, that cycle will never come to an end.
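
If cookies do turn out to matter, the client has to collect the Set-Cookie values from each response and send them back on the next request. Here is a rough sketch of that bookkeeping, working on the raw header text (parseHeaders() as posted keeps only one value per header name, so repeated Set-Cookie lines would overwrite each other). Both function names are hypothetical, and cookie attributes such as Path, Domain and Expires are ignored.

```php
<?php
// Hypothetical helpers; attributes (Path, Domain, Expires, ...) are ignored.
function collect_cookies($rawHead, array $jar = array())
{
    foreach (explode("\r\n", $rawHead) as $line) {
        // Header names are case-insensitive.
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue;
        }
        // Keep only the "name=value" pair before the first ";".
        $value    = trim(substr($line, strlen('Set-Cookie:')));
        $segments = explode(';', $value, 2);
        $parts    = explode('=', $segments[0], 2);
        if (count($parts) === 2) {
            $jar[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $jar;
}

function cookie_header(array $jar)
{
    $pairs = array();
    foreach ($jar as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return $pairs ? 'Cookie: ' . implode('; ', $pairs) : '';
}

$head = "HTTP/1.1 302 Found\r\n"
      . "Set-Cookie: seen=1; Path=/\r\n"
      . "Set-Cookie: lang=en\r\n"
      . "Location: /next";

echo cookie_header(collect_cookies($head)), PHP_EOL;
```

The resulting line could be added to the request headers inside the redirect loop, so that each follow-up request carries the cookies set by the previous response.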
What other parts of the EasyWebFetch object do you need besides the content of the web page?  There may be a good way to short-circuit this whole thing if all you need is the HTML document and some parts of the headers.  We can use cURL.  Most PHP installations have it baked in.
http://php.net/manual/en/intro.curl.php
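
As a sketch of what the cURL route might look like: the function name and the shape of the returned array are made up for this example, and the option values are guesses, not tested settings for the tool. cURL takes care of TLS, redirect following and (with the cookie engine switched on) cookies.

```php
<?php
// Sketch only: fetch_with_curl() and its return shape are made up for this
// example. Requires the cURL extension.
function fetch_with_curl($url, $maxRedirs = 5, $timeout = 30)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,       // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,       // follow 3xx Location headers
        CURLOPT_MAXREDIRS      => $maxRedirs,
        CURLOPT_TIMEOUT        => $timeout,
        CURLOPT_COOKIEFILE     => '',         // empty name: in-memory cookie handling
        CURLOPT_USERAGENT      => 'SimpleHttpClient/1.0',
    ));

    $body = curl_exec($ch);

    return array(
        'ok'        => ($body !== false),
        'status'    => (int) curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'body'      => ($body === false) ? '' : $body,
        'error'     => curl_error($ch),
    );
}

// Example call (needs network access):
// $r = fetch_with_curl('http://twitter.com');
// echo $r['status'], ' ', $r['final_url'], PHP_EOL;
```

The 'status' value is the status of the last response in the chain, so a site that redirects from http to https should report 200 here, not 301, provided the final page loads.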

You can use this script, shown here in its entirety, to find out if you have this extension.  Scan the output for "curl" to see.
<?php phpinfo();

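
The same check can also be done in code, without scanning the phpinfo() page. A small sketch:

```php
<?php
// Same check without reading the phpinfo() page.
$hasCurl = extension_loaded('curl') && function_exists('curl_init');
echo $hasCurl ? 'cURL is available' : 'cURL is not available', PHP_EOL;
```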

Honestly, I'm not sure how involved EasyWebFetch is with the tool. We do need to check some things in the header and content areas, and it also checks for a couple of root files that are involved in SEO (robots and sitemap). It's basically meant as a user-friendly shortcut for people to get an SEO review of their (or a customer's) site. We have two versions. The one at the getsearchified site that I'm using for help here is a shortened version. It shows only a small portion of the results the tool can produce, and it is also the one used for the mobile app. We also have a paid version that has a whole lot more, but we're doing away with the paid one and slowly adding more of the results into the free version. We didn't want them to all of a sudden show up on an upgrade and confuse the heck out of the people using the plugin.

Does that help give a better idea of what it's doing and why the oddness for the https sites not being able to show results is what I'm trying to conquer?
Yes, that helps.  I'm going to call it a night.  Will look at this again in the daylight.  FWIW, I am getting consistent good results using cURL.  

Do you have the cURL library?
<?php phpinfo();


Yes, we do have the cURL library. But it's important to make sure it works for anyone who uses the plugin. The only thing we won't provide support for is Windows servers, simply because I do not have access to one to consistently test and program on.
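
For installs where cURL is missing, one possible fallback is PHP's HTTP stream wrapper, the same mechanism behind the file_get_contents() test mentioned earlier. It follows redirects by default, but it needs allow_url_fopen to be on and OpenSSL for https URLs. The helper names below are hypothetical and the option values are assumptions. The wrapper exposes the header lines of every hop in $http_response_header, so the final status is on the last line that starts with "HTTP/".

```php
<?php
// Hypothetical helpers for a no-cURL fallback.

// Return the status code of the last response in a redirect chain.
function final_status(array $headerLines)
{
    $status = 0;
    foreach ($headerLines as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

function fetch_with_streams($url, $maxRedirs = 5, $timeout = 30)
{
    if (!ini_get('allow_url_fopen')) {
        return false; // URL wrappers are disabled on this server
    }
    $context = stream_context_create(array('http' => array(
        'method'        => 'GET',
        'max_redirects' => $maxRedirs + 1, // a value of 1 or less disables redirect following
        'timeout'       => $timeout,
        'ignore_errors' => true,           // still return the body on 4xx/5xx
        'user_agent'    => 'SimpleHttpClient/1.0',
    )));
    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        return false;
    }
    // $http_response_header is populated in this scope by the http wrapper.
    return array('status' => final_status($http_response_header), 'body' => $body);
}

// Offline demonstration of the status parsing with made-up header lines.
$lines = array(
    'HTTP/1.1 301 Moved Permanently',
    'Location: https://www.example.com/',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
);
echo final_status($lines), PHP_EOL;
```

A plugin could try cURL first and fall back to this only when the extension is not loaded.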

And yeah, it's getting late. I appreciate your help so very much! Weekend is finally here - I plan to sleep in ;) I'll check back in tomorrow after my usual morning routine.

Thanks again and have a good night.
I don't know why I didn't think of it, but you can see all the files and everything by installing the plugin on a WordPress site. I guess I didn't think of it because what's at getsearchified is a bare-bones, one-tool-only setup meant to run the mobile apps.

The plugin is here: https://wordpress.org/plugins/seo-automatic-seo-tools/
It includes other tools too though, so you could just ask me if you're unsure where to look for anything pertaining to the url-checker tool itself.
Thanks for the tip on WP.  I haven't got a lot of interest in going down that rabbit hole right now, but I'll take a look at the ways you're using EasyWebFetch and see if there is a way to modernize it.  I would say it's not very good code by 2017 standards, but it's amazingly good code for 2008 standards.  Times change...
ASKER CERTIFIED SOLUTION
Ray Paseur

This solution is only available to members of Experts Exchange.
I apologize for the delay. Had some urgent matters this past week. I'll talk with my boss when he's available and see what course he'd like to take. It's a well used tool, so I know he's going to want to try to do something. Thank you very much for putting time into testing and helping.
Depending on what we choose to do, I'll open another question and reference this one so we remember everything we've tried and discussed.
Thanks for the points.  If you want me to run this to ground and work out the issues for a new EasyWebFetch, just let me know.  I do consulting, my rates are reasonable and I know PHP very, very well!  You can contact me via Ray.Paseur [at] Gmail.com (It's in my profile here at E-E) if you want to take a proposal to your boss.

All the best,
Ray