intellisource asked:

How can I check, within the conditional PCRE, whether a DIV encloses a form element?

Hi, I've just completed a PCRE as follows to retrieve form elements from a form:
$pattern = '#<(form)(?=(?:[^>]*name="([^"]*))?)(?=(?:[^>]*action="([^"]*))?)' .
           '|<(input)(?=(?:[^>]*type="([^"]*))?)(?=(?:[^>]*name="([^"]*))?)(?=(?:[^>]*value="([^"]*))?)(?=(?:[^>]*checked="([^"]*))?)' .
           '|<(textarea)(?=(?:[^>]*name="([^"]*))?)[^>]*>([^<]|<(?!/textarea))*' .
           '|<(select)(?=(?:[^>]*name="([^"]*))?)' .
           '|<(button)(?=(?:[^>]*name="([^"]*))?)(?=(?:[^>]*value="([^"]*))?)#i';

I wish to post this question as a follow-up to the question available at https://www.experts-exchange.com/questions/27295097/loading-an-external-form-with-fields-and-attributes-into-an-array-via-preg-match-all.html?anchorAnswerId=36497404#a36497404
Now, if an element is within a div (a parent or any other ancestor tag) whose inline style includes "visibility: hidden" or "display: none", how can I detect that (hidden|none|empty string) by adding to the regex of the four element patterns?
For example, when loading the login form on https://lmx.leads360.com/web/Login.aspx, the emailTextBox element is retrieved, but the result gives no clue as to whether it is displayed or hidden.
Array
(
    [0] => Array
        (
            [0] => <form
            [1] => form
            [2] => form1
            [3] => login.aspx
        )

    [1] => Array
        (
            [0] => <input
            [1] => 
            [2] => 
            [3] => 
            [4] => input
            [5] => hidden
            [6] => __VIEWSTATE
            [7] => /wEPDwULLTEwMTAwMDM0ODIPZBYCAgMPZBYCAg0PFgIeC18hSXRlbUNvdW50AgEWAmYPZBYCAgEPFgIeBFRleHQF3gU8ZGl2Pg0KPHA+T24gOC8xOS8wOSB3ZSByZWxlYXNlZCBtaW5vciB1cGRhdGVzIHRvIEV4cHJlc3MuIFBsZWFzZSByZXZpZXcgdGhlIDxhIGhyZWY9Imh0dHA6Ly9sZWFkczM2MC56ZW5kZXNrLmNvbS9mb3J1bXMvMTY0MzYvZW50cmllcy80OTk5OCIgdGFyZ2V0PSJfYmxhbmsiPnJlbGVhc2Ugbm90ZXM8L2E+IGZvciBkZXRhaWxzLiBXZSBhcmUgdmVyeSBpbnRlcmVzdGVkIGluIHlvdXIgZmVlZGJhY2ssIGlmIHlvdSBoYXZlIGFueSBmZWF0dXJlIHJlY29tbWVuZGF0aW9ucywgcGxlYXNlIHBvc3QgdGhlbSBpbiBvdXIgPGEgaHJlZj0iaHR0cDovL2xlYWRzMzYwLnplbmRlc2suY29tL2ZvcnVtcy8xNjQzOS9lbnRyaWVzIiB0YXJnZXQ9Il9ibGFuayI+dXNlciBmb3J1bTwvYT4uPC9wPg0KPHVsPg0KPGxpPldlIG5vdyBvZmZlciA1IHRlbXBsYXRlczogTW9ydGdhZ2UsIERlYnQvTG9hbk1vZCwgSW5zdXJhbmNlIChIb21lL0F1dG8pLCBJbnN1cmFuY2UgKEhlYWx0aC9MaWZlKSwgR2VuZXJpYzwvbGk+DQo8bGk+TmV3IDxhIGhyZWY9Imh0dHBzOi8vbG14LmxlYWRzMzYwLmNvbS9oZWxwIiB0YXJnZXQ9Il9ibGFuayI+aGVscCBmb3J1bXM8L2E+IGFuZCB0aWNrZXRpbmcgc3lzdGVtPC9saT4NCjxsaT5OZXcgZGVkaWNhdGVkIEV4cHJlc3Mgc3VwcG9ydCBwZXJzb24gYW5kIGNoYXQgbm93IGF2YWlsYWJsZSBmb3IgdHJpYWwgYW5kIHBheWluZyBjbGllbnRzPC9saT4NCjwvdWw+DQo8L2Rpdj4NCmRkSh+rfjq3ECYgva0xikqoCQO0DWY=
        )

    [2] => Array
        (
            [0] => <input
            [1] => 
            [2] => 
            [3] => 
            [4] => input
            [5] => hidden
            [6] => __EVENTVALIDATION
            [7] => /wEWBQL7tbDVBgLw0JndDgLi/qahAwKSuuDUCwKCkfPgDOt1y12mZhJ/qB81miBJ4pLiwjLK
        )

    [3] => Array
        (
            [0] => <input
            [1] => 
            [2] => 
            [3] => 
            [4] => input
            [5] => text
            [6] => usernameTextBox
        )

    [4] => Array
        (
            [0] => <input
            [1] => 
            [2] => 
            [3] => 
            [4] => input
            [5] => password
            [6] => passwordTextBox
        )

    [5] => Array
        (
            [0] => <input
            [1] => 
            [2] => 
            [3] => 
            [4] => input
            [5] => text
            [6] => emailTextBox
        )

)

I also need to check for a disabled attribute, or a parent style hiding the field, and append the result to the same element subarray. How would this be done? Here is an example of an element being hidden via a parent's inline CSS:
<div class="dialog" style="position: absolute; visibility: hidden; z-index: 70013; left: 735px; top: 212px;" id="passwordDialog"><div class="header" id="passwordDialog_HeaderSpan">

            Forgot Password
        
</div><div class="content password" id="passwordDialog_InnerSpan">

            <p>Please enter the email address you signed up with and we will send you a new login link.</p>
            <dl>
                <dt>Email:</dt>
                <dd><input type="text" class="bigtextbox" id="emailTextBox" name="emailTextBox" tabindex="0"></dd>
            </dl>
            <div class="buttons submitcancel">
                <a onclick="RequestPasswordReset();" id="submitPasswordRest" class="submit left" tabindex="0">Submit</a>
                <a onclick="passwordDialog.Close();" class="cancel right" tabindex="0">Close</a>
            </div>
        
</div><div class="footer" id="passwordDialog_FooterSpan">

        
</div></div>
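
The disabled half of the question does not depend on any ancestor, so it can probably be handled separately from the hidden check. A minimal sketch, assuming PHP's bundled DOM extension is used instead of the regex; the function name field_flags() and the array keys are hypothetical and not part of the original pattern's output:

```php
<?php
// Sketch only: field_flags() and its array keys are hypothetical names.
function field_flags($html)
{
    $dom = new DOMDocument();
    // Suppress the warnings libxml raises for non-strict, real-world HTML.
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $fields = array();
    // Select the same four element types that the regex patterns target.
    foreach ($xpath->query('//input | //textarea | //select | //button') as $el) {
        $fields[] = array(
            'tag'      => $el->nodeName,
            'name'     => $el->getAttribute('name'),
            // True for both <input disabled> and <input disabled="disabled">.
            'disabled' => $el->hasAttribute('disabled'),
        );
    }
    return $fields;
}

// Made-up markup for illustration.
$sample = '<form><input name="a" disabled><input name="b">'
        . '<select name="c" disabled="disabled"></select></form>';
foreach (field_flags($sample) as $field) {
    echo $field['tag'], ' ', $field['name'], ': ',
         $field['disabled'] ? 'disabled' : 'enabled', PHP_EOL;
}
```

Each entry could then be merged into the matching element subarray by name.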

kaufmed

You won't be able to add to the patterns because PHP regex does not support unbounded lookbehind. Even if it did, trying to fully parse HTML with regex is inadvisable: it gets extremely messy and is most often unreliable. At this point I would say to extract your tags via the regex, then use one of the built-in HTML parsing functions to find your <div> tags, then see if the regex-captured text is a substring of the <div>'s text. I don't know the function names for the HTML parsers offhand, but I'll try to find them.
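
To illustrate the lookbehind limitation mentioned above, here is a small sketch. The pattern and the subject string are made up for the example; the point is only that a lookbehind containing an unbounded quantifier such as [^>]* or .* fails to compile, so preg_match() should return false rather than 0 or 1:

```php
<?php
// Sketch only: the subject and pattern are hypothetical illustrations.
$subject = '<div style="display: none"><input name="emailTextBox"></div>';

// Try to look back over an arbitrary amount of markup for a hiding <div>.
$pattern = '#(?<=<div[^>]*display:\s*none[^>]*>.*)<input#is';

// The @ hides the compile warning; on a compile failure preg_match()
// returns false instead of a match count.
$result = @preg_match($pattern, $subject);

var_dump($result);
```

A false result here means the pattern was rejected, not that nothing matched.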
intellisource (ASKER)

Thanks kaufmed, I would appreciate that. I should perhaps have done this in the first place :P lol
But they say there are many ways to skin a cat. I am only aware of the XML parsing functions, not an HTML DOM. Might this be what you meant?
I browsed php.net and came across the DOM objects. I think this is what you meant; I tried using the loadHTML method as follows, within the regex-driven loop over the forms:
$doc = new DOMDocument();
$doc->loadHTML($form);

But unfortunately I can't see how to check whether a parent div is hidden or not.
This is the function I am using to detect this at the moment. It receives the entire form element and its contents, as well as the name of the element to check, since IDs do not get submitted to the server.
function ishidden($name,$html) {
	$dom = new DOMDocument();
	@$dom->loadHTML($html);
	$divs = $dom->getElementsByTagName("div");
	foreach ($divs as $div) {
		$style = $div->getAttribute("style");
		if (preg_match("/display:none|visibility:hidden/ims",$style)) {
			foreach ($div->childNodes as $child) {
				if ($child->getAttribute("name")==$name) {
					return true;
				}
			}
		}
	}
	return false;
}

This is an interesting question.  Can you tell us why you are doing this?  What is the input and the expected work product?  If we know that, there may be easier ways to skin this cat.
It is to automate laborious work. A whole string of copy-and-paste processes can be eliminated by automating the form submissions into our quicktextpro integrations.
Oh, this function is to be used in reading forms, to emulate form submissions. I've given you the purpose.
Sorry, but I am really not familiar with using these PHP DOM objects... -_- They are quite different from JavaScript's, where I could have loaded the element and from there retrieved its parents until reaching the top level. I don't see how to do this here in PHP...
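
For what it's worth, PHP's DOM nodes do expose a parentNode property, much like JavaScript's. Below is a minimal sketch of the walk described above, assuming only inline styles matter; the function name is_hidden_by_ancestor() and the sample markup are hypothetical:

```php
<?php
// Sketch only: is_hidden_by_ancestor() and the sample markup are hypothetical.
function is_hidden_by_ancestor($name, $html)
{
    $dom = new DOMDocument();
    // Suppress the warnings libxml raises for non-strict, real-world HTML.
    @$dom->loadHTML($html);

    foreach (array('input', 'textarea', 'select', 'button') as $tag) {
        foreach ($dom->getElementsByTagName($tag) as $element) {
            if ($element->getAttribute('name') !== $name) {
                continue;
            }
            // Climb from the element towards the root; the loop stops once
            // parentNode is the document itself rather than an element.
            for ($node = $element->parentNode; $node instanceof DOMElement; $node = $node->parentNode) {
                $style = $node->getAttribute('style');
                // \s* accepts "visibility: hidden" as well as "visibility:hidden".
                if (preg_match('/(?:display\s*:\s*none|visibility\s*:\s*hidden)/i', $style)) {
                    return true;
                }
            }
        }
    }
    return false;
}

$html = '<form name="f"><div style="position: absolute; visibility: hidden;">'
      . '<dl><dd><input type="text" name="emailTextBox"></dd></dl></div>'
      . '<input type="text" name="usernameTextBox"></form>';

var_dump(is_hidden_by_ancestor('emailTextBox', $html));
var_dump(is_hidden_by_ancestor('usernameTextBox', $html));
```

This only looks at inline style attributes on ancestors; rules from stylesheets or changes made by scripts are outside what this sketch can see.
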
I can show you how to find the form elements, if that is any help.

<?php // RAY_temp_intellisource.php
error_reporting(E_ALL);


// SEE http://www.experts-exchange.com/Programming/Languages/Regular_Expressions/Q_27296137.html?cid=1572


// A URL TO TEST WITH
$url = 'https://lmx.leads360.com/web/Login.aspx';

// READ THE GENERATED HTML STRING
$htm = my_curl($url);

// REMOVE THE END-OF-LINE CHARACTERS (CR AND LF, WHATEVER PLATFORM SERVED THE PAGE)
$htm = str_replace(array("\r", "\n"), "", $htm);

// ISOLATE THE FORM
$form   = explode("<form",$htm);
$form   = explode("</form>",$form[1]);
$inputs = explode("<input",$form[0]);

// ISOLATE THE INPUTS TO THE REQUEST
foreach($inputs as $key => $val)
{
    // IDENTIFY THE ACTION SCRIPT
    $action = strpos($val, "action");
    if($action !== false)
    {
        // EXTRACT THE ACTION SCRIPT NAME FROM THE FORM INPUT
        $actstart = strpos($val, "\"", $action+1);
        $actend   = strpos($val, "\"", $actstart+1);
        $posturl  = substr($val, $actstart+1, ($actend-$actstart-1));
        continue;
    }

    // IDENTIFY THE INPUT FIELDS BY NAME AND VALUE PAIRS
    $name = strpos($val, "name");
    if($name !== false)
    {
        // EXTRACT THE NAME FROM THE FORM INPUT
        $namestart = strpos($val, "\"", $name+1);
        $nameend   = strpos($val, "\"", $namestart+1);
        $strname   = substr($val, $namestart+1, ($nameend-$namestart-1));

        // EXTRACT THE VALUE
        $value = strpos($val, "value");
        if($value !== false)
        {
            $valuestart = strpos($val, "\"", $value+1);
            $valueend   = strpos($val, "\"", $valuestart+1);
            $strvalue   = substr($val, $valuestart+1, ($valueend-$valuestart-1));
        }

        // IF NO VALUE
        else
        {
            $strvalue   = NULL;
        }
        // STORE THE PAIR ONLY WHEN THE INPUT ACTUALLY HAS A NAME
        $postdata[$strname] = $strvalue;
    }
}

// SHOW THE WORK PRODUCT
echo "<pre>";
echo PHP_EOL . "THE ACTION SCRIPT URL IS: $posturl";
echo PHP_EOL . "THE REQUEST ARGUMENTS ARE: ";
var_dump($postdata);



// A FUNCTION TO RUN A CURL-GET CLIENT CALL TO A FOREIGN SERVER
function my_curl
( $url
, $timeout=3
, $error_report=TRUE
)
{
    $curl = curl_init();

    // HEADERS AND OPTIONS APPEAR TO BE A FIREFOX BROWSER REFERRED BY GOOGLE
    $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // BROWSERS USUALLY LEAVE BLANK

    // SET THE CURL OPTIONS - SEE http://php.net/manual/en/function.curl-setopt.php
    curl_setopt( $curl, CURLOPT_URL,            $url  );
    curl_setopt( $curl, CURLOPT_USERAGENT,      'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'  );
    curl_setopt( $curl, CURLOPT_HTTPHEADER,     $header  );
    curl_setopt( $curl, CURLOPT_REFERER,        'http://www.google.com'  );
    curl_setopt( $curl, CURLOPT_ENCODING,       'gzip,deflate'  );
    curl_setopt( $curl, CURLOPT_AUTOREFERER,    TRUE  );
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, TRUE  );
    curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, TRUE  );
    curl_setopt( $curl, CURLOPT_TIMEOUT,        $timeout  );

    // RUN THE CURL REQUEST AND GET THE RESULTS
    $htm = curl_exec($curl);

    // ON FAILURE HANDLE ERROR MESSAGE
    if ($htm === FALSE)
    {
        if ($error_report)
        {
            $err = curl_errno($curl);
            $inf = curl_getinfo($curl);
            echo "CURL FAIL: $url TIMEOUT=$timeout, CURL_ERRNO=$err";
            var_dump($inf);
        }
        curl_close($curl);
        return FALSE;
    }

    // ON SUCCESS RETURN XML / HTML STRING
    curl_close($curl);
    return $htm;
}

Well, getting the form elements via PCRE was not too big a problem, just the formatting, which kaufmed helped me with. The issue I am facing now is determining whether a form element is within a div that has the inline style of display: none or visibility: hidden. I am not so sure how to work that out when parsing with the PHP DOM objects (specifically DOMDocument::loadHTML). There does not seem to be an ancestry/parent property as JavaScript has when parsing the DOM, so my mind is rather stuck on this. :(
The issue is merely within this function, which is passed the element name and the form HTML tree:
function ishidden($name,$html) {
        $dom = new DOMDocument();
        @$dom->loadHTML($html);
        $divs = $dom->getElementsByTagName("div");
        foreach ($divs as $div) {
                $style = $div->getAttribute("style");
                if (preg_match("/display:none|visibility:hidden/ims",$style)) {
                        foreach ($div->childNodes as $child) {
                                if ($child->getAttribute("name")==$name) {
                                        return true;
                                }
                        }
                }
        }
        return false;
}
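
One possible rework of the function above, offered as a sketch rather than as the accepted solution: the original only inspects direct children of each div, calls getAttribute() on child nodes that may be text nodes, and its pattern does not allow the space in "visibility: hidden". The version below assumes DOMXPath's ancestor axis is acceptable; the name ishidden_xpath() is hypothetical, and $name is assumed not to contain a double quote:

```php
<?php
// Sketch only: ishidden_xpath() is a hypothetical rework of ishidden().
function ishidden_xpath($name, $html)
{
    $dom = new DOMDocument();
    // Suppress the warnings libxml raises for non-strict, real-world HTML.
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // Every ancestor with a style attribute, at any depth, of any element
    // whose name attribute equals $name (assumed free of double quotes).
    $query = '//*[@name="' . $name . '"]/ancestor::*[@style]';

    foreach ($xpath->query($query) as $ancestor) {
        $style = $ancestor->getAttribute('style');
        // \s* accepts optional whitespace around the colon.
        if (preg_match('/(?:display\s*:\s*none|visibility\s*:\s*hidden)/i', $style)) {
            return true;
        }
    }
    return false;
}

// Made-up markup modelled on the password dialog shown in the question.
$form = '<form><div class="dialog" style="position: absolute; visibility: hidden;">'
      . '<div class="content"><dl><dd><input type="text" name="emailTextBox"></dd></dl></div>'
      . '</div><input type="password" name="passwordTextBox"></form>';

var_dump(ishidden_xpath('emailTextBox', $form));
var_dump(ishidden_xpath('passwordTextBox', $form));
```

Because the ancestor axis covers every enclosing element, this also catches a hiding style on tags other than div.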

ASKER CERTIFIED SOLUTION
intellisource
I have decided to go with the PHP Simple HTML DOM Parser, linked in this post, to resolve the actual problem. ;)