• Status: Solved
  • Priority: Medium
  • Security: Public

How to check whether a DIV encloses a form element within the conditional PCRE?

Hi, I've just completed the following PCRE to retrieve form elements from a form:
$pattern = '#<(form)(?=(?:[^>]*name="([^"]*))?)(?=(?:[^>]*action="([^"]*))?)' .
           '|<(input)(?=(?:[^>]*type="([^"]*))?)(?=(?:[^>]*name="([^"]*))?)(?=(?:[^>]*value="([^"]*))?)(?=(?:[^>]*checked="([^"]*))?)' .
           '|<(textarea)(?=(?:[^>]*name="([^"]*))?)[^>]*>([^<]|<(?!/textarea))*' .
           '|<(select)(?=(?:[^>]*name="([^"]*))?)' .
           '|<(button)(?=(?:[^>]*name="([^"]*))?)(?=(?:[^>]*value="([^"]*))?)#i';

I am posting this as a follow-up to the question at http://www.experts-exchange.com/Programming/Languages/Regular_Expressions/Q_27295097.html#a36497404
If an element sits inside a div (a parent or any other ancestor tag) whose inline style includes "visibility: hidden" or "display: none", how can I detect that (hidden|none|empty string) by adding to the regex of the four element patterns?
For example, when loading the login form on https://lmx.leads360.com/web/Login.aspx, the emailTextBox element is retrieved, but the result gives no clue as to whether it is displayed or hidden.
Array
(
    [0] => Array
        (
            [0] => <form
            [1] => form
            [2] => form1
            [3] => login.aspx
        )

    [1] => Array
        (
            [0] => <input
            [1] => 
            [2] => 
            [3] => 
            [4] => input
            [5] => hidden
            [6] => __VIEWSTATE
            [7] => /wEPDwULLTEwMTAwMDM0ODIPZBYCAgMPZBYCAg0PFgIeC18hSXRlbUNvdW50AgEWAmYPZBYCAgEPFgIeBFRleHQF3gU8ZGl2Pg0KPHA+T24gOC8xOS8wOSB3ZSByZWxlYXNlZCBtaW5vciB1cGRhdGVzIHRvIEV4cHJlc3MuIFBsZWFzZSByZXZpZXcgdGhlIDxhIGhyZWY9Imh0dHA6Ly9sZWFkczM2MC56ZW5kZXNrLmNvbS9mb3J1bXMvMTY0MzYvZW50cmllcy80OTk5OCIgdGFyZ2V0PSJfYmxhbmsiPnJlbGVhc2Ugbm90ZXM8L2E+IGZvciBkZXRhaWxzLiBXZSBhcmUgdmVyeSBpbnRlcmVzdGVkIGluIHlvdXIgZmVlZGJhY2ssIGlmIHlvdSBoYXZlIGFueSBmZWF0dXJlIHJlY29tbWVuZGF0aW9ucywgcGxlYXNlIHBvc3QgdGhlbSBpbiBvdXIgPGEgaHJlZj0iaHR0cDovL2xlYWRzMzYwLnplbmRlc2suY29tL2ZvcnVtcy8xNjQzOS9lbnRyaWVzIiB0YXJnZXQ9Il9ibGFuayI+dXNlciBmb3J1bTwvYT4uPC9wPg0KPHVsPg0KPGxpPldlIG5vdyBvZmZlciA1IHRlbXBsYXRlczogTW9ydGdhZ2UsIERlYnQvTG9hbk1vZCwgSW5zdXJhbmNlIChIb21lL0F1dG8pLCBJbnN1cmFuY2UgKEhlYWx0aC9MaWZlKSwgR2VuZXJpYzwvbGk+DQo8bGk+TmV3IDxhIGhyZWY9Imh0dHBzOi8vbG14LmxlYWRzMzYwLmNvbS9oZWxwIiB0YXJnZXQ9Il9ibGFuayI+aGVscCBmb3J1bXM8L2E+IGFuZCB0aWNrZXRpbmcgc3lzdGVtPC9saT4NCjxsaT5OZXcgZGVkaWNhdGVkIEV4cHJlc3Mgc3VwcG9ydCBwZXJzb24gYW5kIGNoYXQgbm93IGF2YWlsYWJsZSBmb3IgdHJpYWwgYW5kIHBheWluZyBjbGllbnRzPC9saT4NCjwvdWw+DQo8L2Rpdj4NCmRkSh+rfjq3ECYgva0xikqoCQO0DWY=
        )

    [2] => Array
        (
            [0] => <input
            [1] => 
            [2] => 
            [3] => 
            [4] => input
            [5] => hidden
            [6] => __EVENTVALIDATION
            [7] => /wEWBQL7tbDVBgLw0JndDgLi/qahAwKSuuDUCwKCkfPgDOt1y12mZhJ/qB81miBJ4pLiwjLK
        )

    [3] => Array
        (
            [0] => <input
            [1] => 
            [2] => 
            [3] => 
            [4] => input
            [5] => text
            [6] => usernameTextBox
        )

    [4] => Array
        (
            [0] => <input
            [1] => 
            [2] => 
            [3] => 
            [4] => input
            [5] => password
            [6] => passwordTextBox
        )

    [5] => Array
        (
            [0] => <input
            [1] => 
            [2] => 
            [3] => 
            [4] => input
            [5] => text
            [6] => emailTextBox
        )

)

I also need to check for a disabled attribute, or a parent style hiding the field, and append the result to the same element subarray. How would this be done? Here is an example of an element hidden via a parent's inline CSS:
<div class="dialog" style="position: absolute; visibility: hidden; z-index: 70013; left: 735px; top: 212px;" id="passwordDialog"><div class="header" id="passwordDialog_HeaderSpan">

            Forgot Password
        
</div><div class="content password" id="passwordDialog_InnerSpan">

            <p>Please enter the email address you signed up with and we will send you a new login link.</p>
            <dl>
                <dt>Email:</dt>
                <dd><input type="text" class="bigtextbox" id="emailTextBox" name="emailTextBox" tabindex="0"></dd>
            </dl>
            <div class="buttons submitcancel">
                <a onclick="RequestPasswordReset();" id="submitPasswordRest" class="submit left" tabindex="0">Submit</a>
                <a onclick="passwordDialog.Close();" class="cancel right" tabindex="0">Close</a>
            </div>
        
</div><div class="footer" id="passwordDialog_FooterSpan">

        
</div></div>

Asked by: intellisource

käµfm³d 👽 commented:
You won't be able to add to the patterns because PHP regex does not support unbounded lookbehind. Even if it did, trying to fully parse HTML with regex is inadvisable--it gets extremely messy and is most often unreliable. At this point I would say to extract your tags via the regex, then use one of the built-in HTML parsing functions to find your <div> tags, then see if the regex-captured text is a substring of the <div>'s text. I don't know the function names for the HTML parsers off-hand, but I'll try to find them.
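
The lookbehind limitation mentioned above can be checked directly. The following is only a sketch: the pattern and the sample HTML are made up for illustration. It tries to express "an <input preceded at any distance by a hidden <div>" as a lookbehind, which PCRE refuses to compile because the lookbehind contains unbounded quantifiers.

```php
<?php
// Hypothetical pattern: "<input" preceded, at any distance, by a hidden <div>.
// The lookbehind uses unbounded quantifiers (*), so PCRE rejects it at compile time.
$pattern = '#(?<=<div[^>]*visibility:\s*hidden[^>]*>.*)<input#is';
$html    = '<div style="visibility: hidden"><p><input name="emailTextBox"></p></div>';

// preg_match() returns false (not 0) when the pattern cannot be compiled.
// The @ only silences the compile warning for this demonstration.
$result = @preg_match($pattern, $html);

// true here means the pattern was rejected and no match was even attempted
var_dump($result === false);
```

Because the pattern never compiles, the hidden-ancestor check has to happen outside the regex, which is what the parser suggestion amounts to.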

intellisource (Author) commented:
Thanks kaufmed, I would appreciate that. I should perhaps have done this in the first place :P lol
But, as they say, there are many ways to skin a cat. I am only aware of the XML parsing functions, not an HTML DOM. Might this be what you meant?

intellisource (Author) commented:
I browsed php.net and came across the DOM objects. I think this is what you meant. I tried using the loadHTML method as follows, inside the regex loop over the forms:
$doc = new DOMDocument();
$doc->loadHTML($form);

Unfortunately, I can't see how to check whether a parent div is hidden or not.
This is the function I currently use to detect it. It receives the entire form element with its contents, plus the name of the element to check (names rather than IDs, because IDs are not submitted to the server).
function ishidden($name,$html) {
	$dom = new DOMDocument();
	@$dom->loadHTML($html);
	$divs = $dom->getElementsByTagName("div");
	foreach ($divs as $div) {
		$style = $div->getAttribute("style");
		if (preg_match("/display:none|visibility:hidden/ims",$style)) {
			foreach ($div->childNodes as $child) {
				if ($child->getAttribute("name")==$name) {
					return true;
				}
			}
		}
	}
	return false;
}
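
Two things in the attempt above may explain why it never reports a hit: the style regex has no room for whitespace after the colon (the sample div uses "visibility: hidden"), and childNodes only yields direct children, including text nodes that have no getAttribute() method. A possible rework, offered as a sketch rather than a definitive fix, is shown below. The name isHiddenByDiv and the sample markup are assumptions for illustration.

```php
<?php
// Sketch: report whether a control with the given name sits anywhere inside
// a <div> whose inline style hides it. isHiddenByDiv is a hypothetical name.
function isHiddenByDiv(string $name, string $html): bool
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parse warnings instead of using @
    $dom->loadHTML($html);
    libxml_clear_errors();

    foreach ($dom->getElementsByTagName('div') as $div) {
        $style = $div->getAttribute('style');
        // \s* tolerates "display: none" as well as "display:none"
        if (!preg_match('/display\s*:\s*none|visibility\s*:\s*hidden/i', $style)) {
            continue;
        }
        // '*' returns every descendant element, not just direct children,
        // and never yields text nodes.
        foreach ($div->getElementsByTagName('*') as $el) {
            if ($el->getAttribute('name') === $name) {
                return true;
            }
        }
    }
    return false;
}

$html = '<form><div style="position: absolute; visibility: hidden;">'
      . '<dl><dd><input type="text" name="emailTextBox"></dd></dl></div>'
      . '<input type="text" name="usernameTextBox"></form>';

var_dump(isHiddenByDiv('emailTextBox', $html));    // nested inside the hidden div
var_dump(isHiddenByDiv('usernameTextBox', $html)); // outside any hidden div
```

This only inspects inline styles on div elements, as the question does. Styles applied through classes or stylesheets are out of its reach.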

Ray Paseur commented:
This is an interesting question.  Can you tell us why you are doing this?  What is the input and the expected work product?  If we know that, there may be easier ways to skin this cat.

intellisource (Author) commented:
It is to automate laborious work. A whole string of copy-and-paste steps can be eliminated by automating the form submissions into our quicktextpro integrations.

intellisource (Author) commented:
Oh, and this function is to be used when reading forms, in order to emulate form submissions. That is the purpose.

intellisource (Author) commented:
Sorry, but I am really not familiar with these PHP DOM objects... -_- They are quite different from JavaScript's, where I could load the element and then walk up through its parents until reaching the top level. I don't see how to do this in PHP...

Ray Paseur commented:
I can show you how to find the form elements, if that is any help.

<?php // RAY_temp_intellisource.php
error_reporting(E_ALL);


// SEE http://www.experts-exchange.com/Programming/Languages/Regular_Expressions/Q_27296137.html?cid=1572


// A URL TO TEST WITH
$url = 'https://lmx.leads360.com/web/Login.aspx';

// READ THE GENERATED HTML STRING
$htm = my_curl($url);

// REMOVE THE END-OF-LINE CHARACTERS
$htm = str_replace(PHP_EOL, "", $htm);

// ISOLATE THE FORM
$form   = explode("<form",$htm);
$form   = explode("</form>",$form[1]);
$inputs = explode("<input",$form[0]);

// ISOLATE THE INPUTS TO THE REQUEST
$postdata = array();
$posturl  = NULL;
foreach($inputs as $key => $val)
{
    // IDENTIFY THE ACTION SCRIPT
    $action = strpos($val, "action");
    if($action !== false)
    {
        // EXTRACT THE ACTION SCRIPT NAME FROM THE FORM INPUT
        $actstart = strpos($val, "\"", $action+1);
        $actend   = strpos($val, "\"", $actstart+1);
        $posturl  = substr($val, $actstart+1, ($actend-$actstart-1));
        continue;
    }

    // IDENTIFY THE INPUT FIELDS BY NAME AND VALUE PAIRS
    $name = strpos($val, "name");
    if($name !== false)
    {
        // EXTRACT THE NAME FROM THE FORM INPUT
        $namestart = strpos($val, "\"", $name+1);
        $nameend   = strpos($val, "\"", $namestart+1);
        $strname   = substr($val, $namestart+1, ($nameend-$namestart-1));

        // EXTRACT THE VALUE
        $value = strpos($val, "value");
        if($value !== false)
        {
            $valuestart = strpos($val, "\"", $value+1);
            $valueend   = strpos($val, "\"", $valuestart+1);
            $strvalue   = substr($val, $valuestart+1, ($valueend-$valuestart-1));
        }

        // IF NO VALUE
        else
        {
            $strvalue   = NULL;
        }
        // STORE THE PAIR ONLY WHEN THE INPUT HAS A NAME
        $postdata[$strname] = $strvalue;
    }
}

// SHOW THE WORK PRODUCT
echo "<pre>";
echo PHP_EOL . "THE ACTION SCRIPT URL IS: $posturl";
echo PHP_EOL . "THE REQUEST ARGUMENTS ARE: ";
var_dump($postdata);



// A FUNCTION TO RUN A CURL-GET CLIENT CALL TO A FOREIGN SERVER
function my_curl
( $url
, $timeout=3
, $error_report=TRUE
)
{
    $curl = curl_init();

    // HEADERS AND OPTIONS APPEAR TO BE A FIREFOX BROWSER REFERRED BY GOOGLE
    $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // BROWSERS USUALLY LEAVE BLANK

    // SET THE CURL OPTIONS - SEE http://php.net/manual/en/function.curl-setopt.php
    curl_setopt( $curl, CURLOPT_URL,            $url  );
    curl_setopt( $curl, CURLOPT_USERAGENT,      'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'  );
    curl_setopt( $curl, CURLOPT_HTTPHEADER,     $header  );
    curl_setopt( $curl, CURLOPT_REFERER,        'http://www.google.com'  );
    curl_setopt( $curl, CURLOPT_ENCODING,       'gzip,deflate'  );
    curl_setopt( $curl, CURLOPT_AUTOREFERER,    TRUE  );
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, TRUE  );
    curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, TRUE  );
    curl_setopt( $curl, CURLOPT_TIMEOUT,        $timeout  );

    // RUN THE CURL REQUEST AND GET THE RESULTS
    $htm = curl_exec($curl);

    // ON FAILURE HANDLE ERROR MESSAGE
    if ($htm === FALSE)
    {
        if ($error_report)
        {
            $err = curl_errno($curl);
            $inf = curl_getinfo($curl);
            echo "CURL FAIL: $url TIMEOUT=$timeout, CURL_ERRNO=$err";
            var_dump($inf);
        }
        curl_close($curl);
        return FALSE;
    }

    // ON SUCCESS RETURN XML / HTML STRING
    curl_close($curl);
    return $htm;
}
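
As a possible alternative to the string splitting above, the same name and value pairs can be read through DOMDocument, which also makes it easy to record whether a field carries the disabled attribute asked about in the question. This is a minimal sketch under stated assumptions: describeForm is a hypothetical helper, it reads only the first form and only input elements, and the sample markup is invented for the example.

```php
<?php
// Sketch: collect the action, field names, values, and a disabled flag.
// describeForm is a hypothetical helper; only the first <form> is read.
function describeForm(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // keep parse warnings out of the output
    $dom->loadHTML($html);
    libxml_clear_errors();

    $form = $dom->getElementsByTagName('form')->item(0);
    if ($form === null) {
        return ['action' => null, 'fields' => []];
    }

    $fields = [];
    foreach ($form->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name === '') {
            continue; // unnamed inputs are not submitted
        }
        $fields[$name] = [
            'type'     => $input->getAttribute('type'),
            'value'    => $input->hasAttribute('value') ? $input->getAttribute('value') : null,
            'disabled' => $input->hasAttribute('disabled'),
        ];
    }
    return ['action' => $form->getAttribute('action'), 'fields' => $fields];
}

$html = '<form name="form1" action="login.aspx">'
      . '<input type="hidden" name="__VIEWSTATE" value="abc">'
      . '<input type="text" name="usernameTextBox" disabled>'
      . '</form>';
print_r(describeForm($html));
```

The page string returned by my_curl() could be passed in place of the sample $html. Textarea, select, and button elements would need their own loops.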

intellisource (Author) commented:
Well, getting the form elements via PCRE was not too big a problem, just the formatting, which kaufmed helped me with. The issue I am facing now is determining whether a form element is inside a div that has the inline style display: none or visibility: hidden. I am not sure how to work that out with the PHP DOM objects (specifically DOMDocument::loadHTML). There does not seem to be an ancestry/parent property like JavaScript has when traversing the DOM, so my mind is rather stuck on this. :(

intellisource (Author) commented:
The issue is only in this function, which is passed the element name and the form's HTML tree:
function ishidden($name,$html) {
        $dom = new DOMDocument();
        @$dom->loadHTML($html);
        $divs = $dom->getElementsByTagName("div");
        foreach ($divs as $div) {
                $style = $div->getAttribute("style");
                if (preg_match("/display:none|visibility:hidden/ims",$style)) {
                        foreach ($div->childNodes as $child) {
                                if ($child->getAttribute("name")==$name) {
                                        return true;
                                }
                        }
                }
        }
        return false;
}
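
For what it is worth, PHP's DOMNode does expose a parentNode property, so the JavaScript-style walk up the ancestry is available with the built-in DOM extension as well. The sketch below locates the control by name with DOMXPath and then climbs parentNode until it reaches the document. The name hiddenAncestor and the sample markup are assumptions made for illustration, and the name is assumed to contain no double quote.

```php
<?php
// Sketch: find the control by name, then climb parentNode links looking for
// an ancestor whose inline style hides it. hiddenAncestor is a hypothetical name.
function hiddenAncestor(string $name, string $html): ?DOMElement
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // keep parse warnings out of the output
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Assumes $name contains no double quote, since it is embedded in the query.
    $nodes = $xpath->query('//*[@name="' . $name . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // The loop stops at the DOMDocument node, which is not a DOMElement.
    for ($node = $nodes->item(0)->parentNode; $node instanceof DOMElement; $node = $node->parentNode) {
        $style = $node->getAttribute('style');
        if (preg_match('/display\s*:\s*none|visibility\s*:\s*hidden/i', $style)) {
            return $node;   // nearest hiding ancestor
        }
    }
    return null;
}

$html = '<form><div id="passwordDialog" style="visibility: hidden;"><div class="content">'
      . '<input type="text" name="emailTextBox"></div></div></form>';

$hit = hiddenAncestor('emailTextBox', $html);
echo $hit ? $hit->getAttribute('id') : 'visible', PHP_EOL;
```

Returning the ancestor element rather than a boolean leaves the caller free to record which style was responsible (hidden or none) in the element's subarray.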

intellisource (Author) commented:
Okay.
After a business breakfast with the client, and an inspired discussion about resolving this issue, as in yesterday, I've located the PHP Simple HTML DOM Parser, which does in fact include a parent property on each DOM element! ;)
I am just figuring out how to include and use this API... then it should take about 30 minutes to complete this function! :D Thanks for the help, guys...

intellisource (Author) commented:
I have decided to go with the PHP Simple HTML DOM Parser, which this post links to as the resolution of the actual problem. ;)