Solved

how to check if there is a DIV enclosing a form element within the conditional PCRE?

Posted on 2011-09-07
Last Modified: 2012-05-12
hi, i've just completed a PCRE as follows to retrieve form elements from a form:
$pattern = '#<(form)(?=(?:[^>]*name="([^"]*))?)(?=(?:[^>]*action="([^"]*))?)' .
           '|<(input)(?=(?:[^>]*type="([^"]*))?)(?=(?:[^>]*name="([^"]*))?)(?=(?:[^>]*value="([^"]*))?)(?=(?:[^>]*checked="([^"]*))?)' .
           '|<(textarea)(?=(?:[^>]*name="([^"]*))?)[^>]*>([^<]|<(?!/textarea))*' .
           '|<(select)(?=(?:[^>]*name="([^"]*))?)' .
           '|<(button)(?=(?:[^>]*name="([^"]*))?)(?=(?:[^>]*value="([^"]*))?)#i';

i wish to post this question as a follow-up to the question available at http://www.experts-exchange.com/Programming/Languages/Regular_Expressions/Q_27295097.html#a36497404
now if an element is within a div (parental or any other ancestral level tags) with inline style including "visibility: hidden" or "display: none" how can i detect that (hidden|none|empty string) by adding to the four element pattern's regex?
for example, loading the login form on https://lmx.leads360.com/web/Login.aspx - the emailTextBox element is retrieved but gives no clue as to whether it is displayed or hidden.
Array
(
    [0] => Array
        (
            [0] => <form
            [1] => form
            [2] => form1
            [3] => login.aspx
        )

    [1] => Array
        (
            [0] => <input
            [1] => 
            [2] => 
            [3] => 
            [4] => input
            [5] => hidden
            [6] => __VIEWSTATE
            [7] => /wEPDwULLTEwMTAwMDM0ODIPZBYCAgMPZBYCAg0PFgIeC18hSXRlbUNvdW50AgEWAmYPZBYCAgEPFgIeBFRleHQF3gU8ZGl2Pg0KPHA+T24gOC8xOS8wOSB3ZSByZWxlYXNlZCBtaW5vciB1cGRhdGVzIHRvIEV4cHJlc3MuIFBsZWFzZSByZXZpZXcgdGhlIDxhIGhyZWY9Imh0dHA6Ly9sZWFkczM2MC56ZW5kZXNrLmNvbS9mb3J1bXMvMTY0MzYvZW50cmllcy80OTk5OCIgdGFyZ2V0PSJfYmxhbmsiPnJlbGVhc2Ugbm90ZXM8L2E+IGZvciBkZXRhaWxzLiBXZSBhcmUgdmVyeSBpbnRlcmVzdGVkIGluIHlvdXIgZmVlZGJhY2ssIGlmIHlvdSBoYXZlIGFueSBmZWF0dXJlIHJlY29tbWVuZGF0aW9ucywgcGxlYXNlIHBvc3QgdGhlbSBpbiBvdXIgPGEgaHJlZj0iaHR0cDovL2xlYWRzMzYwLnplbmRlc2suY29tL2ZvcnVtcy8xNjQzOS9lbnRyaWVzIiB0YXJnZXQ9Il9ibGFuayI+dXNlciBmb3J1bTwvYT4uPC9wPg0KPHVsPg0KPGxpPldlIG5vdyBvZmZlciA1IHRlbXBsYXRlczogTW9ydGdhZ2UsIERlYnQvTG9hbk1vZCwgSW5zdXJhbmNlIChIb21lL0F1dG8pLCBJbnN1cmFuY2UgKEhlYWx0aC9MaWZlKSwgR2VuZXJpYzwvbGk+DQo8bGk+TmV3IDxhIGhyZWY9Imh0dHBzOi8vbG14LmxlYWRzMzYwLmNvbS9oZWxwIiB0YXJnZXQ9Il9ibGFuayI+aGVscCBmb3J1bXM8L2E+IGFuZCB0aWNrZXRpbmcgc3lzdGVtPC9saT4NCjxsaT5OZXcgZGVkaWNhdGVkIEV4cHJlc3Mgc3VwcG9ydCBwZXJzb24gYW5kIGNoYXQgbm93IGF2YWlsYWJsZSBmb3IgdHJpYWwgYW5kIHBheWluZyBjbGllbnRzPC9saT4NCjwvdWw+DQo8L2Rpdj4NCmRkSh+rfjq3ECYgva0xikqoCQO0DWY=
        )

    [2] => Array
        (
            [0] => <input
            [1] => 
            [2] => 
            [3] => 
            [4] => input
            [5] => hidden
            [6] => __EVENTVALIDATION
            [7] => /wEWBQL7tbDVBgLw0JndDgLi/qahAwKSuuDUCwKCkfPgDOt1y12mZhJ/qB81miBJ4pLiwjLK
        )

    [3] => Array
        (
            [0] => <input
            [1] => 
            [2] => 
            [3] => 
            [4] => input
            [5] => text
            [6] => usernameTextBox
        )

    [4] => Array
        (
            [0] => <input
            [1] => 
            [2] => 
            [3] => 
            [4] => input
            [5] => password
            [6] => passwordTextBox
        )

    [5] => Array
        (
            [0] => <input
            [1] => 
            [2] => 
            [3] => 
            [4] => input
            [5] => text
            [6] => emailTextBox
        )

)

i also need to check for a disabled value, or a parent style hiding the field - to be appended to the same element subarray. how would this be done? here follows the possible hiding of an element via parental inline css.
<div class="dialog" style="position: absolute; visibility: hidden; z-index: 70013; left: 735px; top: 212px;" id="passwordDialog"><div class="header" id="passwordDialog_HeaderSpan">

            Forgot Password
        
</div><div class="content password" id="passwordDialog_InnerSpan">

            <p>Please enter the email address you signed up with and we will send you a new login link.</p>
            <dl>
                <dt>Email:</dt>
                <dd><input type="text" class="bigtextbox" id="emailTextBox" name="emailTextBox" tabindex="0"></dd>
            </dl>
            <div class="buttons submitcancel">
                <a onclick="RequestPasswordReset();" id="submitPasswordRest" class="submit left" tabindex="0">Submit</a>
                <a onclick="passwordDialog.Close();" class="cancel right" tabindex="0">Close</a>
            </div>
        
</div><div class="footer" id="passwordDialog_FooterSpan">

        
</div></div>
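
For the "disabled" half of the question, a DOM lookup may be simpler than yet another lookahead. The following is only a minimal sketch, assuming PHP's DOM extension is available; the function name is_disabled and the sample markup are hypothetical and not part of the original code.

```php
<?php
// Hypothetical helper: reports whether a form control with the given
// name attribute carries a "disabled" attribute.
function is_disabled($name, $html) {
    $dom = new DOMDocument();
    // @ suppresses the warnings libxml raises on real-world, non-strict HTML.
    @$dom->loadHTML($html);
    foreach (array('input', 'select', 'textarea', 'button') as $tag) {
        foreach ($dom->getElementsByTagName($tag) as $el) {
            if ($el->getAttribute('name') === $name) {
                // disabled is a boolean attribute: its presence is enough.
                return $el->hasAttribute('disabled');
            }
        }
    }
    return false; // no control with that name was found
}

// Illustrative markup only.
$html = '<form><input type="text" name="a" disabled>'
      . '<input type="text" name="b"></form>';
var_dump(is_disabled('a', $html));
var_dump(is_disabled('b', $html));
```

The result could be appended to the element's subarray alongside the regex captures.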

Question by:intellisource
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 36498816
You won't be able to add to the patterns because PHP regex does not support unbounded lookbehind. Even if it did, trying to fully parse HTML with regex is inadvisable--it gets extremely messy and is most often unreliable. At this point I would say to extract your tags via the regex, then use one of the built-in HTML parsing functions to find your <div> tags, then see if the regex-captured text is a substring of the <div>'s text. I don't know the function names for the HTML parsers off-hand, but I'll try to find them.
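
The lookbehind limitation mentioned above can be seen directly. This is a small sketch with made-up patterns and input: a lookbehind containing [^>]* has no upper bound on its length, so the pattern fails to compile, while a fixed-length lookbehind works.

```php
<?php
// A lookbehind whose length is unbounded ([^>]* can match any number of
// characters) is rejected when PCRE compiles the pattern.
$unbounded = '/(?<=<div[^>]*>)<input/i';
// A lookbehind of fixed length compiles and matches normally.
$fixed     = '/(?<=<div>)<input/i';

// Illustrative subject string only.
$subject = '<div><input name="x">';

// @ hides the "Compilation failed" warning; preg_match() returns false
// when the pattern cannot be compiled.
var_dump(@preg_match($unbounded, $subject));
var_dump(preg_match($fixed, $subject));
```

Because a real div tag has attributes of arbitrary length, and the input may sit several levels below it, a fixed-length lookbehind cannot cover the case in the question.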
 

Author Comment

by:intellisource
ID: 36500782
thanks kaufmed - would appreciate that - should perhaps have done this in the first place :P lol
but - they say there are many ways to skin a cat - i am only aware of the xml parsing functions not html dom - might this be what you meant?
 

Author Comment

by:intellisource
ID: 36501720
browsed php.net and came across the dom objects. i think this is what you meant. i tried using the loadHTML method as follows, within the regex loop over the forms:
$doc = new DOMDocument();
$doc->loadHTML($form);

but unfortunately i can't see how to parse for whether a parent div is hidden or not.
this is the function to detect this at the moment, it receives the entire form element and its contents as well as the element name to check, as id's do not get submitted to the server.
function ishidden($name,$html) {
	$dom = new DOMDocument();
	@$dom->loadHTML($html);
	$divs = $dom->getElementsByTagName("div");
	foreach ($divs as $div) {
		$style = $div->getAttribute("style");
		if (preg_match("/display:none|visibility:hidden/ims",$style)) {
			foreach ($div->childNodes as $child) {
				if ($child->getAttribute("name")==$name) {
					return true;
				}
			}
		}
	}
	return false;
}
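
A few things in the function above probably keep it from ever returning true: the pattern leaves no room for the space in "visibility: hidden" as written in the sample markup, childNodes covers only direct children, and text nodes among those children have no getAttribute method. One possible rework, offered as a sketch rather than a definitive fix, starts from the named element and climbs upward through parentNode, which PHP's DOMNode does expose. The function name is_hidden_by_ancestor and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: true if the named element, or any of its ancestors,
// has an inline style of display:none or visibility:hidden.
function is_hidden_by_ancestor($name, $html) {
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // @ silences warnings on sloppy markup
    // \s* tolerates "visibility: hidden" as well as "visibility:hidden".
    $hiding = '/\b(?:display\s*:\s*none|visibility\s*:\s*hidden)\b/i';
    foreach ($dom->getElementsByTagName('*') as $el) {
        if ($el->getAttribute('name') !== $name) {
            continue;
        }
        // Climb from the element itself up to the root element; the loop
        // stops once parentNode is the document rather than an element.
        for ($node = $el; $node instanceof DOMElement; $node = $node->parentNode) {
            if (preg_match($hiding, $node->getAttribute('style'))) {
                return true;
            }
        }
    }
    return false;
}

// Illustrative markup, loosely modelled on the dialog in the question.
$html = '<form><div style="position: absolute; visibility: hidden;">'
      . '<dl><dd><input type="text" name="emailTextBox"></dd></dl></div>'
      . '<input type="text" name="usernameTextBox"></form>';
var_dump(is_hidden_by_ancestor('emailTextBox', $html));
var_dump(is_hidden_by_ancestor('usernameTextBox', $html));
```

This only sees inline styles. Anything hidden through a stylesheet rule or by script at runtime would not be detected.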

 
LVL 110

Expert Comment

by:Ray Paseur
ID: 36502433
This is an interesting question.  Can you tell us why you are doing this?  What is the input and the expected work product?  If we know that, there may be easier ways to skin this cat.
 

Author Comment

by:intellisource
ID: 36502664
it is to automate laborious work. a whole string of copy and paste processes can be eliminated by automating the form submissions into our quicktextpro integrations.
 

Author Comment

by:intellisource
ID: 36502913
oh, this function is to be used in reading forms, to emulate form submissions - i've given you the purpose.
 

Author Comment

by:intellisource
ID: 36502928
sorry but i am really not familiar with using these php dom objects... -_- quite different to javascripts, where i could have loaded the element, from there retrieved parents while it's not top level. don't see how to do this here in php....
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 36503209
I can show you how to find the form elements, if that is any help.

<?php // RAY_temp_intellisource.php
error_reporting(E_ALL);


// SEE http://www.experts-exchange.com/Programming/Languages/Regular_Expressions/Q_27296137.html?cid=1572


// A URL TO TEST WITH
$url = 'https://lmx.leads360.com/web/Login.aspx';

// READ THE GENERATED HTML STRING
$htm = my_curl($url);

// REMOVE THE END-OF-LINE CHARACTERS
$htm = str_replace(PHP_EOL, "", $htm);

// ISOLATE THE FORM
$form   = explode("<form",$htm);
$form   = explode("</form>",$form[1]);
$inputs = explode("<input",$form[0]);

// ISOLATE THE INPUTS TO THE REQUEST
foreach($inputs as $key => $val)
{
    // IDENTIFY THE ACTION SCRIPT
    $action = strpos($val, "action");
    if($action !== false)
    {
        // EXTRACT THE ACTION SCRIPT NAME FROM THE FORM INPUT
        $actstart = strpos($val, "\"", $action+1);
        $actend   = strpos($val, "\"", $actstart+1);
        $posturl  = substr($val, $actstart+1, ($actend-$actstart-1));
        continue;
    }

    // IDENTIFY THE INPUT FIELDS BY NAME AND VALUE PAIRS
    $name = strpos($val, "name");
    if($name !== false)
    {
        // EXTRACT THE NAME FROM THE FORM INPUT
        $namestart = strpos($val, "\"", $name+1);
        $nameend   = strpos($val, "\"", $namestart+1);
        $strname   = substr($val, $namestart+1, ($nameend-$namestart-1));

        // EXTRACT THE VALUE
        $value = strpos($val, "value");
        if($value !== false)
        {
            $valuestart = strpos($val, "\"", $value+1);
            $valueend   = strpos($val, "\"", $valuestart+1);
            $strvalue   = substr($val, $valuestart+1, ($valueend-$valuestart-1));
        }

        // IF NO VALUE
        else
        {
            $strvalue   = NULL;
        }
        // STORE THE PAIR ONLY WHEN THIS INPUT ACTUALLY HAS A NAME
        $postdata[$strname] = $strvalue;
    }
}

// SHOW THE WORK PRODUCT
echo "<pre>";
echo PHP_EOL . "THE ACTION SCRIPT URL IS: $posturl";
echo PHP_EOL . "THE REQUEST ARGUMENTS ARE: ";
var_dump($postdata);



// A FUNCTION TO RUN A CURL-GET CLIENT CALL TO A FOREIGN SERVER
function my_curl
( $url
, $timeout=3
, $error_report=TRUE
)
{
    $curl = curl_init();

    // HEADERS AND OPTIONS APPEAR TO BE A FIREFOX BROWSER REFERRED BY GOOGLE
    $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // BROWSERS USUALLY LEAVE BLANK

    // SET THE CURL OPTIONS - SEE http://php.net/manual/en/function.curl-setopt.php
    curl_setopt( $curl, CURLOPT_URL,            $url  );
    curl_setopt( $curl, CURLOPT_USERAGENT,      'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'  );
    curl_setopt( $curl, CURLOPT_HTTPHEADER,     $header  );
    curl_setopt( $curl, CURLOPT_REFERER,        'http://www.google.com'  );
    curl_setopt( $curl, CURLOPT_ENCODING,       'gzip,deflate'  );
    curl_setopt( $curl, CURLOPT_AUTOREFERER,    TRUE  );
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, TRUE  );
    curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, TRUE  );
    curl_setopt( $curl, CURLOPT_TIMEOUT,        $timeout  );

    // RUN THE CURL REQUEST AND GET THE RESULTS
    $htm = curl_exec($curl);

    // ON FAILURE HANDLE ERROR MESSAGE
    if ($htm === FALSE)
    {
        if ($error_report)
        {
            $err = curl_errno($curl);
            $inf = curl_getinfo($curl);
            echo "CURL FAIL: $url TIMEOUT=$timeout, CURL_ERRNO=$err";
            var_dump($inf);
        }
        curl_close($curl);
        return FALSE;
    }

    // ON SUCCESS RETURN XML / HTML STRING
    curl_close($curl);
    return $htm;
}

 

Author Comment

by:intellisource
ID: 36507474
well, getting the form elements via pcre was not too big a problem - just the formatting, which kaufmed helped me with. the issue i am facing now is determining whether a form element is within a div that has the inline style of display: none or visibility: hidden. not so sure how to work that out when parsing with the php DOM objects (specifically DOMDocument::loadHTML). there does not seem to be an ancestry/parent property as javascript has when parsing the DOM though, so my mind is rather stuck on this. :(
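
For what it's worth, the ancestry can also be reached through XPath once the HTML is loaded into a DOMDocument. The following is a rough sketch assuming the DOM extension's DOMXPath class; the function name ancestor_styles and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: collects the inline style of every ancestor of the
// element with the given name attribute.
function ancestor_styles($name, $html) {
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // @ silences warnings on sloppy markup
    $xpath = new DOMXPath($dom);
    // ancestor::*[@style] selects all ancestors carrying a style attribute.
    // Assumes $name contains no double quote, since it is embedded in the query.
    $nodes = $xpath->query('//*[@name="' . $name . '"]/ancestor::*[@style]');
    $styles = array();
    foreach ($nodes as $node) {
        $styles[] = $node->getAttribute('style');
    }
    return $styles;
}

// Illustrative markup only.
$html = '<div style="display: none"><p><input name="emailTextBox"></p></div>';
print_r(ancestor_styles('emailTextBox', $html));
```

Each returned style string could then be tested for display: none or visibility: hidden.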
 

Author Comment

by:intellisource
ID: 36507498
the issue is merely within this function, which is passed the element name and the form html tree:
function ishidden($name,$html) {
        $dom = new DOMDocument();
        @$dom->loadHTML($html);
        $divs = $dom->getElementsByTagName("div");
        foreach ($divs as $div) {
                $style = $div->getAttribute("style");
                if (preg_match("/display:none|visibility:hidden/ims",$style)) {
                        foreach ($div->childNodes as $child) {
                                if ($child->getAttribute("name")==$name) {
                                        return true;
                                }
                        }
                }
        }
        return false;
}

 

Accepted Solution

by:
intellisource earned 0 total points
ID: 36508721
okay.
after a business breakfast with the client, and an inspired discussion towards resolving this issue as in yesterday - i've located the PHP Simple HTML DOM Parser, which does in fact include a parent property to each DOM element! ;)
just figuring how to include and use this API though... then it will be about 30 minutes to complete this function! :D thanks for the help guys...
 

Author Closing Comment

by:intellisource
ID: 36528000
have decided to go with the PHP Simple HTML DOM Parser, linked in this post to a resolution of the actual problem. ;)