Solved

how to check if there is a DIV enclosing a form element within the conditional PCRE?

Posted on 2011-09-07
Last Modified: 2012-05-12
hi, i've just completed a PCRE as follows to retrieve form elements from a form:
$pattern = '#<(form)(?=(?:[^>]*name="([^"]*))?)(?=(?:[^>]*action="([^"]*))?)' .
           '|<(input)(?=(?:[^>]*type="([^"]*))?)(?=(?:[^>]*name="([^"]*))?)(?=(?:[^>]*value="([^"]*))?)(?=(?:[^>]*checked="([^"]*))?)' .
           '|<(textarea)(?=(?:[^>]*name="([^"]*))?)[^>]*>([^<]|<(?!/textarea))*' .
           '|<(select)(?=(?:[^>]*name="([^"]*))?)' .
           '|<(button)(?=(?:[^>]*name="([^"]*))?)(?=(?:[^>]*value="([^"]*))?)#i';


i wish to post this question as a follow-up to the question available at http://www.experts-exchange.com/Programming/Languages/Regular_Expressions/Q_27295097.html#a36497404
now, if an element sits inside a div (a parent or any other ancestor tag) whose inline style includes "visibility: hidden" or "display: none", how can i detect that (hidden|none|empty string) by adding to the four element patterns in the regex?
for example, loading the login form on https://lmx.leads360.com/web/Login.aspx retrieves the emailTextBox element, but gives no clue as to whether it is displayed or hidden.
Array
(
    [0] => Array
        (
            [0] => <form
            [1] => form
            [2] => form1
            [3] => login.aspx
        )

    [1] => Array
        (
            [0] => <input
            [1] => 
            [2] => 
            [3] => 
            [4] => input
            [5] => hidden
            [6] => __VIEWSTATE
            [7] => /wEPDwULLTEwMTAwMDM0ODIPZBYCAgMPZBYCAg0PFgIeC18hSXRlbUNvdW50AgEWAmYPZBYCAgEPFgIeBFRleHQF3gU8ZGl2Pg0KPHA+T24gOC8xOS8wOSB3ZSByZWxlYXNlZCBtaW5vciB1cGRhdGVzIHRvIEV4cHJlc3MuIFBsZWFzZSByZXZpZXcgdGhlIDxhIGhyZWY9Imh0dHA6Ly9sZWFkczM2MC56ZW5kZXNrLmNvbS9mb3J1bXMvMTY0MzYvZW50cmllcy80OTk5OCIgdGFyZ2V0PSJfYmxhbmsiPnJlbGVhc2Ugbm90ZXM8L2E+IGZvciBkZXRhaWxzLiBXZSBhcmUgdmVyeSBpbnRlcmVzdGVkIGluIHlvdXIgZmVlZGJhY2ssIGlmIHlvdSBoYXZlIGFueSBmZWF0dXJlIHJlY29tbWVuZGF0aW9ucywgcGxlYXNlIHBvc3QgdGhlbSBpbiBvdXIgPGEgaHJlZj0iaHR0cDovL2xlYWRzMzYwLnplbmRlc2suY29tL2ZvcnVtcy8xNjQzOS9lbnRyaWVzIiB0YXJnZXQ9Il9ibGFuayI+dXNlciBmb3J1bTwvYT4uPC9wPg0KPHVsPg0KPGxpPldlIG5vdyBvZmZlciA1IHRlbXBsYXRlczogTW9ydGdhZ2UsIERlYnQvTG9hbk1vZCwgSW5zdXJhbmNlIChIb21lL0F1dG8pLCBJbnN1cmFuY2UgKEhlYWx0aC9MaWZlKSwgR2VuZXJpYzwvbGk+DQo8bGk+TmV3IDxhIGhyZWY9Imh0dHBzOi8vbG14LmxlYWRzMzYwLmNvbS9oZWxwIiB0YXJnZXQ9Il9ibGFuayI+aGVscCBmb3J1bXM8L2E+IGFuZCB0aWNrZXRpbmcgc3lzdGVtPC9saT4NCjxsaT5OZXcgZGVkaWNhdGVkIEV4cHJlc3Mgc3VwcG9ydCBwZXJzb24gYW5kIGNoYXQgbm93IGF2YWlsYWJsZSBmb3IgdHJpYWwgYW5kIHBheWluZyBjbGllbnRzPC9saT4NCjwvdWw+DQo8L2Rpdj4NCmRkSh+rfjq3ECYgva0xikqoCQO0DWY=
        )

    [2] => Array
        (
            [0] => <input
            [1] => 
            [2] => 
            [3] => 
            [4] => input
            [5] => hidden
            [6] => __EVENTVALIDATION
            [7] => /wEWBQL7tbDVBgLw0JndDgLi/qahAwKSuuDUCwKCkfPgDOt1y12mZhJ/qB81miBJ4pLiwjLK
        )

    [3] => Array
        (
            [0] => <input
            [1] => 
            [2] => 
            [3] => 
            [4] => input
            [5] => text
            [6] => usernameTextBox
        )

    [4] => Array
        (
            [0] => <input
            [1] => 
            [2] => 
            [3] => 
            [4] => input
            [5] => password
            [6] => passwordTextBox
        )

    [5] => Array
        (
            [0] => <input
            [1] => 
            [2] => 
            [3] => 
            [4] => input
            [5] => text
            [6] => emailTextBox
        )

)


i also need to check for a disabled attribute, or a parent style hiding the field, and append the result to the same element subarray. how would this be done? here is an example of an element hidden via a parent's inline css:
<div class="dialog" style="position: absolute; visibility: hidden; z-index: 70013; left: 735px; top: 212px;" id="passwordDialog"><div class="header" id="passwordDialog_HeaderSpan">

            Forgot Password
        
</div><div class="content password" id="passwordDialog_InnerSpan">

            <p>Please enter the email address you signed up with and we will send you a new login link.</p>
            <dl>
                <dt>Email:</dt>
                <dd><input type="text" class="bigtextbox" id="emailTextBox" name="emailTextBox" tabindex="0"></dd>
            </dl>
            <div class="buttons submitcancel">
                <a onclick="RequestPasswordReset();" id="submitPasswordRest" class="submit left" tabindex="0">Submit</a>
                <a onclick="passwordDialog.Close();" class="cancel right" tabindex="0">Close</a>
            </div>
        
</div><div class="footer" id="passwordDialog_FooterSpan">

        
</div></div>
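
The inline style in the markup above uses "visibility: hidden" with a space after the colon, so any check has to tolerate whitespace and letter case. A minimal sketch of such a check, assuming only inline styles matter: the function names parse_inline_style and style_hiding_reason are hypothetical (not from this thread), and splitting on ";" is a simplification that would mis-handle values containing semicolons, such as data URLs.

```php
<?php
// Hypothetical helper: turn an inline style attribute into a
// lowercase property => value map.
function parse_inline_style($style)
{
    $props = array();
    foreach (explode(';', $style) as $declaration) {
        $parts = explode(':', $declaration, 2);
        if (count($parts) !== 2) {
            continue; // skip empty or malformed declarations
        }
        $property = strtolower(trim($parts[0]));
        $value    = strtolower(trim($parts[1]));
        // Drop a trailing "!important" so it does not defeat the comparison.
        $value = trim(preg_replace('/\s*!important$/', '', $value));
        if ($property !== '') {
            $props[$property] = $value;
        }
    }
    return $props;
}

// Hypothetical helper: returns 'none', 'hidden', or '' for a style value,
// matching the (hidden|none|empty string) result asked for above.
function style_hiding_reason($style)
{
    $props = parse_inline_style($style);
    if (isset($props['display']) && $props['display'] === 'none') {
        return 'none';
    }
    if (isset($props['visibility']) && $props['visibility'] === 'hidden') {
        return 'hidden';
    }
    return '';
}

// The style attribute of the passwordDialog div shown above.
$style = 'position: absolute; visibility: hidden; z-index: 70013; left: 735px; top: 212px;';
var_dump(style_hiding_reason($style)); // string 'hidden'
```

This only classifies one style string; deciding which elements to feed it (the field's ancestors) still needs a parser that knows the nesting.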

Question by:intellisource
12 Comments
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 36498816
You won't be able to add to the patterns because PHP regex does not support unbounded lookbehind. Even if it did, trying to fully parse HTML with regex is inadvisable--it gets extremely messy and is most often unreliable. At this point I would say to extract your tags via the regex, then use one of the built-in HTML parsing functions to find your <div> tags, then see if the regex-captured text is a substring of the <div>'s text. I don't know the function names for the HTML parsers off-hand, but I'll try to find them.
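
To illustrate the lookbehind limitation, here is a small sketch. The pattern is a hypothetical attempt (not from the question) to require an opening <div ...> tag immediately before an <input; it relies on preg_match() returning false when a pattern fails to compile.

```php
<?php
// Hypothetical pattern: "an <input directly preceded by an opening <div ...> tag".
// The lookbehind contains [^>]*, which has no upper length bound.
$pattern = '#(?<=<div[^>]*>)<input#i';

// preg_match() returns false (not 0) when the pattern cannot be compiled;
// @ silences the accompanying warning for this demonstration only.
$result = @preg_match($pattern, '<div style="display:none"><input name="a">');
var_dump($result === false);

// A fixed-length lookbehind compiles, but it can only look back over a
// literal, fixed-width prefix, which is no use for arbitrary ancestor tags.
$fixed = preg_match('#(?<=<dd>)<input#i', '<dd><input name="a">');
var_dump($fixed);
```

Even a compiling lookbehind would only see the text immediately before the match, not the nesting of open and closed tags, which is why a parser is the safer tool here.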
 

Author Comment

by:intellisource
ID: 36500782
thanks kaufmed - would appreciate that - should perhaps have done this in the first place :P lol
but - they say there are many ways to skin a cat - i am only aware of the xml parsing functions not html dom - might this be what you meant?
 

Author Comment

by:intellisource
ID: 36501720
browsed php.net and came across the DOM objects. i think this is what you meant. i tried the loadHTML method as follows, inside the regex-driven loop over the forms:
$doc = new DOMDocument();
$doc->loadHTML($form);


but unfortunately i can't see how to check whether a parent div is hidden or not.
this is the function meant to detect this at the moment. it receives the entire form element and its contents, as well as the name of the element to check (names rather than ids, since ids do not get submitted to the server).
function ishidden($name,$html) {
	$dom = new DOMDocument();
	@$dom->loadHTML($html);
	$divs = $dom->getElementsByTagName("div");
	foreach ($divs as $div) {
		$style = $div->getAttribute("style");
		if (preg_match("/display:none|visibility:hidden/ims",$style)) {
			foreach ($div->childNodes as $child) {
				if ($child->getAttribute("name")==$name) {
					return true;
				}
			}
		}
	}
	return false;
}
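
Two things appear to hold the function above back: childNodes covers only direct children (and can include text nodes, which have no getAttribute method), and the regex does not allow a space after the colon. A possible variant is sketched below, assuming only inline styles on ancestors matter. It starts from the named element and walks upward through DOMNode's parentNode property; the name is_hidden_by_ancestor is hypothetical.

```php
<?php
// Sketch: true if an element with the given name attribute sits inside any
// ancestor whose inline style sets display:none or visibility:hidden.
// Stylesheets, scripts and the element's own attributes are not considered.
function is_hidden_by_ancestor($name, $html)
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; buffer parser warnings instead of
    // suppressing everything with @. $html must be a non-empty string.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hiding = '/(?:^|;)\s*(?:display\s*:\s*none|visibility\s*:\s*hidden)\b/i';

    // '*' matches every element in the document, at any depth.
    foreach ($dom->getElementsByTagName('*') as $element) {
        if ($element->getAttribute('name') !== $name) {
            continue;
        }
        // Walk up from the element's parent; the loop ends at the document
        // node, which is not a DOMElement.
        for ($node = $element->parentNode; $node instanceof DOMElement; $node = $node->parentNode) {
            if (preg_match($hiding, $node->getAttribute('style'))) {
                return true;
            }
        }
    }
    return false;
}
```

With the passwordDialog markup from the question, a call such as is_hidden_by_ancestor('emailTextBox', $form) should return true, because the outermost div carries "visibility: hidden".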

LVL 111

Expert Comment

by:Ray Paseur
ID: 36502433
This is an interesting question.  Can you tell us why you are doing this?  What is the input and the expected work product?  If we know that, there may be easier ways to skin this cat.
 

Author Comment

by:intellisource
ID: 36502664
it is to automate laborious work. a whole string of copy and paste processes can be eliminated by automating the form submissions into our quicktextpro integrations.
 

Author Comment

by:intellisource
ID: 36502913
oh, this function is to be used in reading forms, to emulate form submissions - i've given you the purpose.
 

Author Comment

by:intellisource
ID: 36502928
sorry, but i am really not familiar with these php dom objects... -_- quite different to javascript, where i could have loaded the element and from there retrieved its parents until reaching the top level. don't see how to do this here in php....
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 36503209
I can show you how to find the form elements, if that is any help.

<?php // RAY_temp_intellisource.php
error_reporting(E_ALL);


// SEE http://www.experts-exchange.com/Programming/Languages/Regular_Expressions/Q_27296137.html?cid=1572


// A URL TO TEST WITH
$url = 'https://lmx.leads360.com/web/Login.aspx';

// READ THE GENERATED HTML STRING
$htm = my_curl($url);

// REMOVE THE END-OF-LINE CHARACTERS
$htm = str_replace(PHP_EOL, "", $htm);

// ISOLATE THE FORM
$form   = explode("<form",$htm);
$form   = explode("</form>",$form[1]);
$inputs = explode("<input",$form[0]);

// ISOLATE THE INPUTS TO THE REQUEST
$posturl  = NULL;
$postdata = array();
foreach($inputs as $key => $val)
{
    // IDENTIFY THE ACTION SCRIPT
    $action = strpos($val, "action");
    if($action !== false)
    {
        // EXTRACT THE ACTION SCRIPT NAME FROM THE FORM INPUT
        $actstart = strpos($val, "\"", $action+1);
        $actend   = strpos($val, "\"", $actstart+1);
        $posturl  = substr($val, $actstart+1, ($actend-$actstart-1));
        continue;
    }

    // IDENTIFY THE INPUT FIELDS BY NAME AND VALUE PAIRS
    $name = strpos($val, "name");
    if($name !== false)
    {
        // EXTRACT THE NAME FROM THE FORM INPUT
        $namestart = strpos($val, "\"", $name+1);
        $nameend   = strpos($val, "\"", $namestart+1);
        $strname   = substr($val, $namestart+1, ($nameend-$namestart-1));

        // EXTRACT THE VALUE
        $value = strpos($val, "value");
        if($value !== false)
        {
            $valuestart = strpos($val, "\"", $value+1);
            $valueend   = strpos($val, "\"", $valuestart+1);
            $strvalue   = substr($val, $valuestart+1, ($valueend-$valuestart-1));
        }

        // IF NO VALUE
        else
        {
            $strvalue   = NULL;
        }
        // RECORD THE PAIR ONLY WHEN A NAME WAS FOUND
        $postdata[$strname] = $strvalue;
    }
}

// SHOW THE WORK PRODUCT
echo "<pre>";
echo PHP_EOL . "THE ACTION SCRIPT URL IS: $posturl";
echo PHP_EOL . "THE REQUEST ARGUMENTS ARE: ";
var_dump($postdata);



// A FUNCTION TO RUN A CURL-GET CLIENT CALL TO A FOREIGN SERVER
function my_curl
( $url
, $timeout=3
, $error_report=TRUE
)
{
    $curl = curl_init();

    // HEADERS AND OPTIONS APPEAR TO BE A FIREFOX BROWSER REFERRED BY GOOGLE
    $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // BROWSERS USUALLY LEAVE BLANK

    // SET THE CURL OPTIONS - SEE http://php.net/manual/en/function.curl-setopt.php
    curl_setopt( $curl, CURLOPT_URL,            $url  );
    curl_setopt( $curl, CURLOPT_USERAGENT,      'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'  );
    curl_setopt( $curl, CURLOPT_HTTPHEADER,     $header  );
    curl_setopt( $curl, CURLOPT_REFERER,        'http://www.google.com'  );
    curl_setopt( $curl, CURLOPT_ENCODING,       'gzip,deflate'  );
    curl_setopt( $curl, CURLOPT_AUTOREFERER,    TRUE  );
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, TRUE  );
    curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, TRUE  );
    curl_setopt( $curl, CURLOPT_TIMEOUT,        $timeout  );

    // RUN THE CURL REQUEST AND GET THE RESULTS
    $htm = curl_exec($curl);

    // ON FAILURE HANDLE ERROR MESSAGE
    if ($htm === FALSE)
    {
        if ($error_report)
        {
            $err = curl_errno($curl);
            $inf = curl_getinfo($curl);
            echo "CURL FAIL: $url TIMEOUT=$timeout, CURL_ERRNO=$err";
            var_dump($inf);
        }
        curl_close($curl);
        return FALSE;
    }

    // ON SUCCESS RETURN XML / HTML STRING
    curl_close($curl);
    return $htm;
}

 

Author Comment

by:intellisource
ID: 36507474
well, getting the form elements via pcre was not too big a problem - just the formatting, which kaufmed helped me with. the issue i am facing now is determining whether a form element is within a div that has the inline style display: none or visibility: hidden. i'm not sure how to work that out with the php DOM objects (specifically DOMDocument::loadHTML). there does not seem to be an ancestry/parent property like javascript has when parsing the DOM, so my mind is rather stuck on this. :(
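
For reference, PHP's DOM extension does expose ancestry: DOMXPath supports the XPath ancestor axis, so the enclosing elements of a field can be queried directly. The sketch below is one way this could look, under the assumption that only inline styles and the disabled attribute matter. The function name field_visibility and the shape of the returned array are hypothetical, and the substring test on the style is deliberately naive.

```php
<?php
// Sketch: for every named form field, report why it is hidden
// ('none', 'hidden' or '') and whether it carries a disabled attribute.
function field_visibility($html)
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // buffer parser warnings
    $dom->loadHTML($html);                        // $html must be non-empty
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath  = new DOMXPath($dom);
    $fields = array();
    $query  = '//input[@name] | //select[@name] | //textarea[@name] | //button[@name]';

    foreach ($xpath->query($query) as $field) {
        $reason = '';
        // ancestor::*[@style] selects every enclosing element with an inline style.
        foreach ($xpath->query('ancestor::*[@style]', $field) as $ancestor) {
            // Normalise "Visibility: Hidden" to "visibility:hidden" before testing.
            $style = strtolower(preg_replace('/\s+/', '', $ancestor->getAttribute('style')));
            if (strpos($style, 'display:none') !== false) {
                $reason = 'none';
                break;
            }
            if (strpos($style, 'visibility:hidden') !== false) {
                $reason = 'hidden';
                break;
            }
        }
        $fields[$field->getAttribute('name')] = array(
            'hidden'   => $reason,
            'disabled' => $field->hasAttribute('disabled'),
        );
    }
    return $fields;
}
```

The result is keyed by field name, so it could be merged into the per-element subarrays produced by the regex pass.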
 

Author Comment

by:intellisource
ID: 36507498
the issue is merely within this function, which is passed the element name and the form html tree:
function ishidden($name,$html) {
        $dom = new DOMDocument();
        @$dom->loadHTML($html);
        $divs = $dom->getElementsByTagName("div");
        foreach ($divs as $div) {
                $style = $div->getAttribute("style");
                if (preg_match("/display:none|visibility:hidden/ims",$style)) {
                        foreach ($div->childNodes as $child) {
                                if ($child->getAttribute("name")==$name) {
                                        return true;
                                }
                        }
                }
        }
        return false;
}

 

Accepted Solution

by:intellisource
ID: 36508721
okay.
after a business breakfast with the client, and an inspired discussion about getting this issue resolved (as in, yesterday) - i've located the PHP Simple HTML DOM Parser, which does in fact include a parent property on each DOM element! ;)
just figuring out how to include and use this API though... then it should take about 30 minutes to complete this function! :D thanks for the help guys...
 

Author Closing Comment

by:intellisource
ID: 36528000
have decided to go with the PHP Simple HTML DOM Parser, linked in this post to a resolution of the actual problem. ;)