PHP Regex help.

Hi there.

Data example:

							<table class="table-gradient">
								<thead>
									<tr>
										<th scope="col"><a href="" style="text-decoration: none; color: #FFF;">NICKNAME</a></th>
										<th scope="col"><a href="gm_decisions.php?sort=count&dir=DESC" style="text-decoration: none; color: #FFF;">GM</a></th>
										<th scope="col"><a href="gm_decisions.php?sort=date_decided&dir=DESC" style="text-decoration: none; color: #FFF;">Date Decided</a></th>
										<th scope="col"><a href="gm_decisions.php?sort=decision_id&dir=DESC" style="text-decoration: none; color: #FFF;">Decision ID</a></th>
										<th scope="col">Category</th>
										<th scope="col">Decision</th>
									</tr>
								</thead>

								<tbody>
									
																<tr>
									<td><a href="playerview.php?account_id=5640930">lHadesl</a></td>
									<td><a href="gm_decisions.php?searchType=gm&search=Rejanu">Rejanu</a></td>
									<td>03-23-12 17:42</td>
									<td>146152</td>
									<td>Excessive Verbal Abuse</td>
									<td><a href="javascript:void(0);" >Guilty</a></td>
								</tr>
									
																<tr>
									<td><a href="playerview.php?account_id=3012910">Mezmerise</a></td>
									<td><a href="gm_decisions.php?searchType=gm&search=Rejanu">Rejanu</a></td>
									<td>03-24-12 11:50</td>
									<td>145933</td>
									<td>Excessive Verbal Abuse</td>
									<td><a href="javascript:void(0);" >Innocent</a></td>
								</tr>

										
								</tbody>
							</table>

I have an array of dates. I need a regex that loops through the HTML source and checks whether any <tr> </tr> row in the table contains a date that is in the dates array. For each row that does, I need to store these fields in an array: Date Decided, Category, Decision.

Thank you all for the help.
Asked by: mropenmind

mropenmind (Author) commented:
Date Array Example:

Array
(
    [0] => 2012-03-23
    [1] => 2012-03-24
    [2] => 2012-03-25
    [3] => 2012-03-26
    [4] => 2012-03-27
    [5] => 2012-03-28
    [6] => 2012-03-29
    [7] => 2012-03-30
    [8] => 2012-03-31
    [9] => 2012-04-01
    [10] => 2012-04-02
    [11] => 2012-04-03
    [12] => 2012-04-04
    [13] => 2012-04-05
)

mropenmind (Author) commented:
There are 2 matches in the data example, so both need to be added to the array.

As you can see, the table contains both the date and the time (<td>03-23-12 17:42</td>), but I only need the date, not the time.

Data1: 03-23-12,Excessive Verbal Abuse,Guilty
Data2: 03-24-12,Excessive Verbal Abuse,Innocent
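
The date array uses the Y-m-d form (2012-03-23), while the table cells use m-d-y followed by a time (03-23-12 17:42), so the two cannot be compared as plain strings. A minimal sketch of one way to bridge that, assuming PHP 7.1 or later. The function name cell_to_iso_date is made up for illustration and is not from the thread:

```php
<?php
// Hypothetical helper: turn a table cell such as "03-23-12 17:42"
// into "2012-03-23", dropping the time. Returns null if the cell
// does not parse with the expected format.
function cell_to_iso_date(string $cell): ?string
{
    // 'm-d-y H:i' = two-digit month, day and year, then a 24-hour time.
    $dt = DateTime::createFromFormat('m-d-y H:i', trim($cell));
    return $dt === false ? null : $dt->format('Y-m-d');
}

// Shortened copy of the date array from the question.
$dates = array('2012-03-23', '2012-03-24', '2012-03-25');

$iso = cell_to_iso_date('03-23-12 17:42');
var_dump($iso);                          // string(10) "2012-03-23"
var_dump(in_array($iso, $dates, true));  // bool(true)
```

Once the cell is in the same form as the array entries, a strict in_array() check is enough to decide whether the row is a match.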

Ray Paseur commented:
The test data has no intersection, if I understand the question correctly.  It looks like the HTML dates are March 21 and the date array example starts with March 23.

I'll try changing the data a little bit and see if that can produce a reasonable test case.

mropenmind (Author) commented:
Thanks. I noticed my error earlier and have corrected it.

Terry Woods (IT Guru) commented:
A regex isn't really the best tool for parsing HTML, but when I tried using a DOM parser I just got an error. The following extracts the data in a useful manner, but it's not exactly tidy - depending on how tidy the code needs to be, it may be enough:

$tableBody = preg_replace("#^.*<tbody>(.*?)</tbody>.*$#is", "$1", $text);
print "Table body: $tableBody\n";
preg_match_all("#<tr>(?:(?!</tr>).)*?<td>(.*?)</td>(?:(?!</tr>).)*?<td>(.*?)</td>(?:(?!</tr>).)
*?<td>(.*?)</td>(?:(?!</tr>).)*?<td>(.*?)</td>(?:(?!</tr>).)*?<td>(.*?)</td>(?:(?!</tr>).)*?<td
>(.*?)</td>#si", $tableBody, $matches);
print_r($matches);
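
For comparison, here is a rough sketch of the DOM route using PHP's DOMDocument and DOMXPath. The thread does not say which error the DOM parser produced. One possible cause (an assumption) is the warnings libxml can raise on markup like this, for example for the unescaped & in the href values; libxml_use_internal_errors(true) collects those instead of emitting them. The variable names are illustrative and the sample rows are copied from the question:

```php
<?php
// Two rows from the question; in practice $html would hold the whole page.
$html = <<<HTM
<table class="table-gradient">
<tbody>
<tr>
<td><a href="playerview.php?account_id=5640930">lHadesl</a></td>
<td><a href="gm_decisions.php?searchType=gm&search=Rejanu">Rejanu</a></td>
<td>03-23-12 17:42</td>
<td>146152</td>
<td>Excessive Verbal Abuse</td>
<td><a href="javascript:void(0);" >Guilty</a></td>
</tr>
<tr>
<td><a href="playerview.php?account_id=3012910">Mezmerise</a></td>
<td><a href="gm_decisions.php?searchType=gm&search=Rejanu">Rejanu</a></td>
<td>03-24-12 11:50</td>
<td>145933</td>
<td>Excessive Verbal Abuse</td>
<td><a href="javascript:void(0);" >Innocent</a></td>
</tr>
</tbody>
</table>
HTM;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$rows  = array();

// Visit every body row of the decisions table.
foreach ($xpath->query('//table[@class="table-gradient"]/tbody/tr') as $tr) {
    $cells = array();
    // textContent returns the cell text with the inner <a> tags removed.
    foreach ($xpath->query('td', $tr) as $td) {
        $cells[] = trim($td->textContent);
    }
    // Keep only complete rows (six columns in this table).
    if (count($cells) === 6) {
        $rows[] = $cells;
    }
}

print_r($rows);
```

Each entry of $rows then holds the six cell values of one row in column order, so index 2 is the date, index 4 the category and index 5 the decision.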

Ray Paseur commented:
You can see the script in action here.
http://www.laprbass.com/RAY_temp_mropenmind.php

But I would like to suggest that you take a moment (perhaps post another question here at EE) to start a conversation about data design patterns.  Parsing HTML is a really brittle approach to data gathering.  It may work once when it is first written and tested, but if the publisher of the HTML makes any changes, you're screwed.  For this reason many publishers expose an API and render either XML or JSON strings.  If you could get this data from a formal interface (API interfaces are almost always version-numbered and are not published until they are stable) you would be better off.

One other note -- be sure that your method of access to the web site does not violate the terms of service or the copyright notice.  Some sites explicitly disallow automated access to their web pages.  If you violate their terms of service you can be sued successfully and you may wind up with a huge legal bill.  It's not worth this risk, so be careful to check and ensure that you're in squeaky-clean compliance with the TOS.
<?php // RAY_temp_mropenmind.php
error_reporting(E_ALL);
echo "<pre>";

// REQUIRED SINCE PHP 5.1+
date_default_timezone_set('America/New_York');


// TEST DATA FROM THE POST AT EE
$htm = <<<HTM
							<table class="table-gradient">
								<thead>
									<tr>
										<th scope="col"><a href="" style="text-decoration: none; color: #FFF;">NICKNAME</a></th>
										<th scope="col"><a href="gm_decisions.php?sort=count&dir=DESC" style="text-decoration: none; color: #FFF;">GM</a></th>
										<th scope="col"><a href="gm_decisions.php?sort=date_decided&dir=DESC" style="text-decoration: none; color: #FFF;">Date Decided</a></th>
										<th scope="col"><a href="gm_decisions.php?sort=decision_id&dir=DESC" style="text-decoration: none; color: #FFF;">Decision ID</a></th>
										<th scope="col">Category</th>
										<th scope="col">Decision</th>
									</tr>
								</thead>

								<tbody>

																<tr>
									<td><a href="playerview.php?account_id=5640930">lHadesl</a></td>
									<td><a href="gm_decisions.php?searchType=gm&search=Rejanu">Rejanu</a></td>
									<td>03-21-12 17:42</td>
									<td>146152</td>
									<td>Excessive Verbal Abuse</td>
									<td><a href="javascript:void(0);" >Guilty</a></td>
								</tr>

																<tr>
									<td><a href="playerview.php?account_id=3012910">Mezmerise</a></td>
									<td><a href="gm_decisions.php?searchType=gm&search=Rejanu">Rejanu</a></td>
	<!-- CHANGE HERE -->			<td>04-01-12 11:50</td>
									<td>145933</td>
									<td>Excessive Verbal Abuse</td>
									<td><a href="javascript:void(0);" >Innocent</a></td>
								</tr>


								</tbody>
							</table>
HTM;

// FUNCTION TO RETURN AN ARRAY OF DATES
function array_of_dates($alpha='Today', $omega='Today')
{
    // MIGHT WANT TO ADD SOME SANITY CHECKS HERE
    $out = array();
    $alpha = date('Y-m-d', strtotime($alpha));
    $omega = date('Y-m-d', strtotime($omega));
    while($alpha <= $omega)
    {
        $out[] = $alpha;
        $alpha = date('Y-m-d', strtotime($alpha . ' + 1 DAY'));
    }
    return $out;
}


// GET SOMETHING TO TEST WITH (YEAR GIVEN EXPLICITLY; WITHOUT IT strtotime() ASSUMES THE CURRENT YEAR)
$dts = array_of_dates('March 23, 2012', 'April 5, 2012');

// BREAK THE HTML INTO TABLE-ROWS
$trs = explode('<tr>', $htm);

// TEST EACH TABLE ROW
foreach ($trs as $tr)
{
    // TEST AGAINST EACH DATE
    foreach ($dts as $dt)
    {
        // IF THIS DATE IS PRESENT
        $test_date = date('m-d-y', strtotime($dt));
        if (strpos($tr, $test_date) !== false)
        {
            // ISOLATE THE DATA ELEMENTS
            // var_dump($tr);
            $tds = explode('<td>', $tr);

            // SHOW THE INFORMATION WE FOUND
            foreach ($tds as $td)
            {
                $td = trim($td);
                echo PHP_EOL . strip_tags($td);
            }
        }
        else continue;
    }
}

Best of luck with your project, ~Ray

mropenmind (Author) commented:
Mezmerise
Rejanu
      
04-01-12 11:50
145933
Excessive Verbal Abuse
Innocent

Why is there a blank line in the results? And how do I add the results to an array and then print them out?

mropenmind (Author) commented:
I only need
date (without time) = 04-01-12
Category = Excessive Verbal Abuse
Decision = Innocent
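
One way to collect just those three fields, sketched on top of Ray's explode() approach. The helper name collect_decisions and the array keys are hypothetical. The sketch also calls strip_tags() before trim(): Ray's loop trims first, so the whitespace next to the <!-- CHANGE HERE --> comment is left behind once the comment is stripped, and that is what shows up as the blank line in the output.

```php
<?php
// Hypothetical helper (not from the thread): returns one array per table
// row whose date is in $dts, holding only date, category and decision.
function collect_decisions(string $htm, array $dts): array
{
    // Convert the 'Y-m-d' dates to the table's 'm-d-y' form and use them as keys.
    $wanted = array();
    foreach ($dts as $dt) {
        $wanted[date('m-d-y', strtotime($dt))] = true;
    }

    $found = array();
    foreach (explode('<tr>', $htm) as $tr) {
        // Piece 0 is whatever precedes the first <td>; pieces 1..6 are the columns.
        $tds = explode('<td>', $tr);
        if (count($tds) < 7) {
            continue; // header row, or markup outside the data rows
        }
        // Strip tags (and HTML comments) first, then trim what is left.
        $tds  = array_map('trim', array_map('strip_tags', $tds));
        $date = substr($tds[3], 0, 8); // "04-01-12 11:50" -> "04-01-12"
        if (isset($wanted[$date])) {
            $found[] = array(
                'date'     => $date,
                'category' => $tds[5],
                'decision' => $tds[6],
            );
        }
    }
    return $found;
}

// Two rows using the dates from Ray's test data.
$htm = <<<HTM
<table><tbody>
<tr>
<td><a href="playerview.php?account_id=5640930">lHadesl</a></td>
<td><a href="gm_decisions.php?searchType=gm&search=Rejanu">Rejanu</a></td>
<td>03-21-12 17:42</td>
<td>146152</td>
<td>Excessive Verbal Abuse</td>
<td><a href="javascript:void(0);" >Guilty</a></td>
</tr>
<tr>
<td><a href="playerview.php?account_id=3012910">Mezmerise</a></td>
<td><a href="gm_decisions.php?searchType=gm&search=Rejanu">Rejanu</a></td>
<!-- CHANGE HERE --> <td>04-01-12 11:50</td>
<td>145933</td>
<td>Excessive Verbal Abuse</td>
<td><a href="javascript:void(0);" >Innocent</a></td>
</tr>
</tbody></table>
HTM;

$found = collect_decisions($htm, array('2012-03-31', '2012-04-01', '2012-04-02'));
foreach ($found as $row) {
    echo implode(',', $row), PHP_EOL; // 04-01-12,Excessive Verbal Abuse,Innocent
}
```

Only the second row is returned here, because 03-21-12 is not in the date list passed in.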

mropenmind (Author) commented:
The empty line was there because of the <!-- CHANGE HERE --> comment.

Terry Woods (IT Guru) commented:
$tableBody = preg_replace("#^.*<tbody>(.*?)</tbody>.*$#is", "$1", $text);
preg_match_all("#<tr>(?:(?!</tr>).)*?<td>(.*?)</td>(?:(?!</tr>).)*?<td>(.*?)</td>(?:(?!</tr>).)
*?<td>(.*?)</td>(?:(?!</tr>).)*?<td>(.*?)</td>(?:(?!</tr>).)*?<td>(.*?)</td>(?:(?!</tr>).)*?<td
>(.*?)</td>#si", $tableBody, $matches);
foreach($matches[1] as $num=>$value) {
  print "Date: ".preg_replace("/ .*/","",$matches[3][$num])."\n";
  print "Category: ".$matches[5][$num]."\n";
  print "Decision: ".strip_tags($matches[6][$num])."\n";
}

Output:
Date: 03-21-12
Category: Excessive Verbal Abuse
Decision: Guilty
Date: 03-21-12
Category: Excessive Verbal Abuse
Decision: Innocent

mropenmind (Author) commented:
Where did you put that code?

Terry Woods (IT Guru) commented:
Tested it on a linux server. Actually the line breaks might cause trouble - corrected version here:

$tableBody = preg_replace("#^.*<tbody>(.*?)</tbody>.*$#is", "$1", $text);
preg_match_all("#<tr>(?:(?!</tr>).)*?<td>(.*?)</td>(?:(?!</tr>).)*?<td>(.*?)</td>(?:(?!</tr>).)*?<td>(.*?)</td>(?:(?!</tr>).)*?<td>(.*?)</td>(?:(?!</tr>).)*?<td>(.*?)</td>(?:(?!</tr>).)*?<td>(.*?)</td>#si", $tableBody, $matches);
foreach($matches[1] as $num=>$value) {
  print "Date: ".preg_replace("/ .*/","",$matches[3][$num])."\n";
  print "Category: ".$matches[5][$num]."\n";
  print "Decision: ".strip_tags($matches[6][$num])."\n";
}
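
The snippet above prints every row. To keep only the rows whose date appears in the date array, the date cell has to be brought into the array's Y-m-d form first. A sketch under a few assumptions: the pattern is the same one as above, built with str_repeat() only for readability; the two rows use the dates from Ray's test data; the date array is shortened; and every year is assumed to be 20xx.

```php
<?php
// Two sample rows, reduced to what the pattern needs.
$text = <<<HTM
<tbody>
<tr>
<td><a href="playerview.php?account_id=5640930">lHadesl</a></td>
<td><a href="gm_decisions.php?searchType=gm&search=Rejanu">Rejanu</a></td>
<td>03-21-12 17:42</td>
<td>146152</td>
<td>Excessive Verbal Abuse</td>
<td><a href="javascript:void(0);" >Guilty</a></td>
</tr>
<tr>
<td><a href="playerview.php?account_id=3012910">Mezmerise</a></td>
<td><a href="gm_decisions.php?searchType=gm&search=Rejanu">Rejanu</a></td>
<td>04-01-12 11:50</td>
<td>145933</td>
<td>Excessive Verbal Abuse</td>
<td><a href="javascript:void(0);" >Innocent</a></td>
</tr>
</tbody>
HTM;

// Shortened date array in the question's Y-m-d form.
$dates = array('2012-03-23', '2012-03-24', '2012-04-01');

// One cell: skip ahead without leaving the current row, then capture a <td>.
$cell    = '(?:(?!</tr>).)*?<td>(.*?)</td>';
$pattern = '#<tr>' . str_repeat($cell, 6) . '#si';

// PREG_SET_ORDER gives one array per row: $m[1]..$m[6] are the six cells.
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

$results = array();
foreach ($matches as $m) {
    $dateCell = trim($m[3]);
    // "04-01-12 11:50" -> "2012-04-01" (assumes the century is 20xx).
    $iso = preg_replace('/^(\d\d)-(\d\d)-(\d\d).*$/', '20$3-$1-$2', $dateCell);
    if (in_array($iso, $dates, true)) {
        $results[] = array(
            'date'     => substr($dateCell, 0, 8), // date without the time
            'category' => trim($m[5]),
            'decision' => trim(strip_tags($m[6])),
        );
    }
}

print_r($results);
```

With this sample only the 04-01-12 row ends up in $results, since 2012-03-21 is not in the date array.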

mropenmind (Author) commented:
I can't seem to find the correct place where to place your latest code.

Terry Woods (IT Guru) commented:
Just put the HTML source into $text first, and it should work.

Oh, and Ray, data design patterns would make a great subject for an article...

Ray Paseur commented:
@mropenmind: Have you ever taken a class in PHP programming?  If not, you might want to consider it.  Many community colleges offer PHP classes, and there are user groups (that offer code reviews) in the major cities.  This will give you some structured learning about PHP and it will make your learning process faster and much, much easier.

If you cannot find those kinds of learning resources, run (don't walk) to buy this book and give yourself a month to read, absorb, and work through the examples.  It will not make you a pro, but it will put you light years ahead in the quest to do things with PHP.
http://www.sitepoint.com/books/phpmysql4/

Once you have completed the SitePoint book you will never again feel like you brought a spork to a knife fight!

Ray Paseur commented:
@TerryAtOpus:

;-)

Thanks, ~Ray

mropenmind (Author) commented:
I used your code the way TerryAtOpus said, but I couldn't find a way to make it work with the date array.

mropenmind (Author) commented:
That's why I posted: "I can't seem to find the correct place where to place your latest code."

Terry Woods (IT Guru) commented:
Are you confusing my code with Ray's? We both posted completely independent solutions. My code should give the output I posted with just the HTML source.

mropenmind (Author) commented:
Oh yea, I actually am...

Pierre François (Senior consultant) commented: