xeta_it
asked on
URGENT: Build multidimensional array from multiple preg_match_all().
This problem is a little difficult (at least for me) and also extremely urgent:
I'm parsing an HTML string from a fetched page with "preg_match_all()" several times.
preg_match_all("/[0-9]*\) <font/", $html, $id_matches); ==> ARRAY $id_matches containing ids: (1,2,3,4,5,6)
preg_match_all("/[a-z]*\) <font/", $html, $name_matches); ==> ARRAY $name_matches containing names: (dick,phil,jack,john,jim,bob)
preg_match_all("/[a-z]*\) <font/", $html, $adress_matches); ==> ARRAY $adress_matches containing addresses: (elm street,paper street,gotham street,lake drive,arlington road,downing street)
How do I build a multidimensional array from that?
Like this:
id | name | address
1 | dick | elm street
2 | phil | paper street
and so on...
Now, that code is not quite robust enough.
What happens if the source data is missing something?
<?php
$sHTMLPage = <<< END_HTML
<html>
<head>
<title>Some data</title>
</head>
<body>
<table border="0">
<tr>
<th>ID</th>
<th>Name</th>
<th>Address</th>
</tr>
<tr>
<th>1</th>
<td>John</td>
<td>123 Acacia Avenue<br>Longbridge<br>London</td>
</tr>
<tr>
<th>2</th>
<td>Paul</td>
<td>321 Acacia Avenue<br>Shortbridge<br>London</td>
</tr>
<tr>
<th>3</th>
<td>Ringo</td>
<td>999 Acacia Avenue<br>Underbridge<br>London</td>
</tr>
<tr>
<th>4</th>
<td>George</td>
<td>666 Acacia Avenue<br>Overbridge<br>London</td>
</tr>
<tr>
<th></th>
<td>Richard</td>
<td>1a Acacia Avenue<br>Nobridge<br>London</td>
</tr>
<tr>
<th>5</th>
<td></td>
<td>2b Acacia Avenue<br>Somebridge<br>London</td>
</tr>
<tr>
<th>6</th>
<td>Sally</td>
<td></td>
</tr>
</table>
</body>
</html>
END_HTML;
// Extract the ID, Name and Address.
preg_match_all('|<tr>.*<th>(\d{1,10})?</th>.*<td>(.*)?</td>.*<td>(.*)?</td>.*</tr>|simU', $sHTMLPage, $aMatches);
echo "<pre>\n";
print_r($aMatches);
echo "\n</pre>\n";
?>
That is a better example.
This way you will see that the arrays contain empty elements for the missing data. With your initial code, there would be no entry at all for a missing value, so a list could end up with only 6 entries rather than 7.
Richard.
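To illustrate the point about missing data, here is a minimal sketch. The two-row $html sample and the variable names $ids, $names and $rows are made up for this illustration and are not from the thread. It compares separate per-column calls with a single per-row pattern when one cell is empty.

```php
<?php
// Hypothetical two-row sample; the second row has an empty name cell.
$html = '<tr><th>1</th><td>John</td></tr><tr><th>2</th><td></td></tr>';

// Separate per-column patterns: the empty <td> produces no match at all,
// so the two result lists end up with different lengths.
preg_match_all('|<th>(\d+)</th>|', $html, $ids);
preg_match_all('|<td>([^<]+)</td>|', $html, $names);

// One pattern per row: an empty cell becomes an empty string,
// so every column keeps one entry per row.
preg_match_all('|<tr><th>(\d*)</th><td>(.*)</td></tr>|U', $html, $rows);

echo count($ids[1]), ' ids, ', count($names[1]), " names\n";   // separate calls
echo count($rows[1]), ' ids, ', count($rows[2]), " names\n";   // single call
```

With the separate calls there is no way to tell which row the missing name belonged to; with the single call the position in each column still identifies the row.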
ASKER
Okay, that answer is close to perfect...
The only problem I have is that the kind of array isn't exactly what I was looking for.
Maybe a misunderstanding.
The array must be able to build a table like this:
id | name | address
1 | dick | elm street
2 | phil | paper street
With the array built by your example, I can't get there.
I need to loop like this:
<table>
<?php foreach ($aMatches as $match) { ?>
<tr>
<td><?php echo $match['id']; ?></td><td><?php echo $match['name']; ?></td><td><?php echo $match['address']; ?></td>
</tr>
<?php } // foreach ?>
</table>
ASKER
So basically it would be like flipping the array 90 degrees onto its side.
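The "flipping" described here is a transpose: turning one array per column into one array per row. The accepted solution is not shown in this thread, so the following is only a sketch of one way to do it. The arrays $ids, $names and $addresses stand in for the three column arrays, and the keys 'id', 'name' and 'address' are assumed from the asker's loop. It uses the ?? operator, so it assumes PHP 7 or later.

```php
<?php
// Column-oriented data, shaped like the per-group arrays preg_match_all() returns.
// The values are taken from the asker's example table.
$ids       = ['1', '2'];
$names     = ['dick', 'phil'];
$addresses = ['elm street', 'paper street'];

// Build one associative array per row, using the id list to drive the loop.
// Missing entries in the other columns fall back to an empty string.
$rows = [];
foreach ($ids as $i => $id) {
    $rows[] = [
        'id'      => $id,
        'name'    => $names[$i] ?? '',
        'address' => $addresses[$i] ?? '',
    ];
}

print_r($rows);
```

Each element of $rows can then be used as $match['id'], $match['name'] and $match['address'] inside a foreach. If numeric keys are enough, array_map(null, $ids, $names, $addresses) performs the same transpose in one call.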
Ok.
Off to lunch. Back in 10.
ASKER CERTIFIED SOLUTION
ASKER
It was still a little complicated, but I got it.
Thanks a lot!!!
ASKER
Okay, one last short question...
In your regular expression you use the modifier U.
But it seems to be responsible for the loss of characters like & and @.
How do I get around this?
I need the @ sign for email addresses and the & sign for special characters.
U is the ungreedy setting.
<td>(.*)</td>
will capture from the inside of the FIRST <td> to the LAST </td>
Not normally what is required.
Instead ...
<td>(.*?)</td>
is used, but as my pattern would need a LOT of ?s, I use U as a modifier instead, which makes every quantifier ungreedy by default.
You should not be losing any characters.
Can you show your script and what you are looking at?
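A small sketch of the difference between the greedy form, the explicit ungreedy form, and the U modifier. The $html sample and the result variable names are invented for this illustration. It also checks that @ and & pass through the match untouched.

```php
<?php
// Hypothetical sample containing the characters the asker was worried about.
$html = '<td>a@example.com</td><td>Tom &amp; Jerry</td>';

// Default (greedy): runs from the first <td> to the last </td>.
preg_match_all('|<td>(.*)</td>|', $html, $greedy);

// Explicit ungreedy quantifier: stops at the nearest </td>.
preg_match_all('|<td>(.*?)</td>|', $html, $lazy);

// U modifier: same pattern as the greedy one, but ungreedy by default.
preg_match_all('|<td>(.*)</td>|U', $html, $ungreedy);

print_r($greedy[1]);
print_r($lazy[1]);
print_r($ungreedy[1]);
```

The second and third calls return the same two cells, with @ and &amp; intact. If characters seem to vanish when the result is echoed into a page, it may be worth checking the output through htmlspecialchars(), since the browser interprets raw & and < itself; that is a guess at the cause, not something the thread confirms.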
What happens if the regular expressions don't return the same number of parts for each check?
It is a LOT easier to do the whole regular expression in 1 hit.
From your code, though, I cannot see how they are working: the second and third regexps are identical, so they will give the same results.
So, a quick example.
<?php
$sHTMLPage = <<< END_HTML
<html>
<head>
<title>Some data</title>
</head>
<body>
<table border="0">
<tr>
<th>ID</th>
<th>Name</th>
<th>Address</th>
</tr>
<tr>
<th>1</th>
<td>John</td>
<td>123 Acacia Avenue<br>Longbridge<br>London</td>
</tr>
<tr>
<th>2</th>
<td>Paul</td>
<td>321 Acacia Avenue<br>Shortbridge<br>London</td>
</tr>
<tr>
<th>3</th>
<td>Ringo</td>
<td>999 Acacia Avenue<br>Underbridge<br>London</td>
</tr>
<tr>
<th>4</th>
<td>George</td>
<td>666 Acacia Avenue<br>Overbridge<br>London</td>
</tr>
</table>
</body>
</html>
END_HTML;
// Extract the ID, Name and Address.
preg_match_all('|<tr>.*<th
echo '<pre>';
print_r($aMatches);
echo '</pre>';
?>
Run this and the output is ...
Array
(
[0] => Array
(
[0] => <tr>
<th>ID</th>
<th>Name</th>
<th>Address</th>
</tr>
<tr>
<th>1</th>
<td>John</td>
<td>123 Acacia Avenue<br>Longbridge<br>London</td>
</tr>
[1] => <tr>
<th>2</th>
<td>Paul</td>
<td>321 Acacia Avenue<br>Shortbridge<br>London</td>
</tr>
[2] => <tr>
<th>3</th>
<td>Ringo</td>
<td>999 Acacia Avenue<br>Underbridge<br>London</td>
</tr>
[3] => <tr>
<th>4</th>
<td>George</td>
<td>666 Acacia Avenue<br>Overbridge<br>London</td>
</tr>
)
[1] => Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
)
[2] => Array
(
[0] => John
[1] => Paul
[2] => Ringo
[3] => George
)
[3] => Array
(
[0] => 123 Acacia Avenue<br>Longbridge<br>London
[1] => 321 Acacia Avenue<br>Shortbridge<br>London
[2] => 999 Acacia Avenue<br>Underbridge<br>London
[3] => 666 Acacia Avenue<br>Overbridge<br>London
)
)
$aMatches[1] is the array of IDs.
$aMatches[2] is the array of names.
$aMatches[3] is the array of addresses.
Richard.
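The output above is grouped by capture group, which is the default PREG_PATTERN_ORDER behaviour of preg_match_all(). If one entry per table row is wanted instead, the PREG_SET_ORDER flag is one option. The sketch below uses a cut-down, made-up $html sample and a simplified pattern, not the pattern from the post above.

```php
<?php
// Hypothetical cut-down sample of the table rows (no whitespace between tags).
$html = '<tr><th>1</th><td>John</td><td>123 Acacia Avenue</td></tr>'
      . '<tr><th>2</th><td>Paul</td><td>321 Acacia Avenue</td></tr>';

// PREG_SET_ORDER groups the captures per match (per row)
// instead of per capture group (per column).
preg_match_all(
    '|<tr><th>(\d*)</th><td>(.*)</td><td>(.*)</td></tr>|U',
    $html,
    $sets,
    PREG_SET_ORDER
);

// Each $set holds the full match at index 0 and the captures at 1..3.
foreach ($sets as $set) {
    echo $set[1], ' | ', $set[2], ' | ', $set[3], "\n";
}
```

Here $sets[0][2] is the name from the first row and $sets[1][3] is the address from the second, so a single foreach can emit one table row per element.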