Solved

PHP Regex to get link and size...

Posted on 2009-04-23
Last Modified: 2013-12-13
Hello!

I need to open an HTML page, read the file, and extract two things from _many_ blocks that look like this:

<tr class="text">
<td><a href="http://media4.something.com/DPliana/something/midband/DPliana_4.mp4">Download</a></td>
<td align="left"><span class="text">something here</span></td>
<td align="center">82 MB</td>
</tr>


I *only* want the links whose text is "Download". Note that there are plenty of other links on the page, both with the text 'Download' and without. Lastly, I need the file size, which will always be in the same _place_ as above.

So the above should give me an array:
someArray ( [link1] => "http://media4.something.com/DPliana/something/midband/DPliana_4.mp4"  [size1] =>  "82 MB")
Question by:lopband

Accepted Solution

by:
jumanj1 earned 1400 total points
ID: 24221399
If you are not looking for a strictly regex solution, check my attached code. It's dirty, but it should work.
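<?php

// Collected download URLs and their matching file sizes (parallel arrays).
$the_urls = array();
$the_size = array();

// Set to 1 once a "Download" link has been seen and its size cell is still pending.
$start_size_line_counter = 0;
// Number of non-empty lines seen since (and including) the "Download" line.
$size_line_counter = 0;

$handle = fopen("inputfile.html", "r");
if ($handle === false) {
    die("Could not open inputfile.html");
}

// Read line by line; fgets() returns false at end of file.
while (($buffer = fgets($handle)) !== false) {
    // A line that contains a link whose text is "Download".
    if (strpos($buffer, "Download</a>") !== false) {
        $explode1 = explode("href=\"", $buffer);
        if (isset($explode1[1])) {
            $explode2 = explode("\">Download", $explode1[1]);
            $the_urls[] = $explode2[0];
            $start_size_line_counter = 1;
        }
    }

    if ($start_size_line_counter == 1 && trim($buffer) != "") {
        if ($size_line_counter <= 1) {
            // Skip the link line and the description line.
            $size_line_counter++;
        } else {
            // Third non-empty line, e.g. <td align="center">82 MB</td>
            $explode3 = explode(">", $buffer);
            $explode4 = explode("<", isset($explode3[1]) ? $explode3[1] : "");
            $the_size[] = $explode4[0];
            $start_size_line_counter = 0;
            $size_line_counter = 0;
        }
    }
}
fclose($handle);

for ($i = 0; $i < count($the_size); $i++) {
    echo $the_urls[$i] . " " . $the_size[$i] . "<br>";
}

//print_r($the_size);
?>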
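
Since the question asks for a regex, here is a minimal sketch of how the same extraction might look with preg_match_all. It is not part of the accepted answer. The helper name extractDownloads() is hypothetical, and the pattern assumes every wanted row has exactly the three-cell shape shown in the question (link cell, description cell, size cell); a "Download" link outside such a row is ignored.

```php
<?php

/**
 * Extract link/size pairs from rows shaped like the sample in the question.
 * extractDownloads() is a hypothetical helper name used for illustration.
 */
function extractDownloads(string $html): array
{
    $pattern = '~<tr[^>]*>\s*'
             // Cell 1: a link whose text is exactly "Download"; capture the href.
             . '<td[^>]*>\s*<a\s[^>]*href="([^"]+)"[^>]*>\s*Download\s*</a>\s*</td>\s*'
             // Cell 2: description, skipped without running past its own </td>.
             . '<td[^>]*>(?:(?!</td>).)*</td>\s*'
             // Cell 3: plain-text size such as "82 MB"; capture it trimmed.
             . '<td[^>]*>\s*([^<]+?)\s*</td>'
             . '~is';

    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    // Build the link1/size1, link2/size2, ... layout requested in the question.
    $result = array();
    foreach ($matches as $n => $m) {
        $result['link' . ($n + 1)] = $m[1];
        $result['size' . ($n + 1)] = $m[2];
    }
    return $result;
}

// Sample row copied from the question.
$html = <<<'HTML'
<tr class="text">
<td><a href="http://media4.something.com/DPliana/something/midband/DPliana_4.mp4">Download</a></td>
<td align="left"><span class="text">something here</span></td>
<td align="center">82 MB</td>
</tr>
HTML;

// For the sample row this yields link1 => the .mp4 URL and size1 => "82 MB".
print_r(extractDownloads($html));
```

For a real page, the input could come from file_get_contents("inputfile.html") instead of the inline sample. Regexes over HTML are brittle, so if the markup varies (extra cells, nested tables, single-quoted attributes) the pattern would need adjusting.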


Author Comment

by:lopband
ID: 24221410
That does work pretty well, but I was looking for a regex solution...

If nobody else posts, I'll give you the points.

Thanks!

Author Comment

by:lopband
ID: 24221465
OK, it seems to be working exactly as planned, so I'll give you the points. Cheers!

Author Closing Comment

by:lopband
ID: 31574045
thanks!