Mike Miller asked:

Rapidshare Link Regular Expression

I am working on a project and the client wants to be able to check rapidshare links to tell if they are valid. So I need some help with a regular expression function for PHP that will pull all the rapidshare.com or rapidshare.de links out of the page source it gets as a variable, and return an array with them.

Does that make sense?
b0lsc0tt

qlogix,

You just need help getting the links with a regex, right? If so, please provide a sample of the HTML that you would get which contains the links.

Let me know if you have any questions or need more information.

b0lsc0tt
ASKER

Here is the example:
<td valign="top" class="postbody"><div class="postbody_div">
<hr />
<img src="http://i213.photobucket.com/albums/cc98/warezsharez_december/i-sound.gif" alt="Image" title="Image" border="0" />
<br />
 
<br />
 
<br />
<span style="font-weight:bold">i-Sound WMA MP3 Recorder</span> turn your computer into complete home recording studio. You can record streaming audio into MP3, OGG, WMA, APE, WAV format sound file directly without costing any other disk space. Built-in scheduler allows you to record streaming audio from specified URL at predefined time. VOX system automatically monitors the input source and activates streaming recording when the input volume reaches a specified level. The recording automatically stops once the audio level drops below a specified threshold. Typical applications:
 
<br />
 
<br />
    * Convert <span style="font-weight:bold">Cassette</span> or <span style="font-weight:bold">LP</span> to MP3
<br />
    * <span style="font-weight:bold">Record Radio</span> with built-in scheduler
 
<br />
    * Streaming Audio Recorder:<span style="font-weight:bold">Capture Streaming Audio</span> to MP3,WMA,OGG
<br />
    * Record Lectures with <span style="font-weight:bold">Voice-Activation</span>
<br />
    * Real-time noise reduction
<br />
    * <span style="font-weight:bold">Record Skype Calls</span> (both sides)
 
<br />
    * Record protected <span style="font-weight:bold">M4P, WMA</span> and <span style="font-weight:bold">AAC</span> files to MP3 format legally.
<br />
    * Record <span style="font-weight:bold">MIDI to MP3</span> format
<br />
 
<br />
 
<br />
<span style="font-size:18px; line-height:normal"><span style="font-weight:bold">Download:</span></span>
<br />
<table width="90%" cellspacing="1" cellpadding="3" border="0" align="center">
<tr>
<td><span class="genmed"><strong>Code:</strong></span></td>
</tr>
<tr>
<td class="code">http&#58;//rapidshare.com/files/80236549/iSound_MP3_WMA_Recorder_Pro_v6.8.2.0.rar</td></tr></table>
<br />
<span style="font-weight:bold"><span style="font-size:10px; line-height:normal">Link checked on Mon Jul 21, 2008 11:38 am [WBB_Linkchecker_Bot]</span></span></div></td>
 
</tr>
<tr>
<td height="40" valign="bottom" class="genmed"><br />_________________<br />Removed, Signature May Not Be Bigger Than 500 x 200 ~ ashmo ~
<br />
 
<br />
 
<br />
<span style="font-weight:bold">Please Download This As A Free User:</span>
<br />
<table width="90%" cellspacing="1" cellpadding="3" border="0" align="center">
<tr>
<td><span class="genmed"><strong>Code:</strong></span></td>
</tr>
 
<tr>
<td class="code">http&#58;//rapidshare.com/files/79526037/Thank_You.mp3</td></tr></table><span class="postdetails"></span></td>
</tr>
</table>


Thanks for the sample? A complex regex to identify ANY URL could be more than we need. Will all the URLs you want be http? Do you want all URLs, or just those in the anchor tag, just those in the table cell, or somewhere else? Will they all be rapidshare.com?

bol
Oops.  I didn't need a question mark on that first sentence.  Sorry. :)

bol
Yes, I only want the rapidshare.com links. They will all be http://; however, there may be either rapidshare.com or rapidshare.de links. I would like them to be matchable from anywhere in the source, since a lot of people don't place them in the "Code" tag on the forum.
OK.  Thanks.  I realize I overlooked the answer to one of my questions.  The code below should do it.

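// Match http:// (or the HTML-encoded http&#58;//) links to rapidshare.com or rapidshare.de, case-insensitively.
preg_match_all('/http(?::|&#58;)\/\/rapidshare\.(?:com|de)\/[-A-Z0-9.]+\/[-A-Z0-9+&@#\/%=~_|!:,.;]*/i', $subject, $result, PREG_PATTERN_ORDER);
// Index 0 holds the full matches; keep only those.
$result = $result[0];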

It assumes the HTML is in the variable $subject. It will then put the resulting array of matched links in a variable named $result. Let me know if you have a question or need more info.
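
Since the original question asked for a function that returns an array, the one-liner above could be wrapped roughly as follows. This is only a sketch: the function name `extract_rapidshare_links` is made up for illustration, and turning the `&#58;` entity back into `:` and dropping duplicate links are assumptions about what the caller wants, not something stated in the thread.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the thread):
// returns every rapidshare.com / rapidshare.de link found anywhere in $subject.
function extract_rapidshare_links(string $subject): array
{
    // Same pattern as in the post above, unchanged.
    $pattern = '/http(?::|&#58;)\/\/rapidshare\.(?:com|de)\/[-A-Z0-9.]+\/[-A-Z0-9+&@#\/%=~_|!:,.;]*/i';

    // preg_match_all() returns the number of matches, or false on error.
    if (!preg_match_all($pattern, $subject, $matches, PREG_PATTERN_ORDER)) {
        return []; // nothing matched, or the regex engine reported an error
    }

    // Assumption: callers want plain URLs, so "http&#58;//" becomes "http://".
    $links = str_replace('&#58;', ':', $matches[0]);

    // Assumption: duplicates are not useful; array_values() re-indexes from 0.
    return array_values(array_unique($links));
}

// Usage with one cell taken from the sample HTML above.
$html = '<td class="code">http&#58;//rapidshare.com/files/79526037/Thank_You.mp3</td>';
print_r(extract_rapidshare_links($html));
```

Note that the pattern expects the host to be exactly `rapidshare.com` or `rapidshare.de`, so a link written with a `www.` prefix would not be picked up without adjusting it.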

bol
SOLUTION
ddrudik
Umm, neither of those seems to work. I tested them on another site that is similar to my client's, and it said that there were no matches for either of them.

The page I tested on was: http://www.warez-bb.org/viewtopic.php?t=1061942
ASKER CERTIFIED SOLUTION
OK, that worked. Apparently my server is having a problem accessing the site. Thanks!!
Thanks for your help!!
You're welcome! If the problem doesn't go away when the site works again, then it may be the contents of that page. If the HTML differs a lot from page to page, the script may not find a match.
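
The two situations described here (the server cannot reach the page at all, versus the page loads but contains nothing the pattern matches) can be told apart in code. A rough sketch, assuming the page source is fetched with `file_get_contents` (the thread does not say how it is actually obtained); the function name `check_page_for_links` and the status strings are made up for illustration.

```php
<?php
// Hypothetical helper (name and status values are assumptions):
// reports whether the fetch failed, nothing matched, or links were found.
function check_page_for_links(string $url): array
{
    // file_get_contents() returns false on failure; "@" hides the warning it raises.
    // Fetching http:// URLs this way requires allow_url_fopen to be enabled.
    $subject = @file_get_contents($url);
    if ($subject === false) {
        return ['status' => 'fetch_failed', 'links' => []];
    }

    // Same pattern as in the accepted approach earlier in the thread.
    $pattern = '/http(?::|&#58;)\/\/rapidshare\.(?:com|de)\/[-A-Z0-9.]+\/[-A-Z0-9+&@#\/%=~_|!:,.;]*/i';
    $count = preg_match_all($pattern, $subject, $result, PREG_PATTERN_ORDER);

    // 0 means the page loaded but held no matching links; false means a regex error.
    if (!$count) {
        return ['status' => 'no_match', 'links' => []];
    }

    return ['status' => 'ok', 'links' => $result[0]];
}
```

A `fetch_failed` status points at the server or network, while `no_match` suggests the page's HTML is different from the sample the pattern was written against.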

I'm glad I could help.  Thanks for the grade, the points and the fun question.

bol