Solved

Regex find PHP

Posted on 2012-04-12
Last Modified: 2012-04-13
Ok,

I'm trying to make a page that lets visitors like our Facebook fan page and receive free goods in exchange.

I'm using PHP to extract the number of likes. $content includes the following code:
<div class="fsm fwn fcg">2,704,886 likes · 80,715 talking about this</div>


The value I'm trying to extract is 2,704,886. Any ideas what regex pattern I need to achieve this?
Question by:GVNPublic123
LVL 34

Accepted Solution

by:
gr8gonzo earned 500 total points
ID: 37837175
$str = '<div class="fsm fwn fcg">2,704,886 likes · 80,715 talking about this</div>';
if(preg_match("/([0-9]+) likes/",str_replace(",","",$str),$matches))
{
   echo $matches[1];
}

Author Comment

by:GVNPublic123
ID: 37837251
Actually, I just found out I can't file_get_contents the Facebook page, as it returns an "incompatible browser" message. I guess they have protected themselves from scrapers.

Any idea how I could access that info? Maybe an API?
 
LVL 34

Expert Comment

by:gr8gonzo
ID: 37837279
If it's a public page, you could try to use cURL to fetch it while sending a normal browser's user agent. Use this example from the PHP manual page:

<?php 

$GVNPublic123 = new cURL(); 
$html = $GVNPublic123->get('http://www.facebook.com/yourpagetoscrape'); 

// Do your scraping on $html here
// $str = '<div class="fsm fwn fcg">2,704,886 likes · 80,715 talking about this</div>';
$str = $html;
if(preg_match("/([0-9]+) likes/",str_replace(",","",$str),$matches))
{
   echo $matches[1];
}

class cURL { 
public $headers = array(); 
public $user_agent; 
public $compression; 
public $cookie_file; 
public $cookies; 
public $proxy; 
// Use __construct: a method named after the class is not a constructor in PHP 8.
function __construct($cookies=TRUE,$cookie='cookies.txt',$compression='gzip',$proxy='') { 
$this->headers[] = 'Accept: image/gif, image/x-bitmap, image/jpeg, image/pjpeg'; 
$this->headers[] = 'Connection: Keep-Alive'; 
$this->headers[] = 'Content-type: application/x-www-form-urlencoded;charset=UTF-8'; 
$this->user_agent = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)'; 
$this->compression=$compression; 
$this->proxy=$proxy; 
$this->cookies=$cookies; 
if ($this->cookies == TRUE) $this->cookie($cookie); 
} 
function cookie($cookie_file) { 
// Create the cookie file if it does not exist yet.
if (!file_exists($cookie_file)) { 
$handle = fopen($cookie_file,'w') or $this->error('The cookie file could not be opened. Make sure this directory has the correct permissions'); 
// Close the file handle itself (the original passed the file name to fclose).
fclose($handle); 
} 
$this->cookie_file=$cookie_file; 
} 
function get($url) { 
$process = curl_init($url); 
curl_setopt($process, CURLOPT_HTTPHEADER, $this->headers); 
curl_setopt($process, CURLOPT_HEADER, 0); 
curl_setopt($process, CURLOPT_USERAGENT, $this->user_agent); 
if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEFILE, $this->cookie_file); 
if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEJAR, $this->cookie_file); 
curl_setopt($process,CURLOPT_ENCODING , $this->compression); 
curl_setopt($process, CURLOPT_TIMEOUT, 30); 
if ($this->proxy) curl_setopt($process, CURLOPT_PROXY, $this->proxy); 
curl_setopt($process, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($process, CURLOPT_FOLLOWLOCATION, 1); 
$return = curl_exec($process); 
curl_close($process); 
return $return; 
} 
function post($url,$data) { 
$process = curl_init($url); 
curl_setopt($process, CURLOPT_HTTPHEADER, $this->headers); 
curl_setopt($process, CURLOPT_HEADER, 1); 
curl_setopt($process, CURLOPT_USERAGENT, $this->user_agent); 
if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEFILE, $this->cookie_file); 
if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEJAR, $this->cookie_file); 
curl_setopt($process, CURLOPT_ENCODING , $this->compression); 
curl_setopt($process, CURLOPT_TIMEOUT, 30); 
if ($this->proxy) curl_setopt($process, CURLOPT_PROXY, $this->proxy); 
curl_setopt($process, CURLOPT_POSTFIELDS, $data); 
curl_setopt($process, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($process, CURLOPT_FOLLOWLOCATION, 1); 
curl_setopt($process, CURLOPT_POST, 1); 
$return = curl_exec($process); 
curl_close($process); 
return $return; 
} 
function error($error) { 
echo "<center><div style='width:500px;border: 3px solid #FFEEFF; padding: 3px; background-color: #FFDDFF;font-family: verdana; font-size: 10px'><b>cURL Error</b><br>$error</div></center>"; 
die; 
} 
} 
?> 


Author Comment

by:GVNPublic123
ID: 37837379
Yep, feeding it the user agent and other headers via cURL did the trick. However, the regex doesn't work for me.
 
LVL 34

Expert Comment

by:gr8gonzo
ID: 37837506
It is probably just an issue with the content being on multiple lines. The regex I gave you was for a single line of HTML while the cURL class is returning the whole page of HTML.

What is the URL?
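For a whole page of HTML, one option is to anchor the pattern on the div's class attribute so that only the like counter can match. This is only a minimal sketch: the function name extract_like_count is hypothetical, and it assumes the markup still looks like the snippet in the question.

```php
<?php
// Hypothetical helper (not from the thread): pull the like count out of a full HTML page.
function extract_like_count(string $html): ?int
{
    // Digits with optional thousands separators, directly inside the
    // "fsm fwn fcg" div from the question and followed by " likes".
    $pattern = '/<div class="fsm fwn fcg">\s*([0-9][0-9,]*)\s+likes/';
    if (preg_match($pattern, $html, $matches)) {
        // Strip the commas only from the captured number, not from the whole page.
        return (int) str_replace(',', '', $matches[1]);
    }
    return null; // markup changed or no counter on the page
}

// Sample input assembled from the snippet in the question.
$html = "Lots of\nRandom Stuff\n"
      . '<div class="fsm fwn fcg">2,704,886 likes · 80,715 talking about this</div>'
      . "\nEven more stuff";

var_dump(extract_like_count($html)); // int(2704886)
```

Removing the commas after matching, rather than from the entire page first, keeps the rest of the HTML untouched and avoids accidentally joining unrelated numbers.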
LVL 10

Expert Comment

by:pfrancois
ID: 37837563
0
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 37837867
Sometimes it's easier to use explode() than regex, but regex is good for tidying up the results string.
<?php // RAY_temp_gvn.php
error_reporting(E_ALL);

$doc = <<<DOC
Lots of
Random Stuff
// READ FROM THE WEB SITE WITH A CURL REQUEST...
<div class="fsm fwn fcg">2,704,886 likes · 80,715 talking about this</div>
Even more stuff
DOC;

$arr = explode('fsm fwn fcg', $doc);
$arr = explode('likes', $arr[1]);
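// Strip everything except digits; use '' rather than NULL as the replacement (NULL is deprecated in PHP 8.1+).
$num = preg_replace('/[^0-9]/', '', $arr[0]);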
var_dump($num);

http://www.laprbass.com/RAY_temp_gvn.php
0
 

Author Comment

by:GVNPublic123
ID: 37837963
Yeah, exploding totally made it easier, as there's only one match now.

Do you guys have any idea how I could get the profile picture (in small format) so I can make a nice widget with it? Then I detect when the like is done on the Like button via the JS API, and that's it (I already know how to do that).

Now I just need an image to go with the like count.
 

Author Comment

by:GVNPublic123
ID: 37838189
This is the code of the image:
class="scaledImageFitWidth img" src="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-snc4/41581_82061850555_1078443985_n.jpg"

 

Author Comment

by:GVNPublic123
ID: 37838622
I'd like to extract the image's code with a regex, but I don't know how to write the pattern.
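One possible pattern, as a rough sketch only: match the class attribute from the snippet above and capture whatever sits inside the src quotes that follow it. The function name extract_profile_image is hypothetical, and the pattern assumes the class and src attributes appear in exactly that order, separated only by whitespace, so it will break if Facebook changes the markup.

```php
<?php
// Hypothetical helper (not from the thread): grab the profile image URL from the page HTML.
function extract_profile_image(string $html): ?string
{
    // Capture everything between the quotes of the src attribute that
    // immediately follows the class attribute shown in the thread.
    $pattern = '/class="scaledImageFitWidth img"\s+src="([^"]+)"/';
    if (preg_match($pattern, $html, $matches)) {
        // Decode entities such as &amp; that may appear inside attribute values.
        return html_entity_decode($matches[1]);
    }
    return null; // attribute order or class name differs from the assumption
}

// Sample input copied from the snippet posted in the thread.
$html = 'class="scaledImageFitWidth img" src="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-snc4/41581_82061850555_1078443985_n.jpg"';

echo extract_profile_image($html), "\n";
```

If attribute order turns out to vary, parsing the page with DOMDocument and reading the src attribute would probably be more robust than a regex.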