Solved

Regex find PHP

Posted on 2012-04-12
10
244 Views
Last Modified: 2012-04-13
Ok,

So Im trying to make a page that would allow visitor to like our Facebook fanpage and receive free goods in exchange.

So Im using PHP to extract amount of likes. The $content includes the following code:
<div class="fsm fwn fcg">2,704,886 likes · 80,715 talking about this</div>

Open in new window


The value Im trying to extract is 2,704,886. Any ideas what regex pattern do I need to achieve this?
0
Comment
Question by:GVNPublic123
10 Comments
 
LVL 34

Accepted Solution

by:
gr8gonzo earned 500 total points
ID: 37837175
$str = '<div class="fsm fwn fcg">2,704,886 likes · 80,715 talking about this</div>';
if(preg_match("/([0-9]+) likes/",str_replace(",","",$str),$matches))
{
   echo $matches[1];
}
0
 

Author Comment

by:GVNPublic123
ID: 37837251
Actually I just found out I cant file_get_contents the facebook page, as it returns Uncompatible Browser...I guess they have protected themselves from scrapers...

Any idea how could I access that info...maybe API?
0
 
LVL 34

Expert Comment

by:gr8gonzo
ID: 37837279
If it's a public page, you could try to use cURL to get it and pretend to be a normal browser. Use this example from the PHP man page:

<?php 

$GVNPublic123 = new cURL(); 
$html = $GVNPublic123->get('http://www.facebook.com/yourpagetoscrape'); 

// Do your scraping on $html here
// $str = '<div class="fsm fwn fcg">2,704,886 likes · 80,715 talking about this</div>';
$str = $html;
if(preg_match("/([0-9]+) likes/",str_replace(",","",$str),$matches))
{
   echo $matches[1];
}

class cURL { 
var $headers; 
var $user_agent; 
var $compression; 
var $cookie_file; 
var $proxy; 
function cURL($cookies=TRUE,$cookie='cookies.txt',$compression='gzip',$proxy='') { 
$this->headers[] = 'Accept: image/gif, image/x-bitmap, image/jpeg, image/pjpeg'; 
$this->headers[] = 'Connection: Keep-Alive'; 
$this->headers[] = 'Content-type: application/x-www-form-urlencoded;charset=UTF-8'; 
$this->user_agent = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)'; 
$this->compression=$compression; 
$this->proxy=$proxy; 
$this->cookies=$cookies; 
if ($this->cookies == TRUE) $this->cookie($cookie); 
} 
function cookie($cookie_file) { 
if (file_exists($cookie_file)) { 
$this->cookie_file=$cookie_file; 
} else { 
fopen($cookie_file,'w') or $this->error('The cookie file could not be opened. Make sure this directory has the correct permissions'); 
$this->cookie_file=$cookie_file; 
fclose($this->cookie_file); 
} 
} 
function get($url) { 
$process = curl_init($url); 
curl_setopt($process, CURLOPT_HTTPHEADER, $this->headers); 
curl_setopt($process, CURLOPT_HEADER, 0); 
curl_setopt($process, CURLOPT_USERAGENT, $this->user_agent); 
if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEFILE, $this->cookie_file); 
if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEJAR, $this->cookie_file); 
curl_setopt($process,CURLOPT_ENCODING , $this->compression); 
curl_setopt($process, CURLOPT_TIMEOUT, 30); 
if ($this->proxy) curl_setopt($process, CURLOPT_PROXY, $this->proxy); 
curl_setopt($process, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($process, CURLOPT_FOLLOWLOCATION, 1); 
$return = curl_exec($process); 
curl_close($process); 
return $return; 
} 
function post($url,$data) { 
$process = curl_init($url); 
curl_setopt($process, CURLOPT_HTTPHEADER, $this->headers); 
curl_setopt($process, CURLOPT_HEADER, 1); 
curl_setopt($process, CURLOPT_USERAGENT, $this->user_agent); 
if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEFILE, $this->cookie_file); 
if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEJAR, $this->cookie_file); 
curl_setopt($process, CURLOPT_ENCODING , $this->compression); 
curl_setopt($process, CURLOPT_TIMEOUT, 30); 
if ($this->proxy) curl_setopt($process, CURLOPT_PROXY, $this->proxy); 
curl_setopt($process, CURLOPT_POSTFIELDS, $data); 
curl_setopt($process, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($process, CURLOPT_FOLLOWLOCATION, 1); 
curl_setopt($process, CURLOPT_POST, 1); 
$return = curl_exec($process); 
curl_close($process); 
return $return; 
} 
function error($error) { 
echo "<center><div style='width:500px;border: 3px solid #FFEEFF; padding: 3px; background-color: #FFDDFF;font-family: verdana; font-size: 10px'><b>cURL Error</b><br>$error</div></center>"; 
die; 
} 
} 
?> 

Open in new window

0
Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

 

Author Comment

by:GVNPublic123
ID: 37837379
Yep, feeding it useragent and other headers via curl did the trick. However the regex doesnt work for me.
0
 
LVL 34

Expert Comment

by:gr8gonzo
ID: 37837506
It is probably just an issue with the content being on multiple lines. The regex I gave you was for a single line of HTML while the cURL class is returning the whole page of HTML.

What is the URL?
0
 
LVL 10

Expert Comment

by:pfrancois
ID: 37837563
0
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 37837867
Sometimes it's easier to use explode() than regex, but regex is good for tidying up the results string.
<?php // RAY_temp_gvn.php
error_reporting(E_ALL);

$doc = <<<DOC
Lots of
Random Stuff
// READ FROM THE WEB SITE WITH A CURL REQUEST...
<div class="fsm fwn fcg">2,704,886 likes · 80,715 talking about this</div>
Even more stuff
DOC;

$arr = explode('fsm fwn fcg', $doc);
$arr = explode('likes', $arr[1]);
$num = preg_replace('/[^0-9]/', NULL, $arr[0]);
var_dump($num);

Open in new window

http://www.laprbass.com/RAY_temp_gvn.php
0
 

Author Comment

by:GVNPublic123
ID: 37837963
Yeh, exploding totally made it easier as theres only 1 match now.

Do you guys have any idea how could I get the profile picture (in small format) so I can make a nice widget with it? Than I detect when follow is done on like button via JS api and thats it (I already know how to do that)...

Now I just need an image to go with likes count.
0
 

Author Comment

by:GVNPublic123
ID: 37838189
This is code of image:
class="scaledImageFitWidth img" src="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-snc4/41581_82061850555_1078443985_n.jpg"

Open in new window

0
 

Author Comment

by:GVNPublic123
ID: 37838622
Id like to get code of image with regex, but I dont know how to make a pattern.
0

Featured Post

Free Tool: IP Lookup

Get more info about an IP address or domain name, such as organization, abuse contacts and geolocation.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Author Note: Since this E-E article was originally written, years ago, formal testing has come into common use in the world of PHP.  PHPUnit (http://en.wikipedia.org/wiki/PHPUnit) and similar technologies have enjoyed wide adoption, making it possib…
Part of the Global Positioning System A geocode (https://developers.google.com/maps/documentation/geocoding/) is the major subset of a GPS coordinate (http://en.wikipedia.org/wiki/Global_Positioning_System), the other parts being the altitude and t…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
The viewer will learn how to look for a specific file type in a local or remote server directory using PHP.

829 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question