Getting text from Wikipedia

I have a database of products with a column called "wikiurl" that identifies each product's Wikipedia page. What is the best way to extract the intro paragraph from Wikipedia? For example, if "wikiurl" = "iPhone", I would want to get the first paragraph from the page: http://en.wikipedia.org/wiki/Iphone

I'm using PHP and CodeIgniter. What's the best way to scrape this info?
Asked by: alex_wareing
1 Solution
 
Roger Baklund commented:
The code below seems to do what you want. It fetches the first paragraph from the page. I am not sure if this will work with all articles.

Some warnings are generated during parsing, which is why I used error_reporting() to suppress them.
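// Report everything except warnings; the HTML parser emits many on this page.
error_reporting(E_ALL ^ E_WARNING);
$d = new DOMDocument();
$d->loadHTMLFile('http://en.wikipedia.org/wiki/Iphone');
$paras = $d->getElementsByTagName('p');
// Print the text of the first <p> element, if the page has one.
if ($paras->length > 0) {
    echo $paras->item(0)->nodeValue;
}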
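
A slightly more defensive variant of the same DOM approach might look like the sketch below. It is only an illustration, not a tested scraper: the function names wikipediaUrl() and firstParagraph() are made up for this example, it assumes PHP 7.1 or later, and it assumes that on some articles the first <p> element may be empty or whitespace-only, so it returns the first paragraph that actually contains text. Warnings are collected with libxml_use_internal_errors() rather than by changing error_reporting().

```php
<?php
// Hypothetical helper: build an article URL from the "wikiurl" column value.
function wikipediaUrl(string $title): string
{
    // Wikipedia article URLs use underscores in place of spaces.
    return 'http://en.wikipedia.org/wiki/' . rawurlencode(str_replace(' ', '_', $title));
}

// Hypothetical helper: return the text of the first non-empty <p>, or null.
function firstParagraph(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }
    foreach ($doc->getElementsByTagName('p') as $p) {
        $text = trim($p->textContent);
        if ($text !== '') {
            return $text;
        }
    }
    return null;
}

// Usage sketch (needs network access and allow_url_fopen enabled):
// $html = file_get_contents(wikipediaUrl('iPhone'));
// echo $html === false ? '' : firstParagraph($html);
```

Keeping the fetch separate from the parsing means the parsing step can be checked against a saved HTML string without hitting Wikipedia on every run.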
