Solved

Getting text from Wikipedia

Posted on 2009-07-13
Last Modified: 2012-05-07
I have a database of products with a column called "wikiurl" that holds the Wikipedia article name. What is the best way to extract the intro paragraph from Wikipedia? For example, if "wikiurl" = "iPhone", I want to get the first paragraph from the page: http://en.wikipedia.org/wiki/Iphone

I'm using PHP and CodeIgniter. What's the best way to scrape this info?
Question by:alex_wareing

Accepted Solution

by: Roger Baklund
ID: 24845065
The code below seems to do what you want. It fetches the first paragraph from the page. I am not sure if this will work with all articles.

Some warnings are generated during parsing, which is why I used error_reporting() to suppress them.
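// Suppress the warnings DOMDocument emits while parsing the page.
error_reporting(E_ALL ^ E_WARNING);
// Fetch and parse the article.
$d = new DOMDocument();
$d->loadHTMLFile('http://en.wikipedia.org/wiki/Iphone');
// Print the text content of the first <p> element.
$paras = $d->getElementsByTagName('p');
echo $paras->item(0)->nodeValue;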
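
Since it is uncertain whether taking item(0) works for every article, a slightly more defensive variant might skip paragraphs that contain no text. The sketch below is only an illustration: the function name firstParagraphFromHtml is hypothetical and not part of the original answer, it works on an HTML string that has already been fetched, and it uses libxml_use_internal_errors() in place of error_reporting() to keep parser warnings quiet.

```php
<?php
// Hypothetical helper (not from the original answer): returns the text of the
// first <p> element that contains non-whitespace text, or null if there is none.
function firstParagraphFromHtml(string $html): ?string
{
    // DOMDocument::loadHTML() rejects empty input, so bail out early.
    if (trim($html) === '') {
        return null;
    }

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    // Walk the <p> elements in document order and return the first non-empty one.
    foreach ($doc->getElementsByTagName('p') as $p) {
        $text = trim($p->textContent);
        if ($text !== '') {
            return $text;
        }
    }

    return null;
}

// Usage with a small inline sample rather than a live request.
$sample = '<html><body><p> </p><p>The <b>iPhone</b> is a smartphone.</p></body></html>';
echo firstParagraphFromHtml($sample), "\n"; // prints "The iPhone is a smartphone."
```

Separating the parsing from the fetching also makes it possible to try the function on saved HTML without contacting Wikipedia each time.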
