Solved

What is the best HTML parser for PHP?

Posted on 2007-10-01
Last Modified: 2008-01-09
Hi,
What is the best HTML parser for PHP? Something like BeautifulSoup for Python.
Thanks
Jamie
Question by:jamie_lynn
 
LVL 2

Expert Comment

by:m_tawfick
ID: 19989384
Check this page for a good list:
http://www.info4php.com/?req=PHP_Editors
 
LVL 11

Assisted Solution

by:siliconbrit
siliconbrit earned 400 total points
ID: 19989797

It's not possible to say which parser is best. Because of the enormous variation in how HTML pages are presented, your choice of tool will usually be driven by the pages you are trying to parse. I have used a couple of available tools, and in some cases written my own.

However, you might find that the raw parser at http://sourceforge.net/projects/php-html/ will do what you need. If not, just search Google for "PHP HTML parser" and you'll find quite a few options. I could list some here, but if someone reads this next year the list will have changed.

Try out the sourceforge one I mentioned, and if you are still struggling, post back with more information on what you are trying to parse, and I'll try to help you narrow your search.
 
LVL 54

Assisted Solution

by:b0lsc0tt
b0lsc0tt earned 400 total points
ID: 19992580
If you need something to fetch the HTML, cURL is commonly used in PHP scripts. It is available as a PHP extension (ext/curl). There is some info on it at http://php.net/curl.

Let me know if you need more info on it or have any questions.

bol
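A minimal sketch of fetching a page with the cURL extension, for illustration only. It assumes ext/curl is enabled and PHP 7.1 or later (for the nullable return type); the function name fetch_html() and the timeout values are hypothetical choices, not part of the answer above.

```php
<?php
// Hypothetical helper: fetch a page's HTML with PHP's cURL extension.
// Returns null on transport errors or non-2xx HTTP responses.
function fetch_html(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => 10,   // seconds to wait for a connection
        CURLOPT_TIMEOUT        => 30,   // overall limit for the whole transfer
    ]);
    $html = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch); // frees the handle on PHP 5/7 (no effect since PHP 8.0)

    if ($html === false || $status < 200 || $status >= 300) {
        return null;
    }
    return $html;
}

// Usage (needs network access):
// $html = fetch_html('http://www.example.com/');
```

The returned string can then be handed to whichever parser you settle on.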

 
LVL 48

Accepted Solution

by:hernst42
hernst42 earned 1200 total points
ID: 19994151
It depends on your needs. What job do you need the parser for?

HTML validation can be done with PHP's tidy extension: http://www.php.net/tidy
Stripping dangerous HTML code: http://pear.php.net/package/HTML_Safe
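As a rough illustration of the validation use, the sketch below asks tidy for its warnings about a fragment. It assumes ext/tidy is compiled into your PHP build (it is not always enabled); tidy_report() is a hypothetical helper name and the sample fragment is made up.

```php
<?php
// Hypothetical helper: return tidy's warnings/errors for an HTML string,
// one message per array element (e.g. "line 1 column 1 - Warning: ...").
function tidy_report(string $html): array
{
    $tidy = tidy_parse_string($html, [], 'utf8');
    if ($tidy === false) {
        return [];
    }
    tidy_diagnose($tidy); // append tidy's diagnostic summary to the buffer
    $buffer = (string) tidy_get_error_buffer($tidy);
    // Split into lines and drop empty ones.
    return array_values(array_filter(array_map('trim', explode("\n", $buffer))));
}

if (extension_loaded('tidy')) {
    foreach (tidy_report('<p>Unclosed paragraph <b>bold') as $message) {
        echo $message, "\n";
    }
}
```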
 

Author Comment

by:jamie_lynn
ID: 20004403
Well, I want a parser that can search by tag name, attribute, or value.
....
<div id="rating">

</div>
....
e.g. divcontent = soup.findAll("div", { "id" : "rating" })
     and then search again within those results.

Is there a html parser in PHP that does this?

Thanks
Jamie
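For the kind of search described above, PHP's DOM extension (DOMDocument plus DOMXPath, usually enabled by default) is one option. This is a sketch under assumptions: the sample HTML and the span/class names are invented, and the XPath expressions stand in for BeautifulSoup's findAll() rather than reproducing its API.

```php
<?php
// Sample markup, made up for this example.
$html = <<<HTML
<html><body>
  <div id="rating"><span class="score">4.5</span><span class="votes">12</span></div>
  <div id="other"><span class="score">1.0</span></div>
</body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Roughly soup.findAll("div", {"id": "rating"})
$divs = $xpath->query('//div[@id="rating"]');

$scores = [];
foreach ($divs as $div) {
    // Search again within each result: a relative query (leading '.')
    // evaluated with $div as the context node only looks inside that div.
    foreach ($xpath->query('.//span[@class="score"]', $div) as $span) {
        $scores[] = trim($span->textContent);
    }
}
// $scores now holds ['4.5']
```

The second query with a leading '.' and a context node is what gives the "search again within the results" behaviour.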
 
LVL 48

Assisted Solution

by:hernst42
hernst42 earned 1200 total points
ID: 20004437
 

Author Comment

by:jamie_lynn
ID: 20004601
Thanks hernst. Can this handle the messy HTML that everyone writes on the web? I was thinking about using DOM parsers, but I read that DOM parsers do not do well with poor HTML,
e.g. missing end tags, unquoted attribute values, etc.

Thanks
Jamie
 
LVL 48

Expert Comment

by:hernst42
ID: 20004656
If you have poor HTML, you could first use tidy to turn it into better HTML and then load the result into DOM. If the HTML is so poor that tidy fails, the user should fix it; no parser can handle that.
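A sketch of that two-step approach, assuming PHP 7 or later. It uses tidy_repair_string() when ext/tidy is present and otherwise passes the HTML straight to DOMDocument::loadHTML(), which is backed by libxml2's HTML parser and recovers from many everyday errors (missing end tags, unquoted attributes) on its own. The helper name load_messy_html() and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: clean HTML with tidy if available, then load it into DOM.
function load_messy_html(string $html): DOMDocument
{
    if (function_exists('tidy_repair_string')) {
        $repaired = tidy_repair_string($html, [
            'output-xhtml' => true, // emit end tags and quoted attribute values
        ], 'utf8');
        if ($repaired !== false) {
            $html = $repaired;
        }
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the caller's setting
    return $doc;
}

// Missing end tags and an unquoted attribute value, as in the question.
$doc = load_messy_html('<div id=rating><p>Good<p>Bad</div>');
$xpath = new DOMXPath($doc);
$count = $xpath->query('//div[@id="rating"]/p')->length;
echo $count, "\n";
```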
 
LVL 54

Expert Comment

by:b0lsc0tt
ID: 20015219
I'm glad I could help.  Thanks for the fun question, grade and points.

bol