Parsing HTML: PHP vs. Java

I'm developing a Java application and now I want to parse HTML pages to get the title, description, number of outgoing links, etc.
I've read that Java might not be the best choice for parsing HTML. Is PHP better at parsing? Or should I run JTidy first and then use a Java parser (and if so, which one)?
static86 asked:
 
for_yan commented:
There does not seem to be an overwhelming consensus.
In general, people tend to use PHP for smaller projects in personal or small-business
environments, while big companies tend to use Java, though there are exceptions.

So it probably comes down to personal preference: whichever language you are more comfortable with
is probably the best choice for your parsing.
In Java you can use JTidy and then an XML parser such as SAX or DOM. I have used it this way
and it worked OK for me.
Some folks like JSoup, TagSoup or HTMLCleaner.
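
As a rough sketch of the JTidy-then-DOM route: assuming the page has already been cleaned into well-formed XHTML with lowercase tags and no DOCTYPE (the JTidy step itself is not shown here), the DOM parser that ships with the JDK can pull out the title, the meta description and a link count. The class name PageStats and the method fromXhtml are made up for illustration, and the link count is simply every <a> that has an href, not only links pointing to other sites.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper (not from this thread): collects a few page statistics
// from XHTML that a cleaner such as JTidy has already made well-formed.
final class PageStats {
    final String title;
    final String description;
    final int linkCount;

    PageStats(String title, String description, int linkCount) {
        this.title = title;
        this.description = description;
        this.linkCount = linkCount;
    }

    // Assumption: the input is well-formed XML with lowercase tag names.
    static PageStats fromXhtml(String xhtml) {
        Document doc;
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new InputSource(new StringReader(xhtml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }

        // Text of the first <title> element, or an empty string if there is none.
        NodeList titles = doc.getElementsByTagName("title");
        String title = titles.getLength() > 0
                ? titles.item(0).getTextContent().trim()
                : "";

        // Content of the first <meta name="description" ...> element, if any.
        String description = "";
        NodeList metas = doc.getElementsByTagName("meta");
        for (int i = 0; i < metas.getLength(); i++) {
            Element meta = (Element) metas.item(i);
            if ("description".equalsIgnoreCase(meta.getAttribute("name"))) {
                description = meta.getAttribute("content");
                break;
            }
        }

        // Count every <a> that carries an href attribute (internal or external).
        int links = 0;
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            if (((Element) anchors.item(i)).hasAttribute("href")) {
                links++;
            }
        }
        return new PageStats(title, description, links);
    }
}
```

Calling PageStats.fromXhtml(cleanedPage) returns an object whose title, description and linkCount fields can be read directly. Raw pages from the web will usually fail at the parse step, which is exactly why the cleaning pass comes first.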

This is a comparison of several parsers, with a lively discussion below it:

http://www.benmccann.com/dev-blog/java-html-parsing-library-comparison/


 
pius_babbun commented:
I hope this link gives you a better understanding of these languages.

http://onepixelahead.com/2010/03/04/php-vs-java-which-one-is-the-better-web-language/

Hope it helps you.
Question has a verified solution.
