Solved

Finding emails and links in an HTML or PHP file

Posted on 2013-06-17
Last Modified: 2013-09-22
Hi

I'm experimenting with code that pulls all links and email addresses out of a page's HTML.
I'm using the jsoup API, which works wonderfully.
I have part of it working (listing links),
but I don't know what regular expression syntax jsoup uses for matching email addresses.
Maybe it can be done through the Document object
(the page source is already in a jsoup Document):

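        // Imports needed at the top of the file
        import org.jsoup.Jsoup;
        import org.jsoup.nodes.Document;
        import org.jsoup.select.Elements;

        // Fetch the page and place its source in a Document object
        Document doc = Jsoup.connect(url).get();

        Elements links = doc.select("a[href]");       // anchors with an href
        Elements media = doc.select("[src]");         // anything with a src (img, script, ...)
        Elements imports = doc.select("link[href]");  // stylesheets and other <link> imports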

What would the email-collecting line be?
Elements emails = ?
Thanks
Question by:beavoid
2 Comments
 
Accepted Solution

by:
haloexpertsexchange earned 500 total points
ID: 39253793
Check for mailto: links?
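
A minimal sketch of what that could look like. With jsoup, the attribute-prefix selector `a[href^=mailto:]` should pick out the mailto anchors, and each element's `href` can then be parsed for the addresses. `MailtoEmails` and `fromHref` are hypothetical names, not part of jsoup; the helper uses only the standard library so it can be tried without the jsoup JAR. As an assumption to keep it short, it does not percent-decode addresses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of jsoup): pulls addresses out of a mailto: href.
final class MailtoEmails {

    private static final String SCHEME = "mailto:";

    static List<String> fromHref(String href) {
        List<String> emails = new ArrayList<>();
        if (href == null) {
            return emails;
        }
        String trimmed = href.trim();
        // The scheme check ignores case, so "MAILTO:" is accepted as well.
        if (!trimmed.regionMatches(true, 0, SCHEME, 0, SCHEME.length())) {
            return emails;
        }
        String rest = trimmed.substring(SCHEME.length());
        // Drop any "?subject=...&body=..." part after the addresses.
        int query = rest.indexOf('?');
        if (query >= 0) {
            rest = rest.substring(0, query);
        }
        // A single mailto link may list several recipients separated by commas.
        for (String part : rest.split(",")) {
            String address = part.trim();
            if (!address.isEmpty()) {
                emails.add(address);
            }
        }
        return emails;
    }

    public static void main(String[] args) {
        // prints [a@example.com, b@example.com]
        System.out.println(fromHref("mailto:a@example.com,b@example.com?subject=Hi"));
    }
}
```

With jsoup this might be wired up as `for (Element a : doc.select("a[href^=mailto:]")) emails.addAll(MailtoEmails.fromHref(a.attr("href")));`, where `emails` is a list you declare yourself.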
Author Comment

by:beavoid
ID: 39255045
Thanks for all this advice

I found an API called jsoup that handles all of these issues, probably by using regular expressions to find text within a page.
It finds links perfectly. I've attached a zip with the main files I use, including ListLinks.java, which is where the problem code is.
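
For addresses that appear as plain text rather than in mailto links, one option is to run a regular expression from `java.util.regex` over the page text (with jsoup, `doc.text()` returns it). A rough sketch follows; `EmailScanner` and `findEmails` are hypothetical names, and the pattern is a deliberately simple assumption, not a full RFC 5322 matcher, so it will miss some valid addresses and accept some odd ones.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: scans plain text for things that look like email addresses.
final class EmailScanner {

    // Simplified pattern (an assumption): local part, "@", domain, dot, 2+ letter TLD.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    static Set<String> findEmails(String text) {
        // LinkedHashSet removes duplicates while keeping first-seen order.
        Set<String> found = new LinkedHashSet<>();
        if (text == null) {
            return found;
        }
        Matcher matcher = EMAIL.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String sample = "Write to bob@example.com or alice.smith@mail.example.org.";
        // prints [bob@example.com, alice.smith@mail.example.org]
        System.out.println(findEmails(sample));
    }
}
```

Calling `EmailScanner.findEmails(doc.text())` on the jsoup Document would then give the distinct addresses found in the visible text.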

I am making an HTML diver that starts at one page and recursively follows every link down to the pages beneath it. It works fine, but it reports some of the URLs it finds as invalid, which seems silly. I can't see which line is the problem, or how to avoid problem pages. I already skip certain extensions such as .swf and .png.
In main(), you can comment URLs in or out to choose which one to begin with.
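
One guess at the "invalid URL" reports, without having seen ListLinks.java: hrefs such as mailto: or javascript: links, or URLs containing spaces, are not fetchable pages. Below is a sketch of a filter that could be applied before following a link. `LinkFilter` and `shouldVisit` are hypothetical names; .swf and .png come from the description above, and the other skipped extensions are assumptions. It also remembers visited URLs so the recursion does not loop.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: decides whether a crawler should follow a link.
final class LinkFilter {

    // .swf and .png are from the post; the remaining extensions are assumptions.
    private static final Set<String> SKIPPED_EXTENSIONS = new HashSet<>(
            Arrays.asList(".swf", ".png", ".jpg", ".gif", ".pdf", ".zip"));

    private final Set<String> visited = new HashSet<>();

    /** Returns true only the first time a well-formed http(s) URL is seen. */
    boolean shouldVisit(String absoluteUrl) {
        if (absoluteUrl == null || absoluteUrl.trim().isEmpty()) {
            return false;
        }
        String url = absoluteUrl.trim();
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            return false; // malformed, e.g. contains a raw space
        }
        String scheme = uri.getScheme();
        if (scheme == null) {
            return false; // relative link that was never resolved
        }
        scheme = scheme.toLowerCase(Locale.ROOT);
        if (!scheme.equals("http") && !scheme.equals("https")) {
            return false; // mailto:, javascript:, ftp:, ...
        }
        String path = uri.getPath() == null ? "" : uri.getPath().toLowerCase(Locale.ROOT);
        for (String extension : SKIPPED_EXTENSIONS) {
            if (path.endsWith(extension)) {
                return false;
            }
        }
        // Ignore the #fragment so the same page is not crawled twice.
        int hash = url.indexOf('#');
        String key = hash >= 0 ? url.substring(0, hash) : url;
        return visited.add(key);
    }
}
```

In the crawler, each link's absolute URL (with jsoup, `link.absUrl("href")`) would be passed to `shouldVisit` before recursing into it.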

I can't attach JAR files, so you'll have to Google it :)
It is
jsoup-1.7.2.jar

My Java file is below

Thanks
ListLinks.java

Question has a verified solution.
