Solved

Finding emails and links in an HTML or PHP file

Posted on 2013-06-17
Last Modified: 2013-09-22
Hi

I'm experimenting with code that pulls all links and email addresses out of a page's HTML.
I'm using the jsoup API, which works wonderfully.
I have some of it working (listing links),
but I don't know what selector or regular expression syntax jsoup expects for email addresses.
Maybe it can be done through the Document object
(the page source is held in a jsoup Document).

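        // Fetch the page and parse its source into a Document object
        // (uses org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.select.Elements)

        Document doc = Jsoup.connect(url).get();

        Elements links = doc.select("a[href]");       // anchors that have an href attribute
        Elements media = doc.select("[src]");         // any element with a src attribute
        Elements imports = doc.select("link[href]");  // <link> elements such as stylesheets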

What would the email-collecting line be?
Elements emails = ?
Thanks
Question by:beavoid

Accepted Solution

by: haloexpertsexchange (earned 500 total points)
ID: 39253793
Check for mailto: links?
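
One way to act on the mailto suggestion is sketched below. In jsoup, the attribute-prefix selector doc.select("a[href^=mailto:]") should match only anchors whose href starts with mailto:, and attr("href") on each element gives the raw link. The class here is a hypothetical helper (the names EmailFinder, fromMailto, and fromText are not from this thread) that uses only the standard library, so it takes those href strings and the page text as plain input. The regular expression is a deliberately loose approximation, not a full address validator.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jsoup: pulls email addresses out of
// mailto: hrefs and out of plain page text.
final class EmailFinder {

    // Loose pattern for addresses written in plain text (an approximation).
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Returns the address part of a mailto: href, or null if the href is
    // not a mailto: link or carries no address.
    static String fromMailto(String href) {
        if (href == null) return null;
        String trimmed = href.trim();
        if (!trimmed.regionMatches(true, 0, "mailto:", 0, 7)) return null;
        String address = trimmed.substring(7);
        int query = address.indexOf('?');   // drop "?subject=..." and similar
        if (query >= 0) address = address.substring(0, query);
        address = address.trim();
        return address.isEmpty() ? null : address;
    }

    // Collects every address-like token in free text, in order of first
    // appearance and without duplicates.
    static Set<String> fromText(String text) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(fromMailto("mailto:info@example.com?subject=Hello"));
        System.out.println(fromText("Write to sales@example.org or help@example.org."));
    }
}
```

With jsoup, the inputs would typically be link.attr("href") for each element returned by the mailto selector, plus doc.text() for addresses that are written out on the page without a link.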

Author Comment

by:beavoid
ID: 39255045
Thanks for all this advice.

I found an API called jsoup that handles all of these issues, presumably using regular expressions to find text within a page.
It finds links perfectly. I attached a zip with the main files I use, along with my ListLinks.java, which is where the problem code is.

I am making an HTML diver (a crawler) that starts at one page and recursively follows every link from that page down to the pages beneath it. It works fine, but it reports some of the URLs it finds as invalid, which seems wrong. I can't see exactly which line is the problem, or how to avoid the problem pages. I already skip certain extensions such as .swf and .png.
You can choose the starting URL by commenting lines in or out in main().
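
Without seeing ListLinks.java it is hard to say which line fails, but a common cause of "invalid URL" reports in a crawler like this is following hrefs that are not fetchable web addresses: relative paths, mailto: and javascript: links, or malformed strings. The sketch below is one possible guard, assuming that is the cause here. LinkFilter, isCrawlable, and crawl are hypothetical names, and the page fetcher is passed in as a function so the example runs with the standard library alone. In the real program it would presumably wrap Jsoup.connect(url).get() and collect link.attr("abs:href"), which jsoup resolves to an absolute URL.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper: decides which links are worth fetching and walks
// the link graph without visiting any page twice.
final class LinkFilter {

    // Extensions to skip; .swf and .png come from the post, extend as needed.
    private static final List<String> SKIPPED = Arrays.asList(".swf", ".png");

    // True only for well-formed absolute http(s) URLs that have a host
    // and do not end in a skipped extension.
    static boolean isCrawlable(String url) {
        if (url == null || url.trim().isEmpty()) return false;
        URI uri;
        try {
            uri = new URI(url.trim());
        } catch (URISyntaxException e) {
            return false;                        // malformed, e.g. contains a space
        }
        String scheme = uri.getScheme();
        if (scheme == null) return false;        // relative link
        scheme = scheme.toLowerCase(Locale.ROOT);
        if (!scheme.equals("http") && !scheme.equals("https")) return false;
        if (uri.getHost() == null) return false;
        String path = uri.getPath() == null ? "" : uri.getPath().toLowerCase(Locale.ROOT);
        for (String ext : SKIPPED) {
            if (path.endsWith(ext)) return false;
        }
        return true;
    }

    // Iterative crawl from start; linksOf returns the raw hrefs found on a page.
    // maxPages caps the walk so it cannot run away on a large site.
    static Set<String> crawl(String start, Function<String, List<String>> linksOf, int maxPages) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        if (start != null) queue.add(start);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.poll();
            // Skip anything unfetchable or already seen.
            if (!isCrawlable(url) || !visited.add(url)) continue;
            for (String next : linksOf.apply(url)) {
                if (next != null && !visited.contains(next)) queue.add(next);
            }
        }
        return visited;
    }
}
```

The visited set keeps pages that link back to each other from causing an endless loop, and the queue replaces deep recursion, which can overflow the stack on a large site. Logging each URL that isCrawlable rejects may also help narrow down which findings the original program is treating as invalid.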

I can't attach JAR files, so you will have to search for it :)
It is
jsoup-1.7.2.jar

My Java file is below.

Thanks
ListLinks.java