Solved

Finding emails and links in an HTML/PHP file

Posted on 2013-06-17
Last Modified: 2013-09-22
Hi

I'm experimenting with code that pulls all links and email addresses out of a page's HTML.
I'm using the jsoup API, which performs wonderfully.
I have some of it working (listing links),
but I don't know what regular expression syntax jsoup uses for matching email addresses.
Maybe it can be done through the Document object
(the page source is already held in a jsoup Document).

        // place the page source in a Document object

        Document doc = Jsoup.connect(url).get();

        Elements links = doc.select("a[href]");
        Elements media = doc.select("[src]");
        Elements imports = doc.select("link[href]");

What would the email-collecting line be?
Elements emails = ?
Thanks
Question by:beavoid

Accepted Solution

by: haloexpertsexchange (earned 500 total points)
ID: 39253793
Check the links for mailto: hrefs?
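
A minimal sketch of the mailto idea, assuming the href values have already been collected from the page (with jsoup, for instance, by calling attr("href") on each element of the links selection in the question). The class and method names (MailtoFilter, emailsFromHrefs) are made up for illustration, and the code uses only the standard library so it runs without the jsoup JAR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MailtoFilter {

    private static final String PREFIX = "mailto:";

    // Returns the address part of every href that starts with "mailto:"
    // (case-insensitive). Anything after a '?', such as "?subject=...",
    // is dropped.
    public static List<String> emailsFromHrefs(List<String> hrefs) {
        List<String> emails = new ArrayList<>();
        for (String href : hrefs) {
            String trimmed = href.trim();
            if (trimmed.regionMatches(true, 0, PREFIX, 0, PREFIX.length())) {
                String address = trimmed.substring(PREFIX.length());
                int query = address.indexOf('?');
                if (query >= 0) {
                    address = address.substring(0, query);
                }
                if (!address.isEmpty()) {
                    emails.add(address);
                }
            }
        }
        return emails;
    }

    public static void main(String[] args) {
        // Sample hrefs standing in for values read from the page.
        List<String> hrefs = Arrays.asList(
                "http://example.com/contact.html",
                "mailto:info@example.com",
                "MAILTO:sales@example.com?subject=Hello");
        System.out.println(emailsFromHrefs(hrefs));
    }
}
```

jsoup's selector syntax also has an attribute-prefix form, so doc.select("a[href^=mailto:]") should narrow the selection to mailto links directly; the loop would then only need to strip the prefix.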

Author Comment

by:beavoid
ID: 39255045
Thanks for all this advice.

I found an API called jsoup that handles all of these issues, probably using regular expressions to find text within a page.
It finds links perfectly. I attached a zip with the main files I use, and my ListLinks.java, which is where the problem code is.
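
For addresses that appear as plain text rather than inside mailto links, jsoup selectors return elements, not the matched substrings, so one option is to run a regular expression over the page text. The sketch below assumes the text comes from something like doc.text(); the class name EmailScanner and the pattern are illustrative, and the pattern is deliberately loose, not a complete email grammar.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailScanner {

    // Simple approximation: local part, "@", domain labels, then a
    // top-level domain of at least two letters. It will miss or
    // over-match some unusual but valid addresses.
    private static final Pattern EMAIL = Pattern.compile(
            "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Returns each distinct address in order of first appearance.
    public static Set<String> findEmails(String text) {
        Set<String> found = new LinkedHashSet<>();
        Matcher matcher = EMAIL.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Sample text standing in for the text of a fetched page.
        String text = "Contact bob@example.com or alice.smith@mail.example.org. "
                + "Again: bob@example.com";
        System.out.println(findEmails(text));
    }
}
```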

I am making an HTML "diver" that starts at one page and recursively follows every link down to the pages beneath it. It works fine, but it reports some of the URLs it finds as invalid, which seems silly. I can't see the exact line that is the problem, or how to avoid problem pages. I already skip certain extensions such as .swf and .png.
You can choose which URL to begin with by commenting lines in or out in main().
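
One way to keep such a crawler away from problem URLs is to filter every candidate link before fetching it. The sketch below is an assumption about what a filter could look like, not the code from ListLinks.java: the class name LinkFilter, the method shouldVisit, and the extension list are hypothetical. It rejects strings that java.net.URI cannot parse, non-HTTP schemes such as mailto:, the skipped extensions, and pages already seen.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class LinkFilter {

    // Extensions that should not be fetched and parsed as HTML (illustrative list).
    private static final List<String> SKIPPED =
            Arrays.asList(".swf", ".png", ".jpg", ".gif", ".pdf", ".zip");

    private final Set<String> visited = new HashSet<>();

    // Returns true only the first time a syntactically valid http(s) URL
    // with a non-skipped extension is seen.
    public boolean shouldVisit(String url) {
        String candidate = url.trim();
        URI uri;
        try {
            uri = new URI(candidate);
        } catch (URISyntaxException e) {
            return false; // malformed, e.g. contains a space
        }
        String scheme = uri.getScheme();
        if (scheme == null
                || !(scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
            return false; // mailto:, javascript:, empty or relative strings
        }
        String path = uri.getPath() == null ? "" : uri.getPath().toLowerCase(Locale.ROOT);
        for (String ext : SKIPPED) {
            if (path.endsWith(ext)) {
                return false;
            }
        }
        // Drop the #fragment so page.html and page.html#top count as one page.
        int hash = candidate.indexOf('#');
        String key = hash >= 0 ? candidate.substring(0, hash) : candidate;
        return visited.add(key);
    }
}
```

In the crawl loop, each absolute link (with jsoup, link.attr("abs:href")) would be passed through shouldVisit before calling Jsoup.connect on it. Wrapping the fetch in a try/catch for IOException would let the crawl continue past pages that still fail.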

I can't attach JAR files, so please search for it. It is
jsoup-1.7.2.jar

My Java file is below.

Thanks
ListLinks.java