Solved

Finding emails and links in an HTML, PHP file

Posted on 2013-06-17
Last Modified: 2013-09-22
Hi,

I'm dabbling with code that pulls all links and email addresses out of a page's HTML.
I'm using the jsoup API, which performs wonderfully.
I have some of it working (listing links),
but I don't know what regular expression system jsoup uses for email addresses.
Maybe it can be done through the Document object?
(The page source is held in a jsoup Document object.)

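        import org.jsoup.Jsoup;
        import org.jsoup.nodes.Document;
        import org.jsoup.select.Elements;

        // Fetch the page at url and place its source in a Document object
        Document doc = Jsoup.connect(url).get();

        Elements links = doc.select("a[href]");        // anchors that have an href
        Elements media = doc.select("[src]");          // any element with a src attribute
        Elements imports = doc.select("link[href]");   // <link> elements such as stylesheets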

What would the email-collecting line be?
Elements emails = ?
Thanks
Question by:beavoid

Accepted Solution

by: haloexpertsexchange (earned 500 total points)
Check for mailto: links?
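
A rough sketch of how the mailto suggestion could be applied, not a definitive answer. jsoup's attribute-prefix selector can pick out mailto anchors, for example `Elements emails = doc.select("a[href^=mailto:]");`, and each element's `attr("href")` then holds a string such as `mailto:someone@example.com?subject=Hi`. The plain-Java helper below turns such href strings into bare addresses. The class name `MailtoExtractor` and method name `emailsFromHrefs` are hypothetical and do not come from the original post. It assumes the addresses are not percent-encoded, and it will not find addresses that appear only as page text rather than in a mailto link.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class MailtoExtractor {

    private static final String PREFIX = "mailto:";

    /**
     * Turns href values into bare email addresses.
     * Keeps only hrefs that start with "mailto:" (case-insensitive),
     * drops any "?subject=..." query part, and splits comma-separated
     * recipients. Duplicates are removed and first-seen order is kept.
     */
    static List<String> emailsFromHrefs(List<String> hrefs) {
        Set<String> emails = new LinkedHashSet<>();
        for (String href : hrefs) {
            String trimmed = href.trim();
            if (!trimmed.regionMatches(true, 0, PREFIX, 0, PREFIX.length())) {
                continue; // not a mailto link
            }
            String rest = trimmed.substring(PREFIX.length());
            int query = rest.indexOf('?');
            if (query >= 0) {
                rest = rest.substring(0, query); // strip ?subject=... and similar
            }
            for (String address : rest.split(",")) {
                String candidate = address.trim();
                if (!candidate.isEmpty()) {
                    emails.add(candidate);
                }
            }
        }
        return new ArrayList<>(emails);
    }

    public static void main(String[] args) {
        // With jsoup, these strings would come from something like:
        //   for (Element a : doc.select("a[href^=mailto:]")) hrefs.add(a.attr("href"));
        // The values below are made-up sample data.
        List<String> hrefs = Arrays.asList(
                "mailto:a@example.com?subject=Hi",
                "http://example.com/",
                "MAILTO:b@example.com, c@example.com");
        System.out.println(emailsFromHrefs(hrefs));
    }
}
```

Keeping the string handling separate from the jsoup call makes it easy to try out on sample hrefs without fetching a page.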

Author Comment

by:beavoid
Thanks for all this advice.

I found an API called jsoup that handles all of these issues, probably by using regular expressions to find text within a page.
It finds links perfectly. I attached a zip with the main files I use, along with my ListLinks.java, which contains the code that has problems.

I am making an HTML diver that starts at one page and recursively follows every link down to the pages beneath it. It works fine, but it claims that some of the URLs it finds are invalid, which seems silly. I can't see the exact line that is causing the problem, or how to avoid problem pages. I already skip certain extensions such as .swf and .png.
You can comment out which URL you'd like to begin with in main().

I can't attach JAR files. Google it :)
It is jsoup-1.7.2.jar

My Java file is below.

Thanks
ListLinks.java
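
The attached ListLinks.java is not reproduced here, so the following is only a guess at the kind of guard that might help with the "invalid URL" complaints described above, not a fix for the actual file. It is a minimal sketch of a recursive crawl that validates each link before following it, skips unwanted extensions, and remembers visited pages so cycles end. The class `LinkDiver`, its methods, and the `fetchLinks` function are hypothetical names. With jsoup, `fetchLinks` could wrap `Jsoup.connect(url).get()` and collect `attr("abs:href")` from `select("a[href]")`, converting the checked IOException into an unchecked one. Here it is a plain function so the logic can run against an in-memory map.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class LinkDiver {
    // .swf and .png come from the post; the other extensions are assumed additions.
    private static final List<String> SKIPPED =
            Arrays.asList(".swf", ".png", ".jpg", ".gif", ".pdf", ".zip");

    private final Function<String, List<String>> fetchLinks;
    private final Set<String> visited = new LinkedHashSet<>();

    LinkDiver(Function<String, List<String>> fetchLinks) {
        this.fetchLinks = fetchLinks;
    }

    /** Returns the URL without its #fragment, or null if it should not be crawled. */
    static String normalize(String raw) {
        if (raw == null) {
            return null;
        }
        String s = raw.trim();
        int hash = s.indexOf('#');
        if (hash >= 0) {
            s = s.substring(0, hash); // page#a and page#b are the same page
        }
        try {
            URI uri = new URI(s); // rejects spaces and other illegal characters
            String scheme = uri.getScheme();
            if (scheme == null || uri.getHost() == null) {
                return null; // relative, mailto:, javascript:, and similar
            }
            scheme = scheme.toLowerCase(Locale.ROOT);
            if (!scheme.equals("http") && !scheme.equals("https")) {
                return null;
            }
            String path = uri.getPath() == null ? "" : uri.getPath().toLowerCase(Locale.ROOT);
            for (String ext : SKIPPED) {
                if (path.endsWith(ext)) {
                    return null;
                }
            }
            return uri.toString();
        } catch (URISyntaxException e) {
            return null; // malformed link: skip it instead of aborting the crawl
        }
    }

    /** Depth-first crawl from start; each URL is attempted at most once. */
    void crawl(String start, int maxDepth) {
        String url = normalize(start);
        if (url == null || maxDepth < 0 || !visited.add(url)) {
            return;
        }
        List<String> links;
        try {
            links = fetchLinks.apply(url);
        } catch (RuntimeException e) {
            return; // one page failing to load should not stop the rest
        }
        for (String link : links) {
            crawl(link, maxDepth - 1);
        }
    }

    /** URLs attempted so far, in first-seen order (includes pages whose fetch failed). */
    Set<String> visited() {
        return Collections.unmodifiableSet(visited);
    }

    public static void main(String[] args) {
        // In-memory stand-in for real pages; all URLs are made up.
        Map<String, List<String>> site = new HashMap<>();
        site.put("http://example.com/", Arrays.asList(
                "http://example.com/a", "http://example.com/logo.png", "mailto:me@example.com"));
        site.put("http://example.com/a", Arrays.asList(
                "http://example.com/", "http://example.com/b#top", "http://example.com/bad url"));
        LinkDiver diver = new LinkDiver(u -> site.getOrDefault(u, Collections.emptyList()));
        diver.crawl("http://example.com/", 5);
        System.out.println(diver.visited());
    }
}
```

Returning null from `normalize` for anything that cannot be parsed means a single odd href is skipped quietly, and the depth limit keeps the recursion bounded on large sites.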