How to read the tree of pages of a site? How to read a binary/multi-way tree?

Nav444
Hi, how is it possible to read all the URLs of a site? I have a function that reads a page and returns all the URLs on that page as an array, but I can't work out how to go down the tree of pages on the site.
Is there an algorithm or method to do this?

I really appreciate your help.
Nav

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®

Commented:
Use recursion in your URL finding function. Make sure each URL's host matches the current site and that the URL has not already been found.

I would use a List to store the URLs instead of an array so new URLs can be added. It is also easily checked to see if a given URL is already saved.
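A minimal sketch of that idea, not a finished crawler: the class name `RecursiveCrawlSketch`, the `linksOf` parameter (standing in for your existing page-reading function) and the in-memory `demoSite()` are all made up for illustration, and URLs are kept as plain strings to keep it short.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class RecursiveCrawlSketch {

    // Records 'url', then recurses into every link on that page that is on
    // the given host and has not been recorded yet.
    static void crawl(String url, String host,
                      Function<String, List<String>> linksOf,
                      List<String> found) {
        found.add(url);
        for (String link : linksOf.apply(url)) {
            if (hostOf(link).equals(host) && !found.contains(link)) {
                crawl(link, host, linksOf, found);
            }
        }
    }

    // Returns the host part of a URL string, or "" if there is none.
    static String hostOf(String url) {
        String host = URI.create(url).getHost();
        return host == null ? "" : host;
    }

    // Hypothetical stand-in for a real site: page URL -> links on that page.
    // Note the cycle (/a links back to /) and the off-site link.
    static Map<String, List<String>> demoSite() {
        Map<String, List<String>> site = new HashMap<>();
        site.put("http://example.com/", Arrays.asList(
                "http://example.com/a", "http://example.com/b", "http://other.org/x"));
        site.put("http://example.com/a", Arrays.asList(
                "http://example.com/", "http://example.com/b"));
        site.put("http://example.com/b", Arrays.asList("http://example.com/c"));
        site.put("http://example.com/c", Collections.<String>emptyList());
        return site;
    }
}
```

To try it, pass a lambda that looks pages up in `demoSite()` as `linksOf`, start from `"http://example.com/"` with host `"example.com"`, and give it an empty `ArrayList`. The "already found" check is what stops the recursion from looping when pages link back to each other. For a large site, a `HashSet` alongside the list would probably make that check faster.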

If you require more advice, post what you currently have and someone can offer suggestions.
Hi,
A few notes on the following code:
1. I assumed that the getAllURLFromURL() method is the method you have already implemented. After adding your implementation, remove "abstract" from the class declaration.
2. The isFromSameHost() method is very basic and could be modified (since one site can combine URLs from different hosts).
3. You can change the Vector to another data structure, including tree models.
4. To use this class in your code:
...
SiteURLBuilder urlBuilder = new SiteURLBuilder(<any url>);
Vector allURLS = urlBuilder.getAllURL();
...



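import java.net.URL;
import java.util.Vector;

abstract class SiteURLBuilder {

    protected Vector<URL> vURL = null; // holds all collected URLs
    protected URL urlRoot = null;      // the root page of the site

    public SiteURLBuilder(URL root) {
      vURL = new Vector<URL>();
      urlRoot = root;
    }

    // Your implementation: return every URL found on the given page.
    abstract protected URL[] getAllURLFromURL(URL url);

    // True if the URL is on the same host as the root (host names are case-insensitive).
    protected boolean isFromSameHost(URL url) {
      return url.getHost().equalsIgnoreCase(urlRoot.getHost());
    }

    // True if the URL has not been collected yet and belongs to this site.
    protected boolean isValid(URL url) {
      return !vURL.contains(url) && this.isFromSameHost(url);
    }

    // Records the URL, then recurses into each valid URL found on that page.
    protected void createURLVec(URL url) {
      vURL.addElement(url);
      URL[] urls = this.getAllURLFromURL(url);
      if (urls == null) return; // nothing could be read from this page
      for (int i=0; i<urls.length; i++)
        if (this.isValid(urls[i]))
          this.createURLVec(urls[i]);
    }

    // Clears any previous result and collects all URLs reachable from the root.
    public Vector<URL> getAllURL() {
      vURL.clear();
      this.createURLVec(urlRoot);
      return vURL;
    }
  } // SiteURLBuilder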
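
On note 2 above, here is one possible way to loosen the host check, offered only as a sketch. The names `SameSiteSketch`, `normalizeHost` and `isSameSite` are hypothetical, and the rule it applies (ignore case, ignore a leading "www.", accept subdomains of the root host) is an assumption about what "same site" should mean for you. It works on URL strings through java.net.URI, so no network lookup is involved.

```java
import java.net.URI;
import java.util.Locale;

class SameSiteSketch {

    // Lower-cases the host of a URL string and drops a leading "www.".
    // Returns "" when the string has no host part.
    static String normalizeHost(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return "";
        }
        host = host.toLowerCase(Locale.ROOT);
        return host.startsWith("www.") ? host.substring(4) : host;
    }

    // Assumed rule: a URL belongs to the site if its normalized host equals
    // the root's normalized host or is a subdomain of it.
    static boolean isSameSite(String url, String rootUrl) {
        String host = normalizeHost(url);
        String root = normalizeHost(rootUrl);
        return !root.isEmpty() && (host.equals(root) || host.endsWith("." + root));
    }
}
```

With this rule, `http://blog.example.com/x` counts as part of `http://www.example.com/`, while `http://notexample.com/` does not, because the subdomain test requires a dot before the root host. If your site spans unrelated hosts, a fixed list of allowed hosts may be a better fit than this rule.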

-gkern

Author

Commented:
Dear gKern,
Thank you, thank you. That was really great. It worked and helped me a lot.
Thanks once again.
Nav

-------------------------------------

conick, Thanks for your reply.
I will use a List for sure.
Nav
Glad I could help :)
-gkern

Commented:
Nav444,

EE works a little differently than most Q&A message boards. After a question is answered, you can "close the question" by grading the comment that helped you the most.

This does a few things:
- Places the question in Previously Asked Questions for others to search
- Removes it from "Questions Awaiting Answers"
- Increments the points the "Expert" has received.
- Shows your grade in your profile so "Experts" can see how you grade. Users who give poor grades or do not grade questions may receive less attention from Experts in the future.

If gkern's comment answered your question, please go to his comment and click the "Grade Comment as Answer" link.

Good Luck!
