Solved

robot text file

Posted on 2004-09-07
Last Modified: 2010-03-31
I have stored some URLs in a Vector by reading the robots.txt file of a website.

If I have the URL "www.example.com/important" stored in a Vector ROBOT, taken from a robots file, and I have the link "www.example.com/important/name/details", should I be allowed to parse this link? If not, how should I check the Vector, given that the "contains" method won't work?

Question by:HomerrSimpson
26 Comments
 
LVL 86

Expert Comment

by:CEHJ
ID: 12002270
>>if i had a url "www.example.com/important

The vector should contain

/important/


>>should i be allow to parse this link?

No. Any path starting with '/important' would be disallowed.
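A minimal sketch of the prefix rule described above, assuming the robots.txt text has already been downloaded into a String. The class and method names (RobotsRules, parseDisallows, isAllowed) are hypothetical, and the parser deliberately ignores User-agent groups, Allow lines and wildcards:

```java
import java.util.ArrayList;
import java.util.List;

class RobotsRules {
    // Collects the path of every "Disallow:" line.
    // Simplification (assumption): User-agent groups, Allow lines and wildcards are ignored.
    static List<String> parseDisallows(String robotsTxt) {
        List<String> prefixes = new ArrayList<>();
        for (String rawLine : robotsTxt.split("\\r?\\n")) {
            // Drop comments, then surrounding whitespace
            String line = rawLine;
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            // Case-insensitive check for the 9-character "Disallow:" field name
            if (line.regionMatches(true, 0, "Disallow:", 0, 9)) {
                String path = line.substring(9).trim();
                // An empty Disallow value means nothing is disallowed
                if (!path.isEmpty()) {
                    prefixes.add(path);
                }
            }
        }
        return prefixes;
    }

    // A path is blocked as soon as it starts with any disallowed prefix
    static boolean isAllowed(List<String> prefixes, String path) {
        for (String prefix : prefixes) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```

With a rule of /important, both /important/name/details and /important.html are blocked, because robots.txt rules are plain path prefixes.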
 
LVL 92

Expert Comment

by:objects
ID: 12002404
> if not how should i be checking the vector as using "if contain" method wont work?

You'll need to loop through the Vector, checking each entry.
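To illustrate why contains() does not help here: Vector.contains() compares with equals(), so it only finds an entry identical to the link, whereas a loop can test each entry as a prefix of the link. A sketch with hypothetical class and method names:

```java
import java.util.List;
import java.util.Vector;

class ContainsVsPrefix {
    // Vector.contains() uses equals(), so only an exact match is found
    static boolean containedExactly(Vector<String> rules, String link) {
        return rules.contains(link);
    }

    // Looping lets each rule be tested as a prefix of the link instead
    static boolean matchesAnyPrefix(List<String> rules, String link) {
        for (String rule : rules) {
            if (link.startsWith(rule)) {
                return true;
            }
        }
        return false;
    }
}
```

For a Vector holding "/important/", containedExactly(v, "/important/name/details") returns false, while matchesAnyPrefix(v, "/important/name/details") returns true.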
 

Author Comment

by:HomerrSimpson
ID: 12005805
I'm confused: why wouldn't I need to store the host part in the Vector as well?



 
LVL 86

Expert Comment

by:CEHJ
ID: 12009327
>>i m confused why wouldnt i need to store the host part as well in the vector?

It depends on how you're implementing it. The host part doesn't, of course, appear in the robots file. You can store it, of course, if you're dealing with several different hosts.
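If several hosts are involved, one possible layout (an assumption, not something proposed in the thread) is a map from host name to that host's disallowed path prefixes, so each link is only checked against its own host's rules. HostRules, addRule and isAllowed are hypothetical names:

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HostRules {
    // Disallowed path prefixes, keyed by lower-cased host name (e.g. "www.example.com")
    private final Map<String, List<String>> disallowedByHost = new HashMap<>();

    void addRule(String host, String pathPrefix) {
        disallowedByHost.computeIfAbsent(host.toLowerCase(), h -> new ArrayList<>()).add(pathPrefix);
    }

    // Looks up only the rules for the link's own host, then prefix-matches the path
    boolean isAllowed(String absoluteLink) throws MalformedURLException {
        URL url = new URL(absoluteLink);
        List<String> prefixes =
                disallowedByHost.getOrDefault(url.getHost().toLowerCase(), Collections.emptyList());
        String path = url.getPath().isEmpty() ? "/" : url.getPath();
        for (String prefix : prefixes) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```

Keeping the host as the map key means the stored prefixes can stay exactly as they appear in each robots file.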
 

Author Comment

by:HomerrSimpson
ID: 12010395
Yeah, I am dealing with several different hosts; that's why I was storing the host in the Vector as well.

This brings me back to the original question: how should I compare it with the links? I think I can loop through the Vector OK, but the check itself is what I'm unsure of.

So if I have www.example.com/important in the Vector, and I have "www.example.com/important/name/" or "/important/name/address" as a link, how can I compare them?
 
LVL 86

Expert Comment

by:CEHJ
ID: 12010525
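Well, you must work without the hostname:

// The URL needs a scheme ("http://"), otherwise the constructor throws MalformedURLException
String filepath = new URL("http://www.example.com/important").getFile(); // "/important"
String link = "/important/name/address";
boolean allowed = (link.startsWith(filepath) == false);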
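A runnable version of the snippet above, wrapped in a hypothetical helper (PathCheck.isAllowed). Note that java.net.URL needs a scheme: passing "www.example.com/important" without "http://" makes the constructor throw MalformedURLException, so the rule is given as a full URL here:

```java
import java.net.MalformedURLException;
import java.net.URL;

class PathCheck {
    // Returns true if link (a path such as "/important/name/address")
    // does not fall under the path part of ruleUrl.
    static boolean isAllowed(String ruleUrl, String link) throws MalformedURLException {
        String filepath = new URL(ruleUrl).getFile(); // "http://www.example.com/important" -> "/important"
        return !link.startsWith(filepath);
    }

    public static void main(String[] args) throws MalformedURLException {
        System.out.println(isAllowed("http://www.example.com/important", "/important/name/address")); // false
        System.out.println(isAllowed("http://www.example.com/important", "/public/page"));            // true
        try {
            // No scheme: the URL constructor rejects this string
            isAllowed("www.example.com/important", "/important/name/address");
        } catch (MalformedURLException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```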
 

Author Comment

by:HomerrSimpson
ID: 12011877
I think I've got it working. Do you mind double-checking?

boolean allowed = true;
Vector v = new Vector();
v.add("http://www.example.com/important/");
v.add("http://www.example.com/peoplename/");
v.add("http://www.example.com/address/");
v.add("http://www.example.com/securitynumber/");

String link = "/important/peoplename/addrress";

if (link.startsWith("http://www")) {
    System.out.println("ENTER");
    ListIterator iter = v.listIterator();

    while (iter.hasNext()) {
        String filepath = (String) iter.next();
        System.out.println("link " + link + " filepath " + filepath);

        if (allowed = (link.startsWith(filepath) == false)) {
            allowed = false;
        }
    }
} else {
    ListIterator iter = v.listIterator();
    System.out.println("ENTER2");

    while (iter.hasNext()) {
        String filepath = new URL((String) iter.next()).getFile();
        System.out.println("link " + link + " filepath " + filepath);

        if (allowed = (link.startsWith(filepath) == false)) {
            allowed = false;
        }
    }
}
System.out.println(allowed);
 
LVL 92

Expert Comment

by:objects
ID: 12012139
>            if(allowed = (link.startsWith(filepath) == false)){
>             allowed=false;
>               }

That's not quite right, and it can be simplified to:

allowed = !link.startsWith(filepath);
 
LVL 86

Expert Comment

by:CEHJ
ID: 12013711
>>if(allowed = (link.startsWith( filepath ) == false)){

What you've written there is *not* two boolean tests; you've written an assignment followed by a boolean test. The correct code is what I gave earlier:


allowed = (link.startsWith(filepath) == false);

>>allowed = !link.startsWith(filepath);

is the same, but less readable, particularly as the variable name starts with 'l'.
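To make the difference concrete, here is a small sketch (class and method names are hypothetical) comparing the assignment-inside-if loop with the plain-assignment loop. With the first, allowed always ends up false for a non-empty list. With the second, each iteration overwrites the previous result, so if the loop keeps running after a match, only the last entry decides:

```java
import java.util.List;

class AllowedLoop {
    // Loop body as originally posted: the "=" inside the if is an assignment,
    // and the branch then forces allowed back to false on every non-matching entry,
    // so the result is false for any non-empty list
    static boolean assignmentInIf(List<String> prefixes, String link) {
        boolean allowed = true;
        for (String filepath : prefixes) {
            if (allowed = (link.startsWith(filepath) == false)) {
                allowed = false;
            }
        }
        return allowed;
    }

    // Plain assignment with no early exit: each iteration overwrites the
    // previous value, so only the last prefix in the list decides the outcome
    static boolean assignmentOnly(List<String> prefixes, String link) {
        boolean allowed = true;
        for (String filepath : prefixes) {
            allowed = (link.startsWith(filepath) == false);
        }
        return allowed;
    }
}
```

With the rules /important/ and /address/, assignmentOnly(rules, "/important/x") returns true, because the later /address/ entry resets allowed; the loop also needs to stop at the first match.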


 
LVL 20

Expert Comment

by:Venabili
ID: 12014566
>>allowed = !link.startsWith(filepath);

>>is the same, but less readable, particularly as the variable name starts with 'l'

If you ask me, this one is more readable and understandable than the longer variant... It depends on the person reading it, and the fact that it is more readable for you does not mean it is for everyone... So please don't bite... :))

Venabili
 

Author Comment

by:HomerrSimpson
ID: 12016498
So I just need

allowed = (link.startsWith(filepath) == false);

in the while loop,

then

if allowed is true, parse; otherwise, do not parse.
 
LVL 86

Expert Comment

by:CEHJ
ID: 12019100
Yep. (As long as you don't find it too unreadable ;-))
 
LVL 92

Expert Comment

by:objects
ID: 12022361
You can improve it using:

while (allowed && iter.hasNext()) {
    String filepath = new URL((String) iter.next()).getFile();
    System.out.println("link " + link + " filepath " + filepath);

    allowed = !link.startsWith(filepath);
}

That way you don't loop through the entire list once you find it's not allowed.
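A self-contained sketch of the same loop shape with generics (EarlyExit and isAllowed are hypothetical names). Besides saving iterations, the allowed && guard keeps a later, non-matching entry from overwriting an earlier match:

```java
import java.util.Iterator;
import java.util.List;

class EarlyExit {
    // The "allowed &&" guard ends the loop at the first matching prefix,
    // so the false result cannot be overwritten by later entries
    static boolean isAllowed(List<String> prefixes, String link) {
        boolean allowed = true;
        Iterator<String> iter = prefixes.iterator();
        while (allowed && iter.hasNext()) {
            String filepath = iter.next();
            allowed = !link.startsWith(filepath);
        }
        return allowed;
    }
}
```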
 

Author Comment

by:HomerrSimpson
ID: 12028325
Found a problem:

if the Vector contains "http://www.example.com/staff/"

and the link is "http://www.example.com/staff",

allowed always equals true. Any idea how to solve it?
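The behaviour follows from String.startsWith(): "http://www.example.com/staff" is one character shorter than the entry "http://www.example.com/staff/", so it can never start with it. Under the usual prefix reading of robots.txt, a rule of /staff/ does not cover /staff itself either, so allowed == true is consistent with that reading. If you nevertheless want the bare directory URL treated as blocked, one option (an assumption, with hypothetical names) is:

```java
class TrailingSlash {
    // Plain prefix test: a link shorter than the rule can never match it
    static boolean blocked(String rule, String link) {
        return link.startsWith(rule);
    }

    // Hypothetical extension, not standard robots.txt behaviour: when the rule
    // ends in "/", also treat the same URL without that slash as blocked
    static boolean blockedIncludingDirectory(String rule, String link) {
        if (link.startsWith(rule)) {
            return true;
        }
        return rule.endsWith("/") && link.equals(rule.substring(0, rule.length() - 1));
    }
}
```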

 
LVL 86

Expert Comment

by:CEHJ
ID: 12028774
>>allowed is always equals to true any idea on how to solve it ?

Can you post the exact code you're using?
 

Author Comment

by:HomerrSimpson
ID: 12029652
Sure. The result always comes back true:

boolean allowed = true;
Vector v = new Vector();

v.add("http://www.example.com/peoplename/");
v.add("http://www.example.com/address/");
v.add("http://www.example.com/staff/");
v.add("http://www.example.com/securitynumber/");

String link = "http://www.example.com/staff";

if (link.startsWith("http://www")) {
    ListIterator iter = v.listIterator();

    while (allowed && iter.hasNext()) {
        String filepath = (String) iter.next();

        allowed = (link.startsWith(filepath) == false);

        System.out.println("allowed " + allowed);
    }
}

if (allowed == false) {
    System.out.println("do not parse " + link);
} else {
    System.out.println("parse " + link);
}
 
LVL 86

Expert Comment

by:CEHJ
ID: 12029716
If the following is true:

>>if(link.startsWith("http://www"))

then the following:

>>allowed = (link.startsWith(filepath) == false);

(the value of allowed) must always be true ;-)

 
LVL 86

Expert Comment

by:CEHJ
ID: 12029737
>>That way you don't loop through the entire list once you find it's not allowed.

By the way, is that what you *want* to do, HomerrSimpson?
 

Author Comment

by:HomerrSimpson
ID: 12029997
I think it would make sense to stop once there is a match.

I want to put the above code in a method that returns false if there is a match (so I don't parse) and true otherwise (so I parse):

public static boolean checkRobot(String link)
{
  ListIterator iter = v.listIterator();

  while (allowed && iter.hasNext()) {
  String filepath = (String)iter.next();          
  allowed = (link.startsWith(filepath) == false);
   }

return allowed;
}
 
LVL 86

Expert Comment

by:CEHJ
ID: 12030017
What are you going to be passing as 'link'? Please post an example.
 
LVL 86

Accepted Solution

by:CEHJ (earned 90 total points)
ID: 12030106
You need something like:


public static boolean checkRobot(String link)
{
      boolean allowed = true;
      try
      {
            link = new URL(link).getFile();
      }
      catch(Exception e)
      {
            return new IllegalArgumentException("Bad link passed to checkRobot");
      }
      ListIterator iter = v.listIterator();
      while (allowed && iter.hasNext())
      {
            String www = (String)iter.next();
            try
            {
                  String filepath = new URL(www).getFile();
            }
            catch(Exception e)
            {
                  return new IllegalArgumentException("Bad link in Vector");
            }

            allowed = (link.startsWith(filepath) == false);
      }

      return allowed;
}

 
LVL 86

Expert Comment

by:CEHJ
ID: 12030139
Sorry, typos there. Try this:

      public static boolean checkRobot(Vector v, String link)
      {
            boolean allowed = true;
            try
            {
                  link = new URL(link).getFile();
            }
            catch(Exception e)
            {
                  throw new IllegalArgumentException("Bad link passed to checkRobot");
            }
            ListIterator iter = v.listIterator();
            while (allowed && iter.hasNext())
            {
                  String www = (String)iter.next();
                  String filepath = null;
                  try
                  {
                        filepath = new URL(www).getFile();
                  }
                  catch(Exception e)
                  {
                        throw new IllegalArgumentException("Bad link in Vector");
                  }

                  allowed = (link.startsWith(filepath) == false);
            }

            return allowed;
      }
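For newer Java versions, the same logic can be written with a generic List<String> and an early return. This is a sketch rather than a drop-in replacement: the class name RobotCheck is hypothetical, and like the method above it compares paths only (URL.getFile() returns the path plus any query string), so rules for different hosts have to be kept in separate lists:

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Arrays;
import java.util.List;

class RobotCheck {
    // rules: absolute URLs built from robots.txt entries (host included, as in the thread)
    // link:  absolute URL to test; returns true if it may be parsed
    static boolean checkRobot(List<String> rules, String link) {
        String linkPath = pathOf(link, "Bad link passed to checkRobot");
        for (String rule : rules) {
            String filepath = pathOf(rule, "Bad link in list");
            if (linkPath.startsWith(filepath)) {
                return false; // first matching rule decides: do not parse
            }
        }
        return true; // no rule matched: OK to parse
    }

    // URL.getFile() returns the path plus any query string
    private static String pathOf(String url, String message) {
        try {
            return new URL(url).getFile();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(message, e);
        }
    }

    public static void main(String[] args) {
        List<String> rules = Arrays.asList(
                "http://www.example.com/important/",
                "http://www.example.com/staff/");
        System.out.println(checkRobot(rules, "http://www.example.com/important/name/details")); // false
        System.out.println(checkRobot(rules, "http://www.example.com/public/index.html"));      // true
    }
}
```

Returning from inside the loop gives the same stop-at-first-match behaviour as the `while (allowed && iter.hasNext())` guard without needing the extra flag.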

 
LVL 86

Expert Comment

by:CEHJ
ID: 12270536
8-)
Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction Java can be integrated with native programs using an interface called JNI(Java Native Interface). Native programs are programs which can directly run on the processor. JNI is simply a naming and calling convention so that the JVM (Java…
In this post we will learn how to make Android Gesture Tutorial and give different functionality whenever a user Touch or Scroll android screen.
Viewers will learn about the regular for loop in Java and how to use it. Definition: Break the for loop down into 3 parts: Syntax when using for loops: Example using a for loop:
This tutorial will introduce the viewer to VisualVM for the Java platform application. This video explains an example program and covers the Overview, Monitor, and Heap Dump tabs.

624 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question