robot text file

i have store some urls in vector by looking at robot txt file in a website

if i had a url "www.example.com/important" stored in a vector ROBOT taken from a robot file and i have a link "www.example.com/important/name/details" should i be allow to parse this link? if not how should i be checking the vector as using "if contain" method wont work?

HomerrSimpsonAsked:
Who is Participating?
 
CEHJConnect With a Mentor Commented:
You need something like:


public static boolean checkRobot(String link)
{
      boolean allowed = true;
      try
      {
            link = new URL(link).getFile();
      }
      catch(Exception e)
      {
            return new IllegalArgumentException("Bad link passed to checkRobot");
      }
      ListIterator iter = v.listIterator();
      while (allowed && iter.hasNext())
      {
            String www = (String)iter.next();
            try
            {
                  String filepath = new URL(www).getFile();
            }
            catch(Exception e)
            {
                  return new IllegalArgumentException("Bad link in Vector");
            }

            allowed = (link.startsWith(filepath) == false);
      }

      return allowed;
}

0
 
CEHJCommented:
>>if i had a url "www.example.com/important

The vector should contain

/important/


>>should i be allow to parse this link?

No. Any path starting with 'important' would be disallowed
0
 
objectsCommented:
> if not how should i be checking the vector as using "if contain" method wont work?

You'll need to loop thru the vector checking each entry
0
The new generation of project management tools

With monday.com’s project management tool, you can see what everyone on your team is working in a single glance. Its intuitive dashboards are customizable, so you can create systems that work for you.

 
HomerrSimpsonAuthor Commented:
i m confused why wouldnt i need to store the host part as well in the vector?



0
 
CEHJCommented:
>>i m confused why wouldnt i need to store the host part as well in the vector?

Depends on how you're implementing it. The host part doesn't of course appear in the robots file. You can of course store it if you're dealing with several different hosts
0
 
HomerrSimpsonAuthor Commented:
yeah i am dealing with several different hosts that was why i was storing the host in the vector aswell

this leaves me back to the original how should i compare it with the links. i think i can loop thru the vector ok but checking is what i am unsure of?

so if i have this www.example.com/important in the vector and i have this as a link "www.example.com/important/name/" or this as a link "/important/name/address " how can i compare it ?
0
 
CEHJCommented:
Well you must work without the hostname:

String filepath = new URL("www.example.com/important").getFile();
String link = "/important/name/address";
boolean allowed = (link.startsWith(filepath) == false);
0
 
HomerrSimpsonAuthor Commented:
think i got it working do you mind double checking?

      boolean allowed=true;
      Vector v = new Vector();
      v.add("http://www.example.com/important/");
      v.add("http://www.example.com/peoplename/");
      v.add("http://www.example.com/address/");
      v.add("http://www.example.com/securitynumber/");

String link = "/important/peoplename/addrress";

if(link.startsWith("http://www"))
{
      System.out.println("ENTER");
      ListIterator iter = v.listIterator();

      while (iter.hasNext()) {
      String filepath = (String)iter.next();            
      System.out.println("link " + link + " filepath " + filepath);
                
             if(allowed = (link.startsWith(filepath) == false)){
               allowed=false;
                  }
            }

}else{
      
      ListIterator iter = v.listIterator();
      System.out.println("ENTER2");

            while (iter.hasNext()) {
                String filepath= new URL((String)iter.next()).getFile();
                System.out.println("link " + link + " filepath " + filepath);
                
                if(allowed = (link.startsWith( filepath ) == false)){
                      allowed=false;
                }
          }

}
System.out.println(allowed);
0
 
objectsCommented:
>            if(allowed = (link.startsWith(filepath) == false)){
>             allowed=false;
>               }

thats not quite right and can be simplified to:

allowed = !link.startsWith(filePath);
0
 
CEHJCommented:
>>if(allowed = (link.startsWith( filepath ) == false)){

What you've written there is *not* two boolean tests - you've written an assignment followed by a boolean test. The correct code is what i gave earlier:


allowed = (link.startsWith(filepath) == false);

>>allowed = !link.startsWith(filePath);

is the same, but less readable, particularly as the variable  name starts with 'l'


0
 
VenabiliCommented:
>>allowed = !link.startsWith(filePath);

is the same, but less readable, particularly as the variable  name starts with 'l'

if you ask me this one is more readable and understandable than the longer variant... It depends on the person that reads it and the fact that it is more readable for you does not mean it is such for everyone... So do not bite please... :))

Venabili
0
 
HomerrSimpsonAuthor Commented:
so i just need

allowed = (link.startsWith(filepath) == false);

in the while loop

then

 if allowed = true then parse else do not parse
0
 
CEHJCommented:
Yep. (As long as you don't find it too unreadable ;-))
0
 
objectsCommented:
you can improve it using:

          while (allowed && iter.hasNext()) {
              String filepath= new URL((String)iter.next()).getFile();
              System.out.println("link " + link + " filepath " + filepath);
             
              allowed = !link.startsWith(filePath);
         }

That way you don't loop thru the entire list once u find its not allowed.
0
 
HomerrSimpsonAuthor Commented:
found a problem

if a vector contains  "http://www.example.com/staff/"

and a link "http://www.example.com/staff"

allowed is always equals to true any idea on how to solve it ?

0
 
CEHJCommented:
>>allowed is always equals to true any idea on how to solve it ?

Can you post the exact code you're using?
0
 
HomerrSimpsonAuthor Commented:
sure the result always returned true

      boolean allowed=true;
      Vector v = new Vector();
      
      v.add("http://www.example.com/peoplename/");
      v.add("http://www.example.com/address/");
      v.add("http://www.example.com/staff/");
      v.add("http://www.example.com/securitynumber/");


String link = "http://www.example.com/staff";

if(link.startsWith("http://www"))
{
      ListIterator iter = v.listIterator();

            while (allowed && iter.hasNext()) {
            String filepath = (String)iter.next();            
            
            allowed = (link.startsWith(filepath) == false);
         
            System.out.println("allowed " + allowed);
         }
}

      if(allowed==false){
       System.out.println("do not parse " + link);
        }            
      else{
            System.out.println("parse " + link);
      }
0
 
CEHJCommented:
If the following is true:

>>if(link.startsWith("http://www"))

then the following:

>>allowed = (link.startsWith(filepath) == false);

(the value of allowed) must always be true ;-)

0
 
CEHJCommented:
>>That way you don't loop thru the entire list once u find its not allowed.

btw, is that what you *want* to do HomerrSimpson !?
0
 
HomerrSimpsonAuthor Commented:
i think it would make sense to stop once there is a match

i want to put the above code in a method which returns a boolean false if there is a match so i do not parse if it true parse

public static boolean checkRobot(String link)
{
  ListIterator iter = v.listIterator();

  while (allowed && iter.hasNext()) {
  String filepath = (String)iter.next();          
  allowed = (link.startsWith(filepath) == false);
   }

return allowed;
}
0
 
CEHJCommented:
What are you going to be passing as 'link'? Please post example
0
 
HomerrSimpsonAuthor Commented:
0
 
CEHJCommented:
Sorry - typos there. Try this

      public static boolean checkRobot(Vector v, String link)
      {
            boolean allowed = true;
            try
            {
                  link = new URL(link).getFile();
            }
            catch(Exception e)
            {
                  throw new IllegalArgumentException("Bad link passed to checkRobot");
            }
            ListIterator iter = v.listIterator();
            while (allowed && iter.hasNext())
            {
                  String www = (String)iter.next();
                  String filepath = null;
                  try
                  {
                        filepath = new URL(www).getFile();
                  }
                  catch(Exception e)
                  {
                        throw new IllegalArgumentException("Bad link in Vector");
                  }

                  allowed = (link.startsWith(filepath) == false);
            }

            return allowed;
      }

0
 
CEHJCommented:
8-)
0
All Courses

From novice to tech pro — start learning today.