robot text file

Posted on 2004-09-07
Last Modified: 2010-03-31
I have stored some URLs in a vector after looking at a website's robots.txt file.

If I have a URL "www.example.com/important" stored in a vector ROBOT (taken from a robots file), and I have a link "www.example.com/important/name/details", should I be allowed to parse this link? If not, how should I be checking the vector, given that a "contains" check won't work?

Question by:HomerrSimpson

Expert Comment

by:CEHJ
ID: 12002270
>>if i had a url "www.example.com/important

The vector should contain

/important/


>>should i be allow to parse this link?

No. Any path starting with /important/ would be disallowed.
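A minimal sketch of how such a vector of path prefixes might be filled from a robots.txt file, assuming only the `User-agent: *` group matters. The class and method names (`RobotsParser`, `disallowedPrefixes`) are hypothetical, and the parser deliberately ignores `Allow` lines, wildcards, and bot-specific groups, so treat it as an illustration rather than a complete robots.txt implementation (Java 8 or later).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class RobotsParser {

    // Collects the Disallow path prefixes from groups addressed to "User-agent: *".
    // Simplified: Allow lines, wildcards (* and $) and bot-specific groups are ignored.
    static List<String> disallowedPrefixes(String robotsTxt) {
        List<String> prefixes = new ArrayList<>();
        boolean inStarGroup = false;
        boolean lastWasAgent = false;
        for (String rawLine : robotsTxt.split("\r?\n")) {
            // Drop comments and surrounding whitespace.
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // Consecutive User-agent lines belong to the same group.
                boolean star = value.equals("*");
                inStarGroup = lastWasAgent ? (inStarGroup || star) : star;
                lastWasAgent = true;
            } else {
                lastWasAgent = false;
                // An empty Disallow value means nothing is disallowed.
                if (field.equals("disallow") && inStarGroup && !value.isEmpty()) {
                    prefixes.add(value);
                }
            }
        }
        return prefixes;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\nDisallow: /important/\nDisallow: /staff/\n";
        System.out.println(disallowedPrefixes(robots)); // [/important/, /staff/]
    }
}
```

The resulting list holds paths such as `/important/` rather than full URLs, which is the form the comparison against a link's path needs.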

Expert Comment

by:objects
ID: 12002404
> if not how should i be checking the vector as using "if contain" method wont work?

You'll need to loop through the vector, checking each entry.
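A short sketch of that loop, assuming the vector holds path prefixes such as `/important/`. The helper name `isDisallowed` is hypothetical. It also shows why `contains()` is not enough: `Vector.contains` compares with `equals`, so it only finds an entry identical to the whole link.

```java
import java.util.List;
import java.util.Vector;

class PrefixCheck {

    // Returns true if 'path' starts with any of the disallowed prefixes.
    // contains() only finds exact matches, so each entry is tested with startsWith().
    static boolean isDisallowed(String path, List<String> disallowed) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return true; // the first match is enough
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Vector<String> robot = new Vector<>();
        robot.add("/important/");
        System.out.println(robot.contains("/important/name/details"));     // false
        System.out.println(isDisallowed("/important/name/details", robot)); // true
    }
}
```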

Author Comment

by:HomerrSimpson
ID: 12005805
I'm confused: why wouldn't I need to store the host part in the vector as well?

Expert Comment

by:CEHJ
ID: 12009327
>>i m confused why wouldnt i need to store the host part as well in the vector?

It depends on how you're implementing it. The host part doesn't appear in the robots file, of course, but you can store it if you're dealing with several different hosts.
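One way to handle several hosts, sketched under the assumption that links are absolute URLs: keep a separate list of path prefixes per host in a `Map` instead of storing full URLs in a single vector, and compare only the path part. `HostRules`, `addRule`, and `isAllowed` are illustrative names, not an existing API (Java 8 or later).

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class HostRules {

    // Disallow prefixes grouped by host name, since every host has its own robots.txt.
    private final Map<String, List<String>> rulesByHost = new HashMap<>();

    void addRule(String host, String pathPrefix) {
        rulesByHost.computeIfAbsent(host.toLowerCase(Locale.ROOT), h -> new ArrayList<>())
                   .add(pathPrefix);
    }

    // Checks an absolute link against the rules stored for its own host only.
    boolean isAllowed(String absoluteUrl) {
        URI uri = URI.create(absoluteUrl);
        String host = uri.getHost() == null ? "" : uri.getHost().toLowerCase(Locale.ROOT);
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/"; // "http://www.example.com" has an empty path
        }
        for (String prefix : rulesByHost.getOrDefault(host, Collections.emptyList())) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        HostRules rules = new HostRules();
        rules.addRule("www.example.com", "/important/");
        System.out.println(rules.isAllowed("http://www.example.com/important/name/details")); // false
        System.out.println(rules.isAllowed("http://www.other.org/important/name/details"));   // true
    }
}
```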

Author Comment

by:HomerrSimpson
ID: 12010395
Yeah, I am dealing with several different hosts; that's why I was storing the host in the vector as well.

This brings me back to the original question: how should I compare it with the links? I think I can loop through the vector OK, but the checking is what I'm unsure of.

So if I have www.example.com/important in the vector, and I have "www.example.com/important/name/" or "/important/name/address" as a link, how can I compare them?

Expert Comment

by:CEHJ
ID: 12010525
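You have to work with the path, without the hostname:

String filepath = new URL("http://www.example.com/important").getFile(); // "/important"; URL needs the "http://" scheme, otherwise it throws MalformedURLException
String link = "/important/name/address";
boolean allowed = (link.startsWith(filepath) == false);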
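Since a link may arrive either as an absolute URL or as a root-relative path like `/important/name/address`, one option is to resolve it against the URL of the page it was found on and keep only the path. This sketch uses `java.net.URI.resolve`, a swapped-in technique rather than the `URL.getFile()` call used in the thread (`getFile()` also keeps any query string, while `getRawPath()` does not). The page URL and the helper name `pathOf` are assumptions for illustration.

```java
import java.net.URI;

class LinkPath {

    // Turns an absolute link or a relative one into just its path,
    // by resolving it against the URL of the page where it was found.
    static String pathOf(String pageUrl, String link) {
        URI resolved = URI.create(pageUrl).resolve(link.trim()); // trim stray whitespace
        String path = resolved.getRawPath();
        return (path == null || path.isEmpty()) ? "/" : path;
    }

    public static void main(String[] args) {
        String page = "http://www.example.com/index.html";
        System.out.println(pathOf(page, "http://www.example.com/important/name/")); // /important/name/
        System.out.println(pathOf(page, "/important/name/address"));                // /important/name/address
    }
}
```

Both link forms end up as plain paths, so a single `startsWith` comparison against the stored prefixes covers them.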

Author Comment

by:HomerrSimpson
ID: 12011877
I think I've got it working. Do you mind double-checking?

boolean allowed=true;
Vector v = new Vector();
v.add("http://www.example.com/important/");
v.add("http://www.example.com/peoplename/");
v.add("http://www.example.com/address/");
v.add("http://www.example.com/securitynumber/");

String link = "/important/peoplename/addrress";

if(link.startsWith("http://www"))
{
    System.out.println("ENTER");
    ListIterator iter = v.listIterator();

    while (iter.hasNext()) {
        String filepath = (String)iter.next();
        System.out.println("link " + link + " filepath " + filepath);

        if(allowed = (link.startsWith(filepath) == false)){
            allowed=false;
        }
    }

}else{

    ListIterator iter = v.listIterator();
    System.out.println("ENTER2");

    while (iter.hasNext()) {
        String filepath= new URL((String)iter.next()).getFile();
        System.out.println("link " + link + " filepath " + filepath);

        if(allowed = (link.startsWith( filepath ) == false)){
            allowed=false;
        }
    }

}
System.out.println(allowed);

Expert Comment

by:objects
ID: 12012139
>            if(allowed = (link.startsWith(filepath) == false)){
>             allowed=false;
>               }

That's not quite right, and it can be simplified to:

allowed = !link.startsWith(filepath);

Expert Comment

by:CEHJ
ID: 12013711
>>if(allowed = (link.startsWith( filepath ) == false)){

What you've written there is *not* two boolean tests: you've written an assignment, and the assigned value is then tested. The correct code is what I gave earlier:

allowed = (link.startsWith(filepath) == false);

>>allowed = !link.startsWith(filepath);

is the same, but less readable, particularly as the variable name starts with 'l'
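To make the assignment-versus-test point concrete, here is a small sketch wrapping both versions in hypothetical helper methods. In the `if` form, a non-matching link first sets `allowed` to true, which makes the `if` body run and reset it to false; a matching link sets it to false directly. Either way the result is false.

```java
class AssignmentInCondition {

    // The pattern from the question: the condition assigns to 'allowed'
    // and then tests the value it just assigned.
    static boolean buggyCheck(String link, String filepath) {
        boolean allowed = true;
        if (allowed = (link.startsWith(filepath) == false)) {
            allowed = false; // runs exactly when the link did NOT match
        }
        return allowed;      // false in both cases
    }

    // The version suggested in the thread: a plain assignment.
    static boolean fixedCheck(String link, String filepath) {
        return !link.startsWith(filepath);
    }

    public static void main(String[] args) {
        String link = "/important/name/address";
        System.out.println(buggyCheck(link, "/address/")); // false, even though there is no match
        System.out.println(fixedCheck(link, "/address/")); // true
    }
}
```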



Expert Comment

by:Venabili
ID: 12014566
>>allowed = !link.startsWith(filepath);
>>is the same, but less readable, particularly as the variable name starts with 'l'

If you ask me, this one is more readable and understandable than the longer variant... It depends on the person reading it, and the fact that it is more readable for you does not mean it is so for everyone... So please do not bite... :))

Venabili

Author Comment

by:HomerrSimpson
ID: 12016498
So I just need

allowed = (link.startsWith(filepath) == false);

in the while loop,

and then:

if allowed is true, parse; otherwise, do not parse.

Expert Comment

by:CEHJ
ID: 12019100
Yep. (As long as you don't find it too unreadable ;-))

Expert Comment

by:objects
ID: 12022361
You can improve it using:

    while (allowed && iter.hasNext()) {
        String filepath = new URL((String)iter.next()).getFile();
        System.out.println("link " + link + " filepath " + filepath);

        allowed = !link.startsWith(filepath);
    }

That way you don't loop through the entire list once you find it's not allowed.
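For what it's worth, on Java 8 or later the same early exit can be written with a stream: `noneMatch` stops evaluating at the first element that matches, and returns true for an empty list. This is a sketch assuming the prefixes are plain path strings; `EarlyExit.isAllowed` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

class EarlyExit {

    // Same early-exit behaviour as while (allowed && iter.hasNext()):
    // noneMatch stops at the first prefix the path starts with.
    static boolean isAllowed(String path, List<String> disallowedPrefixes) {
        return disallowedPrefixes.stream().noneMatch(path::startsWith);
    }

    public static void main(String[] args) {
        List<String> prefixes = Arrays.asList("/peoplename/", "/address/", "/staff/", "/securitynumber/");
        System.out.println(isAllowed("/staff/list", prefixes)); // false
        System.out.println(isAllowed("/about", prefixes));      // true
    }
}
```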

Author Comment

by:HomerrSimpson
ID: 12028325
Found a problem:

if the vector contains "http://www.example.com/staff/"

and the link is "http://www.example.com/staff",

allowed is always true. Any idea how to solve it?


Expert Comment

by:CEHJ
ID: 12028774
>>allowed is always equals to true any idea on how to solve it ?

Can you post the exact code you're using?

Author Comment

by:HomerrSimpson
ID: 12029652
Sure. The result is always true:

boolean allowed=true;
Vector v = new Vector();

v.add("http://www.example.com/peoplename/");
v.add("http://www.example.com/address/");
v.add("http://www.example.com/staff/");
v.add("http://www.example.com/securitynumber/");

String link = "http://www.example.com/staff";

if(link.startsWith("http://www"))
{
    ListIterator iter = v.listIterator();

    while (allowed && iter.hasNext()) {
        String filepath = (String)iter.next();

        allowed = (link.startsWith(filepath) == false);

        System.out.println("allowed " + allowed);
    }
}

if(allowed==false){
    System.out.println("do not parse " + link);
}
else{
    System.out.println("parse " + link);
}

Expert Comment

by:CEHJ
ID: 12029716
If the following is true:

>>if(link.startsWith("http://www"))

then the following:

>>allowed = (link.startsWith(filepath) == false);

(the value of allowed) must always be true ;-)
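Separately from the branch logic, the specific pair in the question can never match because of the trailing slash: the link `http://www.example.com/staff` is one character shorter than the entry `http://www.example.com/staff/`, so `startsWith` returns false. Under the prefix matching that robots.txt uses (RFC 9309), that is arguably the correct answer: `Disallow: /staff/` covers `/staff/` and everything below it, but not `/staff` itself, while `Disallow: /staff` would also cover `/staffroom`. A tiny sketch with a hypothetical `covers` helper:

```java
class TrailingSlash {

    // Plain prefix test, as used throughout the thread.
    static boolean covers(String link, String entry) {
        return link.startsWith(entry);
    }

    public static void main(String[] args) {
        String entry = "http://www.example.com/staff/";
        System.out.println(covers("http://www.example.com/staff", entry));      // false: link is shorter than the entry
        System.out.println(covers("http://www.example.com/staff/", entry));     // true
        System.out.println(covers("http://www.example.com/staff/jobs", entry)); // true
        // Without the trailing slash the rule is broader.
        System.out.println(covers("/staffroom", "/staff"));                     // true
    }
}
```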


Expert Comment

by:CEHJ
ID: 12029737
>>That way you don't loop thru the entire list once u find its not allowed.

BTW, is that what you *want* to do, HomerrSimpson?

Author Comment

by:HomerrSimpson
ID: 12029997
I think it would make sense to stop once there is a match.

I want to put the above code in a method that returns false if there is a match (so I do not parse the link) and true otherwise (so I parse it):

public static boolean checkRobot(String link)
{
  ListIterator iter = v.listIterator();

  while (allowed && iter.hasNext()) {
  String filepath = (String)iter.next();          
  allowed = (link.startsWith(filepath) == false);
   }

return allowed;
}

Expert Comment

by:CEHJ
ID: 12030017
What are you going to be passing as 'link'? Please post an example.

Author Comment

by:HomerrSimpson
ID: 12030069

Accepted Solution

by:CEHJ
ID: 12030106
You need something like:


public static boolean checkRobot(String link)
{
      boolean allowed = true;
      try
      {
            link = new URL(link).getFile();
      }
      catch(Exception e)
      {
            return new IllegalArgumentException("Bad link passed to checkRobot");
      }
      ListIterator iter = v.listIterator();
      while (allowed && iter.hasNext())
      {
            String www = (String)iter.next();
            try
            {
                  String filepath = new URL(www).getFile();
            }
            catch(Exception e)
            {
                  return new IllegalArgumentException("Bad link in Vector");
            }

            allowed = (link.startsWith(filepath) == false);
      }

      return allowed;
}


Expert Comment

by:CEHJ
ID: 12030139
Sorry, typos there. Try this:

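import java.net.MalformedURLException;
import java.net.URL;
import java.util.ListIterator;
import java.util.Vector;

    // Returns false if the link's path starts with the path of any URL in v
    // (e.g. "http://www.example.com/staff/"), true otherwise.
    public static boolean checkRobot(Vector v, String link)
    {
        boolean allowed = true;
        try
        {
            // Keep only the path (plus any query), e.g. "/staff/list"
            link = new URL(link).getFile();
        }
        catch(MalformedURLException e)
        {
            throw new IllegalArgumentException("Bad link passed to checkRobot");
        }
        ListIterator iter = v.listIterator();
        // Stop at the first entry that matches
        while (allowed && iter.hasNext())
        {
            String www = (String)iter.next();
            String filepath = null;
            try
            {
                filepath = new URL(www).getFile();
            }
            catch(MalformedURLException e)
            {
                throw new IllegalArgumentException("Bad link in Vector");
            }

            allowed = (link.startsWith(filepath) == false);
        }

        return allowed;
    }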


Expert Comment

by:CEHJ
ID: 12270536
8-)