Solved

robot text file

Posted on 2004-09-07
Last Modified: 2010-03-31
I have stored some URLs in a Vector after reading the robots.txt file of a website.

If I have the URL "www.example.com/important" stored in a Vector ROBOT, taken from a robots.txt file, and I have a link "www.example.com/important/name/details", should I be allowed to parse this link? If not, how should I check the Vector, given that the contains() method won't work?

Question by:HomerrSimpson
26 Comments
 
LVL 86

Expert Comment

by:CEHJ
ID: 12002270
>>If I have the URL "www.example.com/important"

The vector should contain

/important/


>>should I be allowed to parse this link?

No. Any path starting with '/important/' would be disallowed.
 
LVL 92

Expert Comment

by:objects
ID: 12002404
> If not, how should I check the Vector, given that the contains() method won't work?

You'll need to loop through the vector, checking each entry.
 

Author Comment

by:HomerrSimpson
ID: 12005805
I'm confused: why wouldn't I need to store the host part in the vector as well?



 
LVL 86

Expert Comment

by:CEHJ
ID: 12009327
>>I'm confused: why wouldn't I need to store the host part in the vector as well?

That depends on how you're implementing it. The host part doesn't appear in the robots.txt file itself, but you can store it if you're dealing with several different hosts.
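
If several hosts are involved, one way to keep their rules apart is to key the disallowed paths by host. The following is only a sketch of that idea: the class name HostRules and its methods are made up for illustration and are not from this thread.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups Disallow prefixes by the host they came from.
class HostRules {
    private final Map<String, List<String>> disallowedByHost = new HashMap<>();

    // Record one Disallow path (e.g. "/important/") for the given host.
    void disallow(String host, String pathPrefix) {
        disallowedByHost.computeIfAbsent(host, h -> new ArrayList<>()).add(pathPrefix);
    }

    // A link is allowed unless its path starts with a prefix stored for its own host.
    boolean isAllowed(String link) {
        URL url;
        try {
            url = new URL(link);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad link: " + link, e);
        }
        List<String> prefixes = disallowedByHost.get(url.getHost());
        if (prefixes == null) {
            return true; // no rules recorded for this host
        }
        String path = url.getFile();
        for (String prefix : prefixes) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        HostRules rules = new HostRules();
        rules.disallow("www.example.com", "/important/");
        // Same path, different hosts: only the first one is blocked.
        System.out.println(rules.isAllowed("http://www.example.com/important/name/details")); // false
        System.out.println(rules.isAllowed("http://www.example.org/important/name/details")); // true
    }
}
```

With this layout the stored entries stay in the same form as in the robots.txt file (paths only), and the host is used just to pick the right list.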
 

Author Comment

by:HomerrSimpson
ID: 12010395
Yeah, I am dealing with several different hosts; that is why I was storing the host in the vector as well.

This brings me back to the original question: how should I compare it with the links? I think I can loop through the vector OK, but the checking is what I am unsure of.

So if I have "www.example.com/important" in the vector and I have "www.example.com/important/name/" or "/important/name/address" as a link, how can I compare them?
 
LVL 86

Expert Comment

by:CEHJ
ID: 12010525
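Well, you must work without the hostname:

// new URL(...) needs a protocol such as "http://", otherwise it throws MalformedURLException
String filepath = new URL("http://www.example.com/important").getFile(); // "/important"
String link = "/important/name/address";
boolean allowed = (link.startsWith(filepath) == false);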
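
One thing to watch with java.net.URL: the constructor only accepts strings that carry a protocol, and getFile() returns the path together with any query string. A small sketch of both points follows; the class GetFileDemo and its pathOf helper are illustrative names, not part of the thread.

```java
import java.net.MalformedURLException;
import java.net.URL;

class GetFileDemo {
    // Returns the path (plus query string, if any) of an absolute URL.
    static String pathOf(String absoluteUrl) {
        try {
            return new URL(absoluteUrl).getFile();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not an absolute URL: " + absoluteUrl, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(pathOf("http://www.example.com/important"));       // /important
        System.out.println(pathOf("http://www.example.com/important/?id=1")); // /important/?id=1
        try {
            pathOf("www.example.com/important"); // no protocol, so the URL constructor rejects it
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```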
 

Author Comment

by:HomerrSimpson
ID: 12011877
I think I got it working; do you mind double-checking?

      boolean allowed=true;
      Vector v = new Vector();
      v.add("http://www.example.com/important/");
      v.add("http://www.example.com/peoplename/");
      v.add("http://www.example.com/address/");
      v.add("http://www.example.com/securitynumber/");

String link = "/important/peoplename/addrress";

if(link.startsWith("http://www"))
{
      System.out.println("ENTER");
      ListIterator iter = v.listIterator();

      while (iter.hasNext()) {
      String filepath = (String)iter.next();            
      System.out.println("link " + link + " filepath " + filepath);
                
             if(allowed = (link.startsWith(filepath) == false)){
               allowed=false;
                  }
            }

}else{
      
      ListIterator iter = v.listIterator();
      System.out.println("ENTER2");

            while (iter.hasNext()) {
                String filepath= new URL((String)iter.next()).getFile();
                System.out.println("link " + link + " filepath " + filepath);
                
                if(allowed = (link.startsWith( filepath ) == false)){
                      allowed=false;
                }
          }

}
System.out.println(allowed);
 
LVL 92

Expert Comment

by:objects
ID: 12012139
>            if(allowed = (link.startsWith(filepath) == false)){
>             allowed=false;
>               }

That's not quite right and can be simplified to:

allowed = !link.startsWith(filepath);
 
LVL 86

Expert Comment

by:CEHJ
ID: 12013711
>>if(allowed = (link.startsWith( filepath ) == false)){

What you've written there is *not* two boolean tests - you've written an assignment followed by a boolean test. Because the body of the if then sets allowed to false whenever the assignment yields true, allowed ends up false either way. The correct code is what I gave earlier:


allowed = (link.startsWith(filepath) == false);

>>allowed = !link.startsWith(filepath);

is the same, but less readable, particularly as the variable name starts with 'l'
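
To see the effect of the assignment inside the if, here is a small side-by-side sketch. The class and method names (AssignmentInCondition, buggy, fixed) are made up for the comparison; the buggy body is the pattern from the question.

```java
class AssignmentInCondition {
    // Pattern from the question: the condition assigns to 'allowed',
    // and the body overwrites it with false whenever that assignment was true.
    static boolean buggy(String link, String filepath) {
        boolean allowed = true;
        if (allowed = (link.startsWith(filepath) == false)) {
            allowed = false;
        }
        return allowed;
    }

    // Plain assignment, as suggested in the replies.
    static boolean fixed(String link, String filepath) {
        return link.startsWith(filepath) == false;
    }

    public static void main(String[] args) {
        // The buggy version reports false for both a blocked and an unblocked link.
        System.out.println(buggy("/public/page", "/important/"));    // false
        System.out.println(buggy("/important/name", "/important/")); // false
        System.out.println(fixed("/public/page", "/important/"));    // true
        System.out.println(fixed("/important/name", "/important/")); // false
    }
}
```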


 
LVL 20

Expert Comment

by:Venabili
ID: 12014566
>>allowed = !link.startsWith(filepath);
>>
>>is the same, but less readable, particularly as the variable name starts with 'l'

If you ask me, this one is more readable and understandable than the longer variant... It depends on the person reading it, and the fact that it is more readable for you does not mean it is so for everyone... So do not bite, please... :))

Venabili
 

Author Comment

by:HomerrSimpson
ID: 12016498
So I just need

allowed = (link.startsWith(filepath) == false);

in the while loop, and then:

if allowed is true, parse; otherwise do not parse
 
LVL 86

Expert Comment

by:CEHJ
ID: 12019100
Yep. (As long as you don't find it too unreadable ;-))
 
LVL 92

Expert Comment

by:objects
ID: 12022361
You can improve it using:

          while (allowed && iter.hasNext()) {
              String filepath = new URL((String) iter.next()).getFile();
              System.out.println("link " + link + " filepath " + filepath);

              allowed = !link.startsWith(filepath);
          }

That way you don't loop through the entire list once you find it's not allowed.
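
Note that stopping early is more than an optimisation here: if the loop keeps going and reassigns allowed on every pass, only the last entry in the list decides the result. A minimal sketch of the difference, using illustrative names (LoopGuardDemo, lastEntryWins, isAllowed) and plain path strings rather than the Vector of URLs from the question:

```java
import java.util.Arrays;
import java.util.List;

class LoopGuardDemo {
    // No guard: each pass overwrites the result, so only the last entry counts.
    static boolean lastEntryWins(List<String> disallowed, String path) {
        boolean allowed = true;
        for (String prefix : disallowed) {
            allowed = !path.startsWith(prefix);
        }
        return allowed;
    }

    // Stops at the first matching prefix, like the 'allowed &&' guard does.
    static boolean isAllowed(List<String> disallowed, String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> disallowed = Arrays.asList("/important/", "/address/");
        System.out.println(lastEntryWins(disallowed, "/important/name")); // true (wrong)
        System.out.println(isAllowed(disallowed, "/important/name"));     // false
    }
}
```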
 

Author Comment

by:HomerrSimpson
ID: 12028325
Found a problem.

If the vector contains "http://www.example.com/staff/"

and the link is "http://www.example.com/staff"

allowed always equals true. Any idea how to solve it?

 
LVL 86

Expert Comment

by:CEHJ
ID: 12028774
>>allowed always equals true. Any idea how to solve it?

Can you post the exact code you're using?
 

Author Comment

by:HomerrSimpson
ID: 12029652
Sure. The result always comes back true:

      boolean allowed=true;
      Vector v = new Vector();
      
      v.add("http://www.example.com/peoplename/");
      v.add("http://www.example.com/address/");
      v.add("http://www.example.com/staff/");
      v.add("http://www.example.com/securitynumber/");


String link = "http://www.example.com/staff";

if(link.startsWith("http://www"))
{
      ListIterator iter = v.listIterator();

            while (allowed && iter.hasNext()) {
            String filepath = (String)iter.next();            
            
            allowed = (link.startsWith(filepath) == false);
         
            System.out.println("allowed " + allowed);
         }
}

      if(allowed==false){
       System.out.println("do not parse " + link);
        }            
      else{
            System.out.println("parse " + link);
      }
 
LVL 86

Expert Comment

by:CEHJ
ID: 12029716
If the following is true:

>>if(link.startsWith("http://www"))

then the following:

>>allowed = (link.startsWith(filepath) == false);

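(the value of allowed) comes out true for your example link: "http://www.example.com/staff" lacks the trailing slash of the vector entry "http://www.example.com/staff/", so startsWith cannot match ;-)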
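
For the "/staff" versus "/staff/" case specifically, a strict prefix test treats the two as different: a rule ending in a slash does not cover the bare directory name. That matches the usual prefix reading of robots.txt rules, so it may well be the behaviour you want. If you would rather block the bare name too, something like the looser check below could be used. This is a sketch only; blockedLoose is a hypothetical helper and its rule is an assumption, not part of the robots.txt convention.

```java
class TrailingSlashDemo {
    // Strict prefix test, as used in the thread.
    static boolean blockedStrict(String path, String rule) {
        return path.startsWith(rule);
    }

    // Hypothetical looser variant: also blocks the directory name itself,
    // e.g. "/staff" for the rule "/staff/", without also catching "/staffroom".
    static boolean blockedLoose(String path, String rule) {
        if (path.startsWith(rule)) {
            return true;
        }
        return rule.endsWith("/") && path.equals(rule.substring(0, rule.length() - 1));
    }

    public static void main(String[] args) {
        System.out.println(blockedStrict("/staff", "/staff/"));    // false
        System.out.println(blockedLoose("/staff", "/staff/"));     // true
        System.out.println(blockedLoose("/staffroom", "/staff/")); // false
    }
}
```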

 
LVL 86

Expert Comment

by:CEHJ
ID: 12029737
>>That way you don't loop through the entire list once you find it's not allowed.

btw, is that what you *want* to do, HomerrSimpson!?
 

Author Comment

by:HomerrSimpson
ID: 12029997
I think it would make sense to stop once there is a match.

I want to put the above code in a method that returns false if there is a match (so I do not parse the link) and true otherwise (so I parse it):

public static boolean checkRobot(String link)
{
  ListIterator iter = v.listIterator();

  while (allowed && iter.hasNext()) {
  String filepath = (String)iter.next();          
  allowed = (link.startsWith(filepath) == false);
   }

return allowed;
}
 
LVL 86

Expert Comment

by:CEHJ
ID: 12030017
What are you going to be passing as 'link'? Please post an example.
 

Author Comment

by:HomerrSimpson
ID: 12030069
 
LVL 86

Accepted Solution

by:
CEHJ earned 90 total points
ID: 12030106
You need something like:


public static boolean checkRobot(String link)
{
      boolean allowed = true;
      try
      {
            link = new URL(link).getFile();
      }
      catch(Exception e)
      {
            return new IllegalArgumentException("Bad link passed to checkRobot");
      }
      ListIterator iter = v.listIterator();
      while (allowed && iter.hasNext())
      {
            String www = (String)iter.next();
            try
            {
                  String filepath = new URL(www).getFile();
            }
            catch(Exception e)
            {
                  return new IllegalArgumentException("Bad link in Vector");
            }

            allowed = (link.startsWith(filepath) == false);
      }

      return allowed;
}

 
LVL 86

Expert Comment

by:CEHJ
ID: 12030139
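Sorry - typos there. Try this:

import java.net.URL;
import java.util.ListIterator;
import java.util.Vector;

      // Returns true if 'link' may be parsed, i.e. its path does not start
      // with the path of any disallowed URL held in 'v'.
      public static boolean checkRobot(Vector v, String link)
      {
            boolean allowed = true;
            try
            {
                  // Reduce the link to its path, dropping protocol and host
                  link = new URL(link).getFile();
            }
            catch(Exception e)
            {
                  throw new IllegalArgumentException("Bad link passed to checkRobot");
            }
            ListIterator iter = v.listIterator();
            // Stop as soon as one disallowed path matches
            while (allowed && iter.hasNext())
            {
                  String www = (String)iter.next();
                  String filepath = null;
                  try
                  {
                        filepath = new URL(www).getFile();
                  }
                  catch(Exception e)
                  {
                        throw new IllegalArgumentException("Bad link in Vector");
                  }

                  allowed = (link.startsWith(filepath) == false);
            }

            return allowed;
      }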
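
Since some of the links mentioned earlier in the thread are relative (for example "/important/name/address"), passing them straight to new URL(String) would fail for lack of a protocol. One possible way around that, sketched here under the assumption that the URL of the page the link was found on is known, is to resolve the link against that page first. RelativeLinkDemo and pathOf are illustrative names only.

```java
import java.net.MalformedURLException;
import java.net.URL;

class RelativeLinkDemo {
    // Resolve a link found on a page against that page's URL, then return its path.
    // Absolute links are returned unchanged by the two-argument URL constructor.
    static String pathOf(String pageUrl, String link) {
        try {
            URL base = new URL(pageUrl);
            return new URL(base, link).getFile();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad link: " + link, e);
        }
    }

    public static void main(String[] args) {
        String page = "http://www.example.com/important/";
        System.out.println(pathOf(page, "/important/name/address"));           // /important/name/address
        System.out.println(pathOf(page, "name/details"));                      // /important/name/details
        System.out.println(pathOf(page, "http://www.example.com/staff/list")); // /staff/list
    }
}
```

The resulting path can then be compared against the disallowed paths with startsWith, as in the method above.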

 
LVL 86

Expert Comment

by:CEHJ
ID: 12270536
8-)