Solved

robot text file

Posted on 2004-09-07
26
213 Views
Last Modified: 2010-03-31
I have stored some URLs in a Vector after reading the robots.txt file of a website.

If I had a URL "www.example.com/important" stored in a Vector ROBOT taken from a robots.txt file and I have a link "www.example.com/important/name/details", should I be allowed to parse this link? If not, how should I check the Vector, given that the "contains" method won't work?

0
Comment
Question by:HomerrSimpson
26 Comments
 
LVL 86

Expert Comment

by:CEHJ
ID: 12002270
>>if i had a url "www.example.com/important

The vector should contain

/important/


>>should i be allow to parse this link?

No. Any path starting with '/important/' would be disallowed.
0
 
LVL 92

Expert Comment

by:objects
ID: 12002404
> if not how should i be checking the vector as using "if contain" method wont work?

You'll need to loop thru the vector checking each entry
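
A minimal sketch of that loop, assuming the Vector holds only the path part of each Disallow entry (for example "/important/"). The class and method names are illustrative and do not come from the thread. Since contains only finds an exact element, each entry is tested as a prefix instead:

```java
import java.util.List;
import java.util.Vector;

final class PrefixCheck {
    // Returns false as soon as the link path starts with any disallowed prefix.
    static boolean isAllowed(List<String> disallowed, String linkPath) {
        for (String prefix : disallowed) {
            if (linkPath.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Vector<String> robot = new Vector<>();
        robot.add("/important/");
        String link = "/important/name/details";
        System.out.println(robot.contains(link));   // exact-element lookup
        System.out.println(isAllowed(robot, link)); // prefix comparison
    }
}
```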
0
 

Author Comment

by:HomerrSimpson
ID: 12005805
I'm confused: why wouldn't I need to store the host part in the vector as well?



0
 
LVL 86

Expert Comment

by:CEHJ
ID: 12009327
>>i m confused why wouldnt i need to store the host part as well in the vector?

Depends on how you're implementing it. The host part doesn't of course appear in the robots file. You can of course store it if you're dealing with several different hosts
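
If several hosts are involved, one way to keep the rules apart is a map from host name to that host's disallowed path prefixes. This is only a sketch under assumptions: the HostRules class, its method names, and the use of java.net.URI to split the link are illustrative choices, not something from the thread.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HostRules {
    // host -> disallowed path prefixes read from that host's robots.txt
    private final Map<String, List<String>> rules = new HashMap<>();

    void disallow(String host, String pathPrefix) {
        rules.computeIfAbsent(host, h -> new ArrayList<>()).add(pathPrefix);
    }

    // Expects an absolute link such as "http://www.example.com/important/name/".
    boolean isAllowed(String absoluteLink) {
        URI uri = URI.create(absoluteLink);
        List<String> prefixes = rules.get(uri.getHost());
        if (prefixes == null) {
            return true; // no rules recorded for this host
        }
        String path = uri.getPath();
        for (String prefix : prefixes) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```

With this layout the same path can be disallowed on one host and still allowed on another.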
0
 

Author Comment

by:HomerrSimpson
ID: 12010395
Yeah, I am dealing with several different hosts; that was why I was storing the host in the vector as well.

This leaves me back at the original question: how should I compare it with the links? I think I can loop through the vector OK, but the checking is what I am unsure of.

So if I have www.example.com/important in the vector and I have "www.example.com/important/name/" or "/important/name/address" as a link, how can I compare them?
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 12010525
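Well you must work without the hostname (note that new URL(...) needs a protocol such as http://, otherwise it throws MalformedURLException):

String filepath = new URL("http://www.example.com/important").getFile();
String link = "/important/name/address";
boolean allowed = (link.startsWith(filepath) == false);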
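
Links on a page may be absolute or relative, so it can help to reduce both forms to a path before comparing. The sketch below swaps in java.net.URI (rather than the URL class used above) because URI.create also accepts a relative reference. The LinkPath helper is a hypothetical name.

```java
import java.net.URI;

final class LinkPath {
    // Returns the path component for both absolute and relative links.
    static String of(String link) {
        return URI.create(link).getPath();
    }

    public static void main(String[] args) {
        String rule = of("http://www.example.com/important/");
        // Both links reduce to a path that can be compared against the rule.
        System.out.println(of("http://www.example.com/important/name/").startsWith(rule));
        System.out.println(of("/important/name/address").startsWith(rule));
    }
}
```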
0
 

Author Comment

by:HomerrSimpson
ID: 12011877
think i got it working do you mind double checking?

      boolean allowed=true;
      Vector v = new Vector();
      v.add("http://www.example.com/important/");
      v.add("http://www.example.com/peoplename/");
      v.add("http://www.example.com/address/");
      v.add("http://www.example.com/securitynumber/");

String link = "/important/peoplename/addrress";

if(link.startsWith("http://www"))
{
      System.out.println("ENTER");
      ListIterator iter = v.listIterator();

      while (iter.hasNext()) {
      String filepath = (String)iter.next();            
      System.out.println("link " + link + " filepath " + filepath);
                
             if(allowed = (link.startsWith(filepath) == false)){
               allowed=false;
                  }
            }

}else{
      
      ListIterator iter = v.listIterator();
      System.out.println("ENTER2");

            while (iter.hasNext()) {
                String filepath= new URL((String)iter.next()).getFile();
                System.out.println("link " + link + " filepath " + filepath);
                
                if(allowed = (link.startsWith( filepath ) == false)){
                      allowed=false;
                }
          }

}
System.out.println(allowed);
0
 
LVL 92

Expert Comment

by:objects
ID: 12012139
>            if(allowed = (link.startsWith(filepath) == false)){
>             allowed=false;
>               }

That's not quite right, and it can be simplified to:

allowed = !link.startsWith(filepath);
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 12013711
>>if(allowed = (link.startsWith( filepath ) == false)){

What you've written there is *not* two boolean tests - you've written an assignment followed by a boolean test. The correct code is what i gave earlier:


allowed = (link.startsWith(filepath) == false);

>>allowed = !link.startsWith(filepath);

is the same, but less readable, particularly as the variable name starts with 'l'
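
To see why the original condition misbehaves, the two forms can be compared side by side. This is an illustrative sketch (the class and method names are made up): with the assignment inside the if, allowed ends up false whichever way startsWith answers.

```java
final class AssignInIf {
    // The form from the question: an assignment inside the condition.
    static boolean buggy(String link, String filepath) {
        boolean allowed = true;
        if (allowed = (link.startsWith(filepath) == false)) {
            allowed = false; // runs exactly when the link did NOT match
        }
        return allowed;
    }

    // The suggested form: a plain assignment of the test result.
    static boolean fixed(String link, String filepath) {
        return link.startsWith(filepath) == false;
    }

    public static void main(String[] args) {
        System.out.println(buggy("/public/page", "/important/"));
        System.out.println(fixed("/public/page", "/important/"));
    }
}
```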


0
 
LVL 20

Expert Comment

by:Venabili
ID: 12014566
>>allowed = !link.startsWith(filepath);
>>
>>is the same, but less readable, particularly as the variable name starts with 'l'

If you ask me, this one is more readable and understandable than the longer variant... It depends on the person reading it, and the fact that one form is more readable for you does not mean it is for everyone... So do not bite please... :))

Venabili
0
 

Author Comment

by:HomerrSimpson
ID: 12016498
so i just need

allowed = (link.startsWith(filepath) == false);

in the while loop

then

 if allowed = true then parse else do not parse
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 12019100
Yep. (As long as you don't find it too unreadable ;-))
0
 
LVL 92

Expert Comment

by:objects
ID: 12022361
you can improve it using:

          while (allowed && iter.hasNext()) {
              String filepath= new URL((String)iter.next()).getFile();
              System.out.println("link " + link + " filepath " + filepath);
             
              allowed = !link.startsWith(filepath);
         }

That way you don't loop thru the entire list once u find its not allowed.
0
 

Author Comment

by:HomerrSimpson
ID: 12028325
found a problem

if a vector contains  "http://www.example.com/staff/"

and a link "http://www.example.com/staff"

allowed is always equals to true any idea on how to solve it ?

0
 
LVL 86

Expert Comment

by:CEHJ
ID: 12028774
>>allowed is always equals to true any idea on how to solve it ?

Can you post the exact code you're using?
0
 

Author Comment

by:HomerrSimpson
ID: 12029652
sure the result always returned true

      boolean allowed=true;
      Vector v = new Vector();
      
      v.add("http://www.example.com/peoplename/");
      v.add("http://www.example.com/address/");
      v.add("http://www.example.com/staff/");
      v.add("http://www.example.com/securitynumber/");


String link = "http://www.example.com/staff";

if(link.startsWith("http://www"))
{
      ListIterator iter = v.listIterator();

            while (allowed && iter.hasNext()) {
            String filepath = (String)iter.next();            
            
            allowed = (link.startsWith(filepath) == false);
         
            System.out.println("allowed " + allowed);
         }
}

      if(allowed==false){
       System.out.println("do not parse " + link);
        }            
      else{
            System.out.println("parse " + link);
      }
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 12029716
If the following is true:

>>if(link.startsWith("http://www"))

then the following:

>>allowed = (link.startsWith(filepath) == false);

(the value of allowed) must always be true ;-)
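
For the data posted above, a quick check suggests the result comes down to the trailing slash: the stored entry ends in "/staff/" while the link ends in "/staff", so the link is one character shorter than the prefix and startsWith cannot match. Whether "/staff" itself should count as disallowed is a policy decision. The matches helper below is a hypothetical way to treat it as a match by appending a slash before comparing.

```java
final class TrailingSlash {
    // Treats "/staff" and "/staff/" alike by comparing with a slash appended.
    static boolean matches(String link, String entry) {
        String normalized = link.endsWith("/") ? link : link + "/";
        return normalized.startsWith(entry);
    }

    public static void main(String[] args) {
        String entry = "http://www.example.com/staff/";
        String link = "http://www.example.com/staff";
        System.out.println(link.startsWith(entry)); // false: link is shorter than entry
        System.out.println(matches(link, entry));   // true once the slash is added
    }
}
```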

0
 
LVL 86

Expert Comment

by:CEHJ
ID: 12029737
>>That way you don't loop thru the entire list once u find its not allowed.

btw, is that what you *want* to do HomerrSimpson !?
0
 

Author Comment

by:HomerrSimpson
ID: 12029997
i think it would make sense to stop once there is a match

I want to put the above code in a method which returns false if there is a match (so I do not parse) and true otherwise (so I parse):

public static boolean checkRobot(String link)
{
  ListIterator iter = v.listIterator();

  while (allowed && iter.hasNext()) {
  String filepath = (String)iter.next();          
  allowed = (link.startsWith(filepath) == false);
   }

return allowed;
}
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 12030017
What are you going to be passing as 'link'? Please post example
0
 

Author Comment

by:HomerrSimpson
ID: 12030069
0
 
LVL 86

Accepted Solution

by:
CEHJ earned 90 total points
ID: 12030106
You need something like:


public static boolean checkRobot(String link)
{
      boolean allowed = true;
      try
      {
            link = new URL(link).getFile();
      }
      catch(Exception e)
      {
            return new IllegalArgumentException("Bad link passed to checkRobot");
      }
      ListIterator iter = v.listIterator();
      while (allowed && iter.hasNext())
      {
            String www = (String)iter.next();
            try
            {
                  String filepath = new URL(www).getFile();
            }
            catch(Exception e)
            {
                  return new IllegalArgumentException("Bad link in Vector");
            }

            allowed = (link.startsWith(filepath) == false);
      }

      return allowed;
}

0
 
LVL 86

Expert Comment

by:CEHJ
ID: 12030139
Sorry - typos there. Try this

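      // Needs: import java.net.MalformedURLException; import java.net.URL;
      //        import java.util.ListIterator; import java.util.Vector;

      // Returns true if 'link' may be parsed, false if its path starts with
      // the path of any disallowed URL held in the Vector.
      public static boolean checkRobot(Vector v, String link)
      {
            boolean allowed = true;
            try
            {
                  // Reduce the link to its path, dropping protocol and host
                  link = new URL(link).getFile();
            }
            catch(MalformedURLException e)
            {
                  throw new IllegalArgumentException("Bad link passed to checkRobot");
            }
            ListIterator iter = v.listIterator();
            // Stop looping as soon as one disallowed prefix matches
            while (allowed && iter.hasNext())
            {
                  String www = (String)iter.next();
                  String filepath = null;
                  try
                  {
                        filepath = new URL(www).getFile();
                  }
                  catch(MalformedURLException e)
                  {
                        throw new IllegalArgumentException("Bad link in Vector");
                  }

                  allowed = (link.startsWith(filepath) == false);
            }

            return allowed;
      }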

0
 
LVL 86

Expert Comment

by:CEHJ
ID: 12270536
8-)
0


Question has a verified solution.
