Solved

Defining regular expression

Posted on 2003-12-04
Last Modified: 2010-03-31

How do I define a regex that matches text starting with <a href=" but not followed by http://?

ex)
I want to detect relative links such as <a href="HW/homework.txt">, not <a href="http://....">

p.s.
The format is always going to be <a href="<hyperlink>">

Question by:dkim18
13 Comments
 
LVL 35

Expert Comment

by:girionis
ID: 9880822
 If you already have the string, simply call string.indexOf("http"). It returns -1 when "http" is not found.
0
 
LVL 35

Expert Comment

by:girionis
ID: 9880824
 BTW the absence of http does not guarantee a relative link, since the path might as well be /home/dkim18/mpla/mpla/mpla...
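
To make that distinction concrete, here is a minimal sketch that sorts an href value into one of three kinds using java.net.URI. The class name HrefKind, the method classify and the three labels are made up for illustration; only URI.create and URI.isAbsolute are real JDK calls.

```java
import java.net.URI;

class HrefKind {
    /** Hypothetical helper: labels an href as "absolute", "root-relative" or "relative". */
    static String classify(String href) {
        URI uri = URI.create(href);      // throws IllegalArgumentException on malformed input
        if (uri.isAbsolute()) {          // true when the value carries a scheme such as http:
            return "absolute";
        }
        if (href.startsWith("/")) {      // no scheme, but anchored at the server root
            return "root-relative";
        }
        return "relative";               // resolved against the directory of the current page
    }

    public static void main(String[] args) {
        System.out.println(classify("http://www.abc/hw/index.html")); // absolute
        System.out.println(classify("/home/dkim18/mpla"));            // root-relative
        System.out.println(classify("HW/homework.txt"));              // relative
    }
}
```

Checking for a scheme rather than for the literal text "http" also avoids false hits on relative paths that merely contain those letters.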
0
 
LVL 24

Expert Comment

by:sciuriware
ID: 9881036
To find a relative link in String k, try:

int foundLink;
String m = k.toUpperCase(); // Catch <a href as well ....

    if((foundLink = m.indexOf("<A HREF")) >= 0)   // Found something
    {
          if(m.indexOf("HTTP", foundLink) < 0) // Not found, that's good ...
          {
                  // ...... go on as you like it .......
          }
    }

Note: this non-regex approach doesn't allow for multiple spaces between A and H ...
;JOOP!
0
 
LVL 35

Expert Comment

by:girionis
ID: 9881078
 Better (if "s" is the string variable that holds the string)

s.toLowerCase().indexOf("http")
0
 
LVL 24

Expert Comment

by:sciuriware
ID: 9881178
Anyway dkim18, it's hard to use regex to define that you do NOT want to find something.
A more precise piece of code could be:

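// Needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
int foundLink;
String m = k.toUpperCase(); // Catch <a href as well ....

    // matches() would have to match the whole string, so search with find() instead
    Matcher tag = Pattern.compile("<A +HREF").matcher(m);
    if(tag.find() && (foundLink = tag.start()) >= 0)   // Found something and allows many spaces between A and H ...
    {
          if(m.indexOf("HTTP", foundLink) < 0) // Not found, that's good ...
          {
                  // ...... go on again as you like it .......
          }
    }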

Can you live with all above?
;JOOP!
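
For what it's worth, java.util.regex does offer negative lookahead, (?!...), which is one way to say "not followed by" inside the pattern itself. A minimal sketch, assuming the links always look like <a href="..."> with double quotes; the class RelativeLinkFinder and the method findRelativeLinks are hypothetical names, not from this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RelativeLinkFinder {
    // "<a", whitespace, href=", then a lookahead that rejects http:// without consuming anything.
    // Group 1 captures the link text up to the closing quote.
    private static final Pattern RELATIVE_HREF = Pattern.compile(
            "<a\\s+href=\"(?!http://)([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Hypothetical helper: returns every href value that does not start with http://. */
    static List<String> findRelativeLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = RELATIVE_HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"HW/homework.txt\">hw</a> "
                + "<A  HREF=\"HTTP://www.abc/\">abs</A> "
                + "<a href=\"save/save.txt\">save</a>";
        System.out.println(findRelativeLinks(html)); // [HW/homework.txt, save/save.txt]
    }
}
```

Because the lookahead has zero width, the first character of the relative path stays outside the rejected prefix and is captured along with the rest of the link.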
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 9882363
Try

String re =".+href=\"*http:.+|.+href=\"*/.+";
boolean absoluteUrl = someLink.matches(re);
0
 

Author Comment

by:dkim18
ID: 9888463
This is what I intended.
So far I have final String REPLACE_PATTERN = "<a href=\"[^(http)]"; it detects all relative links, but when the match is replaced with "newURL", the first character of the relative path after the " disappears.
ex)
if the relative link is: <a href="save/save.txt">
and the absolute link (newURL) is: <a href="http://www.abc/hw/

Result is: <a href="http://www.abc/hw/ave/save.txt">
but I want <a href="http://www.abc/hw/save/save.txt">
so that I can follow the absolute link from my local computer.

I know the problem is here:
final String REPLACE_PATTERN = "<a href=\"[^(http://)]";
Somehow, [^(http://)] makes the character 's' disappear in the example above.

Here is my code:
---------------------------
public static String patternReplaceURL(String htmlWebPage, String url, String htmlFile){
    int splitIndex = 0;
    String newURL = null;

    splitIndex = url.indexOf(htmlFile);
    newURL = url.substring(0, splitIndex);
 
  final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL ;
  final String REPLACE_PATTERN = "<a href=\"[^(http://)]";
   
  Pattern myPattern = Pattern.compile(REPLACE_PATTERN, FLAGS);
  Matcher myMatcher = myPattern.matcher(htmlWebPage);

  StringBuffer buffy = new StringBuffer();
  for(int i = 0; i < 3 ; i++){
    if (myMatcher.find()) {
      myMatcher.appendReplacement(buffy, replace_str);
    }
  }

  myMatcher.appendTail(buffy);
  System.out.println(buffy.toString());

  String newHtml=buffy.toString();
  return newHtml;
}
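
The missing 's' is consistent with how a character class works: [^(http://)] is not "anything except the text http://" but "exactly one character that is none of ( h t p : / )". That one character becomes part of the match, so the replacement overwrites it. A minimal sketch of the effect follows; the class CharClassDemo and the helper firstMatch are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {
    /** Hypothetical helper: returns the first text matched by regex in input, or null if none. */
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String charClass = "<a href=\"[^(http://)]";   // the pattern from the question
        String lookahead = "<a href=\"(?!http://)";     // zero-width alternative

        // The class consumes the 's', so a replacement swallows it.
        System.out.println(firstMatch(charClass, "<a href=\"save/save.txt\">")); // <a href="s

        // A relative link that starts with 'p', 'h' or 't' is missed entirely.
        System.out.println(firstMatch(charClass, "<a href=\"page.html\">"));     // null

        // The lookahead checks what follows without consuming it.
        System.out.println(firstMatch(lookahead, "<a href=\"save/save.txt\">")); // <a href="
    }
}
```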
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 9888720
How will your replacement rules work in the following case?

<a href="../a/b/x.html">Some link</a>

How would you put in the parent directory?
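
One possible answer to the parent-directory question is to let java.net.URI do the resolution instead of pasting strings together. A minimal sketch, assuming the page being processed lives at http://www.xyz.com/hw/index.html (an example address chosen for illustration); the class ResolveDemo and the method resolve are made-up names.

```java
import java.net.URI;

class ResolveDemo {
    /** Hypothetical helper: resolves href against the URL of the page that contains it. */
    static String resolve(String pageUrl, String href) {
        // URI.resolve applies the standard relative-reference rules, including ".." segments.
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.xyz.com/hw/index.html"; // assumed base URL
        System.out.println(resolve(page, "../a/b/x.html"));          // http://www.xyz.com/a/b/x.html
        System.out.println(resolve(page, "save/save.txt"));          // http://www.xyz.com/hw/save/save.txt
        System.out.println(resolve(page, "http://www.abc/x.html"));  // http://www.abc/x.html
    }
}
```

An href that is already absolute comes back unchanged, so the same call can be applied to every link without testing for http first.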
0
 

Author Comment

by:dkim18
ID: 9888822
How will your replacement rules work in the following case?

<a href="../a/b/x.html">Some link</a>
>>If this is not an absolute path, it will be detected and replaced with an absolute path.
>>Is that what you wanted to ask?

I just need to solve the replacement problem. When the above code replaces a relative path with an absolute path, it cuts off the first character of the relative path.

ex)
if the relative link is: <a href="save/save.txt">
and the absolute link is: <a href="http://www.xyz.com/hw/index.html
(these lines of code build the new URL path without index.html)
    int splitIndex = 0;
    String newURL = null;

    splitIndex = url.indexOf(htmlFile);
    newURL = url.substring(0, splitIndex);

Result is: <a href="http://www.abc/hw/ave/save.txt">
but I want <a href="http://www.abc/hw/save/save.txt">
so that I can follow the absolute link from my local computer.

Again, the problem is here:
final String REPLACE_PATTERN = "<a href=\"[^(http://)]";

Somehow it consumes the first character of the relative path after <a href="


I hope I made my point clear.
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 9888868
Yes, let's forget that .. business for now. It seems to me your expression is not quite right - for instance, the character class does not seem appropriate here. The following works for me:

  String s = "zzzz<a href=/c/d/file.html>A</a>zzzz<a href=\"c/d/file.html\">zzzz</a>zzzz" +
        "<a href=\"../c/d/file.html\">zzzz</a>zzzz<a href=\"./c/d/file.html\">zzzz</a>";
       
        patternReplaceURL(s, null, null);
        
        
.................
        

  public static String patternReplaceURL(String htmlWebPage, String url, String htmlFile) {
        final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL;
        final String REPLACE_PATTERN = "<a href=(\")*[/\\.]*";
        final String replace_str = "<a href=$1http://www.xxx.com/";

        Pattern myPattern = Pattern.compile(REPLACE_PATTERN, FLAGS);
        Matcher myMatcher = myPattern.matcher(htmlWebPage);
        StringBuffer buffy = new StringBuffer();
        while (myMatcher.find()) {
            myMatcher.appendReplacement(buffy, replace_str);
        }

        myMatcher.appendTail(buffy);
        System.out.println(buffy.toString());
        String newHtml = buffy.toString();
        return newHtml;
  }

0
 

Author Comment

by:dkim18
ID: 9888952
This line detected absolute links:
final String REPLACE_PATTERN = "<a href=(\")*[/\\.]*";

I am trying to detect relative links,
ex)
 <a href="save/save.txt">
and replace the prefix with something like "<a href="http://www.abc./com/hw1/"
so that I get an absolute link like "<a href="http://www.abc./com/hw1/save/save.txt">
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 9889153
Sorry - didn't know you had mixed relative/absolute in source. Shall tweak it if I have time ;-)
0
 
LVL 86

Accepted Solution

by:
CEHJ earned 300 total points
ID: 9892530
This should leave the absolute ones untouched. It works by trying to construct a URL from each link: the ones that throw an exception (due to no protocol, i.e. relative) are replaced in the handler.

 // Needs: import java.net.MalformedURLException; import java.net.URL; import java.util.regex.*;
 public static String patternReplaceURL(String htmlWebPage, String url, String htmlFile) throws Exception  {
    final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL;
    // $1 = "<a href=" plus any quotes, $3 = leading "/" and "." characters, $4 = the rest of the link
    final String FIND_PATTERN = "(<a href=\"*)(([/\\.]*)([^>\"]+))";
    final String replace_str = "$1http://www.xxx.com/$4";
    Pattern myPattern = Pattern.compile(FIND_PATTERN, FLAGS);
    Matcher myMatcher = myPattern.matcher(htmlWebPage);
    StringBuffer buffy = new StringBuffer();
    while (myMatcher.find()) {
      // DEBUG
      /*
      System.out.println("$1 = " + myMatcher.group(1));
      System.out.println("$2 = " + myMatcher.group(2));
      System.out.println("$3 = " + myMatcher.group(3));
      System.out.println("$4 = " + myMatcher.group(4));
      */
      try {
        // Succeeds only when the link carries a known protocol, i.e. it is absolute: leave it alone
        new URL(myMatcher.group(4));
      }
      catch(MalformedURLException e) {
        // No protocol, so the link is relative: prefix it with the site root
        myMatcher.appendReplacement(buffy, replace_str);
      }
    }

    myMatcher.appendTail(buffy);
    ///System.out.println(buffy.toString());
    return buffy.toString();
  }
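
As an alternative to driving the replacement through exceptions, the same loop can resolve each href against the page URL. This is only a sketch under assumptions: hrefs are double-quoted, contain no characters that are illegal in a URI (URI.create would throw IllegalArgumentException otherwise), and the class LinkRewriter and method absolutize are hypothetical names rather than part of the accepted answer.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkRewriter {
    // Group 1: opening tag up to the quote, group 2: the link, group 3: the closing quote.
    private static final Pattern HREF = Pattern.compile(
            "(<a\\s+href=\")([^\"]*)(\")", Pattern.CASE_INSENSITIVE);

    /** Hypothetical helper: rewrites every href in html as an absolute URL based on pageUrl. */
    static String absolutize(String html, String pageUrl) {
        URI base = URI.create(pageUrl);
        Matcher m = HREF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Absolute links come back unchanged; relative ones are resolved against the page.
            String resolved = base.resolve(m.group(2)).toString();
            // quoteReplacement keeps any '$' or '\' in the link from being read as a group reference.
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + resolved + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<a href=\"save/save.txt\">a</a> <a href=\"http://www.abc/x.html\">b</a>";
        System.out.println(absolutize(html, "http://www.xyz.com/hw/index.html"));
    }
}
```

Since nothing of the link is matched and thrown away before it is captured, the first character of a relative path survives the rewrite.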
0


Question has a verified solution.
