Solved

Defining regular expression

Posted on 2003-12-04
Last Modified: 2010-03-31

How do I define a regex that matches text starting with <a href=" but not followed by http://?

Example:
I want to detect relative links such as <a href="HW/homework.txt">, but not <a href="http://....">

P.S.
The format is always going to be <a href="<hyperlink>">

Question by:dkim18
13 Comments
 
LVL 35

Expert Comment

by:girionis
ID: 9880822
 If you already have the string then simply do a string.indexOf("http"). If it is not found it will return a -1.
 
LVL 35

Expert Comment

by:girionis
ID: 9880824
 BTW the absence of http does not guarantee a relative link, since the path might just as well be /home/dkim18/mpla/mpla/mpla...
 
LVL 24

Expert Comment

by:sciuriware
ID: 9881036
To find a relative link in String k, try:

int foundLink;
String m = k.toUpperCase(); // Catch <a href as well ....

    if((foundLink = m.indexOf("<A HREF")) >= 0)   // Found something
    {
          if(m.indexOf("HTTP", foundLink) < 0) // Not found, that's good ...
          {
                  // ...... go on as you like it .......
          }
    }

Note: this non-regex approach doesn't allow for multiple spaces between A and H ...
;JOOP!
 
LVL 35

Expert Comment

by:girionis
ID: 9881078
 Better (if "s" is the string variable that holds the string)

s.toLowerCase().indexOf("http")
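
The check above can be sketched as a small helper. This is only an illustration: the class and method names (HrefCheck, lacksHttp) are made up for this example, and, as noted earlier in the thread, a missing "http" does not by itself prove the link is relative.

```java
public class HrefCheck {
    // Hypothetical helper: true when the text contains no "http" in any letter case.
    static boolean lacksHttp(String s) {
        return s.toLowerCase().indexOf("http") == -1;
    }

    public static void main(String[] args) {
        System.out.println(lacksHttp("<a href=\"HW/homework.txt\">"));     // true
        System.out.println(lacksHttp("<a href=\"HTTP://example.com/\">")); // false
        // A rooted path also lacks "http", so it is reported the same way as a relative link.
        System.out.println(lacksHttp("<a href=\"/home/dkim18/x.txt\">"));  // true
    }
}
```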
 
LVL 24

Expert Comment

by:sciuriware
ID: 9881178
Anyway dkim18, it's hard to use a regex to say that you do NOT want to find something.
A more precise piece of code could be:

int foundLink;
String m = k.toUpperCase(); // Catch <a href as well ....

    // matches() must match the whole string, hence the surrounding .* (with (?s) so that . also matches newlines)
    if(m.matches("(?s).*<A +HREF.*") && (foundLink = m.indexOf(" HREF")) >= 0)   // Found something and allows many spaces between A and H ...
    {
          if(m.indexOf("HTTP", foundLink) < 0) // Not found, that's good ...
          {
                  // ...... go on again as you like it .......
          }
    }

Can you live with all of the above?
;JOOP!
 
LVL 86

Expert Comment

by:CEHJ
ID: 9882363
Try

String re =".+href=\"*http:.+|.+href=\"*/.+";
boolean absoluteUrl = someLink.matches(re);
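
A quick way to see what this expression accepts is to run it against a few single-line tags. The sketch below is illustrative only; the class name and sample tags are assumptions. Two things worth noticing: String.matches() has to match the whole input, and the expression is case-sensitive and looks for the literal "http:", so an "https:" link would not be reported as absolute.

```java
public class AbsoluteLinkDemo {
    // The expression from the comment above, unchanged.
    static final String RE = ".+href=\"*http:.+|.+href=\"*/.+";

    static boolean isAbsolute(String tag) {
        return tag.matches(RE);
    }

    public static void main(String[] args) {
        System.out.println(isAbsolute("<a href=\"http://www.abc.com/hw/\">")); // true
        System.out.println(isAbsolute("<a href=\"/home/dkim18/a.txt\">"));     // true (rooted path)
        System.out.println(isAbsolute("<a href=\"HW/homework.txt\">"));        // false
    }
}
```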
 

Author Comment

by:dkim18
ID: 9888463
This is what I intended.
So far, final String REPLACE_PATTERN = "<a href=\"[^(http)]"; detects all relative links, but when the match is replaced with "newURL", the first character of the relative path after the " disappears.
Example:
if the relative link is: <a href="save/save.txt">
and the absolute link (newURL) is: <a href="http://www.abc/hw/

Result is: <a href="http://www.abc/hw/ave/save.txt">
but I want <a href="http://www.abc/hw/save/save.txt">
so that I can follow this absolute link from my local computer.

I know the problem is here:
final String REPLACE_PATTERN = "<a href=\"[^(http://)]";
Somehow, [^(http://)] makes the character 's' from the example above disappear.

Here is my code:
---------------------------
public static String patternReplaceURL(String htmlWebPage, String url, String htmlFile){
    int splitIndex = 0;
    String newURL = null;

    splitIndex = url.indexOf(htmlFile);
    newURL = url.substring(0, splitIndex);
 
  final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL ;
  final String REPLACE_PATTERN = "<a href=\"[^(http://)]";
   
  Pattern myPattern = Pattern.compile(REPLACE_PATTERN, FLAGS);
  Matcher myMatcher = myPattern.matcher(htmlWebPage);

  StringBuffer buffy = new StringBuffer();
  for(int i = 0; i < 3 ; i++){
    if (myMatcher.find()) {
      myMatcher.appendReplacement(buffy, replace_str);
    }
  }

  myMatcher.appendTail(buffy);
  System.out.println(buffy.toString());

  String newHtml=buffy.toString();
  return newHtml;
}
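
A note on why the 's' goes missing: [^(http://)] is a character class, so it matches exactly one character that is none of ( h t p : / ). That character becomes part of the match and is overwritten by the replacement. It also means a relative link starting with h, t or p would not be matched at all. One possible alternative, sketched below under assumptions, is a negative lookahead, which checks what follows without consuming it. The class name, method name and the newURL value are hypothetical; links starting with / or ../ and https:// links are not handled here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RelativeLinkRewriter {
    // (?!http://) is zero-width: it only asserts that http:// does not follow the quote.
    private static final Pattern RELATIVE_HREF =
            Pattern.compile("<a href=\"(?!http://)", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: inserts newURL right after <a href=" for non-http links.
    static String absolutize(String html, String newURL) {
        Matcher m = RELATIVE_HREF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps any $ or \ in the URL literal.
            m.appendReplacement(out, Matcher.quoteReplacement(m.group() + newURL));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<a href=\"save/save.txt\"> <a href=\"http://www.abc/other.txt\">";
        // prints <a href="http://www.abc/hw/save/save.txt"> <a href="http://www.abc/other.txt">
        System.out.println(absolutize(html, "http://www.abc/hw/"));
    }
}
```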
 
LVL 86

Expert Comment

by:CEHJ
ID: 9888720
How will your replacement rules work in the following case?

<a href="../a/b/x.html">Some link</a>

How would you put in the parent directory?
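
For reference, one way the ../ case could be handled without string prefixing is java.net.URI.resolve, which applies the standard relative-reference rules against the page's own URL. This is a sketch, not something proposed in the thread; the class name is made up and the page URL is just the sample one used in this question.

```java
import java.net.URI;

public class ResolveDemo {
    // Resolves an href value against the URL of the page it appears on.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.xyz.com/hw/index.html";
        System.out.println(resolve(page, "save/save.txt"));  // http://www.xyz.com/hw/save/save.txt
        System.out.println(resolve(page, "../a/b/x.html"));  // http://www.xyz.com/a/b/x.html
        System.out.println(resolve(page, "http://other.example/p.html")); // unchanged
    }
}
```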
 

Author Comment

by:dkim18
ID: 9888822
How will your replacement rules work in the following case?

<a href="../a/b/x.html">Some link</a>
>>If this is not an absolute path, then it will be detected and replaced with an absolute path.
>>Is that what you wanted to ask?

I just need to solve the replacement problem. When the above code replaces a relative path with an absolute path, it cuts off the first character of the relative path.

Example:
if the relative link is: <a href="save/save.txt">
and the absolute link is: <a href="http://www.xyz.com/hw/index.html
(these lines of code build the new URL path without index.html)
    int splitIndex = 0;
    String newURL = null;

    splitIndex = url.indexOf(htmlFile);
    newURL = url.substring(0, splitIndex);

Result is: <a href="http://www.abc/hw/ave/save.txt">
but I want <a href="http://www.abc/hw/save/save.txt">
so that I can follow this absolute link from my local computer.

Again, the problem is here:
final String REPLACE_PATTERN = "<a href=\"[^(http://)]";

Somehow it consumes the first character of the relative path after <a href="


I hope I made my point clear.
 
LVL 86

Expert Comment

by:CEHJ
ID: 9888868
Yes, let's forget that .. business for now. It seems to me your expression is not quite right - for instance, the character class does not seem appropriate here. The following works for me:

  String s = "zzzz<a href=/c/d/file.html>A</a>zzzz<a href=\"c/d/file.html\">zzzz</a>zzzz" +
        "<a href=\"../c/d/file.html\">zzzz</a>zzzz<a href=\"./c/d/file.html\">zzzz</a>";
       
        patternReplaceURL(s, null, null);
        
        
.................
        

  public static String patternReplaceURL(String htmlWebPage, String url, String htmlFile) {
        final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL;
        final String REPLACE_PATTERN = "<a href=(\")*[/\\.]*";
        final String replace_str = "<a href=$1http://www.xxx.com/";

        Pattern myPattern = Pattern.compile(REPLACE_PATTERN, FLAGS);
        Matcher myMatcher = myPattern.matcher(htmlWebPage);
        StringBuffer buffy = new StringBuffer();
        while (myMatcher.find()) {
            myMatcher.appendReplacement(buffy, replace_str);
        }

        myMatcher.appendTail(buffy);
        System.out.println(buffy.toString());
        String newHtml = buffy.toString();
        return newHtml;
  }

 

Author Comment

by:dkim18
ID: 9888952
This line detected absolute links:
final String REPLACE_PATTERN = "<a href=(\")*[/\\.]*";

I am trying to detect relative links,
e.g.
 <a href="save/save.txt">
and to replace the start with something like "<a href="http://www.abc./com/hw1/"
so that I get an absolute link like "<a href="http://www.abc./com/hw1/save/save.txt">
 
LVL 86

Expert Comment

by:CEHJ
ID: 9889153
Sorry - didn't know you had mixed relative/absolute in source. Shall tweak it if I have time ;-)
 
LVL 86

Accepted Solution

by:
CEHJ earned 300 total points
ID: 9892530
This should leave the absolute ones untouched. It tries to construct a URL from each link: the relative ones throw an exception (no protocol), and the replacement is done in the handler.

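 // Needs: import java.net.MalformedURLException; import java.net.URL;
 //        import java.util.regex.Matcher; import java.util.regex.Pattern;
 public static String patternReplaceURL(String htmlWebPage, String url, String htmlFile) throws Exception  {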
    final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL;
    final String FIND_PATTERN = "(<a href=\"*)(([/\\.]*)([^>\"]+))";
    final String replace_str = "$1http://www.xxx.com/$4";
    Pattern myPattern = Pattern.compile(FIND_PATTERN, FLAGS);
    Matcher myMatcher = myPattern.matcher(htmlWebPage);
    StringBuffer buffy = new StringBuffer();
    while (myMatcher.find()) {
      // DEBUG
      /*
      System.out.println("$1 = " + myMatcher.group(1));
      System.out.println("$2 = " + myMatcher.group(2));
      System.out.println("$3 = " + myMatcher.group(3));
      System.out.println("$4 = " + myMatcher.group(4));
      */
      try {
        URL uri = new URL(myMatcher.group(4));
      }
      catch(MalformedURLException e) {
        myMatcher.appendReplacement(buffy, replace_str);
      }
    }

    myMatcher.appendTail(buffy);
    ///System.out.println(buffy.toString());
    return buffy.toString();
  }
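
A possible variant of the same idea, sketched here under assumptions rather than taken from the thread: capture each href value, resolve it against the page URL with java.net.URI, and write the result back. Absolute links resolve to themselves, so they come out unchanged, and ../ segments are normalized instead of being stripped. The class name and the pageUrl parameter are hypothetical, and only double-quoted href values directly after "<a " are handled.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkAbsolutizer {
    // Group 1: the tag text up to and including the opening quote. Group 2: the href value.
    private static final Pattern HREF =
            Pattern.compile("(<a href=\")([^\">]+)", Pattern.CASE_INSENSITIVE);

    static String absolutize(String html, String pageUrl) {
        URI base = URI.create(pageUrl);
        Matcher m = HREF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String href = m.group(2);
            String resolved;
            try {
                resolved = base.resolve(href).toString();
            } catch (IllegalArgumentException e) {
                resolved = href; // not parseable as a URI (e.g. contains spaces): keep as-is
            }
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + resolved));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<a href=\"save/save.txt\">A</a> <a href=\"http://other.example/x.html\">B</a>";
        System.out.println(absolutize(html, "http://www.xyz.com/hw/index.html"));
    }
}
```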