dlcnet asked:

Regular Expression


Hi Experts!

I need a regex that will filter out the title of the image.

Input: <img src="x.jpg" alt="image" title="mysooper dooper image" />

Output: <img src="x.jpg" alt="image" title=" " />
SOLUTION
CEHJ
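import java.util.regex.Matcher;
import java.util.regex.Pattern;

String ss = "<img src=\"x.jpg\" alt=\"image\" title=\"mysooper dooper image\" /> ";

// Capture the title value lazily, up to the next double quote
Pattern p = Pattern.compile("title=\"(.*?)\"");
Matcher m = p.matcher(ss);
while (m.find()) {
    // group(1) holds the text between the quotes
    System.out.println(m.group(1));
}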



Output:
mysooper dooper image


You really need the following to tolerate whitespace around the equals sign, though:
s = s.replaceAll("title\\s*=\\s*\".*?\"", "title=\"\"");
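To illustrate what the extra \\s* buys, here is a minimal sketch. The class name TitleWhitespaceDemo and the helper blankTitle are made up for this example and are not from the thread; the pattern itself is the one posted above.

```java
public class TitleWhitespaceDemo {
    // Hypothetical helper: blanks the value of every double-quoted title attribute.
    // \\s* on either side of '=' accepts both title="..." and title = "...".
    static String blankTitle(String html) {
        return html.replaceAll("title\\s*=\\s*\".*?\"", "title=\"\"");
    }

    public static void main(String[] args) {
        String tight = "<img src=\"x.jpg\" title=\"mysooper dooper image\" />";
        String spaced = "<img src=\"x.jpg\" title = \"mysooper dooper image\" />";
        // Both calls print: <img src="x.jpg" title="" />
        System.out.println(blankTitle(tight));
        System.out.println(blankTitle(spaced));
    }
}
```

Note that the replacement normalizes the attribute to title="" regardless of the original spacing.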



ASKER

@CEHJ

Hi! I tried both of them and the title of the image is still there :(
Please show the actual input where it failed.
The following is the output from the code below:

<img src="x.jpg" alt="image" title="" />
String s =   "<img src=\"x.jpg\" alt=\"image\" title=\"mysooper dooper image\" /> ";
s = s.replaceAll("title\\s*=\\s*\".*?\"", "title=\"\"");
System.out.println(s);


How about:
String source = "<img src=\"x.jpg\" alt=\"image\" title=\"mysooper dooper image\" />";

String result = source.replaceAll("<img [^>]*)title=\"[^\"]*\"([^>]*)", "$1$2");


Hmmm...  I misread the question  : (

Correction:
String source = "<img src=\"x.jpg\" alt=\"image\" title=\"mysooper dooper image\" />";

String result = source.replaceAll("<img [^>]*title=\")[^\"]*(\"[^>]*)", "$1$2");



ASKER

@CEHJ

My bad :) It works ... however, if I have something like this it crashes:
title="blablal&&
bla
bla
bla"

The title spans multiple lines. I believe there is a CR after each line.
OK. Try:
s = s.replaceAll("(?s)title\\s*=\\s*\".*?\"", "title=\"\"");
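The difference the (?s) flag makes can be sketched as follows. By default, . in java.util.regex does not match line terminators; (?s) turns on DOTALL mode so that it does. The class and method names here are hypothetical, chosen only for this example.

```java
public class DotallDemo {
    // Without (?s): '.' stops at line terminators, so a multi-line title is not matched.
    static String blankTitleDefault(String html) {
        return html.replaceAll("title\\s*=\\s*\".*?\"", "title=\"\"");
    }

    // With (?s): DOTALL mode lets '.' match line terminators as well.
    static String blankTitleDotall(String html) {
        return html.replaceAll("(?s)title\\s*=\\s*\".*?\"", "title=\"\"");
    }

    public static void main(String[] args) {
        String html = "<img src=\"x.jpg\" title=\"bla\nbla\nbla\" />";
        // Input comes back unchanged, still spanning three lines
        System.out.println(blankTitleDefault(html));
        // Prints: <img src="x.jpg" title="" />
        System.out.println(blankTitleDotall(html));
    }
}
```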


Although I did forget an opening parenthesis, the pattern I posted should account for multiple lines. Corrected paren below:
String result = source.replaceAll("(<img [^>]*title=\")[^\"]*(\"[^>]*)", "$1$2");
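A short sketch of why this pattern copes with line breaks without any flag: a negated character class such as [^\"]* matches every character except the ones listed, and that includes \r and \n. The class name NegatedClassDemo and the method emptyTitle are invented for this example.

```java
public class NegatedClassDemo {
    // Group 1 keeps everything up to and including title=", group 2 keeps the
    // closing quote and the rest of the tag; the value in between is dropped.
    static String emptyTitle(String source) {
        return source.replaceAll("(<img [^>]*title=\")[^\"]*(\"[^>]*)", "$1$2");
    }

    public static void main(String[] args) {
        String source = "<img src=\"x.jpg\" alt=\"image\" title=\"bla\r\nbla\r\nbla\" />";
        // Prints: <img src="x.jpg" alt="image" title="" />
        System.out.println(emptyTitle(source));
    }
}
```

One limitation of this sketch: because [^>]* stops at the first >, a title value that itself contains > would not be handled.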


@CEHJ

That won't work either unless you turn on single-line mode  : )
Never mind. I missed it  : (
This works for me; I just tested:

String ss = "<img src=\"x.jpg\" alt=\"image\" title=\"mysooper dooper image\" /> ";
ss = ss.replaceAll("title=\"(.*?)\"", "");
System.out.println(ss);



Output:
<img src="x.jpg" alt="image"  /> 
 


Or this way, if you want to leave title= in place:

String ss = "<img src=\"x.jpg\" alt=\"image\" title=\"mysooper dooper image\" /> ";
ss = ss.replaceAll("title=\"(.*?)\"", "title=\"\"");
System.out.println(ss);


Output:
<img src="x.jpg" alt="image" title="" /> 


>>ss= ss.replaceAll("title=\"(.*?)\"","title=\"\"");

The group is redundant and simply creates overhead. The pattern will also fail for multi-line titles, because . does not match line terminators by default.
String regexString = "title=\"(.*)\"";
Pattern p = Pattern.compile(regexString);
String one = "<img src=\"x.jpg\" alt=\"image\" title=\"mysooper dooper image\" />";
String two = "<img src=\"x.jpg\" alt=\"image\" title=\" \" />";

Matcher matcher = p.matcher(one);
if (matcher.find()) {
    System.out.println(matcher.group(1));
}
Sorry, I believe I am repeating the answer.
This works with multiline title:

String ss = "<img src=\"x.jpg\" alt=\"image\" title=\"mysooper " + System.getProperty("line.separator") + "dooper image\" /> ";
// [^\"]* matches any character except a double quote, including line breaks
ss = ss.replaceAll("title=\"[^\"]*\"", "title=\"\"");
System.out.println("result: " + ss);


Output:

result: <img src="x.jpg" alt="image" title="" /> 


Yes, true, the group is not necessary. I first thought that "filter out" meant the opposite, i.e. to extract;
the group is left over from then.
ASKER CERTIFIED SOLUTION
The group is necessary if you would like to get the title text.
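As a rough illustration of that distinction, the sketch below uses one precompiled pattern for both jobs: the capture group is read when extracting, and simply ignored when replacing. The class name TitleExtractDemo and its two methods are hypothetical, not taken from any answer in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleExtractDemo {
    // [^\"]* also covers titles that span several lines.
    private static final Pattern TITLE = Pattern.compile("title\\s*=\\s*\"([^\"]*)\"");

    // Extraction needs the group: group(1) is the text between the quotes.
    static List<String> extractTitles(String html) {
        List<String> titles = new ArrayList<>();
        Matcher m = TITLE.matcher(html);
        while (m.find()) {
            titles.add(m.group(1));
        }
        return titles;
    }

    // Replacement does not use the group; the whole match is overwritten.
    static String blankTitles(String html) {
        return TITLE.matcher(html).replaceAll("title=\"\"");
    }

    public static void main(String[] args) {
        String html = "<img src=\"x.jpg\" alt=\"image\" title=\"mysooper dooper image\" />";
        System.out.println(extractTitles(html)); // [mysooper dooper image]
        System.out.println(blankTitles(html));   // <img src="x.jpg" alt="image" title="" />
    }
}
```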