Go Premium for a chance to win a PS4. Enter to Win

x
?
Solved

Regular expression

Posted on 2002-06-05
10
Medium Priority
?
148 Views
Last Modified: 2013-12-25
Hi friends

I want to extract the name of background image from a body tag. A typical tag may contain attributes like text, vlink, bgcolor, background,marginheight,leftmargin etc. or some of them or none of them with varying number of spaces and quotes . I want to extract the name of bgimage from this tag. For example I want to extract "tree.gif" from the below tag  <BODY marginwidth="0" marginheight="0"  background= "tree.gif" leftmargin=0 topmargin="0" link="#330099" vlink="#330066" alink="#330066" >

Please advice...
Thanks
0
Comment
Question by:boolee
10 Comments
 
LVL 51

Expert Comment

by:ahoffmann
ID: 7055951
s/.*background="?([^" ]*)"?.*/$1/
0
 
LVL 15

Expert Comment

by:samri
ID: 7056244
ahoffman,

and that is in Perl right?
0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 7056347
right
(or is somebody doing real/advanced regular expressions in other language *and* use it for CGI ;-)
0
Free Tool: IP Lookup

Get more info about an IP address or domain name, such as organization, abuse contacts and geolocation.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

 
LVL 10

Expert Comment

by:rj2
ID: 7057283
bolee,
Html might contain (almost) arbitrary whitespace.
Code below should work also if there is linebreak before background filename (as it is in the sample you posted)

#!/usr/bin/perl
use strict;

$_=<<ENDHTML;
<BODY marginwidth="0" marginheight="0"  background=
"tree.gif" leftmargin=0 topmargin="0" link="#330099" vlink="#330066" alink="#330066" >
ENDHTML


/(.|\s)*background\s*=\s*"([^"]*)"/;
print "$2\n";
0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 7057646
\n and \r problem fixed too, but how about missing ", or pathnames containing whitespaces (which is not allowed according RFC, but M$ makes it work):

Simply forgot my disclaimer in my very first comment:
   to be improved in many ways
0
 
LVL 1

Author Comment

by:boolee
ID: 7058508
HI all

I shall tell u the real problem . I am doing a mail handling program and I want to display the embedded image or background image from the incoming mail in a table.  I have an HTML template in the form

<HTML>
    <BODY>
    <TABLE>
    <TR> <TD BACKGROUND="MY FILENAME"><TD></TR>
    .............................
    .............................

The problem is that when I receive a mail from outlook/eudora/netscape messenger with a stationary background image, the mail text itself is an HTML file with tag <BODY BACKGROUND="an image"....> and this tag overrides my previous declaration. So the image comes as a page background. I am not sure about the attributes of body tags from different mail softwares. So I want to extract the filename and replace it in my table. There may be different number of body tags depending on the number of times of reply/forward. Please tell me a solution.

ahoffmann's answer worked for most of my attribute combinations except a few ... but since different mailing softwares sent the body tag in different ways I am looking for something robust...

Thanks
0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 7058694
if you have a mail with sevaral replies and forwards embeded, you probably also have more than one <BODY> and/or <HTML> tag.
I.g. we can say that this all together is no longer a valid HTML syntax tree, 'cause you never can restrict someone to write her/his comments anywhere inbetween the previous text.

You need to identify the <BODY> you're interested in, or you need to change them all.
I suggest to change them all. But keep in mind that someone sends mail which contains the string literal
   <body background=this-all-together-is-a-literal-string>
which must not be changed at all.
Means that you cannot get a 100% solution.

s/.*<body\s+.*background="?([^" ]*)"?.*/$1/ig
performed on all your lines should give you most of the required images (keep previous comments about \n \r and blanks in mind)
0
 
LVL 1

Author Comment

by:boolee
ID: 7061125
So u mean I have to replace all the inner BODY and HTML tags? Can I just remove the background property and preserve the rest of the tag? Do u think it is useful?
0
 
LVL 51

Accepted Solution

by:
ahoffmann earned 150 total points
ID: 7061847
> Can I just remove the background property and preserve the rest of the tag?
You can, it's up to you.

> Do u think it is useful?
If you never expect mails which contain the literal string background=some.gif (like this text here), then it might be usefull for you in most cases.

How about replacing all occourences of background=whatever
with for example:
    "back--ground = whatever"
so it will not be a valid tag attribute anymore (hopefully in future too). And you still can read the text, even if it was not meant as tag attribute but literal string.
0
 
LVL 1

Author Comment

by:boolee
ID: 7077229
Thanks ahoffmann
0

Featured Post

Free Tool: Port Scanner

Check which ports are open to the outside world. Helps make sure that your firewall rules are working as intended.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Recently I have been answering a lot of questions like this in IT forums that I frequent. The question posed is usually something along the lines of "We have software X installed and need to uninstall it for reason Y" or some other variant of the sa…
In this tutorial I will show you how to make a simple HTML bar chart with the usage of WhizBase, If you want more information about WhizBase please read my previous articles at http://www.experts-exchange.com/ARTH_5123186.html (http://www.experts-ex…
Learn the basics of if, else, and elif statements in Python 2.7. Use "if" statements to test a specified condition.: The structure of an if statement is as follows: (CODE) Use "else" statements to allow the execution of an alternative, if the …
The viewer will learn how to dynamically set the form action using jQuery.
Suggested Courses

876 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question