Solved

Regular expressions to replace via *-type wild card

Posted on 2014-01-10
38
232 Views
Last Modified: 2014-08-03
Hello

I have a database utility that will let me search for strings.  It says it can use "Regular Expressions" -- something that I am not really familiar with.

I have a website with many webpages and images.  I am using this utility (it's NOT phpmyadmin)  to search the database and remove all html references to the images height, width and border size.

So I have tons of image sizes with various widths and heights and borders with different pixels sizes in the html. Example:

<img height="251" width="783" border="5"

that I need to change to

<img height="" width="" border=""

Since the site has images with various heights and widths and various border sizes, I am looking for a "regular expression" wild card that I can use in my search that will let me change everything no matter what the height, width and border sizes.

Not knowing how regular expressions wild cards work, the below is an example where the * needs to be replaced by a wild card regular expression (again I am probably not using correct syntax with regular expressions, but I assume you get the idea of what I mean:

I need to search for
<img height="*" width="*" border="*"

Once I get the above correct, my utility will take care of the replacement part.

BTW I also see this in the utility's documentation

When the Regular Expressions option is checked, you will also be able to search in UTF-8 mode. This can help you when using special characters or certain character sets.

Thanks!

Rowby
0
Comment
Question by:Rowby Goren
  • 21
  • 13
  • 2
  • +1
38 Comments
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 39771212
please specify which database you refer to ...

in normal regular expression it would be:

<img height="\d+" width="\d+" border="\d+"

however, this will match those where indeed all 3 items have been specified ..
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39771228
Rowby, have a look at this cartoon. http://xkcd.com/1171/

You might find it is easier to handle the attribute removal one at a time, using separate search arguments.  The image tag attributes can be in any order, and regular expressions are order-sensitive.

Also you may find that the regular expression engines are a little different from one implementation to the next.  And of course, HTML is very lax about attributes.  Quotes are not always used, single or double quotes can be intermixed, etc.

I can try to show you some PHP examples of regular expressions that do this, but I don't think it will be fruitful to do it all in one pass.

And before anything else, back up your data model!
0
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 39771248
As Ray mentions, different utilities offer different regex implementations. It would be best if you named the utility you are using so we can provide you with the appropriate syntax.
0
 
LVL 9

Author Comment

by:Rowby Goren
ID: 39771253
Hi  Guy Hengel / Ray,  I see what you mean...  I will have to do each search separately -- no problem with that.
When the Regular Expressions option is checked, you will also be able to search in UTF-8 mode. This can help you when using special characters or certain character sets.

System info:         PHP 5.3.3   Mysql         5.3.3

And Ray great cartoon. And yes I have a backup of the database and will do another one before I do the replacement.

So yes, Ray, three passes. This website is pretty old and while it all uses Joomla there could be variations of the " and ' , etc.

The utility lets me preview the results before I commit to them, so that will help.

And if it doesn't work I will keep on backing up the database between each attempt.

FYI what I am doing is removing the image size info because the site is now a twitter-bootstrap responsive site and I need to remove that to let the images resize based on resolution etc.

Thanks!

Rowby
0
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 39771301
so, the first step would be to find all those items...

I would limit the mysql query to find all "<img * tags, to start with, and to the rest in php.

for each attribute, you then check the value:
{
 if you inf <img * heigth=*  * />  [   with the / being optional in "simple" html...   ]
 you then want to replace/remove the heigth=* or only the * (value) ?
}

save back the item to the db if you actually changed the value...

please clarify which item/part you actually need help with
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39771401
This seems to work for the test cases I've got here.
http://www.laprbass.com/RAY_temp_rowby.php

<?php // RAY_temp_rowby.php
error_reporting(E_ALL);
echo '<pre>';


// SEE http://www.experts-exchange.com/Programming/Languages/Regular_Expressions/Q_28335348.html
// REF http://php.net/manual/en/pcre.pattern.php


// TEST DATA COLLECTION
$data = array
( '<img height="251" width="783" border="5" />'
, '<img height=251  />'
, '<img height = 251  />'
, "<img height='251' width='783' />"
)
;


$dlm = <<<'EOD'
("|'| ?)
EOD;


// A REGULAR EXPRESSION TO HANDLE HEIGHT
$rgx
= '/'                          // START OF REGEX
. 'height'                     // STRING LITERAL
. '.*?'                        // ANYTHING OR NOTHING
. '='                          // STRING LITERAL
. '('                          // GROUP START
. $dlm                         // MAYBE QUOTE OR BLANK
. '\d+'                        // NUMBER(S)
. $dlm                         // MAYBE QUOTE OR BLANK
. ')'                          // GROUP END
. '/i'                         // END OF REGEX WITH CASE-INSENSITIVE
;
echo PHP_EOL . 'REGEX: ' . htmlentities($rgx) . PHP_EOL;


// TEST AND SHOW THE WORK PRODUCT
foreach ($data as $img)
{
    $new = preg_replace($rgx, 'height=""', $img);
    echo PHP_EOL . htmlentities($img);
    echo PHP_EOL . htmlentities($new);
    echo PHP_EOL;
}

Open in new window

HTH, ~Ray
0
 
LVL 9

Author Comment

by:Rowby Goren
ID: 39771407
Hi

I need only to replace the values within the quotes.
0
 
LVL 9

Author Comment

by:Rowby Goren
ID: 39771465
HI I'm away from my workstation but here is the utility panel I will be working in, if that helps.

Be back within the hour to test.....
Database replacer admin panel
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39771508
only to replace the values within the quotes
Risk with that strategy might be that there are other constructs using quotes.  <img alt="23" ... etc.
0
 
LVL 9

Author Comment

by:Rowby Goren
ID: 39771513
Example of utility replacement dialogHi Ray

Can your php "expressions" work within my joomla database utility -- see screen capture in my earlier comment?

Also the original html for images will have varying height and width and border values.

Some possible examples:

<img height="251" width="783" border="5"

<img height="100" width="217" border=""

<img height="112px" width="321" border="0"


and so on.

My utility can let me preview, any amount of data, for example just 25 "finds" or "100" finds, so I can preview the results before I save and commit to the changes.   The utility makes me approve "twice" before it will save the changes.

Rowby
0
 
LVL 9

Author Comment

by:Rowby Goren
ID: 39771550
Hello

I know if I was doing this in microsoft word or similar I could just do

Search for <img height="*" width="*" border="*"

Replace with <img height="" width="" border=""

There is nothing as simple as that with regular expressions?

I suppose I could load the sql database in notepad plus and try that search and replace  (of course keeping a backup....

What do you think?  

Should I try it in notepadplus?????

To be extra cautious I could do it in a clone of the site that I have.....  hmmmmm

Rowby
0
 
LVL 9

Author Comment

by:Rowby Goren
ID: 39771573
Notepad plusAs it turns out Notepad plus apparently won't work like word and let me use * wild card.

It likes to see regular expressions too?
0
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 39772033
Within Notepad++, try the following Find:

<img height="[^"]+" width="[^"]+" border="[^"]+" 

Open in new window

0
 
LVL 9

Author Comment

by:Rowby Goren
ID: 39772161
Hi Kaufmed

I tried it but it didn't work -- but only, I think because the database is showing the formatting as follows:  (Guess that's how mysql handles quotes.....)


<img height=\"258\" width=\"200\"  border=\"0\"

So can you incorporate the  backslashes into your suggestion
0
 
LVL 9

Author Comment

by:Rowby Goren
ID: 39772540
BTW I can see where this can be dangerous :)

For example I am also seeing image things like this in the HTML

<img style="margin-right: 35px; float: left;" alt="bw-mitchell" src="images/stories/music_misc/bw-mitchell.jpg" height="187" width="145"


And all I want to touch is height="187" width="145"  in the above example....

So I guess I need to be VERY PRECISE, if that's possible.

The nice thing about the utility is actually previews the HTML changes before I commit.  So I am thinking of NOT using notepad plus which would do a global replacement of everything.

At least with the browser based joomla utility I can look at 50 changes at a time and make sure there are no surprises.

HHHHHHHHHHHHHHHHhhhhhhhhhhhhhhhhhmmmmmmm

Rowby
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39773349
<img height=\"258\" width=\"200\"  border=\"0\"
This is an artifact of multiple escape sequences.  Perhaps magic quotes was on, and the script also used MySQL_Real_Escape_String()?
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_6630-Magic-Quotes-a-bad-idea-from-day-one.html
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39773360
Please see http://www.laprbass.com/RAY_temp_rowby.php where we can handle most of the likely expressions of the height attribute.  Similar ways might exist to handle other attributes of the image tag.  Regular Expressions are complicated things and regex engines vary a bit in each implementation.  So you might want to consider using the most predictable tool for the process.  For example, you could write a quick script that would query the data base, make the replacements for height on a row-by-row basis and store the changes for each row.  Then run another similar script to handle width, etc.

<?php // RAY_temp_rowby.php
error_reporting(E_ALL);
echo '<pre>';


// SEE http://www.experts-exchange.com/Programming/Languages/Regular_Expressions/Q_28335348.html
// REF http://php.net/manual/en/pcre.pattern.php


// TEST DATA COLLECTION
$data = array
( '<img height="251" width="783" border="5" />'
, '<img height=251  />'
, '<img height = 251  />'
, "<img width='783' height='251' />"
, '<img height="251" width="783" border="5"'
, '<img height="100" width="217" border="" />'
, '<img width="321" border="0" height="112px" />'

)
;


$dlm = <<<'EOD'
("|'| ?)
EOD;


// A REGULAR EXPRESSION TO HANDLE HEIGHT
$rgx
= '/'                          // START OF REGEX
. 'height'                     // STRING LITERAL
. '.*?'                        // ANYTHING OR NOTHING
. '='                          // STRING LITERAL
. '('                          // GROUP START
. $dlm                         // MAYBE QUOTE OR BLANK
. '\d+p?x?'                    // NUMBER(S) + px
. $dlm                         // MAYBE QUOTE OR BLANK
. ')'                          // GROUP END
. '/i'                         // END OF REGEX WITH CASE-INSENSITIVE
;
echo PHP_EOL . 'REGEX: ' . htmlentities($rgx) . PHP_EOL;


// TEST AND SHOW THE WORK PRODUCT
foreach ($data as $img)
{
    $new = preg_replace($rgx, 'height=""', $img);
    echo PHP_EOL . htmlentities($img);
    echo PHP_EOL . htmlentities($new);
    echo PHP_EOL;
}

Open in new window

0
 
LVL 9

Author Comment

by:Rowby Goren
ID: 39773524
Hi Ray,

I suppose I could run that script with this joomla component?

http://mijosoft.com/joomla-extensions/mijosql-joomla-database-manager

Of course I would test it first on my cloned playground version of the website.

Rowby
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39773527
I'm not familiar with MiJoSoft, so I can't comment on that.  But if you have PHP and MySQL on the playground server I can show you how to address this issue.  I would need to see the CREATE TABLE statement for one of the applicable tables.  From that I think I can modify the latest version of the REGEX script and show you how it's done.
0
What Security Threats Are You Missing?

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

 
LVL 9

Author Comment

by:Rowby Goren
ID: 39773544
phpmyadmin I am in phpmyadmin.  Not sure where I look for the CREATE TABLE statement.

Does this screen capture give a clue?

Thanks

Rowby
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39773558
Click "Structure" and see what we get.
0
 
LVL 9

Author Comment

by:Rowby Goren
ID: 39773617
StructureHi

Here's structure
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39773719
Great, thanks.  Is the column we want to modify called "images?"
0
 
LVL 9

Author Comment

by:Rowby Goren
ID: 39773760
Actually it would be the introtext column.   That is where the HTML content is for the joomla "articles".    I browsed the table and confirmed that.

There is another column called fulltext.   but I did some spotchecks and didn't see anything in that field.

This is what I see in the image field when browsing via phpmyadmin
{"image_intro":"","float_intro":"","image_intro_alt":"","image_intro_caption":"","image_fulltext":"","float_fulltext":"","image_fulltext_alt":"","image_fulltext_caption":""}

Open in new window


I am pretty sure that the image field is for any overall image that would be associated with the specific article -- and I don't think it applies for what we are doing.
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39773766
OK, I'll create a small test case using the introtext.  Please clarify (in case I misunderstand) that we want to nullify these attributes:

height
width
border

I'll post back with a script that is good enough to test in a few moments.
0
 
LVL 9

Author Comment

by:Rowby Goren
ID: 39773770
Confirmed!

height
width
border

zip the values for the above
0
 
LVL 108

Accepted Solution

by:
Ray Paseur earned 500 total points
ID: 39773817
Rowby: To make this work, go to line 11 and replace the ??? with the name of the table.  Then go to lines 14-17 and provide your data base credentials.  Make sure you have a backup of the table, then run the script and see how it looks.  I can't test it here, but I've made unit tests of most of it, so I do not expect any errors.

It should print one line for each update as it runs.

<?php // RAY_temp_rowby.php
error_reporting(E_ALL);
echo '<pre>';


// SEE http://www.experts-exchange.com/Programming/Languages/Regular_Expressions/Q_28335348.html
// REF http://php.net/manual/en/pcre.pattern.php


// THIS IS THE TABLE THAT WE WILL UPDATE
$mytable = '???';

// DATABASE CONNECTION AND SELECTION VARIABLES - GET THESE FROM YOUR HOSTING COMPANY
$db_host = "localhost"; // PROBABLY THIS IS OK
$db_name = "??";
$db_user = "??";
$db_word = "??";

// OPEN A CONNECTION TO THE DATA BASE SERVER AND SELECT THE DB
$mysqli = new mysqli($db_host, $db_user, $db_word, $db_name);

// DID THE CONNECT/SELECT WORK OR FAIL?
if ($mysqli->connect_errno)
{
    $err
    = "CONNECT FAIL: "
    . $mysqli->connect_errno
    . ' '
    . $mysqli->connect_error
    ;
    trigger_error($err, E_USER_ERROR);
}

// GET THE ROWS
$sql = "SELECT id, introtext FROM $mytable ORDER BY id";
$res = $mysqli->query($sql);
if (!$res)
{
    $err
    = 'QUERY FAILURE:'
    . ' ERRNO: '
    . $mysqli->errno
    . ' ERROR: '
    . $mysqli->error
    . ' QUERY: '
    . $sql
    ;
    trigger_error($err, E_USER_ERROR);
}

// DETERMINE HOW MANY ROWS OF RESULTS WE GOT
$num     = $res->num_rows;
$num_fmt = number_format($num);
if (!$num)
{
    echo "<br/>QUERY: $sql ";
    echo "<br/>FOUND NO DATA ";
    echo PHP_EOL;
}
else
{
    echo "<br/>QUERY: $sql ";
    echo "<br/>FOUND $num_fmt ROWS OF DATA ";
    echo PHP_EOL;
}

// ITERATE OVER THE ROWS, REPLACING THE DATA IN THE TABLE
while ($row = $res->fetch_object())
{
    $newtext = unAttribute($row->introtext);
    echo PHP_EOL;
    print_r($newtext);
    $newtext = $mysqli->real_escape_string($newtext);
    $uql = "UPDATE $mytable SET introtext = '$newtext' WHERE id = $row->id LIMIT 1";
    $ues = $mysqli->query($uql);
    if (!$ues)
    {
        $err
        = 'QUERY FAILURE:'
        . ' ERRNO: '
        . $mysqli->errno
        . ' ERROR: '
        . $mysqli->error
        . ' QUERY: '
        . $uql
        ;
        trigger_error($err, E_USER_ERROR);
    }
}


function unAttribute($tag)
{
    // REMOVE THESE ATTRIBUTES
    $att = array('height', 'width', 'border');

    $dlm = '("|\'| ?)';

    // A TEMPLATE REGULAR EXPRESSION TO DETECT ATTRIBUTES
    $rgx
    = '/'                          // START OF REGEX
    . 'XXX'                        // STRING LITERAL
    . '.*?'                        // ANYTHING OR NOTHING
    . '='                          // STRING LITERAL
    . '('                          // GROUP START
    . $dlm                         // MAYBE QUOTE OR BLANK
    . '\d+p?x?'                    // NUMBER(S) + px
    . $dlm                         // MAYBE QUOTE OR BLANK
    . ')'                          // GROUP END
    . '/i'                         // END OF REGEX WITH CASE-INSENSITIVE
    ;

    // REMOVE EACH OF THE ATTRIBUTES
    foreach ($att as $str)
    {
        $rep = $str . '=""';
        $pat = str_replace('XXX', $str, $rgx);
        $tag = preg_replace($pat, $rep, $tag);
    }
    return $tag;
}

Open in new window

Cheers, ~Ray
0
 
LVL 9

Author Comment

by:Rowby Goren
ID: 39773820
Thanks so much Ray.  I'm on the road but will test it later.

On my cloned test server -- not the live one!

Rowby
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39773836
10-4.  If you get anything that looks goofy, please post back so I can take a look at it.
0
 
LVL 9

Author Comment

by:Rowby Goren
ID: 39773985
Running it now and it's going through the hundreds of pages.

It's echoing nicely the html view of the pages as it grinds through.

I've looked at the source code and it looks like it's working fine!

I'll do some more checking and then probably run it on the actual development site -- after I do a backup.

I'll probably wait until sunday a.m. to do my development site -- I sometimes wake up in the middle of the night with a important thought about these things.  However in this case, it looks like it is perfect!!

Thanks, Ray (and the other pros here),

Will report back!

Rowby
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39774010
Great!  Best regards, ~Ray
0
 
LVL 9

Author Comment

by:Rowby Goren
ID: 39810519
Hi everyone,

The site went up a few days ago and I think I will wait a week before I start complimenting this solution on the live site.

I will have a new development site about then to do my final testing once again.

Thanks for your patience.

Rowby
0
 
LVL 9

Author Comment

by:Rowby Goren
ID: 39892032
Hi Ray

I did the first file and it worked fine.  Got rid of all the javascript.  Perfect!

Let's call that first html file HTML file #1

But when I put in a new file (HTML file #2) (and completely remove the old HTML File #1) and run the script again the results of RAY_temp_rowby.php is the fixed HIML File #1

Are the results of the script saved elsewhere (not on the server)? I am using xxamp.  

Is there something I need to do to "flush out" the previous output of HTML file #1

I rebooted xampp and even rebooted my computer -- but when I run the script again the output is the fixed original file HTML #1.

I have a completely new html file with a different name.html  with a completely new html source code (with the javascript to be removed) but somehow the output still showing the fixed html #1 results).  The Html #1 file is no longer even in the folder.

????
Thanks

Rowby
RAY-temp-rowby.php
0
 
LVL 9

Author Comment

by:Rowby Goren
ID: 39892055
Xamp control panel info, if it helps:

7:12:09 AM  [main]       Initializing Control Panel
7:12:09 AM  [main]       Windows Version: Windows 8  64-bit
7:12:09 AM  [main]       XAMPP Version: 1.8.3
7:12:09 AM  [main]       Control Panel Version: 3.2.1  [ Compiled: May 7th 2013 ]
7:12:09 AM  [main]       You are not running with administrator rights! This will work for
7:12:09 AM  [main]       most application stuff but whenever you do something with services
7:12:09 AM  [main]       there will be a security dialogue or things will break! So think
7:12:09 AM  [main]       about running this application with administrator rights!
7:12:09 AM  [main]       XAMPP Installation Directory: "c:\xampp\"
7:12:09 AM  [main]       Checking for prerequisites
7:12:19 AM  [main]       All prerequisites found
7:12:19 AM  [main]       Initializing Modules
7:12:19 AM  [main]       Starting Check-Timer
7:12:19 AM  [main]       Control Panel Ready
7:12:21 AM  [Apache]       Attempting to start Apache app...
7:12:21 AM  [mysql]       Attempting to start MySQL app...
7:12:23 AM  [Apache]       Status change detected: running
7:12:23 AM  [mysql]       Status change detected: running
0
 
LVL 9

Author Comment

by:Rowby Goren
ID: 39892251
I'm assuming this has to do something with "flushing" or clearing "cache" / "buffer" in xampp.

Should I post such a question on an EE forum here to find out how to do that?

Or can it be handled in the script?

??

Rowby
0
 
LVL 9

Author Comment

by:Rowby Goren
ID: 39892978
Ray -- my three last comments were from a different question you are on in experts exchange.

I'll post them correctly in the right question here on EE.

EEEK!

Rowby
0
 
LVL 9

Author Closing Comment

by:Rowby Goren
ID: 40237392
Thanks Ray.

Sorry for the delay in awarding the points. It's been so long I forgot what this question was about!

Rowby
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 40237454
Thanks, Rowby!  All the best, Ray
0

Featured Post

Easy Project Management (No User Manual Required)

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
MySQL database data submission 7 39
how to filter email.ro:pass? 5 25
while loop in html mail format 5 32
Paging Using PHP 7 32
Foreword (July, 2015) Since I first wrote this article, years ago, a great many more people have begun using the internet.  They are coming online from every part of the globe, learning, reading, shopping and spending money at an ever-increasing ra…
Since pre-biblical times, humans have sought ways to keep secrets, and share the secrets selectively.  This article explores the ways PHP can be used to hide and encrypt information.
The viewer will learn how to dynamically set the form action using jQuery.
The viewer will learn how to create and use a small PHP class to apply a watermark to an image. This video shows the viewer the setup for the PHP watermark as well as important coding language. Continue to Part 2 to learn the core code used in creat…

707 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

17 Experts available now in Live!

Get 1:1 Help Now