Solved

Regular expression to fix html errors

Posted on 2005-03-08
Last Modified: 2012-06-27
Hi,

I'm having to edit another person's HTML code (hundreds of files), and in many places &nbsp; is written as just &nbsp (no trailing semicolon), which can cause some browsers to render it as text. Also, every bare ampersand "&" should be written as &amp; to be HTML 4.01 (and XHTML) compliant, I think.

I'm using TextPad and I just need a simple regex that will let me replace any instance of "&nbsp" that ISN'T already "&nbsp;" with "&nbsp;". And also instances of JUST "&" (possibly with letters directly before and after it, if written incorrectly) with "&amp;".

Obviously this will need two regexes, but it's simpler to ask in one question. I just can't seem to get my head round regexes, since most websites seem to have disgustingly complicated ways of explaining them. I've tried http://www.regular-expressions.info/, which seems useful if you have twenty hours to learn, which unfortunately I don't :)
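
For the second half of the question (bare ampersands), here is a minimal sketch in Python rather than TextPad, since it relies on a negative lookahead. The function name `escape_bare_ampersands` and the entity pattern are assumptions made for illustration, not something from the thread. It treats any "&" that does not start a complete named or numeric entity as bare, so it should be run only after the missing semicolons on &nbsp have been fixed, otherwise "&nbsp" would turn into "&amp;nbsp".

```python
import re

# An "&" counts as bare unless it begins a complete entity such as
# &amp;  &#160;  or  &#xA0;  (name or number followed by a semicolon).
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)

def escape_bare_ampersands(html: str) -> str:
    """Replace each bare "&" with "&amp;" (hypothetical helper)."""
    return BARE_AMPERSAND.sub("&amp;", html)
```

This does not know about script blocks or comments, where ampersands may need to stay as they are, so the output is worth reviewing before saving hundreds of files.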

So thanks in advance,

Dave
Question by:davehamer
13 Comments
 
LVL 96

Expert Comment

by:Bob Learned
ID: 13485972
Start with:

Replace &nbsp[^;] with &nbsp;

Bob
 

Author Comment

by:davehamer
ID: 13493977
That unfortunately doesn't work;

I already tried that one from reading the examples; however, in TextPad it selects the &nbsp and the following character.

For example, "&nbsp<img" would become "&nbsp;img" with that replace.

Perhaps there is a command that I am missing to "save" the end character so that it can be used in the replacement?
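
The behaviour described above is not specific to TextPad. As a rough illustration in Python (the function name `naive_fix` is made up for this sketch), the character class `[^;]` has to match one real character, and that character becomes part of the match, so the replacement overwrites it:

```python
import re

def naive_fix(html: str) -> str:
    # [^;] consumes whatever follows "&nbsp", so the "<" is replaced too.
    return re.sub(r"&nbsp[^;]", "&nbsp;", html)

print(naive_fix("&nbsp<img"))  # &nbsp;img
```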

I've upped the points to 80; I'm sure that this is a simple answer tho.

Dave.
 
LVL 96

Expert Comment

by:Bob Learned
ID: 13495410
Try this:

&nbsp(?![;])

Bob

 

Author Comment

by:davehamer
ID: 13497041
Doesn't match anything this time :(
 
LVL 96

Expert Comment

by:Bob Learned
ID: 13497087
Where are you running this from?  I tested it in VB.NET, but it should still be a valid expression.

I tested the expression with the specific case of &nbsp<img and got &nbsp;<img.

Bob
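
For readers with an engine that supports lookahead, the expression can be sketched in Python as follows. This is an illustration only, not a TextPad solution, and the names `BARE_NBSP` and `add_missing_semicolons` are assumptions. The lookahead checks the next character without consuming it, so nothing after "&nbsp" is lost, and an "&nbsp" at the very end of the text also matches.

```python
import re

# Match "&nbsp" only when the next character is not ";".
# (?!;) is a negative lookahead: it inspects but does not consume.
BARE_NBSP = re.compile(r"&nbsp(?!;)")

def add_missing_semicolons(html: str) -> str:
    return BARE_NBSP.sub("&nbsp;", html)

print(add_missing_semicolons("&nbsp<img"))  # &nbsp;<img
```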
 

Author Comment

by:davehamer
ID: 13497142
Hi Bob,

Thanks for your replies; I'm using TextPad 4.7.3, as specified in the original question. The regex engine in this piece of software should be the same as any other, AFAIK.

It is available as a trial download from:

http://www.textpad.com/download/index.html#downloads

Thanks;

Dave
 
LVL 96

Expert Comment

by:Bob Learned
ID: 13497262
Right, I have a few Regular Expression questions in play, and I just got a little confused.

Bob
 
LVL 96

Expert Comment

by:Bob Learned
ID: 13497293
BTW, not all Regular Expression engines are the same.

Bob
 
LVL 96

Expert Comment

by:Bob Learned
ID: 13497464
This is confusing, because I checked the POSIX option for Regular Expressions in preferences, and I looked up that '?!' is the negative lookahead syntax, and still it doesn't find anything.

(Scratching head)

Bob
 
LVL 96

Accepted Solution

by:
Bob Learned earned 200 total points
ID: 13497509
I found a reference to a problem in their forum:

http://www.textpad.info/forum/viewtopic.php?t=5583&highlight=regular+expression+lookahead

     You want to ignore not a list of characters but a sequence of characters.
     I think Textpad's Regex machine is not up to it as this requires "negative lookahead assertion".

Bob
 

Author Comment

by:davehamer
ID: 13504048
Thanks for your further help;

I've improved the expression further:

By using :

\(&nbsp\)\([^;]\)

This still selects the next char as well, but I can then use the replacement syntax of:

\1;\2

The only problem is that the expression will ONLY match an &nbsp that is followed by ANOTHER character. Unfortunately it won't match line breaks, so if a line just contains &nbsp (which it does, cos this guy is a muppet) it won't be matched. I suppose I could use a second regex to match those ones, the simplest being "&nbsp\n". Maybe you can build this into a single regex? I don't know, because I'm still new to this.
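
One way the two cases might be folded into a single expression is to let the second group match either a non-semicolon character or the end of the line. The sketch below uses Python syntax, where groups are written `( )` rather than TextPad's `\( \)`; whether TextPad 4.7.3 accepts an equivalent alternation is not confirmed here. The names `fix_line` and `fix_text` are made up for illustration. Because the group consumes the following character, two adjacent entities ("&nbsp&nbsp") need more than one pass, so the substitution repeats until nothing changes.

```python
import re

# The capture-group pattern from above, plus "|$" so that an &nbsp
# at the very end of a line also matches (group 2 is then empty).
PATTERN = re.compile(r"(&nbsp)([^;]|$)")

def fix_line(line: str) -> str:
    """Append the missing semicolon to each bare &nbsp in one line."""
    previous = None
    # Repeat until stable: the consumed character can be the "&" of an
    # adjacent &nbsp, which a single pass would skip.
    while line != previous:
        previous = line
        line = PATTERN.sub(r"\1;\2", line)
    return line

def fix_text(text: str) -> str:
    # Work line by line, the way a line-oriented editor search does.
    return "\n".join(fix_line(line) for line in text.split("\n"))
```

In an editor without alternation or lookahead, running the original two-group replace and then a separate end-of-line replace should give the same result.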

Thanks for your help so far, Bob; hopefully we can get this one kicked in the head. Putting points up to 100 for when we get a completed answer.

Dave
 
LVL 96

Expert Comment

by:Bob Learned
ID: 13516487
I am fresh out of ideas, sorry :(

Bob
 

Author Comment

by:davehamer
ID: 13611509
Since no-one else has contributed a correct answer, I will submit the points to Bob, but with a lower grade due to a slightly incomplete answer.

Ty;
Dave