Solved

How to change all characters of an HTML string to asterisks (*) except for HTML tags using a regular expression?

Posted on 2006-05-23
Last Modified: 2010-04-05
Is anybody here an expert in regular expressions? I'm having a hard time finding a way to change all characters in an HTML string to asterisks (*), except for the HTML tags, using a regular expression. I'm using TRegExpr.

Thanks,

Edwin
Question by:edwinaceron
10 Comments
 

Accepted Solution

by:
2266180 earned 900 total points
ID: 16740630
What about scripts and other data that is not actual text? Do you really want everything except HTML tags?
If so, then you can use a regular expression like:
\>[CHARS]*\<
I escaped < and > just to be on the safe side.
Replace CHARS with all the characters you want replaced, for example: a-zA-Z0-9\.,:;'"/\?\\+=\-_\)\(\*\&^%$#@!~
You might need to escape more chars than I did.
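
A minimal sketch of the regular-expression route with TRegExpr, assuming the RegExpr unit is on the search path. A plain replace would turn a whole run of text into a single asterisk, so this version uses Exec/ExecNext to locate each run between > and < and overwrites it one character at a time. The name MaskBetweenTags and the pattern '>([^<>]+)<' are illustrative choices, not something from the thread.

```pascal
program MaskWithRegExpr;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  RegExpr; // TRegExpr unit (assumed to be installed / on the search path)

// Replaces every character that sits between a '>' and the next '<'
// with '*'. Text before the first tag or after the last one is untouched.
function MaskBetweenTags(const html: string): string;
var
  re: TRegExpr;
  first, len, k: integer;
begin
  Result := html;
  re := TRegExpr.Create;
  try
    re.Expression := '>([^<>]+)<'; // group 1 = text between two tags
    if re.Exec(html) then
      repeat
        // MatchPos/MatchLen are 1-based offsets into the input string
        first := re.MatchPos[1];
        len := re.MatchLen[1];
        for k := first to first + len - 1 do
          Result[k] := '*';
      until not re.ExecNext;
  finally
    re.Free;
  end;
end;

begin
  // prints <b>**</b>*****<i>*****</i>
  WriteLn(MaskBetweenTags('<b>hi</b> and <i>there</i>'));
end.
```

Since the result has the same length as the input, the match positions stay valid while the copy is being overwritten.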

If regular expressions are not a requirement, then simple text parsing will do as well:

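function replaceit(html:string):string;
var i:integer; dor:boolean;// dor = "do replace": true while we are between tags
begin
  dor:=false;// nothing is replaced before the first tag has closed
  for i:=1 to length(html) do
  begin
    if html[i]='>' then dor:=true// tag closed, text follows
    else if html[i]='<' then dor:=false// tag opened
    else if dor then html[i]:='*';
  end;
  result:=html;
end;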

Assisted Solution

by:Pierre Cornelius
Pierre Cornelius earned 150 total points
ID: 16740763
Similar to Ciuly's approach, but with the added feature of keeping line breaks and carriage returns intact.


DFM File
======
object Form1: TForm1
  Left = 192
  Top = 114
  Width = 696
  Height = 421
  Caption = 'Form1'
  Color = clBtnFace
  Font.Charset = DEFAULT_CHARSET
  Font.Color = clWindowText
  Font.Height = -11
  Font.Name = 'MS Sans Serif'
  Font.Style = []
  OldCreateOrder = False
  PixelsPerInch = 96
  TextHeight = 13
  object Memo1: TMemo
    Left = 8
    Top = 8
    Width = 673
    Height = 329
    Lines.Strings = (
      '<html>'
      '<head>  some text'
      '</head>'
      'Some Text'
      '<body>'
      'some text'
      '</body> some text'
      '</html>'
      ''
      '')
    ScrollBars = ssBoth
    TabOrder = 0
  end
  object Button1: TButton
    Left = 16
    Top = 352
    Width = 75
    Height = 25
    Caption = 'Change'
    TabOrder = 1
    OnClick = Button1Click
  end
end


PAS File
=====
unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, StdCtrls;

type
  TForm1 = class(TForm)
    Memo1: TMemo;
    Button1: TButton;
    procedure Button1Click(Sender: TObject);
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure TForm1.Button1Click(Sender: TObject);
var Change: boolean;
    i: integer;
    s: string;
begin
  s:= Memo1.Text;
  Change:= false; //nothing is replaced until the first tag has closed
  for i:= 1 to Length(s) do
  begin
    if s[i] = '<' then Change:= false //entering a tag
    else if s[i] = '>' then Change:= true //tag closed, text follows
    else if Change
        and (s[i] <> #13) //leave carriage returns intact
        and (s[i] <> #10) //leave line breaks intact
      then s[i]:= '*';
  end;
  Memo1.Text:= s;
end;

end.


Kind regards
Pierre

Author Comment

by:edwinaceron
ID: 16740917
What if the HTML has a less-than (<) or greater-than (>) character used in the text rather than in a tag? Is it safe? And what if a certain word must not be replaced with asterisks, such as the first word?

Expert Comment

by:2266180
ID: 16740924
They will not appear as ">" or "<" but escaped, like &gt; or &lt; ;)

Expert Comment

by:2266180
ID: 16740940
Ignore my previous comment :D
It seems that a < that is not directly followed by text, or a > that is not directly preceded by text, is displayed correctly as a literal character,
so you will have to make an extra check like:
if (s[i]='<') and (not (s[i+1] in [#9,#32,#13,#10])) then// tab, space, cr, lf
and the same for > (checking s[i-1] instead).
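
One way the extra check could be folded into the parsing loop, as a rough sketch rather than a tested solution. MaskLenient is a hypothetical name. It assumes a '<' only opens a tag when a non-whitespace character follows it, and it simplifies the '>' side by treating any '>' seen outside a tag as ordinary text. Unlike the loops above, it also masks text that appears before the first tag.

```pascal
program MaskLenientDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

// Hypothetical helper: masks text outside tags, treating '<' as the start
// of a tag only when a non-whitespace character follows it.
function MaskLenient(const html: string): string;
var
  i: integer;
  inTag: boolean;
begin
  Result := html;
  inTag := false;
  for i := 1 to Length(Result) do
  begin
    if inTag then
    begin
      if Result[i] = '>' then inTag := false; // tag closed
    end
    else if (Result[i] = '<') and (i < Length(Result))
        and not (Result[i + 1] in [#9, #32, #13, #10]) then
      inTag := true // a real tag starts here
    else if not (Result[i] in [#13, #10]) then
      Result[i] := '*'; // ordinary text, including stray '<' and '>'
  end;
end;

begin
  // prints <p>*****</p>
  WriteLn(MaskLenient('<p>a < b</p>'));
end.
```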

Author Comment

by:edwinaceron
ID: 16741007
I didn't think about &gt; and &lt; :) Anyway, thanks :)
Is there a way to choose which words stay visible (that is, not replace each character of those words with *)?

Expert Comment

by:2266180
ID: 16741211
Yes. You can do a pre-parsing for the words in question, but it will slow down the algorithm a bit.
Something like this (I'll presume you have a list of such words):
function followsWord(html:string; pos:integer; list:array of string):boolean;
var l:array of boolean;// l[j-1] is true while list[j-1] is still a candidate
    hasnext:boolean;
    i,j,p:integer;
begin
  result:=false;
  setlength(l,length(list));
  // every non-empty word starts out as a candidate
  for j:=1 to length(l) do
    l[j-1]:=length(list[j-1])>0;
  hasnext:=length(list)>0;
  i:=pos;// current index in html
  p:=1;// current index in the candidate words
  while (i<=length(html)) and hasnext do
  begin
    hasnext:=false;
    for j:=1 to length(l) do
      if l[j-1] then
      begin
        if list[j-1][p]<>html[i] then l[j-1]:=false// mismatch: drop this word
        else if length(list[j-1])=p then begin result:=true; hasnext:=false; break; end// word found. exit
        else hasnext:=true;// still have words to search for
      end;
    inc(i);
    inc(p);
  end;
  setlength(l,0);// free the dynamic array
end;

Not tested, but it looks like it would do the job.

Expert Comment

by:2266180
ID: 16741224
Forgot to mention how to use it, though it's pretty straightforward:
where you check for s[i]='>', you also add "and (not followsWord(s, i+1, list))", so that replacing is not switched on when one of the listed words follows the tag.
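
As an alternative take, here is a hedged sketch that keeps listed words wherever they occur in the text instead of only right after a tag. MaskExcept and KeptWordLen are hypothetical names, and the matching is a plain case-sensitive substring comparison with no word-boundary check, so 'Hello' would also be kept inside 'Hellos'.

```pascal
program MaskKeepWordsDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

// Returns the length of the first entry of keep that occurs in s at
// index idx, or 0 if none does.
function KeptWordLen(const s: string; idx: integer;
  const keep: array of string): integer;
var
  j: integer;
begin
  Result := 0;
  for j := 0 to High(keep) do
    if (keep[j] <> '') and (Copy(s, idx, Length(keep[j])) = keep[j]) then
    begin
      Result := Length(keep[j]);
      Exit;
    end;
end;

// Masks text outside tags, leaving line breaks and kept words unchanged.
function MaskExcept(const html: string; const keep: array of string): string;
var
  i, n: integer;
  inTag: boolean;
begin
  Result := html;
  inTag := false;
  i := 1;
  while i <= Length(html) do
  begin
    if html[i] = '<' then inTag := true
    else if html[i] = '>' then inTag := false
    else if not inTag then
    begin
      n := KeptWordLen(html, i, keep);
      if n > 0 then
      begin
        Inc(i, n); // skip the whole kept word unchanged
        Continue;
      end;
      if not (html[i] in [#13, #10]) then Result[i] := '*';
    end;
    Inc(i);
  end;
end;

begin
  // prints <p>Hello**********</p>
  WriteLn(MaskExcept('<p>Hello big world</p>', ['Hello']));
end.
```

Checking the keep list at every text position costs one comparison per listed word per character, which is the slowdown mentioned above.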

Author Comment

by:edwinaceron
ID: 16756932
Can you post code to display only the first word of each sentence?

Expert Comment

by:2266180
ID: 16758320
you should probably open another question for that since it is not related to this one ;)