?
Solved

Extract all words from a richedit and insert into a stringlist

Posted on 2007-03-19
12
Medium Priority
?
453 Views
Last Modified: 2010-04-05
Hello guys,

I have a text in a richedit and I would like a function where I could get all words from my richedit and insert into a stringlist. this function need to take away the simbols ",!@?/'etc.

Thanks
0
Comment
Question by:hidrau
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 5
  • 2
12 Comments
 
LVL 26

Expert Comment

by:EddieShipman
ID: 18748573
Try something like this:

procedure Explode(myStr: String; List: TStrings);
var
  MyParser: TParser;
  MS: TMemoryStream;
begin
  MS := TMemoryStream.Create;
  MS.Position := 0;
  MS.Write(MyStr[1], Length(MyStr));
  MS.Position := 0;
  MyParser := TParser.Create(MS);
  MyStr := MyParser.TokenString;
  while MyParser.Token <> toEOF do
  begin
    if MyParser.TokenSymbolIs(MyParser.TokenString) then
    begin
      MyStr := MyParser.TokenString;
      List.Add(MyStr);
    end;
    MyParser.NextToken;
  end;
  MyParser.Free;
  MS.Free;
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  Explode(StringReplace(RichEdit1.Lines.Text, '''', '''''', [rfReplaceAll]), ListBox1.Items);
end;
0
 
LVL 1

Author Comment

by:hidrau
ID: 18748742
EddieShipman,

I am having problem with your function because I have some word with accent. Like

ação = action
sábia = knew
fábrica = factory

maybe your function doesn't work with accented words

Could you take a look at?

0
 
LVL 26

Expert Comment

by:EddieShipman
ID: 18748968
ok, hold on.
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 26

Expert Comment

by:EddieShipman
ID: 18749207
Seems the NextToken function of TParser does not handle extended chars.
0
 
LVL 1

Author Comment

by:hidrau
ID: 18750415
no way?
0
 
LVL 26

Expert Comment

by:EddieShipman
ID: 18750758
I found a free component on Torry's that seems to work OK, give it a try, here's a demo:

SQL Parser By Kei.
TQueryParserComp is a simple text parser which recognizes comments, strings, statement delimiters
and special characters.

Fully functional
Source: Included

Download: http://www.torry.net/db/access/db_sql/kqparser.zip

******************************************************************************************
unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, StdCtrls, ComCtrls, SQLParserComp;

type
  TForm1 = class(TForm)
    Button1: TButton;
    ListBox1: TListBox;
    RichEdit1: TRichEdit;
    procedure Button1Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
    procedure QueryParserStatementDelimiter(Sender: TObject;
      SQLStatement: String);
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure Explode(myStr: String; List: TStrings);
var
  MyParser: TParser;
  MS: TMemoryStream;
begin
  MS := TMemoryStream.Create;
  MS.Position := 0;
  MS.Write(MyStr[1], Length(MyStr));
  MS.Position := 0;
  MyParser := TParser.Create(MS);
  MyStr := MyParser.TokenString;
  //
  while MyParser.Token <> toEOF do
  begin
    if MyParser.TokenSymbolIs(MyParser.TokenString) then
    begin
      MyStr := MyParser.TokenString;
      List.Add(MyStr);
    end;
    MyParser.NextToken;
  end;
  MyParser.Free;
  MS.Free;
end;

procedure TForm1.Button1Click(Sender: TObject);
  function AddToken(var AToken: String):Boolean;
  var
    i: Integer;
  begin
    Result := False;
    if (Length(Trim(AToken)) > 0) then
    begin
      for i := Length(AToken) downto 1 do
        if (AToken[i] in [',', '.', '"', '(', ')', '@']) then
          Delete(AToken, i, 1);
        if Length(Trim(AToken)) > 0 then
          Result := True;
    end;
  end;

var
  QueryParser: TQueryParserComp;
  sToken: String;
begin
  try
    QueryParser := TQueryParserComp.Create(nil);
    QueryParser.IsEOFStmtDelimiter := False;
    QueryParser.StringDelimiters   := #39'"';
    QueryParser.RemoveStrDelimiter := False;
    QueryParser.CountFromStatement := True;
    QueryParser.TextToParse := RichEdit1.Lines.Text;
    QueryParser.FirstToken;
    while not QueryParser.EOF do
    begin
      sToken := QueryParser.Token;
      if AddToken(sToken)then
        ListBox1.Items.Add(sToken);
      QueryParser.NextToken;
    end;
  finally
    ShowMessage(IntToStr(QueryParser.SymbolsCount));
    QueryParser.Free;
  end;
end;

procedure TForm1.QueryParserStatementDelimiter(Sender: TObject;
  SQLStatement: String);
begin
end;

end.
0
 
LVL 1

Author Comment

by:hidrau
ID: 18750983
EddieShipman:

If I knew that you will try with some component I would have advised you. it is not allowed to install component here in company. Sorry to say :(

it seems that solution is hard
0
 
LVL 28

Expert Comment

by:2266180
ID: 18751150
an aprox 5 min solution (includes starting up the IDE :) )

type TCharSet=set of char;

procedure moveto(r:TRichEdit; l:TStrings; strip:TCharSet);
var s,t:string;
    i:integer;
begin
  s:='';
  t:=r.Lines.Text;
  for i:=1 to length(t) do
    if not (t[i] in strip) then
      s:=s+t[i];
  l.Text:=s;
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  moveto(richedit1, memo1.lines, ['"',',','!','@','?','/','''']);
end;

there is a lot to optimize here speedwise (not using a set but an array if char, using pchar for s and t, etc, but I concentrated on making the solution fast rather than maing a fast solution :) 1st important thing is to make it work :) (this is a pour excuse for the above algorithm :P )
0
 
LVL 1

Author Comment

by:hidrau
ID: 18751242
I have a text in my richedit and I need to create a listbox or a tstringlist of words from my text.
This Tstringlist or listbox can't have repeated words.

0
 
LVL 28

Accepted Solution

by:
2266180 earned 2000 total points
ID: 18751660
>This Tstringlist or listbox can't have repeated words.
this is a very important detail you forgot to mention.

and I missunderstood the word list part :)

here is an updated code:

unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, StdCtrls, ComCtrls;

type
  TForm1 = class(TForm)
    RichEdit1: TRichEdit;
    Button1: TButton;
    Memo1: TMemo;
    procedure Button1Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

function isWordDelimiter(c:char):boolean;
begin
{  result:=(c<#48) or // calculated according to www.asciitable.com
            ( (c>#57) and
              ( (c<#65) or
                ( (c>#90) and
                  ( (c<#97) or
                    ( (c>#122) and
                      ( (c<#128) or
                        ( (c>#151) and
                          ( (c<#153) or
                            ( (c>#154) and
                              ( (c<#160) or
                                (c>#165)
                              )
                            )
                          )
                        )
                      )
                    )
                  )
                )
              )
            );{}
  result:=(c<#48) or // calculated according to www.lookuptables.com
            ( (c>#57) and
              ( (c<#65) or
                ( (c>#90) and
                  ( (c<#97) or
                    ( (c>#122) and
                      ( (c<#192) or
                        ( (c>#214) and
                          (c<#216)
                        )
                      )
                    )
                  )
                )
              )
            );{}
end;

function findWord(w:string; l:TStrings; caseSensitive:boolean):boolean;
var i:integer;
begin
  if caseSensitive then result:=l.IndexOf(w)>-1
                   else
  begin
    i:=0;
    while (i<l.Count) and (not SameText(l[i],w)) do
      inc(i);
    result:=i<l.Count;
  end;
end;

procedure moveto(r:TRichEdit; l:TStrings; caseSensitive:boolean);
var s,t,w:string;
    i:integer;
begin
  s:='';
  w:='';
  t:=r.Lines.Text;
  i:=1;
  while i<=length(t) do
  begin
    while (i<=length(t)) and (isWordDelimiter(t[i])) do
      inc(i);
    w:='';
    while (i<=length(t)) and (not isWordDelimiter(t[i])) do
    begin
      w:=w+t[i];
      inc(i);
    end;
    if (w<>'') and not findWord(w, l, caseSensitive)then
      l.add(w);
  end;
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  moveto(richedit1, memo1.lines, false);
end;

end.
0
 
LVL 1

Author Comment

by:hidrau
ID: 18751713
thanks ciuly

that was I needed.

0
 
LVL 26

Expert Comment

by:EddieShipman
ID: 18751957
hidrau, I did not install the component, only had the source code avail in my path.
I created the component at run-time.
0

Featured Post

Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Navigation is an important part of web design from a usability perspective. But it is often a pain when it comes to a developer’s perspective. By navigation, it often means menuing. This is less theory and more practical of how to get a specific gro…
Whether you’re a college noob or a soon-to-be pro, these tips are sure to help you in your journey to becoming a programming ninja and stand out from the crowd.
The viewer will be introduced to the member functions push_back and pop_back of the vector class. The video will teach the difference between the two as well as how to use each one along with its functionality.
In a recent question (https://www.experts-exchange.com/questions/29004105/Run-AutoHotkey-script-directly-from-Notepad.html) here at Experts Exchange, a member asked how to run an AutoHotkey script (.AHK) directly from Notepad++ (aka NPP). This video…
Suggested Courses

777 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question