Solved

Extract all words from a richedit and insert into a stringlist

Posted on 2007-03-19
12
434 Views
Last Modified: 2010-04-05
Hello guys,

I have a text in a richedit and I would like a function where I could get all words from my richedit and insert into a stringlist. this function need to take away the simbols ",!@?/'etc.

Thanks
0
Comment
Question by:hidrau
  • 5
  • 5
  • 2
12 Comments
 
LVL 26

Expert Comment

by:EddieShipman
ID: 18748573
Try something like this:

procedure Explode(myStr: String; List: TStrings);
var
  MyParser: TParser;
  MS: TMemoryStream;
begin
  MS := TMemoryStream.Create;
  MS.Position := 0;
  MS.Write(MyStr[1], Length(MyStr));
  MS.Position := 0;
  MyParser := TParser.Create(MS);
  MyStr := MyParser.TokenString;
  while MyParser.Token <> toEOF do
  begin
    if MyParser.TokenSymbolIs(MyParser.TokenString) then
    begin
      MyStr := MyParser.TokenString;
      List.Add(MyStr);
    end;
    MyParser.NextToken;
  end;
  MyParser.Free;
  MS.Free;
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  Explode(StringReplace(RichEdit1.Lines.Text, '''', '''''', [rfReplaceAll]), ListBox1.Items);
end;
0
 
LVL 1

Author Comment

by:hidrau
ID: 18748742
EddieShipman,

I am having problem with your function because I have some word with accent. Like

ação = action
sábia = knew
fábrica = factory

maybe your function doesn't work with accented words

Could you take a look at?

0
 
LVL 26

Expert Comment

by:EddieShipman
ID: 18748968
ok, hold on.
0
 
LVL 26

Expert Comment

by:EddieShipman
ID: 18749207
Seems the NextToken function of TParser does not handle extended chars.
0
 
LVL 1

Author Comment

by:hidrau
ID: 18750415
no way?
0
 
LVL 26

Expert Comment

by:EddieShipman
ID: 18750758
I found a free component on Torry's that seems to work OK, give it a try, here's a demo:

SQL Parser By Kei.
TQueryParserComp is a simple text parser which recognizes comments, strings, statement delimiters
and special characters.

Fully functional
Source: Included

Download: http://www.torry.net/db/access/db_sql/kqparser.zip

******************************************************************************************
unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, StdCtrls, ComCtrls, SQLParserComp;

type
  TForm1 = class(TForm)
    Button1: TButton;
    ListBox1: TListBox;
    RichEdit1: TRichEdit;
    procedure Button1Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
    procedure QueryParserStatementDelimiter(Sender: TObject;
      SQLStatement: String);
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure Explode(myStr: String; List: TStrings);
var
  MyParser: TParser;
  MS: TMemoryStream;
begin
  MS := TMemoryStream.Create;
  MS.Position := 0;
  MS.Write(MyStr[1], Length(MyStr));
  MS.Position := 0;
  MyParser := TParser.Create(MS);
  MyStr := MyParser.TokenString;
  //
  while MyParser.Token <> toEOF do
  begin
    if MyParser.TokenSymbolIs(MyParser.TokenString) then
    begin
      MyStr := MyParser.TokenString;
      List.Add(MyStr);
    end;
    MyParser.NextToken;
  end;
  MyParser.Free;
  MS.Free;
end;

procedure TForm1.Button1Click(Sender: TObject);
  function AddToken(var AToken: String):Boolean;
  var
    i: Integer;
  begin
    Result := False;
    if (Length(Trim(AToken)) > 0) then
    begin
      for i := Length(AToken) downto 1 do
        if (AToken[i] in [',', '.', '"', '(', ')', '@']) then
          Delete(AToken, i, 1);
        if Length(Trim(AToken)) > 0 then
          Result := True;
    end;
  end;

var
  QueryParser: TQueryParserComp;
  sToken: String;
begin
  try
    QueryParser := TQueryParserComp.Create(nil);
    QueryParser.IsEOFStmtDelimiter := False;
    QueryParser.StringDelimiters   := #39'"';
    QueryParser.RemoveStrDelimiter := False;
    QueryParser.CountFromStatement := True;
    QueryParser.TextToParse := RichEdit1.Lines.Text;
    QueryParser.FirstToken;
    while not QueryParser.EOF do
    begin
      sToken := QueryParser.Token;
      if AddToken(sToken)then
        ListBox1.Items.Add(sToken);
      QueryParser.NextToken;
    end;
  finally
    ShowMessage(IntToStr(QueryParser.SymbolsCount));
    QueryParser.Free;
  end;
end;

procedure TForm1.QueryParserStatementDelimiter(Sender: TObject;
  SQLStatement: String);
begin
end;

end.
0
Enabling OSINT in Activity Based Intelligence

Activity based intelligence (ABI) requires access to all available sources of data. Recorded Future allows analysts to observe structured data on the open, deep, and dark web.

 
LVL 1

Author Comment

by:hidrau
ID: 18750983
EddieShipman:

If I knew that you will try with some component I would have advised you. it is not allowed to install component here in company. Sorry to say :(

it seems that solution is hard
0
 
LVL 28

Expert Comment

by:ciuly
ID: 18751150
an aprox 5 min solution (includes starting up the IDE :) )

type TCharSet=set of char;

procedure moveto(r:TRichEdit; l:TStrings; strip:TCharSet);
var s,t:string;
    i:integer;
begin
  s:='';
  t:=r.Lines.Text;
  for i:=1 to length(t) do
    if not (t[i] in strip) then
      s:=s+t[i];
  l.Text:=s;
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  moveto(richedit1, memo1.lines, ['"',',','!','@','?','/','''']);
end;

there is a lot to optimize here speedwise (not using a set but an array if char, using pchar for s and t, etc, but I concentrated on making the solution fast rather than maing a fast solution :) 1st important thing is to make it work :) (this is a pour excuse for the above algorithm :P )
0
 
LVL 1

Author Comment

by:hidrau
ID: 18751242
I have a text in my richedit and I need to create a listbox or a tstringlist of words from my text.
This Tstringlist or listbox can't have repeated words.

0
 
LVL 28

Accepted Solution

by:
ciuly earned 500 total points
ID: 18751660
>This Tstringlist or listbox can't have repeated words.
this is a very important detail you forgot to mention.

and I missunderstood the word list part :)

here is an updated code:

unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, StdCtrls, ComCtrls;

type
  TForm1 = class(TForm)
    RichEdit1: TRichEdit;
    Button1: TButton;
    Memo1: TMemo;
    procedure Button1Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

function isWordDelimiter(c:char):boolean;
begin
{  result:=(c<#48) or // calculated according to www.asciitable.com
            ( (c>#57) and
              ( (c<#65) or
                ( (c>#90) and
                  ( (c<#97) or
                    ( (c>#122) and
                      ( (c<#128) or
                        ( (c>#151) and
                          ( (c<#153) or
                            ( (c>#154) and
                              ( (c<#160) or
                                (c>#165)
                              )
                            )
                          )
                        )
                      )
                    )
                  )
                )
              )
            );{}
  result:=(c<#48) or // calculated according to www.lookuptables.com
            ( (c>#57) and
              ( (c<#65) or
                ( (c>#90) and
                  ( (c<#97) or
                    ( (c>#122) and
                      ( (c<#192) or
                        ( (c>#214) and
                          (c<#216)
                        )
                      )
                    )
                  )
                )
              )
            );{}
end;

function findWord(w:string; l:TStrings; caseSensitive:boolean):boolean;
var i:integer;
begin
  if caseSensitive then result:=l.IndexOf(w)>-1
                   else
  begin
    i:=0;
    while (i<l.Count) and (not SameText(l[i],w)) do
      inc(i);
    result:=i<l.Count;
  end;
end;

procedure moveto(r:TRichEdit; l:TStrings; caseSensitive:boolean);
var s,t,w:string;
    i:integer;
begin
  s:='';
  w:='';
  t:=r.Lines.Text;
  i:=1;
  while i<=length(t) do
  begin
    while (i<=length(t)) and (isWordDelimiter(t[i])) do
      inc(i);
    w:='';
    while (i<=length(t)) and (not isWordDelimiter(t[i])) do
    begin
      w:=w+t[i];
      inc(i);
    end;
    if (w<>'') and not findWord(w, l, caseSensitive)then
      l.add(w);
  end;
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  moveto(richedit1, memo1.lines, false);
end;

end.
0
 
LVL 1

Author Comment

by:hidrau
ID: 18751713
thanks ciuly

that was I needed.

0
 
LVL 26

Expert Comment

by:EddieShipman
ID: 18751957
hidrau, I did not install the component, only had the source code avail in my path.
I created the component at run-time.
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Suggested Solutions

This article will show, step by step, how to integrate R code into a R Sweave document
When we want to run, execute or repeat a statement multiple times, a loop is necessary. This article covers the two types of loops in Python: the while loop and the for loop.
This tutorial will introduce the viewer to VisualVM for the Java platform application. This video explains an example program and covers the Overview, Monitor, and Heap Dump tabs.
This tutorial explains how to use the VisualVM tool for the Java platform application. This video goes into detail on the Threads, Sampler, and Profiler tabs.

708 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now