?
Solved

Getting number of times word appears in text file (500 points)

Posted on 2005-03-02
16
Medium Priority
?
278 Views
Last Modified: 2010-04-05
I want to search a text file for certain words which appear in a combobox (which are different every time the program is run).

I need to discover what word appears most in the text file. Example: "yellow" could be mentioned 15 times, "pink" could be mentioned 10 times and "blue" could be mentioned 25 times. In this case, I would want "blue" to be shown in Edit1.Text.

I am using Delph 6.  I am giving 500 points for this urgent question. Many thanks for your help.
0
Comment
Question by:rincewind666
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 2
  • 2
  • +5
16 Comments
 

Expert Comment

by:_JHL_
ID: 13444877
Hi

You need to create a parser for your Edit1, but first i have a q for you. You need to make a ranking of the words, or to search a particular word (any) but one per time

JHL.. (i`m working on you solution)
0
 
LVL 17

Expert Comment

by:TheRealLoki
ID: 13444999
unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,
  StdCtrls, Grids;

type
  TForm1 = class(TForm)
    Button1: TButton;
    ComboBox1: TComboBox;
    Memo1: TMemo;
    Edit1: TEdit;
    procedure Button1Click(Sender: TObject);
    procedure FormCreate(Sender: TObject);
    procedure FormDestroy(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
    resultlist: TStringList;

  end;

var
  Form1: TForm1;
  function SortByValueNumber(List: TStringList; Index1, Index2: Integer): Integer;

IMPLEMENTATION

{$R *.DFM}
function GetNextDelim(var s: string): string;
    const
        delims: array[1..5] of char = (' ', '.', ',', '!', '?');
    var
        i: integer;
        temps: string;
        c: char;
    begin
        result := '';
        i := 1;
        temps := '';
        repeat
            c := s[1];
            if (c in [' ', '.', ',', '!', '?']) then
            begin
                result := temps;
            end
            else
            begin
                temps := temps + c;
            end;
            delete(s, 1, 1);
        until ( (result <> '') or (s = '') );
       
    end;

function SortByValueNumber(List: TStringList; Index1, Index2: Integer): Integer;
    begin
        if StrToIntDef(List.Values[List.Names[index1]], 0) > StrToIntDef(List.Values[List.Names[index2]], 0) then
          result := -1
        else if StrToIntDef(List.Values[List.Names[index1]], 0) < StrToIntDef(List.Values[List.Names[index2]], 0) then
          result := 1
        else
          result := 0;
    end;

procedure TForm1.Button1Click(Sender: TObject);
    var
        FullText: string;
        oneword: string;
    begin
       
        FullText := combobox1.items.Text;
        FullText := stringreplace(FullText, #13#10, ' ', [rfReplaceAll]); // convert all CR/LFs to spaces
        resultlist.Clear;
        while FullText <> '' do
        begin
            oneword := GetNextDelim(FullText);
            ResultList.Values[oneword] := IntToStr(StrToIntDef(ResultList.Values[oneword], 0) + 1);
        end;
        ResultList.CustomSort(SortByValueNumber);
        edit1.text := ResultList.Names[0];
        memo1.lines.assign(ResultList);

    end;

procedure TForm1.FormCreate(Sender: TObject);
    begin
        resultlist := TStringList.Create;
    end;

procedure TForm1.FormDestroy(Sender: TObject);
    begin
        resultlist.Clear;
        resultlist.Free;
    end;

end.



FORM
====

object Form1: TForm1
  Left = 274
  Top = 168
  Width = 519
  Height = 309
  Caption = 'Form1'
  Color = clBtnFace
  Font.Charset = DEFAULT_CHARSET
  Font.Color = clWindowText
  Font.Height = -11
  Font.Name = 'MS Sans Serif'
  Font.Style = []
  OldCreateOrder = False
  OnCreate = FormCreate
  OnDestroy = FormDestroy
  PixelsPerInch = 96
  TextHeight = 13
  object Button1: TButton
    Left = 16
    Top = 40
    Width = 75
    Height = 25
    Caption = 'Button1'
    TabOrder = 0
    OnClick = Button1Click
  end
  object ComboBox1: TComboBox
    Left = 16
    Top = 8
    Width = 145
    Height = 21
    ItemHeight = 13
    TabOrder = 1
    Text = 'ComboBox1'
    Items.Strings = (
      'the quick brown fox'
      'jumps over the'
      'lazy dog'
      'my dog has fleas')
  end
  object Memo1: TMemo
    Left = 176
    Top = 8
    Width = 305
    Height = 201
    Lines.Strings = (
      'Memo1')
    TabOrder = 2
  end
  object Edit1: TEdit
    Left = 16
    Top = 80
    Width = 121
    Height = 21
    TabOrder = 3
    Text = 'Edit1'
  end
end
0
 
LVL 21

Expert Comment

by:ziolko
ID: 13447649
this is typicall task  for use for BST (Binary Search Tree)

ziolko.
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 

Author Comment

by:rincewind666
ID: 13447840
TheRealLoki, I tried your code - very interesting but isn't what I need. I want to get the number of words that appear in a TEXT file that also appear in a combobox - not the number of times a word appears in a combobox.
0
 
LVL 12

Expert Comment

by:esoftbg
ID: 13448525
download an fantastic example from:
page:        http://www.geocities.com/esoftbg/
  link:        Q_21335504.zip        Getting number of times word appears in text file (500 points)

unit Unit1_Q_21335504;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, StdCtrls, Grids, ValEdit;

type
  TForm1 = class(TForm)
    btn_Load_ComboBox: TButton;
    ComboBox: TComboBox;
    btn_Counting: TButton;
    Edit_Max_Count: TEdit;
    Edit_FileCombo: TEdit;
    ValueListEditor: TValueListEditor;
    procedure FormCreate(Sender: TObject);
    procedure btn_Load_ComboBoxClick(Sender: TObject);
    procedure btn_CountingClick(Sender: TObject);
    private
    { Private declarations }
    public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure TForm1.FormCreate(Sender: TObject);
begin
  btn_Load_ComboBoxClick(Self);
end;

function  Removing_extra_spaces_from_Text(S: string): string;
var
  PC0:    PChar;
  PC1:    PChar;
  PC2:    PChar;
begin
  if (Length(S)>1) then
  begin
    PC0 := @S[1];
    PC2 := PC0+Length(S);
    PC1 := PC0+1;
    while not (PC1>PC2) do
    begin
      if not ((PC0^=' ') and (PC1^=' ')) then
      begin
        Inc(PC0);
        PC0^ := PC1^;
      end;
      Inc(PC1);
    end;
    PC1 := @S[1];
    Result:=copy(S,1, PC0-PC1);
  end
  else
    Result:=S;
end;

procedure Get_All_Words_From_Text(S: string; ComboBox: TComboBox);
var
  BP:     Integer;
  EP:     Integer;
  I:      Integer;
  L:      Integer;
  T:      string;
  procedure Get_A_Word;
  var
    J:      Integer;
    N:      Integer;
  begin
    N := 0;
    T := S;
    for J := BP to EP do
      if (S[J] in [#65..#255]) then
      begin
        Inc(N);
        T[N] := S[J];
      end;
    SetLength(T, N);
    ComboBox.Items.Add(T);
  end;
begin
  ComboBox.Clear;
  BP := 1;
  L := Length(S);
  for I := 1 to L do
  begin
    if (S[I]=#32) then
    begin
      EP := I-1;
      Get_A_Word;
      BP := I+1;
    end;
  end;
end;

procedure TForm1.btn_Load_ComboBoxClick(Sender: TObject);
var
  SL:     TStringList;
  S:      string;
begin
  SL := TStringList.Create;
  try
    SL.LoadFromFile(Edit_FileCombo.Text);
    S := Trim(SL.Text);
    S := Trim(Removing_extra_spaces_from_Text(S));
    Get_All_Words_From_Text(S, ComboBox);
  finally
    SL.Free;
  end;
end;

procedure Add_Words_into_ValueListEditor(SS: TStrings; VLEditor: TValueListEditor; Edit: TEdit);
var
  C:      Integer;
  I:      Integer;
  M:      Integer;
  R:      Integer;
  S:      string;
begin
  M := 0;
  S := '';
  VLEditor.Strings.Clear;
  for I := 0 to SS.Count-1 do
  begin
    if VLEditor.FindRow(SS[I], R) then
    begin
      C := StrToInt(VLEditor.Cells[1,R]);
      Inc(C);
      if (C>M) then
      begin
        S := SS[I];
        M := C;
      end;
      VLEditor.Cells[1,R] := IntToStr(C);
    end
    else
      VLEditor.InsertRow(SS[I], '1', True);
  end;
  Edit.Text := S + ' = ' + IntToStr(M);
end;

procedure TForm1.btn_CountingClick(Sender: TObject);
begin
  Add_Words_into_ValueListEditor(ComboBox.Items, ValueListEditor, Edit_Max_Count);
end;

end.
0
 
LVL 12

Expert Comment

by:Lee_Nover
ID: 13450053
fantastic example ? coding conventions ? and what about speed ?
the code bellow uses memory mapped files to open a file
this example show searching for one string
you can loop through your combobox and search for the current string
for string search it uses madshi's PosPChar which can be found in madBasic - www.madshi.net
(madshi: hope it's ok ? otherwise a moderator can delete your code part)

the code:

try it on a few MB file :-D

unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, StdCtrls;

type
  TForm1 = class(TForm)
    Button1: TButton;
    procedure Button1Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

// from madBasic - www.madshi.net
var lowCharTable : array [#0..#$FF] of char =
 (#$00,#$01,#$02,#$03,#$04,#$05,#$06,#$07,#$08,#$09,#$0A,#$0B,#$0C,#$0D,#$0E,#$0F,
  #$10,#$11,#$12,#$13,#$14,#$15,#$16,#$17,#$18,#$19,#$1A,#$1B,#$1C,#$1D,#$1E,#$1F,
  #$20,#$21,#$22,#$23,#$24,#$25,#$26,#$27,#$28,#$29,#$2A,#$2B,#$2C,#$2D,#$2E,#$2F,
  #$30,#$31,#$32,#$33,#$34,#$35,#$36,#$37,#$38,#$39,#$3A,#$3B,#$3C,#$3D,#$3E,#$3F,
  #$40,#$61,#$62,#$63,#$64,#$65,#$66,#$67,#$68,#$69,#$6A,#$6B,#$6C,#$6D,#$6E,#$6F,
  #$70,#$71,#$72,#$73,#$74,#$75,#$76,#$77,#$78,#$79,#$7A,#$5B,#$5C,#$5D,#$5E,#$5F,
  #$60,#$61,#$62,#$63,#$64,#$65,#$66,#$67,#$68,#$69,#$6A,#$6B,#$6C,#$6D,#$6E,#$6F,
  #$70,#$71,#$72,#$73,#$74,#$75,#$76,#$77,#$78,#$79,#$7A,#$7B,#$7C,#$7D,#$7E,#$7F,
  #$80,#$81,#$82,#$83,#$84,#$85,#$86,#$87,#$88,#$89,#$9A,#$8B,#$9C,#$8D,#$9E,#$8F,
  #$90,#$91,#$92,#$93,#$94,#$95,#$96,#$97,#$98,#$99,#$9A,#$9B,#$9C,#$9D,#$9E,#$FF,
  #$A0,#$A1,#$A2,#$A3,#$A4,#$A5,#$A6,#$A7,#$A8,#$A9,#$AA,#$AB,#$AC,#$AD,#$AE,#$AF,
  #$B0,#$B1,#$B2,#$B3,#$B4,#$B5,#$B6,#$B7,#$B8,#$B9,#$BA,#$BB,#$BC,#$BD,#$BE,#$BF,
  #$E0,#$E1,#$E2,#$E3,#$E4,#$E5,#$E6,#$E7,#$E8,#$E9,#$EA,#$EB,#$EC,#$ED,#$EE,#$EF,
  #$F0,#$F1,#$F2,#$F3,#$F4,#$F5,#$F6,#$D7,#$F8,#$F9,#$FA,#$FB,#$FC,#$FD,#$FE,#$DF,
  #$E0,#$E1,#$E2,#$E3,#$E4,#$E5,#$E6,#$E7,#$E8,#$E9,#$EA,#$EB,#$EC,#$ED,#$EE,#$EF,
  #$F0,#$F1,#$F2,#$F3,#$F4,#$F5,#$F6,#$F7,#$F8,#$F9,#$FA,#$FB,#$FC,#$FD,#$FE,#$FF);

function PosPChar(subStr       : pchar;
                 str          : pchar;
                 subStrLen    : cardinal = 0;   // 0 -> StrLen is called internally
                 strLen       : cardinal = 0;
                 ignoreCase   : boolean  = false;
                 fromPos      : cardinal = 0;
                 toPos        : cardinal = high(cardinal)) : integer;

 function GetPCharLen(const pc: pchar) : cardinal; assembler;
 asm
   MOV     EDX,EDI
   MOV     EDI,EAX
   MOV     ECX,0FFFFFFFFH
   XOR     AL,AL
   REPNE   SCASB
   MOV     EAX,0FFFFFFFEH
   SUB     EAX,ECX
   MOV     EDI,EDX
 end;

var pc1, pc2, pc3, pc4, pc5, pc6 : pchar;
   c1                           : cardinal;
   ch1                          : char;
begin
 result := -1;
 if (subStr <> nil) and (subStr^ <> #0) and (str <> nil) and (str^ <> #0) then begin
   if subStrLen = 0 then subStrLen := GetPCharLen(subStr);
   if    strLen = 0 then    strLen := GetPCharLen(   str);
   dec(subStrLen);
   if strLen >= subStrLen then begin
     c1 := strLen - subStrLen;
     if ignoreCase then ch1 := lowCharTable[subStr^]
     else               ch1 :=              subStr^;
     if fromPos > toPos then begin
       if toPos <= c1 then begin
         if fromPos > c1 then fromPos := c1;
         pc1 := str + fromPos;
         pc2 := str +   toPos;
         pc3 := subStr + 1;
         pc4 := subStr + subStrLen;
         pc6 := pc3;
         if ignoreCase then begin
           while pc1 >= pc2 do
             if lowCharTable[pc1^] = ch1 then begin
               inc(pc1);
               pc5 := pc1;
               while (pc3 <= pc4) and (lowCharTable[pc1^] = lowCharTable[pc3^]) do begin
                 inc(pc1); inc(pc3);
               end;
               if pc3 > pc4 then begin
                 result := pc5 - pchar(str) - 1;
                 break;
               end;
               pc3 := pc6;
               pc1 := pc5 - 2;
             end else dec(pc1);
         end else
           while pc1 >= pc2 do
             if pc1^ = ch1 then begin
               inc(pc1);
               pc5 := pc1;
               while (pc3 <= pc4) and (pc1^ = pc3^) do begin
                 inc(pc1); inc(pc3);
               end;
               if pc3 > pc4 then begin
                 result := pc5 - pchar(str) - 1;
                 break;
               end;
               pc3 := pc6;
               pc1 := pc5 - 2;
             end else dec(pc1);
       end;
     end else
       if fromPos <= c1 then begin
         if toPos > c1 then toPos := c1;
         pc1 := str + fromPos;
         pc2 := str +   toPos;
         pc3 := subStr + 1;
         pc4 := subStr + subStrLen;
         pc6 := pc3;
         if ignoreCase then begin
           while pc1 <= pc2 do
             if lowCharTable[pc1^] = ch1 then begin
               inc(pc1);
               pc5 := pc1;
               while (pc3 <= pc4) and (lowCharTable[pc1^] = lowCharTable[pc3^]) do begin
                 inc(pc1); inc(pc3);
               end;
               if pc3 > pc4 then begin
                 result := pc5 - pchar(str) - 1;
                 break;
               end;
               pc3 := pc6;
               pc1 := pc5;
             end else inc(pc1);
         end else
           while pc1 <= pc2 do
             if pc1^ = ch1 then begin
               inc(pc1);
               pc5 := pc1;
               while (pc3 <= pc4) and (pc1^ = pc3^) do begin
                 inc(pc1); inc(pc3);
               end;
               if pc3 > pc4 then begin
                 result := pc5 - pchar(str) - 1;
                 break;
               end;
               pc3 := pc6;
               pc1 := pc5;
             end else inc(pc1);
       end;
   end;
 end;
end;

function StrCount(const AFile, AStr: string): Integer;
var
  fh, map: Cardinal;
  bufsize: Cardinal;
  bufpos: Cardinal;
  buf : Pointer;
  srch: PChar;
  srchLen: Integer;
begin
   Result:=-1;
   fh := CreateFile(PChar(AFile), GENERIC_READ, 0, nil, OPEN_EXISTING, 0, 0);
   if fh <> INVALID_HANDLE_VALUE then
   try
      map := CreateFileMapping(fh, nil, PAGE_READONLY, 0, 0, nil);
      if map <> 0 then
      try
         buf := MapViewOfFile(map, FILE_MAP_READ, 0, 0, 0);
         if buf <> nil then
         try
            bufpos:=0;
            Result:=0;
            srch:=PChar(AStr);
            srchLen:=Length(AStr);
            bufsize:=GetFileSize(fh, nil);
            repeat
              bufpos:=PosPChar(srch, buf, srchLen, bufsize, false, bufpos+1, bufsize);
              if bufpos < maxdword then
                 Inc(Result);
            until bufpos = maxdword;
         finally
            UnmapViewOfFile(buf);
         end;
      finally
         CloseHandle(map);
      end;
   finally
      CloseHandle(fh);
   end;
end;

procedure TForm1.Button1Click(Sender: TObject);
var ctime: Cardinal;
    cnt: Integer;
begin
     with TOpenDialog.Create(nil) do
     try
        if Execute then
        begin
          ctime:=GetTickCount;
          cnt:=StrCount(FileName, sLineBreak); // replace sLineBreak with your searched string
          ctime:=GetTickCount - ctime;
          ShowMessage(Format('Count: %d; Time: %dms', [cnt, ctime]));
        end;
     finally
        Free;
     end;
end;

end.
0
 
LVL 12

Expert Comment

by:esoftbg
ID: 13450243
> for string search it uses madshi's PosPChar which can be found in madBasic
Is that true that you wrote above ????
I do not use madshi's PosPChar nor any madshi's code at all !
0
 
LVL 6

Expert Comment

by:pritaeas
ID: 13450399
This should do the trick.

This function opens a file as a TFileStream. Each character is read. A word is seperated by delimiters like space, comma etc. An output parameter (AResults) is an array of integers containing the occurence of each word in AWords. AWords[0] correspondents with AResults[0].

Pritaeas

Code:

type
  TResultsArray = array of Integer;

function Occurence (const AFileName: String; AWords: TStrings; var AResults: TResultsArray): Boolean;
const
  delimiters: set of char = [#8..#32,',','.',';',':','''','"'];
var
  F: TFileStream;
  c: Char;
  s: String;
  i: Int64;
begin
  Result := False;
  if not(FileExists (AFileName)) then
    Exit;

  SetLength (AResults, AWords.Count);
  i := 0;
  while i < AWords.Count do
  begin
    AResults[i] := 0;
    inc (i);
  end;

  F := TFileStream.Create (AFileName, fmOpenRead);
  try
    F.Read(c,1);
    while F.Position < F.Size do
    begin
      while (c in delimiters) and
            (F.Position < F.Size) do
        F.Read(c,1);

      i := F.Position-1;
      while not(c in delimiters) and
            (F.Position < F.Size) do
        F.Read(c,1);

      if F.Position < F.Size then
      begin
        SetLength (s, F.Position - i - 1);
        F.Position := i;
        F.Read (Pointer(s)^, length(s));

        i := AWords.IndexOf(s);
        if i <> -1 then
          inc (AResults[i]);
      end;
    end;

    Result := True;
  finally
    F.Free;
  end;
end;


procedure TForm1.Button1Click(Sender: TObject);
var
  sl: TResultsArray;
  i,m: Integer;
begin
  try
    Occurence ('test.txt', ComboBox1.Items, sl);
  finally
    m := 0;
    for i := Low (sl) to High (sl) do
    begin
      if sl[i] > m then
      begin
        m := sl[i];
        Combobox1.ItemIndex := i;
      end;
    end;

    SetLength (sl, 0);
  end;
end;
0
 
LVL 17

Expert Comment

by:TheRealLoki
ID: 13451820
just mod my code so that FullText is the text from the file,

then simply go if combobox
begin
     oneword := GetNextDelim(FullText);
     if Combobox1.Items.IndexOf(oneword) <> -1 then
       ResultList.Values[oneword] := IntToStr(StrToIntDef(ResultList.Values[oneword], 0) + 1);

that way you are only getting words in Fulltext (from your file) that are in the combobox
0
 

Expert Comment

by:_JHL_
ID: 13496740
Hi.... any of the comments are OK??
0
 

Author Comment

by:rincewind666
ID: 14531600
Please accept my sincere apologies for this oversight. I thought I closed this. I'm afraid I must be getting absent minded in my old age...
0
 

Accepted Solution

by:
modulo earned 0 total points
ID: 14560768
PAQed with points refunded (500)

modulo
Community Support Moderator
0
 
LVL 17

Expert Comment

by:TheRealLoki
ID: 14566035
I don't understand. Were none of the answers right?
0

Featured Post

Enroll in August's Course of the Month

August's CompTIA IT Fundamentals course includes 19 hours of basic computer principle modules and prepares you for the certification exam. It's free for Premium Members, Team Accounts, and Qualified Experts!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In this tutorial I will show you how to use the Windows Speech API in Delphi. I will only cover basic functions such as text to speech and controlling the speed of the speech. SAPI Installation First you need to install the SAPI type library, th…
Introduction Raise your hands if you were as upset with FireMonkey as I was when I discovered that there was no TListview.  I use TListView in almost all of my applications I've written, and I was not going to compromise by resorting to TStringGrid…
In this video we outline the Physical Segments view of NetCrunch network monitor. By following this brief how-to video, you will be able to learn how NetCrunch visualizes your network, how granular is the information collected, as well as where to f…
In this video, Percona Director of Solution Engineering Jon Tobin discusses the function and features of Percona Server for MongoDB. How Percona can help Percona can help you determine if Percona Server for MongoDB is the right solution for …
Suggested Courses
Course of the Month11 days, 1 hour left to enroll

770 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question