• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 288
  • Last Modified:

Getting number of times word appears in text file (500 points)

I want to search a text file for certain words which appear in a combobox (which are different every time the program is run).

I need to discover what word appears most in the text file. Example: "yellow" could be mentioned 15 times, "pink" could be mentioned 10 times and "blue" could be mentioned 25 times. In this case, I would want "blue" to be shown in Edit1.Text.

I am using Delph 6.  I am giving 500 points for this urgent question. Many thanks for your help.
0
rincewind666
Asked:
rincewind666
  • 3
  • 2
  • 2
  • +5
1 Solution
 
_JHL_Commented:
Hi

You need to create a parser for your Edit1, but first i have a q for you. You need to make a ranking of the words, or to search a particular word (any) but one per time

JHL.. (i`m working on you solution)
0
 
TheRealLokiSenior DeveloperCommented:
unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,
  StdCtrls, Grids;

type
  TForm1 = class(TForm)
    Button1: TButton;
    ComboBox1: TComboBox;
    Memo1: TMemo;
    Edit1: TEdit;
    procedure Button1Click(Sender: TObject);
    procedure FormCreate(Sender: TObject);
    procedure FormDestroy(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
    resultlist: TStringList;

  end;

var
  Form1: TForm1;
  function SortByValueNumber(List: TStringList; Index1, Index2: Integer): Integer;

IMPLEMENTATION

{$R *.DFM}
function GetNextDelim(var s: string): string;
    const
        delims: array[1..5] of char = (' ', '.', ',', '!', '?');
    var
        i: integer;
        temps: string;
        c: char;
    begin
        result := '';
        i := 1;
        temps := '';
        repeat
            c := s[1];
            if (c in [' ', '.', ',', '!', '?']) then
            begin
                result := temps;
            end
            else
            begin
                temps := temps + c;
            end;
            delete(s, 1, 1);
        until ( (result <> '') or (s = '') );
       
    end;

function SortByValueNumber(List: TStringList; Index1, Index2: Integer): Integer;
    begin
        if StrToIntDef(List.Values[List.Names[index1]], 0) > StrToIntDef(List.Values[List.Names[index2]], 0) then
          result := -1
        else if StrToIntDef(List.Values[List.Names[index1]], 0) < StrToIntDef(List.Values[List.Names[index2]], 0) then
          result := 1
        else
          result := 0;
    end;

procedure TForm1.Button1Click(Sender: TObject);
    var
        FullText: string;
        oneword: string;
    begin
       
        FullText := combobox1.items.Text;
        FullText := stringreplace(FullText, #13#10, ' ', [rfReplaceAll]); // convert all CR/LFs to spaces
        resultlist.Clear;
        while FullText <> '' do
        begin
            oneword := GetNextDelim(FullText);
            ResultList.Values[oneword] := IntToStr(StrToIntDef(ResultList.Values[oneword], 0) + 1);
        end;
        ResultList.CustomSort(SortByValueNumber);
        edit1.text := ResultList.Names[0];
        memo1.lines.assign(ResultList);

    end;

procedure TForm1.FormCreate(Sender: TObject);
    begin
        resultlist := TStringList.Create;
    end;

procedure TForm1.FormDestroy(Sender: TObject);
    begin
        resultlist.Clear;
        resultlist.Free;
    end;

end.



FORM
====

object Form1: TForm1
  Left = 274
  Top = 168
  Width = 519
  Height = 309
  Caption = 'Form1'
  Color = clBtnFace
  Font.Charset = DEFAULT_CHARSET
  Font.Color = clWindowText
  Font.Height = -11
  Font.Name = 'MS Sans Serif'
  Font.Style = []
  OldCreateOrder = False
  OnCreate = FormCreate
  OnDestroy = FormDestroy
  PixelsPerInch = 96
  TextHeight = 13
  object Button1: TButton
    Left = 16
    Top = 40
    Width = 75
    Height = 25
    Caption = 'Button1'
    TabOrder = 0
    OnClick = Button1Click
  end
  object ComboBox1: TComboBox
    Left = 16
    Top = 8
    Width = 145
    Height = 21
    ItemHeight = 13
    TabOrder = 1
    Text = 'ComboBox1'
    Items.Strings = (
      'the quick brown fox'
      'jumps over the'
      'lazy dog'
      'my dog has fleas')
  end
  object Memo1: TMemo
    Left = 176
    Top = 8
    Width = 305
    Height = 201
    Lines.Strings = (
      'Memo1')
    TabOrder = 2
  end
  object Edit1: TEdit
    Left = 16
    Top = 80
    Width = 121
    Height = 21
    TabOrder = 3
    Text = 'Edit1'
  end
end
0
 
ziolkoCommented:
this is typicall task  for use for BST (Binary Search Tree)

ziolko.
0
VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

 
rincewind666Author Commented:
TheRealLoki, I tried your code - very interesting but isn't what I need. I want to get the number of words that appear in a TEXT file that also appear in a combobox - not the number of times a word appears in a combobox.
0
 
esoftbgCommented:
download an fantastic example from:
page:        http://www.geocities.com/esoftbg/
  link:        Q_21335504.zip        Getting number of times word appears in text file (500 points)

unit Unit1_Q_21335504;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, StdCtrls, Grids, ValEdit;

type
  TForm1 = class(TForm)
    btn_Load_ComboBox: TButton;
    ComboBox: TComboBox;
    btn_Counting: TButton;
    Edit_Max_Count: TEdit;
    Edit_FileCombo: TEdit;
    ValueListEditor: TValueListEditor;
    procedure FormCreate(Sender: TObject);
    procedure btn_Load_ComboBoxClick(Sender: TObject);
    procedure btn_CountingClick(Sender: TObject);
    private
    { Private declarations }
    public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure TForm1.FormCreate(Sender: TObject);
begin
  btn_Load_ComboBoxClick(Self);
end;

function  Removing_extra_spaces_from_Text(S: string): string;
var
  PC0:    PChar;
  PC1:    PChar;
  PC2:    PChar;
begin
  if (Length(S)>1) then
  begin
    PC0 := @S[1];
    PC2 := PC0+Length(S);
    PC1 := PC0+1;
    while not (PC1>PC2) do
    begin
      if not ((PC0^=' ') and (PC1^=' ')) then
      begin
        Inc(PC0);
        PC0^ := PC1^;
      end;
      Inc(PC1);
    end;
    PC1 := @S[1];
    Result:=copy(S,1, PC0-PC1);
  end
  else
    Result:=S;
end;

procedure Get_All_Words_From_Text(S: string; ComboBox: TComboBox);
var
  BP:     Integer;
  EP:     Integer;
  I:      Integer;
  L:      Integer;
  T:      string;
  procedure Get_A_Word;
  var
    J:      Integer;
    N:      Integer;
  begin
    N := 0;
    T := S;
    for J := BP to EP do
      if (S[J] in [#65..#255]) then
      begin
        Inc(N);
        T[N] := S[J];
      end;
    SetLength(T, N);
    ComboBox.Items.Add(T);
  end;
begin
  ComboBox.Clear;
  BP := 1;
  L := Length(S);
  for I := 1 to L do
  begin
    if (S[I]=#32) then
    begin
      EP := I-1;
      Get_A_Word;
      BP := I+1;
    end;
  end;
end;

procedure TForm1.btn_Load_ComboBoxClick(Sender: TObject);
var
  SL:     TStringList;
  S:      string;
begin
  SL := TStringList.Create;
  try
    SL.LoadFromFile(Edit_FileCombo.Text);
    S := Trim(SL.Text);
    S := Trim(Removing_extra_spaces_from_Text(S));
    Get_All_Words_From_Text(S, ComboBox);
  finally
    SL.Free;
  end;
end;

procedure Add_Words_into_ValueListEditor(SS: TStrings; VLEditor: TValueListEditor; Edit: TEdit);
var
  C:      Integer;
  I:      Integer;
  M:      Integer;
  R:      Integer;
  S:      string;
begin
  M := 0;
  S := '';
  VLEditor.Strings.Clear;
  for I := 0 to SS.Count-1 do
  begin
    if VLEditor.FindRow(SS[I], R) then
    begin
      C := StrToInt(VLEditor.Cells[1,R]);
      Inc(C);
      if (C>M) then
      begin
        S := SS[I];
        M := C;
      end;
      VLEditor.Cells[1,R] := IntToStr(C);
    end
    else
      VLEditor.InsertRow(SS[I], '1', True);
  end;
  Edit.Text := S + ' = ' + IntToStr(M);
end;

procedure TForm1.btn_CountingClick(Sender: TObject);
begin
  Add_Words_into_ValueListEditor(ComboBox.Items, ValueListEditor, Edit_Max_Count);
end;

end.
0
 
Lee_NoverCommented:
fantastic example ? coding conventions ? and what about speed ?
the code bellow uses memory mapped files to open a file
this example show searching for one string
you can loop through your combobox and search for the current string
for string search it uses madshi's PosPChar which can be found in madBasic - www.madshi.net
(madshi: hope it's ok ? otherwise a moderator can delete your code part)

the code:

try it on a few MB file :-D

unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, StdCtrls;

type
  TForm1 = class(TForm)
    Button1: TButton;
    procedure Button1Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

// from madBasic - www.madshi.net
var lowCharTable : array [#0..#$FF] of char =
 (#$00,#$01,#$02,#$03,#$04,#$05,#$06,#$07,#$08,#$09,#$0A,#$0B,#$0C,#$0D,#$0E,#$0F,
  #$10,#$11,#$12,#$13,#$14,#$15,#$16,#$17,#$18,#$19,#$1A,#$1B,#$1C,#$1D,#$1E,#$1F,
  #$20,#$21,#$22,#$23,#$24,#$25,#$26,#$27,#$28,#$29,#$2A,#$2B,#$2C,#$2D,#$2E,#$2F,
  #$30,#$31,#$32,#$33,#$34,#$35,#$36,#$37,#$38,#$39,#$3A,#$3B,#$3C,#$3D,#$3E,#$3F,
  #$40,#$61,#$62,#$63,#$64,#$65,#$66,#$67,#$68,#$69,#$6A,#$6B,#$6C,#$6D,#$6E,#$6F,
  #$70,#$71,#$72,#$73,#$74,#$75,#$76,#$77,#$78,#$79,#$7A,#$5B,#$5C,#$5D,#$5E,#$5F,
  #$60,#$61,#$62,#$63,#$64,#$65,#$66,#$67,#$68,#$69,#$6A,#$6B,#$6C,#$6D,#$6E,#$6F,
  #$70,#$71,#$72,#$73,#$74,#$75,#$76,#$77,#$78,#$79,#$7A,#$7B,#$7C,#$7D,#$7E,#$7F,
  #$80,#$81,#$82,#$83,#$84,#$85,#$86,#$87,#$88,#$89,#$9A,#$8B,#$9C,#$8D,#$9E,#$8F,
  #$90,#$91,#$92,#$93,#$94,#$95,#$96,#$97,#$98,#$99,#$9A,#$9B,#$9C,#$9D,#$9E,#$FF,
  #$A0,#$A1,#$A2,#$A3,#$A4,#$A5,#$A6,#$A7,#$A8,#$A9,#$AA,#$AB,#$AC,#$AD,#$AE,#$AF,
  #$B0,#$B1,#$B2,#$B3,#$B4,#$B5,#$B6,#$B7,#$B8,#$B9,#$BA,#$BB,#$BC,#$BD,#$BE,#$BF,
  #$E0,#$E1,#$E2,#$E3,#$E4,#$E5,#$E6,#$E7,#$E8,#$E9,#$EA,#$EB,#$EC,#$ED,#$EE,#$EF,
  #$F0,#$F1,#$F2,#$F3,#$F4,#$F5,#$F6,#$D7,#$F8,#$F9,#$FA,#$FB,#$FC,#$FD,#$FE,#$DF,
  #$E0,#$E1,#$E2,#$E3,#$E4,#$E5,#$E6,#$E7,#$E8,#$E9,#$EA,#$EB,#$EC,#$ED,#$EE,#$EF,
  #$F0,#$F1,#$F2,#$F3,#$F4,#$F5,#$F6,#$F7,#$F8,#$F9,#$FA,#$FB,#$FC,#$FD,#$FE,#$FF);

function PosPChar(subStr       : pchar;
                 str          : pchar;
                 subStrLen    : cardinal = 0;   // 0 -> StrLen is called internally
                 strLen       : cardinal = 0;
                 ignoreCase   : boolean  = false;
                 fromPos      : cardinal = 0;
                 toPos        : cardinal = high(cardinal)) : integer;

 function GetPCharLen(const pc: pchar) : cardinal; assembler;
 asm
   MOV     EDX,EDI
   MOV     EDI,EAX
   MOV     ECX,0FFFFFFFFH
   XOR     AL,AL
   REPNE   SCASB
   MOV     EAX,0FFFFFFFEH
   SUB     EAX,ECX
   MOV     EDI,EDX
 end;

var pc1, pc2, pc3, pc4, pc5, pc6 : pchar;
   c1                           : cardinal;
   ch1                          : char;
begin
 result := -1;
 if (subStr <> nil) and (subStr^ <> #0) and (str <> nil) and (str^ <> #0) then begin
   if subStrLen = 0 then subStrLen := GetPCharLen(subStr);
   if    strLen = 0 then    strLen := GetPCharLen(   str);
   dec(subStrLen);
   if strLen >= subStrLen then begin
     c1 := strLen - subStrLen;
     if ignoreCase then ch1 := lowCharTable[subStr^]
     else               ch1 :=              subStr^;
     if fromPos > toPos then begin
       if toPos <= c1 then begin
         if fromPos > c1 then fromPos := c1;
         pc1 := str + fromPos;
         pc2 := str +   toPos;
         pc3 := subStr + 1;
         pc4 := subStr + subStrLen;
         pc6 := pc3;
         if ignoreCase then begin
           while pc1 >= pc2 do
             if lowCharTable[pc1^] = ch1 then begin
               inc(pc1);
               pc5 := pc1;
               while (pc3 <= pc4) and (lowCharTable[pc1^] = lowCharTable[pc3^]) do begin
                 inc(pc1); inc(pc3);
               end;
               if pc3 > pc4 then begin
                 result := pc5 - pchar(str) - 1;
                 break;
               end;
               pc3 := pc6;
               pc1 := pc5 - 2;
             end else dec(pc1);
         end else
           while pc1 >= pc2 do
             if pc1^ = ch1 then begin
               inc(pc1);
               pc5 := pc1;
               while (pc3 <= pc4) and (pc1^ = pc3^) do begin
                 inc(pc1); inc(pc3);
               end;
               if pc3 > pc4 then begin
                 result := pc5 - pchar(str) - 1;
                 break;
               end;
               pc3 := pc6;
               pc1 := pc5 - 2;
             end else dec(pc1);
       end;
     end else
       if fromPos <= c1 then begin
         if toPos > c1 then toPos := c1;
         pc1 := str + fromPos;
         pc2 := str +   toPos;
         pc3 := subStr + 1;
         pc4 := subStr + subStrLen;
         pc6 := pc3;
         if ignoreCase then begin
           while pc1 <= pc2 do
             if lowCharTable[pc1^] = ch1 then begin
               inc(pc1);
               pc5 := pc1;
               while (pc3 <= pc4) and (lowCharTable[pc1^] = lowCharTable[pc3^]) do begin
                 inc(pc1); inc(pc3);
               end;
               if pc3 > pc4 then begin
                 result := pc5 - pchar(str) - 1;
                 break;
               end;
               pc3 := pc6;
               pc1 := pc5;
             end else inc(pc1);
         end else
           while pc1 <= pc2 do
             if pc1^ = ch1 then begin
               inc(pc1);
               pc5 := pc1;
               while (pc3 <= pc4) and (pc1^ = pc3^) do begin
                 inc(pc1); inc(pc3);
               end;
               if pc3 > pc4 then begin
                 result := pc5 - pchar(str) - 1;
                 break;
               end;
               pc3 := pc6;
               pc1 := pc5;
             end else inc(pc1);
       end;
   end;
 end;
end;

function StrCount(const AFile, AStr: string): Integer;
var
  fh, map: Cardinal;
  bufsize: Cardinal;
  bufpos: Cardinal;
  buf : Pointer;
  srch: PChar;
  srchLen: Integer;
begin
   Result:=-1;
   fh := CreateFile(PChar(AFile), GENERIC_READ, 0, nil, OPEN_EXISTING, 0, 0);
   if fh <> INVALID_HANDLE_VALUE then
   try
      map := CreateFileMapping(fh, nil, PAGE_READONLY, 0, 0, nil);
      if map <> 0 then
      try
         buf := MapViewOfFile(map, FILE_MAP_READ, 0, 0, 0);
         if buf <> nil then
         try
            bufpos:=0;
            Result:=0;
            srch:=PChar(AStr);
            srchLen:=Length(AStr);
            bufsize:=GetFileSize(fh, nil);
            repeat
              bufpos:=PosPChar(srch, buf, srchLen, bufsize, false, bufpos+1, bufsize);
              if bufpos < maxdword then
                 Inc(Result);
            until bufpos = maxdword;
         finally
            UnmapViewOfFile(buf);
         end;
      finally
         CloseHandle(map);
      end;
   finally
      CloseHandle(fh);
   end;
end;

procedure TForm1.Button1Click(Sender: TObject);
var ctime: Cardinal;
    cnt: Integer;
begin
     with TOpenDialog.Create(nil) do
     try
        if Execute then
        begin
          ctime:=GetTickCount;
          cnt:=StrCount(FileName, sLineBreak); // replace sLineBreak with your searched string
          ctime:=GetTickCount - ctime;
          ShowMessage(Format('Count: %d; Time: %dms', [cnt, ctime]));
        end;
     finally
        Free;
     end;
end;

end.
0
 
esoftbgCommented:
> for string search it uses madshi's PosPChar which can be found in madBasic
Is that true that you wrote above ????
I do not use madshi's PosPChar nor any madshi's code at all !
0
 
pritaeasSoftware EngineerCommented:
This should do the trick.

This function opens a file as a TFileStream. Each character is read. A word is seperated by delimiters like space, comma etc. An output parameter (AResults) is an array of integers containing the occurence of each word in AWords. AWords[0] correspondents with AResults[0].

Pritaeas

Code:

type
  TResultsArray = array of Integer;

function Occurence (const AFileName: String; AWords: TStrings; var AResults: TResultsArray): Boolean;
const
  delimiters: set of char = [#8..#32,',','.',';',':','''','"'];
var
  F: TFileStream;
  c: Char;
  s: String;
  i: Int64;
begin
  Result := False;
  if not(FileExists (AFileName)) then
    Exit;

  SetLength (AResults, AWords.Count);
  i := 0;
  while i < AWords.Count do
  begin
    AResults[i] := 0;
    inc (i);
  end;

  F := TFileStream.Create (AFileName, fmOpenRead);
  try
    F.Read(c,1);
    while F.Position < F.Size do
    begin
      while (c in delimiters) and
            (F.Position < F.Size) do
        F.Read(c,1);

      i := F.Position-1;
      while not(c in delimiters) and
            (F.Position < F.Size) do
        F.Read(c,1);

      if F.Position < F.Size then
      begin
        SetLength (s, F.Position - i - 1);
        F.Position := i;
        F.Read (Pointer(s)^, length(s));

        i := AWords.IndexOf(s);
        if i <> -1 then
          inc (AResults[i]);
      end;
    end;

    Result := True;
  finally
    F.Free;
  end;
end;


procedure TForm1.Button1Click(Sender: TObject);
var
  sl: TResultsArray;
  i,m: Integer;
begin
  try
    Occurence ('test.txt', ComboBox1.Items, sl);
  finally
    m := 0;
    for i := Low (sl) to High (sl) do
    begin
      if sl[i] > m then
      begin
        m := sl[i];
        Combobox1.ItemIndex := i;
      end;
    end;

    SetLength (sl, 0);
  end;
end;
0
 
TheRealLokiSenior DeveloperCommented:
just mod my code so that FullText is the text from the file,

then simply go if combobox
begin
     oneword := GetNextDelim(FullText);
     if Combobox1.Items.IndexOf(oneword) <> -1 then
       ResultList.Values[oneword] := IntToStr(StrToIntDef(ResultList.Values[oneword], 0) + 1);

that way you are only getting words in Fulltext (from your file) that are in the combobox
0
 
_JHL_Commented:
Hi.... any of the comments are OK??
0
 
rincewind666Author Commented:
Please accept my sincere apologies for this oversight. I thought I closed this. I'm afraid I must be getting absent minded in my old age...
0
 
moduloCommented:
PAQed with points refunded (500)

modulo
Community Support Moderator
0
 
TheRealLokiSenior DeveloperCommented:
I don't understand. Were none of the answers right?
0

Featured Post

Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

  • 3
  • 2
  • 2
  • +5
Tackle projects and never again get stuck behind a technical roadblock.
Join Now