tankergoblin


How to count words from a file

I want to read the content of a file.
The content may contain ",", spaces and ".".

How do I count the words in the file?

Say I have this:

Hi! Want to tell you a story:
There once was a bear, it lived in a forest and the bear love to eat meat. He has a friend named Sticky.  When she came, the bear went to the beach and danced with some noodles on his head in a forest happily ever after.

Story end? Yes.
Geert G

Of course, you can change the cDelimiters set :)

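function WordCount(const pFileName: String): Integer;
// Requires: uses Classes, SysUtils;
const
  // Characters that end a word; everything else is part of a word
  cDelimiters = [#0..#31,' ','.',',','?','(',')','[',']','\','/'];
var
  fFile: TFileStream;
  vBuffer: array[0..1023] of AnsiChar; // one byte per character, as stored in the file
  vWord: Boolean;
  vi, vBufSize: Integer;
begin
  result := 0;
  if not FileExists(pFileName) then exit;
  try
    fFile := TFileStream.Create(pFileName, fmOpenRead);
  except
    exit; // Access denied or other error opening the file
  end;
  try
    // Read the file in 1 KB chunks and count words as we go
    vWord := false;
    while fFile.Position < fFile.Size do
    begin
      vBufSize := fFile.Read(vBuffer, SizeOf(vBuffer));
      // Only indexes 0..vBufSize-1 hold data read from the file
      for vi := 0 to vBufSize - 1 do
      begin
        if not (vBuffer[vi] in cDelimiters) then
          vWord := true
        else begin
          if vWord then
            inc(result); // a delimiter ends the current word
          vWord := false;
        end;
      end;
    end;
    // Count the last word if the file does not end with a delimiter
    if vWord then
      inc(result);
  finally
    FreeAndNil(fFile);
  end;
end;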


tankergoblin

ASKER

500 points for 2 links?

I did it as below.

The program below works with a space-separated example:

how do you go

but if I add ? or !, for example:
hi! where are you going?
the above does not work with my word count program.



// Requires: uses Classes, SysUtils, Dialogs;
var
 FileWordList, FilenameList: TStringList;
 OpenDialog: TOpenDialog;
 Filename: string;
 i: integer;
begin
 FilenameList := TStringList.Create;
 FileWordList := TStringList.Create;

 OpenDialog := TOpenDialog.Create(self);
 if OpenDialog.Execute then
 begin
  Filename := OpenDialog.Filename;
  FilenameList.LoadFromFile(Filename);
 end;
 OpenDialog.Free;

 for i := 0 to FilenameList.Count - 1 do
 begin
  FileWordList.DelimitedText := FilenameList[i];
 end;
 ShowMessage(IntToStr(FileWordList.Count));

 FileWordList.Free;
 FilenameList.Free;
end;


Sorry, I meant *below.
Loading files into a TStringList is OK, but only for small files. With a large file it will be inefficient.
I prefer using file streams as in my example, where the file is read chunk by chunk during counting. In my opinion this is much faster than TStringList.LoadFromFile on large files.
Also, say I have two lines:

how are you
going to city

it will only count the second line.
How do I fix this?
>>500 points for 2 links?
I thought the second link was exactly what you needed.

And do you really expect me to copy all the code in here?
I could of course write my own interpretation, but why should I reinvent the wheel all over again?

Are you saying that when you start a totally new application,
you don't use any code from your previous applications?

So you basically reinvent the wheel every time?

There is nothing wrong with providing a link to very good code,
and very well documented too...
What about memory usage?
I think with a TStringList you store everything in one object, so you don't have to read the file every time you need it.
I think this makes it faster.
Furthermore, the code is easier to write and shorter.
It's just that I can only process the last line.
How do I fix this?
Below is your code, changed so that it counts all rows, but I think it is still wrong, because text like:
how are you.Going to city

will be counted as 5 words: for it, "you.Going" is one word, and you can't do anything about that because a TStringList can have only one delimiter char.
The sample I gave you a few posts earlier handles that case.

// Requires: uses Classes, SysUtils, Dialogs;
var
 FileWordList, FilenameList: TStringList;
 OpenDialog: TOpenDialog;
 Filename: string;
 i: integer;
 vResult: Integer;
begin
 FilenameList := TStringList.Create;
 FileWordList := TStringList.Create;
 try
  OpenDialog := TOpenDialog.Create(self);
  try
   if OpenDialog.Execute then
   begin
    Filename := OpenDialog.Filename;
    FilenameList.LoadFromFile(Filename);
   end;
  finally
   OpenDialog.Free;
  end;

  vResult := 0;
  for i := 0 to FilenameList.Count - 1 do
  begin
   FileWordList.DelimitedText := FilenameList[i];
   // Add this line's count instead of keeping only the last line's
   vResult := vResult + FileWordList.Count;
  end;
  ShowMessage(IntToStr(vResult));
 finally
  FileWordList.Free;
  FilenameList.Free;
 end;
end;
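A minimal sketch to check the "you.Going" behaviour described above in isolation, assuming a console program built with Delphi 2006 or later or with Free Pascal in Delphi mode; the program and the DelimitedCount function are made-up names for this example. It relies on the default TStringList settings (Delimiter = ',' and StrictDelimiter = False), under which spaces and control characters also split items but '.' does not.

```pascal
program DelimitedTextDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Classes;

// Splits one line with TStringList.DelimitedText (default settings),
// prints each item and returns the number of items
function DelimitedCount(const Line: string): Integer;
var
  Words: TStringList;
  I: Integer;
begin
  Words := TStringList.Create;
  try
    // Defaults: Delimiter = ',' and StrictDelimiter = False, so spaces and
    // control characters also split items, but '.' does not
    Words.DelimitedText := Line;
    for I := 0 to Words.Count - 1 do
      Writeln('  [', Words[I], ']');  // "you.Going" is printed as one item
    Result := Words.Count;
  finally
    Words.Free;
  end;
end;

begin
  Writeln(DelimitedCount('how are you.Going to city')); // 5
end.
```

The program should list five items, with you.Going as a single one, followed by the count 5.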
Also, you store a bunch of characters in an array, which takes up space.
You can modify the cDelimiters set yourself.
Chars #0..#31 are non-printable characters, such as:
#10 - line feed (end of line)
#13 - carriage return
#8 - Tabulator
etc.
The set also contains ' ', which is the space character. You can put any characters you like in there.
I wouldn't remove #8, #10, #13, ' ', '.' or ',', but it is your choice :)
#8 is backspace
#9 is tab

Just so you know...
My mistake, sorry.
dprochownik:
I get your point,
and I have tried your code and mine, as below:

Inc(vResult, FileWordList.Count); which I think is equivalent to your
vResult := vResult + FileWordList.Count;

However, as you said earlier, if I put my string as

"how are you.In good condition"

it counts 5 words instead of 6.
How do I solve this?


Set the dot '.' as a delimiter: add it to the cDelimiters set. If you do so, the algorithm will treat dots the same as spaces, and 'you.In' will be two words, not one.
That's why I advise you to add as many characters as possible to the cDelimiters set, such as '(', ')', '?' etc., because there is always a chance that someone won't put a space after '?'.
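A small sketch of the effect of adding '.' to the delimiter set, run on the string from the question. It works on an in-memory string rather than a file stream so it is easy to try, and it assumes a console program in Delphi or Free Pascal (Delphi mode); TCharSet, IsDelimiter, CountWords and the two constant sets are hypothetical names chosen for this example. The scanning idea is the same as in WordCount: a run of non-delimiter characters is one word.

```pascal
program DotDelimiterDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TCharSet = set of AnsiChar;

const
  // Control characters and space only: "you.In" stays a single word
  cWithoutDot: TCharSet = [#0..#31, ' '];
  // The same set plus punctuation, so '.' separates words as well
  cWithDot: TCharSet = [#0..#31, ' ', '.', ',', '?', '!', ':', ';', '(', ')'];

// Characters above #255 cannot be members of a set of AnsiChar,
// so they are treated as word characters here
function IsDelimiter(C: Char; const Delims: TCharSet): Boolean;
begin
  Result := (Ord(C) <= 255) and (AnsiChar(Ord(C)) in Delims);
end;

// Counts a word at its first non-delimiter character, so a final
// word without a trailing delimiter is counted too
function CountWords(const S: string; const Delims: TCharSet): Integer;
var
  I: Integer;
  InWord: Boolean;
begin
  Result := 0;
  InWord := False;
  for I := 1 to Length(S) do
    if IsDelimiter(S[I], Delims) then
      InWord := False
    else if not InWord then
    begin
      InWord := True;
      Inc(Result);
    end;
end;

begin
  Writeln(CountWords('how are you.In good condition', cWithoutDot)); // 5
  Writeln(CountWords('how are you.In good condition', cWithDot));    // 6
end.
```

Counting at the start of a word, rather than at the delimiter after it, is one way to avoid needing a separate check for the last word.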
>>tankergoblin
Also, you store a bunch of characters in an array, which takes up space.

What is wrong with that?
Of course, you can use another approach, as in the code below. You can declare a set of characters that may be used in words, and all other characters will be treated as delimiters. But be aware that this is VERY DANGEROUS, because you have to declare the set of all allowed characters, which can be hard to do if your app will be used on text written with a different keyboard language than yours.
With this approach, if the source text can be written in a language other than English, you have to add all national characters to the cAllowed set too.

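function WordCount(const pFileName: String): Integer;
// Requires: uses Classes, SysUtils;
const
  // Characters that may appear inside a word; everything else is a delimiter
  cAllowed = ['a'..'z','A'..'Z','-'];
var
  fFile: TFileStream;
  vBuffer: array[0..1023] of AnsiChar; // one byte per character, as stored in the file
  vWord: Boolean;
  vi, vBufSize: Integer;
begin
  result := 0;
  if not FileExists(pFileName) then exit;
  try
    fFile := TFileStream.Create(pFileName, fmOpenRead);
  except
    exit; // Access denied or other error opening the file
  end;
  try
    // Read the file in 1 KB chunks and count words as we go
    vWord := false;
    while fFile.Position < fFile.Size do
    begin
      vBufSize := fFile.Read(vBuffer, SizeOf(vBuffer));
      // Only indexes 0..vBufSize-1 hold data read from the file
      for vi := 0 to vBufSize - 1 do
      begin
        if vBuffer[vi] in cAllowed then
          vWord := true
        else begin
          if vWord then
            inc(result); // a non-allowed character ends the current word
          vWord := false;
        end;
      end;
    end;
    // Count the last word if the file ends with an allowed character
    if vWord then
      inc(result);
  finally
    FreeAndNil(fFile);
  end;
end;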

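To illustrate the risk of the allowed-set approach with plain ASCII text, here is a sketch comparing the allowed-set (whitelist) test with a delimiter-set (blacklist) test on a contraction. It assumes a console program in Delphi or Free Pascal (Delphi mode); cAllowed is copied from the code above, while TCharSet, InSet, CountWords and the cut-down cDelimiters set are hypothetical names for this example.

```pascal
program AllowedVsDelimitersDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TCharSet = set of AnsiChar;

const
  // Whitelist: only these characters can be part of a word
  cAllowed: TCharSet = ['a'..'z', 'A'..'Z', '-'];
  // Blacklist: only these characters end a word
  cDelimiters: TCharSet = [#0..#31, ' ', '.', ',', '?', '!'];

function InSet(C: Char; const CharSet: TCharSet): Boolean;
begin
  // Characters above #255 cannot be in a set of AnsiChar
  Result := (Ord(C) <= 255) and (AnsiChar(Ord(C)) in CharSet);
end;

// Counts runs of word characters. UseAllowed selects the whitelist
// test; otherwise the blacklist test is used.
function CountWords(const S: string; UseAllowed: Boolean): Integer;
var
  I: Integer;
  InWord, IsWordChar: Boolean;
begin
  Result := 0;
  InWord := False;
  for I := 1 to Length(S) do
  begin
    if UseAllowed then
      IsWordChar := InSet(S[I], cAllowed)
    else
      IsWordChar := not InSet(S[I], cDelimiters);
    if IsWordChar and not InWord then
      Inc(Result); // first character of a new word
    InWord := IsWordChar;
  end;
end;

begin
  // The apostrophe is not in cAllowed, so "don't" splits into "don" and "t"
  Writeln(CountWords('don''t stop', True));  // 3
  Writeln(CountWords('don''t stop', False)); // 2
end.
```

Accented letters behave the same way: with the whitelist, any character missing from cAllowed silently breaks a word in two.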

How can I apply cDelimiters in my code?
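One possible way, sketched under assumptions: keep the TStringList for reading the lines, but count each line with the delimiter set instead of DelimitedText, and add each line's count to the total. It assumes a console program in Delphi or Free Pascal (Delphi mode); CountWordsInLine, CountWordsInLines and TCharSet are hypothetical names, and assigning FileLines.Text stands in for FileLines.LoadFromFile(Filename) so the example runs without a file dialog.

```pascal
program StringListDelimiterDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Classes;

type
  TCharSet = set of AnsiChar;

const
  // Delimiter set in the style of the first answer, plus '!', ':' and ';'
  cDelimiters: TCharSet = [#0..#31, ' ', '.', ',', '?', '!', ':', ';',
    '(', ')', '[', ']', '\', '/'];

function IsDelimiter(C: Char): Boolean;
begin
  // Characters above #255 cannot be in a set of AnsiChar
  Result := (Ord(C) <= 255) and (AnsiChar(Ord(C)) in cDelimiters);
end;

// Counts words in one line, starting a new word after any delimiter
function CountWordsInLine(const S: string): Integer;
var
  I: Integer;
  InWord: Boolean;
begin
  Result := 0;
  InWord := False;
  for I := 1 to Length(S) do
    if IsDelimiter(S[I]) then
      InWord := False
    else if not InWord then
    begin
      InWord := True;
      Inc(Result);
    end;
end;

// Sums the per-line counts instead of keeping only the last line's
function CountWordsInLines(Lines: TStrings): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 0 to Lines.Count - 1 do
    Inc(Result, CountWordsInLine(Lines[I]));
end;

var
  FileLines: TStringList;
begin
  FileLines := TStringList.Create;
  try
    // In the real program: FileLines.LoadFromFile(Filename);
    FileLines.Text := 'how are you.In good condition' + sLineBreak +
      'going to city';
    Writeln(CountWordsInLines(FileLines)); // 6 + 3 = 9
  finally
    FileLines.Free;
  end;
end.
```

This keeps the convenience of LoadFromFile for small files; for large files the stream-based WordCount above avoids holding the whole file in memory.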
ASKER CERTIFIED SOLUTION
dprochownik