How to count words in a file

I want to read the contents of a file.
The content may contain commas (","), spaces, and periods (".").

How do I count the words in the file?

Say I have this:

Hi! Want to tell you a story:
There once was a bear, it lived in a forest and the bear love to eat meat. He has a friend named Sticky.  When she came, the bear went to the beach and danced with some noodles on his head in a forest happily ever after.

Story end? Yes.
tankergoblin Asked:
 
dprochownik Commented:
In your code you could do it as below, but in my opinion it is very inefficient because:
  1. your code copies the whole file into memory at FileWordList.LoadFromFile(OpenDialog.Filename),
    so the memory manager has to allocate quite a lot of memory for larger files,
  2. each assignment to TStringList.DelimitedText parses all of the data again, and you have to do this for every delimiter, so your code reads and processes the whole block of data once per declared delimiter.
Why don't you just copy the code from my first post, which:
  1. holds at most 1024 bytes of data in memory at once (using a TFileStream and a fixed buffer),
  2. reads the data and checks for all delimiters only once,
so it is much more efficient.

const
  cDelimiters: array[0..41] of char = (#0,#1,#2,#3,#4,#5,#6,#7,#8,#9,#10,#11,#12,#13,#14,#15,#16,#17,#18,#19,#20,#21,#22,#23,#24,#25,#26,#27,#28,#29,#30,#31,
' ','.',',','?','(',')','[',']','\','/');
var
  FileWordList: TStringList;
  i: integer;
  OpenDialog: TOpenDialog;
begin
  FileWordList := TStringList.Create;
  try
    OpenDialog := TOpenDialog.create(self);
    try
      if openDialog.execute then
        FileWordList.LoadFromFile(OpenDialog.Filename);
    finally
      Opendialog.Free;
    end;
    
    for i := 0 to high(cDelimiters) do
    begin
      FileWordList.Delimiter := cDelimiters[i];
      FileWordList.DelimitedText := FileWordList.Text;
    end;
 
    showmessage(intToStr(FileWordList.Count));
  finally
    FileWordList.Free;
  end;

dprochownik Commented:
Of course, you can change the cDelimiters set :)

function WordCount(const pFileName: String): Integer;
const
  cDelimiters = [#0..#31,' ','.',',','?','(',')','[',']','\','/'];
var
  fFile: TFileStream;
  vBuffer: array[0..1023] of AnsiChar; // one byte per element, also on Unicode Delphi
  vWord: Boolean;
  vi, vBufSize: Integer;
begin
  result := 0;
  if not FileExists(pFileName) then exit;
  try
    fFile := TFileStream.Create(pFileName,fmOpenRead);
  except
    exit; //Acces denied for file or other errors
  end;
  try
    //Reading from file and counting;
    vWord := false;
    while fFile.Position < fFile.Size do
    begin
      vBufSize := fFile.Read(vBuffer,sizeOf(vBuffer));    
 
      for vi := 0 to vBufSize - 1 do // only the bytes actually read
      begin
        if not(vBuffer[vi] in cDelimiters) then
          vWord := true
        else begin
          if vWord then           
            inc(result);
          vWord := false;          
        end;                   
      end;
    end;
    if vWord then
      inc(result); // count a last word that is not followed by a delimiter
  finally
    FreeAndNil(fFile);
  end;
end;

tankergoblin (Author) Commented:
500 points for 2 links?

I did it as below.

The program below works when the words are separated by spaces, for example:

how do you go

But if I add ? or !, for example:
hi! where are you going?
The above does not work with my word count program.



var 
 FileWordList,FilenameList: TStringList;
 Filename: string; 
 i: integer;
begin
 FilenameList := TStringList.Create;
 FileWordList := TStringList.Create;
 
 OpenDialog := TOpenDialog.create(self);
 
 if openDialog.execute then
 begin
  Filename := OpenDialog.Filename;
  FilenameList.LoadFromFile(Filename);
 end;
 OpenDialog.Free;
 
 for i := 0 to FilenameList.count-1 do
 begin
  FileWordList.DelimitedText := FilenameList[i];
 end;
 showmessage(intToStr(FileWordList.count));

tankergoblin (Author) Commented:
Sorry, that should be *below.
 
dprochownik Commented:
Loading a file into a TStringList is OK, but only for small files. With a large file it will be inefficient.
I prefer using a TFileStream as in my example, where the file is read while the words are being counted. In my opinion that is much faster than TStringList.LoadFromFile on large files.
 
tankergoblin (Author) Commented:
Also, say I have two lines:

how are you
going to city

It will only read the second line.
How do I fix that?
 
Geert G (Oracle dba) Commented:
>>500 points for 2 links?
the second link i thought was exactly what you needed

and do you really expect me to copy all the code in here ?
I could of course write my own interpretation, but why should I reinvent the wheel all over again?

are you saying that when you start a totally new application,
you don't use any code of your previous applications ?

so you basically reinvent the wheel every time ?

there is nothing wrong with providing a link to very good code,
and very well documented too ...
 
tankergoblin (Author) Commented:
What about memory usage?
I think that with a TStringList you store everything in an object, so you do not have to read the file every time you need it.
I think this makes it faster.
Furthermore, the code is shorter and easier to write.
The only problem is that it only processes the last line.
How do I fix that?
 
dprochownik Commented:
Below is your code changed so that it counts all rows, but I think it is still wrong, because text like:
how are you.Going to city

will be counted as 5 words: "you.Going" is one word to it, and you can't do anything about that because a TStringList can have only one delimiter character.
The sample I gave you a few posts earlier handles that case.

var
 FileWordList,FilenameList: TStringList;
 Filename: string;
 i: integer;
 vResult: Integer;
begin
 FilenameList := TStringList.Create;
 FileWordList := TStringList.Create;
 
 OpenDialog := TOpenDialog.create(self);
 
 if openDialog.execute then
 begin
  Filename := OpenDialog.Filename;
  FilenameList.LoadFromFile(Filename);
 end;
 OpenDialog.Free;
 
 vresult := 0;
 for i := 0 to FilenameList.count-1 do
 begin  
   FileWordList.DelimitedText := FilenameList[i];
   vResult := vResult + FileWordList.count;
 end;
 showmessage(intToStr(vResult));
 FileWordList.Free;  // release both lists once the count has been shown
 FilenameList.Free;
 
tankergoblin (Author) Commented:
Also, you store a bunch of characters in an array, which takes up space.
 
dprochownik Commented:
You can modify the cDelimiters array.
Chars #0..#31 are non printable chars like
#10 - end of line
#13 - Return
#8 - Tabulator
etc.
The ' ' character, which means space, is also defined there. You can put any characters you like in it.
I wouldn't delete #8, #10, #13, ' ', '.', ',' but it is your choice :)
 
Geert G (Oracle dba) Commented:
#8 is backspace
#9 is tab

just so you know ...
 
dprochownik Commented:
My mistake, sorry.
 
tankergoblin (Author) Commented:
dprochownik:
I get your point,
and I tried your code and mine, as below:

Inc(vResult,FileWordList.Count); which I think is equivalent to your
vResult := vResult + FileWordList.count;

However, as you said earlier, if I put in my string as

"how are you.In good condition"

it will be counted as 5 words instead of 6.
How do I solve that?
 
dprochownik Commented:
Set the dot '.' as a delimiter: add it to the cDelimiters array. If you do so, the algorithm will treat dots the same as spaces, and 'you.In' will be two words, not one.
That's why I advise you to add as many characters as possible to the cDelimiters set ('(', ')', '?', etc.), because there is always a probability that someone won't put a space after a '?'.
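
To illustrate the effect of having '.' among the delimiters, here is a minimal sketch that applies the same delimiter test to a string already in memory. CountWordsInString is a hypothetical helper, not code from the posts above, and it assumes single-byte (ANSI) text.

```pascal
program CountWordsDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper (not from the thread): counts the words in a string
// using the same delimiter characters as the cDelimiters set.
function CountWordsInString(const S: AnsiString): Integer;
const
  cDelimiters = [#0..#31, ' ', '.', ',', '?', '(', ')', '[', ']', '\', '/'];
var
  i: Integer;
  vWord: Boolean;
begin
  Result := 0;
  vWord := False;
  for i := 1 to Length(S) do
  begin
    if not (S[i] in cDelimiters) then
      vWord := True            // we are inside a word
    else
    begin
      if vWord then
        Inc(Result);           // a delimiter has just ended a word
      vWord := False;
    end;
  end;
  if vWord then
    Inc(Result);               // last word, when no delimiter follows it
end;

begin
  // 'you.In' is split at the dot, so this prints 6
  WriteLn(CountWordsInString('how are you.In good condition'));
end.
```

Removing '.' from the set in this sketch makes 'you.In' count as one word again, so the result drops to 5.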
 
Geert G (Oracle dba) Commented:
>>tankergoblin
also you store a bunch of characters in an array, which takes up space

what is wrong with that ?
 
dprochownik Commented:
Of course, you can use another approach, as in the code below. You can declare the set of characters which can be used in words, and all other characters will be treated as delimiters. But you have to know that this is VERY DANGEROUS, because you have to declare the set of all allowed characters, which can be hard to do if your app will be used on text written in a language other than yours.
In this approach, if the source text can be written in a language other than English, you have to add all the national characters to the cAllowed set too.

function WordCount(const pFileName: String): Integer;
const
  cAllowed = ['a'..'z','A'..'Z','-'];
var
  fFile: TFileStream;
  vBuffer: array[0..1023] of AnsiChar; // one byte per element, also on Unicode Delphi
  vWord: Boolean;
  vi, vBufSize: Integer;
begin
  result := 0;
  if not FileExists(pFileName) then exit;
  try
    fFile := TFileStream.Create(pFileName,fmOpenRead);
  except
    exit; //Acces denied for file or other errors
  end;
  try
    //Reading from file and counting;
    vWord := false;
    while fFile.Position < fFile.Size do
    begin
      vBufSize := fFile.Read(vBuffer,sizeOf(vBuffer));
 
      for vi := 0 to vBufSize - 1 do // only the bytes actually read
      begin
        if vBuffer[vi] in cAllowed then
          vWord := true
        else begin
          if vWord then
            inc(result);
          vWord := false;
        end;
      end;
    end;
    if vWord then
      inc(result); // count a last word that is not followed by another character
  finally
    FreeAndNil(fFile);
  end;
end;

tankergoblin (Author) Commented:
How can I apply cDelimiters in my code?
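
One possible way to use the delimiter set together with a TStringList, as a sketch only: keep loading the lines into the list as before, but count the words of each line with a small function instead of assigning DelimitedText. CountWordsInLine and CountWordsInLines are hypothetical names, the two lines are added with Add in place of LoadFromFile so the example runs without a file, and single-byte (ANSI) text is assumed.

```pascal
program CountWordsInList;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  // Same characters as the cDelimiters set earlier in the thread
  cDelimiters = [#0..#31, ' ', '.', ',', '?', '(', ')', '[', ']', '\', '/'];

// Hypothetical helper: counts the words in one line of text.
function CountWordsInLine(const Line: AnsiString): Integer;
var
  i: Integer;
  vWord: Boolean;
begin
  Result := 0;
  vWord := False;
  for i := 1 to Length(Line) do
    if Line[i] in cDelimiters then
    begin
      if vWord then
        Inc(Result);           // a delimiter has just ended a word
      vWord := False;
    end
    else
      vWord := True;           // we are inside a word
  if vWord then
    Inc(Result);               // a line usually ends without a delimiter
end;

// Hypothetical helper: sums the per-line counts, so no line overwrites another.
function CountWordsInLines(Lines: TStrings): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 0 to Lines.Count - 1 do
    Inc(Result, CountWordsInLine(AnsiString(Lines[i])));
end;

var
  FilenameList: TStringList;
begin
  FilenameList := TStringList.Create;
  try
    // In the real program this would be FilenameList.LoadFromFile(Filename)
    FilenameList.Add('how are you');
    FilenameList.Add('going to city');
    WriteLn(CountWordsInLines(FilenameList)); // prints 6
  finally
    FilenameList.Free;
  end;
end.
```

This still loads the whole file into memory, so the earlier remark about large files applies to it as well.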
Question has a verified solution.