C++: search and count how many times a word appears in a file

Posted on 2003-03-13
Medium Priority
Last Modified: 2007-12-19
I need a code snippet to do the following:

Search files: when a search string is specified, the program should return a list of all the files that contain matches, along with the number of matches in each file, e.g.:
Files containing search string: HELLO
sally.txt: 333
bob.txt: 32
mike.txt: 2

I don't know of any method or function in C++ that can go through a file word by word. Something like Java's StringTokenizer, maybe?
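The standard stream extraction operator already behaves much like a whitespace tokenizer. A minimal sketch, using a hypothetical tokenize helper, reads the words out of a string with std::istringstream; the same while (in >> word) loop works unchanged on a std::ifstream opened on a file.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits text into whitespace-delimited words, roughly
// what Java's StringTokenizer does with its default delimiters.
std::vector<std::string> tokenize(const std::string &text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string word;
    // operator>> skips leading whitespace, then reads up to the next whitespace;
    // the loop ends when no further word can be extracted.
    while (in >> word) {
        words.push_back(word);
    }
    return words;
}
```

For example, tokenize("HELLO world HELLO") yields three words, the second being world.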
Question by: moevic

Accepted Solution

JackNCalvin earned 100 total points
ID: 8131678

You need to open the file and then scan each word. This is fairly easy in C++: the input stream's >> operator reads one whitespace-separated word at a time. Read each word into a std::string and compare it to the desired word. Add a counter that is incremented every time the word is found, and reset it to 0 for each new file.

Here is a function that might work for you.


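#include <fstream>
#include <iostream>
#include <string>

// Counts the whitespace-delimited words in the file that exactly equal
// stringCompare and prints "filename: count".
void readSearchedFile(const char *filename, const std::string &stringCompare) {

   std::string s;
   int counter = 0;

   std::ifstream fin(filename);

   if (!fin) {                      // test the stream, not the file name pointer
        std::cout << "Could Not Open: " << filename << std::endl;
        return;
   }

   // Extract one word per iteration; the loop stops at end of file
   // without re-processing the last word.
   while (fin >> s) {
     if (s == stringCompare) {
         ++counter;                 // count the match
     }
   }

   std::cout << filename << ": " << counter << std::endl;

} // end readSearchedFile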
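Because >> splits only on whitespace, the function above does not count "HELLO," or "hello" as a match for "HELLO". A minimal sketch, assuming you want to ignore surrounding punctuation and letter case (the question does not say), normalizes each word before comparing. normalizeWord and countWordInStream are hypothetical names.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, handy for trying the helpers on in-memory text
#include <string>

// Hypothetical helper: strips leading/trailing punctuation and lowercases the word.
std::string normalizeWord(const std::string &w) {
    std::size_t b = 0, e = w.size();
    while (b < e && std::ispunct(static_cast<unsigned char>(w[b]))) ++b;
    while (e > b && std::ispunct(static_cast<unsigned char>(w[e - 1]))) --e;
    std::string out = w.substr(b, e - b);
    std::transform(out.begin(), out.end(), out.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return out;
}

// Hypothetical helper: counts the words read from `in` that equal `target`
// after both have been normalized. Works with std::ifstream or std::istringstream.
int countWordInStream(std::istream &in, const std::string &target) {
    const std::string wanted = normalizeWord(target);
    int count = 0;
    std::string s;
    while (in >> s) {
        if (normalizeWord(s) == wanted) ++count;
    }
    return count;
}
```

For example, counting HELLO in the text "Hello, HELLO! hello-world hello." through a std::istringstream gives 3: the inner hyphen keeps hello-world a different word.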


Hope this helps!!!


Expert Comment

by: Mayank S
ID: 8134296
You mean you need to implement something similar to the grep command? You didn't specify whether the file names will be given by the user or whether the program has to search all the files in a directory.
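For the first case (file names supplied by the user), a minimal sketch counts exact, whitespace-delimited matches in each named file, as the accepted answer does, and prints only the files that had matches, in the format the question asks for. countWord and printReport are hypothetical names.

```cpp
#include <fstream>
#include <ostream>
#include <string>
#include <vector>

// Hypothetical helper: counts words in the file at `path` that equal `word`.
// Returns -1 if the file cannot be opened.
int countWord(const std::string &path, const std::string &word) {
    std::ifstream fin(path);
    if (!fin) return -1;
    int count = 0;
    std::string s;
    while (fin >> s) {
        if (s == word) ++count;
    }
    return count;
}

// Hypothetical helper: writes a header line, then "name: count" for every
// file with at least one match (unreadable files are skipped).
void printReport(const std::vector<std::string> &files, const std::string &word,
                 std::ostream &out) {
    out << "Files containing search string: " << word << '\n';
    for (const std::string &f : files) {
        int n = countWord(f, word);
        if (n > 0) out << f << ": " << n << '\n';
    }
}
```

Assuming a command line of the form program WORD file1 file2 ..., main could build std::vector<std::string> files(argv + 2, argv + argc) and call printReport(files, argv[1], std::cout).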


Assisted Solution

baby001 earned 100 total points
ID: 8141562
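The following code runs on Windows only: it searches a directory and all its subdirectories and lists every file that contains the search string (it does not count the matches). You can copy it into your project; I built it before and it worked well.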

I hope this is helpful to you.

#define _CRT_SECURE_NO_WARNINGS      // allow strncpy/fopen without MSVC deprecation warnings
#include <stdio.h>
#include <string.h>
#include <windows.h>
#include <sys/types.h>
#include <sys/stat.h>

// Fixed-capacity list of the names of matching files.
class FileList
{
public:
      FileList(int nFile) ;
      ~FileList() ;
      void Add(const char * strName) ;
      int GetNum() ;
      void OutPutList() ;
private:
      char ** listFileName ;
      int m_index ;
      int m_nFile ;
} ;

FileList::FileList(int nFile)
{
      m_index = 0 ;
      m_nFile = 0 ;
      listFileName = NULL ;
      if (nFile > 0)
      {
            m_nFile = nFile ;
            listFileName = new char*[nFile] ;
            for (int i=0;i<nFile;++i)
                  listFileName[i] = new char[MAX_PATH] ;
      }
}

FileList::~FileList()
{
      if (m_nFile > 0)
      {
            for (int i=0;i<m_nFile;i++)
                  delete [] listFileName[i] ;
            delete [] listFileName ;
      }
}

void FileList::Add(const char *strName)
{
      if (m_index >= m_nFile)      // list is full: ignore further names
            return ;
      strncpy(listFileName[m_index], strName, MAX_PATH - 1) ;
      listFileName[m_index][MAX_PATH - 1] = '\0' ;
      ++m_index ;
}

int FileList::GetNum()
{
      return m_index ;
}

void FileList::OutPutList()
{
      for (int i=0;i<m_index;i++)
            printf("%s\n", listFileName[i]) ;
}

// Recursively scans strDirectory and adds every file whose contents contain strSub.
void SearchSubString(const char * strDirectory, const char * strSub, FileList &listFile)
{
      WIN32_FIND_DATAA df ;
      char dirFirst[MAX_PATH] ;
      snprintf(dirFirst, MAX_PATH, "%s\\*.*", strDirectory) ;

      HANDLE hFind = FindFirstFileA(dirFirst, &df) ;
      if (hFind == INVALID_HANDLE_VALUE)
      {
            printf("%s is not a directory!\n", strDirectory) ;
            return ;
      }

      do
      {
            char strPath[MAX_PATH] ;
            snprintf(strPath, MAX_PATH, "%s\\%s", strDirectory, df.cFileName) ;

            if (df.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
            {
                  //is directory: recurse, skipping "." and ".."
                  if ((strcmp(df.cFileName, ".") != 0) &&
                        (strcmp(df.cFileName, "..") != 0))
                        SearchSubString(strPath, strSub, listFile) ;
            }
            else
            {
                  //is file: read it whole, then look for the substring
                  FILE *stream = fopen(strPath, "rb") ;
                  if (stream == NULL)
                        continue ;             // jumps to the FindNextFileA test
                  struct _stat statbuf ;
                  long lenFile = 0 ;
                  if (_stat(strPath, &statbuf) == 0)
                        lenFile = statbuf.st_size ;
                  char * buffer = new char[lenFile + 1] ;
                  size_t nRead = fread(buffer, sizeof(char), lenFile, stream) ;
                  buffer[nRead] = '\0' ;       // strstr needs a terminated string
                  if (strstr(buffer, strSub) != NULL)
                        listFile.Add(strPath) ;
                  delete [] buffer ;
                  fclose(stream) ;
            }
      } while (FindNextFileA(hFind, &df)) ;

      FindClose(hFind) ;
}

int main(int argc, char *argv[])
{
      FileList listFile(1000) ;
      if (argc <= 2)
      {
            printf("syntax error, you should call findsubstring like:\n findsubstring <directoryname> <substring>\n") ;
            return 1 ;
      }
      if (strstr(argv[1], "\\") == NULL)
      {
            printf("The first argument must be a whole directory name!\n") ;
            return 1 ;
      }

      char strName[MAX_PATH] ;
      strncpy(strName, argv[1], MAX_PATH - 1) ;
      strName[MAX_PATH - 1] = '\0' ;
      size_t len = strlen(strName) ;
      if (len > 0 && strName[len - 1] == '\\')
            strName[len - 1] = '\0' ;          // avoid a doubled separator

      SearchSubString(strName, argv[2], listFile) ;
      if (listFile.GetNum() <= 0)
      {
            printf("No files contain the substring\n") ;
            return 0 ;
      }
      listFile.OutPutList() ;
      return 0 ;
}
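The code above depends on the Win32 FindFirstFile API and only reports whether a file contains the substring. As a portable alternative, assuming a C++17 compiler with std::filesystem (which did not exist when this thread was written), a sketch could walk the directory tree and count non-overlapping substring occurrences per file, printing the "name: count" format from the question. countSubstring and searchDirectory are hypothetical names.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <ostream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: counts non-overlapping occurrences of needle in haystack.
std::size_t countSubstring(const std::string &haystack, const std::string &needle) {
    if (needle.empty()) return 0;
    std::size_t count = 0;
    for (std::size_t pos = haystack.find(needle); pos != std::string::npos;
         pos = haystack.find(needle, pos + needle.size())) {
        ++count;
    }
    return count;
}

// Hypothetical helper: walks root recursively and prints "path: count" for
// every regular file whose contents contain needle at least once.
// Iteration stops quietly on the first filesystem error.
void searchDirectory(const fs::path &root, const std::string &needle, std::ostream &out) {
    std::error_code ec;
    for (fs::recursive_directory_iterator it(root, fs::directory_options::skip_permission_denied, ec), end;
         !ec && it != end; it.increment(ec)) {
        if (!it->is_regular_file()) continue;
        std::ifstream in(it->path(), std::ios::binary);
        if (!in) continue;
        // Read the whole file into memory, like the strstr-based version.
        std::string contents((std::istreambuf_iterator<char>(in)),
                             std::istreambuf_iterator<char>());
        std::size_t n = countSubstring(contents, needle);
        if (n > 0) out << it->path().string() << ": " << n << '\n';
    }
}
```

Calling searchDirectory(argv[1], argv[2], std::cout) from main would replace the Windows-specific search; note that this counts substrings (so HELLOHELLO counts twice), not whole words.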

