Looking for sample program

Hi all,
 I'm looking for a program that counts words and also reports the line numbers where each word occurs.
 
 If you post the source code here (for future users too), please include your references.

PS: Anyone who edits the code to meet the requirements and provides references will get 500 points.
valleytech Asked:
 
Darrylsh Commented:
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

vector<string> getTokens(const string& str);

int main()
{
      ifstream infile("input.txt");
      if (!infile.is_open())
      {
            cerr << "Error opening input file.\n";
            return 1;  // non-zero exit code signals the failure
      }
      // Maps each word to the line numbers where it occurs (one entry per occurrence)
      map<string, vector<int> > wordcount;
      int line_number = 1;
      string line;
      while (getline(infile, line))
      {
            vector<string> words = getTokens(line);
            for (size_t i = 0; i < words.size(); ++i)
            {
                  wordcount[words[i]].push_back(line_number);
            }
            ++line_number;
      }
      map<string, vector<int> >::iterator it;
      for (it = wordcount.begin(); it != wordcount.end(); ++it)
      {
            cout << "word=" << it->first << ": count=" << it->second.size() << ": line numbers= ";
            for (size_t i = 0; i < it->second.size(); ++i)
            {
                  cout << it->second[i] << " ";
            }
            cout << endl;
      }
      return 0;
}

// Splits a line into whitespace-separated words
vector<string> getTokens(const string& str)
{
      string buf;             // Holds the current word
      stringstream ss(str);   // Wrap the line in a stream
      vector<string> tokens;  // Collects the words
      while (ss >> buf)
            tokens.push_back(buf);
      return tokens;
}
 
Kent Olsen (Data Warehouse Architect / DBA) Commented:

Hi Valleytech,

If you're on a unix system, this can be done very easily with a shell script.  That's a lot easier than writing a custom program.


Kent
 
valleytech (Author) Commented:
Well, I want to do it in a C program.
 alexnek, the links you gave me only count words. However, I need to see the line numbers where each particular word occurs. Thanks.
Please help.
 
Darrylsh Commented:
Create a map with the word as the key and a vector of integers as the value:
map <string, vector<int> > wordcount;
Now read each line.
For each word in the line, push the line number onto the vector whose key is that word:
wordcount[a_word].push_back(line_number);

In the end, the size of the vector will be your count:  wordcount[word_in_list].size();
and the vector itself will list all the lines where the word appears.
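
The steps above can be sketched as a small function that reads from any input stream. This is only an illustration: the names buildIndex and indexExample are made up here and are not part of the thread's code. Note that a word repeated on the same line adds that line number once per occurrence, which is why the vector's size equals the total count.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: maps each whitespace-separated word to the line
// numbers on which it occurs, one entry per occurrence.
std::map<std::string, std::vector<int>> buildIndex(std::istream& in)
{
    std::map<std::string, std::vector<int>> index;
    std::string line;
    int lineNumber = 1;
    while (std::getline(in, line)) {
        std::istringstream words(line);
        std::string word;
        while (words >> word)
            index[word].push_back(lineNumber);
        ++lineNumber;
    }
    return index;
}

// Example usage on in-memory text: "the" occurs twice on line 1 and once on line 2.
void indexExample()
{
    std::istringstream text("the cat saw the dog\nthe end\n");
    std::map<std::string, std::vector<int>> index = buildIndex(text);
    assert(index["the"].size() == 3);                        // total count
    assert((index["the"] == std::vector<int>{1, 1, 2}));     // line of each occurrence
    assert((index["end"] == std::vector<int>{2}));
}
```

If each line should be listed only once per word, a std::set<int> could replace the vector, at the cost of losing the occurrence count.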


 
valleytech (Author) Commented:
Wow, Darrylsh.
 You are so good. I had only thought of a binary tree whose nodes hold the word and the line number. Let me run yours. Thanks.
 
AlexNek Commented:
> I need to see for each particular word occurs at which those line numbers.
It is a starting point; I think it is easy to modify.
If you want it in C, it would be better to use a tree, with a node like this one:
typedef struct Node {
char Letter;
int EndOfWord;   /* nonzero if a word ends at this node */
int LineNumber;
int WordCount;
struct Node* pNext;
struct Node* pChildren;
} Node;

It is not memory-optimized, but it is easy to implement.
It is also possible to use a sorted array of items:
struct Item
{
char* pWord;
int LineNumber;
int WordCount;
};
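
As a rough illustration of the sorted-array idea, here is a sketch in C++ (to match the other code in this thread; in plain C the vectors would become manually grown arrays). WordItem, addOccurrence and sortedArrayExample are hypothetical names. The item here is assumed to keep every line number rather than a single LineNumber field, so the count is simply the number of stored lines.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical item type: one entry per distinct word.
struct WordItem {
    std::string word;
    std::vector<int> lines;   // lines.size() is the word count
};

// Records one occurrence of `word` on `lineNumber`, keeping `items`
// sorted by word so that lookups can use binary search.
void addOccurrence(std::vector<WordItem>& items, const std::string& word, int lineNumber)
{
    std::vector<WordItem>::iterator pos = std::lower_bound(
        items.begin(), items.end(), word,
        [](const WordItem& item, const std::string& key) { return item.word < key; });
    if (pos == items.end() || pos->word != word)
        pos = items.insert(pos, WordItem{word, {}});   // new word: insert in sorted position
    pos->lines.push_back(lineNumber);
}

// Example usage: two distinct words, one of them seen on two lines.
void sortedArrayExample()
{
    std::vector<WordItem> items;
    addOccurrence(items, "text", 1);
    addOccurrence(items, "is", 1);
    addOccurrence(items, "text", 3);
    assert(items.size() == 2);
    assert(items[0].word == "is");          // array stays sorted by word
    assert(items[1].word == "text");
    assert(items[1].lines.size() == 2);     // "text" occurred twice
}
```

Lookup is a binary search, but inserting a new word shifts the elements after it, so this suits inputs with a moderate number of distinct words.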
 
valleytech (Author) Commented:
Do you have sample code for it? Thanks.
 
AlexNek Commented:
No, I don't, but I don't see a big problem in writing it.
I'll try to write some pseudocode for it.
 
AlexNek Commented:
I'm sorry, I have no time to write anything, but I've found an error in my data structure.
We need an array of line numbers, with one entry for each occurrence of the word. In C, a linked list can be used instead.
 
valleytech (Author) Commented:
It's OK, alexnek.
 
AlexNek Commented:
I did it! But I hate C now. ... The code could be better.

#include <cstring>  // memset, strchr

typedef struct LineNumNode {
      int m_nLineNumber;
      LineNumNode* m_pNext;
}LineNumNode;

typedef struct Node {
      char m_Letter;
      bool m_EndOfWord;
      LineNumNode* m_pLineNumRoot;
      int m_WordCount;
      Node* m_pNext;
      Node* m_pChildren;
} Node;

Node* FindSymbolNext(Node* pStart, char Symbol)
{
      Node* pCurrent = pStart;
      Node* Find = NULL;
      if (pStart!= NULL)
      {
        do
        {
           if (pCurrent->m_Letter == Symbol)
             {
                   Find = pCurrent;
                   break;
             }
         pCurrent = pCurrent->m_pNext;
        }
        while (pCurrent != NULL);
      }
      return Find;
}

Node* FindSymbolChild(Node* pStart, char Symbol)
{
      Node* pCurrent = pStart;
      Node* Find = NULL;
      if (pStart!= NULL)
      {
        do
        {
           if (pCurrent->m_Letter == Symbol)
             {
                   Find = pCurrent;
                   break;
             }
             pCurrent = pCurrent->m_pChildren;
        }
        while (pCurrent != NULL);
      }
      return Find;
}

void AddLineNumber(Node* pNode, int LineNumber)
{
      if (pNode != NULL)
      {
            LineNumNode* pNumNode = NULL;
            pNumNode = new LineNumNode();
            //Set all fields to 0
            memset(pNumNode,0, sizeof(LineNumNode));

            if (pNode->m_pLineNumRoot == NULL)
            {
                  pNode->m_pLineNumRoot = pNumNode;
            }
            else
            {
                  LineNumNode* pCurrNumNode = pNode->m_pLineNumRoot;
                  while (pCurrNumNode->m_pNext != NULL)
                  {
                        pCurrNumNode = pCurrNumNode->m_pNext;
                  }
                  pCurrNumNode->m_pNext = pNumNode;
            }
            pNumNode->m_nLineNumber = LineNumber;
      }
}

Node* GetLastWordNode(Node* pStart)
{
      Node* pCurrent = pStart;
      Node* Find = pCurrent;
      if (pStart!= NULL)
      {
            do
            {
                  Find = pCurrent;
                  pCurrent = pCurrent->m_pNext;
            }
            while (pCurrent != NULL);
      }
      return Find;
}

//Test Line
//
//this is simple text
//
//tree structure - next line is Next, symbols on the same line is children
//root
//!
//t->h->i->s
//!  !
//!  e->x->t
//i->s
//!
//s->a->m->p->l->e
//
void test03()
{
      Node* Root = new Node();
      Node* LastNode = Root;
      Node* FoundNode = NULL;
      //Set all fields to 0
      memset(Root,0, sizeof(Node));
      
      int LineNumber = 0;

      //String literals are read-only, so point at them through const char*
      const char* WordDelimiters = " ;-.,\t\n";
      const char* Test = "  This is  sample Text is Trick";
      char Ch;
      bool NewWord = true;
      bool BeginOfLine = true;
      bool MiddleOfWord = false;

      //skip delimiters
      while (*Test!= '\0' && strchr(WordDelimiters, *Test))
      {
            Test++;
      }

      while (*Test != '\0')
      {
        Ch = *Test;
            //Word Delimiter found
            if (strchr(WordDelimiters, Ch) != NULL)
            {
                  NewWord = true;
                  LastNode->m_WordCount++;
                  AddLineNumber(LastNode,LineNumber);
                  LastNode = Root;
      
                  //skip delimiters
                  while (*Test!= '\0' && strchr(WordDelimiters, *Test))
                  {
                        Test++;
                  }
                  Ch = *Test;
                  FoundNode = FindSymbolNext(LastNode,Ch);
            }
            else
            {
                  FoundNode = FindSymbolChild(LastNode,Ch);
            }

            if (FoundNode != NULL)
            {
                  LastNode = FoundNode;
                  MiddleOfWord = true;
            }
            else
            {
                  Node* CurrentNode = new Node();
                  //Set all fields to 0
                  memset(CurrentNode,0, sizeof(Node));
                  CurrentNode->m_Letter = Ch;
                  if (NewWord || BeginOfLine)
                  {
                        if (MiddleOfWord)
                        {
                              LastNode = GetLastWordNode(LastNode->m_pChildren);
                        }
                        else
                        {
                              LastNode = GetLastWordNode(Root);
                        }
                        if (LastNode != NULL)
                        {
                              LastNode->m_pNext = CurrentNode;
                        }
                  }
                  else
                  {
                        LastNode->m_pChildren = CurrentNode;
                  }
                  NewWord = false;
                  MiddleOfWord = false;
                  LastNode = CurrentNode;
            }
            BeginOfLine = false;
            Test++;
      }
      //set values for the last word in line
      LastNode->m_WordCount++;
      AddLineNumber(LastNode,LineNumber);

      //print all words (PrintAllWords is not included in this post)
      PrintAllWords(Root,NULL);
}
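
The listing above calls PrintAllWords, which is not shown in the thread. Below is a compact, self-contained sketch of the same letter-tree idea with a print routine. It is not AlexNek's code: TrieNode, findOrAddChild, addWord, collectWords, printAllWords and trieExample are hypothetical names, the line numbers are kept in a vector instead of a hand-built list, and the output format is an assumption.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical simplified node: the children of a node are the
// alternatives for the following letter, chained through `next`.
struct TrieNode {
    char letter = '\0';
    std::vector<int> lines;        // one entry per occurrence of the word ending here
    TrieNode* next = nullptr;      // next alternative at the same position
    TrieNode* children = nullptr;  // first alternative for the following letter
    ~TrieNode() { delete next; delete children; }
};

// Returns the child of `parent` holding `ch`, creating it if needed.
TrieNode* findOrAddChild(TrieNode* parent, char ch)
{
    TrieNode** link = &parent->children;
    while (*link != nullptr) {
        if ((*link)->letter == ch) return *link;
        link = &(*link)->next;
    }
    *link = new TrieNode;
    (*link)->letter = ch;
    return *link;
}

// Records one occurrence of `word` on `lineNumber`.
void addWord(TrieNode* root, const std::string& word, int lineNumber)
{
    TrieNode* node = root;
    for (char ch : word) node = findOrAddChild(node, ch);
    node->lines.push_back(lineNumber);
}

// Depth-first walk: appends one "word count=N lines: a b ..." entry per stored word.
void collectWords(const TrieNode* parent, std::string& prefix, std::vector<std::string>& out)
{
    for (const TrieNode* n = parent->children; n != nullptr; n = n->next) {
        prefix.push_back(n->letter);
        if (!n->lines.empty()) {
            std::string entry = prefix + " count=" + std::to_string(n->lines.size()) + " lines:";
            for (int line : n->lines) entry += " " + std::to_string(line);
            out.push_back(entry);
        }
        collectWords(n, prefix, out);
        prefix.pop_back();
    }
}

// Hypothetical stand-in for the missing PrintAllWords.
void printAllWords(const TrieNode* root)
{
    std::string prefix;
    std::vector<std::string> entries;
    collectWords(root, prefix, entries);
    for (const std::string& e : entries) std::printf("%s\n", e.c_str());
}

// Example usage: words are reported in tree order, not alphabetical order.
void trieExample()
{
    TrieNode root;
    addWord(&root, "this", 1);
    addWord(&root, "is", 1);
    addWord(&root, "text", 2);
    addWord(&root, "is", 2);
    std::string prefix;
    std::vector<std::string> entries;
    collectWords(&root, prefix, entries);
    assert(entries.size() == 3);
    assert(entries[0] == "this count=1 lines: 1");
    assert(entries[2] == "is count=2 lines: 1 2");
}
```

Splitting the input into words and counting lines is left to the caller here, for example with the delimiter loop from the listing above.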