Solved

Looking for sample program

Posted on 2007-04-06
13
248 Views
Last Modified: 2010-04-01
Hi all,
 I want you to find a program which can count words, and then make it also tell you the line numbers where the words occur.
 
 If you post the source code here( for future users too), please give me the references.  

PS: any person who need to edit the code in order to meet the requirements and have references will get 500 points.
0
Comment
Question by:valleytech
  • 6
  • 4
  • 2
  • +1
13 Comments
 
LVL 45

Expert Comment

by:Kent Olsen
ID: 18865337

Hi Valleytech,

If you're on a unix system, this can be done very easily with a shell script.  That's a lot easier than writing a custom program.


Kent
0
 
LVL 16

Expert Comment

by:AlexNek
ID: 18865485
0
 
LVL 16

Expert Comment

by:AlexNek
ID: 18865494
0
Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

 

Author Comment

by:valleytech
ID: 18868283
well, i want to it in c program.
 alexnek, those links you gave me only count word. However, I need to see for each particular word occurs at which those line numbers. Thanks.
please help.
0
 
LVL 3

Expert Comment

by:Darrylsh
ID: 18868405
create a map with the word as key and a vector of integers as the value.
map <string, vector<int> > wordcount;
now read each line.  
for each word in the line push back the line number into the vector whose key is the word.
wordcount[a_word].push_back(line_number);

in the end the size of the vector will be your count:  wordcount[word_in_list].size();
and the vector will be the index of all the lines it appears


0
 
LVL 3

Accepted Solution

by:
Darrylsh earned 500 total points
ID: 18868451
#include<iostream>
#include <vector>
#include <string>
#include <sstream>
#include <fstream>
#include <map>

using namespace std;

vector<string>  getTokens(string str);

int main()
{
      ifstream infile ("input.txt",ios::in);
      if (!infile.is_open())
      {
            cout << "Error Opening input File.\n";
            return 0;
      }      
      map <string, vector<int> > wordcount;
      int line_number = 1;
      string line;
      while( getline(infile, line,'\n'))
      {
            vector<string> words = getTokens(line);
            for (int i = 0;i < words.size();++i)
            {
                  wordcount[words[i]].push_back(line_number);
            }
            ++line_number;
      }
      map <string, vector<int> >::iterator it;
      for (it = wordcount.begin();it!= wordcount.end(); ++it)
      {
            cout << "word=" << it->first <<": count=" <<(it->second).size() << ": line numbers= ";
            for (int i = 0; i < (it->second).size();++i)
            {
                  cout << (it->second).at(i)<< " ";
            }
            cout << endl;
      }

}
vector<string>  getTokens(string str)
{
      string buf;  // Have a buffer string
      stringstream ss(str);  // Insert the string into a stream
      vector<string> tokens;  // Create vector to hold our words
      while (ss >> buf)
            tokens.push_back(buf);
      return tokens;
}
0
 

Author Comment

by:valleytech
ID: 18868696
wow Darrylsh.
 You are so good. I just thought of binary tree with the node include the line number and word. Let me run yours. Thanks.
0
 
LVL 16

Expert Comment

by:AlexNek
ID: 18869166
> I need to see for each particular word occurs at which those line numbers.
It is a base, I think it is easy to modify it.
If you want it in C that would be better to use a tree. With node like this one
struct Node{
char Letter;
BOOL EndOfWord;
int LineNumber;
int WordCount;
Node* pNext;
Node* pChildren;
} Node;

It is not Memory optimized but easy to realize;
It is possibel to use sorted array too with items
stuct Item
{
char* pWord;
int LineNumber;
int WordCount;
}
0
 

Author Comment

by:valleytech
ID: 18870225
do you have the sample code for it? Thanks.
0
 
LVL 16

Expert Comment

by:AlexNek
ID: 18870350
No, I haven't but I can't see a big problem to write it.
I'll try to write a pseudo code for it.
0
 
LVL 16

Expert Comment

by:AlexNek
ID: 18871999
I'm sorry, I have no time to write something but I've found the error in my data construction.
We need to have an array for the LineNumber, one number for each word. For C it is possible to have a list
0
 

Author Comment

by:valleytech
ID: 18872479
it's ok alexnek.
0
 
LVL 16

Expert Comment

by:AlexNek
ID: 18873198
I did it! But I hate C now. ... The code could be better.

typedef struct LineNumNode {
      int m_nLineNumber;
      LineNumNode* m_pNext;
}LineNumNode;

typedef struct Node {
      char m_Letter;
      bool m_EndOfWord;
      LineNumNode* m_pLineNumRoot;
      int m_WordCount;
      Node* m_pNext;
      Node* m_pChildren;
} Node;

Node* FindSymbolNext(Node* pStart, char Symbol)
{
      Node* pCurrent = pStart;
      Node* Find = NULL;
      if (pStart!= NULL)
      {
        do
        {
           if (pCurrent->m_Letter == Symbol)
             {
                   Find = pCurrent;
                   break;
             }
         pCurrent = pCurrent->m_pNext;
        }
        while (pCurrent != NULL);
      }
      return Find;
}

Node* FindSymbolChild(Node* pStart, char Symbol)
{
      Node* pCurrent = pStart;
      Node* Find = NULL;
      if (pStart!= NULL)
      {
        do
        {
           if (pCurrent->m_Letter == Symbol)
             {
                   Find = pCurrent;
                   break;
             }
             pCurrent = pCurrent->m_pChildren;
        }
        while (pCurrent != NULL);
      }
      return Find;
}

void AddLineNumber(Node* pNode, int LineNumber)
{
      if (pNode != NULL)
      {
            LineNumNode* pNumNode = NULL;
            pNumNode = new LineNumNode();
            //Set all fields to 0
            memset(pNumNode,0, sizeof(LineNumNode));

            if (pNode->m_pLineNumRoot == NULL)
            {
                  pNode->m_pLineNumRoot = pNumNode;
            }
            else
            {
                  LineNumNode* pCurrNumNode = pNode->m_pLineNumRoot;
                  while (pCurrNumNode->m_pNext != NULL)
                  {
                        pCurrNumNode = pCurrNumNode->m_pNext;
                  }
                  pCurrNumNode->m_pNext = pNumNode;
            }
            pNumNode->m_nLineNumber = LineNumber;
      }
}

Node* GetLastWordNode(Node* pStart)
{
      Node* pCurrent = pStart;
      Node* Find = pCurrent;
      if (pStart!= NULL)
      {
            do
            {
                  Find = pCurrent;
                  pCurrent = pCurrent->m_pNext;
            }
            while (pCurrent != NULL);
      }
      return Find;
}

//Test Line
//
//this is simple text
//
//tree structure - next line is Next, symbols on the same line is children
//root
//!
//t->h->i->s
//!  !
//!  e->x->t
//i->s
//!
//s->a->m->p->l->e
//
void test03()
{
      Node* Root = new Node();
      Node* LastNode = Root;
      Node* FoundNode = NULL;
      Node* CurrentNode = NULL;
      //Set all fields to 0
      memset(Root,0, sizeof(Node));
      
      int LineNumber = 0;

      char* WordDelimiters = " ;-.,\t\n";
      char* Test = "  This is  sample Text is Trick";
    char Ch;
      bool NewWord = true;
      bool BeginOfLine = true;
      bool MiddleOfWord = false;

      //skip delimiters
      while (*Test!= '\0' && strchr(WordDelimiters, *Test))
      {
            Test++;
      }

      while (*Test != '\0')
      {
        Ch = *Test;
            //Word Delimiter found
            if (strchr(WordDelimiters, Ch) != NULL)
            {
                  NewWord = true;
                  LastNode->m_WordCount++;
                  AddLineNumber(LastNode,LineNumber);
                  LastNode = Root;
      
                  //skip delimiters
                  while (*Test!= '\0' && strchr(WordDelimiters, *Test))
                  {
                        Test++;
                  }
                  Ch = *Test;
                  FoundNode = FindSymbolNext(LastNode,Ch);
            }
            else
            {
                  FoundNode = FindSymbolChild(LastNode,Ch);
            }

            if (FoundNode != NULL)
            {
                  LastNode = FoundNode;
                  MiddleOfWord = true;
            }
            else
            {
                  Node* CurrentNode = new Node();
                  //Set all fields to 0
                  memset(CurrentNode,0, sizeof(Node));
                  CurrentNode->m_Letter = Ch;
                  if (NewWord || BeginOfLine)
                  {
                        if (MiddleOfWord)
                        {
                              LastNode = GetLastWordNode(LastNode->m_pChildren);
                        }
                        else
                        {
                              LastNode = GetLastWordNode(Root);
                        }
                        if (LastNode != NULL)
                        {
                              LastNode->m_pNext = CurrentNode;
                        }
                  }
                  else
                  {
                        LastNode->m_pChildren = CurrentNode;
                  }
                  NewWord = false;
                  MiddleOfWord = false;
                  LastNode = CurrentNode;
            }
            BeginOfLine = false;
            Test++;
      }
      //set values for the last word in line
      LastNode->m_WordCount++;
      AddLineNumber(LastNode,LineNumber);

      //print all words
      PrintAllWords(Root,NULL);
}
0

Featured Post

Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

IntroductionThis article is the second in a three part article series on the Visual Studio 2008 Debugger.  It provides tips in setting and using breakpoints. If not familiar with this debugger, you can find a basic introduction in the EE article loc…
This is a short and sweet, but (hopefully) to the point article. There seems to be some fundamental misunderstanding about the function prototype for the "main" function in C and C++, more specifically what type this function should return. I see so…
The goal of this video is to provide viewers with basic examples to understand opening and writing to files in the C programming language.
The goal of this video is to provide viewers with basic examples to understand recursion in the C programming language.

840 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question