Solved

Looking for sample program

Posted on 2007-04-06
13
244 Views
Last Modified: 2010-04-01
Hi all,
 I want you to find a program which can count words, and then make it also tell you the line numbers where the words occur.
 
 If you post the source code here( for future users too), please give me the references.  

PS: any person who need to edit the code in order to meet the requirements and have references will get 500 points.
0
Comment
Question by:valleytech
  • 6
  • 4
  • 2
  • +1
13 Comments
 
LVL 45

Expert Comment

by:Kdo
ID: 18865337

Hi Valleytech,

If you're on a unix system, this can be done very easily with a shell script.  That's a lot easier than writing a custom program.


Kent
0
 
LVL 16

Expert Comment

by:AlexNek
ID: 18865485
0
 
LVL 16

Expert Comment

by:AlexNek
ID: 18865494
0
 

Author Comment

by:valleytech
ID: 18868283
well, i want to it in c program.
 alexnek, those links you gave me only count word. However, I need to see for each particular word occurs at which those line numbers. Thanks.
please help.
0
 
LVL 3

Expert Comment

by:Darrylsh
ID: 18868405
create a map with the word as key and a vector of integers as the value.
map <string, vector<int> > wordcount;
now read each line.  
for each word in the line push back the line number into the vector whose key is the word.
wordcount[a_word].push_back(line_number);

in the end the size of the vector will be your count:  wordcount[word_in_list].size();
and the vector will be the index of all the lines it appears


0
 
LVL 3

Accepted Solution

by:
Darrylsh earned 500 total points
ID: 18868451
#include<iostream>
#include <vector>
#include <string>
#include <sstream>
#include <fstream>
#include <map>

using namespace std;

vector<string>  getTokens(string str);

int main()
{
      ifstream infile ("input.txt",ios::in);
      if (!infile.is_open())
      {
            cout << "Error Opening input File.\n";
            return 0;
      }      
      map <string, vector<int> > wordcount;
      int line_number = 1;
      string line;
      while( getline(infile, line,'\n'))
      {
            vector<string> words = getTokens(line);
            for (int i = 0;i < words.size();++i)
            {
                  wordcount[words[i]].push_back(line_number);
            }
            ++line_number;
      }
      map <string, vector<int> >::iterator it;
      for (it = wordcount.begin();it!= wordcount.end(); ++it)
      {
            cout << "word=" << it->first <<": count=" <<(it->second).size() << ": line numbers= ";
            for (int i = 0; i < (it->second).size();++i)
            {
                  cout << (it->second).at(i)<< " ";
            }
            cout << endl;
      }

}
vector<string>  getTokens(string str)
{
      string buf;  // Have a buffer string
      stringstream ss(str);  // Insert the string into a stream
      vector<string> tokens;  // Create vector to hold our words
      while (ss >> buf)
            tokens.push_back(buf);
      return tokens;
}
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 

Author Comment

by:valleytech
ID: 18868696
wow Darrylsh.
 You are so good. I just thought of binary tree with the node include the line number and word. Let me run yours. Thanks.
0
 
LVL 16

Expert Comment

by:AlexNek
ID: 18869166
> I need to see for each particular word occurs at which those line numbers.
It is a base, I think it is easy to modify it.
If you want it in C that would be better to use a tree. With node like this one
struct Node{
char Letter;
BOOL EndOfWord;
int LineNumber;
int WordCount;
Node* pNext;
Node* pChildren;
} Node;

It is not Memory optimized but easy to realize;
It is possibel to use sorted array too with items
stuct Item
{
char* pWord;
int LineNumber;
int WordCount;
}
0
 

Author Comment

by:valleytech
ID: 18870225
do you have the sample code for it? Thanks.
0
 
LVL 16

Expert Comment

by:AlexNek
ID: 18870350
No, I haven't but I can't see a big problem to write it.
I'll try to write a pseudo code for it.
0
 
LVL 16

Expert Comment

by:AlexNek
ID: 18871999
I'm sorry, I have no time to write something but I've found the error in my data construction.
We need to have an array for the LineNumber, one number for each word. For C it is possible to have a list
0
 

Author Comment

by:valleytech
ID: 18872479
it's ok alexnek.
0
 
LVL 16

Expert Comment

by:AlexNek
ID: 18873198
I did it! But I hate C now. ... The code could be better.

typedef struct LineNumNode {
      int m_nLineNumber;
      LineNumNode* m_pNext;
}LineNumNode;

typedef struct Node {
      char m_Letter;
      bool m_EndOfWord;
      LineNumNode* m_pLineNumRoot;
      int m_WordCount;
      Node* m_pNext;
      Node* m_pChildren;
} Node;

Node* FindSymbolNext(Node* pStart, char Symbol)
{
      Node* pCurrent = pStart;
      Node* Find = NULL;
      if (pStart!= NULL)
      {
        do
        {
           if (pCurrent->m_Letter == Symbol)
             {
                   Find = pCurrent;
                   break;
             }
         pCurrent = pCurrent->m_pNext;
        }
        while (pCurrent != NULL);
      }
      return Find;
}

Node* FindSymbolChild(Node* pStart, char Symbol)
{
      Node* pCurrent = pStart;
      Node* Find = NULL;
      if (pStart!= NULL)
      {
        do
        {
           if (pCurrent->m_Letter == Symbol)
             {
                   Find = pCurrent;
                   break;
             }
             pCurrent = pCurrent->m_pChildren;
        }
        while (pCurrent != NULL);
      }
      return Find;
}

void AddLineNumber(Node* pNode, int LineNumber)
{
      if (pNode != NULL)
      {
            LineNumNode* pNumNode = NULL;
            pNumNode = new LineNumNode();
            //Set all fields to 0
            memset(pNumNode,0, sizeof(LineNumNode));

            if (pNode->m_pLineNumRoot == NULL)
            {
                  pNode->m_pLineNumRoot = pNumNode;
            }
            else
            {
                  LineNumNode* pCurrNumNode = pNode->m_pLineNumRoot;
                  while (pCurrNumNode->m_pNext != NULL)
                  {
                        pCurrNumNode = pCurrNumNode->m_pNext;
                  }
                  pCurrNumNode->m_pNext = pNumNode;
            }
            pNumNode->m_nLineNumber = LineNumber;
      }
}

Node* GetLastWordNode(Node* pStart)
{
      Node* pCurrent = pStart;
      Node* Find = pCurrent;
      if (pStart!= NULL)
      {
            do
            {
                  Find = pCurrent;
                  pCurrent = pCurrent->m_pNext;
            }
            while (pCurrent != NULL);
      }
      return Find;
}

//Test Line
//
//this is simple text
//
//tree structure - next line is Next, symbols on the same line is children
//root
//!
//t->h->i->s
//!  !
//!  e->x->t
//i->s
//!
//s->a->m->p->l->e
//
void test03()
{
      Node* Root = new Node();
      Node* LastNode = Root;
      Node* FoundNode = NULL;
      Node* CurrentNode = NULL;
      //Set all fields to 0
      memset(Root,0, sizeof(Node));
      
      int LineNumber = 0;

      char* WordDelimiters = " ;-.,\t\n";
      char* Test = "  This is  sample Text is Trick";
    char Ch;
      bool NewWord = true;
      bool BeginOfLine = true;
      bool MiddleOfWord = false;

      //skip delimiters
      while (*Test!= '\0' && strchr(WordDelimiters, *Test))
      {
            Test++;
      }

      while (*Test != '\0')
      {
        Ch = *Test;
            //Word Delimiter found
            if (strchr(WordDelimiters, Ch) != NULL)
            {
                  NewWord = true;
                  LastNode->m_WordCount++;
                  AddLineNumber(LastNode,LineNumber);
                  LastNode = Root;
      
                  //skip delimiters
                  while (*Test!= '\0' && strchr(WordDelimiters, *Test))
                  {
                        Test++;
                  }
                  Ch = *Test;
                  FoundNode = FindSymbolNext(LastNode,Ch);
            }
            else
            {
                  FoundNode = FindSymbolChild(LastNode,Ch);
            }

            if (FoundNode != NULL)
            {
                  LastNode = FoundNode;
                  MiddleOfWord = true;
            }
            else
            {
                  Node* CurrentNode = new Node();
                  //Set all fields to 0
                  memset(CurrentNode,0, sizeof(Node));
                  CurrentNode->m_Letter = Ch;
                  if (NewWord || BeginOfLine)
                  {
                        if (MiddleOfWord)
                        {
                              LastNode = GetLastWordNode(LastNode->m_pChildren);
                        }
                        else
                        {
                              LastNode = GetLastWordNode(Root);
                        }
                        if (LastNode != NULL)
                        {
                              LastNode->m_pNext = CurrentNode;
                        }
                  }
                  else
                  {
                        LastNode->m_pChildren = CurrentNode;
                  }
                  NewWord = false;
                  MiddleOfWord = false;
                  LastNode = CurrentNode;
            }
            BeginOfLine = false;
            Test++;
      }
      //set values for the last word in line
      LastNode->m_WordCount++;
      AddLineNumber(LastNode,LineNumber);

      //print all words
      PrintAllWords(Root,NULL);
}
0

Featured Post

Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

Join & Write a Comment

IntroductionThis article is the second in a three part article series on the Visual Studio 2008 Debugger.  It provides tips in setting and using breakpoints. If not familiar with this debugger, you can find a basic introduction in the EE article loc…
Go is an acronym of golang, is a programming language developed Google in 2007. Go is a new language that is mostly in the C family, with significant input from Pascal/Modula/Oberon family. Hence Go arisen as low-level language with fast compilation…
The viewer will learn how to user default arguments when defining functions. This method of defining functions will be contrasted with the non-default-argument of defining functions.
The viewer will learn how to clear a vector as well as how to detect empty vectors in C++.

744 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

14 Experts available now in Live!

Get 1:1 Help Now