Solved

Read a file into an array and remove punctuation while changing everything to lowercase in C++

Posted on 2007-10-09
Medium Priority
1,956 Views
Last Modified: 2013-12-14
I am working on a text analyzer for my C++ class and I am a little stumped. I am reading a file into an array, but I want to convert everything to lowercase and remove punctuation from the file, and I was wondering if there is a way to do that. This is what I have so far. It reads everything into an array just fine, but I am stumped on how to do the rest.

#include <iostream>
#include <fstream>
#include <cstring>   // strlen, strdup
using namespace std;

//prototypes
char read (char *, int);


//Reads words in from a text file residing on the computer

char** read(const char* fileName, int& count)
{
  ifstream countingStream(fileName);
  // first count them
  count = 0;
  while (true)
  {
            char line[100];
            countingStream.getline(line, 100);
            if (strlen(line) == 0)
            {
                  break;
            }
            count += 1;
  }
  countingStream.close();

  ifstream readingStream(fileName);
  char** words = new char* [count];

  for (int index = 0; index < count; ++index)
  {
            char line[100];
            readingStream.getline(line, 100);
            words[index] = strdup(line);
            cout << line << std::endl;
  }
  readingStream.close();
  return words;
}
Question by:urobins
103 Comments
 

Author Comment

by:urobins
ID: 20045599
I know I can use tolower, but I'm not sure how to incorporate it. If someone has some ideas I'd really appreciate it :)
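As a minimal sketch of where tolower can fit, the following works directly on the char buffer that getline fills. The function name lowerAndStripPunct is hypothetical, and it assumes that "punctuation" means whatever std::ispunct reports in the default "C" locale:

```cpp
#include <cctype>   // std::tolower, std::ispunct

// Lower-cases `line` in place and drops every punctuation character.
// `line` must be a writable, NUL-terminated buffer (e.g. char line[100]).
void lowerAndStripPunct(char* line)
{
    char* out = line;                         // write position, never ahead of `in`
    for (char* in = line; *in != '\0'; ++in)
    {
        // <cctype> functions need a value representable as unsigned char
        unsigned char c = static_cast<unsigned char>(*in);
        if (std::ispunct(c))
            continue;                         // skip punctuation
        *out++ = static_cast<char>(std::tolower(c));
    }
    *out = '\0';                              // terminate the shortened string
}
```

Calling lowerAndStripPunct(line) right after readingStream.getline(line, 100) and before strdup(line) would store the cleaned-up version in the words array.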
 
LVL 6

Expert Comment

by:SeanDurkin
ID: 20046190
Well, one thing you could do is put it into a new array one character at a time, converting each to lower case if it's upper case and only putting in valid characters (letters, numbers, spaces, no punctuation). For example:

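size_t len = strlen(line);
char *newArr = new char[len + 1];
int j = 0;
for (size_t i = 0; i < len; ++i)
{
        // if it is an upper case character ('A'..'Z')
        if (line[i] >= 0x41 && line[i] <= 0x5A)
        {
                // add 0x20 to get the lower case version
                newArr[j++] = line[i] + 0x20;
        }
        else if ((line[i] >= 0x61 && line[i] <= 0x7A) ||   // 'a'..'z'
                 (line[i] >= 0x30 && line[i] <= 0x39) ||   // '0'..'9'
                 line[i] == 0x20)                          // ' ' (space)
        {
                // lower case letters, digits and spaces are kept unchanged
                newArr[j++] = line[i];
        }
        // anything else (punctuation) is simply skipped
}
newArr[j] = '\0';   // terminate the new string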

I used the hex values in the table from http://www.asciitable.com/

- seand
 
LVL 6

Expert Comment

by:SeanDurkin
ID: 20046194
Oh, and just to clarify:

0x41 is 'A'
0x5A is 'Z'
0x20 is ' ' (space)

You could definitely use the character literals instead of the hexadecimal values like I did, I'm just used to using them :).
 
LVL 5

Expert Comment

by:abith
ID: 20046474
Put all punctuation characters into an array, and write a function called ispunc that takes a char, checks whether it exists in that array and returns true if it does. Then follow this procedure:
// named TOLOWER so it doesn't clash with the standard tolower() function
#define TOLOWER(s) ((s) >= 'A' && (s) <= 'Z' ? (s) + ('a' - 'A') : (s))
#define isletter(s) (s>='a' && s<='z' ? true : false)

and call this before strdup, e.g. strdup(strremovepunc(line)):
char *strremovepunc(char *line)
{
      char c;
      char *start = line;
      int i, j, n = strlen(line);
      for (i = 0, j = 0; j < n; j++)     // j reads, i writes
      {
             c = TOLOWER(line[j]);
             if (isalphanum(c) || !ispunc(c))
                   line[i++] = c;        // store the lower case version
      }
      line[i] = '\0';                    // terminate the shortened string
      return start;
}
 
LVL 5

Expert Comment

by:abith
ID: 20046476
>> #define isletter(s) (s>='a' && s<='z' ? true : false)
should be replaced with
#define isalphanum(s) ( (s>='a' && s<='z') || (s>='0' && s<='9') ? true : false)
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20046783
You'd better read the file into an array rather than reading it twice. Moreover the check for an empty line isn't a valid end-of-file check:

   ...
#include <string>  // use std::string instead of char arrays
#include <vector>
   ...

   ...
   ifstream ifs(fileName);
   string line;
   vector<string> allLines;
   while (getline(ifs, line))
         allLines.push_back(line);
    ifs.close();

    // now you can iterate the vector, make your changes and
    // write the lines back to an output file ....

     ...
    for (size_t i = 0; i < allLines.size(); ++i)
    {
           string& s = allLines[i];  // by using a reference you may change
                                     // the line in the vector
           ...
         
Regards, Alex
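For reference, here is a minimal self-contained sketch of the read-into-a-vector idea above, combined with the lowercase/punctuation clean-up from the question. The names readAllLines and normalizeLines are hypothetical, and "punctuation" is assumed to mean whatever std::ispunct reports in the default "C" locale. Taking a std::istream rather than a file name lets the same code work with an ifstream or a string stream:

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <string>
#include <vector>

// Reads every line of `in` into a vector: one pass, no line-length limit.
std::vector<std::string> readAllLines(std::istream& in)
{
    std::vector<std::string> allLines;
    std::string line;
    while (std::getline(in, line))      // stops at end of file, not at an empty line
        allLines.push_back(line);
    return allLines;
}

// Lower-cases each stored line and removes punctuation, editing the
// strings inside the vector through a reference.
void normalizeLines(std::vector<std::string>& allLines)
{
    for (std::size_t i = 0; i < allLines.size(); ++i)
    {
        std::string& s = allLines[i];   // reference: changes stick in the vector
        std::string cleaned;
        for (std::size_t k = 0; k < s.size(); ++k)
        {
            unsigned char c = static_cast<unsigned char>(s[k]);
            if (!std::ispunct(c))
                cleaned += static_cast<char>(std::tolower(c));
        }
        s = cleaned;
    }
}
```

With a file this would be used as ifstream ifs(fileName); then readAllLines(ifs) followed by normalizeLines on the result. An empty line in the middle of the file is simply stored as an empty string; it does not end the loop.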
 

Author Comment

by:urobins
ID: 20047315
Thanks for the tips, I'll give these a shot

itsmeandnobodyelse: this is a homework assignment and I don't think we can use string. I asked yesterday but have not had a response. In our last project we were forced to use arrays of char instead of string, so I am guessing this is the same.
 

Author Comment

by:urobins
ID: 20047565
I need my array to read in at least 5000 words, however my current version doesn't get the whole file. Does anyone see what is wrong with it? I was creating a char line of 100 characters because I thought that would allow large words. Is that in fact limiting the size of my array for all words?
 
LVL 53

Expert Comment

by:Infinity08
ID: 20047657
>> however my current iteration doesn't get the whole file.

What do you mean by that ?

Can you show your last code ?
 

Author Comment

by:urobins
ID: 20047728
The code posted above, but here it is again

#include <iostream>
#include <fstream>
#include <cstring>   // strlen, strdup
using namespace std;

//prototypes
char read (char *, int);


//Reads words in from a text file residing on the computer

char** read(const char* fileName, int& count)
{
  ifstream countingStream(fileName);
  // first count them
  count = 0;
  while (true)
  {
            char line[100];
            countingStream.getline(line, 100);
            if (strlen(line) == 0)
            {
                  break;
            }
            count += 1;
  }
  countingStream.close();

  ifstream readingStream(fileName);
  char** words = new char* [count];

  for (int index = 0; index < count; ++index)
  {
            char line[100];
            readingStream.getline(line, 100);
            words[index] = strdup(line);
            cout << line << endl;
  }
  readingStream.close();
  return words;
}

I used this in my last project and thought I could reuse it, but it doesn't seem to make it through the entire file. I tried upping the line size from 100 to 10000 and it got much further, but I think I don't understand that quite right. I thought I was creating an array for a single word of up to 100 characters. Am I really creating an array of only 100 words? In that case I would want to set it much higher or allocate it dynamically, but I am not sure.
 
LVL 53

Expert Comment

by:Infinity08
ID: 20047754
>> The code posted above, but here it is again

I was asking because I thought you modified something after all the comments that were already given ...


Your code stops reading when it reads a line of length 0 :

            if (strlen(line) == 0)
            {
                  break;
            }

I'd look into that ... are you sure you want to use that as your stop condition ?

Another thing to check : are you sure that all lines are shorter than 100 characters ?
 
LVL 53

Expert Comment

by:Infinity08
ID: 20047760
btw, do you realize that the prototype for the read function doesn't match the read function you implemented ?
 

Author Comment

by:urobins
ID: 20047769
No, I am not sure how long the lines will be. I was trying to capture words individually, so I think I need to change that. I have been experimenting with the good function this morning but haven't gotten too far:
while (myInputStream.good())

I think I need to read in the text, then tokenize it, using spaces and/or punctuation as stopping points, but I really wasn't sure.

So should I really be breaking when I find whitespace or punctuation, not 0?
 

Author Comment

by:urobins
ID: 20047777
Yeah, I just noticed that :) as I was posting my response to you.
 
LVL 53

Expert Comment

by:Infinity08
ID: 20047789
>> No I am not sure how short the lines will be I was trying to capture words individually so I think I need to change that.

Yes, if a line can contain more than one word, then you'll indeed have to change the logic a bit. A good approach is to read a line (into a buffer of sufficient size), then tokenize it on whitespace/punctuation to get the words, then read the next line and do the same.
 
LVL 6

Expert Comment

by:SeanDurkin
ID: 20047830
You're creating an array to read in one line at a time, and it should hold 99 characters (the 100th is for the null terminating character). You could make it larger to be on the safe side, but I have a feeling the way you check for the length of characters extracted is the problem. Instead of

if (strlen(line) == 0)
{
        break;
}

try

if(readingStream.gcount() == 0)
{
        break;
}

istream's gcount() member function returns the number of characters extracted by the last unformatted input operation, such as getline. See this for more information: http://www.thescripts.com/forum/thread59634.html
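Why the gcount() check behaves differently from strlen() can be seen in a small self-contained sketch. The function name countLines is hypothetical, and it counts lines from any std::istream so it can be tried on a string stream. The key point is that getline also extracts the '\n', so an empty line gives gcount() == 1, and only a read that extracts nothing (end of file) gives 0:

```cpp
#include <istream>

// Counts the lines in `in`, stopping only when getline extracts nothing.
int countLines(std::istream& in)
{
    int count = 0;
    while (true)
    {
        char line[100];
        in.getline(line, 100);
        if (in.gcount() == 0)
            break;          // nothing extracted: end of file (or a failed stream)
        // caveat: a line longer than 99 characters sets failbit, so the
        // next getline extracts nothing and the loop stops early there
        count += 1;
    }
    return count;
}
```

An empty line in the middle of the file is therefore counted instead of ending the loop, which is exactly the case where the strlen(line) == 0 test stops too soon.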
 

Author Comment

by:urobins
ID: 20048150
Infinity, do you have a link to an example of what you are talking about? I tried googling and didn't get much useful information. I am not sure how to read a line into a buffer and then pick up where I left off. strtok I think I understand, but I am having issues reading multiple lines. Thanks!
 
LVL 53

Expert Comment

by:Infinity08
ID: 20048191
>> I am not sure how to read a line to a buffer

The same way you were already doing it - just provide a bigger buffer (right now it's 100 characters). Then you can process the line you read, and split it up in words. Take a look at strtok for that :

        http://www.cplusplus.com/reference/clibrary/cstring/strtok.html

(since you have to use character arrays instead of std::string)


>> but I am having issues reading multiple lines.

You're already doing that ... take a look at your code :

  for (int index = 0; index < count; ++index)
  {
            char line[100];
            readingStream.getline(line, 100);
            // do something with line here ...
  }


Note that you don't have to go over the file twice ... just go over it once.
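A single pass that reads a line and immediately splits it with strtok could look like the following minimal sketch. The function name splitWords is hypothetical, the separator set is the one used later in this thread plus a tab, and it assumes no line is longer than 999 characters (buffer of 1000). Each word is written to an output stream here just to show where it becomes available:

```cpp
#include <cstring>   // std::strtok
#include <istream>
#include <ostream>

// Reads `in` once, line by line, splits each line into words with strtok,
// writes each word to `out` on its own line and returns the word count.
int splitWords(std::istream& in, std::ostream& out)
{
    const char* delims = " \t,.-!?";   // separators; extend as needed
    int wordCount = 0;
    char line[1000];
    while (in.getline(line, 1000))      // false once nothing more can be read
    {
        // strtok writes '\0' into `line`, so it must be a writable buffer
        for (char* tok = std::strtok(line, delims); tok != nullptr;
             tok = std::strtok(nullptr, delims))
        {
            out << tok << '\n';
            ++wordCount;
        }
    }
    return wordCount;
}
```

Since the words are handled as soon as they are found, there is no need for a first counting pass over the file.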
 

Author Comment

by:urobins
ID: 20048282
Oh okay! Yeah I can use string on this assignment, but the professor says he doesn't see how it would help to use strings instead.
 
Do you see any advantage to that? I'm going to work on the strtok now. Thanks so much for your help again.

 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20048596
>>>> but the professor says he doesn't see how it would help to use strings instead
it helps a lot, as you could get rid of the fixed-size char arrays, the dynamic words array and the two loops.

I assume you need to write an output file with all your changes.

- Open the output file (ofstream) together with the input file.
- Take my loop from above but omit the vector<string>
- instead make the necessary changes to the line and write it back to the output file.

Use functions to convert a string to lowercase characters and to remove the punctuation. The functions may have the prototypes

     string makeLower(const string& s);
     string removePunctuation(const string& s);

In both functions you would iterate thru the characters of the argument string and fill a new temporary string by adding the current iteration character (either as lowercase char or if not a punctuation char). Finally return the temporary string.

After you applied both functions to the line currently read, you write the changed line to the output file.
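The two functions described above, with the prototypes Alex gives, might be filled in like this minimal sketch. It assumes "punctuation" means whatever std::ispunct reports in the default "C" locale:

```cpp
#include <cctype>
#include <string>

// Returns a copy of s with every letter converted to lower case.
std::string makeLower(const std::string& s)
{
    std::string result;
    for (std::string::size_type i = 0; i < s.size(); ++i)
        result += static_cast<char>(std::tolower(static_cast<unsigned char>(s[i])));
    return result;
}

// Returns a copy of s without the characters std::ispunct classifies as punctuation.
std::string removePunctuation(const std::string& s)
{
    std::string result;
    for (std::string::size_type i = 0; i < s.size(); ++i)
        if (!std::ispunct(static_cast<unsigned char>(s[i])))
            result += s[i];
    return result;
}
```

Applied to each line read, e.g. removePunctuation(makeLower(line)), the result can go straight to the output file or to cout.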

Hope you understand what I mean.

Regards, Alex


 

Author Comment

by:urobins
ID: 20048805
I am trying to wrap my brain around that now.  I will take a crack at it here soon.  We just have to put the output to the screen so an output file isn't necessary but won't hurt :).

Ultimately, my task is to take this large text file (5000+ words), iterate through it and output how many times each letter shows up, how many times each word shows up (in the order they appear), and how many 1, 2, 3, 4, 5... letter words show up. So I wanted to bring everything in and convert it to lowercase so that I won't have to worry about case when I do my comparisons.
 
LVL 53

Expert Comment

by:Infinity08
ID: 20048839
>> We just have to put the output to the screen

That's even simpler :) You don't need to store the words in an array, nor do you need to open an output file - you just write them to the standard output.


>> So I wanted to bring everything in and convert it to lower case so that way I won't have to worry when I do my comparisons.

Sounds like a good plan. You can however get a word from the file, and immediately increment counters as you go, without storing the word (unless it's a new word that you haven't encountered yet).
 

Author Comment

by:urobins
ID: 20048970
Yeah, my basic outline (I haven't written code this far yet) was brute force.
I would have all words stored in an array (a huge array, I know) and
iterate through the array one word at a time, so I would say

if (array[index] == array[index + 1])
     increment counter
when I hit the end of the array
cout << array[index] << " appeared " << counter << " times";

or something like that, but it seems rather inefficient.

The part I'm having the most problem with is finding out how many times 1, 2, 3, 4, 5... letter words appear. I guess I would need to go through the whole array, set a flag for each strlen and then count the number of times the trues show up for each length??? Does this sound feasible?

-U
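Instead of flags, the word-length question can be handled with one pass and an array of counters indexed by length, the same frequency-table idea that comes up below for letters. In this minimal sketch the function name countLength and the cap MAX_LEN are hypothetical:

```cpp
#include <cstring>   // std::strlen

const int MAX_LEN = 30;   // hypothetical cap on word length

// lengthCount[n] holds how many words have exactly n characters;
// lengthCount[MAX_LEN] collects every word of MAX_LEN or more characters.
void countLength(const char* word, int lengthCount[MAX_LEN + 1])
{
    int n = static_cast<int>(std::strlen(word));
    if (n > MAX_LEN)
        n = MAX_LEN;          // very long words share the last slot
    lengthCount[n]++;
}
```

With int lengthCount[MAX_LEN + 1] = { 0 }; declared once, calling countLength for every word found and then printing slots 1 to MAX_LEN gives the 1, 2, 3... letter word totals.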
 
LVL 39

Accepted Solution

by:
itsmeandnobodyelse earned 1200 total points
ID: 20049216
>>>> out put how many times each letter shows up
if counting alphabetics only (and not counting umlauts) you only need an integer array of 26 all initialized with 0.

     int countAlpha[26] = { 0 };

Then, once you have recognized that a char is in the range 'a' to 'z' (inclusive), you can increment its counter with

    countAlpha[c - 'a']++;

Same you can do for the digits by using an array of 10 and by subtracting '0' to get the index of a digit to count.
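Both tables can be updated from one small helper, sketched below. The name countChar is hypothetical, and the code assumes the character has already been lower-cased and that 'a'..'z' and '0'..'9' are contiguous (true for ASCII):

```cpp
// Updates the two frequency tables for one character.
// countAlpha has 26 slots ('a'..'z'), countDigit has 10 slots ('0'..'9').
void countChar(char c, int countAlpha[26], int countDigit[10])
{
    if (c >= 'a' && c <= 'z')
        countAlpha[c - 'a']++;        // 'a' -> 0, 'b' -> 1, ...
    else if (c >= '0' && c <= '9')
        countDigit[c - '0']++;        // '0' -> 0, '1' -> 1, ...
    // anything else (spaces, punctuation) is ignored
}
```

Afterwards, countAlpha[i] is the count for the letter 'a' + i, which makes printing the table a simple loop.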


Counting occurrences of words is indeed another game, because you need a container where you can collect the words. If you can use std::map it is easiest:

      map<string, int> words;

      ....
                // assume the found word is in string word
                words[word]++;     // wow, that is all

Finally, iterate through the map and output the occurrences.
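A minimal sketch of both steps, counting and then iterating, is shown below. The helper names addWord and printCounts are hypothetical. Note that a std::map keeps its keys sorted, so the output comes out in alphabetical order rather than in order of appearance:

```cpp
#include <map>
#include <ostream>
#include <string>

// Counts one occurrence of `word`; operator[] inserts the word with a
// count of 0 the first time it is seen, then ++ makes it 1.
void addWord(std::map<std::string, int>& words, const std::string& word)
{
    words[word]++;
}

// Writes one "word: count" line per distinct word, in sorted key order.
void printCounts(const std::map<std::string, int>& words, std::ostream& out)
{
    for (std::map<std::string, int>::const_iterator it = words.begin();
         it != words.end(); ++it)
        out << it->first << ": " << it->second << '\n';
}
```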

If you could use std::vector but not std::map, it is much less easy. You could do:

- add each word found to the vector<string> with push_back
- finally sort the vector using std::sort function
- iterate the vector counting each occurrence of a word until the word changes.
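The three steps above can be sketched as follows. The function name printSortedCounts is hypothetical; it takes the vector by value so that the caller's original word order is left untouched:

```cpp
#include <algorithm>  // std::sort
#include <cstddef>
#include <ostream>
#include <string>
#include <vector>

// Sorts a copy of the collected words and prints each distinct word with
// its count, counting how long it repeats until the word changes.
void printSortedCounts(std::vector<std::string> words, std::ostream& out)
{
    std::sort(words.begin(), words.end());
    std::size_t i = 0;
    while (i < words.size())
    {
        std::size_t j = i;
        while (j < words.size() && words[j] == words[i])
            ++j;                                // skip over the run of equal words
        out << words[i] << ": " << (j - i) << '\n';
        i = j;                                  // next distinct word
    }
}
```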

If you can't use a standard container (besides string), you are lost. Then you either

would need to make a little dynamic container yourself (begin with 'string* words = new string[128]; int wordssize = 128;'. If wordscount >= wordssize,  allocate a new string array of size 'wordssize + 128', copy the old array to the new array in a loop,  delete the old array, assign the pointer of the new array to words and increment the wordssize by 128. Of course you may use bigger chunks than 128 strings if you want)  

or

have the two loops as you have now where the first loop is only counting and the second loop would add to a correctly sized string array.

In both cases you have a string array of size wordscount which now needs to get sorted. Use a bubble sort (two nested loops and swapping if the right is less than the left) if sorting isn't a main task.
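A rough sketch of the "make a little dynamic container yourself" route, followed by the bubble sort, might look like this. WordArray, initWords, appendWord, bubbleSort and freeWords are hypothetical names; std::string is still used, since only the standard containers are assumed to be off limits:

```cpp
#include <string>

// Minimal growable array of strings: starts with 128 slots and grows
// by another 128 whenever it is full, as described above.
struct WordArray
{
    std::string* words;
    int wordsCount;   // slots in use
    int wordsSize;    // slots allocated
};

void initWords(WordArray& a)
{
    a.wordsSize = 128;
    a.wordsCount = 0;
    a.words = new std::string[a.wordsSize];
}

void appendWord(WordArray& a, const std::string& w)
{
    if (a.wordsCount >= a.wordsSize)
    {
        std::string* bigger = new std::string[a.wordsSize + 128];
        for (int i = 0; i < a.wordsCount; ++i)
            bigger[i] = a.words[i];            // copy the old entries
        delete[] a.words;                      // free the old array
        a.words = bigger;
        a.wordsSize += 128;
    }
    a.words[a.wordsCount++] = w;
}

// Bubble sort: repeatedly swap neighbours where the right is less than the left.
void bubbleSort(WordArray& a)
{
    for (int pass = 0; pass < a.wordsCount - 1; ++pass)
        for (int i = 0; i + 1 < a.wordsCount - pass; ++i)
            if (a.words[i + 1] < a.words[i])
                a.words[i].swap(a.words[i + 1]);
}

void freeWords(WordArray& a)
{
    delete[] a.words;
    a.words = 0;
    a.wordsCount = a.wordsSize = 0;
}
```

After bubbleSort, equal words sit next to each other, so the counts can be taken in one loop that watches for the word to change.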


>>>> each word shows up in the order they appear
Hmm, if you really need the initial order, you cannot sort. In that case you either have to use a second array for sorting (you would then actually sort the indices of the first array rather than the words) *or* eliminate duplicates when reading them from the file. You then also need a second integer array to hold the counts for each word. The last method is the slowest, because you would need to compare n/2 * n/2 times if n is the number of words.
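The "eliminate duplicates when reading" variant, which keeps words in the order they first appear, can be sketched like this. countInOrder is a hypothetical name, and std::vector stands in for the two parallel arrays only for brevity; with plain arrays you would size them from a first counting pass instead:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Keeps words in first-appearance order with a parallel count array.
// Every word is compared against all distinct words seen so far,
// which is why this is the slow option for large inputs.
void countInOrder(const std::string& word,
                  std::vector<std::string>& words, std::vector<int>& counts)
{
    for (std::size_t i = 0; i < words.size(); ++i)
    {
        if (words[i] == word)
        {
            counts[i]++;          // seen before: bump its counter
            return;
        }
    }
    words.push_back(word);        // first occurrence: append at the end
    counts.push_back(1);
}
```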

Regards, Alex

 
LVL 53

Expert Comment

by:Infinity08
ID: 20049312
Take a look at the concept of a frequency table (frequency distribution). It will help you a lot in counting occurrences of words and letters. (see also Alex's post for a very clear description of how to practically use it).
 

Author Comment

by:urobins
ID: 20049357
Thanks for all of the help! I am going to start really hitting this hard this evening when I get home from work. I have been trying little things all day here but I keep getting interrupted with real work ;-)
 

Author Comment

by:urobins
ID: 20049380
Thanks Infinity, I will check that out. You guys have been huge in helping me to understand this and get my ideas fleshed out. I'll try to get some code up and operational so I can re-post, close this ticket out and get you guys your points!
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20049487
I unintentionally posted my last comment as 'Proposed Solution'. Though the comment was not 'bad' ;-) you shouldn't take that as a 'request' but regard it like any other comment. I never use these radio buttons.
 

Author Comment

by:urobins
ID: 20049567
LOL, I didn't even notice :)  
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20050291
>>>> LOL, I didn't even notice :)
I noticed because of the 'colors'. All other comments were 'mouse-gray' with the new design.
0
 
LVL 53

Expert Comment

by:Infinity08
ID: 20050679
>> I noticed because of the 'colors'. All other comments were 'mouse-gray' with the new design.

I don't see a difference, but that might be because I'm using the Expert skin.
 

Author Comment

by:urobins
ID: 20053204
>>>Counting occurrences of words is indeed another game cause you would need a container where you can collect the words. If you could use std::map it would be easiest:

  >>>  map<string, int> words

I'm not sure what is meant by this, I have never heard of map, which library is this in so I can try to find more info on it?
 

Author Comment

by:urobins
ID: 20053418
How do I read my words into an array of strings? I am getting an error when trying to tokenize the string and put the resulting word into my words array...
 This is what I have come up with... I left the counting part in there as my instructor requested we use it... I am just really confused with all this switching back and forth between char** and string etc...

#include <iostream>
#include <fstream>
using namespace::std;

char** read(const char* fileName, int& count)
{
      std::ifstream countingStream(fileName);
      // first count them
      count = 0;
      while (countingStream.good())
      {
            char line[1000];
            countingStream.getline(line, 1000);
            count += 1;
      }

      countingStream.close();
      ifstream readingStream(fileName);
      char** words = new char* [count];
      string word;
      for (int index = 0; index < count; ++index)
      {
            char line[1000];
            readingStream.getline(line, 1000);
            while (readingStream.good())
            {
                  word= strtok (line," ,.-!?");      
                  words[index] = word;
                  cout << words[index] << endl;
            }
            
            //cout << line << endl;
      }
      readingStream.close();
      return words;
}




int main ()
{
      int wordCount;
      char** text = read("c:\\proj2test.txt",wordCount);
      
      
      system("pause");
      return 0;
}


 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20054596
>>>> I'm not sure what is meant by this, I have never heard of map

map is in the STL (Standard Template Library). Like most templates it is source only, so you only have to include one header file:

#include <map>
using namespace std;

A map has two template arguments (types). The first is the key type, the second is the value type, e.g.

   map<string, int> mymap;

would define a dictionary where the key is a string and the value is an integer. Then you can do:

     mymap["Hello"]  = 5;
     mymap["Hello"]++;  // now the value is 6

Importantly, any 'key' may occur only once.

   mymap[word]++;

would increment the occurrence count for the string in word by 1. So you can easily count the frequencies without needing to check whether the word already exists or not. If it is a new word, the value is initialized with 0, which is exactly what we need here.

>>>>
             while (readingStream.good())
             {
                 ...
                 words[index] = word;

What is that loop for? The loop is infinite, as the state of the stream does not change within the loop.

Regards, Alex
 
 
LVL 53

Assisted Solution

by:Infinity08
Infinity08 earned 800 total points
ID: 20054719
>>       char** words = new char* [count];

Since you want to store words, and not lines, you shouldn't use the line count to create your words array (a line can contain more than one word, can't it ?).

Instead, if you're allowed to, you can go with an STL container, like a map as suggested by Alex. More info here :

        http://www.cplusplus.com/reference/stl/map/

So, you would read in count lines, tokenize each line, and increment the counter for each found word in the words map.

Can you use a map ? If not, what are you allowed to use ?


Btw, don't forget to also keep a frequency table for the letters ...
 

Author Comment

by:urobins
ID: 20055197
Yeah I think I can use a map, he hasn't said no.  I will verify for sure.  I'll be looking at this today at work and hopefully making some progress, got stuck working on a downed network last night, always fun.. Thanks for all of your help!
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20055458
>>>>         word= strtok (line," ,.-!?");
That doesn't compile for a few reasons:

strtok needs a writable char buffer as its first argument, because it replaces the separators it finds with zero characters. That way it doesn't need to allocate storage for the tokens it returns but can simply point into the original buffer you provide. A string 'line' isn't a writable char buffer; you can only get a const char buffer by using line.c_str().

strtok is highly appreciated by C programmers. For C++ I would recommend the following, which is only a few statements more but much safer:

       int pos = 0;
       int lpos = 0;
       line += ' ';  // add a extra space at end so that we get a final delimiter
       while ((pos = (int)line.find_first_of(" ,.-!?", lpos)) != string::npos)
       {
             if (pos > lpos)
             {
                    string word = line.substr(pos, lpos - pos);
                    // do something with the word now
                    ...
             }
             lpos = pos + 1;   // next start for find_first_of
       }

The above also answers your instructor's question of how it would help to use strings instead of char arrays.
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20055462
Correction:

     string word = line.substr(pos, pos - lpos);  // exchanged pos and lpos
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20055467
Grrrrr:

     string word = line.substr(lpos, pos - lpos);  // exchanged pos and lpos

Sorry.
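Putting the corrected substr call back into Alex's loop gives something like the following self-contained sketch. The function name splitLine is hypothetical; it uses std::string::size_type for the positions, which avoids the (int) cast, and returns the words in a vector only so the result is easy to inspect:

```cpp
#include <string>
#include <vector>

// Splits `line` on the separators used in this thread (" ,.-!?").
std::vector<std::string> splitLine(std::string line)
{
    std::vector<std::string> words;
    const std::string delims = " ,.-!?";
    line += ' ';                         // guarantee a final delimiter
    std::string::size_type lpos = 0;
    std::string::size_type pos;
    while ((pos = line.find_first_of(delims, lpos)) != std::string::npos)
    {
        if (pos > lpos)                  // skip empty tokens between delimiters
            words.push_back(line.substr(lpos, pos - lpos));
        lpos = pos + 1;                  // next search starts after the delimiter
    }
    return words;
}
```

For "Hi, there - friend!" this yields "Hi", "there" and "friend"; consecutive separators such as ", " or " - " produce no empty words because of the pos > lpos check.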
 
LVL 53

Expert Comment

by:Infinity08
ID: 20055845
I'm confused now ... can you use C++ strings or not ? Because earlier you said you couldn't if I'm not mistaken ...
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20055871
>>>> Because earlier you said you couldn't if I'm not mistaken ...

>> Oh okay! Yeah I can use string on this assignment,
>> but the professor says he doesn't see how it would
>> help to use strings instead.

I see.


 

Author Comment

by:urobins
ID: 20055936
Yeah, he finally agreed to allow us to use strings; he just said that he didn't see why it would help.

So I am trying to re-write everything I had now and go this route as it seems more efficient and slightly easier than what I was dealing with.

 

Author Comment

by:urobins
ID: 20055969
>>>> what is that loop good for? The loop is infinite as the status of the stream was not changing within the loop

Yeah I found that out the hard way, just another attempt at working through the line to tokenize the strings.
 

Author Comment

by:urobins
ID: 20056058
I tried making the change you recommended (Alex) but I am getting the following compilation errors. I was just trying to see what kind of output I would get; I haven't implemented the char counting yet.

 error C2106: '+=' : left operand must be l-value
(38) : error C2228: left of '.find_first_of' must have class/struct/union 1>        type is 'char [1000]'
(38) : fatal error C1903: unable to recover from previous error(s); stopping compilation

My code looks like this:

#include <iostream>
#include <fstream>
#include <map>

using namespace std;

//Reads words in from a text file residing on the computer

char** read(const char* fileName, int& count)
{
  ifstream countingStream(fileName);
  // first count them
  count = 0;
   while (countingStream.good())
      {
            char line[1000];
            countingStream.getline(line, 1000);
            count += 1;
      }
  countingStream.close();

  ifstream readingStream(fileName);

  map<string,int> wordMap;


  for (int index = 0; index < count; ++index)
  {
                  char line[1000];
            readingStream.getline(line, 1000);
                  int pos = 0;
                  int lpos = 0;
                  line += ' ';  
                  while ((pos = (int)line.find_first_of(" ,.-!?", lpos)) != string::npos)
                  {
                        if (pos > lpos)
                        {
                    string word = line.substr(lpos, pos - lpos);
                    wordMap[word];
                        }
                        lpos = pos + 1;  
                  }

           
            cout << wordMap[word]<< std::endl;
  }
  readingStream.close();
  return words;
}
 

Author Comment

by:urobins
ID: 20056212
I was reading up on map.  Do I need to create a template to use it?
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20056360
>>>> Do I need to create a template to use it?
Hmm the 'template' is the map class. You need to create an 'instance' by supplying concrete types:

   map<string, int> wordsmap;

Note, a map is a dynamically increasing container. You don't need an allocation and can add your words already in the first (reading) loop.

Regards, Alex
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20056384
>>>> char line[1000];
make it

      string line;

and change

            readingStream.getline(line, 1000);

to

            getline(readingStream, line);  // no size required
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20056449
As already told: if taking a map you don't need two loops. Simply read line by line, extract words and characters and increment the counters:

   string line;
   map<string,int> wordMap;
   int  alphacount[26] = { 0 };
   int  numcount[10] = { 0 };


   while (getline(readingStream, line))
   {
          // extract words
          int lpos = 0;
          ...
          while ((pos = (int)line.find_first_of....
          {
                 ...
                 string word = line.substr(...

                 // increment count of word via map
                 ...
                 // for each char in word check if alpha or digit and
                 // update counters if so
                 ...
           }
   }    

    // close file
    ...
    // output all counters
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20056527
>>>> wordMap[word];

You forgot to increment the counter. Note, wordMap::operator[] returns a *reference* to the 'int' value associated with the 'word'. So it is an l-value, which can be incremented without using a temporary. Check my first posts regarding map.
 

Author Comment

by:urobins
ID: 20056589
Thanks for the quick replies.  I will implement these right away!  You should be an online teacher :)
 

Author Comment

by:urobins
ID: 20056655
>> As already told: if taking a map you don't need two loops. Simply read line by line, extract words and characters and increment the counters:


Thanks I was confused, my professor (though he means very well) has a bad habit of confusing me as English is his second language sometimes I have a hard time figuring out what he was really trying to tell me.
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20056751
>>>> has a bad habit of confusing me as English is his second language sometimes

yes, English is my second language most times as well ;-)

>>>> I have a hard time figuring out what he was really trying to tell me

Note, if not using a dynamically growing container like std::map, you actually need two loops. The first is only to find out how big the arrays for the words and the associated counters must be allocated.

 

Author Comment

by:urobins
ID: 20057095
Okay in my previous code I was returning the array to main to work with.  Should I just handle all counting etc within this one function and only call one function from main?  In this case my read function would be void as it returns nothing right?
 
LVL 53

Expert Comment

by:Infinity08
ID: 20057142
>> >>>> Because earlier you said you couldn't if I'm not mistaken ...
>> 
>> >> Oh okay! Yeah I can use string on this assignment,
>> >> but the professor says he doesn't see how it would
>> >> help to use strings instead.
>> 
>> I see.

I said I was confused ;) Thanks for clearing that up ...


>> yes, English is my second language most times as well ;-)

lol ... and the rest of the time ?
 
LVL 53

Expert Comment

by:Infinity08
ID: 20057159
>> Should I just handle all counting etc within this one function and only call one function from main?

I wouldn't do that ... the read function is already quite big. And btw, the name read for the function implies that it reads data, not that it does anything else with it. Just let it read the data from the file, and store it in the arrays. Then in some other function you can show the results etc.
 

Author Comment

by:urobins
ID: 20057326
>>lol ... and the rest of the time ?
LOL

Okay, so I should still have the function return an array and not a map? I am just a little confused with the addition of the map. When I put the value in the map and increment it, do I also need to place that string in an array? So my next question is how do I set the array up to work with strings, or do I not need to worry about that and just use the map for the words but increment the array for the letter count? Sorry, I am just really lost.

Thanks again!
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20057379
>>>> lol ... and the rest of the time ?

In my 'deep sleep' I am formulating wise comments as a native English speaker ...

>>>> Just let it read the data from the file, and store it in the arrays.
>>>> Then in some other function you can show the results etc.

Don't confuse me!

Do you mean all text should be stored in a container? Why?

Or do you mean that the dictionary and the counter arrays should be passed to read() and all counters set while reading?

 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20057461
I would suggest calling the function readAndCount and passing the containers:

   map<string,int> wordMap;
   int  alphacount[26] = { 0 };
   int  numcount[10] = { 0 };

   int lines = readAndCount("input.txt", wordMap, alphacount, numcount);
   
   if (lines <= 0)
       return 1;


The  prototype for 'readAndCount' would be

int readAndCount(const char* file, map<string, int>& m,
                             int alphacount[26], int numcount[10]);


Expert Comment

by:Infinity08
ID: 20057489
>> Or do you mean that the dictionary and the counter arrays should be passed to read() and all counters set while reading?

The second ;) The map with the words, and the array with the letter frequencies can be filled by the read function, and then you can use that for whatever purpose you need it (showing to the user eg.).


>> Don't confuse me!

Heh, a lot of confusion going on lately :)


>> Sorry I am just really lost.

The map is used to store the unique words in, and the number of times they occur in the file.
Then you need one more array of 26 ints to store the letter frequencies.
That's all.
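To make that concrete, here is a minimal sketch of the map part, assuming the words have already been split out of the line (countWords is a hypothetical helper, not something from the code in this thread):

```cpp
#include <map>
#include <string>
#include <vector>

// Counts how often each word occurs.
// std::map::operator[] inserts a value-initialized int (0) the first time a
// key is looked up, so ++counts[w] works without first checking whether w
// is already in the map.
std::map<std::string, int> countWords(const std::vector<std::string>& words)
{
    std::map<std::string, int> counts;
    for (const std::string& w : words)
        ++counts[w];
    return counts;
}
```

The 26-element int array for the letters works the same way, just indexed by letter instead of by word.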

Author Comment

by:urobins
ID: 20058263
Okay, thanks, I think I am finally picking up on this. I have to run to another meeting, but I'll try to implement it when I get out! Thank you so much, you guys are great!

Author Comment

by:urobins
ID: 20058878
Okay so I am reading this now and want to run this by you.  

I should have my function read and count.
in main I would declare my map and my two arrays
pass this info to readandcount this will read the file put words into the map and count my letters and words?  Sounds easy enough, I'm gonna start hammering away thanks again!

Expert Comment

by:Infinity08
ID: 20059044
>> I should have my function read and count.
>> in main I would declare my map and my two arrays
>> pass this info to readandcount this will read the file put words into the map and count my letters and words?

Correct. Keep us informed of your progress, and don't hesitate to ask questions if needed.

Author Comment

by:urobins
ID: 20059060
Thanks, I really appreciate it. I will hopefully get some stuff done before I leave work for the day, but I'll really be hitting it hard tonight.

Author Comment

by:urobins
ID: 20061915
Okay, here is what I have so far. I'm just trying to get it to compile, but I'm not having much luck. It says getline is an unknown identifier and that alphacount is a redefinition.

here is my interpretation of your suggestions :)  

#include <iostream>
#include <fstream>
#include <map>
using namespace::std;

int readAndCount(const char* fileName, map<string, int>& m,int alphacount[26]);


int readAndCount(const char* fileName, map<string, int>& m,int alphacount[26])
{
      string line;
      map<string,int> wordMap;
      int  alphacount[26] = { 0 };
      
      ifstream readingStream(fileName);

      while (getline(readingStream, line))
      {
          // extract words
          int lPos = 0;
          int pos=0;

          while ((pos = (int)line.find_first_of(" ,.-!?", lpos)) != string::npos)

          {                
                string word = line.substr(lpos, pos - lpos);
                        wordMap[word]++;
               
           }
    }    
      readingStream.close();
      if (lines <= 0)
       return 1;
}

int main ()
{
      int lines = readAndCount("proj2test.txt", wordMap, alphacount);
      return 0;
}

Author Comment

by:urobins
ID: 20062039
Sorry, I just saw one of my mistakes, but it still doesn't like getline... Not sure why. I was hoping to get it to read the words into wordMap, and then count characters once I get that portion working. Just wanted to see something work :)



#include <iostream>
#include <fstream>
#include <map>
using namespace::std;

int readAndCount(const char* fileName, map<string, int>& m,int alphacount[26]);


int readAndCount(const char* fileName, map<string, int>& m,int alphacount[26])
{
      string line;
      
      
      ifstream readingStream(fileName);

      while (getline(readingStream, line))
      {
          // extract words
          int lPos = 0;
          int pos=0;

          while ((pos = (int)line.find_first_of(" ,.-!?", lpos)) != string::npos)

          {                
                string word = line.substr(lpos, pos - lpos);
                        wordMap[word]++;
               
           }
    }    
      readingStream.close();
      if (lines <= 0)
       return 1;
}

int main ()
{
      map<string,int> wordMap;
      int  alphacount[26] = { 0 };

      int lines = readAndCount("proj2test.txt", wordMap, alphacount);
      return 0;
}

Author Comment

by:urobins
ID: 20062089
Few more changes :)

#include <iostream>
#include <fstream>
#include <map>
using namespace::std;

int readAndCount(const char* fileName, map<string, int>& m,int alphacount[26]);


int readAndCount(const char* fileName, map<string, int>& m,int alphacount[26])
{
      string line;
      
      
      ifstream readingStream(fileName);

      while (getline(readingStream, line))
      {
          // extract words
          int lpos = 0;
          int pos=0;

          while ((pos = (int)line.find_first_of(" ,.-!?", lpos)) != string::npos)

          {                
                string word = line.substr(lpos, pos - lpos);
                        wordMap[word]++;
               
           }
    }    
      readingStream.close();
      return 0;
      
}

int main ()
{
      map<string,int> wordMap;
      int  alphacount[26] = { 0 };

      int lines = readAndCount("proj2test.txt", wordMap, alphacount);
      if (lines <= 0)
       return 1;
      return 0;
}

Author Comment

by:urobins
ID: 20062126
Okay, now the only error I am getting is on getline, saying it is an unknown identifier. Any ideas?

#include <iostream>
#include <fstream>
#include <map>
using namespace::std;

int readAndCount(const char* fileName, map<string, int>& wordMap,int alphacount[26]);


int readAndCount(const char* fileName, map<string, int>& wordMap,int alphacount[26])
{
      string line;
      
      
      ifstream readingStream(fileName);

      while (getline(readingStream, line))
      {
          // extract words
          int lpos = 0;
          int pos=0;

          while ((pos = (int)line.find_first_of(" ,.-!?", lpos)) != string::npos)

          {                
                string word = line.substr(lpos, pos - lpos);
                        wordMap[word]++;
               
                        // read through word and count letters... not sure exactly how yet
           }
    }    
      readingStream.close();
      return 0;
      
}

int main ()
{
      map<string,int> wordMap;
      int  alphacount[26] = { 0 };

      int lines = readAndCount("proj2test.txt", wordMap, alphacount);
      if (lines <= 0)
       return 1;
      return 0;
}

Expert Comment

by:Infinity08
ID: 20063480
You have to add

        #include <string>

for the getline and the string usage.
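A small sketch of the getline loop on its own, for checking that part in isolation. It reads from an std::istringstream instead of a file so nothing has to exist on disk; splitLines is a made-up name, and with an ifstream the loop is identical:

```cpp
#include <sstream>
#include <string>   // declares std::string and std::getline(std::istream&, std::string&)
#include <vector>

// Splits text into lines with the same while (getline(...)) loop used in
// readAndCount. getline works on any std::istream, so an istringstream
// can stand in for the ifstream here.
std::vector<std::string> splitLines(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line))   // stops once no more input can be read
        lines.push_back(line);
    return lines;
}
```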

Expert Comment

by:Infinity08
ID: 20063513
>> using namespace::std;

This has to be :

        using namespace std;


>>           while ((pos = (int)line.find_first_of(" ,.-!?", lpos)) != string::npos)

Make sure that you also account for the last word on a line (if there's no punctuation or whitespace at the end of the line).

You probably also want to set lpos to the correct value at the end of the loop (to get it ready for the next iteration).

And possibly you want to add a few more whitespace characters, like \n, \t etc. and maybe a few more punctuation characters like ;


>>       return 0;

(the one inside the readAndCount function) : you want to return the number of lines read, so you'll have to keep a counter that counts the lines, and returns the value at the end of the function.


>>       // read through word and count letters... not sure exactly how yet

You go through the word you read and handle every character in it. The [] operator on string is useful for that (it works just like it does for arrays).
For every character, you need to determine which letter it is, so you know which element of the alphacount array to increment. There are two things to do:

        1) converting all characters to either uppercase or lowercase will make the comparison easier

        2) there's a nice "trick" to get the value of an alphabetical character. For lowercase, (c - 'a') returns the index of the character c: it gives 0 for a, 1 for b, etc.
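A sketch of point 2 as a helper, assuming an ASCII-compatible character set (where 'a' to 'z' are contiguous) and the default "C" locale; letterIndex is a hypothetical name. It returns -1 for non-letters so the caller can skip them instead of indexing outside the array:

```cpp
#include <cctype>

// Returns 0 for 'a'/'A', 1 for 'b'/'B', ..., 25 for 'z'/'Z',
// or -1 for anything that is not a letter.
int letterIndex(char ch)
{
    // isalpha/tolower require a value representable as unsigned char
    unsigned char c = static_cast<unsigned char>(ch);
    if (!std::isalpha(c))
        return -1;
    return std::tolower(c) - 'a';
}
```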

Expert Comment

by:itsmeandnobodyelse
ID: 20063532
>>>>     while ((pos = (int)line.find_first_of(" ,.-!?", lpos)) != string::npos)
As it stands, this is an infinite loop. You forgot to set lpos (last position) to pos + 1 at the end of every iteration.

>>>> line.substr(lpos, pos - lpos);

You should check that pos is greater than lpos. If there are two delimiters in a row, e.g. two spaces or a comma and a space, you should skip that token, because it doesn't make much sense to count 'empty' words.

>>>> // read through word and count letters... not sure exactly how yet
You could call a function that does that. Maybe count digits as well:

  void countLettersAndDigits(const string& word, int alphacount[26], int digitcount[10]);

The function would loop from 0 to the string length, extracting one char per iteration (using []). Convert the char to upper or lower case and check whether it is alphabetic or a digit. In the first case you increment alphacount, in the second case digitcount. If you encounter a non-alphanumeric character, e.g. a dash, you could consider adding it to your list of delimiters so you don't get it again next time.
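One possible body for that prototype, written as a sketch: it assumes ASCII letters and the default "C" locale, and it silently skips anything that is neither a letter nor a digit:

```cpp
#include <cctype>
#include <string>

// Walks the word with [] and bumps the matching counter.
void countLettersAndDigits(const std::string& word, int alphacount[26], int digitcount[10])
{
    for (std::string::size_type i = 0; i < word.length(); ++i) {
        unsigned char c = static_cast<unsigned char>(word[i]);
        if (std::isalpha(c))
            alphacount[std::tolower(c) - 'a']++;   // 'a' -> 0 ... 'z' -> 25
        else if (std::isdigit(c))
            digitcount[c - '0']++;                 // '0' -> 0 ... '9' -> 9
        // dashes, apostrophes etc. are ignored
    }
}
```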





Expert Comment

by:itsmeandnobodyelse
ID: 20063547
>>>> Make sure that you also account for the last word on a line
Add a space to the line *before* parsing tokens. That way you'll get the last word as well.

>>>> there's a nice "trick" to get the value of an alphabetical character.
You'll find a sample of the trick if you read the current thread thoroughly.
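Putting the tokenizing advice from the last few replies together (append a delimiter, skip empty tokens, advance lpos), the inner loop might look like this sketch. splitWords and its default delimiter set are assumptions for illustration; the appended ' ' only helps if the delimiter set contains a space. Using std::string::size_type for pos also avoids the (int) cast when comparing against string::npos:

```cpp
#include <string>
#include <vector>

// Splits one line into words with the find_first_of loop from this thread.
std::vector<std::string> splitWords(std::string line,
                                    const std::string& delims = " ,.-!?;:\t")
{
    std::vector<std::string> words;
    line += ' ';                                  // so the last word is also terminated
    std::string::size_type lpos = 0, pos;
    while ((pos = line.find_first_of(delims, lpos)) != std::string::npos) {
        if (pos > lpos)                           // skip "empty words" between two delimiters
            words.push_back(line.substr(lpos, pos - lpos));
        lpos = pos + 1;                           // continue after this delimiter
    }
    return words;
}
```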

Author Comment

by:urobins
ID: 20063914
Thanks I am getting ready to head in to work and I'll look over these and make the changes!  Thanks!

Author Comment

by:urobins
ID: 20064381
Okay, I think I got the counters and the white space working correctly, but when I use a cout to spit the word out in the loop it isn't showing anything...  The program compiles without error or warning though, so that is at least 1 good step :)  I am working on getting the alpha characters counted.  I don't have to worry about numbers.  

I did have another question.  can I output the word from the map by simply using
cout << wordMap[word];  ?

#include <iostream>
#include <fstream>
#include <map>
#include <string>

using namespace std;

int readAndCount(const char* fileName, map<string, int>& wordMap,int alphacount[26]);


int readAndCount(const char* fileName, map<string, int>& wordMap,int alphacount[26])
{
      string line;
      int numberOfLines=0;

      ifstream readingStream(fileName);

      while (getline(readingStream, line))
      {
         
          int lpos = 0;
          int pos=0;
              line +=" ";
          while ((pos = (int)line.find_first_of(" ,.-!?\\;/\\n\\t", lpos)) != string::npos)

          {      
                    if (pos > lpos)
                    {
                string word = line.substr(lpos, pos - lpos);
                        cout << word << endl;
                        wordMap[word]++;
               
                        // read through word and count letters... not sure exactly how yet
                    }
                    lpos = pos+1;
                    numberOfLines++;
           }
    }    
      readingStream.close();
      return numberOfLines;
      
}

int main ()
{
      map<string,int> wordMap;
      int  alphacount[26] = { 0 };

      int lines = readAndCount("proj2test.txt", wordMap, alphacount);
      system ("pause");
      if (lines <= 0)
       return 1;
      return 0;
}

Expert Comment

by:itsmeandnobodyelse
ID: 20064469
>>>> but when I use a cout to spit the word out in the loop it isn't showing anything...  

Then most likely the open failed. proj2test.txt needs to be in the working directory. It's better to use a full path like "c:\\temp\\proj2test.txt".
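A quick way to tell a failed open apart from an empty file is to check is_open() before reading. A minimal sketch (canOpen is a hypothetical helper); a relative name like "proj2test.txt" is looked up in the program's current working directory, which is not necessarily the folder containing the .exe:

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Returns true if the file could be opened for reading,
// and reports the failure on std::cerr otherwise.
bool canOpen(const std::string& fileName)
{
    std::ifstream in(fileName.c_str());
    if (!in.is_open()) {
        std::cerr << "Could not open " << fileName << '\n';
        return false;
    }
    return true;
}
```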

Author Comment

by:urobins
ID: 20064542
doh! thanks, didn't even notice that!  that did the trick.  I am getting ready to implement my character counting.  

I haven't typed it in yet but this is what I jotted on paper... does it look about right?
Using your function name :)
string makeLower(const string& word, int& countAlpha26])
{
   char letter;
   string lowerWord= word;
   int place;
   int counter;
   int length = strlen(lowerWord);
   for (place =0; counter=0; place < length;counter++)
      {
         letter = tolower(lowerWord(couter));
         countAlpha[letter-'a'];
       }
     return lowerWord
   }

Author Comment

by:urobins
ID: 20064679
Okay, I added that to my code (fixing the few spelling and naming mistakes). Below is what I have, though now I get a whole slew of syntax errors I am working through. It doesn't seem to like my for loop.

#include <iostream>
#include <fstream>
#include <map>
#include <string>

using namespace std;

int readAndCount(const char* fileName, map<string, int>& wordMap,int countAlpha[26]);
string makeLower(const string& word, int& countAlpha26]);

int readAndCount(const char* fileName, map<string, int>& wordMap,int countAlpha[26])
{
      string line;
      int numberOfLines=0;

      ifstream readingStream(fileName);

      while (getline(readingStream, line))
      {
         
          int lpos = 0;
          int pos=0;
              line +=" ";
          while ((pos = (int)line.find_first_of(" ,.-!?\\;/\n\t", lpos)) != string::npos)

          {      
                    if (pos > lpos)
                    {
                string word = line.substr(lpos, pos - lpos);
                        cout << word << endl;
                        word = makeLower(word,coutAlpha);
                        wordMap[word]++;
               
                        // read through word and count letters... not sure exactly how yet
                    }
                    lpos = pos+1;
                    numberOfLines++;
           }
    }    
      readingStream.close();
      return numberOfLines;
      
}
string makeLower(const string& word, int& countAlpha26])
{
   char letter;
   string lowerWord= word;
   int place;
   int counter;
   int length = strlen(lowerWord);
   for (place =0; counter=0; place < length;counter++)
      {
         letter = tolower(lowerWord(counter));
         countAlpha[letter-'a'];
       }
     return lowerWord
   }

int main ()
{
      map<string,int> wordMap;
      int  countAlpha[26] = { 0 };

      int lines = readAndCount("c:\\proj2test.txt", wordMap, countAlpha);

      system ("pause");
      if (lines <= 0)
       return 1;
      return 0;
}

Expert Comment

by:itsmeandnobodyelse
ID: 20064981
>>>> int length = strlen(lowerWord);
length = lowerWord.length();

>>>> letter = tolower(lowerWord(counter));
letter = tolower(lowerWord[place]);

counter wasn't needed

Author Comment

by:urobins
ID: 20065030
Thanks, I just picked up on the counter and made the changes but I didn't figure out the length.   Do you know of a way to turn on line numbers in Visual Studio 2005?  I find myself counting lines to find the errors a lot of the time.  Thanks again!

when I compile I get these errors
(9) : error C2234: 'countAlpha' : arrays of references are illegal
(31) : error C2664: 'makeLower' : cannot convert parameter 2 from 'int []' to 'int *[]'
(44) : error C2234: 'countAlpha' : arrays of references are illegal
(50) : error C3867: 'std::basic_string<_Elem,_Traits,_Ax>::length'

My code looks like this now

#include <iostream>
#include <fstream>
#include <map>
#include <string>

using namespace std;

int readAndCount(const char* fileName, map<string, int>& wordMap,int countAlpha[26]);
string makeLower(const string& word, int& countAlpha[26]);

int readAndCount(const char* fileName, map<string, int>& wordMap,int countAlpha[26])
{
      string line;
      int numberOfLines=0;

      ifstream readingStream(fileName);

      while (getline(readingStream, line))
      {
         
          int lpos = 0;
          int pos=0;
              line +=" ";
          while ((pos = (int)line.find_first_of(" ,.-!?\\;/\n\t", lpos)) != string::npos)

          {      
                    if (pos > lpos)
                    {
                string word = line.substr(lpos, pos - lpos);
                        cout << word << endl;
                        word = makeLower(word,countAlpha);
                        wordMap[word]++;
               
                        // read through word and count letters... not sure exactly how yet
                    }
                    lpos = pos+1;
                    numberOfLines++;
           }
    }    
      readingStream.close();
      return numberOfLines;
      
}
string makeLower(const string& word, int& countAlpha[26])
{
   char letter;
   string lowerWord= word;
   int place;
 
   int length = lowerWord.length;
   for (place =0; place < length;place++)
      {
         letter = tolower(lowerWord[place]);
         countAlpha[letter-'a'];
       }
     return lowerWord;
   }

int main ()
{
      map<string,int> wordMap;
      int  countAlpha[26] = { 0 };

      int lines = readAndCount("c:\\proj2test.txt", wordMap, countAlpha);

      system ("pause");
      if (lines <= 0)
       return 1;
      return 0;
}

Expert Comment

by:itsmeandnobodyelse
ID: 20065283
>>>> int& countAlpha[26]
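No '&'. Arrays are passed as pointers anyway, so the function already writes into the caller's array. int& countAlpha[26] declares an array of references, which is illegal (hence error C2234).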

>>>> lowerWord.length;
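lowerWord.length();
length is a member function, so the call needs the parentheses (that's what error C3867 is complaining about).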
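A small sketch of the "arrays are passed as pointers" point (bumpLetter and sumLetters are hypothetical names). The static_assert shows the compiler treats int counts[26] as int*, and sumLetters shows the syntax to use if a real reference to an array is ever wanted:

```cpp
#include <type_traits>

// An array parameter such as int counts[26] is adjusted to int*, so the
// function writes straight into the caller's array; no '&' is needed.
void bumpLetter(int counts[26], int index)
{
    counts[index]++;
}

// The compiler really sees bumpLetter as taking an int*:
static_assert(std::is_same<decltype(&bumpLetter), void (*)(int*, int)>::value,
              "array parameter is adjusted to a pointer");

// A true reference to a 26-element array needs parentheses: int (&counts)[26].
// Without them, int& counts[26] would be an array of references (illegal).
int sumLetters(const int (&counts)[26])
{
    int total = 0;
    for (int n : counts)   // the size is part of the type here, so range-for works
        total += n;
    return total;
}
```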

Author Comment

by:urobins
ID: 20065354
always the little things :)  thanks so much.  I am going to toy with this and see if it works.  Thanks!

Author Comment

by:urobins
ID: 20065794
Is there a way to increment characters?
I ask because I was thinking of something like this:


void printLetters (countAlpha[26])
{
   int place =0;
   for (place =0; place <=26; place++)
   {
       cout <<  character << " is used " << countAlpha[place];
       increment character;
   }
}

Author Comment

by:urobins
ID: 20066037
I tried this but it is getting compile errors

void printLetters (int countAlpha[26])
{
      
   int place =0;
   char letter= 'a';
   for (place =0; place <=26; place++)
   {
       cout <<  letter << " is used " << countAlpha[place] << " times"<<endl;
       letter++;
   }
}

Author Comment

by:urobins
ID: 20066083
Doh! Think I found it, and found my other errors too! Thanks for helping me out so much. I am moving on to the word counting.

The other problem I see is how do I do the number of words of x length?

I know how to get the length of a word, but how do I know which lengths I've read in without brute-forcing it?

Expert Comment

by:Infinity08
ID: 20066211
>>    for (place =0; place <=26; place++)

You probably want to use < instead of <=
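With < in place, the letter report can also compute the character from the index instead of keeping a separate counter. A sketch, assuming ASCII so that 'a' + place is the place-th letter; letterReport is a made-up name and builds a string so the result can be printed with cout or checked:

```cpp
#include <sstream>
#include <string>

// Valid indexes of a 26-element array are 0..25, so the loop runs while
// place < 26; place <= 26 would read one element past the end.
std::string letterReport(const int countAlpha[26])
{
    std::ostringstream out;
    for (int place = 0; place < 26; ++place)
        out << static_cast<char>('a' + place) << " is used "
            << countAlpha[place] << " times\n";
    return out.str();
}
```

Usage would be along the lines of cout << letterReport(countAlpha);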


>> The other problem I see is how do I do the number of words of x length?

If you want to count that too, then you need to keep a container (a vector sounds good) with all possible lengths of the words, and each time you read a word, you get its length, and increment the corresponding element in the vector. You can use the length as index into the vector.

Author Comment

by:urobins
ID: 20066373
>> You probably want to use < instead of <=

Thanks I found that when I got my code worked and saw a strange value at the end :)

>>If you want to count that too, then you need to keep a container (a vector sounds good) with all possible lengths of the words

The only real prob is I don't know all possible lengths that he might test with.  So I wouldn't know how big to make it.  Is there some kind of standard?  I mean I guess I could guess 1-100 and figure there shouldn't be a word larger than 100 characters.

Another question would be what is a vector? Could the same be done with an array of ints, indexed by the string length?

Expert Comment

by:itsmeandnobodyelse
ID: 20066483
>>>> then you need to keep a container (a vector sounds good)
Hmm, a map sounds much better and you are already used to it.

   map<int, int> lengthMap;

The key is the length, and incrementing works the same as with words.
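A sketch of that, mirroring the word map (countLengthInMap is a hypothetical helper):

```cpp
#include <map>
#include <string>

// lengthMap[5] ends up holding the number of five-letter words seen so far.
// As with the word map, operator[] starts a new entry at 0.
void countLengthInMap(const std::string& word, std::map<int, int>& lengthMap)
{
    ++lengthMap[static_cast<int>(word.length())];
}
```

Because the map only creates entries for lengths that actually occur, there is no maximum length to guess.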


Author Comment

by:urobins
ID: 20066491
Think I got that part workin now too :)

Author Comment

by:urobins
ID: 20066506
Oh, I ended up just using an array like countAlpha and put it to 26 figuring that no words would be longer than that... but your solution is better,

How do I increment through the map and print the words and how many times they were used?

Expert Comment

by:itsmeandnobodyelse
ID: 20067221
>>>> How do I increment through the map and print the words
>>>> and how many times they were used?

You use an iterator:

    map<string, int>::iterator it;
    for (it = wordMap.begin(); it != wordMap.end(); ++it)
    {
    }

That is the loop. For the length map you would use map<int, int>::iterator instead.

Within the loop, it->first is a reference to the key and it->second is a reference to the value, and both can easily be used in a cout statement.
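Filling in the loop body, a sketch that builds the output as a string (formatWordCounts is a hypothetical name). Since the map is passed by const reference it uses const_iterator; a plain iterator would not compile there:

```cpp
#include <map>
#include <sstream>
#include <string>

// One line per entry: it->first is the word (the key), it->second its count.
// A std::map always visits its keys in sorted order.
std::string formatWordCounts(const std::map<std::string, int>& wordMap)
{
    std::ostringstream out;
    std::map<std::string, int>::const_iterator it;
    for (it = wordMap.begin(); it != wordMap.end(); ++it)
        out << it->first << " is used " << it->second << " times\n";
    return out.str();
}
```

Usage: cout << formatWordCounts(wordMap);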


Author Comment

by:urobins
ID: 20067575
Thanks, that works great.  The only problem I noticed is the words aren't in the order they were read in exactly.  Looks like they are coming in chunks and alphabetized by that chunk..  Does this make sense?

here is my input:

Two hundred and fifty years in the future, life as we know it is threatened by the arrival of Evil. Only the fifth element can stop the Evil from extinguishing life, as it tries to do every five thousand years. She is helped by ex-soldier, current-cab-driver, Korben Dallas, who is, in turn, helped by Prince/Arsenio clone, Ruby Rhod. Unfortunately, Evil is being assisted by Mr. Zorg, who seeks to profit from the chaos that Evil will bring, and his alien mercenaries

and this is the output:

Arsenio is used 1 times
Dallas is used 1 times
Evil is used 4 times
Korben is used 1 times
Mr is used 1 times
Only is used 1 times
Prince is used 1 times
Rhod is used 1 times
Ruby is used 1 times
She is used 1 times
Two is used 1 times
Unfortunately is used 1 times
Zorg is used 1 times
alien is used 1 times
and is used 2 times
arrival is used 1 times
as is used 2 times
assisted is used 1 times
being is used 1 times
bring is used 1 times
by is used 4 times
cab is used 1 times
can is used 1 times
chaos is used 1 times
clone is used 1 times
current is used 1 times
do is used 1 times
driver is used 1 times
element is used 1 times
every is used 1 times
ex is used 1 times
extinguishing is used 1 times
fifth is used 1 times
fifty is used 1 times
five is used 1 times
from is used 2 times
future is used 1 times
helped is used 2 times
his is used 1 times
hundred is used 1 times
in is used 2 times
is is used 4 times
it is used 2 times
know is used 1 times
life is used 2 times
mercenaries is used 1 times
of is used 1 times
profit is used 1 times
seeks is used 1 times
soldier is used 1 times
stop is used 1 times
that is used 1 times
the is used 5 times
thousand is used 1 times
threatened is used 1 times
to is used 2 times
tries is used 1 times
turn is used 1 times
we is used 1 times
who is used 2 times
will is used 1 times
years is used 2 times
 

Expert Comment

by:Infinity08
ID: 20068098
>> The only real prob is I don't know all possible lengths that he might test with.  So I wouldn't know how big to make it.

That's where a vector comes in handy - you don't need to know the size when you create it. It can be dynamically increased in size. A map is also an option, but it's slower, and since there's no real need to use it ... I wouldn't.


>> Another question would be what is a vector?

A vector is basically an array, but with some extra features added to it that make it easier to use :

        http://www.cplusplus.com/reference/stl/vector/
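A sketch of the vector approach for the length counts, assuming one counter per length indexed directly by word.length() (countLength is a hypothetical helper). The vector is grown with resize() whenever a longer word shows up, so no maximum has to be picked in advance:

```cpp
#include <string>
#include <vector>

// lengthCount[n] holds the number of words with n characters.
void countLength(const std::string& word, std::vector<int>& lengthCount)
{
    std::vector<int>::size_type len = word.length();
    if (len >= lengthCount.size())
        lengthCount.resize(len + 1, 0);   // new slots start at 0
    ++lengthCount[len];
}
```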


>> The only problem I noticed is the words aren't in the order they were read in exactly.  Looks like they are coming in chunks and alphabetized by that chunk..  Does this make sense?

That's normal. A map sorts on the key. The key in this case is the word, so it will sort all the words alphabetically.

You don't want that ?

Author Comment

by:urobins
ID: 20068151
If it did sort alphabetically that would be fine, but it is sorting alphabetically in chunks as shown in the above output.  Is there a way to sort the whole thing alphabetically?  The original task was to store the words in the order they were used, but as a side note he said that sorting them alphabetically is good too.

I got the strlen setup working well.  I just used an array of upto 26 and stored based on the length.  When I looked at his sample output it was obvious an array was used as it started at 0  letter words and went up to a point.  So I figured I can't think of too many 26 letter words :)  

Expert Comment

by:Infinity08
ID: 20068214
>> If it did sort alphabetically that would be fine, but it is sorting alphabetically in chunks as shown in the above output.

It does sort alphabetically. You have to know that capitals come before the lower case letters. So you'll first get A to Z, then a to z.


>> Is there a way to sort the whole thing alphabetically?

Sure : put all words in lower case only before you put them in the map.
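A sketch of that lowercasing step using std::transform (toLowerCopy is a hypothetical name; it assumes single-byte ASCII text). The comment explains why the unmodified words came out in two alphabetical runs:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// std::map<std::string, int> orders keys by character code. In ASCII every
// capital (65-90) sorts before every lower-case letter (97-122), so "Zorg"
// comes before "alien" unless words are lower-cased before use as keys.
std::string toLowerCopy(std::string word)
{
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return word;
}
```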


>> I just used an array of upto 26 and stored based on the length

Make sure you never count a word of 26 or more characters (the valid indexes are 0 to 25), or you'll overwrite memory that doesn't belong to the array.


>> When I looked at his sample output it was obvious an array was used as it started at 0  letter words and went up to a point.

Both a vector and a map would have the same result.


>> So I figured I can't think of too many 26 letter words :)

That's true, but you could just be sure, and pick a higher value, like 64.
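If you keep the fixed-size array, a bounds check like the following sketch avoids writing past the end. MAX_LENGTH = 64 follows the suggestion above, and lumping all longer words into the last slot is just one possible policy, chosen here for illustration (countLengthSafely is a hypothetical name):

```cpp
#include <string>

const int MAX_LENGTH = 64;   // assumed upper bound, per the suggestion above

// Increments countLength[word.length()] only when that index is inside the
// array; words of MAX_LENGTH - 1 or more characters share the last slot.
void countLengthSafely(const std::string& word, int countLength[MAX_LENGTH])
{
    std::string::size_type len = word.length();
    if (len >= static_cast<std::string::size_type>(MAX_LENGTH))
        len = MAX_LENGTH - 1;
    countLength[len]++;
}
```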

Author Comment

by:urobins
ID: 20068239
Yeah, I thought I was making it lowercase. Here is my code. Do you see where I messed up? Because I don't???

int readAndCount(const char* fileName, map<string, int>& wordMap,int countAlpha[26], int countLength[26])
{
      string line;
      int numberOfLines=0;

      ifstream readingStream(fileName);

      while (getline(readingStream, line))
      {
         
          int lpos = 0;
          int pos=0;
              line +=" ";
          while ((pos = (int)line.find_first_of(" ,.-!?\\;/\n\t", lpos)) != string::npos)

          {      
                    if (pos > lpos)
                    {
                string word = line.substr(lpos, pos - lpos);
                        cout << word << endl;
                        word = makeLower(word,countAlpha);
                        wordMap[word]++;
                        countLength[word.length()]++;
                    }
                    lpos = pos+1;
                    numberOfLines++;
           }
    }    
      readingStream.close();
      return numberOfLines;
      
}
string makeLower(const string& word, int countAlpha[26])
{
   char letter;
   string lowerWord= word;
   int place;
 
   int length = lowerWord.length();
   for (place =0; place < length;place++)
      {
         letter = tolower(lowerWord[place]);
         countAlpha[letter-'a']++;
             
       }
     return lowerWord;
   }

Expert Comment

by:Infinity08
ID: 20068269
>>          letter = tolower(lowerWord[place]);

This takes the character lowerWord[place], makes it lowercase, and stores the result in letter. It does NOT modify lowerWord[place].

Author Comment

by:urobins
ID: 20068274
Oh I see I am changing the letters but never putting the letters back into the string right?

Author Comment

by:urobins
ID: 20068290
You're too quick!

So I need to modify the string as well.
could I say on the next line
lowerWord[place] = letter?

Author Comment

by:urobins
ID: 20068301
Thanks, that did do the trick.  Is there a more efficient way of doing that?

Author Comment

by:urobins
ID: 20068353
Sorry, one more question: how do I go about destroying my arrays and map at the end of the program? I was told it is good practice to clean up memory when you are done, and I'd like to get in the habit early :)

Expert Comment

by:Infinity08
ID: 20068435
>> Is there a more efficient way of doing that?

You can store the lowercase letter directly into the string without passing via the letter char.

Something like :

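// needs #include <cctype> (tolower) and #include <string>
string makeLower(string word, int countAlpha[26]) {
    int length = word.length();
    for (int place = 0; place < length; ++place) {
        // the cast keeps tolower's argument in its valid (non-negative) range
        word[place] = tolower((unsigned char)word[place]);
        // only a..z have a slot in countAlpha; digits, apostrophes etc. are skipped
        if (word[place] >= 'a' && word[place] <= 'z')
            countAlpha[word[place] - 'a']++;
    }
    return word;
}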


>> Sorry one more question is how do I go about destroying my arrays and map at the end of the program?

The arrays you used are ordinary local arrays (not allocated with new), so they are released automatically when they go out of scope. The map is handled automatically too: its destructor frees all of its elements.

No worries ;)

Author Comment

by:urobins
ID: 20068469
Oh thanks!  I appreciate it!

Sorry, but I have one last question... I read something somewhere about using setw(?) to space output properly, but I can't find any info on it anywhere. Do you know what I am talking about? I might have it wrong and that's why I can't find it.


Author Comment

by:urobins
ID: 20068488
Hey don't answer that here, I'll just ask a new question.  You've been a big help so I might as well get you more points!


Expert Comment

by:Infinity08
ID: 20068502
This is a reference page for setw :

        http://www.cplusplus.com/reference/iostream/manipulators/setw.html
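For reference, a small sketch of setw in use (formatRow and the widths 15 and 5 are arbitrary choices for illustration). setw only applies to the next item written, while left and right stay in effect until changed:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Formats one "word  count" row: the word left-aligned in 15 columns,
// the count right-aligned in 5, so the counts line up in a column.
std::string formatRow(const std::string& word, int count)
{
    std::ostringstream out;
    out << std::left << std::setw(15) << word
        << std::right << std::setw(5) << count;
    return out.str();
}
```

With cout directly, the same manipulators go straight into the cout << ... chain.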