brich744

asked on

c++ ispunct()

Hello Everyone,

I am writing a program that reads a file and counts the number of words in it. Next, it resizes the file to 100, 1000, 10000, or 50,000, whichever the file size is closest to. Finally, every word is converted to lowercase and any punctuation must be deleted. The problem I am having is that when two punctuation characters are adjacent, as in " it,. ", the program deletes the "," but not the ".". In other words, it does not delete consecutive punctuation.
#include <iostream>
#include <string>
#include<cctype>
#include<fstream>
#include<cstdlib>
using namespace std;
 
void lowerCase(string&);
int determine_File_Size(int&);
 
int main()
{
    string strToConvert;  //The string that will be converted.
 	

  cout<<"***********************************"<<endl;
  cout<<"	Converter Program	"<<endl;
  cout<<"***********************************"<<endl;
 

  ifstream stringFile("test.txt");


  int count = 0;
 
  char ch;

  string s;

  stringFile>>ch;

  stringFile.unsetf(ios::skipws);

  while( getline(stringFile, s,' '))
  {
	 
	  //stringFile>>ch;

	   //if(isspace(ch)) {
      count++;
      //while(isspace(ch) && !stringFile.eof()) 
        //getline(stringFile, s);
		  // stringFile >> ch; 
	   //}

  }

  cout<<"Word Count "<<count<<endl;

  int fileSize = determine_File_Size(count);
	
  ifstream newCursor("test.txt");
  newCursor.unsetf(ios::skipws);
	int size = 0;
	string word; 
	while (getline(newCursor, word, ' ') && size != count)
    {
		
        lowerCase(word);
        cout << "\nNew String: " << word;

		size++;
 
    }


	 stringFile.close();
	 newCursor.close();

  system("PAUSE");
    return 0;
}
 
 
void lowerCase(string& strToConvert)
{
   for(unsigned int i=0;i<strToConvert.length();i++)
   {
	   if(isalpha(strToConvert[i]))
		strToConvert[i] = tolower(strToConvert[i]);

	   else if(ispunct(strToConvert[i]))
		   strToConvert.erase(i,1);
		
	   else if(isalnum(strToConvert[i]))
		   strToConvert.erase(i,1);
	   
   }
}

int determine_File_Size(int& count)
{
	int count100 = 100 - count;
	int count1000 = 1000 - count;
	int count10000 = 10000 - count;
	int count50000 = 50000 - count;

	int returnSize = 0;  // stays 0 if two sizes are equally close (e.g. count == 550)

	if(abs(count100) < abs(count1000) && abs(count100) < abs(count10000) && abs(count100) < abs(count50000))
	  {
		  cout<<"The File Size Will Be 100"<<endl;
		  returnSize = 100;
	  }

	 else if(abs(count1000) < abs(count100) && abs(count1000) < abs(count10000) && abs(count1000) < abs(count50000))
	  {
		  cout<<"The File Size Will Be 1000"<<endl;
		  returnSize=1000;
	  }

	 else if(abs(count10000) < abs(count100) && abs(count10000) < abs(count1000) && abs(count10000) < abs(count50000))
	  {
		  cout<<"The File Size Will Be 10000"<<endl;
		  returnSize=10000;
	  }

	else if(abs(count50000) < abs(count100) && abs(count50000) < abs(count1000) && abs(count50000) < abs(count10000))
	  {
		  cout<<"The File Size Will Be 50000"<<endl;
		  returnSize=50000;
	  }

		return returnSize;

}

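Side note on determine_File_Size(): its four chained if/else tests all use strict <, so a word count exactly halfway between two sizes (550, for instance) matches none of the branches and no size is chosen. A loop over the candidate sizes avoids that. This is only a sketch: nearestFileSize is a hypothetical name, and breaking ties toward the smaller size is an assumption, since the question does not say what should happen.

```cpp
#include <cstdlib>   // std::abs(int)

// Hypothetical helper: returns whichever target size is closest to wordCount.
// A tie keeps the smaller size, because only a strictly smaller distance
// replaces the current best.
int nearestFileSize(int wordCount)
{
    const int sizes[] = {100, 1000, 10000, 50000};
    int best = sizes[0];
    for (int s : sizes)
    {
        if (std::abs(s - wordCount) < std::abs(best - wordCount))
            best = s;
    }
    return best;
}
```

For example, nearestFileSize(551) picks 1000, while nearestFileSize(550) is a tie and stays at 100.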

kaufmed

My guess is that when you call erase(), you do remove a single punctuation character, BUT the next punctuation character then moves into the index you just erased, because everything after it shifts down by one. When your loop iterates, it moves on to where the 2nd punctuation character *used to be*, but that character is now one index lower than it was. So the index you examine next holds whatever followed the 2nd punctuation character, and the 2nd one is never checked.

For example, given the string:  He!!o World!

The iterations would be like this:
He!!o World!     // Looking at 0; No change
He!!o World!     // Looking at 1; No change
He!o World!      // Looking at 2; "!" is erased and everything from index 3 onward shifts down one index
He!o World!      // Looking at 3; No change since 3 now holds an "o"
He!o World!      // Looking at 4; No change
He!o World!      // Looking at 5; No change
He!o World!      // Looking at 6; No change
etc.

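To see the skip in action, here is a stripped-down sketch that keeps only the punctuation branch of lowerCase(). The function names are hypothetical. The first version keeps the question's i++ in the loop header; the second advances i only when nothing was erased, so the character that shifts into index i gets examined.

```cpp
#include <cctype>
#include <string>

// Reproduces the bug: i is incremented even after erase(i, 1),
// so the character that shifts into index i is never examined.
std::string stripPunctSkipping(std::string s)
{
    for (std::string::size_type i = 0; i < s.length(); i++)
    {
        if (std::ispunct(static_cast<unsigned char>(s[i])))
            s.erase(i, 1);   // next char moves to i, but i++ skips past it
    }
    return s;
}

// Same loop, but i advances only when nothing was erased.
std::string stripPunct(std::string s)
{
    for (std::string::size_type i = 0; i < s.length(); )
    {
        if (std::ispunct(static_cast<unsigned char>(s[i])))
            s.erase(i, 1);   // re-examine index i on the next pass
        else
            i++;
    }
    return s;
}
```

With the input from the question, stripPunctSkipping("it,.") returns "it." while stripPunct("it,.") returns "it".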

ASKER CERTIFIED SOLUTION
kaufmed

Decrementing i won't work if the punctuation is the first character. Since i is an unsigned type, you'll set it to a massive number, causing your loop to terminate prematurely. The solution is to increment i only when you don't erase.
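void lowerCase(string& strToConvert)
{
   // Advance i only when nothing was erased: after erase(i,1) the next
   // character shifts down into index i and must be examined next.
   for(unsigned int i=0;i<strToConvert.length();)
   {
           // ctype functions require a value representable as unsigned char
           unsigned char c = static_cast<unsigned char>(strToConvert[i]);

           if(isalpha(c))
           {
                strToConvert[i] = static_cast<char>(tolower(c));
                i++;                            // letter kept, move on
           }
           else if(ispunct(c))
                   strToConvert.erase(i,1);     // punctuation removed, stay on i

           else if(isalnum(c))
                   strToConvert.erase(i,1);     // digit removed, stay on i
           else i++;
   }
}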


A cleaner solution is to use the STL remove_if algorithm:
http://www.cplusplus.com/reference/algorithm/remove_if/
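// Returns true for characters that should be deleted: punctuation and,
// as in the question's isalnum() branch, digits. Letters must not match.
bool predfunc(char c)
{
   unsigned char uc = static_cast<unsigned char>(c);
   return ispunct(uc) || isdigit(uc);
}

strToConvert.erase(std::remove_if(strToConvert.begin(), strToConvert.end(), predfunc), strToConvert.end());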

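A self-contained sketch of the remove_if approach, with the lowercase step done by std::transform. The names shouldRemove and cleanWord are hypothetical, and treating digits as characters to delete is an assumption carried over from the question's isalnum() branch. remove_if moves the characters to keep to the front and returns the new logical end; erase() then trims the leftover tail (the erase-remove idiom).

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Predicate for remove_if: true for characters that should be deleted.
// Digits are included to mirror the question's isalnum() branch (assumption).
bool shouldRemove(char c)
{
    unsigned char uc = static_cast<unsigned char>(c);
    return std::ispunct(uc) || std::isdigit(uc);
}

// Lower-case every character, then delete unwanted ones with erase-remove.
std::string cleanWord(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    s.erase(std::remove_if(s.begin(), s.end(), shouldRemove), s.end());
    return s;
}
```

For example, cleanWord("It,.") yields "it", and both adjacent punctuation characters are gone. Unlike erasing inside an index loop, this makes a single pass over the string.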

Oh evilrix. Your C++-Fu is so much stronger than mine. I yield to your immense knowledge. I most certainly didn't think of that possibility, and I agree. Good call!
decrementing i won't work if the punctuation is the first character.

evilrix,

On second thought, even if you decrement an unsigned integer before the loop's increment fires, yes, you get the maximum value for an unsigned integer, but the increment should wrap it back around to 0 (since you've incremented past the maximum value of unsigned int). Am I missing something?
Heheh... on third thought...

Never mind. I forgot about the part, "causing your loop to terminate prematurely."
Actually kaufmed, you're right -- it won't cause premature termination because although it will be set to -1 (a massive unsigned number) it will be incremented before the test in the for loop, which will set it back to 0 again.

Apologies, I must have been half asleep this morning and I bow to your superior kung-fu on this occasion :)
I'm working on 3 hours of sleep. My kung-fu is by no means superior 2day  = )
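A sketch of the "decrement after erase" variant being debated, to check the wraparound argument. This is a hypothetical reconstruction from the discussion (the certified answer's exact code is not shown here), and stripPunctDecrement and unsignedWrapsBackToZero are made-up names. Unsigned arithmetic is modulo 2^N, so --i at 0 gives UINT_MAX, and the for loop's i++ runs before the condition is tested, bringing i back to 0.

```cpp
#include <cctype>
#include <climits>
#include <string>

// Hypothetical reconstruction of the decrement approach: after erase(i, 1),
// step i back so the loop's i++ lands on the same index again.
// At i == 0, --i wraps to UINT_MAX and i++ wraps it back to 0.
std::string stripPunctDecrement(std::string s)
{
    for (unsigned int i = 0; i < s.length(); i++)
    {
        if (std::ispunct(static_cast<unsigned char>(s[i])))
        {
            s.erase(i, 1);
            --i;   // well-defined even when i == 0 (unsigned wraparound)
        }
    }
    return s;
}

// The wraparound itself, isolated.
bool unsignedWrapsBackToZero()
{
    unsigned int i = 0;
    --i;                            // i == UINT_MAX
    bool wasMax = (i == UINT_MAX);
    ++i;                            // back to 0
    return wasMax && i == 0;
}
```

Input that starts with punctuation, such as "!!Hi!", comes out as "Hi", which agrees with kaufmed's reading: the loop does not terminate early.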