How do I use memcpy to allocate multiple characters?

This code is supposed to copy a token (two characters) from the 1-D character array linetext to *token. But when I cout *token, only the second character is printed. Does this mean that I copied linetext incorrectly, or that I'm printing token incorrectly?
(*token) = (char *) malloc(tokenLength + 1);
memcpy (*token, &linetext[i], tokenLength);
cout << "token: " << *token << endl;


vwps Asked:

97WideGlide Commented:
Check the value of i.  C arrays are zero based (first character is at position 0).  I am guessing that you are indexing into linetext assuming the first character is at linetext[1].  

Hope it helps.
Please let me know.
97WideGlide Commented:
Also, note that malloc() does not initialize the memory it returns, so you should make sure *token is null terminated after the memcpy().
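
To illustrate what "null terminated" means here, this is a minimal sketch, assuming a helper named copyChars (a made-up name, not part of the posted code). memcpy copies exactly the number of bytes requested and nothing more, so the byte after the copied characters still holds whatever malloc left there. Since cout prints a char * up to the first '\0', that byte has to be set by hand:

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not from the thread): copies `count` characters
// starting at `src` into a freshly malloc'ed, null-terminated buffer.
// Returns NULL if the allocation fails; the caller must free() the result.
char *copyChars(const char *src, std::size_t count)
{
    char *dest = static_cast<char *>(std::malloc(count + 1)); // +1 for '\0'
    if (dest == NULL)
        return NULL;
    std::memcpy(dest, src, count); // copies exactly `count` bytes, no terminator
    dest[count] = '\0';            // marks the end of the string for cout, strlen, etc.
    return dest;
}
```

With the posted variables, the call would look something like copyChars(&linetext[i], tokenLength), where i is the index of the token's first character.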

vwps Author Commented:
>> Check the value of i

Thanks 97WideGlide! You were right, the values of i in memcpy were incorrect. While I think I fixed it for 1-character tokens, it still doesn't work for 2-character tokens. Is it because of the way the lines are looped through? (Please see attached code)

>> you might want to make sure *token is null terminated after the memcpy()

What does this mean?


/* system_utilities.cpp */
 
#include <iostream>
#include <fstream>
#include <string.h> 
#include "system_utilities.h"
#include "definitions.h"
 
using namespace std;
 
ifstream file; // create input stream object
 
char linetext[256]; // array of characters to hold 255 characters + terminating null char
int length; // variable to hold length of current line of input (num chars)
int pos = 0; // variable to hold position of last character of current input line read by getNextToken
char *word;
 
int errorcode;
 
void printError(int errorcode)
{
   switch ( errorcode )
   {
      case END_OF_FILE:
         cout << "Error: End of file has been reached." << endl;
         break;
      case FILE_NOT_OPENED:
         cout << "Error: File could not be opened." << endl;
         break;
      case TOKEN_NOT_FOUND:
         cout << "Error: Token not found." << endl;
         break;
      default:
         cout << "There is an error." << endl;
   }
}
 
int openInputFile (char fname[])
{
   file.open(fname, ios::in); // what is ios::in for?
   if (file.good())
   {
      cout << fname << " was read successfully" << endl;
      return 0; 
   }
   else
   {
      cout << "file not read successfully" << endl;
      return FILE_NOT_OPENED; 
   }
}
 
// NOTE: WILL NOT READ NEXT LINE UNLESS LAST CHARACTER OF CURRENT LINE HAS A SPACE AFTER IT
 
int getNextToken(char **token) 
{
   int tokenFound = 0; // whether or not a token has been found
   int tokenLength = 0; // length of token
  
   while(tokenFound == 0) // when no token has been found
   {
      if(pos == 0) // if you're at the beginning of a line
      {
         int j;
         for (j = 0; j <256; j++) // set all character array values to NULL
         {
            linetext[j] = '\0';
         }
         file.getline(linetext, 256); // read in a line of text from the file
         length = strlen(linetext); // get the length of the line
      }
      
	  // recognizing single characters 
	  
	  int i;
	  for(i = pos; i < length && (tokenFound == 0); i++) // loop through whole line
	  { 
	     if( ((linetext[i] != ' ') && (linetext[i] != '\0')) && ( (linetext[i+1] == ' ') || (i == (length - 1))) ) // if the character is not a space or a null and the next character is a space // and the next character is a space, or this is the last character in the line
		 {
			cout << "       pos: " << pos << " linetext[" << i << "] = " << linetext[i] << endl;
			cout << "       i = " << i << "      length = " << length << endl;
			tokenLength = 1; // QUESTION: how do i fix this? i am confused about pos and i's difference
			(*token) = (char *) malloc(tokenLength + 1); // first row of token has enough space for length of token + null character
			memcpy(*token, &linetext[i - tokenLength + 1], tokenLength); // copy token into *token
			cout << "ONE CHARACTER TOKEN: " << *token << endl;
			cout << "       pos: " << pos << endl;
			pos = pos + tokenLength; 
		 }
		 
		 else if( (symbol(linetext[i]) && (symbol(linetext[i+1]))) ) // else if it's a symbol and the next one is a symbol 
		 {
			cout << linetext[i] << linetext[i+1] << " is a symbol with more than 1 character" << endl; // how is it executing both the if and the else statements?
			cout << "pos: " << pos << endl;
			tokenLength = 2; // QUESTION: how do i fix this? i am confused about pos and i's difference
			(*token) = (char *) malloc(tokenLength + 1);
			memcpy(*token, &linetext[i - tokenLength + 1], tokenLength);
			cout << "tokenLength: " << tokenLength << endl;
			cout << "TWO CHARACTER SYMBOL TOKEN: " << *token << endl; // QUESTION: why is this only printing the last character of the token?
			pos = pos + tokenLength; 
			cout << "pos: " << pos << endl;
			tokenFound = 1;
			i++;
		 } 
		 
		 else if( (linetext[i] != ' ') && (linetext[i] != '\0') && (linetext[i + 1] != ' ') && (linetext[i + 1] != '\0') )
		 {
			while( (linetext[i + 1] != ' ') && (linetext[i + 1] != '\0') && (i != length) )
			{
				cout << "                     next character is part of the token" << endl;
			// else if this one and the next one is a letter (two or more letters --> a word)
			/*cout << linetext[i] << linetext[i + 1] << " is a 2-letter word" << endl;
			tokenLength = 2; // QUESTION: how do i fix this? i am confused about pos and i's difference
			cout << "      tokenLength = " << tokenLength << endl;
			(*token) = (char *) malloc(tokenLength + 1);
			memcpy(*token, &linetext[i], tokenLength);
			*/
				i++;
			//tokenFound = 1;
			}
		 }
		 
	  }
	  
	  if(i == length) // if you've reached the end of the line 
	  {
		pos = 0; // reset pos to 0
		if(file.eof()) // if you're also at the end of the file
		{
			return END_OF_FILE;
		}
	  }
	}
	return 0;
}
 
int symbol(char c)
{
   if ((c == '>') || (c == '<') || (c == '/') || (c == '='))
      {
      return 1;
      }
   else
      {
      return 0;
      }
}
 
/* system_utilities.h */
 
int openInputFile(char fname[]); // input file name string, open file and assign to file-level ifstream variable
int getNextToken( char **token ); // finds next token in input file, allocates new space, assigns token to point to new space, copies characters
void printError(int errorcode); // prints appropriate error message
int symbol(char c);
 
/* main.cpp */
 
#include <iostream>
#include "system_utilities.h"
#include "definitions.h"
 
using namespace std;
 
char *token1[10000];
 
int main() 
{
   openInputFile("/Users/vshen/Documents/EECS 211/Program4/onlychars.txt");
   while(getNextToken(token1) != END_OF_FILE);
   cout << "The file has been tokenized." << endl;
   return 0;
}
 
/* definitions.h */
 
#define END_OF_FILE 201 
#define FILE_NOT_OPENED 202 
#define TOKEN_NOT_FOUND 301 
#define MAX_LINE_LENGTH 255 


Infinity08 Commented:
I would suggest using strncpy instead of memcpy :

        http://www.cplusplus.com/reference/clibrary/cstring/strncpy.html
strncpy(*token, &linetext[i], tokenLength);
(*token)[tokenLength] = '\0';

evilrix, Senior Software Engineer (Avast), Commented:
I would suggest using the much safer std::string, which will make your life so much simpler.
http://www.cplusplus.com/reference/string/string/

You can use this with >> or better still getline() to read from a file.
http://www.cplusplus.com/reference/string/getline.html
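
As a rough sketch of what the std::string approach could look like: the function name splitLine is made up for illustration, and it splits on whitespace only, so it is not a drop-in replacement for the posted getNextToken.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one line into whitespace-separated tokens.
std::vector<std::string> splitLine(const std::string &line)
{
    std::vector<std::string> tokens;
    std::istringstream in(line);
    std::string word;
    while (in >> word)          // operator>> skips whitespace, then reads up to the next space
        tokens.push_back(word); // std::string manages its own memory and terminator
    return tokens;
}
```

Each line read with std::getline(file, line) could be passed to such a function; no malloc, memcpy or manual null termination is involved.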

Some other observations (this isn't a complete analysis, just some quick points)...

Your code is a bit of a mess. You have many globals, for no good reason. Apart from making code hard to maintain, they are not inherently thread safe (unless const). Also, non-POD (plain old data) types can throw exceptions on construction. You use an ifstream as a global, which is a non-POD type. If it throws on construction, your program will just crash. This can be a very hard problem to diagnose.

Your functions for opening the file and reporting errors are unnecessary too. Just create and open the stream when and where you need it. You can report errors by throwing an exception. If you want to avoid writing your own error detection and exception throwing code, you can enable stream exceptions using the exceptions() method on the stream. With this you can tell the stream to throw an exception on specific events (such as badbit or eofbit being set).

http://www.cplusplus.com/reference/iostream/ios/exceptions.html
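
A minimal sketch of the exceptions() idea, assuming a helper named tryOpen (a made-up name for illustration). Once failbit is in the exception mask, a failed open() throws std::ios_base::failure instead of silently setting a flag:

```cpp
#include <fstream>
#include <iostream>

// Hypothetical helper: returns true if the file opened, false if the stream threw.
bool tryOpen(const char *path)
{
    std::ifstream in;
    // Ask the stream to throw when failbit or badbit gets set.
    in.exceptions(std::ifstream::failbit | std::ifstream::badbit);
    try {
        in.open(path); // a failed open sets failbit, which now throws
    } catch (const std::ios_base::failure &e) {
        std::cerr << "could not open " << path << ": " << e.what() << '\n';
        return false;
    }
    return true;
}
```

One thing to watch for: getline() sets failbit when it hits end of file without reading anything, so with this mask a read loop ends with an exception rather than a false return value.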

>> what is ios::in for?
It tells the stream you are opening the file for reading. An ifstream opens for reading by default, so with ifstream (which you are already using) you don't need to pass this value.

fstream fs;
fs.open("file", std::ios::in);

...is the same as...

ifstream ifs;
ifs.open("infile");

http://www.cplusplus.com/reference/iostream/ifstream/

In fact, you are better off sticking to either ifstream or ofstream (the same applies to std::ios::out) if you are only reading or writing, as it prevents you from accidentally passing an fstream to a function that takes either kind of stream. An fstream converts implicitly to both std::istream& and std::ostream&, so it is possible to pass it to a function that you shouldn't.
97WideGlide Commented:
I'm not sure what you are trying to accomplish.

For example, depending on your input stream, I don't think you are guaranteed that your token will be "1 char only" as you mention in your comments.

But to answer your questions directly without trying to guess:

82:      tokenLength = 1; // QUESTION: how do i fix this? i am confused about pos and i's difference
                tokenLength = i + 1;

Hold on.  As I was in the process of making suggestions, I thought that it would be best if you explained what it is that you are trying to do before I make more suggestions.  If this is a homework project and you are constrained in the way you should solve the problem, say so.  We'll work with what you have.  Otherwise, if you just want to get the job done there might be standard C functions which will make your code much more straightforward.

In the meantime, consider the change to line 82 above.  It might be enough of a suggestion to enable you to change the rest of your code to function the way you want.


vwps Author Commented:
Thank you for your comments. I'm sorry I didn't include the specifications earlier. evilrix, I agree that my code is a mess. Unfortunately, every function that I wrote (except for "symbol") is required. I am explicitly forbidden from using the built-in string class.

The purpose of this program is to split lines of text read in from a text file into tokens. The tokens will be single characters, words, symbols (one or two characters long), and quoted strings. For the full assignment description, see the attached file.

>> In the meantime, consider the change to line 82 above.

tokenLength = i + 1 does not work because there can be multiple tokens in a line. I think it's supposed to be closer to tokenLength = i - pos + 1, with i being the index of the last character in the token and pos being the last character read by the function before this token began. I'm not really sure how to implement pos.
p4directions.txt
itsmeandnobodyelse Commented:
You might consider using a switch statement for parsing the text line.

And simply save the (potential) first position of the next token in a variable (lpos).

Always reallocate the token array when you get a new token.


 
 #include <stdlib.h>  // malloc, realloc
 #include <string.h>  // strlen, strncpy
 
 int ntoken = 0;
 // allocate memory for 2d-array (one slot, which holds the terminating NULL)
 token  = (char**)malloc(sizeof(char*));
 *token = NULL;  // init token
 
 // ptoken is the address of the caller's array, so that the pointer returned
 // by realloc reaches the caller: ntoken = gettoken(linetext, &token, ntoken);
int gettoken(const char* linetext, char*** ptoken, int ntoken)
{
    char** token = *ptoken;
    char cc = '\0';
    char cn = '\0';
    int npos = 0;
    int lpos = 0;
    int len = strlen(linetext);
    // note, we parse up to and including the terminating zero
    for (int i = 0; i <= len; ++i)
    {
          npos = i;
          cc = linetext[i];
          cn = '\0';   // next character, stays zero at the end of the line
          if (i < len)
             cn = linetext[i+1];
          switch (cc)
          {
              case ' ':   // the missing break is intended
              case '\0':  // take both end-of-line and space as separator
                   if (lpos < npos)
                   {
                     token[ntoken] = (char*)malloc(npos - lpos+1);
                     strncpy(token[ntoken], &linetext[lpos], 
                             npos-lpos);
                     token[ntoken][npos-lpos] = '\0';
                     ++ntoken;
                     // always allocate one char pointer more than needed
                     // that way you can detect the end of the array 
                     // by checking for a NULL pointer
                     token  = (char**)realloc(token, sizeof(char*) * (ntoken+1));
                     token[ntoken] = NULL;
                     *ptoken = token;  // the array may have moved
                   }
                   lpos = npos + 1; 
                   break;
             case '<':         // the missing break is intended
             case '>':         // the missing break is intended
             case '=':
                   if (cn == '=')
                   {
                       // do whatever you want to do with symbols ...
 
                       lpos = npos + 1;
                   }
                   break;
             default:
                   // we just go on in the loop       
                   break; // that breaks the switch 
                          // but not the loop            
          }
    }
    return ntoken;
} 

97WideGlide Commented:

To answer your question, the length of your token (tokenLength) should be the difference between pos and i, plus one: (i - pos + 1).
 
Do you see why? Each time you call getNextToken() you start a for() loop with i initialized to pos. Then, as you step along your input line, i is incremented on each pass through the for() loop while pos holds your starting position. So there are (i - pos) characters between the two indices, and you add 1 because both the first and the last character belong to the token.

Here are specific answers to your questions :

82 :  tokenLength = (i-pos+1); // QUESTION: how do i fix this? i am confused about pos and i's difference
96:   memcpy(*token, &linetext[i], tokenLength);   // you seem to have found a two character token in linetext starting at offset i, right ?
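
The index arithmetic can be sketched on its own, outside the posted function. This is only an illustration under assumptions: nextWord is a made-up helper, it treats a single space as the only separator, and it takes the line and the position as parameters instead of using the globals. The token runs from index start to index last inclusive, so its length is last - start + 1:

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: scans `line` from *pos, skips spaces, and returns the next
// space-delimited token as a malloc'ed, null-terminated string (NULL if none is left).
// On return, *pos is the index just past the token.
char *nextWord(const char *line, int length, int *pos)
{
    int start = *pos;
    while (start < length && line[start] == ' ')
        ++start;                          // skip leading spaces
    if (start >= length) {
        *pos = length;
        return NULL;                      // nothing left on this line
    }
    int last = start;                     // index of the token's last character
    while (last + 1 < length && line[last + 1] != ' ')
        ++last;
    int tokenLength = last - start + 1;   // both end characters are included
    char *token = static_cast<char *>(std::malloc(tokenLength + 1));
    if (token == NULL)
        return NULL;
    std::memcpy(token, &line[start], tokenLength); // copy begins at the FIRST character
    token[tokenLength] = '\0';
    *pos = last + 1;                      // the next call resumes after this token
    return token;
}
```

Note that the copy starts at the token's first character and the position is advanced past the token's last character, which keeps the two roles (where the token begins, where scanning resumes) separate.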

I was traveling but I'm back now.  If you have additional questions, just post them here at EE and I'll try to get back 2 U quickly.

vwps Author Commented:
Thanks very much, and I'm sorry for the delay in assigning points.