Solved

How do I use memcpy to allocate multiple characters?

Posted on 2008-11-16
Last Modified: 2013-11-18
This code is supposed to copy a token (two characters) from the 1-D character array linetext to *token. But when I cout *token, only the second character is printed. Does this mean that I incorrectly copied linetext or that I'm incorrectly printing token?
(*token) = (char *) malloc(tokenLength + 1);
memcpy (*token, &linetext[i], tokenLength);
cout << "token: " << *token << endl;

Question by:vwps
10 Comments
 
LVL 8

Expert Comment

by:97WideGlide
ID: 22973288
Check the value of i.  C arrays are zero based (the first character is at position 0).  I am guessing that you are indexing into linetext assuming the first character is at linetext[1].

Hope it helps.
Please let me know.
 
LVL 8

Expert Comment

by:97WideGlide
ID: 22973298
Also, note the memory is not initialized by malloc() so you might want to make sure *token is null terminated after the memcpy().
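
To make the null-termination point concrete: cout << *token treats *token as a C string and keeps reading bytes until it meets a '\0'. malloc() does not zero the buffer and memcpy() copies only the bytes you ask for, so the terminator has to be written by hand. A minimal sketch, assuming a hypothetical helper named copyToken (it is not part of the posted code):

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: copies `count` characters starting at src[start]
// into a freshly malloc'ed, null-terminated buffer. The caller must free() it.
char *copyToken(const char *src, int start, int count)
{
    char *out = (char *) std::malloc(count + 1); // +1 for the terminator
    if (out == NULL)
        return NULL;                             // allocation failed
    std::memcpy(out, src + start, count);        // copies exactly `count` bytes
    out[count] = '\0';                           // memcpy does not add this
    return out;
}
```

For example, copyToken("a >= b", 2, 2) yields the string ">=", which can be printed with cout and then released with free().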

 

Author Comment

by:vwps
ID: 22973354
>> Check the value of i

Thanks 97WideGlide! You were right, the values of i in memcpy were incorrect. While I think I fixed it for 1-character tokens, it still doesn't work for 2-character tokens. Is it because of the way the lines are looped through? (Please see attached code)

>> you might want to make sure *token is null terminated after the memcpy()

What does this mean?


/* system_utilities.cpp */
 
#include <iostream>
#include <fstream>
#include <string.h> 
#include "system_utilities.h"
#include "definitions.h"
 
using namespace std;
 
ifstream file; // create input stream object
 
char linetext[256]; // array of characters to hold 255 characters + terminating null char
int length; // variable to hold length of current line of input (num chars)
int pos = 0; // variable to hold position of last character of current input line read by getNextToken
char *word;
 
int errorcode;
 
void printError(int errorcode)
{
   switch ( errorcode )
   {
      case END_OF_FILE:
         cout << "Error: End of file has been reached." << endl;
         break;
      case FILE_NOT_OPENED:
         cout << "Error: File could not be opened." << endl;
         break;
      case TOKEN_NOT_FOUND:
         cout << "Error: Token not found." << endl;
         break;
      default:
         cout << "There is an error." << endl;
   }
}
 
int openInputFile (char fname[])
{
   file.open(fname, ios::in); // what is ios::in for?
   if (file.good())
   {
      cout << fname << " was read successfully" << endl;
      return 0; 
   }
   else
   {
      cout << "file not read successfully" << endl;
      return FILE_NOT_OPENED; 
   }
}
 
// NOTE: WILL NOT READ NEXT LINE UNLESS LAST CHARACTER OF CURRENT LINE HAS A SPACE AFTER IT
 
int getNextToken(char **token) 
{
   int tokenFound = 0; // whether or not a token has been found
   int tokenLength = 0; // length of token
  
   while(tokenFound == 0) // when no token has been found
   {
      if(pos == 0) // if you're at the beginning of a line
      {
         int j;
         for (j = 0; j <256; j++) // set all character array values to NULL
         {
            linetext[j] = '\0';
         }
         file.getline(linetext, 256); // read in a line of text from the file
         length = strlen(linetext); // get the length of the line
      }
      
	  // recognizing single characters 
	  
	  int i;
	  for(i = pos; i < length && (tokenFound == 0); i++) // loop through whole line
	  { 
	     if( ((linetext[i] != ' ') && (linetext[i] != '\0')) && ( (linetext[i+1] == ' ') || (i == (length - 1))) ) // if the character is not a space or a null and the next character is a space // and the next character is a space, or this is the last character in the line
		 {
			cout << "       pos: " << pos << " linetext[" << i << "] = " << linetext[i] << endl;
			cout << "       i = " << i << "      length = " << length << endl;
			tokenLength = 1; // QUESTION: how do i fix this? i am confused about pos and i's difference
			(*token) = (char *) malloc(tokenLength + 1); // first row of token has enough space for length of token + null character
			memcpy(*token, &linetext[i - tokenLength + 1], tokenLength); // copy token into *token
			cout << "ONE CHARACTER TOKEN: " << *token << endl;
			cout << "       pos: " << pos << endl;
			pos = pos + tokenLength; 
		 }
		 
		 else if( (symbol(linetext[i]) && (symbol(linetext[i+1]))) ) // else if it's a symbol and the next one is a symbol 
		 {
			cout << linetext[i] << linetext[i+1] << " is a symbol with more than 1 character" << endl; // how is it executing both the if and the else statements?
			cout << "pos: " << pos << endl;
			tokenLength = 2; // QUESTION: how do i fix this? i am confused about pos and i's difference
			(*token) = (char *) malloc(tokenLength + 1);
			memcpy(*token, &linetext[i - tokenLength + 1], tokenLength);
			cout << "tokenLength: " << tokenLength << endl;
			cout << "TWO CHARACTER SYMBOL TOKEN: " << *token << endl; // QUESTION: why is this only printing the last character of the token?
			pos = pos + tokenLength; 
			cout << "pos: " << pos << endl;
			tokenFound = 1;
			i++;
		 } 
		 
		 else if( (linetext[i] != ' ') && (linetext[i] != '\0') && (linetext[i + 1] != ' ') && (linetext[i + 1] != '\0') )
		 {
			while( (linetext[i + 1] != ' ') && (linetext[i + 1] != '\0') && (i != length) )
			{
				cout << "                     next character is part of the token" << endl;
			// else if this one and the next one is a letter (two or more letters --> a word)
			/*cout << linetext[i] << linetext[i + 1] << " is a 2-letter word" << endl;
			tokenLength = 2; // QUESTION: how do i fix this? i am confused about pos and i's difference
			cout << "      tokenLength = " << tokenLength << endl;
			(*token) = (char *) malloc(tokenLength + 1);
			memcpy(*token, &linetext[i], tokenLength);
			*/
				i++;
			//tokenFound = 1;
			}
		 }
		 
	  }
	  
	  if(i == length) // if you've reached the end of the line 
	  {
		pos = 0; // reset pos to 0
		if(file.eof()) // if you're also at the end of the file
		{
			return END_OF_FILE;
		}
	  }
	}
	return 0;
}
 
int symbol(char c)
{
   if ((c == '>') || (c == '<') || (c == '/') || (c == '='))
      {
      return 1;
      }
   else
      {
      return 0;
      }
}
 
/* system_utilities.h */
 
int openInputFile(char fname[]); // input file name string, open file and assign to file-level ifstream variable
int getNextToken( char **token ); // finds next token in input file, allocates new space, assigns token to point to new space, copies characters
void printError(int errorcode); // prints appropriate error message
int symbol(char c);
 
/* main.cpp */
 
#include <iostream>
#include "system_utilities.h"
#include "definitions.h"
 
using namespace std;
 
char *token1[10000];
 
int main() 
{
   openInputFile("/Users/vshen/Documents/EECS 211/Program4/onlychars.txt");
   while(getNextToken(token1) != END_OF_FILE);
   cout << "The file has been tokenized." << endl;
   return 0;
}
 
/* definitions.h */
 
#define END_OF_FILE 201 
#define FILE_NOT_OPENED 202 
#define TOKEN_NOT_FOUND 301 
#define MAX_LINE_LENGTH 255 


 
LVL 53

Expert Comment

by:Infinity08
ID: 22973777
I would suggest using strncpy instead of memcpy :

        http://www.cplusplus.com/reference/clibrary/cstring/strncpy.html
strncpy(*token, &linetext[i], tokenLength);
(*token)[tokenLength] = '\0';

 
LVL 40

Assisted Solution

by:evilrix
evilrix earned 400 total points
ID: 22973914
I would suggest using the much safer std::string, which will make your life so much simpler.
http://www.cplusplus.com/reference/string/string/

You can use this with >> or better still getline() to read from a file.
http://www.cplusplus.com/reference/string/getline.html
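
As a rough illustration of that suggestion, here is a sketch that reads lines with getline() into a std::string and splits each line on whitespace. The function name tokenize and the use of std::vector are assumptions made for the example. It handles only space-separated tokens, not the two-character symbols or quoted strings discussed elsewhere in this thread.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads every line from `in` and splits it on whitespace.
std::vector<std::string> tokenize(std::istream &in)
{
    std::vector<std::string> tokens;
    std::string line;
    while (std::getline(in, line))      // no fixed-size buffer needed
    {
        std::istringstream words(line);
        std::string word;
        while (words >> word)           // >> skips whitespace between tokens
            tokens.push_back(word);
    }
    return tokens;
}
```

Because std::string manages its own memory, there is no malloc(), no length bookkeeping, and no terminator to forget.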

Some other observations (this isn't a complete analysis, just some quick points)...

Your code is a bit of a mess. You have many globals, for no good reason. Apart from making code hard to maintain, they are not inherently thread safe (unless const). Also, non-POD (plain old data) types can throw exceptions on construction. You use an ifstream as a global, which is a non-POD type. If its constructor throws, your program will just crash before main() starts. This can be a very hard problem to diagnose.

Your functions for opening the file and reporting errors are unnecessary too. Just create and open the stream when and where you need it. You can report errors by throwing an exception. If you want to avoid writing your own error detection and exception throwing code, you can just enable stream exceptions using the exceptions() method on the stream. Using this you can tell the stream to throw an exception on specific events (such as badbit or eofbit being set).

http://www.cplusplus.com/reference/iostream/ios/exceptions.html
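
A minimal sketch of the exceptions() approach, assuming a hypothetical function firstLine that returns the first line of a file. With failbit in the mask, a failed open() or a failed read throws std::ios_base::failure instead of silently setting a flag.

```cpp
#include <fstream>
#include <string>

// Returns the first line of the file at `path`.
// Throws std::ios_base::failure if the file cannot be opened or read.
std::string firstLine(const char *path)
{
    std::ifstream in;
    // Ask the stream to throw when failbit or badbit gets set.
    in.exceptions(std::ifstream::failbit | std::ifstream::badbit);
    in.open(path);              // throws if the open fails
    std::string line;
    std::getline(in, line);     // throws if nothing could be read
    return line;
}
```

One caveat: a loop of the form while (getline(in, line)) ends by setting failbit at end of file, so with failbit in the mask that normal end-of-input condition also throws. Enabling only badbit avoids that.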

>> what is ios::in for?
It tells the fstream you are opening the data for reading. You can use ifstream instead to avoid having to pass this value.

fstream fs;
fs.open("infile", std::ios::in);

...is the same as...

ifstream ifs;
ifs.open("infile");

http://www.cplusplus.com/reference/iostream/ifstream/

In fact, you are better off sticking to ifstream (or ofstream, which is the same as fstream with std::ios::out) if you are only reading or only writing. An fstream is both an istream and an ostream, so it converts implicitly to either base class and can accidentally be passed to a function that you shouldn't pass it to. An ifstream cannot be passed to a function that takes an ostream&, and an ofstream cannot be passed to one that takes an istream&.
 
LVL 8

Expert Comment

by:97WideGlide
ID: 22976067
I'm not sure what you are trying to accomplish.

For example, depending on your input stream, I don't think you are guaranteed that your token will be "1 char only" as you mention in your comments.

but to answer your questions directly without trying to guess :

82:      tokenLength = 1; // QUESTION: how do i fix this? i am confused about pos and i's difference
                tokenLength = i + 1;

Hold on.  As I was in the process of making suggestions, I thought that it would be best if you explained what it is that you are trying to do before I make more suggestions.  If this is a homework project and you are constrained in the way you should solve the problem, say so.  We'll work with what you have.  Otherwise, if you just want to get the job done there might be standard C functions which will make your code much more straightforward.

In the meantime, consider the change to line 82 above.  It might be enough of a suggestion to enable you to change the rest of your code to function the way you want.


 

Author Comment

by:vwps
ID: 22976874
Thank you for your comments. I'm sorry I didn't include the specifications earlier. evilrix, I agree that my code is a mess. Unfortunately, every function that I wrote (except for "symbol") is required. I am explicitly forbidden from using the built-in string class.

The purpose of this program is to split lines of text read in from a text file into tokens. The tokens will be single characters, words, symbols (one or two characters long), and quoted strings. For the full assignment description, see the attached file.

>> In the meantime, consider the change to line 82 above.

tokenLength = i + 1 does not work because there can be multiple tokens in a line. I think it's supposed to be closer to tokenLength = i - pos + 1, with i being the index of the last character in the token and pos being the last character read by the function before this token began. I'm not really sure how to implement pos.
p4directions.txt
 
LVL 39

Assisted Solution

by:itsmeandnobodyelse
itsmeandnobodyelse earned 400 total points
ID: 22978807
You might consider using a switch statement for parsing the text line.

And simply save the (potential) first position of the next token in a variable (lpos).

Always reallocate the token array if you got a new token.


 
 int ntoken = 0;
 // allocate memory for the array of token pointers (one slot for the NULL end marker)
 token  = (char**)malloc(sizeof(char*));
 *token = NULL;  // init token
 
   
// ptoken is the address of the caller's array, so that the pointer
// returned by realloc() is visible to the caller: gettoken(line, &token, ntoken)
int gettoken(const char* linetext, char*** ptoken, int ntoken)
{
    char **token = *ptoken;
    char cc = '\0';   // current character
    char cn = '\0';   // next character
    int npos = 0;
    int lpos = 0;
    int len = (int)strlen(linetext);
    // note, we parse up to and including the terminating zero
    for (int i = 0; i <= len; ++i)
    {
          npos = i;
          cc = linetext[i];
          cn = '\0';
          if (i < len)
             cn = linetext[i+1];
          switch (cc)
          {
              case ' ':   // the missing break is intended
              case '\0':  // take both end-of-line and space as separator
                   if (lpos < npos)
                   {
                     token[ntoken] = (char*)malloc(npos - lpos+1);
                     strncpy(token[ntoken], &linetext[lpos], 
                             npos-lpos);
                     token[ntoken][npos-lpos] = '\0';
                     ++ntoken;
                     // always allocate one char pointer more than needed
                     // that way you can detect the end of the array 
                     // by checking for a NULL pointer
                     token  = (char**)realloc(token, sizeof(char*) * (ntoken+1));
                     token[ntoken] = NULL;
                     *ptoken = token;  // realloc may have moved the array
                   }
                   lpos = npos + 1; 
                   break;
             case '<':         // the missing break is intended
             case '>':         // the missing break is intended
             case '=':
                   if (cn == '=')
                   {
                       // do whatever you want to do with symbols ...
 
                       lpos = npos + 1;
                   }
                   break;
             default:
                   // we just go on in the loop       
                   break; // that breaks the switch 
                          // but not the loop            
          }
    }
    return ntoken;
} 

 
LVL 8

Accepted Solution

by:
97WideGlide earned 1200 total points
ID: 22981066

To answer your question, the length of your token (tokenLength) should be the difference between pos and i plus one, namely (i - pos + 1).
 
Do you see why? Each time you call getNextToken() you start a for() loop that initializes i to pos. Then, as you step along your input line, the for() loop increments i while pos holds your starting position. So when i is on the last character of the token, the two indices are (i - pos) apart, and you have to add 1 because both the first and the last character belong to the token.

Here are specific answers to your questions :

82 :  tokenLength = (i-pos+1); // QUESTION: how do i fix this? i am confused about pos and i's difference
96:   memcpy(*token, &linetext[i], tokenLength);   // you seem to have found a two character token in linetext starting at offset i, right ?
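
The index arithmetic can be checked in isolation. Below is a sketch, assuming a hypothetical helper nextWord that handles only space-delimited tokens (no symbols or quoted strings). It finds the first and last index of a token, computes the length as last - start + 1, copies from the first character, and terminates the result.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns the next space-delimited token at or after
// *pos as a malloc'ed C string, or NULL when the line is exhausted.
// On return, *pos is the index just past the token.
char *nextWord(const char *line, int *pos)
{
    int start = *pos;
    while (line[start] == ' ')               // skip leading spaces
        ++start;
    if (line[start] == '\0')
    {
        *pos = start;
        return NULL;                         // nothing left on this line
    }
    int last = start;                        // index of the token's last character
    while (line[last + 1] != ' ' && line[last + 1] != '\0')
        ++last;
    int tokenLength = last - start + 1;      // same shape as i - pos + 1
    char *token = (char *) std::malloc(tokenLength + 1);
    if (token == NULL)
        return NULL;                         // allocation failed
    std::memcpy(token, &line[start], tokenLength); // copy from the FIRST character
    token[tokenLength] = '\0';
    *pos = last + 1;                         // resume after this token next time
    return token;
}
```

Calling it repeatedly on "ab >= c" with pos starting at 0 yields "ab", ">=", "c" and then NULL. Each returned token must be released with free().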

I was traveling but I'm back now.  If you have additional questions, just post them here at EE and I'll try to get back to you quickly.

 

Author Closing Comment

by:vwps
ID: 31517322
Thanks very much, and I'm sorry for the delay in assigning points.