Avatar of Vlearns
Vlearns

asked on

c++ Insert into Char* str

Hi

I have a char pointer (char * notmystuff) to HTML content. It has start and end html tags and a bunch of HTML code in between.

I have another char * mystuff which holds an anchor HTML tag like

<a href = "some website"> some text </a>

I want to insert mystuff into the original notmystuff so that correct HTML is displayed.

I guess that means detecting the last </body> </html> tags and inserting mystuff just before </body>?


Any ideas on the best way to do this?
Avatar of Vlearns
Vlearns

ASKER

I also have the lengths of both "notmystuff" and "mystuff".
Avatar of evilrix
>> Any ideas on the best way to do this?
Use a proper HTML parser (something like el kabong). Trying to parse HTML yourself from scratch is going to be a right pain unless you have absolute control of the HTML. A proper parser will take care of all the hard work for you leaving you to focus on your business logic.

http://sourceforge.net/projects/ekhtml/
You can use strstr() to find a string within a string:
const char * strstr ( const char * str1, const char * str2 );
         char * strstr (         char * str1, const char * str2 );
strstr returns a pointer to the first occurrence in str1 of the entire sequence of characters specified in str2, or a null pointer if the sequence is not present in str1.
    http://www.cplusplus.com/reference/clibrary/cstring/strstr/

Here is sample code that illustrates how to use it.
/* strstr example */
#include <stdio.h>
#include <string.h>

int main ()
{
  char str[] ="This is a simple string";
  char * pch;
  pch = strstr (str,"simple");
  strncpy (pch,"sample",6);
  puts (str);
  return 0;
}


Since there may be nested tags, it is up to you to identify the matching end tags with their originating tag.
>> Since there may be nested tags, it is up to you to identify the matching end tags with their originating tag.
And as this can get very complex (I've written enough parsers to know what a pain this can be), I strongly recommend against trying to roll your own HTML parser. There are just too many edge cases to consider, and unless you have strict control of the HTML you will be forever adding code to deal with another "special case" (unfortunately HTML is a very flexible format, and parsers have to be very forgiving, since HTML can be malformed and most browsers will still render it). For this reason, unless this is an exercise that specifically requires you to parse HTML yourself, really, don't bother. Using a SAX-style parser is just way simpler, and el kabong (which I recommended earlier) is very lightweight, simple to use and fast (very fast).
Avatar of Vlearns

ASKER

can there be two </html> or </body> tags in a valid html document?
No, but that doesn't mean there won't be, and even if there are, the HTML is still generally going to render OK. Unlike XML, which has strict rules that compliant parsers are obliged to enforce, HTML has no such enforcement: HTML parsers are expected to be forgiving. This being the case, the chances of being presented with malformed HTML are much higher.

If you are in control of the creation of the HTML, you can probably get away with rolling your own parser. If you are not, your parser needs to be able to cope with malformed HTML, or you run the risk of unexpected parsing results that mean you'll keep having to add support for edge cases.

A tried and tested SAX parser will already be able to cope with such problems (often having configuration options to control how the parser should deal with them), making your life a lot simpler.

HTML is a real pain to parse properly. If what you are doing is more than a quick and dirty solution, you really are better off leaving the parsing up to a proper parser.
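For the quick and dirty route only: since strstr() returns the first match, a repeated scan is one way to reach the last </body> when a document happens to contain more than one. This is a minimal sketch, not a parser. find_last is a made-up helper name, and it assumes the text is NUL-terminated and the tag is written in exactly the case searched for.

```cpp
#include <cstring>

// Returns a pointer to the last occurrence of `needle` in the
// NUL-terminated string `haystack`, or nullptr if it never occurs.
// find_last is a hypothetical helper, not a standard library function.
const char* find_last(const char* haystack, const char* needle)
{
    if (*needle == '\0') {
        return nullptr;                      // treat an empty needle as "not found"
    }
    const char* last = nullptr;
    const char* p = std::strstr(haystack, needle);
    while (p != nullptr) {
        last = p;                            // remember this match
        p = std::strstr(p + 1, needle);      // keep scanning past it
    }
    return last;
}
```

Because the comparison is case-sensitive, a document that uses </BODY> would not match "</body>"; that is one of the edge cases a real parser handles for you.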
Avatar of Vlearns

ASKER

in this code example attached by phoffric

int main ()
{
  char str[] ="This is a simple string";
  char * pch;
  pch = strstr (str,"simple");
  strncpy (pch,"sample",6);
  puts (str);
  return 0;
}


simple is of the same size as sample

my problem is the following

int main ()
{
  char str[] ="This is a simple string whose size i know";
  char * pch;
  pch = strstr (str,"simple");
  strncpy (pch,mycontent,sizeof(mycontent));
  puts (str);
  return 0;
}

Making the change as above is causing the program to crash


Avatar of Vlearns

ASKER

strncpy (pch,ad_content,sizeof(ad_content));
memset(ad_content,32, sizeof(ad_content));//blank spaces
memcpy(ad_content,tag.c_str(),tag.size());

   char * pch;
   pch = strstr (_in_mem_msg_ptr->buff(),"</body></html>");
   strncpy (pch,ad_content,sizeof(ad_content));
   strncpy (pch,"</body></html>", sizeof("</body></html>"));

this is my code

that crashes at  strncpy (pch,ad_content,sizeof(ad_content));

can you help me debug?
How many bytes is mycontent? If sizeof(mycontent) were a few bytes, say 10, then you should be OK, but if it were large, say, 100, then you could overrun the str buffer.
int main ()
{
  char str[] ="This is a simple string whose size i know";
  char * pch;
  pch = strstr (str,"simple");
  strncpy (pch,mycontent,sizeof(mycontent)); // this could overrun the str[] buffer and cause a crash
  puts (str);
  return 0;
}
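A guard along these lines would avoid the overrun. It is only a sketch: overwrite_in_place is a made-up helper name, and it assumes the target is a NUL-terminated string. Overwriting in place can never make the string longer, so anything that does not fit in what remains of the string is rejected. A null result from strstr() is rejected too, since writing through it would also crash.

```cpp
#include <cstddef>
#include <cstring>

// Overwrites the characters starting at `pos` with `src_len` bytes of `src`,
// but only if they fit inside what is left of the existing string.
// Returns true if the copy was done, false if nothing was written.
// overwrite_in_place is a hypothetical helper name.
bool overwrite_in_place(char* pos, const char* src, std::size_t src_len)
{
    if (pos == nullptr) {
        return false;                 // strstr() did not find the search text
    }
    if (src_len > std::strlen(pos)) {
        return false;                 // copy would run past the terminating NUL
    }
    std::memcpy(pos, src, src_len);   // same-size or shorter overwrite is safe
    return true;
}
```

With the "simple" to "sample" example the call succeeds because both words are six characters long. A 300-character block is refused, which is the signal that a larger buffer has to be built instead.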


Avatar of Vlearns

ASKER

 char * pch;

    tag = tagObj.findTag(getLocationIdByName("us"), rcptList,"html","sbc" );


    char ad_content[300 + 1];  

    memset(ad_content,32, sizeof(ad_content));//blank spaces

   
   memcpy(ad_content,tag.c_str(),tag.size());
 

   pch = strstr (_in_mem_msg_ptr->buff(),"</body></html>");
   strncpy (pch,ad_content,sizeof(ad_content));
   strncpy (pch,"</body></html>", sizeof("</body></html>"));



it crashes at strncpy (pch,ad_content,sizeof(ad_content));

Avatar of Vlearns

ASKER

to phoffric:

Yes, that's what is happening.

How do I increase the size of the str buffer so that it can hold the content (which is always 300 characters)?

originally
str = original content

after modification

str = original string before </body> + mycontent(300 chars) + original string after </body>

If you start using the C++ string class, then you could use the string::replace operation:
     http://www.cplusplus.com/reference/string/string/replace/
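As a rough illustration of that route, the sketch below replaces the first occurrence of one piece of text with another of any length. replace_first is a made-up helper name; find() and replace() are the standard std::string members.

```cpp
#include <string>

// Replaces the first occurrence of `from` in `text` with `to`.
// The std::string grows or shrinks as needed, so `to` may be any length.
// Returns false (leaving `text` untouched) if `from` is not found.
// replace_first is a hypothetical helper name.
bool replace_first(std::string& text, const std::string& from,
                   const std::string& to)
{
    std::string::size_type pos = text.find(from);
    if (pos == std::string::npos) {
        return false;                    // nothing to replace
    }
    text.replace(pos, from.size(), to);  // storage is resized automatically
    return true;
}
```

The point of the design is that the string owns its storage, so there is no fixed buffer to overrun when the replacement is longer than the text it replaces.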

Otherwise, using C strings, the easiest way is to have a source buffer and a destination buffer (initialized to 0). Then strcat the individual string pieces into the destination buffer.

Naturally, you want the destination buffer to be well oversized to handle unanticipated growth in your requirements. And you want to manage the number of bytes so that you don't write more into the destination buffer than there is space. (And don't forget the terminating null byte.)
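The approach described above might look roughly like this. It is a sketch under assumptions: insert_before is a made-up helper name, the source is NUL-terminated, and the caller supplies the destination buffer together with its capacity.

```cpp
#include <cstddef>
#include <cstring>

// Builds "text before marker + extra + marker and everything after it"
// in `dest`. `dest_size` is the capacity of `dest` in bytes, including
// the terminating NUL. Returns false if the marker is missing or the
// result would not fit; `dest` is left untouched in that case.
// insert_before is a hypothetical helper name.
bool insert_before(char* dest, std::size_t dest_size,
                   const char* src, const char* marker, const char* extra)
{
    const char* at = std::strstr(src, marker);
    if (at == nullptr) {
        return false;                                   // marker not present
    }
    std::size_t needed = std::strlen(src) + std::strlen(extra) + 1; // +1 for NUL
    if (needed > dest_size) {
        return false;                                   // would overflow dest
    }
    dest[0] = '\0';                                     // start from an empty string
    std::strncat(dest, src, static_cast<std::size_t>(at - src)); // piece 1: before marker
    std::strcat(dest, extra);                           // piece 2: inserted text
    std::strcat(dest, at);                              // piece 3: marker to the end
    return true;
}
```

Checking the total size once, before any copying, keeps the three appends simple: by the time strcat runs it is already known that everything fits.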
Hi Vlearns,
   I'll be here for just a little longer; if you have questions on my latest post, I'll be back tomorrow to see them.
    Paul
Avatar of Vlearns

ASKER

   
/* original data */
        data = _in_mem_msg_ptr->buff();
        data_len = _in_mem_msg_ptr->dataLen();


/*location of </body> in the original */

        char *insert = strstr(data, "</body>");

/*length of the new buffer string */

        int length = strlen(data)+strlen(ad_content);
        newdata =(char*)malloc(sizeof(char)*length);
        memset(newdata, 0, length);

/*copy the original data upto </body> into newdata*/

        memcpy(newdata,data,insert-data);

/*now add the ad_content */
        strcat(newdata,ad_content);

/*copy the data from </body> to end of original string(data) into newdata */

        memcpy(newdata,data,data_len - ending );

How do I implement the last statement: memcpy(newdata,data,data_len - ending );

      I need to copy the remainder of the data from my char* data, beginning from and including </body>, to the very end. How do I correctly compute the "ending" parameter in the memcpy?
 
Avatar of Vlearns

ASKER

       data = _in_mem_msg_ptr->buff();
        data_len = _in_mem_msg_ptr->dataLen();
        char *insert = strstr(data, "</body>");
        int length = strlen(data)+strlen(ad_content);
        newdata =(char*)malloc(sizeof(char)*length);
        memset(newdata, 0, length);
        memcpy(newdata,data,insert-data);
        strcat(newdata,ad_content);
        memcpy(newdata,data,data_len - ending );
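One way to work out the lengths is sketched below. Three things differ from the snippet above: the allocation needs one extra byte for the terminating NUL, each memcpy has to write at an offset into newdata rather than at its start, and the tail length is data_len minus the number of bytes before </body>. splice_before_body is a made-up helper name, and the sketch assumes data is NUL-terminated with data_len equal to strlen(data).

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Returns a newly malloc'ed, NUL-terminated copy of `data` with
// `ad_content` inserted just before the first "</body>", or nullptr
// if the tag is missing or the allocation fails.
// The caller must free() the result.
// splice_before_body is a hypothetical helper name.
char* splice_before_body(const char* data, std::size_t data_len,
                         const char* ad_content)
{
    const char* insert = std::strstr(data, "</body>");
    if (insert == nullptr) {
        return nullptr;
    }
    std::size_t head_len = static_cast<std::size_t>(insert - data); // bytes before </body>
    std::size_t ad_len   = std::strlen(ad_content);
    std::size_t tail_len = data_len - head_len;        // </body> through the end

    char* newdata = static_cast<char*>(std::malloc(data_len + ad_len + 1)); // +1 for NUL
    if (newdata == nullptr) {
        return nullptr;
    }
    std::memcpy(newdata, data, head_len);                       // head
    std::memcpy(newdata + head_len, ad_content, ad_len);        // inserted content
    std::memcpy(newdata + head_len + ad_len, insert, tail_len); // tail
    newdata[data_len + ad_len] = '\0';
    return newdata;
}
```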
ASKER CERTIFIED SOLUTION
Avatar of phoffric
phoffric

This solution is only available to members.
Avatar of Vlearns

ASKER

how can i use c++ string class to achieve the same?

thanks!
Avatar of Vlearns

ASKER



does this sound right to you?

        char *insert = strstr(_in_mem_msg_ptr->buff(), "</body>");//get pointer to </body>
       string ad_data = string(_in_mem_msg_ptr->buff(),insert - _in_mem_msg_ptr->buff()) ;//insert the part of _in_mem_msg_ptr->buff() before the </body>
       ad_data.append(ad_content); //add the new html content
       ad_data.append(_in_mem_msg_ptr->buff(),insert- _in_mem_msg_ptr->buff(),_in_mem_msg_ptr->dataLen()); //remainder of _in_mem_msg_ptr->buff() from and including </body> to the end

>> how can i use c++ string class to achieve the same?
Good question. I think it should be asked in a separate question, to keep the C and C++ concepts separated for the PAQ. If you do this, be clear about what your input is and what you expect the output to be (e.g., using one of the simple examples, as in http:#35389759 ). Also, in your example, it might be best (if your application allows it) to avoid raw char buffers by replacing them with the string class.
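For reference, a string-based version could look something like the sketch below. It assumes only a buffer pointer and a length are available (as with buff() and dataLen() in the posts above), and with_insert_before_body is a made-up helper name. Appending at the end when the tag is missing is an arbitrary fallback chosen for the sketch.

```cpp
#include <cstddef>
#include <string>

// Copies `len` bytes from `buf` into a std::string and inserts `extra`
// just before the last "</body>". If the tag is missing, `extra` is
// appended at the end instead (a fallback chosen for this sketch).
// with_insert_before_body is a hypothetical helper name.
std::string with_insert_before_body(const char* buf, std::size_t len,
                                    const std::string& extra)
{
    std::string html(buf, len);          // copies exactly len bytes, NUL not required
    std::string::size_type pos = html.rfind("</body>");
    if (pos == std::string::npos) {
        html.append(extra);              // no closing tag found
    } else {
        html.insert(pos, extra);         // everything after pos shifts right
    }
    return html;
}
```

Using rfind() picks the last closing tag, and insert() does the head/tail bookkeeping that the memcpy version has to do by hand.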