Replace string with string in unsigned char array

Hi

I have a dynamically allocated unsigned char array. I want a piece of code that can find, for example, the word Test in this array, replace it with a longer string such as Test22222, and re-allocate the array to make room for it.

Please advise.

Thanks in advance!
CSecurityAsked:
 
Kent OlsenData Warehouse Architect / DBACommented:
Hi CSecurity.

It's only a few steps to make the change, but it's critical that all of the steps occur, in order, and the proper cleanup takes place.  And you'll need to decide what to do if the string occurs more than once.

Given that you have a string called Old and you want to replace the first occurrence of the target string in it, you'll need to do this:

1)  Search the string for the target string.
2)  If the string does not occur, exit.
3)  Determine the length of string Old.
4)  Determine the length of the target string.
5)  Determine the length of the replacement string.
6)  Allocate a buffer large enough for the new string (after the replacement).
7)  Copy the Old string, up to where the target string starts, to the New string.
8)  Copy the replacement string to the New string.
9)  Copy the rest of the Old string, starting after the target string, to the New string.

Afterwards, you'll want to free the Old string and assign the New string to the variable that contained the pointer to the Old string.


Good Luck,
Kent
0
 
Infinity08Commented:
>> Test word in this array and replace that with larger string

You cannot simply replace it. If the buffer is big enough to hold the extra bytes, then you can move the part after the word x bytes to the right, and then insert the replacement word.
If the buffer is not big enough, you'll have to either realloc it to a big enough size, or just create a new buffer, and copy the data into it.
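
A minimal sketch of the two cases above, assuming the caller tracks the used length and the allocated capacity separately. The name splice_bytes and its parameters are hypothetical and not taken from any library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: replace old_len bytes at offset pos with new_len bytes
 * from repl. *buf holds *len used bytes out of *cap allocated bytes.
 * Returns 0 on success, -1 if the range is invalid or realloc fails. */
int splice_bytes(unsigned char **buf, size_t *len, size_t *cap,
                 size_t pos, size_t old_len,
                 const unsigned char *repl, size_t new_len)
{
    if (pos > *len || old_len > *len - pos)
        return -1;

    size_t result_len = *len - old_len + new_len;
    if (result_len > *cap) {
        /* Not enough room: grow the allocation first. */
        unsigned char *grown = (unsigned char *)realloc(*buf, result_len);
        if (!grown)
            return -1;              /* the original buffer is still valid */
        *buf = grown;
        *cap = result_len;
    }

    /* Shift the tail so the gap is exactly new_len bytes wide.
     * memmove is needed because source and destination overlap. */
    memmove(*buf + pos + new_len, *buf + pos + old_len, *len - pos - old_len);
    if (new_len)
        memcpy(*buf + pos, repl, new_len);
    *len = result_len;
    return 0;
}
```

When the buffer already has spare capacity, only the memmove and memcpy run and no allocation takes place.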
0
 
alexcohnCommented:
To make the code cleaner, I cast from unsigned char* to char* before calling the replace() function.
#include <stdlib.h>
#include <string.h>
 
void replace(char** parray, const char* to_find, const char* replace_with);
 
unsigned char *array = (unsigned char *)strdup("string with word Test inside");
 
/* use the original array */
 
replace((char**)&array, "Test", "Test22222");
 
/* use the array after replacements */
 
free(array);
...
 
void replace(char** parray, const char* to_find, const char* replace_with)
{
    const char* found = strstr(*parray, to_find);
    if (found)
    {
        size_t prefix = found - *parray; /* bytes before the match */
        /* + 1 leaves room for the terminating '\0' */
        char *tmpbuf = (char*)malloc(strlen(*parray) - strlen(to_find) + strlen(replace_with) + 1);
        if (!tmpbuf)
            return; /* allocation failed: leave the original untouched */
        memcpy(tmpbuf, *parray, prefix);
        strcpy(tmpbuf + prefix, replace_with);
        strcpy(tmpbuf + prefix + strlen(replace_with), found + strlen(to_find));
        free(*parray);
        *parray = tmpbuf;
    }
    return;
}

0
 
CSecurityAuthor Commented:
Thank you all, and thanks alex for the great code. Just one problem: I have non-printing chars like char 157, and I can't cast those to char... Any ideas?
0
 
Infinity08Commented:
You cannot use the string functions on binary data, as they would get confused by the null bytes that might be in there.

Use memcpy's instead for example.
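
A small sketch of the difference, assuming the length is tracked separately from the data. The helper names dup_bytes and visible_to_strlen are made up for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate len bytes, embedded zeros included.
 * Returns NULL if allocation fails. The caller frees the result. */
unsigned char *dup_bytes(const unsigned char *src, size_t len)
{
    unsigned char *copy = (unsigned char *)malloc(len ? len : 1);
    if (copy && len)
        memcpy(copy, src, len);     /* copies exactly len bytes */
    return copy;
}

/* What a string function sees: only the bytes before the first zero.
 * src must contain at least one zero byte, or strlen runs off the end. */
size_t visible_to_strlen(const unsigned char *src)
{
    return strlen((const char *)src);
}
```

For a buffer holding the bytes 'a', 'b', 0, 157, 'c', 0, strlen reports 2, while dup_bytes with an explicit length of 6 copies all six bytes.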
0
 
CSecurityAuthor Commented:
Thank you Infinity, can you modify Alex's code to use memcpy etc. as you say? Thank you so much
0
 
Infinity08Commented:
Just follow Kdo's step-by-step plan, and you should be fine :)
0
 
CSecurityAuthor Commented:
Is there any ready code for it? Can anyone help me for this? Thanks
0
 
alexcohnCommented:
You should not worry about unprintable characters, like 157. The only limitation of strcpy() and other functions is that they cannot handle strings that contain zero characters ('\0').
0
 
CSecurityAuthor Commented:
But I have 0 char also
0
 
Kent OlsenData Warehouse Architect / DBACommented:
Hi CSecurity,

Is this classwork related?


0
 
alexcohnCommented:
If you have '\0', how do you determine the word lengths? How do you determine the actual length of your original array, to start with?
0
 
CSecurityAuthor Commented:
I have length in another variable... It's unsigned char array and I have length of it in another variable.

Kdo, no, I'm too old to have classwork :-)
0
 
Kent OlsenData Warehouse Architect / DBACommented:

"Too Old"  :)   A lot of that going on around here.   :)


Since the question was first asked, a bit more detail has been offered that might cloud things.

If the "string" that you want to examine has binary data that could contain bytes with a value of 0, the string functions won't work.  Similarly, there is no really good built-in search function that I know of to see if a "string" is contained within the buffer.

So let's get the last couple of details ironed out.

-  Is the buffer to be edited really a zero-terminated string or is it a binary buffer?
-  Is the object that you're trying to find in the buffer really a string or is it a binary buffer?
-  Is the object that you're trying to put into the buffer really a string or is it a binary buffer?


With these answers, the code's pretty easy to put together.  Alex already provided one example.


kent
0
 
CSecurityAuthor Commented:
Yes, it's a TCP packet, contains all chars including zero char.
I'm going to find some words and replace them with other words. The words I'm looking for consist entirely of printable chars, like "test", and I'll replace them with printable chars like test2222.

That's all. I want code that works. I'm not so good at C++ coding; in another language I would have coded it myself, but when it comes to C++ and unsigned char arrays, on which strcpy, strcat etc. don't work, it's hard for me. If possible, please point me to example code on the internet that does this, or please modify alex's code to work with unsigned chars...

Thank you all...
0
 
Kent OlsenData Warehouse Architect / DBACommented:
TCP packet -- one more complication.....

I'm assuming the packaging of the data within a packet will be external to this process?  It's asking an awful lot of a routine to modify embedded data AND maintain the integrity of the packet(s).

If the data to be edited is still in the packet, a semi-static buffer is in order to keep the integrity of the current packet.  If the data has already been unpacked to a buffer, programmer preference prevails.  :)

The last question is, what happens if the target string occurs more than once in the buffer?
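
One possible answer, sketched under the assumption that every non-overlapping occurrence should be replaced, scanning left to right, and that the result goes into a fresh buffer. The names find_bytes and replace_all are hypothetical, and all pointers are assumed to be non-NULL.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: first occurrence of needle in hay, or NULL. */
static const unsigned char *find_bytes(const unsigned char *hay, size_t hay_len,
                                       const unsigned char *needle, size_t needle_len)
{
    if (needle_len == 0 || needle_len > hay_len)
        return NULL;
    for (size_t i = 0; i <= hay_len - needle_len; i++)
        if (memcmp(hay + i, needle, needle_len) == 0)
            return hay + i;
    return NULL;
}

/* Replace every non-overlapping occurrence of needle with repl.
 * Returns a newly allocated buffer (caller frees) and stores its length in
 * *out_len, or returns NULL if allocation fails. The input is not modified. */
unsigned char *replace_all(const unsigned char *src, size_t src_len,
                           const unsigned char *needle, size_t needle_len,
                           const unsigned char *repl, size_t repl_len,
                           size_t *out_len)
{
    /* Pass 1: count matches so the result size is known up front. */
    size_t count = 0;
    const unsigned char *p = src, *end = src + src_len, *hit;
    while ((hit = find_bytes(p, (size_t)(end - p), needle, needle_len)) != NULL) {
        count++;
        p = hit + needle_len;
    }

    size_t new_len = src_len - count * needle_len + count * repl_len;
    unsigned char *dst = (unsigned char *)malloc(new_len ? new_len : 1);
    if (!dst)
        return NULL;

    /* Pass 2: copy gap, replacement, gap, replacement, ... then the tail. */
    unsigned char *w = dst;
    p = src;
    while ((hit = find_bytes(p, (size_t)(end - p), needle, needle_len)) != NULL) {
        memcpy(w, p, (size_t)(hit - p));
        w += hit - p;
        memcpy(w, repl, repl_len);
        w += repl_len;
        p = hit + needle_len;
    }
    memcpy(w, p, (size_t)(end - p));
    *out_len = new_len;
    return dst;
}
```

Counting first means a single allocation of the exact size, at the cost of scanning the buffer twice.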

Kent
0
 
alexcohnCommented:
If you are working with binary data (a.k.a. a byte stream), the code I published above essentially holds. Instead of char* you should pass structures that contain an unsigned char* and a length. Instead of strcpy() use memcpy(). You cannot use strcat(); copy to explicit offsets in the destination instead. And finally, you need a find() function to replace strstr().

The function is oversimplified to demonstrate the principle; it is very far from being optimal.
#include <string.h>
 
typedef struct
{
    unsigned char* buf;
    size_t len;
} bytestream;
 
const unsigned char* find(bytestream haystack, bytestream needle)
{
    const unsigned char* candidate;
    if (needle.len > haystack.len)
        return NULL; /* also keeps the subtraction below from wrapping */
    /* <= so that a match ending exactly at the end of haystack is found */
    for (candidate = haystack.buf; candidate <= haystack.buf + haystack.len - needle.len; candidate++)
    {
        if (0 == memcmp(candidate, needle.buf, needle.len))
            return candidate;
    }
    return NULL;
}

0
 
alexcohnCommented:
Oops, I only saw your new details now. I agree with Kdo that keeping packet integrity could pose a problem. But if the words to be looked for and replaced are printable (the only true limitation is that they must not contain '\0'), you can use my original code with one simple change.
#include <stdlib.h>
#include <string.h>
 
/* sizeof - 1 drops only the literal's final '\0'; the embedded one is kept */
static const char initial[] = "string with zeros\0 and word Test inside";
size_t arraylen = sizeof(initial) - 1;
unsigned char *array = (unsigned char *)malloc(arraylen);
memcpy(array, initial, arraylen);
 
/* use the original array */
 
replace((char**)&array, &arraylen, "Test", "Test22222");
 
/* use the array after replacements */
 
free(array);
...
 
const char* memstr(const char* array, size_t arraylen, const char* to_find)
{
    const char* candidate;
    if (strlen(to_find) > arraylen)
        return NULL;
    /* <= so that a match at the very end of the array is found */
    for (candidate = array; candidate <= array + arraylen - strlen(to_find); candidate++)
    {
        if (0 == strncmp(candidate, to_find, strlen(to_find)))
            return candidate;
    }
    return NULL;
}
 
void replace(char** parray, size_t* parraylen, const char* to_find, const char* replace_with)
{
    const char* found = memstr(*parray, *parraylen, to_find);
    if (found)
    {
        /* + 1 leaves room for the '\0' that strcpy appends */
        char *tmpbuf = (char*)malloc(*parraylen - strlen(to_find) + strlen(replace_with) + 1);
        memcpy(tmpbuf, *parray, found - *parray);
        strcpy(tmpbuf + (found - *parray), replace_with);
        strcpy(tmpbuf + (found - *parray) + strlen(replace_with), found + strlen(to_find));
        free(*parray);
        *parray = tmpbuf;
        *parraylen += strlen(replace_with) - strlen(to_find);
    }
    return;
}

0
 
CSecurityAuthor Commented:
Don't worry about packet integrity, my code works properly. It's just that the code I wrote to do the replace can't re-allocate space for the extra chars, so it overwrites the next bytes, but everything else works properly...

Alex, thank you so much for your last comment, but you are casting the array to char again. Are you sure it will not cause data loss or a problem with the 0 char?
0
 
Infinity08Commented:
>> Don't worry about packet integrity, my code works properly. It's just that the code I wrote to do the replace can't re-allocate space for the extra chars, so it overwrites the next bytes, but everything else works properly...

Did you update the checksums ?
0
 
CSecurityAuthor Commented:
Yes, everything works properly, just as I said my code overwrites bytes
0
 
Infinity08Commented:
Ok, so what problem do you still have then ?
0
 
CSecurityAuthor Commented:
Alex's last code again casts the unsigned char array to char. I'm afraid it will lose the 0 char and maybe some other data... is that correct?
0
 
alexcohnCommented:
The cast to signed char in this case is absolutely legitimate and causes no side effects.
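
A small sketch of why the cast itself is harmless, assuming the usual two's complement platforms. The helper names copy_via_char and round_trip are made up for illustration.

```c
#include <string.h>

/* Copy len bytes through char pointers, as the cast in the thread does.
 * The pointer cast changes how the bytes are typed, not the bytes themselves. */
void copy_via_char(unsigned char *dst, const unsigned char *src, size_t len)
{
    char *d = (char *)dst;
    const char *s = (const char *)src;
    memcpy(d, s, len);
}

/* View one byte as char and convert it back. A value such as 157 may read
 * as a negative number through char, but converting back restores it. */
unsigned char round_trip(unsigned char value)
{
    const char *as_char = (const char *)&value;   /* same byte, viewed as char */
    return (unsigned char)*as_char;
}
```

What loses data is not the cast but any str* function applied to the buffer, since those stop at the first zero byte.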
0
 
Infinity08Commented:
But why do you need his code ? You said that your code already works ?
0
 
CSecurityAuthor Commented:
Alex, so you think with your last code I'll not lose any data, right?

Infinity, my code overwrites the following bytes: when I replace test with test2222, the 2222 is written over the next bytes, that's why I need advice
0
 
Infinity08Commented:
>> that's why I need advice

So, you SHOULD worry about data integrity, and what you said in http:#22796660 does not apply.

You don't want to overwrite, but you want to insert. Which means that not only will the checksums change, but the size of the packet will also increase, potentially to the point where it needs to be split up. And that causes a whole other series of problems, since the packet sequence id's will no longer be valid, including those of the next packets.
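
For reference, a rough sketch of recomputing a checksum after the edit, assuming the checksum in question is the 16-bit ones' complement Internet checksum described in RFC 1071. A real TCP checksum also covers a pseudo-header, which this sketch leaves out, and the function name internet_checksum is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones' complement sum over len bytes, folded to 16 bits and inverted.
 * Bytes are paired in network (big-endian) order. The 32-bit accumulator
 * is wide enough for packet-sized inputs (up to 64 KiB). */
uint16_t internet_checksum(const unsigned char *data, size_t len)
{
    uint32_t sum = 0;
    size_t i = 0;

    for (; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (i < len)                        /* odd trailing byte, padded with zero */
        sum += (uint32_t)data[i] << 8;

    while (sum >> 16)                   /* fold carries into the low 16 bits */
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;
}
```

Any inserted byte changes the sum, so the checksum has to be recomputed over the final contents, and the length fields in the headers have to be updated as well.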
0
 
CSecurityAuthor Commented:
Don't worry about data integrity, I know what I'm saying... The data will not be corrupted... How about Alex's last code?
0
 
Kent OlsenData Warehouse Architect / DBACommented:
Hi CSecurity,

There seems to be only one small hitch.  Note a couple of key lines:

        memcpy(tmpbuf, *parray, found - *parray);
        strcpy(tmpbuf + (found - *parray), replace_with);
        strcpy(tmpbuf + (found - *parray) + strlen(replace_with), found + strlen(to_find));

The first line copies up to the target string, the second line copies the replacement string, and the last line copies the data after the target string.  But since the buffer could contain binary data, the copy could be "short".

        memcpy (tmpbuf + (found - *parray) + strlen (replace_with), found + strlen (to_find), *parraylen - (found - *parray) - strlen (to_find));

I believe that the line above should replace the last line in the block of code above.

Kent

0
 
alexcohnCommented:
Kdo, thanks for this reminder... I forgot this second memcpy when I was modifying my original code.

Apart from that, binary data in the input stream will be copied correctly. Remember: if your to_find and/or replace_with items may contain zero chars, the strcpy and strlen functions cannot be used anymore. Also, the code snippet should be considered an illustration: it does lots of unnecessary work and was not carefully debugged (see the flaw that Kdo just found).
0
 
CSecurityAuthor Commented:
I modified your code because I got about 5 errors. Here is my modification:

const char* memstr(const char* array, size_t arraylen, const char* to_find)
{
    const char* candidate;
    for (candidate = array; candidate < array + arraylen - strlen(to_find); candidate++)
    {
        if (0 == strncmp(candidate, to_find, strlen(to_find)))
            return candidate;
    }
    return NULL;
}
 
void replace(char** parray, size_t* parraylen, const char* to_find, const char* replace_with)
{
    const char* found = memstr(*parray, *parraylen, to_find);
    if (found)
    {
        char *tmpbuf = (char*)malloc(*parraylen - strlen(to_find) + strlen(replace_with));
        memcpy(tmpbuf, *parray, found - *parray);
        strcpy(tmpbuf + (found - *parray), replace_with);
       
        memcpy (tmpbuf + (found - *parray) + strlen (replace_with), found + strlen (to_find), *parraylen - (found - *parray) - strlen (replace_with));
        free(*parray);
        *parray = tmpbuf;
        *parraylen += strlen(replace_with) - strlen(to_find);
    }
    return;
}



Also I do this:
size_t mylen = (size_t) tcplen;


I do this conversion because tcplen is an int.


I get a lot of runtime exceptions and access violations
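
Without the calling code this is only a guess, but three things in the version above are consistent with access violations. The length passed to the final memcpy subtracts strlen(replace_with), whereas the bytes consumed from the source are strlen(to_find); when fewer than strlen(replace_with) bytes follow the match position, the size_t result wraps around to a huge value. The strcpy also writes a terminating '\0' that the malloc'ed size does not account for. And if tcplen is ever negative, the conversion to size_t produces a very large length. A sketch that avoids the first two, with the hypothetical name replace_first:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical rewrite: to_find and replace_with are ordinary zero-terminated
 * strings, *parray is a malloc'ed binary buffer of *parraylen bytes.
 * Returns 1 if a replacement was made, 0 if not found, -1 on allocation failure. */
int replace_first(unsigned char **parray, size_t *parraylen,
                  const char *to_find, const char *replace_with)
{
    size_t find_len = strlen(to_find);
    size_t repl_len = strlen(replace_with);
    size_t len = *parraylen;

    if (find_len == 0 || find_len > len)
        return 0;

    for (size_t pos = 0; pos <= len - find_len; pos++) {
        if (memcmp(*parray + pos, to_find, find_len) != 0)
            continue;

        size_t tail_len = len - pos - find_len;    /* bytes after the match */
        size_t new_len = pos + repl_len + tail_len;
        unsigned char *tmp = (unsigned char *)malloc(new_len ? new_len : 1);
        if (!tmp)
            return -1;

        memcpy(tmp, *parray, pos);
        memcpy(tmp + pos, replace_with, repl_len); /* no '\0' is written */
        memcpy(tmp + pos + repl_len, *parray + pos + find_len, tail_len);

        free(*parray);
        *parray = tmp;
        *parraylen = new_len;
        return 1;
    }
    return 0;
}
```

Taking the buffer as unsigned char** avoids the cast at the call site, and computing tail_len once keeps the three copies consistent with the allocated size.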
0
 
CSecurityAuthor Commented:
I wrote a piece of code and solved the problem but thank you all for your time and help
0