Segfault 11 on long string, PRIOR to accessing string, only when string > 14

ANSI c on OSX 10.13.6 <br>
Apple LLVM version 9.1.0 (clang-902.0.39.2)
Target: x86_64-apple-darwin17.7.0
Thread model: posix

I'm learning `c`

This is a function that manually (character-by-character) adds two character strings representing large numbers (that exceed the unsigned long long or double size).

It functions fine with any two strings 14 or less characters long, but segmentation fault 11 with any strings greater than 14 chars.

Changing the string's memory allocation method seems to have no effect (I.e. from
char[15] addend1; // not a ptr

Open in new window

to
char *addend1 = (char *) malloc(sizeof(char) * (16) ); // pointer

Open in new window


One things that's curious, is that it seems to segfault on the ...
for (int j = maxlength - 1 ; j >= 0; j--)

Open in new window

... prior to accessing either of `addend1` or `addend2`, but I'm not able to find an error there or change it to prevent the segfault.  

Am I misreading where the error arises, or could it be related to the for loop?

MAIN.c

	#include <stdio.h>
	#include <stdlib.h>
	#include "../../c-library/include/addViaStrings.h"
	
	int main(void) {
		//	s[0] = 72; s[1] = 101; s[2] = 108; s[3] = 108; s[4] = 111; s[5] = 32; s[6] = 87; s[7] = 111; s[8] = 114; s[9] = 108; s[10] = 100; s[11] = 0;
	
		// WORKS
		// char s1[] = "14073748835532";
		// char s2[] = "14073748835532";
	
		// FAILS
		char s1[] = "140737488355328";
		char s2[] = "140737488355328";
	
		char *sum = addNumericStrings(&s1, &s2);
		printf("main.sum = %s\n", sum);
	}

Open in new window


addViaStrings.h

	char* addNumericStrings(char *s1, char *s2){
		// Find the length of the greater of the two
		int maxlength = findMaxLength(s1, s2);
		printf("maxlength = %d\n", maxlength); //333
	
		///////////////////////////////////////////////
		// Using malloc instead of char[maxlength] seems to have NO EFFECT on the issue
		// char addend1[maxlength]; // not a pointer
		char *addend1 = (char *) malloc(sizeof(char) * (maxlength + 1) );
		// char addend2[maxlength]; // not a pointer
		char *addend2 = (char *) malloc(sizeof(char) * (maxlength + 1) );
	
		// Allocate sum pointer
		printf("char *sum = (char *) malloc(sizeof(char) * (maxlength + 1) ) ... "); //333
		char *sum = (char *) malloc(sizeof(char) * (maxlength + 1) );
		printf("DONE\n"); //333
	
		// General use vars
		int a1, a2, total;
		int carry = 0;
	
		// Prepare the strings for manual addition. Pad the left with char 0s
		leftPad(addend1, s1, maxlength);
		leftPad(addend2, s2, maxlength);
	
		// Buffer sum with zeros
		sum[maxlength + 1] = 0; // end flag
		printf("for (int i = 0; i < (maxlength); i++) { sum[i] = '0'; } ... "); //333
		for (int i = 0; i < (maxlength); i++) { sum[i] = '0'; } // Fill w/ 0s
		printf("DONE\n"); //333
	
		// Run the manual addition
		// Start adding individual ints from end (right side)
		printf("Start adding individual ints from end (right side) ...\n"); //333
		for (int j = maxlength - 1 ; j >= 0; j--) {
			///////////////////////////////////////////
			// The segfault seems to happen BEFORE accessing addend1 or addend2
			printf("%d ..."); // 333 This DOES NOT print
			///////////////////////////////////////////
			a1 = addend1[j] - '0'; // Convert to int
			a2 = addend2[j] - '0'; // Convert to int
			total = (a1 + a2 + carry);
			carry = 0;
			if ( total >= 10){
				carry += 1;
				total -= 10;
			}
			sum[j + 1] = '0'+total; // convert to ascii value for numbers (adding 48)
		}
		sum[0] = '0' + carry; // add last carry to start of num always, even if 0
		// Before returning, truncate leading zeros
		char *returnsum = (char *) malloc(sizeof(char) * (strlen(sum) + 1) );
		int sum_i = 0;
		int returnsm_i = 0;
	//    bool truncate = true; // Find out why this wont compile
		int truncate = 1; // true
		while (1){
			// if order is important here
			if (sum[sum_i] == '\0') { break; } // we're done
			if (sum[sum_i] == '0' && truncate == 1) { sum_i += 1; continue; } // 1 is true
			// if a num, Stop truncating 0s but DO continue adding numbers
			if (sum[sum_i] != '0') { truncate = 0; } // 0 is false
			returnsum[returnsm_i] = sum[sum_i];
			returnsm_i += 1;
			sum_i += 1;
		}
		return returnsum;
	}
[\code]

[i][b]Successful run (less than 15 chars)[/b][/i]
[code]
    maxlength = 14
    char *sum = (char *) malloc(sizeof(char) * (maxlength + 1) ) ... DONE
    for (int i = 0; i < (maxlength); i++) { sum[i] = '0'; } ... DONE
    Start adding individual ints from end (right side) ...
    13 ...12 ...11 ...10 ...9 ...8 ...7 ...6 ...5 ...4 ...3 ...2 ...1 ...0 ...main.sum = 28147497671064

Open in new window


UNSuccessful run (15 chars)
    maxlength = 15
    char *sum = (char *) malloc(sizeof(char) * (maxlength + 1) ) ... DONE
    for (int i = 0; i < (maxlength); i++) { sum[i] = '0'; } ... DONE
    Start adding individual ints from end (right side) ...

Open in new window

LVL 3
Mike R.Asked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

phoffric\Commented:
Skimmed snippets on my phone. Didn't see where findMaxLength is defined. Thought it unusual that you have printf with % and yet only one argument.

When dealing with pointer,
memory, and crash issues, don't rely on printf as they can alter the results.  Instead,  use a debugger and set a breakpoint on the line that crashes. Examine all relevant data at that point.

May as well work with the simpler case with no mallocs.
Write a simple one file program that crashes. Run it in the debugger. If still stuck, tell us what you see in the debugger and post your single file program for the simpler case. I'll take a look by tommorrow.

What are you using:
 OS ?
debugger ?
 version of C ?
 compiler?

Since it is critical that findMaxLength return the correct value, maybe that is the source of your problem. Be sure to include it in your next post.

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Mike R.Author Commented:
Yes. It was (technically) the result of maxlength.

Actually, formally, the allocation of of the memory for sum was
char *sum = (char *) malloc(sizeof(char) * (maxlength + 1) ) 

Open in new window

and it should have been
char *sum = (char *) malloc(sizeof(char) * (maxlength + 2) )

Open in new window

phoffric\Commented:
Normally we don't do this:
(maxlength + 2)

Instead we set maxlength to the correct string length and add 1.
Big Business Goals? Which KPIs Will Help You

The most successful MSPs rely on metrics – known as key performance indicators (KPIs) – for making informed decisions that help their businesses thrive, rather than just survive. This eBook provides an overview of the most important KPIs used by top MSPs.

sarabandeCommented:
actually all the code is not good. why are you using dynamic strings for numbers and not simple strings that are big enough to take every string number?

#include <stdio.h>
#include <string.h>

int addNumericStrings(const char * s1, const char * s2, char sum[], size_t siz)
{
    unsigned char digit  = 0;
    unsigned char carry = 0;
    int len1 = strlen(s1);
    int len2 = strlen(s2);
    int i = 0, j = 0;
    int k = siz-1;  // we need 1 char for the terminating zero char

    // init sum string
    memset(sum, 0, siz);   
    for (i = len1-1, j = len2-1; i >= 0 || j >= 0; --i, --j)
    {
        unsigned char a1 = 0;
        unsigned char a2 = 0;
        if (k <= 0) 
            return -3;   // siz too small                   
        if (i >= 0)
        {
            a1 = (unsigned char)(s1[i] - '0');
            if (a1 > 9)
                return -1;
        }
        if (j >= 0)
        {
            a2 = (unsigned char)(s2[j] - '0');
            if (a2> 9)
                return -2;
        }
        digit = (a1+a2+carry);
        carry =  (digit >= 10)? 1 : 0;
        sum[--k] = (digit %10) + '0';
    }
    if (carry)
    {
        if (k <= 0)
            return -3;                    // siz too small
        sum[--k] = '1'; 
    }
    if (k > 0)
    {
        memmove(&sum[0], &sum[k], siz-k);
    }
    return 0;   // means success in c
}
//  98777977707
// 197279271662
// 296057249369


void main()
{
    char s1[] =  "98777977707";
    char s2[] = "197279271662";
//                296057249369
    char sum[32] = { '\0' };
    int  ret = addNumericStrings(s1, s2, sum, sizeof(sum));
    int t;
    if (ret == 0)
    {
         printf("%s + %s = %s\n", s1, s2, sum);
    }
    else 
    {        
          printf("error return = %d\n", ret);

    }       
     scanf("%d", &t);
}

Open in new window


a few remarks:

if using dynamic strings created by malloc, you never should create the string in the function as there is the rule that dynamic pointers should be freed in the same function where they have been created. so it is much better to provide a sufficient large buffer by the caller and handle the case where the buffer is too short.

you always should initialize buffers and variables. for fixed-sized strings this is very easy:

char sum[32] = { '\0' };

Open in new window


this statement will set all 32 characters to binary zero character. so if you fill from left up to index = sizeof(sum) - 1, the resulting string always is null-terminated.


Sara
Mike R.Author Commented:
@sarabande
Very nice. Thank you for taking the time to write that better code.
phoffric\Commented:
>> there is the rule that dynamic pointers should be freed in the same function where they have been created

This is a good rule to follow if you think of the function as being the owner of the allocated memory. But as with many rules there are exceptions. And on many projects I've seen so many exceptions, that I would call this rule more as a guide than a rule.

For example, if you have a C factory, then the function that allocates the memory passes the ownership to the caller and that function will likely take responsibility for freeing the memory.

There are other schemes where huge data chunks are allocated by a function that takes in null pointers and allocates the memory based on input arguments. Again, this allocator does not own the memory and will not free it. It may be responsible for touching each page before giving ownership to a higher level.

When documenting your design about memory allocations, you should note the ownership of the memory.
sarabandeCommented:
if you have a C factory, then the function that allocates the memory passes the ownership to the caller

Open in new window


yes, same is with malloc which creates a pointer or any other function where the main functionality is to provide a new pointer. it is easy to determine whether a function that returns a new pointer also passes ownership or not.

the function here was to calculate the sum of two string numbers and returns it as a string. if the function returns a char pointer which must be freed by the caller, the caller must know this what without doubt is bad design.

schemes where huge data chunks need to allocated based on parameters may have one function which does the allocation, but this function is only a helper that replaces a malloc call in C or a new operation in C++.  you easily could make this function only return the required size and do the allocation by the owner. or define the pointers as shared or global data what also clarifies the question of ownership.

Sara
sarabandeCommented:
ANSI c on OSX 10.13.6

i would assume you were not using ansi c compiler but a c++ compiler.

in ansi c all variables must be defined at top of a {} block without any normal c statements between. but in your original post there are printf statements above new char [] variables.

also a statement like

for (int j = maxlength - 1 ; j >= 0; j--) {

Open in new window


normally doesn't compile with an ansi c compiler.

in c++ you could use std::string class for strings and and not char pointers (or char buffers).

Sara
phoffric\Commented:
>> you easily could make this function only return the required size and do the allocation by the owner
Yes you could. As a designer you can do whatever you want, unless there are unecessary "rules" that force you to do it your way. Now all the designers on the team are forced to have the caller do the allocations and free the memory in that same function unless they get a waiver from the "rule".
sarabandeCommented:
as told it is a bad design if you return the result of  a function as a pointer that must be freed by the caller. i am a programmer since 40 years but this "rule" has nothing to do with paternalism but makes your life happier.  even for a factory or a private memory heap, you should provide a mechanism which disposes the caller from freeing memory. i don't know why you insist that this is unnecessary it is vital instead. give me any code where you ignore that and i will turn it to better code.

Sara
phoffric\Commented:
At this point I can't believe we are talking about the same general software engineering principles so we can stop.
Mike R.Author Commented:
@sarabande
"as told it is a bad design if you return the result of  a function as a pointer that must be freed by the caller."

I come from a Python background (so I'm used to creating an object once, towards the beginning - in Main() if you will - and passing it around as need be) ...

BUT, I'm learning C for embedded uses (not C++ for large software...but C for very small memory environments that need to be fast).

So, based on those two thoughts - What if you want this data value to be reusable by other functions? Should old data allocations be freed, and then recreated again, over and over?
sarabandeCommented:
What if you want this data value to be reusable by other functions?

there are two ways in C:

first: the top function, i. e. main function defines a pointer to data and passes the pointer as (normally first) argument to all functions who need access to it.

void main()
{
      MyData * pData = (MyData *)malloc(sizeof(MyData));
      Prepare(pData);    // might allocate pointer members of MyData
      Run(pData);
      Clean(pData);        // will free dynamic data used in MyData
      free pData;            // frees dynamic data allocated by malloc
}

Open in new window


or alternatively (and better if there is no resizing necessary)

void main()
{
      MyData data = { 0 };
      Prepare(&data);    // might allocate pointer members of MyData
      Run(&data);
      Clean(&data);        // will free dynamic data used in MyData
}

Open in new window



second: define global data
// main.c
#include "mydata.h"
...
// defines a global instance of struct MyData and initializes it to 0
MyData data = { 0 };
void main()
{
      Prepare();    // might allocate pointer members of MyData
      Run();
      Clean();        // will free dynamic data used in MyData
}

Open in new window



// prepare.c

#include "mydata.h>

// declare as extern that will make the linker to give access to the data defined in main.c
extern MyData data;

void Prepare()
{
       // prepare now has access to the one and only MyData instance defined in main.c
       if (data.some_other_pointer == 0)
       {
              data.some_other_pointer = (SomeOtherStruct *)malloc(sizeof(SomeOtherStruct));
       }
       ....
}

Open in new window


in both cases the MyData was defined only once. i always would prefer the first method with local data defined in main. but most projects have a combination of both when using shared data.

for very small memory environments that need to be fast
for speed it doesn't matter if you create memory in the main function or in any sub function.

it matters however, if you would do many resizes or copies where this is not really necessaray.  for example if you were parsing and convert a huge string and would realloc the new string for each character of the input string. that is very slow. much better is it to either do the changes inline in the input string not using a second buffer at all. or you calculate the new size once, create the new buffer and then do the conversion.

Sara
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
C

From novice to tech pro — start learning today.