• C

Time efficiency and Space efficiency of a function

int any (char *s1, char *s2)
{
int i, n ;

for (n=0; s1[n] != '\0' ; n++){

for (i=0 ; s2[i] != '\0' ; i++){

if (s1[n] == s2[i]){
return n;}
}
}

return -1;
}
That is a function that I wrote in C, any(s1,s2) returns the first location in s1 where any character from s2 occurs, or -1 if no match found.

Can any expert out there help me to improve the function
1) Make it more time-efficient
2) Make it more space-efficient
thanks
Who is Participating?

Commented:
Hi oldb,
> 1) Make it more time-efficient

You could store all first occurences in a pointer array:

int firsts[256];
unsigned char c;

for(n=0;s1[n];n++){
c=(unsigned char)s1[n];
if(firsts[c] == -1)
firsts[c]=n;
}

for(i=0;s2[i];i++){
c=(unsigned char)s2[i];
if(firsts[c] != -1)
return firsts[c];

return -1;

This has O(n), not O(n^2).

> 2) Make it more space-efficient

Cheers,

Stefan
0

Data Warehouse Architect / DBACommented:
Hi oldb,

If you compile this program wuch optimation "turned on" you'll get a function without the integers *i* and *n*.  The compiler will generate code that uses pointers instead of indices.  There is a net savings of adding 1 to a pointer instead of adding 1 to an integer and then adding the integer to a base address.  The code will look something like this:

int any (char *s1, char *s2)
{
char *s1loc, *s2loc;

s1loc = s1;
s2loc = s2;

while (*s1loc){
s2loc = s2;
while (*s2loc){
if (*s1loc == *s2loc)
return s1loc - s1;
s2loc++;
}
s1loc++;
}
return -1;
}

Kent
0

Commented:
I think using s1[n] and s2[n] is not efficient.

int any (char *s1, char *s2)
{
char *_s1, *_s2;
_s1 = s1;
while (*_s1)
{
_s2 = s2;
while (*_s2)
{
if (*_s1 == *_s2)
{
return (_s1-s1);
}
_s2++;
}
_s1++;
}
return -1;
}
0

Commented:
Sorry. I am too slow...
0

Commented:
In any case, Kdo, your first line
s2loc = s2;
is unnecessary. So I did something :-)
0

Commented:
In many cases, you know that either s1 or s2 is going to be the same for many subsequent calls.
In that case you can use Stef's nice trick of pre-computing a table of first occurences for that string and use that inside the function.

Also you may know from your data that certain characters appear more frequently, it would help to put those near the start of the pattern string.

-or- the function could dynamically organize the pattern string, say every time there's a match, move that character with the first one on the list.  That will tend in many cases to self-organize the pattern array to be more  efficient.

-actually- in many cases it's best to call a library function if there is one that does this.  Quite often library functions are optimized all to heck, using special instructins that most C compilers don't generate, like "scan for matching byte".

0

Data Warehouse Architect / DBACommented:
Hi oldb,

Going back to the original question, do you have an expectation of what the data will look like?  If the first character in string 1 is likely to occur in string 2, then your solution (and the modifications provide by AlexFM and me) will usually give you the result faster than other methods.  But, if the returned value is greater than zero (not the first character in string 1), the solution offered by Stefan73 is usually superior.  (When n==1, the timings will be similar.  Also, timings may vary a bit depending on string size.)

If you're going to use Stefan73's solution as a "general purpose" routine, you'll need to call memset() to initialize the array every call.  This may slighly skew the timings when the returned value is small.

Kent
0

Commented:
Kdo,

Yes, speed very much depends on the average string sizes.

That's quite similar to SQL join techniques (like nested loops, hash join)...

0

Author Commented:
What does "unsigned char" do?

int firsts[256];
unsigned char c;

Why do you want the array to have 256 characters? Is it because there are 256 characters that computer use?

I am not sure whether I am right but is the algorithm doing something like : (if there are 256 unique character in computer),
then we mark off the character in that array, those that does not occur in s1 will stay '-1'. This is some sort of arranging, then we do the same for s2.

0

Commented:
>What does "unsigned char" do?

normal chars are signed, they range from -128 to 127. As you can imagine, that's not a good array index. There are two ways to solve this problem:

1. Use an int as array index and always add 128.
2. Use an unsigned char (0 to 255) instead.

From both performance and code clarity, #2 is preferable.

And yes, chars are 8 bits, so there are 256 possible values.

The algorithm is simple, but here are a bit more comments:

int firsts[256];      /* Lookup table. An entry != -1 is the index of that character in s1. */
unsigned char c;

/* Clear table */

/* Scan s1 for character occurences. Ignore repetitions of characters we already have in our lookup table. */
for(n=0;s1[n];n++){
c=(unsigned char)s1[n];
if(firsts[c] == -1) /* We don't have this character yet? */
firsts[c]=n; /* Yes -> store index */
}

/* For each s2 character, check if we have marked it as existing in our lookup table. If yes, return the first s1 character found in s2. */
for(i=0;s2[i];i++){
c=(unsigned char)s2[i];
if(firsts[c] != -1)    /* Does this character exist in s1? */
return firsts[c]; /* Yes -> return it */

return -1; /* Empty intersection -> -1 */

0

Author Commented:
thanks man
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.