Solved

Time efficiency and Space efficiency of a function

Posted on 2004-03-25
11
371 Views
Last Modified: 2010-04-15
int any (char *s1, char *s2)
{
int i, n ;

for (n=0; s1[n] != '\0' ; n++){

      for (i=0 ; s2[i] != '\0' ; i++){

            if (s1[n] == s2[i]){
                  return n;}
            }
      }

return -1;
}
That is a function that I wrote in C, any(s1,s2) returns the first location in s1 where any character from s2 occurs, or -1 if no match found.

Can any expert out there help me to improve the function
1) Make it more time-efficient
2) Make it more space-efficient
thanks
0
Comment
Question by:oldb
  • 3
  • 3
  • 2
  • +2
11 Comments
 
LVL 45

Expert Comment

by:Kdo
ID: 10676727
Hi oldb,

If you compile this program wuch optimation "turned on" you'll get a function without the integers *i* and *n*.  The compiler will generate code that uses pointers instead of indices.  There is a net savings of adding 1 to a pointer instead of adding 1 to an integer and then adding the integer to a base address.  The code will look something like this:

int any (char *s1, char *s2)
{
  char *s1loc, *s2loc;

  s1loc = s1;
  s2loc = s2;

  while (*s1loc){
     s2loc = s2;
     while (*s2loc){
         if (*s1loc == *s2loc)
             return s1loc - s1;
         s2loc++;
     }
     s1loc++;
  }
  return -1;
}




Kent
0
 
LVL 48

Expert Comment

by:AlexFM
ID: 10676765
I think using s1[n] and s2[n] is not efficient.

int any (char *s1, char *s2)
{
  char *_s1, *_s2;
  _s1 = s1;
  while (*_s1)
  {
    _s2 = s2;
    while (*_s2)
    {
          if (*_s1 == *_s2)
          {
               return (_s1-s1);
           }
           _s2++;
     }
     _s1++;
  }
  return -1;
}
0
 
LVL 48

Expert Comment

by:AlexFM
ID: 10676772
Sorry. I am too slow...
0
 
LVL 48

Expert Comment

by:AlexFM
ID: 10676786
In any case, Kdo, your first line
s2loc = s2;
is unnecessary. So I did something :-)
0
 
LVL 12

Accepted Solution

by:
stefan73 earned 125 total points
ID: 10676793
Hi oldb,
> 1) Make it more time-efficient

You could store all first occurences in a pointer array:

int firsts[256];
unsigned char c;

for(i=0;i<256;i++) firsts[i]=-1 ; /* -1 == not found */

for(n=0;s1[n];n++){
    c=(unsigned char)s1[n];
    if(firsts[c] == -1)
        firsts[c]=n;
}

for(i=0;s2[i];i++){
    c=(unsigned char)s2[i];
    if(firsts[c] != -1)
        return firsts[c];

return -1;

This has O(n), not O(n^2).

> 2) Make it more space-efficient
It's already space-efficient.

Cheers,

Stefan
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 22

Expert Comment

by:grg99
ID: 10677387
In many cases, you know that either s1 or s2 is going to be the same for many subsequent calls.
In that case you can use Stef's nice trick of pre-computing a table of first occurences for that string and use that inside the function.

Also you may know from your data that certain characters appear more frequently, it would help to put those near the start of the pattern string.

-or- the function could dynamically organize the pattern string, say every time there's a match, move that character with the first one on the list.  That will tend in many cases to self-organize the pattern array to be more  efficient.

-actually- in many cases it's best to call a library function if there is one that does this.  Quite often library functions are optimized all to heck, using special instructins that most C compilers don't generate, like "scan for matching byte".

0
 
LVL 45

Expert Comment

by:Kdo
ID: 10678564
Hi oldb,

Going back to the original question, do you have an expectation of what the data will look like?  If the first character in string 1 is likely to occur in string 2, then your solution (and the modifications provide by AlexFM and me) will usually give you the result faster than other methods.  But, if the returned value is greater than zero (not the first character in string 1), the solution offered by Stefan73 is usually superior.  (When n==1, the timings will be similar.  Also, timings may vary a bit depending on string size.)

If you're going to use Stefan73's solution as a "general purpose" routine, you'll need to call memset() to initialize the array every call.  This may slighly skew the timings when the returned value is small.


 
Kent
0
 
LVL 12

Expert Comment

by:stefan73
ID: 10678610
Kdo,

Yes, speed very much depends on the average string sizes.

That's quite similar to SQL join techniques (like nested loops, hash join)...


0
 

Author Comment

by:oldb
ID: 10696591
What does "unsigned char" do?

int firsts[256];
unsigned char c;

Why do you want the array to have 256 characters? Is it because there are 256 characters that computer use?

I am not sure whether I am right but is the algorithm doing something like : (if there are 256 unique character in computer),
then we mark off the character in that array, those that does not occur in s1 will stay '-1'. This is some sort of arranging, then we do the same for s2.

thanks for reading
0
 
LVL 12

Expert Comment

by:stefan73
ID: 10714117
>What does "unsigned char" do?

normal chars are signed, they range from -128 to 127. As you can imagine, that's not a good array index. There are two ways to solve this problem:

1. Use an int as array index and always add 128.
2. Use an unsigned char (0 to 255) instead.

From both performance and code clarity, #2 is preferable.

And yes, chars are 8 bits, so there are 256 possible values.

The algorithm is simple, but here are a bit more comments:

int firsts[256];      /* Lookup table. An entry != -1 is the index of that character in s1. */
unsigned char c;

/* Clear table */
for(i=0;i<256;i++) firsts[i]=-1 ; /* -1 == not found */

/* Scan s1 for character occurences. Ignore repetitions of characters we already have in our lookup table. */
for(n=0;s1[n];n++){
    c=(unsigned char)s1[n];
    if(firsts[c] == -1) /* We don't have this character yet? */
        firsts[c]=n; /* Yes -> store index */
}

/* For each s2 character, check if we have marked it as existing in our lookup table. If yes, return the first s1 character found in s2. */
for(i=0;s2[i];i++){
    c=(unsigned char)s2[i];
    if(firsts[c] != -1)    /* Does this character exist in s1? */
        return firsts[c]; /* Yes -> return it */

return -1; /* Empty intersection -> -1 */


0
 

Author Comment

by:oldb
ID: 10731186
thanks man
0

Featured Post

Why You Should Analyze Threat Actor TTPs

After years of analyzing threat actor behavior, it’s become clear that at any given time there are specific tactics, techniques, and procedures (TTPs) that are particularly prevalent. By analyzing and understanding these TTPs, you can dramatically enhance your security program.

Join & Write a Comment

Preface I don't like visual development tools that are supposed to write a program for me. Even if it is Xcode and I can use Interface Builder. Yes, it is a perfect tool and has helped me a lot, mainly, in the beginning, when my programs were small…
This tutorial is posted by Aaron Wojnowski, administrator at SDKExpert.net.  To view more iPhone tutorials, visit www.sdkexpert.net. This is a very simple tutorial on finding the user's current location easily. In this tutorial, you will learn ho…
The goal of this video is to provide viewers with basic examples to understand and use structures in the C programming language.
Video by: Grant
The goal of this video is to provide viewers with basic examples to understand and use nested-loops in the C programming language.

757 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now