Solved

Do UDFs slow performance?

Posted on 2011-03-02
9
550 Views
Last Modified: 2012-05-11
Do user defined functions slow performance?  I don't have a lot of experience with SQL Server (Fox & Access background).  Because the text functions are so limited, I wanted to create some user defined functions for a process in which I'm comparing a lot of concatenated text.  (FYI - I'm creating PADR() & CHRTRAN().)  Some of the tables are pretty big, so speed is an issue.  But I was told that using UDFs would slow performance.  Is this correct?
Thanks!
0
Comment
Question by:sarahellis
  • 4
  • 3
  • 2
9 Comments
 
LVL 50

Accepted Solution

by:
Lowfatspread earned 167 total points
ID: 35018600
YES ... your functions will add to the processing requirements of the statements....

will this cause a problem ...

it depends how efficiently you write the functions...

it is true that string maniplulation can be amongst the most costly (processing intensive) operations that your processor
is asked to do...

consider the STUFF Function for PADR....


can you give sime examples of what it is your typically want to see equivalents for?
0
 

Author Comment

by:sarahellis
ID: 35018743
We're trying to determine if a new list of people are already members or on a list telling us not to contact.  So, we're going to compare e-mail addresses, of course but lots of older records don't contain them.  Our member list is enormous.  So, we also are looking to combine strings of a number of different fields, like left(firstname,5) + left(lastname,5) + right(phone,4).  We haven't figured out the best way to do this yet.  Is there a standard?  This can't be a unique situation.
Thanks!
0
 
LVL 22

Assisted Solution

by:8080_Diver
8080_Diver earned 333 total points
ID: 35018939
sarahellis,

Do you really need to concatenate the bits of the strings or might you be able to simply use them in a WHERE clause?  Unless you are adding a column to your table in which you plan to store this magic string and to put an index on it, you will probably wind up doing a table scan on the data being compared to . . . which is not going to be a Good Thing.

Whether you use your UDF on the data from both tables, use a LIKE constraint in your WHERE clause, or simply use the LEFT or RIGHT function on the columns in both tables,you will probably wind up doing a table scan.

By the way, what is your acceptable error on the conservative side (i.e. getting a false positive on a match and, therefore, excluding a valid email address)?  What about for a false negative (i.e. not excluding an entry that you should exclude, for instance in the case of a changed phone number)?  Even using the last 4 of a telephone number, I  would not bet on the uniqueness of the first 5 of the first name and the first 5 of the last name, especially considering the possibility of people moving and geting a new phone number.  
0
 
LVL 50

Expert Comment

by:Lowfatspread
ID: 35019010
rather than wirting sql using the functions all the time it may be better to either have some tables populated with these partial
field strings and search against them, or to have some computed columns on your main tables and persist and index them...

using functions also limits the usage that the dbms can make of anu indexes you have on the columns in question....

but left(firstnrakle,4)+left(lastname,5)+right(phone,4)    is a standard way to construct as "String"

? do you need to rtrim(ltrim(yourcolumn))   before applying the left/right or have you already dealt with that?

do you also want to consider usage of soundex and difference to detect duplicates?



good luck


0
Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 
LVL 22

Expert Comment

by:8080_Diver
ID: 35019130
Lowfatspread,

The Soundex suggestion could be just the ticket, although I would add the recommendation that a column be added to each table (newly imported and existing) with the Soundex value in it for precisely the index reasons you mentioned.
0
 

Author Comment

by:sarahellis
ID: 35019332
@Lowfaspread - Thank you.  I won't use the UDF.

re: 'but left(firstnrakle,4)+left(lastname,5)+right(phone,4)    is a standard way to construct as "String"'
I meant is there a standard way that companies generally do these compares.

re:'? do you need to rtrim(ltrim(yourcolumn))   before applying the left/right or have you already dealt with that?'
Yes, I plan to trim - just using a simplified example to clarify my question.

re: 'do you also want to consider usage of soundex and difference to detect duplicates?'
I suggested those to my client but were rejected.  The changes in last names would be from Maiden vs Married names.  First name would be Mike vs Michael.  Those aren't similar.  The one's we'd catch would be Greene vs Green, but people rarely mis spell their own names.  So, we had to throw them out.  - But I thought they were good ideas.

I think I'm just going to go with querying multiple views.  Basically one joined on email, one joined on first = first and last = last and  
(left(middle,1)=left(middle,1) OR right(phone,4)=right(phone,4)).

@8080 Diver
Thanks for your input.  We will have plenty of errors - we just have to do the best that we can.  Essentially, we'll invite a few people that are already members, and a few people that should be invited will get left out.

Will using left() & right() really slow it down?
0
 
LVL 22

Assisted Solution

by:8080_Diver
8080_Diver earned 333 total points
ID: 35019724
Will using left() & right() really slow it down?
Probably not as bad as using the UDF, however, using any function as a part of the constraints in the WHERE clause or the ON clause will void the use of any indexes.  

That is why the suggestion was made to add the column/columns with the results of the LEFT/RIGHT actions.  Running an UPDATE to populate the new column(s) would be quick.  Then you could create an index on the new column(s) and use those in the comparisons.

The changes in last names would be from Maiden vs Married names.
You're not going to catch those either way.

First name would be Mike vs Michael.
Or those.

The one's we'd catch would be Greene vs Green
That would be False Positive match based on the use of the first 5 of the Last Name.

IMHO, you might as well go with the comparison of the full first and last name as well as the full phone number.  It makes the WHERE/ON clause cleaner and let's your query use indexes based on those columns.  It still won't catch the Mike vs Michael issue . . . but you're going to mis that anyway . . . and, as you pointed out, people normally know how to spell their own name.  (They also tend to spell it consistently. ;-)

You can use a single query based upon a subquery that filters based uopon a query that LEFT OUTER JOINs the NEW and OLD data based upon the email address, selecting only those where the OLD.IdentityColumn (or some appropriate column that would normlaly never be NULL) is NULL and, then, selecting from that result LEFT OUTER JOINed to the OLD table based upon the names and telephone number where the same column from the OLD table used in the subselect is NULL.
0
 
LVL 22

Expert Comment

by:8080_Diver
ID: 35019762
Oops, forgot to provide example bit of code. ;-)
SELECT N.{list of columns}
FROM NewTable N
LEFT OUTER JOIN
(
 SELECT N.{list of columns}
 FROM NewTable N
 LEFT OUTER JOIN OldTable O
 ON N.EmailAddr = O.EmailAddr
 WHERE O.Identity IS NULL
) O
ON N.FirstName = O.FirstName AND
   N.LastName = O.LastName AND
   N.TelephoneNumber = O.TelephoneNumber 
WHERE O.Identity IS NULL
;

Open in new window

0
 

Author Comment

by:sarahellis
ID: 35020076
Thank so much for all suggestions.  Can't alter the tables - long story.  In the end, my client insisted I do it his way - as above:  Basically one joined on email, one joined on first = first and last = last and  
(left(middle,1)=left(middle,1) OR right(phone,4)=right(phone,4)).
0

Featured Post

Threat Intelligence Starter Resources

Integrating threat intelligence can be challenging, and not all companies are ready. These resources can help you build awareness and prepare for defense.

Join & Write a Comment

If you have heard of RFC822 date formats, they can be quite a challenge in SQL Server. RFC822 is an Internet standard format for email message headers, including all dates within those headers. The RFC822 protocols are available in detail at:   ht…
Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
When you create an app prototype with Adobe XD, you can insert system screens -- sharing or Control Center, for example -- with just a few clicks. This video shows you how. You can take the full course on Experts Exchange at http://bit.ly/XDcourse.
This video explains how to create simple products associated to Magento configurable product and offers fast way of their generation with Store Manager for Magento tool.

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now