Advanced Comparison of two varchar fields

Posted on 2011-10-06
Last Modified: 2012-05-12
I have two fields that I need to compare and DIFFERENCE isn't quite doing the job.  Here's what I have so far and what I'd like to do with it:

1. A full text search is performed to find words that match in the two fields
2. Both the found and reference fields are converted to lower case and stripped of all non-character data
3. The fields are then compared using DIFFERENCE

DIFFERENCE isn't quite doing the job, so if anyone has any suggestions for a replacement?  I was thinking about adding the CHAR value of each character in the string of both fields and ranking the results based on similarity in overall number (e.g. If the reference field's overall value was 54 and the full text search match was 52, this result would appear near the top.  If the reference's value was 54 and the full text match was 187, that would appear towards the bottom.).  

I'm not sure, however, how I would do this as I'm not vastly experienced in string functions.  I do realize that I could do a more exact match with the full text search, however, for this application I need all available matches...just using better ranking.  

Question by:naisnet
    1 Comment
    LVL 39

    Accepted Solution

    In my opinion this is because you do
    "A full text search is performed to find words that match in the two fields"
    therefore full text does
    "[...] search columns containing character-based data types for values that match the meaning and not just the exact wording of the words in the search condition. When FREETEXT is used, the full-text query engine internally performs the following actions on the freetext_string, assigns each term a weight, and then finds the matches.

    Separates the string into individual words based on word boundaries (word-breaking).
    Generates inflectional forms of the words (stemming).
    Identifies a list of expansions or replacements for the terms based on matches in the thesaurus."

    What I'm trying to say is that result sets may be different from start therefor may not be useful in a comparison for that matter.

    If the data in the two fields to compare is NON-UNICODE I would use ASCII not CHAR and do the comparison after all strigs were converted to ASCII codes - you could follow the example from link below for doing that:

    Featured Post

    IT, Stop Being Called Into Every Meeting

    Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

    Join & Write a Comment

    In this article I will describe the Detach & Attach method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
    Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
    Using examples as well as descriptions, and references to Books Online, show the different Recovery Models available in SQL Server and explain, as well as show how full, differential and transaction log backups are performed
    Viewers will learn how to use the SELECT statement in SQL and will be exposed to the many uses the SELECT statement has.

    728 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    14 Experts available now in Live!

    Get 1:1 Help Now