• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 313
  • Last Modified:

Advanced Comparison of two varchar fields

I have two fields that I need to compare and DIFFERENCE isn't quite doing the job.  Here's what I have so far and what I'd like to do with it:

1. A full text search is performed to find words that match in the two fields
2. Both the found and reference fields are converted to lower case and stripped of all non-character data
3. The fields are then compared using DIFFERENCE

DIFFERENCE isn't quite doing the job, so if anyone has any suggestions for a replacement?  I was thinking about adding the CHAR value of each character in the string of both fields and ranking the results based on similarity in overall number (e.g. If the reference field's overall value was 54 and the full text search match was 52, this result would appear near the top.  If the reference's value was 54 and the full text match was 187, that would appear towards the bottom.).  

I'm not sure, however, how I would do this as I'm not vastly experienced in string functions.  I do realize that I could do a more exact match with the full text search, however, for this application I need all available matches...just using better ranking.  

1 Solution
lcohanDatabase AnalystCommented:
In my opinion this is because you do
"A full text search is performed to find words that match in the two fields"
therefore full text does
"[...] search columns containing character-based data types for values that match the meaning and not just the exact wording of the words in the search condition. When FREETEXT is used, the full-text query engine internally performs the following actions on the freetext_string, assigns each term a weight, and then finds the matches.

Separates the string into individual words based on word boundaries (word-breaking).
Generates inflectional forms of the words (stemming).
Identifies a list of expansions or replacements for the terms based on matches in the thesaurus."

What I'm trying to say is that result sets may be different from start therefor may not be useful in a comparison for that matter.

If the data in the two fields to compare is NON-UNICODE I would use ASCII not CHAR and do the comparison after all strigs were converted to ASCII codes - you could follow the example from link below for doing that:


Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

Tackle projects and never again get stuck behind a technical roadblock.
Join Now