Finding and matching sets of numbers in strings

qube09
qube09 used Ask the Experts™
on
Ideally the solution will be a join between two tables in two servers..

I have a problem that I have been unable to resolve. Basically I need to retrieve a code from a database. There are less than 500 codes in a database containing thousands of records. There are tables involved.

One,table A has a code as part of the record such as:
codeA
60671
60449
63093
63094
61198
61184
61372
66301
61906
64916
65825
63482
65869
65870
62799
63375
63922
61308
62326
65129
60405
66336
60687
63506
65547
63475

The other, table B, has records with two columns of interest. Here they are:
codeB              location
null                sitenamepl^51458
null                sitenamepl^51298
abc872390    sitenamepl^60671
abc872393    sitenamepl^60449^sitenamepl^63093^sitenamepl^63094
abc872396    sitenamepl^61198^sitenamepl^61184^sitenamepl^61372
null                sitenamepl^51198
abc873803    sitenamepl^66301^sitenamepl^61906^sitenamepl^64916^sitenamepl^65825
abc873814    sitenamepl^63482^sitenamepl^65869^sitenamepl^65870
abc878883    sitenamepl^62799^sitenamepl^63375^sitenamepl^63922
abc878888    sitenamepl^61308^sitenamepl^62326^sitenamepl^65129^sitenamepl^60405
abc882410    sitenamepl^66336
null                sitenamepl^51198
abc882414    sitenamepl^60687
abc883363    sitenamepl^63506^sitenamepl^65547^sitenamepl^63475

Code A is used to match the numbers after sitenamepl^ in column location in order to retrieve codeB. (No I did not design this.)

Code A will probably (but not neccessarily) match one of the sitenamepl^nnnnn constructs such as sitenamepl^63506. You should note that there may be more than one  sitenamepl^nnnnn entry in the column location.

CodeA char(10)
CodeB varchar(255)
Location varchar(255)

Each table is on a different sql server and to match requires a collate database default.

Finding the nnnnn after sitenamepl^nnnnn in location for the match with codeA in order to find codeB in an efficient manner is another issue since ideally the code will numerous times since the table containing codeA is updated constantly. The table containing codeB and location is a reference table and is not updated frequently.

The objective and acceptable solution is efficient sql to complete a match of codeA to the numbers following sitename^ in order to obtain codeB. Ideally this be in a join between the two tables.

Only sql code is acceptable for inclusion in a stored procedure under sql 2005 to be upgraded to sql2008.

At this time I merely use a version of fnsplit and cross apply in what amount to a two step solution by building a separate reference table for location for the match . looking for something more sophistocated.
thanks!
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Top Expert 2011

Commented:
In order to get both tables together you can use linked servers.

SELECT ..... FROM Table1 a INNER JOIN Table2 b ON b.location +'^' LIKE '%^' + a.CodeA + '^%'

Author

Commented:
wouldn't it be more efficient to avoid the use of +'^' LIKE '%^' + a.CodeA + '^%'
Top Expert 2011

Commented:
I agree that it is not efficient to join on strings , but you don't have choise.
CHARINDEX will be probably more efficient, but you still need to use similar way:

SELECT ..... FROM Table1 a INNER JOIN Table2 b ON CHARINDEX('^' + a.CodeA + '^' , b.location +'^' ) > 0
Success in ‘20 With a Profitable Pricing Strategy

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden using our free interactive tool and use it to determine the right price for your IT services. Start calculating Now!

Top Expert 2012

Commented:
wouldn't it be more efficient to avoid the use of +'^' LIKE '%^' + a.CodeA + '^%'
Absolutely.  But by stating "Only sql code is acceptable" you have just tied both our hands behind out back and now you expect us to dance a tango.

SQL Server (T-SQL) is lousy at string manipulation, but if that is all you can use than you have no choice.

Author

Commented:
It has to be sql because that is exactly what I have to use and more to the point maintain. And that is why I am using a version of fnsplit which is working adequately. I had hoped for something better.
Top Expert 2012

Commented:
And that is why I am using a version of fnsplit which is working adequately. I had hoped for
something better.

Unless you can use CLR, then using T-SQL string functions is all you can do.
Top Expert 2011
Commented:
fnSplit will use CHARINDEX internaly. When you use it you will need to create a cursor which is not efficient. At the end of the operation you most likely will have a temp table with 1 varchar column and you will need to join this table with Table A (where your codes are).
If Table B has 100,000 records and each location contains 3 codes then :

CASE 1: using fnSplit
  - CHARINDEX will be called 100,000 x 4 = 400,000 (the 4th call will be to find that there are no more occurences and the code can exit the loop)
 - there will be 100,000 select statements in the cursor
 - there will be 100,000 insert statements ( to insert the values from the table returned from fnSplit into the main temp table)
 - you will have a join between the temp table and Table A based on a varchar columns wich will be the number of the records in the temp table (300,000) x the number of the records in Table A  (500) = 300,000x500 = 150,000,000

CASE 2: CHARINDEX  used directly inside the join
100,000x500 =50,000,000 calls to CHARINDEX

It will be easy to compare both scenarios with real data. My guess is that CASE 2 will be more efficient because there is no cursor involved and the number of reads is lower.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial