• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 336
  • Last Modified:

Update values based on multy line compare without using a loop

Hi,
I need to compare two tables, one is my reference table with names of "places" broken into words and the other one is address also broken into words.
in the reference table I keep a placeCode which I would like to update into the addressesTable.
I've been playing around with it for some time and I can't get it right...
there are a few problems :
when breaking a place name to words we get values that appear in more then one place entry like the word "New" which is part of New York as well as New Hampshire.
similarly there are values which are a part of another value such as "Hempshire"  and "New Hempshire". (can it be done using a few updates from the longer values to the shorter ones ?)
I would like to update only the rows that actually match and not the whole Address (ID)

please any ideas will be apreciated.

attached is a sample code that builds three tables and populates them, one table is the refrence table (dictionary) and the other two are the input data, one just as I get it and the other as I need it after the update.
I would like to avoid update in a loop...

/* table creation */
CREATE TABLE placesdictionary (
  PlaceID int,
  PlaceWordNum int,
  PlaceWord varchar(50) ,
  PlaceCode int
) ;

CREATE TABLE testdata_after (
  ID int,
  WordNum int,
  Word varchar(50) ,
  PlaceCode int
) ;

CREATE TABLE testdata_b4 (
  ID int,
  WordNum int,
  Word varchar(50) ,
  PlaceCode int
) ;

/* populate dictionary */
INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (1, 1, 'New', 7878)

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (1, 2, 'York', 7878)

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (2, 1, 'New', 8888)

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (2, 2, 'Hampshire', 8888)

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (3, 1, 'South', 9999)

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (3, 2, 'Carolina', 9999)

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (4, 1, 'South', 2323)

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (4, 2, 'Dakota', 2323)

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (5, 1, 'Camp', 8765);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (5, 2, 'Verde', 8765);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (6, 1, 'Camp', 5656);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (6, 2, 'Davis', 5656);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (7, 1, 'Jericho', 4444);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (8, 1, 'Hempshire', 6666);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (9, 1, 'Camp', 8765);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (9, 2, 'Verd', 8765);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (10, 1, 'Yericho', 4444);

/* populate b4 table */
INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (1, 1, 56);

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (1, 2, 'Arlington');

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (1, 3, 'rd.');

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (1, 4, 'New');

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (1, 5, 'York');

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (2, 1, 44);

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (2, 2, 'Main');

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (2, 3, 'st.');

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (2, 4, 'New');

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (2, 5, 'Hempshire');

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (3, 1, 5);

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (3, 2, 'Main');

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (3, 3, 'st.');

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (3, 4, 'Hempshire');

/* populate after table */
INSERT INTO tests.testdata_after
(ID, WordNum, Word)
VALUES (1, 1, 56);

INSERT INTO tests.testdata_after
(ID, WordNum, Word)
VALUES (1, 2, 'Arlington');

INSERT INTO tests.testdata_after
(ID, WordNum, Word)
VALUES (1, 3, 'rd.');

INSERT INTO tests.testdata_after
(ID, WordNum, Word, PlaceCode)
VALUES (1, 4, 'New', 7878);

INSERT INTO tests.testdata_after
(ID, WordNum, Word, PlaceCode)
VALUES (1, 5, 'York', 7878);

INSERT INTO tests.testdata_after
(ID, WordNum, Word)
VALUES (2, 1, 44);

INSERT INTO tests.testdata_after
(ID, WordNum, Word)
VALUES (2, 2, 'Main');

INSERT INTO tests.testdata_after
(ID, WordNum, Word)
VALUES (2, 3, 'st.');

INSERT INTO tests.testdata_after
(ID, WordNum, Word, PlaceCode)
VALUES (2, 4, 'New', 8888);

INSERT INTO tests.testdata_after
(ID, WordNum, Word, PlaceCode)
VALUES (2, 5, 'Hempshire', 8888);

INSERT INTO tests.testdata_after
(ID, WordNum, Word)
VALUES (3, 1, 5);

INSERT INTO tests.testdata_after
(ID, WordNum, Word)
VALUES (3, 2, 'Main');

INSERT INTO tests.testdata_after
(ID, WordNum, Word)
VALUES (3, 3, 'st.');

INSERT INTO tests.testdata_after
(ID, WordNum, Word, PlaceCode)
VALUES (3, 4, 'Hempshire', 6666);
0
syoffe
Asked:
syoffe
  • 6
  • 5
  • 2
  • +1
1 Solution
 
mkobrinCommented:
This should do it for you
select b4.ID, b4.WordNum, b4.word, d.placecode
from testdata_b4 b4
left outer join placesdictionary d on d.PlaceWord = b4.word and d.PlaceID  = b4.ID
order by b4.ID, b4.WordNum

Open in new window

0
 
syoffeAuthor Commented:
thnx for the suggestion,
but no this does not help me as what I need is a little more complicated.
see the testdata_after and compare it to the result of the select you sent... similar but not same :)
0
 
syoffeAuthor Commented:
I must not take Place codes for half a place, like I cannot take the code for "New Hempshire" 8888 for the word New without it being followed by the word Hempshire
0
Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

 
syoffeAuthor Commented:
Sorry I made some spelling mistakes in my testdata here it is corrected :

drop table placesdictionary;
drop table testdata_after;
drop table testdata_b4;
/* table creation */
CREATE TABLE placesdictionary (
  PlaceID int,
  PlaceWordNum int,
  PlaceWord varchar(50) ,
  PlaceCode int
) ;

CREATE TABLE testdata_after (
  ID int,
  WordNum int,
  Word varchar(50) ,
  PlaceCode int
) ;

CREATE TABLE testdata_b4 (
  ID int,
  WordNum int,
  Word varchar(50) ,
  PlaceCode int
) ;

/* populate dictionary */
INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (1, 1, 'New', 7878);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (1, 2, 'York', 7878);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (2, 1, 'New', 8888);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (2, 2, 'Hampshire', 8888);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (3, 1, 'South', 9999);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (3, 2, 'Carolina', 9999);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (4, 1, 'South', 2323);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (4, 2, 'Dakota', 2323);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (5, 1, 'Camp', 8765);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (5, 2, 'Verde', 8765);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (6, 1, 'Camp', 5656);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (6, 2, 'Davis', 5656);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (7, 1, 'Jericho', 4444);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (8, 1, 'Hempshire', 6666);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (9, 1, 'Camp', 8765);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (9, 2, 'Verd', 8765);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (10, 1, 'Yericho', 4444);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (11, 1, 'New', 8888);

INSERT INTO PlacesDictionary
(PlaceID, PlaceWordNum, PlaceWord, PlaceCode)
VALUES (11, 2, 'Hempshire', 8888);

/* populate b4 table */
INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (1, 1, 56);

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (1, 2, 'Arlington');

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (1, 3, 'rd.');

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (1, 4, 'New');

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (1, 5, 'York');

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (2, 1, 44);

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (2, 2, 'Main');

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (2, 3, 'st.');

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (2, 4, 'New');

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (2, 5, 'Hempshire');

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (3, 1, 5);

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (3, 2, 'Main');

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (3, 3, 'st.');

INSERT INTO tests.testdata_b4
(ID, WordNum, Word)
VALUES (3, 4, 'Hempshire');

/* populate after table */
INSERT INTO tests.testdata_after
(ID, WordNum, Word)
VALUES (1, 1, 56);

INSERT INTO tests.testdata_after
(ID, WordNum, Word)
VALUES (1, 2, 'Arlington');

INSERT INTO tests.testdata_after
(ID, WordNum, Word)
VALUES (1, 3, 'rd.');

INSERT INTO tests.testdata_after
(ID, WordNum, Word, PlaceCode)
VALUES (1, 4, 'New', 7878);

INSERT INTO tests.testdata_after
(ID, WordNum, Word, PlaceCode)
VALUES (1, 5, 'York', 7878);

INSERT INTO tests.testdata_after
(ID, WordNum, Word)
VALUES (2, 1, 44);

INSERT INTO tests.testdata_after
(ID, WordNum, Word)
VALUES (2, 2, 'Main');

INSERT INTO tests.testdata_after
(ID, WordNum, Word)
VALUES (2, 3, 'st.');

INSERT INTO tests.testdata_after
(ID, WordNum, Word, PlaceCode)
VALUES (2, 4, 'New', 8888);

INSERT INTO tests.testdata_after
(ID, WordNum, Word, PlaceCode)
VALUES (2, 5, 'Hempshire', 8888);

INSERT INTO tests.testdata_after
(ID, WordNum, Word)
VALUES (3, 1, 5);

INSERT INTO tests.testdata_after
(ID, WordNum, Word)
VALUES (3, 2, 'Main');

INSERT INTO tests.testdata_after
(ID, WordNum, Word)
VALUES (3, 3, 'st.');

INSERT INTO tests.testdata_after
(ID, WordNum, Word, PlaceCode)
VALUES (3, 4, 'Hempshire', 6666);
0
 
syoffeAuthor Commented:
and there is nothing between PlaceID and testdata_b4.ID I cannot use this column for the join...
0
 
JoeNuvoCommented:
is there a limit for number of WordNum under dictionary?
from given example is 2, but is there any thing greater?
0
 
JoeNuvoCommented:
and, will it have more than 1 word able to found from dictionary in 1 address?
just for ex, may found "New York" and "New Hempshire" in same ID of testdata
0
 
rajeevnandanmishraCommented:
Hi,

I am not considering the optimization. But in the given constraint. If you already have data in your testdata_after table then run the below code:
 
update P 
set PlaceCode = Q.PlaceCode 
from 
testdata_after P, 
(select x.id, y.PlaceCode from 
(select a.id, a.Word + ISNULL(b.Word,'') Word From 
(select id, wordNum, Word from testdata_b4 where WordNum = 4) a left outer join 
(select id, wordNum, Word from testdata_b4 where WordNum = 5) b on a.id = b.id ) x , 
(select a.PlaceCode, a.PlaceWord + ISNULL(b.PlaceWord,'') PlaceWord From 
(select PlaceCode, PlaceWord from placesdictionary where PlaceWordNum = 1) a left outer join 
(select PlaceCode, PlaceWord from placesdictionary where PlaceWordNum = 2) b on a.PlaceCode = b.PlaceCode) y 
 where x.Word = y.PlaceWord) Q 
where P.ID = Q.ID 
and P.WordNum in (4,5)

Open in new window


If you are creating the data by using the select query then please use:
 
INSERT INTO testdata_after 
SELECT P.ID, P.WordNum, P.Word, case when P.WordNum in (4,5) then Q.PlaceCode else null end PlaceCode 
from 
testdata_b4 P LEFT OUTER JOIN  
(select distinct x.id, y.PlaceCode from 
(select a.id, a.Word + ISNULL(b.Word,'') Word From 
(select id, wordNum, Word from testdata_b4 where WordNum = 4) a left outer join 
(select id, wordNum, Word from testdata_b4 where WordNum = 5) b on a.id = b.id ) x , 
(select a.PlaceCode, a.PlaceWord + ISNULL(b.PlaceWord,'') PlaceWord From 
(select PlaceCode, PlaceWord from placesdictionary where PlaceWordNum = 1) a left outer join 
(select PlaceCode, PlaceWord from placesdictionary where PlaceWordNum = 2) b on a.PlaceCode = b.PlaceCode) y 
 where x.Word = y.PlaceWord) Q 
ON P.ID = Q.ID

Open in new window

0
 
rajeevnandanmishraCommented:
I have considered an assumption that the Address that need the matching will always be with WordNum 4 or 5.
0
 
JoeNuvoCommented:
this is my very triggy code.
not sure how well it will perform.
you have to give it a try and feedback

-- Reduce the size of dictionary, to reduce the size of recursive CTE
Declare @PDic TABLE (
  PlaceID int,
  PlaceWordNum int,
  PlaceWord varchar(50) ,
  PlaceCode int,
  WordCnt int
)
INSERT INTO @PDic
	SELECT *, 0 FROM PlacesDictionary
	WHERE PlaceID IN (
		SELECT PlaceID 
		FROM PlacesDictionary PD 
			INNER JOIN testdata_b4 T ON PD.PlaceWord = T.Word
		)
-- update size of complete word
UPDATE @PDic
SET WordCnt = CNT
FROM (SELECT PlaceID, COUNT(*) CNT FROM @PDic GROUP BY PlaceID) PCnt, @PDic PD
WHERE PCnt.PlaceID = PD.PlaceID

-- define max recursive loop
Declare @max int
SELECT @max = MAX(wordnum) FROM testdata_b4

-- recursive to create matching table for all WordNum position
Declare @Match TABLE (ID int, WordNum int, PlaceCode int, PlaceID int, PlaceRound int, WordCnt int)
;WITH CTE as (
	SELECT PlaceID, PlaceWordNum, PlaceWord, PlaceCode, WordCnt, 1 As PlaceRound FROM @PDic
	UNION ALL
	SELECT PlaceID, PlaceWordNum + 1, PlaceWord, PlaceCode, WordCnt, PlaceRound + 1
	FROM CTE WHERE PlaceRound < @max
)
INSERT INTO @Match
SELECT ID, WordNum, CTE.PlaceCode, PlaceID, PlaceRound, Wordcnt
FROM testdata_b4 t
	INNER JOIN CTE on t.WordNum = CTE.PlaceWordNum AND t.Word = CTE.PlaceWord

-- delete incomplete word
DELETE M
FROM @Match M INNER JOIN (
	SELECT ID, PlaceID, PlaceCode, PlaceRound, COUNT(*) cnt
	FROM @Match
	GROUP BY ID, PlaceID, PlaceCode, PlaceRound
) Filter ON M.ID = Filter.ID AND M.PlaceID = Filter.PlaceID 
AND M.PlaceRound = Filter.PlaceRound
AND M.WordCnt <> Filter.cnt

-- delete overlap word, by delete the shorter one
;WITH CTE AS(
	SELECT ID, PlaceID, PlaceRound, WordCnt, MIN(WordNum) mn, MAX(WordNum) mx
	FROM @Match
	GROUP BY ID, PlaceID, PlaceRound, WordCnt
)
DELETE M
FROM @Match M INNER JOIN (
	SELECT A.ID, A.PlaceID, A.PlaceRound
	FROM CTE A INNER JOIN CTE B
	ON A.ID = B.ID AND A.mn <= B.mx AND A.mx >= B.mn AND A.WordCnt < B.WordCnt
) Filter ON M.ID = Filter.ID AND M.PlaceID = Filter.PlaceID AND M.PlaceRound = Filter.PlaceRound

-- update PlaceCode with matching data
UPDATE testdata_b4
SET PlaceCode = M.PlaceCode
FROM @Match M
WHERE testdata_b4.ID = M.ID AND testdata_b4.WordNum = M.WordNum

-- get result
SELECT *
FROM testdata_b4
ORDER BY ID, WordNum

Open in new window

0
 
JoeNuvoCommented:
Bug: let say got these word in dictionary "Arlington rd." (8) , "rd. New" (7), "New York" (7878)
for testdata ID 1  will output as
1	1	56		NULL
1	2	Arlington	8	-- first found have complete data
1	3	rd.		8
1	4	New		7	-- bug
1	5	York		7878	-- bug

Open in new window


patch: replace line 59 with below code
	ON A.ID = B.ID AND A.mn <= B.mx AND A.mx >= B.mn
		AND (A.WordCnt < B.WordCnt OR (A.WordCnt = B.WordCnt AND A.PlaceID < B.PlaceID))

Open in new window

0
 
syoffeAuthor Commented:
Thank you for your solutions and apologies for my delayed response,
JoeNuvo, there is no limit to the number of rows comprising the value, 2 is just an example, it can be greater  then 2 and it can also be 1, we should prefer longer values. as for more then one value per address we can safely assume there will be only one value per address.
Rajeevnandanmishra, we cannot use the after table as our input, it is there just to illustrate what the data should look like after the update, and we cannot also assume the location of the "place" in the address...
also notice the content of the "after" table, it does not match the insert SQL you suggested.
JoeNuvo, looks very interesting, I'll test and post my comment.

thank you all
0
 
syoffeAuthor Commented:
Hi JoeNuvo,
it seems to be working like magic :) thank you
my problem now is that I have to actually understand the CTE part of it in order to adopt it to the real scenario in order to check performance :(

thnx again
0
 
JoeNuvoCommented:
for the first CTE
let say for "New York", this word got PlaceWordNum 1,2
In order to compare with all position from test table, let say having 5 words long, at least we need to have "New York" for 1,2 : 2,3 : 3,4 : 4,5
so, the CTE will try to generate those list, using RECURSIVE

Round	Word	Place
1	New	1
1	York	2
UNION ALL
	2	New	2
	2	York	3
	UNION ALL
		3	New	3
		3	York	4
			UNION ALL
			4	New	4
			4	York	5

Open in new window


the second CTE is just like sub query

;WITH CTE AS (SELECT Field1, Field2, Field3 FROM MyTable WHERE condition = 10)
SELECT ...
FROM CTE
      INNER JOIN ....

will be same as

SELECT ...
FROM (SELECT Field1, Field2, Field3 FROM MyTable WHERE condition = 10) CTE
      INNER JOIN ....
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Get your problem seen by more experts

Be seen. Boost your question’s priority for more expert views and faster solutions

  • 6
  • 5
  • 2
  • +1
Tackle projects and never again get stuck behind a technical roadblock.
Join Now