Solved

Can referential integrity involve more than one field in a table

Posted on 2009-05-16
14
272 Views
Last Modified: 2012-05-07
In a survey database, consider two tables, Respondents (rt) (one record for everyone who takes a given survey) and Responses (rs) (one record for each response to each item on any survey). We have about 500K records in rt and 35000K records in rs. Each rt record is related to about 70 rs records. (I.e. a typical survey has 70 items). The database as a whole is keeping track of about two dozen different unique surveys.

The problem comes when fields are added to rs to make sql query times acceptable. For example:

Obviously one field in rs is rtKey, a FK pointing to the PK in some rs record. And one field in rt is suKey, a FK pointing to the survey itself in some third table that defines survey parameters. Another field in rs is rsResponse, a tinyint representing the response on an item, and another is rsItemnr, the number of the survey item for that record (or possibly a pointer to an item bank, but for simplicity let's just say the item number itself.)

The problem: it turns out that if you want to do a simple query such as find the decimal mean score to item 1 on survey 1 across all respondents, you have something like:

SELECT AVG(CONVERT(decimal,rsResponse)) from rs, rt WHERE rs.rtKey = rt.rtKey AND rt.suKey = 1 AND rs.rsItemnr=1

The execution time with a properly indexed SQLServer 2005 database is about 10 seconds over the subset of the 35 million records! And over 90% of that time is resolving the inner join, whose sole purpose is to determine which survey to select. So it is very tempting to denormalize the database and add an extra field to rs containing the survey number. This eliminates the inner join and reduces the execution time to under 1 second.

Thus the question, is it possible to enforce referential integrity, whereby a record in rs containing the survey number and also pointing to a record in rt that also contains the survey number is assured to be pointing to the same survey number? This is not a unique constraint, and doesn't appear to be a foreign key constraint. Is there a way to do this? (The new field would be rs.suKey).

Thank you.

roricka

SELECT AVG(CONVERT(decimal,rsResponse)) from rs, rt WHERE rs.rtKey = rt.rtKey AND rt.suKey = 1 AND rs.rsItemnr=1 

[10 seconds]
 

VS
 

SELECT AVG(CONVERT(decimal,rsResponse)) from rs WHERE rs.suKey = 1 AND rs.rsItemnr=1

[1 second]

Open in new window

0
Comment
Question by:roricka
  • 6
  • 5
  • 3
14 Comments
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 24402589
what about this:
SELECT AVG(CONVERT(decimal,rsResponse)) from rs
WHERE rs.rtKey IN ( SELECT rtKey FROM rt WHERE rt.suKey = 1 )
  AND rs.rsItemnr=1 

Open in new window

0
 

Author Comment

by:roricka
ID: 24402981
Thank you for your offer at help, but I'm afraid that suggestion (using a subquery) doesn't help at all. The execution time is just as long.

I certainly appreciate anybody who thinks there is a fast way to do this without denormalizing, but in case there ISN'T one, could somebody help me with my question, which is how to insure referential integrity? Or is the theory that is is NEVER better to denormalize and that there is ALWAYS a way to make the query fast on normalized tables thus there is NO NEED to have the kind of referential integrity I am asking about??
0
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 24403307
>which is how to insure referential integrity?
triggers (or procedures)

once you de-normalize, you have to ensure that in one way or another, all the data that has to be assured with integrity are really up-to-date.

triggers can do so, or, if all database access for updates/inserts/delete for those table is done in stored procedures, those can do also.


apart from that, did you try:
SET NOCOUNT ON

DECLARE @t TABLE ( rtKey int )

INSERT INTO @t SELECT rtKey FROM rt WHERE rt.suKey = 1

SELECT AVG(CONVERT(decimal,rsResponse)) from rs

WHERE rs.rtKey IN ( SELECT rtKey FROM @t )

  AND rs.rsItemnr=1 

Open in new window

0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 24403596
Or:

SELECT      AVG(CONVERT(decimal,rsResponse))
from      rs
Where      rs.rsItemnr = 1
            And EXISTS (SELECT 1 FROM rt WHERE rt.rtKey = rs.rtKey And rt.suKey = 1)
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 24403605
Part of the delay could come from having to convert to decimal.  Also consider that if you are just going to use decimal, you may be better of using integer or bigint. As in:
 AVG(CAST(rsResponse as integer))
0
 

Author Comment

by:roricka
ID: 24403621
Interesting. But I stopped that query after 2 minutes and it hadn't finished. Why did you think that might be a good approach? Is there something I'm missing here?

Now about your answer. No we don't use stored procedures for all inserts. So the triggers idea is probably the way to go. I'm unfamiliar with triggers. Can I create one that would prevent me from adding a record to rs that had a conflicting suKey with the rt record it was point to via its rtKey FK? Or would it just let me add, and then raise an error? If I was adding the record using query analyzer, would it still flag an error?

Thanks.
0
 

Author Comment

by:roricka
ID: 24403728
Sorry. My previous response was to angelll.

acperkins: thank YOU for helping, but alas, a subquery approach just doesn't work (i.e. it's at least 10 times slower than with a denormalized table.)

Regarding your concern about data types, the conversion must take place no matter what so I don't see how it affects this issue. And I totally do not understand your comment about the CAST. The mean is a decimal. Using the CAST returns an integer. How does that help? One more thing, acperkins, and I hope you don't take this the wrong way. I find that many many times when i ask for help here, instead of help I get someone suggesting that my premise is incorrect (e.g., instead of helping with the question about constraints, you suggest I perform the query differently.) Now, this kind of comment would be VERY HELPFUL if it were accurate. But both you and angelll (and I say this hoping neither of you kind souls takes it the wrong way) seem to think it's better to suggest total guesses at how to improve the processing speed of the query instead of addressing my question. Believe me, I have tried and tried to make this query go fast. I'm not (no offense) asking for help there. I'm specifically asking for help regarding referential integrity. And it would be refreshing to get responses directed at the question I asked, NOT the one you think you can answer. (heh heh)  ;-)
0
Better Security Awareness With Threat Intelligence

See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 24403880
>> And I totally do not understand your comment about the CAST. The mean is a decimal.<<
If you define it as decimal it has no decimals, so you may be better of using integer or bigint.

>> I say this hoping neither of you kind souls takes it the wrong way<<
Not at all.  It is just the nature of this type of medium.  All we can do is guess by the type of question the level of expertise of the author.

>> And it would be refreshing to get responses directed at the question I asked<<
Fair enough.

Good luck.
0
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 24404308
>Can I create one that would prevent me from adding a record to rs that had a conflicting suKey with the rt record it was point to via its rtKey FK?

yes, that is possible.
as you seem to be a man of practice, I presume you want to first read up on triggers a bit?
0
 

Author Comment

by:roricka
ID: 24404347
>> If you define it as decimal it has no decimals, so you may be better of using integer or bigint.
acperkins, I guess I'm being dense here, but the mean, of course, MUST be decimal. E.g., the mean of 3 and 4 is 3.5. If it's not converted to decimal it merely appears as an integer, which is useless. Please explain because I know I am not understanding the point you are making.
>>guess by the type of question the level of expertise of the author
acperkins, the question is how do I create this kind of referential integritiy. SInce you are "guessing" can't you just guess that the level of expertise here is that I know that referential integrity is something that exists, but that I don't know how it applies, and then give some advice about referential integrity? To me that sounds reasonable. Are you saying that to you, you guessed that my level of expertise was such that it wasn't up to being given some advice about referential integrity?

>>I presume you want to first read up on triggers a bit?
angelll, yes of course. But the Microsoft docs on this subject are a bit opaque. Can you point me in a direction that gives a pretty clear description, including what I'm gonna need to make sure the problems we both know might occur with the denormalizing I've spoken about are prevented. Thanks in advance.

roricka
0
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 24404355
>But the Microsoft docs on this subject are a bit opaque
let's say: they are purely technical. not really practical examples ...

let's see if this tutorial makes things easier to understand:
http://www.sqlteam.com/article/an-introduction-to-triggers-part-i
0
 

Author Comment

by:roricka
ID: 24409485
Thanks. Good reading, but just enough to whet the appetite. And as usual, the follow-on MS doc is pretty opaque. For example, MS recommends against using COMMIT TRANSACTION, but they talk about ROLLBACK TRANSACTION as if it were OK ("Specifying When a DML Trigger Fires")

Lets say we're not doing transaction processing, but just isolated INSERTS. Do I do an AFTER INSERT trigger or just an INSERT if I want to prevent an INSERT that breaks referential rules (i.e., as above, that inserts a survey key that "disagrees" with the respondent key). I'm not asking you to write this thing for me, but it is confusing. Can one use just a "regular" INSERT trigger? Then do you just raise an error when a problem occurs? Do you have to delete the inserted row?

Thanks.

Roricka

0
 
LVL 142

Accepted Solution

by:
Guy Hengel [angelIII / a3] earned 500 total points
ID: 24409913
>For example, MS recommends against using COMMIT TRANSACTION
>but they talk about ROLLBACK TRANSACTION
the rollback allows to rollback the entire transaction, even if @@tranlevel > 1,
while commit cannot do that. it will only commit 1 level. making debugging close to impossible when you get errors like "cannot commit as tranlevel = 0

>want to prevent an INSERT that breaks referential rules
normally, you would rely of the referential checks, but in your case, that is not possible.
whether you choose a AFTER trigger of a INSTEAD OF trigger is your choice here, AFAIK


>Can one use just a "regular" INSERT trigger?
>Then do you just raise an error when a problem occurs?
the raiseerror will NOT rollback the statement, but leaves the decision to the calling code.
you will need to call ROLLBACK explicitly AND the raiserror

>Do you have to delete the inserted row?
with ROLLBACK, that is done for you (in the regular trigger)
in the INSTEAD OF trigger, you go the other way round: if the checks are ok, you have to call the INSERT, otherwise just the raiserror.


0
 

Author Closing Comment

by:roricka
ID: 31582218
Angelll -- I think I've squeezed as much as I should from this question. Thank you for all your help. If I still have trouble, I'm just going to post another question specifically about triggers. Thanks again. - roricka
0

Featured Post

Backup Your Microsoft Windows Server®

Backup all your Microsoft Windows Server – on-premises, in remote locations, in private and hybrid clouds. Your entire Windows Server will be backed up in one easy step with patented, block-level disk imaging. We achieve RTOs (recovery time objectives) as low as 15 seconds.

Join & Write a Comment

If you having speed problem in loading SQL Server Management Studio, try to uncheck these options in your internet browser (IE -> Internet Options / Advanced / Security):    . Check for publisher's certificate revocation    . Check for server ce…
by Mark Wills Attending one of Rob Farley's seminars the other day, I heard the phrase "The Accidental DBA" and fell in love with it. It got me thinking about the plight of the newcomer to SQL Server...  So if you are the accidental DBA, or, simp…
Excel styles will make formatting consistent and let you apply and change formatting faster. In this tutorial, you'll learn how to use Excel's built-in styles, how to modify styles, and how to create your own. You'll also learn how to use your custo…
Illustrator's Shape Builder tool will let you combine shapes visually and interactively. This video shows the Mac version, but the tool works the same way in Windows. To follow along with this video, you can draw your own shapes or download the file…

746 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now