Link to home
Start Free TrialLog in
Avatar of roricka
roricka

asked on

Can referential integrity involve more than one field in a table

In a survey database, consider two tables, Respondents (rt) (one record for everyone who takes a given survey) and Responses (rs) (one record for each response to each item on any survey). We have about 500K records in rt and 35000K records in rs. Each rt record is related to about 70 rs records. (I.e. a typical survey has 70 items). The database as a whole is keeping track of about two dozen different unique surveys.

The problem comes when fields are added to rs to make sql query times acceptable. For example:

Obviously one field in rs is rtKey, a FK pointing to the PK in some rs record. And one field in rt is suKey, a FK pointing to the survey itself in some third table that defines survey parameters. Another field in rs is rsResponse, a tinyint representing the response on an item, and another is rsItemnr, the number of the survey item for that record (or possibly a pointer to an item bank, but for simplicity let's just say the item number itself.)

The problem: it turns out that if you want to do a simple query such as find the decimal mean score to item 1 on survey 1 across all respondents, you have something like:

SELECT AVG(CONVERT(decimal,rsResponse)) from rs, rt WHERE rs.rtKey = rt.rtKey AND rt.suKey = 1 AND rs.rsItemnr=1

The execution time with a properly indexed SQLServer 2005 database is about 10 seconds over the subset of the 35 million records! And over 90% of that time is resolving the inner join, whose sole purpose is to determine which survey to select. So it is very tempting to denormalize the database and add an extra field to rs containing the survey number. This eliminates the inner join and reduces the execution time to under 1 second.

Thus the question, is it possible to enforce referential integrity, whereby a record in rs containing the survey number and also pointing to a record in rt that also contains the survey number is assured to be pointing to the same survey number? This is not a unique constraint, and doesn't appear to be a foreign key constraint. Is there a way to do this? (The new field would be rs.suKey).

Thank you.

roricka

SELECT AVG(CONVERT(decimal,rsResponse)) from rs, rt WHERE rs.rtKey = rt.rtKey AND rt.suKey = 1 AND rs.rsItemnr=1 
[10 seconds]
 
VS
 
SELECT AVG(CONVERT(decimal,rsResponse)) from rs WHERE rs.suKey = 1 AND rs.rsItemnr=1
[1 second]

Open in new window

Avatar of Guy Hengel [angelIII / a3]
Guy Hengel [angelIII / a3]
Flag of Luxembourg image

what about this:
SELECT AVG(CONVERT(decimal,rsResponse)) from rs
WHERE rs.rtKey IN ( SELECT rtKey FROM rt WHERE rt.suKey = 1 )
  AND rs.rsItemnr=1 

Open in new window

Avatar of roricka
roricka

ASKER

Thank you for your offer at help, but I'm afraid that suggestion (using a subquery) doesn't help at all. The execution time is just as long.

I certainly appreciate anybody who thinks there is a fast way to do this without denormalizing, but in case there ISN'T one, could somebody help me with my question, which is how to insure referential integrity? Or is the theory that is is NEVER better to denormalize and that there is ALWAYS a way to make the query fast on normalized tables thus there is NO NEED to have the kind of referential integrity I am asking about??
>which is how to insure referential integrity?
triggers (or procedures)

once you de-normalize, you have to ensure that in one way or another, all the data that has to be assured with integrity are really up-to-date.

triggers can do so, or, if all database access for updates/inserts/delete for those table is done in stored procedures, those can do also.


apart from that, did you try:
SET NOCOUNT ON
DECLARE @t TABLE ( rtKey int )
INSERT INTO @t SELECT rtKey FROM rt WHERE rt.suKey = 1
SELECT AVG(CONVERT(decimal,rsResponse)) from rs
WHERE rs.rtKey IN ( SELECT rtKey FROM @t )
  AND rs.rsItemnr=1 

Open in new window

Or:

SELECT      AVG(CONVERT(decimal,rsResponse))
from      rs
Where      rs.rsItemnr = 1
            And EXISTS (SELECT 1 FROM rt WHERE rt.rtKey = rs.rtKey And rt.suKey = 1)
Part of the delay could come from having to convert to decimal.  Also consider that if you are just going to use decimal, you may be better of using integer or bigint. As in:
 AVG(CAST(rsResponse as integer))
Avatar of roricka

ASKER

Interesting. But I stopped that query after 2 minutes and it hadn't finished. Why did you think that might be a good approach? Is there something I'm missing here?

Now about your answer. No we don't use stored procedures for all inserts. So the triggers idea is probably the way to go. I'm unfamiliar with triggers. Can I create one that would prevent me from adding a record to rs that had a conflicting suKey with the rt record it was point to via its rtKey FK? Or would it just let me add, and then raise an error? If I was adding the record using query analyzer, would it still flag an error?

Thanks.
Avatar of roricka

ASKER

Sorry. My previous response was to angelll.

acperkins: thank YOU for helping, but alas, a subquery approach just doesn't work (i.e. it's at least 10 times slower than with a denormalized table.)

Regarding your concern about data types, the conversion must take place no matter what so I don't see how it affects this issue. And I totally do not understand your comment about the CAST. The mean is a decimal. Using the CAST returns an integer. How does that help? One more thing, acperkins, and I hope you don't take this the wrong way. I find that many many times when i ask for help here, instead of help I get someone suggesting that my premise is incorrect (e.g., instead of helping with the question about constraints, you suggest I perform the query differently.) Now, this kind of comment would be VERY HELPFUL if it were accurate. But both you and angelll (and I say this hoping neither of you kind souls takes it the wrong way) seem to think it's better to suggest total guesses at how to improve the processing speed of the query instead of addressing my question. Believe me, I have tried and tried to make this query go fast. I'm not (no offense) asking for help there. I'm specifically asking for help regarding referential integrity. And it would be refreshing to get responses directed at the question I asked, NOT the one you think you can answer. (heh heh)  ;-)
>> And I totally do not understand your comment about the CAST. The mean is a decimal.<<
If you define it as decimal it has no decimals, so you may be better of using integer or bigint.

>> I say this hoping neither of you kind souls takes it the wrong way<<
Not at all.  It is just the nature of this type of medium.  All we can do is guess by the type of question the level of expertise of the author.

>> And it would be refreshing to get responses directed at the question I asked<<
Fair enough.

Good luck.
>Can I create one that would prevent me from adding a record to rs that had a conflicting suKey with the rt record it was point to via its rtKey FK?

yes, that is possible.
as you seem to be a man of practice, I presume you want to first read up on triggers a bit?
Avatar of roricka

ASKER

>> If you define it as decimal it has no decimals, so you may be better of using integer or bigint.
acperkins, I guess I'm being dense here, but the mean, of course, MUST be decimal. E.g., the mean of 3 and 4 is 3.5. If it's not converted to decimal it merely appears as an integer, which is useless. Please explain because I know I am not understanding the point you are making.
>>guess by the type of question the level of expertise of the author
acperkins, the question is how do I create this kind of referential integritiy. SInce you are "guessing" can't you just guess that the level of expertise here is that I know that referential integrity is something that exists, but that I don't know how it applies, and then give some advice about referential integrity? To me that sounds reasonable. Are you saying that to you, you guessed that my level of expertise was such that it wasn't up to being given some advice about referential integrity?

>>I presume you want to first read up on triggers a bit?
angelll, yes of course. But the Microsoft docs on this subject are a bit opaque. Can you point me in a direction that gives a pretty clear description, including what I'm gonna need to make sure the problems we both know might occur with the denormalizing I've spoken about are prevented. Thanks in advance.

roricka
>But the Microsoft docs on this subject are a bit opaque
let's say: they are purely technical. not really practical examples ...

let's see if this tutorial makes things easier to understand:
http://www.sqlteam.com/article/an-introduction-to-triggers-part-i
Avatar of roricka

ASKER

Thanks. Good reading, but just enough to whet the appetite. And as usual, the follow-on MS doc is pretty opaque. For example, MS recommends against using COMMIT TRANSACTION, but they talk about ROLLBACK TRANSACTION as if it were OK ("Specifying When a DML Trigger Fires")

Lets say we're not doing transaction processing, but just isolated INSERTS. Do I do an AFTER INSERT trigger or just an INSERT if I want to prevent an INSERT that breaks referential rules (i.e., as above, that inserts a survey key that "disagrees" with the respondent key). I'm not asking you to write this thing for me, but it is confusing. Can one use just a "regular" INSERT trigger? Then do you just raise an error when a problem occurs? Do you have to delete the inserted row?

Thanks.

Roricka

ASKER CERTIFIED SOLUTION
Avatar of Guy Hengel [angelIII / a3]
Guy Hengel [angelIII / a3]
Flag of Luxembourg image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of roricka

ASKER

Angelll -- I think I've squeezed as much as I should from this question. Thank you for all your help. If I still have trouble, I'm just going to post another question specifically about triggers. Thanks again. - roricka