Paul Stevenson
asked on
Database structure - best fit for recording lists
Hi,
Odd question but my brain is melting over this one:
I have a form (on an iPhone) that returns answers to a number (about 10) of questions that will mostly return yes or no answers (although not exclusively so will be saved as a string) in an xml format. I expect about 20 users to complete the form every day and the forms may change over time and more forms may be added.
I want to store the data returned and have parsed the xml to extract the relevant information.
Should I create a tables to store that information like this (option 1) -
master table:
Unique ID
forms name
date/time returned
username
child table:
Unique ID
master table ID
question (in the form of a short code)
answer
OR like this (option 2) -
master table:
unique ID
form name
date/time returned
username
child table:
unique ID
Master table ID
question 1 (in a short name form)
answer 1
question 2 (in a short name form)
answer 2
and repeat for each question/answer
I think option 1 as querying to find which fields have answered yes would be quick but the number of rows will grow to a large number very quickly
Option 2 will give less records but queries will be harder etc.
Thanks in advance,
Paul
Odd question but my brain is melting over this one:
I have a form (on an iPhone) that returns answers to a number (about 10) of questions that will mostly return yes or no answers (although not exclusively so will be saved as a string) in an xml format. I expect about 20 users to complete the form every day and the forms may change over time and more forms may be added.
I want to store the data returned and have parsed the xml to extract the relevant information.
Should I create a tables to store that information like this (option 1) -
master table:
Unique ID
forms name
date/time returned
username
child table:
Unique ID
master table ID
question (in the form of a short code)
answer
OR like this (option 2) -
master table:
unique ID
form name
date/time returned
username
child table:
unique ID
Master table ID
question 1 (in a short name form)
answer 1
question 2 (in a short name form)
answer 2
and repeat for each question/answer
I think option 1 as querying to find which fields have answered yes would be quick but the number of rows will grow to a large number very quickly
Option 2 will give less records but queries will be harder etc.
Thanks in advance,
Paul
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
If indeed "20 users to complete the form every day" and do not plan to have 100+ forms scalability should not be a problem in my opinion unless...you plan to keep the table(s) in a SQL Server on some mobile device:)
Quick math
select 20(users)*1(form)*100000(d ays) = 2 000 000 over a period of
select 100000/365 = 273 years
right? so you don't have that much data to be accumulated per form over a long period of time but indeed...is good to consider and learn from advice/suggestions you will get.
Quick math
select 20(users)*1(form)*100000(d
select 100000/365 = 273 years
right? so you don't have that much data to be accumulated per form over a long period of time but indeed...is good to consider and learn from advice/suggestions you will get.
lol i know, but scalability is always good to consider, and if this is a new venture, if something takes off in the mobile or internet infrastructures then it can become astronomically huge over night
many times this could but doesnt happen because the code falls over when a sudden influx of users starts, so dies straight away
many times this could but doesnt happen because the code falls over when a sudden influx of users starts, so dies straight away
DON'T shortcut the design process and just stick everything in one "table" (not really a table if there was NO design done on it).
Proper normalization is primarily for ease of maintaining accuracy, not performance. Even 100 rows should be properly normalized, although obviously you avoid extreme normalization unless you have huge numbers of rows.
Here's my initial rough cut; naturally it needs further fleshing out as well:
Users ( user_id, user_name, first_name, last_name, ..., date_registered,
password [encrypted, if possible], ... )
User_Contacts ( user_id, sequence(1=most preferred contact,2=next preferred contact,etc.), mode (email/phone/...), ... )
-- solely for the Qs themselves, w/o regard to any form;
--if a Q is significantly altered, you assign it a new question_id and
--adjust any form(s) accordingly
Questions ( question_id, question_text, question_description, date_originated, date_last_modified, date_ended, ... )
--about only the form itself, w/o regard to any qs
Forms ( form_id, name, date_originated, date_first_use, date_ended )
--form structure, w/o regard to any user
Form_Structure ( form_id, sequence(1=first q/element on form,2=2nd q/element on form,etc.), question_id (text of q could actually just be text to explain something on form or even a picture to be included at that point on the form, etc..), ... )
User_Forms ( user_id, form_id, date_asked, date_responded, was_completed, ... )
User_Answers ( user_id, form_id, question_id, answer )
Proper normalization is primarily for ease of maintaining accuracy, not performance. Even 100 rows should be properly normalized, although obviously you avoid extreme normalization unless you have huge numbers of rows.
Here's my initial rough cut; naturally it needs further fleshing out as well:
Users ( user_id, user_name, first_name, last_name, ..., date_registered,
password [encrypted, if possible], ... )
User_Contacts ( user_id, sequence(1=most preferred contact,2=next preferred contact,etc.), mode (email/phone/...), ... )
-- solely for the Qs themselves, w/o regard to any form;
--if a Q is significantly altered, you assign it a new question_id and
--adjust any form(s) accordingly
Questions ( question_id, question_text, question_description, date_originated, date_last_modified, date_ended, ... )
--about only the form itself, w/o regard to any qs
Forms ( form_id, name, date_originated, date_first_use, date_ended )
--form structure, w/o regard to any user
Form_Structure ( form_id, sequence(1=first q/element on form,2=2nd q/element on form,etc.), question_id (text of q could actually just be text to explain something on form or even a picture to be included at that point on the form, etc..), ... )
User_Forms ( user_id, form_id, date_asked, date_responded, was_completed, ... )
User_Answers ( user_id, form_id, question_id, answer )
ASKER
Hi Guys, thanks for the quick response.
I should add that the forms themselves (and users list, questions etc) are created and managed in a separate system.
The structure I am trying to create is just for the output of the forms returned in xml.
In this initial case it is a vehicle check form which users complete on an iPhone. The answers are mostly set to yes already so the user just confirms the answers by signing and saving the form.
When I receive the xml output most answers will be yes (by default) but I need to check for any "no" answers and notify a manager with the text that one question allows.
I will also need to run reports to show vehicle issues over time as it part of our Health and safety and vehicle management system.
I think all agree (with various levels of implementation) that there is no good reason to create a Q/A table with multiple columns (for each question and answer) and one record for the whole response. Better to have just a few columns (fields) and multiple records per form returned.
In this case you all have answered with what I was thinking and I understand the comments regarding normalisation but since this is internal use only, rarely queried once stored and code and querying will be easier the more basic the structure is, I am going with the first answer.
Thanks all for contributing.
Paul
I should add that the forms themselves (and users list, questions etc) are created and managed in a separate system.
The structure I am trying to create is just for the output of the forms returned in xml.
In this initial case it is a vehicle check form which users complete on an iPhone. The answers are mostly set to yes already so the user just confirms the answers by signing and saving the form.
When I receive the xml output most answers will be yes (by default) but I need to check for any "no" answers and notify a manager with the text that one question allows.
I will also need to run reports to show vehicle issues over time as it part of our Health and safety and vehicle management system.
I think all agree (with various levels of implementation) that there is no good reason to create a Q/A table with multiple columns (for each question and answer) and one record for the whole response. Better to have just a few columns (fields) and multiple records per form returned.
In this case you all have answered with what I was thinking and I understand the comments regarding normalisation but since this is internal use only, rarely queried once stored and code and querying will be easier the more basic the structure is, I am going with the first answer.
Thanks all for contributing.
Paul
ASKER
Whilst not 100% the "proper" way to do it but easiest and for the purposes of this project quicker to implement and test so well done.
i would give a table to user details this should give you much better scalability in the future and should simplify queries in the long run
something like this
FormsTB
formid,
formname,
active,
datecommenced
dateended
UserTB
UserID
username
realname
password
email
contactdetails
dateregistered
QuestionTB
QuestionID
FormID
DateUploaded
Active
QuestionText
AnswerTB
AnswerID
QuestionID
FormID
UserID
AnswerText
DateAnswered