Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 290
  • Last Modified:

Table Design

I am designing an e-mail tracking database. I’m working on normalizing the tables. Please comment on the proposed design. E-mail image file notes must be tracked. For example, notes about the c:\email\file1.pdf file must be tracked. The note may read: “File contents: 2013 tax preparation.” or “p. 11 from Mary Smith, CPA”

There will be level 1 and level 2 subjects and level 1 and level 2 topics. A example of level 1 and level 2 subjects and level 1 and level 2 topics is the following:
Two levels of subjects:
Taxes
Income tax preparation

Two levels of topics:
California CPAs
Los Angeles CPAs


Proposed Tables

Comments
CommentID
Comment

Files
FileID

File_ Comments
CommentID
FileID
PathID
Topic_Level1_ID
Topic_Level2_ID
Page

Paths
PathID
Path

Subject_Level1
Subject_Level1_ID
Subject

Subject_Level2
Subject_Level2_ID
Subject

Topic_Level1
Topic_Level1_ID
Topic
Subject_Level1_ID

Topic_Level2
Topic_Level2_ID
Topic
Subject_Level2_ID
0
Mark01
Asked:
Mark01
  • 7
  • 6
  • 4
  • +1
5 Solutions
 
Jim Dettman (Microsoft MVP/ EE MVE)PresidentCommented:
Before anyone can really comment on the design, we need a little more detail on what it is your trying to accomplish.

Basic e-mail tracking yes, but where do the CPA's and Level of topic fit in?

It would be best if you just briefly described the overall situation and what type of information you plan to get out of this.

Jim.
0
 
Brian CroweCommented:
What is the Paths table?

Is there some reason to limit yourself to two subjects/topics?  You could remove this limitation with a many-to-many table between File_Comment and Topic tables and creating a hierarchical table out of the subjects.
0
 
Mark01Author Commented:
Jim:
I am just trying to track the emails that are contained in PDFs and make comments about them. Numerous emails are printed to a single PDF file. Each PDF contains numerous emails. Each PDF will have multiple subjects and topics associated it. In the example, the  c:\email\file1.pdf file can have two subjects and two topics associated with it. The subjects are Taxes and Income tax preparation. Taxes is the parent subject and Income tax preparation child subject. The same with the parent and child topics:  California CPAs and Los Angeles CPAs .

Here is a sample report:

c:\email\file1.pdf
Subj: Taxes
Topic: California CPAs
Level 2 Subj: Income tax preparation
Level 2 topic: Los Angeles CPAs
Comments:
File contents: 2013 tax preparation.
p. 11: From Mary Smith, CPA

Brian Crowe:
That is a good point. No, there is no reason set a limit of two subjects/topics. Please provide an example of how to create a hierarchical table out of the subjects
0
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
Mark01Author Commented:
The Paths table tracks the path to the PDF. In the example above, the  path is: c:\email.
0
 
Jim Dettman (Microsoft MVP/ EE MVE)PresidentCommented:
Mark,

 That's perfect.... so we have:

1. Files
2. e-mails
3. Subjects
4. Topics

Let's start with the first three and let's ignore the relationship between the four things we have.   For now, let's just talk about the things themselves.

tblFiles - One record per file
FileID - AN - PK
FilePath - Text - Path to file
FileName - Text - Name of file
DateReceived - D/T

tblemails - one per e-mail
emailID - AN - PK
emailAddress - Text -CK1
DateAdded - D/T

tblSubjects - One record per subject
SubjectID - AN - PK
SubjectText - Text - CK1

 
   'AN' is Autonumber
   'PK' is Primary key
   'CK' is Candidate key

  Now in regards to the forth, your speaking about levels of topics, so a couple of questions:

1. Can a child topic belong to more than one parent?
2. Can there be more than two levels?

  basically your looking at doing a hierarchy type of model (you'll see this most often used for a "Bill of Materials" in MFG - which is a list of the parts and sub-parts that make something up).

Jim.
0
 
Mark01Author Commented:
Here are the answers to the questions.

Q: Can a child topic belong to more than one parent?
A: Yes

Q: Can there be more than two levels?
A: Yes
0
 
Scott PletcherSenior DBACommented:
You need to realign this process.  Your first concern should only be the data itself.  Thus, you must start with logical design: i.e., "entities" and "attributes" rather than "tables" and "columns".  This is more than just semantics.  There are literally hundreds+ books about logical design; many thousands if you include articles.  

Later you can turn the logical design into a physical design.  That's the easy part.

You need to describe as many facts about the data as possible, then use that info to model the data.  "Use cases" are also very popular now, so use those if they will assist you.  You will have separate definition and relationship data -- this is critical!

For example (D) mean that entry is a definition, otherwise assume it's a relationship.  {} indicates suggested areas for consideration, and are thus optional:

we must track all(?) emails received
(D) an email is a standard email {or a fax? or a phone call? do other things need tracked??}  if so, "email" might become "communication", for example.
we must record the physical location of emails
an email may have none(?)-to-many notes
(D) a note may be text: “File contents: 2013 tax preparation.” or “p. 11 from Mary Smith, CPA” or {it may be a file or other attachment?}
email notes must all(?) be tracked
an email must have a subject, but there could also be a subject hierarchy, of indeterminate depth
(
we will pre-determine a list of subject hierarchies about which we want to track data
OR
we will track all subject hierarchies, adding new ones when received
)
...
0
 
Jim Dettman (Microsoft MVP/ EE MVE)PresidentCommented:
OK.  There are a couple of ways to do that.  This is one way:

tblTopics - one record per topic
TopicID - AN - PK
TopicDescription - Text - CK1

tblSubTopics - one record per topic per sub-topic
SubTopicID - AN - PK
ParentTopicID - Long - FK to tblTopics - CK1A
ChildTopicID - Long - FK to tblTopics - CK1B

 'FK' is "Foreign Key" - meaning it's a copy of a PK from another table.
 'CK1A' is "Candidate Key 1, part A"

  So what were saying here is that a child topic can only be listed once for each parent.   What this means is that I can have topics:

A
B
C
D
E
F

and
B is a child of A
C is a child of A

D has no children

E is a child of A
F is a child of B

So I would get
A
  B
    F
  C
  E

D

 Now on the comments, I'm gathering the exact same comment would never apply to more than one file?   In other words these are just general comments about each file?

  If so, then simply add a field "Comments" to tblFiles

 I'll stop here and if all that makes sense, we can continue on with all the linking tables (which model the relationships between what we have).

Jim.
0
 
Mark01Author Commented:
Jim, your explanation makes sense. Regarding the comments, I will just add add a "Comments" field to Files table.
0
 
Jim Dettman (Microsoft MVP/ EE MVE)PresidentCommented:
Mark,

 What Scott is bringing up is the fact that relational data models are about the meaning of the data, which is why I asked you to explain what the process was and what you were trying to accomplish.

  At the end of our little exercise, I was going to have you walk through plugging data into the "tables" and see if that worked for real world examples (Scott's  "use cases") and gave you what you needed.  The second part of that is to  then ask questions; "what if I want to see all the topics associated with a given e-mail address?"    Can you do that with the model at hand?   In other words, is all the output you want achievable.

  Often when your starting out, it is easier to walk through/learn like this rather than try and visualize it all up front, which you can do more easily once you've gained experience.

Jim.
0
 
Scott PletcherSenior DBACommented:
Everyone is in (too big) a hurry to get to a physical design.

But rushing to a physical design is the single biggest database design mistake made!

Spend some time on the design and definitions (!) of the data.  Your design will be vastly cleaner, and thus will be easy to implement, rather than the painful write, re-write, re-re-write, re-re-re-write, etc., process most people have to go through.
0
 
Jim Dettman (Microsoft MVP/ EE MVE)PresidentCommented:
<<Jim, your explanation makes sense. Regarding the comments, I will just add add a "Comments" field to Files table. >>

 Well besides the question of "Does a given comment apply to more than one file", there is also the question "Will a given file have more than one comment?".

 If so, or there are other things you want to track about a comment, like who made it and the date it was made, then it will need a separate table.

But if it's more link a description of the file, which will stand on it's own, then a field will suffice.   What you don't want to do in a free form field is:

08/20 - Jim made this comment
08/21/14 - Mark made another comment

 Because what you have there is actually three pieces of data:  A date, a name, and a comment.   With it in a single field, there's no way to parse it or break it down.   Someone might also do this:

08/10 - another comment

 and omit the name, which you might consider to be required.  

  What were back sliding into here are the rules used for normalizing data, which I'll sum up for you if you like.  But if I'm going into too much detail for you, holler.

Jim.
0
 
Scott PletcherSenior DBACommented:
Forget normalization for now!

There's a 99.99% percent change you'll need to talk with business people to fully flesh this out.  To them, "primary key", "candidate key", "foreign key", etc., is just gibberish (to many developers, as well :-) .

But it's easy for them to give you a string of data they need to see.  Just get down the data, what it is, and what you can about how it relates to other data.

You can then post that and a lot of experts can give you reasoned, expert advice on how to normalize that data.  But, if you first rush into normalization, it will much harder to re-work problem areas.
0
 
Mark01Author Commented:
Jim and Scott, I agree with your comments about the logical design: i.e., "entities" and "attributes. I am using the following book to learn about database design:

http://www.amazon.com/Database-Design-Mere-Mortals-Hands-/dp/B00DIKUOUA/ref=sr_1_11?s=books&ie=UTF8&qid=1408736146&sr=1-11&keywords=database+design+for+mere+mortals

The problem I have is that I have to quickly create a temporary database while I work on the logical design for the email tracking database. This is just a temporary small database to track the emails that are contained in PDFs and make comments about them. it is important to track the path for each PDF.

I would appreciate it if someone would comment on my proposed linking tables.
0
 
Scott PletcherSenior DBACommented:
OK, I give up.  I still believe an hour or two of reviewing data will save you vast time and trouble down the road, but I understand you're in a bigger hurry.  Good luck.  I'm sure Jim can help you properly.
0
 
Mark01Author Commented:
I have redefined and minimized the requirements. My goal is to just create a temporary small database to track the emails that are contained in PDFs and make comments about them. Numerous emails are printed to a single PDF file. Each PDF contains numerous emails.  This will be only one comment per file. This will be a single user Access database. It is important to track the path for each PDF.

Here is a sample report:
c:\email\file1.pdf
Comments: File contents: 2013 tax preparation.

Here are the proposed entities:
Files
Path

Here are the proposed attributes:
Comments
File name
0
 
Mark01Author Commented:
Thank you, Brian Crowe, Jim Dettman and ScottPletcher.
0
 
Jim Dettman (Microsoft MVP/ EE MVE)PresidentCommented:
I'm a bit surprised you've close this out.   I also see you've asked multiple questions on this now.

 What your going to find is that you will get to an answer quicker if you stick with the original question.  Also, if you do decide for some reason to open another question, it's best to point back to the prior so Experts can understand the background.

 It's a waste of your time and theirs, because you'll end up going right back over what's already been done most times.

Jim.
0

Featured Post

NEW Veeam Agent for Microsoft Windows

Backup and recover physical and cloud-based servers and workstations, as well as endpoint devices that belong to remote users. Avoid downtime and data loss quickly and easily for Windows-based physical or public cloud-based workloads!

  • 7
  • 6
  • 4
  • +1
Tackle projects and never again get stuck behind a technical roadblock.
Join Now