Data Modeling

Posted on 2014-08-25
Last Modified: 2014-08-26
This is not an academic assignment.

I am designing a database that will be used to track the emails that are contained in PDFs and make comments about them. Numerous emails are printed to a single PDF file. Each PDF contains numerous emails. The PDFs are not going to be stored in a database. Only the file path and file name will be tracked.  There will be only one comment per file. This will be a single user database.

Please help me create the data model.

Here is a sample report:
Comments: File contents: 2013 tax preparation.

Here is some more detail:

Entity: Files
Description: Numerous emails are printed to a single PDF file. Each PDF contains numerous emails. The PDFs are not going to be stored in a database.
Attributes: Comments, File name

Entity: Path
Description: The location of the PDF. Example: c:\email\file1.pdf
Attributes: ?
Question by:Mark01
    LVL 34

    Expert Comment

    by:Brian Crowe
    Didn't you get an answer to this already?

    What more are you looking for?

    Author Comment

    Those comments did not address logical modeling; they addressed the physical modeling.
    LVL 68

    Accepted Solution

    Here's a few thoughts.  I trust that actually writing out a definition was actually very helpful in identifying areas that needed clarification and in making sure everyone's expectations were aligned.

    Entities can be either singular or plural (that debate won't ever be settled), but be consistent: all singular or all plural.

     Entity: File
     Description: A standard-format PDF file used to hold related(?) emails.
         Numerous emails are printed to a single PDF file.  
         All emails in a single PDF must be related? by ?[date?; subject?; ??].
         PDFs are currently not stored in a database; they are stored as standard PDF files.
         Files may have zero to many associated comments.
         Path Id --identifier that specifies the drive and path to the file
         File Id --arbitrary unique sequential number assigned to every file name.  File names alone may not be unique, since the same file name could be stored in different paths.
         File Name --physical name of the file, excluding file extension.
         File Extension --
         File Type Id --physical file type; typically this will correlate to the file extension, but that's not actually required.  For example, a '[.]txt' file could actually [.]csv-format data and vice-versa.
         ?Date Created --
         ?Created By --
         Description --what this file is, its use, etc..

     Entity: File Comment
     Descrption: User-entered comments / notes about a particular file.
         File Id --the file to which the comments apply
         Comments --
         Date Created --
         Created By --

     Entity: File Type
     Description: Type of file details.  Will include the default extension and a brief description of the file type.
         File Type Id --arbitrary unique sequential number assigned to each unique File Type
         Default Extension --default extension for this type of file: 'PDF' for pdf files; 'xls';'xlsx' for Microsoft Excel files; etc..
         File Type --standardized description of the the file type: 'PDF'; 'Excel 2007-'; 'Excel 2010'.    

     Entity: Path
     Description: The path of a file. Example: c:\email\.  This does NOT include the file name, which is part of the file data.
         Path Id --arbitrary unique sequential number assigned to each unique path
         Drive --the "drive" name; may be a single letter, a share, a volume mount point, etc.
         Path --the remainder of the physical path to the file
         Description --purpose, use, etc., of this specific path
     Sample data:
         Holds email extract/archive PDF files. These files can be for any department in the company. This folder is accessible to all employees.
         Holds email only for the HR department where additional security is required.  Only HR members can access this folder.

    Author Comment

    Yes. Writing out the definitions is very helpful.

    I am currently using the following book to learn database design:

    Do you agree that the next step should be a preliminary field list and preliminary table list like the book recommends? The entities become tables and the attributes become the fields.
    LVL 68

    Expert Comment

    Once you've completed the attribute list and have all the critical relationships and rules, yes.  You've got a bit still to go there, but if you're with someone who knows the data, it won't take long, maybe 30-45 minutes.

    I'm not familiar with that particular author so I can't comment specifically on that book.

    Author Closing Comment

    Thank you, Scott Pletcher.
    LVL 68

    Expert Comment

    You're very welcome.  Good luck with the project!

    Featured Post

    How your wiki can always stay up-to-date

    Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
    - Increase transparency
    - Onboard new hires faster
    - Access from mobile/offline

    Join & Write a Comment

    Checking the Alert Log in AWS RDS Oracle can be a pain through their user interface.  I made a script to download the Alert Log, look for errors, and email me the trace files.  In this article I'll describe what I did and share my script.
    This article describes some very basic things about SQL Server filegroups.
    Viewers will learn how the fundamental information of how to create a table.
    Viewers will learn how to use the SELECT statement in SQL and will be exposed to the many uses the SELECT statement has.

    732 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    17 Experts available now in Live!

    Get 1:1 Help Now