Solved

Relational Database to XML - Maintain 3NF

Posted on 2004-10-04
13
442 Views
Last Modified: 2013-11-19
Greetings ee's!

I was working on a project for a dot com site and decided to incorporate the project into my senior seminar for graduation.  I am not expecting anyone to do my work for me, but I am looking for some potential directions and solutions.  This is a fairly technical question worth 500 points (I may split points) so I'm looking for a lively discussion.

My goal: Convert a relational databse to XML format while retaining a BCNF or 3NF.  I want to retain referential integrity for elements within the database.

Example:
Table 1 - Students
----------
student_id
student_name
major_id

Table 2 - Major
------------
major_name
major_id
major_requirements

While this CAN be accomplished with a tree structure:
<student>
     <student_name>Olivia Newton-John</student_name>
     <student_id>1234</student_id>
    <major>
          <major_id>23454</major_id>
          <major_name>Chemistry</major_name>
          <major_requirements>Some requirements</major_requirements>
      </major>
</student>
<student>
     <student_name>John Travolta</student_name>
     <student_id>1234</student_id>
    <major>
          <major_id>23454</major_id>
          <major_name>Chemistry</major_name>
          <major_requirements>Some requirements</major_requirements>
      </major>
</student>

This hieracrhical approach is inefficient and contains lots of redundancy.

I've read 2 XML books and about 6 academic papers somewhat related to this issue.  Currently I am swimming in "X" based technologies.   I've read about XML Schemas, DTD's, XPaths, XPointers, XLinks, XQueries, Xad-nauseum.   Is there a technology or approach for accomplishing the conversion?

I realized I could use the XML format above, but it is designed for display, not data storage.

Troy



0
Comment
Question by:djlurch
  • 4
  • 3
  • 3
  • +2
13 Comments
 
LVL 7

Expert Comment

by:J_Mak
Comment Utility
What do you mean designed for display, not data storage? It stores the data in a very convenient way. To be able to display it, you must convert it to a different format using stylesheets. For example, you can display it in HTML. XML has basically split the display and storage functionalities in 2. XML itself is self-described data which can be displayed for viewing through XSL manipulations. The above example is a good example of data storage. Cheers.
0
 
LVL 7

Expert Comment

by:J_Mak
Comment Utility
As far as converting a RMDB into XML, I'm still looking into that.
0
 
LVL 6

Expert Comment

by:zulu_11
Comment Utility
0
 
LVL 1

Author Comment

by:djlurch
Comment Utility
J Mak:  Thank you for your comments and background on XML...however I know what XML is.  The question revolves around normalization of XML data.  

Zulu: nice link.  That is a great copy of lecture notes.  I may use it as one of my sources.

Any other takers?
0
 
LVL 28

Expert Comment

by:sybe
Comment Utility
<root>
   <majors>
      <major>
         <major_id>12345</major_id>
         <major_name>Chemistry</major_name>
         <major_requirements>Some requirements</major_requirements>
      </major>
      <major>
         <major_id>65432</major_id>
         <major_name>Mathematics</major_name>
         <major_requirements>Some requirements</major_requirements>
      </major>
   </majors>
   <students>
      <student>
        <student_name>Olivia Newton-John</student_name>
        <student_id>1234</student_id>
         <major_id>65432</major_id>
     </student>
      <student>
        <student_name>John Travolta</student_name>
        <student_id>1234</student_id>
         <major_id>65432</major_id>
      </student>
   <students>
</root>
0
 
LVL 1

Author Comment

by:djlurch
Comment Utility
sybe:

Yes, that is certainly a way the data could be rerpresented.  However, with no other information, referential integrity is not only not maintained, it isn't even suggested.  This needs to be done via DTD or XML Schema, or some XML linking technology.  The responses so far are starting to clarify the issue (for others) I hope..
0
Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 
LVL 28

Expert Comment

by:sybe
Comment Utility
There is no referential integrity in XML. The referential integrity that is maintained in a database is done by the engine, not by the way the data are being stored. If you want to maintain it in XML you's have to write an engine for that.

Sorry for that.
0
 
LVL 28

Expert Comment

by:sybe
Comment Utility
You have to realize that an XML database is much like a csv database in that respect, basically just plain text. More advanced databases store the data not in plain text, but compressed, but it is still the engine that checks the code. You could try to define constraints in the XML as well, and write code to check if the data are in accordance with those constraints, because it is not something that the common XML parsers offer (as far as i know).
0
 
LVL 26

Accepted Solution

by:
rdcpro earned 500 total points
Comment Utility
I posted a lengthy response previously, but it seems to have never arrived.  :(

There is a mechanism in XML to identify relationships within the hierarchical structure.  Schemas and DTDs both support this, and they use the ID, IDREF and IDREFS datatypes to specify the primary and foreign keys for the relation. The ID and IDREF type must begin with an alpha character, and contain no whitespace.  It's basically a name token.  IDREFS is a type that contains a whitespace delimited list of IDREF types.

In the link the Zulu_11 provided, they show how to tell if an XML is in XNF, but they don't particularly provide guidance on how this might be done.  Here's an example of one way you would handle a many-to-many relationship in a hierarchical document structure:

Using this DTD:

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT courses ( course*, student_info*) >
<!ELEMENT course ( title, taken_by) >
<!ATTLIST course cno ID #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT taken_by ( student*)>
<!ELEMENT student ( grade) >
<!ELEMENT grade (#PCDATA) >
<!ATTLIST student sidref IDREF #REQUIRED>
<!ELEMENT student_info (name) >
<!ATTLIST student_info sid ID #REQUIRED>
<!ELEMENT name ( #PCDATA)>

This XML validates and provide metadata that identifies the many-to-many relationship, and I believe this is in BCNF.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE courses SYSTEM "Courses.dtd">
<courses>
      <course cno="c-1">
            <title>Physics</title>
            <taken_by>
                  <student sidref="s-1">
                        <grade>A</grade>
                  </student>
                  <student sidref="s-2">
                        <grade>D</grade>
                  </student>
            </taken_by>
      </course>
      <course cno="c-2">
            <title>Philosophy</title>
            <taken_by>
                  <student sidref="s-3">
                        <grade>A</grade>
                  </student>
                  <student sidref="s-1">
                        <grade>F</grade>
                  </student>
            </taken_by>
      </course>
      <student_info sid="s-1">
            <name>Mike Sharp</name>
      </student_info>
      <student_info sid="s-2">
            <name>Paul Dirac</name>
      </student_info>
      <student_info sid="s-3">
            <name>Plato</name>
      </student_info>
</courses>


The IDREFS type is designed to allow XML to describe a one-to-many relationship.  So we might have a <registration> tag that looks like this:

<registration  sidref="s-1" cidrefs="c-1 c-2">

This shows that student s-1 is taking courses c-1 and c-2.  


Now, just because XML can do this, doesn't mean it's particularly good at it.  XML is best a representing hierarchical data, and when you normalize, you make it much harder to work with the data in a UI such as a web site.  If the idea is to maintain a RDBMS in the back end, and produce XML to render the web site, that's one thing.  There is no pressing need to normalize the XML, and in fact it can be *less efficient* to do so, because the rendering must now follow all these relational links.  Remember, when all is said and done, the UI won't be very normalized.  At some point, the data must be denormalized so that it can render in a way that humans can visualize.  I feel it's better to make that transition in the step between the back end and the middle tier, so that the XML data structure is related more to the way the data will be structured in the UI.  We can then use a simple transformation (whether by XSLT, or some other method) to create various presentations of that data.

One of the better ways I've seen to map relational data over XML is how MS SQL Server handles it.  You write a Schema that describes the XML, and the datatype of each node.  This is used for much more than simple validation. You then use a standardized annotation system to identify how this structure maps to the relational database.  This is powerful because now you can easily push XML documents--even huge ones--to the DB, and the Schema is used by the engine to automatically update the DB.  And the schema can improve the efficiency in retrieving XML from the database as well, since you can now query the relational data using XPath.

Here's the key, and this is a point that Kurt Cagle made once that I've always emphasized to my students:  XML works best when it most resembles the real-world "thing" you're trying to represent.  

For us, hierarchical structures are more intuitive--we think of Parents with children as a hierarchy, not as instances of the same type "persons" related by a key.  Relational models are efficient for processing, but are not easy to "See".

As far as your goal to "Convert a relational databse to XML format while retaining a BCNF or 3NF", yes this is possible, but unless you're dumping the relational DB entirely, I don't see the advantage.  At some point you're going to have to display the data.

There is another problem too.  Because XML is a hierarchical structure, every different view of the data that you'll need means you have to have another schema to define it--whether or not you actually write one.  If  you're interested in "Courses" you have one schema that lists the course with the students who are taking it.  So Course is higher in the hierarchy than student.  But if you're interested in Students first, then what course each is taking, you need another, completely different schema.  In RDBMS, the schema is fixed, and only the query changes.  Making queries of XML Structures using "relational" techniques like the ID/IDREF stuff I mentioned before is harder.  I don't think it's more efficient, either, as XPath is optimized for hierarchical, not relational data.

There are, however, excellent XML Databases out there.  Tamino is a good commercial one, and the Berkeley XML DB is an excellent free one.  If I were actually going to replace my relational DB with XML structures, I would probably go with an object orient DB like these.

Regards,
Mike Sharp
0
 
LVL 26

Expert Comment

by:rdcpro
Comment Utility
By the way, I didn't mention this, but the parser will enforce referential integrity in the example I gave above.  If you specify an IDREF attribute and there is no other node with a matchin ID attribute value, the parser will throw a validation error.  If you have two nodes with the same value in their ID attribute, it will also throw an error.   Try it out with a validating XML editor, like XML Spy.  This does not mean that the attribute must be *called* ID, its simply of "type" ID.  In my example, the cno attribute is an ID.

Regards,
Mike Sharp
0
 
LVL 28

Expert Comment

by:sybe
Comment Utility
wow, i learned something today as well, thanks :)
0
 
LVL 1

Author Comment

by:djlurch
Comment Utility
Mike:

Sorry for the delayed response.  EE may have eaten that response too.

Mike, you r-o-c-k.  That's exactly the information I'm looking for.  You have a full understading of the issue.   Look for me over at your developer site as I would like to be part of your community over there.  I may not have a lot to add, but I will post the text of the paper, when it is complete.

Mike, THANK YOU!!!

Troy
0
 
LVL 26

Expert Comment

by:rdcpro
Comment Utility
You're welcome--glad to help.

Regards,
Mike Sharp
0

Featured Post

Why You Should Analyze Threat Actor TTPs

After years of analyzing threat actor behavior, it’s become clear that at any given time there are specific tactics, techniques, and procedures (TTPs) that are particularly prevalent. By analyzing and understanding these TTPs, you can dramatically enhance your security program.

Join & Write a Comment

I will show you how to create a ASP.NET Captcha control without using any HTTP HANDELRS or what so ever. you can easily plug it into your web pages. For Example a = 2 + 3 (where 2 and 3 are 2 random numbers) Session("Answer") = 5 then we…
This article covers the basics of the Sass, which is a CSS extension language. You will learn about variables, mixins, and nesting.
Viewers will learn about if statements in Java and their use The if statement: The condition required to create an if statement: Variations of if statements: An example using if statements:
HTML5 has deprecated a few of the older ways of showing media as well as offering up a new way to create games and animations. Audio, video, and canvas are just a few of the adjustments made between XHTML and HTML5. As we learned in our last micr…

772 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now