Go Premium for a chance to win a PS4. Enter to Win


Data tables - normalised?

Posted on 2004-08-19
Medium Priority
Last Modified: 2006-11-17
We have a database structure to design for NewsItems. They relate to Topics and to Countries. Any NewsItem can relate to nil, one or many countries; and to nil, one or many Topics. Later we need to be able to search for NewsItems as broadly as possible: by country, by topic, by combination of country and topic etc.

I thought we should set up a Country table, a Topic table and a NewsItem table, and then store a record in an Intersection table which had a CountryID, a TopicID and a NewsItemID. So if a NewsItem related to 2 countries and 3 topics there would be six records put into it.

Someone else suggested we do without the Country table, and just treat countries as a kind of topic. The Intersection table would then have just a TopicID and a NewsItemID. So if a NewsItem related to 2 countries and 3 topics there would be five records put into it. This seems to represent a saving.

I am sure there is something inherently wrong with this, but I am blowed if I could come up with it. I muttered some buzzwords like "data cube" and "normalised" so I think I got away with my reputation intact for the moment but I could use some ammunition later. Can anyone point out any obvious flaws in the second structure?

Obviously I am biased toward solutions that support my own thinking, but if someone shows why the second structure IS OK, I will nevertheless (hopefully gracefully) still award points for it, admit I was incorrect at work and ask for the official Office Dunce Cap to wear for the statutary week (we're a pretty enlightened office - we're only burnt at the stake if we make coding errors).

Discussion welcome.  

Martin C
Question by:MartinC
LVL 17

Accepted Solution

BillAn1 earned 360 total points
ID: 11838971
You really want to have two intersection tables, for your 3 entities NewsItems, Topics & Countries
one that relates NewsItems to Topics, and one that relates NewsItems to Countries
This is better than having one table with 6 records. The only reason for having a table with all 3 in one row is if you wanted to record the fact that certain topic/Country combinations were in the NewsItem, but no others.
e.g. (and I don't know what your data is really like, so excuse my examples) One news Item may have two topics - weather &  food, and be related to three countries - France, Italy and Spain.
Now if you needed to record the fact that weather was discussed for France & Italy only, and food was discussed for Italy and Spain only, then you need to go with your first model of a table with all 3. If however, the News item simply refers to weather in general, and Italy in general, and there is no specific link as to which topics went with which countires, then the correct method is to have two seperate tables.
Now, if the only thing you record about a country is the NAME, you could do without the table, and so have only tables
NewsItems , Topics, NewsItems_Topics and NewsItems_Country. The only drawback is that if you wanted to record any information about a country, e.g. national press contact number, or capital city or whatever, youcouldn't. Also, if a country were to change it's name (as does happen) you would have to update hundres of records, instead of 1.
So, the most correct, normalised view is 2 join tables, for 3 entities.
However, if you are designing for a data warehouse, then you would most probably use the single 'fact' table i.e. your first proposal with 6 records in a join. However, this should really be done as a second layer warehouseing databse, feeding from a more normalised operational one.

LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 11839216
I agree with BillAn1 in all he said!

Just to say that you might think about more complex situations (which bosses really LIKE to ask):
what if you want to search later news related to continents (ie a list of countries)

Store the Continent as country... --> Don't like that really (think about Australia)
Store the Continent as separate table + add a Continent_ID in the country... --> ok, how do you relate a news to the continent itself --> buzzzzz

Ok, here my full-blown complex design:

News Table

Entity Table
  Entity_Type (Country, Continent, Topic, Galaxy etc)

News_Relation Table
Relation_Entity Table

Ok, how to read the News_Relation and the Relation_Entity tables:
 Every News can have 0 or more News_Relation
 and every of those can relate to 0 (1) or more Entities:

Say you have news 1 that should relate to Europe (Entity_ID = 5)
News_Relation:  Relation_ID = 1, News_ID = 1
Relation_Entity: Relation_ID = 1, Entity_ID = 5

Say you have news 2 that relate to Spain (entity = 7) and Portugal (entity = 8), and on topic Food (entity = 10), which means to store this:
News_Relation: Relation_ID = 2, News_ID = 2
News_Relation: Relation_ID = 3, News_ID = 2
Relation_Entity: Relation_ID = 2, Entity_ID = 7
Relation_Entity: Relation_ID = 2, Entity_ID = 10
Relation_Entity: Relation_ID = 3, Entity_ID = 7
Relation_Entity: Relation_ID = 3, Entity_ID = 10

For your sample (So if a NewsItem related to 2 countries and 3 topics, which no condition between the topics and the countries):
News = 3
News_Relation: relation_id = 4, News_ID = 3
Relation_Entity_ relation_id = 4, Entity_ID = C1
Relation_Entity_ relation_id = 4, Entity_ID = C2
Relation_Entity_ relation_id = 4, Entity_ID = T1
Relation_Entity_ relation_id = 4, Entity_ID = T2
Relation_Entity_ relation_id = 4, Entity_ID = T3

To find all the related news for a given entity:
select n.*
from news n
join News_Relation r
on r.news_id = r.news_id
join Relation_Entity e
on e.relation_id = r.relation_id
where e.entity = <your entity id>

To find news to related to 2 entities:
select n.*
from news n
join News_Relation r
on r.news_id = r.news_id
where r.relation_id in
     ( select e.relation_id
       from Relation_Entity e
       where e.relation_id = r.relation_id
       and e.entity_id in ( <entity_1>, <entity_2> )
       group by e.relation_id
       having count(*) = 2




Expert Comment

ID: 11842357
Keeping country and topic entities seperate also allows for more flexibilty later on.  You may end up having content other than news items that also needs to relate to a specific country.  It may not make sense to call the destination of that relation a topic.

Author Comment

ID: 11878517
OK, thanks for all the discussion. Looks like the consensus is that the gains of separating out the Country table are small and therefore not worth the bother. Thanks to all who contributed: points to BillAn1 who got there fustest with the mostest.

Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In this article, we’ll look at how to deploy ProxySQL.
How much do you know about the future of data centers? If you're like 50% of organizations, then it's probably not enough. Read on to get up to speed on this emerging field.
In this video, Percona Director of Solution Engineering Jon Tobin discusses the function and features of Percona Server for MongoDB. How Percona can help Percona can help you determine if Percona Server for MongoDB is the right solution for …
In this video, Percona Solution Engineer Rick Golba discuss how (and why) you implement high availability in a database environment. To discuss how Percona Consulting can help with your design and architecture needs for your database and infrastr…
Suggested Courses

824 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question