Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17


Data tables - normalised?

Posted on 2004-08-19
Medium Priority
Last Modified: 2006-11-17
We have a database structure to design for NewsItems. They relate to Topics and to Countries. Any NewsItem can relate to nil, one or many countries; and to nil, one or many Topics. Later we need to be able to search for NewsItems as broadly as possible: by country, by topic, by combination of country and topic etc.

I thought we should set up a Country table, a Topic table and a NewsItem table, and then store a record in an Intersection table which had a CountryID, a TopicID and a NewsItemID. So if a NewsItem related to 2 countries and 3 topics there would be six records put into it.

Someone else suggested we do without the Country table, and just treat countries as a kind of topic. The Intersection table would then have just a TopicID and a NewsItemID. So if a NewsItem related to 2 countries and 3 topics there would be five records put into it. This seems to represent a saving.

I am sure there is something inherently wrong with this, but I am blowed if I could come up with it. I muttered some buzzwords like "data cube" and "normalised" so I think I got away with my reputation intact for the moment but I could use some ammunition later. Can anyone point out any obvious flaws in the second structure?

Obviously I am biased toward solutions that support my own thinking, but if someone shows why the second structure IS OK, I will nevertheless (hopefully gracefully) still award points for it, admit I was incorrect at work and ask for the official Office Dunce Cap to wear for the statutary week (we're a pretty enlightened office - we're only burnt at the stake if we make coding errors).

Discussion welcome.  

Martin C
Question by:MartinC
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
LVL 17

Accepted Solution

BillAn1 earned 360 total points
ID: 11838971
You really want to have two intersection tables, for your 3 entities NewsItems, Topics & Countries
one that relates NewsItems to Topics, and one that relates NewsItems to Countries
This is better than having one table with 6 records. The only reason for having a table with all 3 in one row is if you wanted to record the fact that certain topic/Country combinations were in the NewsItem, but no others.
e.g. (and I don't know what your data is really like, so excuse my examples) One news Item may have two topics - weather &  food, and be related to three countries - France, Italy and Spain.
Now if you needed to record the fact that weather was discussed for France & Italy only, and food was discussed for Italy and Spain only, then you need to go with your first model of a table with all 3. If however, the News item simply refers to weather in general, and Italy in general, and there is no specific link as to which topics went with which countires, then the correct method is to have two seperate tables.
Now, if the only thing you record about a country is the NAME, you could do without the table, and so have only tables
NewsItems , Topics, NewsItems_Topics and NewsItems_Country. The only drawback is that if you wanted to record any information about a country, e.g. national press contact number, or capital city or whatever, youcouldn't. Also, if a country were to change it's name (as does happen) you would have to update hundres of records, instead of 1.
So, the most correct, normalised view is 2 join tables, for 3 entities.
However, if you are designing for a data warehouse, then you would most probably use the single 'fact' table i.e. your first proposal with 6 records in a join. However, this should really be done as a second layer warehouseing databse, feeding from a more normalised operational one.

LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 11839216
I agree with BillAn1 in all he said!

Just to say that you might think about more complex situations (which bosses really LIKE to ask):
what if you want to search later news related to continents (ie a list of countries)

Store the Continent as country... --> Don't like that really (think about Australia)
Store the Continent as separate table + add a Continent_ID in the country... --> ok, how do you relate a news to the continent itself --> buzzzzz

Ok, here my full-blown complex design:

News Table

Entity Table
  Entity_Type (Country, Continent, Topic, Galaxy etc)

News_Relation Table
Relation_Entity Table

Ok, how to read the News_Relation and the Relation_Entity tables:
 Every News can have 0 or more News_Relation
 and every of those can relate to 0 (1) or more Entities:

Say you have news 1 that should relate to Europe (Entity_ID = 5)
News_Relation:  Relation_ID = 1, News_ID = 1
Relation_Entity: Relation_ID = 1, Entity_ID = 5

Say you have news 2 that relate to Spain (entity = 7) and Portugal (entity = 8), and on topic Food (entity = 10), which means to store this:
News_Relation: Relation_ID = 2, News_ID = 2
News_Relation: Relation_ID = 3, News_ID = 2
Relation_Entity: Relation_ID = 2, Entity_ID = 7
Relation_Entity: Relation_ID = 2, Entity_ID = 10
Relation_Entity: Relation_ID = 3, Entity_ID = 7
Relation_Entity: Relation_ID = 3, Entity_ID = 10

For your sample (So if a NewsItem related to 2 countries and 3 topics, which no condition between the topics and the countries):
News = 3
News_Relation: relation_id = 4, News_ID = 3
Relation_Entity_ relation_id = 4, Entity_ID = C1
Relation_Entity_ relation_id = 4, Entity_ID = C2
Relation_Entity_ relation_id = 4, Entity_ID = T1
Relation_Entity_ relation_id = 4, Entity_ID = T2
Relation_Entity_ relation_id = 4, Entity_ID = T3

To find all the related news for a given entity:
select n.*
from news n
join News_Relation r
on r.news_id = r.news_id
join Relation_Entity e
on e.relation_id = r.relation_id
where e.entity = <your entity id>

To find news to related to 2 entities:
select n.*
from news n
join News_Relation r
on r.news_id = r.news_id
where r.relation_id in
     ( select e.relation_id
       from Relation_Entity e
       where e.relation_id = r.relation_id
       and e.entity_id in ( <entity_1>, <entity_2> )
       group by e.relation_id
       having count(*) = 2




Expert Comment

ID: 11842357
Keeping country and topic entities seperate also allows for more flexibilty later on.  You may end up having content other than news items that also needs to relate to a specific country.  It may not make sense to call the destination of that relation a topic.

Author Comment

ID: 11878517
OK, thanks for all the discussion. Looks like the consensus is that the gains of separating out the Country table are small and therefore not worth the bother. Thanks to all who contributed: points to BillAn1 who got there fustest with the mostest.

Featured Post

Get your Conversational Ransomware Defense e‑book

This e-book gives you an insight into the ransomware threat and reviews the fundamentals of top-notch ransomware preparedness and recovery. To help you protect yourself and your organization. The initial infection may be inevitable, so the best protection is to be fully prepared.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Recently, Microsoft released a best-practice guide for securing Active Directory. It's a whopping 300+ pages long. Those of us tasked with securing our company’s databases and systems would, ideally, have time to devote to learning the ins and outs…
In today's business world, data is more important than ever for informing marketing campaigns. Accessing and using data, however, may not come naturally to some creative marketing professionals. Here are four tips for adapting to wield data for insi…
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
This is a high-level webinar that covers the history of enterprise open source database use. It addresses both the advantages companies see in using open source database technologies, as well as the fears and reservations they might have. In this…

722 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question