Data tables - normalised?

Posted on 2004-08-19
Last Modified: 2006-11-17
We have a database structure to design for NewsItems. They relate to Topics and to Countries. Any NewsItem can relate to nil, one or many countries; and to nil, one or many Topics. Later we need to be able to search for NewsItems as broadly as possible: by country, by topic, by combination of country and topic etc.

I thought we should set up a Country table, a Topic table and a NewsItem table, and then store a record in an Intersection table which had a CountryID, a TopicID and a NewsItemID. So if a NewsItem related to 2 countries and 3 topics there would be six records put into it.

Someone else suggested we do without the Country table, and just treat countries as a kind of topic. The Intersection table would then have just a TopicID and a NewsItemID. So if a NewsItem related to 2 countries and 3 topics there would be five records put into it. This seems to represent a saving.

I am sure there is something inherently wrong with this, but I am blowed if I could come up with it. I muttered some buzzwords like "data cube" and "normalised" so I think I got away with my reputation intact for the moment but I could use some ammunition later. Can anyone point out any obvious flaws in the second structure?

Obviously I am biased toward solutions that support my own thinking, but if someone shows why the second structure IS OK, I will nevertheless (hopefully gracefully) still award points for it, admit I was incorrect at work and ask for the official Office Dunce Cap to wear for the statutary week (we're a pretty enlightened office - we're only burnt at the stake if we make coding errors).

Discussion welcome.  

Martin C
Question by:MartinC
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
LVL 17

Accepted Solution

BillAn1 earned 90 total points
ID: 11838971
You really want to have two intersection tables, for your 3 entities NewsItems, Topics & Countries
one that relates NewsItems to Topics, and one that relates NewsItems to Countries
This is better than having one table with 6 records. The only reason for having a table with all 3 in one row is if you wanted to record the fact that certain topic/Country combinations were in the NewsItem, but no others.
e.g. (and I don't know what your data is really like, so excuse my examples) One news Item may have two topics - weather &  food, and be related to three countries - France, Italy and Spain.
Now if you needed to record the fact that weather was discussed for France & Italy only, and food was discussed for Italy and Spain only, then you need to go with your first model of a table with all 3. If however, the News item simply refers to weather in general, and Italy in general, and there is no specific link as to which topics went with which countires, then the correct method is to have two seperate tables.
Now, if the only thing you record about a country is the NAME, you could do without the table, and so have only tables
NewsItems , Topics, NewsItems_Topics and NewsItems_Country. The only drawback is that if you wanted to record any information about a country, e.g. national press contact number, or capital city or whatever, youcouldn't. Also, if a country were to change it's name (as does happen) you would have to update hundres of records, instead of 1.
So, the most correct, normalised view is 2 join tables, for 3 entities.
However, if you are designing for a data warehouse, then you would most probably use the single 'fact' table i.e. your first proposal with 6 records in a join. However, this should really be done as a second layer warehouseing databse, feeding from a more normalised operational one.

LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 11839216
I agree with BillAn1 in all he said!

Just to say that you might think about more complex situations (which bosses really LIKE to ask):
what if you want to search later news related to continents (ie a list of countries)

Store the Continent as country... --> Don't like that really (think about Australia)
Store the Continent as separate table + add a Continent_ID in the country... --> ok, how do you relate a news to the continent itself --> buzzzzz

Ok, here my full-blown complex design:

News Table

Entity Table
  Entity_Type (Country, Continent, Topic, Galaxy etc)

News_Relation Table
Relation_Entity Table

Ok, how to read the News_Relation and the Relation_Entity tables:
 Every News can have 0 or more News_Relation
 and every of those can relate to 0 (1) or more Entities:

Say you have news 1 that should relate to Europe (Entity_ID = 5)
News_Relation:  Relation_ID = 1, News_ID = 1
Relation_Entity: Relation_ID = 1, Entity_ID = 5

Say you have news 2 that relate to Spain (entity = 7) and Portugal (entity = 8), and on topic Food (entity = 10), which means to store this:
News_Relation: Relation_ID = 2, News_ID = 2
News_Relation: Relation_ID = 3, News_ID = 2
Relation_Entity: Relation_ID = 2, Entity_ID = 7
Relation_Entity: Relation_ID = 2, Entity_ID = 10
Relation_Entity: Relation_ID = 3, Entity_ID = 7
Relation_Entity: Relation_ID = 3, Entity_ID = 10

For your sample (So if a NewsItem related to 2 countries and 3 topics, which no condition between the topics and the countries):
News = 3
News_Relation: relation_id = 4, News_ID = 3
Relation_Entity_ relation_id = 4, Entity_ID = C1
Relation_Entity_ relation_id = 4, Entity_ID = C2
Relation_Entity_ relation_id = 4, Entity_ID = T1
Relation_Entity_ relation_id = 4, Entity_ID = T2
Relation_Entity_ relation_id = 4, Entity_ID = T3

To find all the related news for a given entity:
select n.*
from news n
join News_Relation r
on r.news_id = r.news_id
join Relation_Entity e
on e.relation_id = r.relation_id
where e.entity = <your entity id>

To find news to related to 2 entities:
select n.*
from news n
join News_Relation r
on r.news_id = r.news_id
where r.relation_id in
     ( select e.relation_id
       from Relation_Entity e
       where e.relation_id = r.relation_id
       and e.entity_id in ( <entity_1>, <entity_2> )
       group by e.relation_id
       having count(*) = 2




Expert Comment

ID: 11842357
Keeping country and topic entities seperate also allows for more flexibilty later on.  You may end up having content other than news items that also needs to relate to a specific country.  It may not make sense to call the destination of that relation a topic.

Author Comment

ID: 11878517
OK, thanks for all the discussion. Looks like the consensus is that the gains of separating out the Country table are small and therefore not worth the bother. Thanks to all who contributed: points to BillAn1 who got there fustest with the mostest.

Featured Post

Free Tool: IP Lookup

Get more info about an IP address or domain name, such as organization, abuse contacts and geolocation.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

When table data gets too large to manage or queries take too long to execute the solution is often to buy bigger hardware or assign more CPUs and memory resources to the machine to solve the problem. However, the best, cheapest and most effective so…
Azure Functions is a solution for easily running small pieces of code, or "functions," in the cloud. This article shows how to create one of these functions to write directly to Azure Table Storage.
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

628 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question