• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 296
  • Last Modified:

Data tables - normalised?

We have a database structure to design for NewsItems. They relate to Topics and to Countries. Any NewsItem can relate to nil, one or many countries; and to nil, one or many Topics. Later we need to be able to search for NewsItems as broadly as possible: by country, by topic, by combination of country and topic etc.

I thought we should set up a Country table, a Topic table and a NewsItem table, and then store a record in an Intersection table which had a CountryID, a TopicID and a NewsItemID. So if a NewsItem related to 2 countries and 3 topics there would be six records put into it.

Someone else suggested we do without the Country table, and just treat countries as a kind of topic. The Intersection table would then have just a TopicID and a NewsItemID. So if a NewsItem related to 2 countries and 3 topics there would be five records put into it. This seems to represent a saving.

I am sure there is something inherently wrong with this, but I am blowed if I could come up with it. I muttered some buzzwords like "data cube" and "normalised" so I think I got away with my reputation intact for the moment but I could use some ammunition later. Can anyone point out any obvious flaws in the second structure?

Obviously I am biased toward solutions that support my own thinking, but if someone shows why the second structure IS OK, I will nevertheless (hopefully gracefully) still award points for it, admit I was incorrect at work and ask for the official Office Dunce Cap to wear for the statutary week (we're a pretty enlightened office - we're only burnt at the stake if we make coding errors).

Discussion welcome.  

Martin C
1 Solution
You really want to have two intersection tables, for your 3 entities NewsItems, Topics & Countries
one that relates NewsItems to Topics, and one that relates NewsItems to Countries
This is better than having one table with 6 records. The only reason for having a table with all 3 in one row is if you wanted to record the fact that certain topic/Country combinations were in the NewsItem, but no others.
e.g. (and I don't know what your data is really like, so excuse my examples) One news Item may have two topics - weather &  food, and be related to three countries - France, Italy and Spain.
Now if you needed to record the fact that weather was discussed for France & Italy only, and food was discussed for Italy and Spain only, then you need to go with your first model of a table with all 3. If however, the News item simply refers to weather in general, and Italy in general, and there is no specific link as to which topics went with which countires, then the correct method is to have two seperate tables.
Now, if the only thing you record about a country is the NAME, you could do without the table, and so have only tables
NewsItems , Topics, NewsItems_Topics and NewsItems_Country. The only drawback is that if you wanted to record any information about a country, e.g. national press contact number, or capital city or whatever, youcouldn't. Also, if a country were to change it's name (as does happen) you would have to update hundres of records, instead of 1.
So, the most correct, normalised view is 2 join tables, for 3 entities.
However, if you are designing for a data warehouse, then you would most probably use the single 'fact' table i.e. your first proposal with 6 records in a join. However, this should really be done as a second layer warehouseing databse, feeding from a more normalised operational one.

Guy Hengel [angelIII / a3]Billing EngineerCommented:
I agree with BillAn1 in all he said!

Just to say that you might think about more complex situations (which bosses really LIKE to ask):
what if you want to search later news related to continents (ie a list of countries)

Store the Continent as country... --> Don't like that really (think about Australia)
Store the Continent as separate table + add a Continent_ID in the country... --> ok, how do you relate a news to the continent itself --> buzzzzz

Ok, here my full-blown complex design:

News Table

Entity Table
  Entity_Type (Country, Continent, Topic, Galaxy etc)

News_Relation Table
Relation_Entity Table

Ok, how to read the News_Relation and the Relation_Entity tables:
 Every News can have 0 or more News_Relation
 and every of those can relate to 0 (1) or more Entities:

Say you have news 1 that should relate to Europe (Entity_ID = 5)
News_Relation:  Relation_ID = 1, News_ID = 1
Relation_Entity: Relation_ID = 1, Entity_ID = 5

Say you have news 2 that relate to Spain (entity = 7) and Portugal (entity = 8), and on topic Food (entity = 10), which means to store this:
News_Relation: Relation_ID = 2, News_ID = 2
News_Relation: Relation_ID = 3, News_ID = 2
Relation_Entity: Relation_ID = 2, Entity_ID = 7
Relation_Entity: Relation_ID = 2, Entity_ID = 10
Relation_Entity: Relation_ID = 3, Entity_ID = 7
Relation_Entity: Relation_ID = 3, Entity_ID = 10

For your sample (So if a NewsItem related to 2 countries and 3 topics, which no condition between the topics and the countries):
News = 3
News_Relation: relation_id = 4, News_ID = 3
Relation_Entity_ relation_id = 4, Entity_ID = C1
Relation_Entity_ relation_id = 4, Entity_ID = C2
Relation_Entity_ relation_id = 4, Entity_ID = T1
Relation_Entity_ relation_id = 4, Entity_ID = T2
Relation_Entity_ relation_id = 4, Entity_ID = T3

To find all the related news for a given entity:
select n.*
from news n
join News_Relation r
on r.news_id = r.news_id
join Relation_Entity e
on e.relation_id = r.relation_id
where e.entity = <your entity id>

To find news to related to 2 entities:
select n.*
from news n
join News_Relation r
on r.news_id = r.news_id
where r.relation_id in
     ( select e.relation_id
       from Relation_Entity e
       where e.relation_id = r.relation_id
       and e.entity_id in ( <entity_1>, <entity_2> )
       group by e.relation_id
       having count(*) = 2



Keeping country and topic entities seperate also allows for more flexibilty later on.  You may end up having content other than news items that also needs to relate to a specific country.  It may not make sense to call the destination of that relation a topic.
MartinCAuthor Commented:
OK, thanks for all the discussion. Looks like the consensus is that the gains of separating out the Country table are small and therefore not worth the bother. Thanks to all who contributed: points to BillAn1 who got there fustest with the mostest.
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

A proven path to a career in data science

At Springboard, we know how to get you a job in data science. With Springboard’s Data Science Career Track, you’ll master data science  with a curriculum built by industry experts. You’ll work on real projects, and get 1-on-1 mentorship from a data scientist.

Tackle projects and never again get stuck behind a technical roadblock.
Join Now