• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 143
  • Last Modified:

SELECT DISTINCT verses additional table. Database design

I have a table called orders which will be very large. Two of many columns on that table are called phone and address. There will have duplicate values for these two columns. While the orders table will be huge, the amount of results it returns back from queries will never be more than 5. That's just the nature of our software design.

On our form is an address input text field with an associated datalist. This field uses ajax to query the database for address values that match the phone number entered by the user, and populates the datalist with the values.

The table would normally return duplicate values, which we do not want. So in terms of database performance, the amount of transactions, amount of data being stored, etc etc.... is it better to:

A) Use a SELECT DISTINCT on the large table?
B) Create another table called addresses. After the initial record is added to the orders table, the phone number and address values also get written to the addresses table, but only if the value for BOTH fields does not already exist.

So the question is in terms of what is the best database practice for best performance to handle a situation like this? SELECT DISTINCT from the orders table, or SELECT with no DISTINCT clause needed on the addresses table? Is the duplication of data and writes worth it?
  • 2
  • 2
2 Solutions
Chris StanyonCommented:
A fundamental aspect of database design is called Normalisation. This is a process of designing your database to reduce data redundancy.

In your case, you seem to have a lot of data redundancy. A good design here would be to have an address table storing the phone number and address, along with an iD (the primary key). Your orders tables then has a foreign key (addressID) which links to the address table.

This will give you a one-to-many relationship - one address can have many orders.

Obviously, you'll need to consider your business rules as to whether this is the right 'model' for you.
BobHavertyComhAuthor Commented:
Hi Chris, thanks for your answer. A phone number and address cannot exist unless it was added as part of an order. So what is the difference between what you are suggesting and just putting the phone number and address values on the orders table itself? Why a second table? The only possible value I saw with option B was that at least the query doesn't have to check duplicate values, because an address and phone number that already exist on the Addresses table does not get written to the Addresses table, whereas the values MUST be duplicated on the orders table, because they are part of a unique event, and are therefore useful. So it seems like a contest between possible slower queries using SELECT DISTINCT on one table, or increasing write transactions and redundant info by creating an additional table that will have a smaller amount of records. But in the model you are suggesting, since every address has to have an ID that corresponds to the orders table, the Address table will have as many rows as the orders table.
Scott PletcherSenior DBACommented:
B) is better in your case.

I expect the same customers to order repeatedly.  A separate addresses table means that the ajax query does not have to read through thousands (or even millions?) of duplicate addresses to provide a DISTINCT list.  It also saves you space overall in the orders table by not storing the same address thousands, or millions, of times.
Chris StanyonCommented:
OK. Like I said, the way you normalise your database will depend largely on your business model.

If a single address/telephone number combination can be saved with several orders, then you will get data duplication. By normalising the data, you store the telephone number and address once, and then link that record to as many order records as you need. This is fundamental to database normalisation. Take a quick look at this data example:

// Your model
Order ID | Order Date | Address | Telephone
1 | 10 Jan 2015 | 123 High Street | 01234 321654
2 | 11 Jan 2015 | 123 High Street | 01234 321654
3 | 12 Jan 2015 | 123 High Street | 01234 321654
4 | 13 Jan 2015 | 123 High Street | 01234 321654
5 | 14 Jan 2015 | 567 Main Street | 555-12123
6 | 15 Jan 2015 | 567 Main Street | 555-12123

You can see here that you have a lot of duplicated data

Now look at a normalised model:

Address ID | Address | Telephone
1 | 123 High Street | 01234 321654
2 | 567 Main Street | 555-12123

Order ID | Order Date | Address ID
1 | 10 Jan 2015 | 1
2 | 11 Jan 2015 | 1
3 | 12 Jan 2015 | 1
4 | 13 Jan 2015 | 1
5 | 14 Jan 2015 | 2
6 | 15 Jan 2015 | 2

No data duplication!

In your code, you would then need to enforce your business rules. When a user enters an address, run a SELECT query against the address table to see if the address already exists. If it does, get the Address ID to insert into the Order Table. If the address doesn't exist, then create a new address record, get the new record's Address ID and use that to insert into the Orders table.

When you want to retrieve the data, you will just need to use a simple JOIN query.
BobHavertyComhAuthor Commented:
Thanks guys. I think Chris's solution was more specific, but both were correct, as long as duplication of information is avoided, and I think it is. My initial idea for B involved a limited amount of duplication, and these solutions really have none, just links. It's just an extra step of checking if it's already there or not and grabbing it if it is, and adding it if it is not.
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

  • 2
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now