Solved

What Codd once said...

Posted on 2015-02-03
134 Views
Qoute:
... notice the term distinct—a set has no duplicates. Codd once remarked on the no duplicates aspect: ”If something is true, then saying it twice won't make it any truer.“ For example, the set {a, b, c} is considered equal to the set {a, a, b, c, c, c}.

My understanding: A set like set {a, b, c} is distinct

The above writing states: For example, the set {a, b, c} is considered equal to the set {a, a, b, c, c, c}.

My question: I see repeat elements in set {a, a, b, c, c, c}. more than one a. Also there are more than one c. How can we say this set is distinct?
0
Question by:Mike Eghtebas
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

• Help others & share knowledge
• Earn cash & points
• 8
• 6
• 2
• +3

LVL 48

Accepted Solution

PortletPaul earned 100 total points
ID: 40588053
In my view this is semantics.

An element is ether part of a set, or it is not part of a set; and this also means It cannot be part of a set more than once

so displaying a set as {a, a, b, c, c, c} misleads as a and c cannot be part of the same set more than once
hence {a, a, b, c, c, c} is actually {a, b, c}

and thus {a, b, c} = {a, b, c}
0

LVL 74

Assisted Solution

sdstuber earned 150 total points
ID: 40588571
in math language  {a,a,b,c,c,c} would be called a "multiset"  because it allows multiples of elements (technically a multiset also lacks order)

multisets are quite common when working with databases.

select customer_name from customer_orders

unless your customers only ever shop one time; that query will produce a multiset result.

Note though that SQL Server does not have support for multiset-specific operators; but Oracle does but they apply to nested table type collections, not the more common sql  interactions.
0

LVL 27

Assisted Solution

BigRat earned 100 total points
ID: 40588933
I must disagree with Paul, for a set is simply a collection of objects. All the balls in Snooker form a set, the red ball appears 15 times. In CODASYL databases, the precusor of SQL, the concept of sets of records was made by linking record to record, and consequently columns could be effectively duplicated. One navigated via sets.

Codd's norm rules were designed to to ensure that each table in the database has keys with which one accesses the data. Each table should avoid as far as possible duplicate columns particularly if they are keys. The concept of columns which connect two tables, the join, was introduced and formalized. But to preserve the possibilty of duplicated data, the select statement had optionally DISTINCT as a sort of "aggregate" function.

so SELECT DISTINCT(balls) FROM snooker;

should give you a set with 8 elements. So what would SELECT COUNT(DISTINCT(balls)) FROM snooker WHERE color=red; give you?
0

LVL 34

Author Comment

ID: 40589030
The following tables, one in Standard SQL and the other in T-SQL to continue the discussion on sets used,

Relation     (in T-SQL called Table)
-----------------------------------------------
Name             City              <-- Heading: set of attributes with attribute name and type
Jeff               New York       \    in T-SQL attribute is known as column. (no order for columns)
Mike             Los Angeles    >
Hans            Pasadena       /   Body: set of distinct tuples (in T-SQL distinct rows)
(no order for tuples)

Here we have two following sets:
set(Name,  City)  and set(Jeff - New York , Mike - Los Angeles, Hans- Pasadena)

Here we have the an identical table in T-SQL:
Table1
-----------------------------------------------
RID       Name             City              <-- Heading: set of attributes (in T-SQL called columns)
100        Jeff               New York       \                                       (There is no order for columns)
102        Mike             Los Angeles    \
103        Mike             Los Angeles    >  Body: set of distinct rows, T-SQL allows repeated rows
104        Hans            Pasadena       /   thus, hidden row id (RID) is added to make it distinct.

Here we have two following sets: without including RID
set(Name,  City)  and set(Jeff - New York , Mike - Los Angeles, Mike - Los Angeles, Hans- Pasadena)
==============================================
sdstuber,

- multiset also lacks order
As you indicate, a set by definition lacks order (regardless multiset or not).
But sequence unlike a set has order.

-multisets are quite common when working with databases.
From above examples:
Heading: set(Name,  City)                                            <--  in both Standard SQL and T-SQL, not a multiset (wich is not possible)
Body: set(Jeff - New York , Mike - Los Angeles, Hans- Pasadena)      <-- in Standard SQL, not a multiset  (wich is not possible)
Body: set(Jeff - New York , Mike - Los Angeles, Mike - Los Angeles, Hans- Pasadena)
^--- in T-SQL, a multiset, wheret T-SQL deviates from Standard SQL

- Note though that SQL Server does not have support for multiset-specific operators; but Oracle does
Unfortunately I am not familiar enough with Oracle yet, but the book I am reading says T-SQL allows multiset in rows but in order to stay in compliance with Standard SQL, automatically add a hidden column named RID (Row ID) to it.
-----------------------------------------
PortletPaul,

I can see the applicability of set as {a, a, b, c, c, c} in conjunction with rows in T-SQL but not SQL Server where Codd was referring to.

I am still sort of at loss why set {a, b, c} is considered equal to the set {a, a, b, c, c, c}

As set like " set {a, a, b, c, c, c}" as no place in Standard SQL regardless it is for heading or body. I am sure there is some explanation that I will get grasp of it at some point.

Mike

-------------
BigRat,

I just saw your post, The use of DISTINCT comes to play when we move to query discussions. The point you are making is correct in that regards. Here my focus is at table level for simplicity. I want to know why Codd is saying:

set {a, b, c} is considered equal to the set {a, a, b, c, c, c}

This post has been revised...
0

LVL 74

Assisted Solution

sdstuber earned 150 total points
ID: 40589041
well one easy example of {a,b,c} and {a,a,b,c,c,c,} equivalence is an IN clause

select * from some_table
where some_column IN ( select X from some_other_table)

if some other table represents the set {a,b,c}  or the set {a,a,b,c,c,c} it doesn't matter.  You'll get the same results.
0

LVL 34

Author Comment

ID: 40589089
sdstuber,

This is true but it is used as predicate in the queries: where some_column IN ( select X from some_other_table)

Now, the question becomes what Codd had in mind. Was he talking about relations (tables) or queries?

I strongly feel that he was referring to relations (tables) only. We know that

WHERE Col_1 IN(a,a,b,c,c,c)     will produce the same out put as WHERE Col_1 IN(a,,b,c) but why Codd is using such a poor example?

Mike
0

LVL 27

Assisted Solution

BigRat earned 100 total points
ID: 40589112
Hmm, the question is becoming more intriguing. Where exactly are you quoting Ted Codd from? Perhaps I can find it in my back issues of the ACM?
0

LVL 74

Assisted Solution

sdstuber earned 150 total points
ID: 40589122
I don't know your quote to read it in the full context to determine the intent.

The statement of equivalence is true but trivially so within the contexts of simple sets.
However, even within that simplified realm, you can still get multiples.
Codd didn't invent cartesian relations, they are part of set theory

So, as simple product of (a,b) and (c,d) will yield

a,c
a,d
b,c
b,d

And now, if you look at the first element of each pair you have multiples.
Of course, if you collect them into a set then they would be reduced back to (a,b).

Codd would have certainly known this and recognized that the multiple a and b elements in the pairs could be thought of as distinct based on their paired value.  On the other hand,  pairing a with c doesn't change its value, so there is still only one a and one b.  Thus (a,a,b,b) would be equivalent to (a,b).

There is just as much information in the multiset as in the set.  So they are equivalent.
My guess, lacking context, is this evaluation of information is what he was trying to convey. - but again, that's just my guess.
0

LVL 34

Author Comment

ID: 40589126
From: Lesson 1: Understanding the Foundations of T-SQL Chapter 1 Page 5
(Please see the attache image for book cover)

Notice the term distinct—a set has no duplicates. Codd once remarked on the no duplicates
aspect: ”If something is true, then saying it twice won't make it any truer.“ For example,
the set {a, b, c} is considered equal to the set {a, a, b, c, c, c}.
Another critical aspect of a set doesn’t explicitly appear in the aforementioned definition
by Cantor, but rather is implied—there’s no relevance to the order of elements in a set. In
contrast, a sequence (which is an ordered set), for example, does have an order to its elements.
Combining the no duplicates and no relevance to order aspects means that the set
{a, b, c} is equal to the set {b, a, c, c, a, c}.
BookCover.png
0

LVL 74

Expert Comment

ID: 40589144
hmmm, a quote of an anecdote.

I'm not sure how much rigor we can apply to that.
0

LVL 34

Author Comment

ID: 40589147
sdstuber,

Oh, now I kind of see why possibly set {a, a, b, c, c, c} is distinct. It is distinct maybe because it was like

a   a
b   c
c   c

in two columns.

The reason for lack of 100% vote by using maybe is to get some consensuses for all experts before I completely see it.

Mike
0

LVL 74

Expert Comment

ID: 40589151
probably not

by representation    {a, a, b, c, c, c}  would be a multiset of singleton values

whereas

a   a
b   c
c   c

looks like a set of of paired values

{    {a,b}, {b,c}, {c,c}  }
0

LVL 34

Author Comment

ID: 40589228
Errata & Book Support  <-- just in case it could help
We’ve made every effort to ensure the accuracy of this book and its companion content.
Any errors that have been reported since this book was published are listed on our Microsoft
Press site at oreilly.com:
If you find an error that is not already listed, you can report it to us through the same
page.
If you need additional support, email Microsoft Press Book Support at mspinput@
microsoft.com.
Please note that product support for Microsoft software is not offered through the

but, the link provided above redirects me elsewhere...
0

LVL 27

Assisted Solution

aburr earned 50 total points
ID: 40589307
If you postulate that (a b c) is equivalent to (a a b c c c )   you get one set of consequences
If you postulate that (a b c) is different from (a a b c c c) you get a different set of consequences.
Use which ever postulate is useful.
0

LVL 24

Assisted Solution

Phillip Burton earned 100 total points
ID: 40590416
Basically:

1. In set theory they are considered equivalent.
2. In T-SQL there are considered different.

This section is all about set theory instead of T-SQL; don't sweat it too much,
0

LVL 34

Author Comment

ID: 40590434
aburr,

Thank you for your comment. Could you please give me a reference or link (on SETS for beginners) so that I could understand why the followings apply?

If you postulate that (a b c) is equivalent to (a a b c c c )   you get one set of consequences
If you postulate that (a b c) is different from (a a b c c c) you get a different set of consequences.

Just to illustrate how lost I am reading your post, I feel like I can say:
If I postulate Apple is the same as Orange they will be the same.
Possibly you mean to say if I populate Apple Fruit_set is the same as Orange Fruit_set because they are both fruits?
This means I need to do some reading to understand the point you are making.

Greatly appreciated,

Mike

These definitions are here for my own use and reference:
postulate: suggest or assume the existence, fact, or truth of (something) as a basis for reasoning, discussion, or belief.
consequences: a result or effect of an action or condition

--------------------------------------
Phillip,

Thank you for the comment. About "don't sweat it too much", I suppose you don't mean shove the dirt under the carpet LOL. Not sweating for me is looking the other way.

Basically, to confirm with you, in set theory the set {a, b, c} is considered equal to the set {a, a, b, c, c, c}?

Also, why in T-SQL there are considered different? Is this because it doesn't adhere to the definition  of relation in Standard SQL?

Thanks,

Mike
0

LVL 24

Assisted Solution

Phillip Burton earned 100 total points
ID: 40590566
"don't sweat it too much" means "don't worry about it too much" - see for instance the book "Don't Sweat the Small Stuff-- and it's All Small Stuff: Simple Ways to Keep the Little Things from Taking over Your Life". This first section of the first chapter is really an introduction to set theory, as opposed to T-SQL, it's not 100% relevant IMO to what you NEED to know.

Yes, in set theory the set {a, b, c} is considered equal to the set {a, a, b, c, c, c}

In set theory, two "rows" which are the same are equal to one "row". It's like counting 1-10, and going 1, 2, 3, 4, 3, 5, 6, 2, 7, 8, 1, 9, 10. You've still counted 1-10, 10 numbers.

In T-SQL, two rows which are the same are counted as two rows. So, in the above, you've had 13 rows, which might be important if you ask "how many number 3s are there"?
0

LVL 74

Expert Comment

ID: 40591085
>>> Also, why in T-SQL there are considered different? Is this because it doesn't adhere to the definition  of relation in Standard SQL?

Sort of

as PortletPaul noted above you can't really have a set {a,a,b,c,c,c}  if you construct a set by adding those elements you will get {a,b,c}, a set of 3 elements, not a set of 5, because sets are distinct.

My "IN" clause above relates strongly to this and to the statement of "saying something is true"
X is either is or is not a member of a collection (whether that collection is a set or otherwise).  If that collection has multiple copies of X in it, that doesn't make a statement "X is a member" any more true than if it had only one.

So, the collections are equivalent.

In SQL, the same is true, except.... we can always (albeit sometimes expensively) identify distinct rows in a table, so row 1 with value X is different from row 2 with value X.  So,  they are implicitly distinct.  So, you always have a set if you want to look closely enough.

On the other hand,  if you do something like this...

select X from my_table,  then you simply have a collection of values and you've lost the row identifications that make them distinct.  This result may or may not be a set.

If it is a set, then again, {a,a,b,c,c,c} is not a possible result, because that is not a set.

If it is a multiset, then clearly  {a,b,c} is NOT the same as {a,a,b,c,c,c} in general.
BUT... those two multisets might still be considered equivalent depending on the usage (such as IN clause)

I think you're over-thinking this.   Especially since you're referencing a doc which is then only vaguely referencing a quote; and, who knows, maybe it's not even a real quote, but merely the author's opinion on a notion of Codd's and put it in quotes to highlight that it wasn't the author's idea but an attempt to give credit to an external idea.

So I'll agree with the "don't sweat it".
0

LVL 34

Author Comment

ID: 40593062
re> So I'll agree with the "don't sweat it".

Well, to me this seems winking it. I will not be able to move on until I figure out what is this all about. This maybe a personality disorder or some sort of resolve and strength of mine. I will not argue for one or another. This is the way I operate.

I will post a new question stating my question a bit different.

Thank you all. I appreciate for the time and effort.

Regards,

Mike
0

LVL 34

Author Comment

ID: 40593121
The new question is at:

http://www.experts-exchange.com/Database/MS-SQL-Server/Q_28611544.html

We understand that a set used as predicate (in WHERE clause) turns a set like IN(a, a, b, c, c, c) to IN(a, b, c) hence we can see why they are equal. But we are talking about sets used in heading or body not predicate. For more on this see the above link to see why in this discussion predicate has to be excluded.
0

Featured Post

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Why is this different from all of the other step by step guides?  Because I make a living as a DBA and not as a writer and I lived through this experience. Defining the name: When I talk to people they say different names on this subject stuff l…
When we purchase storage, we typically are advertised storage of 500GB, 1TB, 2TB and so on. However, when you actually install it into your computer, your 500GB HDD will actually show up as 465GB. Why? It has to do with the way people and computers…
This video shows how to configure and send email from and Oracle database using both UTL_SMTP and UTL_MAIL, as well as comparing UTL_SMTP to a manual SMTP conversation with a mail server.
Viewers will learn how to use the SELECT statement in SQL to return specific rows and columns, with various degrees of sorting and limits in place.
Suggested Courses
Course of the Month8 days, left to enroll