Solved

What Codd once said...

Posted on 2015-02-03
20
114 Views
Last Modified: 2015-02-06
Qoute:
... notice the term distinct—a set has no duplicates. Codd once remarked on the no duplicates aspect: ”If something is true, then saying it twice won't make it any truer.“ For example, the set {a, b, c} is considered equal to the set {a, a, b, c, c, c}.

My understanding: A set like set {a, b, c} is distinct

The above writing states: For example, the set {a, b, c} is considered equal to the set {a, a, b, c, c, c}.

My question: I see repeat elements in set {a, a, b, c, c, c}. more than one a. Also there are more than one c. How can we say this set is distinct?
0
Comment
Question by:Mike Eghtebas
  • 8
  • 6
  • 2
  • +3
20 Comments
 
LVL 48

Accepted Solution

by:
PortletPaul earned 100 total points
ID: 40588053
In my view this is semantics.

An element is ether part of a set, or it is not part of a set; and this also means It cannot be part of a set more than once

so displaying a set as {a, a, b, c, c, c} misleads as a and c cannot be part of the same set more than once
hence {a, a, b, c, c, c} is actually {a, b, c}

and thus {a, b, c} = {a, b, c}
0
 
LVL 73

Assisted Solution

by:sdstuber
sdstuber earned 150 total points
ID: 40588571
in math language  {a,a,b,c,c,c} would be called a "multiset"  because it allows multiples of elements (technically a multiset also lacks order)

multisets are quite common when working with databases.

select customer_name from customer_orders

unless your customers only ever shop one time; that query will produce a multiset result.

Note though that SQL Server does not have support for multiset-specific operators; but Oracle does but they apply to nested table type collections, not the more common sql  interactions.
0
 
LVL 27

Assisted Solution

by:BigRat
BigRat earned 100 total points
ID: 40588933
I must disagree with Paul, for a set is simply a collection of objects. All the balls in Snooker form a set, the red ball appears 15 times. In CODASYL databases, the precusor of SQL, the concept of sets of records was made by linking record to record, and consequently columns could be effectively duplicated. One navigated via sets.

Codd's norm rules were designed to to ensure that each table in the database has keys with which one accesses the data. Each table should avoid as far as possible duplicate columns particularly if they are keys. The concept of columns which connect two tables, the join, was introduced and formalized. But to preserve the possibilty of duplicated data, the select statement had optionally DISTINCT as a sort of "aggregate" function.

so SELECT DISTINCT(balls) FROM snooker;

should give you a set with 8 elements. So what would SELECT COUNT(DISTINCT(balls)) FROM snooker WHERE color=red; give you?
0
 
LVL 33

Author Comment

by:Mike Eghtebas
ID: 40589030
The following tables, one in Standard SQL and the other in T-SQL to continue the discussion on sets used,

Relation     (in T-SQL called Table)
-----------------------------------------------
Name             City              <-- Heading: set of attributes with attribute name and type
Jeff               New York       \    in T-SQL attribute is known as column. (no order for columns)
Mike             Los Angeles    >
Hans            Pasadena       /   Body: set of distinct tuples (in T-SQL distinct rows)
                                                                                          (no order for tuples)

Here we have two following sets:
set(Name,  City)  and set(Jeff - New York , Mike - Los Angeles, Hans- Pasadena)

Here we have the an identical table in T-SQL:
Table1
-----------------------------------------------
RID       Name             City              <-- Heading: set of attributes (in T-SQL called columns)
100        Jeff               New York       \                                       (There is no order for columns)
102        Mike             Los Angeles    \
103        Mike             Los Angeles    >  Body: set of distinct rows, T-SQL allows repeated rows
104        Hans            Pasadena       /   thus, hidden row id (RID) is added to make it distinct.

Here we have two following sets: without including RID
set(Name,  City)  and set(Jeff - New York , Mike - Los Angeles, Mike - Los Angeles, Hans- Pasadena)
==============================================
sdstuber,

- multiset also lacks order
As you indicate, a set by definition lacks order (regardless multiset or not).
But sequence unlike a set has order.

-multisets are quite common when working with databases.
From above examples:
Heading: set(Name,  City)                                            <--  in both Standard SQL and T-SQL, not a multiset (wich is not possible)
Body: set(Jeff - New York , Mike - Los Angeles, Hans- Pasadena)      <-- in Standard SQL, not a multiset  (wich is not possible)
Body: set(Jeff - New York , Mike - Los Angeles, Mike - Los Angeles, Hans- Pasadena)      
              ^--- in T-SQL, a multiset, wheret T-SQL deviates from Standard SQL

- Note though that SQL Server does not have support for multiset-specific operators; but Oracle does
Unfortunately I am not familiar enough with Oracle yet, but the book I am reading says T-SQL allows multiset in rows but in order to stay in compliance with Standard SQL, automatically add a hidden column named RID (Row ID) to it.
-----------------------------------------
PortletPaul,

I can see the applicability of set as {a, a, b, c, c, c} in conjunction with rows in T-SQL but not SQL Server where Codd was referring to.    

I am still sort of at loss why set {a, b, c} is considered equal to the set {a, a, b, c, c, c}

As set like " set {a, a, b, c, c, c}" as no place in Standard SQL regardless it is for heading or body. I am sure there is some explanation that I will get grasp of it at some point.

Mike

-------------
BigRat,

I just saw your post, The use of DISTINCT comes to play when we move to query discussions. The point you are making is correct in that regards. Here my focus is at table level for simplicity. I want to know why Codd is saying:

set {a, b, c} is considered equal to the set {a, a, b, c, c, c}


This post has been revised...
0
 
LVL 73

Assisted Solution

by:sdstuber
sdstuber earned 150 total points
ID: 40589041
well one easy example of {a,b,c} and {a,a,b,c,c,c,} equivalence is an IN clause


select * from some_table
where some_column IN ( select X from some_other_table)

if some other table represents the set {a,b,c}  or the set {a,a,b,c,c,c} it doesn't matter.  You'll get the same results.
0
 
LVL 33

Author Comment

by:Mike Eghtebas
ID: 40589089
sdstuber,

This is true but it is used as predicate in the queries: where some_column IN ( select X from some_other_table)

Now, the question becomes what Codd had in mind. Was he talking about relations (tables) or queries?

I strongly feel that he was referring to relations (tables) only. We know that

WHERE Col_1 IN(a,a,b,c,c,c)     will produce the same out put as WHERE Col_1 IN(a,,b,c) but why Codd is using such a poor example?

Mike
0
 
LVL 27

Assisted Solution

by:BigRat
BigRat earned 100 total points
ID: 40589112
Hmm, the question is becoming more intriguing. Where exactly are you quoting Ted Codd from? Perhaps I can find it in my back issues of the ACM?
0
 
LVL 73

Assisted Solution

by:sdstuber
sdstuber earned 150 total points
ID: 40589122
I don't know your quote to read it in the full context to determine the intent.

The statement of equivalence is true but trivially so within the contexts of simple sets.
However, even within that simplified realm, you can still get multiples.
Codd didn't invent cartesian relations, they are part of set theory

So, as simple product of (a,b) and (c,d) will yield

a,c
a,d
b,c
b,d

And now, if you look at the first element of each pair you have multiples.
Of course, if you collect them into a set then they would be reduced back to (a,b).

Codd would have certainly known this and recognized that the multiple a and b elements in the pairs could be thought of as distinct based on their paired value.  On the other hand,  pairing a with c doesn't change its value, so there is still only one a and one b.  Thus (a,a,b,b) would be equivalent to (a,b).

There is just as much information in the multiset as in the set.  So they are equivalent.
My guess, lacking context, is this evaluation of information is what he was trying to convey. - but again, that's just my guess.
0
 
LVL 33

Author Comment

by:Mike Eghtebas
ID: 40589126
From: Lesson 1: Understanding the Foundations of T-SQL Chapter 1 Page 5
(Please see the attache image for book cover)

Notice the term distinct—a set has no duplicates. Codd once remarked on the no duplicates
aspect: ”If something is true, then saying it twice won't make it any truer.“ For example,
the set {a, b, c} is considered equal to the set {a, a, b, c, c, c}.
Another critical aspect of a set doesn’t explicitly appear in the aforementioned definition
by Cantor, but rather is implied—there’s no relevance to the order of elements in a set. In
contrast, a sequence (which is an ordered set), for example, does have an order to its elements.
Combining the no duplicates and no relevance to order aspects means that the set
{a, b, c} is equal to the set {b, a, c, c, a, c}.
BookCover.png
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 40589144
hmmm, a quote of an anecdote.

I'm not sure how much rigor we can apply to that.
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 33

Author Comment

by:Mike Eghtebas
ID: 40589147
sdstuber,

Oh, now I kind of see why possibly set {a, a, b, c, c, c} is distinct. It is distinct maybe because it was like

a   a
b   c
c   c

in two columns.

The reason for lack of 100% vote by using maybe is to get some consensuses for all experts before I completely see it.

Mike
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 40589151
probably not

by representation    {a, a, b, c, c, c}  would be a multiset of singleton values

whereas

a   a
b   c
c   c


looks like a set of of paired values

{    {a,b}, {b,c}, {c,c}  }
0
 
LVL 33

Author Comment

by:Mike Eghtebas
ID: 40589228
Errata & Book Support  <-- just in case it could help
We’ve made every effort to ensure the accuracy of this book and its companion content.
Any errors that have been reported since this book was published are listed on our Microsoft
Press site at oreilly.com:
http://go.microsoft.com/FWLink/?Linkid=263549
If you find an error that is not already listed, you can report it to us through the same
page.
If you need additional support, email Microsoft Press Book Support at mspinput@
microsoft.com.
Please note that product support for Microsoft software is not offered through the
addresses above.

but, the link provided above redirects me elsewhere...
0
 
LVL 27

Assisted Solution

by:aburr
aburr earned 50 total points
ID: 40589307
If you postulate that (a b c) is equivalent to (a a b c c c )   you get one set of consequences
If you postulate that (a b c) is different from (a a b c c c) you get a different set of consequences.
Use which ever postulate is useful.
0
 
LVL 24

Assisted Solution

by:Phillip Burton
Phillip Burton earned 100 total points
ID: 40590416
Basically:

1. In set theory they are considered equivalent.
2. In T-SQL there are considered different.

This section is all about set theory instead of T-SQL; don't sweat it too much,
0
 
LVL 33

Author Comment

by:Mike Eghtebas
ID: 40590434
aburr,

Thank you for your comment. Could you please give me a reference or link (on SETS for beginners) so that I could understand why the followings apply?  

If you postulate that (a b c) is equivalent to (a a b c c c )   you get one set of consequences
If you postulate that (a b c) is different from (a a b c c c) you get a different set of consequences.

Just to illustrate how lost I am reading your post, I feel like I can say:
If I postulate Apple is the same as Orange they will be the same.
Possibly you mean to say if I populate Apple Fruit_set is the same as Orange Fruit_set because they are both fruits?
This means I need to do some reading to understand the point you are making.

Greatly appreciated,

Mike

These definitions are here for my own use and reference:
postulate: suggest or assume the existence, fact, or truth of (something) as a basis for reasoning, discussion, or belief.
consequences: a result or effect of an action or condition

--------------------------------------
Phillip,

Thank you for the comment. About "don't sweat it too much", I suppose you don't mean shove the dirt under the carpet LOL. Not sweating for me is looking the other way.

Basically, to confirm with you, in set theory the set {a, b, c} is considered equal to the set {a, a, b, c, c, c}?

Also, why in T-SQL there are considered different? Is this because it doesn't adhere to the definition  of relation in Standard SQL?

Thanks,

Mike
0
 
LVL 24

Assisted Solution

by:Phillip Burton
Phillip Burton earned 100 total points
ID: 40590566
"don't sweat it too much" means "don't worry about it too much" - see for instance the book "Don't Sweat the Small Stuff-- and it's All Small Stuff: Simple Ways to Keep the Little Things from Taking over Your Life". This first section of the first chapter is really an introduction to set theory, as opposed to T-SQL, it's not 100% relevant IMO to what you NEED to know.

Yes, in set theory the set {a, b, c} is considered equal to the set {a, a, b, c, c, c}

In set theory, two "rows" which are the same are equal to one "row". It's like counting 1-10, and going 1, 2, 3, 4, 3, 5, 6, 2, 7, 8, 1, 9, 10. You've still counted 1-10, 10 numbers.

In T-SQL, two rows which are the same are counted as two rows. So, in the above, you've had 13 rows, which might be important if you ask "how many number 3s are there"?
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 40591085
>>> Also, why in T-SQL there are considered different? Is this because it doesn't adhere to the definition  of relation in Standard SQL?

Sort of

as PortletPaul noted above you can't really have a set {a,a,b,c,c,c}  if you construct a set by adding those elements you will get {a,b,c}, a set of 3 elements, not a set of 5, because sets are distinct.

My "IN" clause above relates strongly to this and to the statement of "saying something is true"
X is either is or is not a member of a collection (whether that collection is a set or otherwise).  If that collection has multiple copies of X in it, that doesn't make a statement "X is a member" any more true than if it had only one.

So, the collections are equivalent.

In SQL, the same is true, except.... we can always (albeit sometimes expensively) identify distinct rows in a table, so row 1 with value X is different from row 2 with value X.  So,  they are implicitly distinct.  So, you always have a set if you want to look closely enough.

On the other hand,  if you do something like this...

select X from my_table,  then you simply have a collection of values and you've lost the row identifications that make them distinct.  This result may or may not be a set.

If it is a set, then again, {a,a,b,c,c,c} is not a possible result, because that is not a set.

If it is a multiset, then clearly  {a,b,c} is NOT the same as {a,a,b,c,c,c} in general.
BUT... those two multisets might still be considered equivalent depending on the usage (such as IN clause)

I think you're over-thinking this.   Especially since you're referencing a doc which is then only vaguely referencing a quote; and, who knows, maybe it's not even a real quote, but merely the author's opinion on a notion of Codd's and put it in quotes to highlight that it wasn't the author's idea but an attempt to give credit to an external idea.

So I'll agree with the "don't sweat it".
0
 
LVL 33

Author Comment

by:Mike Eghtebas
ID: 40593062
re> So I'll agree with the "don't sweat it".

Well, to me this seems winking it. I will not be able to move on until I figure out what is this all about. This maybe a personality disorder or some sort of resolve and strength of mine. I will not argue for one or another. This is the way I operate.

I will post a new question stating my question a bit different.

Thank you all. I appreciate for the time and effort.

Regards,

Mike
0
 
LVL 33

Author Comment

by:Mike Eghtebas
ID: 40593121
The new question is at:

http://www.experts-exchange.com/Database/MS-SQL-Server/Q_28611544.html

We understand that a set used as predicate (in WHERE clause) turns a set like IN(a, a, b, c, c, c) to IN(a, b, c) hence we can see why they are equal. But we are talking about sets used in heading or body not predicate. For more on this see the above link to see why in this discussion predicate has to be excluded.
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Foreword (May 2015) This web page has appeared at Google.  It's definitely worth considering! https://www.google.com/about/careers/students/guide-to-technical-development.html How to Know You are Making a Difference at EE In August, 2013, one …
The Delta outage: 650 cancelled flights, more than 1200 delayed flights, thousands of frustrated customers, tens of millions of dollars in damages – plus untold reputational damage to one of the world’s most trusted airlines. All due to a catastroph…
Via a live example, show how to setup several different housekeeping processes for a SQL Server.
This video explains what a user managed backup is and shows how to take one, providing a couple of simple example scripts.

758 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

22 Experts available now in Live!

Get 1:1 Help Now