|
|
|
Basics:
Citizen of the United States
Born: 09/21/1977
Place: Helsinki, Finland
Education:
MS in Information Systems
George Washington Univ.
2006
MS in Project Management
George Washington Univ.
2005
BA in Political Science
Univ. of Michigan
2001
Contact Points:
Attn: Panos Marcoullis
2700 Wisconsin Avenue
Apartment 610
Washington DC, 20007
Ph: (202) 288-5249
Fx: (202) 248-2275
Em: panos@marcoullis.com
Ws: www.marcoullis.com
|
|
|
|
|
Quick refresher on set theory
- To understand relational algebra, and in fact SQL, one must have a thorough grasp of set theory... here is a refresher (Venn Diagrams):
Click to go to Venn Diagrams
Operators
- There are four kinds of operators in relational algebra:
- Set operators
- Selection operators
- Join operators
- Summary operators
- Remember: In relational algebra the one function we can perform is to derive new tables from existing tables. This is what the above relational algebra operators allow us to do, in four different ways!
Set operator
- Based on set theory in mathematics.
- There are four groupings of set operators:
- Unions ("A or B")
- Intersects ("A and B")
- Difference ("A without B")
- Product ("A with B")
- For use with examples below:
T1, T2 and T3 are tables of which
- A1, A2 and A3 are table attributes of which
- a, b, c and so on are table attribute values
Set operator > UNION
- T1 Union T2
is a table relation such that:
- All rows, R, such that R is a row in T1 or R is a row in T2 or R is a row in both tables.
- T1 and T2 must therefore have the same attributes!
- A picture speaks 1000 words:

(the above image is from the lecture notes given to us by Professor John Artz in class - September 2005. This is not the work of Panos Marcoullis)
Translated into english:
List everyone who is enrolled in MGT280 or MGT284
Set operator > INTERSECT
- T1 Intersect T2
is a table relation such that:
- All rows, R, such that R is a row in T1 and R is a row in T2.
- Again, T1 and T2 must therefore have the same attributes!
- A picture speaks 1000 words:

(the above image is from the lecture notes given to us by Professor John Artz in class - September 2005. This is not the work of Panos Marcoullis)
Translated into english:
List everyone who is enrolled in MGT280 and MGT284
Set operator > DIFFERENCE
- T1 Minus T2
is a table relation such that:
- All rows, R, such that R is a row in T1 and R is not a row in T2.
- Again, T1 and T2 must therefore have the same attributes!
- A picture speaks 1000 words:

(the above image is from the lecture notes given to us by Professor John Artz in class - September 2005. This is not the work of Panos Marcoullis)
Translated into english:
List everyone who is enrolled in MGT280 and not in MGT284
Set operator > PRODUCT
- T1 Product T2
is a table relation such that:
- All T1 is a table with attributes A1, A2,+...+An and T2 is a table with attributes B1, B2, +...+Bn then T1 Product T2 is a new table T3 consisting of all combinations of values a1, a2,+...+an from T1 and b1, b2,+...+bn from T2.
- This is called a Cartersian Product.
- A picture speaks 1000 words:

(the above image is from the lecture notes given to us by Professor John Artz in class - September 2005. This is not the work of Panos Marcoullis)
Translated into english:
What are the possible enrollments?
- The above translation shown graphically:
(the above image is from the lecture notes given to us by Professor John Artz in class - September 2005. This is not the work of Panos Marcoullis)
Selection operators
- Now we begin to merge set theory with the needs of SQL.
- Two types of Selection operators:
- Project (select attributes - columns)
- Restrict (select tuples - rows)
Selection operators > PROJECT
- Project (Aj...Ak) T1
translates to:
Select all attributes (columns) within table T1 that fulfill the criteria such that selected attributes (columns) are Aj through Ak inclusive; Place the selected attributes (columns) in a new table.
- A picture is worth 1000 words:
Project (A1, A3) T1

(the above image is from the lecture notes given to us by Professor John Artz in class - September 2005. This is not the work of Panos Marcoullis)
Selection operators > RESTRICT
- Restrict (A1=a) T1
translates to:
Select all tuples (rows) within table T1 that fulfill the criteria such that selected attribute (column) A1 has value a; Place the selected tuples (rows) in a new table.
- A picture is worth 1000 words:
Restrict (A1=a) T1

(the above image is from the lecture notes given to us by Professor John Artz in class - September 2005. This is not the work of Panos Marcoullis)
Join operators
- This is where it gets really interesting!
- The whole point of relational algebra is to decompose large tables into smaller ones that contain just the subset of data that we are interested in. In order to do this we need a method to reconstruct the larger ones in a way that is meaningful to us.
- To satisfy this need we use Join operations, which are nothing more than a Cartesian Product followed by a Restrict.
- Condition:
Attributes in common from both tables have related (usually equal) values.
- There are four kinds of Joins in relational algebra:
- Equi-joins
- Natural joins
- Outer joins
- Theta joins
Join operators > EQUI-JOIN (aka the Inner join)
EQUI join is an inner join which produces the set of all combinations of tuples (rows) in the table "Students" and the table "Enrollments" that are equal on their common attribute (column) names, where redundant attributes are not eliminated.

(the above image is altered from here )
- They are called Inner joins because diagrammatically, as you can see from the image above, we are dealing with the inside part (the intersection) of the two sets of data tables.
- The equi-join does not remove redundant attributes.
- A picture is worth 100o words:
Student Equi-Join (Student.SID=Enrollments.SID) Enrollments

(the above image is from the lecture notes given to us by Professor John Artz in class - September 2005. This is not the work of Panos Marcoullis)
- Notice how the attribute "SID" from table "Student" is joined to the attribute "SID" from table "Enrollments" to produce the equi-joined table called "Joined Enrollments".
- The join is formed in the intersection of the data in the tables.
- Redundant information is left in the equi-joined table. This is taken care of by using natural joins, which are another kind of inner join.
Join operators > NATURAL JOIN
(a type of Inner join)
NATURAL join is an inner join which produces the set of all combinations of tuples (rows) in the table "Students" and the table "Enrollments" that are equal on their common attribute (column) names, where redundant attributes are eliminated.
- A picture speaks 1000 words:
Student Natural Join (Student.SID=Enrollments.SID) Enrollments

- (the above image is an adaptation from the lecture notes given to us by Professor John Artz in class - September 2005. This is not the work of Panos Marcoullis)
Join operators > OUTER JOIN (Left, Right and Full)
LEFT outer joins is the set of all combinations of tuples (rows) in the table "Students" and the table "Enrollments" that are equal on their common attribute (column) names, in addition to tuples (rows) in the table "Students" that have no matching tuples (rows) in the table "Enrollments".
RIGHT outer joins is the set of all combination of tuples (rows) in the table "Enrollments" and the table "Students" that are equal on their common attribute (column) names, in addition to tuples (rows) in the table "Enrollments" that have no mathching tuples (rows) in the table "Students".
FULL join is the set of all combinations of tuples (rows) in the table "Students" and the table "Enrollments" that are equal on their common attribute (column) names, in addition to tuples (rows) in the table "Enrollmetns" that have no matching tuples (rows) in the table "Students" and tuples (rows) in the table "Students" that have no matching tuples (rows) in the table "Enrollments" in their common attribute (column) names.

(the above images are from here)
- Outer joins are used to present additional information on attributes in table T1, that may also be available in table T2 (Left outer join).
- By default we are specifying a left outer join when we refer to outer joins. Right outer joins are more rarely used.
- Here is an example:
T1 Outer-Join (A1=B1) T2

(the above image is from the lecture notes given to us by Professor John Artz in class - September 2005. This is not the work of Panos Marcoullis)
- Notice that A1=c and A2=d are not lows because this is a left outer join. All of the data from the left table (T1) must be included in the new table resulting from the left outer join.
- Outer joins are useful when you want to include null values from a join operation. Lets say we wanted to join a table called "Students" and a table called "Enrollment". With an inner join all students who are not enrolled would not show up in the resulting table from the inner join. With an outer join they would show up. In this case the relational algebra would be:
Students Outer-Join (Students.SID=Enrollment.SID) Enrollment
This table would include both the students who are enrolled and those who present a null value (not enrolled).
- In the above example, a count of the students from the inner join would exclude the students who are not currently enrolled. If thats what you are aiming for thats great. If you want all students accounted for, both enrolled and not enrolled, then a left outer join is called for.
Join operators > THETA JOIN
We use Theta joins when the relational operator is not an equivalency.
- Lets say we want to use:
- =! (not equal to)
- < (less than)
- <= (less than or equal to)
- > (greater than)
- >= (greater than or equal to)
- T1 Theat-Join (T1.A1 < T2.B1) T2
In simple english:
This creates a cartesian product of all tuples from table T1 with attribute A1 and all tuples from table T2 with attribute B1, such that A1 is greater than B1.
- Say we have a table containing student names called "Students" with attributes "SID" and "Student Name" . If I wanted to generate a new table from this table that would show me all the likely pairs that could result from this list of students I would first copy the "Students" table and call the copy "Students2" then I would create a Theta-join such that:
Students Theat-join (Students.SID =! Students2.SID) Students2
This would generate a cartesian product of all the tuples from the "Students" table with attribute "SID" and all tuples from the "Students2" table with attribute "SID" such that "Students"."SID" is not equal to "Students2"."SID"
- Theta join allows for ANY relational operator.
Summary operators
- Summary operators allow us to make rudimentary (very basic) summarizations of the data in tables, such as:
- Count
- Sum*
- Average*
- Max*
- Min*
- The attributes with an asterisk require require that the attribute be of a numerical type in order to be summarized. Count does not have this requirement.
- A pictures says 1000 words:
Summarize T per T {A1} Add Sum (A3) As SumA3
From table T, the attribute A1 is summed by attribute A3, presenting the results in as a new attribute SumA3.

Strengths and weaknesses of relational algebra
- Relational algebra is great for theoretical table manipulations, but really sucks for routine (everyday) manipulations. Its primitive and therefore serves a theoretical purpose (more closely tied to set theory) but is inadequate for everyday business operations.
- The main purpose it serves is as a primer to understanding SQL. Here is the reason why this is true:
- If you understand set theory then you can understand relational algebra
- If you understand relational algebra then you understand how tables can be manipulated
- If you understand how tables can be manipulated, then you can understand more complicaed ways of manipulating them, and that is where SQL comes in.
|
|
|