Notes on Database Joins

By Xah Lee. Date:

Databases has a JOIN operation, that queries 2 tables and return a accumulated result. The SQL language is a very fckedup un-orthogonal language, and some of the JOIN jargons is also fcked up and misleading. Here's some basic explanation.

• A Table is a Set.
• A row in a Table is a Element of the Set.
• The Columns are Attributes of a Element of the set.

• A Element in the set is often uniquely identified by one or more Attribute. (i.e. one or more Columns function as the Primary Key.)

CROSS JOIN

a CROSS JOIN means the Cartesian Product of two sets. So, if x is a element of A, and y is a element of B, then the result of CROSS JOIN is a set with all possible elements (x,y).

In database terms, the result of a CROSS JOIN will contain all rows in both A and B. And, each row's column has all columns in A and B.

Example of cross join (ANSI 92 standard syntax):

SELECT *
FROM employee CROSS JOIN 
     department;

Example cross join (non-standard syntax):

SELECT *
FROM employee
    ,department;

INNER JOIN

INNER JOIN is like a intersection of both tables. That is, the resulting set's elements are in both A and B.

For example if set A is {a,d,c,h} and set B is {m,c,d,o}, then both sets has h, and their intersection is the set {h}.

However, in database, elements are rows, each row are made up of columns. Two different tables will always have different columns. What do we mean if a element/row is in both table/set?

In the context of database, sameness is defined by matching a attribute/column of the element/row of a set/table.

Whether a element is in both A and B is defined by matching some attributes of elements in A and B. This function varies and is defined on the spot. Here's a pseudocode,

for each and every row in A, of a given column colA,
call this row x
  then for each and every now in B, of a given column colB,
  call this row y
    if value(calA)==value(colB)
       then return (x,y)

In general, more than one column can be used to determine if it's in both.

In order for INNER JOIN to make sense, it is assume that some Element's Attributes in both Sets are the same type of thing, and can be used to check if they match. In database speak, both table must have a column(s) that have the same type of value. The column used is often some primary key, but not necessarily.

Example: both tables have column “user_id”. Example: both tables have column “employee_hat_color”.

INNER JOIN is just a CROSS JOIN with a filtering step. Take any x in A and y in B and return the set of all possible (x,y). If you don't filter the result, then it's called cross join. If you filter the result by such way that you require some attributes in both sets be equal, then it's a INNER JOIN.

LEFT OUTER JOIN

LEFT OUTER JOIN is like INNER JOIN except that it doesn't eliminate any element in the first set. If the second set doesn't contain any element that matches a element's attribute of the first set, then the combined element (x,y) is still returned with y's attributes being all NULL.

Example left outer join (ANSI 92 standard syntax):

SELECT employee.LastName, employee.DepartmentID, department.DepartmentName  
FROM employee 
     LEFT OUTER JOIN 
     department 
       ON employee.DepartmentID = department.DepartmentID

RIGHT OUTER JOIN is just like LEFT OUTER JOIN, except that the sides are reversed.

FULL OUTER JOIN

A full outer join combines the results of both left and right outer joins. These joins will show records from both tables, and fill in NULLs for missing matches on either side.


Note: each database has slighly different syntax and uses different termnology for these joins. These names CROSS, INNER, LEFT OUTER, FULL OUTER, joins do not have a sound mathematical sense. Don't fret over them.

Note, the SQL statements on this page is taken from Wikipedia: Join (SQL).