Version 1.10
Version 4.0
Commentary
The Spring 1987 issue of Informix's Tech Notes had an article on Outer Joins. The content of that article is the basis of the version 4.0 appendix with minimal changes for the altered context, such as omitting a reference to the next article in the Tech Notes which was on query optimization.
This is a transcription and translation into HTML of an ancient document, namely Appendix G (Complex Outer Joins) from the Informix-4GL Reference Manual, Vol 2, Version 1.10 (part number 200-507-0002-0, dated July 1988). It is referring to Informix's OUTER construct, not the ISO 9075:1992 SQL standard construct with the same name but a radically different syntax. The ISO (ANSI) outer join notation is available IDS 7.31 and in IDS.2000 9.21. Also, the document uses the term RDSQL to refer to the database server. The original version of this document dates back to the days before SE and Turbo were separate products, when you bought Informix-4GL or Informix-SQL and the database process, sqlexec (which became the core of SE) was simply an important part of the base product.
End commentaryThe two-table outer join described in Chapter 2 [of the I4GL Reference Manual, Volume 1] is graphically represented as follows:
SELECT a, b
FROM tab1, OUTER tab2
WHERE a = b
Diagram
tab1 -------@ Level 1
|
tab2 Level 2
In the examples that follow, the tilde (~) symbol represents any Boolean relation between the columns of two tables.
FROM x, y, z
WHERE ...
Diagram
x ---- y ---- z Level 1
This is the standard inner join and all three tables are treated on the same level. There are no restrictions on the WHERE clause. It need not exist. Before the WHERE clause is applied, a full Cartesian product is made among all three tables. (Note: A Cartesian product of two tables appends every row of the second table to every row of the first. A Cartesian product of three tables is the Cartesian product of the third with the Cartesian product of the first two, and so on.)
FROM x, y, OUTER z
WHERE (x or y) ~ z
Diagram
x ---- y -----@ Level 1
|
z Level 2
FROM x, OUTER y, OUTER z
WHERE x ~ y AND x ~ z
Diagram
x -----@------@ Level 1
| |
y z Level 2
FROM x, OUTER (y, z)
WHERE x ~ (y or z)
Diagram
x -----@ Level 1
|
y ---- z Level 2
FROM x, OUTER (y, OUTER z)
WHERE x ~ y AND y ~ z
Diagram
x -----@ Level 1
|
y -----@ Level 2
|
z Level 3
Example 5 shows a three-level outer join. There must always be a relationship between adjacent levels. In this case, x must be related to y, and y must be related to z. RDSQL performs this join by picking a row from x and looking for a pair from y and z that satisfies (y, outer z).
If each of the tables consists of a single column with the values in the followin table, then the previously described joins will result in the subsequent tables.
x.a y.b z.c
1 2 3
2 3 4
3 4 5
5
FROM x, y, z
WHERE x.a = z.c
a b c
3 2 3
3 3 3
3 4 3
5 2 5
5 3 5
5 4 5
[JL: this example, and several of the following ones, has a Cartesian product of table y with the join of tables x and z.]
FROM x, y, OUTER z
WHERE x.a = z.c
a b c
1 2 -
1 3 -
1 4 -
2 2 -
2 3 -
2 4 -
3 2 3
3 3 3
3 4 3
5 2 5
5 3 5
5 4 5
FROM x, OUTER y, OUTER z
WHERE x.a = y.b AND x.a = z.c
a b c
1 - -
2 2 -
3 3 3
5 - 5
FROM x, OUTER (y, z)
WHERE x.a = z.c
a b c
1 - -
2 - -
3 2 3
3 3 3
3 4 3
5 2 5
5 3 5
5 4 5
FROM x, OUTER (y, OUTER z)
WHERE x.a = y.b AND y.b = z.c
a b c
1 - -
2 2 -
3 3 3
5 - -
The following steps are useful when considering how RDSQL performs an outer join:
FROM x, OUTER (y, OUTER (z, a, b)), outer c, d
x -----@-------------@----- d Level 1
| |
y ---- @ c Level 2
|
z ---- a ---- b Level 3
This is a transcription and translation into HTML of an ancient document, namely Appendix G (Outer Joins) from the Informix-4GL Reference Manual, Vol 2, Version 4.0 (part number 000-7405, dated March 1990). It is referring to Informix's OUTER construct, not the ISO 9075:1992 SQL standard construct with the same name but a radically different syntax. The ISO (ANSI) outer join notation is understood in IDS 7.31 and in IDS.2000 9.21. Note that the document uses the term '4GL' to refer to the database server.
End commentary
SELECT customer.customer_num, lname, order_num
FROM customer, orders
WHERE customer.customer_num = orders.order_num
SELECT customer.customer_num, lname, order_num
FROM customer, OUTER orders
WHERE customer.customer_num = orders.customer_num
Query Results from Stores7 Database Generated on Solaris 7 with IDS.2000 9.20.UC2.
Query 1 Results
104 Higgins 1001
101 Pauli 1002
104 Higgins 1003
106 Watson 1004
116 Parmelee 1005
112 Lawson 1006
117 Sipes 1007
110 Jaeger 1008
111 Keyes 1009
115 Grant 1010
104 Higgins 1011
117 Sipes 1012
104 Higgins 1013
106 Watson 1014
110 Jaeger 1015
119 Shorter 1016
120 Jewell 1017
121 Wallack 1018
122 O'Brian 1019
123 Hanlon 1020
124 Putnum 1021
126 Neelie 1022
127 Satifer 1023
Query 2 Results
customer_num lname order_num
101 Pauli 1002
102 Sadler
103 Currie
104 Higgins 1001
104 Higgins 1003
104 Higgins 1011
104 Higgins 1013
105 Vector
106 Watson 1004
106 Watson 1014
107 Ream
108 Quinn
109 Miller
110 Jaeger 1008
110 Jaeger 1015
111 Keyes 1009
112 Lawson 1006
113 Beatty
114 Albertson
115 Grant 1010
116 Parmelee 1005
117 Sipes 1007
117 Sipes 1012
118 Baxter
119 Shorter 1016
120 Jewell 1017
121 Wallack 1018
122 O'Brian 1019
123 Hanlon 1020
124 Putnum 1021
125 Henry
126 Neelie 1022
127 Satifer 1023
128 Lessor
By using a simple join, Query 1 fetches a list of only those customers who have items on order, while Query 2 fetches a list of all customers by using an outer join. Once you understand how similar queries can produce such dissimilar results, you can begin to use outer joins effectively. The obvious differences between the two joins are as follows:
The following section discusses outer joins in detail.
A join queries two or more tables as though they were one. It is as if 4GL creates and then acts upon a single temporary table to produce the query results. 4GL does not actually create such a table to perform a join, but it is helpful to conceptualize a join in these terms.
In a simple two-table join, the resulting "table" contains only those combinations of rows from both tables that satisfy the join condition. In an outer join, the resulting "table" contains these rows, plus all the remaining rows from on of the tables, called the dominant (or preserved) table. The second table is called the subservient table.
Consider two hypothetical tables, employees and depts, which contain the following columns and rows (where dash '--' indicates a NULL value):
employees
emp_num dept_num
2 105
4 103
6 103
5 --
3 102
depts
depts_num dept_loc
102 NY
103 LA
105 SF
Suppose, for example, that you need a list of employee numbers and department locations for all employees, including those employees whose department locations are unknown (represented by NULL values in the employees table). The following query fetches the desired results:
SELECT emp_num, dept_loc
FROM employees, OUTER depts
WHERE employees.dept_num = depts.dept_num
The keyword OUTER designates depts as the subservient table, making employees the dominant table. 4GL processes the query by the following steps:
The result is a "table" with the following rows:
emp_num dept_num dept_num dept_loc
2 105 105 SF
4 103 103 LA
6 103 103 LA
5 -- -- --
3 102 102 NY
Note: A filter is a condition expressed in a WHERE clause that applies to columns in a single table. For example,
dept_loc = "SF" or emp_num < 105
Because 4GL applies such filters to the subservient table as it performs the join, the resulting "table" may contain NULL values that were not present in the subservient table prior to the join.
Suppose that the query include a filter on the dept_loc column:
SELECT emp_num, dept_loc
FROM employees, OUTER depts
WHERE employees.dept_num = depts.dept_num
AND dept_loc != "LA"
At step 2, the results include more rows of NULL values than the results of the original query:
emp_num dept_loc
2 SF
4 --
6 --
5 --
3 NY
The filter removes rows from the depts table where dept_loc is equal to "LA".
emp_num dept_loc
2 SF
4 LA
6 LA
5
3 NY
The original version listed the emp_num values in numeric order without any sorting. IDS.2000 does not do this.
In a similar way to the previous example, the following query produces a list of all customers with supplemental information for those customers with items on orders. Where orders.customer_num is not equal to customer.customer_num, 4GL combines a row of NULL values with the corresponding row from the customer table. Because the query does not contain filters, the results preserve every row from the dominant table.
SELECT customer.customer_num, company, order_num, ship_date
FROM customer, OUTER orders
WHERE customer.customer_num = orders.customer_num
customer_num company order_num ship_date
101 All Sports Supplies 1002 26/05/1998
102 Sports Spot
103 Phil's Sports
104 Play Ball! 1001 01/06/1998
104 Play Ball! 1003 23/05/1998
104 Play Ball! 1011 03/07/1998
104 Play Ball! 1013 10/07/1998
105 Los Altos Sports
106 Watson & Son 1004 30/05/1998
106 Watson & Son 1014 03/07/1998
107 Athletic Supplies
108 Quinn's Sports
109 Sport Stuff
110 AA Athletics 1008 06/07/1998
110 AA Athletics 1015 16/07/1998
111 Sports Center 1009 21/06/1998
112 Runners & Others 1006
113 Sportstown
114 Sporting Place
115 Gold Medal Sports 1010 29/06/1998
116 Olympic City 1005 09/06/1998
117 Kids Korner 1007 05/06/1998
117 Kids Korner 1012 29/06/1998
118 Blue Ribbon Sports
119 The Triathletes Club 1016 12/07/1998
120 Century Pro Shop 1017 13/07/1998
121 City Sports 1018 13/07/1998
122 The Sporting Life 1019 16/07/1998
123 Bay Sports 1020 16/07/1998
124 Putnum's Putters 1021 25/07/1998
125 Total Fitness Sports
126 Neelie's Discount Sp 1022 30/07/1998
127 Big Blue Bike Shop 1023 30/07/1998
128 Phoenix University
The preceeding example queries two tables in the simplest type of outer join. You can, in fact, use outer joins to query any number of tables, producing more types of joins than can be discussed here. The following types are possible when three tables are involved in a query:
SELECT column-list
FROM x, OUTER (y, z)
WHERE x.a = y.a
AND y.b = z.b
SELECT column-list
FROM x, OUTER (y, OUTER z)
WHERE x.a = y.a
AND y.b = z.b
SELECT column-list
FROM x, OUTER (y, OUTER z)
WHERE x.a = z.a -- different join here!
AND y.b = z.b
SELECT column-list
FROM x, OUTER y, OUTER z
WHERE x.a = y.a
AND x.b = z.b
SELECT column-list
FROM x, OUTER y, OUTER z
WHERE x.a = y.a
AND y.b = z.b
SELECT customer.customer_num, lname, orders.order_num, stock_num, manu_code, quantity
FROM customer, OUTER(orders, items)
WHERE customer.customer_num = orders.customer_num
AND orders.order_num = items.order_num
AND manu_code = 'ANZ'
4GL performs the simple join between orders and items first, yielding information on all orders for ANZA-manufactured items. The outer join combines the customer table with the Anza order information. The query results do not include orders for other items.
customer_num lname order_num stock_num manu_code quantity
101 Pauli
102 Sadler
103 Currie
104 Higgins 1003 9 ANZ 1
104 Higgins 1003 8 ANZ 1
104 Higgins 1003 5 ANZ 5
104 Higgins 1011 5 ANZ 5
104 Higgins 1013 5 ANZ 1
104 Higgins 1013 6 ANZ 1
104 Higgins 1013 9 ANZ 2
105 Vector
106 Watson
107 Ream
108 Quinn
109 Miller
110 Jaeger 1008 8 ANZ 1
110 Jaeger 1008 9 ANZ 5
111 Keyes
112 Lawson 1006 5 ANZ 5
112 Lawson 1006 6 ANZ 1
113 Beatty
114 Albertson
115 Grant 1010 6 ANZ 1
116 Parmelee 1005 5 ANZ 10
116 Parmelee 1005 6 ANZ 1
117 Sipes 1012 8 ANZ 1
117 Sipes 1012 9 ANZ 10
118 Baxter
119 Shorter
120 Jewell
121 Wallack
122 O'Brian
123 Hanlon
124 Putnum 1021 201 ANZ 3
124 Putnum 1021 205 ANZ 2
125 Henry
126 Neelie 1022 6 ANZ 2
127 Satifer 1023 304 ANZ 1
128 Lessor
WHERE manu_code = 'ANZ'
SELECT customer.customer_num, lname, orders.order_num, stock_num, manu_code, quantity
FROM customer, OUTER (orders, OUTER items)
WHERE customer.customer_num = orders.customer_num
AND orders.order_num = items.order_num
AND manu_code = 'ANZ'
customer_num lname order_num stock_num manu_code quantity
101 Pauli 1002
102 Sadler
103 Currie
104 Higgins 1001
104 Higgins 1003 9 ANZ 1
104 Higgins 1003 8 ANZ 1
104 Higgins 1003 5 ANZ 5
104 Higgins 1011 5 ANZ 5
104 Higgins 1013 5 ANZ 1
104 Higgins 1013 6 ANZ 1
104 Higgins 1013 9 ANZ 2
105 Vector
106 Watson 1004
106 Watson 1014
107 Ream
108 Quinn
109 Miller
110 Jaeger 1008 8 ANZ 1
110 Jaeger 1008 9 ANZ 5
110 Jaeger 1015
111 Keyes 1009
112 Lawson 1006 5 ANZ 5
112 Lawson 1006 6 ANZ 1
113 Beatty
114 Albertson
115 Grant 1010 6 ANZ 1
116 Parmelee 1005 5 ANZ 10
116 Parmelee 1005 6 ANZ 1
117 Sipes 1007
117 Sipes 1012 8 ANZ 1
117 Sipes 1012 9 ANZ 10
118 Baxter
119 Shorter 1016
120 Jewell 1017
121 Wallack 1018
122 O'Brian 1019
123 Hanlon 1020
124 Putnum 1021 201 ANZ 3
124 Putnum 1021 205 ANZ 2
125 Henry
126 Neelie 1022 6 ANZ 2
127 Satifer 1023 304 ANZ 1
128 Lessor
In addition to customer, orders and so on, the following
queries include a hypothetical table named custnotes, containing
the following columns and data:
customer_num notes
104 sponsors soccer team
108 customer for 20 years
115 opening a second store
118 new customer
SELECT customer.customer_num, orders.order_num, notes
FROM customer, OUTER (orders, OUTER custnotes)
WHERE customer.customer_num = orders.customer_num
AND orders.customer_num = custnotes.customer_num
customer_num order_num notes
101 1002
102
103
104 1001 sponsors soccer team
104 1003 sponsors soccer team
104 1011 sponsors soccer team
104 1013 sponsors soccer team
105
106 1004
106 1014
107
108
109
110 1008
110 1015
111 1009
112 1006
113
114
115 1010 opening a second store
116 1005
117 1007
117 1012
118
119 1016
120 1017
121 1018
122 1019
123 1020
124 1021
125
126 1022
127 1023
128
To preserve notes for customers 108 and 118 who do not have orders, you
must outer-join the custnotes table directly with the
customer table, as shown in the next query.
SELECT customer.customer_num, orders.order_num, notes
FROM customer, OUTER orders, OUTER custnotes
WHERE customer.customer_num = orders.order_num
AND customer.customer_num = custnotes.customer_num
customer_num order_num notes
101
102
103
104 sponsors soccer team
105
106
107
108 customer for 20 years
109
110
111
112
113
114
115 opening a second store
116
117
118 new customer
119
120
121
122
123
124
125
126
127
128
All of the preceding queries fetch information from one table with supplemental information from other tables. When you need similar results, Informix recommends that you use an outer join. When you do not need supplemental information, as is normally the case, use a simple join instead.
Be aware that your choice of an outer join can influence query optimization and processing. You can use the SET EXPLAIN ON statement to examine how the query processor of Informix-4GL performs simple queries, joins and outer joins.
SELECT customer.customer_num, lname, order_num
FROM customer, orders
WHERE customer.customer_num = orders.order_num
customer ---- orders
SELECT customer.customer_num, lname, order_num
FROM customer, OUTER orders
WHERE customer.customer_num = orders.customer_num
customer -----@
|
orders
SELECT customer.customer_num, company, order_num, ship_date
FROM customer, OUTER orders
WHERE customer.customer_num = orders.customer_num
customer -----@
|
orders
SELECT customer.customer_num, lname, orders.order_num, stock_num, manu_code, quantity
FROM customer, OUTER(orders, items)
WHERE customer.customer_num = orders.customer_num
AND orders.order_num = items.order_num
AND manu_code = 'ANZ'
customer -----@
|
orders ---- items
SELECT customer.customer_num, lname, orders.order_num, stock_num, manu_code, quantity
FROM customer, OUTER (orders, OUTER items)
WHERE customer.customer_num = orders.customer_num
AND orders.order_num = items.order_num
AND manu_code = 'ANZ'
customer -----@
|
orders -----@
|
items
SELECT customer.customer_num, orders.order_num, notes
FROM customer, OUTER (orders, OUTER custnotes)
WHERE customer.customer_num = orders.customer_num
AND orders.customer_num = custnotes.customer_num
customer -----@
|
orders -----@
|
custnotes
SELECT customer.customer_num, orders.order_num, notes
FROM customer, OUTER orders, OUTER custnotes
WHERE customer.customer_num = orders.order_num
AND customer.customer_num = custnotes.customer_num
customer -----@---------@
| |
orders custnotes
CREATE VIEW IBB_V_Project AS
SELECT A.Project_Iref,
A.Section_Iref,
B.Section_Eref,
N.Company_Iref,
N.Company_Name,
A.Product_Desc,
A.Project_Type_Iref,
D.Project_Type,
A.Person_Iref,
F.Full_Name,
A.Respon_Iref,
G.Post_Location,
A.Project_Stat_Iref,
E.Project_Status,
A.Source_Iref,
I.Source,
A.Sic_Iref,
L.Sic_Eref,
A.Op_Activity_Iref,
M.Op_Activity_Desc,
A.Involve_Iref,
K.IBB_Involvement,
A.Nature_Iref,
C.Nature_Of_Next_Act,
A.Internat_Mobile,
A.Whether_Cop_Case,
A.Closed_Ind,
A.Next_Action_Date,
A.Creation_Date,
A.Last_Edit_Date,
A.Last_Editor_Iref,
H.Logname
FROM IBB_Project A,
IBB_Section B,
IBB_R_Proj_Type D,
IBB_R_Project_Stat E,
IBB_Personnel H,
OUTER IBB_R_Next_Act C,
OUTER IBB_Personnel F,
OUTER (IBB_Post_Respon X, OUTER IBB_V_Post_Resp2 G),
OUTER IBB_R_Source I,
OUTER IBB_R_Involvement K,
OUTER IBB_Sic L,
OUTER IBB_Op_Act M,
OUTER IBB_V_Proj_Co2 N
WHERE A.Section_Iref = B.Section_Iref
AND A.Project_Type_Iref = D.Project_Type_Iref
AND A.Project_Stat_Iref = E.Project_Stat_Iref
AND A.Last_Editor_Iref = H.Person_Iref
AND A.Nature_Iref = C.Nature_Iref
AND A.Person_Iref = F.Person_Iref
AND A.Respon_Iref = X.Respon_Iref
AND X.Respon_Iref = G.Person_Iref
AND A.Source_Iref = I.Source_Iref
AND A.Sic_Iref = L.Sic_Iref
AND A.Op_Activity_Iref = M.Op_Activity_Iref
AND A.Project_Iref = N.Project_Iref
AND A.Involve_Iref = K.Involve_Iref;
Note that IBB_V_Post_Resp2 and IBB_V_Proj_Co2 are both themselves views.
In fact, IBB_V_Proj_Co2 was a 3-table view, exact details unknown but of
the form:
CREATE VIEW IBB_V_Proj_Co2 AS
SELECT A.Project_Iref,
A.Some_Other_Col col01,
B.Xxxx_Iref,
B.Some_Other_Col col02,
C.Yyyy_Iref,
C.Some_Other_Col col03
FROM IBB_Project A,
OUTER (IBB_R_Xxxx B, IBB_R_Yyyy C)
WHERE A.Xxxx_Iref = B.Xxxx_IrEf
AND B.Yyyy_Iref = C.Yyyy_Iref;
This means that the IBB_V_Project view has an outer self-join on
IBB_Project. The IBB_V_Post_Resp2 view involved 3 tables too, somewhat
like:
CREATE VIEW IBB_V_Post_Resp2 AS
SELECT A.Person_Iref,
A.Some_Other_Col col01,
B.Xxxx_Iref,
B.Some_Other_Col col02,
C.Yyyy_Iref,
C.Some_Other_Col col03
FROM IBB_Personnel A,
IBB_R_Xxxx B,
IBB_R_Yyyy C
WHERE A.Xxxx_Iref = B.Xxxx_Iref
AND B.Yyyy_Iref = C.Yyyy_Iref;
The Zzzz_Iref columns were either SERIAL or INTEGER foreign keys
referencing a SERIAL key.
The primary view definition refers to 14 tables, with 4 inner joins and 9 outer joins. When the cross-referenced views are taken into account, there are 18 tables in total, with 7 inner joins and 10 outer joins. The view was constructed semi-automatically by a database schema design tool.
A ---- B ---- D ---- E ---- H -----@------@------@------@------@------@------@------@
| | | | | | | |
G ---- X C F I K L M N
IBB_Project -----@
|
IBB_R_Xxxx -----@
|
IBB_R_Yyyy
IBB_Personnel ---- IBB_R_Xxxx ---- IBB_R_Yyyy