One of the greatest benefits of using relational databases such as MySQL is that its relational structure allows you to easily store and query information across multiple tables.
Let's explore how to retrieve exactly the data you want from multiple database tables, and the various joins that are available which allow you to pull the exact results you want.
Initialize Sample Database
This is not required, but if you would like to follow along with the examples in this article, you may initialize a sample database locally with the below terminal commands:
git clone https://github.com/mdizak/sample-select-db.git
cd sample-select-db
sudo mysql < store.sql
sudo mysql sampledb
mysql> SELECT COUNT(*) FROM customers;
You should get a result stating there are 2000 rows within the customers table.
Default / INNER Join
The default join used within MySQL databases is called the INNER join, and is the most common and straight forward. This join returns all records for which there are matching records in both tables, and dismisses all other records.
For example, if you would like to see the customer's first and last names, plus order amount and date for all orders greater than $1000 you could use the following SQL statement:
SELECT
c.id, c.first_name, c.last_name, o.amount, o.created_at
FROM
customers c, orders o
WHERE
o.customer_id = c.id AND o.amount >= 1000;
A few notes regarding the above query:
- Five different columns are being selected, three from the customers table and two from the orders table.
- Within the FROM clause the two tables are defined, but suffixed with the letters "c" and "o". These simply specify aliases within SQL, can be anything you wish, and are used to shorten the SQL query.
- The o.customer_id = c.id is the join aspect of the query, and and ensures proper correlation between customers and orders.
A different, and technically more syntactically correct way to write the same query is below:
SELECT
c.id, c.first_name, c.last_name, o.amount, o.created_at
FROM
customers c INNER JOIN orders o
ON
customer_id = c.id
WHERE
o.amount >= 1000;
The above query tends to be a little easier to read as you can easily see the join between the customers and orders table. For all intents though, these two queries are the same, and will produce exactly the same records.
LEFT Joins
Left joins will return all records from the left table that also match with records from the right table, and dismiss all other records. For example, maybe you want to view the total amount of sales for each product in the database, you may try using a query such as:
SELECT
p.name, sum(item.amount) AS tamount
FROM
orders_items item LEFT JOIN products p
ON
item.product_id = p.id
GROUP BY item.product_id ORDER BY tamount DESC
This results in a nice two column view showing the product name with the total sales amount, and works as expected. The query went through all products within the orders_items table, joined them to records within the products table, and returned the total sales amount of each.
RIGHT Joins
Using the above example, take notice of the fact the above query only returned 19 records while there's a total of 22 products in the database. This is because the query began with the orders_items table and left joined it to the products table, and since some products have never been ordered, no records of those products exist within the orders_items table.
What happens if you want to get a list of all products with sales amounts, including products that have not been ordered? Try a right join with the following query:
SELECT
p.name, sum(item.amount) AS tamount
FROM
orders_items item RIGHT JOIN products p
ON
item.product_id = p.id
GROUP BY p.id ORDER BY tamount DESC
That's better, and the query now returns the full 22 products with three of them having an amount of null. This is because instead of using orders_items as the primary table that joins to the products table, the right join flips the order and joins the products table to the orders_items table.
Multiple Joins in a Query
Sometimes you have a need to join three or more tables together to get a specific set of results.
For example, maybe you want a list of all customers who have purchased the microwave (product id# 1), including their name and order date. This requires a SELECT across three tables which can be done by using two joins with the following query:
SELECT
c.first_name, c.last_name, o.amount, o.created_at
FROM
customers c INNER JOIN orders o
ON
c.id = o.customer_id INNER JOIN orders_items item
ON
item.order_id = o.id
WHERE
item.product_id = 1 ORDER BY o.created_at;
This query returns all 426 orders of the microwave, and works as expected. It first matches all customers to their respective orders, then further queries that result set by matching all orders to only those within the orders_items table that contain the microwave product (id# 1).
Never Use Sub-Queries with IN Clauses
As a quick sidenote, at all costs you should always avoid using sub-queries with within your SQL queries such as:
SELECT first_name,last_name FROM customers WHERE id IN (SELECT customer_id FROM orders WHERE status = 'approved' AND amount < 100);
Queries such as above are very inefficient, use a large number of resources, and should be avoided as much as possible. Instead, use proper joins as outlined in the above sections. For example, the above query should be re-written as:
SELECT c.first_name, c.last_name FROM customers c LEFT JOIN orders o ON o.customer_id = c.id WHERE o.status = 'approved' AND o.amount < 100;
Save Time With SQL Joins
This article has hopefully help show you the power of relational databases such as MySQL, and how to build SQL queries that retrieve records from multiple tables within one query using joins, allowing you to retrieve the exact results desired.
You have learned three different joins within SQL, how to alias column and table names, use multiple joins in one query, and why you should avoid sub-queries. Never scramble around again trying to manually compile different data sets into one, and begin using joins to impress your work colleagues and save time.