📘 SQL vs Aggregations
SQL​
A SQL query statement usually starts with a SELECT
where we put a list of the fields we need, then a FROM
clause to state the table/s (or in this case, the subquery) that will return the rows. We can filter out using WHERE
and group data using GROUP
.
We read this from the inside. If there's too much nesting, it's not easy to follow.
Get authors’ bios with books that have an average 5-star rating​
SELECT authors.name, authors.bio
FROM library.authors authors
JOIN library.author_book author_book_join ON authors.id = author_book_join.author_id
JOIN (
SELECT books.id
FROM library.books books
JOIN library.reviews reviews ON books.id = reviews.book_id
GROUP BY books.id
HAVING AVG(reviews.rating) = 5
) five_star_books ON author_book_join.book_id = five_star_books.id;
Get annual, average and max spending from customers in all cities​
SELECT
city,
SUM(annual_spend) Total_Spend,
AVG(annual_spend) Average_Spend,
MAX(annual_spend) Max_Spend,
COUNT(annual_spend) customers
FROM (
SELECT t1.city, customer.annual_spend
FROM customer
LEFT JOIN (
SELECT address.address_id, city.city,
address.customer_id, address.location
FROM address LEFT JOIN city
ON address.city_id = city.city_id
) AS t1
ON
(customer.customer_id = t1.customer_id AND
t1.location = "home")
) AS t2
GROUP BY city;
Equivalent MongoDB aggregation pipeline​
Get authors’ bios with books that have an average 5-star rating​
We o through 4 stages:
- group all the reviews for every book, calculating the average rating.
- filter out all average ratings other than 5.
- now we have reviews with 5 stars, but we also want the author bio, so we join with author to get the bio.
- we add a new field called
bio
with just the author's bio.
db.reviews.aggregate([
{
$group: {
_id: "$bookId",
averageRating: {
$avg: "$rating",
},
},
},
{ $match: { averageRating: 5 } },
{ $lookup: {
from: "authors",
localField: "_id",
foreignField: "books",
as: "author",
},
},
{$addFields: {
bio: "$author.bio"
}},
])
Get annual, average and max spending from customers in all cities​
Here we pass three stages: one to return one document per element in the address
array, and then we filter out to get only the documents that have a home
address location. Finally, we do the grouping. As we'll see, this can be split and tested separately and resembles our code.
db.customers.aggregate([
{
$unwind: "$address",
},
{
$match: {"address.location": "home"}
},
{
$group: {
_id: "$address.city",
totalSpend: {$sum: "$annualSpend"},
averageSpend: {$avg: "$annualSpend"},
maximumSpend: {$max: "$annualSpend"},
customers: {$sum: 1}
}
}
])
info
See also SQL to Aggregation Mapping Chart