π $group
In SQL, the GROUP BY statement groups rows that have the same values into summary rows, like "find the number of customers in each country." The GROUP BY statement is often used with aggregate functions (COUNT(), MAX(), MIN(), SUM(), AVG()) to group the result-set by one or more columns.
The $group stage in MongoDBβs Aggregation Framework is equivalent to GROUP BY in SQL. It allows us to group documents by a specific field and apply aggregate functions like sum, average, count, min, and max.
π¬ How does $group work?β
The $group stage groups documents based on a field and performs calculations on grouped data.
Syntaxβ
{
$group: {
_id: <expression>,
<field>: { <accumulator>: <expression> }
}
}
_id: The field to group by (usenullto aggregate all documents together)<accumulator>: An aggregation operator ($sum,$avg,$min,$max,$push,$addToSet, etc.)
π Example 1: Count the number of books published every yearβ
MongoDB queryβ
db.books.aggregate([
{
$group: {
_id: "$year",
totalBooks: { $sum: 1 },
},
},
]);
Equivalent SQL queryβ
SELECT year, COUNT(*) AS totalBooks
FROM books
GROUP BY year;
Sample outputβ
[
{"_id": 1980,"totalBooks": 42},
{"_id": 2000,"totalBooks": 490},
{"_id": 1981,"totalBooks": 45},
...
]
π Example 2: Without using $group, count the number of books published every yearβ
MongoDB queryβ
db.books.aggregate([
{
$sortByCount: "$year",
},
]);
Sample output remains the same as beforeβ
[
{"_id": 1980,"totalBooks": 42},
{"_id": 2000,"totalBooks": 490},
{"_id": 1981,"totalBooks": 45},
...
]
π Example 3: Find the total number of pages published every year.β
MongoDB queryβ
db.books.aggregate([
{
$group: {
_id: "$year",
totalPages: { $sum: "$pages" },
},
},
]);
Equivalent SQL queryβ
SELECT year, SUM(rating) AS totalPages
FROM books
GROUP BY year;
Sample outputβ
[
{ "_id": 1955, "totalPages": 664 },
{ "_id": 1952, "totalPages": 416 },
{ "_id": 1899, "totalPages": 128 }
...
]
π Challengeβ
π 1. Find the average book rating of all booksβ
Answer
- JavaScript
- C#
await reviews.aggregate([
{
$group: {
_id: "$_id.bookId",
avgRating: { $avg: "$rating" }
}
}
]).toArray();
public class BookRating
{
[BsonId]
public string BookId { get; set; }
[BsonElement("avgRating")]
public double AvgRating { get; set; }
}
var pipeline = reviewsCollection.Aggregate()
.Group(
r => r.BookId,
g => new BookRating
{
BookId = g.Key,
AvgRating = g.Average(r => r.Rating.Value)
}
);
var booksWithAvg = pipeline.ToList();
π 2. Find users with the most number of reviews (Hint: use the name field in the reviews collection)β
Answer
There are 2 ways to solve this-
- $group with $sort
- $sortByCount
- JavaScript
- C#
- $group with $sum + $sort
- $sortByCount
await reviews.aggregate([
{
$group: {
_id: "$name",
totalReviews: { $sum: 1 },
},
},
{
$sort: {
totalReviews: -1,
},
},
]).toArray();
await reviews.aggregate([
{
$sortByCount: "$name",
},
]).toArray();
- $group with $sum + $sort
- $sortByCount
public class ReviewerCount
{
[BsonId]
public string Name { get; set; }
[BsonElement("totalReviews")]
public int TotalReviews { get; set; }
}
var pipeline = reviewsCollection.Aggregate()
.Group(
r => r.Name,
g => new ReviewerCount
{
Name = g.Key,
TotalReviews = g.Count()
}
)
.SortByDescending(r => r.TotalReviews);
var result = await pipeline.ToListAsync();
var pipeline = reviewsCollection.Aggregate()
.AppendStage<BsonDocument>(
new BsonDocument("$sortByCount", "$name")
);
var result = pipeline.ToList();