This content is for v5.2. Switch to the latest version for up-to-date documentation.
Distinct and GroupBy
In the Laravel-Elasticsearch integration, distinct() and groupBy() methods play a pivotal role in data aggregation, especially when retrieving unique field values or aggregating grouped data summaries.
The distinct() and groupBy() methods are used to retrieve unique values of a given field. But they differ significantly in how they’re implemented under the hood
distinct() uses: Nested Term Aggs
Section titled “distinct() uses: Nested Term Aggs”- Full results, cannot paginate
- Sort on column names and by doc_count (ES default is _doc_count desc)
- Ideal For:
- Quick unique field lookups by doc count
groupBy() uses: Composite Aggregation
Section titled “groupBy() uses: Composite Aggregation”- Can paginate
- Sort on column names only (no doc_count)
- Ideal For:
- Paginated results
- Fetching relationship data
- Aggregating on the grouped data (eg: sum, avg, min, max)
Distinct
Section titled “Distinct”Basic Usage
Section titled “Basic Usage”->distinct(string|array $field, $includeDocCount = false);
UserLog::where('created_at', '>=', Carbon::now()->subDays(30))->distinct('user_id');
// Alternative syntax, explicitly selecting the fieldUserLog::where('created_at', '>=', Carbon::now()->subDays(30))->select('user_id')->distinct();Retrieves All the unique user_ids of users logged in the last 30 days
{ "index": "user_logs", "body": { "query": { "range": { "created_at": { "gte": "2025-02-21T18:41:18+00:00" } } }, "size": 0, "aggs": { "by_user_id": { "terms": { "field": "user_id", "size": 1000 } } } }, "_source": [ "user_id" ]}Note on Limits
Section titled “Note on Limits”Distinct will run a Nested Term Aggs on a given query. The aggregation will return the top unique values, limited to 1000 by default.
- ex:
UserLog::distinct('user_id')- will run a term aggregation on all records and return the top 1000 ordered by doc_count desc - ex:
UserLog::limit(5)->distinct('user_id')- will run a term aggregation on all records and return the top 5 ordered by doc_count desc - ex:
UserLog::limit(10000)->distinct('user_id')- will run a term aggregation on all records and return the top 10,000 ordered by doc_count desc
Multiple Fields
Section titled “Multiple Fields”Multiple fields perform a nested terms aggregation. The results is a unique combination of the fields provided.
UserLog::where('created_at', '>=', Carbon::now()->subDays(30))->distinct(['status', 'log_code']);Retrieves All the unique combinations of status and log_code of users logged in the last 30 days
{ "index": "user_logs", "body": { "query": { "range": { "created_at": { "gte": "2025-02-21T18:42:19+00:00" } } }, "size": 0, "aggs": { "by_status": { "terms": { "field": "status", "size": 1000 }, "aggs": { "by_log_code": { "terms": { "field": "log_code", "size": 1000 } } } } } }, "_source": [ "status", "log_code" ]}Ordering by Aggregation Count
Section titled “Ordering by Aggregation Count”_count is a special internal alias used by the package to sort by aggregation result count.
_count is not available with groupBy().
UserLog::where('created_at', '>=', Carbon::now()->subDays(30))->orderByDesc('_count')->distinct('user_id');Retrieves All the unique user_ids of users logged in the last 30 days Ordered by most logs by a user
{ "index": "user_logs", "body": { "query": { "range": { "created_at": { "gte": "2025-02-21T18:47:14+00:00" } } }, "size": 0, "aggs": { "by_user_id": { "terms": { "field": "user_id", "size": 1000, "order": [ { "_count": "desc" } ] } } } }, "_source": [ "user_id" ]}Return with doc count
Section titled “Return with doc count”UserLog::where('created_at', '>=', Carbon::now()->subDays(30)) ->orderByDesc('_count') ->distinct('user_id',true);
//OR
UserLog::where('created_at', '>=', Carbon::now()->subDays(30)) ->orderByDesc('_count') ->select('user_id') ->distinct(true);Retrieves All the unique user_ids of users logged in the last 30 days Ordered by most logs by a user
{ "index": "user_logs", "body": { "query": { "range": { "created_at": { "gte": "2025-02-21T18:47:14+00:00" } } }, "size": 0, "aggs": { "by_user_id": { "terms": { "field": "user_id", "size": 1000, "order": [ { "_count": "desc" } ] } } } }, "_source": [ "user_id" ]}{ "user_id": "1", "user_id_count": 24},{ "user_id": "2", "user_id_count": 18},{ "user_id": "3", "user_id_count": 8},GroupBy
Section titled “GroupBy”...->groupBy(string|array $field)->....
Basic Usage
Section titled “Basic Usage”UserLog::where('created_at', '>=', Carbon::now()->subDays(30))->groupBy(['user_id'])->get();Retrieves All the unique user_ids of users logged in the last 30 days
{ "index": "user_logs", "body": { "query": { "range": { "created_at": { "gte": "2025-02-22T09:20:46+00:00" } } }, "size": 0, "aggs": { "group_by": { "composite": { "sources": [ { "user_id": { "terms": { "field": "user_id.keyword" } } } ] } } } }, "_source": [ "*" ]}GroupBy take and skip
Section titled “GroupBy take and skip”If no limit is provided, then the aggregation will return the first 10 results as per Elasticsearch default.
You can set the limit to whatever you want, but it is not necessary to cover all the hits of your query since the composite is pagable.
UserLog::groupBy(['user_id'])->take(15)->skip(4)->get();{ "index": "user_logs", "body": { "size": 0, "aggs": { "group_by": { "composite": { "sources": [ { "user_id": { "terms": { "field": "user_id.keyword" } } } ], "size": 15, "after": { "user_id": "158e6cc9-fa3e-3258-99c2-571d052e923c" } } } } }, "_source": [ "*" ]}Multiple Fields
Section titled “Multiple Fields”UserLog::where('created_at', '>=', Carbon::now()->subDays(30))->groupBy(['user_id','status'])->get();{ "index": "user_logs", "body": { "query": { "range": { "created_at": { "gte": "2025-02-22T11:26:48+00:00" } } }, "size": 0, "aggs": { "group_by": { "composite": { "sources": [ { "user_id": { "terms": { "field": "user_id.keyword" } } }, { "status": { "terms": { "field": "status" } } } ] } } } }, "_source": [ "*" ]}Model Relations
Section titled “Model Relations”The results from groupBy() queries are returned as ElasticCollections, enabling the use of Laravel’s rich collection methods for further manipulation or processing.
Example: User has many user logs:
// Loading related user data from the distinct user_ids$users = UserLog::where('created_at', '>=', Carbon::now()->subDays(30))->groupBy(['user_id'])->get();return $users->load('user');GroupBy Pagination with Relations
Section titled “GroupBy Pagination with Relations”Example:
UserLog::where('created_at', '>=', Carbon::now()->subDays(30)) ->groupBy(['user_id']) ->paginate(10)->through(function ($userLog) { $userLog->load('user'); return $userLog; });