Skip to content

Distinct and GroupBy

In the Laravel-Elasticsearch integration, distinct() and groupBy() methods play a pivotal role in data aggregation, especially when retrieving unique field values or aggregating grouped data summaries.


The distinct() and groupBy() methods are used to retrieve unique values of a given field. But they differ significantly in how they’re implemented under the hood

distinct() uses: Nested Term Aggs

  • Full results, cannot paginate
  • Sort on column names and by doc_count (ES default is _doc_count desc)
  • Ideal For:
    • Quick unique field lookups by doc count

groupBy() uses: Composite Aggregation

  • Can paginate
  • Sort on column names only (no doc_count)
  • Ideal For:
    • Paginated results
    • Fetching relationship data
    • Aggregating on the grouped data (eg: sum, avg, min, max)
distinct(string|array $field, $includeDocCount = false)
Section titled “distinct(string|array $field, $includeDocCount = false)”
UserLog::where('created_at', '>=', Carbon::now()->subDays(30))
->distinct('user_id');
// Alternative syntax, explicitly selecting the field
UserLog::where('created_at', '>=', Carbon::now()->subDays(30))
->select('user_id')->distinct();

Retrieves All the unique user_ids of users logged in the last 30 days

{
"index": "user_logs",
"body": {
"query": {
"range": {
"created_at": {
"gte": "2025-02-21T18:41:18+00:00"
}
}
},
"size": 0,
"aggs": {
"by_user_id": {
"terms": {
"field": "user_id",
"size": 1000
}
}
}
},
"_source": [
"user_id"
]
}

Distinct will run a Nested Term Aggs on a given query. The aggregation will return the top unique values, limited to 10 by default.

  • ex: UserLog::distinct('user_id') - will run a term aggregation on all records and return the top 10 ordered by doc_count desc
  • ex: UserLog::limit(5)->distinct('user_id') - will run a term aggregation on all records and return the top 5 ordered by doc_count desc
  • ex: UserLog::limit(10000)->distinct('user_id') - will run a term aggregation on all records and return the top 10,000 ordered by doc_count desc

Multiple fields perform a nested terms aggregation. The results is a unique combination of the fields provided.

UserLog::where('created_at', '>=', Carbon::now()->subDays(30))
->distinct(['status', 'log_code']);

Retrieves All the unique combinations of status and log_code of users logged in the last 30 days

{
"index": "user_logs",
"body": {
"query": {
"range": {
"created_at": {
"gte": "2025-02-21T18:42:19+00:00"
}
}
},
"size": 0,
"aggs": {
"by_status": {
"terms": {
"field": "status",
"size": 10
},
"aggs": {
"by_log_code": {
"terms": {
"field": "log_code",
"size": 10
}
}
}
}
}
},
"_source": [
"status",
"log_code"
]
}

_count is a special internal alias used by the package to sort by aggregation result count.

_count is not available with groupBy().

UserLog::where('created_at', '>=', Carbon::now()->subDays(30))
->orderByDesc('_count')
->distinct('user_id');

Retrieves All the unique user_ids of users logged in the last 30 days Ordered by most logs by a user

{
"index": "user_logs",
"body": {
"query": {
"range": {
"created_at": {
"gte": "2025-02-21T18:47:14+00:00"
}
}
},
"size": 0,
"aggs": {
"by_user_id": {
"terms": {
"field": "user_id",
"size": 10,
"order": [
{
"_count": "desc"
}
]
}
}
}
},
"_source": [
"user_id"
]
}
UserLog::where('created_at', '>=', Carbon::now()->subDays(30))
->orderByDesc('_count')
->distinct('user_id',true);
//OR
UserLog::where('created_at', '>=', Carbon::now()->subDays(30))
->orderByDesc('_count')
->select('user_id')
->distinct(true);

Retrieves All the unique user_ids of users logged in the last 30 days Ordered by most logs by a user

{
"index": "user_logs",
"body": {
"query": {
"range": {
"created_at": {
"gte": "2025-02-21T18:47:14+00:00"
}
}
},
"size": 0,
"aggs": {
"by_user_id": {
"terms": {
"field": "user_id",
"size": 10,
"order": [
{
"_count": "desc"
}
]
}
}
}
},
"_source": [
"user_id"
]
}
{
"user_id": "1",
"user_id_count": 24
},
{
"user_id": "2",
"user_id_count": 18
},
{
"user_id": "3",
"user_id_count": 8
},
Version 5.3+

As of v5.3, distinct results are returned as ElasticCollections, enabling the use of Laravel’s rich collection methods for further manipulation or processing.

If a model relation is defined and you’ve aggregated on the foreign_key, you can load the related model

UserLog::where('created_at', '>=', Carbon::now()->subDays(30))
->with('user')
->orderByDesc('_count')
->select('user_id')
->distinct(true);

Retrieves All the unique user_ids of users logged in the last 30 days Ordered by most logs by a user

{
"index": "user_logs",
"body": {
"query": {
"range": {
"created_at": {
"gte": "2025-02-21T18:47:14+00:00"
}
}
},
"size": 0,
"aggs": {
"by_user_id": {
"terms": {
"field": "user_id",
"size": 10,
"order": [
{
"_count": "desc"
}
]
}
}
}
},
"_source": [
"user_id"
]
}
{
"user_id": "1",
"user_id_count": 24,
"user": {
"id": "1",
"name": "John Doe",
...
}
},
{
"user_id": "2",
"user_id_count": 18,
"user": {
"id": "2",
"name": "Jane Smith",
...
}
},
{
"user_id": "3",
"user_id_count": 8,
"user": {
"id": "3",
"name": "Alice Johnson",
...
}
},

Version 5.3+
bulkDistinct(array $fields, $includeDocCount = false)
Section titled “bulkDistinct(array $fields, $includeDocCount = false)”

To perform distinct on multiple fields in a single query, you can use the bulkDistinct() method.

This operates similarly to distinct(), but specifying multiple fields does not create nested aggregations. Instead, it runs separate term aggregations for each field provided.

As with distinct(), setting true for $includeCount will return the doc_count for each unique value.

$top3 = UserSession::where('created_at', '>=', Carbon::now()->subDays(30))
->limit(3)
->bulkDistinct(['country', 'device', 'browser_name'], true);
{
"index": "user_sessions",
"body": {
"query": {
"range": {
"created_at": {
"gte": "2025-12-21T13:02:07+00:00"
}
}
},
"size": 0,
"aggs": {
"by_country": {
"terms": {
"field": "country.keyword",
"size": 3
}
},
"by_device": {
"terms": {
"field": "device.keyword",
"size": 3
}
},
"by_browser_name": {
"terms": {
"field": "browser_name.keyword",
"size": 3
}
}
}
},
"_source": [
"country",
"device",
"browser_name"
]
}
[
{
"country": "US",
"country_count": 737
},
{
"country": "ZA",
"country_count": 148
},
{
"country": "GB",
"country_count": 105
},
{
"device": "desktop",
"device_count": 2287
},
{
"device": "mobile",
"device_count": 1574
},
{
"device": "tablet",
"device_count": 1139
},
{
"browser_name": "Chrome",
"browser_name_count": 1569
},
{
"browser_name": "Safari",
"browser_name_count": 1496
},
{
"browser_name": "Firefox",
"browser_name_count": 1139
}
]

UserLog::where('created_at', '>=', Carbon::now()->subDays(30))
->groupBy(['user_id'])
->get();

Retrieves All the unique user_ids of users logged in the last 30 days

{
"index": "user_logs",
"body": {
"query": {
"range": {
"created_at": {
"gte": "2025-02-22T09:20:46+00:00"
}
}
},
"size": 0,
"aggs": {
"group_by": {
"composite": {
"sources": [
{
"user_id": {
"terms": {
"field": "user_id.keyword"
}
}
}
]
}
}
}
},
"_source": [
"*"
]
}

If no limit is provided, then the aggregation will return the first 10 results as per Elasticsearch default.

You can set the limit to whatever you want, but it is not necessary to cover all the hits of your query since the composite is pagable.

UserLog::groupBy(['user_id'])
->take(15)->skip(4)
->get();
{
"index": "user_logs",
"body": {
"size": 0,
"aggs": {
"group_by": {
"composite": {
"sources": [
{
"user_id": {
"terms": {
"field": "user_id.keyword"
}
}
}
],
"size": 15,
"after": {
"user_id": "158e6cc9-fa3e-3258-99c2-571d052e923c"
}
}
}
}
},
"_source": [
"*"
]
}
UserLog::where('created_at', '>=', Carbon::now()->subDays(30))
->groupBy(['user_id','status'])
->get();
{
"index": "user_logs",
"body": {
"query": {
"range": {
"created_at": {
"gte": "2025-02-22T11:26:48+00:00"
}
}
},
"size": 0,
"aggs": {
"group_by": {
"composite": {
"sources": [
{
"user_id": {
"terms": {
"field": "user_id.keyword"
}
}
},
{
"status": {
"terms": {
"field": "status"
}
}
}
]
}
}
}
},
"_source": [
"*"
]
}

The results from groupBy() queries are returned as ElasticCollections, enabling the use of Laravel’s rich collection methods for further manipulation or processing.

Example: User has many user logs:

// Loading related user data from the distinct user_ids
$users = UserLog::with('user')
->where('created_at', '>=', Carbon::now()->subDays(30))
->groupBy(['user_id'])
->get();
Loads the related user data for the distinct user_ids

Example:

UserLog::where('created_at', '>=', Carbon::now()->subDays(30))
->groupBy(['user_id'])
->paginate(10)->through(function ($userLog) {
$userLog->load('user');
return $userLog;
}
);

Version 5.3+
groupByRanges(string $field, array $ranges)
Section titled “groupByRanges(string $field, array $ranges)”

groupByRanges() performs a range aggregation on the specified field.

This is useful to count records that fall within specific ranges, or aggregate data based on those ranges.

The range paramter is an array of ranges. Each range can be defined as:

  • An associative array with from and/or to keys with optional key key for naming the range
  • An array of [$from, $to] values
$ranges = [
[
'key' => 'low-stock',
'to' => 5,
],
[
'key' => 'medium-stock',
'from' => 5,
'to' => 15,
],
[
'key' => 'high-stock',
'from' => 15,
]
];
$groups = Item::groupByRanges('stock', $ranges)
->get();
{
"index": "items",
"body": {
"size": 0,
"aggs": {
"stock_range": {
"range": {
"field": "stock",
"ranges": [
{
"key": "low-stock",
"to": 5
},
{
"key": "medium-stock",
"from": 5,
"to": 15
},
{
"key": "high-stock",
"from": 15
}
]
}
}
}
},
"_source": [
"*"
]
}
[
{
"count_stock_range_low-stock": 2,
},
{
"count_stock_range_medium-stock": 6
},
{
"count_stock_range_high-stock": 2
}
]

You can also perform sub-aggregations on the grouped ranges. For example, to get the average, min and max price of items in each stock range:

$ranges = [
[
'key' => 'low-stock',
'to' => 5,
],
[
'key' => 'medium-stock',
'from' => 5,
'to' => 15,
],
[
'key' => 'high-stock',
'from' => 15,
]
];
$groups = Item::groupByRanges('stock', $ranges)
->agg(['min', 'max', 'avg'], 'price');
{
"index": "items",
"body": {
"size": 0,
"aggs": {
"stock_range": {
"range": {
"field": "stock",
"ranges": [
{
"to": 5,
"key": "low-stock"
},
{
"from": 5,
"to": 15,
"key": "medium-stock"
},
{
"from": 15,
"key": "high-stock"
}
]
},
"aggs": {
"min_price": {
"min": {
"field": "price"
}
},
"max_price": {
"max": {
"field": "price"
}
},
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
}
[
{
"count_stock_range_low-stock": 2,
"min_price_stock_range_low-stock": 8,
"max_price_stock_range_low-stock": 1500,
"avg_price_stock_range_low-stock": 754
},
{
"count_stock_range_medium-stock": 6,
"min_price_stock_range_medium-stock": 6,
"max_price_stock_range_medium-stock": 900,
"avg_price_stock_range_medium-stock": 173.83333333333334
},
{
"count_stock_range_high-stock": 2,
"min_price_stock_range_high-stock": 3,
"max_price_stock_range_high-stock": 350,
"avg_price_stock_range_high-stock": 176.5
}
]
Version 5.3+
groupByDateRanges(string $field, array $ranges, $options = [])
Section titled “groupByDateRanges(string $field, array $ranges, $options = [])”

groupByDateRanges() performs a date range aggregation on the specified field.

Similar to groupByRanges(), but specifically for date fields where the from and to values can be expressed in date math format. It also accepts an optional $options array to specify date formats.

$ranges = [
['to' => 'now-10M/M'],
['from' => 'now-10M/M'],
];
$options = ['format' => 'MM-yyyy'];
UserSession::groupByDateRanges('created_at', $ranges, $options)->get();
{
"index": "user_sessions",
"body": {
"size": 0,
"aggs": {
"created_at_range": {
"date_range": {
"field": "created_at",
"format": "MM-yyyy",
"ranges": [
{
"to": "now-10M/M"
},
{
"from": "now-10M/M"
}
]
}
}
}
}
}
Version 5.3+

Since each distinct(), bulkDistinct(),groupBy(), groupByRanges(), and groupByDateRanges() are returned as an ElasticCollections, you can access the bucket meta values using the getMetaValue('bucket') method.

$ranges = [
[
'key' => 'low-stock',
'to' => 5,
],
[
'key' => 'medium-stock',
'from' => 5,
'to' => 15,
],
[
'key' => 'high-stock',
'from' => 15,
]
];
$groups = Item::groupByRanges('stock', $ranges)
->agg(['min', 'max', 'avg'], 'price');
$buckets = $groups->map(function ($group) {
return $group->getMetaValue('bucket');
});
// return $buckets;
[
{
"key": "low-stock",
"to": 5,
"doc_count": 2,
"min_price": {
"value": 8
},
"avg_price": {
"value": 754
},
"max_price": {
"value": 1500
}
},
{
"key": "medium-stock",
"from": 5,
"to": 15,
"doc_count": 6,
"min_price": {
"value": 6
},
"avg_price": {
"value": 173.83333333333334
},
"max_price": {
"value": 900
}
},
{
"key": "high-stock",
"from": 15,
"doc_count": 2,
"min_price": {
"value": 3
},
"avg_price": {
"value": 176.5
},
"max_price": {
"value": 350
}
}
]