fix: clarify behavior of agg funcs regarding to nulls #78

Merged (1 commit) on Nov 25, 2024
89 changes: 49 additions & 40 deletions sql/functions/aggregate.mdx
@@ -9,18 +9,17 @@ For details about the supported syntaxes of aggregate expressions, see [Aggregat

### `array_agg`

Returns an array from input values in which each value in the set is assigned to an array element. The `ORDER BY` clause is optional and specifies the order of rows processed in the aggregation, which determines the order of the elements in the result array.

```bash
array_agg ( expression [ ORDER BY [ sort_expression { ASC | DESC } ] ] ) -> output_array
Collects all the input values, including nulls, into an array. The `ORDER BY` clause is optional and specifies the order of rows processed in the aggregation, which determines the order of the elements in the result array.

```sql
array_agg ( expression [ ORDER BY sort_expression ] ) -> output_array
```

### `avg`

Returns the average (arithmetic mean) of the selected values.
Returns the average (arithmetic mean) of all non-null input values or null if no non-null values are provided.

```bash
```sql
avg ( expression ) -> see description
```
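
The null handling described for `avg` can be checked with a small sketch. This assumes SQLite (via Python's built-in `sqlite3` module) as a stand-in engine, since it follows the same null-skipping convention for `avg`; the table `t` and column `v` are hypothetical names, and this is not output from the documented system itself.

```python
import sqlite3

# Assumption: SQLite is used as a stand-in engine; table `t` and column `v`
# are hypothetical names chosen for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,)])

# The null row is skipped, so the mean is (1 + 3) / 2 rather than (1 + 3) / 3.
avg_mixed = conn.execute("SELECT avg(v) FROM t").fetchone()[0]

# With only null inputs there is nothing to average, so the result is null.
avg_all_null = conn.execute(
    "SELECT avg(v) FROM t WHERE v IS NULL"
).fetchone()[0]

print(avg_mixed, avg_all_null)
```

Python's `None` stands for SQL null in both the inserted rows and the returned result.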

@@ -33,92 +32,101 @@ Return type is numeric for integer inputs and double precision for float point i

Returns the bitwise AND of all non-null input values or null if no non-null values are provided.

```bash
bit_and ( smallint, int, or bigint ) -> same as input type
```sql
bit_and ( smallint | int | bigint ) -> same as input type
```

### `bit_or`

Returns the bitwise OR of all non-null input values or null if no non-null values are provided.

```sql
bit_or ( smallint, int, or bigint ) -> same as input type
bit_or ( smallint | int | bigint ) -> same as input type
```

### `bool_and`
Returns true if all input values are true, otherwise false.

Returns true if all non-null input values are true, otherwise false.

```sql
bool_and ( boolean ) -> boolean
```

### `bool_or`

Returns true if at least one input value is true, otherwise false.
Returns true if any non-null input value is true, otherwise false.

```sql
bool_or ( boolean ) -> boolean
```

### `count`

Returns the number of non-null rows.
Returns the number of non-null input values.

```bash
```sql
count ( expression ) -> bigint
```

The input can be of any supported data type.

### `count(*)`

Returns the number of rows in the input.

```sql
count(*) -> bigint
```
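
The difference between `count ( expression )` and `count(*)` shows up as soon as a row holds a null. The sketch below again assumes SQLite through Python's `sqlite3` module as a stand-in engine, with a hypothetical table `t` and column `v`.

```python
import sqlite3

# Assumption: SQLite is used as a stand-in engine; `t` and `v` are
# hypothetical names chosen for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,)])

# count(v) counts only the non-null values of v.
count_values = conn.execute("SELECT count(v) FROM t").fetchone()[0]

# count(*) counts every input row, including the one where v is null.
count_rows = conn.execute("SELECT count(*) FROM t").fetchone()[0]

print(count_values, count_rows)
```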

### `jsonb_agg`

Aggregates values, including nulls, as a JSON array. The `ORDER BY` clause is optional and specifies the order of rows processed in the aggregation, which determines the order of the elements in the result array.
Collects all the input values, including nulls, into a JSON array. The `ORDER BY` clause is optional and specifies the order of rows processed in the aggregation, which determines the order of the elements in the result array.

```bash
jsonb_agg ( any_element ) -> jsonb
```sql
jsonb_agg ( any_element [ ORDER BY sort_expression ] ) -> jsonb
```

### `jsonb_object_agg`

Aggregates name/value pairs as a JSON object.
Aggregates name/value pairs as a JSON object. Values can be null, but keys cannot.

```bash
jsonb_object_agg ( key "string" , value "any" ) -> jsonb
```sql
jsonb_object_agg ( key "text" , value "any" ) -> jsonb
```

### `max`

Returns the maximum value in a set of values.
Returns the maximum of the non-null input values, or null if no non-null values are provided.

```bash
```sql
max ( expression ) -> same as input type
```

Input can be of any numeric, string, date/time, or interval type, or an array of these types.

### `min`

Returns the minimum value in a set of values.
Returns the minimum value of the non-null input values, or null if no non-null values are provided.

```bash
```sql
min ( expression ) -> same as input type
```

Input can be of any numeric, string, date/time, or interval type, or an array of these types.

### `string_agg`

Combines non-null values into a string, separated by `delimiter_string`. The `ORDER BY` clause is optional and specifies the order of rows processed in the aggregation, which determines the order of the elements in the result array.
Concatenates non-null input values into a string. Each value after the first is preceded by the corresponding delimiter (if it's not null). If no non-null values are provided, returns null. The `ORDER BY` clause is optional and specifies the order of rows processed in the aggregation, which determines the order of the elements in the result array.

```bash
string_agg ( expression, delimiter_string ) -> output_string
```sql
string_agg ( value text, delimiter text [ ORDER BY sort_expression ] ) -> output_string
```
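
The concatenation rule in the new `string_agg` description can be modeled in a few lines of Python. This is only a sketch of the semantics as worded above, not the engine's implementation; the function name and the `(value, delimiter)` row shape are assumptions made for illustration.

```python
from typing import Iterable, Optional, Tuple


def string_agg(
    rows: Iterable[Tuple[Optional[str], Optional[str]]]
) -> Optional[str]:
    """Model of the described semantics; each row is (value, delimiter)."""
    result: Optional[str] = None
    for value, delimiter in rows:
        if value is None:
            continue  # null values are skipped entirely
        if result is None:
            result = value  # the first non-null value gets no delimiter
        else:
            # each later value is preceded by its own delimiter;
            # a null delimiter contributes nothing
            result += (delimiter or "") + value
    return result


joined = string_agg([("a", ","), (None, ","), ("b", ","), ("c", None)])
all_null = string_agg([(None, ","), (None, ",")])
print(joined, all_null)
```

In this model the null value in the second row is dropped along with its delimiter, and the null delimiter in the last row means `c` is appended directly after `b`.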

### `sum`

Returns the sum of all input values.
Returns the sum of all non-null input values, or null if no non-null values are provided.

```bash
```sql
sum ( expression )
```

@@ -128,19 +136,19 @@ Return type is bigint for smallint or int inputs, numeric for bigint inputs, oth

### `first_value`

Returns the first value in an ordered set of values.
Returns the first value in an ordered set of values, including nulls.

```bash
```sql
first_value ( expression ORDER BY order_key ) -> same as input type
```

`order_key` is the column or expression used to determine the order of the values. It is required to make the result deterministic.

### `last_value`

Returns the last value in an ordered set of values.
Returns the last value in an ordered set of values, including nulls.

```bash
```sql
last_value ( expression ORDER BY order_key ) -> same as input type
```

@@ -150,31 +158,31 @@ last_value ( expression ORDER BY order_key ) -> same as input type

Calculates the population standard deviation of the input values. Returns `NULL` if the input contains no non-null values.

```bash
```sql
stddev_pop ( expression ) -> output_value
```

### `stddev_samp`

Calculates the sample standard deviation of the input values. Returns `NULL` if the input contains fewer than two non-null values.

```bash
```sql
stddev_samp ( expression ) -> output_value
```

### `var_pop`

Calculates the population variance of the input values. Returns `NULL` if the input contains no non-null values.

```bash
```sql
var_pop ( expression ) -> output_value
```

### `var_samp`

Calculates the sample variance of the input values. Returns `NULL` if the input contains fewer than two non-null values.

```bash
```sql
var_samp ( expression ) -> output_value
```

@@ -188,7 +196,7 @@ At present, ordered-set aggregate functions support only constant fraction argum
Computes the mode, which is the most frequent value of the aggregated argument. If there are multiple equally-frequent values, it arbitrarily chooses the first one.

```sql
mode () WITHIN GROUP ( ORDER BY sort_expression anyelement ) -> same as sort_expression
mode () WITHIN GROUP ( ORDER BY sort_expression ) -> same as sort_expression
```

`sort_expression`: Must be of a sortable type.
@@ -207,7 +215,7 @@ At present, `percentile_cont` is not supported for [streaming queries](/docs/cur

Computes the continuous percentile, which is a value corresponding to the specified fraction within the ordered set of aggregated argument values. It can interpolate between adjacent input items if needed.

```bash
```sql
percentile_cont ( fraction double precision ) WITHIN GROUP ( ORDER BY sort_expression double precision ) -> double precision

```
@@ -216,14 +224,15 @@ percentile_cont ( fraction double precision ) WITHIN GROUP ( ORDER BY sort_expre

This example calculates the median (50th percentile) of the values in `column1` from `table1`.

```bash
```sql
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY column1) FROM table1;
```

If NULL is provided, the function will not calculate a specific percentile and return NULL instead.
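
To make the interpolation concrete, here is a rough Python model of a continuous percentile. It assumes the common linear-interpolation rule (position `fraction * (N - 1)` in the sorted, zero-indexed values) and assumes that null inputs are skipped; neither detail is stated in the text above, so treat this as a sketch rather than the engine's exact algorithm. The function name mirrors the SQL one only for readability.

```python
import math
from typing import Optional, Sequence


def percentile_cont(
    fraction: Optional[float], values: Sequence[Optional[float]]
) -> Optional[float]:
    """Sketch of a continuous percentile with linear interpolation."""
    if fraction is None:
        return None  # a null fraction yields null, as described above
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # Assumption: null inputs are ignored before sorting.
    ordered = sorted(v for v in values if v is not None)
    if not ordered:
        return None
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    # Interpolate between the two neighbouring values.
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


median = percentile_cont(0.5, [1.0, 2.0, 3.0, 4.0])
print(median)
```

With four values, the median falls halfway between the second and third, which is where the interpolation matters.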


### `percentile_disc`

<Note>
At present, `percentile_disc` is not supported for streaming queries yet.
</Note>
@@ -279,7 +288,7 @@ Grouping operation functions are used in conjunction with grouping sets to disti

Returns a bit mask indicating which `GROUP BY` expressions are not included in the current grouping set. Bits are assigned with the rightmost argument corresponding to the least-significant bit; each bit is 0 if the corresponding expression is included in the grouping criteria of the grouping set generating the current result row, and 1 if it is not included.

```bash Syntax
```sql Syntax
grouping ( group_by_expression(s) ) → integer
```
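
The bit-mask rule can be illustrated with a short Python model. This is a sketch of the rule as described, with hypothetical column names `a` and `b`; it does not call any database API.

```python
from typing import Sequence, Set


def grouping(args: Sequence[str], grouping_set: Set[str]) -> int:
    """Build the mask: the rightmost argument maps to the least-significant bit."""
    mask = 0
    for expr in args:
        # Shift earlier bits left, then append 0 if the expression is part
        # of the current grouping set and 1 if it is not.
        mask = (mask << 1) | (0 if expr in grouping_set else 1)
    return mask


# Masks for grouping(a, b) under each possible grouping set.
both = grouping(["a", "b"], {"a", "b"})   # both included
only_a = grouping(["a", "b"], {"a"})      # b is not included
only_b = grouping(["a", "b"], {"b"})      # a is not included
neither = grouping(["a", "b"], set())     # grand-total row
print(both, only_a, only_b, neither)
```

Reading the result in binary shows which expressions were rolled up: for `grouping(a, b)`, a value of 2 (`10`) means `a` is not part of the current grouping set while `b` is.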
