#[function]
Defines a RisingWave SQL function from a Rust function.
§Table of Contents
- SQL Function Signature
- Rust Function Signature
- Table Function
- Registration and Invocation
- Appendix: Type Matrix
The following example demonstrates a simple usage:
```rust
#[function("add(int32, int32) -> int32")]
fn add(x: i32, y: i32) -> i32 {
    x + y
}
```
§SQL Function Signature
Each function must have a signature, specified in the `function("...")` part of the macro invocation. The signature follows this pattern:

```text
name ( [arg_types],* [...] ) [ -> [setof] return_type ]
```
Where `name` is the function name in `snake_case`, which must match the function name (in `UPPER_CASE`) defined in `proto/expr.proto`.
`arg_types` is a comma-separated list of argument types. The allowed data types are listed in the `name` column of the appendix's type matrix. Wildcards or `auto` can also be used, as explained below. If the function is variadic, the last argument can be denoted as `...`.
When `setof` appears before the return type, this indicates that the function is a set-returning function (table function), meaning it can return multiple values instead of just one. For more details, see the section on table functions.
If no return type is specified, the function returns `void`. However, the `void` type is not supported in our type system, so such a function currently returns a null value of type `int`.
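To make the grammar above concrete, here is a minimal sketch of a parser for signature strings. The `Signature` struct and `parse_signature` function are hypothetical illustrations, not the macro's internal parser. The sketch assumes whitespace-separated `setof` and treats a missing return type as the `void` case.

```rust
/// Parsed form of a signature such as "add(int32, int32) -> int32".
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
pub struct Signature {
    pub name: String,
    pub arg_types: Vec<String>,
    pub variadic: bool,
    pub set_returning: bool,
    /// `None` when no return type is given (the `void` case).
    pub return_type: Option<String>,
}

pub fn parse_signature(sig: &str) -> Option<Signature> {
    // Split "name(args)" from the optional "-> [setof] return_type" part.
    let (head, ret) = match sig.split_once("->") {
        Some((h, r)) => (h.trim(), Some(r.trim())),
        None => (sig.trim(), None),
    };
    let open = head.find('(')?;
    let close = head.rfind(')')?;
    if close < open {
        return None;
    }
    let name = head[..open].trim().to_string();
    let mut arg_types: Vec<String> = head[open + 1..close]
        .split(',')
        .map(|s| s.trim().to_string())
        .filter(|s| !s.is_empty())
        .collect();
    // A trailing "..." marks the function as variadic.
    let variadic = arg_types.last().map_or(false, |s| s == "...");
    if variadic {
        arg_types.pop();
    }
    // "setof" before the return type marks a table function.
    let (set_returning, return_type) = match ret {
        Some(r) => match r.strip_prefix("setof ") {
            Some(t) => (true, Some(t.trim().to_string())),
            None => (false, Some(r.to_string())),
        },
        None => (false, None),
    };
    Some(Signature { name, arg_types, variadic, set_returning, return_type })
}
```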
§Multiple Function Definitions
Multiple `#[function]` macros can be applied to a single generic Rust function to define multiple SQL functions of different types. For example:
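```rust
use std::ops::Add;

#[function("add(int16, int16) -> int16")]
#[function("add(int32, int32) -> int32")]
#[function("add(int64, int64) -> int64")]
// `Output = T` is required so that `x + y` has type `T`.
fn add<T: Add<Output = T>>(x: T, y: T) -> T {
    x + y
}
```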
§Type Expansion with `*`
Types can be automatically expanded to multiple types using wildcards. Here are some examples:
- `*`: expands to all types.
- `*int`: expands to int16, int32, int64.
- `*float`: expands to float32, float64.
For instance, `#[function("cast(varchar) -> *int")]` will be expanded to the following three functions:

```rust
#[function("cast(varchar) -> int16")]
#[function("cast(varchar) -> int32")]
#[function("cast(varchar) -> int64")]
```
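The expansion rule can be sketched as a plain string transformation. `expand_wildcard` and `expand_unary` are hypothetical helpers. They model only `*int` and `*float` for a one-argument signature; the all-types `*` wildcard and multi-argument products are left out.

```rust
/// Hypothetical helper mirroring the wildcard rules listed above.
/// Only `*int` and `*float` are modeled; `*` (all types) is omitted.
pub fn expand_wildcard(ty: &str) -> Vec<&str> {
    match ty {
        "*int" => vec!["int16", "int32", "int64"],
        "*float" => vec!["float32", "float64"],
        // Concrete types expand to themselves.
        other => vec![other],
    }
}

/// Expand a one-argument signature `name(arg) -> ret` into every concrete signature.
pub fn expand_unary(name: &str, arg: &str, ret: &str) -> Vec<String> {
    let mut out = Vec::new();
    for a in expand_wildcard(arg) {
        for r in expand_wildcard(ret) {
            out.push(format!("{name}({a}) -> {r}"));
        }
    }
    out
}
```

With these helpers, `expand_unary("cast", "varchar", "*int")` yields the same three signatures as the expansion shown above.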
Please note the difference between `*` and `any`: `*` generates a separate function for each type, whereas `any` generates only one function that takes the dynamically typed `Scalar`. This is similar to `impl T` versus `dyn T` in Rust, and `*` typically performs much better than `any`.
However, better performance does not make `*` the right choice in every case; sometimes `any` is more convenient. For example, in array functions the element type of `ListValue` is `Scalar(Ref)Impl`, so there is no need to convert it from/into various `T`.
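The `impl T` / `dyn T` analogy can be seen in plain Rust. The generic function is compiled into one specialized copy per concrete type, as with `*`. The enum-based function is a single body that matches on the runtime variant, as with `any`. `Scalar` here is a hypothetical three-variant stand-in for RisingWave's scalar enum.

```rust
use std::fmt::Display;

/// Static dispatch, like `*`: one specialized copy is generated per concrete type.
pub fn describe_generic<T: Display>(x: T) -> String {
    format!("value {x}")
}

/// Hypothetical stand-in for a dynamically typed scalar such as `ScalarImpl`.
#[derive(Debug, Clone, PartialEq)]
pub enum Scalar {
    Int16(i16),
    Int32(i32),
    Utf8(String),
}

/// Dynamic dispatch, like `any`: a single function that branches on the runtime type.
pub fn describe_dynamic(x: &Scalar) -> String {
    match x {
        Scalar::Int16(v) => format!("value {v}"),
        Scalar::Int32(v) => format!("value {v}"),
        Scalar::Utf8(v) => format!("value {v}"),
    }
}
```

The match in `describe_dynamic` runs on every call. This per-call branching is the overhead that the `*` form avoids. The upside is that one body handles every variant without converting to a concrete `T`.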
§Automatic Type Inference with `auto`
Correspondingly, the return type can be denoted as `auto` to be inferred automatically from the input types. It is inferred as the smallest type that can accommodate all input types. For example, `#[function("add(*int, *int) -> auto")]` will be expanded to:

```rust
#[function("add(int16, int16) -> int16")]
#[function("add(int16, int32) -> int32")]
#[function("add(int16, int64) -> int64")]
#[function("add(int32, int16) -> int32")]
// ...
```
In particular, when there is only one input argument, `auto` is inferred as the type of that argument. For example, `#[function("neg(*int) -> auto")]` will be expanded to:

```rust
#[function("neg(int16) -> int16")]
#[function("neg(int32) -> int32")]
#[function("neg(int64) -> int64")]
```
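The "smallest type that can accommodate all input types" rule can be sketched for the integer family. `infer_auto` and `int_rank` are hypothetical helpers that model only int16, int32 and int64. They return `None` for anything else in the multi-argument case.

```rust
/// Hypothetical width ranking for the integer family only.
fn int_rank(ty: &str) -> Option<u8> {
    match ty {
        "int16" => Some(0),
        "int32" => Some(1),
        "int64" => Some(2),
        _ => None,
    }
}

/// Hypothetical model of `auto`: pick the widest integer type among the inputs,
/// which is the smallest type that can hold all of them.
pub fn infer_auto<'a>(args: &[&'a str]) -> Option<&'a str> {
    // With a single argument, `auto` is simply that argument's type.
    if let [only] = args {
        return Some(*only);
    }
    let mut best: Option<(&'a str, u8)> = None;
    for &ty in args {
        let rank = int_rank(ty)?;
        if best.map_or(true, |(_, r)| rank > r) {
            best = Some((ty, rank));
        }
    }
    best.map(|(ty, _)| ty)
}
```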
§Custom Type Inference Function with `type_infer`
A few functions, such as `unnest`, have a return type that depends on the input argument types. This mainly applies to composite types like `anyarray`, `struct`, and `anymap`.
In such cases, the `type_infer` option can be used to specify a function that infers the return type from the input argument types. Its function signature is `fn(&[DataType]) -> Result<DataType>`. For example:

```rust
#[function(
    "unnest(anyarray) -> setof any",
    type_infer = "|args| Ok(args[0].unnest_list())"
)]
```

This type inference function is invoked at the frontend (`infer_type_with_sigmap`).
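The sketch below shows what a function of the documented shape `fn(&[DataType]) -> Result<DataType>` computes. It uses a hypothetical toy `DataType` and a `String` error in place of RisingWave's own types. The toy `unnest_list` is assumed to strip every nested list layer. Unlike the one-line closure above, the sketch also rejects non-array input.

```rust
/// Toy data type; hypothetical and much smaller than RisingWave's `DataType`.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Varchar,
    List(Box<DataType>),
}

impl DataType {
    /// Hypothetical analogue of `unnest_list`: strip every list layer.
    pub fn unnest_list(&self) -> DataType {
        match self {
            DataType::List(inner) => inner.unnest_list(),
            other => other.clone(),
        }
    }
}

/// Same shape as the documented signature, with `String` as the error type.
pub type TypeInfer = fn(&[DataType]) -> Result<DataType, String>;

pub fn unnest_type_infer(args: &[DataType]) -> Result<DataType, String> {
    match args.first() {
        Some(t) if matches!(t, DataType::List(_)) => Ok(t.unnest_list()),
        Some(other) => Err(format!("unnest expects an array, got {other:?}")),
        None => Err("unnest expects one argument".to_string()),
    }
}
```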
§Rust Function Signature
The `#[function]` macro can handle various kinds of Rust functions.
Each argument takes the reference type listed in the type matrix. The return value can be either the reference type or the owned type from the type matrix. For instance:

```rust
#[function("trim_array(anyarray, int32) -> anyarray")]
fn trim_array(array: ListRef<'_>, n: i32) -> ListValue { ... }
```
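The "borrow in, own out" pattern has a plain-Rust analogue: a slice (`&[i32]`) plays the role of `ListRef<'_>` and a `Vec<i32>` plays the role of `ListValue`. This `trim_array` is a hypothetical sketch that removes the last `n` elements. It clamps out-of-range `n` instead of reporting an error, which the real SQL function may do differently.

```rust
/// Hypothetical analogue of `trim_array`: borrow the input, return a newly owned value.
pub fn trim_array(array: &[i32], n: i32) -> Vec<i32> {
    // Clamp n into 0..=len for this sketch; negative n is treated as 0.
    let n = usize::try_from(n).unwrap_or(0).min(array.len());
    array[..array.len() - n].to_vec()
}
```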
§Nullable Arguments
By default, functions like the ones above are called only when all arguments are non-null. If any argument is null, the result is null and the function is not called.
If null arguments need to be handled, use the `Option` type:

```rust
#[function("trim_array(anyarray, int32) -> anyarray")]
fn trim_array(array: ListRef<'_>, n: Option<i32>) -> ListValue { ... }
```

This function is called when `n` is null, but not when `array` is null.
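The two null-handling behaviors can be modeled as wrappers around a function body. `call_strict` and `call_nullable_n` are hypothetical helpers written to illustrate the semantics described above. They do not correspond to code the macro generates.

```rust
/// Default semantics: the body runs only when every argument is non-null;
/// otherwise the result is null.
pub fn call_strict<A, B, R>(f: impl Fn(A, B) -> R, a: Option<A>, b: Option<B>) -> Option<R> {
    match (a, b) {
        (Some(a), Some(b)) => Some(f(a, b)),
        _ => None,
    }
}

/// With `Option<i32>` for `n`, only the first argument is null-checked;
/// `n` is passed to the body as-is, null or not.
pub fn call_nullable_n<A, R>(
    f: impl Fn(A, Option<i32>) -> R,
    a: Option<A>,
    n: Option<i32>,
) -> Option<R> {
    a.map(|a| f(a, n))
}
```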
§Return `NULL`s and Errors
Similarly, the return value type can be one of the following:
- `T`: Indicates that a non-null value is always returned (for non-null inputs), and errors will not occur.
- `Option<T>`: Indicates that a null value may be returned, but errors will not occur.
- `Result<T>`: Indicates that an error may occur, but a null value will not be returned.
- `Result<Option<T>>`: Indicates that a null value may be returned, and an error may also occur.
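Each of the four return shapes can be lifted into the most general one, `Result<Option<T>>`. The sketch below uses a hypothetical `EvalError` in place of the crate's error type. `parse_or_null` is a hypothetical function that needs both null and error: it returns null for an empty string and an error for non-numeric text.

```rust
/// Hypothetical error type standing in for the crate's `Result` error.
#[derive(Debug, PartialEq)]
pub struct EvalError(pub String);

/// `T`: always a value, never an error.
pub fn from_value<T>(v: T) -> Result<Option<T>, EvalError> {
    Ok(Some(v))
}

/// `Option<T>`: may be null, never an error.
pub fn from_option<T>(v: Option<T>) -> Result<Option<T>, EvalError> {
    Ok(v)
}

/// `Result<T>`: may fail, never null.
pub fn from_result<T>(v: Result<T, EvalError>) -> Result<Option<T>, EvalError> {
    v.map(Some)
}

/// A function that genuinely needs `Result<Option<T>>`.
pub fn parse_or_null(s: &str) -> Result<Option<i32>, EvalError> {
    if s.is_empty() {
        return Ok(None);
    }
    s.parse::<i32>()
        .map(Some)
        .map_err(|e| EvalError(e.to_string()))
}
```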
§Optimization
When all input and output types of the function are primitive types (see the type matrix) and involve no `Option` or `Result`, the `#[function]` macro automatically generates SIMD vectorized execution code. Therefore, avoid returning `Option` and `Result` whenever possible.
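The intuition can be shown with two hand-written column loops. These are illustrations only, not the code the macro generates. A branch-free loop over primitive slices is the kind of code compilers can often auto-vectorize, though whether they do depends on the optimization level and target. The `Option` version needs a per-element branch. Wrapping addition is an assumption chosen only to keep the first loop branch-free.

```rust
/// Branch-free loop over primitive slices: a good candidate for auto-vectorization.
pub fn add_columns(a: &[i32], b: &[i32], out: &mut [i32]) {
    for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) {
        *o = x.wrapping_add(y);
    }
}

/// Nullable version: each element needs a branch on whether both inputs are present.
pub fn add_nullable(a: &[Option<i32>], b: &[Option<i32>]) -> Vec<Option<i32>> {
    a.iter()
        .zip(b)
        .map(|(x, y)| match (x, y) {
            (Some(x), Some(y)) => Some(x.wrapping_add(*y)),
            _ => None,
        })
        .collect()
}
```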
§Variadic Function
Variadic functions accept an `impl Row` input that represents the trailing arguments. For example:

```rust
#[function("concat_ws(varchar, ...) -> varchar")]
fn concat_ws(sep: &str, vals: impl Row) -> Option<Box<str>> {
    let mut string_iter = vals.iter().flatten();
    // ...
}
```

See `risingwave_common::row::Row` for more details.
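The sketch below shows how the trailing arguments are consumed, with a hypothetical slice of nullable values standing in for `impl Row`. As with the `flatten()` call in the example above, null entries are skipped. The sketch returns `Box<str>` directly for simplicity. It does not reproduce the elided body of the real function.

```rust
/// Hypothetical stand-in for `impl Row`: trailing arguments as nullable values.
/// Null entries are skipped, mirroring `vals.iter().flatten()`.
pub fn concat_ws(sep: &str, vals: &[Option<&str>]) -> Box<str> {
    vals.iter()
        .flatten() // drop the `None` (null) arguments
        .copied()
        .collect::<Vec<&str>>()
        .join(sep)
        .into_boxed_str()
}
```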
§Functions Returning Strings
For functions that return `varchar`, you can also use a writer-style function signature to avoid memory copying and dynamic memory allocation:

```rust
use std::fmt::Write;

#[function("trim(varchar) -> varchar")]
fn trim(s: &str, writer: &mut impl Write) {
    writer.write_str(s.trim()).unwrap();
}
```

If errors may be returned, the return type should be `Result<()>`:

```rust
#[function("trim(varchar) -> varchar")]
fn trim(s: &str, writer: &mut impl Write) -> Result<()> {
    writer.write_str(s.trim()).unwrap();
    Ok(())
}
```

If null values may be returned, the return type should be `Option<()>`:

```rust
#[function("trim(varchar) -> varchar")]
fn trim(s: &str, writer: &mut impl Write) -> Option<()> {
    if s.is_empty() {
        None
    } else {
        writer.write_str(s.trim()).unwrap();
        Some(())
    }
}
```
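Why the writer style saves allocations becomes clearer with a column builder in which all strings share one buffer. `StringColumn` and `trim_column` are hypothetical: the assumption that output strings are stored contiguously with offsets is made for illustration and is not a description of RisingWave's array builders. Each call appends straight into the shared `String`, so no per-row `String` is created.

```rust
use std::fmt::Write;

/// Writer-style `trim`, as in the first signature above.
pub fn trim(s: &str, writer: &mut impl Write) {
    // Writing into a `String` never fails, so `unwrap` cannot panic here.
    writer.write_str(s.trim()).unwrap();
}

/// Hypothetical column: all values live in one buffer, delimited by offsets.
pub struct StringColumn {
    pub data: String,
    pub offsets: Vec<usize>,
}

impl StringColumn {
    pub fn new() -> Self {
        Self { data: String::new(), offsets: vec![0] }
    }

    pub fn get(&self, i: usize) -> &str {
        &self.data[self.offsets[i]..self.offsets[i + 1]]
    }
}

pub fn trim_column(rows: &[&str]) -> StringColumn {
    let mut col = StringColumn::new();
    for row in rows {
        // The function writes directly into the shared buffer.
        trim(row, &mut col.data);
        col.offsets.push(col.data.len());
    }
    col
}
```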
§Preprocessing Constant Arguments
When some input arguments of the function are constants, they can be preprocessed once instead of being recomputed every time the function is called.
A classic use case is regular expression matching:

```rust
#[function(
    "regexp_match(varchar, varchar, varchar) -> varchar[]",
    prebuild = "RegexpContext::from_pattern_flags($1, $2)?"
)]
fn regexp_match(text: &str, regex: &RegexpContext) -> ListValue {
    regex.captures(text).collect()
}
```

The `prebuild` argument takes a Rust expression of the form `Type::method(...)` that constructs a new value of `Type` from the input arguments of the function. Here `$1` and `$2` refer to the second and third arguments of the function (indexed from 0), both of type `&str`. In the Rust function signature, the parameters at these positions are omitted and replaced by one extra parameter at the end that receives the prebuilt value.
This macro generates two versions of the function. If all the input arguments that `prebuild` depends on are constants, the value is precomputed once when the expression is built. Otherwise, it is computed for each input row during evaluation. This way, both constant and variable inputs are supported, while constant inputs get the faster path.
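The sketch below contrasts the two evaluation strategies without depending on a regex crate. `PatternContext` is a hypothetical stand-in for a prebuilt context such as `RegexpContext`. It does substring matching, and an `i` flag makes matching case-insensitive. `PatternArg` and `eval` are also hypothetical: they model "arguments are constants" versus "arguments vary per row".

```rust
/// Hypothetical prebuilt context: a "compiled" substring pattern.
#[derive(Debug)]
pub struct PatternContext {
    needle: String,
    case_insensitive: bool,
}

impl PatternContext {
    pub fn from_pattern_flags(pattern: &str, flags: &str) -> Result<Self, String> {
        let case_insensitive = match flags {
            "" => false,
            "i" => true,
            other => return Err(format!("unsupported flags: {other}")),
        };
        // Preprocessing done once per context: lower-case the pattern if needed.
        let needle = if case_insensitive { pattern.to_lowercase() } else { pattern.to_string() };
        Ok(Self { needle, case_insensitive })
    }

    pub fn matches(&self, text: &str) -> bool {
        if self.case_insensitive {
            text.to_lowercase().contains(&self.needle)
        } else {
            text.contains(&self.needle)
        }
    }
}

/// Pattern and flags are either constants or given per row.
pub enum PatternArg<'a> {
    Constant(&'a str, &'a str),
    PerRow(Vec<(&'a str, &'a str)>),
}

pub fn eval(texts: &[&str], arg: PatternArg<'_>) -> Result<Vec<bool>, String> {
    match arg {
        // Constant inputs: build the context once and reuse it for every row.
        PatternArg::Constant(p, f) => {
            let ctx = PatternContext::from_pattern_flags(p, f)?;
            Ok(texts.iter().map(|t| ctx.matches(t)).collect())
        }
        // Variable inputs: build a fresh context for each row.
        PatternArg::PerRow(args) => texts
            .iter()
            .zip(args)
            .map(|(t, (p, f))| PatternContext::from_pattern_flags(p, f).map(|ctx| ctx.matches(t)))
            .collect(),
    }
}
```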
§Context
If a function needs to obtain type information at runtime, you can add a `&Context` parameter to the function signature. For example:

```rust
#[function("foo(int32) -> int64")]
fn foo(a: i32, ctx: &Context) -> i64 {
    assert_eq!(ctx.arg_types[0], DataType::Int32);
    assert_eq!(ctx.return_type, DataType::Int64);
    // ...
}
```
§Async Function
Functions can be asynchronous:

```rust
use std::time::Duration;

#[function("pg_sleep(float64)")]
async fn pg_sleep(second: F64) {
    tokio::time::sleep(Duration::from_secs_f64(second.0)).await;
}
```

Asynchronous functions are evaluated on rows sequentially.
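"Sequentially" means that each row's future is awaited to completion before the next row's future is created. The sketch below shows this with only the standard library. `block_on` is a deliberately minimal executor with a no-op waker, suitable only for futures like these that complete immediately. `double` and `eval_rows` are hypothetical. Note that `Context` here is `std::task::Context`, unrelated to the `&Context` parameter described earlier.

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A waker whose operations do nothing.
fn noop_raw_waker() -> RawWaker {
    unsafe fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    unsafe fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

/// Minimal executor: polls in a loop until Ready (busy-waits if the future is Pending).
fn block_on<F: Future>(fut: F) -> F::Output {
    // SAFETY: the vtable functions ignore the data pointer and do nothing.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

/// Hypothetical async scalar function.
async fn double(x: i64) -> i64 {
    x * 2
}

/// Sequential evaluation: one row's future finishes before the next begins.
pub async fn eval_rows(rows: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(rows.len());
    for &row in rows {
        out.push(double(row).await);
    }
    out
}
```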
§Table Function
A table function is a special kind of function that can return multiple values instead of just one. Its signature must include the `setof` keyword, and the Rust function should return an iterator of the form `impl Iterator<Item = T>` or one of its derived forms. For example:

```rust
#[function("generate_series(int32, int32) -> setof int32")]
fn generate_series(start: i32, stop: i32) -> impl Iterator<Item = i32> {
    start..=stop
}
```

Likewise, `Option` or `Result` can appear either inside the iterator's item type or around the iterator itself. For instance:
- `impl Iterator<Item = Result<T>>`
- `Result<impl Iterator<Item = T>>`
- `Result<impl Iterator<Item = Result<Option<T>>>>`

Currently, table function arguments do not support the `Option` type. That is, the function is only invoked when all arguments are non-null.
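The outer `Result` shape is useful when argument validation can fail before any row is produced. `generate_series_step` is a hypothetical variant that returns `Result<impl Iterator<Item = i32>, String>`. It models only positive steps, which is a simplification of the SQL function.

```rust
/// Plain iterator, as in the `generate_series` example above.
pub fn generate_series(start: i32, stop: i32) -> impl Iterator<Item = i32> {
    start..=stop
}

/// Hypothetical variant with the `Result<impl Iterator<Item = T>>` shape:
/// the error is reported once, up front, instead of per row.
pub fn generate_series_step(
    start: i32,
    stop: i32,
    step: i32,
) -> Result<impl Iterator<Item = i32>, String> {
    if step <= 0 {
        // Also guards `step_by`, which panics on a step of 0.
        return Err("step must be positive".to_string());
    }
    Ok((start..=stop).step_by(step as usize))
}
```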
§Registration and Invocation
Every function defined by `#[function]` is automatically registered in the global function table.
You can build expressions through the following functions:

```text
// scalar functions
risingwave_expr::expr::build(...) -> BoxedExpression
risingwave_expr::expr::build_from_prost(...) -> BoxedExpression
// table functions
risingwave_expr::table_function::build(...) -> BoxedTableFunction
risingwave_expr::table_function::build_from_prost(...) -> BoxedTableFunction
```

Or get their metadata through the following functions:

```text
// scalar functions
risingwave_expr::sig::func::FUNC_SIG_MAP::get(...)
// table functions
risingwave_expr::sig::table_function::FUNC_SIG_MAP::get(...)
```
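The idea of a global table keyed by signature can be sketched with `OnceLock` and a `HashMap`. The registry below, its keys and `build_and_call` are all hypothetical. The macro generates its own registration code, which is not reproduced here. This sketch only shows lookup by signature followed by invocation.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

type ScalarFn = fn(i64, i64) -> i64;

fn add(x: i64, y: i64) -> i64 {
    x + y
}

fn sub(x: i64, y: i64) -> i64 {
    x - y
}

/// Hypothetical global function table, initialized on first use.
fn registry() -> &'static HashMap<&'static str, ScalarFn> {
    static TABLE: OnceLock<HashMap<&'static str, ScalarFn>> = OnceLock::new();
    TABLE.get_or_init(|| {
        let mut m: HashMap<&'static str, ScalarFn> = HashMap::new();
        m.insert("add(int64, int64) -> int64", add);
        m.insert("subtract(int64, int64) -> int64", sub);
        m
    })
}

/// Look up a function by signature and invoke it; `None` if it is not registered.
pub fn build_and_call(sig: &str, x: i64, y: i64) -> Option<i64> {
    registry().get(sig).map(|f| f(x, y))
}
```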
§Appendix: Type Matrix
§Base Types
name | SQL type | owned type | reference type | primitive? |
---|---|---|---|---|
boolean | boolean | bool | bool | yes |
int2 | smallint | i16 | i16 | yes |
int4 | integer | i32 | i32 | yes |
int8 | bigint | i64 | i64 | yes |
int256 | rw_int256 | Int256 | Int256Ref<'_> | no |
float4 | real | F32 | F32 | yes |
float8 | double precision | F64 | F64 | yes |
decimal | numeric | Decimal | Decimal | yes |
serial | serial | Serial | Serial | yes |
date | date | Date | Date | yes |
time | time | Time | Time | yes |
timestamp | timestamp | Timestamp | Timestamp | yes |
timestamptz | timestamptz | Timestamptz | Timestamptz | yes |
interval | interval | Interval | Interval | yes |
varchar | varchar | Box<str> | &str | no |
bytea | bytea | Box<[u8]> | &[u8] | no |
jsonb | jsonb | JsonbVal | JsonbRef<'_> | no |
any | any | ScalarImpl | ScalarRefImpl<'_> | no |
§Composite Types
name | SQL type | owned type | reference type |
---|---|---|---|
anyarray | any[] | ListValue | ListRef<'_> |
struct | record | StructValue | StructRef<'_> |
T[] | T[] | ListValue | ListRef<'_> |
struct<name_T, ..> | struct<name T, ..> | (T, ..) | (&T, ..) |

In the last two rows, `T` could be any base type.