Build RisingWave with Multiple Object Storage Backends
- Overview
- How RisingWave supports multiple object storage backends
- How to build RisingWave with multiple object stores
Overview
As a cloud-neutral database, RisingWave supports running on different (object) storage backends. Currently, these storage products include S3, COS, Lyvecloud Storage, GCS, OSS, Azure Blob Storage, and HDFS (WebHDFS).
This doc first briefly introduces how RisingWave supports these storage products, then gives guidance on how to build RisingWave with these object stores quickly and easily through risedev.
How RisingWave supports multiple object storage backends
The first supported object storage was S3. For other object storage products, RisingWave supports them in two ways: via S3-compatible mode or via OpenDAL.
S3 and other S3-compatible object stores
If an object store declares that it is S3-compatible, it means that it can be accessed directly through the S3 APIs. As RisingWave already implements `S3ObjectStore`, we can reuse the S3 interfaces to access this kind of object storage.
Currently we use S3-compatible mode for COS and Lyvecloud Storage. To use these two object storage products, you need to overwrite the S3 environment with the corresponding `access_key`, `secret_key`, `region`, and `bucket_name`, and configure the `endpoint` as well.
OpenDAL object store
For those (object) storage products that are not compatible with S3 (or compatible, but with some unstable interfaces), we use OpenDAL to access them. OpenDAL is the Open Data Access Layer for freely accessing data, and it supports several different storage backends. We implemented an `OpenDALObjectStore` to support the interface for accessing object stores in RisingWave.
All of these object stores are supported in risedev; you can use the risedev command to start RisingWave on these storage backends.
How to build RisingWave with multiple object stores
COS & Lyvecloud Storage
To use COS or Lyvecloud Storage, you need to overwrite the AWS default `access_key`, `secret_key`, and `region`, and configure the `endpoint` in the environment variables:
export AWS_REGION=your_region
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export RW_S3_ENDPOINT=your_endpoint
Then, in `risedev.yml`, set the bucket name and start RisingWave with risedev. You can then run RisingWave on these two storage backends.
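For illustration, a profile for these S3-compatible stores might look like the sketch below. This is only a rough sketch: the `aws-s3` step and the surrounding profile layout are assumptions here, so check the sample profiles shipped in `risedev.yml` for the exact field names.

```yaml
# Hypothetical sketch of a risedev.yml profile for an S3-compatible store.
# Step and field names may differ in your copy of risedev.yml.
cos:
  steps:
    # Credentials, region, and endpoint come from the environment
    # variables exported above; only the bucket is set here.
    - use: aws-s3
      bucket: your_bucket_name
    - use: meta-node
    - use: compute-node
    - use: frontend
    - use: compactor
```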
GCS
To use GCS, you need to enable OpenDAL in `risedev.yml` and set `engine = gcs`, `bucket_name`, and `root` as well. For authentication, you just need to configure credentials via gcloud, and the OpenDAL GCS backend will automatically find the application default credentials to complete verification.
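As an illustration, a GCS profile could look like the following sketch; the `opendal` step and overall profile layout are assumptions, while `engine`, `bucket_name`, and `root` come from the description above.

```yaml
# Hypothetical sketch of a risedev.yml profile for GCS.
# Verify step and field names against the samples in risedev.yml.
gcs:
  steps:
    - use: opendal
      engine: gcs
      bucket_name: your_gcs_bucket
      root: risingwave   # path prefix for RisingWave data in the bucket
    - use: meta-node
    - use: compute-node
    - use: frontend
    - use: compactor
```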
Once these configurations are set, run `./risedev d gcs`, and then you can run RisingWave on GCS.
OSS
To use OSS, you need to enable OpenDAL in `risedev.yml` and set `engine = oss`, `bucket_name`, and `root` as well.
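A hedged sketch of what the OSS profile could look like follows; as above, the `opendal` step and profile layout are assumptions, and only `engine`, `bucket_name`, and `root` come from the description.

```yaml
# Hypothetical sketch of a risedev.yml profile for OSS.
# Step and field names may differ in your copy of risedev.yml.
oss:
  steps:
    - use: opendal
      engine: oss
      bucket_name: your_oss_bucket
      root: risingwave   # path prefix for RisingWave data in the bucket
    - use: meta-node
    - use: compute-node
    - use: frontend
    - use: compactor
```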
For authentication, set the identity information in the environment variables:
export OSS_ENDPOINT="endpoint"
export OSS_ACCESS_KEY_ID="oss_access_key"
export OSS_ACCESS_KEY_SECRET="oss_secret_key"
Once these configurations are set, run `./risedev d oss`, and then you can run RisingWave on OSS.
Azure Blob
To use Azure Blob Storage, you need to enable OpenDAL in `risedev.yml` and set `engine = azblob`, `bucket_name`, and `root` as well. For Azure Blob Storage, `bucket_name` is actually the `container_name`.
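For illustration, an Azure Blob profile could look like the sketch below; the `opendal` step and profile layout are assumptions, while the field names follow the description above.

```yaml
# Hypothetical sketch of a risedev.yml profile for Azure Blob Storage.
# Step and field names may differ in your copy of risedev.yml.
azblob:
  steps:
    - use: opendal
      engine: azblob
      bucket_name: your_container_name   # for azblob, this is the container name
      root: risingwave
    - use: meta-node
    - use: compute-node
    - use: frontend
    - use: compactor
```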
For authentication, set the identity information in the environment variables:
export AZBLOB_ENDPOINT="endpoint"
export AZBLOB_ACCOUNT_NAME="your_account_name"
export AZBLOB_ACCOUNT_KEY="your_account_key"
Once these configurations are set, run `./risedev d azblob`, and then you can run RisingWave on Azure Blob Storage.
HDFS
HDFS requires a complete Hadoop environment and a Java environment, which are very heavy. Thus, RisingWave does not enable the hdfs feature by default. To compile RisingWave with the hdfs backend, turn on this feature first, and enable hdfs for the risedev tools.
Run `./risedev configure` and enable `[Component] Hummock: Hdfs Backend`.
After that, you need to enable OpenDAL in `risedev.yml` and set `engine = hdfs`, `namenode`, and `root` as well.
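A hedged sketch of an HDFS profile follows; the `opendal` step and profile layout are assumptions, while `engine`, `namenode`, and `root` come from the description above.

```yaml
# Hypothetical sketch of a risedev.yml profile for HDFS.
# Step and field names may differ in your copy of risedev.yml.
hdfs:
  steps:
    - use: opendal
      engine: hdfs
      namenode: your_namenode_address   # WebHDFS defaults to 127.0.0.1:9870 (see below)
      root: risingwave
    - use: meta-node
    - use: compute-node
    - use: frontend
    - use: compactor
```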
You can also use WebHDFS as a lightweight alternative to HDFS. The hdfs backend is powered by HDFS's native Java client, which requires users to set up the HDFS services correctly, while webhdfs is accessed through an HTTP API and requires no extra setup. The way to start WebHDFS is basically the same as hdfs, but its default `namenode` is `127.0.0.1:9870`.
Once these configurations are set, run `./risedev d hdfs` or `./risedev d webhdfs`, and then you can run RisingWave on HDFS (or WebHDFS).