Build RisingWave with Multiple Object Storage Backends

Overview

As a cloud-neutral database, RisingWave supports running on different (object) storage backends. Currently, these storage products include S3 (plus S3-compatible stores such as COS and Lyvecloud Storage), GCS, OSS, Azure Blob Storage, and HDFS/WebHDFS.

This doc first briefly introduces how RisingWave supports these storage products, then gives guidance on how to build RisingWave with these object stores quickly and easily through risedev.

How RisingWave supports multiple object storage backends

The first supported object storage backend was S3. Other object storage products are supported in one of two ways: via S3-compatible mode or via OpenDAL.

S3 and other S3-compatible object stores

If an object store declares that it is S3-compatible, it can be accessed directly through the S3 APIs. Since RisingWave already implements S3ObjectStore, we can reuse the S3 interfaces to access this kind of object storage.

Currently, we use S3-compatible mode for COS and Lyvecloud Storage. To use these two object storage products, you need to overwrite the S3 environment variables with the corresponding access_key, secret_key, region, and bucket_name, and configure the endpoint as well.

OpenDAL object store

For those (object) storage products that are not compatible with S3 (or are compatible but have some unstable interfaces), we use OpenDAL to access them. OpenDAL is the Open Data Access Layer for accessing data freely, and it supports several different storage backends. We implemented an OpenDALObjectStore to support the interface for accessing object stores in RisingWave.

All of these object stores are supported in risedev; you can use the risedev command to start RisingWave on any of these storage backends.

How to build RisingWave with multiple object stores

COS & Lyvecloud Storage

To use COS or Lyvecloud Storage, you need to overwrite the default AWS access_key, secret_key, and region, and configure the endpoint, via the following environment variables:

export AWS_REGION=your_region
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export RW_S3_ENDPOINT=your_endpoint

Then, in risedev.yml, set the bucket name and start RisingWave with risedev. You can then run RisingWave on these two storage backends.
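For reference, a risedev profile for an S3-compatible store might look roughly like the sketch below. The profile name cos, the aws-s3 service name, and the step layout are assumptions modeled on the existing S3 profiles in risedev.yml; check your copy for the exact schema.

  cos:
    steps:
      - use: meta-node
      - use: compute-node
      - use: frontend
      - use: aws-s3              # assumption: the S3-backed state store service
        bucket: your-bucket-name
      - use: compactor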

GCS

To use GCS, you need to enable OpenDAL in risedev.yml and set engine = gcs, bucket_name, and root. For authentication, you just need to configure credentials via gcloud; the OpenDAL GCS backend will then automatically find the application default credentials to complete verification.
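For example, you would typically set up application default credentials first, and the OpenDAL step in the profile would then look roughly like the sketch below. The opendal service name and step shape are assumptions; engine, bucket_name, and root come from the description above.

  gcloud auth application-default login

  - use: opendal
    engine: gcs
    bucket_name: your-bucket-name
    root: risingwave   # assumption: an arbitrary prefix within the bucket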

Once these configurations are set, run ./risedev d gcs and then you can run RisingWave on GCS.

OSS

To use OSS, you need to enable OpenDAL in risedev.yml and set engine = oss, bucket_name, and root.
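The OpenDAL step would then look roughly like this (same caveats about the exact key names as in the GCS sketch above):

  - use: opendal
    engine: oss
    bucket_name: your-bucket-name
    root: risingwave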

For authentication, set the identity information in the environment variables:

export OSS_ENDPOINT="endpoint"
export OSS_ACCESS_KEY_ID="oss_access_key"
export OSS_ACCESS_KEY_SECRET="oss_secret_key"

Once these configurations are set, run ./risedev d oss and then you can run RisingWave on OSS.

Azure Blob

To use Azure Blob Storage, you need to enable OpenDAL in risedev.yml and set engine = azblob, bucket_name, and root. For Azure Blob Storage, bucket_name is actually the container name.
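A rough sketch of the OpenDAL step, under the same assumptions as the earlier examples:

  - use: opendal
    engine: azblob
    bucket_name: your-container-name   # for Azure Blob, the "bucket" is the container
    root: risingwave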

For authentication, set the identity information in the environment variables:

export AZBLOB_ENDPOINT="endpoint"
export AZBLOB_ACCOUNT_NAME="your_account_name"
export AZBLOB_ACCOUNT_KEY="your_account_key"

Once these configurations are set, run ./risedev d azblob and then you can run RisingWave on Azure Blob Storage.

HDFS

HDFS requires a complete Hadoop environment and a Java environment, both of which are very heavy. Thus, RisingWave does not enable the hdfs feature by default. To compile RisingWave with the hdfs backend, turn on this feature first and enable hdfs for the risedev tools: run ./risedev configure and enable [Component] Hummock: Hdfs Backend.

After that, you need to enable OpenDAL in risedev.yml and set engine = hdfs, namenode, and root.
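The OpenDAL step might then look roughly like the sketch below; the namenode address is a placeholder, and the step layout follows the same assumptions as the earlier examples:

  - use: opendal
    engine: hdfs
    namenode: "127.0.0.1:9000"   # assumption: replace with your namenode address
    root: risingwave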

You can also use WebHDFS as a lightweight alternative to HDFS. The hdfs engine is powered by HDFS's native Java client, so users need to set up the HDFS services correctly, whereas WebHDFS is accessed through an HTTP API and needs no extra setup. Starting WebHDFS is basically the same as HDFS, but its default name_node is 127.0.0.1:9870.
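Under the same assumptions, a WebHDFS step might look like the following sketch (the engine name webhdfs is an assumption inferred from the risedev command below):

  - use: opendal
    engine: webhdfs
    namenode: "127.0.0.1:9870"   # WebHDFS default HTTP address
    root: risingwave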

Once these configurations are set, run ./risedev d hdfs or ./risedev d webhdfs, and then you can run RisingWave on HDFS (or WebHDFS).