Function is_parquet_schema_match_source_schema

pub fn is_parquet_schema_match_source_schema(
    arrow_data_type: &DataType,
    rw_data_type: &DataType,
) -> bool

Checks whether a data type in a Parquet file's schema (arrow_data_type) matches the corresponding type in the user-defined RisingWave schema (rw_data_type). Besides direct type matches, it handles the following special cases:

  • Arrow’s timestamp(_, None) types (all four time units) match RisingWave’s Timestamp type.
  • Arrow’s timestamp(_, Some(_)) types, which carry a time zone, match RisingWave’s Timestamptz type.
  • RisingWave has no unsigned integer types, so each Arrow UInt type matches a RisingWave type wide enough to hold its full range:
    • Arrow’s UInt8 matches RisingWave’s Int16.
    • Arrow’s UInt16 matches RisingWave’s Int32.
    • Arrow’s UInt32 matches RisingWave’s Int64.
    • Arrow’s UInt64 matches RisingWave’s Decimal.
  • Arrow’s Float16 matches RisingWave’s Float32.
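
The special cases above can be sketched as a single pattern match. This is a minimal illustration, not the actual implementation: `TimeUnit`, `ArrowType`, `RwType`, and `scalar_matches` are hypothetical, simplified stand-ins for the Arrow and RisingWave `DataType` enums and for this function, and the handful of direct matches (for example Int16 to Int16) are assumed for completeness.

```rust
// Hypothetical stand-in for Arrow's time unit enum.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

// Hypothetical, heavily reduced stand-in for Arrow's DataType.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ArrowType {
    Int16,
    Int32,
    Int64,
    UInt8,
    UInt16,
    UInt32,
    UInt64,
    Float16,
    Float32,
    // The second field is the optional time zone.
    Timestamp(TimeUnit, Option<String>),
}

// Hypothetical, heavily reduced stand-in for RisingWave's DataType.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum RwType {
    Int16,
    Int32,
    Int64,
    Float32,
    Decimal,
    Timestamp,
    Timestamptz,
}

// Returns true when the Arrow type is accepted for the RisingWave type.
#[allow(dead_code)]
fn scalar_matches(arrow: &ArrowType, rw: &RwType) -> bool {
    use ArrowType as A;
    use RwType as R;
    matches!(
        (arrow, rw),
        // Timestamps: any time unit; the time zone decides the target type.
        (A::Timestamp(_, None), R::Timestamp)
            | (A::Timestamp(_, Some(_)), R::Timestamptz)
            // Unsigned integers map to a type that can hold their range.
            | (A::UInt8, R::Int16)
            | (A::UInt16, R::Int32)
            | (A::UInt32, R::Int64)
            | (A::UInt64, R::Decimal)
            // Half-precision floats widen to Float32.
            | (A::Float16, R::Float32)
            // Assumed direct matches, included only for illustration.
            | (A::Int16, R::Int16)
            | (A::Int32, R::Int32)
            | (A::Int64, R::Int64)
            | (A::Float32, R::Float32)
    )
}
```

Under these assumptions, `scalar_matches(&ArrowType::UInt64, &RwType::Decimal)` is accepted, while `scalar_matches(&ArrowType::UInt8, &RwType::Int32)` is rejected because only the listed pairing counts as a match.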

Nested data type matching:

  • Struct: Arrow’s Struct type matches RisingWave’s Struct type recursively; the fields must have the same names and matching types.
  • List: Arrow’s List type matches RisingWave’s List type recursively; the element types must match.
  • Map: Arrow’s Map type matches RisingWave’s Map type recursively; the inner struct must have exactly two fields, named “key” and “value”, whose types match the RisingWave key and value types.
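
The recursive rules for nested types can be sketched as follows. This is an illustrative sketch under stated assumptions, not the real implementation: `ArrowType`, `RwType`, and `nested_matches` are hypothetical, simplified stand-ins, fields are modelled as plain `(name, type)` pairs, and struct fields are assumed to be compared in order.

```rust
// Hypothetical, reduced stand-in for Arrow's DataType.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ArrowType {
    Int32,
    Utf8,
    Struct(Vec<(String, ArrowType)>),
    List(Box<ArrowType>),
    // Arrow stores a map as a list of entry structs; the payload models
    // that inner entry type.
    Map(Box<ArrowType>),
}

// Hypothetical, reduced stand-in for RisingWave's DataType.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum RwType {
    Int32,
    Varchar,
    Struct(Vec<(String, RwType)>),
    List(Box<RwType>),
    Map(Box<RwType>, Box<RwType>),
}

// Recursively checks whether the Arrow type is accepted for the RisingWave type.
#[allow(dead_code)]
fn nested_matches(arrow: &ArrowType, rw: &RwType) -> bool {
    match (arrow, rw) {
        // Assumed leaf matches, included only so the recursion terminates.
        (ArrowType::Int32, RwType::Int32) | (ArrowType::Utf8, RwType::Varchar) => true,
        // Struct: same number of fields, same names, matching types.
        (ArrowType::Struct(a_fields), RwType::Struct(r_fields)) => {
            a_fields.len() == r_fields.len()
                && a_fields
                    .iter()
                    .zip(r_fields)
                    .all(|((a_name, a_ty), (r_name, r_ty))| {
                        a_name == r_name && nested_matches(a_ty, r_ty)
                    })
        }
        // List: element types must match.
        (ArrowType::List(a_elem), RwType::List(r_elem)) => nested_matches(a_elem, r_elem),
        // Map: the inner struct must be exactly {key, value} with matching types.
        (ArrowType::Map(entries), RwType::Map(r_key, r_value)) => match &**entries {
            ArrowType::Struct(fields) => match fields.as_slice() {
                [(k_name, k_ty), (v_name, v_ty)] => {
                    k_name == "key"
                        && v_name == "value"
                        && nested_matches(k_ty, r_key)
                        && nested_matches(v_ty, r_value)
                }
                _ => false,
            },
            _ => false,
        },
        _ => false,
    }
}
```

In this sketch, a map whose inner struct names its fields anything other than “key” and “value” is rejected even when the key and value types themselves would match.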