risingwave_connector::source::iceberg::parquet_file_handler

Function extract_valid_column_indices

pub fn extract_valid_column_indices(
    columns: Option<Vec<Column>>,
    metadata: &FileMetaData,
) -> ConnectorResult<Vec<usize>>

Extracts valid column indices from a Parquet file schema based on the user’s requested schema.

This function is used for column pruning when reading Parquet files. It computes the intersection of the columns in the Parquet file currently being read and the schema requested by the user. The resulting indices can be used to build a ProjectionMask for reading RecordBatches, so that only the necessary columns are read.
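The intersection step can be pictured with the minimal, std-only sketch below. It assumes the file schema is reduced to a list of top-level column names and that requested columns are matched by name. `RequestedColumn` and `valid_column_indices` are hypothetical stand-ins, not RisingWave's actual `Column` type or this function's implementation. Keeping every column when `columns` is `None` is also an assumption made only for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for a user-requested column.
/// The real `Column` type carries more information than a name.
struct RequestedColumn {
    name: String,
}

/// Returns the positions (within `file_columns`) of the file columns whose
/// names also appear in `requested`, in file order.
/// Assumption for this sketch: `None` means "keep every file column".
fn valid_column_indices(
    requested: Option<&[RequestedColumn]>,
    file_columns: &[&str],
) -> Vec<usize> {
    match requested {
        None => (0..file_columns.len()).collect(),
        Some(cols) => {
            // Set of requested names for O(1) membership checks.
            let wanted: HashSet<&str> = cols.iter().map(|c| c.name.as_str()).collect();
            file_columns
                .iter()
                .enumerate()
                // Keep the index only if the file column was requested;
                // requested names absent from the file are silently skipped.
                .filter_map(|(idx, name)| wanted.contains(name).then_some(idx))
                .collect()
        }
    }
}

fn main() {
    // Column order as it appears in a hypothetical Parquet file schema.
    let file_columns = ["id", "name", "price", "updated_at"];
    let requested = vec![
        RequestedColumn { name: "price".to_string() },
        RequestedColumn { name: "id".to_string() },
        RequestedColumn { name: "missing".to_string() },
    ];
    let indices = valid_column_indices(Some(requested.as_slice()), &file_columns);
    println!("{:?}", indices); // [0, 2]
}
```

With the `parquet` crate, indices of this kind would typically be turned into a mask with `ProjectionMask::roots(schema_descr, indices)` (or `ProjectionMask::leaves` for leaf-column indices) before the reader is built. Whether this function returns root or leaf indices is not stated on this page.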

§Parameters

  • columns: An optional vector of Column (Option<Vec<Column>>) representing the user’s requested schema.
  • metadata: A reference to FileMetaData containing the schema and metadata of the Parquet file.

§Returns

  • A ConnectorResult<Vec<usize>> containing the indices of the columns in the Parquet file schema that match the requested schema, or an error if processing fails.