Module column_aware_row_encoding


Column-aware row encoding is an encoding format that converts a row into a binary form that remains explainable after schema changes. The current design of the flag carries just one piece of meaningful information: its 2 LSBs represent the size of the offsets (u8, u16, or u32). There is a Serializer and a Deserializer for each Row schema, and both can be reused until the schema changes.
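As an illustration of the flag layout described above, here is a minimal sketch of reading the offset width from the two least significant bits. The `Width` enum, the `width_from_flag` function, and the concrete bit patterns (0b01, 0b10, 0b11) are assumptions made for this example; the description above only states that the 2 LSBs select among u8, u16, and u32.

```rust
/// Hypothetical offset width, decoded from the two least significant bits of the flag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Width {
    U8,
    U16,
    U32,
}

impl Width {
    /// Number of bytes used to store one offset at this width.
    fn bytes(self) -> usize {
        match self {
            Width::U8 => 1,
            Width::U16 => 2,
            Width::U32 => 4,
        }
    }
}

/// Assumed bit patterns (not confirmed by the module docs):
/// 0b01 = u8, 0b10 = u16, 0b11 = u32.
fn width_from_flag(flag: u8) -> Option<Width> {
    // Mask away everything except the two least significant bits.
    match flag & 0b11 {
        0b01 => Some(Width::U8),
        0b10 => Some(Width::U16),
        0b11 => Some(Width::U32),
        _ => None, // 0b00 is treated as invalid in this sketch
    }
}
```

Because only the low two bits are inspected, the remaining bits of the flag stay free for other uses in this sketch.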

Modules

data_types 🔒
new_serde 🔒
Serialize and deserialize functions that recursively use ColumnAwareSerde for nested fields of composite types.

Structs

ColumnAwareSerde
Combined column-aware Serializer and Deserializer given the same column_ids and schema
Deserializer
Column-Aware Deserializer holds the needed ColumnIds and their corresponding schema. Should non-null default values be specified, a new field could be added to Deserializer.
EncodedBytes 🔒
A view of the encoded bytes, which can be iterated over to get the column id and data. Used for deserialization.
Header 🔒
Header (metadata) of the encoded row.
RowEncoding 🔒
RowEncoding holds row-specific information for Column-Aware Encoding
Serializer
Column-Aware Serializer holds schema-related information and must be created again once the schema changes.

Enums

ColumnMapping 🔒
A mapping from column id to the index of the column in the schema.
OffsetWidth 🔒
The width of the offset of the encoded data, i.e., how many bytes are used to represent the offset.
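To make the idea of an offset width concrete, the sketch below picks the smallest number of bytes that can address a given amount of encoded data and writes one offset at that width. Both helpers (`offset_bytes_for`, `push_offset`) and the little-endian byte order are assumptions for illustration and are not taken from this module.

```rust
/// Hypothetical helper: bytes per offset needed to address `data_len` bytes of data.
/// Returns `None` if the data is too large for a 32-bit offset.
fn offset_bytes_for(data_len: usize) -> Option<usize> {
    if data_len <= u8::MAX as usize {
        Some(1)
    } else if data_len <= u16::MAX as usize {
        Some(2)
    } else if data_len <= u32::MAX as usize {
        Some(4)
    } else {
        None
    }
}

/// Hypothetical helper: appends `offset` to `buf` using `width` bytes.
/// Little-endian order is an assumption of this sketch; `width` must be at most 8.
fn push_offset(buf: &mut Vec<u8>, offset: usize, width: usize) {
    // Widen to u64 first, then keep only the low `width` bytes.
    buf.extend_from_slice(&(offset as u64).to_le_bytes()[..width]);
}
```

A narrower width saves space on short rows, since one offset is stored per column.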

Constants

COLUMN_ON_STACK 🔒
When a row has no more than this many columns, some intermediate buffers are allocated on the stack.

Traits

Encode 🔒
A trait unifying ToDatumRef and already encoded bytes.

Functions

encode_column_ids 🔒
try_drop_invalid_columns
Deserializes the row in encoded_bytes, drops columns not in valid_column_ids, then re-serializes and returns the result. If no column is dropped, returns None.
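The contract of try_drop_invalid_columns (return a rewritten row only when something was actually dropped) can be sketched on a toy model. Everything here is hypothetical: `ToyRow` stands in for a decoded row as (column id, datum bytes) pairs, and `drop_invalid_columns_toy` is not the real function or its signature.

```rust
use std::collections::HashSet;

/// Toy stand-in for a decoded row: (column id, encoded datum bytes) pairs.
type ToyRow = Vec<(i32, Vec<u8>)>;

/// Keeps only the columns whose id is in `valid_column_ids`.
/// Returns `None` when nothing was dropped, mirroring the documented behavior.
fn drop_invalid_columns_toy(row: &ToyRow, valid_column_ids: &HashSet<i32>) -> Option<ToyRow> {
    // Fast path: every column is still valid, so the caller can keep the original bytes.
    if row.iter().all(|(id, _)| valid_column_ids.contains(id)) {
        return None;
    }
    // Otherwise rebuild the row from the surviving columns, preserving their order.
    Some(
        row.iter()
            .filter(|(id, _)| valid_column_ids.contains(id))
            .cloned()
            .collect(),
    )
}
```

Returning `None` in the unchanged case lets a caller skip rewriting rows that are already clean.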

Type Aliases

EncodedColumnIds 🔒