Column-aware row encoding is an encoding format that converts a row into a binary form that can still be interpreted after schema changes.
The current design of the flag carries just one meaningful piece of information: its 2 LSBs represent the size of the offsets: u8/u16/u32.
We have a Serializer and a Deserializer for each schema of Row, which can be reused until the schema changes.
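To make the flag description concrete, here is a minimal sketch of reading an offset width from the 2 least significant bits of a flag byte and of choosing the narrowest width for a given row. The type name `OffsetWidthSketch` and the specific bit patterns (`0b01`, `0b10`, `0b11`) are assumptions made for illustration; the module's actual constants may differ.

```rust
/// Width of one offset entry. Hypothetical stand-in for the module's own type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OffsetWidthSketch {
    U8,
    U16,
    U32,
}

impl OffsetWidthSketch {
    /// Reads the width from the 2 LSBs of the flag byte; other bits are ignored.
    /// The bit patterns below are assumed, not taken from the real format.
    pub fn from_flag(flag: u8) -> Option<Self> {
        match flag & 0b11 {
            0b01 => Some(OffsetWidthSketch::U8),
            0b10 => Some(OffsetWidthSketch::U16),
            0b11 => Some(OffsetWidthSketch::U32),
            _ => None, // 0b00 is treated as invalid in this sketch
        }
    }

    /// Number of bytes occupied by one offset.
    pub fn bytes(self) -> usize {
        match self {
            OffsetWidthSketch::U8 => 1,
            OffsetWidthSketch::U16 => 2,
            OffsetWidthSketch::U32 => 4,
        }
    }

    /// Picks the narrowest width that can represent offsets up to `max_offset`.
    pub fn narrowest_for(max_offset: usize) -> Option<Self> {
        if max_offset <= u8::MAX as usize {
            Some(OffsetWidthSketch::U8)
        } else if max_offset <= u16::MAX as usize {
            Some(OffsetWidthSketch::U16)
        } else if max_offset <= u32::MAX as usize {
            Some(OffsetWidthSketch::U32)
        } else {
            None
        }
    }
}
```

Choosing the width per row keeps small rows compact, while rows with large values can still address all of their data.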
Modules
- data_types 🔒
- new_serde 🔒 - Serialize and deserialize functions that recursively use ColumnAwareSerde for nested fields of composite types.
Structs
- ColumnAwareSerde - Combined column-aware Serializer and Deserializer given the same column_ids and schema.
- Deserializer - Column-Aware Deserializer holds the needed ColumnIds and their corresponding schema. Should non-null default values be specified, a new field could be added to Deserializer.
- EncodedBytes 🔒 - A view of the encoded bytes, which can be iterated over to get the column id and data. Used for deserialization.
- Header 🔒 - Header (metadata) of the encoded row.
- RowEncoding 🔒 - RowEncoding holds row-specific information for Column-Aware Encoding.
- Serializer - Column-Aware Serializer holds schema-related information, and shall be created again once the schema changes.
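The following toy sketch illustrates why storing column ids next to the data lets a row outlive schema changes. It is not the module's wire format or API: `ToySerializer`, `ToyDeserializer`, the fixed 4-byte offsets, and the restriction to `Option<i64>` values are all assumptions chosen to keep the example short.

```rust
use std::collections::HashMap;

/// Reads a little-endian u32 starting at `at`.
fn read_u32(bytes: &[u8], at: usize) -> u32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&bytes[at..at + 4]);
    u32::from_le_bytes(buf)
}

/// Toy serializer: remembers the column ids of the writer schema.
pub struct ToySerializer {
    column_ids: Vec<i32>,
}

impl ToySerializer {
    pub fn new(column_ids: Vec<i32>) -> Self {
        ToySerializer { column_ids }
    }

    /// Assumed layout: [n: u32][n column ids: i32][n end offsets: u32][data].
    /// A null value contributes no data bytes.
    pub fn serialize(&self, row: &[Option<i64>]) -> Vec<u8> {
        assert_eq!(row.len(), self.column_ids.len());
        let mut out = Vec::new();
        out.extend_from_slice(&(row.len() as u32).to_le_bytes());
        for id in &self.column_ids {
            out.extend_from_slice(&id.to_le_bytes());
        }
        let mut data = Vec::new();
        for value in row {
            if let Some(v) = value {
                data.extend_from_slice(&v.to_le_bytes());
            }
            // End offset of this column within the data section.
            out.extend_from_slice(&(data.len() as u32).to_le_bytes());
        }
        out.extend_from_slice(&data);
        out
    }
}

/// Toy deserializer: knows which column ids the reader schema needs.
pub struct ToyDeserializer {
    /// Column id -> position in the reader schema.
    needed: HashMap<i32, usize>,
}

impl ToyDeserializer {
    pub fn new(needed_column_ids: &[i32]) -> Self {
        let needed: HashMap<i32, usize> = needed_column_ids
            .iter()
            .enumerate()
            .map(|(i, id)| (*id, i))
            .collect();
        ToyDeserializer { needed }
    }

    /// Columns absent from the encoded row come back as `None`;
    /// encoded columns the reader does not need are skipped.
    pub fn deserialize(&self, bytes: &[u8]) -> Vec<Option<i64>> {
        let n = read_u32(bytes, 0) as usize;
        let ids_at = 4;
        let offsets_at = ids_at + 4 * n;
        let data = &bytes[offsets_at + 4 * n..];
        let mut row = vec![None; self.needed.len()];
        let mut start = 0usize;
        for i in 0..n {
            let id = read_u32(bytes, ids_at + 4 * i) as i32;
            let end = read_u32(bytes, offsets_at + 4 * i) as usize;
            if let Some(&pos) = self.needed.get(&id) {
                if end > start {
                    let mut buf = [0u8; 8];
                    buf.copy_from_slice(&data[start..end]);
                    row[pos] = Some(i64::from_le_bytes(buf));
                }
            }
            start = end;
        }
        row
    }
}
```

For instance, a row written with column ids `[1, 2, 3]` can be read by a `ToyDeserializer` built for `[3, 4, 1]`: the dropped column 2 is skipped, the new column 4 reads as `None`, and the reordering is handled by the id lookup.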
Enums
- ColumnMapping 🔒 - A mapping from column id to the index of the column in the schema.
- OffsetWidth 🔒 - The width of the offset of the encoded data, i.e., how many bytes are used to represent the offset.
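As a rough illustration of a column-id-to-index mapping expressed as an enum, the sketch below uses a linear scan for short schemas and a hash map for long ones. The variant names, the `SMALL_LIMIT` threshold, and the two-strategy split are assumptions; this page does not state how the private `ColumnMapping` type is actually implemented.

```rust
use std::collections::HashMap;

/// Hypothetical threshold below which a linear scan is used.
const SMALL_LIMIT: usize = 8;

/// Hypothetical mapping from column id to index in the schema.
pub enum ColumnMappingSketch {
    /// Few columns: scan a short list of ids.
    Small(Vec<i32>),
    /// Many columns: hash lookup from id to index.
    Large(HashMap<i32, usize>),
}

impl ColumnMappingSketch {
    pub fn new(column_ids: &[i32]) -> Self {
        if column_ids.len() <= SMALL_LIMIT {
            ColumnMappingSketch::Small(column_ids.to_vec())
        } else {
            let map: HashMap<i32, usize> = column_ids
                .iter()
                .enumerate()
                .map(|(i, id)| (*id, i))
                .collect();
            ColumnMappingSketch::Large(map)
        }
    }

    /// Returns the schema index of `id`, or `None` if the column is unknown.
    pub fn get(&self, id: i32) -> Option<usize> {
        match self {
            ColumnMappingSketch::Small(ids) => ids.iter().position(|x| *x == id),
            ColumnMappingSketch::Large(map) => map.get(&id).copied(),
        }
    }
}
```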
Constants
- COLUMN_ON_STACK 🔒 - When a row has no more columns than this number, we will use the stack for some intermediate buffers.
Traits
- Encode 🔒 - A trait unifying ToDatumRef and already encoded bytes.
Functions
- encode_column_ids 🔒
- try_drop_invalid_columns - Deserializes row encoded_bytes, drops columns not in valid_column_ids, serializes and returns. If no column is dropped, returns None.
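The contract of try_drop_invalid_columns (return the re-encoded row only when something was dropped) can be sketched on a toy format. Everything here is hypothetical: `encode_toy`, `try_drop_invalid_columns_sketch`, and the `[id: i32][len: u32][data]` entry layout are illustrative stand-ins, not the function's real signature or the real encoding.

```rust
use std::collections::HashSet;

/// Reads 4 bytes starting at `at`.
fn read4(bytes: &[u8], at: usize) -> [u8; 4] {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&bytes[at..at + 4]);
    buf
}

/// Toy encoding: a sequence of [id: i32 LE][len: u32 LE][data] entries.
pub fn encode_toy(columns: &[(i32, &[u8])]) -> Vec<u8> {
    let mut out = Vec::new();
    for (id, data) in columns {
        out.extend_from_slice(&id.to_le_bytes());
        out.extend_from_slice(&(data.len() as u32).to_le_bytes());
        out.extend_from_slice(data);
    }
    out
}

/// Keeps only entries whose id is in `valid_column_ids`.
/// Returns `None` when nothing was dropped, so callers can keep the original bytes.
pub fn try_drop_invalid_columns_sketch(
    encoded: &[u8],
    valid_column_ids: &HashSet<i32>,
) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(encoded.len());
    let mut dropped = false;
    let mut at = 0;
    while at < encoded.len() {
        let id = i32::from_le_bytes(read4(encoded, at));
        let len = u32::from_le_bytes(read4(encoded, at + 4)) as usize;
        let end = at + 8 + len;
        if valid_column_ids.contains(&id) {
            out.extend_from_slice(&encoded[at..end]);
        } else {
            dropped = true;
        }
        at = end;
    }
    if dropped {
        Some(out)
    } else {
        None
    }
}
```

Returning `None` in the unchanged case lets the caller avoid allocating and storing a second copy of a row that is already valid.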
Type Aliases
- EncodedColumnIds 🔒