Column-aware row encoding is an encoding format that converts a row into a binary form that
can still be interpreted after schema changes.
In the current design, the flag carries only one meaningful piece of information: its 2 least significant bits (LSBs)
encode the size of the offsets (u8, u16, or u32).
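To make the flag concrete, here is a minimal sketch of mapping the 2 LSBs to an offset width and of choosing the narrowest width for a given largest offset. The bit assignments (`0b01` for u8, `0b10` for u16, `0b11` for u32) and every name below are assumptions for illustration, not taken from this module.

```rust
/// Hypothetical offset widths; the module's real `OffsetWidth` may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OffsetWidth {
    U8,
    U16,
    U32,
}

impl OffsetWidth {
    /// Number of bytes used to store each offset.
    fn bytes(self) -> usize {
        match self {
            OffsetWidth::U8 => 1,
            OffsetWidth::U16 => 2,
            OffsetWidth::U32 => 4,
        }
    }

    /// Assumed bit pattern written into the 2 LSBs of the flag.
    fn to_bits(self) -> u8 {
        match self {
            OffsetWidth::U8 => 0b01,
            OffsetWidth::U16 => 0b10,
            OffsetWidth::U32 => 0b11,
        }
    }

    /// Reads the width from a flag byte; all bits except the 2 LSBs are ignored.
    fn from_flag(flag: u8) -> Option<Self> {
        match flag & 0b11 {
            0b01 => Some(OffsetWidth::U8),
            0b10 => Some(OffsetWidth::U16),
            0b11 => Some(OffsetWidth::U32),
            _ => None, // 0b00 is treated as invalid in this sketch
        }
    }

    /// Picks the narrowest width that can represent `max_offset`.
    fn for_max_offset(max_offset: usize) -> Self {
        if max_offset <= u8::MAX as usize {
            OffsetWidth::U8
        } else if max_offset <= u16::MAX as usize {
            OffsetWidth::U16
        } else {
            OffsetWidth::U32
        }
    }
}
```

Because only the 2 LSBs are read, the remaining 6 bits of the flag stay free for future use in this sketch.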
There is a `Serializer` and a `Deserializer` for each row schema; both can be reused
until the schema changes.
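A minimal sketch of a schema-bound serializer that is built once from the column ids and then reused for every row. Everything here is hypothetical: datums are plain `i64` values, NULL datums are simply omitted, and the byte layout (flag, `u32` datum count, `i32` column ids, offsets, data) is an assumed one chosen for illustration.

```rust
/// Hypothetical schema-bound serializer: built once from a schema's column ids
/// and reused for every row until the schema changes.
struct Serializer {
    column_ids: Vec<i32>,
}

impl Serializer {
    fn new(column_ids: &[i32]) -> Self {
        Serializer { column_ids: column_ids.to_vec() }
    }

    /// Encodes one row of `i64` datums (`None` = NULL) using the assumed layout
    /// flag | datum count (u32 LE) | column ids (i32 LE) | offsets | data.
    /// NULL datums are omitted entirely.
    fn serialize(&self, row: &[Option<i64>]) -> Vec<u8> {
        assert_eq!(row.len(), self.column_ids.len(), "row does not match the schema");
        let mut ids: Vec<i32> = Vec::new();
        let mut offsets: Vec<usize> = Vec::new();
        let mut data: Vec<u8> = Vec::new();
        for (id, datum) in self.column_ids.iter().zip(row) {
            if let Some(v) = datum {
                ids.push(*id);
                offsets.push(data.len()); // start of this datum inside `data`
                data.extend_from_slice(&v.to_le_bytes());
            }
        }
        // Narrowest offset width; its code goes into the flag's 2 LSBs.
        let max_offset = offsets.last().copied().unwrap_or(0);
        let (flag, width): (u8, usize) = if max_offset <= u8::MAX as usize {
            (0b01, 1)
        } else if max_offset <= u16::MAX as usize {
            (0b10, 2)
        } else {
            (0b11, 4)
        };
        let mut out = vec![flag];
        out.extend_from_slice(&(ids.len() as u32).to_le_bytes());
        for id in &ids {
            out.extend_from_slice(&id.to_le_bytes());
        }
        for off in &offsets {
            // Keep only the low `width` bytes of the little-endian offset.
            out.extend_from_slice(&(*off as u32).to_le_bytes()[..width]);
        }
        out.extend_from_slice(&data);
        out
    }
}
```

The per-schema state (here only the column ids) is computed once in `new`; each `serialize` call touches only row-specific data, which is why the serializer only needs rebuilding when the schema changes.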
Modules
- `data_types` 🔒
- `new_serde` 🔒 - Serialize and deserialize functions that recursively use `ColumnAwareSerde` for nested fields of composite types.
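The recursion described for `new_serde` can be sketched with a hypothetical, simplified format that is not the module's own: each non-NULL struct field is written as an (id, length, payload) triple whose payload is produced by recursing into the field, and the reader matches fields by id against its own type, skipping ids it does not know and leaving missing fields as NULL. `Datum`, `DataType`, `encode`, and `decode` are illustrative names only.

```rust
/// Hypothetical datum: a scalar or a struct whose fields carry their own ids.
#[derive(Debug, Clone, PartialEq)]
enum Datum {
    Int(i64),
    Struct(Vec<(i32, Option<Datum>)>),
}

/// Hypothetical type description the reader decodes against.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int,
    Struct(Vec<(i32, DataType)>),
}

/// Encodes a datum; each non-NULL struct field becomes (id, len, payload),
/// where the payload comes from recursing into the field.
fn encode(datum: &Datum, out: &mut Vec<u8>) {
    match datum {
        Datum::Int(v) => out.extend_from_slice(&v.to_le_bytes()),
        Datum::Struct(fields) => {
            let present: Vec<(i32, &Datum)> = fields
                .iter()
                .filter_map(|(id, d)| d.as_ref().map(|d| (*id, d)))
                .collect();
            out.extend_from_slice(&(present.len() as u32).to_le_bytes());
            for (id, d) in present {
                let mut payload = Vec::new();
                encode(d, &mut payload); // nested field: same scheme, one level down
                out.extend_from_slice(&id.to_le_bytes());
                out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
                out.extend_from_slice(&payload);
            }
        }
    }
}

fn read_u32(bytes: &[u8], pos: &mut usize) -> u32 {
    let v = u32::from_le_bytes(bytes[*pos..*pos + 4].try_into().unwrap());
    *pos += 4;
    v
}

/// Decodes against the reader's type: fields are matched by id, unknown ids
/// are skipped, and fields missing from the bytes stay `None` (NULL).
fn decode(ty: &DataType, bytes: &[u8]) -> Datum {
    match ty {
        DataType::Int => Datum::Int(i64::from_le_bytes(bytes[..8].try_into().unwrap())),
        DataType::Struct(field_types) => {
            let mut fields: Vec<(i32, Option<Datum>)> =
                field_types.iter().map(|(id, _)| (*id, None)).collect();
            let mut pos = 0;
            let n = read_u32(bytes, &mut pos);
            for _ in 0..n {
                let id = read_u32(bytes, &mut pos) as i32;
                let len = read_u32(bytes, &mut pos) as usize;
                let payload = &bytes[pos..pos + len];
                pos += len;
                if let Some(i) = field_types.iter().position(|(fid, _)| *fid == id) {
                    fields[i].1 = Some(decode(&field_types[i].1, payload));
                }
            }
            Datum::Struct(fields)
        }
    }
}
```

Because nested fields reuse the id-based scheme, a reader whose struct type gained or lost fields still decodes the bytes it understands.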
Structs
- `ColumnAwareSerde` - Combined column-aware `Serializer` and `Deserializer` given the same `column_ids` and `schema`.
- `Deserializer` - Column-aware `Deserializer` holds the needed `ColumnId`s and their corresponding schema. Should non-null default values be specified, a new field could be added to `Deserializer`.
- `EncodedBytes` 🔒 - A view of the encoded bytes, which can be iterated over to get the column id and data. Used for deserialization.
- `Header` 🔒 - Header (metadata) of the encoded row.
- `RowEncoding` 🔒 - `RowEncoding` holds row-specific information for column-aware encoding.
- `Serializer` - Column-aware `Serializer` holds schema-related information and must be recreated once the schema changes.
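Reading a row back can be sketched as a view that parses the header once and yields `(column id, bytes)` pairs, plus a deserializer that keeps only the column ids it needs. The layout (flag, `u32` datum count, `i32` column ids, 1/2/4-byte offsets, data), the `i64`-only datums, and the NULL-for-missing rule are assumptions for illustration; the names mirror `EncodedBytes` and `Deserializer` but these are not their real definitions.

```rust
/// Hypothetical view over one encoded row with the assumed layout
/// flag | count (u32 LE) | column ids (i32 LE) | offsets (1/2/4 bytes) | data.
struct EncodedBytes<'a> {
    column_ids: Vec<i32>,
    offsets: Vec<usize>,
    data: &'a [u8],
}

impl<'a> EncodedBytes<'a> {
    /// Parses the header; returns `None` on an unknown flag or truncated input.
    fn parse(bytes: &'a [u8]) -> Option<Self> {
        // 2 LSBs of the flag give the offset width (assumed 01/10/11 -> 1/2/4 bytes).
        let width: usize = match *bytes.first()? & 0b11 {
            0b01 => 1,
            0b10 => 2,
            0b11 => 4,
            _ => return None,
        };
        let count = u32::from_le_bytes(bytes.get(1..5)?.try_into().ok()?) as usize;
        let mut pos = 5;
        let mut column_ids = Vec::new();
        for _ in 0..count {
            column_ids.push(i32::from_le_bytes(bytes.get(pos..pos + 4)?.try_into().ok()?));
            pos += 4;
        }
        let mut offsets = Vec::new();
        for _ in 0..count {
            // Widen each 1/2/4-byte little-endian offset to u32.
            let mut buf = [0u8; 4];
            buf[..width].copy_from_slice(bytes.get(pos..pos + width)?);
            offsets.push(u32::from_le_bytes(buf) as usize);
            pos += width;
        }
        Some(EncodedBytes { column_ids, offsets, data: bytes.get(pos..)? })
    }

    /// Yields (column id, datum bytes); each datum ends where the next begins.
    /// Assumes well-formed, increasing offsets.
    fn iter(&self) -> impl Iterator<Item = (i32, &'a [u8])> + '_ {
        let data = self.data;
        (0..self.column_ids.len()).map(move |i| {
            let start = self.offsets[i];
            let end = self.offsets.get(i + 1).copied().unwrap_or(data.len());
            (self.column_ids[i], &data[start..end])
        })
    }
}

/// Hypothetical deserializer holding the column ids the reader needs.
struct Deserializer {
    needed: Vec<i32>,
}

impl Deserializer {
    /// One slot per needed column; columns absent from the row stay `None` (NULL).
    fn deserialize(&self, bytes: &[u8]) -> Option<Vec<Option<i64>>> {
        let view = EncodedBytes::parse(bytes)?;
        let mut row = vec![None; self.needed.len()];
        for (id, datum) in view.iter() {
            // Columns the reader does not need (e.g. dropped ones) are ignored.
            if let Some(i) = self.needed.iter().position(|n| *n == id) {
                row[i] = Some(i64::from_le_bytes(datum.try_into().ok()?));
            }
        }
        Some(row)
    }
}
```

Matching by column id rather than by position is what lets a reader with a newer schema skip dropped columns and report newly added ones as NULL in this sketch.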
Enums
- `ColumnMapping` 🔒 - A mapping from column id to the index of the column in the schema.
- `OffsetWidth` 🔒 - The width of the offsets in the encoded data, i.e., how many bytes are used to represent each offset.
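One plausible shape for a column-id-to-index mapping is sketched below: a linear scan over a `Vec` for narrow schemas and a `HashMap` for wide ones. The two-variant split and the `SMALL_THRESHOLD` value are assumptions for illustration, not the module's actual definition of `ColumnMapping`.

```rust
use std::collections::HashMap;

/// Hypothetical cut-off between the two representations.
const SMALL_THRESHOLD: usize = 8;

/// Hypothetical column id -> schema index mapping.
enum ColumnMapping {
    /// Few columns: scan the ids in schema order.
    Small(Vec<i32>),
    /// Many columns: hash lookup.
    Large(HashMap<i32, usize>),
}

impl ColumnMapping {
    fn new(column_ids: &[i32]) -> Self {
        if column_ids.len() <= SMALL_THRESHOLD {
            ColumnMapping::Small(column_ids.to_vec())
        } else {
            ColumnMapping::Large(column_ids.iter().enumerate().map(|(i, id)| (*id, i)).collect())
        }
    }

    /// Schema index of `column_id`, or `None` if the schema lacks that column.
    fn get(&self, column_id: i32) -> Option<usize> {
        match self {
            ColumnMapping::Small(ids) => ids.iter().position(|id| *id == column_id),
            ColumnMapping::Large(map) => map.get(&column_id).copied(),
        }
    }
}
```

For a handful of `i32` ids a linear scan is typically cheap, which is the usual motivation for such a split; the right threshold would need measuring.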
Constants
- `COLUMN_ON_STACK` 🔒 - When a row has no more than this many columns, the stack is used for some intermediate buffers.
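The idea behind `COLUMN_ON_STACK` can be sketched with the standard library alone: an intermediate buffer for a narrow row lives in a fixed-size stack array, and wider rows fall back to a heap `Vec`. The threshold value of 64 and the `with_offsets` helper are hypothetical.

```rust
/// Hypothetical threshold in the spirit of `COLUMN_ON_STACK`; the real value may differ.
const COLUMN_ON_STACK: usize = 64;

/// Computes the start offset of each datum from the datum lengths and passes
/// the offsets to `f`, using a stack array for narrow rows and a `Vec` otherwise.
fn with_offsets<R>(lens: &[usize], f: impl FnOnce(&[usize]) -> R) -> R {
    fn fill(lens: &[usize], out: &mut [usize]) {
        let mut acc = 0;
        for (slot, len) in out.iter_mut().zip(lens) {
            *slot = acc;
            acc += len;
        }
    }
    if lens.len() <= COLUMN_ON_STACK {
        let mut buf = [0usize; COLUMN_ON_STACK]; // no heap allocation
        let out = &mut buf[..lens.len()];
        fill(lens, out);
        f(out)
    } else {
        let mut buf = vec![0usize; lens.len()]; // heap fallback for wide rows
        fill(lens, &mut buf);
        f(&buf)
    }
}
```

Passing the buffer to a closure keeps the stack array's lifetime confined to the call, so callers never see which branch was taken.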
Traits
- `Encode` 🔒 - A trait unifying `ToDatumRef` and already-encoded bytes.
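A minimal sketch of why such a trait helps: one encoding routine can accept values that still need encoding as well as byte slices that are already encoded, copying the latter verbatim. The trait shape, the `i64` impl standing in for a `ToDatumRef` type, the `Encoded` wrapper, and `encode_all` are all hypothetical.

```rust
/// Hypothetical stand-in for `Encode`: anything that can append its encoded form.
trait Encode {
    fn encode_to(&self, out: &mut Vec<u8>);
}

/// A decoded value that must be encoded (plays the role of a `ToDatumRef` type).
impl Encode for i64 {
    fn encode_to(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.to_le_bytes());
    }
}

/// Bytes that are already encoded; copied verbatim, with no decode/encode round trip.
struct Encoded<'a>(&'a [u8]);

impl Encode for Encoded<'_> {
    fn encode_to(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(self.0);
    }
}

/// Writes datums back to back and returns (data, start offset of each datum).
fn encode_all<E: Encode>(datums: &[E]) -> (Vec<u8>, Vec<usize>) {
    let mut data = Vec::new();
    let mut offsets = Vec::with_capacity(datums.len());
    for d in datums {
        offsets.push(data.len());
        d.encode_to(&mut data);
    }
    (data, offsets)
}
```

Under these assumptions, re-encoding a subset of columns from existing bytes produces the same output as encoding the original values, without decoding them first.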
Functions
- `encode_column_ids` 🔒
- `try_drop_invalid_columns` - Deserializes the row `encoded_bytes`, drops columns not in `valid_column_ids`, then serializes and returns the result. If no column is dropped, returns `None`.
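The contract of `try_drop_invalid_columns` can be illustrated with a simplified, hypothetical row format (a `u32` count followed by id/length/payload triples) rather than the module's real layout: decode, keep only ids in `valid_column_ids`, re-encode, and return `None` when nothing was dropped. The helper names `encode` and `decode` are illustrative only.

```rust
use std::collections::HashSet;

/// Simplified, hypothetical row format for this sketch only:
/// count (u32 LE), then per column: id (i32 LE), len (u32 LE), payload.
fn encode(cols: &[(i32, &[u8])]) -> Vec<u8> {
    let mut out = (cols.len() as u32).to_le_bytes().to_vec();
    for (id, payload) in cols {
        out.extend_from_slice(&id.to_le_bytes());
        out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
        out.extend_from_slice(payload);
    }
    out
}

fn decode(bytes: &[u8]) -> Option<Vec<(i32, &[u8])>> {
    let read = |pos: usize| -> Option<[u8; 4]> { bytes.get(pos..pos + 4)?.try_into().ok() };
    let n = u32::from_le_bytes(read(0)?) as usize;
    let mut pos = 4;
    let mut cols = Vec::new();
    for _ in 0..n {
        let id = i32::from_le_bytes(read(pos)?);
        let len = u32::from_le_bytes(read(pos + 4)?) as usize;
        pos += 8;
        cols.push((id, bytes.get(pos..pos + len)?));
        pos += len;
    }
    Some(cols)
}

/// Keeps only columns whose id is in `valid_column_ids`. Returns `None` when
/// nothing is dropped (and, in this sketch, also on malformed input).
fn try_drop_invalid_columns(encoded: &[u8], valid_column_ids: &HashSet<i32>) -> Option<Vec<u8>> {
    let cols = decode(encoded)?;
    let kept: Vec<(i32, &[u8])> = cols
        .iter()
        .copied()
        .filter(|(id, _)| valid_column_ids.contains(id))
        .collect();
    if kept.len() == cols.len() {
        return None; // no column dropped: caller can keep the original bytes
    }
    Some(encode(&kept))
}
```

Returning `None` for the common "nothing to drop" case lets a caller avoid allocating a new buffer and keep the original bytes.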
Type Aliases
- `EncodedColumnIds` 🔒