Column-aware row encoding is an encoding format that converts a row into a binary form that
remains interpretable after schema changes.
The current design of the flag carries only one piece of meaningful information: its 2 LSBs represent
the size of the offsets (u8, u16, or u32).
We have a Serializer and a Deserializer for each schema of Row, which can be reused
until the schema changes.
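As an illustration of how an offset width could be packed into the 2 LSBs of the flag, here is a minimal sketch. The `Width` type, its method names, and the concrete bit patterns (`0b01`, `0b10`, `0b11`) are assumptions made for this example and are not taken from the crate's source.

```rust
/// Hypothetical stand-in for the offset width stored in the flag byte.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Width {
    U8,
    U16,
    U32,
}

impl Width {
    /// Pick the narrowest width that can hold the largest offset.
    /// Returns `None` if the offset does not fit into a u32.
    fn for_max_offset(max: usize) -> Option<Width> {
        if max <= u8::MAX as usize {
            Some(Width::U8)
        } else if max <= u16::MAX as usize {
            Some(Width::U16)
        } else if max as u64 <= u32::MAX as u64 {
            Some(Width::U32)
        } else {
            None
        }
    }

    /// Encode the width into the 2 LSBs (assumed bit patterns).
    fn to_flag(self) -> u8 {
        match self {
            Width::U8 => 0b01,
            Width::U16 => 0b10,
            Width::U32 => 0b11,
        }
    }

    /// Decode the width from a flag byte, ignoring all higher bits.
    fn from_flag(flag: u8) -> Option<Width> {
        match flag & 0b11 {
            0b01 => Some(Width::U8),
            0b10 => Some(Width::U16),
            0b11 => Some(Width::U32),
            _ => None,
        }
    }

    /// Number of bytes used by one offset of this width.
    fn bytes(self) -> usize {
        match self {
            Width::U8 => 1,
            Width::U16 => 2,
            Width::U32 => 4,
        }
    }
}
```

Choosing the narrowest width per row keeps the offset table small for short rows, while masking with `0b11` leaves the remaining flag bits free for future use.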
Modules
- data_types 🔒
- new_serde 🔒
- Serialize and deserialize functions that recursively use ColumnAwareSerde for nested fields of composite types.
Structs
- ColumnAwareSerde
- Combined column-aware Serializer and Deserializer given the same column_ids and schema.
- Deserializer
- Column-Aware Deserializer holds needed ColumnIds and their corresponding schema. Should non-null default values be specified, a new field could be added to Deserializer.
- EncodedBytes 🔒
- A view of the encoded bytes, which can be iterated over to get the column id and data. Used for deserialization.
- Header 🔒
- Header (metadata) of the encoded row.
- RowEncoding 🔒
- RowEncoding holds row-specific information for Column-Aware Encoding.
- Serializer
- Column-Aware Serializer holds schema-related information, and shall be created again once the schema changes.
Enums
- ColumnMapping 🔒
- A mapping from column id to the index of the column in the schema.
- OffsetWidth 🔒
- The width of the offset of the encoded data, i.e., how many bytes are used to represent the offset.
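A mapping from column id to schema index can be sketched as an enum with two lookup strategies. Everything below is hypothetical: the `IdToIndex` name, its variants, and the `SMALL_LIMIT` threshold are assumptions chosen for illustration, and the actual variants of ColumnMapping may differ.

```rust
use std::collections::HashMap;

/// Hypothetical threshold below which a linear scan is used.
const SMALL_LIMIT: usize = 8;

/// Hypothetical mapping from column id to its index in the schema.
enum IdToIndex {
    /// Few columns: a linear scan over (id, index) pairs.
    Small(Vec<(i32, usize)>),
    /// Many columns: a hash map lookup.
    Large(HashMap<i32, usize>),
}

impl IdToIndex {
    /// Build the mapping from column ids listed in schema order.
    fn new(column_ids: &[i32]) -> Self {
        let pairs = column_ids
            .iter()
            .copied()
            .enumerate()
            .map(|(index, id)| (id, index));
        if column_ids.len() <= SMALL_LIMIT {
            IdToIndex::Small(pairs.collect())
        } else {
            IdToIndex::Large(pairs.collect())
        }
    }

    /// Look up the schema index of a column id, if it is present.
    fn get(&self, id: i32) -> Option<usize> {
        match self {
            IdToIndex::Small(pairs) => pairs
                .iter()
                .find(|(key, _)| *key == id)
                .map(|(_, index)| *index),
            IdToIndex::Large(map) => map.get(&id).copied(),
        }
    }
}
```

A deserializer could consult such a mapping for every column id found in an encoded row and skip ids that return `None`, which is one way rows written under an older schema can stay readable.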
Constants
- COLUMN_ON_STACK 🔒
- When a row has no more columns than this number, we use the stack for some intermediate buffers.
Traits
- Encode 🔒
- A trait unifying ToDatumRef and already encoded bytes.
Functions
- encode_column_ids 🔒
- try_drop_invalid_columns
- Deserializes row encoded_bytes, drops columns not in valid_column_ids, serializes and returns. If no column is dropped, returns None.
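The behaviour described for try_drop_invalid_columns can be sketched on a toy format. The layout used here (`[n: u32][ids: n × i32][end offsets: n × u32][data]`, all little-endian) and the helper names `encode`, `decode`, and `try_drop` are assumptions for illustration only; the crate's real header, offset widths, and signatures are not reproduced.

```rust
use std::collections::HashSet;

/// Read a little-endian u32 starting at byte position `at`.
fn read_u32(b: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([b[at], b[at + 1], b[at + 2], b[at + 3]])
}

/// Toy encoder: column count, column ids, end offsets, then the data.
fn encode(cols: &[(i32, &[u8])]) -> Vec<u8> {
    let mut out = (cols.len() as u32).to_le_bytes().to_vec();
    for (id, _) in cols {
        out.extend_from_slice(&id.to_le_bytes());
    }
    let mut end = 0u32;
    for (_, data) in cols {
        end += data.len() as u32;
        out.extend_from_slice(&end.to_le_bytes());
    }
    for (_, data) in cols {
        out.extend_from_slice(data);
    }
    out
}

/// Toy decoder: a borrowed view of (column id, data) pairs.
fn decode(bytes: &[u8]) -> Vec<(i32, &[u8])> {
    let n = read_u32(bytes, 0) as usize;
    let data = &bytes[4 + 8 * n..];
    let mut start = 0usize;
    (0..n)
        .map(|i| {
            let id = read_u32(bytes, 4 + 4 * i) as i32;
            let end = read_u32(bytes, 4 + 4 * n + 4 * i) as usize;
            let col = (id, &data[start..end]);
            start = end;
            col
        })
        .collect()
}

/// Drop columns whose id is not in `valid`; `None` if nothing was dropped.
fn try_drop(bytes: &[u8], valid: &HashSet<i32>) -> Option<Vec<u8>> {
    let cols = decode(bytes);
    if cols.iter().all(|(id, _)| valid.contains(id)) {
        return None;
    }
    let kept: Vec<(i32, &[u8])> = cols
        .into_iter()
        .filter(|(id, _)| valid.contains(id))
        .collect();
    Some(encode(&kept))
}
```

Returning `None` when nothing is dropped lets a caller keep the original bytes and avoid a needless re-serialization.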
Type Aliases
- EncodedColumnIds 🔒