Expand description
Per-sink Iceberg V3 commit coordinator. This is a plain struct (no background task / mpsc): the
crate::manager::iceberg_v3_sink::IcebergV3SinkManager holds one per registered sink behind a
per-sink async mutex and calls IcebergV3Coordinator::pre_commit / IcebergV3Coordinator::commit
directly from the barrier-completion path. The barrier path already serializes pre-commit/commit per
epoch, so the coordinator never needs its own request queue.
IcebergV3Coordinator::init is synchronous with respect to registration: it loads the iceberg
catalog/table, reads pending_sink_state, and drains any recovered pending epoch via an iceberg
overwrite_files transaction BEFORE returning a ready coordinator. Only once init completes does the
sink start accepting live pre-commit/commit calls.
The two phases dispatched from complete_barrier:
pre_commitβ aggregate the reports, generate asnapshot_id, persist the merged file list underpending_sink_state, return. No iceberg I/O.commitβ run an icebergoverwrite_filestransaction (keyed on the pre-generatedsnapshot_idfor idempotency), then mark the row Committed and prune the prior epochβs row.
On retry-exhausted commit failure the error propagates to the caller; the barrier then fails and the
meta-recovery path drops the coordinator, re-registers it (re-running init), and retries from the
persisted pending_sink_state rows.
StructsΒ§
- Epoch
Commit π - One epochβs worth of pre-committed state queued inside the coordinator. Holds the decoded merged file
list and the pre-generated
snapshot_id. The blob form (PbIcebergV3PreCommitState) is only materialized when persisting topending_sink_state; in-memory we keep the structured form. - Iceberg
V3Agg πResult - Iceberg
V3Coordinator - Per-sink Iceberg V3 commit coordinator. Owns the loaded iceberg catalog/table (reused across commits)
and the meta SQL connection (used for
pending_sink_stateexactly-once persistence).
ConstantsΒ§
- INIT_
TIMEOUT π - Bound the init phase so a register call canβt hang forever if the iceberg endpoint is unreachable;
on timeout
initreturns an error and registration fails (the caller surfaces it / retries).
FunctionsΒ§
- aggregate_
reports π - align_
report_ πid - commit_
one_ πepoch - decode_
pre_ πcommit_ state - encode_
pre_ πcommit_ state - load_
catalog_ πand_ table - recovery π
- Read every persisted row for this sink, recovering
prev_committed_epochand pending commits.