Assumptions (please correct if any are wrong)
- With enable_stable_row_ids: true, stable row-ids are globally unique per dataset: no two fragments may claim the same id.
- A merge_insert that moves updated rows to a new fragment (preserving their stable ids) is expected to remove those ids from the source fragment's row-id sequence, so fragment ranges never overlap.
- scan().with_row_id() combined with a filter and existing deletions is expected to be safe.
Summary
On a stable-row-id dataset, a merge_insert that rewrites a fragment previously written by merge_insert produces fragments whose stable-row-id ranges overlap. The overlap is latent until a delete adds a deletion vector; after that, any filtered scan requesting row-ids fails:
- debug: panic in RowIdIndex::new (rust/lance-table/src/rowids/index.rs:50): assertion left == right failed: Wrong range
- release: Invalid argument error: all columns in a record batch must have the same length (rust/lance-table/src/utils/stream.rs:331), or a silently-wrong batch
A full scan (no filter) works; only filter + with_row_id fails.
Versions
Reproduced on 7.0.0 and on v9.0.0-beta.5 (latest tag); the producing and failing code paths are unchanged between them and the program below compiles unmodified on both.
Minimal reproduction
Cargo.toml:
[dependencies]
lance = { git = "https://github.com/lance-format/lance", tag = "v9.0.0-beta.5" }
arrow-array = "58"
arrow-schema = "58"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
futures = "0.3"
src/main.rs:
use std::sync::Arc;
use arrow_array::{ArrayRef, RecordBatch, RecordBatchIterator, StringArray};
use arrow_schema::{DataType, Field, Schema};
use futures::TryStreamExt;
use lance::dataset::write::merge_insert::{MergeInsertBuilder, WhenMatched, WhenNotMatched};
use lance::dataset::{Dataset, WriteMode, WriteParams};
async fn merge(ds: Dataset, batch: RecordBatch, schema: Arc<Schema>) -> Dataset {
    let mut b = MergeInsertBuilder::try_new(Arc::new(ds), vec!["slug".into()]).unwrap();
    b.when_matched(WhenMatched::UpdateAll);
    b.when_not_matched(WhenNotMatched::InsertAll);
    let (ds, _) = b.try_build().unwrap()
        .execute_reader(Box::new(RecordBatchIterator::new(vec![Ok(batch)], schema))).await.unwrap();
    (*ds).clone()
}
#[tokio::main]
async fn main() {
    let uri = "/tmp/lance-rowid-repro";
    let _ = std::fs::remove_dir_all(uri);
    let schema = Arc::new(Schema::new(vec![
        Field::new("slug", DataType::Utf8, false),
        Field::new("title", DataType::Utf8, false),
    ]));
    let mk = |a: Vec<String>, b: Vec<String>| RecordBatch::try_new(schema.clone(),
        vec![Arc::new(StringArray::from(a)) as ArrayRef, Arc::new(StringArray::from(b)) as ArrayRef]).unwrap();
    // Empty dataset WITH stable row ids.
    let params = WriteParams { mode: WriteMode::Create, enable_stable_row_ids: true, ..Default::default() };
    let ds = Dataset::write(RecordBatchIterator::new(vec![Ok(mk(vec![], vec![]))], schema.clone()), uri, Some(params)).await.unwrap();
    // Seed via merge_insert (40 rows), then merge_insert-UPDATE 15 of them (merge-on-merge).
    let ds = merge(ds, mk((1..=40).map(|i| format!("t{i}")).collect(), (1..=40).map(|i| format!("r{i}")).collect()), schema.clone()).await;
    let mut ds = merge(ds, mk((1..=15).map(|i| format!("t{i}")).collect(), (1..=15).map(|i| format!("e{i}")).collect()), schema.clone()).await;
    // Delete one row (deletion vector), then a FILTERED scan that requests row ids.
    let mut ds = (*ds.delete("slug = 't20'").await.unwrap().new_dataset).clone();
    let mut scan = ds.scan();
    scan.with_row_id();
    scan.filter("slug = 't3'").unwrap();
    let n: usize = scan.try_into_stream().await.unwrap().try_collect::<Vec<RecordBatch>>().await.unwrap()
        .iter().map(|b| b.num_rows()).sum();
    println!("filtered rows = {n} (expected 1)"); // never reached in debug: panics first
}
cargo run panics:
thread 'tokio-runtime-worker' panicked at rust/lance-table/src/rowids/index.rs:50:
assertion `left == right` failed: Wrong range for 3..=39, chunks:
[(3..=39, (RangeWithBitmap { range: 3..40, ... })),
(5..=5, (Range(5..6), ...)), (16..=16, ...), (17..=17, ...), ... ]
left: 37
right: 36
What is required to trigger it
- The seed must be written via merge_insert (merge-on-merge). A native Dataset::write seed plus one merge does not reproduce: the trigger is merge_insert rewriting a fragment that was itself merge-written.
- The delete is required to surface it (it makes the live row count diverge from the row-id range span). Without it the overlapping ranges are tolerated.
- Only filter + with_row_id fails; a full scan returns correct rows.
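To make the role of the delete concrete, here is a toy sketch of the count divergence, using only the standard library. It is not Lance's code: span_len and span_matches_live are hypothetical helpers, and the deleted id (20) is an arbitrary illustrative choice, since the report does not state which stable id the row 't20' carries. The real check operates on chunks and is more involved.

```rust
use std::ops::RangeInclusive;

/// Number of ids spanned by a closed range (hypothetical helper).
fn span_len(range: &RangeInclusive<u64>) -> u64 {
    range.end() - range.start() + 1
}

/// Toy version of the check: the range span must equal the live id count.
fn span_matches_live(range: &RangeInclusive<u64>, live_ids: &[u64]) -> bool {
    span_len(range) == live_ids.len() as u64
}

fn main() {
    // The range reported in the panic message.
    let range = 3..=39u64;

    // Before the delete: every id in the range is live.
    let before: Vec<u64> = range.clone().collect();

    // After the delete: one id (assumed to be 20 here) is no longer live,
    // but the range itself is unchanged.
    let after: Vec<u64> = range.clone().filter(|&id| id != 20).collect();

    println!("span = {}", span_len(&range));
    println!("before delete: live = {}, ok = {}", before.len(), span_matches_live(&range, &before));
    println!("after delete:  live = {}, ok = {}", after.len(), span_matches_live(&range, &after));
}
```

In this toy the span stays at 37 while the live count drops to 36 after the delete, which matches the left/right values in the assertion message.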
Mechanism
The merge-update moves the rewritten rows to a new fragment keeping their stable ids (3, 4, 5, …), while the source fragment's row-id sequence still spans the full range (0..=39). Two fragments then claim the same ids, and RowIdIndex::new's overlapping-chunk invariant (range span == sum of live chunk lengths) fails once the deletion makes the counts diverge (37 != 36 above).
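The cross-fragment overlap described above can be sketched as a small standalone model. This is an illustration under assumptions, not Lance's data structures: ToyFragment and overlapping_ids are hypothetical names, and the moved ids (3, 4, 5) are just the first ids mentioned above, not the full set.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a fragment's row-id sequence (not a Lance type).
struct ToyFragment {
    id: u32,
    row_ids: Vec<u64>,
}

/// Returns every stable row id claimed by more than one fragment,
/// mapped to the ids of the fragments that claim it.
fn overlapping_ids(fragments: &[ToyFragment]) -> BTreeMap<u64, Vec<u32>> {
    let mut owners: BTreeMap<u64, Vec<u32>> = BTreeMap::new();
    for frag in fragments {
        for &row_id in &frag.row_ids {
            owners.entry(row_id).or_default().push(frag.id);
        }
    }
    // Keep only ids with more than one owner.
    owners.retain(|_, frags| frags.len() > 1);
    owners
}

fn main() {
    // Source fragment: its sequence still spans 0..=39 after the merge-update.
    let source = ToyFragment { id: 0, row_ids: (0..40).collect() };
    // New fragment: holds rewritten rows under their preserved ids
    // (illustrative subset only).
    let rewritten = ToyFragment { id: 1, row_ids: vec![3, 4, 5] };

    for (row_id, frags) in overlapping_ids(&[source, rewritten]) {
        println!("row id {row_id} is claimed by fragments {frags:?}");
    }
}
```

If the source fragment's sequence had dropped the moved ids, as the assumptions at the top expect, overlapping_ids would return an empty map.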
Relationship to existing issues
Same family as #6877 (sequential merge_insert against previously-merge-written rows). #6965 (closes #6877) and #7429 fixed sibling symptoms (duplicate _rowid; intra-fragment RowAddrTreeMap overlap), but not this cross-fragment RowIdIndex overlap — it reproduces after both.
Impact
Any stable-row-id dataset updated via merge_insert and later subject to a delete breaks point-lookup-style filtered reads. In release the failure can also be a silently-incorrect batch rather than an error, since the debug_assert at rowids/index.rs:50 is compiled out.
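The debug/release difference follows from how Rust's debug_assert family works: the check is only evaluated when debug assertions are enabled. A minimal sketch, assuming a hypothetical checked_span function (the name and body are illustrative, not taken from rowids/index.rs):

```rust
/// Hypothetical function guarding an invariant with a debug-only assertion.
/// In builds without debug assertions the check is skipped, and the caller
/// receives a value computed from inconsistent inputs.
fn checked_span(span: u64, live: u64) -> u64 {
    debug_assert_eq!(span, live, "Wrong range");
    span
}

fn main() {
    // Consistent inputs pass in every build profile.
    println!("consistent: {}", checked_span(37, 37));

    if cfg!(debug_assertions) {
        // Calling checked_span(37, 36) here would panic.
        println!("debug assertions on: checked_span(37, 36) would panic");
    } else {
        // The assertion is compiled out, so the mismatch goes unnoticed.
        println!("debug assertions off: checked_span(37, 36) = {}", checked_span(37, 36));
    }
}
```

A plain cargo run enables debug assertions by default, while cargo run --release disables them unless the profile overrides that setting.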