Row quarantine
Steps may quarantine individual rows that fail business rules without aborting the whole step. Quarantined rows are persisted under the run directory and recorded in manifest.json step state.
Step API
Each StepContext exposes a quarantine collector:
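import polars as pl

def step_validate(_ctx: StepContext, files: list[AstroFile]) -> None:
    file = files[0]
    df = file.load()
    # Split rows on the business rule: negative scores are invalid.
    # Note: rows with a null score match neither filter.
    bad = df.filter(pl.col("score") < 0)
    good = df.filter(pl.col("score") >= 0)
    if not bad.is_empty():
        _ctx.quarantine.quarantine_rows(file, bad, reason="negative score")
    # Save only the rows that passed, so quarantined rows stay out of the output.
    file.save_in_place(good)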
| Method | Purpose |
|---|---|
| quarantine_rows | Append rows to a new quarantine part file |
| | Convenience for a single row |
- reason must be non-empty
- rows must not be empty
- Step authors must exclude quarantined rows from saved output
Quarantine Parquet rows use the source file schema plus _astro_quarantine_reason: str.
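The rules above and the extra reason column can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: `tag_quarantined_rows` is a hypothetical helper, and rows are modelled as dictionaries rather than a Polars DataFrame. Only the column name `_astro_quarantine_reason` comes from the text.

```python
from typing import Any

REASON_COLUMN = "_astro_quarantine_reason"  # column name taken from the text above


def tag_quarantined_rows(rows: list[dict[str, Any]], reason: str) -> list[dict[str, Any]]:
    """Hypothetical helper: validate the inputs, then append the reason column."""
    if not reason:
        raise ValueError("reason must be non-empty")
    if not rows:
        raise ValueError("rows must not be empty")
    # Keep every source column and add the reason as one extra string column.
    return [{**row, REASON_COLUMN: reason} for row in rows]


rows = [{"id": 1, "score": -5}, {"id": 2, "score": 10}]
bad = [r for r in rows if r["score"] < 0]
good = [r for r in rows if r["score"] >= 0]  # only these rows belong in the saved output
tagged = tag_quarantined_rows(bad, reason="negative score")
```

Here `tagged` holds the single failing row with the reason attached, while `good` is what the step would save.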
Run directory layout
.working/{run_id}/
  ingested/
  processed/ …
  snapshots/{step_id}/{ingest_name}.parquet
  quarantine/{step_id}/{ingest_name}.part-00001.parquet
  manifest.json
snapshots/ captures the input active_path at step start. Quarantine uses part files (no read-merge-rewrite on append).
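One way to append without read-merge-rewrite is to give every append its own part file. The sketch below assumes part numbers increase by one from `00001`, which the layout above suggests but does not state. `next_part_path` is a hypothetical helper, and `touch()` stands in for writing the Parquet data.

```python
import re
import tempfile
from pathlib import Path

PART_RE = re.compile(r"\.part-(\d{5})\.parquet$")


def next_part_path(run_dir: Path, step_id: str, ingest_name: str) -> Path:
    """Hypothetical helper: return the next unused part path for a step/ingest pair."""
    part_dir = run_dir / "quarantine" / step_id
    part_dir.mkdir(parents=True, exist_ok=True)
    existing = []
    # Only file names are inspected; existing parts are never opened or rewritten.
    for path in part_dir.glob(f"{ingest_name}.part-*.parquet"):
        match = PART_RE.search(path.name)
        if match:
            existing.append(int(match.group(1)))
    next_index = max(existing, default=0) + 1
    return part_dir / f"{ingest_name}.part-{next_index:05d}.parquet"


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / ".working" / "run-001"
    first = next_part_path(run_dir, "validate", "scores")
    first.touch()  # stand-in for writing the first quarantine part
    second = next_part_path(run_dir, "validate", "scores")
    names = [first.name, second.name]
```

Because each call only lists the directory, the cost of an append does not grow with the amount of data already quarantined.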
Step and run status
| Event | Step status | Run status | Pipeline action |
|---|---|---|---|
| Step finishes with quarantined rows | | (unchanged until end) | Continue to next step |
| Step depends on a quarantined step | | | Stop run |
| All runnable steps done, some quarantined | mixed | | Stop run (retryable) |
| All steps complete, no quarantine | | | Done |
| Hard exception in step | | | Stop run |
manifest.json stores per-step records in step_states: step_id, status, and optional detail.
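The shape of a per-step record can be sketched as follows. Only the field names `step_states`, `step_id`, `status`, and `detail` come from the text. The status strings, the step names, and the choice of a list rather than a mapping are assumptions made for illustration, and `StepState` is a hypothetical class.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepState:
    """One per-step record: step_id, status, and an optional detail."""
    step_id: str
    status: str  # assumed values such as "completed" or "quarantined"
    detail: Optional[str] = None

    def to_record(self) -> dict:
        record = {"step_id": self.step_id, "status": self.status}
        if self.detail is not None:  # detail is optional, so omit it when unset
            record["detail"] = self.detail
        return record


states = [
    StepState("step_ingest", "completed"),
    StepState("step_validate", "quarantined", detail="negative score"),
]
manifest_fragment = {"step_states": [s.to_record() for s in states]}
manifest_text = json.dumps(manifest_fragment, indent=2)
```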
Retry
Re-run astro run against a quarantined run, or a failed run that has quarantined steps. For each quarantined step only:
- Truncate the step quarantine file(s)
- Merge snapshot input with quarantined rows back into the file’s active_path
- Re-run that step
Previously completed steps are skipped. Previously blocked or pending dependent steps run once their dependencies are complete.
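The retry rules above amount to a per-step decision based on the recorded status. The following is a rough sketch of that decision only, not of the file operations. `Step`, `plan_retry`, the action labels, and the status strings "completed", "quarantined", "blocked", and "pending" are all assumptions drawn from the wording of this section rather than from the tool's source.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    step_id: str
    status: str  # assumed status strings; see the note above
    depends_on: list[str] = field(default_factory=list)


def plan_retry(steps: list[Step]) -> dict[str, str]:
    """Hypothetical planner: map each step_id to the action a retry would take."""
    plan: dict[str, str] = {}
    for step in steps:
        if step.status == "completed":
            plan[step.step_id] = "skip"
        elif step.status == "quarantined":
            # Restore the step input (snapshot plus quarantined rows), then re-run.
            plan[step.step_id] = "restore_and_rerun"
        else:
            # Blocked or pending dependents wait for their dependencies to complete.
            plan[step.step_id] = "run_after_dependencies"
    return plan


steps = [
    Step("ingest", "completed"),
    Step("validate", "quarantined", depends_on=["ingest"]),
    Step("export", "blocked", depends_on=["validate"]),
]
plan = plan_retry(steps)
```

In this example `ingest` is skipped, `validate` is restored and re-run, and `export` runs once `validate` completes.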
Next steps
- Running pipelines: run resolution and display modes
- Statistics: rows_quarantined metric