# Row quarantine

Steps may quarantine individual rows that fail business rules without aborting the whole step. Quarantined rows are persisted under the run directory and recorded in the step state in `manifest.json`.

## Step API

Each `StepContext` exposes a `quarantine` collector:

```python
import polars as pl

# StepContext and AstroFile are provided by the pipeline package (imports omitted here).


def step_validate(ctx: StepContext, files: list[AstroFile]) -> None:
    file = files[0]
    df = file.load()
    # Split the input into rows that break the rule and rows that pass it.
    bad = df.filter(pl.col("score") < 0)
    good = df.filter(pl.col("score") >= 0)
    if not bad.is_empty():
        ctx.quarantine.quarantine_rows(file, bad, reason="negative score")
    # Save only the passing rows; quarantined rows must not reach the output.
    file.save_in_place(good)
```

| Method | Purpose |
|--------|---------|
| `ctx.quarantine.quarantine_rows(file, rows, reason=...)` | Append rows to a new quarantine part file |
| `ctx.quarantine.quarantine_row(file, row, reason=...)` | Convenience for a single row |

- `reason` must be non-empty
- `rows` must not be empty
- Step authors must exclude quarantined rows from saved output

Quarantine Parquet rows use the source file schema plus `_astro_quarantine_reason: str`.

## Run directory layout

```text
.working/{run_id}/
  ingested/
  processed/
  …
  snapshots/{step_id}/{ingest_name}.parquet
  quarantine/{step_id}/{ingest_name}.part-00001.parquet
  manifest.json
```

`snapshots/` captures the input `active_path` at step start. Quarantine uses part files, so appending never requires a read-merge-rewrite of existing data.

## Step and run status

| Event | Step status | Run status | Pipeline action |
|-------|-------------|------------|-----------------|
| Step finishes with quarantined rows | `quarantined` | (unchanged until end) | Continue to next step |
| Step depends on a quarantined step | `blocked` | `failed` | Stop run |
| All runnable steps done, some quarantined | mixed | `quarantined` | Stop run (retryable) |
| All steps complete, no quarantine | `complete` | `completed` | Done |
| Hard exception in step | `failed` | `failed` | Stop run |

`manifest.json` stores per-step records in `step_states`: `step_id`, `status`, and optional `detail`.

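
As a rough illustration of how the status table could be read, the sketch below maps a finished run's `step_states` records to a run status. It is not the pipeline's actual implementation: `derive_run_status` is a hypothetical helper, the manifest fragment is invented sample data, and the precedence (a `failed` or `blocked` step outranks `quarantined`) is an assumption drawn from the table rows.

```python
import json

# Step statuses named in the table above.
TERMINAL_STEP_STATUSES = {"complete", "quarantined", "blocked", "failed"}


def derive_run_status(step_states: list[dict]) -> str:
    """Map finished step records to a run status (hypothetical helper)."""
    statuses = [state["status"] for state in step_states]
    unknown = set(statuses) - TERMINAL_STEP_STATUSES
    if unknown:
        # This sketch only covers runs that have stopped.
        raise ValueError(f"unexpected step status: {sorted(unknown)}")
    # Assumption: a failed or blocked step makes the whole run failed.
    if "failed" in statuses or "blocked" in statuses:
        return "failed"
    # Otherwise any quarantined step leaves the run retryable.
    if "quarantined" in statuses:
        return "quarantined"
    return "completed"


# Invented manifest fragment; only the field names come from the docs.
manifest = json.loads("""
{
  "step_states": [
    {"step_id": "load", "status": "complete"},
    {"step_id": "validate", "status": "quarantined", "detail": "negative score"},
    {"step_id": "export", "status": "complete"}
  ]
}
""")

run_status = derive_run_status(manifest["step_states"])
print(run_status)
```

Checking `failed` and `blocked` first mirrors the table, where a blocked dependent stops the run as `failed` even though an upstream step is only `quarantined`.
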
## Retry

Re-run `astro run` against a `quarantined` run, or a `failed` run that has quarantined steps. For each quarantined step only:

1. Truncate the step quarantine file(s)
2. Merge snapshot input with quarantined rows back into the file's `active_path`
3. Re-run that step

Previously completed steps are skipped. Previously blocked or pending dependent steps run once their dependencies are `complete`.

## Next steps

- {doc}`running`: run resolution and display modes
- {doc}`statistics`: `rows_quarantined` metric