Working directory layout

Astro stores run data and statistics relative to your pipeline directory.

Pipeline directory

pipeline_dir/
  pipeline.py
  .astro/
    stats.db              # SQLite statistics
  .persistent/
    {name}.parquet        # Canonical ID resolver stores
  .working/
    {run_id}/             # One directory per run
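
As a rough illustration of the layout above, the following sketch lists the run directories under .working/ using only the standard library. The helper name list_run_dirs is hypothetical and not part of Astro; it assumes only that each run has its own directory under .working/.

```python
from pathlib import Path


def list_run_dirs(pipeline_dir: Path) -> list:
    """Return the run directories under <pipeline_dir>/.working, sorted by name.

    Hypothetical helper, not an Astro API: it relies only on the documented
    layout, where every run gets one directory under .working/.
    """
    working = Path(pipeline_dir) / ".working"
    if not working.is_dir():
        return []  # no runs have been ingested yet
    # Ignore stray files; only directories correspond to runs.
    return sorted(p for p in working.iterdir() if p.is_dir())
```

Calling list_run_dirs(Path("pipeline_dir")) returns one Path per run, and the name of each path is the run ID.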

Run directory

.working/{run_id}/
  manifest.json           # Run metadata and step states
  astro.log               # Run-scoped log file
  ingested/
    {ingest_name}.parquet
  processed/              # Step outputs (author-defined subfolders)
  filtered/
    {step_id}/
      {ingest_name}.parquet
  snapshots/
    {step_id}/
      {ingest_name}.parquet
  quarantine/
    {step_id}/
      {ingest_name}.part-00001.parquet
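
To show how the quarantine paths above can be navigated, here is a minimal sketch that groups quarantine part files by step and ingest name. The function quarantine_parts is hypothetical, and it assumes every part file follows the {ingest_name}.part-NNNNN.parquet naming shown in the tree.

```python
from collections import defaultdict
from pathlib import Path


def quarantine_parts(run_dir: Path) -> dict:
    """Map (step_id, ingest_name) to the sorted quarantine part files of a run.

    Hypothetical helper, not an Astro API. Assumes the documented layout:
    quarantine/{step_id}/{ingest_name}.part-NNNNN.parquet
    """
    quarantine = Path(run_dir) / "quarantine"
    if not quarantine.is_dir():
        return {}  # nothing was quarantined in this run
    parts = defaultdict(list)
    for path in sorted(quarantine.glob("*/*.part-*.parquet")):
        step_id = path.parent.name
        # Drop the ".part-NNNNN.parquet" suffix to recover the ingest name.
        ingest_name = path.name.rsplit(".part-", 1)[0]
        parts[(step_id, ingest_name)].append(path)
    return dict(parts)
```

The same pattern should carry over to filtered/ and snapshots/, which hold a single {ingest_name}.parquet per step instead of numbered parts.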

manifest.json

Tracks run status (ingested, completed, quarantined, failed), ingested file records, and per-step state in step_states.

Each step record includes step_id, status (pending, complete, quarantined, failed, blocked), and optional detail.
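
The sketch below reads a manifest and reports the steps that did not finish cleanly. The exact JSON schema is not spelled out here, so the code assumes that step_states is a list of objects with step_id, status, and an optional detail key; if the real manifest keys step records differently (for example by step ID), the loop would need adjusting. Both function names are hypothetical.

```python
import json
from pathlib import Path

# Step statuses from the documentation that usually need a closer look.
ATTENTION_STATUSES = {"quarantined", "failed", "blocked"}


def load_manifest(run_dir: Path) -> dict:
    """Load .working/{run_id}/manifest.json as a dictionary."""
    with open(Path(run_dir) / "manifest.json", encoding="utf-8") as fh:
        return json.load(fh)


def steps_needing_attention(manifest: dict) -> list:
    """Return (step_id, status, detail) for steps that are not pending or complete.

    Assumption: "step_states" is a list of step records. This shape is a
    guess for illustration, not a documented schema.
    """
    flagged = []
    for record in manifest.get("step_states", []):
        if record.get("status") in ATTENTION_STATUSES:
            flagged.append(
                (record["step_id"], record["status"], record.get("detail"))
            )
    return flagged
```

A typical use is steps_needing_attention(load_manifest(run_dir)), which returns an empty list when every step is pending or complete.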

Run IDs

Run IDs are 5-character lowercase alphanumeric strings assigned at ingest time.
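
For scripts that accept a run ID as input, a small validator can catch typos early. The sketch below checks the documented format (five lowercase letters or digits). The generator next to it is purely illustrative and is not how Astro assigns IDs; both function names are hypothetical.

```python
import re
import secrets
import string

RUN_ID_ALPHABET = string.ascii_lowercase + string.digits
RUN_ID_PATTERN = re.compile(r"[a-z0-9]{5}")


def is_run_id(value: str) -> bool:
    """Return True if value is exactly five lowercase alphanumeric characters."""
    return RUN_ID_PATTERN.fullmatch(value) is not None


def make_run_id() -> str:
    """Build a random string in the same format (illustration only)."""
    return "".join(secrets.choice(RUN_ID_ALPHABET) for _ in range(5))
```

For example, is_run_id("a1b2c") is True, while is_run_id("A1B2C") and is_run_id("a1b2") are both False.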

Statistics database

PipelineStore persists statistics in the SQLite database at .astro/stats.db, which contains three tables:

runs — run metadata
ingest_files — per-file ingest records
statistics — scoped metrics (run, file, step)

See Statistics for the statistics API and built-in metrics.
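
Because stats.db is an ordinary SQLite file, it can be inspected with Python's built-in sqlite3 module. The sketch below counts the rows in each table without assuming any column names, since the column layout is not described here. It treats runs, ingest_files, and statistics as table names, which is an assumption, and table_row_counts is a hypothetical helper.

```python
import sqlite3
from pathlib import Path


def table_row_counts(db_path: Path) -> dict:
    """Return {table_name: row_count} for every user table in a SQLite file.

    Hypothetical helper, not an Astro API. It discovers tables through
    sqlite_master, so no Astro-specific schema is assumed.
    """
    db_path = Path(db_path)
    if not db_path.is_file():
        # sqlite3.connect would silently create an empty database otherwise.
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

Running table_row_counts(Path("pipeline_dir/.astro/stats.db")) gives a quick view of how much has been recorded. For anything beyond inspection, the statistics API is the safer route.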

Cleanup

Use astro cleanup to remove completed or failed run directories under .working/ and delete their statistics records. Pass --all to also clear .astro/stats.db and .persistent/. See CLI reference for --dry-run and confirmation options.

Next steps

Ingest — what happens during ingest
Row quarantine — snapshot and quarantine paths