Working directory layout
Astro stores run data and statistics relative to your pipeline directory.
Pipeline directory
pipeline_dir/
    pipeline.py
    .astro/
        stats.db              # SQLite statistics
    .persistent/
        {name}.parquet        # Canonical ID resolver stores
    .working/
        {run_id}/             # One directory per run
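As a rough illustration, the layout above can be expressed with pathlib. The helper name layout_paths and the example run ID are hypothetical and not part of Astro; only the directory and file names are taken from the tree.

```python
from pathlib import Path


def layout_paths(pipeline_dir: str, run_id: str) -> dict[str, Path]:
    """Build the paths named in the pipeline directory tree (illustrative helper)."""
    root = Path(pipeline_dir)
    return {
        "stats_db": root / ".astro" / "stats.db",   # SQLite statistics
        "persistent": root / ".persistent",         # canonical ID resolver stores
        "working": root / ".working",               # parent of all run directories
        "run_dir": root / ".working" / run_id,      # one directory per run
    }


# "ab12c" is a made-up run ID used only for demonstration.
paths = layout_paths("pipeline_dir", "ab12c")
for name, path in paths.items():
    print(f"{name}: {path}")
```

Nothing here touches the file system, so the sketch is safe to run outside a real pipeline directory.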
Run directory
.working/{run_id}/
    manifest.json             # Run metadata and step states
    astro.log                 # Run-scoped log file
    ingested/
        {ingest_name}.parquet
    processed/                # Step outputs (author-defined subfolders)
    filtered/
        {step_id}/
            {ingest_name}.parquet
    snapshots/
        {step_id}/
            {ingest_name}.parquet
    quarantine/
        {step_id}/
            {ingest_name}.part-00001.parquet
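The per-step paths can be derived from a run directory as in the sketch below. It assumes that filtered/, snapshots/, and quarantine/ sit directly under the run directory and that quarantine output may be split into several numbered part files; both are readings of the listing above rather than confirmed behavior. The function names are hypothetical.

```python
from pathlib import Path


def step_output(run_dir: Path, kind: str, step_id: str, ingest_name: str) -> Path:
    """Path of a per-step Parquet file under filtered/ or snapshots/."""
    if kind not in {"filtered", "snapshots"}:
        raise ValueError(f"unsupported kind: {kind!r}")
    return run_dir / kind / step_id / f"{ingest_name}.parquet"


def quarantine_parts(run_dir: Path, step_id: str, ingest_name: str) -> list[Path]:
    """Quarantine part files for one step and ingest, sorted by file name."""
    step_dir = run_dir / "quarantine" / step_id
    # Assumption: parts follow the {ingest_name}.part-NNNNN.parquet pattern.
    return sorted(step_dir.glob(f"{ingest_name}.part-*.parquet"))
```

Because the part numbers are zero-padded, sorting by file name also sorts by part number.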
manifest.json
The manifest tracks the run status (ingested, completed, quarantined, failed), the ingested file records, and per-step state in step_states.
Each step record includes a step_id, a status (pending, complete, quarantined, failed, blocked), and an optional detail.
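A minimal sketch of reading a manifest and counting steps by status follows. The exact JSON shape is not specified here, so the sketch assumes a top-level "status" key and a "step_states" entry that is either a list of step records or a mapping keyed by step_id. The function name summarize_manifest is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_manifest(manifest_path: Path) -> tuple[str, Counter]:
    """Return the run status and a count of steps per status."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))

    # Assumption: the run status is stored under a top-level "status" key.
    run_status = manifest.get("status", "unknown")

    # Assumption: step_states is a list of records or a dict keyed by step_id.
    steps = manifest.get("step_states", [])
    if isinstance(steps, dict):
        steps = list(steps.values())

    counts = Counter(step.get("status", "unknown") for step in steps)
    return run_status, counts
```

Calling summarize_manifest(Path(".working") / run_id / "manifest.json") from the pipeline directory would then show at a glance how many steps are pending, complete, quarantined, failed, or blocked, provided the assumed keys match the real file.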
Run IDs
Run IDs are 5-character lowercase alphanumeric strings assigned at ingest time.
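The format can be checked with a short regular expression. This sketch assumes "alphanumeric" means the ASCII characters a-z and 0-9; the helper is_run_id is hypothetical and is not Astro's own validation.

```python
import re

# Five characters, each an ASCII lowercase letter or digit (assumed alphabet).
RUN_ID_PATTERN = re.compile(r"[a-z0-9]{5}")


def is_run_id(value: str) -> bool:
    """Return True if value has the shape of a run ID."""
    # fullmatch rejects extra characters, including a trailing newline.
    return RUN_ID_PATTERN.fullmatch(value) is not None
```

A check like this can help tell run directories apart from stray folders when scanning .working/.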
Statistics database
PipelineStore persists statistics at .astro/stats.db:
runs: run metadata
ingest_files: per-file ingest records
statistics: scoped metrics (run, file, step)
See Statistics for the statistics API and built-in metrics.
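Because stats.db is a SQLite file, it can be inspected with Python's standard sqlite3 module. The sketch below only lists table names and makes no assumption about column layouts. It opens the file read-only so that inspection cannot modify the database; list_tables is a hypothetical helper.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path: Path) -> list[str]:
    """Return the names of all tables in a SQLite database, sorted by name."""
    # mode=ro opens the file read-only and fails if it does not exist.
    uri = f"{db_path.resolve().as_uri()}?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

If the three names listed above are stored as tables, list_tables(Path(".astro") / "stats.db") would be expected to include runs, ingest_files, and statistics. For anything beyond a quick look, the statistics API is the supported route.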
Cleanup
Use astro cleanup to remove completed or failed run directories under .working/ and delete their statistics records. Pass --all to also clear .astro/stats.db and .persistent/. See CLI reference for --dry-run and confirmation options.
Next steps
Ingest — what happens during ingest
Row quarantine — snapshot and quarantine paths