Introduction

Astro helps you build reliable CSV import pipelines in Python. You define a pipeline in an external repository, ingest source files from the command line, and run ordered processing steps with built-in validation, statistics, filtering, and quarantine support.

What Astro provides

  • CLI control — run and manage pipelines from the command line

  • Library — define pipelines by importing Astro in your own repository

  • External pipelines — each pipeline lives in its own repo with a pipeline.py file

  • Folder ingestion — ingest a source directory containing one or more CSV files with heterogeneous schemas

  • Persistent statistics — store pipeline run statistics locally in SQLite

Typical workflow

  1. Create a pipeline.py that declares the files to ingest, their Pandera schemas, and the run steps.

  2. Run astro ingest path/to/source/ to validate CSVs and write Parquet snapshots.

  3. Run astro run to execute registered pipeline steps.

  4. Inspect logs, statistics, and quarantine files under .working/{run_id}/.
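The validate-then-quarantine idea behind step 2 can be pictured with a small standard-library sketch. This is not Astro's API: the function name split_valid_rows and the "every required column is non-empty" rule are hypothetical, chosen only for illustration. Astro itself validates with Pandera schemas and writes Parquet snapshots rather than returning plain dicts.

```python
import csv
import io


def split_valid_rows(csv_text, required_columns):
    """Split CSV rows into (valid, quarantined) lists of dicts.

    Hypothetical rule: a row is valid when every required column is non-empty.
    """
    reader = csv.DictReader(io.StringIO(csv_text))

    # Fail fast if the header itself is missing a required column.
    missing = set(required_columns) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    valid, quarantined = [], []
    for row in reader:
        # Short rows yield None for absent fields, so normalise to "".
        if all((row[col] or "").strip() for col in required_columns):
            valid.append(row)
        else:
            quarantined.append(row)
    return valid, quarantined


sample = "id,name\n1,Ada\n2,\n3,Grace\n"
valid, quarantined = split_valid_rows(sample, ["id", "name"])
print(len(valid), len(quarantined))
```

Keeping rejected rows in a separate collection, instead of discarding them, is what makes a quarantine useful: the rows can be inspected and fixed after the run.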

Tech stack

Concern              Choice

Language             Python 3.11+

DataFrame            Polars

Schema validation    Pydantic

Data validation      Pandera

CLI                  Typer

Local storage        SQLite (.astro/stats.db)
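As a rough illustration of what storing run statistics in SQLite can look like, the sketch below records per-step row counts and queries them back. The table name run_stats, its columns, and the helper record_step are assumptions made for this example. The actual layout of .astro/stats.db is not described here. The sketch uses an in-memory database so it leaves no files behind.

```python
import sqlite3

# In-memory database for the sketch; a persistent store would be a file on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE run_stats (
        run_id   TEXT    NOT NULL,
        step     TEXT    NOT NULL,
        rows_in  INTEGER NOT NULL,
        rows_out INTEGER NOT NULL,
        PRIMARY KEY (run_id, step)
    )
    """
)


def record_step(conn, run_id, step, rows_in, rows_out):
    """Insert one row of statistics for a single step (hypothetical helper)."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO run_stats VALUES (?, ?, ?, ?)",
            (run_id, step, rows_in, rows_out),
        )


record_step(conn, "run-001", "ingest", 100, 97)
record_step(conn, "run-001", "filter", 97, 80)

# Total rows dropped across all steps of one run.
dropped = conn.execute(
    "SELECT SUM(rows_in - rows_out) FROM run_stats WHERE run_id = ?",
    ("run-001",),
).fetchone()[0]
print(dropped)
```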

Next steps

Security model

Astro discovers and executes pipeline.py from the directory you pass to -C / --pipeline-dir (default: current directory). That module is arbitrary Python running as your user, with the same file and network access as any other Python process you start. Only run Astro against pipeline repositories you trust.
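To make the risk concrete, here is a minimal sketch of how a tool can load a pipeline.py by path using Python's importlib. It is not Astro's actual loader: load_pipeline_module and the STEPS variable are hypothetical names. The point it demonstrates is general to Python: every top-level statement in the module runs at import time, with the caller's permissions.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_pipeline_module(pipeline_dir):
    """Import pipeline.py from pipeline_dir and return the module object."""
    path = Path(pipeline_dir) / "pipeline.py"
    spec = importlib.util.spec_from_file_location("pipeline", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs every top-level statement
    return module


with tempfile.TemporaryDirectory() as tmp:
    marker = Path(tmp) / "side_effect.txt"
    # A stand-in pipeline.py that writes a file as soon as it is imported.
    (Path(tmp) / "pipeline.py").write_text(
        "from pathlib import Path\n"
        f"Path({str(marker)!r}).write_text('written at import time')\n"
        "STEPS = ['clean', 'filter']\n"
    )
    module = load_pipeline_module(tmp)
    side_effect_happened = marker.exists()

print(module.STEPS, side_effect_happened)
```

The stand-in module only writes a harmless marker file, but the same mechanism would let an untrusted pipeline.py read credentials, delete files, or open network connections before any pipeline step is invoked.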