# Introduction

Astro helps you build reliable CSV import pipelines in Python. You define a pipeline in an external repository, ingest source files from the command line, and run ordered processing steps with built-in validation, statistics, filtering, and quarantine support.

## What Astro provides

- **CLI control**: run and manage pipelines from the command line
- **Library**: define pipelines by importing Astro in your own repository
- **External pipelines**: each pipeline lives in its own repo with a `pipeline.py` file
- **Folder ingestion**: ingest a source directory containing one or more CSV files with heterogeneous schemas
- **Persistent statistics**: store pipeline run statistics locally in SQLite

## Typical workflow

1. Create a `pipeline.py` that declares ingest files, Pandera schemas, and run steps.
2. Run `astro ingest path/to/source/` to validate CSVs and write Parquet snapshots.
3. Run `astro run` to execute registered pipeline steps.
4. Inspect logs, statistics, and quarantine files under `.working/{run_id}/`.

## Tech stack

| Concern | Choice |
|---------|--------|
| Language | Python 3.11+ |
| DataFrame | Polars |
| Schema validation | Pydantic |
| Data validation | Pandera |
| CLI | Typer |
| Local storage | SQLite (`.astro/stats.db`) |

## Next steps

- {doc}`installation`: set up Astro locally
- {doc}`quickstart`: walk through a complete ingest and run
- {doc}`../user-guide/pipelines`: learn the pipeline contract

## Security model

Astro discovers and executes `pipeline.py` from the directory you pass to `-C` / `--pipeline-dir` (default: current directory). That module is arbitrary Python running as your user, with the same file and network access as any other Python process you start. **Only run Astro against pipeline repositories you trust.**
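
To make the security warning concrete, the sketch below shows why loading a `pipeline.py` counts as running arbitrary code. It uses only the standard library's `importlib`. The helper `load_pipeline_module` is hypothetical and is not Astro's actual loader, whose implementation is not described here. The sketch assumes only that the file is executed as an ordinary Python module.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_pipeline_module(pipeline_dir: Path):
    """Load pipeline.py from pipeline_dir as a regular Python module.

    Hypothetical helper for illustration only; not Astro's real loader.
    """
    path = pipeline_dir / "pipeline.py"
    spec = importlib.util.spec_from_file_location("pipeline", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    # exec_module runs every top-level statement in pipeline.py.
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    # A stand-in pipeline.py whose top-level code writes a file when loaded.
    (repo / "pipeline.py").write_text(
        "from pathlib import Path\n"
        "Path(__file__).with_name('side_effect.txt').write_text('ran at import')\n"
    )
    load_pipeline_module(repo)
    # The side effect happened during loading, before any step was called.
    side_effect_text = (repo / "side_effect.txt").read_text()
    print(side_effect_text)
```

The stand-in module only writes a harmless marker file, but the same mechanism would let an untrusted `pipeline.py` read, modify, or send any data your user account can reach.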