# Quickstart

This walkthrough uses the example pipeline from the Astro repository. Adapt it for your own pipeline repository.

## 1. Create a pipeline

Create `pipeline.py` in your project directory:

```python
import pandera.polars as pa
import polars as pl

from astro import AstroFileSpec, Pipeline
from astro.pipeline import ExecutionMode, IngestFileSpec
from astro.pipeline.files import AstroFile
from astro.pipeline.steps import StepContext


class EstablishmentsFile(AstroFileSpec):
    ingest_name = "establishments"


def remove_closed(dataframe: pl.DataFrame) -> pl.DataFrame:
    return dataframe.filter(pl.col("EstablishmentName").str.contains("Closed"))


def step_copy_establishments(_ctx: StepContext, files: list[AstroFile]) -> None:
    file = files[0]
    file.save_to("processed", "establishments.parquet", file.load())


class ExamplePipeline(Pipeline):
    name = "example"
    execution_mode = ExecutionMode.SERIAL
    ingest_files = [
        IngestFileSpec(
            name="establishments",
            source_pattern="edubase*.csv",
            schema=pa.DataFrameSchema(
                {
                    "URN": pa.Column(str),
                    "EstablishmentName": pa.Column(str),
                },
                strict="filter",
            ),
        ),
    ]

    def configure_steps(self) -> None:
        self.add_filter("Remove closed establishments", remove_closed, [EstablishmentsFile()])
        self.add_step(
            "Copy establishments to processed",
            step_copy_establishments,
            [EstablishmentsFile()],
            depends_on=["remove-closed-establishments"],
        )


pipeline = ExamplePipeline()
```

The module **must** export a `pipeline` object.

## 2. Prepare source data

Create a directory with a CSV matching your ingest pattern:

```text
source/
  edubase_sample.csv
```

```csv
URN,EstablishmentName
100001,Open Example School
100002,Closed Example School
```

The source path passed to `astro ingest` must be a directory. The directory must contain exactly the CSV files expected by `ingest_files`; unexpected files and subdirectories fail validation.

## 3. Ingest source files

From the directory containing `pipeline.py`:

```bash
astro ingest path/to/source/
```

Astro will:

1. Validate the source directory contains exactly the expected files
2. Validate each CSV against its Pandera schema
3. Write Parquet files to `.working/{run_id}/ingested/`
4. Record statistics in `.astro/stats.db`

Large files show a progress bar during materialization.

## 4. Run the pipeline

```bash
astro run
```

By default, Astro shows a Rich dashboard with step progress and live logs. Use plain log output instead:

```bash
astro run --mode cli
```

## 5. Inspect the result

```text
.working/{run_id}/
  manifest.json
  ingested/
    establishments.parquet
  processed/
    establishments.parquet
  filtered/
    remove-closed-establishments/
      establishments.parquet
  astro.log
```

View the pipeline flow diagram:

```bash
astro describe
```

List stored runs:

```bash
astro list
```

The copied output contains the rows left after the filter step. The removed rows are kept under `filtered/` for audit.

## Next steps

- {doc}`../user-guide/pipelines` — pipeline configuration in depth
- {doc}`../user-guide/quarantine` — handle invalid rows without aborting
- {doc}`../user-guide/cli` — full CLI reference