Statistics

Pipeline runs record numeric statistics scoped to a run, file, or step, keyed by an action name.

Each update:

  1. Logs an INFO line to astro.stats (visible in console, dashboard, and astro.log)

  2. Upserts the value in SQLite (later calls replace the prior value for the same scope/subject/action)
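The two effects of an update can be sketched with the standard library alone. This is a minimal illustration, not astro's implementation: the table layout, the inclusion of run_id in the uniqueness key, the record helper, and the use of a Python logger named astro.stats are all assumptions made for the example.

```python
import logging
import sqlite3

# Assumption: astro.stats is treated here as a standard logging logger name.
logger = logging.getLogger("astro.stats")

# Hypothetical schema; the real layout of the statistics table may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS statistics (
    run_id  TEXT NOT NULL,
    scope   TEXT NOT NULL,
    subject TEXT NOT NULL,
    action  TEXT NOT NULL,
    value   REAL NOT NULL,
    PRIMARY KEY (run_id, scope, subject, action)
)
"""


def record(conn, run_id, scope, subject, action, value):
    """Log the statistic, then insert it or overwrite the stored value."""
    logger.info(
        "STAT run=%s scope=%s subject=%s action=%s value=%s",
        run_id, scope, subject, action, value,
    )
    # Upsert: a second call with the same key replaces the earlier value.
    conn.execute(
        """
        INSERT INTO statistics (run_id, scope, subject, action, value)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT (run_id, scope, subject, action)
        DO UPDATE SET value = excluded.value
        """,
        (run_id, scope, subject, action, value),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
record(conn, "abc12", "step", "validate", "rows_quarantined", 3)
record(conn, "abc12", "step", "validate", "rows_quarantined", 5)  # replaces 3
rows = conn.execute("SELECT value FROM statistics").fetchall()
```

Because later calls replace rather than add, a step that wants a running total has to accumulate it itself and record the final number.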

Log format:

STAT run=abc12 scope=step subject=validate action=rows_quarantined value=3

Run-scoped statistics use subject=- in log output.
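When statistics need to be recovered from astro.log rather than from SQLite, a line in the format above can be parsed with a regular expression. The sketch below assumes that no field contains whitespace and that the value is always parseable as a float; parse_stat_line is a hypothetical helper, not part of astro.

```python
import re

# Matches the STAT payload anywhere in a log line, so a timestamp or
# level prefix added by the logging formatter is ignored.
STAT_PATTERN = re.compile(
    r"STAT run=(?P<run>\S+) scope=(?P<scope>\S+) "
    r"subject=(?P<subject>\S+) action=(?P<action>\S+) value=(?P<value>\S+)"
)


def parse_stat_line(line):
    """Return the fields of a STAT line as a dict, or None if absent."""
    match = STAT_PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Run-scoped statistics are logged with subject=-; map that to None.
    if fields["subject"] == "-":
        fields["subject"] = None
    fields["value"] = float(fields["value"])
    return fields


example = parse_stat_line(
    "STAT run=abc12 scope=step subject=validate action=rows_quarantined value=3"
)
```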

Step API

Each StepContext exposes a stats recorder:

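def step_transform(ctx: StepContext, files: list[AstroFile]) -> None:
    file = files[0]
    dataframe = file.load()
    # File scope: the subject is the file's ingest name.
    ctx.stats.record_file(file.spec.__class__.ingest_name, "rows_read", dataframe.height)
    # Run scope: a later call with the same action overwrites this value.
    ctx.stats.record_run("custom_counter", 1)
    # Step scope: attributed to the current step.
    ctx.stats.record_step("rows_written", dataframe.height)
    file.save_in_place(dataframe)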

| Method                                          | Scope              |
|-------------------------------------------------|--------------------|
| ctx.stats.record_run(action, value)             | Run                |
| ctx.stats.record_file(file_name, action, value) | File (ingest name) |
| ctx.stats.record_step(action, value)            | Current step       |

StatisticsRecorder and StatScope are also exported from astro for use outside step functions.

Built-in statistics

| Phase  | Scope | Action                                     | When recorded                            |
|--------|-------|--------------------------------------------|------------------------------------------|
| Ingest | file  | row_count, column_count, source_size_bytes | After each file materializes             |
| Ingest | run   | files_ingested                             | After successful ingest                  |
| Ingest | run   | ingest_failed                              | On ingest failure                        |
| Run    | step  | duration_ms                                | After each step executes                 |
| Run    | step  | rows_quarantined                           | When a step quarantines rows             |
| Run    | file  | rows_filtered, rows_kept                   | After a filter step processes a file     |
| Run    | step  | rows_filtered                              | After a filter step (total removed)      |
| Run    | run   | steps_completed                            | When run finishes                        |
| Run    | run   | duration_ms                                | When run finishes                        |
| Run    | run   | steps_quarantined                          | When run finishes with quarantined steps |

SQLite storage

Statistics are stored in {pipeline_dir}/.astro/stats.db:

  • runs: run_id, pipeline_name, status, source_directory, created_at, ingested_at

  • ingest_files: per-file row/column counts, source path, parquet path, source size

  • statistics: generic metrics with run_id, scope, subject, action, value, recorded_at
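Since stats.db is an ordinary SQLite file, the statistics table can be read with any SQLite client. The sketch below uses the column names listed above, but the column types, the stats_for_run helper, and the sample rows (including the ingest name orders) are assumptions for illustration. An in-memory database stands in for the real file so the example runs on its own.

```python
import sqlite3


def stats_for_run(conn, run_id):
    """Return {(scope, subject, action): value} for a single run."""
    rows = conn.execute(
        "SELECT scope, subject, action, value FROM statistics "
        "WHERE run_id = ? ORDER BY scope, subject, action",
        (run_id,),
    )
    return {(scope, subject, action): value for scope, subject, action, value in rows}


# Stand-in for {pipeline_dir}/.astro/stats.db; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE statistics ("
    "run_id TEXT, scope TEXT, subject TEXT, action TEXT, "
    "value REAL, recorded_at TEXT)"
)
conn.executemany(
    "INSERT INTO statistics VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("abc12", "step", "validate", "rows_quarantined", 3, "2024-01-01T00:00:00"),
        ("abc12", "file", "orders", "row_count", 120, "2024-01-01T00:00:00"),
        ("zzz99", "step", "validate", "rows_quarantined", 7, "2024-01-02T00:00:00"),
    ],
)

summary = stats_for_run(conn, "abc12")
```

Against a real pipeline, the in-memory connection would be replaced by one opened on the stats.db path given above.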

See Working directory layout for the full directory layout.