Intermediate

Data Health & Scheduling

Set up build schedules, health checks, and expectations so pipelines run reliably without manual intervention.

1 · Build Schedules

You can schedule datasets to build on a time-based cadence (hourly, daily, or a cron expression) or on a trigger (when an upstream input updates). Foundry's scheduler respects the dependency graph: it won't build a transform until all its inputs are fresh. The two modes combine well: schedule the top of the pipeline on a cron job and let downstream datasets build automatically when their inputs complete.
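The gating behaviour described above can be sketched in plain Python. This is a toy model, not Foundry's scheduler or its API: the dataset names, the `PIPELINE` mapping, and the `builds_triggered_by` helper are all hypothetical, and the sketch assumes a downstream dataset builds only when at least one input has just been rebuilt and none of its inputs is still stale.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each dataset maps to the inputs it depends on.
PIPELINE = {
    "raw_orders": [],                                # root, built on a cron schedule
    "customers": [],                                 # root, built on its own schedule
    "clean_orders": ["raw_orders"],                  # trigger-based
    "revenue_report": ["clean_orders", "customers"], # trigger-based
}


def builds_triggered_by(updated, pipeline, stale=frozenset()):
    """List the datasets that build, in order, after `updated` is rebuilt.

    `stale` names datasets that are still waiting for their own build.
    """
    done = {updated}
    built = []
    # static_order() yields every dataset after all of its inputs.
    for name in TopologicalSorter(pipeline).static_order():
        inputs = pipeline[name]
        if name == updated or not inputs:
            continue  # roots are driven by their own schedule, not by triggers
        triggered = any(i in done for i in inputs)
        ready = all(i in done or i not in stale for i in inputs)
        if triggered and ready:
            built.append(name)
            done.add(name)
    return built


if __name__ == "__main__":
    print(builds_triggered_by("raw_orders", PIPELINE))
    print(builds_triggered_by("raw_orders", PIPELINE, stale={"customers"}))
```

In the second call, `revenue_report` is held back because one of its inputs is still stale, which is the behaviour the dependency-aware scheduler is described as providing.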

2 · Expectations and Data Health Checks

Expectations are rules you attach to a dataset:

  • Row count must be greater than 0.
  • Column email must match a regex pattern.
  • Null rate on revenue must be below 1%.
  • Dataset must be no more than 24 hours stale.

After each build, Foundry evaluates these expectations. If any fail, the dataset is flagged unhealthy and alerts are sent.
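To make the four example rules concrete, here is a minimal sketch that evaluates them over rows held in memory. It illustrates the logic only and does not use Foundry's expectations API: `evaluate_expectations`, `is_healthy`, the row layout, and the deliberately simple email pattern are all assumptions made for this example.

```python
import re
from datetime import datetime, timedelta, timezone

# Illustrative pattern only; real email validation is usually stricter.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def evaluate_expectations(rows, last_built, now):
    """Return {rule name: passed} for the four example rules."""
    total = len(rows)
    null_revenue = sum(1 for r in rows if r.get("revenue") is None)
    return {
        "row count > 0": total > 0,
        "email matches pattern": all(
            r.get("email") is not None and EMAIL_RE.match(r["email"]) is not None
            for r in rows
        ),
        "revenue null rate < 1%": total > 0 and null_revenue / total < 0.01,
        "at most 24 hours stale": now - last_built <= timedelta(hours=24),
    }


def is_healthy(results):
    """A dataset is healthy only if every expectation passed."""
    return all(results.values())


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 8, 0, tzinfo=timezone.utc)
    rows = [
        {"email": "a@example.com", "revenue": 10.0},
        {"email": "not-an-email", "revenue": None},
    ]
    results = evaluate_expectations(rows, now - timedelta(hours=2), now)
    for rule, passed in results.items():
        print(f"{'PASS' if passed else 'FAIL'}  {rule}")
    print("healthy:", is_healthy(results))
```

Returning one result per rule, rather than a single boolean, keeps the failing rule visible, which is what an alert needs to report.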

3 · Alerting and SLAs

Data Health integrates with alerting: you can send Slack messages, emails, or PagerDuty incidents when a dataset goes unhealthy. Combine this with SLA monitoring to define data freshness guarantees. For example, if the data behind the 8 AM dashboard isn't ready by 7:55 AM, the on-call engineer gets paged.
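The 7:55 AM example can be expressed as a small freshness check. This is a sketch under stated assumptions, not Foundry's alerting configuration: the `sla_alerts` function, the channel names, and the idea of a daily build window starting at `window_start` are hypothetical, and the function only returns the alerts it would send rather than calling any real Slack or PagerDuty API.

```python
from datetime import datetime


def sla_alerts(dataset, now, deadline, last_success, window_start):
    """Return (channel, message) pairs to send if the freshness SLA is missed.

    The SLA is met when a successful build finished between `window_start`
    and `deadline`. Nothing is sent before the deadline has passed.
    """
    if now < deadline:
        return []
    if last_success is not None and window_start <= last_success <= deadline:
        return []
    message = f"{dataset} missed its {deadline:%H:%M} freshness deadline"
    # Hypothetical routing: page the on-call engineer and notify the team channel.
    return [("pagerduty", message), ("slack", message)]


if __name__ == "__main__":
    window_start = datetime(2024, 1, 2, 0, 0)
    deadline = datetime(2024, 1, 2, 7, 55)
    yesterday_build = datetime(2024, 1, 1, 23, 0)
    for channel, message in sla_alerts(
        "dashboard_data", datetime(2024, 1, 2, 7, 56), deadline, yesterday_build, window_start
    ):
        print(channel, "->", message)
```

Checking against a deadline slightly ahead of the 8 AM consumer leaves a short buffer for someone to react before the dashboard is opened.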

✅ Key Takeaways

  • Schedules can be time-based, trigger-based, or a combination of both.
  • Expectations are declarative data quality rules evaluated after each build.
  • Unhealthy datasets trigger alerts via Slack, email, or PagerDuty.
  • SLA monitoring ensures data freshness guarantees are met.

🧠 Knowledge Check