Intermediate

Transforms Overview

Module Summary

Understand what Transforms are and how they declaratively convert input datasets into output datasets.

Declarative Data Transformation

In Foundry, a Transform is a function that takes one or more input datasets and produces an output dataset. You write the logic (in Python, SQL, or Java), and Foundry handles dependency resolution, scheduling, and parallelism. This declarative model means you never write "run this script at 3 AM". Instead, you say "this output depends on these inputs" and Foundry builds the execution graph.
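The execution-graph idea can be illustrated with a small, self-contained sketch. This is not Foundry's actual scheduler; the dataset names are made up, and the standard-library graphlib toposort stands in for the real build engine:

```python
from graphlib import TopologicalSorter

# Each output dataset declares the inputs it depends on,
# mirroring the "this output depends on these inputs" model.
dependencies = {
    "clean_orders": {"raw_orders"},
    "clean_users": {"raw_users"},
    "order_report": {"clean_orders", "clean_users"},
}

# A scheduler can derive a valid build order purely from the
# declarations; nobody writes "run this script at 3 AM".
build_order = list(TopologicalSorter(dependencies).static_order())
print(build_order)
```

Every valid order puts the raw inputs first, then the cleaned datasets, then the report that depends on both, which is exactly the dependency resolution Foundry performs for you.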

Python Transforms

Python transforms use the transforms library. A typical transform looks like:

from transforms.api import transform_df, Input, Output

@transform_df(
    Output("/path/to/output"),
    source=Input("/path/to/input"),
)
def compute(source):
    # Standard PySpark: keep only rows whose status is "active".
    return source.filter(source.status == "active")

You get the full power of PySpark with Foundry's dependency management layered on top.
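A transform can also declare several inputs; each named Input becomes a DataFrame argument to the function. A hedged sketch (the paths, dataset names, and join key are placeholders, and this only runs inside a Foundry code repository where the transforms library is available):

```python
from transforms.api import transform_df, Input, Output

@transform_df(
    Output("/path/to/enriched_orders"),   # placeholder output path
    orders=Input("/path/to/orders"),      # each named Input becomes a DataFrame
    users=Input("/path/to/users"),
)
def enrich(orders, users):
    # Standard PySpark: join the two inputs on an assumed shared key.
    return orders.join(users, on="user_id", how="left")
```

Because both inputs are declared, a rebuild of either upstream dataset flows through to this output automatically.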

SQL Transforms

For simpler logic, SQL transforms let you write standard SQL. Foundry compiles the SQL into a Spark execution plan, so you get the same distributed processing under the hood. SQL transforms are particularly popular with analysts who are already fluent in SQL and want to contribute to pipelines without learning Python.

Key Takeaways

  • Transforms declaratively convert input datasets into output datasets.
  • You write logic; Foundry handles scheduling, dependencies, and parallelism.
  • Python transforms use PySpark; SQL transforms compile to Spark under the hood.
  • The declarative model eliminates manual orchestration scripts.