Datasets & Schemas
Module Summary
Learn how Foundry stores data in versioned datasets built from immutable transactions, and why schemas enforce structure and build trust in that data.
Datasets as Foundry's Storage Primitive
Nearly everything in Foundry flows through datasets. Whether you're ingesting a CSV, writing a Python transform, or configuring a sync, the output is typically a dataset: a versioned collection of structured rows whose history is made up of immutable transactions.
Datasets live inside projects in the Compass file system. You can think of Compass as a file explorer and each dataset as a file, but with full lineage, permissions, and history baked in.
Schema Enforcement
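A dataset's schema defines the column names and their data types: strings, integers, dates, arrays, structs, and more. Foundry enforces this schema at write time, so type mismatches surface as errors instead of silently appearing in the data. Note that a schema checks structure, not correctness: a well-typed but wrong value, such as a negative order quantity, still passes.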
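To make write-time enforcement concrete, here is a minimal sketch in plain Python, not Foundry's API. The type names, `PYTHON_TYPES`, `SchemaError`, and `validate_rows` are all hypothetical; Foundry runs its own checks inside the build. The idea it shows is that a batch is validated as a whole before anything lands, so one bad row rejects the write instead of slipping in.

```python
from __future__ import annotations

from datetime import date
from typing import Any

# Hypothetical schema type names mapped to Python types; Foundry's real
# type system (arrays, structs, decimals, ...) is much richer than this.
PYTHON_TYPES = {"string": str, "integer": int, "date": date}


class SchemaError(ValueError):
    """Raised when a batch of rows does not match the declared schema."""


def validate_rows(schema: dict[str, str], rows: list[dict[str, Any]]) -> None:
    """Raise SchemaError on the first mismatching row; callers validate the
    whole batch before writing, so a bad batch is rejected as a unit."""
    expected = set(schema)
    for i, row in enumerate(rows):
        if set(row) != expected:
            raise SchemaError(f"row {i}: columns {sorted(row)} do not match {sorted(expected)}")
        for column, type_name in schema.items():
            value = row[column]
            # None models a null cell and is allowed in any column here.
            # Exact type match, so True is not accepted as an "integer".
            if value is not None and type(value) is not PYTHON_TYPES[type_name]:
                raise SchemaError(
                    f"row {i}: {column!r} expected {type_name}, got {type(value).__name__}"
                )


schema = {"order_id": "integer", "customer": "string", "order_date": "date"}
validate_rows(schema, [{"order_id": 1, "customer": "Ada", "order_date": date(2024, 1, 5)}])
try:
    validate_rows(schema, [{"order_id": "1", "customer": "Ada", "order_date": date(2024, 1, 5)}])
except SchemaError as err:
    print(err)  # row 0: 'order_id' expected integer, got str
```

A row with `order_id` set to `-5` would pass this check, which is the structure-versus-correctness gap: type checks are only one layer of data quality.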
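You can evolve schemas over time (adding columns, widening types) through governed migrations, but you cannot accidentally drop a column or change its type without Foundry flagging the breaking change. A change in meaning that keeps the same name and type, such as switching an amount from dollars to cents, is invisible to the schema.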
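The difference between safe evolution and a breaking change can be sketched as a comparison of two schemas. This is a hypothetical model, not Foundry's implementation: the `ALLOWED_WIDENINGS` table in particular is an illustrative assumption, not Foundry's documented type-widening rules. Additions and widenings pass; dropped columns and non-widening type changes are reported.

```python
from __future__ import annotations

# Hypothetical widening rules: each type lists the types it may safely widen to.
# These pairs are illustrative assumptions, not Foundry's documented rules.
ALLOWED_WIDENINGS = {
    "integer": {"long", "double"},
    "long": {"double"},
    "float": {"double"},
}


def breaking_changes(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List every change from `old` to `new` that could break downstream readers."""
    problems = []
    for column, old_type in old.items():
        if column not in new:
            problems.append(f"dropped column {column!r}")
        elif new[column] != old_type and new[column] not in ALLOWED_WIDENINGS.get(old_type, set()):
            problems.append(f"{column!r}: {old_type} -> {new[column]} is not a widening")
    # Columns that appear only in `new` are additions, treated as safe here.
    return problems


old = {"order_id": "integer", "amount": "float"}
print(breaking_changes(old, {"order_id": "long", "amount": "float", "region": "string"}))
# []
print(breaking_changes(old, {"order_id": "string"}))
# ["'order_id': integer -> string is not a widening", "dropped column 'amount'"]
```

In this model a rename shows up as a dropped column plus an addition, so it is flagged; a dollars-to-cents change that keeps `float` passes unnoticed, because only names and types are compared.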
Transactions and Immutability
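Each pipeline build creates a new transaction on every dataset it writes. The two most common transaction types are SNAPSHOT (full replacement of the dataset's contents) and APPEND (adds new data without modifying existing data); Foundry also supports UPDATE and DELETE transactions. Because committed transactions are immutable, you get a full audit trail of who wrote what and when, and you can roll back to a previous transaction if needed.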
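A minimal in-memory model of a transaction log helps show why immutability yields both an audit trail and rollback. It assumes row-level transactions for simplicity (Foundry actually tracks the files each transaction adds), models only SNAPSHOT and APPEND, and uses hypothetical `Transaction` and `Dataset` classes rather than any Foundry API. Here "rolling back" means replaying history up to an earlier transaction; nothing is deleted.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Transaction:
    """One committed write. frozen=True blocks reassigning its fields
    (the row dicts inside are not deep-frozen in this sketch)."""
    txn_id: int
    kind: str          # "SNAPSHOT" or "APPEND"; UPDATE and DELETE are omitted here
    rows: tuple        # rows this transaction wrote
    author: str
    committed_at: datetime


@dataclass
class Dataset:
    history: list[Transaction] = field(default_factory=list)

    def commit(self, kind: str, rows: list, author: str) -> Transaction:
        """Record a new transaction; existing history is never modified."""
        if kind not in ("SNAPSHOT", "APPEND"):
            raise ValueError(f"unsupported transaction type: {kind}")
        txn = Transaction(len(self.history), kind, tuple(rows), author,
                          datetime.now(timezone.utc))
        self.history.append(txn)
        return txn

    def view(self, as_of: int | None = None) -> list:
        """Replay history up to transaction `as_of` (default: the latest)."""
        if as_of is not None and not 0 <= as_of < len(self.history):
            raise IndexError(f"no transaction {as_of}")
        last = len(self.history) - 1 if as_of is None else as_of
        rows: list = []
        for txn in self.history[: last + 1]:
            if txn.kind == "SNAPSHOT":
                rows = list(txn.rows)   # full replacement
            else:
                rows.extend(txn.rows)   # incremental add on top of prior rows
        return rows


ds = Dataset()
ds.commit("SNAPSHOT", [{"id": 1}, {"id": 2}], author="alice")
ds.commit("APPEND", [{"id": 3}], author="bob")
ds.commit("SNAPSHOT", [{"id": 9}], author="carol")
print(ds.view())         # [{'id': 9}]
print(ds.view(as_of=1))  # [{'id': 1}, {'id': 2}, {'id': 3}]
print([(t.txn_id, t.kind, t.author) for t in ds.history])
# [(0, 'SNAPSHOT', 'alice'), (1, 'APPEND', 'bob'), (2, 'SNAPSHOT', 'carol')]
```

Because every state is derived by replaying one append-only history, the current view, any earlier view, and the record of who wrote what all come from the same log.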
Key Takeaways
- Datasets are Foundry's core storage unit: versioned, with every committed transaction immutable.
- Schemas enforce column names and types, so type mismatches are caught at write time instead of sneaking in.
- Transactions give you a full audit trail with the ability to roll back.
- Datasets live in Compass, Foundry's project-based file system.
The Ontology
Object Types