Beginner

Datasets & Schemas

Learn how Foundry stores data in versioned, immutable datasets and how schemas enforce structure so that data can be trusted.

⏱ 14 min read · 📖 3 chapters · +150 XP

1 · Datasets as Foundry's Storage Primitive

Everything in Foundry flows through datasets. Whether you're ingesting a CSV, writing a Python transform, or configuring a sync, the output is always a dataset — an immutable, versioned collection of structured rows. Datasets live inside projects in the Compass file system. You can think of Compass as a file explorer and each dataset as a file, but with full lineage, permissions, and history baked in.

2 · Schema Enforcement

A dataset's schema defines its column names and their data types: strings, integers, dates, arrays, structs, and more. Foundry enforces this schema at write time, so a write that does not match the declared columns and types fails instead of silently landing malformed data. You can evolve a schema over time (add columns, widen types) through governed migrations, but you cannot accidentally rename, drop, or retype a column without Foundry flagging the breaking change.
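To make the idea concrete, here is a minimal sketch of write-time schema checking and breaking-change detection in plain Python. It is a toy model, not Foundry's API: the type names, `PY_TYPES`, `SAFE_WIDENINGS`, `SchemaError`, `validate_rows`, and `breaking_changes` are all hypothetical names chosen for illustration, and the rule that only integer-to-double counts as a safe widening is an assumption.

```python
from datetime import date

# Illustrative type names mapped to Python types (an assumption for this
# sketch, not Foundry's actual type system).
PY_TYPES = {"integer": int, "double": float, "string": str, "date": date}

# Widenings treated as safe in this sketch: (old type, new type).
SAFE_WIDENINGS = {("integer", "double")}


class SchemaError(ValueError):
    """Raised when a row does not match the declared schema."""


def validate_rows(rows, schema):
    """Check every row before anything is written (all-or-nothing)."""
    rows = list(rows)
    for i, row in enumerate(rows):
        if set(row) != set(schema):
            raise SchemaError(
                f"row {i}: columns {sorted(row)} do not match schema {sorted(schema)}"
            )
        for column, type_name in schema.items():
            value = row[column]
            # None is treated as an allowed null value in this sketch.
            if value is not None and not isinstance(value, PY_TYPES[type_name]):
                raise SchemaError(
                    f"row {i}: column {column!r} expected {type_name}, "
                    f"got {type(value).__name__}"
                )
    return rows


def breaking_changes(old, new):
    """List changes that would break readers of the old schema.

    Added columns and safe widenings are allowed; removed columns and
    any other type change are reported.
    """
    problems = []
    for column, old_type in old.items():
        if column not in new:
            problems.append(f"column {column!r} was removed")
        elif new[column] != old_type and (old_type, new[column]) not in SAFE_WIDENINGS:
            problems.append(
                f"column {column!r} changed from {old_type} to {new[column]}"
            )
    return problems
```

With a schema of `{"order_id": "integer", "amount": "integer"}`, a row such as `{"order_id": "1", "amount": 20}` raises `SchemaError`, while evolving the schema to widen `amount` to `double` and add a `note` column yields an empty list from `breaking_changes`.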

3 · Transactions and Immutability

Each pipeline build creates a new transaction on the dataset. The two types you will meet most often are SNAPSHOT (full replacement) and APPEND (incremental add); Foundry also defines UPDATE and DELETE transactions. Because committed transactions are immutable, you get a full audit trail of who wrote what and when, and you can roll back to any previous transaction if needed.
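The transaction model can be sketched as an append-only log. The toy class below is an illustration under stated assumptions, not Foundry's implementation: `Transaction`, `ToyDataset`, `commit`, `view`, and `history` are hypothetical names, and only the SNAPSHOT and APPEND types described above are modeled. Rolling back is represented as reading the dataset as of an earlier transaction.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Tuple


@dataclass(frozen=True)
class Transaction:
    """One committed write; frozen so its fields cannot be reassigned."""
    kind: str                 # "SNAPSHOT" or "APPEND"
    rows: Tuple[dict, ...]
    author: str
    committed_at: datetime


class ToyDataset:
    """Append-only transaction log (a hypothetical model for illustration)."""

    def __init__(self):
        self._log = []

    def commit(self, kind, rows, author):
        """Record a new transaction and return its index in the log."""
        if kind not in ("SNAPSHOT", "APPEND"):
            raise ValueError(f"unsupported transaction type: {kind}")
        # Copy each row so later changes by the caller do not alter history.
        frozen_rows = tuple(dict(r) for r in rows)
        self._log.append(
            Transaction(kind, frozen_rows, author, datetime.now(timezone.utc))
        )
        return len(self._log) - 1

    def view(self, as_of=None):
        """Rows visible as of transaction index `as_of` (default: latest)."""
        if as_of is not None and not 0 <= as_of < len(self._log):
            raise IndexError(f"no transaction with index {as_of}")
        log = self._log if as_of is None else self._log[: as_of + 1]
        rows = []
        for txn in log:
            if txn.kind == "SNAPSHOT":
                rows = []  # a snapshot replaces everything before it
            rows.extend(dict(r) for r in txn.rows)
        return rows

    def history(self):
        """Audit trail: (index, type, author) for every transaction."""
        return [(i, t.kind, t.author) for i, t in enumerate(self._log)]
```

Committing a SNAPSHOT of `[{"id": 1}]`, then an APPEND of `[{"id": 2}]`, then a second SNAPSHOT of `[{"id": 3}]` leaves `view()` returning only `[{"id": 3}]`, while `view(as_of=1)` still returns the first two rows. Nothing is overwritten, which is what makes both the audit trail and the rollback possible.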

✅ Key Takeaways

Datasets are Foundry's universal storage unit — versioned and immutable.
Schemas enforce column names and types so broken data cannot sneak in.
Transactions give you a full audit trail with the ability to roll back.
Datasets live in Compass, Foundry's project-based file system.
