Snowflake has moved Horizon well past a simple catalog. Horizon now provides semantic context, governance, and policy across the data estate: a semantic layer that lets an agent read business meaning instead of raw column names, classification and access policy, and end-to-end lineage. Snowflake and Databricks compete over this ground, each arguing that its platform is the better place for agents to find, understand, and be governed on enterprise data. For the question they answer, both have strong claims.
That question is what the data means across the estate. A different question sits one layer above it, and neither platform answers it. An agent produced a result; you need to know which data state it ran on and whether you can get that result again.
Horizon tells an agent what the data means. Syntitan proves the data state an AI run can reproduce.
Meaning and reproducibility are different layers, and having a semantic layer does not give you reproducibility. Horizon can tell an agent that a column is revenue, that the customer dimension joins here, that a field came from a given source. That is semantic context across the estate, and it is useful. It describes what the data means in general. It does not capture and version the specific data state a particular run executed on, hold that state so it can be compared against a later one, or replay it to reproduce the earlier result. Knowing what the data means and reproducing a run are two distinct guarantees.
Where the line falls
The gap shows up the moment a result moves. A model ran last week and gave one answer. It runs this week on a refreshed table and gives another. Horizon can explain what each field means and trace its lineage. It does not, on its own, tell you which data state the working run depended on, what changed between then and now, or how to reproduce the earlier result.
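To make "what changed between then and now" concrete, here is a minimal Python sketch, assuming each data state is a snapshot held as a mapping from primary key to row values. The `diff_states` function and the sample rows are hypothetical illustrations, not a Syntitan, Snowflake, or Databricks API.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[Any, ...]
State = Dict[Any, Row]  # assumption: primary key -> row values


def diff_states(old: State, new: State) -> Dict[str, List[Any]]:
    """Compare two keyed snapshots and report which keys were added, removed, or changed."""
    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)
    changed = sorted(k for k in old if k in new and old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}


# Made-up snapshots of the same table, one week apart.
last_week = {1: ("acme", 100.0), 2: ("globex", 250.0), 3: ("initech", 75.0)}
this_week = {1: ("acme", 100.0), 2: ("globex", 310.0), 4: ("umbrella", 40.0)}

print(diff_states(last_week, this_week))
# {'added': [4], 'removed': [3], 'changed': [2]}
```

A keyed diff like this only works if both states were kept. Lineage and field definitions alone cannot produce it once the old rows have been overwritten.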
Both platforms have pieces that look adjacent to this, and it is worth being precise about them. Snowflake offers Time Travel and zero-copy cloning. Databricks has Delta Lake time travel and, through MLflow, experiment tracking. Those are real features, and they solve their own problems. Time travel lets you query or restore a table as it was at a past point, within a retention window. Experiment tracking records the parameters, metrics, and artifacts of a training run. Neither platform is built primarily around capturing, versioning, binding, diffing, and replaying the exact data state behind a specific AI run. Storage-level time travel and experiment-level tracking sit on different layers from run-level reproduction. Syntitan captures that state, versions it, diffs it against the current one, and replays it.
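The capture, version, bind, and replay steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Syntitan is implemented: a data state is a small in-memory list of rows, its version id is a SHA-256 content hash, and a "model" is any deterministic function of the rows. `StateStore`, `RunRecord`, `run_bound`, and `replay` are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]
Model = Callable[[Rows], Any]


def fingerprint(rows: Rows) -> str:
    """Content hash of a data state; row order is ignored so the id depends only on content."""
    canonical_rows = sorted(json.dumps(r, sort_keys=True) for r in rows)
    return hashlib.sha256("\n".join(canonical_rows).encode("utf-8")).hexdigest()


@dataclass
class StateStore:
    """Hypothetical in-memory store mapping a version id to a frozen copy of the rows."""
    versions: Dict[str, Rows] = field(default_factory=dict)

    def capture(self, rows: Rows) -> str:
        version = fingerprint(rows)
        # JSON round-trip makes a deep copy, so later edits to `rows` cannot alter the stored state.
        self.versions.setdefault(version, json.loads(json.dumps(rows)))
        return version


@dataclass(frozen=True)
class RunRecord:
    """Binds one run's result to the exact data version it executed on."""
    data_version: str
    result: Any


def run_bound(store: StateStore, rows: Rows, model: Model) -> RunRecord:
    """Capture the input state first, then run the model on the captured copy."""
    version = store.capture(rows)
    return RunRecord(version, model(store.versions[version]))


def replay(store: StateStore, record: RunRecord, model: Model) -> bool:
    """Re-run on the stored state; True if the earlier result comes back (assumes a deterministic model)."""
    return model(store.versions[record.data_version]) == record.result


# Usage with a toy table and a toy "model".
def total_revenue(rows: Rows) -> float:
    return sum(r["revenue"] for r in rows)


store = StateStore()
table = [{"customer": "acme", "revenue": 100.0}, {"customer": "globex", "revenue": 250.0}]
last_week = run_bound(store, table, total_revenue)

table[1]["revenue"] = 310.0  # the table is refreshed in place
this_week = run_bound(store, table, total_revenue)

print(last_week.result, this_week.result)  # 350.0 410.0
print(last_week.data_version != this_week.data_version)  # True
print(replay(store, last_week, total_revenue))  # True
```

Content addressing means an unchanged state gets the same id and is stored once. A production system would more likely record a reference to a storage-level snapshot than copy rows, but the binding of each result to a version id is what makes the later replay possible.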
| | Snowflake Horizon | Syntitan |
|---|---|---|
| The question | What does this data mean? | Can this AI result be reproduced? |
| For an agent | Reads business meaning across the estate | Runs on a bound, releasable data state |
| Lineage is for | Understanding and audit | Reproducing a specific run |
| When a result changes | Explains field meaning and lineage | Diffs the state and re-runs the prior one |
| Scope | The data estate and its semantics | A single AI run’s data state |
Sitting on top, not against

Syntitan does not replace a data cloud, and it does not ask you to choose between Snowflake and Databricks. It reads a versioned, fixed data state and makes it something a model or an agent can execute on, trace, and reproduce. It works above whatever platform stores and serves the data underneath. A team can run Snowflake Horizon for semantic meaning across its estate and add Syntitan for the reproducibility of a specific run. The two cover different ground, and the second matters once a model is in production and its results have to hold.
The shorter version
Horizon makes the meaning of enterprise data legible to agents across the estate, and it is strong at that. Syntitan captures and versions the data state behind an AI run, so the result can be compared, replayed, and reproduced when the data moves. Both describe themselves in the language of AI readiness, but they mean different things by the term. The reading that decides whether a model holds up once it ships is reproduction, and most teams running models in production will want their data both governed and reproducible.