Data Vault is a method and architecture for delivering a Data Analytics Service to an enterprise, supporting its Business Intelligence, Data Warehousing, Analytics and Data Science requirements. At its core, it is a modern, agile way of designing and building efficient, effective Data Warehouses.

One of the main driving factors behind using Data Vault is the need for audit and historical tracking. If neither is important to you or your organization, it can be difficult to justify the overhead of introducing another layer into your modeling.

Data Vault is an innovative data modelling methodology for large-scale Data Warehouse platforms. Invented by Dan Linstedt, Data Vault is designed to deliver an Enterprise Data Warehouse while addressing the drawbacks of the normalized (third normal form) and Dimensional Modelling techniques.

A Data Vault model is a detail-oriented, historical tracking, and uniquely linked set of normalized tables that support one or more functional areas of business.

Hubs are the core of any Data Vault (DV) design. If done properly, Hubs are what allow you to integrate multiple source systems in your data warehouse. To do that, they must be source-system agnostic. That means they must be based on true Business Keys (or meaningful natural keys) that are not tied to any one source system.
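To make the "source-system agnostic" requirement concrete, here is a minimal sketch of deriving a Hub key from the business key alone. It assumes Data Vault 2.0-style hash keys (MD5 here); the normalization rule, the record fields, and the helper names `normalize_business_key` and `hub_hash_key` are all hypothetical. The point it shows is that each source's internal row ID is ignored, so two systems that share a customer number land on the same Hub row.

```python
import hashlib


def normalize_business_key(key: str) -> str:
    # Hypothetical rule: trim whitespace and upper-case so the same business
    # key matches no matter how a particular source system formats it.
    return key.strip().upper()


def hub_hash_key(business_key: str) -> str:
    # Derive the Hub key from the business key only, never from a source
    # system's internal ID, so every source maps to the same Hub row.
    # (Assumes DV 2.0-style MD5 hash keys.)
    normalized = normalize_business_key(business_key)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# Two hypothetical source systems with different internal IDs and formatting.
crm_record = {"crm_row_id": 17, "customer_number": " cust-0042 "}
erp_record = {"erp_pk": 90311, "customer_number": "CUST-0042"}

crm_key = hub_hash_key(crm_record["customer_number"])
erp_key = hub_hash_key(erp_record["customer_number"])
print(crm_key == erp_key)  # True: both sources resolve to one Hub row
```

If the Hub were keyed on `crm_row_id` or `erp_pk` instead, the same customer would appear twice and the integration benefit would be lost.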

Hubs and Links form the backbone of a Data Vault schema. Records in Hub and Link tables can be created and read, but they are not updated or deleted.
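The create-and-read-only rule for Hubs can be sketched as an in-memory table that exposes `insert` and `get` but no update or delete. This is a hypothetical illustration, not a real database API: the class name `InsertOnlyHub` and the `load_date`/`record_source` fields are assumptions, though load date and record source are common Data Vault metadata columns. A repeated business key is ignored rather than overwritten, so the first-loaded row is preserved.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass(frozen=True)  # frozen: a stored record cannot be modified
class HubRecord:
    business_key: str
    load_date: datetime
    record_source: str


class InsertOnlyHub:
    """Hypothetical in-memory Hub supporting create and read only."""

    def __init__(self) -> None:
        self._rows: Dict[str, HubRecord] = {}

    def insert(self, business_key: str, record_source: str) -> bool:
        # An existing key is left untouched: no update, no overwrite.
        if business_key in self._rows:
            return False
        self._rows[business_key] = HubRecord(
            business_key, datetime.now(timezone.utc), record_source
        )
        return True

    def get(self, business_key: str) -> Optional[HubRecord]:
        return self._rows.get(business_key)

    # Deliberately no update() or delete() methods.


hub = InsertOnlyHub()
print(hub.insert("CUST-0042", "CRM"))      # True: new business key
print(hub.insert("CUST-0042", "ERP"))      # False: already loaded
print(hub.get("CUST-0042").record_source)  # CRM: the first row is kept
```

A Link table would follow the same insert-only pattern, keyed on the combination of the Hub keys it relates.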

Data Vault provides the most benefit when your data comes from many source systems or has constantly changing relationships. It works well for systems with these characteristics because it makes adding attributes simple: new descriptive attributes can be loaded into a new Satellite without altering existing tables. It is also a strong fit when you need to track and audit your data easily.
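The tracking and audit benefit comes largely from Satellites, which store descriptive attributes as an append-only history. Below is a minimal sketch assuming a common Data Vault 2.0 technique, a "hash diff" over the attribute values, to detect change; the function names, the list-of-dicts storage, and the sample attributes are hypothetical. A new row is appended only when the attributes actually change, and earlier rows are never modified.

```python
import hashlib
from datetime import date


def hash_diff(attributes: dict) -> str:
    # Hash attributes in a fixed (sorted) key order so identical values
    # always yield the same fingerprint regardless of dict ordering.
    payload = "|".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def load_satellite(satellite: list, hub_key: str, load_date: date,
                   attributes: dict) -> bool:
    """Append a Satellite row for hub_key only if its attributes changed."""
    # Assumes rows were appended in load order, so the last match is current.
    history = [row for row in satellite if row["hub_key"] == hub_key]
    new_diff = hash_diff(attributes)
    if history and history[-1]["hash_diff"] == new_diff:
        return False  # No change since the last load: nothing to record.
    satellite.append({"hub_key": hub_key, "load_date": load_date,
                      "hash_diff": new_diff, **attributes})
    return True


sat = []
load_satellite(sat, "CUST-0042", date(2024, 1, 1), {"name": "Acme", "city": "Oslo"})    # True
load_satellite(sat, "CUST-0042", date(2024, 2, 1), {"name": "Acme", "city": "Oslo"})    # False
load_satellite(sat, "CUST-0042", date(2024, 3, 1), {"name": "Acme", "city": "Bergen"})  # True
print(len(sat))  # 2: the Oslo row is kept as history, Bergen is current
```

Because the old row survives, an auditor can still see where the customer was located on any past load date.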

Data Vault modeling is not a replacement for dimensional modeling, which is an industry standard for defining the data mart (the layer used to present the data to the end-user). Because the book is meant to cover the whole process of building a data warehouse end-to-end, it also covers dimensional modeling.

What is a Raw Vault? It is the raw, unfiltered data from the source, loaded into Hubs, Links, and Satellites based on Business Keys.

Typically, the raw (operational) Data Vault is processed first, and the ETL processes populating this area identify new business keys and assign surrogate keys to these newly discovered business keys. Data captured in the operational Data Vault remains in its pristine, raw form.
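The key-assignment step described above can be sketched as follows. This assumes sequence-based surrogate keys, as in the original Data Vault approach (Data Vault 2.0 commonly uses hash keys instead); the function name `assign_surrogate_keys` and the dict standing in for the Hub are hypothetical. Only business keys not already in the Hub receive a new key; keys seen earlier, including duplicates within the same batch, are skipped, and the incoming data itself is not altered.

```python
from itertools import count


def assign_surrogate_keys(incoming_keys, key_map, sequence):
    """Give each newly discovered business key the next surrogate key.

    key_map:  existing business key -> surrogate key (the Hub's contents).
    sequence: iterator producing the next surrogate key value.
    Returns only the newly added (business key, surrogate key) pairs.
    """
    new_pairs = []
    for bk in incoming_keys:
        if bk not in key_map:  # newly discovered business key
            key_map[bk] = next(sequence)
            new_pairs.append((bk, key_map[bk]))
    return new_pairs


hub_keys = {"CUST-0042": 1}
seq = count(start=2)  # continues after the highest existing key
batch = ["CUST-0042", "CUST-0100", "CUST-0101", "CUST-0100"]
print(assign_surrogate_keys(batch, hub_keys, seq))
# [('CUST-0100', 2), ('CUST-0101', 3)]
```

In a real warehouse the sequence would typically come from the database rather than a Python iterator; the logic of "look up, and assign only if missing" is the same.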

Data Vault is an architectural approach that includes a specific data model design pattern and methodology developed specifically to support a modern, agile approach to building an enterprise data warehouse and analytics repository. Snowflake Cloud Data Platform was built to be design-pattern agnostic.