Presentation Modes for Data Marts (Part 2)
Making history palatable for those who come after you
Welcome to the fifth issue of Model Your Reality, a newsletter with musings about data modeling, data warehousing and the like.
Until further notice, each issue will consist of two parts:
a list of data events that might be of interest for you (they definitely are of interest for me) and
some thoughts about a certain data topic (like data vault modeling patterns).
Let’s get started!
Data Events
Recent
In case you missed the Data Modeling Meetup with Michael Müller on early or late integration in data vault, you can watch the recording here.
And in case you missed the UK Data Vault User Group meeting with me, you can watch the recording here and download the slides here.
Upcoming
If you know about other relevant events in the near future, please mention them in the comments or send an email to admin@obaysch.net.
2022-03-07: The Modern Data Architect’s Dilemma with Stephen Dine (online)
2022-03-08: Data Modeling Meetup with Andrew Foad (online)
2022-03-24/25: Temporal Data in a Fast-Changing World, TEDAMOH (online)
2022-04-06/08: Data Vault Training (CDVDM), Genesee Academy (online)
2022-04-27: Data Modeling Meetup with Roelant Vos (online)
2022-05-19: DDVUG-Frühjahrstagung (Frankfurt am Main)
2022-06-01/03: Knowledge Gap data modeling & data architecture conference (online)
Presentation Modes for Data Marts
Introduction
While data vault is a great choice for the core layer of your data warehouse, a data vault model with its large number of objects isn’t necessarily the most accessible way of presenting data to human users (and other downstream consumers) in the presentation layer. Instead, you should consider the presentation modes outlined in this little series.
For each use case, try to pick the simplest presentation mode that meets your requirements. If current snapshots are all your consumers need, don’t give them bitemporal historization.
In Part 1, we discussed different modeling paradigms for the presentation layer. In Part 2, we will look at different historization approaches.
Historization Approaches
Theoretically, all the historization approaches described here (non-temporal, unitemporal, bitemporal, …) can be used in the presentation layer as well. Practically, you should try to stick to the simplest ones to make life easier for your consumers.
Simple Snapshots
In the large majority of cases, consumers are perfectly fine with a non-temporal approach. You just provide a current snapshot of the relevant data (the current state of the data according to your current knowledge) and refresh it periodically (for example, once or twice a day). This approach works both with a flat table and with a dimensional model. In dimensional terms, it is equivalent to using slowly changing dimensions of type 1 (SCD1).
If there is a requirement to recreate past reports, you might want to consider providing a history of snapshots. This means that you still produce current snapshots, but instead of throwing away previous snapshots, you keep all or some of them (for example, the one from the 1st day of each month to allow for monthly reporting on stable data). This approach has become more popular with cheaper storage and works both with a flat table and with a dimensional model.
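To make the "keep some of them" idea more concrete, here is a minimal Python sketch of a retention rule. The function name `snapshots_to_keep` and the `keep_recent` parameter are made up for illustration; the rule itself (keep every snapshot from the 1st of a month plus the most recent few) is just one possible policy, not a recommendation for every warehouse.

```python
from datetime import date


def snapshots_to_keep(snapshot_dates, keep_recent=7):
    """Return the snapshot dates to retain, in ascending order.

    Assumed policy (hypothetical): keep every snapshot taken on the
    1st day of a month plus the `keep_recent` most recent snapshots.
    """
    ordered = sorted(set(snapshot_dates))
    # Guard against keep_recent == 0, because ordered[-0:] would keep everything.
    recent = set(ordered[-keep_recent:]) if keep_recent > 0 else set()
    return [d for d in ordered if d.day == 1 or d in recent]


available = [
    date(2022, 1, 1),
    date(2022, 1, 15),
    date(2022, 2, 1),
    date(2022, 2, 27),
    date(2022, 2, 28),
]
kept = snapshots_to_keep(available, keep_recent=2)
```

With this sample data, the snapshot from 2022-01-15 is the only one that gets dropped: it is neither a first-of-month snapshot nor one of the two most recent ones.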
History of Changes
If your consumers are interested in seeing change over time but don’t necessarily have to recreate the exact numbers of past reports, it might make sense to provide a current history of changes. This means that you produce a history of changes along a timeline that is meaningful to your consumers. This can be the actual business change time or the source system change entry time; it is usually not the data warehouse load time that you use to historize your data vault. For the different kinds of time, see here (or if you want to really dive into all possible kinds, here).
The current-history-of-changes approach works with both regular and temporal dimensional models but not with flat tables. In dimensional terms, it is equivalent to using slowly changing dimensions of type 2 (SCD2).
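As a rough illustration of what such a history looks like, the following Python sketch turns a list of change records into SCD2-style rows with validity intervals. Everything here is an assumption made for the example: the name `build_history`, the `(key, valid_from, attributes)` input shape, the half-open intervals, the `HIGH_DATE` end marker, and the rule that there is at most one change per key and point in time.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed marker for "still valid"


def build_history(changes):
    """Build SCD2-style rows from (key, valid_from, attributes) records.

    valid_from is the time on the consumer-facing timeline (for example
    the business change time), not the data warehouse load time.
    Each row is valid in the half-open interval [valid_from, valid_to).
    """
    by_key = {}
    for key, valid_from, attrs in changes:
        by_key.setdefault(key, []).append((valid_from, attrs))

    rows = []
    for key in sorted(by_key):
        # Sort on valid_from only, so attribute dicts are never compared.
        versions = sorted(by_key[key], key=lambda v: v[0])
        for i, (valid_from, attrs) in enumerate(versions):
            # A version ends where the next one begins.
            is_last = i + 1 == len(versions)
            valid_to = HIGH_DATE if is_last else versions[i + 1][0]
            rows.append(
                {"key": key, "valid_from": valid_from, "valid_to": valid_to, **attrs}
            )
    return rows


changes = [
    ("C1", date(2022, 3, 1), {"city": "Hamburg"}),
    ("C1", date(2022, 1, 1), {"city": "Berlin"}),
    ("C2", date(2022, 2, 1), {"city": "Munich"}),
]
history = build_history(changes)
```

Because the function always works on the full list of changes, the order in which the records arrive does not matter: the interval boundaries come purely from the chosen timeline.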
The most challenging part of this approach is dealing with corrections and late-arriving data in general because this means you have to destructively update your chosen timeline (if you don’t just use the data warehouse load time).
If compute resources permit, the easiest way around this challenge might be to just recreate the complete history every time (either by reloading physical tables or in view form). If this is not an option, you have to determine the impact of late-arriving data and perform the respective updates on the existing data. This tends to be easier with temporal dimensions because here, you usually only have to update the mutable part of the dimensions but not the dimension references in the fact tables.
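If a full rebuild is not an option, the update for a single late-arriving change could look roughly like the Python sketch below. It is only a sketch under stated assumptions: the function name `apply_late_change` is invented, the history belongs to one business key, rows are sorted by `valid_from`, intervals are half-open, and the last row is open-ended with the assumed `HIGH_DATE` marker.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed marker for "still valid"


def apply_late_change(history, valid_from, attrs):
    """Splice a late-arriving change into the history of ONE key.

    history: rows sorted by valid_from, each valid in [valid_from, valid_to),
    with the last row ending at HIGH_DATE. Returns a new list; the input
    list and its rows are left untouched.
    """
    result = []
    placed = False
    for row in history:
        if not placed and row["valid_from"] <= valid_from < row["valid_to"]:
            if row["valid_from"] == valid_from:
                # Correction of an existing version: overwrite its attributes.
                result.append({**row, **attrs})
            else:
                # Shorten the affected version and insert the new one after it.
                result.append({**row, "valid_to": valid_from})
                result.append(
                    {**attrs, "valid_from": valid_from, "valid_to": row["valid_to"]}
                )
            placed = True
        else:
            result.append(row)
    if not placed:
        # The change predates the whole history (or the history is empty).
        first_from = history[0]["valid_from"] if history else HIGH_DATE
        result.insert(
            0, {**attrs, "valid_from": valid_from, "valid_to": first_from}
        )
    return result


existing = [
    {"valid_from": date(2022, 1, 1), "valid_to": date(2022, 3, 1), "city": "Berlin"},
    {"valid_from": date(2022, 3, 1), "valid_to": HIGH_DATE, "city": "Hamburg"},
]
updated = apply_late_change(existing, date(2022, 2, 1), {"city": "Munich"})
```

In this example, the Berlin version is shortened to end on 2022-02-01 and a Munich version is inserted between Berlin and Hamburg. In a regular (non-temporal) dimensional model, the fact rows that fall into the new interval would additionally have to be re-pointed to the new dimension row, which is the extra work the paragraph above refers to.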
If Possible, Avoid Bitemporality
While theoretically interesting, going bitemporal in the presentation layer is usually not a good idea.
Many people get confused when having to deal with two different timelines, and there is hardly a requirement that can’t be dealt with using one of the other historization approaches. If you want to do it anyway, it might be better to do it with temporal dimensions to avoid complex bitemporal dimension references in your fact tables. Don’t even think about building bitemporal flat tables.
Outlook
In Parts 1 and 2, we discussed how to structure and historize presentation layer data. In one of the next issues of Model Your Reality, we’ll look at the actual content of your presentation layer (and establish the distinction between raw marts and information marts) in the final part of this little series.