Welcome to the sixth issue of Model Your Reality, a newsletter with musings about data modeling, data warehousing and the like.
Until further notice, each issue will consist of two parts:
a list of data events that might be of interest for you (they definitely are of interest for me) and
some thoughts about a certain data topic (like data vault modeling patterns).
Let’s get started!
Data Events
Recent
In case you missed the Data Modeling Meetup with Andrew Foad on the HOOK approach to data warehousing, you can watch the recording here.
Upcoming
If you know about other relevant events in the near future, please mention them in the comments or send an email to admin@obaysch.net.
2022-04-05: Data Modeling Meetup with Ronald G. Ross (online)
2022-04-06/08: Data Vault Training (CDVDM), Genesee Academy (online)
2022-04-19: CDVDM Recertification Class, Genesee Academy (online)
2022-04-27: Data Modeling Meetup with Roelant Vos (online)
2022-05-18: UK Data Vault User Group with Dirk Lerner (online)
2022-05-19: DDVUG-Frühjahrstagung (Frankfurt am Main)
2022-06-01/03: Knowledge Gap data modeling & data architecture conference (online)
Presentation Modes for Data Marts
Introduction
While data vault is a great choice for the core layer of your data warehouse, a data vault model with its large number of objects isn’t necessarily the most accessible way of presenting data to human users (and other downstream consumers) in the presentation layer.
Instead, you should consider the presentation modes outlined in this little series. For each use case, try to pick the simplest presentation mode that meets your requirements.
In part 1, we discussed different modeling paradigms for the presentation layer and in part 2, we looked at different historization approaches. So, we have already looked at how to structure and historize presentation layer data. But of course it’s also important which data you actually present.
Kinds of Marts
Traditionally, the catch-all term for a part of the presentation layer that covers a certain subject area or departmental perspective has been data mart. More recently, a distinction has been established between two kinds of data marts, raw marts and information marts.
How Raw Is Too Raw?
As the name suggests, a raw mart contains raw source data that has merely been restructured for easier consumption (e.g., as a dimensional model or flat table). When building a raw mart, you skip most of the value-adding activities a data warehouse can provide and can therefore offer consumers at least a rough foundation for reporting and analytics very quickly.
This can buy you some time for value-adding activities and might even yield some insights to add even more value. But there are potential downsides as well: If the raw data is full of inconsistencies, people might blame the data warehouse for them. And if the raw data is perfectly fine, people might even ask questions about the necessity of building a data warehouse in the first place.
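To make the raw mart idea a bit more tangible, here is a minimal sketch using Python's built-in sqlite3 module. All table, column and view names (hub_customer, sat_customer_crm, raw_customer) are hypothetical and heavily simplified; the point is only that the view flattens the data vault objects without touching the values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, heavily simplified data vault objects (assumption)
    CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_no TEXT);
    CREATE TABLE sat_customer_crm (
        customer_hk TEXT, load_date TEXT, name TEXT, country TEXT
    );
    INSERT INTO hub_customer VALUES ('hk1', 'C-001'), ('hk2', 'C-002');
    INSERT INTO sat_customer_crm VALUES
        ('hk1', '2022-01-01', 'Acme Corp', 'DE'),
        ('hk1', '2022-03-01', 'ACME Corporation', 'DE'),
        ('hk2', '2022-02-01', 'Globex', 'germany');

    -- Raw mart: the latest satellite row per customer as one flat table.
    -- The data is restructured, but no value is cleansed or harmonized.
    CREATE VIEW raw_customer AS
    SELECT h.customer_no, s.name, s.country
    FROM hub_customer h
    JOIN sat_customer_crm s ON s.customer_hk = h.customer_hk
    WHERE s.load_date = (
        SELECT MAX(load_date) FROM sat_customer_crm
        WHERE customer_hk = s.customer_hk
    );
""")

raw_rows = conn.execute(
    "SELECT customer_no, name, country FROM raw_customer ORDER BY customer_no"
).fetchall()
print(raw_rows)
```

Note that the inconsistent country value ('germany' instead of 'DE') passes straight through to the consumer, which is exactly the kind of thing the data warehouse might get blamed for.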
Too Much Information?
An information mart, on the other hand, contains cleansed, refined and harmonized data. Different source systems have been reconciled, important metrics have been pre-calculated and the most egregiously bad data has been filtered out. A well-maintained information mart is an opportunity for your data warehouse to really show its strengths and give data warehousing a good name inside your organization.
However, all this value-adding work takes time and effort that have to be paid for. There is something of a law of diminishing returns at work here, too: After some initial big wins, you can get stuck building complex logic to plaster over edge cases of minor importance that should be fixed in the source system (or just left alone to focus on more important issues).
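As a rough illustration of what "cleansed, refined and harmonized" can mean in practice, the following sketch reconciles rows from two made-up source systems, filters out obviously bad records and pre-calculates a metric. The sample rows, the COUNTRY_CODES mapping and the function build_information_mart are all assumptions for illustration, not a recommended implementation.

```python
from decimal import Decimal

# Hypothetical raw rows from two source systems (assumption)
raw_orders = [
    {"source": "shop", "customer_no": "C-001", "country": "DE", "amount": "100.00"},
    {"source": "erp", "customer_no": "c-001", "country": "germany", "amount": "50.50"},
    {"source": "erp", "customer_no": "C-002", "country": "FR", "amount": "-1"},
    {"source": "shop", "customer_no": "C-002", "country": "France", "amount": "20.00"},
]

# Hypothetical harmonization rule: map source spellings to one country code
COUNTRY_CODES = {"de": "DE", "germany": "DE", "fr": "FR", "france": "FR"}


def build_information_mart(rows):
    """Harmonize keys and countries, drop bad amounts, sum revenue per customer."""
    revenue = {}
    for row in rows:
        amount = Decimal(row["amount"])
        if amount <= 0:
            # Filter out the most egregiously bad data
            continue
        key = (row["customer_no"].upper(), COUNTRY_CODES[row["country"].lower()])
        revenue[key] = revenue.get(key, Decimal("0")) + amount
    return [
        {"customer_no": customer_no, "country": country, "total_revenue": total}
        for (customer_no, country), total in sorted(revenue.items())
    ]


mart = build_information_mart(raw_orders)
for row in mart:
    print(row)
```

Even in this toy version, every rule (case handling, country mapping, the definition of a bad amount) is a decision somebody has to make and maintain, which is where the time and effort go.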
Conclusion
In summary, when building your presentation layer, don’t let the perfect be the enemy of the good.
The data warehouse cannot solve all integration and data quality challenges, at least not with finite time and resources. These challenges are often the result of business process or organization structure deficiencies that usually aren’t yours to tackle.
Do what you can to make life easier for your consumers but be aware of the limits of what you can achieve with mere data warehousing. This doesn’t mean that you should be content with historizing and restructuring source system tables and give up on value-adding activities completely. Just don’t try to fix everything.
Outlook
After this little series, you should have a good grasp of the options available for the presentation layer of your data warehouse.
The consumers of data from a data warehouse usually expect the data in the presentation layer to be clean, complete and consistent (i.e., more information mart than raw mart). Unfortunately, this expectation is usually somewhat at odds with the data quality found in the operational systems that feed the data warehouse.
In some of the next issues of Model Your Reality, we’ll look at various kinds of data quality, integration and error handling issues that you might have to tackle on the way to a presentable presentation layer.