Welcome to Model Your Reality, a newsletter with musings about data structure design, data warehousing and the like.
Until further notice, each issue will consist of two parts:
a list of data events that might be of interest to you (they definitely are of interest to me) and
some thoughts about a certain data topic (like data vault modeling patterns).
Let’s get started!
Data Events
Recent
I’ve uploaded the recording from the October Data Modeling Meetup on data products and data marts with Matthew Darwin here.
Upcoming
If you know about other relevant events in the near future, please mention them in the comments or send an email to admin@obaysch.net. Online events preferred.
2022-12-05/06/07: Data Vault Training (CDVDM), Genesee Academy (Stockholm)
2022-12-12: Ghosts of Data Warehousing Past, Present and Future (online)
2022-12-14: UK Data Vault User Group on data mesh and data vault (online)
2023-03-13/17: Data Modeling Masterclass (in German, online)
2023-05-24/25/26: Knowledge Gap data modeling & data architecture conference (online)
The More Things Change … (Part 1)
While the 2010s were dominated by data lakes (many of which quickly turned into data swamps), the 2020s are a time in which a whole zoo of new and new-looking data architecture paradigms fights for the time, money and attention of business sponsors, data architects and BI developers (now often rebranded as data engineers, analytics engineers and God knows what).
… the More They Stay the Same
The tasks at hand, however, haven’t really changed since the classic Inmon data warehouse definition: The data shouldn’t just be copied over 1:1 from the operational systems, it should be integrated and presented in a subject-oriented way. It should be stable but also able to handle change over time. Therefore, it stands to reason that whatever 2020s data architecture paradigm you choose, you might want to realize it using a proven data structuring approach that has shown again and again that it can deal with all these tasks.
We all know data structuring approaches like this. They belong to the ensemble modeling family and include (among others) data vault, anchor modeling, and focal modeling.
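To make the ensemble modeling idea concrete, here is a minimal sketch of two core data vault structures: a hub for stable business keys and a satellite for descriptive attributes with history. All table and column names are invented for this example; I'm using Python's sqlite3 purely as a convenient way to run SQL.

```python
import sqlite3

# A hub holds the stable business key; a satellite holds descriptive
# attributes, keyed by hub key plus load timestamp, so changes over time
# are added as new rows instead of overwriting old ones.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (
    customer_hk   TEXT PRIMARY KEY,   -- hash key of the business key
    customer_id   TEXT NOT NULL,      -- business key from the source
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE sat_customer_details (
    customer_hk   TEXT NOT NULL REFERENCES hub_customer(customer_hk),
    load_dts      TEXT NOT NULL,      -- each change adds a new row
    name          TEXT,
    city          TEXT,
    record_source TEXT NOT NULL,
    PRIMARY KEY (customer_hk, load_dts)
);
""")

# Two loads for the same customer: the hub row stays stable while the
# satellite accumulates history (the "able to handle change over time" part).
conn.execute("INSERT INTO hub_customer VALUES ('h1', 'C-42', '2022-11-01', 'crm')")
conn.execute("INSERT INTO sat_customer_details VALUES ('h1', '2022-11-01', 'Ada', 'Berlin', 'crm')")
conn.execute("INSERT INTO sat_customer_details VALUES ('h1', '2022-12-01', 'Ada', 'Hamburg', 'crm')")

history = conn.execute(
    "SELECT load_dts, city FROM sat_customer_details ORDER BY load_dts"
).fetchall()
print(history)  # [('2022-11-01', 'Berlin'), ('2022-12-01', 'Hamburg')]
```

Note how the structure itself delivers the Inmon criteria: the hub is stable, the satellite is time-variant, and new sources can be integrated by attaching more satellites to the same hub.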
New Responsibilities for Data Sources
At first glance, the 2020s data and analytics architecture paradigms often seem quite different from each other.
Some of them are quite decentralized (like data mesh and data contracts), others are positioning themselves as more of a technical upgrade to the centralized Enterprise Data Warehouses of old (like cloud data warehouse and data lakehouse). But this very contrast reveals what they have in common: the crucial axis of change is centralization vs. decentralization, on both the organizational and the technical level.
From an organizational perspective, decentralization seems to be all the rage in the 2020s. Most organizations used to have a single central data team that was the home of data skills (or at least thought it was), responsible for all data topics (at least those not covered by shadow IT) and therefore chronically overloaded. Several new paradigms want to resolve this by giving more responsibilities to data sources and/or data consumers.
When the data team signs (usually figuratively, not literally) data contracts with source systems or external data providers, it hands over some important tasks to the people responsible for its data sources. The people managing those data sources now have to make sure that the data arrives quickly, is clean and uses the agreed-upon data structures. Ideally, this means that the data team can stop worrying about the EL part of the ELT (Extract, Load, Transform) process and doesn’t have to do so much firefighting after yet another unannounced source system column change.
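What could such a (figurative) data contract look like in practice? Here is a hedged sketch: the source team promises certain column names and types, and the data team checks incoming records against that promise before loading. The schema and field names are invented for illustration.

```python
# The agreed-upon structure: column name -> expected type.
# In a real setup this would live in a shared, versioned artifact
# (e.g. a schema registry), not in the consumer's code.
CONTRACT = {
    "order_id": int,
    "customer_id": str,
    "amount_eur": float,
}

def contract_violations(record: dict) -> list:
    """Return human-readable contract violations for one incoming record."""
    problems = []
    for column, expected_type in CONTRACT.items():
        if column not in record:
            problems.append("missing column: " + column)
        elif not isinstance(record[column], expected_type):
            problems.append(column + ": expected " + expected_type.__name__)
    for column in record:
        if column not in CONTRACT:
            problems.append("unannounced new column: " + column)
    return problems

ok = {"order_id": 1, "customer_id": "C-42", "amount_eur": 10.0}
bad = {"order_id": "1", "customer_id": "C-42", "discount": 0.1}
print(contract_violations(ok))   # []
print(contract_violations(bad))  # type error, missing column, surprise column
```

The point is less the code than the shift in responsibility: the "unannounced new column" check now fails loudly at the source team's doorstep instead of silently breaking the data team's pipelines.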
New Responsibilities for Data Consumers
The data team can also hand some of its responsibilities over to data consumers. Something similar has been happening with BI and data visualization tools for quite some time (sometimes called self-service BI) but now people have started delegating the T part of the ELT process as well.
More and more, people do so-called analytics engineering with tools like dbt (but mostly dbt itself). With this approach, data transformations are implemented by data analysts from the business side or by former data team members who have been temporarily or permanently embedded in business teams.
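In dbt, such a transformation is just a SQL SELECT statement materialized as a table or view. To keep this newsletter's examples in one language, here is the same idea as a stand-in with plain Python and sqlite3: the "T" of ELT expressed as a query over already-loaded raw data. Table and column names are invented.

```python
import sqlite3

# Pretend the EL part has already happened: raw source data sits in the
# warehouse untransformed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_orders (order_id INTEGER, customer_id TEXT, amount_eur REAL);
INSERT INTO raw_orders VALUES (1, 'C-42', 10.0), (2, 'C-42', 20.0), (3, 'C-7', 5.0);
""")

# The transformation an embedded analyst might own: revenue per customer,
# materialized as a new table (what dbt would call a model).
conn.execute("""
CREATE TABLE customer_revenue AS
SELECT customer_id, SUM(amount_eur) AS revenue_eur
FROM raw_orders
GROUP BY customer_id
""")

rows = conn.execute(
    "SELECT customer_id, revenue_eur FROM customer_revenue ORDER BY customer_id"
).fetchall()
print(rows)  # [('C-42', 30.0), ('C-7', 5.0)]
```

Because the transformation is nothing more than declarative SQL over loaded data, it is exactly the kind of work that can be handed to data-literate people outside the central data team.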
Often, and especially in large organizations, you can see hybrid approaches. Maybe data contracts were feasible with some data sources but not others (especially SaaS and COTS sources). Maybe some business teams (especially data-literate ones like Finance or Marketing) do their own analytics engineering while others still depend on the central data team for their transformations. And more often than not, there is no longer a single central data team but several data teams that are responsible for different business teams or different use cases.
Outlook
Over the next few issues, we'll look at other 2020s data architecture buzzwords, what they mean in practice, and how the Inmon criteria are still the right lens for evaluating them.
Stay tuned!