Welcome to the third issue of Model Your Reality, a newsletter with musings about data modeling, data warehousing and the like.
Until further notice, each issue will consist of two parts:
a list of upcoming data events that might be of interest to you (they definitely are of interest to me) and
some thoughts about a certain data topic (like data vault modeling patterns).
Let’s get started!
Upcoming Data Events
If you know about other relevant events in the near future, please mention them in the comments or send an email to admin@obaysch.net.
2022-01-12: UK Data Vault User Group with Neil Strange (online)
2022-02-01/02: Business Mapping & ELM Certification, Genesee Academy (online)
2022-02-08: Data Modeling Meetup with Michael Müller (online)
2022-02-09: UK Data Vault User Group with myself (online)
2022-02-23/25: Data Vault Training (CDVDM), Genesee Academy (online)
2022-03-08: Data Modeling Meetup with Andrew Foad (online)
2022-06-01/03: Knowledge Gap data modeling & data architecture conference (online)
A Data Modeling Library
In discussions on LinkedIn and elsewhere, I’m noticing again and again that people don’t seem to know many (or any) data modeling books. The following list will hopefully help to remedy this issue.
It is intended as a reading list for people who want to get started with data modeling or deepen their knowledge of it. And if you’re already very knowledgeable in this area, maybe it motivates you to have a look at one of the classics again ...
General
Most of these books can serve as a good starting point. If you read just one book, read Data and Reality by William Kent.
Michael Brackett, Data Resource Design (a little abstract at times but with some interesting thoughts about dealing with data in an organization)
John Carlis, Mastering Data Modeling: A User-Driven Approach (an under-appreciated classic that introduces shapes and The Flow, a structured way of building and improving data models)
Terry Halpin & Tony Morgan, Information Modeling and Relational Databases (thorough introduction from an ORM perspective)
Steve Hoberman, Data Modeling Made Simple: A Practical Guide for Business and IT Professionals (very good introduction to data modeling; there are also some more tool-specific books by the same author)
William Kent, Data and Reality: A Timeless Perspective on Perceiving and Managing Information in Our Imprecise World (originally written in the 1970s but still very relevant as a foundational text)
Graeme Simsion, Data Modeling: Theory and Practice (interesting study about how people approach data modeling and whether it is more description or design)
Graeme Simsion & Graham Witt, Data Modeling Essentials (the classic introduction to data modeling; goes into more depth than its title suggests)
Graham Witt, Data Modeling for Quality (concise overview of data modeling from one of the authors of Data Modeling Essentials)
Conceptual
These books introduce promising new takes on the old conceptual model.
Thomas Frisendal, Graph Data Modeling for NoSQL and SQL (introduces concept maps)
Steve Hoberman, The Rosedata Stone: Achieving a Common Business Language using the Business Terms Model (introduces the business terms model)
Ronald G. Ross, Business Knowledge Blueprint: Enabling Your Data to Speak the Language of the Business (introduces concept models; includes some helpful tips on getting your organization’s vocabulary straight)
Data Vault
Cautionary note: There is still a lot going on in the data vault space. While there are a few recent additions, there is no single book that really reflects the current state of data vault thinking.
Patrick Cuba, The Data Vault Guru: A Pragmatic Guide on Building a Data Vault (very detailed, implementation-focused book with a few interesting ideas)
John Giles, The Elephant in the Fridge: Guided Steps to Data Vault Success through Building Business-Centered Models (combining data vault and modeling patterns in a structured, accessible way)
Kent Graziano, Better Data Modeling: An Introduction to Agile Data Engineering Using Data Vault 2.0 (short, very readable introduction to data vault modeling)
Hans Hultgren, Modeling the Agile Data Warehouse with Data Vault (detailed introduction to data vault modeling)
Daniel Linstedt & Michael Olschimke, Building a Scalable Data Warehouse with Data Vault 2.0 (detailed, at times a little too Microsoft-focused introduction to data vault)
Dimensional
The reports of dimensional modeling’s death are greatly exaggerated. In many cases, star schemas are still the go-to approach for information delivery.
Christopher Adamson, Star Schema: The Complete Reference (the title already says it all)
Lawrence Corr & Jim Stagnitto, Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema (introduces the BEAM* method)
Bill Inmon & Francesco Puppini, The Unified Star Schema: An Agile and Resilient Approach to Data Warehouse and Analytics Design (interesting generalization of the time-tested star schema)
Ralph Kimball & Margy Ross, The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling (the book that popularized dimensional modeling in its most recent incarnation)
Ralph Kimball et al., The Kimball Group Reader (the collected writings of the Kimball Group; covers almost all aspects of dimensional modeling and of data warehousing in general)
Michael Venerable & Christopher Adamson, Data Warehouse Design Solutions (dimensional modeling patterns for a lot of use cases)
Fact-Based
Fact-based modeling is still somewhat of a niche approach but very powerful once you get used to it. There are some helpful tools as well.
Terry Halpin, Object-Role Modeling Fundamentals (introduction to ORM)
Jan Pieter Zwart, Marco Engelbart & Stijn Hoppenbrouwers, Fact Oriented Modeling with FCO-IM (comprehensive introduction to FCO-IM)
Patterns
For some reason, people keep reinventing the wheel with their data models. Any of these books can help you avoid that by relying on tried and tested data modeling patterns.
Jim Arlow & Ila Neustadt, Enterprise Patterns and MDA: Building Better Software with Archetype Patterns and UML (describes all relevant patterns for modeling an organization in an accessible way)
Michael Blaha, Patterns of Data Modeling (includes many useful patterns at different levels of abstraction)
Martin Fowler, Analysis Patterns: Reusable Object Models (another pattern classic that covers a lot of ground)
John Giles, The Nimble Elephant: Agile Delivery of Data Models Using a Pattern-based Approach (from a practitioner’s perspective; good starting point before you graduate to the more comprehensive Hay and Silverston books)
David C. Hay, Enterprise Model Patterns: Describing the World (an updated and enhanced version of his groundbreaking Data Model Patterns book)
Len Silverston (et al.), The Data Model Resource Book (sizable catalog of data modeling patterns in three volumes; vol. 1, vol. 2, vol. 3)
Temporal
Unfortunately, for such an important aspect of data modeling, none of these books is particularly practical or accessible. I hope to write more on this topic (in a practical and accessible way) in future issues of this Substack.
C. J. Date, Hugh Darwen & Nikos Lorentzos, Time and Relational Theory (not for everyone; try if Johnston isn’t formal enough for you)
Tom Johnston, Bitemporal Data (interesting thoughts on bi- and tritemporal data that somewhat diverge from the existing standardization attempts)
Richard T. Snodgrass, Developing Time-Oriented Database Applications in SQL (the classic book about temporal databases)
This list is based on a blog post that was an updated and expanded version of an earlier LinkedIn article.
If you don’t like lists, stay tuned for some more prose in the next issue!