During these conversations, I have noticed that there’s quite a bit of work to be done to clarify and straighten out the meaning of the words that we use in these conversations. I have noticed that in the NOSQL data modeling space, we are not always very precise with our words - and that this imprecision can lead to all kinds of misunderstandings. Specifically, I have been struck by the confusion around 3 words: model, schema, and metadata.
This is not a confusion that is specific to my conversations of the past few weeks. A quick google search will show you that there have been literally millions of articles and conversations online asking the same question - and this has been going on for a very long time.
Again: I don’t think that this is a new conversation. However, it does seem to me that the confusion is kind of new - it does seem to have become more acute. In the old days of Relational data modeling, people seemed to have a much clearer view on what was meant with modeling terms. Specifically, people always talked about
This is not a confusion that is specific to my conversations of the past few weeks. A quick google search will show you that there have been literally millions of articles and conversations online asking the same question - and this has been going on for a very long time.
Again: I don’t think that this is a new conversation. However, it does seem to me that the confusion is kind of new - it does seem to have become more acute. In the old days of Relational data modeling, people seemed to have a much clearer view on what was meant with modeling terms. Specifically, people always talked about
- The conceptual data model: defines what the system contains, and is about collecting and agreeing on a shared vocabulary for the domain. Typically, data architects and business stakeholders are the users who create conceptual data models. These types are built with the intention to organize and define business concepts, the relationships and the attributes for these concepts.
- The logical data model: defines the business rules and how a system has to be implemented. It is not specific to a database. Users who generally create LDMs are data architects and business analysts. These models are used as a foundation for physical data models because they establish the structure for the data elements and identify the primary keys and foreign key relationships, .
- The physical data model: how the system will be implemented to a specific database management system. This is where the database designers and developers come in to create physical data models that are expansions of the logical and conceptual models. These models are designed for technical purposes that support the business objectives.
What about model / schema / metadata?
So what is the issue? Well first of all, I think that people are using these terms interchangeably way too easily. I have spoken to very smart people in our industry that sometimes say one thing but mean something else, and then in the next sentence get very specific around the precise meaning that they are trying to articulate. Let’s make up our mind, shall we!In general, I do think these three terms articulate very different concepts in the data modeling profession, and that as such, we should take deliberate care when using them. I know I am really just a novice at this, but from my point of view
- The database model is a graphical representation of the structure of the data that we are going to store in a database. As I outlined above, we would refer to “conceptual data models”, “logical data models” and “physical data models”, in traditional relational database management architectures. Models would typically be represented in some entity-relationship diagram, as the model is primarily a representation of how the software developer understands the business reality of the user. It’s the documentation of that conversation between the business and IT.
- The database schema is the (set of) technical artefacts(s) that can be the result of the physical modeling work, and which administrators / devops staff will use as they are tasked with the day-to-day operation of the software system, to enforce the rules that we have formalized for this system. It’s a set of formalized rules that you will not be allowed to break easily inside the data model: things like uniqueness, optionality, etc. Typically, these rules will be formalized in a “data definition language” (DDL), which will be the output of the physical data model that a database management system will enforce.
- The database metadata encompasses both the model, the schema and more. Specifically, it would also include some level of formal documentation of the model and the schema, which would be properly written and maintained on a regular basis. The metadata of a database model is the comprehensive set of everything that we would need to demonstrate the proper governance of our data modeling efforts to whoever wanted to know.
That brings me to the end of this article. In writing it, I hope I have clarified some topics that have clearly been unclear in the NOSQL Data Modeling space, and that struck me during my first couple of ventures learning the important topics that the industry is struggling with. I am sure that this is not the end of the journey – I plan to explore these topics more in future article.
As always: let me know if you have any thoughts about this article – I would love to discuss.
All the best
Rik
No comments:
Post a Comment