Tuesday, 11 April 2023

In NOSQL, data modeling doesn't have to be logical!

At Hackolade, I have been really enjoying my journey into the world of data modeling. It really has been a journey: in some ways, I have felt like I had to go back in time a little bit, and re-learn some of the skills that I had known in the old days when relational databases still dominated this blue planet. ER diagrams, documentation requirements, naming conventions... all good things that had seemed completely normal in the old days - and that got a little bit of a bad rep when the hipster NOSQL databases and data formats started to gain popularity. I mean: at Neo4j, we used to say that "your data is your model", and downplay the need for deep modeling thoughts ahead of time. And at the same time, we also realised that every time a project wasn't going great, it was the freakin' data model that was the culprit. Every time.
So great: NOSQL data modeling is a thing again, and we now have a great set of modern tools like Hackolade Studio to facilitate it. But I have noticed that one question keeps lingering. It has to do with how data modeling facilitates the conversation between business and IT, and how that conversation requires multiple levels of modeling.

It's good to be leveled

In traditional data modeling, there have always been different levels of modeling, at different levels of abstraction. Specifically, we have been working with
  • conceptual data models - these help us understand the business requirements and the relationships between entities, WITHOUT considering implementation details. Conceptual models are all about agreeing on the scope of the data domain that we are describing, and creating consensus on the vocabulary that we are going to use to describe that scope.
    • Note that this "conceptual" level of data modeling has a significant amount of overlap with the ideas outlined in Domain Driven Design. We will explore and explain that link in more detail in a future article.
  • logical data models - these provide the next level of detail with regard to the data elements, their attributes and their relationships - still in a technology-agnostic fashion.
    • Note that this "logical" level of data modeling is the one that has been the source of quite a bit of confusion. Hence the title of this article - you will see below that we are going to suggest a departure from these three levels of data modeling... and from their names!
  • physical data models - these provide the implementation details of a logical data model in a specific (database management) system.

We summarized the characteristics of these different levels in this table:

Now, while it is clearly a good thing to have different levels of data modeling (as it greatly facilitates the conversation between business people and technologists - one of the core functions of data modeling in the first place), it is not always easy to understand how we can apply this to a world of agile development and heterogeneous data backends. Why? Well:
  • logical data models claim to be technology-agnostic - but the reality is that they assume the normalisation of your data - which basically means that the technology of your physical data model is almost certainly going to be relational.
  • because logical data models assume normalisation, it also means that they don't allow for complex data types that are more common in NOSQL
  • for the same reason, it will not be possible to use the common, conceptual or logical definitions for more than one NOSQL backend, if we wanted to do so - which is commonly the case.
At the end of the day, this is all about striking a balance, and finding the right levels of abstraction to achieve the results that we want. When you do that, and think about what is the most efficient and effective way to do that, you may want to make some changes to the conceptual/logical/physical levels above.
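
To make the normalisation point a bit more tangible, here is a minimal Python sketch. The order data, the field names and the normalise function are all made up for this illustration and are not taken from any tool. It shows how one nested document, the kind of complex structure that is common in NOSQL, has to be split into three flat sets of rows as soon as a model assumes normalisation.

```python
# Hypothetical example data: one customer order as a nested document,
# the shape a document database would typically store.
order_document = {
    "order_id": 1001,
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "lines": [
        {"sku": "A-1", "quantity": 2},
        {"sku": "B-7", "quantity": 1},
    ],
}


def normalise(order):
    """Split a nested order into flat rows, one list per would-be table."""
    # The order row keeps only a reference to the customer.
    orders = [{
        "order_id": order["order_id"],
        "customer_email": order["customer"]["email"],
    }]
    # The embedded customer object becomes its own row.
    customers = [dict(order["customer"])]
    # Each element of the embedded array becomes a child row
    # that points back to its parent order.
    order_lines = [
        {"order_id": order["order_id"], "line_no": i, **line}
        for i, line in enumerate(order["lines"], start=1)
    ]
    return {"orders": orders, "customers": customers, "order_lines": order_lines}


tables = normalise(order_document)
for name, rows in tables.items():
    print(name, rows)
```

The nested object and the array are exactly the parts that disappear from the picture in a normalised model, even though a document store could have kept them in place.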

Simplify and solidify: 2 levels for the future

Because of the issues that we outlined above, it is appropriate to revisit the nature of the different modeling levels in our new, agile, NOSQL context. We want to be able to satisfy the different concerns of data modelers, and at the same time provide a more coherent framework that would
  • allow for business friendly, and TRULY technology-agnostic data modeling to happen
  • allow for technology specific, SQL and NOSQL, data models to be derived from a canonical definition that could drive consistency and governance.
This required a change in terminology and tooling. Hackolade introduced its ideas around Polyglot Data Modeling for this very reason - creating a new level of technology-agnostic data modeling that sits across the traditional boundaries between conceptual/logical and logical/physical data modeling. We suggest that you work with two levels, polyglot and physical, to achieve your desired results. The first four points below describe the polyglot level, and the last one describes the physical level:
  1. Polyglot data models will allow you to create an over-arching data model that you can use to drive different underlying physical, target-specific schema implementations
  2. Polyglot data models will allow you to define and document conceptual and logical structures for your data models, and share them with your business and governance stakeholders. If available, they will leverage the ideas and deliverables of a Domain Driven Design to do so. More on this later.
  3. Polyglot data models will embrace complex agility and backend diversity at the conceptual/logical level, therefore providing a truly technology-agnostic model that allows for subtypes/supertypes, many-to-many relationships, and denormalisation.
  4. Polyglot data models will allow you to more fluently transition from the conceptual/logical discussion into a physical implementation, by automatically allowing for the right conversions and normalisations (if any) to take place when you transition from the higher-level (polyglot) to the lower-level (physical) data modeling conversations.
  5. Physical data models will allow for the variety of data structures needed to accommodate the optimizations that polyglot persistence and NOSQL access patterns require, using the intimate understanding and knowledge of the specific backend infrastructure that is the subject of our physical data model.
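
The idea of deriving several physical models from one technology-agnostic definition can be sketched in a few lines of Python. To be clear: this is not how Hackolade Studio works internally. The POLYGLOT_MODEL structure, its key names, the type mappings and the two functions are all assumptions made up for this illustration. The sketch only shows the principle: the same definition is normalised into a child table for a relational target, and kept nested for a document target.

```python
# Hypothetical, technology-agnostic definition of an entity.
# The structure and key names are assumptions made for this sketch.
POLYGLOT_MODEL = {
    "name": "customer",
    "attributes": {
        "id": "integer",
        "name": "string",
        "addresses": {  # a complex, repeating attribute
            "type": "array",
            "of": {"street": "string", "city": "string"},
        },
    },
}

# Assumed type mappings for the two illustrative targets.
SQL_TYPES = {"integer": "INTEGER", "string": "VARCHAR(255)"}
JSON_TYPES = {"integer": "integer", "string": "string"}


def to_relational(model):
    """Derive normalised CREATE TABLE statements: arrays become child tables."""
    parent = model["name"]
    columns, statements = [], []
    for attr, spec in model["attributes"].items():
        if isinstance(spec, dict) and spec["type"] == "array":
            child_cols = [f"{parent}_id INTEGER"] + [
                f"{col} {SQL_TYPES[t]}" for col, t in spec["of"].items()
            ]
            statements.append(
                f"CREATE TABLE {parent}_{attr} ({', '.join(child_cols)});"
            )
        else:
            columns.append(f"{attr} {SQL_TYPES[spec]}")
    # The parent table goes first so the statements read top-down.
    statements.insert(0, f"CREATE TABLE {parent} ({', '.join(columns)});")
    return statements


def to_document(model):
    """Derive a JSON-Schema-style structure: arrays stay nested."""
    properties = {}
    for attr, spec in model["attributes"].items():
        if isinstance(spec, dict) and spec["type"] == "array":
            properties[attr] = {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        col: {"type": JSON_TYPES[t]}
                        for col, t in spec["of"].items()
                    },
                },
            }
        else:
            properties[attr] = {"type": JSON_TYPES[spec]}
    return {"title": model["name"], "type": "object", "properties": properties}


for statement in to_relational(POLYGLOT_MODEL):
    print(statement)
print(to_document(POLYGLOT_MODEL))
```

The relational target ends up with two tables, while the document target keeps the addresses embedded in the customer. Neither shape is baked into the higher-level definition, which is the whole point of keeping that level technology-agnostic.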

This new, simpler and solidified structure for our data modeling efforts will achieve the same objectives as the conceptual/logical/physical strata, but in a more modern way. This may seem illogical at first - but it is actually common sense, and extremely simple and handy once you get your head around it.

Hope this was a useful article for you - and if you have any comments, please reach out and let's discuss!


