Wednesday 18 January 2023

The modeling mismatch

After spending 10+ years in the wonderful world of Neo4j, I have been reflecting a bit on what it was that really attracted me personally, and many - MANY - customers as well, to the graph. And I thought back to a really nice little #soundcloud playlist that I made back in the day: I went through dozens of the #graphistania #podcasts that I had recorded, and pulled out the standard question that I would ask my interviewees on the podcast: WHY do you like graphs??? WHY, for god's sake!!!

Unsurprisingly, people very often came back with the same answer: it's the DATA MODEL. The intuitive, connected, associative, visual, understandable structure that we humans love interacting with: the labeled property graph (LPG). Have a listen to what people were saying.

The fundamental reason, I think, that we humans love #graphs for tackling complex problems is that they are simply a better MATCH for describing, aka modeling, complex systems. Graphs are good at dealing with complexity. Modeling a complex system in a graph is MUCH, much easier than doing the same in a traditional, tabular database format.
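To make that "better match" concrete, here is a tiny, hypothetical sketch (the names and the in-memory adjacency list are illustrative stand-ins, not any real graph database API): in a graph, a second-degree connection is just two hops, with no join tables in sight.

```python
# A toy social graph: the data structure IS the domain model.
# A dict of adjacency lists stands in for a real graph database.
graph = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Dave"],
    "Carol": ["Dave", "Eve"],
    "Dave": [],
    "Eve": [],
}

def friends_of_friends(person):
    """Second-degree contacts: one hop, then another hop."""
    direct = set(graph.get(person, []))
    second = {fof for friend in direct for fof in graph.get(friend, [])}
    # Exclude people we already know directly, and ourselves.
    return second - direct - {person}

print(sorted(friends_of_friends("Alice")))  # ['Dave', 'Eve']
```

The relational equivalent would need a self-join on a friendship table plus filtering; here the traversal reads almost exactly like the question being asked.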

You could argue that something similar is going on in other NOSQL databases (document databases like MongoDB or #Couchbase, key-value stores like DynamoDB or Redis, column family stores like DataStax/Cassandra). There, too, there's a mismatch between the simplicity of the domain's data model and the complexity of the traditional relational system. Relational models are just too complicated for simple applications, the story goes. So there, too, we should be looking for a better fit between the complexity of the domain and the data model of the supporting technology.

So how do we solve this? How do we bring the right data model to the right domain? How do we allow our software engineers to really optimize their backend data infrastructure, and choose the right tool for the job?

The answer, I believe, will lie in something that Hackolade is pioneering: polyglot persistence, guided by polyglot data modeling.

The idea behind polyglot persistence has been described for quite some time by smart people like Martin Fowler. The argument is simple: rather than trying to cram all the data used by an application into a single, generic persistence layer that does many things averagely (this is the idea behind most RDBMSs, as well as most multi-model databases), split up the persistence layer into different, highly specialised and therefore optimized, persistence layers. That means that an application would not talk to ONE, but to MULTIPLE data backends, and that the application would need to be adjusted so that it could talk to all of these backends correctly. The application would therefore become a polyglot, and be said to use a polyglot persistence architecture. Sounds simple enough, right?
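The architecture described above can be sketched in a few lines. This is a hypothetical, minimal illustration: plain Python classes and dicts stand in for a document store and a key-value cache, and all the names (OrderRepository, save_order, and so on) are invented for the example, not taken from any real driver or API.

```python
class DocumentStore:
    """Stand-in for a document database: holds the rich, nested record."""
    def __init__(self):
        self.docs = {}

    def save(self, doc_id, doc):
        self.docs[doc_id] = doc


class KeyValueCache:
    """Stand-in for a key-value store: holds small, fast-lookup values."""
    def __init__(self):
        self.kv = {}

    def put(self, key, value):
        self.kv[key] = value


class OrderRepository:
    """The 'polyglot' part: one application-level facade, MULTIPLE backends."""
    def __init__(self, documents, cache):
        self.documents = documents
        self.cache = cache

    def save_order(self, order):
        # The full order document goes to the document store...
        self.documents.save(order["id"], order)
        # ...while a denormalised status field lands in the key-value cache.
        self.cache.put(f"order-status:{order['id']}", order["status"])


docs, cache = DocumentStore(), KeyValueCache()
repo = OrderRepository(docs, cache)
repo.save_order({"id": "o-1", "status": "shipped", "lines": [{"sku": "A", "qty": 2}]})
```

Each backend stores only the shape of data it is good at, which is exactly the specialisation the polyglot argument is after.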
And it isn't THAT hard. But of course there is no such thing as a free lunch: the price that you pay for the additional optimization, performance and specialisation is that
  1. you have to be able to design a system that covers the different architectural capabilities, and
  2. your application, or some middleware, will need to keep the different data backends in sync, where necessary.
And this is where Hackolade comes in. If you are developing polyglot applications today, you really should learn more about the polyglot data modeling tools that it provides to design and document the data model across these different data backends. For me, it has been an eye-opener to see how quickly and efficiently you can bring this to life, with the right tools and expertise. This feels like a problem that has been sitting there for quite some time, waiting to be solved, and holding back the whole NOSQL/Agile development movement in many ways. All of that makes me want to know more, and I will be exploring this over the next few weeks.
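The second cost, keeping backends in sync, is worth a small illustration. Here is a hypothetical sketch of one common pattern: writes go to an append-only change log first, and each backend catches up by replaying the log from its own offset. Everything here (the log format, the function names) is invented for the example; in production this role is played by middleware or change-data-capture tooling, not hand-rolled code.

```python
change_log = []  # append-only list of change events, shared by all backends

def record_change(entity_id, payload):
    """Every write becomes an event in the log before it hits any backend."""
    change_log.append({"id": entity_id, "payload": payload})

def apply_log(backend, from_offset=0):
    """Replay log entries into one backend; returns the new offset."""
    for event in change_log[from_offset:]:
        backend[event["id"]] = event["payload"]
    return len(change_log)

primary, replica = {}, {}

record_change("customer-1", {"name": "Ada"})
offset_p = apply_log(primary)            # primary applies the first event

record_change("customer-2", {"name": "Grace"})
offset_p = apply_log(primary, offset_p)  # primary applies only what is new
offset_r = apply_log(replica)            # replica catches up from scratch

# Both backends converge on the same state, despite applying at different times.
assert primary == replica
```

The point is not this particular mechanism but the obligation itself: once you have multiple specialised backends, *something* has to own convergence between them.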

Looking forward to the journey already!


