Thursday 21 August 2014

How to mess up your Neo4j project

In early August, I celebrated my second anniversary of working for the wonderful folks at Neo Technology, makers of Neo4j. It has been a wonderful, inspiring journey so far, with many beautiful things to come I am sure. But along the way, I have also seen some things that I would have preferred not to have seen – mistakes or failures that I believe could have been avoided, specifically around the setup of Neo4j projects. Some of these mistakes are technical, others aren't, but all of them seem worth learning from. So I thought I would write them up for you – both for your benefit and for my own …
Note to the reader: this post started out as a presentation that I half-jokingly made internally at Neo Technology, talking about all the things that users could do wrong in their project. I took the perspective of “what would you do, if you really wanted to mess things up”. Of course, I could have made this point in different ways, and I could have documented “best practices” that talked about “how to do things”. But I decided not to, essentially because this is way funnier – and I am a big believer in using humor to get a point across. So here goes.


1. Never ask WHY!

First and foremost, and at the risk of stating the obvious, new technology users – Neo4j users are no exception – should always ask themselves WHY a particular piece of technology would be a great fit for their projects. I for one am not a big fan of introducing or even promoting technology for technology’s sake – it’s just a recipe for disaster. There has to be a good reason for you to adopt Neo4j – whether you are solving a performance issue, want to do something that you are already doing more efficiently, or want to seize an opportunity that you think could be the “next big thing”. You have to have a reason. Without a good, solid reason, the project will likely not deliver enough value for your team – and you should just continue doing what you are doing. In other words, if you want the project to mess up big time, you should never ask why, and definitely NOT have a reason. But if you do ask, and you do have a reason, your company and co-workers will thank you for it.

2. Forget about your colleagues

Speaking of those co-workers: if you really want to mess things up, then you should always think of yourself as an island, and not talk to any of your colleagues about how the introduction of Neo4j will affect them. They will surely like being blindsided and having a technology pushed down their throats at the very last minute before going into production – no doubt about that.
Joking aside: if your team is serious about introducing Neo4j in your project, then that must mean that you have a solid reason for it – see the previous point. If you have a solid reason, then you should really talk to other people about it. Developing your wonderful Neo4j app in a dark little corner is a true recipe for failure – you need to make the love spread throughout your organization. And if you don’t know how to do that, just ask other people how they did it – there are plenty of meetups, community users and commercial users around!


3. Get started before getting trained

The cool thing about Neo4j is that it is quite easy to get started with. I truly believe that it is very doable for someone with average IT skills to download Neo4j and get started with some graph database concepts. The early-stage learning curve is not that steep at all. The problem, I think, is that this initial ease of use is quite deceptive – and that building a full-scale Neo4j application still requires true skill. As with many technologies, the devil is in the gruesome detail – and if you really want to develop a Neo4j application for production use, then you are going to meet some detailed devils. So if you want to mess up the project, the thing to do is probably to sit back, relax and wait for the train wreck to happen.

If, however, you want the project to succeed, you should brace yourself for that “encounter” and educate yourself. There are online training resources, classroom training, books, tutorial workshops, the GraphAcademy – so many things that you could read up on before you actually get into trouble. It just makes sense not to dive in head first – but to dip your feet in the water first, and then wade in step by step.

4. Jump in – head first!

Any new technology project has many different moving parts, and a couple of them really deserve some additional attention. If, however, you want to make sure that you hit a brick wall in no time, all you need to do is jump in, head first, without giving those moving parts any additional thought.

So let me call out those moving parts, and then depending on your intent – to mess up or not – hopefully you can spend some time thinking through these aspects before (or so that) the proverbial faeces hit the air rotating device(s).



a. Modeling Shmodeling!

Over the past few years, the NoSQL movement has brought many great things to the world of databases – but I think the single worst thing that it introduced along the way is the confusion around schemas and models. In traditional relational databases, there’s a one-to-one mapping of your database model into your database schema, and that schema is ... mandatory! You cannot do without one! In many NoSQL systems, however, schemas are non-existent or optional – and therefore somehow our reptile brains seem to think that models should be too.

Nothing could be further from the truth, and especially not in a highly structured graph database. Graphs provide a much richer and potentially freer way to handle your data, but with that freedom comes a responsibility on the user to actually have a very well-thought-out model. The data is the model – and it will drive many different aspects of your Neo4j deployment. To paraphrase an old planning adage: “If you fail to model, you model to fail”. So again, depending on your intent, you should definitely spend time (or not) on this part of the project. It will mean the difference between failure and success, for sure.


b. Import Shmimport!

The first thing most new Neo4j users usually want to do – after having skipped all of the steps above of course :) – is to import some of their own data into this new shiny graph database. Of course this is a valid and useful step for most users to take, but how you go about this really matters as well. Data import into graphs is just inherently more complicated than in other kinds of systems, and there are many different aspects to be taken into account – especially at scale. So yes: another great way to screw up the project is by just “trying out some stuff” and assuming that it will work. Chances are that it won’t.

This is why I would sincerely recommend that people start their import adventures by
  • studying the different alternative import methods first. Look at the official Neo4j page or my earlier blog post for some clues.
  • trying the import on a “reasonable” scale – thousands of nodes and relationships, not billions – before venturing onwards. Walk before you run, and all that.
And if you get stuck – do not despair, but look at number 6 in this post.
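
To make the “reasonable scale” advice a bit more concrete, here is a minimal sketch of how you might cut a small, consistent sample out of your source files before attempting the real import. It makes several assumptions that are mine and not part of any Neo4j tool: that your data lives in two CSV files (one for nodes, one for relationships), that the column names are `id`, `start` and `end`, and that the helper name `sample_graph_csv` is purely illustrative. It does not talk to Neo4j at all; it only prepares a smaller input that you can then feed to whichever import method you picked.

```python
import csv
import io


def sample_graph_csv(nodes_csv, rels_csv, max_nodes):
    """Keep the first max_nodes nodes and only the relationships between them.

    Column names ("id", "start", "end") are assumptions for this sketch.
    """
    kept_nodes = []
    for row in csv.DictReader(nodes_csv):
        if len(kept_nodes) >= max_nodes:
            break
        kept_nodes.append(row)

    kept_ids = {row["id"] for row in kept_nodes}

    # Drop any relationship that points at a node outside the sample,
    # so the trial import never references a missing node.
    kept_rels = [
        row
        for row in csv.DictReader(rels_csv)
        if row["start"] in kept_ids and row["end"] in kept_ids
    ]
    return kept_nodes, kept_rels


# Tiny in-memory stand-ins for real files (made-up data).
NODES = "id,name\n1,Ann\n2,Bob\n3,Cid\n4,Dee\n"
RELS = "start,end,type\n1,2,KNOWS\n2,3,KNOWS\n3,4,KNOWS\n1,4,KNOWS\n"

nodes, rels = sample_graph_csv(io.StringIO(NODES), io.StringIO(RELS), max_nodes=3)
print(len(nodes), "nodes,", len(rels), "relationships")
```

With real files you would pass open file handles instead of the `io.StringIO` objects. Taking the first rows is the simplest possible sampling strategy; it is not necessarily representative of your full dataset, but it is usually enough to find out whether the import procedure itself works.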

c. Query Shmery!

I have been there myself so many times before: graph database queries have a way of appearing simple and gentle at first, but then coming back at you with a vindictive head-butt. It’s so easy to get a query wrong – especially because graphs and graph concepts are quite new for most early users. A graph traversal that was not thought through can lead to a combinatorial explosion and bring down servers and machines in no time. Great mess-up strategy!

But if you actually want to succeed, you should know that there are many things you can do about this today, and many things will become possible in the next few releases of Neo4j – but for now, additional care in the query composition is just a smart thing to put in place. And if you get stuck – do not despair, but look at number 6 in this article.
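
To get a feel for how quickly the combinatorial explosion mentioned above kicks in, here is a small back-of-the-envelope sketch. It is not Neo4j code and does not model how Neo4j executes a query: it simply counts the walks of a given depth from one start node in a made-up graph where every node has exactly 10 outgoing relationships. The graph shape, the sizes and the function names are all assumptions for illustration, and the count is best read as an upper bound on what an unconstrained traversal might have to look at.

```python
def ring_graph(size, degree):
    """Synthetic graph: node n points at the next `degree` nodes on a ring."""
    return {
        n: [(n + step) % size for step in range(1, degree + 1)]
        for n in range(size)
    }


def count_walks(adjacency, start, depth):
    """Count walks of exactly `depth` hops from `start`.

    Keeps only a per-node tally per level, so the counting itself stays
    cheap even when the number of walks is huge.
    """
    counts = {start: 1}
    for _ in range(depth):
        next_counts = {}
        for node, ways in counts.items():
            for neighbour in adjacency[node]:
                next_counts[neighbour] = next_counts.get(neighbour, 0) + ways
        counts = next_counts
    return sum(counts.values())


graph = ring_graph(size=1000, degree=10)
for depth in range(1, 7):
    print(depth, count_walks(graph, start=0, depth=depth))
```

Every extra hop multiplies the number of walks by the node degree, so a graph of only 1,000 nodes already yields a million walks at depth 6. Constraining the depth, the relationship types or the direction of a traversal is what keeps that number manageable.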

d. Operations Shmoperations!


I suppose this is related to the point about considering the impact on your colleagues: most developers still don’t work in an environment where “operations” and “development” are integrated into a coherent “devops” process. As regrettable as that may be, it does still mean that today, Neo4j developers who want to really screw their colleagues and jeopardize their projects should simply ignore the operational aspects of implementing a Neo4j database system.

Some questions that you could ask yourself, if you wanted to: Will you run it as a server or embedded? Have you tuned your Java Virtual Machine? Will you need clustering for HA/load balancing? How will you handle backups and restores? Do you need to put in place some monitoring? These are the kinds of questions that people are used to asking in traditional relational database systems – but seem to have lost track of in a graph world. Again: no one should want operations to be an afterthought – it will only hurt all the more afterwards.

5. Love your waterfall!

My friend and colleague Stefan added another creative way to mess up a Neo4j project: use an old-school waterfall methodology for developing your application. Analyse. Develop. Test. Deploy. But never, ever, ever iterate! Don't ever try to test something out before you have finished the entire development phase!

Joking aside: you would be amazed at how many people still make this mistake, and uncover huge project risks extremely late in their project timelines - just because they did not have the appetite to be a bit more "agile". It's sad really - especially in a Neo4j context, as this kind of technology is still quite new and really benefits massively from short and numerous development spikes.

6. Be Proud – never ask for help

This is probably one of the trickiest things to articulate. Developers in general, and Neo4j enthusiasts in particular, are usually quite smart people. They really are, and I consider myself lucky to be part of that community. But one of the annoying things about smart people is that they usually prefer not to display ignorance, and would prefer to “work it out” themselves. This, I have found, is one of the very, very best mess-up strategies.

The "problem" (depending on your objective :)) with that strategy in a Neo4j context, I have found, is that figuring things out yourself is hugely inefficient and terribly frustrating. Some things you just can’t “work out” that easily, especially not if this is your first Neo4j project. It is my belief that it would be massively more efficient for people to ask for help early and often.
There are a lot of public AND private ways to ask for help on Neo4j, whether it’s on StackOverflow, Google Groups, Meetups, private email conversations, or even 1-on-1 sessions with Neo Technology staff or prototyping workshops – all of the above can mean the difference between a very productive and a massively frustrating experience with Neo4j. Over time, Neo4j will grow more mature and hopefully you will not need to ask as much. But for now, just ask early and often – that’s the way to go in my opinion.

7. Above all: don’t contribute back to the Neo4j project

Neo4j has been an Open Source project for a very long time now, and I know that all of us at Neo Technology are very proud of and passionate about that. But as you probably know, there are about a zillion open source projects out there that die within a few years of their inception – and of course that kind of lifecycle could also very nicely kill off your project, leaving you with an infrastructure that is not supported, not maintained, or just plain dead. And the best way to make sure that the project does not survive is of course not to contribute anything back to it – no code or cash.
Without wanting to plug Neo4j product sales on this blog, people should – in my opinion at least – realize that open source projects cannot survive without community/user contributions. Some users will contribute code and help us continue to innovate and maintain the product. But others, average commercial users, should contribute in cash – in their own interest. Not contributing in any way only increases your chances of a mess-up – there’s no such thing as a free lunch.

So that’s about it. I hope you guys enjoyed thinking about this as much as I did – and I truly, truly hope that you … don’t mess up.

Cheers

Rik
