Tuesday 25 April 2017

Autocompleting Neo4j - part 4/4 of a Googly Q&A

In the first, second and third posts in this series, I finally got round to answering some of the more interesting "frequently asked questions" that Google seems to be getting on the topic of Neo4j.
Today, we'll wrap up that Q&A and answer two more questions which, funnily enough, are closely related. Both deal with the query language that people use to interact with their graph database. Neo4j has been pioneering openCypher, of course, but there are clearly alternatives out there, and people need to make an informed choice between query languages.

5. Does Neo4j support Gremlin

A long time ago, in a dim and distant past, Neo4j was a very Java-centric, mostly embedded database that was accessed programmatically and optimized for graph traversals. Most users would embed the database in their Java software and use it directly and imperatively from their Java business logic. That last point is important: the software developer would explicitly tell the database what to do, how to traverse the graph, and how to get to the result set. This gives the developer a lot of control, power and, ideally, speed, but it also comes with some disadvantages.

Most notably, the imperative approach can only leverage the knowledge of the domain and the data that the software developer has. And, not insignificantly, you have to be a true software developer to use the database, which is so 1960s. The world has moved on since then, and if the evolution of relational databases has shown us anything, it's that the true power of data can only be leveraged through declarative (as opposed to imperative) query tools. In a declarative world, you as a software developer just describe and declare what kind of result set you want, and the database is tasked with getting you that data as efficiently as possible.
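To make the distinction concrete, here is a toy sketch in Java. It is not Neo4j's API: the in-memory `KNOWS` graph, the `Pattern` record and the `match` function are all hypothetical. The imperative version spells out exactly how to walk the graph to find friends-of-friends; the declarative-style version only states what it wants (a start node and a number of hops) and leaves the traversal strategy to the `match` "engine", which in this sketch simply expands the graph level by level.

```java
import java.util.*;

public class ImperativeVsDeclarative {
    // Hypothetical in-memory graph: person -> people they KNOW.
    static final Map<String, List<String>> KNOWS = Map.of(
            "alice", List.of("bob", "carol"),
            "bob", List.of("dave"),
            "carol", List.of("dave", "erin"),
            "dave", List.of(),
            "erin", List.of());

    // Imperative: the developer decides exactly how to walk the graph (two nested loops).
    static Set<String> friendsOfFriendsImperative(String start) {
        Set<String> result = new TreeSet<>();
        for (String friend : KNOWS.getOrDefault(start, List.of())) {
            for (String fof : KNOWS.getOrDefault(friend, List.of())) {
                if (!fof.equals(start)) {
                    result.add(fof);
                }
            }
        }
        return result;
    }

    // Declarative-style: the caller only describes the result it wants...
    record Pattern(String start, int hops) {}

    // ...and this hypothetical "engine" picks the strategy (here: expand one hop per round).
    static Set<String> match(Pattern p) {
        Set<String> frontier = new HashSet<>(Set.of(p.start()));
        for (int i = 0; i < p.hops(); i++) {
            Set<String> next = new HashSet<>();
            for (String node : frontier) {
                next.addAll(KNOWS.getOrDefault(node, List.of()));
            }
            frontier = next;
        }
        frontier.remove(p.start()); // don't report the start node itself
        return new TreeSet<>(frontier);
    }

    public static void main(String[] args) {
        System.out.println(friendsOfFriendsImperative("alice"));
        System.out.println(match(new Pattern("alice", 2)));
    }
}
```

Both calls return dave and erin for `alice`. The point is where the knowledge lives: in the first version the developer owns the traversal plan, while in the second a real engine (such as Cypher's query planner) would be free to choose a better one. In Cypher, the same request would be written roughly as `MATCH (p:Person {name: 'alice'})-[:KNOWS*2]->(fof) RETURN DISTINCT fof`.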

So: Gremlin is an imperative, Groovy-based query language created by the Apache TinkerPop project. It was a great approach to querying graphs, and Neo4j actually helped develop the early versions of the language. However, we really have moved away from Gremlin, for the reasons above.

Now, it's important to understand that both the imperative and the declarative approach to querying data have merit, just like manual gearboxes in cars can have merit over automatic ones. In Europe, we thought for the longest time that manual gearboxes were simply superior, until some of us actually switched and automatic gearbox technology actually got good. Since then, I for one am never going back.

So that's a very long-winded answer to the "does Neo4j support Gremlin" question. Yes, it does: you can add a Gremlin plugin to the Neo4j server if you want to. But why would you? openCypher (the declarative graph query language that Neo4j has pioneered) has become an open standard, Cypher has been getting faster and faster, and Neo4j's APOC procedures give you a very sharp tool to go imperative if and when you need to. So even though Neo4j does in fact support Gremlin, we really don't recommend it. But at the end of the day, you are in control :) ...

6. Does Neo4j support SPARQL

Similar to the Gremlin question above, we can give a very quick answer here: YES, you can run the standard SPARQL query language against a Neo4j database by using something like a SPARQL plugin for Neo4j. But that does not necessarily mean it's a good idea :) ...

SPARQL is the W3C standard for querying semantic web data stored in RDF data stores. It's also a declarative language, which is good, but it really is made for the RDF data model, which is a lot less rich than the property graph model used by Neo4j and other graph databases out there. The RDF model differs significantly from the property graph in one essential respect: relationships (predicates) cannot be assigned weights, properties or attributes. You can make an RDF store out of Neo4j, but you cannot make a property graph store out of an RDF / triple store. Therefore, the SPARQL query language will often not make much sense on a Neo4j instance.
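The data-model difference can be sketched in a few lines of Java. This is a hypothetical illustration, not the approach of any particular RDF tool or Neo4j plugin: a property-graph relationship carries its own properties, while a plain triple has only a subject, a predicate and an object. To keep the relationship's properties, the sketch falls back on standard RDF reification (the `rdf:subject`, `rdf:predicate` and `rdf:object` vocabulary), introducing an extra statement node per relationship. The `Relationship`/`Triple` records, the `toTriples` helper, the id `_:rel1` and the property names are all made up for illustration.

```java
import java.util.*;

public class PropertyGraphToTriples {
    // Property-graph relationship: a type AND its own properties (hypothetical record).
    record Relationship(String from, String type, String to, Map<String, Object> props) {}

    // RDF-style statement: subject, predicate, object, nothing more (hypothetical record).
    record Triple(String s, String p, String o) {}

    // A plain triple has no slot for relationship properties, so this sketch adds a
    // reified statement node (relId) and hangs the properties off that node instead.
    static List<Triple> toTriples(Relationship r, String relId) {
        List<Triple> out = new ArrayList<>();
        out.add(new Triple(r.from(), r.type(), r.to()));       // the bare edge
        out.add(new Triple(relId, "rdf:subject", r.from()));    // reification vocabulary
        out.add(new Triple(relId, "rdf:predicate", r.type()));
        out.add(new Triple(relId, "rdf:object", r.to()));
        // Sort keys so the output order is deterministic.
        for (var e : new TreeMap<>(r.props()).entrySet()) {
            out.add(new Triple(relId, e.getKey(), String.valueOf(e.getValue())));
        }
        return out;
    }

    public static void main(String[] args) {
        Relationship knows = new Relationship("alice", "KNOWS", "bob",
                Map.of("since", 2010, "weight", 0.8));
        toTriples(knows, "_:rel1").forEach(System.out::println);
    }
}
```

One relationship with two properties turns into six triples, and getting the original relationship back requires knowing the reification convention that was used. That lopsidedness is the asymmetry between the two models described above.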

My friend and colleague Jesús Barrasa has been doing a lot of work and writing on his blog about how these two worlds compare. There are some really nice examples there, including one that shows how to query an RDF store with SPARQL and then import the results into Neo4j using Cypher.

That's about it, at least for now. We have answered the top autocomplete questions in more or less detail in these few blog posts. It turned out to be quite a long series; I hope that's OK.

As always: let me know if you have any feedback on the above. I would love to hear your thoughts.


