Monday 29 June 2015

Podcast Interview with Javier de la Rosa, SylvaDB

I recently had a great conversation with Javier de la Rosa. Javier is an active member of our community, and one of the driving forces behind the SylvaDB project. I first learned about Sylva when I had just started to work for Neo, and when I was trying to acquaint myself with the basic concepts in more detail. This is how the project describes itself:
Sylva ["silva", a book to organize knowledge during the Renaissance] is an easy-to-use, flexible, and scalable database management system that helps you collect, collaborate, visualize and query large data sets. In Sylva, all your data is connected using a graph, and you will see the connections all the way through. 
And it really does deliver on that vision. It's really simple to use - even if you don't know anything about graphs, databases, query languages, etc. It's totally open source, and a great topic of conversation:

Here's the transcript of our conversation:
RVB: Hello everyone, my name is Rik Van Bruggen from Neo Technology, and here we are again doing an evening podcast recording all the way across the Atlantic. And my guest today is Javier de la Rosa, all the way from Ontario, in Canada. Hi Javier. 
JDLR: Hello, how are you Rik? 
RVB: I am very, very well. Thanks for coming on the podcast, really appreciate it. 
JDLR: Thank you for inviting me actually. 
RVB: Yeah, great. Javier, do you mind introducing yourself? I always ask the same question, but who are you and what's your relationship to the wonderful world of graph databases? 
JDLR: Sure, no problem. So, as you said, my name is Javier de la Rosa. I'm originally from Spain. I have a background in computer science and artificial intelligence, but then I decided to switch to a kind of different field. I moved to Canada, I live in Ontario now, the city of London, ironically, it's close to Toronto. Actually, I'm getting my PhD in Literature, working in a field called Digital Humanities. Because they have a lot of problems, we are now using graph databases to try to tackle all those different problems that they have. 
RVB: That's interesting. So how did you get into graph databases, Javier, and why did you get into it? 
JDLR: When I first came to the lab that I'm working for, I saw that they were all using databases somehow - let's say they were using Microsoft Access, or Excel, stuff like that. Then, because I have a background in computer science, I thought maybe it's a good idea to use something that allows them to model the problems more freely. Then I discovered Neo4j, and then I started working on my own REST client the first time that they released the REST endpoint. Then, on top of that, we decided to build a tool for all the humanities to actually use, and get all the power that Neo4j has to provide. 
RVB: So basically, you've written a bunch of tools on top of neo4j that allow actual language specialists, digital humanities specialists to do their jobs in a graphy kind of way. Is that what I'm hearing [chuckles]? 
JDLR: Yes, that's exactly it. Because they usually do a lot of analysis. They also work with networks. For example, the typical example is social network analysis, but sometimes they have to work in an isolated environment, and they only have like the email to share the stuff, so I thought that maybe like a cloud-based solution for them would be better. So that's why I thought of neo4j in the first place. 
RVB: Super cool. And the second question I always ask is why did you get into the graph? You've sort of started answering it, but what is it that makes it such a good fit for this digital humanities? What makes the graph and the graph database such a good fit? 
JDLR: The main thing is that the world of humanities is such a mess. So they start working on a problem and then in two months they decide that they have to change the schema. And then a week later they have to change it again. Then again, and again, and again. So having something schema-free is actually the best solution for all of them, so that's why we decided to use something that allows them to have a flexible schema, or at least schema-less. 
RVB: Can you give me an example of a domain that really benefited from that? Some kind of a project that you were able to solve using a graph database? 
JDLR: Yeah. For example, we have a colleague now, he's working on analyzing like 13 million books written in Spanish, and then he has to model how the transformation of knowledge actually happened in the 17th and 18th centuries. So he started creating a schema, he started modeling the problem. And then as the research actually advanced, he had to modify the schema several times to actually put his data in. So the thing is that instead of having your schema first and then trying to feed your data into the schema, you actually modify the schema as you need it. It was a natural option for us. 
RVB: As I understand it, Javier, you've also done a lot of work to sort of put this into the hands of the researchers, right? That's what the SylvaDB is all about, isn't it? 
JDLR: Exactly. Even if I love Neo4j, and I really like the Cypher language, we have to acknowledge that it's not for everyone. If you're not a programmer or an analyst, it's hard to learn it, especially if you are from the humanities and all you have done in your life is just read books and do a lot of critical thinking. So we put SylvaDB on top of Neo4j for them to get all the power that you can actually get from using Neo4j. 
RVB: So anyone can use it, right? I've registered, for example, and I was playing around with it, but-- so anyone can use this to create their own graph database? 
JDLR: Yeah. It is free to use. We have a public website which is called sylvadb.com so you can go there. But because what we are running is like an academic thing, if you feel that it's not enough, you can actually go to GitHub, download all the code, and put it on your own machine, and that's good. That's good for us. It's a GPL license, so very good with that too. 
RVB: Super cool, yeah. I mean that's the beauty of open source, right [chuckles]? 
JDLR: Yeah. 
RVB: Absolutely. Yeah. 
JDLR: You have to contribute. 
RVB: Exactly, yeah. Absolutely. So in terms of open source, and one of the things is obviously there's a lot of top people developing new stuff around the open source project. That brings me to my last question - where is it going? Where do you think or where do you want it to go? Or what does the future have in store for graph databases and maybe also for SylvaDB?
JDLR: As we see it, for example here in Canada, there is now a huge debate about whether the government should provide a nationwide infrastructure to support research tools. One of the ideas is to actually push for the government to have like an instance or something close to a graph database, a massive graph database to support all the data sets that researchers are using. That's one thing, but that would take a long time to be a reality. In terms of our short-term goals, we want to create another tool in SylvaDB that allows you to create projections. That's now easily done using Cypher, but in SylvaDB everything is visual. You don't need any programming knowledge. We want the researchers to be able to say, "Okay, I have a book and an author and then a CD," and they want to create a new graph which is the result of projecting that relationship, which has three different types, into one single relationship that only goes from one type to the other. These are usually called projections, but right now that's only available through Cypher. 
RVB: That sounds really interesting. And is that something that you'll think you'll be able to release in the next couple of months or is that long-term? 
JDLR: Let's see. I am working now, I'm finishing my thesis, so I have to defend, but that's, so let's see how much time I have for that. I would love to. 
RVB: Okay, well very cool. Well, Javier, thank you so much for sharing that with us. 
JDLR: Thank you. 
RVB: I think it was very interesting and I'm sure you're going to play around a little bit more with SylvaDB, so I thank you for sharing that with the community as well. And yeah-- 
JDLR: Thank you very much. 
RVB: I look forward to seeing you at one of the conferences, maybe at GraphConnect in October or something. That would be great. 
JDLR: Sure, I will try. 
RVB: Okay, thank you, Javier. Have a nice evening. 
JDLR: Thank you very much, Rik. 
RVB: Bye.
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik

Monday 22 June 2015

Podcast Interview with Luke Gannon and Pieter Cailliau, Neo Technology

Today I thought we would try something different. In this podcast series, we have talked to a bunch of people already, many from Neo Technology. We talked to founders, execs - we even talked to employee nr. 1. So now we are going to talk to some of the youngsters, newbies, or whatever you want to call them - people that just joined the wonderful world of Graphs, and get their perspective.

To make it "easier" on them, I thought I would get them to talk to me together - in a small group. A bit of an experiment, but I think it worked out. In today's episode we have Pieter Cailliau (of GraphConnect fame with his talk about how TomTom uses Neo4j) and Luke Gannon. Both have only been with Neo for a few months - but clearly already have a lot of things to say about it. Let's listen:

Here's the transcript of our conversation:
RVB: Hello everyone. My name is Rik Van Bruggen from Neo Technology, and here we are again recording a Neo4j Graph Database podcast. It's a bit of a special podcast today because it's a bit of an experiment, because I've got two people on the phone today, Luke Gannon and Pieter Cailliau. Both of them are consultants at Neo. I'd like them to introduce themselves. Luke, why don't you start? 
LG: Hi, Rik. Yes. I joined Neo back in May. I came from a big SI. I've been working in the graph database space for a month now, and it's been good fun working with Neo. 
RVB: Super, wow. So you've been using it for how long now, Luke? 
LG: I've been playing around with Neo for just over a year, year and a half. I came across it when I was back in the dark days of using MongoDB. I was looking at other solutions, and came across Neo. It's safe to say that I've never looked  back. 
RVB: Cool. The other person on this Skype call is Pieter Cailliau. I'm sure I'm mispronouncing his name again. Some of you may know Pieter, because he did a presentation at GraphConnect San Francisco last October. Hi, Pieter.
PC: Hi, Rik. Hi, Luke. Yeah, I joined Neo two months ago in April. I used to work for TomTom, where we used Neo4j to do map quality. So I've been using Neo4j now for two, three years, let's say. And I'm happy I joined Neo4j. 
RVB: Super cool, yeah. I'll put the links to the GraphConnect talk on the blog post that goes with this podcast, so people can take a look at it. They can try to pronounce TomTom a little bit better [laughter]. Very good. So guys, let me just ask you. How did you guys get into graphs? Luke, you're already talking about it a little bit. How did you get into it, and why did you get into it? Could you explain that a little bit more? Let's start with you again, Luke, if that's okay? 
LG: Yeah, that's fine. I started using graphs to model connected data for a <???> client at the time. They had lots of data that needed to be connected, and to traverse it and be quick, and to be able to apply quite complex queries over billions of nodes effectively. That's how I got to work with Neo, learning more about the products, seeing the wonders that Cypher can do. Don't really know what else to say. 
RVB: So, you were trying to solve a particular problem, is what I'm hearing. It was a particular problem that you needed to solve, and then you started looking for it, and found Neo. Is that--? 
LG: Yeah. The problem actually arose when trying to load a relational database and trying to keep it like a graph store. It doesn't work. We know this. If you're constantly doing cache sets across massive tables of billions of entries, you're going to hit some limitations instantly. 
RVB: Yeah. And what about you, Pieter? How did you first learn about Neo? 
PC: A routable network, or actually a map, is a perfect fit for a graph. So you model the intersections as nodes, and the roads in between them as the relationships, effectively. We had a problem that we needed to do real-time analysis, or impact analysis actually, on our database. So a map editor is continuously editing the map, and he wants to have some fast feedback on whether the map is still connected. It's a very complex query to execute on a SQL database. We had the idea to transform our data into a graph, and that was very effective to solve the problem in a graph way. But even better was that we could persist it in a graph database, so we could keep it in an ACID way, we could persist the data and go back in time, to how our graph looked in history, and execute that same question over again. 
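For readers who want to see the idea in code: the connectivity check Pieter describes can be sketched in a few lines of Python. This is only an illustrative sketch, not TomTom's implementation and not Neo4j code. The function name `is_connected`, the intersection labels, and the assumption that every road is two-way are all made up for the example.

```python
from collections import deque


def is_connected(intersections, roads):
    """Return True if every intersection can be reached from the first one.

    intersections: list of node labels (hypothetical example data)
    roads: list of (from, to) pairs, treated as two-way for simplicity
    """
    if not intersections:
        return True

    # Build an adjacency map: intersection -> set of neighbouring intersections
    neighbours = {node: set() for node in intersections}
    for a, b in roads:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Breadth-first traversal starting from the first intersection
    start = intersections[0]
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)

    return len(seen) == len(neighbours)


intersections = ["A", "B", "C", "D"]
roads = [("A", "B"), ("B", "C"), ("C", "D")]

print(is_connected(intersections, roads))            # map before the edit
edited = [road for road in roads if road != ("B", "C")]
print(is_connected(intersections, edited))           # map after removing one road
```

Removing the single road between B and C splits this tiny map in two, which is the kind of feedback a map editor would want to get immediately after an edit.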
RVB: So that means no more conversions from some kind of tabular formatting to graph, and back, and all that kind of stuff. Is that--? 
PC: TomTom keeps them quite... TomTom actually keeps them in sync, so there are different representations of data, but it's a perfect fit to do the complex queries they need to do. 
RVB: I understand. Cool. I wanted to ask you guys something, because you guys have been with Neo now for about two or three months. You know, that's actually something that I really want to dig into a little bit. What's it like to work in this industry? You're both young guys, and you're coming from different backgrounds. What's your experience been like working for Neo, working in this domain for the past couple of months? I'll go back to Pieter, for now. Changing the order [chuckles]. 
PC: I thought I would get some time to think about these questions [chuckles]. 
RVB: No, not this time, no. 
PC: What I really like about it is that we see so many cool companies out there. The really hot and cool ones that would like to use our product for all different kinds of use cases. Some of them are banks, others are journals, like papers, that want to make recommendations to their readers - all kinds of different use cases. It's very intriguing to me, it really gets my attention over and over again. 
RVB: Yeah. It's very diverse, right [crosstalk]--? 
PC: It's very diverse and-- 
RVB: --anywhere. 
PC: And all hot topics, right? They're all cool stuff to work on. 
LG: I agree with Pieter. Pieter's hit the nail on the head there. And when you see some of the people that come in to us, wanting to find out more about Neo-- they could be anywhere within their life cycle, from inception, an idea, to trying to solve a problem-- learning more about those domains is actually what keeps us quite interested. And how to apply it within a graph is even more spectacular.
RVB: Super cool. Yeah, so I'll ask one more question, actually two questions in one question. What's your favorite, or your least favorite feature of Neo4j right now? What do you like best, and what do you think needs most work right now? Pieter, tell me. Surprise questions, I like those. 
LG: Hey, Pieter. You know, what you do want to get sorted soon, the indexes, that problem. 
PC: Yeah, indeed. That's just [chuckles] an issue we had. So we currently have schema indexes. And if you make a schema index on a  property which is an array, you can't do an individual item lookup. For example, if you have middle names Rick, Chris, and Pieter as an array property, and you would like to find all persons that have the middle name Rick - the index doesn't leverage the separate items. It only uses the exact array. So, that's a feature I would like to see soon, because I think it's very powerful. Next to of course to much more string-ish like, or regular expression like queries, well, indexes. 
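For readers who want to picture the difference Pieter is describing, here is a small Python sketch. It is purely conceptual and says nothing about how Neo4j implements its indexes; the person ids, the sample names, and both helper functions are hypothetical. One index is keyed on the whole array, the other on each item in the array.

```python
from collections import defaultdict

# Hypothetical sample data: person id -> array of middle names
people = {
    "p1": ["Rick", "Chris", "Pieter"],
    "p2": ["Rick"],
    "p3": ["Anna", "Chris"],
}


def build_exact_index(people):
    """Index keyed on the complete array: only exact-array lookups can use it."""
    index = defaultdict(set)
    for person_id, names in people.items():
        index[tuple(names)].add(person_id)
    return index


def build_item_index(people):
    """Index keyed on each individual item: supports 'has this middle name' lookups."""
    index = defaultdict(set)
    for person_id, names in people.items():
        for name in names:
            index[name].add(person_id)
    return index


exact_index = build_exact_index(people)
item_index = build_item_index(people)

# Looking up "Rick" in the exact-array index only finds the person whose
# whole array is exactly ["Rick"]
print(sorted(exact_index.get(("Rick",), set())))

# Looking up "Rick" in the per-item index finds everyone with that middle name
print(sorted(item_index.get("Rick", set())))
```

The second lookup is the one Pieter would like a schema index to be able to serve.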
RVB: It's amazing how you delve into something really nitty-gritty in detail like that [laughter]. 
PC: It was a topic we were discussing-- actually, right before this call. 
LG: Yeah. We have to put a disclaimer out. You can do it using the legacy indexes. 
PC: Yes, you can actually do it. 
RVB: But I think everyone agrees that the schema indexes probably need some more work in Neo4j. I think that's a good one. And what do you like best Pieter? 
PC: What I like the best? 
RVB: So many things to choose from. 
PC: I've actually got it, and it's not nitty-gritty detail. What I like about Neo4j is the way it brings your code to your data. In typical applications, for example, you have a SQL server and you have an application running on a separate machine, which is actually sending data over the wire continuously. By the fact that we have a Cypher language which is very declarative and very smart, but also other APIs we have, you can actually execute your questions close to your data - actually, where the data is. For example in Cypher, we have shortestPath as an example-- where it can traverse data directly where the data is. And that's, I think, one of the most wonderful features we have in Neo4j. 
RVB: Super cool. I think it's very much related to the modeling power of the graph, right? You don't have all these transformations that are going on anymore. It's just you model reality as it is almost. Luke, what about you? Pick a most wonderful, and a least wonderful feature? 
LG: What I love is the traceability of Cypher, effectively. When you're building the query-- I don't know if you guys remember from back in the SQL days of trying to work out where one calculation's going wrong, or where you're trying to do your joins and things like that. Just looking how you can model with Cypher, that query, it makes it so much easier to work out where you're going wrong. I'm sure you know this one, Rik, from seeing all your posts in the Cypher chatroom... 
RVB: Well, I like to get my hands dirty,  and so sometimes they're covered with mud [chuckles]. 
LG: My least favorite thing, I want to see some more visualizations. That's what I want to see. 
RVB: More interactive visual querying, stuff like that. Is that--? 
LG: Yeah. We've got Popoto.js, which is one of the open source projects, which people can use with Neo. That allows you to visually query the database, and it builds your Cypher for you. But I'd like to see that brought into the main stack for instance, which would be awesome. If you just could literally drag the different node types onto screen and say, "Build me my Cypher, query my data," it will make people go into production quicker, I think. 
RVB: We've had a bunch of other people on this podcast series from Linkurio.us, from Prologram. I've been talking about visualization. It's such an interesting topic, and there's so much work to do there. It's a good one. Absolutely. Cool, guys. I think we'll wrap up this recording for now. Thank you for coming online. I really appreciate it. It was great having you as guests, and I'm sure there'll be lots of other opportunities to see you guys at work, on stage, at one of the meetups, GraphConnect, whenever, right? Thanks a lot. I appreciate it. 
PC: Oh, thank you, Rik. 
RVB: Cheers, guys. Talk to you later.
PC & LG: Bye.
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik

Monday 15 June 2015

Podcast Interview with Nat Pryce, independent software developer

Waw. Episode 27 already of this podcast series - that's pretty cool, if you don't mind me saying it myself. Ever since I started playing with this idea last March at Qcon, I have had a ton of really interesting conversations with truly interesting people. And the conversation below is another highlight: an interview with Nat Pryce.

I was first introduced to Nat in the fall of 2013 when we asked him to give a talk at our London meetup group about the stuff he did for Sky to optimize the memory usage of their set-top boxes. Pretty amazing. And so when I reached out to him about the podcast he was immediately up for it, and that's when we had the following late-night, after-the-kids-went-to-bed conversation.

Here's the transcript of our conversation:
RVB: Hello everyone. My name is Rik, Rik Van Bruggen from Neo Technology and here I am again recording another episode for our Neo4j graph database podcast. And tonight, I am joined all the way from the UK by Nat Pryce. Hi Nat, how are you?
NP: Hello. I'm very well, thank you. 
RVB: Very good, excellent. Nat, thanks for joining us. It's great to have you on the podcast. For those of our listeners that don't know you, do you mind introducing yourself, and telling us a little bit about yourself and what you're doing right now? 
NP: Okay. I am a freelance software developer. I work for companies who basically want some consulting, helping with teams and agile software development and software engineering, and actually just developing products. I usually work for companies for a couple of years at a time, working on a product and working with the team to help transfer skills. 
RVB: But you've written some books and stuff as well, right? 
NP: Yes. I wrote a book with Steve Freeman called Growing Object-Oriented Software, Guided by Tests, which is a book about test-driven development and how it is applied in the wider software development life-cycle and process. 
RVB: I saw that, it was really interesting so very cool stuff. So Nat, how did you get into graphs and graph databases? Tell us a little bit about that. I've read some of your posts and I've seen some of your talks but maybe you can tell us a little bit about the history. 
NP: Okay. Well, the first thing that got me interested in graph databases was actually I had a crazy idea for writing my own programming language, which would not have a syntax but would actually be represented as an abstract binding graph, in a graph database, and then projected out into the different views that you could manipulate. So your editing of your program would be done by graph transformation. So I looked around to see if there was anything that could do that for me, so I wouldn't have to write it myself, in terms of just drawing the graph, and found Neo4j. It looked very easy to get started with and so, yeah, that's how I picked it up. 
NP: It was very easy to get up and running. This was quite a while ago, before they added Cypher, when it was an embedded Java library pretty much, which was exactly what I wanted for my particular project that I was experimenting with. So it was just really a crazy idea that got me interested in it and then I realized what a useful tool it is. I mean graph databases, I find them very attractive because they've got a very good formal theory behind them, and I find it very natural to represent my data in terms of property graphs. 
RVB: Wow. Did that project ever go anywhere [chuckles] if I can ask? 
NP: Yeah. I got a very, very simplified scheme-like language doing function applications and simple calculations. I didn't really take it any further than that. 
RVB: So when did you actually ever use Neo in more like a production context? I read some of your work on the Sky set-top box, was that the first time? 
NP: Yeah. So I guess my use cases are maybe a little bit different from a lot of other Neo4j users, in that I pretty much use it for ad hoc data analysis. So in that use case, the fact that it's really easy to install and I can throw data into it very easily and then do exploration with Cypher, and then get some visualizations up, is for me the killer feature of it. At Sky, we were working on their set-top boxes, which are embedded systems so they've got quite a limited amount of CPU power and fixed memory, and there's millions of them in the field. You can't upgrade the memory so we were trying to cram more and more functionality onto the box but into a fixed amount of memory. So the memory constraints were becoming more and more of an issue, and as we were trying to get to release, we were getting some out-of-memory situations when we needed to track them down. 
NP: The box was running a proprietary clean-room Java virtual machine that was optimized for efficient use of space on embedded systems, rather than performance. It didn't have a lot of tooling to analyse its behavior. So we basically had a fixed release date and some memory problems, and we had to build tooling to help ourselves analyse what was going on in the Java virtual machine, which was a proprietary piece of software. So we couldn't look at the source code, we couldn't really understand a lot of its behavior, but we could get heap dumps out of it. So we could dump the heap out, but in a non-standard format, but that was very easy to parse, and heaps, with objects relating to objects, have a natural representation as a graph. 
RVB: As a graph. 
NP: So I immediately thought, "Oh right. Well, I know Neo4j, I've played around with it." I downloaded the latest version, I parsed the data, just used the batch insert Java API and blasted the data into the Neo4j database on developer workstations. So that when we were working with the boxes, we could dump the data out and then query it with Cypher to understand what was going on inside the memory of these machines as they were running. 
RVB: Super cool. I've seen the talk and I've read the material that you published on it. I think it's a fantastic use [chuckles] because then actually there's quite a few people that are doing things like software dependency analysis on Neo4j as well. It's an interesting field I think. 
NP: Absolutely. I think a lot of aspects of software are naturally modeled as graphs, aspects of programming languages, of dependencies, control flow. Graph theory is a really good fit for many software-- different parts of different aspects of software development and understanding software development, understanding the contents of your version control history and all of this. It naturally falls out into graph analytics. In this project, we were discovering all sorts of unusual things about our own software that we didn't know how it behaved, about the JVM that we were using, about the Java compiler and how it was behaving. We were discovering things that we literally only found one page on the internet that was explaining what we were discovering in our heap. 
NP: We got some really good results out of it. We were able to optimize the memory on these boxes and get the release out and it was a big success. But also we were able to-- because we were working with a tool called ProGuard which is also used for Android, and we were finding that the way that Java 5 and above was being compiled into Java bytecode was quite wasteful of memory. So we got in touch with the guy who writes the ProGuard tool and Sky funded him to write new optimizations into the tool, to optimize the memory for these particular cases we were discovering. That ended up being rolled out then released as open source so then everyone benefited so that was a good result.
RVB: Super cool. Yeah, absolutely. I've also read some of the more important work that you've done on word puzzles.
NP: Yeah [laughter]. 
RVB: I thought that was very funny and interesting as well. 
NP: Yeah. Again, that was an experiment with graph modeling, something that doesn't initially look like a graph, but actually it is, if you can work out how to model things as a graph. So the puzzles you're alluding to I think are called Word Ladders.
RVB: Yeah. 
NP: They were invented by C.S. Lewis I think (note from RVB: it seems like they were invented by Lewis Carroll, author of Alice in Wonderland), and you need to go from one four-letter word to another four-letter word in a number of steps where you only change one letter at each step, but you always have to change it into another real word. So you can model that as a graph where each step is a link from one word to another. Then solving puzzles is just a matter of finding a path through the graph, step by step. 
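For readers who want to try this at home, here is a minimal Python sketch of the model Nat describes: words are nodes, two words are linked when they differ in exactly one letter, and a puzzle is solved by finding a path between two words. It is not Nat's code and it does not use Neo4j. The tiny word list and the function names `differ_by_one`, `build_graph` and `solve_ladder` are assumptions made up for the illustration, and the search is a plain breadth-first search.

```python
from collections import deque

# Hypothetical mini dictionary, just enough for one puzzle
WORDS = ["cold", "cord", "card", "ward", "warm", "word", "worm"]


def differ_by_one(a, b):
    """True if the two words have equal length and differ in exactly one letter."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def build_graph(words):
    """Link every pair of words that are one letter apart."""
    graph = {word: [] for word in words}
    for a in words:
        for b in words:
            if differ_by_one(a, b):
                graph[a].append(b)
    return graph


def solve_ladder(words, start, goal):
    """Breadth-first search for a shortest ladder; returns None if there is none."""
    graph = build_graph(words)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


print(solve_ladder(WORDS, "cold", "warm"))
```

Because the search is breadth-first, the first ladder it finds is also one with the fewest steps.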
RVB: Super cool, super cool. So maybe we can talk a little bit more about the future, Nat? What do you think graph databases will look like in the next couple of years, and what will they be used for or that kind of stuff? What are your thoughts on that? 
NP: I think thinking about how I use them, and I know that I can see that there's a push to have larger clusters and a lot more processing, but actually the kind of things I'm using them for which is ad hoc analytics, what I'd like to see is something that allows me to more fluidly move between exploring the graph with Cypher Queries, and then visualizing it and then visually selecting part of that visualization, and then using that as the starting point for another set of Cypher Queries to find more data. So the browser in Neo4j is great but limited because it is just like the basic access to the database. 
NP: I'd love to see much more interactivity and moving between querying and visualizing and exploring, for the kind of things that I do. Often I'm looking at graphs and I don't really know what I'm looking for, and so I'm exploring around them to try and discover interesting patterns. That's definitely the way we were working on those heap dumps at Sky was we didn't know what our problems were. We would try queries, discover things, use Cypher to summarize the information, and then dig deeper with some more exploration or visualizations. I'd love to see more of that kind of use case provided by tooling around graph databases. 
RVB: Is it mostly the visual aspect do you think, or is it more that interactive capability of intuitively going through the graph, or both maybe [chuckles]? 
NP: I think it was a mixture of both. There are some excellent visualization tools, and I'm thinking of Tom Sawyer and things like that, are incredible. But I'm a big fan of the Cypher Query language, I find it very powerful and very elegant. So what I was finding was I'd be doing a little bit of Cypher to find some information then I'd explore visually through the browser, and then I'd find some interesting new starting points that I wanted to then use as a starting point for more Cypher Queries. The current browser doesn't really make that easy, so I could see there could be some tooling around that mixture between writing and running queries and exploring interactively and visually. 
RVB: Yeah, absolutely. Well, there's a lot of activity on that front right now, both in the community and at Neo. There's a lot of work going into making the browser better but also there's some fantastic tools out there, both commercial and open source, that help you do that I think. So thank you for that perspective, I appreciate that. So Nat, I think we're going to wrap up here. We like to keep these podcasts short and I know it's getting late here in Belgium as well. I need my beauty sleep [chuckles] so thank you for coming online. It was a real pleasure talking to you and I hope to see you at one of the Neo events in the future. 
NP: Yeah, absolutely. Thanks for inviting me. 
RVB: Thank you. 
NP: Thank you. 
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik

Wednesday 10 June 2015

Dates as Numbers

This will be a short blogpost probably, but I am hoping it will be a good/useful one.

Something that I recently had to do was to store DATE information on nodes in my Neo4j database. I was loading the data from a CSV file, and in that CSV file the date information was formatted as
dd-mm-yyyy
In other words, as a string. Now that is not very helpful in any database, but in Neo4j particularly because we do not have a date-datatype primitive. The next best thing then, would be to store the date as a number - so I asked myself the question how I could load the data straight from the CSV file, transform the string into an integer, and then store the integer as a property. Not that difficult, you would think. Until I - with my limited skills and tiny reptile brain - gave it a try and immediately hit some issues. So what then - of course I called a friend, again!

So let's try this. I have put the data and the sample query on GitHub, so let's take a look.

If I run the following query:
 load csv with headers   
 from "https://gist.githubusercontent.com/rvanbruggen/39be473ef3d69ce9a316/raw/24e9d5470abbfbe60d7fedb2005117a5259090d9/source.csv" as csv    
 return *  

I can clearly see that the date is in the above format:

So then I would run a simple load script to read the CSV, transform the dates, and then create a node for every row in the CSV:

load csv with headers
from "https://gist.githubusercontent.com/rvanbruggen/39be473ef3d69ce9a316/raw/24e9d5470abbfbe60d7fedb2005117a5259090d9/source.csv" as csv
// store the date as the integer yyyymmdd: dd + mm*100 + yyyy*10000
create (p:Person {name: csv.Name,
  birthdate: toInt(substring(csv.Birthdate,0,2)) + toInt(substring(csv.Birthdate,3,2))*100 + toInt(substring(csv.Birthdate,6,4))*10000});

Look at what happens with the csv.Birthdate field. By using the "substring" string function (its arguments are the zero-based start position and the length), I am selecting the dd first, adding mm*100 to it, and then adding yyyy*10000 - so we get something like this:

yyyy0000+mm00+dd

which will give us a number equal to

yyyymmdd
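To make the arithmetic concrete, here is a minimal sketch that runs the same expression on a single, made-up date string, so no CSV file is needed. The value "29-06-2015" and the names birthdate and birthdateAsNumber are just illustrative assumptions, not data from the source file.

```cypher
// Worked example on one hypothetical dd-mm-yyyy string.
WITH "29-06-2015" AS birthdate
RETURN toInt(substring(birthdate, 0, 2))           // dd   -> 29
     + toInt(substring(birthdate, 3, 2)) * 100     // mm   -> 6 * 100     = 600
     + toInt(substring(birthdate, 6, 4)) * 10000   // yyyy -> 2015 * 10000 = 20150000
     AS birthdateAsNumber;
// 29 + 600 + 20150000 = 20150629
```

On newer Neo4j versions toInt has been replaced by toInteger, so the function name may need adjusting there.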

Isn't that sweet? As you can see below the 5 nodes were created:

And indeed, the properties were set as integers, not strings:

So there we are. A nice and simple way to import data and do some simple but important transformations on the go.
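
One reason the yyyymmdd layout is handy is that plain numeric comparison and sorting then follow calendar order. A minimal sketch, assuming the Person nodes and the birthdate property created by the load script above; the two bounds are arbitrary values picked for illustration:

```cypher
// Find everyone born in the 1980s, oldest first.
// 19800101 and 19900101 are arbitrary illustrative bounds in yyyymmdd form.
MATCH (p:Person)
WHERE p.birthdate >= 19800101 AND p.birthdate < 19900101
RETURN p.name, p.birthdate
ORDER BY p.birthdate;
```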

Hope this was useful.

Cheers

Rik

Tuesday 9 June 2015

Podcast Interview with Tom Zeppenfeldt, Ophileon

One of the topics in the graph database space that is truly dear to my heart is the visual aspect. Graphs seem to be - for some very deep and profound reason, is my guess - a very natural way for humans to interact with data. And what better way to do that than in a truly visual way?

So I still remember, 2+ years ago or something, this guy shows up at our Amsterdam meetup gathering and starts talking to me about the Neo4j browser interface - which was still in its infancy at the time. All of his questions made total sense, and I kind of wish I had had better answers for him at the time. But I am kind of happy that I didn't, because that's why Tom Zeppenfeldt put on his "working gloves" and got to developing a new tool that looks really promising: Ophileon's Prologram.

Tom has recorded some really nice videos online where he is showing some of the capabilities of Prologram:
So that's when I asked him to come on the podcast too - and here's the result:





Here's the transcript of our conversation:
RVB: Hello everyone. Here we are again, recording another episode for our Neo4j Graph Database podcast. My name is Rik Van Bruggen. I work for Neo Technology, and on the other side of this Skype call, all the way in the Netherlands, is Tom Zeppenfeldt. Hi Tom.
TZ: Hi Rik. How are you doing?
RVB: I’m doing very well. How about yourself?
TZ: Yes. I'm very fine.
RVB: [chuckles] That's great. Well, Tom, most people who are listening to this Podcast probably don't know you yet, so would you mind introducing yourself? Who are you, what do you do, and what's your relationship to the wonderful world of graph databases?
TZ: Okay. Well, my background is not in IT. By education I'm an agricultural engineer and the main field where I've worked over the last couple of years was in international development in Africa and Latin America and that's also where there is a link between-- the link with Neo4j.
RVB: Oh, no way. Tell me about that.
TZ: Yes, as you probably know, there are lots and lots of types of development projects, in agriculture, infrastructure, education and health for instance, and one of the typical things that you use in that context is what they call Result Chains: diagrams that link activities and organizations and results and impacts and effects to each other. There is also a concept that is called Actor Constellation Mapping. That is a way of diagramming all the interests of different stakeholders in a project and how people collaborate, how they form networks, and when I say networks I'm already talking graphs.
RVB: Yes, absolutely. Wow, that's great. How long ago did you first encounter Neo and how did that...?
TZ: That happened about four years ago, when we ran into a very concrete situation where we had to make these kinds of diagrams accessible over the Internet. So instead of drawing PowerPoints with graph-like structures in them, we were looking for a database platform that would allow us to share these things and make these things interactive over the Internet.
RVB: So, then that sort of created the desire to store them in a network way as well, I suppose.
TZ: Yes.
RVB: Yes. Okay. What are you doing now with Neo, Tom, because I know you're really doing some really fancy stuff, but tell us about that.
TZ: Yes. Well, currently our main project is a project that is partially privately funded and partially subsidized by the Dutch government, and which is focusing on researchers, like journalists, who do research or investigations on a specific subject. It partially includes Neo4j for storing relations between documents, but also documents and content from social networks and also all kinds of tagging and metadata. For that, we also have a cooperation with two universities in the Netherlands that help us to automatically classify and tag documents or to detect relationships inside documents.
RVB: Wow, that sounds really great.
TZ: That's the main project we are working on now, yes.
RVB: Wow. As I understand there's also a big visualization component to that project, right. There's a lot of stuff that you're doing on how to bring that information visually to those researchers?
TZ: Yes, indeed. What we develop in terms of software is, in fact, something you can consider an enhanced browser for the Neo4j database, because once we were starting this project, we were running into-- well, we wanted to add more than the standard Neo4j browser. That is why we now have a browser environment with multiple panels that you can link to each other: in one panel you can display the data as a network, in other panels you can display data as a table, and you can link them to each other. This product is called "Prologram" for now, and that is what a number of developers are working on actually, yes.
RVB: That's very cool.  I've seen it live when you presented it at the meetup and everything, but I've also seen some of the YouTube videos and I'll post those with the podcast, as well.
TZ: Okay, great.
RVB: Maybe, just looking at it in a little bit more detail, what do you think is the most powerful aspect of this?  In other words, why did you end up choosing a graph database for doing this type of a project?  Any comments on that?
TZ: Yes. Well, I come from … my background: I once had an IT company myself, and that was typically relational databases, and finally we ended up building a metadata layer to mimic graphs on top of it. In fact, that says it all, because the real world is far more complex than you can model, well, in tables. Specifically in the domains that I've worked in, be it agricultural development or now investigative journalism.
RVB: Yes.
TZ: If you have a very generic and basic structure like you have in Neo4j, where not just the nodes but also the relationships are really things in themselves - you call them first class citizens - and are very descriptive, it allows you, even when your data model is changing or expanding, and that happens very frequently, to avoid a complete overhaul of what you already have and still have an optimized structure. So, you can keep on building and adding stuff, and that's for us one of the main advantages of Neo4j. That is on top of, of course, the way that you can really do optimized and very localized searches in your graph database. I have the experience with building meta-structures on top of relational databases: then the number of joins is incredible, and finally it makes it unworkable.
RVB: Yes. I don't know if you've listened to any of the other podcast episodes, but some of the Neo4j founders have been on there as well, and that's how they started Neo4j. They started developing a meta-layer or graph layer on top of-- it was Postgres at the time, I believe. I think you're coming to the right conclusion there [chuckles].
TZ: When the...
RVB: Go for it. You wanted to add something?
TZ: Well, for us it was quite easy to pick up the concepts, to understand the concepts, and that's also, I think, why we are now making nice progress with what we make as a generic browser on top of Neo4j: we already came from a graph mindset that knew the advantages of a graph database.
RVB: So what is the future of all this? Tom, where do you think this industry is going, where should Neo4j go? Where is the Prologram going?
TZ: Well, that's the-- You said it's ten minutes, this podcast. [laughter] [crosstalk]
RVB: Well, let's limit--
TZ: Okay, well, I think graph databases are here to stay, it's not hype. Especially when relations between contents become more and more important than perhaps the properties of content… For instance, we are now working on recommendation engines and while, in the beginning, people got recommendations on the basis of properties or links inside the content itself, so very explicitly - now we are already doing tests with integrating the social aspects into recommendations. Then you finally end up concluding that the suggestions that you can make on the basis of how people use the content, and that is typically something that you can easily store in the graph, are better than the suggestions that you can make on the basis of the explicit tagging of the content.
TZ: As the world gets more and more connected, whether it's the Internet of Things or documents that are shared, the graph model is a very nice way to describe it - you get more exact models than with a relational database. I think the position of graph databases in the world is an established one. What we are now doing with the Prologram platform: we have been building it with, let's say, about three to four persons over the last 14 to 15 months. We now have a version that is -- on one end it's good enough, but we are; say in the next two to three months we will probably start sharing it with the world because we want to know how people interact with it, and also I'm sure you cannot keep on developing without user feedback. We already have quite some users, and in terms of functionality there will be additions like, well, integration with Elasticsearch, a complete function and trigger system that allows you to change theories, virtualization of nodes that allows you to make aggregations - to visualize aggregation of nodes and relationships. So, that's where we are more or less going for the next quarter. That's more or less the road map that we have.
RVB: You discover it as you go. Right?
TZ: Yes.
RVB: It sounds really exciting. We'll wrap it up at that. Tom, it was really great talking to you. Thank you so much for coming online, and apologies for the technical hiccups, which were entirely my fault. But thank you for coming online. I'm sure we'll see each other again at one of the next meet-ups.
TZ: Okay. Thanks for the opportunity and keep up the good work.
RVB: Thank you. Cheers, bye.
TZ: Okay. Bye.
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik

Friday 5 June 2015

Podcast Interview with Tobias Lindaaker, Neo Technology

Next month, I will be celebrating my 3-year anniversary working for one of the best companies I have ever worked for: Neo Technology. It seems like I have been here a lot longer - so much has happened in those three years! I guess time truly flies when you are having fun :) ... but it makes me wonder sometimes - what was it like in the early, early beginnings? What is it like for people that have been here a LOT longer to look back on the Neo4j journey?

So I decided to ask. Here's a lovely conversation with Tobias Lindaaker, employee number 1 of Neo Technology. Overall nice guy and top engineer - he's got a lot of ideas and perspectives on that journey:

Here's the transcript of our conversation:
RVB: Hello everyone, this is Rik - Rik van Bruggen - from Neo Technology, and here we are again, recording an episode for our Neo4j Graph Database podcast. It's a wonderful Wednesday afternoon, and together with me on the call here - Skype call - is Tobias from Neo in Malmo. Hi Tobias.
TL: Hi Rik.
RVB: Hi, welcome on the podcast.
TL: Thank you, thank you.
RVB: Tobias, I invited you to the podcast for a couple of reasons, but maybe you could start by introducing yourself, so people know who you are.
TL: Yeah, so I am Tobias Lindaaker. I work as a senior developer at Neo Technology, working on pretty much all things in the development of Neo4j. I've had my hand in almost all the feature stuff we've released.
RVB: That's pretty amazing, and that's probably a good lead into why I sort of wanted to talk to you, because you're probably one of the first engineers that Neo hired, aren't you?
TL: I was the first engineer that the company actually hired.
RVB: When was that?
TL: That was in 2007. So September 2007, I got on the company to help our very first customers with using Neo4j.
RVB: Wow, and how did you get into it? Did you know some of the founders then?
TL: Yeah, me and Emil went to college together. I taught him calculus, and he thought that I was good enough with that, that he might as well give me a shot at doing this job for him.
RVB: [chuckles] Any gory details that you can share with us [chuckle]? I'll always appreciate it.
TL: On Emil--
RVB: Yeah, of course.
TL: —from the early days?
RVB: No, don't go there. Don't go there.
TL: So, I did not only teach him calculus, he taught me a great deal about programming, because he actually wrote code back in the days, in particular about testing, because he was really big on testing. He spent more of his time writing tests for his code than writing the actual code. He was always last handing in his lab assignments, because he spent so much time developing his tests, and his testing framework, and all of that stuff. He didn't actually get the job done, he just played around it.
RVB: Do you still remember what was the first feature or project that you were working on at the time?
TL: With Neo4j?
RVB: Yeah.
TL: The first project was a customer project. I was working on a geospatial system. We were doing road user charging, so essentially road tolls based on GSM-based positioning. We had an algorithm, or an idea for an algorithm, for how to use GSM triangulation, by knowing the position of the cell towers. We had two modes for this system. One where we would collect data and store it out in Neo4j, to match what the signal profile from different cell towers looked like when driving on particular roads. We would find the profiles for roads, and then we'd use that when a user was driving in the same area, to match the signal profile from the cell towers with the road that he was driving on. So, that was the first part of what I did for Neo4j.
RVB: Super cool. So, how's it been? What's it like to work for a start-up that's been going through all this evolution in the past eight, nine years?
TL: The main thing for me is the fact that it's a start-up, or it has been a start-up. Finally this year, I'm starting to see signs that we're growing up and becoming a mature company, but there have been lots of ups and downs getting there. There was a failure to get investments in 2008, where the company nearly went bankrupt, and we were almost out of jobs - all of us. Then of course, there have been frustrations with things that happen when you onboard more people, and you get less and less influence over the company because the company grows and becomes bigger. Pretty much when we started, everyone was on the same level, because it was just a bunch of guys working on the same thing together. Then, as we've grown, I've stayed on the “working on things” level, and the people who were with me at the time have gone on to be CEO and CTO and such things, so there has been a transition; the relative influence, at least in formal terms, has diverged over time. It's interesting that I get less and less time with my old friends at the company.
RVB: Yeah, but I suppose there's new friends coming on board right? There's new things happening--
TL: Absolutely.
RVB: —and there's new exciting stuff happening with the products as well.
TL: Absolutely, and in terms of the product, it's the main driving feature why I love this company. I think the product really has a lot going for it.
RVB: I couldn't agree more. What do you think the future holds, Tobias, both for you from a personal perspective, and for the company and your job there? What do you think is in store?
TL: As I said, the company is growing up now, and I think we will start seeing effects of that in the product pretty soon. Since the team is growing, we've got more developers now than we've ever had before. Last year we started hiring developers for real, started actually growing a development team, and this year those developers are up to speed with what we're doing, so we can continue hiring even more. What we're starting to see now is the ability to work on a lot more features at the same time, and even start taking risks in what features we're developing, and invest in sort of high risk-high reward type of projects, where we aren't really sure if they will pay off, but we've got enough people that we can spend a small team on actually trying it out. That's really exciting both in terms of getting to work on those things, but also in terms of the potential that those features can deliver.
RVB: Maybe I can finish off with one question. What is your favorite feature of Neo4j?
TL: My favorite feature?
RVB: Surprised you there, sorry [chuckles].
TL: The … so there’s… I can either go very general and say that I really like the model of the Graph Database, because that's always been the main driving thing behind why I wanted to work on Neo4j, because I really like that model, but I'm not sure that really qualifies as a feature.
RVB: No, it isn't, right.
TL: So, in terms of something smaller, I'd say the query language. Is that good enough to qualify as a feature? It's a recent thing in terms of if you compare to how long I've been with the company. The query language hasn't been with the product for as long as I have, but it's really nice to see the expressivity and usefulness of it.
RVB: Yeah, I know. I couldn't agree more. Cool. Tobias, thank you so much for coming on the podcast. You know that we want to keep these things nice and short, so I appreciate it. It's been really cool having you here, and I'm going to be talking to a lot of other colleagues, and people that have been working on Neo for a lot less time than you have. So, it'll be interesting to compare notes in the next couple of episodes.
TL: Yeah, I'm looking forward to hearing that.
RVB: Absolutely. Thank you so much, Tobias, and I'll talk to you soon.
TL: Thank you.
RVB: Bye.
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik

Monday 1 June 2015

Podcast Interview with Aseem Kishore, 53

Had a super chat with one of the long-standing Neo4j users, at our friends at fiftythree.com. Aseem Kishore has been a real advocate for Neo4j for a few years now, and his story is really cool. Here's a first presentation from 2013 that details the early parts of their story:

He actually did another talk at GraphConnect 2014 as well, but for some reason that one is not embeddable: watch it here.

So time for a Podcast interview:

Here's the transcript of our conversation:
RVB: Hello everyone. My name is Rik, Rik Van Bruggen from Neo Technology, and here we are again recording an episode for our Neo4j graph database podcast. Tonight's session is being recorded all the way across the Atlantic, and I've got Aseem here, Aseem from FiftyThree. Hi, Aseem.
AK: Hi, Rik. How you doing? 
RVB: I'm doing really well, and you? 
AK: I'm doing great. 
RVB: That's fantastic. Thanks for making the time, and again, apologies for me messing up the schedule twice [chuckles]. 
AK: No problem. 
RVB: So thanks for coming online. Aseem, would you mind introducing yourself to our audience here? 
AK: I'm Aseem Kishore. I'm an engineer here at a start up in New York called FiftyThree. We make an iPad app called Paper, and a hardware stylus called Pencil, and a collaboration service called Mix. 
RVB: I've heard so much about it. I actually saw a pop-up store of FiftyThree in the airport in London last week. You guys are expanding very quickly, aren't you? 
AK: Yeah, hoping to, trying to. 
RVB: How many people are at FiftyThree? Can you tell us about that? 
AK: We actually crossed the 53 person [chuckles] mark maybe a couple of months ago, so we're approaching 60 now, I think. 
RVB: Fantastic, and you're based in East Coast, U.S. right? 
AK: We're actually split between New York and Seattle, so most of our software's done in New York and most of our hardware's done in Seattle.
RVB: What's your relationship to Neo4j Aseem? Can you tell us about that? 
AK: We use Neo4j as our primary database for Mix, which is our collaboration service. And in general, from the start of the company, that's been the database that we've been building our back end services on. 
RVB: So it's real time applications that you build on it. What applications are those? 
AK: The biggest one is definitely Mix, which is-- basically you can think of it like a social network for ideas. So the sketches that people are creating in Paper, they can post them to Mix, and then the key feature that we have is that you can take another person's sketch or idea and expand on it, so we call that re-mixing. We've built up this-- not just the social graph but really an idea graph or creativity graph. So you can see the evolution of ideas from just-- for example, if you're a UI designer you can see the evolution from just a simple wireframe all the way to all sorts of explorations and directions to take that wireframe in.
RVB: So it's really a complete solution, starting at the pen, to the app, to the social platform. 
AK: Exactly yeah. 
RVB: Very cool. How long have you been using Neo4j for that then? 
AK: Mix has been in production for half a year. We were using Neo4j just for things like accounts, and things like that, before we had Mix, for maybe a year before that, and then developing with it for another year before that.
RVB: So, why graphs? How did you come to Neo? How did you come to graph databases and why did you pick it? Could you expand on that? 
AK: That's a good question, why graphs? The easiest answer is that we always had Mix in our sights, and with the idea of remixing, ideas can really go anywhere. They can go in all sorts of directions. Some of the examples we see on our own site: someone might publish a very useful template, let's say an agenda for the day. That will get a lot of direct remixes, so the shape of that tree will be very broad. In other cases we see a lot of back-and-forth collaboration happening. For example, a couple of architects working together might start with a simple sketch of a new house and then they keep on adding different elements and tweaking things here and there, and so that can become a very long chain. And then you see all sorts of things in between, so the whole notion that ideas can go anywhere suggested to us that we needed to have an approach that was flexible to that.
RVB: And then that’s the graph model, basically that delivers that isn’t it? It’s our-- 
AK: Exactly. 
RVB: Fantastic. So what has the experience been like so far? Has it been delivering on its promise? 
AK: In terms of the flexibility, it absolutely has. A good example is early on, when we were still in a private beta for Mix, we were trying to decide how exactly we wanted to convey this idea of remixing to users. Initially we started out similar to, let's say, YouTube video comments, where when you are looking at a particular idea you can see the remixes of that idea directly underneath it, then you click on one of those and you can see its remixes, and so on. But then we realized that this is really too many clicks; a lot of times you really want to get the whole story in one glance, and so we were able to very easily change our UI to show the entire tree of remixes on one page, and that was a very simple query change. We didn't have to change the way we store the data or denormalize it, or anything like that, so in terms of that flexibility it absolutely has. One of the things we've been growing with is just running Neo4j in production at scale, as Mix is growing very quickly. We hit the million user mark - I forget the exact number off the top of my head but I think it was within the first couple of months, which is pretty fast for a service - and we're continuing to see some pretty rapid growth. So we're just now trying to learn how we can scale our cluster the most effectively.
RVB: Have you been working with someone on more recent versions of Neo then as well, or is that all based on older versions or ... How do you guys typically work that? 
AK: Good question, we try to stay pretty recent. We launched with Neo4j 2.1 and that's what we've been on. We're now just starting to experiment with 2.2.
RVB: As you probably know there's a lot of attention going in that direction at Neo, so I'm hoping that we're going to stay aligned over the next couple of months and years, it's going to be great help. So what does the future hold Aseem? Where are you guys going? Where are graph databases going at FiftyThree? What do you think is in store for us? 
AK: I guess for where FiftyThree is going, there's obviously a lot I can't talk about, but in general what we see as our goal is to enable all sorts of people to be creative. So not just people like designers and architects and artists, but software developers, business people, lawyers, financial advisors, investors. All these people that-- they don't think of their everyday job as being creative, but it turns out that in all of these professions, there's a part of the process where creative thinking and being able to communicate visually and get your ideas down on paper is critical. And so these are the kind of tools that we're building, and we have some pretty big things planned for this year, and Neo4j is going to actually play a significant part in some of these. We're going to continue to push the envelope on what we're able to get out of our graph database, perhaps even more real time and perhaps even much more write throughput, for example, much more scale. As far as where graph databases in general are going, I don't know that I can personally say, but what we've been really happy with is that the domain is so flexible that if we can make it work for us, we actually prefer to. So, one thing we've been seeing at other companies is that Neo4j is often used as a secondary database. For example, having talked to Medium, most of their core data is actually stored in Dynamo, but then they use Neo4j to store the specific relationships like following, and recommendations. And that approach makes sense, but for us, it's so convenient to have all of the data in Neo4j because now we can use it in queries, our transactions are uniform, things like that. So we're going to continue to discover where exactly we can't use graph databases, and we're hoping that's not many places.
RVB: I sure hope that, you know, because graphs are so whiteboard-friendly, they would also be very Paper-friendly, right? That would be great.
AK: Exactly, yeah [chuckles]. 
RVB: The visual aspect is definitely something that, you know, is a very, very nice match with your thinking. Well, Aseem--
AK: The fun story is we just launched a new feature in Paper, a new set of tools we call Think Kit, which are all about diagramming and whiteboarding. And fun story: in all of the hand-tuning of heuristics for shape recognition, all that stuff, one of the use cases was definitely drawing graphs, and a couple of my--
RVB: No way. 
AK: [crosstalk] were used in our neutral testing which is funny. 
RVB: I saw one of the guys, Michael Hunger was already using it, so [chuckles] that was very cool. Thank you so much, Aseem for coming online and doing this recording with me. I really appreciate it. 
AK: No problem, thank you Rik. 
RVB: We want to keep these things quite short, so I'm going to wrap up here, but not without thanking you again and hoping that we'll see each other someday at some GraphConnect Conference or something. 
AK: Great.
RVB: That would be lovely. Thank you, Aseem. 
AK: Thank you, Rik. Take care. 
RVB: Bye.
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!

All the best

Rik