Showing posts with label conference. Show all posts
Wednesday, 26 April 2017
GraphConnect Europe 2017 schedule graph
So of course, we had to pull out the old trick (started at Oredev 2014 actually - so quite some time ago!) of creating a "Conference Schedule Graph" for everyone to explore.
Thursday, 8 September 2016
Podcast Interview with Dustin Cote, Luther College
I have said it before, but I am going to say it again: thanks to the wonderful Bryce Merkl Sasaki at Neo4j, I have had the pleasure of talking to sooooooo many wonderful guests on the podcast. People that I would otherwise probably never have talked to, for all kinds of reasons. Another one of these folks is my guest on the podcast today - Dustin Cote. An active graph enthusiast in the heartland of the Midwest - and a great, knowledgeable database engineer at that, whom I really enjoyed talking to. Here's our conversation:
Here's the transcript of our conversation:
RVB: 00:02.617 Hello everyone. My name is Rik Van Bruggen from Neo Technology and here I am recording another podcast session for the Neo4j Graphistania podcast. Tonight, I'm joined by someone from the beautiful Midwest in the USA. I've got Dustin Cote from Decorah, Iowa on the Skype call. Hi Dustin. How are you?
DC: 00:25.316 Hi Rik. I'm doing great. How are you?
RVB: 00:27.268 Very, very good. Thank you for joining me. It's always great to have people from different parts of the world on this podcast. I've read some of your work in the communities and in the GraphGists, Dustin, but most of our listeners probably haven't yet. Do you mind introducing yourself and telling us who you are, what you do, and what's your relationship with the wonderful world of graphs?
DC: 00:53.085 All right. My name is Dustin Cote. I currently live in the northeast corner of Iowa, in a small rural town by choice. I work for Luther College. It's a small, private college, and I just started there last year. Prior to that, I was working at the University of Wisconsin-Madison, where I worked for seven years as a PeopleSoft programmer analyst with an emphasis in database design. Before that, I was a data warehouse administrator, and before that, I was doing stuff with databases, and before that, I was learning to program in seventh grade. I've been around for a while - well seasoned, as they say. That's what I'm working on, mainly because of my experience with ERP systems. Seems like no one wants to work on old technology, so there's always a niche for that.
RVB: 01:45.121 The world would stop turning without old technology, I think.
DC: 01:49.133 It would, believe it or not.
RVB: 01:50.191 It would, I absolutely do. Dustin, what's your relationship with Neo4j and graph databases? How did you get into them?
DC: 01:59.511 Very recently, there was a competition for GraphGists at Neo4j and I thought, "What a great way to finally finish one of my project ideas." It had a deadline, and basically that was the only thing I needed to finish the project. I put together a conference data model, because I was going to a conference and I wanted to know different things from the booklet they give you, but because it's in a certain order, you can't find out certain things. I knew that a graph database would be the perfect way to query from different angles and different ways of looking at your data. Before that, I would say back in 2008, when one of our companies moved and I was worried about competing against 25 Java developers unleashed onto the town for jobs, I went back to school. Even though it turned out not to be an issue being reemployed, I continued going to school and got my masters. It was then that I was doing a research paper, and one of the papers I was doing was to debunk over-hyped technologies. One of them was the Semantic Web.
DC: 03:03.407 I had heard of Web 3.0 and thought to myself that it had been over-hyped. I had heard about it for years and I hadn't seen anything come of it. So while researching it, I realized that this was exactly what I had been looking for to solve so many of my own database problems. For example, I had some projects where I was saving just sparse data. So, for instance, when you save music, MP3s or songs, it's usually saved as artist, album and song name. Well, if you talk to classical music enthusiasts, they care about orchestras and maybe the conductor, or the original composer. They look at different things, but it's almost the same item, and you still have to store it. You know, by the time I was done designing all the different kinds of genres, there was just no way for a relational database to handle it well. And when I looked into the Semantic Web, I realized that this was the solution. And back then, there was the NOSQL movement. I would say the Semantic Web - since it saves as tuples or triples - I believe it's based on graph databases, a little more formal perhaps. And so all of the uses I found for the Semantic Web I also find useful for Neo4j. And at the time, when I was choosing which graph database to use, Neo4j was the one that I had heard the most about. It had the best reviews at the time. And so I decided to spend the little time I had, with raising two kids, to learn a new technology. And that's why I picked Neo4j.
RVB: 04:40.390 Interesting, yeah. It's very much a related technology, and we could talk for longer than we have on this podcast about the differences between Semantic Web databases and Neo4j, because they are quite different. But [chuckles], we'll take that off-line. They are different but related technologies, I would say. What did you like about graph databases then? What problems were they solving for you? Why was it such a good fit for some of the things that you had found problematic in other databases?
DC: 05:18.521 One of the things I've always liked is how you can make non-obvious connections. You might have two different graphs that are unconnected at the moment, but some time in the future, if you're still collecting data, you might [inaudible] two nodes together, and then suddenly your same query would return different results and, in fact, maybe solve a problem. I think one of the examples I saw long ago - besides the NASA and CIA kind of examples - would be authors writing in different languages, who might have a different name in one language than another. You might be following one of them and then suddenly make the connection and realize that they have this whole other body of work. So I like those kinds of solutions, where the graph database is continuously growing and making connections for you.
RVB: 06:08.085 Yes, like inferring new paths between different parts of the graph. Is that what I'm hearing?
DC: 06:15.632 Well-spoken. Yeah, exactly [laughter].
RVB: 06:19.269 Yeah, well. I mean that is a-- that's one of the examples that I always give. It's like the path finding. I've got these two things, and how are they connected to each other? Show me those connections. Whether you're talking about a [BR?] data set or a social network or a recommendation engine, that's one of the most powerful use cases - those hidden connections, as you call them.
DC: 06:41.392 Right, and with database design, when you're working in a large company, you have to spend so much time, ahead of time, designing things. And then once it's approved, and then once it's implemented, many months or years can go by, and once it's done, it's pretty much a nail in the coffin - you really don't want to change it again, because you don't know what you might break. With graph databases, sometimes you can add a whole new set of features or properties, and it won't affect the past data you had. It'll actually just enrich the data you have, and your new applications can leverage it. That's very powerful.
RVB: 07:16.441 That's such a powerful point that you're making there, and I think it's also why it's like a perfect storm now for graph databases, because of the whole agile development paradigm as well. You know, people don't develop waterfall systems anymore. They try to take a much leaner approach to software development. You don't know what you don't know when you're developing a new system. It's a great fit for that, I think, as well. Dustin, where do you think this is going? Where do you, personally, want to take this? I'll put some of the links to your work on the blog post with this podcast recording, but where do you see it going? What do you want to do with it? Where do you see the industry taking it?
DC: 08:04.811 Where I personally want to take it is, perhaps, more projects that can help categorize interests. [unclear] I might be interested in homesteading or something. There are podcasts out there that are in the 2,000-episode range, and it would be great to be able to categorize those and find exactly what you're looking for, maybe even by the author or the interviewee. Some day, maybe you'll have 2,000 interviews here.
RVB: 08:33.387 Oh my God [chuckles].
DC: 08:35.375 You never know how you can tie all those people together. People bring such great resources from their work. You know, [hack the plans?], it's more like categorize the plan. I think graph databases are good for that. As far as the world's future, I think this is a tool that's generic enough and powerful enough that someone out there who doesn't even know anything about this yet is going to come up with an idea, use your project and platform, and come up with something new. It's just that kind of technology that you're creating, and it's really going to be anyone's imagination, really - something that we haven't even thought of yet.
RVB: 09:18.040 Like a platform for innovation, basically? Like doing--?
DC: 09:20.356 Absolutely.
RVB: 09:22.665 Super relevant. I really like that perspective. I really do. I think, for example, in the past couple of months, we've seen it at Neo4j when some of the Panama Papers research was published and stuff like that. That was fantastic validation for us. That would never have happened ten years ago, and now it's happening because of the small contribution that we're making, I guess.
DC: 09:49.082 Absolutely. I like the quick ramp-up time to test out Neo4j with that web interface you have, where you can start importing right away. It's a great tool.
RVB: 09:57.583 Very cool. Very nice. Thank you, Dustin, for spending your time with me on this recording. As you know, I like to keep these things short and snappy and digestible for everyone. We'll put some of the links to your work with the transcription, but for now, I want to thank you very much for coming online. I hope to meet you in person, face-to-face, at some point [chuckles].
DC: 10:21.682 It's my pleasure and keep up the good work.
RVB: 10:23.838 Thank you so much. Bye.
DC: 10:25.481 Bye.
Subscribing to the podcast is easy: just add the rss feed or add us in iTunes! Hope you'll enjoy it!
All the best
Rik
Wednesday, 13 April 2016
GraphConnect Europe 2016 - of course we have a Schedule-graph
It's that time of the year again - GraphConnect Europe 2016 really is around the corner now. We have so much to look forward to - great users, community, customers, developers, engineers, speakers, founders - everyone with a graphista-bone in their body is flocking to London on April 25th-26th. So of course, I had to make some kind of little contribution on this blog - and traditionally (like I have done for many other conferences in different blogposts), that means an improvement on this "ugly" table:

That is: I really want to look at this data as a GRAPH - not a table. Of course, you say...
The GraphConnect Schedule Spreadsheet
In order to do that, we create a nice little spreadsheet version of the schedule first. Simple, in a Google Sheet: here it is. It really is pretty simple:
Monday, 7 March 2016
Qcon 2016 obviously has a SCHEDULE GRAPH
Today is the start of an ANNIVERSARY edition of a great developer conference in London that most of us at Neo4j really like: QCON London is a yearly gathering of all kinds of software professionals in London, and is usually a great place to meet and learn lots of cool new things. It also marks the anniversary of our Graphistania podcast (which we started last year at the conference!), so I can honestly say that I am really looking forward to it.
Just like last year of course, I also wanted to share a "Schedule graph" of the conference schedule. As you can tell from the tabular webpage, we would need to do some conversions here and there to make it work:
But of course, we made it work. You can find all of the queries on github, of course.
Loading the QCON Schedule Graph
In order to get the data loaded, I first put everything into a nice little google sheet, with some furious copying and pasting:
It worked really well, and you can view the data over here now.
Then of course, I had to write a couple of queries to load the data. On github you will find two versions of these "load queries", and I am including the concise, all-in-one-go version below:
// load the track hosts tab of the Google Sheet
load csv with headers from "https://docs.google.com/spreadsheets/d/1aFhc5zcxCEEPnS0FWmlcltcaYX6wCqhhbtsoZiRGu3A/export?format=csv&id=1aFhc5zcxCEEPnS0FWmlcltcaYX6wCqhhbtsoZiRGu3A&gid=1550249063" as hosts
merge (p:Person {name: hosts.Name, title: hosts.Title})
merge (c:Company {name: hosts.Company})
merge (p)-[:WORKS_FOR]->(c)
with hosts
// load the main schedule tab
load csv with headers from "https://docs.google.com/spreadsheets/d/1aFhc5zcxCEEPnS0FWmlcltcaYX6wCqhhbtsoZiRGu3A/export?format=csv&id=1aFhc5zcxCEEPnS0FWmlcltcaYX6wCqhhbtsoZiRGu3A&gid=0" as csv
// create the days, and connect consecutive days
merge (d:Day {date: toInt(csv.day)})
with csv
match (d:Day), (d2:Day)
where d.date = d2.date-1
merge (d)-[:PRECEDES]->(d2)
with csv
// create the rooms, tracks, speakers and their companies
merge (r:Room {name: csv.room})
merge (t:Track {name: csv.track})
merge (p:Person {name: csv.speaker, title: csv.title})
merge (c:Company {name: csv.company})
merge (p)-[:WORKS_FOR]->(c)
with csv
// attach the start- and end-times to their day
match (d:Day {date: toInt(csv.day)})
merge (t1:Time {time: toInt(csv.starttime)})-[:PART_OF]->(d)
merge (t2:Time {time: toInt(csv.endtime)})-[:PART_OF]->(d)
with csv
// finally, create the sessions and hook everything together
match (t2:Time {time: toInt(csv.endtime)})-[:PART_OF]->(d:Day {date: toInt(csv.day)})<-[:PART_OF]-(t1:Time {time: toInt(csv.starttime)}), (r:Room {name: csv.room}), (t:Track {name: csv.track}), (p:Person {name: csv.speaker, title: csv.title}), (h:Person {name: csv.host})
merge (s:Session {title: csv.talk})
merge (s)<-[:SPEAKS_IN]-(p)
merge (s)-[:IN_ROOM]->(r)
merge (s)-[:STARTS_AT]->(t1)
merge (s)-[:ENDS_AT]->(t2)
merge (s)-[:IN_TRACK]->(t)
merge (h)-[:HOSTS]->(t);
Once I ran that query, and added the meta-graph, I had the following model lined up:
Exploring the QCON Schedule Graph
Let's take a look at some queries here. Obviously it is not a very big graph, but now we can do some exploring.
//Look at day 1
match (d:Day {date:20160307})<--(t:Time)<--(s:Session)--(connections)
return d,t,s,connections
limit 50
gives us the following graph.
Then we can also start looking at some connections between people, and explore the surrounding area of the graph:
//Look at two people
match (p1:Person), (p2:Person),
path = allshortestpaths( (p1)-[*]-(p2) )
where p1.name contains "Hunger"
and p2.name contains "Webber"
return path
Then we start seeing some interesting links:
And then we can of course also explore similar links between people and companies:
//Look at a person and a company
match (c:Company {name:"ThoughtWorks"}), (p:Person {name:"Jim Webber"}),
path = allshortestpaths( (c)-[*]-(p) )
return path
That gets us the "conference"-style links between Jim Webber and his former and current employer:
And then finally, a query that I always tend to run for the heck of it - the conference sessions that have more than one speaker:
//Look at sessions with more than one speaker
match (s:Session)-[r:SPEAKS_IN]-(p:Person)
with s, collect(p) as person, count(p) as count
where count > 1
return s,person
Looks like this - we can obviously explore this a lot further.
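To push the exploration a little further, here's one more query over the same model - a quick sketch, just assuming the labels and relationship types from the load script above - that lists the busiest tracks:
//Count the number of sessions per track
match (s:Session)-[:IN_TRACK]->(t:Track)
return t.name as track, count(s) as sessions
order by sessions desc
A simple aggregation like this is a nice complement to the path-finding queries: same graph, completely different angle.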
Cool. Again - this is just an example of the type of nice little graph explorations that you can easily set up on your local machines. If you have any questions or want to explore this further, then please don't hesitate to drop by the Neo4j booth!
Have fun at the conference!
Cheers
Rik
Sunday, 19 April 2015
The making of The GraphConnect Graph
Next month is GraphConnect London, our industry's yearly High Mass of Graphiness. It's going to be a wonderful event, and of course I wanted to chip in to promote it in any way that I can. So I did the same thing that I did for Øredev and Qcon before: importing the schedule into Neo4j.
I have actually already published a GraphGist about this. But this post is more about the making of that database - just because I - AGAIN - learnt something interesting while doing it.
The Source Data
My dear Marketing colleague Claudia gave me a source spreadsheet with the schedule. But of course that was a bit too... Marketing-y. I cleaned it up into a very simple sheet that allowed me to generate a very simple CSV file:
I have shared the CSV file on Github. Nothing really special about it. But let me quickly explain what I did with it.
Choosing a model
Before importing data, you need to think a bit about the model you want to import into. I chose this model:
The right hand part is probably pretty easy to understand. But of course I had to do something special with the days and the timeslots.
- The days are part of the conference, and they are connected:

- And the timeslots within a day are also connected:

So how to import into that from that simple CSV file. Let's explore.
The LOAD CSV scripts
You can find the full load script - which actually loads from the dataset mentioned above - on github too. It's pretty straightforward: most commands just read a specific column from the csv file and do MERGEs into the graph. Like for example:
load csv with headers from "https://gist.githubusercontent.com/rvanbruggen/ff44b7dc37bb4534df2e/raw/aed34a149f04798e351f508a18492237fcccfb62/schedule.csv" as csv
merge (v:Venue {name: csv.Venue})
merge (r:Room {name: csv.Room})
merge (r)-[:LOCATED_IN]->(v)
merge (d:Day {date: toInt(csv.Date)})
merge (tr:Track {name: csv.Track});
Nice and easy. There's a couple of commands that are a bit more special, as they have to check for NULLs before you can do a MERGE. But nothing really complicated. There are two sets of import commands - one for each day - that are a bit more interesting: how do you import the timeslots and create a structure like the one above, where all timeslots are nicely ordered and connected in an in-graph index? That's not that trivial:
- loading the timeslots is easy with MERGE
- sorting them is of course also easy
- but creating the appropriate FOLLOWED_BY relationships between the timeslots, to create the in-graph index/timeline, is not that easy.
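As a quick aside, the NULL-checking commands mentioned above would look something like this - a sketch, where the WHERE-guard is my illustration and the Company column name is assumed:
load csv with headers from "https://gist.githubusercontent.com/rvanbruggen/ff44b7dc37bb4534df2e/raw/aed34a149f04798e351f508a18492237fcccfb62/schedule.csv" as csv
with csv
where csv.Company is not null
merge (c:Company {name: csv.Company});
MERGE would fail on a NULL property value, so rows without a value simply get filtered out first.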
Luckily I found these two blogposts by Mark Needham that showed me how to do it. Here's the query:
match (t:Time)--(d:Day {date: 20150506})
with t
order by t.time ASC
with collect(t) as times
foreach (i in range(0,length(times)-2) |
foreach (t1 in [times[i]] |
foreach (t2 in [times[i+1]] |
merge (t1)-[:FOLLOWED_BY]->(t2))));
What this does is the following:
- you match all the timeslots that are part of the specific day that you want to order.
- you pass the ordered timeslots to the next part of the query
- you collect the ordered timeslots into a collection
- you iterate through the collection with FOREACH, over the indices 0 through (the length of the collection minus 2), so that every adjacent pair of timeslots gets visited
- you iterate through the starting positions (i) and the ending positions (i+1) in the collection
- every time you iterate, you MERGE a FOLLOWED_BY relationship between the starting position and the ending position
And that's it. Job done. We do this for the second day ("20150507") as well, of course.
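Once both days are done, a quick sanity check - a sketch, reusing the labels and relationship types from the queries above - walks the resulting in-graph timeline from the first to the last timeslot of a day:
//walk the FOLLOWED_BY timeline of day 1
match (d:Day {date: 20150506})--(first:Time)
where not ()-[:FOLLOWED_BY]->(first)
match p = (first)-[:FOLLOWED_BY*]->(last:Time)
where not (last)-[:FOLLOWED_BY]->()
return [t in nodes(p) | t.time] as timeline;
If the FOREACH trick worked, this returns a single path with all of the day's timeslots in ascending order.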
Hope you enjoyed this as much as I did, and hope to see you at the conference!
Cheers
Rik
Friday, 27 February 2015
The QCON Graph
Next week is QCON, London's holiest of holies (according to some) developer conferences. It's going to be a ton of fun, and the lineup of speakers and sessions is just... impressive. Take a look at it over here - there are dozens of sessions that I would love to attend, by speakers that I would love to see. Unfortunately, I will probably miss most of it - as Neo4j is sponsoring the event, and that means BOOTH DUTIES! Yay!
But, no need to worry - there's fun to be had with the schedule. I mean, who can read any of this stuff, really:
Tables, Schmables!!! Let's put it into a graph!
Start with a model!
From the tables above, I distilled the following model:
It's actually pretty rich:
- Floors, days and times are connected to each other by "in-graph-index" relationships.
- Rooms and talks are there
- Talks are part of a track
- Persons can act as speakers and as track hosts.
There's some fun to be had there.
Importing it into Neo4j
Naturally, I started with a spreadsheet. I needed to do some copy/pasting and cleaning of the data, and that's what spreadsheets are great for. But once the data was in there, generating the Cypher to create the graph was trivial. Here's the create script to do that yourself - just clone it if you want.
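To give an idea of what the spreadsheet spits out: each row just gets concatenated into a statement along these lines (the labels follow the model above, but the names here are made up for illustration):
merge (r:Room {name: "Some Room"})
merge (p:Person {name: "Jane Example"})
merge (t:Talk {title: "A Hypothetical Talk"})
merge (t)-[:IN_ROOM]->(r)
merge (p)-[:SPEAKS_IN]->(t);
Every row of the sheet produces one such block, and because it's all MERGEs, the whole script is idempotent - running it twice doesn't create duplicates.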
Once we had that, we could start doing some querying on the graph. I have put some sample queries over here.
Here are some interesting ones exploring the graph:
Here we go looking at some talks and tracks and how they are connected to each other:
And this is probably my favourite: looking at the connectivity between the unfollowable Mark Needham and Yan Cui of Gamesys:
Of course there's plenty more to explore. That's why my friend and colleague Michael Hunger was kind enough to put the database (in read-only mode) on one of his servers - you can take a look at it over here.
Hope you found this useful - see you next week at the conference!
Cheers
Rik
Friday, 7 November 2014
Wasting time as a boothbabe @ Oredev