Tuesday 5 July 2022

Graphs are everywhere - also in Religious Texts - part 2 - import the Hadith narrators into Neo4j

The source data that we found in part 1 is in .csv format, which means that it is basically tabular.

Luckily, we nowadays have some fantastic tools to import these files without writing any code at all, using the all-new Neo4j Data Importer. After drawing a few nodes and relationships, I was able to do the basic import, and it was super quick: it completed within a few seconds.

I am of course sharing the Data Importer config (model and data) as a zip file as well.

As usual, there is still a bit of messiness in the data, so I had to do some wrangling to get a better/richer model.

First, we would want to split the two parents of a Scholar into different fields:

:auto MATCH (s:Scholar) 
CALL {
    WITH s
    SET s.parent1 = trim(split(s.parents,"/")[0]) 
    SET s.parent2 =  trim(split(s.parents,"/")[1])
} IN TRANSACTIONS of 1000 ROWS;

// remove the brackets, introduce comma
:auto MATCH (s:Scholar) 
CALL {
    WITH s
    SET s.parent1 = replace(s.parent1," [",",")
    SET s.parent1 = replace(s.parent1,"]","")
    SET s.parent2 = replace(s.parent2," [",",")
    SET s.parent2 = replace(s.parent2,"]","")
} IN TRANSACTIONS of 1000 ROWS;

// extract the IDs
:auto MATCH (s:Scholar) 
CALL {
    WITH s
    SET s.parent1_id = trim(split(s.parent1,",")[1])
    SET s.parent1 = trim(split(s.parent1,",")[0])
    SET s.parent2_id = trim(split(s.parent2,",")[1])    
    SET s.parent2 = trim(split(s.parent2,",")[0])
} IN TRANSACTIONS of 1000 ROWS;
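
To see what these three steps do to a single value, here is a minimal sketch that runs the same string functions on a literal instead of on the nodes. The sample string is made up for illustration: I am assuming that the parents field looks like "Name [id] / Name [id]", which is what the split and replace calls above imply.

```cypher
// Hypothetical sample value, not taken from the dataset
WITH "First Parent [101] / Second Parent [102]" AS parents
// Step 1: split the two parents on "/"
WITH trim(split(parents, "/")[0]) AS p1,
     trim(split(parents, "/")[1]) AS p2
// Step 2: turn "Name [id]" into "Name,id"
WITH replace(replace(p1, " [", ","), "]", "") AS p1,
     replace(replace(p2, " [", ","), "]", "") AS p2
// Step 3: separate the name from the id
RETURN trim(split(p1, ",")[0]) AS parent1,
       trim(split(p1, ",")[1]) AS parent1_id,
       trim(split(p2, ",")[0]) AS parent2,
       trim(split(p2, ",")[1]) AS parent2_id;
```

Note that the ids come out as strings, so matching them against scholar_indx only works if that property was imported as a string as well. A name that itself contains a comma would also end up with the wrong id in step 3.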

This then allows us to create relationships between Scholars that have other Scholars as parents:

MATCH (s:Scholar)
WHERE s.parent1_id IS NOT NULL
WITH s
MATCH (parent:Scholar)
WHERE parent.scholar_indx = s.parent1_id
MERGE (s)-[:CHILD_OF]->(parent);

MATCH (s:Scholar)
WHERE s.parent2_id IS NOT NULL
WITH s
MATCH (parent:Scholar)
WHERE parent.scholar_indx = s.parent2_id
MERGE (s)-[:CHILD_OF]->(parent);
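
A quick sanity check can help here. The sketch below is not part of the original import. It only compares how many Scholars carry a parent id with how many CHILD_OF relationships were actually created for them.

```cypher
// Scholars with at least one parent id, and the CHILD_OF relationships
// that were created for them
MATCH (s:Scholar)
WHERE s.parent1_id IS NOT NULL OR s.parent2_id IS NOT NULL
OPTIONAL MATCH (s)-[:CHILD_OF]->(parent:Scholar)
RETURN count(DISTINCT s) AS scholarsWithParentId,
       count(parent) AS childOfRelationships;
```

If the second number is much lower than expected, the likely causes are parent ids that do not exist as Scholar nodes, or a type mismatch between the extracted id and scholar_indx.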

The next step is to create the marriage relationships between Scholars. To do that, we first have to split the s.spouse property and store the result as s.listofspouses:

:auto MATCH (s:Scholar)
CALL {
    WITH s
    SET s.listofspouses = split(replace(s.spouse," ",""),",")
} IN TRANSACTIONS OF 1000 ROWS;

Next, we UNWIND s.listofspouses and extract the scholar_indx of each spouse, which we can then match and use to create the [:MARRIED_TO] relationships:

MATCH (s:Scholar)
UNWIND s.listofspouses as scholarspouse
WITH s, replace(split(scholarspouse,"[")[1],"]","") as scholarspouse_id
WHERE scholarspouse_id IS NOT NULL
MATCH (scholarspousenode:Scholar {scholar_indx: scholarspouse_id})
MERGE (s)-[:MARRIED_TO]->(scholarspousenode);
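
As a small illustration of the parsing, the sketch below applies the same expressions to a literal. The sample value is hypothetical: I am assuming that the spouse field looks like "Name [id], Name [id]", as the queries above suggest.

```cypher
// Hypothetical sample value, not taken from the dataset
WITH "First Spouse [201], Second Spouse [202]" AS spouse
// Same expressions as above: strip spaces, split on ",", then cut out the id
UNWIND split(replace(spouse, " ", ""), ",") AS scholarspouse
RETURN scholarspouse,
       replace(split(scholarspouse, "[")[1], "]", "") AS scholarspouse_id;
```

This returns one row per spouse, with the ids "201" and "202". Since the relationship is stored with a direction, it is easiest to query marriages without one, as in (a)-[:MARRIED_TO]-(b).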

And then finally, we can create the teacher/student relationships between Scholars:

// Run as two separate statements, so that Scholars without any matching
// students still get their teachers processed
MATCH (s:Scholar)
UNWIND s.students_inds as student
MATCH (st:Scholar {scholar_indx: student})
MERGE (st)-[:STUDENT_OF]->(s);

MATCH (s:Scholar)
UNWIND s.teachers_inds as teacher
MATCH (tea:Scholar {scholar_indx: teacher})
MERGE (tea)-[:TEACHER_OF]->(s);

After having done all of these manipulations, we can actually look at some really interesting subgraphs.
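
For example, the two queries below are one possible way to start exploring. They are a sketch that only uses the labels and relationship types created above, and the hop limit and LIMIT values are arbitrary choices.

```cypher
// Count the relationships per type
MATCH ()-[r]->()
RETURN type(r) AS relationshipType, count(r) AS total
ORDER BY total DESC;

// Show a small sample of student/teacher chains, up to three hops long
MATCH path = (st:Scholar)-[:STUDENT_OF*1..3]->(tea:Scholar)
RETURN path
LIMIT 25;
```

The first query gives a quick overview of what the wrangling produced. The second one returns paths that Neo4j Browser can render as a graph.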

Note: there is some additional data in the dataset (and included in the (:Scholar) nodes), like areas of interest and tags. For the purpose of this exercise - the Narrator networks and the chains of narration for each Hadith - this is not as interesting, and therefore we are not splitting that information off into separate nodes and relationships. It would be trivial to do so - but unnecessary at this point.

In the next blogpost, we will go and import the actual Hadiths that are being narrated into our graph.

Looking forward already!

Rik

PS: as always, all the code/queries are available on GitHub!

PPS: you can find all the parts in this blogpost on the following links
