Loading data from a SQL database to neo4j graph database

Neo4j is arguably the best graph database out there. Graph databases store data in the form of nodes, properties and relations between nodes. The way the data is indexed, and stored helps in answering queries like friends of friends, fraud detection with connected events etc. efficiently. It can also help in visualising random sample data. For whatever reason you chose to load data from SQL to Neo4j, this article assumes that you are pretty convinced you should do this, and you know a bit about how neo4j works. So, let’s dive right into it.

Prepare the data

First step is to collect the required data from the source(SQL db). There are good number of tools which can help you export this data in the form of a CSV file with the relevant fields.

employee_locations.csv

id, employee_name, employee_location
1, Sanjay, Mumbai
2, Karthik, Bengaluru

Define the node labels in Neo4j

CREATE (e:Employee)

CREATE (l:Location)

Create relevant indexes

CREATE CONSTRAINT ON (n:Employee) ASSERT n.id IS UNIQUE

CREATE CONSTRAINT ON (l:Location) ASSERT l.name IS UNIQUE

Put the csv file in a location that neo4j can read from

Neo4j by default reads from import folder, which is inside your installation folder(/Users/<username>/neo4j-community-4.4.8/). Let’s call this file name employee_locations.csv

Load data to neo4j

Create employee nodes.

load csv with headers from 'file:///employee_locations.csv' as row
merge (e:Employee{name:row.employee_name, id: row.id})

Create location nodes.

load csv with headers from 'file:///employee_locations.csv' as row
merge (l:Location{name:row.employee_location})

Notice how the CSV column header names are used.

Create appropriate relations between the nodes.

load csv with headers from 'file:///employee_locations.csv' as row
match (e:Employee), (l:Location)
where e.id = row.id and l.name = row.employee_location
create (e)-[r:stays_in]->(l)

That’s it. You can experiment with more datasets.