In this deep-dive tutorial you’ll learn how to connect to Neo4j exclusively from Python, model a non-trivial schema, ingest multi-domain data (movies, people, characters, companies, reviews, social links, release history), and run Graph Data Science algorithms—all in one script.
🚀 1. Why a Knowledge Graph?
- Natural model for connected data: nodes (entities) + relationships.
- Cypher: a declarative, SQL-style graph query language.
- Use cases: recommendations, fraud detection, social networks, knowledge graphs, taxonomy management.
🛠 2. Prerequisites
- Neo4j v5+ running locally (Community Edition or via Docker), with the Graph Data Science (GDS) plugin installed for the analytics in section 10.
- Python 3.8+ virtual environment.
- Install packages:
pip install neo4j pandas
📦 3. Script Overview
This script (complex_kg.py) orchestrates the entire lifecycle of your movie-social knowledge graph:
- Establish Connection
  - Opens a Bolt session to Neo4j using the official Python driver.
- Schema Setup
  - Creates unique constraints on key node labels (Movie.title, Person.name, etc.).
  - Builds property indexes and a full-text index for fast lookups.
- Data Ingestion
  - Genres & Companies: Loads genre tags and production studios with their founding dates and countries.
  - Movies & Genre Links: Imports each movie node and attaches it to its genres.
  - Characters & Roles: Defines character nodes (with archetypes) and links them to their movies.
  - People (Actors, Directors, Writers): Creates Person nodes, then establishes ACTED_AS, DIRECTED and WROTE relationships.
  - Reviews & Social Graph: Inserts Review nodes, connects them to users and movies, and builds a FOLLOWS network (with timestamps).
  - Temporal Releases & Versions: Models per-region release dates and version nodes (e.g., remasters) with RELEASED_IN and HAS_VERSION edges.
- Graph Data Science
  - Social PageRank: Projects the User + FOLLOWS subgraph and computes influence scores.
  - Movie Similarity: Builds a movie–genre projection and streams top-N similar movie pairs.
- Results Output
  - Formats and prints PageRank scores and similarity pairs as pandas DataFrames.
🔧 4. Configuration & Connection
from neo4j import GraphDatabase, basic_auth
import pandas as pd
URI = "bolt://localhost:7687"
USER = "neo4j"
PASSWORD = "rootderoot"
driver = GraphDatabase.driver(URI, auth=basic_auth(USER, PASSWORD))
This opens a connection pool to your local Neo4j.
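If you would rather not keep the password in the source file, one option is to read the settings from the environment. The sketch below is only an illustration: the variable names NEO4J_URI, NEO4J_USER and NEO4J_PASSWORD and the helper load_settings are assumptions, not part of the tutorial's script.

```python
import os

# Defaults taken from the tutorial; the environment variable names are assumed.
DEFAULT_URI = "bolt://localhost:7687"
DEFAULT_USER = "neo4j"


def load_settings(env=None):
    """Read connection settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    password = env.get("NEO4J_PASSWORD")
    if not password:
        # Fail early instead of attempting a connection without credentials.
        raise RuntimeError("NEO4J_PASSWORD is not set")
    return {
        "uri": env.get("NEO4J_URI", DEFAULT_URI),
        "user": env.get("NEO4J_USER", DEFAULT_USER),
        "password": password,
    }
```

The returned dictionary can feed the existing call, for example GraphDatabase.driver(settings["uri"], auth=basic_auth(settings["user"], settings["password"])).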
🔐 5. Constraints & Indexes
def create_constraints_and_indexes(tx):
    # Uniqueness constraints also back the MERGE lookups used during ingestion.
    tx.run("CREATE CONSTRAINT IF NOT EXISTS FOR (m:Movie) REQUIRE m.title IS UNIQUE")
    tx.run("CREATE CONSTRAINT IF NOT EXISTS FOR (p:Person) REQUIRE p.name IS UNIQUE")
    # NODE KEY constraints require Neo4j Enterprise Edition; on Community,
    # REQUIRE (c.name, c.movie) IS UNIQUE is the closest alternative.
    tx.run("CREATE CONSTRAINT IF NOT EXISTS FOR (c:Character) REQUIRE (c.name, c.movie) IS NODE KEY")
    tx.run("CREATE CONSTRAINT IF NOT EXISTS FOR (co:Company) REQUIRE co.name IS UNIQUE")
    tx.run("CREATE CONSTRAINT IF NOT EXISTS FOR (g:Genre) REQUIRE g.name IS UNIQUE")
    tx.run("CREATE CONSTRAINT IF NOT EXISTS FOR (u:User) REQUIRE u.name IS UNIQUE")
    # For named indexes, the name comes before IF NOT EXISTS.
    tx.run("CREATE INDEX movie_year IF NOT EXISTS FOR (m:Movie) ON (m.released)")
    tx.run("""
        CREATE FULLTEXT INDEX movie_text IF NOT EXISTS
        FOR (m:Movie) ON EACH [m.title, m.tagline]
    """)
- The node key on Character ensures uniqueness per (name, movie).
- A full-text index on movies lets you search titles or taglines.
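To show what the full-text index is for, here is a minimal sketch of a search helper. It assumes the movie_text index from the function above exists; search_movies and FULLTEXT_SEARCH are hypothetical names introduced for illustration, while db.index.fulltext.queryNodes is Neo4j's built-in procedure for querying full-text node indexes.

```python
# Cypher that queries the movie_text full-text index (assumed to exist).
FULLTEXT_SEARCH = """
CALL db.index.fulltext.queryNodes('movie_text', $text)
YIELD node, score
RETURN node.title AS title, node.tagline AS tagline, score
ORDER BY score DESC
LIMIT $limit
"""


def search_movies(tx, text, limit=5):
    """Hypothetical helper: return matching movies as a list of dicts."""
    return tx.run(FULLTEXT_SEARCH, text=text, limit=limit).data()
```

It is meant to be passed to a read transaction, for example s.execute_read(search_movies, "crime") inside a driver.session() block.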
🌐 6. Loading Static Domain Data
6.1 Genres & Companies
def load_genres_and_companies(tx):
    genres = ['Action','Sci-Fi','Thriller','Drama']
    for name in genres:
        tx.run("MERGE (:Genre {name:$name})", name=name)
    companies = [
        {"name":"Warner Bros.", "founded":1923, "country":"US"},
        {"name":"Paramount Pictures", "founded":1912, "country":"US"}
    ]
    for c in companies:
        tx.run("""
            MERGE (co:Company {name:$name})
            SET co.founded = $founded, co.country = $country
        """, **c)
6.2 Movies & Genre Links
def load_movies(tx):
    movies = [
        {"title":"Inception","released":2010,"tagline":"Your mind is the scene of the crime","genres":["Thriller","Sci-Fi"]},
        {"title":"Interstellar","released":2014,"tagline":"Mankind’s next step will be our greatest","genres":["Sci-Fi","Drama"]}
    ]
    for m in movies:
        tx.run("MERGE (mv:Movie {title:$title}) SET mv.released=$released, mv.tagline=$tagline", **m)
        for g in m["genres"]:
            tx.run("""
                MATCH (mv:Movie {title:$title}), (g:Genre {name:$genre})
                MERGE (mv)-[:IN_GENRE]->(g)
            """, title=m["title"], genre=g)
- MERGE ensures idempotent loads.
- We attach two genres per movie.
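The loop above issues one query per movie-genre pair, which is fine for two movies. For larger loads, a common alternative is to flatten the pairs into rows and send them in a single UNWIND statement. The sketch below assumes the same movie dictionaries as load_movies; genre_rows, link_genres and LINK_GENRES are hypothetical names.

```python
# One statement links every (title, genre) row passed in $rows.
LINK_GENRES = """
UNWIND $rows AS row
MATCH (mv:Movie {title: row.title}), (g:Genre {name: row.genre})
MERGE (mv)-[:IN_GENRE]->(g)
"""


def genre_rows(movies):
    """Flatten movie dicts into one {'title', 'genre'} row per link."""
    return [
        {"title": m["title"], "genre": g}
        for m in movies
        for g in m["genres"]
    ]


def link_genres(tx, movies):
    """Hypothetical batched variant of the inner loop in load_movies."""
    tx.run(LINK_GENRES, rows=genre_rows(movies))
```

Because the statement still uses MERGE, re-running it remains idempotent.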
🎭 7. Characters, Actors, Directors, Writers
def load_people_and_roles(tx):
    # Characters
    characters = [
        {"name":"Cobb","movie":"Inception","archetype":"Hero"},
        {"name":"Murph","movie":"Interstellar","archetype":"Protege"}
    ]
    for ch in characters:
        tx.run("""
            MERGE (c:Character {name:$name, movie:$movie})
            SET c.archetype=$archetype
        """, **ch)
    # Actors & ACTED_AS
    actors = [
        {"name":"Leonardo DiCaprio","born":1974,"nationality":"US","character":"Cobb","movie":"Inception","year":2010},
        {"name":"Jessica Chastain","born":1977,"nationality":"US","character":"Murph","movie":"Interstellar","year":2014}
    ]
    for a in actors:
        tx.run("""
            MERGE (p:Person {name:$name})
            SET p.born=$born, p.nationality=$nationality
        """, **a)
        # Match the character by its full (name, movie) key.
        tx.run("""
            MATCH (p:Person {name:$name}), (c:Character {name:$character, movie:$movie})
            MERGE (p)-[:ACTED_AS {roles:[$character], year:$year}]->(c)
        """, name=a["name"], character=a["character"], movie=a["movie"], year=a["year"])
    # Directors
    directors = [
        {"director":"Christopher Nolan","movie":"Inception","year":2010},
        {"director":"Christopher Nolan","movie":"Interstellar","year":2014}
    ]
    for d in directors:
        tx.run("""
            MERGE (p:Person {name:$director})
            MERGE (m:Movie {title:$movie})
            MERGE (p)-[:DIRECTED {year:$year}]->(m)
        """, **d)
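- We model characters separately from people.
- A Person is linked to a Character via ACTED_AS and to a Movie via DIRECTED. A WROTE relationship for writers follows the same pattern but is not loaded in this function.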
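The function above stops at directors, so here is one possible shape for a writers loader that mirrors the DIRECTED pattern. Everything in it is an assumption for illustration: the WRITERS rows, the LINK_WRITER statement and the load_writers helper are not in the tutorial's script.

```python
# Illustrative rows only; not part of the tutorial's data set.
WRITERS = [
    {"writer": "Christopher Nolan", "movie": "Inception"},
    {"writer": "Jonathan Nolan", "movie": "Interstellar"},
]

# Only link to movies that already exist; create the Person if needed.
LINK_WRITER = """
MATCH (m:Movie {title: $movie})
MERGE (p:Person {name: $writer})
MERGE (p)-[:WROTE]->(m)
"""


def load_writers(tx, writers=WRITERS):
    """Hypothetical loader: one Person-[:WROTE]->Movie link per row."""
    for w in writers:
        tx.run(LINK_WRITER, **w)
```

Note that the review section also uses WROTE, between User and Review nodes. The node labels keep the two meanings apart, though a distinct relationship type would be a reasonable alternative.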
📝 8. Reviews, Social Follows & Likes
def load_reviews_and_social(tx):
    # Users
    for u in ["Alice","Bob","Carol"]:
        tx.run("MERGE (:User {name:$name})", name=u)
    # Reviews
    reviews = [
        {"user":"Alice","movie":"Inception","rating":5,"date":"2021-01-01","comment":"Mind-blowing!"},
        {"user":"Bob","movie":"Interstellar","rating":4,"date":"2021-02-02","comment":"Epic visuals."}
    ]
    for r in reviews:
        tx.run("""
            MATCH (u:User {name:$user}), (m:Movie {title:$movie})
            CREATE (rev:Review {rating:$rating, date:date($date), comment:$comment})
            MERGE (u)-[:WROTE]->(rev)
            MERGE (rev)-[:FOR_MOVIE]->(m)
        """, **r)
    # Follows
    follows = [("Alice","Bob","2021-03-01"),("Bob","Carol","2021-03-05")]
    for fr,to,date in follows:
        tx.run("""
            MATCH (a:User {name:$fr}), (b:User {name:$to})
            MERGE (a)-[f:FOLLOWS]->(b)
            ON CREATE SET f.since = date($date)
        """, fr=fr, to=to, date=date)
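- We create Review nodes with rating, date, and comment. Because they are created with CREATE rather than MERGE, re-running the script adds duplicate reviews.
- Users are linked to their reviews by WROTE and to one another by FOLLOWS.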
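If you want review ingestion to be repeatable like the other loaders, one option is to MERGE the whole user-review-movie path. The sketch below assumes a user reviews a given movie at most once per date, which is an assumption about the data rather than something the tutorial states; merge_review and MERGE_REVIEW are hypothetical names.

```python
# MERGE on the full path: the Review and both relationships are created
# only if no matching path exists for this user, movie and date.
MERGE_REVIEW = """
MATCH (u:User {name: $user}), (m:Movie {title: $movie})
MERGE (u)-[:WROTE]->(rev:Review {date: date($date)})-[:FOR_MOVIE]->(m)
SET rev.rating = $rating, rev.comment = $comment
"""


def merge_review(tx, review):
    """Hypothetical idempotent variant of the review insert."""
    tx.run(MERGE_REVIEW, **review)
```

With this variant, running the script twice updates the existing review's rating and comment rather than adding a second node.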
📆 9. Temporal Releases & Versions
def load_temporal_and_versions(tx):
    # Releases by region
    releases = [
        {"movie":"Inception","region":"US","date":"2010-07-16"},
        {"movie":"Inception","region":"FR","date":"2010-07-21"}
    ]
    for r in releases:
        tx.run("MERGE (rel:Release {region:$region, date:date($date)})", **r)
        tx.run("""
            MATCH (m:Movie {title:$movie}), (rel:Release {region:$region, date:date($date)})
            MERGE (m)-[:RELEASED_IN {region:$region, date:date($date)}]->(rel)
        """, **r)
    # Versions / Remasters
    versions = [{"movie":"Interstellar","label":"4K Remaster","releaseDate":"2020-11-01"}]
    for v in versions:
        tx.run("""
            MERGE (ver:Version {label:$label})
            SET ver.releaseDate=date($releaseDate)
        """, **v)
        tx.run("""
            MATCH (m:Movie {title:$movie}), (ver:Version {label:$label})
            MERGE (m)-[:HAS_VERSION {releaseDate:date($releaseDate)}]->(ver)
        """, **v)
- Each Release captures a region and a date.
- Version nodes let you track director’s cuts or remasters over time.
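The release and version loaders pass date strings straight to Cypher's date() function, so a malformed value only surfaces as a query error in the middle of a load. A small pre-check in Python can catch such rows first. This is a sketch under the assumption that all dates are plain YYYY-MM-DD strings; invalid_dates is a hypothetical helper.

```python
from datetime import date


def invalid_dates(rows, field="date"):
    """Return (index, value) pairs whose field is not a valid ISO date."""
    bad = []
    for i, row in enumerate(rows):
        try:
            date.fromisoformat(row[field])
        except (TypeError, ValueError):
            # TypeError covers non-string values such as None.
            bad.append((i, row[field]))
    return bad
```

For the version rows, pass field="releaseDate". An empty list means every row parsed cleanly.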
🔬 10. Graph Data Science
10.1 PageRank on Social Graph
def run_gds(tx):
    tx.run("CALL gds.graph.drop('social', false)").consume()
    tx.run("""
        CALL gds.graph.project('social','User',{FOLLOWS:{orientation:'NATURAL'}})
    """).consume()
    return tx.run("""
        CALL gds.pageRank.stream('social')
        YIELD nodeId, score
        RETURN gds.util.asNode(nodeId).name AS user, round(score,3) AS pr
        ORDER BY pr DESC
    """).data()
10.2 Movie Similarity via Genres
def run_movie_similarity(tx):
    tx.run("CALL gds.graph.drop('movieGenre', false)").consume()
    # Project movies and genres as a bipartite graph. IN_GENRE points from
    # Movie to Genre, so each movie's neighbor set is its set of genres.
    tx.run("""
        CALL gds.graph.project('movieGenre', ['Movie','Genre'], 'IN_GENRE')
    """).consume()
    return tx.run("""
        CALL gds.nodeSimilarity.stream('movieGenre',{similarityCutoff:0.2})
        YIELD node1,node2,similarity
        WHERE node1 < node2  // keep one row per unordered pair
        RETURN gds.util.asNode(node1).title AS A,
               gds.util.asNode(node2).title AS B,
               round(similarity,3) AS sim
        ORDER BY sim DESC LIMIT 5
    """).data()
- PageRank reveals the most “influential” users in your social network.
- Node similarity finds the top 5 most similar movie pairs based on shared genres.
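To build intuition for what the two algorithms compute, the sketch below reproduces them in plain Python on the tutorial's tiny data set. It is a simplified cross-check, not GDS itself: the PageRank loop assumes the unnormalized update (1 - d) + d * sum(score / out_degree) with a damping factor of 0.85 and ignores refinements such as tolerance-based stopping, and the similarity function assumes the Jaccard metric over each movie's genre set. The function names are hypothetical.

```python
from itertools import combinations


def pagerank(nodes, edges, damping=0.85, iterations=20):
    """Simplified, unnormalized PageRank over directed (src, dst) edges."""
    out_degree = {n: 0 for n in nodes}
    for src, _ in edges:
        out_degree[src] += 1
    scores = {n: 1.0 - damping for n in nodes}
    for _ in range(iterations):
        incoming = {n: 0.0 for n in nodes}
        for src, dst in edges:
            # Each node shares its score equally among the nodes it points to.
            incoming[dst] += scores[src] / out_degree[src]
        scores = {n: (1.0 - damping) + damping * incoming[n] for n in nodes}
    return scores


def jaccard(a, b):
    """Jaccard similarity: size of intersection over size of union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(genres_by_movie, cutoff=0.2):
    """Return (movie, movie, score) tuples at or above the cutoff."""
    pairs = []
    for m1, m2 in combinations(sorted(genres_by_movie), 2):
        score = jaccard(genres_by_movie[m1], genres_by_movie[m2])
        if score >= cutoff:
            pairs.append((m1, m2, round(score, 3)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

On the FOLLOWS chain Alice → Bob → Carol, this ranks Carol highest and Alice lowest, since nobody follows Alice. Inception and Interstellar share one genre out of three distinct ones, giving a Jaccard score of 1/3.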
▶️ 11. Putting It All Together
def main():
    with driver.session() as s:
        s.execute_write(create_constraints_and_indexes)
        s.execute_write(load_genres_and_companies)
        s.execute_write(load_movies)
        s.execute_write(load_people_and_roles)
        s.execute_write(load_reviews_and_social)
        s.execute_write(load_temporal_and_versions)
        pr_scores = s.execute_read(run_gds)
        sim_pairs = s.execute_read(run_movie_similarity)
    # pandas is already imported as pd at the top of the script.
    print("PageRank scores:\n", pd.DataFrame(pr_scores))
    print("\nTop movie similarities:\n", pd.DataFrame(sim_pairs))
    driver.close()

if __name__ == "__main__":
    main()
Run:
python complex_kg.py
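Example output (illustrative; exact scores depend on your GDS version and configuration). Because Alice follows Bob and Bob follows Carol, Carol ranks highest and Alice lowest:
PageRank scores:
    user     pr
0  Carol  0.386
1    Bob  0.278
2  Alice  0.150
Top movie similarities:
           A             B    sim
0  Inception  Interstellar  0.333
...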
📈 12. Next Steps
- Expose an API (Flask/FastAPI) that runs parameterized Cypher.
- Load real data from CSV, TMDB or Wikidata.
- Add neosemantics (n10s) plugin to export RDF/SPARQL.
- Visualize with Neo4j Bloom or Neodash.
You now have a comprehensive Python-driven workflow: schema definition, data ingestion, analytics, all in a single, reproducible script. Happy graphing!