Social Networks: Friend Discovery
Typed follow and friend edges, scored by meaning
What if your social graph and your interest-matching lived in one database, with no sync job between them?
Building a social platform usually means running two stores at once. A graph database like Neo4j holds the explicit social ties (who follows whom, who friended whom), and a vector store holds profile embeddings so you can match people by interest. Then you need an ETL job to keep them aligned, because a user exists in both systems and the two copies drift.
SwarnDB removes the second store. The id of a profile vector is the id of its graph node, so a person is one object, not a row in Postgres mirrored into Neo4j. Embed each user's bio and interests, insert the vector, and you have a fast interest-matching store by default. Flip the collection to hybrid mode and the same objects carry a real, typed social graph.
The social ties are explicit, typed edges you author: FOLLOWS, FRIEND, BLOCKED. You create them with put_edge as users connect, or bulk-import them from your existing social tables, and every edge carries provenance so you always know where a tie came from. To recommend "people you may know," seed candidates by profile similarity, walk the FOLLOWS edges across the network, then rank the surviving frontier by interest match. Structure and meaning in one query, over one copy of the data.
The Traditional Approach
The fragmented stack most teams cobble together today.
- Two stores for one concept: a user is a graph node AND a vector, kept in sync by hand
- ETL drift: a profile edited in one store is stale in the other until the next sync run
- Interest matching and social-tie traversal live in different engines, joined in app code
- Privacy changes (block, unfollow, hide) touch multiple systems and cascade
- No single audit trail for where a follow or friend edge came from
- Cross-community discovery needs custom graph algorithms bolted onto the vector results
The SwarnDB Approach
One database. Every capability built in.
Typed Social Edges
FOLLOWS, FRIEND, and BLOCKED are explicit, directed, typed edges you author with put_edge or bulk import. Each edge carries confidence and provenance, so a tie is something you created and can audit, not something inferred from similarity.
Graph Traversal
Walk the typed follow and friend edges to expand a network across communities. Single-hop, k-hop, and shortest path over the edges you authored, in one engine with the profile vectors.
Edge CRUD and Provenance
Create, update, and delete social edges; verify or reject them; keep full audit history. Blocking a user is a typed BLOCKED edge, not a cascading rebuild across stores.
Hybrid Query
Seed by profile similarity, traverse the social edges, then rank the surviving frontier by interest match. vector_similar, then traverse, then vector_rank, in one composable query.
- One object per user: the profile vector id is the graph node id, no sync job
- Social ties are explicit typed edges you author, with confidence and provenance
- Interest matching and social traversal run in one query, over one copy of the data
- Privacy via typed BLOCKED edges and edge CRUD, with a full audit trail
- Cross-community recommendations from a graph frontier ranked by meaning
- Bulk-import existing follow and friend tables straight into typed edges
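To make the three stages concrete, here is a minimal plain-Python sketch of seed, traverse, rank over toy data. It does not use SwarnDB and says nothing about how the engine runs the query internally. The profiles, the 2-d vectors, the `follows` mapping, and the function `people_you_may_know` are hypothetical stand-ins chosen for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy 2-d "profile embeddings" (hypothetical users).
profiles = {
    "leo": [1.0, 0.0],
    "ava": [0.9, 0.1],
    "max": [0.8, 0.3],
    "zoe": [0.1, 1.0],
    "kai": [0.7, 0.7],
}
# Authored FOLLOWS edges: source -> list of targets.
follows = {"ava": ["zoe", "kai"], "max": ["kai"], "zoe": ["leo"]}

def people_you_may_know(user, seed_k, rank_k):
    query = profiles[user]
    # 1. Seed: the seed_k profiles most similar to the user.
    others = [p for p in profiles if p != user]
    seeds = sorted(others, key=lambda p: cosine(query, profiles[p]), reverse=True)[:seed_k]
    # 2. Traverse: one outgoing FOLLOWS hop from every seed.
    frontier = {t for s in seeds for t in follows.get(s, []) if t != user}
    # 3. Rank: order the frontier by similarity to the user, keep rank_k.
    ranked = sorted(frontier, key=lambda p: cosine(query, profiles[p]), reverse=True)
    return ranked[:rank_k]

print(people_you_may_know("leo", seed_k=2, rank_k=2))  # ['kai', 'zoe']
```

Note that `zoe` is a poor interest match for `leo` yet still appears, because a real follow edge leads to her. The ranking step is what pushes her below `kai`.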
01. How It Works
Setup is two stores' worth of capability in one. First, take a user's profile text and embed it with any embedding model. Second, insert that vector into a hybrid collection with whatever metadata you want: name, city, age, community. The id the insert returns is the user's graph node id. One object, not a vector mirrored into a separate graph store.
Social ties are explicit. When a user follows or friends someone, you author a typed edge with put_edge: FOLLOWS or FRIEND, directed, carrying provenance so you know exactly when and why it was created. If you already run a social platform, bulk-import your existing follow and friend tables straight into typed edges; no inference, no guessing.
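If your existing follow table can be exported as CSV, preparing it for import might look like the sketch below. The column names (`follower_id`, `followee_id`, `created_at`), the helper `rows_to_edges`, and the shape of the output dicts are assumptions for illustration. The layout the bulk importer actually expects is not specified here, so check it before relying on this.

```python
import csv
import io

ALLOWED_TYPES = {"FOLLOWS", "FRIEND", "BLOCKED"}

def rows_to_edges(csv_text, edge_type, origin):
    """Turn exported follow rows into typed edge dicts that carry provenance."""
    if edge_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown edge type: {edge_type}")
    edges, seen = [], set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        source = row["follower_id"].strip()
        target = row["followee_id"].strip()
        # Skip blank ids and self-follows.
        if not source or not target or source == target:
            continue
        # Skip duplicate (source, target) pairs.
        if (source, target) in seen:
            continue
        seen.add((source, target))
        edges.append({
            "source": source,
            "target": target,
            "edge_type": edge_type,
            "provenance": {"source": origin, "created_at": row["created_at"]},
        })
    return edges

# Hypothetical export of a legacy follow table.
SAMPLE = "\n".join([
    "follower_id,followee_id,created_at",
    "u1,u2,2024-01-05",
    "u1,u2,2024-01-05",   # duplicate row
    "u3,u3,2024-02-01",   # self-follow
    "u2,u4,2024-03-10",
])

edges = rows_to_edges(SAMPLE, "FOLLOWS", "legacy-follow-table")
for edge in edges:
    print(edge["source"], edge["edge_type"], edge["target"])
```

Cleaning duplicates and self-follows before import keeps the authored graph free of edges nobody meant to create, and stamping each row with its origin preserves the "where did this tie come from" answer.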
To recommend people, you compose one query. Seed candidates by profile similarity, walk the FOLLOWS edges to reach friends-of-friends, then rank the surviving frontier by interest match. The result spans communities because the traversal follows real ties and the ranking respects meaning, all in one engine over one copy of the data.
Key insight: A follow or friend tie is something you author, not something inferred from a similarity score. The vector and the graph node are the same object.
from openai import OpenAI
from swarndb import SwarnDBClient
openai = OpenAI()
client = SwarnDBClient(host="localhost", port=50051)
# Hybrid mode turns on the first-class typed graph.
client.collections.create(
    "social_network", dimension=1536, distance_metric="cosine", mode="hybrid"
)
# Embed the profile text
text = (
    "gaming, streaming, esports, tech startups, coding. "
    "Competitive gamer and indie dev."
)
embedding = openai.embeddings.create(
    model="text-embedding-3-small", input=text
).data[0].embedding
# The insert id is the user's node id. Vector and node are one object.
leo = client.vectors.insert("social_network",
    vector=embedding,
    metadata={"name": "Leo Hill", "city": "Austin", "age": 24}
)
# A follow is an explicit, typed edge you author, with provenance.
# `ava` is the node id returned by an earlier insert for another user.
client.graph.put_edge("social_network", source=leo, target=ava,
    edge_type="FOLLOWS", provenance={"source": "app-follow"})

02. Cross-Community Discovery
Recommending people across communities is where the hybrid query earns its place. You do not want to recommend strangers who merely sit near someone in vector space; you want to reach people connected through real follow chains, then keep only the ones who actually match by interest.
The query does exactly that. Seed a small candidate set by profile similarity, traverse the FOLLOWS edges across a couple of hops to reach friends-of-friends, then rank that graph-built frontier by interest match. The structure (who is connected to whom) comes from the typed edges you authored. The relevance (who is worth recommending) comes from the vector ranking over the surviving nodes.
Because the traversal follows real ties rather than guessing, the recommendations are explainable: every person in the result is reachable through a concrete chain of follow edges, and every edge carries provenance. A gamer reaches a finance professional because a real follow path connects them and their profiles genuinely overlap, not because a threshold happened to wire two vectors together.
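One way to picture "explainable" is a search that returns the chain of follow edges itself, not just the person at the end of it. The sketch below is plain Python over an in-memory adjacency map, not SwarnDB's shortest-path implementation. The function `follow_path`, the user names, and the `max_hops` cutoff are hypothetical.

```python
from collections import deque

def follow_path(follows, start, goal, max_hops):
    """Shortest chain of FOLLOWS edges from start to goal within max_hops, or None."""
    if start == goal:
        return [start]
    parents = {start: None}          # node -> node we reached it from
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue                 # do not expand past the hop limit
        for nxt in follows.get(node, []):
            if nxt in parents:
                continue             # already reached by a path at least as short
            parents[nxt] = node
            if nxt == goal:
                # Walk parent links back to the start to rebuild the chain.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append((nxt, hops + 1))
    return None

# Hypothetical chain: a gamer reaches a finance professional in three follows.
follows = {"leo": ["ava"], "ava": ["kai"], "kai": ["nina"]}
print(follow_path(follows, "leo", "nina", max_hops=3))  # ['leo', 'ava', 'kai', 'nina']
print(follow_path(follows, "leo", "nina", max_hops=2))  # None
```

Breadth-first search is used because it finds the shortest chain first, which is usually the one worth showing a user as the reason for a recommendation.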
Key insight: Recommendations follow real follow chains, then rank by interest. Every result is explainable: a concrete path of typed edges connects it.
# Cross-community recommendations: scope by structure, rank by meaning.
# leo_embedding is Leo's profile embedding (the `embedding` computed in 01).
result = (
    client.graph.query("social_network")
    .vector_similar(leo_embedding, k=20)
    .traverse("FOLLOWS", direction="outgoing")
    .vector_rank(leo_embedding, k=10)
    .return_nodes()
)
# Each returned person is reachable through real FOLLOWS edges
# AND ranks well by profile similarity.
for node in result.nodes:
    print(node.id, node.label)
# Want friends-of-friends? Traverse two hops along the same typed edges
# before ranking the surviving frontier by meaning.

03. Privacy and Blocking
Privacy in a two-store setup is painful: hiding or blocking a user means editing the graph store, the vector store, and the caches in front of both, then hoping they stay consistent. In SwarnDB the user is one object and the ties are typed edges, so privacy is ordinary edge CRUD.
To block someone, author a typed BLOCKED edge with put_edge. To unfollow, delete the FOLLOWS edge. Each change is a single, scoped operation on one store, and it carries provenance, so you always have an audit trail of who blocked whom and when. Nothing cascades across systems because there is only one system.
If you need a user to disappear from recommendations entirely, remove or reject the social edges that point to them; their profile vector stays searchable for the user themselves, but an outgoing traversal no longer reaches them through anyone else's network. Restoring access is just re-creating the edges. Every step is auditable, and no other user's ties are touched.
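The block and unfollow operations described above can be pictured as writes to a single edge table with an append-only log beside it. The sketch below is an in-memory stand-in, not the SwarnDB client: the class `EdgeStore`, its log format, and the symmetric `is_blocked` check are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class EdgeStore:
    edges: dict = field(default_factory=dict)  # (source, target, edge_type) -> provenance
    audit: list = field(default_factory=list)  # append-only history of changes

    def _log(self, action, key, detail):
        self.audit.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "source": key[0],
            "target": key[1],
            "edge_type": key[2],
            "detail": detail,
        })

    def put_edge(self, source, target, edge_type, provenance):
        key = (source, target, edge_type)
        self.edges[key] = dict(provenance)
        self._log("put", key, dict(provenance))

    def delete_edge(self, source, target, edge_type):
        key = (source, target, edge_type)
        # Only log a delete if the edge actually existed.
        if self.edges.pop(key, None) is not None:
            self._log("delete", key, {})

    def is_blocked(self, a, b):
        # Assumption: a block in either direction hides the pair from each other.
        return (a, b, "BLOCKED") in self.edges or (b, a, "BLOCKED") in self.edges

store = EdgeStore()
store.put_edge("u1", "u2", "FOLLOWS", {"source": "app-follow"})
store.put_edge("u1", "u3", "BLOCKED", {"reason": "user-request"})
store.delete_edge("u1", "u2", "FOLLOWS")   # unfollow
print([entry["action"] for entry in store.audit])  # ['put', 'put', 'delete']
print(store.is_blocked("u3", "u1"))                # True
```

Each change touches exactly one key and appends exactly one log entry, which is the property the section relies on: nothing else in the graph has to be rebuilt when a user blocks or unfollows someone.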
Key insight: Blocking and unfollowing are ordinary edge CRUD with provenance. One scoped change, one audit trail, no cascade across separate stores.
# Block a user - a typed BLOCKED edge, with provenance
client.graph.put_edge("social_network", source=user_id, target=blocked_id,
    edge_type="BLOCKED", provenance={"reason": "user-request"})
# Unfollow - delete the typed FOLLOWS edge
client.graph.delete_edge("social_network", source=user_id, target=other_id,
    edge_type="FOLLOWS")
# Curation: verify or reject an edge, keep full audit history
client.graph.verify_edge("social_network", edge_id)
client.graph.reject_edge("social_network", edge_id)
# One scoped operation per change. No cascade across stores.
# Every edge carries provenance: you always know who connected whom.

04. The Full Pipeline
A complete "People You May Know" feature is one composable query plus the typed edges behind it. First, author or bulk-import the FOLLOWS and FRIEND edges that represent your real social graph. Then run the hybrid builder: seed candidates by profile similarity, traverse the social edges to reach friends-of-friends, and rank the surviving frontier by interest match.
The result spans communities naturally, because the traversal follows real ties and the ranking keeps only genuine interest matches. A user's core community leads, but the feed includes people reached through concrete follow chains in adjacent communities. Every recommendation is explainable and auditable: you can show the exact path of typed edges that connects two people.
This is a complete recommendation flow over one store. No second graph database, no ETL sync, no copy drift. The interest matching and the social traversal happen in one engine, over one copy of the data, in one query.
Key insight: One composable query over typed social edges produces an explainable, auditable feed. No second store, no sync job, no copy drift.
# Full "People You May Know" pipeline, one composable query
# Edges authored as users connect, or bulk-imported from your tables:
# client.graph.put_edge(... edge_type="FOLLOWS" ...)
# client.graph.bulk_import_edges("social_network", data, format="csv")
feed = (
    client.graph.query("social_network")
    .vector_similar(user_embedding, k=50)
    .traverse("FOLLOWS", direction="outgoing")  # follow real ties out from the seeds
    .vector_rank(user_embedding, k=200)
    .return_nodes()
)
# Result: people reached through real follow chains, ranked by interest.
# Core community leads; adjacent communities arrive via concrete edge paths.
# One engine, one copy of the data, one query.

SwarnDB vs Traditional Stack
A side-by-side look at the traditional approach versus SwarnDB.
| Capability | Traditional Stack | SwarnDB |
|---|---|---|
| Storage | Postgres + Neo4j + vector store | One database, one object per user |
| Social ties | Explicit edges in a separate graph store | Typed FOLLOWS / FRIEND edges, same engine |
| Interest match + traversal | Two engines joined in app code | One hybrid query: similar, traverse, rank |
| Privacy / blocking | Cascading edits across stores | Typed BLOCKED edge, scoped edge CRUD |
| Provenance | No unified audit trail | Every edge carries source and history |
The Code
Everything above, in a few lines of Python.
from openai import OpenAI
from swarndb import SwarnDBClient
openai = OpenAI()
client = SwarnDBClient(host="localhost", port=50051)
# Hybrid mode: profile vectors and a typed social graph in one store.
client.collections.create(
    "social_network", dimension=1536, distance_metric="cosine", mode="hybrid"
)
# Embed and store profiles. The insert id is the user's node id.
# `users` is your own list of dicts with "name", "city", and "bio" keys.
node_ids = {}  # keep the returned ids so edges can reference them
for user in users:
    embedding = openai.embeddings.create(
        model="text-embedding-3-small", input=user["bio"]
    ).data[0].embedding
    node_ids[user["name"]] = client.vectors.insert("social_network",
        vector=embedding,
        metadata={"name": user["name"], "city": user["city"]}
    )
# Author typed social edges (or bulk-import existing tables).
# `a` and `b` are two of the node ids collected above; `follow_rows` is your exported follow table.
client.graph.put_edge("social_network", source=a, target=b,
    edge_type="FOLLOWS", provenance={"source": "app-follow"})
client.graph.bulk_import_edges("social_network", follow_rows, format="csv")
# People you may know: scope by structure, then rank by meaning.
# `user_embedding` is the profile embedding of the user you are recommending for.
feed = (
    client.graph.query("social_network")
    .vector_similar(user_embedding, k=50)
    .traverse("FOLLOWS", direction="outgoing")
    .vector_rank(user_embedding, k=200)
    .return_nodes()
)