First-Class Typed Graph

Directed, typed edges with confidence and provenance, in the same engine

“What if your vector store and your graph were literally the same object?”

Identity: vector id = node id
Edges: Typed & directed
Provenance: On every edge
Activation: Opt-in per collection

Most stacks bolt a graph onto a vector database. You keep vectors in one store and relationships in another, then write a sync job to keep them aligned. The two drift, edges get stale, and a crash can leave them inconsistent. The graph is always a second-class citizen, a mirror of the real data rather than part of it.

SwarnDB takes the opposite path. The id of a vector is the id of its graph node. There is no foreign key, no mirror table, no eventual consistency between two stores. The thing you searched for and the thing you traverse from are literally the same row, with one storage path and one crash-recovery path for both.

The graph is real and typed, not inferred from similarity. Edges are directed and carry a type (such as CITES or MENTIONS), a confidence value, a manual-vs-extracted flag, and a provenance record. You author them explicitly with put_edge or bulk import, or you have an LLM extract them with full provenance. This is structure you author, not similarity you hope is structure.

It is opt-in per collection. Every collection starts as a fast, accurate vector store with nothing turned on. Flip a collection to mode="hybrid" at create time and the first-class typed graph comes online for that collection only. Vector-only collections are completely unchanged. (An optional auto-similarity mode that infers edges from your vectors also exists, off by default, and is secondary to this typed graph.)

How the Typed Graph Works

Enable

Create a collection with mode="hybrid". This turns on the first-class typed graph for that collection. Vector-only collections stay exactly as they are; the graph is opt-in and never forced on you.

Insert

Insert vectors as usual. The id each insert returns is the node's id. The vector and the graph node are one object: one identity, one storage path, one crash-recovery path.

Author

Create directed, typed edges with put_edge or bulk import, each carrying a type, confidence, a manual-vs-extracted flag, and a provenance record. Or point the collection at an LLM and have it extract typed entities and edges with provenance.

Traverse

Walk typed edges inside a hybrid query: seed candidates by similarity, traverse the edges you authored, then rank the surviving frontier exactly by meaning. Similarity and structure in one plan, inside one engine.
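The four steps above can be pictured with a toy in-memory model. This is a minimal sketch in plain Python, not SwarnDB's implementation: the `hybrid_query` function, the sample ids, and the brute-force cosine scan are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy data: rows keyed by id, plus directed, typed edges (source, type, target).
vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.7, 0.7],
}
edges = [("a", "CITES", "c"), ("a", "CITES", "d"), ("b", "MENTIONS", "c")]

def hybrid_query(query, seed_k, edge_type, rank_k):
    # 1. Seed: the seed_k rows most similar to the query.
    seeds = sorted(vectors, key=lambda i: cosine(query, vectors[i]), reverse=True)[:seed_k]
    # 2. Traverse: follow outgoing edges of the requested type from the seeds.
    frontier = {dst for src, etype, dst in edges if src in seeds and etype == edge_type}
    # 3. Rank: order the frontier by exact similarity to the query.
    ranked = sorted(frontier, key=lambda i: cosine(query, vectors[i]), reverse=True)
    return ranked[:rank_k]

result = hybrid_query([1.0, 0.0], seed_k=2, edge_type="CITES", rank_k=10)
print(result)
```

Here the seeds are `a` and `b`, only the `CITES` edges out of them are followed, and the frontier is then ordered by similarity to the query. The `MENTIONS` edge from `b` is ignored because the traversal asked for a different type.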

What You Can Do

Capabilities

One Object, Not Two Databases

The id of a vector is the id of its graph node. No foreign key, no mirror table, no sync job between a vector store and a graph store. The row you searched for is the row you traverse from, with one storage path and one crash-recovery path for both. That single decision is what makes one query possible: scope by structure, then rank by meaning, in one plan.

one_object.py

from swarndb import SwarnDBClient

with SwarnDBClient(host="localhost", port=50051) as client:
    # mode="hybrid" turns on the first-class typed graph.
    # dimension=3 matches the toy 3-element vectors below; real embeddings are larger (e.g. 384).
    client.collections.create(
        "articles", dimension=3, distance_metric="cosine", mode="hybrid"
    )

    # The id each insert returns is the node's id.
    a = client.vectors.insert("articles", vector=[0.1, 0.2, 0.3], metadata={"topic": "physics"})
    b = client.vectors.insert("articles", vector=[0.3, 0.1, 0.4], metadata={"topic": "math"})

Typed Edges with Provenance

Directed edges that carry a type, a confidence value, a manual-vs-extracted flag, and a provenance record. An edge is not just a pointer between two nodes; it is a fact with a known source. Create them with put_edge, and you always know where each relationship came from.

put_edge.py

# Continues one_object.py: `client`, `a`, and `b` are defined there.
# Typed edges carry provenance, not just a pointer.
client.graph.put_edge(
    "articles", source=a, target=b, edge_type="CITES",
    provenance={"doc_id": "paper-1"}
)
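One way to picture everything an edge is described as carrying is a plain record. The sketch below is illustrative only: the class name `TypedEdge`, the field names `confidence` and `extracted`, and the 0 to 1 confidence range are assumptions, not SwarnDB's schema or the `put_edge` signature.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class TypedEdge:
    """Hypothetical shape of a directed, typed edge with provenance."""
    source: int
    target: int
    edge_type: str
    confidence: float = 1.0      # assumed range: 0.0 to 1.0
    extracted: bool = False      # False = authored by hand, True = extracted by an LLM
    provenance: dict = field(default_factory=dict)

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

edge = TypedEdge(source=1, target=2, edge_type="CITES",
                 confidence=0.9, provenance={"doc_id": "paper-1"})
reverse = TypedEdge(source=2, target=1, edge_type="CITES",
                    confidence=0.9, provenance={"doc_id": "paper-1"})

# Direction is part of the fact: 1 -> 2 is not the same edge as 2 -> 1.
print(edge == reverse)
```

Treating the edge as an immutable record keeps the relationship and its source together, which is the point of provenance: the fact and where it came from travel as one value.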

Author or Extract

You decide where edges come from. Create them explicitly with put_edge, bulk import them from CSV or JSONL, or point the collection at any OpenAI-compatible model with your own key and have it extract typed entities and edges with full provenance. Every path produces the same first-class typed edge.
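As a rough illustration of preparing edges for a JSONL import, the sketch below parses one edge per line into dictionaries. The file layout and the field names (`source`, `target`, `edge_type`, `confidence`, `doc_id`) are assumptions for this example; the source does not specify the import format SwarnDB expects.

```python
import io
import json

# Hypothetical JSONL layout: one edge per line. Field names are assumptions.
JSONL = """\
{"source": 1, "target": 2, "edge_type": "CITES", "confidence": 0.95, "doc_id": "paper-1"}
{"source": 2, "target": 3, "edge_type": "MENTIONS", "confidence": 0.60, "doc_id": "paper-2"}
"""

def read_edges(stream):
    """Parse JSONL into edge dicts, recording the source line as provenance."""
    edges = []
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        row = json.loads(line)
        edges.append({
            "source": row["source"],
            "target": row["target"],
            "edge_type": row["edge_type"],
            "confidence": float(row.get("confidence", 1.0)),
            "provenance": {"doc_id": row.get("doc_id"), "line": line_no},
        })
    return edges

edges = read_edges(io.StringIO(JSONL))
for e in edges:
    print(e["source"], e["edge_type"], e["target"], e["provenance"])
```

How the parsed rows reach the database is left out on purpose, since the bulk import call itself is not shown here.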

Opt-In Per Collection

The graph is enabled per collection via mode="hybrid" at create time. Collections you leave as vector-only behave exactly as before, with nothing extra turned on. You pay for the graph only where you ask for it, and you never see it where you do not.

Authored, Not Inferred

This is not a similarity graph derived from your vectors. The edges are structure you author or have an LLM extract, never relationships guessed from how close two vectors happen to be. (An optional auto-similarity mode that infers edges from vectors exists, off by default, and is secondary to this typed graph.)

Edges You Author, Not Edges You Guess

A similarity graph answers one question: which vectors are close to each other? That is useful, but closeness is not the same as a relationship. Two papers can be near each other in vector space and have nothing to do with one another, while a paper that cites another may sit far apart. Real structure is something you state, not something you infer from distance.
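A small worked example makes the gap concrete. The vectors and paper names below are invented for illustration; the only point is that cosine similarity and a stated citation can disagree.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Made-up embeddings: two papers that sit close together, one that sits far away.
papers = {
    "survey": [0.9, 0.1, 0.0],
    "unrelated_survey": [0.88, 0.12, 0.0],
    "old_proof": [0.0, 0.2, 0.98],
}

# The relationship that actually exists is stated, not measured.
cites = {("survey", "old_proof")}

near = cosine(papers["survey"], papers["unrelated_survey"])
far = cosine(papers["survey"], papers["old_proof"])

print(f"close but unrelated: {near:.3f}")
print(f"far apart but cited: {far:.3f}")
```

The two surveys are nearly identical in vector space yet share no edge, while the cited proof is almost orthogonal to the paper that cites it. A graph built from distance alone would get both cases wrong.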

In SwarnDB, edges are facts you author. You call put_edge with a source node, a target node, an edge type such as CITES, and a provenance record, and that directed, typed edge becomes part of the graph. It carries a confidence value and a manual-vs-extracted flag, so the graph knows not only that a relationship exists but how it got there and how sure you are.

Because the vector and the node are the same object, there is no second store to keep in sync. The edge connects two rows that already hold your vectors. There is no copy drift, no foreign key to maintain, no ETL job that can fall behind. The graph is part of the data, recovered on the same crash-recovery path as the vectors themselves.

If you would rather not author every edge by hand, point the collection at an LLM and have it extract typed entities and edges with full provenance. Either way, the result is the same kind of first-class typed edge, and you always know where each one came from.

Insight: Closeness is not a relationship. SwarnDB stores the relationships you author, typed and directed, with provenance, never relationships guessed from how close two vectors happen to be.

authored_edges.py

from swarndb import SwarnDBClient

with SwarnDBClient(host="localhost", port=50051) as client:
    # dimension=3 matches the toy 3-element vectors below; real embeddings are larger (e.g. 384).
    client.collections.create(
        "articles", dimension=3, distance_metric="cosine", mode="hybrid"
    )

    a = client.vectors.insert("articles", vector=[0.1, 0.2, 0.3], metadata={"topic": "physics"})
    b = client.vectors.insert("articles", vector=[0.3, 0.1, 0.4], metadata={"topic": "math"})

    # A directed, typed edge with provenance: a fact with a known source.
    client.graph.put_edge(
        "articles", source=a, target=b, edge_type="CITES",
        provenance={"doc_id": "paper-1"}
    )

One Identity, One Store, One Recovery Path

The usual way to add a graph to a vector database is to run a separate graph store next to it. Now you have two systems, two copies of every entity, and a sync job whose only purpose is to keep them aligned. The job lags. The copies drift. A crash can leave the vector store and the graph store disagreeing about what exists.

SwarnDB removes the second store entirely. The id of a vector is the id of its graph node, so a single row is both the thing you search and the thing you traverse from. There is no foreign key linking two tables, because there is only one table. There is no eventual consistency between two stores, because there is only one store.
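The idea of one row serving as both vector and node can be sketched with a toy in-memory store. The `OneStore` and `Row` classes below are hypothetical and say nothing about SwarnDB's storage layout; they only show why a single keyed table leaves nothing to sync.

```python
class Row:
    """One object: the vector and the node's outgoing edges live together."""
    def __init__(self, vector):
        self.vector = vector
        self.out_edges = []  # list of (edge_type, target_id)

class OneStore:
    def __init__(self):
        self.rows = {}
        self._next_id = 0

    def insert(self, vector):
        node_id = self._next_id
        self._next_id += 1
        self.rows[node_id] = Row(vector)
        return node_id  # the vector id is the node id

    def put_edge(self, source, target, edge_type):
        # Both endpoints must already exist as rows; there is no second table to check.
        if source not in self.rows or target not in self.rows:
            raise KeyError("both endpoints must be existing rows")
        self.rows[source].out_edges.append((edge_type, target))

    def delete(self, node_id):
        # Removing the row removes the vector and its edges in one step,
        # then drops any edges that pointed at it.
        del self.rows[node_id]
        for row in self.rows.values():
            row.out_edges = [(t, dst) for t, dst in row.out_edges if dst != node_id]

store = OneStore()
a = store.insert([0.1, 0.2, 0.3])
b = store.insert([0.3, 0.1, 0.4])
store.put_edge(a, b, "CITES")
store.delete(b)
print(store.rows[a].out_edges)
```

With two separate stores, the delete would have to be repeated in the graph store by a sync job. Here there is only one place the row can exist, so a dangling edge cannot outlive its target.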

This also means one crash-recovery path. Write-ahead logging protects the vectors and the edges together; a crash never leaves them out of step. When a collection comes back, the graph comes back with it, already consistent, because it was never a separate thing that could fall behind.
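The single recovery path can be illustrated with a toy write-ahead log. This is a sketch under stated assumptions, not SwarnDB's log format: the `ToyWal` class, the JSON record layout, and the `op` names are invented, and a Python list stands in for an append-only file.

```python
import json

class ToyWal:
    """A list of JSON lines standing in for an append-only log on disk."""
    def __init__(self):
        self.log = []

    def append(self, record):
        self.log.append(json.dumps(record))

def replay(log_lines):
    """Rebuild vectors and edges from the same log, in write order."""
    vectors, edges = {}, []
    for line in log_lines:
        rec = json.loads(line)
        if rec["op"] == "insert":
            vectors[rec["id"]] = rec["vector"]
        elif rec["op"] == "put_edge":
            edges.append((rec["source"], rec["edge_type"], rec["target"]))
    return vectors, edges

wal = ToyWal()
wal.append({"op": "insert", "id": 1, "vector": [0.1, 0.2]})
wal.append({"op": "insert", "id": 2, "vector": [0.3, 0.1]})
wal.append({"op": "put_edge", "source": 1, "target": 2, "edge_type": "CITES"})

# Full recovery: vectors and edges come back together.
vectors, edges = replay(wal.log)

# A crash before the last record was written: still consistent, just one edge short.
partial_vectors, partial_edges = replay(wal.log[:2])
print(vectors, edges)
```

Because inserts and edges share one ordered log, any prefix of it replays to a state where every edge's endpoints exist. Two independent logs offer no such guarantee.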

The payoff is the hybrid query. Because structure and meaning live in the same engine over the same rows, you can scope candidates by structure and then rank them by meaning in a single plan, with no round trip to a second database in the middle.

Insight: No second database, no sync job, no copy drift. One identity, one store, one crash-recovery path for both the vector and the graph node.

Opt-In, and an Optional Auto-Similarity Mode

The typed graph is opt-in per collection. Every collection starts life as a fast, accurate vector store with nothing turned on: four distance metrics, per-query ef_search, batch search, correct filter-then-search, and nothing else. You enable the graph for a specific collection by passing mode="hybrid" at create time. Collections you leave alone are vector-only and behave exactly as they always did.

If you would prefer not to author edges at all, there is also an optional auto-similarity mode, set with mode="auto_similarity", that builds similarity edges for you. It is off by default and it is secondary. The first-class typed graph, with edges you author or have an LLM extract, is SwarnDB's graph. The auto-similarity mode is a convenience for cases where similarity-as-structure is genuinely what you want, and it is clearly labelled as such.

Insight: Opt-in per collection. The typed graph is the product's graph; the auto-similarity mode is an optional, secondary convenience that is off by default.

collection_modes.py

from swarndb import SwarnDBClient

with SwarnDBClient(host="localhost", port=50051) as client:
    # Vector-only by default: nothing extra turned on.
    client.collections.create("plain", dimension=384, distance_metric="cosine")

    # Opt in to the first-class typed graph for one collection.
    client.collections.create("articles", dimension=384, distance_metric="cosine", mode="hybrid")

    # Optional, secondary, off by default: infer similarity edges instead of authoring them.
    client.collections.create("auto", dimension=384, distance_metric="cosine", mode="auto_similarity")
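For contrast with authored edges, here is a rough picture of what inferring edges from vectors can look like. It is a sketch under assumptions, not a description of how `mode="auto_similarity"` works internally: the threshold rule, the `similarity_edges` function, and the `SIMILAR_TO` label are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def similarity_edges(vectors, threshold):
    """Link every ordered pair of distinct rows whose similarity meets the threshold."""
    edges = []
    ids = sorted(vectors)
    for i in ids:
        for j in ids:
            if i != j and cosine(vectors[i], vectors[j]) >= threshold:
                edges.append((i, "SIMILAR_TO", j))
    return edges

vectors = {1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.0, 1.0]}
inferred = similarity_edges(vectors, threshold=0.9)
print(inferred)
```

Every edge produced this way has the same type and no source beyond the vectors themselves, which is why it is a convenience rather than a substitute for typed edges with provenance.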

Complete Example

Everything above, in one script.

typed_graph_complete.py

from swarndb import SwarnDBClient

with SwarnDBClient(host="localhost", port=50051) as client:
    # mode="hybrid" turns on the first-class typed graph.
    # dimension=3 matches the toy 3-element vectors below; real embeddings are larger (e.g. 384).
    client.collections.create(
        "articles", dimension=3, distance_metric="cosine", mode="hybrid"
    )

    # The id each insert returns is the node's id. Vector and node are one object.
    a = client.vectors.insert("articles", vector=[0.1, 0.2, 0.3], metadata={"topic": "physics"})
    b = client.vectors.insert("articles", vector=[0.3, 0.1, 0.4], metadata={"topic": "math"})

    # Typed edges carry provenance, not just a pointer.
    client.graph.put_edge(
        "articles", source=a, target=b, edge_type="CITES",
        provenance={"doc_id": "paper-1"}
    )

    # One composable hybrid query:
    # seed by similarity -> walk the graph -> rank the frontier exactly by meaning.
    result = (
        client.graph.query("articles")
        .vector_similar([0.1, 0.2, 0.3], k=20)
        .traverse("CITES", direction="outgoing")
        .vector_rank([0.1, 0.2, 0.3], k=10)
        .return_nodes()
    )
    for node in result.nodes:
        print(node.id, node.label)
