Edges, Provenance & Curation

Every edge knows where it came from

“When your graph answers a question, can you say where each fact came from?”

Edge ops: CRUD + verify/reject
History: Full audit trail
Provenance: Source, chunk, model
Import: CSV & JSONL

In most systems a graph edge is just a pointer: this node connects to that node. When the graph surfaces something surprising, you have no way to ask where the connection came from, who created it, or how much to trust it. The edge is anonymous, and anonymous facts are hard to audit.

SwarnDB treats every edge as a fact with a known origin. Each edge carries its type, a confidence value, a manual-vs-extracted flag, and a provenance record: the source document, the source chunk, the model that proposed it if it was extracted, and its verification status. The graph does not just say two things are related; it says how it knows.
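To make the shape of such an edge concrete, here is a minimal sketch of the record as plain Python dataclasses. The class and field names (`Edge`, `Provenance`, `model`, `status`, `extracted`) are illustrative assumptions, not SwarnDB's actual schema; only `doc_id` and `chunk_id` appear in the examples on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    doc_id: str                   # source document
    chunk_id: str                 # source chunk within that document
    model: Optional[str] = None   # proposing model; None for manual edges
    status: str = "unverified"    # "unverified", "verified", or "rejected"


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    edge_type: str
    confidence: float
    extracted: bool               # False means a person authored the edge
    provenance: Provenance

    def citation(self) -> str:
        """Answer 'how does the graph know this?' in one line."""
        if self.extracted:
            origin = f"extracted by {self.provenance.model}"
        else:
            origin = "authored manually"
        return (
            f"{self.source} -[{self.edge_type}]-> {self.target}: "
            f"{origin} from {self.provenance.doc_id}/{self.provenance.chunk_id} "
            f"(confidence {self.confidence:.2f}, {self.provenance.status})"
        )


edge = Edge("a", "b", "CITES", 0.9, False, Provenance("paper-1", "p1-c3"))
print(edge.citation())
```

The point of the sketch is that the citation is part of the edge itself, so it travels with the relationship instead of living in a separate log.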

That makes curation a first-class workflow rather than an afterthought. You can create, read, update, and delete edges; verify an edge you trust or reject one you do not; and read the full audit history of how an edge reached its current state. When an LLM proposes edges, you review them with the same controls, keeping the good ones and rejecting the rest.

When you already have relationships in bulk, import them from CSV or JSONL rather than writing them one at a time. Every imported edge lands as the same first-class typed edge, carrying its provenance with it, so the guarantee holds no matter how the edge got there: you always know where a fact came from.

How Edge Curation Works

Create

Author edges with put_edge or import them in bulk from CSV or JSONL. Each edge records its type, confidence, source, and a manual-vs-extracted flag from the moment it exists.

Review

Inspect an edge's provenance: source document, source chunk, model, confidence, and verification status. You see not just that a relationship exists but how it got into the graph and how sure the system is.

Curate

Verify an edge you trust or reject one you do not. Update or delete edges as your understanding changes. Curation is an explicit, supported operation, not a side effect of re-running a pipeline.

Audit

Read the full audit history of an edge: how it was created, every verify or reject decision, and every change. The graph keeps a record, so you can always reconstruct where a fact came from.
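The four steps above can be modeled in a few lines. This is a conceptual, in-memory sketch of verify, reject, and an append-only audit trail; `CuratedEdge`, its methods, and the history entry fields are hypothetical names chosen for illustration and are not SwarnDB's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CuratedEdge:
    source: str
    target: str
    edge_type: str
    status: str = "unverified"
    history: list = field(default_factory=list)  # append-only audit trail

    def __post_init__(self) -> None:
        # Creation is the first entry in the trail.
        self._log("created", actor="system")

    def _log(self, action: str, actor: str, note: str = "") -> None:
        # Record the action together with the status it resulted in.
        self.history.append({
            "action": action,
            "actor": actor,
            "note": note,
            "status": self.status,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def verify(self, actor: str, note: str = "") -> None:
        self.status = "verified"
        self._log("verify", actor, note)

    def reject(self, actor: str, note: str = "") -> None:
        self.status = "rejected"
        self._log("reject", actor, note)


edge = CuratedEdge("a", "b", "CITES")
edge.verify(actor="alice", note="confirmed in paper-1")
edge.reject(actor="bob", note="citation later retracted")

for entry in edge.history:
    print(entry["action"], entry["actor"], entry["status"])
```

Because entries are only ever appended, the current status can always be explained by replaying the history from the first entry.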

What You Can Do

Capabilities

Edge CRUD with Verify & Reject

Create, read, update, and delete edges, and mark each one verified or rejected. Curation is a first-class operation: you keep the relationships you trust, reject the ones you do not, and the graph reflects your decisions rather than re-deriving them on every run.

Full Audit History

Every edge keeps a record of how it was created and every verify, reject, or change since. When the graph surfaces a fact, you can trace exactly how that edge reached its current state, which is essential for fraud, threat, and compliance work where the provenance of a link is the point.

Provenance on Every Edge

Each edge carries its source document, source chunk, the model that proposed it if it was extracted, its confidence, and a manual-vs-extracted flag. The edge is a fact with a citation, not an anonymous pointer, so you always know where a relationship came from.

edge_provenance.py

# Every edge carries provenance, not just a pointer.
# `client`, `a`, and `b` are set up as in the complete example further down.
client.graph.put_edge(
    "articles", source=a, target=b, edge_type="CITES",
    provenance={"doc_id": "paper-1", "chunk_id": "p1-c3"}
)

Bulk Import from CSV & JSONL

Load relationships you already have in bulk from CSV or JSONL instead of authoring them one edge at a time. Every imported edge becomes the same first-class typed edge, carrying its type, confidence, and provenance, so a bulk-loaded graph is just as auditable as a hand-authored one.
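As a rough illustration of what a bulk file might look like before it is loaded, the sketch below parses CSV and JSONL text into one normalized edge shape using only the standard library. The column names, the `normalize` helper, and the default confidence of 1.0 are assumptions made for this example; SwarnDB's real import format and entry point may differ.

```python
import csv
import io
import json

REQUIRED = ("source", "target", "edge_type")


def normalize(row: dict, origin: str) -> dict:
    """Map one parsed row to a common edge shape, keeping its provenance."""
    missing = [key for key in REQUIRED if not row.get(key)]
    if missing:
        raise ValueError(f"{origin}: missing {missing}")
    raw_conf = row.get("confidence")
    return {
        "source": row["source"],
        "target": row["target"],
        "edge_type": row["edge_type"],
        # Assumed default: an edge with no stated confidence gets 1.0.
        "confidence": 1.0 if raw_conf in (None, "") else float(raw_conf),
        "provenance": {
            "doc_id": row.get("doc_id"),
            "chunk_id": row.get("chunk_id"),
        },
    }


def read_csv(text: str) -> list:
    reader = csv.DictReader(io.StringIO(text))
    return [normalize(row, f"csv row {i}") for i, row in enumerate(reader, start=1)]


def read_jsonl(text: str) -> list:
    return [
        normalize(json.loads(line), f"jsonl line {i}")
        for i, line in enumerate(text.splitlines(), start=1)
        if line.strip()  # skip blank lines
    ]


csv_text = (
    "source,target,edge_type,confidence,doc_id,chunk_id\n"
    "a,b,CITES,0.8,paper-1,p1-c3\n"
)
jsonl_text = (
    '{"source": "b", "target": "c", "edge_type": "CITES", '
    '"doc_id": "paper-2", "chunk_id": "p2-c1"}\n'
)

edges = read_csv(csv_text) + read_jsonl(jsonl_text)
for e in edges:
    print(e["source"], e["edge_type"], e["target"], e["provenance"])
```

Both formats end up as the same record, which is the property that keeps a bulk-loaded graph as auditable as a hand-authored one.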

You Always Know Where a Fact Came From

When a graph drives a decision, the provenance of its edges is not a nice-to-have; it is the whole point. In fraud and threat work, "these two accounts are linked" only matters if you can say how: which transaction, which shared identifier, which document established the link. An anonymous edge is a claim you cannot defend.

SwarnDB attaches that origin to every edge. Each edge records its type, its confidence, whether a person authored it or a model extracted it, and a provenance record pointing at the source document and chunk, plus the model if it was extracted. The graph carries its citations with it, so any relationship it surfaces can be traced back to where it came from.

This turns review into a real workflow. You verify edges you trust and reject ones you do not, update or delete them as your understanding changes, and read the full audit history of how each edge reached its current state. When an LLM proposes edges, you curate them with the same controls, so the graph reflects human judgment, not whatever the model happened to output.
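One way to organize that review is to queue model-proposed edges that have no decision yet and look at the least confident ones first. The sketch below assumes edges are plain dictionaries with hypothetical `extracted`, `status`, and `confidence` keys; it illustrates the triage idea only and is not a SwarnDB feature.

```python
from operator import itemgetter


def review_queue(edges: list) -> list:
    """Return extracted edges still awaiting a decision, least confident first."""
    pending = [
        e for e in edges
        if e["extracted"] and e["status"] == "unverified"
    ]
    return sorted(pending, key=itemgetter("confidence"))


proposed = [
    {"id": "e1", "extracted": True, "status": "unverified", "confidence": 0.92},
    {"id": "e2", "extracted": True, "status": "unverified", "confidence": 0.41},
    {"id": "e3", "extracted": False, "status": "verified", "confidence": 1.00},
    {"id": "e4", "extracted": True, "status": "rejected", "confidence": 0.30},
]

queue = review_queue(proposed)
print([e["id"] for e in queue])  # ['e2', 'e1']
```

Manually authored edges and edges that already carry a verify or reject decision stay out of the queue, so a reviewer's time goes to the claims that most need it.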

Bulk import keeps the property at scale. Relationships loaded from CSV or JSONL land as the same first-class typed edges, carrying their provenance with them. Whether an edge was authored, imported, or extracted, you can always answer the one question that matters: where did this fact come from?

Insight: An edge is a fact with a citation, not an anonymous pointer. Verify, reject, and audit every relationship, and trace any of them back to its source.

Complete Example

Everything above, in one script.

edges_provenance_complete.py

from swarndb import SwarnDBClient

with SwarnDBClient(host="localhost", port=50051) as client:
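    # dimension must match the length of the vectors inserted below
    # (3 here, to keep the example short).
    client.collections.create(
        "articles", dimension=3, distance_metric="cosine", mode="hybrid"
    )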

    a = client.vectors.insert("articles", vector=[0.1, 0.2, 0.3], metadata={"topic": "physics"})
    b = client.vectors.insert("articles", vector=[0.3, 0.1, 0.4], metadata={"topic": "math"})

    # Author an edge with full provenance.
    client.graph.put_edge(
        "articles", source=a, target=b, edge_type="CITES",
        provenance={"doc_id": "paper-1", "chunk_id": "p1-c3"}
    )

    # Curate: verify the edges you trust, reject the ones you do not,
    # and read the audit history of how each edge reached its state.
