Hybrid Queries

Scope by structure, then rank by meaning, in one plan

“What if one query could narrow by structure and then rank by meaning, without leaving the database?”

Pipeline: similar → traverse → rank
Traversal: 1-hop, k-hop, path
Default mode: vector_rank
Round trips: one engine

Retrieval usually forces a choice. Pure vector search ranks by meaning but ignores structure: it cannot say "only among the papers this one cites." Pure graph traversal respects structure but cannot rank the result by how well it matches your intent. To get both, teams stitch two systems together and pay for a round trip in the middle.

SwarnDB does both in a single plan. Because the vector and the graph node are the same object, a hybrid query can seed candidates by similarity, walk typed edges across the graph, and then rank the surviving frontier exactly by meaning, all inside one engine over one copy of the data.

The pipeline is composable and reads in the order it runs: vector_similar to seed, traverse to walk typed edges, vector_rank to rank the frontier, return_nodes to collect the result. Each step narrows or reshapes the candidate set the next step works on.

Traversal is flexible. You can take a single hop, expand k hops, or find a shortest path between nodes. Filter-then-search still applies: when you constrain the qualified set first, the query returns the correct top-k among the items that actually qualify, not an approximate set assembled after the fact. The default graph-augmented retrieval mode is vector_rank.

How a Hybrid Query Runs

Seed

vector_similar finds the initial candidates by meaning, the entry points into the graph. You choose how many with k, so the traversal starts from a controlled, relevant frontier rather than the whole collection.

Traverse

traverse walks typed edges from the seed nodes, by edge type and direction. Take a single hop, expand across k hops, or find a shortest path between nodes. This is where structure scopes the candidate set.
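To make the traversal step concrete, here is a minimal in-memory sketch of a k-hop expansion restricted by edge type and direction. It is an illustration only, not SwarnDB's implementation: the `EDGES` list, `build_index`, and `expand` are hypothetical names, and excluding the seeds from the returned set is an assumption.

```python
from collections import defaultdict

# Hypothetical toy edge list: (source, edge_type, target).
EDGES = [
    ("p1", "CITES", "p2"),
    ("p2", "CITES", "p3"),
    ("p1", "AUTHORED_BY", "alice"),
    ("p4", "CITES", "p1"),
]


def build_index(edges, edge_type, direction="outgoing"):
    """Adjacency map for one edge type, followed in one direction."""
    if direction not in ("outgoing", "incoming"):
        raise ValueError(f"unknown direction: {direction}")
    adj = defaultdict(set)
    for src, etype, dst in edges:
        if etype != edge_type:
            continue  # other edge types are invisible to this hop
        if direction == "outgoing":
            adj[src].add(dst)
        else:
            adj[dst].add(src)
    return adj


def expand(seeds, adj, hops=1):
    """Nodes reachable from the seeds within `hops` hops (seeds excluded)."""
    visited = set(seeds)
    frontier = set(seeds)
    reached = set()
    for _ in range(hops):
        next_frontier = set()
        for node in frontier:
            for neighbor in adj.get(node, ()):
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.add(neighbor)
        reached |= next_frontier
        frontier = next_frontier
    return reached


outgoing = build_index(EDGES, "CITES", direction="outgoing")
print(expand({"p1"}, outgoing, hops=1))
print(expand({"p1"}, outgoing, hops=2))
```

With `hops=1` only the directly cited paper is reached; raising `hops` widens the frontier, and the `AUTHORED_BY` edge never contributes because the index is built for `CITES` alone.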

Rank

vector_rank ranks the surviving frontier exactly by meaning, over the candidate set the traversal produced. This graph-first ranking is the default graph-augmented retrieval mode.

Return

return_nodes collects the final nodes, each one both a vector and a graph node, so you get the ranked, structurally-scoped result in one round trip with no second database in the loop.
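The four steps above can be sketched end to end on toy data. This is a plain-Python illustration of the seed, traverse, rank, return flow, not the engine's actual code; `VECTORS`, `CITES`, and `hybrid_query` are hypothetical, and the sketch assumes cosine similarity and a single outgoing hop.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical toy data: each id is both a vector and a graph node.
VECTORS = {
    "p1": [1.0, 0.0],
    "p2": [0.8, 0.6],
    "p3": [0.0, 1.0],
    "p4": [0.6, 0.8],
}
CITES = {"p1": ["p2", "p3", "p4"], "p2": ["p3"]}


def hybrid_query(query, seed_k, rank_k):
    # 1. Seed: the seed_k nodes nearest to the query, over the whole collection.
    seeds = heapq.nlargest(
        seed_k, VECTORS, key=lambda n: cosine(VECTORS[n], query)
    )
    # 2. Traverse: one outgoing CITES hop from every seed.
    frontier = {dst for src in seeds for dst in CITES.get(src, [])}
    # 3. Rank: exact similarity, computed only over the frontier.
    ranked = sorted(
        frontier, key=lambda n: cosine(VECTORS[n], query), reverse=True
    )
    # 4. Return: the top rank_k surviving nodes.
    return ranked[:rank_k]


print(hybrid_query([1.0, 0.0], seed_k=1, rank_k=2))  # ['p2', 'p4']
```

Note that `p1` is the closest vector to the query, yet it does not appear in the result: the traversal scoped the candidates to what `p1` cites, and the ranking ran only over that set.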

What You Can Do

Capabilities

Composable Builder

Chain the query as a pipeline that reads in execution order: vector_similar to seed candidates, traverse to walk typed edges, vector_rank to rank the surviving frontier by meaning, and return_nodes to collect the result. Each step is explicit, so the plan is easy to read and reason about.

hybrid_builder.py

# Assumes `client` is an already-connected SwarnDBClient, as in scope_then_rank.py.
result = (
    client.graph.query("articles")
    .vector_similar([0.1, 0.2, 0.3], k=20)
    .traverse("CITES", direction="outgoing")
    .vector_rank([0.1, 0.2, 0.3], k=10)
    .return_nodes()
)
for node in result.nodes:
    print(node.id, node.label)

Single-Hop, K-Hop, Shortest Path

Traverse exactly as far as your problem needs. A single hop follows direct typed edges. A k-hop expansion reaches across multiple typed edges to gather a wider frontier. Shortest path finds the most direct typed route between two nodes. Each is a step in the same composable query.
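As an illustration of the shortest-path case, the sketch below runs a breadth-first search over a single edge type. It assumes "shortest" means fewest hops on unweighted edges; the `CITES` adjacency map and `shortest_path` helper are hypothetical and say nothing about how SwarnDB computes paths internally.

```python
from collections import deque

# Hypothetical outgoing CITES edges.
CITES = {
    "p1": ["p2", "p3"],
    "p2": ["p4"],
    "p3": ["p4"],
    "p4": ["p5"],
}


def shortest_path(adj, start, goal):
    """Fewest-hop path from start to goal as a list of nodes, or None."""
    if start == goal:
        return [start]
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adj.get(node, []):
            if neighbor in parent:
                continue
            parent[neighbor] = node
            if neighbor == goal:
                # Walk parent links back to the start, then reverse.
                path = [neighbor]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


print(shortest_path(CITES, "p1", "p5"))  # ['p1', 'p2', 'p4', 'p5']
```

Because the edges are directed, the reverse query from `p5` to `p1` finds no path in this toy graph.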

Graph-First vector_rank

vector_rank ranks exactly over the frontier the graph traversal produced, not over the whole collection. It is the default graph-augmented retrieval mode: meaning applied to a structurally-scoped candidate set, so the ranking reflects both what you asked for and the structure you walked.

Quality-Aware & Temporal Traversal

Weight hops by edge confidence, recency, or a numeric property so stronger or fresher relationships count more. Restrict a hop to edges valid at a point in time and regime for time-aware traversal. These are opt-in and off by default; a plain traversal ignores them entirely.

Correct Filter-Then-Search

Fix the qualified set first, then rank exactly within it. When you constrain a hybrid query, it returns the true top-k among the items that actually qualify, rather than approximating after the fact. Structure, filters, and meaning compose without sacrificing correctness.
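A small worked example shows why the order matters. The sketch below compares ranking first and filtering afterwards against fixing the qualified set first; the data, the `topic` field, and both helper functions are hypothetical, and an exact scan stands in for whatever index a real engine would use.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical items: id -> (vector, topic).
ITEMS = {
    "a": ([1.0, 0.0], "math"),
    "b": ([0.9, 0.1], "math"),
    "c": ([0.8, 0.2], "physics"),
    "d": ([0.1, 0.9], "physics"),
}


def rank(ids, query, k):
    """Exact top-k of the given ids by similarity to the query."""
    return sorted(ids, key=lambda i: cosine(ITEMS[i][0], query), reverse=True)[:k]


def post_filter(query, topic, k):
    # Take the global top-k first, then drop items that do not qualify.
    return [i for i in rank(ITEMS, query, k) if ITEMS[i][1] == topic]


def filter_then_search(query, topic, k):
    # Fix the qualified set first, then rank exactly within it.
    qualified = [i for i in ITEMS if ITEMS[i][1] == topic]
    return rank(qualified, query, k)


query = [1.0, 0.0]
print(post_filter(query, "physics", k=2))         # []
print(filter_then_search(query, "physics", k=2))  # ['c', 'd']
```

Post-filtering returns nothing here because the two globally nearest items are both about math, while filter-then-search returns the true top 2 among the physics items.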

Scope by Structure, Then Rank by Meaning

Think about a real retrieval question: "of the papers this one cites, which are most relevant to my query?" Vector search alone cannot express the "papers this one cites" part; it only knows meaning. Graph traversal alone can find the cited papers but cannot rank them by how well they match your intent. The interesting questions live exactly where structure and meaning meet.

A SwarnDB hybrid query expresses that intersection directly. vector_similar seeds the entry points by meaning. traverse walks the CITES edges you authored to scope the candidate set by structure. vector_rank then ranks the surviving frontier exactly by meaning. The query says, in one plan, "start near my query, follow the citations, and rank what is left by relevance."

Because the vector and the node are the same object, none of this leaves the engine. There is no step where you pull ids out of a vector store, send them to a graph store, get edges back, and re-rank in application code. The whole pipeline runs over one copy of the data, in one round trip.

The default graph-augmented retrieval mode is vector_rank. It is graph-first: it ranks over the frontier the traversal produced, so the result reflects both the structure you walked and the meaning you searched for.

Insight: One plan, one copy of the data: seed by meaning, scope by the typed edges you authored, then rank the surviving frontier exactly by meaning. No round trip to a second database.

scope_then_rank.py

from swarndb import SwarnDBClient

with SwarnDBClient(host="localhost", port=50051) as client:
    # Seed by similarity -> walk typed edges -> rank the frontier by meaning.
    result = (
        client.graph.query("articles")
        .vector_similar([0.1, 0.2, 0.3], k=20)
        .traverse("CITES", direction="outgoing")
        .vector_rank([0.1, 0.2, 0.3], k=10)
        .return_nodes()
    )
    for node in result.nodes:
        print(node.id, node.label)

Quality-Aware and Temporal Traversal (Opt-In)

Not every edge deserves equal weight. A relationship you verified by hand is more trustworthy than one an LLM proposed with low confidence; a recent connection may matter more than an old one; some edges carry a numeric property that should bias the walk. Quality-aware traversal lets you weight hops by confidence, recency, or a numeric property, so the traversal leans toward the relationships that matter most.
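One way to picture hop weighting is to score each edge and visit the strongest first. The sketch below is an assumption-heavy illustration: the `Edge` fields, the half-life decay, and the choice to multiply confidence by recency are all hypothetical, since the text does not specify how SwarnDB combines these signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    confidence: float  # 0.0 to 1.0, e.g. 1.0 for a hand-verified edge
    age_days: float    # how long ago the edge was created


def hop_weight(edge, half_life_days=180.0):
    """Assumed formula: confidence times an exponential recency decay."""
    recency = 0.5 ** (edge.age_days / half_life_days)
    return edge.confidence * recency


def weighted_neighbors(edges, node):
    """Outgoing neighbors of `node`, strongest hop first, with their weights."""
    scored = [(e.target, hop_weight(e)) for e in edges if e.source == node]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


EDGES = [
    Edge("p1", "p2", confidence=1.0, age_days=360.0),  # verified, but old
    Edge("p1", "p3", confidence=0.4, age_days=0.0),    # low confidence, fresh
    Edge("p1", "p4", confidence=0.9, age_days=180.0),  # strong, fairly recent
]

for target, weight in weighted_neighbors(EDGES, "p1"):
    print(target, round(weight, 2))
```

Under these assumed weights the hand-verified edge to `p2` ranks last because it is two half-lives old, which shows how confidence and recency can pull in different directions.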

Some relationships are also bounded in time. An edge that was valid last quarter may not be valid now. Temporal traversal restricts a hop to edges valid at a point in time and regime, so you can ask the graph what it looked like then, not just what it looks like today.
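A point-in-time restriction can be sketched as a validity check applied to each edge before the hop follows it. The `TimedEdge` fields and the half-open interval convention below are assumptions made for illustration, and the sketch models only the time dimension, not the regime mentioned above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TimedEdge:
    source: str
    target: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the edge is still valid


def valid_at(edge, when):
    """True if the edge is valid on `when` (assumed interval: [from, to))."""
    if when < edge.valid_from:
        return False
    return edge.valid_to is None or when < edge.valid_to


def neighbors_at(edges, node, when):
    """Targets reachable from `node` over edges valid on the given date."""
    return [e.target for e in edges if e.source == node and valid_at(e, when)]


EDGES = [
    TimedEdge("p1", "p2", date(2020, 1, 1), date(2023, 1, 1)),
    TimedEdge("p1", "p3", date(2023, 1, 1)),
]

print(neighbors_at(EDGES, "p1", date(2022, 6, 1)))  # ['p2']
print(neighbors_at(EDGES, "p1", date(2024, 1, 1)))  # ['p3']
```

The same node yields different neighbors depending on the date asked about, which is the "what did the graph look like then" question in miniature.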

Both of these are opt-in and off by default. A plain traverse ignores confidence, recency, and time entirely and simply follows the typed edges. You reach for quality-aware or temporal traversal only when your problem genuinely needs it, and the rest of the time the query stays simple.

Insight: Weight hops by confidence, recency, or a property, and restrict hops to a point in time. Both are opt-in and off by default.

Complete Example

Everything above, in one script.

hybrid_queries_complete.py

from swarndb import SwarnDBClient

with SwarnDBClient(host="localhost", port=50051) as client:
    # dimension must match the length of the vectors inserted below (3 here).
    client.collections.create(
        "articles", dimension=3, distance_metric="cosine", mode="hybrid"
    )

    a = client.vectors.insert("articles", vector=[0.1, 0.2, 0.3], metadata={"topic": "physics"})
    b = client.vectors.insert("articles", vector=[0.3, 0.1, 0.4], metadata={"topic": "math"})
    client.graph.put_edge(
        "articles", source=a, target=b, edge_type="CITES",
        provenance={"doc_id": "paper-1"}
    )

    # Composable hybrid query: scope by structure, then rank by meaning.
    result = (
        client.graph.query("articles")
        .vector_similar([0.1, 0.2, 0.3], k=20)
        .traverse("CITES", direction="outgoing")
        .vector_rank([0.1, 0.2, 0.3], k=10)
        .return_nodes()
    )
    for node in result.nodes:
        print(node.id, node.label)
