Academia · Highly Complex

Research Paper Discovery Engine

Real CITES edges, traversed and ranked by meaning

Vector Search · Typed CITES Edges · Cross-field Traversal · Hybrid Query

100 Papers · 10 Fields · Typed Citation Edges · All Tests Passed

The Scenario

Research paper abstracts are embedded as vectors, and citations are authored as typed CITES edges taken from each paper's reference list, with their provenance recorded. A hybrid query seeds on abstract similarity, traverses the CITES edges (including edges that cross disciplines), and ranks the papers it reaches by meaning, that is, by abstract similarity. A machine learning paper reaches a neuroscience paper only because a real citation path connects them, and the neuroscience paper ranks highly because the two abstracts genuinely overlap. Every cross-field connection can be explained by the citations behind it.
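The hybrid query described above can be sketched in plain Python to make its three stages concrete. This is a minimal, self-contained illustration, not SwarnDB's implementation: the toy papers, their 3-dimensional embeddings, the field labels, and the helper names (`cosine`, `hybrid_query`) are all hypothetical, and brute-force cosine similarity stands in for whatever vector index the engine actually uses.

```python
import math

# Hypothetical toy corpus: paper id -> (field, embedding). Real abstracts would
# be embedded by a model; these 3-d vectors are made up for illustration.
PAPERS = {
    "ml-1": ("machine learning", [0.9, 0.1, 0.0]),
    "ml-2": ("machine learning", [0.8, 0.2, 0.1]),
    "neuro-1": ("neuroscience", [0.7, 0.3, 0.2]),
    "bio-1": ("biology", [0.0, 0.2, 0.9]),
}

# Hypothetical CITES edges: citing paper -> papers it cites (from reference lists).
CITES = {
    "ml-1": ["neuro-1", "bio-1"],
    "ml-2": ["ml-1"],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_query(query_vec, k_seed=2, k_rank=2):
    def score(p):
        return cosine(query_vec, PAPERS[p][1])

    # 1. Seed: the k_seed papers whose abstracts are most similar to the query.
    seeds = sorted(PAPERS, key=score, reverse=True)[:k_seed]
    # 2. Traverse: follow outgoing CITES edges, remembering the citation used.
    frontier = {}
    for seed in seeds:
        for cited in CITES.get(seed, []):
            frontier.setdefault(cited, (seed, cited))  # keep the first path found
    # 3. Rank: order only the papers reached by traversal, by query similarity.
    ranked = sorted(frontier, key=score, reverse=True)[:k_rank]
    # Each result carries the (citing, cited) edge that explains why it appears.
    return [(p, PAPERS[p][0], frontier[p]) for p in ranked]


print(hybrid_query([1.0, 0.0, 0.0]))
```

With the query `[1.0, 0.0, 0.0]`, the two machine learning papers become the seeds. `neuro-1` and `bio-1` are reachable only through `ml-1`'s citations, and `neuro-1` outranks `bio-1` because its embedding is closer to the query. The returned `("ml-1", "neuro-1")` pair is the citation that explains the cross-field hit.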

Key Results

Cross-field connections follow real CITES edges, then rank by meaning
Every connection is explainable from the citations behind it
Citations authored or bulk-imported with provenance, not inferred
10-field coverage with citation-grounded inter-field bridges
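The provenance and explainability results can be illustrated with a small data model. The sketch below is hypothetical: the `Citation` dataclass, the `FIELD_OF` lookup, and the `explain` and `bridge_counts` helpers are illustrative names, not part of SwarnDB. Only the `{"source": "refs"}` provenance shape mirrors the `put_edge` call on this page. The key idea is that every edge is stored with where it came from, so a cross-field connection can be explained edge by edge, and inter-field bridges can be counted directly from the stored edges rather than inferred.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical field labels for a few papers.
FIELD_OF = {
    "ml-1": "machine learning",
    "ml-2": "machine learning",
    "neuro-1": "neuroscience",
}


@dataclass
class Citation:
    """A typed CITES edge that records where it came from (authored, not inferred)."""
    source: str
    target: str
    edge_type: str = "CITES"
    provenance: dict = field(default_factory=dict)


def explain(path):
    """Turn a citation path into one human-readable reason per edge."""
    return [
        f"{c.source} ({FIELD_OF[c.source]}) {c.edge_type} {c.target} "
        f"({FIELD_OF[c.target]}) [source: {c.provenance.get('source', 'unknown')}]"
        for c in path
    ]


def bridge_counts(edges):
    """Count cross-field CITES edges per (citing field, cited field) pair."""
    return Counter(
        (FIELD_OF[e.source], FIELD_OF[e.target])
        for e in edges
        if e.edge_type == "CITES" and FIELD_OF[e.source] != FIELD_OF[e.target]
    )


edges = [
    Citation("ml-2", "ml-1", provenance={"source": "refs"}),
    Citation("ml-1", "neuro-1", provenance={"source": "refs"}),
]
for line in explain(edges):
    print(line)
print(bridge_counts(edges))
```

Only the second edge crosses disciplines, so `bridge_counts` reports a single machine learning to neuroscience bridge, and `explain` shows the reference-list provenance for each hop.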


The Code

Everything above, in a few lines of Python.

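```python
# Assumes `client` is an already-connected SwarnDB client and that paper_a,
# paper_b, citation_rows, and paper_embedding are defined elsewhere (setup not shown).

# Citations are typed CITES edges, authored or bulk-imported with provenance.
client.graph.put_edge("papers", source=paper_a, target=paper_b,
                      edge_type="CITES", provenance={"source": "refs"})
client.graph.bulk_import_edges("papers", citation_rows, format="csv")

# Cross-field discovery: seed by similarity, traverse CITES, rank by meaning.
results = (
    client.graph.query("papers")
    .vector_similar(paper_embedding, k=20)    # seed on the most similar abstracts
    .traverse("CITES", direction="outgoing")  # follow citations out of the seeds
    .vector_rank(paper_embedding, k=20)       # rank the reached papers by similarity
    .return_nodes()
)
```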

Try it yourself

Clone the repo from GitHub, spin up SwarnDB, and run this showcase in minutes.