
Vector Mathematics

15+ server-side operations, some running over a graph-built frontier

What if your database could generate new ideas, detect behavioral changes, and find hidden patterns, all with simple API calls, server-side?

Operations: 15+ · Complexity: single API call · Dependencies: no ML pipeline · Latency: real-time results

Most databases store data and retrieve it. You put information in, you get information out. The data itself doesn't change, and the database doesn't generate anything new.

Vector math operations transform SwarnDB from a passive data store into an active intelligence engine. Instead of just storing and retrieving vectors, SwarnDB can interpolate between concepts to generate new ones, detect when behavior drifts over time, identify outliers that don't belong, find the optimal balance between relevance and diversity, reduce complexity for visualization, and cluster data into meaningful groups.

Each operation is a single API call. There are no external ML pipelines to build, no data science notebooks to maintain, no custom code to write and debug. Want to blend two concepts together? One call. Want to detect if a user's behavior has changed? One call. Want to find fake accounts in your dataset? One call. Want to automatically group documents by topic? One call.

This is the difference between a database that stores intelligence and a database that creates it. The vectors are already rich with meaning. Math operations unlock that meaning and put it to work. And on a hybrid collection, several of these operations (analogy, diversity, cone, isolation, centroid, and interpolation) run exactly over the candidate set a graph query produced, so the math applies to a structurally-scoped frontier, not the whole collection.

How Vector Mathematics Works

1

Store

Your data is already in vector form. Every embedding you've stored in SwarnDB contains rich semantic information. Vector math operations work directly on these stored vectors without any preprocessing or export.

2

Compute

Call a math operation on one or more stored vectors. Operations range from simple (normalize a vector) to powerful (cluster an entire collection into groups). Each operation is a single API call with clear parameters.

3

Result

Get back new vectors, similarity scores, cluster assignments, or dimensional reductions. Results are immediately usable: store the new vector, use the score in your application logic, or visualize the reduced dimensions.

4

Apply

Use results for generation (create content that blends two styles), monitoring (alert when behavior changes), discovery (find hidden groups in your data), or curation (ensure diverse, non-repetitive results).

What You Can Do

Capabilities

01

Server-Side, Over a Graph-Built Frontier

Every operation runs server-side, with no round-trip to NumPy and no data export. On hybrid collections, several of them (analogy, diversity via MMR, cone, isolation, centroid, and interpolation) run exactly over the candidate set a graph query produced. You scope by structure first, then apply the math to that frontier, all inside one engine. Each operation is exposed over both gRPC and REST.

02

SLERP Interpolation

Blending between two concepts. Imagine you have a vector for "formal business email" and one for "casual chat message." SLERP (spherical linear interpolation) at t=0.5 generates the vector halfway along the arc between them: a "professional but friendly" tone that doesn't exist in your database but is mathematically coherent. Slide the t parameter from 0 to 1 and you get a smooth spectrum between any two concepts. In drug discovery, interpolate between two active compounds and search near the result to find candidates in the chemical space between them. In music, blend between genres to find crossover sounds. In content creation, mix formal and casual to find the right voice.

interpolate.py
from swarndb import SwarnDBClient

client = SwarnDBClient("localhost", 50051)

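# formal_email_vec and casual_chat_vec are placeholders for embeddings you already have
# Blend two concepts: formal + casual = professional-friendly
blended = client.math.interpolate(
    formal_email_vec,
    casual_chat_vec,
    t=0.5,  # 0.0 = fully formal, 1.0 = fully casual
    method="slerp"  # spherical interpolation, as described above
)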
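For intuition about what the server computes, here is a minimal plain-Python sketch of the standard SLERP formula next to plain linear interpolation (lerp). It assumes both inputs are already unit-length vectors and uses made-up 2-d toy vectors; it illustrates the math and is not SwarnDB's server-side implementation.

```python
import math
from typing import List

Vector = List[float]


def lerp(a: Vector, b: Vector, t: float) -> Vector:
    """Linear interpolation: the straight line (1 - t) * a + t * b."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def slerp(a: Vector, b: Vector, t: float) -> Vector:
    """Spherical linear interpolation between two unit vectors.

    Follows the great-circle arc from a (t=0) to b (t=1), so the result
    stays on the unit sphere instead of cutting through its interior.
    """
    dot = sum(x * y for x, y in zip(a, b))
    dot = max(-1.0, min(1.0, dot))  # clamp rounding error before acos
    theta = math.acos(dot)          # angle between a and b
    if theta < 1e-6:
        # Nearly identical directions: the arc is effectively a line
        return lerp(a, b, t)
    sin_theta = math.sin(theta)
    w_a = math.sin((1 - t) * theta) / sin_theta
    w_b = math.sin(t * theta) / sin_theta
    return [w_a * x + w_b * y for x, y in zip(a, b)]


formal = [1.0, 0.0]  # toy 2-d "formal" direction
casual = [0.0, 1.0]  # toy 2-d "casual" direction
print([round(v, 4) for v in slerp(formal, casual, 0.5)])  # [0.7071, 0.7071]
print([round(v, 4) for v in lerp(formal, casual, 0.5)])   # [0.5, 0.5]
```

The SLERP midpoint keeps unit length, while the lerp midpoint is shorter (length about 0.707), which is why SLERP is the usual choice for normalized embeddings. Exactly opposite vectors have no unique arc between them, so production code should guard against that case.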
03

MMR Diversity Sampling

Relevant but not repetitive. When you search for similar items, you often get 10 results that are all nearly identical. MMR (Maximal Marginal Relevance) balances relevance with diversity. A lambda of 1.0 gives you the most similar results, which may feel like an echo chamber. A lambda of 0.0 gives you the most diverse results regardless of relevance. The sweet spot is usually around 0.7: results that are relevant to your query but different enough from each other to be interesting. Perfect for playlists that flow naturally without repeating the same sound, content feeds that stay on-topic without boring repetition, and product recommendations that show variety within a category.

mmr.py
# Diverse results: relevant but not repetitive
results = client.math.diversity_sample(
    "songs",
    query_vector,
    k=20,
    lambda_=0.7  # 1.0 = most similar, 0.0 = most diverse
)
# Returns 20 songs: all relevant to query, but varied
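The greedy selection behind MMR is short enough to sketch. This plain-Python version follows the standard MMR formulation with cosine similarity; SwarnDB's exact similarity measure and tie-breaking are assumptions here, and the toy "songs" are made up so that two of them are exact duplicates.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query: Vector, candidates: List[Vector], k: int, lambda_: float = 0.7) -> List[int]:
    """Greedy Maximal Marginal Relevance; returns indices into `candidates`.

    Each step picks the remaining candidate d that maximizes
        lambda_ * sim(query, d) - (1 - lambda_) * max(sim(d, s) for s in selected)
    """
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def mmr_score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
            return lambda_ * relevance - (1 - lambda_) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.4]
songs = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]  # songs 0 and 1 are duplicates
print(mmr(query, songs, k=2, lambda_=1.0))  # [0, 1]: pure relevance keeps the duplicate
print(mmr(query, songs, k=2, lambda_=0.7))  # [0, 2]: the duplicate is penalized
```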
04

Centroid Computation

Finding the center of a group. Given a set of vectors, the centroid is their mean: the single point with the smallest total squared distance to all of them. Think of it as the "average opinion" of a group, the "typical member" of a cluster, or the "essence" of a category. The centroid itself is a new vector, so to find the most representative document in a topic, the archetypal product in a category, or the most central track in a musical genre, search for the stored item nearest to it. You can also use centroids for comparison: compute the centroid of each department's documents and measure the distance between those centroids to see how far apart different teams' thinking is.

centroid.py
# Find the "center" of a group
centroid = client.math.centroid(
    "articles",
    vector_ids=[article_1_id, article_2_id, article_3_id, article_4_id]
)
# Returns the mean vector of the 4 articles (a new point, not one of the articles)
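Under the hood, a centroid is usually the component-wise mean. This plain-Python sketch assumes that definition (a server working with cosine similarity might additionally re-normalize the result) and adds the follow-up step of finding the real item nearest to the centroid; the 2-d points are made up for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean: minimizes the total squared Euclidean distance."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def most_representative(vectors: Sequence[Vector]) -> int:
    """Index of the stored vector closest to the group's centroid."""
    center = centroid(vectors)
    return min(range(len(vectors)), key=lambda i: math.dist(center, vectors[i]))


articles = [[0.0, 0.0], [4.0, 0.0], [1.0, 1.0], [3.0, 7.0]]
print(centroid(articles))             # [2.0, 2.0]
print(most_representative(articles))  # 2
```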
05

Drift Detection

Spotting when things change. Compare a vector's current position to where it was before. If a user's preference vector drifts significantly over time, their tastes are evolving. If a financial account's transaction pattern drifts suddenly, it might be compromised. If a product's review sentiment drifts downward, quality might be slipping. Drift is measured as the distance between two states of the same entity. A high drift score means something meaningful has changed. A low score means stability. One API call tells you if something has changed and by how much.

drift.py
# Detect behavioral change over time
report = client.math.detect_drift(
    "accounts",
    window1_ids=last_month_ids,
    window2_ids=this_month_ids
)
# High drift = behavior changed significantly
# Low drift = stable, consistent behavior
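One simple way to turn two windows of vectors into a single drift number is the cosine distance between their mean vectors. The sketch below assumes that definition purely for illustration: `drift_score` is a hypothetical helper, the data is made up, and SwarnDB's detect_drift may compute and scale drift differently.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity: 0 for the same direction, 1 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def drift_score(window1: Sequence[Vector], window2: Sequence[Vector]) -> float:
    """Hypothetical drift: how far the windows' average direction moved."""
    return cosine_distance(mean_vector(window1), mean_vector(window2))


last_month = [[1.0, 0.0], [1.0, 0.1]]
steady = [[1.0, 0.0], [1.0, 0.2]]
shifted = [[0.0, 1.0], [0.1, 1.0]]
print(round(drift_score(last_month, steady), 3))   # 0.001: small drift
print(round(drift_score(last_month, shifted), 3))  # 0.9: large drift
```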
06

Ghost Detection

Finding things that don't belong. A ghost vector is one with no strong connections to anything else in the dataset. It sits alone in vector space, disconnected from the community. In a social network, it's a fake account with a plausible-looking profile but no genuine similarity to real users. In fraud detection, it's a synthetic identity constructed to look legitimate but semantically hollow. In content platforms, it's spam that mimics real posts but doesn't cluster with any natural topic. Ghost detection surfaces these isolated vectors so you can investigate them.

ghost_detection.py
# Find vectors with no strong connections
ghosts = client.math.detect_ghosts(
    "users",
    threshold=0.3  # Vectors with no neighbor above 0.3 similarity
)
# Returns isolated vectors: potential fakes, bots, or anomalies
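The rule in the comment above, no neighbor above the threshold, can be sketched as a brute-force check in plain Python. This is an O(n²) illustration with made-up user vectors; a database would use its similarity index rather than comparing every pair, and `find_ghosts` is a hypothetical helper, not the client API.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def find_ghosts(vectors: Dict[str, Vector], threshold: float) -> List[str]:
    """IDs whose most similar other vector scores at or below `threshold`."""
    ghosts = []
    for id_a, a in vectors.items():
        best = max(
            (cosine(a, b) for id_b, b in vectors.items() if id_b != id_a),
            default=float("-inf"),  # a lone vector has no neighbors at all
        )
        if best <= threshold:
            ghosts.append(id_a)
    return ghosts


users = {
    "alice": [0.9, 0.1, 0.0],
    "bob": [0.8, 0.2, 0.1],
    "carol": [0.1, 0.9, 0.0],
    "dave": [0.2, 0.8, 0.1],
    "bot_42": [0.0, 0.0, 1.0],  # points somewhere nobody else does
}
print(find_ghosts(users, threshold=0.3))  # ['bot_42']
```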
07

K-Means Clustering

Automatic grouping. Given a collection of vectors, k-means divides them into k groups based on similarity. No labels needed, no training data, no supervision. Documents cluster by topic. Products cluster by category. Transactions cluster by pattern. Users cluster by behavior. Each cluster has a centroid you can inspect to understand what the group represents. Increase k for finer-grained groups, decrease it for broader categories. The clustering runs entirely inside SwarnDB, so there is no data export, no external library, and no ML pipeline.

clustering.py
# Automatically group documents into topics
clusters = client.math.cluster(
    "documents",
    k=10  # Divide into 10 groups
)
# Returns cluster assignments and centroids
# Each centroid represents the "theme" of its cluster
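For intuition, here is Lloyd's algorithm, the classic k-means loop, as a plain-Python sketch. It uses Euclidean distance and random initial centroids drawn from the data; SwarnDB's initialization (for example k-means++), distance metric, and stopping rule are not stated on this page, so treat those choices as assumptions. The 2-d "tickets" are made up.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def kmeans(points: List[Vector], k: int, max_iters: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm: returns (cluster label per point, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centers
    labels = [-1] * len(points)  # -1 never matches a real label
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stable, so centroids are stable too
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


tickets = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centers = kmeans(tickets, k=2)
print(labels[0] == labels[1], labels[2] == labels[3], labels[0] != labels[2])  # True True True
print(sorted(centers))  # [[0.0, 0.5], [10.0, 10.5]]
```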
08

PCA Dimensionality Reduction

Simplifying complexity. Vectors with 1536 dimensions are impossible for humans to visualize or reason about directly. PCA (Principal Component Analysis) projects them down to 2 or 3 dimensions along the directions in which the data varies most, so much of the large-scale structure survives the projection. Use it for visualization: plot your entire dataset on a 2D scatter plot and see clusters, outliers, and gaps at a glance. Use it for compression: reduce storage requirements by keeping only the top principal components instead of every original dimension. Use it for preprocessing: feed lower-dimensional vectors into downstream systems that can't handle high-dimensional input.

pca.py
# Reduce 1536 dimensions to 2 for visualization
reduced = client.math.reduce_dimensions(
    "products",
    n_components=2
)
# Plot the result: clusters and outliers become visible
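Classic PCA is a singular value decomposition of the centered data. This NumPy sketch shows the idea on random 1536-dimensional data; it is an illustration rather than SwarnDB's implementation, and the data is synthetic. It also returns the fraction of variance each kept component explains, a quick check on how much a 2D plot can actually show.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int = 2):
    """Project the rows of X onto their top principal components.

    Returns (projected rows, fraction of total variance explained per component).
    """
    Xc = X - X.mean(axis=0)  # center every dimension at zero
    # Rows of vt are the principal directions, sorted by singular value
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Xc @ vt[:n_components].T, explained[:n_components]


rng = np.random.default_rng(0)
embeddings = rng.normal(size=(300, 1536))  # stand-in for real product vectors
points_2d, explained = pca(embeddings, n_components=2)
print(points_2d.shape)  # (300, 2)
```

On real embeddings, a low explained-variance total means the 2D plot hides much of the structure, so treat distances in the plot as approximate.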
09

Analogy Computation

Reasoning by analogy. The classic example: King minus Man plus Woman lands closest to Queen. Given three vectors (A, B, C), analogy computes the vector D that relates to C the way B relates to A, that is, D = B - A + C. The offset B - A captures the relationship between A and B, and adding it to C applies that relationship to C. Use it for creative generation: if A is a watercolor painting and B is the same scene in oil paint, apply that style transfer to C, a new scene. Use it for relationship discovery: find what product relates to Category B the way a bestseller relates to Category A. Use it for cross-domain transfer: apply a relationship observed in one domain to another.

analogy.py
# Reasoning by analogy: A is to B as C is to ?
# man is to king as woman is to ?  ->  king - man + woman
result = client.math.analogy(
    man_vec,    # A: "man"
    king_vec,   # B: "king"
    woman_vec   # C: "woman"
)
# Returns the vector king - man + woman, which lands closest to "queen"
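The arithmetic behind analogy is a vector offset, D = B - A + C, followed by a nearest-neighbor lookup that skips the three inputs. This plain-Python sketch uses hand-made 3-dimensional toy embeddings (axes: royalty, male, female) purely for illustration; real embeddings only approximate this clean geometry, and the helper names are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def analogy(a: Vector, b: Vector, c: Vector) -> List[float]:
    """A is to B as C is to D, with D = B - A + C."""
    return [bi - ai + ci for ai, bi, ci in zip(a, b, c)]


def nearest(target: Vector, vocab: Dict[str, Vector], exclude: Iterable[str]) -> str:
    """Most cosine-similar word, skipping the query words themselves."""
    skip = set(exclude)
    return max((w for w in vocab if w not in skip), key=lambda w: cosine(target, vocab[w]))


# Toy axes: [royalty, male, female]
vocab = {
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "crown": [1.0, 0.0, 0.0],
}
d = analogy(vocab["man"], vocab["king"], vocab["woman"])  # king - man + woman
print(d)                                                 # [1.0, 0.0, 1.0]
print(nearest(d, vocab, exclude=["man", "king", "woman"]))  # queen
```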
10

Normalization

Putting everything on the same scale. Vectors from different sources or models may have different magnitudes. One vector might have a length of 1.0, another 15.7, another 0.003. Normalization rescales every vector to unit length (length equals 1) without changing its direction. Cosine similarity already ignores magnitude, but dot-product and Euclidean-distance scores do not; on normalized vectors all three rank results the same way, so scores become comparable across the entire dataset. Without normalization, a vector with a large magnitude can dominate dot-product scores regardless of its actual direction in space. This is essential preprocessing for many use cases and a single API call in SwarnDB.
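A plain-Python sketch of L2 normalization, plus a small made-up example of the magnitude problem: a long vector pointing the wrong way beats a short, well-aligned one on raw dot product, but not after normalization. The helper names are illustrative, not the SwarnDB client API.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> List[float]:
    """Scale v to unit length (L2 norm 1) without changing its direction."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("a zero vector has no direction to normalize")
    return [x / norm for x in v]


query = [1.0, 0.0]
big_off_topic = [10.0, 10.0]  # large magnitude, 45 degrees away from the query
small_on_topic = [0.9, 0.1]   # small magnitude, nearly aligned with the query

# Raw dot product rewards magnitude: the off-topic vector wins
print(dot(query, big_off_topic) > dot(query, small_on_topic))  # True
# After normalization the score depends only on direction (it equals cosine)
print(dot(query, normalize(big_off_topic)) > dot(query, normalize(small_on_topic)))  # False
```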

Generation: Creating What Doesn't Exist

SLERP interpolation and analogy computation are generative operations. They don't just retrieve data from your database. They create new vectors that represent concepts not present in your data.

Interpolation generates a smooth spectrum between any two concepts. Take a vector for a jazz recording and a vector for an electronic track, and SLERP at t=0.5 produces a vector representing the crossover sound between them. This vector doesn't correspond to any item in your database, but you can search for its nearest neighbors to find real items that come closest to that blended concept. This is how you build recommendation engines that go beyond "more of the same." Instead of showing a jazz listener more jazz, you can explore the space between jazz and their second-favorite genre.

Analogy computation goes a step further. It captures a relationship and transfers it. If you know how "Monday" relates to "Tuesday" (the next day), you can apply that same relationship to "January" and land near "February." If you know how a product's budget version relates to its premium version, you can apply that relationship to a completely different product category. The relationships live in the embeddings your model already produced; the database only does the vector arithmetic, so you get analogy-style queries without training or serving a separate model.

Together, these operations turn SwarnDB into a creative engine. Feed it data, and it can generate concepts that exist in the spaces between your data points.

Insight: These operations don't retrieve existing data. They generate new vectors representing concepts that don't exist in your database yet.

generation.py
from swarndb import SwarnDBClient

client = SwarnDBClient("localhost", 50051)

# Generate a new concept: blend jazz + electronic
crossover = client.math.interpolate(
    jazz_vec, electronic_vec, t=0.5, method="slerp"
)
# Find real tracks closest to this blended concept
discoveries = client.search.query("music", crossover, k=10)

# Analogy: apply a "style transfer" relationship
# If A is to B as C is to ?
new_concept = client.math.analogy(
    budget_phone_vec, premium_phone_vec, budget_laptop_vec
)
# Returns a vector representing a "premium laptop" concept

Discovery: Finding Hidden Patterns

Clustering, centroid computation, and ghost detection are discovery operations. They reveal structure in your data that you didn't know was there.

K-means clustering automatically groups your vectors into coherent categories. No labels, no training data, no manual categorization. Store 10,000 customer support tickets and cluster them into 20 groups. Each group's centroid tells you what the cluster is about. You might discover that 30% of your tickets are about billing, 20% about a specific feature bug, and 5% about a use case you didn't know your product supported. This insight comes from the data itself.

Centroid computation finds the representative center of any group. Give it the vectors for all articles in the "technology" category, and the centroid represents the archetypal technology article. Compare centroids across categories to understand how different topics relate. Compare a cluster's centroid to an individual item to measure how typical or unusual that item is within its group.

Ghost detection is the inverse of clustering. Instead of finding groups, it finds loners, vectors that don't belong to any group. These are items with no strong similarity to anything else in the dataset. In a social network, ghosts are potential fake accounts. In an e-commerce catalog, ghosts are miscategorized products. In a document collection, ghosts are off-topic content. Surfacing these anomalies is a single API call.

Insight: No ML pipeline, no data science team, no custom code. Just API calls that reveal the hidden structure already present in your vectors.

discovery.py
# Discover hidden structure in your data

# 1. Automatic grouping
clusters = client.math.cluster("support_tickets", k=20)
for cluster in clusters:
    print(f"Group: {cluster.size} tickets")
    # Inspect centroid to understand the theme

# 2. Find the archetypal item in a group
center = client.math.centroid("articles", vector_ids=technology_article_ids)
# Search for the real article closest to the center
most_representative = client.search.query("articles", center, k=1)

# 3. Find anomalies
ghosts = client.math.detect_ghosts("products", threshold=0.3)
print(f"Found {len(ghosts)} items with no strong connections")
# Investigate: miscategorized? spam? genuinely unique?

Monitoring: Detecting Change Over Time

Drift detection turns SwarnDB into a monitoring system. By comparing a vector's current state to a previous snapshot, you can detect and quantify change over time.

The concept is simple: store a vector representing the current state of some entity. Later, compute a new vector for the same entity. Measure the distance between them. If the distance is large, something changed significantly. If it's small, things are stable.

The applications are broad. In user behavior, embed a user's recent activity into a vector each month. Drift between months tells you if their preferences are evolving. A streaming service can detect when a user's taste shifts from action movies to documentaries before the user even realizes it, and adjust recommendations proactively.

In security, embed an account's transaction patterns. Sudden drift means the pattern changed, possibly because the account is compromised. Traditional fraud detection requires complex rule engines and ML models. Drift detection gives you a meaningful signal with one API call.

In product quality, embed customer reviews over time. Drift in the review sentiment vector tells you if perception is changing. Catch quality issues early, before they show up as a collapse in aggregate star ratings.

In content moderation, track how a community's discourse vector drifts over time. Gradual drift toward toxic language patterns can be detected and addressed before it becomes a crisis.

Insight: One number tells you if something changed and by how much. No complex ML model, no rule engine. Just a distance measurement between two states.

monitoring.py
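# Monitor behavioral change over time

# Each month, embed the user's recent activity with your own model and store
# the vector in "user_preferences" (embed() and user_recent_activity are placeholders)
current_vector = embed(user_recent_activity)

# Compare last month's snapshots with this month's
drift_score = client.math.detect_drift(
    "user_preferences",
    window1_ids=last_month_ids,
    window2_ids=this_month_ids
)
# Assumes the result can be compared as a numeric score, as in the examples above

# Check the stricter threshold first, otherwise it can never trigger
if drift_score > 0.7:
    print("Dramatic shift - possible account compromise")
    # Trigger security review
elif drift_score > 0.4:
    print("Significant taste change detected")
    # Refresh recommendations proactively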

Curation: Balancing Quality and Diversity

MMR diversity sampling and PCA dimensionality reduction are curation operations. They shape how results are presented and understood.

The diversity problem is everywhere. Search for "running shoes" and you get 10 results that are all nearly identical Nike models. A playlist of "relaxing music" plays the same ambient sound 50 times. A news feed about "technology" shows 20 articles about the same product launch. The results are technically relevant, but the experience is terrible.

MMR solves this by balancing two objectives: each result should be similar to the query (relevance), but different from the other results already selected (diversity). The lambda parameter controls the balance. At 1.0, you get pure relevance, the most similar items regardless of repetition. At 0.0, you get pure diversity, the most spread-out items regardless of relevance. At 0.7, you get the sweet spot: every result is on-topic, but each one brings something different to the set.

PCA serves a different curation need: making high-dimensional data understandable. A collection of 1536-dimensional vectors is mathematically rich but visually opaque. PCA reduces those 1536 dimensions to 2 or 3 along the directions of greatest variance, keeping as much of the overall structure as a flat projection can. Plot the reduced vectors on a scatter chart and you can see at a glance where clusters form, where outliers sit, and where gaps exist. This visual view often makes structure obvious that is hard to spot in raw numbers.

Insight: MMR ensures your top 10 results don't all say the same thing. PCA lets you see your entire dataset at a glance. Both are single API calls.

curation.py
# Curate: relevant AND diverse results

# Instead of 20 similar results...
results = client.math.diversity_sample(
    "articles",
    query_vector,
    k=20,
    lambda_=0.7
)
# ...get 20 results that cover the topic broadly

# Visualize your entire dataset in 2D
reduced = client.math.reduce_dimensions("products", n_components=2)
# Plot reduced vectors: clusters, gaps, and outliers
# become visible at a glance

Complete Example

Everything above, in one script.

vector_math_complete.py
from swarndb import SwarnDBClient

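client = SwarnDBClient("localhost", 50051)

# formal_vec, casual_vec, query_vector, the *_ids lists, and the
# man/king/woman vectors below are placeholders for your own data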

# 1. Interpolate: Blend two concepts
blended = client.math.interpolate(
    formal_vec, casual_vec, t=0.5, method="slerp"
)

# 2. Diversity Sample: Get diverse, relevant results
diverse_results = client.math.diversity_sample(
    "articles", query_vector, k=20, lambda_=0.7
)

# 3. Drift: Detect behavioral change
drift = client.math.detect_drift(
    "users", window1_ids=last_month_ids, window2_ids=this_month_ids
)
print(f"Drift score: {drift}")

# 4. Ghost Detection: Find anomalies
ghosts = client.math.detect_ghosts("accounts", threshold=0.3)
print(f"Suspicious accounts: {len(ghosts)}")

# 5. Clustering: Automatic grouping
clusters = client.math.cluster("documents", k=10)
for c in clusters:
    print(f"Cluster {c.id}: {c.size} documents")

# 6. Analogy: man is to king as woman is to ? (king - man + woman)
result = client.math.analogy(
    man_vec, king_vec, woman_vec
)

Start building with Vector Mathematics

Clone the repo and explore this feature in minutes.
