Fraud Detection: Ring Identification
Typed money-movement edges with provenance and audit history
What if a fraud ring, and the evidence trail behind every link, lived in one auditable database?
Individual transaction monitoring catches lone actors: the stolen credit card used at a gas station, the suspicious wire to an offshore account. These are solved problems. Fraud rings are not.
A fraud ring is a coordinated group of accounts that transact in patterns designed to look legitimate individually but form a clear network when viewed together. Detecting the ring means seeing the relationships: money moving between accounts, accounts sharing a device, a phone, or a payee. In SwarnDB those relationships are explicit, typed edges you author from your transaction and KYC data: TRANSACTED_WITH, SHARES_DEVICE, SHARES_IDENTITY. Each is directed, carries confidence, and carries provenance: the transaction id, the source system, who or what created it. For a regulated investigation, that audit trail is the point.
This is exactly the workload the typed graph is for. Transaction and behavioral signals are also embedded as vectors (the vector id is the account or transaction's node id), so the same store ranks by behavioral similarity, walks the typed money-movement edges to surface the ring, and runs the vector math (ghost detection for synthetic identities, drift for account takeover, clustering for triage) over the graph-built frontier. One store, one auditable view, full provenance on every link.
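To make the shape of a typed edge concrete, here is a minimal sketch in plain Python. It is an illustration only, not SwarnDB's schema: the class names `Provenance` and `TypedEdge`, the field names, and the validation rules are all assumptions made for this example.

```python
from dataclasses import dataclass

# Edge types named in this article.
EDGE_TYPES = {"TRANSACTED_WITH", "SHARES_DEVICE", "SHARES_IDENTITY"}


@dataclass(frozen=True)
class Provenance:
    """Hypothetical provenance record: where a link came from."""
    txn_id: str         # transaction that produced the link
    source_system: str  # e.g. "ledger" or "kyc"
    created_by: str     # person or pipeline that authored the edge


@dataclass(frozen=True)
class TypedEdge:
    """Hypothetical directed, typed edge between two accounts."""
    source: str
    target: str
    edge_type: str
    confidence: float   # assumed to lie between 0.0 and 1.0
    provenance: Provenance

    def __post_init__(self):
        # Reject edges that would weaken the audit trail.
        if self.edge_type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {self.edge_type}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


edge = TypedEdge(
    source="acct-A",
    target="acct-B",
    edge_type="TRANSACTED_WITH",
    confidence=0.92,
    provenance=Provenance("t-9913", "ledger", "ingest-pipeline"),
)
print(f"{edge.source} -[{edge.edge_type}]-> {edge.target} (txn {edge.provenance.txn_id})")
```

Making both records immutable mirrors the audit requirement: a link is authored once, and later review decisions are recorded alongside it rather than overwriting it.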
The Traditional Approach
The fragmented stack most teams cobble together today.
- Relationship links live in one store, behavioral scores in another, joined by hand
- Audit trails are fragmented: no single record of where a link came from
- Alert fatigue: thousands of false positives from per-transaction rules
- Synthetic identities look legitimate to per-account analysis
- Behavioral drift goes undetected until damage is done
- Manual triage: analysts spend hours correlating related alerts
The SwarnDB Approach
One database. Every capability built in.
Typed Money-Movement Edges
TRANSACTED_WITH, SHARES_DEVICE, and SHARES_IDENTITY are explicit, typed edges authored from your transaction and KYC data. Each carries confidence and a provenance record (transaction id, source system, creator), so every link in a ring is auditable.
Ring Traversal
Start from one suspicious account and walk the typed money-movement and shared-identity edges to surface the whole ring. Single-hop, k-hop, and shortest path over edges you authored, with the audit trail attached.
Ghost and Drift Detection
Synthetic identities show up as ghost vectors with no genuine behavioral neighbors; detect_ghosts finds them. Drift detection compares current behavior to a historical baseline to catch account takeover. Both run server-side.
Clustering and Curation
K-means clustering groups related alerts into investigation cases. Verify, reject, and audit history on every edge let analysts curate the ring and keep a defensible record of every decision.
- Money-movement and identity links are typed edges with provenance, fully auditable
- One store for transaction vectors, typed links, anomaly scoring, and clustering
- Ring traversal from a single suspicious account, with the evidence trail attached
- Ghost detection for synthetic identities, drift detection for account takeover
- Verify / reject / audit-history curation gives a defensible investigation record
- Bulk-import existing money-movement edges from CSV or JSONL
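The verify / reject / audit-history idea in the list above can be sketched as an append-only decision log attached to each edge. This is a plain-Python illustration under assumed names (`CuratedEdge`, `AuditEvent`, the status strings); it does not describe how SwarnDB stores audit history internally.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class AuditEvent:
    """One analyst decision. Events are appended, never edited."""
    action: str    # "verify" or "reject"
    analyst: str
    reason: str
    at: datetime


@dataclass
class CuratedEdge:
    """Hypothetical edge wrapper that keeps every curation decision."""
    edge_id: str
    status: str = "unreviewed"
    history: List[AuditEvent] = field(default_factory=list)

    def _record(self, action: str, analyst: str, reason: str) -> None:
        now = datetime.now(timezone.utc)
        self.history.append(AuditEvent(action, analyst, reason, now))

    def verify(self, analyst: str, reason: str) -> None:
        self.status = "verified"
        self._record("verify", analyst, reason)

    def reject(self, analyst: str, reason: str) -> None:
        self.status = "rejected"
        self._record("reject", analyst, reason)


link = CuratedEdge("e-1")
link.reject("analyst-1", "shared corporate device, not collusion")
link.verify("analyst-2", "confirmed by transaction review")

for event in link.history:
    print(event.at.isoformat(), event.action, event.analyst, event.reason)
```

The current status reflects the latest decision, while the history keeps the earlier rejection, which is what makes the record defensible in a later review.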
01. Ring Discovery
A traditional fraud detection system flags a single suspicious transaction. An analyst reviews it. It looks legitimate on its own. The analyst clears it and moves on. Multiply this by 500 alerts per day, and you have alert fatigue, the number one problem in fraud operations.
SwarnDB changes the workflow. The links between accounts are typed edges you authored from your data: TRANSACTED_WITH for money movement, SHARES_DEVICE and SHARES_IDENTITY for collusion signals, each carrying a provenance record. Starting from one suspicious account, you walk those edges to surface the ring, and the audit trail comes along: for every link you can show the transaction id and source system that produced it.
A two- to three-hop traversal expands the network. Hop 1 reaches accounts that transacted directly with the suspect or shared a device. Hop 2 reaches their counterparties. Hop 3 closes the ring. You rank the surviving frontier by behavioral similarity so the coordinated accounts rise to the top. No single transaction triggered a hard rule, but the network, and the evidence behind every edge, is unmistakable. Minutes of work instead of days.
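The hop-by-hop expansion and the final ranking can be sketched in plain Python on toy data. This is an illustration of the technique (a bounded breadth-first walk over typed edges, then a cosine-similarity sort), not SwarnDB's implementation; the function names, account ids, and two-dimensional embeddings are invented for the example.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def walk_ring(edges, seed, edge_types, max_hops):
    """Breadth-first walk along outgoing edges of the allowed types.

    Returns {account: hop at which it was first reached}.
    """
    adjacency = {}
    for source, target, edge_type in edges:
        if edge_type in edge_types:
            adjacency.setdefault(source, []).append(target)

    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        account = queue.popleft()
        if hops[account] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(account, []):
            if neighbor not in hops:
                hops[neighbor] = hops[account] + 1
                queue.append(neighbor)
    return hops


def rank_frontier(hops, embeddings, seed):
    """Order reached accounts by behavioral similarity to the seed."""
    frontier = [account for account in hops if account != seed]
    return sorted(
        frontier,
        key=lambda account: cosine(embeddings[account], embeddings[seed]),
        reverse=True,
    )


# Toy graph: (source, target, edge_type)
edges = [
    ("A", "B", "TRANSACTED_WITH"),
    ("B", "C", "TRANSACTED_WITH"),
    ("C", "A", "TRANSACTED_WITH"),  # money returns to the suspect
    ("A", "D", "SHARES_DEVICE"),
    ("D", "E", "TRANSACTED_WITH"),
    ("E", "F", "TRANSACTED_WITH"),
    ("F", "G", "TRANSACTED_WITH"),  # four hops out, beyond the limit
]
embeddings = {
    "A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.8, 0.2],
    "D": [0.0, 1.0], "E": [0.1, 0.9], "F": [0.5, 0.5],
}

hops = walk_ring(edges, "A", {"TRANSACTED_WITH", "SHARES_DEVICE"}, max_hops=3)
ranked = rank_frontier(hops, embeddings, "A")
print(hops)
print(ranked)
```

The hop limit keeps the frontier small enough to rank, and the ranking pushes accounts that behave like the suspect ahead of accounts that are merely connected to it.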
Key insight: From one suspicious account, walk typed money-movement edges to the full ring, with a provenance record behind every link. The network is the evidence.
# One suspicious account: walk typed money-movement edges to the ring
ring = (
    client.graph.query("transactions")
    .vector_similar(suspect_embedding, k=20)
    .traverse("TRANSACTED_WITH", direction="outgoing")
    .vector_rank(suspect_embedding, k=50)
    .return_nodes()
)
# Each hop follows an edge you authored, with provenance attached.
# Rank the frontier so coordinated accounts surface first.
ring_accounts = {node.metadata["account_id"] for node in ring.nodes}
# A coordinated ring, with the evidence trail behind every link.
# No single transaction triggered a rule; the NETWORK is the evidence.
02. Synthetic Identity Detection
Synthetic identities are the most sophisticated form of fraud. A fraudster combines a real Social Security number (often from a child or deceased person) with fabricated personal details to create a new identity that passes all standard verification checks. The synthetic identity opens accounts, builds credit history over months, and then executes a "bust-out," maxing out all credit lines simultaneously and disappearing.
Traditional detection systems struggle because each synthetic identity, viewed in isolation, looks legitimate. The account was opened normally, transactions are reasonable, payments are made on time. There's nothing anomalous about any single account.
SwarnDB's ghost detection exploits a subtle but powerful signal: synthetic identities have no genuine relationships. Their transaction embeddings exist in vector space, but they don't cluster with real accounts. Real people have overlapping behavioral patterns: they shop at similar stores, transact at similar times, have similar spending profiles. Synthetic identities are fabricated to look real, but their behavioral embeddings are artificially constructed and don't naturally cluster with anyone.
Ghost vectors are vectors with no genuine behavioral neighbors; they exist in the space but are isolated. detect_ghosts scans the collection and returns vectors whose maximum similarity to any other vector falls below a threshold. These are the accounts that exist but don't "belong," the definition of a synthetic identity.
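The rule just described (flag a vector when its best similarity to any other vector is below a threshold) can be sketched as a brute-force scan in plain Python. The function name `find_ghosts`, the account ids, and the toy vectors are assumptions for illustration; the sketch compares every pair, which only works for small collections and says nothing about how SwarnDB's server-side operation is implemented.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_ghosts(vectors, threshold):
    """Return {id: max_similarity} for vectors with no close neighbor."""
    ghosts = {}
    for vector_id, vector in vectors.items():
        best = max(
            (cosine(vector, other)
             for other_id, other in vectors.items()
             if other_id != vector_id),
            default=0.0,
        )
        if best < threshold:
            ghosts[vector_id] = best
    return ghosts


# Three accounts with overlapping behavior and one isolated account.
vectors = {
    "acct-1": [1.0, 0.1, 0.0],
    "acct-2": [0.9, 0.2, 0.0],
    "acct-3": [1.0, 0.0, 0.1],
    "acct-9": [0.0, 0.0, 1.0],
}

ghosts = find_ghosts(vectors, threshold=0.3)
for account_id, max_similarity in ghosts.items():
    print(f"{account_id}: max similarity {max_similarity:.3f}")
```

Only the isolated account falls under the threshold here; the other three each have at least one near neighbor.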
Key insight: Synthetic identities look real in isolation but have no genuine behavioral neighbors. Ghost detection finds the accounts that exist but don't belong.
# Find synthetic identities - ghost detection (vector math)
ghosts = client.math.detect_ghosts("transactions",
    threshold=0.3
)
# Ghosts: vectors with no genuine behavioral neighbors
# Real accounts cluster naturally - synthetic ones don't
ghost_account_ids = []
for ghost in ghosts.results:
    account = ghost.metadata["account_id"]
    max_similarity = ghost.max_similarity
    # max_similarity < 0.3 = no genuine behavioral neighbors
    # Flag for investigation as potential synthetic identity
    ghost_account_ids.append(account)
# Cross-check against the typed graph: is the ghost wired into a ring?
# ghost_embeddings is a placeholder: a dict of account id -> embedding you already hold
for ghost_id in ghost_account_ids:
    connected = (
        client.graph.query("transactions")
        .vector_similar(ghost_embeddings[ghost_id], k=10)
        .traverse("SHARES_IDENTITY", direction="outgoing")
        .return_nodes()
    )
    # Isolated ghost, no shared-identity edges = lone synthetic identity
    # Ghost wired into shared-identity edges = part of a coordinated ring
03. Behavioral Drift
Account takeover is another major fraud vector. A legitimate account, sometimes years old with perfect history, gets compromised. The fraudster starts using it, and the transaction patterns shift. The account that used to buy groceries and pay rent starts making international wire transfers and purchasing cryptocurrency.
Traditional systems detect this through hard rules: "flag if transaction amount exceeds 5x historical average." These rules are brittle, generate false positives (a legitimate customer buys a car and gets flagged), and miss subtle drift (a gradual shift in spending patterns that stays within rule thresholds).
SwarnDB's drift detection compares embeddings over time. Every account has a historical behavioral embedding, the centroid of its past transaction vectors. As new transactions come in, SwarnDB computes the current behavioral embedding and measures how far it has drifted from the historical baseline.
Small drift is normal; people's spending patterns evolve naturally. Large, sudden drift is suspicious; it suggests the account is now being operated by a different person. Gradual but persistent drift might indicate a slow-moving compromise where the fraudster is careful not to trigger sudden-change rules.
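The baseline-versus-current comparison can be sketched in plain Python. One assumption is made loudly here: drift is measured as cosine distance (1 minus cosine similarity) between the two centroids. The source does not state which metric SwarnDB uses, and the function names and toy transaction vectors are invented for the example.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def drift_score(baseline, current):
    """Assumed metric: cosine distance between baseline and current."""
    dot = sum(x * y for x, y in zip(baseline, current))
    norm = (math.sqrt(sum(x * x for x in baseline))
            * math.sqrt(sum(y * y for y in current)))
    similarity = dot / norm if norm else 0.0
    return 1.0 - similarity


# Historical behavior: groceries and rent dominate the first dimension.
history = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1]]
baseline = centroid(history)

# A normal week looks like the past; a takeover week does not.
stable_week = [[0.95, 0.05], [1.0, 0.1]]
takeover_week = [[0.1, 1.0], [0.0, 0.9]]

stable_drift = drift_score(baseline, centroid(stable_week))
takeover_drift = drift_score(baseline, centroid(takeover_week))
print(f"stable week drift:   {stable_drift:.4f}")
print(f"takeover week drift: {takeover_drift:.4f}")
```

Computing the same score week by week gives a series in which a sudden jump and a slow climb are both visible, which is the distinction the paragraph above draws.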
The drift score is a continuous value, not a binary flag. You can set thresholds per risk tier: for example, high-value accounts trigger investigation at drift > 0.3, standard accounts at drift > 0.5. The score captures nuance that binary rules cannot.
Key insight: A legitimate account starts behaving differently. Drift detection catches the shift, whether sudden (account takeover) or gradual (slow compromise).
# Monitor behavioral drift for account takeover detection
drift = client.math.drift("transactions",
    vector_id_a=current_behavior_id,
    vector_id_b=historical_baseline_id
)
# drift.score interpretation:
#   < 0.1   - normal, stable behavior
#   0.1-0.3 - minor shift, within expected range
#   0.3-0.5 - significant change, investigate high-value accounts
#   > 0.5   - major drift, likely compromised
# Per-risk-tier thresholds
# (account and flag_for_investigation stand in for your own objects)
if drift.score > 0.3 and account.tier == "high_value":
    flag_for_investigation(account)
elif drift.score > 0.5:
    flag_for_investigation(account)
# Track drift over time for trend analysis
weekly_drift_scores = []
for week in recent_weeks:
    d = client.math.drift("transactions",
        vector_id_a=week.current_id,
        vector_id_b=week.baseline_id
    )
    weekly_drift_scores.append(d.score)
# Gradual increase = slow compromise
# Sudden spike = account takeover
04. Investigation Clusters
A fraud operations team receives hundreds of alerts per day. Most are false positives. The real fraud signals are buried in noise. Worse, related alerts, transactions from the same ring, variants of the same attack pattern, arrive as separate items. An analyst might investigate three alerts independently without realizing they're all part of the same fraud campaign.
SwarnDB's clustering operation groups related alerts automatically. K-means clustering on transaction embeddings organizes alerts into coherent investigation clusters. Alerts from the same fraud ring land in the same cluster. Alerts with similar behavioral patterns (even from different accounts) are grouped together. The analyst no longer sees 500 individual alerts; they see 15 investigation cases, each containing all related signals.
This transforms the fraud operations workflow. Instead of triaging individual alerts (most of which are noise), analysts review clusters. A cluster of 30 alerts from 8 accounts with coordinated timing is obviously a ring, and the entire ring is presented as one investigation case, not 30 separate alerts. A cluster of 5 alerts with unusual international transfer patterns becomes one case with full context.
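How k-means turns a flat list of alerts into cases can be sketched in plain Python on toy data. This is a minimal illustration of the standard assign-then-update loop, with a deterministic farthest-first initialization chosen here so the example is repeatable. The function names, alert ids, and vectors are assumptions, and nothing in this sketch describes how SwarnDB initializes or runs its clustering.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Return a cluster index for each point."""
    # Farthest-first initialization: start from the first point, then
    # repeatedly add the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points,
                key=lambda p: min(squared_distance(p, c) for c in centroids))
        )

    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


# Toy alert embeddings: three look alike, two look like each other.
alerts = {
    "alert-1": [0.9, 0.1],
    "alert-2": [1.0, 0.0],
    "alert-3": [0.95, 0.05],
    "alert-4": [0.0, 1.0],
    "alert-5": [0.1, 0.9],
}

assignments = kmeans(list(alerts.values()), k=2)

# Group alert ids into investigation cases.
cases = {}
for alert_id, cluster_index in zip(alerts, assignments):
    cases.setdefault(cluster_index, []).append(alert_id)

for case_id, alert_ids in sorted(cases.items()):
    print(f"Case {case_id}: {alert_ids}")
```

The value of k has to be chosen up front, so in practice it is worth trying a few values and checking whether the resulting cases make sense to an analyst.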
The clustering is based on behavioral similarity, not rule matching. Novel fraud patterns that don't match any known rule still cluster together because they share behavioral characteristics, so SwarnDB can surface new fraud patterns that rule-based systems miss.
Key insight: 500 individual alerts become 15 investigation cases. Related fraud signals are grouped automatically. No manual triage needed.
# Group alerts into investigation clusters
clusters = client.math.cluster("transactions",
    k=15  # Expected number of distinct patterns
)
# Each cluster is an investigation case
for cluster in clusters.results:
    accounts = set(v.metadata["account_id"] for v in cluster.vectors)
    total_amount = sum(v.metadata["amount"] for v in cluster.vectors)
    print(f"Case {cluster.id}:")
    print(f"  Alerts: {len(cluster.vectors)}")
    print(f"  Accounts: {len(accounts)}")
    print(f"  Total Amount: ${total_amount:,.2f}")
    print(f"  Pattern: {cluster.centroid_description}")
# Example output:
# Case 7: 30 alerts, 8 accounts, $142,000 - coordinated ring
# Case 3: 5 alerts, 2 accounts, $89,000 - international transfers
# Case 12: 47 alerts, 1 account - false positive (legitimate business)
# Combine with ring discovery: walk typed edges from the cluster seed
# (high_risk_clusters stands in for the clusters your triage marks as high risk)
for suspicious_cluster in high_risk_clusters:
    ring = (
        client.graph.query("transactions")
        .vector_similar(suspicious_cluster.centroid, k=20)
        .traverse("TRANSACTED_WITH", direction="outgoing")
        .vector_rank(suspicious_cluster.centroid, k=50)
        .return_nodes()
    )
    # Full ring structure from the cluster seed, with edge provenance
SwarnDB vs Traditional Stack
A side-by-side look at the traditional approach versus SwarnDB.
| Capability | Traditional Stack | SwarnDB |
|---|---|---|
| Ring Detection | Manual queries in a separate graph DB | Walk typed edges from one account |
| Link Provenance | Fragmented across stores | Per-edge audit record |
| Synthetic IDs | Not reliably detectable | Ghost detection finds isolated vectors |
| Behavioral Change | Hard rules (5x average) | Continuous drift score |
| Stack | Graph DB + ML + Rule engine + SIEM | One database, one auditable view |
The Code
Everything above, in a few lines of Python.
from swarndb import SwarnDBClient
client = SwarnDBClient(host="localhost", port=50051)
# Hybrid mode: transaction vectors and a typed money-movement graph.
client.collections.create(
    "transactions", dimension=256, distance_metric="cosine", mode="hybrid"
)
# Author money-movement edges from your data, with provenance.
client.graph.put_edge("transactions", source=acct_a, target=acct_b,
    edge_type="TRANSACTED_WITH",
    provenance={"txn_id": "t-9913", "source": "ledger"})
client.graph.bulk_import_edges("transactions", movement_rows, format="csv")
# Ring discovery: walk typed edges from one suspicious account.
ring = (
    client.graph.query("transactions")
    .vector_similar(suspect_embedding, k=20)
    .traverse("TRANSACTED_WITH", direction="outgoing")
    .vector_rank(suspect_embedding, k=50)
    .return_nodes()
)
# Synthetic identities and account takeover (vector math).
ghosts = client.math.detect_ghosts("transactions", threshold=0.3)
drift = client.math.drift("transactions",
    vector_id_a=current_behavior_id, vector_id_b=historical_baseline_id
)
# Curate the investigation: verify or reject edges, keep audit history.
client.graph.verify_edge("transactions", edge_id)