┌─ agidb · technical preprint · landing artifact · last build · 2026-05-23 · 14:32 UTC · commit 7f4ab02 · v2 pre-alpha · 44 tests passing · phase 7/16 ──┐
v2 substrate · m9 · v2.1 brain-aligned · m12

the cognitive
substrate for
autonomous agents.

agidb is a content-addressable hyperdimensional memory database with first-class goals, beliefs, sensory input, and self-model — and in v2.1, the only agent memory substrate with brain-aligned multimodal encoding.

not a vector database. not a graph database. a new category — built around the seven things an autonomous agent must persist, each as a first-class typed shape. one rust binary. one API. observe() · recall() · set_goal() · assert_belief() · unlearn(). no LLM in the read path. sub-50ms p95.

primitive · 8192-bit HV
recall p95 · < 50ms
floors · 7 cognitive
network · zero
┌── agidb.observe_multimodal · live pipeline ── v2.1 · brain-aligned
▸ video · V-JEPA 2 · 1.2B · 1024d
▸ audio · Wav2Vec-BERT 2.0 · 1024d
▸ text · Llama-3.2-3B · 2048d
Charikar '02 · sign(R·x) · VSA · ⊕ role-filler bind
[ bound episode signature · 8192-bit ]
ROLEv⊗ŝᵥ ⊕ ROLEa⊗ŝₐ ⊕ ROLEt⊗ŝₜ ⊕ ROLEτ⊗ŝτ
BAMS · RSA vs TRIBE v2 · DMN · D-Att · FPN · SM · Vis · V-Att · r̄ = 0.67
└── no LLM in read path · pure rust · embedded ──
┌─ inheriting · hippocampal indexing · complementary learning systems · McClelland '95 · HDC · Kanerva '88 · Charikar '02 · bi-temporal · Snodgrass '99 · TRIBE v2 · Meta FAIR · Mar '26 · V-JEPA 2 · Wav2Vec-BERT · Llama-3.2-3B

every database was built
for a different consumer.

postgres for accountants. mongo for app developers. neo4j for analysts. pinecone for retrieval pipelines. mem0 / letta / zep for chat-style RAG memory. none were designed for an autonomous agent that must remember, reason, revise, and forget across years — with provenance, audit, and goal awareness.

six structural problems of the vector-DB pipeline
  1. latency — p95 1–3s, sometimes 60s, every recall is multiple network calls
  2. cost — embedding API + context-window tokens, every query
  3. no temporal grounding — vector DBs don't know what was true when
  4. no provenance — weak attribution, no audit trail
  5. no graceful degradation — below threshold = empty result
  6. no consolidation — store grows without bound, no sleep
four cognitive gaps specific to autonomous agents
  1. no first-class goals — text-stored; state, parent-child, success criteria in agent code
  2. no first-class beliefs — confidence + revision history live in agent code, badly
  3. no introspection — agent can't ask "what did I learn?" — no event log
  4. no clean unlearn — removal cascades through atoms, beliefs, procedures · nobody handles this
these aren't bugs · they are properties of the wrong primitive.
today · six-step glue pipeline
  1. embed every conversation
  2. store vectors · pinecone / qdrant / pgvector
  3. graph DB for relations · neo4j / kuzu
  4. at recall: embed query · cosine top-k
  5. rerank with another LLM call
  6. stuff chunks into prompt · pray
p95 · 1–3s · some hit 60s
cost · $/recall · embed + tokens
goals? · no · not first-class
unlearn? · DELETE · destructive
agidb · one substrate · seven floors
let db = Agidb::open("./memory.agidb").await?;

db.observe("sarah recommended bawri in bandra last weekend").await?;
db.assert_belief(Belief::new("sarah likes thai food").with_confidence(0.8)).await?;
db.set_goal(Goal::new("find a thai place for the team dinner")).await?;

let r = db.recall("what thai place did sarah mention?").await?;
// → "Bawri"  conf=0.94  tier=Exact  goal-biased  source=msg#1217
p95 · < 50ms · local POPCOUNT
cost · $0 · no API in read
goals? · typed · state machines
unlearn? · cascading · + self-vector subtract

seven things an
autonomous agent persists.

50 years of memory research (tulving, squire, baddeley, mcclelland) names five biological memory systems. add goals/beliefs and the self-model, which an agent needs and a biological organism does not, and you get the seven floors agidb stores as first-class typed shapes.

notation

↗ = surprise-gated promotion · ↻ = revision · ⊕ = bound role-filler · α = EMA rate

brain mapping

floor 3 → hippocampus · floor 4 → neocortex · floor 5 → basal ganglia · floor 7 → DMN + dlPFC

phase status

✅ shipped from sochdb v1 inheritance · ⬜ phase 9–16 (v2.0 + v2.1)

07
self-model DMN · dlPFC ⬜ phase 10 · v2.0

append-only log of every learning event · self-vector EMA — a slowly drifting 8192-bit centroid representing "what kind of agent am I"

db.what_did_i_learn(since) · db.self_vector() · db.attention_trace(recall_id)
06
goals + beliefs PFC ⬜ phase 9 · v2.0

what the agent wants (state machine: Active / Paused / Completed / Abandoned) and what it thinks is true (confidence + evidence + revision audit)

db.set_goal(g) · db.assert_belief(b) · db.revise_belief(evidence) · db.what_do_i_believe(about)
05
procedural basal ganglia · cerebellum ⬜ phase 9+

skills and workflows with execution traces and success-count statistics — typed episode shape

db.observe_procedure(p) · db.record_execution(p, trace) · db.procedure_stats(p)
04
semantic neocortex ✅ inherited · phase 6

decoupled general knowledge · facts consolidated from N≥3 episodic patterns into a SemanticAtom

db.consolidate() · db.what_about(concept_id)
03
episodic hippocampus · CA1 · CA3 ✅ inherited · phase 4

events with bi-temporal stamps and HDC signatures · the core typed shape · multimodal in v2.1

db.observe(text, ctx) · db.observe_multimodal(video, audio, text) · db.recall(query)
02
working dlPFC · ~7±2 slots ✅ inherited

active context · session-scoped · recency-weighted retrieval over episodic — no separate table

db.recall(Query::with_session(sid))
01
sensory primary cortices (V1 / A1 / S1) ⬜ phase 10 · multimodal v2.1

raw signal ring buffer with surprise-gated promotion to episodic · v2.1: multimodal (video + audio + text) with brain-calibrated θ_brain

db.observe_sensory(frame) · db.observe_multimodal(...) · surprise > θ_brain ↗ episodic
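
floor 06 above describes a goal as a state machine over Active / Paused / Completed / Abandoned. a minimal sketch of what that could look like follows. the `Goal` struct, its fields, and the transition table (Completed and Abandoned are terminal, Paused may only resume or be abandoned) are assumptions made for illustration, not agidb's actual schema.

```rust
/// goal lifecycle from floor 06: Active / Paused / Completed / Abandoned.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GoalState {
    Active,
    Paused,
    Completed,
    Abandoned,
}

/// hypothetical goal record; field names are illustrative only.
#[derive(Debug, Clone)]
pub struct Goal {
    pub description: String,
    pub state: GoalState,
    pub parent: Option<usize>, // index of the parent goal, if any
}

impl Goal {
    /// new goals start in the Active state.
    pub fn new(description: &str) -> Self {
        Goal {
            description: description.to_string(),
            state: GoalState::Active,
            parent: None,
        }
    }

    /// assumed transition table: Completed and Abandoned are terminal,
    /// Paused can only resume (Active) or be Abandoned.
    pub fn transition(&mut self, next: GoalState) -> Result<(), String> {
        use GoalState::*;
        let allowed = matches!(
            (self.state, next),
            (Active, Paused)
                | (Active, Completed)
                | (Active, Abandoned)
                | (Paused, Active)
                | (Paused, Abandoned)
        );
        if allowed {
            self.state = next;
            Ok(())
        } else {
            Err(format!("illegal goal transition {:?} -> {:?}", self.state, next))
        }
    }
}
```

keeping the transition table inside the type means an illegal move (for example re-opening a completed goal) is rejected at the storage layer instead of in agent code.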

give it a fragment.
it returns the whole pattern.

classical DBs need keys. vector DBs need similar embeddings. agidb takes the partial pattern itself — a few entities, a rough time, half a sentence — and lets activation converge toward the stored memory it best overlaps. this is what a hippocampus does.

memory cortex · 8192-bit space · projected 40×16 · 4 stored episodes
stored episode patterns
  • ep #1217 · sarah's thai pick
  • ep #841 · "i live in berlin"
  • ep #1402 · lisbon offsite
  • ep #1654 · auth refactor
the four phases
  1. store · episode binds to sparse cell pattern
  2. probe · partial query activates ~35% of target
  3. propagate · activation spreads through neighbors
  4. converge · nearest attractor lights · match
legend · ░ inactive cell · ▒ stored pattern (dim) · ▓ probe activation · ▓ propagating · █ converged match
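
the probe phase above can be approximated in a few lines. this is a sketch under stated assumptions: the fragment is modelled as a probe plus a mask of "known" bits, and a single nearest-neighbour lookup by masked Hamming agreement stands in for the propagate and converge phases. it is not an attractor network, and `masked_similarity` and `complete` are hypothetical names.

```rust
/// agreement on the known (masked) bits only, in [0, 1].
pub fn masked_similarity(probe: &[u64], mask: &[u64], stored: &[u64]) -> f64 {
    let mut known = 0u32;
    let mut agree = 0u32;
    for ((p, m), s) in probe.iter().zip(mask).zip(stored) {
        known += m.count_ones();
        // XNOR marks matching bits; AND with the mask keeps only known ones.
        agree += (!(p ^ s) & m).count_ones();
    }
    if known == 0 {
        0.0
    } else {
        f64::from(agree) / f64::from(known)
    }
}

/// single nearest-neighbour step: return the id and score of the stored
/// episode that best overlaps the fragment, or None for an empty store.
pub fn complete(probe: &[u64], mask: &[u64], store: &[(u32, Vec<u64>)]) -> Option<(u32, f64)> {
    store
        .iter()
        .map(|(id, hv)| (*id, masked_similarity(probe, mask, hv)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
}
```

the inner loop is XOR, AND and `count_ones` over 128 words per stored episode, which is the kind of local POPCOUNT work the page refers to. returning the score alongside the id lets a caller degrade gracefully instead of returning nothing below a threshold.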

one episode.
three modalities.
one bound signature.

v2.1 adds a multimodal sensory pipeline using the same frozen encoder stack as Meta FAIR's TRIBE v2 brain-encoding foundation model. each modality projects to an 8192-bit signature via Charikar '02 random projection (training-free, JL-distance-preserving). modalities are bound via VSA role-filler XOR — factorable, unlike attention fusion.
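
the projection step sign(R·x) can be sketched with the standard library alone. assumptions, stated loudly: the rows of R here are ±1 (Rademacher) entries regenerated from a seed with a small splitmix64 generator, swapped in for Gaussian hyperplanes so that no external crate is needed; `project`, `hamming` and the seed scheme are hypothetical and not agidb's implementation.

```rust
pub const WORDS: usize = 128; // 128 x 64 = 8192 bits
pub type Hv = [u64; WORDS];

/// splitmix64: tiny deterministic PRNG, a std-only stand-in for an RNG crate.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// sign random projection: bit i = [ r_i . x >= 0 ], where r_i is a ±1 row
/// regenerated from (seed, i), so the matrix R is never materialised.
pub fn project(x: &[f32], seed: u64) -> Hv {
    let mut hv = [0u64; WORDS];
    for bit in 0..WORDS * 64 {
        let mut state = seed ^ (bit as u64).wrapping_mul(0xD6E8_FEB8_6659_FD93);
        let mut word = 0u64;
        let mut dot = 0.0f32;
        for (j, &xj) in x.iter().enumerate() {
            if j % 64 == 0 {
                word = splitmix64(&mut state); // next 64 random signs
            }
            let sign = if (word >> (j % 64)) & 1 == 1 { 1.0f32 } else { -1.0f32 };
            dot += sign * xj;
        }
        if dot >= 0.0 {
            hv[bit / 64] |= 1u64 << (bit % 64);
        }
    }
    hv
}

/// Hamming distance via XOR + POPCOUNT.
pub fn hamming(a: &Hv, b: &Hv) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

no training is involved: the same seed always yields the same signature, and latents that point in similar directions land at a small Hamming distance, while unrelated ones sit near half the bits.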

▸ video 64 frames · 256×256 · 4s window
V-JEPA 2 Gigantic-256
1.2B params · self-supervised on 1M+ hours · ViT + 3D-RoPE · EMA target
CPU 1.5s · GPU 200ms
1024d latent → 8192-bit ŝᵥ
▸ audio 60s @ 16kHz · resampled 50→2 Hz
Wav2Vec-BERT 2.0
multilingual SSL · frame-level 1024-d latents · mean-pooled
CPU 400ms · GPU 80ms
1024d latent → 8192-bit ŝₐ
▸ text 1024 tokens · preceding context
Llama-3.2-3B
layer-32 mean-pool hidden state · no generation · forward only
CPU 200ms · GPU 30ms
2048d latent → 8192-bit ŝₜ
↓ VSA role-filler XOR · factorable
episode = ROLEv⊗ŝᵥ ⊕ ROLEa⊗ŝₐ ⊕ ROLEt⊗ŝₜ ⊕ ROLEτ⊗ŝτ
recover any modality  ⇒  ŝₐ ≈ episode ⊕ ROLEa  then nearest-neighbor cleanup against the audio codebook
attention fusion · TRIBE / mem0 / letta

dense hidden state. components entangled. cannot recover original audio from a fused episode.

↘ ↓ ↙ ▓▓▓ entangled ▓▓▓
VSA · XOR role-filler · agidb

each modality bound to its own role HV. unbind ⊕ ROLE recovers the modality signature with clean-up.

↘ ↓ ↙ ⊕ [ factorable bound HV ]
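
to make "factorable" concrete, here is a minimal sketch of binding, bundling and unbinding on 8192-bit vectors. assumptions: binding is bitwise XOR, bundling is a per-bit majority vote with ties resolved to 0, and clean-up is a nearest-neighbour search over a small codebook. the helper names (`random_hv`, `bind`, `bundle`, `cleanup`) are hypothetical.

```rust
pub const WORDS: usize = 128; // 128 x 64 = 8192 bits
pub type Hv = [u64; WORDS];

/// deterministic pseudo-random hypervector (splitmix64), std-only.
pub fn random_hv(seed: u64) -> Hv {
    let mut state = seed;
    let mut hv = [0u64; WORDS];
    for w in hv.iter_mut() {
        state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        *w = z ^ (z >> 31);
    }
    hv
}

/// bind: bitwise XOR, which is its own inverse.
pub fn bind(a: &Hv, b: &Hv) -> Hv {
    let mut out = [0u64; WORDS];
    for i in 0..WORDS {
        out[i] = a[i] ^ b[i];
    }
    out
}

/// bundle: per-bit majority vote; ties (even item counts) resolve to 0.
pub fn bundle(items: &[Hv]) -> Hv {
    let mut out = [0u64; WORDS];
    for bit in 0..WORDS * 64 {
        let ones = items
            .iter()
            .filter(|hv| (hv[bit / 64] >> (bit % 64)) & 1 == 1)
            .count();
        if 2 * ones > items.len() {
            out[bit / 64] |= 1u64 << (bit % 64);
        }
    }
    out
}

/// Hamming distance via XOR + POPCOUNT.
pub fn hamming(a: &Hv, b: &Hv) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// clean-up memory: name of the nearest codebook entry by Hamming distance.
pub fn cleanup<'a>(noisy: &Hv, codebook: &'a [(&'a str, Hv)]) -> Option<&'a str> {
    codebook
        .iter()
        .min_by_key(|(_, hv)| hamming(noisy, hv))
        .map(|(name, _)| *name)
}
```

unbinding an episode with `bind(&episode, &role_a)` returns a noisy copy of the audio signature. with four bundled pairs the copy still agrees with the original on roughly two thirds of its bits, against about half for an unrelated vector, which is enough for the codebook lookup to pick the right entry.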

not a metaphor.
a benchmark.

because agidb uses the same encoder stack as TRIBE v2 (Meta FAIR, March 2026 · the brain-encoding foundation model that won Algonauts 2025), its internal HDC signatures can be benchmarked directly against TRIBE-predicted cortical activations across 720 fMRI subjects. that benchmark is BAMS — representational similarity analysis across six functional networks.

cortical flatmap · schaefer 1000-parcel atlas · 6 networks · TRIBE v2 · n=720 · OOD r̄=0.215
RSA correlation · agidb signatures × TRIBE-predicted BOLD
  • DMN · default mode · r=0.72
  • dorsal attention · r=0.64
  • frontoparietal · r=0.81
  • somatomotor · r=0.41
  • visual · r=0.88
  • ventral attention · r=0.55
legend · r < 0.3 · 0.3–0.6 · > 0.6 · diag mean · 0.67
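
for readers unfamiliar with representational similarity analysis, the core computation is small. the sketch below builds a dissimilarity matrix from binary signatures and correlates it with a reference matrix. assumptions: normalised Hamming distance as the dissimilarity, Pearson correlation for brevity (RSA is often run with rank correlation), and a reference matrix supplied by the caller. how BAMS actually derives the reference from TRIBE-predicted activations is not specified here, and `hamming_rdm` and `pearson` are hypothetical names.

```rust
/// upper-triangle representational dissimilarity matrix (RDM) from binary
/// signatures, using normalised Hamming distance in [0, 1].
pub fn hamming_rdm(sigs: &[Vec<u64>]) -> Vec<f64> {
    let mut rdm = Vec::new();
    for i in 0..sigs.len() {
        for j in (i + 1)..sigs.len() {
            let bits = (sigs[i].len() * 64) as f64;
            let d: u32 = sigs[i]
                .iter()
                .zip(&sigs[j])
                .map(|(a, b)| (a ^ b).count_ones())
                .sum();
            rdm.push(f64::from(d) / bits);
        }
    }
    rdm
}

/// Pearson correlation between two equally long vectors.
/// returns None when lengths differ, n < 2, or either side has zero variance.
pub fn pearson(x: &[f64], y: &[f64]) -> Option<f64> {
    if x.len() != y.len() || x.len() < 2 {
        return None;
    }
    let n = x.len() as f64;
    let mx = x.iter().sum::<f64>() / n;
    let my = y.iter().sum::<f64>() / n;
    let (mut sxy, mut sxx, mut syy) = (0.0f64, 0.0f64, 0.0f64);
    for (a, b) in x.iter().zip(y) {
        sxy += (a - mx) * (b - my);
        sxx += (a - mx).powi(2);
        syy += (b - my).powi(2);
    }
    if sxx == 0.0 || syy == 0.0 {
        None
    } else {
        Some(sxy / (sxx * syy).sqrt())
    }
}
```

a per-network score would then be `pearson(&hamming_rdm(&signatures), &reference_rdm)` over the same stimuli in the same order.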
θ_brain · brain-calibrated surprise gate · phase 15 calibration
surprise(t) = 1 − ham_sim( s(t), bundle( s[t-K..t] ) )
fit against TRIBE-predicted neural surprise · associative cortex (TPJ · dlPFC · DMN) · θ_brain = 0.52
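
the gate above can be sketched directly from the formula. one assumption matters: `ham_sim` is not defined on this page, so the sketch takes the bipolar form 1 − 2·d/D (1 for identical signatures, about 0 for unrelated ones). with the plain fraction of agreeing bits, unrelated signatures would score a surprise near 0.5 and could never exceed a threshold of 0.52. the window bundle is a per-bit majority with ties resolved to 0, and an empty window is treated as maximally surprising; both are also assumptions.

```rust
pub const WORDS: usize = 128; // 128 x 64 = 8192 bits
pub type Hv = [u64; WORDS];

/// threshold quoted on the page.
pub const THETA_BRAIN: f64 = 0.52;

/// per-bit majority over the window; ties resolve to 0.
pub fn bundle(window: &[Hv]) -> Hv {
    let mut out = [0u64; WORDS];
    for bit in 0..WORDS * 64 {
        let ones = window
            .iter()
            .filter(|hv| (hv[bit / 64] >> (bit % 64)) & 1 == 1)
            .count();
        if 2 * ones > window.len() {
            out[bit / 64] |= 1u64 << (bit % 64);
        }
    }
    out
}

/// assumed form of ham_sim: 1 - 2*d/D, where d is the Hamming distance.
pub fn ham_sim(a: &Hv, b: &Hv) -> f64 {
    let d: u32 = a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum();
    1.0 - 2.0 * f64::from(d) / (WORDS * 64) as f64
}

/// surprise(t) = 1 - ham_sim( s(t), bundle( s[t-K..t] ) )
pub fn surprise(current: &Hv, window: &[Hv]) -> f64 {
    if window.is_empty() {
        return 1.0; // nothing to compare against (assumption)
    }
    1.0 - ham_sim(current, &bundle(window))
}

/// promote to episodic memory when surprise exceeds the threshold.
pub fn should_promote(current: &Hv, window: &[Hv]) -> bool {
    surprise(current, window) > THETA_BRAIN
}
```

under this reading, θ_brain = 0.52 means a frame is promoted once it differs from the recent bundle in more than 26% of its bits.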
┌─ inheriting from
  • TRIBE v2 · Meta FAIR · arxiv 2507.22229 · mar '26 · CC-BY-NC · 720 subjects · 70k voxel-level
  • V-JEPA 2 · arxiv 2506.09985 · Gigantic-256 · 1.2B params
  • Charikar '02 · STOC · similarity estimation via rounding · JL preservation
  • Algonauts 2025 · TRIBE v1 · 1st of 263 teams

it remembers
what was true when.

new facts don't overwrite. they're stamped t_valid_start = now; the old fact gets t_valid_end = now − 1ms and a superseded_by pointer. ask the db "as of" any date — get the answer that was true then.

┌── valid-time axis ──
2024-01 · i live in mumbai · superseded
2025-08 · i live in berlin · superseded
2026-03 · i live in lisbon · current
recall("where do i live?") as of 2025-09-15
→ berlin · tier · exact · confidence · 0.97 · source · msg #841
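
the supersede rule can be sketched as an append-only timeline. this covers the valid-time axis only, with timestamps as plain milliseconds; `Fact`, `Timeline`, `assert_fact` and `as_of` are hypothetical names rather than agidb's API.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Fact {
    pub value: String,
    pub t_valid_start: i64,           // ms since epoch
    pub t_valid_end: Option<i64>,     // None = still current
    pub superseded_by: Option<usize>, // index of the fact that replaced this one
}

#[derive(Debug, Default)]
pub struct Timeline {
    pub facts: Vec<Fact>,
}

impl Timeline {
    /// append a new fact; the previous current fact is closed, never overwritten.
    pub fn assert_fact(&mut self, value: &str, now: i64) -> usize {
        let new_id = self.facts.len();
        if let Some(prev) = self.facts.iter_mut().find(|f| f.t_valid_end.is_none()) {
            prev.t_valid_end = Some(now - 1); // now minus 1ms
            prev.superseded_by = Some(new_id);
        }
        self.facts.push(Fact {
            value: value.to_string(),
            t_valid_start: now,
            t_valid_end: None,
            superseded_by: None,
        });
        new_id
    }

    /// the fact whose validity interval contains t, if any.
    pub fn as_of(&self, t: i64) -> Option<&Fact> {
        self.facts
            .iter()
            .find(|f| f.t_valid_start <= t && f.t_valid_end.map_or(true, |end| t <= end))
    }
}
```

because intervals are closed at now − 1ms, they never overlap, so an "as of" lookup matches at most one fact. a query before the first assertion returns nothing.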

the substrate
that sleeps & becomes.

every five minutes a background worker clusters episodic patterns into semantic atoms, detects contradictions, decays unused entries, and — new in v2 — updates the self-vector EMA: self_vec ← (1−α)·self_vec + α·bundle(consolidated). inspired by V-JEPA 2's target encoder + TRIBE's per-subject embedding.
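
an EMA over a binary vector needs somewhere to keep fractional state. one way to do it, sketched here as an assumption rather than agidb's design, is a real-valued accumulator per bit in [−1, 1] that is thresholded back to bits on read. `SelfVector`, `update`, `to_bits` and `drift_bits` are hypothetical names.

```rust
/// self-vector kept as one bipolar accumulator per bit, in [-1, 1].
pub struct SelfVector {
    pub acc: Vec<f32>,
    pub alpha: f32,
}

/// read bit i of a packed signature as +1.0 or -1.0.
fn bipolar(bits: &[u64], i: usize) -> f32 {
    if (bits[i / 64] >> (i % 64)) & 1 == 1 { 1.0 } else { -1.0 }
}

impl SelfVector {
    /// start from an existing binary signature.
    pub fn from_bits(bits: &[u64], alpha: f32) -> Self {
        let acc = (0..bits.len() * 64).map(|i| bipolar(bits, i)).collect();
        SelfVector { acc, alpha }
    }

    /// self_vec <- (1 - alpha) * self_vec + alpha * bundle(consolidated)
    pub fn update(&mut self, bundled: &[u64]) {
        let alpha = self.alpha;
        for (i, a) in self.acc.iter_mut().enumerate() {
            *a = (1.0 - alpha) * *a + alpha * bipolar(bundled, i);
        }
    }

    /// threshold back to a binary signature: bit = 1 where accumulator >= 0.
    pub fn to_bits(&self) -> Vec<u64> {
        let mut out = vec![0u64; self.acc.len() / 64];
        for (i, a) in self.acc.iter().enumerate() {
            if *a >= 0.0 {
                out[i / 64] |= 1u64 << (i % 64);
            }
        }
        out
    }
}

/// drift between two snapshots, in bits (Hamming distance).
pub fn drift_bits(before: &[u64], after: &[u64]) -> u32 {
    before.iter().zip(after).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

under this scheme with α = 0.05, a fully committed bit that is contradicted on every cycle flips on the 14th update, since 0.95^13 ≈ 0.513 and 0.95^14 ≈ 0.488. a single contradicting bundle cannot move the thresholded vector at all.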

  1. cluster · scan 7d · hamming group
  2. surprise · score vs current beliefs
  3. bundle · N≥3 → semantic atom
  4. contradict · overlap t_valid → supersede
  5. self-vec · EMA update α=0.05
  6. decay · unused 90d → cold
  7. compact · rewrite signatures.dat
┌─ last consolidation report
  • scanned · 12,471 episodes · 7d
  • clusters · 218 candidates · ≥ 3 members
  • surprise > θ · 1,084 (8.7%) · brain-cal
  • semantic atoms · + 84 new · bundled
  • contradictions · 11 resolved · superseded
  • self-vec drift · Δ 184 bits · hamming
  • decayed · 2,103 archived · cold
  • duration · 1.84s · idle
self_vector trajectory · 30d · α=0.05 · ||drift|| = 1,842 bits

the only substrate
that can truly forget.

GDPR Article 17. poisoned memories. right-to-be-forgotten. a hard DELETE leaves the self-model contaminated by centroid drift from the deleted data. agidb cascades through episodes → beliefs → semantic atoms → procedures, tombstones non-destructively for a 30-day recovery window, and subtracts the deleted signatures from the self-vector itself.

  1. identify cascade · find episodes · beliefs · atoms · procedures referencing target · compute dependency graph
     → 47 episodes · 12 beliefs · 3 atoms · 1 procedure
  2. tombstone · mark t_tombstoned = now · invalidate in mmap · concept HV withdrawn · removed from inverted index
     → 47 signatures invalidated · 30-day recovery window
  3. cascade revise · beliefs with evidence under threshold → confidence reduced or withdrawn · atoms recomputed without removed evidence
     → 8 beliefs withdrawn · 4 revised · 1 procedure degraded
  4. self-vector subtract · the step nobody else does. self_vec ← self_vec − α·bundle(tombstoned_sigs). without it, the self-model still "remembers" via centroid contamination.
     → Δ self-vector = 184 bits · drift recorded in self_vector_history
  5. audit · emit LearningEvent::Unlearned with target_ref, cascade_size, self_vec_drift, dependency_graph_id, reason · permanent record
     → audit entry persisted · cannot be unlearned itself
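
steps 02 and 04 above can be sketched together. assumptions: the self-vector is held as one bipolar accumulator per bit in [−1, 1], the bundle of tombstoned signatures is a per-bit majority (ties count as −1), and the result is clamped back into range. the cascade and audit steps are left out, and `Episode`, `RECOVERY_WINDOW_MS` and `subtract_tombstoned` are hypothetical names.

```rust
pub const RECOVERY_WINDOW_MS: i64 = 30 * 24 * 60 * 60 * 1000; // 30 days

pub struct Episode {
    pub sig: Vec<u64>,
    pub t_tombstoned: Option<i64>, // ms since epoch; None = live
}

impl Episode {
    /// non-destructive removal: mark, do not erase.
    pub fn tombstone(&mut self, now: i64) {
        self.t_tombstoned = Some(now);
    }

    /// tombstoned episodes are hidden from recall.
    pub fn is_visible(&self) -> bool {
        self.t_tombstoned.is_none()
    }

    /// a tombstone can be undone only inside the recovery window.
    pub fn is_recoverable(&self, now: i64) -> bool {
        matches!(self.t_tombstoned, Some(t) if now - t <= RECOVERY_WINDOW_MS)
    }
}

/// self_vec <- self_vec - alpha * bundle(tombstoned), on bipolar accumulators.
/// returns the number of bits whose sign flipped (the recorded drift).
pub fn subtract_tombstoned(acc: &mut [f32], episodes: &[Episode], alpha: f32) -> usize {
    let gone: Vec<&Episode> = episodes.iter().filter(|e| !e.is_visible()).collect();
    if gone.is_empty() {
        return 0;
    }
    let mut flipped = 0;
    for (i, a) in acc.iter_mut().enumerate() {
        // per-bit majority of the tombstoned signatures, as +1 / -1
        let ones = gone
            .iter()
            .filter(|e| (e.sig[i / 64] >> (i % 64)) & 1 == 1)
            .count();
        let m: f32 = if 2 * ones > gone.len() { 1.0 } else { -1.0 };
        let before = *a >= 0.0;
        *a = (*a - alpha * m).clamp(-1.0, 1.0);
        if (*a >= 0.0) != before {
            flipped += 1;
        }
    }
    flipped
}
```

the subtraction only flips bits that were weakly held in the direction of the removed data. bits that many other memories support stay put, which is the intended effect of removing one source's contribution from a centroid.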

the database
AGI will run on.

v2.0 is the substrate. v2.1 is brain-aligned multimodal. v2.2–v2.5 build the cognitive engine on top: pattern completion, formal belief revision, causal claims, world model fragments, closed-loop self-modification with formal safety guarantees. five years committed.

v2.0
2026 · m9
substrate
  • seven floors · typed cognitive shapes
  • goals + beliefs first-class
  • self-model · learning log · unlearn
  • neurosymbolic interface
v2.1
2026 · m12
brain-aligned
  • V-JEPA 2 + W2V-BERT + Llama-3.2-3B sensory
  • brain-calibrated θ_brain
  • BAMS benchmark · ICLR '26 paper
v2.2
2027
cognitive engine v0.1
  • Hopfield pattern completion
  • AGM-formal belief revision
  • analogical retrieval via HDC bind
v2.3
2028
causal + world model
  • causal claim storage · intervention semantics
  • world model fragments
  • on-line learning state
v2.4
2029–30
production · distributed
  • enterprise tier · distributed mode
  • formal safety on self-modification
  • BCI sensory experimental · Brain-JEPA
v2.5
2031
AGI-grade
  • closed-loop self-modification
  • causal reasoning over beliefs
  • full cognitive engine

memory frameworks above the LLM.
agidb is a substrate beneath.

mem0, letta, zep, cognee, hippoRAG, MemMachine all sit above the LLM as python frameworks. agidb is a rust substrate beneath the agent loop. different layer, different shape.

system | layer | read path | representation | cognitive prims | brain-align
agidb (rust substrate) | beneath agent | no LLM · POPCOUNT | HDC · 8192-bit · VSA bind | goals · beliefs · self-vec · unlearn | BAMS · TRIBE-aligned
mem0 ($24M · 41K★) | python framework | LLM optional · 1–3s p95 | vector + graph + kv | no | no
letta (memgpt · $10M · 22K★) | agent runtime | LLM in loop | memory blocks · postgres | core mem text | no
zep · graphiti (25.7K★) | python + neo4j | cypher + embed hybrid | temporal KG | no | no
cognee (€7.5M · 12K★) | python pipeline | LLM-heavy | vector + graph hybrid | no | no
hippoRAG / hippoMM (OSU-NLP · linyueqian) | app on LLM | PPR over KG | KG + dentate gyrus | no | architectural claim only
hyperon (singularitynet) | research substrate | MeTTa interpreter | metagraph · AtomSpace | yes | no
agentmemory (rohitg00 · rust) | rust server | BM25 + HNSW | RocksDB hybrid | no | no

may 2026 is
the right window.

01

agent memory became a category.

real funding · the question shifted from whether to how.

  • mem0 · $24M · oct '25
  • letta · $10M · sep '24 · felicis
  • cognee · €7.5M · feb '26 · pebblebed
02

HDC matured for production.

torchhd · karunaratne nature electronics · pathHD · HPE hippocampus papers.

  • 31× lower latency vs vector dbs
  • 14× lower token cost
  • math is no longer experimental
03

rust embedded dbs grew up.

duckdb · lancedb · redb · surrealdb · tigerbeetle — single binary is normal.

  • redb · pure rust · ACID · MVCC
  • cargo add · sqlite-shaped story
  • MCP reaches agents directly
04

frontier labs aren't shipping substrates.

anthropic memory tool = CRUD over /memories. openai memory = product feature. gemini personal context = product feature. vendor-neutral wedge is open.

  • anthropic · sep '25 · file directory
  • openai · apr '25 · feature
  • nobody else · open
05

TRIBE v2 made brain-alignment tractable.

meta FAIR released open weights for fMRI-prediction across 720 subjects from V-JEPA 2 + Wav2Vec-BERT + Llama-3.2-3B. shared encoder stack = free brain-alignment for agidb.

  • 720 subjects · 70k voxel
  • algonauts 2025 · 1st of 263
  • no other agent memory can ship this

one week
collapses the bet.

every claim on this page resolves at phase 7 · week 12 against a shared harness: LongMemEval-S · LoCoMo · BEAM · cognitive (goal · belief · unlearn · multi-floor). six metrics per run · never a single number · raw logs + harness hash with every claim. three outcomes — and the project commits to one.

→ commit · proceed to launch · v2.1 · fundraise
  • ≥ Zep/Graphiti accuracy on LongMemEval-S (within 1pp F1 + LLM-judge)
  • ≥ 3× lower p95 latency vs mem0 (target < 50ms)
  • ≥ 3× lower token cost vs mem0 (< 2,500 tokens/query)
  • wins noisy-cue degradation test
  • all four cognitive benchmarks pass
  • holds across all three standard benchmarks · no cherry-picking
↦ reposition · ship smaller · no v2.1 · no fundraise
  • within 3pp of mem0 F1
  • ≥ 10× memory savings vs alternatives
  • partial cognitive benchmark pass acceptable
  • reposition as agidb-lite · embedded cognitive memory for edge agents
  • skip brain-alignment milestone
↤ retreat · fold back · reposition product
  • >10pp behind dense baselines
  • gap doesn't close with reranking
  • cognitive benchmarks fail
  • reposition as ctxgraph · temporal graph memory
  • preserve the IP · publish what we learned

embedded.
one binary.
zero infra.

agidb v2 is pre-alpha · milestone-driven · open-source. v2.0 substrate ships month 9, v2.1 brain-aligned multimodal at month 12.

license · apache-2.0
platforms · macos · linux · win
bindings · rust · python · MCP
status · v2 pre-alpha
tests · 44 passing
v2.1 weights · ~4GB · downloaded on first use
# Cargo.toml
[dependencies]
agidb = "0.2"     # v2.0 pre-alpha

// main.rs
// assumes the tokio runtime is also listed in Cargo.toml (our assumption, any async runtime should do)
use agidb::{Agidb, Goal, Belief};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let db = Agidb::open("./memory.agidb").await?;

    db.observe("sarah recommended bawri in bandra").await?;
    db.assert_belief(Belief::new("sarah likes thai").with_confidence(0.8)).await?;
    db.set_goal(Goal::new("find a thai place")).await?;

    let r = db.recall("what thai place did sarah pick?").await?;
    println!("{:?}", r.top());
    // → Match { text: "Bawri", confidence: 0.94, tier: Exact, source: "msg#1217" }
    Ok(())
}
# install
$ pip install agidb

# use
import asyncio
from agidb import Agidb, Goal, Belief

async def main():
    db = await Agidb.open("./memory.agidb")
    await db.observe("sarah recommended bawri")
    await db.assert_belief(Belief("sarah likes thai", confidence=0.8))
    await db.set_goal(Goal("find a thai place"))

    r = await db.recall("what thai place?")
    print(r.top)   # → "Bawri" conf=0.94

asyncio.run(main())
# run as an MCP server
$ agidb mcp --path ./memory.agidb

# MCP tools exposed:
#   agidb.observe / agidb.recall / agidb.consolidate
#   agidb.set_goal / agidb.assert_belief / agidb.revise_belief
#   agidb.what_did_i_learn / agidb.unlearn

# claude desktop / cursor / any MCP-compatible agent
# now has typed cognitive memory · no glue code
// v2.1 · multimodal observation
use agidb::{Agidb, VideoClip, AudioClip, Modality};

let db = Agidb::open("./memory.agidb").await?;
// encoders download on first use · ~4GB

let ep_id = db.observe_multimodal(
    VideoClip::from_path("clip.mp4")?,
    AudioClip::from_path("clip.wav")?,
    "sarah pointed at the bawri sign".into(),
).await?;
// V-JEPA 2 · W2V-BERT · Llama-3.2-3B
// → bound 8192-bit episode HV · ~2s laptop CPU

// later: recover the audio component
let audio_sig = db.extract_modality(ep_id, Modality::Audio).await?;