Zep

Layer 4 — Intelligent Retrieval · Agent Memory · Free (OSS CE) / Cloud usage-based · Apache-2.0 · OSS

Long-term agent memory built on a temporal knowledge graph (Graphiti). Zep Community Edition ships under Apache-2.0; Zep Cloud is the managed offering. A strong fit for agents that need time-aware facts and entity resolution across user sessions.

AI Analysis

Zep is a long-term agent memory framework built on a temporal knowledge graph (Graphiti), preserving time-aware facts and entity resolution across user sessions in a way that pure vector retrieval cannot. Available as an Apache-2.0 Community Edition and a managed Zep Cloud, it is the leading choice when an agent needs to know not just what is true, but when it became true. The key tradeoff: best-in-class semantic memory and temporal reasoning versus a younger operational footprint and a steeper concept curve than dropping in a vanilla vector store.

Trust Before Intelligence

For Layer 4 agent memory, trust means an agent retrieves the right facts about the right user with the right freshness — and never asserts something as current when it was actually superseded. Zep's temporal knowledge graph is the rare memory substrate that makes recency a first-class property: facts have valid-from and valid-until timestamps, and superseded facts are explicitly tombstoned rather than silently overwritten. The failure modes that hurt trust are mostly operational — graph drift when ingestion lags, or entity-resolution mistakes that merge two distinct users — both of which are detectable with the right monitoring discipline.
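
The supersede-rather-than-overwrite pattern described above can be sketched in plain Python. This is an illustrative toy, not Zep's or Graphiti's actual data model; the class and field names here are invented:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Fact:
    """A time-aware fact: superseded facts are tombstoned, never overwritten."""
    subject: str
    predicate: str
    value: str
    valid_from: datetime
    valid_until: Optional[datetime] = None  # None means still current

class FactStore:
    def __init__(self):
        self.facts: list[Fact] = []

    def assert_fact(self, subject: str, predicate: str, value: str,
                    now: Optional[datetime] = None) -> None:
        now = now or datetime.now(timezone.utc)
        # Tombstone any currently-valid fact for the same (subject, predicate)
        for f in self.facts:
            if f.subject == subject and f.predicate == predicate and f.valid_until is None:
                f.valid_until = now
        self.facts.append(Fact(subject, predicate, value, valid_from=now))

    def current(self, subject: str, predicate: str) -> Optional[str]:
        """Return the currently-valid value, ignoring superseded facts."""
        for f in self.facts:
            if f.subject == subject and f.predicate == predicate and f.valid_until is None:
                return f.value
        return None

store = FactStore()
store.assert_fact("user:42", "plan", "free",
                  now=datetime(2026, 1, 1, tzinfo=timezone.utc))
store.assert_fact("user:42", "plan", "pro",
                  now=datetime(2026, 2, 1, tzinfo=timezone.utc))
```

The superseded "free" fact remains queryable with its validity window, which is what lets an agent avoid asserting stale facts as current.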

INPACT Score

25/36
I — Instant
4/6

Memory retrieval on Zep Cloud typically returns in 200-500ms; self-hosted Community Edition latency depends on the backing store. Temporal graph traversals on cold paths can hit 1-2s. Comfortably below 5s, so no cap applies, but not the sub-100ms tier that pushes I to 5-6.

N — Natural
4/6

Pythonic SDK with `memory.add` and `memory.search`; the REST API mirrors the SDK. Graphiti graph queries surface through high-level helpers. Some advanced features require reading the docs.
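
The add/search call shape can be illustrated with a toy in-memory stand-in. This is not the Zep SDK: the real client class, parameters, and search behavior differ by SDK version, and real Zep performs semantic and graph search rather than substring matching:

```python
class ToyMemory:
    """In-memory stand-in mimicking the add/search call shape (not Zep's SDK)."""
    def __init__(self):
        self.sessions: dict[str, list[dict]] = {}

    def add(self, session_id: str, messages: list[dict]) -> None:
        """Append conversation turns to a session's memory."""
        self.sessions.setdefault(session_id, []).extend(messages)

    def search(self, session_id: str, query: str, limit: int = 5) -> list[dict]:
        """Naive substring match; real Zep does semantic + graph retrieval."""
        hits = [m for m in self.sessions.get(session_id, [])
                if query.lower() in m["content"].lower()]
        return hits[:limit]

memory = ToyMemory()
memory.add("session-1", [
    {"role": "user", "content": "I prefer dark mode"},
    {"role": "assistant", "content": "Noted!"},
])
results = memory.search("session-1", "dark mode")
```

Consult Zep's own documentation for the actual client construction and parameter names before writing integration code.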

P — Permitted
3/6

User-scoped memory and session isolation; Zep Cloud adds workspace RBAC. No native ABAC at the framework level. Cap rule applied — RBAC-only without ABAC caps at 3.

A — Adaptive
5/6

CE self-hostable on AWS, Azure, GCP, on-prem. Zep Cloud as the managed option. Apache-2.0 license keeps the OSS path durable; no cloud lock-in.

C — Contextual
5/6

Temporal knowledge graph (Graphiti) preserves time-aware context. Custom entity types act as a semantic vocabulary. Far richer than pure vector memory.

T — Transparent
4/6

OSS code, structured logs, message-level metadata. Zep Cloud adds usage analytics. OSS edition lacks built-in per-query cost attribution.

GOALS Score

18/25
G — Governance
3/6

Audit log on Zep Cloud plus structured logs in CE. Graph versioning via temporal facts counts toward G4 (model versioning). No native HITL, threat modeling, or compliance mapping.

O — Observability
3/6

Zep Cloud dashboard plus OpenTelemetry hooks cover APM. Temporal facts make retrieval rationale visible (O6). Lacks distributed tracing and LLM cost attribution.

A — Availability
3/6

Sub-second retrieval p95 on tuned deployments; real-time message ingestion is supported. Cache hit rate and 10x load scenarios are not first-class concepts in the framework itself.

L — Lexicon
5/6

Top-of-category Lexicon support — entity resolution via Graphiti node merging, custom entity types as a glossary, confidence scores enabling disambiguation, and entity aliasing. Aligned with Mem0 and Letta peers at L=5.
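
Entity aliasing of this kind is essentially an equivalence-class problem. A minimal sketch using union-find, purely illustrative and not Graphiti's actual resolution algorithm:

```python
class AliasResolver:
    """Toy union-find over entity aliases (illustrative, not Graphiti's algorithm)."""
    def __init__(self):
        self.parent: dict[str, str] = {}

    def _find(self, name: str) -> str:
        """Follow parent pointers to the canonical name, with path halving."""
        self.parent.setdefault(name, name)
        while self.parent[name] != name:
            self.parent[name] = self.parent[self.parent[name]]
            name = self.parent[name]
        return name

    def merge(self, a: str, b: str) -> None:
        """Record that a and b refer to the same entity."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self.parent[rb] = ra

    def same_entity(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)

resolver = AliasResolver()
resolver.merge("Acme Corp", "Acme Corporation")
resolver.merge("Acme Corporation", "ACME")
```

In a production system the merge decision itself is the hard part (confidence thresholds, attribute overlap); the union-find only records the outcome.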

S — Solid
4/6

Graph constraints, required fields on entity types, single source of truth in the graph, entity schema validation. No first-class quality gates or ML-based anomaly detection.

AI-Identified Strengths

  • + Temporal facts make recency native — agents can ask 'what is true now' vs 'what was true on date X' without bolt-on logic
  • + Graphiti's entity resolution merges duplicate facts about the same person/thing across sessions
  • + Apache-2.0 Community Edition keeps the OSS path durable and OSI-approved
  • + Native integrations with LangChain, LangGraph, AutoGen reduce glue code
  • + Zep Cloud SOC 2 Type II provides a managed path for teams that cannot self-host

AI-Identified Limitations

  • - No HIPAA BAA publicly offered as of 2026-05 — PHI workloads must self-host on a BAA-signing backing store
  • - No FedRAMP authorization — not appropriate for federal workloads without significant additional control work
  • - Graphiti graph engine is newer than its underlying components; production track record is shorter than Mem0's
  • - Concept model (temporal facts, entity nodes) takes more onboarding than a simple vector retrieval API
  • - Self-hosted HA depends entirely on the chosen backing graph and vector stores

Industry Fit

Best suited for

  • Customer-support agents that need to remember user preferences, past tickets, and the history of those facts over time
  • Consumer AI assistants where personalization depth and temporal awareness drive product quality
  • B2B copilots in technology and professional services with moderate compliance needs

Compliance certifications

Zep Cloud SOC 2 Type II per https://www.getzep.com/security. No HIPAA BAA, ISO 27001, or FedRAMP as of 2026-05. CE compliance posture inherits entirely from the deployment environment and backing stores.

Use with caution for

  • Healthcare PHI workloads — no BAA on Zep Cloud; self-host on a BAA-signing backing store
  • Federal / FedRAMP workloads — not authorized as of 2026-05
  • Use cases where temporal facts are irrelevant — pure latest-snapshot retrieval gets cheaper alternatives

AI-Suggested Alternatives

Mem0

Both are OSS agent memory frameworks. Mem0 is lighter and faster to bolt on, with broader storage-backend support. Zep wins when temporal reasoning matters — what happened when, what superseded what — which Mem0 does not model as first-class.

Letta

Letta (MemGPT lineage) is a more opinionated agent runtime with memory as one part. Zep is a memory layer that bolts onto any agent runtime. Pick Letta if you want a complete persona runtime; Zep if you want memory to layer onto LangGraph or AutoGen.


Integration in 7-Layer Architecture

Role: Sits at Layer 4 as the temporal memory substrate — turns raw conversation history into a queryable knowledge graph of typed entities and time-aware facts.

Upstream: Reads from LLM-generated turns and user messages; pulls embeddings from OpenAI / Anthropic / local models. Backed by graph stores (Neo4j, FalkorDB) and vector stores (pgvector, Qdrant, Pinecone).

Downstream: Feeds retrieved memories back into agent prompts at L4; integrates with LangChain, LangGraph, AutoGen as a memory provider. Logfire / OpenTelemetry export for L6 observability.

⚡ Trust Risks

high Entity-resolution false merge — two distinct users get conflated when they share enough identifying attributes, leaking memory across them

Mitigation: Configure aggressive `user_id` scoping; require explicit identity tokens on every memory write; audit merged-entity logs weekly; integration tests asserting no cross-user retrieval
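
The cross-user retrieval test mentioned above can be sketched against a toy scoped store. The store here is invented for illustration; in a real suite the same assertion shape would run against the actual memory backend:

```python
class ScopedStore:
    """Toy user-scoped memory store for testing isolation (not Zep itself)."""
    def __init__(self):
        self._by_user: dict[str, list[str]] = {}

    def write(self, user_id: str, fact: str) -> None:
        self._by_user.setdefault(user_id, []).append(fact)

    def search(self, user_id: str, query: str) -> list[str]:
        # Scoping happens here: only this user's facts are ever candidates.
        return [f for f in self._by_user.get(user_id, []) if query in f]

def test_no_cross_user_leakage():
    store = ScopedStore()
    store.write("user-a", "allergy: peanuts")
    store.write("user-b", "allergy: none")
    # user-b must never see user-a's facts, even with a matching query
    assert store.search("user-b", "peanuts") == []
    assert store.search("user-a", "peanuts") == ["allergy: peanuts"]

test_no_cross_user_leakage()
```

Running this class of test in CI after every entity-resolution configuration change catches false merges before they leak memory across users.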

high Temporal fact tombstoning fails silently when ingestion lags, causing the agent to assert stale facts as current

Mitigation: Monitor ingestion lag; surface 'as-of-date' in retrieval results; treat older facts with explicit recency-aware prompting

medium Self-hosted deployment without backups loses the entire user memory on disk failure

Mitigation: Run scheduled backups of the backing graph store; test restore quarterly; document RTO/RPO in the deployment runbook

Use Case Scenarios

strong Long-running customer-support copilot that remembers user plans, past issues, and how those have changed over months

Temporal facts and Graphiti's entity resolution are exactly the right shape; the agent can reason about 'plan upgraded last Tuesday' as a first-class fact.

moderate Enterprise knowledge assistant for a healthcare workflow

Memory model fits, but no BAA on Cloud means the team must self-host on a BAA-signing backing store and accept higher operational burden.

weak Latency-critical agent with a single 'remember the last 10 turns' requirement

Overkill — a simple Redis-backed conversation buffer is the right shape and avoids the Graphiti operational footprint.
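
For that simple case, the whole memory layer reduces to a bounded buffer. A sketch using an in-process deque as a stand-in (in production the same shape maps to Redis LPUSH plus LTRIM):

```python
from collections import deque

class ConversationBuffer:
    """Keep only the last N turns; older turns are dropped automatically."""
    def __init__(self, max_turns: int = 10):
        self.turns: deque[dict] = deque(maxlen=max_turns)

    def append(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})

    def window(self) -> list[dict]:
        """The current context window, oldest turn first."""
        return list(self.turns)

buf = ConversationBuffer(max_turns=10)
for i in range(15):
    buf.append("user", f"turn {i}")
```

No entity resolution, no temporal facts, no graph store to operate: the right trade when recency of the last few turns is the only requirement.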

Stack Impact

L1 Drops a graph store (Neo4j, FalkorDB) and a vector store (pgvector, Qdrant) into L1 — plan capacity and HA accordingly
L5 Real authorization lives at L5 (OpenFGA or SpiceDB) — Zep does not enforce ABAC across tenants on its own
L6 Ingestion lag and entity-merge events need L6 monitoring — these are the early-warning signals for memory quality


This analysis is AI-generated using the INPACT and GOALS frameworks from "Trust Before Intelligence." Scores and assessments are algorithmic and may not reflect the vendor's complete capabilities. Always validate with your own evaluation.