# Research on Long-Term Memory in OpenClaw

> A detailed study of OpenClaw's long-term memory architecture, aimed at porting its ideas to Veles.

> Research date: 2026-03-25
> Purpose: Understand the OpenClaw memory architecture for porting to Veles

***

## Table of Contents

1. [Architecture Overview](#1-architecture-overview)
2. [Two Plugin Systems](#2-two-plugin-systems)
3. [Core Memory Module (src/memory/)](#3-core-memory-module)
4. [Memory Tools Exposed to Agent](#4-memory-tools-exposed-to-agent)
5. [System Prompt Integration](#5-system-prompt-integration)
6. [Indexing & Sync Pipeline](#6-indexing--sync-pipeline)
7. [Search Flow (Hybrid BM25 + Vector)](#7-search-flow)
8. [Conversation History Persistence](#8-conversation-history-persistence)
9. [Session Memory Hook (Dated Files)](#9-session-memory-hook-dated-files)
10. [Pre-Compaction Memory Flush](#10-pre-compaction-memory-flush)
11. [LanceDB Plugin (Independent Long-Term Memory)](#11-lancedb-plugin)
12. [Configuration Reference](#12-configuration-reference)
13. [Data Flow Summary](#13-data-flow-summary)
14. [Key Files Map](#14-key-files-map)
15. [Porting Considerations for Veles](#15-porting-considerations-for-veles)

***

## 1. Architecture Overview

OpenClaw's memory is a multi-layered system with **three independent mechanisms** that work together:

```
┌──────────────────────────────────────────────────────────────┐
│                     Agent Loop                                │
│                                                               │
│  ┌─────────────┐   ┌──────────────┐   ┌───────────────────┐ │
│  │ System Prompt│   │ memory_search│   │  memory_get       │ │
│  │ "## Memory  │   │ (tool call)  │   │  (tool call)      │ │
│  │  Recall"    │   └──────┬───────┘   └────────┬──────────┘ │
│  └──────┬──────┘          │                     │            │
│         │                 ▼                     ▼            │
│         │     ┌───────────────────────────────────┐         │
│         │     │     MemoryIndexManager (SQLite)    │         │
│         │     │  ┌─────────┐  ┌────────────────┐  │         │
│         │     │  │ FTS5    │  │ sqlite-vec     │  │         │
│         │     │  │ (BM25)  │  │ (cosine dist)  │  │         │
│         │     │  └─────────┘  └────────────────┘  │         │
│         │     │  ┌─────────────────────────────┐  │         │
│         │     │  │ Embedding Provider (auto)   │  │         │
│         │     │  │ openai/gemini/voyage/local  │  │         │
│         │     │  └─────────────────────────────┘  │         │
│         │     └───────────────────────────────────┘         │
│         │                      ▲                             │
│         │                      │ indexes                     │
│         │     ┌────────────────┴──────────────────┐         │
│         │     │         Source Files               │         │
│         │     │  MEMORY.md                         │         │
│         │     │  memory/*.md (dated files)         │         │
│         │     │  sessions/*.jsonl (experimental)   │         │
│         │     └───────────────────────────────────┘         │
│         │                      ▲                             │
│         │                      │ writes                      │
│         │     ┌────────────────┴──────────────────┐         │
│         │     │      Memory Writers                │         │
│         │     │  • Agent (direct file writes)      │         │
│         │     │  • Session-memory hook (/new)       │         │
│         │     │  • Pre-compaction flush             │         │
│         │     └───────────────────────────────────┘         │
└──────────────────────────────────────────────────────────────┘
```

**Key principle**: Memory is **plain Markdown files on disk**. The vector index is a derived cache — if you delete it, it rebuilds from the files.

***

## 2. Two Plugin Systems

Memory is provided via a **plugin slot** (`plugins.slots.memory`). Two plugins exist:

### 2a. `memory-core` (default, built-in)

* Provides `memory_search` and `memory_get` tools
* Uses SQLite + FTS5 + sqlite-vec for indexing and search
* Indexes `MEMORY.md` + `memory/*.md` files
* Hybrid BM25 + vector search
* Multiple embedding providers with auto-selection
* **Entry point**: `extensions/memory-core/index.ts`

### 2b. `memory-lancedb` (optional, install-on-demand)

* Completely independent vector DB system using LanceDB
* Provides `memory_recall`, `memory_store`, `memory_forget` tools
* **Auto-recall**: injects relevant memories before agent starts
* **Auto-capture**: analyzes user messages after agent ends, stores important facts
* Uses OpenAI embeddings (`text-embedding-3-small`)
* DB stored at `~/.openclaw/memory/lancedb`
* **Entry point**: `extensions/memory-lancedb/index.ts`

### Plugin selection

```yaml
plugins:
  slots:
    memory: "memory-core"     # default
    # memory: "memory-lancedb"  # alternative
    # memory: "none"            # disable
```

***

## 3. Core Memory Module

Located at `src/memory/`. This is the engine behind `memory-core`.

### Class hierarchy

```
MemoryManagerSyncOps (abstract)
  → MemoryManagerEmbeddingOps (abstract)
    → MemoryIndexManager (concrete, implements MemorySearchManager)
```

### Storage: SQLite

Database location: `<stateDir>/memory/<agentId>.sqlite`

**Tables:**

| Table             | Purpose                                                                                    |
| ----------------- | ------------------------------------------------------------------------------------------ |
| `meta`            | Key-value metadata (model, provider, chunk config, vector dims)                            |
| `files`           | Indexed file records (path, source, hash, mtime, size)                                     |
| `chunks`          | Text chunks with embeddings (id, path, source, start/end line, hash, text, embedding JSON) |
| `chunks_vec`      | Virtual table for sqlite-vec vector search (Float32Array blobs)                            |
| `chunks_fts`      | FTS5 full-text search virtual table                                                        |
| `embedding_cache` | Provider/model/hash-keyed embedding cache                                                  |

### Memory file discovery

The system indexes these files:

1. `MEMORY.md` or `memory.md` in workspace root
2. All `*.md` files recursively under `memory/` directory
3. Additional paths from `memorySearch.extraPaths` config
4. Session JSONL files (if `experimental.sessionMemory: true`)

### Document chunking

`chunkMarkdown()` in `internal.ts`:

* Splits by lines, groups into \~400-token chunks with 80-token overlap
* Each chunk: `startLine`, `endLine`, `text`, `hash` (SHA-256)
* Token estimate: `characters / 4`
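
The chunking rules above can be sketched as follows. This is a minimal illustration, not the actual `chunkMarkdown()` from `internal.ts`: the names `chunkMarkdownSketch`, `MarkdownChunk`, and `estimateTokens` are hypothetical, and the way overlap is carried (stepping back over whole trailing lines) is an assumption.

```typescript
import { createHash } from "node:crypto";

// Hypothetical chunk shape mirroring the fields listed above.
interface MarkdownChunk {
  startLine: number; // 1-based, inclusive
  endLine: number; // 1-based, inclusive
  text: string;
  hash: string; // SHA-256 hex digest of `text`
}

// Rough token estimate from the notes: characters / 4.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function chunkMarkdownSketch(
  content: string,
  maxTokens = 400,
  overlapTokens = 80,
): MarkdownChunk[] {
  const lines = content.split("\n");
  const chunks: MarkdownChunk[] = [];
  let start = 0;
  while (start < lines.length) {
    // Grow the chunk line by line until the token budget would be exceeded.
    // The first line is always accepted so an oversized line cannot stall the loop.
    let end = start;
    let tokens = 0;
    while (end < lines.length) {
      const lineTokens = estimateTokens((lines[end] ?? "") + "\n");
      if (end > start && tokens + lineTokens > maxTokens) break;
      tokens += lineTokens;
      end++;
    }
    const text = lines.slice(start, end).join("\n");
    const hash = createHash("sha256").update(text).digest("hex");
    chunks.push({ startLine: start + 1, endLine: end, text, hash });
    if (end >= lines.length) break;
    // Assumption: overlap is carried by re-reading whole trailing lines
    // worth at most `overlapTokens`, always advancing by at least one line.
    let next = end;
    let carried = 0;
    while (next > start + 1) {
      const lineTokens = estimateTokens((lines[next - 1] ?? "") + "\n");
      if (carried + lineTokens > overlapTokens) break;
      carried += lineTokens;
      next--;
    }
    start = next;
  }
  return chunks;
}
```

Keeping the hash per chunk is what lets an embedding cache skip unchanged text later in the pipeline.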

### Embedding providers

Factory: `createEmbeddingProvider()` in `embeddings.ts`. Supports:

| Provider | Default Model                       | File                    |
| -------- | ----------------------------------- | ----------------------- |
| OpenAI   | `text-embedding-3-small`            | `embeddings-openai.ts`  |
| Gemini   | `gemini-embedding-001`              | `embeddings-gemini.ts`  |
| Voyage   | `voyage-4-large`                    | `embeddings-voyage.ts`  |
| Mistral  | `mistral-embed`                     | `embeddings-mistral.ts` |
| Ollama   | `nomic-embed-text`                  | `embeddings-ollama.ts`  |
| Local    | `embeddinggemma-300m-qat-Q8_0.gguf` | via `node-llama-cpp`    |

**Auto-selection (`provider: "auto"`):**

1. Try local model if `modelPath` configured
2. Try remote in order: openai → gemini → voyage → mistral
3. If all fail (missing keys) → **FTS-only mode** (keyword search, no vectors)
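
A minimal sketch of the auto-selection order, assuming that "available" simply means "an API key is present". The names `resolveAutoProvider`, `AutoSelectInput`, and `apiKeys` are hypothetical, and the real factory presumably also falls through when the local model fails to load, which this sketch does not model.

```typescript
type RemoteProviderId = "openai" | "gemini" | "voyage" | "mistral";

type ResolvedProvider =
  | { mode: "vector"; provider: "local" | RemoteProviderId }
  | { mode: "fts-only" };

// Hypothetical input: a configured local model path plus whatever keys exist.
interface AutoSelectInput {
  modelPath?: string;
  apiKeys: Partial<Record<RemoteProviderId, string>>;
}

// Remote providers are tried in the order given in the notes.
const REMOTE_ORDER: readonly RemoteProviderId[] = ["openai", "gemini", "voyage", "mistral"];

function resolveAutoProvider(input: AutoSelectInput): ResolvedProvider {
  // 1. Prefer the local model when a path is configured.
  if (input.modelPath) return { mode: "vector", provider: "local" };
  // 2. Otherwise take the first remote provider that has a key.
  for (const id of REMOTE_ORDER) {
    if (input.apiKeys[id]) return { mode: "vector", provider: id };
  }
  // 3. No provider available: keyword search only, no vectors.
  return { mode: "fts-only" };
}
```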

***

## 4. Memory Tools Exposed to Agent

### `memory_search` (from memory-core)

Semantic + keyword hybrid search over memory files.

* Agent calls it with a text query
* Returns matching chunks with file path, line numbers, score, text
* Searches `MEMORY.md`, `memory/*.md`, optionally session transcripts

### `memory_get` (from memory-core)

Safe snippet read from memory files.

* Parameters: file path, `from` line, `lines` count
* Returns the requested lines from a memory file
* Used for targeted follow-up reads after search
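
The following sketch shows one way a "safe snippet read" could work. It is an illustration under assumptions, not the plugin's code: it assumes `from` is 1-based, that only `MEMORY.md`, `memory.md`, and Markdown files under `memory/` are readable, and the helper names `isAllowedMemoryPath` and `readLineRange` are hypothetical.

```typescript
import { posix } from "node:path";

// Assumption: only the root memory file and Markdown files under memory/ may be read.
function isAllowedMemoryPath(relPath: string): boolean {
  const normalized = posix.normalize(relPath.replace(/\\/g, "/"));
  if (posix.isAbsolute(normalized) || normalized === ".." || normalized.startsWith("../")) {
    return false;
  }
  if (normalized === "MEMORY.md" || normalized === "memory.md") return true;
  return normalized.startsWith("memory/") && normalized.endsWith(".md");
}

// Returns `lines` lines starting at the 1-based line `from` (assumed indexing).
function readLineRange(content: string, from: number, lines: number): string {
  const all = content.split("\n");
  const start = Math.max(0, from - 1);
  return all.slice(start, start + Math.max(0, lines)).join("\n");
}
```

A caller would check `isAllowedMemoryPath()` first, read the file, and then pass its content to `readLineRange()` with the line numbers returned by `memory_search`.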

### `memory_recall` / `memory_store` / `memory_forget` (from memory-lancedb)

Available only when the `memory-lancedb` plugin occupies the memory slot in place of `memory-core`.

***

## 5. System Prompt Integration

The memory-core plugin registers a **prompt section builder** via `api.registerMemoryPromptSection(buildPromptSection)`.

This injects a `## Memory Recall` section into the system prompt that:

* Tells the agent about `memory_search` and `memory_get` tools
* Instructs the agent to search memory before answering recall-dependent questions
* Controls citation mode (`on`/`off`/`auto`) for file paths and line numbers in replies
* Only appears if the tools are in `availableTools`

**Key file**: `src/memory/prompt-section.ts` — singleton pattern, only one memory plugin's prompt builder can be active.

***

## 6. Indexing & Sync Pipeline

### `runSync()` method (in `manager-sync-ops.ts`)

1. **Memory files sync**: List all `MEMORY.md` + `memory/*.md`, compare hashes against DB, re-index changed files
2. **Session files sync**: List session JSONL files, extract user/assistant messages, build text entries, index them
3. **Atomic reindex**: If embedding model/provider changes, wipe all chunks and re-index everything
4. **Embedding batching**: Chunks grouped by byte size (max 8000 tokens/batch), sent to provider in parallel (concurrency: 4)
5. **Embedding cache**: SHA-256 content hash as cache key — unchanged content is never re-embedded
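
The hash comparison in step 1 can be sketched like this. It is a simplified model rather than the real `runSync()`: files are passed in as an in-memory map, the names `planMemorySync` and `IndexedFileRecord` are hypothetical, and removing index entries for files that no longer exist is an assumption not stated in the notes.

```typescript
import { createHash } from "node:crypto";

// Hypothetical subset of a row in the `files` table.
interface IndexedFileRecord {
  path: string;
  hash: string;
}

const sha256Hex = (text: string): string =>
  createHash("sha256").update(text).digest("hex");

// `onDisk` maps a relative path to the current file content.
function planMemorySync(
  onDisk: Map<string, string>,
  indexed: IndexedFileRecord[],
): { toIndex: string[]; toRemove: string[] } {
  const known = new Map<string, string>();
  for (const record of indexed) known.set(record.path, record.hash);

  // Re-index files that are new or whose content hash changed.
  const toIndex: string[] = [];
  onDisk.forEach((content, path) => {
    if (known.get(path) !== sha256Hex(content)) toIndex.push(path);
  });

  // Assumption: drop index entries for files that disappeared from disk.
  const toRemove = indexed
    .filter((record) => !onDisk.has(record.path))
    .map((record) => record.path);

  return { toIndex, toRemove };
}
```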

### Sync triggers

| Trigger           | When                                                    |
| ----------------- | ------------------------------------------------------- |
| File watcher      | chokidar on `memory/` dir and `MEMORY.md`, debounced    |
| On search         | If dirty flag is set (default: `sync.onSearch: true`)   |
| Session start     | Initial sync                                            |
| Periodic interval | Configurable `sync.intervalMinutes`                     |
| Session delta     | When accumulated session changes > 100KB or 50 messages |
| Post-compaction   | Forced sync after context compaction                    |

***

## 7. Search Flow

`MemoryIndexManager.search()`:

### FTS-only mode (no embedding provider)

1. Extract keywords from query
2. Run BM25 full-text search via FTS5 `MATCH`
3. Merge and deduplicate

### Hybrid mode (default)

1. **Keyword search**: FTS5 BM25 on `chunks_fts` table
2. **Vector search**: Embed query → `vec_distance_cosine()` on `chunks_vec` (or in-memory cosine fallback)
3. **Merge**: `score = 0.7 * vectorScore + 0.3 * textScore`
4. **Optional temporal decay**: Exponential, configurable half-life (default 30 days, disabled by default)
5. **Optional MMR re-ranking**: Jaccard similarity diversity, lambda 0.7 (disabled by default)
6. **Filter**: drop results with `score < minScore` (default `0.35`)
7. **Limit**: return at most `maxResults` results (default `6`)
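
The merge, filter, and limit steps can be sketched as below. This is an illustration with the documented default weights, not the code in `hybrid.ts`: it assumes both score lists are already normalized to `[0, 1]`, that a chunk missing from one list contributes `0` for that side, and the names `mergeHybridHits`, `ScoredHit`, and `applyTemporalDecay` are hypothetical.

```typescript
interface ScoredHit {
  id: string;
  score: number; // assumed normalized to [0, 1]
}

interface MergeOptions {
  vectorWeight: number;
  textWeight: number;
  minScore: number;
  maxResults: number;
}

const DEFAULT_MERGE: MergeOptions = {
  vectorWeight: 0.7,
  textWeight: 0.3,
  minScore: 0.35,
  maxResults: 6,
};

function mergeHybridHits(
  vectorHits: ScoredHit[],
  textHits: ScoredHit[],
  opts: MergeOptions = DEFAULT_MERGE,
): ScoredHit[] {
  // Collect both scores per chunk id; a missing side counts as 0 (assumption).
  const combined = new Map<string, { vec: number; text: number }>();
  for (const hit of vectorHits) combined.set(hit.id, { vec: hit.score, text: 0 });
  for (const hit of textHits) {
    const entry = combined.get(hit.id) ?? { vec: 0, text: 0 };
    entry.text = hit.score;
    combined.set(hit.id, entry);
  }
  const merged: ScoredHit[] = [];
  combined.forEach((s, id) => {
    merged.push({ id, score: opts.vectorWeight * s.vec + opts.textWeight * s.text });
  });
  return merged
    .filter((hit) => hit.score >= opts.minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, opts.maxResults);
}

// Optional exponential decay: the score halves every `halfLifeDays`.
function applyTemporalDecay(score: number, ageDays: number, halfLifeDays = 30): number {
  return score * Math.pow(0.5, ageDays / halfLifeDays);
}
```

With these weights a chunk found only by keyword search tops out at `0.3`, below the default `minScore`, so in this sketch it survives only if the vector side also contributes.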

***

## 8. Conversation History Persistence

### Session transcripts (JSONL)

Every conversation is stored as a **JSONL file** per session:

* Location: `~/.openclaw/agents/<agentId>/sessions/<sessionId>.jsonl`
* Each line: JSON record with `type: "message"`, full message object (role, content, usage)
* Session metadata: `~/.openclaw/agents/<agentId>/sessions/sessions.json`

### Session scoping

Sessions are scoped by key format:

* `agent:<agentId>:<mainKey>` — DM sessions
* `agent:<agentId>:<channel>:group:<id>` — group sessions
* DM scope modes: `main`, `per-peer`, `per-channel-peer`, `per-account-channel-peer`

### Session lifecycle

* **Daily reset**: at 4 AM local time
* **Idle reset**: after configurable idle timeout
* **Manual reset**: `/new` or `/reset` commands
* **Rotation**: when transcript exceeds `rotateBytes`
* **Pruning**: `session.maintenance.pruneAfter` for old sessions

### Session transcripts as searchable memory (experimental)

When `memorySearch.experimental.sessionMemory: true` and `sources: ["memory", "sessions"]`:

* Session JSONL files are parsed, user/assistant messages extracted
* Content is chunked, embedded, and indexed alongside memory files
* Becomes searchable via `memory_search`

***

## 9. Session Memory Hook (Dated Files)

**This is the mechanism that creates `memory/YYYY-MM-DD-slug.md` files.**

### Files

* `src/hooks/bundled/session-memory/handler.ts`
* `src/hooks/bundled/session-memory/transcript.ts`

### Trigger

Fires on `/new` or `/reset` commands (session end/rotation).

### Process

1. Find the **previous session's** transcript JSONL file
2. Read the **last N messages** (default: 15, configurable)
3. Send messages to LLM to **generate a descriptive filename slug** (e.g., "api-design", "vendor-pitch")
4. Write `memory/YYYY-MM-DD-slug.md` with:

```markdown
# Session: 2026-01-16 14:30:00 UTC

- **Session Key**: agent:main:main
- **Session ID**: abc123def456
- **Source**: telegram

## Conversation Summary

user: What about the API design?
assistant: I suggest we use REST with...
```
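
As a rough sketch, the file name and body shown above could be produced as follows. The types and function names (`SessionSnapshot`, `datedMemoryPath`, `renderSessionMemory`) are hypothetical, the slug is assumed to have been generated already, and deriving the date prefix from UTC is an assumption.

```typescript
// Hypothetical snapshot of the session that just ended.
interface SessionSnapshot {
  endedAt: Date;
  sessionKey: string;
  sessionId: string;
  source: string;
  slug: string; // produced by the LLM in step 3
  messages: { role: "user" | "assistant"; text: string }[];
}

// memory/YYYY-MM-DD-slug.md, with the date taken from UTC (assumption).
function datedMemoryPath(s: SessionSnapshot): string {
  const day = s.endedAt.toISOString().slice(0, 10);
  return `memory/${day}-${s.slug}.md`;
}

function renderSessionMemory(s: SessionSnapshot): string {
  // "2026-01-16T14:30:00.000Z" -> "2026-01-16 14:30:00 UTC"
  const stamp = s.endedAt.toISOString().replace("T", " ").slice(0, 19) + " UTC";
  const header = [
    `# Session: ${stamp}`,
    "",
    `- **Session Key**: ${s.sessionKey}`,
    `- **Session ID**: ${s.sessionId}`,
    `- **Source**: ${s.source}`,
    "",
    "## Conversation Summary",
    "",
  ];
  const body = s.messages.map((m) => `${m.role}: ${m.text}`);
  return [...header, ...body].join("\n") + "\n";
}
```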

### Transcript reading (`transcript.ts`)

* Reads JSONL files, parses `type: "message"` entries
* Extracts user and assistant messages (skips `/` commands)
* Supports fallback to `.reset.` rotated transcript files
* Finds previous session files by session ID, topic variants, or most recent
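
A minimal sketch of the message extraction, not the code in `transcript.ts`. The record shape is an assumption: each line is taken to look like `{ "type": "message", "message": { "role": ..., "content": ... } }`, with `content` either a string or an array of parts carrying a `text` field. The names `extractTranscriptMessages` and `contentToText` are hypothetical.

```typescript
interface TranscriptMessage {
  role: "user" | "assistant";
  text: string;
}

// Assumption: content is a plain string or an array of parts with a `text` field.
function contentToText(content: unknown): string {
  if (typeof content === "string") return content;
  if (!Array.isArray(content)) return "";
  return content
    .map((part: unknown) => {
      if (typeof part === "object" && part !== null && "text" in part) {
        const text = (part as { text: unknown }).text;
        return typeof text === "string" ? text : "";
      }
      return "";
    })
    .join("");
}

function extractTranscriptMessages(jsonl: string, lastN = 15): TranscriptMessage[] {
  const messages: TranscriptMessage[] = [];
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // skip malformed lines instead of failing the whole read
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const record = parsed as { type?: unknown; message?: { role?: unknown; content?: unknown } };
    if (record.type !== "message") continue;
    if (typeof record.message !== "object" || record.message === null) continue;
    const rawRole = record.message.role;
    const role = rawRole === "user" ? "user" : rawRole === "assistant" ? "assistant" : null;
    if (role === null) continue;
    const text = contentToText(record.message.content).trim();
    // Skip empty entries and slash commands such as /new or /reset.
    if (text === "" || text.startsWith("/")) continue;
    messages.push({ role, text });
  }
  return lastN > 0 ? messages.slice(-lastN) : [];
}
```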

***

## 10. Pre-Compaction Memory Flush

**Automatically captures memories before the context window fills up and gets compacted.**

### Files

* `src/auto-reply/reply/memory-flush.ts`
* `src/auto-reply/reply/agent-runner-memory.ts`

### Trigger conditions (`shouldRunMemoryFlush`)

* Session tokens exceed `contextWindow - reserveTokensFloor - softThresholdTokens` (default soft threshold: 4000 tokens)
* OR transcript file size exceeds `forceFlushTranscriptBytes` (default: 2MB)
* AND no flush has already run for the current compaction cycle
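
The trigger logic can be sketched as a pure function. This is a simplified reading of the conditions above, not the real `shouldRunMemoryFlush`: the `FlushState` shape and the name `shouldFlushSketch` are hypothetical, and the input values a caller would supply (for example a particular `reserveTokensFloor`) are not taken from the source.

```typescript
// Hypothetical inputs; field names follow the settings mentioned above.
interface FlushState {
  sessionTokens: number;
  transcriptBytes: number;
  contextWindow: number;
  reserveTokensFloor: number;
  softThresholdTokens: number; // default 4000
  forceFlushTranscriptBytes: number; // default 2 MB
  flushedThisCycle: boolean;
}

function shouldFlushSketch(s: FlushState): boolean {
  // Never flush twice within the same compaction cycle.
  if (s.flushedThisCycle) return false;
  const tokenThreshold = s.contextWindow - s.reserveTokensFloor - s.softThresholdTokens;
  const overTokens = s.sessionTokens > tokenThreshold;
  const overBytes = s.transcriptBytes > s.forceFlushTranscriptBytes;
  return overTokens || overBytes;
}
```

For example, with a 200000-token context window, an illustrative reserve floor of 20000, and the default soft threshold of 4000, the token condition fires once the session passes 176000 tokens.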

### Execution

1. Read token usage from session transcript
2. Project next input size
3. Run a **dedicated LLM agent turn** with system prompt instructing it to write durable memories
4. Target file: `memory/YYYY-MM-DD.md` (uses user's timezone)
5. Instructions to LLM: **append only, never overwrite, never edit root memory files**

### Default prompt

> "Pre-compaction memory flush. Store durable memories only in `memory/YYYY-MM-DD.md`. If nothing to store, reply with `[silent]`."

### Configuration

```yaml
agents:
  defaults:
    compaction:
      memoryFlush:
        enabled: true
        softThresholdTokens: 4000
        systemPrompt: "..."
        prompt: "..."
```

***

## 11. LanceDB Plugin

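An **alternative** long-term memory system that is independent of the file-based index; it is selected in place of `memory-core` through `plugins.slots.memory`.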

### Auto-recall (`before_agent_start` hook)

1. Embed the incoming user message
2. Search top 3 memories (minScore: 0.3) from LanceDB
3. Inject as `<relevant-memories>` XML block into context

### Auto-capture (`agent_end` hook)

1. Scan user messages for capturable content
2. Rule-based trigger detection:
   * Preference patterns ("I prefer...", "I like...")
   * Contact info
   * Decision language
   * Explicit "remember" instructions
3. Category detection: preference, fact, decision, entity, other
4. Duplicate check: skip if >0.95 cosine similarity with existing memory
5. Store up to 3 memories per conversation
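
A sketch of what rule-based trigger detection could look like. The regular expressions, the category assigned to each rule, and the names `CAPTURE_TRIGGERS`, `detectCaptureCategory`, and `selectCaptures` are all illustrative assumptions; the plugin's real patterns are not reproduced here. The length bounds of 10 to 500 characters come from the `shouldCapture()` filter described under Safety below, and the embedding-based duplicate check is omitted.

```typescript
type MemoryCategory = "preference" | "fact" | "decision" | "entity" | "other";

// Illustrative patterns only; first match wins.
const CAPTURE_TRIGGERS: { category: MemoryCategory; pattern: RegExp }[] = [
  { category: "preference", pattern: /\bI (prefer|like|love|hate)\b/i },
  { category: "entity", pattern: /[\w.+-]+@[\w-]+\.[\w.]+|\+\d{7,}/ },
  { category: "decision", pattern: /\b(we decided|let's go with|we will use)\b/i },
  { category: "other", pattern: /\bremember\b/i },
];

function detectCaptureCategory(text: string): MemoryCategory | null {
  // Length bounds from the shouldCapture() filter: 10 to 500 characters.
  if (text.length < 10 || text.length > 500) return null;
  for (const trigger of CAPTURE_TRIGGERS) {
    if (trigger.pattern.test(text)) return trigger.category;
  }
  return null;
}

// Keeps at most `maxPerConversation` capturable user messages, in order.
function selectCaptures(
  userMessages: string[],
  maxPerConversation = 3,
): { text: string; category: MemoryCategory }[] {
  const picked: { text: string; category: MemoryCategory }[] = [];
  for (const text of userMessages) {
    if (picked.length >= maxPerConversation) break;
    const category = detectCaptureCategory(text);
    if (category !== null) picked.push({ text, category });
  }
  return picked;
}
```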

### Safety

* `shouldCapture()` filters: length 10-500 chars, skip injected memory context, skip system content, skip markdown-heavy output, skip emoji-heavy content
* `looksLikePromptInjection()` protection
* HTML escaping in memory injection

### Storage

* LanceDB table `"memories"` with fields: id, text, vector, importance, category, createdAt
* L2 distance converted to similarity: `1 / (1 + distance)`
* DB path: `~/.openclaw/memory/lancedb`
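
The distance conversion, the auto-recall cut-offs (top 3, `minScore` 0.3), and escaped injection can be combined in a short sketch. The row shape, the numbered-list layout inside the `<relevant-memories>` block, and the names `l2ToSimilarity`, `escapeForPrompt`, and `formatRelevantMemories` are assumptions for illustration, not the plugin's actual output format.

```typescript
// Hypothetical search row: stored text plus its L2 distance to the query.
interface RecalledRow {
  text: string;
  distance: number;
}

// Conversion given above: similarity = 1 / (1 + distance).
const l2ToSimilarity = (distance: number): number => 1 / (1 + distance);

// Escape HTML-significant characters so stored text cannot break out of the block.
function escapeForPrompt(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

function formatRelevantMemories(
  rows: RecalledRow[],
  limit = 3,
  minScore = 0.3,
): string | null {
  const kept = rows
    .map((row) => ({ text: row.text, score: l2ToSimilarity(row.distance) }))
    .filter((m) => m.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
  if (kept.length === 0) return null;
  const body = kept.map((m, i) => `${i + 1}. ${escapeForPrompt(m.text)}`).join("\n");
  return `<relevant-memories>\n${body}\n</relevant-memories>`;
}
```

Under this conversion a distance of `0` maps to similarity `1`, and anything farther than about `2.33` falls below the `0.3` cut-off.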

***

## 12. Configuration Reference

### Core memory search config (`agents.defaults.memorySearch`)

| Setting                              | Default      | Description                              |
| ------------------------------------ | ------------ | ---------------------------------------- |
| `enabled`                            | `true`       | Enable memory search                     |
| `provider`                           | `"auto"`     | Embedding provider                       |
| `sources`                            | `["memory"]` | What to index (`"memory"`, `"sessions"`) |
| `extraPaths`                         | `[]`         | Additional markdown paths to index       |
| `store.driver`                       | `"sqlite"`   | Storage backend                          |
| `store.vector.enabled`               | `true`       | Enable sqlite-vec acceleration           |
| `chunking.tokens`                    | `400`        | Chunk size in tokens                     |
| `chunking.overlap`                   | `80`         | Overlap between chunks                   |
| `query.maxResults`                   | `6`          | Max search results                       |
| `query.minScore`                     | `0.35`       | Min similarity threshold                 |
| `query.hybrid.enabled`               | `true`       | Enable hybrid search                     |
| `query.hybrid.vectorWeight`          | `0.7`        | Vector search weight                     |
| `query.hybrid.textWeight`            | `0.3`        | Keyword search weight                    |
| `query.hybrid.mmr.enabled`           | `false`      | MMR diversity re-ranking                 |
| `query.hybrid.temporalDecay.enabled` | `false`      | Recency-aware scoring                    |
| `cache.enabled`                      | `true`       | Embedding cache                          |
| `sync.onSearch`                      | `true`       | Sync before search                       |
| `sync.watch`                         | `true`       | File watcher                             |
| `fallback`                           | `"none"`     | Fallback provider                        |

### Memory backend config (`memory`)

```yaml
memory:
  backend: "builtin"  # or "qmd"
  citations: "auto"   # "on" | "off" | "auto"
```

### Plugin config

```yaml
plugins:
  slots:
    memory: "memory-core"  # or "memory-lancedb" or "none"
  entries:
    memory-lancedb:
      config:
        embedding:
          provider: "openai"
          model: "text-embedding-3-small"
          apiKey: "..."
        autoCapture: true
        autoRecall: true
```

***

## 13. Data Flow Summary

### Write paths (how memories get created)

```
1. Agent directly edits MEMORY.md or memory/*.md
   └── File watcher detects change → re-index

2. Session-memory hook (on /new or /reset)
   └── Reads last N messages from JSONL transcript
   └── LLM generates slug
   └── Writes memory/YYYY-MM-DD-slug.md → re-index

3. Pre-compaction flush (before context compaction)
   └── Dedicated LLM turn
   └── Appends to memory/YYYY-MM-DD.md → re-index

4. LanceDB auto-capture (if lancedb plugin active)
   └── Rule-based trigger on user messages
   └── Embeds + stores in LanceDB (separate from file-based memory)
```

### Read paths (how memories get recalled)

```
1. Agent calls memory_search tool
   └── MemoryIndexManager.search()
   └── Hybrid BM25 + vector search on SQLite
   └── Returns ranked chunks with file/line references

2. Agent calls memory_get tool
   └── Direct file read with line range

3. LanceDB auto-recall (if lancedb plugin active)
   └── before_agent_start hook
   └── Embed user message → search LanceDB
   └── Inject <relevant-memories> into context

4. System prompt loads MEMORY.md content directly
   └── Always available, no search needed
   └── Only in main private session
```

### Persistence across sessions

```
Session N ends (/new or /reset)
  │
  ├── JSONL transcript already on disk
  │     ~/.openclaw/agents/<id>/sessions/<sessionId>.jsonl
  │
  ├── Session-memory hook fires
  │     → memory/2026-03-25-api-design.md created
  │
  └── Next session starts
        │
        ├── MEMORY.md loaded into system prompt
        ├── memory/*.md files indexed (including new dated file)
        ├── Agent can memory_search for past conversations
        └── (optional) session transcripts indexed if experimental flag on
```

***

## 14. Key Files Map

### Plugin entry points

* `extensions/memory-core/index.ts` — default memory plugin
* `extensions/memory-lancedb/index.ts` — LanceDB alternative

### Core memory engine

* `src/memory/manager.ts` — MemoryIndexManager (877 lines, main class)
* `src/memory/manager-sync-ops.ts` — file/session sync operations
* `src/memory/manager-embedding-ops.ts` — embedding batch operations
* `src/memory/manager-search.ts` — vector + keyword search implementation
* `src/memory/internal.ts` — file listing, chunking, cosine similarity
* `src/memory/hybrid.ts` — hybrid search merge logic
* `src/memory/mmr.ts` — Maximal Marginal Relevance re-ranking
* `src/memory/temporal-decay.ts` — recency-aware scoring
* `src/memory/memory-schema.ts` — SQLite schema definitions
* `src/memory/session-files.ts` — session transcript indexing
* `src/memory/search-manager.ts` — factory with QMD fallback
* `src/memory/prompt-section.ts` — system prompt injection

### Embedding providers

* `src/memory/embeddings.ts` — factory + local provider
* `src/memory/embeddings-openai.ts`
* `src/memory/embeddings-gemini.ts`
* `src/memory/embeddings-voyage.ts`
* `src/memory/embeddings-mistral.ts`
* `src/memory/embeddings-ollama.ts`

### Session memory hook

* `src/hooks/bundled/session-memory/handler.ts` — creates dated memory files
* `src/hooks/bundled/session-memory/transcript.ts` — reads JSONL transcripts

### Pre-compaction flush

* `src/auto-reply/reply/memory-flush.ts` — flush trigger logic
* `src/auto-reply/reply/agent-runner-memory.ts` — flush execution

### Configuration

* `src/config/zod-schema.agent-runtime.ts` — MemorySearchSchema (lines 587-734)
* `src/config/types.tools.ts` — MemorySearchConfig type
* `src/config/types.memory.ts` — MemoryConfig, QMD types

### Documentation

* `docs/concepts/memory.md` — conceptual overview
* `docs/reference/memory-config.md` — full config reference (712 lines)
* `docs/cli/memory.md` — CLI reference

***

## 15. Porting Considerations for Veles

### What's worth porting (ranked by impact vs effort)

**High value, moderate effort:**

1. **Dated memory files** (`memory/YYYY-MM-DD-slug.md`) — session end hook that saves the last N messages of the conversation into a dated file. Needs: session end event, LLM call for slug generation, file write.
2. **Pre-compaction memory flush** — before context compaction, run a dedicated LLM turn to extract durable memories. Needs: token counting, compaction event hook, LLM call.
3. **`memory_search` tool** — let the agent search its own memory files. Needs: FTS at minimum, vector search is a bonus.

**Medium value, lower effort:**

4. **Memory file convention** — `MEMORY.md` as curated long-term + `memory/*.md` as daily logs. Zero code needed, just a convention.
5. **System prompt memory section** — inject recall instructions into system prompt when memory tools are available.

**Nice to have, higher effort:**

6. **Hybrid BM25 + vector search** — requires SQLite FTS5 + embedding provider.
7. **Auto-recall/auto-capture** (LanceDB style) — requires lifecycle hooks + embedding pipeline.

### Minimal integration points needed

* **Session end event** → write dated memory file
* **Pre-compaction event** → run memory flush LLM turn
* **Tool registration** → add `memory_search` / `memory_get` to agent
* **System prompt builder** → add memory recall instructions
* **File watcher** (optional) → re-index on memory file changes

### What to skip

* QMD backend (experimental, complex)
* Multimodal memory (niche)
* sqlite-vec (FTS-only is a good starting point)
* LanceDB plugin (separate concern, memory-core is sufficient)
