fanfpy/AssetManager.UniApp

Fork 0

虾球 f6a1c1959a feat: 完成 P1 任务 - 今日盈亏、风险平价补全、Mock 数据 PreviousClose

2026-03-12 10:06:57 +00:00

28 KiB

Raw Blame History

🧠 memory-lancedb-pro · OpenClaw Plugin

OpenClaw 增强型 LanceDB 长期记忆插件

混合检索（Vector + BM25）· 跨编码器 Rerank · 多 Scope 隔离 · 管理 CLI

English | 简体中文

📺 视频教程

观看完整教程 — 涵盖安装、配置，以及混合检索的底层原理。

🔗 https://youtu.be/MtukF1C8epQ

🔗 https://www.bilibili.com/video/BV1zUf2BGEgn/

为什么需要这个插件？

OpenClaw 内置的 memory-lancedb 插件仅提供基本的向量搜索。memory-lancedb-pro 在此基础上进行了全面升级：

功能	内置 `memory-lancedb`	memory-lancedb-pro
向量搜索	✅	✅
BM25 全文检索	❌	✅
混合融合（Vector + BM25）	❌	✅
跨编码器 Rerank（Jina）	❌	✅
时效性加成	❌	✅
时间衰减	❌	✅
长度归一化	❌	✅
MMR 多样性去重	❌	✅
多 Scope 隔离	❌	✅
噪声过滤	❌	✅
自适应检索	❌	✅
管理 CLI	❌	✅
Session 记忆	❌	✅
Task-aware Embedding	❌	✅
任意 OpenAI 兼容 Embedding	有限	✅（OpenAI、Gemini、Jina、Ollama 等）

架构概览

┌─────────────────────────────────────────────────────────┐
│                   index.ts (入口)                        │
│  插件注册 · 配置解析 · 生命周期钩子 · 自动捕获/回忆       │
└────────┬──────────┬──────────┬──────────┬───────────────┘
         │          │          │          │
    ┌────▼───┐ ┌────▼───┐ ┌───▼────┐ ┌──▼──────────┐
    │ store  │ │embedder│ │retriever│ │   scopes    │
    │ .ts    │ │ .ts    │ │ .ts    │ │    .ts      │
    └────────┘ └────────┘ └────────┘ └─────────────┘
         │                     │
    ┌────▼───┐           ┌─────▼──────────┐
    │migrate │           │noise-filter.ts │
    │ .ts    │           │adaptive-       │
    └────────┘           │retrieval.ts    │
                         └────────────────┘
    ┌─────────────┐   ┌──────────┐
    │  tools.ts   │   │  cli.ts  │
    │ (Agent API) │   │ (CLI)    │
    └─────────────┘   └──────────┘

文件说明

文件	用途
`index.ts`	插件入口。注册到 OpenClaw Plugin API，解析配置，挂载 `before_agent_start`（自动回忆）、`agent_end`（自动捕获）、`command:new`（Session 记忆）等钩子
`openclaw.plugin.json`	插件元数据 + 完整 JSON Schema 配置声明（含 `uiHints`）
`package.json`	NPM 包信息，依赖 `@lancedb/lancedb`、`openai`、`@sinclair/typebox`
`cli.ts`	CLI 命令实现：`memory list/search/stats/delete/delete-bulk/export/import/reembed/migrate`
`src/store.ts`	LanceDB 存储层。表创建 / FTS 索引 / Vector Search / BM25 Search / CRUD / 批量删除 / 统计
`src/embedder.ts`	Embedding 抽象层。兼容 OpenAI API 的任意 Provider（OpenAI、Gemini、Jina、Ollama 等），支持 task-aware embedding（`taskQuery`/`taskPassage`）
`src/retriever.ts`	混合检索引擎。Vector + BM25 → RRF 融合 → Jina Cross-Encoder Rerank → Recency Boost → Importance Weight → Length Norm → Time Decay → Hard Min Score → Noise Filter → MMR Diversity
`src/scopes.ts`	多 Scope 访问控制。支持 `global`、`agent:<id>`、`custom:<name>`、`project:<id>`、`user:<id>` 等 Scope 模式
`src/tools.ts`	Agent 工具定义：`memory_recall`、`memory_store`、`memory_forget`（核心）+ `memory_stats`、`memory_list`（管理）
`src/noise-filter.ts`	噪声过滤器。过滤 Agent 拒绝回复、Meta 问题、寒暄等低质量记忆
`src/adaptive-retrieval.ts`	自适应检索。判断 query 是否需要触发记忆检索（跳过问候、命令、简单确认等）
`src/migrate.ts`	迁移工具。从旧版 `memory-lancedb` 插件迁移数据到 Pro 版

核心特性

1. 混合检索 (Hybrid Retrieval)

Query → embedQuery() ─┐
                       ├─→ RRF 融合 → Rerank → 时效加成 → 重要性加权 → 过滤
Query → BM25 FTS ─────┘

向量搜索: 语义相似度搜索（cosine distance via LanceDB ANN）
BM25 全文搜索: 关键词精确匹配（LanceDB FTS 索引）
融合策略: Vector score 为基础，BM25 命中给予 15% 加成（非传统 RRF，经过调优）
可配置权重: vectorWeight、bm25Weight、minScore

2. 跨编码器 Rerank

Jina Reranker API: jina-reranker-v3（5s 超时保护）
混合评分: 60% cross-encoder score + 40% 原始融合分
降级策略: API 失败时回退到 cosine similarity rerank

3. 多层评分管线

阶段	公式	效果
时效加成	`exp(-ageDays / halfLife) * weight`	新记忆分数更高（默认半衰期 14 天，权重 0.10）
重要性加权	`score = (0.7 + 0.3 importance)`	importance=1.0 → ×1.0，importance=0.5 → ×0.85
长度归一化	`score = 1 / (1 + 0.5 log2(len/anchor))`	防止长条目凭关键词密度霸占所有查询（锚点：500 字符）
时间衰减	`score = 0.5 + 0.5 exp(-ageDays / halfLife)`	旧条目逐渐降权，下限 0.5×（60 天半衰期）
硬最低分	低于阈值直接丢弃	移除不相关结果（默认 0.35）
MMR 多样性	cosine 相似度 > 0.85 → 降级	防止近似重复结果

4. 多 Scope 隔离

内置 Scope 模式: global、agent:<id>、custom:<name>、project:<id>、user:<id>
Agent 级访问控制: 通过 scopes.agentAccess 配置每个 Agent 可访问的 Scope
默认行为: Agent 可访问 global + 自己的 agent:<id> Scope

5. 自适应检索

跳过不需要记忆的 query（问候、slash 命令、简单确认、emoji）
强制检索含记忆相关关键词的 query（"remember"、"之前"、"上次"等）
支持 CJK 字符的更低阈值（中文 6 字符 vs 英文 15 字符）

6. 噪声过滤

在自动捕获和工具存储阶段同时生效：

过滤 Agent 拒绝回复（"I don't have any information"）
过滤 Meta 问题（"do you remember"）
过滤寒暄（"hi"、"hello"、"HEARTBEAT"）

7. Session 记忆

/new 命令触发时可保存上一个 Session 的对话摘要到 LanceDB
默认关闭（enabled: false），因为 OpenClaw 已有原生 .jsonl 会话保存
开启会导致大段摘要污染检索质量，建议仅在需要语义搜索历史会话时开启
可配置消息数量（默认 15 条）

8. 自动捕获 & 自动回忆

Auto-Capture（agent_end hook）: 从对话中提取 preference/fact/decision/entity，去重后存储（每次最多 3 条）
- 触发词支持 简体中文 + 繁體中文（例如：记住/記住、偏好/喜好/喜歡、决定/決定等）
Auto-Recall（before_agent_start hook）: 注入 <relevant-memories> 上下文（最多 3 条）

不想在对话中“显示长期记忆”？

有时模型会把注入到上下文中的 <relevant-memories> 区块“原样输出”到回复里，从而出现你看到的“周期性显示长期记忆”。

方案 A（推荐）：关闭自动召回 autoRecall

在插件配置里设置 autoRecall: false，然后重启 gateway：

{
  "plugins": {
    "entries": {
      "memory-lancedb-pro": {
        "enabled": true,
        "config": {
          "autoRecall": false
        }
      }
    }
  }
}

方案 B：保留召回，但要求 Agent 不要泄漏

在对应 Agent 的 system prompt 里加一句，例如：

请勿在回复中展示或引用任何 <relevant-memories> / 记忆注入内容，只能用作内部参考。

安装

AI 安装指引（防幻觉版）

如果你是用 AI 按 README 操作，不要假设任何默认值。请先运行以下命令，并以真实输出为准：

openclaw config get agents.defaults.workspace
openclaw config get plugins.load.paths
openclaw config get plugins.slots.memory
openclaw config get plugins.entries.memory-lancedb-pro

建议：

plugins.load.paths 建议优先用绝对路径（除非你已确认当前 workspace）。
如果配置里使用 ${JINA_API_KEY}（或任何 ${...} 变量），务必确保运行 Gateway 的服务进程环境里真的有这些变量（systemd/launchd/docker 通常不会继承你终端的 export）。
修改插件配置后，运行 openclaw gateway restart 使其生效。

Jina API Key（Embedding + Rerank）如何填写

Embedding：将 embedding.apiKey 设置为你的 Jina key（推荐用环境变量 ${JINA_API_KEY}）。
Rerank（当 retrieval.rerankProvider: "jina"）：通常可以直接复用同一个 Jina key，填到 retrieval.rerankApiKey。
如果你选择了其它 rerank provider（如 siliconflow / pinecone），则 retrieval.rerankApiKey 应填写对应提供商的 key。

Key 存储建议：

不要把 key 提交到 git。
使用 ${...} 环境变量没问题，但务必确保运行 Gateway 的服务进程环境里真的有该变量（systemd/launchd/docker 往往不会继承你终端的 export）。

什么是 “OpenClaw workspace”？

在 OpenClaw 中，agent workspace（工作区） 是 Agent 的工作目录（默认：~/.openclaw/workspace）。根据官方文档，workspace 是 OpenClaw 的 默认工作目录（cwd），因此 相对路径会以 workspace 为基准解析（除非你使用绝对路径）。

说明：OpenClaw 的配置文件通常在 ~/.openclaw/openclaw.json，与 workspace 是分开的。

最常见的安装错误： 把插件 clone 到别的目录，但在配置里仍然写类似 "paths": ["plugins/memory-lancedb-pro"] 的相对路径。相对路径的解析基准会受 Gateway 启动方式/工作目录影响，容易指向错误位置。

为避免歧义：建议用绝对路径（方案 B），或把插件放在 <workspace>/plugins/（方案 A）并保持配置一致。

方案 A（推荐）：克隆到 workspace 的 `plugins/` 目录下

# 1) 进入你的 OpenClaw workspace（默认：~/.openclaw/workspace）
#    （可通过 agents.defaults.workspace 改成你自己的路径）
cd /path/to/your/openclaw/workspace

# 2) 把插件克隆到 workspace/plugins/ 下
git clone https://github.com/win4r/memory-lancedb-pro.git plugins/memory-lancedb-pro

# 3) 安装依赖
cd plugins/memory-lancedb-pro
npm install

然后在 OpenClaw 配置（openclaw.json）中使用相对路径：

{
  "plugins": {
    "load": {
      "paths": ["plugins/memory-lancedb-pro"]
    },
    "entries": {
      "memory-lancedb-pro": {
        "enabled": true,
        "config": {
          "embedding": {
            "apiKey": "${JINA_API_KEY}",
            "model": "jina-embeddings-v5-text-small",
            "baseURL": "https://api.jina.ai/v1",
            "dimensions": 1024,
            "taskQuery": "retrieval.query",
            "taskPassage": "retrieval.passage",
            "normalized": true
          }
        }
      }
    },
    "slots": {
      "memory": "memory-lancedb-pro"
    }
  }
}

方案 B：插件装在任意目录，但配置里必须写绝对路径

{
  "plugins": {
    "load": {
      "paths": ["/absolute/path/to/memory-lancedb-pro"]
    }
  }
}

重启

openclaw gateway restart

注意： 如果之前使用了内置的 memory-lancedb，启用本插件时需同时禁用它。同一时间只能有一个 memory 插件处于活动状态。

验证是否安装成功（推荐）

1）确认插件已被发现/加载：

openclaw plugins list
openclaw plugins info memory-lancedb-pro

2）如果发现异常，运行插件诊断：

openclaw plugins doctor

3）确认 memory slot 已指向本插件：

# 期望看到：plugins.slots.memory = "memory-lancedb-pro"
openclaw config get plugins.slots.memory

配置

完整配置示例（点击展开）

{
  "embedding": {
    "apiKey": "${JINA_API_KEY}",
    "model": "jina-embeddings-v5-text-small",
    "baseURL": "https://api.jina.ai/v1",
    "dimensions": 1024,
    "taskQuery": "retrieval.query",
    "taskPassage": "retrieval.passage",
    "normalized": true
  },
  "dbPath": "~/.openclaw/memory/lancedb-pro",
  "autoCapture": true,
  "autoRecall": false,
  "retrieval": {
    "mode": "hybrid",
    "vectorWeight": 0.7,
    "bm25Weight": 0.3,
    "minScore": 0.3,
    "rerank": "cross-encoder",
    "rerankApiKey": "${JINA_API_KEY}",
    "rerankModel": "jina-reranker-v3",
    "candidatePoolSize": 20,
    "recencyHalfLifeDays": 14,
    "recencyWeight": 0.1,
    "filterNoise": true,
    "lengthNormAnchor": 500,
    "hardMinScore": 0.35,
    "timeDecayHalfLifeDays": 60,
    "reinforcementFactor": 0.5,
    "maxHalfLifeMultiplier": 3
  },
  "enableManagementTools": false,
  "scopes": {
    "default": "global",
    "definitions": {
      "global": { "description": "共享知识库" },
      "agent:discord-bot": { "description": "Discord 机器人私有" }
    },
    "agentAccess": {
      "discord-bot": ["global", "agent:discord-bot"]
    }
  },
  "sessionMemory": {
    "enabled": false,
    "messageCount": 15
  }
}

访问强化（1.0.26）

为了让“经常被用到的记忆”衰减得更慢，检索器可以根据 手动 recall 的频率（类似间隔重复/记忆强化）来延长有效的 time-decay half-life。

配置项（位于 retrieval 下）：

reinforcementFactor（范围 0–2，默认 0.5）— 设为 0 可关闭
maxHalfLifeMultiplier（范围 1–10，默认 3）— 硬上限：有效 half-life ≤ 基础值 × multiplier

说明：

强化逻辑只对白名单 source: "manual" 生效（用户/工具主动 recall），避免 auto-recall 意外“强化”噪声。

Embedding 提供商

本插件支持 任意 OpenAI 兼容的 Embedding API：

提供商	模型	Base URL	维度
Jina（推荐）	`jina-embeddings-v5-text-small`	`https://api.jina.ai/v1`	1024
OpenAI	`text-embedding-3-small`	`https://api.openai.com/v1`	1536
Google Gemini	`gemini-embedding-001`	`https://generativelanguage.googleapis.com/v1beta/openai/`	3072
Ollama（本地）	`nomic-embed-text`	`http://localhost:11434/v1`	与本地模型输出一致（建议显式设置 `embedding.dimensions`）

（可选）从 Session JSONL 自动蒸馏记忆（全自动）

OpenClaw 会把每个 Agent 的完整会话自动落盘为 JSONL：

~/.openclaw/agents/<agentId>/sessions/*.jsonl

但 JSONL 含大量噪声（tool 输出、系统块、重复回调等），不建议直接把原文塞进 LanceDB。

推荐方案（2026-02+）：使用 /new 非阻塞沉淀管线（Hooks + systemd worker），在你执行 /new 时异步提取高价值经验并写入 LanceDB Pro：

触发：command:new（你在聊天里发送 /new）
Hook：只投递一个很小的 task.json（毫秒级，不调用 LLM，不阻塞 /new）
Worker：systemd 常驻进程监听队列，读取 session .jsonl，用 Gemini Map-Reduce 抽取 0～20 条高信噪比记忆
写入：通过 openclaw memory-pro import 写入 LanceDB Pro（插件内部仍会 embedding + 查重）
中文关键词：每条记忆包含 Keywords (zh)，并遵循三要素（实体/动作/症状）。其中“实体关键词”必须从 transcript 原文逐字拷贝（禁止编造项目名）。
通知：可选（可做到即使 0 条也通知）

示例文件：

examples/new-session-distill/

Legacy 方案：本插件也提供一个安全的 extractor 脚本 scripts/jsonl_distill.py，配合 OpenClaw 的 cron + 独立 distiller agent，实现“增量蒸馏 → 高质量记忆入库”：（适合不依赖 /new 的全自动场景）

只读取每个 JSONL 文件新增尾巴（byte offset cursor），避免重复和 token 浪费
生成一个小型 batch JSON
由 distiller agent 把 batch 蒸馏成短、原子、可复用的记忆，再用 memory_store 写入

你会得到什么

✅ 全自动（每小时）
✅ 多 Agent 支持（main + 各 bot）
✅ 只处理新增内容（不回读）
✅ 防自我吞噬：默认排除 memory-distiller 自己的 session

脚本输出位置

Cursor：~/.openclaw/state/jsonl-distill/cursor.json
Batches：~/.openclaw/state/jsonl-distill/batches/

脚本只读 session JSONL，不会修改原始日志。

（可选）启用 Agent 来源白名单（提高信噪比）

默认情况下，extractor 会扫描 所有 Agent（但会排除 memory-distiller 自身，防止自我吞噬）。

如果你只想从某些 Agent 蒸馏（例如只蒸馏 main + code-agent），可以设置环境变量：

export OPENCLAW_JSONL_DISTILL_ALLOWED_AGENT_IDS="main,code-agent"

不设置 / 空 / * / all：扫描全部（默认）
逗号分隔列表：只扫描列表内 agentId

scope 策略（非常重要）

当蒸馏“所有 agents”时，务必显式设置 scope：

跨 agent 通用规则/偏好/坑 → scope=global
agent 私有 → scope=agent:<agentId>

否则不同 bot 的记忆会相互污染。

回滚

禁用/删除 cron：openclaw cron disable <jobId> / openclaw cron rm <jobId>
删除 distiller agent：openclaw agents delete memory-distiller
删除 cursor 状态：rm -rf ~/.openclaw/state/jsonl-distill/

CLI 命令

# 列出记忆
openclaw memory-pro list [--scope global] [--category fact] [--limit 20] [--json]

# 搜索记忆
openclaw memory-pro search "query" [--scope global] [--limit 10] [--json]

# 查看统计
openclaw memory-pro stats [--scope global] [--json]

# 按 ID 删除记忆（支持 8+ 字符前缀）
openclaw memory-pro delete <id>

# 批量删除
openclaw memory-pro delete-bulk --scope global [--before 2025-01-01] [--dry-run]

# 导出 / 导入
openclaw memory-pro export [--scope global] [--output memories.json]
openclaw memory-pro import memories.json [--scope global] [--dry-run]

# 使用新模型重新生成 Embedding
openclaw memory-pro reembed --source-db /path/to/old-db [--batch-size 32] [--skip-existing]

# 从内置 memory-lancedb 迁移
openclaw memory-pro migrate check [--source /path]
openclaw memory-pro migrate run [--source /path] [--dry-run] [--skip-existing]
openclaw memory-pro migrate verify [--source /path]

数据库 Schema

LanceDB 表 memories：

字段	类型	说明
`id`	string (UUID)	主键
`text`	string	记忆文本（FTS 索引）
`vector`	float[]	Embedding 向量
`category`	string	`preference` / `fact` / `decision` / `entity` / `other`
`scope`	string	Scope 标识（如 `global`、`agent:main`）
`importance`	float	重要性分数 0-1
`timestamp`	int64	创建时间戳 (ms)
`metadata`	string (JSON)	扩展元数据

常见问题 / 排错

"Cannot mix BigInt and other types"（LanceDB / Apache Arrow）

在 LanceDB 0.26+（底层 Apache Arrow）中，部分数值列在运行时可能会以 BigInt 的形式返回（常见：timestamp、importance、_distance、_score）。如果你遇到类似报错：

TypeError: Cannot mix BigInt and other types, use explicit conversions

请升级到 memory-lancedb-pro >= 1.0.14。插件已对这些字段统一做 Number(...) 转换后再参与运算（例如：计算分数、按时间排序）。

AI Agent 铁律（Iron Rules）

OpenClaw 用户：将下方代码块复制到你的 AGENTS.md 中，让 Agent 自动遵守这些规则。

## Rule 1 — 双层记忆存储（铁律）

Every pitfall/lesson learned → IMMEDIATELY store TWO memories to LanceDB before moving on:

- **Technical layer**: Pitfall: [symptom]. Cause: [root cause]. Fix: [solution]. Prevention: [how to avoid]
  (category: fact, importance ≥ 0.8)
- **Principle layer**: Decision principle ([tag]): [behavioral rule]. Trigger: [when it applies]. Action: [what to do]
  (category: decision, importance ≥ 0.85)
- After each store, immediately `memory_recall` with anchor keywords to verify retrieval.
  If not found, rewrite and re-store.
- Missing either layer = incomplete.
  Do NOT proceed to next topic until both are stored and verified.
- Also update relevant SKILL.md files to prevent recurrence.

## Rule 2 — LanceDB 卫生

Entries must be short and atomic (< 500 chars). Never store raw conversation summaries, large blobs, or duplicates.
Prefer structured format with keywords for retrieval.

## Rule 3 — Recall before retry

On ANY tool failure, repeated error, or unexpected behavior, ALWAYS `memory_recall` with relevant keywords
(error message, tool name, symptom) BEFORE retrying. LanceDB likely already has the fix.
Blind retries waste time and repeat known mistakes.

## Rule 4 — 编辑前确认目标代码库

When working on memory plugins, confirm you are editing the intended package
(e.g., `memory-lancedb-pro` vs built-in `memory-lancedb`) before making changes;
use `memory_recall` + filesystem search to avoid patching the wrong repo.

## Rule 5 — 插件代码变更必须清 jiti 缓存（MANDATORY）

After modifying ANY `.ts` file under `plugins/`, MUST run `rm -rf /tmp/jiti/` BEFORE `openclaw gateway restart`.
jiti caches compiled TS; restart alone loads STALE code. This has caused silent bugs multiple times.
Config-only changes do NOT need cache clearing.

依赖

包	用途
`@lancedb/lancedb` ≥0.26.2	向量数据库（ANN + FTS）
`openai` ≥6.21.0	OpenAI 兼容 Embedding API 客户端
`@sinclair/typebox` 0.34.48	JSON Schema 类型定义（工具参数）