Context System
Token Management Implementation
Overview
Token management is the foundational infrastructure of the Context system. It is responsible for:
- Token counting: estimating the token count of a piece of text
- Budget tracking: tracking token usage across each section
- Text truncation: truncating text to a given token count
- Sliding window: bounding the token footprint of message history
- Tool compression: compressing large tool results
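A minimal sketch of how these pieces can be wired together, assuming the APIs match the usage examples later in this document (systemPrompt, messages, and toolResult are placeholder inputs, not part of the real codebase):
import {
  estimateTokens,
  TokenBudgetTracker,
  SlidingWindow,
  ToolResultCompressor,
} from "./context";
// Placeholder inputs (shapes omitted for brevity)
declare const systemPrompt: string;
declare const messages: any[];
declare const toolResult: unknown;
// 1. Establish a total budget for the whole context
const tracker = new TokenBudgetTracker(10000);
// 2. Reserve budget for the system prompt first
tracker.allocate("systemPrompt", estimateTokens(systemPrompt));
// 3. Constrain message history with the remaining budget (sliding window)
const windowed = SlidingWindow.applyWindow(messages, 0, tracker.getRemaining());
tracker.allocate("messages", windowed.totalTokens);
// 4. Compress tool results before they enter the context
const compressed = ToolResultCompressor.compress("getDoc", toolResult);
tracker.allocate("toolResults", estimateTokens(compressed));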
TokenCounter
Core Features
Provides token counting and budget management:
- Fast estimation: uses character counts to avoid calling a tokenizer
- Chinese/English adaptation: adjusts dynamically to the language mix of the text
- Budget tracking: tracks per-section usage in real time
- Smart truncation: truncates text at sentence boundaries
Counting Strategy
Design principle: speed first, accuracy good enough.
function estimateTokens(text: string): number {
  if (!text) return 0;
  // 1. Detect the proportion of Chinese characters
  const chineseRatio = getChineseRatio(text);
  // 2. Compute a weighted chars-per-token factor
  // Chinese: 1.8 chars/token, English: 4 chars/token
  const charsPerToken = chineseRatio * 1.8 + (1 - chineseRatio) * 4;
  // 3. Compute the token count
  return Math.ceil(text.length / charsPerToken);
}
function getChineseRatio(text: string): number {
  const chineseRegex = /[\u4e00-\u9fff]/g;
  const chineseChars = text.match(chineseRegex);
  return chineseChars ? chineseChars.length / text.length : 0;
}
Accuracy comparison:
| Text type | Actual tokens | Estimated tokens | Error |
|---|---|---|---|
| Pure Chinese (100 chars) | ~55 | 56 | +1.8% |
| Pure English (400 chars) | ~100 | 100 | 0% |
| Mixed (50 + 200) | ~77 | 78 | +1.3% |
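As a quick hand check of the "Pure Chinese (100 chars)" row against the formula above (the "./context" import path follows the usage examples later in this document):
import { estimateTokens } from "./context";
// 100 Chinese characters => chineseRatio = 1
// charsPerToken = 1 * 1.8 + 0 * 4 = 1.8
// tokens = ceil(100 / 1.8) = ceil(55.6) = 56, matching the table's estimate
const sample = "云".repeat(100);
console.log(estimateTokens(sample)); // 56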
Why not use a tokenizer?
- Tokenizer API calls add latency (~50-100 ms)
- Model files need to be loaded (~10 MB)
- The estimate is accurate enough; exact values are not needed
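To reproduce the error figures in the table above, a throwaway comparison against a real tokenizer is enough. The sketch below assumes the npm package gpt-tokenizer and its encode export; it is meant for offline verification only, not as a runtime dependency:
import { encode } from "gpt-tokenizer"; // assumed package, used only for this check
import { estimateTokens } from "./context";
function compareEstimate(label: string, text: string): void {
  const actual = encode(text).length; // tokenizer-based count
  const estimated = estimateTokens(text); // character-based estimate
  const error = ((estimated - actual) / actual) * 100;
  console.log(`${label}: actual=${actual} estimated=${estimated} error=${error.toFixed(1)}%`);
}
compareEstimate("Pure Chinese", "这是一个例子。".repeat(20));
compareEstimate("Pure English", "hello world ".repeat(40));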
Text Truncation
Smart truncation: cut at sentence boundaries to preserve semantic integrity.
export function truncateToTokens(
  text: string,
  maxTokens: number,
  suffix: string = "..."
): string {
  const currentTokens = estimateTokens(text);
  if (currentTokens <= maxTokens) return text;
  // 1. Estimate how many characters to keep
  const ratio = maxTokens / currentTokens;
  const targetLength = Math.floor(text.length * ratio) - suffix.length;
  if (targetLength <= 0) return suffix;
  // 2. Try to cut at a sentence boundary
  const truncated = text.substring(0, targetLength);
  const lastSentenceEnd = Math.max(
    // CJK sentence-ending punctuation
    truncated.lastIndexOf("。"),
    truncated.lastIndexOf("!"),
    truncated.lastIndexOf("?"),
    // ASCII sentence-ending punctuation
    truncated.lastIndexOf("."),
    truncated.lastIndexOf("!"),
    truncated.lastIndexOf("?")
  );
  // 3. If a sentence boundary was found, cut there
  if (lastSentenceEnd > targetLength * 0.5) {
    return truncated.substring(0, lastSentenceEnd + 1) + suffix;
  }
  // 4. Otherwise cut at the target length
  return truncated + suffix;
}
Example:
const text = "这是一个长句子。包含多个分句。每个分句都有意义。";
// Truncate to 10 tokens (about 18 characters)
truncateToTokens(text, 10);
// "这是一个长句子。包含多个分句。..."
Budget Tracking
TokenBudgetTracker: tracks token usage for each section.
export class TokenBudgetTracker {
  private budget: number;
  private used: number = 0;
  private breakdown: Map<string, number> = new Map();
  constructor(totalBudget: number) {
    this.budget = totalBudget;
  }
  /**
   * Allocate tokens to a category
   * @returns whether the allocation succeeded (budget was sufficient)
   */
  allocate(category: string, tokens: number): boolean {
    if (this.used + tokens > this.budget) {
      return false; // over budget
    }
    this.used += tokens;
    this.breakdown.set(category, (this.breakdown.get(category) || 0) + tokens);
    return true;
  }
  /**
   * Get the remaining budget
   */
  getRemaining(): number {
    return this.budget - this.used;
  }
  /**
   * Get the per-category breakdown
   */
  getBreakdown(): Record<string, number> {
    return Object.fromEntries(this.breakdown);
  }
}
Usage example:
const tracker = new TokenBudgetTracker(10000);
// Allocate budget
tracker.allocate("systemPrompt", 2000); // true
tracker.allocate("summary", 1000); // true
tracker.allocate("messages", 8000); // false (over budget)
// Get statistics
console.log(tracker.getBreakdown());
// { systemPrompt: 2000, summary: 1000 }
console.log(tracker.getRemaining());
// 7000
SlidingWindow
Core Features
Controls the token footprint of message history:
- Keep the most recent N turns: newer messages take priority
- Tool call support: tool call formats are preserved in full
- Smart truncation: overly long individual messages are truncated
- Budget-driven selection: the newest messages are kept first
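applyWindow (below) relies on several types that are not defined in this document. The following is a sketch inferred from how the fields are read in the implementation, not an authoritative definition:
// Types inferred from how applyWindow uses them (sketch only)
interface ChatMessage {
  id: string;
  role: string; // e.g. "user" | "assistant"
  sequence: number; // message sequence number, compared against startSeq
  parts?: Array<{ type: string; [key: string]: unknown }>;
}
interface WindowedMessage {
  role: string;
  content: string;
  originalId: string;
  tokens: number; // text tokens plus tool call overhead
}
interface WindowResult {
  messages: WindowedMessage[];
  totalTokens: number;
  truncatedCount: number; // number of messages that were truncated
  excludedCount: number; // number of messages left out of the window
  lastIncludedSeq: number; // highest sequence number included in the window
}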
Applying the Window
applyWindow(
  messages: ChatMessage[],
  startSeq: number = 0,
  budgetOverride?: number
): WindowResult {
  const budget = budgetOverride ?? this.config.messagesBudget;
  // 1. Keep only messages that have not been summarized yet
  const eligibleMessages = messages.filter((m) => m.sequence >= startSeq);
  // 2. Take the messages inside the window (the last N pairs)
  const windowMessages = eligibleMessages.slice(-this.config.windowSize * 2);
  // 3. Iterate in reverse: keep the newest messages first
  const reversed = [...windowMessages].reverse();
  const result: WindowedMessage[] = [];
  let totalTokens = 0;
  let truncatedCount = 0;
  let maxSeq = startSeq;
  for (const msg of reversed) {
    let content = this.extractTextContent(msg);
    let tokens = estimateTokens(content);
    // 4. Truncate a single message if it is too long
    if (tokens > this.config.maxMessageTokens) {
      content = truncateToTokens(content, this.config.maxMessageTokens);
      tokens = estimateTokens(content);
      truncatedCount++;
    }
    // 5. Estimate tool call overhead
    const toolParts = (msg.parts || []).filter((p) => p.type.startsWith("tool-"));
    const toolTokens = toolParts.length * 50;
    const messageTotalTokens = tokens + toolTokens;
    // 6. Check whether the budget would be exceeded
    if (totalTokens + messageTotalTokens > budget) {
      break; // stop adding messages
    }
    // 7. unshift to restore the original chronological order
    result.unshift({
      role: msg.role,
      content,
      originalId: msg.id,
      tokens: messageTotalTokens,
    });
    maxSeq = Math.max(maxSeq, msg.sequence);
    totalTokens += messageTotalTokens;
  }
  return {
    messages: result,
    totalTokens,
    truncatedCount,
    excludedCount: messages.length - result.length,
    lastIncludedSeq: maxSeq,
  };
}
Message Format Conversion
Full tool-call formats (OpenAI/Anthropic) are supported:
convertToStandardMessages(msg: ChatMessage): StandardMessage[] {
  const messages: StandardMessage[] = [];
  const parts = (msg.parts || []) as (MessagePart & ToolCallPart)[];
  // 1. Extract text and reasoning content
  const textParts = parts.filter((p) => p.type === "text" || p.type === "reasoning");
  const textContent = textParts.map((p) => p.text || "").join("");
  // 2. Identify tool calls and tool results
  const toolParts = parts.filter((p) => p.type.startsWith("tool-"));
  const toolCallParts = toolParts.filter((p) => p.state === "input-available");
  const toolResultParts = toolParts.filter((p) => p.state === "output-available");
  // 3. Build the assistant message
  if (msg.role === "assistant") {
    const assistantMsg: StandardMessage = {
      role: "assistant",
      content: textContent,
      tool_calls: toolCallParts.map((tc) => ({
        id: tc.toolCallId || `call_${Date.now()}`,
        type: "function" as const,
        function: {
          name: tc.type.replace("tool-", ""),
          arguments: JSON.stringify(tc.input || {}),
        },
      })),
    };
    messages.push(assistantMsg);
    // 4. Emit a tool message for each tool result
    for (const tr of toolResultParts) {
      messages.push({
        role: "tool",
        tool_call_id: tr.toolCallId || "",
        content: JSON.stringify(tr.output || {}),
      });
    }
  }
  return messages;
}
Configuration Options
interface SlidingWindowConfig {
  /** Default window size (number of message pairs) */
  windowSize: number;
  /** Maximum tokens for a single message */
  maxMessageTokens: number;
  /** Token budget for the whole messages area */
  messagesBudget: number;
}
const DEFAULT_CONFIG: SlidingWindowConfig = {
  windowSize: 5, // last 5 turns (10 messages)
  maxMessageTokens: 800, // at most 800 tokens per message
  messagesBudget: 4000, // 4000-token budget for the messages area
};
ToolResultCompressor
Core Features
Compresses the results returned by tool calls:
- Tool-specific handling: applies a compression strategy tailored to each tool
- Large-text offloading: stores oversized content in working memory
- Structure preservation: keeps metadata and structural information
- List truncation: caps the number of items in list results
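The compressors below call WorkingMemoryService.store(...) to offload large content. The following is a minimal interface and toy in-memory stand-in inferred from those call sites; the real service may differ:
// Minimal interface inferred from the call sites below (sketch only)
interface IWorkingMemoryService {
  store(
    toolName: string, // tool that produced the content, e.g. "getDoc"
    content: string, // full original content
    title: string, // short description, e.g. `Document: ${title}`
    preview: string // preview snippet
  ): string; // returns a memoryId that can be read back later
}
// Toy in-memory stand-in, for illustration only
const memoryStore = new Map<string, { toolName: string; content: string; title: string; preview: string }>();
const WorkingMemorySketch: IWorkingMemoryService = {
  store(toolName, content, title, preview) {
    const memoryId = `mem-${Date.now()}-${Math.random().toString(36).slice(2, 8)}`;
    memoryStore.set(memoryId, { toolName, content, title, preview });
    return memoryId; // content stays retrievable via the read_working_memory tool
  },
};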
Compression Strategy
compress(toolName: string, result: unknown): string {
  if (result === undefined || result === null) {
    return "null";
  }
  // Tool-specific optimization strategies
  switch (toolName) {
    case "getDoc":
    case "getCurrentDoc":
      return this.compressDocResult(result);
    case "listDocuments":
    case "list_resources":
      return this.compressListResult(result);
    case "grep_search":
    case "search_web":
      return this.compressSearchResult(result);
    case "view_file":
    case "read_url_content":
      return this.compressLargeText(result);
    default:
      return this.defaultCompress(result);
  }
}
Compressing Document Results
Large content is offloaded to working memory:
private compressDocResult(result: unknown): string {
  if (typeof result !== "object") return String(result);
  const anyResult = result as any;
  // If the result looks like a Document
  if (anyResult.content && (anyResult.outline || anyResult.metadata)) {
    const content = anyResult.content;
    const tokens = estimateTokens(content);
    // If the content is too large, offload it
    if (tokens > this.config.maxResultTokens) {
      const preview = content.slice(0, this.config.textPreviewLength);
      // Store in working memory
      const memoryId = WorkingMemoryService.store(
        "getDoc",
        content,
        `Document: ${anyResult.title || "Untitled"}`,
        preview
      );
      return JSON.stringify({
        id: anyResult.id,
        title: anyResult.title,
        outline: anyResult.outline,
        // Key point: return a pointer instead of the full content
        _OFFLOADED_: {
          memoryId: memoryId,
          summary: `Content too large (${tokens} tokens). Saved to temporary memory.`,
          preview: preview + "... [More available via read_working_memory]",
          totalLength: content.length,
        },
        metadata: anyResult.metadata,
      });
    }
    // Content is small enough: return it directly
    return JSON.stringify({
      id: anyResult.id,
      title: anyResult.title,
      outline: anyResult.outline,
      contentPreview: content.slice(0, 500),
      totalLength: content.length,
      metadata: anyResult.metadata,
    });
  }
  return this.defaultCompress(result);
}
Compressing List Results
The number of items is capped:
private compressListResult(result: unknown): string {
  const list = Array.isArray(result) ? result : (result as any).items || [];
  if (!Array.isArray(list)) return this.defaultCompress(result);
  const totalItems = list.length;
  if (totalItems <= this.config.maxListItems) {
    return JSON.stringify(result);
  }
  const sliced = list.slice(0, this.config.maxListItems);
  return JSON.stringify({
    items: sliced,
    _note_: `Showing ${sliced.length} of ${totalItems} items. Please use filters to see more.`,
    totalCount: totalItems,
  });
}
Compressing Large Text
Large text is automatically offloaded to working memory:
private compressLargeText(result: unknown): string {
  let text = "";
  if (typeof result === "string") {
    text = result;
  } else if (typeof result === "object" && (result as any).output) {
    text = (result as any).output;
  } else {
    return this.defaultCompress(result);
  }
  const tokens = estimateTokens(text);
  if (tokens <= this.config.maxResultTokens) {
    return text;
  }
  // Perform the offload
  const preview = text.slice(0, this.config.textPreviewLength);
  const memoryId = WorkingMemoryService.store(
    "large-text-tool",
    text,
    `Large Text Content (${tokens} tokens)`,
    preview
  );
  return JSON.stringify({
    _OFFLOADED_: {
      memoryId: memoryId,
      summary: `Content too large (${tokens} tokens / ${text.length} chars). Saved to temporary memory.`,
      preview: preview + "... [More available via read_working_memory]",
      totalLength: text.length,
    },
    originalSize: tokens,
  });
}
Configuration Options
interface CompressorConfig {
  /** Maximum tokens for a single tool result */
  maxResultTokens: number;
  /** Maximum number of items for list-type results */
  maxListItems: number;
  /** Text preview length */
  textPreviewLength: number;
}
const DEFAULT_CONFIG: CompressorConfig = {
  maxResultTokens: 1000,
  maxListItems: 20,
  textPreviewLength: 500,
};
Usage Examples
Token Counting
import { estimateTokens, truncateToTokens } from "./context";
const text = "这是一段文本...";
// Estimate the token count
const tokens = estimateTokens(text);
console.log(`Tokens: ${tokens}`);
// Truncate to a target token count
const truncated = truncateToTokens(text, 100);
Budget Tracking
import { TokenBudgetTracker } from "./context";
const tracker = new TokenBudgetTracker(10000);
// Allocate budget
if (tracker.allocate("system", 2000)) {
  console.log("System prompt allocated");
}
if (tracker.allocate("messages", 8000)) {
  console.log("Messages allocated");
}
// Get statistics
console.log(tracker.getBreakdown());
// { system: 2000, messages: 8000 }
console.log(`Remaining: ${tracker.getRemaining()}`);
// Remaining: 0
Sliding Window
import { SlidingWindow } from "./context";
// Apply the sliding window
const result = SlidingWindow.applyWindow(
  messages,
  startSeq,
  budgetOverride
);
console.log(`Included: ${result.messages.length}`);
console.log(`Excluded: ${result.excludedCount}`);
console.log(`Total tokens: ${result.totalTokens}`);
Tool Result Compression
import { ToolResultCompressor } from "./context";
const compressed = ToolResultCompressor.compress(
  "getDoc",
  {
    id: "doc-1",
    title: "Large Document",
    content: "...", // very long content
    outline: [...],
  }
);
console.log(compressed);
// {
//   "id": "doc-1",
//   "title": "Large Document",
//   "_OFFLOADED_": {
//     "memoryId": "mem-xxx",
//     "summary": "Content too large...",
//     "preview": "...",
//     "totalLength": 50000
//   }
// }
Performance Optimizations
Avoid Repeated Counting
// Cache counting results
const tokenCache = new Map<string, number>();
function estimateTokensCached(text: string): number {
  if (tokenCache.has(text)) {
    return tokenCache.get(text)!;
  }
  const tokens = estimateTokens(text);
  tokenCache.set(text, tokens);
  return tokens;
}
Batch Processing
// Estimate tokens for a message list in one pass
function estimateMessagesTokens(
  messages: Array<{ role: string; content: string }>
): number {
  let total = 0;
  for (const msg of messages) {
    total += 4; // per-message format overhead
    total += estimateTokens(msg.content);
  }
  return total;
}
Smart Truncation
// Truncate at a sentence boundary to preserve semantic integrity
const lastSentenceEnd = Math.max(
  truncated.lastIndexOf("。"),
  truncated.lastIndexOf("."),
  truncated.lastIndexOf("?")
);
if (lastSentenceEnd > targetLength * 0.5) {
  return truncated.substring(0, lastSentenceEnd + 1) + suffix;
}