Token Management Implementation

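Overview

Token management is the infrastructure layer of the Context system. It is responsible for:

  1. Token counting: estimating the number of tokens in a piece of text
  2. Budget tracking: tracking token usage for each part of the context
  3. Text truncation: truncating text to a given token count
  4. Sliding window: bounding the tokens consumed by message history
  5. Tool compression: compressing large tool results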

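TokenCounter (token counter)

Core features

Provides token counting and budget management:

  • Fast estimation: counts characters instead of calling a tokenizer
  • Chinese/English adaptation: adjusts the ratio to the language mix of the text
  • Budget tracking: tracks usage for each part in real time
  • Smart truncation: truncates text at sentence boundaries

Counting strategy

Design principle: speed first, with accuracy that is good enough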

function estimateTokens(text: string): number {
  if (!text) return 0;

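  // 1. Measure the share of Chinese characters
  const chineseRatio = getChineseRatio(text);

  // 2. Weight the characters-per-token factor by the Chinese/English mix
  // Chinese: 1.8 characters/token, English: 4 characters/token
  const charsPerToken = chineseRatio * 1.8 + (1 - chineseRatio) * 4;

  // 3. Compute the token count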
  return Math.ceil(text.length / charsPerToken);
}

function getChineseRatio(text: string): number {
  const chineseRegex = /[\u4e00-\u9fff]/g;
  const chineseChars = text.match(chineseRegex);
  return chineseChars ? chineseChars.length / text.length : 0;
}

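Accuracy comparison

Text type                          Actual tokens   Estimated tokens   Error
Pure Chinese (100 characters)      ~55             56                 +1.8%
Pure English (400 characters)      ~100            100                0%
Mixed (50 Chinese + 200 English)   ~77             71                 -7.8%

The mixed row follows from estimateTokens as defined above: chineseRatio = 50 / 250 = 0.2, charsPerToken = 0.2 * 1.8 + 0.8 * 4 = 3.56, and ceil(250 / 3.56) = 71.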
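
The estimates can be reproduced by running the estimator on synthetic inputs. The sketch below repeats the estimator so that it runs on its own; the sample strings are made up for illustration and are not the inputs the author measured.

```typescript
// Same estimator as above, repeated so this block is self-contained.
function getChineseRatio(text: string): number {
  const chineseChars = text.match(/[\u4e00-\u9fff]/g);
  return chineseChars ? chineseChars.length / text.length : 0;
}

function estimateTokens(text: string): number {
  if (!text) return 0;
  const chineseRatio = getChineseRatio(text);
  const charsPerToken = chineseRatio * 1.8 + (1 - chineseRatio) * 4;
  return Math.ceil(text.length / charsPerToken);
}

// Synthetic inputs (illustrative only): repeated characters of each kind.
const pureChinese = "中".repeat(100);
const pureEnglish = "a".repeat(400);
const mixed = "中".repeat(50) + "a".repeat(200);

console.log(estimateTokens(pureChinese)); // 56
console.log(estimateTokens(pureEnglish)); // 100
console.log(estimateTokens(mixed));       // 71
```

The mixed case comes out lower than a per-segment sum (50 / 1.8 + 200 / 4 ≈ 77.8) because the estimator averages characters-per-token over the whole string rather than estimating each language run separately.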

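Why not use a tokenizer?

  • Tokenizer API calls add latency (~50-100ms)
  • A tokenizer requires loading model files (~10MB)
  • The estimate is accurate enough; exact values are not needed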

Text truncation

Smart truncation: cut at a sentence boundary to keep the text semantically complete

export function truncateToTokens(
  text: string,
  maxTokens: number,
  suffix: string = "..."
): string {
  const currentTokens = estimateTokens(text);
  if (currentTokens <= maxTokens) return text;

  // 1. Estimate how many characters to keep
  const ratio = maxTokens / currentTokens;
  const targetLength = Math.floor(text.length * ratio) - suffix.length;

  if (targetLength <= 0) return suffix;

  // 2. Look for a sentence boundary (full-width and ASCII punctuation)
  const truncated = text.substring(0, targetLength);
  const lastSentenceEnd = Math.max(
    truncated.lastIndexOf("。"),
    truncated.lastIndexOf("!"),
    truncated.lastIndexOf("?"),
    truncated.lastIndexOf("."),
    truncated.lastIndexOf("!"),
    truncated.lastIndexOf("?")
  );

  // 3. If a boundary falls in the second half of the cut, truncate there
  if (lastSentenceEnd > targetLength * 0.5) {
    return truncated.substring(0, lastSentenceEnd + 1) + suffix;
  }

  // 4. Otherwise cut at the character target
  return truncated + suffix;
}

Example

const text = "这是一个长句子。包含多个分句。每个分句都有意义。";

// Truncate to 10 tokens (about 18 characters)
truncateToTokens(text, 10);
// "这是一个长句子。包含多个分句。..."
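
Both branches of the truncation can be exercised in a short, self-contained sketch. The estimator and truncation logic are repeated here in compact form so the block runs on its own, and the sentence-mark list assumes both full-width and ASCII punctuation are intended.

```typescript
// Estimator and truncation repeated in compact form so this block is self-contained.
function estimateTokens(text: string): number {
  if (!text) return 0;
  const chinese = text.match(/[\u4e00-\u9fff]/g);
  const ratio = chinese ? chinese.length / text.length : 0;
  return Math.ceil(text.length / (ratio * 1.8 + (1 - ratio) * 4));
}

function truncateToTokens(text: string, maxTokens: number, suffix = "..."): string {
  const currentTokens = estimateTokens(text);
  if (currentTokens <= maxTokens) return text;
  const targetLength = Math.floor(text.length * (maxTokens / currentTokens)) - suffix.length;
  if (targetLength <= 0) return suffix;
  const truncated = text.substring(0, targetLength);
  // Assumed mark list: full-width and ASCII sentence endings.
  const lastSentenceEnd = Math.max(
    ...["。", "!", "?", ".", "!", "?"].map((mark) => truncated.lastIndexOf(mark))
  );
  if (lastSentenceEnd > targetLength * 0.5) {
    return truncated.substring(0, lastSentenceEnd + 1) + suffix;
  }
  return truncated + suffix;
}

// Case 1: a sentence boundary exists in the second half of the cut, so it is used.
const atBoundary = truncateToTokens("这是一个长句子。包含多个分句。每个分句都有意义。", 10);
console.log(atBoundary); // "这是一个长句子。包含多个分句。..."

// Case 2: no punctuation at all, so the text is cut at the character target.
const hardCut = truncateToTokens("a".repeat(400), 50);
console.log(hardCut.length); // 200 (197 characters + "...")
```

In case 1 the input is estimated at 12 tokens, so targetLength is floor(24 * 10 / 12) - 3 = 17 and the last "。" inside those 17 characters sits at index 14.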

Budget tracking

TokenBudgetTracker: tracks token usage for each part of the context

export class TokenBudgetTracker {
  private budget: number;
  private used: number = 0;
  private breakdown: Map<string, number> = new Map();

  constructor(totalBudget: number) {
    this.budget = totalBudget;
  }

  /**
   * Allocate tokens to the given category
   * @returns whether the allocation succeeded (the budget was sufficient)
   */
  allocate(category: string, tokens: number): boolean {
    if (this.used + tokens > this.budget) {
      return false; // not enough budget left
    }

    this.used += tokens;
    this.breakdown.set(category, (this.breakdown.get(category) || 0) + tokens);
    return true;
  }

  /**
   * Get the remaining budget
   */
  getRemaining(): number {
    return this.budget - this.used;
  }

  /**
   * Get the per-category breakdown
   */
  getBreakdown(): Record<string, number> {
    return Object.fromEntries(this.breakdown);
  }
}

Usage example

const tracker = new TokenBudgetTracker(10000);

// Allocate budget
tracker.allocate("systemPrompt", 2000);  // true
tracker.allocate("summary", 1000);       // true
tracker.allocate("messages", 8000);      // false (over budget)

// Read the statistics
console.log(tracker.getBreakdown());
// { systemPrompt: 2000, summary: 1000 }

console.log(tracker.getRemaining());
// 7000

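SlidingWindow (sliding window)

Core features

Bounds the tokens consumed by message history:

  • Keep the last N rounds: newer messages take priority
  • Tool call support: preserves the full tool-call format
  • Smart truncation: truncates a single message when it is too long
  • Budget-based selection: keeps the newest messages that fit the budget

Applying the window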

applyWindow(
  messages: ChatMessage[],
  startSeq: number = 0,
  budgetOverride?: number
): WindowResult {
  const budget = budgetOverride ?? this.config.messagesBudget;

  // 1. Keep only messages that have not been summarized yet
  const eligibleMessages = messages.filter((m) => m.sequence >= startSeq);

  // 2. Take the messages inside the window (the last windowSize * 2 messages)
  const windowMessages = eligibleMessages.slice(-this.config.windowSize * 2);

  // 3. Iterate newest-first so the latest messages are kept preferentially
  const reversed = [...windowMessages].reverse();
  const result: WindowedMessage[] = [];
  let totalTokens = 0;
  // Assumed bookkeeping for the two result fields below:
  let truncatedCount = 0; // included messages that were shortened in step 4
  let maxSeq = -1; // highest sequence included; stays -1 if nothing fits

  for (const msg of reversed) {
    let content = this.extractTextContent(msg);
    let tokens = estimateTokens(content);
    let wasTruncated = false;

    // 4. Truncate a single message that is too long
    if (tokens > this.config.maxMessageTokens) {
      content = truncateToTokens(content, this.config.maxMessageTokens);
      tokens = estimateTokens(content);
      wasTruncated = true;
    }

    // 5. Estimate the tool-call overhead (50 tokens per tool part)
    const toolParts = (msg.parts || []).filter((p) => p.type.startsWith("tool-"));
    const toolTokens = toolParts.length * 50;
    const messageTotalTokens = tokens + toolTokens;

    // 6. Stop adding once the budget would be exceeded
    if (totalTokens + messageTotalTokens > budget) {
      break;
    }

    // 7. unshift restores the original chronological order
    result.unshift({
      role: msg.role,
      content,
      originalId: msg.id,
      tokens: messageTotalTokens,
    });

    totalTokens += messageTotalTokens;
    if (wasTruncated) truncatedCount++;
    maxSeq = Math.max(maxSeq, msg.sequence);
  }

  return {
    messages: result,
    totalTokens,
    truncatedCount,
    excludedCount: messages.length - result.length,
    lastIncludedSeq: maxSeq,
  };
}
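
The newest-first selection is easier to follow in isolation. The sketch below strips it down to messages with pre-computed token counts; `SimpleMessage` and `selectWithinBudget` are hypothetical names used only for this illustration.

```typescript
// Hypothetical minimal message shape for this sketch.
interface SimpleMessage {
  id: string;
  tokens: number; // pre-computed token estimate
}

// Walk from newest to oldest and keep messages until the budget is exhausted.
function selectWithinBudget(
  ordered: SimpleMessage[], // oldest first
  budget: number
): SimpleMessage[] {
  const kept: SimpleMessage[] = [];
  let total = 0;
  for (const msg of [...ordered].reverse()) {
    if (total + msg.tokens > budget) break; // stop at the first message that does not fit
    kept.unshift(msg); // restore chronological order
    total += msg.tokens;
  }
  return kept;
}

const chatLog: SimpleMessage[] = [
  { id: "m1", tokens: 300 },
  { id: "m2", tokens: 500 },
  { id: "m3", tokens: 400 },
  { id: "m4", tokens: 200 },
];

const keptIds = selectWithinBudget(chatLog, 1000).map((m) => m.id);
console.log(keptIds); // ["m3", "m4"]
```

Using `break` rather than `continue` keeps the retained history contiguous: once `m2` does not fit, the older `m1` is not considered even though it would fit on its own.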

Message format conversion

Supports the full tool-call format (OpenAI/Anthropic):

convertToStandardMessages(msg: ChatMessage): StandardMessage[] {
  const messages: StandardMessage[] = [];
  const parts = (msg.parts || []) as (MessagePart & ToolCallPart)[];

  // 1. Extract text and reasoning content
  const textParts = parts.filter((p) => p.type === "text" || p.type === "reasoning");
  const textContent = textParts.map((p) => p.text || "").join("");

  // 2. Identify tool calls and tool results
  const toolParts = parts.filter((p) => p.type.startsWith("tool-"));
  const toolCallParts = toolParts.filter((p) => p.state === "input-available");
  const toolResultParts = toolParts.filter((p) => p.state === "output-available");

  // 3. Build the assistant message
  if (msg.role === "assistant") {
    const assistantMsg: StandardMessage = {
      role: "assistant",
      content: textContent,
      tool_calls: toolCallParts.map((tc) => ({
        id: tc.toolCallId || `call_${Date.now()}`,
        type: "function" as const,
        function: {
          name: tc.type.replace("tool-", ""),
          arguments: JSON.stringify(tc.input || {}),
        },
      })),
    };
    messages.push(assistantMsg);

    // 4. Emit one tool message per tool result
    for (const tr of toolResultParts) {
      messages.push({
        role: "tool",
        tool_call_id: tr.toolCallId || "",
        content: JSON.stringify(tr.output || {}),
      });
    }
  }

  return messages;
}
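
A concrete input and output pair may make the conversion easier to picture. The sketch below is a simplified standalone version: the `Part` and `StandardMessage` shapes are hypothetical reductions of the real types, and it assumes a tool call and its result arrive as two separate parts distinguished by `state`.

```typescript
// Hypothetical, simplified part and message shapes for this sketch.
interface Part {
  type: string;
  text?: string;
  state?: "input-available" | "output-available";
  toolCallId?: string;
  input?: unknown;
  output?: unknown;
}

interface StandardMessage {
  role: "assistant" | "tool";
  content: string;
  tool_call_id?: string;
  tool_calls?: { id: string; type: "function"; function: { name: string; arguments: string } }[];
}

function convertAssistantParts(parts: Part[]): StandardMessage[] {
  const text = parts
    .filter((p) => p.type === "text" || p.type === "reasoning")
    .map((p) => p.text || "")
    .join("");
  const toolParts = parts.filter((p) => p.type.startsWith("tool-"));
  const calls = toolParts.filter((p) => p.state === "input-available");
  const results = toolParts.filter((p) => p.state === "output-available");

  const out: StandardMessage[] = [
    {
      role: "assistant",
      content: text,
      tool_calls: calls.map((tc) => ({
        id: tc.toolCallId || "",
        type: "function" as const,
        function: { name: tc.type.replace("tool-", ""), arguments: JSON.stringify(tc.input || {}) },
      })),
    },
  ];
  // One tool message per result, linked back to its call by tool_call_id.
  for (const tr of results) {
    out.push({ role: "tool", tool_call_id: tr.toolCallId || "", content: JSON.stringify(tr.output || {}) });
  }
  return out;
}

const converted = convertAssistantParts([
  { type: "text", text: "Looking that up." },
  { type: "tool-getDoc", state: "input-available", toolCallId: "call_1", input: { id: "doc-1" } },
  { type: "tool-getDoc", state: "output-available", toolCallId: "call_1", output: { title: "Notes" } },
]);
console.log(converted.length);     // 2
console.log(converted[1].content); // {"title":"Notes"}
```

One assistant message carries the text plus the `getDoc` call, and one `tool` message carries the serialized result under the same `call_1` ID.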

Configuration options

interface SlidingWindowConfig {
  /** Default window size (in message pairs) */
  windowSize: number;
  /** Maximum tokens for a single message */
  maxMessageTokens: number;
  /** Token budget for the whole message area */
  messagesBudget: number;
}

const DEFAULT_CONFIG: SlidingWindowConfig = {
  windowSize: 5,        // last 5 rounds (10 messages)
  maxMessageTokens: 800,  // at most 800 tokens per message
  messagesBudget: 4000,   // 4000-token budget for the message area
};
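
A quick arithmetic sketch of how the three defaults interact. The variable names are illustrative; the calculation assumes every message sits exactly at the per-message cap and ignores the 50-token overhead per tool part.

```typescript
const cfg = { windowSize: 5, maxMessageTokens: 800, messagesBudget: 4000 };

// Upper bound on messages considered: windowSize rounds, two messages per round.
const maxMessages = cfg.windowSize * 2;

// Worst case for text alone if every message is at the per-message cap.
const worstCaseTokens = maxMessages * cfg.maxMessageTokens;

// Number of cap-length messages that fit into the budget.
const fullMessagesThatFit = Math.floor(cfg.messagesBudget / cfg.maxMessageTokens);

console.log(maxMessages);         // 10
console.log(worstCaseTokens);     // 8000
console.log(fullMessagesThatFit); // 5
```

With these defaults the budget, not the window size, is the binding limit for long messages: all 10 messages are kept only when they average 400 tokens or fewer.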

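ToolResultCompressor (tool result compression)

Core features

Compresses the results returned by tool calls:

  • Per-tool handling: applies a compression strategy specific to each tool
  • Large-text offloading: stores large content in working memory
  • Structure preservation: keeps metadata and structural information
  • List truncation: limits the number of items in list results

Compression strategies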

compress(toolName: string, result: unknown): string {
  if (result === undefined || result === null) {
    return "null";
  }

  // Strategies tuned for specific tools
  switch (toolName) {
    case "getDoc":
    case "getCurrentDoc":
      return this.compressDocResult(result);

    case "listDocuments":
    case "list_resources":
      return this.compressListResult(result);

    case "grep_search":
    case "search_web":
      return this.compressSearchResult(result);

    case "view_file":
    case "read_url_content":
      return this.compressLargeText(result);

    default:
      return this.defaultCompress(result);
  }
}

Compressing document results

Large content is offloaded to working memory

private compressDocResult(result: unknown): string {
  if (typeof result !== "object") return String(result);

  const anyResult = result as any;

  // The result looks like a Document
  if (anyResult.content && (anyResult.outline || anyResult.metadata)) {
    const content = anyResult.content;
    const tokens = estimateTokens(content);

    // Offload when the content is too large
    if (tokens > this.config.maxResultTokens) {
      const preview = content.slice(0, this.config.textPreviewLength);

      // Store the full content in Working Memory
      const memoryId = WorkingMemoryService.store(
        "getDoc",
        content,
        `Document: ${anyResult.title || "Untitled"}`,
        preview
      );

      return JSON.stringify({
        id: anyResult.id,
        title: anyResult.title,
        outline: anyResult.outline,
        // Key point: return a pointer to the stored content
        _OFFLOADED_: {
          memoryId: memoryId,
          summary: `Content too large (${tokens} tokens). Saved to temporary memory.`,
          preview: preview + "... [More available via read_working_memory]",
          totalLength: content.length,
        },
        metadata: anyResult.metadata,
      });
    }

    // Content is within the limit: return it inline (contentPreview holds the first 500 characters)
    return JSON.stringify({
      id: anyResult.id,
      title: anyResult.title,
      outline: anyResult.outline,
      contentPreview: content.slice(0, 500),
      totalLength: content.length,
      metadata: anyResult.metadata,
    });
  }

  return this.defaultCompress(result);
}
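
The offload pattern relies on a store that hands back an ID the model can later dereference. `WorkingMemoryService` itself is not shown here, so the following is only a hypothetical in-memory stand-in: the class name, the `mem-N` ID format, and the `read` method are assumptions, while the `store` argument order mirrors the call above.

```typescript
// Hypothetical in-memory stand-in for WorkingMemoryService; the real service's
// storage, ID format, and read API are not shown in this document.
interface MemoryEntry {
  source: string;
  content: string;
  description: string;
  preview: string;
}

class InMemoryWorkingMemory {
  private entries = new Map<string, MemoryEntry>();
  private nextId = 1;

  // Mirrors the argument order used in the store call above.
  store(source: string, content: string, description: string, preview: string): string {
    const memoryId = `mem-${this.nextId++}`;
    this.entries.set(memoryId, { source, content, description, preview });
    return memoryId;
  }

  // Assumed counterpart of the read_working_memory tool: returns a slice of the stored content.
  read(memoryId: string, offset = 0, length = 1000): string | undefined {
    const entry = this.entries.get(memoryId);
    return entry ? entry.content.slice(offset, offset + length) : undefined;
  }
}

const memory = new InMemoryWorkingMemory();
const body = "x".repeat(5000);
const storedId = memory.store("getDoc", body, "Document: Demo", body.slice(0, 500));
console.log(storedId);                                 // "mem-1"
console.log(memory.read(storedId, 4990, 100)?.length); // 10
```

Reading by offset and length lets a caller page through offloaded content in chunks that each stay under the result token limit.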

Compressing list results

Limits the number of items

private compressListResult(result: unknown): string {
  const list = Array.isArray(result) ? result : (result as any).items || [];

  if (!Array.isArray(list)) return this.defaultCompress(result);

  const totalItems = list.length;

  if (totalItems <= this.config.maxListItems) {
    return JSON.stringify(result);
  }

  const sliced = list.slice(0, this.config.maxListItems);

  return JSON.stringify({
    items: sliced,
    _note_: `Showing ${sliced.length} of ${totalItems} items. Please use filters to see more.`,
    totalCount: totalItems,
  });
}
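
A worked example of the list limit, using a standalone function so the block runs on its own. `truncateList` is a hypothetical name, and the limit is passed as a parameter instead of being read from `this.config`.

```typescript
// Standalone version of the list truncation above, with the limit passed in.
function truncateList(items: unknown[], maxListItems: number): string {
  if (items.length <= maxListItems) return JSON.stringify(items);
  const sliced = items.slice(0, maxListItems);
  return JSON.stringify({
    items: sliced,
    _note_: `Showing ${sliced.length} of ${items.length} items. Please use filters to see more.`,
    totalCount: items.length,
  });
}

const docs = Array.from({ length: 25 }, (_, i) => ({ id: `doc-${i + 1}` }));
const compressedList = JSON.parse(truncateList(docs, 20));
console.log(compressedList.items.length); // 20
console.log(compressedList.totalCount);   // 25
console.log(compressedList._note_);       // Showing 20 of 25 items. Please use filters to see more.
```

Note the change of shape: a list within the limit is serialized as-is, while a truncated list is always wrapped in an object with `items`, `_note_`, and `totalCount`, even when the input was a bare array.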

Compressing large text

Oversized text is offloaded to working memory automatically

private compressLargeText(result: unknown): string {
  let text = "";
  if (typeof result === "string") {
    text = result;
  } else if (typeof result === "object" && (result as any).output) {
    text = (result as any).output;
  } else {
    return this.defaultCompress(result);
  }

  const tokens = estimateTokens(text);
  if (tokens <= this.config.maxResultTokens) {
    return text;
  }

  // Perform the offload
  const preview = text.slice(0, this.config.textPreviewLength);

  const memoryId = WorkingMemoryService.store(
    "large-text-tool",
    text,
    `Large Text Content (${tokens} tokens)`,
    preview
  );

  return JSON.stringify({
    _OFFLOADED_: {
      memoryId: memoryId,
      summary: `Content too large (${tokens} tokens/ ${text.length} chars). Saved to temporary memory.`,
      preview: preview + "... [More available via read_working_memory]",
      totalLength: text.length,
    },
    originalSize: tokens,
  });
}

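Configuration options

interface CompressorConfig {
  /** Maximum tokens for a single tool result */
  maxResultTokens: number;
  /** Maximum number of items in a list result */
  maxListItems: number;
  /** Length of the text preview, in characters */
  textPreviewLength: number;
}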

const DEFAULT_CONFIG: CompressorConfig = {
  maxResultTokens: 1000,
  maxListItems: 20,
  textPreviewLength: 500,
};

Usage examples

Token counting

import { estimateTokens, truncateToTokens } from "./context";

const text = "This is a piece of text...";

// Estimate the token count
const tokens = estimateTokens(text);
console.log(`Tokens: ${tokens}`);

// Truncate to a given token count
const truncated = truncateToTokens(text, 100);

Budget tracking

import { TokenBudgetTracker } from "./context";

const tracker = new TokenBudgetTracker(10000);

// Allocate budget
if (tracker.allocate("system", 2000)) {
  console.log("System prompt allocated");
}

if (tracker.allocate("messages", 8000)) {
  console.log("Messages allocated");
}

// Read the statistics
console.log(tracker.getBreakdown());
// { system: 2000, messages: 8000 }

console.log(`Remaining: ${tracker.getRemaining()}`);
// Remaining: 0

Sliding window

import { SlidingWindow } from "./context";

// Apply the sliding window
const result = SlidingWindow.applyWindow(
  messages,
  startSeq,
  budgetOverride
);

console.log(`Included: ${result.messages.length}`);
console.log(`Excluded: ${result.excludedCount}`);
console.log(`Total tokens: ${result.totalTokens}`);

Tool result compression

import { ToolResultCompressor } from "./context";

const compressed = ToolResultCompressor.compress(
  "getDoc",
  {
    id: "doc-1",
    title: "Large Document",
    content: "...", // very long content
    outline: [...],
  }
);

console.log(compressed);
// {
//   "id": "doc-1",
//   "title": "Large Document",
//   "_OFFLOADED_": {
//     "memoryId": "mem-xxx",
//     "summary": "Content too large...",
//     "preview": "...",
//     "totalLength": 50000
//   }
// }

Performance optimization

Avoid repeated counting

// Cache the counting results
const tokenCache = new Map<string, number>();

function estimateTokensCached(text: string): number {
  if (tokenCache.has(text)) {
    return tokenCache.get(text)!;
  }
  const tokens = estimateTokens(text);
  tokenCache.set(text, tokens);
  return tokens;
}
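
A cache keyed by the full text grows without bound in a long-running process. One possible variation, sketched below, caps the number of entries and evicts the oldest one first; the cap of 1000 and the names `MAX_CACHE_ENTRIES`, `boundedCache`, and `estimateTokensBounded` are assumptions, and the estimator is a simplified stand-in so the block runs on its own.

```typescript
// Stand-in estimator so the block runs on its own (4 characters per token).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

const MAX_CACHE_ENTRIES = 1000; // assumed cap, tune for the workload
const boundedCache = new Map<string, number>();

function estimateTokensBounded(text: string): number {
  const cached = boundedCache.get(text);
  if (cached !== undefined) return cached;

  const tokens = estimateTokens(text);
  if (boundedCache.size >= MAX_CACHE_ENTRIES) {
    // Map iterates in insertion order, so the first key is the oldest entry.
    const oldest = boundedCache.keys().next().value;
    if (oldest !== undefined) boundedCache.delete(oldest);
  }
  boundedCache.set(text, tokens);
  return tokens;
}

console.log(estimateTokensBounded("abcdefgh")); // 2
```

Since the estimator is already a cheap character count, caching pays off mainly for long strings that are re-measured often; whether it helps at all is worth checking against the actual workload.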

Batch processing

// Estimate a whole list of messages in one pass
function estimateMessagesTokens(
  messages: Array<{ role: string; content: string }>
): number {
  let total = 0;
  for (const msg of messages) {
    total += 4; // per-message formatting overhead
    total += estimateTokens(msg.content);
  }
  return total;
}

Smart truncation

// Cut at a sentence boundary to keep the text semantically complete
const lastSentenceEnd = Math.max(
  truncated.lastIndexOf("。"),
  truncated.lastIndexOf("."),
  truncated.lastIndexOf("?")
);

if (lastSentenceEnd > targetLength * 0.5) {
  return truncated.substring(0, lastSentenceEnd + 1) + suffix;
}
