Events and Observability

Advanced series, part 10 · about 8 min read

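Once an Agent runs in production, you usually need to answer two kinds of questions. The first is what happened during a run, such as how many tools were called and when streaming output finished. The second is what the run cost in tokens, wall-clock time and number of tool calls. Cody SDK answers the first with event hooks (.on() / .on_async()) and the second with enable_metrics() and get_metrics(). Below is a complete script you can run as-is, followed by a piece-by-piece breakdown.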

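Every demo in this series uses the qwen3.5-plus model. Make sure you have set the environment variables from part 01 (CODY_MODEL, CODY_MODEL_API_KEY, CODY_MODEL_BASE_URL). The AsyncCodyClient in this example does not pass a model explicitly, so those environment variables take effect.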
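A quick pre-flight check fails fast when one of these variables is missing, instead of erroring mid-run. This is a minimal sketch that only checks the three variable names listed above. The helper name `missing_model_env` is hypothetical and not part of Cody SDK.

```python
import os
from collections.abc import Mapping

# The three variables configured in part 01 of this series.
REQUIRED_VARS = ("CODY_MODEL", "CODY_MODEL_API_KEY", "CODY_MODEL_BASE_URL")


def missing_model_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty (hypothetical helper)."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Call it at the top of a script, before building the client, for example `if missing_model_env(): raise SystemExit("model environment incomplete")`. The `env` parameter exists only so the check can be tested with a plain dict.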

Complete example: events + metrics

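The script does four things:

- It chains handler registrations for TOOL_CALL, TOOL_RESULT and RUN_END on the Builder. TOOL_CALL is registered by its string value "tool_call".
- It turns on metrics collection.
- It runs one task.
- It logs the summary dictionary returned by get_metrics().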

import asyncio
import logging

from cody.sdk import Cody, EventType

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("cody.demo")


async def main() -> None:
    client = (
        Cody()
        .workdir(".")
        .enable_metrics()
        .on("tool_call", lambda e: log.info("TOOL_CALL %s args=%s", e.tool_name, e.args))
        .on(EventType.TOOL_RESULT, lambda e: log.info("TOOL_RESULT %s", e.tool_name))
        .on(EventType.RUN_END, lambda e: log.info("RUN_END output_length=%s", len(e.result or "")))
        .build()
    )
    async with client:
        await client.run("Describe in one sentence roughly what files are in the current working directory.")

    metrics = client.get_metrics()
    if metrics:
        log.info(
            "metrics total_tokens=%s total_tool_calls=%s total_duration=%.2fs",
            metrics["total_tokens"],
            metrics["total_tool_calls"],
            metrics["total_duration"],
        )


if __name__ == "__main__":
    asyncio.run(main())

What the script does

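Calling .on(...) on the Builder turns on the event system automatically, which is equivalent to an explicit enable_events(). You do not need to call it yourself.
TOOL_CALL and TOOL_RESULT callbacks typically read tool_name. The call stage also carries args, and the result stage carries result.
RUN_END fires when a run() call finishes normally, and result can be read from the RunEvent.
enable_metrics() makes the client accumulate data while it runs. get_metrics() returns the aggregated dictionary, or None if metrics were not enabled.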
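A handler is just a callable that receives one event object, so you can unit-test it without calling a model. The sketch below counts TOOL_CALL events per tool and reads only `tool_name`. The stand-in events built with `SimpleNamespace` are hypothetical and are not real SDK output, and the same goes for the tool names `read_file` and `grep`.

```python
from collections import Counter
from types import SimpleNamespace

# Hypothetical per-tool tally, filled by the handler below.
tool_calls = Counter()


def count_tool_call(e) -> None:
    # Reads only e.tool_name, the field TOOL_CALL events carry.
    tool_calls[e.tool_name] += 1


# Stand-in events (not real SDK objects) so the handler can be tested offline.
for name in ("read_file", "grep", "read_file"):
    count_tool_call(SimpleNamespace(tool_name=name, args={}))

print(tool_calls.most_common())
```

With a real client you would register it as `.on(EventType.TOOL_CALL, count_tool_call)`. If several clients share a process, give each one its own counter.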

Event basics: `Cody().on()`

Besides registering on the Builder, you can call client.on(...) on the instance after build(), as long as events are enabled in the configuration:

from cody.sdk import Cody, EventType

client = Cody().workdir(".").enable_events().build()
client.on(EventType.RUN_END, lambda e: print(e.result))

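What makes this equivalent to the chained Builder style is that events must be enabled first. Chained .on() calls enable them automatically; otherwise call enable_events(). If events are not enabled, calling on / on_async on the instance raises an error.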

`EventType` enum reference

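The table below matches the definitions in cody.sdk.events. The third column lists the fields commonly read from each event object. The concrete classes are RunEvent, ToolEvent and so on, and all of them carry event_type, timestamp and data.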

| Enum member | String value | Typical fields (on the event object) |
| --- | --- | --- |
| `RUN_START` | `run_start` | `RunEvent`: `prompt` |
| `RUN_END` | `run_end` | `RunEvent`: `result` |
| `RUN_ERROR` | `run_error` | `RunEvent`: `error` |
| `STREAM_START` | `stream_start` | `StreamEvent`: `chunk_type`, `content` |
| `STREAM_CHUNK` | `stream_chunk` | `StreamEvent`: `content` |
| `STREAM_END` | `stream_end` | `StreamEvent`: `chunk_type`, `content` |
| `TOOL_CALL` | `tool_call` | `ToolEvent`: `tool_name`, `args` |
| `TOOL_RESULT` | `tool_result` | `ToolEvent`: `tool_name`, `result` |
| `TOOL_ERROR` | `tool_error` | `ToolEvent`: `tool_name`, `error` |
| `THINKING_START` | `thinking_start` | `ThinkingEvent`: `content`, `is_start` |
| `THINKING_CHUNK` | `thinking_chunk` | `ThinkingEvent`: `content` |
| `THINKING_END` | `thinking_end` | `ThinkingEvent`: `content`, `is_end` |
| `SESSION_CREATE` | `session_create` | `SessionEvent`: `session_id`, `title` |
| `SESSION_CLOSE` | `session_close` | `SessionEvent`: `session_id`, `title` |
| `MODEL_REQUEST` | `model_request` | `ModelEvent`: `model`, `provider`, `input_tokens`, etc. |
| `MODEL_RESPONSE` | `model_response` | `ModelEvent`: `output_tokens`, etc. |
| `MODEL_ERROR` | `model_error` | `ModelEvent`: `error` |
| `CONTEXT_COMPACT` | `context_compact` | `ContextCompactEvent`: `original_messages`, `compacted_messages`, `tokens_saved` |
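When one handler is registered for several event types, a lookup keyed by the string values in the table keeps it compact. This sketch rests on two assumptions: `event_type` holds either an `EventType` member or its string value, and missing attributes should print as `None`. `TYPICAL_FIELDS` and `describe` are hypothetical names, and only part of the table is copied.

```python
from types import SimpleNamespace

# Typical fields per event value, copied from part of the table above.
TYPICAL_FIELDS = {
    "run_start": ("prompt",),
    "run_end": ("result",),
    "run_error": ("error",),
    "tool_call": ("tool_name", "args"),
    "tool_result": ("tool_name", "result"),
    "tool_error": ("tool_name", "error"),
    "context_compact": ("original_messages", "compacted_messages", "tokens_saved"),
}


def describe(e) -> str:
    """Render an event as 'value field=repr ...' using only its typical fields."""
    # Enum members expose .value; plain strings are used as-is.
    kind = getattr(e.event_type, "value", e.event_type)
    parts = [f"{name}={getattr(e, name, None)!r}" for name in TYPICAL_FIELDS.get(kind, ())]
    return " ".join([kind, *parts])


print(describe(SimpleNamespace(event_type="tool_call", tool_name="grep", args={"pattern": "TODO"})))
```

Event types missing from the mapping, such as `stream_chunk` here, render as just their value.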

Two ways to name an event: string or enum

Both client.on() and the Builder's .on() accept either an event-name string or an EventType. The string must equal the enum's value, for example "tool_call" and not a camelCase name.

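| Style | Example | Notes |
| --- | --- | --- |
| String | `.on("tool_call", handler)` | Short. A typo only surfaces at runtime, either as a handler that never matches or as a failed conversion. |
| Enum | `.on(EventType.TOOL_RESULT, handler)` | IDE autocompletion, and safer to refactor. |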

On the instance side, strings are converted with EventType(event_type), so both styles end up registering the same thing.
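Python's standard `enum.Enum` looks members up by value, which is why camelCase strings and member names are rejected. The sketch uses a local stand-in class, `DemoEventType`, with three members; it is not the SDK's `EventType`. If the SDK's `EventType` is a standard `Enum`, an invalid string would fail at this same conversion step. That is an assumption drawn from the `EventType(event_type)` conversion described above.

```python
from enum import Enum


class DemoEventType(Enum):
    """Local stand-in for a few EventType members; not the SDK class."""
    TOOL_CALL = "tool_call"
    TOOL_RESULT = "tool_result"
    RUN_END = "run_end"


def normalize(event_type) -> DemoEventType:
    # Members pass through; strings go through Enum's lookup by value.
    if isinstance(event_type, DemoEventType):
        return event_type
    return DemoEventType(event_type)


assert normalize("tool_call") is normalize(DemoEventType.TOOL_CALL)

try:
    normalize("toolCall")  # camelCase is not a valid value
except ValueError as exc:
    print("rejected:", exc)
```

The member name "TOOL_CALL" is rejected as well, because lookup goes by value and not by name.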

Async handlers: `on_async`

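If a callback needs to await something, such as a database write or an HTTP request, register it with on_async. During dispatch, synchronous handlers run first, and then each async handler is awaited.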

import asyncio
from cody.sdk import Cody, EventType

async def async_handler(e):
    # e.g. await audit_log.write(...)
    await asyncio.sleep(0)


client = Cody().workdir(".").enable_events().build()
client.on_async(EventType.TOOL_CALL, async_handler)

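The SDK logs exceptions raised in async handlers instead of silently swallowing them. In production, you should still catch expected errors inside the handler yourself.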
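To make the dispatch order concrete, here is a toy dispatcher. It runs synchronous handlers first, then awaits async handlers and logs their exceptions. It only illustrates the behavior described above and is not the SDK's implementation. `MiniDispatcher` is hypothetical and handles a single event type. Awaiting async handlers one after another, rather than concurrently, is an assumption.

```python
import asyncio
import logging

log = logging.getLogger("demo.dispatch")


class MiniDispatcher:
    """Toy dispatcher for one event type: sync handlers first, then async ones."""

    def __init__(self) -> None:
        self.sync_handlers = []
        self.async_handlers = []

    def on(self, handler) -> None:
        self.sync_handlers.append(handler)

    def on_async(self, handler) -> None:
        self.async_handlers.append(handler)

    async def emit(self, event) -> None:
        for handler in self.sync_handlers:
            handler(event)
        # Assumption: async handlers are awaited one after another.
        for handler in self.async_handlers:
            try:
                await handler(event)
            except Exception:
                # Logged with a traceback instead of being silently swallowed.
                log.exception("async handler %r failed", handler)


calls = []


async def broken(e):
    raise RuntimeError("audit backend down")


async def audit(e):
    await asyncio.sleep(0)
    calls.append(("async", e))


d = MiniDispatcher()
d.on_async(broken)  # fails, but does not block the next handler
d.on_async(audit)
d.on(lambda e: calls.append(("sync", e)))  # registered last, still runs first
asyncio.run(d.emit("tool_call"))
print(calls)
```

The sync handler runs first even though it was registered last. The failure in `broken` is logged, and `audit` still runs afterwards.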

Metrics: `enable_metrics` and `get_metrics`

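Calling enable_metrics() on the Builder attaches an internal metrics collector to the client. After each run() or streaming run finishes, get_metrics() returns a summary dictionary aggregated across all runs so far.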

from cody.sdk import Cody

client = Cody().workdir(".").enable_metrics().build()
metrics = client.get_metrics()
# Main keys: total_tokens, total_tool_calls, total_duration
# Also available: total_runs, input_tokens, output_tokens, avg_run_duration,
# avg_tokens_per_run, tool_metrics, session_count, etc.

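| Key | Meaning |
| --- | --- |
| `total_tokens` | Cumulative tokens (input and output combined) |
| `total_tool_calls` | Cumulative number of tool calls |
| `total_duration` | Sum of run durations (seconds, float) |
| `input_tokens` / `output_tokens` | Cumulative input and output tokens, counted separately |
| `tool_metrics` | Per-tool aggregates: call count, success rate, average duration |
| `total_runs`, `session_count`, etc. | Auxiliary counts such as the number of runs and sessions involved |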
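The arithmetic behind these keys is easy to reproduce, which helps if you want the same numbers in your own dashboard. This sketch assumes each finished run reports input tokens, output tokens, tool calls and duration. It also assumes the averages are totals divided by `total_runs`. `RunRecord` and `MetricsSketch` are hypothetical names, the sample figures are made up, and `tool_metrics` and `session_count` are left out.

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """One finished run; these fields are assumptions for illustration."""
    input_tokens: int
    output_tokens: int
    tool_calls: int
    duration: float  # seconds


@dataclass
class MetricsSketch:
    runs: list[RunRecord] = field(default_factory=list)

    def record(self, run: RunRecord) -> None:
        self.runs.append(run)

    def summary(self) -> dict:
        # Aggregate across every recorded run, mirroring the documented keys.
        n = len(self.runs)
        inp = sum(r.input_tokens for r in self.runs)
        out = sum(r.output_tokens for r in self.runs)
        dur = sum(r.duration for r in self.runs)
        return {
            "total_runs": n,
            "input_tokens": inp,
            "output_tokens": out,
            "total_tokens": inp + out,
            "total_tool_calls": sum(r.tool_calls for r in self.runs),
            "total_duration": dur,
            "avg_run_duration": dur / n if n else 0.0,
            "avg_tokens_per_run": (inp + out) / n if n else 0.0,
        }


m = MetricsSketch()
m.record(RunRecord(input_tokens=1200, output_tokens=300, tool_calls=2, duration=3.5))
m.record(RunRecord(input_tokens=800, output_tokens=200, tool_calls=1, duration=2.5))
print(m.summary())
```

With the two made-up runs above, `total_tokens` is 2500 and `avg_run_duration` is 3.0 seconds.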

In practice: writing every tool call to `logging`

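This section turns observability into standard-library logging. TOOL_CALL and TOOL_RESULT are logged at INFO and TOOL_ERROR at WARNING, which makes the records easy to ship to ELK, Loki and similar stacks.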

import asyncio
import logging

from cody.sdk import Cody, EventType

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("cody.tools")


def log_tool_call(e):
    log.info("tool call name=%s args=%s", e.tool_name, e.args)


def log_tool_result(e):
    # Keep only the first 500 characters; str() also handles non-string results.
    body = str(e.result or "")[:500]
    log.info("tool result name=%s result_prefix=%s", e.tool_name, body)


def log_tool_error(e):
    log.warning("tool failed name=%s error=%s", e.tool_name, e.error)


async def main():
    client = (
        Cody()
        .workdir(".")
        .on(EventType.TOOL_CALL, log_tool_call)
        .on(EventType.TOOL_RESULT, log_tool_result)
        .on(EventType.TOOL_ERROR, log_tool_error)
        .build()
    )
    async with client:
        await client.run("Which Python files are in the current directory?")


if __name__ == "__main__":
    asyncio.run(main())
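Log shippers for ELK or Loki are easiest to feed with one JSON object per line. Below is a minimal standard-library formatter sketch. `JsonFormatter` and the `tool_name` extra field are illustrative choices and are not part of Cody SDK.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log record."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Keys passed via logging's `extra=` become attributes on the record.
        tool = getattr(record, "tool_name", None)
        if tool is not None:
            payload["tool_name"] = tool
        return json.dumps(payload, ensure_ascii=False)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
json_log = logging.getLogger("cody.tools.json")
json_log.addHandler(handler)
json_log.setLevel(logging.INFO)
json_log.propagate = False  # avoid duplicate plain-text lines from the root logger

json_log.info("tool call", extra={"tool_name": "grep"})
```

Inside a handler you would pass the tool name through `extra`, for example `json_log.info("tool call", extra={"tool_name": e.tool_name})`.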

Logging the full result can expose file contents or fragments of secrets. In production, truncate or redact it, or record only a hash and the length.
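One way to follow that advice is to log a fingerprint of the result instead of its body. The helper below is a sketch, and `summarize_result` is a hypothetical name. Keeping a 16-hex-character prefix of the SHA-256 digest is an arbitrary readability choice. Truncation alone does not remove secrets, so `limit=0` logs only the hash and length.

```python
import hashlib


def summarize_result(result, limit: int = 200) -> dict:
    """Loggable summary of a tool result: length, short SHA-256, optional prefix."""
    text = "" if result is None else str(result)
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    # 16 hex chars is enough to correlate identical results across log lines.
    return {"len": len(text), "sha256": digest[:16], "prefix": text[:limit]}
```

In `log_tool_result` you could log `summarize_result(e.result, limit=0)` in place of the 500-character prefix.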

Summary

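You can now subscribe to EventType with .on() / .on_async() and look up the usual fields for each event in the reference table. You can also use enable_metrics() and get_metrics() to summarize tokens, duration and tool calls. The next part, "Project Memory", shows how to persist and retrieve project context across multi-turn tasks.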
