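Observability

Spring AI builds on the observability features of the Spring ecosystem to provide insight into AI-related operations.

Enabling observability requires the spring-boot-actuator module. Add the Spring Boot Actuator dependency to your project's Maven pom.xml file: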

<dependency>
 <groupId>org.springframework.boot</groupId>
 <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

Or add it to your Gradle build.gradle file:

dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-actuator'
}

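Spring AI provides metrics and tracing capabilities for its core components: ChatClient (including Advisor), ChatModel, EmbeddingModel, ImageModel, and VectorStore.

Low cardinality keys are added to both metrics and traces, while high cardinality keys are added only to traces.

Breaking changes in 1.0.0-RC1: the following configuration properties have been renamed to better reflect their purpose: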

spring.ai.chat.client.observations.include-prompt → spring.ai.chat.client.observations.log-prompt
spring.ai.chat.observations.include-prompt → spring.ai.chat.observations.log-prompt
spring.ai.chat.observations.include-completion → spring.ai.chat.observations.log-completion
spring.ai.image.observations.include-prompt → spring.ai.image.observations.log-prompt
spring.ai.vectorstore.observations.include-query-response → spring.ai.vectorstore.observations.log-query-response

Chat Client

The spring.ai.chat.client observations are recorded when a ChatClient call() or stream() operation is invoked. They measure the time spent performing the invocation and propagate the related tracing information.

Table 1. Low Cardinality Keys

Name	Description
`gen_ai.operation.name`	Always `framework`.
`gen_ai.system`	Always `spring_ai`.
`spring.ai.chat.client.stream`	Whether the chat model response is a stream: `true` or `false`.
`spring.ai.kind`	The kind of Spring AI framework API: `chat_client`.

Table 2. High Cardinality Keys

Name	Description
`gen_ai.prompt`	The content of the prompt sent through the chat client. Optional.
`spring.ai.chat.client.advisor.params` (deprecated)	Map of advisor parameters. The conversation ID is now included in `spring.ai.chat.client.conversation.id`.
`spring.ai.chat.client.advisors`	List of configured chat client advisors.
`spring.ai.chat.client.conversation.id`	Identifier of the conversation when using chat memory.
`spring.ai.chat.client.system.params` (deprecated)	Chat client system parameters. Optional. Superseded by `gen_ai.prompt`.
`spring.ai.chat.client.system.text` (deprecated)	Chat client system text. Optional. Superseded by `gen_ai.prompt`.
`spring.ai.chat.client.tool.function.names` (deprecated)	Names of the enabled tool functions. Superseded by `spring.ai.chat.client.tool.names`.
`spring.ai.chat.client.tool.function.callbacks` (deprecated)	List of configured chat client function callbacks. Superseded by `spring.ai.chat.client.tool.names`.
`spring.ai.chat.client.tool.names`	Names of the tools passed to the chat client.
`spring.ai.chat.client.user.params` (deprecated)	Chat client user parameters. Optional. Superseded by `gen_ai.prompt`.
`spring.ai.chat.client.user.text` (deprecated)	Chat client user text. Optional. Superseded by `gen_ai.prompt`.

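Prompt and Completion Data

ChatClient prompt and completion data is typically large and may contain sensitive information. For those reasons, it is not exported by default.

Spring AI supports logging the prompt and completion data to help with debugging and troubleshooting.

Property	Description	Default
`spring.ai.chat.client.observations.log-prompt`	Whether to log the chat client prompt content.	`false`
`spring.ai.chat.client.observations.log-completion`	Whether to log the chat client completion content.	`false`

Enabling logging of the chat client prompt and completion data risks exposing sensitive or private information. Use it with care!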

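Input Data (Deprecated)

The spring.ai.chat.client.observations.include-input property is deprecated in favor of spring.ai.chat.client.observations.log-prompt. See the prompt and completion data section above.

ChatClient input data is typically large and may contain sensitive information. For those reasons, it is not exported by default.

Spring AI supports logging the input data to help with debugging and troubleshooting.

Property	Description	Default
`spring.ai.chat.client.observations.include-input`	Whether to include the input content in observations.	`false`

Including the input content in observations risks exposing sensitive or private information. Use it with care!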

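Chat Client Advisors

The spring.ai.advisor observations are recorded when an advisor is executed. They measure the time spent in the advisor (including the time spent in inner advisors) and propagate the related tracing information.

Table 3. Low Cardinality Keys

Name	Description
`gen_ai.operation.name`	Always `framework`.
`gen_ai.system`	Always `spring_ai`.
`spring.ai.advisor.type` (deprecated)	Where the advisor applies its logic in request processing; formerly one of `BEFORE`, `AFTER`, or `AROUND`. This distinction no longer applies because all advisors are now of the same type.
`spring.ai.kind`	The kind of Spring AI framework API: `advisor`.

Table 4. High Cardinality Keys

Name	Description
`spring.ai.advisor.name`	Name of the advisor.
`spring.ai.advisor.order`	Order of the advisor in the advisor chain.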

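Chat Model

Observability is currently supported only for ChatModel implementations from the following AI model providers: Anthropic, Azure OpenAI, Mistral AI, Ollama, OpenAI, Vertex AI, MiniMax, Moonshot, QianFan, and Zhipu AI. Additional AI model providers will be supported in a future release.

The gen_ai.client.operation observations are recorded when the ChatModel call or stream method is invoked. They measure the time the method takes to complete and propagate the related tracing information.

The gen_ai.client.token.usage metric measures the number of input and output tokens used by a single model call.

Table 5. Low Cardinality Keys

Name	Description
`gen_ai.operation.name`	The name of the operation being performed.
`gen_ai.system`	The model provider as identified by the client instrumentation.
`gen_ai.request.model`	The name of the model a request is being made to.
`gen_ai.response.model`	The name of the model that generated the response.

Table 6. High Cardinality Keys

Name	Description
`gen_ai.request.frequency_penalty`	The frequency penalty setting for the model request.
`gen_ai.request.max_tokens`	The maximum number of tokens the model generates for a request.
`gen_ai.request.presence_penalty`	The presence penalty setting for the model request.
`gen_ai.request.stop_sequences`	List of sequences that the model uses to stop generating further tokens.
`gen_ai.request.temperature`	The temperature setting for the model request.
`gen_ai.request.top_k`	The top_k sampling setting for the model request.
`gen_ai.request.top_p`	The top_p sampling setting for the model request.
`gen_ai.response.finish_reasons`	Reasons the model stopped generating tokens, corresponding to each generation received.
`gen_ai.response.id`	The unique identifier of the AI response.
`gen_ai.usage.input_tokens`	The number of tokens used in the model input (prompt).
`gen_ai.usage.output_tokens`	The number of tokens used in the model output (completion).
`gen_ai.usage.total_tokens`	The total number of tokens used in the model exchange.
`gen_ai.prompt`	The full prompt sent to the model. Optional.
`gen_ai.completion`	The full response received from the model. Optional.
`spring.ai.model.request.tool.names`	List of tool definitions provided to the model in the request.

The previous table lists the token values present in an observation trace. To measure token usage as a metric, use the metric name gen_ai.client.token.usage, which is provided by the ChatModel.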

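Chat Prompt and Completion Data

Chat prompt and completion data is typically large and may contain sensitive information. For those reasons, it is not exported by default.

Spring AI supports logging chat prompt and completion data, which is useful for troubleshooting. When tracing is available, the logs include trace information for better correlation.

Property	Description	Default
`spring.ai.chat.observations.log-prompt`	Whether to log the prompt content. `true` or `false`	`false`
`spring.ai.chat.observations.log-completion`	Whether to log the completion content. `true` or `false`	`false`
`spring.ai.chat.observations.include-error-logging`	Whether to include error logging in observations. `true` or `false`	`false`

Enabling logging of chat prompt and completion data risks exposing sensitive or private information. Use it with care!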

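Tool Calling

The spring.ai.tool observations are recorded when a tool call is performed in the context of a chat model interaction. They measure the time the tool call takes to complete and propagate the related tracing information.

Table 7. Low Cardinality Keys

Name	Description
`gen_ai.operation.name`	The name of the operation being performed. Always `framework`.
`gen_ai.system`	The provider responsible for the operation. Always `spring_ai`.
`spring.ai.kind`	The kind of operation performed by Spring AI. Always `tool_call`.
`spring.ai.tool.definition.name`	The name of the tool.

Table 8. High Cardinality Keys

Name	Description
`spring.ai.tool.definition.description`	The description of the tool.
`spring.ai.tool.definition.schema`	The schema of the parameters used to call the tool.
`spring.ai.tool.call.arguments`	The input arguments of the tool call. (Only when enabled)
`spring.ai.tool.call.result`	The result of the tool call. (Only when enabled)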

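Tool Call Arguments and Result Data

The input arguments and result of a tool call are not exported by default, because they may contain sensitive information.

Spring AI supports exporting tool call arguments and result data as span attributes.

Property	Description	Default
`spring.ai.tools.observations.include-content`	Whether to include tool call content in observations. `true` or `false`	`false`

Including tool call arguments and results in observations risks exposing sensitive or private information. Use it with care!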

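Embedding Model

Observability is currently supported only for EmbeddingModel implementations from the following AI model providers: Azure OpenAI, Mistral AI, Ollama, and OpenAI. Additional AI model providers will be supported in a future release.

The gen_ai.client.operation observations are recorded when an embedding model method is called. They measure the time the method takes to complete and propagate the related tracing information.

The gen_ai.client.token.usage metric measures the number of input and output tokens used by a single model call.

Table 9. Low Cardinality Keys

Name	Description
`gen_ai.operation.name`	The name of the operation being performed.
`gen_ai.system`	The model provider as identified by the client instrumentation.
`gen_ai.request.model`	The name of the model a request is being made to.
`gen_ai.response.model`	The name of the model that generated the response.

Table 10. High Cardinality Keys

Name	Description
`gen_ai.request.embedding.dimensions`	The number of dimensions of the resulting output embeddings.
`gen_ai.usage.input_tokens`	The number of tokens used in the model input.
`gen_ai.usage.total_tokens`	The total number of tokens used in the model exchange.

The previous table lists the token values present in an observation trace. To measure token usage as a metric, use the metric name gen_ai.client.token.usage, which is provided by the EmbeddingModel.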

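Image Model

Observability is currently supported only for ImageModel implementations from the following AI model providers: OpenAI. Additional AI model providers will be supported in a future release.

The gen_ai.client.operation observations are recorded when an image model method is called. They measure the time the method takes to complete and propagate the related tracing information.

The gen_ai.client.token.usage metric measures the number of input and output tokens used by a single model call.

Table 11. Low Cardinality Keys

Name	Description
`gen_ai.operation.name`	The name of the operation being performed.
`gen_ai.system`	The model provider as identified by the client instrumentation.
`gen_ai.request.model`	The name of the model a request is being made to.

Table 12. High Cardinality Keys

Name	Description
`gen_ai.request.image.response_format`	The format in which the generated image is returned.
`gen_ai.request.image.size`	The size of the image to generate.
`gen_ai.request.image.style`	The style of the image to generate.
`gen_ai.response.id`	The unique identifier of the AI response.
`gen_ai.response.model`	The name of the model that generated the response.
`gen_ai.usage.input_tokens`	The number of tokens used in the model input (prompt).
`gen_ai.usage.output_tokens`	The number of tokens used in the model output (generation).
`gen_ai.usage.total_tokens`	The total number of tokens used in the model exchange.
`gen_ai.prompt`	The full prompt sent to the model. Optional.

The previous table lists the token values present in an observation trace. To measure token usage as a metric, use the metric name gen_ai.client.token.usage, which is provided by the ImageModel.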

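Image Prompt Data

Image prompt data is typically large and may contain sensitive information. For those reasons, it is not exported by default.

Spring AI supports logging image prompt data, which is useful for debugging. When tracing is available, the logs include trace information for better correlation.

Property	Description	Default
`spring.ai.image.observations.log-prompt`	Whether to log the image prompt content. `true` or `false`	`false`

Enabling logging of image prompt data risks exposing sensitive or private information. Use it with care!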

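Vector Stores

All vector store implementations in Spring AI are instrumented to provide metrics and distributed tracing data through Micrometer.

The db.vector.client.operation observations are recorded when interacting with a vector store. They measure the time spent on query, add, and delete operations and propagate the related tracing information.

Table 13. Low Cardinality Keys

Name	Description
`db.operation.name`	The name of the operation or command being executed. One of `add`, `delete`, or `query`.
`db.system`	The database management system (DBMS) product as identified by the client instrumentation. One of `pg_vector`, `azure`, `cassandra`, `chroma`, `elasticsearch`, `milvus`, `neo4j`, `opensearch`, `qdrant`, `redis`, `typesense`, `weaviate`, `pinecone`, `oracle`, `mongodb`, `gemfire`, `hana`, or `simple`.
`spring.ai.kind`	The kind of Spring AI framework API: `vector_store`.

Table 14. High Cardinality Keys

Name	Description
`db.collection.name`	The name of a collection (table, container) within the database.
`db.namespace`	The name of the database, fully qualified within the server address and port.
`db.record.id`	The record identifier, if present.
`db.search.similarity_metric`	The metric used in the similarity search.
`db.vector.dimension_count`	The dimension of the vector.
`db.vector.field_name`	The name of the field that holds the vector.
`db.vector.query.content`	The content of the search query being executed.
`db.vector.query.filter`	The metadata filters used in the search query.
`db.vector.query.response.documents`	The documents returned by the similarity search query. Optional.
`db.vector.query.similarity_threshold`	The similarity threshold applied to search scores. A value of 0.0 accepts any similarity, which effectively disables threshold filtering; a value of 1.0 requires an exact match.
`db.vector.query.top_k`	The number of most similar vectors (top k) returned by the query.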

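Response Data

Vector search response data is typically large and may contain sensitive information. For those reasons, it is not exported by default.

Spring AI supports logging vector search response data, which is useful for troubleshooting. When tracing is available, the logs include trace information for better correlation.

Property	Description	Default
`spring.ai.vectorstore.observations.log-query-response`	Whether to log the vector store query response content. `true` or `false`	`false`

Enabling logging of vector search response data risks exposing sensitive or private information. Use it with care!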

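More Metrics Reference

This section documents the metrics exported by Spring AI components as they appear in Prometheus.

Metric Naming Convention

Spring AI uses Micrometer. Base metric names use dots (for example, gen_ai.client.operation); the Prometheus export replaces the dots with underscores and appends standard suffixes:

Timers → <base>_seconds_count, <base>_seconds_sum, <base>_seconds_max, and (when supported) <base>_active_count
Counters → <base>_total (monotonically increasing)

The following table shows how base metric names expand into Prometheus time series:

Base metric name	Exported time series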
`gen_ai.client.operation`	`gen_ai_client_operation_seconds_count` `gen_ai_client_operation_seconds_sum` `gen_ai_client_operation_seconds_max` `gen_ai_client_operation_active_count`
`db.vector.client.operation`	`db_vector_client_operation_seconds_count` `db_vector_client_operation_seconds_sum` `db_vector_client_operation_seconds_max` `db_vector_client_operation_active_count`
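
The expansion in the table above can be sketched in a few lines of plain Java. This is only an illustration of the naming rule as described in this section, not Micrometer's actual implementation; the class `PrometheusTimerNames` and its methods are hypothetical names introduced for this example.

```java
import java.util.List;

// Hypothetical helper for illustration; not part of Spring AI or Micrometer.
final class PrometheusTimerNames {

    private PrometheusTimerNames() {
    }

    // Replaces dots with underscores, as described for the Prometheus export.
    static String toPrometheusBase(String micrometerName) {
        return micrometerName.replace('.', '_');
    }

    // Expands a timer-style base name into the four series listed in the table.
    static List<String> timerSeries(String micrometerName) {
        String base = toPrometheusBase(micrometerName);
        return List.of(
                base + "_seconds_count",
                base + "_seconds_sum",
                base + "_seconds_max",
                base + "_active_count");
    }

    public static void main(String[] args) {
        // Prints the series names derived from one of the base names above.
        timerSeries("db.vector.client.operation").forEach(System.out::println);
    }
}
```

A helper like this can be handy when writing dashboard queries, since the series names in Prometheus differ from the observation names used in application code.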

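References:

OpenTelemetry: Semantic Conventions for Generative AI (overview)
Micrometer: Naming Meters

Chat Client Metrics

Metric name	Type	Unit	Description
`gen_ai_chat_client_operation_seconds_sum`	Timer	seconds	Total time spent in ChatClient operations (call/stream)
`gen_ai_chat_client_operation_seconds_count`	Counter	count	Number of completed ChatClient operations
`gen_ai_chat_client_operation_seconds_max`	Gauge	seconds	Maximum observed duration of ChatClient operations
`gen_ai_chat_client_operation_active_count`	Gauge	count	Number of ChatClient operations currently in progress

Active vs. completed: active_count shows calls that are currently in flight; the _seconds series reflect only completed calls.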

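Chat Model Metrics (Model Provider Execution)

Chat Model Operation Metrics

Metric name	Type	Unit	Description
`gen_ai_client_operation_seconds_sum`	Timer	seconds	Total time spent executing chat model operations
`gen_ai_client_operation_seconds_count`	Counter	count	Number of completed chat model operations
`gen_ai_client_operation_seconds_max`	Gauge	seconds	Maximum observed duration of chat model operations
`gen_ai_client_operation_active_count`	Gauge	count	Number of chat model operations currently in progress

Token Usage Metrics

Metric name	Type	Unit	Description
`gen_ai_client_token_usage_total`	Counter	tokens	Total number of tokens consumed, tagged by token type

Labels

Label	Meaning
`gen_ai_token_type=input`	Prompt tokens sent to the model
`gen_ai_token_type=output`	Completion tokens returned by the model
`gen_ai_token_type=total`	Sum of input and output tokens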
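
Because the `total` series already equals input plus output, adding all three token types together counts every token twice. The plain-Java sketch below illustrates that arithmetic; the class `TokenUsageTotals`, its method, and the sample numbers are hypothetical and exist only for this example.

```java
import java.util.Map;

// Hypothetical helper for illustration; not part of Spring AI or Micrometer.
final class TokenUsageTotals {

    private TokenUsageTotals() {
    }

    // Sums only the "input" and "output" counts. The "total" entry is ignored
    // because it already equals input + output and would double the result.
    static long tokensConsumed(Map<String, Long> countsByTokenType) {
        long input = countsByTokenType.getOrDefault("input", 0L);
        long output = countsByTokenType.getOrDefault("output", 0L);
        return input + output;
    }

    public static void main(String[] args) {
        // Sample counts keyed by the gen_ai_token_type label value (made up).
        Map<String, Long> sample = Map.of("input", 120L, "output", 30L, "total", 150L);
        System.out.println(tokensConsumed(sample));
    }
}
```

The same caution applies to dashboard queries: filter on a single token type, or on input and output only, before aggregating.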

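Vector Store Metrics

Vector Store Operation Metrics

Metric name	Type	Unit	Description
`db_vector_client_operation_seconds_sum`	Timer	seconds	Total time spent in vector store operations (add/delete/query)
`db_vector_client_operation_seconds_count`	Counter	count	Number of completed vector store operations
`db_vector_client_operation_seconds_max`	Gauge	seconds	Maximum observed duration of vector store operations
`db_vector_client_operation_active_count`	Gauge	count	Number of vector store operations currently in progress

Labels

Label	Meaning
`db_operation_name`	Operation type (`add`, `delete`, `query`)
`db_system`	Vector database/provider (`redis`, `chroma`, `pg_vector`, and so on)
`spring_ai_kind`	`vector_store`

Understanding Active vs. Completed

Active (*_active_count): an instantaneous gauge of the operations currently in progress (concurrency/load).
Completed (*_seconds_sum|count|max): statistics for operations that have finished:
_seconds_sum / _seconds_count → average latency
_seconds_max → high-water mark since the last scrape (depends on registry behavior)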
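
The average-latency ratio above can be worked through in plain Java. This is a minimal sketch under stated assumptions: the class `LatencyMath`, its methods, and the sample values are hypothetical, and the two-scrape variant simply applies the same ratio to the increase of both series between two readings.

```java
// Hypothetical helper for illustration; not part of Spring AI or Micrometer.
final class LatencyMath {

    private LatencyMath() {
    }

    // Average latency over the lifetime of the series:
    // <base>_seconds_sum divided by <base>_seconds_count.
    static double averageSeconds(double secondsSum, long secondsCount) {
        if (secondsCount <= 0) {
            return 0.0; // no completed operations, so no meaningful average
        }
        return secondsSum / secondsCount;
    }

    // Average latency of the operations completed between two scrapes,
    // computed from the increase of both series.
    static double averageSecondsBetween(double sumBefore, long countBefore,
                                        double sumAfter, long countAfter) {
        return averageSeconds(sumAfter - sumBefore, countAfter - countBefore);
    }

    public static void main(String[] args) {
        // Sample values are made up for this sketch.
        System.out.println(averageSeconds(12.0, 4));
        System.out.println(averageSecondsBetween(10.0, 5, 16.0, 8));
    }
}
```

Returning 0.0 when nothing has completed is a design choice made for this sketch; a caller that needs to distinguish "no data" from "zero latency" might prefer an empty `OptionalDouble` instead.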

Docker Compose

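Spring AI provides Spring Boot auto-configuration for establishing a connection to a model service or vector store running via Docker Compose. To enable it, add the following dependency to your project's Maven pom.xml file: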