Fix streaming observability for tool calling: open observation scope across reactive thread boundaries #6199

Open
tzolov wants to merge 6 commits into spring-projects:main from tzolov:fix-tool-observability-stream
Contributor

@tzolov tzolov commented May 28, 2026

Micrometer observations in the streaming tool-call path were started with .start() but their ThreadLocal scope was never opened, causing tool-call spans to appear as siblings in the trace instead of being correctly nested.

  • DefaultAroundAdvisorChain.nextStream(): open scope in Flux.defer so the OTel ThreadLocal is set during the synchronous subscription chain
  • DefaultChatClient.doGetObservableFluxChatResponse(): same fix so the ToolCallAdvisor observation is correctly nested under the chat-client span
  • ToolCallAdvisor.handleToolCallRecursion(): open scope on the boundedElastic thread before blocking tool execution

Apply the same fix to the deprecated internal tool-execution paths in Anthropic, Bedrock, DeepSeek, GoogleGenAI, MiniMax, MistralAI, Ollama, and OpenAI chat models.
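
The sibling-versus-nested behaviour described above can be sketched with a small JDK-only model. This is an illustration under stated assumptions, not the Micrometer API: `ScopeNestingSketch`, `Span`, `Scope`, and `CURRENT` are hypothetical names, and the only rule modelled is that a newly started span takes the thread's current span as its parent.

```java
// Hypothetical, JDK-only model of observation scoping; this is not the Micrometer API.
final class ScopeNestingSketch {

    // The span that is "current" on the calling thread, or null if there is none.
    static final ThreadLocal<Span> CURRENT = new ThreadLocal<>();

    // A scope whose close() throws no checked exception, so it fits try-with-resources.
    interface Scope extends AutoCloseable {
        @Override
        void close();
    }

    static final class Span {
        final String name;
        final Span parent;

        private Span(String name, Span parent) {
            this.name = name;
            this.parent = parent;
        }

        // Starting a span records whatever is current on this thread as its parent.
        static Span start(String name) {
            return new Span(name, CURRENT.get());
        }

        // Opening a scope makes this span current until the scope is closed.
        Scope openScope() {
            Span previous = CURRENT.get();
            CURRENT.set(this);
            return () -> CURRENT.set(previous);
        }

        String parentName() {
            return parent == null ? "none" : parent.name;
        }
    }

    // Returns the parent that a tool span picks up, with or without the advisor scope open.
    static String parentOfToolSpan(boolean openAdvisorScope) {
        Span chat = Span.start("chat-client");
        try (Scope chatScope = chat.openScope()) {
            Span advisor = Span.start("tool-call-advisor");
            if (!openAdvisorScope) {
                // Advisor was started but never made current: the tool becomes its sibling.
                return Span.start("tool").parentName();
            }
            try (Scope advisorScope = advisor.openScope()) {
                // Advisor is current: the tool nests under it.
                return Span.start("tool").parentName();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(parentOfToolSpan(false)); // chat-client
        System.out.println(parentOfToolSpan(true));  // tool-call-advisor
    }
}
```

In this model, starting the advisor span without opening its scope leaves the chat-client span current, which is why the tool span attaches to the chat-client span rather than to the advisor.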

@tzolov tzolov added this to the 2.0.0-RC1 milestone May 28, 2026
@tzolov tzolov added the bug, tool calling, and observability labels May 28, 2026
return Flux.deferContextual(ctx -> {
    ToolExecutionResult toolExecutionResult;
    Observation parentObs = ctx.getOrDefault(ObservationThreadLocalAccessor.KEY, null);
    Observation.Scope scope = parentObs != null ? parentObs.openScope() : null;
Member

With automatic context propagation, the scope might already be open, so it is probably worth checking before opening it. Otherwise, the observation might become its own parent if there is no protection against re-opening the same scope. It would be worth running a test with automatic context propagation enabled.

Contributor Author

Thanks for the hint. I've added a check for open observation and added tests for it.
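
One possible shape for such a check, sketched with JDK types only. The actual check added in this PR is not shown in the conversation, so everything here is an assumption: `ReopenGuardSketch`, `CURRENT`, and `openScopeIfNotCurrent` are hypothetical names. In Micrometer itself, a comparable check would presumably compare the observation against `ObservationRegistry.getCurrentObservation()`.

```java
// Hypothetical, JDK-only sketch of a "do not re-open an already current scope" guard.
final class ReopenGuardSketch {

    // The observation that is current on the calling thread, or null.
    static final ThreadLocal<Object> CURRENT = new ThreadLocal<>();

    // Opens a scope only if the observation is not already current.
    // The returned Runnable closes the scope (or does nothing if none was opened).
    static Runnable openScopeIfNotCurrent(Object observation) {
        Object previous = CURRENT.get();
        if (previous == observation) {
            // Already current, e.g. restored by automatic context propagation.
            return () -> { };
        }
        CURRENT.set(observation);
        return () -> CURRENT.set(previous);
    }

    // Case 1: something else already made the observation current on this thread.
    static boolean stillCurrentAfterGuardedClose() {
        Object observation = new Object();
        CURRENT.set(observation);
        try {
            Runnable close = openScopeIfNotCurrent(observation);
            close.run();
            // The guard must not disturb the scope it did not open.
            return CURRENT.get() == observation;
        } finally {
            CURRENT.remove();
        }
    }

    // Case 2: nothing is current, so the guard opens and later restores.
    static boolean restoredAfterFreshOpen() {
        Object observation = new Object();
        Runnable close = openScopeIfNotCurrent(observation);
        boolean currentWhileOpen = CURRENT.get() == observation;
        close.run();
        return currentWhileOpen && CURRENT.get() == null;
    }

    public static void main(String[] args) {
        System.out.println(stillCurrentAfterGuardedClose());
        System.out.println(restoredAfterFreshOpen());
    }
}
```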

@tzolov tzolov self-assigned this May 29, 2026
@tzolov tzolov force-pushed the fix-tool-observability-stream branch from 05f605c to 9ebf374 Compare May 29, 2026 08:06
return this.advisorChain.nextStream(chatClientRequest)
    .doOnError(observation::error)
    .doFinally(s -> {
        scope.close();
Member

The scope.close() will most probably happen on a different thread than openScope(), leaving the original scope open on the thread that opened it.
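
The concern can be reproduced in miniature with plain ThreadLocal state. The sketch below is hypothetical and JDK-only: two single-thread executors stand in for the subscribing thread and the thread that runs doFinally, and `CrossThreadCloseSketch` and `CURRENT` are illustrative names rather than anything from Reactor or Micrometer.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical, JDK-only illustration of opening a scope on one thread and closing it on another.
final class CrossThreadCloseSketch {

    // Name of the observation that is current on the calling thread, or null.
    static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // Returns what the opening thread still sees after the "close" ran elsewhere.
    static String currentOnOpenerAfterForeignClose() {
        ExecutorService opener = Executors.newSingleThreadExecutor();
        ExecutorService closer = Executors.newSingleThreadExecutor();
        try {
            // Stand-in for openScope() at subscription time.
            Runnable open = () -> CURRENT.set("chat-client");
            // Stand-in for scope.close() inside doFinally on another thread:
            // it only touches that other thread's ThreadLocal slot.
            Runnable close = () -> CURRENT.remove();
            // Reads the opener thread's slot afterwards.
            Callable<String> read = () -> String.valueOf(CURRENT.get());

            opener.submit(open).get();
            closer.submit(close).get();
            return opener.submit(read).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            opener.shutdown();
            closer.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(currentOnOpenerAfterForeignClose()); // chat-client
    }
}
```

Because each thread has its own ThreadLocal slot, the close on the second thread never clears the value set on the first, so later work scheduled on the opener thread would still see a stale current observation.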

Comment on lines +605 to +608
// Open observation scope at subscription time, ensuring
// advisor observations created during the synchronous subscription chain
// find this observation in ThreadLocal and attach to it as parent.
Observation.Scope scope = observation.openScope();
Member

This code should be removed. The parent-child relationship is established above when the observation is created. The code here is not synchronous in nature and introduces a reactive dispatch which can change threads. Since the observation lands in the reactive context you should be fine to take it from the context when you call the synchronous tools.
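
A rough sketch of that suggestion, assuming the observation travels in a context map and the scope is opened and closed only around the blocking tool call, on the thread that runs it. The names `ScopedToolCallSketch`, `KEY`, `CURRENT`, and `callToolInScope` are hypothetical, and a plain `Map` stands in for the reactive context.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical, JDK-only sketch: scope the thread-local only around the synchronous tool call.
final class ScopedToolCallSketch {

    // Name of the observation that is current on the calling thread, or null.
    static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // Hypothetical context key under which the parent observation is stored.
    static final String KEY = "observation";

    // Makes the context's observation current, runs the tool, then restores the
    // previous value. Open and close happen on the same thread by construction.
    static <T> T callToolInScope(Map<String, String> context, Supplier<T> tool) {
        String parent = context.get(KEY);
        String previous = CURRENT.get();
        if (parent != null) {
            CURRENT.set(parent);
        }
        try {
            return tool.get();
        } finally {
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        }
    }

    // Runs a tool on a worker thread and reports what was current during and after the call.
    static String demo() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Map<String, String> context = Map.of(KEY, "tool-call-advisor");
            Callable<String> during =
                    () -> callToolInScope(context, () -> String.valueOf(CURRENT.get()));
            Callable<String> after = () -> String.valueOf(CURRENT.get());
            return worker.submit(during).get() + "|" + worker.submit(after).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // tool-call-advisor|null
    }
}
```

Keeping the scope inside a try/finally around the synchronous call avoids relying on where the surrounding reactive operators happen to run.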

Comment on lines +144 to +156
- Flux<ChatClientResponse> chatClientResponse = Flux.defer(() -> advisor.adviseStream(chatClientRequest, this)
+ Flux<ChatClientResponse> chatClientResponse = Flux.defer(() -> {
+     // Open the scope so child observations created during the synchronous
+     // subscription chain find this observation in ThreadLocal and parent correctly.
+     Observation.Scope scope = observation.openScope();
+     return advisor.adviseStream(chatClientRequest, this)
          .doOnError(observation::error)
-         .doFinally(s -> observation.stop())
-         .contextWrite(ctx -> ctx.put(ObservationThreadLocalAccessor.KEY, observation)));
+         .doFinally(s -> {
+             scope.close();
+             observation.stop();
+         })
+         .contextWrite(ctx -> ctx.put(ObservationThreadLocalAccessor.KEY, observation));
+ });
Member

This change can be reverted: the code block introduces a reactive chain that can hop threads between subscription and doFinally. The parent-child relationship is already established, and the only thing missing is for ToolCallAdvisor to populate the thread-local scope when dispatching to synchronous tools.

tzolov added 5 commits May 29, 2026 13:03
…across reactive thread boundaries

Micrometer observations in the streaming tool-call path were started with
.start() but their ThreadLocal scope was never opened, causing tool-call
spans to appear as siblings in the trace instead of being correctly nested.

- DefaultAroundAdvisorChain.nextStream(): open scope in Flux.defer so the OTel
  ThreadLocal is set during the synchronous subscription chain
- DefaultChatClient.doGetObservableFluxChatResponse(): same fix so the
  ToolCallAdvisor observation is correctly nested under the chat-client span
- ToolCallAdvisor.handleToolCallRecursion(): open scope on the boundedElastic
  thread before blocking tool execution

Apply the same fix to the deprecated internal tool-execution paths in
Anthropic, Bedrock, DeepSeek, GoogleGenAI, MiniMax, MistralAI, Ollama, and
OpenAI chat models.

Signed-off-by: Christian Tzolov <christian.tzolov@broadcom.com>
Signed-off-by: Christian Tzolov <christian.tzolov@broadcom.com>
…maticContextPropagation() is active

Signed-off-by: Christian Tzolov <christian.tzolov@broadcom.com>
Signed-off-by: Christian Tzolov <christian.tzolov@broadcom.com>
Signed-off-by: Christian Tzolov <christian.tzolov@broadcom.com>
@tzolov tzolov force-pushed the fix-tool-observability-stream branch from 9ebf374 to 928905a Compare May 29, 2026 11:13
…o-propagation

Signed-off-by: Christian Tzolov <christian.tzolov@broadcom.com>
2 participants