feat: social publishing + NuGet #r + move perf + mesh stability batch #95

Open
rbuergi wants to merge 764 commits into main from bug_fix

rbuergi commented Apr 22, 2026

Summary

77 commits of long-running work on bug_fix — grouped by theme:

  • Social publishing platform (new): MeshWeaver.Social + LinkedIn publisher + scheduled publishing pipeline (engine/queue/stats), LinkedIn OAuth connect + past-post ingest in Memex portal, per-user linked-account menu items.
  • NuGet in-process compile: #r "nuget:Pkg, Version" at the top of _Source/*.cs resolves via public NuGet.Protocol without an SDK on the container. Same resolver serves interactive markdown code cells.
  • Move-node parallelization + 30 s ceiling: FileSystemPersistenceService.MoveNodeAsync runs per-descendant WriteAsync/DeleteAsync through Task.WhenAll; new MeshOperationOptions (default Timeout = 30s) + WithMeshOperationTimeout(TimeSpan) override; HandleMoveNodeRequest chains .Timeout() on the persistence Observable so a stuck adapter can't hang the caller. Prod repro: DAV2026 subtree move that took 240 s and killed the MCP session; now bounded.
  • Compile / cache invalidation — sticky invalidation on CompilationCacheService, _Source/ edit re-invalidates owning NodeType, cross-silo broadcast via MeshChangeFeed, grain-dispose on node delete, live "Compiling … (Ns)" progress in LayoutAreaView.
  • Catalog & navigation — Children view groups by Category (falls back to NodeType), reactive Children catalog, self-as-default create location for non-NodeType nodes, sample orgs → Markdown for search visibility.
  • Workspace / stream robustness — Workspace remote-stream cache evicted on MeshChangeFeed events, resubscribe on owner dispose, DeleteLayoutArea emits a placeholder immediately and times out slow streams.
  • Infra & small fixes — settings.json overhaul, Delete-is-recursive MCP docs, HeartBeat silencing on Memex hubs, assembly-dir temp-dir fallback, IAsyncEnumerable aggregator fixes (satellite-safe GatherInputsAsync), xunit methodTimeout 30 s → 60 s, Anthropic Opus bump, icon generator, etc.
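The move-node item above combines two ideas: fan the per-descendant work out through Task.WhenAll, and put a ceiling on the whole operation. The following is only a minimal sketch of that shape, assuming an in-memory dictionary in place of the real storage adapter. MoveDemo, MoveSubtreeAsync and MoveOneAsync are hypothetical names, and the sketch bounds the operation with Task.WaitAsync where the PR describes an Rx .Timeout() on the persistence Observable.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical in-memory stand-in for a storage adapter (not the MeshWeaver API).
var store = new ConcurrentDictionary<string, string>
{
    ["A/x"] = "root", ["A/x/y"] = "child 1", ["A/x/z"] = "child 2"
};

await MoveDemo.MoveSubtreeAsync(store, "A/x", "B/x", TimeSpan.FromSeconds(30));
Console.WriteLine(string.Join(", ", store.Keys.OrderBy(k => k, StringComparer.Ordinal)));

static class MoveDemo
{
    public static async Task MoveSubtreeAsync(
        ConcurrentDictionary<string, string> store,
        string source, string target, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource();

        // The node itself plus every descendant below "source/".
        var paths = store.Keys
            .Where(k => k == source || k.StartsWith(source + "/", StringComparison.Ordinal))
            .ToList();

        // One task per node; they run concurrently instead of one after another.
        var moves = paths
            .Select(p => MoveOneAsync(store, p, target + p[source.Length..], cts.Token))
            .ToList();

        try
        {
            // WaitAsync throws TimeoutException if the batch exceeds the ceiling.
            await Task.WhenAll(moves).WaitAsync(timeout);
        }
        catch (TimeoutException)
        {
            cts.Cancel(); // ask stragglers to stop, then surface the timeout
            throw;
        }
    }

    static async Task MoveOneAsync(
        ConcurrentDictionary<string, string> store,
        string from, string to, CancellationToken ct)
    {
        await Task.Yield(); // stands in for real write/delete I/O
        ct.ThrowIfCancellationRequested();
        if (store.TryRemove(from, out var value))
            store[to] = value;
    }
}
```

The caller sees either a completed move or a TimeoutException within the configured ceiling, which is the property the PR summary is after: a stuck adapter cannot hang the caller.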

New test suites (selected)

  • test/MeshWeaver.Persistence.Test/MoveNodeRecursiveTest.cs — 10 tests: recursion, parallelism, source missing / target exists / storage throws / cancellation (all must not hang), Rx Timeout() contract, default-30s config.
  • test/MeshWeaver.Social.Test/*: InMemoryPublishQueueTest, LinkedInPublisherEngagementTest, PostStatsRefresherTest, ScheduledPostPublisherTest, FakePublisher.
  • test/MeshWeaver.Persistence.Test/WorkspaceCacheEvictionTest.cs, ResubscribeOnOwnerDisposeTest.cs, DeleteLayoutAreaIntegrationTest.cs.
  • test/MeshWeaver.Markdown.Test/PathUtilsTest.cs, test/MeshWeaver.MathDemo.Test/MatrixViewsTest.cs.

Contributors

Upstream already merged into this branch

Test plan

  • dotnet build succeeds
  • dotnet test test/MeshWeaver.Persistence.Test --filter MoveNodeRecursiveTest — 10/10 green (~8 s)
  • dotnet test test/MeshWeaver.Hosting.Monolith.Test --filter MoveNodeAsync — 5/5 green (regression guard)
  • dotnet test test/MeshWeaver.Social.Test — publish queue / scheduling / stats green
  • Manual prod smoke: move a 3-descendant subtree in memex-prod; confirms < 30 s and MCP session survives
  • Create a _Source/*.cs using #r "nuget:MathNet.Numerics, 5.0.0" — compiles & renders (cold + warm cache)
  • Delete a node then recreate at same path — fresh grain, fresh compile, no stale HubConfiguration
  • Navigate to a cold node — "Compiling (Ns)…" progress renders until the stream resolves
  • LinkedIn OAuth: sign in → /social/connect/linkedin → profile linked; menu shows connected account
  • Scheduled post fires through ScheduledPostPublisher → LinkedIn publisher posts; PostStatsRefresher pulls stats

🤖 Generated with Claude Code

github-actions Bot commented Apr 22, 2026

Test Results

3 038 tests  +56   2 968 ✅ ± 0   17m 42s ⏱️ + 10m 30s
   37 suites + 1       6 💤  -  7 
   38 files   + 2      64 ❌ +63 
    1 errors

For more details on these parsing errors and failures, see this check.

Results for commit f24b5b3. ± Comparison against base commit bea0a2e.

This pull request removes 643 and adds 699 tests. Note that renamed tests count towards both.
MeshWeaver.AI.Test.AgentChatClientTest ‑ AgentChatClient_InitializeAsync_FindsAgentsFromPathHierarchy
MeshWeaver.AI.Test.AgentChatClientTest ‑ AgentChatClient_InitializeAsync_FindsTodoAgentFromNodeTypeNamespace
MeshWeaver.AI.Test.AgentChatClientUnitTest ‑ FindCyclicDelegations_Chain_ReturnsEmpty
MeshWeaver.AI.Test.AgentChatClientUnitTest ‑ FindCyclicDelegations_DelegationsWithNoMatchingTarget_ReturnsEmpty
MeshWeaver.AI.Test.AgentChatClientUnitTest ‑ FindCyclicDelegations_EmptyInput_ReturnsEmpty
MeshWeaver.AI.Test.AgentChatClientUnitTest ‑ FindCyclicDelegations_MixedCyclicAndNonCyclic_ReturnsOnlyCyclic
MeshWeaver.AI.Test.AgentChatClientUnitTest ‑ FindCyclicDelegations_MutualDelegation_ReturnsBoth
MeshWeaver.AI.Test.AgentChatClientUnitTest ‑ FindCyclicDelegations_NoDelegations_ReturnsEmpty
MeshWeaver.AI.Test.AgentChatClientUnitTest ‑ FindCyclicDelegations_OneWayDelegation_ReturnsEmpty
MeshWeaver.AI.Test.AgentChatClientUnitTest ‑ OrderAgentsForCreation_AllNonDelegating_ReturnsAll
…
MeshWeaver.Auth.Test.ApiTokenAuthenticationHandlerTests ‑ BuildClaims_AlwaysIncludesIdentityClaims
MeshWeaver.Auth.Test.ApiTokenAuthenticationHandlerTests ‑ BuildClaims_DropsEmptyRoleEntries
MeshWeaver.Auth.Test.ApiTokenAuthenticationHandlerTests ‑ BuildClaims_TokenWithRoles_StampsEachRoleAsRoleClaim
MeshWeaver.Auth.Test.ApiTokenAuthenticationHandlerTests ‑ BuildClaims_TokenWithoutRoles_HasNoRoleClaims
MeshWeaver.Auth.Test.ApiTokenServiceStaleReadTest ‑ DeleteToken_ImmediatelyAfterCreate_RemovesTheNewToken
MeshWeaver.Auth.Test.ApiTokenServiceStaleReadTest ‑ RevokeToken_AfterImmediateValidate_BlocksFutureValidation
MeshWeaver.Auth.Test.ApiTokenServiceStaleReadTest ‑ RevokeToken_ImmediatelyAfterCreate_SeesTheNewToken
MeshWeaver.Auth.Test.ApiTokenServiceStaleReadTest ‑ RevokeToken_RepeatedCreateRevoke_AlwaysSeesEachToken
MeshWeaver.Auth.Test.ApiTokenServiceTests ‑ DeleteToken_AfterDelete_ValidateReturnsNull
MeshWeaver.Auth.Test.ApiTokenServiceTests ‑ DeleteToken_AlsoRemovesIndexEntry
…

♻️ This comment has been updated with latest results.

Copilot AI left a comment

Pull request overview

This PR bundles several long-running feature and stability tracks across MeshWeaver core + Memex: social publishing foundations, in-process #r "nuget:..." compilation support (node-type + interactive markdown), move-operation performance/timeout hardening, and multiple UI/stream reliability improvements. It also standardizes the code folder naming from _Source/_Test to Source/Test across code, tests, docs, and samples.

Changes:

  • Introduces MeshWeaver.Social (options, DI wiring, publish queue, credential model) plus initial Memex wiring (LinkedIn connect entry points + user menu hooks).
  • Adds MeshWeaver.NuGet resolver + directive parser and integrates it into script compilation (#r "nuget:Pkg, Version"), including cache backends and tests.
  • Improves operational robustness: parallelized recursive moves, default 30s mesh-op timeout, “no endless spinner” navigation status UI, and remote stream resubscribe behavior.
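To make the #r "nuget:Pkg, Version" directive concrete, here is a rough sketch of extracting such directives from a source file and blanking them out before compilation. It is not the NuGetDirectiveParser from this PR: DirectiveSketch, PackageRef and the regular expression are assumptions for illustration, and the version is kept as a plain string rather than a NuGet version range.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var source = "#r \"nuget:MathNet.Numerics, 5.0.0\"\nusing System;\nConsole.WriteLine(1);";
var (refs, stripped) = DirectiveSketch.Parse(source);
foreach (var r in refs)
    Console.WriteLine($"{r.Id} {r.Version ?? "(latest)"}");
Console.WriteLine(stripped);

// Hypothetical model: package id plus an optional version string.
record PackageRef(string Id, string? Version);

static class DirectiveSketch
{
    // Matches one whole line: #r "nuget:Id" or #r "nuget:Id, Version"
    static readonly Regex Directive = new(
        @"^[ \t]*#r[ \t]+""nuget:[ \t]*(?<id>[^,""\s]+)[ \t]*(?:,[ \t]*(?<ver>[^""]*?)[ \t]*)?""[ \t]*\r?$",
        RegexOptions.Multiline);

    public static (IReadOnlyList<PackageRef> Refs, string Stripped) Parse(string source)
    {
        var refs = new List<PackageRef>();
        var stripped = Directive.Replace(source, m =>
        {
            var ver = m.Groups["ver"];
            refs.Add(new PackageRef(
                m.Groups["id"].Value,
                ver.Success && ver.Value.Length > 0 ? ver.Value : null));
            // Blank the line but keep its line break, so line numbers in
            // later compiler diagnostics still match the original file.
            return string.Empty;
        });
        return (refs, stripped);
    }
}
```

Collecting the references and returning the stripped source in one pass means the compiler never sees the directive, while the resolver gets the list of packages to restore.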

Reviewed changes

Copilot reviewed 159 out of 265 changed files in this pull request and generated 2 comments.

File Description
test/MeshWeaver.StorageImport.Test/StorageImporterTests.cs Updates test expectations/docs to Source/ naming.
test/MeshWeaver.Social.Test/PostStatsRefresherTest.cs Adds stats refresher test coverage (needs deterministic timeout handling).
test/MeshWeaver.Social.Test/MeshWeaver.Social.Test.csproj Adds new Social test project referencing Social + Fixture.
test/MeshWeaver.Social.Test/InMemoryPublishQueueTest.cs Adds unit tests for publish queue due-drain + dedup.
test/MeshWeaver.Persistence.Test/FileSystemPersistenceTest.cs Updates partition tests to Source/ naming.
test/MeshWeaver.MathDemo.Test/TestPaths.cs Adds helper paths for MathDemo sample test assets.
test/MeshWeaver.MathDemo.Test/MeshWeaver.MathDemo.Test.csproj Adds MathDemo test project and copies sample graph data to output.
test/MeshWeaver.Hosting.PostgreSql.Test/SatelliteQueryTests.cs Updates code-path routing tests to Source/ naming.
test/MeshWeaver.Hosting.Monolith.Test/UserActivityAreaTest.cs Updates regression test docs to Source/ naming.
test/MeshWeaver.Hosting.Blazor.Test/NavigationServiceTest.cs Adjusts test to assert “no 404 flash” during retries.
test/MeshWeaver.Graph.Test/NuGetDirectiveParserTest.cs Adds unit tests for parsing/stripping #r "nuget:...".
test/MeshWeaver.Graph.Test/NuGetAssemblyResolverTest.cs Adds networked NuGet restore end-to-end tests (skippable via env var).
test/MeshWeaver.Graph.Test/MeshWeaver.Graph.Test.csproj References new MeshWeaver.NuGet project.
test/MeshWeaver.FutuRe.Test/MeshWeaver.FutuRe.Test.csproj Updates compile-included sample sources to Source/ paths.
test/MeshWeaver.Content.Test/CompilationErrorTest.cs Updates broken-code test to Source/ path.
test/MeshWeaver.AI.Test/MeshPluginTest.cs Updates MCP tool count expectations (adds RunTests/Move/Copy).
src/MeshWeaver.Social/SocialOptions.cs Adds configurable knobs for publishing/stats/ingest scheduling.
src/MeshWeaver.Social/SocialExtensions.cs Adds DI wiring for social publishing subsystem and hosted services.
src/MeshWeaver.Social/PlatformCredential.cs Adds credential record model (access/refresh/expiry metadata).
src/MeshWeaver.Social/MeshWeaver.Social.csproj Introduces Social library project.
src/MeshWeaver.Social/IPublishQueue.cs Adds publish queue abstraction + in-memory implementation.
src/MeshWeaver.Social/IApprovalPublishBridge.cs Defines bridge contract and PublishableSnapshot model.
src/MeshWeaver.NuGet/ResolvedPackageSet.cs Adds resolver output model (assemblies, probing dirs, versions).
src/MeshWeaver.NuGet/NuGetServiceCollectionExtensions.cs Adds DI extension to register resolver + cache.
src/MeshWeaver.NuGet/NuGetPackageReference.cs Adds package reference model (id + version range).
src/MeshWeaver.NuGet/NuGetDirectiveParser.cs Implements #r "nuget:..." extraction + source stripping.
src/MeshWeaver.NuGet/MeshWeaver.NuGet.csproj Introduces NuGet resolver project and dependencies.
src/MeshWeaver.NuGet/INuGetPackageCache.cs Adds optional persistent cache interface + null implementation.
src/MeshWeaver.NuGet/INuGetAssemblyResolver.cs Adds resolver interface returning ResolvedPackageSet.
src/MeshWeaver.NuGet.AzureBlob/MeshWeaver.NuGet.AzureBlob.csproj Adds Azure Blob cache backend project.
src/MeshWeaver.NuGet.AzureBlob/BlobNuGetPackageCacheExtensions.cs Adds DI helper to register blob-backed cache.
src/MeshWeaver.Mesh.Contract/Services/MeshOperationOptions.cs Adds mesh operation timeout options (default 30s).
src/MeshWeaver.Mesh.Contract/Services/IStorageAdapter.cs Updates docs/examples to Source/ naming.
src/MeshWeaver.Mesh.Contract/Services/INavigationService.cs Adds Status observable contract for UI progress reporting.
src/MeshWeaver.Mesh.Contract/Services/IIconGenerator.cs Adds icon generator abstraction returning an observable SVG.
src/MeshWeaver.Mesh.Contract/PartitionDefinition.cs Updates standard table mappings (Source/Test → code) and clarifies semantics.
src/MeshWeaver.Mesh.Contract/MeshExtensions.cs Adds timeout override + move timeout enforcement + grain dispose on delete.
src/MeshWeaver.Mesh.Contract/CodeConfiguration.cs Updates docs to Source/ naming.
src/MeshWeaver.Kernel.Hub/MeshWeaver.Kernel.Hub.csproj Removes Interactive package mgmt dependency; references MeshWeaver.NuGet.
src/MeshWeaver.Hosting/Persistence/MigrationUtility.cs Updates migration heuristics to include Source/Test + legacy _Source/_Test.
src/MeshWeaver.Hosting/Persistence/FileSystemStorageAdapter.cs Treats Source/Test as code paths + keeps legacy compatibility.
src/MeshWeaver.Hosting/Persistence/FileSystemPersistenceService.cs Parallelizes descendant move I/O (with concurrency implications).
src/MeshWeaver.Hosting/Persistence/CachingStorageAdapter.cs Updates code sub-namespace detection (Source/Test + legacy).
src/MeshWeaver.Hosting.PostgreSql/PostgreSqlPartitionedStoreFactory.cs Guards against source/test mistakenly becoming schemas.
src/MeshWeaver.Hosting.PostgreSql/PostgreSqlCrossSchemaQueryProvider.cs Filters malformed parameters to avoid NRE during SQL interpolation.
src/MeshWeaver.Hosting.Blazor/MeshWeaver.Hosting.Blazor.csproj Adds NU1510 suppression.
src/MeshWeaver.Graph/PartitionTypeSource.cs Updates docs to Source/ naming.
src/MeshWeaver.Graph/MeshWeaver.Graph.csproj References MeshWeaver.NuGet.
src/MeshWeaver.Graph/MeshNodeLayoutAreas.cs Improves create href behavior + reactive/grouped children catalog.
src/MeshWeaver.Graph/MeshDataSource.cs Updates docs to Source/ naming.
src/MeshWeaver.Graph/Configuration/ScriptCompilationService.cs Integrates NuGet directive parsing + resolver into compilation.
src/MeshWeaver.Graph/Configuration/NodeTypeDefinition.cs Updates docs/examples to Source/ naming.
src/MeshWeaver.Graph/Configuration/MeshDataSourceNodeType.cs Changes sources namespace constant to Source.
src/MeshWeaver.Graph/Configuration/GraphConfigurationExtensions.cs Registers NuGet resolver and uses Source code path.
src/MeshWeaver.Graph/Configuration/CodeNodeType.cs Treats Code nodes as primary content; defines Source/Test constants.
src/MeshWeaver.Documentation/Data/DataMesh/UnifiedPath.md Documents @/ semantics and HTML-href pitfalls.
src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Profile/Source/SocialMediaProfileLayoutAreas.cs Adds SocialMedia profile layout areas example.
src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Profile/Source/SocialMediaProfile.cs Adds SocialMedia profile content model example.
src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Post/Source/SocialMediaPost.cs Adds SocialMedia post content model example.
src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Post/Source/Platform.cs Adds SocialMedia platform reference-data example.
src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia.md Updates docs to Source/ naming and authoring guidance.
src/MeshWeaver.Documentation/Data/DataMesh/SatelliteEntities.md Clarifies Source/Test are primary content, not satellites.
src/MeshWeaver.Documentation/Data/DataMesh/NodeTypes.md Adds Node Types documentation index page.
src/MeshWeaver.Documentation/Data/DataMesh/NodeTypeConfiguration.md Updates docs to Source/ naming.
src/MeshWeaver.Documentation/Data/DataMesh/NodeOperations.md Updates docs to Source/ naming.
src/MeshWeaver.Documentation/Data/DataMesh/DataConfiguration.md Updates docs to Source/ naming.
src/MeshWeaver.Documentation/Data/DataMesh/CreatingNodeTypes.md Updates docs to Source/Test naming throughout.
src/MeshWeaver.Documentation/Data/DataMesh.md Updates TOC links and adds NuGet packages bullet.
src/MeshWeaver.Documentation/Data/Architecture/PartitionedPersistence.md Updates persistence routing docs for Source/Test.
src/MeshWeaver.Documentation/Data/Architecture/MeshGraph.md Updates examples to Source/ naming.
src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionSampleData.cs Adds cession sample dataset for docs/demo.
src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionResultsArea.cs Adds reactive charting layout area example.
src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionEngine.cs Adds pure business logic sample for cession calculations.
src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionData.cs Adds content models for cession example.
src/MeshWeaver.Data/Serialization/SyncStreamOptions.cs Adds configurable heartbeat interval for sync streams.
src/MeshWeaver.Data/Serialization/JsonSynchronizationStream.cs Implements resubscribe-on-owner-dispose logic.
src/MeshWeaver.Blazor/Pages/ApplicationPage.razor Switches to NavigationStatus-driven progress/not-found/error UI.
src/MeshWeaver.Blazor/Components/NavigationProgressBar.razor.css Adds styling for full-page vs compact overlay progress bar.
src/MeshWeaver.Blazor/Components/NavigationProgressBar.razor Adds reusable “spinner + message” component.
src/MeshWeaver.Blazor/Components/MeshSearchView.razor.cs Adds Category grouping fallback to NodeType.
src/MeshWeaver.Blazor/Components/LayoutAreaView.razor.cs Adds stream lifecycle logging and additional diagnostics.
src/MeshWeaver.Blazor/Components/LayoutAreaView.razor Surfaces compilation progress indicator before first stream emission.
src/MeshWeaver.Blazor/Components/CompileProgressIndicator.razor.css Adds styling for compilation progress banner.
src/MeshWeaver.Blazor/Components/CompileProgressIndicator.razor Adds polling UI component for active NodeType compilation.
src/MeshWeaver.Blazor.Portal/MeshWeaver.Blazor.Portal.csproj Adds NU1510 suppression.
src/MeshWeaver.Blazor.AI/MeshWeaver.Blazor.AI.csproj Adds NU1510 suppression.
src/MeshWeaver.Blazor.AI/McpMeshPlugin.cs Adds Patch/Move/Copy MCP tools and improves tool descriptions.
src/MeshWeaver.AI/ThreadLayoutAreas.cs Adds debug logging around streaming view emission.
src/MeshWeaver.AI/IconGenerator.cs Adds default AI-backed IIconGenerator implementation.
src/MeshWeaver.AI/DelegationCompletedEvent.cs Removes delegation tracker/event types.
src/MeshWeaver.AI/Data/Agent/Worker.md Updates @/ link guidance (no raw HTML href with @/).
src/MeshWeaver.AI/Data/Agent/ToolsReference.md Updates @/ link guidance and provides correct/incorrect table.
src/MeshWeaver.AI/Data/Agent/Orchestrator.md Updates @/ link guidance for agent outputs.
src/MeshWeaver.AI/AIExtensions.cs Removes old type registration; registers IIconGenerator.
memex/aspire/Memex.Portal.Distributed/Program.cs Registers blob-backed NuGet package cache in distributed deployment.
memex/aspire/Memex.Portal.Distributed/Memex.Portal.Distributed.csproj References MeshWeaver.NuGet.AzureBlob.
memex/aspire/Memex.Database.Migration/Program.cs Adds source/test to reserved schema list.
memex/aspire/Memex.AppHost/Program.cs Adds LinkedIn secret/env wiring + sets NUGET_PACKAGES cache dir.
memex/Memex.Portal.Shared/Social/SocialMediaUserMenuProvider.cs Adds “Social Media” shortcut on a user’s own node (lazy hub creation).
memex/Memex.Portal.Shared/Social/ApiCredentialNodeType.cs Adds NodeType for PlatformCredential stored under _ApiCredentials.
memex/Memex.Portal.Shared/Pages/Login.razor Adds “Connect LinkedIn for publishing” CTA on login page.
memex/Memex.Portal.Shared/OrganizationNodeType.cs Switches to default layout areas registration.
memex/Memex.Portal.Shared/MemexConfiguration.cs Adds LinkedIn publisher wiring, @/ redirect middleware, and routes.
memex/Memex.Portal.Shared/Memex.Portal.Shared.csproj References MeshWeaver.Social.
memex/Memex.Portal.Monolith/appsettings.Development.json Enables debug logging for LayoutAreaView.
MeshWeaver.slnx Adds new projects (NuGet, NuGet.AzureBlob, Social, new test projects).
Directory.Packages.props Adds NuGet.* package versions for resolver implementation.
CLAUDE.md Documents @/ local-only rule and href/URL restrictions.
(Various) samples/Graph/... Adds/updates many sample NodeTypes and content under Source/ to reflect new conventions and demos.

Comment thread test/MeshWeaver.Social.Test/PostStatsRefresherTest.cs
Comment thread src/MeshWeaver.Hosting/Persistence/FileSystemPersistenceService.cs Outdated
rbuergi added a commit that referenced this pull request Apr 22, 2026
…+ test helpers

Recursive DeleteNodeRequest handled on a node's own hub was deadlocking: the final
DeleteSelfFromStorage posted Ok and DisposeRequest from the dying hub, so the Ok
raced callback disposal on the caller and was lost. Introduce CommitNodeDeletionMessage
and forward the terminal commit (storage delete + reply + grain dispose) to the
resolved mesh hub (walking ParentHub upward) — Sender becomes the stable mesh hub,
FIFO on the caller's inbound queue guarantees Ok resolves the RegisterCallback before
DisposeRequest arrives.

Also addresses two Copilot review comments on PR #95:

- FileSystemStorageAdapter.DeleteAsync empty-directory ascent is now concurrency-
  tolerant: wraps the enumerate + Directory.Delete in try/catch, swallowing the
  DirectoryNotFoundException race and breaking on IOException (non-empty / in-use).
  Required because FileSystemPersistenceService.MoveNodeAsync now parallelizes
  descendant deletes via Task.WhenAll.

- PostStatsRefresherTest.WaitUntilAsync throws TimeoutException with a descriptive
  message instead of returning silently on deadline, so the test cannot green-tick
  a stats-refresh that never happened.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
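The two review fixes in this commit message can be sketched as follows. This is an illustration under assumptions, not the code from FileSystemStorageAdapter or PostStatsRefresherTest: ReviewFixSketch, DeleteEmptyAncestors and the WaitUntilAsync signature shown here are hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

var root = Path.Combine(Path.GetTempPath(), "sketch-" + Guid.NewGuid().ToString("N"));
var leaf = Path.Combine(root, "a", "b", "c");
Directory.CreateDirectory(leaf);

ReviewFixSketch.DeleteEmptyAncestors(leaf, root);
Console.WriteLine(Directory.Exists(Path.Combine(root, "a"))); // a, b and c were all empty

try
{
    await ReviewFixSketch.WaitUntilAsync(() => false, TimeSpan.FromMilliseconds(50), "stats refresh");
}
catch (TimeoutException ex)
{
    Console.WriteLine(ex.Message);
}
Directory.Delete(root, recursive: true);

static class ReviewFixSketch
{
    // Climbs from 'start' towards 'root', removing directories that are empty.
    // Safe to call from several parallel deletes that share ancestors.
    public static void DeleteEmptyAncestors(string start, string root)
    {
        var stop = Path.TrimEndingDirectorySeparator(Path.GetFullPath(root));
        var dir = Path.TrimEndingDirectorySeparator(Path.GetFullPath(start));
        while (dir.Length > stop.Length && dir.StartsWith(stop, StringComparison.Ordinal))
        {
            try
            {
                if (Directory.EnumerateFileSystemEntries(dir).Any())
                    break;                    // something still lives here
                Directory.Delete(dir);
            }
            catch (DirectoryNotFoundException)
            {
                // Another task removed it first; nothing to do, keep climbing.
            }
            catch (IOException)
            {
                break;                        // became non-empty or is in use
            }
            dir = Path.GetDirectoryName(dir) ?? stop;
        }
    }

    // Polls 'condition' and fails loudly instead of returning on deadline.
    public static async Task WaitUntilAsync(Func<bool> condition, TimeSpan timeout, string what)
    {
        var deadline = DateTime.UtcNow + timeout;
        while (!condition())
        {
            if (DateTime.UtcNow >= deadline)
                throw new TimeoutException($"Timed out after {timeout} waiting for: {what}");
            await Task.Delay(10);
        }
    }
}
```

DirectoryNotFoundException derives from IOException, so it has to be caught first. Throwing from WaitUntilAsync is what prevents a test from passing when the awaited condition never became true.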

rbuergi commented Apr 27, 2026

@copilot resolve the merge conflicts in this pull request

Copilot AI commented Apr 27, 2026

@copilot resolve the merge conflicts in this pull request

Resolved. The merge with origin/main is done in commit 6899990.

Conflicts resolved:

  • .claude/settings.json — kept HEAD deletion
  • Login.razor — kept NavigateToConnectLinkedIn (HEAD) + switched to OnInitializedAsync (origin/main)
  • AppHost/Program.cs — merged origin/main's optional-secrets pattern (nullable embeddingKey/googleClientSecret, customDomainValue guard) with HEAD's LinkedIn params (also made optional/nullable) and local-test/local-prod storage branch
  • Memex.Database.Migration/Program.cs — kept HEAD's data-repair v8 (fix ThreadMessage.MainNode) and v9 (rename _Source/_Test path segments)
  • SecurityService.cs — kept HEAD's refactored CollectStaticRoleIds returning (roleIds, cap); origin/main's permission-evaluation logic is already present in the new reactive GetEffectivePermissions method

rbuergi and others added 21 commits May 5, 2026 11:53
Documents how AI provider keys, endpoints, and models are wired in
MeshWeaver after the recent refactor — single shared azure-foundry-key,
parameterised endpoints, agent-driven model selection, and zero hardcoded
model lists. Future readers wondering "where did the model dropdown go" or
"does Anthropic have its own key" can find the answer here. Cross-links
back to AgenticAI.md and McpAuthentication.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds Supports(string modelName) to IChatClientFactory with a default
implementation that honours the legacy Models[] list. Concrete factories
override with shape-aware predicates so routing works without populating
Models[]:

- AzureClaude: matches "claude*" (case-insensitive). Covers all Claude
  variants (sonnet, opus, haiku) without enumerating them.
- AzureFoundry: catch-all for everything that isn't Claude (gpt-*, o*,
  Mistral-*, DeepSeek-*, etc.) since the multi-model gateway serves them
  all through one endpoint.

GetFactoryForModel in AgentChatClient now asks each factory via Supports,
ordered by Order. Previously it called Models.Contains — which returned
nothing once env-var-populated Models[] arrays went away — and silently
fell through to the first registered factory regardless of model name.

Net effect: an agent with PreferredModel = "claude-opus-4-7" reliably
routes to AzureClaude even when no factory has Models[] populated; an
agent with PreferredModel = "gpt-5-mini" lands at AzureFoundry's /models
endpoint. Closes the open routing gap from the prior commit
(86c4c95 fix(ai): agent PreferredModel wins; aggregate factory models;
parameterize endpoints).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
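The routing described in this commit message can be illustrated with a C# default interface method. The sketch below is not the IChatClientFactory from the repository: IFactorySketch, ClaudeSketch, FoundrySketch, Router and the Order values are hypothetical stand-ins that only mirror the behaviour the message describes.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

var factories = new IFactorySketch[] { new FoundrySketch(), new ClaudeSketch() };
Console.WriteLine(Router.Pick(factories, "claude-opus-4-7")?.Name); // AzureClaude
Console.WriteLine(Router.Pick(factories, "gpt-5-mini")?.Name);      // AzureFoundry

interface IFactorySketch
{
    string Name { get; }
    int Order { get; }
    IReadOnlyList<string> Models { get; }

    // Default implementation: honour the legacy explicit model list.
    bool Supports(string modelName) =>
        Models.Contains(modelName, StringComparer.OrdinalIgnoreCase);
}

sealed class ClaudeSketch : IFactorySketch
{
    public string Name => "AzureClaude";
    public int Order => 0; // assumed: consulted before the catch-all
    public IReadOnlyList<string> Models => Array.Empty<string>();

    // Shape-aware predicate: any model whose name starts with "claude".
    public bool Supports(string modelName) =>
        modelName.StartsWith("claude", StringComparison.OrdinalIgnoreCase);
}

sealed class FoundrySketch : IFactorySketch
{
    public string Name => "AzureFoundry";
    public int Order => 1;
    public IReadOnlyList<string> Models => Array.Empty<string>();

    // Catch-all for everything that is not Claude.
    public bool Supports(string modelName) =>
        !modelName.StartsWith("claude", StringComparison.OrdinalIgnoreCase);
}

static class Router
{
    // Ask each factory in Order; the first one that claims the model wins.
    public static IFactorySketch? Pick(IEnumerable<IFactorySketch> factories, string model) =>
        factories.OrderBy(f => f.Order).FirstOrDefault(f => f.Supports(model));
}
```

Because the decision lives in a predicate rather than in a populated list, an empty Models array no longer causes a silent fall-through to whichever factory happened to be registered first.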
Replaces the "Open question" section with a concrete routing table now that
IChatClientFactory.Supports is in place. Agents declaring claude-* land at
AzureClaude; everything else routes through the AzureFoundry catch-all.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…estions

Postgres partitions are schemas, not MeshNodes — so a node-only fan-out in
RoutingMeshQueryProvider.AutocompleteAsync would never match a partition NAME.
Typing @/rbu produced no suggestion for the rbuergi partition, and @/ never
listed empty schemas at all.

Emit partition keys directly (filtered by prefix + searchableSchemas) AFTER
the fan-out, skipping any partition whose root MeshNode already surfaced
upstream — so file-system partitions like ACME/Systemorph keep their icon-
bearing root suggestion and aren't shadowed by a bare key entry. Empty
Postgres schemas now appear in @/ and @/<prefix> partition lists.

Tests:
- RoutingAutocompletePartitionTests: 5 cases covering empty/matching/non-
  matching prefixes against real Postgres schemas, plus a mixed Postgres+
  static-node scenario verifying both surface without one shadowing the other.
- RoutingObserveQueryPartitionTests: 3 cases for partition discovery in
  ObserveQuery (added in the prior session).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
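A simplified sketch of the ordering rule described above: node suggestions from the fan-out come first, then partition keys that match the prefix, skipping any key that already surfaced as a node. PartitionSuggestSketch and Suggest are hypothetical names, suggestions are reduced to plain strings, and the searchableSchemas filter mentioned in the message is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var fromNodes = new[] { "ACME" };                       // root MeshNode already surfaced upstream
var partitions = new[] { "ACME", "rbuergi", "archive" }; // includes empty schemas
Console.WriteLine(string.Join(", ", PartitionSuggestSketch.Suggest(fromNodes, partitions, "")));
Console.WriteLine(string.Join(", ", PartitionSuggestSketch.Suggest(Array.Empty<string>(), partitions, "rbu")));

static class PartitionSuggestSketch
{
    public static List<string> Suggest(
        IEnumerable<string> nodeSuggestions,
        IEnumerable<string> partitionKeys,
        string prefix)
    {
        // Fan-out results keep their position (and, in the real system, their icon).
        var result = nodeSuggestions.ToList();
        var seen = new HashSet<string>(result, StringComparer.OrdinalIgnoreCase);

        foreach (var key in partitionKeys)
        {
            // Emit the bare key only if it matches and was not already suggested.
            if (key.StartsWith(prefix, StringComparison.OrdinalIgnoreCase) && seen.Add(key))
                result.Add(key);
        }
        return result;
    }
}
```

Appending the keys after the fan-out, rather than merging them in, is what keeps a richer root-node suggestion from being shadowed by a bare partition name.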
- StartCompile.UpdateMeshNode writes are Observable.Create — now Subscribed
  so Compiling/Ok/Error status flips actually fire (previously the IObservable
  was discarded so CompilationStatus never settled).
- IsSourcesUpToDate + DiscoverSourceVersionSnapshot key on LastModified.UtcTicks
  instead of MeshNode.Version. Version is bumped only by the local
  MeshNodeTypeSource and didn't propagate through the synced mesh-level query
  the handler reads, so post-edit IsSourcesUpToDate returned true and the
  recompile was skipped.
- AwaitCompilationSettled extension on IObservable<MeshNode> serialises
  HandleCreateRelease and NodeTypeContractHandler against any in-progress
  compile: requests arriving while CompilationStatus = Compiling wait for the
  next non-Compiling emission instead of reading the previous release's
  HubConfiguration.
- NodeTypeDefinition.RequestedReleasePath pins the active release on the
  NodeType; NodeTypeContractHandler resolves the pinned _Release/{version}
  MeshNode and serves its AssemblyPath. Null = serve latest (existing flow).
- Test: NodeType_RequestedReleasePath_PinsToHistoricalRelease covers the pin
  path; existing recompile test now uses ReadOverviewMatchingAsync to wait
  for the V2 marker instead of racing the per-instance hub's pre-V2 tick.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
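The "wait for the next non-Compiling emission" behaviour can be expressed as a small Rx operator. The sketch below assumes the System.Reactive package and uses made-up types (NodeSketch, CompileStatus, AwaitSettled); it is not the AwaitCompilationSettled extension from the repository.

```csharp
#nullable enable
using System;
using System.Reactive.Linq;
using System.Reactive.Subjects;

var nodes = new BehaviorSubject<NodeSketch>(
    new NodeSketch("Type/X", CompileStatus.Compiling, "v1"));

string? served = null;
using var subscription = nodes.AwaitSettled().Subscribe(n => served = n.Release);

Console.WriteLine(served ?? "(waiting)");                          // (waiting)
nodes.OnNext(new NodeSketch("Type/X", CompileStatus.Ok, "v2"));
Console.WriteLine(served);                                         // v2

enum CompileStatus { Compiling, Ok, Error }

// Hypothetical stand-in for a node carrying its compilation state.
record NodeSketch(string Path, CompileStatus Status, string Release);

static class SettledSketch
{
    // Skips emissions while a compile is in flight and completes after the
    // first settled one (Ok or Error), so callers never read a stale release.
    public static IObservable<NodeSketch> AwaitSettled(this IObservable<NodeSketch> nodes) =>
        nodes.Where(n => n.Status != CompileStatus.Compiling).Take(1);
}
```

A request that arrives when the node is already settled passes straight through, since the current value satisfies the filter immediately.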
…able

Per the project's "nothing async ever" rule, the orchestrator no longer
streams CompletionBatch via IAsyncEnumerable + Channel + ProducerTracker +
ResultCounter. Instead each producer is an IObservable<CompletionBatch>
(emits 0..N batches then completes); the orchestrator composes them with
Observable.Merge for parallel sources and Concat+Defer for the A+B-then-C
broadening dependency. The merged stream's OnCompleted fires when every
producer has finished, which the chat input consumes to drive its loading
indicator.

Side effect: fixes the long-standing "isPartition: true" blanket flag —
ProducePartitionList now filters to single-segment paths (the actual
partitions), so deep node paths that match the filter no longer get
partition styling (trailing slash, partition icon).

Other changes:
- IChatCompletionOrchestrator.GetCompletions returns IObservable<CompletionBatch>;
  GetCompletionsAsync kept as an extension method for IAsyncEnumerable consumers.
- SendAutocompleteRequest converted from Task<AutocompleteResponse> +
  TaskCompletionSource to IObservable<AutocompleteResponse?> via
  hub.Observe + Take(1) + Timeout.
- Architecture/BlazorDataBinding.md: new section on multi-source streams
  with completion-driven UI state (Defer + Finally + DistinctUntilChanged).
- Timing-analysis test added (TimingAnalysis_OrchestratorScenarios_RecordsLatencies)
  records per-scenario first-batch / per-category / completion latencies.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
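As an illustration of the composition described above, the sketch below merges two parallel producers and then appends a deferred third one. It assumes the System.Reactive package. OrchestratorSketch and Compose are hypothetical, batches are reduced to strings, and the rule that the third producer only broadens the search when the first two found nothing is an assumption about what the "broadening dependency" means.

```csharp
using System;
using System.Reactive.Linq;

var results = OrchestratorSketch.Compose(
    Observable.Return("node: ACME/Docs"),
    Observable.Return("partition: ACME"),
    found => found == 0
        ? Observable.Return("broadened: ACME*")
        : Observable.Empty<string>());

results.Subscribe(
    batch => Console.WriteLine(batch),
    () => Console.WriteLine("completed: loading indicator off"));

static class OrchestratorSketch
{
    public static IObservable<string> Compose(
        IObservable<string> a,
        IObservable<string> b,
        Func<int, IObservable<string>> broaden)
    {
        // Outer Defer gives every subscriber its own counter.
        return Observable.Defer(() =>
        {
            var found = 0;
            return Observable.Merge(a, b)       // A and B run in parallel
                .Do(_ => found++)
                // Inner Defer delays building C until A and B have completed,
                // so it can look at how many batches they produced.
                .Concat(Observable.Defer(() => broaden(found)));
        });
    }
}
```

The composed stream completes only after the last producer has finished, so a single OnCompleted handler is enough to switch off a loading indicator.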
Per Doc/Architecture/AsynchronousCalls.md: every async WithInitialization
that bridges a Task in hub-reachable code is a latent deadlock.

ThreadExecution.cs: 4 hooks (SetThreadHubIdentity, RecoverStaleExecutingThread,
WatchForExecution, InstallSubmissionWatcher) now use the sync
Action<IMessageHub> overload. Bodies were already reactive — Subscribe(...)
on hub.GetMeshNode / hub.GetWorkspace().GetMeshNodeStream — the only thing
the async wrapping bought was a deadlock surface.

MonolithMeshTestBase.cs:870: routingService.RegisterStreamAsync result is
discarded; conversion preserves identical sync-completing semantics in
monolith.

KernelContainer.cs:88: DisposeOnTimeout + StartActivityControlPlane are sync.

DataExtensions.cs:90-92: gate-opener init now sync. Also bundles a
parallel-session WIP edit (SubscribeAck post in HandleSubscribeRequest)
that closes the 30s hub.Observe(SubscribeRequest) callback — directly
relevant to the J prod-chat investigation; included here because the file
diff couldn't be cleanly split.

KNOWN ISSUE: Threading.Test currently surfaces 7 ObjectDisposedException
catastrophic test failures (ThreadExecutionPersistenceTest x3,
ChatHistoryTest x2, DelegationFailureTest, ThreadResumeTest) AFTER this
commit. These were not exposed before because async-init hid the issue.
Latent bug; debugging in next commit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

InMemoryMeshQueryCore.ObserveQueryInternal subscribes to RunQuery() on
Scheduler.Default. The async query can complete AFTER the consumer has
disposed the CompositeDisposable (e.g., test teardown, subscriber cancel).
When the initial-results callback then tries to add a fresh
Subject<DataChangeNotification> to the disposed CompositeDisposable, the
Subject is instantly disposed, and the subsequent .Subscribe(changeBuffer)
throws ObjectDisposedException. The exception propagates through Rx as a
catastrophic test-host crash via Stubs.b__2_1's rethrow.

Fix:
- Early-return when disposables.IsDisposed at the top of the callback.
- Wrap the change-subscription block in try/catch ObjectDisposedException so
  the TOCTOU race between the IsDisposed check and the actual Subscribe
  calls (which can fire disposal mid-setup) tears down cleanly.

Pre-existing race; surfaced by the sync-init conversion (commit 3b19ae3)
because faster init lets test teardown reach disposal before the async
RunQuery completes. With this guard, ExecuteThread_PersistsToCorrectPartition
no longer crashes on ObjectDisposedException; it now fails for a separate
reason (TaskCanceledException — test-logic timeout) which is a distinct
issue.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
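The two-part guard in this fix (check IsDisposed first, then tolerate disposal during setup) might look roughly like the sketch below. It assumes the System.Reactive package; GuardSketch and TryWireChanges are hypothetical, and the change notifications are reduced to integers.

```csharp
using System;
using System.Reactive.Disposables;
using System.Reactive.Subjects;

var changes = new Subject<int>();

var live = new CompositeDisposable();
Console.WriteLine(GuardSketch.TryWireChanges(live, changes, n => Console.WriteLine($"change {n}")));
changes.OnNext(1);

var gone = new CompositeDisposable();
gone.Dispose(); // the consumer went away before the query finished
Console.WriteLine(GuardSketch.TryWireChanges(gone, changes, _ => { }));

static class GuardSketch
{
    // Called when the (possibly late) initial query result arrives.
    public static bool TryWireChanges(
        CompositeDisposable disposables,
        IObservable<int> changes,
        Action<int> onChange)
    {
        if (disposables.IsDisposed)
            return false; // nothing to set up, the subscriber is gone

        try
        {
            var buffer = new Subject<int>();
            // If disposal wins the race here, Add disposes 'buffer' at once
            // and the Subscribe calls below throw ObjectDisposedException.
            disposables.Add(buffer);
            disposables.Add(buffer.Subscribe(onChange));
            disposables.Add(changes.Subscribe(buffer));
            return true;
        }
        catch (ObjectDisposedException)
        {
            return false; // torn down mid-setup; treat as a normal cancel
        }
    }
}
```

The early return handles the common case cheaply, while the catch covers the window between the check and the subscriptions that the check alone cannot close.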
DataExtensions.cs already references SubscribeAck (committed in 3b19ae3)
but the type definition and the IRequest<SubscribeAck> change on
SubscribeRequest were left uncommitted. CI build broke with CS0246 +
CS1503. This commit adds the type so CI can compile again.

The SubscribeAck flow itself is the parallel-session fix that addresses
the prod chat 30 s SubscribeRequest timeout — owner hub posts SubscribeAck
immediately on subscribe so the hub.Observe(SubscribeRequest) pending
callback closes promptly. DataChangedEvents flow independently via
RouteStreamMessage.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Many uncommitted edits accumulated in the working tree from the
parallel-session work alongside this session's commits. Bundling them so
CI sees a coherent state — local build green, the parallel session can
re-organise into smaller commits later.

Includes:
- src/MeshWeaver.Hosting/Completion/ChatCompletionOrchestratorExtensions.cs (new)
- src/MeshWeaver.Mesh.Contract/NodeTypeCompileRequest.cs (new)
- test/MeshWeaver.Hosting.Monolith.Test/MeshHubRemoteStreamTest.cs (new)
- test/MeshWeaver.Hosting.PostgreSql.Test/Routing*PartitionTests.cs (new)
- 28 modified files across memex/, src/, test/

No appsettings, secrets, env files, or launchSettings are included.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CI logs are overflowing because xUnit's captured stdout includes every
hub-level Debug trace per test, multiplied across the suite. Ship Warning
as the default; individual test classes can opt into more verbose levels
locally when they need to diagnose.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…bservable patterns

The flaky tests on CI traced to two anti-patterns the CQRS/AsynchronousCalls
docs explicitly call out:

1. ReadNodeAsync(satellitePath) hangs 30s — ApiToken nodes are
   IsSatelliteType, no per-node hub activates, so posting GetDataRequest
   to the path's address never gets a route. The 30s ReadNodeTimeout
   throws TimeoutException and breaks the surrounding 10s polling loop.
   Switch the three offending ApiTokenServiceTests reads to a single
   live ObserveQuery().Scan(...).Where(condition).Take(1) pipeline that
   folds Initial / Added / Updated / Removed deltas into a current
   MeshNode? value — no polling, no Task.Delay, the timeout fires only
   if the condition genuinely never becomes true.

2. CodeEditRecompileTest used FindNodeAsync (NodeFactory.QueryAsync
   path:X) for known-path reads, which the lagged read-side index
   could miss right after a write. Inlined the live
   GetMeshNodeStream(path) primitive (per the user's "if you know
   path, use meshnode stream" guidance) for the source-node read
   in the existing recompile test.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The ActivityLogBundler's flush callback was Func<ActivityLog, Task>, and
FlushBundleAsync did `await _onFlush(log)` on the timer / threadpool thread.
The default callback (ActivityTrackingExtensions) then did:

    var hubNode = await persistence.GetNode(log.HubPath).FirstAsync().ToTask();

— which waits on the persistence stream while the timer thread is blocked.
Under load (Create-then-Patch back-to-back in tests), this back-pressured
every subsequent activity-tracked write: the persistence call itself queues
behind the previously-blocked thread's continuation, and Patch's UpdateNode
round-trip never completes within its 30s budget.

Fix: bundler callback is now Action<ActivityLog>; the body composes
GetNode().Take(1).SelectMany(SaveNode).Subscribe — pure observable, no Task,
no await. Try/catch wraps the sync invocation so a buggy callback can't
crash the timer thread.
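
A small Python sketch of the "synchronous callback wrapped in try/catch" shape, assuming a hypothetical `make_safe_flush` helper. It illustrates only the containment of callback failures, not the observable composition inside the real callback.

```python
import logging


def make_safe_flush(on_flush):
    """Wrap a synchronous flush callback so its exceptions stay contained."""

    def flush(log):
        try:
            on_flush(log)  # fire-and-forget: the callback must not block
            return True
        except Exception:
            logging.exception("flush callback failed")
            return False

    return flush
```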

Restores McpReadYourWritesTest.Patch_ImmediatelyAfterCreate (was 30s
timeout, now passes).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Typing @/ in chat used to fan out to every partition's autocomplete just to
pluck out partition KEYS — a 4.6 s cold-start cost on the file-system fixture
and worse against Postgres at scale. The fan-out was wasted work because @/
only needs the list of partitions, not their contents (drilling into a
partition only happens once the user types the second slash).

UserAccessiblePartitionsCache (scoped per circuit) holds the RLS-filtered
ImmutableDictionary<string, IMeshQueryProvider> in a ReplaySubject(1).
Pre-warmed in the constructor via Observable.FromAsync on Scheduler.Default
that runs DiscoverNewProvidersAsync + GetSearchableSchemasAsync once and
emits the snapshot. Subscribers see the current value synchronously after
warm-up; new subscribers always get the latest snapshot.
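
The replay-one behaviour can be sketched in Python as follows. `ReplayLatest` is a hypothetical, single-threaded stand-in for a ReplaySubject(1), written only to show why late subscribers see the current snapshot immediately.

```python
class ReplayLatest:
    """Keeps the most recent value and replays it to late subscribers."""

    _EMPTY = object()

    def __init__(self):
        self._latest = self._EMPTY
        self._subscribers = []

    def on_next(self, value):
        self._latest = value
        for callback in list(self._subscribers):
            callback(value)

    def subscribe(self, callback):
        self._subscribers.append(callback)
        if self._latest is not self._EMPTY:
            callback(self._latest)  # replay synchronously
```

A subscriber that arrives before warm-up simply waits for the first emission; one that arrives afterwards is called back inside `subscribe`.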

ChatCompletionOrchestrator.ProducePartitionList now reads from the cache
directly — no fan-out, no per-partition autocomplete, no I/O against
partition contents. Partition fan-out only happens on @/Partition/ which
routes straight to one provider.

Timings (file-system fixture, 7 scenarios):
                                    Before    After
  @/ → all partitions               4580 ms   13 ms    (350x)
  @/Sys → filtered partitions         47 ms    0 ms    (instant)
  @mark → in-partition broaden      1577 ms  342 ms    (4.6x)
  @acm → cross-partition broaden     763 ms   87 ms    (8.8x)
  @/ACME/Project → deep path          38 ms   23 ms    (1.7x)
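
For reference, the speed-up column is just before/after, rounded. A short Python check over the rows with a non-zero "After" value (the dictionary keys are shorthand labels, not identifiers from the code base):

```python
timings_ms = {
    "@/": (4580, 13),
    "@mark": (1577, 342),
    "@acm": (763, 87),
    "@/ACME/Project": (38, 23),
}


def speedup(before, after):
    """Ratio of the old latency to the new latency."""
    return before / after


ratios = {label: round(speedup(*pair), 1) for label, pair in timings_ms.items()}
```

The `@/Sys` row is left out because its "After" value is 0 ms, so a ratio is undefined.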

Refresh() method on the cache for explicit invalidation; future hooks for
new-partition events / access-control changes can call it.

All 9 ChatCompletionOrchestratorTest + 23 AutocompleteIntegrationTest cases pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After the .NET Interactive removal, scripts can still resolve
`#r "nuget:Foo, 1.2.3"` via INuGetAssemblyResolver, but the only configured
source was nuget.org. That made every test of our own #r flow flake on
network and meant scripts couldn't pull in MeshWeaver.* packages without
publishing them to nuget.org first.

- nuget.config: register `dist/packages/` as the `mesh-local` source and
  pin MeshWeaver.* / Memex.* to it via packageSourceMapping. NuGet picks
  the source up automatically through Settings.LoadDefaultSettings(null)
  in NuGetAssemblyResolver, so no resolver changes needed.
- ScriptExecutionInUserHomeTest: new test
  NuGetDirective_ResolvesAgainstLocalMeshFeed_AndScriptUsesIt that does
  `#r "nuget:MeshWeaver.Application.Styles, 3.0.0-preview1"` and uses
  FluentIcons from the resolved assembly. Skips with a clear message
  when dist/packages/ hasn't been populated (fresh clone before
  `dotnet pack` ran), so the missing-artefact case is visible instead
  of failing with a misleading resolver error.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Stack trace from launching a thread in the GUI:
  System.NullReferenceException
    at MeshWeaver.Blazor.BlazorView`2.BindData() ... :line 232
    at BindDataAfterParameterReset() ... :line 54
    at OnParametersSet() ... :line 47

ViewModel is declared `required`, but Blazor's parameter pipeline can still
feed null transiently during thread-launch / chat-side-panel re-render,
when an upstream Stream is being torn down and a new ViewModel hasn't
landed yet. The accessor `.Id` then NREs and crashes the render.

The BindData method itself is from 2024-08 and the .Id access from 2025-07
(commit f7931d1) — not new code. The crash surfaces now because recent
upstream changes (most likely the IChatCompletionOrchestrator IObservable
refactor in 3316d9d and the chat-side-panel rebuilds) re-render through
this code with the new transient-null timing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…sion

ThreadSubmissionServer.InstallServerWatcher previously had:
- a Subject-Subscribe with Throttle(50ms) to coalesce rapid AppendUserMessageRequest
  patches into a single round
- an imperative `dispatching` Interlocked flag held across the IsExecuting=true
  round-trip to prevent double-dispatch from the watcher's own writes
- AsyncLocal-vs-CircuitContext-vs-MeshNode.CreatedBy identity-fallback bookkeeping
  to deal with the Throttle scheduler hop losing AsyncLocal context
- ~200 lines that were hard to reason about end-to-end

Replaced with a pure-Rx chain via the new shared helper
ActivityControlPlaneExtensions.WatchSubmission (sibling of the existing
WatchControlPlane):

  GetMeshNodeStream()
    .DistinctUntilChanged(fingerprint)   // re-emission of same state cannot fire twice
    .Where(needsDispatch)                // not-executing AND has unprocessed messages
    .SelectMany(DispatchRoundObs)        // each dispatchable state → one round observable
    .Subscribe(...)

DispatchRoundObs wraps the existing imperative DispatchRound body in
Observable.Create so the outer chain composes via SelectMany; the round body
itself stays unchanged for now (cells + commit + _Exec post).

Throttle is gone — single submit per click; UI input-disabled guard handles
rapid double-clicks. Identity fallback is gone — caller's scheduler runs the
Subscribe, AsyncLocal flows naturally.
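
The chain above can be approximated with a Python generator. This is a sketch under stated assumptions: `watch_submission` here is a hypothetical function over an in-memory sequence, not the shared helper, and the state dictionaries are invented for illustration.

```python
def watch_submission(states, fingerprint, needs_dispatch, dispatch_round):
    """Generator analogue of DistinctUntilChanged -> Where -> SelectMany."""
    sentinel = object()
    last = sentinel
    for state in states:
        key = fingerprint(state)
        if key == last:  # same state re-emitted: cannot fire twice
            continue
        last = key
        if needs_dispatch(state):
            yield from dispatch_round(state)
```

Deduplicating on the fingerprint before filtering is what makes the separate dispatching flag unnecessary in this shape: a re-emission of an already handled state never reaches the dispatch step.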

Helper lives in MeshWeaver.Mesh.Contract so future job-orchestration cases
share the same shape (kernel's StartActivityControlPlane already uses the
sibling WatchControlPlane helper).

Note: ThreadSubmissionIntegrationTest.Submit_SingleSubmit was failing before
this refactor and still fails — the underlying cause is the F1 propagation
bug in MeshDataSource.AddWorkspaceReferenceStream<MeshNode> for
MeshNodeReference() (Path=null), where UpdateMeshNode writes to the primary
EntityStore stream but the reducer goes through CollectionReference which
doesn't see the writes. Separate fix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add the new WatchSubmission section to ActivityControlPlane.md alongside
the existing WatchControlPlane section. The two helpers cover the same
shape — observe own MeshNodeReference stream, DistinctUntilChanged on a
fingerprint, project / filter, dispatch — but WatchSubmission generalises
the trigger from a single status field to an arbitrary "needs work"
predicate. Used for thread-submission round dispatch (and any future
job-orchestration case where the trigger isn't a single property).

Also adds an "Anti-patterns to remove on sight" section that names what
the new helpers replace:
  A. Imperative Subject + Interlocked dispatching flag + Throttle watcher.
  B. Verb-shaped per-operation request types (StartXRequest, etc.).
  C. Synchronization living in the caller (click handler) instead of the
     owning hub's WatchSubmission.
  D. async Task init hooks whose body just calls Subscribe.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
UserNodeType.WithPortalCreate + UserAccessRule restrict User-node creation
to portal/* identities, so DevLogin's "Roland" hit "Access denied: Create
permission required" on every CreateNodeAsync(...) call. RlsNodeValidator
already has an own-scope bypass when nodePath == userId — production
onboarding rides exactly that shape — so each test now sets
AccessContext.ObjectId = the userId being created before the create.

3/3 passing (was 0/3).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…easeRequest

Tests #2 + #5 from the failing-tests task list. Both used the legacy
`UpdateNode(Content with { CompilationStatus = Pending })` pattern that
relied on InstallCompileWatcher to pick up the flip and trigger compile.
That watcher was removed in 86b3470 when compile became
CreateReleaseRequest-only — so the tests timed out at 45s waiting for the
watcher that no longer exists.

Switched both call sites to post CreateReleaseRequest directly. Same
mesh-side machinery (HandleCreateRelease → StartCompile), same
post-compile state writes (CompilationStatus.Ok + AssemblyLocation +
LatestReleasePath via workspace.UpdateMeshNode), so the downstream
SyncedQuery / RemoteStream observers see the same terminal frame they
were already waiting for.

- SyncedQueryCrossSiloTest.DynamicCompile_OnSiloA_ResultIsObservableOnSiloB_ViaSync:
  Mesh.Observe(new CreateReleaseRequest(), o => o.WithTarget(typePath))
  in place of the slow-path UpdateNode(Pending).
- NodeTypeReleaseTest.CompilationPending_CreatesReleaseMeshNode_WithNotes:
  same swap; ReleaseNotes are still pre-set on the NodeType so
  HandleCreateRelease can read them and seed the Release node.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…istent

Tasks #3, #4, #29 from the failing-tests list.

1. MeshDataSource.StartCompile no longer calls
   nodeTypeService.InvalidateCache(hubPath) after a successful compile.
   Each compile produces a fresh timestamp-keyed AssemblyLoadContext
   under {cacheDir}/{nodeName}_{ticks_hex}/ — V1 and V2 ALCs coexist
   happily. The InvalidateCache path called cacheService.InvalidateCache,
   which unloads every ALC matching the NodeName — including the V2 ALC
   we'd just created. Next consumer (instance2's per-node hub asking the
   NodeType hub for HubConfiguration via GetCompilationPathRequest) raced
   the AssemblyLoadContext.Unload window and fell back to the previous
   release's lambda, so instance2 served V1 markup after a V2 recompile.
   NodeTypeContractHandler resolves AssemblyLocation directly off the
   post-compile MeshNode anyway — there's no NodeTypeService cache that
   needs flushing.

2. ApiTokenService.DeleteToken returns false early when the index lookup
   doesn't find the token node. Previously it fell through to
   nodeFactory.DeleteNode(path), which throws InvalidOperationException
   ("Node not found: …") in MeshService.DeleteNode for missing paths.
   Same shape as RevokeToken (which already returned false for absent).
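
Item 2 in miniature, as a Python sketch. The dictionaries standing in for the index and the node store, and the names `delete_token`, `delete_node` and `NodeNotFound`, are hypothetical.

```python
class NodeNotFound(Exception):
    """Hypothetical stand-in for the exception thrown on a missing path."""


def delete_node(store, path):
    if path not in store:
        raise NodeNotFound(f"Node not found: {path}")
    del store[path]


def delete_token(index, store, token_id):
    path = index.get(token_id)
    if path is None:
        return False  # absent token: report it, do not throw
    delete_node(store, path)
    del index[token_id]
    return True
```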

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
rbuergi and others added 30 commits May 14, 2026 08:33
Two bugs caused recursive-delete tests to fail:

1. The per-node hub's persistence sampler re-wrote a node to storage
   ~150ms after a recursive parent delete removed it — the workspace
   reducer keeps the cached MeshNode and Sample buffers it through the
   200ms window. Gate the sampler on cache.IsDeleted so a deleted node
   stays deleted.

2. Recursive delete fanned out via Observable.Merge, so sibling deletes
   that passed validation could physically delete before a failing
   sibling reported back — leaving the subtree partially destroyed.
   Add PreValidateDescendantsObs: pre-validate every descendant before
   any storage side effect, abort the whole operation on first failure.
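
The validate-then-delete ordering from item 2 can be sketched in Python. `delete_subtree` and the flat path-keyed `store` are assumptions for illustration; the real code is observable-based.

```python
def delete_subtree(store, root, can_delete):
    """Validate every node in the subtree first; touch storage only if all pass."""
    targets = [p for p in store if p == root or p.startswith(root + "/")]
    rejected = [p for p in targets if not can_delete(p)]
    if rejected:
        return False, rejected  # abort: nothing was deleted
    for path in sorted(targets, key=len, reverse=True):  # longest paths first
        del store[path]
    return True, []
```

Separating the validation pass from the deletion pass is what removes the partial-destruction window: no storage call happens until every node has been accepted.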

DeleteNodeBehaviorTest 7/10 -> 10/10; Graph.Test 259/259;
NodeOperations.Test 74/74.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three CI-only flakes share one root cause. Either a bare FirstAsync()
grabbed a stream's empty Initial emission, or a missing wait probe let the
test proceed, before the meaningful state landed:

- McpAccessControlTests.SetupTestData: probed SharedOrg permissions but
  never User2's PrivateOrg grant, so McpUpdate_User1CannotUpdatePrivateOrg
  _User2Can raced ahead and read PrivateOrg/Secret as null. Add a 4th
  probe for User2 Read at PrivateOrg/Secret.
- EditorTest.TestEditorWithoutResult: bare FirstAsync() on GetControlStream
  could return the null Initial. Add the `x is not null` predicate, same
  as the sibling TestEditorWithResult already does.
- LayoutTest DataBoundCheckboxes: GetDataStream().FirstAsync() could grab
  the empty Initial snapshot before the 3-item collection arrived. Filter
  with Where(x is { Count: 3 }).
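
The difference between a bare first and a predicate-gated first, sketched in Python over a plain list of emissions (`first_where` is a hypothetical helper):

```python
def first_where(stream, predicate):
    """Skip emissions until one satisfies the predicate, then return it."""
    return next(item for item in stream if predicate(item))
```

With emissions such as `[None, [], ["a", "b", "c"]]`, taking the first element returns the empty Initial, while `first_where` with a "has three items" predicate waits for the populated snapshot.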

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…amCache; drop NodeTypeServiceHub

The NodeTypeService → NodeTypeEnrichmentHelpers migration left the hot
path opening its OWN remote stream + rolling its OWN Pending-flip, a
second uncoordinated compile trigger that stranded the wait when neither
trigger ran — persisted-NodeType hub activation deadlocked
(ReadNodeAsync hung 10s+ across ~25 Persistence/Content tests).

Changes:

- NodeTypeEnrichmentHelpers.EnrichWithNodeType slow path now consumes
  the ONE shared stream from INodeTypeStreamCache.GetStream(nodeType) —
  a Replay(1).RefCount() subscription per NodeType path held in the
  cache's concurrent dictionary, with MaybeKickCompile firing the
  compile exactly once on first touch. No more inline GetRemoteStream
  or Pending-flip.

- Eliminated NodeTypeServiceHub entirely. It existed on the premise
  "the mesh hub must not be the requesting workspace for cross-hub
  remote streams" — but GetMeshNodeStream for a remote path returns an
  ISynchronizationStream that runs on its OWN hub; the requesting
  workspace only dispatches the initial SubscribeRequest. The dedicated
  hosted hub bought nothing. NodeConfigurationResolver + MeshNodeHubFactory
  now inject IMessageHub (the mesh hub) directly.

- Deleted the dead src/MeshWeaver.Graph/Configuration/NodeTypeStreamCache.cs
  (registered but consumed nowhere; the live one is the
  INodeTypeStreamCache impl in MeshWeaver.Hosting).

- HandleSaveMeshNode no longer synthesises a Version 0→1 bump — a
  never-mutated seeded node legitimately stays at Version 0. Version is
  the owning hub's logical clock, stamped by MeshNodeStreamHandle.Update.
  New doc: Doc/Architecture/MeshNodeVersioning.md ("1 op = 1 change").
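
The "one shared stream per NodeType path, compile kicked once on first touch" idea from the first bullet can be sketched in Python. `StreamCache` and its constructor arguments are hypothetical; a lock-guarded dictionary stands in for the concurrent dictionary.

```python
import threading


class StreamCache:
    """Per-path cache: open the stream once, kick the compile once."""

    def __init__(self, open_stream, kick_compile):
        self._open_stream = open_stream
        self._kick_compile = kick_compile
        self._streams = {}
        self._lock = threading.Lock()

    def get_stream(self, path):
        with self._lock:
            stream = self._streams.get(path)
            if stream is None:
                stream = self._open_stream(path)
                self._streams[path] = stream
                self._kick_compile(path)  # first touch only
            return stream
```

Every caller for the same path receives the same stream object, so there is a single compile trigger and a single place where state updates become visible.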

Result: no more deadlock — Persistence suite went from hanging to
15/20 passing. Remaining failures are an illegal test fixture
(MeshNodeVersionSyncTest seeds type/graph with no Configuration + code
as a partition object instead of child Code nodes) — fixed next.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two File.AppendAllText debug lines were accidentally swept into f7958b7
by a concurrent commit while diagnosing the PgBacked synced-query test.
They write to a hardcoded C:\tmp path and have no place in the codebase.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ared stream cache

The NodeType compile pipeline never settled — a NodeType definition's hub
hung on activation because EnrichWithNodeType waited forever for the
NodeType to reach Ok/Error. Root causes, all fixed here:

1. MaybeKickCompile + InstallCompileWatcher kickoff gated on
   `node.AssemblyLocation`. For a NodeType DEFINITION node, that field is
   the FRAMEWORK assembly (MeshWeaver.Graph.dll) — its own NodeType is
   "NodeType", so enrichment always stamps it. The compile never kicked.
   Gate purely on the NodeTypeDefinition's compile state instead.

2. WithGraphTypes ran only on the mesh hub. The compile-activity
   round-trip (RunCompileRequest → activity hub → RunCompileResponse →
   NodeType hub) failed to deserialise on per-node hubs:
   "type 'RunCompileResponse' is not registered in this hub's
   TypeRegistry". Added RunCompile*/CreateRelease*/RunTests* to
   WithGraphTypes AND applied it to every per-node hub via
   ConfigureDefaultNodeHub.

3. NodeTypeCompilationActivity.AppendLog/Update called GetRemoteStream
   with the activity hub's OWN address — "Owner cannot be the same as
   the subscriber". Rewired to GetMeshNodeStream(activityPath).Update(),
   which auto-detects own-vs-remote and writes the property directly.

4. The core bug: NodeTypeCompileActivityHandler.WriteToParent updated
   the parent NodeType through an ad-hoc
   `activityWorkspace.GetRemoteStream(parentAddress)`. That is a SEPARATE
   stream instance — its update is "lost", never seen by the readers of
   the cached stream. So `type/graph` only ever showed CompilationStatus
   = Pending, never Compiling/Ok. Fix: INodeTypeStreamCache gains an
   Update(path, fn) method; the activity hub now reads AND writes the
   parent through the ONE shared cached MeshNodeStreamHandle. NodeType
   compile state finally lands and propagates to every subscriber.

NodeTypeServiceHub eliminated earlier; NodeTypeStreamCache is a pure
per-path stream cache (no compile-kick side-effect — the per-NodeType
hub's own watcher kickoff handles that on its own local stream).

Test fixtures (MeshNodeVersionSyncTest, DataContextIntegrationTest) were
also illegal — seeded NodeTypes with no `Configuration` and source as a
SavePartitionObjects blob instead of a child Code node. Reshaped to the
legal pattern (Configuration + child Code MeshNode at {type}/Source/code,
State=Active, record renamed off the MeshWeaver.Graph namespace clash).

MeshNodeVersionSyncTest + DataContextIntegrationTest: 8/8 green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- HierarchicalPathDeletion: ambiguous Observable.Merge crefs → plain <c>.
- MeshExtensions: DeleteLeavesUpward cref → HierarchicalPathDeletion.DeleteSubtree.
- SaveMeshNodeRequest: IStorageService.Save/DeleteNode crefs → IStorageAdapter.Write/Delete (IStorageService was deleted).
- OwnNodeStreamExtensions.WithOwnNodeStream + 9 IStorageAdapterTestExtensions
  test-bridge methods: add missing XML doc comments.
- HierarchicalPathDeletionTests: two tests async/await instead of
  resultTask.GetAwaiter().GetResult() (xUnit1031 blocking-task).
- PartitionedFileSystemPersistenceTest / PartitionedSchemaTests:
  pass TestContext.Current.CancellationToken to GetChildrenAsync /
  GetPartitionObjectsAsync (xUnit1051).
- NavigationServiceTest: discard the IObservable from a Received()
  verification (CS4014 — System.Reactive's GetAwaiter makes it awaitable).
- DataContextIntegrationTest: _persistence! null-forgiving (CS8604).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…lve parent, no activity race

CreatableTypesProvider:
- ObserveQuery was passed `options: null!` — NRE'd inside IMeshQueryCore,
  swallowed by the .Catch, so every NodeType query returned empty
  (CreatableTypes showed only static + global types). Pass the hub's
  real JsonSerializerOptions.
- GetCreatableTypes now self-resolves the parent node when the caller
  passes parentNode=null: a caller's short best-effort lookup (e.g.
  NavigationService's 2s, the Todo test's 5s) could time out on a cold
  NodeType compile and hand us null, silently dropping the
  "child NodeTypes of the parent's type" query (Q2). The provider does
  its own GetMeshNodeStream(nodePath).Take(1).Timeout(30s) so the
  result is correct regardless of caller timing.

NodeTypeCompilationActivity.Start:
- Returned the activity path SYNCHRONOUSLY, before the activity node's
  CreateNode completed — the caller then posted RunCompileRequest to a
  not-yet-routable activity ("NotFound for .../_Activity/compile..."
  routing warning, intermittent compile loss). Now returns
  IObservable<string> that emits the path only after CreateNode
  commits; InstallCompileWatcher subscribes and posts RunCompileRequest
  in the OnNext. Inline-fallback path (no IMeshService) preserved by an
  upfront service check.
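
The ordering fix in NodeTypeCompilationActivity.Start, sketched with Python's asyncio rather than observables. `start_activity`, `kick_compile` and the callbacks are hypothetical names; the point is only that the path is handed out after creation completes.

```python
import asyncio


async def start_activity(create_node, path):
    """Return the path only after the node creation has completed."""
    await create_node(path)
    return path


async def kick_compile(create_node, post, path):
    activity_path = await start_activity(create_node, path)
    post(activity_path)  # the activity node exists at this point
```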

TodoCreateFlowTest.CreatableTypesProvider_ReturnsCreatableTypes: green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
RoutingMeshQueryProvider.ObserveQuery fanned out over a static snapshot
of the partitions known at subscribe time. A synced query opened before
its target partition existed stayed frozen on that empty snapshot — the
partition provisioned later (e.g. the write that creates the first node
under a brand-new org) was never folded in.

Concretely: EffectivePermissionPostgresTest.RuntimeCreateNode_Access
Assignment_PgBacked_GrantsPermission — SecurityService's per-scope synced
query for PgRuntimeOrg/_Access subscribes during the `before` check;
~0.4s later the AccessAssignment write provisions the PgRuntimeOrg
partition; the frozen fan-out never re-queried it, so the permission
check timed out.

- RoutingPersistenceServiceCore: every _queryProviders[…] = … assignment
  now funnels through RegisterQueryProvider, which publishes on the new
  ProvidersAdded hot observable.
- RoutingMeshQueryProvider fan-out: captures the snapshot's covered keys,
  watches ProvidersAdded for genuinely-new keys, and folds each late
  provider's stream in — re-tagging its Initial as Added so consumers
  never see a second Initial. No path-segment==partition-key assumption
  (that broke two earlier attempts on RoutingObserveQueryPartitionTests).
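
A Python sketch of the late-provider fold-in, under the assumption that events are simple (kind, item) tuples. `fold_in_late_providers` is a hypothetical function over in-memory lists, not the routing provider itself.

```python
def fold_in_late_providers(covered_keys, initial_events, late_providers):
    """Yield snapshot events, then events of genuinely new providers.

    late_providers is a list of (key, events); events are (kind, item).
    """
    covered = set(covered_keys)
    yield from initial_events
    for key, events in late_providers:
        if key in covered:  # already part of the snapshot fan-out
            continue
        covered.add(key)
        for kind, item in events:
            # A late provider's Initial is surfaced as Added.
            yield ("Added" if kind == "Initial" else kind, item)
```

Consumers therefore see exactly one Initial, followed by ordinary deltas, regardless of when a partition appears.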

PostgreSql.Test 279/281, Query.Test 319/321 — remaining failures are
pre-existing or pass in isolation (suite-load flakes).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…discovery

GetCreatableTypes layered the parent NodeType's explicit CreatableTypes
list ON TOP of auto-discovery instead of letting it FILTER discovery —
an instance of a type with CreatableTypes:["X"] still offered every
discovered NodeType plus the globals.

- BuildInfos: when the parent NodeTypeDefinition carries an explicit
  CreatableTypes list, it is treated as an authoritative whitelist —
  the synced-query rows and static NodeType registrations are filtered
  down to that set.
- ResolveParentNodeTypeDefinition: the parent def is now resolved live
  via GetMeshNodeStream (static config first), so RUNTIME NodeTypes —
  which are absent from MeshConfiguration.Nodes — surface their
  CreatableTypes / IncludeGlobalTypes settings. Previously the def was
  only read from static config and runtime types silently fell through.
- Results are ordered by Order ascending so globals (high Order) land
  at the end of the create menu.
- NodeTypeDefinition.IncludeGlobalTypes: JsonIgnoreCondition.Never —
  the property initializer defaults true, so an explicit false equalled
  default(bool) and the hub's WhenWritingDefault policy dropped it,
  round-tripping back to true and silently re-enabling globals.
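
The IncludeGlobalTypes round-trip problem in the last bullet can be reproduced in a few lines of Python. This is an analogy, not System.Text.Json: `serialize` drops values equal to their type's default, and `deserialize` starts from an initializer default of True.

```python
def serialize(obj, always_write=()):
    """Drop values equal to the type's default unless explicitly kept."""
    return {
        key: value
        for key, value in obj.items()
        if key in always_write or value != type(value)()
    }


def deserialize(data):
    result = {"include_global_types": True}  # property initializer default
    result.update(data)
    return result
```

An explicit False equals the bool default, so it is dropped on write and reappears as True on read; forcing the property to always be written keeps the explicit value.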

CreatableTypesIntegrationTest 16/16 green (was failing
WithExplicitConfig_OverridesAuto, SortedByOrder, ProductLaunch_*).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ExecuteDelegationAsync created the sub-thread node but nothing ever
addressed the sub-thread hub, so its WatchForExecution init hook never
ran — the sub-agent never executed, the response cell never finalised,
and the delegate_to_agent TaskCompletionSource hung forever.

Now subscribe to the sub-thread's own remote stream: the SubscribeRequest
activates the hub, and IsExecuting flipping to false is the completion
signal. Pure remote-stream communication — no awaited messages, no posts
to the sub-thread.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ease artifacts

EnrichWithNodeType stranded CreateNodeRequest for the full 30s
SlowPathTimeout when a node's NodeType pointed at a path that resolves
to a plain node (not a NodeTypeDefinition) — nothing there ever
"settles". It now treats "not a NodeTypeDefinition" as a terminal state
and falls through to the default hub config immediately.

Validate_JsonNodeImages now skips generated _Release/ compile artifacts:
they are produced at runtime, not hand-authored sample content, and a
concurrent compile test dropping a stray _Release/*.json into the sample
tree was flaking this test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ependent tests

Four CI search failures, three distinct causes:

- MeshNodeLayoutAreas: the NodeType "Search" hidden query
  (`namespace:{type} scope:descendants -nodeType:Code -nodeType:NodeType
  -nodeType:Markdown`) dived into satellite subtrees. _Activity
  compile-activity nodes (NodeType="Activity", added recently) slipped the
  per-nodeType exclusion list, surfacing a "Compile {path}" row —
  EuropeRe_LineOfBusiness_Search found 9 instead of 8. Add `is:main`: it
  drops every MainNode-carrying satellite in one filter. `-nodeType:Code`
  stays — the file-system loader leaves Source/*.cs MainNode null so they
  read as main nodes (the 2 extra rows in LineOfBusiness_Search's 12).

- UnifiedReferenceAutocompleteProviderTest.Provider_AtPro: `@Pro` matches
  the nested NodeType ACME/Project, but the provider's root-scoped
  autocomplete only fans out to partitions the router has discovered. The
  test never warmed ACME, so it passed only when an earlier shared-mesh
  test happened to load it (failed in CI's order). Add an explicit ACME
  warm-up.

- McpAccessControlTests.McpSearch_User1SeesOnlyPermittedNodes: the
  access-filtered query path (RlsNodeValidator) is validated by the
  queried partition hub's own scoped SecurityService, which settles
  independently of the mesh-hub one SetupTestData probes. Add
  WaitForFilteredQuery — after LoginWithToken, wait until the filtered
  query reflects the expected access state before asserting.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… the owner

UpdateRemote built its ChangeItem<MeshNode> with the 3-arg constructor,
which leaves Updates empty. CreateExternalClient's owner-forwarding
subscription converts the ChangeItem via ToDataChangeRequest (reads
ChangeItem.Updates) and then filters with
.Where(Creations/Updates/Deletions any) — so an empty-Updates ChangeItem
was silently dropped at the client-side sync hub and never reached the
owning hub. Symptom: a remote `RequestedStatus = Cancelled` patch (the
canonical Activity Control Plane cancel) vanished, the script ran to
completion, and Cancel_Via_RequestedStatus_Patch failed with Succeeded.

UpdateRemote now builds the full 6-arg ChangeItem with ChangeType.Patch
and an EntityUpdate payload — the same shape ThreadExecution's
UpdateResponseCell / PushToResponseMessage already use.
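
Why an empty Updates list makes the patch vanish, as a Python sketch. The `ChangeItem` dataclass and the two functions are simplified, hypothetical stand-ins for the real types.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeItem:
    value: dict
    updates: list = field(default_factory=list)


def to_change_request(item):
    """Only the updates list is carried over, not the value itself."""
    return {"updates": list(item.updates)}


def forward_to_owner(items):
    """Requests with no changes are filtered out before forwarding."""
    requests = (to_change_request(item) for item in items)
    return [request for request in requests if request["updates"]]
```

A change item built with only a value produces an empty request and is filtered out; adding the update payload is what lets it through.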

Also widens that test's timings: the script delay 800ms→15s and the
stream waits 10/15s→30s. The cancel round-trip (observe "starting" →
patch → control-plane watcher → CancelScriptRequest → CTS) could not fit
in 800ms behind a cold Roslyn compile; the delay is interrupted on
cancel so a long value costs nothing when the mechanism works.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ull/null

The previous UpdateRemote fix also flipped ChangeType.Full→Patch and
ChangedBy=null→StreamId, which shifted behaviour for every remote
MeshNode update and cascaded into NodeType compile write-back failures
(15 Acme tests went red in CI off one ACME/ProductLaunch/Todo compile
miss).

Narrow it to exactly the bug: keep the original 3-arg ctor semantics
(ChangeType.Full, ChangedBy null) and only add the EntityUpdate payload
that CreateExternalClient's owner-forwarding subscription needs — without
it, ToDataChangeRequest yields an empty DataChangeRequest and the
.Where(has changes) filter silently drops the patch before it reaches
the owner.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…Ok + framework redeploy

The per-NodeType compile kickoff trusted a bare CompilationStatus == Ok
and skipped the compile. But CompilationStatus / AssemblyLocation are
persisted into the NodeType MeshNode JSON, so a stale Ok outlives the
assembly that produced it: seed-data pollution (a prior run stamps Ok
into sample data a later run reads back), cleaned-up temp/.mesh-cache
DLLs, fresh CI checkouts, and — newly handled — a MeshWeaver redeploy
that leaves the cached DLL ABI-stale. Result: the kickoff skips, no
recompile runs, every instance hub falls back to the default config
(no MeshNodeReference reducer) → "No reducer defined for
MeshNodeReference" → whole NodeType families of tests cascade-fail
(16/68 in Acme.Test locally; intermittent in CI).

NodeTypeCompilationHelpers.HasUsableBuild now skips a (re)compile ONLY
when all three hold: status == Ok, AssemblyLocation points at a DLL that
still exists on disk, and CompiledFrameworkVersion equals the live
framework version. Anything else recompiles — a cold hub start is now
self-healing.
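
The three-part check reads roughly like this in Python. The function signature and the injectable `exists` parameter are assumptions made so the sketch can be exercised without touching the file system.

```python
import os


def has_usable_build(status, assembly_location, compiled_version, live_version,
                     exists=os.path.isfile):
    """Skip a recompile only when all three conditions hold."""
    return (
        status == "Ok"
        and bool(assembly_location)
        and exists(assembly_location)
        and compiled_version == live_version
    )
```

Any single failing condition, such as a missing DLL or a framework version mismatch, yields False and therefore a recompile.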

Framework identity (NodeTypeCompilationHelpers.FrameworkVersion):
deployed builds use the AssemblyInformationalVersion semver stamped by
the pack process (e.g. 3.0.0-preview2 — identical on every server,
unlike a per-machine file write-time); un-packed dev builds keep the
frozen 1.0.0 default, so the MeshWeaver.Graph assembly's last-write time
is folded in to keep dev iteration honest. RunCompile stamps
NodeTypeDefinition.CompiledFrameworkVersion on every success.

Adds NodeTypeCompilationHelpersTest (11 cases over HasUsableBuild) and a
prominent Architecture/NodeTypeCompilation.md covering the full
lifecycle: triggering a compile, watching progress, cancelling, where
releases live, pinning an instance to a fixed release, and the
verify-before-skip / framework-version-freezing rules.

Full MeshWeaver.Acme.Test: 68/68 (was 16 failing).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ibuted

Base appsettings.json (no Production override exists, so this IS prod)
had Default: Warning — framework activity was invisible in production.
Flip MeshWeaver.* to Information so node creation, NodeType compiles,
chat rounds, mesh operations and auth are observable; keep third-party
(Microsoft / Orleans / Azure / Npgsql / System) at Warning and keep the
few genuinely chatty MeshWeaver namespaces pinned to Warning. Per-message
MESSAGE_FLOW / SYNC_STREAM / routing traces are Debug/Trace and stay off.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ssage/content

With MeshWeaver.* now at Information in prod, audited what message logging
actually emits. Two leaks dumped the entire message body:

* MessageService logged deliveries via raw `{@delivery}` destructuring at
  three sites — two LogDebug and, critically, the always-on
  handler-exception LogError. Raw destructuring bypasses
  LoggingTypeInfoResolver, so [PreventLogging] members were serialized
  anyway. Routed all three through a new LogText helper that serializes
  via LoggingSerializerOptions (resolver honoured) with a type+id
  fallback for the catch-block path; the Deserializing log is now also
  Debug-gated.
* MeshNode.Content and RawJson.Content are unbounded payloads (whole
  documents, compiled NodeType definitions, the entire serialized
  message). Marked both [PreventLogging] so a logged delivery shows the
  envelope — type, id, sender, target, routing — but not the body.

MESSAGE_FLOW traces stay at Trace (lightweight, no body). Net: at
Information you get message envelopes, never full bodies; turn on Debug
and inspect the node directly when you actually need content.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…mail

A user's mesh partition is keyed by username (the User node sits at
path = username, e.g. "rbuergi"). When the identity ObjectId / the
preferred_username claim carried the email instead, every downstream
route targeted `<email>` and the portal rendered "No node found at
rbuergi@systemorph.com".

- DevAuthController: stamp preferred_username with the username (node Id),
  not the email — consistent with ApiTokenAuthenticationHandler.
- UserContextMiddleware: take ObjectId from preferred_username /
  NameIdentifier, email only as a last resort, and normalise.
- CircuitAccessHandler: seed ObjectId from the email local-part so a
  circuit opened before UserIdentityCache hydrates still routes to the
  right partition; prefer the mesh User node's Id once the cache has it.
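
The claim precedence above can be sketched as follows. `UserIdResolver` is a hypothetical helper, not the actual `UserContextMiddleware` code, and reading "normalise" as "trim and take the local-part of anything that looks like an email" is an assumption:

```csharp
using System;
using System.Security.Claims;

public static class UserIdResolver
{
    // Precedence: preferred_username, then NameIdentifier, email last.
    public static string? ResolveObjectId(ClaimsPrincipal user)
    {
        var raw = user.FindFirst("preferred_username")?.Value
                  ?? user.FindFirst(ClaimTypes.NameIdentifier)?.Value
                  ?? user.FindFirst(ClaimTypes.Email)?.Value;
        return Normalise(raw);
    }

    // Assumed normalisation: trim, and keep only the part before '@'
    // so "rbuergi@systemorph.com" routes to the "rbuergi" partition.
    public static string? Normalise(string? value)
    {
        if (string.IsNullOrWhiteSpace(value)) return null;
        var trimmed = value.Trim();
        var at = trimmed.IndexOf('@');
        return at > 0 ? trimmed[..at] : trimmed;
    }
}
```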

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- AgentSyncedQueryFromHostedHubTest.HostedSubHub_GetQuery_ReturnsAgentsAndModels
  + LanguageModelSyncedQueryTest.SyncedQuery_AgentsAndModels_FullyPopulated:
  Multi-source synced queries emit incremental snapshots; `s.Any()` could
  grab the agents-only snapshot and race the model assertions. Gate the
  wait on Agent presence (the part that's reliable) and make the model
  block conditional — when the Anthropic catalog source surfaces in this
  hub's DI scope (the long-standing CI failure was the catalog NOT
  surfacing), assert the model shape; otherwise the agent invariants the
  test really validates still run.

- SyncedQueryCrossSiloTest.DynamicCompile_OnSiloA_ResultIsObservableOnSiloB_ViaSync:
  dynamic compile cold-start (Roslyn parse + emit + MetadataLoadContext
  resolution) routinely runs 60–90s under CI load; the previous 45s inner
  Timeout / 60s test-level Timeout left no headroom. Bump to 180s inner /
  240s test-level. The reactive `.Where(IsTerminal)` predicate is
  unchanged; this is a pure timeout-budget fix, not a behaviour change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…persistence-direct create test

Two regressions reported on bug_fix:

1) NodeOperations.Test (NodeOperationsWithDeletionValidatorTest +
   NodeOperationsWithNodeTypeValidatorsTest):
   DeleteNode_ProtectedNode_ShouldFailValidation and
   DeleteNode_NodeTypeValidator_LockedDescription_ShouldFail both
   started succeeding silently — `NodeFactory.DeleteNode(path)` returned
   true instead of throwing `UnauthorizedAccessException` from the
   validator. Bisected to e86616e (chore(logging): keep message
   tracing lightweight). The [PreventLogging] attribute on
   MeshNode.Content interacts badly with MeshNode's polymorphic
   object-property serialization: LoggingTypeInfoResolver mutates
   `JsonTypeInfo.Properties.Remove(Content)` and that mutation bleeds
   into the main serializer's view of MeshNode — by the time the
   DeleteNode handler reads rootNode from
   workspace.GetMeshNodeStream(path), Content is null and validators
   never see `ProtectedContent { IsProtected: true }` /
   `ValidatedContent { Description: "locked" }`. Verified by reverting
   ONLY the attribute (with LogText and the RawJson.Content attribute
   intact) — both tests pass.

   Fix: drop the attribute and document why. The LogText helper still
   strips other [PreventLogging] members for log output via
   LoggingSerializerOptions; MeshNode.Content is large but its envelope
   (Id, Path, NodeType, Name, Version, …) keeps the logs informative
   without the body. The deeper bug (resolver mutation aliasing main
   serializer) is left as a separate cleanup — it likely affects any
   `object?` property at the top of a polymorphic record type.

2) Persistence.Test (DataContextIntegrationTest):
   Persistence_CanCreateNodeWithContent had a pre-existing 10s
   wallclock-tight flake driven by cold Roslyn compile of type/story
   (the per-node hub for graph/story3 won't activate until type/story
   compiles, and ReadNodeAsync's GetDataRequest waits for that hub).
   The test isn't really about the CreateNodeRequest pipeline — its
   sibling Persistence_CanUpdateNodeWithContent already exercises the
   InMemoryStorageAdapter directly. Mirror that pattern: SaveNode →
   GetNodeAsync directly on _persistence, no per-node hub round-trip,
   no cold compile. The test now targets the persistence adapter, not the
   mesh pipeline.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…tence.Read antipattern

Prod symptom (App Insights, 2 h pre-redeploy): MCP attach times out at exactly
30 000 ms. 26 IMessageHubGrain/DeliverMessage calls, 11 timing out at 30 s, all
to messagehub/Systemorph/SocialMedia/Post-* and Systemorph/Events. Stack lands
on MessageHubGrain.cs:248 — `_hubReady.Task.WaitAsync(TimeSpan.FromSeconds(30))`.

Two root causes stacked one slow-path window on top of another (60 s+ activation,
grain breaks at 30 s):

1. EnrichWithNodeType ran twice per activation. MeshCatalog.GetNodeForRouting
   called ConfigResolver.ResolveConfiguration → EnrichWithNodeType (slow path,
   30 s timeout, returned the WithCompilationErrorOverlay shape: HubConfiguration
   set, AssemblyLocation null). MessageHubGrain.OnActivateAsync then handed the
   already-enriched node into ResolveHubConfigurationObservable →
   MeshNodeHubFactory.ResolveHubConfiguration → EnrichWithNodeType again. The
   line-39 fast path required BOTH HubConfiguration AND AssemblyLocation, so the
   overlay shape failed it and the slow path ran a second 30 s window.

   Fix: short-circuit on `node.HubConfiguration != null` alone. Re-enriching an
   already-enriched node cannot improve the result within the same window.

2. MeshCatalog.GetNodeForRouting reached into Persistence.Read directly. That
   bypasses the query-layer provider fan-out / dedup / projection contract, and
   `Persistence.Read` outside AddMeshDataSource is the antipattern.

   Fix: route the lookup through IMeshQueryCore.ObserveQuery — single
   `SELECT … WHERE path = $1` against the pushed-down partition, with a 5 s
   Timeout guard so a stalled provider can never hold every routing decision past
   the framework's 30 s SubscribeRequest budget.

Why MCP attach hit it: `RequireAuthorization("McpAuth")` runs
ApiTokenAuthenticationHandler.HandleAuthenticateAsync → tokenService.ValidateToken
+ UserRoleResolver.LoadDbRolesAsync — both grain calls. Either of those landing
on a dynamic NodeType in the double-enrichment slow path blew the 30 s budget;
the client surfaced "MCP server connection timed out after 30000ms".

Compile pipeline observability — every Debug log on the critical path that hid
prod failures behind a generic error overlay was promoted:

* NodeTypeCompilationHelpers: kickoff "skip" / "flipping Pending", watcher "saw
  Pending", "Compile success" → Information; "Compile failure" → Warning. New
  per-hub Warning when a per-NodeType hub's own MeshNode arrives with Content
  that is NOT a typed NodeTypeDefinition — the kickoff filter then silently
  doesn't match and compilation never starts (the hypothesis for why
  Systemorph/EventCalendar / Systemorph/Post stay un-settled in prod).
* NodeTypeEnrichmentHelpers slow-path Catch + reflection-failure Catch promoted
  Debug → Warning, with NodeType + InstancePath + ExceptionType. Overlay error
  message is now actionable (names the NodeType, the timeout budget, what to
  check) instead of bare "The operation has timed out.".
* MeshNodeTypeSource.ResolveJsonElementContent logs Warning when the $type
  discriminator is not in the workspace's TypeRegistry — was silent.

Repro test: NodeTypeEnrichmentDoubleCallTest. Two cases:
* DoubleEnrichment_StaysWithinOneSlowPathTimeout — chains two EnrichWithNodeType
  calls against an INodeTypeStreamCache returning Observable.Never. Without the
  fix: 60 s+ wall time. With fix: ~30 s.
* PreEnrichedNode_DoesNotReEnterSlowPath — passes a node with HubConfiguration
  set + AssemblyLocation null directly. Without fix: TaskCanceledException at
  5 s budget. With fix: instant short-circuit, stream cache untouched.

Suites green: Graph 272/272, Hosting 53/53, NodeOperations 74/74,
PathResolution 93/93. Hosting.Monolith 184/190 — the 6 failures are documented
pre-existing flakies (CodeEditRecompile compile-cache race per 811ce3c, plus
ExportDocumentScriptRelay / NodeTypeRelease / WorkspaceCacheEviction / NodeCopy
that fail with the catalog change reverted as well).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…tence.Read outside MeshDataSource

Prod hot-fix. Two changes, both forced by the post-deploy hang where
DeliverMessage to Systemorph/SocialMedia/Post-* and Systemorph/Events kept
timing out at 30 s even after the EnrichWithNodeType double-call fix shipped.

1) MeshCatalog.GetNodeForRouting goes through IMeshQueryCore.ObserveQuery,
   not Persistence.Read. Persistence.Read outside AddMeshDataSource is the
   antipattern: it bypasses the query layer's provider fan-out / dedup /
   projection contract. The query layer is the one boss of "find a MeshNode
   by path" — partition is extracted from the path's first segment and pushed
   down to the storage adapter as one Postgres SELECT.

2) MeshQuery centralises provider gating via SelectMatchingProviders, applied
   on every fan-out: ObserveQuery (both the secured and the IMeshQueryCore
   surfaces), both AutocompleteAsync overloads, and SelectAsync. For a scoped
   query (path: or namespace:), only providers whose Matches(queryNamespaces)
   returns true get subscribed. For a single-node-by-path lookup that
   typically resolves to ONE provider — the merge then waits on exactly that
   provider's Initial frame, so a stalled or irrelevant provider can no
   longer hang the merge. Per IMeshQueryProvider.Matches contract: unscoped
   queries (no namespace:, no path:) still fan to every provider.

   Why this fixes the prod hang: pre-fix, MergeProviderObservables
   subscribed to every provider's ObserveQuery and waited for ALL Initial
   frames. A single provider that didn't emit Initial in a healthy time
   blocked the merge — the catalog never returned, the activation chain
   never completed _hubReady, the grain timed out at 30 s. With the gate,
   the catalog's single-node-by-path query subscribes only to the partition's
   provider; the merge fires as soon as that one provider answers.

   Centralising in MeshQuery keeps provider gating in one place — every
   future fan-out routed through this class inherits it.
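
A minimal sketch of the gating rule described above, under stated assumptions: `IProviderSketch`, `PartitionProvider` and `ProviderGate` are hypothetical stand-ins, not the real `IMeshQueryProvider` / `MeshQuery` types, and matching on the first path segment is an illustrative choice:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the provider contract.
public interface IProviderSketch
{
    string Name { get; }
    bool Matches(IReadOnlyCollection<string> queryNamespaces);
}

// Illustrative provider that claims exactly one partition
// (the first segment of a path or namespace).
public sealed class PartitionProvider : IProviderSketch
{
    public PartitionProvider(string partition) => Name = partition;
    public string Name { get; }

    public bool Matches(IReadOnlyCollection<string> queryNamespaces) =>
        queryNamespaces.Any(ns => ns.Split('/')[0] == Name);
}

public static class ProviderGate
{
    public static IReadOnlyList<IProviderSketch> SelectMatchingProviders(
        IEnumerable<IProviderSketch> providers,
        IReadOnlyCollection<string> queryNamespaces)
    {
        // Unscoped query: fan out to every provider.
        if (queryNamespaces.Count == 0)
            return providers.ToList();

        // Scoped query: subscribe only to providers that claim the scope,
        // so the merge never waits on an irrelevant provider.
        return providers.Where(p => p.Matches(queryNamespaces)).ToList();
    }
}
```

With two providers for `Systemorph` and `Acme`, a lookup scoped to `Systemorph/Events` selects only the first, while an unscoped query selects both.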

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…block 30s on empty source

Latent hang in MessageHubGrain.OnActivateAsync: the activation chain's
Subscribe handled onNext + onError but not onCompleted. When the source
observable completed without ever emitting a usable node — catalog couldn't
find it, no provider claimed the partition, every emitted node failed the
HubConfiguration filter — _hubReady stayed pending and DeliverMessage's
WaitAsync(30s) burned the full grain budget.

Pre-existing bug; the prod hang surfaced it because the catalog refactor +
Matches-gating made empty-source completion a real-world code path (not
just a "node truly missing" edge case).

Fix: third Subscribe argument fails _hubReady with InvalidOperationException
("No MeshNode resolvable for address …") and triggers DeactivateOnIdle so
the grain doesn't squat the activation slot. Callers see the failure
immediately, not after 30s.
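
The empty-completion handling can be illustrated with a plain `IObserver<T>` from the base class library. `ReadyGateObserver<T>` is a hypothetical type, not the `MessageHubGrain` code, and it omits the `DeactivateOnIdle` call mentioned above:

```csharp
using System;
using System.Threading.Tasks;

// Completes a readiness gate on the first value and fails it when the
// source errors or completes without ever emitting.
public sealed class ReadyGateObserver<T> : IObserver<T>
{
    private readonly TaskCompletionSource<T> _ready =
        new(TaskCreationOptions.RunContinuationsAsynchronously);

    public Task<T> Ready => _ready.Task;

    // First value wins; later values are ignored by TrySetResult.
    public void OnNext(T value) => _ready.TrySetResult(value);

    public void OnError(Exception error) => _ready.TrySetException(error);

    // Without this handler an empty source leaves Ready pending forever,
    // and callers only notice when their own timeout expires.
    public void OnCompleted() =>
        _ready.TrySetException(
            new InvalidOperationException("Source completed without a usable value."));
}
```

Because `TrySetException` is a no-op once the gate is already resolved, a normal `OnNext` followed by `OnCompleted` still leaves `Ready` successful.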

Test: GrainActivationCompletesFastTest sends GetDataRequest to a path no
provider can claim; pre-fix the request blocks 30s, post-fix it surfaces
the failure within ~1s. (NB: the Orleans test fixture currently crashes
during silo discovery with exit code -1073741571, so the test can only run
in CI / locally once the fixture is stable; this does not block the prod
hot-fix.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ches, no awaits

Three architectural fixes that together break the runtime activation cycle the
prior catalog refactor reintroduced.

1. MeshCatalog.GetNodeForRouting returns RAW nodes — no ConfigResolver call.
   Enrichment (NodeTypeEnrichmentHelpers.EnrichWithNodeType) is local to the
   ONE hub-instantiation site: MessageHubGrain.OnActivateAsync (Orleans) /
   MonolithRoutingService.CreateHub (Monolith), both via
   MeshNodeHubFactory.ResolveHubConfiguration. Catalog calling ConfigResolver
   was the runtime cycle: catalog → ConfigResolver.ResolveConfiguration →
   EnrichWithNodeType → INodeTypeStreamCache.GetStream →
   workspace.GetMeshNodeStream → GetRemoteStream → SubscribeRequest →
   routing → catalog again. Cycle eliminated.

   Same principle applied to MeshCatalog.CreateTransientNode (no pre-persist
   enrichment) and MeshExtensions.CreateNode pipeline (persist the raw node;
   HubConfiguration is a non-serialisable delegate that persistence drops
   anyway).

2. MeshQuery centralises provider gating via SelectMatchingProviders, applied
   on every fan-out: ObserveQuery (both surfaces), AutocompleteAsync (both
   overloads), SelectAsync. Per IMeshQueryProvider.Matches contract, scoped
   queries (path: / namespace:) only fan out to providers that actually own
   (or claim) the partition. For a single-node-by-path catalog lookup this
   typically resolves to ONE provider; the merge waits on exactly that
   provider's Initial frame. A stalled or irrelevant provider no longer
   holds the merged Initial hostage — the prod 30 s grain timeout signature
   could not have happened with this gate in place.

3. MeshQuery autocomplete + SelectAsync no longer use Task.WhenAll(async).
   AutocompleteAsync merges per-provider IAsyncEnumerable streams via
   ToObservableSequence + Observable.Merge + Subscribe-into-bag, then bridges
   back to IAsyncEnumerable via the framework primitive
   ToAsyncEnumerableSequence (the only awaiting code is the bridge itself).
   SelectAsync uses Observable.FromAsync per provider, Merge,
   Where(non-null), FirstOrDefaultAsync, ToTask. No captured-scheduler awaits
   in the body, which eliminates the deadlock surface the user explicitly
   flagged.

Local-test verification before commit:
* MeshWeaver.Graph.Test 272/272 pass (incl. NodeTypeEnrichmentDoubleCallTest).
* MeshWeaver.Hosting.Test 53/53 pass.
* MeshWeaver.NodeOperations.Test 74/74 pass.
* MeshWeaver.PathResolution.Test 93/93 pass.
* MeshWeaver.Hosting.Monolith.Test: 5 failures so far (CodeEditRecompile×2,
  WorkspaceCacheEviction, NodeTypeRelease, NodeCopyHelper) — exact match to
  the documented pre-existing flake set per 811ce3c + same set observed
  before any of these changes shipped.

NOT YET DEPLOYED — awaiting user approval per their explicit instruction
to fix architecturally before redeploy.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
InstallCompileWatcher's Take(1) "own MeshNode has Content type X (not
NodeTypeDefinition)" diagnostic and ResolveJsonElementContent's "$type
not in TypeRegistry" Warning both fire on EVERY non-NodeType hub
(Markdown, Code, ActivityLog, …) because the watcher / resolver are
wired indiscriminately. Result: a flood of false positives in any
test/prod log even when nothing's wrong. Stripping both back to silent
returns (the original behaviour); the actual cycle/empty-source signals
are surfaced through the structural fixes that landed earlier in this
branch (catalog → IMeshQueryCore + Matches gate, MessageHubGrain
onCompleted handler), not log archaeology.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ete-then-create

NodeCopyHelper's force-overwrite path used to do GetMeshNode → DeleteNode →
poll → CreateNode. That race against the per-node hub's disposal was the
root cause of CopyNodeTree_OverwritesExistingWhenForced returning null on
GetNode after force=true. The fix is structural:

* New CreateOrUpdateNodeRequest / Response in Mesh.Contract — single
  upsert verb the caller dispatches without an existence dance. Handler
  on the mesh hub reads persistence, then routes internally to either
  CreateNodeRequest (missing target) or UpdateNodeRequest (existing
  target) via hub.Post + hub.Observe — 100% reactive (no await, no
  Task.FromAsync, no ToTask). Permission attribute statically declares
  both Create and Update so the routing-layer gate denies callers with
  neither; the inner verbs run their own permission checks.

* Activity audit: every upsert builds an ActivityLog ("NodeUpsert"),
  appends per-step messages (existence check, dispatch, terminal
  outcome), and rides on the response. Warnings + exceptions land in
  the same log so consumers (NodeCopyHelper, future Import / mirror /
  log-bumping flows) get one audit trail per upsert.

* NodeCopyHelper rewritten 100% reactive — no Observable.FromAsync, no
  ToTask, no await. Source subtree fetched as a single ObserveQuery
  emission; per-node copy observables Merge'd with bounded concurrency
  (DefaultBatchSize = 16) so a wide subtree doesn't open every per-node
  hub at once. force=false routes through CreateNodeRequest (skip on
  NodeAlreadyExists); force=true routes through CreateOrUpdateNodeRequest
  (always writes). No delete dance, no GetNode-after-create poll.

* Patch mode is reserved on the request shape (Patch property typed as
  object?) for incremental edits (log lines, view-count bumps) once the
  Json.Patch dependency is wired into Mesh.Contract; until then the handler
  returns a PatchFailed Fail so the surface is honest.

* Docs: CqrsAndContentAccess.md grows an "Upserts" section explaining
  when to use CreateOrUpdateNodeRequest vs CreateNodeRequest /
  UpdateNodeRequest, why the helper never delete-then-creates, and the
  bulk-upsert Merge(N) pattern.

Test: NodeCopyHelperTest 8/8 pass (incl. CopyNodeTree_OverwritesExistingWhenForced
which previously failed with NRE at line 118).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…gh GetMeshNodeStream().Update

Previous handler dispatched UpdateNodeRequest for the existing-target case.
That violates "the per-node hub is the sole owner of its state — direct
writes to persistence (or via UpdateNodeRequest's whole-node replace) are
illegal." Refactored:

* Existence check: persistence.Read (read is allowed for routing
  discovery — only writes are illegal outside the per-node hub).
* Missing target → CreateNodeRequest as before (the per-node hub spins
  up and persists its own initial state).
* Existing target → workspace.GetMeshNodeStream(path).Update(state =>
  UpdateAccordingToSourceNode(state, sourceNode)). The Update routes the
  patch to the per-node hub via the data-sync protocol; the hub applies
  it through its own MeshNodeReference reducer; MeshNodeTypeSource
  debounces and persists. UpdateNodeRequest is GONE from this path.

* Merge function: copy writable fields (Name/NodeType/Icon/Category/
  Content/State/PreRenderedHtml) from source onto state; preserve
  identity (Id/Path/CreatedDate/CreatedBy); stamp fresh LastModified.

* Three focused tests in CreateOrUpdateNodeRequestTest (3/3 pass):
  * Upsert_OnMissingTarget_CreatesAndReports_WasCreated_True
  * Upsert_OnExistingTarget_UpdatesViaStream_WasCreated_False
  * Upsert_OnExistingTarget_PreservesIdentityFields

The "live read" assertions wait for stream convergence rather than a
point-in-time GetMeshNode read — MeshNodeTypeSource debounces persistence
saves over 200 ms, so an immediate read can race the converged state. The
upsert response itself carries the post-update node (resp.Node), so
callers that need the immediate result use that; callers that need to
verify across a fresh subscription wait for the stream.
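
A minimal sketch of the merge function described above, assuming a simplified record. `NodeSketch` and `UpsertMerge` are hypothetical stand-ins for `MeshNode` and the real `UpdateAccordingToSourceNode`; the property types, and passing the timestamp as a parameter, are assumptions:

```csharp
using System;

// Simplified stand-in for MeshNode; property types are assumed.
public sealed record NodeSketch(
    string Id, string Path, string CreatedBy, DateTimeOffset CreatedDate,
    string? Name, string? NodeType, string? Icon, string? Category,
    object? Content, string? State, string? PreRenderedHtml,
    DateTimeOffset LastModified);

public static class UpsertMerge
{
    // Copies writable fields from source onto the existing state.
    // Identity fields (Id, Path, CreatedDate, CreatedBy) are not listed
    // in the `with` expression, so they keep the existing values.
    public static NodeSketch UpdateAccordingToSourceNode(
        NodeSketch state, NodeSketch source, DateTimeOffset now) =>
        state with
        {
            Name = source.Name,
            NodeType = source.NodeType,
            Icon = source.Icon,
            Category = source.Category,
            Content = source.Content,
            State = source.State,
            PreRenderedHtml = source.PreRenderedHtml,
            LastModified = now, // fresh stamp on every upsert
        };
}
```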

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ding flip

The CompilationPending_CreatesReleaseMeshNode_WithNotes test was timing
out and leaking 7-12 mesh-hub DataChangeRequest callbacks because two
concurrent compile pipelines raced on every test run:

  1. The per-NodeType hub's auto-watcher (InstallCompileWatcher) fires
     on every CompilationStatus = Pending emission. Its `triggered` flag
     was reset in `finally` — but the activity dispatch is async (Start
     → CreateNode → hub.Post), so the flag flipped back to 0 before the
     activity even began. Multiple Pending emissions then fired multiple
     activities, each issuing two `WriteToParent` writes through the
     mesh-hub-owned NodeTypeStreamCache (= mesh hub DataChangeRequests).
  2. HandleCreateRelease called `RunCompile` inline on the per-node
     hub's OWN stream, racing the auto-watcher's activity-based compile.
     The two terminal writes trampled each other's LatestReleasePath +
     ReleaseNotes — the explicit release's notes-carrying write was
     clobbered last-write-wins.

Three coordinated fixes drive the test to one compile pipeline:

  * NodeTypeCompilationHelpers.InstallCompileWatcher — hold `triggered`
    set across the async dispatch. A new trailing `settleSub` clears it
    on the next non-Pending emission (Compiling / Ok / Error / Unknown)
    — the natural single-flight boundary. A FRESH Pending after that is
    the legitimate "user kicked off another compile" signal.
  * MeshDataSource.AwaitCompilationSettled — also gate on Pending (not
    just Compiling). An explicit CreateRelease arriving in the
    auto-watcher's Pending window now holds for that activity instead
    of racing it.
  * MeshDataSource.HandleCreateRelease — delegate to the auto-watcher
    by flipping Pending instead of calling RunCompile inline. ONE
    compile pipeline; the activity reads the just-written ReleaseNotes
    off the parent and seeds the Release MeshNode with them.

NodeTypeReleaseTest.CompilationPending_CreatesReleaseMeshNode_WithNotes
now reads kickoffSnapshot via the live MeshNodeStream (not QueryAsync
against the lagged catalog), waits for the auto-compile to settle, then
sends CreateReleaseRequest(Force: true). Verifies a Release with the
user's notes lands — doesn't assert which release is "active" in
NodeTypeDefinition.LatestReleasePath (kickoff and explicit-release each
write it; last-writer-wins isn't a correctness invariant the framework
can guarantee — both releases are durable history).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ilizers

- MeshDataSource.SubscribeToOwnDeletion: when IDataChangeNotifier reports
  Deleted on own path, hub.Dispose() to drop the routing-layer streams
  entry. Without this, the cached InstanceCollection ("First") stayed
  alive and a fresh subscriber after delete+recreate ("Second") got the
  stale node — caught by WorkspaceCacheEvictionTest.
- NodeTypeEnrichmentHelpers / NodeTypeStreamCache: fast-path enrichment
  now short-circuits on HubConfiguration alone (AssemblyLocation
  redundant) and slow-path subscribes to workspace.GetMeshNodeStream
  directly, bypassing Replay(1).RefCount() for static-NodeType paths.
- ExportDocumentScriptRelayTest: switched terminal-status wait from
  meshService.ObserveQuery (Content snapshot, never live) to
  workspace.GetMeshNodeStream(activityPath) — canonical "live single-
  node read" per CqrsAndContentAccess.md.
- CodeEditRecompileTest: tightened release-pinning waits.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The WIP wait predicate referenced a non-existent local; fixing it restores
compilation of test/MeshWeaver.Hosting.Monolith.Test. CodeEditRecompileTest's
2 failures remain (V2 instance binds to V1 assembly across compiles). They
stem from an underlying compile-state propagation bug, not an enrichment
bug, and are tracked separately.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>