feat(slimfaas): add dashboard #262
Signed-off-by: Guillaume Chervet <guillaume.chervet@gmail.com>
Pull request overview
This PR introduces a SlimFaas dashboard (“front”) backed by new SSE/status endpoints and cluster-wide network activity tracking, and updates queue/proxy logic to better control per-pod concurrency and affinity.
Changes:
- Add a Vite/React dashboard built into `wwwroot`, served by SlimFaas, plus new `/status-functions-stream` + job/status endpoints for the UI.
- Introduce `NetworkActivityTracker` + peer-scraping worker to aggregate activity across SlimFaas nodes.
- Extend queue/proxy plumbing to reserve pod IPs during dequeue and track “202 awaiting callback” elements.
Reviewed changes
Copilot reviewed 102 out of 105 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| src/SlimFaas/Workers/SlimQueuesWorker.cs | Adds reserved-IP dequeue and tracks 202 “awaiting callback” items + activity events. |
| src/SlimFaas/Proxy.cs | Adds IP reservation/affinity helpers and LRU-ish tie-breaker for least-connections. |
| src/SlimFaas/Endpoints/StatusStreamEndpoints.cs | Adds SSE stream + internal activity-events endpoint. |
| src/SlimFaas/Endpoints/NetworkActivityTracker.cs | Adds in-memory activity event store + SSE subscriber broadcasting + remote ingest. |
| src/SlimFaas/Workers/NetworkActivitySyncWorker.cs | Periodically scrapes peer nodes’ activity events and ingests locally. |
| src/SlimFaas/Endpoints/JobStatusEndpoints.cs | Adds /jobs/status (+ alias) returning job configurations/schedules/running jobs. |
| src/SlimFaas/Program.cs | Registers tracker/worker and serves static dashboard files from wwwroot. |
| src/SlimFaas/SlimFaas.csproj / Dockerfile / src/SlimFaas/ClientApp/* | Adds the new dashboard client app and build pipeline integration. |
| src/SlimFaas/Database/* + src/SlimData/* | Extends dequeue/pop to carry reserved IPs + extra queue metadata. |
| tests/* | Updates mocks/signatures and adds coverage for new endpoints/mapping/cron behavior. |
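
The "LRU-ish tie-breaker for least-connections" listed for Proxy.cs can be pictured roughly as below. This is a TypeScript sketch of the general technique, not the C# code from the PR; `PodState`, `pickPod`, and the `lastUsed` logical clock are hypothetical names chosen for illustration.

```typescript
// Hypothetical pod bookkeeping; not the types used in Proxy.cs.
interface PodState {
  ip: string;
  activeRequests: number; // in-flight requests on this pod
  lastUsed: number;       // logical clock value of the last selection
}

let selectionClock = 0;

// Pick the pod with the fewest active requests. On a tie, prefer the pod
// that was selected least recently, so load rotates across equally idle pods.
function pickPod(pods: PodState[]): PodState | undefined {
  let best: PodState | undefined;
  for (const pod of pods) {
    if (
      best === undefined ||
      pod.activeRequests < best.activeRequests ||
      (pod.activeRequests === best.activeRequests && pod.lastUsed < best.lastUsed)
    ) {
      best = pod;
    }
  }
  if (best !== undefined) {
    best.activeRequests += 1;
    best.lastUsed = ++selectionClock;
  }
  return best;
}
```

Without the tie-breaker, a plain least-connections scan would keep returning the first pod in the list whenever all pods are idle.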
Comments suppressed due to low confidence (1)
src/SlimFaas/Database/DatabaseMockService.cs:104
`DatabaseMockService.ListRightPopAsync` removes the wrong range from the queue: `list.RemoveRange(listToReturn.Count - 1, listToReturn.Count)` uses the count as the start index, so it removes elements near the beginning of the list (and can throw) instead of removing the last `listToReturn.Count` elements. This breaks the semantics of a right-pop in tests using the mock. The start index should be based on `list.Count - listToReturn.Count` (or use `RemoveAt` in a loop from the end).
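
The index arithmetic behind that suggestion can be illustrated with a small TypeScript analogue, assuming a plain in-memory array stands in for the mock's list. `rightPop` is a hypothetical helper, not part of SlimFaas, and the order in which the real mock returns elements may differ.

```typescript
// Remove and return up to `count` elements from the right end of `list`.
function rightPop<T>(list: T[], count: number): T[] {
  // Clamp so the call never removes more elements than exist.
  const n = Math.min(Math.max(count, 0), list.length);
  // The start index is measured from the end of the source list,
  // mirroring `list.Count - listToReturn.Count` in the comment above.
  return list.splice(list.length - n, n);
}
```

Calling `rightPop([1, 2, 3, 4, 5], 2)` removes `4` and `5` and leaves `[1, 2, 3]`; a start index derived from the popped count alone would instead cut into the front of the list.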
…ers to prevent channel leak
Agent-Logs-Url: https://github.com/SlimPlanet/SlimFaas/sessions/24e32189-793a-4018-8f82-f3620c75c67d
Co-authored-by: guillaume-chervet <52236059+guillaume-chervet@users.noreply.github.com>
Agent-Logs-Url: https://github.com/SlimPlanet/SlimFaas/sessions/32f24bec-3182-4318-854c-03156126f40b Co-authored-by: guillaume-chervet <52236059+guillaume-chervet@users.noreply.github.com>
Pull request overview
Copilot reviewed 102 out of 105 changed files in this pull request and generated 4 comments.
…vations on leadership loss
Agent-Logs-Url: https://github.com/SlimPlanet/SlimFaas/sessions/f472f47b-9416-4cd6-a13b-b33b704ee33b
Co-authored-by: guillaume-chervet <52236059+guillaume-chervet@users.noreply.github.com>
Pull request overview
Copilot reviewed 102 out of 105 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (9)
src/SlimFaas/Workers/SlimQueuesWorker.cs:1
- Reserved pod slots are incremented in `ReserveNextIPs`, but if anything throws after dequeue (e.g., deserialization errors or a synchronous exception inside the `SendHttpRequestAsync` invocation), the already-reserved IPs for the remaining items won’t be released. Consider wrapping the per-item processing in a try/catch that releases the corresponding reservation (and/or any leftover reservations) when an item fails before it is bound to an element ID, to avoid leaking `ActiveRequestsPerPod` capacity.
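
A minimal TypeScript sketch of the release-on-failure pattern this comment describes follows. It is an illustration under stated assumptions, not the worker's C# code: `activeRequestsPerPod`, `reserve`, `release`, and `processBatch` are hypothetical stand-ins, and items that succeed are assumed to keep their reservation until the request completes elsewhere.

```typescript
// Hypothetical per-pod counter standing in for ActiveRequestsPerPod.
const activeRequestsPerPod = new Map<string, number>();

function reserve(ip: string): void {
  activeRequestsPerPod.set(ip, (activeRequestsPerPod.get(ip) ?? 0) + 1);
}

function release(ip: string): void {
  const next = (activeRequestsPerPod.get(ip) ?? 0) - 1;
  if (next > 0) {
    activeRequestsPerPod.set(ip, next);
  } else {
    activeRequestsPerPod.delete(ip);
  }
}

// Process dequeued items, each already paired with a reserved pod IP.
// If handling an item throws, its reservation is released immediately so
// the pod's capacity is not leaked; the loop then continues with the rest.
function processBatch(
  items: { payload: string; reservedIp: string }[],
  handle: (payload: string, ip: string) => void,
): number {
  let failures = 0;
  for (const item of items) {
    try {
      handle(item.payload, item.reservedIp);
    } catch {
      release(item.reservedIp);
      failures += 1;
    }
  }
  return failures;
}
```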
src/SlimFaas/Workers/SlimQueuesWorker.cs:1
- `NetworkActivityTracker.Record` expects `(type, source, target, queueName, ...)`. Here, `source` is set to the function name and `target` to `slimfaas`, and `queueName` is also set to the function name. This makes the activity graph inconsistent with the earlier `Dequeue` record (which uses `source=slimfaas, target=functionDeployment, queueName=functionDeployment`). Consider changing this `RequestEnd` record to use the same semantics (typically `source=slimfaas`, `target=functionDeployment`, `queueName=functionDeployment`) so activity aggregation and visualization remain accurate.
src/SlimFaas/Endpoints/EventEndpoints.cs:1
- The activity records in this endpoint mix different meanings for `Source`/`Target` vs `SourcePod`/`TargetPod` (e.g., `EventPublish` is recorded with `source=external, target=slimfaas`, while later per-subscriber publish records use `source=slimfaas, target=function`). To keep the dashboard/network map reliable, consider standardizing a single semantic contract for all records (e.g., actor-level in `Source`/`Target`, and IP/connection IDs only in `SourcePod`/`TargetPod`) and encapsulating this into small helper methods to avoid accidental parameter inversions.
src/SlimFaas/Endpoints/EventEndpoints.cs:1
- This `RequestEnd` event writes the caller identity into `targetPod`, but `targetPod` is documented as the downstream pod name/IP. If the goal is to represent the response going back to the caller, the caller should be represented in `Target` (or possibly `SourcePod`/`TargetPod` depending on your chosen semantics), not in `targetPod`. As-is, the dashboard may show the caller IP as if it were a destination pod.
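
The "small helper methods" idea from the EventEndpoints comments could look roughly like this in TypeScript. The record shape and the helper names (`ActivityRecord`, `slimfaasToFunction`, `externalToSlimfaas`) are assumptions made for illustration; the actual `NetworkActivityTracker.Record` signature is only partially visible in the review.

```typescript
// Hypothetical record shape: actor-level names in source/target,
// IPs or connection IDs only in sourcePod/targetPod.
interface ActivityRecord {
  type: string;
  source: string;
  target: string;
  queueName?: string;
  sourcePod?: string;
  targetPod?: string;
}

// SlimFaas sending work to a function deployment.
function slimfaasToFunction(
  type: string,
  functionName: string,
  targetPod?: string,
): ActivityRecord {
  return {
    type,
    source: "slimfaas",
    target: functionName,
    queueName: functionName,
    targetPod,
  };
}

// An external caller reaching SlimFaas; the caller IP goes in sourcePod.
function externalToSlimfaas(type: string, callerIp?: string): ActivityRecord {
  return { type, source: "external", target: "slimfaas", sourcePod: callerIp };
}
```

Because each helper fixes the direction once, call sites only pass the varying values and cannot swap `source` and `target` by accident.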
src/SlimFaas/Endpoints/StatusStreamEndpoints.cs:1
- The new SSE endpoint appears to be publicly accessible (only guarded by `HostPortEndpointFilter`) and the streamed payload includes detailed function/pod data (including pod IPs) plus recent activity. If this endpoint is meant for cluster-internal/dashboard-only usage, consider adding an access control check (e.g., the existing internal-namespace verification used by other internal endpoints, or an auth mechanism) and/or conditionally mapping it only when `EnableFront` is true.
src/SlimFaas/Endpoints/StatusStreamEndpoints.cs:1
- This performs one queue count call per function for every full-state push (currently every second). In clusters with many functions, this becomes an N-per-second polling pattern and can overload SlimData / DB. Consider reducing the full-state frequency, caching/batching queue length reads, or adding a mode that omits queue lengths unless explicitly requested by the client.
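
One of the mitigations mentioned here, caching queue length reads, can be sketched as a small time-to-live wrapper. This is a TypeScript illustration only; `withTtlCache` and its parameters are hypothetical, and the reader is synchronous to keep the example short, whereas a real queue count call would likely be asynchronous.

```typescript
// Wrap a queue-length reader so calls within `ttlMs` reuse the last value.
// `now` is injected so the behaviour can be checked without real timers.
function withTtlCache(
  read: (functionName: string) => number,
  ttlMs: number,
  now: () => number,
): (functionName: string) => number {
  const cache = new Map<string, { value: number; expiresAt: number }>();
  return (functionName) => {
    const t = now();
    const hit = cache.get(functionName);
    if (hit !== undefined && t < hit.expiresAt) {
      return hit.value; // still fresh: no backend call
    }
    const value = read(functionName);
    cache.set(functionName, { value, expiresAt: t + ttlMs });
    return value;
  };
}
```

With a TTL of a few seconds, a once-per-second full-state push would hit the backing store at most once per function per TTL window rather than on every push, at the cost of slightly stale queue lengths.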
src/SlimFaas/SlimFaas.csproj:1
- Hooking `npm install`/`npm run build` into `BeforeBuild` will make `dotnet build` depend on Node/npm by default, which can break developer workflows and CI agents that only build the backend. Consider moving the client build to `BeforePublish`, defaulting `SkipClientAppBuild` to true for regular builds, and/or adding a clearer preflight check/error message when npm is unavailable.
src/SlimFaas/Workers/NetworkActivitySyncWorker.cs:1
- Self-skip is based on `pod.Name.Contains(myNodeId)`, but `NodeId` can be `Environment.MachineName` (or a short GUID fallback) and may not match the Kubernetes pod name. In that case, the worker may repeatedly scrape itself (wasted traffic/log noise). Consider a more reliable self-detection strategy (e.g., compare `pod.Ip` to the node’s pod IP from env, or set `NodeId` explicitly to the pod name/hostname in k8s and compare by equality).
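
The equality-based self-detection suggested above might be sketched as follows in TypeScript. `PeerPod` and `peersToScrape` are hypothetical names, and the sketch assumes the node already knows its own pod IP (for example from an environment variable).

```typescript
// Hypothetical view of a peer SlimFaas pod.
interface PeerPod {
  name: string;
  ip: string;
}

// Keep only pods that are not this node. Comparing IPs by equality avoids
// both failure modes of substring matching on names: missing the local pod
// entirely, or excluding a different pod whose name merely contains the ID.
function peersToScrape(pods: PeerPod[], selfIp: string): PeerPod[] {
  return pods.filter((pod) => pod.ip !== selfIp);
}
```

For instance, a substring check with the ID `slimfaas-1` would also match a pod named `slimfaas-10`, whereas the IP comparison excludes only the local pod.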
tests/SlimFaas.Tests/Endpoints/NetworkActivityTrackerTests.cs:1
- The display name says MaxRecentEvents is 200, but the assertion checks `<= 1000` and the implementation constant is `MaxRecentEvents = 1000`. This test currently doesn’t verify trimming behavior. Consider aligning the display name with the real limit and asserting the exact expected trimmed count (or making the limit injectable/accessible for tests).
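
To show what "making the limit injectable" and asserting the exact trimmed count could mean, here is a small TypeScript sketch of a bounded event buffer. `RecentEvents` is a hypothetical class, not the tracker's implementation, and dropping the oldest events first is an assumption.

```typescript
// Bounded buffer that keeps only the most recent `max` events.
class RecentEvents<T> {
  private readonly events: T[] = [];
  private readonly max: number;

  constructor(max: number) {
    this.max = max;
  }

  record(event: T): void {
    this.events.push(event);
    if (this.events.length > this.max) {
      // Drop the oldest entries so exactly `max` remain.
      this.events.splice(0, this.events.length - this.max);
    }
  }

  snapshot(): T[] {
    return [...this.events];
  }
}
```

With the limit passed in, a test can use a small value such as 3, record 5 events, and assert that exactly the last 3 remain, instead of checking a loose upper bound.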


No description provided.