OPCT-TBD: Add interactive etcd metric charts to web UI report#218
OPCT-TBD: Add interactive etcd metric charts to web UI report#218mtulio wants to merge 6 commits into
Conversation
Embed etcd Prometheus metric charts inline on the etcd page using Chart.js, replacing the previous Plotly-based approach for lighter rendering (~200KB vs ~3.5MB). Layout: - Split-pane with draggable divider (70% tables / 30% charts default) - Independent scroll on each panel - v-if/v-else ensures non-etcd pages use full-width layout unchanged Chart features: - Drag-to-select time range zoom on X axis (chartjs-plugin-zoom) - Reset zoom button per chart - Expand button opens chart in fullscreen overlay with zoom support - Responsive sizing, auto-resize on panel drag Charts displayed: - etcd fsync WAL/DB duration p99 - etcd peer round trip time - etcd total leader elections - etcd request duration p99 Data source: reuses existing Plotly JSON files already generated at metrics/ directory during report processing — no backend changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds an ETCd split-pane chart UI (styles, Chart.js deps, Vue state, rendering, overlay, and draggable divider) and changes server-side metrics extraction to emit per-chart JSON files plus an index.json while filtering NaN/Inf datapoints. ChangesETCd Chart Visualization
Metrics export / JSON output
🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
| <script src="https://cdn.jsdelivr.net/npm/vue@2/dist/vue.js"></script> | ||
| <script src="https://unpkg.com/axios/dist/axios.min.js"></script> | ||
| <script src="https://cdn.jsdelivr.net/npm/chart.js@4.4.9/dist/chart.umd.min.js"></script> | ||
| <script src="https://cdn.jsdelivr.net/npm/chartjs-adapter-date-fns@3.0.0/dist/chartjs-adapter-date-fns.bundle.min.js"></script> | ||
| <script src="https://cdn.jsdelivr.net/npm/hammerjs@2.0.8/hammer.min.js"></script> | ||
| <script src="https://cdn.jsdelivr.net/npm/chartjs-plugin-zoom@2.2.0/dist/chartjs-plugin-zoom.min.js"></script> |
There was a problem hiding this comment.
can we decrease the amount of dependencies or build it together, preferably with what we are using only, serving as static ?
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@data/templates/report/report.html`:
- Around line 13-16: The template loads Chart.js, adapter-date-fns, hammerjs and
chartjs-plugin-zoom from public CDNs without integrity/crossorigin—either add
Subresource Integrity (integrity attributes) plus crossorigin="anonymous" to
those script tags in report.html or switch to locally bundled copies to mitigate
supply-chain risk; additionally, in renderEtcdCharts() where this.chartInstances
is reset, first iterate existing Chart instances in this.chartInstances and call
their destroy() before clearing the map, and add a per-call render token (or
cancelation/sequence check) so async axios responses verify the token/sequence
before creating/updating charts to ignore stale responses and avoid
leaked/duplicated charts.
- Around line 313-320: changeMenuCleanup currently clears UI but doesn't destroy
Chart.js instances and renderEtcdCharts resets chartInstances and fires axios
requests without guarding against late responses; update changeMenuCleanup to
iterate over this.chartInstances and call destroy() on each Chart instance (and
then clear the map) before touching DOM, and modify renderEtcdCharts to (1)
avoid blindly replacing this.chartInstances = {} without first destroying any
existing charts, (2) attach a per-render token/id (or cancel token) to each
axios.get so late responses can be ignored or canceled, and (3) verify the
target canvas still matches the expected id/token before creating a new Chart to
prevent resurrecting charts on reused canvases.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 3c8a48ee-26bc-4111-8afb-6a245de36022
📒 Files selected for processing (2)
data/templates/report/report.cssdata/templates/report/report.html
| <script src="https://cdn.jsdelivr.net/npm/chart.js@4.4.9/dist/chart.umd.min.js"></script> | ||
| <script src="https://cdn.jsdelivr.net/npm/chartjs-adapter-date-fns@3.0.0/dist/chartjs-adapter-date-fns.bundle.min.js"></script> | ||
| <script src="https://cdn.jsdelivr.net/npm/hammerjs@2.0.8/hammer.min.js"></script> | ||
| <script src="https://cdn.jsdelivr.net/npm/chartjs-plugin-zoom@2.2.0/dist/chartjs-plugin-zoom.min.js"></script> |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="data/templates/report/report.html"
echo "== Snippet around scripts (lines 1-60) =="
sed -n '1,60p' "$file" | nl -ba | sed -n '1,80p'
echo "== Snippet around lines 280-360 =="
sed -n '280,360p' "$file" | nl -ba | sed -n '1,120p'
echo "== Find chart lifecycle bits =="
rg -n "chartInstances|renderEtcdCharts|destroy\(|new Chart\(|getContext\(|canvas|update\(" "$file" || true
echo "== Snippet around renderEtcdCharts (best-effort) =="
# try to locate function and print nearby
lnum="$(rg -n "renderEtcdCharts" "$file" | head -n1 | cut -d: -f1 || true)"
if [[ -n "${lnum}" ]]; then
start=$((lnum-80))
end=$((lnum+180))
sed -n "${start},${end}p" "$file" | nl -ba | sed -n "1,200p"
fi
echo "== Snippet around lines 780-980 =="
sed -n '780,980p' "$file" | nl -ba | sed -n '1,220p'Repository: redhat-openshift-ecosystem/opct
Length of output: 178
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="data/templates/report/report.html"
echo "== File exists? =="
if [[ -f "$file" ]]; then echo "FOUND: $file"; else echo "MISSING: $file"; exit 0; fi
echo "== Snippet around scripts (start of file) =="
sed -n '1,60p' "$file" | sed '=' | sed 'N; s/\n/ /'
echo "== Snippet around lines 280-360 =="
sed -n '280,360p' "$file" | sed '=' | sed 'N; s/\n/ /'
echo "== Search chart lifecycle markers =="
rg -n "chartInstances|renderEtcdCharts|destroy\(|new Chart\(" "$file" || true
echo "== Snippet around renderEtcdCharts =="
lnum="$(rg -n "renderEtcdCharts" "$file" | head -n1 | cut -d: -f1 || true)"
if [[ -n "${lnum}" ]]; then
start=$((lnum-80))
end=$((lnum+200))
sed -n "${start},${end}p" "$file" | sed '=' | sed 'N; s/\n/ /'
fi
echo "== Snippet around ~lines 780-980 (ETCD chart area best-effort) =="
sed -n '780,980p' "$file" | sed '=' | sed 'N; s/\n/ /'Repository: redhat-openshift-ecosystem/opct
Length of output: 32703
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="data/templates/report/report.html"
echo "== Calls to changeMenuCleanup() =="
rg -n "changeMenuCleanup\(" "$file" || true
echo "== Any destroy() usage on chartInstances/base charts =="
rg -n "chartInstances\]|chartInstances|\.destroy\(\)|destroy\(\)" "$file" || true
echo "== Any loop over chartInstances with destroy =="
rg -n "for .*chartInstances|Object\.values\(this\.chartInstances\)|Object\.keys\(this\.chartInstances\)" "$file" || trueRepository: redhat-openshift-ecosystem/opct
Length of output: 806
Fix supply-chain risk and Chart.js lifecycle issues in the report template
data/templates/report/report.html: loads Chart.js/adapter/zoom/hammer from public CDNs withoutintegrity/crossorigin; add SRI + crossorigin or bundle locally.renderEtcdCharts()(~877+): resetsthis.chartInstances = {}but never callsdestroy()on existing Chart.js instances; repeated/late async axios responses can recreate charts on the same canvases, leaking memory and causing duplicated charts—destroy prior instances before clearing and ignore stale responses (e.g., render token).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@data/templates/report/report.html` around lines 13 - 16, The template loads
Chart.js, adapter-date-fns, hammerjs and chartjs-plugin-zoom from public CDNs
without integrity/crossorigin—either add Subresource Integrity (integrity
attributes) plus crossorigin="anonymous" to those script tags in report.html or
switch to locally bundled copies to mitigate supply-chain risk; additionally, in
renderEtcdCharts() where this.chartInstances is reset, first iterate existing
Chart instances in this.chartInstances and call their destroy() before clearing
the map, and add a per-call render token (or cancelation/sequence check) so
async axios responses verify the token/sequence before creating/updating charts
to ignore stale responses and avoid leaked/duplicated charts.
…ages This commit enhances the etcd tab in the web UI by adding 2 missing charts and removes legacy standalone metrics pages that are no longer needed. Changes: 1. Fix metrics processor to export all charts (NaN handling): - Added NaN/Inf filtering in processMetricV2 to prevent JSON marshalling errors - Previously, API and etcd request duration charts failed with 'unsupported value: NaN' - Now all 6 etcd/API charts are successfully exported to index.json 2. Add missing charts to web UI etcd tab: - Added chart-5: Kube API request p99 (HTTP verb breakdown) - Chart-4 (etcd request duration p99) was already defined - Web UI now displays all 6 etcd/API performance charts 3. Remove legacy metrics pages (pre-1.0 deprecation): - Removed metrics.html generation (echarts-based standalone page) - Removed index.html generation (Plotly-based standalone page) - Removed index.js generation (only used by index.html) - Cleaned up unused template constants and imports - Keep index.json and chart JSON files (used by web UI) Chart inventory: - etcd fsync WAL p99 - etcd fsync DB p99 - etcd peer round trip time - etcd total leader elections - etcd request duration p99 (operations: create/delete/get/list/update) - Kube API request p99 (verbs: GET/POST/PUT/PATCH/DELETE/APPLY/LIST) All charts use Chart.js with interactive features: drag-to-zoom, fullscreen expand, and split-pane layout. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/openshift/mustgathermetrics/plotly.go`:
- Around line 65-68: The error log in the os.WriteFile error branch is using the
wrong variable: log.Errorf("Unable to save file %s: %v", indexJsonFileData, err)
prints the JSON bytes instead of the file path. Change the log.Errorf call to
use indexJsonFilePath as the %s argument (keep the error err as %v) so the
message reads the file path and the actual error; locate this in the block
wrapping os.WriteFile and update the call referencing indexJsonFileData to
indexJsonFilePath.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 5a08c8f9-ee32-4250-af66-6a0e2dde1652
📒 Files selected for processing (3)
data/templates/report/report.htmlinternal/openshift/mustgathermetrics/main.gointernal/openshift/mustgathermetrics/plotly.go
🚧 Files skipped from review as they are similar to previous changes (1)
- data/templates/report/report.html
…ON directly
This commit dramatically simplifies the metrics processing by eliminating
unnecessary format conversions and consolidating the package into a single file.
**Architecture Changes:**
1. Removed Plotly format conversion (was: Prometheus → Plotly → Chart.js)
- Now: Prometheus JSON → Chart.js directly
- Chart.js consumes Prometheus query_range API responses natively
- Eliminated 150 lines of format transformation code
2. Deleted obsolete files (dead code from metrics.html removal):
- charts.go (199 lines) - 100% unused echarts integration
- plotly.go (178 lines) - Plotly format conversion, now direct Prometheus
- Consolidated everything into main.go (270 lines, down from 609)
3. Externalized chart configuration:
- Created charts-config.json with chart metadata
- Embedded via go:embed for zero-config deployment
- Easy to add new charts without code changes
**Code Reduction:**
- Before: 609 lines across 3 files
- After: 270 lines in 1 file + 30 lines config
- Savings: 58% code reduction (-356 lines)
**Functional Changes:**
1. Simplified API:
- NewMustGatherMetrics(reportPath, data) - removed unused params
- Process() extracts and saves Prometheus JSON directly
2. Better NaN/Inf handling:
- Filter at processing time (not per-chart)
- Aggregate stats logged once (was: per-datapoint debug spam)
- Example: "Filtered 12 NaN/Inf datapoints from 1850 total (0.6% invalid)"
3. Web UI updates (data/templates/report/report.html):
- Parse Prometheus JSON format in renderEtcdCharts()
- Extract labels from metric.metric{} object
- Convert Unix timestamps to JS Date objects
- Removed unused parseMetricTimestamp() method
**Dependency Cleanup:**
Removed unused dependencies:
- github.com/go-echarts/go-echarts/v2 (echarts charts)
- k8s.io/utils/ptr (pointer helpers for echarts)
**File Changes:**
Modified:
- internal/openshift/mustgathermetrics/main.go - Complete rewrite
- data/templates/report/report.html - Prometheus JSON parser
- internal/opct/summary/result.go - Updated caller
- pkg/cmd/adm/parsemetrics.go - Updated caller and messages
Added:
- internal/openshift/mustgathermetrics/charts-config.json - Chart metadata
Deleted:
- internal/openshift/mustgathermetrics/charts.go - Dead code
- internal/openshift/mustgathermetrics/plotly.go - Unnecessary conversion
**Testing:**
Validated with real OPCT archive:
- ✅ All 6 charts generate correctly
- ✅ Prometheus JSON format preserved
- ✅ NaN/Inf filtering works (no JSON marshalling errors)
- ✅ index.json created with correct IDs
- ✅ Web UI parses Prometheus format (needs browser test)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/opct/summary/result.go (1)
515-523:⚠️ Potential issue | 🟠 Major | ⚡ Quick winSet
HasMetricsonly after a successful metrics export.
HasMetricsis flipped totrueeven whenrs.Metrics.Process()fails. That advertises metrics assets that were never generated and can break the report flow downstream.Suggested fix
} else { - err := rs.Metrics.Process() - if err != nil { + if err := rs.Metrics.Process(); err != nil { log.Errorf("Processing MetricsData: %v", err) + } else { + rs.HasMetrics = true } - rs.HasMetrics = true } } else {As per coding guidelines,
**: Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/opct/summary/result.go` around lines 515 - 523, The code sets rs.HasMetrics = true even when rs.Metrics.Process() fails; change the logic so you only set rs.HasMetrics to true after mustgathermetrics.NewMustGatherMetrics(...) succeeds and rs.Metrics.Process() returns nil. Specifically, in the block that calls mustgathermetrics.NewMustGatherMetrics and then rs.Metrics.Process(), move the assignment to rs.HasMetrics = true so it happens only on successful Process(), and on Process() error log the failure and clear or leave rs.Metrics unset (or nil) to avoid advertising non-existent assets.
♻️ Duplicate comments (1)
data/templates/report/report.html (1)
884-887:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse configured chart label metadata for legend labels (avoid metric key-order-dependent labeling).
data/templates/report/report.html(lines 884-887) derives the legend fromObject.keys(result.metric).find(k => k !== '__name__'), so the same series can be mislabeled depending on metric key order.internal/openshift/mustgathermetrics/charts-config.jsonalready defines the intended label dimension per chart; addlabelto eachetcdChartsentry and useresult.metric[chart.label]here.Suggested fix
- let labelKey = Object.keys(result.metric).find(k => k !== '__name__') || 'instance'; - let labelValue = result.metric[labelKey] || 'unknown'; + let labelValue = result.metric[chart.label]; + if (labelValue === undefined) { + labelValue = result.metric.instance || result.metric.verb || result.metric.operation || 'unknown'; + }Companion change (add
labelto each chart inetcdCharts):etcdCharts: [ { id: "etcd-chart-0", path: "./metrics/query_range-etcd-disk-fsync-wal-duration-p99.json.gz.json", title: "etcd fsync WAL p99", label: "instance" }, { id: "etcd-chart-1", path: "./metrics/query_range-etcd-disk-fsync-db-duration-p99.json.gz.json", title: "etcd fsync DB p99", label: "instance" }, { id: "etcd-chart-2", path: "./metrics/query_range-etcd-peer-round-trip-time.json.gz.json", title: "etcd peer round trip", label: "instance" }, { id: "etcd-chart-3", path: "./metrics/query_range-etcd-total-leader-elections-day.json.gz.json", title: "etcd total leader elections", label: "instance" }, { id: "etcd-chart-4", path: "./metrics/query_range-etcd-request-duration-p99.json.gz.json", title: "etcd req duration p99", label: "operation" }, { id: "etcd-chart-5", path: "./metrics/query_range-api-kas-request-duration-p99.json.gz.json", title: "Kube API request p99", label: "verb" }, ],
- Chart.js lifecycle:
renderEtcdCharts()resetsthis.chartInstances = {}without destroying existingChartinstances before re-rendering, so repeated navigation/re-rendering can leave stale charts/handlers behind.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@data/templates/report/report.html` around lines 884 - 887, The legend-labeling currently picks the first non-__name__ metric key which is order-dependent; update the series label logic to read the configured label key from the chart config (use chart.label when building datasets and fall back to 'instance' or an explicit fallback like result.metric[chart.label] || result.metric.instance || 'unknown') so legend names are stable, and ensure you add a "label" field to each etcdCharts entry; additionally, in renderEtcdCharts() destroy any existing Chart instances (call destroy() on entries in this.chartInstances) before clearing this.chartInstances = {} to avoid leaking handlers/stale charts during re-renders.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/openshift/mustgathermetrics/main.go`:
- Around line 218-223: The code currently logs per-chart write failures and
continues, and treats an empty index as success; change this so the first
failure to save a chart via mg.saveChartJSON(chartPath, filteredData) causes the
function to return that error immediately instead of continue, and after
generating the index (before returning success) check if len(index) == 0 and
return a descriptive error if so; update any loops that iterate
fileName/chartPath to propagate/save errors from mg.saveChartJSON rather than
only logging them, and ensure the calling function receives the error so exports
that are partial or empty are rejected.
---
Outside diff comments:
In `@internal/opct/summary/result.go`:
- Around line 515-523: The code sets rs.HasMetrics = true even when
rs.Metrics.Process() fails; change the logic so you only set rs.HasMetrics to
true after mustgathermetrics.NewMustGatherMetrics(...) succeeds and
rs.Metrics.Process() returns nil. Specifically, in the block that calls
mustgathermetrics.NewMustGatherMetrics and then rs.Metrics.Process(), move the
assignment to rs.HasMetrics = true so it happens only on successful Process(),
and on Process() error log the failure and clear or leave rs.Metrics unset (or
nil) to avoid advertising non-existent assets.
---
Duplicate comments:
In `@data/templates/report/report.html`:
- Around line 884-887: The legend-labeling currently picks the first
non-__name__ metric key which is order-dependent; update the series label logic
to read the configured label key from the chart config (use chart.label when
building datasets and fall back to 'instance' or an explicit fallback like
result.metric[chart.label] || result.metric.instance || 'unknown') so legend
names are stable, and ensure you add a "label" field to each etcdCharts entry;
additionally, in renderEtcdCharts() destroy any existing Chart instances (call
destroy() on entries in this.chartInstances) before clearing this.chartInstances
= {} to avoid leaking handlers/stale charts during re-renders.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 8c7358e4-0a4f-423b-9d48-40f3751e7118
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (8)
data/templates/report/report.htmlgo.modinternal/opct/summary/result.gointernal/openshift/mustgathermetrics/charts-config.jsoninternal/openshift/mustgathermetrics/charts.gointernal/openshift/mustgathermetrics/main.gointernal/openshift/mustgathermetrics/plotly.gopkg/cmd/adm/parsemetrics.go
💤 Files with no reviewable changes (3)
- internal/openshift/mustgathermetrics/charts.go
- internal/openshift/mustgathermetrics/plotly.go
- go.mod
✅ Files skipped from review due to trivial changes (1)
- internal/openshift/mustgathermetrics/charts-config.json
…rsion Created a new standalone metrics dashboard with an interactive grid layout that displays all 6 Prometheus charts as small widgets with click-to-expand functionality. Standardized Y-axis units to milliseconds for all duration metrics to improve readability and cross-chart comparison. Changes: - Added metrics.html with grid layout (3-4 columns, responsive design) - Implemented click-to-expand modal for full-size chart viewing - Converted duration metrics from seconds to milliseconds - Added Y-axis labels showing units (ms/count) - Limited etcd tab to 2 core fsync charts (WAL and DB) - Added info banner in etcd tab linking to full dashboard - Renamed "Metrics" tab to "Dashboard" - Reordered tabs: OPCT → Dashboard → Tests → Events → CAMGI - Auto-generated files: index.html (redirect), metrics.html (dashboard) Dashboard Features: - Grid layout with 3-4 columns of chart widgets - Click any chart to open full-size interactive modal - All 6 charts: etcd fsync WAL/DB p99, peer round trip, leader elections, etcd request duration p99, Kube API request duration p99 - Chart.js with zoom/pan controls in both views - Responsive design for mobile/desktop Metric Improvements: - Duration metrics now show milliseconds (5 ms vs 0.005 s) - Y-axis labeled with unit (ms or count) - Consistent units across all charts for easy comparison - Applied to grid dashboard, modal, and etcd tab Files Modified: - internal/openshift/mustgathermetrics/main.go - HTML embedding and generation - data/templates/report/report.html - etcd charts ms conversion, tab reordering - internal/openshift/mustgathermetrics/metrics.html - new grid dashboard (embedded) - internal/openshift/mustgathermetrics/index.html - redirect (embedded) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…or handling Fixed memory leaks and stale render issues in Chart.js usage, and improved error handling in metrics export to fail fast on errors. Changes: - Fixed Chart.js memory leaks: destroy() existing instances before cleanup - Added render token to ignore stale axios responses - Changed metrics export to fail on first chart write error (not just warn) - Added validation to reject empty chart exports Chart.js Lifecycle Fixes (data/templates/report/report.html): - changeMenuCleanup(): Now calls destroy() on all chartInstances before clearing - renderEtcdCharts(): Destroys old instances and uses render token to ignore stale responses - Prevents memory leaks from undestroyed Chart instances - Prevents duplicate/invalid charts from late axios responses Error Handling Improvements (internal/openshift/mustgathermetrics/main.go): - Chart write failures now return error instead of logging warning - Added len(index) == 0 check to fail if no charts were generated - Ensures partial/broken exports are rejected, not reported as success Addresses CodeRabbit review comments from PR redhat-openshift-ecosystem#218 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Response to CodeRabbit Review CommentsComment 1: plotly.go error logging (line 68)Status: ✅ Not applicable Comment 2: Chart.js lifecycle issues (report.html)Status: ✅ Fixed in commit
Comment 3: Error handling in main.goStatus: ✅ Fixed in commit
Comment 4: CDN dependencies without SRI (report.html:16)Status: 🔄 Design decision - needs discussion Current approach: Load Chart.js, date-fns adapter, hammerjs, and zoom plugin from public CDNs Pros:
Cons:
Options:
@mtulio asked: "can we decrease dependencies or build together, preferably serving as static?" Recommendation: Add SRI attributes now for security, consider bundling in future if deployment complexity is acceptable. Would you like me to proceed with option 1 (SRI attributes) or option 2 (local bundling)? |
The HasMetrics flag was being set to true even when Metrics.Process() failed, which would advertise metrics assets that don't exist and break the report flow. Now only set HasMetrics=true after Process() succeeds. Fixes CodeRabbit review comment on internal/opct/summary/result.go:515-523 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
✅ All CodeRabbit review comments addressed Fixed in commit 6d0133d:
Already fixed in previous commits:
Pending user decision:
🤖 Generated with Claude Code Co-Authored-By: Claude Sonnet 4.5 noreply@anthropic.com |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/openshift/mustgathermetrics/metrics.html`:
- Around line 192-367: The grid and modal chart rendering code is duplicated;
extract shared helpers and use them from both places. Implement a
createDatasets(promResponse, config) that builds dataset objects (used where
datasets are created in the forEach block and in expandChart) and implement
createChartOptions(config, isModal=false) that returns the Chart.js options
(used instead of the duplicated options object in the Chart(...) calls), then
update the forEach chartInstances[config.id] creation and expandChart to call
axios, validate promResponse, call createDatasets(...) and
createChartOptions(..., true/false) and pass those into new Chart(...); keep
existing identifiers chartColors, chartInstances, expandChart and modalChart so
changes are local and preserve minor style differences via the isModal flag.
- Around line 161-168: The file currently hardcodes the chartConfigs array;
instead fetch and parse ./index.json on page load and use that response as
chartConfigs (preserving fields id, path, title, unit, convertToMs) before
invoking the existing chart/grid creation and chart-loading logic that expects
chartConfigs; update metrics.html to replace the static chartConfigs declaration
with an async fetch (axios or fetch) that falls back to an empty array or logs
an error if the request fails so the page degrades gracefully, and ensure any
code that reads chartConfigs (the chart/grid initialization code) runs only
after the fetch completes.
- Around line 7-11: Add Subresource Integrity (SRI) and crossorigin attributes
to the CDN resources referenced in the HTML (the <link> for bootstrap@5.3.3 and
the <script> tags for chart.js@4.4.9, chartjs-adapter-date-fns@3.0.0,
chartjs-plugin-zoom@2.2.0, and axios) by computing and inserting the correct
integrity hashes (e.g., sha384) for each URL and adding crossorigin="anonymous";
alternatively replace those CDN imports by bundling the libraries locally if you
prefer not to use CDN+SRI.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 21e9aa05-e17b-4e70-9202-d8a865eee23a
📒 Files selected for processing (5)
data/templates/report/report.htmlinternal/opct/summary/result.gointernal/openshift/mustgathermetrics/index.htmlinternal/openshift/mustgathermetrics/main.gointernal/openshift/mustgathermetrics/metrics.html
✅ Files skipped from review due to trivial changes (1)
- internal/openshift/mustgathermetrics/index.html
🚧 Files skipped from review as they are similar to previous changes (3)
- internal/opct/summary/result.go
- internal/openshift/mustgathermetrics/main.go
- data/templates/report/report.html
| <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/css/bootstrap.min.css" rel="stylesheet"> | ||
| <script src="https://cdn.jsdelivr.net/npm/chart.js@4.4.9/dist/chart.umd.min.js"></script> | ||
| <script src="https://cdn.jsdelivr.net/npm/chartjs-adapter-date-fns@3.0.0/dist/chartjs-adapter-date-fns.bundle.min.js"></script> | ||
| <script src="https://cdn.jsdelivr.net/npm/chartjs-plugin-zoom@2.2.0/dist/chartjs-plugin-zoom.min.js"></script> | ||
| <script src="https://cdn.jsdelivr.net/npm/axios/dist/axios.min.js"></script> |
There was a problem hiding this comment.
Add SRI hashes to CDN dependencies.
The CDN-loaded libraries lack Subresource Integrity (SRI) hashes, creating a supply-chain security risk. If a CDN is compromised or a MITM attack occurs, malicious code could be injected.
Per PR objectives, CDN security for report.html was left as a design decision. The same concern applies here. Consider adding SRI hashes with crossorigin="anonymous" to all CDN script/link tags, or bundle these dependencies locally.
🔒 Example with SRI hashes
- <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/css/bootstrap.min.css" rel="stylesheet">
- <script src="https://cdn.jsdelivr.net/npm/chart.js@4.4.9/dist/chart.umd.min.js"></script>
+ <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/css/bootstrap.min.css"
+ rel="stylesheet"
+ integrity="sha384-..."
+ crossorigin="anonymous">
+ <script src="https://cdn.jsdelivr.net/npm/chart.js@4.4.9/dist/chart.umd.min.js"
+ integrity="sha384-..."
+ crossorigin="anonymous"></script>Note: Generate actual SRI hashes using: curl -s URL | openssl dgst -sha384 -binary | openssl base64 -A
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/openshift/mustgathermetrics/metrics.html` around lines 7 - 11, Add
Subresource Integrity (SRI) and crossorigin attributes to the CDN resources
referenced in the HTML (the <link> for bootstrap@5.3.3 and the <script> tags for
chart.js@4.4.9, chartjs-adapter-date-fns@3.0.0, chartjs-plugin-zoom@2.2.0, and
axios) by computing and inserting the correct integrity hashes (e.g., sha384)
for each URL and adding crossorigin="anonymous"; alternatively replace those CDN
imports by bundling the libraries locally if you prefer not to use CDN+SRI.
| const chartConfigs = [ | ||
| { id: "chart-0", path: "./query_range-etcd-disk-fsync-wal-duration-p99.json.gz.json", title: "etcd fsync WAL p99", unit: "ms", convertToMs: true }, | ||
| { id: "chart-1", path: "./query_range-etcd-disk-fsync-db-duration-p99.json.gz.json", title: "etcd fsync DB p99", unit: "ms", convertToMs: true }, | ||
| { id: "chart-2", path: "./query_range-etcd-peer-round-trip-time.json.gz.json", title: "etcd peer round trip time", unit: "ms", convertToMs: true }, | ||
| { id: "chart-3", path: "./query_range-etcd-total-leader-elections-day.json.gz.json", title: "etcd total leader elections per day", unit: "count", convertToMs: false }, | ||
| { id: "chart-4", path: "./query_range-etcd-request-duration-p99.json.gz.json", title: "etcd request duration p99", unit: "ms", convertToMs: true }, | ||
| { id: "chart-5", path: "./query_range-api-kas-request-duration-p99.json.gz.json", title: "Kube API Server request duration p99", unit: "ms", convertToMs: true }, | ||
| ]; |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift
Fetch chart configurations dynamically from index.json.
The chart paths, titles, and metadata are hardcoded, but according to context snippet 1 (main.go:186-270), the server generates an index.json file with chart metadata. Hardcoding creates tight coupling with the server-side filename format and duplicates metadata that exists in charts-config.json on the server.
If the server changes the filename format or chart metadata, this file must be manually updated, increasing maintenance burden and the risk of inconsistencies.
Consider fetching index.json on page load to dynamically discover available charts and their configurations. This would make the dashboard resilient to server-side changes and eliminate duplication.
♻️ Proposed approach
- Extend
index.jsonto include chart metadata (title, unit, convertToMs):
[
{
"id": "chart-0",
"path": "./query_range-etcd-disk-fsync-wal-duration-p99.json.gz.json",
"title": "etcd fsync WAL p99",
"unit": "ms",
"convertToMs": true
},
...
]- Fetch and parse index.json in metrics.html:
axios.get('./index.json')
.then(resp => {
const chartConfigs = resp.data;
// Proceed with existing grid creation and chart loading logic
});This approach ensures the client and server stay in sync automatically.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/openshift/mustgathermetrics/metrics.html` around lines 161 - 168,
The file currently hardcodes the chartConfigs array; instead fetch and parse
./index.json on page load and use that response as chartConfigs (preserving
fields id, path, title, unit, convertToMs) before invoking the existing
chart/grid creation and chart-loading logic that expects chartConfigs; update
metrics.html to replace the static chartConfigs declaration with an async fetch
(axios or fetch) that falls back to an empty array or logs an error if the
request fails so the page degrades gracefully, and ensure any code that reads
chartConfigs (the chart/grid initialization code) runs only after the fetch
completes.
| chartConfigs.forEach(config => { | ||
| axios.get(config.path) | ||
| .then(resp => { | ||
| const promResponse = resp.data; | ||
| if (!promResponse.data || !promResponse.data.result) { | ||
| throw new Error('Invalid Prometheus response format'); | ||
| } | ||
|
|
||
| const datasets = promResponse.data.result.map((result, i) => { | ||
| const labelKey = Object.keys(result.metric).find(k => k !== '__name__') || 'instance'; | ||
| const labelValue = result.metric[labelKey] || 'unknown'; | ||
|
|
||
| const dataPoints = result.values.map(v => ({ | ||
| x: new Date(v[0] * 1000), | ||
| y: config.convertToMs ? parseFloat(v[1]) * 1000 : parseFloat(v[1]) // Convert seconds to ms | ||
| })); | ||
|
|
||
| return { | ||
| label: labelValue, | ||
| data: dataPoints, | ||
| borderColor: chartColors[i % chartColors.length], | ||
| backgroundColor: 'transparent', | ||
| borderWidth: 1.5, | ||
| pointRadius: 0, | ||
| tension: 0.3, | ||
| }; | ||
| }); | ||
|
|
||
| const canvas = document.getElementById(config.id); | ||
| if (!canvas) return; | ||
|
|
||
| chartInstances[config.id] = new Chart(canvas, { | ||
| type: 'line', | ||
| data: { datasets: datasets }, | ||
| options: { | ||
| responsive: true, | ||
| maintainAspectRatio: false, | ||
| interaction: { mode: 'index', intersect: false }, | ||
| plugins: { | ||
| legend: { | ||
| position: 'bottom', | ||
| labels: { boxWidth: 10, font: { size: 10 } } | ||
| }, | ||
| tooltip: { mode: 'index', intersect: false }, | ||
| zoom: { | ||
| zoom: { | ||
| drag: { | ||
| enabled: true, | ||
| backgroundColor: 'rgba(78,121,167,0.15)', | ||
| borderColor: '#4e79a7', | ||
| borderWidth: 1 | ||
| }, | ||
| mode: 'x' | ||
| }, | ||
| pan: { enabled: true, mode: 'x' }, | ||
| }, | ||
| }, | ||
| scales: { | ||
| x: { | ||
| type: 'time', | ||
| time: { tooltipFormat: 'yyyy-MM-dd HH:mm' }, | ||
| ticks: { maxTicksAllowed: 6, font: { size: 9 } } | ||
| }, | ||
| y: { | ||
| beginAtZero: false, | ||
| ticks: { font: { size: 9 } }, | ||
| title: { | ||
| display: true, | ||
| text: config.unit, | ||
| font: { size: 10, weight: 'bold' } | ||
| } | ||
| }, | ||
| }, | ||
| }, | ||
| }); | ||
| }) | ||
| .catch(err => { | ||
| console.error(`Failed to load ${config.title}:`, err); | ||
| const canvas = document.getElementById(config.id); | ||
| if (canvas) { | ||
| canvas.parentElement.innerHTML = '<div class="chart-unavailable">Chart not available</div>'; | ||
| } | ||
| }); | ||
| }); | ||
|
|
||
| function expandChart(config) { | ||
| const modal = document.getElementById('chart-modal'); | ||
| const modalTitle = document.getElementById('modal-chart-title'); | ||
| const modalCanvas = document.getElementById('modal-chart-canvas'); | ||
|
|
||
| modalTitle.textContent = config.title; | ||
| modal.classList.add('active'); | ||
|
|
||
| // Destroy previous modal chart if exists | ||
| if (modalChart) { | ||
| modalChart.destroy(); | ||
| } | ||
|
|
||
| // Load chart data | ||
| axios.get(config.path) | ||
| .then(resp => { | ||
| const promResponse = resp.data; | ||
| if (!promResponse.data || !promResponse.data.result) { | ||
| throw new Error('Invalid Prometheus response format'); | ||
| } | ||
|
|
||
| const datasets = promResponse.data.result.map((result, i) => { | ||
| const labelKey = Object.keys(result.metric).find(k => k !== '__name__') || 'instance'; | ||
| const labelValue = result.metric[labelKey] || 'unknown'; | ||
|
|
||
| const dataPoints = result.values.map(v => ({ | ||
| x: new Date(v[0] * 1000), | ||
| y: config.convertToMs ? parseFloat(v[1]) * 1000 : parseFloat(v[1]) // Convert seconds to ms | ||
| })); | ||
|
|
||
| return { | ||
| label: labelValue, | ||
| data: dataPoints, | ||
| borderColor: chartColors[i % chartColors.length], | ||
| backgroundColor: 'transparent', | ||
| borderWidth: 2, | ||
| pointRadius: 0, | ||
| tension: 0.3, | ||
| }; | ||
| }); | ||
|
|
||
| modalChart = new Chart(modalCanvas, { | ||
| type: 'line', | ||
| data: { datasets: datasets }, | ||
| options: { | ||
| responsive: true, | ||
| maintainAspectRatio: false, | ||
| interaction: { mode: 'index', intersect: false }, | ||
| plugins: { | ||
| legend: { | ||
| position: 'bottom', | ||
| labels: { boxWidth: 15, font: { size: 12 } } | ||
| }, | ||
| tooltip: { mode: 'index', intersect: false }, | ||
| zoom: { | ||
| zoom: { | ||
| drag: { | ||
| enabled: true, | ||
| backgroundColor: 'rgba(78,121,167,0.15)', | ||
| borderColor: '#4e79a7', | ||
| borderWidth: 1 | ||
| }, | ||
| mode: 'x' | ||
| }, | ||
| pan: { enabled: true, mode: 'x' }, | ||
| }, | ||
| }, | ||
| scales: { | ||
| x: { | ||
| type: 'time', | ||
| time: { tooltipFormat: 'yyyy-MM-dd HH:mm:ss' }, | ||
| ticks: { font: { size: 11 } } | ||
| }, | ||
| y: { | ||
| beginAtZero: false, | ||
| ticks: { font: { size: 11 } }, | ||
| title: { | ||
| display: true, | ||
| text: config.unit, | ||
| font: { size: 12, weight: 'bold' } | ||
| } | ||
| }, | ||
| }, | ||
| }, | ||
| }); | ||
| }) | ||
| .catch(err => { | ||
| console.error(`Failed to load expanded chart:`, err); | ||
| modalCanvas.parentElement.innerHTML = '<div class="chart-unavailable">Failed to load chart data</div>'; | ||
| }); | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Extract shared chart rendering logic to eliminate duplication.
The chart loading and rendering logic is duplicated between the grid view (lines 192-275) and modal view (lines 277-367). The dataset creation (lines 200-218 vs 298-316) and Chart.js configuration (lines 226-264 vs 321-360) are nearly identical, with only minor differences in styling.
This duplication violates DRY principles, increases maintenance burden (bug fixes must be applied twice), and makes it easy to introduce inconsistencies between the two views.
♻️ Proposed refactor
Extract a shared function to create datasets and chart options:
function createDatasets(promResponse, config) {
return promResponse.data.result.map((result, i) => {
const labelKey = Object.keys(result.metric).find(k => k !== '__name__') || 'instance';
const labelValue = result.metric[labelKey] || 'unknown';
const dataPoints = result.values.map(v => ({
x: new Date(v[0] * 1000),
y: config.convertToMs ? parseFloat(v[1]) * 1000 : parseFloat(v[1])
}));
return {
label: labelValue,
data: dataPoints,
borderColor: chartColors[i % chartColors.length],
backgroundColor: 'transparent',
pointRadius: 0,
tension: 0.3,
};
});
}
function createChartOptions(config, isModal = false) {
const fontSize = isModal ? 11 : 9;
const legendSize = isModal ? 12 : 10;
return {
responsive: true,
maintainAspectRatio: false,
// ... rest of options with conditional sizes
};
}Then use these helpers in both grid and modal rendering, passing style parameters as needed.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/openshift/mustgathermetrics/metrics.html` around lines 192 - 367,
The grid and modal chart rendering code is duplicated; extract shared helpers
and use them from both places. Implement a createDatasets(promResponse, config)
that builds dataset objects (used where datasets are created in the forEach
block and in expandChart) and implement createChartOptions(config,
isModal=false) that returns the Chart.js options (used instead of the duplicated
options object in the Chart(...) calls), then update the forEach
chartInstances[config.id] creation and expandChart to call axios, validate
promResponse, call createDatasets(...) and createChartOptions(..., true/false)
and pass those into new Chart(...); keep existing identifiers chartColors,
chartInstances, expandChart and modalChart so changes are local and preserve
minor style differences via the isModal flag.


Summary
This PR enhances the etcd page in the OPCT web UI report by adding all 6 etcd and API performance metric charts with interactive visualizations. It also removes legacy standalone metrics pages (metrics.html, index.html) that are no longer needed since charts are now integrated into the main web UI.
Changes
1. Fix Metrics Processor (NaN Handling)
Problem: Two charts (Kube API request p99, etcd request duration p99) were embedded in the legacy metrics.html but failed to export as JSON files due to
json: unsupported value: NaNerrors.Solution: Added NaN/Inf filtering in
processMetricV2()before JSON marshalling:index.jsonFiles:
internal/openshift/mustgathermetrics/plotly.go2. Add Missing Charts to Web UI
Added the 6th chart to the etcd tab:
Complete chart inventory (6 total):
Files:
data/templates/report/report.html3. Remove Legacy Metrics Pages
Removed standalone metrics pages that are redundant with the new web UI integration:
Removed generation:
metrics.html- Large echarts-based standalone page (480KB+)index.html- Plotly-based standalone pageindex.js- JavaScript for Plotly pageKept (required by web UI):
index.json- Index of available charts*.json.gz.json- Individual chart data files (6 files)Code cleanup:
SaveMetricsPageReport,newMetricsPage)bytes,text/template,sort)RenderPage()to only generate JSON filesFiles:
internal/openshift/mustgathermetrics/main.go,internal/openshift/mustgathermetrics/plotly.goInteractive Chart Features
All 6 charts include:
Technical Details
Data Source: Reuses existing Plotly JSON files from
./metrics/directory generated by the must-gather-monitoring plugin.Conditional Rendering: Charts only appear when
hasMetricsDatafeature flag is true (metrics were collected during OPCT run).Layout Pattern: Uses
v-if/v-elsefor split-pane container to prevent CSS conflicts with other pages.Chart.js Integration: Version 4.4.9 with time-series adapters (chartjs-adapter-date-fns), touch gestures (hammerjs), and interactive zoom (chartjs-plugin-zoom).
Testing
Tested with local OPCT archive:
Validated:
Related Issues
🤖 Generated with Claude Code
— AI Claude