Skip to content

[PECOBLR-1143] Implement telemetry Phase 4-5: Export infrastructure and opt-in configuration#319

Open
samikshya-db wants to merge 10 commits intomainfrom
stack/PECOBLR-1143-telemetry-phase4-5
Open

[PECOBLR-1143] Implement telemetry Phase 4-5: Export infrastructure and opt-in configuration#319
samikshya-db wants to merge 10 commits intomainfrom
stack/PECOBLR-1143-telemetry-phase4-5

Conversation

@samikshya-db
Copy link
Collaborator

Summary

This PR implements Phases 4-5 of the telemetry system for the Databricks SQL Go driver.

Stack: Part 1 of 2

  • This PR: PECOBLR-1143 (Phases 4-5)
  • Next: PECOBLR-1381-1382 (Phases 6-7)

Phase 4: Export Infrastructure ✅

New file: exporter.go (192 lines)

  • telemetryExporter with circuit breaker integration
  • ✅ HTTP POST to /api/2.0/telemetry-ext endpoint
  • ✅ Exponential backoff retry (100ms base, 3 retries max)
  • ✅ Tag filtering via shouldExportToDatabricks()
  • ✅ JSON serialization (telemetryPayload, exportedMetric)
  • ✅ Comprehensive error swallowing
  • ✅ Support for HTTP and HTTPS URLs

New file: exporter_test.go (448 lines)

  • ✅ 17 comprehensive tests with mock HTTP server
  • ✅ Success scenarios, retry logic, circuit breaker
  • ✅ Tag filtering, error swallowing, exponential backoff
  • ✅ Context cancellation, 4xx/5xx handling

Phase 5: Opt-In Configuration Integration ✅

Updated: config.go (+48 lines)

  • isTelemetryEnabled() with 5-level priority logic
  • ✅ Integration with featureFlagCache
  • ✅ Error handling with safe fallbacks

Updated: config_test.go (+230 lines)

  • ✅ 8 tests for all 5 priority levels
  • ✅ Server error scenarios, unreachable hosts

Priority Logic:

  1. forceEnableTelemetry=true → always enabled
  2. enableTelemetry=false → always disabled
  3. enableTelemetry=true + server feature flag check
  4. Server-side feature flag only (default)
  5. Default disabled (fail-safe)

Changes

Total: +976 insertions, -57 deletions


Testing

All 70+ tests passing ✅ (2.017s)

  • Circuit Breaker: 15 tests ✓
  • Config & Opt-In: 19 tests ✓
  • Exporter: 17 tests ✓
  • Feature Flags: 12 tests ✓
  • Manager: 9 tests ✓
  • Tags: 7 tests ✓

Related Issues

  • Implements: PECOBLR-1143 (Phases 4-5)
  • Enables: PECOBLR-1381 (Phase 6)
  • Enables: PECOBLR-1382 (Phase 7)

Checklist

  • Implements Phase 4 export infrastructure
  • Implements Phase 5 opt-in configuration
  • Comprehensive unit tests
  • All tests passing
  • DESIGN.md checklist updated
  • No breaking changes

…nd opt-in configuration

This commit implements the remaining components for PECOBLR-1143 (Phases 4-5):

Phase 4: Export Infrastructure
- Implement telemetryExporter with HTTP POST to /api/2.0/telemetry-ext
- Add retry logic with exponential backoff (100ms base, 3 retries)
- Integrate with circuit breaker for endpoint protection
- Implement tag filtering via shouldExportToDatabricks()
- Add error swallowing to ensure telemetry never impacts driver
- Support both http:// and https:// URLs for testing

Phase 5: Opt-In Configuration Integration
- Implement isTelemetryEnabled() with 5-level priority logic:
  1. forceEnableTelemetry=true - bypasses all server checks
  2. enableTelemetry=false - explicit opt-out
  3. enableTelemetry=true + server flag - user opt-in with server control
  4. Server flag only - default Databricks-controlled behavior
  5. Default disabled - fail-safe default
- Wire up with existing featureFlagCache for server flag checks
- Handle errors gracefully (default to disabled on failures)

Testing:
- Add 17 comprehensive unit tests for exporter (success, retries, circuit breaker, tag filtering, error swallowing, exponential backoff, context cancellation)
- Add 8 unit tests for isTelemetryEnabled (all 5 priority levels, error handling, server scenarios)
- All 70+ telemetry tests passing

Documentation:
- Update DESIGN.md checklist to mark Phases 3-5 as completed

This completes the core telemetry infrastructure for PECOBLR-1143.
Next phases (6-7) will add metric collection and driver integration.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
samikshya-db and others added 4 commits February 4, 2026 00:39
Address PR review comments by implementing config overlay pattern:

**Changes:**
- Remove ForceEnableTelemetry flag (no longer needed)
- Change EnableTelemetry from bool to *bool pointer
  - nil = use server feature flag (default)
  - true = client forces enabled (overrides server)
  - false = client forces disabled (overrides server)
- Simplify priority logic to 3 clear layers:
  1. Client Config (explicit) - highest priority
  2. Server Config (feature flag) - when client doesn't set
  3. Fail-Safe Default (disabled) - when server unavailable

**Benefits:**
- Simpler mental model: client > server > default
- Client config naturally has priority (no special bypass flag)
- Foundation for general config overlay system
- All 89 telemetry tests passing

Addresses review comments:
- "if we take the config overlay approach that client side config take
  priority? we won't need the concept of force enablement?"
- Sets foundation for extending to all driver configs (not just telemetry)
…sibility

Config Overlay System:
- Add internal/config/overlay.go with generic ConfigValue[T] type
- Support client > server > default priority across all driver configs
- Refactor telemetry to use ConfigValue[bool] instead of *bool
- Add comprehensive tests for overlay pattern (20+ test cases)

Feature Flag Refactoring:
- Support multiple feature flags in single HTTP request
- Change from single flag to map[string]bool storage
- Add getAllFeatureFlags() for easy flag registration
- Add getFeatureFlag() generic method for any flag
- Keep isTelemetryEnabled() API backward compatible
- Add ADDING_FEATURE_FLAGS.md guide for extending system

Documentation Updates:
- Document config overlay pattern in DESIGN.md
- Clarify synchronous fetch behavior (10s timeout, RetryableClient)
- Document blocking scenarios and thundering herd protection
- Add rationale for synchronous vs async approach

Benefits:
- Easy to add new flags (3 lines of code)
- Single HTTP request fetches all flags (efficient)
- Reusable ConfigValue[T] pattern for all driver configs
- All 89 telemetry tests passing
- Run gofmt -s on all telemetry files
- Add explicit error ignore for json.Encoder.Encode in tests
- Add comment to empty select case to fix SA9003 staticcheck warning
- Add explanatory comment to empty default case in connection.go:340
- This fixes the staticcheck SA9003 warning about empty branch
- The select statement uses default as non-blocking context check

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
samikshya-db added a commit that referenced this pull request Feb 3, 2026
samikshya-db added a commit that referenced this pull request Feb 3, 2026
samikshya-db added a commit that referenced this pull request Feb 4, 2026
- Resolved conflicts by adopting the extensible multi-flag architecture
- Replaced ForceEnableTelemetry with config overlay pattern
- Updated all tests to use new API without driverVersion parameter
- Maintained telemetry hooks (telemetryUpdate callbacks) from PR #322
- All feature flag operations now support multiple flags extensibly
- Changed from /api/2.0/telemetry-ext to /telemetry-ext
- Verified against JDBC driver PathConstants.java:12
- TELEMETRY_PATH = "/telemetry-ext"
samikshya-db added a commit that referenced this pull request Feb 4, 2026
samikshya-db added a commit that referenced this pull request Feb 4, 2026
- First connection now creates cache context and fetches from server
- Ensures telemetry respects actual server flag on initial connection
- Subsequent connections remain non-blocking with cached value
- Double-checked locking pattern to prevent race conditions
- Fix TestExport_Success: correct expected endpoint path to /telemetry-ext
- Fix TestFeatureFlagCache_IsTelemetryEnabled_NoContext: provide valid httpClient to avoid nil pointer dereference

Signed-off-by: samikshya-chand_data <[email protected]>
Add explicit statements to empty if blocks to satisfy staticcheck SA9003.
These branches are intentionally empty as they swallow errors/panics
to ensure telemetry never impacts driver operation.

Signed-off-by: samikshya-chand_data <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants