
feat: ACA certification authority, security hardening, and production infrastructure #41

Merged
parlakisik merged 9 commits into main from feat/aca
Apr 7, 2026


@parlakisik
Contributor

Summary

  • Agent Certification Authority (ACA) — ECDSA P-256 certificate signing, capability attestation, evidence-based reputation engine with anti-gaming safeguards, CRL, W3C Verifiable Credential support
  • Security hardening — Circuit breaker 4xx fix, SSRF prevention on A2AEndpoint, JWT_SECRET required in production, contract expiry enforcement, optimistic concurrency for work-publisher
  • Standard JSON error responses — Replaced 193 http.Error() plain-text responses with structured {"error": {"code", "message", "timestamp"}} across all 13 services (closes Missing Error Response Schema #6)
  • Webhook retry + HMAC signing — Exponential backoff (3 attempts), HMAC-SHA256 payload signing via X-Webhook-Signature header (closes Missing Provider Webhook/Callback Specification #5)
  • Circuit breakers — sony/gobreaker on all inter-service HTTP, only 5xx/timeout trips the breaker (closes Missing Circuit Breaker Patterns #8)
  • NATS JetStream — 7 persistent streams, deduplication, dead letter queue, configurable replica factor for production HA (closes Add pub/sub mechanism between components #22)
  • Pagination — limit/offset on bid-gateway, provider-registry, identity list endpoints (max 100)
  • Gateway identity CB — Prevents cache stampede when identity service is down (5 failures, 10s open)
  • Certificate signature versioning — Stores signed canonical bytes to avoid reconstruction ambiguity on format changes
  • Deployment wiring — JWT_SECRET, WEBHOOK_SECRET as K8s Secrets; NATS_STREAM_REPLICAS in configmap (1 dev, 3 prod); docker-compose updated
  • Documentation — PROJECT.md (startup workflows), AUTHENTICATION.md (production auth), ACA.md (VC pitch), IMPROVEMENTS.md, GitHub Pages with MkDocs Material

Commits (8)

| Commit | Description |
|--------|-------------|
| 4dca067 | ACA service, platform infrastructure, and documentation |
| ead7676 | CB 4xx fix, SSRF prevention, JWT validation, contract expiry |
| 7798a5f | Pagination for list endpoints |
| d8ad824 | Standard JSON error response schema (all 13 services) |
| 6686aa8 | Circuit breaker for gateway identity service validator |
| 6aa59b5 | Webhook retry with HMAC signing, externalize NATS replicas |
| d656050 | Optimistic concurrency, certificate signature versioning |
| f701dab | Wire NATS_STREAM_REPLICAS, JWT_SECRET, WEBHOOK_SECRET into deployments |

GitHub Issues Closed

- MkDocs Material config with dark/light mode, Mermaid diagram rendering, search
- Homepage (docs/index.md) with full README content: problem statement, key benefits,
  architecture, service catalog, ad-tech parallel, quick start, FAQ, demos
- Navigation: Home, Getting Started, Architecture, ACA, Integrations, Platform
- GitHub Action auto-deploys on push to main when docs/ or mkdocs.yml change
- Custom CSS with AEX purple/amber branding
- Hero images copied to docs/assets/images/ for MkDocs compatibility
- Quick Start and Deployment guides added to docs site
…xpiry

- Circuit breaker no longer trips on 4xx client errors (only 5xx/timeout)
  Added serverError type to distinguish server failures from client errors
- A2AEndpoint URL validation in bid-gateway prevents SSRF attacks
  Blocks private IPs, localhost, cloud metadata endpoints
- Gateway requires non-empty JWT_SECRET in production environments
  Fatal on startup if missing in non-development mode
- Contract expiry enforced in progress/complete/fail handlers
  Returns 410 Gone and transitions to EXPIRED status
- Fix evaluator test for 128-bit ID length (37 chars)
- Fix work-publisher test for updated New() signature
- bid-gateway: ListByWorkID store interface accepts limit/offset,
  both memory and mongo stores updated, handler parses query params
- provider-registry: HandleListAllProviders applies limit/offset
  after active-status filtering, returns total count
- identity: HandleListAPIKeys adds limit/offset with total count
- All endpoints default to limit=100, offset=0, max limit capped at 100
…vices

Replace all 193 http.Error() plain-text responses with structured JSON:
  { "error": { "code": "ERROR_CODE", "message": "...", "timestamp": "..." } }

Error codes use UPPER_SNAKE_CASE convention:
- BAD_REQUEST, UNAUTHORIZED, NOT_FOUND, METHOD_NOT_ALLOWED
- CONFLICT, GONE, INTERNAL_ERROR, BAD_GATEWAY
- Service-specific: WORK_ID_REQUIRED, BID_EXPIRED, CONTRACT_EXPIRED,
  TENANT_ID_REQUIRED, CERTIFICATE_ID_REQUIRED, etc.

Closes #6
Prevents cache stampede when identity service is down. After 5
consecutive failures, requests fast-fail for 10s instead of all
queuing on the 5s HTTP timeout. Half-open state allows one request
through to detect recovery. Only 5xx/network errors trip the
breaker; 4xx responses reset it.
Webhook delivery:
- Exponential backoff retry (3 attempts, 500ms/1s/2s)
- Retry on network errors and 5xx, no retry on 4xx
- HMAC-SHA256 payload signing via X-Webhook-Signature header
- WithWebhookSecret() to configure signing key
- X-Webhook-Timestamp header for replay protection

NATS streams:
- Replica factor moved from hardcoded 1 to Config.StreamReplicas
- AllStreams(replicas) accepts replica count parameter
- Default 1 for dev, set to 3 via NATS_STREAM_REPLICAS for production
- Closes GitHub issue #5 (provider webhook specification)
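A sketch of `AllStreams(replicas)`, assuming a simplified stream description. `StreamSpec` is hypothetical and the stream names are placeholders, not the PR's seven streams; in nats.go the corresponding fields are `StreamConfig.Replicas` and `StreamConfig.Duplicates`. The 1 to 5 bound reflects JetStream's replica limit, and a replica count of 3 also needs at least three JetStream servers in the cluster.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// StreamSpec is a hypothetical stand-in for a JetStream stream config.
type StreamSpec struct {
	Name       string
	Subjects   []string
	Replicas   int
	Duplicates time.Duration // dedup window for messages carrying Nats-Msg-Id
}

// AllStreams applies one replica count to every stream instead of a
// hardcoded 1.
func AllStreams(replicas int) ([]StreamSpec, error) {
	if replicas < 1 || replicas > 5 { // JetStream allows at most 5 replicas
		return nil, fmt.Errorf("stream replicas must be 1-5, got %d", replicas)
	}
	names := []string{"WORK", "BIDS", "CONTRACTS", "SETTLEMENT"} // placeholders
	specs := make([]StreamSpec, 0, len(names))
	for _, n := range names {
		specs = append(specs, StreamSpec{
			Name:       n,
			Subjects:   []string{strings.ToLower(n) + ".>"},
			Replicas:   replicas,
			Duplicates: 2 * time.Minute,
		})
	}
	return specs, nil
}

func main() {
	specs, err := AllStreams(3) // production value of NATS_STREAM_REPLICAS
	if err != nil {
		panic(err)
	}
	for _, s := range specs {
		fmt.Println(s.Name, s.Subjects, s.Replicas)
	}
}
```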
…e versioning

Work-publisher distributed locking:
- Added Version field to WorkSpec model for optimistic concurrency
- Store.UpdateWork now checks version match, returns ErrVersionConflict on mismatch
- Both memory and mongo stores implement version checking (mongo uses filter)
- OnBidSubmitted, CloseBidWindow, CancelWork retry up to 3 times on conflict
- Prevents race condition between concurrent bid submissions and window close

Certificate signature versioning:
- Added SignedData field to AgentCertificate model (stores canonical JSON that was signed)
- Added SchemaVersion field (currently 1) for future format migration
- ApproveCertificateRequest and RenewCertificate store signed bytes
- VerificationService uses stored SignedData when available, falls back to
  reconstruction for legacy certificates without SignedData
…oyments

Service configs:
- Added NatsStreamReplicas + WebhookSecret to work-publisher, settlement, certauth configs
- Pass StreamReplicas to nats.Config in all 3 main.go files
- Call publisher.WithWebhookSecret() when WEBHOOK_SECRET is set
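A sketch of the service-side wiring, assuming configuration is read from the environment variables listed in this commit. The `Config` field names follow the commit text; the `nats://localhost:4222` default, the `Publisher` type, and `WithWebhookSecret` as a method returning the publisher are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config holds the settings wired in this commit (other fields omitted).
type Config struct {
	NatsURL            string
	NatsStreamReplicas int
	WebhookSecret      string
}

// loadConfig reads settings through lookup; pass os.LookupEnv in production.
func loadConfig(lookup func(string) (string, bool)) (Config, error) {
	cfg := Config{NatsURL: "nats://localhost:4222", NatsStreamReplicas: 1} // assumed dev defaults
	if v, ok := lookup("NATS_URL"); ok && v != "" {
		cfg.NatsURL = v
	}
	if v, ok := lookup("NATS_STREAM_REPLICAS"); ok && v != "" {
		n, err := strconv.Atoi(v)
		if err != nil || n < 1 {
			return Config{}, fmt.Errorf("NATS_STREAM_REPLICAS must be a positive integer, got %q", v)
		}
		cfg.NatsStreamReplicas = n
	}
	cfg.WebhookSecret, _ = lookup("WEBHOOK_SECRET") // empty means "do not sign"
	return cfg, nil
}

// Publisher is a hypothetical stand-in for the services' event publisher.
type Publisher struct{ webhookSecret []byte }

// WithWebhookSecret sets the HMAC signing key (assumed method form).
func (p *Publisher) WithWebhookSecret(secret string) *Publisher {
	p.webhookSecret = []byte(secret)
	return p
}

func main() {
	cfg, err := loadConfig(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	publisher := &Publisher{}
	if cfg.WebhookSecret != "" {
		publisher.WithWebhookSecret(cfg.WebhookSecret)
	}
	fmt.Println(cfg.NatsURL, cfg.NatsStreamReplicas, len(publisher.webhookSecret) > 0)
}
```

Injecting `lookup` instead of calling `os.Getenv` directly keeps the defaults and validation testable without touching the process environment.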

Kubernetes:
- Added NATS_STREAM_REPLICAS to base configmap (default 1)
- Production overlay patches NATS_STREAM_REPLICAS to 3 for HA
- Added JWT_SECRET + WEBHOOK_SECRET to aex-secrets
- Gateway deployment wires JWT_SECRET, REDIS_URL, ENVIRONMENT from secret/configmap
- Work-publisher, settlement, certauth deployments wire NATS_URL,
  NATS_STREAM_REPLICAS from configmap and WEBHOOK_SECRET from secret

Docker Compose:
- Added NATS_STREAM_REPLICAS=1 and WEBHOOK_SECRET to work-publisher,
  settlement, certauth services
@parlakisik parlakisik merged commit f5108bb into main Apr 7, 2026
2 checks passed
@parlakisik parlakisik deleted the feat/aca branch April 8, 2026 04:15