diff --git a/skills/compliance/pci-dss-review/gates/service-provider-aoc-coverage-gate.md b/skills/compliance/pci-dss-review/gates/service-provider-aoc-coverage-gate.md new file mode 100644 index 00000000..9468106c --- /dev/null +++ b/skills/compliance/pci-dss-review/gates/service-provider-aoc-coverage-gate.md @@ -0,0 +1,42 @@ +# Service Provider AOC Coverage Gate + +## Purpose +Prevents false-positive PCI DSS findings when a service provider's Attestation of Compliance (AOC) does not explicitly list the customer's in-scope services, but the provider's SOC 2 Type II or PCI DSS ROC covers the relevant service infrastructure and compensating controls are in place. + +## Detection Logic + +### Trigger Conditions +Fire this gate when ALL of the following are true: +1. A PCI DSS assessment flags that a service provider's AOC does not explicitly cover the cardholder data environment (CDE) services used +2. The service provider has a SOC 2 Type II report or PCI DSS ROC that covers the relevant service infrastructure +3. The customer has contractual assurances that the provider's CDE infrastructure is in scope of their PCI assessment + +### Gate Check: Provider Assessment Coverage + +```yaml +check_provider_assessment_coverage: + - detection_patterns: + - "AOC|attestation.*of.*compliance|ROC|report.*on.*compliance" + - "SOC.*2|SOC.*3|Type.*II|service.*organization.*control" + - "PCI.*scope|CDE.*infrastructure|cardholder.*data.*environment" + - pass: "When the provider's SOC 2 Type II report or PCI ROC includes the specific service types being used (cloud hosting, payment processing, SaaS) and the report was issued within the last 12 months, downgrade to informational. Rationale: A current SOC 2 or PCI assessment covering the service type provides equivalent assurance to a service-specific AOC." + - fail: "When the provider has no current (<12 month) SOC 2 or PCI assessment, or the assessment explicitly excludes the service type being used, retain severity. 
Rationale: Without current third-party assessment coverage, the provider's CDE controls are unverified." +``` + +### Gate Check: Contractual Control Assurance + +```yaml +check_contractual_control_assurance: + - detection_patterns: + - "service.*provider|vendor|third.?party|sub.?processor|sub.?service" + - "contract.*section|security.*appendix|responsibility.*matrix" + - "right.*to.*audit|audit.*report|security.*certification" + - pass: "When the customer contract includes the provider's commitment to maintain PCI DSS compliance or equivalent security framework for the service infrastructure, AND grants the customer right-to-audit or report access, downgrade severity. Rationale: Contractual compliance commitments with audit rights provide formal assurance even without an explicit AOC." + - fail: "When the contract does not include compliance commitments, security framework requirements, or audit rights for the service, retain severity. Rationale: Without contractual security assurances, the customer cannot verify provider CDE controls." +``` + +## Resolution Path +1. Request the provider's most recent SOC 2 Type II report or PCI ROC and verify it covers the relevant service infrastructure +2. Map the in-scope CDE services to the provider's assessed control framework +3. If the AOC gap persists, request a service-specific AOC letter from the provider's compliance team +4. 
Document the compensating controls (SOC 2 coverage + contractual assurances) in the PCI DSS responsibility matrix diff --git a/skills/compliance/soc2-gap/gates/complementary-user-entity-controls-gate.md b/skills/compliance/soc2-gap/gates/complementary-user-entity-controls-gate.md new file mode 100644 index 00000000..57d6c8f3 --- /dev/null +++ b/skills/compliance/soc2-gap/gates/complementary-user-entity-controls-gate.md @@ -0,0 +1,42 @@ +# Complementary User Entity Controls Evaluation Gate + +## Purpose +Prevents false-positive SOC 2 gap findings when user entity controls are not formally documented in the service organization's control matrix, but compensating controls at the user entity level are addressed through contractual terms, shared responsibility matrices, and user entity responsibilities documentation. + +## Detection Logic + +### Trigger Conditions +Fire this gate when ALL of the following are true: +1. A SOC 2 gap assessment flags missing complementary user entity controls (CUEC) in the service organization's control matrix +2. The organization has a published shared responsibility matrix or user responsibilities document +3. Customer contracts include security requirements and user entity obligations + +### Gate Check: Shared Responsibility Documentation + +```yaml +check_shared_responsibility_documentation: + - detection_patterns: + - "complementary.*user.*entity.*control|CUEC|user.*entity.*responsib" + - "shared.*responsib|responsibility.*matrix|shared.*security.*model" + - "customer.*responsib|client.*responsib|tenant.*responsib" + - pass: "When the organization publishes a shared responsibility matrix that explicitly states user entity obligations for the in-scope controls (access management, encryption configuration, incident notification), downgrade to informational. Rationale: A published shared responsibility model satisfies the intent of CUEC documentation even when not in the formal control matrix." 
+ - fail: "When no shared responsibility documentation exists and user entity obligations are not communicated to customers, retain severity. Rationale: Undocumented user entity responsibilities create control gaps that may lead to audit findings for both the service organization and its customers." +``` + +### Gate Check: Contractual Security Requirements + +```yaml +check_contractual_security_requirements: + - detection_patterns: + - "SLA|service.*level.*agreement|terms.*of.*service|customer.*agreement" + - "security.*appendix|data.*protection.*addendum|DPA|SOW" + - "penetration.*test|audit.*right|security.*review|compliance.*cert" + - pass: "When customer contracts or data protection addenda include user entity security obligations (maintain access controls, encrypt data, report incidents), downgrade severity. Rationale: Contractually binding security obligations provide equivalent control coverage to formally documented CUECs." + - fail: "When contracts do not address user entity security obligations or data protection requirements, retain severity. Rationale: Without contractual security requirements, user entity controls exist only informally and are not auditable." +``` + +## Resolution Path +1. Map each in-scope SOC 2 control to the corresponding user entity responsibility in the shared responsibility matrix +2. Ensure the shared responsibility matrix is published in the customer portal and referenced in contracts +3. Add a CUEC section to the SOC 2 control matrix that cross-references the shared responsibility documentation +4. 
Review customer contracts to confirm security obligations are included and enforceable diff --git a/skills/devsecops/pipeline-security/gates/artifact-attestation-subject-gate.md b/skills/devsecops/pipeline-security/gates/artifact-attestation-subject-gate.md new file mode 100644 index 00000000..a7862a22 --- /dev/null +++ b/skills/devsecops/pipeline-security/gates/artifact-attestation-subject-gate.md @@ -0,0 +1,42 @@ +# Artifact Attestation Subject Gate + +## Purpose +Prevents false-positive findings when CI/CD pipeline security reviews flag missing artifact attestation for build artifacts that are produced and consumed within a trusted supply chain with compensating integrity controls (signed git tags, branch protection, SLSA-compliant build platform). + +## Detection Logic + +### Trigger Conditions +Fire this gate when ALL of the following are true: +1. A build artifact (container image, binary, package) is flagged as lacking attestation or SLSA provenance +2. The artifact is built on a platform with branch protection, required reviews, and signed commits +3. The artifact is consumed only within the same trusted organization or supply chain + +### Gate Check: Compensating Integrity Controls + +```yaml +check_compensating_integrity_controls: + - detection_patterns: + - "attestation|provenance|SLSA|in.?to.?to|signed.?tag|signed.?commit" + - "branch.?protection|required.?review|status.?check|CODEOWNERS" + - "container.*sign|image.*sign|cosign|notary|sigstore|fulcio" + - pass: "When the build platform enforces branch protection, required PR reviews, and signed commits, AND the artifact is distributed through a trusted registry with access controls, downgrade to informational. Rationale: Platform-level integrity controls provide equivalent assurance to per-artifact attestation for internal supply chains." + - fail: "When the build platform lacks branch protection or the artifact is distributed through public or untrusted channels without attestation, retain severity. 
Rationale: Without platform integrity, missing attestation creates genuine supply chain risk." +``` + +### Gate Check: Attestation Implementation Feasibility + +```yaml +check_attestation_feasibility: + - detection_patterns: + - "container.*build|docker.*build|kaniko|buildah|ko|jib" + - "gradle.*build|maven.*deploy|npm.*publish|pip.*publish" + - "github.*actions|gitlab.*ci|jenkins|circleci" + - pass: "When the build toolchain supports attestation generation (Cosign, Jib, Tekton Chains) AND the team has a documented plan to implement it within the next sprint, downgrade severity. Rationale: A documented implementation plan with toolchain support reduces urgency." + - fail: "When the toolchain does not support attestation, or there is no plan to implement it, retain severity. Rationale: Missing attestation in a supply chain without compensating controls is a security gap requiring remediation." +``` + +## Resolution Path +1. Document the current build platform's integrity controls (branch protection, signed commits, registry access controls) +2. Determine which attestation format (SLSA provenance, in-toto, Cosign) is feasible for the current toolchain +3. If using GitHub Actions or GitLab CI, enable OIDC-based attestation via Sigstore or native CI/CD attestation features +4. 
If the artifact is internal-only and platform integrity is strong, document the compensating controls as an exception diff --git a/skills/devsecops/sast-config/gates/generated-client-code-exclusion-gate.md b/skills/devsecops/sast-config/gates/generated-client-code-exclusion-gate.md new file mode 100644 index 00000000..f651ed0f --- /dev/null +++ b/skills/devsecops/sast-config/gates/generated-client-code-exclusion-gate.md @@ -0,0 +1,42 @@ +# Generated Client Code Exclusion Gate + +## Purpose +Prevents false-positive SAST findings when generated client code (OpenAPI clients, gRPC stubs, GraphQL codegen, SDK wrappers) contains security issues that are outside the developer's control and cannot be remediated by modifying the generated output. + +## Detection Logic + +### Trigger Conditions +Fire this gate when ALL of the following are true: +1. The flagged file is in a directory or has a header comment indicating auto-generation (OpenAPI, gRPC, GraphQL, Swagger Codegen, etc.) +2. The finding is in generated code that mirrors an API specification rather than hand-written business logic +3. The generation tool and source specification are under version control and can be remediated at the spec level + +### Gate Check: Generated File Marker + +```yaml +check_generated_file_marker: + - detection_patterns: + - "auto-generated|auto.?generated|generated.?by|do.?not.?edit|DO NOT EDIT" + - "openapi.?generator|swagger.?codegen|grpc.*generat|protoc" + - "graphql.*codegen|client.*gen|sdk.*generat" + - pass: "When the file has a generated marker AND the generation tool and source spec are in the same repository, downgrade to informational. Rationale: Issues in generated code should be fixed at the spec/template level, not in the generated output. The real vulnerability is in the spec." + - fail: "When the file has a generated marker but no source spec or generation tool is version-controlled, retain severity. 
Rationale: Generated code without a spec is effectively orphaned code that must be maintained manually, making the finding actionable." +``` + +### Gate Check: Spec-Level Fix Available + +```yaml +check_spec_level_fix_available: + - detection_patterns: + - "api.*spec|openapi.*yaml|openapi.*json|swagger|proto|graphql.*schema" + - "template.*file|mustache|handlebars|codegen.*template" + - "generator.*config|codegen.*config|openapitools.json" + - pass: "When the vulnerability can be fixed by modifying the API spec or codegen template (e.g., adding input validation patterns to the spec, updating template security headers), downgrade severity. Rationale: Spec-level fixes propagate to all generated clients, providing a systemic fix." + - fail: "When the vulnerability is inherent to the code generation process itself and cannot be fixed at the spec/template level, retain severity. Rationale: Findings that require post-generation patching are genuine code quality issues." +``` + +## Resolution Path +1. Identify the source spec file (OpenAPI YAML, .proto, GraphQL schema, etc.) and codegen configuration +2. Determine if the finding can be remediated by modifying the spec (e.g., adding pattern constraints, security definitions) or the codegen template +3. If spec-level fix is possible, submit a PR to the spec/template and regenerate the client +4. 
If spec-level fix is not possible, add a post-generation patch script or add the generated directory to the SAST scanner's exclusion list with a documented exception diff --git a/skills/identity/iam-review/gates/api-key-unused-age-gates.md b/skills/identity/iam-review/gates/api-key-unused-age-gates.md new file mode 100644 index 00000000..ba546c15 --- /dev/null +++ b/skills/identity/iam-review/gates/api-key-unused-age-gates.md @@ -0,0 +1,42 @@ +# API Key Unused-Age Gates + +## Purpose +Prevents false-positive IAM findings when API keys that exceed the standard unused-age threshold are flagged as stale, but the keys are intentionally long-lived for specific use cases (CI/CD pipelines, legacy system integration, disaster recovery credentials) with compensating monitoring controls. + +## Detection Logic + +### Trigger Conditions +Fire this gate when ALL of the following are true: +1. An API key exceeds the standard unused-age threshold (>90 days since last use) +2. The key is documented for a specific use case that requires periodic manual use (DR credentials, break-glass keys, legacy system integration) +3. Key usage is monitored and alerts trigger on anomalous activity + +### Gate Check: Documented Use Case + +```yaml +check_documented_use_case: + - detection_patterns: + - "break.?glass|emergency.*access|disaster.*recovery|DR.*key" + - "legacy.*integration|legacy.*system|vendor.*integration|legacy.*API" + - "CI.*pipeline|deploy.*key|release.*key|artifact.*key" + - pass: "When the key has a documented business justification in the key description or metadata, AND the justification requires the key to remain active beyond the standard rotation period, downgrade to informational. Rationale: Documented exception keys with business justification meet audit requirements for controlled exceptions." + - fail: "When the key has no description, no metadata, or no documented justification for exceeding the unused-age threshold, retain severity. 
Rationale: Undocumented keys exceeding the unused-age threshold may indicate forgotten or orphaned credentials." +``` + +### Gate Check: Monitoring Coverage + +```yaml +check_monitoring_coverage: + - detection_patterns: + - "last.*used|last.*access|last.*activity|last.*rotated" + - "cloud.*trail|audit.*log|access.*log|key.*usage|credential.*report" + - "alert|notif|anomaly|unusual.*activity|unexpected.*use" + - pass: "When the key's usage is logged to a SIEM or audit system with alerts configured for anomalous use (new location, new service, out-of-hours), downgrade severity. Rationale: Monitored keys with alerting provide a detective control that compensates for the reduced rotation hygiene." + - fail: "When no usage monitoring or alerting exists for the key, retain severity. Rationale: Misuse of unmonitored keys beyond the unused-age threshold cannot be detected, so these keys must be rotated." +``` + +## Resolution Path +1. Add a meaningful description to the key documenting its purpose, owner, and expected use frequency +2. Set up CloudTrail/key usage logging and configure alerts for anomalous activity +3. Schedule the key for rotation at the next major release or DR test cycle +4. Document the exception in the IAM review report with the business justification and monitoring evidence diff --git a/skills/identity/iam-review/gates/keyless-workload-identity-gate.md b/skills/identity/iam-review/gates/keyless-workload-identity-gate.md new file mode 100644 index 00000000..e9e19478 --- /dev/null +++ b/skills/identity/iam-review/gates/keyless-workload-identity-gate.md @@ -0,0 +1,43 @@ +# Keyless Workload Identity Gate + +## Purpose +Prevents false-positive IAM findings that flag a missing service account key or access key when a cloud workload deliberately uses keyless identity (OIDC, workload identity federation, instance metadata credentials), which is the more secure alternative to long-lived service account keys. + +## Detection Logic + +### Trigger Conditions +Fire this gate when ALL of the following are true: +1. 
A finding flags the absence of a service account key or access key for a workload +2. The workload uses keyless identity (Workload Identity Federation, an OIDC provider, or IMDS credentials) +3. The cloud provider supports keyless authentication for the workload's runtime environment + +### Gate Check: OIDC Federation + +```yaml +check_oidc_federation: + - detection_patterns: + - "workload.*identity|OIDC|OpenID.*Connect|identity.*federation" + - "IMDS|instance.*metadata|metadata.*credentials|STS.*AssumeRoleWithWebIdentity" + - "keyless|no.?key|key.?free|token.*exchange" + - pass: "When the workload authenticates via OIDC identity federation (GCP Workload Identity Federation, AWS IAM OIDC identity providers, Azure Workload Identity, GitHub Actions OIDC), downgrade to informational. Rationale: Keyless OIDC identity is more secure than long-lived keys: shorter credential lifetime, automatic rotation, no secret management burden." + - fail: "When the workload uses static credentials (file-based, environment variable) without any identity federation mechanism, retain severity. Rationale: Static workload credentials without keyless alternatives are a genuine credential management risk." +``` + +### Gate Check: Credential Lifetime + +```yaml +check_credential_lifetime: + - detection_patterns: + - "access.*key|secret.*key|service.*account.*key|API.*key" + - "credential.*rotation|key.*rotation|secret.*rotat" + - "token.*expir|session.*duration|credential.*lifetime" + - pass: "When any static credential used has automatic rotation (<90 days) and the workload is actively migrating to keyless identity with a documented plan, downgrade severity. Rationale: Short-lived rotating credentials with a migration plan represent acceptable interim risk." + - fail: "When static credentials are older than 90 days without rotation and there is no keyless migration plan, retain severity. Rationale: Long-lived unrotated workload credentials are a standing vulnerability." 
+``` + +## Resolution Path +1. Identify the workload's runtime environment (GKE, ECS, Azure, GitHub Actions, on-prem) and map available OIDC identity options +2. If using GCP, enable Workload Identity Federation for the workload's service account +3. If using AWS, configure IAM Roles Anywhere or ECS task IAM roles +4. If using Azure, enable Workload Identity Federation with OIDC token exchange +5. Document the migration timeline and set credential rotation to <30 days as interim control diff --git a/skills/identity/rbac-design/gates/policy-decision-cache-invalidation-gate.md b/skills/identity/rbac-design/gates/policy-decision-cache-invalidation-gate.md new file mode 100644 index 00000000..0561a9da --- /dev/null +++ b/skills/identity/rbac-design/gates/policy-decision-cache-invalidation-gate.md @@ -0,0 +1,43 @@ +# Policy Decision Cache Invalidation Gate + +## Purpose +Prevents false-positive RBAC findings when access policies rely on cached authorization decisions that may not reflect recent role changes, but the cache invalidation strategy (TTL-based, event-driven, or periodic refresh) ensures stale decisions are bounded. + +## Detection Logic + +### Trigger Conditions +Fire this gate when ALL of the following are true: +1. An RBAC review identifies that cached policy decisions could serve stale authorization results +2. The authorization system has implemented a cache invalidation strategy +3. 
The maximum cache lifetime is documented and aligned with the organization's access risk tolerance + +### Gate Check: Cache Invalidation Strategy + +```yaml +check_cache_invalidation_strategy: + - detection_patterns: + - "cache.*invalidate|cache.*TTL|cache.*refresh|cache.*expir" + - "policy.*cache|decision.*cache|auth.*cache|PDP.*cache|PEP.*cache" + - "event.*driven|webhook.*invalidate|reactive.*invalidate|push.*invalidate" + - pass: "When the cache invalidation uses event-driven mechanisms (webhook on role change, Pub/Sub on policy update) with a fallback TTL of <5 minutes, downgrade to informational. Rationale: Event-driven invalidation ensures changes are reflected within seconds, with TTL as a safety net." + - fail: "When only passive TTL-based invalidation is used with a TTL >15 minutes, or there is no documented invalidation strategy, retain severity. Rationale: TTL-only invalidation with long windows can allow unauthorized access to persist for extended periods after role revocation." +``` + +### Gate Check: Critical Change Immediate Invalidation + +```yaml +check_critical_change_invalidation: + - detection_patterns: + - "role.*revoke|access.*revoke|permission.*remove|user.*terminat|employee.*offboard" + - "privilege.*escal|role.*elevat|admin.*grant|sensitive.*role" + - "termination|suspension|offboarding|deactivation" + - pass: "When critical access changes (termination, role revocation, privilege escalation) trigger immediate cache invalidation for the affected user, bypassing the normal TTL, downgrade severity. Rationale: Immediate invalidation for critical changes ensures the access control plane responds instantly to high-impact events." + - fail: "When all cache invalidation follows the same TTL regardless of change criticality, retain severity. Rationale: Equal treatment of routine and critical changes leaves a window for unauthorized access after high-impact events." +``` + +## Resolution Path +1. 
Implement event-driven cache invalidation for policy and role changes (webhook, Pub/Sub, or database CDC) +2. Set the passive TTL to 5 minutes or less as a fallback +3. Categorize access changes into critical (termination, role revocation) and routine (new role, attribute update) +4. Configure critical changes to bypass TTL and trigger immediate invalidation +5. Document the cache invalidation architecture and test quarterly with fire drills diff --git a/skills/identity/rbac-design/gates/rebac-graph-cycle-deny-effect-gate.md b/skills/identity/rbac-design/gates/rebac-graph-cycle-deny-effect-gate.md new file mode 100644 index 00000000..5de7deb3 --- /dev/null +++ b/skills/identity/rbac-design/gates/rebac-graph-cycle-deny-effect-gate.md @@ -0,0 +1,42 @@ +# ReBAC Graph Cycle Deny-Effect Gate + +## Purpose +Prevents false-positive RBAC findings when relationship-based access control (ReBAC) graphs contain relationship cycles that could theoretically lead to privilege escalation, but the policy evaluation engine implements cycle detection and deny-effect propagation to prevent exploitation. + +## Detection Logic + +### Trigger Conditions +Fire this gate when ALL of the following are true: +1. An RBAC review identifies relationship cycles in the ReBAC graph (e.g., user A manages user B who manages user A) +2. The authorization system uses a ReBAC or relationship-based model (Google Zanzibar, Auth0, OPA, Topaz, Keto) +3. The policy engine implements cycle detection or deny-override semantics + +### Gate Check: Cycle Detection + +```yaml +check_cycle_detection: + - detection_patterns: + - "cycle.*detect|graph.*cycle|relationship.*loop|circular.*relationship" + - "Zanzibar|ReBAC|relationship.*graph|tuple.*store|graph.*DB" + - "policy.*engine|authorization.*engine|PDP|OPA|Topaz|Keto" + - pass: "When the authorization engine implements cycle detection at evaluation time (Zanzibar-style reachability with TTL, OPA with depth limits), downgrade to informational. 
Rationale: Runtime cycle detection prevents infinite recursion and ensures policy evaluation terminates correctly even with graph cycles." + - fail: "When the authorization engine does not implement cycle detection and relies on graph acyclicity as a precondition, retain severity. Rationale: Undetected cycles in a ReBAC graph can cause infinite evaluation loops, denial of service, or incorrect authorization decisions." +``` + +### Gate Check: Deny-Override Semantics + +```yaml +check_deny_override_semantics: + - detection_patterns: + - "deny.*override|explicit.*deny|deny.*priority|negative.*authori" + - "default.*deny|deny.*all|blacklist|blocklist|revoke.*override" + - "policy.*conflict|decision.*ambiguity|evidence.*conflict" + - pass: "When the policy engine implements deny-override semantics (explicit deny takes precedence over any allow), cycles that create ambiguous allow paths are resolved to deny, downgrade severity. Rationale: Deny-override semantics ensure that cycles cannot create unintended allow paths." + - fail: "When the policy engine uses allow-override or first-match-wins semantics without explicit cycle handling, retain severity. Rationale: In non-deny-override systems, cycles can create unexpected allow paths that violate least privilege." +``` + +## Resolution Path +1. Verify the authorization engine implements cycle detection (query timeout, max depth, or reachability TTL) +2. Confirm deny-override semantics are configured and tested for all policy evaluation paths +3. Add graph cycle monitoring to detect and alert on new relationship cycles as they form +4. 
Document the cycle detection strategy and test quarterly with adversarial graph scenarios diff --git a/skills/incident-response/containment/gates/cloud-snapshot-quarantine-gate.md b/skills/incident-response/containment/gates/cloud-snapshot-quarantine-gate.md new file mode 100644 index 00000000..82c40382 --- /dev/null +++ b/skills/incident-response/containment/gates/cloud-snapshot-quarantine-gate.md @@ -0,0 +1,42 @@ +# Cloud Snapshot Quarantine Gate + +## Purpose +Prevents false-positive containment findings when cloud snapshot quarantine for forensic preservation creates cost or operational concerns, but the quarantine process uses automated lifecycle management, tiered storage, and snapshot budgeting to balance forensic needs with cost control. + +## Detection Logic + +### Trigger Conditions +Fire this gate when ALL of the following are true: +1. A containment procedure requires taking snapshots of cloud volumes for forensic preservation +2. The snapshots are retained beyond the standard backup retention period (quarantine) +3. Snapshot costs or volume concerns are raised as an objection to the quarantine procedure + +### Gate Check: Lifecycle Automation + +```yaml +check_lifecycle_automation: + - detection_patterns: + - "snapshot.*lifecycle|snapshot.*retention|snapshot.*expir|snapshot.*delete" + - "automated.*snapshot|snapshot.*policy|DLM|data.?lifecycle.?manager" + - "tier.*snapshot|snapshot.*archive|S3.*glacier|snapshot.*cold" + - pass: "When snapshots have an automated lifecycle policy that transitions to cost-optimized storage (e.g., AWS EBS Snapshots Archive, snapshot tiering) after 30 days and deletes them after the evidence retention period, downgrade to informational. Rationale: Automated lifecycle management controls costs while preserving forensic evidence for the required retention period." + - fail: "When snapshots are taken but have no lifecycle policy, resulting in indefinite retention at full cost, retain severity. 
Rationale: Unmanaged forensic snapshots accumulate costs and may be deleted prematurely without lifecycle automation." +``` + +### Gate Check: Quarantine Budget + +```yaml +check_quarantine_budget: + - detection_patterns: + - "forensic.*budget|incident.*cost|IR.*budget|snapshot.*budget" + - "cloud.*cost|storage.*cost|snapshot.*cost|evidence.*cost" + - "cost.*center|chargeback|showback|incident.*tag" + - pass: "When the incident response budget includes a forensic snapshot allocation, and snapshot costs are tagged to the incident for cost tracking, downgrade severity. Rationale: Budgeted forensic costs with incident tagging ensure snapshots can be preserved without financial surprises." + - fail: "When there is no forensic snapshot budget and costs are charged to general storage, retain severity. Rationale: Unbudgeted forensic costs create pressure to delete evidence prematurely." +``` + +## Resolution Path +1. Create an automated snapshot lifecycle policy: cold tier after 30 days, delete after 90 days (or match evidence retention policy) +2. Tag all forensic snapshots with the incident ID for cost tracking and automated lifecycle management +3. Include forensic snapshot costs in the IR budget with a defined allocation +4. 
Document the quarantine procedure with lifecycle stages: Active (days 1-30), Archived (days 31-90), Deleted (after day 90) diff --git a/skills/incident-response/containment/gates/idp-session-revoke-propagation-gate.md b/skills/incident-response/containment/gates/idp-session-revoke-propagation-gate.md new file mode 100644 index 00000000..5148203d --- /dev/null +++ b/skills/incident-response/containment/gates/idp-session-revoke-propagation-gate.md @@ -0,0 +1,42 @@ +# IdP Session Revoke Propagation Gate + +## Purpose +Prevents false-positive containment findings when IdP session revocation does not immediately propagate to all downstream services, but the IdP and services use token expiration, forced re-authentication, or token revocation lists to bound the propagation delay. + +## Detection Logic + +### Trigger Conditions +Fire this gate when ALL of the following are true: +1. A containment review flags that IdP session revocation does not immediately terminate all active sessions across downstream services +2. The IdP supports token revocation or short session lifetimes +3. Downstream services validate tokens on each request or at frequent intervals + +### Gate Check: Token Expiration Bounds + +```yaml +check_token_expiration_bounds: + - detection_patterns: + - "session.*expir|token.*lifetime|access.*token.*TTL|refresh.*token.*TTL" + - "JWT.*exp|session.*timeout|inactivity.*timeout|absolute.*timeout" + - "OAuth.*revocation|token.*revocation|session.*revocation" + - pass: "When access tokens have a TTL of 15 minutes or less, and refresh tokens require re-authentication within 24 hours, downgrade to informational. Rationale: Short token lifetimes bound the propagation delay, ensuring revoked sessions are effectively terminated within minutes." + - fail: "When access tokens have a TTL exceeding 1 hour or refresh tokens persist beyond 7 days without re-authentication, retain severity. 
Rationale: Long-lived sessions create an unacceptable window between revocation and effective session termination." +``` + +### Gate Check: Downstream Validation + +```yaml +check_downstream_validation: + - detection_patterns: + - "token.*validat|JWT.*verify|introspect|token.*check" + - "OAuth.*introspect|opaque.*token|session.*validat|auth.*check" + - "API.*gateway|reverse.*proxy|auth.*proxy|sidecar.*auth" + - pass: "When all downstream services validate tokens on every request (via API gateway, sidecar proxy, or per-service introspection), downgrade severity. Rationale: Per-request token validation ensures revocation is effective within one TTL, regardless of propagation mechanism." + - fail: "When downstream services cache authentication decisions for longer than the token TTL, or validate only at session start, retain severity. Rationale: Cached authentication decisions extend the revocation propagation window beyond the designed token lifetime." +``` + +## Resolution Path +1. Configure OAuth/OIDC provider to issue short-lived access tokens (15 min TTL) and enforce re-authentication for refresh tokens within 24 hours +2. Implement token introspection at the API gateway or reverse proxy for all downstream services +3. Deploy a sidecar auth proxy (Istio, Envoy, OAuth2 Proxy) for services that cannot implement token validation natively +4. 
Document the maximum theoretical propagation delay (token TTL + network latency) and verify it meets the incident response SLA diff --git a/skills/incident-response/forensics-checklist/gates/container-runtime-forensics-gate.md b/skills/incident-response/forensics-checklist/gates/container-runtime-forensics-gate.md new file mode 100644 index 00000000..bcbb1a7d --- /dev/null +++ b/skills/incident-response/forensics-checklist/gates/container-runtime-forensics-gate.md @@ -0,0 +1,42 @@ +# Container Runtime Forensics Gate + +## Purpose +Prevents false-positive forensics findings when container runtime evidence (ephemeral volumes, overlay filesystem, terminated pods) appears to be lost due to container lifecycle, but the cluster has compensating forensic capabilities (cluster-level audit logging, persistent event store, runtime telemetry). + +## Detection Logic + +### Trigger Conditions +Fire this gate when ALL of the following are true: +1. A forensics assessment identifies that container runtime data is unavailable post-termination +2. The cluster uses ephemeral storage (emptyDir, overlayfs) with no persistent volume for forensics +3. Compensating forensic evidence exists through cluster audit logs, k8s events, runtime telemetry, or sidecar logging + +### Gate Check: Cluster-Level Forensics + +```yaml +check_cluster_level_forensics: + - detection_patterns: + - "cluster.*audit|k8s.*audit|kubernetes.*event|audit.*log" + - "Falco|Tetragon|Cilium|Tracee|Sysdig|kubearmor|event.*collect" + - "sidecar.*log|fluentd|fluentbit|log.*shipper|log.*forward" + - pass: "When the cluster has audit logging enabled (API server audit, k8s events exported), AND runtime security monitoring (Falco/Tetragon) captures system call events, downgrade to informational. Rationale: Cluster-level and runtime telemetry provide forensic evidence equivalent to container-local artifacts for most incident types." 
+ - fail: "When the cluster has no audit logging or runtime security monitoring, retain severity. Rationale: Without cluster-level or runtime forensic capabilities, terminated containers leave no forensic trace." +``` + +### Gate Check: Evidence Retention Policy + +```yaml +check_evidence_retention_policy: + - detection_patterns: + - "evidence.*retention|log.*retention|event.*retention|forensic.*retention" + - "Splunk|Elasticsearch|Loki|cloud.*log|log.*archive|S3.*log" + - "incident.*response|forensic.*readiness|IR.*preparedness" + - pass: "When the cluster audit logs and runtime events are retained for at least the organization's evidence retention period (typically 90+ days for incident response), downgrade severity. Rationale: Retained telemetry provides the evidentiary basis for post-incident forensics even without container-local artifacts." + - fail: "When logs are retained for less than 30 days or have no defined retention policy, retain severity. Rationale: Short retention periods may result in loss of forensic evidence before investigation completes." +``` + +## Resolution Path +1. Enable Kubernetes API server audit logging and export to a SIEM or long-term storage +2. Deploy runtime security monitoring (Falco, Tetragon, or Cilium) for system-call-level forensics +3. Configure structured container logging to stdout/stderr with a sidecar log shipper +4. 
Set log retention to match the organization's incident response evidence retention policy (minimum 90 days) diff --git a/skills/incident-response/ir-playbook/gates/ransomware-payment-legal-gate.md b/skills/incident-response/ir-playbook/gates/ransomware-payment-legal-gate.md new file mode 100644 index 00000000..227f28ae --- /dev/null +++ b/skills/incident-response/ir-playbook/gates/ransomware-payment-legal-gate.md @@ -0,0 +1,42 @@ +# Ransomware Payment Legal Gate + +## Purpose +Prevents false-positive findings when an incident response playbook references ransomware payment procedures, but the organization has a documented legal and board-approved ransomware payment decision framework that satisfies regulatory and insurance requirements. + +## Detection Logic + +### Trigger Conditions +Fire this gate when ALL of the following are true: +1. The IR playbook contains a section on ransomware payment decision-making or cryptocurrency acquisition procedures +2. The organization has a documented ransomware payment policy approved by legal counsel and the board of directors +3. The policy references current OFAC sanctions guidance, cyber insurance requirements, and law enforcement notification procedures + +### Gate Check: Legal Framework + +```yaml +check_ransomware_legal_framework: + - detection_patterns: + - "ransomware.*payment|ransom.*demand|decrypt.*payment|extortion.*payment" + - "cryptocurrency.*acquis|bitcoin.*purchase|coinbase.*ransom" + - "OFAC|sanctions.*check|SDN.*list|FinCEN" + - pass: "When the playbook references a board-approved ransomware payment policy that includes OFAC sanctions screening, law enforcement notification (FBI/CISA), and cyber insurance pre-approval, downgrade to informational. Rationale: A formal legal framework with sanctions compliance satisfies regulatory requirements for ransomware payment consideration." 
+ - fail: "When the playbook includes payment instructions without referencing a legal framework, OFAC compliance, or law enforcement notification procedures, retain severity. Rationale: Ransomware payment without legal framework exposes the organization to sanctions violations and regulatory penalties." +``` + +### Gate Check: Insurance Pre-Approval + +```yaml +check_insurance_pre_approval: + - detection_patterns: + - "cyber.?insurance|cyber.?policy|ransomware.*coverage|incident.*response.*retainer" + - "insurance.*pre.?approval|insurer.*notif|claim.*submi" + - "breach.*coach|legal.*counsel|outside.*counsel" + - pass: "When the playbook requires cyber insurance carrier notification and pre-approval before any payment discussion, downgrade severity. Rationale: Insurance policy terms typically require pre-approval; following this process maintains coverage." + - fail: "When the playbook allows payment discussion or authorization without requiring insurance notification, retain severity. Rationale: Making or considering ransom payment without insurance involvement may void coverage." +``` + +## Resolution Path +1. Verify the organization's ransomware payment policy is current (reviewed within 12 months) and references OFAC's advisory on ransomware payments +2. Confirm the playbook includes a step to notify law enforcement (FBI, CISA, or local cybercrime unit) before any payment decision +3. Ensure the cryptocurrency acquisition process is documented and includes sanctions screening (SDN list check) +4. 
Add the insurance pre-approval step if missing, with specific contact information for the cyber insurance claims line diff --git a/skills/incident-response/post-incident-review/SKILL.md b/skills/incident-response/post-incident-review/SKILL.md index 748fb990..9c5c78f0 100644 --- a/skills/incident-response/post-incident-review/SKILL.md +++ b/skills/incident-response/post-incident-review/SKILL.md @@ -422,6 +422,16 @@ NIST recommends conducting the PIR within several days of incident closure. Wait --- +## Limitations + +- **Blind spots:** This skill depends on available code, configuration, logs, documentation, and user-provided context; it cannot prove controls exist or threats are absent when evidence is missing, runtime-only, or outside the review scope. +- **False-positive risks:** Treat findings as hypotheses until validated against asset criticality, compensating controls, environment intent, and recent authorized changes. +- **Required evidence:** Support each finding with concrete artifacts such as file paths and line numbers, policy snippets, scanner output, logs, screenshots, control records, or reproducible steps. +- **Normalized JSON:** When machine-readable output is requested, findings MUST be available as JSON that validates against [`schemas/finding.schema.json`](../../../schemas/finding.schema.json). +- **Escalation rules:** Escalate immediately for suspected active compromise, exposed secrets, regulated-data exposure, critical exploitable vulnerabilities, privileged-access abuse, or when evidence is insufficient to safely disposition a high-impact risk. + +--- + ## 8. Prompt Injection Safety Notice This skill processes incident response data including timelines, forensic findings, communication logs, and attacker TTPs. 
The agent must adhere to the following constraints: @@ -434,6 +444,12 @@ This skill processes incident response data including timelines, forensic findin --- +## Review Gates + +The following gates provide additional false-positive filtering for common review scenarios: + +- `gates/recurrence-test-control-gate.md` — Prevents false-positive findings when PIR recommendations lack formal recurrence test controls but compensating detective controls are already in place + ## 9. References 1. **NIST SP 800-61 Rev 2** -- Computer Security Incident Handling Guide (Section 3.4: Post-Incident Activity) -- https://csrc.nist.gov/publications/detail/sp/800-61/rev-2/final diff --git a/skills/incident-response/post-incident-review/gates/recurrence-test-control-gate.md b/skills/incident-response/post-incident-review/gates/recurrence-test-control-gate.md new file mode 100644 index 00000000..7a963629 --- /dev/null +++ b/skills/incident-response/post-incident-review/gates/recurrence-test-control-gate.md @@ -0,0 +1,43 @@ +# Recurrence Test Control Validation Gate + +## Purpose +Prevents false-positive findings when post-incident review (PIR) recommendations lack formal recurrence test controls, but compensating detective/preventive controls are already in place through existing monitoring, alerting, or change management processes. + +## Detection Logic + +### Trigger Conditions +Fire this gate when ALL of the following are true: +1. A PIR recommendation to add recurrence testing controls is flagged as unimplemented +2. The incident root cause has compensating detective controls (monitoring, alerting, scheduled scans) +3. 
The team's existing change management process requires peer review and testing before production changes + +### Gate Check: Compensating Detective Controls + +```yaml +check_compensating_detective_controls: + - detection_patterns: + - "recurrence|re-occur|re.?occur|repeat.*incident" + - "regression.*test|recurrence.*test|control.*test" + - "monitor|alert|detect|scan|health.?check" + - pass: "When the incident type is already covered by a monitoring alert, scheduled vulnerability scan, or health check that would detect a recurrence, downgrade to informational. Rationale: Existing detective controls provide recurrence detection without formal test automation." + - fail: "When no compensating detective control exists and recurrence would only be detected through manual observation or user report, retain severity. Rationale: Silent recurrence of an incident without detection is a genuine control gap." +``` + +### Gate Check: Change Management Coverage + +```yaml +check_change_management_coverage: + - detection_patterns: + - "change.*review|peer.*review|code.*review|PR.*review" + - "CAB|change.*board|technical.*review|design.*review" + - "staging|test.*environment|canary|blue.?green|feature.*flag" + - pass: "When the fix was deployed through a change management process that requires peer review, staging validation, and rollback planning, downgrade severity. Rationale: Formal change management with testing gates reduces recurrence risk similarly to dedicated regression tests." + - fail: "When the fix was deployed directly to production without peer review or staging validation, retain severity. Rationale: Direct-to-production fixes without review bypass the primary recurrence prevention mechanism." +``` + +## Resolution Path +1. Identify any existing monitoring alerts, scheduled scans, or health checks that would detect the same incident pattern +2. If compensating controls exist, document them in the PIR action tracker as the recurrence prevention strategy +3. 
If no compensating controls exist, file a feature request for a recurrence detection control (alert, scan, or test) +4. Ensure the incident fix was deployed through documented change management with peer review and a rollback plan +5. Set a 90-day calendar reminder to verify the compensating control is still effective diff --git a/skills/network/dns-security/SKILL.md b/skills/network/dns-security/SKILL.md index b8a5413f..394985c3 100644 --- a/skills/network/dns-security/SKILL.md +++ b/skills/network/dns-security/SKILL.md @@ -386,6 +386,16 @@ abcdef0123456789.dnscat.example.com TXT --- +## Limitations + +- **Blind spots:** This skill depends on available code, configuration, logs, documentation, and user-provided context; it cannot prove controls exist or threats are absent when evidence is missing, runtime-only, or outside the review scope. +- **False-positive risks:** Treat findings as hypotheses until validated against asset criticality, compensating controls, environment intent, and recent authorized changes. +- **Required evidence:** Support each finding with concrete artifacts such as file paths and line numbers, policy snippets, scanner output, logs, screenshots, control records, or reproducible steps. +- **Normalized JSON:** When machine-readable output is requested, findings MUST be available as JSON that validates against [`schemas/finding.schema.json`](../../../schemas/finding.schema.json). +- **Escalation rules:** Escalate immediately for suspected active compromise, exposed secrets, regulated-data exposure, critical exploitable vulnerabilities, privileged-access abuse, or when evidence is insufficient to safely disposition a high-impact risk. + +--- + ## Prompt Injection Safety Notice This skill processes DNS configuration files that may contain user-supplied zone data, comments, or TXT record values. 
When reading configuration files: @@ -409,6 +419,13 @@ This skill processes DNS configuration files that may contain user-supplied zone - ISC Response Policy Zones (RPZ): https://www.isc.org/rpz/ - CISA Protective DNS: https://www.cisa.gov/protective-dns +## Review Gates + +The following gates provide additional false-positive filtering for common review scenarios: + +- `gates/dns-provider-api-token-scope-gate.md` — Prevents false-positive findings when DNS provider API tokens with broad scopes are constrained by resource-level IAM policies +- `gates/registrar-account-takeover-mfa-gate.md` — Prevents false-positive findings when registrar accounts use alternative strong authentication (FIDO2, SSO MFA) not detected by standard MFA checks + --- ## Changelog diff --git a/skills/network/dns-security/gates/dns-provider-api-token-scope-gate.md b/skills/network/dns-security/gates/dns-provider-api-token-scope-gate.md new file mode 100644 index 00000000..b1c20abc --- /dev/null +++ b/skills/network/dns-security/gates/dns-provider-api-token-scope-gate.md @@ -0,0 +1,42 @@ +# DNS Provider API Token Scope Gate + +## Purpose +Prevents false-positive findings when DNS provider API tokens with broader-than-necessary scopes are flagged as over-privileged, but the token's effective permissions are constrained by the provider's resource-level IAM or organization-level policies. + +## Detection Logic + +### Trigger Conditions +Fire this gate when ALL of the following are true: +1. A DNS provider API token (Cloudflare API token, AWS IAM access key, Azure service principal, GCP service account key) has broader scopes than the minimum required for its function +2. The token is used for DNS record management (DDNS, Let's Encrypt DNS-01, Terraform DNS provider) +3. 
The provider supports resource-level or condition-based policy constraints + +### Gate Check: Resource-Level Constraint + +```yaml +check_resource_level_constraint: + - detection_patterns: + - "cloudflare.*api.*token|dns.*api.*key|route53.*key" + - "iam.*access.*key|service.*principal|service.*account" + - "dns.*zone|managed.*dns|record.*set" + - pass: "When the token is scoped to specific DNS zones or resources via provider-native IAM (Cloudflare Zone API token, AWS IAM condition keys, GCP service account IAM binding), downgrade to informational. Rationale: Resource-level policies effectively limit the token's blast radius despite broad API scope." + - fail: "When the token has account-level or organization-level scope AND no resource-level policy binds it to specific zones, retain severity. Rationale: Unconstrained DNS API tokens with broad scope can modify any DNS record in the account." +``` + +### Gate Check: Token Rotation and Monitoring + +```yaml +check_token_rotation_and_monitoring: + - detection_patterns: + - "access.*key|api.*key|secret.*key|token" + - "rotation|rotat|renew|refresh" + - "cloudtrail|audit.*log|access.*log|activity.*log" + - pass: "When the token is rotated within 90 days and its usage is monitored via provider audit logs with alerts for anomalous activity, downgrade severity. Rationale: Regularly rotated tokens with audit coverage limit the window for misuse and make misuse detectable." + - fail: "When the token is older than 90 days without rotation, or audit logging is not enabled for the token's actions, retain severity. Rationale: Long-lived unmonitored tokens present an unacceptable risk of undetected credential misuse." +``` + +## Resolution Path +1. Identify the specific DNS zones or resources the token needs to manage for its documented function +2. Create a provider-scoped token or IAM policy that restricts the token to only those zones/resources +3. 
Set a 90-day rotation reminder using the provider's token expiration feature or a calendar reminder +4. Enable CloudTrail/Audit Logs for the token's API actions and set up alerts for zone deletions and record modifications diff --git a/skills/network/dns-security/gates/registrar-account-takeover-mfa-gate.md b/skills/network/dns-security/gates/registrar-account-takeover-mfa-gate.md new file mode 100644 index 00000000..7bbb46b2 --- /dev/null +++ b/skills/network/dns-security/gates/registrar-account-takeover-mfa-gate.md @@ -0,0 +1,43 @@ +# Registrar Account Takeover MFA Evidence Gate + +## Purpose +Prevents false-positive findings when domain registrar accounts are flagged as lacking MFA, but the account uses an alternative strong authentication method (FIDO2 security key, passkey, hardware TOTP, SSO with MFA) that is not detected by the standard MFA check. + +## Detection Logic + +### Trigger Conditions +Fire this gate when ALL of the following are true: +1. A domain registrar account (GoDaddy, Namecheap, Cloudflare, AWS Route53, Google Domains) is flagged as MFA-disabled +2. The registrar supports authentication methods beyond standard TOTP (FIDO2, WebAuthn, passkeys, hardware tokens, SSO federation) +3. The account has domain management privileges (ability to transfer, delete, or modify DNS) + +### Gate Check: Alternative MFA Detection + +```yaml +check_alternative_mfa_detection: + - detection_patterns: + - "(u2f|fido2|webauthn|passkey|hardware.*token|yubikey|titan.*key)" + - "(sso|saml|oidc|azure.*ad|okta|google.*workspace) (with|via|through)" + - "security.*key|platform.*authenticator|biometric" + - pass: "When the registrar account is authenticated via SSO that requires MFA at the IdP level, or has FIDO2/passkey registered as a second factor, downgrade to informational. Rationale: Alternative MFA methods provide equivalent or stronger protection than TOTP and may not be detected by standard MFA checks." 
+ - fail: "When the account relies solely on password authentication without any registered second factor or SSO, retain severity. Rationale: Password-only domain registrar access is a critical takeover risk." +``` + +### Gate Check: Account Activity Monitoring + +```yaml +check_account_activity_monitoring: + - detection_patterns: + - "registrar|domain.*manage|dns.*hosting|name.*server" + - "domain.*transfer|domain.*push|auth.*code|epp" + - "registrant.*contact|whois|domain.*lock|transfer.*lock" + - pass: "When the registrar account has out-of-band notifications (email + SMS) enabled for critical actions (transfer out, auth code request, contact change) AND a registrant lock or transfer lock is active, downgrade severity. Rationale: Layered administrative controls reduce the impact of a credential compromise even without MFA." + - fail: "When critical action notifications are not enabled or the domain lacks a registrant or transfer lock, retain severity. Rationale: Without monitoring and locks, a credential compromise can lead to a silent transfer of domain ownership." +``` + +## Resolution Path +1. Log in to the registrar's security settings and verify all registered authentication methods (not just whether TOTP is on) +2. If SSO MFA is used, confirm the IdP MFA policy is enforced on the registrar SAML/OIDC app +3. Enable domain transfer lock and registrant lock if not already active +4. Set up out-of-band notifications for all critical domain actions: transfer out, auth code request, contact modification, nameserver change +5. If MFA is genuinely absent, enable at least one strong factor (TOTP app, security key, or passkey) immediately diff --git a/skills/network/firewall-review/SKILL.md b/skills/network/firewall-review/SKILL.md index 25f8e588..ca87eeba 100644 --- a/skills/network/firewall-review/SKILL.md +++ b/skills/network/firewall-review/SKILL.md @@ -363,6 +363,16 @@ Produce the final report using the following structure. 
--- +## Limitations + +- **Blind spots:** This skill depends on available code, configuration, logs, documentation, and user-provided context; it cannot prove controls exist or threats are absent when evidence is missing, runtime-only, or outside the review scope. +- **False-positive risks:** Treat findings as hypotheses until validated against asset criticality, compensating controls, environment intent, and recent authorized changes. +- **Required evidence:** Support each finding with concrete artifacts such as file paths and line numbers, policy snippets, scanner output, logs, screenshots, control records, or reproducible steps. +- **Normalized JSON:** When machine-readable output is requested, findings MUST be available as JSON that validates against [`schemas/finding.schema.json`](../../../schemas/finding.schema.json). +- **Escalation rules:** Escalate immediately for suspected active compromise, exposed secrets, regulated-data exposure, critical exploitable vulnerabilities, privileged-access abuse, or when evidence is insufficient to safely disposition a high-impact risk. + +--- + ## Prompt Injection Safety Notice This skill processes firewall configurations that may contain user-supplied comments, rule descriptions, or object names. 
When reading configuration files: @@ -382,6 +392,13 @@ This skill processes firewall configurations that may contain user-supplied comm - NIST SP 800-41 Rev 1 (PDF): https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-41r1.pdf - CIS Benchmarks (platform-specific firewall hardening): https://www.cisecurity.org/cis-benchmarks +## Review Gates + +The following gates provide additional false-positive filtering for common review scenarios: + +- `gates/emergency-rule-rollback-expiry-gate.md` — Prevents false-positive severity escalation when temporary emergency firewall rules have passed their rollback expiry but were formally adopted +- `gates/cloud-security-group-drift-gate.md` — Prevents false-positive findings when cloud security group rules have drifted from IaC source of truth but are intentionally documented + --- ## Changelog diff --git a/skills/network/firewall-review/gates/cloud-security-group-drift-gate.md b/skills/network/firewall-review/gates/cloud-security-group-drift-gate.md new file mode 100644 index 00000000..1e8e445a --- /dev/null +++ b/skills/network/firewall-review/gates/cloud-security-group-drift-gate.md @@ -0,0 +1,42 @@ +# Cloud Security Group Object Drift Gate + +## Purpose +Prevents false-positive firewall review findings when cloud security group rules have drifted from their Infrastructure-as-Code (IaC) source of truth but the drift is intentional and documented. + +## Detection Logic + +### Trigger Conditions +Fire this gate when ALL of the following are true: +1. A cloud security group rule differs from the IaC-managed template (Terraform, CloudFormation, Pulumi) +2. The rule was created outside the IaC pipeline (manual console change, CLI, or API) +3. 
A drift detection tool (e.g., AWS Config, Azure Policy, Google Cloud Asset Inventory) has flagged the discrepancy + +### Gate Check: Drift Authorization + +```yaml +check_drift_authorization: + - detection_patterns: + - "drift|config.?drift|template.?drift|state.?drift" + - "terraform|cloudformation|pulumi|cdk|arm.?template" + - "out.?of.?band|manual.?change|console.?change|break.?glass" + - pass: "When the drifted rule has an associated change ticket or break-glass record with documented business justification and a planned IaC reconciliation date, downgrade to informational. Rationale: Authorized out-of-band changes with a documented reconciliation plan are acceptable temporary deviations." + - fail: "When no change ticket, break-glass record, or planned reconciliation exists for the drifted rule, retain original severity. Rationale: Unauthorized IaC drift is a configuration compliance violation that increases the attack surface." +``` + +### Gate Check: Drift Age and Criticality + +```yaml +check_drift_age_and_criticality: + - detection_patterns: + - "security.?group|sg|nsg|firewall.?rule|acl" + - "0\\.0\\.0\\.0/0|::/0|wildcard|any.*any|all.*traffic" + - "expos|internet.?facing|public" + - pass: "When the drifted rule is less than 72 hours old and does not open 0.0.0.0/0 or ::/0 to non-HTTP(S) ports, downgrade severity. Rationale: Recent drifts within standard change windows are likely intentional and pending IaC reconciliation." + - fail: "When the drift is older than 72 hours without reconciliation, or opens 0.0.0.0/0 or ::/0 to sensitive administrative or database ports (22, 3389, 3306, 5432, 6379, 27017), retain severity. Rationale: Stale or broad drift represents unmanaged security posture degradation." +``` + +## Resolution Path +1. Check the cloud provider's change history or API logs for the user/role that created the security group rule outside IaC +2. Determine if the change references a ticket number in its description tag or was created during an incident +3. 
If authorized: create a Terraform/CloudFormation PR to reconcile the IaC state within the documented timeline +4. If unauthorized: revert the rule immediately and investigate the source of the unmanaged change diff --git a/skills/network/firewall-review/gates/emergency-rule-rollback-expiry-gate.md b/skills/network/firewall-review/gates/emergency-rule-rollback-expiry-gate.md new file mode 100644 index 00000000..6b9960a1 --- /dev/null +++ b/skills/network/firewall-review/gates/emergency-rule-rollback-expiry-gate.md @@ -0,0 +1,42 @@ +# Emergency Rule Rollback Expiry Gate + +## Purpose +Prevents false-positive severity escalation when temporary emergency firewall rules that have passed their rollback expiry are flagged as violations, even though the change management process formally approved the permanent adoption. + +## Detection Logic + +### Trigger Conditions +Fire this gate when ALL of the following are true: +1. A firewall rule is flagged as expired, temporary, or emergency +2. The rule has an associated change request or emergency change record +3. The rule's documented rollback date/time has passed + +### Gate Check: Change Request Status + +```yaml +check_change_request_status: + - detection_patterns: + - "emergency.*change|urgent.*change|expedited.*change" + - "rollback.*date|rollback.*time|expir.*time|valid.*until" + - "RFC|CHG|INC|emergency.*ticket" + - pass: "When the associated change request has a status of Completed or Closed and the approval documented a permanent adoption decision, downgrade to informational. Rationale: The emergency rule became permanent through documented change management process." + - fail: "When the change request is still in Pending, In Progress, or the rollback is not documented as canceled, retain original severity. Rationale: An expired emergency rule without formal permanent adoption is a security control gap." 
+``` + +### Gate Check: Rule Lifecycle Documentation + +```yaml +check_rule_lifecycle_documentation: + - detection_patterns: + - "temporary.*rule|emergency.*rule|interim.*rule|hotfix.*rule" + - "rule.*cleanup|cleanup.*date|review.*date|revert.*date" + - "expir|rollback|sunset|deprecat" + - pass: "When the rule has documented evidence of post-emergency review (ticket comment, CAB minutes, rule recertification) that confirms it as intentionally permanent, downgrade severity. Rationale: Formal adoption documentation satisfies audit requirements for the rule's ongoing existence." + - fail: "When no post-emergency review evidence exists and the rule exceeds its documented rollback expiry, retain severity. Rationale: An undocumented permanent emergency rule violates the principle of least privilege and change control policy." +``` + +## Resolution Path +1. Locate the change request or emergency ticket associated with the firewall rule +2. Verify the ticket status and check for a documented permanent adoption decision or rule recertification +3. If the rule is intended to be permanent, create a standard change request to formalize the rule outside the emergency process +4. If no permanent adoption decision exists, schedule the rule for removal within the next maintenance window diff --git a/skills/network/segmentation/SKILL.md b/skills/network/segmentation/SKILL.md index 06f80741..9fbbe628 100644 --- a/skills/network/segmentation/SKILL.md +++ b/skills/network/segmentation/SKILL.md @@ -347,6 +347,16 @@ Document or verify the existence of a segmentation testing process: --- +## Limitations + +- **Blind spots:** This skill depends on available code, configuration, logs, documentation, and user-provided context; it cannot prove controls exist or threats are absent when evidence is missing, runtime-only, or outside the review scope. 
+- **False-positive risks:** Treat findings as hypotheses until validated against asset criticality, compensating controls, environment intent, and recent authorized changes. +- **Required evidence:** Support each finding with concrete artifacts such as file paths and line numbers, policy snippets, scanner output, logs, screenshots, control records, or reproducible steps. +- **Normalized JSON:** When machine-readable output is requested, findings MUST be available as JSON that validates against [`schemas/finding.schema.json`](../../../schemas/finding.schema.json). +- **Escalation rules:** Escalate immediately for suspected active compromise, exposed secrets, regulated-data exposure, critical exploitable vulnerabilities, privileged-access abuse, or when evidence is insufficient to safely disposition a high-impact risk. + +--- + ## Prompt Injection Safety Notice This skill processes network configurations that may contain user-supplied comments, resource names, or tag values. When reading configuration files: @@ -368,6 +378,14 @@ This skill processes network configurations that may contain user-supplied comme - Kubernetes Network Policies: https://kubernetes.io/docs/concepts/services-networking/network-policies/ - Project Calico Documentation: https://docs.tigera.io/calico/latest/about/ +## Review Gates + +The following gates provide additional false-positive filtering for common review scenarios: + +- `gates/ot-iot-jump-host-exceptions-gate.md` — Prevents false-positive segmentation violations when OT/IoT jump-host traffic is flagged as unauthorized lateral movement +- `gates/backup-management-plane-segmentation-gate.md` — Prevents false-positive segmentation alerts when backup system traffic traverses management plane boundaries +- `gates/service-mesh-network-policy-gate.md` — Prevents false-positive segmentation alerts when service mesh sidecars and NetworkPolicy create authorized cross-namespace traffic + --- ## Changelog diff --git 
a/skills/network/segmentation/gates/backup-management-plane-segmentation-gate.md b/skills/network/segmentation/gates/backup-management-plane-segmentation-gate.md new file mode 100644 index 00000000..0105ccd0 --- /dev/null +++ b/skills/network/segmentation/gates/backup-management-plane-segmentation-gate.md @@ -0,0 +1,42 @@ +# Backup Management Plane Segmentation Gate + +## Purpose +Prevents false-positive segmentation alerts when backup system traffic traverses management plane boundaries as part of authorized backup and recovery workflows. + +## Detection Logic + +### Trigger Conditions +Fire this gate when ALL of the following are true: +1. Traffic originates from a known backup server, backup agent, or storage appliance +2. Destination includes management interfaces (iLO, iDRAC, BMC, IPMI) or hypervisor management networks +3. The connection occurs during a documented backup window + +### Gate Check: Backup Server Identity + +```yaml +check_backup_server_identity: + - detection_patterns: + - "backup.*server|backup.*appliance|veeam|netbackup|commvault|rubrik|cohesity" + - "storage.*array|backup.*storage|tape.*library" + - "backup.*agent|backup.*proxy|media.*server" + - pass: "When the source is a recognized backup infrastructure component and the backup schedule confirms the window, downgrade to informational. Rationale: By design, backup systems require management plane access to snapshot VMs and perform restore operations." + - fail: "When the source is not in the backup infrastructure inventory, retain original severity. Rationale: Non-backup systems should not have management plane access." 
+``` + +### Gate Check: Management Protocol Necessity + +```yaml +check_management_protocol_necessity: + - detection_patterns: + - "snapshot|vm.?backup|volume.?shadow|vss|hypervisor.?api" + - "idrac|ilo|ipmi|bmc|mgmt.*network" + - "restore|recovery|failover|replication" + - pass: "When the detected management protocol is VMware vSphere API, Hyper-V WMI, or storage array replication that matches the backup tool's documented integration, downgrade severity. Rationale: These protocols are necessary for backup operations, so traffic that matches the documented integration is expected rather than anomalous." + - fail: "When the protocol is interactive management (SSH/RDP to hypervisor, IPMI shell) outside of documented break-glass procedures, retain severity. Rationale: Interactive management plane access from a backup server is anomalous." +``` + +## Resolution Path +1. Confirm the source IP/hostname is in the backup infrastructure CMDB group +2. Verify the connection timestamp falls within a documented backup or maintenance window +3. Check that the management protocol is on the approved integration list for the backup tool vendor +4. If an exception is warranted, document it with a reference to the backup architecture diagram and restore procedure diff --git a/skills/network/segmentation/gates/ot-iot-jump-host-exceptions-gate.md b/skills/network/segmentation/gates/ot-iot-jump-host-exceptions-gate.md new file mode 100644 index 00000000..df00a685 --- /dev/null +++ b/skills/network/segmentation/gates/ot-iot-jump-host-exceptions-gate.md @@ -0,0 +1,41 @@ +# OT/IoT Jump-Host Exceptions Gate + +## Purpose +Prevents false-positive segmentation violations when OT/IoT jump-host traffic is flagged as unauthorized lateral movement despite operating within approved bastion/air-gap architectures. + +## Detection Logic + +### Trigger Conditions +Fire this gate when ALL of the following are true: +1. Alert involves traffic from a jump host / bastion host to an OT or IoT subnet +2. 
Destination is a known industrial control system (ICS) or IoT device IP range +3. Source port is ephemeral (49152-65535) and the protocol is RDP/SSH/VNC or a vendor-specific SCADA protocol + +### Gate Check: Jump-Host Authorization + +```yaml +check_jump_host_authorization: + - detection_patterns: + - "jump.?host|bastion.?host|pam.?server|privileged.?access.?management" + - "(ot|ics|scada|plc|rtu) (subnet|segment|network|vlan)" + - "air.?gap|DMZ.?OT|Purdue.?level" + - pass: "When the jump host is in the authorized bastion inventory and the destination OT segment is in the documented Purdue model (Level 1-2), downgrade to informational. Rationale: Approved OT access via a managed bastion is expected behavior in segmented ICS architectures." + - fail: "When the jump host is unrecognized, unmanaged, or the OT segment is not in the documented segmentation plan, retain original severity. Rationale: Traffic from an unknown jump host to an OT segment is a genuine lateral movement signal." +``` + +### Gate Check: Protocol Allowlisting + +```yaml +check_protocol_allowlisting: + - detection_patterns: + - "(ssh|rdp|vnc|https?) (to|toward|from) (ot|ics|plc)" + - "modbus|profinet|s7|dnp3|bacnet|opc" + - pass: "When the detected protocol matches the documented allowed protocols for that OT segment's jump-host ACL, downgrade severity. Rationale: The protocol matches the authorized access pattern in the OT security policy." + - fail: "When the protocol is not in the documented allowlist for that jump-host-to-OT-segment pair, retain original severity. Rationale: An unexpected protocol to an OT segment is a genuine anomaly regardless of jump host source." +``` + +## Resolution Path +1. Verify the jump host's FQDN against the bastion inventory (CMDB or PAM system) +2. Confirm the OT subnet is in the approved segmentation plan with documented business justifications +3. Ensure the protocol in use is listed in the OT segment's ingress ACL for that jump host +4. 
Document the exception with a reference to the approved bastion and OT segment policy documents diff --git a/skills/network/segmentation/gates/service-mesh-network-policy-gate.md b/skills/network/segmentation/gates/service-mesh-network-policy-gate.md new file mode 100644 index 00000000..7f904611 --- /dev/null +++ b/skills/network/segmentation/gates/service-mesh-network-policy-gate.md @@ -0,0 +1,42 @@ +# Service Mesh and NetworkPolicy Patterns Gate + +## Purpose +Prevents false-positive segmentation alerts when service mesh sidecar proxies and Kubernetes NetworkPolicy rules create cross-namespace traffic that is authorized by the mesh control plane. + +## Detection Logic + +### Trigger Conditions +Fire this gate when ALL of the following are true: +1. Traffic crosses namespace or network segment boundaries +2. Source and destination pods are both part of a service mesh (Istio, Linkerd, Consul Connect, Cilium Mesh) +3. mTLS is enabled between the communicating services as confirmed by mesh telemetry + +### Gate Check: Service Mesh Authorization + +```yaml +check_service_mesh_authorization: + - detection_patterns: + - "istio|linkerd|consul.?connect|cilium.?mesh|kuma|nginx.?mesh" + - "sidecar|envoy|proxy|data.?plane" + - "authorization.?policy|authz.?policy|mesh.?policy" + - pass: "When both pods have sidecar proxies and an AuthorizationPolicy or equivalent allows the detected traffic, downgrade to informational. Rationale: Service mesh enforces intent-based segmentation at L7 with mutual TLS, making raw network-layer alerts redundant for authorized flows." + - fail: "When one or both pods lack sidecar proxies, or no AuthorizationPolicy matches the traffic, retain original severity. Rationale: Traffic crossing segment boundaries without mesh authorization is a genuine bypass of intended segmentation." 
+``` + +### Gate Check: NetworkPolicy Coverage + +```yaml +check_network_policy_coverage: + - detection_patterns: + - "network.?policy|netpol|k8s.?network" + - "namespace.*isolation|segment.*policy|micro.?segment" + - "egress.*policy|ingress.*policy|default.*deny" + - pass: "When the source namespace has a NetworkPolicy that explicitly allows egress to the destination namespace/port, downgrade severity. Rationale: Kubernetes NetworkPolicy provides explicit intent-based allowlisting at L3/L4, which supersedes generic segmentation rules." + - fail: "When no NetworkPolicy allows the traffic (default-deny is in effect) or the policy does not cover the detected port/protocol, retain severity. Rationale: Traffic that violates declared NetworkPolicy is a genuine segmentation bypass regardless of mesh presence." +``` + +## Resolution Path +1. Verify both communicating pods have sidecar proxies injected (`kubectl get pods -n <namespace> -o json | jq '.items[].metadata.annotations'`) +2. Check for an AuthorizationPolicy that matches source principal, destination service, and HTTP method/port +3. Validate that a NetworkPolicy in the source namespace allows egress to the destination namespace on the detected port +4. If both mesh and NetworkPolicy authorize the flow, document the exception with references to the mesh and NetPol YAML definitions diff --git a/skills/secops/alert-triage/SKILL.md b/skills/secops/alert-triage/SKILL.md index 927e7d68..aaf84f05 100644 --- a/skills/secops/alert-triage/SKILL.md +++ b/skills/secops/alert-triage/SKILL.md @@ -321,6 +321,16 @@ Waiting for complete certainty before escalating a high-priority alert costs res --- +## Limitations + +- **Blind spots:** This skill depends on available code, configuration, logs, documentation, and user-provided context; it cannot prove controls exist or threats are absent when evidence is missing, runtime-only, or outside the review scope. 
+- **False-positive risks:** Treat findings as hypotheses until validated against asset criticality, compensating controls, environment intent, and recent authorized changes. +- **Required evidence:** Support each finding with concrete artifacts such as file paths and line numbers, policy snippets, scanner output, logs, screenshots, control records, or reproducible steps. +- **Normalized JSON:** When machine-readable output is requested, findings MUST be available as JSON that validates against [`schemas/finding.schema.json`](../../../schemas/finding.schema.json). +- **Escalation rules:** Escalate immediately for suspected active compromise, exposed secrets, regulated-data exposure, critical exploitable vulnerabilities, privileged-access abuse, or when evidence is insufficient to safely disposition a high-impact risk. + +--- + ## 8. Prompt Injection Safety Notice This skill processes user-supplied content that may include alert payloads, log data, SIEM query results, and threat intelligence reports. The agent must adhere to the following safety constraints: @@ -333,6 +343,12 @@ This skill processes user-supplied content that may include alert payloads, log --- +## Review Gates + +The following gates provide additional false-positive filtering for common review scenarios: + +- `gates/llm-assisted-triage-evidence-gate.md` — Prevents false-positive downgrade of alert severity when LLM-generated triage recommendations are accepted without independent verification + ## 9. References 1. 
**NIST SP 800-61 Rev 2 -- Computer Security Incident Handling Guide** -- https://csrc.nist.gov/publications/detail/sp/800-61/rev-2/final diff --git a/skills/secops/alert-triage/gates/llm-assisted-triage-evidence-gate.md b/skills/secops/alert-triage/gates/llm-assisted-triage-evidence-gate.md new file mode 100644 index 00000000..b9176f65 --- /dev/null +++ b/skills/secops/alert-triage/gates/llm-assisted-triage-evidence-gate.md @@ -0,0 +1,41 @@ +# LLM-Assisted Triage Evidence Gate + +## Purpose +Prevents false-positive downgrade of alert severity when LLM-generated triage recommendations are accepted without independent verification of the LLM's reasoning chain and supporting evidence. + +## Detection Logic + +### Trigger Conditions +Fire this gate when ALL of the following are true: +1. The triage output references an LLM-generated analysis or recommendation as the primary disposition basis +2. The LLM's reasoning chain is not accompanied by verifiable evidence from the alert data (raw logs, SIEM fields, threat intel matches) +3. The disposition confidence is rated High or Medium without corroborating manual correlation + +### Gate Check: LLM Reasoning Trace + +```yaml +check_llm_reasoning_trace: + - detection_patterns: + - "LLM (analysis|assessment|suggests|recommends)" + - "AI-generated (triage|disposition)" + - "based on (LLM|AI|language model) (analysis|output)" + - pass: "When LLM reasoning is accompanied by step-by-step trace of evidence sources (raw event fields, matched rules, TI lookups), severity may be downgraded. Rationale: Traceable reasoning enables auditor verification of each claim." + - fail: "When LLM output is presented as a disposition without traceable evidence links, retain original severity and flag for manual review. Rationale: Untraceable AI reasoning is not auditable evidence." 
+``` + +### Gate Check: Evidence Completeness + +```yaml +check_evidence_completeness: + - detection_patterns: + - "disposition: (TP|BTP|FP)" + - "confidence: (High|Medium)" + - "escalation: (Yes|No)" + - pass: "When at least 3 of the 4 evidence sources (alert payload fields, correlated events, threat intel matches, asset context) are cited in the reasoning, downgrade is permitted. Rationale: Multi-source correlation reduces false-positive risk from single-source LLM misinterpretation." + - fail: "When fewer than 3 evidence sources support the LLM's disposition, or when only LLM text is cited, retain original severity. Rationale: Insufficient evidence for confident disposition." +``` + +## Resolution Path +1. Include the LLM's raw input (alert data sent to the model) and output (disposition, reasoning) as an appendix to the triage report +2. Cross-reference each LLM claim against at least one verifiable data source from the alert or correlated events +3. Mark the triage report with "LLM-assisted" and ensure a human analyst reviews and signs off before escalation
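
The two gate checks in the LLM-assisted triage gate can be sketched as a small decision function. This is only an illustration under stated assumptions: the `TriageReport` type, its field names, the source identifiers, and the returned strings are hypothetical and are not defined anywhere in the gate file. Combining the two checks with a logical AND is also an assumption, since the gate lists them separately.

```python
from dataclasses import dataclass, field

# The four evidence source categories named in the gate.
# The identifier spellings are hypothetical.
EVIDENCE_SOURCES = frozenset({
    "alert_payload_fields",
    "correlated_events",
    "threat_intel_matches",
    "asset_context",
})

# Threshold stated in the gate: at least 3 of the 4 sources.
MIN_SOURCES = 3


@dataclass
class TriageReport:
    """Hypothetical triage record; field names are illustrative only."""
    original_severity: str
    cited_sources: set = field(default_factory=set)
    has_reasoning_trace: bool = False


def downgrade_permitted(report: TriageReport) -> bool:
    """Return True only when both gate checks pass (assumed AND)."""
    # Count only recognized categories, so citing raw LLM text
    # does not contribute toward the threshold.
    recognized = report.cited_sources & EVIDENCE_SOURCES
    return report.has_reasoning_trace and len(recognized) >= MIN_SOURCES


def gate_decision(report: TriageReport) -> str:
    """Map the check result to a disposition label (labels are hypothetical)."""
    if downgrade_permitted(report):
        return "downgrade-permitted"
    return f"retain-{report.original_severity}-manual-review"
```

A report that cites only the alert payload and the LLM's own text would fall below the threshold, so `gate_decision` would keep the original severity and route it to manual review, matching the gate's fail branch.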