
Conversation

@rootranjan

Fixes #4631

Description:

Reduce false positives in DatadogToken detector by filtering out legitimate code identifiers, checksums, encrypted data, and test values that match the detector pattern.

Changes:

  • Add filter to exclude letters-only matches (no digits)
  • Add filter to exclude repeated characters (test/placeholder values)
  • Add filter to exclude NPM integrity hashes (sha512-...== patterns)
  • Add filter to exclude Go module checksums (h1:...= patterns)
  • Add filter to exclude URL-encoded paths (%3A patterns)
  • Add filter to exclude SOPS-encrypted data (ENC[AES256_GCM,data:...] patterns)
  • Add filter to exclude base64-encoded certificates (caBundle patterns)
  • Fix lint errors by properly handling res.Body.Close() errors

This reduces false positives from legitimate code identifiers, checksums, encrypted data, and test values while still detecting real Datadog API and Application keys that contain digits and have higher entropy.

Problem:
The DatadogToken detector was flagging any 32-character or 40-character alphanumeric string near the keywords "datadog" or "dd" as a potential secret, including:

  • URL-encoded service names in paths (e.g., service%3Amy-app-service-name)
  • NPM package integrity hashes (e.g., substrings from sha512-...== patterns)
  • Go module checksums (e.g., substrings from h1:...= patterns)
  • SOPS-encrypted data (e.g., substrings from ENC[AES256_GCM,data:...] patterns)
  • Test/placeholder values (e.g., 11111111111111111111111111111111)
  • Base64-encoded certificates (e.g., substrings from caBundle fields)
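The keyword-proximity matching described above can be approximated as follows. This is an illustrative sketch for the 32-character API-key case only; the variable name apiKeyPat and the exact regex are assumptions for illustration, not the detector's actual source:

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative approximation of a keyword-proximity pattern: a "datadog" or
// "dd" keyword within ~40 characters of a 32-char alphanumeric string.
// The real detector's regexes may differ in detail.
var apiKeyPat = regexp.MustCompile(`(?i)(?:datadog|dd)[\w.:%\s-]{0,40}?\b([a-zA-Z0-9]{32})\b`)

func main() {
	line := "dd_api_key: abcdef0123456789abcdef0123456789"
	if m := apiKeyPat.FindStringSubmatch(line); m != nil {
		fmt.Println(m[1]) // the captured 32-char candidate
	}
}
```

Because the pattern accepts any 32-char alphanumeric run near the keyword, all of the benign strings listed above can fall into the capture group.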

Solution:
Added isLikelyFalsePositive() helper function with multiple filters:

  1. Letters-only filter - Excludes strings with no digits (service names/identifiers)
  2. Repeated characters filter - Excludes test/placeholder values like 11111111111111111111111111111111
  3. NPM integrity hash filter - Detects sha512-...== patterns in package.json files
  4. Go module checksum filter - Detects h1:...= patterns in go.sum/go.mod files
  5. URL-encoded path filter - Detects %3A patterns and URL structures
  6. SOPS-encrypted data filter - Detects ENC[AES256_GCM,data:...] patterns
  7. Base64 certificate filter - Detects caBundle and certificate-related fields
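Putting the filters together, the helper might look like the following sketch. Only the first two checks are shown in full; the context-marker loop is a simplified stand-in for the PR's individual context filters, and is an assumption rather than the PR's actual code:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// hasDigit reports whether s contains at least one digit.
func hasDigit(s string) bool {
	for _, r := range s {
		if unicode.IsDigit(r) {
			return true
		}
	}
	return false
}

// isRepeatedCharacter reports whether s is one character repeated.
func isRepeatedCharacter(s string) bool {
	if len(s) == 0 {
		return false
	}
	for i := 1; i < len(s); i++ {
		if s[i] != s[0] {
			return false
		}
	}
	return true
}

// isLikelyFalsePositive applies the cheap match-only filters first, then the
// context-based ones; context is the text surrounding the match.
func isLikelyFalsePositive(match, context string) bool {
	if !hasDigit(match) { // letters-only: service names, identifiers
		return true
	}
	if isRepeatedCharacter(match) { // placeholders like "1111..."
		return true
	}
	// Simplified stand-ins for the NPM/Go/URL/SOPS/certificate filters.
	for _, marker := range []string{"sha512-", "h1:", "%3A", "ENC[AES256_GCM", "caBundle"} {
		if strings.Contains(context, marker) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isLikelyFalsePositive("11111111111111111111111111111111", ""))         // true
	fmt.Println(isLikelyFalsePositive("a1b2c3d4e5f6a7b8c9d0a1b2c3d4e5f6", "dd_key=")) // false
}
```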

Implementation Details:

  • Modified FromData() to use FindAllStringSubmatchIndex() to get match positions for context extraction
  • Added context-aware filtering that checks the surrounding text (±200 chars for most patterns, ±2000 chars for certificates)
  • Filters are applied before processing matches to avoid unnecessary verification calls
  • Each filter function extracts context around the match and checks for specific patterns (e.g., sha512-, h1:, %3A, ENC[, caBundle)
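The context extraction step described above can be sketched as follows. The window sizes match the PR description; the function name contextAround is a hypothetical stand-in, not the PR's actual helper:

```go
package main

import (
	"fmt"
	"strings"
)

// contextAround returns up to window bytes on each side of the match
// occupying [start, end) in data, clamped to the bounds of data.
func contextAround(data string, start, end, window int) string {
	lo := start - window
	if lo < 0 {
		lo = 0
	}
	hi := end + window
	if hi > len(data) {
		hi = len(data)
	}
	return data[lo:hi]
}

func main() {
	data := "integrity: sha512-xyz token=a1b2c3d4e5f6a7b8c9d0a1b2c3d4e5f6 end"
	token := "a1b2c3d4e5f6a7b8c9d0a1b2c3d4e5f6"
	start := strings.Index(data, token)
	// A ±200-char window on this short input returns the whole string,
	// so the "sha512-" marker is visible to the context filters.
	fmt.Println(contextAround(data, start, start+len(token), 200))
}
```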

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint; requires golangci-lint)?

@rootranjan rootranjan requested a review from a team December 31, 2025 13:49
@rootranjan rootranjan requested a review from a team as a code owner December 31, 2025 13:49

@nabeelalam nabeelalam left a comment


Hey @rootranjan! Thanks for proposing a solution for the false positive issues in this detector.

I'm all for testing for entropy and whether the tokens contain all required characters, but I feel the rest of the checks may be adding some complexity and performance hits without a strong guarantee of detecting false positives.

Introducing 4 more regular expressions, along with the slicing/matching, will definitely impact the performance of this detector; we generally like to keep detectors light since inputs can be large.

Plus, several filters seem a little too broad to me (e.g., %3A anywhere in a 200-char range) and could suppress actual secrets by marking them as false positives; it's better to be safe than sorry in this case. With the added complexity, it would also be harder to debug those cases.

I would suggest keeping the changes minimal: keep the lighter-weight constraints (required characters, entropy threshold) that you have added, and add only a couple of high-confidence exclusions.

Comment on lines +238 to +251
// isRepeatedCharacter checks if a string consists of the same character repeated.
// This filters out test/placeholder values like "11111111111111111111111111111111"
func isRepeatedCharacter(s string) bool {
	if len(s) == 0 {
		return false
	}
	firstChar := s[0]
	for i := 1; i < len(s); i++ {
		if s[i] != firstChar {
			return false
		}
	}
	return true
}

We can use an entropy test for this
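A Shannon entropy check covers the repeated-character case and more: a string of one repeated byte scores exactly 0, while real keys score far higher. A minimal sketch of such a test:

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the entropy of s in bits per byte.
// One repeated character yields 0; random-looking hex is near 4.
func shannonEntropy(s string) float64 {
	if len(s) == 0 {
		return 0
	}
	counts := make(map[byte]int)
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	var entropy float64
	n := float64(len(s))
	for _, c := range counts {
		p := float64(c) / n
		entropy -= p * math.Log2(p)
	}
	return entropy
}

func main() {
	fmt.Println(shannonEntropy("11111111111111111111111111111111")) // 0
	fmt.Println(shannonEntropy("a1b2c3d4e5f6a7b8c9d0a1b2c3d4e5f6") > 3)
}
```

A threshold somewhere around 3 bits per byte would reject repeated and low-variety placeholders while passing genuine 32-char hex keys; the exact cutoff would need tuning against real samples.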

Comment on lines +105 to +112
func hasDigit(s string) bool {
	for _, r := range s {
		if unicode.IsDigit(r) {
			return true
		}
	}
	return false
}

I suppose we should check whether the token contains at least one lower-case and upper-case letter as well. I'm not entirely sure but the entropy check might also solve this (I doubt it).
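A character-class check like the one described could look like this sketch (a hypothetical companion to hasDigit, not code from the PR; whether real Datadog keys actually require mixed case would need verifying before adopting it):

```go
package main

import (
	"fmt"
	"unicode"
)

// hasMixedCase reports whether s contains at least one lower-case and one
// upper-case letter, on top of the existing hasDigit requirement.
func hasMixedCase(s string) bool {
	var lower, upper bool
	for _, r := range s {
		switch {
		case unicode.IsLower(r):
			lower = true
		case unicode.IsUpper(r):
			upper = true
		}
		if lower && upper {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasMixedCase("aB3dEf"))           // true
	fmt.Println(hasMixedCase("abcdef0123456789")) // false
}
```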


Successfully merging this pull request may close these issues.

DatadogToken detector produces false positives for checksums, encrypted data, and service names
