-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Reduce false positives in DatadogToken detector #4632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Reduce false positives in DatadogToken detector #4632
Conversation
Add filters to exclude legitimate code patterns: - Letters-only matches (service names/identifiers) - Repeated characters (test/placeholder values) - NPM integrity hashes (sha512-...== patterns) - Go module checksums (h1:...= patterns) - URL-encoded paths (%3A patterns) - SOPS-encrypted data (ENC[AES256_GCM,data:...] patterns) - Base64-encoded certificates (caBundle patterns) This reduces false positives while still detecting real Datadog API and Application keys.
nabeelalam
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @rootranjan! Thanks for proposing a solution for the false positive issues in this detector.
I'm all for testing for entropy and whether the tokens contain all required characters, but I feel the rest of the checks may be adding some complexity and performance hits without a strong guarantee of detecting false positives.
Introducing 4 more regular expressions along with the slicing/matching definitely will definitely impact the performance on this detector, we generally like to keep detectors light since inputs can be large.
Plus, several filters seem a little to broad to me (e.g., %3A anywhere in the range of 200 chars) and could possibly suppress actual secrets and mark them false and it's better to be safe than sorry in this case. Plus with the added complexity it would be harder to debug those cases too.
I would suggest keeping changes minimal, like the lighter weight constraints (required chars, entropy threshold) that you have added and add a couple of high-confidence exclusions only.
| // isRepeatedCharacter checks if a string consists of the same character repeated. | ||
| // This filters out test/placeholder values like "11111111111111111111111111111111" | ||
| func isRepeatedCharacter(s string) bool { | ||
| if len(s) == 0 { | ||
| return false | ||
| } | ||
| firstChar := s[0] | ||
| for i := 1; i < len(s); i++ { | ||
| if s[i] != firstChar { | ||
| return false | ||
| } | ||
| } | ||
| return true | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use an entropy test for this
| func hasDigit(s string) bool { | ||
| for _, r := range s { | ||
| if unicode.IsDigit(r) { | ||
| return true | ||
| } | ||
| } | ||
| return false | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we should check whether the token contains at least one lower-case and upper-case letter as well. I'm not entirely sure but the entropy check might also solve this (I doubt it).
Fixes #4631
Description:
Reduce false positives in DatadogToken detector by filtering out legitimate code identifiers, checksums, encrypted data, and test values that match the detector pattern.
Changes:
sha512-...==patterns)h1:...=patterns)%3Apatterns)ENC[AES256_GCM,data:...]patterns)caBundlepatterns)res.Body.Close()errorsThis reduces false positives from legitimate code identifiers, checksums, encrypted data, and test values while still detecting real Datadog API and Application keys that contain digits and have higher entropy.
Problem:
The DatadogToken detector was flagging any 32-character or 40-character alphanumeric string near the keywords "datadog" or "dd" as a potential secret, including:
service%3Amy-app-service-name)sha512-...==patterns)h1:...=patterns)ENC[AES256_GCM,data:...]patterns)11111111111111111111111111111111)caBundlefields)Solution:
Added
isLikelyFalsePositive()helper function with multiple filters:11111111111111111111111111111111sha512-...==patterns in package.json filesh1:...=patterns in go.sum/go.mod files%3Apatterns and URL structuresENC[AES256_GCM,data:...]patternscaBundleand certificate-related fieldsImplementation Details:
FromData()to useFindAllStringSubmatchIndex()to get match positions for context extractionsha512-,h1:,%3A,ENC[,caBundle)Checklist:
make test-community)?make lintthis requires golangci-lint)?