feat(learn): sanitize sensitive data in output by default #705
Open
asaphe wants to merge 1 commit into rtk-ai:develop
Add regex-based sanitization to `rtk learn` that redacts infrastructure identifiers before writing rules files or printing reports:

- AWS resource IDs (vpc-*, sg-*, subnet-*, vpce-*, i-*, etc.)
- AWS account IDs (12-digit numbers)
- Route53 hosted zone IDs
- Absolute user home paths (/Users/name/... -> ~/...)
- GitHub org/repo names in URLs

Sanitization is on by default. Use --no-sanitize to preserve raw output for debugging.

Closes rtk-ai#651

Signed-off-by: asaphe <asaphe@users.noreply.github.com>
Summary
`rtk learn --write-rules` outputs raw command pairs that often contain sensitive infrastructure identifiers. These get auto-loaded into Claude Code sessions if the generated rules file is committed.

This PR adds a `Sanitizer` that redacts sensitive patterns from all output formats by default, with user-extensible patterns via `config.toml`.

Supersedes #652
Built-in patterns (8)
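| Pattern | Replacement | Notes |
|---|---|---|
| `vpc-*`, `sg-*`, `subnet-*`, `vpce-*`, + 7 more | `{prefix}-<ID>` | `[0-9a-f]{8,17}` |
| `i-*` (EC2 instances) | `i-<ID>` | |
| Route53 hosted zone IDs | `Z<ZONE_ID>` | `Z` + `[0-9A-Z]{10,32}` |
| AWS account IDs (12-digit numbers) | `<ACCOUNT_ID>` | After `::`, `:`, or `/`; ignores bare numbers |
| `/Users/name/...`, `/home/name/...` | `~/...` | |
| `github.com/org/repo`, `repos/org/repo` | `{prefix}<org>/<repo>` | `gh api` path patterns |
| UUIDs | `<UUID>` | |
| `--repo org/repo` | `--repo <org>/<repo>` | |

User-defined patterns via config.toml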
User patterns are compiled as regexes and applied after the built-in patterns. Matches are replaced with `<REDACTED>`. Invalid patterns warn on stderr and are skipped.

Flags
Sanitization is on by default.
`--no-sanitize` restores raw output.

Design
`Sanitizer` struct constructed once, passed by `&Sanitizer` to all output functions.

- `sanitize()` returns `Cow<str>`; it uses `Option<String>` to track whether any pattern matched. Zero allocation when disabled or when no pattern fires.
- `lazy_static!` regexes match the existing `detector.rs` conventions.
- `apply!` macro eliminates repetition while correctly handling the `Option<String>` → `&str` → `Cow` borrow chain.

Changes
| File | Change |
|---|---|
| `src/learn/report.rs` | `Sanitizer` struct, 8 named regexes, `apply!` macro, updated output functions, 23 tests |
| `src/learn/mod.rs` | `Sanitizer`, sanitizes JSON output (was a gap in original code) |
| `src/config.rs` | `LearnConfig` with `sanitize_patterns: Vec<String>` |
| `src/main.rs` | `--no-sanitize` CLI flag |
| `CHANGELOG.md` | `[Unreleased]` entry |

Verification
Local (rustc 1.94.0, macOS aarch64):
- `cargo fmt --all --check`: our files clean
- `cargo clippy --all-targets`: no warnings from our code
- `cargo test`: 999 passed, 0 failed, 3 ignored
- `--write-rules` file. Verified `--no-sanitize` preserves raw. Verified `Cow::Borrowed` for disabled and no-match paths.

23 tests:
- `sanitize_redacts_aws_resource_ids`, `sanitize_redacts_ec2_instance_ids`
- `sanitize_redacts_route53_zone_ids`
- `sanitize_redacts_account_ids_after_delimiters`, `sanitize_ignores_bare_12_digit_numbers`
- `sanitize_redacts_user_home_paths`
- `sanitize_redacts_github_org_repo_in_urls`
- `sanitize_redacts_uuids`, `sanitize_redacts_uuids_in_quotes`, `sanitize_ignores_short_hex_dashes`
- `sanitize_redacts_repo_flag`
- `sanitize_preserves_commands_without_sensitive_data` (verifies `Cow::Borrowed`), `sanitize_handles_empty_input`
- `sanitize_applies_multiple_patterns_in_one_command`
- `sanitize_applies_user_patterns_from_config`, `sanitize_user_patterns_combine_with_builtins`, `sanitize_multiple_user_patterns_chain_correctly`
- `disabled_sanitizer_returns_input_unchanged` (verifies `Cow::Borrowed`)
- `format_console_report_shows_header_for_empty_rules`, `format_console_report_includes_counts_and_errors`, `format_console_report_redacts_when_sanitized`
- `write_rules_file_produces_grouped_markdown`, `write_rules_file_redacts_when_sanitized`

Closes #651
Signed-off-by: asaphe <asaphe@users.noreply.github.com>
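
For reviewers who want to see the `Cow<str>` / `Option<String>` idea from the Design notes in isolation, here is a minimal sketch. It is not the code in this PR: it uses plain substring rules from the standard library instead of the `lazy_static!` regexes, has no `apply!` macro, and the `new` constructor, the `rules` field, and the sample values (`alice`, `acme/widgets`) are hypothetical names chosen for illustration only.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for the PR's `Sanitizer`.
/// Assumption: rules are literal (needle, replacement) pairs, not regexes.
pub struct Sanitizer {
    enabled: bool,
    rules: Vec<(String, String)>,
}

impl Sanitizer {
    /// Hypothetical constructor; the real one may look different.
    pub fn new(enabled: bool, rules: &[(&str, &str)]) -> Self {
        Sanitizer {
            enabled,
            rules: rules
                .iter()
                .map(|(needle, replacement)| (needle.to_string(), replacement.to_string()))
                .collect(),
        }
    }

    /// Returns the input borrowed when disabled or when no rule matches,
    /// and an owned, rewritten copy otherwise.
    pub fn sanitize<'a>(&self, input: &'a str) -> Cow<'a, str> {
        if !self.enabled {
            return Cow::Borrowed(input);
        }

        // `None` means no rule has matched yet, so nothing has been allocated.
        let mut owned: Option<String> = None;

        for (needle, replacement) in &self.rules {
            if needle.is_empty() {
                continue;
            }
            // Work on the latest text: the rewritten copy if one exists,
            // otherwise the original input.
            let current: &str = owned.as_deref().unwrap_or(input);
            if current.contains(needle.as_str()) {
                let replaced = current.replace(needle.as_str(), replacement);
                owned = Some(replaced);
            }
        }

        match owned {
            Some(text) => Cow::Owned(text),
            None => Cow::Borrowed(input),
        }
    }
}
```

With rules such as `("/Users/alice/", "~/")`, a command that contains the home path comes back as `Cow::Owned` with the path shortened, while a command with nothing to redact comes back as `Cow::Borrowed` pointing at the caller's original string. Later rules see the output of earlier ones, which mirrors the description of user patterns being applied after the built-in ones.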