feat(comprehend): add Amazon Comprehend PII redaction utility #611

Open
EazyHood wants to merge 1 commit into arakoodev:ts from EazyHood:feat/comprehend-pii-redaction

Conversation

@EazyHood

What

Adds an Amazon Comprehend PII redaction utility to the JS SDK so sensitive
data can be stripped from a prompt before it is chained into an existing
Endpoint class (OpenAI, GeminiAI, LlamaAI, ...).

import { Comprehend } from "@arakoodev/edgechains.js/comprehend";
import { OpenAI } from "@arakoodev/edgechains.js/ai";

const comprehend = new Comprehend();
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const safePrompt = await comprehend.redact("My email is jane@doe.com"); // -> "My email is [EMAIL]"
const answer = await openai.chat({ prompt: safePrompt });

Acceptance criteria

  • New class chainable with the Endpoint classes to redact sensitive
    information in prompts — Comprehend.redact() returns a cleaned string that
    feeds straight into OpenAI.chat({ prompt }).
  • Test cases: arakoodev/src/comprehend/src/tests/... cover offset
    handling, multiple entities, custom masks, and empty/invalid input.
  • Working example: examples/redact-pii-with-comprehend demonstrates
    the redaction → LLM chain.
  • Demonstration video — happy to record/add one; flagging that the
    example needs live AWS + OpenAI credentials to run end-to-end. Let me know if
    a video is required for sign-off and I'll attach it.

Details

  • arakoodev/src/comprehend/
    • Comprehend.redact(text, { languageCode?, mask? }) — detect + redact
    • Comprehend.detectPii(text, languageCode?) — raw PiiEntity[]
    • Comprehend.applyRedaction() — pure, reverse-offset replacement so earlier
      masks never shift later entity offsets (unit-tested in isolation)
  • Registered the ./comprehend subpath export and added
    @aws-sdk/client-comprehend as a dependency.
  • Credentials follow the SDK convention: constructor options with
    AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_REGION fallbacks.
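
For reviewers who want to see why the replacement runs in reverse, here is a minimal sketch of the idea behind applyRedaction(). It is not the code in this PR: the PiiSpan and Mask types, the default "[TYPE]" mask, and the bounds check are assumptions made for illustration. Offsets are treated as JavaScript string indices with an exclusive EndOffset, and spans are assumed not to overlap.

```typescript
// Hypothetical entity shape, modeled on the Type/BeginOffset/EndOffset
// fields that DetectPiiEntities reports for each match.
interface PiiSpan {
  Type: string;
  BeginOffset: number; // inclusive index into the text
  EndOffset: number;   // exclusive index into the text (assumption)
}

// Assumption: a mask is a fixed string or a function of the entity type.
type Mask = string | ((type: string) => string);

function applyRedaction(text: string, entities: PiiSpan[], mask?: Mask): string {
  // Default mask is "[TYPE]", e.g. "[EMAIL]".
  const render = (type: string): string =>
    mask === undefined ? `[${type}]` : typeof mask === "string" ? mask : mask(type);

  // Sort a copy by start offset, highest first, so edits run from the end
  // of the string toward the beginning. Every span still to be processed
  // lies to the left of all edits made so far, so its offsets stay valid.
  const ordered = [...entities].sort((a, b) => b.BeginOffset - a.BeginOffset);

  let result = text;
  for (const e of ordered) {
    // Ignore spans that are empty, inverted, or outside the original text.
    const valid =
      e.BeginOffset >= 0 && e.EndOffset <= text.length && e.BeginOffset < e.EndOffset;
    if (!valid) continue;
    result = result.slice(0, e.BeginOffset) + render(e.Type) + result.slice(e.EndOffset);
  }
  return result;
}

// Two entities whose masks differ in length from the text they replace.
const sample = "Call 555-0100 or mail jane@doe.com";
const spans: PiiSpan[] = [
  { Type: "PHONE", BeginOffset: 5, EndOffset: 13 },
  { Type: "EMAIL", BeginOffset: 22, EndOffset: 34 },
];
console.log(applyRedaction(sample, spans)); // "Call [PHONE] or mail [EMAIL]"
```

Processing left to right would require tracking a running length delta after each mask. Working from the highest offset down avoids that bookkeeping, which is what keeps the function pure and easy to test without AWS credentials.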

/claim #290

Closes #290

Adds a Comprehend class that detects PII via Amazon Comprehend
(DetectPiiEntities) and returns a redacted prompt, ready to chain into
existing Endpoint classes (OpenAI, GeminiAI, ...).

- arakoodev/src/comprehend: Comprehend.redact() / detectPii() + static
  applyRedaction() with safe reverse-offset replacement
- tests covering offset handling, custom masks, empty/invalid input
- examples/redact-pii-with-comprehend: redaction -> OpenAI chain demo
- register ./comprehend export and @aws-sdk/client-comprehend dep

Closes arakoodev#290
@github-actions

CLA Assistant Lite bot: Thank you for your submission, we really appreciate it. Before we can accept your contribution, we ask that you sign the Arakoo Contributor License Agreement. You can sign the CLA by adding a new comment to this pull request and pasting exactly the following text.


I have read the Arakoo CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting "recheck" on this pull request.

Development

Successfully merging this pull request may close these issues.

BOUNTY: integrate AWS Comprehend as a utility to redact data
