Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
278 changes: 119 additions & 159 deletions docs/privacy-logging.md
Original file line number Diff line number Diff line change
@@ -1,30 +1,17 @@
# Privacy Logging Guidelines
# Privacy Logging

## Overview
Disciplr is committed to protecting user data. To ensure that sensitive Personally Identifiable Information (PII) and credentials are never written to long-term storage via logs, we have implemented **Pino-based structured logging** with automatic redaction of sensitive fields.

## Architecture
`src/middleware/privacy-logger.ts` implements privacy-hardened HTTP request logging. It:

### Structured JSON Logging with Pino
The backend now uses **Pino** for efficient, structured JSON logging that is:
- **Machine-readable**: Emits single-line JSON per log event for easy ingestion into log aggregators (Datadog, ELK, Grafana Loki, etc.)
- **Secure by default**: Sensitive fields are automatically redacted via Pino's `redact` configuration
- **Developer-friendly**: Pretty-printed output in development for readability
- **Zero-overhead**: Minimal performance impact compared to console logging
- Recursively redacts all PII from request bodies, query strings, and headers before emitting any log output.
- Emits **exactly one structured JSON line per request** to `stdout` via `console.log`, on response finish.
- Exports a standalone `redact()` utility that any module can call.
- Never mutates the original request object.

### Two-Layer Redaction Strategy
1. **Pino built-in redaction** (`src/middleware/logger.ts`): Automatically redacts fields matching configured paths
2. **Explicit redaction engine** (`src/middleware/privacy-logger.ts`): Additional `redact()` function for explicit control and backward compatibility
## Log Schema

### Correlation IDs
All logs include correlation IDs (from `x-correlation-id` or `x-request-id` headers) for end-to-end request tracing:
```json
{
"correlationId": "550e8400-e29b-41d4-a716-446655440000",
"event": "http.request",
"durationMs": 45
}
```
Every log line has exactly these top-level keys — no more, no less.

## Redaction Policy

Expand Down Expand Up @@ -79,163 +66,109 @@ The redaction engine is recursive and works safely across:

### Request Logger (`src/middleware/requestLogger.ts`)
Emits structured JSON for every HTTP request:

```json
{
"correlationId": "550e8400-e29b-41d4-a716-446655440000",
"timestamp": "2024-06-18T22:00:00.000Z",
"level": "info",
"event": "http.request",
"req": {
"method": "POST",
"url": "/api/vaults",
"path": "/api/vaults",
"headers": { "authorization": "***REDACTED***" },
"body": { "email": "***REDACTED***", "amount": 1000 },
"userId": "user123",
"userRole": "admin"
},
"res": { "statusCode": 201 },
"service": "disciplr-backend",
"method": "POST",
"url": "/api/auth/login",
"status": 200,
"durationMs": 45,
"msg": "POST /api/vaults 201 45ms"
"ip": "10.20.x.x",
"body": { "email": "[REDACTED]", "amount": 100 },
"query": null,
"headers": { "content-type": "application/json", "authorization": "[REDACTED]" }
}
```

**Log Level Selection**:
- `error` (5xx status codes)
- `warn` (4xx status codes)
- `info` (2xx status codes)
- `debug` (1xx status codes)
The schema is snapshot-tested in `src/tests/privacy-logger.redaction.test.ts`.

### Privacy Logger (`src/middleware/privacy-logger.ts`)
Emits privacy-focused events with IP masking:
```json
{
"correlationId": "550e8400-e29b-41d4-a716-446655440000",
"event": "privacy.request_logged",
"ip": {
"original": "192.168.1.1",
"masked": "192.168.x.x"
},
"request": {
"method": "POST",
"url": "/api/test",
"headers": { "authorization": "***REDACTED***" },
"body": { "email": "***REDACTED***" }
},
"timestamp": "2025-06-02T14:32:10.000Z",
"msg": "Privacy-logged: POST /api/test"
}
```
## Redaction Marker

**IP Masking**:
- IPv4: `192.168.1.1` → `192.168.x.x` (mask last 2 octets)
- IPv6: `2001:0db8:85a3::` → `2001:0db8:85a3:xxxx:xxxx:xxxx:xxxx:xxxx` (mask last 5 groups)
Sensitive values are replaced with the string `"[REDACTED]"` (exported as `REDACTED`).

## Configuration
## What Gets Redacted

### Logger Setup (`src/middleware/logger.ts`)
```typescript
export const logger = createLogger()
```
### Sensitive Field Names (case-insensitive key match)

**Environment Variables**:
- `NODE_ENV` — Enables pretty-printing in `development` mode
- `LOG_LEVEL` — Set minimum log level (`debug`, `info`, `warn`, `error`; default: `info`)
| Key | Why |
|-----|-----|
| `password`, `passwordHash` | Credentials |
| `token`, `accessToken`, `refreshToken` | Auth tokens |
| `apiKey`, `api_key` | API keys |
| `secret`, `credential`, `credentials` | Generic secrets |
| `authorization` | Auth header |
| `x-api-key`, `x-auth-token` | Custom auth headers |
| `cookie` | Session cookies |
| `ssn` | Social Security Number |
| `creditCard`, `credit_card`, `cvv`, `pin` | Payment data |
| `email` | Email address |
| `clientSecret` | OAuth secret |
| `creator`, `successDestination`, `failureDestination` | Vault addresses |

### In Development
Logs are pretty-printed with colors and indentation for readability:
```
INFO (disciplr-backend): POST /api/vaults 201 45ms
req: {
"method": "POST",
"url": "/api/vaults"
}
```
### PII Patterns (applied to string values regardless of key name)

### In Production
Logs are emitted as single-line JSON:
```
{"correlationId":"550e8400...","event":"http.request","req":{...},"res":{"statusCode":201},"durationMs":45}
```
| Pattern | Example |
|---------|---------|
| Email address (`/[^@\s]+@[^@\s]+\.[^@\s]+/`) | `user@example.com` |
| JWT (`/^[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+$/`) | `eyJ...` |

## Integration with Log Aggregators
## IP Masking

### Example: Datadog
```bash
# Install Datadog agent on your infrastructure
# Provide Datadog API key

# Datadog will automatically ingest JSON logs and parse fields:
service: disciplr-backend
correlationId: 550e8400-e29b-41d4-a716-446655440000
event: http.request
req.method: POST
```
| Input | Output |
|-------|--------|
| `192.168.1.1` (IPv4) | `192.168.x.x` |
| `2001:0db8:85a3::7334` (IPv6) | `2001:0db8:85a3:xxxx:xxxx:xxxx:xxxx:xxxx` |
| empty / unparseable | `unknown` |

### Example: Grafana Loki
Configure Promtail to scrape and parse JSON:
```yaml
scrape_configs:
- job_name: disciplr-backend
static_configs:
- targets:
- localhost
labels:
job: disciplr-backend
relabel_configs:
- source_labels: [__address__]
target_label: __param_target
```
## Exports

### Example: ELK Stack
Logstash will parse JSON automatically:
```json
{
"correlationId": "550e8400-e29b-41d4-a716-446655440000",
"event": "http.request",
"req.method": "POST"
}
```typescript
import { redact, maskIp, shouldRedact, privacyLogger, REDACTED } from './middleware/privacy-logger.js'
```

## Adding New Redactions
### `redact<T>(value: T): T`

### Option 1: Add to Pino Redact Paths (Automatic)
Edit `src/middleware/logger.ts` and add the path to the `redact.paths` array:
```typescript
redact: {
paths: [
'req.body.newSensitiveField', // Add here
// ...existing paths
]
}
```
Deep-copies `value` and replaces every sensitive field value and every string matching a PII pattern with `REDACTED`. Input is never mutated. Handles circular references, `Date`, `RegExp`, `Buffer`, nested objects, and arrays.

### Option 2: Add to Explicit Redaction List (Backward Compat)
Edit `src/middleware/privacy-logger.ts` and add the field key to `SENSITIVE_FIELDS`:
```typescript
const SENSITIVE_FIELDS = new Set([
'email',
'password',
'newSensitiveField', // Add here
// ...existing fields
])
redact({ password: 'secret', amount: 100 })
// => { password: '[REDACTED]', amount: 100 }

redact({ nested: { email: 'a@b.com' } })
// => { nested: { email: '[REDACTED]' } }
```

## Accessing Logs in Downstream Handlers
### `maskIp(ip: string): string`

Returns a partially masked IP string (see table above).

### `shouldRedact(key: string): boolean`

Returns `true` if the key name (case-insensitive) is in the sensitive-field list.

### `privacyLogger`

Express middleware. Register it after body parsers and before routes:

Request handlers can use the injected logger for consistent structured logging:
```typescript
import { Request, Response, NextFunction } from 'express'
app.use(express.json())
app.use(privacyLogger)
app.use('/api', router)
```

export const myHandler = (req: Request, res: Response, next: NextFunction) => {
const logger = (req as any).logger
const correlationId = (req as any).correlationId
## Error Path

logger.info({ event: 'my_event', data: {...} }, 'Processing request')
res.json({ message: 'Success' })
}
If log serialization fails for any reason, a minimal safe fallback is emitted and `next()` is still called:

```json
{ "level": "error", "event": "privacy-logger.serialization-failure", "timestamp": "..." }
```

All logs from the same request will automatically share the correlation ID.
No request data is included in the fallback.

## Privacy Endpoint Security

Expand All @@ -250,26 +183,53 @@ The `GET /api/privacy/export` and `DELETE /api/privacy/account` endpoints implem

## Development vs Production

Redaction runs in **all environments** (development, staging, production) to:
- Prevent accidental ingestion of PII into development databases or logs
- Ensure parity in testing across environments
- Maintain security posture uniformly
## Adding New Sensitive Fields

Edit `SENSITIVE_KEYS` in `src/middleware/privacy-logger.ts`:

```typescript
const SENSITIVE_KEYS = new Set([
// ... existing keys ...
'myNewSensitiveField',
])
```

Update the snapshot after changing the set:

Debugging should rely on non-sensitive identifiers:
- User IDs: `user123` (visible)
- Vault IDs: `vault456` (visible)
- Transaction references: `tx789` (visible)
- Email addresses: `***REDACTED***` (hidden)
- API keys: `***REDACTED***` (hidden)
```bash
npx jest src/tests/privacy-logger.redaction.test.ts --updateSnapshot
```

## Testing

Run privacy logger tests to verify redaction coverage:
```bash
# Run the hardened redaction test suite
npx jest src/tests/privacy-logger.redaction.test.ts

# Update snapshot after intentional schema changes
npx jest src/tests/privacy-logger.redaction.test.ts --updateSnapshot

npm test -- src/tests/privacy-logger.test.ts
npm test -- src/tests/exportQueue.pii.test.ts
```

Test coverage includes:

- Primitive passthrough
- Email and JWT value-pattern redaction
- All sensitive key names (case-insensitive)
- Nested object and array redaction
- Deeply nested PII
- Circular reference protection
- No input mutation
- `Date`, `RegExp`, `Buffer` serialization
- `maskIp` IPv4 / IPv6 / unknown
- Middleware schema (exact top-level keys)
- `null` body and `null` query
- Header redaction (`authorization`, `x-api-key`, `x-auth-token`, `cookie`)
- Serialization-failure fallback
- Snapshot of a representative request

Coverage includes:
- ✅ Sensitive field redaction at all nesting levels
- ✅ IP masking (IPv4 and IPv6)
Expand Down Expand Up @@ -297,4 +257,4 @@ This logging architecture supports compliance with:
- **GDPR** — Redaction prevents PII leakage to log storage; right to access and erasure endpoints are rate-limited and ownership-gated
- **HIPAA** — Sensitive fields are never stored in unencrypted logs
- **SOC 2** — Structured logging enables audit trail generation
- **PCI DSS** — Passwords, tokens, and API keys are redacted
- **PCI DSS** — Passwords, tokens, and API keys are redacted
Loading
Loading