Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions docs/.vuepress/config.js
Original file line number Diff line number Diff line change
Expand Up @@ -206,6 +206,7 @@ module.exports = {
['standards/versioning', 'Versioning'],
['standards/creating-new-features', 'Creating New Features'],
['standards/triaging-bugs', 'Triaging Bugs'],
['standards/logging-std', 'Logging Standards'],
]
},
{
Expand Down
53 changes: 53 additions & 0 deletions docs/community/standards/logging-std.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
# Mojaloop Logging Standard

**Version:** 1.0.0

## Purpose

This document defines the logging standard for Mojaloop projects to ensure:
- **Consistent log levels** across all services
- **OpenTelemetry compatibility** for unified observability
- **Distributed tracing integration** via trace context propagation
- **Actionable logs** in production environments
- **Efficient debugging** without log noise
- **Performance optimization** through appropriate log levels
- **Security** by preventing sensitive data exposure

## Standard Components

The standard is broken down into the following sections:

* [Log Levels](./logging/log_levels.md) - Definitions, OTel mapping, and decision trees.
* [Data Model](./logging/data_model.md) - JSON structure, required fields, and naming conventions.
* [Trace Context](./logging/trace_context.md) - Propagation and correlation rules.
* [Security](./logging/security.md) - PII and sensitive data handling.
* [Best Practices](./logging/best_practices.md) - Performance, anti-patterns, and implementation guidelines.

## Quick Reference Summary

| Level | OTel Severity | Production | Use Case | Example |
|-------|---------------|-----------|----------|---------|
| FATAL | 21-24 | ✅ Always | Service outage | DB unreachable, panic |
| ERROR | 17-20 | ✅ Always | Operation failures | Transaction failed |
| WARN | 13-16 | ✅ Always | Recoverable issues | Validation failure, retry |
| INFO | 9-12 | ✅ Always | Business events | Transfer completed |
| VERBOSE | 7-8 | ⚠️ Sampled| Operational noise | Health check, keep-alive |
| DEBUG | 5-6 | ⚠️ Temporarily | Troubleshooting | Function calls, logic |
| TRACE | 1-4 | ❌ Rarely | Deep diagnosis | Variable state, loop dump |


**Default production setting: INFO level (SeverityNumber >= 9)**
- Captures all important business events
- Minimizes performance impact
- Reduces log noise
- Can be changed to DEBUG temporarily for troubleshooting
- Compatible with OpenTelemetry severity filtering

## Extended Scenarios

Specific standards for common scenarios are available in the specific standards documents:

* [HTTP Requests](./logging/scenarios/http_requests.md) - Standard for incoming and outgoing HTTP logging.
* [Error Handling](./logging/scenarios/error_handling.md) - Rules for logging exceptions and propagating errors.
* [Database Queries](./logging/scenarios/sql_queries.md) - Guidelines for logging SQL and DB interactions.
* [Kafka Messaging](./logging/scenarios/kafka_messaging.md) - Guidelines for producing and consuming Kafka messages.
240 changes: 240 additions & 0 deletions docs/community/standards/logging/best_practices.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,240 @@
# Best Practices and Guidelines

This document provides guidelines on implementation, performance optimization, and common anti-patterns to avoid.

## Production Logging Guidelines

### Performance Considerations
- **Avoid expensive operations at DEBUG/TRACE** - These should be cheap since they may be enabled temporarily
- **Use `loggerFactory` from contextLogger** - It checks `is<Level>Enabled` internally before calling Winston, so log arguments are only serialized when the level is active
- **Use `isDebugEnabled()` only for expensive computation** - Guard with `isDebugEnabled()` only when the arguments themselves require expensive work that exists solely for debugging
- **Structured logging** - Log objects, not concatenated strings
- **Avoid JSON.stringify** - Let the logging library handle serialization

### Examples of Performance-Aware Logging

**Bad:**
```javascript
// Always executes stringify even if DEBUG is disabled
logger.debug(`Processing: ${JSON.stringify(largeObject)}`);
```

**Good:**
```javascript
// contextLogger handles level checks internally — just log
logger.debug('Processing transfer', { transfer });

// Use isDebugEnabled() ONLY when guarding expensive computation
if (logger.isDebugEnabled()) {
logger.debug('Detailed state', { computedState: computeExpensiveState() });
}
```

### Metrics vs. Logs

Do **NOT** use logs for counting volume or calculating success rates. Use **Metrics** (Counters, Histograms) for throughput, latency, and error rate tracking.
Use **Logs** for high-cardinality details that cannot be captured in metrics (e.g., specific transaction IDs, error reasons).


## Infrastructure and Observability

### Correlation IDs and Trace Context

Winston's OTel auto-instrumentation (`@opentelemetry/instrumentation-winston`) automatically injects `trace_id`, `span_id`, and `trace_flags` into every log entry. Do **not** add them manually — duplicate or stale IDs cause correlation errors.

Use child loggers for **business-level** correlation IDs only (request ID, transfer ID, batch ID):

```javascript
// trace_id and span_id are injected automatically by OTel instrumentation
// Only add business identifiers via child loggers
const log = logger.child({ requestId: req.id, transferId: req.params.id })
log.info('Processing request')
```

### Log to Stdout

Write all logs to stdout. Do **not** use `console.log` or write to files directly. Delegate collection, rotation, and shipping to external tools (Fluentd, Vector, OTel Collector). In containerized environments, a sidecar or daemonset picks up stdout, enriches it with pod/host metadata, and ships it to a centralized store.

### OpenTelemetry Semantic Conventions

Use OTel's standard attribute names so logs, traces, and metrics correlate without per-service mapping:

| Domain | Attributes |
|--------|-----------|
| HTTP | `http.request.method`, `url.path`, `http.response.status_code` |
| Database | `db.system`, `db.statement`, `db.operation.name` |
| Errors | `error.type`, `error.message`, `error.stack_trace` |
| Messaging | `messaging.system`, `messaging.destination.name` |

When OTel renames or deprecates an attribute, update logging code to match.

### Canonical Log Lines

Emit one wide log entry per request at the HTTP boundary. Include method, route, status code, duration, and key business identifiers. This single entry supports latency analysis, error rate calculation, and audit without aggregation.

```javascript
logger.info(`${req.method} ${req.path} ${res.statusCode}`, {
'http.request.method': req.method,
'url.path': req.path,
'http.response.status_code': res.statusCode,
'http.server.request.duration': duration,
transferId
})
```

### Redaction at Logger Level

Apply redaction at the logger configuration — not at each call site — so new fields cannot leak by accident. Declare paths to mask (e.g., `authorization`, `password`, `token` fields). Log user IDs and entity references instead of full user objects. Dump sanitized configuration at startup: mark sensitive fields so they display as asterisks.

### Per-Component Log Levels

Use `setLevel()` on a child logger to change its verbosity without affecting the parent or siblings. This lets you debug one subsystem without raising the noise floor for everything else.

```javascript
// Only database logs go to debug; parent and siblings stay at their original level
const dbLog = logger.child({ component: 'database' })
dbLog.setLevel('debug')
```

### Dedicated `LOG_LEVEL_{DOMAIN}` Env Vars

When a shared library creates its own internal logger, control its level via a `LOG_LEVEL_{DOMAIN}` env var with a default of `'info'`. This lets operators tune library verbosity without changing the service's log level.

```javascript
const { LOG_LEVEL_KAFKA = 'info' } = require('node:process').env
const logger = loggerFactory('ml-kafka')
logger.setLevel(LOG_LEVEL_KAFKA)

// Usage: LOG_LEVEL_KAFKA=debug npm start
// Service stays at 'info', Kafka internals log at 'debug'
```

## Anti-Patterns to Avoid

### 1. Using console.log
```javascript
// ❌ BAD
console.log('Some message')

// ✅ GOOD
logger.info('Some message')
```

### 2. Logging Everything at INFO
```javascript
// ❌ BAD - Internal details at INFO, generic messages
logger.info('Entering validateTransfer function');
logger.info('Retrieved account from database');
logger.info('Balance check passed');

// ✅ GOOD - Only significant events at INFO with context
logger.info(`Transfer ${transferId} validated successfully for ${amount} ${currency}`, {
operation: 'validateTransfer',
eventName: 'TransferValidated',
transferId: transferId,
'transfer.amount': amount,
'transfer.currency': currency
});
```

### 3. Missing Context in Message
```javascript
// ❌ BAD - Message doesn't explain what happened
logger.error('Validation failed');

// ✅ GOOD - Descriptive message with inline context plus structured attributes
// Trace context is automatic, don't add manually!
logger.error(`Transfer ${transfer.id} validation failed at step '${step}': ${validationErrors.join(', ')}`, {
operation: 'validateTransfer',
eventName: 'ValidationFailed',
transferId: transfer.id,
validationStep: step,
validationErrors: validationErrors
});
```

### 4. Missing Error Stack
**Requirement:** "Verify that Errors are logged with Error Code, Error Stack defined".

```javascript
// ❌ BAD - Losing the stack trace
logger.error(`Failed: ${error.message}`);

// ✅ GOOD - Passing the error object ensures stack is captured
logger.error(`Transfer failed: `, error) // Logger serializer should handle 'exception.stacktrace' and 'exception.type'
```

### 5. Sensitive Data Exposure
```javascript
// ❌ BAD
logger.debug('User authentication', { password: user.password });

// ✅ GOOD
logger.debug('User authentication', {
userId: user.id,
method: 'password'
});
```

### 6. Over-Logging
```javascript
// ❌ BAD - Logging inside tight loops with repeated generic messages
for (const transfer of transfers) {
logger.info('Processing transfer', transfer); // Could be thousands
}

// ✅ GOOD - Aggregate with descriptive summary
logger.info(`Processing batch of ${transfers.length} transfers for batch ${batch.id}`, {
operation: 'processBatch',
eventName: 'BatchProcessingStarted',
'batch.id': batch.id,
'batch.transferCount': transfers.length,
'batch.totalAmount': transfers.reduce((sum, t) => sum + t.amount, 0)
});
```

## Where to Log (Avoiding Duplication)

**Goal:** Avoid "Errors captured in more than one place (often three times)".

### The "Catch and Log" Anti-Pattern
Do **NOT** catch an error just to log it and throw it again, unless you are adding significant context that cannot be added anywhere else.

```javascript
// ❌ BAD - Duplicates logs up the stack
try {
await performAction();
} catch (error) {
logger.error(error); // Log 1
throw error;
}

// ... caller ...
try {
await service.action();
} catch (error) {
logger.error(error); // Log 2 (Duplicate)
throw error;
}
```

### The "Catch, Context, and Bubble" Pattern
If you catch an error to add context, wrap it or attach properties, but do NOT log it until the "Edge" of the application.

```javascript
// ✅ GOOD - Add context, don't log yet
try {
await db.query();
} catch (error) {
throw new DatabaseError('Failed to query', { cause: error });
}

// ... Global Error Handler (The Edge) ...
// Log ONLY here
logger.error(finalError);
```

### Exceptions
- **Background Jobs**: Log errors inside the job as they have no caller to bubble to.
- **Async Event Handlers**: Log errors inside consumers/handlers if they don't have a standardized global error handler.

Loading