semidark · Apr 24, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -106,7 +106,6 @@ Full setup in `docs/DEPLOYMENT_SUMMARY.md`.
 
 - `config.yaml.tpl` or `custom_auth.py` changes only take effect on redeploy — no hot reload.
 - `litellm_settings.drop_params: true` — prevents clients from overriding provider credentials at request time.
-- `litellm_settings.drop_unknown_params: true` — strips unsupported request fields before they reach upstream providers.
 - `custom_auth.py` caches valid keys in memory on first request. Key changes require redeploy to take effect.
 - `custom_auth` replaces LiteLLM's built-in master key check entirely — the handler explicitly also accepts `LITELLM_MASTER_KEY` so admin operations keep working.
 - No content logging (prompts/responses); metadata-only with 30-day retention in Log Analytics.
@@ -118,4 +117,4 @@ Full setup in `docs/DEPLOYMENT_SUMMARY.md`.
 - Per-key model access restrictions (extend `custom_auth.py` to map keys → allowed models)
 - Spend tracking / rate limiting without DB (e.g. Azure Table Storage counters)
 - Telemetry to Azure Monitor (latency, errors, token counts)
-- **Verify whether LiteLLM still exposes any residual `/ui` surface despite `disable_admin_ui: true`**. If needed, block it completely via an nginx sidecar that proxies traffic to LiteLLM on `localhost:4000` and returns `404` on `/ui*`. Change ingress `target_port` from `4000` to `80`. Alternative (paid): Azure Front Door WAF with a path-based custom rule.
+- **Verify whether LiteLLM still exposes any residual `/ui` surface despite `DISABLE_ADMIN_UI=True`**. If needed, block it completely via an nginx sidecar that proxies traffic to LiteLLM on `localhost:4000` and returns `404` on `/ui*`. Change ingress `target_port` from `4000` to `80`. Alternative (paid): Azure Front Door WAF with a path-based custom rule.
diff --git a/README.md b/README.md
@@ -286,7 +286,7 @@ Typical savings for workloads with repeated context:
 - **Secrets**: Never commit `.env` or `*.tfvars` files (both are gitignored)
 - **Logging**: No prompt/response content is logged; only metadata (timestamps, latency, token counts)
 - **HTTPS Only**: Container Apps enforces TLS on external ingress
-- **Proxy Hardening**: `disable_admin_ui: true`, `disable_key_management: true`, `drop_params: true`, `drop_unknown_params: true`
+- **Proxy Hardening**: `DISABLE_ADMIN_UI=True`, `drop_params: true`
 - **Runtime Hardening**: LiteLLM image pinned to `ghcr.io/berriai/litellm:main-v1.82.3`, `min_replicas = 0`, `max_replicas = 1`, `cooldown_period_in_seconds = 600`
 - **Least Privilege**: Managed identities used where possible
 

diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
@@ -143,9 +143,7 @@ curl -sS \
 - Store all secrets as Container Apps secrets; never commit to git
 - HTTPS only enforced (`allow_insecure_connections = false`)
 - `litellm_settings.drop_params: true` prevents clients overriding provider credentials
-- `litellm_settings.drop_unknown_params: true` drops unsupported request fields before they reach upstream APIs
-- Admin UI disabled (`disable_admin_ui: true`)
-- Key management routes disabled (`disable_key_management: true`)
+- Admin UI disabled via `DISABLE_ADMIN_UI` env var
 - Container image pinned to `main-v1.82.3` — no floating tag surprises
 - Scale-to-zero (`min_replicas = 0`, `max_replicas = 1`) limits blast radius of abuse
 - `cooldown_period_in_seconds = 600` slows repeated cold-start churn after bursts
@@ -176,15 +174,15 @@ See `docs/USAGE_ANALYSIS.md` for schema, KQL examples, and cost tracking roadmap
 - Per-key model access restrictions (extend `custom_auth.py`)
 - Key alias mapping (human-readable labels for keys)
 - Budget alerts / rate limiting
-- **Verify whether LiteLLM still exposes any residual `/ui` surface despite `disable_admin_ui: true`**. If needed, block it completely via an nginx sidecar that proxies traffic to LiteLLM on `localhost:4000` and returns `404` on `/ui*`. Change ingress `target_port` from `4000` to `80`. Alternative (paid): Azure Front Door WAF with a path-based custom rule.
+- **Verify whether LiteLLM still exposes any residual `/ui` surface despite `DISABLE_ADMIN_UI=True`**. If needed, block it completely via an nginx sidecar that proxies traffic to LiteLLM on `localhost:4000` and returns `404` on `/ui*`. Change ingress `target_port` from `4000` to `80`. Alternative (paid): Azure Front Door WAF with a path-based custom rule.
 
 ## Prompt Caching
 
 Azure OpenAI models (`gpt-4.1`, `gpt-5.4`, `gpt-5.1-codex`) support automatic prompt caching for prompts with 1024+ tokens. The LiteLLM proxy preserves native OpenAI caching semantics:
 
 - **No configuration required**: Caching activates automatically for eligible prompts
 - **Prompt structure matters**: Place static content at the beginning, variable content at the end
-- **Use `prompt_cache_key`**: Improves hit rates for workloads with shared prefixes (parameter survives `drop_unknown_params: true`)
+- **Use `prompt_cache_key`**: Improves hit rates for workloads with shared prefixes (parameter survives `drop_params: true`)
 - **Extended retention**: `prompt_cache_retention: "24h"` available for recurring tasks on `gpt-4.1` and newer models
 - **Visibility**: Cached token counts logged in `UsageMetrics` table (`CachedTokensIn_d` field)
 

diff --git a/docs/DEPLOYMENT_SUMMARY.md b/docs/DEPLOYMENT_SUMMARY.md
@@ -122,17 +122,16 @@ Authorization: Bearer <api_key>
 #### Additional Hardening
 
 - `litellm_settings.drop_params: true` — prevents clients from overriding provider credentials.
-- `litellm_settings.drop_unknown_params: true` — strips unknown request fields before proxying upstream.
 - DB features disabled (`store_model_in_db: false`, `disable_spend_logs: true`, etc.) — no database in use.
-- Admin UI and key-management routes disabled (`disable_admin_ui: true`, `disable_key_management: true`).
+- Admin UI disabled via `DISABLE_ADMIN_UI` env var.
 - Container image pinned to `ghcr.io/berriai/litellm:main-v1.82.3`, HTTPS-only ingress, `min_replicas = 0`, `max_replicas = 1`, and `cooldown_period_in_seconds = 600`.
 
 #### Prompt Caching
 
 Azure OpenAI models (`gpt-4.1`, `gpt-5.4`, `gpt-5.1-codex`) support automatic prompt caching. Key points:
 
 - **Automatic activation**: No configuration required; works for prompts with 1024+ tokens
-- **Parameter passthrough**: `prompt_cache_key` and `prompt_cache_retention` survive `drop_unknown_params: true` filtering
+- **Parameter passthrough**: `prompt_cache_key` and `prompt_cache_retention` survive `drop_params: true` filtering
 - **Cost impact**: Cached tokens billed at ~10-20% of standard input pricing
 - **Verification**: Check `usage.prompt_tokens_details.cached_tokens` in responses; monitor via Log Analytics `CachedTokensIn_d` field
 

diff --git a/docs/PROMPT_CACHING.md b/docs/PROMPT_CACHING.md
@@ -271,7 +271,6 @@ If `prompt_cache_key` or `prompt_cache_retention` are not working:
 
 1. Verify model supports caching via `/v1/model/info`
 2. Check LiteLLM version compatibility
-3. Review `drop_unknown_params` setting (currently enabled for security)
 
 The current deployment has validated that these parameters survive filtering for `gpt-4.1`.
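
The claim that these parameters survive filtering can be spot-checked from the client side by reading `usage.prompt_tokens_details.cached_tokens` in a response body. The following is a minimal sketch, assuming the response has already been parsed into a plain dict with the OpenAI-style `usage` layout. The helper names `cached_token_count` and `cache_hit_ratio` are hypothetical and not part of this repository or of LiteLLM.

```python
from typing import Any, Mapping


def cached_token_count(response: Mapping[str, Any]) -> int:
    """Return usage.prompt_tokens_details.cached_tokens, or 0 if any level is missing."""
    usage = response.get("usage") or {}
    details = usage.get("prompt_tokens_details") or {}
    return int(details.get("cached_tokens") or 0)


def cache_hit_ratio(response: Mapping[str, Any]) -> float:
    """Fraction of prompt tokens served from cache (0.0 when no prompt tokens are reported)."""
    usage = response.get("usage") or {}
    prompt_tokens = int(usage.get("prompt_tokens") or 0)
    if prompt_tokens == 0:
        return 0.0
    return cached_token_count(response) / prompt_tokens


# Hypothetical response body, trimmed to the fields read above.
sample = {
    "usage": {
        "prompt_tokens": 2048,
        "prompt_tokens_details": {"cached_tokens": 1024},
    }
}

print(cached_token_count(sample))
print(cache_hit_ratio(sample))
```

A count of zero on a first request is expected; the check is only meaningful on a repeated request that shares a prefix of 1024+ tokens with an earlier one.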