
feat: cache record #456

Open
hanakannzashi wants to merge 3 commits into main from feat/cache-record

Conversation


@hanakannzashi hanakannzashi commented Feb 26, 2026

Note

High Risk
Touches billing-critical logic by changing cost calculation to account for cached prompt tokens and by adding new pricing/usage columns via a DB migration. API/DB contract changes could affect downstream clients and historical usage reporting if not rolled out carefully.

Overview
Adds end-to-end support for cached prompt tokens: providers now expose TokenUsage::cached_tokens(), API Usage includes prompt_tokens_details.cached_tokens, and streaming/non-streaming response aggregation carries total_cached_tokens.

Introduces cache-aware billing by adding cache_read_cost_per_token to model pricing (including admin update/list responses and model history) and computing input cost as (non-cached prompt tokens * input rate) + (cached prompt tokens * cache-read rate).

Persists and surfaces this data by migrating the DB (models, model_history, organization_usage_log) and extending usage recording/history APIs to store/return cache_read_tokens (defaulting to 0 where unavailable).
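
The input-cost split described above can be sketched as follows. This is an illustrative sketch only: the function name, integer cost units, and clamping behavior are assumptions for clarity, not the PR's actual implementation.

```rust
// Hypothetical sketch of the cache-aware cost split; names and units
// (integer cost per token) are assumptions, not the PR's real API.
fn compute_token_cost(
    input_tokens: i64,
    output_tokens: i64,
    cache_read_tokens: i64,
    input_cost_per_token: i64,
    output_cost_per_token: i64,
    cache_read_cost_per_token: i64,
) -> i64 {
    // Clamp so a negative or oversized cache count cannot distort the bill;
    // cached tokens are a subset of the prompt (input) bucket.
    let cached = cache_read_tokens.clamp(0, input_tokens.max(0));
    let non_cached = input_tokens.max(0) - cached;
    non_cached * input_cost_per_token
        + cached * cache_read_cost_per_token
        + output_tokens.max(0) * output_cost_per_token
}

fn main() {
    // 1000 prompt tokens, 400 of them cached, 200 output tokens.
    // (600 * 10) + (400 * 1) + (200 * 30) = 12400
    println!("{}", compute_token_cost(1000, 200, 400, 10, 30, 1));
}
```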

Written by Cursor Bugbot for commit a5d2ea4.

@hanakannzashi hanakannzashi added the WIP Work in progress label Feb 26, 2026
@gemini-code-assist

Summary of Changes

Hello @hanakannzashi, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the system's ability to track and bill for cached tokens. It integrates a new cache_read_cost_per_token into the model pricing structure, allowing for more granular and potentially optimized billing based on whether tokens are served from a cache. The changes ensure that this new metric is consistently handled from the initial API request through to database storage and final usage reporting.

Highlights

  • Cache-aware Token Pricing: Introduced a new cache_read_cost_per_token field across various model and usage tracking structures, enabling differentiated billing for tokens served from a cache.
  • Cached Token Extraction: Added a cached_tokens() method to the TokenUsage struct in crates/inference_providers/src/models.rs to accurately extract cached token counts from provider responses.
  • Usage Tracking and Reporting: Updated API endpoints, database models, and service logic to record, store, and report cache_read_tokens in usage logs and model responses, ensuring comprehensive tracking of cached token consumption.
  • Cost Calculation Logic: Modified the core cost calculation function in crates/services/src/usage/mod.rs to incorporate cache_read_tokens, applying the specific cache_read_cost_per_token rate when applicable.
Changelog
  • crates/api/src/conversions.rs
    • Updated From implementation for Usage to include cached tokens.
  • crates/api/src/models.rs
    • Modified Usage struct for token details and added cache_read_cost_per_token to model pricing structs.
  • crates/api/src/routes/admin.rs
    • Updated admin routes to manage and display cache_read_cost_per_token for models.
  • crates/api/src/routes/completions.rs
    • Added cache_read_tokens to various usage request builders.
  • crates/api/src/routes/models.rs
    • Included cache_read_cost_per_token in model listing and retrieval responses.
  • crates/api/src/routes/usage.rs
    • Added cache_read_tokens to usage history responses and population logic.
  • crates/database/src/models.rs
    • Extended model, pricing, history, and usage log structs with cache_read_cost_per_token or cache_read_tokens.
  • crates/database/src/repositories/admin_composite.rs
    • Integrated cache_read_cost_per_token into admin model repository operations.
  • crates/database/src/repositories/model.rs
    • Updated SQL queries and model repository methods to support cache_read_cost_per_token.
  • crates/database/src/repositories/model_repository_impl.rs
    • Modified model repository implementation to pass cache_read_cost_per_token to usage service ports.
  • crates/database/src/repositories/organization_usage.rs
    • Updated organization usage repository to record and retrieve cache_read_tokens.
  • crates/database/src/repositories/usage_repository_impl.rs
    • Adapted usage repository implementation to handle cache_read_tokens in usage recording and retrieval.
  • crates/inference_providers/src/models.rs
    • Added a method to TokenUsage for extracting cached token counts.
  • crates/services/src/admin/ports.rs
    • Extended admin service model structs with cache_read_cost_per_token.
  • crates/services/src/completions/mod.rs
    • Updated completion service logic to track and pass cache_read_tokens for usage.
  • crates/services/src/models/ports.rs
    • Added cache_read_cost_per_token to the ModelWithPricing struct.
  • crates/services/src/responses/models.rs
    • Introduced a new constructor for Usage to include cached tokens.
  • crates/services/src/responses/service.rs
    • Modified response service to incorporate total_cached_tokens into final usage reporting.
  • crates/services/src/responses/service_helpers.rs
    • Added total_cached_tokens to the response stream context and its usage accumulation.
  • crates/services/src/test_utils.rs
    • Updated test utilities to accommodate cache_read_tokens in mock and capturing usage services.
  • crates/services/src/usage/mod.rs
    • Implemented a new cost calculation function and updated existing methods to handle cache-aware pricing.
  • crates/services/src/usage/ports.rs
    • Extended usage service traits and request/response structs to support cache_read_tokens and cache_read_cost_per_token.
Activity
  • The pull request introduces a new feature, indicated by the feat: prefix in the title, focusing on cache record functionality.
  • The author hanakannzashi has implemented changes across multiple crates to support this new cache record feature.
  • The changes involve updating data models, API endpoints, database interactions, and service logic to incorporate cache_read_tokens and cache_read_cost_per_token.
  • The absence of a description body suggests the changes are self-explanatory through the code or that a more detailed explanation might be provided in comments or during review.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@claude

claude bot commented Feb 26, 2026

Code Review

Two critical issues need to be addressed before merge.

1. Missing Database Migration (Critical - Deployment Blocker)

The PR adds cache_read_tokens to organization_usage_log and cache_read_cost_per_token to models/model_history, but no migration SQL file was added.

The highest existing migration is V0045__add_usage_idempotency.sql; there is no V0046 file. Without it, all SQL queries referencing these columns will fail with a "column does not exist" error on any environment that has already been initialized (dev, staging, prod).

A prior migration (V0041__add_image_billing_fields.sql) established the correct pattern:

-- V0046__add_cache_read_billing.sql
ALTER TABLE models ADD COLUMN cache_read_cost_per_token BIGINT NOT NULL DEFAULT 0;
ALTER TABLE model_history ADD COLUMN cache_read_cost_per_token BIGINT NOT NULL DEFAULT 0;
ALTER TABLE organization_usage_log ADD COLUMN cache_read_tokens INTEGER NOT NULL DEFAULT 0;

2. Breaking API Change: alias to rename on Usage Struct (High)

File: crates/api/src/models.rs

Changing #[serde(alias = "prompt_tokens")] to #[serde(rename = "prompt_tokens")] on input_tokens (and similarly for output_tokens) is a backward-incompatible deserialization change.

  • alias: accepts both "input_tokens" and "prompt_tokens" as JSON keys when deserializing
  • rename: only accepts "prompt_tokens"; a payload with "input_tokens" will now fail to deserialize

Additionally, input_tokens_details was previously serialized as "input_tokens_details" (no rename) and is now renamed to "prompt_tokens_details". Any client parsing "input_tokens_details" from responses will silently get nothing.

The safest fix: #[serde(alias = "input_tokens", rename = "prompt_tokens")] is valid in serde and enables a backward-compatible rename.

Non-Issues (Confirmed Correct)

  • Cost calculation: No double-counting. min(cache_read_tokens, input_tokens) correctly splits the input bucket into cached/non-cached portions.
  • Type safety: cached_tokens() as i64 is a safe widening cast from i32.
  • SQL injection: Parameterized queries used throughout; no injection risk.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces functionality to record and account for cached tokens in token usage and cost calculations. The changes span API models, database schemas, repositories, and services to support a new cache_read_cost_per_token pricing model. While the implementation is largely consistent, I've found a critical issue in a database query that could lead to runtime errors, and a medium-severity issue related to missing validation. My review includes suggestions to fix these issues.

context_length, verifiable, is_active, owned_by,
provider_type, provider_config, attestation_supported,
input_modalities, output_modalities
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16)

critical

The VALUES clause is missing a parameter for the cache_read_cost_per_token column. There are 17 columns in the INSERT list but only 16 placeholders here. This will cause a runtime error. You also need to update the corresponding bind parameters in the create_model function to include cache_read_cost_per_token.

Suggested change
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16)
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17)

Comment on lines +257 to +277
let input = input_tokens.unwrap_or(0);
let output = output_tokens.unwrap_or(0);
if input < 0 || output < 0 {
return Err(UsageError::ValidationError(
"token counts must be non-negative".into(),
));
}
RecordUsageApiRequest::ImageGeneration {
model,
image_count,
id,
} => {
if id.trim().is_empty() {
return Err(UsageError::ValidationError(
"id must be a non-empty string".into(),
));
}
if *image_count <= 0 {
return Err(UsageError::ValidationError(
"image_count must be positive".into(),
));
}
(
model.clone(),
0,
0,
Some(*image_count),
InferenceType::ImageGeneration,
id.clone(),
)
if input == 0 && output == 0 {
return Err(UsageError::ValidationError(
"at least one of input_tokens or output_tokens must be positive".into(),
));
}
};
(
model.clone(),
input,
output,
cached_tokens.unwrap_or(0),
None,
InferenceType::ChatCompletion,
id.clone(),
)

medium

For consistency and robustness, you should also validate that cached_tokens is non-negative, similar to how input_tokens and output_tokens are validated. While the downstream compute_token_cost function handles negative values safely, validating at the API boundary is a good practice.

                let input = input_tokens.unwrap_or(0);
                let output = output_tokens.unwrap_or(0);
                let cached = cached_tokens.unwrap_or(0);
                if input < 0 || output < 0 || cached < 0 {
                    return Err(UsageError::ValidationError(
                        "token counts must be non-negative".into(),
                    ));
                }
                if input == 0 && output == 0 {
                    return Err(UsageError::ValidationError(
                        "at least one of input_tokens or output_tokens must be positive".into(),
                    ));
                }
                (
                    model.clone(),
                    input,
                    output,
                    cached,
                    None,
                    InferenceType::ChatCompletion,
                    id.clone(),
                )

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.


output_modalities = COALESCE(EXCLUDED.output_modalities, models.output_modalities),
updated_at = NOW()
RETURNING id, model_name, model_display_name, model_description, model_icon,
input_cost_per_token, output_cost_per_token, cost_per_image,

Wrong SQL parameter index in ON CONFLICT clause

High Severity

The CASE WHEN $11 IS NULL in the ON CONFLICT clause was not renumbered after cache_read_cost_per_token was inserted as $5. Before this change, $11 referred to owned_by; now $11 refers to is_active (which is always non-null due to unwrap_or(true)). This means owned_by will always be unconditionally overwritten on conflict, breaking the conditional-preserve logic. The reference needs to be $12.
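
Illustratively, the corrected conditional-preserve clause would look like the sketch below; the conflict target and exact parameter positions are assumptions based on this review comment, not the PR's actual query.

```sql
-- Hypothetical sketch: inserting cache_read_cost_per_token as $5 shifts
-- every later bind position by one, so owned_by moves from $11 to $12.
ON CONFLICT (model_name) DO UPDATE SET
    cache_read_cost_per_token = EXCLUDED.cache_read_cost_per_token,
    owned_by = CASE WHEN $12 IS NULL THEN models.owned_by
                    ELSE EXCLUDED.owned_by END,
    updated_at = NOW()
```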

Additional Locations (1)


) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16)
RETURNING id, model_name, model_display_name, model_description, model_icon,
input_cost_per_token, output_cost_per_token, cost_per_image,
input_cost_per_token, output_cost_per_token, cost_per_image, cache_read_cost_per_token,

SQL column count mismatches VALUES parameter count

High Severity

In create_model, the column list now includes cache_read_cost_per_token (17 columns total), but the VALUES clause still has only 16 placeholders ($1-$16), and the params array also has only 16 entries (missing model.cache_read_cost_per_token). This will cause a PostgreSQL runtime error on every call to create_model.


@think-in-universe think-in-universe linked an issue Feb 26, 2026 that may be closed by this pull request

Labels

WIP Work in progress

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature: LLM Cache Read Cost Tracking

1 participant