
feat(anyhunt/video-transcript): implement dual-mode transcript pipeline#80

Open
dvlin-dev wants to merge 19 commits into main from video2text


@dvlin-dev
Owner

@dvlin-dev dvlin-dev commented Feb 9, 2026

add LOCAL+CLOUD_FALLBACK workers, queue orchestration, runtime switch, and budget guard

add console/admin pages, prisma migration, local deployment script, and deployment checklist docs



dvlindev and others added 2 commits February 10, 2026 02:58
add LOCAL+CLOUD_FALLBACK workers, queue orchestration, runtime switch, and budget guard

add console/admin pages, prisma migration, local deployment script, and deployment checklist docs
@dvlin-dev
Owner Author

@codex

chatgpt-codex-connector[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

@dvlin-dev
Owner Author

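Following best practices, I've filled in the gaps and fixed the review threads (all resolved):

  • The VPS2/Mac local worker now runs as a standalone worker app (apps/anyhunt/server/src/video-transcript/worker.ts + apps/anyhunt/server/src/video-transcript/video-transcript-worker-app.module.ts) instead of loading the full AppModule, so it cannot accidentally consume other queues or scheduled jobs; the worker process now uses createApplicationContext (no HTTP server).
  • The Docker entrypoint adds ANYHUNT_RUN_MODE=api|video-transcript-worker and ANYHUNT_RUN_MIGRATIONS=true|false; workers should set ANYHUNT_RUN_MIGRATIONS=false.
  • URL input now enforces http(s) protocol validation in two layers (DTO + normalize); VIDEO_TRANSCRIPT_ENABLE_LOCAL_WORKER / VIDEO_TRANSCRIPT_ENABLE_CLOUD_FALLBACK_WORKER now default to false to prevent accidental startup.
  • Admin/Console copy and time display are aligned: all user-facing text on the Queues page is now in English, and all timestamps use formatRelativeTime.
  • The deployment process and pre-launch checklist are documented in docs/architecture/anyhunt-video-transcript-pipeline.md (sections 16 and 17).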
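A minimal sketch of the run-mode switch, the opt-in worker flags, and the second (normalize) layer of the http(s) check described above, assuming plain environment-variable parsing. Only the environment variable names come from this PR; parseRunMode, readBooleanFlag, normalizeVideoUrl and their exact rules (for example, failing fast on an unknown run mode) are hypothetical.

```typescript
// Sketch only: helper names and parsing rules are hypothetical;
// the environment variable names come from the PR comment.
type RunMode = 'api' | 'video-transcript-worker';
type Env = Record<string, string | undefined>;

// ANYHUNT_RUN_MODE: unset means the API process; unknown values fail fast
// instead of silently starting the wrong kind of process.
function parseRunMode(env: Env): RunMode {
  const raw = env.ANYHUNT_RUN_MODE?.trim();
  if (raw === undefined || raw === '' || raw === 'api') return 'api';
  if (raw === 'video-transcript-worker') return raw;
  throw new Error(`Unsupported ANYHUNT_RUN_MODE: ${raw}`);
}

// Worker switches are opt-in: only an explicit "true" enables them.
function readBooleanFlag(env: Env, name: string): boolean {
  return env[name]?.trim().toLowerCase() === 'true';
}

// Second validation layer (after the DTO): reject anything that is not http(s).
function normalizeVideoUrl(raw: string): string {
  let url: URL;
  try {
    url = new URL(raw.trim());
  } catch {
    throw new Error(`Invalid URL: ${raw}`);
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error(`Unsupported URL protocol: ${url.protocol}`);
  }
  return url.toString();
}
```

Under these assumptions, a worker entry point would check readBooleanFlag(process.env, 'VIDEO_TRANSCRIPT_ENABLE_LOCAL_WORKER') before registering its processor, so an unset variable leaves the worker off.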

devin-ai-integration[bot]

This comment was marked as resolved.

@dvlin-dev
Owner Author

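Fixed per the PR review and pushed to branch video2text:

  • cancelTask now uses updateMany with a terminal-status guard and writes the preempt signal only after the cancellation write succeeds, so a task that completes concurrently can no longer have its COMPLETED status overwritten with CANCELLED.
  • Arguments to the budget Lua EVAL are now explicitly converted with String(), avoiding implicit-conversion edge cases when floats/numbers are passed as ARGV.
  • Added regression unit tests covering the race branches above.

Corresponding changes: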

  • apps/anyhunt/server/src/video-transcript/video-transcript.service.ts
  • apps/anyhunt/server/src/video-transcript/video-transcript-budget.service.ts
  • apps/anyhunt/server/src/video-transcript/__tests__/video-transcript.service.spec.ts
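The cancelTask race fix can be illustrated with an in-memory model of a guarded conditional write. This is a sketch under assumptions, not the service code: InMemoryTaskStore, updateManyNotTerminal and the status names other than COMPLETED, CANCELLED and DOWNLOADING are hypothetical, and the comment only indicates the kind of Prisma updateMany filter the real service might use.

```typescript
// Sketch only: an in-memory stand-in for the task table. Status names other than
// COMPLETED / CANCELLED / DOWNLOADING are assumptions, as are all helper names.
type TaskStatus = 'PENDING' | 'DOWNLOADING' | 'TRANSCRIBING' | 'COMPLETED' | 'FAILED' | 'CANCELLED';

const TERMINAL_STATUSES: readonly TaskStatus[] = ['COMPLETED', 'FAILED', 'CANCELLED'];

interface TaskRow {
  id: string;
  status: TaskStatus;
}

class InMemoryTaskStore {
  private readonly rows = new Map<string, TaskRow>();

  put(row: TaskRow): void {
    this.rows.set(row.id, { ...row });
  }

  get(id: string): TaskRow | undefined {
    return this.rows.get(id);
  }

  // Models a conditional write such as Prisma's
  //   updateMany({ where: { id, status: { notIn: TERMINAL_STATUSES } }, data })
  // The check and the write are one step, and the caller only learns a count.
  updateManyNotTerminal(id: string, data: Partial<TaskRow>): { count: number } {
    const row = this.rows.get(id);
    if (!row || TERMINAL_STATUSES.includes(row.status)) return { count: 0 };
    Object.assign(row, data);
    return { count: 1 };
  }
}

// Returns true only if this call actually moved the task to CANCELLED.
function cancelTask(store: InMemoryTaskStore, preemptSignals: Set<string>, taskId: string): boolean {
  const { count } = store.updateManyNotTerminal(taskId, { status: 'CANCELLED' });
  if (count === 0) {
    // Task already reached a terminal state (e.g. COMPLETED): leave it and send no signal.
    return false;
  }
  // Only a winning cancel write is allowed to preempt running workers.
  preemptSignals.add(taskId);
  return true;
}
```

With a real database, atomicity comes from the single conditional UPDATE issued by updateMany; the in-memory version is atomic only because each call runs synchronously on one thread.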

devin-ai-integration[bot]

This comment was marked as resolved.

chatgpt-codex-connector[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

chatgpt-codex-connector[bot]

This comment was marked as resolved.

chatgpt-codex-connector[bot]

This comment was marked as resolved.

chatgpt-codex-connector[bot]

This comment was marked as resolved.


@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 1 new potential issue.

View 28 additional findings in Devin Review.


Comment on lines +208 to +216
if (reason === 'timeout' && !preemptSignaled) {
  cloudOwnershipAcquired = await this.acquireCloudOwnership(
    taskId,
    'DOWNLOADING',
    startedAt,
  );
  if (!cloudOwnershipAcquired) {
    return;
  }


🔴 Budget reservation leak when non-probe ownership acquisition fails

In the cloud fallback processor's handleCloudRun, when probedDurationSec === 0 (probe failed) and reason === 'timeout', budget is reserved from the measured audio duration at video-transcript-cloud-fallback.processor.ts:196-197. Then, at lines 208-216, if acquireCloudOwnership fails (e.g., the local worker already completed or the task was cancelled), the code returns without releasing the reserved budget. This leak inflates the daily usage counter in Redis.

The PR explicitly fixed the analogous leak on the probe path (lines 163-167 correctly call releaseCloudBudgetReservation before returning) but missed the non-probe fallback path. The constraint in apps/anyhunt/server/src/video-transcript/CLAUDE.md states: "If the timeout path loses CLOUD_FALLBACK execution ownership after the probe-based budget reservation, it must roll back that budget reservation." The same principle applies to the non-probe reservation path.

Suggested change (replaces the lines above):

if (reason === 'timeout' && !preemptSignaled) {
  cloudOwnershipAcquired = await this.acquireCloudOwnership(
    taskId,
    'DOWNLOADING',
    startedAt,
  );
  if (!cloudOwnershipAcquired) {
    if (budgetReservation) {
      await this.budgetService.releaseCloudBudgetReservation(
        budgetReservation,
      );
    }
    return;
  }
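The suggested change patches this one branch. An alternative is to make the rollback structural: release the reservation in a finally block whenever cloud execution never started, so any future early return or thrown error is covered too. The sketch below is synchronous for brevity, and InMemoryBudget, handleCloudRunSketch and its callbacks are hypothetical stand-ins, not the PR's budgetService API.

```typescript
// Sketch only: InMemoryBudget stands in for the Redis daily counter, and
// handleCloudRunSketch is a hypothetical reduction of handleCloudRun.
interface Reservation {
  seconds: number;
  released: boolean;
}

class InMemoryBudget {
  usedSeconds = 0;
  private readonly limitSeconds: number;

  constructor(limitSeconds: number) {
    this.limitSeconds = limitSeconds;
  }

  reserve(seconds: number): Reservation | null {
    if (this.usedSeconds + seconds > this.limitSeconds) return null;
    this.usedSeconds += seconds;
    return { seconds, released: false };
  }

  // Idempotent, so releasing on every exit path can never double-refund.
  release(reservation: Reservation): void {
    if (reservation.released) return;
    reservation.released = true;
    this.usedSeconds -= reservation.seconds;
  }
}

type CloudRunOutcome = 'over-budget' | 'lost-ownership' | 'ran';

function handleCloudRunSketch(
  budget: InMemoryBudget,
  audioSeconds: number,
  acquireOwnership: () => boolean,
  runCloud: () => void,
): CloudRunOutcome {
  const reservation = budget.reserve(audioSeconds);
  if (!reservation) return 'over-budget';
  let owned = false;
  try {
    owned = acquireOwnership();
    if (!owned) return 'lost-ownership';
    runCloud();
    return 'ran';
  } finally {
    // Roll back whenever cloud execution never started: a lost race or a throw
    // while acquiring ownership. Once owned, the reservation is kept.
    if (!owned) budget.release(reservation);
  }
}
```

Keeping the reservation once ownership is acquired reflects the reviewers' framing that only tasks which never entered cloud execution should be refunded; whether a failed cloud run should also refund is a separate policy choice.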

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 28d91c109d


Comment on lines +40 to +41
const dir = path.join(os.tmpdir(), 'anyhunt-video-transcript', taskId);
await mkdir(dir, { recursive: true });


P1: Isolate workspace directory per executor

createWorkspace derives the path only from taskId, and both LOCAL and CLOUD_FALLBACK processors call it with the same ID. In timeout fallback races (where cloud starts while local may still be running), both workers read/write video.*, audio.wav, and transcript files in the same directory, and either worker's cleanup can delete files the other still needs, causing spurious failures or corrupted outputs.
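One way to apply this suggestion is to make the executor part of the workspace key. The sketch below keeps the original tmpdir/anyhunt-video-transcript/taskId layout and adds a per-executor subdirectory; the Executor type, workspaceDir helper and Workspace shape are hypothetical, not the PR's code.

```typescript
import { mkdir, rm } from 'node:fs/promises';
import * as os from 'node:os';
import * as path from 'node:path';

// Sketch only: the executor names mirror the LOCAL / CLOUD_FALLBACK modes in this PR;
// the function names and directory layout are hypothetical.
type Executor = 'LOCAL' | 'CLOUD_FALLBACK';

// Pure path helper: the executor is part of the key, so the two workers never share files.
function workspaceDir(taskId: string, executor: Executor, root: string = os.tmpdir()): string {
  return path.join(root, 'anyhunt-video-transcript', taskId, executor.toLowerCase());
}

interface Workspace {
  dir: string;
  cleanup: () => Promise<void>;
}

async function createWorkspace(taskId: string, executor: Executor): Promise<Workspace> {
  const dir = workspaceDir(taskId, executor);
  await mkdir(dir, { recursive: true });
  return {
    dir,
    // Removes only this executor's subdirectory, never the sibling's files.
    cleanup: () => rm(dir, { recursive: true, force: true }),
  };
}
```

If the same executor could also run one task twice concurrently (for example, on a retry), fs/promises mkdtemp with a per-executor prefix would give each run a unique directory instead. Note that this cleanup leaves the shared taskId parent directory in place.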


Comment on lines +212 to +215
startedAt,
);
if (!cloudOwnershipAcquired) {
return;


P1: Release reserved budget when timeout takeover loses race

In the probeVideoDurationSeconds() === 0 path, budget is reserved later from measured audio duration, but if cloud ownership acquisition then fails, execution returns immediately without rolling back that reservation. This leaks daily cloud budget for tasks that never actually entered cloud execution (e.g., local completed/cancelled during the pre-takeover work), which can incorrectly exhaust the budget gate for subsequent tasks.
