Implement approaches to avoid repeating actions in LLM agents #18

@harpomaxx

Problem
Our LLM-based agent (ReAct-style reasoning + short-term memory of last 10 actions + binary progress feedback) shows a high action repetition rate (~60% over 250 episodes). The agent frequently selects the same action with similar parameters, even when it does not lead to state-space growth. This suggests policy collapse, insufficient exploration, and/or weak credit assignment from the current progress signal.
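The issue does not say how the ~60% figure is computed. As a working assumption, one way to measure it is the fraction of steps whose action already appears in the short-term memory window. The sketch below uses exact matching on hypothetical `(action, parameter)` tuples; counting "similar parameters" as repeats would need a similarity function on top of this.

```python
from typing import Hashable, Sequence


def repetition_rate(actions: Sequence[Hashable], window: int = 10) -> float:
    """Fraction of steps whose action already occurred in the previous `window` steps."""
    if len(actions) < 2:
        return 0.0
    repeats = 0
    for i in range(1, len(actions)):
        # Look back at most `window` steps, mirroring the agent's short-term memory.
        recent = actions[max(0, i - window):i]
        if actions[i] in recent:
            repeats += 1
    # The first step cannot be a repeat, so it is excluded from the denominator.
    return repeats / (len(actions) - 1)


# Hypothetical episode: action names and parameters are illustrative only.
episode = [("scan", "10.0.0.1"), ("scan", "10.0.0.1"),
           ("exploit", "10.0.0.2"), ("scan", "10.0.0.1")]
rate = repetition_rate(episode)
```

Tracking this number per episode gives a baseline against which each of the approaches below can be compared.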

Top 3 Approaches

Stagnation-Aware Repetition Penalty / Action Masking
Add an explicit penalty or temporary mask for repeating the same action when no progress (Δstate ≤ 0) occurs. This discourages local loops while preserving exploitation when the action is effective.
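A minimal sketch of the masking variant, assuming the agent exposes a list of candidate actions and a numeric Δstate after each step. The class name `StagnationMask` and the parameters `max_stalls` and `cooldown` are hypothetical, not part of the existing codebase.

```python
from typing import Dict, Hashable, List, Sequence


class StagnationMask:
    """Temporarily masks an action after repeated uses that made no progress."""

    def __init__(self, max_stalls: int = 2, cooldown: int = 3) -> None:
        self.max_stalls = max_stalls   # no-progress uses tolerated before masking
        self.cooldown = cooldown       # number of steps the mask stays active
        self.step = 0
        self.stalls: Dict[Hashable, int] = {}         # no-progress uses since last progress
        self.blocked_until: Dict[Hashable, int] = {}  # step at which the mask lifts

    def update(self, action: Hashable, delta_state: float) -> None:
        """Record the outcome of one executed action."""
        self.step += 1
        if delta_state > 0:
            # The action was effective: clear its history so exploitation is preserved.
            self.stalls.pop(action, None)
            self.blocked_until.pop(action, None)
            return
        self.stalls[action] = self.stalls.get(action, 0) + 1
        if self.stalls[action] >= self.max_stalls:
            self.blocked_until[action] = self.step + self.cooldown
            self.stalls[action] = 0

    def allowed(self, candidates: Sequence[Hashable]) -> List[Hashable]:
        """Return the unmasked candidates; fall back to all if everything is masked."""
        ok = [a for a in candidates if self.blocked_until.get(a, 0) <= self.step]
        return ok or list(candidates)
```

In a ReAct loop, `allowed(...)` could filter the action list shown in the prompt, and `update(...)` would be called once the progress signal for the chosen action is known. A soft penalty instead of a hard mask would follow the same bookkeeping, subtracting a cost from the action's score rather than removing it.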

Exploration & Novelty Incentives
Introduce entropy regularization, ε-greedy sampling, or a lightweight novelty bonus (e.g., state- or (state, action)-count–based reward). This promotes policy diversity and reduces collapse to a single overused action.
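As an illustration of two of these options, the sketch below combines a count-based novelty bonus of the form β / sqrt(N(s, a) + 1) with ε-greedy selection over scored candidates. It assumes states and actions are hashable and that some score per candidate action is available (for example, derived from the LLM's ranking); `NoveltyBonus`, `epsilon_greedy`, and `beta` are hypothetical names.

```python
import math
import random
from collections import Counter
from typing import Dict, Hashable


class NoveltyBonus:
    """Count-based bonus that shrinks as a (state, action) pair is revisited."""

    def __init__(self, beta: float = 1.0) -> None:
        self.beta = beta
        self.counts: Counter = Counter()

    def bonus(self, state: Hashable, action: Hashable) -> float:
        # Unvisited pairs get the full bonus beta; it decays as 1/sqrt(visits + 1).
        return self.beta / math.sqrt(self.counts[(state, action)] + 1)

    def record(self, state: Hashable, action: Hashable) -> None:
        self.counts[(state, action)] += 1


def epsilon_greedy(scores: Dict[Hashable, float], epsilon: float,
                   rng: random.Random) -> Hashable:
    """With probability epsilon pick a uniformly random action, else the best-scored one."""
    if rng.random() < epsilon:
        return rng.choice(list(scores))
    return max(scores, key=scores.get)
```

A possible use is to add `bonus(state, a)` to each candidate's base score before calling `epsilon_greedy`, then call `record(state, chosen)` after the step. Entropy regularization is not shown because it requires access to the policy's action distribution during training.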

Use Markov Chain matrices
Instead of leaving the LLM free to select the action, use a learned transition-probability matrix to guide the selection. Explore how this could be used to reduce repetition.
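One possible reading of this idea, offered as a sketch rather than a settled design: estimate a first-order action-to-action transition matrix from logged episodes, then damp the self-transition probability before sampling the next action. The source of the learned matrix, the smoothing constant `alpha`, and the `self_loop_scale` factor are all assumptions here.

```python
import random
from typing import Dict, List, Sequence

Matrix = Dict[str, Dict[str, float]]


def fit_transitions(episodes: Sequence[Sequence[str]], actions: List[str],
                    alpha: float = 1.0) -> Matrix:
    """Estimate P(next | prev) from action sequences with additive smoothing."""
    counts = {a: {b: alpha for b in actions} for a in actions}
    for ep in episodes:
        for prev, nxt in zip(ep, ep[1:]):
            counts[prev][nxt] += 1  # every logged action must appear in `actions`
    matrix: Matrix = {}
    for a, row in counts.items():
        total = sum(row.values())
        matrix[a] = {b: c / total for b, c in row.items()}
    return matrix


def next_action_probs(P: Matrix, prev: str,
                      self_loop_scale: float = 0.25) -> Dict[str, float]:
    """Scale down the probability of repeating `prev`, then renormalize the row."""
    row = dict(P[prev])
    row[prev] *= self_loop_scale
    total = sum(row.values())
    return {a: p / total for a, p in row.items()}


def sample_action(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw one action according to the given probabilities."""
    names = list(probs)
    return rng.choices(names, weights=[probs[a] for a in names], k=1)[0]
```

The resulting distribution could either replace the LLM's choice outright or be used to rerank a short list of actions the LLM proposes. Fitting the matrix only on episodes that made progress would be another option worth comparing.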

Labels

enhancement (New feature or request)
