Problem
Our LLM-based agent (ReAct-style reasoning + short-term memory of last 10 actions + binary progress feedback) shows a high action repetition rate (~60% over 250 episodes). The agent frequently selects the same action with similar parameters, even when it does not lead to state-space growth. This suggests policy collapse, insufficient exploration, and/or weak credit assignment from the current progress signal.
Top 3 Approaches
Stagnation-Aware Repetition Penalty / Action Masking
Add an explicit penalty or temporary mask for repeating the same action when no progress (Δstate ≤ 0) occurs. This discourages local loops while preserving exploitation when the action is effective.
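A minimal sketch of the stagnation-aware mask, assuming hypothetical names: `score` stands in for whatever preference the policy reports per action, and `last_delta` is the change in explored-state count after the previous action.

```python
def select_action(actions, score, last_action=None, last_delta=0):
    """Pick the highest-scoring action, masking the previous action
    when it produced no progress (Δstate <= 0).

    `score` and `last_delta` are hypothetical stand-ins for the
    policy's preference function and the progress signal.
    """
    candidates = list(actions)
    # Stagnation-aware mask: drop the repeated action only when it
    # made no progress, and only if an alternative exists.
    if last_action in candidates and last_delta <= 0 and len(candidates) > 1:
        candidates.remove(last_action)
    return max(candidates, key=score)
```

Because the mask fires only on Δstate ≤ 0, a productive action can still be repeated freely, which preserves exploitation.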
Exploration & Novelty Incentives
Introduce entropy regularization, ε-greedy sampling, or a lightweight novelty bonus (e.g., state- or (state, action)-count–based reward). This promotes policy diversity and reduces collapse to a single overused action.
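A sketch combining ε-greedy sampling with a (state, action)-count novelty bonus. All names are illustrative: `base_scores` stands in for whatever per-action preferences the LLM reports, and the bonus form 1/√(1+N) is one common choice, not a fixed requirement.

```python
import math
import random
from collections import Counter

class NoveltyPolicy:
    """ε-greedy selection with a count-based novelty bonus (sketch)."""

    def __init__(self, epsilon=0.1, beta=1.0, seed=0):
        self.epsilon = epsilon   # probability of a uniformly random action
        self.beta = beta         # weight of the novelty bonus
        self.counts = Counter()  # (state, action) visit counts
        self.rng = random.Random(seed)

    def select(self, state, actions, base_scores):
        if self.rng.random() < self.epsilon:
            choice = self.rng.choice(actions)
        else:
            # Bonus beta / sqrt(1 + N(s, a)): rarely tried actions
            # score higher, so an overused action loses its edge.
            def total(a):
                n = self.counts[(state, a)]
                return base_scores[a] + self.beta / math.sqrt(1 + n)
            choice = max(actions, key=total)
        self.counts[(state, choice)] += 1
        return choice
```

With equal base scores, the bonus alone forces rotation across actions, which directly attacks the ~60% repetition rate.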
Markov-Chain Transition Guidance
Instead of letting the LLM select actions freely, use a learned action-transition probability matrix to guide (or bias) selection, e.g., down-weighting self-transitions that historically led to no progress. Explore how this matrix can be used to reduce repetition.
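A sketch of the transition-matrix idea, under stated assumptions: transition counts are learned online from observed action pairs, rows are Laplace-smoothed, and the self-transition penalty factor (0.25 here) is an arbitrary value to tune empirically.

```python
import random
from collections import defaultdict

class TransitionGuide:
    """Bias action selection with a learned action-transition matrix.

    counts[prev][next] records how often `next` followed `prev`;
    sampling from the (penalized) row probabilities discourages the
    agent from repeating the same action step after step.
    """

    def __init__(self, self_penalty=0.25, seed=0):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.self_penalty = self_penalty  # assumed factor, tune empirically
        self.rng = random.Random(seed)

    def update(self, prev_action, next_action):
        self.counts[prev_action][next_action] += 1

    def weights(self, prev_action, actions):
        row = self.counts[prev_action]
        total = sum(row[a] for a in actions)
        w = []
        for a in actions:
            # Laplace-smoothed transition probability
            p = (row[a] + 1) / (total + len(actions))
            if a == prev_action:
                p *= self.self_penalty  # down-weight self-transitions
            w.append(p)
        return w

    def sample(self, prev_action, actions):
        return self.rng.choices(
            actions, weights=self.weights(prev_action, actions)
        )[0]
```

Even when "a → a" dominates the learned counts, the penalized weights shift probability mass toward alternative actions, so the matrix curbs rather than reinforces the loop.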