LLM Critique

A powerful CLI tool that uses a creator-critic iterative workflow to generate high-quality content. Multiple AI models work together: one creates content, while others provide detailed critique and feedback, iterating until convergence.

🚀 Features

  • Creator-Critic Architecture: One model creates, others critique and improve iteratively
  • Broad Model Support: OpenAI GPT-4o/o1/o3, Anthropic Claude 4, Google Gemini 2.x, and xAI Grok 3
  • Intelligent Convergence: Automatically stops when quality thresholds are met
  • Cost Estimation: Pre-execution cost estimates and real-time cost tracking
  • Rich Visual Output: Beautiful console output with progress tracking and model identification
  • Reasoning Model Support: Compatible with OpenAI o1/o3 reasoning models
  • Flexible Configuration: YAML config and environment variables
  • Multiple Output Formats: Human-readable or JSON output

📚 Documentation

File Documentation

  • .env.example: Template for environment variables with detailed comments
  • config.yaml: Default configuration file with all options
  • GETTING_STARTED.md: Beginner-friendly tutorial with examples and troubleshooting

📦 Installation

Prerequisites

  • Python 3.9+
  • API keys for the AI providers you want to use

Quick Start

  1. Clone and install:

     git clone https://github.com/yourusername/llm-critique.git
     cd llm-critique
     python -m venv venv
     source venv/bin/activate  # On Windows: venv\Scripts\activate
     pip install -e .

  2. Set up API keys (create a .env file):

     OPENAI_API_KEY=sk-your-openai-key
     ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
     GOOGLE_API_KEY=your-google-key
     XAI_API_KEY=xai-your-xai-key

  3. Run your first critique:

     python -m llm_critique.main "Write a haiku about coding"

🎯 Quick Examples

Basic Usage

# Simple prompt with default models
python -m llm_critique.main "Explain quantum computing"

# Use specific creator and critic models
python -m llm_critique.main "Write a poem" --creator-model gpt-4o --critique-models claude-4-sonnet,gemini-2.0-flash

# Use xAI Grok models with real-time data
python -m llm_critique.main "Analyze recent AI trends" --creator-model grok-3 --critique-models grok-beta,claude-4-sonnet

# Multiple iterations for complex tasks
python -m llm_critique.main "Design a REST API" --iterations 3

# Read prompt from file
python -m llm_critique.main --file prompt.txt --iterations 2

Cost Estimation

# Estimate cost before running
python -m llm_critique.main --est-cost "Write a detailed analysis of climate change" --iterations 3

# Estimate cost from file
python -m llm_critique.main --est-cost --file large_prompt.txt --creator-model gpt-4o --critique-models claude-4-opus,gpt-4o-mini

# Estimate xAI Grok model costs (higher pricing)
python -m llm_critique.main --est-cost "Complex analysis task" --creator-model grok-3 --critique-models grok-3-reasoning,claude-4-sonnet

Model Management

# List all available models
python -m llm_critique.main --list-models

# Use reasoning models
python -m llm_critique.main "Solve this logic puzzle" --creator-model o1-mini --critique-models o3-mini,claude-4-sonnet

# Use xAI Grok reasoning models
python -m llm_critique.main "Complex math problem" --creator-model grok-3-reasoning --critique-models grok-3-mini-reasoning,o1-mini

🤖 Supported Models

OpenAI

  • GPT-4o Series: gpt-4o, gpt-4o-mini (multimodal, fast)
  • Reasoning Models: o1, o1-mini, o3, o3-mini (advanced reasoning)
  • Legacy: gpt-4, gpt-3.5-turbo

Anthropic

  • Claude 4 Series: claude-4-opus, claude-4-sonnet (latest generation)
  • Claude 3.x: claude-3.7-sonnet, claude-3.5-sonnet, claude-3.5-haiku
  • Legacy: claude-3-opus, claude-3-sonnet, claude-3-haiku

Google

  • Gemini 2.x: gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash
  • Legacy: gemini-pro

xAI

  • Grok 3 Series: grok-3, grok-3-mini (latest flagship and efficient models)
  • Reasoning Models: grok-3-reasoning, grok-3-mini-reasoning (advanced reasoning)
  • Production: grok-beta (real-time X data), grok-2 (previous generation)

🛠️ Configuration

Environment Variables

Create a .env file in your project root:

# Required: At least one API key
OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key  
GOOGLE_API_KEY=your-google-key
XAI_API_KEY=xai-your-xai-key

# Optional: Customize default behavior
LLM_CRITIQUE_DEFAULT_CREATOR=gpt-4o
LLM_CRITIQUE_DEFAULT_MODELS=claude-4-sonnet,gemini-2.0-flash
LLM_CRITIQUE_MAX_ITERATIONS=2
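
For reference, this is roughly how a tool like this would pick those values up at startup, assuming python-dotenv (a sketch, not the project's actual loader):

# Illustrative only: load .env, then read the variables documented above.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

openai_key = os.getenv("OPENAI_API_KEY")
default_creator = os.getenv("LLM_CRITIQUE_DEFAULT_CREATOR", "gpt-4o")
default_models = os.getenv("LLM_CRITIQUE_DEFAULT_MODELS", "").split(",")
max_iterations = int(os.getenv("LLM_CRITIQUE_MAX_ITERATIONS", "2"))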

Configuration File (optional)

Create a config.yaml file:

# Default models for creator-critic workflow
default_creator: gpt-4o
default_models:
  - claude-4-sonnet
  - gemini-2.0-flash
  - gpt-4o-mini

# Workflow settings
max_iterations: 3
confidence_threshold: 0.85

# Cost and performance
timeout: 120
enable_cost_tracking: true

# Output preferences
output_format: human
debug: false
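
A minimal sketch of how such a file could be read and merged with defaults, assuming PyYAML (illustrative, not the tool's actual loader):

# Illustrative only: read config.yaml, falling back to defaults for missing keys.
import yaml

DEFAULTS = {
    "default_creator": "gpt-4o",
    "default_models": ["claude-4-sonnet", "gemini-2.0-flash"],
    "max_iterations": 3,
    "confidence_threshold": 0.85,
    "timeout": 120,
    "enable_cost_tracking": True,
    "output_format": "human",
    "debug": False,
}

with open("config.yaml") as f:
    config = {**DEFAULTS, **(yaml.safe_load(f) or {})}

print(config["default_creator"], config["confidence_threshold"])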

📖 How It Works

Creator-Critic Workflow

  1. Creator Phase: A designated "creator" model generates initial content
  2. Critic Phase: Multiple "critic" models analyze and provide structured feedback
  3. Iteration: Creator improves content based on critic feedback
  4. Convergence: Process stops when critics are satisfied or max iterations reached
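
In code form, this loop might look like the sketch below (illustrative only; generate, critique, and revise are hypothetical stand-ins for the tool's internal calls, not its actual API):

# A minimal sketch of the creator-critic loop described above.
def run_workflow(prompt, creator, critics, max_iterations=3, threshold=0.85):
    content = creator.generate(prompt)  # 1. Creator phase: initial draft
    for used in range(1, max_iterations + 1):
        # 2. Critic phase: every critic scores and comments on the draft
        feedback = [critic.critique(content) for critic in critics]
        avg_quality = sum(f.quality for f in feedback) / len(feedback)
        # 4. Convergence: stop once average quality clears the threshold
        if avg_quality >= threshold:
            return content, used
        # 3. Iteration: the creator revises using the critics' feedback
        content = creator.revise(content, feedback)
    return content, max_iterations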

Example Workflow

Iteration 1:
├── Creator (gpt-4o): Generates initial content
├── Critic 1 (claude-4-sonnet): "Good start, but needs more examples" 
├── Critic 2 (gemini-2.0-flash): "Structure is unclear, suggest reorganizing"
└── Decision: Continue (quality score < 85%)

Iteration 2:  
├── Creator (gpt-4o): Improves content based on feedback
├── Critic 1 (claude-4-sonnet): "Much better, examples are clear"
├── Critic 2 (gemini-2.0-flash): "Well structured, minor style tweaks"
└── Decision: Stop (quality score >= 85%)

🎨 Output Examples

Human-Readable Format

🔄 CREATOR-CRITIC ITERATION RESULTS
================================================================================
📊 Total Iterations: 2
🎯 Convergence Achieved: ✅ Yes  
⚙️  Creator Model: gpt-4o
🔍 Critic Models: claude-4-sonnet, gemini-2.0-flash

==================== ITERATION 1 ====================
🎨 CREATOR OUTPUT (gpt-4o)
Confidence: 80.0%
[Generated content in styled panel]

🔍 CRITICS FEEDBACK (Iteration 1)
  🤖 claude-4-sonnet (Iteration 1)
     📊 Quality Score: 75.0%
     💪 Strengths: Clear writing, good structure
     🔧 Improvements: Add more examples, improve conclusion
     🎯 Decision: 🔄 Continue

🔄 ITERATION SUMMARY
  📝 Requested Iterations: 3
  ✅ Used Iterations: 2  
  🎯 Status: Convergence achieved after 2 iterations
  💡 Early Stop: Stopped 1 iteration early due to quality convergence

⚡ PERFORMANCE
  ⏱️  Total Duration: 12.3s
  💰 Estimated Cost: $0.0087

JSON Format

python -m llm_critique.main "Your prompt" --format json
{
  "execution_id": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2024-12-19T10:30:00Z",
  "input": {
    "prompt": "Your prompt",
    "creator_model": "gpt-4o", 
    "critic_models": ["claude-4-sonnet", "gemini-2.0-flash"],
    "max_iterations": 3
  },
  "results": {
    "final_answer": "The final improved content",
    "confidence_score": 0.92,
    "consensus_score": 0.87,
    "total_iterations": 2,
    "convergence_achieved": true
  },
  "performance": {
    "total_duration_ms": 12300,
    "estimated_cost_usd": 0.0087
  }
}
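
The JSON output is easy to consume programmatically; here is a minimal sketch using the CLI invocation and field names shown above:

# Illustrative only: run the CLI and parse its JSON output.
import json
import subprocess

proc = subprocess.run(
    ["python", "-m", "llm_critique.main", "Your prompt", "--format", "json"],
    capture_output=True, text=True, check=True,
)
result = json.loads(proc.stdout)
print(result["results"]["final_answer"])
print("cost: $", result["performance"]["estimated_cost_usd"])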

💰 Cost Estimation

Pre-Execution Estimation

# Get cost estimate without running
python -m llm_critique.main --est-cost "Write a research paper on AI ethics" \
  --creator-model gpt-4o \
  --critique-models claude-4-opus,gemini-2.5-pro \
  --iterations 3

Output:

💰 Cost Estimation for LLM Critique Workflow
┌──────────────────┬───────────────┬─────────────────────┬──────────────┬────────────┐
│ Component        │ Model         │ Usage               │ Cost per 1K  │ Total Cost │
├──────────────────┼───────────────┼─────────────────────┼──────────────┼────────────┤
│ Creator (Iter 1) │ gpt-4o        │ 1,200 in + 500 out  │ $0.0025      │ $0.0043    │
│ Critic (Iter 1)  │ claude-4-opus │ 1,700 in + 200 out  │ $0.015       │ $0.0285    │
└──────────────────┴───────────────┴─────────────────────┴──────────────┴────────────┘

Estimated Total Cost: $0.0674

Real-Time Cost Tracking

The tool tracks actual costs during execution and displays them in the performance summary.
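
The arithmetic behind per-1K-token estimates like the table above is straightforward. A sketch (prices are the ones shown in the table; charging input and output at the same rate is a simplification, since real providers usually price output tokens higher):

# Illustrative only: per-1K-token cost arithmetic.
def estimate_cost(in_tokens, out_tokens, in_price_per_1k, out_price_per_1k):
    return in_tokens / 1000 * in_price_per_1k + out_tokens / 1000 * out_price_per_1k

# Creator row above: 1,200 input + 500 output tokens at $0.0025 per 1K.
print(estimate_cost(1200, 500, 0.0025, 0.0025))  # ~0.00425, shown as $0.0043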

🔧 Command Line Reference

Main Commands

# Basic execution
python -m llm_critique.main [PROMPT] [OPTIONS]

# Utility commands  
python -m llm_critique.main --list-models      # Show all models
python -m llm_critique.main --est-cost [ARGS]  # Estimate costs

Options

Option             Description                     Example
------             -----------                     -------
--file, -f         Read prompt from file           -f prompt.txt
--creator-model    Model for content creation      --creator-model gpt-4o
--critique-models  Comma-separated critic models   --critique-models claude-4-sonnet,gemini-2.0-flash
--iterations       Maximum iterations              --iterations 3
--format           Output format (human/json)      --format json
--debug            Enable debug logging            --debug
--config           Custom config file              --config custom.yaml
--list-models      List available models           --list-models
--est-cost         Estimate cost only              --est-cost
--listen           Save conversation to file       --listen conversation.json
--replay           Replay saved conversation       --replay conversation.json
--version          Show version and exit           --version

🧪 Development

Setup Development Environment

# Clone and setup
git clone https://github.com/yourusername/llm-critique.git
cd llm-critique

# Create virtual environment  
python -m venv venv
source venv/bin/activate

# Install in development mode
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=llm_critique --cov-report=html

# Run specific test file
pytest tests/test_main.py

# Run with debug output
pytest -v -s

Code Quality

# Format code
black llm_critique/ tests/
isort llm_critique/ tests/

# Check style
flake8 llm_critique/ tests/

# Type checking
mypy llm_critique/

# Run all quality checks
pre-commit run --all-files

🏗️ Project Structure

llm-critique/
├── llm_critique/
│   ├── core/
│   │   ├── models.py       # LLM client and model management
│   │   ├── chains.py       # Creator-critic workflow chains  
│   │   └── synthesis.py    # Response synthesis and output
│   └── main.py            # CLI entry point
├── tests/
│   ├── test_models.py     # Model integration tests
│   ├── test_chains.py     # Workflow logic tests
│   ├── test_synthesis.py  # Output formatting tests
│   └── conftest.py        # Test configuration
├── config.yaml           # Default configuration
├── .env.example          # Environment variable template
├── pyproject.toml        # Project metadata and dependencies
└── README.md             # This file

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes and add tests
  4. Run quality checks: pre-commit run --all-files
  5. Commit your changes: git commit -m 'Add amazing feature'
  6. Push to the branch: git push origin feature/amazing-feature
  7. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙋 Support

🔒 Security & Privacy

Important Security Information:

API Key Security

  • ❌ Never commit API keys to version control
  • ✅ Store keys only in .env files (excluded from git)
  • ✅ Use environment variables for all credentials
  • ✅ Keys are automatically redacted from debug output and logs

Data Handling

  • 🔄 Conversation files may contain sensitive prompts; they are stored with restrictive permissions (600)
  • 📝 Log files use security filtering to prevent credential leakage
  • 🚫 No data sent to external services beyond the specified AI APIs

Best Practices

# ✅ Good - Use environment variables
export OPENAI_API_KEY="sk-your-key-here"

# ❌ Bad - Never hardcode in scripts
api_key = "sk-your-key-here"  # DON'T DO THIS
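
The key redaction mentioned above could be implemented with a logging filter along these lines (a sketch; the tool's actual redaction logic may differ):

# Illustrative only: mask strings shaped like provider API keys before logging.
import logging
import re

KEY_PATTERN = re.compile(r"\b(?:sk-ant-|sk-|xai-)[A-Za-z0-9_-]{8,}")

class RedactKeysFilter(logging.Filter):
    def filter(self, record):
        record.msg = KEY_PATTERN.sub("[REDACTED]", str(record.msg))
        return True

logging.getLogger("llm_critique").addFilter(RedactKeysFilter())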

File Permissions

The tool automatically sets secure file permissions:

  • Log files: 600 (owner read/write only)
  • Conversation files: 600 (owner read/write only)
  • Config directory: 755 (standard directory permissions)
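
In Python terms, those modes correspond to os.chmod calls like the following (the file names here are placeholders, not the tool's actual paths):

# Illustrative only: applying the permission modes listed above.
import os

os.chmod("conversation.json", 0o600)  # owner read/write only
os.chmod("llm_critique.log", 0o600)   # owner read/write only
os.chmod("config_directory", 0o755)   # rwxr-xr-x: standard directory permissions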

🧪 Testing & Validation

Persona Validation

To ensure all expert personas are properly configured and can load without errors:

# Test all personas for YAML syntax and structure
pytest tests/test_personas.py -k "production_personas" -v

# Run full persona test suite
pytest tests/test_personas.py -v

This validation test catches:

  • YAML syntax errors (missing line breaks, malformed lists)
  • Missing required fields
  • Invalid data types
  • File encoding issues

For CI/CD: Include pytest tests/test_personas.py -k "production_personas" in your pipeline to ensure persona integrity.
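
For orientation, a persona-loading test would catch the failure modes listed above with logic along these lines (a hypothetical sketch; the directory, required fields, and test layout are assumptions, not the repo's actual test):

# Illustrative only: validate that every persona YAML file parses and has the
# expected shape. PERSONA_DIR and REQUIRED_FIELDS are assumed, not confirmed.
import pathlib

import pytest
import yaml

PERSONA_DIR = pathlib.Path("personas")
REQUIRED_FIELDS = {"name", "description"}

@pytest.mark.parametrize("path", sorted(PERSONA_DIR.glob("*.yaml")))
def test_persona_loads(path):
    data = yaml.safe_load(path.read_text(encoding="utf-8"))  # catches YAML syntax errors
    assert isinstance(data, dict), f"{path} is not a YAML mapping"
    missing = REQUIRED_FIELDS - data.keys()
    assert not missing, f"{path} missing required fields: {missing}"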


Made with ❤️ for the AI community
