feat(mcp): add Python agent example, integration tests, and CI #1509
datradito wants to merge 2 commits into crytic:dev-agents
Conversation
Pull request overview
This PR adds comprehensive documentation, examples, and testing infrastructure for the MCP (Model Context Protocol) server integration with Echidna. It enables AI agents to interact with Echidna's fuzzing campaigns through a standardized protocol, providing both simple and advanced (LLM-powered) agent examples for monitoring and guiding fuzzing operations.
Key Changes:
- Complete test suite with pytest fixtures, schema validation, and integration tests covering all MCP tools
- Two agent examples: a simple autonomous monitoring agent and an advanced LangGraph-based AI agent using Claude
- Comprehensive documentation including setup guides, troubleshooting, and VS Code integration instructions
Reviewed changes
Copilot reviewed 28 out of 29 changed files in this pull request and generated 32 comments.
| File | Description |
|---|---|
| tests/mcp/test_*.py | Comprehensive pytest test suite covering all 9 MCP tools with schema validation, performance tests, and integration workflows |
| tests/mcp/conftest.py | pytest fixtures providing MCP client and Echidna campaign management for testing |
| tests/mcp/scripts/*.py | Helper utilities for MCP client wrapper and JSON schema validation |
| tests/mcp/contracts/*.sol | Test Solidity contracts (SimpleToken, EchidnaMCPTest) for validating MCP functionality |
| tests/mcp/contracts/*.json | JSON schemas defining expected response structures for each MCP tool |
| tests/mcp/requirements.txt | Python dependencies for testing framework and LangChain integration |
| examples/simple_agent.py | Autonomous monitoring agent with coverage stagnation detection and transaction injection |
| examples/langgraph_agent.py | Advanced LLM-powered agent using LangGraph for intelligent fuzzing strategy |
| examples/README.md | Documentation for agent examples with usage instructions and strategy ideas |
| AGENT_TESTING_GUIDE.md | Comprehensive guide covering server setup, manual testing, agent integration, and troubleshooting |
| test-mcp-client.py | Command-line test script validating all 7 core MCP tools |
| .gitignore | Added patterns for Python cache files and temporary test artifacts |
gustavo-grieco left a comment
I like the direction of this PR; here are a couple of comments:
- Why exactly do we need to log all the commands to a file in the MCP?
- Start addressing all the Copilot comments, discarding with a comment the ones that make no sense.
- You added some basic testing code for the MCP using Python, which is great, but we need to make sure it is used by our CI. Include a new CI test for that (which should run in parallel). If you have questions about that, @elopez is our GitHub Actions expert.
- The added documentation is very useful. We usually keep documentation for tools like Echidna in building-secure-contracts, but we usually update it when a release is close. Don't remove the documentation, though: we can keep it here until we are more confident in how to use these agentic capabilities, and move it into building-secure-contracts when we are close to release.
Response to Review Comment: Method Naming Inconsistency

Location:
Concern: Method name
Response: Good catch! However, this is intentional for the test wrapper API design: it follows the Python convention where a single-underscore prefix indicates "internal use". The public method name is deliberately generic since it's a test utility, not production code.

Status: Working as intended. The naming differentiates the public API from the private implementation. See pr-1509-copilot-responses.md for similar naming-convention rationale (C005).
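As a rough sketch of the convention being described (the class and method names here are illustrative stand-ins, not the wrapper's actual API), a generic public method delegates to an underscore-prefixed internal one:

```python
class MCPClient:
    """Illustrative test-wrapper shape; names and transport are hypothetical."""

    def call(self, tool, params):
        # Public, deliberately generic entry point used by tests.
        return self._call_tool(tool, params)

    def _call_tool(self, tool, params):
        # Internal implementation detail (single underscore = "internal use").
        # A real wrapper would perform the MCP request here; this stub just
        # echoes its inputs so the delegation is visible.
        return {"tool": tool, "params": params}
```

Renaming `_call_tool` would not affect test code, which only sees `call`.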
Response to Review Comment: Silenced Exception Handling

Location:
Concern: Tests silently pass when tools fail with "Tool might not be fully implemented yet" comments.
Response: This is a valid concern, but the approach is intentional for this specific test's purpose.

Why This Pattern Exists:

Current Approach:

```python
try:
    mcp_client.call("status", {})
except Exception:
    pass  # Test focus is logging, not tool correctness
```

Why Not

Trade-off: This reduces test coupling but may hide issues. However, the test suite's separation of concerns (logging tests vs. functional tests) provides better localization of failures.

Status: Keeping the current approach. Tool correctness is validated in six other test files with strict assertions.
Response to Review Comment: Assertion Logic Issue

Location:
Concern: Assertion
Response: Excellent catch! This is a logic bug. The assertion should ensure valid responses.

Current (broken):

```python
assert "result" in result or "error" not in result
# ❌ Passes if response = {} (neither key present)
```

Should be:

```python
assert "result" in result and "error" not in result
# ✅ Requires 'result' key AND no 'error' key
```

Or, more explicitly:

```python
assert "result" in result, f"Expected 'result' key in {result}"
assert "error" not in result, f"Unexpected error: {result.get('error')}"
```

Action: Will fix this in the next commit. Thank you for catching this!
Response to Review Comment: Brittle Timing Test

Location:
Concern: Test uses
Response: Valid concern! The hardcoded sleep dependency is fragile.

Current approach:

```python
time.sleep(12)  # Wait for log flush (10s interval + buffer)
```

Problems:

Better approach (will implement):

```python
def wait_for_log_entries(log_path, expected_count, timeout=15):
    """Poll the log file every 500ms until enough entries appear or the timeout expires."""
    start = time.time()
    while time.time() - start < timeout:
        if log_path.exists():
            with open(log_path) as f:
                if len(f.readlines()) >= expected_count:
                    return True
        time.sleep(0.5)  # Poll every 500ms
    return False

# In test:
assert wait_for_log_entries(log_file, expected_count=2), "Log flush timeout"
```

Benefits:

Action: Will implement polling with timeout in the next commit.
Response to Review Comment: Tool Name Inconsistency

Location:
Concern: Tool names are inconsistent across the codebase:
Response: Critical finding! This reveals naming drift between the documentation and the implementation.

Investigation Needed:
Hypothesis:
Action Plan:
Next Steps: Will audit all tool names and standardize them in the next commit. This could explain some test failures. Thank you for spotting this!
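Based on the canonical names later adopted in this PR (`inject_transaction` becoming `inject_fuzz_transactions`, and `clear_priorities` becoming `clear_fuzz_priorities`), one way to guard against naming drift in the test wrapper is a small normalization map. This is a sketch; the helper and constant names are invented for illustration:

```python
# Hypothetical helper: maps deprecated tool names found in older tests/docs
# to the canonical names exposed by the MCP server.
CANONICAL_TOOL_NAMES = {
    "inject_transaction": "inject_fuzz_transactions",
    "clear_priorities": "clear_fuzz_priorities",
}

def canonical_tool_name(name: str) -> str:
    """Return the server's canonical name for a possibly-deprecated tool name."""
    return CANONICAL_TOOL_NAMES.get(name, name)
```

A wrapper could route every call through such a helper (or, more strictly, fail loudly on deprecated names) so tests and documentation cannot silently diverge from the server.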
Response to Review Comment: Error Handling Returns Empty Dict

Location:
Concern: Method returns
Response: Great point! This violates explicit error-handling principles.

Current (problematic):

```python
except Exception as e:
    print(f"Error: {e}")
    return {}  # ❌ Silent failure
```

Should be:

```python
try:
    result = response.json()
except Exception as e:
    raise RuntimeError(f"Invalid MCP response: {e}") from e
if "error" in result:
    raise RuntimeError(f"MCP error: {result['error']}")
if "result" not in result:
    raise RuntimeError("Invalid MCP response: missing 'result' field")
return result["result"]
```

Benefits:

Alternative (if some callers need graceful degradation):

```python
def call_tool(self, name, params, strict=True):
    # ...existing code...
    if not strict:
        return {}  # Graceful fallback for non-strict callers
    raise RuntimeError(...)  # Strict mode (default)
```

Action: Will implement strict error handling in the next commit. This is a test utility, so failures should be loud.
Response to Review Comment: Outdated Package Versions

Location:
Concern: LangChain packages pinned to versions from Dec 2024, now over a year old (as of Jan 2026).
Response: Valid security concern. Pinned versions have trade-offs.

Current approach (pinned): Pros: reproducible builds, no surprise breakage.
Recommended approach (version ranges): Pros: get patch updates (0.3.x), avoid breaking changes (0.4.x).
Even better (lock file):
Decision: Will update to version ranges (
Note: These are test/example dependencies only, not production runtime requirements.
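As an illustration of the version-range approach, the requirements file could look roughly like the following. The package list matches the one described elsewhere in this PR (langchain, langgraph, pytest-asyncio, jsonschema, python-dotenv), but the specific lower and upper bounds shown here are assumptions, not the PR's actual pins:

```text
# tests/mcp/requirements.txt -- sketch with hypothetical bounds
langchain>=0.3,<0.4        # pick up 0.3.x patch releases, block 0.4.x breaking changes
langgraph>=0.2,<0.3
pytest-asyncio>=0.24,<1.0
jsonschema>=4.0,<5.0
python-dotenv>=1.0,<2.0
```

Pairing ranges like these with a generated lock file (e.g. via pip-tools) gives both reproducible CI builds and an easy upgrade path.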
Response to Review Comment: Docstring Clarity

Location:
Concern: Docstring says "p95 < 150ms" but the test also checks "mean < 100ms". The docstring should mention both.
Response: Good documentation improvement! The docstring is incomplete.

Current:

```python
def test_read_logs_performance(mcp_client):
    """Test that read_logs responds within p95 < 150ms."""
    # ...
    assert mean_time < 100  # Undocumented requirement
    assert p95_time < 150   # Documented requirement
```

Should be:

```python
def test_read_logs_performance(mcp_client):
    """Test that read_logs responds within 100ms mean and 150ms p95.

    Performance requirements (FR-015):
    - Mean latency: <100ms (typical case)
    - P95 latency: <150ms (worst case for 95% of requests)
    """
```

Why Both Metrics:

Action: Will clarify docstrings for all performance tests in the next commit. Summary: This is a documentation-only fix, but it improves test maintainability.
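For reference, the two metrics the test asserts can be computed from a list of per-request latency samples as follows. This is a standalone sketch (nearest-rank p95), not code from the PR:

```python
import math
import statistics

def latency_summary(samples_ms):
    """Return (mean, p95) for latency samples in milliseconds.

    p95 uses the nearest-rank method: the value at rank ceil(0.95 * n)
    in the sorted samples.
    """
    ordered = sorted(samples_ms)
    mean = statistics.mean(ordered)
    rank = max(1, math.ceil(0.95 * len(ordered)))  # 1-based nearest rank
    p95 = ordered[rank - 1]
    return mean, p95

# For samples 10, 20, ..., 200 this gives mean 105.0 and p95 190.0:
# the mean tracks typical latency, while p95 bounds tail latency.
```

Mean catches a general slowdown even when the tail stays fine; p95 catches occasional slow outliers that a mean would average away, which is why the test asserts both.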
- Fix assertion logic bug: split compound `or` into separate assertions (test_integration_workflows.py)
- Replace brittle `time.sleep(12)` with a polling helper function (test_command_logging.py)
- Standardize tool names to canonical server names: `inject_transaction` -> `inject_fuzz_transactions`, `clear_priorities` -> `clear_fuzz_priorities` (test_integration_workflows.py, mcp_client_wrapper.py, test_schemas.py, conftest.py)
- Fix silent error swallowing in `_call_tool`: raise RuntimeError on missing `result` key (mcp_client_wrapper.py)
- Change pinned package versions to compatible ranges (requirements.txt)
- Improve docstring clarity for performance test (test_read_logs.py)
- Add parameterized MCP_TIMEOUT to conftest.py fixtures
Resolved 33 merge conflicts by taking changes from 001-mcp-agent-commands. This includes the fresh PR crytic#1509 Copilot review fixes:
- Fix assertion logic bugs
- Replace brittle time.sleep with polling
- Standardize MCP tool names
- Fix error handling in MCP client wrapper
- Update package version ranges
- Improve test documentation
Hi @gustavo-grieco,

RE: Command Logging ( The command log provides deterministic replay and agent-behavior debugging for MCP-controlled campaigns.

1. Reproducibility (Immediate Value): When an agent discovers a bug after 1000s of injections, the log provides:

Example: An agent finds a vulnerability after injecting 50 transaction patterns. Without the log, reproducing this requires re-running the entire agent logic. With it:

```shell
# Replay the exact sequence from the log
cat corpus/mcp-commands.jsonl | while read cmd; do
  curl -X POST http://localhost:8080/mcp -d "$cmd"
done
```
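For longer logs, the same replay can be scripted in Python with only the standard library. This is a sketch assuming the JSONL log and endpoint shown in the shell loop; `load_commands` and `replay_commands` are names invented for illustration:

```python
import json
import urllib.request

def load_commands(log_path):
    """Parse a JSONL command log: one JSON command object per non-blank line."""
    with open(log_path) as f:
        return [json.loads(line) for line in f if line.strip()]

def replay_commands(log_path, endpoint="http://localhost:8080/mcp"):
    """POST each logged command to the MCP endpoint, preserving order."""
    for cmd in load_commands(log_path):
        req = urllib.request.Request(
            endpoint,
            data=json.dumps(cmd).encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req)  # raises on HTTP errors, keeping replay loud
```

Unlike the shell loop, parsing each line first catches a truncated or corrupted log entry before anything is sent to the server.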
@datradito can you please trigger another Copilot review?
Pull request overview
Copilot reviewed 48 out of 50 changed files in this pull request and generated 47 comments.
Force-pushed from 61eb194 to 0c7f5f4
Force-pushed from 17f72e5 to f75931d
Force-pushed from f48b295 to e0aba6f
- examples/mcp_agent.py: LangGraph agent that observes coverage, injects targeted transactions, and resets priorities when stagnating
- examples/README.md: brief usage docs (start command, tool table, transaction format)
- tests/mcp/test_mcp.py: four integration tests covering the core workflow: status, inject_fuzz_transactions, show_coverage, clear_fuzz_priorities
- tests/mcp/contracts/: EchidnaMCPTest + SimpleToken for test campaigns
- .github/workflows/mcp-tests.yml: CI job that builds echidna via Nix and runs pytest tests/mcp/test_mcp.py on every push/PR
Force-pushed from e0aba6f to 7066a9a
Adds pinned versions for langchain, langgraph, pytest-asyncio, jsonschema, and python-dotenv so developers can install a complete local environment for the MCP agent example. CI is unaffected (it installs pytest and httpx directly).
Adds a minimal working example and test coverage for the MCP server
introduced in #1502.
What's included
- `examples/mcp_agent.py`: A LangGraph-based autonomous agent that connects to a live Echidna campaign over MCP. Each iteration it:
  - calls `status` to read coverage and corpus size
  - and injects them via `inject_fuzz_transactions`
  - calls `clear_fuzz_priorities` to avoid getting stuck
- `examples/README.md`: Setup, usage, and a reference table of all available MCP tools.
- `tests/mcp/test_mcp.py`: pytest integration tests that spin up a real Echidna process and exercise the four core MCP operations: inject transactions, check coverage, reset priorities, check status.
- `tests/mcp/contracts/`: `EchidnaMCPTest.sol` and `SimpleToken.sol`, the contracts used by the test suite.
- `.github/workflows/mcp-tests.yml`: CI job that builds Echidna via Nix and runs the integration tests on every push and PR.
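The agent's per-iteration decision logic described above ("injects targeted transactions, resets priorities when stagnating") can be sketched roughly as follows. This is an assumption-laden illustration, not code copied from `examples/mcp_agent.py`; the function name and the stagnation threshold are invented:

```python
def next_action(prev_coverage, coverage, stagnant_iters, max_stagnant=3):
    """Decide the agent's next MCP call from observed coverage.

    Returns a (action, stagnant_iters) pair, where action is:
      - "clear":  coverage has stalled for max_stagnant iterations, so the
                  agent should call clear_fuzz_priorities to get unstuck
      - "inject": keep injecting targeted transactions via
                  inject_fuzz_transactions
    """
    if coverage <= prev_coverage:
        stagnant_iters += 1  # no new coverage this iteration
    else:
        stagnant_iters = 0   # progress made; reset the stagnation counter
    if stagnant_iters >= max_stagnant:
        return "clear", 0    # reset priorities and start counting again
    return "inject", stagnant_iters
```

A real agent would feed the `status` tool's coverage number into this function each loop and dispatch the returned action as the corresponding MCP tool call.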
How to run locally
How to run the agent
Notes