feat(mcp): add Python agent example, integration tests, and CI #1509

Open
datradito wants to merge 2 commits into crytic:dev-agents from datradito:mcp-agent-pr-clean

Conversation

@datradito datradito commented Dec 29, 2025

Adds a minimal working example and test coverage for the MCP server
introduced in #1502.

What's included

examples/mcp_agent.py — A LangGraph-based autonomous agent that
connects to a live Echidna campaign over MCP. Each iteration it:

  1. Calls status to read coverage and corpus size
  2. When coverage stagnates, asks Claude to generate targeted transactions
    and injects them via inject_fuzz_transactions
  3. Periodically calls clear_fuzz_priorities to avoid getting stuck
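
The per-iteration logic above can be sketched as a small decision function. This is a minimal sketch, not the code in examples/mcp_agent.py: the function name `plan_iteration`, the `stagnation_window` and `clear_every` parameters, and their default values are assumptions made for illustration. Only the three tool names come from the description above, and the LLM call that produces the transactions is left out.

```python
from typing import List


def plan_iteration(
    coverage_history: List[int],
    iteration: int,
    stagnation_window: int = 3,  # hypothetical: readings without growth
    clear_every: int = 10,       # hypothetical: iterations between resets
) -> List[str]:
    """Return the MCP tool names to call on this iteration, in order."""
    actions = ["status"]  # step 1: always read coverage and corpus size

    # Step 2: coverage is stagnant when the last `stagnation_window`
    # readings show no increase over the reading just before them.
    recent = coverage_history[-(stagnation_window + 1):]
    stagnant = len(recent) > stagnation_window and max(recent[1:]) <= recent[0]
    if stagnant:
        actions.append("inject_fuzz_transactions")

    # Step 3: periodically reset priorities so the fuzzer does not get stuck.
    if iteration > 0 and iteration % clear_every == 0:
        actions.append("clear_fuzz_priorities")

    return actions
```

Keeping the decision separate from the MCP calls makes the stagnation rule testable without a running Echidna process.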

examples/README.md — Setup, usage, and a reference table of all
available MCP tools.

tests/mcp/test_mcp.py — pytest integration tests that spin up a
real Echidna process and exercise the four core MCP operations:
inject transactions, check coverage, reset priorities, check status.

tests/mcp/contracts/EchidnaMCPTest.sol and SimpleToken.sol,
the contracts used by the test suite.

.github/workflows/mcp-tests.yml — CI job that builds Echidna via
Nix and runs the integration tests on every push and PR.

How to run locally

# Install test deps
pip install pytest httpx

# Run integration tests (requires Echidna binary in PATH)
pytest tests/mcp/test_mcp.py -v

How to run the agent

# Terminal 1 — start Echidna with MCP enabled
echidna MyContract.sol --server 8080 --format text

# Terminal 2 — run the agent
export ANTHROPIC_API_KEY=your_key_here
python examples/mcp_agent.py

--format text is required: it disables the interactive TUI so the
MCP server thread can accept connections.

Notes

Copilot AI review requested due to automatic review settings December 29, 2025 23:50
@CLAassistant

CLAassistant commented Dec 29, 2025

CLA assistant check
All committers have signed the CLA.

Copilot AI left a comment

Pull request overview

This PR adds comprehensive documentation, examples, and testing infrastructure for the MCP (Model Context Protocol) server integration with Echidna. It enables AI agents to interact with Echidna's fuzzing campaigns through a standardized protocol, providing both simple and advanced (LLM-powered) agent examples for monitoring and guiding fuzzing operations.

Key Changes:

  • Complete test suite with pytest fixtures, schema validation, and integration tests covering all MCP tools
  • Two agent examples: a simple autonomous monitoring agent and an advanced LangGraph-based AI agent using Claude
  • Comprehensive documentation including setup guides, troubleshooting, and VS Code integration instructions

Reviewed changes

Copilot reviewed 28 out of 29 changed files in this pull request and generated 32 comments.

Show a summary per file
| File | Description |
| --- | --- |
| tests/mcp/test_*.py | Comprehensive pytest test suite covering all 9 MCP tools with schema validation, performance tests, and integration workflows |
| tests/mcp/conftest.py | pytest fixtures providing MCP client and Echidna campaign management for testing |
| tests/mcp/scripts/*.py | Helper utilities for MCP client wrapper and JSON schema validation |
| tests/mcp/contracts/*.sol | Test Solidity contracts (SimpleToken, EchidnaMCPTest) for validating MCP functionality |
| tests/mcp/contracts/*.json | JSON schemas defining expected response structures for each MCP tool |
| tests/mcp/requirements.txt | Python dependencies for testing framework and LangChain integration |
| examples/simple_agent.py | Autonomous monitoring agent with coverage stagnation detection and transaction injection |
| examples/langgraph_agent.py | Advanced LLM-powered agent using LangGraph for intelligent fuzzing strategy |
| examples/README.md | Documentation for agent examples with usage instructions and strategy ideas |
| AGENT_TESTING_GUIDE.md | Comprehensive guide covering server setup, manual testing, agent integration, and troubleshooting |
| test-mcp-client.py | Command-line test script validating all 7 core MCP tools |
| .gitignore | Added patterns for Python cache files and temporary test artifacts |

Collaborator

@gustavo-grieco gustavo-grieco left a comment

I like the direction of this PR. Here are a couple of comments:

  • Why exactly do we need to log all the commands to a file in the MCP?
  • Start addressing all the Copilot comments, discarding with a comment the ones that make no sense.
  • You added some basic testing code for the MCP using Python, which is great, but we need to make sure it is used by our CI. Include a new CI test for that (which should run in parallel). If you have questions about that, @elopez is our GitHub Actions expert.
  • The added documentation is very useful. We usually keep the documentation for tools like Echidna in building-secure-contracts, but we usually update it when it is close to release. However, don't remove the documentation; we can keep it until we are more confident about how to use these agentic capabilities, and move it into building-secure-contracts when we are close to release.

@datradito
Author

Response to Review Comment: Method Naming Inconsistency

Location: tests/mcp/scripts/mcp_client_wrapper.py:25

Concern: Method name call_tool vs internal _call_tool helper creates confusion.

Response: Good catch! However, this is intentional for the test wrapper API design:

  • Public API: call_tool(name, params) - Clear, simple interface for test code
  • Internal helper: _call_tool() - Private implementation detail with JSON-RPC logic

This follows the Python convention in which a single-underscore prefix marks a name as internal. The public method name is deliberately generic since it's a test utility, not production code.

Status: Working as intended. The naming differentiates public API from private implementation.


See pr-1509-copilot-responses.md for similar naming convention rationale (C005).

@datradito
Author

Response to Review Comment: Silenced Exception Handling

Location: tests/mcp/test_command_logging.py:90

Concern: Tests silently pass when tools fail with "Tool might not be fully implemented yet" comments.

Response: This is a valid concern but the approach is intentional for this specific test's purpose:

Why This Pattern Exists:

  • test_command_logging.py tests the logging infrastructure, not tool functionality
  • Tool functionality is tested comprehensively in separate files (test_injection.py, test_coverage.py, etc.)
  • This test only needs to verify that if a tool is called, it gets logged

Current Approach:

try:
    mcp_client.call("status", {})
except Exception:
    pass  # Test focus is logging, not tool correctness

Why Not pytest.xfail:

  • xfail would mark the entire test as "expected to fail"
  • We want the test to pass because logging works, even if some tools have bugs
  • Tool bugs are caught by dedicated functional tests

Trade-off: This reduces test coupling but may hide issues. However, the test suite's separation of concerns (logging tests vs functional tests) provides better localization of failures.

Status: Keeping current approach. Tool correctness is validated in 6 other test files with strict assertions.

@datradito
Author

Response to Review Comment: Assertion Logic Issue

Location: tests/mcp/test_integration_workflows.py:128

Concern: Assertion assert "result" in result or "error" not in result has flawed logic - passes even with no keys.

Response: Excellent catch! This is a logic bug. The assertion should ensure valid responses.

Current (Broken):

assert "result" in result or "error" not in result
# ❌ Passes if response = {} (neither key present)

Should Be:

assert "result" in result and "error" not in result
# ✅ Requires 'result' key AND no 'error' key

Or More Explicit:

assert "result" in result, f"Expected 'result' key in {result}"
assert "error" not in result, f"Unexpected error: {result.get('error')}"

Action: Will fix this in the next commit. Thank you for catching this!
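
To make the difference concrete, here is a small sketch that evaluates both forms of the assertion against the four possible key combinations. The helper names `passes_original` and `passes_fixed` and the sample payloads are illustrative only and are not part of the test suite.

```python
def passes_original(response: dict) -> bool:
    # Original assertion: true whenever "error" is absent, even with no "result".
    return "result" in response or "error" not in response


def passes_fixed(response: dict) -> bool:
    # Fixed assertion: requires a "result" key and no "error" key.
    return "result" in response and "error" not in response


cases = {
    "empty": {},
    "result only": {"result": {"ok": True}},
    "error only": {"error": {"code": -32601}},
    "both": {"result": {}, "error": {}},
}

for label, response in cases.items():
    print(f"{label:12} original={passes_original(response)} fixed={passes_fixed(response)}")
```

The two versions disagree on the empty response and on a response that carries both keys: the original accepts both, the fixed form rejects both.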

@datradito
Author

Response to Review Comment: Brittle Timing Test

Location: tests/mcp/test_command_logging.py:25

Concern: Test uses time.sleep(12) which is brittle and could fail on slow systems or if flush interval changes.

Response: Valid concern! The hardcoded sleep dependency is fragile.

Current Approach:

time.sleep(12)  # Wait for log flush (10s interval + buffer)

Problems:

  • Fails on slow CI systems
  • Breaks if flush interval changes
  • Wastes 12 seconds per test run

Better Approach (will implement):

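import time

def wait_for_log_entries(log_path, expected_count, timeout=15, interval=0.5):
    """Poll the log file at a fixed interval until it has enough lines or the timeout expires."""
    # log_path is expected to be a pathlib.Path
    deadline = time.monotonic() + timeout  # monotonic clock is immune to wall-clock jumps
    while time.monotonic() < deadline:
        if log_path.exists():
            with open(log_path) as f:
                if len(f.readlines()) >= expected_count:
                    return True
        time.sleep(interval)  # Fixed polling interval (500 ms by default)
    return False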

# In test:
assert wait_for_log_entries(log_file, expected_count=2), "Log flush timeout"

Benefits:

  • ✅ Faster (returns as soon as condition met)
  • ✅ More reliable (adaptive to system speed)
  • ✅ Better failure messages

Action: Will implement polling with timeout in next commit.

@datradito
Author

Response to Review Comment: Tool Name Inconsistency

Location: test-mcp-client.py:91

Concern: Tool names inconsistent across codebase:

  • inject_fuzz_transactions vs inject_transaction
  • clear_fuzz_priorities vs clear_priorities

Response: Critical finding! This reveals naming drift between documentation and implementation.

Investigation Needed:

  1. Check actual MCP server implementation (lib/Echidna/MCP.hs) for canonical names
  2. Update all references to use consistent names

Hypothesis:

  • Server implements: inject_transaction, clear_priorities (shorter, cleaner)
  • Some tests/docs use: inject_fuzz_transactions, clear_fuzz_priorities (more descriptive)

Action Plan:

  1. Verify canonical tool names in Haskell implementation
  2. Update test-mcp-client.py to use correct names
  3. Update examples/README.md and AGENT_TESTING_GUIDE.md
  4. Ensure all tests use consistent names

Next Steps: Will audit all tool names and standardize in next commit. This could explain some test failures. Thank you for spotting this!
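
The audit could be partly automated with a short script along these lines. This is a sketch under assumptions: the function name `find_stale_names` is hypothetical, and the direction of the `RENAMES` mapping follows the commit message later in this conversation (which treats the `_fuzz_` variants as the canonical server names), not the hypothesis above. It should be confirmed against the Haskell implementation before use.

```python
import re
from typing import Dict, List, Tuple

# Assumed mapping from stale name to canonical name; confirm against the server.
RENAMES: Dict[str, str] = {
    "inject_transaction": "inject_fuzz_transactions",
    "clear_priorities": "clear_fuzz_priorities",
}


def find_stale_names(text: str) -> List[Tuple[int, str, str]]:
    """Return (line_number, stale_name, canonical_name) for each stale use."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for stale, canonical in RENAMES.items():
            # \b keeps a stale name from matching inside a longer identifier.
            if re.search(rf"\b{re.escape(stale)}\b", line):
                hits.append((lineno, stale, canonical))
    return hits
```

Running it over each test file and document listed in the action plan gives the line numbers that still need updating.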

@datradito
Author

Response to Review Comment: Error Handling Returns Empty Dict

Location: tests/mcp/scripts/mcp_client_wrapper.py:45

Concern: Method returns {} on error, making it indistinguishable from successful empty result.

Response: Great point! This violates explicit error handling principles.

Current (Problematic):

except Exception as e:
    print(f"Error: {e}")
    return {}  # ❌ Silent failure

Should Be:

try:
    result = response.json()
except Exception as e:
    raise RuntimeError(f"Invalid MCP response: {e}") from e

if "error" in result:
    raise RuntimeError(f"MCP error: {result['error']}")

if "result" not in result:
    raise RuntimeError("Invalid MCP response: missing 'result' field")

return result["result"]

Benefits:

  • ✅ Explicit failures (no silent errors)
  • ✅ Clear error messages for debugging
  • ✅ Caller can distinguish error from empty result
  • ✅ Stack traces preserved with from e chaining

Alternative (if some callers need graceful degradation):

def call_tool(self, name, params, strict=True):
    # ...existing code...
    if not strict:
        return {}  # Graceful fallback for non-strict callers
    raise RuntimeError(...)  # Strict mode (default)

Action: Will implement strict error handling in next commit. This is a test utility, so failures should be loud.
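
As a self-contained illustration of the strict behavior, the sketch below separates response validation from transport so it can be exercised without a server. The names `MCPError` and `unwrap_result` are hypothetical and do not appear in mcp_client_wrapper.py; the payloads assume a JSON-RPC 2.0 style envelope with `result` and `error` keys, as the snippets above suggest.

```python
from typing import Any, Dict


class MCPError(RuntimeError):
    """Raised when an MCP response is malformed or reports an error."""


def unwrap_result(payload: Dict[str, Any]) -> Any:
    """Return payload["result"], raising MCPError instead of returning {}."""
    if "error" in payload:
        raise MCPError(f"MCP error: {payload['error']}")
    if "result" not in payload:
        raise MCPError("Invalid MCP response: missing 'result' field")
    return payload["result"]


# A successful call with an empty result is now distinct from a failure.
assert unwrap_result({"jsonrpc": "2.0", "id": 1, "result": {}}) == {}

try:
    unwrap_result({"jsonrpc": "2.0", "id": 1, "error": {"code": -32601, "message": "Method not found"}})
except MCPError as exc:
    print(f"caller sees the failure: {exc}")
```

Using a dedicated exception type lets tests catch MCP failures specifically without also swallowing unrelated errors.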

@datradito
Author

Response to Review Comment: Outdated Package Versions

Location: tests/mcp/requirements.txt:8

Concern: LangChain packages pinned to versions from Dec 2024, now over a year old (as of Jan 2026).

Response: Valid security concern. Pinned versions have trade-offs:

Current Approach (Pinned):

langchain==0.3.7
langchain-core==0.3.15

Pros: Reproducible builds, no surprise breakage
Cons: Miss security patches, compatibility fixes

Recommended Approach (Version Ranges):

langchain>=0.3.7,<0.4.0
langchain-core>=0.3.15,<0.4.0
langchain-community>=0.3.5,<0.4.0

Pros: Get patch updates (0.3.x), avoid breaking changes (0.4.x)
Cons: Slight reproducibility risk

Even Better (Lock File):

# requirements.txt (permissive)
langchain>=0.3.7,<0.4.0

# requirements-lock.txt (generated)
langchain==0.3.15  # Example: the exact 0.3.x version resolved when the lock file was generated

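Decision: Will update to version ranges in the next commit. Because these packages are still on 0.x releases, the upper bound is the next minor series (for example >=0.3.7,<0.4.0), not the next major version. This keeps patch updates flowing for test dependencies while maintaining stability.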

Note: These are test/example dependencies only, not production runtime requirements.
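
For readers unfamiliar with how such a range is evaluated, here is a rough sketch of the comparison. It handles only plain dotted numeric releases; pip itself follows the fuller PEP 440 rules (pre-releases, post-releases, and so on), so the helpers `parse_version` and `in_range` are illustrative and not a replacement for a real resolver.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain dotted release such as '0.3.15' into (0, 3, 15)."""
    return tuple(int(part) for part in version.split("."))


def in_range(version: str, lower: str, upper: str) -> bool:
    """True if lower <= version < upper, comparing numerically per component."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


print(in_range("0.3.15", "0.3.7", "0.4.0"))  # a later 0.3.x patch release
print(in_range("0.4.0", "0.3.7", "0.4.0"))   # the excluded upper bound
print("0.3.15" >= "0.3.7")                   # naive string comparison, for contrast
```

The numeric comparison accepts 0.3.15 and rejects 0.4.0, while the naive string comparison wrongly orders 0.3.15 before 0.3.7, which is why versions are compared component by component.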

@datradito
Author

Response to Review Comment: Docstring Clarity

Location: tests/mcp/test_read_logs.py:49

Concern: Docstring says "p95 < 150ms" but test also checks "mean < 100ms". Docstring should mention both.

Response: Good documentation improvement! The docstring is incomplete.

Current:

def test_read_logs_performance(mcp_client):
    """Test that read_logs responds within p95 < 150ms."""
    # ...
    assert mean_time < 100  # Undocumented requirement
    assert p95_time < 150   # Documented requirement

Should Be:

def test_read_logs_performance(mcp_client):
    """Test that read_logs responds within 100ms mean and 150ms p95.
    
    Performance requirements (FR-015):
    - Mean latency: <100ms (typical case)
    - P95 latency: <150ms (worst case for 95% of requests)
    """

Why Both Metrics:

  • Mean: Overall responsiveness (user experience)
  • P95: Tail latency (ensures consistency, no spikes)

Action: Will clarify docstrings for all performance tests in next commit.
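
For reference, the two metrics can be computed from a list of latency samples roughly as follows. This is a sketch, not the code in test_read_logs.py: the helper names `p95` and `check_latency` are hypothetical, and the nearest-rank percentile definition is one common choice among several.

```python
import statistics
from typing import Sequence


def p95(samples_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of the samples."""
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) as a 1-based rank
    return ordered[rank - 1]


def check_latency(
    samples_ms: Sequence[float],
    mean_limit: float = 100.0,
    p95_limit: float = 150.0,
) -> None:
    """Assert both limits, reporting the measured value on failure."""
    mean_ms = statistics.mean(samples_ms)
    p95_ms = p95(samples_ms)
    assert mean_ms < mean_limit, f"mean {mean_ms:.1f}ms >= {mean_limit}ms"
    assert p95_ms < p95_limit, f"p95 {p95_ms:.1f}ms >= {p95_limit}ms"
```

A run of eighteen 40 ms samples and two 300 ms samples has a mean of 66 ms but a nearest-rank p95 of 300 ms, so it passes the mean check and fails the p95 check. That is the kind of tail spike the second assertion exists to catch.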


Summary: This is a documentation-only fix but improves test maintainability.

datradito added a commit to datradito/echidna-mcp that referenced this pull request Feb 6, 2026
- Fix assertion logic bug: split compound or into separate assertions
  (test_integration_workflows.py)
- Replace brittle time.sleep(12) with polling helper function
  (test_command_logging.py)
- Standardize tool names to canonical server names:
  inject_transaction -> inject_fuzz_transactions
  clear_priorities -> clear_fuzz_priorities
  (test_integration_workflows.py, mcp_client_wrapper.py, test_schemas.py,
   conftest.py)
- Fix silent error swallowing in _call_tool: raise RuntimeError on
  missing result key (mcp_client_wrapper.py)
- Change pinned package versions to compatible ranges (requirements.txt)
- Improve docstring clarity for performance test (test_read_logs.py)
- Add parameterized MCP_TIMEOUT to conftest.py fixtures
datradito added a commit to datradito/echidna-mcp that referenced this pull request Feb 6, 2026
Resolved 33 merge conflicts by taking changes from 001-mcp-agent-commands.
This includes the fresh PR crytic#1509 Copilot review fixes:
- Fix assertion logic bugs
- Replace brittle time.sleep with polling
- Standardize MCP tool names
- Fix error handling in MCP client wrapper
- Update package version ranges
- Improve test documentation
@datradito
Author

I like the direction of this PR. Here are a couple of comments:

  • Why exactly do we need to log all the commands to a file in the MCP?
  • Start addressing all the Copilot comments, discarding with a comment the ones that make no sense.
  • You added some basic testing code for the MCP using Python, which is great, but we need to make sure it is used by our CI. Include a new CI test for that (which should run in parallel). If you have questions about that, @elopez is our GitHub Actions expert.
  • The added documentation is very useful. We usually keep the documentation for tools like Echidna in building-secure-contracts, but we usually update it when it is close to release. However, don't remove the documentation; we can keep it until we are more confident about how to use these agentic capabilities, and move it into building-secure-contracts when we are close to release.

Hi @gustavo-grieco, RE: Command Logging (mcp-commands.jsonl)

The command log provides deterministic replay and agent behavior debugging for MCP-controlled campaigns:

1. Reproducibility (Immediate Value)

When an agent discovers a bug after thousands of injections, the log provides:

  • Exact command sequence to replay the campaign
  • Timestamps showing agent decision timing
  • Parameters for each control operation

Example: Agent finds vulnerability after injecting 50 transaction patterns. Without the log, reproducing this requires re-running the entire agent logic. With it:

# Replay exact sequence from log (one JSON command per line)
while IFS= read -r cmd; do
  curl -X POST -H 'Content-Type: application/json' http://localhost:8080/mcp -d "$cmd"
done < corpus/mcp-commands.jsonl
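
The same replay could be driven from Python, which makes it easier to add pauses or stop on the first failure. The sketch below assumes, as the shell loop does, that each log line is a complete JSON object the server accepts as a request body; the log schema is not shown in this thread, and the names `load_commands` and `replay` are hypothetical. The sender is passed in as a callable so the logic can be tested without a running server.

```python
import json
from typing import Any, Callable, Dict, Iterable, List


def load_commands(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSONL text lines into command dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def replay(
    commands: Iterable[Dict[str, Any]],
    send: Callable[[Dict[str, Any]], Any],
) -> int:
    """Send each command in logged order; return how many were sent."""
    count = 0
    for command in commands:
        send(command)  # for example, an HTTP POST to the MCP endpoint
        count += 1
    return count
```

In practice `lines` would be the open file handle for corpus/mcp-commands.jsonl and `send` a function that posts the dict to the server; both are left to the caller here.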

@gustavo-grieco
Collaborator

@datradito can you please trigger another copilot review?


Copilot AI left a comment

Pull request overview

Copilot reviewed 48 out of 50 changed files in this pull request and generated 47 comments.


@datradito datradito force-pushed the mcp-agent-pr-clean branch 3 times, most recently from 61eb194 to 0c7f5f4 Compare March 18, 2026 11:27
@datradito datradito force-pushed the mcp-agent-pr-clean branch from 17f72e5 to f75931d Compare March 18, 2026 11:50
@datradito datradito force-pushed the mcp-agent-pr-clean branch from f48b295 to e0aba6f Compare March 18, 2026 14:36
- examples/mcp_agent.py: LangGraph agent that observes coverage,
  injects targeted transactions, and resets priorities when stagnating
- examples/README.md: brief usage docs (start command, tool table,
  transaction format)
- tests/mcp/test_mcp.py: four integration tests covering the core
  workflow: status, inject_fuzz_transactions, show_coverage,
  clear_fuzz_priorities
- tests/mcp/contracts/: EchidnaMCPTest + SimpleToken for test campaigns
- .github/workflows/mcp-tests.yml: CI job that builds echidna via Nix
  and runs pytest tests/mcp/test_mcp.py on every push/PR
@datradito datradito force-pushed the mcp-agent-pr-clean branch from e0aba6f to 7066a9a Compare March 18, 2026 14:40
Adds pinned versions for langchain, langgraph, pytest-asyncio, jsonschema,
and python-dotenv so developers can install a complete local environment
for the MCP agent example. CI is unaffected (it installs pytest and httpx
directly).
@datradito datradito changed the title from "docs: Add comprehensive MCP agent integration guide and examples" to "feat(mcp): add Python agent example, integration tests, and CI" Mar 18, 2026
4 participants