feat(mcp): add Python agent example, integration tests, and CI #1509
datradito wants to merge 2 commits into crytic:dev-agents
Conversation
Pull request overview
This PR adds comprehensive documentation, examples, and testing infrastructure for the MCP (Model Context Protocol) server integration with Echidna. It enables AI agents to interact with Echidna's fuzzing campaigns through a standardized protocol, providing both simple and advanced (LLM-powered) agent examples for monitoring and guiding fuzzing operations.
Key Changes:
- Complete test suite with pytest fixtures, schema validation, and integration tests covering all MCP tools
- Two agent examples: a simple autonomous monitoring agent and an advanced LangGraph-based AI agent using Claude
- Comprehensive documentation including setup guides, troubleshooting, and VS Code integration instructions
Reviewed changes
Copilot reviewed 28 out of 29 changed files in this pull request and generated 32 comments.
| File | Description |
|---|---|
| tests/mcp/test_*.py | Comprehensive pytest test suite covering all 9 MCP tools with schema validation, performance tests, and integration workflows |
| tests/mcp/conftest.py | pytest fixtures providing MCP client and Echidna campaign management for testing |
| tests/mcp/scripts/*.py | Helper utilities for MCP client wrapper and JSON schema validation |
| tests/mcp/contracts/*.sol | Test Solidity contracts (SimpleToken, EchidnaMCPTest) for validating MCP functionality |
| tests/mcp/contracts/*.json | JSON schemas defining expected response structures for each MCP tool |
| tests/mcp/requirements.txt | Python dependencies for testing framework and LangChain integration |
| examples/simple_agent.py | Autonomous monitoring agent with coverage stagnation detection and transaction injection |
| examples/langgraph_agent.py | Advanced LLM-powered agent using LangGraph for intelligent fuzzing strategy |
| examples/README.md | Documentation for agent examples with usage instructions and strategy ideas |
| AGENT_TESTING_GUIDE.md | Comprehensive guide covering server setup, manual testing, agent integration, and troubleshooting |
| test-mcp-client.py | Command-line test script validating all 7 core MCP tools |
| .gitignore | Added patterns for Python cache files and temporary test artifacts |
gustavo-grieco left a comment
I like the direction of this PR; here are a couple of comments:
- Why exactly do we need to log all the commands to a file in the MCP?
- Start addressing all the Copilot comments, discarding with a comment the ones that make no sense.
- You added some basic testing code for the MCP using Python, which is great, but we need to make sure it is used by our CI. Include a new CI test for that (which should run in parallel). If you have questions about that, @elopez is our GitHub Actions expert.
- The added documentation is very useful. We usually keep documentation for tools like Echidna in building-secure-contracts, but we usually update it when a release is close. Don't remove the documentation, though: we can keep it here until we are more confident in how to use these agentic capabilities, and move it into building-secure-contracts when we are close to release.
Response to Review Comment: Method Naming Inconsistency

Location:
Concern: Method name
Response: Good catch! However, this is intentional for the test wrapper API design: it follows the Python convention where a single-underscore prefix indicates "internal use". The public method name is deliberately generic since it's a test utility, not production code.

Status: Working as intended. The naming differentiates the public API from the private implementation. See pr-1509-copilot-responses.md for similar naming-convention rationale (C005).
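As a rough sketch of the convention being described (the class and method names here are illustrative stand-ins, not the wrapper's actual API), a generic public method delegates to an underscore-prefixed internal one:

```python
class MCPClient:
    """Illustrative test-wrapper shape; names and transport are hypothetical."""

    def call(self, tool, params):
        # Public, deliberately generic entry point used by tests.
        return self._call_tool(tool, params)

    def _call_tool(self, tool, params):
        # Internal implementation detail (single underscore = "internal use").
        # A real wrapper would perform the MCP request here; this stub just
        # echoes its inputs so the delegation is visible.
        return {"tool": tool, "params": params}
```

Renaming `_call_tool` would not affect test code, which only sees `call`.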
Response to Review Comment: Silenced Exception Handling

Location:
Concern: Tests silently pass when tools fail with "Tool might not be fully implemented yet" comments.
Response: This is a valid concern, but the approach is intentional for this specific test's purpose.

Why This Pattern Exists:

Current Approach:

```python
try:
    mcp_client.call("status", {})
except Exception:
    pass  # Test focus is logging, not tool correctness
```

Why Not

Trade-off: This reduces test coupling but may hide issues. However, the test suite's separation of concerns (logging tests vs. functional tests) provides better localization of failures.

Status: Keeping the current approach. Tool correctness is validated in six other test files with strict assertions.
Response to Review Comment: Assertion Logic Issue

Location:
Concern: Assertion
Response: Excellent catch! This is a logic bug. The assertion should ensure valid responses.

Current (broken):

```python
assert "result" in result or "error" not in result
# ❌ Passes if response = {} (neither key present)
```

Should be:

```python
assert "result" in result and "error" not in result
# ✅ Requires 'result' key AND no 'error' key
```

Or, more explicitly:

```python
assert "result" in result, f"Expected 'result' key in {result}"
assert "error" not in result, f"Unexpected error: {result.get('error')}"
```

Action: Will fix this in the next commit. Thank you for catching this!
Response to Review Comment: Brittle Timing Test

Location:
Concern: Test uses
Response: Valid concern! The hardcoded sleep dependency is fragile.

Current approach:

```python
time.sleep(12)  # Wait for log flush (10s interval + buffer)
```

Problems:

Better approach (will implement):

```python
def wait_for_log_entries(log_path, expected_count, timeout=15):
    """Poll the log file every 500ms until enough entries appear or the timeout expires."""
    start = time.time()
    while time.time() - start < timeout:
        if log_path.exists():
            with open(log_path) as f:
                if len(f.readlines()) >= expected_count:
                    return True
        time.sleep(0.5)  # Poll every 500ms
    return False

# In test:
assert wait_for_log_entries(log_file, expected_count=2), "Log flush timeout"
```

Benefits:

Action: Will implement polling with timeout in the next commit.
Response to Review Comment: Tool Name Inconsistency

Location:
Concern: Tool names are inconsistent across the codebase:
Response: Critical finding! This reveals naming drift between the documentation and the implementation.

Investigation Needed:
Hypothesis:
Action Plan:
Next Steps: Will audit all tool names and standardize them in the next commit. This could explain some test failures. Thank you for spotting this!
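Based on the canonical names later adopted in this PR (`inject_transaction` becoming `inject_fuzz_transactions`, and `clear_priorities` becoming `clear_fuzz_priorities`), one way to guard against naming drift in the test wrapper is a small normalization map. This is a sketch; the helper and constant names are invented for illustration:

```python
# Hypothetical helper: maps deprecated tool names found in older tests/docs
# to the canonical names exposed by the MCP server.
CANONICAL_TOOL_NAMES = {
    "inject_transaction": "inject_fuzz_transactions",
    "clear_priorities": "clear_fuzz_priorities",
}

def canonical_tool_name(name: str) -> str:
    """Return the server's canonical name for a possibly-deprecated tool name."""
    return CANONICAL_TOOL_NAMES.get(name, name)
```

A wrapper could route every call through such a helper (or, more strictly, fail loudly on deprecated names) so tests and documentation cannot silently diverge from the server.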
Response to Review Comment: Error Handling Returns Empty Dict

Location:
Concern: Method returns
Response: Great point! This violates explicit error-handling principles.

Current (problematic):

```python
except Exception as e:
    print(f"Error: {e}")
    return {}  # ❌ Silent failure
```

Should be:

```python
try:
    result = response.json()
except Exception as e:
    raise RuntimeError(f"Invalid MCP response: {e}") from e
if "error" in result:
    raise RuntimeError(f"MCP error: {result['error']}")
if "result" not in result:
    raise RuntimeError("Invalid MCP response: missing 'result' field")
return result["result"]
```

Benefits:

Alternative (if some callers need graceful degradation):

```python
def call_tool(self, name, params, strict=True):
    # ...existing code...
    if not strict:
        return {}  # Graceful fallback for non-strict callers
    raise RuntimeError(...)  # Strict mode (default)
```

Action: Will implement strict error handling in the next commit. This is a test utility, so failures should be loud.
Response to Review Comment: Outdated Package Versions

Location:
Concern: LangChain packages pinned to versions from Dec 2024, now over a year old (as of Jan 2026).
Response: Valid security concern. Pinned versions have trade-offs.

Current approach (pinned): Pros: reproducible builds, no surprise breakage.
Recommended approach (version ranges): Pros: get patch updates (0.3.x), avoid breaking changes (0.4.x).
Even better (lock file):
Decision: Will update to version ranges (
Note: These are test/example dependencies only, not production runtime requirements.
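As an illustration of the version-range approach, the requirements file could look roughly like the following. The package list matches the one described elsewhere in this PR (langchain, langgraph, pytest-asyncio, jsonschema, python-dotenv), but the specific lower and upper bounds shown here are assumptions, not the PR's actual pins:

```text
# tests/mcp/requirements.txt -- sketch with hypothetical bounds
langchain>=0.3,<0.4        # pick up 0.3.x patch releases, block 0.4.x breaking changes
langgraph>=0.2,<0.3
pytest-asyncio>=0.24,<1.0
jsonschema>=4.0,<5.0
python-dotenv>=1.0,<2.0
```

Pairing ranges like these with a generated lock file (e.g. via pip-tools) gives both reproducible CI builds and an easy upgrade path.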
Response to Review Comment: Docstring Clarity

Location:
Concern: Docstring says "p95 < 150ms" but the test also checks "mean < 100ms". The docstring should mention both.
Response: Good documentation improvement! The docstring is incomplete.

Current:

```python
def test_read_logs_performance(mcp_client):
    """Test that read_logs responds within p95 < 150ms."""
    # ...
    assert mean_time < 100  # Undocumented requirement
    assert p95_time < 150   # Documented requirement
```

Should be:

```python
def test_read_logs_performance(mcp_client):
    """Test that read_logs responds within 100ms mean and 150ms p95.

    Performance requirements (FR-015):
    - Mean latency: <100ms (typical case)
    - P95 latency: <150ms (worst case for 95% of requests)
    """
```

Why Both Metrics:

Action: Will clarify docstrings for all performance tests in the next commit. Summary: This is a documentation-only fix, but it improves test maintainability.
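For reference, the two metrics the test asserts can be computed from a list of per-request latency samples as follows. This is a standalone sketch (nearest-rank p95), not code from the PR:

```python
import math
import statistics

def latency_summary(samples_ms):
    """Return (mean, p95) for latency samples in milliseconds.

    p95 uses the nearest-rank method: the value at rank ceil(0.95 * n)
    in the sorted samples.
    """
    ordered = sorted(samples_ms)
    mean = statistics.mean(ordered)
    rank = max(1, math.ceil(0.95 * len(ordered)))  # 1-based nearest rank
    p95 = ordered[rank - 1]
    return mean, p95

# For samples 10, 20, ..., 200 this gives mean 105.0 and p95 190.0:
# the mean tracks typical latency, while p95 bounds tail latency.
```

Mean catches a general slowdown even when the tail stays fine; p95 catches occasional slow outliers that a mean would average away, which is why the test asserts both.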
- Fix assertion logic bug: split compound `or` into separate assertions (test_integration_workflows.py)
- Replace brittle `time.sleep(12)` with a polling helper function (test_command_logging.py)
- Standardize tool names to canonical server names: `inject_transaction` -> `inject_fuzz_transactions`, `clear_priorities` -> `clear_fuzz_priorities` (test_integration_workflows.py, mcp_client_wrapper.py, test_schemas.py, conftest.py)
- Fix silent error swallowing in `_call_tool`: raise RuntimeError on missing `result` key (mcp_client_wrapper.py)
- Change pinned package versions to compatible ranges (requirements.txt)
- Improve docstring clarity for performance test (test_read_logs.py)
- Add parameterized MCP_TIMEOUT to conftest.py fixtures
Resolved 33 merge conflicts by taking changes from 001-mcp-agent-commands. This includes the fresh PR crytic#1509 Copilot review fixes:
- Fix assertion logic bugs
- Replace brittle time.sleep with polling
- Standardize MCP tool names
- Fix error handling in MCP client wrapper
- Update package version ranges
- Improve test documentation
Hi @gustavo-grieco,

RE: Command Logging ( The command log provides deterministic replay and agent-behavior debugging for MCP-controlled campaigns.

1. Reproducibility (Immediate Value): When an agent discovers a bug after 1000s of injections, the log provides:

Example: An agent finds a vulnerability after injecting 50 transaction patterns. Without the log, reproducing this requires re-running the entire agent logic. With it:

```shell
# Replay the exact sequence from the log
cat corpus/mcp-commands.jsonl | while read cmd; do
  curl -X POST http://localhost:8080/mcp -d "$cmd"
done
```
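For longer logs, the same replay can be scripted in Python with only the standard library. This is a sketch assuming the JSONL log and endpoint shown in the shell loop; `load_commands` and `replay_commands` are names invented for illustration:

```python
import json
import urllib.request

def load_commands(log_path):
    """Parse a JSONL command log: one JSON command object per non-blank line."""
    with open(log_path) as f:
        return [json.loads(line) for line in f if line.strip()]

def replay_commands(log_path, endpoint="http://localhost:8080/mcp"):
    """POST each logged command to the MCP endpoint, preserving order."""
    for cmd in load_commands(log_path):
        req = urllib.request.Request(
            endpoint,
            data=json.dumps(cmd).encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req)  # raises on HTTP errors, keeping replay loud
```

Unlike the shell loop, parsing each line first catches a truncated or corrupted log entry before anything is sent to the server.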
@datradito can you please trigger another Copilot review?
Pull request overview
Copilot reviewed 48 out of 50 changed files in this pull request and generated 47 comments.
Force-pushed from 61eb194 to 0c7f5f4
Force-pushed from 17f72e5 to f75931d
Force-pushed from f48b295 to e0aba6f
- examples/mcp_agent.py: LangGraph agent that observes coverage, injects targeted transactions, and resets priorities when stagnating
- examples/README.md: brief usage docs (start command, tool table, transaction format)
- tests/mcp/test_mcp.py: four integration tests covering the core workflow: status, inject_fuzz_transactions, show_coverage, clear_fuzz_priorities
- tests/mcp/contracts/: EchidnaMCPTest + SimpleToken for test campaigns
- .github/workflows/mcp-tests.yml: CI job that builds echidna via Nix and runs pytest tests/mcp/test_mcp.py on every push/PR
Force-pushed from e0aba6f to 7066a9a
Adds pinned versions for langchain, langgraph, pytest-asyncio, jsonschema, and python-dotenv so developers can install a complete local environment for the MCP agent example. CI is unaffected (it installs pytest and httpx directly).
Adds a minimal working example and test coverage for the MCP server
introduced in #1502.
What's included
- `examples/mcp_agent.py`: A LangGraph-based autonomous agent that connects to a live Echidna campaign over MCP. Each iteration it:
  - calls `status` to read coverage and corpus size
  - and injects them via `inject_fuzz_transactions`
  - calls `clear_fuzz_priorities` to avoid getting stuck
- `examples/README.md`: Setup, usage, and a reference table of all available MCP tools.
- `tests/mcp/test_mcp.py`: pytest integration tests that spin up a real Echidna process and exercise the four core MCP operations: inject transactions, check coverage, reset priorities, check status.
- `tests/mcp/contracts/`: `EchidnaMCPTest.sol` and `SimpleToken.sol`, the contracts used by the test suite.
- `.github/workflows/mcp-tests.yml`: CI job that builds Echidna via Nix and runs the integration tests on every push and PR.
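The agent's per-iteration decision logic described above ("injects targeted transactions, resets priorities when stagnating") can be sketched roughly as follows. This is an assumption-laden illustration, not code copied from `examples/mcp_agent.py`; the function name and the stagnation threshold are invented:

```python
def next_action(prev_coverage, coverage, stagnant_iters, max_stagnant=3):
    """Decide the agent's next MCP call from observed coverage.

    Returns a (action, stagnant_iters) pair, where action is:
      - "clear":  coverage has stalled for max_stagnant iterations, so the
                  agent should call clear_fuzz_priorities to get unstuck
      - "inject": keep injecting targeted transactions via
                  inject_fuzz_transactions
    """
    if coverage <= prev_coverage:
        stagnant_iters += 1  # no new coverage this iteration
    else:
        stagnant_iters = 0   # progress made; reset the stagnation counter
    if stagnant_iters >= max_stagnant:
        return "clear", 0    # reset priorities and start counting again
    return "inject", stagnant_iters
```

A real agent would feed the `status` tool's coverage number into this function each loop and dispatch the returned action as the corresponding MCP tool call.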
How to run locally
How to run the agent
Notes