[E09] Web page reference fetcher service #368

@IanMayo

Epic

Part of E09: Provenance Graph

Problem

When an analyst captures a reference value from a web page (URL + XPath/CSS selector), Debrief needs to be able to re-fetch that value later and determine whether it has changed. There is no service to perform this fetch-and-compare operation.

Proposed Solution

A Python service that:

  1. Accepts a reference source record (URL + selector)
  2. Fetches the web page
  3. Applies the XPath/CSS selector to extract the current value
  4. Compares the extracted value against the reference value recorded at capture
  5. Returns the appropriate currency state:
    • current — fetched value matches captured value
    • changed — fetched value differs
    • unavailable — source could not be reached
    • source-structure-changed — source fetched but selector returned no match
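
The steps above could be sketched roughly as follows. This is only an illustration of the decision logic, not a proposed implementation: `WebPageReference`, `check_currency`, and the injected `fetch` / `extract` callables are hypothetical names, the real record shape comes from #145, and trimming whitespace before comparing is an assumption. Passing the fetcher and extractor in as callables keeps the state logic testable without network access or an HTML parser.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class CurrencyState(str, Enum):
    """The four currency states listed in this issue."""
    CURRENT = "current"
    CHANGED = "changed"
    UNAVAILABLE = "unavailable"
    SOURCE_STRUCTURE_CHANGED = "source-structure-changed"


@dataclass(frozen=True)
class WebPageReference:
    # Hypothetical record shape; the real schema is defined in #145.
    url: str
    selector: str        # XPath or CSS selector captured by the analyst
    captured_value: str  # value recorded at capture time


def check_currency(
    ref: WebPageReference,
    fetch: Callable[[str], str],
    extract: Callable[[str, str], Optional[str]],
) -> CurrencyState:
    """Fetch the page, apply the selector, and compare with the captured value.

    `fetch(url)` returns the page text and is assumed to raise OSError on
    network failure (urllib's URLError is an OSError subclass).
    `extract(page, selector)` returns the matched text, or None on no match.
    """
    try:
        page = fetch(ref.url)
    except OSError:
        return CurrencyState.UNAVAILABLE

    value = extract(page, ref.selector)
    if value is None:
        return CurrencyState.SOURCE_STRUCTURE_CHANGED

    # Assumption: surrounding whitespace is not significant.
    if value.strip() == ref.captured_value.strip():
        return CurrencyState.CURRENT
    return CurrencyState.CHANGED
```

In a real service `fetch` would wrap an HTTP client and `extract` an HTML parser with XPath/CSS support; the choice of libraries is left open here.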

Web pages are the first source type to be implemented; the service interface should be extensible so that REST API, local file, and cross-session sources can be added later.
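
One way to keep the interface open to later source types is a small protocol plus a registry keyed by source type. The sketch below is an assumption about shape only: `SourceFetcher`, `register_fetcher`, `get_fetcher`, and the `"local-file"` key are hypothetical names, and the toy `LocalFileFetcher` (which treats the selector as a line number) exists purely to show a second source type plugging in.

```python
from typing import Dict, Optional, Protocol


class SourceFetcher(Protocol):
    """Hypothetical interface: one implementation per source type."""

    def fetch_value(self, location: str, selector: str) -> Optional[str]:
        """Return the extracted value, or None if the selector matched nothing.

        Implementations are assumed to raise OSError if the source is unreachable.
        """
        ...


class LocalFileFetcher:
    """Toy second source type: the selector is a 1-based line number."""

    def fetch_value(self, location: str, selector: str) -> Optional[str]:
        with open(location, encoding="utf-8") as handle:
            lines = handle.read().splitlines()
        index = int(selector) - 1
        return lines[index] if 0 <= index < len(lines) else None


_FETCHERS: Dict[str, SourceFetcher] = {}


def register_fetcher(source_type: str, fetcher: SourceFetcher) -> None:
    """Associate a source type name with the fetcher that handles it."""
    _FETCHERS[source_type] = fetcher


def get_fetcher(source_type: str) -> SourceFetcher:
    """Look up the fetcher for a source type, failing loudly if none exists."""
    try:
        return _FETCHERS[source_type]
    except KeyError:
        raise ValueError(
            f"no fetcher registered for source type {source_type!r}"
        ) from None


register_fetcher("local-file", LocalFileFetcher())
```

With this shape, the comparison logic only ever sees "a value, `None`, or an `OSError`", so adding a REST API or cross-session fetcher would not need changes to the currency-state code.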

Success Criteria

  • Service correctly determines all 4 currency states for web page sources
  • Handles network errors gracefully (returns unavailable)
  • Handles DOM restructuring (returns source-structure-changed)
  • Unit tests cover all currency state transitions
  • Exposed via MCP following existing service patterns

Dependencies

Requires #145 (reference data source schema)

Complexity

Medium
