Solas Workers

Cloudflare Workers for the Solas data collection pipeline.

Workers

| Worker | Purpose | Input | Output |
| --- | --- | --- | --- |
| scraper | Extract clean text from a URL | `{ url, maxChars? }` | `{ url, title, text, charCount }` |
| crawler | BFS crawl a domain | `{ baseUrl, maxPages?, maxDepth? }` | `{ pages: [...], totalFound, totalCrawled }` |
| search-aggregator | Multi-source search | `{ query, count? }` | `{ results: [...], totalSources }` |
| extractor | Regex-based structured extraction | `{ content, type?, html? }` | `{ type, data: {...} }` |
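
The input and output shapes in the table can be written down as TypeScript types, which makes it easier to validate request bodies before doing any work. The following is only a sketch: the field names come from the table, but the primitive types (`string`, `number`) are assumptions, the undocumented array and object contents are left as `unknown`, and `isScraperRequest` is a hypothetical helper that is not part of the workers.

```typescript
// Shapes transcribed from the table. Primitive types are assumed, not confirmed.
interface ScraperRequest { url: string; maxChars?: number }
interface ScraperResponse { url: string; title: string; text: string; charCount: number }

interface CrawlerRequest { baseUrl: string; maxPages?: number; maxDepth?: number }
// The element shape of `pages` is not documented, so it stays unknown.
interface CrawlerResponse { pages: unknown[]; totalFound: number; totalCrawled: number }

interface SearchRequest { query: string; count?: number }
// The element shape of `results` is not documented, so it stays unknown.
interface SearchResponse { results: unknown[]; totalSources: number }

// The types of `type` and `html` are not documented; `html` is left unknown.
interface ExtractorRequest { content: string; type?: string; html?: unknown }
interface ExtractorResponse { type: string; data: Record<string, unknown> }

// Hypothetical runtime guard for the scraper input.
function isScraperRequest(body: unknown): body is ScraperRequest {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  return (
    typeof b.url === "string" &&
    (b.maxChars === undefined || typeof b.maxChars === "number")
  );
}
```

A handler could call `isScraperRequest(await request.json())` and return a 400 response when the guard fails.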

Deploy

cd <worker-name>
npm install
npx wrangler deploy

Prerequisites

Cloudflare account with Workers enabled
CLOUDFLARE_API_TOKEN set in environment

KV namespaces created for crawler and search-aggregator:

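(cd crawler && npx wrangler kv:namespace create CRAWL_RESULTS)
(cd search-aggregator && npx wrangler kv:namespace create SEARCH_CACHE)

Run both commands from the repository root. Recent Wrangler releases spell the subcommand `kv namespace` (with a space); use that form if `kv:namespace` is rejected.

Update each worker's wrangler.jsonc with the KV namespace ID returned by its command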

Architecture

User → Worker → External Source → Extract → Return
                    ↓
              KV Cache (optional)

All workers are written in TypeScript, are self-contained, and have minimal dependencies. They use native Cloudflare APIs (HTMLRewriter, fetch, KV) for performance and cost efficiency.
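
The optional KV cache step in the diagram is a cache-aside pattern: look the key up first, fall back to the external source on a miss, then store the result with a TTL. Below is a minimal sketch of that flow, not the workers' actual code. `KVLike` is a hand-written subset of the Workers KV binding (`get` and `put` with `expirationTtl`), and `withCache` and `MemoryKV` are hypothetical names introduced here for illustration and local testing.

```typescript
// Subset of the Workers KV binding used by this sketch.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

// Cache-aside helper: returns the cached value if present, otherwise calls
// `load`, stores the result as JSON with the given TTL, and returns it.
// Passing `undefined` for `kv` skips caching, mirroring the "optional" cache.
async function withCache<T>(
  kv: KVLike | undefined,
  key: string,
  ttlSeconds: number,
  load: () => Promise<T>,
): Promise<T> {
  if (kv) {
    const hit = await kv.get(key);
    if (hit !== null) return JSON.parse(hit) as T;
  }
  const fresh = await load();
  if (kv) {
    await kv.put(key, JSON.stringify(fresh), { expirationTtl: ttlSeconds });
  }
  return fresh;
}

// In-memory stand-in for KV, for local experiments only. It ignores the TTL.
class MemoryKV implements KVLike {
  private store = new Map<string, string>();
  async get(key: string): Promise<string | null> {
    return this.store.get(key) ?? null;
  }
  async put(key: string, value: string): Promise<void> {
    this.store.set(key, value);
  }
}
```

With a real binding, a call such as `withCache(env.SEARCH_CACHE, query, 3600, runSearch)` would correspond to the one-hour search cache described under Notes, assuming the binding is exposed as `env.SEARCH_CACHE` and `runSearch` is whatever function performs the upstream search.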

Notes

Scraper uses HTMLRewriter for streaming extraction (no full DOM parse)
Crawler respects robots.txt and rate-limits at 1 req/sec
Search aggregator caches results in KV for 1 hour
Extractor uses regex patterns and has no LLM dependency (a future version will add LLM extraction)
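
To make the crawler note concrete, here is a hedged sketch of a breadth-first crawl bounded by `maxPages` and `maxDepth`, restricted to the starting origin, with a fixed delay between requests. It is an illustration under stated assumptions, not the crawler's implementation: `bfsCrawl`, `LinkFetcher`, `resolveLink`, and `delayMs` are hypothetical names, link extraction is injected as a function, and robots.txt handling is left out entirely.

```typescript
interface CrawlOptions {
  maxPages: number; // stop after this many pages have been fetched
  maxDepth: number; // links found on pages at this depth are not followed
  delayMs: number;  // pause between requests; 1000 would match 1 req/sec
}

// Injected so the traversal can be exercised without network access.
type LinkFetcher = (url: string) => Promise<string[]>;

// Resolves a possibly relative link against its page; null if unparseable.
function resolveLink(link: string, base: string): URL | null {
  try {
    return new URL(link, base);
  } catch {
    return null;
  }
}

async function bfsCrawl(
  baseUrl: string,
  fetchLinks: LinkFetcher,
  opts: CrawlOptions,
): Promise<{ crawled: string[]; discovered: number }> {
  const start = new URL(baseUrl);
  const seen = new Set<string>([start.href]);
  const queue: Array<{ url: string; depth: number }> = [{ url: start.href, depth: 0 }];
  const crawled: string[] = [];

  while (queue.length > 0 && crawled.length < opts.maxPages) {
    const { url, depth } = queue.shift()!;
    // Simple rate limit: wait before every request except the first.
    if (crawled.length > 0 && opts.delayMs > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, opts.delayMs));
    }
    const links = await fetchLinks(url);
    crawled.push(url);
    if (depth >= opts.maxDepth) continue;

    for (const link of links) {
      const next = resolveLink(link, url);
      if (next === null) continue;
      next.hash = ""; // treat #fragment variants as the same page
      if (next.origin !== start.origin || seen.has(next.href)) continue;
      seen.add(next.href);
      queue.push({ url: next.href, depth: depth + 1 });
    }
  }
  return { crawled, discovered: seen.size };
}
```

Because the queue is first-in, first-out, every page at depth `d` is fetched before any page at depth `d + 1`, which is what makes the traversal breadth-first.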
