Vanilla WAA bootstrap automation #10

abrichr · 2026-01-22T00:34:24Z

Summary

- Add vanilla WAA bootstrap and helper scripts that auto-clone and validate the local setup
- Update CLI guidance to point at the vanilla WAA scripts and short-circuit legacy flows
- Archive legacy WAA deploy assets/docs under deprecated/
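The summary says the bootstrap scripts auto-clone and validate the local setup. The PR's own scripts (for example scripts/waa_bootstrap_local.sh, named in a later commit) are not shown here. Below is only a Python sketch of an idempotent "clone if missing" step. `ensure_clone` and its injectable `run` parameter are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def ensure_clone(
    repo_url: str,
    dest: Path,
    run: Callable[[List[str]], object] = lambda argv: subprocess.run(argv, check=True),
) -> bool:
    """Hypothetical helper: clone repo_url into dest unless it is already a git checkout.

    Returns True if a clone was performed, False if an existing checkout was reused.
    """
    if (dest / ".git").is_dir():
        return False  # already cloned, so re-running the bootstrap is a no-op
    if dest.exists() and any(dest.iterdir()):
        # Refuse to clone over a non-empty directory that is not a checkout.
        raise RuntimeError(f"{dest} exists but is not a git checkout")
    run(["git", "clone", repo_url, str(dest)])
    return True
```

Because the function returns early when the checkout already exists, the bootstrap can be re-run safely. The `run` parameter lets tests replace the real `git` call.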

Testing

Phase 1 of viewer consolidation plan:

Foundation Changes:
- Add openadapt-viewer as local file dependency in pyproject.toml
- Create openadapt_ml/training/viewer_components.py adapter module
  * screenshot_with_predictions() - Screenshot with human/AI overlays
  * training_metrics() - Training stats metrics grid
  * playback_controls() - Playback UI controls
  * correctness_badge() - Pass/fail badge component
  * generate_comparison_summary() - Model comparison summary
- Add tests/test_viewer_screenshots.py with component validation tests
- Add openadapt_ml/training/viewer_migration_example.py validation example

Design:
- Zero breaking changes to existing viewer.py code
- Adapter pattern wraps openadapt-viewer with ML-specific context
- Functions accept openadapt-ml data structures
- Can be incrementally adopted in future phases

Next steps (Phase 2):
- Gradually migrate viewer.py to use these adapters
- Replace inline HTML generation with component calls

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
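The adapter module exposes small component functions such as correctness_badge(). The PR does not show openadapt-viewer's API. This sketch of a badge helper therefore renders HTML directly instead of delegating to openadapt-viewer. The signature, the CSS class names, and the plain-string return value are all assumptions.

```python
from html import escape
from typing import Optional


def correctness_badge(passed: bool, label: Optional[str] = None) -> str:
    """Hypothetical pass/fail badge rendered as an HTML snippet."""
    status = "pass" if passed else "fail"
    text = label if label is not None else status.upper()
    # Escape caller-supplied text so a label cannot inject markup into the viewer.
    return f'<span class="badge badge-{status}">{escape(text)}</span>'
```

Returning a string fits the Phase 2 goal of replacing inline HTML generation with component calls. viewer.py could concatenate these snippets exactly where it builds markup today.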

Restored and enhanced the workflow segmentation system from commit dd9a393 with new integration for openadapt-capture format.

## What's Added

### Core Segmentation Pipeline (4 stages):

1. **Stage 1 - Frame Description (VLM)**:
   - Converts screenshots + actions into semantic descriptions
   - Supports Gemini, Claude, GPT-4o backends
   - Automatic caching for efficiency
   - File: openadapt_ml/segmentation/frame_describer.py
2. **Stage 2 - Episode Extraction (LLM)**:
   - Identifies coherent workflow boundaries
   - Few-shot prompting for better quality
   - Confidence-based filtering
   - File: openadapt_ml/segmentation/segment_extractor.py
3. **Stage 3 - Deduplication (Embeddings)**:
   - Finds similar workflows across recordings
   - Agglomerative clustering with cosine similarity
   - Supports OpenAI or local HuggingFace embeddings
   - File: openadapt_ml/segmentation/deduplicator.py
4. **Stage 4 - Annotation (VLM Quality Control)**:
   - Auto-annotates episodes for training data quality
   - Detects failures, boundary issues, incompleteness
   - Human-in-the-loop review workflow
   - File: openadapt_ml/segmentation/annotator.py

### Integration Features:

- **CaptureAdapter**: Loads recordings from openadapt-capture SQLite format
  - File: openadapt_ml/segmentation/adapters/capture_adapter.py
  - Automatically used when capture.db is detected
  - Converts events to segmentation format
- **Unified Pipeline**: Run all stages with single API
  - File: openadapt_ml/segmentation/pipeline.py
  - Automatic intermediate result caching
  - Resume support for interrupted runs
- **CLI Interface**: Full command-line interface for all stages
  - File: openadapt_ml/segmentation/cli.py
  - Commands: describe, extract, deduplicate, annotate, review, export-gold
- **Comprehensive Documentation**:
  - File: openadapt_ml/segmentation/README.md
  - 20+ code examples
  - Complete API reference
  - Integration guide
  - Cost estimates and performance benchmarks

## Use Cases

1. **Training Data Curation**: Extract and filter high-quality demonstration episodes
2. **Demo Retrieval**: Build searchable libraries for demo-conditioned prompting
3. **Workflow Documentation**: Auto-generate step-by-step guides from recordings

## Data Schemas

All schemas use Pydantic for type safety (openadapt_ml/segmentation/schemas.py):
- ActionTranscript: Frame-by-frame semantic descriptions
- Episode: Coherent workflow segment with boundaries
- CanonicalEpisode: Deduplicated workflow definition
- EpisodeAnnotation: Quality assessment for training data

## Example Usage

```python
from openadapt_ml.segmentation import SegmentationPipeline, PipelineConfig

config = PipelineConfig(
    vlm_model="gemini-2.0-flash",
    llm_model="gpt-4o",
    similarity_threshold=0.85,
)
pipeline = SegmentationPipeline(config)
result = pipeline.run(
    recordings=["/path/to/recording1", "/path/to/recording2"],
    output_dir="workflow_library",
)
print(f"Found {result.unique_episodes} unique workflows")
```

## Next Steps

See openadapt_ml/segmentation/README.md for:
- P0: Integration tests with real openadapt-capture recordings
- P0: Visualization generator for segment boundaries
- P1: Improved prompt engineering and cost optimization
- P2: Active learning and multi-modal features

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
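A dependency-free sketch can illustrate the grouping idea behind Stage 3. It applies single-linkage agglomerative clustering cut at a cosine-similarity threshold, using 0.85 from the example config as the default. The PR does not say which linkage deduplicator.py uses, so single linkage is an assumption. The helper names `cosine_similarity` and `cluster_episodes` are also hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat zero vectors as dissimilar


def cluster_episodes(embeddings: List[Sequence[float]], threshold: float = 0.85) -> List[List[int]]:
    """Single-linkage clustering: episodes joined by any chain of pairs with similarity >= threshold."""
    parent = list(range(len(embeddings)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the union-find trees shallow
            i = parent[i]
        return i

    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine_similarity(embeddings[i], embeddings[j]) >= threshold:
                parent[find(j)] = find(i)  # merge the two clusters

    clusters: Dict[int, List[int]] = {}
    for i in range(len(embeddings)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```

Cutting a single-linkage dendrogram at a fixed threshold is equivalent to taking connected components of the "similar enough" graph, which is why union-find suffices here. Average or complete linkage would be stricter about transitive merges.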

Features added:
- Azure ML job tracking: Shows recent jobs from last 7 days with status
- Cost tracking: Real-time uptime, hourly rate, and cost estimation
- VM activity detection: Identifies what the VM is currently doing
- Evaluation history: Past benchmark runs and success rates (--details flag)
- Enhanced UI: Structured dashboard with clear sections and icons

New utility functions in vm_monitor.py:
- fetch_azure_ml_jobs(): Fetch recent Azure ML jobs with filtering
- calculate_vm_costs(): Calculate VM costs with hourly/daily/weekly rates
- get_vm_uptime_hours(): Get VM uptime from Azure activity logs
- detect_vm_activity(): Detect current VM activity (idle, running, setup)
- get_evaluation_history(): Load past evaluation runs from results dir

CLI enhancements:
- Added --details flag for extended information
- Improved output formatting with sections and separators
- Better error handling and status icons
- Preserved existing SSH tunnel and dashboard functionality

Documentation:
- Updated CLAUDE.md with new features and usage examples
- Added detailed docstrings to all new functions

This consolidates VM monitoring into a single enhanced command rather than creating duplicate dashboards, following the viewer consolidation strategy.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
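The commit describes calculate_vm_costs() and get_vm_uptime_hours() but does not show them. Here is a minimal sketch of the arithmetic. It assumes the caller supplies the hourly rate (no Azure pricing is hard-coded) and measures uptime from a known start timestamp. `hours_since` and this `calculate_vm_costs` signature are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def hours_since(started_at: datetime, now: Optional[datetime] = None) -> float:
    """Hypothetical uptime helper: hours elapsed since a timezone-aware start time."""
    now = now or datetime.now(timezone.utc)
    return max((now - started_at).total_seconds() / 3600.0, 0.0)  # clamp clock skew to zero


def calculate_vm_costs(hourly_rate: float, uptime_hours: float) -> Dict[str, float]:
    """Hypothetical cost breakdown from a caller-supplied hourly rate."""
    return {
        "hourly": round(hourly_rate, 2),
        "daily": round(hourly_rate * 24, 2),
        "weekly": round(hourly_rate * 24 * 7, 2),
        "total": round(hourly_rate * uptime_hours, 2),  # cost accrued so far
    }
```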

Update CaptureAdapter to work with actual openadapt-capture database format.

Key changes:
- Use screen.frame events instead of generic event types
- Pair action events (mouse.down + mouse.up → single click)
- Map frame events to screenshots via timestamp matching
- Update event type filtering to match openadapt-capture schema
- Improve frame-to-action association logic

This enables the segmentation pipeline to process real capture recordings from openadapt-capture instead of requiring simulated data.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
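Click pairing and timestamp-based frame matching could look roughly like the sketch below. Several details are assumptions rather than the openadapt-capture schema or the adapter's actual logic: the event dictionaries (`type`, `timestamp`, `x`, `y`), the rule "a click takes the mouse.down position and time", and the rule "use the last frame captured at or before the action". Both function names are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Optional


def pair_clicks(events: List[Dict]) -> List[Dict]:
    """Collapse each mouse.down followed by a mouse.up into one click action."""
    clicks: List[Dict] = []
    pending: Optional[Dict] = None
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        if ev["type"] == "mouse.down":
            pending = ev
        elif ev["type"] == "mouse.up" and pending is not None:
            clicks.append(
                {"type": "click", "timestamp": pending["timestamp"], "x": pending["x"], "y": pending["y"]}
            )
            pending = None  # a stray mouse.up without a preceding down is ignored
    return clicks


def frame_for(timestamp: float, frame_times: List[float]) -> Optional[int]:
    """Index of the last frame at or before `timestamp`; frame_times must be sorted."""
    i = bisect_right(frame_times, timestamp)
    return i - 1 if i > 0 else None
```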

Enhance vm monitor command to provide complete VM usage tracking:
- Real-time VM status (size, IP, power state)
- Activity detection (idle, benchmark running, setup)
- Cost tracking (uptime hours, hourly rate, total cost)
- Azure ML jobs list (last 7 days with status)
- Evaluation history (with --details flag)
- Mock mode for testing without VM (--mock flag)

Add new API endpoints to local.py dashboard server:
- /api/benchmark/status - current job status with ETA
- /api/benchmark/costs - cost breakdown (Azure VM, API, GPU)
- /api/benchmark/metrics - performance metrics by domain
- /api/benchmark/workers - worker status and utilization
- /api/benchmark/runs - list all benchmark runs
- /api/benchmark/tasks/{run}/{task} - task execution details

Update README with VM monitor section including screenshots and usage examples.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
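Most of the new dashboard endpoints are fixed paths, but /api/benchmark/tasks/{run}/{task} takes parameters. Here is a standard-library sketch of matching request paths to these routes. The route table and `match_route` are hypothetical, since the PR does not show how local.py actually routes requests.

```python
import re
from typing import Dict, Optional, Tuple
from urllib.parse import unquote, urlsplit

# Hypothetical route table: fixed endpoints plus the parameterized task route.
_ROUTES = [
    (name, re.compile(rf"^/api/benchmark/{name}$"))
    for name in ("status", "costs", "metrics", "workers", "runs")
] + [("task", re.compile(r"^/api/benchmark/tasks/(?P<run>[^/]+)/(?P<task>[^/]+)$"))]


def match_route(raw_path: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (route name, path params) for a request path, or None if nothing matches."""
    path = urlsplit(raw_path).path  # drop any ?query string
    for name, pattern in _ROUTES:
        m = pattern.match(path)
        if m:
            return name, {k: unquote(v) for k, v in m.groupdict().items()}
    return None
```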

Add comprehensive test plan and results for workflow segmentation pipeline:
- Test plan with 8 stages from environment setup to documentation
- Test results documenting real capture processing outcomes
- Test files for CaptureAdapter and segmentation pipeline

Add VM monitor screenshot generation scripts and documentation:
- Scripts for automated dashboard screenshot generation
- Implementation plan for VM monitor screenshot feature
- Analysis of screenshot capture approaches

Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Archive OpenAdapter (incomplete pre-refactor cloud deployment POC)
- Document key takeaways and lessons learned
- Reference modern cloud infrastructure in openadapt-ml
- Add guidelines for when to archive repositories

OpenAdapter was an incomplete proof-of-concept from October 2024 with only 165 lines of code and no ecosystem usage. Cloud deployment is now production-ready in openadapt_ml/cloud/ and benchmarks/azure.py.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Add search bar to viewer controls with Ctrl+F / Cmd+F keyboard shortcut
- Implement advanced token-based search across step indices, action types, and text
- Search filters step list in real-time with result count display
- Clear button and Escape key support for resetting search
- Consistent UI styling with existing viewer components
- Integrates with existing step list filtering

Co-Authored-By: Claude Opus 4.5 <[email protected]>
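The token-based search can be illustrated in Python, although the real implementation presumably lives in the viewer's browser code. This sketch assumes AND semantics (every whitespace-separated token must appear) and case-insensitive substring matching over the step index, action type, and text. The step dictionary layout and `search_steps` are hypothetical.

```python
from typing import Dict, List


def search_steps(steps: List[Dict], query: str) -> List[int]:
    """Return indices of steps whose index, action type, or text contain every query token."""
    tokens = query.lower().split()
    if not tokens:
        # An empty query (e.g. after Escape or the clear button) shows every step.
        return [step["index"] for step in steps]
    matches: List[int] = []
    for step in steps:
        haystack = " ".join(
            [str(step["index"]), step.get("action_type", ""), step.get("text", "")]
        ).lower()
        if all(token in haystack for token in tokens):
            matches.append(step["index"])
    return matches
```

Substring matching on the index means a query of `1` also matches step 12. A stricter viewer might treat purely numeric tokens as exact index matches.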

- Remove non-existent openadapt_ml.shared_ui import from viewer.py
- Skip anthropic test when anthropic package not installed (optional dependency)
- Skip viewer_components test when openadapt-viewer not installed (optional dependency)

All tests now pass (334 passed, 6 skipped).

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
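Tests for optional dependencies can be skipped with the standard library alone. This sketch assumes unittest-style tests. With pytest, calling `pytest.importorskip("anthropic")` at module level has the same effect. The test class name is hypothetical, and the same pattern would apply to openadapt-viewer, whose import name is not given here.

```python
import importlib.util
import unittest

# find_spec returns None (without importing anything) when the package is absent.
HAS_ANTHROPIC = importlib.util.find_spec("anthropic") is not None


class AnthropicAdapterTest(unittest.TestCase):  # hypothetical test case
    @unittest.skipUnless(HAS_ANTHROPIC, "anthropic is an optional dependency")
    def test_client_importable(self) -> None:
        import anthropic  # only reached when the package is installed

        self.assertIsNotNone(anthropic)
```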

- Add azure_ops_tracker.py for real-time status tracking via SSE
- Add azure_ops_viewer.py with live VNC iframe embed
- Add /api/azure-ops-status and /api/azure-ops-sse endpoints
- Add progress bar, cost tracking, elapsed time display
- Add copy logs button and auto-scroll controls

feat(cli): add new VM management commands
- Add vm start-windows command
- Add vm restart-windows command
- Add vm check-build command
- Add vm screenshot command for capturing dashboards
- Fix container restart to always use --cap-add NET_ADMIN

feat(infra): add screenshot capture infrastructure
- Add capture_screenshots.py script
- Configure BuildKit GC with 30GB limit
- Fix Dockerfile OEM path and networking

docs: add Azure dashboard spec and update CLI documentation

Co-Authored-By: Claude Opus 4.5 <[email protected]>
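The /api/azure-ops-sse endpoint streams status as Server-Sent Events. The sketch below shows how one event is framed on the wire: optional `event:` line, `data:` line, then a blank line that terminates the event. The payload fields and `format_sse` are hypothetical. The HTTP response must also carry a `Content-Type: text/event-stream` header.

```python
import json
from typing import Any, Dict, Optional


def format_sse(payload: Dict[str, Any], event: Optional[str] = None) -> bytes:
    """Frame one Server-Sent Event carrying a JSON payload."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # json.dumps without indent produces a single line, so one data: field suffices.
    lines.append(f"data: {json.dumps(payload)}")
    return ("\n".join(lines) + "\n\n").encode("utf-8")  # blank line ends the event
```

Usage: a handler would write `format_sse({"phase": "building", "progress": 42})` to the response and flush after each status change, so the browser's `EventSource` receives updates immediately.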

… install
- Add automatic Docker build cache cleanup before waa-auto builds
- Fix all VERSION=11e → VERSION=11 for fully unattended Windows install (Enterprise Evaluation shows edition picker dialog; Pro does not)
- Update CLAUDE.md documentation with disk space management solution

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Root cause: CLI used VERSION=11 but Dockerfile uses VERSION=11e. This caused XML patches (applied for 11e) to be ignored at runtime.

Enterprise Eval (11e) has built-in GVLK key - never prompts for product key.

Fixes: openadapt-evals-b3l

Co-Authored-By: Claude Opus 4.5 <[email protected]>
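A consistency check can catch this kind of mismatch between the CLI and the Dockerfile early. This sketch assumes the Dockerfile sets VERSION on an `ENV` or `ARG` line of the form `VERSION=...`. The space-separated `ENV VERSION 11e` form is not handled. `dockerfile_version` and `check_version_consistency` are hypothetical helpers.

```python
import re
from typing import Optional

# Matches e.g. `ENV VERSION=11e` or `ARG VERSION="11e"` at the start of a line.
_VERSION_RE = re.compile(r'^\s*(?:ENV|ARG)\s+VERSION="?([^\s"]+)"?', re.MULTILINE)


def dockerfile_version(dockerfile_text: str) -> Optional[str]:
    m = _VERSION_RE.search(dockerfile_text)
    return m.group(1) if m else None


def check_version_consistency(dockerfile_text: str, cli_version: str) -> None:
    """Raise if the CLI would launch a different Windows VERSION than the image expects."""
    found = dockerfile_version(dockerfile_text)
    if found != cli_version:
        raise ValueError(f"CLI uses VERSION={cli_version} but Dockerfile sets VERSION={found}")
```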

VERSION=11e (Enterprise Eval) has built-in GVLK - never prompts.
VERSION=11 (Pro) may prompt for product key.

Previous documentation was backwards, causing confusion.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

The previous approach copied windowsarena's autounattend.xml over dockurr/windows's version, which broke the OOBE flow.

Changes:
- Remove COPY commands that replaced the base image's XML files
- Add conditional sed patch that only adds InstallFrom element if needed
- Reorder Dockerfile to install Python deps before running python3 commands
- Add clear comments explaining the OEM mechanism

This fixes Windows installation failures where the OOBE would hang or show incorrect dialogs.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
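The Dockerfile applies the conditional patch with sed. The same "only if needed" logic can be written as an idempotent Python helper. `patch_once` is hypothetical. The PR does not show where InstallFrom sits in autounattend.xml, so the anchor and snippet are left to the caller.

```python
def patch_once(text: str, marker: str, anchor: str, snippet: str) -> str:
    """Insert `snippet` right after `anchor` unless `marker` is already present."""
    if marker in text:
        return text  # already patched (or upstream already has it): leave the file untouched
    idx = text.find(anchor)
    if idx == -1:
        raise ValueError(f"anchor {anchor!r} not found")
    end = idx + len(anchor)
    return text[:end] + snippet + text[end:]
```

Applying the helper twice yields the same text. That idempotence makes the patch safe across rebuilds and avoids replacing the base image's file wholesale.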

Major cleanup of benchmarks CLI:

Removed deprecated handlers (~1200 lines):
- setup-waa: Replaced by top-level 'waa' command
- run-waa: Replaced by top-level 'waa' command
- prepare-windows: Replaced by top-level 'waa' command
- waa-native: Replaced by scripts/waa_bootstrap_local.sh

Added features:
- cleanup_waa_resources(): Auto-cleanup leftover Azure resources (NICs, VNETs, NSGs, PublicIPs, disks) before VM creation
- Updated default VM size to Standard_D8ds_v5 (300GB temp storage)
- Updated help text with temp storage sizes for each VM option
- Added deprecation notice to legacy viewer command

The cleanup function prevents "resource already exists" errors when previous VM deletion was incomplete.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
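Leftover resources must be deleted in dependency order. In Azure, a NIC references its public IP, NSG, and subnet, so the NIC has to go first or the other deletions fail. This sketch assumes that ordering and an injected `delete` callable that wraps whatever actually calls Azure (not shown). `CLEANUP_ORDER` and `cleanup_leftovers` are hypothetical and not the PR's cleanup_waa_resources().

```python
from typing import Callable, Dict, List

# Assumed order: resources holding references (NICs) before the resources they point at
# (public IPs, NSGs, VNets); orphaned disks have no dependents and go last.
CLEANUP_ORDER = ["nic", "public_ip", "nsg", "vnet", "disk"]


def cleanup_leftovers(
    found: Dict[str, List[str]],
    delete: Callable[[str, str], None],
) -> List[str]:
    """Delete leftover resources in dependency order and return what was deleted."""
    deleted: List[str] = []
    for kind in CLEANUP_ORDER:
        for name in found.get(kind, []):
            delete(kind, name)  # hypothetical wrapper around the real deletion call
            deleted.append(f"{kind}/{name}")
    return deleted
```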

Added comprehensive guidelines for Claude Code sessions:

CLI-First Rule:
- Never use raw az/ssh commands that require permission
- Always use or extend the CLI for VM operations
- Example pattern for adding new CLI functionality

Standard VM Configuration Workflow:
- Delete VM, update code, recreate (vs. trying to resize)
- Current VM defaults (D8ds_v5, eastus, Ubuntu 22.04)

This reduces friction by documenting the pre-approved command patterns and standard operating procedures.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
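Under the CLI-first rule, a new VM operation becomes a subcommand instead of an ad-hoc az/ssh invocation. This generic argparse sketch registers a `vm screenshot` subcommand, a command name taken from this PR. The parser layout, the program name, and the handler body are hypothetical and do not reflect the real CLI's structure.

```python
import argparse
from typing import List, Optional


def cmd_screenshot(args: argparse.Namespace) -> int:
    # Hypothetical handler: the real command would capture the dashboard here.
    print(f"would capture screenshot to {args.output}")
    return 0


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cli")
    groups = parser.add_subparsers(dest="group", required=True)
    vm = groups.add_parser("vm", help="VM operations")
    vm_cmds = vm.add_subparsers(dest="command", required=True)
    shot = vm_cmds.add_parser("screenshot", help="capture a dashboard screenshot")
    shot.add_argument("--output", default="screenshot.png")
    shot.set_defaults(func=cmd_screenshot)  # dispatch target for this subcommand
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    return args.func(args)
```

Each new operation adds one `add_parser` call plus a handler. The set of allowed commands therefore stays enumerable, which is what makes pre-approving them practical.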

- Close unclosed code block (lines 33-41)
- Remove hardcoded absolute path, use relative description

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Key fixes to waa_deploy/Dockerfile:
- Don't replace dockurr/windows autounattend.xml, only patch with InstallFrom element to prevent "Select the operating system" dialog
- Use sed instead of python3 for run.py patching (Python installed later)
- Fix entrypoint: use /run/entry.sh instead of non-existent /copy-oem.sh

This enables fully automated Windows 11 Enterprise Eval installation with VERSION=11e, no manual intervention required. WAA server starts automatically via FirstLogonCommands.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Key fixes:
- Dockerfile: Don't replace autounattend.xml, only patch it with InstallFrom element (preserves dockurr/windows OEM mechanism, prevents "Select OS" dialog)
- CLI: Run benchmark inside container with `docker exec -w /client`
- CLI: Use valid som_origin "oss" instead of invalid "omniparser"
- CLI: Fix VNC URLs to use localhost (SSH tunnel) instead of public IP
- CLI: Add SSH retry logic with exponential backoff
- CLI: Add ConnectTimeout to SSH options for faster failure detection

The WAA benchmark now runs successfully with the navi agent.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
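Here is a generic sketch of SSH retry with exponential backoff. `retry_with_backoff` is hypothetical, and its defaults (5 attempts, delays doubling from 1 s and capped at 30 s) are assumptions rather than the CLI's actual values. Combining it with ssh's `-o ConnectTimeout=N` option keeps each individual attempt short.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> T:
    """Call fn, retrying on retry_on exceptions with delays base, 2*base, 4*base, ..."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(min(base_delay * 2 ** attempt, max_delay))
    raise RuntimeError("unreachable")
```

When wrapping `subprocess.run([...], check=True)`, pass `retry_on=(subprocess.CalledProcessError,)`, because that exception is not an `OSError`.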

- Fetch VM IP from Azure CLI at runtime instead of stale registry file
- Fix activity detection to use localhost:5000 (Docker port forward)
- Change SSH tunnel to forward localhost:5001 -> VM:5000
- Update CLAUDE.md with comprehensive WAA workflow documentation
- Add API key auto-loading note (loaded from .env via config.py)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
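The tunnel maps local port 5001 to port 5000 on the VM. This sketch builds the ssh argument list. `-N` (run no remote command), `-L`, and `-o ConnectTimeout=` are standard OpenSSH options. `tunnel_command`, its parameters, and the 10-second default timeout are hypothetical.

```python
from typing import List


def tunnel_command(host: str, user: str, local_port: int = 5001,
                   remote_port: int = 5000, connect_timeout: int = 10) -> List[str]:
    """Hypothetical argv for a port-forward-only SSH session."""
    return [
        "ssh", "-N",
        "-o", f"ConnectTimeout={connect_timeout}",
        # "localhost" here is resolved on the VM, where Docker forwards port 5000.
        "-L", f"{local_port}:localhost:{remote_port}",
        f"{user}@{host}",
    ]
```

The host would come from the runtime IP lookup rather than a stored file. One possible source is `az vm show -d -g <group> -n <vm> --query publicIps -o tsv`.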

Resolve conflicts in CLAUDE.md, README.md, cli.py, vm_monitor.py, local.py.
Taking PR branch versions, which have more recent changes.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Move warnings.warn() after imports to fix E402 errors
- Remove unused imports (Any, base64, os, Service)
- Remove f-string without placeholders
- Apply ruff formatting to unformatted files

These fixes resolve CI failures on main branch introduced in PR #10.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

abrichr and others added 24 commits January 18, 2026 19:06

fix: resolve ruff linting and formatting issues

b4caa01

docs: fix inverted VERSION documentation in CLAUDE.md

4e0b2a4


feat: automate vanilla WAA bootstrap

8fe7e6f

docs: clarify unattended WAA bootstrap

fd11326

docs: fix markdown formatting in waa_vanilla_automation.md

95e96a1


Merge branch 'main' into feat/waa-vanilla-automation

3482349


abrichr merged commit 9a37bb4 into main Jan 24, 2026
0 of 4 checks passed

abrichr mentioned this pull request Jan 24, 2026

docs: add research on API agent integration approaches for WAA #11

Closed

3 tasks


2 participants
