IMPORTANT: Always use the SkyPilot API for accessing job and cluster information. Never query the SkyPilot SQLite databases directly (e.g., ~/.sky/jobs.db, ~/.sky/state.db).
- API endpoint: https://skypilot-api.softmax-research.net
- Authentication: OAuth2 cookies stored in ~/.sky/cookies.txt
- Configuration: ~/.sky/config.yaml
- Centralized data: The API provides access to managed jobs across all users and clusters
- Up-to-date information: The API reflects the current state of the jobs controller
- Proper abstractions: The API provides structured data with proper types
- Security: Direct database access bypasses authentication and auditing
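
As a rough illustration of going through the API rather than the databases, the sketch below builds an HTTP client that sends the stored cookies. It assumes ~/.sky/cookies.txt uses the Netscape cookies.txt format, which the text above does not confirm. `load_cookie_jar` and `make_api_opener` are hypothetical helper names, not part of SkyPilot or SkyDeck.

```python
from http.cookiejar import MozillaCookieJar
from pathlib import Path
from urllib.request import HTTPCookieProcessor, OpenerDirector, build_opener

API_ENDPOINT = "https://skypilot-api.softmax-research.net"
DEFAULT_COOKIE_PATH = Path.home() / ".sky" / "cookies.txt"


def load_cookie_jar(cookie_path: Path) -> MozillaCookieJar:
    """Load cookies, assuming the file is in Netscape cookies.txt format."""
    jar = MozillaCookieJar(str(cookie_path))
    # Keep session cookies and expired entries; the server decides validity.
    jar.load(ignore_discard=True, ignore_expires=True)
    return jar


def make_api_opener(cookie_path: Path = DEFAULT_COOKIE_PATH) -> OpenerDirector:
    """Return a urllib opener that attaches the stored cookies to requests."""
    return build_opener(HTTPCookieProcessor(load_cookie_jar(cookie_path)))
```

Requests would then go through `make_api_opener().open(...)` with a URL under `API_ENDPOINT`. The concrete routes are not listed here, so none are assumed.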
- Created by: User (via UI)
- Purpose: Organize experiments into logical groups
- Contains: Ordered list of experiments (many-to-many relationship)
- Fields: id, name, flags (columns to display), order, collapsed
- Created by: User (via "Create" button or "Duplicate")
- Purpose: Configuration template that defines what to run
- Key fields:
  - id: Auto-increment integer (internal)
  - name: Unique string identifier (user-facing, used for matching jobs/checkpoints)
  - desired_state: RUNNING or STOPPED
  - current_state: Reflects latest job status
  - flags: Configuration key-value pairs
  - nodes, gpus: Resource requirements
- Created by: Synced from SkyPilot API (poller)
- Purpose: Track actual job executions
- Matching: Jobs display under experiments where job.experiment_id == experiment.name
- Key fields: id, experiment_id (matches experiment.name), status, command, git_ref, nodes, gpus
- Created by: Synced from S3 (syncer)
- Purpose: Track model checkpoints and replay files
- S3 path: s3://softmax-public/policies/{experiment.name}/
- Key fields: experiment_id (references experiment.id), epoch, model_path, replay_paths, policy_version
- Jobs match to experiments by name: job.experiment_id == experiment.name
- Checkpoints are stored by id: checkpoint.experiment_id == experiment.id
- S3 paths use experiment name (e.g., s3://.../{experiment.name}/)
- If no experiment exists with matching name, jobs appear as "orphaned"
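
The matching rules above are easy to mix up, because experiment_id means a name on jobs but an integer id on checkpoints. The sketch below restates them in code. The dataclasses are trimmed, hypothetical stand-ins for the real models, and `group_by_experiment` and `s3_prefix` are illustrative names only. Skipping checkpoints whose experiment id is unknown is an assumption, since only the job case ("orphaned") is specified.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    id: int    # internal auto-increment key
    name: str  # unique, user-facing identifier


@dataclass
class Job:
    id: int
    experiment_id: str  # holds an experiment *name*


@dataclass
class Checkpoint:
    experiment_id: int  # holds an experiment *id*
    epoch: int


def group_by_experiment(experiments, jobs, checkpoints):
    """Return (jobs_by_name, checkpoints_by_name, orphaned_jobs)."""
    name_by_id = {e.id: e.name for e in experiments}
    jobs_by_name = {e.name: [] for e in experiments}
    checkpoints_by_name = {e.name: [] for e in experiments}
    orphaned = []
    for job in jobs:
        # Jobs match by name: job.experiment_id == experiment.name
        if job.experiment_id in jobs_by_name:
            jobs_by_name[job.experiment_id].append(job)
        else:
            orphaned.append(job)
    for checkpoint in checkpoints:
        # Checkpoints match by id: checkpoint.experiment_id == experiment.id
        name = name_by_id.get(checkpoint.experiment_id)
        if name is not None:  # assumption: unknown ids are skipped
            checkpoints_by_name[name].append(checkpoint)
    return jobs_by_name, checkpoints_by_name, orphaned


def s3_prefix(experiment: Experiment) -> str:
    """S3 paths are keyed by experiment name, not by id."""
    return f"s3://softmax-public/policies/{experiment.name}/"
```

One consequence of this split, assuming names can be changed at all: renaming an experiment would detach its jobs and change its S3 prefix, while its checkpoint rows would still point at it through the integer id.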
Database Location: SkyDeck uses SQLite for persistent storage.
- Default location: ~/.skydeck/skydeck.db
- Configuration: Can be overridden with --db-path flag or SKYDECK_DB_PATH environment variable
- Schema: Defined in skydeck/database.py with automatic migrations on startup
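
The override rules above could be resolved roughly as follows. This is a sketch, not the code in skydeck itself: `resolve_db_path` is a hypothetical helper, and the precedence (flag first, then environment variable, then default) is an assumption, since the order is not stated.

```python
import argparse
import os
from pathlib import Path

DEFAULT_DB_PATH = Path.home() / ".skydeck" / "skydeck.db"


def resolve_db_path(argv=None, env=None) -> Path:
    """Pick the database path from --db-path, SKYDECK_DB_PATH, or the default."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--db-path", default=None)
    # parse_known_args ignores unrelated flags such as --port.
    args, _unknown = parser.parse_known_args(argv)
    if args.db_path:
        return Path(args.db_path).expanduser()
    if env.get("SKYDECK_DB_PATH"):
        return Path(env["SKYDECK_DB_PATH"]).expanduser()
    return DEFAULT_DB_PATH
```

Passing `argv` and `env` explicitly keeps the function easy to check without touching the real process environment.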
When working with the database directly:
# Backfill checkpoint versions (example)
uv run python -c "
import asyncio
from pathlib import Path
from skydeck.backfill_versions import backfill_checkpoint_versions
db_path = str(Path.home() / '.skydeck' / 'skydeck.db')
asyncio.run(backfill_checkpoint_versions(db_path))
"
# Query database directly
sqlite3 ~/.skydeck/skydeck.db "SELECT COUNT(*) FROM experiments;"

- Always use uv for pip and python operations
- Imports should go at the top of the file if possible
- Follow existing patterns in the codebase for consistency
- NEVER add fallbacks - fix the underlying problem instead
- When making backend changes, restart the server: skydeck must be restarted for changes to take effect
After making backend changes (Python), restart the server to pick up changes:
lsof -ti:8000 | xargs kill -9 2>/dev/null || true
sleep 2
nohup uv run skydeck --port 8000 > /tmp/skydeck.log 2>&1 &
sleep 3
curl -s http://localhost:8000/api/health | head -c 100  # Verify it's running

After making frontend changes (TypeScript/React):
cd packages/skydeck/frontend
npm run build  # Builds to ../skydeck/static/

Important: Always restart the backend yourself to test changes. Do not ask the user to restart.
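
The fixed `sleep 3` before the `curl` check in the backend restart steps can fail if the server starts slowly. A possible alternative is to poll the health endpoint until it responds. The sketch below assumes http://localhost:8000/api/health returns HTTP 200 once the server is ready. `wait_for_health` is a hypothetical helper, and its `fetch` parameter exists only so the logic can be exercised without a running server.

```python
import time
import urllib.request


def wait_for_health(url="http://localhost:8000/api/health",
                    timeout=15.0, interval=0.5, fetch=None):
    """Poll url until it answers with HTTP 200 or timeout seconds elapse."""
    if fetch is None:
        def fetch(target):
            with urllib.request.urlopen(target, timeout=2) as response:
                return response.status

    deadline = time.monotonic() + timeout
    while True:
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            # Covers connection refused and urllib's URLError/HTTPError:
            # the server is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A restart script could call `wait_for_health()` right after launching skydeck and treat a `False` result as a failed start, with /tmp/skydeck.log as the first place to look.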