What to do after your model finishes training on Colab
After training completes on Google Colab:

1. Find the model in Google Drive at `MyDrive/physicalaihack/models/act_shape_insertion/`. This folder contains:
   - `model.safetensors` (model weights)
   - `config.json` (model config)
   - Normalization files
   - Training config

2. Download to your Mac:
   - Download the entire `act_shape_insertion` folder
   - Or use the Drive desktop app

3. Place in the local models directory:

```bash
# The folder should be at:
#   /Users/bencxr/dev/physicalaihack/models/act_shape_insertion/

# Verify files exist:
ls models/act_shape_insertion/
# Should show: model.safetensors, config.json, etc.
```
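Beyond eyeballing `ls`, a quick script can confirm the download is complete. This is a minimal sketch, not part of the project's tooling; the folder path and required file names follow the guide above:

```python
# Hypothetical completeness check for the downloaded model folder.
from pathlib import Path

REQUIRED = ["model.safetensors", "config.json"]

def missing_files(model_dir: str, required=REQUIRED) -> list[str]:
    """Return the names of required files absent from model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]

if __name__ == "__main__":
    gaps = missing_files("models/act_shape_insertion")
    if gaps:
        print("Missing files:", ", ".join(gaps))
    else:
        print("Model folder looks complete.")
```

Note this only checks presence, not corruption; if training logs suggest the upload finished but evaluation fails to load the weights, re-download the folder.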
```bash
# Activate environment
source lerobot-env/bin/activate

# Run evaluation
python eval_act_sim.py
```

What happens:
- Runs 20 test episodes in simulation
- Computes success rate, cycle time, and failure modes
- Saves videos to `eval_videos/`
- Saves metrics to `eval_results/`
Expected time: 5-10 minutes
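Conceptually, the aggregation the evaluation performs over episodes can be sketched like this. This is an illustration only, with hypothetical field names, not the actual internals of `eval_act_sim.py`:

```python
# Hypothetical aggregation of per-episode results into the metrics the
# guide describes: success rate, cycle time, failure modes.
from collections import Counter

def summarize(episodes: list[dict]) -> dict:
    """episodes: [{"success": bool, "time_s": float, "failure_mode": str | None}, ...]"""
    successes = [e for e in episodes if e["success"]]
    return {
        "success_rate": 100.0 * len(successes) / len(episodes),
        # Cycle time is averaged over successful episodes only.
        "avg_cycle_time_s": (
            sum(e["time_s"] for e in successes) / len(successes) if successes else None
        ),
        "failure_modes": Counter(
            e["failure_mode"] for e in episodes if not e["success"]
        ),
    }
```

With 2 successes out of 4 episodes, for instance, `summarize` reports a 50% success rate and averages cycle time over the 2 successes only.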
```bash
# More episodes for better statistics
python eval_act_sim.py --episodes 50

# No visual rendering (faster)
python eval_act_sim.py --no-render

# Don't save videos (saves disk space)
python eval_act_sim.py --no-videos

# Custom model path
python eval_act_sim.py --model models/best_model.pth
```

```bash
python analyze_eval_results.py --latest
```

Output:
- Success rate vs target (70%)
- Cycle time vs target (<10s)
- Gap analysis
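The gap analysis amounts to comparing the latest metrics against the two targets. A minimal sketch, assuming a results dict with `success_rate` and `avg_cycle_time_s` fields (the actual schema of `analyze_eval_results.py` may differ):

```python
# Hypothetical gap analysis against the guide's targets
# (70% success rate, <10 s cycle time).
TARGETS = {"success_rate": 70.0, "avg_cycle_time_s": 10.0}

def gap_report(results: dict) -> list[str]:
    """Return human-readable lines comparing results to targets."""
    sr = results["success_rate"]
    ct = results["avg_cycle_time_s"]
    return [
        f"Success rate: {sr:.1f}% (target {TARGETS['success_rate']:.0f}%, "
        f"gap {sr - TARGETS['success_rate']:+.1f} pts)",
        f"Cycle time: {ct:.1f}s (target <{TARGETS['avg_cycle_time_s']:.0f}s, "
        f"margin {TARGETS['avg_cycle_time_s'] - ct:+.1f}s)",
    ]
```

For example, a run at 65% success and 8.5 s cycle time reports a -5.0 point success-rate gap but a +1.5 s cycle-time margin.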
```bash
python analyze_eval_results.py --compare
```

Output:
- Trends across evaluations
- Best/worst/average metrics
- Common failure patterns
```bash
python analyze_eval_results.py
```

You're ready for the hackathon!
- Model performs well
- Proceed to hardware transition
- Document your approach
Next steps:
- Review failure cases to understand edge cases
- Prepare for hardware differences
- Practice explanation for judges
Good but can improve:
- Collect 20-30 more demos (focus on failures)
- Package and upload to Colab
- Retrain (2-4 hours)
- Re-evaluate
Focus areas:
- Watch failed episodes in `eval_videos/`
- Identify specific failure patterns
- Add demos that show correct behavior
Needs significant improvement:
- Review demo quality with `inspect_demos.py`
- Collect 50+ high-quality demos
- Ensure consistent technique
- Consider simplifying task initially
- Retrain with larger dataset
Common issues:
- Inconsistent demos (different approaches)
- Too few demos (<20)
- Poor demo quality (jerky movements)
- Training bugs (check Colab logs)
If you need to improve:
```bash
# Watch failed episodes
open eval_videos/episode_005.mp4  # Replace with a failed episode
```

Look for:
- Where does the policy fail? (grab, transport, or release)
- Is it consistent or random?
- Does it fail in specific situations?
```bash
python teleop_sim.py
```

Focus on:
- Scenarios where policy fails
- Consistent approach technique
- Smooth, deliberate movements
- Successful insertions only
Target: 20-50 total demos (including previous)
```bash
# Package new demos
cd sim_data
tar -czf shape_insertion_data_v2.tar.gz *
```

Then:
- Upload to Google Drive
- Run training notebook (2-4 hours)
- Download new model
- Repeat evaluation
```bash
# Compare all evaluation runs
python analyze_eval_results.py --compare
```

Look for:
- Improvement in success rate
- Reduction in specific failure modes
- More consistent performance
```
physicalaihack/
├── models/
│   ├── act_shape_insertion_final.pth       # Downloaded from Colab
│   └── best_model.pth                      # (optional) best checkpoint
│
├── eval_videos/                            # Created by eval_act_sim.py
│   ├── episode_001.mp4                     # Video of episode 1
│   ├── episode_002.mp4
│   └── ...
│
├── eval_results/                           # Created by eval_act_sim.py
│   ├── eval_results_20260128_143022.json   # Metrics from run 1
│   ├── eval_results_20260128_151544.json   # Metrics from run 2
│   └── ...
│
├── sim_data/                               # Your collected demos
│   └── shape_insertion_demos.pkl
│
├── eval_act_sim.py                         # Evaluation script
├── analyze_eval_results.py                 # Analysis script
└── POST-TRAINING-GUIDE.md                  # This guide
```
Success rate:
- What: Percentage of episodes that successfully insert the shape
- Target: 70%+ for hackathon readiness
- Formula: (successes / total_episodes) × 100

Cycle time:
- What: Time from episode start to successful insertion
- Target: <10 seconds
- Only counts: Successful episodes

Failure modes:
- `failed_to_grab`: Policy never picked up the shape
- `dropped_shape`: Picked up but dropped before the slot
- `timeout`: Ran out of time (500 steps)
- `other`: Miscellaneous failures
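As a worked example of the formulas above, with made-up numbers (14 successes in 20 episodes, three hypothetical successful-episode times):

```python
# Worked example of the metrics above, with made-up numbers.
successes, total = 14, 20
success_rate = successes / total * 100   # (successes / total_episodes) × 100
assert success_rate == 70.0              # exactly meets the 70% target

# Cycle time averages over successful episodes only.
cycle_times_s = [8.2, 9.1, 7.5]          # hypothetical successful-episode times
avg_cycle = sum(cycle_times_s) / len(cycle_times_s)
print(f"{success_rate:.0f}% success, {avg_cycle:.1f}s average cycle time")
# prints: 70% success, 8.3s average cycle time
```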
- Watch at least a few episodes live to understand behavior
- Save videos for later analysis and presentation
- Run multiple evaluation rounds (3-5) for better statistics
- Check for consistent vs random failures
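Why multiple rounds help: with only 20 episodes, the measured success rate is noisy. A rough 95% confidence interval via the normal approximation (a back-of-envelope sketch, not a substitute for proper statistics) makes this concrete:

```python
# Rough 95% confidence interval for a measured success rate
# (normal approximation; illustrative only).
import math

def success_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Return (low, high) bounds in percent for the true success rate."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return (max(0.0, p - half_width) * 100, min(1.0, p + half_width) * 100)
```

For 14/20 successes the interval is roughly 50-90%, i.e. ±20 points; pooling three 20-episode rounds (e.g. 42/60) narrows it to roughly ±12 points, which is why 3-5 rounds give much more trustworthy statistics.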
- Quality > Quantity for demos
- Be consistent in your approach
- Focus on smooth, deliberate movements
- Only save successful demonstrations
- Collect 20+ demos minimum, 50+ is better
- Achieve 70%+ success rate
- Document your approach
- Understand failure modes
- Prepare for hardware differences
- Practice explaining to judges
```bash
# Check model exists
ls -lh models/

# If missing, download from Google Drive
# Place at: models/act_shape_insertion_final.pth
```

```bash
# Ensure LeRobot is installed
source lerobot-env/bin/activate
pip list | grep lerobot

# If missing, reinstall
cd lerobot
pip install -e .
```

- Check the model downloaded correctly (not corrupted)
- Verify training completed successfully
- Review training loss curves in Colab
- Ensure demo quality is good
- Consider collecting more demos
```bash
# Check directory exists and is writable
ls -ld eval_videos/

# Check disk space
df -h .
```

```bash
# Basic evaluation
python eval_act_sim.py

# Extended evaluation
python eval_act_sim.py --episodes 50

# View latest results
python analyze_eval_results.py --latest

# Compare all runs
python analyze_eval_results.py --compare

# Watch a specific episode
open eval_videos/episode_001.mp4

# Collect more demos
python teleop_sim.py

# Package for retraining
cd sim_data && tar -czf shape_insertion_data_v2.tar.gz *
```

- `eval_act_sim.py` - Main evaluation script
- `analyze_eval_results.py` - Results analysis
- `inspect_demos.py` - Demo quality checker
- `teleop_sim.py` - Collect more demos
- `colab-notebook-clean.ipynb` - Retraining
Before hackathon (this week):
- Model trained on Colab (2-4 hours)
- Model downloaded to Mac
- Evaluation run (5-10 minutes)
- Success rate >70%
- Cycle time <10s
- Failure modes understood
- Videos saved for presentation
- Approach documented
If success rate <70%:
- Analyzed failures
- Collected 20+ more demos
- Retrained on Colab
- Re-evaluated
- Achieved target metrics
Once you hit 70%+ success rate:
- ✅ You're ready for the hackathon
- 🎥 Save your best videos for demo
- 📊 Document your approach
- 🤖 Prepare for hardware transition
At the hackathon:
- Fine-tune on real robot demos
- Transfer your sim-to-real approach
- Show your simulation results
- Present to judges!
Good luck! 🎉🤖
Last updated: January 2026