What to do after your model finishes training on Colab
After training completes on Google Colab:

1. Find the model in Google Drive at `MyDrive/physicalaihack/models/act_shape_insertion/`. This folder contains:
   - `model.safetensors` (model weights)
   - `config.json` (model config)
   - Normalization files
   - Training config

2. Download to your Mac:
   - Download the entire `act_shape_insertion` folder
   - Or use the Drive desktop app

3. Place in the local models directory:

```bash
# The folder should be at:
#   /Users/bencxr/dev/physicalaihack/models/act_shape_insertion/

# Verify files exist:
ls models/act_shape_insertion/
# Should show: model.safetensors, config.json, etc.
```
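Beyond eyeballing `ls`, a quick script can confirm the download is complete. This is a minimal sketch, not part of the project's tooling; the folder path and required file names follow the guide above:

```python
# Hypothetical completeness check for the downloaded model folder.
from pathlib import Path

REQUIRED = ["model.safetensors", "config.json"]

def missing_files(model_dir: str, required=REQUIRED) -> list[str]:
    """Return the names of required files absent from model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]

if __name__ == "__main__":
    gaps = missing_files("models/act_shape_insertion")
    if gaps:
        print("Missing files:", ", ".join(gaps))
    else:
        print("Model folder looks complete.")
```

Note this only checks presence, not corruption; if training logs suggest the upload finished but evaluation fails to load the weights, re-download the folder.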
```bash
# Activate environment
source lerobot-env/bin/activate

# Run evaluation
python eval_act_sim.py
```

What happens:
- Runs 20 test episodes in simulation
- Computes success rate, cycle time, and failure modes
- Saves videos to `eval_videos/`
- Saves metrics to `eval_results/`
Expected time: 5-10 minutes
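Conceptually, the aggregation the evaluation performs over episodes can be sketched like this. This is an illustration only, with hypothetical field names, not the actual internals of `eval_act_sim.py`:

```python
# Hypothetical aggregation of per-episode results into the metrics the
# guide describes: success rate, cycle time, failure modes.
from collections import Counter

def summarize(episodes: list[dict]) -> dict:
    """episodes: [{"success": bool, "time_s": float, "failure_mode": str | None}, ...]"""
    successes = [e for e in episodes if e["success"]]
    return {
        "success_rate": 100.0 * len(successes) / len(episodes),
        # Cycle time is averaged over successful episodes only.
        "avg_cycle_time_s": (
            sum(e["time_s"] for e in successes) / len(successes) if successes else None
        ),
        "failure_modes": Counter(
            e["failure_mode"] for e in episodes if not e["success"]
        ),
    }
```

With 2 successes out of 4 episodes, for instance, `summarize` reports a 50% success rate and averages cycle time over the 2 successes only.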
```bash
# More episodes for better statistics
python eval_act_sim.py --episodes 50

# No visual rendering (faster)
python eval_act_sim.py --no-render

# Don't save videos (saves disk space)
python eval_act_sim.py --no-videos

# Custom model path
python eval_act_sim.py --model models/best_model.pth
```

```bash
python analyze_eval_results.py --latest
```

Output:
- Success rate vs target (70%)
- Cycle time vs target (<10s)
- Gap analysis
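The gap analysis amounts to comparing the latest metrics against the two targets. A minimal sketch, assuming a results dict with `success_rate` and `avg_cycle_time_s` fields (the actual schema of `analyze_eval_results.py` may differ):

```python
# Hypothetical gap analysis against the guide's targets
# (70% success rate, <10 s cycle time).
TARGETS = {"success_rate": 70.0, "avg_cycle_time_s": 10.0}

def gap_report(results: dict) -> list[str]:
    """Return human-readable lines comparing results to targets."""
    sr = results["success_rate"]
    ct = results["avg_cycle_time_s"]
    return [
        f"Success rate: {sr:.1f}% (target {TARGETS['success_rate']:.0f}%, "
        f"gap {sr - TARGETS['success_rate']:+.1f} pts)",
        f"Cycle time: {ct:.1f}s (target <{TARGETS['avg_cycle_time_s']:.0f}s, "
        f"margin {TARGETS['avg_cycle_time_s'] - ct:+.1f}s)",
    ]
```

For example, a run at 65% success and 8.5 s cycle time reports a -5.0 point success-rate gap but a +1.5 s cycle-time margin.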
```bash
python analyze_eval_results.py --compare
```

Output:
- Trends across evaluations
- Best/worst/average metrics
- Common failure patterns
```bash
python analyze_eval_results.py
```

You're ready for the hackathon!
- Model performs well
- Proceed to hardware transition
- Document your approach
Next steps:
- Review failure cases to understand edge cases
- Prepare for hardware differences
- Practice explanation for judges
Good but can improve:
- Collect 20-30 more demos (focus on failures)
- Package and upload to Colab
- Retrain (2-4 hours)
- Re-evaluate
Focus areas:
- Watch failed episodes in `eval_videos/`
- Identify specific failure patterns
- Add demos that show correct behavior
Needs significant improvement:
- Review demo quality with `inspect_demos.py`
- Collect 50+ high-quality demos
- Ensure consistent technique
- Consider simplifying task initially
- Retrain with larger dataset
Common issues:
- Inconsistent demos (different approaches)
- Too few demos (<20)
- Poor demo quality (jerky movements)
- Training bugs (check Colab logs)
If you need to improve:
```bash
# Watch failed episodes
open eval_videos/episode_005.mp4  # Replace with a failed episode
```

Look for:
- Where does the policy fail? (grab, transport, or release)
- Is it consistent or random?
- Does it fail in specific situations?
```bash
python teleop_sim.py
```

Focus on:
- Scenarios where policy fails
- Consistent approach technique
- Smooth, deliberate movements
- Successful insertions only
Target: 20-50 total demos (including previous)
```bash
# Package new demos
cd sim_data
tar -czf shape_insertion_data_v2.tar.gz *
```

Then:
- Upload to Google Drive
- Run training notebook (2-4 hours)
- Download new model
- Repeat evaluation
```bash
# Compare all evaluation runs
python analyze_eval_results.py --compare
```

Look for:
- Improvement in success rate
- Reduction in specific failure modes
- More consistent performance
```
physicalaihack/
├── models/
│   ├── act_shape_insertion_final.pth       # Downloaded from Colab
│   └── best_model.pth                      # (optional) best checkpoint
│
├── eval_videos/                            # Created by eval_act_sim.py
│   ├── episode_001.mp4                     # Video of episode 1
│   ├── episode_002.mp4
│   └── ...
│
├── eval_results/                           # Created by eval_act_sim.py
│   ├── eval_results_20260128_143022.json   # Metrics from run 1
│   ├── eval_results_20260128_151544.json   # Metrics from run 2
│   └── ...
│
├── sim_data/                               # Your collected demos
│   └── shape_insertion_demos.pkl
│
├── eval_act_sim.py                         # Evaluation script
├── analyze_eval_results.py                 # Analysis script
└── POST-TRAINING-GUIDE.md                  # This guide
```
Success rate:
- What: Percentage of episodes that successfully insert the shape
- Target: 70%+ for hackathon readiness
- Formula: (successes / total_episodes) × 100

Cycle time:
- What: Time from episode start to successful insertion
- Target: <10 seconds
- Only counts: Successful episodes

Failure modes:
- `failed_to_grab`: Policy never picked up the shape
- `dropped_shape`: Picked up but dropped before the slot
- `timeout`: Ran out of time (500 steps)
- `other`: Miscellaneous failures
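As a worked example of the formulas above, with made-up numbers (14 successes in 20 episodes, three hypothetical successful-episode times):

```python
# Worked example of the metrics above, with made-up numbers.
successes, total = 14, 20
success_rate = successes / total * 100   # (successes / total_episodes) × 100
assert success_rate == 70.0              # exactly meets the 70% target

# Cycle time averages over successful episodes only.
cycle_times_s = [8.2, 9.1, 7.5]          # hypothetical successful-episode times
avg_cycle = sum(cycle_times_s) / len(cycle_times_s)
print(f"{success_rate:.0f}% success, {avg_cycle:.1f}s average cycle time")
# prints: 70% success, 8.3s average cycle time
```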
- Watch at least a few episodes live to understand behavior
- Save videos for later analysis and presentation
- Run multiple evaluation rounds (3-5) for better statistics
- Check for consistent vs random failures
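Why multiple rounds help: with only 20 episodes, the measured success rate is noisy. A rough 95% confidence interval via the normal approximation (a back-of-envelope sketch, not a substitute for proper statistics) makes this concrete:

```python
# Rough 95% confidence interval for a measured success rate
# (normal approximation; illustrative only).
import math

def success_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Return (low, high) bounds in percent for the true success rate."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return (max(0.0, p - half_width) * 100, min(1.0, p + half_width) * 100)
```

For 14/20 successes the interval is roughly 50-90%, i.e. ±20 points; pooling three 20-episode rounds (e.g. 42/60) narrows it to roughly ±12 points, which is why 3-5 rounds give much more trustworthy statistics.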
- Quality > Quantity for demos
- Be consistent in your approach
- Focus on smooth, deliberate movements
- Only save successful demonstrations
- Collect 20+ demos minimum, 50+ is better
- Achieve 70%+ success rate
- Document your approach
- Understand failure modes
- Prepare for hardware differences
- Practice explaining to judges
```bash
# Check model exists
ls -lh models/

# If missing, download from Google Drive
# Place at: models/act_shape_insertion_final.pth
```

```bash
# Ensure LeRobot is installed
source lerobot-env/bin/activate
pip list | grep lerobot

# If missing, reinstall
cd lerobot
pip install -e .
```

- Check the model downloaded correctly (not corrupted)
- Verify training completed successfully
- Review training loss curves in Colab
- Ensure demo quality is good
- Consider collecting more demos
```bash
# Check directory exists and is writable
ls -ld eval_videos/

# Check disk space
df -h .
```

```bash
# Basic evaluation
python eval_act_sim.py

# Extended evaluation
python eval_act_sim.py --episodes 50

# View latest results
python analyze_eval_results.py --latest

# Compare all runs
python analyze_eval_results.py --compare

# Watch a specific episode
open eval_videos/episode_001.mp4

# Collect more demos
python teleop_sim.py

# Package for retraining
cd sim_data && tar -czf shape_insertion_data_v2.tar.gz *
```

- `eval_act_sim.py` - Main evaluation script
- `analyze_eval_results.py` - Results analysis
- `inspect_demos.py` - Demo quality checker
- `teleop_sim.py` - Collect more demos
- `colab-notebook-clean.ipynb` - Retraining
Before hackathon (this week):
- Model trained on Colab (2-4 hours)
- Model downloaded to Mac
- Evaluation run (5-10 minutes)
- Success rate >70%
- Cycle time <10s
- Failure modes understood
- Videos saved for presentation
- Approach documented
If success rate <70%:
- Analyzed failures
- Collected 20+ more demos
- Retrained on Colab
- Re-evaluated
- Achieved target metrics
Once you hit 70%+ success rate:
- ✅ You're ready for the hackathon
- 🎥 Save your best videos for demo
- 📊 Document your approach
- 🤖 Prepare for hardware transition
At the hackathon:
- Fine-tune on real robot demos
- Transfer your sim-to-real approach
- Show your simulation results
- Present to judges!
Good luck! 🎉🤖
Last updated: January 2026