ACM AI Project Winter '26
Fine-tuned ResNet18 classifier on the HaGRID 500k dataset for real-time hand gesture recognition. Hand detection and localization is handled by MediaPipe — the model only classifies the cropped hand region. two_sideways is further split into two_sideways_left / two_sideways_right at inference time using wrist-to-fingertip landmark geometry. Predictions below 80% confidence display as ?.
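The left/right split described above can be sketched from landmark geometry alone. This is a minimal illustration, not the project's actual code: it assumes MediaPipe's 21-point hand layout (index 0 is the wrist, 8 the index fingertip, 12 the middle fingertip) and image-space x coordinates that grow to the right. The function name `split_two_sideways` is hypothetical, and which on-screen direction counts as "left" depends on whether the webcam frame is mirrored.

```python
# MediaPipe Hands landmark indices used by this sketch.
WRIST = 0
INDEX_TIP = 8
MIDDLE_TIP = 12


def split_two_sideways(landmarks):
    """Refine a 'two_sideways' prediction into a left/right label.

    landmarks: sequence of 21 (x, y) pairs in image coordinates,
    where x increases toward the right edge of the frame.
    Assumption: the fingers point in the direction of the label.
    """
    wrist_x = landmarks[WRIST][0]
    # Average the two extended fingertips to reduce landmark jitter.
    tip_x = (landmarks[INDEX_TIP][0] + landmarks[MIDDLE_TIP][0]) / 2.0
    if tip_x > wrist_x:
        return "two_sideways_right"
    return "two_sideways_left"
```

If the preview is flipped horizontally before landmarks are extracted, the two return values would need to be swapped.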
Recognized gestures: palm, fist, ok, mute, two_up, two_down, two_sideways_left, two_sideways_right
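The 80% confidence cutoff can be illustrated without PyTorch. The sketch below assumes the classifier emits one raw score (logit) per class and that confidence is the softmax probability of the top class. The class order in `MODEL_CLASSES` and the function name `label_from_logits` are assumptions for illustration; the real order depends on how the model was trained.

```python
import math

# Assumed class order; two_sideways is split into left/right later.
MODEL_CLASSES = ["palm", "fist", "ok", "mute",
                 "two_up", "two_down", "two_sideways"]


def label_from_logits(logits, class_names=MODEL_CLASSES, threshold=0.80):
    """Return the top class name, or '?' if its probability is below threshold."""
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best] if probs[best] >= threshold else "?"
```

A strongly peaked score vector yields a label, while a flat one (for example, all zeros, where each class gets 1/7) yields `?`.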
The system runs across two machines:
- CV machine (webcam_gesture_demo.py): captures webcam input, runs inference, and sends recognized gestures over UDP
- Game machine (binds_fnaf2.py): receives UDP payloads and executes system-level mouse/keyboard actions via PyAutoGUI; also supports keyboard bindings (keys 1–8) as a fallback
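One way to picture the game-machine side is a dispatch table that maps each gesture name to an action, with number keys as a fallback route into the same table. The sketch below is hypothetical: the key-to-gesture order and the `actions` mapping are assumptions, and the callbacks are plain Python callables standing in for whatever PyAutoGUI calls binds_fnaf2.py actually makes.

```python
GESTURES = ["palm", "fist", "ok", "mute", "two_up", "two_down",
            "two_sideways_left", "two_sideways_right"]

# Assumed fallback layout: key "1" triggers the first gesture, "8" the last.
KEY_TO_GESTURE = {str(i + 1): name for i, name in enumerate(GESTURES)}


def dispatch(gesture, actions):
    """Run the action bound to a gesture; return False if none is bound."""
    action = actions.get(gesture)
    if action is None:
        return False
    action()
    return True


def dispatch_key(key, actions):
    """Keyboard fallback: translate a key press into its gesture, then dispatch."""
    gesture = KEY_TO_GESTURE.get(key)
    return gesture is not None and dispatch(gesture, actions)
```

Routing both inputs through one table keeps the UDP path and the keyboard fallback from drifting apart.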
UDP payload format: "<Hand>:<gesture>" (e.g. "Left:palm", "Right:ok") sent to port 5005 with a 1-second cooldown.
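The payload format and cooldown can be sketched with the standard library. This is an illustration under stated assumptions, not the project's code: `GestureSender`, `format_payload`, and `parse_payload` are hypothetical names, and the cooldown is assumed to be global (any send blocks all sends for the next second) rather than per hand or per gesture.

```python
import socket
import time

UDP_PORT = 5005
COOLDOWN_S = 1.0


def format_payload(hand, gesture):
    """Build the '<Hand>:<gesture>' string, e.g. 'Left:palm'."""
    return f"{hand}:{gesture}"


def parse_payload(payload):
    """Split '<Hand>:<gesture>' into (hand, gesture); reject malformed input."""
    hand, sep, gesture = payload.partition(":")
    if not sep or not hand or not gesture:
        raise ValueError(f"malformed payload: {payload!r}")
    return hand, gesture


class GestureSender:
    """Sends gestures over UDP, dropping any that arrive during the cooldown."""

    def __init__(self, ip, port=UDP_PORT, cooldown=COOLDOWN_S,
                 clock=time.monotonic):
        self.addr = (ip, port)
        self.cooldown = cooldown
        self.clock = clock  # injectable so the cooldown can be tested
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.last_sent = None

    def send(self, hand, gesture):
        """Return True if the datagram was sent, False if suppressed."""
        now = self.clock()
        if self.last_sent is not None and now - self.last_sent < self.cooldown:
            return False
        data = format_payload(hand, gesture).encode("utf-8")
        self.sock.sendto(data, self.addr)
        self.last_sent = now
        return True
```

On the receiving side, a socket bound to port 5005 would decode each datagram as UTF-8 and pass it to `parse_payload`. UDP gives no delivery guarantee, so a dropped datagram simply means a missed gesture.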
Requires Python 3.12 (the PyTorch version used by this project does not support 3.13+).
CV machine:
python3.12 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Game machine (additional dependencies):
pip install pyautogui keyboard
Update UDP_IP in webcam_gesture_demo.py to the game machine's IPv4 address before running.
CV machine:
source venv/bin/activate
python webcam_gesture_demo.py
# Press 'q' to quit
Game machine:
python binds_fnaf2.py
# Press 'ESC' to exit
The model weights file fnaf_hgr_final.pth must be in the same directory as webcam_gesture_demo.py.
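A small startup check can turn a misplaced weights file into a clear error message. The sketch below assumes the weights are looked up relative to the script's own location rather than the current working directory. `resolve_weights` is a hypothetical helper, not something the project is known to define.

```python
from pathlib import Path

WEIGHTS_NAME = "fnaf_hgr_final.pth"


def resolve_weights(script_path, name=WEIGHTS_NAME):
    """Return the weights path next to the given script, or raise if missing."""
    path = Path(script_path).resolve().parent / name
    if not path.is_file():
        raise FileNotFoundError(
            f"{name} not found; expected it next to the script at {path}"
        )
    return path
```

Inside webcam_gesture_demo.py this could be called as `resolve_weights(__file__)` before the model is loaded.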
- No no_gesture class: the model always tries to classify any visible hand. Suggested fix: add the no_gesture class from HaGRID.
- Accuracy drops when the hand is rotated from upright. Fix: augment training data with rotation variants.
- Model does classification only, not detection. MediaPipe handles localization as a pre-step.