woofalytics-streamer

A community fork of woofalytics by @mdoulaty. This fork was extended by an agentic AI (Antigravity, built by Google DeepMind) working alongside the repository owner. The AI autonomously designed, implemented, debugged, and committed all additions described below.


What is Woofalytics?

Woofalytics is an AI-powered dog bark detector originally built by @mdoulaty to run on a Raspberry Pi with a dual-channel microphone array. It uses a compact feed-forward neural network to classify barks in real time, estimates the direction of arrival (DOA) of the sound, and exposes a live web dashboard.

The original motivation: distinguish a neighbour's dog from one's own and use the trigger to auto-dispense treats, breaking the bark-response cycle. The model itself is a small two-hidden-layer network with a sigmoid output over 80-dimensional log-Mel filterbank features extracted from 60 ms windows. It is fast enough for real-time use on constrained hardware.

Original hardware: Raspberry Pi 4 + Andrea Electronics PureAudio USB Array Microphone (2-channel linear array).


What this fork adds

This fork keeps everything from the original and layers a production-ready monitoring and management stack on top, containerised with Docker.

🐳 Containerisation

  • Dockerfile, docker-compose.yml, .dockerignore: runs the full stack with a single command
  • NVIDIA GPU support via the Container Toolkit (CPU fallback included)
  • privileged: true passthrough for full USB audio visibility inside the container (required for the PS Eye 4-mic array; ALSA card registration does not transfer through /dev/snd alone)
  • Persistent volume at ./clips/ for recorded audio and the SQLite database

πŸŽ™οΈ Extended microphone support: Sony PlayStation Eye (PS Eye)

The original code targeted a 2-channel Andrea array. This fork adds full support for the PS Eye camera, which exposes a 4-microphone uniform linear array (ULA; bNrChannels=4 at 16 000 Hz per the USB audio descriptor).

  • DOA estimation upgraded to a 4-element ULA beamformer
  • MIC_CHANNELS=4, MIC_SAMPLE_RATE=16000, MIC_ARRAY_SPACING configurable at runtime without a rebuild
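
For reference, here is a minimal sketch of the idea behind ULA direction estimation: cross-correlate adjacent mic pairs with GCC-PHAT, average the delays, and convert to an angle. This is an illustration only; the fork's actual beamformer may differ, and the 0.02 m spacing and function names are assumptions.

import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def gcc_phat(a, b, fs, max_tau, interp=16):
    """Time delay (seconds) of b relative to a, via GCC-PHAT."""
    n = 2 * len(a)
    R = np.fft.rfft(a, n=n) * np.conj(np.fft.rfft(b, n=n))
    R /= np.abs(R) + 1e-12                      # PHAT weighting: phase only
    cc = np.fft.irfft(R, n=interp * n)          # upsample for sub-sample delay
    max_shift = max(1, int(interp * fs * max_tau))
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / float(interp * fs)

def doa_degrees(frames, fs=16000, spacing_m=0.02):
    """frames: (n_samples, 4) block from the mic array; spacing in metres."""
    max_tau = spacing_m / SPEED_OF_SOUND
    taus = [gcc_phat(frames[:, i], frames[:, i + 1], fs, max_tau)
            for i in range(frames.shape[1] - 1)]  # three adjacent pairs
    sin_theta = np.clip(np.mean(taus) * SPEED_OF_SOUND / spacing_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))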

📊 Analytics dashboard (/analytics)

  • Per-dog noise contribution charts (bark count, cumulative duration, peak dBFS)
  • Timeline heat-map of bark activity by hour
  • Filterable by dog identity and date range
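
These charts reduce to aggregates over the events/dogs join described in the schema section below. A hedged sketch of the kind of query involved (column names are assumptions, not the fork's exact schema):

import sqlite3

conn = sqlite3.connect("./clips/events.db")
rows = conn.execute("""
    SELECT d.name,
           COUNT(e.id)      AS bark_count,
           SUM(e.duration)  AS total_seconds,
           MAX(e.peak_dbfs) AS peak_dbfs
    FROM events e JOIN dogs d ON d.id = e.dog_id
    WHERE e.created_at >= :since
    GROUP BY d.id ORDER BY bark_count DESC
""", {"since": "2025-01-01"}).fetchall()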

📼 Clip Library (/library)

  • Browse, play, retag, and delete stored WAV clips
  • Per-clip audio player, probability, peak dBFS, duration, DOA badge
  • Bulk select + delete
  • Dog reassignment dropdown
  • Search and sort (newest / oldest / loudest / longest)
  • Upload status badge per clip (queued / uploaded / failed)

πŸ—„οΈ Relational SQLite schema

  • New dogs table joined to events
  • Auto-naming: unidentified dogs are sequentially assigned "Dog 1", "Dog 2", … on first bark
  • CRUD API: create dog, rename dog, retag event, delete event + file
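
As an illustration, a schema along these lines supports the features above; column names here are assumptions, and the authoritative definitions live in the repository:

import sqlite3

conn = sqlite3.connect("./clips/events.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS dogs (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS events (
    id            INTEGER PRIMARY KEY,
    dog_id        INTEGER REFERENCES dogs(id),
    path          TEXT,      -- saved WAV clip
    probability   REAL,      -- model P(barking)
    peak_dbfs     REAL,
    duration      REAL,      -- seconds
    doa_deg       REAL,      -- direction of arrival
    upload_status TEXT DEFAULT 'queued',
    created_at    TEXT DEFAULT CURRENT_TIMESTAMP
);
""")

def next_auto_name(conn):
    """Auto-naming: the first unidentified dog becomes 'Dog 1', then 'Dog 2', ..."""
    n = conn.execute("SELECT COUNT(*) FROM dogs WHERE name LIKE 'Dog %'").fetchone()[0]
    return f"Dog {n + 1}"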

📺 Stream page (/stream)

  • Go Live / Stop controls for YouTube RTMP (ffmpeg + libx264)
  • Auto-stream modes: manual / on-bark / always / scheduled (time window)
  • Encoding quality settings: resolution, fps, bitrate, OSD reset delay
  • Live stream status (duration, error display, key-set indicator)
  • Quick-links to YouTube Studio and Live Dashboard
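
Behind the Go Live button the server spawns an ffmpeg process. A rough sketch of a comparable invocation; the exact flags the fork passes come from the quality settings above and may differ:

import subprocess

def start_stream(stream_key, video_dev="/dev/video0", fps=30, bitrate="2500k"):
    """Push camera + mic to YouTube's RTMP ingest with libx264/AAC."""
    cmd = [
        "ffmpeg",
        "-f", "v4l2", "-framerate", str(fps), "-i", video_dev,  # camera
        "-f", "alsa", "-i", "default",                          # microphone
        "-c:v", "libx264", "-preset", "veryfast",
        "-b:v", bitrate, "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-b:a", "128k", "-ar", "44100",
        "-f", "flv", f"rtmp://a.rtmp.youtube.com/live2/{stream_key}",
    ]
    return subprocess.Popen(cmd)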

☁️ Archive upload: async clip backup

An alternative to live streaming: automatically upload bark clips to cloud storage for long-term archiving and audit.

Backend | Details
S3-compatible | AWS S3, Backblaze B2, MinIO, Wasabi: just change the endpoint URL
SFTP | Any SSH server; password or private key auth
  • Non-blocking background queue with 3-attempt retry and exponential back-off
  • Upload status (queued / uploaded / failed) tracked per clip in the DB
  • Secrets (S3 secret, SFTP password, stream key) are never returned to the browser
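
A minimal sketch of such a queue: three attempts with exponential back-off on a daemon thread. The real worker also records upload_status in the DB; the backend push (S3/SFTP) is stubbed here.

import queue, threading, time

uploads = queue.Queue()

def upload_once(path):
    """Placeholder for the S3/SFTP push; raises on failure."""
    raise NotImplementedError

def worker():
    while True:
        path = uploads.get()
        for attempt in range(3):
            try:
                upload_once(path)
                break                     # success: mark 'uploaded' in the DB
            except Exception:
                time.sleep(2 ** attempt)  # 1 s, 2 s, 4 s back-off
        else:
            pass                          # exhausted: mark 'failed' in the DB
        uploads.task_done()

threading.Thread(target=worker, daemon=True).start()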

βš™οΈ Config page (/config)

  • Mic device selector: dropdown of all ALSA input devices, plus channels, sample rate, and DOA array spacing
  • Stream key input: masked password field with show/hide toggle, saved to the Docker volume (not in env)
  • Video device selector: lists /dev/video* devices
  • Runtime sliders: bark threshold and auto-save cooldown apply instantly without a restart
  • docker-compose.yml snippet generator: copy-paste ready config for the current settings

🐞 Debug page (/debug)

  • Real-time system stats (CPU, RAM, GPU if present)
  • Live VU meter per channel
  • DOA compass visualization
  • Active audio device health
  • Scrollable live log tail

🧭 Shared navigation (/nav.js)

  • Single JS file injected on every page
  • Sticky glassmorphism bar with active-link detection
  • Live bark status pill (polling /api/bark)

🧠 Training & Labeling Pipeline (/train)

  • V2 CNN bark detector: 1D CNN with batch normalization and a 500 ms context window (vs 60 ms in V1)
  • CMVN normalization: volume-independent classification (catches distant barks, ignores loud non-bark sounds)
  • Smart clip slicing: uses ffmpeg silencedetect to skip silent sections when splitting long clips into short labelling segments (see the sketch after this list)
  • Waveform player: interactive canvas-based player with click-to-seek, speed controls, and keyboard shortcuts
  • Per-clip labeling: Bark / Not-Bark buttons with keyboard shortcuts (B/N)
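
silencedetect reports silence boundaries on ffmpeg's stderr. A small sketch of how the slicer can parse them (the noise threshold and minimum duration here are illustrative):

import re, subprocess

def silent_spans(path, noise_db=-35, min_s=0.5):
    """Return (silence_start, silence_end) pairs in seconds for a clip."""
    out = subprocess.run(
        ["ffmpeg", "-i", path,
         "-af", f"silencedetect=noise={noise_db}dB:d={min_s}",
         "-f", "null", "-"],
        capture_output=True, text=True,
    ).stderr
    starts = [float(m) for m in re.findall(r"silence_start: ([\d.]+)", out)]
    ends = [float(m) for m in re.findall(r"silence_end: ([\d.]+)", out)]
    return list(zip(starts, ends))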

🤖 AI-Assisted Labeling (Google Gemini)

  • Integrates with Gemini 2.0 Flash to auto-classify audio clips as bark or not-bark
  • Per-clip "AI Label" button and bulk "AI Label All" for rapid labeling
  • Returns confidence scores and audio descriptions
  • API key stored securely in settings.json (masked in UI)
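
A hedged sketch of such a call with the google-generativeai client; the prompt wording and response handling are assumptions, not the fork's exact code:

import google.generativeai as genai

genai.configure(api_key="...")  # the fork keeps this in settings.json
model = genai.GenerativeModel("gemini-2.0-flash")

def ai_label(wav_path):
    """Ask Gemini whether a clip contains a bark; returns free-form text."""
    clip = genai.upload_file(wav_path)
    resp = model.generate_content([
        clip,
        "Does this audio contain a dog bark? Answer BARK or NOT_BARK, "
        "a confidence from 0 to 1, and a one-line description of the audio.",
    ])
    return resp.text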

📡 MQTT Integration (/mqtt)

  • Publish bark events as JSON to any MQTT broker
  • Configurable broker, port, username/password, TLS
  • Home Assistant auto-discovery support (binary_sensor with device_class: sound)
  • Connection test button with live status indicator
  • Last Will and Testament (LWT) for online/offline availability
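
A minimal sketch of this flow with paho-mqtt; topic names and payload fields are assumptions:

import json
import paho.mqtt.client as mqtt

AVAIL = "woofalytics/status"   # availability topic backing the LWT
STATE = "woofalytics/bark"

client = mqtt.Client()
client.will_set(AVAIL, "offline", retain=True)   # LWT: broker marks us offline
client.username_pw_set("user", "pass")
client.connect("broker.local", 1883)
client.publish(AVAIL, "online", retain=True)

# Home Assistant auto-discovery: a binary_sensor with device_class: sound.
client.publish(
    "homeassistant/binary_sensor/woofalytics/config",
    json.dumps({
        "name": "Woofalytics Bark",
        "state_topic": STATE,
        "value_template": "{{ value_json.barking }}",
        "payload_on": "ON", "payload_off": "OFF",
        "device_class": "sound",
        "availability_topic": AVAIL,
    }),
    retain=True,
)

def publish_bark(probability, dog):
    client.publish(STATE, json.dumps(
        {"barking": "ON", "probability": probability, "dog": dog}))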

🔀 Data Augmentation (/augment)

  • Generate training clip variations from labelled data
  • Gain variation: ±6 dB, ±12 dB
  • Speed/pitch shift: 0.85×, 0.9×, 1.1×, 1.15×
  • Reverb simulation: room echo via convolution
  • Augmented clips inherit the source label and integrate directly with the training pipeline
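
A compact sketch of the three augmentation families with numpy + soundfile; the synthetic impulse response stands in for a measured room, and the file paths are illustrative:

import numpy as np
import soundfile as sf

def gain(x, db):
    return np.clip(x * 10 ** (db / 20), -1.0, 1.0)

def speed(x, factor):
    # Plain resampling: shifts pitch and duration together, as listed above.
    idx = np.arange(0, len(x), factor)
    return np.interp(idx, np.arange(len(x)), x)

def reverb(x, fs=16000, decay_s=0.3):
    t = np.arange(int(decay_s * fs)) / fs
    ir = np.random.randn(len(t)) * np.exp(-6 * t / decay_s)  # decaying tail
    ir[0] = 1.0                                              # direct path
    y = np.convolve(x, ir)[: len(x)]
    return y / (np.max(np.abs(y)) + 1e-9)

x, fs = sf.read("clips/bark_0001.wav")
for db in (-12, -6, 6, 12):
    sf.write(f"clips/aug_gain{db:+d}.wav", gain(x, db), fs)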

💾 Persistent settings

settings.py writes a ./clips/settings.json file that survives container restarts and takes precedence over docker-compose.yml env vars for runtime-configurable fields.
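
A sketch of that precedence rule: settings.json (runtime edits) wins over the environment, which wins over built-in defaults. Key names here are illustrative:

import json, os
from pathlib import Path

DEFAULTS = {"MIC_CHANNELS": 4, "AUTO_SAVE_COOLDOWN": 30}
SETTINGS = Path("./clips/settings.json")

def get(key):
    saved = json.loads(SETTINGS.read_text()) if SETTINGS.exists() else {}
    if key in saved:                 # 1. runtime-saved settings
        return saved[key]
    if key in os.environ:            # 2. docker-compose environment
        return type(DEFAULTS[key])(os.environ[key])
    return DEFAULTS[key]             # 3. built-in default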


Hardware

Component | Original | This fork
SBC | Raspberry Pi 4 | Any x86-64 Linux PC or ARM64 SBC (tested on Ubuntu)
Microphone | Andrea PureAudio 2-ch USB array | Sony PS Eye (4-mic ULA, 16 kHz)
GPU | (none) | Optional NVIDIA GPU for faster inference

Running on an Odroid N2+? The stack runs natively on Ubuntu 22.04 for ARM64. Install PyTorch with --index-url https://download.pytorch.org/whl/cpu. No CUDA needed: the model is small enough for CPU-only inference. For streaming, use h264_v4l2m2m hardware encoding instead of libx264 to keep CPU load low.


Quick start (Docker)

git clone https://github.com/sineline/woofalytics-streamer.git
cd woofalytics-streamer

# (optional) set your YouTube stream key
# edit docker-compose.yml and set YOUTUBE_STREAM_KEY=xxxx

docker compose up --build

Open http://localhost:8000

PS Eye users: The privileged: true flag in docker-compose.yml is required for ALSA to see the 4-channel card inside the container. Also update the USB device path if your bus/device numbers differ (lsusb to check):

devices:
  - /dev/bus/usb/001/008:/dev/bus/usb/001/008  # adjust to your lsusb output

Native install (no Docker)

# System dependencies (Debian/Ubuntu)
sudo apt install python3-pip python3-venv portaudio19-dev ffmpeg \
                 v4l-utils libsndfile1 alsa-utils

# Create venv
python3 -m venv .venv && source .venv/bin/activate

# PyTorch (CPU-only build β€” works on ARM64 and x86)
pip install torch --index-url https://download.pytorch.org/whl/cpu

# Everything else
pip install -r requirements.txt

python main.py

Environment variables

Variable | Default | Description
MIC_DEVICE_HINT | USB Camera | Substring matched against PyAudio device names
MIC_CHANNELS | 4 | 2 for the Andrea array, 4 for the PS Eye
MIC_SAMPLE_RATE | 16000 | Sample rate in Hz; the PS Eye only supports 16000
MIC_ARRAY_SPACING | 0.1 | Inter-mic spacing as a fraction of wavelength (λ)
AUTO_SAVE_COOLDOWN | 30 | Seconds between auto-saved clips
YOUTUBE_STREAM_KEY | (empty) | Streaming is disabled if unset
VIDEO_DEVICE | /dev/video0 | V4L2 device for ffmpeg video capture
BARK_QUIET_SECONDS | 10 | OSD reset delay after the last bark
EVENTS_DB | ./clips/events.db | SQLite database path

All of these can also be set at runtime via the Config page and are persisted to ./clips/settings.json.


Web pages

Path | Description
/ | Live dashboard: bark probability, VU meters, DOA compass
/analytics | Per-dog noise analytics and timeline
/library | Browse, play, tag, and delete recorded clips
/stream | YouTube streaming controls + archive upload config
/train | Label clips, train the model, AI-assisted labeling, smart slicing
/augment | Generate training data variations (gain, speed, reverb)
/mqtt | MQTT broker configuration and event publishing
/debug | System telemetry, live log, audio device health
/config | Device selectors, stream key, runtime tuning
/rec | Manual clip recording trigger

API

Method | Endpoint | Description
GET | /api/bark | Current bark probability and timestamp
GET | /api/analytics | Aggregated per-dog stats
GET | /api/events?limit=N&dog_id=X | Recent bark events
GET | /api/dogs | All dogs in the DB
POST | /api/dogs | Create a new dog { "name": "Rex" }
PATCH | /api/dogs/<id> | Rename a dog { "name": "Buddy" }
DELETE | /api/events/<id> | Delete an event + its WAV file
PATCH | /api/events/<id> | Retag an event { "dog_id": "Dog 2" }
GET | /api/stream | Stream status
POST | /api/stream | Start/stop the stream { "action": "start" }
GET | /api/upload | Upload queue status
GET/POST | /api/settings | Read/write persistent settings
GET/POST | /api/config | Read/write runtime config (threshold, cooldown)
GET | /api/devices | List PyAudio input devices
GET | /api/devices/video | List /dev/video* devices
GET | /api/debug | Live system telemetry
GET | /api/train | Training job status
GET | /api/train/clips | All clips available for labeling
POST | /api/train | Start a training run
POST | /api/clips/<id>/slice | Slice a clip into segments { "smart": true }
POST | /api/ai/label | AI-classify a single clip { "event_id": 123 }
POST | /api/ai/label-batch | AI-classify multiple clips
POST | /api/augment | Generate augmented clips
GET | /api/mqtt/status | MQTT connection status
POST | /api/mqtt/test | Test the MQTT broker connection
POST | /api/mqtt/configure | Save & apply MQTT settings
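
A quick example session against a running instance; the response shapes assumed here are inferred from the table above:

import requests

BASE = "http://localhost:8000"

# Create a dog, then retag the most recent event to it.
requests.post(f"{BASE}/api/dogs", json={"name": "Rex"})
event = requests.get(f"{BASE}/api/events", params={"limit": 1}).json()[0]
requests.patch(f"{BASE}/api/events/{event['id']}", json={"dog_id": "Rex"})

# Poll the live bark probability.
print(requests.get(f"{BASE}/api/bark").json())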

Model

Two model versions are supported:

V1 (Legacy)

Input: 80-dim log-Mel filterbank (60 ms window)
→ FC(480 → 64, ReLU)
→ FC(64 → 32, ReLU)
→ FC(32 → 1, Sigmoid)
Output: P(barking)

V2 (CNN β€” current)

Input: [50, 80] log-Mel filterbank (500 ms window, CMVN normalized)
→ Conv1d(80→64, k=5) + BatchNorm + ReLU + MaxPool
→ Conv1d(64→128, k=3) + BatchNorm + ReLU + MaxPool
→ Conv1d(128→128, k=3) + BatchNorm + ReLU + AdaptiveAvgPool
→ FC(128→64, ReLU + Dropout)
→ FC(64→1, Sigmoid)
Output: P(barking)

Key difference: V2 uses CMVN (per-window mean/variance normalization) so it classifies by spectral shape rather than absolute volume. This catches distant barks and ignores nearby loud non-bark sounds.
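
Transcribed into PyTorch, the V2 stack looks roughly like this; padding, pool sizes, and the dropout rate are assumptions not stated above:

import torch
import torch.nn as nn

def cmvn(feats):
    """Per-window mean/variance normalization along the time axis."""
    return (feats - feats.mean(dim=-1, keepdim=True)) / (
        feats.std(dim=-1, keepdim=True) + 1e-5)

class BarkCNNV2(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(80, 64, 5, padding=2), nn.BatchNorm1d(64),
            nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(64, 128, 3, padding=1), nn.BatchNorm1d(128),
            nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(128, 128, 3, padding=1), nn.BatchNorm1d(128),
            nn.ReLU(), nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (batch, 50 frames, 80 mel bins); Conv1d expects (batch, 80, 50).
        return self.head(self.conv(cmvn(x.transpose(1, 2))))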

The model version is auto-detected at startup. Train via the /train page.


IFTTT integration

Unchanged from the original. Set in record.py:

IFTTT_EVENT_NAME = "woof"
IFTTT_KEY = "YOUR_IFTTT_WEBHOOKS_KEY"

Credits

  • Original project: woofalytics by @mdoulaty (bark detection model, DOA estimation, core recording loop, web server, IFTTT integration).
  • This fork: All additions (Docker, multi-page UI, relational DB, archive upload, stream controls, config/debug pages, PS Eye support) were designed and implemented by Antigravity, an agentic AI coding assistant built by the Google DeepMind team, working interactively with the repository owner @sineline.

No original logic was removed or broken. All new features are additive.
