Unified personal finance ledger with a hybrid deterministic + LLM pipeline. Aggregates heterogeneous bank exports (current accounts, credit/debit/prepaid cards, savings accounts) into a single chronological ledger, eliminating double-counting from periodic card settlements and internal transfers. Offline-first; remote LLM backends opt-in with mandatory PII sanitization.
👋 End users — looking to install Spendif.ai? Go to the Getting Started page for the illustrated install + first-launch guide. This README is for developers and contributors.
- Python 3.13 · Streamlit · SQLAlchemy · Pydantic · Pandas
- Hybrid pipeline: deterministic normalizer + LLM classifier + categorizer cascade
- Multi-backend LLM with circuit breaker: llama.cpp (default for desktop), Ollama, OpenAI, Claude — direct SDK use, no LangChain
- Native desktop launcher: pywebview + PyInstaller → DMG / MSIX / .deb / .rpm
- 14-page Streamlit UI, full IT+EN i18n (760+ translation keys)
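The multi-backend fallback with a circuit breaker listed above can be illustrated with a minimal sketch. This is not the code in `core/llm_backends.py`: the names `CircuitBreaker` and `call_with_fallback`, the failure threshold, and the cooldown are all assumptions chosen for illustration.

```python
import time
from typing import Callable, Optional, Sequence, Tuple


class CircuitBreaker:
    """Quarantines a backend after repeated failures (hypothetical helper)."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means "not quarantined"

    def available(self) -> bool:
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self.cooldown_s:
            # Cooldown elapsed: lift the quarantine and allow a new attempt.
            self._opened_at = None
            self._failures = 0
            return True
        return False

    def record_success(self) -> None:
        self._failures = 0

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.max_failures:
            self._opened_at = self._clock()


Backend = Tuple[str, Callable[[str], str], CircuitBreaker]


def call_with_fallback(backends: Sequence[Backend], prompt: str) -> Tuple[str, str]:
    """Try backends in order, skipping quarantined ones; return (name, reply)."""
    for name, complete, breaker in backends:
        if not breaker.available():
            continue
        try:
            reply = complete(prompt)
        except Exception:
            breaker.record_failure()
            continue
        breaker.record_success()
        return name, reply
    raise RuntimeError("all LLM backends are unavailable")
```

Injecting the clock keeps the quarantine logic testable without sleeping; a real implementation would likely also distinguish retryable errors from permanent ones.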
- Hybrid pipeline (deterministic + LLM): `core/normalizer.py` parses any tabular bank export; `core/classifier.py` infers `DocumentSchema` via LLM; `core/categorizer.py` runs a 4-step cascade (user rules → regex → LLM → fallback)
- Multi-backend LLM with circuit breaker: factory in `core/llm_backends.py` covering llama.cpp, Ollama, OpenAI, Claude. Automatic fallback + quarantine on failure
- PII sanitization (RF-10): IBAN / PAN / fiscal code / owner-name redaction in `core/sanitizer.py`, mandatory before any remote call (`assert_sanitized()` is a precondition, not best-effort)
- Multi-language taxonomy: 2-level in DB, 5 languages (it/en/fr/de/es), configurable from the Streamlit UI
- Card-account reconciliation (RF-03, beta): 3-phase algorithm in `core/normalizer.py` that pairs credit-card settlements with the underlying expenses to eliminate double-counting (edge cases still being refined)
- Internal transfer detection (RF-04, beta): symbolic-amount + ±7-day window matching, with owner-name permutations to catch "Cognome Nome" exports (edge cases still being refined)
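The 4-step categorizer cascade (user rules → regex → LLM → fallback) can be sketched as follows. This is an illustrative outline, not the contents of `core/categorizer.py`: the function name, the rule shapes (substring map and compiled-pattern list), and the `"Uncategorized"` fallback label are assumptions.

```python
import re
from typing import Callable, Dict, List, Optional, Pattern, Tuple

FALLBACK_CATEGORY = "Uncategorized"  # hypothetical fallback label


def categorize(
    description: str,
    user_rules: Dict[str, str],
    regex_rules: List[Tuple[Pattern[str], str]],
    llm: Optional[Callable[[str], Optional[str]]] = None,
) -> Tuple[str, str]:
    """Return (category, source), stopping at the first step that matches."""
    text = description.lower()

    # Step 1: user rules, modelled here as case-insensitive substrings.
    for needle, category in user_rules.items():
        if needle.lower() in text:
            return category, "user_rule"

    # Step 2: built-in regular expressions.
    for pattern, category in regex_rules:
        if pattern.search(description):
            return category, "regex"

    # Step 3: ask the LLM, treating any error or empty answer as "no opinion".
    if llm is not None:
        try:
            guess = llm(description)
        except Exception:
            guess = None
        if guess:
            return guess, "llm"

    # Step 4: nothing matched.
    return FALLBACK_CATEGORY, "fallback"
```

Returning the source alongside the category makes it easy to audit how often the deterministic steps resolve a transaction before the LLM is consulted.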
Spendif.ai is in alpha. The features we add next are decided together with the people using the app today: no speculative work, no roadmap dictated from above.
Features under evaluation based on alpha-tester feedback:
- Cash tracking — log cash purchases manually, with no bank statement attached
- Investment performance — see at a glance how the instruments in your portfolio are doing
- Companion mobile app — capture cash on the fly from your phone and sync with the desktop
Would one of these help you? Tell us in GitHub Discussions. When enough requests stack up on the same feature, it moves to the top of the queue.
git clone https://github.com/drake69/spendif-ai.git
cd spendif-ai
uv sync --extra desktop
# Local LLM (developer choice — the desktop installer handles this automatically):
# → if you already have Ollama running: ollama pull gemma3:12b
# → otherwise: `uv sync` installs llama-cpp-python and the launcher
# auto-downloads a GGUF model on first run
./start.sh  # or: streamlit run app.py

Prerequisites: Python 3.13+ and uv. Ollama is optional, since llama.cpp is bundled. Full setup → CONTRIBUTING.en.md.
uv run python -m desktop.launcher

Opens a pywebview window, downloads an AI model on first run, and starts Streamlit inside the same window. Identical to the bundled DMG/MSIX experience.
uv run pytest -v # full suite (no LLM mocks)
uv run pytest --cov=. --cov-report=term-missing # with coverage (target ≥ 90%)
uv run pytest -k "architecture" # layer separation gate
uv run pytest -k "security" # forbidden patterns + SQL injection

Architectural and security tests are mandatory CI gates and must stay green on main.
ui/ → services/ → core/ → db/ → SQLite
         ↑           ↑
   async_runner   llm_backends · sanitizer · normalizer · classifier · categorizer
UI may import only services/; core/ may not import db/; db/ may not import upward. The coupling gate (tools/coupling_check.py --strict) blocks PRs that violate this.
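As a rough illustration of how such a layering rule can be checked, the sketch below parses a module with the standard `ast` module and reports forbidden top-level imports. It is not the logic of `tools/coupling_check.py`: the `FORBIDDEN` table only encodes the three rules stated above, and the function names are hypothetical.

```python
import ast
from typing import Dict, List, Set

# Project layers each layer must not import, per the rules stated above.
FORBIDDEN: Dict[str, Set[str]] = {
    "ui": {"core", "db"},            # UI may import only services/
    "core": {"db"},                  # core/ may not import db/
    "db": {"core", "services", "ui"},  # db/ may not import upward
}


def imported_top_level(source: str) -> Set[str]:
    """Collect the top-level package of every absolute import in `source`."""
    names: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names


def violations(layer: str, source: str) -> List[str]:
    """Return the forbidden layers imported by a module living in `layer`."""
    return sorted(imported_top_level(source) & FORBIDDEN.get(layer, set()))
```

A static check like this never executes the modules, so it can run in CI without installing heavy runtime dependencies.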
Full diagram and Flow 1 vs Flow 2 → docs/architecture.md.
sw_artifacts/
├── app.py # Streamlit entry point (onboarding gate + 14 pages)
├── core/ # Pipeline: orchestrator, normalizer, classifier, categorizer, sanitizer, llm_backends
├── services/ # Facade layer for UI; async runner; settings; import
├── ui/ # Streamlit pages + i18n + widgets
├── db/ # SQLAlchemy ORM, repository pattern, schema with auto-hash migrations
├── api/ # FastAPI REST endpoints (optional)
├── desktop/ # Native launcher (pywebview) + splash
├── packaging/ # Build scripts: macos/, windows/, linux/, homebrew/, winget/
├── docker/ # Containerisation
├── prompts/ # LLM prompt templates (versioned JSON)
├── reports/ # HTML + CSV + XLSX export
├── tests/ # pytest suite (≥ 90% coverage target)
├── benchmark/ # LLM benchmark suite (multi-provider)
└── docs/ # User & developer documentation
More detail → docs/developer_guide.en.md.
| Topic | Languages |
|---|---|
| Install & first launch | EN · IT |
| User guide (every page) | EN · IT |
| Reference guide (pipeline, taxonomy, RF-03/04) | EN · IT |
| Architecture | EN · IT |
| Design decisions | EN · IT |
| Configuration | EN · IT |
| Developer guide | EN · IT |
| Categorisation guide | EN · IT |
| Database schema | EN · IT |
| Deployment | EN · IT |
| Release process | EN · IT |
| Desktop build & test loop | EN · IT |
| Contributing | EN · IT |
| Security policy | EN · IT |
| Changelog | EN · IT |
Bug reports, feature ideas and PRs welcome. See CONTRIBUTING.en.md for the workflow, branch policy, priority framework, and CI gates.
PolyForm Noncommercial 1.0.0 — see LICENSE. Free for personal use; commercial use requires a separate licence.
All financial data is stored locally in ~/.spendifai/ledger.db.
Local LLM backend (default — llama.cpp, Ollama): nothing leaves the machine.
Remote LLM backend (opt-in — OpenAI, Claude): the payload contains sanitised descriptions plus transaction amounts, dates, and column metadata.
Raw transaction row from the CSV:
date: 2026-03-15
description: "BONIFICO da MARIO ROSSI IT60X0542811101000000123456 CAU 12345 STIPENDIO MENSILE"
amount: 1500.00
What the remote LLM actually receives:
{
"amount": "1500.00",
"description": "BONIFICO da Carlo Brambilla <ACCOUNT_ID> <TX_CODE> STIPENDIO MENSILE"
}

What changed:
- `MARIO ROSSI` (configured owner name) → `Carlo Brambilla` (fake from Italian pool, restored after the LLM responds)
- `IT60X...` (IBAN) → `<ACCOUNT_ID>`
- `CAU 12345` (bank transaction code) → `<TX_CODE>`
- `amount` and date metadata: sent in the clear
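The substitutions above can be reproduced with a small sketch. It is not the code in `core/sanitizer.py`: the two regular expressions are simplified assumptions (a generic IBAN shape and a `CAU <digits>` code), and `sanitize`, `restore`, and `leaks_pii` are hypothetical names.

```python
import re

# Simplified patterns, assumed for illustration only.
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")
TX_CODE_RE = re.compile(r"\bCAU\s+\d+\b")


def sanitize(description: str, owner_name: str, alias: str) -> str:
    """Swap the owner name for an alias and mask IBANs and transaction codes."""
    out = re.sub(re.escape(owner_name), lambda _: alias, description,
                 flags=re.IGNORECASE)
    out = IBAN_RE.sub("<ACCOUNT_ID>", out)
    out = TX_CODE_RE.sub("<TX_CODE>", out)
    return out


def restore(text: str, owner_name: str, alias: str) -> str:
    """Put the real owner name back into the LLM response."""
    return text.replace(alias, owner_name)


def leaks_pii(text: str, owner_name: str) -> bool:
    """Precondition-style check: True if an IBAN or the owner name survives."""
    return bool(IBAN_RE.search(text)) or owner_name.lower() in text.lower()


raw = ("BONIFICO da MARIO ROSSI IT60X0542811101000000123456 "
       "CAU 12345 STIPENDIO MENSILE")
clean = sanitize(raw, "MARIO ROSSI", "Carlo Brambilla")
print(clean)
```

Replacing the owner name with a plausible alias instead of a placeholder keeps the sentence natural for the model, and the alias is swapped back once the response arrives.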
The categorizer prompt instructs the model to "base the decision on the
description, amount, and context" (prompts/categorizer.json). Whether
the amount actually changes accuracy in practice has not been benchmarked
against an amount-stripped baseline — the conservative default keeps it
in the payload until measured.
Amount + date redaction modes for remote backends (none / buckets /
strip) are on the roadmap — backlog item AI-55.
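Since these modes are not implemented yet, the following is only a speculative sketch of what none / buckets / strip could mean for the amount field. The function name, the bucket boundaries, and the labels are all invented for illustration and are not part of AI-55.

```python
from decimal import Decimal
from typing import List, Optional, Tuple

# Hypothetical bucket upper bounds and labels.
BUCKETS: List[Tuple[Decimal, str]] = [
    (Decimal("10"), "<10"),
    (Decimal("100"), "10-100"),
    (Decimal("1000"), "100-1000"),
]


def redact_amount(amount: Decimal, mode: str) -> Optional[str]:
    """Return the amount as it would appear in the remote payload."""
    if mode == "none":
        return str(amount)   # sent in the clear
    if mode == "strip":
        return None          # omitted from the payload
    if mode == "buckets":
        magnitude = abs(amount)
        for upper, label in BUCKETS:
            if magnitude < upper:
                return label
        return ">=1000"
    raise ValueError(f"unknown redaction mode: {mode!r}")
```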