Spendif.ai v3.0

🇮🇹 Leggi in italiano

Unified personal finance ledger with a hybrid deterministic + LLM pipeline. Aggregates heterogeneous bank exports (current accounts, credit/debit/prepaid cards, savings accounts) into a single chronological ledger, eliminating double-counting from periodic card settlements and internal transfers. Offline-first; remote LLM backends opt-in with mandatory PII sanitization.

👋 End users — looking to install Spendif.ai? Go to the Getting Started page for the illustrated install + first-launch guide. This README is for developers and contributors.

What it is (technical)

Python 3.13 · Streamlit · SQLAlchemy · Pydantic · Pandas
Hybrid pipeline: deterministic normalizer + LLM classifier + categorizer cascade
Multi-backend LLM with circuit breaker: llama.cpp (default for desktop), Ollama, OpenAI, Claude — direct SDK use, no LangChain
Native desktop launcher: pywebview + PyInstaller → DMG / MSIX / .deb / .rpm
14-page Streamlit UI, full IT+EN i18n (760+ translation keys)
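
The circuit-breaker behaviour named in the list above can be pictured with a small sketch. This is an illustration of the general pattern, not the code in core/llm_backends.py: the class name `BackendChain`, the failure threshold, and the quarantine duration are all assumptions made for the example.

```python
import time


class AllBackendsFailed(RuntimeError):
    """Raised when every backend is quarantined or failed on this call."""


class BackendChain:
    """Hypothetical fallback chain with a simple per-backend circuit breaker."""

    def __init__(self, backends, max_failures=2, quarantine_s=60.0, clock=time.monotonic):
        self.backends = list(backends)      # ordered (name, callable) pairs
        self.max_failures = max_failures    # assumed threshold
        self.quarantine_s = quarantine_s    # assumed quarantine length, in seconds
        self.clock = clock                  # injectable so tests can control time
        self.failures = {name: 0 for name, _ in self.backends}
        self.quarantined_until = {}         # name -> time at which it may be retried

    def complete(self, prompt):
        """Return (backend_name, response) from the first healthy backend."""
        now = self.clock()
        for name, call in self.backends:
            if self.quarantined_until.get(name, float("-inf")) > now:
                continue                    # breaker open: skip this backend
            try:
                response = call(prompt)
            except Exception:
                self.failures[name] += 1
                if self.failures[name] >= self.max_failures:
                    self.quarantined_until[name] = now + self.quarantine_s
                continue                    # fall through to the next backend
            self.failures[name] = 0         # a success resets the failure count
            return name, response
        raise AllBackendsFailed("no backend could serve the request")
```

In this sketch a backend that fails twice is skipped until its quarantine expires, so a dead local server does not add latency to every request.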

What's implemented

Hybrid pipeline (deterministic + LLM) — core/normalizer.py parses any tabular bank export; core/classifier.py infers DocumentSchema via LLM; core/categorizer.py runs a 4-step cascade (user rules → regex → LLM → fallback)
Multi-backend LLM with circuit breaker — core/llm_backends.py factory: llama.cpp, Ollama, OpenAI, Claude. Automatic fallback + quarantine on failure
PII sanitization (RF-10) — IBAN / PAN / fiscal code / owner-name redaction in core/sanitizer.py, mandatory before any remote call (assert_sanitized() is a precondition, not best-effort)
Multi-language taxonomy — 2-level in DB, 5 languages (it/en/fr/de/es), configurable from the Streamlit UI
Card-account reconciliation (RF-03, beta) — 3-phase algorithm in core/normalizer.py: pairs credit-card settlements with the underlying expenses to eliminate double-counting (edge cases still being refined)
Internal transfer detection (RF-04, beta) — symbolic-amount + ±7-day window matching, with owner-name permutations to catch "Cognome Nome" exports (edge cases still being refined)
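
The 4-step categorizer cascade listed above (user rules → regex → LLM → fallback) can be read as a chain of lookups in which the first step that yields a category wins. The sketch below is illustrative only: the function name, the rule formats, and the fallback label are assumptions, not the API of core/categorizer.py.

```python
import re

FALLBACK_CATEGORY = "Uncategorised"  # assumed label, for illustration only


def categorize(description, user_rules, regex_rules, llm=None):
    """Return (category, source) using the first cascade step that matches.

    user_rules:  dict mapping a substring to a category (hypothetical format)
    regex_rules: list of (pattern, category) pairs (hypothetical format)
    llm:         optional callable taking a description and returning a category or None
    """
    text = description.lower()

    # Step 1: explicit user rules take priority over everything else.
    for needle, category in user_rules.items():
        if needle.lower() in text:
            return category, "user_rule"

    # Step 2: built-in regular expressions.
    for pattern, category in regex_rules:
        if re.search(pattern, description, flags=re.IGNORECASE):
            return category, "regex"

    # Step 3: ask the LLM; any error or empty answer drops through to the fallback.
    if llm is not None:
        try:
            guess = llm(description)
        except Exception:
            guess = None
        if guess:
            return guess, "llm"

    # Step 4: nothing matched.
    return FALLBACK_CATEGORY, "fallback"
```

Returning the source alongside the category makes it easy to see which step decided, which helps when reviewing misclassified rows.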

What's coming next

Spendif.ai is in alpha. The features we add next are decided together with the people using the app today: no speculative work, no roadmap dictated from above.

Features under evaluation based on alpha-tester feedback:

Cash tracking — log cash purchases manually, with no bank statement attached
Investment performance — see at a glance how the instruments in your portfolio are doing
Companion mobile app — capture cash on the fly from your phone and sync with the desktop

Would one of these help you? Tell us in GitHub Discussions. When enough requests stack up on the same feature, it moves to the top of the queue.

👩‍💻 Develop locally

git clone https://github.com/drake69/spendif-ai.git
cd spendif-ai
uv sync --extra desktop

# Local LLM (developer choice — the desktop installer handles this automatically):
#   → if you already have Ollama running:   ollama pull gemma3:12b
#   → otherwise: `uv sync` installs llama-cpp-python and the launcher
#     auto-downloads a GGUF model on first run

./start.sh                    # or: streamlit run app.py

Prerequisites: Python 3.13+ and uv. Ollama is optional: without it, the bundled llama.cpp backend is used. Full setup → CONTRIBUTING.en.md.

Run as a native desktop app from source

uv run python -m desktop.launcher

Opens a pywebview window, downloads an AI model on first run, and starts Streamlit inside the same window. Identical to the bundled DMG/MSIX experience.

Run tests

uv run pytest -v                                  # full suite (no LLM mocks)
uv run pytest --cov=. --cov-report=term-missing   # with coverage (target ≥ 90%)
uv run pytest -k "architecture"                   # layer separation gate
uv run pytest -k "security"                       # forbidden patterns + SQL injection

Architectural and security tests are mandatory CI gates and must stay green on main.

Architecture

ui/  →  services/  →  core/  →  db/  →  SQLite
                ↑       ↑
       async_runner  llm_backends · sanitizer · normalizer · classifier · categorizer

UI may import only services/; core/ may not import db/; db/ may not import upward. The coupling gate (tools/coupling_check.py --strict) blocks PRs that violate this.
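
A layering rule like this can be checked statically by reading import statements. The following is a minimal sketch of that idea using Python's `ast` module. It is not tools/coupling_check.py: the `FORBIDDEN` table and the function name are assumptions, and the table encodes only the three rules stated above.

```python
import ast

# Layer -> top-level packages it must not import (derived from the stated rules only).
FORBIDDEN = {
    "ui": {"core", "db"},                # UI may import only services/
    "core": {"db"},                      # core/ may not import db/
    "db": {"ui", "services", "core"},    # db/ may not import upward
}


def forbidden_imports(layer, source):
    """Return the sorted forbidden top-level packages imported by `source`."""
    banned = FORBIDDEN.get(layer, set())
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            roots = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 means an absolute import; relative imports stay inside the layer.
            roots = [node.module.split(".")[0]]
        else:
            continue
        found.update(root for root in roots if root in banned)
    return sorted(found)
```

A CI gate could run such a function over every file in a layer and fail the build when the returned list is non-empty.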

Full diagram and Flow 1 vs Flow 2 → docs/architecture.md.

Repository layout

sw_artifacts/
├── app.py                  # Streamlit entry point (onboarding gate + 14 pages)
├── core/                   # Pipeline: orchestrator, normalizer, classifier, categorizer, sanitizer, llm_backends
├── services/               # Facade layer for UI; async runner; settings; import
├── ui/                     # Streamlit pages + i18n + widgets
├── db/                     # SQLAlchemy ORM, repository pattern, schema with auto-hash migrations
├── api/                    # FastAPI REST endpoints (optional)
├── desktop/                # Native launcher (pywebview) + splash
├── packaging/              # Build scripts: macos/, windows/, linux/, homebrew/, winget/
├── docker/                 # Containerisation
├── prompts/                # LLM prompt templates (versioned JSON)
├── reports/                # HTML + CSV + XLSX export
├── tests/                  # pytest suite (≥ 90% coverage target)
├── benchmark/              # LLM benchmark suite (multi-provider)
└── docs/                   # User & developer documentation

More detail → docs/developer_guide.en.md.

📚 Documentation

Topic	Languages
Install & first launch	EN · IT
User guide (every page)	EN · IT
Reference guide (pipeline, taxonomy, RF-03/04)	EN · IT
Architecture	EN · IT
Design decisions	EN · IT
Configuration	EN · IT
Developer guide	EN · IT
Categorisation guide	EN · IT
Database schema	EN · IT
Deployment	EN · IT
Release process	EN · IT
Desktop build & test loop	EN · IT
Contributing	EN · IT
Security policy	EN · IT
Changelog	EN · IT

Contributing

Bug reports, feature ideas and PRs welcome. See CONTRIBUTING.en.md for the workflow, branch policy, priority framework, and CI gates.

License

PolyForm Noncommercial 1.0.0 — see LICENSE. Free for personal use; commercial use requires a separate licence.

What leaves the machine — be precise

All financial data is stored locally in ~/.spendifai/ledger.db.

Local LLM backend (default — llama.cpp, Ollama): nothing leaves the machine.

Remote LLM backend (opt-in — OpenAI, Claude): the payload contains sanitised descriptions plus transaction amounts, dates, and column metadata.

Redaction example — categorizer (`core/categorizer.py:303`)

Raw transaction row from the CSV:

date:        2026-03-15
description: "BONIFICO da MARIO ROSSI IT60X0542811101000000123456 CAU 12345 STIPENDIO MENSILE"
amount:      1500.00

What the remote LLM actually receives:

{
  "amount": "1500.00",
  "description": "BONIFICO da Carlo Brambilla <ACCOUNT_ID> <TX_CODE> STIPENDIO MENSILE"
}

What changed:

MARIO ROSSI (configured owner name) → Carlo Brambilla (fake from Italian pool, restored after the LLM responds)
IT60X... (IBAN) → <ACCOUNT_ID>
CAU 12345 (bank transaction code) → <TX_CODE>
amount and date metadata: sent in the clear
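
The substitutions listed above can be approximated with a few regular expressions. This is a simplified sketch, not the code in core/sanitizer.py: the patterns, the function names, and the single hard-coded alias are assumptions, and real IBAN or PAN detection needs stricter validation than shown here.

```python
import re

# Loose IBAN shape: country code, two check digits, then 11 to 30 alphanumerics (assumed pattern).
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")
# Bank transaction code as it appears in the example row (assumed pattern).
TX_CODE_RE = re.compile(r"\bCAU\s+\d+\b")


def sanitize(description, owner_name, alias="Carlo Brambilla"):
    """Return (sanitized_text, restore_map) for one transaction description."""
    text = IBAN_RE.sub("<ACCOUNT_ID>", description)
    text = TX_CODE_RE.sub("<TX_CODE>", text)
    # Replace the configured owner name, whatever its letter case, with a fake name.
    text = re.sub(re.escape(owner_name), lambda _m: alias, text, flags=re.IGNORECASE)
    return text, {alias: owner_name}


def restore(text, restore_map):
    """Put the real owner name back into text returned by the LLM."""
    for alias, real in restore_map.items():
        text = text.replace(alias, real)
    return text


raw = "BONIFICO da MARIO ROSSI IT60X0542811101000000123456 CAU 12345 STIPENDIO MENSILE"
clean, mapping = sanitize(raw, "Mario Rossi")
print(clean)
```

The sketch does not handle reordered names such as "ROSSI MARIO"; a fuller version would try each permutation of the configured owner name.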

The categorizer prompt instructs the model to "base the decision on the description, amount, and context" (prompts/categorizer.json). Whether the amount actually changes accuracy in practice has not been benchmarked against an amount-stripped baseline — the conservative default keeps it in the payload until measured.

Roadmap

Amount + date redaction modes for remote backends (none / buckets / strip) are on the roadmap — backlog item AI-55.
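
To make the three proposed modes concrete, here is one possible reading of them for amounts. It is a hypothetical sketch of an unimplemented roadmap item: the function name, the bucket edges, and the label format are assumptions, not a description of AI-55.

```python
from decimal import Decimal

# Assumed bucket edges; a real implementation would choose these deliberately.
BUCKET_EDGES = [Decimal("10"), Decimal("100"), Decimal("1000")]


def redact_amount(amount, mode="none"):
    """Return the amount representation to send to a remote backend.

    none    -> exact amount as a string
    buckets -> sign plus a half-open magnitude range
    strip   -> None (the field is omitted from the payload)
    """
    if mode == "none":
        return str(amount)
    if mode == "strip":
        return None
    if mode == "buckets":
        sign = "-" if amount < 0 else "+"
        magnitude = abs(amount)
        lower = Decimal("0")
        for edge in BUCKET_EDGES:
            if magnitude < edge:
                return f"{sign}[{lower},{edge})"
            lower = edge
        return f"{sign}[{lower},inf)"
    raise ValueError(f"unknown redaction mode: {mode!r}")
```

Bucketing keeps the order of magnitude and the direction of the movement, which may be enough context for categorization while hiding the exact figure.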
