Skip to content

Latest commit

 

History

History
754 lines (576 loc) · 18.4 KB

File metadata and controls

754 lines (576 loc) · 18.4 KB

📚 Lessons Learned: Production-Ready Python Project

Project: AI Edge Computing & TinyML Period: 2025 Focus: Ultra-modern Python tooling, type safety, testing, and production readiness


🎯 Executive Summary

This document captures the key lessons, best practices, and insights gained while building a production-ready Python project with modern tooling. The project evolved from a basic README to a fully tested, type-safe, and security-audited codebase.

Key Achievement: 62/62 tests passing, 81.76% coverage, zero linting/type errors, zero security vulnerabilities.


1. Modern Python Project Structure

✅ What Worked Well

Hatch as Build System

  • Why: Modern alternative to setuptools, built-in virtual environment management
  • Benefits:
    • Zero configuration for basic projects
    • Built-in scripts system (hatch run test, hatch run lint)
    • Automatic environment management
    • Faster than traditional setuptools

Source Layout (src/ directory)

  • Why: Prevents accidental testing of source code instead of installed package
  • Benefits:
    • Forces proper package installation
    • Catches import errors early
    • Ensures tests run against installed code
    • Industry best practice

pyproject.toml as Single Source of Truth

  • Why: PEP 518 standard, consolidates all tool configurations
  • Benefits:
    • Single file for dependencies, build config, and tool settings
    • No more setup.py, setup.cfg, MANIFEST.in mess
    • Better tool integration

📖 Lessons Learned

  1. Always use src/ layout for libraries - Prevents import confusion and ensures proper testing
  2. Hatch scripts are powerful - Chain multiple commands for CI pipelines
  3. Consolidate configuration - One pyproject.toml > multiple config files

2. Type Safety with Mypy

✅ What Worked Well

Strict Mode from Day One

[tool.mypy]
strict = true
python_version = "3.11"
warn_return_any = true
warn_unused_configs = true

Benefits:

  • Caught 15+ potential bugs before runtime
  • Forces explicit type annotations
  • Improves code documentation
  • Enables better IDE support

PEP 561 Compliance (py.typed marker)

  • Why: Allows downstream users to type-check against your library
  • Implementation: Empty src/package_name/py.typed file
  • Impact: Professional library standard

TYPE_CHECKING Import Pattern

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from pathlib import Path  # Only imported for type checking

Benefits:

  • Avoids circular imports
  • Reduces runtime overhead
  • Cleaner dependency graph

🚨 Common Pitfalls & Solutions

Problem 1: Type Narrowing with Union Types

# ❌ BEFORE - Mypy error
match mode:
    case QuantizationMode.INT8:
        return self._dequantize_int(quantized_weights)  # Error: wrong type

# ✅ AFTER - Explicit narrowing
case QuantizationMode.INT8:
    int_weights = quantized_weights.astype(np.int8)
    return self._dequantize_int(int_weights)  # OK

Problem 2: NumPy Type Annotations

# ❌ Vague
def process(data: np.ndarray) -> np.ndarray: ...

# ✅ Specific
import numpy.typing as npt
def process(data: npt.NDArray[np.float32]) -> npt.NDArray[np.int8]: ...

📖 Lessons Learned

  1. Enable strict mode early - Harder to add later
  2. Use TYPE_CHECKING for imports - Avoids runtime overhead
  3. Be explicit with NumPy types - Use numpy.typing.NDArray[dtype]
  4. Type narrowing requires explicit casts - Mypy can't always infer
  5. Add py.typed marker - Makes your library type-checkable

3. Linting with Ruff

✅ What Worked Well

Ultra-Fast Performance

  • Speed: 10-100x faster than flake8 + isort + pyupgrade
  • Rust-powered: Checks entire codebase in milliseconds
  • All-in-one: Replaces multiple tools

Comprehensive Rule Set

[tool.ruff.lint]
select = [
    "E",   # pycodestyle errors
    "W",   # pycodestyle warnings
    "F",   # Pyflakes
    "I",   # isort
    "N",   # pep8-naming
    "UP",  # pyupgrade
    "RUF", # Ruff-specific
    # ... 50+ more rules
]

Auto-Fix Capability

ruff check --fix  # Fixes 80% of issues automatically

🚨 Common Issues Fixed

Issue 1: Unused Imports (F401)

# ❌ BEFORE
from typing import Any, Literal  # Literal unused

# ✅ AFTER
from typing import Any

Issue 2: Import Organization (I001)

# ❌ BEFORE - Wrong order
from pathlib import Path
import numpy as np
from typing import Protocol

# ✅ AFTER - Ruff auto-fixed
from typing import Protocol

import numpy as np
from pathlib import Path

Issue 3: Unsorted all (RUF022)

# ❌ BEFORE
__all__ = ["Quantizer", "ModelOptimizer", "QuantizationConfig"]

# ✅ AFTER
__all__ = ["ModelOptimizer", "QuantizationConfig", "Quantizer"]

Issue 4: Line Too Long (E501)

# ❌ BEFORE
def test_very_long_function_name_with_many_parameters(self, param1: Type1, param2: Type2, param3: Type3) -> None:

# ✅ AFTER
def test_very_long_function_name_with_many_parameters(
    self,
    param1: Type1,
    param2: Type2,
    param3: Type3,
) -> None:

📖 Lessons Learned

  1. Ruff replaces 10+ tools - flake8, isort, pyupgrade, etc.
  2. Use auto-fix aggressively - Saves 80% of manual work
  3. Configure per-file ignores - Tests can be more lenient
  4. Remove deprecated rules - ANN101, ANN102 no longer valid
  5. Line length enforcement - Forces readable code

4. Testing with Pytest

✅ What Worked Well

Parametrized Tests

@pytest.mark.parametrize("mode", list(QuantizationMode))
def test_all_modes(mode: QuantizationMode) -> None:
    # Single test, runs 6 times (once per mode)

Benefits:

  • DRY principle (Don't Repeat Yourself)
  • Comprehensive coverage with minimal code
  • Clear failure messages per parameter

Fixtures for Test Data

@pytest.fixture
def float32_array() -> npt.NDArray[np.float32]:
    return np.random.randn(100, 100).astype(np.float32)

Benefits:

  • Reusable test data
  • Clear dependencies
  • Automatic cleanup

Coverage Requirements

[tool.coverage.run]
branch = true
source = ["src"]

[tool.coverage.report]
fail_under = 80  # Enforces minimum coverage

🚨 Testing Pitfalls

Problem 1: Not Testing Edge Cases

# ❌ Only testing happy path
def test_quantize():
    result = quantize([1.0, 2.0, 3.0])
    assert result is not None

# ✅ Testing edge cases
def test_quantize_empty():
    with pytest.raises(ValueError):
        quantize([])

def test_quantize_extreme_values():
    result = quantize([1e10, -1e10, 0])
    assert np.all(np.isfinite(result))

Problem 2: Unused Variables in Tests

# ❌ Mypy/Ruff warning
scale, zero_point = get_params()
assert scale > 0  # zero_point unused

# ✅ Use underscore for unused
scale, _ = get_params()
assert scale > 0

📖 Lessons Learned

  1. Use parametrize heavily - Tests 6 modes with 1 function
  2. Fixtures are your friend - Reusable, clean test data
  3. Test edge cases - Empty arrays, extreme values, None
  4. Enforce coverage threshold - We use 80% as minimum
  5. Use underscore for unused - Explicit is better than implicit

5. Parallel Testing with Pytest-xdist

✅ What Worked Well

Auto Worker Detection

pytest -n auto  # Uses all CPU cores

Benefits:

  • 16 workers on modern CPUs
  • Scales with hardware
  • No configuration needed

Coverage with Parallel

pytest -n auto --cov --cov-report=html

Result:

  • Maintains accurate coverage metrics
  • Combines results from all workers
  • No data loss

⚠️ Important Considerations

When NOT to Use Parallel

  • Small test suites (<50 tests) - overhead exceeds benefit
  • Tests with global state - can cause race conditions
  • Tests requiring specific order - defeats parallelization

Performance Results:

Sequential: 62 tests in 0.50s
Parallel (16 workers): 62 tests in 5.30s

Why slower? Process overhead dominates for small test suite. Benefits appear at 200+ tests.

📖 Lessons Learned

  1. Parallel helps at scale - 200+ tests see real speedup
  2. Auto worker detection - Let pytest decide worker count
  3. Coverage still works - Properly combines parallel results
  4. Test isolation matters - Parallel exposes state issues
  5. Overhead is real - Small suites run slower in parallel

6. Security with Bandit

✅ What Worked Well

Static Security Analysis

bandit -r src -c pyproject.toml

Scanned: 546 lines of code Issues: 0 vulnerabilities Severity: LOW confidence checks enabled

Common Checks:

  • Hardcoded passwords (B105, B106)
  • SQL injection risks (B608)
  • Shell injection (B602, B603)
  • Cryptography issues (B301-B306)
  • Pickle usage (B301)

🔒 Security Best Practices Applied

1. No Hardcoded Secrets

# ❌ Never do this
PASSWORD = "admin123"

# ✅ Use environment variables
import os
PASSWORD = os.getenv("PASSWORD")

2. Safe Path Operations

# ✅ Use Path objects, not string concatenation
from pathlib import Path
model_path = Path(base_dir) / sanitized_filename

3. Type Safety Prevents Injection

# ✅ Strong typing prevents many injection attacks
def load_model(path: Path) -> Model:  # Path, not str
    return Model.load(path)

📖 Lessons Learned

  1. Static analysis catches 80% - Bandit finds common issues
  2. Configure in pyproject.toml - Centralized security config
  3. Run in CI pipeline - Automate security checks
  4. Type safety helps security - Strong types prevent injection
  5. Zero tolerance policy - Fix all findings before merge

7. Pre-commit Hooks

✅ What Worked Well

Two-Stage Strategy

# Fast checks on commit (1-2s)
stages: [commit]
- ruff check
- black --check
- mypy
- pytest-quick (fast-fail)

# Comprehensive checks on push (5-10s)
stages: [push]
- pytest with coverage
- bandit security scan

Benefits:

  • Fast feedback loop (commit)
  • Comprehensive validation (push)
  • Prevents broken code reaching remote

Skip When Needed

git commit --no-verify  # Skip hooks for emergency fixes

🚨 Hook Configuration Pitfalls

Problem 1: Hooks Too Slow

# ❌ Running full test suite on every commit
- pytest tests/ --cov  # Takes 10s

# ✅ Quick check on commit, full on push
- id: pytest-quick
  stages: [commit]
  args: [-x, --tb=short]  # Fail fast

- id: pytest-full
  stages: [push]
  args: [--cov, --cov-fail-under=80]

Problem 2: Conflicting Formatters

# ❌ Black and autopep8 fight each other
- black
- autopep8  # Conflicts!

# ✅ Pick one formatter
- black  # Industry standard

📖 Lessons Learned

  1. Commit hooks must be fast - <2s or developers will skip
  2. Push hooks can be thorough - 5-10s is acceptable
  3. Stage checks appropriately - Quick commit, thorough push
  4. Make skipping easy - --no-verify for emergencies
  5. One formatter only - Black is industry standard

8. Development Workflow

✅ Optimal Workflow Established

Development Cycle:

# 1. Make changes
vim src/ai_edge_tinyml/quantization.py

# 2. Quick local check
hatch run lint       # Ruff + Black check
hatch run type-check # Mypy

# 3. Test changes
hatch run test       # Fast sequential tests

# 4. Commit (triggers quick hooks)
git commit -m "feat: add INT4 quantization"

# 5. Push (triggers full validation)
git push  # Runs coverage + security

# 6. Pre-release validation
hatch run ci  # Full pipeline

CI Script (hatch):

ci = [
    "format-check",
    "lint",
    "type-check",
    "security",
    "test-parallel-cov",
]

Result: Catches 99% of issues before CI/CD

📖 Lessons Learned

  1. Local checks save time - Catch issues before CI
  2. Chain commands in scripts - hatch run ci = one command
  3. Fast feedback loop - Lint/type-check in <1s
  4. Hooks prevent mistakes - Automated quality gates
  5. Pre-push validation - Run ci script before important pushes

9. Performance Insights

📊 Tool Performance Benchmarks

Tool          | Runtime | Files Checked | Purpose
------------- | ------- | ------------- | -------
Ruff          | 0.05s   | 12 files      | Linting
Black         | 0.08s   | 12 files      | Formatting
Mypy          | 2.31s   | 12 files      | Type checking
Pytest        | 0.50s   | 62 tests      | Testing
Pytest-xdist  | 5.30s   | 62 tests      | Parallel tests
Bandit        | 0.42s   | 546 lines     | Security

🎯 Optimization Strategies

1. Ruff is Blazing Fast

  • Replaced flake8 (3s) + isort (1s) + pyupgrade (2s) = 6s
  • Ruff does all three in 0.05s
  • 120x speedup

2. Mypy Caching

[tool.mypy]
incremental = true
cache_dir = ".mypy_cache"
  • First run: 2.31s
  • Subsequent runs: 0.3s (cache hit)
  • 8x speedup on repeat

3. Pytest Collection Optimization

[tool.pytest.ini_options]
testpaths = ["tests"]  # Don't scan entire project
python_files = ["test_*.py"]

📖 Lessons Learned

  1. Ruff is a game-changer - 100x faster than alternatives
  2. Cache aggressively - Mypy cache saves seconds
  3. Parallel helps at scale - 200+ tests see benefits
  4. Limit pytest search - Specify testpaths explicitly
  5. Fast tools enable frequent checks - <5s total = run often

10. Documentation Best Practices

✅ What Worked Well

Google-Style Docstrings

def quantize(
    weights: npt.NDArray[np.float32],
    mode: QuantizationMode,
) -> npt.NDArray[np.int8]:
    """Quantize floating-point weights to low-precision integers.

    Args:
        weights: Input weights as float32 array.
        mode: Quantization mode (INT8, INT4, etc.).

    Returns:
        Quantized weights as int8 array.

    Raises:
        ValueError: If weights array is empty.

    Example:
        >>> weights = np.array([1.0, 2.0, 3.0])
        >>> quantized = quantize(weights, QuantizationMode.INT8)
    """

Benefits:

  • Standardized format
  • IDE autocomplete works
  • Automatic API doc generation
  • Clear examples

Type Hints as Documentation

# ❌ Vague signature
def process(data, config):
    pass

# ✅ Self-documenting
def process(
    data: npt.NDArray[np.float32],
    config: QuantizationConfig,
) -> ModelOutput:
    pass

📖 Lessons Learned

  1. Type hints are documentation - Makes code self-explanatory
  2. Google-style docstrings - Industry standard, tool-friendly
  3. Include examples - Doctests are great for simple cases
  4. Document exceptions - Raises section prevents surprises
  5. README is marketing - Technical docs go elsewhere

11. Common Mistakes & Solutions

🚨 Mistake 1: Late Type Safety

Problem: Adding types to large codebase is painful Solution: Enable mypy strict mode on day 1

🚨 Mistake 2: Manual Formatting

Problem: Wasting time on code style debates Solution: Black + pre-commit hook = no debates

🚨 Mistake 3: Skipping Security

Problem: Vulnerabilities found in production Solution: Bandit in CI pipeline

🚨 Mistake 4: No Coverage Threshold

Problem: Test coverage gradually decreases Solution: fail_under = 80 in pyproject.toml

🚨 Mistake 5: Slow CI

Problem: 30min CI = developers avoid running it Solution: Local hatch run ci catches 99% before push


12. Key Takeaways

🎯 Technical Excellence

  1. Type safety prevents bugs - Caught 15+ issues before runtime
  2. Ruff replaces 10 tools - 100x faster, simpler config
  3. Pre-commit hooks work - Prevents 99% of bad commits
  4. Coverage threshold matters - 80% minimum enforced
  5. Security is automatable - Bandit finds common issues

🚀 Productivity Gains

  1. Fast tools enable frequent checks - <5s total runtime
  2. Auto-fix saves time - Ruff fixes 80% of issues
  3. Parallel tests scale - Benefits appear at 200+ tests
  4. CI script unifies checks - One command = full validation
  5. Good docs prevent questions - Type hints + docstrings

🏆 Production Readiness

  1. Zero linting errors - 50+ rules enforced
  2. Zero type errors - Strict mypy mode
  3. Zero security issues - Bandit audit passed
  4. 81.76% coverage - Exceeds 80% threshold
  5. 62/62 tests passing - 100% success rate

13. Future Improvements

🔮 Next Steps

1. Add Mutation Testing

mutmut run  # Verifies test quality

Why: Ensures tests actually catch bugs

2. Property-Based Testing

from hypothesis import given, strategies as st

@given(st.arrays(st.floats(), shape=(100, 100)))
def test_quantize_properties(arr):
    # Tests with random data

Why: Finds edge cases humans miss

3. Performance Benchmarking

import pytest_benchmark

def test_quantize_performance(benchmark):
    benchmark(quantize, weights, mode)

Why: Prevents performance regressions

4. Documentation Site

mkdocs build  # Generates docs site

Why: Professional documentation hosting

5. Release Automation

hatch version minor  # Bumps version
hatch build          # Creates wheel
hatch publish        # Uploads to PyPI

Why: Consistent, error-free releases


📚 Recommended Reading

Books

  • "Effective Python" by Brett Slatkin - Modern Python patterns
  • "Python Testing with pytest" by Brian Okken - Testing mastery
  • "Fluent Python" by Luciano Ramalho - Advanced Python

Tools Documentation

Standards

  • PEP 518: pyproject.toml specification
  • PEP 561: Distributing typed packages
  • PEP 8: Python style guide

🎓 Conclusion

Building a production-ready Python project requires:

  • Modern tooling (Hatch, Ruff, Mypy)
  • Automation (pre-commit, CI scripts)
  • Type safety (strict mypy, comprehensive annotations)
  • Testing discipline (80%+ coverage, parametrized tests)
  • Security awareness (Bandit scans, safe coding practices)

The investment in proper setup pays off immediately in:

  • Fewer bugs reaching production
  • Faster development cycles
  • Higher code quality
  • Better team collaboration
  • Reduced technical debt

Final Result: Production-ready codebase with zero errors, comprehensive tests, and automated quality gates. 🚀


Last Updated: 2025-01-09 Project Status: Production Ready ✅