MediciMess/PANDAS_INTRO.md at main · ZipCodeCore/MediciMess

Pandas Learning Plan: Analyzing Medici Bank Transactions

Course Structure Overview

This plan uses the 20,000 historical transactions (1390-1440) as a rich, real-world dataset to teach Pandas fundamentals through progressively complex analyses.

Module 1: Introduction & Data Loading

Learning Objectives:

Make a special git branch to hold all the changes to the dataset
Understand what Pandas is and why it's essential for data engineering
Load CSV and JSON data into DataFrames
Explore basic DataFrame properties

Hands-on Activities:

Load the transaction data

import pandas as pd
df = pd.read_csv('medici_transactions.csv')

Initial exploration
- df.head(), df.tail(), df.info(), df.describe()
- Understanding DataFrame structure vs the Python dictionaries in medici-banking.py
- Compare loading CSV vs JSON formats
Key questions to answer:
- How many transactions are there?
- What columns exist?
- What data types are present?
- Are there any missing values?

Module 2: Data Inspection & Cleaning

Learning Objectives:

Handle data types (especially dates and decimals)
Identify and handle missing values
Basic data validation

Hands-on Activities:

Convert date strings to datetime objects
```
df['date'] = pd.to_datetime(df['date'])
```
Validate double-entry accounting
- Calculate total debits vs total credits
- Verify the accounting equation holds (matching validate_transactions.py)
Handle optional fields
- Investigate credit_account_2 and credit_amount_2
- Fill NaN values appropriately
Data type optimization
- Convert currency amounts to proper numeric types
- Categorical data for branches, types, counterparties

Module 3: Basic Analysis & Aggregation

Learning Objectives:

Filter and select data
Group by operations
Basic aggregation functions

Hands-on Activities:

Transaction type analysis

df.groupby('type')['debit_amount'].agg(['count', 'sum', 'mean'])

Branch performance
- Which branch processed the most transactions?
- Which branch handled the largest total value?
- Compare with the 33.1% Rome figure from documentation
Time-based filtering
- Filter transactions during the Western Schism (1402-1420)
- Find the Council of Constance ransom (May 29, 1415)
- War financing during Florentine-Milanese wars
Counterparty analysis
- Who were the most frequent customers?
- Which counterparties had the largest transactions?

Module 4: Time Series Analysis

Learning Objectives:

Work with datetime indices
Resample and aggregate over time periods
Identify trends and patterns

Hands-on Activities:

Set datetime index
```
df.set_index('date', inplace=True)
```
Temporal aggregations
- Monthly transaction volumes
- Quarterly revenue patterns
- Yearly trends from 1390-1440
Historical event analysis
- Spike in transactions during Council of Constance (1414-1418)
- War financing patterns during conflict periods
- Seasonal patterns in banking activity
Moving averages and trends
- Calculate 90-day moving average of transaction volumes
- Identify peak activity periods

Module 5: Advanced Filtering & Queries

Learning Objectives:

Complex boolean indexing
Query syntax
Multi-condition filtering

Hands-on Activities:

Find specific transaction types
- All papal deposits over 10,000 florins
- War financing to Republic of Florence
- Bills of exchange between specific branches

Complex queries

papal_war_loans = df.query('type == "war_financing" and counterparty.str.contains("Pope")')

Account analysis
- Track all transactions affecting "Cash" account
- Analyze loan repayments with interest components
- Revenue vs expense patterns

Module 6: Data Transformation & Feature Engineering

Learning Objectives:

Create calculated columns
Apply custom functions
Reshape data (pivot, melt, merge)

Hands-on Activities:

Create derived metrics
- Total transaction value (debit + credits)
- Extract interest amounts from loan repayments
- Calculate transaction fees from bills of exchange
Categorize transactions
- Create "transaction_size" categories (small/medium/large)
- Flag high-value transactions (>10,000 florins)
- Identify multi-account transactions

Branch comparison tables

pivot = df.pivot_table(
    values='debit_amount',
    index='branch',
    columns='type',
    aggfunc='sum'
)

Module 7: Data Visualization Integration

Learning Objectives:

Use Pandas plotting capabilities
Prepare data for visualization libraries
Tell stories with data

Hands-on Activities:

Built-in Pandas plots
- Transaction volume over time (line plot)
- Transaction type distribution (bar plot)
- Branch comparison (horizontal bar)
Prepare data for Matplotlib/Seaborn
- Aggregate data appropriately
- Handle date formatting for plots
- Create multi-series comparisons
Historical narrative visualization
- Show the impact of the Council of Constance ransom
- Visualize war financing spikes during conflicts
- Compare papal banking (Rome) vs commercial (Florence)

Module 8: Performance & Best Practices

Learning Objectives:

Optimize Pandas operations
Memory management with large datasets
Vectorization vs loops

Hands-on Activities:

Memory optimization

df.memory_usage(deep=True)
# Optimize with categorical dtypes

Efficient operations
- Compare .apply() vs vectorized operations
- Use .groupby() efficiently
- Avoid iterating with .iterrows()
Chunking for larger datasets
- Read CSV in chunks
- Process incrementally

Capstone Project: Comprehensive Analysis

Project: Create a complete analytical report on Medici Bank operations

Requirements:

Financial Health Analysis
- Verify double-entry accounting across all transactions
- Calculate net income by year
- Track assets, liabilities, equity over time
Business Intelligence
- Identify most profitable branches
- Analyze customer segments (papal, merchant, noble)
- Calculate average interest rates on loans
Historical Insights
- Document impact of major historical events
- Correlate war periods with financing activity
- Trace growth of international branch network
Deliverables
- Jupyter notebook with analysis
- Summary visualizations
- Written findings (2-3 pages)
- Recommendations for "modern-day Medici Bank"

Bonus: Advanced Topics (Optional)

For advanced learners:

Merge with external data
- Historical inflation data
- Convert florins to modern currency
- GDP comparison
Statistical analysis
- Correlation between transaction types
- Outlier detection
- Distribution fitting
Export results
- Create summary CSV/Excel files
- Generate HTML reports
- Prepare data for databases

Assessment Strategy

Throughout the course:

Short coding exercises after each module
Peer code reviews
Progressive complexity in queries

Final evaluation:

Capstone project (60%)
Module quizzes (20%)
Participation in code reviews (20%)

Required Tools & Setup

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # to start the venv

# Install required packages
pip install pandas jupyter matplotlib seaborn

# Launch Jupyter
jupyter notebook

Key Advantages of This Dataset

Real-world complexity: Multi-account transactions, optional fields, various transaction types
Historical context: Makes learning memorable and engaging
Large enough: 20,000 rows provide meaningful aggregations
Domain knowledge: Links to accounting principles from medici-banking.py
Validation built-in: Can verify results against double-entry rules
Multiple formats: Practice with both CSV and JSON

This plan provides a very intensive introduction to Pandas for data engineering.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pandas Learning Plan: Analyzing Medici Bank Transactions

Course Structure Overview

Module 1: Introduction & Data Loading

Module 2: Data Inspection & Cleaning

Module 3: Basic Analysis & Aggregation

Module 4: Time Series Analysis

Module 5: Advanced Filtering & Queries

Module 6: Data Transformation & Feature Engineering

Module 7: Data Visualization Integration

Module 8: Performance & Best Practices

Capstone Project: Comprehensive Analysis

Bonus: Advanced Topics (Optional)

Assessment Strategy

Required Tools & Setup

Key Advantages of This Dataset

FilesExpand file tree

PANDAS_INTRO.md

Latest commit

History

PANDAS_INTRO.md

File metadata and controls

Pandas Learning Plan: Analyzing Medici Bank Transactions

Course Structure Overview

Module 1: Introduction & Data Loading

Module 2: Data Inspection & Cleaning

Module 3: Basic Analysis & Aggregation

Module 4: Time Series Analysis

Module 5: Advanced Filtering & Queries

Module 6: Data Transformation & Feature Engineering

Module 7: Data Visualization Integration

Module 8: Performance & Best Practices

Capstone Project: Comprehensive Analysis

Bonus: Advanced Topics (Optional)

Assessment Strategy

Required Tools & Setup

Key Advantages of This Dataset