This plan uses the 20,000 historical transactions (1390-1440) as a rich, real-world dataset to teach Pandas fundamentals through progressively complex analyses.
Learning Objectives:
- Make a special git branch to hold all the changes to the dataset
- Understand what Pandas is and why it's essential for data engineering
- Load CSV and JSON data into DataFrames
- Explore basic DataFrame properties
Hands-on Activities:
-
Load the transaction data
import pandas as pd df = pd.read_csv('medici_transactions.csv')
-
Initial exploration
df.head(),df.tail(),df.info(),df.describe()- Understanding DataFrame structure vs the Python dictionaries in medici-banking.py
- Compare loading CSV vs JSON formats
-
Key questions to answer:
- How many transactions are there?
- What columns exist?
- What data types are present?
- Are there any missing values?
Learning Objectives:
- Handle data types (especially dates and decimals)
- Identify and handle missing values
- Basic data validation
Hands-on Activities:
-
Convert date strings to datetime objects
df['date'] = pd.to_datetime(df['date'])
-
Validate double-entry accounting
- Calculate total debits vs total credits
- Verify the accounting equation holds (matching validate_transactions.py)
-
Handle optional fields
- Investigate
credit_account_2andcredit_amount_2 - Fill NaN values appropriately
- Investigate
-
Data type optimization
- Convert currency amounts to proper numeric types
- Categorical data for branches, types, counterparties
Learning Objectives:
- Filter and select data
- Group by operations
- Basic aggregation functions
Hands-on Activities:
-
Transaction type analysis
df.groupby('type')['debit_amount'].agg(['count', 'sum', 'mean'])
-
Branch performance
- Which branch processed the most transactions?
- Which branch handled the largest total value?
- Compare with the 33.1% Rome figure from documentation
-
Time-based filtering
- Filter transactions during the Western Schism (1402-1420)
- Find the Council of Constance ransom (May 29, 1415)
- War financing during Florentine-Milanese wars
-
Counterparty analysis
- Who were the most frequent customers?
- Which counterparties had the largest transactions?
Learning Objectives:
- Work with datetime indices
- Resample and aggregate over time periods
- Identify trends and patterns
Hands-on Activities:
-
Set datetime index
df.set_index('date', inplace=True)
-
Temporal aggregations
- Monthly transaction volumes
- Quarterly revenue patterns
- Yearly trends from 1390-1440
-
Historical event analysis
- Spike in transactions during Council of Constance (1414-1418)
- War financing patterns during conflict periods
- Seasonal patterns in banking activity
-
Moving averages and trends
- Calculate 90-day moving average of transaction volumes
- Identify peak activity periods
Learning Objectives:
- Complex boolean indexing
- Query syntax
- Multi-condition filtering
Hands-on Activities:
-
Find specific transaction types
- All papal deposits over 10,000 florins
- War financing to Republic of Florence
- Bills of exchange between specific branches
-
Complex queries
papal_war_loans = df.query('type == "war_financing" and counterparty.str.contains("Pope")')
-
Account analysis
- Track all transactions affecting "Cash" account
- Analyze loan repayments with interest components
- Revenue vs expense patterns
Learning Objectives:
- Create calculated columns
- Apply custom functions
- Reshape data (pivot, melt, merge)
Hands-on Activities:
-
Create derived metrics
- Total transaction value (debit + credits)
- Extract interest amounts from loan repayments
- Calculate transaction fees from bills of exchange
-
Categorize transactions
- Create "transaction_size" categories (small/medium/large)
- Flag high-value transactions (>10,000 florins)
- Identify multi-account transactions
-
Branch comparison tables
pivot = df.pivot_table( values='debit_amount', index='branch', columns='type', aggfunc='sum' )
Learning Objectives:
- Use Pandas plotting capabilities
- Prepare data for visualization libraries
- Tell stories with data
Hands-on Activities:
-
Built-in Pandas plots
- Transaction volume over time (line plot)
- Transaction type distribution (bar plot)
- Branch comparison (horizontal bar)
-
Prepare data for Matplotlib/Seaborn
- Aggregate data appropriately
- Handle date formatting for plots
- Create multi-series comparisons
-
Historical narrative visualization
- Show the impact of the Council of Constance ransom
- Visualize war financing spikes during conflicts
- Compare papal banking (Rome) vs commercial (Florence)
Learning Objectives:
- Optimize Pandas operations
- Memory management with large datasets
- Vectorization vs loops
Hands-on Activities:
-
Memory optimization
df.memory_usage(deep=True) # Optimize with categorical dtypes
-
Efficient operations
- Compare
.apply()vs vectorized operations - Use
.groupby()efficiently - Avoid iterating with
.iterrows()
- Compare
-
Chunking for larger datasets
- Read CSV in chunks
- Process incrementally
Project: Create a complete analytical report on Medici Bank operations
Requirements:
-
Financial Health Analysis
- Verify double-entry accounting across all transactions
- Calculate net income by year
- Track assets, liabilities, equity over time
-
Business Intelligence
- Identify most profitable branches
- Analyze customer segments (papal, merchant, noble)
- Calculate average interest rates on loans
-
Historical Insights
- Document impact of major historical events
- Correlate war periods with financing activity
- Trace growth of international branch network
-
Deliverables
- Jupyter notebook with analysis
- Summary visualizations
- Written findings (2-3 pages)
- Recommendations for "modern-day Medici Bank"
For advanced learners:
-
Merge with external data
- Historical inflation data
- Convert florins to modern currency
- GDP comparison
-
Statistical analysis
- Correlation between transaction types
- Outlier detection
- Distribution fitting
-
Export results
- Create summary CSV/Excel files
- Generate HTML reports
- Prepare data for databases
Throughout the course:
- Short coding exercises after each module
- Peer code reviews
- Progressive complexity in queries
Final evaluation:
- Capstone project (60%)
- Module quizzes (20%)
- Participation in code reviews (20%)
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # to start the venv
# Install required packages
pip install pandas jupyter matplotlib seaborn
# Launch Jupyter
jupyter notebook- Real-world complexity: Multi-account transactions, optional fields, various transaction types
- Historical context: Makes learning memorable and engaging
- Large enough: 20,000 rows provide meaningful aggregations
- Domain knowledge: Links to accounting principles from medici-banking.py
- Validation built-in: Can verify results against double-entry rules
- Multiple formats: Practice with both CSV and JSON
This plan provides a very intensive introduction to Pandas for data engineering.