Exploratory Data Analysis (EDA) on Spotify’s music dataset (175k rows). Includes feature exploration, normalization, outlier detection, and trend analysis across decades to uncover how music characteristics evolved over time.
This project performs an in-depth exploratory data analysis (EDA) on a large-scale Spotify dataset containing over 175,000 songs.
The goal is to explore the evolution of musical characteristics across decades and uncover key patterns in the data using statistical and visual analysis.
- Clean and preprocess the raw dataset (handling duplicates, missing values, inconsistent date formats).
- Engineer meaningful features such as duration_min, tempo_energy, and vibe_score.
- Analyze numerical variables like danceability, energy, loudness, and popularity over time.
- Detect outliers and study feature distributions and correlations.
- Visualize long-term musical trends across decades.
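
The cleaning, feature-engineering, and outlier steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the raw column names `release_date`, `duration_ms`, `tempo`, and `valence` are assumed, the formulas for `tempo_energy` and `vibe_score` are hypothetical stand-ins (the definitions used in the project are not stated here), and the 1.5 × IQR rule is one common choice of outlier test.

```python
import pandas as pd


def clean_and_engineer(df: pd.DataFrame) -> pd.DataFrame:
    """Deduplicate, derive a release year, and add engineered features."""
    out = df.drop_duplicates().copy()

    # Assumption: release_date is a string such as "1975" or "1975-03-21";
    # taking the first four characters tolerates both formats.
    out["year"] = pd.to_numeric(
        out["release_date"].astype(str).str[:4], errors="coerce"
    )

    # Drop rows that lack the values the features below depend on.
    needed = ["year", "duration_ms", "tempo", "energy", "danceability", "valence"]
    out = out.dropna(subset=needed)
    out["year"] = out["year"].astype(int)

    out["duration_min"] = out["duration_ms"] / 60_000
    # Hypothetical definitions, chosen only for illustration.
    out["tempo_energy"] = out["tempo"] * out["energy"]
    out["vibe_score"] = out[["danceability", "energy", "valence"]].mean(axis=1)
    return out


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)
```

Applied to a cleaned frame, `iqr_outliers(df["loudness"])` yields a mask that can be summed per decade to compare how often extreme values occur in each period.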
- Loudness and energy have both increased notably over time, consistent with the "Loudness War" phenomenon.
- Acousticness has decreased, showing a shift toward more electronic production styles.
- Danceability and valence trends suggest that modern songs tend to be more upbeat and emotionally positive.
- Popularity remains highly right-skewed: very few songs achieve exceptional fame.
- Outlier analysis shows extreme loudness variability in the 1940s–1950s, likely due to recording limitations.
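
Decade-level trends and skewness of the kind listed above can be computed with a short aggregation. The sketch below assumes the frame already has an integer `year` column; the helper names `decade_means` and `skewness` are illustrative and not taken from the notebook.

```python
import pandas as pd


def decade_means(df: pd.DataFrame, features: list[str]) -> pd.DataFrame:
    """Average the given features per decade (1940, 1950, ...)."""
    # Floor each year to the start of its decade, e.g. 1947 -> 1940.
    decade = ((df["year"] // 10) * 10).rename("decade")
    return df.groupby(decade)[features].mean()


def skewness(df: pd.DataFrame, column: str = "popularity") -> float:
    """Sample skewness of a column; a positive value indicates a right skew."""
    return float(df[column].skew())
```

A call such as `decade_means(df, ["loudness", "energy", "acousticness"])` returns one row per decade, which can be passed to `DataFrame.plot()` or Seaborn's `lineplot` to draw the long-term trend lines.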
- Python (pandas, NumPy, Seaborn, Matplotlib, kagglehub, and the os module)
- I planned to use the datetime module but, after analysis, decided not to.
- Jupyter Notebook for full workflow transparency
- Data Import
- Data Inspection
- Data Cleaning
- Feature Engineering
- Statistical Analysis
- Visualization
- Conclusion
Panagiotis Kardatos
Mathematician | Aspiring Machine Learning Engineer