🚖 Uber Ride Analysis - SQL Project

📌 Project Overview

This project analyzes Uber's operational data to uncover insights into ride volume, revenue streams, and cancellation patterns. Using SQL, the raw, dirty data was cleaned and transformed into actionable business intelligence.

The primary goal is to demonstrate the end-to-end data analysis process, from raw data to structured insights.

🛠️ The Process

1. Data Cleaning & Schema Design

The raw dataset contained generic column headers (c1, c2, ...) and mixed data types.

Schema Mapping: Renamed cryptic columns to meaningful names (e.g., c16 → incomplete_reason).
Data Type Conversion: Converted string values to DECIMAL for calculations and DATE/TIME for analysis.
Handling Nulls: Converted NaN placeholder values in financial columns to NULL using NULLIF, so that aggregates skip them instead of failing or miscounting.
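
The cleaning steps above can be sketched as follows. This is a minimal illustration, not the project's actual script: it runs the SQL through Python's built-in sqlite3 module so it can be executed anywhere. Only the c16 → incomplete_reason mapping comes from this README; the table names, the other column mappings, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical raw table: every column is text under a generic header.
CREATE TABLE raw_rides (c1 TEXT, c2 TEXT, c3 TEXT, c16 TEXT);
INSERT INTO raw_rides VALUES
    ('2024-03-01', '08:15:00', '245.50', 'NaN'),
    ('2024-03-02', '21:40:00', 'NaN',    'Vehicle Breakdown');

-- Rename columns, turn 'NaN' strings into NULL, and cast amounts to numbers.
-- SQLite has no fixed-point DECIMAL type, so REAL stands in for it here.
CREATE TABLE rides AS
SELECT
    c1 AS ride_date,
    c2 AS ride_time,
    CAST(NULLIF(c3, 'NaN') AS REAL) AS booking_value,
    NULLIF(c16, 'NaN')              AS incomplete_reason
FROM raw_rides;
""")

cleaned = conn.execute(
    "SELECT ride_date, booking_value, incomplete_reason FROM rides ORDER BY ride_date"
).fetchall()
for row in cleaned:
    print(row)
```

In a database with a true DECIMAL type, the cast target would be something like DECIMAL(10, 2) rather than REAL.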

2. Feature Engineering

New variables were created to enable deeper analysis:

time_of_day: Segmented rides into Morning, Afternoon, Evening, and Night.
day_name: Extracted day names (Mon, Tue...) to analyze weekly trends.
month_name: Extracted month names for seasonal analysis.
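
A possible way to derive these three features is shown below, again as a sketch in SQLite via Python. The hour boundaries for each time_of_day bucket are assumptions (this README does not state them), and the ride_date and ride_time column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (ride_date TEXT, ride_time TEXT)")
conn.executemany(
    "INSERT INTO rides VALUES (?, ?)",
    [("2024-03-01", "08:15:00"), ("2024-03-02", "21:40:00"), ("2024-01-07", "14:05:00")],
)

features = conn.execute("""
SELECT
    ride_date,
    -- Assumed buckets: 05-11 Morning, 12-16 Afternoon, 17-20 Evening, else Night.
    CASE
        WHEN CAST(strftime('%H', ride_time) AS INTEGER) BETWEEN 5 AND 11  THEN 'Morning'
        WHEN CAST(strftime('%H', ride_time) AS INTEGER) BETWEEN 12 AND 16 THEN 'Afternoon'
        WHEN CAST(strftime('%H', ride_time) AS INTEGER) BETWEEN 17 AND 20 THEN 'Evening'
        ELSE 'Night'
    END AS time_of_day,
    -- strftime('%w') returns 0 (Sunday) to 6 (Saturday); slice a 3-letter name.
    substr('SunMonTueWedThuFriSat',
           1 + 3 * CAST(strftime('%w', ride_date) AS INTEGER), 3) AS day_name,
    -- strftime('%m') returns 01 to 12; slice a 3-letter month name.
    substr('JanFebMarAprMayJunJulAugSepOctNovDec',
           1 + 3 * (CAST(strftime('%m', ride_date) AS INTEGER) - 1), 3) AS month_name
FROM rides
ORDER BY ride_date
""").fetchall()

for row in features:
    print(row)
```

Engines such as MySQL offer DAYNAME() and MONTHNAME() directly; the substr approach is only needed because SQLite lacks them.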

🔍 Key Business Questions Answered

1️⃣ Generic & Operational Analysis

Cancellation Rate: What is the ratio of completed vs. cancelled rides?
Root Cause: What are the primary reasons for incomplete rides (Driver vs. Customer)?
Peak Days: Which day of the week sees the highest demand?
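
These questions reduce to simple grouped counts. The sketch below uses made-up rows and hypothetical column names and status labels (booking_status, day_name, "Cancelled by Driver"); it illustrates the query shape, not the project's results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (booking_status TEXT, day_name TEXT)")
sample = (
    [("Completed", "Fri")] * 4
    + [("Completed", "Mon")] * 2
    + [("Cancelled by Driver", "Fri")] * 2
    + [("Cancelled by Driver", "Tue")]
    + [("Cancelled by Customer", "Mon")]
)
conn.executemany("INSERT INTO rides VALUES (?, ?)", sample)

# Share of each booking status, which covers both the completed vs. cancelled
# ratio and the driver vs. customer split.
status_share = conn.execute("""
SELECT booking_status,
       COUNT(*) AS ride_count,
       ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM rides), 1) AS pct_of_bookings
FROM rides
GROUP BY booking_status
ORDER BY ride_count DESC
""").fetchall()

# Day of the week with the most bookings.
peak_day = conn.execute("""
SELECT day_name, COUNT(*) AS ride_count
FROM rides
GROUP BY day_name
ORDER BY ride_count DESC
LIMIT 1
""").fetchone()

print(status_share)
print(peak_day)
```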

2️⃣ Product & Vehicle Analysis

Top Earners: Which vehicle type (e.g., UberXL, Premier) generates the highest revenue?
Cost Efficiency: What is the average price-per-km for each vehicle category?
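
One way to answer both questions in a single query is sketched here. The column names (vehicle_type, booking_value, ride_distance) and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (vehicle_type TEXT, booking_value REAL, ride_distance REAL)")
conn.executemany(
    "INSERT INTO rides VALUES (?, ?, ?)",
    [
        ("UberXL", 300.0, 10.0),
        ("UberXL", 500.0, 30.0),
        ("Premier", 400.0, 16.0),
        ("eBike", 90.0, 9.0),
        ("eBike", None, None),  # incomplete ride: no fare, no distance
    ],
)

by_vehicle = conn.execute("""
SELECT vehicle_type,
       SUM(booking_value) AS total_revenue,
       -- Total fare divided by total distance: a distance-weighted price per km.
       ROUND(SUM(booking_value) / SUM(ride_distance), 2) AS price_per_km
FROM rides
WHERE booking_value IS NOT NULL AND ride_distance > 0
GROUP BY vehicle_type
ORDER BY total_revenue DESC
""").fetchall()

for row in by_vehicle:
    print(row)
```

Dividing the sums weights long trips more heavily than averaging each ride's own fare-per-km ratio; either definition is defensible, but the two can give different rankings.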

3️⃣ Sales (Revenue) Analysis

Revenue by Payment: Which payment method (Cash, Card, UPI) generates the most revenue?
Revenue Loss: How much potential revenue is estimated to be lost to cancellations? (Critical Financial KPI)
Top Locations: Which pickup locations together account for 80% of total revenue?
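
The three revenue questions could be approached as below. This is a sketch under stated assumptions: lost revenue is estimated as the number of non-completed rides multiplied by the average completed fare (this README does not say how the project estimated it), and the 80% cut uses a running total over locations ranked by revenue. Column names, zone names, and rows are hypothetical. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rides (pickup_location TEXT, booking_status TEXT,"
    " booking_value REAL, payment_method TEXT)"
)
conn.executemany(
    "INSERT INTO rides VALUES (?, ?, ?, ?)",
    [
        ("Zone A", "Completed", 300.0, "UPI"),
        ("Zone A", "Completed", 200.0, "Cash"),
        ("Zone B", "Completed", 250.0, "UPI"),
        ("Zone C", "Completed", 150.0, "Card"),
        ("Zone D", "Completed", 100.0, "Cash"),
        ("Zone A", "Cancelled", None, None),
        ("Zone B", "Cancelled", None, None),
    ],
)

by_payment = conn.execute("""
SELECT payment_method, SUM(booking_value) AS revenue
FROM rides
WHERE booking_status = 'Completed'
GROUP BY payment_method
ORDER BY revenue DESC
""").fetchall()

# Assumed estimate: every non-completed ride would have earned the average fare.
lost_revenue = conn.execute("""
SELECT (SELECT COUNT(*) FROM rides WHERE booking_status <> 'Completed')
     * (SELECT AVG(booking_value) FROM rides WHERE booking_status = 'Completed')
""").fetchone()[0]

# Keep locations until the running total first reaches 80% of all revenue.
pareto = conn.execute("""
WITH loc AS (
    SELECT pickup_location, SUM(booking_value) AS revenue
    FROM rides
    WHERE booking_status = 'Completed'
    GROUP BY pickup_location
),
ranked AS (
    SELECT pickup_location, revenue,
           SUM(revenue) OVER (ORDER BY revenue DESC, pickup_location
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running,
           SUM(revenue) OVER () AS total
    FROM loc
)
SELECT pickup_location, revenue
FROM ranked
WHERE running - revenue < 0.8 * total
ORDER BY revenue DESC
""").fetchall()

print(by_payment, lost_revenue, pareto)
```

The filter `running - revenue < 0.8 * total` keeps the location that crosses the threshold, so the selected set covers at least 80% of revenue rather than just under it.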

4️⃣ Time-Based Analysis

Peak Hours: At what time of day do customers book the most rides?
Pricing Dynamics: Does the average fare increase during night hours (Surge Pricing)?
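
A minimal sketch of both time-based checks follows, using the time_of_day feature described earlier. The booking_value column name and all rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (time_of_day TEXT, booking_value REAL)")
conn.executemany(
    "INSERT INTO rides VALUES (?, ?)",
    [
        ("Evening", 200.0), ("Evening", 220.0), ("Evening", None),  # None = cancelled
        ("Morning", 180.0), ("Morning", 200.0),
        ("Night", 260.0),
    ],
)

# Bookings and average fare per time bucket; AVG skips NULL fares.
by_period = conn.execute("""
SELECT time_of_day, COUNT(*) AS bookings, ROUND(AVG(booking_value), 2) AS avg_fare
FROM rides
GROUP BY time_of_day
ORDER BY bookings DESC
""").fetchall()

# Ratio of the average night fare to the average fare at all other times.
night_ratio = conn.execute("""
SELECT ROUND(
    AVG(CASE WHEN time_of_day = 'Night'  THEN booking_value END)
  / AVG(CASE WHEN time_of_day <> 'Night' THEN booking_value END), 2)
FROM rides
""").fetchone()[0]

print(by_period)
print(night_ratio)
```

A ratio clearly above 1 would be consistent with a night premium, though it would not by itself prove surge pricing, since trip length and vehicle mix can also differ at night.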

5️⃣ Route & Efficiency Analysis

Top Routes: Which Pickup-Dropoff combinations are the most popular?
Operational Speed (VTAT): Which vehicle types arrive at the customer location the fastest?
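
Both questions are grouped aggregates, sketched below. The column names (pickup_location, drop_location, vtat_minutes), the assumption that VTAT is recorded in minutes, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rides (pickup_location TEXT, drop_location TEXT,"
    " vehicle_type TEXT, vtat_minutes REAL)"
)
conn.executemany(
    "INSERT INTO rides VALUES (?, ?, ?, ?)",
    [
        ("Airport", "Downtown", "Premier", 6.0),
        ("Airport", "Downtown", "UberXL", 9.0),
        ("Airport", "Downtown", "eBike", 4.0),
        ("Downtown", "Airport", "UberXL", 11.0),
        ("Downtown", "Airport", "eBike", 5.0),
        ("Station", "Mall", "Premier", 8.0),
    ],
)

# Most frequent pickup/drop-off pairs; direction matters, so A->B and B->A differ.
top_routes = conn.execute("""
SELECT pickup_location, drop_location, COUNT(*) AS ride_count
FROM rides
GROUP BY pickup_location, drop_location
ORDER BY ride_count DESC
LIMIT 2
""").fetchall()

# Average arrival time per vehicle type, fastest first.
fastest = conn.execute("""
SELECT vehicle_type, ROUND(AVG(vtat_minutes), 1) AS avg_vtat
FROM rides
GROUP BY vehicle_type
ORDER BY avg_vtat ASC
""").fetchall()

print(top_routes)
print(fastest)
```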

Analysis based on Domain Expertise & Operational Experience

During the exploratory analysis, distinct patterns indicated that this dataset is synthetic. When the data trends were compared against real-world logistics experience (e.g., Chicago ride-share dynamics), the following discrepancies emerged:

The "Human Factor" in Cancellations

Real-World Reality: Cancellations are driven by complex variables. Based on field experience, drivers often cancel due to safety concerns (e.g., dangerous neighborhoods, masked individuals), customer condition (intoxicated/hygiene issues), or unprofitable pickups (long pickup distance for a short trip).

Dataset Pattern: The data shows a uniform cancellation distribution across reasons (~33% split), failing to capture these behavioral and environmental triggers.
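
A check of this kind can be sketched as follows. The rows are toy data built to mimic the even split described above, not the project's dataset, and the reason labels are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (incomplete_reason TEXT)")
toy_rows = (
    [("Customer Issue",)] * 3
    + [("Vehicle Breakdown",)] * 3
    + [("Other Issue",)] * 3
    + [(None,)] * 3  # completed rides carry no reason
)
conn.executemany("INSERT INTO rides VALUES (?)", toy_rows)

# Percentage share of each reason among incomplete rides only.
shares = conn.execute("""
SELECT incomplete_reason,
       ROUND(100.0 * COUNT(*)
             / (SELECT COUNT(*) FROM rides WHERE incomplete_reason IS NOT NULL), 1) AS pct
FROM rides
WHERE incomplete_reason IS NOT NULL
GROUP BY incomplete_reason
ORDER BY incomplete_reason
""").fetchall()

# Gap between the most and least common reason, in percentage points.
spread = max(pct for _, pct in shares) - min(pct for _, pct in shares)

print(shares)
print(spread)
```

A spread near zero across every reason is what one would expect from uniformly generated data; real operational data would usually show a few dominant reasons.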

Economic Discrepancies (Vehicle Types)

Real-World Reality: The operating costs (fuel, maintenance) of an UberXL are significantly higher than those of an eBike, so the pricing model should reflect that difference.

Dataset Pattern: The analysis revealed near-identical Average Fare and Price per KM metrics across all vehicle types (e.g., eBike vs. Premier), which contradicts basic logistics economics.
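
One simple way to quantify "near-identical" is the relative spread of average fares across vehicle types, sketched below. The metric choice, column names, and toy rows are assumptions for illustration and do not reproduce the project's figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (vehicle_type TEXT, booking_value REAL)")
conn.executemany(
    "INSERT INTO rides VALUES (?, ?)",
    [
        ("eBike", 490.0), ("eBike", 510.0),
        ("UberXL", 500.0), ("UberXL", 520.0),
        ("Premier", 505.0),
    ],
)

# (highest average fare - lowest average fare) as a percentage of the mean of averages.
spread_pct = conn.execute("""
WITH per_type AS (
    SELECT vehicle_type, AVG(booking_value) AS avg_fare
    FROM rides
    GROUP BY vehicle_type
)
SELECT ROUND(100.0 * (MAX(avg_fare) - MIN(avg_fare)) / AVG(avg_fare), 1)
FROM per_type
""").fetchone()[0]

print(spread_pct)
```

A spread of only a few percent between a bike and a premium car category would be hard to reconcile with their different operating costs.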

📂 Repository Structure

├── raw_data_uber.csv           -- The original dataset (dirty data with c1, c2 headers)
├── 1_schema_setup.sql          -- Database creation & table definition
├── 2_etl_data_cleaning.sql     -- ETL process: Cleaning raw data & Feature Engineering
└── 3_business_analysis.sql     -- SQL queries for KPIs and Business Insights

tuhanx/Uber-Ride-Analysis-SQL-Project

Folders and files

Latest commit

History

Repository files navigation

🚖 Uber Ride Analysis - SQL Project

📌 Project Overview

🛠️ The Process

1. Data Cleaning & Schema Design

2. Feature Engineering

🔍 Key Business Questions Answered

1️⃣ Generic & Operational Analysis

2️⃣ Product & Vehicle Analysis

3️⃣ Sales (Revenue) Analysis

4️⃣ Time-Based Analysis

5️⃣ Route & Efficiency Analysis

Analysis based on Domain Expertise & Operational Experience

📂 Repository Structure

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages