Michigan PFAS Analysis

An R-based data analysis and interactive Shiny dashboard for exploring PFAS hazard patterns across Michigan.

This project combines PFAS contamination site data with drinking water sample data to analyze spatial and temporal trends in hazard index values, identify potential hotspots, and present the results through an interactive dashboard.

Live dashboard

Open the dashboard

Dashboard preview

Dashboard screenshot

Project overview

PFAS contamination has become an important environmental and public health issue in Michigan. This project analyzes Michigan PFAS data to better understand:

  • how hazard index values are distributed across sites and counties
  • where contamination hotspots may exist
  • how sampling effort varies across space and time
  • whether selected sites show statistically meaningful differences in hazard index values

The repository includes both the analytical report and a Shiny dashboard built on top of the cleaned and merged data.

What this project does

Data preparation

The analysis starts from multiple CSV files in the data/ directory, including site-level data, hazard index sample data, and accompanying data dictionaries. During preprocessing, the project:

  • inspects variable types and missingness patterns
  • removes columns with heavy missingness that are not essential to the analysis
  • assigns missing GEOID values using the nearest county by spatial proximity
  • links samples to nearby facilities using a derived nearest_facility field
  • creates a merged dataset used for downstream analysis and the dashboard
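
The nearest-facility step above can be sketched in base R as follows. This is a minimal illustration, not the report's actual code: the data frames, column names (lat, lon, facility, sample_id), and coordinates are hypothetical, and a haversine distance is assumed in place of whatever spatial method the report uses. The sf package provides st_nearest_feature() for the same kind of lookup on spatial objects.

```r
# Great-circle distance in kilometres between one point and a vector of points.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical toy data; these column names are assumptions,
# not the repository's schema.
facilities <- data.frame(
  facility = c("Facility A", "Facility B"),
  lat = c(45.57, 44.27),
  lon = c(-84.80, -86.25),
  stringsAsFactors = FALSE
)
samples <- data.frame(
  sample_id = 1:3,
  lat = c(45.60, 44.30, 45.00),
  lon = c(-84.75, -86.20, -85.00)
)

# For each sample, find the index of the facility at the smallest distance.
nearest_idx <- vapply(seq_len(nrow(samples)), function(i) {
  which.min(haversine_km(samples$lat[i], samples$lon[i],
                         facilities$lat, facilities$lon))
}, integer(1))

# Derived field linking each sample to its closest facility.
samples$nearest_facility <- facilities$facility[nearest_idx]
print(samples)
```

A nearest-neighbor link always returns a match, however far away it is, so keeping the computed distance alongside nearest_facility makes it easier to spot implausible pairings.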

Exploratory analysis

The report explores:

  • the overall distribution of hazard index values
  • counties with high maximum hazard index values
  • sampling effort by county and by month
  • site-level comparisons between sample count and maximum hazard index
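
As a rough illustration of the county and month summaries listed above, the following base R sketch computes sampling effort and maximum hazard index on a small made-up table. The column names (county, sample_date, hazard_index) and all values are assumptions for illustration only; the report itself may use dplyr and different field names.

```r
# Hypothetical sample-level data; column names and values are assumptions.
samples <- data.frame(
  county = c("Emmet", "Emmet", "Manistee", "Kent", "Kent", "Kent"),
  sample_date = as.Date(c("2021-06-03", "2021-07-15", "2021-07-20",
                          "2021-01-11", "2021-06-28", "2021-08-02")),
  hazard_index = c(0.2, 1.8, 0.0, 0.4, 0.1, 0.9),
  stringsAsFactors = FALSE
)

# Sampling effort: number of samples per county.
n_by_county <- table(samples$county)

# Maximum hazard index observed in each county.
max_hi_by_county <- tapply(samples$hazard_index, samples$county, max)

# Sampling effort by calendar month (two-digit month number).
samples$month <- format(samples$sample_date, "%m")
n_by_month <- table(samples$month)

print(n_by_county)
print(max_hi_by_county)
print(n_by_month)
```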

Statistical testing

The project also includes a permutation test comparing hazard index values between Pellston Regional Airport and Manistee Blacker Airport.
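
A two-sample permutation test of this kind can be sketched in base R as shown below. The values are invented placeholders, not the airport data, and the test statistic (difference in means), the two-sided alternative, and the number of permutations are assumptions; the report may make different choices.

```r
set.seed(42)  # make the permutation draws reproducible

# Hypothetical hazard index values for two sites (not the real data).
site_a <- c(0.9, 1.4, 0.3, 2.1, 0.7, 1.2)
site_b <- c(0.4, 0.8, 0.2, 1.1, 0.5)

# Observed test statistic: difference in group means.
observed_diff <- mean(site_a) - mean(site_b)

pooled <- c(site_a, site_b)
n_a <- length(site_a)
n_perm <- 5000  # assumed number of random relabelings

# Under the null hypothesis the site labels are exchangeable, so shuffle the
# pooled values and recompute the difference in means each time.
perm_diffs <- replicate(n_perm, {
  shuffled <- sample(pooled)
  mean(shuffled[seq_len(n_a)]) - mean(shuffled[-seq_len(n_a)])
})

# Two-sided p-value: share of permuted differences at least as extreme as the
# observed one, with a +1 correction so the estimate is never exactly zero.
p_value <- (sum(abs(perm_diffs) >= abs(observed_diff)) + 1) / (n_perm + 1)

cat("Observed difference in means:", observed_diff, "\n")
cat("Permutation p-value:", p_value, "\n")
```

A permutation test makes no normality assumption, which suits hazard index data that are heavily skewed toward low values.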

Key findings

  • Most hazard index values are low and fall below the EPA threshold of concern (that is, HI < 1).
  • A small number of counties and sites show much higher maximum hazard index values, suggesting localized hotspots.
  • Sampling effort is uneven across counties, which means some areas are monitored much more heavily than others.
  • Sampling activity shows seasonality, with higher activity during summer months.
  • Although Pellston Regional Airport had a higher mean hazard index than Manistee Blacker Airport, the permutation test did not find the difference to be statistically significant.

Dashboard features

The Shiny dashboard provides an interactive way to explore the processed data.

Main features

  • interactive Michigan map with county boundaries
  • site markers colored by hazard severity
  • click-to-zoom behavior for viewing samples around a selected site
  • connecting lines from a selected site to its related samples
  • site summary panel with sample count and descriptive statistics
  • default global plots for hazard index distribution and sampling effort over time
  • site-specific plots after selection, including:
    • hazard index distribution for the selected site
    • stacked view of hazard index composition by sample for non-zero cases
  • embedded analysis report viewer inside the app
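
One way to color site markers by hazard severity is a small helper that maps hazard index values to color names, sketched below. The function name severity_color, the three bands, and the color choices are hypothetical; only the HI = 1 cut-off comes from the threshold mentioned in this README, and the dashboard's actual scheme may differ.

```r
# Map hazard index values to marker colors.
# Assumed bands (illustrative only):
#   NA         -> "grey"   (no hazard index available)
#   HI >= 1    -> "red"    (at or above the threshold of concern)
#   0 < HI < 1 -> "orange" (detected, below the threshold)
#   HI == 0    -> "green"  (nothing contributing to the index)
severity_color <- function(hazard_index) {
  ifelse(is.na(hazard_index), "grey",
    ifelse(hazard_index >= 1, "red",
      ifelse(hazard_index > 0, "orange", "green")))
}

print(severity_color(c(0, 0.5, 1, 3, NA)))
```

The resulting vector could then be supplied to a map layer, for example through the color argument of leaflet::addCircleMarkers().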

Repository structure

pfas-data-analytics/
├── app.R
├── project_report.qmd
├── _quarto.yml
├── README.md
├── data/
│   ├── data_dict_hazard.csv
│   ├── data_dict_sites.csv
│   ├── pfas_hazard_index.csv
│   ├── pfas_public_water_long.csv
│   ├── pfas_sites.csv
│   ├── pfas_surface_water_long.csv
│   └── samples_site.csv
└── www/

Tools and packages

This project is built in R and uses packages from the tidyverse ecosystem together with geospatial, reporting, and interactive visualization tools.

Core packages used across the analysis and dashboard include:

  • tidyverse
  • dplyr
  • knitr
  • skimr
  • flextable
  • naniar
  • purrr
  • sf
  • plotly
  • tigris
  • leaflet
  • shiny

Running the project locally

1. Clone the repository

git clone https://github.com/nishanKhanal/pfas-data-analytics.git
cd pfas-data-analytics

2. Install required packages

Open R or RStudio and install the required packages:

install.packages(c(
  "tidyverse", "shiny", "leaflet", "plotly", "tigris",
  "knitr", "skimr", "flextable", "naniar", "purrr", "sf", "dplyr"
))

3. Run the dashboard

shiny::runApp("app.R")

4. Render the report

The Quarto configuration writes the rendered output to www/. To regenerate the report, run:

quarto render project_report.qmd

Data notes

The project relies on Michigan PFAS data files stored locally in the repository under data/. The dashboard reads from the processed file data/samples_site.csv, while the analytical report starts from the raw site and hazard datasets and performs cleaning, transformation, and merging steps.

Because some spatial joins are based on nearest-county and nearest-facility assumptions, results should be interpreted carefully, especially for samples with missing location identifiers in the original data.

Authors

  • Nishan
  • Kabin
  • Udita

Acknowledgment

This repository was created as part of an R-based PFAS data analysis project focused on understanding hazard index patterns in Michigan and communicating the results through both a reproducible report and an interactive dashboard.
