An R-based data analysis and interactive Shiny dashboard for exploring PFAS hazard patterns across Michigan.
This project combines PFAS contamination site data with drinking water sample data to analyze spatial and temporal trends in hazard index values, identify potential hotspots, and present the results through an interactive dashboard.
PFAS contamination has become an important environmental and public health issue in Michigan. This project analyzes Michigan PFAS data to better understand:
- how hazard index values are distributed across sites and counties
- where contamination hotspots may exist
- how sampling effort varies across space and time
- whether selected sites show statistically meaningful differences in hazard index values
The repository includes both the analytical report and a Shiny dashboard built on top of the cleaned and merged data.
The analysis starts from multiple CSV files in the data/ directory, including site-level data, hazard index sample data, and accompanying data dictionaries. During preprocessing, the project:
- inspects variable types and missingness patterns
- removes columns with heavy missingness that are not essential to the analysis
- assigns missing `GEOID` values from the spatially nearest county
- links samples to nearby facilities through a derived `nearest_facility` field
- creates a merged dataset used for downstream analysis and the dashboard
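The preprocessing steps above can be sketched in base R. This is a minimal illustration, not the project's code. The column names (`latitude`, `longitude`, `GEOID`), the 50% missingness cutoff, the toy county IDs, and the use of county centroids with great-circle distance are all assumptions. The report may instead use `sf` (for example `sf::st_nearest_feature()`) against full county polygons, which can give different answers near county borders.

```r
# Minimal base-R sketch of the preprocessing steps. Column names, the
# missingness cutoff, and the centroid-based nearest lookup are assumptions.

# Drop columns whose share of missing values exceeds `max_missing`,
# always keeping the columns named in `keep`.
drop_sparse_columns <- function(df, max_missing = 0.5, keep = character()) {
  miss_share <- colMeans(is.na(df))
  df[, miss_share <= max_missing | names(df) %in% keep, drop = FALSE]
}

# Great-circle (haversine) distance in km from one point to many points.
haversine_km <- function(lat1, lon1, lat2, lon2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * 6371 * asin(sqrt(pmin(1, a)))
}

# For each query point, return the row index of the nearest reference point.
nearest_index <- function(lat, lon, ref_lat, ref_lon) {
  vapply(seq_along(lat), function(i) {
    which.min(haversine_km(lat[i], lon[i], ref_lat, ref_lon))
  }, integer(1))
}

# Toy usage (hypothetical IDs "A"/"B"): fill a missing GEOID from the nearest
# county centroid and flag the row so imputed locations can be reviewed later.
samples <- data.frame(
  latitude = c(42.30, 45.60), longitude = c(-83.10, -84.80),
  GEOID = c(NA, "B")
)
counties <- data.frame(GEOID = c("A", "B"), lat = c(42.28, 45.59),
                       lon = c(-83.28, -84.95))
missing <- is.na(samples$GEOID)
samples$geoid_imputed <- missing
idx <- nearest_index(samples$latitude[missing], samples$longitude[missing],
                     counties$lat, counties$lon)
samples$GEOID[missing] <- counties$GEOID[idx]
```

Running the same `nearest_index()` lookup against facility coordinates instead of county centroids would yield a `nearest_facility`-style column. Keeping an explicit `geoid_imputed` flag makes it easy to exclude or re-check rows whose location was inferred rather than recorded.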
The report explores:
- the overall distribution of hazard index values
- counties with high maximum hazard index values
- sampling effort by county and by month
- site-level comparisons between sample count and maximum hazard index
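The county and monthly summaries listed above could be computed along these lines. This sketch uses base R so it runs without extra packages; the column names `county`, `hazard_index`, and `sample_date` are assumptions, and a tidyverse version would express the same steps with `dplyr::group_by()` and `dplyr::summarise()`.

```r
# Base-R sketch of the county and monthly summaries. The column names
# `county`, `hazard_index`, and `sample_date` are assumptions.

# Per-county sample count and maximum hazard index, highest maximum first.
# Rows with a missing hazard_index are dropped by aggregate()'s default na.action.
county_summary <- function(samples) {
  n <- aggregate(hazard_index ~ county, data = samples, FUN = length)
  mx <- aggregate(hazard_index ~ county, data = samples, FUN = max)
  names(n)[2] <- "n_samples"
  names(mx)[2] <- "max_hi"
  out <- merge(n, mx, by = "county")
  out <- out[order(-out$max_hi), ]
  rownames(out) <- NULL
  out
}

# Samples per calendar month (all years pooled), keeping empty months so
# seasonal gaps stay visible. format("%m") avoids locale-dependent month names.
monthly_effort <- function(samples) {
  month_num <- as.integer(format(as.Date(samples$sample_date), "%m"))
  month <- factor(month_num, levels = 1:12, labels = month.abb)
  as.data.frame(table(month = month), responseName = "n_samples")
}

# Toy data only, not the project's data.
s <- data.frame(
  county = c("X", "X", "Y"),
  hazard_index = c(0.2, 1.5, 0.1),
  sample_date = c("2021-07-01", "2021-08-15", "2022-07-20")
)
county_summary(s)
monthly_effort(s)
```

Pooling all years by calendar month is one way to surface seasonality; counting by year-month instead would separate a genuine seasonal pattern from a single unusually busy campaign.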
The project also includes a permutation test comparing hazard index values between Pellston Regional Airport and Manistee Blacker Airport.
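A two-sample permutation test of this kind can be sketched as follows. This is an illustrative version, not necessarily the report's exact procedure: the test statistic (difference in means), the two-sided alternative, the number of permutations, and the toy values are assumptions.

```r
# Two-sided permutation test for a difference in mean hazard index between
# two groups of samples (e.g. those linked to two different sites).
perm_test_mean_diff <- function(x, y, n_perm = 10000, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  pooled <- c(x, y)
  observed <- mean(x) - mean(y)
  # Shuffle group labels: draw length(x) pooled values as the "x" group.
  perm_diffs <- replicate(n_perm, {
    idx <- sample.int(length(pooled), length(x))
    mean(pooled[idx]) - mean(pooled[-idx])
  })
  # Small tolerance so permutations tying the observed split are not lost to
  # floating-point rounding; add-one correction keeps the p-value above zero.
  extreme <- abs(perm_diffs) >= abs(observed) - 1e-12
  p_value <- (sum(extreme) + 1) / (n_perm + 1)
  list(observed_diff = observed, p_value = p_value)
}

# Toy values only, not the project's data.
site_a <- c(0.8, 1.2, 0.3, 0.9)
site_b <- c(0.4, 0.6, 0.2)
perm_test_mean_diff(site_a, site_b, seed = 42)
```

With small groups like these, a visibly higher mean can still produce a large p-value, because many random relabelings give differences at least as extreme as the observed one.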
- Most hazard index values are low and concentrated below the EPA threshold of concern (HI < 1).
- A small number of counties and sites show much higher maximum hazard index values, suggesting localized hotspots.
- Sampling effort is uneven across counties, which means some areas are monitored much more heavily than others.
- Sampling activity shows seasonality, with higher activity during summer months.
- Although Pellston Regional Airport had a higher mean hazard index than Manistee Blacker Airport, the permutation test did not find statistically significant evidence that the two sites differ.
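For context on the HI < 1 threshold: a hazard index sums each compound's concentration divided by a health-based water concentration. Assuming the dataset follows the EPA's final 2024 drinking water formulation (this README does not state which PFAS or reference values the Michigan data uses), the index is:

$$
\mathrm{HI} = \frac{[\text{HFPO-DA}]}{10\ \text{ng/L}} + \frac{[\text{PFBS}]}{2000\ \text{ng/L}} + \frac{[\text{PFNA}]}{10\ \text{ng/L}} + \frac{[\text{PFHxS}]}{10\ \text{ng/L}}
$$

For example, a sample with 6 ng/L PFHxS and 5 ng/L PFNA (and nothing else detected) gives HI = 0.6 + 0.5 = 1.1, which exceeds the threshold even though each compound alone is below its own health-based value.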
The Shiny dashboard provides an interactive way to explore the processed data.
- interactive Michigan map with county boundaries
- site markers colored by hazard severity
- click-to-zoom behavior for viewing samples around a selected site
- connecting lines from a selected site to its related samples
- site summary panel with sample count and descriptive statistics
- default global plots for hazard index distribution and sampling effort over time
- site-specific plots after selection, including:
  - hazard index distribution for the selected site
  - stacked view of hazard index composition by sample for non-zero cases
- embedded analysis report viewer inside the app
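The severity coloring and site summary panel could be backed by helpers like these. This is a hypothetical sketch: the severity cut points (1 follows the EPA threshold, 5 is an arbitrary illustrative choice), the class labels, and the chosen statistics are assumptions, not the app's actual values. In a Leaflet map, the resulting classes could be mapped to colors with something like `leaflet::colorFactor()`.

```r
# Hypothetical severity classes for site markers. The cut at 1 mirrors the
# EPA HI threshold; the cut at 5 is an arbitrary illustrative choice.
# right = FALSE gives intervals [-Inf, 1), [1, 5), [5, Inf).
hi_severity <- function(hi) {
  cut(hi, breaks = c(-Inf, 1, 5, Inf), right = FALSE,
      labels = c("below threshold", "elevated", "high"))
}

# Descriptive statistics for one site's samples, as a summary panel might show.
site_summary <- function(hi) {
  hi <- hi[!is.na(hi)]
  if (length(hi) == 0) stop("site has no non-missing hazard index values")
  c(n_samples = length(hi), mean = mean(hi), median = median(hi),
    max = max(hi), share_at_or_above_1 = mean(hi >= 1))
}

hi_severity(c(0.4, 1, 7.2))
site_summary(c(0.2, 0.4, 1.8, NA))
```

Reporting the median and the share of samples at or above 1 alongside the mean helps, since a single extreme sample can dominate a site's mean.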
```text
pfas-data-analytics/
├── app.R
├── project_report.qmd
├── _quarto.yml
├── README.md
├── data/
│   ├── data_dict_hazard.csv
│   ├── data_dict_sites.csv
│   ├── pfas_hazard_index.csv
│   ├── pfas_public_water_long.csv
│   ├── pfas_sites.csv
│   ├── pfas_surface_water_long.csv
│   └── samples_site.csv
└── www/
```
This project is built in R and uses packages from the tidyverse ecosystem together with geospatial, reporting, and interactive visualization tools.
Core packages used across the analysis and dashboard include:
`tidyverse`, `dplyr`, `knitr`, `skimr`, `flextable`, `naniar`, `purrr`, `sf`, `plotly`, `tigris`, `leaflet`, `shiny`
Clone the repository:

```bash
git clone https://github.com/nishanKhanal/pfas-data-analytics.git
cd pfas-data-analytics
```

Open R or RStudio and install the required packages:
```r
install.packages(c(
  "tidyverse", "shiny", "leaflet", "plotly", "tigris",
  "knitr", "skimr", "flextable", "naniar", "purrr", "sf", "dplyr"
))
```

Then launch the dashboard from the repository root:

```r
shiny::runApp("app.R")
```

Because the Quarto configuration in `_quarto.yml` writes output to `www/`, you can regenerate the report with:
```bash
quarto render project_report.qmd
```

The project relies on Michigan PFAS data files stored locally in the repository under `data/`. The dashboard reads from the processed file `data/samples_site.csv`, while the analytical report starts from the raw site and hazard datasets and performs the cleaning, transformation, and merging steps described above.
Because some spatial joins are based on nearest-county and nearest-facility assumptions, results should be interpreted carefully, especially for samples with missing location identifiers in the original data.
- Nishan
- Kabin
- Udita
This repository was created as part of an R-based PFAS data analysis project focused on understanding hazard index patterns in Michigan and communicating the results through both a reproducible report and an interactive dashboard.
