semseeker

semseeker detects Stochastic Epigenetic Mutations (SEMs) — aberrant, sample-specific methylation changes that deviate from the expected population signal. Starting from a normalised beta-value (or M-value) matrix, semseeker identifies hypomethylated and hypermethylated mutations at single-probe resolution, aggregates them into lesions (genomic clusters of co-occurring mutations), and computes population-level delta metrics (ΔBeta, ΔIV, ΔEV, ΔRV) that summarise stochastic epigenetic variability per region.

Supported input technologies:

Illumina microarrays: EPIC (850k), 450k, 27k — detected automatically
Bisulfite sequencing (WGBS / RRBS) and long-read sequencing (ONT / PacBio): planned (see documents/requirements.md)

Installation

Install from GitHub using devtools:

install.packages("devtools")
library(devtools)
install_github("drake69/semseeker")

Dependencies

semseeker relies on several CRAN packages. Install them before the first use:

install.packages(c(
  "doFuture", "FactoMineR", "factoextra", "FSA", "fst",
  "future", "future.apply", "ggplot2", "gridExtra", "gtools",
  "Hmisc", "lqmm", "progressr", "reshape2", "zoo",
  "VennDiagram", "effsize", "pwr", "combinat", "philentropy",
  "parallelly", "knitr", "lmtest", "usethis", "wordcloud",
  "WebGestaltR", "sqldf", "caret", "openai", "glmnet"
))

Quick Example

Step 1 — Load and normalise IDAT files with ChAMP

library(ChAMP)

idat_folder  <- "~/source_idat/"
result_folder <- "~/result/"

myLoad <- champ.load(
  directory      = idat_folder,
  method         = "minfi",
  methValue      = "B",
  autoimpute     = TRUE,
  filterDetP     = TRUE,
  ProbeCutoff    = 0,
  SampleCutoff   = 0.1,
  detPcut        = 0.01,
  filterBeads    = TRUE,
  beadCutoff     = 0.05,
  filterNoCG     = TRUE,
  filterSNPs     = TRUE,
  filterMultiHit = TRUE,
  filterXY       = TRUE,
  arraytype      = "EPIC"   # or "450K"
)

myNorm <- champ.norm(
  beta       = myLoad$beta,
  rgSet      = myLoad$rgSet,
  mset       = myLoad$mset,
  resultsDir = result_folder,
  method     = "SWAN",
  arraytype  = "EPIC",
  cores      = parallel::detectCores() - 1
)

saveRDS(myNorm, "~/normalizedData.rds")

Step 2 — Run semseeker

library(semseeker)

normalizedData <- readRDS("~/normalizedData.rds")
sample_sheet   <- read.csv2("~/sample_sheet.csv")

semseeker(
  sample_sheet  = sample_sheet,
  signal_data   = normalizedData,
  result_folder = "~/semseeker_result/"
)

Results (BED files, pivot tables, delta-metric summaries) are written to result_folder.

Complete Example

A fully worked example using public data from Gene Expression Omnibus (GEO) is available in the repository's example/ folder.

Input Requirements

Beta-value matrix

Rows = probes (Illumina probe IDs, e.g. cg...)
Columns = samples
Values = beta values in [0, 1] or M-values (detected automatically)

Sample sheet

A data frame (or CSV readable with read.csv2) with at minimum these columns:

Column	Description
`Sample_ID`	Unique sample identifier (must match matrix column names)
`Sample_Name`	Human-readable label
`Sample_Group`	Must be one of `"Case"`, `"Control"`, `"Reference"`

Note: If no Reference population is available, duplicate the Control rows and assign Sample_Group = "Reference" to the duplicates.

Note: Sample_ID values are uppercased internally by semseeker (name_cleaning()); output file names and pivot column headers reflect the uppercased form.

Output

For each sample semseeker writes to result_folder/Data/:

*_MUTATIONS_HYPO.bed / *_MUTATIONS_HYPER.bed — probe-level mutations
*_LESIONS_HYPO.bed / *_LESIONS_HYPER.bed — genomic lesion clusters
*_DELTA*.bed — ΔBeta, ΔIV, ΔEV, ΔRV per region
Pivot tables aggregating all samples per marker

Citation

If you use semseeker in published work, please cite the package:

citation("semseeker")

A Zenodo DOI will be registered for each release tag (see documents/requirements.md).

Name		Name	Last commit message	Last commit date
Latest commit History 476 Commits
..Rcheck		..Rcheck
.github		.github
R		R
data		data
man		man
renv		renv
semseeker.Rcheck		semseeker.Rcheck
semseeker.Rproj.Rcheck		semseeker.Rproj.Rcheck
tests		tests
vignettes		vignettes
.Rbuildignore		.Rbuildignore
.Rprofile		.Rprofile
.gitignore		.gitignore
.gitleaksignore		.gitleaksignore
DESCRIPTION		DESCRIPTION
Dockerfile.ci		Dockerfile.ci
LICENSE.md		LICENSE.md
NAMESPACE		NAMESPACE
NEWS.md		NEWS.md
README.Rmd		README.Rmd
README.md		README.md
_pkgdown.yml		_pkgdown.yml
ci-local.sh		ci-local.sh
renv.lock		renv.lock
semseeker.Rproj		semseeker.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

semseeker

Installation

Dependencies

Quick Example

Step 1 — Load and normalise IDAT files with ChAMP

Step 2 — Run semseeker

Complete Example

Input Requirements

Beta-value matrix

Sample sheet

Output

Citation

About

Uh oh!

Releases 2

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

semseeker

Installation

Dependencies

Quick Example

Step 1 — Load and normalise IDAT files with ChAMP

Step 2 — Run semseeker

Complete Example

Input Requirements

Beta-value matrix

Sample sheet

Output

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages