check_model() performance degradation with large datasets #420

@ANAMASGARD

Description

Problem

check_model() becomes unusably slow (5+ minutes) when checking models fitted to datasets with more than 10,000 observations.

Reproducible Example

library(performance)
library(lme4)

# Large dataset: 500 subjects x 50 observations = 25,000 rows
data <- data.frame(
  subject = rep(1:500, each = 50),
  x = rnorm(25000),
  y = rnorm(25000)
)

model <- lmer(y ~ x + (1 | subject), data = data)
check_model(model) # Hangs for minutes

Root Cause

The plot.check_model() function in R/plot.check_model.R plots ALL data points when show_dots = TRUE, so rendering slows down sharply with large datasets.

Proposed Fix

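Cap the number of plotted points by randomly sampling rows in R/plot.check_model.R:

# Sample rows when the data is too large to plot quickly
if (nrow(model_data) > 5000) {
  # drop = FALSE keeps a data frame even if model_data has a single column
  model_data <- model_data[sample(nrow(model_data), 5000), , drop = FALSE]
}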

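With 5,000 randomly sampled points, the diagnostic plots should still show the same overall patterns while drawing far fewer points. Because the sample is random, the plotted points will differ between runs unless a seed is set.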
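To make the sampling step proposed above easier to discuss, here is a minimal base-R sketch of how it could be wrapped in a helper. The function name sample_rows_for_plot and its arguments max_points and seed are hypothetical and not part of performance or see; the sketch assumes the data to be plotted is an ordinary data frame.

```r
# Hypothetical helper (not an existing function in performance or see).
# Returns at most `max_points` randomly chosen rows of `model_data`.
sample_rows_for_plot <- function(model_data, max_points = 5000L, seed = NULL) {
  n <- nrow(model_data)

  # Small data: nothing to do, return the input unchanged
  if (n <= max_points) {
    return(model_data)
  }

  if (!is.null(seed)) {
    # Restore the caller's RNG state on exit so that sampling for the plot
    # does not change the user's random number stream
    if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      old_seed <- get(".Random.seed", envir = globalenv())
      on.exit(assign(".Random.seed", old_seed, envir = globalenv()), add = TRUE)
    } else {
      on.exit(rm(".Random.seed", envir = globalenv()), add = TRUE)
    }
    set.seed(seed)
  }

  # Sort the sampled indices so the original row order is preserved
  keep <- sort(sample.int(n, max_points))

  # drop = FALSE keeps a data frame even when there is only one column
  model_data[keep, , drop = FALSE]
}

# Usage: 25,000 rows, as in the reproducible example
dat <- data.frame(x = rnorm(25000), y = rnorm(25000))
plot_data <- sample_rows_for_plot(dat, seed = 1)
nrow(plot_data)
```

Passing a fixed seed makes repeated calls return the same subset, which keeps the plot stable across re-renders. Whether the threshold should be fixed at 5000 or exposed as an argument is left open here.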

Related links:

easystats/performance#851

Environment

  • R 4.3.0
  • see 0.8.6
  • performance 0.12.4

May I submit a PR with this fix?
