feat: gnn diagnostics tooling #353

paulhendricks · 2025-11-24T02:02:14Z

Summary

Add a runnable synthetic diagnostics demo that trains a small MLP, forces a learning-rate plateau, and emits loss, curvature, confusion-matrix, and degree-bucket plots under artifacts/.
Introduce reusable helpers for degree decile evaluation, overall confusion matrix plotting, and Hessian top-eigenvalue estimation/visualization to probe training curvature.
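To make the "overall confusion matrix" helper concrete, here is a minimal NumPy sketch of how such a matrix can be accumulated from integer labels. This is not the code from this PR: the function name `confusion_matrix` and its signature are assumptions for illustration, and the PR's plotting step is omitted.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Hypothetical helper: rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    # np.add.at accumulates correctly even when the same (row, col) pair repeats.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


y_true = [0, 0, 1, 2, 2]
y_pred = [0, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
accuracy = np.trace(cm) / cm.sum()  # correct predictions sit on the diagonal
```

The diagonal holds the correct predictions, so overall accuracy is the trace divided by the total count.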

copy-pr-bot · 2025-11-24T02:02:17Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2025-11-24T02:04:38Z

Greptile Overview

Greptile Summary

This PR introduces a comprehensive GNN diagnostics toolkit with utilities for analyzing model training and performance. The changes add a new example directory with scripts for environment verification, synthetic dataset generation, and multiple diagnostic visualizations including Hessian eigenvalue tracking, confusion matrices, and degree-based performance analysis.

Key additions:

Environment verification script (verify_cugraph_gnn.py) to validate torch/PyG/cuGraph setup
Synthetic GNN training scripts for smoke testing and controlled diagnostic demonstrations
Hessian eigenvalue estimation via power iteration for loss curvature analysis
Degree-based performance slicing to identify model behavior on high/low-degree nodes
Confusion matrix visualization utilities
End-to-end demo script that integrates all diagnostic tools with forced learning rate plateaus
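As an illustration of the power-iteration idea in the list above, the following sketch estimates the top Hessian eigenvalue of a toy quadratic loss. It is an assumption-laden stand-in, not the PR's implementation: it uses NumPy and a central finite-difference Hessian-vector product instead of autograd-based VHP, and the names `hvp_fd` and `top_eigenvalue` are hypothetical.

```python
import numpy as np


def hvp_fd(grad_fn, w, v, eps=1e-4):
    """Finite-difference Hessian-vector product: H v ~= (g(w + eps v) - g(w - eps v)) / (2 eps)."""
    return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2.0 * eps)


def top_eigenvalue(grad_fn, w, iters=100, seed=0):
    """Power iteration on the Hessian at w; converges to the largest-magnitude eigenvalue."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(w.shape)
    v /= np.linalg.norm(v)
    eig = 0.0
    for _ in range(iters):
        hv = hvp_fd(grad_fn, w, v)
        eig = float(v @ hv)  # Rayleigh quotient for the current unit vector
        norm = np.linalg.norm(hv)
        if norm == 0.0:
            break
        v = hv / norm
    return eig


# Toy check: loss(w) = 0.5 * w^T A w has gradient A w and Hessian A.
A = np.diag([1.0, 3.0, 10.0])
estimate = top_eigenvalue(lambda w: A @ w, np.zeros(3))
```

Only Hessian-vector products are needed, so the full Hessian is never formed. Note that power iteration tracks the eigenvalue of largest magnitude, which can be negative for a non-convex loss.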

Previous review issues addressed:

Fixed import ordering in synthetic_diagnostics_demo.py
Corrected label text from "LR zeroed" to "LR reduced" to match actual behavior
Replaced pd.qcut with percentile-based binning using np.percentile and np.digitize to avoid bucket mismatch errors
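The percentile-based binning can be sketched as follows. This is a hedged illustration rather than the PR's code: the function name `degree_buckets`, the `right=True` choice, and the de-duplication of edges are assumptions. The motivation is that heavily tied degree values produce duplicate quantile edges, so the number of buckets actually produced can differ from the number requested.

```python
import numpy as np


def degree_buckets(degrees, n_buckets=10):
    """Hypothetical helper: assign each node a bucket id from percentile edges of its degree."""
    degrees = np.asarray(degrees)
    # Interior percentiles only (10, 20, ..., 90 for deciles).
    qs = np.linspace(0, 100, n_buckets + 1)[1:-1]
    # Ties in the degree distribution yield repeated edges; keep each edge once.
    edges = np.unique(np.percentile(degrees, qs))
    # right=True puts a value equal to an edge into the lower bucket.
    ids = np.digitize(degrees, edges, right=True)
    return ids, edges


# Evenly spread degrees: ten buckets of ten nodes each.
ids_uniform, _ = degree_buckets(np.arange(1, 101))
counts_uniform = np.bincount(ids_uniform)

# Heavily tied degrees: fewer distinct edges, and no error is raised.
tied = np.array([1] * 90 + [5] * 10)
ids_tied, edges_tied = degree_buckets(tied)
```

Per-bucket accuracy or F1 can then be computed by masking predictions with `ids == k` for each bucket id `k` that actually occurs.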

The implementation is well-documented with clear docstrings, follows Python best practices, and provides practical diagnostic tools for GNN model analysis.

Confidence Score: 5/5

This PR is safe to merge with no blocking issues
All previously identified issues have been properly addressed. The code is well-structured with comprehensive documentation, proper error handling, and clear separation of concerns. The diagnostic utilities are self-contained examples that don't modify core library code, reducing risk of regression.
No files require special attention

Important Files Changed

File Analysis

Filename	Score	Overview
python/cugraph-pyg/cugraph_pyg/examples/gnn_diagnostics/train_synthetic_gnn.py	5/5	Clean smoke-test script for validating torch/PyG installation with synthetic data
python/cugraph-pyg/cugraph_pyg/examples/gnn_diagnostics/hessian_top_eigen.py	5/5	Well-documented Hessian eigenvalue estimation using power iteration and vector-Hessian products (VHP)
python/cugraph-pyg/cugraph_pyg/examples/gnn_diagnostics/degree_decile_performance.py	5/5	Degree-based performance analysis using percentile binning, fixed from previous qcut approach
python/cugraph-pyg/cugraph_pyg/examples/gnn_diagnostics/synthetic_diagnostics_demo.py	5/5	Comprehensive demo script integrating all diagnostic tools with controlled training trajectory

Sequence Diagram

sequenceDiagram
    participant User
    participant Demo as synthetic_diagnostics_demo.py
    participant Data as make_synthetic()
    participant Model as MLP
    participant Hessian as hessian_top_eigen.py
    participant Degree as degree_decile_performance.py
    participant Confusion as overall_confusion_matrix.py
    participant Artifacts as artifacts/

    User->>Demo: Run with CLI args
    Demo->>Data: Generate synthetic dataset
    Data-->>Demo: x, y, degrees (with degree-dependent labels)
    
    Demo->>Model: Initialize MLP & optimizer
    
    loop Training epochs
        Demo->>Model: Forward pass
        Model-->>Demo: Logits & loss
        
        alt Step == plateau_step
            Demo->>Demo: Reduce learning rate by plateau_lr_scale
        end
        
        Demo->>Model: Backward & optimizer step
        
        alt Step % hessian_sample_every == 0
            Demo->>Hessian: estimate_top_eigenvalue_vhp()
            Hessian->>Model: Power iteration via VHP
            Hessian-->>Demo: Top eigenvalue
            Demo->>Demo: Store (step, eigenvalue)
        end
    end
    
    Demo->>Model: Full inference on dataset
    Model-->>Demo: Predictions
    
    Demo->>Confusion: plot_overall_confusion_matrix()
    Confusion->>Artifacts: Save confusion_matrix.png
    
    Demo->>Degree: evaluate_by_degree_bucket()
    Degree->>Degree: Compute percentile bins
    Degree->>Degree: Calculate acc/F1 per bucket
    Degree-->>Demo: results_df, confusions
    
    Demo->>Degree: plot_performance()
    Degree->>Artifacts: Save degree_performance.png
    
    Demo->>Hessian: plot_curvature()
    Hessian->>Artifacts: Save hessian_curve.png
    
    Demo->>Artifacts: Save loss_curve.png
    
    Demo-->>User: All diagnostics complete
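
The training loop in the diagram can be sketched in a few lines. This is a simplified stand-in under stated assumptions: it uses plain gradient descent on a synthetic least-squares problem in NumPy rather than the PR's MLP and optimizer, and the function name `train_with_plateau` is hypothetical. The parameter names `plateau_step` and `plateau_lr_scale` follow the diagram; the Hessian sampling branch is omitted.

```python
import numpy as np


def train_with_plateau(steps=200, lr=0.1, plateau_step=100,
                       plateau_lr_scale=0.01, seed=0):
    """Gradient descent on least squares with a one-time learning-rate reduction."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((256, 4))
    y = X @ np.array([1.0, -2.0, 0.5, 3.0])
    w = np.zeros(4)
    losses, lrs = [], []
    for step in range(steps):
        resid = X @ w - y                  # forward pass
        loss = 0.5 * np.mean(resid ** 2)
        if step == plateau_step:           # force the plateau by shrinking the LR
            lr *= plateau_lr_scale
        grad = X.T @ resid / len(y)        # backward pass
        w -= lr * grad                     # optimizer step
        losses.append(float(loss))
        lrs.append(lr)
    return losses, lrs


losses, lrs = train_with_plateau()
```

Plotting `losses` against the step index shows the curve flattening after `plateau_step`, which is the effect the demo is designed to produce.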

greptile-apps

9 files reviewed, 1 comment

python/cugraph-pyg/cugraph_pyg/examples/gnn_diagnostics/synthetic_diagnostics_demo.py

greptile-apps

9 files reviewed, 3 comments

python/cugraph-pyg/cugraph_pyg/examples/gnn_diagnostics/synthetic_diagnostics_demo.py

python/cugraph-pyg/cugraph_pyg/examples/gnn_diagnostics/README.md

python/cugraph-pyg/cugraph_pyg/examples/gnn_diagnostics/degree_decile_performance.py

paulhendricks added 2 commits November 22, 2025 16:58

feat: initial commit with gnn diagnostic tooling

d61fcec

feat: updated instruction

fd85e15

paulhendricks requested a review from a team as a code owner November 24, 2025 02:02

greptile-apps bot reviewed Nov 24, 2025

python/cugraph-pyg/cugraph_pyg/examples/gnn_diagnostics/synthetic_diagnostics_demo.py (outdated)

feat: fix greptile

7abae1d

greptile-apps bot reviewed Nov 24, 2025

feat: fixing greptile comments, including changing from decile qcut b…

a9f7065

…ins to percentile based bins
