Add opt-in torch GPU solver for invert_network by s-sasaki-earthsea-wizard · Pull Request #1490 · insarlab/MintPy

s-sasaki-earthsea-wizard · 2026-05-06T15:09:01Z

Description of proposed changes

This PR adds an opt-in CUDA-accelerated path for the per-pixel weighted
least-squares inversion in the invert_network step (ifgram_inversion.py).
The fork has been running this code on tutorial-scale and large-scale scenes
for several weeks; this submission consolidates the implementation as it
currently stands.

The default path is unchanged. mintpy.networkInversion.solver = auto
resolves to cpu, and the existing CPU code path is byte-for-byte identical
to upstream — every other step in smallbaselineApp.py continues to run on
the CPU regardless of this setting.
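The resolution rule described above can be sketched in a few lines. This is an illustration only, not the PR's code: the function name `resolve_solver`, the injected `cuda_available` flag (standing in for a real device probe such as `torch.cuda.is_available()`), and the choice of `RuntimeError` for the missing-device case are all assumptions.

```python
def resolve_solver(requested: str, cuda_available: bool) -> str:
    """Map a template value to the solver that would run (hypothetical helper).

    `requested` plays the role of mintpy.networkInversion.solver.
    `cuda_available` is injected so the sketch runs on hosts without a GPU.
    """
    requested = requested.strip().lower()
    if requested in ("auto", "cpu"):
        # Default path: "auto" resolves to the unchanged CPU solver.
        return "cpu"
    if requested == "torch":
        if not cuda_available:
            # No silent fallback: fail loudly instead of quietly running on CPU.
            # The exception type is an assumption of this sketch.
            raise RuntimeError("solver = torch requested but no CUDA device is visible")
        return "torch"
    raise ValueError(f"unsupported solver: {requested!r}")


print(resolve_solver("auto", cuda_available=True))   # cpu
print(resolve_solver("torch", cuda_available=True))  # torch
```

Resolving `auto` to `cpu` even when a GPU is present is what keeps the feature strictly opt-in.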

The aim is to contribute faster InSAR time-series processing for NVIDIA GPU
users, since invert_network is the dominant CPU bottleneck on typical
workflows and the gap widens with scene size.

Closes #1489 (RFC).

Implementation summary

- New module src/mintpy/ifgram_inversion_gpu.py batches the per-pixel WLS
  systems on a single CUDA device. The solver is normal-equations +
  Cholesky via torch.linalg.cholesky_ex, which (a) is significantly
  faster than torch.linalg.lstsq on the matrix shapes encountered here and
  (b) lets us detect rank-deficient pixels through the returned info codes
  rather than via post-hoc residual checks.
- ifgram_inversion.py dispatches to the GPU module only when
  solver = torch is explicitly requested; the CPU loop is untouched.
- New [gpu] extras in pyproject.toml, sourced from requirements-gpu.txt
  (just torch>=2.11). Install requires the PyTorch CUDA wheel index:
  pip install -e ".[gpu]" --extra-index-url https://download.pytorch.org/whl/cu128
  (documented in docs/installation.md §2.4).
- Tests in tests/test_ifgram_inversion_gpu.py cover the dispatch logic and
  the GPU fast paths with synthetic NaN / rank-deficient fixtures.
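The normal-equations + Cholesky approach can be illustrated with a small batched example. This is a minimal sketch and not the module's implementation: it uses NumPy in float64 on the CPU as a stand-in for torch on CUDA so that it runs anywhere, the function name `solve_wls_cholesky` and the array shapes are assumptions, and `np.linalg.cholesky` raises on failure instead of returning per-pixel info codes the way `torch.linalg.cholesky_ex` does.

```python
import numpy as np


def solve_wls_cholesky(G, w_sqrt, y):
    """Solve min ||w_sqrt * (G x - y)||^2 for every pixel at once.

    G      : (K, D) design matrix shared by all pixels
    w_sqrt : (P, K) square-root weights, one row per pixel
    y      : (P, K) observations, one row per pixel
    returns: (P, D) solutions
    """
    Gw = w_sqrt[:, :, None] * G[None, :, :]   # (P, K, D) weighted design per pixel
    yw = (w_sqrt * y)[:, :, None]             # (P, K, 1) weighted observations
    GwT = Gw.transpose(0, 2, 1)               # (P, D, K)
    N = GwT @ Gw                              # (P, D, D) normal matrices
    r = GwT @ yw                              # (P, D, 1) right-hand sides
    L = np.linalg.cholesky(N)                 # batched lower-triangular factors
    z = np.linalg.solve(L, r)                 # forward step:  L z = r
    x = np.linalg.solve(L.transpose(0, 2, 1), z)  # backward step: L^T x = z
    return x[:, :, 0]


rng = np.random.default_rng(0)
K, D, P = 12, 4, 5                            # ifgrams, unknowns, pixels (toy sizes)
G = rng.standard_normal((K, D))
w_sqrt = rng.uniform(0.5, 2.0, size=(P, K))
y = rng.standard_normal((P, K))

x_chol = solve_wls_cholesky(G, w_sqrt, y)

# Per-pixel reference solved with a generic least-squares routine.
x_ref = np.stack([
    np.linalg.lstsq(w_sqrt[p, :, None] * G, w_sqrt[p] * y[p], rcond=None)[0]
    for p in range(P)
])
print("max abs difference:", np.abs(x_chol - x_ref).max())
```

Forming the normal equations squares the condition number of the weighted design matrix, which is one reason an equivalence check against a generic least-squares solver is worth keeping alongside this kind of solver.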

Behavior notes

- VRAM auto-sizing: gpuChunkSize = 0 (the default) probes free GPU
  memory at runtime and chooses a per-chunk pixel count with a fixed
  headroom factor; passing a positive integer overrides this for
  reproducible chunking across hosts with different VRAM.
- Rank-deficient pixels are detected via cholesky_ex info codes and
  zeroed so NaN/Inf cannot propagate downstream; a warning line reports
  the count per chunk.
- Per-pixel NaN observations are handled by zeroing the corresponding
  row weight, which is mathematically equivalent to dropping that row from
  the WLS system.
- No silent CPU fallback: selecting solver = torch on a host without
  a visible CUDA device raises immediately rather than silently falling
  back to CPU; this keeps any performance regression visible.
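The equivalence between zeroing a row weight and dropping the row can be checked numerically. The sketch below is an illustration with made-up data in NumPy, not the PR's code. It also shows one detail that any such scheme has to handle: since `0 * NaN` is still `NaN`, the NaN observation itself must be replaced by a finite value before it is multiplied by the zero weight. Whether the PR does this in the same way is an assumption here.

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 10, 3                                  # observations, unknowns (toy sizes)
G = rng.standard_normal((K, D))
y = rng.standard_normal(K)
w_sqrt = rng.uniform(0.5, 2.0, K)
y[[2, 7]] = np.nan                            # two missing observations

valid = ~np.isnan(y)

# (a) Drop the invalid rows and solve the smaller weighted system.
x_drop, *_ = np.linalg.lstsq(
    w_sqrt[valid, None] * G[valid], w_sqrt[valid] * y[valid], rcond=None
)

# (b) Keep every row, but zero the weight and replace NaN by 0 so the
#     product weight * observation is exactly 0 rather than NaN.
w0 = np.where(valid, w_sqrt, 0.0)
y0 = np.where(valid, y, 0.0)
x_zero, *_ = np.linalg.lstsq(w0[:, None] * G, w0 * y0, rcond=None)

print("max abs difference:", np.abs(x_drop - x_zero).max())
```

Variant (b) keeps every pixel's system the same shape, which is what makes batching across pixels with different NaN patterns possible.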

Design pivot vs the original RFC

The RFC (#1489) originally described torch.linalg.lstsq as the GPU solver.
During development the path was switched to normal-equations + Cholesky
after a side-by-side benchmark showed it preserves output equivalence to
float32 round-off (RMS ~1e-5) while running ~16× faster than lstsq on the
same matrix shapes (tutorial dataset: FernandinaSenDT128). An RMS difference
on the order of 1e-5 in the displacement field is well below the typical
InSAR noise floor — sub-millimeter on a per-pixel basis — so the two solvers
are operationally equivalent for the geophysical use case. The lstsq path
was removed before this submission so there is only one supported GPU code
path to reason about.

Performance

Indicative numbers measured on an NVIDIA RTX 5080 (Blackwell sm_120, CUDA
12.8, PyTorch 2.11). Speedup will vary with scene size, GPU class, and
chunk-size tuning.

| Scene | Pixels | ifgs | `invert_network` internal | step wall |
|---|---|---|---|---|
| FernandinaSenDT128 (tutorial) | 270k | 288 | ~16× faster | ~4.5× faster |
| GalapagosSenDT128 (large) | 3.4M | 475 | ~44× faster | ~36× faster |

Large-scene absolute timings: CPU 6189 s → torch 170 s on the same machine.
Numerical equivalence between the cpu and torch solvers holds to
float32 round-off in both cases (RMS on the order of 1e-5; absolute RMS
max ~16 µm on the large-scene case).
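One way to express an equivalence check of this kind is a relative RMS between the two outputs. The sketch below assumes the metric is the RMS difference normalized by the largest absolute CPU value; the function name `relative_rms` and the masking of non-finite pixels are illustrative choices, not taken from the benchmark harness.

```python
import numpy as np


def relative_rms(cpu, gpu):
    """RMS of (gpu - cpu) over finite pixels, divided by max |cpu| (hypothetical helper)."""
    cpu = np.asarray(cpu, dtype=np.float64)
    gpu = np.asarray(gpu, dtype=np.float64)
    valid = np.isfinite(cpu) & np.isfinite(gpu)
    if not valid.any():
        return float("nan")                   # nothing to compare
    rms = np.sqrt(np.mean((gpu[valid] - cpu[valid]) ** 2))
    scale = np.abs(cpu[valid]).max()
    return float(rms / scale) if scale > 0 else float(rms)


# Toy data: a float64 field and the same field rounded to float32.
rng = np.random.default_rng(0)
ref = rng.standard_normal(10_000)
rounded = ref.astype(np.float32)

print("relative RMS after float32 rounding:", relative_rms(ref, rounded))
```

Rounding alone sits far below 1e-5 in this toy case; a real solver comparison accumulates additional round-off inside the factorization and solve steps.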

Reproduction artifacts (harness scripts, raw logs, full reports) live in a
separate repository. Links below are pinned to a single sibling commit so
the data does not move during review:

cpu vs torch end-to-end on Fernandina:
https://github.com/s-sasaki-earthsea-wizard/mintpy-benchmark/blob/c20ca8bb/reports/report_torch.md
lstsq vs Cholesky equivalence + per-step speedup:
https://github.com/s-sasaki-earthsea-wizard/mintpy-benchmark/blob/c20ca8bb/reports/report_solver_comparison.md
Chunk-size sensitivity sweep:
https://github.com/s-sasaki-earthsea-wizard/mintpy-benchmark/blob/c20ca8bb/reports/report_chunk_sweep.md
torch.profiler GPU kernel breakdown:
https://github.com/s-sasaki-earthsea-wizard/mintpy-benchmark/blob/c20ca8bb/reports/report_profile.md
Large-scene Galapagos run:
https://github.com/s-sasaki-earthsea-wizard/mintpy-benchmark/blob/c20ca8bb/reports/report_large_scene.md

Numbers are from a single development machine; absolute timings will vary
across hardware, but the qualitative findings (Cholesky > lstsq; GPU > CPU
at this matrix scale; speedup grows with scene size) should hold for any
recent NVIDIA CUDA-class device. Harness scripts and raw logs in the
mintpy-benchmark repository above let other GPU users reproduce on their
own data.

Local validation

Run on the PR branch (upstream/main + the three commits in this PR), against
FernandinaSenDT128:

- pre-commit run --all-files exits clean (13 hooks pass + 1 skip on json,
  per the upstream .pre-commit-config.yaml).
- smallbaselineApp.py end-to-end with default settings (solver = auto
  resolves to cpu): all 18 steps, Normal end of smallbaselineApp processing!,
  total wall time 2 h 2 m (the correct_troposphere step's CDS download
  dominated; the actual computation portion is small). All standard output
  products generated (timeseries.h5, timeseries_ERA5.h5,
  timeseries_ERA5_ramp.h5, timeseries_ERA5_ramp_demErr.h5, velocity.h5,
  velocityERA5.h5, geo/, etc.).
- smallbaselineApp.py end-to-end with solver = torch (same dataset,
  ERA5 grib + tropo product reused via symlink so this run only re-exercises
  invert_network and the post-tropo steps): all 18 steps,
  Normal end of smallbaselineApp processing!, total wall time 6 m 57 s.
  Log confirms the GPU path was actually entered:
```
mintpy.networkInversion.solver: auto --> torch
estimating time-series via torch solver (batched, GPU)
GPU auto chunk_size = 19403 pixels (free VRAM 15.1 GiB)
estimating time-series via torch batched WLS in 14 chunk(s) of up to 19403 pixels ...
```
Same set of standard output products as the CPU run.
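The chunk count in the log above follows from a ceiling division of the pixel count by the chunk size. The sketch below reproduces that arithmetic and adds a hypothetical sizing rule. The per-pixel memory model, the `safety` factor value, and both function names are assumptions made for illustration; they are not expected to reproduce the logged 19403 and do not describe the module's actual `_auto_chunk_size`.

```python
import math


def auto_chunk_size(free_bytes, num_pair, num_unknown, dtype_bytes=4, safety=0.5):
    """Pixels per chunk that fit in a fraction of free VRAM (hypothetical model)."""
    # Assumed per-pixel footprint, in elements:
    #   weighted design matrix        K * D
    #   observations and weights      2 * K
    #   normal matrix and its factor  2 * D * D
    #   right-hand side and solution  2 * D
    elements = (num_pair * num_unknown + 2 * num_pair
                + 2 * num_unknown ** 2 + 2 * num_unknown)
    per_pixel_bytes = dtype_bytes * elements
    return max(1, int(free_bytes * safety) // per_pixel_bytes)


def num_chunks(num_pixel, chunk_size):
    """Number of chunks needed to cover num_pixel pixels."""
    return math.ceil(num_pixel / chunk_size)


# Arithmetic behind the log line, assuming roughly 270,000 pixels to invert.
print(num_chunks(270_000, 19403))             # 14

# Hypothetical sizing for 288 ifgrams, assuming 97 unknowns for 98 dates.
free = int(15.1 * 2**30)
print(auto_chunk_size(free, num_pair=288, num_unknown=97))
```

Passing a fixed positive chunk size instead of relying on a probe like this is what makes chunking reproducible across hosts with different amounts of VRAM.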

Disclosure

This work was developed with the assistance of Claude Opus 4.7 (Anthropic's coding
assistant). All design decisions, benchmark execution, and review of the
generated code were performed by me. Per project convention the
Assisted-by: Claude Opus 4.7 trailers used during fork development have been stripped
from this branch's commit history; this paragraph is the canonical
disclosure.

If the AI-assisted aspect raises review or maintenance concerns for the
project, I'm happy to discuss — including whether to keep the GPU module
opt-in / under a feature flag.

Reminders

Fix RFC: opt-in GPU backend for invert_network (torch.linalg.lstsq, CUDA) #1489
Pass Pre-commit check (green) — verified locally with the upstream .pre-commit-config.yaml; CI to confirm.
Pass Codacy code review (green)
Pass Circle CI test (green)
Make sure that your code follows our style. Use the other functions/files as a basis.
If modifying functionality, describe changes to function behavior and arguments in a comment below the function declaration.
If adding new functionality, add a detailed description to the documentation and/or an example.

Summary by Sourcery

Introduce an opt-in GPU-accelerated solver for the invert_network step using a PyTorch CUDA backend, while preserving the existing CPU behavior as default and documenting configuration and usage.

New Features:

Add a torch-based CUDA-batched weighted least-squares solver for invert_network via the new ifgram_inversion_gpu module.
Expose CLI and template options to select the WLS solver (cpu or torch) and configure GPU chunk size for network inversion.
Provide dedicated documentation describing GPU acceleration for invert_network, including setup, configuration, and performance expectations.

Enhancements:

Extend ifgram_inversion to dispatch to the GPU-batched solver when explicitly requested, without altering the default CPU code path.
Add optional [gpu] extras and GPU-specific requirements to the build configuration for installing CUDA-enabled PyTorch.
Update default smallbaselineApp configuration files to include explicit defaults for the network inversion solver and GPU chunk size.

Tests:

Add numerical-equivalence and behavior tests for the GPU-batched solver, including CUDA-availability gating and solver selection validation.

Add a CUDA-accelerated path for the per-pixel weighted least-squares inversion in `ifgram_inversion.py`, batched as normal-equations + Cholesky on a single CUDA device via PyTorch. The solver is opt-in and the default (`mintpy.networkInversion.solver = auto`) resolves to `cpu`, so existing setups are unaffected and the CPU path is byte-for-byte unchanged.

Surface
- cfg keys: `mintpy.networkInversion.solver = cpu|torch` (default `auto`), `mintpy.networkInversion.gpuChunkSize = <int>` (default 0 = auto-size).
- CLI flags: `--solver {cpu,torch}` and `--gpu-chunk-size N` on `ifgram_inversion.py`.
- New module `src/mintpy/ifgram_inversion_gpu.py` holds the torch path; `ifgram_inversion.py` dispatches to it only when `solver=torch` is explicitly requested.

Behavior
- VRAM auto-sizing probes free GPU memory and chooses a per-chunk pixel count with a fixed headroom factor; `gpuChunkSize > 0` overrides.
- Rank-deficient pixels are detected via `torch.linalg.cholesky_ex` info codes and zeroed so NaN/Inf cannot propagate downstream.
- Per-pixel NaN observations are handled by zeroing the corresponding row weight, which is mathematically equivalent to dropping that row from the WLS system.
- Selecting `solver=torch` on a host without a visible CUDA device raises immediately rather than silently falling back to CPU, keeping any performance regression visible.

Packaging
- Adds `[gpu]` extras in `pyproject.toml`, sourced from `requirements-gpu.txt`. The PyTorch CUDA wheels live on a separate index; `installation.md` documents the install command in a follow-up commit.

The opt-in GPU solver in `ifgram_inversion_gpu.py` is implemented entirely on top of `torch.linalg.cholesky_ex`, with no cupy entry point. Listing `cupy-cuda12x` in `requirements-gpu.txt` therefore pulls a multi-hundred-MB runtime that no code path imports. Drop it.

Pin `torch>=2.11` to match the version exercised in the bench matrix used during development (Blackwell sm_120 wheel from the cu128 index). Earlier torch releases have not been validated against this code path.

Document the new opt-in `torch` GPU solver added in the previous commits:
- `docs/gpu.md`: setup, CLI / template surface, behavior notes (VRAM auto-sizing, rank-deficient pixel handling, NaN observations, hard-fail on missing CUDA), and indicative performance numbers.
- `docs/installation.md` §2.4: install the `[gpu]` extras together with the matching PyTorch CUDA wheel index.
- `docs/README.md` and `docs/dask.md`: add cross-links so readers can reach the GPU page from the documentation root and from the Dask page (the two parallelism paths are independent, and the user must pick one or the other).

Performance numbers in `gpu.md` §4 are stated inline without any external repository links so the page stays self-contained.

sourcery-ai · 2026-05-06T15:09:09Z

Reviewer's Guide

Adds an opt-in CUDA-accelerated PyTorch solver for the invert_network step, with CLI/template wiring, configuration defaults, docs, tests, and packaging extras, while keeping the existing CPU path as the default and behaviorally unchanged.

Sequence diagram for GPU-backed invert_network execution

sequenceDiagram
    actor User
    participant CLI as IfgramInversionCLI
    participant Runner as run_ifgram_inversion
    participant Patch as run_ifgram_inversion_patch
    participant GPU as IfgramInversionGPUModule
    participant Torch as TorchLibrary
    participant CUDA as CUDADevice

    User->>CLI: run ifgram_inversion.py\n--solver torch --gpu-chunk-size 0
    CLI->>Runner: inps with solver=torch,\n gpuChunkSize=0
    Runner->>Patch: run_ifgram_inversion_patch(...,\n solver=torch, gpu_chunk_size=0)

    Patch->>Patch: build design matrices A, B\nselect pixels idx_pixel2inv
    Patch->>GPU: estimate_timeseries_batch(A, B, y, tbase_diff,\n weight_sqrt, min_norm_velocity, rcond,\n min_redundancy, inv_quality_name,\n chunk_size=gpu_chunk_size, solver=torch)

    GPU->>Torch: torch.cuda.is_available()
    Torch-->>GPU: True (or raises error if False)

    GPU->>Torch: cuda.mem_get_info() (when chunk_size<=0)
    Torch-->>GPU: free_bytes, total_bytes
    GPU->>GPU: _auto_chunk_size()\nchoose chunk_size from free VRAM

    loop for each pixel chunk
        GPU->>Torch: as_tensor(G, tbase_diff, y_chunk, w_chunk)
        Torch->>CUDA: launch kernels for Gw, yw, N, r
        CUDA-->>Torch: Gw, yw, N, r on device

        GPU->>Torch: linalg.cholesky_ex(N)
        Torch->>CUDA: batched Cholesky
        CUDA-->>Torch: L, info
        GPU->>GPU: zero-out rank-deficient pixels\n(info != 0)

        GPU->>Torch: cholesky_solve(r, L)
        Torch->>CUDA: solve for X_batch
        CUDA-->>Torch: X_batch

        GPU->>GPU: build ts_chunk, inv_quality_chunk,\n num_inv_obs_chunk
        GPU-->>Patch: partial ts, quality, counts\nfor this chunk
        Patch->>Patch: write into global\narrays at indices
    end

    GPU-->>Patch: ts, inv_quality, num_inv_obs

    Patch->>Patch: assign ts[:, idx_pixel2inv]\ninv_quality[idx_pixel2inv]\nnum_inv_obs[idx_pixel2inv]
    Patch-->>Runner: inversion results
    Runner-->>User: timeseries/velocity outputs\n(with GPU-accelerated invert_network)

Class diagram for the new GPU-batched inversion solver and integration

classDiagram
    class IfgramInversionPatch {
        +run_ifgram_inversion_patch(ifgram_file, box, ref_phase, obs_ds_name, weight_func, water_mask_file, min_norm_velocity, mask_ds_name, mask_threshold, min_redundancy, calc_cov, solver, gpu_chunk_size)
    }

    class IfgramInversionCLI {
        +create_parser(subparsers)
        +--solver: str  (cpu|torch)
        +--gpu-chunk-size: int
        +read_template2inps(template_file, inps)
    }

    class SmallbaselineAppDefaults {
        +mintpy.networkInversion.solver: str
        +mintpy.networkInversion.gpuChunkSize: int
    }

    class IfgramInversionGPUModule {
        +SUPPORTED_SOLVERS: tuple
        +DEFAULT_CHUNK_SIZE: int
        +VRAM_SAFETY: float
        +is_solver_available(solver)
        +_get_torch_device(solver)
        +_auto_chunk_size(num_pair, num_unknown, dtype_bytes)
        +_solve_cholesky(G_dev, w_dev, y_dev)
        +estimate_timeseries_batch(A, B, y, tbase_diff, weight_sqrt, min_norm_velocity, rcond, min_redundancy, inv_quality_name, chunk_size, solver, print_msg)
    }

    class TorchLibrary {
        +cuda.is_available()
        +cuda.mem_get_info()
        +as_tensor(data, dtype, device)
        +linalg.cholesky_ex(N)
        +cholesky_solve(r, L)
    }

    class CUDADevice {
        <<hardware>>
    }

    IfgramInversionCLI --> IfgramInversionPatch : passes solver\n& gpuChunkSize
    SmallbaselineAppDefaults --> IfgramInversionCLI : provides default\nconfig values

    IfgramInversionPatch ..> IfgramInversionGPUModule : imports\nestimate_timeseries_batch
    IfgramInversionGPUModule ..> TorchLibrary : uses
    TorchLibrary ..> CUDADevice : executes kernels on

    IfgramInversionPatch --> IfgramInversionGPUModule : uses when\nsolver != cpu

File-Level Changes

| Change | Details | Files |
|---|---|---|
| Introduce a GPU-batched weighted least-squares solver for invert_network using PyTorch/CUDA and integrate it as an alternative solver path. | Add new module implementing batched normal-equations + Cholesky WLS solver on CUDA with VRAM-aware chunking, NaN handling, and rank-deficiency detection via torch.linalg.cholesky_ex. Provide estimate_timeseries_batch API that mirrors the CPU estimate_timeseries signature and returns timeseries, inversion quality, and observation counts. Implement solver availability checks, CUDA device probing, and automatic chunk-size selection based on free VRAM with configurable overrides. | `src/mintpy/ifgram_inversion_gpu.py` |
| Wire the GPU solver into ifgram inversion while preserving the existing CPU behavior as the default. | Extend run_ifgram_inversion_patch to accept solver and gpu_chunk_size parameters, dispatching to the GPU batch solver when solver!='cpu' and otherwise leaving the CPU code path intact. Pass solver and gpuChunkSize from CLI/template inputs into run_ifgram_inversion_patch via the options dictionary used for block-wise inversion. Ensure GPU path handles both weighted and unweighted inversions in one call and writes results back into the existing timeseries, inversion quality, and observation count arrays. | `src/mintpy/ifgram_inversion.py` |
| Expose configuration for choosing the WLS solver backend and GPU chunk size via CLI and templates, including defaults for smallbaselineApp. | Add --solver and --gpu-chunk-size options to the ifgram_inversion CLI, with choices {'cpu','torch'} and descriptive help text including requirements and behavior. Teach template reader to parse mintpy.networkInversion.solver and mintpy.networkInversion.gpuChunkSize keys and map them to inps.solver and inps.gpuChunkSize. Set explicit defaults in smallbaselineApp_auto.cfg and document the auto/CPU vs torch options and gpuChunkSize behavior in smallbaselineApp.cfg comments. | `src/mintpy/cli/ifgram_inversion.py` `src/mintpy/defaults/smallbaselineApp.cfg` `src/mintpy/defaults/smallbaselineApp_auto.cfg` |
| Add packaging and dependency wiring for an optional GPU extras group that pulls CUDA-enabled PyTorch. | Define a gpu optional-dependency group in pyproject.toml, sourcing from a new requirements-gpu.txt file. Introduce an empty requirements-gpu.txt placeholder to be populated with CUDA-enabled torch>=2.11, referenced by docs and install commands. | `pyproject.toml` `requirements-gpu.txt` |
| Document installation, configuration, and performance characteristics of the optional GPU solver and link it into existing docs. | Extend installation docs with a new section describing GPU prerequisites, [gpu] extras installation via PyTorch CUDA wheel indices, uv-specific notes, verification, and enabling the solver from CLI or template. Add a dedicated gpu.md page covering configuration, behavior notes (VRAM auto-sizing, rank-deficient handling, NaN treatment, no CPU fallback), and performance benchmarks for tutorial and large scenes. Link gpu.md from the main docs README and note its relation to Dask in the dask.md documentation. | `docs/installation.md` `docs/gpu.md` `docs/README.md` `docs/dask.md` |
| Add GPU-path tests that validate numerical equivalence with the CPU solver, chunk-size invariance, and error handling. | Create synthetic SBAS-like network generators and observation simulators to drive both CPU and GPU solvers on controlled fixtures, including optional NaN masking and weighting. Compare GPU estimate_timeseries_batch outputs against the per-pixel CPU estimate_timeseries reference over multiple scenarios (WLS/OLS, min_norm_velocity True/False) with tight float32 RMS tolerances, and assert identical observation counts. Test invariance across different GPU chunk sizes and verify that unsupported solver names raise ValueError; mark all tests to skip automatically when CUDA is unavailable. | `tests/test_ifgram_inversion_gpu.py` |
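A synthetic SBAS-like fixture of the kind the tests row describes could look roughly like the following. This is a guess at the general shape, not the contents of `tests/test_ifgram_inversion_gpu.py`: the function name `sbas_design_matrix`, the `max_conn` connection rule, and the convention of dropping the first date as the reference are all assumptions.

```python
import numpy as np


def sbas_design_matrix(num_date, max_conn):
    """Design matrix of a synthetic small-baseline network (hypothetical fixture).

    Every date is paired with up to `max_conn` later dates. Each row is one
    interferogram with -1 at its first date and +1 at its second date. The
    column of the first date is dropped, fixing its displacement to zero.
    """
    pairs = [
        (i, j)
        for i in range(num_date)
        for j in range(i + 1, min(i + max_conn, num_date - 1) + 1)
    ]
    A = np.zeros((len(pairs), num_date), dtype=np.float64)
    for row, (i, j) in enumerate(pairs):
        A[row, i] = -1.0
        A[row, j] = 1.0
    return A[:, 1:], pairs


A, pairs = sbas_design_matrix(num_date=6, max_conn=2)

# Simulate noise-free observations from a known time series and invert them.
ts_true = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
y = A @ ts_true
ts_est, *_ = np.linalg.lstsq(A, y, rcond=None)

print("ifgrams:", len(pairs), "unknowns:", A.shape[1])
print("recovered:", np.round(ts_est, 6))
```

With a fully connected network the design matrix has full column rank, so the inversion recovers the simulated series; removing pairs until the network splits in two is a simple way to produce a rank-deficient fixture.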

Assessment against linked issues

| Issue | Objective | Addressed |
|---|---|---|
| #1489 | Add an opt-in CUDA-only PyTorch-based GPU backend for the invert_network step that accelerates per-pixel WLS/OLS inversion, while preserving existing CPU behavior as the default and keeping the original CPU code path unchanged. | ✅ |
| #1489 | Provide configuration and packaging hooks for the GPU backend, including template and CLI options to select the solver and GPU chunk size, an optional [gpu] extras dependency group for installing CUDA-enabled PyTorch, and fail-fast behavior when CUDA/PyTorch are unavailable (no silent CPU fallback). | ✅ |
| #1489 | Document the optional GPU backend (installation, enabling, tuning, behavior/performance notes) and add tests to validate numerical equivalence and behavior of the GPU path against the existing CPU implementation. | ✅ |

Possibly linked issues

RFC: opt-in GPU backend for invert_network (torch.linalg.lstsq, CUDA) #1489: They are the same feature: an opt-in CUDA torch-based GPU backend for invert_network, now fully implemented with config and docs.


s-sasaki-earthsea-wizard · 2026-05-18T15:23:58Z

Sharing additional benchmark data for the GPU torch solver. The PR description's Performance section reports per-step invert_network numbers on FernandinaSenDT128 and GalapagosSenDT128 (both ISCE2 / Sentinel-1 C-band). The bench has since been extended to 5 scenes spanning 4 InSAR processors (ISCE2, GMTSAR, ARIA, ROI_PAC) and 2 sensors / wavelengths (Sentinel-1 C-band, ALOS-1 L-band):

| Scene | Processor / Sensor | Pixels | Ifgs (K) | Dates (D) | `invert_network` cpu wall | torch wall | speedup |
|---|---|---|---|---|---|---|---|
| FernandinaSenDT128 | ISCE2 / S1 (C) | 270 k | 288 | 98 | 645.12 s | 6.88 s | 93.77× |
| GalapagosSenDT128 | ISCE2 / S1 (C) | 3.40 M | 490 | 98 | 2976.72 s | 79.40 s | 37.49× |
| SanFranBaySenD42 | GMTSAR / S1 (C) | 326 k | 1297 | 333 | 1080.38 s | 17.42 s | 62.02× |
| KujuAlosAT422F650 | ROI_PAC / ALOS-1 (L) | 226 k | 167 | 24 | 31.01 s | 4.53 s | 6.85× |
| SanFranSenDT42 | ARIA / S1 (C) | 1.04 M | 505 | 114 | 58.85 s | 11.07 s | 5.32× |

Configuration: warm SSD, NVIDIA RTX 5080 (16 GiB), float32, mintpy.networkInversion.solver = torch. Each run was end-to-end smallbaselineApp.py (full 18-step pipeline), not a direct call to the solver — wall numbers above are extracted from /usr/bin/time -v capture of the invert_network segment. The cpu-only steps in the same pipeline (load_data, modify_network, correct_SET, correct_troposphere, deramp, save_hdfeos5) stay within ±5 % between the cpu and torch runs of each scene, serving as an I/O / cache control.

The 5× to 94× range is driven largely by per-pixel solve cost (∝ K · D²): Kuju (K=167, D=24) has the smallest K · D² in the table and lands near the bottom of the range (6.85×), while SanFranBay (K=1297, D=333) has the largest and lands near the top (62.02×). The trend is not strict, since the extremes of the table are SanFranSenDT42 (5.32×) and Fernandina (93.77×). The Fernandina and Galapagos figures above are consistent with the ~16× internal / ~4.5× step-wall (Fernandina) and ~44× internal / ~36× step-wall (Galapagos) numbers in the original PR description; the larger headline here reflects the warm-SSD scene root and the per-step wall extracted from a fresh end-to-end run with both cpu and torch using identical fixtures.
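The K · D² proxy can be tabulated directly from the benchmark table above. The short script below uses only the K, D, and speedup values reported there; treating K · D² as the sole cost driver is the simplification stated in the text, and the script adds no new measurements.

```python
# (K ifgrams, D dates, measured speedup) copied from the benchmark table.
scenes = {
    "FernandinaSenDT128": (288, 98, 93.77),
    "GalapagosSenDT128": (490, 98, 37.49),
    "SanFranBaySenD42": (1297, 333, 62.02),
    "KujuAlosAT422F650": (167, 24, 6.85),
    "SanFranSenDT42": (505, 114, 5.32),
}

# Per-pixel cost proxy K * D^2 for each scene.
cost = {name: k * d ** 2 for name, (k, d, _) in scenes.items()}

# Print scenes from cheapest to most expensive per-pixel solve.
for name in sorted(cost, key=cost.get):
    speedup = scenes[name][2]
    print(f"{name:20s} K*D^2 = {cost[name]:>11,d}   speedup = {speedup:6.2f}x")
```

Sorting by the proxy puts Kuju first and SanFranBay last, while the measured speedups do not follow that order exactly, so other factors such as pixel count and I/O presumably also matter.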

Numerical agreement: the float32 round-off gate (rms / |cpu|.max < 1e-5) is met for the user-visible final products (velocity.h5, geocoded outputs) in all 5 scenes. Two scenes (Kuju, SanFranSF) show divergence on radar-coordinate intermediate products at rms/scale of 1 to 7 %, but Kuju's geocoded velocity (filtered through maskTempCoh.h5) passes at 1.38e-7, consistent with the divergence being confined to pixels that the downstream maskTempCoh.h5 mask drops anyway. The report diagnoses this as cpu scipy.linalg.lstsq min-norm fill vs torch cholesky_ex fill for near-rank-deficient masked-out pixels; making the radar-coord diff tool mask-aware is queued as a sibling-repo follow-up.
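The difference between a min-norm fill and a Cholesky-based fill can be seen on a tiny rank-deficient system. This sketch is an illustration under stated assumptions: it uses `np.linalg.lstsq` as a stand-in for `scipy.linalg.lstsq`, `np.linalg.cholesky` (which raises on failure) as a stand-in for `torch.linalg.cholesky_ex` info codes, and a zero fill as described in the behavior notes. The matrix is chosen so the failure is exact rather than dependent on rounding.

```python
import numpy as np

# Two unknowns, but every observation only constrains their sum: rank 1.
G = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [1.0, 1.0],
              [1.0, 1.0]])
y = np.array([2.0, 2.0, 2.0, 2.0])

# Generic least squares returns the minimum-norm solution of the family x1 + x2 = 2.
x_lstsq, _, rank, _ = np.linalg.lstsq(G, y, rcond=None)

# The normal matrix [[4, 4], [4, 4]] is singular, so Cholesky cannot factor it.
N = G.T @ G
try:
    np.linalg.cholesky(N)
    chol_ok = True
except np.linalg.LinAlgError:
    chol_ok = False

# Zero fill for a pixel whose factorization failed.
x_chol = np.zeros(2)

print("rank:", rank)
print("lstsq (min-norm):", x_lstsq)
print("cholesky succeeded:", chol_ok, "-> fill:", x_chol)
```

Both answers are defensible fills for a pixel that carries no usable information, but they differ numerically, so an unmasked difference between the two products can look large. For matrices that are only nearly singular, rounding can also let the factorization succeed with a poorly determined result, which is another way the two paths can disagree on such pixels.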

Full per-step wall breakdown, fixture parity verification (cpu and torch fixtures verified byte-identical except for the 2 mintpy.*.solver = torch lines), and the numerical comparison methodology are in the report:

→ reports/report_end_to_end_bench.md @ 0fbf71b

Dataset records (Zenodo):

FernandinaSenDT128 — https://zenodo.org/records/3952953
GalapagosSenDT128 — https://zenodo.org/records/4743058
SanFranBaySenD42 — https://zenodo.org/records/15814132
KujuAlosAT422F650 — https://zenodo.org/records/3952917
SanFranSenDT42 — https://zenodo.org/records/4265413

s-sasaki-earthsea-wizard added 3 commits May 6, 2026 18:52

yunjunz requested a review from huchangyang May 27, 2026 08:00

s-sasaki-earthsea-wizard wants to merge 3 commits into insarlab:main from s-sasaki-earthsea-wizard:gpu_torch_solver
