Context
After #172, the adjacency zarr will contain flowpath physical attributes (length_m, slope, top_width, side_slope, muskingum_x). Currently, these are read from the GeoPackage via gpd.read_file() in both geodataset constructors:
- lynker_hydrofabric.py:72-75 reads the layer flowpath-attributes-ml
- merit.py:71-74 reads the full shapefile
This creates a hard runtime dependency on the GeoPackage.
Proposal
Refactor both geodataset __init__ methods to read flowpath attributes from the adjacency zarr instead of the GeoPackage.
Changes in lynker_hydrofabric.py
- Replace gpd.read_file(self.cfg.data_sources.geospatial_fabric_gpkg, layer="flowpath-attributes-ml") with reads from self.conus_adjacency["length_m"], self.conus_adjacency["slope"], etc.
- self.flowpath_attr currently stores a GeoDataFrame indexed by id. Replace it with a dict/DataFrame built from zarr arrays indexed by order.
- Update _build_common_tensors to use the new data source.
- The phys_means computation (lines 77-84) should read from the same zarr arrays.
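The Lynker changes above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's code: load_flowpath_attrs, compute_phys_means, and ATTR_NAMES are hypothetical names, a plain dict of NumPy arrays stands in for the opened adjacency zarr group (both support group["name"][:]), and the NaN-aware mean is only a guess at what the phys_means computation does.

```python
import numpy as np

# Attribute names listed in this issue; the exact zarr layout after #172 is assumed.
ATTR_NAMES = ("length_m", "slope", "top_width", "side_slope", "muskingum_x")


def load_flowpath_attrs(adjacency, names=ATTR_NAMES):
    """Read each attribute array into memory, aligned with adjacency["order"]."""
    order = np.asarray(adjacency["order"][:])
    attrs = {}
    for name in names:
        values = np.asarray(adjacency[name][:], dtype=np.float64)
        if values.shape != order.shape:
            raise ValueError(f"{name} has shape {values.shape}, expected {order.shape}")
        attrs[name] = values
    return order, attrs


def compute_phys_means(attrs):
    """NaN-aware mean per attribute (assumed stand-in for the phys_means logic)."""
    return {name: float(np.nanmean(values)) for name, values in attrs.items()}


# Stand-in for the opened zarr group: a dict of arrays supports the same indexing.
fake_adjacency = {
    "order": np.array([101, 102, 103]),
    "length_m": np.array([1200.0, 800.0, np.nan]),
    "slope": np.array([0.001, 0.003, 0.002]),
    "top_width": np.array([10.0, 12.0, 14.0]),
    "side_slope": np.array([2.0, 2.0, 2.0]),
    "muskingum_x": np.array([0.2, 0.3, 0.25]),
}

order, flowpath_attr = load_flowpath_attrs(fake_adjacency)
phys_means = compute_phys_means(flowpath_attr)
print(phys_means)
```

Keeping the attributes as a dict of arrays that share the position of order means no id-based index has to be built in the constructor. A DataFrame would work equally well if downstream code expects one.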
Changes in merit.py
- Same pattern: replace gpd.read_file() with zarr reads.
- MERIT currently uses lengthkm and slope only. Other fields are empty/constant.
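Because MERIT only carries some of the attributes, the zarr read may need a fallback for fields that are absent. The sketch below shows one possible approach. read_attr_or_default is a hypothetical helper, the dict stands in for the MERIT adjacency zarr group, and the issue does not specify whether the missing fields are stored as empty arrays or omitted, so the "omitted" case handled here is an assumption.

```python
import numpy as np


def read_attr_or_default(adjacency, name, n, default=np.nan):
    """Return adjacency[name] as float64, or a constant array if it is absent."""
    if name in adjacency:
        return np.asarray(adjacency[name][:], dtype=np.float64)
    return np.full(n, default, dtype=np.float64)


# Stand-in for a MERIT adjacency group that only carries length and slope.
fake_merit = {
    "order": np.array([7, 8, 9]),
    "length_m": np.array([1500.0, 2300.0, 900.0]),
    "slope": np.array([0.004, 0.002, 0.001]),
}

n = len(fake_merit["order"])
length_m = read_attr_or_default(fake_merit, "length_m", n)
top_width = read_attr_or_default(fake_merit, "top_width", n)  # absent, so all NaN
print(length_m, top_width)
```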
Changes in _build_common_tensors
- Both implementations index into self.flowpath_attr using wb_ids (Lynker) or compressed_merit_ids (MERIT) via .reindex(). The refactored version should use array indexing into the zarr order array instead.
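One way to replace the .reindex() lookup with array indexing is to translate ids into positions within order and then gather from each attribute array. The sketch below assumes that order holds unique ids and is non-empty. positions_in_order and gather are hypothetical helpers, and filling missing ids with NaN is chosen only to mirror what .reindex() does.

```python
import numpy as np


def positions_in_order(order, ids):
    """Return the position of each id in `order`, or -1 where the id is absent."""
    order = np.asarray(order)
    ids = np.asarray(ids)
    sorter = np.argsort(order, kind="stable")
    # Clip so ids larger than every entry in `order` still give a valid index.
    candidate = np.clip(np.searchsorted(order, ids, sorter=sorter), 0, len(order) - 1)
    pos = sorter[candidate]
    return np.where(order[pos] == ids, pos, -1)


def gather(values, pos, fill=np.nan):
    """Index `values` by position, writing `fill` where pos == -1."""
    out = np.full(pos.shape, fill, dtype=np.float64)
    found = pos >= 0
    out[found] = np.asarray(values)[pos[found]]
    return out


order = np.array([30, 10, 20])               # stand-in for the zarr order array
length_m = np.array([300.0, 100.0, 200.0])   # attribute aligned with order
wb_ids = np.array([20, 30, 99])              # 99 is not present in order

pos = positions_in_order(order, wb_ids)
lengths = gather(length_m, pos)
print(pos, lengths)
```

Computing pos once per batch and reusing it for every attribute avoids repeating the id lookup for length_m, slope, and the other fields.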
Files to modify
- src/ddr/geodatazoo/lynker_hydrofabric.py
- src/ddr/geodatazoo/merit.py
Acceptance criteria
- gpd.read_file() for flowpath attributes is removed from both constructors
- No geopandas import needed for routing (only for plotting)
Depends on
Blocks