Skip to content

Water Level Import Reference.md

Kelsey Smuczynski edited this page Apr 17, 2026 · 4 revisions

Water Level Import Reference

This page documents the water-level CSV bulk-upload feature in detail: the import workflow, transaction model, field mappings, validation rules, and operational notes.

For a broader overview of all CSV import surfaces, see CSV Ingestion and Validation. To compare this importer with the well inventory importer, see Well Inventory Import Reference.


What This Importer Does

Both of these entry points call the same service: bulk_upload_water_levels(...).

  • CLI: oco water-levels bulk-upload --file <path>
  • API: POST /observation/groundwater-level/bulk-upload

The importer reads a CSV file, validates each row, finds an existing well by name, and then creates or updates:

  • field_event
  • field_activity
  • sample
  • observation
  • field_event_participant for non-empty staff columns
  • contact records for staff names when no matching name + organization row already exists

It also links each imported groundwater-level sample to the structured participant who matches measuring_person.

It does not create new wells.


High-Level Flow

The importer runs in this order:

  1. Read the CSV.
  2. Normalize header aliases.
  3. Check that all required headers are present.
  4. Validate each row.
  5. Look up the target well for each valid row.
  6. Create or update database records.
  7. Commit all successful rows at the end.

Rows with validation errors are collected in validation_errors. A bad row does not stop other rows from importing.


Workflow Overview

CSV Parsing and Header Standardization

The service accepts a file path, raw bytes, or a binary file-like object. CSV parsing uses Python's csv.DictReader.

Before validation, certain header aliases are mapped to their standard names:

Accepted alias Standard name
measurement_date_time water_level_date_time
sampler measuring_person
mp_height_ft mp_height

The following headers are silently ignored:

  • hold(not saved)
  • cut(not saved)

Any other unrecognized columns are also ignored — only known fields are processed.

If a row contains both an alias and its standard name, the importer keeps the later non-empty value in CSV column order. A later blank value does not overwrite an earlier non-blank one.

BOM warning: This importer does not handle UTF-8 files saved with a byte-order mark (utf-8-sig). Unlike the well inventory importer, a BOM will be left attached to the first header name, which will cause a missing-column error for that field.

Header validation

After aliases are normalized, the following required columns must be present:

  • well_name_point_id
  • field_event_date_time
  • field_staff
  • water_level_date_time
  • measuring_person
  • sample_method

Header validation runs before any data is written. If a required column is missing, the importer fails and produces file-level errors like CSV missing required column 'field_staff'.

Row validation

Each row goes through two validation phases:

Format validation (WaterLevelCsvRow):

  • parses timestamps and optional numeric fields
  • checks that enum fields contain allowed values
  • enforces cross-field constraints.

Database validation:

  • looks up well_name_point_id to confirm it matches exactly one water well
  • determines the effective measuring-point height
  • checks the corrected depth-to-water against the well depth.

Rows that fail validation are collected into validation_errors; other rows continue processing.

Record creation and update

For each valid row, the importer:

  1. Builds a sample name using a fixed pattern: <well.name>-WL-<UTC YYYYMMDDHHMM>.
  2. Checks for an existing groundwater-level sample for that well with the same sample name.
  3. Normalizes field_staff, field_staff_2, and field_staff_3 into an ordered participant list without changing the CSV input shape.
  4. Creates or updates records in field_event, field_activity, sample, and observation. Records are created if no match exists and updated if one does.
  5. Creates or reuses structured participant data for each non-empty staff value.
  6. Sets sample.field_event_participant_id to the unique participant matching measuring_person.

The groundwater-level Parameter is looked up once per file. If no matching parameter exists, the importer creates one with:

  • parameter_name = groundwater level
  • matrix = groundwater
  • parameter_type = Field Parameter
  • default_unit = ft

Every imported sample is stored with:

  • sample_matrix = groundwater
  • qc_type = Normal.

Well lookups are cached during the run, so repeated rows for the same well do not repeat the database query.

If measuring_person cannot be matched to exactly one participant for the field event, the importer rejects that row with a human-readable row error.


Response and exit behavior

The service returns a response with the following structure:

  • summary.total_rows_processed
  • summary.total_rows_imported
  • summary.validation_errors_or_warnings
  • water_levels
  • validation_errors

Exit behavior depends on the outcome:

Outcome CLI exit code API status
At least one row imported (even with other failures) 0 200 OK
Zero rows imported and errors present 1 400 Bad Request
Uploaded file is empty 400 Bad Request

Transaction Model

The water-level importer is not all-or-nothing at the row level, but it still uses one final commit for the file.

  • One SQLAlchemy session is opened for the entire file
  • Header and row validation run before any inserts or updates
  • All record writes run inside a single outer transaction
  • Each row gets its own savepoint (session.begin_nested()), so a failure on one row rolls back only that row
  • Successfully written rows stay pending in the outer transaction until a final session.commit()

Important consequence: other database sessions do not see imported rows until the final commit. If that final commit fails, all rows from that file roll back, including rows that had succeeded earlier.


Rerun Behavior

The water-level importer uses update-on-match behavior. This differs from the well inventory importer, which skips records that already exist.

How matches are identified: well name + activity type groundwater level + sample name <well.name>-WL-<UTC YYYYMMDDHHMM>

Because the sample name is derived from the measurement time converted to UTC, matching is UTC-based — not based on the raw timestamp string from the CSV.

When a matching sample is found:

  • The existing Sample record is reused
  • The linked FieldEvent date and notes are updated
  • FieldActivity.notes is updated
  • Structured participants are created or reused idempotently for the imported staff names
  • sample.field_event_participant_id is set or updated to the unique participant matching measuring_person
  • The groundwater-level Observation is updated if it exists
  • If the sample exists but has no groundwater-level observation, a new one is created without affecting other observations on that sample

If two rows in the same file share the same well and measurement time (after UTC conversion), they point to the same sample — the later row overwrites the values written by the earlier one.


Field Mapping Reference

CSV Column Schema Field DB Table DB Column Validation / Processing Write Behavior
well_name_point_id well_name_point_id thing name Required. Must resolve to exactly one water well. Lookup only — no well record is created or modified.
field_event_date_time field_event_date_time field_event event_date Required. ISO 8601 datetime. Timestamps without a timezone are assumed America/Denver, converted to UTC. Sets or updates the field-event timestamp.
field_staff field_staff contact, field_event_participant name, participant_role Normalized with additional staff fields into an ordered participant list. Resolves or creates a Contact for the staff name and creates or reuses a FieldEventParticipant with role Lead.
field_staff_2 field_staff_2 contact, field_event_participant name, participant_role Optional. Blank → None. Resolves or creates a Contact and creates or reuses a FieldEventParticipant with role Participant.
field_staff_3 field_staff_3 none none Optional. Blank → None. Resolves or creates a Contact and creates or reuses a FieldEventParticipant with role Participant.
water_level_date_time / measurement_date_time water_level_date_time sample sample_date Required. Alias via measurement_date_time. Timestamps without a timezone are assumed America/Denver, converted to UTC. Sets or updates the sample timestamp.
water_level_date_time / measurement_date_time water_level_date_time sample sample_name Same processing as above. Derives <well>-WL-<UTC YYYYMMDDHHMM>.
water_level_date_time / measurement_date_time water_level_date_time observation observation_datetime Same processing as above. Sets or updates the observation timestamp.
measuring_person / sampler measuring_person sample field_event_participant_id Required. Must match one of field_staff, field_staff_2, or field_staff_3. Alias via sampler. Links the sample to the unique structured participant matching the measuring person.
sample_method sample_method sample sample_method Required. Must be one of the allowed values (see below). Case-insensitive. Short aliases accepted (see below). Sets or updates the sample method.
sample_method sample_method sample sample_matrix n/a Always stored as groundwater.
sample_method sample_method sample qc_type n/a Always stored as Normal.
mp_height / mp_height_ft mp_height observation measuring_point_height Optional float. Alias via mp_height_ft. Falls back to the well's current measuring-point height if blank. Saves the measuring-point height used for the observation.
depth_to_water_ft depth_to_water_ft observation value Optional float. Must be >= 0 when present. Stored as the groundwater-level measurement value.
depth_to_water_ft depth_to_water_ft observation unit n/a Always stored as ft.
level_status level_status observation groundwater_level_reason Required when depth_to_water_ft is blank. Must be one of the allowed values; short aliases accepted (see below). Stored in a structured column, not as notes text.
data_quality data_quality observation nma_data_quality Optional. Must be one of the allowed values. Stored in a structured column, not as notes text.
water_level_notes water_level_notes field_event notes Optional. Blank → None. Stored as the field-event freeform notes only. Staff names are not written into note text.
water_level_notes water_level_notes sample notes Optional. Blank → None. Stored on the sample.
water_level_notes water_level_notes observation notes Optional. Blank → None. Stored on the observation.

The parameter table is also updated indirectly (see Record creation and update).

Tables not written by this workflow: note. The importer uses table note/notes columns, but it does not create rows in the polymorphic note table.


Validation Rules

Datetime handling

  • field_event_date_time and water_level_date_time accept ISO 8601 strings
  • Timestamps without a timezone are assumed to be America/Denver and converted to UTC before writing
  • water_level_date_time must be greater than or equal to field_event_date_time

Blank value handling

  • Optional text fields: blank becomes None.
  • Optional numeric fields: blank becomes None.
  • Required fields: blank after trimming is rejected.

Enum-backed fields

sample_method, level_status, and data_quality are validated against allowed-value lists built from lexicon categories in core/enums.py. Matching is case-insensitive and whitespace-tolerant. The standardized value is written to the database regardless of how it was entered in the CSV.

sample_method accepted values:

CSV value Stored as
electric tape Electric tape measurement (E-probe)
steel tape Steel-tape measurement

level_status accepted values:

CSV value Stored as
dry Site was dry
obstructed / obstruction Obstruction was encountered in the well (no level recorded)
flowing Site was flowing. Water level or head couldn't be measured w/out additional equipment.
flowing recently Site was flowing recently.
pumped Site was being pumped
pumped recently Site was pumped recently
not affected Water level not affected
other Other conditions exist that would affect the level (remarks)

Cross-field rules

  • At least one of depth_to_water_ft or level_status must be present per row
  • measuring_person must match one of field_staff, field_staff_2, or field_staff_3
  • measuring_person must resolve to exactly one structured participant for the field event; duplicate matching staff names on the same row cause the row to fail
  • depth_to_water_ft must be >= 0 when present — negative values are rejected

Well lookup

  • well_name_point_id must match exactly one water well — zero or multiple matches both produce a row-level error
  • Matched wells are cached by name for the duration of the file run

Measuring-point height and depth check

The measuring-point height used for an observation is determined in this order:

  1. CSV mp_height
  2. Existing Thing.measuring_point_height from the database
  3. None

If both the CSV and the well record have a measuring-point height and the values differ, the CSV value is used and a warning is added to validation_errors.

When depth_to_water_ft, a measuring-point height, and a well depth are all present, the importer checks:

depth_to_water_ft - measuring_point_height < well_depth

Rows that fail this check are rejected.


Summary and Warning Semantics

summary.validation_errors_or_warnings is not a simple count of entries in validation_errors.

The service counts the number of unique row numbers found in messages matching the pattern Row N: .... This means:

  • Multiple errors for the same row count as one
  • Warning rows (e.g., measuring-point height mismatches) are counted the same way as error rows
  • Messages that are not scoped to a specific row fall back to a simple message count

Both warnings and errors are reported in the same validation_errors array.


Operational Notes

File handling behavior

Behavior Detail
Extra columns Ignored
Column order Does not matter
Alias headers Accepted
Empty file (API) Rejected with 400 Uploaded file is empty before parsing
BOM (utf-8-sig) Not handled — causes a missing-column error for the first field
Duplicate headers Not explicitly rejected — behavior is undefined if the same standard column name appears more than once
Row count limit None enforced
Delimiter validation None — files using semicolons or tabs as delimiters will surface as missing required-column errors

Differences from the well inventory importer

Dimension Water level importer Well inventory importer
Rerun behavior Updates existing records Skips existing records
Transaction model Best-effort per row (savepoints) Best-effort per row (savepoints)
Creates wells or locations No — links to existing wells only Yes
Creates contacts or notes Reuses or creates contact rows for staff names. Does not create rows in the polymorphic note table. Creates contacts and note-backed well metadata.
Structured field participants Yes. Creates or reuses field_event_participant rows and sets sample.field_event_participant_id when measuring_person resolves uniquely. Yes for field_event_participant rows during well inventory import, but does not currently set sample.field_event_participant_id on the groundwater-level sample it creates.
BOM support No Yes (utf-8-sig)
Row count limit None 2,000
level_status / data_quality storage Structured columns Notes text

Known Gaps

  • The importer assumes measurements are depth-to-water values.
  • Elevation-based and pressure-transducer bulk uploads are not supported through this flow.
  • The bulk upload can create or update records, but it cannot delete them.
  • The CSV/schema contract still supports only three staff columns (field_staff, field_staff_2, field_staff_3), even though the importer now normalizes them internally into an iterable participant list.
  • Reruns update matching records in place, but there is no explicit policy for partial field changes beyond normal update behavior.

Clone this wiki locally