Skip to content

Redesign key quantity extraction/insertion pipeline#205

Open
sbreitbart-NOAA wants to merge 23 commits intomainfrom
redesign-kqs
Open

Redesign key quantity extraction/insertion pipeline#205
sbreitbart-NOAA wants to merge 23 commits intomainfrom
redesign-kqs

Conversation

@sbreitbart-NOAA
Copy link
Collaborator

Since we've redesigned the figures and tables to be more flexible on the user's end (i.e., they can more powerfully customize what their plots show by specifying plot arguments), the existing pipeline to extract key quantities and then insert them into the captions/alt text template doesn't work for several quantities anymore. Basically, key quantities need to reflect the users' data summarization, grouping specifications, filtering, etc.

To fix this, I've begun redesigning this pipeline so that key quantities are:

  1. extracted in each plotting function (rather than most at once with write_captions(), and then the remainders with add_more_key_quants())
  2. exported to a csv listing ALL key quantities
  3. inserted into a csv with captions and alt text for figures and tables.

This pipeline should also resolve issues of key quantity calculation dependency, where the calculation of one kq depends on the prior calculation of another. If they're all calculated and stored in a single place, all can be referenced for future calculations of other quantities.

This will also facilitate sharing the kqs with SIS.

Ultimately, write_captions() and add_more_key_quants() should become redundant and removed.

@Schiano-NOAA I've just made a pilot draft focused on plot_landings(), which was a simple example. Could you please take a look and let me know if this is on the right track?

@github-actions
Copy link
Contributor

New version checklist

  • Package version in DESCRIPTION has been updated
  • Release notes have been drafted/published
  • Cheatsheet content has been updated (if applicable)
  • Cheatsheet version has been updated


### Make RDA ----
if (make_rda) {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this chunk going to be unique to all plots? Like we won't be able to generalize this and make it more concise?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 102-108

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines 102-108 are where key quantities are calculated for this plot (and I'd expect this is where each plot/table's KQs will be calculated too). The next couple of functions are where the generalization happens: in the exporting of the KQs into a csv, and the insertion of the KQs into the alt text/captions csv.

@Schiano-NOAA
Copy link
Collaborator

Schiano-NOAA commented Mar 16, 2026

I think this is a good start! Minor questions/comments in the code base.

I also noticed that there is an extra space after the year quantity and it would be nice to have similar kqs grouped together. In the example below, the year kqs are not placed together.

image

Oh! Also, please add the new csv created to the gitignore

@Schiano-NOAA
Copy link
Collaborator

Schiano-NOAA commented Mar 16, 2026

Also, what is the use of the key_quantities.csv for the user/function? Could it be a temporary file? Maybe you can make it an excel file and use tabs or something? Just a thought in order to reduce the number of files a user has to keep track of.

@sbreitbart-NOAA
Copy link
Collaborator Author

Also, what is the use of the key_quantities.csv for the user/function? Could it be a temporary file? Maybe you can make it an excel file and use tabs or something? Just a thought in order to reduce the number of files a user has to keep track of.

It's a file where they can find their model's key quantities. I think it should be permanent, especially because it could be useful when sending information to SIS. They won't have to edit it unless they need to add or edit a key quantity, but I don't expect they'll need to do that very often

@sbreitbart-NOAA
Copy link
Collaborator Author

I think this is a good start! Minor questions/comments in the code base.

I also noticed that there is an extra space after the year quantity and it would be nice to have similar kqs grouped together. In the example below, the year kqs are not placed together.
image

Oh! Also, please add the new csv created to the gitignore

Thanks for your review! I'll group the KQs and remove that space.

…accurate when relative = TRUE; update plot_indices() documentation
Copy link
Collaborator

@Schiano-NOAA Schiano-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it functions well and is a nice adjustment to the extrraction and replacement of key quantities. I am highly advocating to use a function to extract the quantities instead of repeating the same extraction for each function. I think it could be done as such:

# data = either returned data frame from process_data or a ggplot
# x = name of x axis column
# y = name of y-axis column
# additional_columns = a string of names of columns that are also included within the plot
# topic = label found in the key quantities template to be replaced
extract_quantities <- function(data, x, y, additional_columns = NULL, topic = "label") {
  if (class(x)[1] == "gg" | class(x)[2] == "ggplot") {
    # this indicates that the plot is relative
    my_quantities <- list(
     glue::glue("{label}_{y}_min") = ggplot2::ggplot_build(final)@data[[2]] |>
        as.data.frame() |>
        dplyr::pull(y) |>
        min() |>
        round(digits = 2)
      glue::glue("{label}_{y}_max") = ggplot2::ggplot_build(final)@data[[2]] |>
        as.data.frame() |>
        dplyr::pull(y) |>
        max() |>
        round(digits = 2)
    )
  } else {
    # indicate plot not relative
    my_quantities <- list(
      glue::glue("{label}_{x}_min") = min(data[[x]])
      glue::glue("{label}_{x}_max")  = max(data[[x]])
      glue::glue("{label}_{y}_min")  = min(data[[y]])
      glue::glue("{label}_{y}_max") = max(data[[y]])
    )
    if (!is.null(additional_columns)) {
      for (i in seq_along(additional_columns) {
        append(my_quantities,
          glue::glue("{label}_{i}_min") = min(data[[i]])
          glue::glue("{label}_{i}_max") = max(data[[i]])
        )
      } # close for loop
    } # close if add col
  } # close ifelse
  # potentially add check if rounding is needed and perform the round(x, digits = 3)
  my_quantities 
} # close fxn

quantities <- extract_quantities(data, x = "age", y = "year", additional_columns = "estimate", topic = "pop.baa")

# calculate & export key quantities
# need to change fxn to accept list
export_kqs(quantities)
    
# Add key quantities to captions/alt text
# need to change fxn to accept list
insert_kqs(quantities)

Hopefully this is a start to this fxn to build off of. I have not tested it and wrote it in this comment, so apologies if it's a little wack 😆

Thanks for doing this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants