Update cloud instance type history ingestion process. #2160

Merged
eiffel777 merged 8 commits into ubccr:main from eiffel777:remove-groupby-cloud-staging
Mar 3, 2026

@eiffel777
Contributor

This PR does two things.

  1. Changes the ingestion steps that build the configuration history for an instance type. Now that we use a version of MariaDB that supports window functions, we can use those to create the history instead of doing it in PHP.
  2. Adds a time clause to the join in the staging_event ingestor that joins to the instance_type table, so each event gets the instance type configuration that was in effect when the event happened. This query has been slow because, without the time clause, it joins to every row with the same instance_type, cpus, memory_mb, and disk_gb, which can be a lot of rows in some cases.
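
The effect of the time clause in item 2 can be illustrated with a small, self-contained sketch. It uses Python's built-in sqlite3 module instead of MariaDB, and the `event` table, the `instance_type_id` column, the sample rows, and the `join_events` helper are all hypothetical; the real staging_event action may be structured differently.

```python
import sqlite3

def join_events(time_bounded):
    """Join events to instance type history, optionally bounded by time."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        -- Hypothetical history: the same configuration appears twice (ids 1 and 3).
        CREATE TABLE instance_type (
            instance_type_id INTEGER, instance_type TEXT, cpus INTEGER,
            memory_mb INTEGER, disk_gb INTEGER,
            start_time INTEGER, end_time INTEGER);
        CREATE TABLE event (
            event_id INTEGER, instance_type TEXT, cpus INTEGER,
            memory_mb INTEGER, disk_gb INTEGER, event_time INTEGER);
        INSERT INTO instance_type VALUES
            (1, 'm1.small', 1, 2048, 20, 100, 299),
            (2, 'm1.small', 2, 4096, 20, 300, 499),
            (3, 'm1.small', 1, 2048, 20, 500, 999);
        INSERT INTO event VALUES
            (10, 'm1.small', 1, 2048, 20, 150),
            (11, 'm1.small', 1, 2048, 20, 600);
    """)
    sql = """
        SELECT e.event_id, it.instance_type_id
        FROM event e
        JOIN instance_type it
          ON it.instance_type = e.instance_type
         AND it.cpus = e.cpus
         AND it.memory_mb = e.memory_mb
         AND it.disk_gb = e.disk_gb
    """
    if time_bounded:
        # Keep only the history row in effect when the event happened.
        sql += " AND e.event_time BETWEEN it.start_time AND it.end_time"
    sql += " ORDER BY e.event_id, it.instance_type_id"
    rows = con.execute(sql).fetchall()
    con.close()
    return rows

print(join_events(time_bounded=False))  # every history row with a matching configuration
print(join_events(time_bounded=True))   # one history row per event
```

In this toy data each event matches two history rows without the time clause and exactly one with it, which is the row multiplication the description attributes the slowness to.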

A couple of notes.

  • instance_type_change_flag.json - Uses LAG to set a 1 or 0 denoting whether that row is where a change in configuration occurred.
  • instance_type_config_group.json - Uses the is_change column from modw_cloud.instance_type_change_flag and a window function to mark each distinct configuration for an instance type.
  • instance_type_grouped.json - Groups all instance types by config group to get the MIN start time for each configuration. Uses MAX on the display and description columns to comply with the ONLY_FULL_GROUP_BY mode. Since these columns should have the same value within each group, MAX should always return the correct value.
  • instance_type_staging.json - Sets the end time for each instance type configuration using LEAD.
  • The CloudInstanceTypeStateIngestor ingestor and its associated test were deleted because they are no longer needed.
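
The four steps in the notes above can be sketched end to end as a single query. This is only an illustration under stated assumptions: it runs on Python's built-in sqlite3 module (SQLite 3.25 or later for window functions) rather than MariaDB, and the `raw_instance_type` table, the `cfg` column, the sample rows, and the `build_history` helper are hypothetical. The actual action files in this PR may differ in detail.

```python
import sqlite3

# Hypothetical raw rows: (resource_id, instance_type, cpus, memory_mb, disk_gb, start_time)
SAMPLE_ROWS = [
    (1, "m1.small", 1, 2048, 20, 100),
    (1, "m1.small", 1, 2048, 20, 200),  # same configuration, no change
    (1, "m1.small", 2, 4096, 20, 300),  # configuration changes
    (1, "m1.small", 2, 4096, 20, 400),
    (1, "m1.small", 1, 2048, 20, 500),  # changes back to the first configuration
]

HISTORY_SQL = """
WITH change_flag AS (          -- step 1: LAG marks rows where the configuration changed
    SELECT *,
           CASE WHEN LAG(cpus) OVER w = cpus
                 AND LAG(memory_mb) OVER w = memory_mb
                 AND LAG(disk_gb) OVER w = disk_gb
                THEN 0 ELSE 1 END AS is_change
    FROM raw_instance_type
    WINDOW w AS (PARTITION BY resource_id, instance_type ORDER BY start_time)
),
config_group AS (              -- step 2: running sum of is_change numbers each configuration
    SELECT *,
           SUM(is_change) OVER (PARTITION BY resource_id, instance_type
                                ORDER BY start_time) AS cfg
    FROM change_flag
),
grouped AS (                   -- step 3: one row per configuration with its earliest start
    SELECT resource_id, instance_type, cfg,
           MAX(cpus) AS cpus, MAX(memory_mb) AS memory_mb, MAX(disk_gb) AS disk_gb,
           MIN(start_time) AS start_time
    FROM config_group
    GROUP BY resource_id, instance_type, cfg
)
SELECT instance_type, cpus, memory_mb, disk_gb, start_time,   -- step 4: LEAD finds the next start
       LEAD(start_time) OVER (PARTITION BY resource_id, instance_type
                              ORDER BY start_time) AS next_start
FROM grouped
ORDER BY resource_id, instance_type, start_time
"""

def build_history(rows):
    """Return one row per configuration period, with the start of the next period."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE raw_instance_type (resource_id INTEGER, instance_type TEXT, "
        "cpus INTEGER, memory_mb INTEGER, disk_gb INTEGER, start_time INTEGER)"
    )
    con.executemany("INSERT INTO raw_instance_type VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = con.execute(HISTORY_SQL).fetchall()
    con.close()
    return result

for row in build_history(SAMPLE_ROWS):
    print(row)
```

The sample data returns to its first configuration at time 500. A plain GROUP BY on the configuration columns would merge the first and last periods, whereas the change flag plus running sum keeps them as separate history rows.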

Tests performed

Tested in docker

Checklist:

  • The pull request description is suitable for a Changelog entry
  • The milestone is set correctly on the pull request
  • The appropriate labels have been added to the pull request

@eiffel777 eiffel777 added this to the 11.5.0 milestone Jan 29, 2026
@eiffel777 eiffel777 self-assigned this Jan 29, 2026
@eiffel777 eiffel777 added Category:ETL Extract Transform Load Category:Cloud Cloud Realm labels Jan 29, 2026
  "disk_gb": "staging.disk_gb",
  "start_time": "staging.start_time",
- "end_time": -1
+ "end_time": "CASE WHEN LEAD(staging.start_time) OVER (PARTITION BY staging.resource_id, staging.instance_type ORDER BY staging.start_time) IS NOT NULL THEN LEAD(staging.start_time) OVER (PARTITION BY staging.resource_id, staging.instance_type ORDER BY staging.start_time) - 0.000001 ELSE UNIX_TIMESTAMP(DATE_ADD(TIMESTAMP(CURDATE()), INTERVAL '23:59:59' HOUR_SECOND)) END"
Member

@jpwhite4 jpwhite4 Mar 2, 2026

What is the data type of end_time? The - 0.000001 seems like an unusual offset to apply, since the ELSE branch is a UNIX_TIMESTAMP(), which is going to be to the nearest INT.

Member

What are the semantics of end_time? Is it the closed or open end of the interval?

Contributor Author

@jpwhite4 The data type for end_time is decimal(16,6), hence the 6 decimal places. This is because the event time in the cloud log files is given to 6 decimal places. Also, end_time is the closed end of the interval.
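
To make the closed-interval point concrete, here is a minimal sketch of how the end times line up when each one is the next start time minus one microsecond. It uses Python's decimal module to mimic decimal(16,6) arithmetic; the `closed_intervals` helper and the sample timestamps are hypothetical, and the final end value simply stands in for the end-of-day UNIX_TIMESTAMP in the ELSE branch.

```python
from decimal import Decimal

# Smallest step representable in a decimal(16,6) column.
MICROSECOND = Decimal("0.000001")

def closed_intervals(start_times, final_end):
    """Pair each start with a closed end: next start minus one microsecond."""
    starts = sorted(Decimal(s) for s in start_times)
    intervals = []
    for i, start in enumerate(starts):
        if i + 1 < len(starts):
            # Mirrors LEAD(start_time) - 0.000001 in the ingestion query.
            end = starts[i + 1] - MICROSECOND
        else:
            # Last configuration: no next start, so fall back to a fixed end.
            end = Decimal(final_end)
        intervals.append((start, end))
    return intervals

intervals = closed_intervals(
    ["1700000000.250000", "1700003600.500000"], "1700092799"
)
for start, end in intervals:
    print(start, end)
```

Because both ends are inclusive, subtracting one microsecond keeps adjacent intervals from sharing a timestamp, so an event at exactly the second start time matches only the second configuration.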

@eiffel777 eiffel777 merged commit 03d2286 into ubccr:main Mar 3, 2026
5 checks passed
@eiffel777 eiffel777 modified the milestones: 11.5.0, ACCESS 11.0.2 p4 Mar 27, 2026
aaronweeden pushed a commit to aaronweeden/xdmod that referenced this pull request Mar 30, 2026
…ging

Update cloud instance type history ingestion process.
aaronweeden added a commit that referenced this pull request Mar 31, 2026
* Merge pull request #2072 from eiffel777/add-memory-instance-state-machine

Add memory to cloud instance type ingestor sorting to prevent unique key errors

* Merge pull request #2084 from eiffel777/add-straight-join-metrics-explorer

Adding STRAIGHT_JOIN to metrics explorer/usage tab queries to improve performance

* Merge pull request #2160 from eiffel777/remove-groupby-cloud-staging

Update cloud instance type history ingestion process.

---------

Co-authored-by: Greg Dean <gmdean@buffalo.edu>
@aaronweeden aaronweeden modified the milestones: ACCESS 11.0.2 p4, 11.5.0 Apr 2, 2026
Labels

Category:Cloud Cloud Realm Category:ETL Extract Transform Load

3 participants