feat(nimbus): Add enrollment alert data GCS reader and Celery fetch task#15035
feat(nimbus): Add enrollment alert data GCS reader and Celery fetch task#15035yashikakhurana wants to merge 3 commits intomainfrom
Conversation
| "fetch_monitoring_data": { | ||
| "task": "experimenter.jetstream.tasks.fetch_monitoring_data", | ||
| "schedule": crontab(minute=0, hour=8), | ||
| "options": {"expires": 3600}, |
There was a problem hiding this comment.
What's this expiration option do?
There was a problem hiding this comment.
timeout is 1 hour for this request
There was a problem hiding this comment.
Is this useful here? We don't use it for any of the other tasks, and I can't really tell from the Celery docs what scenario this parameter is trying to handle.
If we do want this, I think a more appropriate timeout would be ~5min. Were you seeing really long execution times in testing this? Or is there another reason this should be 1hr?
| }, | ||
| "fetch_monitoring_data": { | ||
| "task": "experimenter.jetstream.tasks.fetch_monitoring_data", | ||
| "schedule": crontab(minute=0, hour=8), |
There was a problem hiding this comment.
This is fine, but be aware that there will be times when the data isn't available by 8 (esp if this is UTC, which I think it is?). It might be nice to have a way to trigger this manually in those cases, but I think it's ok to worry about it later if it happens enough to be worth fixing.
There was a problem hiding this comment.
makes sense, let me note down it somewhere so that we don't miss it
There was a problem hiding this comment.
It doesn't look like we're actually using these constants yet, can you wait to include them on the PR where they are actually needed?
There was a problem hiding this comment.
okay I can remove these, just added as was prepping for the next task
| except Exception as e: | ||
| logger.error(f"Failed to load monitoring data from GCS: {e}") | ||
| return {} |
There was a problem hiding this comment.
I think we should just raise this exception and let the task handle it along with any logging. That fits with what we're doing in the other tasks, and it also doesn't seem useful to overwrite yesterday's data with blank data if there was an error getting the new data.
| logger.warning("No enrollment alert data found in GCS") | ||
| metrics.incr("fetch_monitoring_data.completed") |
There was a problem hiding this comment.
Should this be an error log and a .failed metric status?
|
|
||
| self.assertEqual(result, {}) | ||
|
|
||
| @patch("experimenter.jetstream.client.load_data_from_gcs") |
There was a problem hiding this comment.
I think you can patch these for everything without annotating every function by creating a fixture like this, and having it take a parameter to set the return value dynamically.
| result = get_monitoring_data() | ||
|
|
||
| self.assertEqual(result, {}) |
There was a problem hiding this comment.
Yea see my other comment on this, but I really think this should result in an exception from get_monitoring_data that is handled by the task.
| ): | ||
| mock_get.return_value = experiment | ||
| # Should not raise, should log and continue | ||
| tasks.fetch_monitoring_data() |
There was a problem hiding this comment.
I'm not totally sure this test is necessary, but I guess it ensures that a random exception doesn't break the task?
Can we test for something here, like that the status or log occurs?
| try: | ||
| experiment = NimbusExperiment.objects.get( | ||
| slug=exp_slug, | ||
| status=NimbusConstants.Status.LIVE, |
There was a problem hiding this comment.
This should work for COMPLETE also, no?
| @patch("experimenter.jetstream.tasks.get_monitoring_data") | ||
| def test_fetch_monitoring_data_updates_live_experiment(self, mock_get_data): | ||
| experiment = NimbusExperimentFactory.create( | ||
| status=NimbusExperiment.Status.LIVE, |
There was a problem hiding this comment.
Maybe parametrize this so it takes both LIVE and COMPLETE.
Co-authored-by: Mike Williams <102263964+mikewilli@users.noreply.github.com>
Because
This commit
monitoring_datafieldFixes #15024 #15036