We find that with our current deployment, HTTP GET requests to the /metrics endpoint fail whenever mtail exports more than 70k metrics, even though the scrape time is under 2.5 seconds. Nothing shows up in the logs and nothing else looks wrong: the request gets a failed response and the connection is closed after 2 seconds. Reducing the metric count appears to restore normal operation.
However, we need more metrics.