Hello,
I am trying to scrape metrics from my Juju controller charm with OpenTelemetry Collector (deployed in my controller model) and then remote-write them to a COS Lite instance over a cross-model relation (CMR). The problem is that the prometheus_scrape_unit_address the controller writes to the relation databag with Otelcol does not seem to be reachable by the collector. In fact, that host doesn't seem to exist on my machine.
Here is my status:
    Model       Controller  Cloud/Region  Version  SLA          Timestamp
    controller  ck8s        ck8s          3.6.11   unsupported  17:31:13-05:00

    SAAS  Status  Store  URL
    prom  active  ck8s   admin/cos-lite.prom

    App         Version  Status  Scale  Charm                        Channel     Rev  Address        Exposed  Message
    controller           active      1  juju-controller              3.6/stable  116                 yes
    otel        0.130.1  active      1  opentelemetry-collector-k8s                2  10.152.183.77  no

    Unit           Workload  Agent  Address     Ports      Message
    controller/0*  active    idle   10.1.0.71   37017/TCP
    otel/0*        active    idle   10.1.0.184

    Offer       Application  Charm            Rev  Connected  Endpoint          Interface          Role
    controller  controller   juju-controller  116  1/1        metrics-endpoint  prometheus_scrape  provider

    Integration provider         Requirer                Interface                Type     Message
    controller:metrics-endpoint  otel:metrics-endpoint   prometheus_scrape        regular
    prom:receive-remote-write    otel:send-remote-write  prometheus_remote_write  regular
The unit IP for the controller, as you can see, is 10.1.0.71. I expect to see the same IP as a scrape target in the Otelcol config. However, in the config, the relevant config block shows:
    prometheus/metrics-endpoint/otel/0:
      config:
        scrape_configs:
        - basic_auth:
            password: foobar
            username: foobar
          job_name: juju_controller_57d71968_controller_prometheus_scrape-0
          metrics_path: /introspection/metrics
          relabel_configs:
          - regex: (.*)
            separator: _
            source_labels:
            - juju_model
            - juju_model_uuid
            - juju_application
            - juju_unit
            target_label: instance
          scheme: https
          scrape_interval: 1m
          scrape_timeout: 10s
          static_configs:
          - labels:
              juju_application: controller
              juju_charm: juju-controller
              juju_model: controller
              juju_model_uuid: 57d71968-7396-4b74-8afb-c935e3770802
              juju_unit: controller/0
            targets:
            - 10.1.0.241:17070
          tls_config:
            ca_file: /etc/ssl/certs/otel_juju_controller_57d71968_controller_prometheus_scrape_0_ca.pem
            insecure_skip_verify: false
            server_name: juju-apiserver
Otelcol fails to scrape the controller because 10.1.0.241 is not actually a valid target. If I retry with 10.1.0.71, it works.
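For anyone who wants to reproduce the reachability difference from inside the collector pod, here is a minimal sketch of a plain TCP connect check. The helper name can_connect is hypothetical, and it only tests whether the port accepts connections; it does not perform the TLS handshake or the basic-auth scrape.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


# Example usage with the addresses from the status output and scrape config:
#   can_connect("10.1.0.71", 17070)
#   can_connect("10.1.0.241", 17070)
```

A False result for 10.1.0.241 alongside True for 10.1.0.71 would match the behaviour described above.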
The relevant lines of code seem to be in juju-controller/src/charm.py (lines 87 to 106 in abb5f63):
    metrics_endpoint = MetricsEndpointProvider(
        self,
        jobs=[{
            "metrics_path": "/introspection/metrics",
            "scheme": "https",
            "static_configs": [{
                "targets": [
                    f'*:{api_port}'
                ]
            }],
            "basic_auth": {
                "username": f'user-{username}',
                "password": password,
            },
            "tls_config": {
                "ca_file": self.ca_cert(),
                "server_name": "juju-apiserver",
            },
        }],
    )
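As far as I understand (I have not confirmed this against the library source), the wildcard host in the job above is replaced on the requirer side with whatever address the unit published as prometheus_scrape_unit_address. The sketch below only illustrates that substitution; expand_wildcard_targets is a hypothetical name, not a function from the prometheus_scrape library.

```python
def expand_wildcard_targets(targets, unit_address):
    """Replace a '*' host in each target with the unit's published address."""
    expanded = []
    for target in targets:
        host, sep, port = target.rpartition(":")
        if sep and host == "*":
            # "*:17070" becomes "<unit_address>:17070".
            expanded.append(f"{unit_address}:{port}")
        elif target == "*":
            # A bare wildcard with no port becomes the address alone.
            expanded.append(unit_address)
        else:
            # Explicit targets are passed through unchanged.
            expanded.append(target)
    return expanded
```

With the published address 10.1.0.241, the target "*:17070" would expand to 10.1.0.241:17070, which is the value that shows up in the Otelcol config.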
10.1.0.241 seems to be coming from juju-controller/lib/charms/prometheus_k8s/v0/prometheus_scrape.py (lines 1543 to 1567 in abb5f63):
    def _set_unit_ip(self, _=None):
        """Set unit host address.

        Each time a metrics provider charm container is restarted it updates its own
        host address in the unit relation data for the prometheus charm.

        The only argument specified is an event, and it ignored. This is for expediency
        to be able to use this method as an event handler, although no access to the
        event is actually needed.
        """
        for relation in self._charm.model.relations[self._relation_name]:
            unit_ip = str(self._charm.model.get_binding(relation).network.bind_address)

            # TODO store entire url in relation data, instead of only select url parts.

            if self.external_url:
                parsed = urlparse(self.external_url)
                unit_address = parsed.hostname
                path = parsed.path
            elif self._is_valid_unit_address(unit_ip):
                unit_address = unit_ip
                path = ""
            else:
                unit_address = socket.getfqdn()
                path = ""
but why is the bind address not reachable?
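To make the fallback order in that snippet easier to reason about, here is a standalone sketch of the same three-way choice. The names pick_unit_address and is_valid_ip are hypothetical, and is_valid_ip is only my assumption of what _is_valid_unit_address checks (that the string parses as an IP address).

```python
import ipaddress
import socket
from typing import Tuple
from urllib.parse import urlparse


def is_valid_ip(value: str) -> bool:
    """Assumed stand-in for _is_valid_unit_address: does value parse as an IP?"""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def pick_unit_address(bind_address: str, external_url: str = "") -> Tuple[str, str]:
    """Return (address, path) using the same precedence as the snippet above."""
    if external_url:
        # 1. An explicit external URL wins.
        parsed = urlparse(external_url)
        return parsed.hostname or "", parsed.path
    if is_valid_ip(bind_address):
        # 2. Otherwise the binding's bind address is published as-is.
        return bind_address, ""
    # 3. Otherwise fall back to the host's FQDN.
    return socket.getfqdn(), ""
```

If this reading is right, any syntactically valid bind address is published without a reachability check, so 10.1.0.241 would be whatever the relation's network binding reported rather than something the library computed.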
I'd appreciate any thoughts or ideas here.
Also, why does the charm pass a wildcard target (*:{api_port}) here instead of the unit IP?
Related issues:
ca_file not properly configured: canonical/opentelemetry-collector-k8s-operator#132