On platform BareMetal, UserManaged load balancer configuration is not rendered in AgentClusterInstall #10108

@mjovanovic0

When deploying an OpenShift cluster with platform: baremetal, you cannot use the same VIP address for both API and Ingress, even when setting loadBalancer: type: UserManaged.

For example, the following snippet in install-config.yaml:

platform:
  baremetal:
    apiVIPs:
      - 10.0.10.100
    ingressVIPs:
      - 10.0.10.100

produces a validation error:

$ openshift-install-linux agent create image
INFO Configuration has 3 master replicas, 0 arbiter replicas, and 4 worker replicas 
ERROR failed to write asset (Agent Installer ISO) to disk: cannot generate ISO image due to configuration errors 
FATAL failed to fetch Agent Installer ISO: failed to load asset "Install Config": invalid install-config configuration: platform.baremetal.apiVIPs: Invalid value: "10.0.10.100": VIP for API must not be one of the Ingress VIPs

This is expected. But upon further investigation of the installer code:
https://github.com/openshift/installer/blob/release-4.20/pkg/types/validation/installconfig.go#L971

If you update the snippet in install-config.yaml to:

platform:
  baremetal:
    loadBalancer:
      type: UserManaged
    apiVIPs:
      - 10.0.10.100
    ingressVIPs:
      - 10.0.10.100

Then the installer successfully generates the agent.x86_64.iso file, which you can boot on the node(s).

$ openshift-install-linux agent create image
INFO Configuration has 3 master replicas, 0 arbiter replicas, and 4 worker replicas 
INFO The rendezvous host IP (node0 IP) is 10.0.10.10
INFO Extracting base ISO from release payload     
INFO Verifying cached file                        
INFO Using cached Base ISO /root/.cache/agent/image_cache/coreos-x86_64.iso 
INFO Consuming Install Config from target directory 
INFO Consuming Agent Config from target directory 
INFO Generated ISO at agent.x86_64.iso.   

But when that ISO image is booted on the node, the assisted-service reports the following validation violation:

  • The IP address "10.0.10.100" appears both in apiVIPs and ingressVIPs

https://github.com/openshift/assisted-service/blob/c616cdc/internal/network/machine_network_cidr.go#L254

Upon further digging in the assisted-service codebase, there are code paths for a UserManaged load balancer that do not enforce different VIPs for API and Ingress.
https://github.com/openshift/assisted-service/blob/c616cdc/internal/cluster/validations/validations.go#L525

It checks the file /etc/assisted/manifests/agent-cluster-install.yaml on the node to see if

apiVersion: extensions.hive.openshift.io/v1beta1
kind: AgentClusterInstall
spec:
...
  platformType: BareMetal
  loadBalancer:
    type: UserManaged

is defined inside, but that stanza is missing even though it was defined in the initial install-config.yaml (probably not copied when the installer renders the manifests during ISO image creation).
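
To make the expected behavior concrete, here is a minimal Go sketch of the rule as I understand it: overlapping VIPs are rejected unless the load balancer type is UserManaged. This is not the assisted-service code. The type, the function name validateVIPs, and the ClusterManaged constant are hypothetical and only illustrate the logic.

```go
package main

import "fmt"

// LoadBalancerType is a hypothetical stand-in for the load balancer type
// carried in the AgentClusterInstall spec.
type LoadBalancerType string

const (
	// ClusterManaged is an assumed name for the default (non-user-managed) mode.
	ClusterManaged LoadBalancerType = "ClusterManaged"
	// UserManaged matches the value used in install-config.yaml.
	UserManaged LoadBalancerType = "UserManaged"
)

// validateVIPs returns an error when an address is listed in both apiVIPs
// and ingressVIPs, unless the load balancer is user managed.
func validateVIPs(apiVIPs, ingressVIPs []string, lb LoadBalancerType) error {
	if lb == UserManaged {
		// With an external load balancer, sharing one VIP is allowed.
		return nil
	}
	seen := make(map[string]struct{}, len(apiVIPs))
	for _, vip := range apiVIPs {
		seen[vip] = struct{}{}
	}
	for _, vip := range ingressVIPs {
		if _, ok := seen[vip]; ok {
			return fmt.Errorf("the IP address %q appears both in apiVIPs and ingressVIPs", vip)
		}
	}
	return nil
}

func main() {
	vips := []string{"10.0.10.100"}
	// Without the loadBalancer stanza, the default mode applies and the check fails.
	fmt.Println(validateVIPs(vips, vips, ClusterManaged))
	// With the stanza present, the same VIPs pass.
	fmt.Println(validateVIPs(vips, vips, UserManaged))
}
```

Because the stanza never reaches the manifest, the service presumably falls back to the default mode, which would explain the violation above.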

It seems that the copy of the load balancer definition into the AgentClusterInstall manifest is missing at this location:
https://github.com/openshift/installer/blob/release-4.20/pkg/asset/agent/manifests/agentclusterinstall.go#L307
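
A rough Go sketch of the kind of change I have in mind follows. All struct and function names here (BareMetalPlatform, AgentClusterInstallSpec, renderSpec) are simplified, hypothetical stand-ins and not the real installer or hive types. The actual fix would need to use the existing types in agentclusterinstall.go.

```go
package main

import "fmt"

// LoadBalancer is a hypothetical, simplified version of the loadBalancer stanza.
type LoadBalancer struct {
	Type string
}

// BareMetalPlatform is a hypothetical stand-in for platform.baremetal
// in install-config.yaml.
type BareMetalPlatform struct {
	APIVIPs      []string
	IngressVIPs  []string
	LoadBalancer *LoadBalancer
}

// AgentClusterInstallSpec is a hypothetical stand-in for the rendered
// AgentClusterInstall spec.
type AgentClusterInstallSpec struct {
	PlatformType string
	APIVIPs      []string
	IngressVIPs  []string
	LoadBalancer *LoadBalancer
}

// renderSpec builds the manifest spec from the platform section and,
// unlike the behavior reported here, also carries over the load balancer.
func renderSpec(p BareMetalPlatform) AgentClusterInstallSpec {
	spec := AgentClusterInstallSpec{
		PlatformType: "BareMetal",
		APIVIPs:      p.APIVIPs,
		IngressVIPs:  p.IngressVIPs,
	}
	if p.LoadBalancer != nil {
		// Copy the value so the manifest does not alias the install-config.
		spec.LoadBalancer = &LoadBalancer{Type: p.LoadBalancer.Type}
	}
	return spec
}

func main() {
	spec := renderSpec(BareMetalPlatform{
		APIVIPs:      []string{"10.0.10.100"},
		IngressVIPs:  []string{"10.0.10.100"},
		LoadBalancer: &LoadBalancer{Type: "UserManaged"},
	})
	fmt.Println(spec.PlatformType, spec.LoadBalancer.Type)
}
```

The nil check keeps the manifest unchanged for install configs that do not set loadBalancer at all.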

Version of openshift-installer:

openshift-install-linux 4.20.1
built from commit e23807689ec464da30e771dda70fd8989680a011
release image quay.io/openshift-release-dev/ocp-release@sha256:cbde13fe6ed4db88796be201fbdb2bbb63df5763ae038a9eb20bc793d5740416
release architecture amd64

Version of assisted-service:
openshift/assisted-service@c616cdc

Also, I'm willing to create a PR for this if it is accepted as a bug/improvement.

Labels: lifecycle/stale (denotes an issue or PR that has remained open with no activity and has become stale)