Struggling with a new Rancher HA RKE2 install, getting a “404 Not Found” page. How can I troubleshoot this?

I’ve never installed Rancher before, but I’m attempting to set up Rancher on an on-prem HA RKE2 cluster. An F5 serves as the load balancer and handles ports 80, 443, 6443, and 9345. A DNS record, rancher-demo.localdomain.local, points to the load balancer’s IP address. I want to provide my own certificate files, so I created a certificate with our internal CA.

The cluster itself is up and working. When I ran the install on the nodes after the first one, they used the DNS name that points to the LB IP, so I know that part of the LB works.

```
kubectl get nodes
NAME                             STATUS   ROLES                       AGE   VERSION
rancher0001.localdomain.local    Ready    control-plane,etcd,master   25h   v1.26.12+rke2r1
rancher0002.localdomain.local    Ready    control-plane,etcd,master   25h   v1.26.12+rke2r1
rancher0003.localdomain.local    Ready    control-plane,etcd,master   25h   v1.26.12+rke2r1
```
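Joining the other nodes through the DNS name shows that the RKE2 ports on the F5 answer, but the browser goes through 80 and 443. A minimal sketch for checking all four forwarded ports through the same name; `check_ports` is a hypothetical helper, and it assumes bash (for its `/dev/tcp` pseudo-device) and coreutils `timeout` are installed where it runs:

```shell
# Hypothetical helper: report whether each TCP port on HOST accepts a connection.
# The connect happens in a child bash because /dev/tcp is a bash-only feature.
check_ports() {
  local host="$1" port
  shift
  for port in "$@"; do
    # Open a TCP socket on fd 3; give up after 3 seconds (e.g. silently dropped SYNs).
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
      echo "$port open"
    else
      echo "$port unreachable"
    fi
  done
}

# Usage (the four ports the F5 forwards):
# check_ports rancher-demo.localdomain.local 80 443 6443 9345
```

An open 443 only shows that something accepts TCP connections; it does not show which backend answers, so this narrows the search rather than settling it.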

Before installing Rancher, I ran the following commands:

```
kubectl create namespace cattle-system
kubectl -n cattle-system create secret tls tls-rancher-ingress --cert=~/tls.crt --key=~/tls.key
kubectl -n cattle-system create secret generic tls-ca --from-file=cacerts.pem=~/cacerts.pem
```
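Before the files go into the cluster as secrets, it may be worth confirming that they agree with each other and with the hostname passed to Helm. A minimal sketch using the `openssl` CLI (1.1.1 or newer, for `-ext`); `check_rancher_cert` is a hypothetical helper, and it assumes `cacerts.pem` holds the CA chain that signed `tls.crt` and that `tls.key` is not passphrase-protected:

```shell
# Hypothetical helper: sanity-check a cert/key/CA trio for a given hostname.
# Prints "OK" or the first problem found; returns non-zero on failure.
check_rancher_cert() {
  local cert="$1" key="$2" ca="$3" host="$4"

  # 1. The certificate must chain to the CA bundle that goes into the tls-ca secret.
  if ! openssl verify -CAfile "$ca" "$cert" >/dev/null 2>&1; then
    echo "FAIL: $cert does not verify against $ca"; return 1
  fi

  # 2. The private key must belong to the certificate: compare the public keys.
  if [ "$(openssl x509 -in "$cert" -noout -pubkey)" != "$(openssl pkey -in "$key" -pubout 2>/dev/null)" ]; then
    echo "FAIL: $key does not match $cert"; return 1
  fi

  # 3. The hostname must appear as an exact DNS entry in the Subject Alternative Name.
  if ! openssl x509 -in "$cert" -noout -ext subjectAltName 2>/dev/null \
      | tail -n +2 | tr ',' '\n' | sed 's/^ *//' | grep -Fxq "DNS:$host"; then
    echo "FAIL: SAN does not contain DNS:$host"; return 1
  fi

  echo "OK"
}

# Usage:
# check_rancher_cert ~/tls.crt ~/tls.key ~/cacerts.pem rancher-demo.localdomain.local
```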

Finally, I installed Rancher:

```
helm install rancher rancher-stable/rancher \
  --namespace cattle-system \
  --set hostname=rancher-demo.localdomain.local \
  --set bootstrapPassword=passwordgoeshere \
  --set ingress.tls.source=secret \
  --set privateCA=true
```

I don’t remember the exact error, but I did see a timeout soon after running the install. It definitely completed *some* of the installation:

```
kubectl -n cattle-system rollout status deploy/rancher
deployment "rancher" successfully rolled out

kubectl get ns
NAME                                     STATUS   AGE
cattle-fleet-clusters-system             Active   5h18m
cattle-fleet-system                      Active   5h24m
cattle-global-data                       Active   5h25m
cattle-global-nt                         Active   5h25m
cattle-impersonation-system              Active   5h24m
cattle-provisioning-capi-system          Active   5h6m
cattle-system                            Active   5h29m
cluster-fleet-local-local-1a3d67d0a899   Active   5h18m
default                                  Active   25h
fleet-default                            Active   5h25m
fleet-local                              Active   5h26m
kube-node-lease                          Active   25h
kube-public                              Active   25h
kube-system                              Active   25h
local                                    Active   5h25m
p-c94zp                                  Active   5h24m
p-m64sb                                  Active   5h24m

kubectl get pods --all-namespaces
NAMESPACE             NAME                                                      READY   STATUS    RESTARTS        AGE
cattle-fleet-system   fleet-controller-56968b86b6-6xdng                         1/1     Running   0               5h19m
cattle-fleet-system   gitjob-7d68454468-tvcrt                                   1/1     Running   0               5h19m
cattle-system         rancher-64bdc898c7-56fpm                                  1/1     Running   0               5h27m
cattle-system         rancher-64bdc898c7-dl4cz                                  1/1     Running   0               5h27m
cattle-system         rancher-64bdc898c7-z55lh                                  1/1     Running   1 (5h25m ago)   5h27m
cattle-system         rancher-webhook-58d68fb97d-zpg2p                          1/1     Running   0               5h17m
kube-system           cloud-controller-manager-rancher0001.localdomain.local    1/1     Running   1 (22h ago)     25h
kube-system           cloud-controller-manager-rancher0002.localdomain.local    1/1     Running   1 (22h ago)     25h
kube-system           cloud-controller-manager-rancher0003.localdomain.local    1/1     Running   1 (22h ago)     25h
kube-system           etcd-rancher0001.localdomain.local                        1/1     Running   0               25h
kube-system           etcd-rancher0002.localdomain.local                        1/1     Running   3 (22h ago)     25h
kube-system           etcd-rancher0003.localdomain.local                        1/1     Running   3 (22h ago)     25h
kube-system           kube-apiserver-rancher0001.localdomain.local              1/1     Running   0               25h
kube-system           kube-apiserver-rancher0002.localdomain.local              1/1     Running   0               25h
kube-system           kube-apiserver-rancher0003.localdomain.local              1/1     Running   0               25h
kube-system           kube-controller-manager-rancher0001.localdomain.local     1/1     Running   1 (22h ago)     25h
kube-system           kube-controller-manager-rancher0002.localdomain.local     1/1     Running   1 (22h ago)     25h
kube-system           kube-controller-manager-rancher0003.localdomain.local     1/1     Running   0               25h
kube-system           kube-proxy-rancher0001.localdomain.local                  1/1     Running   0               25h
kube-system           kube-proxy-rancher0002.localdomain.local                  1/1     Running   0               25h
kube-system           kube-proxy-rancher0003.localdomain.local                  1/1     Running   0               25h
kube-system           kube-scheduler-rancher0001.localdomain.local              1/1     Running   1 (22h ago)     25h
kube-system           kube-scheduler-rancher0002.localdomain.local              1/1     Running   0               25h
kube-system           kube-scheduler-rancher0003.localdomain.local              1/1     Running   0               25h
kube-system           rke2-canal-2jngw                                          2/2     Running   0               25h
kube-system           rke2-canal-6qrc4                                          2/2     Running   0               25h
kube-system           rke2-canal-bk2f8                                          2/2     Running   0               25h
kube-system           rke2-coredns-rke2-coredns-565dfc7d75-87pjr                1/1     Running   0               25h
kube-system           rke2-coredns-rke2-coredns-565dfc7d75-wh64f                1/1     Running   0               25h
kube-system           rke2-coredns-rke2-coredns-autoscaler-6c48c95bf9-mlcln     1/1     Running   0               25h
kube-system           rke2-ingress-nginx-controller-6p8ll                       1/1     Running   0               22h
kube-system           rke2-ingress-nginx-controller-7pm5c                       1/1     Running   0               5h22m
kube-system           rke2-ingress-nginx-controller-brfwh                       1/1     Running   0               22h
kube-system           rke2-metrics-server-c9c78bd66-f5vrb                       1/1     Running   0               25h
kube-system           rke2-snapshot-controller-6f7bbb497d-vqg9s                 1/1     Running   0               22h
kube-system           rke2-snapshot-validation-webhook-65b5675d5c-dt22h         1/1     Running   0               22h
```
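With this many pods, a filter that prints only the ones that are not fully ready can save some scanning. A minimal awk sketch; `not_ready_pods` is a hypothetical helper that assumes the column layout of `kubectl get pods --all-namespaces` shown above and skips `Completed` job pods:

```shell
# Hypothetical helper: read `kubectl get pods --all-namespaces` output on stdin and
# print namespace/name, READY and STATUS for pods that are not fully ready.
not_ready_pods() {
  awk 'NR > 1 && $4 != "Completed" {
    split($3, ready, "/")            # READY column, e.g. "1/1" or "2/2"
    if (ready[1] != ready[2] || $4 != "Running")
      print $1 "/" $2 " " $3 " " $4
  }'
}

# Usage:
# kubectl get pods --all-namespaces | not_ready_pods
```

Run against the listing above, it would print nothing: every pod reports all of its containers ready and `Running`, so the pods alone do not explain the 404.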

However, things obviously aren’t working right: going to https://rancher-demo.localdomain.local returns a 404 Not Found page.
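One way to tell whether the 404 comes from the F5 or from inside the cluster is to send the same request once to the LB and once to a single node, pinning the hostname with curl's `--resolve` so the TLS SNI and `Host` header stay rancher-demo.localdomain.local. A minimal sketch; `probe_rancher` is a hypothetical helper and the IPs in the usage lines are placeholders:

```shell
# Hypothetical helper: request https://HOST/ but connect to a specific IP, so the
# TLS SNI and Host header stay HOST while the LB or a single node is targeted.
# Prints the HTTP status code, or 000 if no HTTP response came back.
probe_rancher() {
  local host="$1" ip="$2" port="${3:-443}"
  # -k skips certificate checks (only the status code matters here);
  # --noproxy '*' keeps any proxy environment variables out of the way.
  curl -sk -o /dev/null -w '%{http_code}\n' --max-time 5 --noproxy '*' \
    --resolve "${host}:${port}:${ip}" "https://${host}:${port}/"
}

# Usage (placeholder IPs):
# probe_rancher rancher-demo.localdomain.local 192.0.2.10   # the F5 VIP
# probe_rancher rancher-demo.localdomain.local 192.0.2.11   # one node directly
```

If the node also answers 404, the response is coming from the cluster rather than the F5. Assuming RKE2's bundled ingress-nginx is what serves the nodes' port 443, it returns 404 from its default backend when no Ingress rule matches the requested host, which would fit an empty `kubectl get ingress` result.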

I’ve never set this up before, so I’m not sure how to troubleshoot this. I’ve spent hours digging through various posts, but nothing I’ve found seems to match this particular issue.

Some things I have found:

```
kubectl -n cattle-system logs -f rancher-64bdc898c7-56fpm
2024/01/17 21:13:23 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:13:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:13:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
```

(This repeats every 15 seconds.)

```
kubectl get ingress --all-namespaces
No resources found
```

I *know* there was an ingress at some point, I believe in cattle-system; now it's gone. I didn't remove it.

```
kubectl -n cattle-system describe service rancher
Name:              rancher
Namespace:         cattle-system
Labels:            app=rancher
                   app.kubernetes.io/managed-by=Helm
                   chart=rancher-2.7.9
                   heritage=Helm
                   release=rancher
Annotations:       meta.helm.sh/release-name: rancher
                   meta.helm.sh/release-namespace: cattle-system
Selector:          app=rancher
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.43.199.3
IPs:               10.43.199.3
Port:              http  80/TCP
TargetPort:        80/TCP
Endpoints:         10.42.0.26:80,10.42.1.22:80,10.42.1.23:80
Port:              https-internal  443/TCP
TargetPort:        444/TCP
Endpoints:         10.42.0.26:444,10.42.1.22:444,10.42.1.23:444
Session Affinity:  None
Events:            <none>
```

```
kubectl -n cattle-system logs -l app=rancher
2024/01/17 21:17:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:17:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:08 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:23 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:18:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:08 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:23 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:38 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:53 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.23]: dial tcp 10.42.0.26:443: i/o timeout
2024/01/17 21:19:40 [ERROR] Failed to connect to peer wss://10.42.1.22/v3/connect [local ID=10.42.0.26]: dial tcp 10.42.1.22:443: i/o timeout
E0117 21:19:45.551484      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:45.646038      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:45 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
2024/01/17 21:19:49 [ERROR] [updateClusterHealth] Failed to update cluster [local]: Internal error occurred: failed calling webhook "rancher.cattle.io.clusters.management.cattle.io": failed to call webhook: Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/mutation/clusters.management.cattle.io?timeout=10s": context deadline exceeded
E0117 21:19:52.882877      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:53.061671      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:53 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
2024/01/17 21:19:55 [ERROR] Failed to connect to peer wss://10.42.1.23/v3/connect [local ID=10.42.0.26]: dial tcp 10.42.1.23:443: i/o timeout
2024/01/17 21:19:55 [ERROR] Failed to connect to peer wss://10.42.1.22/v3/connect [local ID=10.42.0.26]: dial tcp 10.42.1.22:443: i/o timeout
E0117 21:19:37.826713      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:37.918579      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:37 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
E0117 21:19:45.604537      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:45.713901      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:45 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
2024/01/17 21:19:49 [ERROR] Failed to connect to peer wss://10.42.0.26/v3/connect [local ID=10.42.1.22]: dial tcp 10.42.0.26:443: i/o timeout
E0117 21:19:52.899035      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0117 21:19:52.968048      34 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
2024/01/17 21:19:52 [ERROR] Failed to read API for groups map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
```
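The peer errors are easier to read as distinct pairs. A minimal sketch; `peer_failures` is a hypothetical helper that assumes the "Failed to connect to peer" log format shown above:

```shell
# Hypothetical helper: read Rancher pod logs on stdin and print each distinct
# "local pod IP -> peer pod IP" pair from the "Failed to connect to peer" errors.
peer_failures() {
  grep -o 'Failed to connect to peer wss://[0-9.]*/v3/connect \[local ID=[0-9.]*]' |
    sed -E 's#.*wss://([0-9.]+)/v3/connect \[local ID=([0-9.]+)]#\2 -> \1#' |
    LC_ALL=C sort -u
}

# Usage:
# kubectl -n cattle-system logs -l app=rancher | peer_failures
```

Applied to the logs above, this yields four pairs (10.42.0.26 to 10.42.1.22 and 10.42.1.23, and both of those back to 10.42.0.26), and every one crosses between 10.42.0.x and 10.42.1.x; 10.42.1.22 and 10.42.1.23 never fail to reach each other. Assuming RKE2's default of one /24 pod CIDR per node, that pattern points at pod traffic between nodes rather than at Rancher itself, so the node-to-node firewall rules for the CNI (for Canal, VXLAN on UDP 8472) may be worth checking.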

I’m sure I did something wrong, but I don’t know what and don’t know how to troubleshoot this further.
