Rancher Turtles Troubleshooting

This guide provides troubleshooting steps for Rancher Turtles and Cluster API (CAPI) clusters. It covers common issues and their resolutions to help you maintain a stable and functional environment.

Before reading this guide, it is also worth consulting the official troubleshooting documentation for the CAPI project.

Understanding the intricacies of both Rancher Turtles and CAPI is crucial for effective troubleshooting. This guide aims to equip you with the knowledge to diagnose and resolve basic problems in Rancher Turtles and CAPI.

Prerequisites

To effectively troubleshoot Rancher Turtles and CAPI clusters, you need:

  • Access to a terminal with kubectl configured to access the Rancher management cluster

  • Krew plugin manager for kubectl installed

  • Helm CLI tool installed

You can install Krew by following the instructions at Krew installation guide.
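
If Krew is not installed yet, the upstream instructions boil down to a short shell snippet. The following is a sketch based on the Krew installation guide at the time of writing (bash/zsh); check the guide for the current version and for other shells.

(
  set -x; cd "$(mktemp -d)" &&
  OS="$(uname | tr '[:upper:]' '[:lower:]')" &&
  ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/')" &&
  KREW="krew-${OS}_${ARCH}" &&
  curl -fsSLO "https://github.com/kubernetes-sigs/krew/releases/latest/download/${KREW}.tar.gz" &&
  tar zxvf "${KREW}.tar.gz" &&
  ./"${KREW}" install krew
)

# Make the krew bin directory available on your PATH
export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"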

How to check the logs

You can check the logs of the following pods in their respective namespaces:

Namespace                 | Pod                                    | Description
--------------------------|----------------------------------------|------------
rancher-turtles-system    | rancher-turtles-cluster-api-operator   | Manages the installation and lifecycle of Cluster API providers in the management cluster
rancher-turtles-system    | rancher-turtles-controller-manager     | Handles Rancher integration and manages the custom resources for Turtles
rancher-turtles-system    | caapf-controller-manager               | Cluster API Addon Provider Fleet controller that manages Fleet-based addon integration for CAPI clusters
capi-system               | capi-controller-manager                | Core Cluster API controller that reconciles the main cluster resources
rke2-bootstrap-system     | rke2-bootstrap-controller-manager      | Manages the bootstrap configuration for RKE2 nodes
rke2-control-plane-system | rke2-control-plane-controller-manager  | Handles the initialization and management of RKE2 control plane components
*-system                  | *-controller-manager                   | Handles the initialization and management of infrastructure provider components
$-bootstrap-system        | $-bootstrap-controller-manager         | Manages the bootstrap configuration for custom providers
$-control-plane-system    | $-control-plane-controller-manager     | Handles the initialization and management of custom provider control plane components

'*' - depends on the infrastructure provider in use
'$' - depends on the control plane/bootstrap provider in use, e.g. kubeadm or rke2
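
To get a quick overview of which of these controller pods are running, you can list them across all namespaces. Most of them carry the control-plane=controller-manager label used elsewhere in this guide; some infrastructure providers use provider-specific labels, so a plain listing filtered by namespace is a safe fallback.

kubectl get pods -A -l control-plane=controller-manager
kubectl get pods -A | grep -E 'rancher-turtles-system|capi-system|rke2-bootstrap-system|rke2-control-plane-system'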

One way to monitor logs from Cluster API control plane pods is by using kubectl logs with a label selector.

kubectl logs -l control-plane=controller-manager -n rancher-turtles-system -f
kubectl logs -l control-plane=controller-manager -n capi-system -f
kubectl logs -l control-plane=controller-manager -n rke2-bootstrap-system -f
kubectl logs -l control-plane=controller-manager -n rke2-control-plane-system -f

Using kubectl krew to install the stern plugin involves third-party software which may pose security risks in production environments. While it’s a useful tool for log viewing, it should primarily be used in development or testing environments. Always review third-party plugins before installing them in sensitive or production clusters.

A more convenient way to monitor logs from several pods at once is to install the stern plugin.

kubectl krew install stern

Then you can tail logs from all pods in real time.

kubectl stern . -n rancher-turtles-system -n rke2-bootstrap-system -n rke2-control-plane-system -n capi-system

or

kubectl stern -A -l control-plane=controller-manager
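
If the combined stream is too noisy, stern can also be narrowed down. For example, the following tails only the core CAPI controller with timestamps over a hypothetical 15-minute window (the --since and --timestamps flags are available in recent stern releases):

kubectl stern capi-controller-manager -n capi-system --since 15m --timestamps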

How to manage Rancher Turtles and CAPI resources

List Rancher and Cluster API resources

Users often ask how to deal with the long list of different CRDs created by CAPI and Rancher. The following approach can help.

  1. List all CAPI- and Rancher-related resources

    kubectl api-resources | grep 'cattle.io\|cluster.x-k8s.io'
  2. For any given resource, you can get a full explanation of what it does and what its configuration options are

    kubectl api-resources | grep clusterclass
    kubectl explain clusterclass.cluster.x-k8s.io
  3. You can then drill down into a description of each field and configuration section

    kubectl explain clusterclass.spec
    kubectl explain clusterclass.spec.controlPlane
    kubectl explain clusterclasses.spec.controlPlane.machineInfrastructure

List existing Rancher and Cluster API objects

The most common mistake is to use the default namespace in Kubernetes for your work. This makes your configuration messy and difficult to manage and troubleshoot. It is highly recommended to use separate namespaces for deploying and creating CAPI downstream clusters, and to avoid using Kubernetes system namespaces such as default and kube-*.

When applying templates with kubectl, always specify -n <NAMESPACE> to provision resources in a known location. You can switch the namespace used by your current KUBECONFIG context with:

kubectl config view
kubectl config set-context --current --namespace <NAMESPACE>
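
For example, assuming a dedicated namespace named capi-clusters and a cluster manifest file cluster-template.yaml (both names are just placeholders), a typical workflow looks like this:

kubectl create namespace capi-clusters
kubectl apply -f cluster-template.yaml -n capi-clusters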

With separate namespaces, you can easily list and inspect all resources belonging to a cluster, as shown below.

Using kubectl krew to install the get-all and lineage plugins involves third-party software which may pose security risks in production environments. While they are useful tools for inspecting resources and their relationships, they should primarily be used in development or testing environments. Always review third-party plugins before installing them in sensitive or production clusters.

  1. Install the get-all and lineage plugins using krew for kubectl

    kubectl krew install lineage
    kubectl krew install get-all
  2. Then list all existing resources in the namespace where your downstream cluster configuration was deployed, for example capi-clusters.

    kubectl get-all -n capi-clusters

    Example output:

    NAME                                                                                      NAMESPACE      AGE
    configmap/kube-root-ca.crt                                                                capi-clusters  23h
    secret/cluster1-shim                                                                     capi-clusters  32s
    serviceaccount/default                                                                    capi-clusters  23h
    rke2config.bootstrap.cluster.x-k8s.io/cluster1-mp-0-dm59p                                capi-clusters  32s
    rke2config.bootstrap.cluster.x-k8s.io/cluster1-mp-1-gv6kh                                capi-clusters  32s
    rke2configtemplate.bootstrap.cluster.x-k8s.io/cluster1-pool0                             capi-clusters  33s
    rke2configtemplate.bootstrap.cluster.x-k8s.io/cluster1-pool1                             capi-clusters  33s
    clusterclass.cluster.x-k8s.io/clusterclass1                                               capi-clusters  33s
    cluster.cluster.x-k8s.io/cluster1                                                        capi-clusters  32s
    machinepool.cluster.x-k8s.io/cluster1-mp-0-d4fdv                                         capi-clusters  32s
    machinepool.cluster.x-k8s.io/cluster1-mp-1-l86kv                                         capi-clusters  32s
    clustergroup.fleet.cattle.io/clusterclass1                                                capi-clusters  33s
    azureclusteridentity.infrastructure.cluster.x-k8s.io/cluster-identity                     capi-clusters  33s
    azuremanagedcluster.infrastructure.cluster.x-k8s.io/cluster1-l2cs6                       capi-clusters  32s
    azuremanagedclustertemplate.infrastructure.cluster.x-k8s.io/cluster                     capi-clusters  33s
    azuremanagedcontrolplane.infrastructure.cluster.x-k8s.io/cluster1-rv8v4                  capi-clusters  32s
    azuremanagedcontrolplanetemplate.infrastructure.cluster.x-k8s.io/cluster1-control-plane  capi-clusters  33s
    azuremanagedmachinepool.infrastructure.cluster.x-k8s.io/cluster1-mp-0-78tck              capi-clusters  32s
    azuremanagedmachinepool.infrastructure.cluster.x-k8s.io/cluster1-mp-1-zdscw              capi-clusters  32s
    azuremanagedmachinepooltemplate.infrastructure.cluster.x-k8s.io/cluster1-pool0           capi-clusters  33s
    azuremanagedmachinepooltemplate.infrastructure.cluster.x-k8s.io/cluster1-pool1           capi-clusters  33s
  3. Then you can inspect the relationships between all of these object resources

    kubectl lineage -n capi-clusters cluster.cluster.x-k8s.io/cluster1

How to enable debug mode for Rancher Turtles and CAPI operators

The helm chart exposes values for increasing the log level via the usual values.yaml configuration parameters:

  • Cluster API Operator - the log level for the CAPI operator can also be increased via the Helm chart; if it was installed with the Rancher Turtles chart, use:

        --set cluster-api-operator.logLevel=5
        --set rancherTurtles.managerArguments[0]="-v=5"

For example:

helm upgrade rancher-turtles turtles/rancher-turtles \
            -n rancher-turtles-system \
            --reuse-values \
            --set "rancherTurtles.managerArguments={--insecure-skip-verify,-v=5}" \
            --set cluster-api-operator.logLevel=5
  • Cluster API Providers - edit the CAPIProvider resource of any provider whose log level needs to be increased. Change it to the desired level:

CAPIProvider.Spec.Manager.Verbosity=5

(5 is equivalent to DEBUG)
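
For example, assuming an RKE2 control plane provider registered as a CAPIProvider named rke2-control-plane in the rke2-control-plane-system namespace (the actual names depend on your installation; check kubectl get capiproviders -A), the verbosity could be raised with a merge patch along these lines:

kubectl patch capiprovider rke2-control-plane -n rke2-control-plane-system \
  --type merge -p '{"spec":{"manager":{"verbosity":5}}}'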

How to collect information from Rancher Turtles and CAPI

crust-gather is a project created by Cluster API developers specifically designed for gathering logs and resource states from CAPI environments. It’s a safe and official tool that can be used in any type of environment (development, testing, or production) to collect comprehensive diagnostic information for troubleshooting.

You can install it via the following instructions:

kubectl krew install crust-gather

kubectl crust-gather --help

Alternatively, it can be installed standalone via install.sh script:

curl -sSfL https://github.com/crust-gather/crust-gather/raw/main/install.sh | sh

crust-gather --help

You can specify a list of filters to limit the data collected; by default, crust-gather takes a full cluster snapshot and stores the data in the crust-gather directory. It also accepts pre-defined filter configs for the Rancher or child cluster. The kubeconfig should point to the correct cluster. Usage with a config file:

kubectl crust-gather collect-from-config -c config.yaml

Usage via regular flags:

kubectl crust-gather collect --include-namespace rancher-turtles-system --include-namespace capi-* --include-namespace cattle* --include-namespace c-* --include-namespace=<any-capi-cluster-namespace> --kubeconfig=<KUBECONFIG>

You can specify a file containing secrets, or environment variables holding secret strings, to exclude from the collected data.

For example:

kubectl crust-gather collect -s ENV_WITH_SECRET --secrets-file=secrets.txt

Or exclude all secret resources from collection:

kubectl crust-gather collect --exclude-kind Secret

How to clean up Rancher Turtles and CAPI resources

Sometimes cleanup of your infrastructure might fail, leaving resources pending deletion. In this situation, you have to remove the resources manually.

Keep in mind that removing finalizers manually means you will also have to manually clean up any resources already provisioned by the infrastructure provider.

export NAMESPACE=capi-clusters

for RESOURCE in `kubectl get-all -n $NAMESPACE -o name | grep 'cattle.io\|cluster.x-k8s.io'`;
do
        echo "Patching $RESOURCE in namespace $NAMESPACE";
        kubectl patch $RESOURCE -n $NAMESPACE -p '{"metadata":{"finalizers":[]}}' --type=merge;
        kubectl delete $RESOURCE -n $NAMESPACE;
done
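
After the loop finishes, you can re-run the listing from earlier to verify that no CAPI or Rancher objects remain in the namespace:

kubectl get-all -n $NAMESPACE | grep 'cattle.io\|cluster.x-k8s.io'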

How to uninstall Rancher Turtles and CAPI project

To uninstall Rancher Turtles and CAPI components from your management cluster, follow these steps in order:

  1. First, delete all downstream clusters created with CAPI. For each cluster:

    kubectl delete -n capi-clusters cluster.cluster.x-k8s.io cluster1

    Replace capi-clusters with the namespace where your clusters are deployed and cluster1 with your cluster name. Wait for each cluster to be fully deleted before proceeding to the next step.
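
    One way to block until a cluster has been fully deleted (using the same placeholder namespace and cluster name as above) is:

    kubectl wait --for=delete cluster.cluster.x-k8s.io/cluster1 -n capi-clusters --timeout=30m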

  2. Uninstall the Rancher Turtles Helm chart:

    helm uninstall -n rancher-turtles-system rancher-turtles
  3. Remove any webhook configurations that might have been created by CAPI providers:

    # List all webhook configurations
    kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io
    
    # Delete provider-specific webhooks (examples)
    kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io azureserviceoperator-validating-webhook-configuration
    kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io capz-validating-webhook-configuration
    kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io capi-validating-webhook-configuration
    kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io rke2-webhook-configuration
    
    # Also check and delete mutating webhooks if present
    kubectl get mutatingwebhookconfigurations
    kubectl delete mutatingwebhookconfigurations [webhook-name]
  4. Clean up any leftover namespaces and resources. The following namespaces may remain after uninstallation:

    • rancher-turtles-system

    • rke2-bootstrap-system

    • rke2-control-plane-system

    • capi-system

    • capz-system (or other provider-specific namespaces like capv-system, capa-system, etc.)

    • capi-clusters (or other namespaces where you deployed clusters)

      To remove these namespaces:

      # First remove any finalizers that might be blocking deletion
      for NS in rancher-turtles-system rke2-bootstrap-system rke2-control-plane-system capi-system capz-system capi-clusters; do
        kubectl get namespace $NS -o json | jq '.spec.finalizers = []' | kubectl replace --raw "/api/v1/namespaces/$NS/finalize" -f -
      done
      
      # Then delete the namespaces
      kubectl delete namespace rancher-turtles-system rke2-bootstrap-system rke2-control-plane-system capi-system capz-system capi-clusters
  5. Finally, remove the CRDs related to Cluster API and Rancher Turtles:

    # Delete all Cluster API and Rancher Turtles CRDs
    kubectl get crds | grep 'cluster.x-k8s.io\|turtles-capi.cattle.io' | awk '{print $1}' | xargs kubectl delete crd
    
    # Or manually delete them one by one
    kubectl delete crd clusters.cluster.x-k8s.io
    kubectl delete crd clusterclasses.cluster.x-k8s.io
    kubectl delete crd machines.cluster.x-k8s.io
    kubectl delete crd machinepools.cluster.x-k8s.io
    kubectl delete crd providers.turtles-capi.cattle.io
    kubectl delete crd clusterconfigs.turtles-capi.cattle.io
    # ... and other related CRDs

    Ensure all clusters are completely deleted before removing these CRDs, or you may leave orphaned cloud resources.
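
    After the deletion completes, you can confirm that no related CRDs remain:

    kubectl get crds | grep 'cluster.x-k8s.io\|turtles-capi.cattle.io'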