Worker node deployment fails and the cilium container fails to start

Problem Description

Cluster creation fails because the worker node deployment fails and the cilium container does not start normally.

Alert Information

The interface reports a deployment failure, and the cilium component does not come up.

Effective Troubleshooting Steps

  • When deployment fails, you can log into the node directly with the password k8sadmin to investigate. After a successful deployment, the password is randomized.

  • On the node, run crictl ps to list the containers, then use crictl logs -f to find out why cilium failed.

  • The log shows that cilium failed to reach the apiserver address. This address is the first address in the service network segment, and iptables forwards traffic sent to it to the IP address of the VIP node.

  • Run iptables -t nat -S | grep -w ${IP reported in the cilium failure} (for example, 10.96.0.1) to find the relevant chain, then grep for that chain to find the actual IP address it maps to.

  • Checking this IP shows that it is the business port IP of the master node. The master node was deployed successfully, so it should be serving requests.

  • Running curl -k https://${master-business port IP}:6443 shows that the address is not reachable, while curl -k https://${VIP}:6443 reaches the service.

  • Run arping ${master-business port IP}. The MAC address in the reply does not match the master node's, which likely indicates an IP conflict. Check whether this IP is already configured elsewhere in the environment, for example on a route, a host, or an elastic IP.

  • Finally, the customer inspected the environment and confirmed that the IP was indeed in use twice. The customer had previously mounted an elastic IP to the router and forgotten about it, then assigned the same elastic IP to a new host, which caused the unexpected worker node deployment failure.
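
The chain lookup in the steps above can be sketched as a small shell helper. This is only an illustration, not part of the product tooling: the function name dnat_backends is hypothetical, and it assumes the NAT rules follow the usual kube-proxy layout (a rule matching -d <service IP>/32 jumps to a per-service chain, which jumps to per-endpoint chains that end in DNAT --to-destination). It reads the rules from standard input so it can also be tried against saved output.

```shell
#!/bin/sh
# dnat_backends SERVICE_IP   (hypothetical helper name)
# Reads `iptables -t nat -S` output on stdin and prints every DNAT destination
# reachable from rules that match "-d SERVICE_IP/32".
# Assumption: kube-proxy style chains (KUBE-SERVICES -> KUBE-SVC-* -> KUBE-SEP-*).
dnat_backends() {
  awk -v cidr="$1/32" '
    $1 == "-A" {
      chain = $2; target = ""; dest = ""; hit = 0
      for (i = 3; i <= NF; i++) {
        if ($i == "-d" && $(i + 1) == cidr) hit = 1      # rule matches the service IP
        if ($i == "-j") target = $(i + 1)                # chain or target jumped to
        if ($i == "--to-destination") dest = $(i + 1)    # DNAT backend address
      }
      if (hit && dest != "") print dest                  # service IP is DNATed directly
      if (hit && target != "") start[target] = 1         # begin the walk from here
      if (dest != "") dnat[chain] = dnat[chain] " " dest
      else if (target != "") jumps[chain] = jumps[chain] " " target
    }
    END {
      # Breadth-first walk over the chains reachable from the matching rules.
      n = 0
      for (c in start) queue[++n] = c
      for (q = 1; q <= n; q++) {
        c = queue[q]
        if (seen[c]++) continue
        if (c in dnat) { m = split(dnat[c], d, " "); for (j = 1; j <= m; j++) print d[j] }
        if (c in jumps) { m = split(jumps[c], t, " "); for (j = 1; j <= m; j++) queue[++n] = t[j] }
      }
    }
  ' | sort -u
}

# Usage on the failing worker node (root is needed to read the NAT table):
#   iptables -t nat -S | dnat_backends 10.96.0.1
# Then compare each printed address with the VIP:
#   curl -k https://<printed address>
#   curl -k https://${VIP}:6443
```

If the printed backend address refuses connections or times out while the VIP answers, the mapped address itself is the next thing to inspect, as in the arping step above.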

Root Cause

An IP conflict prevented cilium on the node from reaching the apiserver through the iptables DNAT mapping, so access to the service was rejected or timed out.

  • First verify that the apiserver service is actually running. Nodes usually reach the apiserver through the VIP configured in /etc/hosts. If access through the VIP works but cilium still fails, continue with the checks below.

  • Check whether the address responds to ping, and whether large packets get through.

  • Check whether arping returns more than one MAC address, or a MAC address that does not match the expected host.
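
The arping check can be made less error-prone with a small filter over its output. The sketch below is an assumption-laden illustration: the function names distinct_macs and check_mac are hypothetical, the expected MAC is assumed to have been read from the master node itself (for example with ip link), and the filter simply extracts anything that looks like a MAC address rather than relying on one specific arping output format.

```shell
#!/bin/sh
# distinct_macs: read arping output on stdin and print each MAC address that
# replied, once, in lowercase.
distinct_macs() {
  grep -oiE '([0-9a-f]{2}:){5}[0-9a-f]{2}' | tr 'A-F' 'a-f' | sort -u
}

# check_mac EXPECTED_MAC   (hypothetical helper name)
# Reads arping output on stdin and prints one of:
#   no-reply  - no MAC address found in the output
#   conflict  - more than one MAC answered for the same IP
#   mismatch  - a single MAC answered, but not the expected one
#   ok        - a single MAC answered and it matches
check_mac() {
  local expected macs count
  expected="$(printf '%s' "$1" | tr 'A-F' 'a-f')"
  macs="$(distinct_macs || true)"
  count="$(printf '%s\n' "$macs" | grep -c . || true)"
  if [ "$count" -eq 0 ]; then
    echo "no-reply"
  elif [ "$count" -gt 1 ]; then
    echo "conflict"
  elif [ "$macs" != "$expected" ]; then
    echo "mismatch"
  else
    echo "ok"
  fi
}

# Usage on the worker node (interface name and MAC are placeholders):
#   arping -c 3 -I eth0 ${master-business port IP} | check_mac 52:54:00:aa:bb:cc
# Large-packet check, assuming a 1500-byte MTU on the path (iputils ping):
#   ping -c 3 -M do -s 1472 ${master-business port IP}
```

Either conflict or mismatch points to another device answering for the master node's address, which matches the root cause described above.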

Solution

Follow the troubleshooting steps above to locate and resolve the network issue, which in this case was the IP conflict.

Scope of Impact

NA

Is it a Temporary Solution

NA

Recommendations and Summary

NA

Troubleshooting Content

NA

Original Link

https://support.sangfor.com.cn/cases/list?product_id=37&type=1&category_id=29046&isOpen=true