Network down for all hosts in a Rancher environment


Welcome to Chen's original blog post

Environment: Rancher 1.6.14, OS: Ubuntu 16.04

This article documents an incident in which a network problem on a single host broke cross-host communication for every host in a Rancher environment.

Discovering the problem

I received a fault alert email, and the site was inaccessible.

To make the rest easier to follow, here is first a sketch of how a request is processed.

Requests are first forwarded by nginx. haproxy is the standard load balancer component provided by Rancher: it proxies requests to specific applications based on rules, and balances the load when an application has multiple instances.

Locating the problem

  • Pinging the domain name succeeds => the network itself is working.
  • When accessing the website, the request status in the nginx log is 502 or 504 => the request reached nginx, and the problem lies with the gateway behind it.

Note: 502 Bad Gateway; 504 Gateway Time-out
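
The nginx log check above can be sketched in Python. This is a minimal sketch that assumes nginx's default "combined" access log format; the sample log lines are made up for illustration and are not taken from the incident.

```python
import re
from collections import Counter

# Matches the quoted request followed by the status code, as in the
# default "combined" format: ... "GET / HTTP/1.1" 502 173 ...
STATUS_RE = re.compile(r'"[^"]*" (\d{3}) ')

def count_gateway_errors(lines):
    """Count 502 and 504 responses in an iterable of access-log lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match and match.group(1) in ("502", "504"):
            counts[match.group(1)] += 1
    return counts

# Hypothetical sample lines, for illustration only.
sample = [
    '10.0.0.5 - - [26/Jun/2018:10:00:01 +0800] "GET / HTTP/1.1" 502 173 "-" "curl/7.47.0"',
    '10.0.0.5 - - [26/Jun/2018:10:00:09 +0800] "GET / HTTP/1.1" 504 183 "-" "curl/7.47.0"',
    '10.0.0.6 - - [26/Jun/2018:10:00:12 +0800] "GET /ok HTTP/1.1" 200 612 "-" "curl/7.47.0"',
    '10.0.0.5 - - [26/Jun/2018:10:00:15 +0800] "GET / HTTP/1.1" 502 173 "-" "curl/7.47.0"',
]

print(dict(count_gateway_errors(sample)))
```

A high count of either status while ping still succeeds points at the layer behind nginx rather than at basic connectivity.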

  • Checking all hosts in Rancher shows that the healthcheck component of the Rancher network containers is stuck in the initializing state on every host, and containers on different hosts cannot ping each other => this confirms that the problem is in the Rancher network.

The healthcheck status of all hosts is shown in the following screenshot.

<img width="60%" src="https://media.chenyongjun.vip/2018/06/26/6fa73d3128a2400d829dd616c03a4603.png"/>

  • The logs of the healthcheck, rancher-agent, rancher-server, and network-manager containers showed nothing useful => embarrassing; relying on third-party tools without in-depth knowledge leaves you purely reactive when problems occur.
  • Recalling the last Rancher network issue I dealt with ("Rancher can't start healthcheck and lb"), I followed the official Rancher troubleshooting steps.
  • UFW is not enabled on the hosts, which rules out firewall interference.
  • Check that the host IPs shown in the console are correct. This turned up the following clue.

One host's IP had become 172.17.0.1, which is not the machine's normal IP; it is usually the IP of the docker0 bridge.

<img width="60%" src="https://media.chenyongjun.vip/2018/06/26/4941b27646624b84a7bf71ef210b35d7.png">

  • Running ifconfig on the problem host confirms that 172.17.0.1 is the IP of docker0. Rancher's website says that a host registered with the wrong IP must be re-registered.

This was bad news: the containers on that host would have to be removed or stopped.

docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 0.0.0.0
        inet6 fe80::42:9cff:fea1:bc40  prefixlen 64  scopeid 0x20<link>
        ether 02:42:9c:a1:bc:40  txqueuelen 0  (Ethernet)
        RX packets 144756223  bytes 17497382352 (16.2 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 124049363  bytes 79629803176 (74.1 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
  • After removing the problem host and restarting the healthcheck service on the other hosts, communication between hosts returned to normal. This concludes the problem identification.
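
The check that exposed the problem, comparing each host's registered IP with the docker0 range, can be sketched as follows. This is a minimal sketch: it assumes Docker's default bridge network of 172.17.0.0/16 (the range is configurable), and the host names and the 192.168.x.x addresses are hypothetical.

```python
import ipaddress

# Docker's default bridge network. This is an assumption: the range can be
# changed in the Docker daemon configuration.
DOCKER_BRIDGE_NET = ipaddress.ip_network("172.17.0.0/16")

def suspicious_hosts(registered):
    """Return the names of hosts whose registered IP lies in the bridge range."""
    return [
        name
        for name, ip in registered.items()
        if ipaddress.ip_address(ip) in DOCKER_BRIDGE_NET
    ]

# Hypothetical host list, as it might be copied from the Rancher console.
hosts = {
    "host-a": "192.168.1.10",
    "host-b": "172.17.0.1",
    "host-c": "192.168.1.12",
}

print(suspicious_hosts(hosts))  # ['host-b']
```

Any host this flags is a candidate for removal and re-registration.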

Fix

The problem host was removed and re-added, after which it returned to normal.

Note: I have lost count of how many times I have dealt with Rancher network problems. Upgrading Rancher version by version, I have run into a lot of pitfalls.

Reproducing the problem

Whatever caused the wrong IP, it is curious that one host with a wrong IP can trigger an avalanche across all hosts. I tried to reproduce the problem.

Reproduction method: in an environment whose network is working normally, add a host whose IP is the docker0 bridge IP, 172.17.0.1.

Reproduction result: as soon as the host with IP 172.17.0.1 is added, the network of the entire environment becomes abnormal and the hosts cannot communicate with each other, which reproduces the problem above.

Questions to explore

Why does the host IP become 172.17.0.1?

The FAQ on the Rancher website describes this under cross-host communication:

Every so often, the IP of the host will accidentally pick up the docker bridge IP instead of the actual IP. These are typically 172.17.42.1 or starting with 172.17.x.x. If this is the case, you need to re-register your host with the correct IP by explicitly setting the CATTLE_AGENT_IP environment variable in the docker run command.

In other words, a host occasionally registers with the docker bridge IP instead of its actual IP, and when that happens the host has to be re-registered with the correct IP set explicitly.
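
The re-registration step from the FAQ can be sketched as a small helper that adds `-e CATTLE_AGENT_IP=<ip>` to a registration command. This is a minimal sketch: `with_agent_ip` is a hypothetical helper, and the sample command is a shortened placeholder, not the real registration command, which should be copied from the Rancher UI.

```python
import shlex

def with_agent_ip(registration_cmd, host_ip):
    """Insert -e CATTLE_AGENT_IP=<host_ip> right after 'docker run'."""
    parts = shlex.split(registration_cmd)
    for i in range(len(parts) - 1):
        if parts[i] == "docker" and parts[i + 1] == "run":
            parts[i + 2:i + 2] = ["-e", f"CATTLE_AGENT_IP={host_ip}"]
            return " ".join(shlex.quote(p) for p in parts)
    raise ValueError("not a docker run command")

# Placeholder command: the image tag and URL are stand-ins, not real values.
cmd = "sudo docker run --rm --privileged rancher/agent:TAG http://RANCHER_SERVER/REGISTRATION_PATH"

print(with_agent_ip(cmd, "192.168.1.11"))
```

Setting the variable explicitly means the agent does not have to detect the host IP on its own, which is where the docker bridge address crept in.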

TODO: the root cause is still to be investigated.

Why does a problem with one host affect all hosts?

TODO: still to be investigated.

