1.3.3 -> 1.4.8
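Upgrade steps
- istio-proxy closes all gRPC connections every half hour.
After the upgrade, all pods are healthy and access works without problems. However, the following message appears in the logs every half hour: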
[Envoy (Epoch 0)] [2020-05-15 06:30:48.170][21][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 13,
Cause: pilot closes all gRPC connections every 30 minutes. This log message can be turned off as well, but doing so increases CPU and memory usage.
If I understand the issue correctly, every 30m Pilot will reset connections made to xDS servers by the Envoy proxy, which can cause the proxy to drop all configuration and reload.
In the interim period the service can have disruption as there might be no listeners/routes/clusters/endpoint config at the Envoy proxies between a disconnect and config getting re-populated.
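To check that the warnings really are about 30 minutes apart, the proxy log can be filtered for the message and the gaps between timestamps computed. The following is a minimal sketch, not part of the original notes: the function name stream_close_gaps is hypothetical, and it assumes the "[YYYY-MM-DD HH:MM:SS.mmm]" timestamp format shown in the log line above and that consecutive warnings are less than 24 hours apart.

```shell
# Print the gap, in whole minutes (truncated), between consecutive
# "gRPC config stream closed" warnings read from stdin.
# Assumption: timestamps look like [2020-05-15 06:30:48.170].
# Assumption: consecutive warnings are less than 24 hours apart,
# so only the time of day is compared.
stream_close_gaps() {
  grep 'gRPC config stream closed' |
    sed -n 's/.*\[[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\} \([0-9]\{2\}\):\([0-9]\{2\}\):\([0-9]\{2\}\)\.[0-9]*\].*/\1 \2 \3/p' |
    awk '{
      t = $1 * 3600 + $2 * 60 + $3        # seconds since midnight
      if (NR > 1) {
        d = t - prev
        if (d < 0) d += 86400             # the gap crossed midnight
        printf "%d\n", d / 60
      }
      prev = t
    }'
}
```

One possible way to use it, with my-pod and pj-demo as placeholder names: kubectl logs my-pod -c istio-proxy -n pj-demo | stream_close_gaps. A column of values close to 30 matches the periodic disconnect described above.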
References
- Service returns 503 (unavailable) after the update?
After the update, running kubectl describe pod my-pod -n pj-demo shows the following events:
Warning Unhealthy 10m (x2 over 10m) kubelet, ks-allinone Readiness probe failed: HTTP probe failed with statuscode: 503
Cause: the service goes down while the update is in progress. The istio-proxy readiness probe is defined as follows:
#root@ks-allinone:/root # kubectl describe pod/nginx-xxx
Readiness: http-get http://:15020/healthz/ready delay=1s timeout=1s period=2s #success=1 #failure=30
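istio can be thought of as a "second-layer LB". When the second-layer LB has a problem, it must report this to the first-layer LB so that the two LBs stay in a consistent state. Otherwise the first layer mistakenly assumes the second layer is healthy and forwards traffic to it, which easily makes the service unavailable.
15020 is the port that istio-proxy dedicates to health checks.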
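The probe settings above also determine how long the proxy can keep failing before the kubelet marks the container unready: the probe period multiplied by the failure threshold. The sketch below works that out from a Readiness line as printed by kubectl describe pod. The function name readiness_fail_window is hypothetical and not part of the original notes.

```shell
# Print, in seconds, how long consecutive probe failures must last
# before the container is marked unready: period * failure threshold.
# stdin: "Readiness:" line(s) as printed by `kubectl describe pod`.
# Lines without period=Ns and #failure=N are ignored.
readiness_fail_window() {
  sed -n 's/.*period=\([0-9]*\)s.*#failure=\([0-9]*\).*/\1 \2/p' |
    awk '{ print $1 * $2 }'
}
```

For the probe shown above (period=2s, #failure=30) this gives 2 × 30 = 60 seconds. A possible invocation, with the pod name taken from the example above: kubectl describe pod/nginx-xxx | readiness_fail_window.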
References
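- The data plane update did not succeed.
After the update, running kubectl rollout restart deployment -n pj-demo-xx
to do a rolling update showed that the pods in that namespace were still the original pods and had not been updated.
Cause: the logs should show an error. Usually something went wrong during the update, or the new version's code has not taken effect. In that case, restart the pods in istio-system: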
$ kubectl rollout restart deployment -n istio-system
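After the restart, it helps to confirm that the workloads actually picked up the new sidecar. The sketch below lists the pods whose proxy image tag differs from the expected version. It is an illustration under assumptions, not part of the original notes: the function name stale_sidecars is hypothetical, and it assumes the sidecar image name contains "proxyv2" and that the image tag equals the Istio version.

```shell
# Print the names of pods whose sidecar image tag differs from the
# expected version.
# stdin: one line per pod, "<pod-name> <image> [<image> ...]"
# $1:    expected tag, for example 1.4.8
# Assumption: the sidecar image name contains "proxyv2:".
stale_sidecars() {
  awk -v want="$1" '{
    for (i = 2; i <= NF; i++) {
      if ($i ~ /proxyv2:/) {
        n = split($i, parts, ":")         # the tag is the last ":" field
        if (parts[n] != want) print $1
      }
    }
  }'
}
```

The input can be produced with kubectl's jsonpath output, for example (the namespace is the one used above):

kubectl get pods -n pj-demo-xx -o jsonpath='{range .items[*]}{.metadata.name}{range .spec.containers[*]}{" "}{.image}{end}{"\n"}{end}' | stale_sidecars 1.4.8

An empty result suggests every listed pod already runs the expected proxy version.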
1.4.8 -> 1.6.10
https://kubesphere.com.cn/forum/d/2459-istio-1481610
Issues:
- The ingress controller log contains errors:
2020-10-21T03:02:48.404748Z info Subchannel Connectivity change to CONNECTING
2020-10-21T03:02:48.404805Z info transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2020-10-21T03:02:48.404885Z info pickfirstBalancer: HandleSubConnStateChange: 0xc000ece380, {CONNECTING <nil>}
2020-10-21T03:02:48.404900Z info Channel Connectivity change to CONNECTING
2020-10-21T03:02:48.404918Z info Subchannel picks a new address "istiod-1-6-10.istio-system.svc:15012" to connect
2020-10-21T03:02:48.414178Z info Subchannel Connectivity change to READY
2020-10-21T03:02:48.414235Z info pickfirstBalancer: HandleSubConnStateChange: 0xc000ece380, {READY <nil>}
2020-10-21T03:02:48.414247Z info Channel Connectivity change to READY
Cause:
pilot closes the connection every 30 minutes. This is normal behavior, not an error; the messages are informational only.
https://github.com/istio/istio/issues/24800
https://github.com/istio/istio/wiki/Troubleshooting-Istio#common-issues