Operating system information
Virtual machine, CentOS 7
Container runtime
Docker
KubeSphere version information
v3.4.1, offline installation, all-in-one
What is the problem:
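A pipeline project built with go16 packaged successfully as recently as a month ago. When I ran the pipeline again today, it could not build: it stays in "Preparing environment" with no error log. The longest I waited was one hour, and it was still stuck waiting for the agent to start.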
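What troubleshooting has been done:
I searched the internet and the forum and found no similar cases.
I checked the logs of several pods in the devops-system namespace. The devops-controller log lines related to this pipeline are as follows: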
E1017 03:29:31.249617 1 project_pipeline.go:148] GET http://devops-jenkins.kubesphere-devops-system/job/baseinfo-backend27x69/job/xq-pdf-maker-go16/api/json: 404 <html>
<tr><th>URI:</th><td>/job/baseinfo-backend27x69/job/xq-pdf-maker-go16/api/json</td></tr>
E1017 03:29:31.258502 1 pipeline_metadata_controller.go:68] pipeline-metadata-controller "msg"="unable to obtain and update Pipeline metadata from Jenkins" "error"="not found resources" "Pipeline"={"Namespace":"baseinfo-backend27x69","Name":"xq-pdf-maker-go16"}
E1017 03:29:31.258651 1 controller.go:326] "msg"="Reconciler error" "error"="not found resources" "controller"="pipeline" "controllerGroup"="devops.kubesphere.io" "controllerKind"="Pipeline" "name"="xq-pdf-maker-go16" "namespace"="baseinfo-backend27x69" "pipeline"={"name":"xq-pdf-maker-go16","namespace":"baseinfo-backend27x69"} "reconcileID"="c5448910-1486-4548-9bf2-2247a40dbd5a"
E1017 03:29:31.270164 1 pipeline_metadata_controller.go:68] pipeline-metadata-controller "msg"="unable to obtain and update Pipeline metadata from Jenkins" "error"="not found resources" "Pipeline"={"Namespace":"baseinfo-backend27x69","Name":"xq-pdf-maker-go16"}
E1017 03:29:31.270302 1 controller.go:326] "msg"="Reconciler error" "error"="not found resources" "controller"="pipeline" "controllerGroup"="devops.kubesphere.io" "controllerKind"="Pipeline" "name"="xq-pdf-maker-go16" "namespace"="baseinfo-backend27x69" "pipeline"={"name":"xq-pdf-maker-go16","namespace":"baseinfo-backend27x69"} "reconcileID"="04f4cc75-b6c0-4220-9fca-7fa51da8d06e"
E1017 03:29:31.289309 1 pipeline_metadata_controller.go:68] pipeline-metadata-controller "msg"="unable to obtain and update Pipeline metadata from Jenkins" "error"="not found resources" "Pipeline"={"Namespace":"baseinfo-backend27x69","Name":"xq-pdf-maker-go16"}
E1017 03:29:31.289445 1 controller.go:326] "msg"="Reconciler error" "error"="not found resources" "controller"="pipeline" "controllerGroup"="devops.kubesphere.io" "controllerKind"="Pipeline" "name"="xq-pdf-maker-go16" "namespace"="baseinfo-backend27x69" "pipeline"={"name":"xq-pdf-maker-go16","namespace":"baseinfo-backend27x69"} "reconcileID"="fbfbc9c8-50c4-4325-b525-26059dfa5da9"
I1017 03:29:31.718727 1 pipeline_controller.go:321] update pipeline baseinfo-backend27x69:xq-pdf-maker-go16 successful
I1017 03:29:31.778681 1 pipeline_controller.go:321] update pipeline baseinfo-backend27x69:xq-pdf-maker-go16 successful
I1017 03:29:31.806635 1 pipeline_controller.go:321] update pipeline baseinfo-backend27x69:xq-pdf-maker-go16 successful
I1017 03:29:56.763311 1 pipeline_controller.go:321] update pipeline baseinfo-backend27x69:xq-pdf-maker-go16 successful
I1017 03:30:19.198240 1 pipelinerun_controller.go:222] pipelinerun-controller "msg"="Triggered a PipelineRun" "Pipeline"="xq-pdf-maker-go16" "PipelineRun"={"Namespace":"baseinfo-backend27x69","Name":"xq-pdf-maker-go16-2hts7"} "namespace"="baseinfo-backend27x69" "runID"="1"
The "runID"="1" is because I recreated an identical pipeline for troubleshooting.
In the Jenkins web UI, there is only a single item waiting in the Build Queue.
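In the devops-worker namespace, no related pod has been started either.
(The few pods that do appear there had their idleMinutes parameter changed to speed up pipeline builds, so they stay running persistently.)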
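The item waiting in the Build Queue normally carries a reason. As far as I know, the Jenkins remote API exposes the queue at /queue/api/json, where each entry in `items` has fields such as `why`, `inQueueSince` (epoch milliseconds), and `task.name`. The sketch below only processes an already-fetched payload. The function name `queue_reasons` and the sample payload, including its placeholder `why` text, are hypothetical and are not output from this cluster.

```python
def queue_reasons(queue_json, now_ms):
    """Return (task name, minutes waited, why) for each queued item.

    queue_json: decoded JSON from <jenkins-url>/queue/api/json (assumed shape).
    now_ms: current time in epoch milliseconds.
    """
    rows = []
    for item in queue_json.get("items", []):
        task = item.get("task") or {}
        since_ms = item.get("inQueueSince")
        # Convert the waiting time from milliseconds to minutes.
        waited = None if since_ms is None else (now_ms - since_ms) / 60000
        rows.append((task.get("name"), waited, item.get("why")))
    return rows


# Hypothetical payload for illustration only; real values come from Jenkins.
SAMPLE_QUEUE = {
    "items": [
        {
            "task": {"name": "xq-pdf-maker-go16"},
            "inQueueSince": 0,
            "why": "<reason text reported by Jenkins>",
        }
    ]
}

if __name__ == "__main__":
    for name, waited, why in queue_reasons(SAMPLE_QUEUE, now_ms=3_600_000):
        print(name, waited, why)
```

The `why` text is typically where Jenkins states what the queued build is waiting for, which may help narrow down why no agent pod appears.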
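One anomaly: the pod of the devops-jenkins workload was automatically recreated 14 days ago, and the other build jobs run during those 14 days used only the persistent build environments.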
Where should I continue troubleshooting from here? Is there a procedure for restarting the DevOps system?