[eggjs/egg] Project started in a container: master process exits and the container fails to start

2025-10-27 961 views
3

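When run in a container, the app exits shortly after starting and the container fails to start. My guess is that the master process exits, but I am not using daemon mode; the start command is egg-scripts start --workers=1 --title=performance. I read https://github.com/eggjs/egg/issues/3885 , but it did not solve my problem.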

Error log:

2021-10-08 23:00:27,245 INFO 80 [egg-redis] client connect success
2021-10-08 23:00:27,247 INFO 86 [egg-redis] client connect success
2021-10-08 23:00:27,250 INFO 86 [egg-redis] instance[0] status OK, client ready
2021-10-08 23:00:27,251 INFO 86 [egg:core:ready_stat] end ready task /app/node_modules/egg-redis/lib/redis.js:53:7, remain ["/app/node_modules/egg-watcher/lib/init.js:15:14","/app/node_modules/egg-mongoose/lib/mongoose.js:104:7"]
2021-10-08 23:00:27,251 INFO 80 [egg-redis] instance[0] status OK, client ready
2021-10-08 23:00:27,251 INFO 80 [egg:core:ready_stat] end ready task /app/node_modules/egg-redis/lib/redis.js:53:7, remain ["/app/node_modules/egg-watcher/lib/init.js:15:14","/app/node_modules/egg-mongoose/lib/mongoose.js:104:7"]
2021-10-08 23:00:27,253 INFO 86 [egg-watcher:application] watcher start success
2021-10-08 23:00:27,253 INFO 86 [egg:core:ready_stat] end ready task /app/node_modules/egg-watcher/lib/init.js:15:14, remain ["/app/node_modules/egg-mongoose/lib/mongoose.js:104:7"]
2021-10-08 23:00:27,253 INFO 80 [egg-watcher:application] watcher start success
2021-10-08 23:00:27,253 INFO 80 [egg:core:ready_stat] end ready task /app/node_modules/egg-watcher/lib/init.js:15:14, remain ["/app/node_modules/egg-mongoose/lib/mongoose.js:104:7"]
2021-10-08 23:00:27,258 INFO 86 [egg-mongoose] mongodb://10.118.71.86/apm-web connected successfully
2021-10-08 23:00:27,259 INFO 80 [egg-mongoose] mongodb://10.118.71.86/apm-web connected successfully
2021-10-08 23:00:27,347 INFO 86 [egg-mongoose] instance[0] start successfully
2021-10-08 23:00:27,347 INFO 86 [egg:core:ready_stat] end ready task /app/node_modules/egg-mongoose/lib/mongoose.js:104:7, remain []
2021-10-08 23:00:27,350 INFO 80 [egg-mongoose] instance[0] start successfully
2021-10-08 23:00:27,350 INFO 80 [egg:core:ready_stat] end ready task /app/node_modules/egg-mongoose/lib/mongoose.js:104:7, remain []
2021-10-08 23:00:27,447 INFO 52 [master] app_worker#2:86 started at 22345, remain 1 (4763ms)
2021-10-08 23:00:27,447 INFO 52 [master] app_worker#1:80 started at 22345, remain 0 (4763ms)
2021-10-08 23:00:27,448 INFO 52 [master] egg started on http://127.0.0.1:22345 (5855ms)
2021-10-08 23:02:08,278 INFO 52 [master] receive signal SIGTERM, closing
2021-10-08 23:02:08,278 INFO 52 [master] send kill SIGTERM to app workers, will exit with code:0 after 5000ms
2021-10-08 23:02:08,278 INFO 52 [master] wait 5000ms
2021-10-08 23:02:08,293 INFO 80 [app_worker] receive signal SIGTERM, exiting with code:0
[2021-10-08 23:02:08.293] [cfork:master:52] worker:80 disconnect (exitedAfterDisconnect: true, state: disconnected, isDead: false, worker.disableRefork: true)
[2021-10-08 23:02:08.293] [cfork:master:52] don't fork, because worker:80 will be kill soon
2021-10-08 23:02:08,293 INFO 52 [master] app_worker#1:80 disconnect, suicide: true, state: disconnected, current workers: []
[2021-10-08 23:02:08.293] [cfork:master:52] worker:86 disconnect (exitedAfterDisconnect: true, state: disconnected, isDead: false, worker.disableRefork: true)
[2021-10-08 23:02:08.293] [cfork:master:52] don't fork, because worker:86 will be kill soon
2021-10-08 23:02:08,293 INFO 52 [master] app_worker#2:86 disconnect, suicide: true, state: disconnected, current workers: []
2021-10-08 23:02:08,293 INFO 86 [app_worker] receive signal SIGTERM, exiting with code:0
2021-10-08 23:02:08,294 INFO 80 [app_worker] beforeExit success
2021-10-08 23:02:08,294 INFO 80 [app_worker] exit with code:0
2021-10-08 23:02:08,295 INFO 86 [app_worker] beforeExit success
2021-10-08 23:02:08,295 INFO 86 [app_worker] exit with code:0
[2021-10-08 23:02:08.301] [cfork:master:52] worker:80 exit (code: 0, exitedAfterDisconnect: true, state: dead, isDead: true, isExpected: false, worker.disableRefork: true)
[2021-10-08 23:02:08.303] [cfork:master:52] worker:86 exit (code: 0, exitedAfterDisconnect: true, state: dead, isDead: true, isExpected: false, worker.disableRefork: true)
2021-10-08 23:02:08,303 INFO 52 [master] send kill SIGTERM to agent worker, will exit with code:0 after 5000ms
2021-10-08 23:02:08,303 INFO 52 [master] wait 5000ms
2021-10-08 23:02:08,303 INFO 52 [master] kill agent worker with signal SIGTERM
2021-10-08 23:02:08,308 INFO 65 [agent_worker] receive signal SIGTERM, exiting with code:0
2021-10-08 23:02:08,310 INFO 65 [agent_worker] beforeExit success
2021-10-08 23:02:08,310 INFO 65 [agent_worker] exit with code:0
2021-10-08 23:02:08,342 INFO 52 [master] close done, exiting with code:0
2021-10-08 23:02:08,342 INFO 52 [master] exit with code:0

Image:

FROM registry.cn-hangzhou.aliyuncs.com/aliyun-node/alinode:5.13.0-alpine

Answers

8

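Judging from this log, multiple workers were started rather than one. Check the options printed earlier in the log to see what was actually passed in, then trace through the egg-scripts source to find out why workers did not take effect.

The later part shows the master receiving a kill signal sent from outside the process, so it shut itself down. You will need to find out why that signal was sent as well.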

2

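@atian25 Thanks for the reply. The options also show workers as 1. It runs fine locally and only fails once it is in the container, and the container does not seem to be debuggable. I will ask our ops team as well.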

[egg-scripts] Run node --no-deprecation /app/node_modules/egg-scripts/lib/start-cluster {"workers":1,"title":"performance","port":"22345","baseDir":"/app","framework":"/app/node_modules/egg"} --title=performance
2021-10-09 09:59:25,570 INFO 42 [master] =================== egg start =====================
2021-10-09 09:59:25,570 INFO 42 [master] node version v12.13.0
2021-10-09 09:59:25,570 INFO 42 [master] alinode version v5.13.0
2021-10-09 09:59:25,570 INFO 42 [master] egg version 2.30.0
2021-10-09 09:59:25,571 INFO 42 [master] start with options:
{
  "framework": "/app/node_modules/egg",
  "baseDir": "/app",
  "port": 22345,
  "workers": 1,
  "plugins": null,
  "https": false,
  "title": "performance"
}
2021-10-09 09:59:25,571 INFO 42 [master] start with env: isProduction: true, EGG_SERVER_ENV: undefined, NODE_ENV: production
2021-10-09 09:59:25,578 INFO 42 [master] agent_worker#1:55 start with clusterPort:42861
2021-10-09 09:59:26,265 INFO 55 Plugin development is disabled by env unmatched, require env([ 'local' ]) but got env is prod
2021-10-09 09:59:26,278 WARN 55 [egg:loader] pluginName(email) is different from pluginConfigName(egg-email)
2021-10-09 09:59:26,279 WARN 55 [egg:loader] pluginName(kafka) is different from pluginConfigName(egg-kafka)
0

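OK, the cause on my side was a wrong hostname setting. I had set it to 127.0.0.1, so the container could not reach the service and then killed the process. Changing it to 0.0.0.0 fixed it. After changing the hostname, the multiple workers no longer appeared either.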

8

I ran into this problem recently too and cannot tell what the cause is.

[2023-04-21 10:20:26.463] [cfork:master:40] worker:72 disconnect (exitedAfterDisconnect: true, state: disconnected, isDead: false, worker.disableRefork: true)
[2023-04-21 10:20:26.464] [cfork:master:40] don't fork, because worker:72 will be kill soon
2023-04-21 10:20:26,464 INFO 40 [master] app_worker#4:72 disconnect, suicide: true, state: disconnected, current workers: []
[2023-04-21 10:20:26.465] [cfork:master:40] worker:86 exit (code: 0, exitedAfterDisconnect: true, state: dead, isDead: true, isExpected: false, worker.disableRefork: true)
[2023-04-21 10:20:26.484] [cfork:master:40] worker:58 exit (code: 0, exitedAfterDisconnect: true, state: dead, isDead: true, isExpected: false, worker.disableRefork: true)
[2023-04-21 10:20:26.493] [cfork:master:40] worker:72 exit (code: 0, exitedAfterDisconnect: true, state: dead, isDead: true, isExpected: false, worker.disableRefork: true)
[2023-04-21 10:20:26.515] [cfork:master:40] worker:59 exit (code: 0, exitedAfterDisconnect: true, state: dead, isDead: true, isExpected: false, worker.disableRefork: true)
[2023-04-21 10:20:26.522] [cfork:master:40] worker:79 exit (code: 0, exitedAfterDisconnect: true, state: dead, isDead: true, isExpected: false, worker.disableRefork: true)
[2023-04-21 10:20:26.522] [cfork:master:40] worker:65 exit (code: 0, exitedAfterDisconnect: true, state: dead, isDead: true, isExpected: false, worker.disableRefork: true)
[2023-04-21 10:20:26.540] [cfork:master:40] worker:93 exit (code: 0, exitedAfterDisconnect: true, state: dead, isDead: true, isExpected: false, worker.disableRefork: true)
[2023-04-21 10:20:26.555] [cfork:master:40] worker:99 exit (code: 0, exitedAfterDisconnect: true, state: dead, isDead: true, isExpected: false, worker.disableRefork: true)
2023-04-21 10:20:26,556 INFO 40 [master] send kill SIGTERM to agent worker, will exit with code:0 after 5000ms
2023-04-21 10:20:26,556 INFO 40 [master] wait 5000ms
2023-04-21 10:20:26,556 INFO 40 [master] kill agent worker with signal SIGTERM
2023-04-21 10:20:26,565 INFO 47 [agent_worker] receive signal SIGTERM, exiting with code:0
2023-04-21 10:20:26,568 INFO 47 [agent_worker] beforeExit success
2023-04-21 10:20:26,568 INFO 47 [agent_worker] exit with code:0
2023-04-21 10:20:26,578 INFO 40 [master] close done, exiting with code:0
2023-04-21 10:20:26,579 INFO 40 [master] exit with code:0
9

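It looks like the number of worker processes was not specified, so too many workers were started (note that the worker ids go up to 99), and the container then killed it.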
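
To check this hypothesis, here is a small sketch that prints the CPU count Node sees inside the container and picks a bounded worker count. When --workers is omitted, egg-scripts documents the default as os.cpus().length, which in a container often reflects the host's cores rather than the container's CPU limit. The helper name pickWorkerCount, the WORKER_COUNT environment variable and the cap of 4 are all hypothetical and only for illustration.

```javascript
'use strict';
const os = require('os');

// Hypothetical helper: an explicit positive value wins; otherwise fall back
// to the CPU count Node reports, bounded by a cap chosen by the operator.
function pickWorkerCount(requested, cpuCount = os.cpus().length, cap = 4) {
  const n = Number.parseInt(requested, 10);
  if (Number.isInteger(n) && n > 0) return n;
  return Math.max(1, Math.min(cpuCount, cap));
}

// Run inside the container to see what Node believes is available.
console.log('os.cpus().length =', os.cpus().length);
console.log('workers to start =', pickWorkerCount(process.env.WORKER_COUNT));
```

The resulting number can then be passed explicitly, for example as egg-scripts start --workers=<n>, so the worker count no longer depends on what the container reports.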

5

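One more case to add: if the service is registered directly as a Linux system service and started with systemctl start, the master process may also log receive signal SIGTERM and the service fails to start. The main cause is that systemctl start has a startup timeout check, 90s by default. If the egg service has not finished starting within 90s, systemd sends SIGTERM to the service to request a graceful stop. There are two ways to fix this:
1. Analyze the egg service's startup script to find out why startup is so slow, and speed it up.
2. Adjust the system service configuration and increase the startup timeout. The option is TimeoutStartSec, under the [Service] section, for example: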

[Unit] 
Description=My service 

[Service] 
ExecStart=/usr/bin/my-service 
TimeoutStartSec=120 

[Install] 
WantedBy=multi-user.target