[eggjs/egg][RFC] egg-cluster: support a worker_threads startup model

Background

The worker_threads API is now stable. On Node.js >= v12.x, we propose adding an optional mode setting alongside the existing multi-process model based on child_process & cluster:

default: the default value; keep using the process-based multi-process startup
worker_threads: use a startup scheme based on worker_threads

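In the worker_threads model, Node.js gives every thread its own event loop and its own v8::Isolate. JavaScript-level resources are therefore isolated, and threads communicate with each other through the standard HTML structured clone algorithm.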
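The following is a minimal sketch of what "communicate through the structured clone algorithm" means in practice. It uses the real worker_threads MessageChannel API. Both ports live in one thread here for simplicity, and the same cloning happens when the ports belong to different threads. The helper names roundTrip and isCloneable are hypothetical.

```javascript
const { MessageChannel } = require('worker_threads');

// Send a value through a MessageChannel and resolve with what the other end receives.
// The receiver gets a structured clone: a new object graph, not a shared reference.
function roundTrip(value) {
  return new Promise((resolve) => {
    const { port1, port2 } = new MessageChannel();
    port2.once('message', (received) => {
      port1.close(); // closing one side closes the whole channel
      resolve(received);
    });
    port1.postMessage(value);
  });
}

// Values the algorithm cannot clone (e.g. functions) make postMessage throw
// a DataCloneError synchronously.
function isCloneable(value) {
  const { port1 } = new MessageChannel();
  try {
    port1.postMessage(value);
    return true;
  } catch (err) {
    return false;
  } finally {
    port1.close();
  }
}
```

Maps, Dates and typed arrays survive the trip with their types intact. Functions and class prototypes do not, so agent/app messages should stay plain data.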

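In addition, each worker thread's memory and other resources can be capped with resourceLimits (passed as a Worker constructor option and reported back by worker.resourceLimits). This gives a stability experience similar to process-based isolation.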
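Below is a minimal sketch of capping a thread's V8 heap, assuming an agent-like thread whose body is inlined with eval: true. resourceLimits and maxOldGenerationSizeMb are real worker_threads options. AGENT_SOURCE and startLimitedAgent are hypothetical names.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical agent placeholder: stays alive by listening on parentPort
// until the main thread terminates it.
const AGENT_SOURCE = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', () => {});
`;

// Start a worker thread with capped V8 heap sizes, similar in spirit to
// running the agent in its own memory-limited process.
async function startLimitedAgent(limits) {
  const worker = new Worker(AGENT_SOURCE, { eval: true, resourceLimits: limits });
  await new Promise((resolve, reject) => {
    worker.once('online', resolve);
    worker.once('error', reject);
  });
  return worker;
}
```

For example, startLimitedAgent({ maxOldGenerationSizeMb: 64 }) starts a thread whose old-generation heap is limited to 64 MB. If the thread exceeds its limits, the Worker emits an 'error' event and stops, while the rest of the process keeps running.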

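Among the advantages: communication between worker threads happens inside the process. Compared with the existing process-based IPC or local-port approaches, this reduces design complexity and improves data-exchange performance. It also makes frequent data exchange between agent_worker and app_worker, running on worker_threads, viable in production.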
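One concrete in-process benefit is that binary data can be moved rather than copied. In the sketch below, listing an ArrayBuffer in postMessage's transfer list hands ownership to the receiving side, and the sender's buffer becomes detached. The helper name transferBuffer is hypothetical. The API calls are standard worker_threads.

```javascript
const { MessageChannel } = require('worker_threads');

// Transfer (not copy) an ArrayBuffer to the other end of a channel.
// After postMessage returns, the sender's buffer is detached (byteLength 0).
function transferBuffer(byteLength) {
  return new Promise((resolve) => {
    const { port1, port2 } = new MessageChannel();
    const buffer = new ArrayBuffer(byteLength);
    new Uint8Array(buffer).fill(7); // mark the payload so the receiver can check it
    port2.once('message', (received) => {
      port1.close();
      resolve({ senderLength: buffer.byteLength, received });
    });
    port1.postMessage(buffer, [buffer]); // second argument is the transfer list
  });
}
```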

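Finally, in terms of cloud resource usage: under the process model, an Egg application needs at least three processes (master, app, agent). Under the thread model there is effectively only one process, which better suits low-quota serverless scenarios (e.g. specs below 1 CPU core and 1 GB of memory).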
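A quick way to confirm the "only one process" claim is sketched below: a thread spawned from the main thread reports the same process.pid but a different threadId. The probe thread is a hypothetical stand-in for an app or agent thread.

```javascript
const { Worker, threadId } = require('worker_threads');

// Spawn a thread that reports its process id and thread id, then exits.
function probeThread() {
  return new Promise((resolve, reject) => {
    const worker = new Worker(
      `const { parentPort, threadId } = require('worker_threads');
       parentPort.postMessage({ pid: process.pid, threadId });`,
      { eval: true }
    );
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

// The main thread always has threadId 0; every Worker gets its own id.
probeThread().then((info) => {
  console.log(`main pid=${process.pid} threadId=${threadId}`);
  console.log(`worker pid=${info.pid} threadId=${info.threadId}`);
});
```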

Approach

The design of app_worker and agent_worker in the Egg framework itself does not need to change; only egg-cluster needs to be adapted:

const startCluster = require('egg-cluster').startCluster;

startCluster({
  baseDir: __dirname,
  mode: 'worker_threads'
});

Concretely, the existing process fork & cluster operations in egg-cluster would be abstracted into prototype methods of a BaseWorker, with a process-based Impl and a thread-based Impl.

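If more upper-layer deployment or startup models are needed later, each one only has to provide its own BaseWorker Impl, selected through the mode setting.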
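A minimal sketch of the BaseWorker idea follows, under stated assumptions. BaseWorker comes from this RFC. ProcessWorkerImpl, ThreadWorkerImpl, createWorker, registerImpl and the start/send/onMessage methods are hypothetical names, not egg-cluster's actual API. Only child_process.fork and worker_threads.Worker are real Node.js calls.

```javascript
const { fork } = require('child_process');
const { Worker } = require('worker_threads');

// Interface egg-cluster could program against, whatever the startup mode.
class BaseWorker {
  constructor(file, options = {}) {
    this.file = file;
    this.options = options;
    this.instance = null;
  }
  start() { throw new Error('start() must be implemented by an Impl'); }
  send() { throw new Error('send() must be implemented by an Impl'); }
  onMessage() { throw new Error('onMessage() must be implemented by an Impl'); }
}

// Process-based Impl: wraps child_process.fork and its IPC channel.
class ProcessWorkerImpl extends BaseWorker {
  start() {
    this.instance = fork(this.file, this.options.argv || []);
    return this;
  }
  send(message) { this.instance.send(message); }
  onMessage(listener) { this.instance.on('message', listener); }
}

// Thread-based Impl: wraps worker_threads.Worker and its in-process message port.
class ThreadWorkerImpl extends BaseWorker {
  start() {
    this.instance = new Worker(this.file, { argv: this.options.argv || [] });
    return this;
  }
  send(message) { this.instance.postMessage(message); }
  onMessage(listener) { this.instance.on('message', listener); }
}

// mode value -> Impl; future startup models register themselves here.
const IMPLS = new Map([
  ['default', ProcessWorkerImpl],
  ['worker_threads', ThreadWorkerImpl],
]);

function registerImpl(mode, Impl) {
  if (!(Impl.prototype instanceof BaseWorker)) {
    throw new TypeError(`Impl for "${mode}" must extend BaseWorker`);
  }
  IMPLS.set(mode, Impl);
}

function createWorker(mode, file, options) {
  const Impl = IMPLS.get(mode || 'default');
  if (!Impl) throw new Error(`Unknown startup mode: ${mode}`);
  return new Impl(file, options);
}
```

The master logic would then only call createWorker(options.mode, file).start() and use send/onMessage. It would never need to know whether a process or a thread sits behind the object.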

Follow-up

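[ ] Abstract the startup model in egg-cluster and add the mode option
[ ] Consolidate the process-model startup logic into a separate Impl
[ ] Provide a worker_threads-based Impl
[ ] Verify in the upper layers that worker_threads mode is fully compatible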

hyj1991

cc @atian25 @fengmk2

hyj1991

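Will the inter-process communication mechanism and its APIs change? Will users notice?
mode seems to conflict; isn't it already taken by the single-process switch?
egg-scripts could support passing it as a parameter (it should already be supported by default).
Does egg-mock need to be improved so that it no longer has to be forced into single-process mode the way it is now?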

atian25

Will the inter-process communication mechanism and its APIs change? Will users notice?

The user-facing API is expected to stay the same; that is exactly what abstracting BaseWorker plus the individual Impls is for.

If the mode option is already taken, we can use another field name, such as startMode; it just needs to be as self-descriptive as possible.

egg-scripts could support passing it as a parameter

Once the egg-cluster refactoring is done, we can add the corresponding startup parameter support.

Does egg-mock need to be improved so that it no longer has to be forced into single-process mode the way it is now?

For single-process usage, my idea is to bring the existing singleMode under the BaseWorker abstraction as well. Its Impl could simply keep running the AgentWorker in the current process.

hyj1991

cluster-client should be affected by this too.

atian25

In the worker_threads startup mode, cluster-client isn't really needed anymore. Or do you mean smoothly adapting the cluster-client API in the upper layer?

hyj1991

In the worker_threads startup mode, cluster-client isn't really needed anymore. Or do you mean smoothly adapting the cluster-client API in the upper layer?

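For the community we could stop recommending the cluster-client approach; it is far too complex. Internally we may still need to look into it.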

mansonchor

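egg 3.0 could simply switch to worker_threads mode outright, as long as we make sure existing plugins that use the standard agent/worker communication API stay compatible across the major version. That would also mark 3.0 as a new technical architecture.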

fengmk2

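Regarding this PR, https://github.com/eggjs/egg-cluster/pull/100: I ran it, and it saves a lot of memory, at least half, which is very friendly for small IoT machines. I suggest officially supporting this startup mode, although actually merging the code upstream probably involves many other considerations.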

Until the maintainers decide on an approach, I am using worker_threads mode internally. Here is how I start it, for newcomers who find this later:

// Download the source from https://github.com/hyj1991/egg-cluster
// Extract it into the project directory as ./egg-cluster
// Create single.js in the project directory with the following content:

const startCluster = require("./egg-cluster").startCluster;
startCluster({
  baseDir: __dirname,
  port: 7001,
  workers: 1,
  mode: "worker_threads",
});

Run node single.js to start the app from this single entry file; memory usage is very low.

star7th

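@star7th The main problem with this PR right now is that worker_threads does not support port forwarding yet: unlike cluster processes, threads cannot share one listening port. To fully upstream it, egg-cluster would have to implement its own forwarding logic in the main thread, and we are still considering whether that is worth doing.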
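The main-thread forwarding described above might look roughly like the sketch below. It is an assumption about one possible design, not what the PR does. The main thread owns the only listening socket and forwards a small request descriptor (method and URL only; headers, bodies and streaming are deliberately left out) to app threads in round-robin order. It matches the replies back by a request id. APP_SOURCE and startForwarder are hypothetical names.

```javascript
const http = require('http');
const { Worker } = require('worker_threads');

// Hypothetical app-thread body: answers forwarded request descriptors.
const APP_SOURCE = `
  const { parentPort, threadId } = require('worker_threads');
  parentPort.on('message', ({ id, method, url }) => {
    parentPort.postMessage({ id, status: 200, body: method + ' ' + url + ' from thread ' + threadId });
  });
`;

// Main thread: owns the server, forwards each request to an app thread.
function startForwarder(threadCount) {
  const workers = Array.from({ length: threadCount }, () => new Worker(APP_SOURCE, { eval: true }));
  const pending = new Map(); // request id -> http.ServerResponse
  let nextId = 0;
  let cursor = 0;

  for (const worker of workers) {
    worker.on('message', ({ id, status, body }) => {
      const res = pending.get(id);
      if (!res) return;
      pending.delete(id);
      res.writeHead(status, { 'content-type': 'text/plain' });
      res.end(body);
    });
  }

  const server = http.createServer((req, res) => {
    const id = nextId++;
    pending.set(id, res);
    const worker = workers[cursor];
    cursor = (cursor + 1) % workers.length; // simple round-robin
    worker.postMessage({ id, method: req.method, url: req.url });
  });

  async function close() {
    server.close();
    await Promise.all(workers.map((w) => w.terminate()));
  }
  return { server, close };
}
```

The cost of this design is that every request and response crosses a thread boundary as a cloned message, which is the extra complexity the comment above is weighing.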

Also, running the agent logic in worker_threads works fine today. The memory savings you saw most likely come from moving the agent from a process to a thread.

hyj1991

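@hyj1991 In this PR, could you also move the single mode logic down into egg-cluster, so that single mode becomes officially usable? It currently lives in egg. I remember @popomore said earlier that he wanted all the different startup modes unified in the egg-cluster library.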

atian25

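@atian25 single mode can be built entirely on worker threads: with only one app thread, that thread can listen on the port itself. However, this separates the one-app-worker case from the multi-app-worker case. One app worker uses the thread model, while multiple app workers have to fall back to the process model.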

We need to consider whether single mode should be designed this way.
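The split described above boils down to a small decision rule. Here is a sketch of it, assuming the workers count from startCluster options drives the choice. resolveAppModel is a hypothetical helper.

```javascript
// Hypothetical helper: choose the startup model for app workers.
// One app worker can run as a thread that listens on the port itself;
// several app workers still need cluster-style port sharing, i.e. processes.
function resolveAppModel(workers) {
  if (!Number.isInteger(workers) || workers < 1) {
    throw new RangeError(`workers must be a positive integer, got ${workers}`);
  }
  return workers === 1 ? 'thread' : 'process';
}
```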

hyj1991

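Compared with crudely flattening everything into one process, the advantage is that the agent and app threads still have independent contexts and do not interfere with each other. For example, in the cases we have seen where the agent was under heavy load, this mode still keeps the app thread from being directly affected.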
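The isolation claim can be checked with a small experiment, sketched here under assumptions: the main thread stands in for the app thread, and a hypothetical busy agent spins the CPU on its own thread. Because each thread has its own event loop, the main thread's timers keep firing while the agent is busy. The same loop run in a flattened single process would block them.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical overloaded agent: busy-waits for workerData.ms milliseconds.
const BUSY_AGENT = `
  const { workerData } = require('worker_threads');
  const end = Date.now() + workerData.ms;
  while (Date.now() < end) { /* busy-wait to simulate high load */ }
`;

// Count timer ticks on the main (app) thread while the agent thread is busy.
function measureAppTicks(busyMs, intervalMs = 10) {
  return new Promise((resolve, reject) => {
    let ticks = 0;
    const timer = setInterval(() => { ticks += 1; }, intervalMs);
    const agent = new Worker(BUSY_AGENT, { eval: true, workerData: { ms: busyMs } });
    agent.once('error', (err) => { clearInterval(timer); reject(err); });
    agent.once('exit', () => { clearInterval(timer); resolve(ticks); });
  });
}
```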

hyj1991

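That works; the only difference is the number of threads. But in this mode, agent-worker communication still goes through IPC, whereas the original single mode used direct function calls. That shouldn't be a problem within the same process, right? And having the agent and worker isolated makes things much easier.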

Let's do it.

atian25

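When this RFC implements single mode, we could update egg-mock at the same time; the way it currently forces everything into a single process is pretty ugly.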

atian25

agent-worker communication still goes through IPC

It's not IPC; it is in-process communication, which performs far better than IPC.
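To make "in-process communication" concrete, here is a minimal request/response sketch between an app side (main thread) and an agent thread. It uses a dedicated MessageChannel whose port is transferred to the agent. The message shape { id, method } and the names AGENT_SOURCE and connectToAgent are hypothetical, not Egg's messenger API.

```javascript
const { Worker, MessageChannel } = require('worker_threads');

// Hypothetical agent thread: receives a port, then answers { id, method } requests on it.
const AGENT_SOURCE = `
  const { parentPort } = require('worker_threads');
  parentPort.once('message', ({ port }) => {
    port.on('message', ({ id, method }) => {
      const result = method === 'ping' ? 'pong' : null;
      port.postMessage({ id, result });
    });
  });
`;

// App side: a tiny RPC client over an in-process MessagePort.
function connectToAgent() {
  const agent = new Worker(AGENT_SOURCE, { eval: true });
  const { port1, port2 } = new MessageChannel();
  agent.postMessage({ port: port2 }, [port2]); // hand one end of the channel to the agent

  const pending = new Map(); // request id -> resolve function
  let nextId = 0;
  port1.on('message', ({ id, result }) => {
    const resolve = pending.get(id);
    pending.delete(id);
    if (resolve) resolve(result);
  });

  return {
    call(method) {
      const id = nextId++;
      return new Promise((resolve) => {
        pending.set(id, resolve);
        port1.postMessage({ id, method });
      });
    },
    async close() {
      port1.close();
      await agent.terminate();
    },
  };
}
```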

hyj1991

egg-cluster v3

fengmk2

Did this change leave the egg project itself unmodified, so that IPC messages are not received in worker_threads startup mode?

sjfkai
