[wasm] EmccCompile: Improve AOT time by better utilizing the cores (#67195)
* [wasm] EmccCompile: Improve AOT time by better utilizing the cores

Problem:

The `EmccCompile` task compiles `.bc` files to `.o` files and uses `Parallel.ForEach` to run `emcc` on them in parallel. The problem shows up when `EmccCompile` is compiling many files:

- At first, the intended number of cores is in use,
- but at some point (in my case after ~150 of 180 files), the number of cores being utilized drops to 1.
- The reason is that `Parallel.ForEach` partitions the list of files (jobs), and each worker executes only the jobs assigned to it.

From: https://github.com/dotnet/runtime/issues/46146#issuecomment-754021690
Stephen Toub: "As such, by default ForEach works on a scheme whereby each thread takes one item each time it goes back to the enumerator, and then after a few times of this upgrades to taking two items each time it goes back to the enumerator, and then four, and then eight, and so on. This ammortizes the cost of taking and releasing the lock across multiple items, while still enabling parallelization for enumerables containing just a few items. It does, however, mean that if you've got a case where the body takes a really long time and the work for every item is heterogeneous, you can end up with an imbalance."

This means that when individual jobs take wildly different amounts of time, we can end up with exactly this imbalance: some cores sit idle while others are reduced to running their jobs sequentially. Instead, we want to use work-stealing so that jobs can be run by any partition.

In my highly unscientific testing, with AOT for `System.Buffers.Tests`, the total time to run `EmccCompile` for 181 assemblies goes from 5.7 min to 4.0 min.

* MonoAOTCompiler.cs: Ensure that the parallel jobs get scheduled with work-stealing, instead of being partitioned.
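The imbalance Stephen Toub describes can be reproduced with a small, deterministic Python model. This is a sketch under stated assumptions, not the .NET implementation: the chunk-growth rule (double a worker's chunk after every `grow_after` trips to the enumerator) and the job durations are hypothetical, chosen only to make the effect visible. The `chunked=False` mode stands in for one-item-at-a-time dispatch from a shared queue, which is one way to get the load balancing this change is after; the commit message does not say which mechanism is used.

```python
import heapq


def makespan(durations, workers, chunked, grow_after=2):
    """Simulate running jobs (taken in list order) on `workers` threads.

    chunked=True is a toy model of the default Parallel.ForEach behavior:
    every worker starts by taking 1 item per trip to the shared enumerator,
    and its chunk size doubles after every `grow_after` trips.
    `grow_after` is an assumed value, not the real .NET heuristic.

    chunked=False hands out exactly one item per trip, so any idle worker
    can pick up the next job.

    Returns the time at which the last job finishes.
    """
    next_item = 0
    # Min-heap of (time the worker becomes free, worker id).
    free_at = [(0, w) for w in range(workers)]
    heapq.heapify(free_at)
    chunk = [1] * workers
    trips = [0] * workers
    finish = 0
    while next_item < len(durations):
        # The earliest-free worker goes back to the enumerator.
        t, w = heapq.heappop(free_at)
        size = chunk[w] if chunked else 1
        batch = durations[next_item:next_item + size]
        next_item += len(batch)
        # The worker runs its whole batch sequentially.
        t += sum(batch)
        finish = max(finish, t)
        trips[w] += 1
        if chunked and trips[w] % grow_after == 0:
            chunk[w] *= 2
        heapq.heappush(free_at, (t, w))
    return finish


# Hypothetical workload: 12 short jobs followed by 4 long ones, on 2 workers.
jobs = [1] * 12 + [10] * 4
print(makespan(jobs, 2, chunked=True))   # 46: one worker grabs all 4 long jobs
print(makespan(jobs, 2, chunked=False))  # 26: the long jobs are split 2 and 2
```

In the chunked run, by the time the long jobs are reached each worker's chunk has grown to 4, so the first worker to return takes all four and runs them back to back while the other worker idles. In .NET, `Partitioner.Create(source, EnumerablePartitionerOptions.NoBuffering)` from `System.Collections.Concurrent` is one real way to make `Parallel.ForEach` take items one at a time.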