新功能
- NHWC的warpperspective添加matidx支持。
- 添加CUDA版本的remap算子支持。
- 支持编译ios whl包。
- megengine模型支持TensorCore加速。
- Parameter 中增加 replica_mode来指定是否需要同步。
- collective_comm算子添加local_grad参数。
性能优化
- 持续优化CPU下NCHW44性能,在业务线模型有5%-30%性能提升。
- 添加更多midout支持,进一步减少binary size体积。
问题修复
- 修复使用vs2015 编译后megbrain执行速度慢的问题。
- 修复CPU端偶尔出现free < total的内存分配问题。
- 修复arm linux下GCC编译器无法inline小函数导致的性能问题。
- 修复cuda warpperspective算子在batch * img_size 超过INT_MAX时的计算错误。
- 修复cuda elemwise 在int8 broadcast情况下的计算错误,不影响NCHW4模型。
- 修复psroi_pooling 算子的indexing计算逻辑。
- 修复若干个JIT求导时的错误。
- 修复GCC7下编译问题。
- 修复部分NCHW→NCHWxx的转换器问题。
- 修复reduce和gather的求导问题。
- 修复fbs模型格式下无法正确加载含有多个graph的情况(不影响内部mdl模型格式)。
- 修复warpperspective在开midout时可能存在的undefined reference问题。
- 修复开exception引入的敏感词问题。
- 修复标注中由于categories乱序导致生成的contiguous id错误。
- 修复了当一个进程中存在多个 dataloader 实例时,MGE_PLASMA_STORE_MANAGER销毁行为不正确的问题。
- 修复无法加载量化int8 pkl模型的问题。
- 修复nn.flatten的API 说明。@ChaiMind
Thanks to our Contributors
- 本次release非常感谢@ChaiMind 提交PR,期待更多的开发者一起共建MegEngine!
New Features
- Add matidx support to the NHWC warpperspective operator.
- Add a CUDA version of the remap operator.
- Support building whl packages for iOS.
- MegEngine quantized models support TensorCore acceleration.
- Add replica_mode to Parameter to specify whether synchronization is required.
- Add a local_grad parameter to the collective_comm operators.
Optimization
- Continue optimizing NCHW44 performance on CPU, with a 5%-30% speedup on production (business-line) models.
- Add more midout support to further reduce binary size.
Bug Fixes
- Fix slow execution of megbrain when compiled with VS2015.
- Fix an occasional memory allocation problem on the CPU side where free < total.
- Fix a performance problem on ARM Linux caused by GCC failing to inline small functions.
- Fix a calculation error in the CUDA warpperspective operator when batch * img_size exceeds INT_MAX.
- Fix a calculation error in CUDA elemwise with int8 broadcast; NCHW4 models are not affected.
- Fix the indexing calculation logic of the psroi_pooling operator.
- Fix several errors in JIT gradient computation.
- Fix compilation problems under GCC7.
- Fix some bugs in the NCHW → NCHWxx converter.
- Fix gradient computation for reduce and gather.
- Fix failure to correctly load fbs-format models that contain multiple graphs (the internal mdl model format is not affected).
- Fix a possible undefined reference issue with the warpperspective operator when midout is enabled.
- Fix sensitive words introduced when exceptions are enabled.
- Fix incorrect contiguous ids generated when the categories in the annotations are out of order.
- Fix incorrect destruction behavior of MGE_PLASMA_STORE_MANAGER when multiple dataloader instances exist in one process.
- Fix failure to load quantized int8 pkl models.
- Fix the API description (msgstr) of nn.flatten. @ChaiMind
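
The warpperspective fix above concerns index arithmetic once batch * img_size exceeds INT_MAX. The following is only an illustrative sketch of that class of bug, not MegEngine's kernel code: the helper names and the image dimensions are hypothetical, and `ctypes.c_int32` is used solely to emulate how a 32-bit signed offset wraps around.

```python
import ctypes

INT_MAX = 2**31 - 1


def offset_int32(batch_idx, img_size, pixel):
    """Flat offset as a 32-bit signed integer would hold it (wraps on overflow)."""
    return ctypes.c_int32(batch_idx * img_size + pixel).value


def offset_int64(batch_idx, img_size, pixel):
    """The same offset held in a 64-bit signed integer."""
    return ctypes.c_int64(batch_idx * img_size + pixel).value


# Hypothetical sizes: one 4096 x 4096 image with 3 channels.
img_size = 4096 * 4096 * 3
batch_idx = 64

true_offset = batch_idx * img_size
print(true_offset > INT_MAX)                  # the product no longer fits in 32 bits
print(offset_int32(batch_idx, img_size, 0))   # wrapped, negative offset
print(offset_int64(batch_idx, img_size, 0))   # correct offset
```

For small batches both helpers agree; they diverge only once the product passes INT_MAX, which is why this kind of error tends to appear only with large batches or large images.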
Thanks to our Contributors
- Many thanks to @ChaiMind for submitting a PR to this release. We look forward to more developers joining us in building MegEngine!