Bug Fixes
- Fixed static shape inference in trace to allow training larger models (see the trace sketch after this list)
- Fixed the execution order of I/O and computing operators in trace to avoid deadlock
- Fixed cd4 conversion errors for convolution with group=1, as well as some elemwise conversion errors
- Fixed a shape mismatch at inference time when the bias shape is fixed in the fuse conv bias optpass
- Fixed abnormal results of the elemwise LOG mode when computed with MKL
- Fixed error handling when load_and_run --input does not specify the correct input name for a single-input model
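
The trace-related fixes above apply to symbolically traced functions. Below is a minimal sketch of such a function, assuming a toy matmul workload; the function name and shapes are illustrative only and not taken from this release.

```python
import numpy as np
import megengine as mge
import megengine.functional as F
from megengine.jit import trace

# Symbolic tracing builds a graph up front and relies on static shape inference.
@trace(symbolic=True)
def fwd(data, weight):
    return F.matmul(data, weight)

data = mge.tensor(np.random.randn(32, 64).astype("float32"))
weight = mge.tensor(np.random.randn(64, 16).astype("float32"))
out = fwd(data, weight)
print(out.shape)  # expected: (32, 16)
```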
New Features
- Support representation of scalar-type tensors (see the first sketch after this list)
- Allow users to control error checking during asynchronous execution via the async_level parameter
- Add operators including group_norm, instance_norm, layer_norm, conv1d, and remap (see the normalization sketch after this list)
- GradManager.attach now holds only weak references to attached tensors (see the first sketch after this list)
- Support distributed quantization-aware training
- Support releasing the original weight memory after weight preprocessing during inference
- Fully support Elemwise and DimShuffle operators in the JIT MLIR backend
- Add support for the DCT operator in cv
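
A minimal sketch of scalar tensor support together with GradManager.attach, assuming a single toy parameter; with this release, attach keeps only a weak reference, so attaching a tensor no longer prevents it from being freed.

```python
import megengine as mge
import megengine.functional as F
from megengine.autodiff import GradManager

w = mge.Parameter([2.0, 3.0])
gm = GradManager()
gm.attach([w])  # attach now holds only a weak reference to w

with gm:
    loss = F.sum(w * w)  # a fully reduced result is a scalar (0-dim) tensor
    gm.backward(loss)

print(loss.ndim)  # expected: 0, thanks to scalar tensor support
print(w.grad)
```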
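A minimal sketch of the newly added normalization operators through the module interface; the constructor argument names follow the common num_groups/num_channels convention and are an assumption here, so check the release's API reference for the exact signatures.

```python
import numpy as np
import megengine as mge
import megengine.module as M

x = mge.tensor(np.random.randn(2, 8, 16, 16).astype("float32"))

gn = M.GroupNorm(num_groups=4, num_channels=8)  # argument names assumed
ln = M.LayerNorm((8, 16, 16))                   # normalize over C, H, W
print(gn(x).shape, ln(x).shape)                 # both keep the input shape
```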
Optimization
- Reduce host CPU overhead for operators including batch normalization, elementwise, and broadcast
- Improve performance of the step() function in optimizers (see the sketch after this list)
- Improve the performance of quantization-aware training
- Optimize arm64 int8X8X16_mk4_k8x8x8 matmul operator
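
A minimal sketch of the optimizer loop whose step() call was optimized, assuming a toy parameter and loss; the names are illustrative only.

```python
import megengine as mge
import megengine.functional as F
import megengine.optimizer as optim
from megengine.autodiff import GradManager

w = mge.Parameter([1.0, 2.0, 3.0])
gm = GradManager()
gm.attach([w])
opt = optim.SGD([w], lr=0.1)

for _ in range(3):
    with gm:
        loss = F.sum(w * w)
        gm.backward(loss)
    opt.step()        # apply the parameter update (the optimized path)
    opt.clear_grad()  # reset gradients for the next iteration
```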
Breaking Changes
- None