Breaking Changes
- Since the C++ serialization format adds a new opname field, files dumped by this version cannot be loaded by earlier releases.
- set/get_conv_execution_strategy is deprecated; use set/get_execution_strategy instead.
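The deprecated name can keep working through a thin forwarding wrapper. A minimal sketch of that pattern, with illustrative names and state (not MegEngine's actual implementation):

```python
import warnings

# Illustrative global holding the current strategy (stand-in for real state).
_execution_strategy = "HEURISTIC"

def set_execution_strategy(option):
    """New interface: set the global execution strategy."""
    global _execution_strategy
    _execution_strategy = option

def get_execution_strategy():
    """New interface: query the global execution strategy."""
    return _execution_strategy

def set_conv_execution_strategy(option):
    """Deprecated alias that warns and forwards to the new interface."""
    warnings.warn(
        "set_conv_execution_strategy is deprecated; "
        "use set_execution_strategy instead",
        DeprecationWarning,
    )
    set_execution_strategy(option)
```

Calling the old name still takes effect but emits a DeprecationWarning, giving downstream code a migration window.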
Additional Notes
- Some functionals are moved to new modules for better organization. Backward compatibility is guaranteed, so the change is not expected to affect existing usage. The moved functionals include:
- interpolate/roi_pooling/roi_align/nms/remap/warp_affine/warp_perspective/cvt_color are moved from functional.nn to functional.vision.
- sigmoid/hsigmoid/relu/relu6/hswish are moved from functional.elemwise to functional.nn.
- topk_accuracy is moved from functional.utils to functional.metric.
- copy is moved from functional.utils to functional.tensor.
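Backward compatibility for such moves is commonly achieved by re-exporting the moved function from its old location, so both import paths resolve to the same object. A toy sketch of the idea (the namespaces and the metric body are illustrative, not MegEngine source):

```python
from types import SimpleNamespace

def topk_accuracy(logits, labels, k=1):
    """Toy stand-in for the real metric, for illustration only."""
    return sum(l in row[:k] for row, l in zip(logits, labels)) / len(labels)

# New home of the function.
metric = SimpleNamespace(topk_accuracy=topk_accuracy)

# Old module re-exports the moved function, so the legacy path
# utils.topk_accuracy still resolves to the very same object.
utils = SimpleNamespace(topk_accuracy=metric.topk_accuracy)
```

Because the old attribute is the same object, existing call sites keep working unchanged.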
Bug Fixes
General components
- Fix incorrect shape inference in reshape that could cause trace errors.
- Fix a memory leak in trace.
- Fix trace error caused by linspace.
- Fix the bug in automatic differentiation which turns a scalar into a 1-dim tensor.
- Fix NCHW-to-NCHW4 layout transform in gopt.
- Fix a memory leak when the Python frontend dispatches asynchronous tasks much faster than the device executes them.
- Fix segfault caused by pyobject reference counting error.
- Fix the illegal memory access in ROIAlign operator.
- Fix load errors caused by CompNode reuse in some cases.
- Fix the graph optimization error of NormalizeArithChainPass and WarpFusion.
- Fix the device parameter in linspace.
Python API
- Fix the error raised when a scalar tensor is passed as the shape argument of F.full/F.ones/F.zeros.
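Conceptually, fixes of this kind amount to normalizing a scalar shape argument into a 1-tuple before the fill, instead of raising. A hypothetical pure-Python sketch of that normalization over nested lists (not MegEngine's actual code):

```python
def normalize_shape(shape):
    """Promote a plain scalar shape to a 1-d shape; pass sequences through."""
    if isinstance(shape, int):
        return (shape,)
    return tuple(shape)

def full(shape, value):
    """Toy dense fill over nested lists, accepting scalar or sequence shapes."""
    shape = normalize_shape(shape)
    if not shape:
        return value
    # Recurse over the remaining dimensions to build the nested structure.
    return [full(shape[1:], value) for _ in range(shape[0])]
```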
Quantization
- Fix comparison errors of quantized data types in some cases.
- Fix checkpoint loading error in quantized training.
- Fix parameters not being updated in TQT quantization-aware training.
- Fix gradient calculation in TQT.
- Fix user-defined quantized modules not being converted in quantized training.
Others
- Fix set_mgb_log_level not taking effect.
- Fix freeze parameter in batch normalization.
New Features
General components
- Support host computation for small tensors to reduce synchronization between host and device.
- Add fast profile mode for fastrun.
- Support recursive search in fastrun.
- Add matmul support in fastrun.
- Add disable-optimize-for-inference parameter to load_and_run.
- Add automatic naming of ops based on module structure during trace.
- Add static shape inference for reshape operator.
Python API
- Add new operators: cvt_color, matinv, resize, warp_affine, deformable_conv2d, deformable_psroi_pooling, repeat, tile, plus support for third-party hardware (TensorRT/Atlas/Cambricon).
- Enable tensor naming.
Distributed training
- Support scalar tensors for distributed operators.
Tools
- Add GraphInference in cgtools and support specifying output nodes.
- Support model visualization and parameter statistics from .mge files.
- Add python load_and_run.
Dataloader
- Support setting timeout and callback function after timeout in stream dataloader.
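The timeout-with-callback behavior can be pictured as a bounded wait on the stream's queue: if no item arrives in time, a user-supplied callback runs instead of blocking forever. A hypothetical sketch (class and parameter names are illustrative, not the actual Dataloader API):

```python
import queue

class StreamFetcher:
    """Toy model of a stream consumer with a timeout fallback."""

    def __init__(self, timeout=1.0, timeout_event=None):
        self.queue = queue.Queue()
        self.timeout = timeout
        # Callback invoked when no item arrives within `timeout` seconds.
        self.timeout_event = timeout_event or (lambda: None)

    def put(self, item):
        self.queue.put(item)

    def get(self):
        try:
            return self.queue.get(timeout=self.timeout)
        except queue.Empty:
            return self.timeout_event()
```

The callback lets users log, retry, or yield a sentinel batch rather than hanging the training loop on a stalled stream.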
ARM
- Automatically detect ARM platform features and enable corresponding optimizations.
- Support inference on ARM64 with CUDA.
Improvements
General components
- Support dict as returned value for traced function.
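One way a tracer can support dict return values (an assumed mechanism, not necessarily MegEngine's implementation) is to flatten the dict into a tuple with a stable key order for the traced graph, then rebuild the dict when handing results back to the caller:

```python
def flatten_output(out):
    """Flatten a dict result into (values, keys); pass other values through."""
    if isinstance(out, dict):
        keys = sorted(out)  # stable order so the graph outputs are consistent
        return tuple(out[k] for k in keys), keys
    return (out,), None

def unflatten_output(values, keys):
    """Rebuild the original return structure from flat graph outputs."""
    if keys is None:
        return values[0]
    return dict(zip(keys, values))
```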
Python API
- Add get/set_expand_structure to deal with complex key.
- Support list and dict in module repr methods.
Distributed training
- Add return values for distributed training.
Quantization
- Adjust the fake quantization strategy so that bias is fake-quantized only when both weight and activation are quantized.
- Support user-defined quantized data type in quantized training.
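The adjusted bias policy can be summarized in a few lines. In common quantization schemes the bias scale is the product of the weight and activation scales, so when either side is unquantized there is no meaningful bias scale and the bias stays in float. A minimal sketch of that rule (illustrative, not MegEngine code):

```python
def bias_fake_quant_scale(weight_scale, act_scale):
    """Return the bias scale, or None when bias should stay in float.

    Bias is fake-quantized only when both weight and activation are
    quantized; its scale is conventionally weight_scale * act_scale.
    """
    if weight_scale is None or act_scale is None:
        return None
    return weight_scale * act_scale
```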
ARM
- Add tiled Matmul kernels to improve performance for certain shapes.
Thanks to our Contributors
Many thanks to @jia-kai for the submitted PR; we warmly welcome more developers to build MegEngine together!