Unverified commit 11f2f784, authored by Zeng Jinle, committed by GitHub

fix softmax seg fault in AVX, test=develop (#19487)

Parent 1fe468d3
@@ -160,7 +160,7 @@ inline void vec_sum<float, platform::avx>(const size_t n, const float* x,
   end = n & ~(block - 1);
   __m256 tmp = _mm256_setzero_ps();
   for (i = 0; i < end; i += block) {
-    tmp = _mm256_add_ps(tmp, _mm256_load_ps(x + i));
+    tmp = _mm256_add_ps(tmp, _mm256_loadu_ps(x + i));
   }
   __m256 hsum = _mm256_hadd_ps(tmp, tmp);
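
For context on why this one-line change removes the crash: `_mm256_load_ps` requires its address to be 32-byte aligned and can fault otherwise, while `_mm256_loadu_ps` accepts any address. A `const float*` passed into a function like `vec_sum` is only guaranteed 4-byte alignment, for example when it points into the middle of a larger buffer. The sketch below is illustrative only and is not Paddle's code: `is_aligned` and `sum_floats` are hypothetical names, the horizontal reduction is a simplified stand-in for the elided part of `vec_sum`, and the AVX path is compiled only when `__AVX__` is defined (for example with `-mavx`), falling back to a scalar loop otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper: true if p sits on an `alignment`-byte boundary.
inline bool is_aligned(const void* p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Illustrative sum over n floats; NOT Paddle's vec_sum.
// x may have any alignment, so the AVX path uses the unaligned load.
inline float sum_floats(const float* x, std::size_t n) {
  std::size_t i = 0;
  float total = 0.0f;
#if defined(__AVX__)
  constexpr std::size_t block = 8;           // 8 floats per __m256
  const std::size_t end = n & ~(block - 1);  // largest multiple of 8 <= n
  __m256 acc = _mm256_setzero_ps();
  for (; i < end; i += block) {
    // _mm256_load_ps(x + i) could fault here if x + i is not 32-byte aligned.
    acc = _mm256_add_ps(acc, _mm256_loadu_ps(x + i));
  }
  // Simplified reduction: spill the 8 lanes to an aligned array and add them.
  alignas(32) float lanes[block];
  _mm256_store_ps(lanes, acc);  // safe: lanes is explicitly 32-byte aligned
  for (std::size_t k = 0; k < block; ++k) total += lanes[k];
#endif
  // Scalar tail (or the whole array when AVX is not enabled).
  for (; i < n; ++i) total += x[i];
  return total;
}
```

Calling `sum_floats(buf + 1, n)` on a 32-byte aligned buffer `buf` exercises exactly the misaligned case: `buf + 1` is offset by 4 bytes, so an aligned load would be invalid there while the unaligned load is fine.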