Unverified · Commit e6bc358d authored by zhang wenhui, committed by GitHub

[NPU] Cherry-pick ascendrc ops code by 0325 to develop (#32197)

* merge 31065

* Fix typo of selected_npus (#31230)

* merge 31249

* [NPU] Support npu op pow and pow grad (#31247)

* [NPU] Support npu op: (1) pow (2) pow_grad

* Support fp16

* Fix pow npu fp16 test (#31256)

* support list of list attribute for NPU (#31299)

* support list of list attribute for NPU

* fix compile problem

* fix reference

* [NPU] Support npu op: (1) slice (2) slice_grad (#31275)

* fix reading flags from env (#31329)

* merge 31347

* [NPU] Support npu op layer_norm and layer_norm_grad (#31310)

* init commit, add layer_norm npu kernel

* fix typo

* add unittest

* add unittest

* fix bug

* fix bug

* refine ut

* [NPU] add npu kernel for equal op (#31393)

* add npu kernel for equal op

* refine code

* add more ut

* update year

* [NPU] Support npu kernel for shape op (#31427)

* add shape npu

* fix

* fix

* fix endif (#31431)

* Fix pow, use fillD instead of broadcast (#31433)

* Fix pow, refine code (#31440)

* fix cmake of cryptopp to avoid downloading every time (#31451)

* [NPU] squeeze and unsqueeze op for ascend (#31452)
Co-authored-by: root <xiayanming@baidu.com>

* Support npu kernel for gather op (#31458)

* add gather npu op

* code review done

* update python new line

* precommit

* fix review

* del commit

* [NPU] add scale op for npu (#31499)

* add scale npu

* fix

* fix

* Support TensorFromVector, TensorToVector of bool type (#31518)

* support TensorFromVector, TensorToVector of bool type

* add ut

* fix compile problem

* [NPU] support npu kernel for fill_constant op (#31521)

* add fill_constant npu

* add fill_constant npu

* fix

* cherry-pick 31422, solve conflict

* [NPU] Support npu kernel for matmul op (#31544)

* add matmulv2_npu

* add matmul

* add matmul

* [NPU] Support npu op elementwise_mul and elementwise_mul_grad (#31571)

* [NPU] Support npu op elementwise_max (#31574)

* [NPU] add relu op for npu (#31515)

* add relu npu

* fixed

* fix

* [NPU] Support npu kernel for reshape2 op (#31524)

* add reshape2 npu

* add reshape2

* [NPU] Support npu kernel for gather op fix bug (#31541)

* add gather npu op

* code review done

* update python new line

* precommit

* fix review

* del commit

* update gather_grad

* fix bug

* fix bug

* [NPU] Support npu kernel for amp_check_finite_and_unscale_npu op (#31457)

* Support npu kernel for amp_check_finite_and_unscale_npu op

* support EnforceNotMet exception

* fix exception bug

* modify python unittest

* precommit

* update c++ unittest

* fix review

* fix review

* [NPU] accuracy op (#31492)

* accuracy op

* fix license

* fix

* add test and fix bug

* [NPU] add Assign OP (#31561)

* add assign op

* add test assign npu test

* delete ifdef
Co-authored-by: oyjxer <1728722986@qq.com>

* [NPU] fix npu op elementwise_mul_grad (#31592)

* [NPU] Support npu op gelu and gelu_grad (#31530)

* Support npu op gelu and gelu_grad

* Support npu op gelu and gelu_grad

* [NPU] fix assign cmake (#31595)

* fix gather_grad bug (#31607)

* [NPU] add range op (#31560)

* add range op

* fix codestyle; call GetSize directly
Co-authored-by: oyjxer <1728722986@qq.com>

* [NPU] Support npu op elementwise_div and elementwise_div_grad (#31573)

* Support npu op elementwise_div and elementwise_div_grad

* Support npu op elementwise_div and elementwise_div_grad

* Support npu op elementwise_div and elementwise_div_grad

* [NPU] Support npu op log, log_grad, sqrt, sqrt_grad, square, tanh and tanh_grad (#31600)

* [NPU] Support npu op logicalnot_op (#31534)

* [NPU] Support npu op elementwise_min (#31575)

* [NPU] Support npu op elementwise_pow (#31576)

* [NPU] Support npu op table_lookup_v2 and table_lookup_v2_grad (#31399)

* [npu] support npu kernel `table_lookup_v2`

* clean up

* +python test

* +cmake

* clean up

* remove int8 kernel
+ python unittest for fp16

* clean up

* [NPU] support npu kernel for `less_than` (#31327)

* [npu] support npu kernel for `less than`

* remove int* kernel

* cleanup

* [NPU] Support npu kernel scatter op (#31624)

* Support npu kernel scatter op

* Add more test

* [NPU] fix allocator min chunk size (#31632)

* [NPU] Support NPU kernel cast op (#31635)
Co-authored-by: frankwhzhang <frankwhzhang@126.com>

* [NPU] add npu kernel for sgd (#31639)

* [NPU] Support NPU kernel for reduce_sum op v2 (#31620)

* add reduce_sum

* fix broadcastd

* fix test

* fix

* add unsqueeze in reduce_sum

* add template

* add unittest for keep_dim

* test reduce_all
Co-authored-by: frankwhzhang <frankwhzhang@126.com>

* [NPU] add npu kernel for adam (#31644)

* add npu kernel for adam

* refine code

* disable test

* modify atol

* [NPU] Support npu kernel for mul op (#31584)

* add mul

* add test mul

* [NPU] add npu kernel for softmax_with_cross_entropy (#31656)

* init

* fix bugs

* [NPU] add npu kernel for mean Op (#31562)

* update mean op

* update mean op

* give a better test activation
Co-authored-by: oyjxer <1728722986@qq.com>

* Revert "[NPU] add npu kernel for mean Op (#31562)" (#31665)

This reverts commit 468ac699.

* [NPU] Add TensorCopy to NPU kernel for reduce_sum op (#31667)

* update unittest

* add TensorCopy in npu grad kernel

* [NPU] Support npu op `expand` (#31405)

* [npu] support npu kernel  for `expand`

* [NPU] fix shape of dx in mul_grad (#31675)

* fix shape of dx

* refine code

* [NPU] add Increment op (#31563)

* add increment

* fix

* update test increment op inplace

* update increment op

* increment b = 2
Co-authored-by: oyjxer <1728722986@qq.com>

* [NPU] add NPU topk op (#31596)

* add topk op

* add cmake

* update topk npu op

* refactor func

* fix bug that the test did not go through the npu TopKD kernel

* NPUPlace(4) to NPUPlace(0)

* update comment
Co-authored-by: oyjxer <1728722986@qq.com>

* [NPU] Support NPU kernel sum op (#31671)

* [NPU] npu support `transpose` (#31486)

* cherry-pick 31564, solve conflict

* [NPU] Fix bug: Fix calculation errors of pow grad npu kernel (#31699)

* [NPU] Support testing grad of NPU ops in OpTest (#31697)

* [NPU] Support NPU kernel of stack op (#31711)

* [NPU] Remove redundant ctest of top_k_op_npu_test (#31718)

* [NPU] fix reshape npu op kernel (#31726)

* rename npu op file

* fix reshape

* [NPU] change transpose to transpose2 (#31734)

* change transpose to transpose2

* fix bug

* [NPU] Support  mean npu kernel (#31729)

* [NPU] fix some bugs of npu op (#31739)

* fix softmax

* fix mean

* fix lookup_table_v2

* [NPU] Fix npu kernel elementwise_div_grad (#31753)

* [NPU] fix the grad kernel diff bug of gather op (#31757)

* fix gather grad kernel diff

* fix gather grad kernel diff

* fix gather review bug

* [NPU] Fix reshape test & add grad test (#31776)

* fix

* fix

* [NPU] support fp16 for npu accuracy op (#31797)

* [NPU] support list of tensor input (#31801)

* support list of tensor as npu input

* add comment

* fix typo

* fix typo

* [NPU] add npu kernel for concat op (#31695)

* add npu kernel for concat op

* add npu kernel for concat op

* refine code

* update

* refine concat_grad

* [NPU] Support npu kernel for op elementwise_floordiv (#31822)

* [NPU] fix bug of lookup_table_v2_grad (#31834)

* [NPU] support default stream (#31510)

* [NPU] support mixed precision input for npu layer norm (#31847)

* support mixed precision input for npu layer norm

* fix layer_norm npu kernel
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>

* [NPU] Support npu kernel for update_loss_scaling op (#31830)

* add update_loss_scaling_npu NPU kernel

* change TensorFromVec to Memset

* fix compile problem (#31850)

* [NPU] support npu for conditional_block op (#31854)

* [NPU] Add int dtype kernel for reshape2 op (#31864)

* fix

* fix

* [NPU] fix some op bugs (#31855)

* fix some op bugs

* fix some bugs

* follow comments

* fix log level

* add ut

* [NPU] support fp16 of input for api pow (#31871)

* [NPU] add npu kernel for truncated_gaussian_random op (#31654)

* init

* add todo

* add npu kernel for truncated_gaussian_random

* add sync

* fix concat_grad

* fix typo

* fix compile

* fix compile

* fix compile

* fix compile

* fix compile

* fix compile

* fix code style

* fix code style

* fix code

* Fix op test (#32231)

* fix conditional block (#32243)

* fix style code
Co-authored-by: xiayanming <41795079@qq.com>
Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
Co-authored-by: Reventon_L <luyuxiang1994@qq.com>
Co-authored-by: root <xiayanming@baidu.com>
Co-authored-by: oyjxer <1728722986@qq.com>
Co-authored-by: yinhaofeng <66763551+yinhaofeng@users.noreply.github.com>
Co-authored-by: OleNet <olenet@126.com>
Co-authored-by: Meiyim <chen_xuyi@outlook.com>
Co-authored-by: oyxuan-11 <963650125@qq.com>
Co-authored-by: pangyoki <pangyoki@126.com>
Parent 69d80274
@@ -32,7 +32,7 @@ cache_third_party(extern_gloo
     TAG       ${GLOO_TAG}
     DIR       GLOO_SOURCE_DIR)

-if(WITH_ASCEND)
+if(WITH_ASCEND OR WITH_ASCEND_CL)
   ExternalProject_Add(
     extern_gloo
     ${EXTERNAL_PROJECT_LOG_ARGS}
...
@@ -242,7 +242,7 @@ endif()
     )
 ENDFUNCTION()

-if(WITH_ASCEND)
+if(WITH_ASCEND OR WITH_ASCEND_CL)
   SET(PROTOBUF_VERSION 3.8.0)
 else()
   SET(PROTOBUF_VERSION 3.1.0)
...
@@ -16,7 +16,7 @@ INCLUDE(ExternalProject)
 SET(THREADPOOL_PREFIX_DIR ${THIRD_PARTY_PATH}/threadpool)
 SET(THREADPOOL_SOURCE_DIR ${THIRD_PARTY_PATH}/threadpool/src/extern_threadpool)

-if(WITH_ASCEND)
+if(WITH_ASCEND OR WITH_ASCEND_CL)
   SET(THREADPOOL_REPOSITORY https://gitee.com/tianjianhe/ThreadPool.git)
 else()
   SET(THREADPOOL_REPOSITORY ${GIT_URL}/progschj/ThreadPool.git)
...
@@ -43,7 +43,7 @@ cache_third_party(extern_warpctc
     TAG       ${WARPCTC_TAG}
     DIR       WARPCTC_SOURCE_DIR)

-if(WITH_ASCEND)
+if(WITH_ASCEND OR WITH_ASCEND_CL)
   ExternalProject_Add(
     extern_warpctc
     ${EXTERNAL_PROJECT_LOG_ARGS}
...
@@ -135,6 +135,7 @@ void TensorFromArray(const T* src, const size_t& array_size,
   }
 #endif
 }
+
 template <typename T>
 void TensorFromVector(const std::vector<T>& src,
                       const platform::DeviceContext& ctx, Tensor* dst) {
@@ -167,6 +168,49 @@ void TensorFromVector(const std::vector<T>& src,
 #endif
 }

+// The fully specialized function should be inline to avoid
+// multi-definition.
+template <>
+inline void TensorFromVector(const std::vector<bool>& src,
+                             const platform::DeviceContext& ctx, Tensor* dst) {
+  // vector<bool> has no data() member, use array instead.
+  // See details:
+  // https://stackoverflow.com/questions/46115669/why-does-stdvectorbool-have-no-data/46115714
+  bool* array = new bool[src.size()];
+  for (unsigned int i = 0; i < src.size(); i++) {
+    array[i] = static_cast<bool>(src[i]);
+  }
+
+  auto dst_place = ctx.GetPlace();
+  auto src_ptr = static_cast<const void*>(array);
+  platform::CPUPlace src_place;
+  dst->Resize({static_cast<int64_t>(src.size())});
+  auto dst_ptr = static_cast<void*>(dst->mutable_data<bool>(dst_place));
+  auto size = src.size() * sizeof(bool);
+
+  if (platform::is_cpu_place(dst_place)) {
+    memory::Copy(BOOST_GET_CONST(platform::CPUPlace, dst_place), dst_ptr,
+                 src_place, src_ptr, size);
+  }
+#ifdef PADDLE_WITH_CUDA
+  else if (platform::is_gpu_place(dst_place)) {  // NOLINT
+    memory::Copy(
+        BOOST_GET_CONST(platform::CUDAPlace, dst_place), dst_ptr, src_place,
+        src_ptr, size,
+        reinterpret_cast<const platform::CUDADeviceContext&>(ctx).stream());
+  }
+#endif
+#ifdef PADDLE_WITH_ASCEND_CL
+  else if (platform::is_npu_place(dst_place)) {  // NOLINT
+    memory::Copy(
+        BOOST_GET_CONST(platform::NPUPlace, dst_place), dst_ptr, src_place,
+        src_ptr, size,
+        reinterpret_cast<const platform::NPUDeviceContext&>(ctx).stream());
+  }
+#endif
+  delete[] array;
+}
+
 template <typename T>
 void TensorFromVector(const std::vector<T>& src, Tensor* dst) {
   platform::CPUPlace dst_place = platform::CPUPlace();
@@ -179,6 +223,23 @@ void TensorFromVector(const std::vector<T>& src, Tensor* dst) {
   memory::Copy(dst_place, dst_ptr, src_place, src_ptr, size);
 }

+template <>
+inline void TensorFromVector(const std::vector<bool>& src, Tensor* dst) {
+  bool* array = new bool[src.size()];
+  for (unsigned int i = 0; i < src.size(); i++) {
+    array[i] = static_cast<bool>(src[i]);
+  }
+  platform::CPUPlace dst_place = platform::CPUPlace();
+  auto src_ptr = static_cast<const void*>(array);
+  platform::CPUPlace src_place;
+  dst->Resize({static_cast<int64_t>(src.size())});
+  auto dst_ptr = static_cast<void*>(dst->mutable_data<bool>(dst_place));
+  auto size = src.size() * sizeof(bool);
+  memory::Copy(dst_place, dst_ptr, src_place, src_ptr, size);
+  delete[] array;
+}
+
 template <typename T>
 void TensorToVector(const Tensor& src, const platform::DeviceContext& ctx,
                     std::vector<T>* dst) {
@@ -212,6 +273,46 @@ void TensorToVector(const Tensor& src, const platform::DeviceContext& ctx,
 #endif
 }

+template <>
+inline void TensorToVector(const Tensor& src,
+                           const platform::DeviceContext& ctx,
+                           std::vector<bool>* dst) {
+  auto src_ptr = static_cast<const void*>(src.data<bool>());
+  auto size = src.numel() * sizeof(bool);
+
+  bool* array = new bool[src.numel()];
+
+  platform::CPUPlace dst_place;
+  dst->resize(src.numel());
+  auto dst_ptr = static_cast<void*>(array);
+
+  if (platform::is_cpu_place(src.place())) {
+    memory::Copy(dst_place, dst_ptr,
+                 BOOST_GET_CONST(platform::CPUPlace, src.place()), src_ptr,
+                 size);
+  }
+#ifdef PADDLE_WITH_CUDA
+  else if (platform::is_gpu_place(src.place())) {  // NOLINT
+    memory::Copy(
+        dst_place, dst_ptr, BOOST_GET_CONST(platform::CUDAPlace, src.place()),
+        src_ptr, size,
+        reinterpret_cast<const platform::CUDADeviceContext&>(ctx).stream());
+  }
+#endif
+#ifdef PADDLE_WITH_ASCEND_CL
+  else if (platform::is_npu_place(src.place())) {  // NOLINT
+    memory::Copy(
+        dst_place, dst_ptr, BOOST_GET_CONST(platform::NPUPlace, src.place()),
+        src_ptr, size,
+        reinterpret_cast<const platform::NPUDeviceContext&>(ctx).stream());
+  }
+#endif
+  for (unsigned int i = 0; i < src.numel(); i++) {
+    (*dst)[i] = static_cast<bool>(array[i]);
+  }
+  delete[] array;
+}
+
 template <typename T>
 void TensorToVector(const Tensor& src, std::vector<T>* dst) {
   auto src_ptr = static_cast<const void*>(src.data<T>());
@@ -231,6 +332,32 @@ void TensorToVector(const Tensor& src, std::vector<T>* dst) {
                BOOST_GET_CONST(platform::CPUPlace, src.place()), src_ptr, size);
 }

+template <>
+inline void TensorToVector(const Tensor& src, std::vector<bool>* dst) {
+  auto src_ptr = static_cast<const void*>(src.data<bool>());
+  auto size = src.numel() * sizeof(bool);
+
+  bool* array = new bool[src.numel()];
+
+  platform::CPUPlace dst_place;
+  dst->resize(src.numel());
+  auto dst_ptr = static_cast<void*>(array);
+
+  PADDLE_ENFORCE_EQ(
+      platform::is_cpu_place(src.place()), true,
+      platform::errors::InvalidArgument(
+          "The input tensor should be CPU device, but actually it is in %s.",
+          src.place()));
+
+  memory::Copy(dst_place, dst_ptr,
+               BOOST_GET_CONST(platform::CPUPlace, src.place()), src_ptr, size);
+
+  for (unsigned int i = 0; i < src.numel(); i++) {
+    (*dst)[i] = static_cast<bool>(array[i]);
+  }
+  delete[] array;
+}
+
 std::ostream& operator<<(std::ostream& os, const Tensor& t);
 }  // namespace framework
 }  // namespace paddle
@@ -242,6 +242,61 @@ TEST(TensorToVector, Tensor) {
 #endif
 }

+TEST(TensorToVector, Tensor_bool) {
+  {
+    paddle::framework::Tensor src;
+    bool* src_ptr = src.mutable_data<bool>({3, 3}, paddle::platform::CPUPlace());
+    for (int i = 0; i < 3 * 3; ++i) {
+      src_ptr[i] = static_cast<bool>(i % 2);
+    }
+
+    paddle::platform::CPUPlace place;
+    std::vector<bool> dst;
+    paddle::framework::TensorToVector<bool>(src, &dst);
+
+    for (int i = 0; i < 3 * 3; ++i) {
+      EXPECT_EQ(src_ptr[i], dst[i]);
+    }
+  }
+#ifdef PADDLE_WITH_CUDA
+  {
+    std::vector<bool> src_vec = {
+        false, true, false, true, false, true, false, true, false,
+    };
+    paddle::framework::Tensor gpu_tensor;
+    paddle::platform::CUDAPlace place;
+    paddle::platform::CUDADeviceContext gpu_ctx(place);
+    paddle::framework::TensorFromVector<bool>(src_vec, gpu_ctx, &gpu_tensor);
+
+    std::vector<bool> dst;
+    paddle::framework::TensorToVector<bool>(gpu_tensor, gpu_ctx, &dst);
+
+    for (int i = 0; i < 3 * 3; ++i) {
+      EXPECT_EQ(src_vec[i], dst[i]);
+    }
+  }
+#endif
+#ifdef PADDLE_WITH_ASCEND_CL
+  {
+    std::vector<bool> src_vec = {
+        false, true, false, true, false, true, false, true, false,
+    };
+    paddle::framework::Tensor npu_tensor;
+    paddle::platform::NPUPlace place(0);
+    paddle::platform::NPUDeviceContext npu_ctx(place);
+    paddle::framework::TensorFromVector<bool>(src_vec, npu_ctx, &npu_tensor);
+
+    std::vector<bool> dst;
+    paddle::framework::TensorToVector<bool>(npu_tensor, npu_ctx, &dst);
+
+    for (int i = 0; i < 3 * 3; ++i) {
+      EXPECT_EQ(src_vec[i], dst[i]);
+    }
+  }
+#endif
+}
+
 TEST(TensorFromDLPack, Tensor) {
   {
     std::vector<int> src_vec = {1, 2, 3, 4, 5, 6, 7, 8, 9};
...
@@ -45,6 +45,17 @@ using Attribute = boost::variant<
 using AttributeMap = std::unordered_map<std::string, Attribute>;

+#ifdef PADDLE_WITH_ASCEND_CL
+using NPUAttribute =
+    boost::variant<boost::blank, int, float, std::string, std::vector<int>,
+                   std::vector<float>, std::vector<std::string>, bool,
+                   std::vector<bool>, BlockDesc*, int64_t,
+                   std::vector<BlockDesc*>, std::vector<int64_t>,
+                   std::vector<double>, std::vector<std::vector<int64_t>>>;
+
+using NPUAttributeMap = std::unordered_map<std::string, NPUAttribute>;
+#endif
+
 using OpCreator = std::function<OperatorBase*(
     const std::string& /*type*/, const VariableNameMap& /*inputs*/,
     const VariableNameMap& /*outputs*/, const AttributeMap& /*attrs*/)>;
...
@@ -206,8 +206,16 @@ void Copy<platform::NPUPlace, platform::CPUPlace>(platform::NPUPlace dst_place,
   if (UNLIKELY(num == 0)) return;
   platform::SetNPUDeviceId(dst_place.device);

+  // NOTE(ascendrc): NPU memcpy async from host to device is a "real" async,
+  // which is different from CUDA. In Paddle, when async is called, "sync"
+  // is actually run, which means Paddle doesn't fully support async yet.
+  // TODO(ascendrc): Support NPU memcpy async for better performance.
+  stream = nullptr;
+
   VLOG(4) << "memory::Copy " << num << " Bytes from " << src_place << " to "
           << dst_place << " by stream(" << stream << ")";

   if (stream) {
     platform::RecordEvent record_event("NpuMemcpyAsync:CPU->NPU");
     platform::NPUMemcpyAsync(dst, src, num, ACL_MEMCPY_HOST_TO_DEVICE, stream);
@@ -226,8 +234,16 @@ void Copy<platform::CPUPlace, platform::NPUPlace>(platform::CPUPlace dst_place,
   if (UNLIKELY(num == 0)) return;
   platform::SetNPUDeviceId(src_place.device);

+  // NOTE(ascendrc): NPU memcpy async from device to host is a "real" async,
+  // which is different from CUDA. In Paddle, when async is called, "sync"
+  // is actually run, which means Paddle doesn't fully support async yet.
+  // TODO(ascendrc): Support NPU memcpy async for better performance.
+  stream = nullptr;
+
   VLOG(4) << "memory::Copy " << num << " Bytes from " << src_place << " to "
           << dst_place << " by stream(" << stream << ")";

   if (stream) {
     platform::RecordEvent record_event("NpuMemcpyAsync:NPU->CPU");
     platform::NPUMemcpyAsync(dst, src, num, ACL_MEMCPY_DEVICE_TO_HOST, stream);
...
@@ -124,6 +124,7 @@ if (WITH_ASCEND)
 endif()

 if (WITH_ASCEND_CL)
+    cc_test(assign_op_npu_test SRCS assign_op_npu_test.cc DEPS assign_op)
     cc_library(npu_op_runner SRCS npu_op_runner.cc DEPS operator npu_info)
     set(COMMON_OP_DEPS ${COMMON_OP_DEPS} npu_op_runner)
 endif()
@@ -141,8 +142,8 @@ set(OPERATOR_DEPS ${OPERATOR_DEPS} ${COMMON_OP_DEPS})
 set(GLOB_OPERATOR_DEPS ${OPERATOR_DEPS} CACHE INTERNAL "Global Op dependencies")

 cc_test(test_common_infer_shape_functions SRCS test_common_infer_shape_functions.cc DEPS common_infer_shape_functions ${COMMON_OP_DEPS} activation_op elementwise_add_op softmax_op softmax)
-cc_test(assign_op_test SRCS assign_op_test.cc DEPS assign_op)
 cc_test(gather_test SRCS gather_test.cc DEPS tensor)
+cc_test(assign_op_test SRCS assign_op_test.cc DEPS assign_op)
 cc_test(scatter_test SRCS scatter_test.cc DEPS tensor math_function)
 cc_test(beam_search_decode_op_test SRCS beam_search_decode_op_test.cc DEPS lod_tensor)
 cc_test(strided_memcpy_test SRCS strided_memcpy_test.cc DEPS tensor memory)
@@ -163,10 +164,19 @@ if (WITH_PYTHON)
   cc_library(py_func_op SRCS py_func_op.cc DEPS op_registry python pybind)
 endif()

+if (WITH_ASCEND_CL)
+    cc_test(range_op_npu_test SRCS range_op_npu_test.cc DEPS op_registry range_op scope device_context enforce executor)
+    cc_test(lookup_table_v2_op_npu_test SRCS lookup_table_v2_op_npu_test.cc DEPS op_registry lookup_table_v2_op scope device_context enforce executor compare_op)
+endif()
+
 set(GLOB_OP_LIB ${OP_LIBRARY} CACHE INTERNAL "Global OP library")

 add_subdirectory(benchmark)

 cc_test(op_debug_string_test SRCS op_debug_string_test.cc DEPS elementwise_add_op)

+if (WITH_ASCEND_CL)
+    cc_test(transpose_op_npu_test SRCS transpose_op_npu_test.cc DEPS op_registry transpose_op scope device_context enforce executor)
+endif()
+
 if(WITH_MKLDNN)
   include(mkldnn/inplace_op_tests.cmake)
@@ -180,3 +190,7 @@ if(WITH_UNITY_BUILD)
   # The specified link dependency needs to be displayed here.
   target_link_libraries(paddle_operators_unity ${OP_HEADER_DEPS} ${COMMON_OP_DEPS})
 endif()
+
+if(WITH_ASCEND_CL)
+    cc_test(gelu_op_npu_test SRCS gelu_op_npu_test.cc DEPS op_registry gelu_op scope device_context enforce executor)
+endif()
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <memory>
#include <string>
#include "paddle/fluid/framework/ddim.h"
#include "paddle/fluid/framework/tensor_util.h"
#include "paddle/fluid/operators/activation_op.h"
#include "paddle/fluid/operators/npu_op_runner.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename DeviceContext, typename T>
class PowNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* out = ctx.Output<Tensor>("Out");
auto factor = ctx.Attr<float>("factor");
out->mutable_data<T>(ctx.GetPlace());
auto runner = NpuOpRunner("Power", {*x}, {*out},
{{"power", factor},
{"scale", static_cast<float>(1.0)},
{"shift", static_cast<float>(0.0)}});
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
runner.Run(stream);
}
};
template <typename DeviceContext, typename T>
class PowGradNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* dout = ctx.Input<Tensor>(framework::GradVarName("Out"));
auto* dx = ctx.Output<Tensor>(framework::GradVarName("X"));
auto factor = ctx.Attr<float>("factor");
auto x_dims = x->dims();
auto place = ctx.GetPlace();
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
// NOTE(liym27): dx = dout * factor * x.pow(factor-1)
// Step1: Compute x_pow = x.pow(factor-1)
Tensor x_pow(x->type());
x_pow.mutable_data<T>(x->dims(), place);
auto runner_pow = NpuOpRunner("Power", {*x}, {x_pow},
{{"power", factor - static_cast<float>(1)}});
runner_pow.Run(stream);
// Step 2: Construct a broadcast factor, which has the same shape with x.
// 2.1 Get a factor tensor with shape [1].
Tensor factor_tensor(framework::proto::VarType::FP32);
factor_tensor.mutable_data<float>({1}, place);
TensorFromVector(std::vector<float>{factor}, ctx.device_context(),
&factor_tensor);
// 2.2 Get the factor which has the shape with x and the same value with
// factor.
Tensor factor_bc_tensor(framework::proto::VarType::FP32);
factor_bc_tensor.mutable_data<float>(x_dims, place);
auto runner_bc = NpuOpRunner("FillD", {factor_tensor}, {factor_bc_tensor},
{{"dims", framework::vectorize(x_dims)}});
runner_bc.Run(stream);
// Step 3: Compute x_power_mul_factor = factor * x.pow(factor-1)
Tensor x_power_mul_factor(x->type());
x_power_mul_factor.mutable_data<T>(x->dims(), place);
auto runner_mul_1 =
NpuOpRunner("Mul", {factor_bc_tensor, x_pow}, {x_power_mul_factor}, {});
runner_mul_1.Run(stream);
// Step 4: Compute dx = dout * factor * x.pow(factor-1)
dx->mutable_data<T>(place);
auto runner_mul_2 =
NpuOpRunner("Mul", {*dout, x_power_mul_factor}, {*dx}, {});
runner_mul_2.Run(stream);
}
};
template <typename DeviceContext, typename T>
class ReluNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* out = ctx.Output<Tensor>("Out");
out->mutable_data<T>(ctx.GetPlace());
auto runner = NpuOpRunner("Relu",
{
*x,
},
{*out}, {});
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
runner.Run(stream);
}
};
template <typename DeviceContext, typename T>
class ReluGradNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* out = ctx.Input<Tensor>("Out");
auto* dout = ctx.Input<Tensor>(framework::GradVarName("Out"));
auto* dx = ctx.Output<Tensor>(framework::GradVarName("X"));
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
dx->mutable_data<T>(ctx.GetPlace());
auto runner = NpuOpRunner("ReluGrad", {*dout, *out}, {*dx}, {});
runner.Run(stream);
}
};
template <typename DeviceContext, typename T>
class SqrtNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* out = ctx.Output<Tensor>("Out");
auto place = ctx.GetPlace();
out->mutable_data<T>(place);
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
auto runner = NpuOpRunner("Sqrt", {*x}, {*out}, {});
runner.Run(stream);
}
};
template <typename DeviceContext, typename T>
class SqrtGradNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* out = ctx.Input<Tensor>("Out");
auto* dout = ctx.Input<Tensor>(framework::GradVarName("Out"));
auto* dx = ctx.Output<Tensor>(framework::GradVarName("X"));
auto place = ctx.GetPlace();
dx->mutable_data<T>(place);
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
auto dx_runner = NpuOpRunner("SqrtGrad", {*out, *dout}, {*dx}, {});
dx_runner.Run(stream);
}
};
template <typename DeviceContext, typename T>
class LogNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* out = ctx.Output<Tensor>("Out");
auto place = ctx.GetPlace();
out->mutable_data<T>(place);
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
    // log(x) is computed as Log1p(x - 1): build a tensor of ones, subtract it
    // from x, then apply the Log1p op, since log(x) == log1p(x - 1).
    Tensor one(x->type());
one.mutable_data<T>(x->dims(), place);
auto one_runner = NpuOpRunner("OnesLike", {*x}, {one}, {});
one_runner.Run(stream);
Tensor sub(x->type());
sub.mutable_data<T>(x->dims(), place);
auto sub_runner = NpuOpRunner("Sub", {*x, one}, {sub}, {});
sub_runner.Run(stream);
auto out_runner = NpuOpRunner("Log1p", {sub}, {*out}, {});
out_runner.Run(stream);
}
};
template <typename DeviceContext, typename T>
class LogGradNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* dout = ctx.Input<Tensor>(framework::GradVarName("Out"));
auto* x = ctx.Input<Tensor>("X");
auto* dx = ctx.Output<Tensor>(framework::GradVarName("X"));
auto place = ctx.GetPlace();
dx->mutable_data<T>(place);
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
// dx = dout / x; DivNoNan returns 0 where x == 0.
auto runner = NpuOpRunner("DivNoNan", {*dout, *x}, {*dx}, {});
runner.Run(stream);
}
};
template <typename DeviceContext, typename T>
class TanhNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* out = ctx.Output<Tensor>("Out");
auto place = ctx.GetPlace();
out->mutable_data<T>(place);
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
auto runner = NpuOpRunner("Tanh", {*x}, {*out}, {});
runner.Run(stream);
}
};
template <typename DeviceContext, typename T>
class TanhGradNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* dout = ctx.Input<Tensor>(framework::GradVarName("Out"));
auto* out = ctx.Input<Tensor>("Out");
auto* dx = ctx.Output<Tensor>(framework::GradVarName("X"));
auto place = ctx.GetPlace();
dx->mutable_data<T>(place);
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
auto dx_runner = NpuOpRunner("TanhGrad", {*out, *dout}, {*dx}, {});
dx_runner.Run(stream);
}
};
template <typename DeviceContext, typename T>
class SquareNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* out = ctx.Output<Tensor>("Out");
auto place = ctx.GetPlace();
out->mutable_data<T>(place);
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
auto runner = NpuOpRunner("Square", {*x}, {*out}, {});
runner.Run(stream);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
pow, ops::PowNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::PowNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
REGISTER_OP_NPU_KERNEL(
pow_grad, ops::PowGradNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::PowGradNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
REGISTER_OP_NPU_KERNEL(
relu, ops::ReluNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::ReluNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
REGISTER_OP_NPU_KERNEL(
relu_grad,
ops::ReluGradNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::ReluGradNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
REGISTER_OP_NPU_KERNEL(
sqrt, ops::SqrtNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::SqrtNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
REGISTER_OP_NPU_KERNEL(
sqrt_grad,
ops::SqrtGradNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::SqrtGradNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
REGISTER_OP_NPU_KERNEL(
log, ops::LogNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::LogNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
REGISTER_OP_NPU_KERNEL(
log_grad, ops::LogGradNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::LogGradNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
REGISTER_OP_NPU_KERNEL(
tanh, ops::TanhNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::TanhNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
REGISTER_OP_NPU_KERNEL(
tanh_grad,
ops::TanhGradNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::TanhGradNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
REGISTER_OP_NPU_KERNEL(
square, ops::SquareNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::SquareNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>,
ops::SquareNPUKernel<paddle::platform::NPUDeviceContext, int>);
@@ -4,3 +4,7 @@ if(WITH_UNITY_BUILD)
include(unity_build_rule.cmake)
endif()
register_operators()
if(WITH_ASCEND_CL)
cc_test(check_finite_and_unscale_op_npu_test SRCS check_finite_and_unscale_op_npu_test.cc DEPS op_registry check_finite_and_unscale_op scope device_context enforce executor)
endif()
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <memory>
#include <string>
#include "paddle/fluid/framework/tensor_util.h"
#include "paddle/fluid/operators/amp/check_finite_and_unscale_op.h"
#include "paddle/fluid/operators/npu_op_runner.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename T>
class CheckFiniteAndUnscaleNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const {
const auto xs = ctx.MultiInput<framework::Tensor>("X");
const auto* scale = ctx.Input<framework::Tensor>("Scale");
auto outs = ctx.MultiOutput<framework::Tensor>("Out");
auto* found_inf = ctx.Output<framework::Tensor>("FoundInfinite");
found_inf->mutable_data<bool>(ctx.GetPlace());
bool found_inf_data = false;
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
// step 1: compute the inverse scale, i.e. 1.0 / scale (Div)
Tensor const_tensor;
const_tensor.mutable_data<T>({1}, ctx.GetPlace());
TensorFromVector(std::vector<T>{static_cast<T>(1.0)}, ctx.device_context(),
&const_tensor);
ctx.template device_context<paddle::platform::NPUDeviceContext>().Wait();
Tensor inverse_out(scale->type());
inverse_out.Resize(scale->dims());
inverse_out.mutable_data<T>(ctx.GetPlace());
auto runner_inverse =
NpuOpRunner("Div", {const_tensor, *scale}, {inverse_out}, {});
runner_inverse.Run(stream);
Tensor* tmp_inverse_out = &inverse_out;
size_t x_size = xs.size();
for (size_t i = 0; i < x_size; ++i) {
const auto* x = xs[i];
auto* out = outs[i];
out->mutable_data<T>(ctx.GetPlace());
// step2: CheckNumerics
// CheckNumerics runs on the Ascend AI CPU, which delivers poor
// performance.
Tensor check_xout(x->type());
check_xout.Resize(x->dims());
check_xout.mutable_data<T>(ctx.GetPlace());
try {
auto runner_checknumerics =
NpuOpRunner("CheckNumerics", {*x}, {check_xout},
{{"message", std::string("check_nan_and_inf")}});
runner_checknumerics.Run(stream);
} catch (platform::EnforceNotMet& exception) {
LOG(WARNING) << "[check_nan_and_inf] detected contains NaN or INF!!!";
found_inf_data = true;
} catch (...) {
LOG(WARNING) << "[check_nan_and_inf] detected contains NaN or INF!!!";
found_inf_data = true;
}
if (!found_inf_data) {
// out = x * (1.0 / scale)
auto runner_mul =
NpuOpRunner("Mul", {*x, *tmp_inverse_out}, {*out}, {});
runner_mul.Run(stream);
} else {
// ZerosLike
auto runner_zeroslike = NpuOpRunner("ZerosLike", {*x}, {*out}, {});
runner_zeroslike.Run(stream);
} // end if
} // end for
// write the detection result to the FoundInfinite output
Tensor found_inf_tensor;
found_inf_tensor.Resize({1});
bool* is_found_inf =
found_inf_tensor.mutable_data<bool>(paddle::platform::CPUPlace());
*is_found_inf = found_inf_data;
framework::TensorCopySync(found_inf_tensor, ctx.GetPlace(), found_inf);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
namespace plat = paddle::platform;
REGISTER_OP_NPU_KERNEL(check_finite_and_unscale,
ops::CheckFiniteAndUnscaleNPUKernel<float>,
ops::CheckFiniteAndUnscaleNPUKernel<plat::float16>);
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifndef _WIN32
#include <unistd.h>
#endif
#include <algorithm>
#include <cstdlib>
#include <memory>
#include <random>
#include "gtest/gtest.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/platform/enforce.h"
namespace f = paddle::framework;
namespace p = paddle::platform;
namespace m = paddle::operators::math;
using Tensor = paddle::framework::Tensor;
USE_OP(check_finite_and_unscale);
USE_OP_DEVICE_KERNEL(check_finite_and_unscale, NPU);
struct InputVars {
std::string name;
f::LoDTensor *tensor;
};
template <typename T>
void Compare(f::Scope *scope, const p::DeviceContext &ctx) {
const f::DDim dims = f::make_ddim({2, 2});
auto place = ctx.GetPlace();
// init input
std::vector<InputVars> input_names = {
{"x", scope->Var("x")->GetMutable<f::LoDTensor>()},
{"x1", scope->Var("x1")->GetMutable<f::LoDTensor>()}};
auto *scale = scope->Var("scale")->GetMutable<f::LoDTensor>();
// init output
auto *out = scope->Var("out")->GetMutable<f::LoDTensor>();
auto *out1 = scope->Var("out1")->GetMutable<f::LoDTensor>();
auto *found_inf = scope->Var("found_inf")->GetMutable<f::LoDTensor>();
// Initialize input data
const int num_inputs = input_names.size();
size_t numel = static_cast<size_t>(f::product(dims));
for (int i = 0; i < num_inputs; ++i) {
std::vector<T> init_xs;
for (size_t j = 0; j < numel; ++j) {
if (j == 0) {
init_xs.push_back(static_cast<T>(NAN));
} else {
init_xs.push_back(static_cast<T>(j + 1));
}
}
f::TensorFromVector(init_xs, ctx, input_names[i].tensor);
input_names[i].tensor->Resize(dims);
}
f::TensorFromVector(std::vector<T>{static_cast<T>(0.5)}, ctx, scale);
ctx.Wait();
// run
f::AttributeMap attrs;
auto op = f::OpRegistry::CreateOp(
"check_finite_and_unscale", {{"X", {"x", "x1"}}, {"Scale", {"scale"}}},
{{"Out", {"out", "out1"}}, {"FoundInfinite", {"found_inf"}}}, attrs);
op->Run(*scope, place);
ctx.Wait();
// out0
std::vector<T> out_vec;
f::TensorToVector(*out, ctx, &out_vec);
EXPECT_EQ(out_vec.size(), static_cast<size_t>(4));
for (size_t j = 0; j < out_vec.size(); ++j) {
VLOG(3) << "out_vec[" << j << "]:" << out_vec[j];
}
ctx.Wait();
// out1
std::vector<T> out1_vec;
f::TensorToVector(*out1, ctx, &out1_vec);
EXPECT_EQ(out1_vec.size(), static_cast<size_t>(4));
for (size_t j = 0; j < out1_vec.size(); ++j) {
VLOG(3) << "out1_vec[" << j << "]:" << out1_vec[j];
}
ctx.Wait();
// check found_inf: the inputs contain NaN, so it is expected to be true
Tensor found_inf_tensor;
found_inf_tensor.Resize({1});
f::TensorCopySync(*found_inf, p::CPUPlace(), &found_inf_tensor);
EXPECT_TRUE(*found_inf_tensor.data<bool>());
ctx.Wait();
}
TEST(check_finite_and_unscale, NPU_fp32) {
f::Scope scope;
p::NPUDeviceContext ctx(p::NPUPlace(0));
Compare<float>(&scope, ctx);
}
TEST(check_finite_and_unscale, NPU_fp16) {
f::Scope scope;
p::NPUDeviceContext ctx(p::NPUPlace(0));
Compare<p::float16>(&scope, ctx);
}
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/operators/amp/update_loss_scaling_op.h"
#include <cmath>
#include <vector>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/operators/npu_op_runner.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename T>
void Update(const platform::NPUDeviceContext& ctx,
const std::vector<bool> found_inf_vec,
const Tensor* pre_loss_scaling_tensor, const Tensor* good_in_tensor,
const Tensor* bad_in_tensor, const int incr_every_n_steps,
const int decr_every_n_nan_or_inf, const float incr_ratio,
const float decr_ratio, Tensor* updated_loss_scaling_tensor,
Tensor* good_out_tensor, Tensor* bad_out_tensor) {
auto place = ctx.GetPlace();
auto stream = ctx.stream();
if (found_inf_vec[0]) {
// good_out_data = 0
auto g = good_out_tensor->mutable_data<int>(place);
platform::NPUMemsetAsync(static_cast<void*>(g), 0,
good_out_tensor->numel() * sizeof(int), stream);
// bad_out_data = bad_in_data + 1
Tensor factor_tensor(bad_out_tensor->type());
factor_tensor.mutable_data<int>({1}, place);
TensorFromVector(std::vector<int>{1}, ctx, &factor_tensor);
auto runner_p2 = NpuOpRunner("Add", {*bad_in_tensor, factor_tensor},
{*bad_out_tensor}, {});
runner_p2.Run(stream);
std::vector<int> bad_out_data;
TensorToVector(*bad_out_tensor, ctx, &bad_out_data);
if (bad_out_data[0] == decr_every_n_nan_or_inf) {
// The Power op computes (scale * x + shift)^power; with power = 1 and
// shift = 0 it multiplies the loss scaling by decr_ratio.
auto runner_p3 = NpuOpRunner("Power", {*pre_loss_scaling_tensor},
{*updated_loss_scaling_tensor},
{{"power", static_cast<float>(1)},
{"scale", decr_ratio},
{"shift", static_cast<float>(0)}});
runner_p3.Run(stream);
std::vector<T> new_loss_scaling;
TensorToVector(*updated_loss_scaling_tensor, ctx, &new_loss_scaling);
if (new_loss_scaling[0] < static_cast<T>(1)) {
// updated_loss_scaling_data = 1
auto runner_p4 = NpuOpRunner("Power", {*pre_loss_scaling_tensor},
{*updated_loss_scaling_tensor},
{{"power", static_cast<float>(1)},
{"scale", static_cast<float>(0)},
{"shift", static_cast<float>(1)}});
runner_p4.Run(stream);
}
// bad_out_data = 0
auto b = bad_out_tensor->mutable_data<int>(place);
platform::NPUMemsetAsync(static_cast<void*>(b), 0,
bad_out_tensor->numel() * sizeof(int), stream);
}
} else {
// bad_out_data = 0
auto b = bad_out_tensor->mutable_data<int>(place);
platform::NPUMemsetAsync(static_cast<void*>(b), 0,
bad_out_tensor->numel() * sizeof(int), stream);
// good_out_data = good_in_data + 1
Tensor factor_tensor(good_out_tensor->type());
factor_tensor.mutable_data<int>({1}, place);
TensorFromVector(std::vector<int>{1}, ctx, &factor_tensor);
auto runner_p2 = NpuOpRunner("Add", {*good_in_tensor, factor_tensor},
{*good_out_tensor}, {});
runner_p2.Run(stream);
std::vector<int> good_out_data;
TensorToVector(*good_out_tensor, ctx, &good_out_data);
if (good_out_data[0] == incr_every_n_steps) {
auto runner_p3 = NpuOpRunner("Power", {*pre_loss_scaling_tensor},
{*updated_loss_scaling_tensor},
{{"power", static_cast<float>(1)},
{"scale", incr_ratio},
{"shift", static_cast<float>(0)}});
runner_p3.Run(stream);
std::vector<T> new_loss_scaling;
TensorToVector(*updated_loss_scaling_tensor, ctx, &new_loss_scaling);
if (!std::isfinite(new_loss_scaling[0])) {
// updated_loss_scaling_data = pre_loss_scaling_data
auto runner_p4 = NpuOpRunner("Power", {*pre_loss_scaling_tensor},
{*updated_loss_scaling_tensor},
{{"power", static_cast<float>(1)},
{"scale", static_cast<float>(1)},
{"shift", static_cast<float>(0)}});
runner_p4.Run(stream);
}
// good_out_data = 0
auto g = good_out_tensor->mutable_data<int>(place);
platform::NPUMemsetAsync(static_cast<void*>(g), 0,
good_out_tensor->numel() * sizeof(int), stream);
}
}
}
template <typename T>
class UpdateLossScalingFunctor<platform::NPUDeviceContext, T> {
public:
void operator()(const platform::NPUDeviceContext& dev_ctx,
const std::vector<bool> found_inf_vec,
const Tensor* pre_loss_scaling_tensor,
const Tensor* good_in_tensor, const Tensor* bad_in_tensor,
const int incr_every_n_steps,
const int decr_every_n_nan_or_inf, const float incr_ratio,
const float decr_ratio, Tensor* updated_loss_scaling_tensor,
Tensor* good_out_tensor, Tensor* bad_out_tensor) const {
Update<T>(dev_ctx, found_inf_vec, pre_loss_scaling_tensor, good_in_tensor,
bad_in_tensor, incr_every_n_steps, decr_every_n_nan_or_inf,
incr_ratio, decr_ratio, updated_loss_scaling_tensor,
good_out_tensor, bad_out_tensor);
}
};
template <typename T>
class LazyZerosNPU {
public:
void operator()(const platform::NPUDeviceContext& dev_ctx,
const std::vector<bool> found_inf_vec,
const std::vector<const framework::Tensor*>& xs,
const std::vector<framework::Tensor*>& outs) const {
for (size_t i = 0; i < xs.size(); ++i) {
auto* out = outs[i];
if (found_inf_vec[0]) {
VLOG(4) << "-- UpdateLossScaling: Find infinite grads. --";
auto place = dev_ctx.GetPlace();
auto stream = dev_ctx.stream();
auto g = out->mutable_data<T>(place);
platform::NPUMemsetAsync(static_cast<void*>(g), 0,
out->numel() * sizeof(T), stream);
}
}
}
};
template <typename DeviceContext, typename T>
class UpdateLossScalingNPUKernel : public framework::OpKernel<T> {
using MPDType = typename details::MPTypeTrait<T>::Type;
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto& dev_ctx = ctx.template device_context<DeviceContext>();
const auto xs = ctx.MultiInput<framework::Tensor>("X");
auto outs = ctx.MultiOutput<framework::Tensor>("Out");
const auto* found_inf = ctx.Input<Tensor>("FoundInfinite");
PADDLE_ENFORCE_EQ(found_inf->numel(), 1,
platform::errors::InvalidArgument(
"FoundInfinite must have only one element."));
std::vector<bool> found_inf_vec;
TensorToVector(*found_inf, ctx.device_context(), &found_inf_vec);
LazyZerosNPU<T>{}(dev_ctx, found_inf_vec, xs, outs);
const bool stop_update = ctx.Attr<bool>("stop_update");
if (stop_update) {
return;
}
const auto* pre_loss_scaling = ctx.Input<Tensor>("PrevLossScaling");
const auto* good_in = ctx.Input<Tensor>("InGoodSteps");
const auto* bad_in = ctx.Input<Tensor>("InBadSteps");
auto* updated_loss_scaling = ctx.Output<Tensor>("LossScaling");
auto* good_out = ctx.Output<Tensor>("OutGoodSteps");
auto* bad_out = ctx.Output<Tensor>("OutBadSteps");
updated_loss_scaling->mutable_data<MPDType>(dev_ctx.GetPlace());
good_out->mutable_data<int>(dev_ctx.GetPlace());
bad_out->mutable_data<int>(dev_ctx.GetPlace());
const int incr_every_n_steps = ctx.Attr<int>("incr_every_n_steps");
const int decr_every_n_nan_or_inf =
ctx.Attr<int>("decr_every_n_nan_or_inf");
const float incr_ratio = ctx.Attr<float>("incr_ratio");
const float decr_ratio = ctx.Attr<float>("decr_ratio");
UpdateLossScalingFunctor<DeviceContext, MPDType>{}(
dev_ctx, found_inf_vec, pre_loss_scaling, good_in, bad_in,
incr_every_n_steps, decr_every_n_nan_or_inf, incr_ratio, decr_ratio,
updated_loss_scaling, good_out, bad_out);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
update_loss_scaling,
ops::UpdateLossScalingNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::UpdateLossScalingNPUKernel<paddle::platform::NPUDeviceContext,
double>);
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <string>
#include "paddle/fluid/operators/assign_op.h"
#include "paddle/fluid/operators/npu_op_runner.h"
#include "paddle/fluid/platform/float16.h"
namespace paddle {
namespace framework {
class OpDesc;
class Variable;
} // namespace framework
namespace imperative {
class OpBase;
} // namespace imperative
namespace platform {
struct CPUPlace;
struct CUDAPlace;
struct float16;
} // namespace platform
} // namespace paddle
namespace paddle {
namespace operators {
template <typename DeviceContext, typename T>
class AssignNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<framework::LoDTensor>("X");
auto* out = ctx.Output<framework::LoDTensor>("Out");
out->mutable_data<T>(ctx.GetPlace());
auto runner = NpuOpRunner("Assign", {*out, *x}, {*out}, {});
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
runner.Run(stream);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
namespace plat = paddle::platform;
REGISTER_OP_NPU_KERNEL(
assign, ops::AssignNPUKernel<paddle::platform::NPUDeviceContext, int>,
ops::AssignNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::AssignNPUKernel<paddle::platform::NPUDeviceContext, double>)
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifndef _WIN32
#include <unistd.h>
#endif
#include <string>
#include <thread> // NOLINT
#include <vector>
#include "gtest/gtest.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/operators/dropout_op.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/string/printf.h"
namespace f = paddle::framework;
namespace p = paddle::platform;
namespace m = paddle::operators::math;
USE_OP(assign);
USE_OP_DEVICE_KERNEL(assign, NPU);
template <typename T>
void Compare(f::Scope* scope, const p::DeviceContext& ctx,
std::string op_type) {
// init
auto x = scope->Var("X");
auto tensor_x = x->GetMutable<f::LoDTensor>();
std::vector<T> init;
init.push_back(static_cast<T>(1.0));
init.push_back(static_cast<T>(2.0));
init.push_back(static_cast<T>(3.0));
init.push_back(static_cast<T>(4.0));
TensorFromVector(init, ctx, tensor_x);
tensor_x->Resize({4});
ctx.Wait();
auto place = ctx.GetPlace();
auto out = scope->Var("Out");
auto tensor_out = out->GetMutable<f::LoDTensor>();
auto op =
f::OpRegistry::CreateOp(op_type, {{"X", {"X"}}}, {{"Out", {"Out"}}}, {});
op->Run(*scope, place);
std::vector<T> out_vec;
TensorToVector(*tensor_out, ctx, &out_vec);
ctx.Wait();
EXPECT_EQ((uint32_t)out_vec.size(), (uint32_t)4);
EXPECT_EQ(out_vec[0], static_cast<T>(1.0));
EXPECT_EQ(out_vec[1], static_cast<T>(2.0));
EXPECT_EQ(out_vec[2], static_cast<T>(3.0));
EXPECT_EQ(out_vec[3], static_cast<T>(4.0));
}
TEST(assign, NPU_fp32) {
f::Scope scope;
p::NPUDeviceContext ctx(p::NPUPlace(0));
Compare<float>(&scope, ctx, "assign");
}
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifdef PADDLE_WITH_ASCEND_CL
#include <memory>
#include <string>
#include "paddle/fluid/operators/cast_op.h"
#include "paddle/fluid/operators/npu_op_runner.h"
namespace paddle {
namespace operators {
static std::map<framework::proto::VarType::Type, aclDataType>
DTYPE_2_ACL_DTYPE = {
{framework::proto::VarType::BOOL, ACL_BOOL},
{framework::proto::VarType::INT16, ACL_INT16},
{framework::proto::VarType::INT32, ACL_INT32},
{framework::proto::VarType::INT64, ACL_INT64},
{framework::proto::VarType::FP16, ACL_FLOAT16},
{framework::proto::VarType::FP32, ACL_FLOAT},
{framework::proto::VarType::FP64, ACL_DOUBLE},
};
using Tensor = framework::Tensor;
template <typename DeviceContext, typename T>
class CastNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
int dtype = ctx.Attr<int>("out_dtype");
auto* out = ctx.Output<Tensor>("Out");
auto place = ctx.GetPlace();
auto iter = DTYPE_2_ACL_DTYPE.find(
static_cast<framework::proto::VarType::Type>(dtype));
PADDLE_ENFORCE_EQ(
iter != DTYPE_2_ACL_DTYPE.end(), true,
platform::errors::Unimplemented(
"The cast op on NPU does not support out_dtype %d.", dtype));
int aclDtype = iter->second;
if (dtype == framework::proto::VarType::FP32) {
out->mutable_data<float>(place);
} else if (dtype == framework::proto::VarType::FP16) {
out->mutable_data<paddle::platform::float16>(place);
} else if (dtype == framework::proto::VarType::INT16) {
out->mutable_data<int16_t>(place);
} else if (dtype == framework::proto::VarType::INT32) {
out->mutable_data<int32_t>(place);
} else if (dtype == framework::proto::VarType::INT64) {
out->mutable_data<int64_t>(place);
} else if (dtype == framework::proto::VarType::FP64) {
out->mutable_data<double>(place);
} else if (dtype == framework::proto::VarType::BOOL) {
out->mutable_data<bool>(place);
}
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
auto runner = NpuOpRunner("Cast", {*x}, {*out},
{{"dst_type", static_cast<int32_t>(aclDtype)}});
runner.Run(stream);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
cast, ops::CastNPUKernel<paddle::platform::NPUDeviceContext, int16_t>,
ops::CastNPUKernel<paddle::platform::NPUDeviceContext, int32_t>,
ops::CastNPUKernel<paddle::platform::NPUDeviceContext, int64_t>,
ops::CastNPUKernel<paddle::platform::NPUDeviceContext, bool>,
ops::CastNPUKernel<paddle::platform::NPUDeviceContext, double>,
ops::CastNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::CastNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
#endif
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/operators/concat_op.h"
#include "paddle/fluid/operators/npu_op_runner.h"
namespace paddle {
namespace operators {
template <typename T>
class ConcatNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto ins = ctx.MultiInput<framework::LoDTensor>("X");
framework::LoDTensor* out = ctx.Output<framework::LoDTensor>("Out");
PADDLE_ENFORCE_NOT_NULL(ins[0],
platform::errors::NotFound(
"The first input tensor is not initialized."));
auto axis = ctx.Attr<int>("axis");
if (ctx.HasInput("AxisTensor")) {
PADDLE_THROW(platform::errors::NotFound(
"The AxisTensor is not supported on NPU now."));
}
axis = ComputeAxis(static_cast<int64_t>(axis),
static_cast<int64_t>(ins[0]->dims().size()));
auto place = ctx.GetPlace();
out->mutable_data<T>(place);
std::vector<framework::Tensor> inputs;
std::vector<std::string> names;
for (size_t i = 0; i < ins.size(); ++i) {
if (ins[i] && ins[i]->numel() > 0) {
inputs.push_back(*ins[i]);
names.push_back("x" + std::to_string(i));
}
}
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
auto runner = NpuOpRunner(
"ConcatD", {inputs}, {*out},
{{"concat_dim", axis}, {"N", static_cast<int>(inputs.size())}});
runner.AddInputNames(names);
runner.Run(stream);
}
};
template <typename T>
class ConcatGradNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* out_grad =
ctx.Input<framework::Tensor>(framework::GradVarName("Out"));
auto ins = ctx.MultiInput<framework::LoDTensor>("X");
auto out_var_names = ctx.OutputNames(framework::GradVarName("X"));
auto outs =
ctx.MultiOutput<framework::LoDTensor>(framework::GradVarName("X"));
PADDLE_ENFORCE_NOT_NULL(ins[0],
platform::errors::NotFound(
"The first input tensor is not initialized."));
auto axis = ctx.Attr<int>("axis");
axis = ComputeAxis(static_cast<int64_t>(axis),
static_cast<int64_t>(ins[0]->dims().size()));
int offset = 0;
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
for (size_t j = 0; j < outs.size(); ++j) {
// For stop gradient
// get output tensor that the name is not kEmptyVarName
if (out_var_names[j] != framework::kEmptyVarName &&
outs[j]->numel() != 0UL) {
outs[j]->mutable_data<T>(ctx.GetPlace());
std::vector<int> offsets;
std::vector<int> sizes;
for (int dim = 0; dim < ins[j]->dims().size(); ++dim) {
if (dim == axis) {
offsets.push_back(offset);
sizes.push_back(ins[j]->dims()[dim]);
} else {
offsets.push_back(0);
sizes.push_back(ins[j]->dims()[dim]);
}
}
auto runner = NpuOpRunner("SliceD", {*out_grad}, {*outs[j]},
{{"offsets", offsets}, {"size", sizes}});
runner.Run(stream);
}
if (ins[j]->numel() != 0UL) {
offset += ins[j]->dims()[axis];
}
}
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(concat, ops::ConcatNPUKernel<float>,
ops::ConcatNPUKernel<paddle::platform::float16>,
ops::ConcatNPUKernel<int>);
REGISTER_OP_NPU_KERNEL(concat_grad, ops::ConcatGradNPUKernel<float>,
ops::ConcatGradNPUKernel<paddle::platform::float16>,
ops::ConcatGradNPUKernel<int>);
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <algorithm>
#include <string>
#include <vector>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/op_version_registry.h"
#include "paddle/fluid/operators/controlflow/compare_op.h"
#include "paddle/fluid/operators/elementwise/elementwise_op_function.h"
#include "paddle/fluid/operators/npu_op_runner.h"
#ifdef PADDLE_WITH_ASCEND_CL
namespace paddle {
namespace operators {
template <typename T>
class EqualNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<framework::LoDTensor>("X");
auto* y = ctx.Input<framework::LoDTensor>("Y");
auto* out = ctx.Output<framework::LoDTensor>("Out");
out->mutable_data<bool>(ctx.GetPlace());
auto runner = NpuOpRunner("Equal", {*x, *y}, {*out}, {});
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
runner.Run(stream);
}
};
template <typename DeviceContext, typename T>
class LessThanNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<framework::LoDTensor>("X");
auto* y = ctx.Input<framework::LoDTensor>("Y");
auto* z = ctx.Output<framework::LoDTensor>("Out");
// int axis = context.Attr<int>("axis");
z->mutable_data<bool>(ctx.GetPlace()); // allocate
auto runner = NpuOpRunner("Less", {*x, *y}, {*z});
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
runner.Run(stream);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
namespace plat = paddle::platform;
REGISTER_OP_NPU_KERNEL(equal, ops::EqualNPUKernel<float>,
ops::EqualNPUKernel<plat::float16>,
ops::EqualNPUKernel<int>);
REGISTER_OP_NPU_KERNEL(
less_than,
ops::LessThanNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::LessThanNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
#endif
@@ -78,6 +78,13 @@ class ConditionalOp : public framework::OperatorBase {
framework::TensorCopy(*ips[0], platform::CPUPlace(), &cpu_tensor);
platform::DeviceContextPool::Instance().Get(ips[0]->place())->Wait();
res = cpu_tensor.data<bool>()[0];
#endif
} else if (platform::is_npu_place(ips[0]->place())) {
#ifdef PADDLE_WITH_ASCEND_CL
framework::LoDTensor cpu_tensor;
framework::TensorCopy(*ips[0], platform::CPUPlace(), &cpu_tensor);
platform::DeviceContextPool::Instance().Get(ips[0]->place())->Wait();
res = cpu_tensor.data<bool>()[0];
#endif
} else {
res = ips[0]->data<bool>()[0];
...
@@ -44,6 +44,11 @@ static void DataCopy(const framework::LoDTensor &src_item,
TensorCopySync(src_item, platform::CPUPlace(), dst_item);
}
#else
#ifdef PADDLE_WITH_ASCEND_CL
if (platform::is_npu_place(src_item.place())) {
platform::DeviceContextPool::Instance().Get(src_item.place())->Wait();
}
#endif
TensorCopySync(src_item, platform::CPUPlace(), dst_item);
#endif
} else {
...
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifdef PADDLE_WITH_ASCEND_CL
#include <memory>
#include <string>
#include "paddle/fluid/operators/controlflow/logical_op.h"
#include "paddle/fluid/operators/npu_op_runner.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename DeviceContext, typename T>
class LogicalNotNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* out = ctx.Output<Tensor>("Out");
auto place = ctx.GetPlace();
out->mutable_data<T>(place);
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
auto runner = NpuOpRunner("LogicalNot", {*x}, {*out}, {});
runner.Run(stream);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
logical_not,
ops::LogicalNotNPUKernel<paddle::platform::NPUDeviceContext, bool>);
#endif
@@ -12,17 +12,18 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifdef PADDLE_WITH_ASCEND_CL
#include <memory>
#include <string>
#include "paddle/fluid/framework/tensor_util.h"
#include "paddle/fluid/operators/elementwise/elementwise_add_op.h"
#include "paddle/fluid/operators/npu_op_runner.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename T>
class ElementwiseAddNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
@@ -39,12 +40,127 @@ class ElementwiseAddNPUKernel : public framework::OpKernel<T> {
}
};
template <typename T>
class ElementwiseAddGradNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* dout = ctx.Input<Tensor>(framework::GradVarName("Out"));
auto* dx = ctx.Output<Tensor>(framework::GradVarName("X"));
auto* dy = ctx.Output<Tensor>(framework::GradVarName("Y"));
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
// NOTE(zhiqiu): It seems the Ascend Add op follows the broadcast semantics
// with default axis=-1.
// So, add_grad should do reduce if needed.
// For example, the shape of each variable in elementwise_add:
// x, dx: [2, 3, 5]
// y, dy: [1, 5]
// out, dout: [2, 3, 5]
// Then, out = x + y => dx = dout, dy = dout
// And, the shape of dy can be computed by a two-stage reduce:
// 1. [2, 3, 5] => [3, 5], ReduceSumD on axis = 0, keep_dims = false.
// 2. [3, 5] => [1, 5], ReduceSumD on axis = 0, keep_dims = true.
if (dx) {
dx->mutable_data<T>(ctx.GetPlace());
// For dx
// stage 1
auto reduce_ndim = dout->dims().size() - dx->dims().size();
std::vector<int> axes;
for (auto i = 0; i < reduce_ndim; ++i) {
axes.push_back(i);
}
Tensor* tmp_dout = const_cast<Tensor*>(dout);
Tensor reduced_dout(dx->type());
if (axes.size() != 0) {
std::vector<int64_t> reduced_dout_dims;
for (auto i = reduce_ndim; i < dout->dims().size(); ++i) {
reduced_dout_dims.push_back(dout->dims()[i]);
}
reduced_dout.Resize(framework::make_ddim(reduced_dout_dims));
reduced_dout.mutable_data<T>(ctx.GetPlace());
auto runner = NpuOpRunner("ReduceSumD", {*dout}, {reduced_dout},
{{"axes", axes}, {"keep_dims", false}});
runner.Run(stream);
tmp_dout = &reduced_dout;
}
// stage 2
axes.clear();
for (auto i = 0; i < dx->dims().size(); ++i) {
if (dx->dims()[i] == 1) {
axes.push_back(i);
}
}
if (axes.size() != 0) {
auto runner = NpuOpRunner("ReduceSumD", {*tmp_dout}, {*dx},
{{"axes", axes}, {"keep_dims", true}});
runner.Run(stream);
} else {
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.Wait();
framework::TensorCopySync(*tmp_dout, ctx.GetPlace(), dx);
}
}
if (dy) {
// For dy
// stage 1
auto reduce_ndim = dout->dims().size() - dy->dims().size();
std::vector<int> axes;
for (auto i = 0; i < reduce_ndim; ++i) {
axes.push_back(i);
}
Tensor* tmp_dout = const_cast<Tensor*>(dout);
Tensor reduced_dout(dout->type());
if (axes.size() != 0) {
std::vector<int64_t> reduced_dout_dims;
for (auto i = reduce_ndim; i < dout->dims().size(); ++i) {
reduced_dout_dims.push_back(dout->dims()[i]);
}
reduced_dout.Resize(framework::make_ddim(reduced_dout_dims));
reduced_dout.mutable_data<T>(ctx.GetPlace());
auto runner = NpuOpRunner("ReduceSumD", {*dout}, {reduced_dout},
{{"axes", axes}, {"keep_dims", false}});
runner.Run(stream);
tmp_dout = &reduced_dout;
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.Wait();
}
// stage 2
axes.clear();
for (auto i = 0; i < dy->dims().size(); ++i) {
if (dy->dims()[i] == 1) {
axes.push_back(i);
}
}
if (axes.size() != 0) {
dy->mutable_data<T>(ctx.GetPlace());
auto runner = NpuOpRunner("ReduceSumD", {*tmp_dout}, {*dy},
{{"axes", axes}, {"keep_dims", true}});
runner.Run(stream);
} else {
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.Wait();
framework::TensorCopySync(*tmp_dout, ctx.GetPlace(), dy);
}
}
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
namespace plat = paddle::platform;
REGISTER_OP_NPU_KERNEL(elementwise_add, ops::ElementwiseAddNPUKernel<float>,
ops::ElementwiseAddNPUKernel<plat::float16>);
REGISTER_OP_NPU_KERNEL(elementwise_add_grad,
ops::ElementwiseAddGradNPUKernel<float>,
ops::ElementwiseAddGradNPUKernel<plat::float16>);
#endif
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <memory>
#include <string>
#include "paddle/fluid/operators/elementwise/elementwise_div_op.h"
#include "paddle/fluid/operators/npu_op_runner.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename DeviceContext, typename T>
class ElementwiseDivNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* y = ctx.Input<Tensor>("Y");
auto* out = ctx.Output<Tensor>("Out");
auto place = ctx.GetPlace();
out->mutable_data<T>(place);
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
auto runner = NpuOpRunner("Div", {*x, *y}, {*out}, {});
runner.Run(stream);
}
};
template <typename DeviceContext, typename T>
class ElementwiseDivGradNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* out = ctx.Input<Tensor>("Out");
auto* dout = ctx.Input<Tensor>(framework::GradVarName("Out"));
auto* x = ctx.Input<Tensor>("X");
auto* y = ctx.Input<Tensor>("Y");
auto* dx = ctx.Output<Tensor>(framework::GradVarName("X"));
auto* dy = ctx.Output<Tensor>(framework::GradVarName("Y"));
auto place = ctx.GetPlace();
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
Tensor y_power(y->type());
y_power.mutable_data<T>(y->dims(), place);
auto y_power_runner = NpuOpRunner("Power", {*y}, {y_power},
{{"power", static_cast<float>(-1)}});
y_power_runner.Run(stream);
if (dx) {
dx->mutable_data<T>(place);
Tensor tensor_zeros(x->type());
tensor_zeros.mutable_data<T>(x->dims(), place);
auto tensor_zeros_runner =
NpuOpRunner("ZerosLike", {*x}, {tensor_zeros}, {});
tensor_zeros_runner.Run(stream);
Tensor x_zero(paddle::framework::proto::VarType::BOOL);
x_zero.mutable_data<bool>(x->dims(), place);
auto x_zero_runner =
NpuOpRunner("Equal", {*x, tensor_zeros}, {x_zero}, {});
x_zero_runner.Run(stream);
Tensor x_nozero(paddle::framework::proto::VarType::BOOL);
x_nozero.mutable_data<bool>(x->dims(), place);
auto x_nozero_runner =
NpuOpRunner("LogicalNot", {x_zero}, {x_nozero}, {});
x_nozero_runner.Run(stream);
Tensor x_nozero_f(x->type());
x_nozero_f.mutable_data<T>(x->dims(), place);
auto x_nozero_f_runner =
NpuOpRunner("Cast", {x_nozero}, {x_nozero_f},
{{"dst_type", static_cast<int32_t>(0)}});
x_nozero_f_runner.Run(stream);
Tensor x_grad_w(x->type());
x_grad_w.mutable_data<T>(x->dims(), place);
auto x_grad_w_runner =
NpuOpRunner("Mul", {x_nozero_f, y_power}, {x_grad_w}, {});
x_grad_w_runner.Run(stream);
auto x_grad_runner = NpuOpRunner("Mul", {x_grad_w, *dout}, {*dx}, {});
x_grad_runner.Run(stream);
}
if (dy) {
dy->mutable_data<T>(place);
Tensor neg_out(y->type());
neg_out.mutable_data<T>(y->dims(), place);
auto neg_out_runner = NpuOpRunner("Neg", {*out}, {neg_out}, {});
neg_out_runner.Run(stream);
Tensor y_grad_w(y->type());
y_grad_w.mutable_data<T>(y->dims(), place);
auto y_grad_w_runner = NpuOpRunner("Div", {neg_out, *y}, {y_grad_w}, {});
y_grad_w_runner.Run(stream);
auto y_grad_runner = NpuOpRunner("Mul", {y_grad_w, *dout}, {*dy}, {});
y_grad_runner.Run(stream);
}
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
elementwise_div,
ops::ElementwiseDivNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::ElementwiseDivNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
REGISTER_OP_NPU_KERNEL(
elementwise_div_grad,
ops::ElementwiseDivGradNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::ElementwiseDivGradNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <memory>
#include <string>
#include "paddle/fluid/operators/elementwise/elementwise_div_op.h"
#include "paddle/fluid/operators/npu_op_runner.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename T>
class ElementwiseFloorDivNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* y = ctx.Input<Tensor>("Y");
auto* out = ctx.Output<Tensor>("Out");
out->mutable_data<T>(ctx.GetPlace());
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
auto runner = NpuOpRunner("FloorDiv", {*x, *y}, {*out}, {});
runner.Run(stream);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(elementwise_floordiv,
ops::ElementwiseFloorDivNPUKernel<int>,
ops::ElementwiseFloorDivNPUKernel<int64_t>);
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <memory>
#include <string>
#include "paddle/fluid/operators/elementwise/elementwise_max_op.h"
#include "paddle/fluid/operators/npu_op_runner.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename DeviceContext, typename T>
class ElementwiseMaxNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* y = ctx.Input<Tensor>("Y");
auto* out = ctx.Output<Tensor>("Out");
auto place = ctx.GetPlace();
out->mutable_data<T>(place);
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
auto runner = NpuOpRunner("Maximum", {*x, *y}, {*out}, {});
runner.Run(stream);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
elementwise_max,
ops::ElementwiseMaxNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::ElementwiseMaxNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <memory>
#include <string>
#include "paddle/fluid/operators/elementwise/elementwise_min_op.h"
#include "paddle/fluid/operators/npu_op_runner.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename DeviceContext, typename T>
class ElementwiseMinNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* y = ctx.Input<Tensor>("Y");
auto* out = ctx.Output<Tensor>("Out");
auto place = ctx.GetPlace();
out->mutable_data<T>(place);
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
auto runner = NpuOpRunner("Minimum", {*x, *y}, {*out}, {});
runner.Run(stream);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
elementwise_min,
ops::ElementwiseMinNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::ElementwiseMinNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifdef PADDLE_WITH_ASCEND_CL
#include <memory>
#include <string>
#include "paddle/fluid/operators/elementwise/elementwise_mul_op.h"
#include "paddle/fluid/operators/npu_op_runner.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename DeviceContext, typename T>
class ElementwiseMulNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* y = ctx.Input<Tensor>("Y");
auto* out = ctx.Output<Tensor>("Out");
auto place = ctx.GetPlace();
out->mutable_data<T>(place);
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
auto runner = NpuOpRunner("Mul", {*x, *y}, {*out}, {});
runner.Run(stream);
}
};
template <typename DeviceContext, typename T>
class ElementwiseMulGradNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* y = ctx.Input<Tensor>("Y");
auto* dout = ctx.Input<Tensor>(framework::GradVarName("Out"));
auto* dx = ctx.Output<Tensor>(framework::GradVarName("X"));
auto* dy = ctx.Output<Tensor>(framework::GradVarName("Y"));
auto place = ctx.GetPlace();
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
if (dx) {
dx->mutable_data<T>(place);
auto dx_runner = NpuOpRunner("Mul", {*dout, *y}, {*dx}, {});
dx_runner.Run(stream);
}
if (dy) {
dy->mutable_data<T>(place);
auto dy_runner = NpuOpRunner("Mul", {*x, *dout}, {*dy}, {});
dy_runner.Run(stream);
}
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
elementwise_mul,
ops::ElementwiseMulNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::ElementwiseMulNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
REGISTER_OP_NPU_KERNEL(
elementwise_mul_grad,
ops::ElementwiseMulGradNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::ElementwiseMulGradNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
#endif
@@ -74,6 +74,7 @@ void Compare(f::Scope* scope, const p::DeviceContext& ctx,
{{"Out", {"Out"}}}, attrs);
op->Run(*scope, place);
ctx.Wait();
std::vector<T> out_vec;
TensorToVector(*tensor_out, ctx, &out_vec);
@@ -131,6 +132,7 @@ void CompareGrad(f::Scope* scope, const p::DeviceContext& ctx,
auto place = ctx.GetPlace();
op->Run(*scope, place);
ctx.Wait();
std::vector<T> dx_vec;
TensorToVector(*tensor_dx, ctx, &dx_vec);
@@ -179,3 +181,9 @@ TEST(elementwise_sub_grad, NPU) {
p::NPUDeviceContext ctx(p::NPUPlace(0));
CompareGrad<float>(&scope, ctx, "elementwise_sub_grad");
}
TEST(elementwise_add_grad, NPU) {
f::Scope scope;
p::NPUDeviceContext ctx(p::NPUPlace(0));
CompareGrad<float>(&scope, ctx, "elementwise_add_grad");
}
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <memory>
#include <string>
#include "paddle/fluid/operators/elementwise/elementwise_pow_op.h"
#include "paddle/fluid/operators/npu_op_runner.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename DeviceContext, typename T>
class ElementwisePowNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* y = ctx.Input<Tensor>("Y");
auto* out = ctx.Output<Tensor>("Out");
auto place = ctx.GetPlace();
out->mutable_data<T>(place);
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
auto runner = NpuOpRunner("Pow", {*x, *y}, {*out}, {});
runner.Run(stream);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
elementwise_pow,
ops::ElementwisePowNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::ElementwisePowNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
@@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <memory>
#include <string>
@@ -24,7 +23,7 @@ namespace operators {
using Tensor = framework::Tensor;
template <typename T>
class ElementwiseSubNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
@@ -43,7 +42,7 @@ class ElementwiseSubNPUKernel : public framework::OpKernel<T> {
}
};
template <typename T>
class ElementwiseSubGradNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
@@ -51,8 +50,9 @@ class ElementwiseSubGradNPUKernel : public framework::OpKernel<T> {
auto* dx = ctx.Output<Tensor>(framework::GradVarName("X"));
auto* dy = ctx.Output<Tensor>(framework::GradVarName("Y"));
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
// NOTE(zhiqiu): It seems the Ascend Sub op follows the broadcast semantics
// with default axis=-1.
@@ -66,9 +66,8 @@ class ElementwiseSubGradNPUKernel : public framework::OpKernel<T> {
// 1. [2, 3, 5] => [3, 5], ReduceSumD on axis = 0, keep_dims = false.
// 2. [3, 5] => [1, 5], ReduceSumD on axis = 0, keep_dims = true.
if (dx) {
dx->mutable_data<T>(ctx.GetPlace());
// For dx
// stage 1
auto reduce_ndim = dout->dims().size() - dx->dims().size();
@@ -76,7 +75,7 @@ class ElementwiseSubGradNPUKernel : public framework::OpKernel<T> {
for (auto i = 0; i < reduce_ndim; ++i) {
axes.push_back(i);
}
Tensor* tmp_dout = const_cast<Tensor*>(dout);
Tensor reduced_dout(dx->type());
if (axes.size() != 0) {
std::vector<int64_t> reduced_dout_dims;
@@ -105,16 +104,19 @@ class ElementwiseSubGradNPUKernel : public framework::OpKernel<T> {
} else {
framework::TensorCopySync(*tmp_dout, ctx.GetPlace(), dx);
}
}
if (dy) {
dy->mutable_data<T>(ctx.GetPlace());
// For dy
// stage 1
auto reduce_ndim = dout->dims().size() - dy->dims().size();
std::vector<int> axes;
for (auto i = 0; i < reduce_ndim; ++i) {
axes.push_back(i);
}
Tensor* tmp_dout = const_cast<Tensor*>(dout);
Tensor reduced_dy(dy->type());
Tensor reduced_dout(dy->type());
if (axes.size() != 0) {
std::vector<int64_t> reduced_dout_dims;
@@ -131,7 +133,7 @@ class ElementwiseSubGradNPUKernel : public framework::OpKernel<T> {
// stage 2
axes.clear();
Tensor* tmp_dy = tmp_dout;
for (auto i = 0; i < dy->dims().size(); ++i) {
if (dy->dims()[i] == 1) {
axes.push_back(i);
@@ -150,22 +152,18 @@ class ElementwiseSubGradNPUKernel : public framework::OpKernel<T> {
auto runner = NpuOpRunner("Neg", {*tmp_dy}, {*dy}, {});
runner.Run(stream);
}
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
namespace plat = paddle::platform;
REGISTER_OP_NPU_KERNEL(elementwise_sub, ops::ElementwiseSubNPUKernel<float>,
ops::ElementwiseSubNPUKernel<plat::float16>);
REGISTER_OP_NPU_KERNEL(elementwise_sub_grad,
ops::ElementwiseSubGradNPUKernel<float>,
ops::ElementwiseSubGradNPUKernel<plat::float16>);
@@ -64,6 +64,12 @@ inline std::vector<int> get_expand_times(
TensorCopySync(*expand_tensor, platform::CPUPlace(), &cpu_expand_tensor);
expand_data = cpu_expand_tensor.data<int>();
}
#ifdef PADDLE_WITH_ASCEND_CL
if (platform::is_npu_place(expand_tensor->place())) {
TensorCopySync(*expand_tensor, platform::CPUPlace(), &cpu_expand_tensor);
expand_data = cpu_expand_tensor.data<int>();
}
#endif
#ifdef PADDLE_WITH_XPU
if (platform::is_xpu_place(expand_tensor->place())) {
TensorCopySync(*expand_tensor, platform::CPUPlace(), &cpu_expand_tensor);
...
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifdef PADDLE_WITH_ASCEND_CL
#include <iostream>
#include <memory>
#include <string>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/operators/expand_op.h"
#include "paddle/fluid/operators/npu_op_runner.h"
namespace paddle {
namespace operators {
template <typename DeviceContext, typename T>
class ExpandNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& context) const override {
auto rank = context.Input<Tensor>("X")->dims().size();
PADDLE_ENFORCE_GE(
rank, 1,
platform::errors::InvalidArgument(
"The number of dimensions of the input 'x' for Op(expand) "
"must be greater than or equal to 1, but the value received is %d.",
rank));
PADDLE_ENFORCE_LE(
rank, MAX_RANK_SUPPORTED,
platform::errors::InvalidArgument(
"The number of dimensions of the input 'x' for Op(expand) "
"must be less than or equal to %d, but the value received is %d.",
MAX_RANK_SUPPORTED, rank));
switch (rank) { REP_EXPAND_TEMPLATE(MAX_RANK_SUPPORTED) }
}
protected:
template <int Rank>
void Expand(const framework::ExecutionContext& context) const {
auto* in0 = context.Input<framework::LoDTensor>("X");
auto in_dims = in0->dims();
auto expand_times = get_expand_times(context);
PADDLE_ENFORCE_EQ(
static_cast<size_t>(in_dims.size()), expand_times.size(),
platform::errors::InvalidArgument(
"The number of elements (%d) of 'expand_times' for "
"Op(expand) must be equal to the number "
"of dimensions (%d) of the input.",
expand_times.size(), static_cast<size_t>(in_dims.size())));
auto* out0 = context.Output<framework::LoDTensor>("Out");
framework::DDim out_dims(in_dims);
for (size_t i = 0; i < expand_times.size(); ++i) {
out_dims[i] *= expand_times[i];
}
out0->Resize(out_dims);
out0->mutable_data<T>(context.device_context().GetPlace());
auto runner =
NpuOpRunner("TileD", {*in0}, {*out0}, {{"multiples", expand_times}});
auto stream =
context.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
runner.Run(stream);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
expand, ops::ExpandNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::ExpandNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
#endif
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifndef _WIN32
#include <unistd.h>
#endif
#include <iostream>
#include <string>
#include <thread> // NOLINT
#include <vector>
#include "gtest/gtest.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/operators/dropout_op.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/string/printf.h"
namespace f = paddle::framework;
namespace p = paddle::platform;
namespace m = paddle::operators::math;
USE_OP(expand);
USE_OP_DEVICE_KERNEL(expand, NPU);
template <typename T>
void Compare(f::Scope* scope, const p::DeviceContext& ctx) {
// init
auto in = scope->Var("X");
auto expand_times = scope->Var("ExpandTimes");
auto out = scope->Var("Out");
auto in_t = in->GetMutable<f::LoDTensor>();
auto out_t = out->GetMutable<f::LoDTensor>();
auto expand_times_t = expand_times->GetMutable<f::LoDTensor>();
auto place = ctx.GetPlace();
TensorFromVector(std::vector<T>(3 * 1 * 7, 1), ctx, in_t);
TensorFromVector(std::vector<int>({1, 10, 1}), ctx, expand_times_t);
in_t->Resize(f::make_ddim({3, 1, 7}));
expand_times_t->Resize(f::make_ddim({3}));
out_t->Resize(f::make_ddim({3, 10, 7}));
out_t->mutable_data<T>(place);
f::AttributeMap attrs = {{}};
auto op = f::OpRegistry::CreateOp(
"expand", {{"X", {"X"}}, {"ExpandTimes", {"ExpandTimes"}}},
{{"Out", {"Out"}}}, attrs);
op->Run(*scope, place);
ctx.Wait();
auto out_dim = out_t->dims();
EXPECT_EQ(out_dim.at(0), 3);
EXPECT_EQ(out_dim.at(1), 10);
EXPECT_EQ(out_dim.at(2), 7);
}
TEST(expand, NPU_fp32) {
f::Scope scope;
p::NPUDeviceContext ctx(p::NPUPlace(0));
Compare<float>(&scope, ctx);
}
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <memory>
#include <string>
#include "paddle/fluid/operators/fill_constant_op.h"
#include "paddle/fluid/operators/npu_op_runner.h"
#include "paddle/fluid/operators/utils.h"
namespace paddle {
namespace operators {
template <typename DeviceContext, typename T>
class FillConstantNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto data_type =
static_cast<framework::proto::VarType::Type>(ctx.Attr<int>("dtype"));
auto str_value = ctx.Attr<std::string>("str_value");
auto float_value = ctx.Attr<float>("value");
auto* out_var = ctx.Output<framework::Tensor>("Out");
auto place = ctx.GetPlace();
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
T value;
if (str_value.empty()) {
value = static_cast<T>(float_value);
} else {
// handle NaN/Inf first, which cannot be read from stream.
if (str_value == "inf") {
value = static_cast<T>(std::numeric_limits<double>::infinity());
} else if (str_value == "-inf") {
value = static_cast<T>(-std::numeric_limits<double>::infinity());
} else if (str_value == "nan") {
value = static_cast<T>(std::numeric_limits<double>::quiet_NaN());
} else {
std::stringstream convert_stream(str_value);
if (std::is_same<int64_t, T>::value) {
int64_t tmp_value;
convert_stream >> tmp_value;
value = static_cast<T>(tmp_value);
} else {
double tmp_value;
convert_stream >> tmp_value;
value = static_cast<T>(tmp_value);
}
}
}
auto shape = GetShape(ctx);
Tensor tensor_tmp(data_type);
tensor_tmp.mutable_data<T>({1}, ctx.GetPlace());
TensorFromVector(std::vector<T>{value}, ctx.device_context(), &tensor_tmp);
out_var->mutable_data<T>(shape, place);
auto runner = NpuOpRunner("FillD", {tensor_tmp}, {*out_var},
{{"dims", framework::vectorize(shape)}});
runner.Run(stream);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
fill_constant,
ops::FillConstantNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::FillConstantNPUKernel<paddle::platform::NPUDeviceContext, bool>,
ops::FillConstantNPUKernel<paddle::platform::NPUDeviceContext, int>,
ops::FillConstantNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/operators/gather_op.h"
#include <memory>
#include <string>
#include <vector>
#include "paddle/fluid/framework/tensor_util.h"
#include "paddle/fluid/operators/kron_op.h"
#include "paddle/fluid/operators/npu_op_runner.h"
#include "paddle/fluid/platform/npu_info.h"
namespace paddle {
namespace operators {
template <typename DeviceContext, typename T>
class GatherOpNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext &ctx) const override {
auto *x = ctx.Input<Tensor>("X");
auto *index = ctx.Input<Tensor>("Index");
auto *out = ctx.Output<Tensor>("Out");
out->mutable_data<T>(ctx.GetPlace());
auto runner = NpuOpRunner("Gather", {*x, *index}, {*out},
{{"validate_indices", true}});
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
runner.Run(stream);
}
};
template <typename DeviceContext, typename T>
class GatherGradOpNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext &ctx) const override {
auto *index = ctx.Input<Tensor>("Index");
auto *x = ctx.Input<Tensor>("X");
auto *dout = ctx.Input<Tensor>(framework::GradVarName("Out"));
auto *dx = ctx.Output<Tensor>(framework::GradVarName("X"));
// step1: Unsqueeze index
framework::Tensor tmp_tensor(index->type());
const auto index_dims = index->dims();
if (index_dims.size() == 1) {
tmp_tensor.ShareDataWith(*index);
std::vector<int64_t> new_dim = {index_dims[0], 1};
tmp_tensor.Resize(framework::make_ddim(new_dim));
index = &tmp_tensor;
}
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
// step2: ZerosLike x in device
Tensor zeroslike_xout(x->type());
zeroslike_xout.Resize(x->dims());
auto p = zeroslike_xout.mutable_data<T>(ctx.GetPlace());
platform::NPUMemsetAsync(static_cast<void *>(p), 0,
zeroslike_xout.numel() * sizeof(T), stream);
// step3: scatter(x_grad)
dx->mutable_data<T>(ctx.GetPlace());
auto runner_scatter = NpuOpRunner(
"TensorScatterUpdate", {zeroslike_xout, *index, *dout}, {*dx}, {});
runner_scatter.Run(stream);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
gather, ops::GatherOpNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::GatherOpNPUKernel<paddle::platform::NPUDeviceContext, double>,
ops::GatherOpNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
REGISTER_OP_NPU_KERNEL(
gather_grad,
ops::GatherGradOpNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::GatherGradOpNPUKernel<paddle::platform::NPUDeviceContext, double>,
ops::GatherGradOpNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifndef _WIN32
#include <unistd.h>
#endif
#include <string>
#include <thread> // NOLINT
#include <vector>
#include "gtest/gtest.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/operators/gather_op.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/string/printf.h"
namespace f = paddle::framework;
namespace p = paddle::platform;
namespace m = paddle::operators::math;
USE_OP(gather);
USE_OP_DEVICE_KERNEL(gather, NPU);
USE_OP(gather_grad);
USE_OP_DEVICE_KERNEL(gather_grad, NPU);
template <typename T>
void Compare(f::Scope* scope, const p::DeviceContext& ctx,
std::string op_type) {
// init
auto x = scope->Var("X");
auto tensor_x = x->GetMutable<f::LoDTensor>();
auto index = scope->Var("Index");
auto tensor_index = index->GetMutable<f::LoDTensor>();
std::vector<T> init_x;
for (int64_t i = 1; i < 7; ++i) {
// 1,2,3,4,5,6
init_x.push_back(static_cast<T>(i));
}
// [[1, 2],[3, 4],[5, 6]]
TensorFromVector(init_x, ctx, tensor_x);
tensor_x->Resize(paddle::framework::make_ddim({3, 2}));
std::vector<int> init_index = {1, 2};
paddle::framework::TensorFromVector<int>(init_index, ctx, tensor_index);
tensor_index->Resize(paddle::framework::make_ddim({2}));
ctx.Wait();
auto out = scope->Var("Out");
auto tensor_out = out->GetMutable<f::LoDTensor>();
// run
f::AttributeMap attrs = {{"validate_indices", true}};
auto op = f::OpRegistry::CreateOp(
op_type, {{"X", {"X"}}, {"Index", {"Index"}}}, {{"Out", {"Out"}}}, attrs);
auto place = ctx.GetPlace();
op->Run(*scope, place);
std::vector<T> out_vec;
TensorToVector(*tensor_out, ctx, &out_vec);
ctx.Wait();
// ref:https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/tensor/manipulation/gather_cn.html#gather
for (int i = 0; i < static_cast<int>(out_vec.size()); ++i) {
VLOG(3) << "out_vec[" << i << "] : " << out_vec[i];
}
uint32_t expected_size = 4;
EXPECT_EQ((uint32_t)out_vec.size(), expected_size);
// {3, 4, 5, 6}
std::vector<T> expected_out_vec;
for (int64_t i = 3; i < 7; ++i) {
expected_out_vec.push_back(static_cast<T>(i));
}
for (uint32_t i = 0; i < out_vec.size(); i++) {
EXPECT_EQ(out_vec[i], expected_out_vec[i]);
}
}
template <typename T>
void CompareGrad(f::Scope* scope, const p::DeviceContext& ctx,
std::string op_type) {
// init
auto index = scope->Var("Index");
auto tensor_index = index->GetMutable<f::LoDTensor>();
auto x = scope->Var("X");
auto tensor_x = x->GetMutable<f::LoDTensor>();
auto dout = scope->Var("DOut");
auto tensor_dout = dout->GetMutable<f::LoDTensor>();
std::vector<int> init_index = {0, 1};
paddle::framework::TensorFromVector<int>(init_index, ctx, tensor_index);
tensor_index->Resize(paddle::framework::make_ddim({2}));
std::vector<T> init_x = {1.0, 1.0, 1.0, 1.0, 1.0, 1.0};
TensorFromVector(init_x, ctx, tensor_x);
tensor_x->Resize(paddle::framework::make_ddim({3, 2}));
std::vector<T> init_dout = {5.0, 10.0, 2.0, 3.0};
TensorFromVector(init_dout, ctx, tensor_dout);
tensor_dout->Resize(paddle::framework::make_ddim({2, 2}));
ctx.Wait();
auto dx = scope->Var("DX");
auto tensor_dx = dx->GetMutable<f::LoDTensor>();
// run
f::AttributeMap attrs;
auto op = f::OpRegistry::CreateOp(
op_type, {{"X", {"X"}}, {"Index", {"Index"}}, {"Out@GRAD", {"DOut"}}},
{{"X@GRAD", {"DX"}}}, attrs);
auto place = ctx.GetPlace();
op->Run(*scope, place);
std::vector<T> dx_vec;
TensorToVector(*tensor_dx, ctx, &dx_vec);
ctx.Wait();
uint32_t expected_size = 3 * 2;
EXPECT_EQ((uint32_t)dx_vec.size(), expected_size);
std::vector<T> expected_dx_vec = {5.0, 10.0, 2.0, 3.0, 0.0, 0.0};
for (uint32_t i = 0; i < dx_vec.size(); i++) {
VLOG(3) << "dx_vec[i]=" << dx_vec[i];
EXPECT_EQ(dx_vec[i], expected_dx_vec[i]);
}
}
TEST(gather, NPU_fp32) {
f::Scope scope;
p::NPUDeviceContext ctx(p::NPUPlace(0));
Compare<float>(&scope, ctx, "gather");
}
TEST(gather, NPU_fp16) {
f::Scope scope;
p::NPUDeviceContext ctx(p::NPUPlace(0));
Compare<p::float16>(&scope, ctx, "gather");
}
TEST(gather_grad, NPU_fp32) {
f::Scope scope;
p::NPUDeviceContext ctx(p::NPUPlace(0));
CompareGrad<float>(&scope, ctx, "gather_grad");
}
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <memory>
#include <string>
#include "paddle/fluid/operators/gelu_op.h"
#include "paddle/fluid/operators/npu_op_runner.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename DeviceContext, typename T>
class GeluNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* out = ctx.Output<Tensor>("Out");
auto place = ctx.GetPlace();
out->mutable_data<T>(place);
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
auto runner = NpuOpRunner("Gelu", {*x}, {*out}, {});
runner.Run(stream);
}
};
template <typename DeviceContext, typename T>
class GeluGradNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* dout = ctx.Input<Tensor>(framework::GradVarName("Out"));
auto* dx = ctx.Output<Tensor>(framework::GradVarName("X"));
auto place = ctx.GetPlace();
dx->mutable_data<T>(place);
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
Tensor out(x->type());
out.mutable_data<T>(x->dims(), place);
auto out_runner = NpuOpRunner("Gelu", {*x}, {out}, {});
out_runner.Run(stream);
auto dx_runner = NpuOpRunner("GeluGrad", {*dout, *x, out}, {*dx}, {});
dx_runner.Run(stream);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
gelu, ops::GeluNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::GeluNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
REGISTER_OP_NPU_KERNEL(
gelu_grad,
ops::GeluGradNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::GeluGradNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifndef _WIN32
#include <unistd.h>
#endif
#include <string>
#include <thread> // NOLINT
#include <vector>
#include "gtest/gtest.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/operators/dropout_op.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/string/printf.h"
namespace f = paddle::framework;
namespace p = paddle::platform;
namespace m = paddle::operators::math;
USE_OP(gelu);
USE_OP_DEVICE_KERNEL(gelu, NPU);
template <typename T>
void Compare(f::Scope* scope, const p::DeviceContext& ctx) {
// init
auto x = scope->Var("X");
auto tensor_x = x->GetMutable<f::LoDTensor>();
std::vector<T> init_x;
for (int64_t i = 0; i < 10 * 10; ++i) {
init_x.push_back(static_cast<T>(1.0));
}
TensorFromVector(init_x, ctx, tensor_x);
tensor_x->Resize({10, 10});
auto out = scope->Var("Out");
auto tensor_out = out->GetMutable<f::LoDTensor>();
f::AttributeMap attrs;
ctx.Wait();
// run
auto place = ctx.GetPlace();
auto op = f::OpRegistry::CreateOp("gelu", {{"X", {"X"}}}, {{"Out", {"Out"}}},
attrs);
op->Run(*scope, place);
ctx.Wait();
// eval time
struct timeval start, end;
gettimeofday(&start, NULL);
for (int i = 0; i < 100; i++) {
op->Run(*scope, place);
}
ctx.Wait();
gettimeofday(&end, NULL);
int micros =
(((end.tv_sec - start.tv_sec) * 1000000) + end.tv_usec) - (start.tv_usec);
printf("used time: %d\n", micros / 100);
// eval value
std::vector<T> out_vec;
TensorToVector(*tensor_out, ctx, &out_vec);
float expected = 0.841192;
for (uint32_t i = 0; i < out_vec.size(); i++) {
EXPECT_FLOAT_EQ(out_vec[i], static_cast<T>(expected));
}
}
template <typename T>
void CompareGrad(f::Scope* scope, const p::DeviceContext& ctx) {
auto dout = scope->Var("DOut");
auto tensor_dout = dout->GetMutable<f::LoDTensor>();
auto x = scope->Var("X");
auto tensor_x = x->GetMutable<f::LoDTensor>();
std::vector<T> init_dout;
for (int64_t i = 0; i < 10 * 10; ++i) {
init_dout.push_back(static_cast<T>(1.0));
}
std::vector<T> init_x;
for (int64_t i = 0; i < 10 * 10; ++i) {
init_x.push_back(static_cast<T>(1.0));
}
TensorFromVector(init_dout, ctx, tensor_dout);
tensor_dout->Resize({10, 10});
TensorFromVector(init_x, ctx, tensor_x);
tensor_x->Resize({10, 10});
auto dx = scope->Var("DX");
auto tensor_dx = dx->GetMutable<f::LoDTensor>();
f::AttributeMap attrs;
ctx.Wait();
// run
auto place = ctx.GetPlace();
auto op = f::OpRegistry::CreateOp("gelu_grad",
{{"Out@GRAD", {"DOut"}}, {"X", {"X"}}},
{{"X@GRAD", {"DX"}}}, attrs);
op->Run(*scope, place);
ctx.Wait();
// eval time
struct timeval start, end;
gettimeofday(&start, NULL);
for (int i = 0; i < 100; i++) {
op->Run(*scope, place);
}
ctx.Wait();
gettimeofday(&end, NULL);
int micros =
(((end.tv_sec - start.tv_sec) * 1000000) + end.tv_usec) - (start.tv_usec);
printf("used time: %d\n", micros / 100);
// eval value
std::vector<T> dx_vec;
TensorToVector(*tensor_dx, ctx, &dx_vec);
float expected = 1.082964;
for (uint32_t i = 0; i < dx_vec.size(); i++) {
EXPECT_FLOAT_EQ(dx_vec[i], static_cast<T>(expected));
}
}
TEST(gelu, NPU_fp32) {
f::Scope scope;
p::NPUDeviceContext ctx(p::NPUPlace(0));
Compare<float>(&scope, ctx);
}
TEST(gelu_grad, NPU) {
f::Scope scope;
p::NPUDeviceContext ctx(p::NPUPlace(0));
CompareGrad<float>(&scope, ctx);
}
// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/operators/increment_op.h"
#include "paddle/fluid/operators/npu_op_runner.h"
#include "paddle/fluid/platform/float16.h"
namespace paddle {
namespace framework {
class OpDesc;
class Variable;
} // namespace framework
namespace imperative {
class OpBase;
} // namespace imperative
} // namespace paddle
namespace paddle {
namespace operators {
template <typename DeviceContext, typename T>
class IncrementalNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& context) const override {
auto* x_tensor = context.Input<framework::Tensor>("X");
auto* out_tensor = context.Output<framework::Tensor>("Out");
float step = context.Attr<float>("step");
out_tensor->mutable_data<T>(context.GetPlace());
Tensor step_tensor(x_tensor->type());
std::vector<T> step_vec;
step_vec.push_back(static_cast<T>(step));
framework::TensorFromVector(step_vec, context.device_context(),
&step_tensor);
auto runner =
NpuOpRunner("Add", {*x_tensor, step_tensor}, {*out_tensor}, {});
auto stream =
context.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
runner.Run(stream);
}
};
} // namespace operators
} // namespace paddle
namespace plat = paddle::platform;
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
increment,
ops::IncrementalNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::IncrementalNPUKernel<paddle::platform::NPUDeviceContext, double>,
ops::IncrementalNPUKernel<paddle::platform::NPUDeviceContext, int>,
ops::IncrementalNPUKernel<paddle::platform::NPUDeviceContext, int64_t>,
ops::IncrementalNPUKernel<paddle::platform::NPUDeviceContext,
plat::float16>)
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifndef _WIN32
#include <unistd.h>
#endif
#include <string>
#include <thread> // NOLINT
#include <vector>
#include "gtest/gtest.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/operators/dropout_op.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/string/printf.h"
namespace f = paddle::framework;
namespace p = paddle::platform;
namespace m = paddle::operators::math;
USE_OP(increment);
USE_OP_DEVICE_KERNEL(increment, NPU);
template <typename T>
void Compare(f::Scope* scope, const p::DeviceContext& ctx,
std::string op_type) {
// init
auto x = scope->Var("X");
auto tensor_x = x->GetMutable<f::LoDTensor>();
std::vector<T> init;
init.push_back(static_cast<T>(1.0));
TensorFromVector(init, ctx, tensor_x);
tensor_x->Resize({1});
ctx.Wait();
auto place = ctx.GetPlace();
auto out = scope->Var("Out");
auto tensor_out = out->GetMutable<f::LoDTensor>();
f::AttributeMap attr_input = {{"step", static_cast<float>(2.0)}};
auto op = f::OpRegistry::CreateOp("increment", {{"X", {"X"}}},
{{"Out", {"Out"}}}, attr_input);
op->Run(*scope, place);
std::vector<T> out_vec;
TensorToVector(*tensor_out, ctx, &out_vec);
ctx.Wait();
EXPECT_EQ((uint32_t)out_vec.size(), (uint32_t)1);
EXPECT_EQ(out_vec[0], static_cast<T>(3.0));
}
TEST(increment, NPU_fp32) {
f::Scope scope;
p::NPUDeviceContext ctx(p::NPUPlace(0));
Compare<float>(&scope, ctx, "increment");
}
TEST(increment, NPU_fp64) {
  f::Scope scope;
  p::NPUDeviceContext ctx(p::NPUPlace(0));
  Compare<double>(&scope, ctx, "increment");
}
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/operators/layer_norm_op.h"
#include "paddle/fluid/operators/npu_op_runner.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
using DDim = framework::DDim;
using DataLayout = framework::DataLayout;
template <typename T>
class NormDataType;
template <>
class NormDataType<platform::float16> {
public:
// The scaling param type is float for HALF and FLOAT tensors
using ScalingParamType = const float;
using BatchNormParamType = float;
};
template <>
class NormDataType<float> {
public:
using ScalingParamType = const float;
using BatchNormParamType = float;
};
template <typename T>
using NormDataType = NormDataType<T>;
template <typename T>
using LayerNormParamType = typename NormDataType<T>::BatchNormParamType;
template <typename T>
class LayerNormNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
using U = LayerNormParamType<T>;
const auto begin_norm_axis = ctx.Attr<int>("begin_norm_axis");
const auto epsilon = ctx.Attr<float>("epsilon");
const auto* x = ctx.Input<Tensor>("X");
const auto* scale = ctx.Input<Tensor>("Scale");
const auto* bias = ctx.Input<Tensor>("Bias");
auto* y = ctx.Output<Tensor>("Y");
auto* mean = ctx.Output<Tensor>("Mean");
auto* variance = ctx.Output<Tensor>("Variance");
const auto& x_dims = x->dims();
std::vector<int> axes;
auto matrix_dim = framework::flatten_to_2d(x_dims, begin_norm_axis);
int right = static_cast<int>(matrix_dim[1]);
    // The shape of scale and bias should be equal to x.shape[begin_norm_axis:],
    // as required by Ascend.
for (auto i = begin_norm_axis; i < x_dims.size(); ++i) {
axes.push_back(x_dims[i]);
}
auto place = ctx.GetPlace();
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
Tensor default_scale(x->type());
if (!scale) {
default_scale.mutable_data<T>(framework::make_ddim(axes), place);
Tensor value(x->type());
value.mutable_data<T>({1}, place);
TensorFromVector(std::vector<T>{static_cast<T>(1.0)},
ctx.device_context(), &value);
auto runner =
NpuOpRunner("FillD", {value}, {default_scale}, {{"dims", axes}});
runner.Run(stream);
scale = &default_scale;
} else {
const_cast<Tensor*>(scale)->Resize(framework::make_ddim(axes));
}
Tensor default_bias(x->type());
if (!bias) {
default_bias.mutable_data<T>(framework::make_ddim(axes), place);
Tensor value(x->type());
value.mutable_data<T>({1}, place);
TensorFromVector(std::vector<T>{static_cast<T>(0)}, ctx.device_context(),
&value);
auto runner =
NpuOpRunner("FillD", {value}, {default_bias}, {{"dims", axes}});
runner.Run(stream);
bias = &default_bias;
} else {
const_cast<Tensor*>(bias)->Resize(framework::make_ddim(axes));
}
// cast scale from LayerNormParamType to T if needed
Tensor cast_scale(x->type());
if (x->type() == framework::proto::VarType::FP16 &&
scale->type() == framework::proto::VarType::FP32) {
cast_scale.Resize(scale->dims());
cast_scale.mutable_data<T>(ctx.GetPlace());
auto dst_dtype = ConvertToNpuDtype(x->type());
auto runner_cast_scale =
NpuOpRunner("Cast", {*scale}, {cast_scale},
{{"dst_type", static_cast<int>(dst_dtype)}});
runner_cast_scale.Run(stream);
} else {
cast_scale.ShareDataWith(*scale);
}
// cast bias from LayerNormParamType to T if needed
Tensor cast_bias(x->type());
if (x->type() == framework::proto::VarType::FP16 &&
bias->type() == framework::proto::VarType::FP32) {
cast_bias.Resize(bias->dims());
cast_bias.mutable_data<T>(ctx.GetPlace());
auto dst_dtype = ConvertToNpuDtype(x->type());
auto runner_cast_bias =
NpuOpRunner("Cast", {*bias}, {cast_bias},
{{"dst_type", static_cast<int>(dst_dtype)}});
runner_cast_bias.Run(stream);
} else {
cast_bias.ShareDataWith(*bias);
}
y->mutable_data<T>(ctx.GetPlace());
// mean should be of U type
Tensor* tmp_mean = mean;
Tensor cast_mean(x->type());
if (x->type() == framework::proto::VarType::FP16 &&
(scale->type() == framework::proto::VarType::FP32 ||
bias->type() == framework::proto::VarType::FP32)) {
cast_mean.Resize(mean->dims());
cast_mean.mutable_data<T>(ctx.GetPlace());
tmp_mean = &cast_mean;
mean->mutable_data<U>(ctx.GetPlace());
} else {
mean->mutable_data<T>(ctx.GetPlace());
}
// same for variance
Tensor* tmp_variance = variance;
Tensor cast_variance(x->type());
if (x->type() == framework::proto::VarType::FP16 &&
(scale->type() == framework::proto::VarType::FP32 ||
bias->type() == framework::proto::VarType::FP32)) {
cast_variance.Resize(variance->dims());
cast_variance.mutable_data<T>(ctx.GetPlace());
tmp_variance = &cast_variance;
variance->mutable_data<U>(ctx.GetPlace());
} else {
variance->mutable_data<T>(ctx.GetPlace());
}
auto runner = NpuOpRunner("LayerNorm", {*x, cast_scale, cast_bias},
{*y, *tmp_mean, *tmp_variance},
{{"begin_norm_axis", begin_norm_axis},
{"begin_params_axis", begin_norm_axis},
{"epsilon", epsilon}});
runner.Run(stream);
// cast back from FP16 to FP32
if (x->type() == framework::proto::VarType::FP16 &&
mean->type() == framework::proto::VarType::FP32) {
auto dst_dtype = ConvertToNpuDtype(mean->type());
auto runner_cast_mean =
NpuOpRunner("Cast", {*tmp_mean}, {*mean},
{{"dst_type", static_cast<int>(dst_dtype)}});
runner_cast_mean.Run(stream);
}
// same for variance
if (x->type() == framework::proto::VarType::FP16 &&
variance->type() == framework::proto::VarType::FP32) {
auto dst_dtype = ConvertToNpuDtype(variance->type());
auto runner_cast_variance =
NpuOpRunner("Cast", {*tmp_variance}, {*variance},
{{"dst_type", static_cast<int>(dst_dtype)}});
runner_cast_variance.Run(stream);
}
// revert shape of scale and bias
// TODO(zhiqiu): better implementation, use a tmp tensor to avoid writing to
// the input tensors.
const_cast<Tensor*>(scale)->Resize(framework::make_ddim({right}));
const_cast<Tensor*>(bias)->Resize(framework::make_ddim({right}));
}
};
template <typename T>
class LayerNormGradNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
using U = LayerNormParamType<T>;
const auto begin_norm_axis = ctx.Attr<int>("begin_norm_axis");
const auto* x = ctx.Input<Tensor>("X");
const auto& x_dims = x->dims();
const auto* mean = ctx.Input<Tensor>("Mean");
const auto* variance = ctx.Input<Tensor>("Variance");
const auto* scale = ctx.Input<Tensor>("Scale");
const auto* dy = ctx.Input<Tensor>(framework::GradVarName("Y"));
auto* dx = ctx.Output<Tensor>(framework::GradVarName("X"));
auto* dscale = ctx.Output<Tensor>(framework::GradVarName("Scale"));
auto* dbias = ctx.Output<Tensor>(framework::GradVarName("Bias"));
auto matrix_dim = framework::flatten_to_2d(x_dims, begin_norm_axis);
int right = static_cast<int>(matrix_dim[1]);
std::vector<int> axes;
for (auto i = begin_norm_axis; i < x_dims.size(); ++i) {
axes.push_back(x_dims[i]);
}
auto place = ctx.GetPlace();
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
// No gradient to compute, just return.
if (!dx && !dscale && !dbias) {
return;
}
// Ascend requires mean/variance to have the same rank as x.
std::vector<int> new_shape;
for (auto i = 0; i < begin_norm_axis; ++i) {
new_shape.push_back(x_dims[i]);
}
for (auto i = begin_norm_axis; i < x_dims.size(); ++i) {
new_shape.push_back(1);
}
auto mean_dims = mean->dims();
const_cast<Tensor*>(mean)->Resize(framework::make_ddim(new_shape));
const_cast<Tensor*>(variance)->Resize(framework::make_ddim(new_shape));
Tensor default_scale(x->type());
if (!scale) {
default_scale.mutable_data<T>(framework::make_ddim(axes), place);
Tensor value(x->type());
value.mutable_data<T>({1}, place);
TensorFromVector(std::vector<T>{static_cast<T>(1.0)},
ctx.device_context(), &value);
auto runner =
NpuOpRunner("FillD", {value}, {default_scale}, {{"dims", axes}});
runner.Run(stream);
scale = &default_scale;
} else {
const_cast<Tensor*>(scale)->Resize(framework::make_ddim(axes));
}
// cast scale from LayerNormParamType to T if needed
Tensor cast_scale(x->type());
if (x->type() == framework::proto::VarType::FP16 &&
scale->type() == framework::proto::VarType::FP32) {
cast_scale.Resize(scale->dims());
cast_scale.mutable_data<T>(ctx.GetPlace());
auto dst_dtype = ConvertToNpuDtype(x->type());
auto runner_cast_scale =
NpuOpRunner("Cast", {*scale}, {cast_scale},
{{"dst_type", static_cast<int>(dst_dtype)}});
runner_cast_scale.Run(stream);
} else {
cast_scale.ShareDataWith(*scale);
}
// cast mean from LayerNormParamType to T if needed
Tensor cast_mean(x->type());
if (x->type() == framework::proto::VarType::FP16 &&
mean->type() == framework::proto::VarType::FP32) {
cast_mean.Resize(mean->dims());
cast_mean.mutable_data<T>(ctx.GetPlace());
auto dst_dtype = ConvertToNpuDtype(x->type());
auto runner_cast_mean =
NpuOpRunner("Cast", {*mean}, {cast_mean},
{{"dst_type", static_cast<int>(dst_dtype)}});
runner_cast_mean.Run(stream);
} else {
cast_mean.ShareDataWith(*mean);
}
// cast variance from LayerNormParamType to T if needed
Tensor cast_variance(x->type());
if (x->type() == framework::proto::VarType::FP16 &&
variance->type() == framework::proto::VarType::FP32) {
cast_variance.Resize(variance->dims());
cast_variance.mutable_data<T>(ctx.GetPlace());
auto dst_dtype = ConvertToNpuDtype(x->type());
auto runner_cast_variance =
NpuOpRunner("Cast", {*variance}, {cast_variance},
{{"dst_type", static_cast<int>(dst_dtype)}});
runner_cast_variance.Run(stream);
} else {
cast_variance.ShareDataWith(*variance);
}
Tensor dx_(dy->type()), dscale_(dy->type()), dbias_(dy->type());
dx = (dx == nullptr) ? &dx_ : dx;
dscale = (dscale == nullptr) ? &dscale_ : dscale;
dbias = (dbias == nullptr) ? &dbias_ : dbias;
dx->Resize(x->dims());
dx->mutable_data<T>(ctx.GetPlace());
dscale->Resize(framework::make_ddim(axes));
dbias->Resize(framework::make_ddim(axes));
// dscale should be of U type
Tensor* tmp_dscale = dscale;
Tensor cast_dscale(x->type());
if (x->type() == framework::proto::VarType::FP16 &&
(mean->type() == framework::proto::VarType::FP32 ||
variance->type() == framework::proto::VarType::FP32)) {
cast_dscale.Resize(dscale->dims());
cast_dscale.mutable_data<T>(ctx.GetPlace());
tmp_dscale = &cast_dscale;
dscale->mutable_data<U>(ctx.GetPlace());
} else {
dscale->mutable_data<T>(ctx.GetPlace());
}
// same for dbias
Tensor* tmp_dbias = dbias;
Tensor cast_dbias(x->type());
if (x->type() == framework::proto::VarType::FP16 &&
(mean->type() == framework::proto::VarType::FP32 ||
variance->type() == framework::proto::VarType::FP32)) {
cast_dbias.Resize(dbias->dims());
cast_dbias.mutable_data<T>(ctx.GetPlace());
tmp_dbias = &cast_dbias;
dbias->mutable_data<U>(ctx.GetPlace());
} else {
dbias->mutable_data<T>(ctx.GetPlace());
}
auto runner = NpuOpRunner("LayerNormGrad",
{*dy, *x, cast_variance, cast_mean, cast_scale},
{*dx, *tmp_dscale, *tmp_dbias}, {});
runner.Run(stream);
// cast back from FP16 to FP32
if (x->type() == framework::proto::VarType::FP16 &&
dscale->type() == framework::proto::VarType::FP32) {
auto dst_dtype = ConvertToNpuDtype(dscale->type());
auto runner_cast_dscale =
NpuOpRunner("Cast", {*tmp_dscale}, {*dscale},
{{"dst_type", static_cast<int>(dst_dtype)}});
runner_cast_dscale.Run(stream);
}
// same for dbias
if (x->type() == framework::proto::VarType::FP16 &&
dbias->type() == framework::proto::VarType::FP32) {
auto dst_dtype = ConvertToNpuDtype(dbias->type());
auto runner_cast_dbias =
NpuOpRunner("Cast", {*tmp_dbias}, {*dbias},
{{"dst_type", static_cast<int>(dst_dtype)}});
runner_cast_dbias.Run(stream);
}
const_cast<Tensor*>(mean)->Resize(mean_dims);
const_cast<Tensor*>(variance)->Resize(mean_dims);
const_cast<Tensor*>(scale)->Resize(framework::make_ddim({right}));
dscale->Resize(framework::make_ddim({right}));
dbias->Resize(framework::make_ddim({right}));
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
namespace plat = paddle::platform;
REGISTER_OP_NPU_KERNEL(layer_norm, ops::LayerNormNPUKernel<float>,
ops::LayerNormNPUKernel<plat::float16>);
REGISTER_OP_NPU_KERNEL(layer_norm_grad, ops::LayerNormGradNPUKernel<float>,
ops::LayerNormGradNPUKernel<plat::float16>);
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <iostream>
#include <memory>
#include <string>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/tensor_util.h"
#include "paddle/fluid/operators/npu_op_runner.h"
namespace paddle {
namespace operators {
template <typename DeviceContext, typename T>
class LookupTableV2NPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext &ctx) const override {
auto *ids_t = ctx.Input<framework::LoDTensor>("Ids"); // int tensor
auto *output_t = ctx.Output<framework::LoDTensor>("Out"); // float tensor
auto *table_t = ctx.Input<framework::LoDTensor>("W");
auto *table_var = ctx.InputVar("W");
PADDLE_ENFORCE_EQ(
table_var->IsType<framework::LoDTensor>(), true,
platform::errors::InvalidArgument("npu only accept LoDTensor"));
output_t->mutable_data<T>(ctx.GetPlace());
framework::NPUAttributeMap attr_input = {{"validate_indices", false}};
auto runner =
NpuOpRunner("Gather", {*table_t, *ids_t}, {*output_t}, attr_input);
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
runner.Run(stream);
}
};
template <typename T>
class LookupTableV2GradNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext &ctx) const override {
auto *ids_t = ctx.Input<framework::LoDTensor>("Ids");
auto *output_grad_t =
ctx.Input<framework::LoDTensor>(framework::GradVarName("Out"));
auto *table_grad_t =
ctx.Output<framework::LoDTensor>(framework::GradVarName("W"));
table_grad_t->mutable_data<T>(ctx.GetPlace());
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
// Zero-initialize the weight gradient on device before scatter-add.
Tensor zeroslike_w(table_grad_t->type());
zeroslike_w.Resize(table_grad_t->dims());
auto p = zeroslike_w.mutable_data<T>(ctx.GetPlace());
platform::NPUMemsetAsync(static_cast<void *>(p), 0,
zeroslike_w.numel() * sizeof(T), stream);
auto runner_scatter =
NpuOpRunner("ScatterAdd", {zeroslike_w, *ids_t, *output_grad_t},
{*table_grad_t}, {});
runner_scatter.Run(stream);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
lookup_table_v2,
ops::LookupTableV2NPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::LookupTableV2NPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
REGISTER_OP_NPU_KERNEL(
lookup_table_v2_grad, ops::LookupTableV2GradNPUKernel<float>,
ops::LookupTableV2GradNPUKernel<paddle::platform::float16>);
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifndef _WIN32
#include <unistd.h>
#endif
#include <cmath>
#include <iostream>
#include <numeric>
#include <string>
#include <thread> // NOLINT
#include <vector>
#include "gtest/gtest.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/operators/dropout_op.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/string/printf.h"
namespace f = paddle::framework;
namespace p = paddle::platform;
namespace m = paddle::operators::math;
USE_OP(lookup_table_v2);
USE_OP_DEVICE_KERNEL(lookup_table_v2, NPU);
template <typename T>
void Compare(f::Scope* scope, const p::DeviceContext& ctx) {
// init
auto ids = scope->Var("Ids");
auto out = scope->Var("Out");
auto w = scope->Var("W");
auto ids_t = ids->GetMutable<f::LoDTensor>();
auto out_t = out->GetMutable<f::LoDTensor>();
auto w_t = w->GetMutable<f::LoDTensor>();
int bsz = 10;
int dim = 32;
int seqlen = 8;
int vocab_size = 100;
TensorFromVector(std::vector<int64_t>(bsz * seqlen, 3), ctx, ids_t);
std::vector<T> val(vocab_size * dim, 10.);
TensorFromVector(val, ctx, w_t);
ids_t->Resize({bsz, seqlen});
w_t->Resize({vocab_size, dim});
out_t->Resize({bsz, seqlen, dim});
ctx.Wait();
auto place = ctx.GetPlace();
out_t->mutable_data<T>(place);
f::AttributeMap attrs = {{}};
auto op = f::OpRegistry::CreateOp("lookup_table_v2",
{{"W", {"W"}}, {"Ids", {"Ids"}}},
{{"Out", {"Out"}}}, attrs);
op->Run(*scope, place);
std::vector<T> out_v;
TensorToVector(*out_t, ctx, &out_v);
ctx.Wait();
EXPECT_EQ(out_t->numel(), bsz * seqlen * dim);
T res = std::accumulate(out_v.begin(), out_v.end(), 0.);
float eps = 1.e-6;
EXPECT_LT(fabs(res - bsz * seqlen * dim * 10.), eps);
}
template <typename T>
void CompareGrad(f::Scope* scope, const p::DeviceContext& ctx) {
// init
auto w = scope->Var("W");
auto ids = scope->Var("Ids");
auto out = scope->Var("DOut");
auto dw = scope->Var("DW");
auto w_t = w->GetMutable<f::LoDTensor>();
auto ids_t = ids->GetMutable<f::LoDTensor>();
auto out_t = out->GetMutable<f::LoDTensor>();
auto dw_t = dw->GetMutable<f::LoDTensor>();
int bsz = 2;
int dim = 2;
int seqlen = 2;
int vocab_size = 4;
std::vector<int64_t> val_int(bsz * seqlen, 3);
std::vector<T> val(vocab_size * dim, 0.);
std::vector<T> val_out(bsz * seqlen * dim, 1.);
TensorFromVector(val_int, ctx, ids_t);
TensorFromVector(val, ctx, w_t);
TensorFromVector(val, ctx, dw_t);
TensorFromVector(val_out, ctx, out_t);
w_t->Resize({vocab_size, dim});
ids_t->Resize({bsz, seqlen});
out_t->Resize({bsz, seqlen, dim});
dw_t->Resize({vocab_size, dim});
ctx.Wait();
auto place = ctx.GetPlace();
out_t->mutable_data<T>(place);
w_t->mutable_data<T>(place);
dw_t->mutable_data<T>(place);
f::AttributeMap attrs = {{}};
auto op = f::OpRegistry::CreateOp(
"lookup_table_v2_grad",
{{"Ids", {"Ids"}}, {"W", {"W"}}, {"Out@GRAD", {"DOut"}}},
{{"W@GRAD", {"DW"}}}, attrs);
op->Run(*scope, place);
ctx.Wait();
std::vector<T> w_v;
TensorToVector(*dw_t, ctx, &w_v);
ctx.Wait();
EXPECT_EQ(dw_t->numel(), vocab_size * dim);
T res = std::accumulate(w_v.begin(), w_v.end(), 0.);
float eps = 1.e-6;
EXPECT_LT(fabs(res - bsz * seqlen * dim), eps);
}
TEST(lookup_table_v2, NPU_fp32) {
f::Scope scope;
p::NPUDeviceContext ctx(p::NPUPlace(0));
Compare<float>(&scope, ctx);
}
TEST(lookup_table_v2_grad, NPU_fp32) {
f::Scope scope;
p::NPUDeviceContext ctx(p::NPUPlace(0));
CompareGrad<float>(&scope, ctx);
}
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <memory>
#include <string>
#include "paddle/fluid/operators/matmul_v2_op.h"
#include "paddle/fluid/operators/npu_op_runner.h"
namespace paddle {
namespace operators {
template <typename DeviceContext, typename T>
class MatMulV2NPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<framework::Tensor>("X");
auto* y = ctx.Input<framework::Tensor>("Y");
auto* out = ctx.Output<framework::Tensor>("Out");
bool transpose_x = ctx.Attr<bool>("trans_x");
bool transpose_y = ctx.Attr<bool>("trans_y");
if (x->dims().size() == 2) {
out->mutable_data<T>(ctx.GetPlace());
auto runner = NpuOpRunner(
"MatMul", {*x, *y}, {*out},
{{"transpose_x1", transpose_x}, {"transpose_x2", transpose_y}});
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
runner.Run(stream);
} else if (x->dims().size() > 2) {
out->mutable_data<T>(ctx.GetPlace());
auto runner =
NpuOpRunner("BatchMatMul", {*x, *y}, {*out},
{{"adj_x1", transpose_x}, {"adj_x2", transpose_y}});
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
runner.Run(stream);
}
}
};
template <typename DeviceContext, typename T>
class MatMulV2GradNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<framework::Tensor>("X");
auto* y = ctx.Input<framework::Tensor>("Y");
auto* dout = ctx.Input<framework::Tensor>(framework::GradVarName("Out"));
auto* dx = ctx.Output<framework::Tensor>(framework::GradVarName("X"));
auto* dy = ctx.Output<framework::Tensor>(framework::GradVarName("Y"));
bool transpose_y = ctx.Attr<bool>("trans_y");
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
if (x->dims().size() == 2) {
if (transpose_y) {
if (dx) {
dx->mutable_data<T>(ctx.GetPlace());
auto runner_dx =
NpuOpRunner("MatMul", {*dout, *y}, {*dx},
{{"transpose_x1", false}, {"transpose_x2", false}});
runner_dx.Run(stream);
}
if (dy) {
dy->mutable_data<T>(ctx.GetPlace());
auto runner_dy =
NpuOpRunner("MatMul", {*dout, *x}, {*dy},
{{"transpose_x1", true}, {"transpose_x2", false}});
runner_dy.Run(stream);
}
} else {
if (dx) {
dx->mutable_data<T>(ctx.GetPlace());
auto runner_dx =
NpuOpRunner("MatMul", {*dout, *y}, {*dx},
{{"transpose_x1", false}, {"transpose_x2", true}});
runner_dx.Run(stream);
}
if (dy) {
dy->mutable_data<T>(ctx.GetPlace());
auto runner_dy =
NpuOpRunner("MatMul", {*x, *dout}, {*dy},
{{"transpose_x1", true}, {"transpose_x2", false}});
runner_dy.Run(stream);
}
}
} else if (x->dims().size() > 2) {
if (transpose_y) {
if (dx) {
dx->mutable_data<T>(ctx.GetPlace());
auto runner_dx = NpuOpRunner("BatchMatMul", {*dout, *y}, {*dx},
{{"adj_x1", false}, {"adj_x2", false}});
runner_dx.Run(stream);
}
if (dy) {
dy->mutable_data<T>(ctx.GetPlace());
auto runner_dy = NpuOpRunner("BatchMatMul", {*dout, *x}, {*dy},
{{"adj_x1", true}, {"adj_x2", false}});
runner_dy.Run(stream);
}
} else {
if (dx) {
dx->mutable_data<T>(ctx.GetPlace());
auto runner_dx = NpuOpRunner("BatchMatMul", {*dout, *y}, {*dx},
{{"adj_x1", false}, {"adj_x2", true}});
runner_dx.Run(stream);
}
if (dy) {
dy->mutable_data<T>(ctx.GetPlace());
auto runner_dy = NpuOpRunner("BatchMatMul", {*x, *dout}, {*dy},
{{"adj_x1", true}, {"adj_x2", false}});
runner_dy.Run(stream);
}
}
}
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
matmul_v2,
ops::MatMulV2NPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::MatMulV2NPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
REGISTER_OP_NPU_KERNEL(
matmul_v2_grad,
ops::MatMulV2GradNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::MatMulV2GradNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/operators/mean_op.h"
#include "paddle/fluid/operators/npu_op_runner.h"
#include "paddle/fluid/platform/float16.h"
namespace paddle {
namespace operators {
template <typename DeviceContext, typename T>
class MeanNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<framework::LoDTensor>("X");
auto* out = ctx.Output<framework::LoDTensor>("Out");
std::vector<int> axes;
framework::NPUAttributeMap attr_input = {{"keep_dims", false},
{"axes", axes}};
out->mutable_data<T>(ctx.GetPlace());
auto runner = NpuOpRunner("ReduceMeanD", {*x}, {*out}, attr_input);
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
runner.Run(stream);
}
};
template <typename DeviceContext, typename T>
class MeanGradNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& context) const override {
auto stream =
context.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
auto grad = context.Input<Tensor>(framework::GradVarName("Out"));
PADDLE_ENFORCE_EQ(grad->numel(), 1,
platform::errors::InvalidArgument(
"Mean Gradient Input Tensor len should be 1. But "
"received Out@Grad's elements num is %d.",
grad->numel()));
auto IG = context.Output<Tensor>(framework::GradVarName("X"));
IG->mutable_data<T>(context.GetPlace());
// ones
Tensor ones(grad->type());
ones.mutable_data<T>(IG->dims(), context.GetPlace());
auto runner_ones = NpuOpRunner("OnesLike", {*IG}, {ones}, {});
runner_ones.Run(stream);
// means
Tensor mean_tensor(grad->type());
mean_tensor.Resize({1});
mean_tensor.mutable_data<T>(context.GetPlace());
std::vector<float> mean_vec;
mean_vec.push_back(1.0 / static_cast<float>(IG->numel()));
framework::TensorFromVector(mean_vec, context.device_context(),
&mean_tensor);
// means mul ones
Tensor mean_ma(grad->type());
mean_ma.Resize(IG->dims());
mean_ma.mutable_data<T>(context.GetPlace());
auto runner_mul_1 = NpuOpRunner("Mul", {mean_tensor, ones}, {mean_ma}, {});
runner_mul_1.Run(stream);
// and mul grad
auto runner_mul_2 = NpuOpRunner("Mul", {mean_ma, *grad}, {*IG}, {});
runner_mul_2.Run(stream);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
namespace plat = paddle::platform;
REGISTER_OP_NPU_KERNEL(
mean, ops::MeanNPUKernel<paddle::platform::NPUDeviceContext, int>,
ops::MeanNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::MeanNPUKernel<paddle::platform::NPUDeviceContext, double>,
ops::MeanNPUKernel<paddle::platform::NPUDeviceContext, plat::float16>);
REGISTER_OP_NPU_KERNEL(
mean_grad, ops::MeanGradNPUKernel<paddle::platform::NPUDeviceContext, int>,
ops::MeanGradNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::MeanGradNPUKernel<paddle::platform::NPUDeviceContext, double>,
ops::MeanGradNPUKernel<paddle::platform::NPUDeviceContext, plat::float16>);
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <memory>
#include <string>
#include "paddle/fluid/operators/controlflow/compare_op.h"
#include "paddle/fluid/operators/metrics/accuracy_op.h"
#include "paddle/fluid/operators/npu_op_runner.h"
namespace paddle {
namespace operators {
template <typename DeviceContext, typename T>
class AccuracyNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* pred = ctx.Input<Tensor>("Out");
auto* label = ctx.Input<Tensor>("Label");
// auto* logits = ctx.Input<Tensor>("Indices");
auto* acc = ctx.Output<Tensor>("Accuracy");
auto* correct = ctx.Output<Tensor>("Correct");
auto* total = ctx.Output<Tensor>("Total");
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
// cast pred
Tensor tmp_pred(pred->type());
tmp_pred.Resize(pred->dims());
tmp_pred.mutable_data<int>(ctx.GetPlace());
auto runner_cast_pred =
NpuOpRunner("Cast", {*pred}, {tmp_pred},
{{"dst_type", static_cast<int>(ACL_INT32)}});
runner_cast_pred.Run(stream);
// cast label
Tensor tmp_label(label->type());
tmp_label.Resize(label->dims());
tmp_label.mutable_data<int>(ctx.GetPlace());
auto runner_cast_label =
NpuOpRunner("Cast", {*label}, {tmp_label},
{{"dst_type", static_cast<int>(ACL_INT32)}});
runner_cast_label.Run(stream);
// equal
Tensor tmp_equal(label->type());
tmp_equal.Resize(label->dims());
tmp_equal.mutable_data<bool>(ctx.GetPlace());
auto runner_equal =
NpuOpRunner("Equal", {tmp_pred, tmp_label}, {tmp_equal}, {});
runner_equal.Run(stream);
// cast equal
Tensor tmp_equal_cast(label->type());
tmp_equal_cast.Resize(label->dims());
tmp_equal_cast.mutable_data<float>(ctx.GetPlace());
auto runner_cast_equal =
NpuOpRunner("Cast", {tmp_equal}, {tmp_equal_cast},
{{"dst_type", static_cast<int>(ACL_FLOAT)}});
runner_cast_equal.Run(stream);
// acc
acc->mutable_data<float>(ctx.GetPlace());
std::vector<int> axes_vec_1;
auto runner_acc = NpuOpRunner("ReduceMeanD", {tmp_equal_cast}, {*acc},
{{"keep_dims", false}, {"axes", axes_vec_1}});
runner_acc.Run(stream);
// correct
correct->mutable_data<float>(ctx.GetPlace());
std::vector<int> axes_vec_2;
auto runner_correct =
NpuOpRunner("ReduceSumD", {tmp_equal_cast}, {*correct},
{{"keep_dims", false}, {"axes", axes_vec_2}});
runner_correct.Run(stream);
// ones_tensor
Tensor ones_tensor(label->type());
ones_tensor.Resize(label->dims());
ones_tensor.mutable_data<int>(ctx.GetPlace());
auto runner_oneslike =
NpuOpRunner("OnesLike", {tmp_label}, {ones_tensor}, {});
runner_oneslike.Run(stream);
// ones_tensor_cast
Tensor ones_tensor_cast(label->type());
ones_tensor_cast.Resize(label->dims());
ones_tensor_cast.mutable_data<float>(ctx.GetPlace());
auto runner_ones_cast =
NpuOpRunner("Cast", {ones_tensor}, {ones_tensor_cast},
{{"dst_type", static_cast<int>(ACL_FLOAT)}});
runner_ones_cast.Run(stream);
// total
total->mutable_data<float>(ctx.GetPlace());
std::vector<int> axes_vec_3;
auto runner_total =
NpuOpRunner("ReduceSumD", {ones_tensor_cast}, {*total},
{{"keep_dims", false}, {"axes", axes_vec_3}});
runner_total.Run(stream);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
accuracy, ops::AccuracyNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::AccuracyNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>,
ops::AccuracyNPUKernel<paddle::platform::NPUDeviceContext, int>,
ops::AccuracyNPUKernel<paddle::platform::NPUDeviceContext, int64_t>);
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <memory>
#include <string>
#include "paddle/fluid/operators/mul_op.h"
#include "paddle/fluid/operators/npu_op_runner.h"
namespace paddle {
namespace operators {
template <typename DeviceContext, typename T>
class MulNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<framework::Tensor>("X");
auto* y = ctx.Input<framework::Tensor>("Y");
auto* out = ctx.Output<framework::Tensor>("Out");
int x_num_col_dims = ctx.Attr<int>("x_num_col_dims");
int y_num_col_dims = ctx.Attr<int>("y_num_col_dims");
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
if (x_num_col_dims == 1 && y_num_col_dims == 1) {
if (x->dims().size() == 2 && y->dims().size() == 2) {
out->mutable_data<T>(ctx.GetPlace());
auto runner =
NpuOpRunner("MatMul", {*x, *y}, {*out},
{{"transpose_x1", false}, {"transpose_x2", false}});
runner.Run(stream);
} else if (x->dims().size() == 3 && y->dims().size() == 2) {
// reshape
Tensor tmp_x(x->type());
int64_t sec_dim = x->dims()[1] * x->dims()[2];
int64_t first_dim = x->dims()[0];
tmp_x.Resize(framework::make_ddim({first_dim, sec_dim}));
tmp_x.mutable_data<T>(ctx.GetPlace());
framework::TensorCopy(
*x, ctx.GetPlace(),
ctx.template device_context<platform::DeviceContext>(), &tmp_x);
tmp_x.Resize(framework::make_ddim({first_dim, sec_dim}));
out->mutable_data<T>(ctx.GetPlace());
// matmul
auto runner =
NpuOpRunner("MatMul", {tmp_x, *y}, {*out},
{{"transpose_x1", false}, {"transpose_x2", false}});
runner.Run(stream);
} else {
PADDLE_THROW(platform::errors::InvalidArgument(
"NPU kernel of mul op does not support the given input dims"));
// TODO: support other input ranks
} else if (x->dims().size() == 3 && y->dims().size() == 2) {
// for example: x.shape=[2, 3, 4] y.shape=[4, 5], expect [2, 3, 5]
PADDLE_ENFORCE_EQ(x_num_col_dims, 2,
platform::errors::InvalidArgument(
"now only support x_num_col_dims == 2: but got %d",
x_num_col_dims));
// flatten => x.shape=[6, 4]
Tensor tmp_x(x->type());
int64_t first_dim = x->dims()[0] * x->dims()[1];
int64_t sec_dim = x->dims()[2];
tmp_x.Resize(framework::make_ddim({first_dim, sec_dim}));
tmp_x.mutable_data<T>(ctx.GetPlace());
framework::TensorCopy(
*x, ctx.GetPlace(),
ctx.template device_context<platform::DeviceContext>(), &tmp_x);
tmp_x.Resize(framework::make_ddim({first_dim, sec_dim}));
// matmul [6,4] , [4, 5] => [6, 5]
Tensor tmp_matmul(x->type());
tmp_matmul.Resize(framework::make_ddim({first_dim, y->dims()[1]}));
tmp_matmul.mutable_data<T>(ctx.GetPlace());
auto runner_matmul =
NpuOpRunner("MatMul", {tmp_x, *y}, {tmp_matmul},
{{"transpose_x1", false}, {"transpose_x2", false}});
runner_matmul.Run(stream);
// reshape [6, 5] => [2, 3, 5]
(*out).Resize(
framework::make_ddim({x->dims()[0], x->dims()[1], y->dims()[1]}));
out->mutable_data(ctx.GetPlace(), x->type());
framework::TensorCopy(
tmp_matmul, ctx.GetPlace(),
ctx.template device_context<platform::DeviceContext>(), out);
(*out).Resize(
framework::make_ddim({x->dims()[0], x->dims()[1], y->dims()[1]}));
}
}
};
template <typename DeviceContext, typename T>
class MulGradNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<framework::Tensor>("X");
auto* y = ctx.Input<framework::Tensor>("Y");
auto* dout = ctx.Input<framework::Tensor>(framework::GradVarName("Out"));
auto* dx = ctx.Output<framework::Tensor>(framework::GradVarName("X"));
auto* dy = ctx.Output<framework::Tensor>(framework::GradVarName("Y"));
int x_num_col_dims = ctx.Attr<int>("x_num_col_dims");
int y_num_col_dims = ctx.Attr<int>("y_num_col_dims");
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
if (x_num_col_dims == 1 && y_num_col_dims == 1) {
if (x->dims().size() == 2 && y->dims().size() == 2) {
if (dx) {
dx->mutable_data<T>(ctx.GetPlace());
auto runner_dx =
NpuOpRunner("MatMul", {*dout, *y}, {*dx},
{{"transpose_x1", false}, {"transpose_x2", true}});
runner_dx.Run(stream);
}
if (dy) {
dy->mutable_data<T>(ctx.GetPlace());
auto runner_dy =
NpuOpRunner("MatMul", {*x, *dout}, {*dy},
{{"transpose_x1", true}, {"transpose_x2", false}});
runner_dy.Run(stream);
}
} else if (x->dims().size() == 3 && y->dims().size() == 2) {
// x is 3-D here; flatten x to 2-D where needed below
if (dx) {
// matmul [2, 5] * [12, 5] => [2, 12]
dx->mutable_data<T>(ctx.GetPlace());
auto dx_dims = dx->dims();
dx->Resize(framework::make_ddim({dout->dims()[0], y->dims()[0]}));
auto runner_matmul =
NpuOpRunner("MatMul", {*dout, *y}, {*dx},
{{"transpose_x1", false}, {"transpose_x2", true}});
runner_matmul.Run(stream);
// reshape [2, 12] => [2, 3, 4]
dx->Resize(dx_dims);
}
if (dy) {
// flatten
Tensor tmp_x(x->type());
int64_t sec_dim = x->dims()[1] * x->dims()[2];
int64_t first_dim = x->dims()[0];
tmp_x.Resize(framework::make_ddim({first_dim, sec_dim}));
tmp_x.mutable_data<T>(ctx.GetPlace());
framework::TensorCopy(
*x, ctx.GetPlace(),
ctx.template device_context<platform::DeviceContext>(), &tmp_x);
tmp_x.Resize(framework::make_ddim({first_dim, sec_dim}));
dy->mutable_data<T>(ctx.GetPlace());
auto runner_dy =
NpuOpRunner("MatMul", {tmp_x, *dout}, {*dy},
{{"transpose_x1", true}, {"transpose_x2", false}});
runner_dy.Run(stream);
}
}
} else if (x->dims().size() == 3 && y->dims().size() == 2) {
// for example: x.shape=[2, 3, 4] y.shape=[4, 5], expect [2, 3, 5]
PADDLE_ENFORCE_EQ(x_num_col_dims, 2,
platform::errors::InvalidArgument(
"now only support x_num_col_dims == 2: but got %d",
x_num_col_dims));
// tmp_dout both used by dx and dy
Tensor tmp_dout(x->type());
int64_t dout_first_dim = dout->dims()[0] * dout->dims()[1];
int64_t dout_sec_dim = dout->dims()[2];
tmp_dout.Resize(framework::make_ddim({dout_first_dim, dout_sec_dim}));
tmp_dout.mutable_data<T>(ctx.GetPlace());
framework::TensorCopy(
*dout, ctx.GetPlace(),
ctx.template device_context<platform::DeviceContext>(), &tmp_dout);
tmp_dout.Resize(framework::make_ddim({dout_first_dim, dout_sec_dim}));
if (dx) {
// tmp_dout * y [6,5] * [4,5] => [6, 4]
dx->mutable_data<T>(ctx.GetPlace());
auto dx_dims = dx->dims();
dx->Resize(framework::make_ddim({dout_first_dim, y->dims()[0]}));
auto runner_matmul =
NpuOpRunner("MatMul", {tmp_dout, *y}, {*dx},
{{"transpose_x1", false}, {"transpose_x2", true}});
runner_matmul.Run(stream);
// reshape [6, 4] => [2, 3, 4]
dx->Resize(dx_dims);
}
if (dy) {
// flatten x.shape [2,3,4] => [6, 4]
Tensor tmp_x(x->type());
int64_t first_dim = x->dims()[0] * x->dims()[1];
int64_t sec_dim = x->dims()[2];
tmp_x.Resize(framework::make_ddim({first_dim, sec_dim}));
tmp_x.mutable_data<T>(ctx.GetPlace());
framework::TensorCopy(
*x, ctx.GetPlace(),
ctx.template device_context<platform::DeviceContext>(), &tmp_x);
tmp_x.Resize(framework::make_ddim({first_dim, sec_dim}));
// matmul [6, 4]^T * [6, 5] => [4, 5]
dy->mutable_data<T>(ctx.GetPlace());
auto runner_dy =
NpuOpRunner("MatMul", {tmp_x, tmp_dout}, {*dy},
{{"transpose_x1", true}, {"transpose_x2", false}});
runner_dy.Run(stream);
}
}
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
mul, ops::MulNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::MulNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
REGISTER_OP_NPU_KERNEL(
mul_grad, ops::MulGradNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::MulGradNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
...@@ -64,13 +64,21 @@ aclFormat ConvertToNpuFormat(DataLayout layout) {
return iter->second;
}

aclrtStream GetCurrentNPUStream() {
int device_id = platform::GetCurrentNPUDeviceId();
platform::DeviceContextPool &pool = platform::DeviceContextPool::Instance();
auto *dev_ctx = static_cast<platform::NPUDeviceContext *>(
pool.Get(platform::NPUPlace(device_id)));
return dev_ctx->stream();
}

NpuOpRunner::NpuOpRunner(std::string op_type) : op_type_(op_type) {
attr_ = aclopCreateAttr();
}

NpuOpRunner::NpuOpRunner(std::string op_type, const std::vector<Tensor> &inputs,
const std::vector<Tensor> &outputs,
const NPUAttributeMap &attrs)
: op_type_(op_type) {
attr_ = aclopCreateAttr();
AddInputs(inputs);
...@@ -85,7 +93,7 @@ NpuOpRunner::~NpuOpRunner() {
const std::string &NpuOpRunner::Type() { return op_type_; }

NpuOpRunner &NpuOpRunner::AddAttr(const std::string &name,
const NPUAttribute &attr) {
if (attr.type() == typeid(bool)) {
PADDLE_ENFORCE_NPU_SUCCESS(
aclopSetAttrBool(attr_, name.c_str(), BOOST_GET_CONST(bool, attr)));
...@@ -135,6 +143,16 @@ NpuOpRunner &NpuOpRunner::AddAttr(const std::string &name,
}
PADDLE_ENFORCE_NPU_SUCCESS(
aclopSetAttrListString(attr_, name.c_str(), s.size(), s.data()));
} else if (attr.type() == typeid(std::vector<std::vector<int64_t>>)) {
auto a = BOOST_GET_CONST(std::vector<std::vector<int64_t>>, attr);
std::vector<int64_t *> data;
std::vector<int> num;
for (auto &&v : a) {
data.push_back(v.data());
num.push_back(v.size());
}
PADDLE_ENFORCE_NPU_SUCCESS(aclopSetAttrListListInt(
attr_, name.c_str(), data.size(), num.data(), data.data()));
} else {
PADDLE_THROW(platform::errors::Unimplemented(
"Cannot convert attribute '%s' to aclopAttr", name));
...@@ -142,7 +160,7 @@ NpuOpRunner &NpuOpRunner::AddAttr(const std::string &name,
return *this;
}

NpuOpRunner &NpuOpRunner::AddAttrs(const NPUAttributeMap &attrs) {
for (const auto &pair : attrs) {
AddAttr(pair.first, pair.second);
}
...@@ -175,6 +193,21 @@ NpuOpRunner &NpuOpRunner::AddInputs(const std::vector<Tensor> &tensors) {
return *this;
}

// NOTE(zhiqiu): For operators whose input is a list (such as concat, stack),
// the name of each input tensor needs to be set.
NpuOpRunner &NpuOpRunner::AddInputNames(const std::vector<std::string> &names) {
PADDLE_ENFORCE_EQ(names.size(), input_descs_.size(),
platform::errors::InvalidArgument(
"The size of input names should be "
"equal to the size of input descs, but got the size "
"of input names is %d, the size of input descs is %d.",
names.size(), input_descs_.size()));
for (size_t i = 0; i < names.size(); ++i) {
aclSetTensorDescName(input_descs_[i], names[i].c_str());
}
return *this;
}

NpuOpRunner &NpuOpRunner::AddOutputs(const std::vector<Tensor> &tensors) {
for (auto tensor : tensors) {
// create aclTensorDesc
...@@ -224,18 +257,22 @@ aclTensorDesc *NpuOpRunner::CreateTensorDesc(Tensor tensor) {
auto format = ConvertToNpuFormat(tensor.layout());
auto dims = framework::vectorize(tensor.dims());

VLOG(4) << "NPU dtype:" << dtype << " "
<< "rank:" << dims.size() << " dims:" << tensor.dims()
<< " format:" << format;

auto *desc = aclCreateTensorDesc(dtype, dims.size(), dims.data(), format);
PADDLE_ENFORCE_NOT_NULL(
desc, platform::errors::External("Call aclCreateTensorDesc failed."));
PADDLE_ENFORCE_NPU_SUCCESS(aclSetTensorStorageFormat(desc, format));
PADDLE_ENFORCE_NPU_SUCCESS(
aclSetTensorStorageShape(desc, dims.size(), dims.data()));
return desc;
}

aclDataBuffer *NpuOpRunner::CreateDataBuffer(Tensor tensor) {
void *ptr = tensor.data<void>();
VLOG(4) << "NPU ptr: " << ptr << ", size: " << tensor.memory_size();
auto *buffer = aclCreateDataBuffer(ptr, tensor.memory_size());
PADDLE_ENFORCE_NOT_NULL(
buffer, platform::errors::External("Call aclCreateDataBuffer failed."));
...@@ -243,11 +280,17 @@ aclDataBuffer *NpuOpRunner::CreateDataBuffer(Tensor tensor) {
}

void NpuOpRunner::Run(aclrtStream stream) {
if (!stream) {
VLOG(4) << "Run with default current npu stream: " << stream;
stream = GetCurrentNPUStream();
}

VLOG(4) << "op_type: " << op_type_;
VLOG(4) << "input_desc.size: " << input_descs_.size();
VLOG(4) << "output_desc.size: " << output_descs_.size();
VLOG(4) << "attr: " << attr_;
VLOG(4) << "stream: " << stream;

aclError ret = aclopCompileAndExecute(
op_type_.c_str(), input_descs_.size(), input_descs_.data(),
input_buffers_.data(), output_descs_.size(), output_descs_.data(),
...
...@@ -12,8 +12,10 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#ifdef PADDLE_WITH_ASCEND_CL
#pragma once
#include <paddle/fluid/framework/operator.h>
#include <paddle/fluid/framework/type_defs.h>
#include <string>
#include <vector>
...@@ -26,8 +28,8 @@ namespace operators {
using Tensor = framework::Tensor;
using DataLayout = framework::DataLayout;
using NPUAttribute = framework::NPUAttribute;
using NPUAttributeMap = framework::NPUAttributeMap;

class NpuOpRunner {
 public:
...@@ -35,15 +37,15 @@ class NpuOpRunner {
explicit NpuOpRunner(std::string op_type,
const std::vector<Tensor> &inputs = {},
const std::vector<Tensor> &outputs = {},
const NPUAttributeMap &attrs = {});

~NpuOpRunner();

const std::string &Type();

NpuOpRunner &AddAttr(const std::string &name, const NPUAttribute &attr);

NpuOpRunner &AddAttrs(const NPUAttributeMap &attrs);

NpuOpRunner &AddInput(const Tensor &tensor);
...@@ -51,6 +53,8 @@ class NpuOpRunner {

NpuOpRunner &AddInputs(const std::vector<Tensor> &tensors);

NpuOpRunner &AddInputNames(const std::vector<std::string> &names);

NpuOpRunner &AddOutputs(const std::vector<Tensor> &tensors);

aclTensorDesc *GetInputDesc(size_t index);
...@@ -65,7 +69,7 @@ class NpuOpRunner {

std::vector<aclDataBuffer *> &GetOutputBuffers();

void Run(aclrtStream stream = nullptr);

 private:
aclTensorDesc *CreateTensorDesc(Tensor tensor);
...@@ -80,5 +84,8 @@ class NpuOpRunner {

aclopAttr *attr_{nullptr};
};

aclDataType ConvertToNpuDtype(framework::proto::VarType::Type dtype);

} // namespace operators
} // namespace paddle
#endif
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <memory>
#include <string>
#include "paddle/fluid/operators/npu_op_runner.h"
#include "paddle/fluid/operators/optimizers/adam_op.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
using LoDTensor = framework::LoDTensor;
template <typename DeviceContext, typename T>
class AdamNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
const auto* param_var = ctx.InputVar("Param");
PADDLE_ENFORCE_EQ(param_var->IsType<framework::LoDTensor>(), true,
platform::errors::InvalidArgument(
"The Var(%s)'s type should be LoDTensor, "
"but the received is %s",
ctx.InputNames("Param").front(),
framework::ToTypeName(param_var->Type())));
T epsilon = static_cast<T>(ctx.Attr<float>("epsilon"));
auto* param = ctx.Input<LoDTensor>("Param");
auto* grad_var = ctx.InputVar("Grad");
PADDLE_ENFORCE_EQ(grad_var->IsType<framework::LoDTensor>(), true,
platform::errors::InvalidArgument(
"The Grad(%s)'s type should be LoDTensor, "
"but the received is %s",
ctx.InputNames("Grad").front(),
framework::ToTypeName(grad_var->Type())));
auto* grad = ctx.Input<LoDTensor>("Grad");
auto* mom1 = ctx.Input<LoDTensor>("Moment1");
auto* mom2 = ctx.Input<LoDTensor>("Moment2");
auto* lr = ctx.Input<LoDTensor>("LearningRate");
auto* beta1_pow = ctx.Input<LoDTensor>("Beta1Pow");
auto* beta2_pow = ctx.Input<LoDTensor>("Beta2Pow");
auto* param_out = ctx.Output<LoDTensor>("ParamOut");
auto* mom1_out = ctx.Output<LoDTensor>("Moment1Out");
auto* mom2_out = ctx.Output<LoDTensor>("Moment2Out");
auto* beta1_pow_out = ctx.Output<LoDTensor>("Beta1PowOut");
auto* beta2_pow_out = ctx.Output<LoDTensor>("Beta2PowOut");
param_out->mutable_data<T>(ctx.GetPlace());
mom1_out->mutable_data<T>(ctx.GetPlace());
mom2_out->mutable_data<T>(ctx.GetPlace());
beta1_pow_out->mutable_data<T>(ctx.GetPlace());
beta2_pow_out->mutable_data<T>(ctx.GetPlace());
T beta1 = static_cast<T>(ctx.Attr<float>("beta1"));
if (ctx.HasInput("Beta1Tensor")) {
auto* beta1_tensor = ctx.Input<framework::Tensor>("Beta1Tensor");
PADDLE_ENFORCE_EQ(beta1_tensor->numel(), 1,
platform::errors::InvalidArgument(
"Input(Beta1Tensor) size must be 1, but got %d",
beta1_tensor->numel()));
beta1 = static_cast<T>(GetAttrFromTensor(beta1_tensor));
}
T beta2 = static_cast<T>(ctx.Attr<float>("beta2"));
if (ctx.HasInput("Beta2Tensor")) {
auto* beta2_tensor = ctx.Input<framework::Tensor>("Beta2Tensor");
PADDLE_ENFORCE_EQ(beta2_tensor->numel(), 1,
platform::errors::InvalidArgument(
"Input(Beta2Tensor) size must be 1, but got %d",
beta2_tensor->numel()));
beta2 = static_cast<T>(GetAttrFromTensor(beta2_tensor));
}
VLOG(3) << "beta1_pow.numel() : " << beta1_pow->numel()
<< ", beta2_pow.numel() : " << beta2_pow->numel();
VLOG(3) << "param.numel(): " << param->numel();
PADDLE_ENFORCE_EQ(beta1_pow_out->numel(), 1,
platform::errors::InvalidArgument(
"beta1 pow output size should be 1, but received "
"value is:%d.",
beta1_pow_out->numel()));
PADDLE_ENFORCE_EQ(beta2_pow_out->numel(), 1,
platform::errors::InvalidArgument(
"beta2 pow output size should be 1, but received "
"value is:%d.",
beta2_pow_out->numel()));
// reshape
Tensor beta1_tensor(framework::proto::VarType::FP32);
beta1_tensor.mutable_data<float>({1}, ctx.GetPlace());
TensorFromVector(std::vector<T>{beta1}, ctx.device_context(),
&beta1_tensor);
Tensor beta2_tensor(framework::proto::VarType::FP32);
beta2_tensor.mutable_data<float>({1}, ctx.GetPlace());
TensorFromVector(std::vector<T>{beta2}, ctx.device_context(),
&beta2_tensor);
Tensor epsilon_tensor(framework::proto::VarType::FP32);
epsilon_tensor.mutable_data<T>({1}, ctx.GetPlace());
TensorFromVector(std::vector<T>{epsilon}, ctx.device_context(),
&epsilon_tensor);
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
auto runner =
NpuOpRunner("ApplyAdamD",
{
*param, *mom1, *mom2, *beta1_pow, *beta2_pow, *lr,
beta1_tensor, beta2_tensor, epsilon_tensor, *grad,
},
{
*param_out, *mom1_out, *mom2_out,
},
{});
runner.Run(stream);
// NOTE(zhiqiu): ApplyAdamD updates params inplace, so
// if param and param_out are not the same, we need to do a copy.
if (param_out->data<T>() != param->data<T>()) {
ctx.template device_context<paddle::platform::NPUDeviceContext>().Wait();
framework::TensorCopySync(*param, ctx.GetPlace(), param_out);
}
if (mom1_out->data<T>() != mom1->data<T>()) {
ctx.template device_context<paddle::platform::NPUDeviceContext>().Wait();
framework::TensorCopySync(*mom1, ctx.GetPlace(), mom1_out);
}
if (mom2_out->data<T>() != mom2->data<T>()) {
ctx.template device_context<paddle::platform::NPUDeviceContext>().Wait();
framework::TensorCopySync(*mom2, ctx.GetPlace(), mom2_out);
}
auto runner_m1 =
NpuOpRunner("Mul", {*beta1_pow, beta1_tensor}, {*beta1_pow_out}, {});
runner_m1.Run(stream);
auto runner_m2 =
NpuOpRunner("Mul", {*beta2_pow, beta2_tensor}, {*beta2_pow_out}, {});
runner_m2.Run(stream);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
adam, ops::AdamNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::AdamNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
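The ApplyAdamD call above, together with the two trailing Mul runners that advance Beta1Pow/Beta2Pow, requests the standard Adam update. A plain-Python sketch of that rule (illustration only, not part of this patch; beta1_pow/beta2_pow hold beta1^t/beta2^t):

```python
# Illustration only: the Adam rule ApplyAdamD is asked to perform.
def adam_step(param, grad, mom1, mom2, lr, beta1, beta2, epsilon,
              beta1_pow, beta2_pow):
    mom1 = beta1 * mom1 + (1.0 - beta1) * grad
    mom2 = beta2 * mom2 + (1.0 - beta2) * grad * grad
    # bias correction; beta1_pow / beta2_pow hold beta1^t / beta2^t
    lr_t = lr * (1.0 - beta2_pow) ** 0.5 / (1.0 - beta1_pow)
    param = param - lr_t * mom1 / (mom2 ** 0.5 + epsilon)
    # the two Mul runners advance the pow accumulators the same way
    return param, mom1, mom2, beta1_pow * beta1, beta2_pow * beta2

p, m1, m2, b1p, b2p = adam_step(1.0, 0.5, 0.0, 0.0, lr=0.1, beta1=0.9,
                                beta2=0.999, epsilon=1e-8,
                                beta1_pow=0.9, beta2_pow=0.999)
```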
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <memory>
#include <string>
#include "paddle/fluid/operators/npu_op_runner.h"
#include "paddle/fluid/operators/optimizers/sgd_op.h"
namespace paddle {
namespace operators {
template <typename DeviceContext, typename T>
class SGDNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* learning_rate = ctx.Input<framework::LoDTensor>("LearningRate");
auto* param_var = ctx.Input<framework::LoDTensor>("Param");
auto* grad_var = ctx.Input<framework::LoDTensor>("Grad");
auto* param_out = ctx.Output<framework::LoDTensor>("ParamOut");
param_out->mutable_data<T>(ctx.GetPlace());
auto runner =
NpuOpRunner("ApplyGradientDescent",
{*param_var, *learning_rate, *grad_var}, {*param_out}, {});
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
runner.Run(stream);
// NOTE(zhiqiu): ApplyGradientDescent updates params inplace, so
// if param and param_out are not the same, we need to do a copy.
if (param_out->data<T>() != param_var->data<T>()) {
ctx.template device_context<paddle::platform::NPUDeviceContext>().Wait();
framework::TensorCopySync(*param_var, ctx.GetPlace(), param_out);
}
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
sgd, ops::SGDNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::SGDNPUKernel<paddle::platform::NPUDeviceContext, double>,
ops::SGDNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
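ApplyGradientDescent performs the elementwise update param_out = param - learning_rate * grad. A plain-Python sketch (illustration only, not part of this patch):

```python
# Illustration only: the elementwise SGD update requested above.
def sgd_step(param, grad, lr):
    return [p - lr * g for p, g in zip(param, grad)]

new_param = sgd_step([1.0, 2.0], [0.5, 1.0], lr=0.1)
```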
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifdef PADDLE_WITH_ASCEND_CL
#include <memory>
#include <string>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/framework/tensor_util.h"
#include "paddle/fluid/operators/dropout_op.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/operators/npu_op_runner.h"
#include "paddle/fluid/operators/range_op.h"
#include "paddle/fluid/operators/utils.h"
namespace paddle {
namespace operators {
template <typename DeviceContext, typename T>
class RangeNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& context) const override {
auto* start_t = context.Input<framework::Tensor>("Start");
auto* end_t = context.Input<framework::Tensor>("End");
auto* step_t = context.Input<framework::Tensor>("Step");
auto* out = context.Output<framework::Tensor>("Out");
framework::Tensor n;
framework::TensorCopySync(*start_t, platform::CPUPlace(), &n);
T start = n.data<T>()[0];
framework::TensorCopySync(*end_t, platform::CPUPlace(), &n);
T end = n.data<T>()[0];
framework::TensorCopySync(*step_t, platform::CPUPlace(), &n);
T step = n.data<T>()[0];
int64_t size = 0;
GetSize(start, end, step, &size);
out->Resize(framework::make_ddim({size}));
out->mutable_data<T>(context.GetPlace());
std::vector<T> odata;
T value = start;
for (int64_t i = 0; i < size; ++i) {
odata.push_back(value);
value += step;
}
framework::TensorFromVector(odata, context.device_context(), out);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
range, ops::RangeNPUKernel<paddle::platform::NPUDeviceContext, int>,
ops::RangeNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::RangeNPUKernel<paddle::platform::NPUDeviceContext, double>)
#endif
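The range kernel above computes the sequence on the host and copies it to the device: ceil((end - start) / step) elements starting at start with stride step. A plain-Python sketch for integer arguments (illustration only, not part of this patch; `npu_range` is a hypothetical name):

```python
# Illustration only: size/value logic mirroring GetSize plus the fill loop.
def npu_range(start, end, step):
    assert step != 0
    size = max(0, -(-(end - start) // step))  # ceil((end - start) / step)
    out, value = [], start
    for _ in range(size):
        out.append(value)
        value += step
    return out
```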
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifndef _WIN32
#include <unistd.h>
#endif
#include <string>
#include <thread> // NOLINT
#include <vector>
#include "gtest/gtest.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/operators/dropout_op.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/string/printf.h"
namespace f = paddle::framework;
namespace p = paddle::platform;
namespace m = paddle::operators::math;
USE_OP(range);
USE_OP_DEVICE_KERNEL(range, NPU);
template <typename T>
void Compare(f::Scope* scope, const p::DeviceContext& ctx,
std::string op_type) {
// init
auto start = scope->Var("Start");
auto tensor_start = start->GetMutable<f::LoDTensor>();
std::vector<T> init_start;
init_start.push_back(static_cast<T>(1));
TensorFromVector(init_start, ctx, tensor_start);
tensor_start->Resize({1});
auto end = scope->Var("End");
auto tensor_end = end->GetMutable<f::LoDTensor>();
std::vector<T> init_end;
init_end.push_back(static_cast<T>(10));
TensorFromVector(init_end, ctx, tensor_end);
tensor_end->Resize({1});
auto step = scope->Var("Step");
auto tensor_step = step->GetMutable<f::LoDTensor>();
std::vector<T> init_step;
init_step.push_back(static_cast<T>(2));
TensorFromVector(init_step, ctx, tensor_step);
tensor_step->Resize({1});
ctx.Wait();
auto place = ctx.GetPlace();
auto out = scope->Var("Out");
auto tensor_out = out->GetMutable<f::LoDTensor>();
// run
auto op = f::OpRegistry::CreateOp(
op_type, {{"Start", {"Start"}}, {"End", {"End"}}, {"Step", {"Step"}}},
{{"Out", {"Out"}}}, {});
op->Run(*scope, place);
std::vector<T> out_vec;
TensorToVector(*tensor_out, ctx, &out_vec);
ctx.Wait();
EXPECT_EQ(static_cast<T>(out_vec.size()), static_cast<T>(5));
EXPECT_EQ(static_cast<T>(out_vec[0]), static_cast<T>(1.0));
EXPECT_EQ(static_cast<T>(out_vec[1]), static_cast<T>(3.0));
EXPECT_EQ(static_cast<T>(out_vec[2]), static_cast<T>(5.0));
EXPECT_EQ(static_cast<T>(out_vec[3]), static_cast<T>(7.0));
EXPECT_EQ(static_cast<T>(out_vec[4]), static_cast<T>(9.0));
}
TEST(range, NPU) {
f::Scope scope;
p::NPUDeviceContext ctx(p::NPUPlace(0));
Compare<int>(&scope, ctx, "range");
}
...@@ -42,3 +42,7 @@ endif()
if(WITH_ROCM)
hip_test(check_reduce_rank_test SRCS check_reduce_rank_test.cu DEPS tensor)
endif()

if(WITH_ASCEND_CL)
cc_test(reduce_any_op_npu_test SRCS reduce_any_op_npu_test.cc DEPS op_registry reduce_any_op scope device_context enforce executor)
endif()
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <memory>
#include <string>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/tensor_util.h"
#include "paddle/fluid/operators/npu_op_runner.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename T>
class ReduceAnyNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
const Tensor* x = ctx.Input<Tensor>("X");
auto* out = ctx.Output<Tensor>("Out");
bool keep_dim = ctx.Attr<bool>("keep_dim");
auto dims = ctx.Attr<std::vector<int>>("dim");
out->mutable_data<T>(ctx.GetPlace());
// set attr
NPUAttributeMap attr = {{"keep_dims", keep_dim}, {"axes", dims}};
auto runner = NpuOpRunner("ReduceAnyD", {*x}, {*out}, attr);
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
runner.Run(stream);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
namespace plat = paddle::platform;
REGISTER_OP_NPU_KERNEL(reduce_any, ops::ReduceAnyNPUKernel<bool>);
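ReduceAnyD computes a logical OR over the given axes; with `axes` empty it reduces over every element. A plain-Python sketch of that semantics for a flat input (illustration only, not part of this patch):

```python
# Illustration only: reduce over all elements (the empty-axes case).
def reduce_any(values):
    result = False
    for v in values:
        result = result or bool(v)
    return result
```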
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifndef _WIN32
#include <unistd.h>
#endif
#include <memory>
#include <string>
#include <thread> // NOLINT
#include <vector>
#include "gtest/gtest.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/memory/malloc.h"
#include "paddle/fluid/memory/memcpy.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/string/printf.h"
namespace f = paddle::framework;
namespace p = paddle::platform;
namespace m = paddle::operators::math;
using Tensor = paddle::framework::Tensor;
USE_OP(reduce_any);
USE_OP_DEVICE_KERNEL(reduce_any, NPU);
template <typename T>
void Compare(f::Scope* scope, const p::DeviceContext& ctx) {
// init
auto x = scope->Var("X");
auto tensor_x = x->GetMutable<f::LoDTensor>();
std::vector<bool> init_x = {true, false, false, false};
f::TensorFromVector<bool>(init_x, ctx, tensor_x);
tensor_x->Resize(paddle::framework::make_ddim({2, 2}));
ctx.Wait();
auto place = ctx.GetPlace();
auto out = scope->Var("Out");
auto tensor_out = out->GetMutable<f::LoDTensor>();
// run
std::vector<int> axes;
f::AttributeMap attrs = {{"axes", axes}, {"keep_dims", true}};
auto op = f::OpRegistry::CreateOp("reduce_any", {{"X", {"X"}}},
{{"Out", {"Out"}}}, attrs);
op->Run(*scope, place);
ctx.Wait();
std::vector<bool> out_vec;
f::TensorToVector<bool>(*tensor_out, ctx, &out_vec);
ctx.Wait();
std::vector<bool> expected_vec = {true};
EXPECT_EQ(out_vec.size(), expected_vec.size());
for (uint32_t i = 0; i < out_vec.size(); i++) {
EXPECT_EQ(out_vec[i], expected_vec[i]);
}
}
TEST(reduce_any, NPU) {
f::Scope scope;
p::NPUDeviceContext ctx(p::NPUPlace(0));
Compare<bool>(&scope, ctx);
}
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <memory>
#include <string>
#include "paddle/fluid/operators/npu_op_runner.h"
#include "paddle/fluid/operators/reduce_ops/reduce_op.h"
#include "paddle/fluid/operators/unsqueeze_op.h"
namespace paddle {
namespace operators {
template <typename DeviceContext, typename T>
class ReduceSumNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<framework::Tensor>("X");
auto* out = ctx.Output<framework::Tensor>("Out");
bool reduce_all = ctx.Attr<bool>("reduce_all");
bool keep_dims = ctx.Attr<bool>("keep_dim");
auto dims = ctx.Attr<std::vector<int>>("dim");
out->mutable_data<T>(ctx.GetPlace());
// special case
if (x->dims().size() == 1 && keep_dims == false) {
keep_dims = true;
}
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
framework::Tensor cast_x;
framework::Tensor cast_out;
// NOTE: ReduceSumD only supports fp32 and fp16
if (x->type() != framework::proto::VarType::FP32 &&
x->type() != framework::proto::VarType::FP16) {
cast_x.Resize(x->dims());
cast_x.mutable_data<float>(ctx.GetPlace());
auto dst_dtype = ConvertToNpuDtype(framework::proto::VarType::FP32);
auto runner_cast = NpuOpRunner(
"Cast", {*x}, {cast_x}, {{"dst_type", static_cast<int>(dst_dtype)}});
runner_cast.Run(stream);
cast_out.Resize(out->dims());
cast_out.mutable_data<float>(ctx.GetPlace());
} else {
cast_x.ShareDataWith(*x);
cast_out.ShareDataWith(*out);
}
if (reduce_all) {
std::vector<int> dim_vec;
for (int i = 0; i < x->dims().size(); i++) {
dim_vec.push_back(i);
}
auto runner = NpuOpRunner("ReduceSumD", {cast_x}, {cast_out},
{{"axes", dim_vec}, {"keep_dims", keep_dims}});
runner.Run(stream);
} else {
auto runner = NpuOpRunner("ReduceSumD", {cast_x}, {cast_out},
{{"axes", dims}, {"keep_dims", keep_dims}});
runner.Run(stream);
}
if (x->type() != framework::proto::VarType::FP32 &&
x->type() != framework::proto::VarType::FP16) {
auto dst_dtype = ConvertToNpuDtype(out->type());
auto runner_cast =
NpuOpRunner("Cast", {cast_out}, {*out},
{{"dst_type", static_cast<int>(dst_dtype)}});
runner_cast.Run(stream);
}
}
};
template <typename DeviceContext, typename T>
class ReduceSumGradNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<framework::Tensor>("X");
auto* out_grad =
ctx.Input<framework::Tensor>(framework::GradVarName("Out"));
auto* x_grad = ctx.Output<framework::Tensor>(framework::GradVarName("X"));
bool reduce_all = ctx.Attr<bool>("reduce_all");
bool keep_dims = ctx.Attr<bool>("keep_dim");
auto dims = ctx.Attr<std::vector<int>>("dim");
x_grad->mutable_data<T>(ctx.GetPlace());
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
if (keep_dims || reduce_all) {
auto runner = NpuOpRunner("BroadcastToD", {*out_grad}, {*x_grad},
{{"shape", framework::vectorize(x->dims())}});
runner.Run(stream);
} else {
framework::DDim out_dims;
out_dims = UnsqueezeKernel<DeviceContext, T>::GetOutputShape(
dims, out_grad->dims());
Tensor out_grad_tmp(out_grad->type());
out_grad_tmp.Resize(out_dims);
out_grad_tmp.mutable_data<T>(ctx.GetPlace());
framework::TensorCopy(
*out_grad, ctx.GetPlace(),
ctx.template device_context<platform::DeviceContext>(),
&out_grad_tmp);
out_grad_tmp.Resize(out_dims);
auto runner = NpuOpRunner("BroadcastToD", {out_grad_tmp}, {*x_grad},
{{"shape", framework::vectorize(x->dims())}});
runner.Run(stream);
}
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
reduce_sum,
ops::ReduceSumNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::ReduceSumNPUKernel<paddle::platform::NPUDeviceContext, int>,
ops::ReduceSumNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
REGISTER_OP_NPU_KERNEL(
reduce_sum_grad,
ops::ReduceSumGradNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::ReduceSumGradNPUKernel<paddle::platform::NPUDeviceContext, int>,
ops::ReduceSumGradNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <memory>
#include <string>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/operators/npu_op_runner.h"
namespace paddle {
namespace operators {
template <typename DeviceContext, typename T>
class Reshape2NPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<framework::Tensor>("X");
auto* out = ctx.Output<framework::Tensor>("Out");
auto list_new_shape_tensor =
ctx.MultiInput<framework::Tensor>("ShapeTensor");
if (list_new_shape_tensor.size() > 0) {
PADDLE_THROW(platform::errors::Unimplemented(
"Input(ShapeTensor) is not supported on NPU."));
}
PADDLE_ENFORCE_EQ(ctx.Input<framework::LoDTensor>("Shape"), nullptr,
platform::errors::Unimplemented(
"Input(Shape) is not supported on NPU."));
auto shape = out->dims();
out->mutable_data(ctx.GetPlace(), x->type());
framework::TensorCopy(
*x, ctx.GetPlace(),
ctx.template device_context<platform::DeviceContext>(), out);
out->Resize(shape);
}
};
template <typename DeviceContext, typename T>
class Reshape2GradNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* d_x = ctx.Output<framework::Tensor>(framework::GradVarName("X"));
auto* d_out = ctx.Input<framework::Tensor>(framework::GradVarName("Out"));
auto in_dims = d_x->dims();
d_x->mutable_data(ctx.GetPlace(), d_out->type());
framework::TensorCopy(
*d_out, ctx.GetPlace(),
ctx.template device_context<platform::DeviceContext>(), d_x);
d_x->Resize(in_dims);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
reshape2, ops::Reshape2NPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::Reshape2NPUKernel<paddle::platform::NPUDeviceContext, int>,
ops::Reshape2NPUKernel<paddle::platform::NPUDeviceContext, int64_t>,
ops::Reshape2NPUKernel<paddle::platform::NPUDeviceContext, bool>,
ops::Reshape2NPUKernel<paddle::platform::NPUDeviceContext, double>,
ops::Reshape2NPUKernel<paddle::platform::NPUDeviceContext, uint8_t>,
ops::Reshape2NPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
REGISTER_OP_NPU_KERNEL(
reshape2_grad,
ops::Reshape2GradNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::Reshape2GradNPUKernel<paddle::platform::NPUDeviceContext, int>,
ops::Reshape2GradNPUKernel<paddle::platform::NPUDeviceContext, int64_t>,
ops::Reshape2GradNPUKernel<paddle::platform::NPUDeviceContext, bool>,
ops::Reshape2GradNPUKernel<paddle::platform::NPUDeviceContext, double>,
ops::Reshape2GradNPUKernel<paddle::platform::NPUDeviceContext, uint8_t>,
ops::Reshape2GradNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <memory>
#include <string>
#include "paddle/fluid/operators/npu_op_runner.h"
#include "paddle/fluid/operators/scale_op.h"
namespace paddle {
namespace operators {
template <typename DeviceContext, typename T>
class ScaleNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<framework::Tensor>("X");
auto* out = ctx.Output<framework::Tensor>("Out");
auto scale = static_cast<float>(ctx.Attr<float>("scale"));
auto bias = static_cast<float>(ctx.Attr<float>("bias"));
auto bias_after_scale = ctx.Attr<bool>("bias_after_scale");
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
float _power = 1.0;
if (bias_after_scale) {
out->mutable_data<T>(ctx.GetPlace());
auto runner =
NpuOpRunner("Power", {*x}, {*out},
{{"power", _power}, {"scale", scale}, {"shift", bias}});
runner.Run(stream);
} else {
Tensor tmp_x(x->type());
tmp_x.Resize(x->dims());
tmp_x.mutable_data<T>(ctx.GetPlace());
auto runner_tmp = NpuOpRunner("Adds", {*x}, {tmp_x}, {{"value", bias}});
runner_tmp.Run(stream);
out->mutable_data<T>(ctx.GetPlace());
float _bias = 0.0;
auto runner =
NpuOpRunner("Power", {tmp_x}, {*out},
{{"power", _power}, {"scale", scale}, {"shift", _bias}});
runner.Run(stream);
}
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
scale, ops::ScaleNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::ScaleNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
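The scale kernel decomposes into the NPU `Power` op (`(scale * x + shift) ^ power`, with `power = 1`), plus an extra `Adds` when the bias is applied before scaling. A one-line host reference (hypothetical helper, not part of Paddle) for the two branches:

```cpp
#include <cassert>
#include <cmath>

// Host reference for the scale op as decomposed above:
//   bias_after_scale == true:  out = scale * x + bias   (single Power call)
//   bias_after_scale == false: out = scale * (x + bias) (Adds, then Power)
float ScaleRef(float x, float scale, float bias, bool bias_after_scale) {
  return bias_after_scale ? scale * x + bias : scale * (x + bias);
}
```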
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifdef PADDLE_WITH_ASCEND_CL
#include <memory>
#include <string>
#include "paddle/fluid/operators/npu_op_runner.h"
#include "paddle/fluid/operators/scatter_op.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename DeviceContext, typename T>
class ScatterNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<Tensor>("X");
auto* index = ctx.Input<Tensor>("Ids");
auto* updates = ctx.Input<Tensor>("Updates");
bool overwrite = ctx.Attr<bool>("overwrite");
auto* out = ctx.Output<Tensor>("Out");
auto place = ctx.GetPlace();
out->mutable_data<T>(place);
framework::Tensor tmp_tensor(index->type());
const auto index_dims = index->dims();
if (index_dims.size() == 1) {
tmp_tensor.ShareDataWith(*index);
std::vector<int64_t> new_dim = {index_dims[0], 1};
tmp_tensor.Resize(framework::make_ddim(new_dim));
index = &tmp_tensor;
}
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
if (overwrite) {
auto runner_update = NpuOpRunner("TensorScatterUpdate",
{*x, *index, *updates}, {*out}, {});
runner_update.Run(stream);
} else {
auto runner_add =
NpuOpRunner("TensorScatterAdd", {*x, *index, *updates}, {*out}, {});
runner_add.Run(stream);
}
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
scatter, ops::ScatterNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::ScatterNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
#endif

/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <memory>
#include <string>
#include "paddle/fluid/operators/npu_op_runner.h"
#include "paddle/fluid/operators/shape_op.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
using SelectedRows = framework::SelectedRows;
template <typename DeviceContext, typename T>
class ShapeNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* in_var = ctx.InputVar("Input");
framework::DDim in_dims;
if (in_var->IsType<SelectedRows>()) {
in_dims = in_var->Get<SelectedRows>().value().dims();
} else {
in_dims = in_var->Get<LoDTensor>().dims();
}
auto* out_t = ctx.Output<Tensor>("Out");
out_t->Resize({in_dims.size()});
// TODO(ascendrc): the shape result is written to CPUPlace; check whether an
// NPU-side output is ever needed.
auto out_data = out_t->mutable_data<int32_t>(platform::CPUPlace());
for (int i = 0; i < in_dims.size(); ++i) {
out_data[i] = in_dims[i];
}
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
shape, ops::ShapeNPUKernel<paddle::platform::NPUDeviceContext, bool>,
ops::ShapeNPUKernel<paddle::platform::NPUDeviceContext, int>,
ops::ShapeNPUKernel<paddle::platform::NPUDeviceContext, int8_t>,
ops::ShapeNPUKernel<paddle::platform::NPUDeviceContext, uint8_t>,
ops::ShapeNPUKernel<paddle::platform::NPUDeviceContext, int64_t>,
ops::ShapeNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::ShapeNPUKernel<paddle::platform::NPUDeviceContext, double>);
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <memory>
#include <string>
#include "paddle/fluid/framework/ddim.h"
#include "paddle/fluid/framework/tensor_util.h"
#include "paddle/fluid/operators/npu_op_runner.h"
#include "paddle/fluid/operators/slice_op.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
void UpdateAttr(const framework::DDim in_dims, const std::vector<int> axes,
const std::vector<int> starts, const std::vector<int> ends,
std::vector<int>* offsets, std::vector<int>* size) {
int cnt = 0;
for (int i = 0; i < in_dims.size(); ++i) {
int start = 0;
int end = in_dims[i];
int axis = axes[cnt];
if (axis == i) {
start = starts[cnt];
if (start < 0) {
start = (start + in_dims[i]);
}
start = std::max(start, static_cast<int>(0));
end = ends[cnt];
if (end < 0) {
end = (end + in_dims[i]);
}
end = std::min(end, static_cast<int>(in_dims[i]));
cnt++;
}
(*offsets)[i] = start;
(*size)[i] = end - start;
}
}
template <typename DeviceContext, typename T>
class SliceNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* input = ctx.Input<Tensor>("Input");
auto* out = ctx.Output<Tensor>("Out");
auto axes = ctx.Attr<std::vector<int>>("axes");
auto starts = ctx.Attr<std::vector<int>>("starts");
auto ends = ctx.Attr<std::vector<int>>("ends");
out->mutable_data<T>(ctx.GetPlace());
auto in_dims = input->dims();
std::vector<int> offsets(in_dims.size());
std::vector<int> size(in_dims.size());
UpdateAttr(in_dims, axes, starts, ends, &offsets, &size);
auto runner = NpuOpRunner("SliceD", {*input}, {*out},
{{"offsets", offsets}, {"size", size}});
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
runner.Run(stream);
}
};
template <typename DeviceContext, typename T>
class SliceGradNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* input = ctx.Input<Tensor>("Input");
auto* dout = ctx.Input<Tensor>(framework::GradVarName("Out"));
auto* dinput = ctx.Output<Tensor>(framework::GradVarName("Input"));
auto axes = ctx.Attr<std::vector<int>>("axes");
auto starts = ctx.Attr<std::vector<int>>("starts");
auto ends = ctx.Attr<std::vector<int>>("ends");
auto in_dims = input->dims();
int rank = in_dims.size();
std::vector<int> offsets(rank);
std::vector<int> size(rank);
UpdateAttr(in_dims, axes, starts, ends, &offsets, &size);
std::vector<std::vector<int64_t>> paddings(rank, std::vector<int64_t>(2));
for (int i = 0; i < rank; ++i) {
paddings[i][0] = static_cast<int64_t>(offsets[i]);
paddings[i][1] = static_cast<int64_t>(in_dims[i] - size[i] - offsets[i]);
}
dinput->mutable_data<T>(ctx.GetPlace());
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
auto runner =
NpuOpRunner("PadD", {*dout}, {*dinput}, {{"paddings", paddings}});
runner.Run(stream);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
slice, ops::SliceNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::SliceNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
REGISTER_OP_NPU_KERNEL(
slice_grad,
ops::SliceGradNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::SliceGradNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
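`UpdateAttr` above turns `axes`/`starts`/`ends` into the `offsets`/`size` attributes that `SliceD` expects, clamping negative and out-of-range indices. A standalone sketch with `std::vector<int>` standing in for `framework::DDim` (an assumption for illustration; it also guards `cnt` against running past `axes`, which the kernel relies on the op checks to guarantee):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Host sketch of UpdateAttr: for each dimension, either take the full range
// [0, dim) or the normalized [start, end) slice requested for that axis.
void UpdateAttrRef(const std::vector<int>& in_dims,
                   const std::vector<int>& axes,
                   const std::vector<int>& starts,
                   const std::vector<int>& ends,
                   std::vector<int>* offsets, std::vector<int>* size) {
  size_t cnt = 0;
  for (size_t i = 0; i < in_dims.size(); ++i) {
    int start = 0;
    int end = in_dims[i];
    if (cnt < axes.size() && axes[cnt] == static_cast<int>(i)) {
      start = starts[cnt];
      if (start < 0) start += in_dims[i];          // negative index wraps
      start = std::max(start, 0);
      end = ends[cnt];
      if (end < 0) end += in_dims[i];
      end = std::min(end, in_dims[i]);             // clamp to dim
      ++cnt;
    }
    (*offsets)[i] = start;
    (*size)[i] = end - start;
  }
}
```

The grad kernel then derives `PadD` paddings directly from these values: `{offsets[i], in_dims[i] - size[i] - offsets[i]}` per dimension.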
@@ -83,11 +83,13 @@ class SoftmaxOp : public framework::OperatorWithKernel {
   }
 #endif
+#ifndef PADDLE_WITH_ASCEND_CL
   if (input_data_type == framework::proto::VarType::FP16) {
     PADDLE_ENFORCE_EQ(platform::is_gpu_place(ctx.GetPlace()), true,
                       platform::errors::InvalidArgument(
                           "float16 can only be used on GPU place"));
   }
+#endif
   return framework::OpKernelType(input_data_type, ctx.GetPlace(), layout_,
                                  library_);
@@ -207,9 +209,10 @@ class SoftmaxOpGrad : public framework::OperatorWithKernel {
   }
 #endif
   if (input_data_type == framework::proto::VarType::FP16) {
-    PADDLE_ENFORCE_EQ(platform::is_gpu_place(ctx.GetPlace()), true,
-                      platform::errors::InvalidArgument(
-                          "float16 can only be used on GPU place"));
+    if (!(platform::is_gpu_place(ctx.GetPlace()) ||
+          platform::is_npu_place(ctx.GetPlace())))
+      PADDLE_THROW(platform::errors::InvalidArgument(
+          "float16 can only be used on GPU/NPU place"));
   }
   return framework::OpKernelType(input_data_type, ctx.GetPlace(), layout_,
...
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <memory>
#include <string>
#include "paddle/fluid/operators/npu_op_runner.h"
#include "paddle/fluid/operators/softmax_op.h"
namespace paddle {
namespace operators {
template <typename DeviceContext, typename T>
class SoftmaxNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* in = ctx.Input<framework::LoDTensor>("X");
auto axis = ctx.Attr<int>("axis");
std::vector<int> axes;
axes.push_back(axis);
framework::NPUAttributeMap attr_input = {{"axes", axes}};
auto* out = ctx.Output<framework::LoDTensor>("Out");
out->mutable_data<T>(ctx.GetPlace());
auto runner = NpuOpRunner("SoftmaxV2", {*in}, {*out}, attr_input);
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
runner.Run(stream);
}
};
template <typename DeviceContext, typename T>
class SoftmaxGradNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* out = ctx.Input<framework::LoDTensor>("Out");
auto* dOut = ctx.Input<framework::LoDTensor>(framework::GradVarName("Out"));
auto* dX = ctx.Output<Tensor>(framework::GradVarName("X"));
auto dims = dX->dims();
const int rank = dims.size();
const int axis = CanonicalAxis(ctx.Attr<int>("axis"), rank);
int64_t first_dim = 1;
int64_t sec_dim = 1;
for (int i = 0; i < axis; i++) {
first_dim *= dims[i];
}
for (int i = axis; i < rank; i++) {
sec_dim *= dims[i];
}
Tensor tmp_out;
tmp_out.ShareDataWith(*out).Resize({first_dim, sec_dim});
Tensor tmp_dOut;
tmp_dOut.ShareDataWith(*dOut).Resize({first_dim, sec_dim});
dX->Resize(framework::make_ddim({first_dim, sec_dim}));
dX->mutable_data<T>(ctx.GetPlace());
framework::NPUAttributeMap attr_input = {};
auto runner = NpuOpRunner(std::string("SoftmaxGrad"), {tmp_out, tmp_dOut},
{*dX}, attr_input);
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
runner.Run(stream);
dX->Resize(dims);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
namespace plat = paddle::platform;
REGISTER_OP_NPU_KERNEL(
softmax, ops::SoftmaxNPUKernel<plat::NPUDeviceContext, float>,
ops::SoftmaxNPUKernel<plat::NPUDeviceContext, double>,
ops::SoftmaxNPUKernel<plat::NPUDeviceContext, plat::float16>);
REGISTER_OP_NPU_KERNEL(
softmax_grad, ops::SoftmaxGradNPUKernel<plat::NPUDeviceContext, float>,
ops::SoftmaxGradNPUKernel<plat::NPUDeviceContext, double>,
ops::SoftmaxGradNPUKernel<plat::NPUDeviceContext,
paddle::platform::float16>);
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifndef _WIN32
#include <unistd.h>
#endif
#include <string>
#include <thread> // NOLINT
#include <vector>
#include "gtest/gtest.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/framework/tensor_util.h"
#include "paddle/fluid/operators/dropout_op.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/string/printf.h"
namespace f = paddle::framework;
namespace p = paddle::platform;
namespace m = paddle::operators::math;
USE_OP(softmax);
USE_OP_DEVICE_KERNEL(softmax, NPU);
template <typename T>
void Compare(f::Scope* scope, const p::DeviceContext& ctx) {
// init
auto x = scope->Var("X");
auto tensor_x = x->GetMutable<f::LoDTensor>();
std::vector<T> init;
for (int i = 3; i < 9; ++i) {
init.push_back(static_cast<T>(i));
}
TensorFromVector(init, ctx, tensor_x);
tensor_x->Resize({2, 3});
ctx.Wait();
auto place = ctx.GetPlace();
auto out = scope->Var("Out");
auto tensor_out = out->GetMutable<f::LoDTensor>();
tensor_out->Resize({2, 3});
tensor_out->mutable_data<T>(place); // allocate
// run
int axis = 1;
f::AttributeMap attrs = {
{"axis", axis}, {"use_cudnn", false},
{"use_mkldnn", false}, {"mkldnn_data_type", std::string("float32")},
{"is_test", false},
};
auto op = f::OpRegistry::CreateOp("softmax", {{"X", {"X"}}},
{{"Out", {"Out"}}}, attrs);
op->Run(*scope, place);
ctx.Wait();
std::vector<T> out_vec;
TensorToVector(*tensor_out, ctx, &out_vec);
for (int i = 0; i < static_cast<int>(out_vec.size()); ++i) {
VLOG(3) << "out_vec[" << i << "] : " << out_vec[i];
}
ctx.Wait();
EXPECT_EQ((uint32_t)out_vec.size(), (uint32_t)(6));
}
template <typename T>
void CompareGrad(f::Scope* scope, const p::DeviceContext& ctx) {
// init
auto out = scope->Var("Out");
auto tensor_out = out->GetMutable<f::LoDTensor>();
std::vector<T> out_init;
out_init.push_back(static_cast<T>(0.6670));
out_init.push_back(static_cast<T>(0.5888));
out_init.push_back(static_cast<T>(0.4543));
out_init.push_back(static_cast<T>(0.3330));
out_init.push_back(static_cast<T>(0.4112));
out_init.push_back(static_cast<T>(0.5457));
TensorFromVector(out_init, ctx, tensor_out);
tensor_out->Resize({2, 3});
ctx.Wait();
auto dout = scope->Var("DOut");
auto tensor_dout = dout->GetMutable<f::LoDTensor>();
std::vector<T> dout_init;
for (int i = 0; i < 6; ++i) {
dout_init.push_back(static_cast<T>(1.0));
}
TensorFromVector(dout_init, ctx, tensor_dout);
tensor_dout->Resize({2, 3});
ctx.Wait();
auto dx = scope->Var("DX");
auto tensor_dx = dx->GetMutable<f::LoDTensor>();
ctx.Wait();
// run
f::AttributeMap attrs;
attrs = {
{"name", std::string("softmax_grad")},
{"axis", static_cast<int>(0)},
{"use_cudnn", false},
{"use_mkldnn", false},
{"mkldnn_data_type", std::string("float32")},
{"is_test", false},
{"data_format", std::string("AnyLayout")},
};
auto op = f::OpRegistry::CreateOp("softmax_grad",
{{"Out", {"Out"}}, {"Out@GRAD", {"DOut"}}},
{{"X@GRAD", {"DX"}}}, attrs);
auto place = ctx.GetPlace();
op->Run(*scope, place);
ctx.Wait();
EXPECT_EQ((uint32_t)tensor_dx->dims()[0], (uint32_t)(2));
EXPECT_EQ((uint32_t)tensor_dx->dims()[1], (uint32_t)(3));
ctx.Wait();
std::vector<float> out_vec;
TensorToVector(*tensor_dx, ctx, &out_vec);
ctx.Wait();
EXPECT_EQ((uint32_t)out_vec.size(), (uint32_t)(6));
EXPECT_NEAR((float)out_vec[0], (float)(-0.4737), 0.1);
EXPECT_NEAR((float)out_vec[1], (float)(-0.4181), 0.1);
EXPECT_NEAR((float)out_vec[2], (float)(-0.3226), 0.1);
EXPECT_NEAR((float)out_vec[3], (float)(-0.0965), 0.1);
EXPECT_NEAR((float)out_vec[4], (float)(-0.1192), 0.1);
EXPECT_NEAR((float)out_vec[5], (float)(-0.1582), 0.1);
}
TEST(softmax, NPU_fp32) {
f::Scope scope;
p::NPUDeviceContext ctx(p::NPUPlace(0));
Compare<float>(&scope, ctx);
}
TEST(softmax_grad, NPU_fp32) {
f::Scope scope;
p::NPUDeviceContext ctx(p::NPUPlace(0));
CompareGrad<float>(&scope, ctx);
}
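The unit test above only checks shapes and prints the forward output. As a host-side reference (hypothetical helper, not part of Paddle) for what `SoftmaxV2` computes along the chosen axis on one row:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Numerically stable softmax of a single row: subtract the row max before
// exponentiating, then normalize so the entries sum to 1.
std::vector<float> SoftmaxRow(const std::vector<float>& x) {
  float m = *std::max_element(x.begin(), x.end());
  std::vector<float> out(x.size());
  float sum = 0.f;
  for (size_t i = 0; i < x.size(); ++i) {
    out[i] = std::exp(x[i] - m);
    sum += out[i];
  }
  for (float& v : out) v /= sum;
  return out;
}
```

For the row `{3, 4, 5}` used by the test's input, this yields roughly `{0.090, 0.245, 0.665}`, summing to 1.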
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/operators/math/softmax.h"
#include <memory>
#include <string>
#include "paddle/fluid/operators/math/cross_entropy.h"
#include "paddle/fluid/operators/npu_op_runner.h"
#include "paddle/fluid/operators/softmax_op.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename DeviceContext, typename T>
class SoftmaxWithCrossEntropyNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* logits = ctx.Input<Tensor>("Logits");
auto* labels = ctx.Input<Tensor>("Label");
auto* softmax = ctx.Output<Tensor>("Softmax");
auto* loss = ctx.Output<Tensor>("Loss");
int cls_num = logits->dims()[1];
const int rank = logits->dims().size();
const int axis = CanonicalAxis(ctx.Attr<int>("axis"), rank);
std::vector<int> axes;
for (auto i = axis; i < logits->dims().size(); ++i) {
axes.push_back(i);
}
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
// softmax
softmax->mutable_data<T>(ctx.GetPlace());
auto runner_softmax =
NpuOpRunner("SoftmaxV2", {*logits}, {*softmax}, {{"axes", axes}});
runner_softmax.Run(stream);
// cast label from int64/int32 to int32
Tensor tmp_labels(framework::proto::VarType::INT32);
if (labels->type() != framework::proto::VarType::INT32) {
tmp_labels.Resize(labels->dims());
tmp_labels.mutable_data(ctx.GetPlace(), framework::proto::VarType::INT32);
auto dst_dtype = ConvertToNpuDtype(framework::proto::VarType::INT32);
auto runner_cast_label =
NpuOpRunner("Cast", {*labels}, {tmp_labels},
{{"dst_type", static_cast<int>(dst_dtype)}});
runner_cast_label.Run(stream);
labels = &tmp_labels;
}
// on and off
Tensor on_tensor(framework::proto::VarType::INT32);
on_tensor.mutable_data<int>({1}, ctx.GetPlace());
TensorFromVector(std::vector<int>{static_cast<int>(1)},
ctx.device_context(), &on_tensor);
Tensor off_tensor(framework::proto::VarType::INT32);
off_tensor.mutable_data<int>({1}, ctx.GetPlace());
TensorFromVector(std::vector<int>{static_cast<int>(0)},
ctx.device_context(), &off_tensor);
// one_hot
Tensor tmp_onehot(on_tensor.type());
tmp_onehot.Resize(logits->dims());
tmp_onehot.mutable_data<int>(ctx.GetPlace());
auto runner_onehot =
NpuOpRunner("OneHotD", {*labels, on_tensor, off_tensor}, {tmp_onehot},
{{"axis", -1}, {"depth", cls_num}});
runner_onehot.Run(stream);
// cast one_hot from int32 to T
Tensor cast_onehot(logits->type());
cast_onehot.Resize(tmp_onehot.dims());
cast_onehot.mutable_data<T>(ctx.GetPlace());
auto dst_dtype = ConvertToNpuDtype(logits->type());
auto runner_cast_onehot =
NpuOpRunner("Cast", {tmp_onehot}, {cast_onehot},
{{"dst_type", static_cast<int>(dst_dtype)}});
runner_cast_onehot.Run(stream);
// SoftmaxCrossEntropyWithLogits
Tensor backprop(logits->type());
backprop.Resize(logits->dims());
backprop.mutable_data<T>(ctx.GetPlace());
loss->mutable_data<T>(ctx.GetPlace());
// SoftmaxCrossEntropyWithLogits requires loss to be of shape [batch_size]
auto loss_dims = loss->dims();
loss->Resize({loss_dims[0]});
auto runner_s = NpuOpRunner("SoftmaxCrossEntropyWithLogits",
{*logits, cast_onehot}, {*loss, backprop}, {});
runner_s.Run(stream);
loss->Resize(loss_dims);
}
};
template <typename DeviceContext, typename T>
class SoftmaxWithCrossEntropyGradNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* labels = ctx.Input<Tensor>("Label");
auto* softmax = ctx.Input<Tensor>("Softmax");
auto* loss_grad = ctx.Input<Tensor>(framework::GradVarName("Loss"));
auto* logits_grad = ctx.Output<Tensor>(framework::GradVarName("Logits"));
int cls_num = softmax->dims()[1];
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
// cast label from int64/int32 to int32
Tensor tmp_labels(framework::proto::VarType::INT32);
if (labels->type() != framework::proto::VarType::INT32) {
tmp_labels.Resize(labels->dims());
tmp_labels.mutable_data(ctx.GetPlace(), framework::proto::VarType::INT32);
auto dst_dtype = ConvertToNpuDtype(framework::proto::VarType::INT32);
auto runner_cast_label =
NpuOpRunner("Cast", {*labels}, {tmp_labels},
{{"dst_type", static_cast<int>(dst_dtype)}});
runner_cast_label.Run(stream);
labels = &tmp_labels;
}
// on and off
Tensor on_tensor(framework::proto::VarType::INT32);
on_tensor.mutable_data<int>({1}, ctx.GetPlace());
TensorFromVector(std::vector<int>{static_cast<int>(1)},
ctx.device_context(), &on_tensor);
Tensor off_tensor(framework::proto::VarType::INT32);
off_tensor.mutable_data<int>({1}, ctx.GetPlace());
TensorFromVector(std::vector<int>{static_cast<int>(0)},
ctx.device_context(), &off_tensor);
// one_hot
Tensor tmp_onehot(on_tensor.type());
tmp_onehot.Resize(softmax->dims());
tmp_onehot.mutable_data<int>(ctx.GetPlace());
auto runner_onehot =
NpuOpRunner("OneHotD", {*labels, on_tensor, off_tensor}, {tmp_onehot},
{{"axis", -1}, {"depth", cls_num}});
runner_onehot.Run(stream);
// cast one_hot from int32 to T
Tensor cast_onehot(softmax->type());
cast_onehot.Resize(tmp_onehot.dims());
cast_onehot.mutable_data<T>(ctx.GetPlace());
auto dst_dtype = ConvertToNpuDtype(softmax->type());
auto runner_cast_onehot =
NpuOpRunner("Cast", {tmp_onehot}, {cast_onehot},
{{"dst_type", static_cast<int>(dst_dtype)}});
runner_cast_onehot.Run(stream);
// sub
Tensor tmp_sub(softmax->type());
tmp_sub.Resize(softmax->dims());
tmp_sub.mutable_data<T>(ctx.GetPlace());
auto runner_sub =
NpuOpRunner("Sub", {*softmax, cast_onehot}, {tmp_sub}, {});
runner_sub.Run(stream);
// mul
logits_grad->mutable_data<T>(ctx.GetPlace());
auto runner_mul =
NpuOpRunner("Mul", {*loss_grad, tmp_sub}, {*logits_grad}, {});
runner_mul.Run(stream);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
softmax_with_cross_entropy,
ops::SoftmaxWithCrossEntropyNPUKernel<paddle::platform::NPUDeviceContext,
float>,
ops::SoftmaxWithCrossEntropyNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
REGISTER_OP_NPU_KERNEL(
softmax_with_cross_entropy_grad,
ops::SoftmaxWithCrossEntropyGradNPUKernel<
paddle::platform::NPUDeviceContext, float>,
ops::SoftmaxWithCrossEntropyGradNPUKernel<
paddle::platform::NPUDeviceContext, paddle::platform::float16>);
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifdef PADDLE_WITH_ASCEND_CL
#include <memory>
#include <string>
#include "paddle/fluid/operators/npu_op_runner.h"
#include "paddle/fluid/operators/squeeze_op.h"
namespace ops = paddle::operators;
namespace plat = paddle::platform;
REGISTER_OP_NPU_KERNEL(
squeeze, ops::SqueezeKernel<plat::NPUDeviceContext, float>,
ops::SqueezeKernel<plat::NPUDeviceContext, double>,
ops::SqueezeKernel<plat::NPUDeviceContext, plat::float16>,
ops::SqueezeKernel<plat::NPUDeviceContext, bool>,
ops::SqueezeKernel<plat::NPUDeviceContext, int>,
ops::SqueezeKernel<plat::NPUDeviceContext, uint8_t>,
ops::SqueezeKernel<plat::NPUDeviceContext, int8_t>,
ops::SqueezeKernel<plat::NPUDeviceContext, int64_t>);
REGISTER_OP_NPU_KERNEL(
squeeze2, ops::SqueezeKernel<plat::NPUDeviceContext, float>,
ops::SqueezeKernel<plat::NPUDeviceContext, double>,
ops::SqueezeKernel<plat::NPUDeviceContext, plat::float16>,
ops::SqueezeKernel<plat::NPUDeviceContext, bool>,
ops::SqueezeKernel<plat::NPUDeviceContext, int>,
ops::SqueezeKernel<plat::NPUDeviceContext, uint8_t>,
ops::SqueezeKernel<plat::NPUDeviceContext, int8_t>,
ops::SqueezeKernel<plat::NPUDeviceContext, int64_t>);
#endif
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifndef _WIN32
#include <unistd.h>
#endif
#include <string>
#include <thread> // NOLINT
#include <vector>
#include "gtest/gtest.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/operators/dropout_op.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/string/printf.h"
namespace f = paddle::framework;
namespace p = paddle::platform;
namespace m = paddle::operators::math;
USE_OP(squeeze);
USE_OP_DEVICE_KERNEL(squeeze, NPU);
template <typename T>
void Compare(f::Scope* scope, const p::DeviceContext& ctx) {
// init
auto x = scope->Var("X");
auto tensor_x = x->GetMutable<f::LoDTensor>();
int dim0 = 1;
int dim1 = 10;
int dim2 = 1;
std::vector<T> init;
for (int64_t i = 0; i < dim0 * dim1 * dim2; ++i) {
init.push_back(static_cast<T>(0.1));
}
TensorFromVector(init, ctx, tensor_x);
tensor_x->Resize({dim0, dim1, dim2});
ctx.Wait();
// run
auto place = ctx.GetPlace();
auto out = scope->Var("Out");
auto tensor_out = out->GetMutable<f::LoDTensor>();
std::vector<int> axis;
axis.push_back(2);
f::AttributeMap attrs = {{"axes", axis}};
auto op = f::OpRegistry::CreateOp("squeeze", {{"X", {"X"}}},
{{"Out", {"Out"}}}, attrs);
op->Run(*scope, place);
ctx.Wait();
EXPECT_EQ((uint32_t)tensor_out->dims().size(), uint32_t(2));
EXPECT_EQ((uint32_t)tensor_out->dims()[0], uint32_t(dim0));
EXPECT_EQ((uint32_t)tensor_out->dims()[1], uint32_t(dim1));
std::vector<T> out_vec;
TensorToVector(*tensor_out, ctx, &out_vec);
for (uint32_t i = 0; i < out_vec.size(); i++) {
EXPECT_EQ(out_vec[i], static_cast<T>(0.1));
}
ctx.Wait();
}
TEST(squeeze, NPU_fp32) {
f::Scope scope;
p::NPUDeviceContext ctx(p::NPUPlace(0));
Compare<float>(&scope, ctx);
}
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifdef PADDLE_WITH_ASCEND_CL
#include <memory>
#include <string>
#include <vector>
#include "paddle/fluid/operators/activation_op.h"
#include "paddle/fluid/operators/npu_op_runner.h"
#include "paddle/fluid/operators/stack_op.h"
#include "paddle/fluid/operators/unsqueeze_op.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename DeviceContext, typename T>
class StackNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto x = ctx.MultiInput<Tensor>("X");
int32_t N = x.size();
PADDLE_ENFORCE_GT(
N, 0, platform::errors::InvalidArgument(
"The number of input tensors must be greater than 0."));
std::vector<paddle::framework::Tensor> x_list;
for (int i = 0; i < N; i++) {
x_list.push_back(*x[i]);
}
int axis = ctx.Attr<int>("axis");
if (axis < 0) {
axis = axis + x_list[0].dims().size() + 1;
}
auto* out = ctx.Output<Tensor>("Y");
auto place = ctx.GetPlace();
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
out->mutable_data<T>(place);
if (axis != 0) {
auto x_dim = x_list[0].dims();
std::vector<int> vec_dim_tmp;
vec_dim_tmp.push_back(N);
for (auto i = 0; i < x_dim.size(); ++i) {
vec_dim_tmp.push_back(x_dim[i]);
}
Tensor tmp_stack(out->type());
tmp_stack.Resize(framework::make_ddim(vec_dim_tmp));
tmp_stack.mutable_data<T>(ctx.GetPlace());
auto runner =
NpuOpRunner("Pack", {x_list}, {tmp_stack}, {{"axis", 0}, {"N", N}});
runner.Run(stream);
std::vector<int64_t> vec_trans;
for (auto i = 1; i <= x_dim.size(); ++i) {
vec_trans.push_back(i);
if (i == axis) {
vec_trans.push_back(0);
}
}
auto runner_trans_final =
NpuOpRunner("TransposeD", {tmp_stack}, {*out}, {{"perm", vec_trans}});
runner_trans_final.Run(stream);
} else {
auto runner =
NpuOpRunner("Pack", {x_list}, {*out}, {{"axis", axis}, {"N", N}});
runner.Run(stream);
}
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
stack, ops::StackNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::StackNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
#endif
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <memory>
#include <string>
#include <vector>
#include "paddle/fluid/operators/npu_op_runner.h"
#include "paddle/fluid/operators/sum_op.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename DeviceContext, typename T>
class SumNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto x = ctx.MultiInput<Tensor>("X");
auto* out = ctx.Output<Tensor>("Out");
out->mutable_data<T>(ctx.GetPlace());
auto place = ctx.GetPlace();
int n = static_cast<int>(x.size());
PADDLE_ENFORCE_EQ(n > 1, true,
platform::errors::InvalidArgument(
"The size of Input(X) list must be larger than or equal to 2."));
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
auto runner = NpuOpRunner("Add", {*x[0], *x[1]}, {*out}, {});
runner.Run(stream);
for (int i = 2; i < n; i++) {
runner = NpuOpRunner("Add", {*out, *x[i]}, {*out}, {});
runner.Run(stream);
}
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
sum, ops::SumNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::SumNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <memory>
#include <string>
#include "paddle/fluid/operators/npu_op_runner.h"
#include "paddle/fluid/operators/top_k_op.h"
namespace paddle {
namespace operators {
void gen_assist_seq(framework::Tensor* assit_tensor, int64_t dim,
const framework::ExecutionContext& ctx) {
const int64_t dimx2 = dim;
std::vector<paddle::platform::float16> assit;
assit.resize(2 * dimx2);
for (int64_t i = 0; i < dimx2; i++) {
// for i in range [0, dim]
assit[i] = static_cast<paddle::platform::float16>(i);
// for i in range [dim, dimx2]
int64_t idx =
static_cast<int64_t>(static_cast<paddle::platform::float16>(i));
int64_t gap = i - idx;
assit[i + dim] = static_cast<paddle::platform::float16>(gap);
}
framework::TensorFromVector(assit, ctx.device_context(), assit_tensor);
}
template <typename DeviceContext, typename T>
class TopkNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
// read input
auto* input = ctx.Input<framework::LoDTensor>("X");
auto* output = ctx.Output<framework::LoDTensor>("Out");
auto* indices = ctx.Output<framework::LoDTensor>("Indices");
size_t k = static_cast<int>(ctx.Attr<int>("k"));
output->mutable_data<T>(ctx.GetPlace());
indices->mutable_data<int>(ctx.GetPlace());
// prepare assit
auto dim = input->dims().size();
framework::Tensor assist_seq_tensor;
assist_seq_tensor.Resize({2 * dim});
assist_seq_tensor.mutable_data<T>(ctx.GetPlace());
gen_assist_seq(&assist_seq_tensor, dim, ctx);
framework::NPUAttributeMap attr_input = {{"sorted", "true"},
{"k", static_cast<int>(k)},
{"dim", -1},
{"largest", true}};
// run ascend
auto runner = NpuOpRunner("TopKD", {*input, assist_seq_tensor},
{*output, *indices}, attr_input);
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
runner.Run(stream);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
// The Ascend TopKD op only supports float16 input.
REGISTER_OP_NPU_KERNEL(top_k,
ops::TopkNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>);
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <iostream>
#include <memory>
#include <string>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/operators/expand_op.h"
#include "paddle/fluid/operators/npu_op_runner.h"
namespace paddle {
namespace operators {
template <typename DeviceContext, typename T>
class TransposeNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* x = ctx.Input<framework::LoDTensor>("X");
auto* out = ctx.Output<framework::LoDTensor>("Out");
std::vector<int> axis = ctx.Attr<std::vector<int>>("axis");
framework::NPUAttributeMap attr_input = {{"perm", axis}};
out->mutable_data<T>(ctx.device_context().GetPlace());
auto runner = NpuOpRunner("TransposeD", {*x}, {*out}, attr_input);
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
runner.Run(stream);
}
};
template <typename T>
class TransposeGradNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
auto* out_grad =
ctx.Input<framework::LoDTensor>(framework::GradVarName("Out"));
auto* x_grad =
ctx.Output<framework::LoDTensor>(framework::GradVarName("X"));
std::vector<int> axis = ctx.Attr<std::vector<int>>("axis");
std::vector<int> reversed_axis(axis);
for (size_t i = 0; i < axis.size(); i++) {
reversed_axis[axis[i]] = i;
}
x_grad->mutable_data<T>(ctx.GetPlace());
framework::NPUAttributeMap attr_input = {{"perm", reversed_axis}};
auto runner = NpuOpRunner("TransposeD", {*out_grad}, {*x_grad}, attr_input);
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
runner.Run(stream);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(
transpose2,
ops::TransposeNPUKernel<paddle::platform::NPUDeviceContext, float>,
ops::TransposeNPUKernel<paddle::platform::NPUDeviceContext,
paddle::platform::float16>,
ops::TransposeNPUKernel<paddle::platform::NPUDeviceContext, int>,
ops::TransposeNPUKernel<paddle::platform::NPUDeviceContext, uint8_t>,
ops::TransposeNPUKernel<paddle::platform::NPUDeviceContext, int8_t>);
REGISTER_OP_NPU_KERNEL(transpose2_grad, ops::TransposeGradNPUKernel<float>,
ops::TransposeGradNPUKernel<paddle::platform::float16>,
ops::TransposeGradNPUKernel<int>,
ops::TransposeGradNPUKernel<uint8_t>,
ops::TransposeGradNPUKernel<int8_t>);
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifndef _WIN32
#include <unistd.h>
#endif
#include <cmath>
#include <iostream>
#include <numeric>
#include <string>
#include <thread> // NOLINT
#include <vector>
#include "gtest/gtest.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/operators/dropout_op.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/string/printf.h"
namespace f = paddle::framework;
namespace p = paddle::platform;
namespace m = paddle::operators::math;
USE_OP(transpose2);
USE_OP_DEVICE_KERNEL(transpose2, NPU);
template <typename T>
void Compare(f::Scope* scope, const p::DeviceContext& ctx) {
// init
auto x = scope->Var("X");
auto out = scope->Var("Out");
auto xshape = scope->Var("XShape");
auto* x_t = x->GetMutable<f::LoDTensor>();
auto* out_t = out->GetMutable<f::LoDTensor>();
auto* xshape_t = xshape->GetMutable<f::LoDTensor>();
auto place = ctx.GetPlace();
int dim0 = 2;
int dim1 = 3;
TensorFromVector(std::vector<T>({0, 1, 2, 3, 4, 5}), ctx, x_t);
ctx.Wait();
x_t->Resize({dim0, dim1});
out_t->Resize({dim0, dim1});
ctx.Wait();
out_t->mutable_data<T>(place);
ctx.Wait();
xshape_t->Resize({dim0, dim1});
xshape_t->mutable_data<T>(place);
f::AttributeMap attrs = {{"axis", std::vector<int>({1, 0})},
{"data_format", std::string("AnyLayout")}};
auto op = f::OpRegistry::CreateOp("transpose2", {{"X", {"X"}}},
{{"Out", {"Out"}}, {"XShape", {"XShape"}}},
attrs);
ctx.Wait();
op->Run(*scope, place);
ctx.Wait();
std::vector<T> out_v;
TensorToVector(*out_t, ctx, &out_v);
ctx.Wait();
EXPECT_EQ(out_t->numel(), dim0 * dim1);
EXPECT_EQ(out_v[0], 0);
EXPECT_EQ(out_v[1], 3);
EXPECT_EQ(out_v[2], 1);
EXPECT_EQ(out_v[3], 4);
EXPECT_EQ(out_v[4], 2);
EXPECT_EQ(out_v[5], 5);
}
template <typename T>
void CompareGrad(f::Scope* scope, const p::DeviceContext& ctx) {
// init
auto xshape = scope->Var("XShape");
auto x_grad = scope->Var("X@GRAD");
auto out_grad = scope->Var("Out@GRAD");
auto* x_grad_t = x_grad->GetMutable<f::LoDTensor>();
auto* xshape_t = xshape->GetMutable<f::LoDTensor>();
auto* out_grad_t = out_grad->GetMutable<f::LoDTensor>();
int dim0 = 2;
int dim1 = 3;
auto place = ctx.GetPlace();
TensorFromVector(std::vector<T>({0, 1, 2, 3, 4, 5}), ctx, out_grad_t);
ctx.Wait();
x_grad_t->Resize({dim0, dim1});
xshape_t->Resize(
{0, dim0,
dim1}); // NOTE(zhiqiu): 0 is needed, see its infershape function
out_grad_t->Resize({dim0, dim1});
f::AttributeMap attrs = {{"axis", std::vector<int>({1, 0})},
{"data_format", std::string("AnyLayout")}};
auto op = f::OpRegistry::CreateOp(
"transpose2_grad", {{"Out@GRAD", {"Out@GRAD"}}, {"XShape", {"XShape"}}},
{{"X@GRAD", {"X@GRAD"}}}, attrs);
op->Run(*scope, place);
ctx.Wait();
std::vector<T> out_v;
TensorToVector(*x_grad_t, ctx, &out_v);
ctx.Wait();
EXPECT_EQ(x_grad_t->numel(), dim0 * dim1);
EXPECT_EQ(out_v[0], 0);
EXPECT_EQ(out_v[1], 3);
EXPECT_EQ(out_v[2], 1);
EXPECT_EQ(out_v[3], 4);
EXPECT_EQ(out_v[4], 2);
EXPECT_EQ(out_v[5], 5);
}
TEST(transpose2, NPU_fp32) {
f::Scope scope;
p::NPUDeviceContext ctx(p::NPUPlace(0));
Compare<float>(&scope, ctx);
}
TEST(transpose2_grad, NPU_fp32) {
f::Scope scope;
p::NPUDeviceContext ctx(p::NPUPlace(0));
CompareGrad<float>(&scope, ctx);
}
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/operators/truncated_gaussian_random_op.h"
#include <memory>
#include <string>
#include "paddle/fluid/operators/npu_op_runner.h"
namespace paddle {
namespace operators {
using Tensor = framework::Tensor;
template <typename DeviceContext, typename T>
class TruncatedGaussianRandomNPUKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
// TODO(zhiqiu): support dynamic shape and call ParameterizedTruncatedNormal
std::vector<int> shape = ctx.Attr<std::vector<int>>("shape");
Tensor shape_tensor(framework::proto::VarType::INT32);
shape_tensor.mutable_data<int32_t>({static_cast<int>(shape.size())},
ctx.GetPlace());
TensorFromVector(shape, ctx.device_context(), &shape_tensor);
float mean = ctx.Attr<float>("mean");
Tensor mean_tensor(framework::proto::VarType::FP32);
mean_tensor.mutable_data<float>({1}, ctx.GetPlace());
TensorFromVector(std::vector<float>{mean}, ctx.device_context(),
&mean_tensor);
float std = ctx.Attr<float>("std");
Tensor std_tensor(framework::proto::VarType::FP32);
std_tensor.mutable_data<float>({1}, ctx.GetPlace());
TensorFromVector(std::vector<float>{std}, ctx.device_context(),
&std_tensor);
int32_t seed_var = ctx.Attr<int32_t>("seed");
Tensor min_tensor(framework::proto::VarType::FP32);
min_tensor.mutable_data<float>({1}, ctx.GetPlace());
float min_value = mean - std * 2.0;
TensorFromVector(std::vector<float>{min_value}, ctx.device_context(),
&min_tensor);
Tensor max_tensor(framework::proto::VarType::FP32);
max_tensor.mutable_data<float>({1}, ctx.GetPlace());
float max_value = mean + std * 2.0;
TensorFromVector(std::vector<float>{max_value}, ctx.device_context(),
&max_tensor);
auto* out = ctx.Output<framework::Tensor>("Out");
out->mutable_data<T>(ctx.GetPlace());
auto stream =
ctx.template device_context<paddle::platform::NPUDeviceContext>()
.stream();
auto runner = NpuOpRunner(
"ParameterizedTruncatedNormal",
{shape_tensor, mean_tensor, std_tensor, min_tensor, max_tensor}, {*out},
{{"seed", seed_var}});
runner.Run(stream);
}
};
// NOTE(zhiqiu): actually, this is a cpu version kernel, and we need to make
// the npu version above work in the future.
template <typename T>
class NPUTruncatedGaussianRandomKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& context) const override {
float mean = context.Attr<float>("mean");
float std = context.Attr<float>("std");
auto* tensor = context.Output<framework::Tensor>("Out");
tensor->mutable_data<T>(context.GetPlace());
Tensor cpu_tensor(tensor->type());
cpu_tensor.Resize(tensor->dims());
T* cpu_data = cpu_tensor.mutable_data<T>(platform::CPUPlace());
std::uniform_real_distribution<T> dist(std::numeric_limits<float>::min(),
1.0);
TruncatedNormal<T> truncated_normal(mean, std);
int64_t size = tensor->numel();
unsigned int seed = static_cast<unsigned int>(context.Attr<int>("seed"));
auto engine = framework::GetCPURandomEngine(seed);
for (int64_t i = 0; i < size; ++i) {
cpu_data[i] = truncated_normal(dist(*engine));
}
framework::TensorCopy(
cpu_tensor, context.GetPlace(),
context.template device_context<platform::DeviceContext>(), tensor);
context.template device_context<paddle::platform::NPUDeviceContext>()
.Wait();
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
REGISTER_OP_NPU_KERNEL(truncated_gaussian_random,
ops::NPUTruncatedGaussianRandomKernel<float>);
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifdef PADDLE_WITH_ASCEND_CL
#include <memory>
#include <string>
#include "paddle/fluid/operators/npu_op_runner.h"
#include "paddle/fluid/operators/unsqueeze_op.h"
namespace ops = paddle::operators;
namespace plat = paddle::platform;
REGISTER_OP_NPU_KERNEL(
unsqueeze, ops::UnsqueezeKernel<plat::NPUDeviceContext, float>,
ops::UnsqueezeKernel<plat::NPUDeviceContext, double>,
ops::UnsqueezeKernel<plat::NPUDeviceContext, plat::float16>,
ops::UnsqueezeKernel<plat::NPUDeviceContext, bool>,
ops::UnsqueezeKernel<plat::NPUDeviceContext, int>,
ops::UnsqueezeKernel<plat::NPUDeviceContext, int8_t>,
ops::UnsqueezeKernel<plat::NPUDeviceContext, int64_t>);
REGISTER_OP_NPU_KERNEL(
unsqueeze2, ops::UnsqueezeKernel<plat::NPUDeviceContext, float>,
ops::UnsqueezeKernel<plat::NPUDeviceContext, double>,
ops::UnsqueezeKernel<plat::NPUDeviceContext, plat::float16>,
ops::UnsqueezeKernel<plat::NPUDeviceContext, bool>,
ops::UnsqueezeKernel<plat::NPUDeviceContext, int>,
ops::UnsqueezeKernel<plat::NPUDeviceContext, int8_t>,
ops::UnsqueezeKernel<plat::NPUDeviceContext, int64_t>);
#endif
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifndef _WIN32
#include <unistd.h>
#endif
#include <string>
#include <thread> // NOLINT
#include <vector>
#include "gtest/gtest.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/operator.h"
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/operators/dropout_op.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/string/printf.h"
namespace f = paddle::framework;
namespace p = paddle::platform;
namespace m = paddle::operators::math;
USE_OP(unsqueeze);
USE_OP_DEVICE_KERNEL(unsqueeze, NPU);
template <typename T>
void Compare(f::Scope* scope, const p::DeviceContext& ctx) {
// init
auto x = scope->Var("X");
auto tensor_x = x->GetMutable<f::LoDTensor>();
int dim0 = 5;
int dim1 = 10;
std::vector<T> init;
for (int64_t i = 0; i < dim0 * dim1; ++i) {
init.push_back(static_cast<T>(0.1));
}
TensorFromVector(init, ctx, tensor_x);
tensor_x->Resize({dim0, dim1});
ctx.Wait();
// run
auto place = ctx.GetPlace();
auto out = scope->Var("Out");
auto tensor_out = out->GetMutable<f::LoDTensor>();
std::vector<int> axis;
axis.push_back(1);
f::AttributeMap attrs = {{"axes", axis}};
auto op = f::OpRegistry::CreateOp("unsqueeze", {{"X", {"X"}}},
{{"Out", {"Out"}}}, attrs);
op->Run(*scope, place);
ctx.Wait();
EXPECT_EQ((uint32_t)tensor_out->dims().size(), uint32_t(3));
EXPECT_EQ((uint32_t)tensor_out->dims()[0], uint32_t(5));
EXPECT_EQ((uint32_t)tensor_out->dims()[1], uint32_t(1));
EXPECT_EQ((uint32_t)tensor_out->dims()[2], uint32_t(10));
std::vector<T> out_vec;
TensorToVector(*tensor_out, ctx, &out_vec);
for (uint32_t i = 0; i < out_vec.size(); i++) {
EXPECT_EQ(out_vec[i], static_cast<T>(0.1));
}
ctx.Wait();
}
TEST(unsqueeze, NPU_fp32) {
f::Scope scope;
p::NPUDeviceContext ctx(p::NPUPlace(0));
Compare<float>(&scope, ctx);
}
@@ -163,8 +163,11 @@ size_t NPUInitAllocSize() { return NPUAllocSize(/* realloc = */ false); }

 size_t NPUReallocSize() { return NPUAllocSize(/* realloc = */ true); }

 size_t NPUMinChunkSize() {
-  // Allow to allocate the minimum chunk size is 256 bytes.
-  return 1 << 8;
+  // NOTE(zhiqiu): It seems the min chunk size should be 512 on NPU,
+  // though no document specify that explicitly.
+  // See https://gitee.com/zhiqiuchen/Ascend/tree/master/test_reduce_sum_d for
+  // details.
+  return 1 << 9;
 }

 size_t NPUMaxChunkSize() {
@@ -82,24 +82,25 @@ def _get_ascend_rankfile(rank_table_file_path):

 def get_cloud_cluster(rank_table_file=None,
                       device_mode=DeviceMode.ASCEND_NPU,
-                      devices_per_proc=None,
                       start_port=6070):
     """
     Args:
         rank_table_file: string, ascend npu rank file path
         device_mode: DeviceMode(Int)
-        devices_per_proc:list
         start_port: the start port of current runtime env
     """
     if rank_table_file:
         # multi trainers
         node_ips, device_count = _get_ascend_rankfile(rank_table_file)
-        node_index = os.environ.get("PADDLE_TRAINER_ID")
-        node_ip = None
-        if node_index is None:
-            _, node_ip = get_host_name_ip()
+        if len(node_ips) == 1:
+            node_ip = node_ips[0]
         else:
-            node_ip = node_ips[int(node_index)]
+            node_index = os.environ.get("PADDLE_TRAINER_ID")
+            node_ip = None
+            if node_index:
+                node_ip = node_ips[int(node_index)]
+            else:
+                _, node_ip = get_host_name_ip()

         assert node_ip in node_ips, "Can't find your local ip {%s} in node_ips: {%s}" \
             % (node_ip, node_ips)

@@ -108,11 +109,8 @@ def get_cloud_cluster(rank_table_file=None,
         node_ips = ["127.0.0.1"]
         node_ip = node_ips[0]
         device_count = 1
-        devices_per_proc = None

-    if devices_per_proc is None:
-        devices_per_proc = [str(x) for x in range(device_count)]
+    devices_per_proc = [str(x) for x in range(device_count)]

     free_ports = [
         x for x in range(start_port, start_port + len(devices_per_proc))
     ]
@@ -115,15 +115,6 @@ see: http://www.paddlepaddle.org/documentation/docs/zh/1.6/user_guides/howto/tra
         default="collective",
         help="run mode of job, can be:collective/ps/ps-heter")

-    base_group.add_argument(
-        "--ascend_npus",
-        type=str,
-        default=None,
-        help="It's for ascend npu training."
-        "For example:"
-        "--ascend_npus=\"0,1,2,3\" will launch four training processes each bound to one npu."
-    )
-
     if fluid.core.is_compiled_with_cuda():
         base_group.add_argument(
             "--gpus",

@@ -243,7 +234,6 @@ def launch_collective(args):
         cluster, pod = ascend_utils.get_cloud_cluster(
             rank_table_file=os.getenv("RANK_TABLE_FILE", None),
             device_mode=device_mode,
-            devices_per_proc=devices_per_proc,
             start_port=start_port)
     else:
         # trainers_num = 1 or not use paddlecloud ips="a,b"
@@ -484,6 +484,11 @@ def start_local_trainers(cluster,
             proc_env["FLAGS_selected_gpus"] = "%s" % ",".join(
                 [str(g) for g in t.accelerators])
+        elif len(t.
+                 accelerators) > 0 and pod.device_mode == DeviceMode.ASCEND_NPU:
+            proc_env["FLAGS_selected_npus"] = "%s" % ",".join(
+                [str(g) for g in t.accelerators])
+
         if len(t.accelerators) > 0:
             proc_env["FLAGS_selected_accelerators"] = "%s" % ",".join(
                 [str(g) for g in t.accelerators])

@@ -589,17 +594,6 @@ def watch_local_trainers(procs, nranks):
     return alive

-def get_ascend_npus(npus):
-    if npus is None:
-        count = fluid.core.NPUDevice.get_device_count()
-        if count <= 0:
-            return None
-        ret = [str(x) for x in range(count)]
-    else:
-        ret = [x.strip() for x in npus.split(',')]
-    return ret
-
 def get_gpus(gpus):
     if gpus is None:
         gpus_num = fluid.core.get_cuda_device_count()

@@ -697,9 +691,7 @@ def get_device_proc_info(args):
         else:
             devices_per_proc = gpus
     elif device_mode == DeviceMode.ASCEND_NPU:
-        npus = get_ascend_npus(args.ascend_npus)
-        assert args.nproc_per_node is None, "ascend_npus need't nproc_per_node arguments"
-        devices_per_proc = npus
+        devices_per_proc = None
     elif device_mode == DeviceMode.XPU:
         xpus = get_xpus(args.xpus)
         if args.nproc_per_node is not None:
@@ -9518,8 +9518,8 @@ def pow(x, factor=1.0, name=None):
             y_2 = fluid.layers.pow(x, factor=factor_tensor)
             # y_2 is x^{3.0}
     """
-    check_variable_and_dtype(x, 'x', ['int32', 'int64', 'float32', 'float64'],
-                             'pow')
+    check_variable_and_dtype(
+        x, 'x', ['int32', 'int64', 'float16', 'float32', 'float64'], 'pow')

     helper = LayerHelper('pow', **locals())
     inputs = {'X': x}
@@ -531,7 +531,7 @@ if(WITH_DISTRIBUTE)
         bash_test_modules(test_fleet_launch_async START_BASH test_fleet_launch_async.sh ENVS PADDLE_BINARY_DIR=${PADDLE_BINARY_DIR})
         bash_test_modules(test_fleet_launch_cloud START_BASH test_fleet_launch_cloud.sh ENVS PADDLE_BINARY_DIR=${PADDLE_BINARY_DIR})
         bash_test_modules(test_fleet_launch_nproc START_BASH test_fleet_launch_nproc.sh ENVS PADDLE_BINARY_DIR=${PADDLE_BINARY_DIR})
-        if(WITH_ASCEND)
+        if(WITH_ASCEND OR WITH_ASCEND_CL)
             bash_test_modules(test_fleet_launch_ascend START_BASH test_fleet_launch_ascend.sh ENVS PADDLE_BINARY_DIR=${PADDLE_BINARY_DIR})
             bash_test_modules(test_ascend_group START_BASH test_ascend_group.sh ENVS PADDLE_BINARY_DIR=${PADDLE_BINARY_DIR})
         endif()
...
@@ -71,6 +71,24 @@ def init_communicator(startup_program, main_program, current_endpoint,
             OP_ROLE_KEY: OpRole.Forward,
         })
+    # add input op for test
+    fill_var_name = "tensor@Filled"
+    fill_var = block.create_var(
+        name=fill_var_name,
+        shape=[10, 10],
+        dtype='float32',
+        persistable=False,
+        stop_gradient=True)
+    block.append_op(
+        type="fill_constant",
+        outputs={"Out": fill_var_name},
+        attrs={
+            "shape": [10, 10],
+            "dtype": fill_var.dtype,
+            "value": 1.0,
+            "place_type": 1
+        })
     with fluid.program_guard(main_program):
         op_type = "c_allreduce_sum"
         data = fluid.layers.fill_constant(shape=[1], dtype='float32', value=2.5)
@@ -120,10 +138,14 @@ def train(world_endpoints, world_device_ids, local_device_ids, local_rank):
     main_program = main_programs[local_rank]
     loss = Loss(Block(main_program))
     optimizer = ascend_optimizer.AscendOptimizer(None, fetch_list=[])
-    optimizer.minimize(loss, startup_program, auto_dp=True)
+    optimizer.minimize(
+        loss,
+        startup_program,
+        auto_dp=True,
+        rank_table_file=os.getenv("RANK_TABLE_FILE", None))
     exe = paddle.static.Executor(paddle.CPUPlace())
-    #exe.run(startup_program)
+    exe.run(startup_program)
     exe.run(main_program)
...
@@ -19,6 +19,7 @@ import time
 def train(prefix):
     selected_accelerators = os.getenv("FLAGS_selected_accelerators")
+    selected_npus = os.getenv("FLAGS_selected_npus")
     trainer_id = int(os.getenv("PADDLE_TRAINER_ID"))
     worker_endpoints_env = os.getenv("PADDLE_TRAINER_ENDPOINTS")
     current_endpoint = os.getenv("PADDLE_CURRENT_ENDPOINT")
@@ -27,8 +28,8 @@ def train(prefix):
     device_ids = os.getenv("PADDLE_WORLD_DEVICE_IDS")
     current_device_id = os.getenv("PADDLE_LOCAL_DEVICE_IDS")
-    details = "selected_accelerators:{} worker_endpoints:{} trainers_num:{} current_endpoint:{} trainer_id:{} device_ids:{} device_id:{}"\
-        .format(selected_accelerators, worker_endpoints, trainers_num, current_endpoint, trainer_id, device_ids, current_device_id)
+    details = "selected_accelerators:{} selected_npus:{} worker_endpoints:{} trainers_num:{} current_endpoint:{} trainer_id:{} device_ids:{} device_id:{}"\
+        .format(selected_accelerators, selected_npus, worker_endpoints, trainers_num, current_endpoint, trainer_id, device_ids, current_device_id)
     print(details)
     with open("multi_process_{}.check_{}.log".format(prefix, trainer_id),
...
# -*- coding:UTF-8 -*-
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""generate hccl config file script"""
import os
import sys
import json
import socket
from argparse import ArgumentParser
def parse_args():
"""
parse args .
Args:
Returns:
args.
Examples:
>>> parse_args()
"""
    parser = ArgumentParser(description="mindspore distributed training launch "
                            "helper utility that will generate hccl"
                            " config file")
parser.add_argument(
"--device_num",
type=str,
default="[0,8)",
help="The number of the Ascend accelerators used. please note that the Ascend accelerators"
"used must be continuous, such [0,4) means to use four chips "
"0,1,2,3; [0,1) means to use chip 0; The first four chips are"
"a group, and the last four chips are a group. In addition to"
"the [0,8) chips are allowed, other cross-group such as [3,6)"
"are prohibited.")
parser.add_argument(
"--visible_devices",
type=str,
default="0,1,2,3,4,5,6,7",
help="will use the visible devices sequentially")
parser.add_argument("--server_ip", type=str, default="", help="server ip")
args = parser.parse_args()
return args
def get_host_ip():
"""
get host ip
"""
ip = None
    try:
        hostname = socket.gethostname()
        ip = socket.gethostbyname(hostname)
    except socket.gaierror:
        # the hostname could not be resolved; the caller falls back to --server_ip
        pass
return ip
def main():
print("start", __file__)
args = parse_args()
# visible_devices
visible_devices = args.visible_devices.split(',')
print('visible_devices:{}'.format(visible_devices))
# server_id
ip = get_host_ip()
if args.server_ip:
server_id = args.server_ip
elif ip:
server_id = ip
else:
raise ValueError("please input server ip!")
print('server_id:{}'.format(server_id))
# device_num
first_num = int(args.device_num[1])
last_num = int(args.device_num[3])
if first_num < 0 or last_num > 8:
raise ValueError("device num {} must be in range [0,8] !".format(
args.device_num))
if first_num > last_num:
        raise ValueError(
            "First num {} of device num {} must be less than last num {} !".
            format(first_num, args.device_num, last_num))
if first_num < 4:
if last_num > 4:
if first_num == 0 and last_num == 8:
pass
else:
raise ValueError(
"device num {} must be in the same group of [0,4] or [4,8] !".
format(args.device_num))
device_num_list = list(range(first_num, last_num))
print("device_num_list:", device_num_list)
assert len(visible_devices) >= len(device_num_list)
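The `--device_num` interval above is read positionally (characters 1 and 3 of the string), which only works for the single-digit bounds the script allows. A minimal sketch of that parsing, using a hypothetical helper name for illustration:

```python
def parse_device_interval(spec):
    # Positional parse mirroring the script: spec like "[0,8)" with
    # single-digit bounds; returns the half-open range of device ids.
    first, last = int(spec[1]), int(spec[3])
    return list(range(first, last))

print(parse_device_interval("[0,4)"))  # devices 0,1,2,3
```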
# construct hccn_table
device_ips = {}
with open('/etc/hccn.conf', 'r') as fin:
for hccn_item in fin.readlines():
if hccn_item.strip().startswith('address_'):
device_id, device_ip = hccn_item.split('=')
device_id = device_id.split('_')[1]
device_ips[device_id] = device_ip.strip()
hccn_table = {'version': '1.0', 'server_count': '1', 'server_list': []}
device_list = []
rank_id = 0
for instance_id in device_num_list:
device_id = visible_devices[instance_id]
device_ip = device_ips[device_id]
device = {
'device_id': device_id,
'device_ip': device_ip,
'rank_id': str(rank_id)
}
print('rank_id:{}, device_id:{}, device_ip:{}'.format(
rank_id, device_id, device_ip))
rank_id += 1
device_list.append(device)
hccn_table['server_list'].append({
'server_id': server_id,
'device': device_list,
'host_nic_ip': 'reserve'
})
hccn_table['status'] = 'completed'
# save hccn_table to file
table_path = os.getcwd()
table_fn = os.path.join(table_path, 'hccl_{}p_{}_{}.json'.format(
len(device_num_list), "".join(map(str, device_num_list)), server_id))
with open(table_fn, 'w') as table_fp:
json.dump(hccn_table, table_fp, indent=4)
sys.stdout.flush()
print("Completed: hccl file was save in :", table_fn)
if __name__ == "__main__":
main()
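Taken together, the script emits a single-server rank table. A minimal sketch of the JSON shape it produces, with a made-up server ID and device IPs for illustration:

```python
import json

# Hypothetical two-device table matching the structure built by main().
hccn_table = {
    'version': '1.0',
    'server_count': '1',
    'server_list': [{
        'server_id': '10.0.0.1',
        'device': [
            {'device_id': '0', 'device_ip': '192.168.100.101', 'rank_id': '0'},
            {'device_id': '1', 'device_ip': '192.168.100.102', 'rank_id': '1'},
        ],
        'host_nic_ip': 'reserve',
    }],
    'status': 'completed',
}
print(json.dumps(hccn_table, indent=4))
```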
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestAccuracy(OpTest):
def setUp(self):
self.op_type = "accuracy"
self.set_npu()
self.init_dtype()
np.random.seed(SEED)
pred = np.random.uniform(1, 2, [11, 1]).astype(self.dtype)
label = pred.copy()
accuracy = np.array([1]).astype(self.dtype)
correct = np.array([11 * 1]).astype(self.dtype)
total = np.array([11 * 1]).astype(self.dtype)
self.inputs = {
"Out": OpTest.np_dtype_to_fluid_dtype(pred),
"Label": OpTest.np_dtype_to_fluid_dtype(label),
"Indices": OpTest.np_dtype_to_fluid_dtype(pred)
}
self.outputs = {
"Accuracy": accuracy,
"Correct": correct,
"Total": total
}
def set_npu(self):
self.__class__.use_npu = True
self.place = paddle.NPUPlace(0)
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
class TestAccuracy2(TestAccuracy):
def setUp(self):
self.op_type = "accuracy"
self.set_npu()
self.init_dtype()
np.random.seed(SEED)
pred = np.random.uniform(1, 2, [11, 1]).astype(self.dtype)
label = np.random.uniform(4, 5, [11, 1]).astype(self.dtype)
accuracy = np.array([0]).astype(self.dtype)
correct = np.array([11 * 0]).astype(self.dtype)
total = np.array([11 * 1]).astype(self.dtype)
self.inputs = {
"Out": OpTest.np_dtype_to_fluid_dtype(pred),
"Label": OpTest.np_dtype_to_fluid_dtype(label),
"Indices": OpTest.np_dtype_to_fluid_dtype(pred)
}
self.outputs = {
"Accuracy": accuracy,
"Correct": correct,
"Total": total
}
class TestAccuracy3(TestAccuracy):
def setUp(self):
self.op_type = "accuracy"
self.set_npu()
self.init_dtype()
np.random.seed(SEED)
a = np.random.randint(1, 2, [5, 1])
b = np.random.randint(0, 1, [5, 1])
pred = np.row_stack((a, b)).astype(self.dtype)
label = np.random.randint(1, 2, [10, 1]).astype(self.dtype)
accuracy = np.array([0.5]).astype(self.dtype)
correct = np.array([5]).astype(self.dtype)
total = np.array([10 * 1]).astype(self.dtype)
self.inputs = {
"Out": OpTest.np_dtype_to_fluid_dtype(pred),
"Label": OpTest.np_dtype_to_fluid_dtype(label),
"Indices": OpTest.np_dtype_to_fluid_dtype(pred)
}
self.outputs = {
"Accuracy": accuracy,
"Correct": correct,
"Total": total
}
class TestAccuracyInt(TestAccuracy):
def init_dtype(self):
        self.dtype = np.int64
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
from test_adam_op import adam_step
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestSGD(OpTest):
def setUp(self):
self.set_npu()
self.place = paddle.NPUPlace(0)
self.op_type = "adam"
param = np.random.uniform(-1, 1, (102, 105)).astype("float32")
grad = np.random.uniform(-1, 1, (102, 105)).astype("float32")
moment1 = np.random.uniform(-1, 1, (102, 105)).astype("float32")
# The second moment is positive
moment2 = np.random.random((102, 105)).astype("float32")
learning_rate = 0.004
beta1 = 0.78
beta2 = 0.836
epsilon = 1e-4
beta1_pow = beta1**10
beta2_pow = beta2**10
self.inputs = {
'Param': param,
'Grad': grad,
'Moment1': moment1,
'Moment2': moment2,
'LearningRate': np.array([learning_rate]).astype("float32"),
'Beta1Pow': np.array([beta1_pow]).astype("float32"),
'Beta2Pow': np.array([beta2_pow]).astype("float32")
}
self.attrs = {'epsilon': epsilon, 'beta1': beta1, 'beta2': beta2}
param_out, moment1_out, \
moment2_out = adam_step(self.inputs, self.attrs)
self.outputs = {
'Moment1Out': moment1_out,
'Moment2Out': moment2_out,
'ParamOut': param_out,
'Beta1PowOut': np.array([beta1_pow]).astype("float32") * beta1,
'Beta2PowOut': np.array([beta2_pow]).astype("float32") * beta2
}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, atol=1e-5, check_dygraph=False)
'''
# TODO(zhiqiu): The following test may let 0-3 card down.
# we need to analyze it and open it.
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestNet(unittest.TestCase):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.random.random(size=(32, 32)).astype('float32')
b_np = np.random.random(size=(32, 32)).astype('float32')
label_np = np.random.randint(2, size=(32, 1)).astype('int64')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[32, 32], dtype='float32')
b = paddle.static.data(name="b", shape=[32, 32], dtype='float32')
label = paddle.static.data(
name="label", shape=[32, 1], dtype='int64')
sum = paddle.add(a, b)
z = paddle.pow(sum, 2.0)
fc_1 = fluid.layers.fc(input=z, size=128)
prediction = fluid.layers.fc(input=fc_1, size=2, act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=label)
loss = fluid.layers.reduce_mean(cost)
adam = fluid.optimizer.Adam(learning_rate=0.01)
adam.minimize(loss)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
print("Start run on {}".format(place))
for epoch in range(100):
pred_res, loss_res = exe.run(
main_prog,
feed={"a": a_np,
"b": b_np,
"label": label_np},
fetch_list=[prediction, loss])
if epoch % 10 == 0:
print("Epoch {} | Prediction[0]: {}, Loss: {}".format(
epoch, pred_res[0], loss_res))
return pred_res, loss_res
def test_npu(self):
cpu_pred, cpu_loss = self._test(False)
npu_pred, npu_loss = self._test(True)
self.assertTrue(np.allclose(npu_pred, cpu_pred))
self.assertTrue(np.allclose(npu_loss, cpu_loss))
'''
if __name__ == '__main__':
unittest.main()
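The `adam_step` helper imported from `test_adam_op` computes the expected outputs for the kernel. The standard Adam update it checks can be sketched as follows (a reference sketch of the textbook formulas, not the imported helper itself):

```python
import numpy as np

def adam_step_reference(param, grad, m1, m2, lr, beta1, beta2, epsilon,
                        beta1_pow, beta2_pow):
    # Standard Adam: update biased moments, then apply a bias-corrected
    # learning rate (correction folded into lr, as Paddle does).
    m1_out = beta1 * m1 + (1 - beta1) * grad
    m2_out = beta2 * m2 + (1 - beta2) * grad * grad
    lr_t = lr * np.sqrt(1 - beta2_pow) / (1 - beta1_pow)
    param_out = param - lr_t * m1_out / (np.sqrt(m2_out) + epsilon)
    return param_out, m1_out, m2_out
```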
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import unittest
import numpy as np
from op_test import OpTest, skip_check_grad_ci
import paddle
import paddle.fluid as fluid
paddle.enable_static()
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestCheckFiniteAndUnscaleOp(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "check_finite_and_unscale"
self.place = paddle.NPUPlace(0)
self.init_dtype()
x = np.random.random((1024, 1024)).astype(self.dtype)
scale = np.random.random((1)).astype(self.dtype)
self.inputs = {'X': [('x0', x)], 'Scale': scale}
self.outputs = {
'FoundInfinite': np.array([0]),
'Out': [('out0', x / scale)],
}
def set_npu(self):
self.__class__.use_npu = True
def init_kernel_type(self):
self.use_mkldnn = False
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestCheckFiniteAndUnscaleOpWithNan(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "check_finite_and_unscale"
self.place = paddle.NPUPlace(0)
self.init_dtype()
x = np.random.random((1024, 1024)).astype(self.dtype)
x[128][128] = np.nan
scale = np.random.random((1)).astype(self.dtype)
self.inputs = {'X': [('x0', x)], 'Scale': scale}
self.outputs = {
'FoundInfinite': np.array([1]),
'Out': [('out0', x)],
}
def set_npu(self):
self.__class__.use_npu = True
def init_kernel_type(self):
self.use_mkldnn = False
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
# When input contains nan, do not check the output,
# since the output may be nondeterministic and will be discarded.
self.check_output_with_place(
self.place, check_dygraph=False, no_check_set=['Out'])
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestCheckFiniteAndUnscaleOpWithInf(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "check_finite_and_unscale"
self.place = paddle.NPUPlace(0)
self.init_dtype()
x = np.random.random((1024, 1024)).astype(self.dtype)
x[128][128] = np.inf
scale = np.random.random((1)).astype(self.dtype)
self.inputs = {'X': [('x0', x)], 'Scale': scale}
self.outputs = {
'FoundInfinite': np.array([1]),
'Out': [('out0', x)],
}
def set_npu(self):
self.__class__.use_npu = True
def init_kernel_type(self):
self.use_mkldnn = False
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
# When input contains inf, do not check the output,
# since the output may be nondeterministic and will be discarded.
self.check_output_with_place(
self.place, check_dygraph=False, no_check_set=['Out'])
if __name__ == '__main__':
unittest.main()
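The three cases above exercise the op's contract: scale every input tensor and raise a flag when any element is non-finite. A numpy sketch of the assumed semantics (hypothetical helper, for illustration only):

```python
import numpy as np

def check_finite_and_unscale_ref(xs, scale):
    # Divide each tensor by scale; set found_infinite if any element is
    # NaN or Inf, in which case the outputs are treated as undefined
    # (which is why the tests above skip checking 'Out').
    found_infinite = any(not np.all(np.isfinite(x)) for x in xs)
    outs = [x / scale for x in xs]
    return outs, found_infinite
```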
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
import paddle.fluid.core as core
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestCast1(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "cast"
self.place = paddle.NPUPlace(0)
ipt = np.random.random(size=[10, 10]) + 1
self.inputs = {'X': ipt.astype('float32')}
self.outputs = {'Out': ipt.astype('float16')}
self.attrs = {
'in_dtype': int(core.VarDesc.VarType.FP32),
'out_dtype': int(core.VarDesc.VarType.FP16)
}
def set_npu(self):
self.__class__.use_npu = True
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
class TestCast2(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "cast"
self.place = paddle.NPUPlace(0)
ipt = np.random.random(size=[10, 10]) + 1
self.inputs = {'X': ipt.astype('float16')}
self.outputs = {'Out': ipt.astype('float32')}
self.attrs = {
'in_dtype': int(core.VarDesc.VarType.FP16),
'out_dtype': int(core.VarDesc.VarType.FP32)
}
def set_npu(self):
self.__class__.use_npu = True
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False, atol=1e-3)
if __name__ == '__main__':
unittest.main()
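The fp16-to-fp32 case above uses `atol=1e-3` because float16 keeps only a 10-bit mantissa, roughly three decimal digits. A quick check of the rounding involved:

```python
import numpy as np

x = np.float32(0.1)
h = np.float16(x)  # the cast drops precision to ~3 decimal digits
err = abs(float(h) - float(x))
print(err)  # small, but nonzero; within the test's fp16 tolerance
```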
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestEqual(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "equal"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [11, 17]).astype(self.dtype)
y = np.random.uniform(1, 2, [11, 17]).astype(self.dtype)
out = x == y # all elements are not equal
self.inputs = {
'X': OpTest.np_dtype_to_fluid_dtype(x),
'Y': OpTest.np_dtype_to_fluid_dtype(y)
}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestLessthan(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "less_than"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [11, 17]).astype(self.dtype)
y = np.random.uniform(1, 2, [11, 17]).astype(self.dtype)
out = x < y
self.inputs = {
'X': OpTest.np_dtype_to_fluid_dtype(x),
'Y': OpTest.np_dtype_to_fluid_dtype(y)
}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
class TestEqual2(TestEqual):
def setUp(self):
self.set_npu()
self.op_type = "equal"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [11, 17]).astype(self.dtype)
y = x.copy()
y[0][1] = 1
out = x == y # all elements are equal, except position [0][1]
self.inputs = {
'X': OpTest.np_dtype_to_fluid_dtype(x),
'Y': OpTest.np_dtype_to_fluid_dtype(y)
}
self.outputs = {'Out': out}
class TestLessthan2(TestLessthan):
def setUp(self):
self.set_npu()
self.op_type = "less_than"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [11, 17]).astype(self.dtype)
y = x.copy()
y[0][1] = 1
        out = x < y  # y equals x except at [0][1], where y is smaller, so out is all False
self.inputs = {
'X': OpTest.np_dtype_to_fluid_dtype(x),
'Y': OpTest.np_dtype_to_fluid_dtype(y)
}
self.outputs = {'Out': out}
class TestEqual2FP16(TestEqual2):
def init_dtype(self):
self.dtype = np.float16
class TestEqual2Int(TestEqual2):
def init_dtype(self):
self.dtype = np.int32
class TestLessthan2FP16(TestLessthan2):
def init_dtype(self):
self.dtype = np.float16
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestConcat(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "concat"
self.place = paddle.NPUPlace(0)
self.init_dtype()
self.init_test_data()
self.inputs = {'X': [('x0', self.x0), ('x1', self.x1), ('x2', self.x2)]}
self.attrs = {'axis': self.axis}
if self.axis < 0:
self.actual_axis = self.axis + len(self.x0.shape)
self.actual_axis = self.actual_axis if self.actual_axis > 0 else 0
else:
self.actual_axis = self.axis
self.outputs = {
'Out': np.concatenate(
(self.x0, self.x1, self.x2), axis=self.actual_axis)
}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
def init_test_data(self):
self.x0 = np.random.random((1, 4, 50)).astype(self.dtype)
self.x1 = np.random.random((2, 4, 50)).astype(self.dtype)
self.x2 = np.random.random((3, 4, 50)).astype(self.dtype)
self.axis = 0
def test_check_grad(self):
self.check_grad_with_place(
self.place, ['x0', 'x2'], 'Out', check_dygraph=False)
self.check_grad_with_place(
self.place, ['x1'], 'Out', check_dygraph=False)
self.check_grad_with_place(
self.place, ['x2'], 'Out', check_dygraph=False)
class TestConcatFP16(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "concat"
self.place = paddle.NPUPlace(0)
self.init_dtype()
self.init_test_data()
self.inputs = {'X': [('x0', self.x0), ('x1', self.x1), ('x2', self.x2)]}
self.attrs = {'axis': self.axis}
if self.axis < 0:
self.actual_axis = self.axis + len(self.x0.shape)
self.actual_axis = self.actual_axis if self.actual_axis > 0 else 0
else:
self.actual_axis = self.axis
self.outputs = {
'Out': np.concatenate(
(self.x0, self.x1, self.x2), axis=self.actual_axis)
}
def set_npu(self):
self.__class__.use_npu = True
self.__class__.no_need_check_grad = True
def init_dtype(self):
self.dtype = np.float16
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
def init_test_data(self):
self.x0 = np.random.random((1, 4, 50)).astype(self.dtype)
self.x1 = np.random.random((2, 4, 50)).astype(self.dtype)
self.x2 = np.random.random((3, 4, 50)).astype(self.dtype)
self.axis = 0
if __name__ == '__main__':
unittest.main()
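Both classes above normalize a negative `axis` before calling `np.concatenate`. That logic can be factored into a small helper (hypothetical name, mirroring the test code):

```python
import numpy as np

def normalize_concat_axis(axis, ndim):
    # Negative axes wrap around; the tests clamp the result at 0.
    if axis < 0:
        axis += ndim
        return axis if axis > 0 else 0
    return axis

x0 = np.ones((1, 4))
x1 = np.ones((2, 4))
out = np.concatenate((x0, x1), axis=normalize_concat_axis(-2, x0.ndim))
print(out.shape)
```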
@@ -64,28 +64,28 @@ class TestElementwiseAddOp(OpTest):
     def test_check_output(self):
         self.check_output_with_place(self.place, check_dygraph=False)
-    # TODO(ascendrc): Test grad op after it is implemented.
-    # def test_check_grad_normal(self):
-    #     self.check_grad_with_place(
-    #         self.place, ['X', 'Y'],
-    #         'Out',
-    #         max_relative_error=0.006,
-    #         check_dygraph=False)
-    #
-    # def test_check_grad_ingore_x(self):
-    #     self.check_grad_with_place(
-    #         self.place, ['Y'],
-    #         'Out',
-    #         no_grad_set=set("X"),
-    #         max_relative_error=0.006,
-    #         check_dygraph=False)
-    #
-    # def test_check_grad_ingore_y(self):
-    #     self.check_grad_with_place(
-    #         self.place, ['X'],
-    #         'Out',
-    #         no_grad_set=set("Y"),
-    #         max_relative_error=0.006,check_dygraph=False)
+    def test_check_grad_normal(self):
+        self.check_grad_with_place(
+            self.place, ['X', 'Y'],
+            'Out',
+            max_relative_error=0.006,
+            check_dygraph=False)
+
+    def test_check_grad_ingore_x(self):
+        self.check_grad_with_place(
+            self.place, ['Y'],
+            'Out',
+            no_grad_set=set("X"),
+            max_relative_error=0.006,
+            check_dygraph=False)
+
+    def test_check_grad_ingore_y(self):
+        self.check_grad_with_place(
+            self.place, ['X'],
+            'Out',
+            no_grad_set=set("Y"),
+            max_relative_error=0.006,
+            check_dygraph=False)
 @unittest.skipIf(not paddle.is_compiled_with_npu(),
@@ -133,10 +133,6 @@ class TestAddAPI(unittest.TestCase):
             True,
             msg="z_value = {}, but expected {}".format(z_value, z_expected))
-    def test_backward(self):
-        # TODO(ascendrc): Test backward after add grad npu op implemented.
-        pass
 @unittest.skipIf(not paddle.is_compiled_with_npu(),
                  "core is not compiled with NPU")
...
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestElementwiseDiv(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "elementwise_div"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [11, 17]).astype(self.dtype)
y = np.random.uniform(1, 2, [11, 17]).astype(self.dtype)
out = np.divide(x, y)
self.inputs = {
'X': OpTest.np_dtype_to_fluid_dtype(x),
'Y': OpTest.np_dtype_to_fluid_dtype(y)
}
self.attrs = {}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
def test_check_grad_normal(self):
self.check_grad_with_place(
self.place, ['X', 'Y'],
'Out',
max_relative_error=0.007,
check_dygraph=False)
def test_check_grad_ignore_x(self):
self.check_grad_with_place(
self.place, ['Y'],
'Out',
max_relative_error=0.007,
no_grad_set=set("X"),
check_dygraph=False)
def test_check_grad_ignore_y(self):
self.check_grad_with_place(
self.place, ['X'], 'Out', no_grad_set=set("Y"), check_dygraph=False)
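The gradient checks above compare the NPU kernel against finite differences. As a standalone sketch, independent of OpTest (the helper name `div_grads` is illustrative, not a Paddle API), the analytic gradients of out = x / y are dOut / y for X and -dOut * x / y**2 for Y, which a central difference reproduces:

```python
import numpy as np

def div_grads(x, y, grad_out):
    """Analytic gradients of out = x / y (elementwise)."""
    return grad_out / y, -grad_out * x / (y * y)

np.random.seed(0)
x = np.random.uniform(1, 2, [4, 5])
y = np.random.uniform(1, 2, [4, 5])
g = np.ones_like(x)

gx, gy = div_grads(x, y, g)

# Central finite differences, the same idea check_grad_with_place uses.
eps = 1e-6
gx_num = ((x + eps) / y - (x - eps) / y) / (2 * eps)
gy_num = (x / (y + eps) - x / (y - eps)) / (2 * eps)
assert np.allclose(gx, gx_num, atol=1e-4)
assert np.allclose(gy, gy_num, atol=1e-4)
```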
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestElementwiseDivFp16(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "elementwise_div"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [3, 4]).astype(self.dtype)
y = np.random.uniform(1, 2, [3, 4]).astype(self.dtype)
out = np.divide(x, y)
self.inputs = {
'X': OpTest.np_dtype_to_fluid_dtype(x),
'Y': OpTest.np_dtype_to_fluid_dtype(y)
}
self.attrs = {}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
self.__class__.no_need_check_grad = True
def init_dtype(self):
self.dtype = np.float16
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False, atol=1e-5)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestElementwiseDivNet(unittest.TestCase):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.random.uniform(1, 2, [32, 32]).astype('float32')
b_np = np.random.uniform(1, 2, [32, 32]).astype('float32')
c_np = np.random.uniform(1, 2, [32, 32]).astype('float32')
d_np = np.random.uniform(1, 2, [32, 32]).astype('float32')
label_np = np.random.randint(2, size=(32, 1)).astype('int64')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[32, 32], dtype='float32')
b = paddle.static.data(name="b", shape=[32, 32], dtype='float32')
c = paddle.static.data(name="c", shape=[32, 32], dtype='float32')
d = paddle.static.data(name="d", shape=[32, 32], dtype='float32')
label = paddle.static.data(
name="label", shape=[32, 1], dtype='int64')
e = paddle.multiply(a, b)
f = paddle.multiply(c, d)
f.stop_gradient = True
g = fluid.layers.elementwise_div(e, f)
fc_1 = fluid.layers.fc(input=g, size=128)
prediction = fluid.layers.fc(input=fc_1, size=2, act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=label)
loss = fluid.layers.reduce_mean(cost)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
print("Start run on {}".format(place))
for epoch in range(100):
pred_res, loss_res = exe.run(main_prog,
feed={
"a": a_np,
"b": b_np,
"c": c_np,
"d": d_np,
"label": label_np
},
fetch_list=[prediction, loss])
if epoch % 10 == 0:
print("Epoch {} | Prediction[0]: {}, Loss: {}".format(
epoch, pred_res[0], loss_res))
return pred_res, loss_res
def test_npu(self):
cpu_pred, cpu_loss = self._test(False)
npu_pred, npu_loss = self._test(True)
self.assertTrue(np.allclose(npu_pred, cpu_pred))
self.assertTrue(np.allclose(npu_loss, cpu_loss))
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
paddle.enable_static()
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestElementwiseFloorDiv(OpTest):
def setUp(self):
self.op_type = "elementwise_floordiv"
self.set_npu()
self.init_dtype()
self.init_input_output()
self.inputs = {
'X': OpTest.np_dtype_to_fluid_dtype(self.x),
'Y': OpTest.np_dtype_to_fluid_dtype(self.y)
}
self.attrs = {}
self.outputs = {'Out': self.out}
def set_npu(self):
self.__class__.use_npu = True
self.place = paddle.NPUPlace(0)
def init_input_output(self):
self.x = np.random.uniform(1, 1000, [10, 10]).astype(self.dtype)
self.y = np.random.uniform(1, 1000, [10, 10]).astype(self.dtype)
self.out = np.floor_divide(self.x, self.y)
def init_dtype(self):
self.dtype = "int64"
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestElementwiseFloorDiv2(TestElementwiseFloorDiv):
def init_dtype(self):
self.dtype = "int32"
if __name__ == '__main__':
unittest.main()
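The floordiv tests above use only positive integers, where np.floor_divide coincides with C-style truncating division; the two differ for negative operands. A minimal illustration (plain NumPy, separate from the test classes):

```python
import numpy as np

x = np.array([7, 9, 15], dtype=np.int64)
y = np.array([2, 4, 4], dtype=np.int64)
out = np.floor_divide(x, y)
assert out.tolist() == [3, 2, 3]
# For positive operands floor division matches truncation; for negative
# operands NumPy floors toward negative infinity instead.
assert np.floor_divide(-7, 2) == -4
```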
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestElementwiseMin(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "elementwise_min"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [11, 17]).astype(self.dtype)
y = np.random.uniform(1, 2, [11, 17]).astype(self.dtype)
out = np.minimum(x, y)
self.inputs = {
'X': OpTest.np_dtype_to_fluid_dtype(x),
'Y': OpTest.np_dtype_to_fluid_dtype(y)
}
self.attrs = {}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
# TODO(ascendrc): Min grad test
# def test_check_grad(self):
# if self.dtype == np.float16:
# return
# self.check_grad(['X'], 'Out')
#
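The commented-out grad test hints at the missing piece: the gradient of minimum routes the upstream gradient to whichever input is smaller. A hedged sketch (the tie-breaking convention below, favoring X, is one common choice and not necessarily what the NPU kernel implements):

```python
import numpy as np

def min_grads(x, y, grad_out):
    # Route grad_out to the smaller input; ties go to x here.
    mask = x <= y
    return grad_out * mask, grad_out * (~mask)

x = np.array([1.0, 3.0, 2.0])
y = np.array([2.0, 1.0, 2.0])
gx, gy = min_grads(x, y, np.ones(3))
assert gx.tolist() == [1.0, 0.0, 1.0]
assert gy.tolist() == [0.0, 1.0, 0.0]
```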
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestElementwiseMinFp16(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "elementwise_min"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [3, 4]).astype(self.dtype)
y = np.random.uniform(1, 2, [3, 4]).astype(self.dtype)
out = np.minimum(x, y)
self.inputs = {
'X': OpTest.np_dtype_to_fluid_dtype(x),
'Y': OpTest.np_dtype_to_fluid_dtype(y)
}
self.attrs = {}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
self.__class__.no_need_check_grad = True
def init_dtype(self):
self.dtype = np.float16
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False, atol=1e-5)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestElementwiseMinNet(unittest.TestCase):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.random.random(size=(32, 32)).astype('float32')
b_np = np.random.random(size=(32, 32)).astype('float32')
label_np = np.random.randint(2, size=(32, 1)).astype('int64')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[32, 32], dtype='float32')
b = paddle.static.data(name="b", shape=[32, 32], dtype='float32')
label = paddle.static.data(
name="label", shape=[32, 1], dtype='int64')
c = paddle.minimum(a, b)
fc_1 = fluid.layers.fc(input=c, size=128)
prediction = fluid.layers.fc(input=fc_1, size=2, act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=label)
loss = fluid.layers.reduce_mean(cost)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
print("Start run on {}".format(place))
for epoch in range(100):
pred_res, loss_res = exe.run(
main_prog,
feed={"a": a_np,
"b": b_np,
"label": label_np},
fetch_list=[prediction, loss])
if epoch % 10 == 0:
print("Epoch {} | Prediction[0]: {}, Loss: {}".format(
epoch, pred_res[0], loss_res))
return pred_res, loss_res
def test_npu(self):
cpu_pred, cpu_loss = self._test(False)
npu_pred, npu_loss = self._test(True)
self.assertTrue(np.allclose(npu_pred, cpu_pred))
self.assertTrue(np.allclose(npu_loss, cpu_loss))
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestElementwiseMul(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "elementwise_mul"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [11, 17]).astype(self.dtype)
y = np.random.uniform(1, 2, [11, 17]).astype(self.dtype)
out = np.multiply(x, y)
self.inputs = {
'X': OpTest.np_dtype_to_fluid_dtype(x),
'Y': OpTest.np_dtype_to_fluid_dtype(y)
}
self.attrs = {}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
# TODO(ascendrc): Mul grad test
# def test_check_grad(self):
# if self.dtype == np.float16:
# return
# self.check_grad(['X'], 'Out')
#
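For reference, the elementwise_mul gradients the TODO above would exercise are dX = dOut * Y and dY = dOut * X. A self-contained NumPy check of that identity against a central difference (illustrative only, not part of OpTest):

```python
import numpy as np

np.random.seed(0)
x = np.random.uniform(1, 2, [4, 5])
y = np.random.uniform(1, 2, [4, 5])
g = np.ones_like(x)
gx, gy = g * y, g * x  # analytic gradients of out = x * y

eps = 1e-6
gx_num = ((x + eps) * y - (x - eps) * y) / (2 * eps)
assert np.allclose(gx, gx_num, atol=1e-4)
assert np.allclose(gy, x)
```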
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestElementwiseMulFp16(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "elementwise_mul"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [3, 4]).astype(self.dtype)
y = np.random.uniform(1, 2, [3, 4]).astype(self.dtype)
out = np.multiply(x, y)
self.inputs = {
'X': OpTest.np_dtype_to_fluid_dtype(x),
'Y': OpTest.np_dtype_to_fluid_dtype(y)
}
self.attrs = {}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
self.__class__.no_need_check_grad = True
def init_dtype(self):
self.dtype = np.float16
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False, atol=1e-5)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestElementwiseMulNet(unittest.TestCase):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.random.random(size=(32, 32)).astype('float32')
b_np = np.random.random(size=(32, 32)).astype('float32')
c_np = np.random.random(size=(32, 32)).astype('float32')
d_np = np.random.random(size=(32, 32)).astype('float32')
label_np = np.random.randint(2, size=(32, 1)).astype('int64')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[32, 32], dtype='float32')
b = paddle.static.data(name="b", shape=[32, 32], dtype='float32')
c = paddle.static.data(name="c", shape=[32, 32], dtype='float32')
d = paddle.static.data(name="d", shape=[32, 32], dtype='float32')
label = paddle.static.data(
name="label", shape=[32, 1], dtype='int64')
e = paddle.multiply(a, b)
f = paddle.multiply(c, d)
f.stop_gradient = True
g = paddle.multiply(e, f)
fc_1 = fluid.layers.fc(input=g, size=128)
prediction = fluid.layers.fc(input=fc_1, size=2, act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=label)
loss = fluid.layers.reduce_mean(cost)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
print("Start run on {}".format(place))
for epoch in range(100):
pred_res, loss_res = exe.run(main_prog,
feed={
"a": a_np,
"b": b_np,
"c": c_np,
"d": d_np,
"label": label_np
},
fetch_list=[prediction, loss])
if epoch % 10 == 0:
print("Epoch {} | Prediction[0]: {}, Loss: {}".format(
epoch, pred_res[0], loss_res))
return pred_res, loss_res
def test_npu(self):
cpu_pred, cpu_loss = self._test(False)
npu_pred, npu_loss = self._test(True)
self.assertTrue(np.allclose(npu_pred, cpu_pred))
self.assertTrue(np.allclose(npu_loss, cpu_loss))
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestElementwisePow(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "elementwise_pow"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [11, 17]).astype(self.dtype)
y = np.random.uniform(1, 2, [11, 17]).astype(self.dtype)
out = np.power(x, y)
self.inputs = {
'X': OpTest.np_dtype_to_fluid_dtype(x),
'Y': OpTest.np_dtype_to_fluid_dtype(y)
}
self.attrs = {}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
# TODO(ascendrc): Pow grad test
# def test_check_grad(self):
# if self.dtype == np.float16:
# return
# self.check_grad(['X'], 'Out')
#
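The pow gradients the TODO above refers to follow the usual calculus rules: d(x**y)/dx = y * x**(y-1) and d(x**y)/dy = x**y * ln(x). A standalone NumPy sketch verifying both against central differences (independent of the test classes):

```python
import numpy as np

np.random.seed(0)
x = np.random.uniform(1, 2, [4, 5])
y = np.random.uniform(1, 2, [4, 5])
gx = y * np.power(x, y - 1)      # d(x**y)/dx
gy = np.power(x, y) * np.log(x)  # d(x**y)/dy

eps = 1e-6
gx_num = (np.power(x + eps, y) - np.power(x - eps, y)) / (2 * eps)
gy_num = (np.power(x, y + eps) - np.power(x, y - eps)) / (2 * eps)
assert np.allclose(gx, gx_num, atol=1e-4)
assert np.allclose(gy, gy_num, atol=1e-4)
```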
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestElementwisePowFp16(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "elementwise_pow"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [3, 4]).astype(self.dtype)
y = np.random.uniform(1, 2, [3, 4]).astype(self.dtype)
out = np.power(x, y)
self.inputs = {
'X': OpTest.np_dtype_to_fluid_dtype(x),
'Y': OpTest.np_dtype_to_fluid_dtype(y)
}
self.attrs = {}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
self.__class__.no_need_check_grad = True
def init_dtype(self):
self.dtype = np.float16
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False, atol=1e-5)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestElementwisePowNet(unittest.TestCase):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.random.random(size=(32, 32)).astype('float32')
b_np = np.random.random(size=(32, 32)).astype('float32')
label_np = np.random.randint(2, size=(32, 1)).astype('int64')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[32, 32], dtype='float32')
b = paddle.static.data(name="b", shape=[32, 32], dtype='float32')
label = paddle.static.data(
name="label", shape=[32, 1], dtype='int64')
c = paddle.pow(a, b)
fc_1 = fluid.layers.fc(input=c, size=128)
prediction = fluid.layers.fc(input=fc_1, size=2, act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=label)
loss = fluid.layers.reduce_mean(cost)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
print("Start run on {}".format(place))
for epoch in range(100):
pred_res, loss_res = exe.run(
main_prog,
feed={"a": a_np,
"b": b_np,
"label": label_np},
fetch_list=[prediction, loss])
if epoch % 10 == 0:
print("Epoch {} | Prediction[0]: {}, Loss: {}".format(
epoch, pred_res[0], loss_res))
return pred_res, loss_res
def test_npu(self):
cpu_pred, cpu_loss = self._test(False)
npu_pred, npu_loss = self._test(True)
self.assertTrue(np.allclose(npu_pred, cpu_pred))
self.assertTrue(np.allclose(npu_loss, cpu_loss))
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestExpand(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "expand"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.randn(3, 1, 7).astype(self.dtype)
out = np.tile(x, [1, 10, 1])
self.inputs = {'X': OpTest.np_dtype_to_fluid_dtype(x)}
self.attrs = {'expand_times': [1, 10, 1]}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
# TODO(ascendrc): Add grad test
# def test_check_grad(self):
# if self.dtype == np.float16:
# return
# self.check_grad(['X'], 'Out')
#
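The 'expand_times' attribute maps directly onto np.tile repeat counts, which is exactly how setUp above builds the expected output. A quick standalone check of that equivalence:

```python
import numpy as np

x = np.arange(6).reshape(3, 1, 2)
expand_times = [1, 4, 1]  # per-axis repeat counts, as in the 'expand_times' attr
out = np.tile(x, expand_times)
assert out.shape == (3, 4, 2)
# Every repeated slice along axis 1 equals the original slice.
assert np.array_equal(out[:, 2, :], x[:, 0, :])
```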
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestExpandV2(TestExpand):
def setUp(self):
self.set_npu()
self.op_type = "expand"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.randn(3, 1, 7).astype(self.dtype)
out = np.tile(x, [1, 10, 1])
expand_times = np.array([1, 10, 1]).astype(np.int32)
self.inputs = {
'X': OpTest.np_dtype_to_fluid_dtype(x),
'ExpandTimes': OpTest.np_dtype_to_fluid_dtype(expand_times)
}
self.attrs = {}
self.outputs = {'Out': out}
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestExpandFp16(TestExpand):
no_need_check_grad = True
def init_dtype(self):
self.dtype = np.float16
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestExpandNet(unittest.TestCase):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.random.random(size=(32, 1)).astype('float32')
label_np = np.random.randint(2, size=(32, 1)).astype('int64')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[32, 1], dtype='float32')
label = paddle.static.data(
name="label", shape=[32, 1], dtype='int64')
res = paddle.fluid.layers.expand(a, [1, 32])
loss = res.sum()
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
for epoch in range(100):
loss_res = exe.run(main_prog,
feed={"a": a_np,
"label": label_np},
fetch_list=[loss])
if epoch % 10 == 0:
print("Epoch {} | Loss: {}".format(epoch, loss_res))
return loss_res
def test_npu(self):
cpu_loss = self._test(False)
npu_loss = self._test(True)
self.assertTrue(np.allclose(npu_loss, cpu_loss))
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
from paddle.fluid import core
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestFillConstant(OpTest):
def setUp(self):
self.set_npu()
self.place = paddle.NPUPlace(0)
self.op_type = "fill_constant"
self.init_dtype()
self.inputs = {}
self.attrs = {'shape': [123, 92], 'value': 3.8}
self.outputs = {'Out': np.full((123, 92), 3.8)}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
class TestFillConstantInt(OpTest):
def setUp(self):
self.set_npu()
self.place = paddle.NPUPlace(0)
self.op_type = "fill_constant"
self.init_dtype()
self.inputs = {}
self.attrs = {
'shape': [123, 92],
'value': 1,
'dtype': core.VarDesc.VarType.INT32
}
self.outputs = {'Out': np.full((123, 92), 1).astype(self.dtype)}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.int32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
class TestFillConstantFP16(OpTest):
def setUp(self):
self.set_npu()
self.place = paddle.NPUPlace(0)
self.op_type = "fill_constant"
self.init_dtype()
self.inputs = {}
self.attrs = {
'shape': [123, 92],
'value': 1.0,
'dtype': core.VarDesc.VarType.FP16
}
self.outputs = {'Out': np.full((123, 92), 1.0).astype(self.dtype)}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float16
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False, atol=1e-3)
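The looser atol in the fp16 checks reflects half-precision rounding: most decimal constants are not exactly representable in float16. For example, filling with the 3.8 used in the fp32 test (a standalone NumPy sketch):

```python
import numpy as np

out = np.full((2, 3), 3.8, dtype=np.float16)
# The nearest float16 to 3.8 is 3.80078125, so an exact comparison would
# fail; a tolerance on the order of 1e-3 absorbs the rounding error.
assert out.dtype == np.float16
assert np.allclose(out.astype(np.float32), 3.8, atol=1e-3)
```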
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import unittest
import numpy as np
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
from paddle.framework import core
paddle.enable_static()
SEED = 2021
def gather_numpy(x, index, axis):
x_transpose = np.swapaxes(x, 0, axis)
tmp_gather = x_transpose[index, ...]
gather = np.swapaxes(tmp_gather, 0, axis)
return gather
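gather_numpy above is the axis-aware reference implementation: it moves the gather axis to the front, indexes rows, and moves it back. A usage example with a concrete expected result (the helper is redefined so the snippet is self-contained):

```python
import numpy as np

def gather_numpy(x, index, axis):
    # Move `axis` to the front, take the indexed rows, move it back.
    x_transpose = np.swapaxes(x, 0, axis)
    tmp_gather = x_transpose[index, ...]
    return np.swapaxes(tmp_gather, 0, axis)

x = np.arange(12).reshape(3, 4)
out = gather_numpy(x, np.array([2, 0]), axis=1)
# Columns 2 and 0 of x, in that order.
assert out.tolist() == [[2, 0], [6, 4], [10, 8]]
```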
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestGatherOp(OpTest):
def setUp(self):
self.set_npu()
self.place = paddle.NPUPlace(0)
self.op_type = "gather"
self.config()
xnp = np.random.random(self.x_shape).astype(self.x_type)
self.inputs = {
'X': xnp,
'Index': np.array(self.index).astype(self.index_type)
}
self.outputs = {'Out': self.inputs["X"][self.inputs["Index"]]}
def set_npu(self):
self.__class__.use_npu = True
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
def test_check_grad(self):
self.check_grad_with_place(
self.place, ['X'],
'Out',
max_relative_error=0.006,
check_dygraph=False)
def config(self):
"""
For multi-dimensional input
"""
self.x_shape = (10, 20)
self.x_type = "float32"
self.index = [1, 3, 5]
self.index_type = "int32"
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestCase1(TestGatherOp):
def config(self):
"""
For one-dimensional input
"""
self.x_shape = (100, )
self.x_type = "float32"
self.index = [1, 3, 5]
self.index_type = "int32"
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class API_TestGather(unittest.TestCase):
def test_out1(self):
with fluid.program_guard(fluid.Program(), fluid.Program()):
data1 = fluid.layers.data('data1', shape=[-1, 2], dtype='float32')
index = fluid.layers.data('index', shape=[-1, 1], dtype='int32')
out = paddle.fluid.layers.gather(data1, index)
place = paddle.NPUPlace(0)
exe = fluid.Executor(place)
input = np.array([[1, 2], [3, 4], [5, 6]])
index_1 = np.array([1, 2])
result, = exe.run(feed={"data1": input,
"index": index_1},
fetch_list=[out])
expected_output = np.array([[3, 4], [5, 6]])
self.assertTrue(np.allclose(result, expected_output))
def test_out2(self):
with paddle.static.program_guard(paddle.static.Program(),
paddle.static.Program()):
x = paddle.fluid.data('x', shape=[-1, 2], dtype='float32')
index = paddle.fluid.data('index', shape=[-1, 1], dtype='int32')
out = paddle.gather(x, index)
place = paddle.NPUPlace(0)
exe = paddle.static.Executor(place)
x_np = np.array([[1, 2], [3, 4], [5, 6]]).astype('float32')
index_np = np.array([1, 1]).astype('int32')
result, = exe.run(feed={"x": x_np,
"index": index_np},
fetch_list=[out])
expected_output = gather_numpy(x_np, index_np, axis=0)
self.assertTrue(np.allclose(result, expected_output))
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestGatherGrad(unittest.TestCase):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.random.random(size=(8192, 768)).astype('float32')
index_np = np.random.randint(0, 8192, size=(1232, 1)).astype('int32')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[8192, 768], dtype='float32')
index = paddle.static.data(
name="index", shape=[1232, 1], dtype='int32')
a.stop_gradient = False
b = paddle.gather(a, index)
loss = fluid.layers.reduce_mean(b)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
print("Start run on {}".format(place))
for epoch in range(100):
pred_res, loss_res = exe.run(main_prog,
feed={"a": a_np,
"index": index_np},
fetch_list=[b, loss])
if epoch % 10 == 0:
print("Epoch {} | Prediction[0]: {}, Loss: {}".format(
epoch, pred_res[0], loss_res[0]))
return pred_res, loss_res
def test_npu(self):
npu_pred, npu_loss = self._test(True)
cpu_pred, cpu_loss = self._test(False)
self.assertTrue(np.allclose(npu_pred, cpu_pred))
self.assertTrue(np.allclose(npu_loss, cpu_loss))
if __name__ == "__main__":
unittest.main()
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
from scipy import special
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
paddle.enable_static()
SEED = 2021
def np_gelu(x):
y = 0.5 * x * (1 + special.erf(x / np.sqrt(2)))
return y
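np_gelu above is the exact erf-based GELU. As a sanity sketch (using math.erf instead of scipy so the snippet is self-contained), it agrees with the widely used tanh approximation to within about 1e-2 over a typical input range:

```python
import math
import numpy as np

def np_gelu(x):
    erf = np.vectorize(math.erf)
    return 0.5 * x * (1 + erf(x / np.sqrt(2)))

def gelu_tanh(x):
    # Common tanh-based approximation of GELU.
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

x = np.linspace(-3, 3, 101)
assert np.allclose(np_gelu(x), gelu_tanh(x), atol=1e-2)
assert abs(np_gelu(np.array([0.0]))[0]) < 1e-12  # GELU(0) == 0 exactly
```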
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestGelu(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "gelu"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [11, 17]).astype(self.dtype)
out = np_gelu(x)
self.inputs = {'X': OpTest.np_dtype_to_fluid_dtype(x)}
self.attrs = {}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False, atol=1e-3)
# TODO(ascendrc): Add grad test
# def test_check_grad(self):
# if self.dtype == np.float16:
# return
# self.check_grad(['X'], 'Out')
#
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestGeluFp16(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "gelu"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [3, 4]).astype(self.dtype)
out = np_gelu(x)
self.inputs = {'X': OpTest.np_dtype_to_fluid_dtype(x)}
self.attrs = {}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
self.__class__.no_need_check_grad = True
def init_dtype(self):
self.dtype = np.float16
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False, atol=1e-3)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestGeluNet(unittest.TestCase):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.random.random(size=(32, 32)).astype('float32')
b_np = np.random.random(size=(32, 32)).astype('float32')
label_np = np.random.randint(2, size=(32, 1)).astype('int64')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[32, 32], dtype='float32')
b = paddle.static.data(name="b", shape=[32, 32], dtype='float32')
label = paddle.static.data(
name="label", shape=[32, 1], dtype='int64')
c = paddle.multiply(a, b)
d = fluid.layers.gelu(c)
fc_1 = fluid.layers.fc(input=d, size=128)
prediction = fluid.layers.fc(input=fc_1, size=2, act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=label)
loss = fluid.layers.reduce_mean(cost)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
print("Start run on {}".format(place))
for epoch in range(100):
pred_res, loss_res = exe.run(
main_prog,
feed={"a": a_np,
"b": b_np,
"label": label_np},
fetch_list=[prediction, loss])
if epoch % 10 == 0:
print("Epoch {} | Prediction[0]: {}, Loss: {}".format(
epoch, pred_res[0], loss_res))
return pred_res, loss_res
def test_npu(self):
cpu_pred, cpu_loss = self._test(False)
npu_pred, npu_loss = self._test(True)
self.assertTrue(np.allclose(npu_pred, cpu_pred, atol=1e-3))
self.assertTrue(np.allclose(npu_loss, cpu_loss, atol=1e-3))
if __name__ == '__main__':
unittest.main()
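The reference `np_gelu` used by the tests above is defined elsewhere in this suite. A minimal self-contained sketch, assuming it implements the exact erf-based GELU, gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2))) (the function name `gelu_ref` is illustrative):

```python
import math

import numpy as np


def gelu_ref(x):
    # Exact (erf-based) GELU; np.vectorize lifts math.erf to arrays.
    erf = np.vectorize(math.erf)
    return 0.5 * x * (1.0 + erf(x / math.sqrt(2.0)))


x = np.array([-1.0, 0.0, 1.0], dtype=np.float32)
y = gelu_ref(x)
```

gelu(1) equals Phi(1) ~ 0.8413, which is a quick sanity check for any reimplementation.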
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
from paddle.fluid import core
paddle.enable_static()
SEED = 2021
NPUPlace = 5
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestIncrement(OpTest):
def setUp(self):
self.set_npu()
self.place = paddle.NPUPlace(NPUPlace)
self.op_type = "increment"
self.init_dtype()
self.inputs = {
'X':
OpTest.np_dtype_to_fluid_dtype(np.array([1]).astype(self.dtype)),
}
self.attrs = {"Step": 1}
self.outputs = {'Out': np.array([2])}
def set_npu(self):
self.__class__.use_npu = True
self.__class__.no_need_check_grad = True
def init_dtype(self):
self.dtype = np.int64
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestIncrementFP16(OpTest):
def setUp(self):
self.set_npu()
self.place = paddle.NPUPlace(NPUPlace)
self.op_type = "increment"
self.init_dtype()
self.inputs = {
'X':
OpTest.np_dtype_to_fluid_dtype(np.array([1]).astype(self.dtype)),
}
self.pre_input_id = id(self.inputs['X'])
self.attrs = {"Step": 1}
self.outputs = {'Out': np.array([2])}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float16
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestIncrementInplace(unittest.TestCase):
def test_npu(self):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.array([1]).astype('float32')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[1], dtype='float32')
b = fluid.layers.increment(a)
place = paddle.NPUPlace(NPUPlace)
exe = paddle.static.Executor(place)
exe.run(startup_prog)
b_value = exe.run(main_prog, feed={"a": a_np, }, fetch_list=[b])
print('input a id is : {}'.format(id(a)))
print('input b id is : {}'.format(id(b)))
self.assertEqual(id(a), id(b))
self.assertEqual(b_value[0], 2)
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
from functools import reduce
from operator import mul
import paddle
import paddle.fluid as fluid
import paddle.fluid.core as core
from test_layer_norm_op import _reference_layer_norm_naive, _reference_layer_norm_grad
paddle.enable_static()
SEED = 2021
EPOCH = 100
from op_test import _set_use_system_allocator
_set_use_system_allocator(False)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestLayerNormOp(unittest.TestCase):
def setUp(self):
self.use_cudnn = True
self.set_npu()
self.init_dtype()
def set_npu(self):
self.__class__.use_npu = True
self.place = paddle.NPUPlace(0)
def init_dtype(self):
self.dtype = np.float32
self.atol = 1e-4
def __assert_close(self, tensor, np_array, msg, atol=1e-4):
self.assertTrue(
np.allclose(
np.array(tensor).astype(np_array.dtype), np_array, atol=atol),
msg)
def check_forward_backward(self,
shape,
begin_norm_axis,
has_scale=True,
has_bias=True,
y_grad_scale=1.0,
use_mkldnn=False):
def test_with_place(place,
shape,
begin_norm_axis,
use_mkldnn=use_mkldnn):
# attr
epsilon = 0.00001
x_shape = shape
D = reduce(mul, x_shape[begin_norm_axis:len(x_shape)], 1)
scale_shape = [D]
np.random.seed(123)
x = np.random.random_sample(x_shape).astype(self.dtype)
scale = np.random.random_sample(scale_shape).astype(
np.float32) if has_scale else None
bias = np.random.random_sample(scale_shape).astype(
np.float32) if has_bias else None
y_grad = (np.random.random_sample(x_shape) *
y_grad_scale).astype(self.dtype)
# reference forward & backward
y, mean, variance = _reference_layer_norm_naive(
x, scale, bias, epsilon, begin_norm_axis)
x_grad, scale_grad, bias_grad = _reference_layer_norm_grad(
x, y_grad, scale, bias, mean, variance, begin_norm_axis)
var_dict = locals()
var_dict['y@GRAD'] = y_grad
var_names = ['x', 'mean', 'variance', 'y', 'y@GRAD']
if has_scale:
var_names += ['scale']
if has_bias:
var_names += ['bias']
ground_truth = {name: var_dict[name] for name in var_names}
program = fluid.Program()
with fluid.program_guard(program):
block = program.global_block()
for name in ground_truth:
block.create_var(
name=name,
dtype=self.dtype,
shape=ground_truth[name].shape)
inputs = {"X": block.var('x')}
fetch_list = [
'y',
'mean',
'variance',
'x@GRAD',
]
if has_scale:
inputs["Scale"] = block.var('scale')
fetch_list += ['scale@GRAD']
if has_bias:
inputs["Bias"] = block.var('bias')
fetch_list += ['bias@GRAD']
layer_norm_op = block.append_op(
type="layer_norm",
inputs=inputs,
outputs={
"Y": block.var('y'),
"Mean": block.var('mean'), # share the same memory
"Variance":
block.var('variance'), # share the same memory
},
attrs={
"epsilon": epsilon,
"begin_norm_axis": begin_norm_axis,
"use_mkldnn": use_mkldnn
})
# generate backward op_desc
grad_op_desc_list, op_grad_to_var = core.get_grad_op_desc(
layer_norm_op.desc, set(), [])
grad_op_desc = grad_op_desc_list[0]
new_op_desc = block.desc.append_op()
new_op_desc.copy_from(grad_op_desc)
for var_name in grad_op_desc.output_arg_names():
block.desc.var(var_name.encode("ascii"))
grad_op_desc.infer_var_type(block.desc)
grad_op_desc.infer_shape(block.desc)
for arg in grad_op_desc.output_arg_names():
grad_var = block.desc.find_var(arg.encode("ascii"))
grad_var.set_dtype(core.VarDesc.VarType.FP32)
program._sync_with_cpp()
exe = fluid.Executor(place)
out = exe.run(program,
feed={
name: var_dict[name]
for name in ['x', 'scale', 'bias', 'y@GRAD']
},
fetch_list=fetch_list)
self.__assert_close(y, out[0], "y", self.atol)
self.__assert_close(mean, out[1], "mean")
self.__assert_close(variance, out[2], "variance", 1e-3)
self.__assert_close(x_grad, out[3], "x_grad", 1e-2)
if has_scale:
self.__assert_close(scale_grad,
out[fetch_list.index('scale@GRAD')],
"scale_grad", 1e-2)
if has_bias:
self.__assert_close(bias_grad,
out[fetch_list.index('bias@GRAD')],
"bias_grad", self.atol)
test_with_place(self.place, shape, begin_norm_axis)
def test_check_forward_backward_with_scale_and_bias(self):
self.check_forward_backward(shape=[2, 3, 4, 5], begin_norm_axis=1)
self.check_forward_backward(
shape=[2, 3, 4, 5],
begin_norm_axis=1,
has_scale=False,
has_bias=True)
self.check_forward_backward(
shape=[2, 3, 4, 5],
begin_norm_axis=1,
has_scale=True,
has_bias=False)
self.check_forward_backward(
shape=[2, 3, 4, 5],
begin_norm_axis=1,
has_scale=False,
has_bias=False)
self.check_forward_backward(shape=[2, 3, 4, 5], begin_norm_axis=3)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestLayerNormOpFP16(TestLayerNormOp):
def init_dtype(self):
self.dtype = np.float16
self.atol = 1e-2
if __name__ == '__main__':
unittest.main()
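`_reference_layer_norm_naive` is imported from `test_layer_norm_op` and not shown here. A self-contained sketch of what such a reference typically computes, assuming the flatten-to-(N, D) convention implied by `begin_norm_axis` (all names below are illustrative):

```python
import numpy as np


def layer_norm_ref(x, scale=None, bias=None, epsilon=1e-5, begin_norm_axis=1):
    # Axes from begin_norm_axis onward are normalized together.
    shape = x.shape
    n = int(np.prod(shape[:begin_norm_axis]))
    d = int(np.prod(shape[begin_norm_axis:]))
    x2 = x.reshape(n, d)
    mean = x2.mean(axis=1, keepdims=True)
    var = x2.var(axis=1, keepdims=True)
    y = (x2 - mean) / np.sqrt(var + epsilon)
    if scale is not None:
        y = y * scale.reshape(1, d)
    if bias is not None:
        y = y + bias.reshape(1, d)
    return y.reshape(shape), mean.ravel(), var.ravel()


x = np.random.RandomState(123).random_sample((2, 3, 4)).astype(np.float32)
y, mean, var = layer_norm_ref(x)
```

With no scale or bias, each normalized row should come out with mean ~0 and standard deviation ~1, which is what the forward checks above assert against.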
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestLog(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "log"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [11, 17]).astype(self.dtype)
out = np.log(x)
self.inputs = {'X': OpTest.np_dtype_to_fluid_dtype(x)}
self.attrs = {}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
# TODO(ascendrc): Add grad test
# def test_check_grad(self):
# if self.dtype == np.float16:
# return
# self.check_grad(['X'], 'Out')
#
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestLogFp16(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "log"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [3, 4]).astype(self.dtype)
out = np.log(x)
self.inputs = {'X': OpTest.np_dtype_to_fluid_dtype(x)}
self.attrs = {}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
self.__class__.no_need_check_grad = True
def init_dtype(self):
self.dtype = np.float16
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False, atol=1e-5)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestLogNet(unittest.TestCase):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.random.random(size=(32, 32)).astype('float32')
b_np = np.random.random(size=(32, 32)).astype('float32')
label_np = np.random.randint(2, size=(32, 1)).astype('int64')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[32, 32], dtype='float32')
b = paddle.static.data(name="b", shape=[32, 32], dtype='float32')
label = paddle.static.data(
name="label", shape=[32, 1], dtype='int64')
c = paddle.multiply(a, b)
d = paddle.log(c)
fc_1 = fluid.layers.fc(input=d, size=128)
prediction = fluid.layers.fc(input=fc_1, size=2, act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=label)
loss = fluid.layers.reduce_mean(cost)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
print("Start run on {}".format(place))
for epoch in range(100):
pred_res, loss_res = exe.run(
main_prog,
feed={"a": a_np,
"b": b_np,
"label": label_np},
fetch_list=[prediction, loss])
if epoch % 10 == 0:
print("Epoch {} | Prediction[0]: {}, Loss: {}".format(
epoch, pred_res[0], loss_res))
return pred_res, loss_res
def test_npu(self):
cpu_pred, cpu_loss = self._test(False)
npu_pred, npu_loss = self._test(True)
self.assertTrue(np.allclose(npu_pred, cpu_pred, atol=1e-4))
self.assertTrue(np.allclose(npu_loss, cpu_loss, atol=1e-4))
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestLogicalNot(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "logical_not"
self.place = paddle.NPUPlace(4)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [11, 17]).astype(self.dtype)
out = np.logical_not(x)
self.inputs = {'X': OpTest.np_dtype_to_fluid_dtype(x)}
self.attrs = {}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.bool_
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
# TODO(ascendrc): Add grad test
# def test_check_grad(self):
# if self.dtype == np.float16:
# return
# self.check_grad(['X'], 'Out')
#
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestLogicalNotNet(unittest.TestCase):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.random.random(size=(32, 32)).astype('bool')
label_np = np.random.randint(2, size=(32, 1)).astype('int64')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[32, 32], dtype='bool')
label = paddle.static.data(
name="label", shape=[32, 1], dtype='int64')
c = paddle.logical_not(a)
d = paddle.cast(c, dtype="float32")
fc_1 = fluid.layers.fc(input=d, size=128)
prediction = fluid.layers.fc(input=fc_1, size=2, act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=label)
loss = fluid.layers.reduce_mean(cost)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
if run_npu:
place = paddle.NPUPlace(4)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
print("Start run on {}".format(place))
for epoch in range(100):
pred_res, loss_res = exe.run(main_prog,
feed={"a": a_np,
"label": label_np},
fetch_list=[prediction, loss])
if epoch % 10 == 0:
print("Epoch {} | Prediction[0]: {}, Loss: {}".format(
epoch, pred_res[0], loss_res))
return pred_res, loss_res
def test_npu(self):
cpu_pred, cpu_loss = self._test(False)
npu_pred, npu_loss = self._test(True)
self.assertTrue(np.allclose(npu_pred, cpu_pred))
self.assertTrue(np.allclose(npu_loss, cpu_loss))
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestLookupTableV2(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "lookup_table_v2"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
bsz = 6
seqlen = 8
vocab = 10
dim = 20
w = np.ones([vocab, dim]).astype(self.dtype)
x = np.random.randint(0, vocab, size=(bsz, seqlen)).astype(np.int64)
out = np.ones([bsz, seqlen, dim]).astype(self.dtype)
self.inputs = {
'W': OpTest.np_dtype_to_fluid_dtype(w),
'Ids': OpTest.np_dtype_to_fluid_dtype(x)
}
self.attrs = {
'is_sparse': False,
'is_distributed': False,
'remote_prefetch': False,
'padding_idx': -1
}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
def test_check_grad(self):
if self.dtype == np.float16:
return
self.check_grad_with_place(
self.place, ['W'], 'Out', check_dygraph=False)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestLookupTableV2FP16(TestLookupTableV2):
no_need_check_grad = True
def init_dtype(self):
self.dtype = np.float16
def set_npu(self):
self.__class__.use_npu = True
self.__class__.no_need_check_grad = True
if __name__ == '__main__':
unittest.main()
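The expected output built above (`out = np.ones(...)` because `w` is all ones) hides the actual semantics of `lookup_table_v2`: a row gather from the embedding matrix, out[i, j] = W[ids[i, j]]. A small sketch with illustrative shapes:

```python
import numpy as np

vocab, dim = 10, 4
# Each row of W is one embedding vector.
W = np.arange(vocab * dim, dtype=np.float32).reshape(vocab, dim)
ids = np.array([[1, 3], [0, 9]], dtype=np.int64)
# NumPy fancy indexing gathers rows: result shape is ids.shape + (dim,).
out = W[ids]  # shape (2, 2, dim)
```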
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
paddle.enable_static()
SEED = 2021
def reference_matmul(X, Y, transpose_X=False, transpose_Y=False):
"""Reference forward implementation using np.matmul."""
# np.matmul does not support the transpose flags, so we manually
# transpose X and Y appropriately.
if transpose_X:
if X.ndim == 1:
X = X.reshape((X.size, ))
elif X.ndim == 2:
X = X.T
else:
dim = [i for i in range(len(X.shape))]
dim[-1], dim[len(X.shape) - 2] = dim[len(X.shape) - 2], dim[-1]
X = np.transpose(X, tuple(dim))
if transpose_Y:
if Y.ndim == 1:
Y = Y.reshape((Y.size, ))
else:
dim = [i for i in range(len(Y.shape))]
dim[-1], dim[len(Y.shape) - 2] = dim[len(Y.shape) - 2], dim[-1]
Y = np.transpose(Y, tuple(dim))
Out = np.matmul(X, Y)
if not Out.shape:
# We do not support 0-dimensional Tensors (scalars). So where
# np.matmul outputs a scalar, we must convert to a Tensor of
# shape (1, ) instead.
# Everywhere else, we are compatible with np.matmul.
Out = np.array([Out], dtype="float64")
return Out
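The index-swap in the `else` branches above permutes only the trailing two axes while preserving batch dimensions; in NumPy that is equivalent to `np.swapaxes(X, -1, -2)`, which this sketch verifies on a batched example:

```python
import numpy as np

X = np.random.RandomState(0).rand(2, 3, 4)
Y = np.random.RandomState(1).rand(2, 3, 5)

# The dim-list swap used by reference_matmul...
perm = list(range(X.ndim))
perm[-1], perm[-2] = perm[-2], perm[-1]
# ...is the same permutation as swapping the last two axes directly.
assert np.allclose(np.transpose(X, perm), np.swapaxes(X, -1, -2))

# trans_x=True therefore yields a (2, 4, 5) batched product here.
out = np.matmul(np.swapaxes(X, -1, -2), Y)
```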
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestMatMul(OpTest):
def config(self):
self.x_shape = (100, 24)
self.y_shape = (24, 100)
self.trans_x = False
self.trans_y = False
def setUp(self):
self.set_npu()
self.op_type = "matmul_v2"
self.place = paddle.NPUPlace(0)
self.init_dtype()
self.config()
np.random.seed(SEED)
x = np.random.random(self.x_shape).astype(self.dtype)
y = np.random.random(self.y_shape).astype(self.dtype)
# -0.1 ~ 0.1
x = -0.1 + 0.2 * x
y = -0.1 + 0.2 * y
result = reference_matmul(x, y, self.trans_x, self.trans_y)
result = result.astype(self.dtype)
self.inputs = {
'X': x,
'Y': y,
}
self.attrs = {'trans_x': self.trans_x, 'trans_y': self.trans_y}
self.outputs = {'Out': result}
def set_npu(self):
self.__class__.use_npu = True
self.__class__.no_need_check_grad = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False, atol=1e-5)
# TODO(ascendrc): Add grad test
# def test_check_grad(self):
# if self.dtype == np.float16:
# return
# self.check_grad(['X'], 'Out')
#
class TestMatMul2(TestMatMul):
"""
case 2
"""
def config(self):
self.x_shape = (32, 24)
self.y_shape = (32, 24)
self.trans_x = False
self.trans_y = True
class TestMatMul3(TestMatMul):
"""
case 3
"""
def init_dtype(self):
self.dtype = np.float16
class TestMatMul4(TestMatMul):
"""
case 4 dim=3
"""
def config(self):
self.x_shape = (2, 3, 4)
self.y_shape = (2, 4, 3)
self.trans_x = False
self.trans_y = False
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestMatMulNet(unittest.TestCase):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.random.random(size=(2, 3)).astype('float32')
b_np = np.random.random(size=(2, 3)).astype('float32')
c_np = np.random.random(size=(3, 2)).astype('float32')
d_np = np.random.random(size=(3, 2)).astype('float32')
label_np = np.random.randint(2, size=(2, 1)).astype('int64')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[2, 3], dtype='float32')
b = paddle.static.data(name="b", shape=[2, 3], dtype='float32')
c = paddle.static.data(name="c", shape=[3, 2], dtype='float32')
d = paddle.static.data(name="d", shape=[3, 2], dtype='float32')
label = paddle.static.data(
name="label", shape=[2, 1], dtype='int64')
sum_1 = paddle.add(a, b)
sum_2 = paddle.add(c, d)
result = paddle.matmul(sum_1, sum_2)
fc_1 = fluid.layers.fc(input=result, size=8)
prediction = fluid.layers.fc(input=fc_1, size=2, act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=label)
loss = fluid.layers.reduce_mean(cost)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
print("Start run on {}".format(place))
for epoch in range(100):
pred_res, loss_res = exe.run(main_prog,
feed={
"a": a_np,
"b": b_np,
"c": c_np,
"d": d_np,
"label": label_np
},
fetch_list=[prediction, loss])
if epoch % 10 == 0:
print("Epoch {} | Prediction[0]: {}, Loss: {}".format(
epoch, pred_res[0], loss_res))
return pred_res, loss_res
def test_npu(self):
cpu_pred, cpu_loss = self._test(False)
npu_pred, npu_loss = self._test(True)
self.assertTrue(np.allclose(npu_pred, cpu_pred))
self.assertTrue(np.allclose(npu_loss, cpu_loss))
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
from paddle.fluid import core
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestMean(OpTest):
def setUp(self):
self.set_npu()
self.place = paddle.NPUPlace(0)
self.op_type = "mean"
self.init_dtype()
x = np.random.random([1, 100]).astype(self.dtype)
self.inputs = {'X': x}
self.attrs = {}
np_out = np.mean(x)
self.outputs = {'Out': np_out}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
def test_check_grad(self):
self.check_grad_with_place(
self.place, ['X'], 'Out', check_dygraph=False)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestMeanFP16(OpTest):
def setUp(self):
self.set_npu()
self.place = paddle.NPUPlace(0)
self.op_type = "mean"
self.init_dtype()
x = np.random.random([3, 200]).astype(self.dtype)
self.inputs = {'X': x}
self.attrs = {}
np_out = np.mean(x)
self.outputs = {'Out': np_out}
def set_npu(self):
self.__class__.use_npu = True
self.__class__.no_need_check_grad = True
def init_dtype(self):
self.dtype = np.float16
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
paddle.enable_static()
SEED = 2021
class TestMul(OpTest):
def config(self):
self.x_shape = (32, 5)
self.y_shape = (5, 100)
def setUp(self):
self.set_npu()
self.op_type = "mul"
self.place = paddle.NPUPlace(0)
self.init_dtype()
self.config()
np.random.seed(SEED)
self.inputs = {
'X': np.random.random(self.x_shape).astype(self.dtype),
'Y': np.random.random(self.y_shape).astype(self.dtype)
}
self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y'])}
def set_npu(self):
self.__class__.use_npu = True
self.__class__.no_need_check_grad = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False, atol=1e-5)
#
class TestMulFP16(TestMul):
"""
case 2
"""
def init_dtype(self):
self.dtype = np.float16
class TestMul3(TestMul):
"""
case 3
"""
def config(self):
self.x_shape = (2, 2, 5)
self.y_shape = (10, 5)
def setUp(self):
self.set_npu()
self.op_type = "mul"
self.place = paddle.NPUPlace(0)
self.init_dtype()
self.config()
np.random.seed(SEED)
self.inputs = {
'X': np.random.random(self.x_shape).astype(self.dtype),
'Y': np.random.random(self.y_shape).astype(self.dtype)
}
self.outputs = {
'Out': np.dot(self.inputs['X'].reshape(2, 10), self.inputs['Y'])
}
class TestMul4(TestMul):
"""
case 4
"""
def config(self):
self.x_shape = (2, 3, 4)
self.y_shape = (4, 5)
def setUp(self):
self.set_npu()
self.op_type = "mul"
self.place = paddle.NPUPlace(0)
self.init_dtype()
self.config()
np.random.seed(SEED)
self.inputs = {
'X': np.random.random(self.x_shape).astype(self.dtype),
'Y': np.random.random(self.y_shape).astype(self.dtype)
}
self.attrs = {"x_num_col_dims": 2}
self.outputs = {'Out': np.matmul(self.inputs['X'], self.inputs['Y'])}
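`TestMul4` can use `np.matmul` directly because `mul` with `x_num_col_dims=2` flattens the first two axes of X, multiplies by Y, and restores them, which coincides with batched matmul for these shapes. A sketch of that equivalence (shapes mirror TestMul4):

```python
import numpy as np

X = np.random.RandomState(0).rand(2, 3, 4).astype(np.float32)
Y = np.random.RandomState(1).rand(4, 5).astype(np.float32)

# Flatten leading x_num_col_dims=2 axes, multiply, then restore them.
flat = X.reshape(2 * 3, 4).dot(Y)  # (6, 5)
out = flat.reshape(2, 3, 5)
```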
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestMulNet(unittest.TestCase):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.random.random(size=(2, 3)).astype('float32')
b_np = np.random.random(size=(2, 3)).astype('float32')
c_np = np.random.random(size=(3, 2)).astype('float32')
d_np = np.random.random(size=(3, 2)).astype('float32')
label_np = np.random.randint(2, size=(2, 1)).astype('int64')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[2, 3], dtype='float32')
b = paddle.static.data(name="b", shape=[2, 3], dtype='float32')
c = paddle.static.data(name="c", shape=[3, 2], dtype='float32')
d = paddle.static.data(name="d", shape=[3, 2], dtype='float32')
label = paddle.static.data(
name="label", shape=[2, 1], dtype='int64')
sum_1 = paddle.add(a, b)
sum_2 = paddle.add(c, d)
result = paddle.fluid.layers.mul(sum_1, sum_2)
fc_1 = fluid.layers.fc(input=result, size=8)
prediction = fluid.layers.fc(input=fc_1, size=2, act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=label)
loss = fluid.layers.reduce_mean(cost)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
print("TestMulNet Start run on {} . ".format(place))
for epoch in range(100):
pred_res, loss_res = exe.run(main_prog,
feed={
"a": a_np,
"b": b_np,
"c": c_np,
"d": d_np,
"label": label_np
},
fetch_list=[prediction, loss])
if epoch % 10 == 0:
print("Epoch {} | Prediction[0]: {}, Loss: {}".format(
epoch, pred_res[0], loss_res))
return pred_res, loss_res
def test_npu(self):
cpu_pred, cpu_loss = self._test(False)
npu_pred, npu_loss = self._test(True)
self.assertTrue(np.allclose(npu_pred, cpu_pred))
self.assertTrue(np.allclose(npu_loss, cpu_loss))
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestMulNet3_2(unittest.TestCase):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.random.random(size=(2, 3, 4)).astype('float32')
b_np = np.random.random(size=(2, 3, 4)).astype('float32')
c_np = np.random.random(size=(12, 5)).astype('float32')
d_np = np.random.random(size=(12, 5)).astype('float32')
label_np = np.random.randint(2, size=(2, 1)).astype('int64')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[2, 3, 4], dtype='float32')
b = paddle.static.data(name="b", shape=[2, 3, 4], dtype='float32')
c = paddle.static.data(name="c", shape=[12, 5], dtype='float32')
d = paddle.static.data(name="d", shape=[12, 5], dtype='float32')
label = paddle.static.data(
name="label", shape=[2, 1], dtype='int64')
sum_1 = paddle.add(a, b)
sum_2 = paddle.add(c, d)
result = paddle.fluid.layers.mul(sum_1, sum_2)
fc_1 = fluid.layers.fc(input=result, size=8)
prediction = fluid.layers.fc(input=fc_1, size=2, act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=label)
loss = fluid.layers.reduce_mean(cost)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
print("TestMulNet3_2 start run on {}".format(place))
for epoch in range(100):
pred_res, loss_res = exe.run(main_prog,
feed={
"a": a_np,
"b": b_np,
"c": c_np,
"d": d_np,
"label": label_np
},
fetch_list=[prediction, loss])
if epoch % 10 == 0:
print("Epoch {} | Prediction[0]: {}, Loss: {}".format(
epoch, pred_res[0], loss_res))
return pred_res, loss_res
def test_npu(self):
cpu_pred, cpu_loss = self._test(False)
npu_pred, npu_loss = self._test(True)
self.assertTrue(np.allclose(npu_pred, cpu_pred))
self.assertTrue(np.allclose(npu_loss, cpu_loss))
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestMulNet3_2_xc2(unittest.TestCase):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.random.random(size=(2, 3, 4)).astype('float32')
b_np = np.random.random(size=(2, 3, 4)).astype('float32')
c_np = np.random.random(size=(4, 5)).astype('float32')
d_np = np.random.random(size=(4, 5)).astype('float32')
label_np = np.random.randint(2, size=(2, 1)).astype('int64')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[2, 3, 4], dtype='float32')
b = paddle.static.data(name="b", shape=[2, 3, 4], dtype='float32')
c = paddle.static.data(name="c", shape=[4, 5], dtype='float32')
d = paddle.static.data(name="d", shape=[4, 5], dtype='float32')
label = paddle.static.data(
name="label", shape=[2, 1], dtype='int64')
sum_1 = paddle.add(a, b)
sum_2 = paddle.add(c, d)
result = paddle.fluid.layers.mul(sum_1, sum_2, x_num_col_dims=2)
result_re = paddle.reshape(result, shape=[2, 15])
fc_1 = fluid.layers.fc(input=result_re, size=8)
prediction = fluid.layers.fc(input=fc_1, size=2, act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=label)
loss = fluid.layers.reduce_mean(cost)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
print("TestMulNet3_2_xc2 start run on {}".format(place))
for epoch in range(100):
pred_res, loss_res = exe.run(main_prog,
feed={
"a": a_np,
"b": b_np,
"c": c_np,
"d": d_np,
"label": label_np
},
fetch_list=[prediction, loss])
if epoch % 10 == 0:
print("Epoch {} | Prediction[0]: {}, Loss: {}".format(
epoch, pred_res[0], loss_res))
return pred_res, loss_res
def test_npu(self):
cpu_pred, cpu_loss = self._test(False)
npu_pred, npu_loss = self._test(True)
self.assertTrue(np.allclose(npu_pred, cpu_pred))
self.assertTrue(np.allclose(npu_loss, cpu_loss))
if __name__ == '__main__':
unittest.main()
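The networks above feed operands of different ranks into `fluid.layers.mul`, which flattens each input to 2-D before the matrix multiply. A minimal NumPy sketch of that flattening rule (illustrative only; `mul_ref` is a hypothetical helper, and it returns the flattened 2-D result rather than Paddle's re-expanded output shape):

```python
import numpy as np

def mul_ref(x, y, x_num_col_dims=1, y_num_col_dims=1):
    # Flatten x to 2-D: the leading x_num_col_dims dims become rows,
    # the remaining dims become columns (symmetrically for y).
    xm = x.reshape(int(np.prod(x.shape[:x_num_col_dims])), -1)
    ym = y.reshape(int(np.prod(y.shape[:y_num_col_dims])), -1)
    return xm.dot(ym)

# TestMulNet3_2: (2, 3, 4) flattens to (2, 12), multiplied by (12, 5).
out = mul_ref(np.ones((2, 3, 4)), np.ones((12, 5)))
print(out.shape)  # (2, 5)

# TestMulNet3_2_xc2: x_num_col_dims=2 keeps 2*3 rows -> (6, 4) @ (4, 5),
# which is why the test reshapes the (2, 3, 5) result to [2, 15].
out2 = mul_ref(np.ones((2, 3, 4)), np.ones((4, 5)), x_num_col_dims=2)
print(out2.shape)  # (6, 5)
```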
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestPow(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "pow"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [11, 17]).astype(self.dtype)
out = np.power(x, 3)
self.inputs = {'X': OpTest.np_dtype_to_fluid_dtype(x)}
self.attrs = {'factor': 3.0}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
def test_check_grad(self):
self.check_grad_with_place(
self.place, ['X'], 'Out', check_dygraph=False)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestPowFp16(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "pow"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [3, 4]).astype(self.dtype)
out = np.power(x, 2)
self.inputs = {'X': OpTest.np_dtype_to_fluid_dtype(x)}
self.attrs = {'factor': 2.0}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
self.__class__.no_need_check_grad = True
def init_dtype(self):
self.dtype = np.float16
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False, atol=1e-5)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestPowNet(unittest.TestCase):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.random.random(size=(32, 32)).astype('float32')
b_np = np.random.random(size=(32, 32)).astype('float32')
label_np = np.random.randint(2, size=(32, 1)).astype('int64')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[32, 32], dtype='float32')
b = paddle.static.data(name="b", shape=[32, 32], dtype='float32')
label = paddle.static.data(
name="label", shape=[32, 1], dtype='int64')
sum = paddle.add(a, b)
z = paddle.pow(sum, 2.0)
fc_1 = fluid.layers.fc(input=z, size=128)
prediction = fluid.layers.fc(input=fc_1, size=2, act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=label)
loss = fluid.layers.reduce_mean(cost)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
print("Start run on {}".format(place))
for epoch in range(100):
pred_res, loss_res = exe.run(
main_prog,
feed={"a": a_np,
"b": b_np,
"label": label_np},
fetch_list=[prediction, loss])
if epoch % 10 == 0:
print("Epoch {} | Prediction[0]: {}, Loss: {}".format(
epoch, pred_res[0], loss_res))
return pred_res, loss_res
def test_npu(self):
cpu_pred, cpu_loss = self._test(False)
npu_pred, npu_loss = self._test(True)
self.assertTrue(np.allclose(npu_pred, cpu_pred))
self.assertTrue(np.allclose(npu_loss, cpu_loss))
if __name__ == '__main__':
unittest.main()
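The pow tests above check the forward pass against `np.power`, while `test_check_grad` relies on the analytic derivative. A minimal NumPy sketch, assuming the standard elementwise rule d(x^n)/dx = n * x^(n-1) (the helper names are illustrative, not Paddle APIs):

```python
import numpy as np

def pow_ref(x, factor):
    # Forward: elementwise power, matching the {'factor': ...} attr above.
    return np.power(x, factor)

def pow_grad_ref(x, factor, dout):
    # Backward: d(x**n)/dx = n * x**(n-1), chained with the upstream grad.
    return dout * factor * np.power(x, factor - 1)

x = np.array([1.0, 2.0, 3.0])
print(pow_ref(x, 3.0))                         # [ 1.  8. 27.]
print(pow_grad_ref(x, 3.0, np.ones_like(x)))   # [ 3. 12. 27.]
```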
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import unittest
import numpy as np
from op_test import OpTest, skip_check_grad_ci
import paddle
import paddle.fluid.core as core
import paddle.fluid as fluid
from paddle.fluid import compiler, Program, program_guard
from paddle.fluid.framework import convert_np_dtype_to_dtype_
paddle.enable_static()
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestAny8DOp(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "reduce_any"
self.place = paddle.NPUPlace(0)
self.inputs = {
'X': np.random.randint(0, 2,
(2, 5, 3, 2, 2, 3, 4, 2)).astype("bool")
}
self.attrs = {'dim': (3, 5, 4)}
self.outputs = {'Out': self.inputs['X'].any(axis=self.attrs['dim'])}
def set_npu(self):
self.__class__.use_npu = True
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestAnyOpWithDim(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "reduce_any"
self.place = paddle.NPUPlace(0)
self.inputs = {'X': np.random.randint(0, 2, (5, 6, 10)).astype("bool")}
self.attrs = {'dim': [1]}
self.outputs = {'Out': self.inputs['X'].any(axis=1)}
def set_npu(self):
self.__class__.use_npu = True
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestAny8DOpWithDim(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "reduce_any"
self.place = paddle.NPUPlace(0)
self.inputs = {
'X': np.random.randint(0, 2,
(2, 5, 3, 2, 2, 3, 4, 2)).astype("bool")
}
self.attrs = {'dim': (3, 6)}
self.outputs = {'Out': self.inputs['X'].any(axis=self.attrs['dim'])}
def set_npu(self):
self.__class__.use_npu = True
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestAnyOpWithKeepDim(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "reduce_any"
self.place = paddle.NPUPlace(0)
self.inputs = {'X': np.random.randint(0, 2, (5, 6, 10)).astype("bool")}
self.attrs = {'dim': (1, ), 'keep_dim': True}
self.outputs = {
'Out': np.expand_dims(
self.inputs['X'].any(axis=self.attrs['dim']), axis=1)
}
def set_npu(self):
self.__class__.use_npu = True
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestAny8DOpWithKeepDim(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "reduce_any"
self.place = paddle.NPUPlace(0)
self.inputs = {
'X': np.random.randint(0, 2,
(2, 5, 3, 2, 2, 3, 4, 2)).astype("bool")
}
self.attrs = {'dim': (1, ), 'keep_dim': True}
self.outputs = {
'Out': np.expand_dims(
self.inputs['X'].any(axis=self.attrs['dim']), axis=1)
}
def set_npu(self):
self.__class__.use_npu = True
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
if __name__ == '__main__':
unittest.main()
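The keep_dim cases above build the reference by reducing and then re-inserting the axis with `np.expand_dims`; `np.any` with `keepdims=True` is an equivalent single call. A small sketch of that equivalence:

```python
import numpy as np

np.random.seed(2021)
x = np.random.randint(0, 2, (5, 6, 10)).astype("bool")

# Reference used in the tests: reduce, then re-insert the reduced axis.
ref = np.expand_dims(x.any(axis=1), axis=1)

# Equivalent single call with keepdims=True.
out = x.any(axis=1, keepdims=True)

print(out.shape)  # (5, 1, 10)
assert (out == ref).all()
```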
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestReduceSum(OpTest):
def setUp(self):
np.random.seed(SEED)
self.set_npu()
self.init_dtype()
self.place = paddle.NPUPlace(0)
self.init_op_type()
self.initTestCase()
self.use_mkldnn = False
self.attrs = {
'dim': self.axis,
'keep_dim': self.keep_dim,
'reduce_all': self.reduce_all
}
self.inputs = {'X': np.random.random(self.shape).astype(self.dtype)}
if self.attrs['reduce_all']:
self.outputs = {'Out': self.inputs['X'].sum()}
else:
self.outputs = {
'Out': self.inputs['X'].sum(axis=self.axis,
keepdims=self.attrs['keep_dim'])
}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def init_op_type(self):
self.op_type = "reduce_sum"
self.use_mkldnn = False
self.keep_dim = False
self.reduce_all = False
def initTestCase(self):
self.shape = (5, 6)
self.axis = (0, )
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
# TODO(ascendrc): Add grad test
# def test_check_grad(self):
# if self.dtype == np.float16:
# return
# self.check_grad(['X'], 'Out')
#
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestReduceSum2(TestReduceSum):
def init_dtype(self):
self.dtype = np.int32
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestReduceSumNet(unittest.TestCase):
def set_reduce_sum_function(self, x):
# keep_dim = False
return paddle.fluid.layers.reduce_sum(x, dim=-1)
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.random.random(size=(2, 3, 4)).astype('float32')
b_np = np.random.random(size=(2, 3, 4)).astype('float32')
label_np = np.random.randint(2, size=(2, 1)).astype('int64')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[2, 3, 4], dtype='float32')
b = paddle.static.data(name="b", shape=[2, 3, 4], dtype='float32')
label = paddle.static.data(
name="label", shape=[2, 1], dtype='int64')
a_1 = fluid.layers.fc(input=a, size=4, num_flatten_dims=2, act=None)
b_1 = fluid.layers.fc(input=b, size=4, num_flatten_dims=2, act=None)
z = paddle.add(a_1, b_1)
z_1 = self.set_reduce_sum_function(z)
prediction = fluid.layers.fc(input=z_1, size=2, act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=label)
loss = fluid.layers.reduce_mean(cost)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
print("Start run on {}".format(place))
for epoch in range(100):
pred_res, loss_res = exe.run(
main_prog,
feed={"a": a_np,
"b": b_np,
"label": label_np},
fetch_list=[prediction, loss])
if epoch % 10 == 0:
print("Epoch {} | Prediction[0]: {}, Loss: {}".format(
epoch, pred_res[0], loss_res))
return pred_res, loss_res
def test_npu(self):
cpu_pred, cpu_loss = self._test(False)
npu_pred, npu_loss = self._test(True)
self.assertTrue(np.allclose(npu_pred, cpu_pred))
self.assertTrue(np.allclose(npu_loss, cpu_loss))
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestReduceSumNet2(TestReduceSumNet):
def set_reduce_sum_function(self, x):
# keep_dim = True
return paddle.fluid.layers.reduce_sum(x, dim=-1, keep_dim=True)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestReduceSumNet3(TestReduceSumNet):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.random.random(size=(2, 3, 4)).astype('float32')
b_np = np.random.random(size=(2, 3, 4)).astype('float32')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[2, 3, 4], dtype='float32')
b = paddle.static.data(name="b", shape=[2, 3, 4], dtype='float32')
z = paddle.add(a, b)
loss = fluid.layers.reduce_sum(z)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
print("Start run on {}".format(place))
for epoch in range(100):
loss_res = exe.run(main_prog,
feed={"a": a_np,
"b": b_np},
fetch_list=[loss])
if epoch % 10 == 0:
print("Epoch {} | Loss: {}".format(epoch, loss_res))
return loss_res, loss_res
if __name__ == '__main__':
unittest.main()
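The reduce_sum tests switch between a full reduction (`reduce_all`) and an axis reduction with optional `keep_dim`. A minimal sketch of the reference semantics mirrored by the attrs dict in `TestReduceSum.setUp` (`reduce_sum_ref` is an illustrative helper, not Paddle's kernel):

```python
import numpy as np

def reduce_sum_ref(x, dim, keep_dim=False, reduce_all=False):
    # reduce_all takes precedence over dim, as in the test's setUp branch.
    if reduce_all:
        return x.sum()
    return x.sum(axis=dim, keepdims=keep_dim)

x = np.ones((5, 6), dtype=np.float32)
print(reduce_sum_ref(x, (0,)).shape)                 # (6,)
print(reduce_sum_ref(x, (0,), keep_dim=True).shape)  # (1, 6)
print(reduce_sum_ref(x, None, reduce_all=True))      # 30.0
```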
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestRelu(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "relu"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.rand(3, 2).astype(self.dtype)
out = x  # np.random.rand yields non-negative values, so relu(x) == x here
self.inputs = {'X': OpTest.np_dtype_to_fluid_dtype(x)}
self.attrs = {}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestReluFp16(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "relu"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.rand(3, 2).astype(self.dtype)
out = x  # np.random.rand yields non-negative values, so relu(x) == x here
self.inputs = {'X': OpTest.np_dtype_to_fluid_dtype(x)}
self.attrs = {}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
self.__class__.no_need_check_grad = True
def init_dtype(self):
self.dtype = np.float16
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False, atol=1e-5)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestReluNeg(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "relu"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.array([0.1, -0.1, -1.0]).astype(self.dtype)
out = np.array([0.1, 0.0, 0.0]).astype(self.dtype)
self.inputs = {'X': OpTest.np_dtype_to_fluid_dtype(x)}
self.attrs = {}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestReluNet(unittest.TestCase):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.random.random(size=(32, 32)).astype('float32')
b_np = np.random.random(size=(32, 32)).astype('float32')
label_np = np.random.randint(2, size=(32, 1)).astype('int64')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[32, 32], dtype='float32')
b = paddle.static.data(name="b", shape=[32, 32], dtype='float32')
label = paddle.static.data(
name="label", shape=[32, 1], dtype='int64')
sum = paddle.add(a, b)
z = paddle.nn.functional.relu(sum)
fc_1 = fluid.layers.fc(input=z, size=128)
prediction = fluid.layers.fc(input=fc_1, size=2, act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=label)
loss = fluid.layers.reduce_mean(cost)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
print("Start run on {}".format(place))
for epoch in range(100):
pred_res, loss_res = exe.run(
main_prog,
feed={"a": a_np,
"b": b_np,
"label": label_np},
fetch_list=[prediction, loss])
if epoch % 10 == 0:
print("Epoch {} | Prediction[0]: {}, Loss: {}".format(
epoch, pred_res[0], loss_res))
return pred_res, loss_res
def test_npu(self):
cpu_pred, cpu_loss = self._test(False)
npu_pred, npu_loss = self._test(True)
self.assertTrue(np.allclose(npu_pred, cpu_pred))
self.assertTrue(np.allclose(npu_loss, cpu_loss))
if __name__ == '__main__':
unittest.main()
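`TestRelu` and `TestReluFp16` use `out = x` because `np.random.rand` only produces non-negative inputs, on which relu is the identity; `TestReluNeg` covers the clipping branch. The general reference is `np.maximum(x, 0)`, sketched here:

```python
import numpy as np

def relu_ref(x):
    # relu(x) = max(x, 0), elementwise.
    return np.maximum(x, 0.0)

x = np.array([0.1, -0.1, -1.0], dtype=np.float32)
assert np.allclose(relu_ref(x), [0.1, 0.0, 0.0])  # negatives clip to 0

# For non-negative inputs, relu is the identity, which is why
# TestRelu can use out = x directly.
y = np.random.rand(3, 2).astype(np.float32)
assert (relu_ref(y) == y).all()
```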
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestReshape2(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "reshape2"
self.place = paddle.NPUPlace(0)
self.init_data()
self.inputs = {"X": np.random.random(self.ori_shape).astype("float32")}
self.attrs = {"shape": self.new_shape}
self.outputs = {
"Out": self.inputs["X"].reshape(self.infered_shape),
'XShape': np.random.random(self.ori_shape).astype("float32")
}
def set_npu(self):
self.__class__.use_npu = True
def init_data(self):
self.ori_shape = (2, 100)
self.new_shape = (20, 10)
self.infered_shape = (20, 10)
def test_check_output(self):
self.check_output_with_place(
self.place, check_dygraph=False, no_check_set=['XShape'])
def test_check_grad_normal(self):
self.check_grad_with_place(
self.place, ['X'], 'Out', check_dygraph=False)
class TestReshape2_case2(TestReshape2):
def init_data(self):
self.ori_shape = (2, 100)
self.new_shape = (-1, 10)
self.infered_shape = (20, 10)
class TestReshape2_case3(TestReshape2):
def init_data(self):
self.ori_shape = (100, 5, 6)
self.new_shape = (-1, 0, 3)
self.infered_shape = (200, 5, 3)
if __name__ == '__main__':
unittest.main()
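The `init_data` cases above exercise reshape's special shape values: `0` copies the corresponding input dimension and `-1` is inferred from the remaining element count. A minimal sketch of that inference rule (`infer_shape` is an illustrative helper, not Paddle's implementation):

```python
import numpy as np

def infer_shape(ori_shape, new_shape):
    # 0 means "copy this dim from the input"; -1 is inferred from the rest.
    shape = [ori_shape[i] if d == 0 else d for i, d in enumerate(new_shape)]
    if -1 in shape:
        known = int(np.prod([d for d in shape if d != -1]))
        shape[shape.index(-1)] = int(np.prod(ori_shape)) // known
    return tuple(shape)

print(infer_shape((2, 100), (-1, 10)))       # (20, 10)
print(infer_shape((100, 5, 6), (-1, 0, 3)))  # (200, 5, 3)
```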
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestScale(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "scale"
self.place = paddle.NPUPlace(0)
self.init_dtype()
self.inputs = {
'X': OpTest.np_dtype_to_fluid_dtype(
np.random.random((10, 10)).astype(self.dtype))
}
self.attrs = {'scale': -2.3, 'bias': 0, 'bias_after_scale': True}
self.outputs = {
'Out': self.inputs['X'] * self.dtype(self.attrs['scale'])
}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
class TestFP16Scale(TestScale):
def init_dtype(self):
self.dtype = np.float16
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestBiasAfterScale(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "scale"
self.place = paddle.NPUPlace(0)
self.init_dtype()
self.inputs = {
'X': OpTest.np_dtype_to_fluid_dtype(
np.random.random((10, 10)).astype(self.dtype))
}
self.attrs = {'scale': -2.3, 'bias': 0, 'bias_after_scale': False}
self.outputs = {
'Out': self.inputs['X'] * self.dtype(self.attrs['scale'])
}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
if __name__ == '__main__':
unittest.main()
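Both scale tests above use `bias=0`, so their expected output is simply `scale * x` regardless of `bias_after_scale`. The attribute only matters for a nonzero bias, sketched here (`scale_ref` is an illustrative helper, not Paddle's kernel):

```python
import numpy as np

def scale_ref(x, scale, bias, bias_after_scale):
    # bias_after_scale=True:  out = scale * x + bias
    # bias_after_scale=False: out = scale * (x + bias)
    if bias_after_scale:
        return scale * x + bias
    return scale * (x + bias)

x = np.ones((2, 2), dtype=np.float32)

# With bias=0 (as in both tests above) the two variants coincide:
assert np.allclose(scale_ref(x, -2.3, 0.0, True),
                   scale_ref(x, -2.3, 0.0, False))

# With a nonzero bias they differ: 2*1 + 1 = 3 vs 2*(1 + 1) = 4.
assert np.allclose(scale_ref(x, 2.0, 1.0, True), 3.0)
assert np.allclose(scale_ref(x, 2.0, 1.0, False), 4.0)
```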
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
import paddle.fluid.core as core
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestScatter1(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "scatter"
self.place = paddle.NPUPlace(0)
ref_np = np.ones((3, 2)).astype("float32")
index_np = np.array([1]).astype("int32")
updates_np = np.random.random((1, 2)).astype("float32")
output_np = np.copy(ref_np)
output_np[index_np] = updates_np
self.inputs = {'X': ref_np, 'Ids': index_np, 'Updates': updates_np}
self.outputs = {'Out': output_np}
self.attrs = {'overwrite': True}
def set_npu(self):
self.__class__.use_npu = True
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestScatter2(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "scatter"
self.place = paddle.NPUPlace(0)
ref_np = np.ones((3, 2)).astype("int32")
index_np = np.array([1]).astype("int32")
updates_np = np.zeros((1, 2)).astype("int32")
output_np = np.copy(ref_np)
output_np[index_np] = updates_np
self.inputs = {'X': ref_np, 'Ids': index_np, 'Updates': updates_np}
self.outputs = {'Out': output_np}
self.attrs = {'overwrite': True}
def set_npu(self):
self.__class__.use_npu = True
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestScatter3(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "scatter"
self.place = paddle.NPUPlace(0)
ref_np = np.ones((3, 2)).astype("float32")
index_np = np.array([1]).astype("int32")
updates_np = np.random.random((1, 2)).astype("float32")
output_np = np.copy(ref_np)
output_np[index_np] += updates_np
self.inputs = {'X': ref_np, 'Ids': index_np, 'Updates': updates_np}
self.outputs = {'Out': output_np}
self.attrs = {'overwrite': False}
def set_npu(self):
self.__class__.use_npu = True
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestScatter4(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "scatter"
self.place = paddle.NPUPlace(0)
ref_np = np.ones((3, 2)).astype("float32")
index_np = np.array([1, 2]).astype("int32")
updates_np = np.random.random((2, 2)).astype("float32")
output_np = np.copy(ref_np)
output_np[1] = updates_np[0]
output_np[2] = updates_np[1]
self.inputs = {'X': ref_np, 'Ids': index_np, 'Updates': updates_np}
self.outputs = {'Out': output_np}
self.attrs = {'overwrite': True}
def set_npu(self):
self.__class__.use_npu = True
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
if __name__ == '__main__':
unittest.main()
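The scatter tests above build their reference outputs by row assignment (`overwrite=True`) or in-place addition (`overwrite=False`). A minimal NumPy sketch of those two semantics (`scatter_ref` is an illustrative helper; note the tests' `out[ids] += updates` matches `np.add.at` only when the indices are unique, which holds here):

```python
import numpy as np

def scatter_ref(x, ids, updates, overwrite=True):
    out = np.copy(x)
    if overwrite:
        out[ids] = updates        # selected rows are replaced
    else:
        # accumulate: np.add.at also handles duplicate indices correctly
        np.add.at(out, ids, updates)
    return out

x = np.ones((3, 2), dtype=np.float32)
upd = np.full((1, 2), 5.0, dtype=np.float32)
print(scatter_ref(x, np.array([1]), upd, overwrite=True)[1])   # [5. 5.]
print(scatter_ref(x, np.array([1]), upd, overwrite=False)[1])  # [6. 6.]
```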
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestSGD(OpTest):
def setUp(self):
self.set_npu()
self.place = paddle.NPUPlace(0)
self.op_type = "sgd"
self.conf()
w = np.random.random((self.h, self.w)).astype("float32")
g = np.random.random((self.h, self.w)).astype("float32")
lr = np.array([0.1]).astype("float32")
self.inputs = {'Param': w, 'Grad': g, 'LearningRate': lr}
self.outputs = {'ParamOut': w - lr * g}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def conf(self):
self.h = 12
self.w = 15
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestNet(unittest.TestCase):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.random.random(size=(32, 32)).astype('float32')
b_np = np.random.random(size=(32, 32)).astype('float32')
label_np = np.random.randint(2, size=(32, 1)).astype('int64')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[32, 32], dtype='float32')
b = paddle.static.data(name="b", shape=[32, 32], dtype='float32')
label = paddle.static.data(
name="label", shape=[32, 1], dtype='int64')
sum = paddle.add(a, b)
z = paddle.pow(sum, 2.0)
fc_1 = fluid.layers.fc(input=z, size=128)
prediction = fluid.layers.fc(input=fc_1, size=2, act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=label)
loss = fluid.layers.reduce_mean(cost)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
print("Start run on {}".format(place))
for epoch in range(100):
pred_res, loss_res = exe.run(
main_prog,
feed={"a": a_np,
"b": b_np,
"label": label_np},
fetch_list=[prediction, loss])
if epoch % 10 == 0:
print("Epoch {} | Prediction[0]: {}, Loss: {}".format(
epoch, pred_res[0], loss_res))
return pred_res, loss_res
def test_npu(self):
cpu_pred, cpu_loss = self._test(False)
npu_pred, npu_loss = self._test(True)
self.assertTrue(np.allclose(npu_pred, cpu_pred))
self.assertTrue(np.allclose(npu_loss, cpu_loss))
if __name__ == '__main__':
unittest.main()
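`TestSGD` checks the vanilla SGD update against `w - lr * g`. That one-line rule, sketched for reference (`sgd_ref` is an illustrative helper, not Paddle's optimizer):

```python
import numpy as np

def sgd_ref(param, grad, lr):
    # Vanilla SGD step, matching the ParamOut reference in TestSGD.
    return param - lr * grad

w = np.array([1.0, 2.0], dtype=np.float32)
g = np.array([0.5, 0.5], dtype=np.float32)
new_w = sgd_ref(w, g, np.float32(0.1))
assert np.allclose(new_w, [0.95, 1.95])
```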
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestShape(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "shape"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [5, 10]).astype(self.dtype)
out = np.array([5, 10])
self.inputs = {'Input': OpTest.np_dtype_to_fluid_dtype(x)}
self.attrs = {}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
paddle.enable_static()
SEED = 2021
EPOCH = 100
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestSliceOp(OpTest):
def setUp(self):
self.op_type = "slice"
self.set_npu()
self.init_dtype()
self.config()
self.inputs = {'Input': self.input}
self.outputs = {'Out': self.out}
self.attrs = {
'axes': self.axes,
'starts': self.starts,
'ends': self.ends,
'infer_flags': self.infer_flags
}
def config(self):
self.input = np.random.random([3, 4, 5, 6]).astype(self.dtype)
self.starts = [1, 0, 2]
self.ends = [3, 3, 4]
self.axes = [0, 1, 2]
self.infer_flags = [1, 1, 1]
self.out = self.input[1:3, 0:3, 2:4, :]
def init_dtype(self):
self.dtype = np.float32
def set_npu(self):
self.__class__.use_npu = True
self.place = paddle.NPUPlace(0)
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
def test_check_grad_normal(self):
if self.dtype == np.float16:
return
self.check_grad_with_place(
self.place, ['Input'], 'Out', check_dygraph=False)
class TestSliceOp2(TestSliceOp):
def config(self):
self.input = np.random.random([3, 4, 5, 6]).astype(self.dtype)
self.starts = [1, 0, -3]
self.ends = [3, 3, -1]
self.axes = [0, 1, 2]
self.infer_flags = [1, 1, 1]
self.out = self.input[1:3, 0:3, -3:-1, :]
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestSliceOpFp16(TestSliceOp):
def init_dtype(self):
self.dtype = np.float16
def set_npu(self):
self.__class__.use_npu = True
self.__class__.no_need_check_grad = True
self.place = paddle.NPUPlace(0)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestSliceNet(unittest.TestCase):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
batch_size = 32
data_shape = (32, 32)
a_np = np.random.random(size=data_shape).astype('float32')
b_np = np.random.random(size=data_shape).astype('float32')
label_np = np.random.randint(2, size=(batch_size, 1)).astype('int64')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=data_shape, dtype='float32')
b = paddle.static.data(name="b", shape=data_shape, dtype='float32')
label = paddle.static.data(
name="label", shape=[batch_size, 1], dtype='int64')
            # Avoid shadowing the builtin `sum`; note ends=[33, 2] deliberately
            # exceeds the first axis length (32) to exercise end clamping.
            sum_out = paddle.add(a, b)
            z = paddle.slice(sum_out, axes=[0, 1], starts=[0, 0], ends=[33, 2])
prediction = paddle.static.nn.fc(z, size=2, activation='softmax')
cost = paddle.nn.functional.cross_entropy(
input=prediction, label=label)
loss = paddle.mean(cost)
sgd = paddle.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
print("Start run on {}".format(place))
for epoch in range(EPOCH):
pred_res, loss_res = exe.run(
main_prog,
feed={"a": a_np,
"b": b_np,
"label": label_np},
fetch_list=[prediction, loss])
if epoch % 10 == 0:
print("Epoch {} | Prediction[0]: {}, Loss: {}".format(
epoch, pred_res[0], loss_res))
return pred_res, loss_res
def test_npu(self):
cpu_pred, cpu_loss = self._test(False)
npu_pred, npu_loss = self._test(True)
self.assertTrue(np.allclose(npu_pred, cpu_pred))
self.assertTrue(np.allclose(npu_loss, cpu_loss))
if __name__ == '__main__':
unittest.main()
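The slice tests above lean on two pieces of NumPy-style slicing semantics: negative `starts`/`ends` count from the end of the axis (TestSliceOp2), and `ends` past the axis length are clamped (TestSliceNet requests `ends=[33, 2]` on a 32-row tensor). A standalone NumPy check of both behaviors, independent of the tests themselves:

```python
import numpy as np

x = np.arange(3 * 4 * 5 * 6, dtype=np.float32).reshape(3, 4, 5, 6)

# starts=[1, 0, -3], ends=[3, 3, -1] on axes [0, 1, 2], as in TestSliceOp2:
# on a length-5 axis, -3:-1 selects the same elements as 2:4.
neg = x[1:3, 0:3, -3:-1, :]

# An end index beyond the axis length (33 on a length-3 axis here) is
# clamped to the axis size instead of raising.
clamped = x[0:33, 0:2]
```

This is why the expected output in TestSliceOp2 can be written either with negative bounds or with `2:4`.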
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
from paddle.fluid import core
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestSoftmax(OpTest):
def setUp(self):
self.set_npu()
self.place = paddle.NPUPlace(0)
self.op_type = "softmax"
self.init_dtype()
x = np.random.random([3, 3]).astype(self.dtype)
np_out = np.exp(x) / np.sum(np.exp(x), axis=1, keepdims=True)
self.inputs = {'X': x}
self.attrs = {}
self.outputs = {'Out': np_out}
def set_npu(self):
self.__class__.use_npu = True
self.__class__.no_need_check_grad = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestSoftmaxNet(unittest.TestCase):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.random.random(size=(4, 32)).astype('float32')
b_np = np.random.random(size=(4, 32)).astype('float32')
label_np = np.random.randint(2, size=(4, 1)).astype('int64')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[4, 32], dtype='float32')
b = paddle.static.data(name="b", shape=[4, 32], dtype='float32')
label = paddle.static.data(
name="label", shape=[4, 1], dtype='int64')
c = paddle.multiply(a, b)
d = paddle.sqrt(c)
# 4 x 128
fc_1 = fluid.layers.fc(input=d, size=128)
# 4 x 2
prediction = fluid.layers.fc(input=fc_1, size=2)
# 4 x 2
prob = fluid.layers.softmax(prediction, axis=1)
cost = fluid.layers.cross_entropy(input=prob, label=label)
loss = fluid.layers.mean(cost)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
print("Start run on {}".format(place))
for epoch in range(100):
pred_res, loss_res = exe.run(
main_prog,
feed={"a": a_np,
"b": b_np,
"label": label_np},
fetch_list=[prediction, loss])
if epoch % 10 == 0:
print("Epoch {} | Prediction[0]: {}, Loss: {}".format(
epoch, pred_res[0], loss_res))
return pred_res, loss_res
def test_npu(self):
cpu_pred, cpu_loss = self._test(False)
npu_pred, npu_loss = self._test(True)
self.assertTrue(np.allclose(npu_pred, cpu_pred, rtol=1e-2))
self.assertTrue(np.allclose(npu_loss, cpu_loss, rtol=1e-2))
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
from test_softmax_op import stable_softmax
from test_softmax_with_cross_entropy_op import cross_entropy
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestSoftmaxWithCrossEntropyOp(OpTest):
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def initParams(self):
self.set_npu()
self.op_type = "softmax_with_cross_entropy"
self.numeric_stable_mode = False
self.place = paddle.NPUPlace(0)
self.soft_label = False
self.init_dtype()
self.axis = -1
self.ignore_index = -1
self.shape = [41, 37]
np.random.seed(SEED)
def setUp(self):
self.initParams()
logits = getattr(
self, "logits",
np.random.uniform(0.1, 1.0, self.shape).astype(self.dtype))
softmax = np.apply_along_axis(stable_softmax, self.axis, logits)
if self.soft_label:
labels = np.random.uniform(0.1, 1.0, self.shape).astype(self.dtype)
labels /= np.sum(labels, axis=self.axis, keepdims=True)
else:
axis_dim = self.shape[self.axis]
self.shape[self.axis] = 1
labels = np.random.randint(0, axis_dim, self.shape, dtype="int64")
loss = cross_entropy(softmax, labels, self.soft_label, self.axis,
self.ignore_index)
self.inputs = {"Logits": logits, "Label": labels}
self.outputs = {
"Softmax": softmax.astype(self.dtype),
"Loss": loss.astype(self.dtype)
}
self.attrs = {
"numeric_stable_mode": self.numeric_stable_mode,
"soft_label": self.soft_label,
"ignore_index": self.ignore_index,
}
if self.axis != -1:
self.attrs['axis'] = self.axis
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
# TODO(ascendrc): Add grad test
# def test_check_grad(self):
# if self.dtype == np.float16:
# return
# self.check_grad(['X'], 'Out')
#
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestPowNet(unittest.TestCase):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.random.random(size=(32, 32)).astype('float32')
b_np = np.random.random(size=(32, 32)).astype('float32')
label_np = np.random.randint(2, size=(32, 1)).astype('int64')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[32, 32], dtype='float32')
b = paddle.static.data(name="b", shape=[32, 32], dtype='float32')
label = paddle.static.data(
name="label", shape=[32, 1], dtype='int64')
            sum_out = paddle.add(a, b)  # avoid shadowing the builtin `sum`
            z = paddle.pow(sum_out, 2.0)
fc_1 = fluid.layers.fc(input=z, size=128)
prediction = fluid.layers.fc(input=fc_1, size=2)
cost = fluid.layers.softmax_with_cross_entropy(prediction, label)
loss = fluid.layers.reduce_mean(cost)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
print("Start run on {}".format(place))
for epoch in range(100):
pred_res, loss_res = exe.run(
main_prog,
feed={"a": a_np,
"b": b_np,
"label": label_np},
fetch_list=[prediction, loss])
if epoch % 10 == 0:
print("Epoch {} | Prediction[0]: {}, Loss: {}".format(
epoch, pred_res[0], loss_res))
return pred_res, loss_res
def test_npu(self):
cpu_pred, cpu_loss = self._test(False)
npu_pred, npu_loss = self._test(True)
self.assertTrue(np.allclose(npu_pred, cpu_pred))
self.assertTrue(np.allclose(npu_loss, cpu_loss))
if __name__ == '__main__':
unittest.main()
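The `stable_softmax` and `cross_entropy` helpers imported above come from the shared CPU op tests; a minimal sketch of the numerically stable formulation they are assumed to implement (not the exact upstream code) is:

```python
import numpy as np

def stable_softmax_ref(x):
    # Subtracting the row max before exponentiating prevents overflow for
    # large logits; the result is unchanged because
    # exp(a - c) / sum(exp(a - c)) == exp(a) / sum(exp(a)).
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

# Logits of 1000 would overflow a naive exp(); the shifted form stays finite.
logits = np.array([[1000.0, 1001.0], [0.5, -0.5]])
probs = stable_softmax_ref(logits)

# Hard-label cross entropy: negative log-probability of the true class.
labels = np.array([1, 0])
loss = -np.log(probs[np.arange(len(labels)), labels])
```

The `numeric_stable_mode` attribute in the test toggles the analogous max-subtraction path inside the operator.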
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestSqrt(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "sqrt"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [11, 17]).astype(self.dtype)
out = np.sqrt(x)
self.inputs = {'X': OpTest.np_dtype_to_fluid_dtype(x)}
self.attrs = {}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
# TODO(ascendrc): Add grad test
# def test_check_grad(self):
# if self.dtype == np.float16:
# return
# self.check_grad(['X'], 'Out')
#
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestSqrtFp16(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "sqrt"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [3, 4]).astype(self.dtype)
out = np.sqrt(x)
self.inputs = {'X': OpTest.np_dtype_to_fluid_dtype(x)}
self.attrs = {}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
self.__class__.no_need_check_grad = True
def init_dtype(self):
self.dtype = np.float16
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False, atol=1e-5)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestSqrtNet(unittest.TestCase):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.random.random(size=(32, 32)).astype('float32')
b_np = np.random.random(size=(32, 32)).astype('float32')
label_np = np.random.randint(2, size=(32, 1)).astype('int64')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[32, 32], dtype='float32')
b = paddle.static.data(name="b", shape=[32, 32], dtype='float32')
label = paddle.static.data(
name="label", shape=[32, 1], dtype='int64')
c = paddle.multiply(a, b)
d = paddle.sqrt(c)
fc_1 = fluid.layers.fc(input=d, size=128)
prediction = fluid.layers.fc(input=fc_1, size=2, act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=label)
loss = fluid.layers.reduce_mean(cost)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
print("Start run on {}".format(place))
for epoch in range(100):
pred_res, loss_res = exe.run(
main_prog,
feed={"a": a_np,
"b": b_np,
"label": label_np},
fetch_list=[prediction, loss])
if epoch % 10 == 0:
print("Epoch {} | Prediction[0]: {}, Loss: {}".format(
epoch, pred_res[0], loss_res))
return pred_res, loss_res
def test_npu(self):
cpu_pred, cpu_loss = self._test(False)
npu_pred, npu_loss = self._test(True)
self.assertTrue(np.allclose(npu_pred, cpu_pred))
self.assertTrue(np.allclose(npu_loss, cpu_loss))
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestSquare(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "square"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [11, 17]).astype(self.dtype)
out = np.square(x)
self.inputs = {'X': OpTest.np_dtype_to_fluid_dtype(x)}
self.attrs = {}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
# TODO(ascendrc): Add grad test
# def test_check_grad(self):
# if self.dtype == np.float16:
# return
# self.check_grad(['X'], 'Out')
#
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestSquareFp16(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "square"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [3, 4]).astype(self.dtype)
out = np.square(x)
self.inputs = {'X': OpTest.np_dtype_to_fluid_dtype(x)}
self.attrs = {}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
self.__class__.no_need_check_grad = True
def init_dtype(self):
self.dtype = np.float16
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False, atol=1e-5)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestSquareNet(unittest.TestCase):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.random.random(size=(32, 32)).astype('float32')
b_np = np.random.random(size=(32, 32)).astype('float32')
label_np = np.random.randint(2, size=(32, 1)).astype('int64')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[32, 32], dtype='float32')
b = paddle.static.data(name="b", shape=[32, 32], dtype='float32')
label = paddle.static.data(
name="label", shape=[32, 1], dtype='int64')
c = paddle.multiply(a, b)
d = paddle.square(c)
fc_1 = fluid.layers.fc(input=d, size=128)
prediction = fluid.layers.fc(input=fc_1, size=2, act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=label)
loss = fluid.layers.reduce_mean(cost)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
print("Start run on {}".format(place))
for epoch in range(100):
pred_res, loss_res = exe.run(
main_prog,
feed={"a": a_np,
"b": b_np,
"label": label_np},
fetch_list=[prediction, loss])
if epoch % 10 == 0:
print("Epoch {} | Prediction[0]: {}, Loss: {}".format(
epoch, pred_res[0], loss_res))
return pred_res, loss_res
def test_npu(self):
cpu_pred, cpu_loss = self._test(False)
npu_pred, npu_loss = self._test(True)
self.assertTrue(np.allclose(npu_pred, cpu_pred))
self.assertTrue(np.allclose(npu_loss, cpu_loss))
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
import paddle.fluid.core as core
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestStack1(OpTest):
def initDefaultParameters(self):
self.num_inputs = 4
self.input_dim = (5, 6, 7)
self.axis = 0
self.dtype = 'float32'
def get_x_names(self):
x_names = []
for i in range(self.num_inputs):
x_names.append('x{}'.format(i))
return x_names
def setUp(self):
self.initDefaultParameters()
self.set_npu()
self.op_type = "stack"
self.place = paddle.NPUPlace(0)
self.x = []
for i in range(self.num_inputs):
self.x.append(
np.random.random(size=self.input_dim).astype(self.dtype))
tmp = []
x_names = self.get_x_names()
for i in range(self.num_inputs):
tmp.append((x_names[i], self.x[i]))
self.inputs = {'X': tmp}
self.outputs = {'Y': np.stack(self.x, axis=self.axis)}
self.attrs = {'axis': self.axis}
def set_npu(self):
self.__class__.use_npu = True
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
class TestStack2(OpTest):
def initDefaultParameters(self):
self.num_inputs = 4
self.input_dim = (2, 3, 4)
self.axis = -1
self.dtype = 'float32'
def get_x_names(self):
x_names = []
for i in range(self.num_inputs):
x_names.append('x{}'.format(i))
return x_names
def setUp(self):
self.initDefaultParameters()
self.set_npu()
self.op_type = "stack"
self.place = paddle.NPUPlace(0)
self.x = []
for i in range(self.num_inputs):
self.x.append(
np.random.random(size=self.input_dim).astype(self.dtype))
tmp = []
x_names = self.get_x_names()
for i in range(self.num_inputs):
tmp.append((x_names[i], self.x[i]))
self.inputs = {'X': tmp}
self.outputs = {'Y': np.stack(self.x, axis=self.axis)}
self.attrs = {'axis': self.axis}
def set_npu(self):
self.__class__.use_npu = True
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
class TestStack3(OpTest):
def initDefaultParameters(self):
self.num_inputs = 4
self.input_dim = (2, 3, 4)
self.axis = 1
self.dtype = 'float32'
def get_x_names(self):
x_names = []
for i in range(self.num_inputs):
x_names.append('x{}'.format(i))
return x_names
def setUp(self):
self.initDefaultParameters()
self.set_npu()
self.op_type = "stack"
self.place = paddle.NPUPlace(0)
self.x = []
for i in range(self.num_inputs):
self.x.append(
np.random.random(size=self.input_dim).astype(self.dtype))
tmp = []
x_names = self.get_x_names()
for i in range(self.num_inputs):
tmp.append((x_names[i], self.x[i]))
self.inputs = {'X': tmp}
self.outputs = {'Y': np.stack(self.x, axis=self.axis)}
self.attrs = {'axis': self.axis}
def set_npu(self):
self.__class__.use_npu = True
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
if __name__ == '__main__':
unittest.main()
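The three stack test classes above differ only in the `axis` attribute. As a quick reference for what each expects, `np.stack` inserts a new axis of length `num_inputs` at the given position, with negative axes counted from the end of the output rank:

```python
import numpy as np

# Four inputs of shape (2, 3, 4), mirroring TestStack2/TestStack3.
xs = [np.zeros((2, 3, 4)) for _ in range(4)]

front = np.stack(xs, axis=0)    # new axis first  (TestStack1 uses axis=0)
middle = np.stack(xs, axis=1)   # new axis second (TestStack3 uses axis=1)
back = np.stack(xs, axis=-1)    # new axis last   (TestStack2 uses axis=-1)
```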
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
import paddle.fluid.core as core
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestSum1(OpTest):
def setUp(self):
self.set_npu()
self.init_dtype()
self.op_type = "sum"
self.place = paddle.NPUPlace(0)
x0 = np.random.random((3, 40)).astype(self.dtype)
x1 = np.random.random((3, 40)).astype(self.dtype)
x2 = np.random.random((3, 40)).astype(self.dtype)
self.inputs = {'X': [("x0", x0), ("x1", x1), ("x2", x2)]}
y = x0 + x1 + x2
self.outputs = {'Out': y}
self.attrs = {'use_mkldnn': False}
def init_dtype(self):
self.dtype = np.float32
def set_npu(self):
self.__class__.use_npu = True
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
class TestSum2(OpTest):
def setUp(self):
self.set_npu()
self.init_dtype()
self.op_type = "sum"
self.place = paddle.NPUPlace(0)
x0 = np.random.random((3, 3)).astype(self.dtype)
x1 = np.random.random((3, 3)).astype(self.dtype)
x2 = np.random.random((3, 3)).astype(self.dtype)
x3 = np.random.random((3, 3)).astype(self.dtype)
self.inputs = {'X': [("x0", x0), ("x1", x1), ("x2", x2), ("x3", x3)]}
y = x0 + x1 + x2 + x3
self.outputs = {'Out': y}
self.attrs = {'use_mkldnn': False}
def init_dtype(self):
self.dtype = np.float16
def set_npu(self):
self.__class__.use_npu = True
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestTanh(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "tanh"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [11, 17]).astype(self.dtype)
out = np.tanh(x)
self.inputs = {'X': OpTest.np_dtype_to_fluid_dtype(x)}
self.attrs = {}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
# TODO(ascendrc): Add grad test
# def test_check_grad(self):
# if self.dtype == np.float16:
# return
# self.check_grad(['X'], 'Out')
#
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestTanhFp16(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "tanh"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [3, 4]).astype(self.dtype)
out = np.tanh(x)
self.inputs = {'X': OpTest.np_dtype_to_fluid_dtype(x)}
self.attrs = {}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
self.__class__.no_need_check_grad = True
def init_dtype(self):
self.dtype = np.float16
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False, atol=1e-3)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestTanhNet(unittest.TestCase):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.random.random(size=(32, 32)).astype('float32')
b_np = np.random.random(size=(32, 32)).astype('float32')
label_np = np.random.randint(2, size=(32, 1)).astype('int64')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[32, 32], dtype='float32')
b = paddle.static.data(name="b", shape=[32, 32], dtype='float32')
label = paddle.static.data(
name="label", shape=[32, 1], dtype='int64')
c = paddle.multiply(a, b)
d = paddle.tanh(c)
fc_1 = fluid.layers.fc(input=d, size=128)
prediction = fluid.layers.fc(input=fc_1, size=2, act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=label)
loss = fluid.layers.reduce_mean(cost)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
print("Start run on {}".format(place))
for epoch in range(100):
pred_res, loss_res = exe.run(
main_prog,
feed={"a": a_np,
"b": b_np,
"label": label_np},
fetch_list=[prediction, loss])
if epoch % 10 == 0:
print("Epoch {} | Prediction[0]: {}, Loss: {}".format(
epoch, pred_res[0], loss_res))
return pred_res, loss_res
def test_npu(self):
cpu_pred, cpu_loss = self._test(False)
npu_pred, npu_loss = self._test(True)
self.assertTrue(np.allclose(npu_pred, cpu_pred))
self.assertTrue(np.allclose(npu_loss, cpu_loss))
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
from paddle.fluid import core
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestTopk(OpTest):
def setUp(self):
self.set_npu()
self.place = paddle.NPUPlace(0)
self.op_type = "top_k"
self.init_dtype()
x = np.array([[0.78104149, 0.88745828, 0.32362268],
[0.82196718, 0.48763277, 0.42826136],
[0.96527182, 0.34851612, 0.12959783]]).astype(self.dtype)
self.inputs = {'X': x}
np_out = np.array(
[[0.88745828], [0.82196718], [0.96527182]]).astype(self.dtype)
np_indices = np.array([[1], [0], [0]])
self.attrs = {'k': 1, "axis": -1}
self.outputs = {'Out': np_out, 'Indices': np_indices}
def set_npu(self):
self.__class__.use_npu = True
self.__class__.no_need_check_grad = True
def init_dtype(self):
self.dtype = np.float16
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestTopkV2(OpTest):
def setUp(self):
self.set_npu()
self.place = paddle.NPUPlace(0)
self.op_type = "top_k"
self.init_dtype()
x = np.array([[0.78104149, 0.88745828, 0.32362268],
[0.82196718, 0.48763277, 0.42826136],
[0.96527182, 0.34851612, 0.12959783]]).astype(self.dtype)
self.inputs = {'X': x}
np_out = np.array([[0.88745828, 0.78104149], [0.82196718, 0.48763277],
[0.96527182, 0.34851612]]).astype(self.dtype)
np_indices = np.array([[1, 0], [0, 1], [0, 1]])
self.attrs = {'k': 2, "axis": -1}
self.outputs = {'Out': np_out, 'Indices': np_indices}
def set_npu(self):
self.__class__.use_npu = True
self.__class__.no_need_check_grad = True
def init_dtype(self):
self.dtype = np.float16
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
if __name__ == '__main__':
unittest.main()
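The hard-coded `np_out`/`np_indices` expectations in the two top_k fixtures above can be re-derived with plain NumPy. This is a reference sketch for auditing the fixtures, not part of the tests:

```python
import numpy as np

x = np.array([[0.78104149, 0.88745828, 0.32362268],
              [0.82196718, 0.48763277, 0.42826136],
              [0.96527182, 0.34851612, 0.12959783]])

def topk_last_axis(x, k):
    # Indices of the k largest entries along the last axis, largest first.
    indices = np.argsort(-x, axis=-1)[..., :k]
    values = np.take_along_axis(x, indices, axis=-1)
    return values, indices

values, indices = topk_last_axis(x, k=1)
# Matches TestTopk: values [[0.88745828], [0.82196718], [0.96527182]],
# indices [[1], [0], [0]].
```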
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest, _set_use_system_allocator
import paddle
import paddle.fluid as fluid
paddle.enable_static()
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestTransposeOp(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "transpose2"
self.place = paddle.NPUPlace(0)
self.init_dtype()
self.init_input_output()
self.init_kernel_type()
self.init_axis()
self.inputs = {'X': OpTest.np_dtype_to_fluid_dtype(self.x)}
self.attrs = {'axis': [0, 2, 1, 3], 'data_format': 'AnyLayout'}
self.outputs = {'Out': self.out}
def set_npu(self):
self.__class__.use_npu = True
def init_kernel_type(self):
self.use_mkldnn = False
def init_input_output(self):
self.x = np.random.uniform(0.1, 1, [8, 512, 12, 64]).astype(self.dtype)
self.out = np.transpose(self.x, [0, 2, 1, 3])
def init_dtype(self):
self.dtype = np.float32
def init_axis(self):
self.axis = -1
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestTransposeOpFP16(TestTransposeOp):
no_need_check_grad = True
def init_dtype(self):
self.dtype = np.float16
if __name__ == '__main__':
unittest.main()
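The transpose2 fixture checks itself against `np.transpose` with the perm `[0, 2, 1, 3]` from its attrs. A smaller sketch of what that permutation does (illustrative shapes, not the test's `[8, 512, 12, 64]`):

```python
import numpy as np

x = np.random.uniform(0.1, 1, [2, 4, 3, 5]).astype(np.float32)
out = np.transpose(x, [0, 2, 1, 3])  # the perm used by the test's attrs

# The permutation swaps the two middle axes...
assert out.shape == (2, 3, 4, 5)
# ...so out[a, c, b, d] == x[a, b, c, d] for every index.
assert out[1, 2, 0, 4] == x[1, 0, 2, 4]
```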
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
import paddle.fluid.core as core
from paddle.fluid.op import Operator
from paddle.fluid.executor import Executor
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestTruncatedNormal(unittest.TestCase):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
scope = paddle.fluid.core.Scope()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
paddle.seed(SEED)
with fluid.scope_guard(scope):
with paddle.static.program_guard(main_prog, startup_prog):
weight_attr = paddle.framework.ParamAttr(
name="linear_weight",
initializer=paddle.nn.initializer.TruncatedNormal(
mean=0.0, std=2.0))
linear = paddle.nn.Linear(
2, 2, weight_attr=weight_attr, bias_attr=False)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
w = exe.run(startup_prog, fetch_list=['linear_weight'])
return w
def test_npu(self):
cpu_w = self._test(False)
npu_w = self._test(True)
self.assertTrue(np.allclose(npu_w, cpu_w))
if __name__ == '__main__':
unittest.main()
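The test above only checks that the NPU and CPU initializers agree; it does not spell out what `TruncatedNormal` samples. A NumPy sketch under the common convention of resampling values beyond two standard deviations (the exact kernel algorithm may differ; `truncated_normal` below is a hypothetical helper, not a Paddle API):

```python
import numpy as np

def truncated_normal(shape, mean=0.0, std=2.0, rng=None):
    # Rejection sampling: redraw every value falling outside
    # [mean - 2*std, mean + 2*std] until none remain.
    rng = np.random.default_rng() if rng is None else rng
    samples = rng.normal(mean, std, size=shape)
    while True:
        mask = np.abs(samples - mean) > 2 * std
        if not mask.any():
            return samples
        samples[mask] = rng.normal(mean, std, size=int(mask.sum()))

w = truncated_normal((2, 2), mean=0.0, std=2.0)
assert np.all(np.abs(w) <= 4.0)  # every sample lies within two std of the mean
```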
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import unittest
import numpy as np
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
import paddle.fluid.contrib.mixed_precision.amp_nn as amp_nn
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestUpdateLossScalingOp(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "update_loss_scaling"
self.place = paddle.NPUPlace(0)
self.init()
        found_inf = np.array([False], dtype=np.bool_)
x = np.random.random((1024, 1024)).astype(self.dtype)
self.inputs = {
'X': [('x0', x)],
'FoundInfinite': found_inf,
'PrevLossScaling': self.prev_loss_scaling,
'InGoodSteps': self.num_good_steps,
'InBadSteps': self.num_bad_steps
}
self.outputs = {
'Out': [('out0', x)],
'LossScaling': self.prev_loss_scaling * self.incr_ratio,
'OutGoodSteps': self.zero_steps,
'OutBadSteps': self.zero_steps
}
def set_npu(self):
self.__class__.use_npu = True
def init(self):
self.incr_ratio = 2.0
self.decr_ratio = 0.8
self.dtype = np.float32
self.prev_loss_scaling = np.array([2048]).astype(self.dtype)
self.num_good_steps = np.array([999], dtype=np.int32)
self.num_bad_steps = np.array([1], dtype=np.int32)
self.zero_steps = np.array([0], dtype=np.int32)
self.attrs = {
'incr_every_n_steps': 1000,
'decr_every_n_nan_or_inf': 2,
'incr_ratio': self.incr_ratio,
'decr_ratio': self.decr_ratio,
}
def test_check_output(self):
self.check_output_with_place(
self.place, check_dygraph=False, no_check_set=['Out'])
class TestUpdateLossScalingOpBad(TestUpdateLossScalingOp):
def setUp(self):
self.set_npu()
self.op_type = "update_loss_scaling"
self.place = paddle.NPUPlace(0)
self.init()
        found_inf = np.array([True], dtype=np.bool_)
x = np.random.random((1024, 1024)).astype(self.dtype)
i = np.random.randint(0, 1024, 1)
j = np.random.randint(0, 1024, 1)
x[i[0]][j[0]] = np.inf
self.inputs = {
'X': [('x0', x)],
'FoundInfinite': found_inf,
'PrevLossScaling': self.prev_loss_scaling,
'InGoodSteps': self.num_good_steps,
'InBadSteps': self.num_bad_steps
}
self.outputs = {
'Out': [('out0', np.zeros_like(x))],
'LossScaling': self.prev_loss_scaling * self.decr_ratio,
'OutGoodSteps': self.zero_steps,
'OutBadSteps': self.zero_steps
}
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestUpdateLossScalingLayer(unittest.TestCase):
def loss_scaling_check(self, use_npu=True, scope=fluid.Scope()):
a = fluid.data(name="a", shape=[1024, 1024], dtype='float32')
b = fluid.data(name="b", shape=[512, 128], dtype='float32')
x = [a, b]
found_inf = fluid.data(name="found_inf", shape=[1], dtype='bool')
prev_loss_scaling = fluid.data(
name="prev_loss_scaling", shape=[1], dtype='float32')
num_good_steps = fluid.data(
name="num_good_steps", shape=[1], dtype='int32')
num_bad_steps = fluid.data(
name="num_bad_steps", shape=[1], dtype='int32')
a_v = np.random.random([1024, 1024]).astype('float32')
b_v = np.random.random([512, 128]).astype('float32')
found_inf_v = np.array([False]).astype('bool')
prev_loss_scaling_v = np.array([2048]).astype('float32')
num_good_steps_v = np.array([999], dtype=np.int32)
num_bad_steps_v = np.array([1], dtype=np.int32)
incr_every_n_steps = 1000
decr_every_n_nan_or_inf = 2
incr_ratio = 2
decr_ratio = 0.8
result = amp_nn.update_loss_scaling(
x,
found_inf,
prev_loss_scaling,
num_good_steps,
num_bad_steps,
incr_every_n_steps,
decr_every_n_nan_or_inf,
incr_ratio,
decr_ratio,
name="update_loss_scaling")
place = paddle.NPUPlace(0) if use_npu else fluid.CPUPlace()
exe = fluid.Executor(place)
with fluid.scope_guard(scope):
exe.run(fluid.default_startup_program())
result_v = exe.run(feed={
'a': a_v,
'b': b_v,
'found_inf': found_inf_v,
'prev_loss_scaling': prev_loss_scaling_v,
'num_good_steps': num_good_steps_v,
'num_bad_steps': num_bad_steps_v
},
fetch_list=[
result, x, found_inf, prev_loss_scaling,
num_good_steps, num_bad_steps
])
assert np.array_equal(result_v[0], a_v)
assert np.array_equal(result_v[1], b_v)
assert np.array_equal(result_v[0], result_v[2])
assert np.array_equal(result_v[1], result_v[3])
assert np.array_equal(result_v[4], found_inf_v)
assert np.array_equal(result_v[5], prev_loss_scaling_v * incr_ratio)
assert np.array_equal(result_v[6], np.zeros_like(num_good_steps_v))
assert np.array_equal(result_v[7], np.zeros_like(num_bad_steps_v))
def loss_scaling_check_inf(self, use_npu=True, scope=fluid.Scope()):
a = fluid.data(name="a", shape=[1024, 1024], dtype='float32')
b = fluid.data(name="b", shape=[512, 128], dtype='float32')
x = [a, b]
found_inf = fluid.data(name="found_inf", shape=[1], dtype='bool')
prev_loss_scaling = fluid.data(
name="prev_loss_scaling", shape=[1], dtype='float32')
num_good_steps = fluid.data(
name="num_good_steps", shape=[1], dtype='int32')
num_bad_steps = fluid.data(
name="num_bad_steps", shape=[1], dtype='int32')
a_v = np.random.random([1024, 1024]).astype('float32')
b_v = np.random.random([512, 128]).astype('float32')
i = np.random.randint(0, 1024, 1)
j = np.random.randint(0, 1024, 1)
a_v[i[0]][j[0]] = np.inf
found_inf_v = np.array([True]).astype('bool')
prev_loss_scaling_v = np.array([2048]).astype('float32')
num_good_steps_v = np.array([999], dtype=np.int32)
num_bad_steps_v = np.array([1], dtype=np.int32)
incr_every_n_steps = 1000
decr_every_n_nan_or_inf = 2
incr_ratio = 2
decr_ratio = 0.8
result = amp_nn.update_loss_scaling(
x,
found_inf,
prev_loss_scaling,
num_good_steps,
num_bad_steps,
incr_every_n_steps,
decr_every_n_nan_or_inf,
incr_ratio,
decr_ratio,
name="update_loss_scaling")
place = paddle.NPUPlace(0) if use_npu else fluid.CPUPlace()
exe = fluid.Executor(place)
with fluid.scope_guard(scope):
exe.run(fluid.default_startup_program())
result_v = exe.run(feed={
'a': a_v,
'b': b_v,
'found_inf': found_inf_v,
'prev_loss_scaling': prev_loss_scaling_v,
'num_good_steps': num_good_steps_v,
'num_bad_steps': num_bad_steps_v
},
fetch_list=[
result, x, found_inf, prev_loss_scaling,
num_good_steps, num_bad_steps
])
assert np.array_equal(result_v[0], np.zeros_like(a_v))
assert np.array_equal(result_v[1], np.zeros_like(b_v))
assert np.array_equal(result_v[2], np.zeros_like(a_v))
assert np.array_equal(result_v[3], np.zeros_like(b_v))
assert np.array_equal(result_v[4], found_inf_v)
assert np.array_equal(result_v[5], prev_loss_scaling_v * decr_ratio)
assert np.array_equal(result_v[6], np.zeros_like(num_good_steps_v))
assert np.array_equal(result_v[7], np.zeros_like(num_bad_steps_v))
def test_loss_scaling_cpu(self):
main = fluid.Program()
startup = fluid.Program()
with fluid.unique_name.guard():
with fluid.program_guard(main, startup):
self.loss_scaling_check(use_npu=False)
def test_loss_scaling_cpu_inf(self):
main = fluid.Program()
startup = fluid.Program()
with fluid.unique_name.guard():
with fluid.program_guard(main, startup):
self.loss_scaling_check_inf(use_npu=False)
def test_loss_scaling_npu(self):
main = fluid.Program()
startup = fluid.Program()
with fluid.unique_name.guard():
with fluid.program_guard(main, startup):
self.loss_scaling_check(use_npu=True)
def test_loss_scaling_npu_inf(self):
main = fluid.Program()
startup = fluid.Program()
with fluid.unique_name.guard():
with fluid.program_guard(main, startup):
self.loss_scaling_check_inf(use_npu=True)
if __name__ == '__main__':
unittest.main()
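The two op fixtures above (good step 1000 of 1000 doubles the scale; a second consecutive inf shrinks it) encode the usual dynamic loss-scaling rule. A scalar sketch of that bookkeeping, not the kernel itself:

```python
def update_loss_scaling(found_inf, loss_scaling, good_steps, bad_steps,
                        incr_every_n_steps, decr_every_n_nan_or_inf,
                        incr_ratio, decr_ratio):
    """Scalar sketch of the dynamic loss-scaling update the tests exercise."""
    if found_inf:
        # An inf/nan gradient resets the good-step counter; after
        # decr_every_n_nan_or_inf consecutive bad steps, shrink the scale.
        good_steps = 0
        bad_steps += 1
        if bad_steps == decr_every_n_nan_or_inf:
            loss_scaling *= decr_ratio
            bad_steps = 0
    else:
        # A clean step resets the bad-step counter; after
        # incr_every_n_steps consecutive good steps, grow the scale.
        bad_steps = 0
        good_steps += 1
        if good_steps == incr_every_n_steps:
            loss_scaling *= incr_ratio
            good_steps = 0
    return loss_scaling, good_steps, bad_steps

# Good fixture: good_steps hits incr_every_n_steps, scale doubles, counters reset.
scale, good, bad = update_loss_scaling(False, 2048.0, 999, 1, 1000, 2, 2.0, 0.8)
# Bad fixture: second consecutive inf, scale shrinks by decr_ratio, counters reset.
scale_bad, good_bad, bad_bad = update_loss_scaling(True, 2048.0, 999, 1, 1000, 2, 2.0, 0.8)
```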
@@ -1449,9 +1449,18 @@ class OpTest(unittest.TestCase):
         if not type(output_names) is list:
             output_names = [output_names]
 
+        # FIXME: Replace numeric_place with place to calculate numeric_grads.
+        # NOTE(liym27): There is an unknown error when call op.run() on NPUPlace, which
+        # needs to be fixed.
+        if hasattr(self.__class__,
+                   "use_npu") and self.__class__.use_npu == True:
+            numeric_place = paddle.CPUPlace()
+        else:
+            numeric_place = place
+
         numeric_grads = user_defined_grads or [
             get_numeric_gradient(
-                place,
+                numeric_place,
                 self.scope,
                 self.op,
                 self.inputs,
...
@@ -16,15 +16,14 @@
 set -e
 
-cluster_node_ips="127.0.0.1"
-export PADDLE_TRAINERS_NUM=4
-export POD_IP=127.0.0.1
-export PADDLE_TRAINERS=127.0.0.1
-export PADDLE_TRAINER_ID=0
-export PADDLE_PORT=35789
-export TRAINER_PORTS_NUM=4
+curr_host_ip=`hostname -i`
+python hccl_tools.py --device_num "[0,4)" --server_ip ${curr_host_ip}
+export RANK_TABLE_FILE="${PWD}/hccl_4p_0123_${curr_host_ip}.json"
 
-distributed_args="--ips=${cluster_node_ips} --ascend_npus=0,1,2,3 --log_dir=testlog"
+# use ascend
+echo "begin test use ascend npu"
+distributed_args="--run_mode=collective --log_dir=testlog"
 python -m paddle.distributed.fleet.launch ${distributed_args} \
     ascend_group.py fleetascendgroup
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
from paddle.fluid import core
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestAssign(OpTest):
def setUp(self):
self.set_npu()
self.place = paddle.NPUPlace(0)
self.op_type = "assign"
self.init_dtype()
        x = np.random.random([3, 3]).astype(self.dtype)
self.inputs = {'X': x}
self.attrs = {}
self.outputs = {'Out': x}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.int64
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
if __name__ == '__main__':
unittest.main()
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import unittest
import sys
sys.path.append("..")
from op_test import OpTest
import paddle
import paddle.fluid as fluid
paddle.enable_static()
SEED = 2021
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestElementwiseMax(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "elementwise_max"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [11, 17]).astype(self.dtype)
y = np.random.uniform(1, 2, [11, 17]).astype(self.dtype)
out = np.maximum(x, y)
self.inputs = {
'X': OpTest.np_dtype_to_fluid_dtype(x),
'Y': OpTest.np_dtype_to_fluid_dtype(y)
}
self.attrs = {}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
def init_dtype(self):
self.dtype = np.float32
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False)
# TODO(ascendrc): Max grad test
# def test_check_grad(self):
# if self.dtype == np.float16:
# return
# self.check_grad(['X'], 'Out')
#
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestElementwiseMaxFp16(OpTest):
def setUp(self):
self.set_npu()
self.op_type = "elementwise_max"
self.place = paddle.NPUPlace(0)
self.init_dtype()
np.random.seed(SEED)
x = np.random.uniform(1, 2, [3, 4]).astype(self.dtype)
y = np.random.uniform(1, 2, [3, 4]).astype(self.dtype)
out = np.maximum(x, y)
self.inputs = {
'X': OpTest.np_dtype_to_fluid_dtype(x),
'Y': OpTest.np_dtype_to_fluid_dtype(y)
}
self.attrs = {}
self.outputs = {'Out': out}
def set_npu(self):
self.__class__.use_npu = True
self.__class__.no_need_check_grad = True
def init_dtype(self):
self.dtype = np.float16
def test_check_output(self):
self.check_output_with_place(self.place, check_dygraph=False, atol=1e-5)
@unittest.skipIf(not paddle.is_compiled_with_npu(),
"core is not compiled with NPU")
class TestElementwiseMaxNet(unittest.TestCase):
def _test(self, run_npu=True):
main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
main_prog.random_seed = SEED
startup_prog.random_seed = SEED
np.random.seed(SEED)
a_np = np.random.random(size=(32, 32)).astype('float32')
b_np = np.random.random(size=(32, 32)).astype('float32')
label_np = np.random.randint(2, size=(32, 1)).astype('int64')
with paddle.static.program_guard(main_prog, startup_prog):
a = paddle.static.data(name="a", shape=[32, 32], dtype='float32')
b = paddle.static.data(name="b", shape=[32, 32], dtype='float32')
label = paddle.static.data(
name="label", shape=[32, 1], dtype='int64')
c = paddle.maximum(a, b)
fc_1 = fluid.layers.fc(input=c, size=128)
prediction = fluid.layers.fc(input=fc_1, size=2, act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=label)
loss = fluid.layers.reduce_mean(cost)
sgd = fluid.optimizer.SGD(learning_rate=0.01)
sgd.minimize(loss)
if run_npu:
place = paddle.NPUPlace(0)
else:
place = paddle.CPUPlace()
exe = paddle.static.Executor(place)
exe.run(startup_prog)
print("Start run on {}".format(place))
for epoch in range(100):
pred_res, loss_res = exe.run(
main_prog,
feed={"a": a_np,
"b": b_np,
"label": label_np},
fetch_list=[prediction, loss])
if epoch % 10 == 0:
print("Epoch {} | Prediction[0]: {}, Loss: {}".format(
epoch, pred_res[0], loss_res))
return pred_res, loss_res
def test_npu(self):
cpu_pred, cpu_loss = self._test(False)
npu_pred, npu_loss = self._test(True)
self.assertTrue(np.allclose(npu_pred, cpu_pred))
self.assertTrue(np.allclose(npu_loss, cpu_loss))
if __name__ == '__main__':
unittest.main()
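`np.maximum` is the reference the elementwise_max fixtures compare against. It also broadcasts, which the equal-shape fixtures above do not exercise; a quick illustration of both behaviors:

```python
import numpy as np

x = np.array([[1.0, 5.0], [3.0, 2.0]], dtype=np.float32)
y = np.array([[4.0, 0.5]], dtype=np.float32)  # shape (1, 2), broadcast over rows

out = np.maximum(x, y)  # elementwise max after broadcasting y to (2, 2)
assert out.tolist() == [[4.0, 5.0], [4.0, 2.0]]
```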
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import sys
import os
import time
import six
import copy
import json
import unittest
import paddle.fluid as fluid
import paddle.distributed.fleet.ascend_utils as ascend_utils
RANK_TABLE_JSON = {
"status": "completed",
"version": "1.0",
"server_count": "1",
"server_list": [{
"server_id": "127.0.0.1",
"device": [{
"device_id": "0",
"device_ip": "192.1.184.23",
"rank_id": "0"
}, {
"device_id": "1",
"device_ip": "192.2.21.93",
"rank_id": "1"
}]
}]
}
class TestAscendUtil(unittest.TestCase):
def test_get_cloud_cluster(self):
cluster, pod = ascend_utils.get_cloud_cluster()
self.assertTrue(cluster)
self.assertTrue(pod)
with open('rank_table_file.json', 'w') as f:
json.dump(RANK_TABLE_JSON, f)
rank_table_file = "./rank_table_file.json"
cluster, pod = ascend_utils.get_cloud_cluster(
rank_table_file=rank_table_file)
self.assertTrue(cluster)
self.assertTrue(pod)
if __name__ == '__main__':
unittest.main()
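The `RANK_TABLE_JSON` fixture above groups devices per server, with string-valued fields throughout. A sketch of how a launcher might flatten such a table to recover the global rank ordering (illustrative only; the actual `ascend_utils` parsing may differ):

```python
# Same layout as the RANK_TABLE_JSON fixture: one server, two devices.
rank_table = {
    "status": "completed",
    "version": "1.0",
    "server_count": "1",
    "server_list": [{
        "server_id": "127.0.0.1",
        "device": [
            {"device_id": "0", "device_ip": "192.1.184.23", "rank_id": "0"},
            {"device_id": "1", "device_ip": "192.2.21.93", "rank_id": "1"},
        ],
    }],
}

# Flatten the per-server device lists; rank_id fields are strings in the table.
devices = [d for server in rank_table["server_list"] for d in server["device"]]
ranks = [int(d["rank_id"]) for d in devices]
assert ranks == [0, 1]
```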
@@ -16,22 +16,43 @@
 set -e
 
-# use paddlecloud
-echo "begin test use paddlecloud"
-cluster_node_ips="127.0.0.1,127.0.0.2"
-export PADDLE_TRAINERS_NUM=2
-export POD_IP=127.0.0.1
-export PADDLE_TRAINERS=127.0.0.1,127.0.0.2
-export PADDLE_TRAINER_ID=0
-export PADDLE_PORT=35789
-export TRAINER_PORTS_NUM=2
+RANK_TABLE_FILE_NAME="rank_table_file.json"
+cat > ${RANK_TABLE_FILE_NAME} <<EOF
+{
+    "status": "completed",
+    "version": "1.0",
+    "server_count": "1",
+    "server_list": [
+        {
+            "server_id": "127.0.0.1",
+            "device": [
+                {
+                    "device_id": "0",
+                    "device_ip": "192.1.184.23",
+                    "rank_id": "0"
+                },
+                {
+                    "device_id": "1",
+                    "device_ip": "192.2.21.93",
+                    "rank_id": "1"
+                }
+            ]
+        }
+    ]
+}
+EOF
 
-distributed_args="--ips=${cluster_node_ips} --ascend_npus=0,1 --log_dir=testlog"
+# set ascend rank table file env
+export RANK_TABLE_FILE="${PWD}/${RANK_TABLE_FILE_NAME}"
+
+# use ascend
+echo "begin test use ascend npu"
+distributed_args="--run_mode=collective --log_dir=testlog"
 python -m paddle.distributed.fleet.launch ${distributed_args} ascend_multi_process_collective.py fleetlaunchascend
 
-str1="selected_accelerators:0 worker_endpoints:127.0.0.1:35789,127.0.0.1:35790,127.0.0.2:35789,127.0.0.2:35790 trainers_num:4 current_endpoint:127.0.0.1:35789 trainer_id:0 device_ids:0,1,0,1 device_id:0"
-str2="selected_accelerators:1 worker_endpoints:127.0.0.1:35789,127.0.0.1:35790,127.0.0.2:35789,127.0.0.2:35790 trainers_num:4 current_endpoint:127.0.0.1:35790 trainer_id:1 device_ids:0,1,0,1 device_id:1"
+str1="selected_accelerators:0 selected_npus:0 worker_endpoints:127.0.0.1:6170,127.0.0.1:6171 trainers_num:2 current_endpoint:127.0.0.1:6170 trainer_id:0 device_ids:0,1 device_id:0"
+str2="selected_accelerators:1 selected_npus:1 worker_endpoints:127.0.0.1:6170,127.0.0.1:6171 trainers_num:2 current_endpoint:127.0.0.1:6171 trainer_id:1 device_ids:0,1 device_id:1"
 file_0="multi_process_fleetlaunchascend.check_0.log"
 file_1="multi_process_fleetlaunchascend.check_1.log"
...
#!/bin/bash
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
set -e
RANK_TABLE_FILE_NAME="rank_table_file.json"
cat > ${RANK_TABLE_FILE_NAME} <<EOF
{
"status": "completed",
"version": "1.0",
"server_count": "2",
"server_list": [
{
"server_id": "127.0.0.1",
"device": [
{
"device_id": "0",
"device_ip": "192.1.184.23",
"rank_id": "0"
},
{
"device_id": "1",
"device_ip": "192.2.21.93",
"rank_id": "1"
}
]
},
{
"server_id": "127.0.0.2",
"device": [
{
"device_id": "0",
"device_ip": "192.1.94.132",
"rank_id": "2"
},
{
"device_id": "1",
"device_ip": "192.2.94.30",
"rank_id": "3"
}
]
}
]
}
EOF
# set ascend rank table file env
export RANK_TABLE_FILE="${PWD}/${RANK_TABLE_FILE_NAME}"
# use paddlecloud
echo "begin test use paddlecloud"
cluster_node_ips="127.0.0.1,127.0.0.2"
export PADDLE_TRAINERS_NUM=2
export POD_IP=127.0.0.1
export PADDLE_TRAINERS=127.0.0.1,127.0.0.2
export PADDLE_TRAINER_ID=0
export PADDLE_PORT=35789
export TRAINER_PORTS_NUM=2
distributed_args="--run_mode=collective --log_dir=testlog"
python -m paddle.distributed.fleet.launch ${distributed_args} ascend_multi_process_collective.py fleetlaunchascend
str1="selected_accelerators:0 worker_endpoints:127.0.0.1:35789,127.0.0.1:35790,127.0.0.2:35789,127.0.0.2:35790 trainers_num:4 current_endpoint:127.0.0.1:35789 trainer_id:0 device_ids:0,1,0,1 device_id:0"
str2="selected_accelerators:1 worker_endpoints:127.0.0.1:35789,127.0.0.1:35790,127.0.0.2:35789,127.0.0.2:35790 trainers_num:4 current_endpoint:127.0.0.1:35790 trainer_id:1 device_ids:0,1,0,1 device_id:1"
file_0="multi_process_fleetlaunchascend.check_0.log"
file_1="multi_process_fleetlaunchascend.check_1.log"
echo "paddlecloud params test"
if grep -q "$str1" "$file_0"; then
echo "find trainer 0"
else
echo "not find trainer 0"
exit -1
fi
if grep -q "$str2" "$file_1"; then
echo "find trainer 1"
else
echo "not find trainer 1"
exit -1
fi
# test async poll process
if [ -f $file_0 ]; then
rm $file_0
fi
if [ -f $file_1 ]; then
rm $file_1
fi