Created by: DannyIsFunny
【Issue】The current Tensor struct consumes almost 120% of the physical memory needed for the data it holds.
// Experiment: create 100 Tensors of 8 KB each, which should use 800 KB of memory in theory.
// In practice, the corresponding malloc calls consume 1000 KB.
std::vector<Tensor*> test_tensors;
test_tensors.resize(100);
for (int i = 0; i < 100; i++) {
  test_tensors[i] = new Tensor();
  test_tensors[i]->Resize({2048});
  auto tensor_data = test_tensors[i]->mutable_data<float>();
  for (int j = 0; j < 2048; j++) {
    tensor_data[j] = 1;
  }
}
【Reason】
The host Malloc implementation in Paddle-Lite, TargetWrapper<TARGET(kHost)>::Malloc(size_t size), sets MALLOC_ALIGN to 64. Anakin sets MALLOC_ALIGN to 16, which results in lower Tensor memory usage.
【Effect of Current PR】
Change MALLOC_ALIGN in TargetWrapper<TARGET(kHost)>::Malloc(size_t size) from 64 to 16.
Experiment result: the memory consumed by 98 tensors was reduced from 1060KB to 800KB.