Update Memory ReadMe Doc

340b8bad · liaogang · 03b3d0d8 · 340b8bad

Showing with 3 additions and 139 deletions

paddle/memory/README.md +3 -139

--- a/paddle/memory/README.md
+++ b/paddle/memory/README.md
-## Design
+# Region-based Heterogeneous Memory Management
-### Usage
+Please check out the [design documentation](http://gangliao.me) for more details about
+the buddy memory allocator for both CPU and GPU.
-To allocate 4KB CPU memory:
-```cpp
-p = memory::Alloc(platform::CPUPlace(), 4*1024);
-```
-To allocate 4KB memory on the 3rd GPU:
-```cpp
-p = memory::Alloc(platform::GPUPlace(2), 4*1024);
-```
-To free memory and check the so-far used amount of memory on a place:
-```cpp
-auto pl = platform::GPUPlace(0);
-p = memory::Alloc(pl, 4*1024);
-cout << memory::Used(pl);
-memory::Free(pl, p);
-```
-### API
-In `paddle/memory/memory.h` we have:
-```cpp
-namespace memory {
-template <typename Place> void* Alloc(Place, size_t);
-template <typename Place> void Free(Place, void*);
-template <typename Place> size_t Used(Place);
-}  // namespace memory
-```
-These function templates have specializations on either `platform::CPUPlace` or `platform::GPUPlace`:
-```cpp
-template<>
-void* Alloc<CPUPlace>(CPUPlace p, size_t size) {
-  return GetCPUBuddyAllocator()->Alloc(size);
-}
-```
-and 
-```cpp
-template<>
-void* Alloc<GPUPlace>(GPUPlace p, size_t size) {
-  return GetGPUBuddyAllocator(p.id)->Alloc(size);
-}
-```
-Similar specializations exist for `Free` and `Used`.
-### Implementation
-`GetCPUBuddyAllocator` and `GetGPUBuddyAllocator` are singletons.
-```cpp
-BuddyAllocator* GetCPUBuddyAllocator() {
-  static BuddyAllocator* a = NULL;
-  if (a == NULL) {
-    a = new BuddyAllocator(new CPUAllocator /*backup allocator*/, ...);
-  }
-  return a;
-}
-BuddyAllocator* GetGPUBuddyAllocator(int gpu_id) {
-  static BuddyAllocator** as = NULL;  // one allocator pointer per GPU
-  if (as == NULL) {
-    as = new BuddyAllocator*[platform::NumGPUs()];
-    for (int gpu = 0; gpu < platform::NumGPUs(); gpu++) {
-      as[gpu] = new BuddyAllocator(new GPUAllocator(gpu) /* backup allocator */, ...);
-    }
-  }
-  return as[gpu_id];
-}
-```
-#### `BuddyAllocator`
-`BuddyAllocator` implements the buddy allocation algorithm.  Its constructor takes only parameters related to the algorithm:
-```cpp
-BuddyAllocator::BuddyAllocator(initial_pool_size, max_pool_size) {
-  ...
-}
-```
-Please be aware that **`BuddyAllocator` always allocates aligned memory**, aligned to 32 bytes, which can hold a `BuddyAllocator::Block` object:
-```cpp
-class BuddyAllocator {
- private:
-  struct Block {
-    size_t size;
-    Block *left, *right;
-    size_t index; // allocator id
-  };
-  ...
-};
-```
-Because `BuddyAllocator` keeps the metadata of each block, it can track the used memory: it records the amount handed out by `Alloc` and the amount released by `Free`.  In contrast, `CPUAllocator` and `GPUAllocator` don't know the size of a freed memory block and cannot do this tracking.
-#### System Allocators
-The `GPUAllocator` and `CPUAllocator` are called *system allocators*.  They work as the fallback allocators of `BuddyAllocator`.
-## Justification
-I got inspiration from Majel and Caffe2, though the above design looks different from both.
-### Caffe2
-In Caffe2, `Tensor<Context>::mutable_data()` allocates the memory.  In particular, [`Tensor<Context>::mutable_data`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L523) calls [`Tensor<Context>::raw_mutable_data`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L459), which in turn calls [`Context::New`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L479).
-There are two implementations of `Context`:
-1. [`CPUContext`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context.h#L105), whose [`New` method](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context.h#L131) calls [`g_cpu_allocator.get()->New(size_t)`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context.cc#L15) to allocate the memory.
-1. [`CUDAContext`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context_gpu.h#L99), which has a data member [`int gpu_id_`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context_gpu.h#L202).  This looks very similar to class `majel::GPUPlace`, which also has an `int id_` data member.  `CUDAContext::New(size_t)` calls [`g_cub_allocator->DeviceAllocate(&ptr, nbytes)`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context_gpu.cu#L355) to allocate the memory.
-### Majel
-In Majel, there are basically two allocator types:
-1. `cpu::SystemAllocator`, which has similar functionality to `caffe2::CPUContext::New/Delete`.
-1. `gpu::SystemAllocator`, which has similar functionality to `caffe2::CUDAContext::New/Delete`.
-However, programs do not allocate memory through these two allocators directly.  Instead, the two allocators are defined in hidden namespaces.
-In Majel there are hidden global variables like:
-1. `cpu::SystemAllocator g_cpu_allocator`, and
-1. `vector<gpu::SystemAllocator*> g_gpu_allocators(NUM_GPUS)`.
-Programs allocate memory via a BuddyAllocator, which can take the `g_cpu_allocator` or a `g_gpu_allocators[gpu_id]` as its *fallback allocator*, so that if BuddyAllocator cannot find a block in its memory pool, it extends its memory pool by calling the fallback allocator's `New(size_t)`.
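
The fallback relationship described in the removed README text can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Paddle or Majel implementation: `SystemAllocator` and `PoolAllocator` are hypothetical names, `std::malloc` stands in for the CPU/GPU system allocators, and a simple per-size free list stands in for the real buddy algorithm. It shows two points from the text: the pool asks the fallback allocator's `New(size_t)` only when it has no cached block, and the pool (unlike the system allocator) knows each block's size and can therefore report `Used()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>
#include <vector>

// Hypothetical system allocator: the only layer that requests raw memory.
struct SystemAllocator {
  size_t calls = 0;  // number of times the pool had to be extended
  void* New(size_t size) {
    ++calls;
    return std::malloc(size);
  }
  void Delete(void* p) { std::free(p); }
};

// Simplified pooling allocator (NOT a real buddy allocator): freed blocks
// are cached per size and reused before the fallback is asked again.
class PoolAllocator {
 public:
  explicit PoolAllocator(SystemAllocator* fallback) : fallback_(fallback) {}

  // Return cached blocks to the fallback; blocks still in use are the
  // caller's responsibility to Free() before destruction.
  ~PoolAllocator() {
    for (auto& bucket : free_) {
      for (void* p : bucket.second) fallback_->Delete(p);
    }
  }

  void* Alloc(size_t size) {
    void* p = nullptr;
    auto it = free_.find(size);
    if (it != free_.end() && !it->second.empty()) {
      p = it->second.back();  // reuse a cached block of the same size
      it->second.pop_back();
    } else {
      p = fallback_->New(size);  // pool is empty: extend via the fallback
      if (p == nullptr) return nullptr;
    }
    live_[p] = size;  // per-block metadata makes usage tracking possible
    used_ += size;
    return p;
  }

  void Free(void* p) {
    auto it = live_.find(p);
    if (it == live_.end()) return;  // unknown pointer: ignore
    used_ -= it->second;
    free_[it->second].push_back(p);  // keep the block in the pool
    live_.erase(it);
  }

  size_t Used() const { return used_; }

 private:
  SystemAllocator* fallback_;
  std::unordered_map<size_t, std::vector<void*>> free_;  // size -> cached blocks
  std::unordered_map<void*, size_t> live_;               // block -> size
  size_t used_ = 0;
};
```

With this sketch, allocating 4KB, freeing it, and allocating 4KB again reaches the system allocator only once; the second request is served from the pool, and `Used()` returns to zero after each `Free`.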