Committed by Yi Wang

## Design

### Usage

To allocate 4KB CPU memory:

```cpp
p = memory::Alloc(platform::CPUPlace(), 4*1024);
```

To allocate 4KB memory on the 3rd GPU (GPU IDs start from 0):

```cpp
p = memory::Alloc(platform::GPUPlace(2), 4*1024);
```

To check the amount of memory used so far on a place, and then free the memory:

```cpp
auto pl = platform::GPUPlace(0);
p = memory::Alloc(pl, 4*1024);
cout << memory::Used(pl);
memory::Free(pl, p);
```

### API

In `paddle/memory/memory.h` we have:

```cpp
namespace memory {
template <typename Place> void* Alloc(Place, size_t);
template <typename Place> void Free(Place, void*);
template <typename Place> size_t Used(Place);  // returns the number of bytes in use
}  // namespace memory
```

These function templates are specialized for `platform::CPUPlace` and `platform::GPUPlace`:

```cpp
template<>
void* Alloc<CPUPlace>(CPUPlace p, size_t size) {
  return GetCPUBuddyAllocator()->Alloc(size);
}
```

and 

```cpp
template<>
void* Alloc<GPUPlace>(GPUPlace p, size_t size) {
  return GetGPUBuddyAllocator(p.id)->Alloc(size);
}
```

Similar specializations exist for `Free` and `Used`.
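
As a rough illustration of what the `Free` and `Used` specializations might look like, here is a minimal, self-contained sketch for the CPU case. `ToyAllocator` and `GetCPUToyAllocator` are hypothetical stand-ins for `BuddyAllocator` and `GetCPUBuddyAllocator`; the `Free(void*)` and `Used()` member functions are assumptions, since this document only shows `Alloc(size)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>

// Hypothetical stand-in for platform::CPUPlace.
struct CPUPlace {};

// Hypothetical stand-in for BuddyAllocator: forwards to malloc/free and
// remembers the size of each live allocation so Used() can be answered.
class ToyAllocator {
 public:
  void* Alloc(std::size_t size) {
    void* p = std::malloc(size);
    if (p != nullptr) {
      sizes_[p] = size;
      used_ += size;
    }
    return p;
  }
  void Free(void* p) {
    auto it = sizes_.find(p);
    if (it == sizes_.end()) return;  // not allocated here; ignore
    used_ -= it->second;
    sizes_.erase(it);
    std::free(p);
  }
  std::size_t Used() const { return used_; }

 private:
  std::unordered_map<void*, std::size_t> sizes_;
  std::size_t used_ = 0;
};

// Hypothetical counterpart of GetCPUBuddyAllocator().
ToyAllocator* GetCPUToyAllocator() {
  static ToyAllocator a;
  return &a;
}

namespace memory {
template <typename Place> void* Alloc(Place, std::size_t);
template <typename Place> void Free(Place, void*);
template <typename Place> std::size_t Used(Place);

template <>
void* Alloc<CPUPlace>(CPUPlace, std::size_t size) {
  return GetCPUToyAllocator()->Alloc(size);
}

template <>
void Free<CPUPlace>(CPUPlace, void* p) {
  GetCPUToyAllocator()->Free(p);
}

template <>
std::size_t Used<CPUPlace>(CPUPlace) {
  return GetCPUToyAllocator()->Used();
}
}  // namespace memory
```

Because the place type selects the specialization at compile time, callers never name an allocator directly; a GPU version would differ only in looking up the allocator by `p.id`.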

### Implementation

`GetCPUBuddyAllocator` and `GetGPUBuddyAllocator` return singletons; the latter keeps one allocator per GPU.

```cpp
BuddyAllocator* GetCPUBuddyAllocator() {
  static BuddyAllocator* a = NULL;
  if (a == NULL) {
    a = new BuddyAllocator(new CPUAllocator /*backup allocator*/, ...);
  }
  return a;
}

BuddyAllocator* GetGPUBuddyAllocator(int gpu_id) {
  // One allocator per GPU, all created lazily on the first call.
  static BuddyAllocator** as = NULL;
  if (as == NULL) {
    as = new BuddyAllocator*[platform::NumGPUs()];
    for (int gpu = 0; gpu < platform::NumGPUs(); gpu++) {
      as[gpu] = new BuddyAllocator(new GPUAllocator(gpu) /* backup allocator */, ...);
    }
  }
  return as[gpu_id];
}
```
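
The check-then-assign pattern above is not safe if two threads make the first call concurrently. One possible alternative, sketched here under assumptions, relies on function-local statics, whose initialization is thread-safe since C++11. The `BuddyAllocator` class below is a hypothetical placeholder that only records a device ID, and `NumGPUs()` stands in for `platform::NumGPUs()` with a made-up value.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical placeholder: the real BuddyAllocator takes a system
// allocator and pool parameters, which this sketch leaves out.
class BuddyAllocator {
 public:
  explicit BuddyAllocator(int device) : device_(device) {}
  int device() const { return device_; }

 private:
  int device_;
};

// Assumed stand-in for platform::NumGPUs(); the value is made up.
int NumGPUs() { return 2; }

BuddyAllocator* GetCPUBuddyAllocator() {
  // Initialized exactly once, even with concurrent first calls (C++11).
  static BuddyAllocator a(-1);  // -1 marks "CPU" in this sketch only
  return &a;
}

BuddyAllocator* GetGPUBuddyAllocator(int gpu_id) {
  // The lambda runs once and builds one allocator per GPU.
  static std::vector<std::unique_ptr<BuddyAllocator>> as = [] {
    std::vector<std::unique_ptr<BuddyAllocator>> v;
    for (int gpu = 0; gpu < NumGPUs(); ++gpu) {
      v.push_back(std::unique_ptr<BuddyAllocator>(new BuddyAllocator(gpu)));
    }
    return v;
  }();
  assert(gpu_id >= 0 && gpu_id < NumGPUs());
  return as[gpu_id].get();
}
```

This only makes the creation of the singletons safe; whether the allocators themselves tolerate concurrent `Alloc` calls is a separate question that this document does not address.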

#### `BuddyAllocator`

`BuddyAllocator` implements the buddy allocation algorithm.  Its constructor takes only parameters related to the algorithm:

```cpp
BuddyAllocator::BuddyAllocator(initial_pool_size, max_pool_size) {
  ...
}
```

Please be aware that **`BuddyAllocator` always allocates aligned memory**, aligned on 32 bytes, which is enough to hold a `BuddyAllocator::Block` object:

```cpp
class BuddyAllocator {
 private:
  struct Block {
    size_t size;
    Block* left;
    Block* right;
  };
  ...
};
```
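
To make the alignment claim concrete, the sketch below checks at compile time that a `Block` fits into one 32-byte unit and shows how a request size could be rounded up to that unit. `kAlignment`, `AlignUp`, `IsAligned`, and `MakeBlock` are hypothetical names, and storing the `Block` header at the start of the chunk is an assumption about the layout, not something this document specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

constexpr std::size_t kAlignment = 32;  // the 32-byte alignment stated above

// Same fields as BuddyAllocator::Block, declared at namespace scope here.
struct Block {
  std::size_t size;
  Block* left;
  Block* right;
};

// Three pointer-sized fields take 24 bytes on a typical 64-bit platform.
static_assert(sizeof(Block) <= kAlignment,
              "a Block must fit into one aligned unit");

// Rounds n up to the next multiple of kAlignment.
constexpr std::size_t AlignUp(std::size_t n) {
  return (n + kAlignment - 1) / kAlignment * kAlignment;
}

bool IsAligned(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % kAlignment == 0;
}

// Assumed layout: the header is constructed at the start of an aligned chunk.
Block* MakeBlock(void* chunk, std::size_t size) {
  assert(IsAligned(chunk));
  return new (chunk) Block{size, nullptr, nullptr};
}
```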

#### System Allocators

The `GPUAllocator` and `CPUAllocator` are called *system allocators*.  They work as the fallback allocators of `BuddyAllocator`.  A system allocator holds information about a device, including the amount of memory that has been allocated, so we can call

- `GPUAllocator::Used()` and
- `CPUAllocator::Used()`

to get the amount of memory that has been allocated so far.
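
A minimal sketch of how a system allocator could keep this count is shown below. The `Alloc` and `Free` signatures are assumptions (only `Used()` is named in this document); in particular, `Free` takes the size back from the caller because `malloc` does not expose it portably.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical CPU system allocator that tracks the bytes it has handed out.
class CPUAllocator {
 public:
  void* Alloc(std::size_t size) {
    void* p = std::malloc(size);
    if (p != nullptr) used_ += size;  // count only successful allocations
    return p;
  }

  // The caller passes the original size so the counter can be decreased.
  void Free(void* p, std::size_t size) {
    if (p == nullptr) return;
    assert(used_ >= size);
    std::free(p);
    used_ -= size;
  }

  // Number of bytes currently allocated through this allocator.
  std::size_t Used() const { return used_; }

 private:
  std::size_t used_ = 0;
};
```

A `GPUAllocator` would follow the same shape, with the device's allocation calls in place of `malloc` and `free`.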


## Justification

I drew inspiration from Majel and Caffe2, though the design above looks different from both.

### Caffe2

In Caffe2, `Tensor<Context>::mutable_data()` allocates the memory.  In particular, [`Tensor<Context>::mutable_data`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L523) calls [`Tensor<Context>::raw_mutable_data`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L459), which in turn calls [`Context::New`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L479).

There are two implementations of `Context`:

1. [`CPUContext`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context.h#L105), whose [`New` method](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context.h#L131) calls [`g_cpu_allocator.get()->New(size_t)`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context.cc#L15) to allocate the memory.

1. [`CUDAContext`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context_gpu.h#L99), which has a data member [`int gpu_id_`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context_gpu.h#L202).  This looks very similar to class `majel::GPUPlace`, which also has an `int id_` data member.  `CUDAContext::New(size_t)` calls [`g_cub_allocator->DeviceAllocate(&ptr, nbytes)`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context_gpu.cu#L355) to allocate the memory.

### Majel

In Majel, there are basically two allocator types:

1. `cpu::SystemAllocator`, which has similar functionality to `caffe2::CPUContext::New/Delete`.
1. `gpu::SystemAllocator`, which has similar functionality to `caffe2::CUDAContext::New/Delete`.

However, programs do not allocate memory through these two allocators directly.  Instead, the two allocators are defined in hidden namespaces.

In Majel there are hidden global variables like:

1. `cpu::SystemAllocator g_cpu_allocator`, and
1. `vector<gpu::SystemAllocator*> g_gpu_allocators(NUM_GPUS)`.

Programs allocate memory via a BuddyAllocator, which can take the `g_cpu_allocator` or a `g_gpu_allocators[gpu_id]` as its *fallback allocator*, so that if BuddyAllocator cannot find a block in its memory pool, it extends its memory pool by calling the fallback allocator's `New(size_t)`.
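
The fallback mechanism can be sketched as follows. This is not a buddy allocator: `PoolAllocator` is a hypothetical bump allocator used only to show when the fallback's `New(size_t)` gets called, and the `SystemAllocator` interface with `New`/`Delete` is an assumed shape rather than Majel's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Assumed interface of a system (fallback) allocator.
class SystemAllocator {
 public:
  virtual ~SystemAllocator() {}
  virtual void* New(std::size_t size) = 0;
  virtual void Delete(void* p) = 0;
};

// Hypothetical CPU system allocator that counts how often it is asked.
class CPUSystemAllocator : public SystemAllocator {
 public:
  void* New(std::size_t size) override {
    ++calls_;
    return std::malloc(size);
  }
  void Delete(void* p) override { std::free(p); }
  int calls() const { return calls_; }

 private:
  int calls_ = 0;
};

// Hands out memory from fixed-size chunks and asks the fallback
// allocator for a new chunk only when the current one is exhausted.
class PoolAllocator {
 public:
  PoolAllocator(SystemAllocator* fallback, std::size_t chunk_size)
      : fallback_(fallback), chunk_size_(chunk_size) {}
  PoolAllocator(const PoolAllocator&) = delete;
  PoolAllocator& operator=(const PoolAllocator&) = delete;
  ~PoolAllocator() {
    for (void* c : chunks_) fallback_->Delete(c);
  }

  void* Alloc(std::size_t size) {
    if (size > chunk_size_) return nullptr;  // too large for this sketch
    if (chunks_.empty() || offset_ + size > chunk_size_) {
      // No room in the pool: extend it through the fallback allocator.
      void* chunk = fallback_->New(chunk_size_);
      if (chunk == nullptr) return nullptr;
      chunks_.push_back(chunk);
      offset_ = 0;
    }
    void* p = static_cast<char*>(chunks_.back()) + offset_;
    offset_ += size;
    return p;
  }

 private:
  SystemAllocator* fallback_;
  std::size_t chunk_size_;
  std::vector<void*> chunks_;
  std::size_t offset_ = 0;
};
```

With a 1024-byte chunk, two 512-byte requests are served from a single call to the fallback, and only the third request triggers another `New`.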