diff --git a/doc/design/dist_refactor/parameter_server.md b/doc/design/dist_refactor/parameter_server.md
index 1094f06d461275a9ad4034d5e48b39856d967b71..805dd13048d41b995d2a01cda52b2ea33e4bbe1d 100644
--- a/doc/design/dist_refactor/parameter_server.md
+++ b/doc/design/dist_refactor/parameter_server.md
@@ -9,16 +9,16 @@ different purposes.
## Background
-The previous implementations of the parameter server does not run a
+The previous implementations of the parameter server do not run a
fluid sub-program. Parameter initialization, optimizer computation, network
communication and checkpointing are implemented twice on both the
-trainer and the parameter server.
+trainer and the parameter server.
-It would be great if we can write code once and use them on both the
-trainer and the parameter server: reduces code duplication and
-improves extensibility. Given that after the current refactor, we are
-representing everything as a computing graph on the
-trainer. Representing everything as a computing graph on the parameter
+It would be great if we could write code once and use it on both the
+trainer and the parameter server, since this reduces code duplication and
+improves extensibility. Given that, after the current refactoring, we
+represent everything as a computation graph on the trainer, representing
+everything as a computation graph on the parameter
server becomes a natural extension.
## Design
@@ -30,9 +30,9 @@ into sub-programs to be scheduled on different nodes with the following
steps:
1. OP placement: the OPs will be placed on different nodes according
- to heuristic that minimizes estimated total computation
+ to a heuristic that minimizes the estimated total computation
time. Currently we will use a simple heuristic that puts parameter
- varable on parameter server workers and everything else on trainer
+   variables on parameter server workers and everything else on trainer
workers (a rough sketch of this heuristic follows the list).
1. Add communication OPs to enable the communication between nodes.
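
A minimal Python sketch of the placement heuristic in step 1, using a simplified op description; the tuple format, the `@GRAD` naming, and the `place_ops` helper are illustrative stand-ins rather than the actual Fluid program representation:

```python
def place_ops(ops, param_names):
    """Toy heuristic: an op that writes a parameter variable is placed on a
    parameter-server worker; every other op stays on a trainer worker."""
    ps_ops, trainer_ops = [], []
    for name, inputs, outputs in ops:
        on_ps = any(var in param_names for var in outputs)
        (ps_ops if on_ps else trainer_ops).append(name)
    return ps_ops, trainer_ops


# The SGD update writes the parameter W, so it lands on the parameter server;
# the forward and backward ops stay on the trainer.
ops = [
    ("mul", ["X", "W"], ["Out"]),
    ("mul_grad", ["Out@GRAD", "X", "W"], ["W@GRAD"]),
    ("sgd", ["W", "W@GRAD"], ["W"]),
]
print(place_ops(ops, param_names={"W"}))  # (['sgd'], ['mul', 'mul_grad'])
```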
@@ -47,22 +47,22 @@ After converting:
-1. The parameter variable W and it's optimizer program are placed on the parameter server.
+1. The parameter variable W and its optimizer program are placed on the parameter server.
1. Operators are added to the program.
- *Send* sends data to the connected *Recv* operator. The
scheduler on the receive node will only schedule the *Recv* operator
to run after the *Send* operator has run (the *Send* OP will mark
the *Recv* OP runnable automatically).
- - *Enueue* enqueues the input variable, it can block until space
+  - *Enqueue* enqueues the input variable; it can block until space
becomes available in the queue.
- *Dequeue* outputs a configurable number of tensors from the
- queue. It will block until the queue have the required number of
+ queue. It will block until the queue has the required number of
tensors (see the sketch after the Discussion section).
### Benefits
-- Model parallelism become easier to implement: it's an extension to
+- Model parallelism becomes easier to implement: it is an extension to
the trainer - parameter server approach. We can have several "Transpilers"
to achieve different goals.
- A user-defined optimizer is easier to add: the user can now express it as
@@ -72,22 +72,22 @@ After converting:
### Challenges
-- It's important to balance the parameter shards of on multiple
- parameter server. If a single parameter is very big (some
+- It is important to balance the parameter shards on multiple
+  parameter servers. If a single parameter is very big (for example, a
word-embedding, fully connected, or softmax layer), we need to
automatically partition the single parameter onto different
parameter servers when possible (only element-wise optimizer depends
on the parameter variable).
-- In the "Aync SGD" figure, the "W" variable on the parameter server
- could be read and wrote concurrently. See
+- In the "Async SGD" figure, the "W" variable on the parameter server
+ could be read and written concurrently. See
[here](https://github.com/PaddlePaddle/Paddle/pull/6394) for more
- details about concurrent program in fluid.
+  details about concurrent programs in Fluid.
### Discussion
- Can the Enqueue OP be implemented under our current tensor design
- (puts the input tensor into the queue tensor)?
-- *Dequeue* OP will have variable numbers of output (depends on the
+  (i.e., by putting the input tensor into the queue tensor)?
+- The *Dequeue* OP will have a variable number of outputs (depending on the
`min_count` attribute); does our current design support it? (similar
question for the *Add* OP)
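
The blocking behaviour described for *Enqueue* and *Dequeue* (block until space is free; block until `min_count` tensors can be returned) can be illustrated with a small, framework-free Python sketch. The `enqueue_op`/`dequeue_op` names and the queue capacity are illustrative, not the Fluid operator implementation:

```python
import queue
import threading

channel = queue.Queue(maxsize=4)  # bounded, so enqueue_op blocks when it is full


def enqueue_op(tensor):
    """Enqueue one variable, blocking until space becomes available."""
    channel.put(tensor)


def dequeue_op(min_count):
    """Return min_count tensors, blocking until that many have arrived."""
    return [channel.get() for _ in range(min_count)]


# A trainer thread streams gradients while the parameter-server side waits
# for a full batch of them before running its optimizer sub-program.
producer = threading.Thread(
    target=lambda: [enqueue_op({"W@GRAD": i}) for i in range(8)])
producer.start()
grads = dequeue_op(min_count=8)
producer.join()
print(len(grads))  # 8
```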
diff --git a/doc/howto/optimization/cpu_profiling.md b/doc/howto/optimization/cpu_profiling.md
index 1775374cf6e518586c28bbd8e04946c74df7e4c5..368af40cc7308cf6f4c609361078fe3ba02213ed 100644
--- a/doc/howto/optimization/cpu_profiling.md
+++ b/doc/howto/optimization/cpu_profiling.md
@@ -60,8 +60,7 @@ each column is as follows:
| column | meaning |
| --- | --- |
| ncalls | the number of calls into a function |
-| tottime | the total execution time of the function, not including the
- execution time of other functions called by the function |
+| tottime | the total execution time of the function, not including the execution time of other functions called by the function |
| percall | tottime divided by ncalls |
| cumtime | the total execution time of the function, including the execution time of other functions being called |
| percall | cumtime divided by ncalls |
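
These columns come from Python's built-in `cProfile`/`pstats` modules; a quick way to sort and inspect them programmatically (the `profile.out` and `train.py` names below are placeholders):

```python
# First dump a profile from the command line, e.g.
#   python -m cProfile -o profile.out train.py
import pstats

stats = pstats.Stats("profile.out")
# Sort by the cumtime column (time including callees) and show the top 10.
stats.sort_stats("cumulative").print_stats(10)
# Sort by the tottime column (time spent inside the function itself).
stats.sort_stats("time").print_stats(10)
```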
diff --git a/paddle/gserver/layers/PriorBox.cpp b/paddle/gserver/layers/PriorBox.cpp
index 337b9ba7bc0fc4e4bb80ee7b248d934f111379d5..8faf032f550836579522016b4fff3db7e94746e3 100644
--- a/paddle/gserver/layers/PriorBox.cpp
+++ b/paddle/gserver/layers/PriorBox.cpp
@@ -69,7 +69,7 @@ bool PriorBoxLayer::init(const LayerMap& layerMap,
if (maxSize_.size() > 0) CHECK_EQ(minSize_.size(), maxSize_.size());
// flip aspect ratios
- for (int index = 0; index < tmp.size(); index++) {
+ for (unsigned index = 0; index < tmp.size(); index++) {
real ar = tmp[index];
if (fabs(ar - 1.) < 1e-6) continue;
aspectRatio_.push_back(ar);
diff --git a/paddle/operators/ctc_align_op.h b/paddle/operators/ctc_align_op.h
index 589413feb3dcbb7fea1f0a878b35d4bf714b5318..fed89aa1e899a2450b315f352b9695056ed13aec 100644
--- a/paddle/operators/ctc_align_op.h
+++ b/paddle/operators/ctc_align_op.h
@@ -51,7 +51,7 @@ class CTCAlignKernel : public framework::OpKernel {
T prev_token = -1;
for (size_t i = input_lod[level][seq_idx];
i < input_lod[level][seq_idx + 1]; ++i) {
- if (input_data[i] != blank &&
+ if ((unsigned)input_data[i] != blank &&
!(merge_repeated && input_data[i] == prev_token)) {
output_data[output_idx] = input_data[i];
++output_idx;
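
The loop above implements the usual CTC alignment rule: drop the blank label and, when `merge_repeated` is set, collapse consecutive duplicates. A short Python sketch of the same per-sequence logic (the function name and the example blank value are illustrative):

```python
def ctc_align(tokens, blank, merge_repeated=True):
    """Keep a token unless it is the blank label, or it repeats the previous
    token while merge_repeated is enabled (mirrors the kernel's inner loop)."""
    output = []
    prev_token = -1
    for token in tokens:
        if token != blank and not (merge_repeated and token == prev_token):
            output.append(token)
        prev_token = token
    return output


print(ctc_align([0, 1, 1, 0, 2, 2, 3], blank=0))  # [1, 2, 3]
```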
diff --git a/paddle/operators/sequence_reshape_op.h b/paddle/operators/sequence_reshape_op.h
index c6f528ab8a73294bb8ee91425f34e44c66f1932c..aaae7ab29281b72848515b80cc60931c13a294c9 100644
--- a/paddle/operators/sequence_reshape_op.h
+++ b/paddle/operators/sequence_reshape_op.h
@@ -35,7 +35,7 @@ class SequenceReshapeKernel : public framework::OpKernel {
PADDLE_ENFORCE_EQ(in_lod.size(), 1UL,
"Only support one level sequence now.");
PADDLE_ENFORCE_EQ(
- in_dims[0], in_lod[0].back(),
+ (uint64_t)in_dims[0], in_lod[0].back(),
"Inconsistent size between X.shape[0] and X.lod()[0].back().");
auto in_lod_l0 = in_lod[0];
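
The enforced condition is the standard LoD invariant for a one-level sequence: the offsets partition the rows of X, so the last offset must equal `X.shape[0]`. A tiny illustrative check (the variable names are not the Fluid API):

```python
import numpy as np

# Two sequences of length 3 and 4 packed into a single 7-row tensor.
x = np.zeros((7, 6), dtype=np.float32)
lod = [[0, 3, 7]]  # one-level LoD: row offsets into x

assert len(lod) == 1, "Only support one level sequence now."
assert x.shape[0] == lod[0][-1], \
    "Inconsistent size between X.shape[0] and X.lod()[0].back()."
```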
diff --git a/python/paddle/v2/image.py b/python/paddle/v2/image.py
index a7bb22a35519b87e196b014056649f3a1bfa504a..e5000e440cc8d822dbd38dce3978d2722d32ebe4 100644
--- a/python/paddle/v2/image.py
+++ b/python/paddle/v2/image.py
@@ -176,7 +176,6 @@ def resize_short(im, size):
:param size: the shorter edge size of image after resizing.
:type size: int
"""
- assert im.shape[-1] == 1 or im.shape[-1] == 3
h, w = im.shape[:2]
h_new, w_new = size, size
if h > w:
@@ -267,7 +266,7 @@ def random_crop(im, size, is_color=True):
return im
-def left_right_flip(im):
+def left_right_flip(im, is_color=True):
"""
Flip an image along the horizontal direction.
Return the flipped image.
@@ -278,13 +277,15 @@ def left_right_flip(im):
im = left_right_flip(im)
- :paam im: input image with HWC layout
+ :param im: input image with HWC layout or HW layout for gray image
:type im: ndarray
+ :param is_color: whether input image is color or not
+ :type is_color: bool
"""
- if len(im.shape) == 3:
+ if len(im.shape) == 3 and is_color:
return im[:, ::-1, :]
else:
- return im[:, ::-1, :]
+ return im[:, ::-1]
def simple_transform(im,
@@ -321,8 +322,9 @@ def simple_transform(im,
if is_train:
im = random_crop(im, crop_size, is_color=is_color)
if np.random.randint(2) == 0:
- im = left_right_flip(im)
+ im = left_right_flip(im, is_color)
else:
im = center_crop(im, crop_size, is_color=is_color)
if len(im.shape) == 3:
im = to_chw(im)
@@ -331,8 +333,10 @@ def simple_transform(im,
if mean is not None:
mean = np.array(mean, dtype=np.float32)
# mean value, may be one value per channel
- if mean.ndim == 1:
+ if mean.ndim == 1 and is_color:
mean = mean[:, np.newaxis, np.newaxis]
+        elif mean.ndim == 1:
+            # a 1-D mean for a gray image already broadcasts over HW
+            pass
else:
# elementwise mean
assert len(mean.shape) == len(im.shape)
@@ -372,6 +376,6 @@ def load_and_transform(filename,
mean values per channel.
:type mean: numpy array | list
"""
- im = load_image(filename)
+ im = load_image(filename, is_color)
im = simple_transform(im, resize_size, crop_size, is_train, is_color, mean)
return im
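
A brief usage sketch of the gray-image path added above; the image file name is a placeholder and the keyword arguments assume the parameter names visible in this file:

```python
import numpy as np
from paddle.v2.image import left_right_flip, load_and_transform

# left_right_flip now handles both HWC color images and HW gray images.
color = np.zeros((4, 6, 3), dtype=np.uint8)
gray = np.zeros((4, 6), dtype=np.uint8)
assert left_right_flip(color).shape == (4, 6, 3)
assert left_right_flip(gray, is_color=False).shape == (4, 6)

# is_color is now forwarded to load_image, so a gray image stays single-channel
# all the way through resizing, cropping, and mean subtraction.
im = load_and_transform("cat.jpg", resize_size=256, crop_size=224,
                        is_train=False, is_color=False, mean=[127.5])
```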