Commit ff5ca692 authored by Yu Yang

Merge branch 'develop' of github.com:baidu/Paddle into feature/refine_doc_drnn

@@ -42,7 +42,7 @@ addons:
 before_install:
   - |
     if [ ${JOB} == "BUILD_AND_TEST" ]; then
-      if ! git diff --name-only $TRAVIS_COMMIT_RANGE | grep -qvE '(\.md$)'
+      if ! git diff --name-only $TRAVIS_COMMIT_RANGE | grep -qvE '(\.md$)|(\.rst$)|(\.jpg$)|(\.png$)'
       then
         echo "Only markdown docs were updated, stopping build process."
         exit
......
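For reference, the widened Travis filter now treats `.rst`, `.jpg` and `.png` files as doc-only changes too. A minimal sketch of the same test in Python (the file names are invented for illustration):

```python
import re

# A doc-only change means every touched file matches one of these suffixes.
doc_only = re.compile(r'(\.md$)|(\.rst$)|(\.jpg$)|(\.png$)')

changed_files = ['README.md', 'doc/index.rst', 'doc/bi_lstm.jpg']

# `grep -qvE PATTERN` succeeds when at least one line does NOT match, i.e. when
# some non-doc file changed; the `if !` above therefore skips only doc-only pushes.
if all(doc_only.search(name) for name in changed_files):
    print 'Only docs were updated, stopping build process.'
```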
@@ -36,6 +36,7 @@ option(WITH_RDMA "Compile PaddlePaddle with rdma support" OFF)
 option(WITH_GLOG "Compile PaddlePaddle use glog, otherwise use a log implement internally" ${LIBGLOG_FOUND})
 option(WITH_GFLAGS "Compile PaddlePaddle use gflags, otherwise use a flag implement internally" ${GFLAGS_FOUND})
 option(WITH_TIMER "Compile PaddlePaddle use timer" OFF)
+option(WITH_PROFILER "Compile PaddlePaddle use gpu profiler" OFF)
 option(WITH_TESTING "Compile and run unittest for PaddlePaddle" ${GTEST_FOUND})
 option(WITH_DOC "Compile PaddlePaddle with documentation" OFF)
 option(WITH_SWIG_PY "Compile PaddlePaddle with py PaddlePaddle prediction api" ${SWIG_FOUND})
@@ -115,7 +116,6 @@ else()
 endif(WITH_AVX)

 if(WITH_DSO)
-    set(CUDA_LIBRARIES "")
     add_definitions(-DPADDLE_USE_DSO)
 endif(WITH_DSO)
@@ -135,6 +135,10 @@ if(NOT WITH_TIMER)
     add_definitions(-DPADDLE_DISABLE_TIMER)
 endif(NOT WITH_TIMER)

+if(NOT WITH_PROFILER)
+    add_definitions(-DPADDLE_DISABLE_PROFILER)
+endif(NOT WITH_PROFILER)
+
 if(WITH_AVX)
     set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${AVX_FLAG}")
     set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${AVX_FLAG}")
......
@@ -24,7 +24,7 @@ paddle train \
     --test_all_data_in_one_period=1 \
     --use_gpu=1 \
     --trainer_count=1 \
-    --num_passes=200 \
+    --num_passes=300 \
     --save_dir=$output \
     2>&1 | tee $log
......
@@ -18,7 +18,5 @@ set -x
 # download the dictionary and pretrained model
 for file in baidu.dict model_32.emb model_64.emb model_128.emb model_256.emb
 do
-    # following is the google drive address
-    # you can also directly download from https://pan.baidu.com/s/1o8q577s
-    wget https://www.googledrive.com/host/0B7Q8d52jqeI9ejh6Q1RpMTFQT1k/embedding/$file --no-check-certificate
+    wget http://paddlepaddle.bj.bcebos.com/model_zoo/embedding/$file
 done
@@ -24,9 +24,7 @@ echo "Downloading ResNet models..."
 for file in resnet_50.tar.gz resnet_101.tar.gz resnet_152.tar.gz mean_meta_224.tar.gz
 do
-    # following is the google drive address
-    # you can also directly download from https://pan.baidu.com/s/1o8q577s
-    wget https://www.googledrive.com/host/0B7Q8d52jqeI9ejh6Q1RpMTFQT1k/imagenet/$file --no-check-certificate
+    wget http://paddlepaddle.bj.bcebos.com/model_zoo/imagenet/$file
     tar -xvf $file
     rm $file
 done
......
This dataset consists of electronics product reviews associated with
binary labels (positive/negative) for sentiment classification.
The preprocessed data can be downloaded with the script `get_data.sh`.
The data was derived from reviews_Electronics_5.json.gz at
http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Electronics_5.json.gz
If you want to process the raw data yourself, use the script `proc_from_raw_data/get_data.sh`.
@@ -17,14 +17,11 @@ set -e
 DIR="$( cd "$(dirname "$0")" ; pwd -P )"
 cd $DIR

-echo "Downloading Amazon Electronics reviews data..."
-# http://jmcauley.ucsd.edu/data/amazon/
-wget http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Electronics_5.json.gz
-echo "Downloading mosesdecoder..."
-# https://github.com/moses-smt/mosesdecoder
-wget https://github.com/moses-smt/mosesdecoder/archive/master.zip
-unzip master.zip
-rm master.zip
+# Download the preprocessed data
+wget http://paddlepaddle.bj.bcebos.com/demo/quick_start_preprocessed_data/preprocessed_data.tar.gz
+
+# Extract package
+tar zxvf preprocessed_data.tar.gz
+
+# Remove compressed package
+rm preprocessed_data.tar.gz
+
+echo "Done."
the device is cute , but that 's just about all that 's good. the specs are what you 'd expect : it 's a wifi mic , with some noise filter options. the app has the option to upload your baby 's name and photo , which is a cutesy touch. but the app is otherwise unstable and useless unless you upgrade for $ 60 / year.set up involves downloading the app , turning on the mic , switching your phone to the wifi network of the mic , telling the app your wifi settings , switching your wifi back to your home router. the app is then directly connected to your mic.the app is adware ! the main screen says " cry notifications on / off : upgrade to evoz premium and receive a text message of email when your baby is crying " .but the adware points out an important limitation , this monitor is only intended to be used from your home network. if you want to access it remotely , get a webcam. this app would make a lot more sense of the premium features were included with the hardware .
don 't be fooled by my one star rating. if there was a zero , i would have selected it. this product was a waste of my money.it has never worked like the company said it supposed to. i only have one device , an iphone 4gs. after charging the the iphone mid way , the i.sound portable power max 16,000 mah is completely drained. the led light no longer lit up. when plugging the isound portable power max into a wall outlet to charge , it would charge for about 20-30 minutes and then all four battery led indicator lit up showing a full charge. i would leave it on to charge for the full 8 hours or more but each time with the same result upon using. don 't buy this thing. put your money to good use elsewhere .
@@ -16,10 +16,26 @@
 # 1. size of pos : neg = 1:1.
 # 2. size of testing set = min(25k, len(all_data) * 0.1), others is traning set.
 # 3. distinct train set and test set.
+# 4. build dict

 set -e

+DIR="$( cd "$(dirname "$0")" ; pwd -P )"
+cd $DIR
+
+# Download data
+echo "Downloading Amazon Electronics reviews data..."
+# http://jmcauley.ucsd.edu/data/amazon/
+wget http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Electronics_5.json.gz
+echo "Downloading mosesdecoder..."
+# https://github.com/moses-smt/mosesdecoder
+wget https://github.com/moses-smt/mosesdecoder/archive/master.zip
+unzip master.zip
+rm master.zip
+
+##################
+# Preprocess data
+echo "Preprocess data..."
 export LC_ALL=C
 UNAME_STR=`uname`
@@ -29,11 +45,11 @@ else
   SHUF_PROG='gshuf'
 fi

-mkdir -p data/tmp
-python preprocess.py -i data/reviews_Electronics_5.json.gz
+mkdir -p tmp
+python preprocess.py -i reviews_Electronics_5.json.gz

 # uniq and shuffle
-cd data/tmp
-echo 'uniq and shuffle...'
+cd tmp
+echo 'Uniq and shuffle...'
 cat pos_*|sort|uniq|${SHUF_PROG}> pos.shuffed
 cat neg_*|sort|uniq|${SHUF_PROG}> neg.shuffed
@@ -53,11 +69,11 @@ cat train.pos train.neg | ${SHUF_PROG} >../train.txt
 cat test.pos test.neg | ${SHUF_PROG} >../test.txt
 cd -

-echo 'data/train.txt' > data/train.list
-echo 'data/test.txt' > data/test.list
+echo 'train.txt' > train.list
+echo 'test.txt' > test.list

 # use 30k dict
-rm -rf data/tmp
-mv data/dict.txt data/dict_all.txt
-cat data/dict_all.txt | head -n 30001 > data/dict.txt
-echo 'preprocess finished'
+rm -rf tmp
+mv dict.txt dict_all.txt
+cat dict_all.txt | head -n 30001 > dict.txt
+echo 'Done.'
@@ -14,7 +14,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 """
-1. (remove HTML before or not)tokensizing
+1. Tokenize the words and punctuation
 2. pos sample : rating score 5; neg sample: rating score 1-2.

 Usage:
@@ -76,7 +76,11 @@ def tokenize(sentences):
     sentences : a list of input sentences.
     return: a list of processed text.
     """
-    dir = './data/mosesdecoder-master/scripts/tokenizer/tokenizer.perl'
+    dir = './mosesdecoder-master/scripts/tokenizer/tokenizer.perl'
+    if not os.path.exists(dir):
+        sys.exit(
+            "The ./mosesdecoder-master/scripts/tokenizer/tokenizer.perl does not exists."
+        )
     tokenizer_cmd = [dir, '-l', 'en', '-q', '-']
     assert isinstance(sentences, list)
     text = "\n".join(sentences)
@@ -104,7 +108,7 @@ def tokenize_batch(id):
         num_batch, instance, pre_fix = parse_queue.get()
         if num_batch == -1:  ### parse_queue finished
             tokenize_queue.put((-1, None, None))
-            sys.stderr.write("tokenize theread %s finish\n" % (id))
+            sys.stderr.write("Thread %s finish\n" % (id))
             break
         tokenize_instance = tokenize(instance)
         tokenize_queue.put((num_batch, tokenize_instance, pre_fix))
......
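For context, the `tokenizer_cmd` list built above is normally fed to a subprocess pipe; a minimal sketch of that plumbing (the `Popen` wiring here is an assumption for illustration, not part of this diff):

```python
from subprocess import Popen, PIPE

def tokenize_sketch(sentences):
    # Same command the diff builds: Moses tokenizer, English, quiet, stdin input.
    tokenizer_cmd = ['./mosesdecoder-master/scripts/tokenizer/tokenizer.perl',
                     '-l', 'en', '-q', '-']
    # Stream all sentences through the tokenizer in one pass.
    proc = Popen(tokenizer_cmd, stdin=PIPE, stdout=PIPE)
    out, _ = proc.communicate("\n".join(sentences))
    return out.split("\n")
```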
@@ -14,10 +14,10 @@
 # limitations under the License.
 set -e
 wget http://www.cs.upc.edu/~srlconll/conll05st-tests.tar.gz
-wget https://www.googledrive.com/host/0B7Q8d52jqeI9ejh6Q1RpMTFQT1k/semantic_role_labeling/verbDict.txt --no-check-certificate
-wget https://www.googledrive.com/host/0B7Q8d52jqeI9ejh6Q1RpMTFQT1k/semantic_role_labeling/targetDict.txt --no-check-certificate
-wget https://www.googledrive.com/host/0B7Q8d52jqeI9ejh6Q1RpMTFQT1k/semantic_role_labeling/wordDict.txt --no-check-certificate
-wget https://www.googledrive.com/host/0B7Q8d52jqeI9ejh6Q1RpMTFQT1k/semantic_role_labeling/emb --no-check-certificate
+wget http://paddlepaddle.bj.bcebos.com/demo/srl_dict_and_embedding/verbDict.txt
+wget http://paddlepaddle.bj.bcebos.com/demo/srl_dict_and_embedding/targetDict.txt
+wget http://paddlepaddle.bj.bcebos.com/demo/srl_dict_and_embedding/wordDict.txt
+wget http://paddlepaddle.bj.bcebos.com/demo/srl_dict_and_embedding/emb
 tar -xzvf conll05st-tests.tar.gz
 rm conll05st-tests.tar.gz
 cp ./conll05st-release/test.wsj/words/test.wsj.words.gz .
......
@@ -25,12 +25,13 @@ def hook(settings, word_dict, label_dict, predicate_dict, **kwargs):
     #all inputs are integral and sequential type
     settings.slots = [
         integer_value_sequence(len(word_dict)),
-        integer_value_sequence(len(predicate_dict)),
         integer_value_sequence(len(word_dict)),
         integer_value_sequence(len(word_dict)),
         integer_value_sequence(len(word_dict)),
         integer_value_sequence(len(word_dict)),
-        integer_value_sequence(len(word_dict)), integer_value_sequence(2),
+        integer_value_sequence(len(word_dict)),
+        integer_value_sequence(len(predicate_dict)),
+        integer_value_sequence(2),
         integer_value_sequence(len(label_dict))
     ]
@@ -63,5 +64,5 @@ def process(settings, file_name):
         label_list = label.split()
         label_slot = [settings.label_dict.get(w) for w in label_list]
-        yield word_slot, predicate_slot, ctx_n2_slot, ctx_n1_slot, \
-            ctx_0_slot, ctx_p1_slot, ctx_p2_slot, mark_slot, label_slot
+        yield word_slot, ctx_n2_slot, ctx_n1_slot, \
+            ctx_0_slot, ctx_p1_slot, ctx_p2_slot, predicate_slot, mark_slot, label_slot
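The two hunks above have to change together: PyDataProvider2 matches each yielded value to `settings.slots` by position, so moving the predicate slot in the list requires the same move in the `yield`. A minimal sketch of that invariant (slot names shortened for illustration):

```python
# After this commit both orders agree: predicate sits between ctx_p2 and mark.
slot_order  = ['word', 'ctx_n2', 'ctx_n1', 'ctx_0', 'ctx_p1', 'ctx_p2',
               'predicate', 'mark', 'label']
yield_order = ['word_slot', 'ctx_n2_slot', 'ctx_n1_slot', 'ctx_0_slot',
               'ctx_p1_slot', 'ctx_p2_slot', 'predicate_slot', 'mark_slot',
               'label_slot']
assert [name + '_slot' for name in slot_order] == yield_order
```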
@@ -55,18 +55,14 @@ class Prediction():
         slots = [
             integer_value_sequence(len_dict),
-            integer_value_sequence(len_pred),
             integer_value_sequence(len_dict),
             integer_value_sequence(len_dict),
             integer_value_sequence(len_dict),
             integer_value_sequence(len_dict),
             integer_value_sequence(len_dict),
+            integer_value_sequence(len_pred),
             integer_value_sequence(2)
         ]
-            integer_value_sequence(len_dict), integer_value_sequence(len_dict),
-            integer_value_sequence(len_dict), integer_value_sequence(len_dict),
-            integer_value_sequence(len_dict), integer_value_sequence(2)
-        ]
         self.converter = DataProviderConverter(slots)

     def load_dict_label(self, dict_file, label_file, predicate_dict_file):
@@ -104,8 +100,8 @@ class Prediction():
         marks = mark.split()
         mark_slot = [int(w) for w in marks]
-        yield word_slot, predicate_slot, ctx_n2_slot, ctx_n1_slot, \
-            ctx_0_slot, ctx_p1_slot, ctx_p2_slot, mark_slot
+        yield word_slot, ctx_n2_slot, ctx_n1_slot, \
+            ctx_0_slot, ctx_p1_slot, ctx_p2_slot, predicate_slot, mark_slot

     def predict(self, data_file, output_file):
         """
......
@@ -18,7 +18,7 @@ set -e
 function get_best_pass() {
   cat $1 | grep -Pzo 'Test .*\n.*pass-.*' | \
   sed -r 'N;s/Test.* cost=([0-9]+\.[0-9]+).*\n.*pass-([0-9]+)/\1 \2/g' | \
-  sort | head -n 1
+  sort -n | head -n 1
 }

 log=train.log
......
@@ -18,7 +18,7 @@ set -e
 function get_best_pass() {
   cat $1 | grep -Pzo 'Test .*\n.*pass-.*' | \
   sed -r 'N;s/Test.* cost=([0-9]+\.[0-9]+).*\n.*pass-([0-9]+)/\1 \2/g' |\
-  sort | head -n 1
+  sort -n | head -n 1
 }

 log=train.log
......
@@ -17,7 +17,7 @@ set -e
 function get_best_pass() {
   cat $1 | grep -Pzo 'Test .*\n.*pass-.*' | \
   sed -r 'N;s/Test.* classification_error_evaluator=([0-9]+\.[0-9]+).*\n.*pass-([0-9]+)/\1 \2/g' |\
-  sort | head -n 1
+  sort -n | head -n 1
 }

 log=train.log
......
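The same one-character fix appears in all three scripts because the extracted cost is a decimal string: plain `sort` compares it lexicographically, so `10.002` sorts before `9.125` and the wrong pass wins. A quick sketch of the failure mode (sample values invented):

```python
costs = ['9.125 7', '10.002 3', '100.5 1']  # "<cost> <pass-id>" lines from sed

print sorted(costs)[0]  # '10.002 3' -- lexicographic order picks the wrong pass
print sorted(costs, key=lambda s: float(s.split()[0]))[0]  # '9.125 7' -- numeric order, like `sort -n`
```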
@@ -16,9 +16,7 @@ set -e
 set -x

 # download the in-house paraphrase dataset
-# following is the google drive address
-# you can also directly download from https://pan.baidu.com/s/1o8q577s
-wget https://www.googledrive.com/host/0B7Q8d52jqeI9ejh6Q1RpMTFQT1k/embedding/paraphrase.tar.gz --no-check-certificate
+wget http://paddlepaddle.bj.bcebos.com/model_zoo/embedding/paraphrase.tar.gz

 # untar the dataset
 tar -zxvf paraphrase.tar.gz
......
@@ -16,9 +16,7 @@ set -e
 set -x

 # download the pretrained model
-# following is the google drive address
-# you can also directly download from https://pan.baidu.com/s/1o8q577s
-wget https://www.googledrive.com/host/0B7Q8d52jqeI9ejh6Q1RpMTFQT1k/wmt14_model.tar.gz --no-check-certificate
+wget http://paddlepaddle.bj.bcebos.com/model_zoo/wmt14_model.tar.gz

 # untar the model
 tar -zxvf wmt14_model.tar.gz
......
ABOUT
=======

PaddlePaddle is an easy-to-use, efficient, flexible and scalable deep learning platform,
originally developed by Baidu scientists and engineers for the purpose of applying deep learning to many products at Baidu.

PaddlePaddle is now open source but far from complete; it is intended to be built upon, improved, scaled, and extended.
We hope to build an active open source community whose members provide feedback and actively contribute to the source code.

Credits
--------

We owe many thanks to `all contributors and developers <https://github.com/PaddlePaddle/Paddle/blob/develop/authors>`_ of PaddlePaddle!
../../demo/sentiment_analysis/bi_lstm.jpg
\ No newline at end of file
../../demo/text_generation/encoder-decoder-attention-model.png
\ No newline at end of file
-DataProvider Introduction
-=========================
+Introduction
+==============
 DataProvider is a module that loads training or testing data into cpu or gpu
 memory for the following triaining or testing process.
......
-How to use PyDataProvider2
-==========================
+PyDataProvider2
+=================
 We highly recommand users to use PyDataProvider2 to provide training or testing
 data to PaddlePaddle. The user only needs to focus on how to read a single
......
API
====
DataProvider API
----------------
.. toctree::
:maxdepth: 1
data_provider/index.rst
data_provider/pydataprovider2.rst
Model Config API
----------------
.. toctree::
:maxdepth: 1
trainer_config_helpers/index.rst
trainer_config_helpers/optimizers.rst
trainer_config_helpers/data_sources.rst
trainer_config_helpers/layers.rst
trainer_config_helpers/activations.rst
trainer_config_helpers/poolings.rst
trainer_config_helpers/networks.rst
trainer_config_helpers/evaluators.rst
trainer_config_helpers/attrs.rst
Applications API
----------------
.. toctree::
:maxdepth: 1
predict/swig_py_paddle_en.rst
\ No newline at end of file
-Python Prediction API
-=====================
+Python Prediction
+==================
 PaddlePaddle offers a set of clean prediction interfaces for python with the help of
 SWIG. The main steps of predict values in python are:
......
-Parameter and Extra Layer Attribute
-===================================
+Parameter Attributes
+=======================

 .. automodule:: paddle.trainer_config_helpers.attrs
     :members:
Cluster Train
====================
.. toctree::
:glob:
opensource/cluster_train.md
internal/index.md
Development Guide
=================
.. toctree::
:maxdepth: 1
layer.md
new_layer/new_layer.rst
../source/index.md
# Layer Documents
* [Layer Source Code Document](../source/gserver/layers/index.rst)
* [Layer Python API Document](../ui/api/trainer_config_helpers/index.rst)
-# Introduction
+Basic Usage
+=============

 PaddlePaddle is a deep learning platform open-sourced by Baidu. With PaddlePaddle, you can easily train a classic neural network within a couple lines of configuration, or you can build sophisticated models that provide state-of-the-art performance on difficult learning tasks like sentiment analysis, machine translation, image caption and so on.

-## 1. A Classic Problem
+1. A Classic Problem
+---------------------

-Now, to give you a hint of what using PaddlePaddle looks like, let's start with a fundamental learning problem - <a href="https://en.wikipedia.org/wiki/Simple_linear_regression">**simple linear regression**</a> : you have observed a set of two-dimensional data points of `X` and `Y`, where `X` is an explanatory variable and `Y` is corresponding dependent variable, and you want to recover the underlying correlation between `X` and `Y`. Linear regression can be used in many practical scenarios. For example, `X` can be a variable about house size, and `Y` a variable about house price. You can build a model that captures relationship between them by observing real estate markets.
+Now, to give you a hint of what using PaddlePaddle looks like, let's start with a fundamental learning problem - `simple linear regression <https://en.wikipedia.org/wiki/Simple_linear_regression>`_: you have observed a set of two-dimensional data points of ``X`` and ``Y``, where ``X`` is an explanatory variable and ``Y`` is corresponding dependent variable, and you want to recover the underlying correlation between ``X`` and ``Y``. Linear regression can be used in many practical scenarios. For example, ``X`` can be a variable about house size, and ``Y`` a variable about house price. You can build a model that captures relationship between them by observing real estate markets.

-## 2. Prepare the Data
+2. Prepare the Data
+--------------------

-Suppose the true relationship can be characterized as `Y = 2X + 0.3`, let's see how to recover this pattern only from observed data. Here is a piece of python code that feeds synthetic data to PaddlePaddle. The code is pretty self-explanatory, the only extra thing you need to add for PaddlePaddle is a definition of input data types.
+Suppose the true relationship can be characterized as ``Y = 2X + 0.3``, let's see how to recover this pattern only from observed data. Here is a piece of python code that feeds synthetic data to PaddlePaddle. The code is pretty self-explanatory, the only extra thing you need to add for PaddlePaddle is a definition of input data types.

-```python
-# dataprovider.py
-from paddle.trainer.PyDataProvider2 import *
-import random
-
-# define data types of input: 2 real numbers
-@provider(input_types=[dense_vector(1), dense_vector(1)],use_seq=False)
-def process(settings, input_file):
+.. code-block:: python
+
+    # dataprovider.py
+    from paddle.trainer.PyDataProvider2 import *
+    import random
+
+    # define data types of input: 2 real numbers
+    @provider(input_types=[dense_vector(1), dense_vector(1)],use_seq=False)
+    def process(settings, input_file):
         for i in xrange(2000):
             x = random.random()
             yield [x], [2*x+0.3]
-```

-## 3. Train a NeuralNetwork in PaddlePaddle
+3. Train a NeuralNetwork
+-------------------------

-To recover this relationship between `X` and `Y`, we use a neural network with one layer of linear activation units and a square error cost layer. Don't worry if you are not familiar with these terminologies, it's just saying that we are starting from a random line `Y' = wX + b` , then we gradually adapt `w` and `b` to minimize the difference between `Y'` and `Y`. Here is what it looks like in PaddlePaddle:
+To recover this relationship between ``X`` and ``Y``, we use a neural network with one layer of linear activation units and a square error cost layer. Don't worry if you are not familiar with these terminologies, it's just saying that we are starting from a random line ``Y' = wX + b`` , then we gradually adapt ``w`` and ``b`` to minimize the difference between ``Y'`` and ``Y``. Here is what it looks like in PaddlePaddle:

-```python
-# trainer_config.py
-from paddle.trainer_config_helpers import *
+.. code-block:: python
+
+    # trainer_config.py
+    from paddle.trainer_config_helpers import *

     # 1. read data. Suppose you saved above python code as dataprovider.py
     data_file = 'empty.list'
     with open(data_file, 'w') as f: f.writelines(' ')
     define_py_data_sources2(train_list=data_file, test_list=None,
             module='dataprovider', obj='process',args={})

     # 2. learning algorithm
     settings(batch_size=12, learning_rate=1e-3, learning_method=MomentumOptimizer())

     # 3. Network configuration
     x = data_layer(name='x', size=1)
     y = data_layer(name='y', size=1)
     y_predict = fc_layer(input=x, param_attr=ParamAttr(name='w'), size=1, act=LinearActivation(), bias_attr=ParamAttr(name='b'))
     cost = regression_cost(input=y_predict, label=y)
     outputs(cost)
-```

 Some of the most fundamental usages of PaddlePaddle are demonstrated:
@@ -55,46 +59,51 @@ Some of the most fundamental usages of PaddlePaddle are demonstrated:
 - The second part describes learning algorithm. It defines in what ways adjustments are made to model parameters. PaddlePaddle provides a rich set of optimizers, but a simple momentum based optimizer will suffice here, and it processes 12 data points each time.
 - Finally, the network configuration. It usually is as simple as "stacking" layers. Three kinds of layers are used in this configuration:
-   - **Data Layer**: a network always starts with one or more data layers. They provide input data to the rest of the network. In this problem, two data layers are used respectively for `X` and `Y`.
+   - **Data Layer**: a network always starts with one or more data layers. They provide input data to the rest of the network. In this problem, two data layers are used respectively for ``X`` and ``Y``.
    - **FC Layer**: FC layer is short for Fully Connected Layer, which connects all the input units to current layer and does the actual computation specified as activation function. Computation layers like this are the fundamental building blocks of a deeper model.
    - **Cost Layer**: in training phase, cost layers are usually the last layers of the network. They measure the performance of current model, and provide guidence to adjust parameters.

 Now that everything is ready, you can train the network with a simple command line call:
-```
+.. code-block:: bash
+
     paddle train --config=trainer_config.py --save_dir=./output --num_passes=30
-```

-This means that PaddlePaddle will train this network on the synthectic dataset for 30 passes, and save all the models under path `./output`. You will see from the messages printed out during training phase that the model cost is decreasing as time goes by, which indicates we are getting a closer guess.
+This means that PaddlePaddle will train this network on the synthectic dataset for 30 passes, and save all the models under path ``./output``. You will see from the messages printed out during training phase that the model cost is decreasing as time goes by, which indicates we are getting a closer guess.

-## 4. Evaluate the Model
+4. Evaluate the Model
+-----------------------

-Usually, a different dataset that left out during training phase should be used to evalute the models. However, we are lucky enough to know the real answer: `w=2, b=0.3`, thus a better option is to check out model parameters directly.
+Usually, a different dataset that left out during training phase should be used to evalute the models. However, we are lucky enough to know the real answer: ``w=2, b=0.3``, thus a better option is to check out model parameters directly.

-In PaddlePaddle, training is just to get a collection of model parameters, which are `w` and `b` in this case. Each parameter is saved in an individual file in the popular `numpy` array format. Here is the code that reads parameters from last pass.
+In PaddlePaddle, training is just to get a collection of model parameters, which are ``w`` and ``b`` in this case. Each parameter is saved in an individual file in the popular ``numpy`` array format. Here is the code that reads parameters from last pass.

-```python
-import numpy as np
-import os
+.. code-block:: python
+
+    import numpy as np
+    import os

     def load(file_name):
         with open(file_name, 'rb') as f:
             f.read(16) # skip header for float type.
             return np.fromfile(f, dtype=np.float32)

     print 'w=%.6f, b=%.6f' % (load('output/pass-00029/w'), load('output/pass-00029/b'))
     # w=1.999743, b=0.300137
-```

-<center> ![](./parameters.png) </center>
+.. image:: parameters.png
+    :align: center

-Although starts from a random guess, you can see that value of `w` changes quickly towards 2 and `b` changes quickly towards 0.3. In the end, the predicted line is almost identical with real answer.
+Although starts from a random guess, you can see that value of ``w`` changes quickly towards 2 and ``b`` changes quickly towards 0.3. In the end, the predicted line is almost identical with real answer.

-There, you have recovered the underlying pattern between `X` and `Y` only from observed data.
+There, you have recovered the underlying pattern between ``X`` and ``Y`` only from observed data.

-## 5. Where to Go from Here
+5. Where to Go from Here
+-------------------------

-- <a href="../build/index.html"> Build and Installation </a>
-- <a href="../demo/quick_start/index_en.html">Quick Start</a>
-- <a href="../demo/index.html">Example and Demo</a>
+- `Install and Build <../build_and_install/index.html>`_
+- `Tutorials <../demo/quick_start/index_en.html>`_
+- `Example and Demo <../demo/index.html>`_
@@ -95,7 +95,7 @@ As a simple example, consider the following:
 ```bash
 # necessary
 sudo apt-get update
-sudo apt-get install -y g++ make cmake build-essential libatlas-base-dev python python-pip libpython-dev m4 libprotobuf-dev protobuf-compiler python-protobuf python-numpy git
+sudo apt-get install -y g++ make cmake swig build-essential libatlas-base-dev python python-pip libpython-dev m4 libprotobuf-dev protobuf-compiler python-protobuf python-numpy git
 # optional
 sudo apt-get install libgoogle-glog-dev
 sudo apt-get install libgflags-dev
@@ -149,15 +149,15 @@ If still not found, you can manually set it based on CMake error information
 As a simple example, consider the following:

-- **Only CPU**
+- **Only CPU with swig**

   ```bash
-  cmake .. -DWITH_GPU=OFF
+  cmake .. -DWITH_GPU=OFF -DWITH_SWIG_PY=ON
   ```
-- **GPU**
+- **GPU with swig**

   ```bash
-  cmake .. -DWITH_GPU=ON
+  cmake .. -DWITH_GPU=ON -DWITH_SWIG_PY=ON
   ```
 - **GPU with doc and swig**
@@ -170,15 +170,13 @@ Finally, you can build PaddlePaddle:
 ```bash
 # you can add build option here, such as:
-cmake .. -DWITH_GPU=ON -DCMAKE_INSTALL_PREFIX=<path to install>
+cmake .. -DWITH_GPU=ON -DCMAKE_INSTALL_PREFIX=<path to install> -DWITH_SWIG_PY=ON
 # please use sudo make install, if you want to install PaddlePaddle into the system
 make -j `nproc` && make install
 # set PaddlePaddle installation path in ~/.bashrc
 export PATH=<path to install>/bin:$PATH
 ```

-**Note:**
 If you set `WITH_SWIG_PY=ON`, related python dependencies also need to be installed.
 Otherwise, PaddlePaddle will automatically install python dependencies
 at first time when user run paddle commands, such as `paddle version`, `paddle train`.
......
@@ -8,8 +8,6 @@ Install PaddlePaddle
   :maxdepth: 1
   :glob:

-  install_*
-  internal/install_from_jumbo.md
   docker_install.rst
   ubuntu_install.rst
@@ -25,4 +23,3 @@ Build from Source
   :glob:

   build_from_source.md
\ No newline at end of file
-  contribute_to_paddle.md
GET STARTED
============
.. toctree::
:maxdepth: 2
build_and_install/index.rst
basic_usage/basic_usage.rst
-# Distributed Training
+# How to Run Distributed Training

 In this article, we explain how to run distributed Paddle training jobs on clusters. We will create the distributed version of the single-process training example, [recommendation](https://github.com/baidu/Paddle/tree/develop/demo/recommendation).

@@ -9,7 +9,7 @@ In this article, we explain how to run distributed Paddle training jobs on clusters.
 1. Aforementioned scripts use a Python library [fabric](http://www.fabfile.org/) to run SSH commands. We can use `pip` to install fabric:

    ```bash
    pip install fabric
    ```

 1. We need to install PaddlePaddle on all nodes in the cluster. To enable GPUs, we need to install CUDA in `/usr/local/cuda`; otherwise Paddle would report errors at runtime.
......
# How to Set Command-line Parameters
* [Use Case](use_case.md)
* [Arguments](arguments.md)
* [Detailed Descriptions](detail_introduction.md)
-# Contribute Code
+# How to Contribute Code

 We sincerely appreciate your contributions. You can use fork and pull request
 workflow to merge your code.
......
-Algorithm Tutorial
-==================
+How to Configure Deep Models
+============================

 .. toctree::
   :maxdepth: 1
......
@@ -42,7 +42,7 @@ Simple Gated Recurrent Neural Network
 Recurrent neural network process a sequence at each time step sequentially. An example of the architecture of LSTM is listed below.

-.. image:: ./bi_lstm.jpg
+.. image:: ../../../tutorials/sentiment_analysis/bi_lstm.jpg
    :align: center

 Generally speaking, a recurrent network perform the following operations from :math:`t=1` to :math:`t=T`, or reversely from :math:`t=T` to :math:`t=1`.
@@ -101,7 +101,7 @@ Sequence to Sequence Model with Attention
 -----------------------------------------
 We will use the sequence to sequence model with attention as an example to demonstrate how you can configure complex recurrent neural network models. An illustration of the sequence to sequence model with attention is shown in the following figure.

-.. image:: ./encoder-decoder-attention-model.png
+.. image:: ../../../tutorials/text_generation/encoder-decoder-attention-model.png
    :align: center

 In this model, the source sequence :math:`S = \{s_1, \dots, s_T\}` is encoded with a bidirectional gated recurrent neural networks. The hidden states of the bidirectional gated recurrent neural network :math:`H_S = \{H_1, \dots, H_T\}` is called *encoder vector* The decoder is a gated recurrent neural network. When decoding each token :math:`y_t`, the gated recurrent neural network generates a set of weights :math:`W_S^t = \{W_1^t, \dots, W_T^t\}`, which are used to compute a weighted sum of the encoder vector. The weighted sum of the encoder vector is utilized to condition the generation of the token :math:`y_t`.
......
HOW TO
=======
Usage
-------
.. toctree::
:maxdepth: 1
cmd_parameter/index.md
deep_model/index.rst
cluster/cluster_train.md
Development
------------
.. toctree::
:maxdepth: 1
new_layer/index.rst
contribute_to_paddle.md
Optimization
-------------
.. toctree::
:maxdepth: 1
optimization/index.rst
-==================
-Writing New Layers
-==================
+=======================
+How to Write New Layers
+=======================

 This tutorial will guide you to write customized layers in PaddlePaddle. We will utilize fully connected layer as an example to guide you through the following steps for writing a new layer.
......
Profiling on PaddlePaddle
=========================
This tutorial will guide you step-by-step through how to conduct profiling and performance tuning using the built-in timer, **nvprof** and **nvvp**.
- What's profiling?
- Why do we need profiling?
- How to do profiling?
- Profiler tools
- Hands-on approach
- Profiling tips
What's profiling?
=================
In software engineering, profiling is a form of dynamic program analysis that measures the space (memory) or time
complexity of a program, the usage of particular instructions, or the frequency and duration of function calls.
Most commonly, profiling information serves to aid program optimization.
Briefly, a profiler is used to measure application performance. Program analysis tools are extremely important for
understanding program behavior. Simple profiling can tell you how long an operation takes; advanced
profiling can explain why an operation takes a long time.
Why do we need profiling?
=========================
Since training a deep neural network typically takes a very long time, performance has gradually become
the most important concern in the deep learning field. The first step to improving performance is to understand which parts
are slow. There is no point in improving the performance of a region which doesn't take much time!
How to do profiling?
====================
To achieve maximum performance, there are five steps you can take to reach your goals.
- Profile the code
- Find the slow parts
- Work out why they’re slow
- Make them fast
- Profile the code again
Usually, a processor has two key performance limits: floating point throughput and
memory throughput. A GPU also needs enough parallelism to fulfill its potential.
This is why GPUs can be so fast.
Profiler Tools
==============
For general GPU profiling, a number of tools are provided by both NVIDIA and third parties.
**nvprof** is the NVIDIA command-line profiler and **nvvp** is the (GUI-based) NVIDIA visual profiler.
In this tutorial, we will focus on nvprof and nvvp.
:code:`test_GpuProfiler` from the :code:`paddle/math/tests` directory will be used to evaluate
the profilers above.
.. literalinclude:: ../../paddle/math/tests/test_GpuProfiler.cpp
:language: c++
:lines: 111-124
:linenos:
The above code snippet includes two methods; you can use either of them to profile the regions of interest.

1. :code:`REGISTER_TIMER_INFO` is a built-in timer wrapper which can calculate the time overhead of both cpu functions and cuda kernels.
2. :code:`REGISTER_GPU_PROFILER` is a general purpose wrapper object of :code:`cudaProfilerStart` and :code:`cudaProfilerStop` to avoid
   program crashes when the CPU version of PaddlePaddle invokes them.

You can find more details about how to use both of them in the next section.
Hands-on Approach
=================
Built-in Timer
--------------
To enable the built-in timer in PaddlePaddle, first you have to add :code:`REGISTER_TIMER_INFO` into the regions of your interest.
Then, all timing information can be printed to the console via the :code:`printStatus` or :code:`printAllStatus` function.
As a simple example, consider the following:
1. Add :code:`REGISTER_TIMER_INFO` and :code:`printAllStatus` functions (see the emphasize-lines).
.. literalinclude:: ../../paddle/math/tests/test_GpuProfiler.cpp
:language: c++
:lines: 111-124
:emphasize-lines: 8-10,13
:linenos:
2. Configure cmake with **WITH_TIMER** and recompile PaddlePaddle.
.. code-block:: bash
cmake .. -DWITH_TIMER=ON
make
3. Execute your code and observe the results (see the emphasize-lines).
.. code-block:: bash
:emphasize-lines: 1,12-15
> ./paddle/math/tests/test_GpuProfiler
I1117 11:13:42.313065 2522362816 Util.cpp:155] commandline: ./paddle/math/tests/test_GpuProfiler
I1117 11:13:42.845065 2522362816 Util.cpp:130] Calling runInitFunctions
I1117 11:13:42.845208 2522362816 Util.cpp:143] Call runInitFunctions done.
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Profiler
[ RUN ] Profiler.BilinearFwdBwd
I1117 11:13:42.845310 2522362816 test_GpuProfiler.cpp:114] Enable GPU Profiler Stat: [testBilinearFwdBwd] "numSamples = 10, channels = 16, im
gSizeX = 64, imgSizeY = 64"
I1117 11:13:42.850154 2522362816 ThreadLocal.cpp:37] thread use undeterministic rand seed:20659751
I1117 11:13:42.981501 2522362816 Stat.cpp:130] ======= StatSet: [GlobalStatInfo] status ======
I1117 11:13:42.981539 2522362816 Stat.cpp:133] Stat=testBilinearFwdBwd total=136.141 avg=136.141 max=136.141 min=136.141 count=1
I1117 11:13:42.981572 2522362816 Stat.cpp:141] ======= BarrierStatSet status ======
I1117 11:13:42.981575 2522362816 Stat.cpp:154] --------------------------------------------------
[ OK ] Profiler.BilinearFwdBwd (136 ms)
[----------] 1 test from Profiler (136 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (136 ms total)
[ PASSED ] 1 test.
nvprof profiler
---------------
To use the command-line profiler **nvprof**, you can simply follow these steps:
1. Add :code:`REGISTER_GPU_PROFILER` function (see the emphasize-lines).
.. literalinclude:: ../../paddle/math/tests/test_GpuProfiler.cpp
:language: c++
:lines: 111-124
:emphasize-lines: 6-7
:linenos:
2. Configure cmake with **WITH_PROFILER** and recompile PaddlePaddle.
.. code-block:: bash
cmake .. -DWITH_PROFILER=ON
make
3. Use Nvidia profiler **nvprof** to profile the binary.
.. code-block:: bash
nvprof ./paddle/math/tests/test_GpuProfiler
Then, you can get the following profiling result:
.. code-block:: bash
==78544== Profiling application: ./paddle/math/tests/test_GpuProfiler
==78544== Profiling result:
Time(%) Time Calls Avg Min Max Name
27.60% 9.6305ms 5 1.9261ms 3.4560us 6.4035ms [CUDA memcpy HtoD]
26.07% 9.0957ms 1 9.0957ms 9.0957ms 9.0957ms KeBilinearInterpBw
23.78% 8.2977ms 1 8.2977ms 8.2977ms 8.2977ms KeBilinearInterpFw
22.55% 7.8661ms 2 3.9330ms 1.5798ms 6.2863ms [CUDA memcpy DtoH]
==78544== API calls:
Time(%) Time Calls Avg Min Max Name
46.85% 682.28ms 8 85.285ms 12.639us 682.03ms cudaStreamCreateWithFlags
39.83% 580.00ms 4 145.00ms 302ns 550.27ms cudaFree
9.82% 143.03ms 9 15.892ms 8.7090us 142.78ms cudaStreamCreate
1.23% 17.983ms 7 2.5690ms 23.210us 6.4563ms cudaMemcpy
1.23% 17.849ms 2 8.9247ms 8.4726ms 9.3768ms cudaStreamSynchronize
0.66% 9.5969ms 7 1.3710ms 288.43us 2.4279ms cudaHostAlloc
0.13% 1.9530ms 11 177.54us 7.6810us 591.06us cudaMalloc
0.07% 1.0424ms 8 130.30us 1.6970us 453.72us cudaGetDevice
0.04% 527.90us 40 13.197us 525ns 253.99us cudaEventCreateWithFlags
0.03% 435.73us 348 1.2520us 124ns 42.704us cuDeviceGetAttribute
0.03% 419.36us 1 419.36us 419.36us 419.36us cudaGetDeviceCount
0.02% 260.75us 2 130.38us 129.32us 131.43us cudaGetDeviceProperties
0.02% 222.32us 2 111.16us 106.94us 115.39us cudaLaunch
0.01% 214.06us 4 53.514us 28.586us 77.655us cuDeviceGetName
0.01% 115.45us 4 28.861us 9.8250us 44.526us cuDeviceTotalMem
0.01% 83.988us 4 20.997us 578ns 77.760us cudaSetDevice
0.00% 38.918us 1 38.918us 38.918us 38.918us cudaEventCreate
0.00% 34.573us 31 1.1150us 279ns 12.784us cudaDeviceGetAttribute
0.00% 17.767us 1 17.767us 17.767us 17.767us cudaProfilerStart
0.00% 15.228us 2 7.6140us 3.5460us 11.682us cudaConfigureCall
0.00% 14.536us 2 7.2680us 1.1490us 13.387us cudaGetLastError
0.00% 8.6080us 26 331ns 173ns 783ns cudaSetupArgument
0.00% 5.5470us 6 924ns 215ns 2.6780us cuDeviceGet
0.00% 5.4090us 6 901ns 328ns 3.3320us cuDeviceGetCount
0.00% 4.1770us 3 1.3920us 1.0630us 1.8300us cuDriverGetVersion
0.00% 3.4650us 3 1.1550us 1.0810us 1.2680us cuInit
0.00% 830ns 1 830ns 830ns 830ns cudaRuntimeGetVersion
nvvp profiler
-------------
For the visual profiler **nvvp**, you can either import the output of :code:`nvprof -o ...` or
run the application through the GUI.

**Note: nvvp also supports CPU profiling** (click the box in nvvp to enable profiling execution on the CPU).
.. image:: nvvp1.png
:align: center
:scale: 33%
From the perspective of kernel functions, **nvvp** can even illustrate why an operation takes a long time.
As shown in the following figure, a kernel's block usage, register usage and shared memory usage from :code:`nvvp`
allow us to fully utilize all warps on the GPU.
.. image:: nvvp2.png
:align: center
:scale: 33%
From the perspective of the application, **nvvp** can give you some suggestions for addressing performance bottlenecks.
For instance, the advice on data movement and compute utilization in the figures below can guide you in tuning performance.
.. image:: nvvp3.png
:align: center
:scale: 33%
.. image:: nvvp4.png
:align: center
:scale: 33%
Profiling tips
==============
- The **nvprof** and **nvvp** output is a very good place to start.
- The timeline is a good place to go next.
- Only dig deep into a kernel if it’s taking a significant amount of your time.
- Where possible, try to match profiler output with theory.
  1) For example, if I know I'm moving 1GB, and my kernel takes 10ms, I expect the profiler to report 100GB/s (see the quick check after this list).
2) Discrepancies are likely to mean your application isn’t doing what you thought it was.
- Know your hardware: If your GPU can do 6 TFLOPs, and you’re already doing 5.5 TFLOPs, you won’t go much faster!
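As a quick sanity check of the arithmetic in tip 1), here is a minimal sketch (the numbers are the ones from the example above):

.. code-block:: python

    bytes_moved = 1 * 1024 ** 3   # 1 GB moved by the kernel
    kernel_time = 10e-3           # 10 ms of measured runtime
    effective_bw = bytes_moved / kernel_time / 1024 ** 3
    print '%.0f GB/s' % effective_bw   # ~100 GB/s; a large gap from this figure needs explaining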
Profiling is a key step in optimization. Sometimes quite simple changes can lead to big improvements in performance.
Your mileage may vary!
Reference
=========
Jeremy Appleyard, `GPU Profiling for Deep Learning <http://www.robots.ox.ac.uk/~seminars/seminars/Extra/2015_10_08_JeremyAppleyard.pdf>`_, 2015
How to Tune GPU Performance
===========================
.. toctree::
:maxdepth: 3
gpu_profiling.rst
@@ -4,7 +4,9 @@ PaddlePaddle Documentation
 .. toctree::
   :maxdepth: 1

-  introduction/index.md
-  user_guide.rst
-  dev/index.rst
-  algorithm/index.rst
+  getstarted/index.rst
+  tutorials/index.md
+  howto/index.rst
+  api/index.rst
+  about/index.rst
\ No newline at end of file
../../doc_cn/introduction/parameters.png
\ No newline at end of file
(Diffs for 32 more changed files are collapsed.)