From e70839a0e42029f55178200ec9bfa4e3c7f839f1 Mon Sep 17 00:00:00 2001
From: qingqing01
Date: Mon, 1 Jul 2019 13:08:32 +0800
Subject: [PATCH] Update README.md (#2628)

* Update README.md and GETTING_STARTED.md for PaddleDetection.
---
 README.md               | 20 ++++++++++++--------
 docs/GETTING_STARTED.md | 11 ++++++++---
 2 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/README.md b/README.md
index 8fa34d76a..6777205de 100644
--- a/README.md
+++ b/README.md
@@ -13,23 +13,26 @@ flexible, catering to research needs.
 
 ## Introduction
 
-Design Principles:
+Features:
 
 - Production Ready:
-Key operations are implemented in C++ and CUDA, together with PaddlePaddle's
+
+  Key operations are implemented in C++ and CUDA, together with PaddlePaddle's
 highly efficient inference engine, enables easy deployment in server
 environments.
 
 - Highly Flexible:
-Components are designed to be modular. Model architectures, as well as data
+
+  Components are designed to be modular. Model architectures, as well as data
 preprocess pipelines, can be easily customized with simple configuration
 changes.
 
 - Performance Optimized:
-With the help of the underlying PaddlePaddle framework, faster training and
+
+  With the help of the underlying PaddlePaddle framework, faster training and
 reduced GPU memory footprint is achieved. Notably, Yolo V3 training is
 much faster compared to other frameworks. Another example is Mask-RCNN
-(ResNet50), we managed to fit up to 5 images per GPU (V100 16GB) during
-training.
+(ResNet50), we managed to fit up to 4 images per GPU (Tesla V100 16GB) during
+multi-GPU training.
 
 Supported Architectures:
 
@@ -44,7 +47,7 @@ Supported Architectures:
 | Yolov3 | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ |
 | SSD    | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ |
 
-[1] ResNet-vd models offer much improved accuracy with negligible performance cost.
+[1] [ResNet-vd](https://arxiv.org/pdf/1812.01187) models offer much improved accuracy with negligible performance cost.
 
 Advanced Features:
 
@@ -67,7 +70,7 @@ Please follow the [installation guide](docs/INSTALL.md).
 ## Get Started
 
 For inference, simply run the following command and the visualized result will
-be saved in `output/`.
+be saved in `output`.
 
 ```bash
 export PYTHONPATH=`pwd`:$PYTHONPATH
@@ -102,6 +105,7 @@ Some of the planned features include:
 ## Updates
 
 #### Initial release (7/3/2019)
+
 - Initial release of PaddleDetection and detection model zoo
 - Models included: Faster R-CNN, Mask R-CNN, Faster R-CNN+FPN, Mask R-CNN+FPN, Cascade-Faster-RCNN+FPN, RetinaNet, Yolo v3, and SSD.
diff --git a/docs/GETTING_STARTED.md b/docs/GETTING_STARTED.md
index b610f5a68..1612500b2 100644
--- a/docs/GETTING_STARTED.md
+++ b/docs/GETTING_STARTED.md
@@ -75,8 +75,13 @@ path, simply add a `--save_file=` flag.
 
 ## FAQ
 
+**Q:** Why do I get `NaN` loss values during single GPU training?<br>
+**A:** The default learning rate is tuned for multi-GPU training (8x GPUs), so it must
+be adapted for single GPU training accordingly (e.g., divide by 8).
-Q: Why do I get `NaN` loss values during single GPU training?
-A: The default learning rate is tuned to multi-GPU training (8x GPUs), it must
-be adapted for single GPU training accordingly (e.g., divide by 8).
+
+**Q:** How to reduce GPU memory usage?<br>
+**A:** Setting the environment variable `FLAGS_conv_workspace_size_limit` to a smaller
+value can reduce the GPU memory footprint without affecting training speed.
+Taking Mask-RCNN (R50) as an example, setting `export FLAGS_conv_workspace_size_limit=512`
+allows the batch size to reach 4 images per GPU (Tesla V100 16GB).
-- 
GitLab
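For reference, the two FAQ tips added by this patch can be combined in a single-GPU launch. The sketch below is illustrative only: the `tools/train.py` entry point, the config file name, and the `-o LearningRate.base_lr=...` override are assumptions about the PaddleDetection CLI and are not part of this patch; only the `FLAGS_conv_workspace_size_limit=512` setting and the divide-the-learning-rate-by-8 rule come from the FAQ text above.

```bash
# Illustrative single-GPU setup based on the FAQ entries above.
export PYTHONPATH=`pwd`:$PYTHONPATH

# FAQ tip: cap the conv workspace to lower the GPU memory footprint
# (value taken from the Mask-RCNN example in the FAQ).
export FLAGS_conv_workspace_size_limit=512

# Train on a single GPU. The script name, config path, and override flag are
# assumptions for illustration; the default learning rate is tuned for 8 GPUs,
# so divide it by 8 for one GPU (e.g., 0.01 -> 0.00125).
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/mask_rcnn_r50_fpn_1x.yml \
    -o LearningRate.base_lr=0.00125
```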