提交 3d2c0e2f 编写于 作者: G gineshidalgo99

CPU rendering added

上级 510f34c4
### Posting rules
1. **No duplicated** posts.
2. **No** posts about **questions already answered in** the **documentation** (e.g. **no more low-speed nor out-of-memory questions**).
3. **Add** the **system configuration (all of it!), command and output** if you have some kind of error or performance question.
4. Set a **proper issue title**: add the Ubuntu/Windows word and be specific (e.g. do not simply call it: `Compile error`).
5. Only English comments.
Issues/comments that do not follow this will be **ignored or removed** with no further clarification.
### Issue summary
### Executed command (if any)
### OpenPose output (if any)
### Type of issue
You might select multiple topics, delete the rest:
- Compilation/installation error
......@@ -16,6 +29,8 @@ You might select multiple topics, delete the rest:
- Enhancement / offering possible extensions / pull request / etc
- Other (type your own type)
### Your system configuration
**Operating system** (`lsb_release -a` on Ubuntu):
**CUDA version** (`cat /usr/local/cuda/version.txt` in most cases):
......
......@@ -6,6 +6,7 @@ OpenPose
- May 2017: Windows version released!
- Jun 2017: Face released!
- Check all the [release notes](doc/release_notes.md).
- Interested in an internship at CMU as an OpenPose programmer? See [this link](https://docs.google.com/document/d/14SygG39NjIRZfx08clewTdFMGwVdtRu2acyCi3TYcHs/edit?usp=sharing) for details.
......
doc/GUI_help/GUI_help.png

99.2 KB | W: | H:

doc/GUI_help/GUI_help.png

81.7 KB | W: | H:

doc/GUI_help/GUI_help.png
doc/GUI_help/GUI_help.png
doc/GUI_help/GUI_help.png
doc/GUI_help/GUI_help.png
  • 2-up
  • Swipe
  • Onion skin
此差异已折叠。
......@@ -44,13 +44,13 @@ Each flag is divided into flag name, default value, and description.
- DEFINE_int32(part_to_show, 0, "Part to show from the start.");
- DEFINE_bool(disable_blending, false, "If blending is enabled, it will merge the results with the original frame. If disabled, it will only display the results.");
8. OpenPose Rendering Pose
- DEFINE_bool(no_render_pose, false, "If false, it will fill both `outputData` and `cvOutputData` with the original image + desired part to be shown. If true, it will leave them empty.");
- DEFINE_double(alpha_pose, 0.6, "Blending factor (range 0-1) for the body part rendering. 1 will show it completely, 0 will hide it.");
- DEFINE_double(alpha_heatmap, 0.7, "Blending factor (range 0-1) between heatmap and original frame. 1 will only show the heatmap, 0 will only show the frame.");
- DEFINE_int32(render_pose, 1, "Set to 0 for no rendering, 1 for CPU rendering (slightly faster), and 2 for GPU rendering (slower but greater functionality, e.g. `alpha_X` flags). If rendering is enabled, it will render both `outputData` and `cvOutputData` with the original image and desired body part to be shown (i.e. keypoints, heat maps or PAFs).");
- DEFINE_double(alpha_pose, 0.6, "Blending factor (range 0-1) for the body part rendering. 1 will show it completely, 0 will hide it. Only valid for GPU rendering.");
- DEFINE_double(alpha_heatmap, 0.7, "Blending factor (range 0-1) between heatmap and original frame. 1 will only show the heatmap, 0 will only show the frame. Only valid for GPU rendering.");
9. OpenPose Rendering Face
- DEFINE_bool(no_render_face, false, "Analogous to `no_render_pose` but applied to the face keypoints and heat maps.");
- DEFINE_double(alpha_face, 0.6, "Blending factor (range 0-1) for the body part rendering. 1 will show it completely, 0 will hide it.");
- DEFINE_double(alpha_heatmap_face, 0.7, "Blending factor (range 0-1) between heatmap and original frame. 1 will only show the heatmap, 0 will only show the frame.");
- DEFINE_int32(render_face, -1, "Analogous to `render_pose` but applied to the face. Extra option: -1 to use the same configuration that `render_pose` is using.");
- DEFINE_double(alpha_face, 0.6, "Analogous to `alpha_pose` but applied to face.");
- DEFINE_double(alpha_heatmap_face, 0.7, "Analogous to `alpha_heatmap` but applied to face.");
10. Display
- DEFINE_bool(fullscreen, false, "Run in full-screen mode (press f during runtime to toggle).");
- DEFINE_bool(process_real_time, false, "Enable to keep the original source frame rate (e.g. for video). If the processing time is too long, it will skip frames. If it is too fast, it will slow it down.");
......@@ -60,11 +60,11 @@ Each flag is divided into flag name, default value, and description.
- DEFINE_string(write_images, "", "Directory to write rendered frames in `write_images_format` image format.");
- DEFINE_string(write_images_format, "png", "File extension and format for `write_images`, e.g. png, jpg or bmp. Check the OpenCV function cv::imwrite for all compatible extensions.");
- DEFINE_string(write_video, "", "Full file path to write rendered frames in motion JPEG video format. It might fail if the final path does not finish in `.avi`. It internally uses cv::VideoWriter.");
- DEFINE_string(write_keypoint, "", "Directory to write the people body pose keypoint data. Desired format on `write_keypoint_format`.");
- DEFINE_string(write_keypoint_format, "yml", "File extension and format for `write_keypoint`: json, xml, yaml and yml. Json not available for OpenCV < 3.0, use `write_keypoint_json` instead.");
- DEFINE_string(write_keypoint_json, "", "Directory to write people pose data with *.json format, compatible with any OpenCV version.");
- DEFINE_string(write_keypoint, "", "Directory to write the people pose keypoint data. Format with `write_keypoint_format`.");
- DEFINE_string(write_keypoint_format, "yml", "File extension and format for `write_keypoint`: json, xml, yaml & yml. Json not available for OpenCV < 3.0, use `write_keypoint_json` instead.");
- DEFINE_string(write_keypoint_json, "", "Directory to write people pose data in *.json format, compatible with any OpenCV version.");
- DEFINE_string(write_coco_json, "", "Full file path to write people pose data with *.json COCO validation format.");
- DEFINE_string(write_heatmaps, "", "Directory to write heatmaps with *.png format. At least 1 `add_heatmaps_X` flag must be enabled.");
- DEFINE_string(write_heatmaps, "", "Directory to write heatmaps in *.png format. At least 1 `add_heatmaps_X` flag must be enabled.");
- DEFINE_string(write_heatmaps_format, "png", "File extension and format for `write_heatmaps`, analogous to `write_images_format`. Recommended `png` or any compressed and lossless format.");
## Multiple Scales
......
......@@ -15,7 +15,7 @@ OpenPose Library - Release Notes
1. Main improvements:
1. Rendering max resolution from 720p to >32k images.
2. Highly improved documentation.
2. Functions or paremeters renamed:
2. Functions or parameters renamed:
1. Demo renamed from rtpose to openpose.
3. Main bugs fixed:
1. Demo uses exec instead of start, so it works with more OpenCV custom compiled versions.
......@@ -25,14 +25,14 @@ OpenPose Library - Release Notes
## OpenPose 1.0.0rc3
1. Main improvements:
1. Added face keypoint detection.
2. Added Windows 10 compatibily.
2. Added Windows 10 compatibility.
3. Auto-detection of the number of GPUs.
4. MPI visualization more similar to COCO one.
5. Rendering max resolution from 720p to >32k images.
6. GUI info adder working when the worker TDatum has more than 1 Datum.
7. It prints out the error description before throwing the exception (so that it is writen on the Windows cmd).
7. It prints out the error description before throwing the exception (so that it is written on the Windows cmd).
8. Highly improved documentation.
2. Functions or paremeters renamed:
2. Functions or parameters renamed:
1. Flag `write_pose` renamed as `write_keypoint` and it also applies to face and/or hands.
2. Flag `write_pose_json` renamed as `write_keypoint_json` and it also applies to face and/or hands.
3. Flag `write_pose_format` renamed as `write_keypoint_format` and it also applies to face and/or hands.
......@@ -46,11 +46,15 @@ OpenPose Library - Release Notes
## Current version (future OpenPose 1.0.0rc4)
1. Main improvements:
1. Check() functions give more feedback.
2. Improved documentation.
2. Functions or paremeters renamed:
1. `Datum::scaleRatios` to save the relative scale ratio when multi-scale.
1. Increased accuracy on multi-scale (added `Datum::scaleRatios` to save the relative scale ratio when multi-scale).
2. Increased speed ~3-5% by adding CPU rendering and setting it as default rendering.
3. Check() functions give more feedback.
4. WCocoJsonSaver finished and removed its 3599-image limit.
5. Improved documentation.
2. Functions or parameters renamed:
1. Render flags renamed in the demo in order to incorporate the CPU/GPU rendering.
3. Main bugs fixed:
1. Fixed bug in Array::getConstCvMat() if mVolume=0, now returning empty cv::Mat.
2. Fixed bug: `--process_real_time` threw error with webcam.
3. Fixed bug: Face not working with output resolution different to input.
3. Fixed bug: Face not working when input and output resolutions are different.
4. Fixed some bugs that prevented debug version to run.
此差异已折叠。
......@@ -22,21 +22,26 @@
// Note: This command will show you flags for other unnecessary 3rdparty files. Check only the flags for the OpenPose
// executable. E.g. for `openpose.bin`, look for `Flags from examples/openpose/openpose.cpp:`.
// Debugging
DEFINE_int32(logging_level, 3, "The logging level. Integer in the range [0, 255]. 0 will output any log() message, while 255 will not output any."
" Current OpenPose library messages are in the range 0-4: 1 for low priority messages and 4 for important ones.");
DEFINE_int32(logging_level, 3, "The logging level. Integer in the range [0, 255]. 0 will output any log() message, while"
" 255 will not output any. Current OpenPose library messages are in the range 0-4: 1 for"
" low priority messages and 4 for important ones.");
// Producer
DEFINE_string(image_path, "examples/media/COCO_val2014_000000000192.jpg", "Process the desired image.");
// OpenPose
DEFINE_string(model_pose, "COCO", "Model to be used (e.g. COCO, MPI, MPI_4_layers).");
DEFINE_string(model_folder, "models/", "Folder path (absolute or relative) where the models (pose, face, ...) are located.");
DEFINE_string(net_resolution, "656x368", "Multiples of 16. If it is increased, the accuracy usually increases. If it is decreased, the speed increases.");
DEFINE_string(resolution, "1280x720", "The image resolution (display). Use \"-1x-1\" to force the program to use the default images resolution.");
DEFINE_string(net_resolution, "656x368", "Multiples of 16. If it is increased, the accuracy usually increases. If it is decreased,"
" the speed increases.");
DEFINE_string(resolution, "1280x720", "The image resolution (display and output). Use \"-1x-1\" to force the program to use the"
" default images resolution.");
DEFINE_int32(num_gpu_start, 0, "GPU device start number.");
DEFINE_double(scale_gap, 0.3, "Scale gap between scales. No effect unless num_scales>1. Initial scale is always 1. If you want to change the initial scale, "
"you actually want to multiply the `net_resolution` by your desired initial scale.");
DEFINE_double(scale_gap, 0.3, "Scale gap between scales. No effect unless num_scales>1. Initial scale is always 1. If you"
" want to change the initial scale, you actually want to multiply the `net_resolution` by"
" your desired initial scale.");
DEFINE_int32(num_scales, 1, "Number of scales to average.");
// OpenPose Rendering
DEFINE_double(alpha_pose, 0.6, "Blending factor (range 0-1) for the body part rendering. 1 will show it completely, 0 will hide it.");
DEFINE_double(alpha_pose, 0.6, "Blending factor (range 0-1) for the body part rendering. 1 will show it completely, 0 will"
" hide it. Only valid for GPU rendering.");
op::PoseModel gflagToPoseModel(const std::string& poseModeString)
{
......@@ -61,11 +66,13 @@ std::tuple<op::Point<int>, op::Point<int>, op::Point<int>, op::PoseModel> gflags
// outputSize
op::Point<int> outputSize;
auto nRead = sscanf(FLAGS_resolution.c_str(), "%dx%d", &outputSize.x, &outputSize.y);
op::checkE(nRead, 2, "Error, resolution format (" + FLAGS_resolution + ") invalid, should be e.g., 960x540 ", __LINE__, __FUNCTION__, __FILE__);
op::checkE(nRead, 2, "Error, resolution format (" + FLAGS_resolution + ") invalid, should be e.g., 960x540 ",
__LINE__, __FUNCTION__, __FILE__);
// netInputSize
op::Point<int> netInputSize;
nRead = sscanf(FLAGS_net_resolution.c_str(), "%dx%d", &netInputSize.x, &netInputSize.y);
op::checkE(nRead, 2, "Error, net resolution format (" + FLAGS_net_resolution + ") invalid, should be e.g., 656x368 (multiples of 16)", __LINE__, __FUNCTION__, __FILE__);
op::checkE(nRead, 2, "Error, net resolution format (" + FLAGS_net_resolution + ") invalid, should be e.g., 656x368 (multiples of 16)",
__LINE__, __FUNCTION__, __FILE__);
// netOutputSize
const auto netOutputSize = netInputSize;
// poseModel
......
......@@ -22,23 +22,29 @@
// Note: This command will show you flags for other unnecessary 3rdparty files. Check only the flags for the OpenPose
// executable. E.g. for `openpose.bin`, look for `Flags from examples/openpose/openpose.cpp:`.
// Debugging
DEFINE_int32(logging_level, 3, "The logging level. Integer in the range [0, 255]. 0 will output any log() message, while 255 will not output any."
" Current OpenPose library messages are in the range 0-4: 1 for low priority messages and 4 for important ones.");
DEFINE_int32(logging_level, 3, "The logging level. Integer in the range [0, 255]. 0 will output any log() message, while"
" 255 will not output any. Current OpenPose library messages are in the range 0-4: 1 for"
" low priority messages and 4 for important ones.");
// Producer
DEFINE_string(image_path, "examples/media/COCO_val2014_000000000192.jpg", "Process the desired image.");
// OpenPose
DEFINE_string(model_pose, "COCO", "Model to be used (e.g. COCO, MPI, MPI_4_layers).");
DEFINE_string(model_folder, "models/", "Folder path (absolute or relative) where the models (pose, face, ...) are located.");
DEFINE_string(net_resolution, "656x368", "Multiples of 16. If it is increased, the accuracy usually increases. If it is decreased, the speed increases.");
DEFINE_string(resolution, "1280x720", "The image resolution (display). Use \"-1x-1\" to force the program to use the default images resolution.");
DEFINE_string(net_resolution, "656x368", "Multiples of 16. If it is increased, the accuracy usually increases. If it is decreased,"
" the speed increases.");
DEFINE_string(resolution, "1280x720", "The image resolution (display and output). Use \"-1x-1\" to force the program to use the"
" default images resolution.");
DEFINE_int32(num_gpu_start, 0, "GPU device start number.");
DEFINE_double(scale_gap, 0.3, "Scale gap between scales. No effect unless num_scales>1. Initial scale is always 1. If you want to change the initial scale, "
"you actually want to multiply the `net_resolution` by your desired initial scale.");
DEFINE_double(scale_gap, 0.3, "Scale gap between scales. No effect unless num_scales>1. Initial scale is always 1. If you"
" want to change the initial scale, you actually want to multiply the `net_resolution` by"
" your desired initial scale.");
DEFINE_int32(num_scales, 1, "Number of scales to average.");
// OpenPose Rendering
DEFINE_int32(part_to_show, 19, "Part to show from the start.");
DEFINE_double(alpha_pose, 0.6, "Blending factor (range 0-1) for the body part rendering. 1 will show it completely, 0 will hide it.");
DEFINE_double(alpha_heatmap, 0.7, "Blending factor (range 0-1) between heatmap and original frame. 1 will only show the heatmap, 0 will only show the frame.");
DEFINE_double(alpha_pose, 0.6, "Blending factor (range 0-1) for the body part rendering. 1 will show it completely, 0 will"
" hide it. Only valid for GPU rendering.");
DEFINE_double(alpha_heatmap, 0.7, "Blending factor (range 0-1) between heatmap and original frame. 1 will only show the"
" heatmap, 0 will only show the frame. Only valid for GPU rendering.");
op::PoseModel gflagToPoseModel(const std::string& poseModeString)
{
......@@ -63,11 +69,13 @@ std::tuple<op::Point<int>, op::Point<int>, op::Point<int>, op::PoseModel> gflags
// outputSize
op::Point<int> outputSize;
auto nRead = sscanf(FLAGS_resolution.c_str(), "%dx%d", &outputSize.x, &outputSize.y);
op::checkE(nRead, 2, "Error, resolution format (" + FLAGS_resolution + ") invalid, should be e.g., 960x540 ", __LINE__, __FUNCTION__, __FILE__);
op::checkE(nRead, 2, "Error, resolution format (" + FLAGS_resolution + ") invalid, should be e.g., 960x540 ",
__LINE__, __FUNCTION__, __FILE__);
// netInputSize
op::Point<int> netInputSize;
nRead = sscanf(FLAGS_net_resolution.c_str(), "%dx%d", &netInputSize.x, &netInputSize.y);
op::checkE(nRead, 2, "Error, net resolution format (" + FLAGS_net_resolution + ") invalid, should be e.g., 656x368 (multiples of 16)", __LINE__, __FUNCTION__, __FILE__);
op::checkE(nRead, 2, "Error, net resolution format (" + FLAGS_net_resolution + ") invalid, should be e.g., 656x368 (multiples of 16)",
__LINE__, __FUNCTION__, __FILE__);
// netOutputSize
const auto netOutputSize = netInputSize;
// poseModel
......@@ -100,8 +108,9 @@ int openPoseTutorialPose2()
// Step 3 - Initialize all required classes
op::CvMatToOpInput cvMatToOpInput{netInputSize, FLAGS_num_scales, (float)FLAGS_scale_gap};
op::CvMatToOpOutput cvMatToOpOutput{outputSize};
std::shared_ptr<op::PoseExtractor> poseExtractorPtr = std::make_shared<op::PoseExtractorCaffe>(netInputSize, netOutputSize, outputSize, FLAGS_num_scales,
poseModel, FLAGS_model_folder, FLAGS_num_gpu_start);
std::shared_ptr<op::PoseExtractor> poseExtractorPtr = std::make_shared<op::PoseExtractorCaffe>(netInputSize, netOutputSize, outputSize,
FLAGS_num_scales, poseModel,
FLAGS_model_folder, FLAGS_num_gpu_start);
op::PoseRenderer poseRenderer{netOutputSize, outputSize, poseModel, poseExtractorPtr, (float)FLAGS_alpha_pose, (float)FLAGS_alpha_heatmap};
poseRenderer.setElementToRender(FLAGS_part_to_show);
op::OpOutputToCvMat opOutputToCvMat{outputSize};
......
......@@ -21,19 +21,23 @@
// Note: This command will show you flags for other unnecessary 3rdparty files. Check only the flags for the OpenPose
// executable. E.g. for `openpose.bin`, look for `Flags from examples/openpose/openpose.cpp:`.
// Debugging
DEFINE_int32(logging_level, 3, "The logging level. Integer in the range [0, 255]. 0 will output any log() message, while 255 will not output any."
" Current OpenPose library messages are in the range 0-4: 1 for low priority messages and 4 for important ones.");
DEFINE_int32(logging_level, 3, "The logging level. Integer in the range [0, 255]. 0 will output any log() message, while"
" 255 will not output any. Current OpenPose library messages are in the range 0-4: 1 for"
" low priority messages and 4 for important ones.");
// Producer
DEFINE_int32(camera, 0, "The camera index for cv::VideoCapture. Integer in the range [0, 9].");
DEFINE_string(camera_resolution, "1280x720", "Size of the camera frames to ask for.");
DEFINE_string(video, "", "Use a video file instead of the camera. Use `examples/media/video.avi` for our default example video.");
DEFINE_string(image_dir, "", "Process a directory of images. Use `examples/media/` for our default example folder with 20 images.");
DEFINE_string(video, "", "Use a video file instead of the camera. Use `examples/media/video.avi` for our default"
" example video.");
DEFINE_string(image_dir, "", "Process a directory of images. Use `examples/media/` for our default example folder with 20"
" images.");
// OpenPose
DEFINE_string(resolution, "1280x720", "The image resolution (display). Use \"-1x-1\" to force the program to use the default images resolution.");
DEFINE_string(resolution, "1280x720", "The image resolution (display and output). Use \"-1x-1\" to force the program to use the"
" default images resolution.");
// Consumer
DEFINE_bool(fullscreen, false, "Run in full-screen mode (press f during runtime to toggle).");
DEFINE_bool(process_real_time, false, "Enable to keep the original source frame rate (e.g. for video). If the processing time is too long, it will skip frames. If it is"
" too fast, it will slow it down.");
DEFINE_bool(process_real_time, false, "Enable to keep the original source frame rate (e.g. for video). If the processing time is"
" too long, it will skip frames. If it is too fast, it will slow it down.");
// Determine type of frame source
op::ProducerType gflagsToProducerType(const std::string& imageDirectory, const std::string& videoPath, const int webcamIndex)
......@@ -56,7 +60,8 @@ op::ProducerType gflagsToProducerType(const std::string& imageDirectory, const s
return op::ProducerType::Webcam;
}
std::shared_ptr<op::Producer> gflagsToProducer(const std::string& imageDirectory, const std::string& videoPath, const int webcamIndex, const op::Point<int> webcamResolution)
std::shared_ptr<op::Producer> gflagsToProducer(const std::string& imageDirectory, const std::string& videoPath, const int webcamIndex,
const op::Point<int> webcamResolution)
{
op::log("", op::Priority::Low, __LINE__, __FUNCTION__, __FILE__);
const auto type = gflagsToProducerType(imageDirectory, videoPath, webcamIndex);
......@@ -81,11 +86,13 @@ std::tuple<op::Point<int>, op::Point<int>, std::shared_ptr<op::Producer>> gflags
// cameraFrameSize
op::Point<int> cameraFrameSize;
auto nRead = sscanf(FLAGS_camera_resolution.c_str(), "%dx%d", &cameraFrameSize.x, &cameraFrameSize.y);
op::checkE(nRead, 2, "Error, camera resolution format (" + FLAGS_camera_resolution + ") invalid, should be e.g., 1280x720", __LINE__, __FUNCTION__, __FILE__);
op::checkE(nRead, 2, "Error, camera resolution format (" + FLAGS_camera_resolution + ") invalid, should be e.g., 1280x720",
__LINE__, __FUNCTION__, __FILE__);
// outputSize
op::Point<int> outputSize;
nRead = sscanf(FLAGS_resolution.c_str(), "%dx%d", &outputSize.x, &outputSize.y);
op::checkE(nRead, 2, "Error, resolution format (" + FLAGS_resolution + ") invalid, should be e.g., 960x540 ", __LINE__, __FUNCTION__, __FILE__);
op::checkE(nRead, 2, "Error, camera resolution format (" + FLAGS_camera_resolution + ") invalid, should be e.g., 1280x720",
__LINE__, __FUNCTION__, __FILE__);
// producerType
const auto producerSharedPtr = gflagsToProducer(FLAGS_image_dir, FLAGS_video, FLAGS_camera, cameraFrameSize);
......@@ -121,7 +128,8 @@ int openPoseTutorialThread1()
if (producerSize.area() > 0)
outputSize = producerSize;
else
op::error("Output resolution = input resolution not valid for image reading (size might change between images).", __LINE__, __FUNCTION__, __FILE__);
op::error("Output resolution = input resolution not valid for image reading (size might change between images).",
__LINE__, __FUNCTION__, __FILE__);
}
// Step 4 - Setting thread workers && manager
typedef std::vector<op::Datum> TypedefDatumsNoPtr;
......@@ -138,7 +146,8 @@ int openPoseTutorialThread1()
// ------------------------- CONFIGURING THREADING -------------------------
// In this simple multi-thread example, we will do the following:
// 3 (virtual) queues: 0, 1, 2
// 1 real queue: 1. The first and last queue ids (in this case 0 and 2) are not actual queues, but the beginning and end of the processing sequence
// 1 real queue: 1. The first and last queue ids (in this case 0 and 2) are not actual queues, but the beginning and end of the processing
// sequence
// 2 threads: 0, 1
// wDatumProducer will generate frames (there is no real queue 0) and push them on queue 1
// wGui will pop frames from queue 1 and process them (there is no real queue 2)
......
......@@ -22,19 +22,23 @@
// Note: This command will show you flags for other unnecessary 3rdparty files. Check only the flags for the OpenPose
// executable. E.g. for `openpose.bin`, look for `Flags from examples/openpose/openpose.cpp:`.
// Debugging
DEFINE_int32(logging_level, 3, "The logging level. Integer in the range [0, 255]. 0 will output any log() message, while 255 will not output any."
" Current OpenPose library messages are in the range 0-4: 1 for low priority messages and 4 for important ones.");
DEFINE_int32(logging_level, 3, "The logging level. Integer in the range [0, 255]. 0 will output any log() message, while"
" 255 will not output any. Current OpenPose library messages are in the range 0-4: 1 for"
" low priority messages and 4 for important ones.");
// Producer
DEFINE_int32(camera, 0, "The camera index for cv::VideoCapture. Integer in the range [0, 9].");
DEFINE_string(camera_resolution, "1280x720", "Size of the camera frames to ask for.");
DEFINE_string(video, "", "Use a video file instead of the camera. Use `examples/media/video.avi` for our default example video.");
DEFINE_string(image_dir, "", "Process a directory of images. Use `examples/media/` for our default example folder with 20 images.");
DEFINE_string(video, "", "Use a video file instead of the camera. Use `examples/media/video.avi` for our default"
" example video.");
DEFINE_string(image_dir, "", "Process a directory of images. Use `examples/media/` for our default example folder with 20"
" images.");
// OpenPose
DEFINE_string(resolution, "1280x720", "The image resolution (display). Use \"-1x-1\" to force the program to use the default images resolution.");
DEFINE_string(resolution, "1280x720", "The image resolution (display and output). Use \"-1x-1\" to force the program to use the"
" default images resolution.");
// Consumer
DEFINE_bool(fullscreen, false, "Run in full-screen mode (press f during runtime to toggle).");
DEFINE_bool(process_real_time, false, "Enable to keep the original source frame rate (e.g. for video). If the processing time is too long, it will skip frames. If it is"
" too fast, it will slow it down.");
DEFINE_bool(process_real_time, false, "Enable to keep the original source frame rate (e.g. for video). If the processing time is"
" too long, it will skip frames. If it is too fast, it will slow it down.");
// This class can be implemented either as a template or as a simple class given
// that the user usually knows which kind of data he will move between the queues,
......@@ -90,7 +94,8 @@ op::ProducerType gflagsToProducerType(const std::string& imageDirectory, const s
return op::ProducerType::Webcam;
}
std::shared_ptr<op::Producer> gflagsToProducer(const std::string& imageDirectory, const std::string& videoPath, const int webcamIndex, const op::Point<int> webcamResolution)
std::shared_ptr<op::Producer> gflagsToProducer(const std::string& imageDirectory, const std::string& videoPath, const int webcamIndex,
const op::Point<int> webcamResolution)
{
op::log("", op::Priority::Low, __LINE__, __FUNCTION__, __FILE__);
const auto type = gflagsToProducerType(imageDirectory, videoPath, webcamIndex);
......@@ -115,11 +120,13 @@ std::tuple<op::Point<int>, op::Point<int>, std::shared_ptr<op::Producer>> gflags
// cameraFrameSize
op::Point<int> cameraFrameSize;
auto nRead = sscanf(FLAGS_camera_resolution.c_str(), "%dx%d", &cameraFrameSize.x, &cameraFrameSize.y);
op::checkE(nRead, 2, "Error, camera resolution format (" + FLAGS_camera_resolution + ") invalid, should be e.g., 1280x720", __LINE__, __FUNCTION__, __FILE__);
op::checkE(nRead, 2, "Error, camera resolution format (" + FLAGS_camera_resolution + ") invalid, should be e.g., 1280x720",
__LINE__, __FUNCTION__, __FILE__);
// outputSize
op::Point<int> outputSize;
nRead = sscanf(FLAGS_resolution.c_str(), "%dx%d", &outputSize.x, &outputSize.y);
op::checkE(nRead, 2, "Error, resolution format (" + FLAGS_resolution + ") invalid, should be e.g., 960x540 ", __LINE__, __FUNCTION__, __FILE__);
op::checkE(nRead, 2, "Error, camera resolution format (" + FLAGS_camera_resolution + ") invalid, should be e.g., 1280x720",
__LINE__, __FUNCTION__, __FILE__);
// producerType
const auto producerSharedPtr = gflagsToProducer(FLAGS_image_dir, FLAGS_video, FLAGS_camera, cameraFrameSize);
......@@ -155,7 +162,8 @@ int openPoseTutorialThread2()
if (producerSize.area() > 0)
outputSize = producerSize;
else
op::error("Output resolution = input resolution not valid for image reading (size might change between images).", __LINE__, __FUNCTION__, __FILE__);
op::error("Output resolution = input resolution not valid for image reading (size might change between images).",
__LINE__, __FUNCTION__, __FILE__);
}
// Step 4 - Setting thread workers && manager
typedef std::vector<op::Datum> TypedefDatumsNoPtr;
......@@ -174,7 +182,8 @@ int openPoseTutorialThread2()
// ------------------------- CONFIGURING THREADING -------------------------
// In this simple multi-thread example, we will do the following:
// 3 (virtual) queues: 0, 1, 2
// 1 real queue: 1. The first and last queue ids (in this case 0 and 2) are not actual queues, but the beginning and end of the processing sequence
// 1 real queue: 1. The first and last queue ids (in this case 0 and 2) are not actual queues, but the beginning and end of the processing
// sequence
// 2 threads: 0, 1
// wDatumProducer will generate frames (there is no real queue 0) and push them on queue 1
// wGui will pop frames from queue 1 and process them (there is no real queue 2)
......
......@@ -27,8 +27,9 @@
// Note: This command will show you flags for other unnecessary 3rdparty files. Check only the flags for the OpenPose
// executable. E.g. for `openpose.bin`, look for `Flags from examples/openpose/openpose.cpp:`.
// Debugging
DEFINE_int32(logging_level, 3, "The logging level. Integer in the range [0, 255]. 0 will output any log() message, while 255 will not output any."
" Current OpenPose library messages are in the range 0-4: 1 for low priority messages and 4 for important ones.");
DEFINE_int32(logging_level, 3, "The logging level. Integer in the range [0, 255]. 0 will output any log() message, while"
" 255 will not output any. Current OpenPose library messages are in the range 0-4: 1 for"
" low priority messages and 4 for important ones.");
// Producer
DEFINE_string(image_dir, "examples/media/", "Process a directory of images.");
// Consumer
......@@ -183,7 +184,8 @@ int openPoseTutorialThread3()
// ------------------------- CONFIGURING THREADING -------------------------
// In this simple multi-thread example, we will do the following:
// 3 (virtual) queues: 0, 1, 2
// 1 real queue: 1. The first and last queue ids (in this case 0 and 2) are not actual queues, but the beginning and end of the processing sequence
// 1 real queue: 1. The first and last queue ids (in this case 0 and 2) are not actual queues, but the beginning and end of the processing
// sequence
// 2 threads: 0, 1
// wUserInput will generate frames (there is no real queue 0) and push them on queue 1
// wGui will pop frames from queue 1 and process them (there is no real queue 2)
......
......@@ -27,8 +27,9 @@
// Note: This command will show you flags for other unnecessary 3rdparty files. Check only the flags for the OpenPose
// executable. E.g. for `openpose.bin`, look for `Flags from examples/openpose/openpose.cpp:`.
// Debugging
DEFINE_int32(logging_level, 3, "The logging level. Integer in the range [0, 255]. 0 will output any log() message, while 255 will not output any."
" Current OpenPose library messages are in the range 0-4: 1 for low priority messages and 4 for important ones.");
DEFINE_int32(logging_level, 3, "The logging level. Integer in the range [0, 255]. 0 will output any log() message, while"
" 255 will not output any. Current OpenPose library messages are in the range 0-4: 1 for"
" low priority messages and 4 for important ones.");
// Producer
DEFINE_string(image_dir, "examples/media/", "Process a directory of images.");
// Consumer
......@@ -36,7 +37,8 @@ DEFINE_bool(fullscreen, false, "Run in full-screen mode
// If the user needs his own variables, he can inherit the op::Datum struct and add them
// UserDatum can be directly used by the OpenPose wrapper because it inherits from op::Datum, just define Wrapper<UserDatum> instead of Wrapper<op::Datum>
// UserDatum can be directly used by the OpenPose wrapper because it inherits from op::Datum, just define Wrapper<UserDatum> instead of
// Wrapper<op::Datum>
struct UserDatum : public op::Datum
{
bool boolThatUserNeedsForSomeReason;
......@@ -195,7 +197,8 @@ int openPoseTutorialThread4()
// ------------------------- CONFIGURING THREADING -------------------------
// In this simple multi-thread example, we will do the following:
// 3 (virtual) queues: 0, 1, 2
// 1 real queue: 1. The first and last queue ids (in this case 0 and 2) are not actual queues, but the beginning and end of the processing sequence
// 1 real queue: 1. The first and last queue ids (in this case 0 and 2) are not actual queues, but the beginning and end of the processing
// sequence
// 2 threads: 0, 1
// wUserInput will generate frames (there is no real queue 0) and push them on queue 1
// wGui will pop frames from queue 1 and process them (there is no real queue 2)
......
......@@ -211,12 +211,6 @@ namespace op
return spData.get();
}
// Disabled because people might try to modify it in illegal ways and change the allocated memory (e.g. resizing), leading to huge errors
// inline cv::Mat& getCvMat()
// {
// return mCvMatData.second;
// }
/**
* Return a cv::Mat wrapper to the data. It forbids the data to be modified.
* OpenCV only admits unsigned char, signed char, int, float & double. If the T class is not supported by OpenCV, it will throw an error.
......@@ -229,6 +223,14 @@ namespace op
*/
const cv::Mat& getConstCvMat() const;
/**
* Analogous to getConstCvMat, but in this case it returns a editable cv::Mat.
* Very important: Only allowed functions which do not provoke data reallocation.
* E.g. resizing functions will not work and they would provoke an undefined behaviour and/or execution crashes.
* @return A cv::Mat pointing to the data.
*/
cv::Mat& getCvMat();
/**
* [] operator
* Similar to the [] operator for raw pointer data.
......
......@@ -2,7 +2,7 @@
#define OPENPOSE_CORE_DATUM_HPP
#include <memory> // std::shared_ptr
#include <string> // std::string
#include <string>
#include <opencv2/core/core.hpp> // cv::Mat
#include "array.hpp"
#include "point.hpp"
......
......@@ -19,6 +19,13 @@ namespace op
Background,
PAFs,
};
enum class RenderMode : unsigned char
{
None,
Cpu,
Gpu,
};
}
#endif // OPENPOSE_CORE_ENUM_CLASSES_HPP
......@@ -18,7 +18,6 @@
#include "renderer.hpp"
#include "resizeAndMergeBase.hpp"
#include "resizeAndMergeCaffe.hpp"
#include "scaleKeypoints.hpp"
#include "wCvMatToOpInput.hpp"
#include "wCvMatToOpOutput.hpp"
#include "wKeypointScaler.hpp"
......
......@@ -11,8 +11,8 @@ namespace op
class Renderer
{
public:
explicit Renderer(const unsigned long long volume, const float alphaKeypoint, const float alphaHeatMap, const unsigned int elementToRender = 0u,
const unsigned int numberElementsToRender = 0u);
explicit Renderer(const unsigned long long volume, const float alphaKeypoint, const float alphaHeatMap,
const unsigned int elementToRender = 0u, const unsigned int numberElementsToRender = 0u);
~Renderer();
......@@ -22,9 +22,11 @@ namespace op
void setElementToRender(const int elementToRender);
std::tuple<std::shared_ptr<float*>, std::shared_ptr<bool>, std::shared_ptr<std::atomic<unsigned int>>, std::shared_ptr<const unsigned int>> getSharedParameters();
std::tuple<std::shared_ptr<float*>, std::shared_ptr<bool>, std::shared_ptr<std::atomic<unsigned int>>,
std::shared_ptr<const unsigned int>> getSharedParameters();
void setSharedParametersAndIfLast(const std::tuple<std::shared_ptr<float*>, std::shared_ptr<bool>, std::shared_ptr<std::atomic<unsigned int>>,
void setSharedParametersAndIfLast(const std::tuple<std::shared_ptr<float*>, std::shared_ptr<bool>,
std::shared_ptr<std::atomic<unsigned int>>,
std::shared_ptr<const unsigned int>>& tuple, const bool isLast);
float getAlphaKeypoint() const;
......
// Declarations of in-place scaling helpers for keypoint arrays stored as
// op::Array<float>. Implementations are in the corresponding .cpp, which is
// not visible in this chunk.
// NOTE(review): identical declarations also appear in utilities/keypoint.hpp
// in this commit — confirm this header is the one being removed.
#ifndef OPENPOSE_CORE_SCALE_KEYPOINTS_HPP
#define OPENPOSE_CORE_SCALE_KEYPOINTS_HPP
#include "array.hpp"
namespace op
{
// Scale all keypoint coordinates by a single uniform factor.
void scaleKeypoints(Array<float>& keypoints, const float scale);
// Scale x and y coordinates independently (scaleX for x, scaleY for y).
void scaleKeypoints(Array<float>& keypoints, const float scaleX, const float scaleY);
// Scale x/y independently, then shift each coordinate by the given offsets.
void scaleKeypoints(Array<float>& keypoints, const float scaleX, const float scaleY, const float offsetX, const float offsetY);
}
#endif // OPENPOSE_CORE_SCALE_KEYPOINTS_HPP
......@@ -3,7 +3,6 @@
#include <openpose/thread/worker.hpp>
#include "keypointScaler.hpp"
#include "scaleKeypoints.hpp"
namespace op
{
......
......@@ -9,7 +9,33 @@ namespace op
const auto HAND_MAX_HANDS = 2*POSE_MAX_PEOPLE;
const auto HAND_NUMBER_PARTS = 21u;
#define HAND_PAIRS_TO_RENDER {0,1, 1,2, 2,3, 3,4, 0,5, 5,6, 6,7, 7,8, 0,9, 9,10, 10,11, 11,12, 0,13, 13,14, 14,15, 15,16, 0,17, 17,18, 18,19, 19,20}
#define HAND_PAIRS_RENDER_GPU {0,1, 1,2, 2,3, 3,4, 0,5, 5,6, 6,7, 7,8, 0,9, 9,10, 10,11, 11,12, 0,13, 13,14, 14,15, 15,16, 0,17, 17,18, 18,19, 19,20}
const std::vector<unsigned int> HAND_PAIRS_RENDER {HAND_PAIRS_RENDER_GPU};
#define HAND_COLORS_RENDER \
179.f, 0.f, 0.f, \
204.f, 0.f, 0.f, \
230.f, 0.f, 0.f, \
255.f, 0.f, 0.f, \
143.f, 179.f, 0.f, \
163.f, 204.f, 0.f, \
184.f, 230.f, 0.f, \
204.f, 255.f, 0.f, \
0.f, 179.f, 71.f, \
0.f, 204.f, 82.f, \
0.f, 230.f, 92.f, \
0.f, 255.f, 102.f, \
0.f, 71.f, 179.f, \
0.f, 82.f, 204.f, \
0.f, 92.f, 230.f, \
0.f, 102.f, 255.f, \
143.f, 0.f, 179.f, \
163.f, 0.f, 204.f, \
184.f, 0.f, 230.f, \
204.f, 0.f, 255.f, \
179.f, 179.f, 179.f, \
179.f, 179.f, 179.f, \
179.f, 179.f, 179.f, \
179.f, 179.f, 179.f
// Constant parameters
const auto HAND_CCN_DECREASE_FACTOR = 8.f;
......
......@@ -2,9 +2,11 @@
#define OPENPOSE_HAND_HAND_RENDERER_HPP
#include <openpose/core/array.hpp>
#include <openpose/core/enumClasses.hpp>
#include <openpose/core/point.hpp>
#include <openpose/core/renderer.hpp>
#include <openpose/thread/worker.hpp>
#include "handParameters.hpp"
namespace op
{
......@@ -13,17 +15,23 @@ namespace op
class HandRenderer : public Renderer
{
public:
explicit HandRenderer(const Point<int>& frameSize);
HandRenderer(const Point<int>& frameSize, const float alphaKeypoint = HAND_DEFAULT_ALPHA_KEYPOINT,
const float alphaHeatMap = HAND_DEFAULT_ALPHA_HEAT_MAP, const RenderMode renderMode = RenderMode::Cpu);
~HandRenderer();
void initializationOnThread();
void renderHands(Array<float>& outputData, const Array<float>& handKeypoints);
void renderHand(Array<float>& outputData, const Array<float>& handKeypoints);
private:
const Point<int> mFrameSize;
float* pGpuHands; // GPU aux memory
const RenderMode mRenderMode;
float* pGpuHands; // GPU aux memory
void renderHandCpu(Array<float>& outputData, const Array<float>& handKeypoints);
void renderHandGpu(Array<float>& outputData, const Array<float>& handKeypoints);
DELETE_COPY(HandRenderer);
};
......
......@@ -6,7 +6,7 @@
#include "handExtractor.hpp"
#include "handParameters.hpp"
#include "handRenderer.hpp"
#include "handRenderGpu.hpp"
#include "renderHand.hpp"
#include "wHandExtractor.hpp"
#include "wHandRenderer.hpp"
......
// Hand keypoint rendering declarations (CPU and GPU back ends).
// NOTE(review): this span mixes the pre-rename declaration (renderHandsGpu)
// with the post-rename ones (renderHandKeypointsCpu/Gpu) — confirm only the
// renamed pair is kept in the final header.
#ifndef OPENPOSE_HAND_GPU_HAND_RENDER_HPP
#define OPENPOSE_HAND_GPU_HAND_RENDER_HPP
#include <openpose/core/array.hpp>
#include <openpose/core/point.hpp>
#include "handParameters.hpp"
namespace op
{
// Legacy GPU renderer: framePtr is a raw float frame buffer of size frameSize,
// handsPtr holds the hand keypoints for numberHands hands.
void renderHandsGpu(float* framePtr, const Point<int>& frameSize, const float* const handsPtr, const int numberHands, const float alphaColorToAdd = HAND_DEFAULT_ALPHA_KEYPOINT);
// CPU renderer: draws handKeypoints directly onto frameArray (new in this commit).
void renderHandKeypointsCpu(Array<float>& frameArray, const Array<float>& handKeypoints);
// GPU renderer: same contract as renderHandsGpu under the new naming scheme;
// alphaColorToAdd controls the keypoint blending weight.
void renderHandKeypointsGpu(float* framePtr, const Point<int>& frameSize, const float* const handsPtr, const int numberHands,
const float alphaColorToAdd = HAND_DEFAULT_ALPHA_KEYPOINT);
}
#endif // OPENPOSE_HAND_GPU_HAND_RENDER_HPP
......@@ -65,7 +65,7 @@ namespace op
const auto profilerKey = Profiler::timerInit(__LINE__, __FUNCTION__, __FILE__);
// Render people hands
for (auto& tDatum : *tDatums)
spHandRenderer->renderHands(tDatum.outputData, tDatum.handKeypoints);
spHandRenderer->renderHand(tDatum.outputData, tDatum.handKeypoints);
// Profiling speed
Profiler::timerEnd(profilerKey);
Profiler::printAveragedTimeMsOnIterationX(profilerKey, __LINE__, __FUNCTION__, __FILE__, Profiler::DEFAULT_X);
......
......@@ -7,7 +7,7 @@
// namespace op
// {
// template<typename TDatums>
// template<typename TDatums>
// class wPoseLoader : public Worker<TDatums>
// {
// public:
......
......@@ -9,10 +9,12 @@ namespace op
const auto FACE_MAX_FACES = POSE_MAX_PEOPLE;
const auto FACE_NUMBER_PARTS = 70u;
#define FACE_PAIRS_TO_RENDER {0,1, 1,2, 2,3, 3,4, 4,5, 5,6, 6,7, 7,8, 8,9, 9,10, 10,11, 11,12, 12,13, 13,14, 14,15, 15,16, 17,18, 18,19, 19,20, \
#define FACE_PAIRS_RENDER_GPU {0,1, 1,2, 2,3, 3,4, 4,5, 5,6, 6,7, 7,8, 8,9, 9,10, 10,11, 11,12, 12,13, 13,14, 14,15, 15,16, 17,18, 18,19, 19,20, \
20,21, 22,23, 23,24, 24,25, 25,26, 27,28, 28,29, 29,30, 31,32, 32,33, 33,34, 34,35, 36,37, 37,38, 38,39, 39,40, 40,41, \
41,36, 42,43, 43,44, 44,45, 45,46, 46,47, 47,42, 48,49, 49,50, 50,51, 51,52, 52,53, 53,54, 54,55, 55,56, 56,57, 57,58, \
58,59, 59,48, 60,61, 61,62, 62,63, 63,64, 64,65, 65,66, 66,67, 67,60}
const std::vector<unsigned int> FACE_PAIRS_RENDER {FACE_PAIRS_RENDER_GPU};
#define FACE_COLORS_RENDER 255.f, 255.f, 255.f
// Constant parameters
const auto FACE_CCN_DECREASE_FACTOR = 8.f;
......
// GPU face keypoint rendering declaration (pre-rename header; this commit
// replaces it with renderFace.hpp, which adds a CPU variant).
#ifndef OPENPOSE_FACE_GPU_FACE_RENDER_HPP
#define OPENPOSE_FACE_GPU_FACE_RENDER_HPP
#include <openpose/core/point.hpp>
#include "faceParameters.hpp"
namespace op
{
// Render face keypoints on a raw float frame buffer of size frameSize.
// facePtr holds the keypoints for numberFace faces; alphaColorToAdd is the
// blending weight of the drawn keypoints (defaults from faceParameters.hpp).
void renderFaceGpu(float* framePtr, const Point<int>& frameSize, const float* const facePtr, const int numberFace, const float alphaColorToAdd = FACE_DEFAULT_ALPHA_KEYPOINT);
}
#endif // OPENPOSE_FACE_GPU_FACE_RENDER_HPP
......@@ -2,9 +2,11 @@
#define OPENPOSE_FACE_FACE_RENDERER_HPP
#include <openpose/core/array.hpp>
#include <openpose/core/enumClasses.hpp>
#include <openpose/core/point.hpp>
#include <openpose/core/renderer.hpp>
#include <openpose/thread/worker.hpp>
#include "faceParameters.hpp"
namespace op
{
......@@ -12,7 +14,7 @@ namespace op
{
public:
explicit FaceRenderer(const Point<int>& frameSize, const float alphaKeypoint = FACE_DEFAULT_ALPHA_KEYPOINT,
const float alphaHeatMap = FACE_DEFAULT_ALPHA_HEAT_MAP);
const float alphaHeatMap = FACE_DEFAULT_ALPHA_HEAT_MAP, const RenderMode renderMode = RenderMode::Cpu);
~FaceRenderer();
......@@ -22,7 +24,12 @@ namespace op
private:
const Point<int> mFrameSize;
float* pGpuFace; // GPU aux memory
const RenderMode mRenderMode;
float* pGpuFace; // GPU aux memory
void renderFaceCpu(Array<float>& outputData, const Array<float>& faceKeypoints);
void renderFaceGpu(Array<float>& outputData, const Array<float>& faceKeypoints);
DELETE_COPY(FaceRenderer);
};
......
......@@ -7,7 +7,7 @@
#include "faceExtractor.hpp"
#include "faceParameters.hpp"
#include "faceRenderer.hpp"
#include "faceRenderGpu.hpp"
#include "renderFace.hpp"
#include "wFaceDetector.hpp"
#include "wFaceExtractor.hpp"
#include "wFaceRenderer.hpp"
......
// Face keypoint rendering declarations — CPU and GPU back ends
// (new header introduced by the "CPU rendering added" commit).
#ifndef OPENPOSE_FACE_RENDER_FACE_HPP
#define OPENPOSE_FACE_RENDER_FACE_HPP
#include <openpose/core/array.hpp>
#include <openpose/core/point.hpp>
#include "faceParameters.hpp"
namespace op
{
// CPU renderer: draws faceKeypoints directly onto frameArray.
void renderFaceKeypointsCpu(Array<float>& frameArray, const Array<float>& faceKeypoints);
// GPU renderer: framePtr is a raw float frame buffer of size frameSize;
// facePtr holds keypoints for numberFace faces; alphaColorToAdd is the
// keypoint blending weight.
void renderFaceKeypointsGpu(float* framePtr, const Point<int>& frameSize, const float* const facePtr, const int numberFace,
const float alphaColorToAdd = FACE_DEFAULT_ALPHA_KEYPOINT);
}
#endif // OPENPOSE_FACE_RENDER_FACE_HPP
......@@ -8,8 +8,8 @@
#include "poseExtractor.hpp"
#include "poseExtractorCaffe.hpp"
#include "poseRenderer.hpp"
#include "poseRenderGpu.hpp"
#include "poseParameters.hpp"
#include "renderPose.hpp"
#include "wPoseExtractor.hpp"
#include "wPoseRenderer.hpp"
......
......@@ -31,10 +31,30 @@ namespace op
{17, "LEar"},
{18, "Background"}
};
const unsigned int POSE_COCO_NUMBER_PARTS = 18u; // Equivalent to size of std::map POSE_COCO_BODY_PARTS - 1 (removing background)
const std::vector<unsigned int> POSE_COCO_MAP_IDX {31,32, 39,40, 33,34, 35,36, 41,42, 43,44, 19,20, 21,22, 23,24, 25,26, 27,28, 29,30, 47,48, 49,50, 53,54, 51,52, 55,56, 37,38, 45,46};
#define POSE_COCO_PAIRS_TO_RENDER {1,2, 1,5, 2,3, 3,4, 5,6, 6,7, 1,8, 8,9, 9,10, 1,11, 11,12, 12,13, 1,0, 0,14, 14,16, 0,15, 15,17}
const std::vector<unsigned int> POSE_COCO_PAIRS {1,2, 1,5, 2,3, 3,4, 5,6, 6,7, 1,8, 8,9, 9,10, 1,11, 11,12, 12,13, 1,0, 0,14, 14,16, 0,15, 15,17, 2,16, 5,17};
const unsigned int POSE_COCO_NUMBER_PARTS = 18u; // Equivalent to size of std::map POSE_COCO_BODY_PARTS - 1 (removing background)
const std::vector<unsigned int> POSE_COCO_MAP_IDX {31,32, 39,40, 33,34, 35,36, 41,42, 43,44, 19,20, 21,22, 23,24, 25,26, 27,28, 29,30, 47,48, 49,50, 53,54, 51,52, 55,56, 37,38, 45,46};
#define POSE_COCO_PAIRS_RENDER_GPU {1,2, 1,5, 2,3, 3,4, 5,6, 6,7, 1,8, 8,9, 9,10, 1,11, 11,12, 12,13, 1,0, 0,14, 14,16, 0,15, 15,17}
const std::vector<unsigned int> POSE_COCO_PAIRS_RENDER {POSE_COCO_PAIRS_RENDER_GPU};
const std::vector<unsigned int> POSE_COCO_PAIRS {1,2, 1,5, 2,3, 3,4, 5,6, 6,7, 1,8, 8,9, 9,10, 1,11, 11,12, 12,13, 1,0, 0,14, 14,16, 0,15, 15,17, 2,16, 5,17};
#define POSE_COCO_COLORS_RENDER \
255.f, 0.f, 0.f, \
255.f, 85.f, 0.f, \
255.f, 170.f, 0.f, \
255.f, 255.f, 0.f, \
170.f, 255.f, 0.f, \
85.f, 255.f, 0.f, \
0.f, 255.f, 0.f, \
0.f, 255.f, 85.f, \
0.f, 255.f, 170.f, \
0.f, 255.f, 255.f, \
0.f, 170.f, 255.f, \
0.f, 85.f, 255.f, \
0.f, 0.f, 255.f, \
85.f, 0.f, 255.f, \
170.f, 0.f, 255.f, \
255.f, 0.f, 255.f, \
255.f, 0.f, 170.f, \
255.f, 0.f, 85.f
const std::map<unsigned int, std::string> POSE_MPI_BODY_PARTS{
{0, "Head"},
......@@ -56,8 +76,25 @@ namespace op
};
const unsigned int POSE_MPI_NUMBER_PARTS = 15; // Equivalent to size of std::map POSE_MPI_NUMBER_PARTS - 1 (removing background)
const std::vector<unsigned int> POSE_MPI_MAP_IDX {16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43};
#define POSE_MPI_PAIRS_TO_RENDER { 0,1, 1,2, 2,3, 3,4, 1,5, 5,6, 6,7, 1,14, 14,8, 8,9, 9,10, 14,11, 11,12, 12,13}
const std::vector<unsigned int> POSE_MPI_PAIRS POSE_MPI_PAIRS_TO_RENDER;
#define POSE_MPI_PAIRS_RENDER_GPU { 0,1, 1,2, 2,3, 3,4, 1,5, 5,6, 6,7, 1,14, 14,8, 8,9, 9,10, 14,11, 11,12, 12,13}
const std::vector<unsigned int> POSE_MPI_PAIRS POSE_MPI_PAIRS_RENDER_GPU;
// MPI colors chosen such that they are closed to COCO colors
#define POSE_MPI_COLORS_RENDER \
255.f, 0.f, 85.f, \
255.f, 0.f, 0.f, \
255.f, 85.f, 0.f, \
255.f, 170.f, 0.f, \
255.f, 255.f, 0.f, \
170.f, 255.f, 0.f, \
85.f, 255.f, 0.f, \
43.f, 255.f, 0.f, \
0.f, 255.f, 0.f, \
0.f, 255.f, 85.f, \
0.f, 255.f, 170.f, \
0.f, 255.f, 255.f, \
0.f, 170.f, 255.f, \
0.f, 85.f, 255.f, \
0.f, 0.f, 255.f
// Constant Global Parameters
const unsigned int POSE_MAX_PEOPLE = 96u;
......@@ -67,6 +104,7 @@ namespace op
const std::array<unsigned int, (int)PoseModel::Size> POSE_MAX_PEAKS{ POSE_MAX_PEOPLE, POSE_MAX_PEOPLE, POSE_MAX_PEOPLE};
const std::array<unsigned int, (int)PoseModel::Size> POSE_NUMBER_BODY_PARTS{ POSE_COCO_NUMBER_PARTS, POSE_MPI_NUMBER_PARTS, POSE_MPI_NUMBER_PARTS};
const std::array<std::vector<unsigned int>, 3> POSE_BODY_PART_PAIRS{ POSE_COCO_PAIRS, POSE_MPI_PAIRS, POSE_MPI_PAIRS};
const std::array<std::vector<unsigned int>, 3> POSE_BODY_PART_PAIRS_RENDER{POSE_COCO_PAIRS_RENDER, POSE_MPI_PAIRS, POSE_MPI_PAIRS};
const std::array<std::vector<unsigned int>, 3> POSE_MAP_IDX{ POSE_COCO_MAP_IDX, POSE_MPI_MAP_IDX, POSE_MPI_MAP_IDX};
const std::array<std::string, (int)PoseModel::Size> POSE_PROTOTXT{ "pose/coco/pose_deploy_linevec.prototxt",
"pose/mpi/pose_deploy_linevec.prototxt",
......@@ -74,17 +112,17 @@ namespace op
const std::array<std::string, (int)PoseModel::Size> POSE_TRAINED_MODEL{ "pose/coco/pose_iter_440000.caffemodel",
"pose/mpi/pose_iter_160000.caffemodel",
"pose/mpi/pose_iter_160000.caffemodel"};
// POSE_BODY_PART_MAPPING crashes on Windows at dynamic initialization, to avoid this crash:
// POSE_BODY_PART_MAPPING has been moved to poseParameters.cpp and getPoseBodyPartMapping() wraps it
//const std::array<std::map<unsigned int, std::string>, 3> POSE_BODY_PART_MAPPING{ POSE_COCO_BODY_PARTS, POSE_MPI_BODY_PARTS, POSE_MPI_BODY_PARTS};
const std::map<unsigned int, std::string>& getPoseBodyPartMapping(const PoseModel poseModel);
// POSE_BODY_PART_MAPPING crashes on Windows at dynamic initialization, to avoid this crash:
// POSE_BODY_PART_MAPPING has been moved to poseParameters.cpp and getPoseBodyPartMapping() wraps it
//const std::array<std::map<unsigned int, std::string>, 3> POSE_BODY_PART_MAPPING{ POSE_COCO_BODY_PARTS, POSE_MPI_BODY_PARTS, POSE_MPI_BODY_PARTS};
const std::map<unsigned int, std::string>& getPoseBodyPartMapping(const PoseModel poseModel);
// Default Model Parameters
// They might be modified on running time
const std::array<float, (int)PoseModel::Size> POSE_DEFAULT_NMS_THRESHOLD{ 0.05f, 0.6f, 0.3f};
const std::array<unsigned int, (int)PoseModel::Size> POSE_DEFAULT_CONNECT_INTER_MIN_ABOVE_THRESHOLD{ 9, 8, 8};
const std::array<unsigned int, (int)PoseModel::Size> POSE_DEFAULT_CONNECT_INTER_MIN_ABOVE_THRESHOLD{ 9, 8, 8};
const std::array<float, (int)PoseModel::Size> POSE_DEFAULT_CONNECT_INTER_THRESHOLD{ 0.05f, 0.01f, 0.01f};
const std::array<unsigned int, (int)PoseModel::Size> POSE_DEFAULT_CONNECT_MIN_SUBSET_CNT{ 3, 3, 3};
const std::array<unsigned int, (int)PoseModel::Size> POSE_DEFAULT_CONNECT_MIN_SUBSET_CNT{ 3, 3, 3};
const std::array<float, (int)PoseModel::Size> POSE_DEFAULT_CONNECT_MIN_SUBSET_SCORE{ 0.4f, 0.4f, 0.4f};
// Rendering default parameters
......
// GPU pose rendering declarations (pre-rename header; this commit replaces it
// with renderPose.hpp). All functions operate on a raw float frame buffer.
#ifndef OPENPOSE_POSE_GPU_POSE_RENDER_HPP
#define OPENPOSE_POSE_GPU_POSE_RENDER_HPP
#include <openpose/core/point.hpp>
#include "enumClasses.hpp"
#include "poseParameters.hpp"
namespace op
{
// Render pose keypoints for numberPeople people. googlyEyes is an easter-egg
// toggle; blendOriginalFrame=false draws on black instead of the input frame.
void renderPoseGpu(float* framePtr, const PoseModel poseModel, const int numberPeople, const Point<int>& frameSize, const float* const posePtr,
const bool googlyEyes = false, const bool blendOriginalFrame = true, const float alphaBlending = POSE_DEFAULT_ALPHA_KEYPOINT);
// Render a single body-part heat map (`part` selects the channel of heatmap).
void renderBodyPartGpu(float* frame, const PoseModel poseModel, const Point<int>& frameSize, const float* const heatmap, const Point<int>& heatmapSize,
const float scaleToKeepRatio, const int part, const float alphaBlending = POSE_DEFAULT_ALPHA_HEAT_MAP);
// Render all body-part heat maps combined.
void renderBodyPartsGpu(float* frame, const PoseModel poseModel, const Point<int>& frameSize, const float* const heatmap, const Point<int>& heatmapSize,
const float scaleToKeepRatio, const float alphaBlending = POSE_DEFAULT_ALPHA_HEAT_MAP);
// Render one part-affinity-field channel pair selected by `part`.
void renderPartAffinityFieldGpu(float* framePtr, const PoseModel poseModel, const Point<int>& frameSize, const float* const heatmapPtr,
const Point<int>& heatmapSize, const float scaleToKeepRatio, const int part, const float alphaBlending = POSE_DEFAULT_ALPHA_HEAT_MAP);
// Render all part-affinity fields combined.
void renderPartAffinityFieldsGpu(float* framePtr, const PoseModel poseModel, const Point<int>& frameSize, const float* const heatmapPtr,
const Point<int>& heatmapSize, const float scaleToKeepRatio, const float alphaBlending = POSE_DEFAULT_ALPHA_HEAT_MAP);
}
#endif // OPENPOSE_POSE_GPU_POSE_RENDER_HPP
......@@ -3,6 +3,7 @@
#include <memory> // std::shared_ptr
#include <openpose/core/array.hpp>
#include <openpose/core/enumClasses.hpp>
#include <openpose/core/point.hpp>
#include <openpose/core/renderer.hpp>
#include <openpose/utilities/macros.hpp>
......@@ -14,9 +15,10 @@ namespace op
class PoseRenderer : public Renderer
{
public:
explicit PoseRenderer(const Point<int>& heatMapsSize, const Point<int>& outputSize, const PoseModel poseModel, const std::shared_ptr<PoseExtractor>& poseExtractor,
const bool blendOriginalFrame = true, const float alphaKeypoint = POSE_DEFAULT_ALPHA_KEYPOINT,
const float alphaHeatMap = POSE_DEFAULT_ALPHA_HEAT_MAP, const unsigned int elementToRender = 0u);
explicit PoseRenderer(const Point<int>& heatMapsSize, const Point<int>& outputSize, const PoseModel poseModel,
const std::shared_ptr<PoseExtractor>& poseExtractor, const bool blendOriginalFrame = true,
const float alphaKeypoint = POSE_DEFAULT_ALPHA_KEYPOINT, const float alphaHeatMap = POSE_DEFAULT_ALPHA_HEAT_MAP,
const unsigned int elementToRender = 0u, const RenderMode renderMode = RenderMode::Cpu);
~PoseRenderer();
......@@ -38,10 +40,15 @@ namespace op
const PoseModel mPoseModel;
const std::map<unsigned int, std::string> mPartIndexToName;
const std::shared_ptr<PoseExtractor> spPoseExtractor;
const RenderMode mRenderMode;
std::atomic<bool> mBlendOriginalFrame;
std::atomic<bool> mShowGooglyEyes;
// Init with thread
float* pGpuPose; // GPU aux memory
float* pGpuPose; // GPU aux memory
std::pair<int, std::string> renderPoseCpu(Array<float>& outputData, const Array<float>& poseKeypoints, const float scaleNetToOutput = -1.f);
std::pair<int, std::string> renderPoseGpu(Array<float>& outputData, const Array<float>& poseKeypoints, const float scaleNetToOutput = -1.f);
DELETE_COPY(PoseRenderer);
};
......
// Pose rendering declarations — CPU and GPU back ends (new header introduced
// by the "CPU rendering added" commit, replacing poseRenderGpu.hpp).
#ifndef OPENPOSE_POSE_RENDER_POSE_HPP
#define OPENPOSE_POSE_RENDER_POSE_HPP
#include <opencv2/core/core.hpp> // cv::Mat
#include <openpose/core/array.hpp>
#include <openpose/core/point.hpp>
#include "enumClasses.hpp"
#include "poseParameters.hpp"
namespace op
{
// CPU renderer: draws poseKeypoints directly onto frameArray;
// blendOriginalFrame=false draws on black instead of the input frame.
void renderPoseKeypointsCpu(Array<float>& frameArray, const Array<float>& poseKeypoints, const PoseModel poseModel,
const bool blendOriginalFrame = true);
// GPU renderer over a raw float frame buffer; googlyEyes is an easter-egg toggle.
void renderPoseKeypointsGpu(float* framePtr, const PoseModel poseModel, const int numberPeople, const Point<int>& frameSize,
const float* const posePtr, const bool googlyEyes = false, const bool blendOriginalFrame = true,
const float alphaBlending = POSE_DEFAULT_ALPHA_KEYPOINT);
// Render a single body-part heat map (`part` selects the channel of heatmap).
void renderPoseHeatMapGpu(float* frame, const PoseModel poseModel, const Point<int>& frameSize, const float* const heatmap,
const Point<int>& heatmapSize, const float scaleToKeepRatio, const int part,
const float alphaBlending = POSE_DEFAULT_ALPHA_HEAT_MAP);
// Render all body-part heat maps combined.
void renderPoseHeatMapsGpu(float* frame, const PoseModel poseModel, const Point<int>& frameSize, const float* const heatmap,
const Point<int>& heatmapSize, const float scaleToKeepRatio,
const float alphaBlending = POSE_DEFAULT_ALPHA_HEAT_MAP);
// Render one part-affinity-field channel pair selected by `part`.
void renderPosePAFGpu(float* framePtr, const PoseModel poseModel, const Point<int>& frameSize, const float* const heatmapPtr,
const Point<int>& heatmapSize, const float scaleToKeepRatio, const int part,
const float alphaBlending = POSE_DEFAULT_ALPHA_HEAT_MAP);
// Render all part-affinity fields combined.
void renderPosePAFsGpu(float* framePtr, const PoseModel poseModel, const Point<int>& frameSize, const float* const heatmapPtr,
const Point<int>& heatmapSize, const float scaleToKeepRatio,
const float alphaBlending = POSE_DEFAULT_ALPHA_HEAT_MAP);
}
#endif // OPENPOSE_POSE_RENDER_POSE_HPP
......@@ -13,7 +13,7 @@ namespace op
inline void work(TDatums& tDatums)
{
workConsumer(tDatums);
workConsumer(tDatums);
}
protected:
......
......@@ -20,7 +20,7 @@ namespace op
return (totalRequired + numberCudaThreads - 1) / numberCudaThreads;
}
dim3 getNumberCudaBlocks(const Point<int>& frameSize, const dim3 numberCudaThreads = dim3{ CUDA_NUM_THREADS, CUDA_NUM_THREADS, 1 });
dim3 getNumberCudaBlocks(const Point<int>& frameSize, const dim3 numberCudaThreads = dim3{ CUDA_NUM_THREADS, CUDA_NUM_THREADS, 1 });
std::pair<dim3, dim3> getNumberCudaThreadsAndBlocks(const Point<int>& frameSize);
}
......
......@@ -8,6 +8,7 @@
#include "errorAndLog.hpp"
#include "fastMath.hpp"
#include "fileSystem.hpp"
#include "keypoint.hpp"
#include "macros.hpp"
#include "openCv.hpp"
#include "pointerContainer.hpp"
......
// Shared keypoint utilities: geometry helpers, scaling, and the generic CPU
// renderer used by the pose/face/hand CPU rendering paths added in this commit.
#ifndef OPENPOSE_UTILITIES_KEYPOINT_HPP
#define OPENPOSE_UTILITIES_KEYPOINT_HPP
#include <vector>
#include <openpose/core/array.hpp>
#include <openpose/core/rectangle.hpp>
namespace op
{
// Euclidean distance between keypoints elementA and elementB of keypointPtr.
float getDistance(const float* keypointPtr, const int elementA, const int elementB);
// Scale all keypoint coordinates by a single uniform factor.
void scaleKeypoints(Array<float>& keypoints, const float scale);
// Scale x and y coordinates independently.
void scaleKeypoints(Array<float>& keypoints, const float scaleX, const float scaleY);
// Scale x/y independently, then shift each coordinate by the given offsets.
void scaleKeypoints(Array<float>& keypoints, const float scaleX, const float scaleY, const float offsetX, const float offsetY);
// Generic CPU renderer: draws the given keypoints onto frameArray, connecting
// the index pairs in `pairs` with the RGB triplets in `colors`; the two ratio
// arguments control circle/line thickness relative to the frame size.
// NOTE(review): `colors` is passed by value — a const reference would avoid a
// copy per call, but changing it requires touching the (unseen) .cpp too.
void renderKeypointsCpu(Array<float>& frameArray, const Array<float>& keypoints, const std::vector<unsigned int>& pairs,
const std::vector<float> colors, const float thicknessCircleRatio, const float thicknessLineRatioWRTCircle);
// Bounding rectangle of the keypoints whose score exceeds `threshold`.
Rectangle<unsigned int> getKeypointsRectangle(const float* keypointPtr, const int numberBodyParts, const float threshold);
// Area of that bounding rectangle (keypoints above `threshold` only).
int getKeypointsArea(const float* keypointPtr, const int numberBodyParts, const float threshold);
// Index of the person with the largest keypoint area; presumably returns a
// sentinel when keypoints is empty — confirm against the implementation.
int getBiggestPerson(const Array<float>& keypoints, const float threshold);
}
#endif // OPENPOSE_UTILITIES_KEYPOINT_HPP
......@@ -80,17 +80,15 @@ namespace op
// Consumer (keep default values to disable any output)
const WrapperStructOutput& wrapperStructOutput = WrapperStructOutput{});
// THIS FUNCTION IS NOT IMPLEMENTED YET -> COMING SOON
// Similar to the previous configure, but it includes hand extraction and rendering
void configure(const WrapperStructPose& wrapperStructPose,
// Hand (use the default WrapperStructHand{} to disable any hand detector)
const experimental::WrapperStructHand& wrapperHandStruct,
const experimental::WrapperStructHand& wrapperStructHand,
// Producer (set producerSharedPtr = nullptr or use the default WrapperStructInput{} to disable any input)
const WrapperStructInput& wrapperStructInput,
// Consumer (keep default values to disable any output)
const WrapperStructOutput& wrapperStructOutput = WrapperStructOutput{});
// THIS FUNCTION IS NOT IMPLEMENTED YET -> COMING SOON
// Similar to the previous configure, but it includes hand extraction and rendering
void configure(const WrapperStructPose& wrapperStructPose,
// Face (use the default WrapperStructFace{} to disable any face detector)
......@@ -100,13 +98,12 @@ namespace op
// Consumer (keep default values to disable any output)
const WrapperStructOutput& wrapperStructOutput = WrapperStructOutput{});
// THIS FUNCTION IS NOT IMPLEMENTED YET -> COMING SOON
// Similar to the previous configure, but it includes hand extraction and rendering
void configure(const WrapperStructPose& wrapperStructPose = WrapperStructPose{},
// Face (use the default WrapperStructFace{} to disable any face detector)
const WrapperStructFace& wrapperStructFace = WrapperStructFace{},
// Hand (use the default WrapperStructHand{} to disable any hand detector)
const experimental::WrapperStructHand& wrapperHandStruct = experimental::WrapperStructHand{},
const experimental::WrapperStructHand& wrapperStructHand = experimental::WrapperStructHand{},
// Producer (set producerSharedPtr = nullptr or use the default WrapperStructInput{} to disable any input)
const WrapperStructInput& wrapperStructInput = WrapperStructInput{},
// Consumer (keep default values to disable any output)
......@@ -372,12 +369,14 @@ namespace op
}
template<typename TDatums, typename TWorker, typename TQueue>
void Wrapper<TDatums, TWorker, TQueue>::configure(const WrapperStructPose& wrapperStructPose, const WrapperStructInput& wrapperStructInput,
void Wrapper<TDatums, TWorker, TQueue>::configure(const WrapperStructPose& wrapperStructPose,
const WrapperStructInput& wrapperStructInput,
const WrapperStructOutput& wrapperStructOutput)
{
try
{
configure(wrapperStructPose, WrapperStructFace{}, experimental::WrapperStructHand{}, wrapperStructInput, wrapperStructOutput);
configure(wrapperStructPose, WrapperStructFace{}, experimental::WrapperStructHand{},
wrapperStructInput, wrapperStructOutput);
}
catch (const std::exception& e)
{
......@@ -386,12 +385,15 @@ namespace op
}
template<typename TDatums, typename TWorker, typename TQueue>
void Wrapper<TDatums, TWorker, TQueue>::configure(const WrapperStructPose& wrapperStructPose, const WrapperStructFace& wrapperStructFace,
const WrapperStructInput& wrapperStructInput, const WrapperStructOutput& wrapperStructOutput)
void Wrapper<TDatums, TWorker, TQueue>::configure(const WrapperStructPose& wrapperStructPose,
const WrapperStructFace& wrapperStructFace,
const WrapperStructInput& wrapperStructInput,
const WrapperStructOutput& wrapperStructOutput)
{
try
{
configure(wrapperStructPose, wrapperStructFace, experimental::WrapperStructHand{}, wrapperStructInput, wrapperStructOutput);
configure(wrapperStructPose, wrapperStructFace, experimental::WrapperStructHand{},
wrapperStructInput, wrapperStructOutput);
}
catch (const std::exception& e)
{
......@@ -400,12 +402,15 @@ namespace op
}
template<typename TDatums, typename TWorker, typename TQueue>
void Wrapper<TDatums, TWorker, TQueue>::configure(const WrapperStructPose& wrapperStructPose, const experimental::WrapperStructHand& wrapperHandStruct,
const WrapperStructInput& wrapperStructInput, const WrapperStructOutput& wrapperStructOutput)
void Wrapper<TDatums, TWorker, TQueue>::configure(const WrapperStructPose& wrapperStructPose,
const experimental::WrapperStructHand& wrapperStructHand,
const WrapperStructInput& wrapperStructInput,
const WrapperStructOutput& wrapperStructOutput)
{
try
{
configure(wrapperStructPose, WrapperStructFace{}, wrapperHandStruct, wrapperStructInput, wrapperStructOutput);
configure(wrapperStructPose, WrapperStructFace{}, wrapperStructHand,
wrapperStructInput, wrapperStructOutput);
}
catch (const std::exception& e)
{
......@@ -414,8 +419,10 @@ namespace op
}
template<typename TDatums, typename TWorker, typename TQueue>
void Wrapper<TDatums, TWorker, TQueue>::configure(const WrapperStructPose& wrapperStructPose, const WrapperStructFace& wrapperStructFace,
const experimental::WrapperStructHand& wrapperHandStruct, const WrapperStructInput& wrapperStructInput,
void Wrapper<TDatums, TWorker, TQueue>::configure(const WrapperStructPose& wrapperStructPose,
const WrapperStructFace& wrapperStructFace,
const experimental::WrapperStructHand& wrapperStructHand,
const WrapperStructInput& wrapperStructInput,
const WrapperStructOutput& wrapperStructOutput)
{
try
......@@ -426,8 +433,9 @@ namespace op
typedef std::shared_ptr<TDatums> TDatumsPtr;
// Required parameters
const auto renderOutput = wrapperStructPose.renderOutput || wrapperStructFace.renderOutput;
const auto renderFace = wrapperStructFace.enable && wrapperStructFace.renderOutput;
const auto renderOutput = wrapperStructPose.renderMode != RenderMode::None || wrapperStructFace.renderMode != RenderMode::None;
const auto renderOutputGpu = wrapperStructPose.renderMode == RenderMode::Gpu || wrapperStructFace.renderMode == RenderMode::Gpu;
const auto renderFace = wrapperStructFace.enable && wrapperStructFace.renderMode != RenderMode::None;
// Check no wrong/contradictory flags enabled
if (wrapperStructPose.alphaKeypoint < 0. || wrapperStructPose.alphaKeypoint > 1.
......@@ -563,22 +571,38 @@ namespace op
// Pose renderers
std::vector<std::shared_ptr<PoseRenderer>> poseRenderers;
if (wrapperStructPose.renderOutput || renderOutput)
std::shared_ptr<PoseRenderer> poseCpuRenderer;
std::vector<TWorker> cpuRenderers;
if (renderOutputGpu || wrapperStructPose.renderMode == RenderMode::Cpu)
{
// If !wrapperStructPose.renderOutput but renderOutput, then we create an alpha = 0 pose renderer
// If wrapperStructPose.renderMode is not RenderMode::Gpu but renderOutput is enabled, then we create an alpha = 0 pose renderer
// in order to keep the removing background option
const auto alphaKeypoint = (wrapperStructPose.renderOutput ? wrapperStructPose.alphaKeypoint : 0.f);
const auto alphaHeatMap = (wrapperStructPose.renderOutput ? wrapperStructPose.alphaHeatMap : 0.f);
for (auto gpuId = 0; gpuId < poseExtractors.size(); gpuId++)
const auto alphaKeypoint = (wrapperStructPose.renderMode != RenderMode::None ? wrapperStructPose.alphaKeypoint : 0.f);
const auto alphaHeatMap = (wrapperStructPose.renderMode != RenderMode::None ? wrapperStructPose.alphaHeatMap : 0.f);
// GPU rendering
if (renderOutputGpu)
{
poseRenderers.emplace_back(std::make_shared<PoseRenderer>(
poseNetOutputSize, finalOutputSize, wrapperStructPose.poseModel, poseExtractors[gpuId],
for (auto gpuId = 0; gpuId < poseExtractors.size(); gpuId++)
{
poseRenderers.emplace_back(std::make_shared<PoseRenderer>(
poseNetOutputSize, finalOutputSize, wrapperStructPose.poseModel, poseExtractors[gpuId],
wrapperStructPose.blendOriginalFrame, alphaKeypoint,
alphaHeatMap, wrapperStructPose.defaultPartToRender, wrapperStructPose.renderMode
));
}
}
// CPU rendering
if (wrapperStructPose.renderMode == RenderMode::Cpu)
{
poseCpuRenderer = std::make_shared<PoseRenderer>(
poseNetOutputSize, finalOutputSize, wrapperStructPose.poseModel, nullptr,
wrapperStructPose.blendOriginalFrame, alphaKeypoint,
alphaHeatMap, wrapperStructPose.defaultPartToRender
));
alphaHeatMap, wrapperStructPose.defaultPartToRender, wrapperStructPose.renderMode
);
cpuRenderers.emplace_back(std::make_shared<WPoseRenderer<TDatumsPtr>>(poseCpuRenderer));
}
log("", Priority::Low, __LINE__, __FUNCTION__, __FILE__);
}
log("", Priority::Low, __LINE__, __FUNCTION__, __FILE__);
// Input cvMat to OpenPose format
const auto cvMatToOpInput = std::make_shared<CvMatToOpInput>(
......@@ -589,7 +613,6 @@ namespace op
spWCvMatToOpOutput = std::make_shared<WCvMatToOpOutput<TDatumsPtr>>(cvMatToOpOutput);
// Pose extractor(s)
spWPoses.clear();
spWPoses.resize(poseExtractors.size());
for (auto i = 0; i < spWPoses.size(); i++)
spWPoses.at(i) = {std::make_shared<WPoseExtractor<TDatumsPtr>>(poseExtractors.at(i))};
......@@ -612,7 +635,7 @@ namespace op
}
// Hand extractor(s)
if (wrapperHandStruct.extractAndRenderHands)
if (wrapperStructHand.extractAndRenderHands)
{
for (auto gpuId = 0; gpuId < spWPoses.size(); gpuId++)
{
......@@ -624,44 +647,76 @@ namespace op
}
// Pose renderer(s)
if (!poseRenderers.empty())
if (renderOutputGpu && !poseRenderers.empty())
for (auto i = 0; i < spWPoses.size(); i++)
spWPoses.at(i).emplace_back(std::make_shared<WPoseRenderer<TDatumsPtr>>(poseRenderers.at(i)));
// Hands renderer(s)
if (wrapperHandStruct.extractAndRenderHands)
// Face renderer(s)
if (renderFace)
{
for (auto i = 0; i < spWPoses.size(); i++)
// CPU rendering
if (wrapperStructFace.renderMode == RenderMode::Cpu)
{
// Construct hands renderer
const auto handRenderer = std::make_shared<experimental::HandRenderer>(finalOutputSize);
// Performance boost -> share spGpuMemoryPtr for all renderers
if (!poseRenderers.empty())
// Construct face renderer
const auto faceRenderer = std::make_shared<FaceRenderer>(finalOutputSize, wrapperStructFace.alphaKeypoint,
wrapperStructFace.alphaHeatMap,
wrapperStructFace.renderMode);
// Add worker
cpuRenderers.emplace_back(std::make_shared<WFaceRenderer<TDatumsPtr>>(faceRenderer));
}
// GPU rendering
else if (wrapperStructFace.renderMode == RenderMode::Gpu)
{
for (auto i = 0; i < spWPoses.size(); i++)
{
const bool isLastRenderer = (!renderFace);
handRenderer->setSharedParametersAndIfLast(poseRenderers.at(i)->getSharedParameters(), isLastRenderer);
// Construct face renderer
const auto faceRenderer = std::make_shared<FaceRenderer>(finalOutputSize, wrapperStructFace.alphaKeypoint,
wrapperStructFace.alphaHeatMap,
wrapperStructFace.renderMode);
// Performance boost -> share spGpuMemoryPtr for all renderers
if (!poseRenderers.empty())
{
const bool isLastRenderer = (!wrapperStructHand.extractAndRenderHands);
faceRenderer->setSharedParametersAndIfLast(poseRenderers.at(i)->getSharedParameters(), isLastRenderer);
}
// Add worker
spWPoses.at(i).emplace_back(std::make_shared<WFaceRenderer<TDatumsPtr>>(faceRenderer));
}
// Add worker
spWPoses.at(i).emplace_back(std::make_shared<experimental::WHandRenderer<TDatumsPtr>>(handRenderer));
}
else
error("Unknown RenderMode.", __LINE__, __FUNCTION__, __FILE__);
}
// Face renderer(s)
if (renderFace)
// Hands renderer(s)
if (wrapperStructHand.extractAndRenderHands)
{
for (auto i = 0; i < spWPoses.size(); i++)
// CPU rendering
// if (wrapperStructHand.renderMode == RenderMode::Cpu)
{
// Construct face renderer
const auto faceRenderer = std::make_shared<FaceRenderer>(finalOutputSize, wrapperStructFace.alphaKeypoint, wrapperStructFace.alphaHeatMap);
// Performance boost -> share spGpuMemoryPtr for all renderers
if (!poseRenderers.empty())
{
const bool isLastRenderer = true;
faceRenderer->setSharedParametersAndIfLast(poseRenderers.at(i)->getSharedParameters(), isLastRenderer);
}
// Construct hand renderer
const auto handRenderer = std::make_shared<experimental::HandRenderer>(finalOutputSize);
// Add worker
spWPoses.at(i).emplace_back(std::make_shared<WFaceRenderer<TDatumsPtr>>(faceRenderer));
cpuRenderers.emplace_back(std::make_shared<experimental::WHandRenderer<TDatumsPtr>>(handRenderer));
}
// GPU rendering
// else if (wrapperStructHand.renderMode == RenderMode::Gpu)
// {
// for (auto i = 0; i < spWPoses.size(); i++)
// {
// // Construct hands renderer
// const auto handRenderer = std::make_shared<experimental::HandRenderer>(finalOutputSize);
// // Performance boost -> share spGpuMemoryPtr for all renderers
// if (!poseRenderers.empty())
// {
// const bool isLastRenderer = true;
// handRenderer->setSharedParametersAndIfLast(poseRenderers.at(i)->getSharedParameters(), isLastRenderer);
// }
// // Add worker
// spWPoses.at(i).emplace_back(std::make_shared<experimental::WHandRenderer<TDatumsPtr>>(handRenderer));
// }
// }
// else
// error("Unknown RenderMode.", __LINE__, __FUNCTION__, __FILE__);
}
// Itermediate workers (e.g. OpenPose format to cv::Mat, json & frames recorder, ...)
......@@ -672,6 +727,7 @@ namespace op
// Frames processor (OpenPose format -> cv::Mat format)
if (renderOutput)
{
mPostProcessingWs = mergeWorkers(mPostProcessingWs, cpuRenderers);
const auto opOutputToCvMat = std::make_shared<OpOutputToCvMat>(finalOutputSize);
mPostProcessingWs.emplace_back(std::make_shared<WOpOutputToCvMat<TDatumsPtr>>(opOutputToCvMat));
}
......@@ -692,7 +748,7 @@ namespace op
mOutputWs.emplace_back(std::make_shared<WPoseSaver<TDatumsPtr>>(keypointSaver));
if (wrapperStructFace.enable)
mOutputWs.emplace_back(std::make_shared<WFaceSaver<TDatumsPtr>>(keypointSaver));
if (wrapperHandStruct.extractAndRenderHands)
if (wrapperStructHand.extractAndRenderHands)
mOutputWs.emplace_back(std::make_shared<WHandSaver<TDatumsPtr>>(keypointSaver));
}
// Write people pose data on disk (json format)
......@@ -702,7 +758,7 @@ namespace op
mOutputWs.emplace_back(std::make_shared<WPoseJsonSaver<TDatumsPtr>>(keypointJsonSaver));
if (wrapperStructFace.enable)
mOutputWs.emplace_back(std::make_shared<WFaceJsonSaver<TDatumsPtr>>(keypointJsonSaver));
if (wrapperHandStruct.extractAndRenderHands)
if (wrapperStructHand.extractAndRenderHands)
mOutputWs.emplace_back(std::make_shared<WHandJsonSaver<TDatumsPtr>>(keypointJsonSaver));
}
// Write people pose data on disk (COCO validation json format)
......@@ -750,7 +806,8 @@ namespace op
if (wrapperStructOutput.displayGui)
{
const auto gui = std::make_shared<Gui>(
wrapperStructOutput.fullScreen, finalOutputSize, mThreadManager.getIsRunningSharedPtr(), spVideoSeek, poseExtractors, poseRenderers
wrapperStructOutput.fullScreen, finalOutputSize, mThreadManager.getIsRunningSharedPtr(), spVideoSeek, poseExtractors,
(wrapperStructPose.renderMode == RenderMode::Cpu ? std::vector<std::shared_ptr<PoseRenderer>>{poseCpuRenderer} : poseRenderers)
);
spWGui = {std::make_shared<WGui<TDatumsPtr>>(gui)};
}
......
#ifndef OPENPOSE_WRAPPER_WRAPPER_STRUCT_FACE_HPP
#define OPENPOSE_WRAPPER_WRAPPER_STRUCT_FACE_HPP
#include <openpose/core/enumClasses.hpp>
#include <openpose/core/point.hpp>
#include <openpose/face/faceParameters.hpp>
......@@ -27,9 +28,10 @@ namespace op
Point<int> netInputSize;
/**
* Whether to render the output (pose locations, face, background or PAF heat maps).
* Whether to render the output (pose locations, body, background or PAF heat maps) with CPU or GPU.
* Select `None` for no rendering, `Cpu` or `Gpu` for CPU and GPU rendering respectively.
*/
bool renderOutput;
RenderMode renderMode;
/**
* Rendering blending alpha value of the pose point locations with respect to the background image.
......@@ -49,7 +51,8 @@ namespace op
* Since all the elements of the struct are public, they can also be manually filled.
*/
WrapperStructFace(const bool enable = false, const Point<int>& netInputSize = Point<int>{368, 368},
const bool renderOutput = false, const float alphaKeypoint = FACE_DEFAULT_ALPHA_KEYPOINT,
const RenderMode renderMode = RenderMode::None,
const float alphaKeypoint = FACE_DEFAULT_ALPHA_KEYPOINT,
const float alphaHeatMap = FACE_DEFAULT_ALPHA_HEAT_MAP);
};
}
......
......@@ -63,9 +63,10 @@ namespace op
float scaleGap;
/**
* Whether to render the output (pose locations, body, background or PAF heat maps).
* Whether to render the output (pose locations, body, background or PAF heat maps) with CPU or GPU.
* Select `None` for no rendering, `Cpu` or `Gpu` for CPU and GPU rendering respectively.
*/
bool renderOutput;
RenderMode renderMode;
/**
* Pose model, it affects the number of body parts to render
......@@ -124,7 +125,7 @@ namespace op
*/
WrapperStructPose(const Point<int>& netInputSize = Point<int>{656, 368}, const Point<int>& outputSize = Point<int>{1280, 720},
const ScaleMode keypointScale = ScaleMode::InputResolution, const int gpuNumber = 1, const int gpuNumberStart = 0,
const int scalesNumber = 1, const float scaleGap = 0.15f, const bool renderOutput = false,
const int scalesNumber = 1, const float scaleGap = 0.15f, const RenderMode renderMode = RenderMode::None,
const PoseModel poseModel = PoseModel::COCO_18, const bool blendOriginalFrame = true,
const float alphaKeypoint = POSE_DEFAULT_ALPHA_KEYPOINT, const float alphaHeatMap = POSE_DEFAULT_ALPHA_HEAT_MAP,
const int defaultPartToRender = 0, const std::string& modelFolder = "models/",
......
input: "image"
input_dim: 1
input_dim: 3
input_dim: 1 # This value will be defined at runtime
input_dim: 1 # This value will be defined at runtime
input_dim: 1 # This value will be defined at runtime
input_dim: 1 # This value will be defined at runtime
layer {
name: "conv1_1"
type: "Convolution"
......
input: "image"
input_dim: 1
input_dim: 3
input_dim: 1 # This value will be defined at runtime
input_dim: 1 # This value will be defined at runtime
input_dim: 1 # This value will be defined at runtime
input_dim: 1 # This value will be defined at runtime
layer {
name: "conv1_1"
type: "Convolution"
......
input: "image"
input_dim: 1
input_dim: 3
input_dim: 1 # This value will be defined at runtime
input_dim: 1 # This value will be defined at runtime
input_dim: 1 # This value will be defined at runtime
input_dim: 1 # This value will be defined at runtime
layer {
name: "conv1_1"
type: "Convolution"
......
......@@ -322,13 +322,25 @@ namespace op
{
try
{
if (mCvMatData.first)
return mCvMatData.second;
else
{
if (!mCvMatData.first)
error("Array<T>: cv::Mat functions only valid for T types defined by OpenCV: unsigned char, signed char, int, float & double", __LINE__, __FUNCTION__, __FILE__);
return mCvMatData.second;
}
return mCvMatData.second;
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
return mCvMatData.second;
}
}
template<typename T>
cv::Mat& Array<T>::getCvMat()
{
try
{
if (!mCvMatData.first)
error("Array<T>: cv::Mat functions only valid for T types defined by OpenCV: unsigned char, signed char, int, float & double", __LINE__, __FUNCTION__, __FILE__);
return mCvMatData.second;
}
catch (const std::exception& e)
{
......
#include <openpose/core/scaleKeypoints.hpp>
#include <openpose/utilities/errorAndLog.hpp>
#include <openpose/utilities/keypoint.hpp>
#include <openpose/core/keypointScaler.hpp>
namespace op
......
#include <cuda.h>
#include <cuda_runtime_api.h>
#ifndef CPU_ONLY
#include <cuda.h>
#include <cuda_runtime_api.h>
#endif
#include <openpose/utilities/errorAndLog.hpp>
#include <openpose/core/renderer.hpp>
namespace op
{
Renderer::Renderer(const unsigned long long volume, const float alphaKeypoint, const float alphaHeatMap, const unsigned int elementToRender,
const unsigned int numberElementsToRender) :
Renderer::Renderer(const unsigned long long volume, const float alphaKeypoint, const float alphaHeatMap,
const unsigned int elementToRender, const unsigned int numberElementsToRender) :
spGpuMemoryPtr{std::make_shared<float*>()},
spElementToRender{std::make_shared<std::atomic<unsigned int>>(elementToRender)},
spNumberElementsToRender{std::make_shared<const unsigned int>(numberElementsToRender)},
......@@ -23,8 +25,10 @@ namespace op
{
try
{
if (mIsLastRenderer)
cudaFree(*spGpuMemoryPtr);
#ifndef CPU_ONLY
if (mIsLastRenderer)
cudaFree(*spGpuMemoryPtr);
#endif
}
catch (const std::exception& e)
{
......@@ -36,8 +40,10 @@ namespace op
{
try
{
if (mIsFirstRenderer)
cudaMalloc((void**)(spGpuMemoryPtr.get()), mVolume * sizeof(float));
#ifndef CPU_ONLY
if (mIsFirstRenderer)
cudaMalloc((void**)(spGpuMemoryPtr.get()), mVolume * sizeof(float));
#endif
}
catch (const std::exception& e)
{
......@@ -74,7 +80,8 @@ namespace op
}
}
std::tuple<std::shared_ptr<float*>, std::shared_ptr<bool>, std::shared_ptr<std::atomic<unsigned int>>, std::shared_ptr<const unsigned int>> Renderer::getSharedParameters()
std::tuple<std::shared_ptr<float*>, std::shared_ptr<bool>, std::shared_ptr<std::atomic<unsigned int>>,
std::shared_ptr<const unsigned int>> Renderer::getSharedParameters()
{
try
{
......@@ -88,7 +95,8 @@ namespace op
}
}
void Renderer::setSharedParametersAndIfLast(const std::tuple<std::shared_ptr<float*>, std::shared_ptr<bool>, std::shared_ptr<std::atomic<unsigned int>>,
void Renderer::setSharedParametersAndIfLast(const std::tuple<std::shared_ptr<float*>, std::shared_ptr<bool>,
std::shared_ptr<std::atomic<unsigned int>>,
std::shared_ptr<const unsigned int>>& tuple, const bool isLast)
{
try
......@@ -160,11 +168,16 @@ namespace op
{
try
{
if (!*spGpuMemoryAllocated)
{
cudaMemcpy(*spGpuMemoryPtr, cpuMemory, mVolume * sizeof(float), cudaMemcpyHostToDevice);
*spGpuMemoryAllocated = true;
}
#ifndef CPU_ONLY
if (!*spGpuMemoryAllocated)
{
cudaMemcpy(*spGpuMemoryPtr, cpuMemory, mVolume * sizeof(float), cudaMemcpyHostToDevice);
*spGpuMemoryAllocated = true;
}
#else
error("GPU rendering not available if `CPU_ONLY` is set.", __LINE__, __FUNCTION__, __FILE__);
UNUSED(cpuMemory);
#endif
}
catch (const std::exception& e)
{
......@@ -176,11 +189,16 @@ namespace op
{
try
{
if (*spGpuMemoryAllocated && mIsLastRenderer)
{
cudaMemcpy(cpuMemory, *spGpuMemoryPtr, mVolume * sizeof(float), cudaMemcpyDeviceToHost);
*spGpuMemoryAllocated = false;
}
#ifndef CPU_ONLY
if (*spGpuMemoryAllocated && mIsLastRenderer)
{
cudaMemcpy(cpuMemory, *spGpuMemoryPtr, mVolume * sizeof(float), cudaMemcpyDeviceToHost);
*spGpuMemoryAllocated = false;
}
#else
error("GPU rendering not available if `CPU_ONLY` is set.", __LINE__, __FUNCTION__, __FILE__);
UNUSED(cpuMemory);
#endif
}
catch (const std::exception& e)
{
......
#include <openpose/utilities/errorAndLog.hpp>
#include <openpose/core/scaleKeypoints.hpp>
namespace op
{
const std::string errorMessage = "This function is only for array of dimension: [sizeA x sizeB x 3].";
// Uniform rescale: multiply every keypoint (x, y) by the same factor on both axes.
// Scores (the 3rd channel) are left untouched by the delegated overload.
void scaleKeypoints(Array<float>& keypoints, const float scale)
{
    try
    {
        // Delegate to the per-axis overload with identical X and Y factors
        const auto uniformScale = scale;
        scaleKeypoints(keypoints, uniformScale, uniformScale);
    }
    catch (const std::exception& e)
    {
        error(e.what(), __LINE__, __FUNCTION__, __FILE__);
    }
}
// Rescale every keypoint of every person: x *= scaleX, y *= scaleY.
// `keypoints` must be a [#people x #parts x 3] array (x, y, score); scores are not modified.
// No-op if both factors are exactly 1.
void scaleKeypoints(Array<float>& keypoints, const float scaleX, const float scaleY)
{
    try
    {
        // Bug fix: was `scaleX != 1. && scaleY != 1.`, which skipped scaling entirely
        // whenever only ONE of the two factors differed from 1.
        if (scaleX != 1. || scaleY != 1.)
        {
            // Error check: enforce [sizeA x sizeB x 3] layout
            if (!keypoints.empty() && keypoints.getSize(2) != 3)
                error(errorMessage, __LINE__, __FUNCTION__, __FILE__);
            // Get #people and #parts
            const auto numberPeople = keypoints.getSize(0);
            const auto numberParts = keypoints.getSize(1);
            // For each person
            for (auto person = 0 ; person < numberPeople ; person++)
            {
                // For each body part
                for (auto part = 0 ; part < numberParts ; part++)
                {
                    // Index of the (x, y, score) triplet for this person/part
                    const auto finalIndex = 3*(person*numberParts + part);
                    keypoints[finalIndex] *= scaleX;
                    keypoints[finalIndex+1] *= scaleY;
                }
            }
        }
    }
    catch (const std::exception& e)
    {
        error(e.what(), __LINE__, __FUNCTION__, __FILE__);
    }
}
// Affine-transform every keypoint of every person: x = x*scaleX + offsetX, y = y*scaleY + offsetY.
// `keypoints` must be a [#people x #parts x 3] array (x, y, score); scores are not modified.
// No-op only when the transform is the identity (both scales 1 and both offsets 0).
void scaleKeypoints(Array<float>& keypoints, const float scaleX, const float scaleY, const float offsetX, const float offsetY)
{
    try
    {
        // Bug fix: was `scaleX != 1. && scaleY != 1.`, which (a) skipped the transform
        // when only one scale differed from 1, and (b) silently dropped non-zero offsets
        // whenever both scales were 1.
        if (scaleX != 1. || scaleY != 1. || offsetX != 0. || offsetY != 0.)
        {
            // Error check: enforce [sizeA x sizeB x 3] layout
            if (!keypoints.empty() && keypoints.getSize(2) != 3)
                error(errorMessage, __LINE__, __FUNCTION__, __FILE__);
            // Get #people and #parts
            const auto numberPeople = keypoints.getSize(0);
            const auto numberParts = keypoints.getSize(1);
            // For each person
            for (auto person = 0 ; person < numberPeople ; person++)
            {
                // For each body part
                for (auto part = 0 ; part < numberParts ; part++)
                {
                    // Index of the (x, y, score) triplet for this person/part
                    const auto finalIndex = 3*(person*numberParts + part);
                    keypoints[finalIndex] = keypoints[finalIndex] * scaleX + offsetX;
                    keypoints[finalIndex+1] = keypoints[finalIndex+1] * scaleY + offsetY;
                }
            }
        }
    }
    catch (const std::exception& e)
    {
        error(e.what(), __LINE__, __FUNCTION__, __FILE__);
    }
}
}
#include <cuda.h>
#include <cuda_runtime_api.h>
#include <openpose/experimental/hand/handParameters.hpp>
#include <openpose/experimental/hand/handRenderGpu.hpp>
#ifndef CPU_ONLY
#include <cuda.h>
#include <cuda_runtime_api.h>
#endif
#include <openpose/experimental/hand/renderHand.hpp>
#include <openpose/utilities/cuda.hpp>
#include <openpose/utilities/errorAndLog.hpp>
#include <openpose/experimental/hand/handRenderer.hpp>
......@@ -10,9 +11,10 @@ namespace op
{
namespace experimental
{
HandRenderer::HandRenderer(const Point<int>& frameSize) :
Renderer{(unsigned long long)(frameSize.area() * 3), HAND_DEFAULT_ALPHA_KEYPOINT, HAND_DEFAULT_ALPHA_HEAT_MAP},
mFrameSize{frameSize}
HandRenderer::HandRenderer(const Point<int>& frameSize, const float alphaKeypoint, const float alphaHeatMap, const RenderMode renderMode) :
Renderer{(unsigned long long)(frameSize.area() * 3), alphaKeypoint, alphaHeatMap},
mFrameSize{frameSize},
mRenderMode{renderMode}
{
}
......@@ -21,7 +23,9 @@ namespace op
try
{
// Free CUDA pointers - Note that if pointers are 0 (i.e. nullptr), no operation is performed.
cudaFree(pGpuHands);
#ifndef CPU_ONLY
cudaFree(pGpuHands);
#endif
}
catch (const std::exception& e)
{
......@@ -33,8 +37,13 @@ namespace op
{
try
{
log("Starting initialization on thread.", Priority::Low, __LINE__, __FUNCTION__, __FILE__);
Renderer::initializationOnThread();
cudaMalloc((void**)(&pGpuHands), 2*HAND_NUMBER_PARTS * 3 * sizeof(float));
// GPU memory allocation for rendering
#ifndef CPU_ONLY
cudaMalloc((void**)(&pGpuHands), 2*HAND_NUMBER_PARTS * 3 * sizeof(float));
#endif
log("Finished initialization on thread.", Priority::Low, __LINE__, __FUNCTION__, __FILE__);
}
catch (const std::exception& e)
{
......@@ -42,25 +51,67 @@ namespace op
}
}
void HandRenderer::renderHands(Array<float>& outputData, const Array<float>& handKeypoints)
// Render the detected hand keypoints on top of `outputData`, dispatching to the
// CPU or GPU implementation according to the RenderMode chosen at construction.
void HandRenderer::renderHand(Array<float>& outputData, const Array<float>& handKeypoints)
{
    try
    {
        // A valid output buffer is mandatory
        if (outputData.empty())
            error("Empty Array<float> outputData.", __LINE__, __FUNCTION__, __FILE__);
        // Dispatch on the configured render mode (CPU vs. GPU)
        (mRenderMode == RenderMode::Cpu
            ? renderHandCpu(outputData, handKeypoints)
            : renderHandGpu(outputData, handKeypoints));
    }
    catch (const std::exception& e)
    {
        error(e.what(), __LINE__, __FUNCTION__, __FILE__);
    }
}
// CPU path: thin wrapper delegating to the free-function keypoint rasterizer.
void HandRenderer::renderHandCpu(Array<float>& outputData, const Array<float>& handKeypoints)
{
    try
    {
        // Draw circles and limb lines directly into the CPU frame buffer
        renderHandKeypointsCpu(outputData, handKeypoints);
    }
    catch (const std::exception& e)
    {
        error(e.what(), __LINE__, __FUNCTION__, __FILE__);
    }
}
void HandRenderer::renderHandGpu(Array<float>& outputData, const Array<float>& handKeypoints)
{
try
{
const auto elementRendered = spElementToRender->load(); // I prefer std::round(T&) over intRound(T) for std::atomic
const auto numberPeople = handKeypoints.getSize(0);
// GPU rendering
if (numberPeople > 0 && elementRendered == 0)
{
cpuToGpuMemoryIfNotCopiedYet(outputData.getPtr());
// Draw faceKeypoints
cudaMemcpy(pGpuHands, handKeypoints.getConstPtr(), 2*HAND_NUMBER_PARTS*3 * sizeof(float), cudaMemcpyHostToDevice);
renderHandsGpu(*spGpuMemoryPtr, mFrameSize, pGpuHands, handKeypoints.getSize(0));
// CUDA check
#ifndef CPU_ONLY
const auto elementRendered = spElementToRender->load(); // I prefer std::round(T&) over intRound(T) for std::atomic
const auto numberPeople = handKeypoints.getSize(0);
// GPU rendering
if (numberPeople > 0 && elementRendered == 0)
{
cpuToGpuMemoryIfNotCopiedYet(outputData.getPtr());
// Draw handKeypoints
cudaMemcpy(pGpuHands, handKeypoints.getConstPtr(), 2*HAND_NUMBER_PARTS*3 * sizeof(float), cudaMemcpyHostToDevice);
renderHandKeypointsGpu(*spGpuMemoryPtr, mFrameSize, pGpuHands, handKeypoints.getSize(0));
// CUDA check
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
}
// GPU memory to CPU if last renderer
gpuToCpuMemoryIfLastRenderer(outputData.getPtr());
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
}
// GPU memory to CPU if last renderer
gpuToCpuMemoryIfLastRenderer(outputData.getPtr());
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
// CPU_ONLY mode
#else
error("GPU rendering not available if `CPU_ONLY` is set.", __LINE__, __FUNCTION__, __FILE__);
UNUSED(outputData);
UNUSED(handKeypoints);
#endif
}
catch (const std::exception& e)
{
......
#include <openpose/experimental/hand/handParameters.hpp>
#include <openpose/utilities/errorAndLog.hpp>
#include <openpose/utilities/fastMath.hpp>
#include <openpose/utilities/keypoint.hpp>
#include <openpose/experimental/hand/renderHand.hpp>
namespace op
{
const std::vector<float> COLORS{HAND_COLORS_RENDER};
// Rasterize hand keypoints (circles + limb lines) into `frameArray` on the CPU.
// No-op when the frame buffer is empty.
void renderHandKeypointsCpu(Array<float>& frameArray, const Array<float>& handKeypoints)
{
    try
    {
        // Nothing to draw without a frame
        if (frameArray.empty())
            return;
        // Drawing parameters: circle radius and line width relative to frame size
        const auto thicknessCircleRatio = 1.f/200.f;
        const auto thicknessLineRatioWRTCircle = 0.75f;
        const auto& limbPairs = HAND_PAIRS_RENDER;
        // Delegate to the generic keypoint renderer
        renderKeypointsCpu(frameArray, handKeypoints, limbPairs, COLORS, thicknessCircleRatio,
                           thicknessLineRatioWRTCircle);
    }
    catch (const std::exception& e)
    {
        error(e.what(), __LINE__, __FUNCTION__, __FILE__);
    }
}
......@@ -3,37 +3,12 @@
#include <openpose/utilities/cuda.hpp>
#include <openpose/utilities/cuda.hu>
#include <openpose/utilities/render.hu>
#include <openpose/experimental/hand/handRenderGpu.hpp>
#include <openpose/experimental/hand/renderHand.hpp>
namespace op
{
__constant__ const unsigned int PART_PAIRS_GPU[] = HAND_PAIRS_TO_RENDER;
__constant__ const float RGB_COLORS[] = {
179.f, 0.f, 0.f,
204.f, 0.f, 0.f,
230.f, 0.f, 0.f,
255.f, 0.f, 0.f,
143.f, 179.f, 0.f,
163.f, 204.f, 0.f,
184.f, 230.f, 0.f,
204.f, 255.f, 0.f,
0.f, 179.f, 71.f,
0.f, 204.f, 82.f,
0.f, 230.f, 92.f,
0.f, 255.f, 102.f,
0.f, 71.f, 179.f,
0.f, 82.f, 204.f,
0.f, 92.f, 230.f,
0.f, 102.f, 255.f,
143.f, 0.f, 179.f,
163.f, 0.f, 204.f,
184.f, 0.f, 230.f,
204.f, 0.f, 255.f,
179.f, 179.f, 179.f,
179.f, 179.f, 179.f,
179.f, 179.f, 179.f,
179.f, 179.f, 179.f
};
__constant__ const unsigned int PART_PAIRS_GPU[] = HAND_PAIRS_RENDER_GPU;
__constant__ const float COLORS[] = {HAND_COLORS_RENDER};
......@@ -51,18 +26,19 @@ namespace op
// Other parameters
const auto numberPartPairs = sizeof(PART_PAIRS_GPU) / (2*sizeof(PART_PAIRS_GPU[0]));
const auto numberColors = sizeof(RGB_COLORS) / (3*sizeof(RGB_COLORS[0]));
const auto numberColors = sizeof(COLORS) / (3*sizeof(COLORS[0]));
const auto radius = fastMin(targetWidth, targetHeight) / 100.f;
const auto stickwidth = fastMin(targetWidth, targetHeight) / 80.f;
// Render key points
renderKeypoints(targetPtr, sharedMaxs, sharedMins, sharedScaleF,
globalIdx, x, y, targetWidth, targetHeight, handsPtr, PART_PAIRS_GPU, numberHands,
HAND_NUMBER_PARTS, numberPartPairs, RGB_COLORS, numberColors,
HAND_NUMBER_PARTS, numberPartPairs, COLORS, numberColors,
radius, stickwidth, threshold, alphaColorToAdd);
}
void renderHandsGpu(float* framePtr, const Point<int>& frameSize, const float* const handsPtr, const int numberHands, const float alphaColorToAdd)
void renderHandKeypointsGpu(float* framePtr, const Point<int>& frameSize, const float* const handsPtr, const int numberHands,
const float alphaColorToAdd)
{
try
{
......@@ -72,7 +48,8 @@ namespace op
dim3 threadsPerBlock;
dim3 numBlocks;
std::tie(threadsPerBlock, numBlocks) = getNumberCudaThreadsAndBlocks(frameSize);
renderHandsParts<<<threadsPerBlock, numBlocks>>>(framePtr, frameSize.x, frameSize.y, handsPtr, numberHands, threshold, alphaColorToAdd);
renderHandsParts<<<threadsPerBlock, numBlocks>>>(framePtr, frameSize.x, frameSize.y, handsPtr, numberHands, threshold,
alphaColorToAdd);
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
}
}
......
......@@ -2,6 +2,7 @@
#include <openpose/pose/poseParameters.hpp>
#include <openpose/utilities/check.hpp>
#include <openpose/utilities/errorAndLog.hpp>
#include <openpose/utilities/keypoint.hpp>
#include <openpose/face/faceDetector.hpp>
namespace op
......@@ -16,21 +17,6 @@ namespace op
{
}
// Euclidean distance (in pixels) between two keypoints of a pose, where `posePtr`
// stores keypoints as consecutive (x, y, score) triplets. Returns -1.f on error.
float getDistance(const float* posePtr, const int elementA, const int elementB)
{
    try
    {
        // Pointers to the (x, y, score) triplets of each keypoint
        const auto* const keypointA = &posePtr[3*elementA];
        const auto* const keypointB = &posePtr[3*elementB];
        const auto diffX = keypointA[0] - keypointB[0];
        const auto diffY = keypointA[1] - keypointB[1];
        return std::sqrt(diffX*diffX + diffY*diffY);
    }
    catch (const std::exception& e)
    {
        error(e.what(), __LINE__, __FUNCTION__, __FILE__);
        return -1.f;
    }
}
inline Rectangle<float> getFaceFromPoseKeypoints(const Array<float>& poseKeypoints, const unsigned int personIndex, const unsigned int neck,
const unsigned int nose, const unsigned int lEar, const unsigned int rEar,
const unsigned int lEye, const unsigned int rEye, const float threshold)
......
......@@ -10,9 +10,11 @@
namespace op
{
FaceExtractor::FaceExtractor(const Point<int>& netInputSize, const Point<int>& netOutputSize, const std::string& modelFolder, const int gpuId) :
FaceExtractor::FaceExtractor(const Point<int>& netInputSize, const Point<int>& netOutputSize, const std::string& modelFolder,
const int gpuId) :
mNetOutputSize{netOutputSize},
spNet{std::make_shared<NetCaffe>(std::array<int,4>{1, 3, mNetOutputSize.y, mNetOutputSize.x}, modelFolder + FACE_PROTOTXT, modelFolder + FACE_TRAINED_MODEL, gpuId)},
spNet{std::make_shared<NetCaffe>(std::array<int,4>{1, 3, mNetOutputSize.y, mNetOutputSize.x}, modelFolder + FACE_PROTOTXT,
modelFolder + FACE_TRAINED_MODEL, gpuId)},
spResizeAndMergeCaffe{std::make_shared<ResizeAndMergeCaffe<float>>()},
spNmsCaffe{std::make_shared<NmsCaffe<float>>()},
mFaceImageCrop{mNetOutputSize.area()*3}
......@@ -21,6 +23,7 @@ namespace op
{
checkE(netOutputSize.x, netInputSize.x, "Net input and output size must be equal.", __LINE__, __FUNCTION__, __FILE__);
checkE(netOutputSize.y, netInputSize.y, "Net input and output size must be equal.", __LINE__, __FUNCTION__, __FILE__);
checkE(netInputSize.x, netInputSize.y, "Net input size must be squared.", __LINE__, __FUNCTION__, __FILE__);
// Properties
for (auto& property : mProperties)
property = 0.;
......@@ -77,24 +80,25 @@ namespace op
// Set face size
const auto numberPeople = (int)faceRectangles.size();
mFaceKeypoints.reset({numberPeople, FACE_NUMBER_PARTS, 3}, 0);
// // Commented lines are for debugging
// log("\nAreas:");
// cv::Mat cvInputDataCopy = cvInputData.clone();
mFaceKeypoints.reset({numberPeople, (int)FACE_NUMBER_PARTS, 3}, 0);
// // Debugging
// cv::Mat cvInputDataCopy = cvInputData.clone();
// Extract face keypoints for each person
for (auto person = 0 ; person < numberPeople ; person++)
{
// Only consider faces with a minimum pixel area
const auto faceAreaSquared = std::sqrt(faceRectangles.at(person).area());
// // Debugging
// log(std::to_string(cvInputData.cols) + " " + std::to_string(cvInputData.rows));
// cv::rectangle(cvInputDataCopy,
// cv::Point{(int)faceRectangle.x, (int)faceRectangle.y},
// cv::Point{(int)faceRectangle.bottomRight().x, (int)faceRectangle.bottomRight().y},
// cv::Scalar{0,0,255}, 2);
// Get parts
if (faceAreaSquared > 50)
{
const auto& faceRectangle = faceRectangles.at(person);
// log(faceAreaSquared);
// log(std::to_string(cvInputData.cols) + " " + std::to_string(cvInputData.rows));
// cv::rectangle(cvInputDataCopy,
// cv::Point{(int)faceRectangle.x, (int)faceRectangle.y},
// cv::Point{(int)faceRectangle.bottomRight().x, (int)faceRectangle.bottomRight().y},
// cv::Scalar{255,0,255}, 2);
// Get face position(s)
const Point<float> faceCenterPosition{faceRectangle.topLeft()};
const auto faceSize = fastMax(faceRectangle.width, faceRectangle.height);
......@@ -112,8 +116,11 @@ namespace op
// cv::Mat -> float*
uCharCvMatToFloatPtr(mFaceImageCrop.getPtr(), faceImage, true);
// if (person < 5)
// cv::imshow("faceImage" + std::to_string(person), faceImage);
// // Debugging
// if (person < 5)
// cv::imshow("faceImage" + std::to_string(person), faceImage);
// 1. Caffe deep network
auto* inputDataGpuPtr = spNet->getInputDataGpuPtr();
cudaMemcpy(inputDataGpuPtr, mFaceImageCrop.getPtr(), mNetOutputSize.area() * 3 * sizeof(float), cudaMemcpyHostToDevice);
......@@ -162,7 +169,7 @@ namespace op
const auto x = facePeaksPtr[xyIndex];
const auto y = facePeaksPtr[xyIndex + 1];
const auto score = facePeaksPtr[xyIndex + 2];
const auto baseIndex = (person * FACE_NUMBER_PARTS + part) * 3;
const auto baseIndex = mFaceKeypoints.getSize(2) * (person * mFaceKeypoints.getSize(1) + part);
mFaceKeypoints[baseIndex] = scaleInputToOutput * (Mscaling.at<double>(0,0) * x + Mscaling.at<double>(0,1) * y + Mscaling.at<double>(0,2));
mFaceKeypoints[baseIndex+1] = scaleInputToOutput * (Mscaling.at<double>(1,0) * x + Mscaling.at<double>(1,1) * y + Mscaling.at<double>(1,2));
mFaceKeypoints[baseIndex+2] = score;
......@@ -170,7 +177,8 @@ namespace op
}
}
}
// cv::imshow("AcvInputDataCopy", cvInputDataCopy);
// // Debugging
// cv::imshow("AcvInputDataCopy", cvInputDataCopy);
}
else
mFaceKeypoints.reset();
......
#include <cuda.h>
#include <cuda_runtime_api.h>
#include <openpose/face/faceParameters.hpp>
#include <openpose/face/faceRenderGpu.hpp>
#ifndef CPU_ONLY
#include <cuda.h>
#include <cuda_runtime_api.h>
#endif
#include <openpose/face/renderFace.hpp>
#include <openpose/utilities/cuda.hpp>
#include <openpose/utilities/errorAndLog.hpp>
#include <openpose/face/faceRenderer.hpp>
namespace op
{
FaceRenderer::FaceRenderer(const Point<int>& frameSize, const float alphaKeypoint, const float alphaHeatMap) :
FaceRenderer::FaceRenderer(const Point<int>& frameSize, const float alphaKeypoint, const float alphaHeatMap, const RenderMode renderMode) :
Renderer{(unsigned long long)(frameSize.area() * 3), alphaKeypoint, alphaHeatMap},
mFrameSize{frameSize}
mFrameSize{frameSize},
mRenderMode{renderMode}
{
}
......@@ -19,7 +21,9 @@ namespace op
try
{
// Free CUDA pointers - Note that if pointers are 0 (i.e. nullptr), no operation is performed.
cudaFree(pGpuFace);
#ifndef CPU_ONLY
cudaFree(pGpuFace);
#endif
}
catch (const std::exception& e)
{
......@@ -31,8 +35,13 @@ namespace op
{
try
{
log("Starting initialization on thread.", Priority::Low, __LINE__, __FUNCTION__, __FILE__);
Renderer::initializationOnThread();
cudaMalloc((void**)(&pGpuFace), POSE_MAX_PEOPLE * FACE_NUMBER_PARTS * 3 * sizeof(float));
// GPU memory allocation for rendering
#ifndef CPU_ONLY
cudaMalloc((void**)(&pGpuFace), POSE_MAX_PEOPLE * FACE_NUMBER_PARTS * 3 * sizeof(float));
#endif
log("Finished initialization on thread.", Priority::Low, __LINE__, __FUNCTION__, __FILE__);
}
catch (const std::exception& e)
{
......@@ -44,21 +53,63 @@ namespace op
{
try
{
const auto elementRendered = spElementToRender->load(); // I prefer std::round(T&) over intRound(T) for std::atomic
const auto numberPeople = faceKeypoints.getSize(0);
// Security checks
if (outputData.empty())
error("Empty Array<float> outputData.", __LINE__, __FUNCTION__, __FILE__);
// CPU rendering
if (mRenderMode == RenderMode::Cpu)
renderFaceCpu(outputData, faceKeypoints);
// GPU rendering
else
renderFaceGpu(outputData, faceKeypoints);
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
}
}
// CPU path: thin wrapper delegating to the free-function face keypoint rasterizer.
void FaceRenderer::renderFaceCpu(Array<float>& outputData, const Array<float>& faceKeypoints)
{
    try
    {
        // Draw circles and limb lines directly into the CPU frame buffer
        renderFaceKeypointsCpu(outputData, faceKeypoints);
    }
    catch (const std::exception& e)
    {
        error(e.what(), __LINE__, __FUNCTION__, __FILE__);
    }
}
void FaceRenderer::renderFaceGpu(Array<float>& outputData, const Array<float>& faceKeypoints)
{
try
{
// GPU rendering
if (numberPeople > 0 && elementRendered == 0)
{
cpuToGpuMemoryIfNotCopiedYet(outputData.getPtr());
// Draw faceKeypoints
cudaMemcpy(pGpuFace, faceKeypoints.getConstPtr(), faceKeypoints.getSize(0) * FACE_NUMBER_PARTS * 3 * sizeof(float), cudaMemcpyHostToDevice);
renderFaceGpu(*spGpuMemoryPtr, mFrameSize, pGpuFace, faceKeypoints.getSize(0), getAlphaKeypoint());
// CUDA check
#ifndef CPU_ONLY
const auto elementRendered = spElementToRender->load(); // I prefer std::round(T&) over intRound(T) for std::atomic
const auto numberPeople = faceKeypoints.getSize(0);
if (numberPeople > 0 && elementRendered == 0)
{
cpuToGpuMemoryIfNotCopiedYet(outputData.getPtr());
// Draw faceKeypoints
cudaMemcpy(pGpuFace, faceKeypoints.getConstPtr(), faceKeypoints.getSize(0) * FACE_NUMBER_PARTS * 3 * sizeof(float),
cudaMemcpyHostToDevice);
renderFaceKeypointsGpu(*spGpuMemoryPtr, mFrameSize, pGpuFace, faceKeypoints.getSize(0), getAlphaKeypoint());
// CUDA check
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
}
// GPU memory to CPU if last renderer
gpuToCpuMemoryIfLastRenderer(outputData.getPtr());
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
}
// GPU memory to CPU if last renderer
gpuToCpuMemoryIfLastRenderer(outputData.getPtr());
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
// CPU_ONLY mode
#else
error("GPU rendering not available if `CPU_ONLY` is set.", __LINE__, __FUNCTION__, __FILE__);
UNUSED(outputData);
UNUSED(faceKeypoints);
#endif
}
catch (const std::exception& e)
{
......
#include <openpose/face/faceParameters.hpp>
#include <openpose/utilities/errorAndLog.hpp>
#include <openpose/utilities/fastMath.hpp>
#include <openpose/utilities/keypoint.hpp>
#include <openpose/face/renderFace.hpp>
namespace op
{
    // Flattened list of RGB components used to color face keypoints/limbs.
    const std::vector<float> COLORS{FACE_COLORS_RENDER};

    // Draws the face keypoints of every person onto frameArray on the CPU,
    // delegating the actual drawing to the generic renderKeypointsCpu.
    void renderFaceKeypointsCpu(Array<float>& frameArray, const Array<float>& faceKeypoints)
    {
        try
        {
            // Nothing to render on an empty output frame.
            if (frameArray.empty())
                return;
            // Circle thickness relative to the frame size, and line thickness
            // relative to the circle thickness.
            const auto circleThicknessRatio = 1.f/75.f;
            const auto lineWrtCircleRatio = 0.334f;
            const auto& facePairs = FACE_PAIRS_RENDER;
            renderKeypointsCpu(frameArray, faceKeypoints, facePairs, COLORS,
                               circleThicknessRatio, lineWrtCircleRatio);
        }
        catch (const std::exception& e)
        {
            error(e.what(), __LINE__, __FUNCTION__, __FILE__);
        }
    }
}
......@@ -3,15 +3,13 @@
#include <openpose/utilities/cuda.hpp>
#include <openpose/utilities/cuda.hu>
#include <openpose/utilities/render.hu>
#include <openpose/face/faceRenderGpu.hpp>
#include <openpose/face/renderFace.hpp>
namespace op
{
const dim3 THREADS_PER_BLOCK{128, 128, 1};
__constant__ const unsigned int PART_PAIRS_GPU[] = FACE_PAIRS_TO_RENDER;
__constant__ const float RGB_COLORS[] = {
255.f, 255.f, 255.f,
};
__constant__ const unsigned int PART_PAIRS_GPU[] = FACE_PAIRS_RENDER_GPU;
__constant__ const float COLORS[] = {FACE_COLORS_RENDER};
......@@ -29,18 +27,19 @@ namespace op
// Other parameters
const auto numberPartPairs = sizeof(PART_PAIRS_GPU) / (2*sizeof(PART_PAIRS_GPU[0]));
const auto numberColors = sizeof(RGB_COLORS) / (3*sizeof(RGB_COLORS[0]));
const auto numberColors = sizeof(COLORS) / (3*sizeof(COLORS[0]));
const auto radius = fastMin(targetWidth, targetHeight) / 120.f;
const auto stickwidth = fastMin(targetWidth, targetHeight) / 250.f;
// Render key points
renderKeypoints(targetPtr, sharedMaxs, sharedMins, sharedScaleF,
globalIdx, x, y, targetWidth, targetHeight, facePtr, PART_PAIRS_GPU, numberFaces,
FACE_NUMBER_PARTS, numberPartPairs, RGB_COLORS, numberColors,
FACE_NUMBER_PARTS, numberPartPairs, COLORS, numberColors,
radius, stickwidth, threshold, alphaColorToAdd);
}
void renderFaceGpu(float* framePtr, const Point<int>& frameSize, const float* const facePtr, const int numberFaces, const float alphaColorToAdd)
void renderFaceKeypointsGpu(float* framePtr, const Point<int>& frameSize, const float* const facePtr, const int numberFaces,
const float alphaColorToAdd)
{
try
{
......@@ -48,7 +47,8 @@ namespace op
{
const auto threshold = 0.4f;
const auto numBlocks = getNumberCudaBlocks(frameSize, THREADS_PER_BLOCK);
renderFaceParts<<<THREADS_PER_BLOCK, numBlocks>>>(framePtr, frameSize.x, frameSize.y, facePtr, numberFaces, threshold, alphaColorToAdd);
renderFaceParts<<<THREADS_PER_BLOCK, numBlocks>>>(framePtr, frameSize.x, frameSize.y, facePtr, numberFaces, threshold,
alphaColorToAdd);
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
}
}
......
......@@ -7,78 +7,6 @@
namespace op
{
// In case I wanna order the pose keypoints by area
// inline int getPersonArea(const float* posePtr, const int numberBodyParts, const float threshold)
// {
// try
// {
// if (numberBodyParts < 1)
// error("Number body parts must be > 0", __LINE__, __FUNCTION__, __FILE__);
// unsigned int minX = -1;
// unsigned int maxX = 0u;
// unsigned int minY = -1;
// unsigned int maxY = 0u;
// for (auto part = 0 ; part < numberBodyParts ; part++)
// {
// const auto score = posePtr[3*part + 2];
// if (score > threshold)
// {
// const auto x = posePtr[3*part];
// const auto y = posePtr[3*part + 1];
// // Set X
// if (maxX < x)
// maxX = x;
// if (minX > x)
// minX = x;
// // Set Y
// if (maxY < y)
// maxY = y;
// if (minY > y)
// minY = y;
// }
// }
// return (maxX - minX) * (maxY - minY);
// }
// catch (const std::exception& e)
// {
// error(e.what(), __LINE__, __FUNCTION__, __FILE__);
// return 0;
// }
// }
// inline int getPersonWithMaxArea(const Array<float>& poseKeypoints, const float threshold)
// {
// try
// {
// if (!poseKeypoints.empty())
// {
// const auto numberPeople = poseKeypoints.getSize(0);
// const auto numberBodyParts = poseKeypoints.getSize(1);
// const auto area = numberBodyParts * poseKeypoints.getSize(2);
// auto biggestPoseIndex = -1;
// auto biggestArea = -1;
// for (auto person = 0 ; person < numberPeople ; person++)
// {
// const auto newPersonArea = getPersonArea(&poseKeypoints[person*area], numberBodyParts, threshold);
// if (newPersonArea > biggestArea)
// {
// biggestArea = newPersonArea;
// biggestPoseIndex = person;
// }
// }
// return biggestPoseIndex;
// }
// else
// return -1;
// }
// catch (const std::exception& e)
// {
// error(e.what(), __LINE__, __FUNCTION__, __FILE__);
// return -1;
// }
// }
bool heatMapTypesHas(const std::vector<HeatMapType>& heatMapTypes, const HeatMapType heatMapType)
{
try
......
......@@ -3,7 +3,7 @@
namespace op
{
const std::array<std::map<unsigned int, std::string>, 3> POSE_BODY_PART_MAPPING{ POSE_COCO_BODY_PARTS, POSE_MPI_BODY_PARTS, POSE_MPI_BODY_PARTS };
const std::array<std::map<unsigned int, std::string>, 3> POSE_BODY_PART_MAPPING{ POSE_COCO_BODY_PARTS, POSE_MPI_BODY_PARTS, POSE_MPI_BODY_PARTS };
unsigned int poseBodyPartMapStringToKey(const PoseModel poseModel, const std::vector<std::string>& strings)
{
......@@ -24,29 +24,29 @@ namespace op
}
}
// Convenience overload: resolves a single body-part name to its numeric key by
// forwarding to the std::vector<std::string> overload.
unsigned int poseBodyPartMapStringToKey(const PoseModel poseModel, const std::string& string)
{
    try
    {
        return poseBodyPartMapStringToKey(poseModel, std::vector<std::string>{string});
    }
    catch (const std::exception& e)
    {
        error(e.what(), __LINE__, __FUNCTION__, __FILE__);
        // Fallback if error() returns -- presumably 0 is never a meaningful
        // lookup result; TODO(review): confirm callers treat 0 accordingly.
        return 0;
    }
}
unsigned int poseBodyPartMapStringToKey(const PoseModel poseModel, const std::string& string)
{
try
{
return poseBodyPartMapStringToKey(poseModel, std::vector<std::string>{string});
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
return 0;
}
}
// Returns the part-index -> part-name map for the given pose model
// (one entry per model in POSE_BODY_PART_MAPPING).
const std::map<unsigned int, std::string>& getPoseBodyPartMapping(const PoseModel poseModel)
{
    try
    {
        // at() bounds-checks the model index and throws on an invalid model.
        return POSE_BODY_PART_MAPPING.at((int)poseModel);
    }
    catch (const std::exception& e)
    {
        error(e.what(), __LINE__, __FUNCTION__, __FILE__);
        // Reached only if error() returns; NOTE(review): operator[] is
        // unchecked here, so an out-of-range poseModel would be UB.
        return POSE_BODY_PART_MAPPING[(int)poseModel];
    }
}
const std::map<unsigned int, std::string>& getPoseBodyPartMapping(const PoseModel poseModel)
{
try
{
return POSE_BODY_PART_MAPPING.at((int)poseModel);
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
return POSE_BODY_PART_MAPPING[(int)poseModel];
}
}
}
#include <cuda.h>
#include <cuda_runtime_api.h>
#ifndef CPU_ONLY
#include <cuda.h>
#include <cuda_runtime_api.h>
#endif
#include <openpose/pose/poseParameters.hpp>
#include <openpose/pose/poseRenderGpu.hpp>
#include <openpose/pose/renderPose.hpp>
#include <openpose/utilities/cuda.hpp>
#include <openpose/utilities/errorAndLog.hpp>
#include <openpose/pose/poseRenderer.hpp>
......@@ -37,8 +39,10 @@ namespace op
}
}
PoseRenderer::PoseRenderer(const Point<int>& heatMapsSize, const Point<int>& outputSize, const PoseModel poseModel, const std::shared_ptr<PoseExtractor>& poseExtractor,
const bool blendOriginalFrame, const float alphaKeypoint, const float alphaHeatMap, const unsigned int elementToRender) :
PoseRenderer::PoseRenderer(const Point<int>& heatMapsSize, const Point<int>& outputSize, const PoseModel poseModel,
const std::shared_ptr<PoseExtractor>& poseExtractor, const bool blendOriginalFrame,
const float alphaKeypoint, const float alphaHeatMap, const unsigned int elementToRender,
const RenderMode renderMode) :
// #body elements to render = #body parts (size()) + #body part pair connections + 3 (+whole pose +whole heatmaps +PAFs)
// POSE_BODY_PART_MAPPING crashes on Windows, replaced by getPoseBodyPartMapping
Renderer{(unsigned long long)(outputSize.area() * 3), alphaKeypoint, alphaHeatMap, elementToRender,
......@@ -48,6 +52,7 @@ namespace op
mPoseModel{poseModel},
mPartIndexToName{createPartToName(poseModel)},
spPoseExtractor{poseExtractor},
mRenderMode{renderMode},
mBlendOriginalFrame{blendOriginalFrame},
mShowGooglyEyes{false},
pGpuPose{nullptr}
......@@ -59,7 +64,9 @@ namespace op
try
{
// Free CUDA pointers - Note that if pointers are 0 (i.e. nullptr), no operation is performed.
cudaFree(pGpuPose);
#ifndef CPU_ONLY
cudaFree(pGpuPose);
#endif
}
catch (const std::exception& e)
{
......@@ -71,11 +78,13 @@ namespace op
{
try
{
// GPU memory allocation for rendering
log("Starting initialization on thread.", Priority::Low, __LINE__, __FUNCTION__, __FILE__);
Renderer::initializationOnThread();
cudaMalloc((void**)(&pGpuPose), POSE_MAX_PEOPLE * POSE_NUMBER_BODY_PARTS[(int)mPoseModel] * 3 * sizeof(float));
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
// GPU memory allocation for rendering
#ifndef CPU_ONLY
cudaMalloc((void**)(&pGpuPose), POSE_MAX_PEOPLE * POSE_NUMBER_BODY_PARTS[(int)mPoseModel] * 3 * sizeof(float));
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
#endif
log("Finished initialization on thread.", Priority::Low, __LINE__, __FUNCTION__, __FILE__);
}
catch (const std::exception& e)
......@@ -134,73 +143,135 @@ namespace op
}
}
std::pair<int, std::string> PoseRenderer::renderPose(Array<float>& outputData, const Array<float>& poseKeypoints, const float scaleNetToOutput)
std::pair<int, std::string> PoseRenderer::renderPose(Array<float>& outputData, const Array<float>& poseKeypoints,
const float scaleNetToOutput)
{
try
{
// Security checks
if (outputData.empty())
error("Empty outputData.", __LINE__, __FUNCTION__, __FILE__);
error("Empty Array<float> outputData.", __LINE__, __FUNCTION__, __FILE__);
const auto elementRendered = spElementToRender->load(); // I prefer std::round(T&) over intRound(T) for std::atomic
const auto numberPeople = poseKeypoints.getSize(0);
std::string elementRenderedName;
// CPU rendering
if (mRenderMode == RenderMode::Cpu)
return renderPoseCpu(outputData, poseKeypoints, scaleNetToOutput);
// GPU rendering
if (numberPeople > 0 || elementRendered != 0 || !mBlendOriginalFrame)
else
return renderPoseGpu(outputData, poseKeypoints, scaleNetToOutput);
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
return std::make_pair(-1, "");
}
}
std::pair<int, std::string> PoseRenderer::renderPoseCpu(Array<float>& outputData, const Array<float>& poseKeypoints,
const float scaleNetToOutput)
{
try
{
const auto elementRendered = spElementToRender->load();
std::string elementRenderedName;
// CPU rendering
// Draw poseKeypoints
if (elementRendered == 0)
renderPoseKeypointsCpu(outputData, poseKeypoints, mPoseModel, mBlendOriginalFrame);
// Draw heat maps / PAFs
else
{
cpuToGpuMemoryIfNotCopiedYet(outputData.getPtr());
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
const auto numberBodyParts = POSE_NUMBER_BODY_PARTS[(int)mPoseModel];
const auto numberBodyPartsPlusBkg = numberBodyParts+1;
// Draw poseKeypoints
if (elementRendered == 0)
{
if (!poseKeypoints.empty())
cudaMemcpy(pGpuPose, poseKeypoints.getConstPtr(), numberPeople * numberBodyParts * 3 * sizeof(float), cudaMemcpyHostToDevice);
renderPoseGpu(*spGpuMemoryPtr, mPoseModel, numberPeople, mOutputSize, pGpuPose, mShowGooglyEyes, mBlendOriginalFrame, getAlphaKeypoint());
}
else
UNUSED(scaleNetToOutput);
error("CPU rendering only available for drawing keypoints, no heat maps nor PAFs.", __LINE__, __FUNCTION__, __FILE__);
}
// Return result
return std::make_pair(elementRendered, elementRenderedName);
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
return std::make_pair(-1, "");
}
}
std::pair<int, std::string> PoseRenderer::renderPoseGpu(Array<float>& outputData, const Array<float>& poseKeypoints,
const float scaleNetToOutput)
{
try
{
const auto elementRendered = spElementToRender->load();
std::string elementRenderedName;
// GPU rendering
#ifndef CPU_ONLY
const auto numberPeople = poseKeypoints.getSize(0);
if (numberPeople > 0 || elementRendered != 0 || !mBlendOriginalFrame)
{
if (scaleNetToOutput == -1.f)
error("Non valid scaleNetToOutput.", __LINE__, __FUNCTION__, __FILE__);
// Draw specific body part or bkg
if (elementRendered <= numberBodyPartsPlusBkg)
cpuToGpuMemoryIfNotCopiedYet(outputData.getPtr());
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
const auto numberBodyParts = POSE_NUMBER_BODY_PARTS[(int)mPoseModel];
const auto numberBodyPartsPlusBkg = numberBodyParts+1;
// Draw poseKeypoints
if (elementRendered == 0)
{
elementRenderedName = mPartIndexToName.at(elementRendered-1);
renderBodyPartGpu(*spGpuMemoryPtr, mPoseModel, mOutputSize, spPoseExtractor->getHeatMapCpuConstPtr(), mHeatMapsSize,
scaleNetToOutput, elementRendered, (mBlendOriginalFrame ? getAlphaHeatMap() : 1.f));
if (!poseKeypoints.empty())
cudaMemcpy(pGpuPose, poseKeypoints.getConstPtr(), numberPeople * numberBodyParts * 3 * sizeof(float),
cudaMemcpyHostToDevice);
renderPoseKeypointsGpu(*spGpuMemoryPtr, mPoseModel, numberPeople, mOutputSize, pGpuPose,
mShowGooglyEyes, mBlendOriginalFrame, getAlphaKeypoint());
}
// Draw PAFs (Part Affinity Fields)
else if (elementRendered == numberBodyPartsPlusBkg+1)
{
elementRenderedName = "Heatmaps";
renderBodyPartsGpu(*spGpuMemoryPtr, mPoseModel, mOutputSize, spPoseExtractor->getHeatMapCpuConstPtr(), mHeatMapsSize, scaleNetToOutput,
(mBlendOriginalFrame ? getAlphaHeatMap() : 1.f));
}
// Draw PAFs (Part Affinity Fields)
else if (elementRendered == numberBodyPartsPlusBkg+2)
{
elementRenderedName = "PAFs (Part Affinity Fields)";
renderPartAffinityFieldsGpu(*spGpuMemoryPtr, mPoseModel, mOutputSize, spPoseExtractor->getHeatMapCpuConstPtr(),
mHeatMapsSize, scaleNetToOutput, (mBlendOriginalFrame ? getAlphaHeatMap() : 1.f));
}
// Draw affinity between 2 body parts
else
{
const auto affinityPart = (elementRendered-numberBodyPartsPlusBkg-3)*2;
const auto affinityPartMapped = POSE_MAP_IDX[(int)mPoseModel].at(affinityPart);
elementRenderedName = mPartIndexToName.at(affinityPartMapped);
elementRenderedName = elementRenderedName.substr(0, elementRenderedName.find("("));
renderPartAffinityFieldGpu(*spGpuMemoryPtr, mPoseModel, mOutputSize, spPoseExtractor->getHeatMapCpuConstPtr(),
mHeatMapsSize, scaleNetToOutput, affinityPartMapped, (mBlendOriginalFrame ? getAlphaHeatMap() : 1.f));
}
if (scaleNetToOutput == -1.f)
error("Non valid scaleNetToOutput.", __LINE__, __FUNCTION__, __FILE__);
// Draw specific body part or bkg
if (elementRendered <= numberBodyPartsPlusBkg)
{
elementRenderedName = mPartIndexToName.at(elementRendered-1);
renderPoseHeatMapGpu(*spGpuMemoryPtr, mPoseModel, mOutputSize, spPoseExtractor->getHeatMapCpuConstPtr(),
mHeatMapsSize, scaleNetToOutput, elementRendered,
(mBlendOriginalFrame ? getAlphaHeatMap() : 1.f));
}
// Draw PAFs (Part Affinity Fields)
else if (elementRendered == numberBodyPartsPlusBkg+1)
{
elementRenderedName = "Heatmaps";
renderPoseHeatMapsGpu(*spGpuMemoryPtr, mPoseModel, mOutputSize, spPoseExtractor->getHeatMapCpuConstPtr(),
mHeatMapsSize, scaleNetToOutput, (mBlendOriginalFrame ? getAlphaHeatMap() : 1.f));
}
// Draw PAFs (Part Affinity Fields)
else if (elementRendered == numberBodyPartsPlusBkg+2)
{
elementRenderedName = "PAFs (Part Affinity Fields)";
renderPosePAFsGpu(*spGpuMemoryPtr, mPoseModel, mOutputSize, spPoseExtractor->getHeatMapCpuConstPtr(),
mHeatMapsSize, scaleNetToOutput, (mBlendOriginalFrame ? getAlphaHeatMap() : 1.f));
}
// Draw affinity between 2 body parts
else
{
const auto affinityPart = (elementRendered-numberBodyPartsPlusBkg-3)*2;
const auto affinityPartMapped = POSE_MAP_IDX[(int)mPoseModel].at(affinityPart);
elementRenderedName = mPartIndexToName.at(affinityPartMapped);
elementRenderedName = elementRenderedName.substr(0, elementRenderedName.find("("));
renderPosePAFGpu(*spGpuMemoryPtr, mPoseModel, mOutputSize, spPoseExtractor->getHeatMapCpuConstPtr(),
mHeatMapsSize, scaleNetToOutput, affinityPartMapped,
(mBlendOriginalFrame ? getAlphaHeatMap() : 1.f));
}
}
}
}
// GPU memory to CPU if last renderer
gpuToCpuMemoryIfLastRenderer(outputData.getPtr());
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
// GPU memory to CPU if last renderer
gpuToCpuMemoryIfLastRenderer(outputData.getPtr());
cudaCheck(__LINE__, __FUNCTION__, __FILE__);
// CPU_ONLY mode
#else
error("GPU rendering not available if `CPU_ONLY` is set.", __LINE__, __FUNCTION__, __FILE__);
UNUSED(elementRendered);
UNUSED(outputData);
UNUSED(poseKeypoints);
UNUSED(scaleNetToOutput);
#endif
// Return result
return std::make_pair(elementRendered, elementRenderedName);
}
catch (const std::exception& e)
......
#include <openpose/pose/poseParameters.hpp>
#include <openpose/utilities/errorAndLog.hpp>
#include <openpose/utilities/fastMath.hpp>
#include <openpose/utilities/keypoint.hpp>
#include <openpose/pose/renderPose.hpp>
namespace op
{
const std::vector<float> COCO_COLORS{POSE_COCO_COLORS_RENDER};
const std::vector<float> MPI_COLORS{POSE_MPI_COLORS_RENDER};
void renderPoseKeypointsCpu(Array<float>& frameArray, const Array<float>& poseKeypoints, const PoseModel poseModel, const bool blendOriginalFrame)
{
try
{
if (!frameArray.empty())
{
// Array<float> --> cv::Mat
auto frame = frameArray.getCvMat();
// Background
if (!blendOriginalFrame)
frame.setTo(0.f); // [0-255]
// Parameters
const auto thicknessCircleRatio = 1.f/75.f;
const auto thicknessLineRatioWRTCircle = 0.75f;
const auto& pairs = POSE_BODY_PART_PAIRS_RENDER[(int)poseModel];
const auto& colors = (poseModel == PoseModel::COCO_18 ? COCO_COLORS : MPI_COLORS);
// Render keypoints
renderKeypointsCpu(frameArray, poseKeypoints, pairs, colors, thicknessCircleRatio, thicknessLineRatioWRTCircle);
}
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
}
}
}
#include <utility> // std::pair
#include <openpose/pose/poseParameters.hpp>
#include <openpose/utilities/errorAndLog.hpp>
#include <openpose/utilities/cuda.hpp>
#include <openpose/utilities/cuda.hu>
#include <openpose/utilities/render.hu>
#include <openpose/pose/poseRenderGpu.hpp>
#include <openpose/pose/renderPose.hpp>
namespace op
{
// PI digits: http://www.piday.org/million/
__constant__ const float PI = 3.14159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706798214808651328230664709384460955058223172535940812848111745f;
__constant__ const unsigned int COCO_PAIRS_GPU[] = POSE_COCO_PAIRS_TO_RENDER;
__constant__ const unsigned int MPI_PAIRS_GPU[] = POSE_MPI_PAIRS_TO_RENDER;
__constant__ const float COCO_RGB_COLORS[] = {
255.f, 0.f, 0.f,
255.f, 85.f, 0.f,
255.f, 170.f, 0.f,
255.f, 255.f, 0.f,
170.f, 255.f, 0.f,
85.f, 255.f, 0.f,
0.f, 255.f, 0.f,
0.f, 255.f, 85.f,
0.f, 255.f, 170.f,
0.f, 255.f, 255.f,
0.f, 170.f, 255.f,
0.f, 85.f, 255.f,
0.f, 0.f, 255.f,
85.f, 0.f, 255.f,
170.f, 0.f, 255.f,
255.f, 0.f, 255.f,
255.f, 0.f, 170.f,
255.f, 0.f, 85.f,
};
__constant__ const float MPI_RGB_COLORS[] = { // MPI colors chosen such that they are closed to COCO colors
255.f, 0.f, 85.f,
255.f, 0.f, 0.f,
255.f, 85.f, 0.f,
255.f, 170.f, 0.f,
255.f, 255.f, 0.f,
170.f, 255.f, 0.f,
85.f, 255.f, 0.f,
43.f, 255.f, 0.f,
0.f, 255.f, 0.f,
0.f, 255.f, 85.f,
0.f, 255.f, 170.f,
0.f, 255.f, 255.f,
0.f, 170.f, 255.f,
0.f, 85.f, 255.f,
0.f, 0.f, 255.f,
};
__constant__ const float RGB_COLORS_BACKGROUND[] = {
255.f, 0.f, 0.f,
255.f, 85.f, 0.f,
255.f, 170.f, 0.f,
255.f, 255.f, 0.f,
170.f, 255.f, 0.f,
85.f, 255.f, 0.f,
0.f, 255.f, 0.f,
0.f, 255.f, 85.f,
0.f, 255.f, 170.f,
0.f, 255.f, 255.f,
0.f, 170.f, 255.f,
0.f, 85.f, 255.f,
0.f, 0.f, 255.f,
85.f, 0.f, 255.f,
170.f, 0.f, 255.f,
255.f, 0.f, 255.f,
255.f, 0.f, 170.f,
255.f, 0.f, 85.f,
};
// PI digits: http://www.piday.org/million/
__constant__ const float PI = 3.14159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706798214808651328230664709384460955058223172535940812848111745f;
__constant__ const unsigned int COCO_PAIRS_GPU[] = POSE_COCO_PAIRS_RENDER_GPU;
__constant__ const unsigned int MPI_PAIRS_GPU[] = POSE_MPI_PAIRS_RENDER_GPU;
__constant__ const float COCO_COLORS[] = {POSE_COCO_COLORS_RENDER};
__constant__ const float MPI_COLORS[] = {POSE_MPI_COLORS_RENDER};
......@@ -163,14 +107,14 @@ namespace op
// Other parameters
const auto numberPartPairs = sizeof(COCO_PAIRS_GPU) / (2*sizeof(COCO_PAIRS_GPU[0]));
const auto numberColors = sizeof(COCO_RGB_COLORS) / (3*sizeof(COCO_RGB_COLORS[0]));
const auto numberColors = sizeof(COCO_COLORS) / (3*sizeof(COCO_COLORS[0]));
const auto radius = fastMin(targetWidth, targetHeight) / 100.f;
const auto stickwidth = fastMin(targetWidth, targetHeight) / 120.f;
// Render key points
renderKeypoints(targetPtr, sharedMaxs, sharedMins, sharedScaleF,
globalIdx, x, y, targetWidth, targetHeight, posePtr, COCO_PAIRS_GPU, numberPeople,
POSE_COCO_NUMBER_PARTS, numberPartPairs, COCO_RGB_COLORS, numberColors,
POSE_COCO_NUMBER_PARTS, numberPartPairs, COCO_COLORS, numberColors,
radius, stickwidth, threshold, alphaColorToAdd, blendOriginalFrame, (googlyEyes ? 14 : -1), (googlyEyes ? 15 : -1));
}
......@@ -188,14 +132,14 @@ namespace op
// Other parameters
const auto numberPartPairs = sizeof(MPI_PAIRS_GPU) / (2*sizeof(MPI_PAIRS_GPU[0]));
const auto numberColors = sizeof(MPI_RGB_COLORS) / (3*sizeof(MPI_RGB_COLORS[0]));
const auto numberColors = sizeof(MPI_COLORS) / (3*sizeof(MPI_COLORS[0]));
const auto radius = fastMin(targetWidth, targetHeight) / 100.f;
const auto stickwidth = fastMin(targetWidth, targetHeight) / 120.f;
// Render key points
renderKeypoints(targetPtr, sharedMaxs, sharedMins, sharedScaleF,
globalIdx, x, y, targetWidth, targetHeight, posePtr, MPI_PAIRS_GPU, numberPeople,
POSE_MPI_NUMBER_PARTS, numberPartPairs, MPI_RGB_COLORS, numberColors,
POSE_MPI_NUMBER_PARTS, numberPartPairs, MPI_COLORS, numberColors,
radius, stickwidth, threshold, alphaColorToAdd, blendOriginalFrame);
}
......@@ -205,7 +149,7 @@ namespace op
const auto x = (blockIdx.x * blockDim.x) + threadIdx.x;
const auto y = (blockIdx.y * blockDim.y) + threadIdx.y;
const auto numberColors = sizeof(RGB_COLORS_BACKGROUND)/(3*sizeof(RGB_COLORS_BACKGROUND[0]));
const auto numberColors = sizeof(COCO_COLORS)/(3*sizeof(COCO_COLORS[0]));
if (x < targetWidth && y < targetHeight)
{
......@@ -220,9 +164,9 @@ namespace op
const auto offsetOrigin = part * heatMapArea;
const auto value = __saturatef(heatMapPtr[offsetOrigin + yHeatMap*widthHeatMap + xHeatMap]); // __saturatef = trucate to [0,1]
const auto rgbColorIndex = (part%numberColors)*3;
rgbColor[0] += value*RGB_COLORS_BACKGROUND[rgbColorIndex];
rgbColor[1] += value*RGB_COLORS_BACKGROUND[rgbColorIndex+1];
rgbColor[2] += value*RGB_COLORS_BACKGROUND[rgbColorIndex+2];
rgbColor[0] += value*COCO_COLORS[rgbColorIndex];
rgbColor[1] += value*COCO_COLORS[rgbColorIndex+1];
rgbColor[2] += value*COCO_COLORS[rgbColorIndex+2];
}
const auto blueIndex = y * targetWidth + x;
......@@ -340,9 +284,9 @@ namespace op
}
}
inline void renderKeypointsPartAffinityAux(float* framePtr, const PoseModel poseModel, const Point<int>& frameSize,
const float* const heatMapPtr, const Point<int>& heatMapSize, const float scaleToKeepRatio,
const int part, const int partsToRender, const float alphaBlending)
inline void renderPosePAFGpuAux(float* framePtr, const PoseModel poseModel, const Point<int>& frameSize,
const float* const heatMapPtr, const Point<int>& heatMapSize, const float scaleToKeepRatio,
const int part, const int partsToRender, const float alphaBlending)
{
try
{
......@@ -363,8 +307,8 @@ namespace op
}
}
void renderPoseGpu(float* framePtr, const PoseModel poseModel, const int numberPeople, const Point<int>& frameSize, const float* const posePtr,
const bool googlyEyes, const bool blendOriginalFrame, const float alphaBlending)
void renderPoseKeypointsGpu(float* framePtr, const PoseModel poseModel, const int numberPeople, const Point<int>& frameSize,
const float* const posePtr, const bool googlyEyes, const bool blendOriginalFrame, const float alphaBlending)
{
try
{
......@@ -398,8 +342,8 @@ namespace op
}
}
void renderBodyPartGpu(float* framePtr, const PoseModel poseModel, const Point<int>& frameSize, const float* const heatMapPtr,
const Point<int>& heatMapSize, const float scaleToKeepRatio, const int part, const float alphaBlending)
void renderPoseHeatMapGpu(float* framePtr, const PoseModel poseModel, const Point<int>& frameSize, const float* const heatMapPtr,
const Point<int>& heatMapSize, const float scaleToKeepRatio, const int part, const float alphaBlending)
{
try
{
......@@ -422,8 +366,8 @@ namespace op
}
}
void renderBodyPartsGpu(float* framePtr, const PoseModel poseModel, const Point<int>& frameSize, const float* const heatMapPtr,
const Point<int>& heatMapSize, const float scaleToKeepRatio, const float alphaBlending)
void renderPoseHeatMapsGpu(float* framePtr, const PoseModel poseModel, const Point<int>& frameSize, const float* const heatMapPtr,
const Point<int>& heatMapSize, const float scaleToKeepRatio, const float alphaBlending)
{
try
{
......@@ -446,12 +390,12 @@ namespace op
}
}
void renderPartAffinityFieldGpu(float* framePtr, const PoseModel poseModel, const Point<int>& frameSize, const float* const heatMapPtr,
const Point<int>& heatMapSize, const float scaleToKeepRatio, const int part, const float alphaBlending)
void renderPosePAFGpu(float* framePtr, const PoseModel poseModel, const Point<int>& frameSize, const float* const heatMapPtr,
const Point<int>& heatMapSize, const float scaleToKeepRatio, const int part, const float alphaBlending)
{
try
{
renderKeypointsPartAffinityAux(framePtr, poseModel, frameSize, heatMapPtr, heatMapSize, scaleToKeepRatio, part, 1, alphaBlending);
renderPosePAFGpuAux(framePtr, poseModel, frameSize, heatMapPtr, heatMapSize, scaleToKeepRatio, part, 1, alphaBlending);
}
catch (const std::exception& e)
{
......@@ -459,14 +403,14 @@ namespace op
}
}
void renderPartAffinityFieldsGpu(float* framePtr, const PoseModel poseModel, const Point<int>& frameSize, const float* const heatMapPtr,
const Point<int>& heatMapSize, const float scaleToKeepRatio, const float alphaBlending)
void renderPosePAFsGpu(float* framePtr, const PoseModel poseModel, const Point<int>& frameSize, const float* const heatMapPtr,
const Point<int>& heatMapSize, const float scaleToKeepRatio, const float alphaBlending)
{
try
{
const auto numberBodyPartPairs = (int)POSE_BODY_PART_PAIRS[(int)poseModel].size()/2;
renderKeypointsPartAffinityAux(framePtr, poseModel, frameSize, heatMapPtr, heatMapSize, scaleToKeepRatio, POSE_NUMBER_BODY_PARTS[(int)poseModel]+1,
numberBodyPartPairs, alphaBlending);
renderPosePAFGpuAux(framePtr, poseModel, frameSize, heatMapPtr, heatMapSize, scaleToKeepRatio,
POSE_NUMBER_BODY_PARTS[(int)poseModel]+1, numberBodyPartPairs, alphaBlending);
}
catch (const std::exception& e)
{
......
......@@ -2,6 +2,6 @@
namespace op
{
    // Explicit template instantiation definitions: compile DatumProducer and
    // WDatumProducer for the default Datum types in this translation unit so
    // other TUs can link against them. Note: the standard allows at most ONE
    // explicit instantiation definition per specialization per program, so the
    // previously duplicated DatumProducer line is removed.
    template class DatumProducer<DATUM_BASE_NO_PTR>;
    template class WDatumProducer<DATUM_BASE, DATUM_BASE_NO_PTR>;
}
......@@ -27,7 +27,7 @@ namespace op
{
try
{
return VideoCaptureReader::getRawFrame();
return VideoCaptureReader::getRawFrame();
}
catch (const std::exception& e)
{
......
#include <openpose/utilities/errorAndLog.hpp>
#include <openpose/utilities/fastMath.hpp>
#include <openpose/utilities/keypoint.hpp>
namespace op
{
const std::string errorMessage = "The Array<float> is not a RGB image. This function is only for array of dimension: [sizeA x sizeB x 3].";
// Euclidean distance between keypoints elementA and elementB, where each
// keypoint is stored as a consecutive (x, y, score) triplet in keypointPtr.
// Returns -1.f if an exception is raised.
float getDistance(const float* keypointPtr, const int elementA, const int elementB)
{
    try
    {
        const auto diffX = keypointPtr[3*elementA] - keypointPtr[3*elementB];
        const auto diffY = keypointPtr[3*elementA+1] - keypointPtr[3*elementB+1];
        return std::sqrt(diffX*diffX + diffY*diffY);
    }
    catch (const std::exception& e)
    {
        error(e.what(), __LINE__, __FUNCTION__, __FILE__);
        return -1.f;
    }
}
// Uniformly scales the x and y coordinates of every keypoint by `scale`,
// in place, by forwarding to the two-factor overload. Confidence scores
// are not modified (the two-factor overload only touches x and y).
void scaleKeypoints(Array<float>& keypoints, const float scale)
{
    try
    {
        scaleKeypoints(keypoints, scale, scale);
    }
    catch (const std::exception& e)
    {
        error(e.what(), __LINE__, __FUNCTION__, __FILE__);
    }
}
// Scales the x coordinate of every keypoint by scaleX and the y coordinate by
// scaleY, in place. Keypoints are stored as [people x parts x 3] (x, y, score)
// triplets; scores are not modified.
void scaleKeypoints(Array<float>& keypoints, const float scaleX, const float scaleY)
{
    try
    {
        // Apply whenever AT LEAST ONE factor differs from 1. The previous
        // `&&` condition wrongly skipped scaling entirely when only one of
        // the two axes needed rescaling (e.g. scaleX == 2, scaleY == 1).
        if (scaleX != 1.f || scaleY != 1.f)
        {
            // Error check: last dimension must hold (x, y, score)
            if (!keypoints.empty() && keypoints.getSize(2) != 3)
                error(errorMessage, __LINE__, __FUNCTION__, __FILE__);
            // Get #people and #parts
            const auto numberPeople = keypoints.getSize(0);
            const auto numberParts = keypoints.getSize(1);
            // For each person
            for (auto person = 0 ; person < numberPeople ; person++)
            {
                // For each body part
                for (auto part = 0 ; part < numberParts ; part++)
                {
                    const auto finalIndex = 3*(person*numberParts + part);
                    keypoints[finalIndex] *= scaleX;
                    keypoints[finalIndex+1] *= scaleY;
                }
            }
        }
    }
    catch (const std::exception& e)
    {
        error(e.what(), __LINE__, __FUNCTION__, __FILE__);
    }
}
// Applies an affine transform to every keypoint in place:
// x -> x*scaleX + offsetX, y -> y*scaleY + offsetY.
// Keypoints are stored as [people x parts x 3] (x, y, score) triplets;
// scores are not modified.
void scaleKeypoints(Array<float>& keypoints, const float scaleX, const float scaleY, const float offsetX, const float offsetY)
{
    try
    {
        // Apply whenever the transform is not the identity. The previous
        // `scaleX != 1. && scaleY != 1.` condition wrongly skipped the whole
        // transform when only one scale differed from 1, and dropped
        // pure-offset transforms (both scales == 1, offsets != 0) entirely.
        if (scaleX != 1.f || scaleY != 1.f || offsetX != 0.f || offsetY != 0.f)
        {
            // Error check: last dimension must hold (x, y, score)
            if (!keypoints.empty() && keypoints.getSize(2) != 3)
                error(errorMessage, __LINE__, __FUNCTION__, __FILE__);
            // Get #people and #parts
            const auto numberPeople = keypoints.getSize(0);
            const auto numberParts = keypoints.getSize(1);
            // For each person
            for (auto person = 0 ; person < numberPeople ; person++)
            {
                // For each body part
                for (auto part = 0 ; part < numberParts ; part++)
                {
                    const auto finalIndex = keypoints.getSize(2)*(person*numberParts + part);
                    keypoints[finalIndex] = keypoints[finalIndex] * scaleX + offsetX;
                    keypoints[finalIndex+1] = keypoints[finalIndex+1] * scaleY + offsetY;
                }
            }
        }
    }
    catch (const std::exception& e)
    {
        error(e.what(), __LINE__, __FUNCTION__, __FILE__);
    }
}
// Generic CPU keypoint renderer: draws, for every person in `keypoints`,
// circles on each detected part and lines between the part indices listed in
// `pairs`, directly onto the planar float frame held by `frameArray`.
// - frameArray: 3 x height x width float image (channel-planar, values [0-255])
// - keypoints:  [people x parts x 3] (x, y, score) triplets
// - pairs:      flattened list of (partA, partB) index pairs to connect
// - colors:     flattened RGB components, cycled per pair/part
//   (NOTE(review): passed by value -- a copy per call; const& would avoid it)
// - thicknessCircleRatio / thicknessLineRatioWRTCircle: sizes relative to the
//   frame and to the circle thickness respectively.
void renderKeypointsCpu(Array<float>& frameArray, const Array<float>& keypoints, const std::vector<unsigned int>& pairs,
                        const std::vector<float> colors, const float thicknessCircleRatio, const float thicknessLineRatioWRTCircle)
{
    try
    {
        if (!frameArray.empty())
        {
            // Array<float> --> cv::Mat
            auto frame = frameArray.getCvMat();
            // Security check: expect a 3-dim mat whose first dim is the 3 channels
            if (frame.dims != 3 || frame.size[0] != 3)
                error(errorMessage, __LINE__, __FUNCTION__, __FILE__);
            // Get frame channels: build one single-channel header per color
            // plane over the existing storage (no copies; B, G, R planes are
            // laid out consecutively, `area` floats apart)
            const auto width = frame.size[2];
            const auto height = frame.size[1];
            const auto area = width * height;
            cv::Mat frameB{height, width, CV_32FC1, &frame.data[0]};
            cv::Mat frameG{height, width, CV_32FC1, &frame.data[area * sizeof(float) / sizeof(uchar)]};
            cv::Mat frameR{height, width, CV_32FC1, &frame.data[2 * area * sizeof(float) / sizeof(uchar)]};
            // Parameters
            const auto lineType = 8;
            const auto shift = 0;
            // Minimum score for a keypoint to be drawn / considered
            const auto threshold = 0.1;
            const auto numberColors = colors.size();
            const auto numberBodyParts = keypoints.getSize(1);
            const auto areaKeypoints = numberBodyParts * keypoints.getSize(2);
            // Keypoints
            for (auto person = 0 ; person < keypoints.getSize(0) ; person++)
            {
                // Bounding rectangle of this person's confident keypoints,
                // used to scale the drawing thickness to the person's size
                const auto personRectangle = getKeypointsRectangle(&keypoints[person*areaKeypoints], keypoints.getSize(1), threshold);
                const auto ratioAreas = fastMin(1.f, fastMax(personRectangle.width/(float)width, personRectangle.height/(float)height));
                // Size-dependent variables
                const auto thicknessCircle = fastMax(intRound(std::sqrt(area)*thicknessCircleRatio * ratioAreas), 2);
                const auto thicknessLine = intRound(thicknessCircle * thicknessLineRatioWRTCircle);
                const auto radius = thicknessCircle / 2;
                // Draw lines: one per (partA, partB) pair, only when both
                // endpoints clear the score threshold
                for (auto pair = 0 ; pair < pairs.size() ; pair+=2)
                {
                    const auto index1 = (person * keypoints.getSize(1) + pairs[pair]) * keypoints.getSize(2);
                    const auto index2 = (person * keypoints.getSize(1) + pairs[pair+1]) * keypoints.getSize(2);
                    if (keypoints[index1+2] > threshold && keypoints[index2+2] > threshold)
                    {
                        // One RGB triplet per pair, wrapping around `colors`
                        const auto colorIndex = pair/2*3;
                        const cv::Scalar color{colors[colorIndex % numberColors],
                                               colors[(colorIndex+1) % numberColors],
                                               colors[(colorIndex+2) % numberColors]};
                        const cv::Point keypoint1{intRound(keypoints[index1]), intRound(keypoints[index1+1])};
                        const cv::Point keypoint2{intRound(keypoints[index2]), intRound(keypoints[index2+1])};
                        // Draw each color component on its own plane
                        cv::line(frameR, keypoint1, keypoint2, color[0], thicknessLine, lineType, shift);
                        cv::line(frameG, keypoint1, keypoint2, color[1], thicknessLine, lineType, shift);
                        cv::line(frameB, keypoint1, keypoint2, color[2], thicknessLine, lineType, shift);
                    }
                }
                // Draw circles: one per confident body part
                for (auto part = 0 ; part < keypoints.getSize(1) ; part++)
                {
                    const auto faceIndex = (person * keypoints.getSize(1) + part) * keypoints.getSize(2);
                    if (keypoints[faceIndex+2] > threshold)
                    {
                        // One RGB triplet per part, wrapping around `colors`
                        const auto colorIndex = part*3;
                        const cv::Scalar color{colors[colorIndex % numberColors],
                                               colors[(colorIndex+1) % numberColors],
                                               colors[(colorIndex+2) % numberColors]};
                        const cv::Point center{intRound(keypoints[faceIndex]), intRound(keypoints[faceIndex+1])};
                        cv::circle(frameR, center, radius, color[0], thicknessCircle, lineType, shift);
                        cv::circle(frameG, center, radius, color[1], thicknessCircle, lineType, shift);
                        cv::circle(frameB, center, radius, color[2], thicknessCircle, lineType, shift);
                    }
                }
            }
        }
    }
    catch (const std::exception& e)
    {
        error(e.what(), __LINE__, __FUNCTION__, __FILE__);
    }
}
Rectangle<unsigned int> getKeypointsRectangle(const float* keypointPtr, const int numberBodyParts, const float threshold)
{
try
{
if (numberBodyParts < 1)
error("Number body parts must be > 0", __LINE__, __FUNCTION__, __FILE__);
unsigned int minX = -1;
unsigned int maxX = 0u;
unsigned int minY = -1;
unsigned int maxY = 0u;
for (auto part = 0 ; part < numberBodyParts ; part++)
{
const auto score = keypointPtr[3*part + 2];
if (score > threshold)
{
const auto x = keypointPtr[3*part];
const auto y = keypointPtr[3*part + 1];
// Set X
if (maxX < x)
maxX = x;
if (minX > x)
minX = x;
// Set Y
if (maxY < y)
maxY = y;
if (minY > y)
minY = y;
}
}
return Rectangle<unsigned int>{minX, minY, maxX-minX, maxY-minY};
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
return Rectangle<unsigned int>{};
}
}
int getKeypointsArea(const float* keypointPtr, const int numberBodyParts, const float threshold)
{
try
{
return getKeypointsRectangle(keypointPtr, numberBodyParts, threshold).area();
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
return 0;
}
}
int getBiggestPerson(const Array<float>& keypoints, const float threshold)
{
try
{
if (!keypoints.empty())
{
const auto numberPeople = keypoints.getSize(0);
const auto numberBodyParts = keypoints.getSize(1);
const auto area = numberBodyParts * keypoints.getSize(2);
auto biggestPoseIndex = -1;
auto biggestArea = -1;
for (auto person = 0 ; person < numberPeople ; person++)
{
const auto newPersonArea = getKeypointsArea(&keypoints[person*area], numberBodyParts, threshold);
if (newPersonArea > biggestArea)
{
biggestArea = newPersonArea;
biggestPoseIndex = person;
}
}
return biggestPoseIndex;
}
else
return -1;
}
catch (const std::exception& e)
{
error(e.what(), __LINE__, __FUNCTION__, __FILE__);
return -1;
}
}
}
......@@ -2,11 +2,11 @@
namespace op
{
WrapperStructFace::WrapperStructFace(const bool enable_, const Point<int>& netInputSize_, const bool renderOutput_, const float alphaKeypoint_,
const float alphaHeatMap_) :
WrapperStructFace::WrapperStructFace(const bool enable_, const Point<int>& netInputSize_, const RenderMode renderMode_,
const float alphaKeypoint_, const float alphaHeatMap_) :
enable{enable_},
netInputSize{netInputSize_},
renderOutput{renderOutput_},
renderMode{renderMode_},
alphaKeypoint{alphaKeypoint_},
alphaHeatMap{alphaHeatMap_}
{
......
......@@ -3,7 +3,7 @@
namespace op
{
WrapperStructPose::WrapperStructPose(const Point<int>& netInputSize_, const Point<int>& outputSize_, const ScaleMode keypointScale_, const int gpuNumber_,
const int gpuNumberStart_, const int scalesNumber_, const float scaleGap_, const bool renderOutput_,
const int gpuNumberStart_, const int scalesNumber_, const float scaleGap_, const RenderMode renderMode_,
const PoseModel poseModel_, const bool blendOriginalFrame_, const float alphaKeypoint_, const float alphaHeatMap_,
const int defaultPartToRender_, const std::string& modelFolder_, const std::vector<HeatMapType>& heatMapTypes_,
const ScaleMode heatMapScale_) :
......@@ -14,7 +14,7 @@ namespace op
gpuNumberStart{gpuNumberStart_},
scalesNumber{scalesNumber_},
scaleGap{scaleGap_},
renderOutput{renderOutput_},
renderMode{renderMode_},
poseModel{poseModel_},
blendOriginalFrame{blendOriginalFrame_},
alphaKeypoint{alphaKeypoint_},
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册