Slightly improved mAP and reduced false positives, but reduced mAR

3c9441ae · Gines Hidalgo · 80fc1144
16 changed files
--- a/doc/faq.md
+++ b/doc/faq.md
@@ -17,6 +17,7 @@ OpenPose - Frequently Asked Question (FAQ)
        11. [CUDA_cublas_device_LIBRARY Not Found](#cuda_cublas_device_library-not-found)
        12. [CMake-GUI Error While Getting Default Caffe](#cmake-gui-error-while-getting-default-caffe)
        13. [Libgomp Out of Memory Error](#libgomp-out-of-memory-error)
+        14. [Runtime Error with Turing GPU (Tesla T4) or Volta GPU](#runtime-error-with-turing-gpu-tesla-t4-or-volta-gpu)
    2. [Speed Performance Issues](#speed-performance-issues)
        1. [Speed Up, Memory Reduction, and Benchmark](#speed-up-memory-reduction-and-benchmark)
        2. [How to Measure the Latency Time?](#how-to-measure-the-latency-time)
@@ -39,7 +40,7 @@ OpenPose - Frequently Asked Question (FAQ)
 #### Out of Memory Error
 **Q: Out of memory error** - I get an error similar to: `Check failed: error == cudaSuccess (2 vs. 0)  out of memory`.
-**A**: Most probably cuDNN is not installed/enabled, the default Caffe model uses >12 GB of GPU memory, cuDNN reduces it to ~2 GB for BODY_25 and ~1.5 GB for COCO.
+**A**: Most probably cuDNN is not installed/enabled: the default Caffe model uses >12 GB of GPU memory, while cuDNN reduces it to ~2.2 GB for BODY_25 (default) and ~1.5 GB for COCO (`--model_pose COCO`). Note that you still need at least about 2.2 GB of free GPU memory to run the default OpenPose. I.e., GPUs with only 2 GB cannot fit the default OpenPose, and you will have to either switch to the `COCO` model (slower and less accurate) or reduce the `--net_resolution` (faster, but also less accurate).
@@ -162,6 +163,14 @@ git submodle update
+#### Runtime Error with Turing GPU (Tesla T4) or Volta GPU
+**Q**: When I start OpenPose, I receive a runtime error for new GPU architectures.
+**A**: To solve this problem, 1) make sure you are using CUDA 10 or higher, and 2) change line 7 in `{OPENPOSE_PATH}/3rdparty/caffe/cmake/Cuda.cmake` from `set(Caffe_known_gpu_archs "30 35 50 52 60 61")` to `set(Caffe_known_gpu_archs "30 35 50 52 60 61 70 75")` (70 for Volta GPUs, 75 for Turing GPUs such as the Tesla T4).
 ### Speed Performance Issues
 #### Speed Up, Memory Reduction, and Benchmark

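As a rough illustration of the memory guidance in the Out of Memory answer above, the decision between the default model, `--model_pose COCO` and a lower `--net_resolution` can be sketched as a small C++ function. `suggestPoseConfig` and `PoseConfigSuggestion` are hypothetical names, not part of OpenPose; the thresholds are the approximate figures quoted in the FAQ, and preferring COCO between ~1.5 GB and ~2.2 GB is an assumption of this sketch (the FAQ lists both alternatives as valid).

```cpp
#include <cassert>

// Hypothetical suggestion categories for this sketch (not an OpenPose API)
enum class PoseConfigSuggestion
{
    DefaultBody25,      // default BODY_25 model and default --net_resolution
    CocoModel,          // --model_pose COCO (slower and less accurate)
    LowerNetResolution  // reduce --net_resolution (faster, but less accurate)
};

// Approximate cuDNN memory requirements quoted in the FAQ (rough figures, not guarantees)
constexpr double kBody25Gb = 2.2;
constexpr double kCocoGb = 1.5;

// Assumption: between ~1.5 GB and ~2.2 GB we prefer COCO; below ~1.5 GB only a
// lower --net_resolution is left as an option.
PoseConfigSuggestion suggestPoseConfig(const double freeGpuMemoryGb)
{
    if (freeGpuMemoryGb >= kBody25Gb)
        return PoseConfigSuggestion::DefaultBody25;
    if (freeGpuMemoryGb >= kCocoGb)
        return PoseConfigSuggestion::CocoModel;
    return PoseConfigSuggestion::LowerNetResolution;
}
```

In a real program, the free memory could be obtained from the CUDA runtime with `cudaMemGetInfo`, which reports free and total device memory in bytes.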
--- a/doc/installation.md
+++ b/doc/installation.md
@@ -54,7 +54,12 @@ We add links to some community-based work based on OpenPose. Note: We do not sup
 - [ROS example](https://github.com/firephinx/openpose_ros) (based on a very old OpenPose version). For questions and more details, read and post ONLY on [issue thread #51](https://github.com/CMU-Perceptual-Computing-Lab/openpose/issues/51).
 - Docker Images. For questions and more details, read and post ONLY on [issue thread #347](https://github.com/CMU-Perceptual-Computing-Lab/openpose/issues/347).
-    - Dockerfile working with CUDA 10: [link 1](https://github.com/ExSidius/openpose-docker/blob/master/Dockerfile) and [link 2](https://cloud.docker.com/repository/docker/exsidius/openpose/general).
+    - Dockerfiles that also work with CUDA 10:
+        - [Link 1](https://github.com/esemeniuc/openpose-docker), which claims to also include Python support. Read and post ONLY on [issue thread #1102](https://github.com/CMU-Perceptual-Computing-Lab/openpose/issues/1102).
+        - [Link 2](https://github.com/ExSidius/openpose-docker/blob/master/Dockerfile).
+        - [Link 3](https://cloud.docker.com/repository/docker/exsidius/openpose/general).
+    - Dockerfiles that only work with CUDA 8:
+        - [Dockerfile - OpenPose v1.4.0, OpenCV, CUDA 8, CuDNN 5, Python2.7](https://github.com/tlkh/openpose). Read and post ONLY on [issue thread #1102](https://github.com/CMU-Perceptual-Computing-Lab/openpose/issues/1102).
        - [Dockerfile - OpenPose v1.4.0, OpenCV, CUDA 8, CuDNN 6, Python2.7](https://gist.github.com/moiseevigor/11c02c694fc0c22fccd59521793aeaa6).
        - [Dockerfile - OpenPose v1.2.1](https://gist.github.com/sberryman/6770363f02336af82cb175a83b79de33).
@@ -168,7 +173,25 @@ make -j`nproc`
 ```
 #### Windows
-In order to build the project, open the Visual Studio solution (Windows), called `build/OpenPose.sln`. Then, set the configuration from `Debug` to `Release` and press the green triangle icon (alternatively press <kbd>F5</kbd>).
+In order to build the project, choose and run only one of the following 2 alternatives.
+1. **CMake-GUI alternative (recommended)**: Open the Visual Studio solution (Windows), called `build/OpenPose.sln`. Then, set the configuration from `Debug` to `Release` and press the green triangle icon (alternatively press <kbd>F5</kbd>).
+2. **Command-line build alternative (not recommended)**: NOTE: The command-line alternative is not officially supported; it was added in [GitHub issue #1198](https://github.com/CMU-Perceptual-Computing-Lab/openpose/issues/1198). For any questions or bug reports about this command-line version, comment in that GitHub issue.
+    1. Open the "Developer Command Prompt for VS 2017" and run:
+    ```
+    cd openpose
+    mkdir build
+    cd build
+    cmake .. -G "Visual Studio 15 2017 Win64" -T v140
+    cmake --build . --config Release
+    copy x64\Release\*  bin\
+    ```
+    2. If you want to do a clean build:
+    ```
+    cmake --build . --config Release --clean-first
+    copy x64\Release\*  bin\
+    ```
**VERY IMPORTANT NOTE**: In order to use OpenPose outside Visual Studio, and assuming you have not unchecked the `BUILD_BIN_FOLDER` flag in CMake, copy all DLLs from `{build_directory}/bin` into the folder where the generated `openpose.dll` and `*.exe` demos are, e.g., `{build_directory}/x64/Release` for the 64-bit release version.

--- a/doc/modules/3d_reconstruction_module.md
+++ b/doc/modules/3d_reconstruction_module.md
@@ -23,13 +23,13 @@ This module performs 3-D keypoint (body, face, and hand) reconstruction and rend
 ## Installation
-Check [doc/installation.md#3d-reconstruction-module](./installation.md#3d-reconstruction-module) for installation steps.
+Check [doc/installation.md#3d-reconstruction-module](../installation.md#3d-reconstruction-module) for installation steps.
 ## Non Linear Optimization
-In order to increase the 3-D reconstruction accuracy, OpenPose optionally performs non-linear optimization if Ceres solver support is enabled (only available in Ubuntu for now). To enable it, check [doc/installation.md#3d-reconstruction-module](./installation.md#3d-reconstruction-module) for more details.
+In order to increase the 3-D reconstruction accuracy, OpenPose optionally performs non-linear optimization if Ceres solver support is enabled (only available in Ubuntu for now). To enable it, check [doc/installation.md#3d-reconstruction-module](../installation.md#3d-reconstruction-module) for more details.

--- a/doc/release_notes.md
+++ b/doc/release_notes.md
@@ -369,10 +369,14 @@ OpenPose Library - Release Notes
 1. Main improvements:
    1. Highly improved 3D triangulation for >3 cameras by fixing some small bugs.
    2. Added community-based support for Nvidia NVCaffe.
+    3. Increased accuracy very slightly for the CUDA version (about 0.01%) by setting the threshold in `process()` in `bodyPartConnectorBase.cu` to `defaultNmsThreshold`. This also removes any possibility of future bugs in that function from using a default NMS threshold higher than 0.15 (the previously hard-coded value).
+    4. Increased mAP but reduced mAR (both by about 0.01%), and reduced false positives. Step 1: removed legs where only knee/ankle/foot keypoints are found. Step 2: if no people are found in an image, `removePeopleBelowThresholdsAndFillFaces` (previously `removePeopleBelowThresholds`) is re-run with `maximizePositives = true`.
+    5. The maximum number of people is no longer limited by the maximum number of peaks (`maxPeaks`). However, the number of body part candidates for a specific keypoint (e.g., nose) is still limited to `maxPeaks`.
 2. Functions or parameters renamed:
    1. `--3d_min_views` default value (-1) no longer means that all camera views are required. Instead, it will be equal to max(2, min(4, #cameras-1)). This should provide a good trade-off between recall and precision.
 3. Main bugs fixed:
    1. Windows: Added back support for OpenGL and Spinnaker, as well as DLLs for debug compilation.
+    2. `06_face_from_image.cpp` and `07_hand_from_image.cpp` are working again; they had stopped working in version 1.5.0 due to the GPU image resize for the GUI.
 4. Changes/additions that affect the compatibility with the OpenPose Unity Plugin:

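Item 4 above describes two steps that this commit implements in `removePeopleBelowThresholdsAndFillFaces`. The sketch below restates only the BODY_25 path with plain types (no face/hand handling, no sanity check, no `peaksPtr`). `filterPeopleSketch` and `Person` are hypothetical names; the layout (25 keypoint slots followed by the found-keypoint counter, plus a total score) mirrors `peopleVector` in the diff.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// {25 BODY_25 keypoint slots (> 0 = found) + found-keypoint counter, total person score}
using Person = std::pair<std::vector<int>, float>;

// Hypothetical, simplified restatement of the BODY_25 branch of removePeopleBelowThresholdsAndFillFaces
void filterPeopleSketch(
    std::vector<int>& validIndexes, const std::vector<Person>& people, const int minSubsetCnt,
    const float minSubsetScore, const bool maximizePositives)
{
    validIndexes.clear();
    for (unsigned int index = 0; index < people.size(); index++)
    {
        int personCounter = people[index].first.back();
        if (!maximizePositives)
        {
            // Step 1: foot keypoints (indexes 19-24) do not count towards personCounter
            const int currentCounter = personCounter;
            for (int i = 19; i < 25; i++)
                personCounter -= (people[index].first.at(i) > 0);
            // Drop leg-like fragments: feet were found and at most 4 other keypoints remain
            if (personCounter != currentCounter && personCounter <= 4)
                continue;
        }
        // Keep people with enough keypoints and a high enough average score
        const float personScore = people[index].second;
        if (personCounter >= minSubsetCnt && personScore / personCounter >= minSubsetScore)
            validIndexes.push_back((int)index);
    }
    // Step 2: nobody survived the strict pass --> retry with maximizePositives = true
    if (validIndexes.empty() && !maximizePositives)
        filterPeopleSketch(validIndexes, people, minSubsetCnt, minSubsetScore, true);
}
```

The retry only happens when the strict pass finds nobody, so an image that contains only a leg still yields a detection, while images with more complete people keep the stricter filtering.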
--- a/include/openpose/net/bodyPartConnectorBase.hpp
+++ b/include/openpose/net/bodyPartConnectorBase.hpp
@@ -18,10 +18,10 @@ namespace op
    void connectBodyPartsGpu(
        Array<T>& poseKeypoints, Array<T>& poseScores, const T* const heatMapGpuPtr, const T* const peaksPtr,
        const PoseModel poseModel, const Point<int>& heatMapSize, const int maxPeaks, const T interMinAboveThreshold,
-        const T interThreshold, const int minSubsetCnt, const T minSubsetScore, const T scaleFactor = 1.f,
+        const T interThreshold, const int minSubsetCnt, const T minSubsetScore, const T scaleFactor,
-        const bool maximizePositives = false, Array<T> pairScoresCpu = Array<T>{}, T* pairScoresGpuPtr = nullptr,
+        const bool maximizePositives, Array<T> pairScoresCpu, T* pairScoresGpuPtr,
-        const unsigned int* const bodyPartPairsGpuPtr = nullptr, const unsigned int* const mapIdxGpuPtr = nullptr,
+        const unsigned int* const bodyPartPairsGpuPtr, const unsigned int* const mapIdxGpuPtr,
-        const T* const peaksGpuPtr = nullptr);
+        const T* const peaksGpuPtr, const T defaultNmsThreshold);
    template <typename T>
    void connectBodyPartsOcl(
@@ -41,16 +41,16 @@ namespace op
        const unsigned int numberBodyPartPairs, const Array<T>& precomputedPAFs = Array<T>());
    template <typename T>
-    void removePeopleBelowThresholds(std::vector<int>& validSubsetIndexes, int& numberPeople,
+    void removePeopleBelowThresholdsAndFillFaces(
-                                            const std::vector<std::pair<std::vector<int>, T>>& subsets,
+        std::vector<int>& validSubsetIndexes, int& numberPeople,
-                                            const unsigned int numberBodyParts, const int minSubsetCnt,
+        std::vector<std::pair<std::vector<int>, T>>& subsets, const unsigned int numberBodyParts,
-                                            const T minSubsetScore, const int maxPeaks, const bool maximizePositives);
+        const int minSubsetCnt, const T minSubsetScore, const bool maximizePositives, const T* const peaksPtr);
    template <typename T>
-    void peopleVectorToPeopleArray(Array<T>& poseKeypoints, Array<T>& poseScores, const T scaleFactor,
+    void peopleVectorToPeopleArray(
-                                          const std::vector<std::pair<std::vector<int>, T>>& subsets,
+        Array<T>& poseKeypoints, Array<T>& poseScores, const T scaleFactor,
-                                          const std::vector<int>& validSubsetIndexes, const T* const peaksPtr,
+        const std::vector<std::pair<std::vector<int>, T>>& subsets, const std::vector<int>& validSubsetIndexes,
-                                          const int numberPeople, const unsigned int numberBodyParts,
+        const T* const peaksPtr, const int numberPeople, const unsigned int numberBodyParts,
        const unsigned int numberBodyPartPairs);
    template <typename T>

--- a/include/openpose/net/bodyPartConnectorCaffe.hpp
+++ b/include/openpose/net/bodyPartConnectorCaffe.hpp
@@ -25,6 +25,8 @@ namespace op
        void setMaximizePositives(const bool maximizePositives);
+        void setDefaultNmsThreshold(const T defaultNmsThreshold);
        void setInterMinAboveThreshold(const T interMinAboveThreshold);
        void setInterThreshold(const T interThreshold);
@@ -56,6 +58,7 @@ namespace op
    private:
        PoseModel mPoseModel;
        bool mMaximizePositives;
+        T mDefaultNmsThreshold;
        T mInterMinAboveThreshold;
        T mInterThreshold;
        int mMinSubsetCnt;

--- a/include/openpose/pose/poseParametersRender.hpp
+++ b/include/openpose/pose/poseParametersRender.hpp
@@ -210,10 +210,12 @@ namespace op
        1.f,1.f,1.f,1.f,1.f,1.f, \
        0.60f,0.60f,0.60f,0.60f,0.60f, 0.60f,0.60f,0.60f,0.60f,0.60f, 0.60f,0.60f,0.60f,0.60f,0.60f, 0.60f,0.60f,0.60f,0.60f,0.60f, \
        0.60f,0.60f,0.60f,0.60f,0.60f, 0.60f,0.60f,0.60f,0.60f,0.60f, 0.60f,0.60f,0.60f,0.60f,0.60f, 0.60f,0.60f,0.60f,0.60f,0.60f, \
-        0.00f,0.00f,0.00f,0.00f,0.00f, 0.00f,0.00f,0.00f,0.00f,0.00f, 0.00f,0.00f,0.00f,0.00f,0.00f, 0.00f,0.00f,0.45f,0.45f,0.45f, \
+        0.45f,0.45f,0.45f,0.45f,0.45f, 0.45f,0.45f,0.45f,0.45f,0.45f, 0.45f,0.45f,0.45f,0.45f,0.45f, 0.45f,0.45f,0.45f,0.45f,0.45f, \
        0.45f,0.45f,0.45f,0.45f,0.45f, 0.45f,0.45f,0.45f,0.45f,0.45f, 0.45f,0.45f,0.45f,0.45f,0.45f, 0.45f,0.45f,0.45f,0.45f,0.45f, \
        0.45f,0.45f,0.45f,0.45f,0.45f, 0.45f,0.45f,0.45f,0.45f,0.45f, 0.45f,0.45f,0.45f,0.45f,0.45f, 0.45f,0.45f,0.45f,0.45f,0.45f, \
        0.45f,0.45f,0.45f,0.45f,0.45f, 0.45f,0.45f,0.45f,0.45f,0.45f
+        // Previous values of the first 0.45f row:
+        // 0.00f,0.00f,0.00f,0.00f,0.00f, 0.00f,0.00f,0.00f,0.00f,0.00f, 0.00f,0.00f,0.00f,0.00f,0.00f, 0.00f,0.00f,0.45f,0.45f,0.45f,
    #define POSE_BODY_135_COLORS_RENDER_GPU \
        255.f,     0.f,    85.f, \
        170.f,     0.f,   255.f, \

--- a/include/openpose/utilities/keypoint.hpp
+++ b/include/openpose/utilities/keypoint.hpp
@@ -21,12 +21,15 @@ namespace op
    void scaleKeypoints2d(Array<T>& keypoints, const T scaleX, const T scaleY, const T offsetX, const T offsetY);
    template <typename T>
-    void renderKeypointsCpu(Array<T>& frameArray, const Array<T>& keypoints, const std::vector<unsigned int>& pairs,
+    void renderKeypointsCpu(
-                            const std::vector<T> colors, const T thicknessCircleRatio,
+        Array<T>& frameArray, const Array<T>& keypoints, const std::vector<unsigned int>& pairs,
-                            const T thicknessLineRatioWRTCircle, const std::vector<T>& poseScales, const T threshold);
+        const std::vector<T> colors, const T thicknessCircleRatio, const T thicknessLineRatioWRTCircle,
+        const std::vector<T>& poseScales, const T threshold);
    template <typename T>
-    Rectangle<T> getKeypointsRectangle(const Array<T>& keypoints, const int person, const T threshold);
+    Rectangle<T> getKeypointsRectangle(
+        const Array<T>& keypoints, const int person, const T threshold, const int firstIndex = 0,
+        const int lastIndex = -1);
    template <typename T>
    T getAverageScore(const Array<T>& keypoints, const int person);
@@ -44,7 +47,8 @@ namespace op
    T getDistanceAverage(const Array<T>& keypoints, const int personA, const int personB, const T threshold);
    template <typename T>
-    T getDistanceAverage(const Array<T>& keypointsA, const int personA, const Array<T>& keypointsB, const int personB,
+    T getDistanceAverage(
+        const Array<T>& keypointsA, const int personA, const Array<T>& keypointsB, const int personB,
        const T threshold);
    /**

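The new `firstIndex`/`lastIndex` parameters of `getKeypointsRectangle` restrict the bounding box to a sub-range of keypoints (e.g., only the face keypoints of a whole-body model). Since this diff only shows the declaration, the standalone single-person sketch below assumes that `lastIndex` is exclusive, that `-1` means "up to the last keypoint", and that only keypoints with a score above `threshold` are used. `RectF` and `getKeypointsRectangleSketch` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Minimal rectangle type for this sketch (OpenPose has its own op::Rectangle<T>)
struct RectF
{
    float x;
    float y;
    float width;
    float height;
};

// keypoints: one person, flattened as [x0, y0, score0, x1, y1, score1, ...]
// Assumptions: lastIndex is exclusive and lastIndex < 0 means "up to the last keypoint"
RectF getKeypointsRectangleSketch(
    const std::vector<float>& keypoints, const float threshold, const int firstIndex = 0,
    const int lastIndex = -1)
{
    const int numberKeypoints = (int)keypoints.size() / 3;
    const int last = (lastIndex < 0 ? numberKeypoints : std::min(lastIndex, numberKeypoints));
    bool found = false;
    float minX = 0.f, minY = 0.f, maxX = 0.f, maxY = 0.f;
    for (int part = firstIndex; part < last; part++)
    {
        const float x = keypoints[3*part];
        const float y = keypoints[3*part + 1];
        const float score = keypoints[3*part + 2];
        // Ignore keypoints that are missing or below the confidence threshold
        if (score <= threshold)
            continue;
        if (!found)
        {
            minX = maxX = x;
            minY = maxY = y;
            found = true;
        }
        else
        {
            minX = std::min(minX, x);
            maxX = std::max(maxX, x);
            minY = std::min(minY, y);
            maxY = std::max(maxY, y);
        }
    }
    // Empty rectangle if no keypoint in the range passes the threshold
    if (!found)
        return RectF{0.f, 0.f, 0.f, 0.f};
    return RectF{minX, minY, maxX - minX, maxY - minY};
}
```

Keeping `firstIndex = 0` and `lastIndex = -1` as defaults lets existing callers keep the previous whole-body behavior without changes.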
--- a/include/openpose/wrapper/wrapperAuxiliary.hpp
+++ b/include/openpose/wrapper/wrapperAuxiliary.hpp
@@ -267,7 +267,7 @@ namespace op
                // Input cvMat to OpenPose input & output format
                // Note: resize on GPU reduces accuracy about 0.1%
                bool resizeOnCpu = true;
-                // const auto resizeOnCpu = (numberGpuThreads < 3);
+                // const auto resizeOnCpu = (wrapperStructPose.poseMode != PoseMode::Enabled);
                if (resizeOnCpu)
                {
                    const auto gpuResize = false;
@@ -277,7 +277,8 @@ namespace op
                }
                // Note: We realized that doing it on GPU, for any number of GPUs, does speed up the whole OpenPose pipeline
                resizeOnCpu = false;
-                addCvMatToOpOutputInCpu = addCvMatToOpOutput && (resizeOnCpu || !renderOutputGpu);
+                addCvMatToOpOutputInCpu = addCvMatToOpOutput
+                    && (resizeOnCpu || !renderOutputGpu || wrapperStructPose.poseMode != PoseMode::Enabled);
                if (addCvMatToOpOutputInCpu)
                {
                    const auto gpuResize = false;
@@ -618,7 +619,7 @@ namespace op
                    {
                        const auto gpuResize = true;
                        opOutputToCvMats.emplace_back(std::make_shared<OpOutputToCvMat>(gpuResize));
-                        poseExtractorsWs[i].emplace_back(
+                        poseExtractorsWs.at(i).emplace_back(
                            std::make_shared<WOpOutputToCvMat<TDatumsSP>>(opOutputToCvMats.back()));
                        // Assign shared parameters
                        opOutputToCvMats.back()->setSharedParameters(

--- a/scripts/ubuntu/Makefile.example
+++ b/scripts/ubuntu/Makefile.example
@@ -33,7 +33,7 @@ LIBRARY_NAME := $(PROJECT)
 LIB_BUILD_DIR := $(BUILD_DIR)/lib
 STATIC_NAME := $(LIB_BUILD_DIR)/lib$(LIBRARY_NAME).a
 DYNAMIC_VERSION_MAJOR 		:= 1
-DYNAMIC_VERSION_MINOR 		:= 4
+DYNAMIC_VERSION_MINOR 		:= 5
 DYNAMIC_VERSION_REVISION 	:= 0
 DYNAMIC_NAME_SHORT := lib$(LIBRARY_NAME).so
 #DYNAMIC_SONAME_SHORT := $(DYNAMIC_NAME_SHORT).$(DYNAMIC_VERSION_MAJOR)

--- a/src/openpose/net/bodyPartConnectorBase.cpp
+++ b/src/openpose/net/bodyPartConnectorBase.cpp
 #include <set>
 #include <openpose/utilities/check.hpp>
 #include <openpose/utilities/fastMath.hpp>
+#include <openpose/utilities/keypoint.hpp>
 #include <openpose/pose/poseParameters.hpp>
 #include <openpose/net/bodyPartConnectorBase.hpp>
 namespace op
 {
    template <typename T>
-    inline T getScoreAB(const int i, const int j, const T* const candidateAPtr, const T* const candidateBPtr,
+    inline T getScoreAB(
-                        const T* const mapX, const T* const mapY, const Point<int>& heatMapSize,
+        const int i, const int j, const T* const candidateAPtr, const T* const candidateBPtr, const T* const mapX,
-                        const T interThreshold, const T interMinAboveThreshold)
+        const T* const mapY, const Point<int>& heatMapSize, const T interThreshold, const T interMinAboveThreshold)
    {
        try
        {
@@ -57,6 +58,27 @@ namespace op
        }
    }
+    template <typename T>
+    void getKeypointCounter(
+        int& personCounter, const std::vector<std::pair<std::vector<int>, T>>& peopleVector,
+        const unsigned int index, const int indexFirst, const int indexLast, const int minimum)
+    {
+        try
+        {
+            // Count keypoints
+            auto keypointCounter = 0;
+            for (auto i = indexFirst ; i < indexLast ; i++)
+                keypointCounter += (peopleVector[index].first.at(i) > 0);
+            // If more than minimum keypoints were found --> subtract the excess, so at most minimum of them count
+            if (keypointCounter > minimum)
+                personCounter += minimum-keypointCounter; // personCounter = non-considered keypoints + minimum
+        }
+        catch (const std::exception& e)
+        {
+            error(e.what(), __LINE__, __FUNCTION__, __FILE__);
+        }
+    }
    template <typename T>
    std::vector<std::pair<std::vector<int>, T>> createPeopleVector(
        const T* const heatMapPtr, const T* const peaksPtr, const PoseModel poseModel, const Point<int>& heatMapSize,
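`getKeypointCounter` caps how many keypoints of a given index range count towards `personCounter`: with `minimum = 1` (face and hands for models with at least 135 parts) the whole range contributes at most one keypoint, and with `minimum = 0` (feet) it contributes none. The standalone restatement below replaces the templates and error handling with plain types; `getKeypointCounterSketch` and `Person` are illustrative names, not OpenPose APIs.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// {keypoint slots (> 0 = found) + found-keypoint counter, total person score}
using Person = std::pair<std::vector<int>, float>;

// Illustrative restatement of getKeypointCounter from this commit
void getKeypointCounterSketch(
    int& personCounter, const std::vector<Person>& peopleVector, const unsigned int index,
    const int indexFirst, const int indexLast, const int minimum)
{
    // Count the found keypoints in [indexFirst, indexLast)
    int keypointCounter = 0;
    for (int i = indexFirst; i < indexLast; i++)
        keypointCounter += (peopleVector[index].first.at(i) > 0);
    // Only up to `minimum` of them keep counting towards personCounter
    if (keypointCounter > minimum)
        personCounter += minimum - keypointCounter;
}
```

For example, a person with 3 found keypoints in the range and `minimum = 1` loses 2 from `personCounter`, so the range counts as a single keypoint.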
@@ -211,8 +233,9 @@ namespace op
                            for (auto j = 1; j <= numberPeaksB; j++)
                            {
                                // Initial PAF
-                                auto scoreAB = getScoreAB(i, j, candidateAPtr, candidateBPtr, mapX, mapY,
+                                auto scoreAB = getScoreAB(
-                                                          heatMapSize, interThreshold, interMinAboveThreshold);
+                                    i, j, candidateAPtr, candidateBPtr, mapX, mapY, heatMapSize, interThreshold,
+                                    interMinAboveThreshold);
                                // E.g., neck-nose connection. If possible PAF between neck i, nose j --> add
                                // parts score + connection score
@@ -263,9 +286,8 @@ namespace op
                            const auto indexB = std::get<2>(aBConnection);
                            if (!occurA[indexA-1] && !occurB[indexB-1])
                            {
-                                abConnections.emplace_back(std::make_tuple(bodyPartA*peaksOffset + indexA*3 + 2,
+                                abConnections.emplace_back(std::make_tuple(
-                                                                           bodyPartB*peaksOffset + indexB*3 + 2,
+                                    bodyPartA*peaksOffset+indexA*3+2, bodyPartB*peaksOffset+indexB*3+2, score));
-                                                                           score));
                                counter++;
                                if (counter==minAB)
                                    break;
@@ -298,8 +320,8 @@ namespace op
                        // Add ears connections (in case person is looking to opposite direction to camera)
                        // Note: This has some issues:
                        //     - It does not prevent repeating the same keypoint in different people
-                        //     - Assuming I have nose,eye,ear as 1 person subset, and whole arm as another one, it will not
+                        //     - Assuming I have nose,eye,ear as 1 person subset, and whole arm as another one, it
-                        //       merge them both
+                        //       will not merge them both
                        else if (
                            (numberBodyParts == 18 && (pairIndex==17 || pairIndex==18))
                            || ((numberBodyParts == 19 || (numberBodyParts == 25)
@@ -622,49 +644,139 @@ namespace op
    }
    template <typename T>
-    void removePeopleBelowThresholds(
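+    void getRoiDiameterAndBounds(
+        Rectangle<int>& roi, int& diameter, int& indexFirstNon0, int& indexLastNon0,
+        const std::vector<int>& personVector, const T* const peaksPtr,
+        const int indexInit, const int indexEnd)
+    {
+        try
+        {
+            // Default outputs: empty ROI and no keypoint found
+            roi = Rectangle<int>{0,0,0,0};
+            diameter = 0;
+            indexFirstNon0 = -1;
+            indexLastNon0 = -1;
+            // The last element of personVector is the keypoint counter, not a keypoint
+            const auto indexLast = fastMin(indexEnd, (int)personVector.size()-1);
+            auto minX = 0, minY = 0, maxX = 0, maxY = 0;
+            for (auto index = indexInit ; index < indexLast ; index++)
+            {
+                // 0 = keypoint not found; otherwise it points to the peak score, preceded by its x and y
+                if (personVector[index] > 0)
+                {
+                    const auto x = (int)peaksPtr[personVector[index]-2];
+                    const auto y = (int)peaksPtr[personVector[index]-1];
+                    if (indexFirstNon0 < 0)
+                    {
+                        indexFirstNon0 = index;
+                        minX = maxX = x;
+                        minY = maxY = y;
+                    }
+                    else
+                    {
+                        minX = fastMin(minX, x);
+                        minY = fastMin(minY, y);
+                        maxX = fastMax(maxX, x);
+                        maxY = fastMax(maxY, y);
+                    }
+                    indexLastNon0 = index;
+                }
+            }
+            if (indexFirstNon0 >= 0)
+            {
+                roi = Rectangle<int>{minX, minY, maxX-minX, maxY-minY};
+                // Assumption: diameter = largest side of the ROI
+                diameter = fastMax(roi.width, roi.height);
+            }
+        }
+        catch (const std::exception& e)
+        {
+            error(e.what(), __LINE__, __FUNCTION__, __FILE__);
+        }
+    }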
+    template <typename T>
+    void removePeopleBelowThresholdsAndFillFaces(
        std::vector<int>& validSubsetIndexes, int& numberPeople,
-        const std::vector<std::pair<std::vector<int>, T>>& peopleVector, const unsigned int numberBodyParts,
+        std::vector<std::pair<std::vector<int>, T>>& peopleVector, const unsigned int numberBodyParts,
-        const int minSubsetCnt, const T minSubsetScore, const int maxPeaks, const bool maximizePositives)
+        const int minSubsetCnt, const T minSubsetScore, const bool maximizePositives, const T* const peaksPtr)
+        // const int minSubsetCnt, const T minSubsetScore, const int maxPeaks, const bool maximizePositives)
    {
        try
        {
            // Delete people below the following thresholds:
                // a) minSubsetCnt: removed if less than minSubsetCnt body parts
                // b) minSubsetScore: removed if global score smaller than this
-                // c) maxPeaks (POSE_MAX_PEOPLE): keep first maxPeaks people above thresholds
+                // c) maxPeaks (POSE_MAX_PEOPLE): keep first maxPeaks people above thresholds -> Not required
            numberPeople = 0;
            validSubsetIndexes.clear();
-            validSubsetIndexes.reserve(fastMin((size_t)maxPeaks, peopleVector.size()));
+            // validSubsetIndexes.reserve(fastMin((size_t)maxPeaks, peopleVector.size())); // maxPeaks is not required
+            validSubsetIndexes.reserve(peopleVector.size());
+            // Face valid sets
+            std::vector<int> faceValidSubsetIndexes;
+            faceValidSubsetIndexes.reserve(peopleVector.size());
+            // Face invalid sets
+            std::vector<int> faceInvalidSubsetIndexes;
+            faceInvalidSubsetIndexes.reserve(peopleVector.size());
+            // For each person candidate
            for (auto index = 0u ; index < peopleVector.size() ; index++)
            {
                auto personCounter = peopleVector[index].first.back();
+                // Analogous filtering for hand/face keypoints
+                if (numberBodyParts >= 135)
+                {
+                    // Do not count face keypoints towards personCounter (at most 1 of them is kept)
+                    const auto currentCounter = personCounter;
+                    getKeypointCounter(personCounter, peopleVector, index, 65, 135, 1);
+                    const auto newCounter = personCounter;
+                    if (personCounter == 0)
+                    {
+                        faceInvalidSubsetIndexes.emplace_back(index);
+                        continue;
+                    }
+                    // If body is still valid and facial points were removed, then add to valid faces
+                    else if (currentCounter != newCounter)
+                        faceValidSubsetIndexes.emplace_back(index);
+                    // Do not count right-hand keypoints towards personCounter (at most 1 of them is kept)
+                    getKeypointCounter(personCounter, peopleVector, index, 45, 65, 1);
+                    // Do not count left-hand keypoints towards personCounter (at most 1 of them is kept)
+                    getKeypointCounter(personCounter, peopleVector, index, 25, 45, 1);
+                }
                // Foot keypoints do not affect personCounter (too many false positives,
                // same foot usually appears as both left and right keypoints)
                // Pros: Removed tons of false positives
                // Cons: Standalone leg will never be recorded
+                // Solution: do not count foot keypoints towards personCounter
                if (!maximizePositives && (numberBodyParts == 25 || numberBodyParts > 70))
                {
-                    // No consider foot keypoints for that
+                    const auto currentCounter = personCounter;
-                    for (auto i = 19 ; i < 25 ; i++)
+                    getKeypointCounter(personCounter, peopleVector, index, 19, 25, 0);
-                        personCounter -= (peopleVector[index].first.at(i) > 0);
+                    const auto newCounter = personCounter;
-                    // No consider hand keypoints for that
+                    // Problem: Same leg/foot keypoints are considered for both left and right keypoints.
-                    if (numberBodyParts > 70)
+                    // Solution: Remove legs that are duplicated and that do not have upper torso
-                        for (auto i = 25 ; i < 65 ; i++)
+                    // Result: slight increase in COCO mAP and decrease in mAR, plus far fewer false positives!
-                            personCounter -= (peopleVector[index].first.at(i) > 0);
+                    if (newCounter != currentCounter && newCounter <= 4)
+                        continue;
                }
+                // Add only valid people
                const auto personScore = peopleVector[index].second;
                if (personCounter >= minSubsetCnt && (personScore/personCounter) >= minSubsetScore)
                {
                    numberPeople++;
                    validSubsetIndexes.emplace_back(index);
-                    if (numberPeople == maxPeaks)
+                    // // This is not required, it is OK if there are more people. No more GPU memory used.
-                        break;
+                    // if (numberPeople == maxPeaks)
+                    //     break;
                }
+                // Sanity check
                else if ((personCounter < 1 && numberBodyParts != 25 && numberBodyParts < 70) || personCounter < 0)
                    error("Bad personCounter (" + std::to_string(personCounter) + "). Bug in this"
                          " function if this happens.", __LINE__, __FUNCTION__, __FILE__);
            }
+//             // Random standalone facial keypoints --> Merge into a more complete face
+//             if (numberPeople > 0 && faceInvalidSubsetIndexes.size() > 0)
+//             {
+//                 for (auto faceId = 0u ; faceId < faceInvalidSubsetIndexes.size() ; faceId++)
+//                 {
+//                     // Get ROI
+//                     Rectangle<int> roi;
+//                     int diameter;
+//                     int indexFirstNon0;
+//                     int indexLastNon0;
+//                     const auto index = faceInvalidSubsetIndexes[faceId];
+//                     getRoiDiameterAndBounds(
+//                         roi, diameter, indexFirstNon0, indexLastNon0, peopleVector[index].first, peaksPtr, 65, 135);
+//                     // const auto personCounter = peopleVector[index].first.back();
+//                     // const auto x = peaksPtr[peopleVector[index].first[part]-2];
+//                     // const auto y = peaksPtr[peopleVector[index].first[part]-1];
+//                     // const auto score = peaksPtr[peopleVector[index].first[part]];
+//                 }
+//             }
+            // If no people found --> Repeat with maximizePositives = true
+            // Result: Increased COCO mAP because we catch more foot-only images
+            if (numberPeople == 0 && !maximizePositives)
+            {
+                removePeopleBelowThresholdsAndFillFaces(
+                    validSubsetIndexes, numberPeople, peopleVector, numberBodyParts, minSubsetCnt, minSubsetScore,
+                    true, peaksPtr);
+                // // Debugging
+                // if (numberPeople > 0)
+                //     log("Found " + std::to_string(numberPeople) + " people in second iteration");
+            }
        }
        catch (const std::exception& e)
        {
@@ -673,30 +785,35 @@ namespace op
    }
    template <typename T>
-    void peopleVectorToPeopleArray(Array<T>& poseKeypoints, Array<T>& poseScores, const T scaleFactor,
+    void peopleVectorToPeopleArray(
-                                   const std::vector<std::pair<std::vector<int>, T>>& peopleVector,
+        Array<T>& poseKeypoints, Array<T>& poseScores, const T scaleFactor,
-                                   const std::vector<int>& validSubsetIndexes, const T* const peaksPtr,
+        const std::vector<std::pair<std::vector<int>, T>>& peopleVector, const std::vector<int>& validSubsetIndexes,
-                                   const int numberPeople, const unsigned int numberBodyParts,
+        const T* const peaksPtr, const int numberPeople, const unsigned int numberBodyParts,
        const unsigned int numberBodyPartPairs)
    {
        try
        {
+            // Allocate memory (initialized to 0)
            if (numberPeople > 0)
            {
                // Initialized to 0 for non-found keypoints in people
                poseKeypoints.reset({numberPeople, (int)numberBodyParts, 3}, 0.f);
                poseScores.reset(numberPeople);
            }
+            // No people --> Empty Arrays
            else
            {
                poseKeypoints.reset();
                poseScores.reset();
            }
+            // Fill people keypoints
            const auto oneOverNumberBodyPartsAndPAFs = 1/T(numberBodyParts + numberBodyPartPairs);
+            // For each person
            for (auto person = 0u ; person < validSubsetIndexes.size() ; person++)
            {
                const auto& personPair = peopleVector[validSubsetIndexes[person]];
                const auto& personVector = personPair.first;
+                // For each body part
                for (auto bodyPart = 0u; bodyPart < numberBodyParts; bodyPart++)
                {
                    const auto baseOffset = (person*numberBodyParts + bodyPart) * 3;
@@ -1109,10 +1226,10 @@ namespace op
 //     }
    template <typename T>
-    void connectBodyPartsCpu(Array<T>& poseKeypoints, Array<T>& poseScores, const T* const heatMapPtr,
+    void connectBodyPartsCpu(
-                             const T* const peaksPtr, const PoseModel poseModel, const Point<int>& heatMapSize,
+        Array<T>& poseKeypoints, Array<T>& poseScores, const T* const heatMapPtr, const T* const peaksPtr,
-                             const int maxPeaks, const T interMinAboveThreshold, const T interThreshold,
+        const PoseModel poseModel, const Point<int>& heatMapSize, const int maxPeaks, const T interMinAboveThreshold,
-                             const int minSubsetCnt, const T minSubsetScore, const T scaleFactor,
+        const T interThreshold, const int minSubsetCnt, const T minSubsetScore, const T scaleFactor,
        const bool maximizePositives)
    {
        try
@@ -1124,29 +1241,27 @@ namespace op
            if (numberBodyParts == 0)
                error("Invalid value of numberBodyParts, it must be positive, not " + std::to_string(numberBodyParts),
                      __LINE__, __FUNCTION__, __FILE__);
            // std::vector<std::pair<std::vector<int>, double>> refers to:
            //     - std::vector<int>: [body parts locations, #body parts found]
            //     - double: person subset score
-            const auto peopleVector = createPeopleVector(
+            auto peopleVector = createPeopleVector(
                heatMapPtr, peaksPtr, poseModel, heatMapSize, maxPeaks, interThreshold, interMinAboveThreshold,
                bodyPartPairs, numberBodyParts, numberBodyPartPairs);
            // Delete people below the following thresholds:
                // a) minSubsetCnt: removed if less than minSubsetCnt body parts
                // b) minSubsetScore: removed if global score smaller than this
                // c) maxPeaks (POSE_MAX_PEOPLE): keep first maxPeaks people above thresholds
            int numberPeople;
            std::vector<int> validSubsetIndexes;
-            validSubsetIndexes.reserve(fastMin((size_t)maxPeaks, peopleVector.size()));
+            // validSubsetIndexes.reserve(fastMin((size_t)maxPeaks, peopleVector.size()));
-            removePeopleBelowThresholds(
+            validSubsetIndexes.reserve(peopleVector.size());
+            removePeopleBelowThresholdsAndFillFaces(
                validSubsetIndexes, numberPeople, peopleVector, numberBodyParts, minSubsetCnt, minSubsetScore,
-                maxPeaks, maximizePositives);
+                maximizePositives, peaksPtr);
            // Fill and return poseKeypoints
-            peopleVectorToPeopleArray(poseKeypoints, poseScores, scaleFactor, peopleVector, validSubsetIndexes,
+            peopleVectorToPeopleArray(
-                                      peaksPtr, numberPeople, numberBodyParts, numberBodyPartPairs);
+                poseKeypoints, poseScores, scaleFactor, peopleVector, validSubsetIndexes, peaksPtr, numberPeople,
+                numberBodyParts, numberBodyPartPairs);
            // Experimental code
            if (poseModel == PoseModel::BODY_25D)
                error("BODY_25D is an experimental branch which is not usable.", __LINE__, __FUNCTION__, __FILE__);
@@ -1185,16 +1300,16 @@ namespace op
        const unsigned int numberBodyParts, const unsigned int numberBodyPartPairs,
        const Array<double>& precomputedPAFs);
-    template OP_API void removePeopleBelowThresholds(
+    template OP_API void removePeopleBelowThresholdsAndFillFaces(
        std::vector<int>& validSubsetIndexes, int& numberPeople,
-        const std::vector<std::pair<std::vector<int>, float>>& peopleVector,
+        std::vector<std::pair<std::vector<int>, float>>& peopleVector,
-        const unsigned int numberBodyParts,
+        const unsigned int numberBodyParts, const int minSubsetCnt, const float minSubsetScore,
-        const int minSubsetCnt, const float minSubsetScore, const int maxPeaks, const bool maximizePositives);
+        const bool maximizePositives, const float* const peaksPtr);
-    template OP_API void removePeopleBelowThresholds(
+    template OP_API void removePeopleBelowThresholdsAndFillFaces(
        std::vector<int>& validSubsetIndexes, int& numberPeople,
-        const std::vector<std::pair<std::vector<int>, double>>& peopleVector,
+        std::vector<std::pair<std::vector<int>, double>>& peopleVector,
-        const unsigned int numberBodyParts,
+        const unsigned int numberBodyParts, const int minSubsetCnt, const double minSubsetScore,
-        const int minSubsetCnt, const double minSubsetScore, const int maxPeaks, const bool maximizePositives);
+        const bool maximizePositives, const double* const peaksPtr);
    template OP_API void peopleVectorToPeopleArray(
        Array<float>& poseKeypoints, Array<float>& poseScores, const float scaleFactor,

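The `peopleVectorToPeopleArray` changes above keep the same output layout: a zero-initialized `[numberPeople, numberBodyParts, 3]` array holding `(x, y, score)` per body part, addressed by `(person*numberBodyParts + bodyPart) * 3`, plus one score per person scaled by `1/(numberBodyParts + numberBodyPartPairs)`. Below is a minimal C++ sketch of that filling step, using `std::vector` instead of `op::Array`. The function name `peopleToFlatKeypoints` is hypothetical. It assumes each stored index points at the score entry of an `(x, y, score)` triplet in the peaks buffer, so `x` and `y` sit at `index-2` and `index-1` and `0` means "not found"; the removed `indexScoreA = (...)*3 + 2` computation suggests this. It also assumes that `scaleFactor` multiplies only the coordinates.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, simplified sketch (not the OpenPose implementation).
// people[i].first = [per-body-part index into peaks (0 = not found)..., #body parts found]
// people[i].second = accumulated person score
template <typename T>
void peopleToFlatKeypoints(
    std::vector<T>& keypoints, std::vector<T>& scores,
    const std::vector<std::pair<std::vector<int>, T>>& people,
    const std::vector<int>& validIndexes, const std::vector<T>& peaks,
    const unsigned int numberBodyParts, const unsigned int numberBodyPartPairs,
    const T scaleFactor)
{
    // Zero-initialized, so body parts that were not found stay (0, 0, 0)
    keypoints.assign(validIndexes.size() * numberBodyParts * 3, T(0));
    scores.assign(validIndexes.size(), T(0));
    const T oneOverNumberBodyPartsAndPAFs = T(1) / T(numberBodyParts + numberBodyPartPairs);
    for (std::size_t person = 0; person < validIndexes.size(); person++)
    {
        const auto& personPair = people[validIndexes[person]];
        const auto& personVector = personPair.first;
        // One entry per body part plus a trailing "number found" counter
        assert(personVector.size() == numberBodyParts + 1);
        for (unsigned int bodyPart = 0; bodyPart < numberBodyParts; bodyPart++)
        {
            const auto baseOffset = (person * numberBodyParts + bodyPart) * 3;
            const int scoreIndex = personVector[bodyPart];
            if (scoreIndex > 0)
            {
                // Assumed layout: x at scoreIndex-2, y at scoreIndex-1, score at scoreIndex
                keypoints[baseOffset] = peaks[scoreIndex - 2] * scaleFactor;
                keypoints[baseOffset + 1] = peaks[scoreIndex - 1] * scaleFactor;
                keypoints[baseOffset + 2] = peaks[scoreIndex];
            }
        }
        // Person score normalized by the number of body parts plus PAF pairs
        scores[person] = personPair.second * oneOverNumberBodyPartsAndPAFs;
    }
}
```

For example, with two body parts and two PAF pairs, a person whose first part points at peaks index 5 gets that peak's coordinates scaled by `scaleFactor`. The second part remains `(0, 0, 0)`.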
--- a/src/openpose/net/bodyPartConnectorBase.cu
+++ b/src/openpose/net/bodyPartConnectorBase.cu
@@ -14,7 +14,7 @@ namespace op
    template <typename T>
    inline __device__  T process(
        const T* bodyPartA, const T* bodyPartB, const T* mapX, const T* mapY, const int heatmapWidth,
-        const int heatmapHeight, const T interThreshold, const T interMinAboveThreshold)
+        const int heatmapHeight, const T interThreshold, const T interMinAboveThreshold, const T defaultNmsThreshold)
    {
        const auto vectorAToBX = bodyPartB[0] - bodyPartA[0];
        const auto vectorAToBY = bodyPartB[1] - bodyPartA[1];
@@ -59,7 +59,7 @@ namespace op
                const auto l2Dist = sqrtf(vectorAToBX*vectorAToBX + vectorAToBY*vectorAToBY);
                const auto threshold = sqrtf(heatmapWidth*heatmapHeight)/150; // 3.3 for 368x656, 6.6 for 2x resolution
                if (l2Dist < threshold)
-                    return T(0.15);
+                    return T(defaultNmsThreshold+1e-6); // The 1e-6 offset is required because the later check uses a strict greater-than
            }
        }
        return -1;
@@ -69,7 +69,8 @@ namespace op
    // __global__ void pafScoreKernelOld(
    //     T* pairScoresPtr, const T* const heatMapPtr, const T* const peaksPtr, const unsigned int* const bodyPartPairsPtr,
    //     const unsigned int* const mapIdxPtr, const unsigned int maxPeaks, const int numberBodyPartPairs,
-    //     const int heatmapWidth, const int heatmapHeight, const T interThreshold, const T interMinAboveThreshold)
+    //     const int heatmapWidth, const int heatmapHeight, const T interThreshold, const T interMinAboveThreshold,
+    //     const T defaultNmsThreshold)
    // {
    //     const auto pairIndex = (blockIdx.x * blockDim.x) + threadIdx.x;
    //     const auto peakA = (blockIdx.y * blockDim.y) + threadIdx.y;
@@ -96,7 +97,7 @@ namespace op
    //             const T* const mapY = heatMapPtr + mapIdxY*heatmapWidth*heatmapHeight;
    //             pairScoresPtr[outputIndex] = process(
    //                 bodyPartA, bodyPartB, mapX, mapY, heatmapWidth, heatmapHeight, interThreshold,
-    //                 interMinAboveThreshold);
+    //                 interMinAboveThreshold, defaultNmsThreshold);
    //         }
    //         else
    //             pairScoresPtr[outputIndex] = -1;
@@ -107,7 +108,8 @@ namespace op
    __global__ void pafScoreKernel(
        T* pairScoresPtr, const T* const heatMapPtr, const T* const peaksPtr, const unsigned int* const bodyPartPairsPtr,
        const unsigned int* const mapIdxPtr, const unsigned int maxPeaks, const int numberBodyPartPairs,
-        const int heatmapWidth, const int heatmapHeight, const T interThreshold, const T interMinAboveThreshold)
+        const int heatmapWidth, const int heatmapHeight, const T interThreshold, const T interMinAboveThreshold,
+        const T defaultNmsThreshold)
    {
        const auto peakB = (blockIdx.x * blockDim.x) + threadIdx.x;
        const auto peakA = (blockIdx.y * blockDim.y) + threadIdx.y;
@@ -135,191 +137,21 @@ namespace op
                const T* const mapY = heatMapPtr + mapIdxY*heatmapWidth*heatmapHeight;
                pairScoresPtr[outputIndex] = process(
                    bodyPartA, bodyPartB, mapX, mapY, heatmapWidth, heatmapHeight, interThreshold,
-                    interMinAboveThreshold);
+                    interMinAboveThreshold, defaultNmsThreshold);
            }
            else
                pairScoresPtr[outputIndex] = -1;
        }
    }
-    // template <typename T>
-    // std::vector<std::pair<std::vector<int>, T>> pafVectorIntoPeopleVectorOld(
-    //     const std::vector<std::tuple<T, T, int, int, int>>& pairConnections, const T* const peaksPtr,
-    //     const int maxPeaks, const std::vector<unsigned int>& bodyPartPairs, const unsigned int numberBodyParts)
-    // {
-    //     try
-    //     {
-    //         // std::vector<std::pair<std::vector<int>, double>> refers to:
-    //         //     - std::vector<int>: [body parts locations, #body parts found]
-    //         //     - double: person subset score
-    //         std::vector<std::pair<std::vector<int>, T>> peopleVector;
-    //         const auto vectorSize = numberBodyParts+1;
-    //         const auto peaksOffset = (maxPeaks+1);
-    //         // Save which body parts have been already assigned
-    //         std::vector<int> personAssigned(numberBodyParts*maxPeaks, -1);
-    //         // Iterate over each PAF pair connection detected
-    //         // E.g., neck1-nose2, neck5-Lshoulder0, etc.
-    //         for (const auto& pairConnection : pairConnections)
-    //         {
-    //             // Read pairConnection
-    //             // // Total score - only required for previous sort
-    //             // const auto totalScore = std::get<0>(pairConnection);
-    //             const auto pafScore = std::get<1>(pairConnection);
-    //             const auto pairIndex = std::get<2>(pairConnection);
-    //             const auto indexA = std::get<3>(pairConnection);
-    //             const auto indexB = std::get<4>(pairConnection);
-    //             // Derived data
-    //             const auto bodyPartA = bodyPartPairs[2*pairIndex];
-    //             const auto bodyPartB = bodyPartPairs[2*pairIndex+1];
-    //             const auto indexScoreA = (bodyPartA*peaksOffset + indexA)*3 + 2;
-    //             const auto indexScoreB = (bodyPartB*peaksOffset + indexB)*3 + 2;
-    //             // -1 because indexA and indexB are 1-based
-    //             auto& aAssigned = personAssigned[bodyPartA*maxPeaks+indexA-1];
-    //             auto& bAssigned = personAssigned[bodyPartB*maxPeaks+indexB-1];
-    //             // Debugging
-    //             #ifdef DEBUG
-    //                 if (indexA-1 > peaksOffset || indexA <= 0)
-    //                     error("Something is wrong: " + std::to_string(indexA)
-    //                           + " vs. " + std::to_string(peaksOffset) + ". Contact us.",
-    //                           __LINE__, __FUNCTION__, __FILE__);
-    //                 if (indexB-1 > peaksOffset || indexB <= 0)
-    //                     error("Something is wrong: " + std::to_string(indexB)
-    //                           + " vs. " + std::to_string(peaksOffset) + ". Contact us.",
-    //                           __LINE__, __FUNCTION__, __FILE__);
-    //             #endif
-    //             // Different cases:
-    //             //     1. A & B not assigned yet: Create new person
-    //             //     2. A assigned but not B: Add B to person with A (if no another B there)
-    //             //     3. B assigned but not A: Add A to person with B (if no another A there)
-    //             //     4. A & B already assigned to same person (circular/redundant PAF): Update person score
-    //             //     5. A & B already assigned to different people: Merge people if keypoint intersection is null
-    //             // 1. A & B not assigned yet: Create new person
-    //             if (aAssigned < 0 && bAssigned < 0)
-    //             {
-    //                 // Keypoint indexes
-    //                 std::vector<int> rowVector(vectorSize, 0);
-    //                 rowVector[bodyPartA] = indexScoreA;
-    //                 rowVector[bodyPartB] = indexScoreB;
-    //                 // Number keypoints
-    //                 rowVector.back() = 2;
-    //                 // Score
-    //                 const auto personScore = peaksPtr[indexScoreA] + peaksPtr[indexScoreB] + pafScore;
-    //                 // Set associated personAssigned as assigned
-    //                 aAssigned = (int)peopleVector.size();
-    //                 bAssigned = aAssigned;
-    //                 // Create new personVector
-    //                 peopleVector.emplace_back(std::make_pair(rowVector, personScore));
-    //             }
-    //             // 2. A assigned but not B: Add B to person with A (if no another B there)
-    //             // or
-    //             // 3. B assigned but not A: Add A to person with B (if no another A there)
-    //             else if ((aAssigned >= 0 && bAssigned < 0)
-    //                 || (aAssigned < 0 && bAssigned >= 0))
-    //             {
-    //                 // Assign person1 to one where xAssigned >= 0
-    //                 const auto assigned1 = (aAssigned >= 0 ? aAssigned : bAssigned);
-    //                 auto& assigned2 = (aAssigned >= 0 ? bAssigned : aAssigned);
-    //                 const auto bodyPart2 = (aAssigned >= 0 ? bodyPartB : bodyPartA);
-    //                 const auto indexScore2 = (aAssigned >= 0 ? indexScoreB : indexScoreA);
-    //                 // Person index
-    //                 auto& personVector = peopleVector[assigned1];
-    //                 // Debugging
-    //                 #ifdef DEBUG
-    //                     const auto bodyPart1 = (aAssigned >= 0 ? bodyPartA : bodyPartB);
-    //                     const auto indexScore1 = (aAssigned >= 0 ? indexScoreA : indexScoreB);
-    //                     const auto index1 = (aAssigned >= 0 ? indexA : indexB);
-    //                     if ((unsigned int)personVector.first.at(bodyPart1) != indexScore1)
-    //                         error("Something is wrong: "
-    //                               + std::to_string((personVector.first[bodyPart1]-2)/3-bodyPart1*peaksOffset)
-    //                               + " vs. " + std::to_string((indexScore1-2)/3-bodyPart1*peaksOffset) + " vs. "
-    //                               + std::to_string(index1) + ". Contact us.",
-    //                               __LINE__, __FUNCTION__, __FILE__);
-    //                 #endif
-    //                 // If person with 1 does not have a 2 yet
-    //                 if (personVector.first[bodyPart2] == 0)
-    //                 {
-    //                     // Update keypoint indexes
-    //                     personVector.first[bodyPart2] = indexScore2;
-    //                     // Update number keypoints
-    //                     personVector.first.back()++;
-    //                     // Update score
-    //                     personVector.second += peaksPtr[indexScore2] + pafScore;
-    //                     // Set associated personAssigned as assigned
-    //                     assigned2 = assigned1;
-    //                 }
-    //                 // Otherwise, ignore this B because the previous one came from a higher PAF-confident score
-    //             }
-    //             // 4. A & B already assigned to same person (circular/redundant PAF): Update person score
-    //             else if (aAssigned >=0 && bAssigned >=0 && aAssigned == bAssigned)
-    //                 peopleVector[aAssigned].second += pafScore;
-    //             // 5. A & B already assigned to different people: Merge people if keypoint intersection is null
-    //             // I.e., that the keypoints in person A and B do not overlap
-    //             else if (aAssigned >=0 && bAssigned >=0 && aAssigned != bAssigned)
-    //             {
-    //                 // Assign person1 to the one with lowest index for 2 reasons:
-    //                 //     1. Speed up: Removing an element from std::vector is cheaper for latest elements
-    //                 //     2. Avoid harder index update: Updated elements in person1ssigned would depend on
-    //                 //        whether person1 > person2 or not: element = aAssigned - (person2 > person1 ? 1 : 0)
-    //                 const auto assigned1 = (aAssigned < bAssigned ? aAssigned : bAssigned);
-    //                 const auto assigned2 = (aAssigned < bAssigned ? bAssigned : aAssigned);
-    //                 auto& person1 = peopleVector[assigned1].first;
-    //                 const auto& person2 = peopleVector[assigned2].first;
-    //                 // Check if complementary
-    //                 // Defining found keypoint indexes in personA as kA, and analogously kB
-    //                 // Complementary if and only if kA intersection kB = empty. I.e., no common keypoints
-    //                 bool complementary = true;
-    //                 for (auto part = 0u ; part < numberBodyParts ; part++)
-    //                 {
-    //                     if (person1[part] > 0 && person2[part] > 0)
-    //                     {
-    //                         complementary = false;
-    //                         break;
-    //                     }
-    //                 }
-    //                 // If complementary, merge both people into 1
-    //                 if (complementary)
-    //                 {
-    //                     // Update keypoint indexes
-    //                     for (auto part = 0u ; part < numberBodyParts ; part++)
-    //                         if (person1[part] == 0)
-    //                             person1[part] = person2[part];
-    //                     // Update number keypoints
-    //                     person1.back() += person2.back();
-    //                     // Update score
-    //                     peopleVector[assigned1].second += peopleVector[assigned2].second + pafScore;
-    //                     // Erase the non-merged person
-    //                     peopleVector.erase(peopleVector.begin()+assigned2);
-    //                     // Update associated personAssigned (person indexes have changed)
-    //                     for (auto& element : personAssigned)
-    //                     {
-    //                         if (element == assigned2)
-    //                             element = assigned1;
-    //                         else if (element > assigned2)
-    //                             element--;
-    //                     }
-    //                 }
-    //             }
-    //         }
-    //         // Return result
-    //         return peopleVector;
-    //     }
-    //     catch (const std::exception& e)
-    //     {
-    //         error(e.what(), __LINE__, __FUNCTION__, __FILE__);
-    //         return {};
-    //     }
-    // }
    template <typename T>
-    void connectBodyPartsGpu(Array<T>& poseKeypoints, Array<T>& poseScores, const T* const heatMapGpuPtr,
+    void connectBodyPartsGpu(
-                             const T* const peaksPtr, const PoseModel poseModel, const Point<int>& heatMapSize,
+        Array<T>& poseKeypoints, Array<T>& poseScores, const T* const heatMapGpuPtr, const T* const peaksPtr,
-                             const int maxPeaks, const T interMinAboveThreshold, const T interThreshold,
+        const PoseModel poseModel, const Point<int>& heatMapSize, const int maxPeaks, const T interMinAboveThreshold,
-                             const int minSubsetCnt, const T minSubsetScore, const T scaleFactor,
+        const T interThreshold, const int minSubsetCnt, const T minSubsetScore, const T scaleFactor,
        const bool maximizePositives, Array<T> pairScoresCpu, T* pairScoresGpuPtr,
        const unsigned int* const bodyPartPairsGpuPtr, const unsigned int* const mapIdxGpuPtr,
-                             const T* const peaksGpuPtr)
+        const T* const peaksGpuPtr, const T defaultNmsThreshold)
    {
        try
        {
@@ -352,27 +184,10 @@ namespace op
            // pafScoreKernelOld<<<numBlocks, THREADS_PER_BLOCK>>>(
            //     pairScoresGpuPtr, heatMapGpuPtr, peaksGpuPtr, bodyPartPairsGpuPtr, mapIdxGpuPtr,
            //     maxPeaks, (int)numberBodyPartPairs, heatMapSize.x, heatMapSize.y, interThreshold,
-            //     interMinAboveThreshold);
+            //     interMinAboveThreshold, defaultNmsThreshold);
            // // pairScoresCpu <-- pairScoresGpu
            // cudaMemcpy(pairScoresCpu.getPtr(), pairScoresGpuPtr, totalComputations * sizeof(T),
            //            cudaMemcpyDeviceToHost);
-            // // Get pair connections and their scores
-            // const auto pairConnections = pafPtrIntoVector(
-            //     pairScoresCpu, peaksPtr, maxPeaks, bodyPartPairs, numberBodyPartPairs);
-            // const auto peopleVector = pafVectorIntoPeopleVectorOld(
-            //     pairConnections, peaksPtr, maxPeaks, bodyPartPairs, numberBodyParts);
-            // // Delete people below the following thresholds:
-            //     // a) minSubsetCnt: removed if less than minSubsetCnt body parts
-            //     // b) minSubsetScore: removed if global score smaller than this
-            //     // c) maxPeaks (POSE_MAX_PEOPLE): keep first maxPeaks people above thresholds
-            // int numberPeople;
-            // std::vector<int> validSubsetIndexes;
-            // validSubsetIndexes.reserve(fastMin((size_t)maxPeaks, peopleVector.size()));
-            // removePeopleBelowThresholds(validSubsetIndexes, numberPeople, peopleVector, numberBodyParts, minSubsetCnt,
-            //                             minSubsetScore, maxPeaks, maximizePositives);
-            // // Fill and return poseKeypoints
-            // peopleVectorToPeopleArray(poseKeypoints, poseScores, scaleFactor, peopleVector, validSubsetIndexes,
-            //                           peaksPtr, numberPeople, numberBodyParts, numberBodyPartPairs);
            // OP_PROFILE_END(timeNormalize1, 1e3, REPS);
            // Efficient code
@@ -386,14 +201,16 @@ namespace op
            pafScoreKernel<<<numBlocks, THREADS_PER_BLOCK>>>(
                pairScoresGpuPtr, heatMapGpuPtr, peaksGpuPtr, bodyPartPairsGpuPtr, mapIdxGpuPtr,
                maxPeaks, (int)numberBodyPartPairs, heatMapSize.x, heatMapSize.y, interThreshold,
-                interMinAboveThreshold);
+                interMinAboveThreshold, defaultNmsThreshold);
            // pairScoresCpu <-- pairScoresGpu
            cudaMemcpy(pairScoresCpu.getPtr(), pairScoresGpuPtr, totalComputations * sizeof(T),
                       cudaMemcpyDeviceToHost);
+            // OP_PROFILE_END(timeNormalize2, 1e3, REPS);
            // Get pair connections and their scores
            const auto pairConnections = pafPtrIntoVector(
                pairScoresCpu, peaksPtr, maxPeaks, bodyPartPairs, numberBodyPartPairs);
-            const auto peopleVector = pafVectorIntoPeopleVector(
+            auto peopleVector = pafVectorIntoPeopleVector(
                pairConnections, peaksPtr, maxPeaks, bodyPartPairs, numberBodyParts);
            // // Old code: Get pair connections and their scores
            // // std::vector<std::pair<std::vector<int>, double>> refers to:
@@ -409,13 +226,15 @@ namespace op
                // c) maxPeaks (POSE_MAX_PEOPLE): keep first maxPeaks people above thresholds
            int numberPeople;
            std::vector<int> validSubsetIndexes;
-            validSubsetIndexes.reserve(fastMin((size_t)maxPeaks, peopleVector.size()));
+            // validSubsetIndexes.reserve(fastMin((size_t)maxPeaks, peopleVector.size()));
-            removePeopleBelowThresholds(validSubsetIndexes, numberPeople, peopleVector, numberBodyParts, minSubsetCnt,
+            validSubsetIndexes.reserve(peopleVector.size());
-                                        minSubsetScore, maxPeaks, maximizePositives);
+            removePeopleBelowThresholdsAndFillFaces(
+                validSubsetIndexes, numberPeople, peopleVector, numberBodyParts, minSubsetCnt, minSubsetScore,
+                maximizePositives, peaksPtr);
            // Fill and return poseKeypoints
-            peopleVectorToPeopleArray(poseKeypoints, poseScores, scaleFactor, peopleVector, validSubsetIndexes,
+            peopleVectorToPeopleArray(
-                                      peaksPtr, numberPeople, numberBodyParts, numberBodyPartPairs);
+                poseKeypoints, poseScores, scaleFactor, peopleVector, validSubsetIndexes, peaksPtr, numberPeople,
-            // OP_PROFILE_END(timeNormalize2, 1e3, REPS);
+                numberBodyParts, numberBodyPartPairs);
            // // Profiling verbose
            // log("  BPC(ori)=" + std::to_string(timeNormalize1) + "ms");
@@ -436,12 +255,12 @@ namespace op
        const float interMinAboveThreshold, const float interThreshold, const int minSubsetCnt,
        const float minSubsetScore, const float scaleFactor, const bool maximizePositives,
        Array<float> pairScoresCpu, float* pairScoresGpuPtr, const unsigned int* const bodyPartPairsGpuPtr,
-        const unsigned int* const mapIdxGpuPtr, const float* const peaksGpuPtr);
+        const unsigned int* const mapIdxGpuPtr, const float* const peaksGpuPtr, const float defaultNmsThreshold);
    template void connectBodyPartsGpu(
        Array<double>& poseKeypoints, Array<double>& poseScores, const double* const heatMapGpuPtr,
        const double* const peaksPtr, const PoseModel poseModel, const Point<int>& heatMapSize, const int maxPeaks,
        const double interMinAboveThreshold, const double interThreshold, const int minSubsetCnt,
        const double minSubsetScore, const double scaleFactor, const bool maximizePositives,
        Array<double> pairScoresCpu, double* pairScoresGpuPtr, const unsigned int* const bodyPartPairsGpuPtr,
-        const unsigned int* const mapIdxGpuPtr, const double* const peaksGpuPtr);
+        const unsigned int* const mapIdxGpuPtr, const double* const peaksGpuPtr, const double defaultNmsThreshold);
 }
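In `bodyPartConnectorBase.cu`, the fallback branch of `process()` now returns `defaultNmsThreshold + 1e-6` instead of a fixed `0.15` when two candidate body parts are closer than `sqrt(heatmapWidth*heatmapHeight)/150`. The host-only C++ sketch below isolates that branch to show why the epsilon matters under a strict greater-than comparison. `nearCoincidentPairScore` and `keepPair` are hypothetical names. The PAF line integral that runs before this branch in the real kernel is omitted. `keepPair` only assumes that some later check has the form `score > defaultNmsThreshold`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical host-side sketch of the near-coincident fallback only.
// Returns -1 ("no connection") when the parts are not close enough.
template <typename T>
T nearCoincidentPairScore(
    const T ax, const T ay, const T bx, const T by,
    const int heatmapWidth, const int heatmapHeight, const T defaultNmsThreshold)
{
    assert(heatmapWidth > 0 && heatmapHeight > 0);
    const T dx = bx - ax;
    const T dy = by - ay;
    const T l2Dist = std::sqrt(dx * dx + dy * dy);
    // Distance threshold grows with the map size: sqrt(W*H)/150 (about 3.3 for 368x656)
    const T threshold = std::sqrt(T(heatmapWidth) * T(heatmapHeight)) / T(150);
    if (l2Dist < threshold)
        // Slightly above the threshold so a strict "> defaultNmsThreshold" test passes
        return defaultNmsThreshold + T(1e-6);
    return T(-1);
}

// Hypothetical downstream filter (assumption): strict comparison against the threshold
template <typename T>
bool keepPair(const T score, const T defaultNmsThreshold)
{
    return score > defaultNmsThreshold;
}
```

If the epsilon were dropped, `keepPair(defaultNmsThreshold, defaultNmsThreshold)` would return `false`. The fallback score would then never survive the strict check.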
--- a/src/openpose/net/bodyPartConnectorBaseCL.cpp
+++ b/src/openpose/net/bodyPartConnectorBaseCL.cpp
@@ -156,16 +156,15 @@ namespace op
                    pairScoresGpuPtrBuffer, heatMapGpuPtrBuffer, peaksGpuPtrBuffer, bodyPartPairsGpuPtrBuffer, mapIdxGpuPtrBuffer,
                    maxPeaks, (int)numberBodyPartPairs, heatMapSize.x, heatMapSize.y, interThreshold,
                    interMinAboveThreshold);
-                OpenCL::getInstance(gpuID)->getQueue().enqueueReadBuffer(pairScoresGpuPtrBuffer, CL_TRUE, 0,
+                OpenCL::getInstance(gpuID)->getQueue().enqueueReadBuffer(
-                                                                          totalComputations * sizeof(T), pairScoresCpu.getPtr());
+                    pairScoresGpuPtrBuffer, CL_TRUE, 0, totalComputations * sizeof(T), pairScoresCpu.getPtr());
                // New code
                // Get pair connections and their scores
                const auto pairConnections = pafPtrIntoVector(
                    pairScoresCpu, peaksPtr, maxPeaks, bodyPartPairs, numberBodyPartPairs);
-                const auto peopleVector = pafVectorIntoPeopleVector(
+                auto peopleVector = pafVectorIntoPeopleVector(
                    pairConnections, peaksPtr, maxPeaks, bodyPartPairs, numberBodyParts);
                // // Old code
                // // Get pair connections and their scores
                // // std::vector<std::pair<std::vector<int>, double>> refers to:
@@ -175,7 +174,6 @@ namespace op
                // const auto peopleVector = createPeopleVector(
                //     tNullptr, peaksPtr, poseModel, heatMapSize, maxPeaks, interThreshold, interMinAboveThreshold,
                //     bodyPartPairs, numberBodyParts, numberBodyPartPairs, pairScoresCpu);
                // Delete people below the following thresholds:
                    // a) minSubsetCnt: removed if less than minSubsetCnt body parts
                    // b) minSubsetScore: removed if global score smaller than this
@@ -183,15 +181,13 @@ namespace op
                int numberPeople;
                std::vector<int> validSubsetIndexes;
                validSubsetIndexes.reserve(fastMin((size_t)maxPeaks, peopleVector.size()));
-                removePeopleBelowThresholds(validSubsetIndexes, numberPeople, peopleVector, numberBodyParts, minSubsetCnt,
+                removePeopleBelowThresholdsAndFillFaces(
-                                            minSubsetScore, maxPeaks, maximizePositives);
+                    validSubsetIndexes, numberPeople, peopleVector, numberBodyParts, minSubsetCnt, minSubsetScore,
+                    maximizePositives, peaksPtr);
                // Fill and return poseKeypoints
-                peopleVectorToPeopleArray(poseKeypoints, poseScores, scaleFactor, peopleVector, validSubsetIndexes,
+                peopleVectorToPeopleArray(
-                                          peaksPtr, numberPeople, numberBodyParts, numberBodyPartPairs);
+                    poseKeypoints, poseScores, scaleFactor, peopleVector, validSubsetIndexes, peaksPtr, numberPeople,
+                    numberBodyParts, numberBodyPartPairs);
-               // // Sanity check
-               // cudaCheck(__LINE__, __FUNCTION__, __FILE__);
            #else
                UNUSED(poseKeypoints);
                UNUSED(poseScores);

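The comments in the hunk above list the thresholds applied before the keypoints are filled: (a) `minSubsetCnt`, the minimum number of body parts, and (b) `minSubsetScore`, the minimum score. Below is a minimal C++ sketch of just those two criteria. `keepPeopleAboveThresholds` is a hypothetical name. It ignores `maximizePositives` and the face-filling step that the renamed `removePeopleBelowThresholdsAndFillFaces` adds. It also assumes that criterion (b) compares the average score per found body part rather than the raw accumulated score.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: returns the indexes of people passing criteria (a) and (b).
// people[i].first.back() is assumed to hold the number of body parts found.
template <typename T>
std::vector<int> keepPeopleAboveThresholds(
    const std::vector<std::pair<std::vector<int>, T>>& people,
    const int minSubsetCnt, const T minSubsetScore)
{
    assert(minSubsetCnt > 0);
    std::vector<int> validSubsetIndexes;
    // Capacity hint only; the result may be smaller
    validSubsetIndexes.reserve(people.size());
    for (std::size_t index = 0; index < people.size(); index++)
    {
        assert(!people[index].first.empty());
        const int personCounter = people[index].first.back();
        const T personScore = people[index].second;
        // a) enough body parts; b) high enough average score (assumption).
        // Division is safe: it only runs when personCounter >= minSubsetCnt > 0.
        if (personCounter >= minSubsetCnt && personScore / T(personCounter) >= minSubsetScore)
            validSubsetIndexes.emplace_back(int(index));
    }
    return validSubsetIndexes;
}
```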
--- a/src/openpose/net/bodyPartConnectorCaffe.cpp
+++ b/src/openpose/net/bodyPartConnectorCaffe.cpp
@@ -108,6 +108,19 @@ namespace op
        }
    }
+    template <typename T>
+    void BodyPartConnectorCaffe<T>::setDefaultNmsThreshold(const T defaultNmsThreshold)
+    {
+        try
+        {
+            mDefaultNmsThreshold = {defaultNmsThreshold};
+        }
+        catch (const std::exception& e)
+        {
+            error(e.what(), __LINE__, __FUNCTION__, __FILE__);
+        }
+    }
    template <typename T>
    void BodyPartConnectorCaffe<T>::setInterMinAboveThreshold(const T interMinAboveThreshold)
    {
@@ -300,8 +313,8 @@ namespace op
    }
    template <typename T>
-    void BodyPartConnectorCaffe<T>::Forward_gpu(const std::vector<ArrayCpuGpu<T>*>& bottom, Array<T>& poseKeypoints,
+    void BodyPartConnectorCaffe<T>::Forward_gpu(
-                                                Array<T>& poseScores)
+        const std::vector<ArrayCpuGpu<T>*>& bottom, Array<T>& poseKeypoints, Array<T>& poseScores)
    {
        try
        {
@@ -354,12 +367,12 @@ namespace op
                }
                // Run body part connector
-                connectBodyPartsGpu(poseKeypoints, poseScores, heatMapsGpuPtr, peaksPtr, mPoseModel,
-                                    Point<int>{heatMapsBlob->shape(3), heatMapsBlob->shape(2)},
-                                    maxPeaks, mInterMinAboveThreshold, mInterThreshold,
-                                    mMinSubsetCnt, mMinSubsetScore, mScaleNetToOutput, mMaximizePositives,
-                                    mFinalOutputCpu, pFinalOutputGpuPtr, pBodyPartPairsGpuPtr, pMapIdxGpuPtr,
-                                    peaksGpuPtr);
+                connectBodyPartsGpu(
+                    poseKeypoints, poseScores, heatMapsGpuPtr, peaksPtr, mPoseModel,
+                    Point<int>{heatMapsBlob->shape(3), heatMapsBlob->shape(2)}, maxPeaks, mInterMinAboveThreshold,
+                    mInterThreshold, mMinSubsetCnt, mMinSubsetScore, mScaleNetToOutput, mMaximizePositives,
+                    mFinalOutputCpu, pFinalOutputGpuPtr, pBodyPartPairsGpuPtr, pMapIdxGpuPtr, peaksGpuPtr,
+                    mDefaultNmsThreshold);
            #else
                UNUSED(bottom);
                UNUSED(poseKeypoints);

--- a/src/openpose/pose/poseExtractorCaffe.cpp
+++ b/src/openpose/pose/poseExtractorCaffe.cpp
@@ -317,6 +317,7 @@ namespace op
                // OP_CUDA_PROFILE_END(timeNormalize3, 1e3, REPS);
                // OP_CUDA_PROFILE_INIT(REPS);
                spBodyPartConnectorCaffe->setScaleNetToOutput(mScaleNetToOutput);
+                spBodyPartConnectorCaffe->setDefaultNmsThreshold((float)get(PoseProperty::NMSThreshold));
                spBodyPartConnectorCaffe->setInterMinAboveThreshold(
                    (float)get(PoseProperty::ConnectInterMinAboveThreshold));
                spBodyPartConnectorCaffe->setInterThreshold((float)get(PoseProperty::ConnectInterThreshold));
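`PoseExtractorCaffe` reads `PoseProperty::NMSThreshold` and passes it to the connector through the new setter. That setter has the same shape as the other setters in the file: the assignment sits inside a `try` block, and any exception is routed to `error()`. The sketch below reproduces that shape in isolation. `ConnectorSketch`, `reportError`, and the range check on the threshold are all hypothetical. `reportError` only logs so the example stays self-contained, and it does not describe how OpenPose's `error()` behaves.

```cpp
#include <cassert>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an error handler: it only logs, so the sketch stays self-contained.
void reportError(
    const std::string& message, const int line, const std::string& function, const std::string& file)
{
    std::cerr << "Error: " << message << " (" << file << ":" << line << ", " << function << ")\n";
}

// Hypothetical, stripped-down connector that mirrors the setter shape used in the diff.
template <typename T>
class ConnectorSketch
{
public:
    void setDefaultNmsThreshold(const T defaultNmsThreshold)
    {
        try
        {
            // Hypothetical validation, added only to exercise the error path
            if (defaultNmsThreshold < T(0) || defaultNmsThreshold > T(1))
                throw std::invalid_argument("The NMS threshold must be in [0, 1].");
            mDefaultNmsThreshold = defaultNmsThreshold;
        }
        catch (const std::exception& e)
        {
            reportError(e.what(), __LINE__, __func__, __FILE__);
        }
    }

    T getDefaultNmsThreshold() const
    {
        return mDefaultNmsThreshold;
    }

private:
    T mDefaultNmsThreshold = T(0);
};
```

With this shape, an invalid value leaves the previously stored threshold unchanged and is reported rather than applied.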

--- a/src/openpose/utilities/keypoint.cpp
+++ b/src/openpose/utilities/keypoint.cpp
@@ -174,9 +174,10 @@ namespace op
        const double offsetY);
    template <typename T>
-    void renderKeypointsCpu(Array<T>& frameArray, const Array<T>& keypoints, const std::vector<unsigned int>& pairs,
-                            const std::vector<T> colors, const T thicknessCircleRatio,
-                            const T thicknessLineRatioWRTCircle, const std::vector<T>& poseScales, const T threshold)
+    void renderKeypointsCpu(
+        Array<T>& frameArray, const Array<T>& keypoints, const std::vector<unsigned int>& pairs,
+        const std::vector<T> colors, const T thicknessCircleRatio, const T thicknessLineRatioWRTCircle,
+        const std::vector<T>& poseScales, const T threshold)
    {
        try
        {
@@ -209,8 +210,9 @@ namespace op
                    const auto personRectangle = getKeypointsRectangle(keypoints, person, thresholdRectangle);
                    if (personRectangle.area() > 0)
                    {
-                        const auto ratioAreas = fastMin(T(1), fastMax(personRectangle.width/(T)width,
-                                                                     personRectangle.height/(T)height));
+                        const auto ratioAreas = fastMin(
+                            T(1), fastMax(
+                                personRectangle.width/(T)width, personRectangle.height/(T)height));
                        // Size-dependent variables
                        const auto thicknessRatio = fastMax(
                            positiveIntRound(std::sqrt(area)* thicknessCircleRatio * ratioAreas), 2);
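The reformatted `ratioAreas` expression scales rendering thickness by how much of the frame the person occupies. It takes the larger of the person's width and height ratios and caps it at 1. That ratio scales a base thickness, and the result is clamped to at least 2 pixels. The function below restates the arithmetic as a standalone sketch. It assumes that `area`, which this hunk does not define, is the frame area `frameWidth * frameHeight`. It also uses `std::lround` in place of OpenPose's `positiveIntRound`. `computeThickness` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Sketch of the size-dependent thickness computation in renderKeypointsCpu().
// Assumption: `area` is the frame area (frameWidth * frameHeight).
// std::lround stands in for positiveIntRound (both round halves up for non-negative inputs).
int computeThickness(
    const float rectWidth, const float rectHeight, const int frameWidth, const int frameHeight,
    const float thicknessCircleRatio)
{
    assert(frameWidth > 0 && frameHeight > 0);
    // Relative person size: the larger of its width and height ratios, capped at 1
    const auto ratioAreas = std::min(
        1.f, std::max(rectWidth / static_cast<float>(frameWidth), rectHeight / static_cast<float>(frameHeight)));
    // sqrt(area) is the geometric mean of the frame sides
    const auto area = static_cast<float>(frameWidth) * static_cast<float>(frameHeight);
    const auto thickness = static_cast<int>(std::lround(std::sqrt(area) * thicknessCircleRatio * ratioAreas));
    // Never draw thinner than 2 pixels
    return std::max(thickness, 2);
}
```

Take a 640x480 frame with `thicknessCircleRatio = 1/75`. A person who spans the full frame height gets a thickness of 7 pixels, and so does one larger than the frame, because of the cap. A person about one tenth of the frame height falls back to the 2-pixel minimum.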
@@ -283,21 +285,32 @@ namespace op
        const std::vector<double>& poseScales, const double threshold);
    template <typename T>
-    Rectangle<T> getKeypointsRectangle(const Array<T>& keypoints, const int person, const T threshold)
+    Rectangle<T> getKeypointsRectangle(
+        const Array<T>& keypoints, const int person, const T threshold, const int firstIndex, const int lastIndex)
    {
        try
        {
+            // Params
            const auto numberKeypoints = keypoints.getSize(1);
-            // Sanity check
+            const auto lastIndexClean = (lastIndex < 0 ? numberKeypoints : lastIndex);
+            // Sanity checks
            if (numberKeypoints < 1)
                error("Number body parts must be > 0.", __LINE__, __FUNCTION__, __FILE__);
+            if (lastIndexClean > numberKeypoints)
+                error("The value of `lastIndex` must be less than or equal to `numberKeypoints`. Currently: "
+                    + std::to_string(lastIndexClean) + " vs. " + std::to_string(numberKeypoints),
+                    __LINE__, __FUNCTION__, __FILE__);
+            if (firstIndex > lastIndexClean)
+                error("The value of `firstIndex` must be less than or equal to `lastIndex`. Currently: "
+                    + std::to_string(firstIndex) + " vs. " + std::to_string(lastIndexClean),
+                    __LINE__, __FUNCTION__, __FILE__);
            // Define keypointPtr
            const auto keypointPtr = keypoints.getConstPtr() + person * keypoints.getSize(1) * keypoints.getSize(2);
            T minX = std::numeric_limits<T>::max();
            T maxX = std::numeric_limits<T>::lowest();
            T minY = minX;
            T maxY = maxX;
-            for (auto part = 0 ; part < numberKeypoints ; part++)
+            for (auto part = firstIndex ; part < lastIndexClean ; part++)
            {
                const auto score = keypointPtr[3*part + 2];
                if (score > threshold)
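`getKeypointsRectangle()` now accepts a half-open keypoint range `[firstIndex, lastIndex)`, and a negative `lastIndex` is normalized to the total number of keypoints. The sketch below mirrors that logic on a plain `std::vector<float>`. It assumes keypoints are stored person by person as `(x, y, score)` triplets. Several parts are choices of this sketch, because the rest of the real function is not shown in this hunk:

- the names `Rect` and `getKeypointsRectangleSketch`;
- the default arguments;
- the extra `firstIndex < 0` check;
- reporting errors by throwing exceptions;
- returning an empty rectangle when no keypoint passes the threshold.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical rectangle type for this sketch (OpenPose has its own Rectangle<T>).
struct Rect
{
    float x;
    float y;
    float width;
    float height;
};

// Bounding box of the keypoints of `person` with index in [firstIndex, lastIndex) and score above `threshold`.
// A negative `lastIndex` means "up to the last keypoint", as in the diff.
// Assumed layout: keypoints[(person * numberKeypoints + part) * 3 + k], k = 0 (x), 1 (y), 2 (score).
Rect getKeypointsRectangleSketch(
    const std::vector<float>& keypoints, const int numberKeypoints, const int person, const float threshold,
    const int firstIndex = 0, const int lastIndex = -1)
{
    const auto lastIndexClean = (lastIndex < 0 ? numberKeypoints : lastIndex);
    // Sanity checks (the firstIndex < 0 check is an addition of this sketch)
    if (numberKeypoints < 1 || firstIndex < 0 || lastIndexClean > numberKeypoints || firstIndex > lastIndexClean)
        throw std::invalid_argument("Invalid keypoint index range.");
    assert(keypoints.size() >= static_cast<std::size_t>((person + 1) * numberKeypoints * 3));
    const auto* const keypointPtr = keypoints.data() + person * numberKeypoints * 3;
    auto minX = std::numeric_limits<float>::max();
    auto maxX = std::numeric_limits<float>::lowest();
    auto minY = minX;
    auto maxY = maxX;
    for (auto part = firstIndex; part < lastIndexClean; part++)
    {
        // Only keypoints above the confidence threshold contribute to the box
        if (keypointPtr[3 * part + 2] > threshold)
        {
            minX = std::min(minX, keypointPtr[3 * part]);
            maxX = std::max(maxX, keypointPtr[3 * part]);
            minY = std::min(minY, keypointPtr[3 * part + 1]);
            maxY = std::max(maxY, keypointPtr[3 * part + 1]);
        }
    }
    // No keypoint passed the threshold: return an empty rectangle
    if (maxX < minX)
        return Rect{0.f, 0.f, 0.f, 0.f};
    return Rect{minX, minY, maxX - minX, maxY - minY};
}
```

For example, calling it with `firstIndex = 0` and `lastIndex = 2` restricts the box to the first two keypoints, while the defaults cover every keypoint.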
@@ -328,9 +341,11 @@ namespace op
        }
    }
    template OP_API Rectangle<float> getKeypointsRectangle(
-        const Array<float>& keypoints, const int person, const float threshold);
+        const Array<float>& keypoints, const int person, const float threshold, const int firstIndex,
+        const int lastIndex);
    template OP_API Rectangle<double> getKeypointsRectangle(
-        const Array<double>& keypoints, const int person, const double threshold);
+        const Array<double>& keypoints, const int person, const double threshold, const int firstIndex,
+        const int lastIndex);
    template <typename T>
    T getAverageScore(const Array<T>& keypoints, const int person)