1. 19 Dec, 2019 1 commit
  2. 12 Dec, 2019 1 commit
    • Merge pull request #16138 from pmur:reg_16137 · 1c4a64f0
      Committed by Paul Murphy
      * imgproc: Prevent 1B overrun of 8C3 SIMD optimization
      
      The fourth value read via v_load_q is essentially ignored,
      but can cause trouble if it happens to cross page boundaries.
      
      The final few iterations may attempt to read the most extreme
      elements of S, which will read 1B beyond the array in most
      alignment cases. Dynamically compute the stop. This could be
      hoisted from the loop, but will require a more extensive change.
      
      Likewise, clean up the iteration increment statements to make
      it more obvious they do channel count (3) elements per pass.
      
      This should resolve #16137
      
      * imgproc(resize): extra check
  3. 09 Dec, 2019 1 commit
    • Merge pull request #15257 from pmur:resize · a011035e
      Committed by Paul Murphy
      * resize: HResizeLinear reduce duplicate work
      
      There appears to be a 2x unroll of the HResizeLinear against k,
      however the k value is only incremented by 1 during the unroll. This
      results in k - 1 duplicate passes when k > 1.
      
      Likewise, the final pass may not respect the work done by the vector
      loop. Start it with the offset returned by the vector op if
      implemented. Note, no vector ops are implemented today.
      
      The performance gain is most noticeable on a linear downscale. A set of
      performance tests is added to characterize this. The performance
      improvement is 10-50% depending on the scaling.
      
      * imgproc: vectorize HResizeLinear
      
      Performance is mostly gated by the gather operations
      for x inputs.
      
      Likewise, provide a 2x unroll against k. This halves the
      number of alpha gathers for larger k.
      
      While not a 4x improvement, it still performs substantially
      better under P9 for a 1.4x improvement. P8 baseline is
      1.05-1.10x due to reduced VSX instruction set.
      
      For float types, this results in a more modest
      1.2x improvement.
      
      * Update U8 processing for non-bitexact linear resize
      
      * core: hal: vsx: improve v_load_expand_q
      
      With a little help, we can do this quickly without GPRs on
      all VSX-enabled targets.
      
      * resize: Fix cn == 3 step per feedback
      
      Per feedback, ensure we don't overrun. This was caught via the
      failure observed in Test_TensorFlow.inception_accuracy.
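The duplicate-work issue in the first item of this commit can be illustrated with a simplified horizontal linear pass. This is a sketch under stated assumptions, not OpenCV's `HResizeLinear`: the function name, the `Row` alias, the single-channel layout and the returned pass counter are all hypothetical. The loop body handles rows `k` and `k + 1` together, so `k` has to advance by 2. Advancing by 1 would redo rows already processed.

```cpp
#include <cstddef>
#include <vector>

using Row = std::vector<float>;

// Hypothetical simplified horizontal linear pass over `src.size()` rows.
// xofs[dx] is the left source index for destination column dx, and
// alpha[2*dx], alpha[2*dx + 1] are the two interpolation weights.
// Returns the number of row passes executed, to make duplicate work visible.
int hresize_linear_sketch(const std::vector<Row>& src, std::vector<Row>& dst,
                          const std::vector<int>& xofs,
                          const std::vector<float>& alpha)
{
    const std::size_t count = src.size();
    const std::size_t dwidth = xofs.size();
    int passes = 0;
    std::size_t k = 0;

    // 2x unroll against k: two rows share each xofs/alpha lookup,
    // so k advances by 2 per iteration.
    for (; k + 1 < count; k += 2)
    {
        for (std::size_t dx = 0; dx < dwidth; ++dx)
        {
            const std::size_t sx = static_cast<std::size_t>(xofs[dx]);
            const float a0 = alpha[2 * dx], a1 = alpha[2 * dx + 1];
            dst[k][dx]     = src[k][sx] * a0     + src[k][sx + 1] * a1;
            dst[k + 1][dx] = src[k + 1][sx] * a0 + src[k + 1][sx + 1] * a1;
        }
        passes += 2;
    }

    // Remaining row when count is odd.
    for (; k < count; ++k)
    {
        for (std::size_t dx = 0; dx < dwidth; ++dx)
        {
            const std::size_t sx = static_cast<std::size_t>(xofs[dx]);
            dst[k][dx] = src[k][sx] * alpha[2 * dx] + src[k][sx + 1] * alpha[2 * dx + 1];
        }
        passes += 1;
    }
    return passes;
}
```

With three input rows the function reports three passes, one per row. Sharing the `alpha` lookup between the two unrolled rows is also what reduces the number of alpha gathers in the vectorized version.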
  4. 18 Nov, 2019 1 commit
  5. 03 Jun, 2019 1 commit
    • Merge pull request #14210 from terfendail:wui_512 · 3b015dfc
      Committed by Vitaly Tuzov
      AVX512 wide universal intrinsics (#14210)
      
      * Added implementation of 512-bit wide universal intrinsics(WIP)
      
      * Added implementation of 512-bit wide universal intrinsics: implemented WUI vector types(WIP)
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): implemented load/store
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): implemented fp16 load/store
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): implemented recombine and zip, implemented non-saturating and saturating arithmetics
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): implemented bit operations
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): implemented comparisons
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): implemented lane shifts and reduction
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): implemented absolute values
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): implemented rounding and cast to float
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): implemented LUT
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): implemented type extension/narrowing and matrix operations
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): implemented load_deinterleave for 2 and 3 channels images
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented load_deinterleave for 2- and implemented for 4-channel images
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): implemented store_interleave
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): implemented signmask and checks
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): build fixes
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented popcount in case AVX512_BITALG is unavailable
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented zip
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented rotate for s8 and s16
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): reimplemented interleave/deinterleave for s8 and s16
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): updated v512_set macros
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): fix for GCC wrong _mm512_abs_pd definition
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_zip to avoid AVX512_VBMI intrinsics
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_invsqrt to avoid AVX512_ER intrinsics
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): reworked v_rotate, v_popcount and interleave/deinterleave for U8 to avoid AVX512_VBMI intrinsics
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): fixed integral image SIMD part
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): fixed warnings
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): fixed load_deinterleave for u8 and u16
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): fixed v_invsqrt accuracy for f64
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave/deinterleave for u32 and u64
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): fixed interleave_pairs, interleave_quads and pack_triplets
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): fixed rotate_left/right, part 2
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): fixed 512-wide universal intrinsics based resize
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): fixed findContours by avoiding use of uint64 dependent 512-wide v_signmask()
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): fixed trailing whitespaces
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): reworked specific intrinsic sets dependent parts to check availability of intrinsics based on CPU feature group defines
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): Updated AVX512 implementation of v_popcount to avoid AVX512VPOPCNTDQ intrinsics if unavailable.
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): Fixed universal intrinsics data initialisation, v_mul_wrap, v_floor, v_ceil and v_signmask.
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): Removed hasSIMD512()
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): Fixes for gcc build
      
      * Added implementation of 512-bit wide universal intrinsics(WIP): Reworked v_signmask, v_check_any() and v_check_all() implementation.
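Several items above rework `v_popcount` so that it does not depend on optional AVX512 subsets. As a rough idea of what a fallback computes per 8-bit lane, here is a portable scalar sketch. It is not the code from the pull request: `popcount_byte` and `popcount_lanes` are hypothetical names, and the 64-element array only stands in for the 64 byte lanes of a 512-bit register.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar fallback: counts the set bits of one byte using the
// classic pairwise-sum (SWAR) reduction, with no dedicated popcount instruction.
std::uint8_t popcount_byte(std::uint8_t x)
{
    unsigned v = x;
    v = v - ((v >> 1) & 0x55u);            // 2-bit partial sums
    v = (v & 0x33u) + ((v >> 2) & 0x33u);  // 4-bit partial sums
    v = (v + (v >> 4)) & 0x0Fu;            // final count, 0..8
    return static_cast<std::uint8_t>(v);
}

// Applies the per-byte count to 64 lanes, the lane count of a 512-bit
// register holding 8-bit elements.
std::array<std::uint8_t, 64> popcount_lanes(const std::array<std::uint8_t, 64>& in)
{
    std::array<std::uint8_t, 64> out{};
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = popcount_byte(in[i]);
    return out;
}
```

A SIMD fallback would typically apply the same shift-and-mask steps to whole registers at once. The scalar form is used here only to keep the sketch portable.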
  6. 05 Mar, 2019 1 commit
  7. 20 Feb, 2019 1 commit
    • Merge pull request #13781 from terfendail:warp_wintr · 334c4d62
      Committed by Vitaly Tuzov
      Resize reworked using wide universal intrinsics (#13781)
      
      * Added wide universal intrinsics optimized implementation for 3 channel bit-exact linear resize
      
      * Reworked linear resize using new wide LUT intrinsics
      
      * Fix for VSX intrinsics
  8. 03 Dec, 2018 1 commit
  9. 24 Oct, 2018 1 commit
    • Merge pull request #12877 from maver1:3.4 · e397434c
      Committed by maver1
      * Updated ICV packages and IPP integration
      
      * core(test): minMaxIdx IPP regression test
      
      * core(ipp): workaround minMaxIdx problem
      
      * core(ipp): workaround meanStdDev() CV_32FC3 buffer overrun
      
      * Returned semicolon after CV_INSTRUMENT_REGION_IPP()
  10. 17 Oct, 2018 1 commit
    • Catch exceptions by const-reference · c8e6ce30
      Committed by Michał Janiszewski
      Exceptions caught by value incur needless cost in C++. Most of them can
      be caught by const-reference, especially as nearly none are actually
      used. This could allow the compiler to generate slightly more efficient code.
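The pattern this commit describes looks roughly like the following. It is a minimal sketch with a hypothetical `describe_failure` helper, not code taken from the commit. Catching by const-reference avoids copying the exception object and keeps its dynamic type, so `what()` still returns the message of the derived exception.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: returns "ok", or the message of the exception thrown.
std::string describe_failure(bool fail)
{
    try
    {
        if (fail)
            throw std::out_of_range("index 7 out of range");
        return "ok";
    }
    catch (const std::exception& e)  // const-reference: no copy, no slicing
    {
        return e.what();
    }
}
```

Had the handler been written as `catch (std::exception e)`, the thrown `std::out_of_range` would be copied into a plain `std::exception` and its message would be lost.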
  11. 12 Oct, 2018 1 commit
  12. 08 Oct, 2018 1 commit
  13. 14 Sep, 2018 1 commit
  14. 13 Sep, 2018 1 commit
  15. 07 Sep, 2018 1 commit
  16. 04 Sep, 2018 1 commit
  17. 31 Aug, 2018 1 commit
    • Bit-exact resize reworked to use wide intrinsics (#12038) · e345cb03
      Committed by Vitaly Tuzov
      * Bit-exact resize reworked to use wide intrinsics
      
      * Reworked bit-exact resize row data loading
      
      * Added bit-exact resize row data loaders for SIMD256 and SIMD512
      
      * Fixed type punned pointer dereferencing warning
      
      * Reworked loading of source data for SIMD256 and SIMD512 bit-exact resize
  18. 05 7月, 2018 1 次提交
  19. 08 6月, 2018 1 次提交
  20. 28 3月, 2018 1 次提交
  21. 16 1月, 2018 1 次提交
  22. 22 12月, 2017 3 次提交
  23. 20 12月, 2017 1 次提交
  24. 13 12月, 2017 1 次提交
  25. 03 11月, 2017 1 次提交
  26. 31 8月, 2017 2 次提交