Commits · f5a84f75c4427e0754138264dbce0b55a80d5d38 · Greenplum / Opencv

19 Dec, 2019 · 1 commit
  
  Fix for CV_8UC2 linear resize vectorization · f5a84f75
  Committed by Vitaly Tuzov on Dec 18, 2019
  
12 Dec, 2019 · 1 commit

Merge pull request #16138 from pmur:reg_16137 · 1c4a64f0

Committed by Paul Murphy on Dec 12, 2019

* imgproc: Prevent 1B overrun of 8C3 SIMD optimization

The fourth value read via v_load_q is essentially ignored,
but the read can cause trouble if it happens to cross a page boundary.

The final few iterations may attempt to read the most extreme
elements of S, which will read 1B beyond the array in most
alignment cases. Compute the stop dynamically. This could be
hoisted out of the loop, but that would require a more extensive change.

Likewise, clean up the iteration increment statements to make
it more obvious that each pass processes channel-count (3) elements.

This should resolve #16137

* imgproc(resize): extra check

09 Dec, 2019 · 1 commit

Merge pull request #15257 from pmur:resize · a011035e

Committed by Paul Murphy on Dec 09, 2019

* resize: HResizeLinear reduce duplicate work

There appears to be a 2x unroll of HResizeLinear against k;
however, k is only incremented by 1 during the unroll. This
results in k - 1 duplicate passes when k > 1.

Likewise, the final pass may not respect the work done by the vector
loop. Start it at the offset returned by the vector op, if one is
implemented. Note that no vector ops are implemented yet.

The effect is most noticeable on a linear downscale. A set of
performance tests is added to characterize this. The performance
improvement is 10-50% depending on the scaling.

* imgproc: vectorize HResizeLinear

Performance is mostly gated by the gather operations
for x inputs.

Likewise, provide a 2x unroll against k, which halves the
number of alpha gathers for larger k.

While not a 4x improvement, it still performs substantially
better on P9 (POWER9), with a 1.4x speedup. The P8 (POWER8) baseline
gains only 1.05-1.10x due to its reduced VSX instruction set.

For float types, this results in a more modest
1.2x improvement.

* Update U8 processing for non-bitexact linear resize

* core: hal: vsx: improve v_load_expand_q

With a little help, we can do this quickly without GPRs on
all VSX-enabled targets.

* resize: Fix cn == 3 step per feedback

Per feedback, ensure we don't overrun. This was caught via the
failure observed in Test_TensorFlow.inception_accuracy.

18 Nov, 2019 · 1 commit
  
  Fix 13577 · 2185bce4
  Committed by clunietp on Nov 18, 2019
  
03 Jun, 2019 · 1 commit

Merge pull request #14210 from terfendail:wui_512 · 3b015dfc

Committed by Vitaly Tuzov on Jun 03, 2019

AVX512 wide universal intrinsics (#14210)

* Added implementation of 512-bit wide universal intrinsics (WIP)

* Added implementation of 512-bit wide universal intrinsics: implemented WUI vector types (WIP)

* Added implementation of 512-bit wide universal intrinsics (WIP): implemented load/store

* Added implementation of 512-bit wide universal intrinsics (WIP): implemented fp16 load/store

* Added implementation of 512-bit wide universal intrinsics (WIP): implemented recombine and zip, implemented non-saturating and saturating arithmetic

* Added implementation of 512-bit wide universal intrinsics (WIP): implemented bit operations

* Added implementation of 512-bit wide universal intrinsics (WIP): implemented comparisons

* Added implementation of 512-bit wide universal intrinsics (WIP): implemented lane shifts and reduction

* Added implementation of 512-bit wide universal intrinsics (WIP): implemented absolute values

* Added implementation of 512-bit wide universal intrinsics (WIP): implemented rounding and cast to float

* Added implementation of 512-bit wide universal intrinsics (WIP): implemented LUT

* Added implementation of 512-bit wide universal intrinsics (WIP): implemented type extension/narrowing and matrix operations

* Added implementation of 512-bit wide universal intrinsics (WIP): implemented load_deinterleave for 2- and 3-channel images

* Added implementation of 512-bit wide universal intrinsics (WIP): reimplemented load_deinterleave for 2-channel and implemented it for 4-channel images

* Added implementation of 512-bit wide universal intrinsics (WIP): implemented store_interleave

* Added implementation of 512-bit wide universal intrinsics (WIP): implemented signmask and checks

* Added implementation of 512-bit wide universal intrinsics (WIP): build fixes

* Added implementation of 512-bit wide universal intrinsics (WIP): reimplemented popcount in case AVX512_BITALG is unavailable

* Added implementation of 512-bit wide universal intrinsics (WIP): reimplemented zip

* Added implementation of 512-bit wide universal intrinsics (WIP): reimplemented rotate for s8 and s16

* Added implementation of 512-bit wide universal intrinsics (WIP): reimplemented interleave/deinterleave for s8 and s16

* Added implementation of 512-bit wide universal intrinsics (WIP): updated v512_set macros

* Added implementation of 512-bit wide universal intrinsics (WIP): fix for GCC's incorrect _mm512_abs_pd definition

* Added implementation of 512-bit wide universal intrinsics (WIP): reworked v_zip to avoid AVX512_VBMI intrinsics

* Added implementation of 512-bit wide universal intrinsics (WIP): reworked v_invsqrt to avoid AVX512_ER intrinsics

* Added implementation of 512-bit wide universal intrinsics (WIP): reworked v_rotate, v_popcount and interleave/deinterleave for U8 to avoid AVX512_VBMI intrinsics

* Added implementation of 512-bit wide universal intrinsics (WIP): fixed integral image SIMD part

* Added implementation of 512-bit wide universal intrinsics (WIP): fixed warnings

* Added implementation of 512-bit wide universal intrinsics (WIP): fixed load_deinterleave for u8 and u16

* Added implementation of 512-bit wide universal intrinsics (WIP): fixed v_invsqrt accuracy for f64

* Added implementation of 512-bit wide universal intrinsics (WIP): fixed interleave/deinterleave for u32 and u64

* Added implementation of 512-bit wide universal intrinsics (WIP): fixed interleave_pairs, interleave_quads and pack_triplets

* Added implementation of 512-bit wide universal intrinsics (WIP): fixed rotate_left

* Added implementation of 512-bit wide universal intrinsics (WIP): fixed rotate_left/right, part 2

* Added implementation of 512-bit wide universal intrinsics (WIP): fixed 512-wide universal intrinsics based resize

* Added implementation of 512-bit wide universal intrinsics (WIP): fixed findContours by avoiding use of uint64 dependent 512-wide v_signmask()

* Added implementation of 512-bit wide universal intrinsics (WIP): fixed trailing whitespace

* Added implementation of 512-bit wide universal intrinsics (WIP): reworked parts dependent on specific intrinsic sets to check intrinsic availability based on CPU feature group defines

* Added implementation of 512-bit wide universal intrinsics (WIP): Updated AVX512 implementation of v_popcount to avoid AVX512VPOPCNTDQ intrinsics if unavailable.

* Added implementation of 512-bit wide universal intrinsics (WIP): Fixed universal intrinsics data initialisation, v_mul_wrap, v_floor, v_ceil and v_signmask.

* Added implementation of 512-bit wide universal intrinsics (WIP): Removed hasSIMD512()

* Added implementation of 512-bit wide universal intrinsics (WIP): Fixes for gcc build

* Added implementation of 512-bit wide universal intrinsics (WIP): Reworked v_signmask, v_check_any() and v_check_all() implementations.

05 Mar, 2019 · 1 commit
  
  Fixed out-of-bounds read in LINEAR_EXACT resize for 8UC3 · 99b39aa5
  Committed by Vitaly Tuzov on Mar 05, 2019
  
20 Feb, 2019 · 1 commit

Merge pull request #13781 from terfendail:warp_wintr · 334c4d62

Committed by Vitaly Tuzov on Feb 20, 2019

Resize reworked using wide universal intrinsics (#13781)

* Added wide universal intrinsics optimized implementation for 3-channel bit-exact linear resize

* Reworked linear resize using new wide LUT intrinsics

* Fix for VSX intrinsics

03 Dec, 2018 · 1 commit
  
  imgproc(resize): update checks (static analyzers) · 2d5ccc7b
  Committed by Alexander Alekhin on Dec 03, 2018
  
24 Oct, 2018 · 1 commit

Merge pull request #12877 from maver1:3.4 · e397434c

Committed by maver1 on Oct 24, 2018

* Updated ICV packages and IPP integration

* core(test): minMaxIdx IPP regression test

* core(ipp): workaround minMaxIdx problem

* core(ipp): workaround meanStdDev() CV_32FC3 buffer overrun

* Restored semicolon after CV_INSTRUMENT_REGION_IPP()

17 Oct, 2018 · 1 commit

Catch exceptions by const-reference · c8e6ce30

Committed by Michał Janiszewski on Oct 16, 2018

Catching exceptions by value incurs a needless copy in C++; most of them can
be caught by const reference, especially as nearly none are actually
used. This could allow the compiler to generate slightly more efficient code.

12 Oct, 2018 · 1 commit
  
  resolves 11283 · 24af70c7
  Committed by take1014 on Oct 12, 2018
  
08 Oct, 2018 · 1 commit
  
  Replaced SSE2 area resize implementation with wide universal intrinsic implementation · 9d602f27
  Committed by Vitaly Tuzov on Sep 25, 2018
  
14 Sep, 2018 · 1 commit
  
  Add semicolons after `CV_INSTRUMENT` macros · 5d54def2
  Committed by Hamdi Sahloul on Sep 14, 2018
  
13 Sep, 2018 · 1 commit
  
  Fixed bit-exact resize SIMD implementation for AVX2 baseline · 29770e13
  Committed by Vitaly Tuzov on Sep 13, 2018
  
07 Sep, 2018 · 1 commit
  
  Utilize CV_UNUSED macro · a39e0daa
  Committed by Hamdi Sahloul on Sep 07, 2018
  
04 Sep, 2018 · 1 commit
  
  Fixed bit-exact resize wide intrinsics implementation for 16U · f9a5c4d1
  Committed by Vitaly Tuzov on Sep 03, 2018
  
31 Aug, 2018 · 1 commit

Bit-exact resize reworked to use wide intrinsics (#12038) · e345cb03

Committed by Vitaly Tuzov on Aug 31, 2018

* Bit-exact resize reworked to use wide intrinsics

* Reworked bit-exact resize row data loading

* Added bit-exact resize row data loaders for SIMD256 and SIMD512

* Fixed type punned pointer dereferencing warning

* Reworked loading of source data for SIMD256 and SIMD512 bit-exact resize

05 Jul, 2018 · 1 commit
  
  opencv: Use cv::AutoBuffer<>::data() · b09a4a98
  Committed by Alexander Alekhin on Jun 10, 2018
  
08 Jun, 2018 · 1 commit
  
  Fixed assertion error due to Size.area() overflow · b46fef32
  Committed by gnthibault on Jun 08, 2018
  
28 Mar, 2018 · 1 commit
  
  imgproc: apply CV_OVERRIDE/CV_FINAL · 5d36ee2f
  Committed by Alexander Alekhin on Mar 15, 2018
  
16 Jan, 2018 · 1 commit
  
  Fixed several warnings produced by clang 6 and static analyzers · 8b87c4b9
  Committed by Maksim Shabunin on Dec 25, 2017
  
22 Dec, 2017 · 3 commits
  
  Added fallback to generic linear resize in case bit-exact resize of provided matrix isn't supported · 5fdb42a7
  Committed by Vitaly Tuzov on Dec 22, 2017
  
  Update resize inline comments · 602b08d9
  Committed by Ce Zheng on Dec 22, 2017
```
Reading through the implementation, I feel this line of comment is not consistent with the actual code, so this commit corrects it.
```
  
  Disabled universal intrinsic based implementation for bit-exact resize of 3-channel images · 01916248
  Committed by Vitaly Tuzov on Dec 22, 2017
  
20 Dec, 2017 · 1 commit
  
  Added universal intrinsics based implementations for CV_8UC2, CV_8UC3, CV_8UC4 bit-exact resizes. · 1eb2fa9e
  Committed by Vitaly Tuzov on Dec 07, 2017
  
13 Dec, 2017 · 1 commit
  Implementation of bit-exact resize. Internal calls to linear resize updated to... · 51cb56ef
  Committed by Vitaly Tuzov on Dec 13, 2017
```
Implementation of bit-exact resize. Internal calls to linear resize updated to use bit-exact version. (#9468)
```
03 Nov, 2017 · 1 commit
  
  Fixed minor issues reported by GCC 7.2 · 184daa15
  Committed by Maksim Shabunin on Nov 02, 2017
  
31 Aug, 2017 · 2 commits
  
  removed unused interpolateLinear · e8caa9b5
  Committed by Vitaly Tuzov on Aug 31, 2017
  
  Move resize implementation to separate file · b1f46b6d
  Committed by Vitaly Tuzov on Jul 20, 2017
  

Greenplum / Opencv · synced successfully 11 months ago
