1. 22 Dec 2021, 1 commit
  2. 21 Dec 2021, 1 commit
  3. 17 Dec 2021, 3 commits
  4. 15 Dec 2021, 3 commits
  5. 03 Dec 2021, 1 commit
  6. 02 Dec 2021, 1 commit
  7. 30 Nov 2021, 6 commits
    • add sum of 1 input · 33e97e99
      Committed by Smirnov Egor
    • add default order to transpose · 11e6848b
      Committed by Smirnov Egor
    • add new (Log)SoftMax simplification passes · 82941072
      Committed by Smirnov Egor
    • add alpha parameter to ELU layer · 0e2a3686
      Committed by Smirnov Egor
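      For context, ELU with an explicit alpha is commonly defined as below (a generic sketch, not the OpenCV layer code); alpha scales the negative, saturating branch.

        #include <cmath>

        // f(x) = x                      for x > 0
        // f(x) = alpha * (exp(x) - 1)   for x <= 0
        static inline float elu(float x, float alpha)
        {
            return x > 0.f ? x : alpha * (std::exp(x) - 1.f);
        }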
    • Merge pull request #20658 from smbz:lstm_optimisation · ea7d4be3
      Committed by Andrew Ryrie
      * dnn: LSTM optimisation
      
      This uses the AVX-optimised fastGEMM1T for matrix multiplications where available, instead of the standard cv::gemm.
      
       fastGEMM1T is already used by the fully-connected layer.  This commit involves two minor modifications (a simplified sketch follows this list):
        - Use unaligned access.  I don't believe this involves any performance hit on modern CPUs (Nehalem and Bulldozer onwards) in the case where the address is actually aligned.
        - Allow for weight matrices where the number of columns is not a multiple of 8.
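
       A simplified sketch of the idea (illustrative only; the function name and shapes are assumptions, not the actual fastGEMM1T code): an AVX matrix-vector product using unaligned loads, with a scalar tail so the number of columns need not be a multiple of 8.

         #include <immintrin.h>

         void matvecAvxSketch(const float* W, const float* x, float* y,
                              int rows, int cols)   // W is rows x cols, row-major
         {
             for (int r = 0; r < rows; ++r)
             {
                 const float* w = W + (size_t)r * cols;
                 __m256 acc = _mm256_setzero_ps();
                 int c = 0;
                 for (; c + 8 <= cols; c += 8)      // unaligned loads, no alignment assumed
                     acc = _mm256_add_ps(acc, _mm256_mul_ps(_mm256_loadu_ps(w + c),
                                                            _mm256_loadu_ps(x + c)));
                 float buf[8];
                 _mm256_storeu_ps(buf, acc);
                 float s = buf[0] + buf[1] + buf[2] + buf[3]
                         + buf[4] + buf[5] + buf[6] + buf[7];
                 for (; c < cols; ++c)              // scalar tail when cols % 8 != 0
                     s += w[c] * x[c];
                 y[r] = s;
             }
         }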
      
      I have not enabled AVX-512 as I don't have an AVX-512 CPU to test on.
      
      * Fix warning about initialisation order
      
      * Remove C++11 syntax
      
      * Fix build when AVX(2) is not available
      
      In this case the CV_TRY_X macros are defined to 0, rather than being undefined.
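
       A minimal sketch of the guard this implies, using CV_TRY_AVX2 (provided by OpenCV's CPU dispatch headers) as the example macro:

         #if CV_TRY_AVX2            // correct: expands to 0 when dispatch is disabled
             // ... AVX2-specific code path ...
         #endif

         // #ifdef CV_TRY_AVX2      // wrong: still taken when the macro is defined as 0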
      
      * Minor changes as requested:
      
       - Don't check hardware support for AVX(2) when dispatch is disabled for these
       - Add braces
      
      * Fix out-of-bounds access in fully connected layer
      
       The old tail handling in fastGEMM1T implicitly rounded vecsize up to the next multiple of 8, and the fully connected layer implements padding up to the next multiple of 8 to cope with this.  The new tail handling does not round the vecsize upwards like this, but it does require that the vecsize is at least 8.  To adapt to the new tail handling, the fully connected layer now rounds vecsize itself at the same time as adding the padding (which makes more sense anyway).
      
      This also means that the fully connected layer always passes a vecsize of at least 8 to fastGEMM1T, which fixes the out-of-bounds access problems.
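
       A minimal sketch of that rounding (a hypothetical helper, not the actual layer code): the weight row is padded with zeros so the length handed to fastGEMM1T is always a multiple of 8, and therefore at least 8.

         #include <algorithm>
         #include <vector>

         // Pad one weight row up to the next multiple of 8; zeros in the
         // tail leave the dot product unchanged.
         static std::vector<float> padRowTo8(const float* row, int vecsize)
         {
             int aligned = (vecsize + 7) & ~7;
             std::vector<float> padded(aligned, 0.f);
             std::copy(row, row + vecsize, padded.begin());
             return padded;
         }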
      
      * Improve tail mask handling
      
       - Use static array for generating tail masks (as requested)
       - Apply tail mask to the weights as well as the input vectors to prevent spurious propagation of NaNs/Infs
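
       A sketch of that tail-mask scheme (illustrative; names are assumptions, not the exact OpenCV code): a static table of -1/0 words yields the mask for the final partial block of 8, and the mask is applied to both operands via masked loads so junk lanes (possibly NaN or Inf) never reach the accumulator.

         #include <immintrin.h>

         static const int maskTable[16] = { -1,-1,-1,-1,-1,-1,-1,-1,  0,0,0,0,0,0,0,0 };

         // tail = number of valid remaining elements, 1..7
         static inline __m256 maskedTailProduct(const float* w, const float* x, int tail)
         {
             __m256i mask = _mm256_loadu_si256((const __m256i*)(maskTable + 8 - tail));
             __m256 wv = _mm256_maskload_ps(w, mask);   // masked-off lanes load as 0.0f
             __m256 xv = _mm256_maskload_ps(x, mask);   // mask the weights and the input alike
             return _mm256_mul_ps(wv, xv);              // safe to accumulate: no junk lanes
         }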
      
      * Revert whitespace change
      
      * Improve readability of conditions for using AVX
      
      * dnn(lstm): minor coding style changes, replaced left aligned load
      ea7d4be3
    • fix Clip, LeakyReLU, LRN, Split defaults · 05db8784
      Committed by Smirnov Egor
  8. 28 Nov 2021, 3 commits
  9. 27 Nov 2021, 1 commit
  10. 12 Nov 2021, 1 commit
  11. 10 Nov 2021, 1 commit
    • Merge pull request #20904 from Crayon-new:fix_bug_in_maxLayer · 98b6ce35
      Committed by ZaKiiiiiiiii
       fix bug: wrong output dimension when "keep_dims" is false in the pooling layer (a shape sketch follows this entry).
      
      * fix bug in max layer
      
      * code align
      
      * delete permute layer and add test case
      
      * add name assert
      
      * check other cases
      
      * remove c++11 features
      
       * style: add "const", remove assert
       
       * style: sanitize file names
      98b6ce35
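       A shape sketch of the intended behaviour (generic reduce/pool semantics, not the actual layer code): for a global max pool over an N x C x H x W input, keep_dims decides whether the pooled spatial axes are kept as size-1 dimensions or dropped.

         #include <vector>

         std::vector<int> globalPoolOutShape(const std::vector<int>& in /* {N, C, H, W} */,
                                             bool keep_dims)
         {
             if (keep_dims)
                 return { in[0], in[1], 1, 1 };   // N x C x 1 x 1
             return { in[0], in[1] };             // N x C: spatial axes removed entirely
         }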
  12. 04 Nov 2021, 1 commit
  13. 03 Nov 2021, 1 commit
  14. 19 Oct 2021, 1 commit
  15. 12 Oct 2021, 1 commit
  16. 11 Oct 2021, 1 commit
  17. 08 Oct 2021, 2 commits
  18. 07 Oct 2021, 1 commit
    • Merge pull request #20725 from mologie:fix-dnn-tf-on-arm · a3d7811f
      Committed by Oliver Kuckertz
      * dnn: fix unaligned memory access crash on armv7
      
      The getTensorContent function would return a Mat pointing to some
      member of a Protobuf-encoded message. Protobuf does not make any
      alignment guarantees, which results in a crash on armv7 when loading
      models while bit 2 is set in /proc/cpu/alignment (or the relevant
      kernel feature for alignment compatibility is disabled). Any read
      attempt from the previously unaligned data member would send SIGBUS.
      
       As a workaround, this commit makes an aligned copy via the
       existing clone functionality in getTensorContent (sketched after
       this entry). The unsafe copy=false option is removed.
       Unfortunately, a rather crude hack in PReLUSubgraph in fact
       writes(!) to the Protobuf message. We limit ourselves to fixing
       the alignment issues in this commit, and add
       getTensorContentRefUnaligned to cover the write case with a safe
       memcpy. A FIXME marks the issue.
      
      * dnn: reduce amount of .clone() calls
      
      * dnn: update FIXME comment
       Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>
      a3d7811f
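       A sketch of the workaround described above (simplified; the real getTensorContent reads fields of the parsed Protobuf tensor): copy the possibly unaligned Protobuf buffer into a freshly allocated cv::Mat, so later element access happens on properly aligned memory.

         #include <cstring>
         #include <opencv2/core.hpp>

         // buf points into a Protobuf message and may be unaligned; count is in floats.
         static cv::Mat alignedCopyFloat(const char* buf, size_t count)
         {
             cv::Mat m(1, (int)count, CV_32F);                  // freshly allocated, aligned
             std::memcpy(m.data, buf, count * sizeof(float));   // memcpy tolerates an unaligned source
             return m;
         }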
  19. 06 Oct 2021, 1 commit
  20. 05 Oct 2021, 1 commit
  21. 02 Oct 2021, 1 commit
  22. 29 Sep 2021, 1 commit
  23. 17 Sep 2021, 1 commit
  24. 15 Sep 2021, 1 commit
  25. 12 Sep 2021, 1 commit
  26. 11 Sep 2021, 1 commit
  27. 10 Sep 2021, 2 commits