1. 20 Mar, 2021 (1 commit)
  2. 09 Feb, 2021 (1 commit)
  3. 04 Feb, 2021 (1 commit)
  4. 18 Nov, 2020 (1 commit)
  5. 26 May, 2020 (1 commit)
  6. 03 Mar, 2020 (2 commits)
  7. 14 Jan, 2020 (1 commit)
  8. 02 Dec, 2019 (1 commit)
  9. 21 Oct, 2019 (1 commit)
    • Merge pull request #14827 from YashasSamaga:cuda4dnn-csl-low · 613c12e5
      Committed by Yashas Samaga B L
      CUDA backend for the DNN module
      
      * stub cuda4dnn design
      
      * minor fixes for tests and doxygen
      
      * add csl public api directory to module headers
      
      * add low-level CSL components
      
      * add high-level CSL components
      
      * integrate csl::Tensor into backbone code
      
      * switch to CPU iff unsupported; otherwise, fail on error
      
      * add fully connected layer
      
      * add softmax layer
      
      * add activation layers
      
      * support arbitrary rank TensorDescriptor
      
      * pass input wrappers to `initCUDA()`
      
      * add 1d/2d/3d-convolution
      
      * add pooling layer
      
      * reorganize and refactor code
      
      * fixes for gcc, clang and doxygen; remove cxx14/17 code
      
      * add blank_layer
      
      * add LRN layer
      
      * add rounding modes for pooling layer
      
      * split tensor.hpp into tensor.hpp and tensor_ops.hpp
      
      * add concat layer
      
      * add scale layer
      
      * add batch normalization layer
      
      * split math.cu into activations.cu and math.hpp
      
      * add eltwise layer
      
      * add flatten layer
      
      * add tensor transform api
      
      * add asymmetric padding support for convolution layer
      
      * add reshape layer
      
      * fix rebase issues
      
      * add permute layer
      
      * add padding support for concat layer
      
      * refactor and reorganize code
      
      * add normalize layer
      
      * optimize bias addition in scale layer
      
      * add prior box layer
      
      * fix and optimize normalize layer
      
      * add asymmetric padding support for pooling layer
      
      * add event API
      
      * improve pooling performance for some padding scenarios
      
      * avoid over-allocation of compute resources to kernels
      
      * improve prior box performance
      
      * enable layer fusion
      
      * add const layer
      
      * add resize layer
      
      * add slice layer
      
      * add padding layer
      
      * add deconvolution layer
      
      * fix channelwise ReLU initialization
      
      * add vector traits
      
      * add vectorized versions of relu, clipped_relu, power
      
      * add vectorized concat kernels
      
      * improve concat_with_offsets performance
      
      * vectorize scale and bias kernels
      
      * add support for multi-billion element tensors
      
      * vectorize prior box kernels
      
      * fix address alignment check
      
      * improve bias addition performance of conv/deconv/fc layers
      
      * restructure code for supporting multiple targets
      
      * add DNN_TARGET_CUDA_FP64
      
      * add DNN_TARGET_FP16
      
      * improve vectorization
      
      * add region layer
      
      * improve tensor API, add dynamic ranks
      
      1. use ManagedPtr instead of a Tensor in backend wrapper
      2. add new methods to tensor classes
        - size_range: computes the combined size for a given axis range
        - tensor span/view can be constructed from a raw pointer and shape
      3. the tensor classes can change their rank at runtime (previously rank was fixed at compile-time)
      4. remove device code from tensor classes (as it is unused)
      5. enforce strict conditions on tensor class APIs to improve debugging ability
      
      * fix parametric relu activation
      
      * add squeeze/unsqueeze tensor API
      
      * add reorg layer
      
      * optimize permute and enable 2d permute
      
      * enable 1d and 2d slice
      
      * add split layer
      
      * add shuffle channel layer
      
      * allow tensors of different ranks in reshape primitive
      
      * patch SliceOp to allow Crop Layer
      
      * allow extra shape inputs in reshape layer
      
      * use `std::move_backward` instead of `std::move` for insert in resizable_static_array
      
      * improve workspace management
      
      * add spatial LRN
      
      * add nms (cpu) to region layer
      
      * add max pooling with argmax (and a fix to limits.hpp)
      
      * add max unpooling layer
      
      * rename DNN_TARGET_CUDA_FP32 to DNN_TARGET_CUDA
      
      * update supportBackend to be more rigorous
      
      * remove stray include that was breaking the non-CUDA build
      
      * include op_cuda.hpp outside the #if condition
      
      * refactoring, fixes and many optimizations
      
      * drop DNN_TARGET_CUDA_FP64
      
      * fix gcc errors
      
      * increase max. tensor rank limit to six
      
      * add Interp layer
      
      * drop custom layers; use BackendNode
      
      * vectorize activation kernels
      
      * fixes for gcc
      
      * remove wrong assertion
      
      * fix broken assertion in unpooling primitive
      
      * fix build errors in non-CUDA build
      
      * completely remove workspace from public API
      
      * fix permute layer
      
      * enable accuracy and perf. tests for DNN_TARGET_CUDA
      
      * add asynchronous forward
      
      * vectorize eltwise ops
      
      * vectorize fill kernel
      
      * fixes for gcc
      
      * remove CSL headers from public API
      
      * remove csl header source group from cmake
      
      * update min. cudnn version in cmake
      
      * add numerically stable FP32 log1pexp
      
      * refactor code
      
      * add FP16 specialization to cudnn based tensor addition
      
      * vectorize scale1 and bias1 + minor refactoring
      
      * fix doxygen build
      
      * fix invalid alignment assertion
      
      * clear backend wrappers before allocateLayers
      
      * ignore memory lock failures
      
      * do not allocate internal blobs
      
      * integrate NVTX
      
      * add numerically stable half precision log1pexp
      
      * fix indentation, follow coding style, improve docs
      
      * remove accidental modification of IE code
      
      * Revert "add asynchronous forward"
      
      This reverts commit 1154b9da9da07e9b52f8a81bdcea48cf31c56f70.
      
      * [cmake] throw error for unsupported CC versions
      
      * fix rebase issues
      
      * add more docs, refactor code, fix bugs
      
      * minor refactoring and fixes
      
      * resolve warnings/errors from clang
      
      * remove haveCUDA() checks from supportBackend()
      
      * remove NVTX integration
      
      * changes based on review comments
      
      * avoid exception when no CUDA device is present
      
      * add color code for CUDA in Net::dump
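      Once merged, this backend is selected through the standard DNN API. A
      minimal sketch (model and image file names are placeholders):

          #include <opencv2/dnn.hpp>
          #include <opencv2/imgcodecs.hpp>

          int main()
          {
              // Any model readable by cv::dnn::readNet; the file name is a placeholder.
              cv::dnn::Net net = cv::dnn::readNet("model.onnx");

              // Route inference through the CUDA backend added by this PR.
              // DNN_TARGET_CUDA runs FP32 kernels; DNN_TARGET_CUDA_FP16 selects the
              // half-precision path (DNN_TARGET_CUDA_FP64 was dropped before merging).
              net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
              net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);

              net.setInput(cv::dnn::blobFromImage(cv::imread("image.jpg")));
              cv::Mat out = net.forward();
              return 0;
          }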
  10. 07 Aug, 2019 (1 commit)
    • Merge pull request #15184 from l-bat:IE_R2 · 0e1ef8f8
      Committed by Lubov Batanina
      Support new IE API (#15184)
      
      * Add support OpenVINO R2 for layers
      
      * Add Core API
      
      * Fix tests
      
      * Fix expectNoFallbacksFromIE for ONNX nets
      
      * Remove deprecated API
      
      * Remove td
      
      * Remove TargetDevice
      
      * Fix Async
      
      * Add test
      
      * Fix detectMyriadX
      
      * Fix test
      
      * Fix warning
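      The updated Core API is reached through the same backend selectors; a
      minimal sketch, with placeholder IR model files:

          #include <opencv2/dnn.hpp>

          int main()
          {
              // IR files produced by OpenVINO's Model Optimizer (placeholders).
              cv::dnn::Net net = cv::dnn::readNet("model.xml", "model.bin");

              // Use the Inference Engine backend this PR updates; the target can be
              // DNN_TARGET_CPU, DNN_TARGET_OPENCL, or DNN_TARGET_MYRIAD, among others.
              net.setPreferableBackend(cv::dnn::DNN_BACKEND_INFERENCE_ENGINE);
              net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);
              return 0;
          }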
  11. 14 Jun, 2019 (1 commit)
  12. 16 Apr, 2019 (1 commit)
  13. 03 Apr, 2019 (1 commit)
  14. 19 Feb, 2019 (1 commit)
  15. 14 Feb, 2019 (1 commit)
  16. 12 Feb, 2019 (1 commit)
  17. 11 Feb, 2019 (1 commit)
  18. 07 Feb, 2019 (1 commit)
  19. 17 Jan, 2019 (1 commit)
  20. 26 Sep, 2018 (1 commit)
  21. 06 Sep, 2018 (1 commit)
  22. 21 Aug, 2018 (1 commit)
  23. 14 Aug, 2018 (1 commit)
  24. 13 Aug, 2018 (1 commit)
  25. 24 Jul, 2018 (1 commit)
  26. 04 Jun, 2018 (1 commit)
  27. 23 May, 2018 (1 commit)
  28. 16 May, 2018 (1 commit)
  29. 12 Apr, 2018 (1 commit)
  30. 10 Apr, 2018 (1 commit)
  31. 28 Mar, 2018 (1 commit)
  32. 22 Feb, 2018 (1 commit)
  33. 05 Jan, 2018 (1 commit)
  34. 09 Nov, 2017 (1 commit)
  35. 11 Oct, 2017 (1 commit)
  36. 28 Jun, 2017 (2 commits)
    • dnn: added trace macros · ed103833
      Committed by Alexander Alekhin
    • another round of dnn optimization (#9011) · 8b3d6603
      Committed by Vadim Pisarevsky
      * another round of dnn optimization:
      * increased malloc alignment across OpenCV from 16 to 64 bytes to make it AVX2 and even AVX-512 friendly
      * improved SIMD optimization of pooling layer, optimized average pooling
      * cleaned up convolution layer implementation
      * made activation layers "attachable" to all other layers, including the fully connected and addition layers.
      * fixed a bug in the fusion algorithm: "LayerData::consumers" should not be cleared, because it describes the topology.
      * greatly optimized permutation layer, which improved SSD performance
      * parallelized element-wise binary/ternary/... ops (sum, prod, max)
      
      * also, added missing copyrights to many of the layer implementation files
      
      * temporarily disabled (again) the check for intermediate blob consistency; fixed warnings from various builders
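      The 64-byte alignment in the first bullet matches one cache line and the
      width of an AVX-512 register. OpenCV applies it inside its own
      fastMalloc/fastFree wrappers; a standalone sketch of the same idea using
      standard C++17 aligned allocation:

          #include <cstdlib>

          int main()
          {
              // 64-byte alignment keeps buffers AVX2/AVX-512 friendly.
              // std::aligned_alloc requires the size to be a multiple of the alignment.
              float* buf = static_cast<float*>(std::aligned_alloc(64, 1024 * sizeof(float)));
              if (!buf) return 1;

              // ... SIMD kernels can now use aligned loads/stores on buf ...

              std::free(buf);
              return 0;
          }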
  37. 26 Jun, 2017 (1 commit)