1. 30 October 2019 (3 commits)
  2. 29 October 2019 (4 commits)
  3. 28 October 2019 (3 commits)
  4. 27 October 2019 (1 commit)
  5. 26 October 2019 (1 commit)
  6. 25 October 2019 (4 commits)
  7. 24 October 2019 (5 commits)
  8. 23 October 2019 (5 commits)
  9. 22 October 2019 (8 commits)
  10. 21 October 2019 (1 commit)
      Merge pull request #14827 from YashasSamaga:cuda4dnn-csl-low · 613c12e5
      Committed by Yashas Samaga B L
      CUDA backend for the DNN module
      
      * stub cuda4dnn design
      
      * minor fixes for tests and doxygen
      
      * add csl public api directory to module headers
      
      * add low-level CSL components
      
      * add high-level CSL components
      
      * integrate csl::Tensor into backbone code
      
      * switch to CPU iff unsupported; otherwise, fail on error
      
      * add fully connected layer
      
      * add softmax layer
      
      * add activation layers
      
      * support arbitrary rank TensorDescriptor
      
      * pass input wrappers to `initCUDA()`
      
      * add 1d/2d/3d-convolution
      
      * add pooling layer
      
      * reorganize and refactor code
      
      * fixes for gcc, clang and doxygen; remove cxx14/17 code
      
      * add blank_layer
      
      * add LRN layer
      
      * add rounding modes for pooling layer
      
      * split tensor.hpp into tensor.hpp and tensor_ops.hpp
      
      * add concat layer
      
      * add scale layer
      
      * add batch normalization layer
      
      * split math.cu into activations.cu and math.hpp
      
      * add eltwise layer
      
      * add flatten layer
      
      * add tensor transform api
      
      * add asymmetric padding support for convolution layer
      
      * add reshape layer
      
      * fix rebase issues
      
      * add permute layer
      
      * add padding support for concat layer
      
      * refactor and reorganize code
      
      * add normalize layer
      
      * optimize bias addition in scale layer
      
      * add prior box layer
      
      * fix and optimize normalize layer
      
      * add asymmetric padding support for pooling layer
      
      * add event API
      
      * improve pooling performance for some padding scenarios
      
      * avoid over-allocation of compute resources to kernels
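
      A common way to achieve this is a grid-stride loop: the grid is capped near a small multiple of the
      device's SM count and each thread strides over several elements, instead of launching one thread per
      element. The sketch below only illustrates that idea; the kernel and launcher names and the capping
      heuristic are made up for illustration, not the PR's actual code.

          #include <algorithm>
          #include <cstddef>
          #include <cuda_runtime.h>

          /* grid-stride loop: each thread handles several elements, so the grid can stay small */
          __global__ void relu_grid_stride(float* out, const float* in, std::size_t n, float slope) {
              for (std::size_t i = std::size_t(blockIdx.x) * blockDim.x + threadIdx.x; i < n;
                   i += std::size_t(gridDim.x) * blockDim.x)
                  out[i] = in[i] > 0.f ? in[i] : slope * in[i];
          }

          void launch_relu(float* out, const float* in, std::size_t n, float slope) {
              if (n == 0) return;
              int device = 0, num_sms = 0;
              cudaGetDevice(&device);
              cudaDeviceGetAttribute(&num_sms, cudaDevAttrMultiProcessorCount, device);
              const std::size_t block = 256;
              /* cap the grid at a small multiple of the SM count instead of ceil(n / block) */
              const std::size_t grid = std::min((n + block - 1) / block, std::size_t(8) * num_sms);
              relu_grid_stride<<<grid, block>>>(out, in, n, slope);
          }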
      
      * improve prior box performance
      
      * enable layer fusion
      
      * add const layer
      
      * add resize layer
      
      * add slice layer
      
      * add padding layer
      
      * add deconvolution layer
      
      * fix channelwise ReLU initialization
      
      * add vector traits
      
      * add vectorized versions of relu, clipped_relu, power
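
      The usual trick behind such vectorized elementwise kernels is to read and write four packed floats
      per memory transaction via CUDA's built-in float4 type. The kernel below is a minimal sketch of that
      idea (assuming 16-byte-aligned buffers and a length divisible by four); it is not the PR's actual
      implementation.

          #include <cstddef>
          #include <cuda_runtime.h>

          /* process four elements per thread through 128-bit loads and stores */
          __global__ void relu_vec4(float4* out, const float4* in, std::size_t n4) {
              const std::size_t i = std::size_t(blockIdx.x) * blockDim.x + threadIdx.x;
              if (i < n4) {
                  float4 v = in[i];                             /* one 128-bit load */
                  v.x = fmaxf(v.x, 0.f); v.y = fmaxf(v.y, 0.f);
                  v.z = fmaxf(v.z, 0.f); v.w = fmaxf(v.w, 0.f);
                  out[i] = v;                                   /* one 128-bit store */
              }
          }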
      
      * add vectorized concat kernels
      
      * improve concat_with_offsets performance
      
      * vectorize scale and bias kernels
      
      * add support for multi-billion element tensors
      
      * vectorize prior box kernels
      
      * fix address alignment check
      
      * improve bias addition performance of conv/deconv/fc layers
      
      * restructure code for supporting multiple targets
      
      * add DNN_TARGET_CUDA_FP64
      
      * add DNN_TARGET_FP16
      
      * improve vectorization
      
      * add region layer
      
      * improve tensor API, add dynamic ranks
      
      1. use ManagedPtr instead of a Tensor in backend wrapper
      2. add new methods to tensor classes
        - size_range: computes the combined size for a given axis range
        - tensor span/view can be constructed from a raw pointer and shape
      3. the tensor classes can change their rank at runtime (previously rank was fixed at compile-time)
      4. remove device code from tensor classes (as it was unused)
      5. enforce strict conditions on tensor class APIs to improve debugging ability
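
      A rough host-side picture of what the reworked API implies, using hypothetical names (the real
      cuda4dnn tensor classes are more involved): a non-owning view that is built from a raw pointer and a
      shape, carries its rank at runtime, and can report the combined size of an axis range.

          #include <cstddef>
          #include <functional>
          #include <numeric>
          #include <vector>

          /* minimal sketch of a runtime-rank, non-owning tensor view */
          template <class T>
          class TensorView {
          public:
              /* constructed from a raw pointer and a shape; the rank is whatever the shape says */
              TensorView(T* ptr, std::vector<std::size_t> shape)
                  : ptr_(ptr), shape_(std::move(shape)) { }

              std::size_t rank() const { return shape_.size(); }

              /* combined size of the axes in [start, start + count) */
              std::size_t size_range(std::size_t start, std::size_t count) const {
                  return std::accumulate(shape_.begin() + start, shape_.begin() + start + count,
                                         std::size_t(1), std::multiplies<std::size_t>());
              }

              std::size_t total() const { return size_range(0, rank()); }

          private:
              T* ptr_;
              std::vector<std::size_t> shape_;
          };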
      
      * fix parametric relu activation
      
      * add squeeze/unsqueeze tensor API
      
      * add reorg layer
      
      * optimize permute and enable 2d permute
      
      * enable 1d and 2d slice
      
      * add split layer
      
      * add shuffle channel layer
      
      * allow tensors of different ranks in reshape primitive
      
      * patch SliceOp to allow Crop Layer
      
      * allow extra shape inputs in reshape layer
      
      * use `std::move_backward` instead of `std::move` for insert in resizable_static_array
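
      The reason for the switch: inserting into the middle of a fixed-capacity array shifts the tail one
      slot to the right, which is an overlapping move towards higher addresses. std::move walks front to
      back and would overwrite elements before they are moved, while std::move_backward walks back to
      front and handles this case correctly. A sketch of the pattern (insert_at and the container are
      illustrative, not the actual resizable_static_array):

          #include <algorithm>
          #include <array>
          #include <cstddef>

          template <class T, std::size_t Capacity>
          void insert_at(std::array<T, Capacity>& buf, std::size_t& size, std::size_t pos, T value) {
              /* shift [pos, size) one slot right, starting from the back */
              std::move_backward(buf.begin() + pos, buf.begin() + size, buf.begin() + size + 1);
              buf[pos] = std::move(value);
              ++size;
          }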
      
      * improve workspace management
      
      * add spatial LRN
      
      * add nms (cpu) to region layer
      
      * add max pooling with argmax (and a fix to limits.hpp)
      
      * add max unpooling layer
      
      * rename DNN_TARGET_CUDA_FP32 to DNN_TARGET_CUDA
      
      * update supportBackend to be more rigorous
      
      * remove stray include that was breaking the non-CUDA build
      
      * include op_cuda.hpp outside the #if condition
      
      * refactoring, fixes and many optimizations
      
      * drop DNN_TARGET_CUDA_FP64
      
      * fix gcc errors
      
      * increase max. tensor rank limit to six
      
      * add Interp layer
      
      * drop custom layers; use BackendNode
      
      * vectorize activation kernels
      
      * fixes for gcc
      
      * remove wrong assertion
      
      * fix broken assertion in unpooling primitive
      
      * fix build errors in non-CUDA build
      
      * completely remove workspace from public API
      
      * fix permute layer
      
      * enable accuracy and perf. tests for DNN_TARGET_CUDA
      
      * add asynchronous forward
      
      * vectorize eltwise ops
      
      * vectorize fill kernel
      
      * fixes for gcc
      
      * remove CSL headers from public API
      
      * remove csl header source group from cmake
      
      * update min. cudnn version in cmake
      
      * add numerically stable FP32 log1pexp
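
      Naively evaluating log(1 + exp(x)) overflows once exp(x) exceeds FLT_MAX (x above roughly 88), even
      though the true result there is essentially x. One common stable formulation is sketched below; the
      branch threshold is a generic choice, not necessarily the one used in the PR.

          #include <cmath>

          /* numerically stable log(1 + exp(x)) for float */
          inline float log1pexp(float x) {
              /* for large x, log(1 + exp(x)) = x + log1p(exp(-x)), and exp(-x) is far below the
                 float precision of x, so the result is just x */
              if (x > 80.0f)
                  return x;
              /* safe range: exp(80) is about 5.5e34, well below FLT_MAX, and log1p keeps precision
                 when exp(x) is small */
              return std::log1p(std::exp(x));
          }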
      
      * refactor code
      
      * add FP16 specialization to cudnn based tensor addition
      
      * vectorize scale1 and bias1 + minor refactoring
      
      * fix doxygen build
      
      * fix invalid alignment assertion
      
      * clear backend wrappers before allocateLayers
      
      * ignore memory lock failures
      
      * do not allocate internal blobs
      
      * integrate NVTX
      
      * add numerically stable half precision log1pexp
      
      * fix indentation to follow the coding style; improve docs
      
      * remove accidental modification of IE code
      
      * Revert "add asynchronous forward"
      
      This reverts commit 1154b9da9da07e9b52f8a81bdcea48cf31c56f70.
      
      * [cmake] throw error for unsupported CC versions
      
      * fix rebase issues
      
      * add more docs, refactor code, fix bugs
      
      * minor refactoring and fixes
      
      * resolve warnings/errors from clang
      
      * remove haveCUDA() checks from supportBackend()
      
      * remove NVTX integration
      
      * changes based on review comments
      
      * avoid exception when no CUDA device is present
      
      * add color code for CUDA in Net::dump
  11. 20 October 2019 (4 commits)
  12. 19 October 2019 (1 commit)