Commits · c670058a8dddd2db49dd49b2b4138b1e2b63d5f9 · Crayon鑫 / Paddle

25 Sep, 2019 · 1 commit

add support of matmul with multiple head even different width and height (#19708) · c670058a

Committed by Bob Zhu on Sep 25, 2019

* add support of matmul with multiple head even different width and height

The original multi-head matmul supports only the case mat_a.width == mat_b.height,
in which mat_b is split horizontally. This patch extends support to the case
mat_a.width != mat_b.height but mat_a.width / head_number == mat_b.height,
in which mat_b is split vertically.

For example, if A is [3, 8], B is [2, 16], and head_number is 4, then A is
split into 4 matrices of [3, 2] and B is (vertically) split into 4 matrices of
[2, 4]. The final result is 4 matrices of [3, 4], i.e. [3, 16].

test=develop

* refactor the code of matmul with multiple head even different width and height

test=develop

c670058a
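The shape arithmetic in this commit message (A is [3, 8], B is [2, 16], head_number is 4) can be sketched in plain Python. This is only an illustration of the splitting described above, not the Paddle kernel: the function name `multi_head_matmul_wide_b` is hypothetical, and placing the per-head results side by side in head order is an assumption.

```python
def matmul(a, b):
    """Plain matrix product of two row-major lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def multi_head_matmul_wide_b(a, b, head_number):
    """Hypothetical sketch: A is [M, K], B is [K / head_number, N].

    A is cut into head_number column blocks of width K / head_number and
    B into head_number column blocks of width N / head_number (the
    "vertical" split from the commit message). Returns an [M, N] result.
    """
    k, n = len(a[0]), len(b[0])
    if k % head_number or n % head_number or k // head_number != len(b):
        raise ValueError("expected A.width / head_number == B.height")
    sub_k, sub_n = k // head_number, n // head_number
    out = [[] for _ in a]
    for h in range(head_number):
        a_h = [row[h * sub_k:(h + 1) * sub_k] for row in a]  # [M, sub_k]
        b_h = [row[h * sub_n:(h + 1) * sub_n] for row in b]  # [sub_k, sub_n]
        for out_row, c_row in zip(out, matmul(a_h, b_h)):
            out_row.extend(c_row)  # assumed layout: heads side by side
    return out


a = [[1] * 8 for _ in range(3)]    # A is [3, 8]
b = [[1] * 16 for _ in range(2)]   # B is [2, 16]
c = multi_head_matmul_wide_b(a, b, 4)
print(len(c), len(c[0]))           # 3 16
```

Each head multiplies a [3, 2] block by a [2, 4] block, so the four [3, 4] results together form the [3, 16] output quoted in the message.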

20 Aug, 2019 · 1 commit

Use sparse matrix to implement fused emb_seq_pool operator (#19064) · b9203958

Committed by Yihua Xu on Aug 20, 2019

* Implement the operator with sparse matrix multiplication

* Update the URL of mklml library.

test=develop

* Disable the MKLML implementation on non-Linux platforms.

test=develop

* Ignore the deprecated status on Windows

test=develop

b9203958
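The idea named in this commit title can be illustrated with a small sketch: summing the embedding rows of each sequence is the same as multiplying a sparse count matrix by the embedding table. The code below assumes sum pooling and a hand-rolled CSR layout; the function names are hypothetical and it does not reproduce the MKL-based operator.

```python
def emb_seq_pool_reference(table, sequences):
    """Reference: look up each id and sum the rows per sequence."""
    width = len(table[0])
    out = []
    for seq in sequences:
        acc = [0.0] * width
        for idx in seq:
            for j in range(width):
                acc[j] += table[idx][j]
        out.append(acc)
    return out


def emb_seq_pool_sparse(table, sequences):
    """Same result via a CSR matrix S (one row per sequence) times table."""
    values, col_idx, row_ptr = [], [], [0]
    for seq in sequences:
        for idx in seq:
            values.append(1.0)   # each looked-up id contributes weight 1
            col_idx.append(idx)  # column = row of the embedding table
        row_ptr.append(len(values))
    width = len(table[0])
    out = []
    for r in range(len(sequences)):
        acc = [0.0] * width
        for p in range(row_ptr[r], row_ptr[r + 1]):
            row = table[col_idx[p]]
            for j in range(width):
                acc[j] += values[p] * row[j]
        out.append(acc)
    return out


table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 ids, embedding width 2
sequences = [[0, 2], [1, 1, 2]]               # two sequences of ids
print(emb_seq_pool_sparse(table, sequences))  # [[6.0, 8.0], [11.0, 14.0]]
```

Expressing the pooling as a sparse-times-dense product is what lets a library routine do the lookup and the sum in a single call.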

24 Jul, 2019 · 1 commit

Extend Matmul to support matrix multiplication with multiple heads (#18570) · 220eef60

Committed by Bob Zhu on Jul 24, 2019

* extend matmul op to support multiple head multiplication

With multiple-head support, the multiplication of two large matrices is
split into the multiplication of several (head_number) small matrices. E.g. if
Mat A is [3, 24] and Mat B is [24, 4], multiplying A and B with head_number
set to 4 splits Mat A into 4 matrices of [3, 6] and Mat B into 4 matrices of
[6, 4]. The final result is 4 matrices of [3, 4], i.e. [3, 16].

220eef60
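The example in this commit message (Mat A is [3, 24], Mat B is [24, 4], head_number is 4) can be checked with a short plain-Python sketch. The function name `multi_head_matmul` is hypothetical, and concatenating the per-head results along the columns in head order is an assumption; the sketch only mirrors the shapes described above.

```python
def matmul(a, b):
    """Plain matrix product of two row-major lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def multi_head_matmul(a, b, head_number):
    """Hypothetical sketch: A is [M, K], B is [K, N].

    The shared dimension K is cut into head_number chunks, giving column
    blocks of A and row blocks of B. Returns [M, head_number * N].
    """
    k = len(a[0])
    if k != len(b) or k % head_number != 0:
        raise ValueError("expected A.width == B.height, divisible by head_number")
    sub = k // head_number
    out = [[] for _ in a]
    for h in range(head_number):
        a_h = [row[h * sub:(h + 1) * sub] for row in a]  # [M, sub]
        b_h = b[h * sub:(h + 1) * sub]                   # [sub, N]
        for out_row, c_row in zip(out, matmul(a_h, b_h)):
            out_row.extend(c_row)  # assumed layout: heads side by side
    return out


a = [[1] * 24 for _ in range(3)]   # Mat A is [3, 24]
b = [[1] * 4 for _ in range(24)]   # Mat B is [24, 4]
c = multi_head_matmul(a, b, 4)
print(len(c), len(c[0]))           # 3 16
```

Note that the output is [3, 16] rather than the [3, 4] of an ordinary product, because the per-head products are kept separate instead of being summed.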

04 Mar, 2019 · 1 commit

Optimize gelu operation with mkl erf. · b48d56e8

Committed by Yihua Xu on Feb 26, 2019

test=develop

b48d56e8

26 Feb, 2019 · 1 commit

Optimize gelu operation with mkl erf. · 73967886

Committed by Yihua Xu on Feb 26, 2019

test=develop

73967886

22 Feb, 2019 · 2 commits

Revert 15770 develop a6910f90 gelu mkl opt (#15872) · ee2321de

Committed by tensor-tang on Feb 22, 2019

* Revert "Optimize Gelu with MKL Erf function (#15770)"

This reverts commit 676995c8.

* test=develop

ee2321de

Optimize Gelu with MKL Erf function (#15770) · 676995c8

Committed by Yihua Xu on Feb 22, 2019

* Optimize for gelu operator

* Set up the low accuracy mode of MKL ERF function.

test=develop

* Only enable MKLML ERF when OS is linux

* Use the special mklml version that includes the vmsErf function to verify the gelu mkl kernel.

test=develop

* Add the CUDA macro to avoid NVCC's compile issue.

test=develop

* Add the TODO comments for mklml library modification.

test=develop

* Clean Code

test=develop

* Add a comment on the macro for the NVCC compiler.

test=develop

676995c8
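For context, the erf-based GELU that this commit refers to is 0.5 * x * (1 + erf(x / sqrt(2))). The sketch below lays the computation out as whole-array passes, which is the shape that suits a vector erf routine. It uses Python's `math.erf` as a stand-in, and the function name `gelu_vectorized` is hypothetical; it is not the MKL kernel.

```python
import math

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def gelu_vectorized(xs):
    """Erf-based GELU over a list, written as three whole-array passes."""
    scaled = [x * INV_SQRT2 for x in xs]   # pass 1: x / sqrt(2)
    erfs = [math.erf(s) for s in scaled]   # pass 2: erf over the whole array
    return [0.5 * x * (1.0 + e) for x, e in zip(xs, erfs)]  # pass 3: combine


values = gelu_vectorized([-1.0, 0.0, 1.0])
print([round(v, 4) for v in values])       # [-0.1587, 0.0, 0.8413]
```

Only the second pass involves a transcendental function, which is why swapping in a faster vector erf can speed up the operator as a whole.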

13 Dec, 2018 · 1 commit

Use mkl · 7b10bf0e

Committed by Yu Yang on Dec 13, 2018

7b10bf0e

27 Nov, 2018 · 1 commit

- ASUM MKL integration · 8bfa1fa9

Committed by Jacek Czaja on Nov 27, 2018

8bfa1fa9

16 Nov, 2018 · 1 commit

fix lrn on mac (#14426) · 64f7516a

Committed by tensor-tang on Nov 16, 2018

* rename and fix blas vsqr

test=develop

* update

64f7516a

13 Nov, 2018 · 1 commit

add mkl vsqr and vpow · 1be85d01

Committed by tensor-tang on Nov 13, 2018

1be85d01

22 Aug, 2018 · 5 commits

fix bugs · cf5ea925

Committed by tensor-tang on Aug 22, 2018

cf5ea925

add blas vexp · 3dd66390

Committed by tensor-tang on Aug 22, 2018

3dd66390

fix blas dot and add cblas scal · 0ec1f65c

Committed by tensor-tang on Aug 22, 2018

0ec1f65c

add cblas dot · a2203d04

Committed by tensor-tang on Aug 22, 2018

a2203d04

refine blas gemm · f72ab896

Committed by tensor-tang on Aug 22, 2018

f72ab896

16 Aug, 2018 · 1 commit

add mklml vmul · 6644ce79

Committed by tensor-tang on Aug 16, 2018

6644ce79

06 Aug, 2018 · 1 commit

fix blas · 54c95e49

Committed by tensor-tang on Aug 06, 2018

54c95e49

03 Aug, 2018 · 2 commits

fix blas and use packed weight · 8c23f7c4

Committed by tensor-tang on Aug 03, 2018

8c23f7c4

add mkl packed gemm · 43cee33a

Committed by tensor-tang on Aug 02, 2018

43cee33a

05 Jul, 2018 · 2 commits

link libxsmm · 17987eb3

Committed by tensor-tang on Jul 05, 2018

17987eb3

"remove lapack" (#11966) · 99a99ec7

Committed by dzhwinter on Jul 05, 2018

99a99ec7

27 Jun, 2018 · 1 commit

move SetNumThreads to platform · e3a96300

Committed by tensor-tang on Jun 27, 2018

e3a96300

20 Jun, 2018 · 1 commit

enable dynamic load mklml lib on fluid · f503f129

Committed by tensor-tang on Jun 20, 2018

f503f129

24 May, 2018 · 1 commit

MKL elementwise add: elementwise_add uses vAdd VML function when MKL is used · e43c8f33

Committed by Tomasz Patejko on May 17, 2018

e43c8f33

21 May, 2018 · 1 commit

Add an interface to set the number of threads for math function, and set the... · 39eb871d

Committed by Liu Yiqun on May 21, 2018

Add an interface to set the number of threads for math function, and set the default value to 1 for inference.

39eb871d

08 May, 2018 · 2 commits

Follow comments and polish code names · fcd31d61

Committed by Yu Yang on May 08, 2018

fcd31d61

Move MatMul to blas_impl.h · 0a13d3c6

Committed by Yu Yang on May 08, 2018

Rename MatDim to MatDescriptor

0a13d3c6

07 May, 2018 · 1 commit

Rewrite Matmul, make code cleaner · c6a6d87f

Committed by Yu Yang on May 07, 2018

c6a6d87f

04 May, 2018 · 1 commit

Clean and extract blas · ef6ea790

Committed by Yu Yang on May 04, 2018

ef6ea790
